top | item 17082829

(no title)

MichaelGG | 7 years ago

Fast checking is really useful in things like HTTP/SIP parsing. Rust should expose such a function as well seeing as their strings must be UTF-8 validated. Though it's even faster if you can just avoid utf8 strings and work only on a few known ASCII bytes, it means you might push garbage further down the line.

discuss

masklinn|7 years ago

> Rust should expose such a function as well seeing as their strings must be UTF-8 validated.

That's more or less what std::str::from_utf8 is: it runs UTF8 validation on the input slice, and just casts it to an &str if it's valid: https://doc.rust-lang.org/src/core/str/mod.rs.html#332-335

from_utf8_unchecked nothing more than an unsafe (c-style) cast: https://doc.rust-lang.org/src/core/str/mod.rs.html#437 and so should be a no-op at runtime.

MichaelGG|7 years ago

I meant Rust should have a SIMD optimised version that assumes mostly ASCII. I'm guessing there is a trade-off involved depending on the content of the string.

chrismorgan|7 years ago

std::str::from_bytes is that API.

Rust’s current implementation of full validation: https://github.com/rust-lang/rust/blob/2a3f5367a23a769a068c3...

I have a vague feeling there’s an even faster path for probably-ASCII out there, but I can’t immediately recall where and am going to bed. Doubtless someone else will answer before I get up.

The core team will be amenable to replacing this algorithm with something faster presuming it’s still correct.

kzrdude|7 years ago

Given the new simd features, it's probably time to revisit that now