"Whether to allow duplicate object entry names." This is interesting. I just did a test and it look like `jq` evaluates `{ "a": 1, "a": 2 }` to just `{ "a": 2 }`. I have always thought that this was invalid JSON. This mean that the order of keys in JSON do have some semantic meaning.
> The goal of this specification is only to define the syntax of valid JSON texts. Its intent is not to provide any semantics or interpretation of text conforming to that syntax.
So it is legal JSON although not useful with a lot of concrete implementations. Maybe a way to find an exciting security vulnerability involving two parsers differing in their interpretation...
"It is expected that the json-threat-protection crate will be faster than the serde_json crate because it never store the deserialized JSON Value in memory, which reduce the cost on memory allocation and deallocation."
"As you can see from the table, the json-threat-protection crate is faster than the serde_json crate for all datasets, but the number depends on the dataset. So you could get your own performance number by specifying the JSON_FILE to your dataset."
However:
"This project is not a parser, and never give you the deserialized JSON Value!"
Is this performance comparison to serde_json fair? If serde_json is a parser and has a different feature set than json-threat-protection, does it make sense to compare performance?
> If serde_json is a parser and has a different feature set than json-threat-protection, does it make sense to compare performance?
If you were using serde_json just to validate a payload before passing it on to another service (like a WAF), then the comparison makes sense. If you had more complex validations or wanted to extract some of the data, then maybe not.
This crate is not an alternative of the serde_json, it only do the validation.
Currently, there is no other crates do the sames validation works on JSON, so I have to parse the dataset by a common JSON parser (sede_json) and do the same validation on its deserialized value as the comparable results.
So it would be better to compare to other crates which do the same work, but I didn't found the similar crate so far. And this is also the reason I developed this crate.
I don't think it was intended to say that this crate is "better" than serde_json. I interpreted it to be a measurement of the overhead of adding it as an additional step on top of parsing.
The point of the article is to parse AND validate input AT THE BOUNDARY between the outside world and your program, rather than a bunch of ad-hoc validations at various points after the suspect data has entered the castle walls and has already been (at least partially) processed (thus making the program state harder to reason about). By enforcing your invariants at the border, you ensure that all data entering your system always conforms to your expectations, just like a strong type system ensures that invalid states are not representable. A schema is basically a type system for your raw data.
Great to see this article, I totally agreed with the view that rejecting any invalid case by designing the right data structure.
Unfortunately, it is hard to achieve it in practice and people even don't realize this, JSON Object is a good example, Human are incline expecting the duplicated key is not allowed in JSON, but it happens.
For this goal, I think the Protobuf is good way to eliminate the possible invalid data for data transportation.
blirio|1 year ago
"Whether to allow duplicate object entry names." This is interesting. I just did a test, and it looks like `jq` evaluates `{ "a": 1, "a": 2 }` to just `{ "a": 2 }`. I had always thought that this was invalid JSON. This means that the order of keys in JSON does have some semantic meaning.
ADD-SP|1 year ago
For humans this looks invalid, but many web services accept this kind of JSON, consciously or unconsciously.
I'm guessing this may have become a de facto feature of some services, and it's hard for maintainers to break this behavior. ᵕ︵ᵕ
scottlamb|1 year ago
> The goal of this specification is only to define the syntax of valid JSON texts. Its intent is not to provide any semantics or interpretation of text conforming to that syntax.
So it is legal JSON although not useful with a lot of concrete implementations. Maybe a way to find an exciting security vulnerability involving two parsers differing in their interpretation...
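For example, Python's stdlib parser also keeps the last value, like `jq`, yet the same parser can be told to reject duplicates instead — exactly the kind of knob whose defaults differ between implementations. An illustrative sketch (the `reject_duplicates` hook is invented for the example):

```python
import json

doc = '{ "a": 1, "a": 2 }'

# Python's stdlib parser silently keeps the last value, like jq.
print(json.loads(doc))  # {'a': 2}

# A parser can just as well reject duplicates outright. Two parsers
# disagreeing here is what lets a value slip past one and reach the other.
def reject_duplicates(pairs):
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate key: {key!r}")
        obj[key] = value
    return obj

try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
except ValueError as e:
    print(e)  # duplicate key: 'a'
```

If a front-end proxy sees `1` and the backend sees `2` for the same key, an access-control check and the action it guards can be made to disagree.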
thesuperbigfrog|1 year ago
"It is expected that the json-threat-protection crate will be faster than the serde_json crate because it never store the deserialized JSON Value in memory, which reduce the cost on memory allocation and deallocation."
"As you can see from the table, the json-threat-protection crate is faster than the serde_json crate for all datasets, but the number depends on the dataset. So you could get your own performance number by specifying the JSON_FILE to your dataset."
However:
"This project is not a parser, and never give you the deserialized JSON Value!"
Is this performance comparison to serde_json fair? If serde_json is a parser and has a different feature set than json-threat-protection, does it make sense to compare performance?
matthews2|1 year ago
> If serde_json is a parser and has a different feature set than json-threat-protection, does it make sense to compare performance?
If you were using serde_json just to validate a payload before passing it on to another service (like a WAF), then the comparison makes sense. If you had more complex validations or wanted to extract some of the data, then maybe not.
ADD-SP|1 year ago
This crate is not an alternative to serde_json; it only does validation.
Currently, there are no other crates that do the same validation work on JSON, so I had to parse the dataset with a common JSON parser (serde_json) and run the same validation on its deserialized value to get comparable results.
It would be better to compare against other crates that do the same work, but I haven't found a similar crate so far. That is also the reason I developed this crate.
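The validation-only design ADD-SP describes — enforcing limits while scanning the text, without keeping a deserialized value around — can be sketched in a few lines. This is a simplified illustration (a single depth check, in Python; the function name is invented here), not the crate's actual implementation:

```python
def check_max_depth(payload: str, max_depth: int) -> bool:
    """Scan a JSON text and enforce a nesting limit without building
    the parsed value. String literals are skipped so that brackets
    inside them don't count toward the depth."""
    depth = 0
    in_string = False
    escaped = False
    for ch in payload:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            if depth > max_depth:
                return False  # limit exceeded, reject early
        elif ch in "]}":
            depth -= 1
    return True

print(check_max_depth('{"a": [1, 2, {"b": 3}]}', 3))  # True
print(check_max_depth('[[[[1]]]]', 3))                # False
```

Because nothing is allocated per node, a hostile deeply nested payload is rejected in a single pass instead of first being materialized into a tree. (This sketch does not check well-formedness; a real validator would.)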
michaelmior|1 year ago
I don't think it was intended to say that this crate is "better" than serde_json. I interpreted it to be a measurement of the overhead of adding it as an additional step on top of parsing.
kstenerud|1 year ago
The point of the article is to parse AND validate input AT THE BOUNDARY between the outside world and your program, rather than a bunch of ad-hoc validations at various points after the suspect data has entered the castle walls and has already been (at least partially) processed (thus making the program state harder to reason about). By enforcing your invariants at the border, you ensure that all data entering your system always conforms to your expectations, just like a strong type system ensures that invalid states are not representable. A schema is basically a type system for your raw data.
This concept is also a major element of Domain Driven Design https://en.wikipedia.org/wiki/Domain-driven_design
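A minimal sketch of the idea in Python (the `User` type and its fields are invented for the example): untrusted JSON is converted into a typed value exactly once, at the edge, and everything downstream works only with the already-validated type.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    """Invariants are enforced at construction, so any User that
    exists is valid — invalid states are unrepresentable."""
    name: str
    age: int

    def __post_init__(self):
        if not self.name:
            raise ValueError("name must be non-empty")
        if not (0 <= self.age < 150):
            raise ValueError("age out of range")

def parse_user(raw: str) -> User:
    # The only place untrusted JSON is touched. Past this boundary,
    # the rest of the program sees a User, never a raw dict.
    data = json.loads(raw)
    return User(name=data["name"], age=int(data["age"]))

user = parse_user('{"name": "alice", "age": 30}')
print(user)  # User(name='alice', age=30)
```

Ad-hoc checks scattered through the code cannot make this guarantee, because each call site has to remember to re-validate the dict it was handed.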
ADD-SP|1 year ago
Great to see this article. I totally agree with the view of rejecting invalid cases by designing the right data structures.
Unfortunately, it is hard to achieve in practice, and people often don't even realize this. JSON objects are a good example: humans are inclined to expect that duplicate keys are not allowed in JSON, but they happen.
For this goal, I think Protobuf is a good way to eliminate possibly invalid data in data transportation.