top | item 39398028

(no title)

Eiim | 2 years ago

I ran a quick test on 100 semi-random files I had laying around. Of those, 81 were detected correctly, 6 were detected as the wrong file type, and 12 were detected with an unspecific file type (unknown binary/generic text) when a more specific type existed. In 4 of the unspecific cases, a low-confidence guess was provided, which was wrong in each case. However, almost all of the files which were detected wrong/unspecific are of types not supported by Magika, with one exception of a JSON file containing a lot of JS code as text, which was detected as JS code. For comparison, file 5.45 (the version I happened to have installed) got 83 correct, 6 wrong, and 10 not specific. It detected the weird JSON correctly, but also had its own strange issues, such as detecting a CSV as just "data". The "wrong" here was somewhat skewed by the 4 GLSL shader code files that were in the dataset for some reason, all of which it detected as C code (Magika called them unknown). The other two "wrong" detections were also code formats that it seems it doesn't support. It was also able to output a lot more information about the media files. Not sure what to make of these tests but perhaps they're useful to somebody.

discuss

order

pizzalife|2 years ago

> The "wrong" here was somewhat skewed by the 4 GLSL shader code files that were in the dataset for some reason, all of which it detected as C code

To be fair though, a snippet of GLSL shader code can be perfectly valid C.

Eiim|2 years ago

Indeed, which is why I felt the need to call it out here. I'm not certain if the files on question actually happened to be valid C but whether that's a meaningful mistake regardless is left to the reader to decide.