top | item 45140085

(no title)

xmddmx | 5 months ago

I share the author's sentiment. I hate these things.

True story: trying to reverse engineer macOS Photos.app sqlite database format to extract human-readable location data from an image.

I eventually figured it out, but it was:

A base64-encoded binary plist with one field containing a protobuf, which contained another protobuf, which contained a Unicode string, which contained improperly encoded data (for example, U+2013 EN DASH was stored as the octal escapes \342\200\223, i.e. its three UTF-8 bytes written out as text)
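A minimal Python sketch of peeling the outer layers and repairing the string, under stated assumptions: the plist key `payload` and the sample text are made up, and the two protobuf layers in the middle are skipped because reading them needs either Apple's schema or a raw wire-format walk. The repair step works because `\342\200\223` are the octal values of the bytes `E2 80 93`, which are the UTF-8 encoding of U+2013. Turning each escape back into a byte and decoding as UTF-8 recovers the character.

```python
import base64
import plistlib
import re

# One backslash plus three octal digits in byte range \000-\377, e.g. \342
OCTAL_ESCAPE = re.compile(r"\\([0-3][0-7]{2})")

def unwrap_plist(b64_text: str) -> dict:
    """Outer layers: base64 text -> binary plist bytes -> Python dict."""
    raw = base64.b64decode(b64_text)
    return plistlib.loads(raw)  # detects binary vs XML plist automatically

def repair_octal_escapes(s: str) -> str:
    """Turn literal \\ooo escapes back into the bytes they name, then decode as UTF-8."""
    out = bytearray()
    pos = 0
    for m in OCTAL_ESCAPE.finditer(s):
        out += s[pos:m.start()].encode("utf-8")  # ordinary text in between
        out.append(int(m.group(1), 8))           # e.g. "342" -> 0xE2
        pos = m.end()
    out += s[pos:].encode("utf-8")
    return out.decode("utf-8", errors="replace")

# Synthetic blob shaped like the description, minus the protobuf layers.
# "payload" is a hypothetical key, not Apple's.
inner = r"Mon\342\200\223Fri"  # the string as it was (mis)stored
blob = base64.b64encode(
    plistlib.dumps({"payload": inner}, fmt=plistlib.FMT_BINARY)
).decode("ascii")

fixed = repair_octal_escapes(unwrap_plist(blob)["payload"])
print(fixed == "Mon\u2013Fri")  # True
```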

This could have been a simple JSON string.

tgma|5 months ago

> This could have been a simple JSON string.

There's nothing "simple" about parsing JSON as a serialization format.

Zambyte|5 months ago

Having attempted writing a JSON parser from scratch and a protobuf parser from scratch and only completing one of them, I disagree.
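For a sense of what a from-scratch protobuf reader involves, here is a minimal Python sketch of the wire format. Each field starts with a varint key, `(field_number << 3) | wire_type`, followed by a varint, a fixed 64- or 32-bit value, or a length-prefixed byte string. This sketch handles wire types 0, 1, 2 and 5 only (no deprecated groups). Without a schema it also cannot tell whether a length-delimited field is a string, raw bytes or a nested message, which is the kind of guessing the top comment describes.

```python
from typing import List, Tuple, Union

Value = Union[int, bytes]

def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift  # low 7 bits carry data
        if not b & 0x80:               # high bit clear: last byte
            return result, pos
        shift += 7

def decode_fields(buf: bytes) -> List[Tuple[int, int, Value]]:
    """Split a serialized message into (field_number, wire_type, value) triples."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_no, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit (double, fixed64)
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes, or nested message
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit (float, fixed32)
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError("truncated field")
        fields.append((field_no, wire_type, value))
    return fields

# Field 1 holds another message whose field 1 holds the bytes b"hi".
outer = b"\x0a\x04\x0a\x02hi"
[(_, _, nested)] = decode_fields(outer)
print(decode_fields(nested))  # [(1, 2, b'hi')]
```

Note that nothing in the bytes says the outer field is a message rather than a 4-byte string; the caller has to decide to recurse.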

wvenable|5 months ago

Except that most often you can just look at it and figure it out.

fluoridation|5 months ago

I mean... you can nest-encode stuff in any serialization format. You're describing a problem that is neither intrinsic nor unique to Protobuf; you're just seeing the development org chart manifested in a data structure.

xmddmx|5 months ago

Good points: this wasn't entirely a protobuf-specific issue so much as a (likely hierarchical and historical) set of bad decisions to use it at all.

Using Protobuffers for a few KB of metadata, when the photo library otherwise takes up multiple GB, is just penny-wise and pound-foolish.

Of course, even my preference for a simple JSON string would be problematic: data in a database really should be properly normalized into a separate table with its own fields.

My guess is that protobuffers did play a role here in causing this poor design. I imagine this scenario:

- Photos.app wants to look up location data

- the server returns structured data in a ProtoBuffer

- there's no easy or reasonable way to map a protobuf to database fields (one point of TFA)

- Surrender! Just store the binary blob in SQLite and let the next poor sod deal with it

pjjpo|5 months ago

The JSON version would also have had the wrong encoding: all formats are just framing for data fed in by code written by a human. In Mac's case, em dash will always be an issue because that's just what Mac decided on intentionally.

seanw444|5 months ago

That's horrendous. For some reason I imagined Apple's software would be much cleaner, but I guess that's just the marketing going to my head. Under the hood it's still the same spaghetti.

ninkendo|5 months ago

Yeah, the problem is Apple and all the other contemporary tech companies have engineers bounce around between them all the time, and they take their habits with them.

At some point an org reaches a critical mass of xooglers, and when a new use case comes up no one bothers to ask “how is serialization typically done in Apple frameworks”; they just go with what they know. And then you get protobuf serialization inside a plist. (A plist being the vanilla “normal” serialization format at Apple. Protobuf inside a plist is a sign that somebody was shoehorning what they’re comfortable with into the code.)

05|5 months ago

If that's any consolation, in the current version's schema they are just plain ZLATITUDE FLOAT and ZLONGITUDE FLOAT columns in the ZASSET table.
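With plain columns, getting the location back is a single query. A minimal Python sketch against an in-memory stand-in table. Only `ZASSET`, `ZLATITUDE` and `ZLONGITUDE` come from the comment above. `Z_PK`, the sample rows and the "skip NULLs" filter are assumptions for illustration. The real Photos database has many more columns and may differ between macOS versions.

```python
import sqlite3

# In-memory stand-in, not the real Photos library file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ZASSET (Z_PK INTEGER PRIMARY KEY, ZLATITUDE FLOAT, ZLONGITUDE FLOAT)"
)
conn.executemany(
    "INSERT INTO ZASSET (Z_PK, ZLATITUDE, ZLONGITUDE) VALUES (?, ?, ?)",
    [(1, 37.3349, -122.009), (2, None, None)],  # made-up sample rows
)

def asset_locations(conn: sqlite3.Connection):
    """Return (pk, latitude, longitude) for every asset that has both coordinates."""
    return conn.execute(
        "SELECT Z_PK, ZLATITUDE, ZLONGITUDE FROM ZASSET "
        "WHERE ZLATITUDE IS NOT NULL AND ZLONGITUDE IS NOT NULL "
        "ORDER BY Z_PK"
    ).fetchall()

print(asset_locations(conn))  # [(1, 37.3349, -122.009)]
```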