top | item 41010589

(no title)

MilStdJunkie | 1 year ago

I mentioned it in a separate parent, but null purge is - for the stuff I work with - completely non-negotiable. Nulls seem to break virtually everything, just by existing. Furthermore, old-timey PDFs are chock full of the things, for God knows what reason, and a huge amount of data I work with are old-timey PDF.

discuss

order

toast0|1 year ago

> Furthermore, old-timey PDFs are chock full of the things, for God knows what reason, and a huge amount of data I work with are old-timey PDF.

Probably UCS-2/UTF-16 encoding with ascii data.

MilStdJunkie|1 year ago

It's hard to localize. Early Postscript - PDF software was the wild west, particularly when it comes to the text streams. Something I've noticed is that they're used a LOT in things like randlists (bullet lists), tab leaders, other "space that isn't a space".

I'm reminded of how you have to use `{empty}` character refs in lightweight markup like Asciidoc to "hold your place" in a list, in case you need the first element in the list to be an admonition or block. Like so:

  . {empty}
  +
  WARNING: Don't catch yourself on fire!
  +
  Pour the gasoline.
And the equivalent XML which would be something like

  <procedure>
    <step>
      <warning> Don't catch yourself on fire!</warning>
      <para>Pour the gasoline.</para>
    </step>
  </procedure>
This is one of those rare cases where the XML is more elegant than the lightweight markup. That hack with `{empty}` bugs me.

Anyways, I'm spitballing that these old-timey nulls I'm seeing are being employed in an equivalent way, some sort of internal bespoke workaround to a format restriction.