top | item 35870837

jkwchui | 2 years ago

Hello. Font's author here. You and Jeff are correct in guessing this is (ab)using ligatures maximally :) To satisfy your curiosity, we can go deeper.

----

Conceptually it is simple: 1. assign a default (most likely) sound to each character; 2. loop through contexts, extracting words (char-combos) where the sound differs from the default ("alt-words"); 3. create SVGs + font paths (a fallback for incompatible systems) for every char and every alt-word; 4. assign a ligature that substitutes each char-sequence forming an alt-word (e.g., "when 乾隆 appears adjacently, replace it with `uniF1234`", the codepoint for the alt-word 乾隆).
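A minimal sketch of steps 1, 2 and 4 above, under stated assumptions: the readings are illustrative Jyutping-style strings rather than entries from the font's database, the Private Use Area start of `0xE000` is arbitrary, and the names `find_alt_words`, `assign_pua` and `ligature_rules` are hypothetical. The output lines follow the general shape of OpenType feature-file ligature substitutions; this is not the author's actual build pipeline.

```python
from typing import Dict, List, Tuple

# Step 1 (assumed data): one default reading per character.
DEFAULT_READING: Dict[str, str] = {"乾": "gon1", "隆": "lung4", "淨": "zing6"}

# Words seen in context, with the reading of each character (assumed data).
CORPUS_WORDS: List[Tuple[str, List[str]]] = [
    ("乾隆", ["kin4", "lung4"]),   # 乾 differs from its default here
    ("乾淨", ["gon1", "zing6"]),   # every character keeps its default
]


def find_alt_words(words, defaults):
    """Step 2: keep only words where some character departs from its default."""
    return [
        (word, readings)
        for word, readings in words
        if any(defaults[ch] != sound for ch, sound in zip(word, readings))
    ]


def glyph_name(ch: str) -> str:
    """Conventional 'uniXXXX' glyph name for a BMP character."""
    return f"uni{ord(ch):04X}"


def assign_pua(alt_words, start: int = 0xE000) -> Dict[str, str]:
    """Give each alt-word its own glyph name at a Private Use Area codepoint."""
    return {word: f"uni{start + i:04X}" for i, (word, _) in enumerate(alt_words)}


def ligature_rules(pua: Dict[str, str]) -> List[str]:
    """Step 4: one substitution per alt-word, longest sequences first."""
    rules = []
    for word in sorted(pua, key=len, reverse=True):
        sequence = " ".join(glyph_name(ch) for ch in word)
        rules.append(f"sub {sequence} by {pua[word]};")
    return rules


alt_words = find_alt_words(CORPUS_WORDS, DEFAULT_READING)
rules = ligature_rules(assign_pua(alt_words))
for rule in rules:
    print(rule)
```

Only 乾隆 produces a rule here; 乾淨 is already covered by the per-character defaults, which is what keeps the number of ligatures manageable.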

It is not perfect, but I didn't expect this to work so well, and was stunned when the testers reported high accuracy. I had always believed that bespoke computation with word segmentation (backed by a frequency-annotated library of some 1M entries) and a large data bank (100k+ words) was necessary.

----

Practically it was a horrific, tedious, mind-numbing, gawd-awful set of "why doesn't this work": 1. SVG automation that works for 10^3 breaks with 10^5; 2. what worked for Latin breaks for Unicode; 3. what worked for Unicode breaks for PUA; 4. what worked for monochrome breaks for color; 5. what worked for single glyphs breaks for ligatures; 6. what?! The assignments in the database are wrong?? 7. [...]

As I was trying to coerce the system into doing what it wasn't designed to do, many of these breakages were undocumented and pretty mysterious to solve, and some steps I just gritted through manually. (And each of the 15k+ glyphs got gritted through about five times.)

It does look pretty elegant at the end ;)

ackfoobar|2 years ago

In the FAQ you mentioned

> Unfortunately, without being able to do proper word segmentation, this will remain a limitation.

Can the user manually add a zero width space to help?

jkwchui|2 years ago

Technically yes, but the general public probably doesn't have a concept of zero-width space.

(For everyone else wondering what ackfoobar is proposing: take the phrase 香港地少人多 (if you don't read Chinese, just treat the characters as shapes). Properly segmented, it is 香港.地少.人多. The font treats this incorrectly: because "香港地" is a commonly used fragment in which 地 has a special sound, it parses the phrase as 香港地.少.人多 and gives a mistaken sound for 地.

Ackfoobar is absolutely correct that we can coerce the correct reading by going 香港[ ]地少人多 --- where the [ ] is an invisible spacer. My contention is that most users don't know how to do that in their favorite word processor.

Someone is probably thinking: could you add "香港地少" as a fragment? A purist would say it's not pretty, but I'm a pragmatist, so I did do a lot of this patching. Whether to do it or not relies on some acumen as a native speaker, and there were hundreds of these decisions made. This language knowledge would be necessary if someone were to do Mandarin (or Thai, or ...))
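The mis-parse, the zero-width-space workaround and the "add a longer fragment" patch can all be mimicked with a toy longest-match reader. This is only a sketch under assumptions: the function `read` and both tables are hypothetical, the readings are illustrative Jyutping-style strings, and it assumes the zero-width space (U+200B) interrupts the character sequence so the fragment no longer matches. A real text shaper applies ligature lookups differently in detail.

```python
from typing import Dict, List

ZWSP = "\u200b"  # zero-width space, the invisible spacer

# Assumed per-character default readings.
DEFAULTS: Dict[str, str] = {
    "香": "hoeng1", "港": "gong2", "地": "dei6",
    "少": "siu2", "人": "jan4", "多": "do1",
}

# Assumed fragment table: in 香港地 the last character takes a special sound.
FRAGMENTS: Dict[str, List[str]] = {"香港地": ["hoeng1", "gong2", "dei2"]}


def read(text: str, fragments: Dict[str, List[str]] = FRAGMENTS) -> List[str]:
    """Greedy longest-match: prefer the longest known fragment at each position."""
    longest = max(map(len, fragments), default=0)
    out: List[str] = []
    i = 0
    while i < len(text):
        if text[i] == ZWSP:          # the spacer has no sound of its own
            i += 1
            continue
        for n in range(min(longest, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in fragments:   # fragment found: use its readings
                out.extend(fragments[chunk])
                i += n
                break
        else:                        # no fragment starts here: use the default
            out.append(DEFAULTS[text[i]])
            i += 1
    return out


plain = read("香港地少人多")             # 香港地 matches, so 地 gets the special sound
spaced = read("香港" + ZWSP + "地少人多")  # the spacer breaks the match

# The pragmatic patch: a longer fragment that restores the default sound for 地.
patched_table = dict(FRAGMENTS)
patched_table["香港地少"] = ["hoeng1", "gong2", "dei6", "siu2"]
patched = read("香港地少人多", patched_table)

print(plain[2], spaced[2], patched[2])
```

The patch works because the longer fragment wins the match, which is also why each one has to be chosen by hand: nothing in the table says when a longer context should override a shorter one.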

jfk13|2 years ago

This is an awesome piece of work - congratulations!

I notice you're using OpenType-SVG here; have you investigated whether it would be possible to implement this using COLRv1 (which would potentially result in a lighter-weight font, I suspect, and eventually wider support)? Or are there technical limitations in COLRv1 that make it impossible?

jkwchui|2 years ago

Color fonts really haven't converged on a standard, and their adoption is slow. OpenType-SVG was accepted 10 years ago, and it was implemented in FreeType only one year ago; it hasn't even trickled down to most Linux distros (nor is it usable on Windows). I don't see COLRv1 in Win/Mac/Linux until 2026 at the earliest.

But I did try to build it as COLRv1 (as well as COLR/CPAL). The only tools that build COLRv1 right now are the ones from the Google Fonts team; I remember them stalling for hours before reporting completion, yet the output was broken (I can't remember how it was broken).

I personally would love to see a COLR/CPAL version, and have some ideas on how that could happen. But I probably should be working on some revenue-generating product instead ;)

creamyhorror|2 years ago

That is amazing work. You've really plumbed the depths of what's possible with font technology, kudos.

jkwchui|2 years ago

Thank you. The one who is really amazing is Simon Cozens, who wrote a set of articles on fonts and global scripts: https://simoncozens.github.io/fonts-and-layout/

The history of digital fonts added a great deal of complexity to font formats, and without him writing such a concise yet comprehensive guide, I would have been stuck for even longer.