Shapecatcher: Draw the Unicode character you want

[+] apendleton|3 years ago|reply

Seems to be matching against one particular typeface, and is very sensitive to how similar what you write is to that typeface. The first thing I tried was a lowercase "a," which, in my handwriting, is a one-story "a," and it had no idea what it was. Likewise it guessed poorly at my lowercase "f" because the top of my "f" curves back down, and its version doesn't, etc. Seems like it would benefit from a dataset with more variations on how various characters are written in practice.

Neat concept though!

[+] croon|3 years ago|reply

I tried multiple times to match an ampersand, and thought I did a fairly accurate job.

I eventually managed to get it as a third match, but this seems to be what it matches against [0], which isn't the ampersand I'm used to (nor the one you get in a google image search).

[0] https://shapecatcher.com/unicode_img/38.png

[+] Symbiote|3 years ago|reply

It's now useful for arrows, mathematics etc.

[+] crazygringo|3 years ago|reply

Only halfway related, but a long time ago I had an idea for a character encoding that was stroke based.

In other words, based on minimal stroke primitives (line, arc, circle, etc.) that were placed not with any exact coordinates, but simply in relation to each other conceptually and with crude size/position categories. E.g. "downward stroke, top to bottom" is a capital "I" while "downward stroke, middle to bottom, dot closely above first stroke" would be a lowercase "i". And then for matching forms with different meanings, there would be a final "family" selector, e.g. to distinguish an em-dash from the Chinese character for "one", or an en-dash ("punctuation" family) from a minus sign ("math symbol" family).

And then a suitably compressed bit encoding for the instructions. So in the end, something like "I" might just be 3-4 bits long, while a complex Chinese glyph might be 60 bits.

But the main feature being that a font renderer could always draw a primitive version of any glyph, even if you don't have it in a single font anywhere, because the character code itself encodes it. And then character codes wouldn't be just something totally arbitrary invented by Unicode, but inherently meaningful, and anyone could be free to invent any character they wanted, that would always be drawn by any software, no Unicode gatekeepers needed.

Obviously it's not terribly practical, for a whole host of reasons. But I still sometimes think about how elegant it would be to have a "geometric" self-describing character encoding, and to get away from all of the political decisions around language scripts and where they get put in Unicode and in which version.
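To make the idea concrete, here is a minimal sketch of what such a stroke-based code might look like. Everything in it is hypothetical: the `Shape`, `Zone` and `Family` categories, the fixed 2-bit field widths, and the `encode` function are assumptions made up for illustration, and the fixed-width packing is simpler (and longer) than the compressed encoding described above.

```python
from dataclasses import dataclass
from enum import IntEnum


# Hypothetical stroke primitives and coarse position categories.
class Shape(IntEnum):
    LINE_DOWN = 0
    LINE_ACROSS = 1
    ARC = 2
    DOT = 3


class Zone(IntEnum):
    TOP = 0
    MIDDLE = 1
    BOTTOM = 2
    ABOVE = 3


# Hypothetical "family" selector that separates look-alike forms.
class Family(IntEnum):
    LETTER = 0
    PUNCTUATION = 1
    MATH = 2
    CJK = 3


@dataclass(frozen=True)
class Stroke:
    shape: Shape
    start: Zone
    end: Zone


def encode(strokes, family):
    """Pack each stroke into 6 bits (shape, start, end), then append a 2-bit family."""
    bits = ""
    for s in strokes:
        bits += format(int(s.shape), "02b")
        bits += format(int(s.start), "02b")
        bits += format(int(s.end), "02b")
    return bits + format(int(family), "02b")


capital_i = encode([Stroke(Shape.LINE_DOWN, Zone.TOP, Zone.BOTTOM)], Family.LETTER)
lowercase_i = encode(
    [
        Stroke(Shape.LINE_DOWN, Zone.MIDDLE, Zone.BOTTOM),
        Stroke(Shape.DOT, Zone.ABOVE, Zone.ABOVE),
    ],
    Family.LETTER,
)

# Same single stroke, different family: en-dash versus minus sign.
dash_stroke = [Stroke(Shape.LINE_ACROSS, Zone.MIDDLE, Zone.MIDDLE)]
en_dash = encode(dash_stroke, Family.PUNCTUATION)
minus = encode(dash_stroke, Family.MATH)

print(capital_i, lowercase_i, en_dash, minus)
```

In this toy version a capital "I" takes 8 bits and a lowercase "i" takes 14; a variable-length code would be needed to get anywhere near the 3-4 bits mentioned above.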

[+] bradrn|3 years ago|reply

Problem is, how do you define these representations in the first place? For one thing, some characters have multiple different forms… like a/ɑ, or g/ɡ [0]. Then you have characters which differ only minutely — like Thai ช/ซ, or Ethiopic ሀ/ህ, or Cherokee Ꭺ/Ꭿ. And, worse, those characters can also look entirely different between fonts (see e.g. [1] for Thai). So by the time you’ve finished working through all those choices, and created a format which can distinguish them all, you’ve effectively created yet another font — just in a very lossy vectorised format.

Or if you go the opposite direction, and expect each individual character to make its own individual choices in this regard — well, that basically has the same problems as PDFs: it may look good, but it’s totally impossible to process programmatically, since it overspecifies the visual details at the expense of semantics. And although that might be fine and even desirable for certain usecases, it does limit the places where this format could be used.

That being said, I can certainly see possibilities and places where this could be useful to me. Perhaps I’ll have a go at implementing this someday.

[0] In case those look the same in your font, here they are again:

    a/ɑ  g/ɡ

And in case those look the same too, then… well, have a look at the codepoints in the Unicode reference charts, I guess!

[1] https://wrdingham.co.uk/thai/tellthai_preface.htm
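For anyone whose font renders the pairs in [0] identically, a short Python sketch can print the code points instead. It assumes the second character of each pair is U+0251 and U+0261 respectively; the helper name `codepoint` is made up for this example.

```python
# Pairs from footnote [0]: the ordinary letter and its look-alike form.
# Assumption: the look-alikes are U+0251 and U+0261.
pairs = [("a", "\u0251"), ("g", "\u0261")]


def codepoint(ch):
    """Format a single character as U+XXXX."""
    return f"U+{ord(ch):04X}"


for plain, variant in pairs:
    print(f"{plain} {codepoint(plain)}  vs  {variant} {codepoint(variant)}")
```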

[+] lifthrasiir|3 years ago|reply

You will appreciate RFC 5242 [1].

[1] https://datatracker.ietf.org/doc/html/rfc5242

[+] bentcorner|3 years ago|reply

It's an interesting idea, my first thoughts are how do you make it machine readable. Like if you're writing a browser how do you translate something that "looks" like google.com and know you need to go to google.com and not googIe.com?

Maybe you don't bother (i.e., don't try to parse bytes in this encoding as plain text) but that has a bunch of consequences too.
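As a rough illustration of the look-alike problem: with an ordinary code-point encoding the two host names compare as different strings even though they can render almost identically. The helper `differing_positions` is a made-up name for this sketch, not part of any browser or library.

```python
real = "google.com"
fake = "googIe.com"  # capital I (U+0049) where the real name has lowercase l (U+006C)


def differing_positions(a, b):
    """Return (index, char_in_a, char_in_b) for every position where the strings differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(real == fake)
for i, x, y in differing_positions(real, fake):
    print(f"index {i}: U+{ord(x):04X} vs U+{ord(y):04X}")
```

A shape-based encoding would have to decide whether these two strokes are "the same character" or not, which is exactly the distinction the code points make explicit here.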

[+] keybored|3 years ago|reply

Sounds like a glyph encoding.

[+] mapierce2|3 years ago|reply

Reminds me of Detexify, a similar tool for learning the LaTeX codes for a symbol.

http://detexify.kirelabs.org/classify.html

[+] hnlmorg|3 years ago|reply

I’ve been using this site sporadically for nearly a decade. It’s been a handy resource.

[+] wcoenen|3 years ago|reply

This reminds me of qhanzi, which I found to be super useful for studying Chinese characters.

https://www.qhanzi.com/

[+] dontlaugh|3 years ago|reply

Pleco does this well too.

[+] yellow_lead|3 years ago|reply

I can't get it to work for any Chinese characters.

[+] roland35|3 years ago|reply

Hmm, it doesn't seem to recognize "Egyptian Hieroglyph D053", no matter how accurately I draw it!

[+] marginalia_nu|3 years ago|reply

Drew a h-bar. It found Cyrillic tshe (ћ), as well as h with a stroke (ħ), even Planck's constant (ℎ),

... but not ℏ.
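For reference, the four look-alikes are distinct code points, and Unicode compatibility normalization (NFKC) already treats ℏ as a font variant of ħ. A small sketch using Python's standard `unicodedata` module shows this; the list name `candidates` is just for illustration.

```python
import unicodedata

# Cyrillic tshe, h with stroke, Planck constant, Planck constant over two pi.
candidates = ["\u045b", "\u0127", "\u210e", "\u210f"]

for ch in candidates:
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> NFKC U+{ord(folded):04X}")
```

So a search tool that matched on the NFKC form would, in principle, be able to surface ℏ whenever it surfaces ħ.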

[+] rippercushions|3 years ago|reply

This doesn't appear to even try to match to Chinese characters? I drew the ones for convex and concave (凸 and 凹, respectively), but only got back dominoes and APL symbols as matches. Drawing a box returns the Japanese katakana ロ, but not the Chinese hanzi 口.

[+] nyx|3 years ago|reply

Yeah, the text to the right of the input is explicit about this:

> Currently, there are 11817 unicode character glyphs in the database. Japanese, Korean and Chinese characters are currently not supported.

[+] plank|3 years ago|reply

Tried it with β and with ξ but did not get a match. (Disclaimer, I did eventually get the β working, but first versions only gave other results such as P, ρ or Բ).

[+] BugsJustFindMe|3 years ago|reply

I cannot for the life of me get it to recognize a snowman.

[+] Isognoviastoma|3 years ago|reply

After failing too many times looked up how it is supposed to look <https://img.shapecatcher.com/svg/9731.svg>, then got it on second approach <https://cdn.imgpaste.net/2023/01/14/K6IkIp.png>.

[+] scatters|3 years ago|reply

Try an 8 with the cross-bar erased, and two eyes - nothing else. That gives me U+26C4. I can't get U+2603, though.
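The two snowmen are easy to mix up, and the site's image URLs use decimal numbers rather than the U+ hex form. A quick sketch with Python's standard `unicodedata` module lines them up; `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(code_point):
    """Show a code point in U+ hex form, in decimal, and with its Unicode name."""
    name = unicodedata.name(chr(code_point))
    return f"U+{code_point:04X} = decimal {code_point}: {name}"


for cp in (0x2603, 0x26C4):
    print(describe(cp))
```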

[+] fuzzythinker|3 years ago|reply

It will be great if it can find all variants of:

   ‾⎻⎼⎽ lines

   _ light lines

| bar, ⎸left bar, right bar⎹

different angles of /, \

etc...

I'm using these for documentation and monodraw is only useful up to a point and references to these are scattered in different pages in no relation for drawing purposes.

I tried drawing a \, but shapecatcher only shows "\" and only if it's semi-close to 45°.

Edit: Thanks @mapierce2, Detexify seems to work better for this purpose, but the results seem to be images, not text.

[+] psidebot|3 years ago|reply

I looked up the snowman in several fonts and copied as best I could. Utter failure. If it can't find the snowman....

[+] maxbond|3 years ago|reply

Snowman didn't work for me, but the other characters I tried did. I had some trouble with "therefore" because there were a lot of virtually identical symbols, but I can't blame it for that.

It mentions not all characters are in the database.

ETA: it's got a pretty weird snowman. http://shapecatcher.com/unicode/info/9731

By imitating this rendition I was able to bring it up.

[+] chrisux|3 years ago|reply

Awesome, I got it to work with all the characters I tried.

It would be nice for it to also give alt codes for it in the output

like é = [alt 1 3 0]

[+] lokedhs|3 years ago|reply

That's just the decimal representation of the code point, isn't it?
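A quick check in Python, assuming "alt 1 3 0" means the classic Windows Alt code typed without a leading zero, which maps through the legacy code page 437 rather than through Unicode:

```python
ch = "\u00e9"  # é

unicode_decimal = ord(ch)            # decimal value of the Unicode code point
cp437_byte = ch.encode("cp437")[0]   # position of the character in code page 437
cp1252_byte = ch.encode("cp1252")[0] # position of the character in Windows-1252

print(unicode_decimal, cp437_byte, cp1252_byte)
```

Under that assumption the two numbers differ: the code point is 233 in decimal, while 130 is the character's position in code page 437, so an "alt code" column would need its own lookup rather than a plain decimal conversion.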

[+] SecurityNoob|3 years ago|reply

I am rather impressed with this and will definitely make good use of it.

[+] quechimba|3 years ago|reply

Funny, I just found this on Google yesterday when looking for some glyphs.

[+] singularity2001|3 years ago|reply

are Egyptian hieroglyphs not included? because it didn't recognize even one of my impeccable drawings

[+] wruza|3 years ago|reply

U+23FB ⏻ POWER SYMBOL

Doesn’t recognize.

[+] 6451937099|3 years ago|reply

[deleted]

[+] o_____________o|3 years ago|reply

dammit, we all know what most of you tried first.

[+] thom|3 years ago|reply

Glagolitic capital letter dobro?

[+] keybored|3 years ago|reply

⑅

37 comments