item 5150107

Shapecatcher: Draw the Unicode character you want

271 points | barredo | 13 years ago | shapecatcher.com

107 comments

Hupo | 13 years ago
Couldn't get it to find the unicode snowman[1]: http://i.imgur.com/gaIY9Gd.png (my drawing skills are awesome, no?)

But that aside, this looks like a neat idea. Not something I have any immediate use for myself, but could certainly be useful in some situations.

[1] http://www.fileformat.info/info/unicode/char/2603/index.htm

fredley | 13 years ago
Can someone tell me why 'Cat with a wry smile' is in Unicode? Presumably at some point someone thought that it would be useful to somebody else, hence its inclusion. It would be very interesting to hear the back-story behind such seemingly useless glyphs.
wmil | 13 years ago
I can't get it to recognize 'Pile of Poo' (U+1F4A9)

Major bug.

lucb1e | 13 years ago
If a PC can't find it, perhaps drawing things is going to be the new captcha!
flux_w42 | 13 years ago
Hmm, I got the 0x26c4: Snowman without snow: ⛄
quirm | 13 years ago
You have to draw the snowflakes
seanp2k2 | 13 years ago
For the curious, this explains a lot of the science in a hands-on, approachable way: http://stackoverflow.com/questions/10168686/algorithm-improv...

EDIT: As this is a deep topic, there are also books if that's more your style: http://www.amazon.com/dp/0123725380/?tag=stackoverfl08-20

or maybe you like Wikipedia (Ol' Trusty): http://en.wikipedia.org/wiki/Feature_detection_(computer_vis...

rurounijones | 13 years ago
Damn, that looks good; it worked for me for ② and for u with an umlaut (ü).

Get the Japanese support in there and it will be amazing. What about using MS Mincho or MS Gothic for that? (It is free as in beer, but is the licensing off?)

uvdiv | 13 years ago
My simple, clean ampersand (&) became a paperclip (0x1f4ce), a "fried shrimp" (0x1f364), several species of geometric triangle, and dozens of other silly, silly glyphs.

http://i.imgur.com/NLkl75J.png

the unicode block containing "fried shrimp" 0x1f364 -- why does this exist???

http://www.unicode.org/charts/PDF/U1F300.pdf

Latest Abstruse Goose comic sums up my emotional response:

http://abstrusegoose.com/496

nwh | 13 years ago
Who doesn't love 🍤?

There's a lot of other strange Unicode too. There are things like '⁢' (U+2062 INVISIBLE TIMES), ⓞⓓⓓ ©ⓗⓐⓡⓐ©ⓣⓔⓡ ⓢⓔⓣⓢ and sɹǝʇɔɐɹɐɥɔ uʍop ǝpısdn.

All of these can be used to bypass filters and generally cause browser-crashing havoc. For example, this address looks like Google, but it really links to Hacker News.

http://news.ycombinator.com/?/moc.elgoog//:ptth
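The trick relies on invisible Unicode "format" characters (general category Cf), such as U+2062 INVISIBLE TIMES and the bidirectional control U+202E RIGHT-TO-LEFT OVERRIDE, which makes the text after it display right-to-left. A minimal Python sketch of one possible filter-side defence, assuming you simply want to flag or drop every Cf character before matching keywords or URLs, needs only the standard `unicodedata` module. The `spoofed` string is a hypothetical reconstruction, not the exact link above.

```python
import unicodedata


def find_format_chars(text: str) -> list:
    """Return (index, code point, name) for every format (Cf) character in text."""
    hits = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits


def strip_format_chars(text: str) -> str:
    """Drop every Cf character, e.g. before running a keyword or URL filter."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# Hypothetical spoofed link: U+202E makes the reversed tail display as "http://google.com/".
spoofed = "http://news.ycombinator.com/?\u202e/moc.elgoog//:ptth"
print(find_format_chars(spoofed))   # [(29, 'U+202E', 'RIGHT-TO-LEFT OVERRIDE')]
print(strip_format_chars(spoofed))  # http://news.ycombinator.com/?/moc.elgoog//:ptth
```

Stripping every Cf character is blunt: it also removes legitimate ones such as ZERO WIDTH JOINER (used in some emoji sequences), so flagging may be the safer default.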

uvdiv | 13 years ago
Experiments with some common glyphs.

(!) -- 8th suggestion, after three apparently identical "upside-down 'i'" (including an "upside-down capital 'I' with a dot underneath")

(@) -- 1st suggestion

(#) -- not suggested. Top suggestion is "capital 'H' with stroke"

($) -- not suggested, although the 14th is an indistinguishable glyph "Canadian syllabics carrier sh" 0x165a, a phonetic symbol for representing a Canadian aboriginal language.

(%) -- 1st suggestion

(^) -- 10th suggestion; to be fair, this one is impossible

(&) -- not suggested (fried shrimp lol)

(*) -- 3rd suggestion

(?) -- 1st suggestion

(∫) -- 1st suggestion

(∂) -- 4th suggestion (1st one is same thing in boldface)

----------

This suggests a really easy way to greatly improve the results: weight them by a prior probability (i.e., frequency of occurrence in a letter count). The OCR itself seems pretty good. Common math symbols are more likely than silly shrimps. Glyphs from common languages are more probable than esoteric ones, and real languages more than constructed ones. Plain glyphs are more common than variants (bold/italic) -- and these should be grouped together anyway. fried shrimp horseshoe.
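The suggestion amounts to Bayes' rule: rank each glyph by P(glyph | drawing) ∝ P(drawing | glyph) · P(glyph). A minimal Python sketch, assuming the shape matcher already returns a positive similarity score per candidate that can stand in for the likelihood, and that a frequency table (the prior) has been counted from some text corpus. The `rerank` function and all numbers below are hypothetical, not anything Shapecatcher is known to do.

```python
import math


def rerank(candidates, prior, default_prior=1e-6):
    """Re-rank shape-matcher output by likelihood x prior.

    candidates: list of (glyph, score) pairs; score is assumed to be a positive
                similarity that can stand in for P(drawing | glyph).
    prior:      dict glyph -> relative frequency, e.g. counted from a text corpus.
    Glyphs missing from the prior get a small default probability.
    """
    scored = []
    for glyph, score in candidates:
        p = prior.get(glyph, default_prior)
        # Work in log space so tiny priors don't underflow.
        scored.append((math.log(score) + math.log(p), glyph))
    scored.sort(reverse=True)
    return [glyph for _, glyph in scored]


# Made-up numbers: the matcher slightly prefers the shrimp,
# but '&' is far more common in real text.
candidates = [("🍤", 0.40), ("📎", 0.35), ("&", 0.25)]
prior = {"&": 1e-3, "📎": 1e-5}
print(rerank(candidates, prior))  # ['&', '📎', '🍤']
```

Grouping bold/italic variants could be done before re-ranking, for instance by merging candidates whose `unicodedata.normalize("NFKC", ...)` forms coincide.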

masklinn | 13 years ago
> the unicode block containing "fried shrimp" 0x1f364 -- why does this exist???

A SoftBank-based Emoji set was imported (in an extended form) into Unicode 6.0: http://en.wikipedia.org/wiki/Emoji#Emoji_in_the_Unicode_stan...

Because Emoji originated in Japanese mobile messaging, a number of them relate to Japanese culture: foodstuffs (not just fried shrimp but rice balls, dango, oden, fugu), cultural practices (kadomatsu, hinamatsuri, koinobori, Fūrin wind chimes) and other things that may be present in other cultures but usually not as prominently (e.g. Unicode Love Hotel).

ChrisNorstrom | 13 years ago
Very nice and amazingly accurate. But am I using your website the right way? When I look through the characters in the search results and see one that isn't rendering on my computer (just a block instead of the symbol), I've been clicking "bad" under "rate this suggestion", thinking that it tallies up the total good/bad votes for a character to measure how likely people are to have that character installed and working on their computers.

However, I now have a feeling that's not what that feature is for.

kayge | 13 years ago
My guess was that the Good/Bad rating is to help with some sort of machine learning going on in the background. E.g. the way I draw my ampersand might be slightly different than yours, so if either or both of us see the result we were hoping for (&), that should get our Good rating. If it returns an (8), it may or may not deserve a Bad. If something way off appears as a top result (^), that would be pretty Bad.
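This guess is unconfirmed, but one simple way such feedback could be used is a nearest-neighbour recognizer whose stored examples grow every time a user marks a suggestion Good. A minimal Python sketch, assuming drawings have already been turned into fixed-length feature vectors (say, a flattened low-resolution bitmap). The `FeedbackRecognizer` class is hypothetical and says nothing about how Shapecatcher actually works.

```python
import math
from collections import defaultdict


class FeedbackRecognizer:
    """Hypothetical nearest-neighbour recognizer that learns from Good ratings."""

    def __init__(self):
        self.templates = defaultdict(list)  # glyph -> list of feature vectors

    def add_template(self, glyph, features):
        self.templates[glyph].append(list(features))

    def rank(self, features, k=5):
        """Return up to k glyphs ordered by distance to their closest stored example."""
        best = {
            glyph: min(math.dist(features, v) for v in vectors)
            for glyph, vectors in self.templates.items()
        }
        return sorted(best, key=best.get)[:k]

    def rate(self, features, glyph, good):
        """A Good rating stores this drawing as an extra example of the glyph.

        Bad ratings are ignored in this sketch.
        """
        if good:
            self.add_template(glyph, features)


r = FeedbackRecognizer()
r.add_template("&", [1.0, 0.0, 0.0])
r.add_template("8", [0.0, 1.0, 0.0])
drawing = [0.1, 0.9, 0.2]  # someone's ampersand that happens to look like an 8
print(r.rank(drawing))  # ['8', '&']
r.rate(drawing, "&", good=True)
print(r.rank(drawing))  # ['&', '8']
```

This captures the "my ampersand differs from yours" point: after one Good rating, that drawing style becomes its own example of "&".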
adnam | 13 years ago
It recognized my drawing of a cactus! http://i.imgur.com/pDgIOPk.png
seanp2k2 | 13 years ago
Ooh my word. I drew a Skrillex and it recognized it perfectly; some Chuck Tays because he's a chill bro, school because his music appeals to people in school (12-24), and the Bengali vowel sound because he plays that in his "drops".

EDIT: forgot to link screenshot: http://imgur.com/LyaXy4h

seanp2k2 | 13 years ago
This is seriously the funniest thing I've seen on HN in weeks. Made me actually laugh.
MojoJolo | 13 years ago
Damn. I think it's not intended as a cactus. Hahaha.
WA | 13 years ago
I wonder how many people try to paint cacti as a quick test for tools like that ;)
darrenkopp | 13 years ago
I must be pretty terrible at drawing the snowman, because I couldn't get it to match that.
adnam | 13 years ago
My comment, with >36 votes, was flagged to oblivion. What a bunch of puritanical, cactus-haters you are, Hacker News.
jQueryIsAwesome | 13 years ago
It's not every day that the top comment on reddit.com/r/funny is also the top comment on Hacker News.
speeder | 13 years ago
I know this is supposed to be funny, but I had to flag it.

I mean, I did find it funny, but I did not enjoy seeing that at work (especially because I work on kids' stuff).

symmetricsaurus | 13 years ago
Tried to draw an alpha but it didn't get it.

So far I haven't tried Shapecatcher a lot but I think that http://detexify.kirelabs.org/classify.html works much better. Detexify is of course only for LaTeX symbols and doesn't do unicode.

To be useful Shapecatcher needs to become better at recognition.

cygwin98 | 13 years ago
Nice job. Here's what I got: ᗧ ᗣ ᗤ ᗢ

could be awesome for a character-based PacMan impl.

drucken | 13 years ago
"If you can't find Chinese, Japanese or Korean glyphs, it is because I have yet to find a good free CJK font to use."

Are there not some CJK (or otherwise) fonts from, for example, Linux distributions that could have been used?

Or perhaps the author could clarify what is meant by "good", given that it justifies excluding such a large and useful character space for this type of application?

smcl | 13 years ago
Sounds cool, but sadly it hasn't worked for the letters I often need but don't have easy access to (Czech characters like ř, ď and š). Perhaps that's because the element that makes them distinct from the Latin letters (the "háček") is so tiny.

Edit: I just drew the ř larger and it recognised it correctly. Cool :)

k_bx | 13 years ago
(Surprisingly, no one has said this yet) Too bad it doesn't generate links to drawing results.
jessaustin | 13 years ago
Hosting uploaded images exposes sites to a great deal of annoyance.
quirm | 13 years ago
I was working on this last weekend - sorry, I didn't know someone would post this again to ycombinator ;)
fredley | 13 years ago
Trying to draw U+1F4A9 (Pile of Poo). After several attempts, no luck.

I have learnt that Unicode contains even more weirdness than I thought before though, including 'Alchemical symbol for borax-3' (U+1f744), and 'doughnut' (U+1f369).

speeder | 13 years ago
Nice idea, too bad that I tried to draw several variations of PI, and it showed me several interesting characters, but never a PI.

Seriously, it even showed some very PI-like things, but not PI itself. This is a downer.

the_gipsy | 13 years ago
Idea: instead of matching the shape of what the user has drawn raster-wise, let the user draw an SVG-like path, and try to identify the letter by the trace.
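Trace-based matching is a well-studied alternative. The sketch below is a simplified variant of the $1 Unistroke Recognizer idea, without its rotation search: resample each stroke to a fixed number of evenly spaced points, remove position and size, then compare point by point. It is a minimal Python sketch that handles single strokes only; the template shapes are made up for illustration.

```python
import math


def path_length(pts):
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))


def resample(pts, n=32):
    """Resample a stroke to n points spaced evenly along its length."""
    interval = path_length(pts) / (n - 1)
    pts = list(pts)
    out = [pts[0]]
    acc = 0.0  # distance travelled since the last emitted point
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            (x0, y0), (x1, y1) = pts[i - 1], pts[i]
            q = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            out.append(q)
            pts.insert(i, q)  # keep measuring from the new point
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:  # rounding can leave the final point missing
        out.append(pts[-1])
    return out[:n]


def normalize(pts):
    """Move the centroid to the origin and scale uniformly to unit size."""
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    size = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return [((x - cx) / size, (y - cy) / size) for x, y in pts]


def stroke_distance(a, b, n=32):
    """Mean point-to-point distance between two resampled, normalized strokes."""
    pa = normalize(resample(a, n))
    pb = normalize(resample(b, n))
    return sum(math.dist(p, q) for p, q in zip(pa, pb)) / n


def classify(stroke, templates):
    """Return the template name whose trace is closest to the stroke."""
    return min(templates, key=lambda name: stroke_distance(stroke, templates[name]))


templates = {
    "L": [(0, 0), (0, 10), (6, 10)],
    "-": [(0, 0), (10, 0)],
    "V": [(0, 0), (5, 10), (10, 0)],
}
# The same "L" drawn twice as large and shifted still matches.
print(classify([(1, 1), (1, 21), (13, 21)], templates))  # L
```

Because points are compared in drawing order, a shape traced in the opposite direction will not match its template; a real system would need templates for both directions or a direction-insensitive comparison.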
seanp2k2 | 13 years ago
Agreed that a pen tool or some type of editor would be kind of nice, but for what he's going for (proving out an idea), this is still pretty fun. I know a little of the science behind it, but it'd be great to read through some well-commented source code. He did link to the thesis on this, however: http://shapecatcher.com/B_Milde%20-%20On%20The%20Security%20...
a_p | 13 years ago
Does anyone know if the mirror image of this character: “ (U+201C) exists? I'm looking for a character that is the mirror image of the left double quotation mark, where the base is on the bottom and the character tapers from bottom right to top left. I don't know if any languages use that character.