Can someone tell me why 'Cat with a wry smile' is in Unicode? Presumably at some point someone thought that it would be useful to somebody else, hence its inclusion. It would be very interesting to hear the back-story behind such seemingly useless glyphs.
Damn, that looks good. It worked for me for ② and u with umlauts.
Get the Japanese support in there and it will be amazing. What about using MS Mincho or MS Gothic for that? (It is free as in beer, but is the licensing off?)
My simple, clean ampersand (&) became a paperclip (0x1f4ce), a "fried shrimp" (0x1f364), several species of geometric triangle, and dozens of other silly, silly glyphs.
(!) -- 8th suggestion, after three apparently identical "upside-down 'i'" (including an "upside-down capital 'I' with a dot underneath")
(@) -- 1st suggestion
(#) -- not suggested. Top suggestion is "capital 'H' with stroke"
($) -- not suggested, although the 14th is an indistinguishable glyph "Canadian syllabics carrier sh" 0x165a, a phonetic symbol for representing a Canadian aboriginal language.
(%) -- 1st suggestion
(^) -- 10th suggestion, to be fair this one is impossible
(&) -- not suggested (fried shrimp lol)
(*) -- 3rd suggestion
(?) -- 1st suggestion
(∫) -- 1st suggestion
(∂) -- 4th suggestion (1st one is same thing in boldface)
----------
This suggests a really easy way to greatly improve the results: weight them by a prior probability (i.e., frequency of occurrence in a letter count). The OCR itself seems pretty good. Common math symbols are more likely than silly shrimps. Glyphs from common languages are more probable than esoteric ones, and real languages more than constructed ones. Plain glyphs are more common than variants (bold/italic) -- and these should be grouped together anyway. fried shrimp horseshoe.
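A minimal sketch of that re-ranking idea, assuming the recognizer hands back a shape-match score per candidate. The scores, the prior table, and the `rerank` name are all made up for illustration; none of this is Shapecatcher's actual data or API.

```python
def rerank(candidates, prior, default_prior=1e-6):
    """Re-rank (char, shape_score) pairs by shape_score * prior frequency.

    candidates: list of (char, shape_score), shape_score in (0, 1].
    prior: dict mapping char -> assumed relative frequency of use.
    Characters missing from the table get a small default prior.
    """
    scored = [
        (char, shape_score * prior.get(char, default_prior))
        for char, shape_score in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical recognizer output for a hand-drawn ampersand.
candidates = [("\U0001F364", 0.90), ("&", 0.80), ("8", 0.30)]
# Hypothetical frequency table; the fried shrimp is not in it.
prior = {"&": 1e-3, "8": 2e-3}

ranking = rerank(candidates, prior)
print([char for char, _ in ranking])
```

With these invented numbers the ampersand moves to the top even though the shrimp had the best raw shape score. A real prior would have to come from an actual character-frequency count.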
Because Emoji originated in Japanese messaging, a number of them relate to Japanese culture: foodstuff (not just fried shrimp but rice balls, dango, oden, fugu), cultural practices (kadomatsu, hinamatsuri, koinobori, fūrin wind chimes) and other such things which may be present in other cultures but usually not as prominently (e.g. Unicode Love Hotel).
Very nice and amazingly accurate. But am I using your website the right way? When I look through the characters in the search results and see one that isn't rendering on my computer and just shows a block instead of the symbol, I've been clicking "bad" under "rate this suggestion", thinking that it tallies up the total good/bad votes for a character to mean "how likely people are to have this character installed and working on their computers".
However, I now have a feeling that's not what that feature is for.
My guess was that the Good/Bad rating is to help with some sort of Machine Learning going on in the background. E.g. the way I draw my Ampersand might be slightly different than yours, so if either or both of us see the result we were hoping for (&), that should get our Good rating. If it returns an (8), it may or may not deserve a Bad. If something way off appears as a top result (^), that would be pretty Bad.
Ooh my word. I drew a Skrillex and it recognized it perfectly; some Chuck Tays because he's a chill bro, school because his music appeals to people in school (12-24), and the Bengali vowel sound because he plays that in his "drops".
So far I haven't tried Shapecatcher a lot but I think that http://detexify.kirelabs.org/classify.html works much better. Detexify is of course only for LaTeX symbols and doesn't do Unicode.
To be useful Shapecatcher needs to become better at recognition.
"If you can't find Chinese, Japanese or Korean glyphs, it is because I have yet to find a good free CJK font to use."
Are there not some CJK (or otherwise) fonts from, for example, Linux distributions that could have been used?
Or perhaps the emphasis could be on clarifying what is meant by "good" that deserves excluding such a large and useful character space for this type of application?
Sounds cool but sadly hasn't worked for the letters I often need but don't have easy access to (Czech characters like ř, ď and š). Perhaps because the element that makes them distinct from the Latin letters (the "háček") is so tiny.
Edit: I just drew the ř larger and it recognised it correctly. Cool :)
Trying to draw U+1F4A9 (Pile of Poo). After several attempts, no luck.
I have learnt that Unicode contains even more weirdness than I thought before though, including 'Alchemical symbol for borax-3' (U+1f744), and 'doughnut' (U+1f369).
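For anyone who wants to check the official names of the code points mentioned in this thread, here is a small sketch using Python's standard `unicodedata` module. The `describe` helper is just an illustrative name, and which names come back depends on the Unicode version bundled with your Python build.

```python
import unicodedata

# Code points quoted in the comments above.
code_points = [0x1F4CE, 0x1F364, 0x1F369, 0x1F4A9, 0x165A, 0x1F744]


def describe(cp):
    """Format a code point as 'U+XXXX NAME'.

    The fallback string covers code points that this Python build's
    Unicode database does not know about.
    """
    return f"U+{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}"


for cp in code_points:
    print(describe(cp))
```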
Idea: instead of matching the shape of what the user has drawn raster-wise, let the user draw an svg-like path, and try to identify the letter by the trace.
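A rough sketch of what trace-based matching could look like, assuming a single stroke captured as a list of (x, y) points. It resamples each stroke to a fixed number of evenly spaced points, normalizes position and size, and compares point by point. The templates and function names are hypothetical, and this is not how Shapecatcher works; it is also sensitive to stroke direction and ignores rotation.

```python
import math


def resample(points, n=32):
    """Return n points spaced evenly along the polyline."""
    total = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if total == 0:
        return [points[0]] * n
    step = total / (n - 1)
    out = [points[0]]
    prev, walked, i = points[0], 0.0, 1
    while i < len(points) and len(out) < n:
        cur = points[i]
        d = math.dist(prev, cur)
        if d > 0 and walked + d >= step:
            # Place a new sample part-way along the current segment.
            t = (step - walked) / d
            prev = (prev[0] + t * (cur[0] - prev[0]),
                    prev[1] + t * (cur[1] - prev[1]))
            out.append(prev)
            walked = 0.0
        else:
            walked += d
            prev = cur
            i += 1
    # Rounding can leave the list one short; pad with the end point.
    while len(out) < n:
        out.append(points[-1])
    return out


def normalize(points):
    """Center on the centroid and scale by the larger bounding-box side."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    size = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return [((x - cx) / size, (y - cy) / size) for x, y in points]


def trace_distance(a, b, n=32):
    """Mean distance between corresponding resampled, normalized points."""
    pa, pb = normalize(resample(a, n)), normalize(resample(b, n))
    return sum(math.dist(p, q) for p, q in zip(pa, pb)) / n


# Hypothetical single-stroke templates (screen coordinates, y grows downward).
templates = {
    "L": [(0, 0), (0, 2), (1, 2)],
    "V": [(0, 0), (1, 2), (2, 0)],
}
drawn = [(10, 10), (10, 50), (31, 50)]  # a larger, slightly off "L"
best = min(templates, key=lambda name: trace_distance(drawn, templates[name]))
print(best)
```

Multi-stroke glyphs and strokes drawn in the opposite direction would need extra handling that this sketch leaves out.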
Agreed that a pen tool or some type of editor would be kind of nice, but for what he's going for (proving out an idea), this is still pretty fun. I know a little of the science behind it, but it'd be great to read through some well-commented source code. He did link in the thesis on this, however: http://shapecatcher.com/B_Milde%20-%20On%20The%20Security%20...
Does anyone know if the mirror image of this character: “ (U+201C) exists? I'm looking for a character that is the mirror image of the left double quotation mark, where the base is on the bottom and the character tapers from bottom right to top left. I don't know if any languages use that character.
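One way to hunt for a character like that is to search the Unicode names database and inspect the candidates by eye. A minimal sketch with Python's standard `unicodedata` module follows; `find_by_name` is just an illustrative helper name. The output includes entries such as U+201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK, which may or may not be the shape being asked about.

```python
import unicodedata


def find_by_name(fragment):
    """Yield (code point, name) for characters whose name contains fragment."""
    fragment = fragment.upper()
    for cp in range(0x110000):
        # Unnamed or unassigned code points fall back to an empty string.
        name = unicodedata.name(chr(cp), "")
        if fragment in name:
            yield cp, name


for cp, name in find_by_name("quotation mark"):
    print(f"U+{cp:04X} {chr(cp)} {name}")
```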
[+] [-] Hupo|13 years ago|reply
But that aside, this looks like a neat idea. Not something I have any immediate use for myself, but could certainly be useful in some situations.
[1] http://www.fileformat.info/info/unicode/char/2603/index.htm
[+] [-] fredley|13 years ago|reply
[+] [-] wmil|13 years ago|reply
Major bug.
[+] [-] lucb1e|13 years ago|reply
[+] [-] flux_w42|13 years ago|reply
[+] [-] quirm|13 years ago|reply
[+] [-] rryan|13 years ago|reply
[+] [-] seanp2k2|13 years ago|reply
EDIT: As this is a deep topic, there are also books if that's more your style: http://www.amazon.com/dp/0123725380/?tag=stackoverfl08-20
or maybe you like Wikipedia (Ol' Trusty): http://en.wikipedia.org/wiki/Feature_detection_(computer_vis...
[+] [-] quirm|13 years ago|reply
There is a whole chapter on shape contexts in it, which I use with shapecatcher, too.
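For readers who haven't met the term, a rough sketch of a shape context descriptor: for one sample point on the drawn outline, it is a log-polar histogram of where all the other sample points sit. The bin counts, the radius range, and the normalization by mean distance from the reference point are assumptions chosen for illustration, not taken from Shapecatcher's source or the thesis.

```python
import math


def shape_context(points, index, radial_bins=5, angular_bins=12):
    """Log-polar histogram of the other points as seen from points[index]."""
    px, py = points[index]
    others = [q for j, q in enumerate(points) if j != index]
    dists = [math.hypot(qx - px, qy - py) for qx, qy in others]
    hist = [[0] * angular_bins for _ in range(radial_bins)]
    mean_dist = sum(dists) / len(dists) if dists else 0.0
    if mean_dist == 0.0:
        return hist  # no other points, or all of them coincide with the reference
    # Radii are measured in units of the mean distance (gives scale invariance)
    # and binned on a log scale between 1/8 and 2; values outside are clamped.
    log_min, log_max = math.log(0.125), math.log(2.0)
    for (qx, qy), d in zip(others, dists):
        if d == 0.0:
            continue  # skip points that coincide with the reference point
        frac = (math.log(d / mean_dist) - log_min) / (log_max - log_min)
        r_bin = min(radial_bins - 1, max(0, int(frac * radial_bins)))
        theta = math.atan2(qy - py, qx - px) % (2 * math.pi)
        a_bin = min(angular_bins - 1, int(theta / (2 * math.pi) * angular_bins))
        hist[r_bin][a_bin] += 1
    return hist


outline = [(0, 0), (4, 1), (3, 5), (-2, 3), (1, -4), (6, 6)]
hist = shape_context(outline, 0)
print(hist)
```

Two shapes can then be compared by matching up points whose histograms are similar. That matching step is not shown here.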
[+] [-] rurounijones|13 years ago|reply
[+] [-] andydrizen|13 years ago|reply
[+] [-] uvdiv|13 years ago|reply
http://i.imgur.com/NLkl75J.png
the unicode block containing "fried shrimp" 0x1f364 -- why does this exist???
http://www.unicode.org/charts/PDF/U1F300.pdf
Latest Abstruse Goose comic sums up my emotional response:
http://abstrusegoose.com/496
[+] [-] nwh|13 years ago|reply
There's a lot of other strange Unicode too. There are things like '' (U+2062 INVISIBLE TIMES), ⓞⓓⓓ ©ⓗⓐⓡⓐ©ⓣⓔⓡ ⓢⓔⓣⓢ and sɹǝʇɔɐɹɐɥɔ uʍop ǝpısdn.
All of which can be used to bypass filters and generally cause browser-crashing havoc. For example, this address looks like Google, but it really links to hacker news.
http://news.ycombinator.com/?/moc.elgoog//:ptth
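A link like this presumably relied on a right-to-left override character (U+202E) that did not survive being copied as plain text; that is a guess, and where the character sat in the original is an assumption too. Under that assumption, here is a minimal sketch of how a filter could flag such strings with Python's standard `unicodedata` module. `bidi_controls` is a hypothetical helper name.

```python
import unicodedata

# Bidirectional classes of the explicit embedding, override and isolate controls.
BIDI_CONTROL_CLASSES = {
    "LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI",
}


def bidi_controls(text):
    """Return (index, character name) for each bidi control character in text."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES
    ]


# Reconstructed example; the position of U+202E is assumed, not known.
suspicious = "http://news.ycombinator.com/?\u202e/moc.elgoog//:ptth"
print(bidi_controls(suspicious))
print(bidi_controls("http://google.com/"))
```

This only catches directional controls. Look-alike glyphs from other scripts are a separate problem that needs a confusables table.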
[+] [-] uvdiv|13 years ago|reply
[+] [-] masklinn|13 years ago|reply
A SoftBank-based Emoji set was imported (in an extended form) into Unicode 6.0: http://en.wikipedia.org/wiki/Emoji#Emoji_in_the_Unicode_stan...
[+] [-] dexen|13 years ago|reply
[+] [-] ChrisNorstrom|13 years ago|reply
[+] [-] kayge|13 years ago|reply
[+] [-] adnam|13 years ago|reply
[+] [-] seanp2k2|13 years ago|reply
EDIT: forgot to link screenshot: http://imgur.com/LyaXy4h
[+] [-] seanp2k2|13 years ago|reply
[+] [-] MojoJolo|13 years ago|reply
[+] [-] WA|13 years ago|reply
[+] [-] darrenkopp|13 years ago|reply
[+] [-] adnam|13 years ago|reply
[+] [-] hwang89|13 years ago|reply
[+] [-] dpham|13 years ago|reply
[+] [-] hellbanimminent|13 years ago|reply
[deleted]
[+] [-] jQueryIsAwesome|13 years ago|reply
[+] [-] speeder|13 years ago|reply
I mean, I did find it funny, but I did not enjoy seeing that at work (especially because I work with kids' stuff).
[+] [-] seanp2k2|13 years ago|reply
[+] [-] jaymzcampbell|13 years ago|reply
[+] [-] symmetricsaurus|13 years ago|reply
[+] [-] cygwin98|13 years ago|reply
could be awesome for a character-based PacMan impl.
[+] [-] drucken|13 years ago|reply
[+] [-] smcl|13 years ago|reply
[+] [-] k_bx|13 years ago|reply
[+] [-] jessaustin|13 years ago|reply
[+] [-] quirm|13 years ago|reply
[+] [-] fredley|13 years ago|reply
[+] [-] zenon|13 years ago|reply
Clearly it's not impressed by my drawing skills.
[+] [-] dschep|13 years ago|reply
[+] [-] speeder|13 years ago|reply
Seriously, it even showed some very PI-like things, but not PI itself. This is a downer.
[+] [-] the_gipsy|13 years ago|reply
[+] [-] seanp2k2|13 years ago|reply
[+] [-] oftenwrong|13 years ago|reply
A version of this for kanji that is very accurate
http://kanji.sljfaq.org/draw-canvas.html
[+] [-] a_p|13 years ago|reply
[+] [-] FreeFull|13 years ago|reply