The problem with hash avatars in general is that people want to use them for identity verification -- and humans are wired to do so automatically -- but technologically, they cannot provide this. The space of possible avatars (2^256, in this case) is far, far larger than the number of distinct objects that humans can distinguish between. Which means that there will invariably be "collisions:" two avatars that are not identical, but appear identical to humans. As a result, if an attacker can brute-force an avatar that looks very similar to, say, Elon Musk's avatar, they can trivially scam people.
It follows that, since avatars do not provide any proof of identity, there is actually no harm in greatly truncating the hash space when generating them! That is, rather than trying to encode all 256 bits into the avatar, you can use a much more manageable number, like 16. But isn't this too small? Won't there be lots of collisions? Yes -- but that's a feature! If collisions are common, then the average user will be aware that avatar != identity, which makes them less susceptible to scamming. But 16 bits is still enough to meet the real goal of avatars: quickly distinguishing between different people in a conversation (or transaction, or whatever).
(This also shows why making avatars more costly to generate, e.g. with scrypt, can do more harm than good: doing so makes collisions less likely, but still not impossible. Meaning that if a collision does occur, whether accidental or malicious, you are less likely to notice it.)
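To make the truncation idea concrete, here is a minimal sketch. It assumes SHA-256 as the underlying hash and keeps only the top 16 bits; the function name `avatar_bits` and the choice of "top bits" rather than any other subset are illustrative assumptions, not something specified in the thread.

```python
import hashlib


def avatar_bits(identifier: str, bits: int = 16) -> int:
    """Return the top `bits` bits of SHA-256(identifier) as an integer.

    Hypothetical helper: the avatar renderer would draw from this small
    value instead of the full 256-bit digest.
    """
    if not 1 <= bits <= 256:
        raise ValueError("bits must be between 1 and 256")
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    # Read the digest as a big-endian integer and discard the low bits.
    return int.from_bytes(digest, "big") >> (256 - bits)


if __name__ == "__main__":
    for name in ("alice", "bob", "carol"):
        print(name, avatar_bits(name))
```

With 16 bits there are only 65,536 possible avatars, so collisions across a large user base are expected, which is exactly the property argued for above.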
You might get more mileage if the avatars are unique to the user viewing them rather than identical between users. If the nonce/salt used in generation is itself secure, then it'd be prohibitively difficult for adversaries to force a collision without obvious detection, doubly so in communities.
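One way to read the per-viewer suggestion is to key the hash with a secret that only the viewer holds. The sketch below assumes HMAC-SHA-256 for that keying; the names `viewer_avatar_seed` and `viewer_secret` are hypothetical, and the thread does not say how the salt would be stored or distributed.

```python
import hashlib
import hmac
import secrets


def viewer_avatar_seed(viewer_secret: bytes, user_id: str, bits: int = 16) -> int:
    """Derive an avatar seed that depends on both the viewer and the viewed user."""
    if not 1 <= bits <= 256:
        raise ValueError("bits must be between 1 and 256")
    mac = hmac.new(viewer_secret, user_id.encode("utf-8"), hashlib.sha256).digest()
    # Keep only the top `bits` bits, as in the truncation idea above.
    return int.from_bytes(mac, "big") >> (256 - bits)


if __name__ == "__main__":
    # Each viewer generates and keeps a private random salt (assumption).
    alice_secret = secrets.token_bytes(32)
    bob_secret = secrets.token_bytes(32)
    print(viewer_avatar_seed(alice_secret, "elon"))
    print(viewer_avatar_seed(bob_secret, "elon"))
```

Because an attacker does not know a viewer's secret, they cannot search offline for an ID whose avatar matches the target's as that viewer sees it.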
There might not be 2^256 distinguishable objects, but maybe someone can come up with 2^16 distinguishable objects and just string 16 of them together. If one character is off in a string of 40 hexadecimal characters, it is hard to notice, but that would be easier to detect in a set of 16 symbols.
On a related note, I've been experimenting with using a simple word list (like the EFF Diceware list) to generate strings of words encoding data. The trickiest parts are figuring out how to encode padding, the eventual size of the word list, and how complicated the final solution should be (e.g. using word lists whose sizes are not powers of two, handling leftover bits, and all that). The Diceware word list is nice since the words are not ambiguous and don't have homophones.
I assumed there would be existing implementations of something similar, but I have not found one that fits these criteria, other than some that use very small word lists. Diceware has 7776 words, and pushing that to 8192 should be feasible and is a bit easier to work with.
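A rough sketch of the 8192-word variant, under some loud assumptions: the word list here is a generated placeholder (`word0000` ... `word8191`), not an extended Diceware list, and the padding question is sidestepped by requiring the decoder to know the byte length in advance (which holds for fixed-size hashes). With 8192 = 2^13 words, each word carries exactly 13 bits.

```python
BITS_PER_WORD = 13  # 2**13 == 8192 words

# Placeholder word list (assumption); a real one would hold 8192 distinct words.
WORDS = [f"word{i:04d}" for i in range(2 ** BITS_PER_WORD)]
INDEX = {word: i for i, word in enumerate(WORDS)}


def encode(data: bytes) -> list:
    """Encode bytes as words, zero-padding the final word on the right."""
    nbits = len(data) * 8
    nwords = -(-nbits // BITS_PER_WORD)  # ceiling division
    pad = nwords * BITS_PER_WORD - nbits
    value = int.from_bytes(data, "big") << pad
    mask = (1 << BITS_PER_WORD) - 1
    return [
        WORDS[(value >> (BITS_PER_WORD * (nwords - 1 - i))) & mask]
        for i in range(nwords)
    ]


def decode(words: list, nbytes: int) -> bytes:
    """Decode words back to exactly `nbytes` bytes, checking the padding bits."""
    value = 0
    for word in words:
        value = (value << BITS_PER_WORD) | INDEX[word]
    pad = len(words) * BITS_PER_WORD - nbytes * 8
    if not 0 <= pad < BITS_PER_WORD:
        raise ValueError("word count does not match the expected length")
    if value & ((1 << pad) - 1):
        raise ValueError("non-zero padding bits")
    return (value >> pad).to_bytes(nbytes, "big")


if __name__ == "__main__":
    digest = bytes(range(32))  # stand-in for a 256-bit hash
    phrase = encode(digest)
    print(len(phrase), "words:", " ".join(phrase[:4]), "...")
    assert decode(phrase, 32) == digest
```

A 256-bit value needs 20 words this way (260 bits, 4 of them padding). Variable-length data would need an explicit length or padding marker, which is the harder design question raised above.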
There's no need to distinguish between every object at every comparison. In most applications, you'll only be comparing a few dozen avatars with each other.
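As a back-of-the-envelope check on this, the usual birthday calculation gives the chance that any two avatars in a small group coincide. The 16-bit space is borrowed from the suggestion earlier in the thread, the group size of 36 is an arbitrary stand-in for "a few dozen", and avatars are assumed to be uniformly distributed.

```python
def collision_probability(n: int, bits: int) -> float:
    """Probability that at least two of `n` uniformly random avatars coincide
    when there are 2**bits possible avatars (exact birthday product)."""
    space = 2 ** bits
    if n > space:
        return 1.0  # pigeonhole: a collision is guaranteed
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (space - i) / space
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for group in (12, 36, 100):
        print(group, f"{collision_probability(group, 16):.4f}")
```

For 36 avatars in a 16-bit space this comes out a bit under 1%, which supports the point that a small space is usually enough for telling apart the participants in a single conversation.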
I still like snowflakes for this: https://levien.com/snowflake-explain.html is a half-finished blog post explaining the motivation and algorithm I came up with. I never did careful user testing, but suspect that the answer would be that some people can reliably distinguish the patterns, others won't be able to.
In any case, there are a lot of variations on this "visual hash" idea, including the original fractal one, and I heard of more recent work to use the hash to seed StyleGAN face generation.
As a warning, this would not be good for colorblind people (such as myself).
In the "Hello, Hacker News!" hash, half of the middle ring looks identical to me, and unless I look carefully, the entire ring looks the same.
What would you suggest as a solution? I considered swapping Hue for Lightness in order to increase contrast changes. Would you be interested in testing out some variants?
Urbit also developed a solution for turning a number into an avatar, although theirs only has 32 bits of entropy, and to be honest there are many that are difficult to tell apart:
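For anyone wanting to try the hue-versus-lightness swap, here is a small sketch using Python's standard `colorsys` module. It is not the project's actual rendering code: the function names, the fixed saturation of 0.7, the fixed hue of 0.6, and the 0.15 to 0.85 lightness range are all assumptions picked for illustration.

```python
import colorsys


def _to_rgb255(h: float, l: float, s: float) -> tuple:
    """Convert HLS components in [0, 1] to an 8-bit RGB triple."""
    return tuple(round(c * 255) for c in colorsys.hls_to_rgb(h, l, s))


def byte_to_rgb_by_hue(value: int) -> tuple:
    """Encode a byte as hue at constant lightness (hard on colorblind viewers)."""
    return _to_rgb255(value / 256, 0.5, 0.7)


def byte_to_rgb_by_lightness(value: int, hue: float = 0.6) -> tuple:
    """Encode a byte as lightness at a fixed hue, so differences show up as
    brightness contrast rather than as color."""
    lightness = 0.15 + 0.7 * (value / 255)
    return _to_rgb255(hue, lightness, 0.7)


if __name__ == "__main__":
    for v in (0, 64, 128, 255):
        print(v, byte_to_rgb_by_hue(v), byte_to_rgb_by_lightness(v))
```

Whether lightness steps are actually easier to tell apart would still need testing with colorblind users, as suggested above.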
You should check out this paper where they tested different representations on humans to see what they could tell apart, and came up with a novel representation called Moji.
Since it doesn't seem to be lossy, I was wondering if it could be somehow adapted to something that could be scanned as a QR code. I guess the minor color shifts might be hard to get right, but maybe combined/replaced with some form of symbol inside rings to help, a dot/dash combination?
It would be a lot more work, but it might work better if you picked something in which humans are particularly tuned to notice subtle details, such as faces.
Using the hash as a seed for an AI face generator like thispersondoesnotexist would be pretty powerful. Free idea for anyone who wants to give it a shot.
OpenSSH's randomart was too visually indistinctive for me so I've patched it to draw TrueColor images of cats. I wanted to actually seed a GAN to generate consistent images, but that turned out to be too much of a bother so I'm just keeping a local cache on a machine. Works nicely for that use-case as I'm able to associate a particular image with a particular location when working at a particular box. Good enough.
Despite the issue where it would be trivial to brute force similar looking but not identical 'avatars', I think this still has a few good uses for non-identification.
1. Creating at least some default avatar. Not to be used to verify identity but just somewhat better than having a very limited set of default images. Having rate limits on account creation would prevent most brute force methods.
2. Avatar suitable for partial-identification for very small populations. Imagine a Matrix/Element room that has <100,000 people. The hash/math could be modified to drastically trim down the space of the hash (e.g. 2^256) to something similar to the size of the room.
#2 sounds pretty interesting. It could be expanded by making parts of the image/avatar dependent on some input other than the user ID, like the user's role in the chat group. Another segment/ring could be something more short-lived and relative, like just identifying users in recent chat messages.
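A sketch of how idea #2 and its extension might fit together, with everything hypothetical: the function `room_avatar_fields`, the field names, and the use of SHA-256 are assumptions. The identity part is trimmed to roughly the size of the room, and a separate ring is driven by the user's role rather than their ID.

```python
import hashlib


def room_avatar_fields(user_id: str, role: str, room_size: int) -> dict:
    """Split an avatar into an identity part sized to the room and a role ring."""
    if room_size < 1:
        raise ValueError("room_size must be positive")
    # Smallest bit count whose space covers the room (at least 1 bit).
    identity_bits = max(1, (room_size - 1).bit_length())
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    identity = int.from_bytes(digest, "big") >> (256 - identity_bits)
    # The role ring depends only on the role, so all moderators share it.
    role_ring = hashlib.sha256(role.encode("utf-8")).digest()[0]
    return {
        "identity_bits": identity_bits,
        "identity": identity,
        "role_ring": role_ring,
    }


if __name__ == "__main__":
    print(room_avatar_fields("@alice:example.org", "moderator", 100_000))
    print(room_avatar_fields("@bob:example.org", "moderator", 100_000))
```

For a room of 100,000 this uses 17 bits for the identity part. Note that a space only about as large as the room still produces plenty of collisions among its members, so this is for quick visual grouping rather than unique identification.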
I always thought ssh randomart representations were visually unique enough; maybe combine smaller, simpler shapes with color too?
The rings are neat, but I found many to be too similar based on color alone, and with the segments too it is really hard to pick up on a pattern or something memorable.
How hard would it be to instead generate faces with random facial features? Humans are already hardwired to be able to detect subtle differences between faces.
That would obviously not make it suitable for generating avatars to identify humans, but it would make this really useful to e.g. identify git commits or hash signatures.
As a sidenote, your website breaks in Vivaldi with cookies denied and several ad-blockers. It keeps on reloading, making it impossible to close the tab or the browser. Please fix your site.
Makes me wonder if you could effectively apply Chernoff faces (https://en.wikipedia.org/wiki/Chernoff_face) to make different hashes easier for humans to recognize. TLDR map parts of the hash to modify aspects of a face (position, size, orientation of eyes, ears etc.) and you can take advantage of all the in-built circuitry in the human brain which can identify very small differences in facial appearance.
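The mapping described here could start out as simply as assigning one hash byte per facial feature. The sketch below assumes SHA-256 and an invented feature list; it only produces normalized parameters and leaves the actual drawing to whatever renderer is used.

```python
import hashlib

# Hypothetical feature set; a real Chernoff-face renderer defines its own.
FEATURES = (
    "face_width",
    "eye_size",
    "eye_spacing",
    "eyebrow_slant",
    "nose_length",
    "mouth_curve",
    "ear_size",
)


def face_parameters(data: bytes) -> dict:
    """Map the first bytes of SHA-256(data) to face features in [0, 1]."""
    digest = hashlib.sha256(data).digest()
    return {name: digest[i] / 255 for i, name in enumerate(FEATURES)}


if __name__ == "__main__":
    for label in (b"commit-a", b"commit-b"):
        params = face_parameters(label)
        print(label, {k: round(v, 2) for k, v in params.items()})
```

With seven one-byte features this encodes only 56 bits of the hash, so like the other schemes in this thread it helps with quick recognition, not verification.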
The idea is explored a bit in Peter Watts' novel Blindsight - not for hashes, but for visualizing high-dimensional multivariate data via clouds of tormented faces :)
nemo1618|5 years ago
andromeduck|5 years ago
FqOD4xih7Uq6m9Z|5 years ago
adzm|5 years ago
timClicks|5 years ago
sva_|5 years ago
That sounds intriguing to me. Are you aware of any research into this?
raphlinus|5 years ago
ianopolous|5 years ago
kop316|5 years ago
franky47|5 years ago
sm4rk0|5 years ago
It hashes the user's email http://en.gravatar.com/site/implement/hash/ and creates an "identicon" from the hash http://scott.sherrillmix.com/blog/blogger/wp_identicon/ or loads a user-defined image.
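For reference, the hashing step works roughly like this, based on Gravatar's documented scheme at the time of this thread: trim whitespace, lowercase the address, then take the MD5 hex digest. The helper names and the default `size` below are my own; `d=identicon` asks Gravatar to fall back to a generated identicon when the user has not uploaded an image.

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """MD5 hex digest of the trimmed, lowercased email address."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def gravatar_url(email: str, default: str = "identicon", size: int = 80) -> str:
    """Build an avatar URL; `default` selects the fallback image style."""
    return (
        f"https://www.gravatar.com/avatar/{gravatar_hash(email)}"
        f"?d={default}&s={size}"
    )


if __name__ == "__main__":
    print(gravatar_url("  MyEmailAddress@example.com "))
```

The normalization matters: without it, the same address typed with different capitalization would map to different avatars.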
toastal|5 years ago
https://www.libravatar.org/
leipert|5 years ago
franky47|5 years ago
zellyn|5 years ago
kevincox|5 years ago
franky47|5 years ago
milkey_mouse|5 years ago
https://urbit.org/blog/creating-sigils/
mvolfik|5 years ago
joshbuddy|5 years ago
https://exascale.info/assets/pdf/students/MSc_Thesis_-_Micha...
geoah|5 years ago
geoah|5 years ago
RcouF1uZ4gsC|5 years ago
franky47|5 years ago
ilammy|5 years ago
https://github.com/ilammy/homebrew-ssh
InfiniteCode|5 years ago
tosh|5 years ago
KingMachiavelli|5 years ago
petee|5 years ago
irdc|5 years ago
toptoppler|5 years ago
mvolfik|5 years ago
emsign|5 years ago
rumblefrog|5 years ago
franky47|5 years ago
https://github.com/wzulfikar/hashvatar
tosh|5 years ago
sneak|5 years ago
https://robohash.org/
mrsharpoblunto|5 years ago