Interesting, I have made a similar project, except instead of rotN, it encodes the input as UTF-8, and then shifts up the codepoints to display each byte as a different character to what it would normally be. The invariant is that `byte & 0xff` is the real byte value.
This is a very bad idea because it's going to rotate ordinary characters to code points where Unicode normalization has an effect, including combining characters, whitespace, control characters... After normalization, rotating back will produce garbage.
I wonder if it is possible to generate a ROT8000 quine, that is, a phrase like "hello world" which yields a semantically matching phrase in some other language?
I remember seeing this the last time it was posted on HN [0] in 2018. The About page seems to be a bit outdated since it actually skips a lot more characters than the 32 mentioned.
Nice idea. I often use base64 for this, since it's somewhat recognizable and there are tons of decoding tools available.
Many years ago I was involved in finding and fixing a messaging bug that only appeared when the base64 encoded payload had a length that was a multiple of 87 bytes (it might have been some other value - it was 15+ years ago).
I just get boxes with hex codes in them when I type ASCII letters, which is not very nice. Even with all the required fonts installed, I am not sure it is that great to get characters from a completely foreign language. Also, I suppose one could end up with surrogate code points, which do not have a character representation. To summarize: I think this sounded like more fun in theory than it turns out to be in practice.
Real world use case: geocaching.com uses it to hide hints, so you don't read and spoil yourself by accident. It's pretty much accepted and adopted by the users. I also would ban words like "dumb" or, for another example "easy" in IT and CS contexts.
In this case, it would be very unlikely to actually happen, for a few reasons.
Almost all combining rules (including skin tone modifiers) require a zero-width joiner character between the person emoji and the modifier emoji. So really it's frowning face + ZWJ + brown texture = brown frowning face. (Although technically I don't think frowning face can be modified.) Also, there are relatively few ZWJ combinations.
Technically, there are some older combination emojis that predate ZWJ, mainly the flags, which are composed of two single-letter emojis, e.g. regional-indicator-U + regional-indicator-S = United States flag. So I guess it might be possible to get a couple of those.
And in any case, I think this page assumes that you're staying within the bounds of the basic multilingual plane (it mentions a self-inverting transform would be ROT32768), which doesn't include emojis or skin tone modifiers.
Hi, I created rot8000 for The Wrong, an online biennial of digital art -- specifically for the pl41nt3xt pavilion, which included text-only works. The pavilion was taken down when the biennial ended, and it looks like that link is no longer valid.
What is the name of the non-reversible encoding scheme that translates "internationalization" to "i18n", "localization" to "l10n", "kubernetes" to "k8s", and other abbreviations like "f2k" "y1u"?
There are a couple of browser plugins that do this with a password; as long as someone else knows the password, they can decode it. I'm not near the machine that has it, but I know it does Korean, Japanese, and Chinese characters; you choose which set you want. And it doesn't back-translate to anything useful, it's just encoding.
[+] [-] jstanley|4 years ago|reply
I call it "Mojibake Steganography": https://incoherency.co.uk/mojibake/
I think in principle (judging by the description of rot8000), my tool should be able to decode rot8000 messages natively, but it doesn't seem to work on the example given here. From looking directly at the codepoints given, I think the example is wrong. It starts:
u+7c5d u+7c71 u+7c6e - which works out to "]qn" instead of "The", unless I am misunderstanding something. And in fact that looks definitely wrong if we're expecting ASCII output because they're all more than 127 away from 0x8000, no matter how it works.
The rot8000 page says:
> It also bypasses 32 control characters, technically making it rotFFE0, sometimes with an additional offset.
I definitely don't understand how this is meant to work. Why does skipping 32 control characters turn it from rot8000 into rotFFE0? Should that say 7FE0? I still don't see how ASCII is coming out as 7Cxx.
Taking `char - 0x7c09` gets the expected ASCII output.
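For what it's worth, the arithmetic here is easy to check. Below is a minimal Python sketch using only the three code points quoted above. The helper names (`low_byte_decode`, `fixed_offset_decode`) are made up for illustration and are not taken from either tool, and the fixed offset is just the observation from this comment, not a description of how rot8000 actually works.

```python
# The three code points quoted from the start of the rot8000 example.
codepoints = [0x7C5D, 0x7C71, 0x7C6E]


def low_byte_decode(cps):
    """Keep only the low byte of each code point (the `byte & 0xff` invariant),
    then interpret the resulting bytes as UTF-8."""
    return bytes(cp & 0xFF for cp in cps).decode("utf-8", errors="replace")


def fixed_offset_decode(cps, offset=0x7C09):
    """Subtract one constant from every code point (offset taken from the
    observation above; an assumption, not the documented algorithm)."""
    return "".join(chr(cp - offset) for cp in cps)


print(low_byte_decode(codepoints))      # ]qn
print(fixed_offset_decode(codepoints))  # The
```

The low bytes are 0x5D, 0x71, 0x6E, which is why the mask gives "]qn", while subtracting 0x7C09 gives 0x54, 0x68, 0x65.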
[+] [-] robinhouston|4 years ago|reply
[+] [-] NelsonMinar|4 years ago|reply
One nice property of rot13 is it reverses itself; rot13(rot13(X)) = X. At least, for basic ASCII alphabet. Your UTF-8 encoding step makes that impossible. I wonder if there's a sensible Unicode-friendly algorithm that has that rot13 property.
[+] [-] Tepix|4 years ago|reply
However, the feature of rot13 (and rot8000) that you can use the same operation to "decrypt" it again is unfortunately missing in your variant.
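To illustrate the self-inverse property: rotating by exactly half the size of the alphabet undoes itself when applied twice. The Python sketch below shows this for rot13 and for a deliberately naive half-BMP rotation. `rot_half_bmp` is a hypothetical name; unlike the real rot8000 it skips nothing, so it can land on control characters and lone surrogates, and it leaves anything outside the BMP untouched.

```python
import codecs


def rot13(text: str) -> str:
    """Standard rot13 via the built-in text codec."""
    return codecs.encode(text, "rot_13")


def rot_half_bmp(text: str) -> str:
    """Naive rotation by 0x8000 within the BMP (0x0000..0xFFFF).

    Assumption for illustration only: no code points are skipped, so the
    output may contain controls or surrogates for some inputs.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            out.append(ch)  # outside the BMP: left as-is
        else:
            out.append(chr((cp + 0x8000) % 0x10000))
    return "".join(out)


message = "Hello, world"
print(rot13(rot13(message)) == message)                # True
print(rot_half_bmp(rot_half_bmp(message)) == message)  # True
```

Applying the rotation twice adds 0x10000, which is 0 modulo 0x10000, so every BMP code point returns to where it started.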
[+] [-] amptorn|4 years ago|reply
[+] [-] shakna|4 years ago|reply
It actually skips whitespace, control characters and surrogate pairs [0].
[0] https://github.com/rottytooth/rot8000/blob/main/Rottytooth.R...
[+] [-] GekkePrutser|4 years ago|reply
But ROT in itself is a pretty stupid idea anyway, usually just done for show.
[+] [-] Spooky23|4 years ago|reply
[+] [-] sandwell|4 years ago|reply
[+] [-] eurasiantiger|4 years ago|reply
[+] [-] kderbyma|4 years ago|reply
[+] [-] Arcorann|4 years ago|reply
Running this on CJK text is an interesting exercise.
[0] https://news.ycombinator.com/item?id=18495518
[+] [-] BoppreH|4 years ago|reply
Base64 does lengthen the text by a third, which may or may not be a problem. On the other hand, it doesn't need special handling of control characters, and manages to hide word lengths well.
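The one-third figure follows from Base64 emitting 4 output characters for every 3 input bytes. A quick Python check with the standard `base64` module (the sample payloads are arbitrary):

```python
import base64

# 300 bytes in, 4 characters out for every 3 bytes.
payload = b"x" * 300
encoded = base64.b64encode(payload)
print(len(payload), len(encoded))  # 300 400

# Spaces are encoded like any other byte, so word boundaries
# are not visible in the output.
hint = "look under the old oak tree".encode("utf-8")
hidden = base64.b64encode(hint)
print(b" " in hidden)  # False
print(base64.b64decode(hidden).decode("utf-8") == "look under the old oak tree")  # True
```

When the input length is not a multiple of 3, padding rounds the output up to the next multiple of 4, so short strings grow by slightly more than a third.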
[+] [-] arethuza|4 years ago|reply
Bug was in a C++ base64 encoder component.
[+] [-] zzzaim|4 years ago|reply
[1] https://github.com/241m/rot8000
[+] [-] cjfd|4 years ago|reply
[+] [-] SommaRaikkonen|4 years ago|reply
[+] [-] GekkePrutser|4 years ago|reply
I bet using ROT on this will lead to unintended consequences because the original characters won't combine but the replaced ones will.
But ROT is a dumb thing to do anyway, so it doesn't have any real-world use.
[+] [-] kasitmp|4 years ago|reply
[+] [-] kyle-rb|4 years ago|reply
[1] https://emojipedia.org/emoji-zwj-sequence/
[+] [-] kapp_in_life|4 years ago|reply
ab => frowning face + brown texture = brown frowning face => ab
[+] [-] omershapira|4 years ago|reply
[+] [-] ajanuary|4 years ago|reply
类籸籽籁簹簹簹簵 籸籷 籽籱籮 籸籽籱籮类 籱籪籷籭簵 籹类籮籼籮类籿籮籼 籵籮籼籼 籼籽类籾籬籽籾类籮 簱米籾籼籽 籼籹籪籬籮籼簲簷 籝籱籲籼 籶籪籴籮籼 籲籽 籿籲籼籾籪籵籵粂 籶籸类籮 籭籮籷籼籮簵 籪籷籭 籵籮籼籼 籽籪籷籽籪籵籲籼籲籷籰 籪籼 籪 籼籹籸籲籵籮类簶籽籮粁籽 籶籮籬籱籪籷籲籼籶簷 籋籾籽 粀籸类籴籲籷籰 粀籲籽籱 籷籸籷簶籵籪籽籲籷 籼籬类籲籹籽籼 籨籲籼籨 籪籹籹籮籪籵籲籷籰簷 籒 籽籱籲籷籴 籽籱籮类籮 籲籼 籪 籾籼籮 籬籪籼籮 籯籸类 籸籷籵粂 类籸籽籪籽籲籷籰 籬籱籪类籪籬籽籮类籼 籲籷 籽籱籮 籵籮籽籽籮类 籬籪籽籮籰籸类粂簷
[+] [-] dash2|4 years ago|reply
[+] [-] ditherstudies|4 years ago|reply
[+] [-] chris_st|4 years ago|reply
Leet-speak for "plaintext".
[+] [-] kevinmgranger|4 years ago|reply
[+] [-] DonHopkins|4 years ago|reply
[+] [-] LordDragonfang|4 years ago|reply
https://en.wikipedia.org/wiki/Numeronym
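The rule behind abbreviations like "i18n" is simple enough to sketch: keep the first and last letters and replace everything in between with its length. `numeronym` below is a made-up helper name, and passing short words through unchanged is an assumption rather than part of any standard.

```python
def numeronym(word: str) -> str:
    """Abbreviate as first letter + count of omitted letters + last letter."""
    if len(word) <= 3:
        return word  # assumption: nothing worth abbreviating
    return f"{word[0]}{len(word) - 2}{word[-1]}"


for word in ("internationalization", "localization", "kubernetes"):
    print(word, "->", numeronym(word))
# internationalization -> i18n
# localization -> l10n
# kubernetes -> k8s

# Not reversible: different words collapse to the same abbreviation.
print(numeronym("localization") == numeronym("localisation"))  # True
```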
[+] [-] rsj_hn|4 years ago|reply
https://onlineasciitools.com/convert-ascii-to-morse
[+] [-] jhvkjhk|4 years ago|reply
[+] [-] Uberphallus|4 years ago|reply
[+] [-] Gormisdomai|4 years ago|reply
E.g. stars and hearts get rotated but sunglasses do not
(EDIT: rewrote my example to use words because HN doesn't render emoji, duh)
[+] [-] OskarS|4 years ago|reply
> While rot13 is the self-inverse for a 26-character system, and rot47 for ANSI, the Basic Multilingual Plane of Unicode requires rot32768 (or 8000 in hex) for a reciprical cypher
Not all emoji are in the BMP; at least some are in the Supplementary Multilingual Plane.
It's weird to me that if you're gonna do this dumb "rot13 but for Unicode", you'd only do it for the BMP, and not ALL of Unicode.
[+] [-] jsjohnst|4 years ago|reply
Star = U+2B50 which is less than U+FFFF
Sunglasses = U+1F576 which is greater than U+FFFF
The detail you might be missing is that some emoji existed in Unicode before color graphic "emoji" were actually a thing. The stars (and hearts) are examples of characters that used to be just a basic shape in the font but are now commonly rendered as full-color graphical "images".
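The BMP boundary is easy to check for any character. A small Python sketch using the two code points mentioned above (`in_bmp` and `utf16_units` are illustrative helper names):

```python
def in_bmp(ch: str) -> bool:
    """True if the character lies in the Basic Multilingual Plane."""
    return ord(ch) <= 0xFFFF


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed in UTF-16 (2 means a surrogate pair)."""
    return len(ch.encode("utf-16-be")) // 2


star = "\u2B50"            # U+2B50
sunglasses = "\U0001F576"  # U+1F576

for name, ch in (("star", star), ("sunglasses", sunglasses)):
    print(name, f"U+{ord(ch):04X}", in_bmp(ch), utf16_units(ch))
# star U+2B50 True 1
# sunglasses U+1F576 False 2
```

A transform that only touches code points up to U+FFFF would therefore rotate the star and leave the sunglasses alone.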
[+] [-] maerF0x0|4 years ago|reply
[+] [-] genewitch|4 years ago|reply
[+] [-] perl4ever|4 years ago|reply
[+] [-] azhenley|4 years ago|reply
[+] [-] Tepix|4 years ago|reply
[+] [-] DonHopkins|4 years ago|reply
[+] [-] stavros|4 years ago|reply