
Smjörið er brætt og hveitið smátt og smátt hrært út í það, þangað til það er gengið upp í smjörið.

125 points| pg | 18 years ago | reply

Thanks to a fix by Patrick Collison, utf-8 now seems to work right.

154 comments

[+] far33d|18 years ago|reply
So. A note to all the "unicode makes this unusable" people -

Apparently, while you were complaining, someone else was solving.

[+] henning|18 years ago|reply
OK. Now how about database access (with support for prepared statements), regular expressions, and networking?
[+] microdan|18 years ago|reply
I think the complaints were more about how pg was originally saying that he intended to never support Unicode. That said, people should realize that UTF-8 encoding/decoding is the zeroth step to internationalization with Unicode.
[+] gojomo|18 years ago|reply
♫♪♫ to my ears. I ♥ unicode! To ∞ and beyond, ☺
[+] tocomment|18 years ago|reply
How do I make the infinity? Actually where do I get all those symbols?
[+] prescod|18 years ago|reply
Do I understand correctly that Arc strings are sequences of octets?

If so: I really don't want to be a negativity guy but it seems like every language that has made an 8-bit string the default string type has regretted it later because it is so painful to change it without breaking code. Okay, Paul says that he won't mind breaking code. Maybe he means it, but it doesn't make any sense to me to knowingly and consciously repeat a design mistake that dozens of other people have made and regretted.

It really just takes one day to get this right. You need to distinguish between the raw bytes read from a device and the true string type (which needs to be 21 bit or greater). You need a trivial converter from one to the other (which you can presumably steal from MzScheme) and back.

That's it. You get this right at the beginning and you never have to backtrack or break code.

My apologies in advance if this post is based on incorrect premises. I'm trying to help.
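The split between raw bytes and a true string type, with a converter in each direction, can be sketched as follows. This uses Python rather than Arc or MzScheme purely for illustration; the sample text and all variable names are assumptions, not anything from Arc itself.

```python
# Raw octets, as they might arrive from a file or a socket.
raw = "Smjörið er brætt".encode("utf-8")

# Decoding converts octets into a string of Unicode code points.
text = raw.decode("utf-8")

# The two types have different lengths: non-ASCII letters take
# more than one octet in UTF-8 but are one code point each.
octet_count = len(raw)
code_point_count = len(text)

# Encoding is the converter in the other direction; the round trip is lossless.
round_trip_ok = text.encode("utf-8") == raw

# The largest Unicode code point is U+10FFFF, which needs 21 bits.
MAX_CODE_POINT = 0x10FFFF
bits_needed = MAX_CODE_POINT.bit_length()

print(octet_count, code_point_count, round_trip_ok, bits_needed)
```

Keeping `raw` and `text` as distinct types means code that indexes or measures a string never has to know which encoding the bytes arrived in.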

[+] olavk|18 years ago|reply
Arc snarfs the string implementation from MzScheme, which supports Unicode in The Right Way, as code points rather than octets.
[+] dzorz|18 years ago|reply
Could you offer a better solution? What would your solution offer that octets do not? Random character access? No, because not a single unicode encoding offers easy random character access (because they are made of possibly several codepoints, which, in some encodings, are made of more than one basic "chars"). Glyph, word and sentence segmentation? I guess not.
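The point about one visible character spanning several code points, and one code point spanning several octets, can be seen in a small sketch. Python's standard `unicodedata` module is used here for illustration only; the variable names are hypothetical.

```python
import unicodedata

# "é" written as a base letter plus a combining acute accent (U+0301).
decomposed = "e\u0301"

# NFC normalization folds the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# One visible character, but two code points before normalization.
decomposed_code_points = len(decomposed)
composed_code_points = len(composed)

# In UTF-8 each form takes more than one octet.
decomposed_octets = len(decomposed.encode("utf-8"))
composed_octets = len(composed.encode("utf-8"))

# Indexing by code point returns only the base letter, not the accented one.
first_code_point = decomposed[0]

print(decomposed_code_points, composed_code_points,
      decomposed_octets, composed_octets, first_code_point)
```

A code-point string type makes indexing by code point cheap, but indexing by user-perceived character still needs a segmentation step on top of it.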
[+] nickb|18 years ago|reply
.(; sɹǝpuoʍ sǝop ǝɹnssǝɹd ɔı1qnd ɟo ʇıq ǝ1ʇʇı1 ɐ 'ǝǝs ¡ʍou ǝɯosǝʍɐ sı ɔɹɐ uı ʇɹoddns ǝpoɔıun ¡ɥɐɥ
[+] dcurtis|18 years ago|reply
Yes, because writing upside down is so incredibly useful! How did we ever live without it?

/sarcasm

[+] mdemare|18 years ago|reply
Nâh, dâh zèn we maui klâh mei...

    (define Y
      (λ (m)
        ((λ (f) (m (λ (a) ((f f) a))))
         (λ (f) (m (λ (a) ((f f) a)))))))
[+] Zak|18 years ago|reply
Great... but that's Scheme, not Arc.
[+] Tuna-Fish|18 years ago|reply
oh yes.

Make λ an alias of fn, and have it replace automatically in whatever editor you use?

fn is fast to write, but λ is much more readable, 'cos it stands out.

[+] pchristensen|18 years ago|reply
What language is that? I'm guessing Icelandic; it's a little too unicodey to be Danish or Norwegian, but the words look similar.
[+] pg|18 years ago|reply
Good guess.
[+] jamiequint|18 years ago|reply
农历新年 Happy (Chinese) New Year!
[+] dmoney|18 years ago|reply
I'll never understand why asians type all in question marks. It must be some kind of unary system.
[+] tel|18 years ago|reply
新年快乐!
[+] TMCMan|18 years ago|reply
If you are wondering: On Linux/X11, there's Ctrl+Shift+[Unicode number in hexadecimal], gnome-character-map, umap or KCharMap (ت)

And now for the less serious part:

ሞሡሢ Am I the only one whom these Ethiopic characters remind of Tengwar? BTW, are there Unicode chars for Tengwar? I think there should be! (But not for Klingon, because it sucks.) I have fun writing this on my ⌨, but ℐ∫ ᚾℍℹ⑀ not pointless? Who cares? Anyway, now we can use distinct characters for Roman numerals: Ⅰ,Ⅱ,Ⅲ,Ⅳ,Ⅴ,Ⅵ,Ⅶ,Ⅷ,Ⅹ,Ⅻ,Ⅽ,Ⅿ! Ye darn kids! Everythin we had was 7-bit ASCII, without parity, and we were damn grateful for it? You think you had it bad? I had to use Morse code for browsing porn, back in my days! And I had to etch my public key into the wall of a rotten ol' cave! We did not have this fancy-shmancy routed network, I had to remember the way from here to there all by myself!

--- this post was presented to you by Too Much Coffee.
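The hexadecimal Unicode numbers mentioned at the top of this comment map directly to characters. A minimal sketch in Python, where the helper name `from_hex` is hypothetical:

```python
import unicodedata

def from_hex(hex_digits: str) -> str:
    """Return the character whose code point is given in hexadecimal."""
    return chr(int(hex_digits, 16))

# U+221E and U+2665 are the infinity sign and the heart used earlier in the thread.
infinity = from_hex("221E")
heart = from_hex("2665")

# unicodedata.name gives the official character name for a code point.
infinity_name = unicodedata.name(infinity)
heart_name = unicodedata.name(heart)

# Going the other way: the hexadecimal number to type for a given character.
infinity_hex = format(ord(infinity), "04X")

print(infinity, infinity_name, heart, heart_name, infinity_hex)
```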

[+] r7000|18 years ago|reply
Freude, schöner Götterfunken! Tochter aus Elysium!
[+] mixmax|18 years ago|reply
If you search for "Smjörið er brætt og hveitið smátt og smátt hrært út í það, þangað til það er gengið upp í smjörið." on Google, this thread is the fourth result.

Damn fast...

[+] rams|18 years ago|reply
இது தமிழ
[+] jey|18 years ago|reply
Tamil++; // (இது C)
[+] rams|18 years ago|reply
हिन्दी
[+] polar|18 years ago|reply
മലയാളം
[+] nreece|18 years ago|reply
किसी वस्तु, व्यक्ति, स्थान, या भावना का नाम बताने वाले शब्द को संज्ञा कहते हैं। जैसे - गोविन्द, हिमालय, वाराणसी, त्याग आदि संज्ञा में तीन शब्द-रूप हो सकते हैं -- प्रत्यक्ष रूप, अप्रत्यक्ष रूप और संबोधन रूप ।
[+] kmt|18 years ago|reply
Браво!
[+] ph0rque|18 years ago|reply
В самом деле браво!
[+] kajecounterhack|18 years ago|reply
Happy Chinese New Year. 白人看不懂
[+] tel|18 years ago|reply
这个白人看得懂。
[+] olavk|18 years ago|reply
Røv og nøgler! PG succumbs to the demands of political correctness! Will we soon see mandatory static type declarations and CSS in Arc?
[+] pg|18 years ago|reply
Well, not quite. I gave Patrick an early version of the code, a couple weeks before Arc was released, and he immediately sent me this fix. I just didn't get around to incorporating it till now.

There's a difference between things I don't care about, and things I'm actively against. I don't care about character sets and css, so those things will no doubt gradually get better.

Classic static typing, however, I think is actually a bad idea in a general-purpose language. It makes languages weaker. So it's never likely to happen in Arc itself. However, one of the explicit goals of Arc is to be a good language for writing other languages on top of, and I can imagine plenty of languages for specific types of problems (e.g. circuit design) in which static typing would be a good idea.

[+] mixmax|18 years ago|reply
Ååååh another Dane on the ropes :-)
[+] Create|18 years ago|reply
Öt szép szűzlány őrült írót nyúz. Egy hűtlen vejét fülöncsípő, dühös mexikói úr Wesselényinél mázol Quitóban.
[+] bootload|18 years ago|reply
Η ευχαρίστηση στην εργασία βάζει την τελειότητα στην εργασία ~ kudos to 'Patrick Collison'
[+] eusman|18 years ago|reply
Έλληνας;!