btn | 4 years ago | on: Reverse-Engineering Apple Dictionary (2020)
btn's comments
btn | 7 years ago | on: Loss aversion is not supported by the evidence
Which itself is a part of a series of articles in JCP debating the issue: https://onlinelibrary.wiley.com/doi/abs/10.1002/jcpy.1054
The definitive statement made by this article's headline isn't really supported by the evidence presented in the papers. Rather, the state of affairs seems to be that "loss aversion" has been the victim of incessant overgeneralisation. It's a very simple hypothesis about human behaviour that plays nicely into a lot of interesting (and therefore publishable) narratives. This has led people to blindly accept the general hypothesis of loss aversion without enough critical investigation of how it manifests. The authors don't really refute "loss aversion" (i.e. they don't present an alternative theory to explain the papers that purport to demonstrate "loss aversion"), but rather they refute the pop-psychology belief that it's a general principle of human behaviour.
btn | 9 years ago | on: How Not to Explain Success
Online surveys are certainly becoming more popular as they are significantly cheaper to conduct than the alternatives, and yield publishable results that garner media attention. There are peer reviewers that will be sympathetic to these issues, regardless of the method's robustness.
However, there are others that would say this reeks of dredging (p-hacking) in a very murky pool of data. Their "scepticism" rarely makes the New York Times (or a bestselling book), though.
btn | 10 years ago | on: Facebook and How UIs Twist Your Words
btn | 10 years ago | on: The Kolmogorov-Smirnov Test
1. Failing to reject the null hypothesis is not the same as accepting the null hypothesis. That is, concluding "these data are from some distribution X" is spurious.
2. There's a 'sweet-spot' for the amount of data. If you have too few samples, it's very easy to fail to reject; and if you have too many, it's very easy to reject (the chart at the bottom of the "Two Sample Test" section illustrates this).
3. The question "are these data from some distribution X?" is usually too strong. It's usually more informative to ask "can these data be modelled with some distribution X?"
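To make point 2 concrete, here is a rough sketch in plain Python (standard library only). It tests idealised samples from N(0.1, 1) against a null of N(0, 1). The helper names (`ks_statistic`, `critical_value`, `rejects`, `ideal_samples`) are made up for illustration, the samples are evenly spaced quantiles rather than random draws so the result is repeatable, and the critical value uses the large-sample approximation 1.358/sqrt(n) for a 5% level, which is only approximate for small n.

```python
import math
from statistics import NormalDist

def ks_statistic(samples, cdf):
    """One-sample KS statistic: largest gap between the empirical CDF and cdf."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def critical_value(n, c_alpha=1.358):
    """Large-sample approximation of the 5% critical value (an assumption)."""
    return c_alpha / math.sqrt(n)

def rejects(samples, cdf):
    return ks_statistic(samples, cdf) > critical_value(len(samples))

def ideal_samples(n, mu):
    """Hypothetical helper: n evenly spaced quantiles of N(mu, 1)."""
    dist = NormalDist(mu, 1.0)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]

null_cdf = NormalDist(0.0, 1.0).cdf
for n in (20, 20000):
    print(n, rejects(ideal_samples(n, 0.1), null_cdf))
```

With 20 points the small shift in the mean goes undetected, while with 20,000 points the same shift is rejected, even though N(0, 1) might still be a perfectly serviceable model for the data (point 3). The non-rejection at n = 20 likewise says nothing in favour of the null (point 1).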
btn | 11 years ago | on: Building a Forex Trading Platform Using Kafka, Storm and Cassandra
btn | 11 years ago | on: Swift Has Reached 1.0
btn | 11 years ago | on: Automatic Build Numbers in Xcode
https://developer.apple.com/library/mac/documentation/Darwin...
https://developer.apple.com/library/ios/qa/qa1827/_index.htm...
btn | 11 years ago | on: Compressing Scrabble Dictionaries
The downside is that traversing the tree is a series of linear bit-counting operations, which can be painfully slow without a bit of pre-caching.
[1]: http://www.cs.cmu.edu/afs/cs.cmu.edu/project/aladdin/wwwloca...
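For anyone wondering what the "pre-caching" looks like, here is a minimal sketch, assuming the tree is stored as a bit vector and navigated with rank queries (count the 1 bits before a position), as in a typical succinct tree encoding. The class name and block size are made up for illustration and are not taken from the linked material.

```python
class RankBitVector:
    """Bit vector with cached per-block popcounts to speed up rank queries."""

    def __init__(self, bits, block_size=64):
        self.bits = list(bits)
        self.block_size = block_size
        # cumulative[k] = number of 1 bits in bits[0 : k * block_size]
        self.cumulative = [0]
        for start in range(0, len(self.bits), block_size):
            block = self.bits[start:start + block_size]
            self.cumulative.append(self.cumulative[-1] + sum(block))

    def rank1_naive(self, i):
        """Count 1 bits in bits[0:i] by scanning from the start (linear in i)."""
        return sum(self.bits[:i])

    def rank1(self, i):
        """Same result, but scans at most one block after a cache lookup."""
        block = i // self.block_size
        start = block * self.block_size
        return self.cumulative[block] + sum(self.bits[start:i])


bv = RankBitVector([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], block_size=4)
print(bv.rank1(5), bv.rank1_naive(5))
```

The cache costs one integer per block, so the block size trades memory against the length of the remaining scan.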
btn | 12 years ago | on: Improving GitHub for science
btn | 12 years ago | on: Improving GitHub for science
In comparison with BitBucket (not to advocate, but they offer a comparable service): the restrictions they waive for academic accounts are done so permanently.
btn | 12 years ago | on: JavaScript has a Unicode problem (2013)
I'm not, but I think it's the only sane thing for a text editor to do if you don't want it to incorporate a ton of language-specific rules. The UAX actually does make a distinction between "legacy" and "extended" grapheme clusters: if you're handling "delete", you'll want to use "legacy clusters" to separate the two Tamil marks; but for text selection, using "extended clusters" will combine them (it's a little bit more complicated than that, but there are properties of Unicode that allow you to handle the "preferred" method for editing a script, while remaining mostly language-agnostic).
Hangul is trickier, but input happens through an IME that "composes" the characters before they are committed to the editor. The IME will perform component-wise deletion, but once it's committed, the editor will operate on the grapheme. It's not a perfect solution, but keeping the composition/decomposition rules for the language in the IME seems preferable.
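A very rough illustration of the legacy/extended distinction, using only Python's `unicodedata`. This is not the UAX#29 algorithm: it approximates the Grapheme_Extend property with general categories Mn/Me and SpacingMark with Mc, and ignores everything else (Hangul, emoji, CR/LF, and so on). The function name `clusters` is made up for this sketch.

```python
import unicodedata

def clusters(text, extended=True):
    """Group each base character with the marks that follow it.

    Approximation only: Mn/Me stand in for Grapheme_Extend and Mc for
    SpacingMark; the real UAX#29 rules cover many more cases.
    """
    attach = {"Mn", "Me"}
    if extended:
        attach.add("Mc")  # extended clusters also absorb spacing marks
    out = []
    for ch in text:
        if out and unicodedata.category(ch) in attach:
            out[-1] += ch
        else:
            out.append(ch)
    return out

tamil_ni = "\u0BA8\u0BBF"  # TAMIL LETTER NA + TAMIL VOWEL SIGN I (a spacing mark)
print(len(clusters(tamil_ni, extended=True)))   # one unit for selection
print(len(clusters(tamil_ni, extended=False)))  # two units for delete
```

So an editor could select by the extended grouping and delete by the legacy one without knowing anything about Tamil specifically.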
btn | 12 years ago | on: JavaScript has a Unicode problem (2013)
in the unlikely case I had to support Tamil or Korean for such a specialistic case.
Why is it "unlikely" that you would want your software to support users of other languages?
btn | 12 years ago | on: JavaScript has a Unicode problem (2013)
Luckily, the Unicode Technical Committee has figured this out for you, and UAX#29 provides an algorithm for determining grapheme cluster boundaries [1]. Yes, it's long and technical, it has many cases (and exceptions) to handle, and it can't be expressed compactly in two lines of JavaScript; but it will give you a well-defined and understood answer for all scripts in Unicode.
[1] http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Bounda...
btn | 12 years ago | on: How I “hacked” Kayak and booked a cheaper flight
Matrix lets you specify the "sales city" (the last field in the advanced search options), which allows you to check out price discrimination by location.
btn | 12 years ago | on: The Other 'F Word': Brewer Responds To Starbucks Over Beer Name
btn | 12 years ago | on: Show HN: Signal – Edit emails in your Gmail inbox
btn | 12 years ago | on: Microsoft's First Chip Brings Tank-Finding Design to Xbox
btn | 12 years ago | on: Better line numbers for Vim
btn | 12 years ago | on: 6 tips to make the best iPhone app icons. 70% of users hate the new iOS icons
I wound up doing this a while ago for a similar toy project. After some poking around, it turned out that dictionary bundles are entirely supported by system APIs in CoreServices! The APIs are private, but Apple accidentally shipped a header file with documentation for them in the 10.7 SDK [1]. You can load a dictionary with `IDXCreateIndexObject()`, read through its indices with the search methods (and the convenient `kIDXSearchAllMatch`), and get pointers to its entry data with `IDXGetFieldDataPtrs()`.
It takes a bit of fiddling to figure out the structure (there are multiple indices for headwords, search keywords, cross-references, etc., and the API is a general-purpose trie library) and request the right fields, but those property lists in the bundle are there to help! (As the author of this article discovered, the entries are compressed and are preceded by a 4-byte length marker.)
[1] https://github.com/phracker/MacOSX-SDKs/blob/master/MacOSX10...
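If you'd rather skip the private API, the "length marker, then compressed entry" layout can be walked by hand. The sketch below runs on synthetic data only and rests on assumptions not stated above: that the compression is zlib, that the 4-byte marker is a little-endian unsigned integer, and that it counts the compressed bytes that follow. `pack_entries` and `read_entries` are hypothetical helpers, not part of any Apple API.

```python
import struct
import zlib

def pack_entries(texts):
    """Build a synthetic blob of [4-byte length][zlib data] records (assumed layout)."""
    blob = bytearray()
    for text in texts:
        data = zlib.compress(text.encode("utf-8"))
        blob += struct.pack("<I", len(data)) + data
    return bytes(blob)

def read_entries(blob):
    """Walk the blob record by record and decompress each entry."""
    entries = []
    offset = 0
    while offset + 4 <= len(blob):
        (length,) = struct.unpack_from("<I", blob, offset)  # assumed little-endian
        offset += 4
        chunk = blob[offset:offset + length]
        if len(chunk) < length:
            raise ValueError("truncated entry")
        entries.append(zlib.decompress(chunk).decode("utf-8"))
        offset += length
    return entries

blob = pack_entries(["<d:entry>apple</d:entry>", "<d:entry>pear</d:entry>"])
print(read_entries(blob))
```

A real dictionary body will likely need adjustments to the header handling and offsets; this only shows the shape of the loop.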