sbi's comments | WingNews

sbi | 11 years ago | on: Proving that Android’s, Java’s and Python’s sorting algorithm is broken

Heapsort has poor cache performance though. Its asymptotic complexity depends on the assumption that memory access is O(1), which is not borne out in practice.

sbi | 11 years ago | on: Extracting the SuperFish certificate

Webster's Dictionary does not discuss the pejorative use of the word "ghetto" and in any case is not an arbiter of racism.

sbi | 11 years ago | on: Extracting the SuperFish certificate

What charming casual racism. "It's pretty ghetto" ... "The ghetto way is just to run this on a machine" ... "The ghetto reversing is to run strings."

sbi | 11 years ago | on: With no trademark, Sriracha name is showing up everywhere

Thanks for explaining that to me. To add to the confusion, tabasco is also a pepper cultivar named after the Mexican state.

sbi | 11 years ago | on: With no trademark, Sriracha name is showing up everywhere

Tabasco is a state in Mexico, though; how were its trademark owners able to obtain the name?

sbi | 11 years ago | on: Sed – An Introduction and Tutorial

There are many different versions of awk: gawk, (BSD) nawk, mawk, etc. I think OS X uses nawk, but mawk is reputedly faster. Gawk is definitely slower than both mawk. I'm not surprised that Perl is faster though.

sbi | 11 years ago | on: Sed – An Introduction and Tutorial

Probably not, but in some cases sed and AWK might be faster. I am a big fan of AWK. It is limited in some ways that make it impractical to use for really serious programs, but it is very expressive. Look at [1, 2].

[1]: http://c2.com/doc/expense/

[2]: http://www.pement.org/awk/awk1line.txt

sbi | 11 years ago | on: LaTeX users are slower, make more mistakes, and are happier than Word users

The data from the study are available on the PLoS website. The authors coded incomplete documents as mistakes---some participants submitted empty LaTeX documents for the table exercise with hundreds of "mistakes." Since the LaTeX users mean percentage completion was lower than that of the Word users, this may explain part of the authors' finding that the LaTeX users were more error prone as well.

sbi | 11 years ago | on: Efficiency Comparison of Document Preparation Systems – LaTeX and Word

Looking over their data, it seems that they coded incomplete text as an error. So, for example, participants 34 and 37 completed none of the table exercise and were coded as having produced 513 errors (both were LaTeX users). This accounts for the vast majority of the variation in the observed errors. I think they've convincingly shown that Word is more productive than LaTeX for some of the study tasks, but not that the resulting output is more correct.

That being said, I don't doubt that LaTeX is harder to use and more error prone than Word. Since I find myself frequently writing mathematics-heavy text, I personally prefer LaTeX ...

sbi | 11 years ago | on: Intrusive lists in Doom 3

Doesn't std::list<T> generate different code for each type T, at least naively? The generated code could end up taking up more space even if the data structures on which it operates look the same in memory.

sbi | 11 years ago | on: Popular Myths about C++, Part 2

This article correctly observes that "many resources are not plain memory" and then goes on to advise users to close() a file in a destructor. However, one of the ways that files are not like memory is that while free() can never fail, fclose() can fail if writing is buffered and the final write fails. This is especially problematic in C++ since throwing from a destructor is dangerous. Whether you rely on RAII or GC to free files, you will not be able to catch this sort of error. It is precisely for this reason that GC is a mechanism for memory management, not resource management in general.

sbi | 12 years ago | on: Project Euler

The Kelly criterion doesn't quite apply in this problem, since you are asked to optimize a more conservative strategy, namely to guarantee (with high probability) a certain amount of winnings rather than maximize log(winnings). In fact the Kelly value for f (in the problem's notation) is 0.25.

sbi | 12 years ago | on: The Mathematical Hacker (2012)

Typically you wouldn't use lgamma to compute exact values of factorials at all. For a real world example of this function in action, look at John D. Cook's blog:

[1]: http://www.johndcook.com/blog/2012/07/14/log-gamma-differenc...

[2]: http://www.johndcook.com/blog/2010/08/16/how-to-compute-log-...

[3]: http://www.johndcook.com/blog/2008/04/24/how-to-calculate-bi...

sbi | 12 years ago | on: Fibonacci Numbers in the Real World

I explained this in a reply below, but this is a linear-algebraic technique. The matrix A = [0 1; 1 1] satisfies the relation A^2 - A - 1 = 0. (In fact, if you think of A as a lag operator, this is the definition of Fibonacci numbers!) You can use this relation to compute products of matrices of the form aA + bI where I is the identity matrix---the comment below has the derivation. This is a bit faster than using matrix multiplication because there are only two coefficients to keep track of.

This might seem like a pointless optimization, but it is actually useful. One advantage is that you don't need to factor the polynomial x^2 - x - 1 or work with square roots. The closed-form for Fibonacci numbers is a bit of a red herring because to work with it exactly, you will end up doing the same kinds of computations anyway. For recurrences of higher degree, factorization might be impossible and this is the only route.

When you're working over a finite field, even if you don't start with a recurrence, there are fast techniques for finding factors of the characteristic polynomial of a sparse matrix (the Berlekamp-Massey algorithm comes to mind).

sbi | 12 years ago | on: Fibonacci Numbers in the Real World

You are correct that this code uses "Russian Peasant Multiplication." The question is: what is being exponentiated? One choice is the matrix A = [0 1; 1 1], whose powers are A^n = [F_{n-1} F_n; F_n F_{n+1] if we compute those powers treating A as a matrix. This involves some redundant computations since F_n is being computed twice.

An alternative method is to use the following identity:

  (aA + b)(cA + d) = acA^2 + (ad + bc)A + bd
                   = (ad + bc + ac)A + (bd + ac)
                   = ((a + b)(c + d) - bd)A + (bd + ac)

(In fact, A^n = F_n A + F_{n-1} I_2). This step requires 3 multiplications and 4 additions and is more-or-less equivalent to the "fast doubling" algorithm in reference [11], which is more optimized. But this derivation is completely generic---it just relies on the characteristic polynomial of A, namely A^2 - A - 1 = 0.

In the case where A is not the Fibonacci matrix but something much larger, working with polynomials modulo the characteristic polynomial of A is much faster than working with A as a matrix.

sbi | 12 years ago | on: Fibonacci Numbers in the Real World

Thank you for the reference. The algorithm in the parent comment is not actually matrix exponentiation but the "fast doubling algorithm" in Minase's terminology, where you exponentiate the matrix algebraically using the Cayley-Hamilton theorem rather than using matrix multiplication.

sbi | 12 years ago | on: Fibonacci Numbers in the Real World

Which reference?

sbi | 12 years ago | on: Fibonacci Numbers in the Real World

The "right" way to generate Fibonacci numbers is not via the closed form, but by exponentiation of the matrix A = [0 1; 1 1] that defines Fibonacci numbers as an element of its endomorphism ring Z[x]/(x^2 - x - 1). This doesn't rely on explicitly computing the eigenvalues of A and scales well to more complex recursions. Sample Haskell:

  fib :: Integer -> Integer
  fib n = fst . loop (0, 1) base . abs $ n
    where pmul (a, b) (c, d) = ((a + b)*(c + d) - b*d, b*d + a*c)
          base = if n >= 0 then (1, 0) else (1, -1)
          loop acc sq m
            | m == 0         = acc
            | m `mod` 2 == 0 = loop acc (pmul sq sq) (m `div` 2)
            | otherwise      = loop (pmul sq acc) (pmul sq sq) (m `div` 2)

sbi | 12 years ago | on: Refactoring with go fmt

There are tools for standardizing C formatting. Another poster mentioned indent. Clang-format [1] is very good, works with C++ as well as C, and has great editor integration. You can choose among predefined style (e.g. Google, Mozilla, LLVM style guides) or create your own and then and enforce it within a source tree very easily.

[1]: http://clang.llvm.org/docs/ClangFormat.html

sbi | 12 years ago | on: Firefox 23 is available

Thanks; I had searched on pgp.mit.edu for the primary key but forgot to add the hex prefix 0x.

Curiously, the 1024 bit DSA key used for some previous releases (0x7f4d66451ebcab3a) has been signed by "Someone at Mozilla Should Sign the Release Key (so users can verify the key's owner!) <[email protected]>".