top | item 9382928

The Deceptive Anagram Question

26 points | gamesbrainiac | 11 years ago | nafiulis.me

32 comments

[+] OhHeyItsE|11 years ago|reply
Author sounds pretty darn smart. But, hey, they couldn't come up with the correct solution in the allotted time, under pressure, with a marker in hand - must not be good enough. Hope the candidate they took on meets all of their expectations.
[+] Grue3|11 years ago|reply
He needed help from a friend to get this right. I'd argue that at the time of the interview he wasn't ready. I'd also argue that the basic linear solution for this problem is straightforward, doesn't require any clever tricks, and a qualified candidate would have no problem solving it.
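For reference, the straightforward linear pass being described might look like this sketch (names are mine, not the article's): bucket each word under its sorted letters, then keep only the words whose bucket has more than one member.

```python
from collections import defaultdict

def anagram_groups(words):
    # One pass to bucket each word under its sorted-letter key.
    groups = defaultdict(list)
    for word in words:
        groups[''.join(sorted(word))].append(word)
    # Keep only words whose bucket holds at least one anagram partner,
    # preserving the original input order.
    keep = {key for key, bucket in groups.items() if len(bucket) > 1}
    return [w for w in words if ''.join(sorted(w)) in keep]
```

On the thread's example list, `anagram_groups(["pool", "loco", "cool", "stain", "satin", "pretty", "nice", "loop"])` returns the six anagrams in their original order.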
[+] gamesbrainiac|11 years ago|reply
I assure you that the author is indeed smart (at least he thinks he is), but he is unskilled in the ways of complexity analysis. This was his way of making himself better: seeing what other people made of his attempts. Regrettably, it seems he still has a lot more to learn.
[+] rasz_pl|11 years ago|reply
>A Vector Approach to Hashing

is something the author didn't bother to test at all. His word list includes words with apostrophes (hex 0x27), like "basic's"; his code only accounts for ASCII letters, so it crashes on that list :)

> its O(26) for list creation

hmm, no? Unless Python people create lists one element at a time... tested, and yes they do :o :D Every additional element costs ~10ms on my test computer (110K-element loop), hilarious :))) It's cheaper to create two lists outside the main loop and simply copy the empty list over the work list, instead of creating a new list every time (or resetting a global list manually) - we're talking ~10% of the whole algorithm's time. Python is one hilarious way of benchmarking algorithms: a simple x = 1 in your main loop will cost tens of milliseconds.
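The two-lists trick being suggested might look something like this sketch (names are mine, not from the post): keep one zeroed template outside the loop and copy it, rather than allocating a fresh 26-slot list for every word.

```python
ZERO = [0] * 26  # allocated once, outside the main loop

def signature(word):
    counts = ZERO[:]  # cheap shallow copy of the zeroed template
    for ch in word:
        counts[ord(ch) - ord('a')] += 1
    return tuple(counts)  # hashable, so it can serve as a dict key
```

Two words are anagrams exactly when their signatures are equal.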

>and its O(26) for creating the tuple

Ehhh :/ Yes, sorting is faster in Python, ONLY because the sort function is highly optimized low-level code, while anything you write yourself will be extremely slow due to the interpreted nature of the language. You can easily write an O(n) hash routine in a low-level language (assembler, C) and beat an O(n log n) sort.

>But, I think this time, we can actually use collections.Counter correctly

Correctly? Yes. Faster? NO - ~10% slower. Again, it seems the author simply assumed something was going to work and didn't test.

All in all, the main lesson from this should be: don't use Python to gauge algorithm speed, unless you're being hired to write speed-critical Python code (ahahahaha).

[+] gamesbrainiac|11 years ago|reply
> is something the author didn't bother to test at all. His word list includes words with apostrophes (hex 0x27), like "basic's"; his code only accounts for ASCII letters, so it crashes on that list :)

Oh, I know it crashes. I had to remove a total of ~20 words with apostrophes from the original word list. I just added this since it was an interesting take on the problem that I hadn't thought of. Instead of assuming you have only alphabetical letters, you can assume you'll be getting any ASCII character, and hence make an array of 256 (one slot per byte value) rather than 26. Which only adds to the problem.

> Ehhh :/ Yes, sorting is faster in Python, ONLY because the sort function is highly optimized low-level code, while anything you write yourself will be extremely slow due to the interpreted nature of the language. You can easily write an O(n) hash routine in a low-level language (assembler, C) and beat an O(n log n) sort.

No argument there. In fact, I tried alternatives in Python that were algorithmically (is that a correct word?) superior, but sucked when put to the test. I even tried array.array, but unfortunately that takes a list as an initializer. So if you wanted to initialize a fixed-size array, you would actually be creating the list and then creating the array, which makes the process even slower.
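One possible workaround, not mentioned in the post: `bytearray` accepts an integer size directly and starts zero-filled, so no intermediate list is needed (each slot caps at 255, which is plenty for letter counts in dictionary words).

```python
def letter_counts(word):
    counts = bytearray(26)  # fixed-size, zero-filled, no list initializer
    for ch in word:
        counts[ord(ch) - ord('a')] += 1
    return bytes(counts)  # immutable, hashable key
```

Words with equal `letter_counts` keys are anagrams.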

> Correctly? Yes. Faster? NO - ~10% slower. Again, it seems the author simply assumed something was going to work and didn't test.

You insist on saying that I didn't test it (your other comments seem to reflect a similar disdain for other people's intelligence as well). It is indeed slower, but the reason I added it was to showcase a bit of stdlib magic. Initially I was using Counter the wrong way, and now there's a place for it (the whole story ties together at the end). But on the bright side, you tried out the code yourself, and hopefully learned something new (I know that's hard, considering how impeccably smart you most certainly are). To be honest, I didn't really think people would read that far.

[+] rikkus|11 years ago|reply
I stopped when I scrolled to the point where he mentioned having trouble, and thought I'd try it out myself. Here's my initial solution:

  new [] {"pool", "loco", "cool", "stain", "satin", "pretty", "nice", "loop"}
  .Aggregate (
    new { Index = 0, Found = new Dictionary<string, List<WordAndIndex>>() },
    (acc, word) =>
      {
        acc.Found
          .FindOrCreate(word.Sort())
          .Add(new WordAndIndex(word, acc.Index));

        return new { Index = acc.Index + 1, Found = acc.Found };
      }
  )
  .Found
  .Where(f => f.Value.Count() > 1)
  .SelectMany(f => f.Value)
  .OrderBy(wordAndIndex => wordAndIndex.Index)
  .Select(wordAndIndex => wordAndIndex.Text)
(some helper methods omitted for brevity)

I was fairly happy with this, but curious to see what he'd come up with in the end. I like it - especially the way it uses indices and zip. Here's my translation to C# (no helpers added here!):

  IEnumerable<string> Anagrams(IEnumerable<string> words)
  {
    Func<string, string> sortLetters =
      s => new string(s.ToCharArray().OrderBy(c => c).ToArray());

    // Materialise once so the letter-sorting isn't redone on every pass.
    var sortedWords = words.Select(sortLetters).ToList();

    var sortedWordsLookup = sortedWords.ToLookup(s => s);

    return words.Zip(
      sortedWords,
      (word, normal) => new { word, normal }
    )
    .Where(
      wordAndSortedWord =>
        sortedWordsLookup[wordAndSortedWord.normal].Count() > 1
    )
    .Select(wordAndSortedWord => wordAndSortedWord.word);
  }
[+] shred45|11 years ago|reply
Can someone explain the comment about how quadratic time complexity does not scale?
[+] crimsonalucard|11 years ago|reply
Good question. I just want to mention that this is a fact that a student with a degree in computer science should absolutely know.
[+] gnur|11 years ago|reply
Imagine a list that takes one second to process with a quadratic algorithm. A list twice as long will take 2^2 = 4 seconds; a list 10 times as long will take 10^2 = 100 seconds; a list 1000 times as long will take 1,000,000 seconds (over 11 days). The running time grows with the square of the input size, so large inputs quickly become impractical.
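In other words, a quadratic algorithm does on the order of n squared basic steps, so scaling the input by k scales the work by k squared:

```python
def quadratic_steps(n):
    # A quadratic algorithm performs roughly n * n basic steps.
    return n * n

assert quadratic_steps(20) // quadratic_steps(10) == 4     # 2x input  -> 4x work
assert quadratic_steps(100) // quadratic_steps(10) == 100  # 10x input -> 100x work
```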
[+] nine_k|11 years ago|reply
The key take-away: keep thinking about 'borked' interview questions. This can make you a better programmer.
[+] gamesbrainiac|11 years ago|reply
Agreed. I feel the key takeaway from any failure is to learn from it; that way the mistakes we make become earnest ones.
[+] maxerickson|11 years ago|reply
I think it is better to think of sorting the words as standardization (or, when looking it up, canonicalization). A typical goal when hashing is to avoid collisions; this algorithm is seeking out a certain type of collision.
[+] gamesbrainiac|11 years ago|reply
Thank you so much for pointing this out. When Alexander used the term "normalization" instead of "hashing", I felt that his term was more appropriate. If you know of any place where I can learn more about these other kinds of normalization/standardization/hashing, please feel free to provide links. I would be truly grateful for the opportunity to learn more.
[+] father_of_two|11 years ago|reply
Here's a variant of the last solution the author presented, using a Counter of each word, instead of a sorted list of the word's chars, as the normalization.

  cnt = Counter(frozenset(Counter(w).iteritems()) for w in words)
  print [w for w in words if cnt[frozenset(Counter(w).iteritems())] > 1]
This is just code golfing, I don't even think it's clear. The last solution of the author is nice, though.
[+] gamesbrainiac|11 years ago|reply
This is actually quite nice. A little hard to see, but I see what you did there. You'd get an earful from the algorithm-nazis though ;)
[+] chatman|11 years ago|reply
"This is wrong on so many levels, I don't even know where to begin."

^ This was remarked about a correct, but slow algorithm.

I couldn't figure out the deception part either.

[+] jack9|11 years ago|reply
This problem is used in most wordster/boggle/Wordy Birdies (from the makers of Angry Birds) style games.

"In the same order" is an unnecessary additional complexity. Asking for an O(1) solution demonstrates the exact same thought quality, so it's a bad (overly complicated and longer to answer) interview question for the purpose intended.

[+] nartz|11 years ago|reply
you only need to go through the array one time - here you are doing it multiple times...

also, you can define an order-independent hashing function that, say, assigns a prime number to each letter and multiplies them together (sums of primes can collide, but products are unique by prime factorization)...

instead...

  PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
            43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

  def compute_hash(word):
      # Order-independent: the product of primes is the same for any
      # permutation of the same letters.
      h = 1
      for c in word:
          h *= PRIMES[ord(c) - ord('a')]
      return h

  def find_anagrams(words):
      hashed_words = {}
      anagrams = []
      for word in words:
          h = compute_hash(word)
          w = hashed_words.get(h)
          if w is True:
              # we've already encountered another anagram, just add it
              anagrams.append(word)
          elif w:
              # both w and word are anagrams, so append both to the list
              # and set the value to True to short-circuit this in the future
              anagrams.append(w)
              anagrams.append(word)
              hashed_words[h] = True
          else:
              # first time we're seeing this hash, so just record the word
              hashed_words[h] = word
      return anagrams
[+] vilhelm_s|11 years ago|reply
I think this would output ["loco", "cool", "stain", "satin", "pool", "loop"] but the question asks for ["pool", "loco", "cool", "stain", "satin", "loop"]?
[+] meggar|11 years ago|reply
Isn't the O(n) version actually O(N log N) because of the sort?
[+] nilkn|11 years ago|reply
The individual words are being sorted rather than the list of words, and N here would correspond to the number of words. The sorting of each word would traditionally be O(M log M) where M is the number of characters in the word.

Note that under certain assumptions it's probably possible to sort a word faster than that because we know extra information about the elements being sorted (that is, the letters of the alphabet form a totally ordered set of small cardinality). Specifically, there exist algorithms like counting sort which exploit this extra information.
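A counting sort over a word's letters might look like this sketch: O(M + 26) per word rather than O(M log M), exploiting the small, totally ordered alphabet.

```python
def counting_sort_letters(word):
    # Bucket the letters (26 possible values), then emit them in order.
    counts = [0] * 26
    for ch in word:
        counts[ord(ch) - ord('a')] += 1
    return ''.join(chr(ord('a') + i) * n for i, n in enumerate(counts))
```

It produces the same normalized key as `''.join(sorted(word))`, just without the comparison sort.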

[+] cousin_it|11 years ago|reply
I think it only sorts the characters in each word, not the list of all words.