Most common phone interview question at Google

asdginioubnou|5 years ago

Both solutions are O(1). The alphabet is finite. Let's say there are M different characters in the alphabet and N different characters in the string. If there is a duplicate, it is guaranteed to occur within the first M + 1 characters of the string by the pigeonhole principle. As N grows without bound, the number of characters that needs to be checked reaches a limit.

(I guess, most precisely, we should define K to be min(M + 1, N). Then the first solution is O(K^2) and the second solution is O(K))

EDIT: I assumed the first solution would compare each character to the characters visited so far, i.e.

    for i in [0..N]
        for j in [0..i]
           if arr[i] == arr[j] return arr[i]

Maybe it's

    for i in [0..N]
        for j in [0..N]
           if arr[i] == arr[j] && i != j
               return arr[i]

That would make it O(KN). Or O(N) if we treat the size of the alphabet as fixed.

asdginioubnou|5 years ago

>Let's say there are M different characters in the alphabet and N different characters in the string.

I mean "N characters in the string", i.e. the string is length N. There won't be N different characters.

v_g|5 years ago

Interesting point. However, Alphabet is probably not the best example, Unicode perhaps? The upper bound in Unicode is ~ 1.1 Million, but still fixed

chrisseaton|5 years ago

> This means is a magnitude faster than the previous one!

Is this right? It's a quadratic not an order of magnitude difference isn't it? And can you be an order of magnitude faster in an O notation expression anyway? Isn't an order of magnitude a coefficient? Don't we drop coefficients in O notation?

asdginioubnou|5 years ago

It's definitely wrong. A lot of people use "order of magnitude" to just mean "a lot".

I always use the precise meaning. It might be better for me to say "factor of ten" rather than "order of magnitude" if other people don't always use the precise meaning.

fancyfredbot|5 years ago

It's not wrong! For some value of N it is an order of magnitude faster. It's just a bit misleading.

stellalo|5 years ago

> Is this right?

It's not right, indeed

ayberk|5 years ago

I used to use this question before it became too common. Any reasonably-well-prepared candidate should be able come up with the O(N) solution and the follow-up is always good for a discussion: "what if the string is huge but the alphabet is small? Eg, a DNA sequence."

There's a single pass solution that's left as an exercise for the reader.

chrisseaton|5 years ago

> Any reasonably-well-prepared candidate

Isn't that the point? You say it like it's a problem, but it screens for someone who's reasonably prepared, which is probably what Google wants to do because they really start looking for the super-world-tier people they employ, because I guess that takes more time and resources.

> There's a single pass solution that's left as an exercise for the reader.

Doesn't the blog post already include a single-pass solution that works on a small alphabet?

aeyes|5 years ago

Going to ask the obvious question: Did 70% really want to solve this with a double for loop?

I don't see what changes with a small alphabet. Or are you searching for multiple ocurrences of sequences instead of single characters?

swang|5 years ago

i will still have to iterate through the string and the hash/dict for a small alphabet is not taking up a lot of space. by the wording it sounds like maybe you want people to use bitflags but your problem still seems solvable with the blog's solution (hash/dicts)

kinkrtyavimoodh|5 years ago

I love that this question is very basic, tests only entry-level data structures, and can at least be talked and reasoned about even if you don't know the exact API of a hashmap, and YET I am sure people will find a reason to complain about this claiming they never actually need to find a recurring character in their actual job role.

jrootabega|5 years ago

Congrats, it's now off the list of phone interview questions at Google.

thiht|5 years ago

> Most peoples immediate thought would be a double for loop

Really? I've a hard time believing this

jimbob45|5 years ago

Usually, part of the question is to ask about legal characters. If they tell you it's letters only, then you can get away with a pre-allocated array (and give a concrete big O answer).

TheOtherHobbes|5 years ago

It's not an order of magnitude faster. The details depend on the hash table implementation and the language and the processor specifics and the statistical distribution of string sizes and whether you can make assumptions about the number of symbols being used and whether a tight double loop for a small symbol set and string size would fit in the cache and whether it would stay there long enough to make a useful difference.

Naive Big O is just as likely to be wrong as naive anything else.

I hope this isn't a genuine phone screen question, because if someone punted this at me as a pass/fail I'd fail them for being confused about how real computers and real languages work behind those nice clean-looking "if thing is in other_thing..." statements.

asdginioubnou|5 years ago

The second solution is safer than the first. While it will sometimes be slower, it will never be catastrophically bad. It may have a small, predictable overhead, but it will never unexpectedly bring down the whole program when you get a few unusually long strings.

When searching through a sorted list, your first instinct should be to use binary search. Your first instinct should not be "well maybe the list is always small, so linear search would be faster." Write the safe, predictable thing and save the optimizations for later.

23 comments