Both solutions are O(1). The alphabet is finite. Let's say there are M different characters in the alphabet and N different characters in the string. If there is a duplicate, it is guaranteed to occur within the first M + 1 characters of the string by the pigeonhole principle. As N grows without bound, the number of characters that needs to be checked reaches a limit.
(I guess, most precisely, we should define K to be min(M + 1, N). Then the first solution is O(K^2) and the second solution is O(K))
EDIT: I assumed the first solution would compare each character to the characters visited so far, i.e.
for i in [0..N]
for j in [0..i]
if arr[i] == arr[j] return arr[i]
Maybe it's
for i in [0..N]
for j in [0..N]
if arr[i] == arr[j] && i != j
return arr[i]
That would make it O(KN). Or O(N) if we treat the size of the alphabet as fixed.
> This means is a magnitude faster than the previous one!
Is this right? It's a quadratic not an order of magnitude difference isn't it? And can you be an order of magnitude faster in an O notation expression anyway? Isn't an order of magnitude a coefficient? Don't we drop coefficients in O notation?
It's definitely wrong. A lot of people use "order of magnitude" to just mean "a lot".
I always use the precise meaning. It might be better for me to say "factor of ten" rather than "order of magnitude" if other people don't always use the precise meaning.
I used to use this question before it became too common. Any reasonably-well-prepared candidate should be able come up with the O(N) solution and the follow-up is always good for a discussion: "what if the string is huge but the alphabet is small? Eg, a DNA sequence."
There's a single pass solution that's left as an exercise for the reader.
Isn't that the point? You say it like it's a problem, but it screens for someone who's reasonably prepared, which is probably what Google wants to do because they really start looking for the super-world-tier people they employ, because I guess that takes more time and resources.
> There's a single pass solution that's left as an exercise for the reader.
Doesn't the blog post already include a single-pass solution that works on a small alphabet?
i will still have to iterate through the string and the hash/dict for a small alphabet is not taking up a lot of space. by the wording it sounds like maybe you want people to use bitflags but your problem still seems solvable with the blog's solution (hash/dicts)
I love that this question is very basic, tests only entry-level data structures, and can at least be talked and reasoned about even if you don't know the exact API of a hashmap, and YET I am sure people will find a reason to complain about this claiming they never actually need to find a recurring character in their actual job role.
Usually, part of the question is to ask about legal characters. If they tell you it's letters only, then you can get away with a pre-allocated array (and give a concrete big O answer).
It's not an order of magnitude faster. The details depend on the hash table implementation and the language and the processor specifics and the statistical distribution of string sizes and whether you can make assumptions about the number of symbols being used and whether a tight double loop for a small symbol set and string size would fit in the cache and whether it would stay there long enough to make a useful difference.
Naive Big O is just as likely to be wrong as naive anything else.
I hope this isn't a genuine phone screen question, because if someone punted this at me as a pass/fail I'd fail them for being confused about how real computers and real languages work behind those nice clean-looking "if thing is in other_thing..." statements.
The second solution is safer than the first. While it will sometimes be slower, it will never be catastrophically bad. It may have a small, predictable overhead, but it will never unexpectedly bring down the whole program when you get a few unusually long strings.
When searching through a sorted list, your first instinct should be to use binary search. Your first instinct should not be "well maybe the list is always small, so linear search would be faster." Write the safe, predictable thing and save the optimizations for later.
asdginioubnou|5 years ago
(I guess, most precisely, we should define K to be min(M + 1, N). Then the first solution is O(K^2) and the second solution is O(K))
EDIT: I assumed the first solution would compare each character to the characters visited so far, i.e.
Maybe it's That would make it O(KN). Or O(N) if we treat the size of the alphabet as fixed.asdginioubnou|5 years ago
I mean "N characters in the string", i.e. the string is length N. There won't be N different characters.
v_g|5 years ago
chrisseaton|5 years ago
Is this right? It's a quadratic not an order of magnitude difference isn't it? And can you be an order of magnitude faster in an O notation expression anyway? Isn't an order of magnitude a coefficient? Don't we drop coefficients in O notation?
asdginioubnou|5 years ago
I always use the precise meaning. It might be better for me to say "factor of ten" rather than "order of magnitude" if other people don't always use the precise meaning.
fancyfredbot|5 years ago
stellalo|5 years ago
It's not right, indeed
ayberk|5 years ago
There's a single pass solution that's left as an exercise for the reader.
chrisseaton|5 years ago
Isn't that the point? You say it like it's a problem, but it screens for someone who's reasonably prepared, which is probably what Google wants to do because they really start looking for the super-world-tier people they employ, because I guess that takes more time and resources.
> There's a single pass solution that's left as an exercise for the reader.
Doesn't the blog post already include a single-pass solution that works on a small alphabet?
aeyes|5 years ago
I don't see what changes with a small alphabet. Or are you searching for multiple ocurrences of sequences instead of single characters?
swang|5 years ago
kinkrtyavimoodh|5 years ago
jrootabega|5 years ago
thiht|5 years ago
Really? I've a hard time believing this
jimbob45|5 years ago
TheOtherHobbes|5 years ago
Naive Big O is just as likely to be wrong as naive anything else.
I hope this isn't a genuine phone screen question, because if someone punted this at me as a pass/fail I'd fail them for being confused about how real computers and real languages work behind those nice clean-looking "if thing is in other_thing..." statements.
asdginioubnou|5 years ago
When searching through a sorted list, your first instinct should be to use binary search. Your first instinct should not be "well maybe the list is always small, so linear search would be faster." Write the safe, predictable thing and save the optimizations for later.