top | item 40988729

(no title)

a20eac1d | 1 year ago

Could you give me a concrete example of what that looks like?

discuss

order

teaearlgraycold|1 year ago

Sure.

Here's a log file of page accesses on our server. It's a CSV. The first column is the user, the second column is the page, and the third column is the load time for that page in milliseconds. We want to know what is the most common three page path access pattern on our site. By that I mean, if the user goes to pages A -> B -> C -> A -> B -> C the most common three page path for that user is "A -> B -> C".

    user, page, load time
    A, B, 500
    A, C, 100
    A, D, 50
    B, C, 100
    A, E, 200
    B, A, 450

    etc.
So for this first question you should give an answer in the form of "A -> B -> C with a count of N".

We would have two files, one simple one that is possible to read through and calculate by hand, and one too long for that. The longer file has a "gotchya" where there's actually two paths that are tied for the highest frequency. I'd point out that they'd given an incomplete answer if they don't give all paths with the highest frequency.

The second part would be to calculate the slowest three page path using the load times.

In my opinion it's a pretty good way to filter out people that can't code at all. It's more or less a fancy fizzbuzz.

mym1990|1 year ago

Is there a point in the log where there is a time cutoff for a viewer of a page? By that I mean: in your sample user A goes B > C > D, then there is a view by a different user, and then we are back to user A. What if the time difference between user A going to page E is like 10 minutes...is that a new pattern?

I feel like this is a fun thought experiment, but instead of thinking about "gotchas" I would be more open to having a discussion about edge cases, etc... The connotation of gotchas just seems to be like a trap where if you hit one, you've failed the interview.

zelphirkalt|1 year ago

On a quick glance I don't understand your example. Are you sure there is no mistake in it? I would ask which user has shown ABC page path, because I don't see any. Perhaps you made up the lines on the fly while writing it here, and the actual example is clearer? Already a bit dumbfounded by this. Such things can easily throw people off for the rest of the interview. Keep in mind the stress situation you put people into. Examples need to be 100% clear.

commandlinefan|1 year ago

Ok, I'll bite... without having googled it, is there some trick to solving this besides enumerating every three-page path and sorting them? This reads like some one-off variant of the traveling salesman problem.

mulmen|1 year ago

Are these records assumed to be in order?

lucb1e|1 year ago

Okay I tried it. I got interrupted twice for like ~12 minutes total, making the time I spent coding *checks terminal history* also 12 minutes. I made the assumption (would have asked if live) that if a user visits "A-B-C-D-E-F", then the program should identify "B-C-D" (etc.) as a visited path as well, and not only "A-B-C" and "D-E-F", which I felt made it quite a bit trickier than perhaps intended (but this seems like the only correct solution to me). The code I came up with for the first question, where you "cat" (without UUOC! Heh) the log file data into the program:

    import sys
    unfinishedPaths = {}  # [user] = [path1, path2, ...] = [[page1, page2], [page1]]
    finishedPaths = {}  # [path] = count
    for line in sys.stdin:
        user = line.split(',')[0].strip()
        page = line.split(',')[1].strip()
        if user not in unfinishedPaths:
            unfinishedPaths[user] = []
        deleteIndex = []
        for pathindex, path in enumerate(unfinishedPaths[user]):
            path.append(page)
            if len(path) == 3:
                deleteIndex.append(pathindex)
        for pathindex in deleteIndex:
            serializedPath = ' -> '.join(unfinishedPaths[user][pathindex])
            if serializedPath in finishedPaths:
                finishedPaths[serializedPath] += 1
            else:
                finishedPaths[serializedPath] = 1
            del unfinishedPaths[user][pathindex]
        unfinishedPaths[user].append([page])
    
    for k in sorted(finishedPaths, key=lambda x: finishedPaths[x], reverse=True):
        print(str(k) + ' with a count of ' + str(finishedPaths[k]))
Not tested properly because no expected output is given, but from concatenating your sample data a few times and introducing a third person, the output looks plausible. And I just noticed I failed because it says top 3, not just print all in order (guess I expect the user to use "| head -3" since it's a command-line program).

I needed to look up the parameter/argument that turns out to be called "key" for sorted() so I didn't do it all by heart (used html docs on the local filesystem for that, no web search or LLM), and I had one bout of confusion where I thought I needed to have another for loop inside of the "for pathindex, path in ..." (thinking it was "for pathsindex, paths in", note the plural). Not sure I'd have figured that one out with interview stress.

This is definitely trickier than fizzbuzz or similar. Would budget at least 20 minutes for a great candidate having bad nerves and bad luck, which makes it fairly long given that you have follow-up questions and probably also want to get to other topics like team fit and compensation expectations at some point

edit: wait, now I need to know: did I get hired?