I had a similar interview question for Google. I was given a graph of both average user latency and server load, where average user latency was inversely correlated with server load, i.e. latency went up when the load on the server was at its lowest and vice versa, and was then asked to brainstorm reasons why this could be the case.
The actual answer (I could tell this was drawn from a real experience) was "China". The valleys in server load corresponded to EST nights, which happen to correspond to the workday in China. During those periods there are fewer users online, but the vast majority of them are located in Asia, where a response has to get through the Great Firewall of China and cross a trans-Pacific cable. Meanwhile, the U.S. west coast is just going to bed and the U.S. east coast & Europe are asleep, so all of the low-latency users drop out of the population sample. It was a nifty application of Simpson's Paradox, correlation-is-not-causation, and speed-of-light limits.
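The population-mix effect is easy to reproduce with made-up numbers: per-region latency never changes, yet the overall average swings with the time of day. A minimal sketch (all figures invented for illustration):

```python
# region -> (active users, typical latency in ms); per-region latency is FIXED
day = {"us_east": (5000, 80), "europe": (3000, 90), "asia": (500, 350)}
night = {"us_east": (200, 80), "europe": (100, 90), "asia": (500, 350)}

def avg_latency(population):
    """User-weighted average latency across regions."""
    total_users = sum(n for n, _ in population.values())
    return sum(n * ms for n, ms in population.values()) / total_users

# Heavy load: low-latency users dominate, so the average looks good.
# Light load: only the far-away users remain, so the average looks bad -
# even though no individual region got any slower.
```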
Extra points if you can think of how to avoid strange debugging situations like this in the future.
I had an actual high severity issue on-call page over this kind of thing once.
When I first started with Amazon, I was a dev on the team owning the UI component that all warehouse Picking associates used. It had to run on mobile handscanners with IE5.5. The UI was dead simple and pretty darn fast, and that speed matters for worker efficiency: taking 3 seconds between an item scan and the next pick being displayed was unacceptably slow.
One night a few weeks after Christmas, I got an autocut alarm at around 10pm. It said "PickUI page loads 90th percentile > 5 seconds" or something to that effect. I logged in, and sure enough, the latency was spiking up and down like crazy for the entire European stack. All our dependencies were rock solid; their metrics showed stable performance. Network was fine, no issues. Rendering was normal. But the round-trip latency from the users' perspective (sent via an Ajax POST after a page load) showed some round trips taking 15 seconds!
After 30 minutes of scratching my head, I happened to accidentally switch my graph from "P90 latency" to "number of data points". There were very, very few. It turned out that since the Christmas season was just over, all the EU warehouses were only running day shifts that week. (That doesn't happen very often anymore, I hear.) In the entire European region, there was one picking associate working. And she was picking in an area with poor WiFi signal. Around 20% of the time, her page loads were slow because her WiFi sucked. I drew out a rough map of the locations she picked from and the latency: there was some kind of dead zone. But she was the entirety of my metrics for Europe, so my 90th percentile was terrible.
I modified the alarm to have a minimum number of data points threshold and went to bed.
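The fix generalizes: gate any percentile alarm on sample count before trusting the percentile. A minimal sketch (thresholds made up; not Amazon's actual alarm logic):

```python
def should_page(latencies_ms, p90_threshold_ms=5000, min_samples=50):
    """Only page if there are enough data points for the percentile
    to mean anything - one associate on bad WiFi shouldn't wake anyone."""
    if len(latencies_ms) < min_samples:
        return False
    ranked = sorted(latencies_ms)
    p90 = ranked[int(0.9 * (len(ranked) - 1))]  # nearest-rank P90
    return p90 > p90_threshold_ms
```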
Took me a couple of reads to realize that this was asking about the in-flight trip time between the server and the client rather than the total round-trip time to process requests, including processing time.
Very creative question, requires thinking outside the box. I don't think I would be able to answer it correctly.
> The actual answer (I could tell this was drawn from a real experience) was "China"
“China” would have been my first answer to that question.
After spending so much time on a project that deals with massive amounts of HTTP requests from China, I realized that the majority of web developers make too many assumptions. Several times I have found myself patching pieces of code because the connection was suddenly dropping in the middle of an operation, so I had to add retries and read incomplete byte streams to recover at least pieces of the actual data from our Chinese netizens.
In retrospect, I have learned a lot about TCP in the last year with this project.
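The salvage-what-arrived pattern described above can be sketched roughly like this (a hypothetical illustration, not the project's actual code):

```python
import socket

def read_best_effort(sock, max_bytes=1 << 20):
    """Read until EOF, but keep whatever arrived if the peer
    (or a middlebox) drops the connection mid-stream."""
    chunks = []
    received = 0
    try:
        while received < max_bytes:
            chunk = sock.recv(65536)
            if not chunk:  # clean EOF
                break
            chunks.append(chunk)
            received += len(chunk)
    except (ConnectionResetError, socket.timeout):
        pass  # salvage the partial body instead of failing the whole request
    return b"".join(chunks)
```

A caller would then decide whether the partial payload is usable (e.g. enough of a JSON document to parse) or whether to schedule a retry.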
Several months ago, I was doing ping geolocation on a bunch of VPN servers, to see whether they were actually located where claimed. I looked at several AirVPN servers that claimed to be in Hong Kong. Based on data from hundreds of probes, they did in fact seem to be in Hong Kong.
However, minimum(rtt) was ~300 msec for probes in several mainland locations, some <100 km from Hong Kong. Why might that have been?
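For scale, here is the back-of-the-envelope bound that makes the puzzle sharp (constants approximate; light in glass travels at roughly 2/3 c, about 200 km per millisecond):

```python
C_FIBER_KM_PER_MS = 200  # light in fiber, roughly 2/3 of c

def min_rtt_ms(distance_km):
    # Best-case round trip: straight fiber there and back, zero queuing
    return 2 * distance_km / C_FIBER_KM_PER_MS

# A 100 km neighbor should be reachable in ~1 ms round trip, so a
# 300 ms floor means the packets' actual path was nothing like 100 km.
```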
I love this story and find it refreshing each time I read it. That said, has anyone ever looked into if it's real or apocryphal? I'd be curious to know.
I don't know whether the original story was -ahem- embellished a bit or not, but I do know that after forwarding this to a sysadmin friend of mine running a host of Sun boxes at the Norwegian University of Science & Technology, he got somewhat obsessed with replicating this bug - it turned out that on his hardware du jour the range was some 1100 km (700 miles).
I believe the effect here could be used as a proof of proximity on a blockchain. Imagine two specially designed devices that can receive a string, perform a public-key encryption on it, and transmit it with very low turnaround time.
A bitcoin block hash is found, and the two participants begin a back-and-forth hashing session in which billions of round trips are performed. Every minute or so, a transaction containing the latest result is submitted to the network, and eventually one is locked in.
An interested party could then verify that the transaction could not have occurred unless the two keys involved were within a certain physical distance of one another.
Not sure what it could be used for yet, but it's something that feels like it could be important for some purpose.
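The verification step boils down to a light-speed bound: the reply cannot arrive sooner than the distance allows. A hypothetical sketch of the distance check (function name and parameters mine):

```python
C_KM_PER_US = 0.299792458  # speed of light in vacuum, km per microsecond

def max_separation_km(round_trip_us, turnaround_us):
    """Upper bound on how far apart the two devices can be, given the
    measured round trip and the devices' known turnaround time."""
    flight_us = max(round_trip_us - turnaround_us, 0.0)
    return flight_us * C_KM_PER_US / 2  # halve: out and back
```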
It only works if there are no router delays in the middle. The vast majority of consumer Internet connections fail this - I've already got an 8 ms ping just to get to the first Comcast node from my home wi-fi router. It's near-instantaneous within Comcast's network, but then loses a couple of milliseconds handing off to their fiber backbone provider.
He could use the timeout as a proxy for distance because all the e-mail recipients were within the same university, on a direct-switched network. Once it got on the wire, the only latency was from the speed of light.
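That's the arithmetic behind the story's famous number: a ~3 ms timeout buys you about 3 millilightseconds of one-way distance.

```python
C_MILES_PER_SEC = 186_282  # speed of light in vacuum, miles per second

def timeout_radius_miles(timeout_s):
    # Farthest a host can be and still be heard from within the timeout,
    # ignoring every source of delay except light itself
    return timeout_s * C_MILES_PER_SEC

print(timeout_radius_miles(0.003))  # ~559 miles - "a little over 500"
```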
You could use this to prove that N devices are spread fairly evenly along the line between two devices whose locations you know, if the transport latency were fairly constant.
If you cross this line over multiple countries, you now have a system where you can prove that the devices are spread across some number of the different countries, which probably has some useful properties, like making nation-state attacks slightly more difficult.
That would certainly demonstrate that the keys were brought geographically near each other for some amount of time, which is interesting, but of what use would it be? It can't usefully test where I am, just my key. I can rent a cloud machine anywhere in the world for 15 minutes and put my key there temporarily.
I’ve seen this several times before through the years, but somehow missed the ending.
> “I'm looking for work. If you need a SAGE Level IV with 10 years Perl, tool development, training, and architecture experience, please email me at [email protected]. I'm willing to relocate for the right opportunity.”
The prompt gives "586 units, 56 prefixes" instead of the post's "1311 units, 63 prefixes".
So if you want to try this you can `brew install gnu-units` and run gunits instead:
"3070 units, 109 prefixes, 109 nonlinear units"
It's pretty annoying that OS X ships with such outdated utilities [1]
[1] http://meta.ath0.com/2012/02/05/apples-great-gpl-purge/
I wonder how things turned out for Trey...