This is such a clever way of sampling; kudos to the authors. Back when I was at Pew, we tried to map YouTube using random walks through the API's "related videos" endpoint, and it seemed like we hit a saturation point after a year, but the magnitude described here suggests there's quite a long tail that flies under the radar. Google started locking down the API almost immediately after we published our study, so I'm glad to see folks still pursuing research with good old-fashioned scraping. Our analysis was at the channel level and focused only on popular channels, but it's interesting that some of the figures on TubeStats are pretty close to what we found (e.g. language distribution): https://www.pewresearch.org/internet/2019/07/25/a-week-in-th...
m463|2 years ago
Isn't this ironic, given how Google's bots scour the web relentlessly and hammer sites almost to death?
LeonM|2 years ago
I have been hosting sites and online services for a long time now, and I have never had this problem or even heard of it before.
If your site can't even handle a crawler, you need to seriously question your hosting provider or your architecture.
LocalH|2 years ago
dotandgtfo|2 years ago
MBCook|2 years ago
trogdor|2 years ago
0x1ceb00da|2 years ago
pants2|2 years ago
justinpombrio|2 years ago
In the "100 fish" example, the formula for approximating the total number of fish is:
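For readers who want the arithmetic spelled out, here is a minimal Python sketch of the two estimates being contrasted in this comment. It assumes the fish case is the standard Lincoln-Petersen capture-recapture estimator and that the YouTube case is a plain hit-rate estimate over an ID space. The function names, parameter names, and all numbers are hypothetical and chosen only for illustration; they are not taken from the article.

```python
def lincoln_petersen(marked_first: int, caught_second: int, recaptured: int) -> float:
    """Capture-recapture estimate of a population size.

    marked_first  -- fish tagged and released in the first catch
    caught_second -- fish caught in the second catch
    recaptured    -- fish in the second catch that already carried a tag
    """
    if recaptured <= 0:
        raise ValueError("need at least one recaptured individual")
    # The main measurement (recaptured) sits in the denominator.
    return marked_first * caught_second / recaptured


def hit_rate_estimate(valid: int, tried: int, id_space: int) -> float:
    """Estimate how many IDs in a space are in use from a random sample.

    valid    -- sampled IDs that resolved to a real item
    tried    -- total IDs sampled uniformly at random
    id_space -- total number of possible IDs (an assumption supplied by the caller)
    """
    if tried <= 0:
        raise ValueError("need at least one sampled ID")
    # The main measurement (valid) sits in the numerator.
    return valid * id_space / tried


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(lincoln_petersen(marked_first=100, caught_second=100, recaptured=5))
    print(hit_rate_estimate(valid=3, tried=1_000, id_space=1_000_000))
```

Written this way, the "flip" is easy to see: fewer recaptured fish means a larger estimated lake, while fewer valid IDs means a smaller estimated catalogue.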
In their YouTube sampling method, the formula for approximating the total number of videos is:
Notice that this is flipped: in the fish example the main measurement is "tagged" (the number of fish that were tagged the second time you caught them), which is in the denominator. But when counting YouTube videos, the main measurement is "valid" (the number of urls that resolved to videos), which is in the numerator.
zellyn|2 years ago
dclowd9901|2 years ago
midasuni|2 years ago
https://dl.acm.org/doi/10.1145/2068816.2068851
krackers|2 years ago
https://en.wikipedia.org/wiki/Unseen_species_problem http://www.stat.yale.edu/~yw562/reprints/species-si.pdf
neurostimulant|2 years ago
Won't this mess up the stats, though? It's like a lake monster randomly swapping an untagged fish for a tagged one as you catch them.
fergbrain|2 years ago
layer8|2 years ago
gaucheries|2 years ago
herval|2 years ago
nextaccountic|2 years ago
pvankessel|2 years ago
hipadev23|2 years ago
[deleted]
blackle|2 years ago