stochtastic | 2 years ago

This is a fun dataset. The paper leaves a slightly misleading impression about channel statistics: IIUC, they do not reweight for sampling propensity when looking at subscriber counts. Each channel should be weighted by ~1/(# of videos on that channel), since the probability of a given channel appearing in the sample is proportional to the number of public videos that channel has, as long as the sample is a small fraction of the population. Without that correction, channels with many videos (which tend to be the big ones) are overrepresented.
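To make the reweighting concrete, here is a rough sketch of what I mean. It is not the paper's method, and the numbers are made up for illustration: `sampled` is a hypothetical list of deduplicated channels reached through a uniform sample of videos, and `share_above` is a helper name I invented. Each channel gets weight 1/(its video count), which should approximate the inverse of its inclusion probability when the sample is a small fraction of all videos.

```python
def share_above(channels, threshold):
    """Return (naive, reweighted) share of channels with subscribers >= threshold.

    channels: list of (subscribers, n_public_videos) tuples, one per distinct
    channel reached via a uniform sample of videos (hypothetical input format).
    """
    # Naive estimate: every sampled channel counts equally.
    naive = sum(1 for subs, _ in channels if subs >= threshold) / len(channels)

    # Inverse-propensity weights: a channel with n videos is roughly n times
    # as likely to be hit, so weight it by 1/n.
    weights = [1.0 / n_videos for _, n_videos in channels]
    above = sum(w for (subs, _), w in zip(channels, weights) if subs >= threshold)
    reweighted = above / sum(weights)
    return naive, reweighted


# Made-up example data: one large channel with many videos, three tiny ones.
sampled = [(2_000_000, 1000), (500, 2), (120, 1), (40, 1)]
naive, reweighted = share_above(sampled, 1_000_000)
print(f"naive share:      {naive:.4f}")
print(f"reweighted share: {reweighted:.6f}")
```

In this toy sample the naive share of 1M+ channels is 25%, while the reweighted share is a small fraction of a percent, which is the direction of the bias I am describing.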

bevan | 2 years ago

I noticed that too. Seems very unlikely that 1,000,000 subscribers represents the 98th percentile and not the 99.999th.