This is a fun dataset. The paper leaves a slight misimpression about channel statistics: IIUC, they do not reweight for sampling propensity when reporting subscriber counts. Channels are reached by sampling videos, so the probability that a given channel appears is roughly proportional to the number of public videos it has (as long as the sample is a small fraction of the population). Each sampled channel should therefore be weighted by ~1/(# of public videos on that channel).
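To make the reweighting concrete, here is a minimal sketch of what I mean. It assumes you have deduplicated the sampled channels into (public video count, subscriber count) pairs. The function names and the numbers are made up for illustration and are not from the paper.

```python
# Sketch of inverse-propensity weighting for channel-level statistics.
# Assumption: channels were discovered by sampling videos, so a channel's
# chance of showing up is roughly proportional to its public video count.
# Each entry is (public_video_count, subscriber_count) for one distinct
# sampled channel. All values below are hypothetical.

def naive_mean(channels):
    """Unweighted mean subscribers; overrepresents prolific channels."""
    return sum(subs for _, subs in channels) / len(channels)

def weighted_mean(channels):
    """Mean subscribers with each channel weighted by 1 / video count."""
    weights = [1.0 / videos for videos, _ in channels]
    total = sum(w * subs for w, (_, subs) in zip(weights, channels))
    # Normalize by the sum of weights (a self-normalized estimator),
    # so the unknown proportionality constant cancels out.
    return total / sum(weights)

# One tiny channel and one prolific channel (hypothetical numbers).
sample = [(1, 10), (100, 1000)]
print(naive_mean(sample))
print(weighted_mean(sample))
```

With these two made-up channels the naive mean is 505 while the weighted mean is about 19.8, because the 100-video channel was roughly 100x more likely to be sampled and so stands in for far fewer channels in the population.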
bevan|2 years ago