top | item 46514022


osakasake | 1 month ago

This is factually incorrect. There’s no way that you are sampling ALL posts and comments because otherwise the average would not be 35 points. The vast majority of posts get no upvotes.

In addition, comments do not show the points accumulated so there’s no way you can know how many points a comment gets, only posts.


7777777phil | 1 month ago

Thanks for the pushback. This is exactly the kind of peer review I was hoping for at the preprint stage. You are likely correct regarding the sampling bias. While the intent was to capture all posts, an average score of 35 suggests that my archiver missed a significant portion of the zero-vote posts (likely due to my Workers' API rate limits or churn during high-volume periods). This created a survivorship bias toward popular posts in the current dataset, which I will explicitly address and correct.
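The survivorship-bias effect described above is easy to see numerically. The figures below are purely illustrative (the real zero-vote fraction is not stated in the thread), but they show how dropping unscored posts inflates the mean:

```python
# Toy illustration with hypothetical numbers: 90% of posts score 0,
# the rest score 35. If the archiver silently drops the zero-vote
# posts, the observed average jumps from 3.5 to 35.
full = [0] * 900 + [35] * 100            # assumed true population
sampled = [s for s in full if s > 0]     # what a biased archiver keeps

mean_full = sum(full) / len(full)            # 3.5
mean_sampled = sum(sampled) / len(sampled)   # 35.0
print(mean_full, mean_sampled)
```

Any correction would need an estimate of the dropped fraction, e.g. by re-crawling the full item-id range.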

To clarify on the second point: I am not analyzing individual comment scores (which, as you noted, are hidden). The metric refers to post points relative to comment growth/volume. I will be updating the methodology section to reflect these limitations. The full code and dataset will be open-sourced with the final publication so the sampling can be fully audited. Appreciate the rigor.
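The "post points relative to comment growth/volume" metric is not spelled out in the thread; a minimal sketch of one plausible form (score divided by comment count, with `points_per_comment` as an illustrative name, not the paper's actual formula) would be:

```python
# Hedged sketch of a points-to-comment-volume ratio. The actual
# metric in the preprint may weight comment *growth* over time;
# this only shows the simplest static version.
def points_per_comment(score: int, num_comments: int) -> float:
    # Guard against division by zero for posts with no comments.
    return score / num_comments if num_comments else 0.0

print(points_per_comment(35, 14))  # 2.5
```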

ferfumarma | 1 month ago

Interestingly, this is the kind of negative feedback that your post implies is bad. Thank goodness for negative feedback!

osakasake | 1 month ago

If you want some more feedback: why are you using Cloudflare Workers, which presumably cost you money? You can retrieve all of the HN content with a regular PC pretty easily. I'm talking a single core with a Python program and minimal RAM.
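The single-core approach described above can be sketched against the public HN Firebase API (`/v0/maxitem.json` and `/v0/item/{id}.json` are the real endpoints; `fetch_item` and `id_batches` are illustrative names, and batch size is an assumption):

```python
import json
import urllib.request

API = "https://hacker-news.firebaseio.com/v0"

def fetch_item(item_id: int) -> dict:
    # One GET per item; the official API serves every post and comment
    # by sequential integer id, so no Workers or parallelism is needed.
    with urllib.request.urlopen(f"{API}/item/{item_id}.json") as resp:
        return json.load(resp)

def id_batches(max_id: int, size: int = 1000) -> list[tuple[int, int]]:
    # Split the full id range 1..max_id into fixed-size chunks so a
    # single-threaded crawler can checkpoint and resume cleanly.
    return [(lo, min(lo + size - 1, max_id))
            for lo in range(1, max_id + 1, size)]

print(id_batches(2500))  # [(1, 1000), (1001, 2000), (2001, 2500)]
```

Walking the id range this way also guarantees the zero-vote posts are included, which directly addresses the sampling gap discussed above.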