top | item 33791294

Does HN have anti-duplication protection?

4 points| mothcamp | 3 years ago

Six months ago, I published part one of my NLP course and submitted this link: https://news.ycombinator.com/item?id=31421232

This morning, I wanted to share that I released the FULL course (same URL) but every time I hit submit, it redirects me to my previous submission.

Is this some anti-duplication protection in action? Does my account not have posting privileges?

13 comments

mindcrime|3 years ago

Yes, there is at least some automatic anti-duplication stuff going on. The easiest way to see this in action is to re-submit an existing URL with the exact same URL within a certain period of time, and notice that your submission just automatically becomes an upvote on the existing submission.

That said, the anti-dupe mechanism doesn't catch all dupes, and from what I can recall of things said by dang, pg, etc in the past, I think that is intentional. In particular, dupes are explicitly considered OK after a certain period of time. You can see this by noting that certain links have been submitted to HN, and sometimes discussed in detail, on 5, 10, or even 15 unique occasions.

I believe that whatever automatic duplicate detection they have doesn't do much besides look for an exact match on the URL, though. At one time it was known that you could get a dupe through just by adding some extra stuff to the query string, for example. What I can't speak to at all is how much effort (if any) the mods put into manually detecting and remediating dupes. I can't recall any of the mods ever addressing that point explicitly. My suspicion is that they do spend at least some cycles on it, but I can't prove it, and I may very well be wrong.

All this is totally unofficial mind you. It's just based on my recollections from various times this topic has been discussed in the past, and my own empirical observations. YMMV.
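
To make the exact-match point concrete, here is a minimal Python sketch. It is not HN's actual code; the function names (`exact_key`, `normalized_key`, `is_dupe`), the normalization rules, and the example.com URLs are all assumptions made up for illustration. It only shows why an extra query string would slip past a check that compares raw URL strings, and how a stricter (hypothetical) check would catch it.

```python
from urllib.parse import urlsplit, urlunsplit


def exact_key(url: str) -> str:
    """Naive dedupe key: the URL string exactly as submitted."""
    return url


def normalized_key(url: str) -> str:
    """Hypothetical stricter key: lowercase scheme and host,
    drop the query string and fragment, strip a trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def is_dupe(url: str, seen: list[str], key) -> bool:
    """True if `url` maps to the same key as any previously seen URL."""
    return key(url) in {key(u) for u in seen}


# Made-up example URLs, not real submissions.
seen = ["https://example.com/course"]
resubmitted = "https://example.com/course?x=1"

print(is_dupe(resubmitted, seen, exact_key))       # False: raw strings differ
print(is_dupe(resubmitted, seen, normalized_key))  # True: query string ignored
```

A stricter key like this would also merge pages that genuinely differ only by query string, which may be one reason to keep the check simple. That is a guess, not something the mods have confirmed.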

wskish|3 years ago

I noticed a lot of dupes on the HN Summary bot (https://github.com/jiggy-ai/hn_summary), so I was wondering if we needed an embedding similarity search to filter them. So I checked the database of recent stories and found 194 instances of duplicates with the exact same story title or story URL in the last few days that the bot has been running.

These were all story items that made it into the /topstories Hacker News API endpoint:

https://gist.github.com/wskish/c8c6dbcb1c036882f3eb11b0660c0...
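
A check along these lines can be sketched in a few lines of Python. This is not the bot's actual query; the function name `find_duplicates` and the sample records are made up for illustration. The `id`, `title`, and `url` keys follow the item shape returned by the Hacker News API.

```python
from collections import defaultdict


def find_duplicates(stories):
    """Group story ids that share an exact title or an exact URL.

    Returns {"title": {title: [ids]}, "url": {url: [ids]}} containing
    only the values that occur more than once.
    """
    by_title = defaultdict(list)
    by_url = defaultdict(list)
    for story in stories:
        if story.get("title"):
            by_title[story["title"]].append(story["id"])
        if story.get("url"):  # Ask HN style posts may have no URL
            by_url[story["url"]].append(story["id"])
    return {
        "title": {t: ids for t, ids in by_title.items() if len(ids) > 1},
        "url": {u: ids for u, ids in by_url.items() if len(ids) > 1},
    }


# Made-up sample records, not real HN items.
sample = [
    {"id": 1, "title": "Show HN: Foo", "url": "https://example.com/foo"},
    {"id": 2, "title": "Foo released", "url": "https://example.com/foo"},
    {"id": 3, "title": "Show HN: Foo", "url": "https://example.com/foo?ref=hn"},
    {"id": 4, "title": "Ask HN: Bar?"},
]

dupes = find_duplicates(sample)
print(dupes["title"])  # {'Show HN: Foo': [1, 3]}
print(dupes["url"])    # {'https://example.com/foo': [1, 2]}
```

Exact matching like this misses retitled stories and URLs that differ only by tracking parameters, which is where something like an embedding similarity search would come in.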

Normille|3 years ago

Judging by the countless times the same stories get posted here, I'd very much doubt there's any automatic de-duplication going on.

But if the system is stopping you from submitting the same URL again, why not just put a meaningless query string on the URL so it's different from last time? E.g.:

https://www.nlpdemystified.org/?blah

BTW. I don't know if that will work. Just a thought.

Tomte|3 years ago

Yes. The former submission got enough attention, so it shouldn't be submitted for a year.

Solution: write a separate release announcement (there's certainly more to tell than just "done"?), link to the course from there, and submit the announcement.

Normille|3 years ago

  >Yes. The former submission got enough attention, so it shouldn't be submitted for a year.
If this is official policy, it's kind of laughable to single out 'Show HN' posts for this treatment, given how any major techie news story gets submitted over and over and over again for days on end. I've complained about this so many times in the past, to no avail. Anyone who uses the 'Newest' page as their HN landing page will know what I mean.

mothcamp|3 years ago

Will do. Thank you.

PaulHoule|3 years ago

Yeah it tries to block dupes, but funny the other day people were complaining that the same Elon Musk tweet got submitted at least 5 times in 30 minutes…

Normille|3 years ago

  >people were complaining that the same Elon Musk tweet got submitted at least 5 times in 30 minutes….
Don't get me on my other soapbox --people posting fecking Tweets as 'news' stories!

Mind you, the throbbing vein in my temple, on that score, has stopped since I added this line to my uBlock Origin filters....

news.ycombinator.com#?#tr.athing > td.title > span:-abp-contains(/[Tt]witter/):upward(tr)