
samcrawford | 3 years ago

Filtering out the spam results is only half the problem. In my experience, a spam site clones a legitimate site's content, and it is the clone that shows up in a Google search while the legitimate site does not. The example that keeps hitting me is GitHub Issues.

Filtering out the spam only removes the clones; it doesn't get the good results back in.


3np | 3 years ago

Host a personal (potentially shared with friends) searx (multi-engine) or whoogle (Google-only) instance. Filter out some domains completely and rewrite others; the rewriting is what lets you substitute the real deal for the spam clone sites (see the sketch below). searx, at least, already dedupes results.
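If you go the searx route, the SearXNG fork exposes both the filtering and the rewriting through a hostname_replace map in settings.yml. A minimal sketch, not a drop-in config: the keys are hostname regexes, a value of false drops the result, and a string rewrites the host (githubmemory.com / gitmemory.com stand in here for whichever clone sites are plaguing you):

    # settings.yml (SearXNG) -- hostname_replace sketch; domains are examples
    hostname_replace:
      # drop results from a spam clone outright
      '(.*\.)?githubmemory\.com$': false
      # rewrite a clone's hostname back to the original site
      '(.*\.)?gitmemory\.com$': 'github.com'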

The time spent (including maintenance) will be paid back faster than you might expect.

Optionally, rewrite some sites to alternative frontends ("altfronts") like nitter/scribe/piped. If you care to spend the time on privacy and on decoupling searches from visits, you can set up arbitrary proxying rules.
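The same hostname_replace map covers the altfront rewrites; a sketch, with nitter.net and piped.video standing in for whichever instances you actually trust:

    # settings.yml (SearXNG) -- altfront rewrites; instance choices are examples
    hostname_replace:
      '(www\.)?twitter\.com$': 'nitter.net'
      '(.*\.)?youtube\.com$': 'piped.video'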

One benefit over browser extensions, among others, is that it's a one-time setup for all your devices and clients. All you need to do on a reinstall is change the default search engine.

jccalhoun | 3 years ago

It isn't even just 'clones': plenty of sites will simply summarize an article from somewhere else and link to it. Sometimes it's a game of telephone, with one site summarizing a second site that is itself a summary of a third, and so on. I want a search engine to show me the original source, not the one with the best SEO.

krono | 3 years ago

Sure, but at least it prevents you from accidentally clicking those unwanted results, something I kept doing all the time.

Either way, OP's ask was for a way to blacklist results, and I'm providing a method to accomplish exactly that. Edit: The rest is up to Google.