item 22790425

Show HN: DNS over Wikipedia

398 points | aaronjanse | 5 years ago | github.com

Hey HN,

I saw a thread a while ago (linked in README) discussing how Wikipedia does a good job keeping track of the domains of websites like Sci-Hub or The Pirate Bay. Someone mentioned checking Wikipedia to find links to these sites, so I thought this would be a fun thing to automate!

To try it out, install an extension or modify your hosts file, then type in the name of a website with the TLD `.idk`.

For example: scihub.idk -> sci-hub.tw
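For the curious, the lookup can be approximated in a few lines (a hypothetical Python sketch, not the extension's actual code; it assumes the article's infobox stores the link as `{{URL|...}}` under a `website` or `url` field):

```python
import json
import re
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def extract_official_url(wikitext):
    """Pull the first {{URL|...}} value out of an infobox's website/url field."""
    m = re.search(
        r"\|\s*(?:website|url)\s*=\s*\{\{\s*URL\s*\|\s*([^|}]+)",
        wikitext,
        re.IGNORECASE,
    )
    return m.group(1).strip() if m else None

def resolve_via_wikipedia(article_title):
    """Fetch the article's wikitext and extract the official site (network call)."""
    params = urllib.parse.urlencode({
        "action": "parse", "page": article_title, "prop": "wikitext",
        "format": "json", "formatversion": 2, "redirects": 1,
    })
    with urllib.request.urlopen(f"{API}?{params}") as resp:
        data = json.load(resp)
    return extract_official_url(data["parse"]["wikitext"])
```

Calling `resolve_via_wikipedia("Sci-Hub")` performs the network fetch; `extract_official_url` is the pure parsing step.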

Cheers!

104 comments


_gjrn|5 years ago

DNS translates a name into an IP address. This is not DNS per se; it is just a search plugin for the URL bar.

If an analogy with a network service were needed, perhaps this is more like a proxy redirector than DNS.

Keep in mind: with this you will still be misdirected if your DNS/hosts file points the name at a different IP than it should.

capableweb|5 years ago

Indeed. Even the GitHub repository's description has this error.

> Resolve DNS queries using the official link found on a topic's Wikipedia page

@aaronjanse: you probably want to correct this. "Resolving DNS records" carries a specific meaning: you have a DNS record and you "resolve" it to a value. Which, actually, you're kind of doing, in a way, I suppose.

I was convinced when I started writing this comment that calling this "resolve DNS queries" is wrong. But thinking about it, DNS resolving is not necessarily resolving a "name into an IP address" as @HugoDaniel in the comment I'm replying to is saying (think CNAME records and all the others that don't have IP addresses). It's just taking something and making it into something else, traditionally over DNS servers. But I guess you could argue that this is resolving a name into a different name, which then gets resolved into an IP address. So it's like an overlay over DNS resolving.

Meh, in the end I'm torn. Anyone else wanna give it a shot?

Polylactic_acid|5 years ago

It's incredible how insane this seems from the title but how practical it sounds from the README.

basch|5 years ago

Right? Basically a modern "I'm Feeling Lucky" meets meta-DNS.

mathieubordere|5 years ago

hehe yeah, was thinking the exact same thing

frei|5 years ago

Pretty neat! Similarly, I often use Wikipedia to find translations for specific technical terms that aren't in bilingual dictionaries or Google Translate. If you go to a wiki page about a term, there are usually many links on the sidebar to versions in other languages, which are usually titled with the canonical term in that language.

itaysk|5 years ago

I do this as well, I find that wikipedia is the best dictionary

bausano_michael|5 years ago

I found this to be a great method too. Especially for topics which I have been educated on in my mother tongue in high school. I know the term in Czech but I'd be unsure about the direct translation.

nitrogen|5 years ago

Out of curiosity, how well does Wiktionary fare in this regard?

nicbou|5 years ago

Dict.cc is excellent for that, if you're translating between German and English. Linguee can also be really good.

cpach|5 years ago

I also do this :) It would be cool to build a dictionary that uses this method.

segfaultbuserr|5 years ago

There's a risk of phishing by editing Wikipedia articles if the plugin gets popular. Perhaps it's useful to crosscheck the current URL against the 24-hour earlier and 48-hour earlier versions of the same article. Crosscheck back in time, not back in revision, since one can spam the history by making a lot of edits.
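That crosscheck could look roughly like this (a hypothetical Python sketch; the MediaWiki revisions API does let you fetch the newest revision at or before a given timestamp via `rvstart`):

```python
import datetime as dt
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def wikitext_as_of(page, when):
    """Newest revision of `page` at or before `when` (UTC) -- network call."""
    params = urllib.parse.urlencode({
        "action": "query", "titles": page, "prop": "revisions",
        "rvprop": "content", "rvslots": "main", "rvlimit": 1,
        # rvstart enumerates revisions from this timestamp going older
        "rvstart": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "format": "json", "formatversion": 2,
    })
    with urllib.request.urlopen(f"{API}?{params}") as resp:
        data = json.load(resp)
    return data["query"]["pages"][0]["revisions"][0]["slots"]["main"]["content"]

def stable_url(current, day_old, two_days_old):
    """Trust the current official URL only if it matches both older snapshots."""
    return current if current == day_old == two_days_old else None

# A caller would compare snapshots by wall-clock time, not revision count, e.g.:
#   now = dt.datetime.now(dt.timezone.utc)
#   wikitext_as_of("Sci-Hub", now - dt.timedelta(days=1))
```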

cxr|5 years ago

I jotted down some thoughts about this very thing last year. Here's the part that argues that it could work out to be fairly robust despite this apparent weakness:

> Not as trivially compromised as it sounds like it would be; could be faked with (inevitably short-lived) edits, but temporality can't be faked. If a system were rolled out tomorrow, nothing that happens after rollout [...] would alter the fact that for the last N years, Wikipedia has understood that the website for Facebook is facebook.com. Newly created, low-traffic articles and short-lived edits would fail the trust threshold. After rollout, there would be increased attention to make sure that longstanding edits getting in that misrepresent the link between domain and identity [can never reach maturity]. Would-be attackers would be discouraged to the point of not even trying.

https://www.colbyrussell.com/2019/05/15/may-integration.html...

Asmod4n|5 years ago

I believe the German version of Wikipedia had (has?) a feature where you only get verified versions of a page when you browse it anonymously.

BillinghamJ|5 years ago

Nice idea! Maybe should involve some randomised offsets so it can't just be planned ahead of time

pishpash|5 years ago

And what would you do if there was a difference?

hk__2|5 years ago

> Wikipedia keeps track of official URLs for popular websites

This should be Wikidata. Wikipedia does that, but it is increasingly being moved into Wikidata. This is a good thing, because Wikidata is much easier to query, and the official website of an entity is stored in a single place that is then reused by all articles about that entity in all languages.
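On Wikidata the official website is property P856, which makes the query mechanical (a hypothetical stdlib-only sketch; only the entity-parsing helper is shown working on canned data, the fetch is a live API call):

```python
import json
import urllib.parse
import urllib.request

def official_website(entity):
    """Extract the official website (property P856) from a Wikidata entity dict."""
    for claim in entity.get("claims", {}).get("P856", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
        if value:
            return value
    return None

def lookup_by_title(enwiki_title):
    """Resolve an enwiki article title to its entity's P856 value (network call)."""
    params = urllib.parse.urlencode({
        "action": "wbgetentities", "sites": "enwiki", "titles": enwiki_title,
        "props": "claims", "format": "json",
    })
    url = f"https://www.wikidata.org/w/api.php?{params}"
    with urllib.request.urlopen(url) as resp:
        entities = json.load(resp)["entities"]
    return official_website(next(iter(entities.values())))
```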

snek|5 years ago

The extension has nothing to do with DNS; a more accurate name would be "autocorrect over Wikipedia".

The Rust server set up with dnsmasq is a legit DNS server, though.

MatthewWilkes|5 years ago

It isn't autocorrect either. It's domain name resolution.

Vinnl|5 years ago

I created whereisscihub.now.sh a while ago for exactly this purpose (but limited to the subset of Sci-Hub, of course, and it used Wikidata as its data source). It has since been taken down by Now.sh.

Just as a heads-up of what you could expect to see happening :)

abiogenesis|5 years ago

Nitpicking: Technically it's not DNS as it doesn't resolve names to addresses. Maybe CNAME over Wikipedia?

usmannk|5 years ago

Nitpicking nitpicking: "Technically" CNAME is DNS insofar as DNS is "technically" defined at all.

stepanhruda|5 years ago

No one would click on “I’m feeling lucky for top level pages over Wikipedia” though

renewiltord|5 years ago

This is hecka cool. What a clever concept! I like the idea of piggy-backing on top of a mechanism that is sort of kept in the right state by consensus.

LinuxBender|5 years ago

This may be a little off topic, but has anyone ever considered a web standard that includes a cryptographic signed file in a standard "well known" location that would contain content such as

- Domains used by the site (first party)

- Domains used by the site (third party)

- Methods allowed per domain.

- CDN's used by the site

- A records and their current IP addresses

- Reporting URL for errors

Then include the public keys for that payload in DNS and in the APEX of the domain? Perhaps a browser add-on could verify the content and report errors back to a standard reporting URL with some technical data that would show which ISP is potentially being tampered with? Does something like this already exist beyond DANE? Similar to HSTS maybe the browser could cache some of this info and show diffs in the report? Maybe the crypto keys learned for a domain could also be cached and warn the user if something has changed (show diff and option to report)? Maybe more complex would be a system that allows a consensus aggregation of data to be ingested by users so they may start off in a hostile network and some trusted domains populated by the browser in advance, also similar to HSTS?
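A minimal sketch of what such a manifest and its canonical digest might look like (all field names and the file location are hypothetical; a real scheme would sign the digest with a key published in DNS and at the domain apex, which needs a proper signature library rather than a bare hash):

```python
import hashlib
import json

# Hypothetical /.well-known/site-manifest.json contents
MANIFEST = {
    "first_party_domains": ["example.com", "www.example.com"],
    "third_party_domains": ["cdn.example.net"],
    "a_records": {"example.com": ["93.184.216.34"]},
    "report_url": "https://example.com/.well-known/manifest-report",
}

def manifest_digest(manifest):
    """Canonicalize (sorted keys, no extra whitespace) and hash the manifest.

    The signature published alongside the file would cover this digest, so
    any mutation of the manifest in transit is detectable by the browser.
    """
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()
```

Canonicalization matters here: two parties must serialize the same data to the same bytes before a digest comparison (or signature check) means anything.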

andrekorol|5 years ago

That's a good use case for blockchain, in regards to the "consensus aggregation of data" that you mentioned.

oefrha|5 years ago

Pretty cool, although legally gray content distribution sites like Libgen, TPB, KAT, etc. are often better thought of as a collection of mirrors, where any mirror (including the main site, if there is one) could be unavailable at any given time.

CapriciousCptl|5 years ago

Could use some sort of verification since Wiki can be gamed.

1. Look at past wiki edits combined with article popularity or other signals to arrive at something like a confidence level.

2. Offer some sort of confirmation check to the user.

gbear605|5 years ago

One concern is that you can’t always trust the Wikipedia link. For example, in this edit [1] to the Equifax page, a spammer changed the link to a spam site. They’re usually fixed quickly, but it’s not guaranteed. So it’s a really neat project, but be careful about actually using it, especially for sensitive websites.

[1]: https://en.wikipedia.org/w/index.php?title=Equifax&diff=9455...

edjrage|5 years ago

True, seems pretty risky. Maybe the extension could take advantage of the edit history and warn the user about recent changes?

Edit: Unrelated to this issue, but I have a more general idea for the kinds of inputs this extension may accept. It could be an omnibox command [0] that takes the input text, passes it through some search engine with "site:wikipedia.org", visits the first result and finally grabs the URL. So you don't have to know any part of the URL - you can just type the name of the thing.

[0]: https://developer.chrome.com/extensions/omnibox
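The omnibox plumbing itself would be extension JavaScript, but the resolution step could be prototyped against Wikipedia's own `opensearch` endpoint instead of an external search engine (a hypothetical sketch; `top_result_url` just parses the documented response shape):

```python
import json
import urllib.parse
import urllib.request

def top_result_url(opensearch_response):
    """opensearch responses look like [query, [titles], [descriptions], [urls]]."""
    urls = opensearch_response[3] if len(opensearch_response) > 3 else []
    return urls[0] if urls else None

def first_wikipedia_hit(free_text):
    """URL of the top Wikipedia search result for free-form text (network call)."""
    params = urllib.parse.urlencode({
        "action": "opensearch", "search": free_text, "limit": 1, "format": "json",
    })
    url = f"https://en.wikipedia.org/w/api.php?{params}"
    with urllib.request.urlopen(url) as resp:
        return top_result_url(json.load(resp))
```

From there, the article URL found by `first_wikipedia_hit` would be fed into the same infobox-scraping step the extension already does.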

yreg|5 years ago

The user should exercise caution, but in the use cases provided (a new scihub/tpb domain) that applies regardless.

29athrowaway|5 years ago

Many Wikipedia articles can be edited by anyone. This is not secure.

jrockway|5 years ago

Why does Google censor results, but not Wikipedia? It seems like you can DMCA Wikipedia just as easily as Google.

Overall this is a nifty hack and I like it a lot. Wikipedia has an edit history, and a DNS changelog is something that is very interesting to have. People can change things and phish users of this service, of course, but with the edit log you can see when and potentially why. That kind of transparency is pretty scary to someone that wants to do something malicious or nefarious.

jhasse|5 years ago

Google also sells copyrighted content, Wikipedia doesn't.

leoh|5 years ago

Nice work! Sometimes I seem to be directed to a Wikipedia page as opposed to a URL. For example, with `aaronsw.idk` or `google.idk`. I wonder why that's the case?

O_H_E|5 years ago

I think it directs to the correct link when it is labeled `URL` on the wiki. In the other cases the link is labeled `Website`.

erikig|5 years ago

Interesting idea but:

- How do you handle ambiguity? e.g what happens when sci-hub.idk and scihub.idk differ?

- Aren’t you concerned by the fact that Wikipedia is open to editing by the public?

aaronjanse|5 years ago

> Aren’t you concerned by the fact that Wikipedia is open to editing by the public?

Arguably the thrill of uncertainty could add to the fun :D

captn3m0|5 years ago

Maybe use WikiData? The slower rate of updates might work in your favour to avoid vandalism.

tubbs|5 years ago

Your second point was my first thought - mostly because of an experience I had.

I used Pushbullet's recipe for "Google acquisitions" up until the night I got the notification "Google acquires 4chan". After being perplexed for a bit, and after a few more "acquisitions" came through, I discovered the recipe just used Wikipedia's List of mergers and acquisitions by Alphabet [1] page as a source.

[1]: https://en.wikipedia.org/wiki/List_of_mergers_and_acquisitio...

jneplokh|5 years ago

Awesome idea! It could be applied to a lot of different websites. Even ones where I'm too lazy to type out the whole URL :p

Regardless, a system like this, based on a reference website, could definitely be expanded beyond Wikipedia. Great work!

snorrah|5 years ago

Does this comply with the terms of service? I know this won’t be a popular reply and that’s fine, but I just want to know whether your admittedly intriguing concept isn’t taking the piss :)

upgoat|5 years ago

Woah this is hecka cool!! Nice work to the authors.

jaimex2|5 years ago

Alternatively just don't use Google.

newswasboring|5 years ago

And find the site through clairvoyance?

Sabinus|5 years ago

What search engines don't censor?

sm4rk0|5 years ago

Nice hack, but you can do it much easier with DuckDuckGo's "I'm Feeling Ducky", which is used by prefixing the search with a backslash:

https://lmddgtfy.net/?q=%5Chacker%20news

That's especially useful if DDG is default search engine in your browser.

(I'm not affiliated with DDG)

kelnos|5 years ago

That just takes you to the first DDG result, no?

The purpose of this seems to be to treat Wikipedia as a trusted, reliable source of truth about the canonical URL for websites (debatable, of course). The idea is that you don't trust the search engines, perhaps because you live in a country where your government has required search engines to censor results in some way, but (for some reason?) lets you go to Wikipedia.

jakear|5 years ago

> If you Google "Piratebay", the first search result is a fake "thepirate-bay.org" (with a dash) but the Wikipedia article lists the right one. — shpx

How interesting. Bing doesn't do this, which leads me to believe it's not a matter of legality. Is Google simply electing to self-censor results that it'd prefer its users not know about? Strange move, especially given that the alternative Google does index is almost definitely more nefarious.

sixhobbits|5 years ago

Google has been downranking sites based on copyright takedown requests since 2018 at least [0]. And it's been very hard to find torrent sites or streaming sites through Google since then in my experience.

As many have pointed out, this just makes it easier for actually malicious sites to get traffic.

[0] https://torrentfreak.com/google-downranks-65000-pirate-sites...

tomcooks|5 years ago

Google does list proper pirating sites!

At the bottom of the page click on the DMCA complaint, you'll find all the URLs you shouldn't ever, never ever, click on~

jonchurch_|5 years ago

I'm not sure how long that's been the case. The actual site at their normal domain seems to have been down for a few months, with a 522 cloudflare timeout.

I'm curious if that's the case for you as well, or if it's my ISP blocking (I wouldn't expect to see the cloudflare error if my ISP was blocking but I don't know).

I bring this up because if the site is unresponsive from wherever you're searching (or perhaps unresponsive for all, idk) then maybe it got de-ranked on google.

jimmaswell|5 years ago

This fake one seems to work fine. To what end is it there, honeypot or just ad money?

BubRoss|5 years ago

Wouldn't dns over github make more sense than this?