
DIRT Protocol raises $3M to build a Wikipedia for structured data

106 points| iamwil | 7 years ago |medium.com | reply

91 comments

[+] avichal|7 years ago|reply
It's unfortunate so many people come out of the woodwork to tell people their ideas are terrible or won't work.

I think it's far more interesting to ask how a thing might work, which use cases might be dramatically underserved today and serve as a beachhead, or the tradeoffs being made rather than just say something is a "bad idea."

Dropbox launch: https://news.ycombinator.com/item?id=8863

Coinbase launch: https://news.ycombinator.com/item?id=4703443

A 2012 thread discussing comment negativity where, coincidentally, the top comment is from @iamwil who posted this link and is on the DIRT team: https://news.ycombinator.com/item?id=4363717

A classic thread from 2012 where PG talks about negative comments: https://news.ycombinator.com/item?id=4396747

To me, the most interesting ideas in the world are the ones that at first blush look like they can't possibly work. But upon thinking through how they might, you learn something.

Props to everyone in the thread who is asking genuine questions and actually trying to understand what the team is building.

[+] p1necone|7 years ago|reply
This seems like a hilariously bad idea. You're basically building a system where whichever group has the most money to burn gets to decide the "truth". I would love you to change my mind though.
[+] yinyinwu|7 years ago|reply
Whoever stakes the most tokens also has the most to lose. There is another side to the moderation: anyone can challenge incorrect data and earn tokens for their work.

Currently, there is no support for moderation at scale. Projects like OpenStreetMap offer a valuable resource, but struggle to maintain quality at scale. This is a relevant article: https://blog.emacsen.net/blog/2018/02/16/osm-is-in-trouble/.

With DIRT's model, we want to create a way to build data sets with a focus on accuracy that can scale.

[+] TazeTSchnitzel|7 years ago|reply
Climate change will quickly be proven as completely fake

…by big oil's dollars.

[+] p1necone|7 years ago|reply
Maybe if there was also very aggressive human moderation to remove anything that was even remotely emotionally/economically charged. Basically just stick to information about math and science (and even then you'd have to avoid political bugbears like global warming or anything to do with economics).
[+] ram_rar|7 years ago|reply
First, congratulations on launching DIRT.

>If the data is incorrect, anyone can challenge the data and earn tokens for identifying these inaccurate facts.

How do you moderate censorship or conflicting information? If someone uploads my personal info without my consent, how do I get it purged? From the current model, it seems like I'll have to pay money to "request" purging my own data.

[+] yinyinwu|7 years ago|reply
Thanks! DIRT is a fundamentally different approach because it removes the middleman and there is no central moderator. Our goal is to move the trust from a central party to a system of rules that anyone can participate in.

I think verified identity could be a registry on DIRT. Adding reputation on top of voting is something we're exploring.

[+] eindiran|7 years ago|reply
Can someone explain how exactly DIRT protocol will do moderation? From their site [1] it seems like they do some sort of moderating of crowd-sourced structured data. But this is where I am a bit confused: "DIRT maintains accuracy because every contributor needs to deposit tokens to write data. If the data is correct, it is freely shared. If the data is incorrect, anyone can challenge the data and earn tokens for identifying these inaccurate facts." How is the data flagged as incorrect? Who decides that the original data is wrong and the new data is correct?

Also, given that contributors have to put money up to add information, what incentive do they have to add information in the first place?

[1] https://dirtprotocol.com/

[+] yinyinwu|7 years ago|reply
Hi! Thanks for the question. DIRT works well for objective information. In the cryptocurrency use case, this could be the ERC-20 smart contract address for a token or a list of investors for a project.

To flag information as incorrect, you need to take tokens and challenge the data. A challenge starts a vote and anyone in the DIRT network can vote with their tokens on what information is correct. The vote winner and majority voters earn tokens. The vote loser and minority voters are penalized.
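
The challenge flow described above can be sketched roughly as follows. This is a minimal illustration, not DIRT's actual protocol, which had not been published at the time of this thread. The `Vote` type, the `resolve_challenge` function, the 50% winner share, the 10% minority penalty, and the tie rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Vote:
    voter: str
    tokens: int               # tokens committed to this vote
    supports_challenge: bool  # True means "the entry is wrong"

    def __post_init__(self):
        if self.tokens <= 0:
            raise ValueError("a vote must commit a positive number of tokens")


def resolve_challenge(writer_deposit, challenger_stake, votes,
                      winner_share=Fraction(1, 2),
                      minority_penalty=Fraction(1, 10)):
    """Settle one challenge by token-weighted majority (hypothetical rules).

    Assumes each voter appears at most once in `votes`.
    """
    yes = sum(v.tokens for v in votes if v.supports_challenge)
    no = sum(v.tokens for v in votes if not v.supports_challenge)
    challenge_wins = yes > no  # assumption: a tie keeps the existing entry

    # The losing side of the dispute forfeits its stake.
    forfeited = writer_deposit if challenge_wins else challenger_stake
    majority = [v for v in votes if v.supports_challenge == challenge_wins]
    minority = [v for v in votes if v.supports_challenge != challenge_wins]

    # Minority voters lose a fixed fraction of the tokens they voted with.
    penalties = {v.voter: v.tokens * minority_penalty for v in minority}

    # The dispute winner takes a share; the rest, plus the penalties, goes
    # to majority voters in proportion to the tokens they voted with.
    winner_reward = forfeited * winner_share
    pool = forfeited - winner_reward + sum(penalties.values())
    majority_tokens = sum(v.tokens for v in majority)
    rewards = {v.voter: pool * Fraction(v.tokens, majority_tokens)
               for v in majority}

    return {"challenge_wins": challenge_wins, "winner_reward": winner_reward,
            "rewards": rewards, "penalties": penalties}
```

Because the outcome depends only on token weight, a single large holder can decide a vote under these assumed rules, which is the concern several commenters raise elsewhere in the thread.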

We are planning to publish our protocol design in a few weeks with more details.

[+] hobofan|7 years ago|reply
We are building something very similar to this with Rlay[0], though we usually don't frame it with the Wikipedia/Wikidata story. One of the factors that incentivizes contribution of data in the first place is that if you deposit tokens for a proposition earlier, you will be rewarded more than if you do so later. Thus participants try to be the first ones to submit a certain piece of information.
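
One way to picture the "earlier deposits are rewarded more" idea is a payout that decays with deposit order. The sketch below is purely illustrative: the function name `early_weighted_payouts` and the `tokens / position` weighting are assumptions for the example, not Rlay's actual formula.

```python
from fractions import Fraction


def early_weighted_payouts(deposits, reward_pool):
    """Split reward_pool so that earlier deposits earn more per token.

    deposits: list of (depositor, tokens) in the order the deposits were
    made, with one entry per depositor and positive integer token amounts.
    Weight = tokens / position (1-based); this decay curve is an assumption.
    """
    weights = [(who, Fraction(tokens, position))
               for position, (who, tokens) in enumerate(deposits, start=1)]
    total = sum(w for _, w in weights)
    return {who: reward_pool * w / total for who, w in weights}
```

With two equal deposits, the first depositor receives twice the payout of the second under this weighting.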

[0]: https://rlay.com/

[+] detaro|7 years ago|reply
"Wikipedia for structured data" seems like an odd tagline when Wikipedia already has a Wikipedia for structured data.
[+] microcolonel|7 years ago|reply
Not to mention Wikidata, which is already Wikipedia for structured data.
[+] yinyinwu|7 years ago|reply
Thanks for the feedback! One of the differences between DIRT and Wikipedia is that we place a higher value on information accuracy. For example, in the cryptocurrency market project teams often list investors and advisors as affiliated with their project when they are not involved. A Wikipedia list of investors would not be sufficient. You want these project teams to have skin in the game and something to lose if they are spreading false information.

That said - the explanation didn't fit as well into a one liner :)

[+] sometimesijust|7 years ago|reply
Seems like a pretty great way to remove the middleman from populist infotainment consumption. Can't be worse than what we already have and by making it explicitly richest party wins it makes the process more transparent. It might not be the Truth but at least it is Honest.
[+] yinyinwu|7 years ago|reply
Thanks for the comment!

Transparency would be the third benefit. With the blockchain, you can see the entire history of votes. Every transaction is recorded. Today, if a website accepts bribes for reviews, visitors to the site do not know that this happened. With DIRT, if a wealthy token holder has a lot of tokens and tries to throw a vote, you can see the attack happening.
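
As a rough illustration of what "seeing the attack" could mean in practice, a reader of a public vote history could measure how much of a vote's weight came from one address. The record shape `(address, tokens)` and the function `largest_voter_share` are hypothetical; nothing here reflects DIRT's real data format.

```python
from collections import Counter


def largest_voter_share(vote_log):
    """Return (address, share) for the address with the most voting weight.

    vote_log: iterable of (address, tokens) records, assumed to be read
    from a public vote history. Returns (None, 0.0) for an empty log.
    """
    totals = Counter()
    for address, tokens in vote_log:
        totals[address] += tokens
    if not totals:
        return None, 0.0
    address, tokens = totals.most_common(1)[0]
    return address, tokens / sum(totals.values())
```

A check like this only makes concentration visible after the fact; it does not by itself stop a large holder from deciding the vote.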

[+] DelightOne|7 years ago|reply
> Can't be worse than what we already have

Please prove it. Otherwise, good irony.

[+] TekMol|7 years ago|reply
My feeling is that this is either an illusion ("it will work out somehow") or a sham ("let's milk the crypto craze").

No answer to the question of why selling votes should result in more accuracy. Buzzword Bingo. An overly broad approach. Lots of social proof but thin on content. This reminds me of all those ICOs we see these days.

Looking forward to reading the whitepaper. But somehow I have the feeling it will either never come or it will be just another marketing brochure without technical details.

[+] yinyinwu|7 years ago|reply
Before Wikipedia, the idea that an openly edited online journal would be a better and more accurate alternative to Encyclopaedia Britannica would have been surprising to most people.

There are two parts of the design that lead to more accuracy for DIRT:

1. Skin in the game - a token deposit to write encourages accuracy because you can lose the deposit if you are incorrect.

2. Encouraging moderation - moderators can earn tokens. If you vote and challenge correctly, you can earn tokens. This creates an economic reward for moderators that can protect the data accuracy in the long term.
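
The "skin in the game" argument can be put as simple expected-value arithmetic. The sketch below assumes a writer gains a fixed `benefit` if a false entry stands and loses the whole `deposit` if it is successfully challenged with probability `p_caught`. These names and this payoff model are assumptions for illustration, not part of any published DIRT design.

```python
from fractions import Fraction


def expected_gain_from_false_entry(benefit, deposit, p_caught):
    """Expected token gain from writing a false entry (hypothetical model)."""
    return (1 - p_caught) * benefit - p_caught * deposit


def break_even_catch_probability(benefit, deposit):
    """Catch probability above which a false entry loses money on average.

    Solves (1 - p) * benefit - p * deposit = 0 for p.
    benefit and deposit are assumed to be positive integers.
    """
    return Fraction(benefit, benefit + deposit)
```

Under this model the deposit deters misinformation only if challenges succeed more often than `benefit / (benefit + deposit)`, so a small deposit relative to the payoff from lying requires very reliable moderation.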

We're posting the whitepaper and more importantly, launching the protocol with a first application in the coming months. Stay tuned!

[+] forgottenpass|7 years ago|reply
I don't see how the business model causes data to trend towards truth. I can see how it would trend towards whatever the people with financial incentive to change it want it to say. While DIRT pockets a tax on the edit war, ofc.
[+] lurker456|7 years ago|reply
Interesting idea. How will this handle legal take-down requests backed by DMCA, GDPR, and so on?
[+] lm_nop|7 years ago|reply
Take-down requests indeed. Especially for individuals, and especially with GDPR (right to be forgotten), CA privacy law (don't sell my data), etc. If someone else writes on DIRT that I'm an alien from an alien planet bringing a virus to earth (when in fact I'm a human from SF with no viruses), do I need to 1) PAY to get a token to challenge this and also 2) correct inaccurate information with correct information with no option to totally remove the entry (however "ridiculous" the entry)?

Applied to the case of getting accurate VC listings, DIRT has a ploy to get VCs to PAY to get tokens to challenge incorrect entries. Consumers also have an interest in the quality of information, but a primary concern lies with the subject of an entry.

DIRT - if I may, my request to you is to document the heck out of your policies and expected behaviors. The grey line of "ridiculous" that I point out is something that you've mentioned in another response, that you're not in the business of fake news. At some point, you'll need to be making decisions and providing ethical guidelines.

[+] jameslk|7 years ago|reply
If it costs tokens to submit information, what incentive is there to submit information?
[+] iamwil|7 years ago|reply
It can depend on the contents of the registry and who depends on that information. For example, with a list of top 100 colleges, readers might use it to decide which colleges to go to. And hence, writers would be incentivized to submit their own college to the list.

A better, but less mainstream-relatable example is a list of ERC-20 smart contract addresses.

[+] rgbrgb|7 years ago|reply
very cool! anyone have a list of useful TCRs? i haven't found one yet, but love the idea.
[+] dtran|7 years ago|reply
Inspired by https://news.ycombinator.com/item?id=17512045, I'm trying to figure out what gets me most excited about what DIRT and TCRs could enable.

For me, it's actually not data easily verifiable as true or false, but more for "wisdom of the crowds" type of knowledge—things that you couldn't put up on a source like Wikipedia. These tend to be lists or recommendations that contain some subjectivity, but also tend to coalesce around a mostly-agreed upon set of answers from a trusted set of sources.

In the centralized world, we usually rely upon institutions like the Michelin Guide to develop a fair set of criteria, but we ultimately as end users trust that institution's "objectivity" and judge whether we think that list is valuable. Sometimes when I research, I informally end up creating lists of lists and combining them ad-hoc if I can't tell which of them is more trusted. These lists also tend to end up being static or only updated once or twice a year and can fall horribly out of date.

I think TCR incentives could potentially be really interesting as an alternative to these lists which rely on the institution's brand. For example, I'm thinking of Quora Answer Wikis (like this one: https://www.quora.com/What-are-the-best-independent-coffee-s...) and the general consensus for recommendations in forums for questions like "Which cities should I visit in Thailand if I'm looking for nightlife and places to hike?" or "Which REST framework library should I use for a Django project?" It'd be amazing if DIRT could balance the incentives for community members to contribute to this type of data and keep them as living lists, with all changes and updates maintained through a community with the right checks and balances and incentives.

From the Medium post: >If the data is correct, it is freely shared. If the data is incorrect, anyone can challenge the data and earn tokens for identifying these inaccurate facts. Our protocol and platform makes it economically irrational for misinformation to persist in a data set.

I think the more interesting data would be data that's on a gray scale, e.g. using the above coffee shop in San Francisco example, obviously if John Doe tries to get his burger joint on the list as a growth hack even though they don't serve coffee, that should easily be verified as misinformation. But what if a coffee shop just closed for business, or moved to Mill Valley but thinks they should still be on the list, or just switched beans and raised the prices so that everyone agrees that it no longer deserves to be on the list?

Disclaimer: I know most of the team working on DIRT, and I don't know very much about TCRs.

[+] iamwil|7 years ago|reply
You're right that people often make lists that usually end up outdated. Often in these cases, the incentives for reading the list are greater than those for maintaining it.

People in the earlier days of the internet imagined a better world brought about by immediate and unfettered access to information. Many have tried to make freely available information on the internet. Wikipedia, IMDB, and Freebase are direct products of this school of thought. However, we can only count these on one hand. In fact, most free data projects languish and have a hard time getting off the ground.

What we all discovered as we built out the web is that only some kinds of data can be maintained for free sustainably. Sure, if it's something that engages fandom, like all the different types of starships in Star Trek, people are intrinsically motivated to update that list. But if it's something that's considered dry but useful, like the tax rates in every county in the US, or points of interest on a map, there won't be enough people with intrinsic motivation to keep that updated.

As builders and users of the web, we've compensated by subsidizing that dry/useful data, typically with a company selling advertising or subscriptions in adjacent services. The implicit deal we make as users is that if the company provides the data for free, we're ok with the company accruing profits off the data we help curate. Recently, the sentiment has been growing that this may have been a raw deal for users of the web, as a company's profits accrue to the point of immense power over our lives.

What I think the builders of the early web got wrong was that certain types of data needed incentives beyond intrinsic ones. While we've found other ways to incentivize users in the 2.5 decades of the internet, cryptocurrencies now give us one more tool in our toolbox to use economic incentives to design systems that converge on curated lists that are regularly updated.

With this new toolkit, we may be able to find another way to provide freely curated data without using subsidization. Instead of the value capture accruing in a single company, we may find a way to sustainably distribute it amongst the curators.

We're not as sure subjective data is a good first fit for TCRs. With any startup, it's better to find a niche application that's a great fit, and we think we've found one in objective data for the crypto space.

I think another aspect that might be exciting for you to think about is if you're able to link the data between registries. It's a non-obvious aspect that almost no one asks about.

[+] progval|7 years ago|reply
So it's Wikidata but with "blockchain" stamped on it to raise more money?
[+] yinyinwu|7 years ago|reply
Wikipedia works great in most use cases. There are also situations where DIRT is a better approach:

1) More efficient than centralized curation - Social media companies receive millions of requests to take down copyrighted material or spam sites. Today, centralized teams vet each request individually, which can take months. For this use case, DIRT is a valuable alternative for vetting information because it reduces the noise in each submission.

2) Commercial data - For markets where people can profit from spreading misinformation, open editing is not the best approach. You could create a Wikipedia list of stores that sell handmade jewelry. Sellers can benefit from inclusion on the list, and would want to join the list regardless of whether they meet the criteria. The likely outcome is that the list would not be useful.

We’re believers in the blockchain and decentralization, but decentralized information curation is not needed everywhere. However, in markets where there is a critical, single point of failure, where you need transparency because you cannot trust any single actor, and where you have a high demand for data accuracy, DIRT can be very useful.

[+] pbreit|7 years ago|reply
$3m is a fairly modest raise.
[+] tribler|7 years ago|reply
The Truth is determined by the rich! Majority voting by token owners determines the facts. You just can't make up stuff like this.

Curious how this will play out. Threshold for contributing is non-trivial, so the wasteland scenario is my academic guess. The contribution scoring and ranking will help, but can you be anonymous?