
heavyarms | 4 years ago

First of all, I'd like to say that this looks like a great project and I wish you the best of luck. I've done a bit of work on building knowledge graphs from semi-structured data, and I know that every aspect of it is challenging. Obviously there are the data pipelines, ETL, semantic matching/categorization, statistical models, etc. Just building a simple UI for presenting a large knowledge graph was more challenging than most front-end work I've ever done.

Question: if the goal is to build a knowledge graph that can "explain how anything in the world is related to everything else", how do you measure progress toward that goal? And how do you measure the quality? Just having a bunch of topics and relationships is not a great metric, in my opinion. Obviously this is still very early, but here's an example I found in about 30 seconds of clicking around:

"Evidence suggests that Heart Failure is related to Income and COVID-19." [https://www.system.com/view/topic/P0XELnR0PaK]

There are topics in System for "Obesity" and "Smoking", but those are not associated with Heart Failure.
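To make that kind of spot check repeatable, here is a rough sketch of a coverage metric: take a hand-curated checklist of relationships you'd expect to see and measure what fraction the graph actually contains. Everything here is an assumption for illustration. The `edge` and `coverage` helpers are hypothetical, the graph is just the two relationships quoted above, and none of this reflects System's actual data model or API.

```python
def edge(a, b):
    """Normalize an undirected relationship so (a, b) equals (b, a)."""
    return frozenset((a, b))


def coverage(graph_edges, expected_edges):
    """Return (fraction of expected edges present, sorted list of missing pairs)."""
    graph = {edge(a, b) for a, b in graph_edges}
    expected = {edge(a, b) for a, b in expected_edges}
    if not expected:
        raise ValueError("expected_edges must not be empty")
    missing = expected - graph
    recall = 1 - len(missing) / len(expected)
    return recall, sorted(tuple(sorted(e)) for e in missing)


# Relationships quoted from the Heart Failure topic page above.
graph_edges = [
    ("Heart Failure", "Income"),
    ("Heart Failure", "COVID-19"),
]

# Hypothetical checklist of relationships a reader would expect to find.
expected_edges = [
    ("Heart Failure", "Obesity"),
    ("Heart Failure", "Smoking"),
]

recall, missing = coverage(graph_edges, expected_edges)
print(f"coverage of expected relationships: {recall:.0%}")
for a, b in missing:
    print(f"missing: {a} <-> {b}")
```

A number like this only says how much of a curated checklist is covered, not whether the relationships that are present are correct, so it would complement rather than replace a quality measure.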


adam_bly|4 years ago

Thank you so much. We'd love for you to join our Slack community (link on system.com).

Great question. There is no ground truth that we are modeling System after, i.e. there is no causal model of the world out there (to use Pearl's framing). So I'm not sure we can know how far along we are epistemologically. More practically, for the next few years we have plenty of work to just represent all the existing corpuses of scholarship! The truer and arguably more meaningful test of progress, though, is how decisions improve for the users and organizations that use System.

Quality is evaluated and presented using a variety of parameters like strength, significance, and reproducibility (full documentation here: https://docs.system.com/system/using-system/investigating-re...).

Re completeness, as I wrote below, System Search results are not necessarily comprehensive — but they will be. System is in the early stages of its development as a public resource and you should expect that knowledge will be missing. The knowledge base will be constantly growing and improving and evolving as knowledge does. Our community will play an important role in relating what we expect or know should be related.

monstertruck|4 years ago

(Note: I work at System)

First, thanks! If you'd like to reach out and learn more or talk about your learnings from building something like this we'd be very interested (we have a Slack community and a direct contact form on the site).

As for your questions: we have tools for assessing the reproducibility (in the statistical sense) of models and relationships added to System, as well as tools for users (and built into the platform itself) to assess the relative statistical strength between any two relationships that you find on the site.

And, yes, we're early on in the process of writing (peer-reviewed) evidence on various topics, and as you note, the value of seeing these systems will grow with how thoroughly the topics are covered and with the overall number of the world's topics shown to be related. I hope you'll stay engaged to see!