I found your product this week when searching for a Neo4j visualization tool, but I couldn’t try it on anything other than an example database. Is there any way to try/use it as a researcher?
I spent the bulk of my programming career modelling business processes in a graph database with a strong schema, lifecycle control (state machines), and formal change control (revisioning).
I was always blown away with how easy it was to turn around a very stable and useful system where the customers could actually understand the data model and refactoring was easy to reason through.
That’s been known about linseed oil for hundreds of years, and it’s one of the reasons it’s no longer used to waterproof clothing. I wonder what the circumstances were and what it was being used for.
So one thing I still don't understand is whether Neo4j, a pure graph database, is better than something like AgensGraph[0] or Cayley[1], which use a pre-existing DB engine as their persistence layer. If yes, what are the advantages? Does it depend entirely on the use case? If so, what criteria should be used to make the decision?
There are pros and cons to deciding whether to go "graph native" or build on an existing DB.
PROS
You can optimize for exactly the kinds of queries you want a graph database to answer: shortest path, path finding, etc. Relational and document databases are (generally) very poor at those queries, because they are not the kinds of queries people typically run on those databases. In a "graph native" database, everything down to the on-disk storage can be optimized for graph algorithms.
CONS
There are years, sometimes decades, of engineering in these databases (I'm thinking of PostgreSQL and Cassandra, both of which have graph "layers" available). A lot of that engineering work is not graph-specific: ACID, transaction handling, distributed computing, WAL, replication.
Why re-engineer all of that just to perform graph operations more quickly?
Also, I can send you a good paper by the founder of Dgraph Labs if you're really curious.
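To make the shortest-path point above concrete, here's a minimal in-memory sketch of the traversal a graph-native engine optimizes: plain breadth-first search over an adjacency list. Toy data; all names are invented for illustration.

```python
from collections import deque

def shortest_path(adj, start, goal):
    """Breadth-first search over an adjacency list; returns the list of
    nodes on a shortest path from start to goal, or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Toy social graph (hypothetical data)
adj = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
}
print(shortest_path(adj, "alice", "erin"))  # ['alice', 'bob', 'dave', 'erin']
```

A graph-native store can lay adjacency out on disk so each hop in that loop is effectively a pointer dereference rather than an index lookup.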
Indexing and search specialized for graph operations is a real thing. I have no experience with those projects, but I'm familiar with some of the workarounds in Postgres. Basically, the deeper the graph traversal, the more performance drops for relational DBs. This is a seriously studied topic, so refer to the research for the details.
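For reference, the usual relational workaround is a recursive CTE; each extra hop is effectively another self-join, which is where the depth penalty comes from. A minimal sketch using SQLite, whose recursive-CTE syntax is close to Postgres'; the edge table and data are made up.

```python
import sqlite3

# A relational "edges" table stands in for graph adjacency.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d")],  # hypothetical chain a -> b -> c -> d
)

# Every node reachable from 'a': the recursive member re-joins edges
# against the frontier on each step, which is the cost that grows with depth.
rows = conn.execute("""
    WITH RECURSIVE reach(node) AS (
        SELECT 'a'
        UNION
        SELECT e.dst FROM edges e JOIN reach r ON e.src = r.node
    )
    SELECT node FROM reach
""").fetchall()
print(sorted(r[0] for r in rows))  # ['a', 'b', 'c', 'd']
```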
Yes, and thanks. This bit is really important; I see too many people who don’t understand the difference between a graph database and a knowledge graph.
“So how did we build this thing with the smart folks at NASA as partners and customers? The key takeaway here is that a Knowledge Graph platform is a Knowledge Toolkit plus a Graph Database, and all of those components are critical at NASA.
Doing this with a plain graph database isn’t going to work unless you want to do all the heavy lifting of AI, knowledge representation, machine learning, and automated reasoning yourself, from scratch. I’ll wait while you decide…didn’t think so.”
This seems to be an advertisement, albeit a strange one. They make it clear that NASA used Neo4j rather than Nuclino. Neo4j is a true graph database, but I didn't find anything on the Nuclino website that explains what Nuclino really is or what technology it uses.
Nuclino is a tool for writing documentation, and the only thing "graph" about it, from my understanding as a user, is that you can link between documents within Nuclino, which then generates a graph. Nuclino visualises this graph so the user can explore the documentation.
In my experience this kind of exploration only makes sense when you want to document doing/trying the same thing again (which NASA probably is). If you are just documenting how to connect to a database, set something up, or similar, it falls pretty flat to me. Maybe I'm using it wrong...
What I am looking for is a nice (graph-based) way to connect all kinds of events/people/commits/bugs/tickets and jump between them.
Currently I am putting links in GitHub PR descriptions so that, in my deployment GitHub repo, I know who releases what, when, and in which cluster (where).
The PRs contain links to Jira tickets.
So, all in all, if you “sprinkle” enough links across GitHub and Jira, I can essentially click through them and answer: how did that end up here? What changed? Where is the bug?
But I feel like this set of links referencing GitHub, Jira, PRs, commits, and error reports would really fit in some kind of graph.
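To sketch what that might look like as a graph: the links become labeled edges between PRs, tickets, commits, and deploys, and "how did that end up here?" becomes a neighborhood walk. All IDs and edge labels below are invented.

```python
# Hypothetical sketch: the GitHub/Jira link web as explicit triples.
edges = [
    ("PR-42", "links", "JIRA-101"),
    ("PR-42", "contains", "commit-abc123"),
    ("deploy-7", "releases", "PR-42"),
    ("bug-9", "reported_against", "deploy-7"),
]

def neighbors(node):
    """Follow links in either direction, returning (label, other) pairs."""
    out = [(lbl, dst) for src, lbl, dst in edges if src == node]
    out += [(lbl, src) for src, lbl, dst in edges if dst == node]
    return out

# "How did that bug end up here?" -- walk outward from the bug report.
for label, other in neighbors("bug-9"):
    print(label, other)  # reported_against deploy-7
```

From there, hopping bug -> deploy -> PR -> ticket is exactly the click-through described above, just queryable.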
This kind of reminds me of the FMEA and its web structure, which is very useful.
It does share the big weakness of all other such databases, though: it's very hard to convince people to use it, especially to add and maintain content.
Does anybody here have a 'canonical' application or example in mind that shows me what Neo4j can do, something that matches my intuitive understanding better than a 'regular' RDBMS?
That can be non-obvious, so fair. We (Graphistry) get pulled into a lot of investigative scenarios -- account takeover (web logs), malware/phishing analysis (host/network logs & feeds), AML, claims fraud, etc. I've found the problems being solved to be some combination of: awkward to express in SQL, too slow to run in an RDBMS, or hard to visually explore for relationships/correlations.
This may not be a canonical application of a data model, but expressing graph queries using "Cypher", the graph query language invented by Neo4j, is very intuitive to my mind. I find the use of ASCII art to help visualize the relationships welcome.
For example, say we have a graph of movies, actors, reviewers, producers, etc. Here's a Cypher query that returns the names of people who reviewed movies, along with the actors in those movies:
MATCH (r:Person)-[rev:REVIEWED]->(m:Movie)<-[:ACTED_IN]-(a:Person)
RETURN DISTINCT r.name AS Reviewer, m.title AS Title,
m.released AS Year, rev.rating AS Rating,
collect(a.name) AS Actors
Another example: you want to know which actors acted in movies in the decade starting with 1990:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE 1990 <= m.released <= 1999
RETURN m.released, collect(m.title) as titles, collect(p.name) as actors
ORDER BY m.released
I'm new to this "knowledge" space, but I've stumbled upon structr.com. It's open source, and you can use it as an extremely flexible content management system for business processes and things like that. Check out their website; however, I can't tell you much more, as it's still on my todo list.
RDBMSs are developed with joins in mind, but they also buckle under the complexity those joins introduce (from a developer perspective as well as a resource one).
Now imagine your join becoming the primary perspective for looking at your data.
Then you'd see that credit card transactions (who buys what, when?) or maps are better represented as a graph.
I know, for example, that TomTom uses Neo4j to validate map edits in production.
Say you have transactions which follow a complex supply chain... Sure, you can reconstruct the path taken using recursive SQL, but you're also joining lots and lots of things together at runtime.
In a graph database, you've effectively taken your 'join' penalty at the point of ingestion and you have an expressive query syntax to describe the pattern you're trying to match.
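A toy illustration of that "join at ingestion" idea: materialize the adjacency index as rows arrive, so following the chain at query time is lookup-chasing rather than joining. Data and names are invented.

```python
from collections import defaultdict

# Hypothetical transaction rows, as they might arrive from a feed.
rows = [
    ("supplier", "factory"),
    ("factory", "warehouse"),
    ("warehouse", "retailer"),
]

# "Join at ingestion": build the adjacency index once, as each row lands,
# instead of re-joining the whole table on every query.
adjacency = defaultdict(list)
for src, dst in rows:
    adjacency[src].append(dst)

# Following the supply chain is now a chain of lookups, not joins.
hop1 = adjacency["supplier"]   # ['factory']
hop2 = adjacency[hop1[0]]      # ['warehouse']
print(hop1, hop2)
```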
One problem I've seen in startups as they scale isn't a lack of good documentation but a lack of information organization and hierarchy. The cost you pay is repeated experiments/trials and generally slower development. The best way I've found to overcome this is to just talk to people and construct an information map/hierarchy as a mental model. Obviously, that process can't scale with the business. I wonder if this tool would be useful for software/product development in startup environments?
Has anybody ever seen a knowledge database for a large organization that actually works? I always see these efforts but usually they turn out pretty useless because nobody keeps them up to date.
Not very convincing. If the differentiator was correlating the data in a more meaningful way, it doesn't matter whether you display the correlated data in a list or a graph...
Is there any way to view the knowledge graph in the new design? Lots of other people are linking the database itself, but I can't actually find a link with the new design...
mentatseb | 7 years ago
Disclaimer: Linkurious CEO here. Linkurious is the tool used to explore the Neo4j graph database at NASA.
Timothycquinn | 7 years ago
Graph databases FTW.
aplc0r | 7 years ago
The first one I managed to click on was related to a fire in an employee's car: https://llis.nasa.gov/lesson/943
dmurray | 7 years ago
Certainly something NASA employees need to be aware of.
tiuPapa | 7 years ago
[0]: https://github.com/bitnine-oss/agensgraph
[1]: https://github.com/cayleygraph/cayley
brad0 | 7 years ago
I can think of plenty of examples at my work where spidering a website and displaying it in a graph would be really cool.
Our wiki would be one for sure.
david_p | 7 years ago
Links: https://neo4j.com/ https://linkurio.us/
More info about this use case here: https://linkurio.us/blog/how-nasa-experiments-with-knowledge...
The screenshot in the article is from Linkurious (without any mention in the article, which is strange).
Spoiler: Linkurious co-founder here.
ice-berg | 7 years ago
Nuclino (https://www.nuclino.com/) looks promising, trying it out now.
nift | 7 years ago
No idea what they use under the hood.
Source: I use it where I work.
lmeyerov | 7 years ago
Examples:
=== Shortest Paths
1a. Referral: "Who on our team connected to which leadership at Apple?"
(target:Company[name="Apple"])<-[_:Leadership]--(champion)--[]-->(us:Company[name="myCompany"])
1b. Supply Chain, AML, entanglements...: "How are these companies related, even if 5 companies away, and across all sorts of relationship types?"
(a:Company[name="a16z"])-[r:1..3]-(b:Company[name="juicero"])
=== Neighborhood (incl. multi-hop):
2a: 360 context on a security/fraud/ops incident:
(hacked:Computer[ip="10.10.0.0"])-[e:Alert]-(metadata:)
+ (hacked:Computer[ip="10.10.0.0"])-[Login]->(u:User)-[e:Alert]-(metadata:)
2b: fraud rings:
(fraudster:clientIP)-[login:http]-(b:Fingerprint)
+ (fraudster:clientIP)-[x:http[method="POST"]]-(p:Page)
2c: Journeys (customer, patient, ...)
(a:Patient[id=123])-[e:1..2]->(b:)
=== Whole system optimization / compute:
Personalized pagerank, supplychain optimization, business process mining, ...
===
The above can be extended, for example by adding compute (correlation, influence scores, ...). That feeds into viz / recommendations / decision making.
Or: not all uses of graphs are end-to-end. We often get used alongside a graph DB to improve understanding of it (our viz scales 100-1000X over the tools here via GPUs)... but folks may instead plug their graph DB into a tabular frontend, or use us with a tabular system like Splunk/Spark/Elastic. So the above can be hard to write in Splunk/SQL, slow to run, or hard to visually understand.
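For the "whole system optimization / compute" bucket, here's a minimal, self-contained PageRank sketch: plain power iteration over a toy adjacency dict. Real deployments would use a graph engine's built-in implementation; the data here is invented.

```python
def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank over an adjacency dict {node: [out-neighbors]}."""
    nodes = list(adj)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in adj.items():
            if not outs:  # dangling node: spread its mass evenly
                for m in nodes:
                    nxt[m] += damping * rank[n] / len(nodes)
            else:
                for m in outs:
                    nxt[m] += damping * rank[n] / len(outs)
        rank = nxt
    return rank

# Hypothetical mini network: two suppliers both feed the "hub".
adj = {"a": ["hub"], "b": ["hub"], "hub": ["a"]}
ranks = pagerank(adj)
print(max(ranks, key=ranks.get))  # hub
```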