I found your product this week when searching for a Neo4j visualization tool, but I couldn’t try it on anything other than an example database. Is there any way to try/use it as a researcher?
I spent the bulk of my programming career modelling business processes in a graph database with a strong schema, lifecycle control (state machines), and formal change control (revisioning).
I was always blown away with how easy it was to turn around a very stable and useful system where the customers could actually understand the data model and refactoring was easy to reason through.
That’s been known about linseed oil for hundreds of years, and it’s one of the reasons it’s no longer used to waterproof clothing. I wonder what the circumstances were and what it was being used for.
So one thing I still don't understand is whether Neo4j, a pure graph database, is better than something like AgensGraph[0] or Cayley[1], which use a pre-existing DB engine as their persistence layer. If yes, what are the advantages? Does it depend entirely on the use case? If so, what criteria should be used to make the decision?
There are pros and cons to deciding whether to go "graph native" or build on an existing DB.
PROS
You can optimize for exactly the kinds of queries you want a graph database to answer: shortest path, path finding, etc. Relational and document databases are (generally) very poor at those queries, because they are not the kinds of queries people typically run on those databases. In a "graph native" database, everything down to the on-disk storage can be optimized for graph algorithms.
CONS
There are years, sometimes decades, of engineering in these databases (I'm thinking of PostgreSQL and Cassandra, both of which have graph "layers" available). A lot of that engineering work is not graph-specific: ACID, transaction handling, distributed computing, WAL, replication.
Why re-engineer all of that just to perform graph operations more quickly?
Also, I can send you a good paper by the founder of Dgraph Labs if you're really curious.
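To make the shortest-path point above concrete, here's a minimal in-memory sketch of the traversal a graph-native engine optimizes: plain breadth-first search over an adjacency list. Toy data; all names are invented for illustration.

```python
from collections import deque

def shortest_path(adj, start, goal):
    """Breadth-first search over an adjacency list; returns the list of
    nodes on a shortest path from start to goal, or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Toy social graph (hypothetical data)
adj = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
}
print(shortest_path(adj, "alice", "erin"))  # ['alice', 'bob', 'dave', 'erin']
```

A graph-native store can lay adjacency out on disk so each hop in that loop is effectively a pointer dereference rather than an index lookup.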
Indexing and search specialized for graph operations is a real thing. I have no experience with those projects, but I'm familiar with some of the workarounds in Postgres. Basically, the deeper the graph traversal, the more performance drops for relational DBs. This is a seriously studied topic, so refer to the research for the details.
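For reference, the usual relational workaround is a recursive CTE; each extra hop is effectively another self-join, which is where the depth penalty comes from. A minimal sketch using SQLite, whose recursive-CTE syntax is close to Postgres'; the edge table and data are made up.

```python
import sqlite3

# A relational "edges" table stands in for graph adjacency.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d")],  # hypothetical chain a -> b -> c -> d
)

# Every node reachable from 'a': the recursive member re-joins edges
# against the frontier on each step, which is the cost that grows with depth.
rows = conn.execute("""
    WITH RECURSIVE reach(node) AS (
        SELECT 'a'
        UNION
        SELECT e.dst FROM edges e JOIN reach r ON e.src = r.node
    )
    SELECT node FROM reach
""").fetchall()
print(sorted(r[0] for r in rows))  # ['a', 'b', 'c', 'd']
```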
Yes, and thanks. This bit is really important; I see too many people who don’t understand the difference between a graph database and a knowledge graph.
“So how did we build this thing with the smart folks at NASA as partners and customers? The key takeaway here is that a Knowledge Graph platform is a Knowledge Toolkit plus a Graph Database, and all of those components are critical at NASA.
Doing this with a plain graph database isn’t going to work unless you want to do all the heavy lifting of AI, knowledge representation, machine learning, and automated reasoning yourself, from scratch. I’ll wait while you decide…didn’t think so.”
This seems to be an advertisement, albeit a strange one. They make it clear that NASA used Neo4j rather than Nuclino. Neo4j is a true graph database, but I didn't find anything on the Nuclino website that explains what Nuclino really is or what technology it uses.
Nuclino is a tool for writing documentation, and the only thing "graph" about it, from my understanding as a user, is that you can link between documents within Nuclino, which then generates a graph. Nuclino visualises this graph so the user can explore the documentation.
In my experience this kind of exploration only makes sense when you want to document doing/trying the same thing again (which NASA probably is). If you are just documenting how to connect to a database, set something up, or similar, it falls pretty flat to me. Maybe I'm using it wrong...
What I am looking for is a nice (graph-based) way to connect all kinds of events/people/commits/bugs/tickets and jump between them.
Currently I am putting links in GitHub PR descriptions so that, in my deployment GitHub repo, I know who releases what, when, and in which cluster (where).
The PRs contain links to Jira tickets.
So, all in all, if you “sprinkle” enough links across GitHub and Jira, I can essentially click through them and answer: how did that end up here? What changed? Where is the bug?
But I feel like this set of links referencing GitHub, Jira, PRs, commits, and error reports would really fit in some kind of graph.
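To sketch what that might look like as a graph: the links become labeled edges between PRs, tickets, commits, and deploys, and "how did that end up here?" becomes a neighborhood walk. All IDs and edge labels below are invented.

```python
# Hypothetical sketch: the GitHub/Jira link web as explicit triples.
edges = [
    ("PR-42", "links", "JIRA-101"),
    ("PR-42", "contains", "commit-abc123"),
    ("deploy-7", "releases", "PR-42"),
    ("bug-9", "reported_against", "deploy-7"),
]

def neighbors(node):
    """Follow links in either direction, returning (label, other) pairs."""
    out = [(lbl, dst) for src, lbl, dst in edges if src == node]
    out += [(lbl, src) for src, lbl, dst in edges if dst == node]
    return out

# "How did that bug end up here?" -- walk outward from the bug report.
for label, other in neighbors("bug-9"):
    print(label, other)  # reported_against deploy-7
```

From there, hopping bug -> deploy -> PR -> ticket is exactly the click-through described above, just queryable.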
This kind of reminds me of the FMEA and its web structure, which is very useful.
It does share the big weakness of all other such databases, though: it's very hard to convince people to use it, especially to add and maintain content.
Does anybody here have a 'canonical' application or example in mind that shows me what Neo4j can do, something that matches my intuitive understanding better than a 'regular' RDBMS?
That can be non-obvious, so fair. We (Graphistry) get pulled into a lot of investigative scenarios -- account takeover (web logs), malware/phishing analysis (host/network logs & feeds), AML, claims fraud, etc. I've found the problems being solved to be some combination of: awkward to express in SQL, too slow to run in an RDBMS, or hard to visually explore for relationships/correlations.
This may not be a canonical application of a data model, but expressing graph queries using "Cypher", the graph query language invented by Neo4j, is very intuitive to my mind. I find the use of ASCII art to help visualize the relationships welcome.
For example, say we have a graph of movies, actors, reviewers, producers, etc. Here's a Cypher query that returns the names of people who reviewed movies, along with the actors in those movies:
MATCH (r:Person)-[rev:REVIEWED]->(m:Movie)<-[:ACTED_IN]-(a:Person)
RETURN DISTINCT r.name AS Reviewer, m.title AS Title,
m.released AS Year, rev.rating AS Rating,
collect(a.name) AS Actors
Another example: you want to know which actors acted in movies in the decade starting with 1990:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE 1990 <= m.released <= 1999
RETURN m.released, collect(m.title) as titles, collect(p.name) as actors
ORDER BY m.released
I'm new to this "knowledge" space, but I've stumbled upon structr.com. It's open source, and you can use it as an extremely flexible content management system for business processes and things like that. Check out their website; however, I can't tell you much more, as it's still on my todo list.
RDBMSs are developed with joins in mind, but they also buckle under the complexity those joins introduce (from a developer perspective as well as a resource one).
Now imagine your join becoming the primary perspective for looking at your data.
Then you'd see that credit card transactions (who buys what, when?) or maps are better represented as a graph.
I know, for example, that TomTom uses Neo4j to validate map edits in production.
Say you have transactions which follow a complex supply chain... Sure, you can reconstruct the path taken using recursive SQL, but you're also joining lots and lots of things together at runtime.
In a graph database, you've effectively taken your 'join' penalty at the point of ingestion and you have an expressive query syntax to describe the pattern you're trying to match.
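A toy illustration of that "join at ingestion" idea: materialize the adjacency index as rows arrive, so following the chain at query time is lookup-chasing rather than joining. Data and names are invented.

```python
from collections import defaultdict

# Hypothetical transaction rows, as they might arrive from a feed.
rows = [
    ("supplier", "factory"),
    ("factory", "warehouse"),
    ("warehouse", "retailer"),
]

# "Join at ingestion": build the adjacency index once, as each row lands,
# instead of re-joining the whole table on every query.
adjacency = defaultdict(list)
for src, dst in rows:
    adjacency[src].append(dst)

# Following the supply chain is now a chain of lookups, not joins.
hop1 = adjacency["supplier"]   # ['factory']
hop2 = adjacency[hop1[0]]      # ['warehouse']
print(hop1, hop2)
```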
One problem I've seen in startups as they scale isn't a lack of good documentation but a lack of information organization and hierarchy. The cost you pay is repeated experiments/trials and generally slower development. The best way I've found to overcome this is to just talk to people and construct an information map/hierarchy as a mental model. Obviously, that process can't scale with the business. I wonder if this tool would be useful for software/product development in startup environments?
Has anybody ever seen a knowledge database for a large organization that actually works? I always see these efforts but usually they turn out pretty useless because nobody keeps them up to date.
Not very convincing. If the differentiator was correlating the data in a more meaningful way, it doesn't matter whether you display the correlated data in a list or a graph...
Is there any way to view the knowledge graph in the new design? Lots of other people are linking the database itself, but I can't actually find a link with the new design...
mentatseb | 7 years ago
Disclaimer: Linkurious CEO here. Linkurious is the tool used to explore the Neo4j graph database at NASA.
Timothycquinn | 7 years ago
Graph databases FTW.
aplc0r | 7 years ago
The first one I managed to click on was related to a fire in an employee's car: https://llis.nasa.gov/lesson/943
dmurray | 7 years ago
Certainly something NASA employees need to be aware of.
tiuPapa | 7 years ago
[0]: https://github.com/bitnine-oss/agensgraph
[1]: https://github.com/cayleygraph/cayley
brad0 | 7 years ago
I can think of plenty of examples at my work where spidering a website and displaying it in a graph would be really cool.
Our wiki would be one for sure.
david_p | 7 years ago
Links: https://neo4j.com/ https://linkurio.us/
More info about this use case here: https://linkurio.us/blog/how-nasa-experiments-with-knowledge...
The screenshot in the article is from Linkurious (without any mention in the article, which is strange).
Spoiler: Linkurious co-founder here.
ice-berg | 7 years ago
Nuclino (https://www.nuclino.com/) looks promising, trying it out now.
nift | 7 years ago
No idea what they use under the hood.
Source: I use it where I work.
lmeyerov | 7 years ago
Examples:
=== Shortest Paths
1a. Referral: "Who on our team connected to which leadership at Apple?"
(target:Company[name="Apple"])<-[_:Leadership]--(champion)--[]-->(us:Company[name="myCompany"])
1b. Supply Chain, AML, entanglements...: "How are these companies related, even if 5 companies away, and across all sorts of relationship types?"
(a:Company[name="a16z"])-[r:1..3]-(b:Company[name="juicero"])
=== Neighborhood (incl. multi-hop):
2a: 360 context on a security/fraud/ops incident:
(hacked:Computer[ip="10.10.0.0"])-[e:Alert]-(metadata:)
+ (hacked:Computer[ip="10.10.0.0"])-[Login]->(u:User)-[e:Alert]-(metadata:)
2b: fraud rings:
(fraudster:clientIP)-[login:http]-(b:Fingerprint)
+ (fraudster:clientIP)-[x:http[method="POST"]]-(p:Page)
2c: Journeys (customer, patient, ...)
(a:Patient[id=123])-[e:1..2]->(b:)
=== Whole system optimization / compute:
Personalized pagerank, supplychain optimization, business process mining, ...
===
The above can be extended, for example by adding compute (correlation, influence scores, ...). That feeds into viz / recommendations / decision making.
Or: not all uses of graphs are end-to-end. We often get used alongside a graph DB to improve understanding of it (our viz scales 100-1000X over the tools here via GPUs)... but folks may instead plug their graph DB into a tabular frontend, or use us with a tabular system like Splunk/Spark/Elastic. So the above can be hard to write in Splunk/SQL, slow to run, or hard to visually understand.
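For the "whole system optimization / compute" bucket, here's a minimal, self-contained PageRank sketch: plain power iteration over a toy adjacency dict. Real deployments would use a graph engine's built-in implementation; the data here is invented.

```python
def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank over an adjacency dict {node: [out-neighbors]}."""
    nodes = list(adj)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in adj.items():
            if not outs:  # dangling node: spread its mass evenly
                for m in nodes:
                    nxt[m] += damping * rank[n] / len(nodes)
            else:
                for m in outs:
                    nxt[m] += damping * rank[n] / len(outs)
        rank = nxt
    return rank

# Hypothetical mini network: two suppliers both feed the "hub".
adj = {"a": ["hub"], "b": ["hub"], "hub": ["a"]}
ranks = pagerank(adj)
print(max(ranks, key=ranks.get))  # hub
```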