Could you explain more or point out some interesting references? I'm currently trying to understand how Datalog compares to SQL and, potentially GraphDBs
TypeDb is a practical Datalog-based database system [1] (with a different syntax). TerminusDb is a project in a similar vein [2], but actually an RDF store at its core. If you want to experiment with the connections between Datalog, relational algebra, and SQL, check out the Datalog Educational System. And if you want to jump into the theory, Foundations of Databases (the "Alice book") is very thorough but relatively readable [4]! Oh, and there's a Google project, Logica, to do Datalog over Postgres databases [5].
Mangle is a language that includes "textbook datalog" as a subset https://github.com/google/mangle ; like any real-world datalog language, it extends datalog with various facilities to make it practical.
A graph DBs short intro to datalog: just like the edges of a graph could be represented as a simple table (src, target), you could consider a database tuple or a datalog or prolog fact foo(x1, ..., xN) as a "generalized edge." The nice thing about datalog is then that as one is able to express a connections in an elegant way as "foo(...X...), bar(...X...)" (a conjunction, X being a "node"), whereas in the SQL world one has to deal with a clumsy JOIN statement to express the same thing.
Don't have any interesting references, sorry. My reasoning is mainly one of simplicity and power. In SQL you need to think in terms of tables, inner joins, outer joins, foreign keys etc. whereas datalog you do everything with relations as in prolog.
Not only is it conceptually much simpler, it's also a "pit of success" situation as thinking in terms of relations instead of tables leads you towards normal forms by default.
The Prolog code looks identical to Datalog but the execution model is different. Prolog uses depth-first search and backtracking, which can lead to infinite loops if the rules are not carefully ordered.
Datalog starts by evaluating all possible combinations of facts and rules. It builds a bottom-up derivation of all possible facts:
a. First, it derives all direct parent relationships.
b. Then, it applies the ancestor rules iteratively until no new facts can be derived.
For the query ancestor(john, X):
It returns all X that satisfy the ancestor relationship with john. This includes mary, ann, and tom. The order of rules doesn't affect the result or termination. Datalog guarantees termination because it operates on a finite set of possible facts.
Prolog uses a top-down, depth-first search strategy with backtracking.
For the query ancestor(john, X):
a. It first tries to satisfy parent(john, X). This succeeds with X = mary.
b. It then backtracks and tries the second rule: It satisfies parent(john, Y) with Y = mary. Then recursively calls ancestor(mary, X).
c. This process continues, exploring the tree depth-first.
Prolog will find solutions in this order: mary, ann, tom.
The order of clauses can affect both the order of results and termination: If the recursive rule were listed first, Prolog could enter an infinite loop. Prolog doesn't guarantee termination, especially with recursive rules.
SQL is more verbose. The equivalent of the Datalog/Prolog example above is:
-- Create and populate the table
CREATE TABLE Parent (
parent VARCHAR(50),
child VARCHAR(50)
);
INSERT INTO Parent VALUES ('john', 'mary');
INSERT INTO Parent VALUES ('mary', 'ann');
INSERT INTO Parent VALUES ('mary', 'tom');
-- Recursive query to find ancestors
WITH RECURSIVE Ancestor AS (
SELECT parent, child
FROM Parent
UNION ALL
SELECT a.parent, p.child
FROM Ancestor a
JOIN Parent p ON a.child = p.parent
)
SELECT DISTINCT parent AS ancestor
FROM Ancestor
WHERE child IN ('ann', 'tom');
This is a more interesting example of how one might use Datalog on a large dataset:
% Define the base relation
friend(Person1, Person2).
% Define friend-of-friend relation
friend_of_friend(X, Z) :- friend(X, Y), friend(Y, Z), X != Z.
% Define potential friend recommendation
% (friend of friend who is not already a friend)
recommend_friend(X, Z) :- friend_of_friend(X, Z), not friend(X, Z).
% Count mutual friends for recommendations
mutual_friend_count(X, Z, Count) :-
recommend_friend(X, Z),
Count = count{Y : friend(X, Y), friend(Y, Z)}.
% Query to get top friend recommendations for a person
top_recommendations(Person, RecommendedFriend, MutualCount) :-
mutual_friend_count(Person, RecommendedFriend, MutualCount),
MutualCount >= 5,
MutualCount = max{C : mutual_friend_count(Person, _, C)}.
The equivalent Postgres example would be:
WITH RECURSIVE
-- Base friend relation
friends AS (
SELECT DISTINCT person1, person2
FROM friendship
UNION
SELECT person2, person1
FROM friendship
),
-- Friend of friend relation
friend_of_friend AS (
SELECT f1.person1 AS person, f2.person2 AS friend_of_friend
FROM friends f1
JOIN friends f2 ON f1.person2 = f2.person1
WHERE f1.person1 <> f2.person2
),
-- Potential friend recommendations
potential_recommendations AS (
SELECT fof.person, fof.friend_of_friend,
COUNT(*) AS mutual_friend_count
FROM friend_of_friend fof
LEFT JOIN friends f ON fof.person = f.person1 AND fof.friend_of_friend = f.person2
WHERE f.person1 IS NULL -- Ensure they're not already friends
GROUP BY fof.person, fof.friend_of_friend
HAVING COUNT(*) >= 5 -- Minimum mutual friends threshold
),
-- Rank recommendations
ranked_recommendations AS (
SELECT person, friend_of_friend, mutual_friend_count,
RANK() OVER (PARTITION BY person ORDER BY mutual_friend_count DESC) as rank
FROM potential_recommendations
)
-- Get top recommendations
SELECT person, friend_of_friend, mutual_friend_count
FROM ranked_recommendations
WHERE rank = 1;
> Prolog uses depth-first search and backtracking, which can lead to infinite loops if the rules are not carefully ordered
Is this an issue in practice? Most languages can create programs with infinite loops, but it's easy to spot in code reviews. It's been over a decade since I encountered an infinite loop in production in the backend. Just wondering if the same is true for Prolog.
felixyz|1 year ago
[1]: https://typedb.com/ [2]: https://terminusdb.com/ [3]: http://www.fdi.ucm.es/profesor/fernan/des/ [4]: http://webdam.inria.fr/Alice/ [5]: https://github.com/evgskv/logica
burakemir|1 year ago
It was discussed on HN https://news.ycombinator.com/item?id=33756800 and is implemented in go. There is the beginnings of a Rust implementation meanwhile.
If you are looking for datalog in the textbooks, here are some references: https://github.com/google/mangle/blob/main/docs/bibliography...
A graph DBs short intro to datalog: just like the edges of a graph could be represented as a simple table (src, target), you could consider a database tuple or a datalog or prolog fact foo(x1, ..., xN) as a "generalized edge." The nice thing about datalog is then that as one is able to express a connections in an elegant way as "foo(...X...), bar(...X...)" (a conjunction, X being a "node"), whereas in the SQL world one has to deal with a clumsy JOIN statement to express the same thing.
sirwhinesalot|1 year ago
Not only is it conceptually much simpler, it's also a "pit of success" situation as thinking in terms of relations instead of tables leads you towards normal forms by default.
Add the ability to automatically derive new facts based on rules and it just wins by a country mile. I recommend giving Soufflé a try.
I haven't worked with GraphDBs enough to comment on that.
greenavocado|1 year ago
Datalog starts by evaluating all possible combinations of facts and rules. It builds a bottom-up derivation of all possible facts:
a. First, it derives all direct parent relationships.
b. Then, it applies the ancestor rules iteratively until no new facts can be derived.
For the query ancestor(john, X):
It returns all X that satisfy the ancestor relationship with john. This includes mary, ann, and tom. The order of rules doesn't affect the result or termination. Datalog guarantees termination because it operates on a finite set of possible facts.
Prolog uses a top-down, depth-first search strategy with backtracking.
For the query ancestor(john, X):
a. It first tries to satisfy parent(john, X). This succeeds with X = mary.
b. It then backtracks and tries the second rule: It satisfies parent(john, Y) with Y = mary. Then recursively calls ancestor(mary, X).
c. This process continues, exploring the tree depth-first.
Prolog will find solutions in this order: mary, ann, tom.
The order of clauses can affect both the order of results and termination: If the recursive rule were listed first, Prolog could enter an infinite loop. Prolog doesn't guarantee termination, especially with recursive rules.
SQL is more verbose. The equivalent of the Datalog/Prolog example above is:
This is a more interesting example of how one might use Datalog on a large dataset: The equivalent Postgres example would be: Full example you can run yourself: https://onecompiler.com/postgresql/42khbswatdkarl|1 year ago
Is this an issue in practice? Most languages can create programs with infinite loops, but it's easy to spot in code reviews. It's been over a decade since I encountered an infinite loop in production in the backend. Just wondering if the same is true for Prolog.
dmpk2k|1 year ago