top | item 40994972

(no title)

pfilo8 | 1 year ago

Could you explain more or point out some interesting references? I'm currently trying to understand how Datalog compares to SQL and, potentially GraphDBs

discuss

felixyz|1 year ago

TypeDb is a practical Datalog-based database system [1] (with a different syntax). TerminusDb is a project in a similar vein [2], but actually an RDF store at its core. If you want to experiment with the connections between Datalog, relational algebra, and SQL, check out the Datalog Educational System. And if you want to jump into the theory, Foundations of Databases (the "Alice book") is very thorough but relatively readable [4]! Oh, and there's a Google project, Logica, to do Datalog over Postgres databases [5].

[1]: https://typedb.com/ [2]: https://terminusdb.com/ [3]: http://www.fdi.ucm.es/profesor/fernan/des/ [4]: http://webdam.inria.fr/Alice/ [5]: https://github.com/evgskv/logica

burakemir|1 year ago

Mangle is a language that includes "textbook datalog" as a subset https://github.com/google/mangle ; like any real-world datalog language, it extends datalog with various facilities to make it practical.

It was discussed on HN https://news.ycombinator.com/item?id=33756800 and is implemented in go. There is the beginnings of a Rust implementation meanwhile.

If you are looking for datalog in the textbooks, here are some references: https://github.com/google/mangle/blob/main/docs/bibliography...

A graph DBs short intro to datalog: just like the edges of a graph could be represented as a simple table (src, target), you could consider a database tuple or a datalog or prolog fact foo(x1, ..., xN) as a "generalized edge." The nice thing about datalog is then that as one is able to express a connections in an elegant way as "foo(...X...), bar(...X...)" (a conjunction, X being a "node"), whereas in the SQL world one has to deal with a clumsy JOIN statement to express the same thing.

sirwhinesalot|1 year ago

Don't have any interesting references, sorry. My reasoning is mainly one of simplicity and power. In SQL you need to think in terms of tables, inner joins, outer joins, foreign keys etc. whereas datalog you do everything with relations as in prolog.

Not only is it conceptually much simpler, it's also a "pit of success" situation as thinking in terms of relations instead of tables leads you towards normal forms by default.

Add the ability to automatically derive new facts based on rules and it just wins by a country mile. I recommend giving Soufflé a try.

I haven't worked with GraphDBs enough to comment on that.

greenavocado|1 year ago

Prolog and Datalog example (they are identical in this case)

    % Facts
    parent(john, mary).
    parent(mary, ann).
    parent(mary, tom).

    % Rules
    ancestor(X, Y) :- parent(X, Y).
    ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).

    % Query
    ?- ancestor(john, X).

The Prolog code looks identical to Datalog but the execution model is different. Prolog uses depth-first search and backtracking, which can lead to infinite loops if the rules are not carefully ordered.

Datalog starts by evaluating all possible combinations of facts and rules. It builds a bottom-up derivation of all possible facts:

a. First, it derives all direct parent relationships.

b. Then, it applies the ancestor rules iteratively until no new facts can be derived.

For the query ancestor(john, X):

It returns all X that satisfy the ancestor relationship with john. This includes mary, ann, and tom. The order of rules doesn't affect the result or termination. Datalog guarantees termination because it operates on a finite set of possible facts.

Prolog uses a top-down, depth-first search strategy with backtracking.

For the query ancestor(john, X):

a. It first tries to satisfy parent(john, X). This succeeds with X = mary.

b. It then backtracks and tries the second rule: It satisfies parent(john, Y) with Y = mary. Then recursively calls ancestor(mary, X).

c. This process continues, exploring the tree depth-first.

Prolog will find solutions in this order: mary, ann, tom.

The order of clauses can affect both the order of results and termination: If the recursive rule were listed first, Prolog could enter an infinite loop. Prolog doesn't guarantee termination, especially with recursive rules.

SQL is more verbose. The equivalent of the Datalog/Prolog example above is:

    -- Create and populate the table
    CREATE TABLE Parent (
        parent VARCHAR(50),
        child VARCHAR(50)
    );

    INSERT INTO Parent VALUES ('john', 'mary');
    INSERT INTO Parent VALUES ('mary', 'ann');
    INSERT INTO Parent VALUES ('mary', 'tom');

    -- Recursive query to find ancestors
    WITH RECURSIVE Ancestor AS (
        SELECT parent, child
        FROM Parent
        UNION ALL
        SELECT a.parent, p.child
        FROM Ancestor a
        JOIN Parent p ON a.child = p.parent
    )
    SELECT DISTINCT parent AS ancestor
    FROM Ancestor
    WHERE child IN ('ann', 'tom');

This is a more interesting example of how one might use Datalog on a large dataset:

    % Define the base relation
    friend(Person1, Person2).

    % Define friend-of-friend relation
    friend_of_friend(X, Z) :- friend(X, Y), friend(Y, Z), X != Z.

    % Define potential friend recommendation
    % (friend of friend who is not already a friend)
    recommend_friend(X, Z) :- friend_of_friend(X, Z), not friend(X, Z).

    % Count mutual friends for recommendations
    mutual_friend_count(X, Z, Count) :- 
        recommend_friend(X, Z),
        Count = count{Y : friend(X, Y), friend(Y, Z)}.

    % Query to get top friend recommendations for a person
    top_recommendations(Person, RecommendedFriend, MutualCount) :-
        mutual_friend_count(Person, RecommendedFriend, MutualCount),
        MutualCount >= 5,
        MutualCount = max{C : mutual_friend_count(Person, _, C)}.

The equivalent Postgres example would be:

    WITH RECURSIVE
    -- Base friend relation
    friends AS (
        SELECT DISTINCT person1, person2
        FROM friendship
        UNION
        SELECT person2, person1
        FROM friendship
    ),

    -- Friend of friend relation
    friend_of_friend AS (
        SELECT f1.person1 AS person, f2.person2 AS friend_of_friend
        FROM friends f1
        JOIN friends f2 ON f1.person2 = f2.person1
        WHERE f1.person1 <> f2.person2
    ),

    -- Potential friend recommendations
    potential_recommendations AS (
        SELECT fof.person, fof.friend_of_friend, 
            COUNT(*) AS mutual_friend_count
        FROM friend_of_friend fof
        LEFT JOIN friends f ON fof.person = f.person1 AND fof.friend_of_friend = f.person2
        WHERE f.person1 IS NULL  -- Ensure they're not already friends
        GROUP BY fof.person, fof.friend_of_friend
        HAVING COUNT(*) >= 5  -- Minimum mutual friends threshold
    ),

    -- Rank recommendations
    ranked_recommendations AS (
        SELECT person, friend_of_friend, mutual_friend_count,
            RANK() OVER (PARTITION BY person ORDER BY mutual_friend_count DESC) as rank
        FROM potential_recommendations
    )

    -- Get top recommendations
    SELECT person, friend_of_friend, mutual_friend_count
    FROM ranked_recommendations
    WHERE rank = 1;

Full example you can run yourself: https://onecompiler.com/postgresql/42khbswat

dkarl|1 year ago

> Prolog uses depth-first search and backtracking, which can lead to infinite loops if the rules are not carefully ordered

Is this an issue in practice? Most languages can create programs with infinite loops, but it's easy to spot in code reviews. It's been over a decade since I encountered an infinite loop in production in the backend. Just wondering if the same is true for Prolog.

dmpk2k|1 year ago

How does the Datalog approach compare with RETE?