
Dynamic Programming versus Memoization

103 points | jasonwatkinspdx | 13 years ago | blog.racket-lang.org

26 comments

[+] tbenst | 13 years ago
I had Shriram as a professor - a great lecturer. Here's a Racket implementation of the memoize function, based on class notes:

  #lang racket

  (define memo-table (box empty))
  (define-struct memo (key ans))

  (define (memoize f)
    (lambda args
      (local ([define lookup (filter (lambda (v)
                                       (equal? args (memo-key v)))
                                     (unbox memo-table))])
        (if (empty? lookup)
            (begin (local ([define ans (apply f args)])
                     (begin
                       (set-box! memo-table (cons (make-memo args ans)
                                                  (unbox memo-table)))
                       ans)))
            (memo-ans (first lookup))))))

Edit: I guess HN doesn't like my formatting. Here's a readable version http://pastie.org/4591751

[+] gus_massa | 13 years ago
You should use an empty line to make a new paragraph, and then start every line with two spaces:

  #lang racket

  (define memo-table (box empty))
  (define-struct memo (key ans))

  (define (memoize f)
    (lambda args
      (local ([define lookup (filter (lambda (v)
                                       (equal? args (memo-key v)))
                                     (unbox memo-table))])
        (if (empty? lookup)
            (begin (local ([define ans (apply f args)])
                     (begin
                       (set-box! memo-table (cons (make-memo args ans)
                                                  (unbox memo-table)))
                       ans)))
            (memo-ans (first lookup))))))
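
And a quick usage sketch (the `slow-fib` example is mine, not from the class notes):

  ;; A non-memoized, exponential-time Fibonacci:
  (define (slow-fib n)
    (if (< n 2) n (+ (slow-fib (- n 1)) (slow-fib (- n 2)))))

  (define fast-fib (memoize slow-fib))
  (fast-fib 25)  ; slow the first time, instant on repeated calls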
[+] Fixnum | 13 years ago
Be careful -- this doesn't fully memoize recursive functions unless you force the computation of intermediate results, since the recursive calls don't use memoization:

  (define memo-fib (memoize fib)) ;; example from SICP
  (memo-fib 40)                   ;; long wait

I would love to see a good general-purpose 'memoize, but I rather doubt it's possible. I think I remember seeing one in Common Lisp in "Paradigms of Artificial Intelligence Programming" that exploited CL's weird namespacing rules for functions to make recursive functions like 'fib run fast.
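
In Racket you can get that PAIP-style effect by mutating the binding the recursion goes through; a minimal sketch (the `memoize*` helper and the `set!` trick are my illustration, not the book's code):

    ;; A memoizer backed by a per-function hash table.
    (define (memoize* f)
      (define table (make-hash))
      (lambda args
        (hash-ref! table args (lambda () (apply f args)))))

    ;; Because the recursive calls look up `fib` at call time,
    ;; rebinding the name routes them through the memo table too:
    (define (fib n)
      (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2)))))
    (set! fib (memoize* fib))

    (fib 100)  ; fast: every intermediate result is cached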

[+] mattj | 13 years ago
I think this neglects a performance consideration: dynamic programming is often simpler for a compiler to optimize, since it's generally a set of nested loops. Sometimes you can vectorize the operations, but often you can simply exploit locality to keep the entire active set in L1 cache (not a compiler optimization per se).

Memoization, however, is much more of a black box. Very few compilers will handle the branching present in memoized function calls the same way they'd handle a loop (which is also a branch, but a more predictable one).

All that being said, memoization is often simpler to reason about, and it's great for infrequent computations.
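
Concretely, the bottom-up shape is just nested loops over a flat table; a sketch in Racket (the grid-paths problem and all names are my illustration, not from the article):

    ;; Count monotone lattice paths through an m-by-n grid, bottom up.
    ;; The access pattern is two nested loops over a flat vector,
    ;; which is friendly to the cache and the branch predictor.
    (define (grid-paths m n)
      (define row (make-vector n 1))  ; first row: one path to each cell
      (for ([i (in-range 1 m)])
        (for ([j (in-range 1 n)])
          (vector-set! row j
                       (+ (vector-ref row j)            ; paths from above
                          (vector-ref row (- j 1))))))  ; paths from the left
      (vector-ref row (- n 1)))

    (grid-paths 3 3)  ; => 6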

[+] rcfox | 13 years ago
This won't help everywhere, but sometimes you can give the compiler a hint that a certain branch is less likely to be taken. The computation branch is going to be slower anyway, and it only happens once per input, so tell the compiler to assume the lookup branch will be taken most of the time.
[+] lukev | 13 years ago
My lack of a formal CS background is hurting me here. Could someone explain the distinction between a tree and a DAG that's being made here?

I was under the impression that the terms were more or less synonymous.

[+] dkarl | 13 years ago
In programming lingo a tree is usually a directed acyclic graph in which every node has exactly one incoming edge (one parent) except for the root node, which has no parents. A "cycle" in a directed graph is a loop that connects a node to itself along directed edges, so "acyclic" just means there aren't any loops. In this case, that means no circular dependencies among the computations.

There's also a mathematical definition of a tree which is occasionally used in theoretical CS, so you have to be careful not to get them mixed up, but that kind of tree is a connected, undirected acyclic graph. The kind of tree they're talking about in the article is directed, because it represents computational dependencies. (If A is connected to B, then either A depends on B or B depends on A. The dependency only goes one way.)

The defining difference between a tree and a directed acyclic graph is that a node in a DAG can have more than one incoming edge. So a DAG is like a tree in which nodes can have more than one parent. Here's the simplest illustration. I can't draw arrows, but imagine each edge being directed from top to bottom, so that A is the root of the tree:

      A (a tree, therefore       A (a DAG, but not a tree)
     / \     also a DAG   )     / \
    B   C                      B   C
       /                        \ /
      D                          D
In the tree, D can have only one incoming edge (only one parent), so there can be only one path from A to D along the directed edges. In the DAG, D can have multiple incoming edges (from B and C in this case), so there can be multiple paths from A to D.

Direction is important to keep in mind (and I really wish I could draw it). Note that in the DAG, A-B-D-C-A isn't a cycle, because D-C-A goes against the direction of the edges: A->B->D<-C<-A. Also note that if the tree were not directed, it would be useless for representing dependencies, because it would not tell us whether C depends on A and D, or D depends on C and A depends on C, or maybe D depends on C, which depends on A, which depends on B. The direction of the edges is what tells us that A depends on C and not vice versa.

Here's what the article means by converting a tree into a DAG for computation. In the following illustration, I'm going to use letters to label the nodes, but different nodes in the same graph are different, even if they have the same label. Here's the tree:

        A 
       / \    
      B   C   
     /   / \
    D   E   D
     \       \
      F       F
Suppose the node labels represent computations, and the edges represent dependencies. A depends on the results of B and C, B depends on the result of D, C depends on the results of E and D, and D depends on the result of F. If you perform all the computations as they are represented in this tree, D and F will be computed twice. This might be inefficient. So you take nodes with the same label and identify them, i.e. make them the same node. That gives you a different graph, which is no longer a tree:

        A 
       / \    
      B   C   
       \ / \
        D   E
         \
          F
This DAG represents the same dependencies that the tree above does, and since nodes with identical labels have been combined, each computation is represented once. The tree representation is easier to create, because you don't have to worry about finding and combining duplicate nodes, but the DAG is more efficient to compute.

The idea of memoization is that each node in the tree should be the name of a computation, and the computation itself should be looked up by name. That way even if a name occurs multiple times in the tree, the computation it names will only occur once. The computations named "B" and "C" both depend on a computation named "D" which they will look up by name. They don't have to know they share a dependency, and they don't even have to reference the same copy of the name "D". This extra layer of indirection is the "black box of memoization" that implicitly turns the tree into a DAG to avoid replicating computations D and F.
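
To see the tree-versus-DAG difference in running code, here is a small Racket sketch (my illustration; the counter and table names are mine) that counts nodes in the naive call tree versus distinct subproblems in the memoized DAG:

    ;; Naive fib: one call per node of the call tree,
    ;; so shared subproblems are recomputed.
    (define naive-calls 0)
    (define (fib n)
      (set! naive-calls (+ naive-calls 1))
      (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2)))))

    ;; Memoized fib: one table entry per distinct argument,
    ;; i.e. one entry per node of the collapsed DAG.
    (define table (make-hash))
    (define (mfib n)
      (hash-ref! table n
                 (lambda ()
                   (if (< n 2) n (+ (mfib (- n 1)) (mfib (- n 2)))))))

    (fib 20)
    naive-calls          ; => 21891 nodes in the call tree
    (mfib 20)
    (hash-count table)   ; => 21 nodes in the DAG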

[+] nandemo | 13 years ago
A tree is also connected: if a and b are nodes of a tree, there is a path between a and b (ignoring edge directions). A DAG in general doesn't have to be connected, and even a connected DAG need not be a tree (the diamond above is connected but has two paths into D). So all trees are DAGs, but not all DAGs are trees: a DAG whose connected components are all trees is a forest, i.e. a union of trees.
[+] drunkpotato | 13 years ago
This is a really interesting way to think about computations. I mostly think of memoization for I/O things like database lookups, or long computations. I'd never thought of it in comparison to dynamic programming algorithms.
[+] spaghetti | 13 years ago
Memoization and dynamic programming come together in top-down DP algorithms, also referred to as recursion with memoization. The recursive function often looks something like:

    f(arguments) {
        if (memoTable contains arguments)
            return memoTable[arguments]   // cache hit: reuse the stored result

        results = // ...expensive recursive calls to f...
        memoTable[arguments] = results    // cache before returning
        return results
    }
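
A concrete Racket rendering of that pattern, using binomial coefficients as the recursive function (the problem choice and names are mine, for illustration):

    ;; Top-down DP: C(n, k) = C(n-1, k-1) + C(n-1, k),
    ;; with a memo table keyed on the whole argument list.
    (define memo (make-hash))
    (define (choose n k)
      (hash-ref! memo (list n k)
                 (lambda ()
                   (if (or (= k 0) (= k n))
                       1
                       (+ (choose (- n 1) (- k 1))
                          (choose (- n 1) k))))))

    (choose 40 20)  ; => 137846528820, without billions of recursive calls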
[+] irahul | 13 years ago
> I'd never thought of it in comparison to dynamic programming algorithms.

The blog post is using a different terminology. Memoization isn't part of dynamic programming; it's just that top-down dynamic programming benefits from memoization.

The top-down Fibonacci is defined as fib(n) = fib(n-1) + fib(n-2). Since the subproblems fib(n-1) and fib(n-2) overlap (they have to overlap for it to be an example of dynamic programming), memoization reduces the number of computations. But that's not to say that a non-memoizing solution isn't an example of dynamic programming.

The bottom-up dynamic programming is the iterative solution.

The blog post calls top-down DP "memoization" and bottom-up DP "dynamic programming". I don't think this terminology is common (or correct).
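
For contrast, here's what the bottom-up (iterative) version looks like in Racket (a minimal sketch of my own, not code from the post):

    ;; Bottom-up DP: start from the base cases and iterate upward,
    ;; keeping only the two most recent values.
    (define (fib n)
      (for/fold ([a 0] [b 1] #:result a)
                ([i (in-range n)])
        (values b (+ a b))))

    (fib 10)  ; => 55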

[+] hobbyist | 13 years ago
Dynamic programming is an optimization technique that divides a problem into overlapping subproblems, whereas memoization is an added optimization within the dynamic programming paradigm. The optimization we apply while calculating the Fibonacci sequence is memoization: it keeps us from walking down the tree to the leaves every time.