
Transducers are coming to Clojure

319 points | siavosh | 11 years ago | blog.cognitect.com

100 comments

[+] tel|11 years ago|reply
This sort of reminds me of the Church-encoded form of a list.

    newtype Fold a = Fold (forall r . (a -> r -> r) -> r -> r)

    fold :: [a] -> Fold a
    fold xs = Fold (spin xs) where
      spin []     cons nil = nil
      spin (a:as) cons nil = cons a (spin as cons nil)

    refold :: Fold a -> [a]
    refold (Fold f) = f (:) []
Notably, since `fold` and `refold` form an isomorphism, everything we can do to `[a]` we can also do to `Fold a`:

    map :: (a -> b) -> (Fold a -> Fold b)
    map x (Fold f) = Fold $ \cons nil -> f (cons . x) nil

    filter :: (a -> Bool) -> Fold a -> Fold a
    filter p (Fold f) =
      Fold $ \cons nil -> f (\a r -> if p a then cons a r else r) nil
but all of this work is done without concrete reference to `(:)` and `[]`... you instead just use stand-ins I've been calling cons and nil. What's nice about this is that `Fold` can be used to build anything which can be "constructed from the left"

    foldSet :: Fold a -> Set a
    foldSet (Fold f) = f Set.insert Set.empty
It's sort of dual to the stuff I was exploring in Swift here [0]. It also creates laziness for free because you can't really execute the chain until the end—Church-encoding is really a form of continuation passing.

The downside of this idea is that each time you "consume" a Fold you redo work—there's no place to put caching necessarily.

Maybe that's what they're solving with the Fold transformers representation.

[0] http://tel.github.io/2014/07/30/immutable_enumeration_in_swi...

[+] richhickey|11 years ago|reply
Kind of. The idea is to get out of the context of the 'whole job' (the ->r->r bit above) and focus on transformations of the step function (a->r->r) -> (b->r->r) {using your arg order above}. Not talking about the whole job (i.e. the source and result) makes for much more highly reusable components, especially when the jobs don't produce concrete results but, e.g., run indefinitely, like channel transformations.
[+] wyager|11 years ago|reply
For those curious about this, look up Haskell's `build` and `destroy` functions. Those functions are for Church-encoding lists and doing optimizations that way.
[+] lmm|11 years ago|reply
So is this actually just the Free monad?
[+] vbit|11 years ago|reply
I'm not quite sure what this means, so here's my attempt to translate this into Python. A reducer is a function such as `add`:

    def add(sum, num):
      return sum + num
  
Of course you can plug `add` directly in `reduce(add, [1, 2, 3], 0)` which gives `6`.

A transducer is an object returned by a call such as `map(lambda x: x + 1)`.

You can now apply the transducer to a reducer and get another reducer.

    map_inc = map(lambda x: x + 1)
    add_inc = map_inc(add)
    
Our first reducer simply added, but the next one increments and then adds. We can use it as `reduce(add_inc, [1, 2, 3], 0)` which gives, I'm guessing, `9`.

Since the transducer returns a reducer as well, we can compose transducers:

     r1 = filter(is_even)(map(increment)(add))
     # use r1 in reduce()
     
It seems that in Clojure, reduce() isn't the only useful function that works with reducers; there are others, which makes this all worthwhile.

Is my translation accurate?
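Spelled out as runnable Python 3 (using hypothetical `map_t`/`filter_t` names so as not to shadow the builtins, whose semantics differ), the sketch above behaves as guessed:

```python
from functools import reduce

def add(total, num):
    return total + num

# A transducer factory: takes a transformation, returns a function
# from reducer to reducer.
def map_t(f):
    def transducer(reducer):
        def new_reducer(acc, x):
            return reducer(acc, f(x))
        return new_reducer
    return transducer

def filter_t(pred):
    def transducer(reducer):
        def new_reducer(acc, x):
            return reducer(acc, x) if pred(x) else acc
        return new_reducer
    return transducer

add_inc = map_t(lambda x: x + 1)(add)
print(reduce(add_inc, [1, 2, 3], 0))  # 2 + 3 + 4 = 9

is_even = lambda x: x % 2 == 0
r1 = filter_t(is_even)(map_t(lambda x: x + 1)(add))
print(reduce(r1, [1, 2, 3], 0))  # only 2 passes the filter; 2 + 1 = 3
```

Note that `filter_t(is_even)` sits outermost, so the evenness test runs on the raw element, before the increment.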

[+] bcoates|11 years ago|reply
If I understand right (I may not):

In Python, the (complex, generic) iter() protocol is how every container provides a method to iterate itself, and reduce is a trivial application of a reducer (like your add function) to a container by using the iter protocol.

In Clojure, it's the opposite: there is a reduce() protocol and every reducible container knows how to reduce itself. The traditional reduce function just takes a reducer and a reducible and performs the reduce protocol on it.

As there's a bunch of different kinds of containers with different rules, the best way to implement the reduce protocol on them is different for each, and there might be weird specialized reducers that are parallel or asynchronous or lazy or whatever.

Transducers allow you to composably transform reducers into different reducers, which can then be handed to the reduce protocol. As it turns out, most (all?) normal container->container operations, like map and filter, have corresponding transducers.

Here's the good part: if you have a reduce protocol and a complete set of tranducers, containers don't need to be mappable. The map(fn, iterable) function needs to know how to iterate; mapping(fn) doesn't care about containers at all, and is fully general to mapping any reducible for free. So you can write a transducer to produce the effect of any map or filter-like operation without touching iter().

As an added bonus, the code is more efficient:

  reduce(add, map(lambda x: x + 1, xrange(10**9)), 0)
eagerly builds a gigantic mapped list (barring sophisticated laziness or loop-fusion optimizations), but

  reduce(mapping(lambda x: x + 1)(add), xrange(10**9), 0)
is equivalent and trivially runs in O(1) space on one element of xrange at a time.

PS: Python translation of mapping from http://clojure.com/blog/2012/05/15/anatomy-of-reducer.html converted to named functions to be more Pythonic:

  def mapping(transformation):
    def transducer(reducer):
      def new_reducer(accum, next):
        return reducer(accum, transformation(next))
      return new_reducer
    return transducer
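In Python 3 terms (functools.reduce, and range in place of xrange), a quick sanity check of the mapping above:

```python
from functools import reduce

def mapping(transformation):
    def transducer(reducer):
        def new_reducer(accum, item):
            return reducer(accum, transformation(item))
        return new_reducer
    return transducer

def add(accum, x):
    return accum + x

# Same answer as reduce(add, map(lambda x: x + 1, range(10)), 0),
# but no intermediate mapped sequence is ever materialized.
print(reduce(mapping(lambda x: x + 1)(add), range(10), 0))  # 55
```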
[+] dopamean|11 years ago|reply
I'm also hoping for an informative answer to this question. Anyone?
[+] pron|11 years ago|reply
I think that a good way to understand transducers is to look at their implementation (shortened a bit). Here it is for map:

    ([f]
    (fn [f1]
      (fn
        ([result input]
           (f1 result (f input)))
        ([result input & inputs]
           (f1 result (apply f input inputs))))))
filter:

    ([pred]
    (fn [f1]
      (fn
        ([result input]
           (if (pred input)
             (f1 result input)
             result)))))
And it gets more interesting with take:

    ([n]
     (fn [f1]
       (let [na (atom n)]
         (fn
           ([result input]
              (let [n @na
                    nn (swap! na dec)
                    result (if (pos? n)
                             (f1 result input)
                             result)]
                (if (not (pos? nn))
                  (reduced result) ; a terminal value indicating "don't reduce further"
                  result)))))))
The transducer is supplied with the reducer next in the chain (f1) and returns a reducer function that gets fed with the reduced value by the preceding reduction (result) and the next element (input). Note how the take transducer maintains internal state with an atom. This could get a little tricky for more elaborate reductions, as how the internal state is maintained might have a significant effect on performance, depending on exactly how the reduction is performed. For example, if the reduction is done in parallel (say, with fork-join), then an internal state that's updated with locks (like refs) might significantly slow down -- or even deadlock -- the reduction.
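A rough Python analogue of the stateful take above, with a hypothetical `Reduced` wrapper and a minimal `transduce` driver standing in for Clojure's reduced/reduce machinery:

```python
from functools import reduce

class Reduced:
    """Hypothetical stand-in for Clojure's `reduced`: signals early termination."""
    def __init__(self, value):
        self.value = value

def taking(n):
    def transducer(reducer):
        remaining = [n]  # mutable cell, like the atom in the Clojure version
        def new_reducer(result, item):
            remaining[0] -= 1
            if remaining[0] >= 0:
                result = reducer(result, item)
            if remaining[0] <= 0:
                return Reduced(result)  # don't reduce further
            return result
        return new_reducer
    return transducer

def transduce(xform, reducer, init, coll):
    """Minimal reduce loop that honors the Reduced sentinel."""
    rf = xform(reducer)
    acc = init
    for item in coll:
        acc = rf(acc, item)
        if isinstance(acc, Reduced):
            return acc.value
    return acc

def add(acc, x):
    return acc + x

print(transduce(taking(3), add, 0, range(100)))  # 0 + 1 + 2 = 3
```

Because `taking` closes over a fresh mutable cell each time it is applied to a reducer, the resulting reducer is single-use, which is exactly the kind of internal-state subtlety the paragraph above warns about.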

AFAICT mapcat still only returns lazy-seqs.

[+] richhickey|11 years ago|reply
Because mapcat's signature was not amenable to the additional arity, there's now also flatmap (note you can write the lazy collection version of any transducer fn using sequence as below):

    (defn flatmap
      "maps f over coll and concatenates the results.  Thus function f
      should return a collection.  Returns a transducer when no collection
      is provided."
      ([f]
       (fn [f1]
         (fn
           ([] (f1))
           ([result] (f1 result))
           ([result input]
              (reduce (preserving-reduced f1) result (f input))))))
    
      ([f coll] (sequence (flatmap f) coll)))
[+] tel|11 years ago|reply
Interesting. Continuing to try to analyze these in Haskell, I think this is a direct translation:

    -- z is just there to not clobber some standard prelude names
    type Red r a = r -> a -> r

    zmap :: (b -> a) -> Red r a -> Red r b
    zmap f f1 result input = f1 result (f input)

    zfilt :: (a -> Bool) -> Red r a -> Red r a
    zfilt p f1 result input = if p input then f1 result input else result

    ztake :: Int -> Red r a -> (r -> a -> Maybe r)
    ztake n f1 = run n where
      run n result input =
        -- note: rebinding `result` in a `let` would be a recursive
        -- binding in Haskell, so the updated value is named result'
        let n'      = n - 1
            result' = if n >= 0 then f1 result input else result
        in if n' == 0 then Nothing else Just result'
I wanted to post this mostly to note that `map` is mapping contravariantly here. Is there something I'm missing? I had that occurring when I was playing around with this idea before looking at the source as well... to fix it I had to consider the type `Red r a -> r` so that the `a` was covariant again.
[+] oafitupa|11 years ago|reply
As someone who tried Clojure and failed, serious question: Does anyone actually use all these crazy features/patterns that keep getting added/discovered and talked about?

I ask because even though I can imagine someone smart mastering these things and programming faster, I can't imagine a second person being able to understand his code, maintain it, and generally be productive. I imagine the second person losing a lot of time trying to understand what is going on, or even thinking he understood but in reality he didn't and messing things up.

So how do you even form a Clojure team?

[+] JackMorgan|11 years ago|reply
It's a mindset change.

Other than macros, which fundamentally alter the flow of code in a very non-standard way, the rest of these "crazy patterns and features" in Clojure are just like the crazy libraries used by typical Java/Ruby/C# developers, only thousands of times simpler. If I came to you and said, "does anyone even use these tens of thousands of libraries in Maven? How do other developers work on this afterwards, they'd have to learn all these new APIs!" I'd likely get a response like, "they'd just be expected to", with a chuckle. The mindset I've seen a lot is that language features are "too hard" to learn and should be avoided, but library complexity is beyond reproach and is rarely questioned.

Clojure the language takes the approach that it's better to provide a dozen simple but incredibly composable tools that can easily replace a lot of duplicated boilerplate in other languages. Like these transducers: in Java/C# they'd likely be one of the design patterns that needs a whole set of classes to do what he shows in a few characters. Would you rather learn to write, read, and maintain a handful of classes, or just learn what those few characters mean? I don't get paid by the line, and any abstraction built into the language like that is a few more classes I don't have to maintain and possibly implement incorrectly in some subtle way.

Like I said, it's just a mindset change. I know a company that only uses Clojure, and they hire a lot of entry level developers who haven't yet gotten stuck in a mindset that "languages are too hard; libraries are easy". They have great success with that, and their entry level guys are up to speed much faster than those in my shop, using .NET, where they have to learn a new language AND a mountain of boilerplate code using dozens of libraries and frameworks.

[+] jeletonskelly|11 years ago|reply
I work in a Clojure shop with about 10 other developers. Until very recently, we hired folks with zero exposure to Clojure or even functional programming. Do we use all the new bells and whistles that Rich and the folks at Cognitect develop for Clojure? Yes, but we're not jumping into the existing code base and refactoring everything to use core.async or reducers. Usually, one or two guys will use it in a less public, less critical piece of the infrastructure as a real-world application and then we do a code review with the whole team.

So, how do you build a Clojure team, exactly? Don't try to exclusively hire developers who have already been exposed to it or who have master's degrees from elite universities; focus on finding people who love to program, who are genuinely interested in improving their own abilities, and who can clearly hold their own in the language they currently use. You will soon have a team of developers who love the challenge of keeping up with what guys like Rich Hickey (and Cognitect) are doing. We have been very successful with this.

[+] adambard|11 years ago|reply
Something I've noticed about Clojure code is that people tend to have taken the effort to express a problem in a concise and logical way, even for very complex problems. I'm not sure if it's just because Clojure attracts people who value good code, or because Clojure provides the tools to do it, but it sure is nice. Better to spend an hour on a 5-line function than a 50-line one, if they accomplish the same thing in the end.
[+] angusiguess|11 years ago|reply
In my experience the answer is yes, but there isn't necessarily a hurry. I work in a small shop with 3 other Clojure developers who vary in experience from pretty green (I'd dabbled a bunch but have only been writing it professionally for about 4 months) to wizard (people who are going on two years, have written the majority of the libraries we use, and are fluent in pretty much everything all the way up the stack). The learning curve for me has been like this:

Stage 1: Deep end. Given a task, I crack open an existing application and spend half a day to a day just trying to read it and figure it out. I struggle with my notions of state and immutability, and make and test some changes until I think I have it figured out. It's slow and arduous; at this point I'm reading Chas Emerick's book and mostly writing very standard Clojure, dipping into and experimenting with things like multimethods and atoms.

Stage 2: Local maximum. I'm comfortable with some of the better patterns to pass around and manage state, make a lot of use of anonymous functions, and a lot of my code looks like a let to establish useful local variables followed by a return. I get comfortable with writing programs either from the top down or bottom up, slowly building to a point where everything ties together. I start dipping my toes into core.async.

Stage 3: I get very comfortable with core.async and appreciate channels as a really nice way to separate concerns; you can build a whole bunch of neat little machines with channels. Sometimes I go a little overboard and roll something back to just using functions.

Stage 4: Start writing code where you realize you could use a macro. Macros feel about as hard as stage 1, with careful REPL development and scrabbling to build some kind of conceptual framework where the whole thing hangs together. Transducers come out, I read about them and understand them conceptually, and get pretty excited about the use of pipelines, which present a much nicer way to chain channels together (where before you might use a map or write custom functions to do channel transformations, but they don't prove to be very composable).

That's pretty much where I'm at right now. I'm comfortable, but there's still a lot of stuff I haven't really jumped into. One nice thing about most of these constructs is that if my conceptual grasp is wrong, things usually don't work at all, and that's a good time to ask questions to others.

As far as building a team? If progressing through those stages and having days that are frustrating followed by some big breakthroughs seems appealing, that's probably a good indicator that you'd fit into this sort of environment. When I was dabbling I made some progress and started to understand some of these mechanisms, but sometimes you need a problem staring you in the face that requires some fancy moves to solve to motivate you to push past what you know.

It seems daunting at first, but I sort of think that all of programming is. Practical knowledge can be acquired through experience, but it's expanding how you work theoretically that is the hardest and most rewarding.

Just a datapoint, anyway.

[+] calibraxis|11 years ago|reply
This sentence is important: "There will be more explanations, tutorials and derivations to follow, here and elsewhere." If a team ever adopts transducers, it'll generally be after a lot of explanations exist and the feature has been made easy to understand.

Clojure teams can defend themselves against exotic features like any other team: code review, conventions, etc. If I pull in some weird new feature, others had better be ok with it.

(Obviously, what I'm saying doesn't apply to every team in the world; presumably there's a spectrum of ways to deal, or fail to deal, with this problem.)

In my view, Clojure teams are formed via a political process, like any other programming language adoption.

[+] undershirt|11 years ago|reply
I just saw this morning that many functions in core.async are marked "Deprecated - this function will be removed. Use transformer instead." I guess transducers will provide a generic replacement for those. Looking forward to seeing some examples.
[+] unlogic|11 years ago|reply
This looks exciting, but I'm confused about the decision to add extra arity to collection-manipulating functions. "filter" that returns a collection or a transducer depending only on arity seems a little counter-intuitive.
[+] swannodette|11 years ago|reply
It's a pretty well considered tradeoff in my opinion - no existing code breaks while at the same time all the transformation functions now have the same semantic interpretation when used in different contexts. The alternative would be to replicate the notion of `map`, `filter`, etc. again and again as occurred with reducers and higher level core.async operations on channels.
[+] graycat|11 years ago|reply
I'm sorry, but from the OP I can't be at all sure I can understand notation such as:

     ;;reducing function signature
     whatever, input -> whatever
or

     ;;transducer signature
     (whatever, input -> whatever) -> (whatever, input -> whatever)
Or, in mathematics there is some notation

     f: A --> B
where A and B are sets and f is a function. This notation means that for each element x in set A, function f returns value f(x) in set B. Seems clear enough. Maybe the notation in the OP is related? I can't be sure I'm guessing correctly.
[+] ww520|11 years ago|reply
It's not a formal notation. It's talking about a pattern in function signature. The function takes in some parameters (whatever, input) and spits out an output ( -> whatever ).

The "reducing function signature" basically is just the function signature of a "reducer" (or "fold") function in the map/reduce (or map/fold) pattern.

The "whatever" is kind of sloppy and confusing. It's the accumulating memo parameter of a reducer function. To be precise, reduce(reducer, list, init_value) -> value takes in another function, the reducer, which has the signature reducer(accumulating_memo, current_element) -> new_memo and is run against each element of the list.

The "transducer" is basically a function to take in a reducer and spit out another reducer.

E.g., you have a reducer that sums up all the elements of a list. A doubling transducer would take that reducer and produce another one that doubles each element before summing them up.
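In sketch form (hypothetical `summing`/`doubling` names):

```python
from functools import reduce

def summing(memo, x):      # a reducer: (accumulating memo, element) -> new memo
    return memo + x

def doubling(reducer):     # a transducer: reducer in, reducer out
    def new_reducer(memo, x):
        return reducer(memo, 2 * x)
    return new_reducer

print(reduce(summing, [1, 2, 3], 0))            # 6
print(reduce(doubling(summing), [1, 2, 3], 0))  # 12
```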

[+] richhickey|11 years ago|reply
a la Haskell:

    ;;reducing fn
    x->a->x

    ;;transducer fn
    (x->a->x)->(x->b->x)
[+] davdar|11 years ago|reply
Clojure transducers are exactly signal functions from Haskell FRP literature, for those interested in such a connection.
[+] richhickey|11 years ago|reply
I'm not yet seeing that, given:

    Signal a :: Time -> a
    SF a b   :: Signal a -> Signal b

thus

    (Time -> a) -> (Time -> b)

not exactly:

(x->a->x) -> (x->b->x)

Can you point me to a paper that makes the connection clear?

[+] fnordsensei|11 years ago|reply
Tentative benchmark results have surfaced: https://github.com/thheller/transduce-bench

Add salt according to taste.

[+] mduerksen|11 years ago|reply
The benchmark is wrong atm; the compared functions do not yield the same result.

(comp (map inc) (filter even?)) means filtering first, then incrementing.

The opposite for (->> data (map inc) (filter even?) ...

- which is not the same. And of course, it also means that the latter has to increment twice as many numbers.

EDIT: It was me who was wrong, thanks for the corrections. Pitfall successfully identified :)
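The pitfall is easy to check in Python: composing transducers with plain function composition applies the transformations in the order listed, the same element order as the threaded version (a sketch with hypothetical mapping/filtering helpers):

```python
from functools import reduce

def mapping(f):
    def transducer(reducer):
        return lambda acc, x: reducer(acc, f(x))
    return transducer

def filtering(pred):
    def transducer(reducer):
        return lambda acc, x: reducer(acc, x) if pred(x) else acc
    return transducer

def comp(f, g):
    return lambda x: f(g(x))

def conj(acc, x):
    return acc + [x]

inc = lambda x: x + 1
even = lambda x: x % 2 == 0

# comp(mapping(inc), filtering(even)) increments first, THEN filters,
# matching the element order of (->> data (map inc) (filter even?)).
xform = comp(mapping(inc), filtering(even))
print(reduce(xform(conj), [1, 2, 3, 4], []))  # [2, 4]

threaded = list(filter(even, map(inc, [1, 2, 3, 4])))
print(threaded)  # [2, 4]
```

Function composition reads right to left, but because each transducer wraps the reducer it receives, the leftmost transducer touches each element first.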

[+] tel|11 years ago|reply
This is about as close as I could get in Haskell so far. It uses a slight twist on (x -> a -> x) called a Fold (which has a lot of great properties—it's a profunctor, an applicative, and a comonad).

Nicely, this construction lets us write `take` purely!

    {-# LANGUAGE GADTs         #-}
    {-# LANGUAGE RankNTypes    #-}
    {-# LANGUAGE TypeOperators #-}

    import           Control.Arrow
    import           Control.Category
    import qualified Prelude
    import           Prelude hiding (id, (.))

    data Fold a r where
      Fold :: (a -> x -> x) -> x -> (x -> r) -> Fold a r

    data Pair a b = Pair !a !b

    pfst :: Pair a b -> a
    pfst (Pair a b) = a

    psnd :: Pair a b -> b
    psnd (Pair a b) = b

    newtype (~>) a b = Arr (forall r . Fold b r -> Fold a r)

    instance Category (~>) where
      id = Arr id
      Arr f . Arr g = Arr (g . f)

    amap :: (a -> b) -> (a ~> b)
    amap f = Arr (\(Fold cons nil fin) -> Fold (cons . f) nil fin)

    afilter :: (a -> Bool) -> (a ~> a)
    afilter p = Arr $ \(Fold cons nil fin) ->
      let cons' = \a x -> if p a then cons a x else x
      in Fold cons' nil fin

    fold :: Fold a r -> [a] -> r
    fold (Fold cons nil fin) = fin . spin where
      spin []     = nil
      spin (a:as) = cons a (spin as)

    asequence :: (a ~> b) -> ([a] -> [b])
    asequence (Arr f) = fold (f (Fold (:) [] id))
    
    aflatmap :: (a -> [b]) -> (a ~> b)
    aflatmap f = Arr $ \(Fold cons nil fin) ->
      Fold (\a x -> foldr cons x (f a)) nil fin
    
    atake :: Int -> (a ~> a)
    atake n = Arr $ \(Fold cons nil fin) ->
      let cons' = \a x n -> if n > 0 then cons a (x (n-1)) else x n
      in Fold cons' (const nil) (\x -> fin (x n))
[+] phobbs|11 years ago|reply
Brilliant, this clears up the usage a lot more than the rest of the prose in the comment thread.
[+] dustingetz|11 years ago|reply
Not sure I understand - so Clojure is getting first class support for lazy collections and curried combinators? Or am I missing the important part?
[+] _halgari|11 years ago|reply
Many Clojure abstractions need things like map, filter, concat, etc. Reducers are eager, lazy-seqs are lazy, and core.async channels are lazy and push-based. This is a way to unify all these abstractions so that you can write a single map/filter/concat transform once, and use it in many different ways.
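A toy Python sketch of that decoupling (hypothetical `mapping` helper): one transform, written once, reused in several contexts:

```python
from functools import reduce

def mapping(f):
    def transducer(reducer):
        return lambda acc, x: reducer(acc, f(x))
    return transducer

xform = mapping(lambda x: x * 10)   # written once

# ...used eagerly to build a list:
print(reduce(xform(lambda acc, x: acc + [x]), [1, 2, 3], []))  # [10, 20, 30]

# ...used to sum:
print(reduce(xform(lambda acc, x: acc + x), [1, 2, 3], 0))     # 60

# ...used lazily over a generator, one element at a time:
def lazily(xf, items):
    rf = xf(lambda _, x: x)         # reducer that just returns the transformed item
    for item in items:
        yield rf(None, item)

print(list(lazily(xform, iter([1, 2, 3]))))                    # [10, 20, 30]
```

The transform never mentions lists, sums, or generators; only the final reducer (or driver) decides what "consuming" means.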
[+] jzelinskie|11 years ago|reply
I think this is just announcing that the Clojure core libraries are gaining another arity for many functions specifically for this usage so that you don't have to use (partial) or the short-hand #() syntax to use this pattern. I assume the goal is more community awareness and cleaner syntax for the re-use of this pattern.
[+] nohat00|11 years ago|reply
> "these transformers were never exposed a la carte, instead being encapsulated by the macrology of reducers."

What does 'macrology' mean in this context? Is this a common usage? Or a novel application of a word that ordinarily means "long and tedious talk without much substance"?