cutety | 6 years ago

Stream would technically be better; however, given that the discussion is about Map/Reduce, the only thing Stream has in common with Map/Reduce is that it's lazy. If you want something comparable (mapping is done in parallel, and reducing as well, just over partitions), you'd want to use the Flow[1] library. It does the same thing as Stream.map |> Enum.reduce, just parallelized/partitioned, and what's great is that the Flow module is more or less a drop-in replacement for Enum/Stream (with a few caveats, like calling Flow.partition before Flow.reduce). With just some quick and dirty benchmarks, you can see Flow outperforms Stream on all but the smallest data set (range 1..100), by roughly 1.1x on the medium input and 1.3x on the large one:

    with_stream = fn range ->
      range
      |> Stream.filter(&(rem(&1, 3) == 0))
      |> Stream.map(&(&1 * &1))
      |> Enum.reduce(0, &Kernel.+/2)
    end
    
    with_flow = fn range ->
      range
      |> Flow.from_enumerable()
      |> Flow.filter(&(rem(&1, 3) == 0))
      |> Flow.map(&(&1 * &1))
      |> Flow.partition()
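      # The accumulator is a one-element list, so enumerating the flow yields
      # one partial sum per partition; Enum.sum/1 then combines them.
      |> Flow.reduce(fn -> [0] end, fn val, [acc | _] ->
        [val + acc]
      end)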
      |> Enum.sum()
    end
    
    iex(4)> Benchee.run(
    iex(4)>   %{"stream" => with_stream, "flow" => with_flow},
    iex(4)>   inputs: %{"small" => 1..100, "medium" => 1..10_000, "large" => 1..10_000_000}
    iex(4)> )
    Operating System: macOS
    CPU Information: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
    Number of Available Cores: 4
    Available memory: 8 GB
    Elixir 1.9.4
    Erlang 22.2.1
    
    Benchmark suite executing with the following configuration:
    warmup: 2 s
    time: 5 s
    memory time: 0 ns
    parallel: 1
    inputs: large, medium, small
    Estimated total run time: 42 s
    
    Benchmarking flow with input large...
    Benchmarking flow with input medium...
    Benchmarking flow with input small...
    Benchmarking stream with input large...
    Benchmarking stream with input medium...
    Benchmarking stream with input small...
    
    ##### With input large #####
    Name             ips        average  deviation         median         99th %
    flow          0.0994        10.06 s     ±0.00%        10.06 s        10.06 s
    stream        0.0782        12.78 s     ±0.00%        12.78 s        12.78 s
    
    Comparison:
    flow          0.0994
    stream        0.0782 - 1.27x slower +2.72 s
    
    ##### With input medium #####
    Name             ips        average  deviation         median         99th %
    flow           83.87       11.92 ms    ±20.48%       11.30 ms       25.53 ms
    stream         74.88       13.35 ms    ±32.02%       12.32 ms       30.22 ms
    
    Comparison:
    flow           83.87
    stream         74.88 - 1.12x slower +1.43 ms
    
    ##### With input small #####
    Name             ips        average  deviation         median         99th %
    stream        4.98 K        0.20 ms    ±87.16%       0.169 ms        0.56 ms
    flow          0.70 K        1.42 ms    ±21.58%        1.35 ms        2.52 ms
    
    Comparison:
    stream        4.98 K
    flow          0.70 K - 7.06x slower +1.22 ms
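
For intuition about what "parallel map, reduce over partitions" means, here is a rough stdlib-only sketch of the same computation. It is not how Flow works internally (Flow builds on GenStage); it just splits the range into chunks, reduces each chunk in its own task, and sums the partial results. The module name PartitionedSum and the chunk_size parameter are made up for illustration.

```elixir
# Stdlib-only sketch of "map in parallel, reduce per partition, then combine".
# PartitionedSum and chunk_size are hypothetical names, not part of Flow.
defmodule PartitionedSum do
  # Sums the squares of the multiples of 3 in `range`, one task per chunk.
  def run(range, chunk_size \\ 10_000) do
    range
    |> Stream.chunk_every(chunk_size)
    # Each chunk is reduced in its own process; result order does not matter
    # because addition is commutative.
    |> Task.async_stream(&sum_chunk/1, ordered: false)
    |> Enum.reduce(0, fn {:ok, partial}, acc -> partial + acc end)
  end

  # Same filter/map/reduce pipeline as with_stream, applied to one chunk.
  defp sum_chunk(chunk) do
    chunk
    |> Stream.filter(&(rem(&1, 3) == 0))
    |> Stream.map(&(&1 * &1))
    |> Enum.reduce(0, &Kernel.+/2)
  end
end
```

PartitionedSum.run(1..100) returns 112761, the same value with_stream.(1..100) produces. As with Flow, the per-chunk overhead of spawning tasks means this only has a chance of paying off on larger inputs.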
[1] https://hexdocs.pm/flow/Flow.html
