xashor | 5 years ago

In J (which might be slower than K) with excessive comments for a one-liner:

    echo@> (2{ARGV) -.&([: <;._1 LF, 1!:1) (3{ARGV)
    NB.    2nd arg                          3rd arg
    NB.              g&f execute f for each, then g on both
    NB.                              1!:1 read file
    NB.                          LF, prepend newline
    NB.                 [: <;._1 split based on first char
    NB.             -.  remove right elements from left array
    NB. echo@> echo each line
    exit 0

On two ~1.6MB files of ~15k lines each (identical except for 3 lines) that I had lying around:

    $ time j9 -c ./pseudo_grep.ijs test_b test_a
    …
    real   0m0.064s
    user   0m0.032s
    sys    0m0.017s
    $ time grep -vf test_b test_a
    …
    real   0m5.815s
    user   0m5.234s
    sys    0m0.576s

Note that most of the script is for loading each file into an array of lines. Most work is done by -. on the two arrays, which is exactly what you asked for, e.g. 0 1 2 3 4 -. 2 4 is 0 1 3. https://code.jsoftware.com/wiki/Vocabulary/minusdot#dyadic
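For readers who don't know J, the behaviour of -. described above can be approximated in Python. This is only a sketch of the idea, not a description of how J implements it; the function name `less` is made up for illustration, and it assumes the items are hashable (lines of text are).

```python
def less(left, right):
    """Return the items of `left` that do not occur in `right`.

    Order and duplicates of `left` are preserved, which is how the
    example 0 1 2 3 4 -. 2 4 giving 0 1 3 reads.
    """
    exclude = set(right)  # hash lookups instead of rescanning `right`
    return [item for item in left if item not in exclude]


print(less([0, 1, 2, 3, 4], [2, 4]))  # prints [0, 1, 3]
```

Building the set once keeps the whole operation roughly linear in the combined length of the two lists.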

DylanDmitri|5 years ago

In loopless Python:

    set(open('file_b')) - set(open('file_a'))

Slower than J by a factor of 2-3, but still 10x faster than grep:

    real    0m0.128s
    user    0m0.078s
    sys     0m0.063s

This would make a good Rosetta Code prompt.
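A slightly longer sketch of the same set-difference idea, for anyone who wants to run it as a script. The helper names `read_lines` and `lines_only_in` and the demo file contents are made up for illustration. One thing to be aware of with the one-liner: iterating a file yields lines with their trailing newline, so a final line without one will not compare equal to the same text elsewhere. The version below strips the newline first. Like any set-based approach, it drops duplicates and does not keep the original line order.

```python
import os
import tempfile


def read_lines(path):
    """Read a text file as a list of lines without the trailing newline."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def lines_only_in(path_b, path_a):
    """Return the set of lines that occur in path_b but not in path_a."""
    return set(read_lines(path_b)) - set(read_lines(path_a))


# Small demonstration with throwaway files (contents are invented).
with tempfile.TemporaryDirectory() as tmp:
    file_a = os.path.join(tmp, "file_a")
    file_b = os.path.join(tmp, "file_b")
    with open(file_a, "w", encoding="utf-8") as handle:
        handle.write("one\ntwo\nthree\n")
    with open(file_b, "w", encoding="utf-8") as handle:
        handle.write("one\nfour\nthree")  # note: no trailing newline
    result = lines_only_in(file_b, file_a)

print(result)  # prints {'four'}
```

Without the `rstrip`, the demo would also report `three`, because `"three"` and `"three\n"` are different strings.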

throwaway_pdp09|5 years ago

I didn't know you could simply open a file and setify it. Interesting. & neat.

qmmmur|5 years ago

this is cheeky, I like it

dunefox|5 years ago

This is only fast because it hits C underneath, isn't it?

kbenson|5 years ago

That grep is not doing the same thing as the code, nor necessarily what the exercise requires. By default, grep tests patterns, so it's turning all those entries into individual regular expressions. You want to use fgrep, or the -F flag, to make it treat all the source matches as fixed strings.

In my simple test, that resulted in grep running in 44% of the time it previously required (still more than Python, though).
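To make the regex versus fixed-string distinction concrete, here is a rough Python emulation of the three behaviours. It is only an illustration: Python's `re` syntax is not the same as grep's default pattern syntax, the function names are invented, and the third variant corresponds to what I understand grep's -x (whole-line) option to do when combined with -F, which is the closest match to the set-difference programs in this thread.

```python
import re


def grep_v_regex(patterns, lines):
    """Keep lines that match none of `patterns`, each treated as a regex."""
    compiled = [re.compile(p) for p in patterns]
    return [line for line in lines if not any(c.search(line) for c in compiled)]


def grep_v_fixed(patterns, lines):
    """Keep lines that contain none of `patterns` as a literal substring."""
    return [line for line in lines if not any(p in line for p in patterns)]


def grep_v_whole_line(patterns, lines):
    """Keep lines that are not exactly equal to any of `patterns`."""
    exclude = set(patterns)
    return [line for line in lines if line not in exclude]


patterns = ["a.c"]
lines = ["abc", "a.c", "xa.cx", "def"]

print(grep_v_regex(patterns, lines))       # prints ['def']
print(grep_v_fixed(patterns, lines))       # prints ['abc', 'def']
print(grep_v_whole_line(patterns, lines))  # prints ['abc', 'xa.cx', 'def']
```

The regex variant removes `abc` because `.` matches any character, and even the fixed-string variant removes `xa.cx` because it matches substrings. Only the whole-line variant behaves like a set difference, and it is also the only one that can use a single hash lookup per line.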

1vuio0pswjnm7|5 years ago

Apologies for the careless omission. I tested the difference on a larger job; with grep 28s, with fgrep 22s.