justinsaccount | 7 months ago
SELECT *
FROM passwords
WHERE (password LIKE '%password%') AND (password LIKE '%123456%')
ORDER BY user ASC
INTO OUTFILE '/tmp/res.txt'
Query id: 9cafdd86-2258-47b2-9ba3-2c59069d7b85
12209 rows in set. Elapsed: 2.401 sec. Processed 1.40 billion rows, 25.24 GB (583.02 million rows/s., 10.51 GB/s.)
Peak memory usage: 62.99 MiB.
And this is on a Xeon W-2265 from 2020.
If you don't want to use ClickHouse, you could try DuckDB or DataFusion (which is also written in Rust).
In general, the way I'd make your program faster is to not read the data line by line. You probably want to read much bigger chunks, make sure each chunk ends on a line boundary, and then search those larger chunks for your strings. Or look into using mmap and searching for your strings without issuing explicit reads on the files at all.
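Here's a rough sketch of the chunked approach, since I haven't seen your actual code. The names `search_chunked` and `contains` and the chunk size parameter are all made up for illustration, and the substring check is deliberately naive (a real version would probably use an optimized searcher such as the `memchr` crate's `memmem`). The idea is just: fill a big buffer, cut it at the last newline, scan the complete lines, and carry the leftover partial line into the next read.

```rust
use std::io::{self, Read};

/// Naive byte-substring test (hypothetical helper, not tuned for speed).
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    needle.is_empty() || haystack.windows(needle.len()).any(|w| w == needle)
}

/// Reads `reader` in large blocks and returns every line that contains
/// all of `needles`. Works on raw bytes, so lines need not be valid UTF-8.
fn search_chunked<R: Read>(
    mut reader: R,
    needles: &[&[u8]],
    chunk_size: usize,
) -> io::Result<Vec<Vec<u8>>> {
    let mut buf = vec![0u8; chunk_size.max(1)];
    let mut filled = 0; // number of unprocessed bytes at the start of `buf`
    let mut matches = Vec::new();

    loop {
        // A single line longer than the buffer: grow it so we can keep reading.
        if filled == buf.len() {
            let new_len = buf.len() * 2;
            buf.resize(new_len, 0);
        }

        let n = reader.read(&mut buf[filled..])?;
        let eof = n == 0;
        filled += n;

        // Process only up to the last newline, so no line is split in half.
        // At end of input, process whatever is left.
        let end = if eof {
            filled
        } else {
            match buf[..filled].iter().rposition(|&b| b == b'\n') {
                Some(i) => i + 1,
                None => continue, // no complete line yet, read more
            }
        };

        for line in buf[..end].split(|&b| b == b'\n') {
            if !line.is_empty() && needles.iter().all(|needle| contains(line, needle)) {
                matches.push(line.to_vec());
            }
        }

        if eof {
            break;
        }

        // Move the trailing partial line to the front for the next read.
        buf.copy_within(end..filled, 0);
        filled -= end;
    }

    Ok(matches)
}
```

You'd call it with something like `search_chunked(File::open(path)?, &[b"password", b"123456"], 8 << 20)`. The 8 MiB chunk size is a guess; it's worth measuring a few sizes on your own hardware.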