timothycrosley | 5 years ago
Here is an example.

Given `create_csv.py`:

    with open("output.csv", "w") as csv_file:
        for i in range(1_000_001):
            csv_file.write(",".join([str(i)] * 100))  # 100 columns per entry, a million entries
            csv_file.write("\n")
and `search_csv.py`:

    import sys

    for row in open("output.csv"):
        if "1000000" in row.split(","):
            print(row)
            break
    else:
        sys.exit("row 1 million not found!")
The first script creates a 1-million-line CSV with 100 columns per row, and the second performs a worst-case search: it has to scan all the way to the last row and check every column. The performance is better than what you mentioned, on commodity hardware, with very unoptimized Python code and a default Python 3 installation:

    time python3 create_csv.py

    real 0m3.267s
    user 0m2.589s
    sys 0m0.608s
    time python3 search_csv.py

    1000000,1000000,1000000,… (the matched row: "1000000" repeated 100 times)

    real 0m6.132s
    user 0m5.954s
    sys 0m0.162s
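For anyone who wants to reproduce the measurement without the shell's `time` builtin, here is a scaled-down, self-contained sketch of the two scripts above using `time.perf_counter`. The row count is reduced from 1,000,001 to 10,001 so it runs in well under a second, and the file name `output_small.csv` is mine, not from the original:

```python
import os
import time

# Scaled-down version of create_csv.py: 10_001 rows x 100 columns.
with open("output_small.csv", "w") as csv_file:
    for i in range(10_001):
        csv_file.write(",".join([str(i)] * 100))
        csv_file.write("\n")

# Worst-case search, as in search_csv.py: the target value "10000"
# only appears in the very last row, so every row must be scanned.
start = time.perf_counter()
found = False
for row in open("output_small.csv"):
    if "10000" in row.split(","):
        found = True
        break
elapsed = time.perf_counter() - start

print(f"found={found}, search took {elapsed:.4f}s")
os.remove("output_small.csv")  # clean up the temporary file
```

Scaling the row count back up to 1,000,001 should land in the same ballpark as the `time` figures above, modulo hardware.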