timothycrosley | 5 years ago

The problem with examples like this is that simple file I/O operations are optimized across all languages; even very naive, unoptimized Python would likely run at a similar speed.

Here is an example:

Given `create_csv.py`:

  with open("output.csv", "w") as csv_file:
    for i in range(1_000_001):
      csv_file.write(",".join([str(i)] * 100))  # creating 100 columns per entry out of a million
      csv_file.write("\n")

and `search_csv.py`:

  import sys
    
  for row in open("output.csv"):
    if "1000000" in row.split(","):
      print(row)
      break
  else: 
    sys.exit("row 1 million not found!")

The first script creates a million-line CSV with 100 columns per row, and the second performs a worst-case search: it has to scan every column of every row, all the way to the last line. Even so, on commodity hardware, with very unoptimized Python code and a default Python 3 installation, the performance is better than what you mentioned:

  time python3 create_csv.py

  real    0m3.267s
  user    0m2.589s
  sys     0m0.608s
  time python3 search_csv.py

1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000,1000000

  real    0m6.132s
  user    0m5.954s
  sys     0m0.162s
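For anyone who wants to poke at the same pattern without generating the full million-row file, here is a scaled-down, self-contained sketch of the worst-case search using the stdlib `csv` module instead of manual string splitting (the row and column counts here are illustrative, not taken from the scripts above):

  import csv
  import io
  import sys

  # Build a small in-memory CSV: rows 0..1000, 10 columns each --
  # the same shape as the scripts above, scaled down.
  buf = io.StringIO()
  writer = csv.writer(buf)
  for i in range(1_001):
      writer.writerow([i] * 10)

  # Worst-case search: "1000" only appears in the very last row,
  # so the loop has to read the whole file before matching.
  buf.seek(0)
  for row in csv.reader(buf):
      if "1000" in row:
          print(",".join(row))
          break
  else:
      sys.exit("row 1000 not found!")

The for/else construct works the same way as in `search_csv.py`: the else branch only runs if the loop finishes without hitting break.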
