item 43164469


kernelsanderz | 1 year ago

If you want another library with great performance and features like full-text indexing and versioning of changes, I'd recommend lancedb: https://lancedb.github.io/lancedb/

Yes, it's a vector database, so there is more complexity. But you can use it without creating any indexes, and it also has excellent zero-copy Arrow support for polars and pandas.



daveguy|1 year ago

Since a lot of ML data is stored as parquet, I found this to be a useful tidbit from lancedb's documentation:

> Data storage is columnar and is interoperable with other columnar formats (such as Parquet) via Arrow

https://lancedb.github.io/lancedb/concepts/data_management/

Edit: That said, I am personally a fan of parquet, arrow, and ibis. There are so many data-wrangling options out there that it's easy to get analysis paralysis.

esafak|1 year ago

Lance is made for this stuff; parquet is not.

3abiton|1 year ago

How well does it scale?