What file formats are the existing datasets you have? I also work on data processing in a scientific domain where HDF5 is a common format. Unfortunately Duckdb doesn't support HDF5 out of the box, and the existing hdf5 extension wasn't fast enough and didn't have the features needed, so I made a new one based on the c++ extension template. I'd love to collaborate on it if anyone is interested.
steve_adams_86|1 month ago
Right now we typically read from CSV or Excel, because that's what the scientists prefer to work with. For better or worse. There's a bit of parquet kicking around. The wrappers around handling imports for DuckDB are very, very thin. It handles just about everything seamlessly
Is this the extension that was too slow? https://duckdb.org/community_extensions/extensions/hdf5