I see. Would it be fair to say you treat it almost like Pandas, except that it has a lower memory footprint since data is written to disk instead of held in memory? I.e., you use it for on-the-fly analysis of large frames of data, not like a more traditional database or data warehouse?
d_watt|3 years ago
chrisjc|3 years ago
BTW, your questions are exactly the ones I've been asking myself over the last few months, with a lot more focus over the last few days. I'm still learning as much as I can, so the following might not be true.
For what it's worth, there's a difference between using DuckDB to query a set of files in place and loading a bunch of files into a table. Once the data has been loaded into a table, though, it can be persisted as a DuckDB database file.
Therefore it might be more performant to pre-build DuckDB database files (perhaps with a process that works in conjunction with whatever manages your external tables) and attach these files to DuckDB as needed for on-the-fly analysis, instead of loading data files into DuckDB, transforming them, and running a CTAS every time.
https://duckdb.org/docs/sql/statements/attach
Of course, all of this might introduce more latency, especially if you're trying to do near-real-time (NRT) analytics.
I assume you could partition your data into multiple db files similar to how you would probably do it with your data files (managing external tables).