Scripting data movement is easy only for small, simple jobs. With many thousands of tables and more than a few TB, all kinds of issues start popping up. I read somewhere that 85% of large data migration projects fail.
Data warehouses really need optimally sized Parquet files to ingest efficiently; for Snowflake that's roughly 100-200 MB per file.
A good way to copy that much data is to rely on DB statistics to determine the optimal number of records per chunk. Then, to have the job finish in a reasonable time, you read a number of chunks in parallel and stream them into Parquet files on S3 (or Azure Blob Storage). Once the data is up, Snowflake can ingest it.

Shameless plug: my company created a commercial solution which does exactly that (https://www.spectralcore.com/omniloader). Happy to answer any questions.
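To make the chunking idea concrete, here's a minimal Python sketch. The row count, average row size, and compression ratio are made-up placeholder numbers standing in for what you'd pull from the source database's catalog, and fetch_and_upload is a hypothetical stub for the actual range query + Parquet encode + upload step:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical statistics, as reported by the source DB's catalog.
TOTAL_ROWS = 50_000_000
AVG_ROW_BYTES = 120                     # average raw row size from DB stats
COMPRESSION_RATIO = 3.0                 # assumed Parquet compression vs. raw rows
TARGET_FILE_BYTES = 150 * 1024 * 1024   # aim for ~150 MB per Parquet file

def rows_per_chunk():
    # Rows whose compressed Parquet output lands near the target file size.
    raw_bytes_per_file = TARGET_FILE_BYTES * COMPRESSION_RATIO
    return max(1, int(raw_bytes_per_file / AVG_ROW_BYTES))

def chunk_ranges(total_rows, chunk_rows):
    # Half-open [start, end) row ranges covering the whole table.
    return [(start, min(start + chunk_rows, total_rows))
            for start in range(0, total_rows, chunk_rows)]

def fetch_and_upload(chunk):
    # Placeholder: a real pipeline would run a range query here
    # (e.g. keyset pagination), encode the rows to Parquet, and
    # stream the file to S3 or Azure Blob Storage.
    start, end = chunk
    return end - start

if __name__ == "__main__":
    chunks = chunk_ranges(TOTAL_ROWS, rows_per_chunk())
    # Read several chunks concurrently so the copy finishes in
    # reasonable time; tune max_workers to what the source DB can take.
    with ThreadPoolExecutor(max_workers=8) as pool:
        copied = sum(pool.map(fetch_and_upload, chunks))
    assert copied == TOTAL_ROWS
```

The point is that chunk size is derived from statistics up front, so every worker produces files in the warehouse's sweet spot instead of one giant file or millions of tiny ones.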