
Optimize your data loading processes with best practices for file sizing, compression, and parallel processing to reduce load times and costs.
The COPY INTO statement below applies these practices: a compressed columnar format, column matching by name, and automatic cleanup of successfully loaded files.
COPY INTO target_table
FROM @my_stage/
FILE_FORMAT = (
    TYPE = PARQUET        -- columnar format, fastest to load
    COMPRESSION = SNAPPY  -- lightweight compression with a good speed/size balance
)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE  -- map file columns to table columns by name
ON_ERROR = CONTINUE                      -- skip bad rows rather than aborting the load
PURGE = TRUE;                            -- delete staged files after a successful load
| Format | Load Speed | Compression | Best Use |
|---|---|---|---|
| Parquet | Fastest | Excellent | Large datasets, analytics |
| CSV (GZIP) | Moderate | Good | Universal compatibility |
| JSON | Slowest | Fair | Semi-structured data |
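As a reference, the named file formats below show how each option in the table might be declared. This is a minimal sketch: the format names (parquet_ff, csv_gzip_ff, json_ff) and the individual options are illustrative, not part of the original example.
-- Parquet: fastest loads, excellent compression
CREATE OR REPLACE FILE FORMAT parquet_ff
    TYPE = PARQUET
    COMPRESSION = SNAPPY;

-- CSV + GZIP: universally compatible, moderate load speed
CREATE OR REPLACE FILE FORMAT csv_gzip_ff
    TYPE = CSV
    COMPRESSION = GZIP
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'
    SKIP_HEADER = 1;

-- JSON: flexible for semi-structured data, slowest to load
CREATE OR REPLACE FILE FORMAT json_ff
    TYPE = JSON
    COMPRESSION = GZIP
    STRIP_OUTER_ARRAY = TRUE;
A named format can then be referenced in COPY INTO with FILE_FORMAT = (FORMAT_NAME = 'parquet_ff') instead of repeating the options inline.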
File sizing matters as much as format choice: Snowflake parallelizes loads at the file level, so many moderately sized files load much faster than one huge file.
-- BAD: Single 10GB file
-- Load time: 40 minutes, poor parallelization

-- GOOD: 50 files × 200MB each
-- Load time: 8 minutes, excellent parallelization
-- 5x faster, same data volume
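A quick way to check whether staged files already fall in a good range is to list the stage and aggregate the reported sizes. This is a sketch; @my_stage/daily/ is a placeholder path.
-- List staged files; the "size" column is reported in bytes.
LIST @my_stage/daily/;

-- Aggregate the LIST output from the previous statement.
SELECT
    COUNT(*) AS file_count,
    AVG("size") / 1024 / 1024 AS avg_file_mb
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
Snowflake's general guidance is to aim for compressed files of roughly 100-250MB, which is why the 200MB chunks above parallelize so well.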
One company loaded 500GB daily as 5 large uncompressed CSV files (100GB each); loads took 120 minutes and consumed 180 credits.
After optimization, the data was compressed with GZIP (500GB → 75GB), split into 300 files of roughly 250MB each, and converted to Parquet. Load time dropped to 18 minutes (6.7x faster) and consumption fell to 30 credits (an 83% reduction).
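A conversion like this can be done inside Snowflake by loading the gzipped CSV into a staging table and re-exporting it as Snappy-compressed Parquet in right-sized files. The sketch below uses placeholder names (raw_stage, staging_events, optimized_stage) rather than the actual pipeline described above.
-- 1. Load the compressed CSV files into a staging table.
COPY INTO staging_events
FROM @raw_stage/events/
FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP SKIP_HEADER = 1);

-- 2. Re-export as Snappy-compressed Parquet in right-sized files.
COPY INTO @optimized_stage/events/
FROM staging_events
FILE_FORMAT = (TYPE = PARQUET COMPRESSION = SNAPPY)
MAX_FILE_SIZE = 262144000;  -- target file size in bytes (~250MB)
MAX_FILE_SIZE is specified in bytes and caps the size of each output file, so the unload naturally produces many parallel-friendly files.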
To confirm that ongoing loads stay in the optimal file-size range, track average file size per table over the past week:
-- FILE_SIZE is exposed by COPY_HISTORY (the LOAD_HISTORY view does not include it).
-- Requires access to the SNOWFLAKE database; for COPY statement runtimes,
-- see ACCOUNT_USAGE.QUERY_HISTORY.
SELECT
    TABLE_NAME,
    AVG(FILE_SIZE) / 1024 / 1024 AS AVG_FILE_SIZE_MB,
    AVG(ROW_COUNT) AS AVG_ROWS_PER_FILE,
    COUNT(*) AS FILES_LOADED
FROM SNOWFLAKE.ACCOUNT_USAGE.COPY_HISTORY
WHERE LAST_LOAD_TIME >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY 1;