Jan 21, 2026

Data Loading Optimization: Maximize Throughput, Minimize Cost

Optimize your data loading processes with best practices for file sizing, compression, and parallel processing to reduce load times and costs.

Raj
CEO, MaxMyCloud


Key Optimization Factors

  1. File Size: 100-250MB compressed for optimal parallelization (see the stage check below)
  2. Compression: GZIP or ZSTD (5-10x reduction)
  3. File Format: Parquet for best performance
  4. Parallel Processing: More files = better parallelization
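
If your data is already staged, a quick way to sanity-check file sizes before loading is to list the stage and look at the size column (reported in bytes). A minimal sketch, assuming an existing stage named my_stage:

-- List staged files; the "size" column is in bytes
LIST @my_stage;

-- Aggregate the listing to get the average staged file size in MB
-- (RESULT_SCAN reads the output of the previous statement)
SELECT
    AVG("size")/1024/1024 AS AVG_FILE_SIZE_MB,
    COUNT(*) AS FILE_COUNT
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));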

Optimal COPY Command

COPY INTO target_table
FROM @my_stage/
FILE_FORMAT = (
    TYPE = PARQUET
    COMPRESSION = SNAPPY
)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE  -- map file columns to table columns by name
ON_ERROR = CONTINUE                      -- skip rejected rows instead of failing the load
PURGE = TRUE;                            -- remove staged files after a successful load
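
Because ON_ERROR = CONTINUE silently skips rejected rows, it is worth checking what was dropped after each load. One way to do that is the VALIDATE table function; a sketch, assuming the COPY above was the most recent load into target_table:

-- Return the rows rejected by the most recent COPY into target_table
SELECT *
FROM TABLE(VALIDATE(target_table, JOB_ID => '_last'));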

File Format Comparison

Format        Load Speed   Compression   Best Use
Parquet       Fastest      Excellent     Large datasets, analytics
CSV (GZIP)    Moderate     Good          Universal compatibility
JSON          Slowest      Fair          Semi-structured data
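
If the same formats are loaded repeatedly, defining them once as named file format objects keeps COPY statements short and consistent. A minimal sketch (the object names here are illustrative):

-- Snappy-compressed Parquet for large analytical loads
CREATE OR REPLACE FILE FORMAT parquet_snappy
    TYPE = PARQUET
    COMPRESSION = SNAPPY;

-- GZIP-compressed CSV for broadly compatible feeds
CREATE OR REPLACE FILE FORMAT csv_gzip
    TYPE = CSV
    COMPRESSION = GZIP
    SKIP_HEADER = 1
    FIELD_OPTIONALLY_ENCLOSED_BY = '"';

-- Reference a named format in COPY instead of an inline definition:
-- FILE_FORMAT = (FORMAT_NAME = 'parquet_snappy')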

Parallelization Strategy

-- BAD: Single 10GB file
-- Load time: 40 minutes, poor parallelization

-- GOOD: 50 files × 200MB each
-- Load time: 8 minutes, excellent parallelization
-- 5x faster, same data volume
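
File splitting usually happens upstream, before the files reach the stage, but when Snowflake itself produces the files (for example, re-staging a table for a downstream load), the chunk size can be controlled directly with MAX_FILE_SIZE. A sketch, assuming a source table big_table and a stage my_stage:

-- Unload as many ~200MB files rather than one large file
COPY INTO @my_stage/export/
FROM big_table
FILE_FORMAT = (TYPE = PARQUET COMPRESSION = SNAPPY)
MAX_FILE_SIZE = 200000000;  -- target size per output file, in bytes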

Real-World Example

A company loaded 500GB of data daily using 5 large uncompressed CSV files (100GB each). Load time was 120 minutes, consuming 180 credits.

After optimization, the data was compressed with GZIP (500GB → 75GB), split into 300 files of roughly 250MB each, and converted to Parquet. Load time dropped to 18 minutes (6.7x faster) and credit consumption to 30 (an 83% reduction).

Monitoring Load Performance

-- Average file size and rows per loaded file over the last 7 days,
-- from the SNOWFLAKE.ACCOUNT_USAGE.COPY_HISTORY view
SELECT
    TABLE_NAME,
    AVG(FILE_SIZE)/1024/1024 AS AVG_FILE_SIZE_MB,
    AVG(ROW_COUNT) AS AVG_ROWS_PER_FILE,
    COUNT(*) AS FILES_LOADED
FROM SNOWFLAKE.ACCOUNT_USAGE.COPY_HISTORY
WHERE LAST_LOAD_TIME >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY 1;
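
A natural follow-up on the same view is to flag tables whose loads are dominated by undersized files, since those are the usual candidates for consolidation. A sketch using the ~100MB floor from above:

SELECT
    TABLE_NAME,
    AVG(FILE_SIZE)/1024/1024 AS AVG_FILE_SIZE_MB,
    COUNT(*) AS FILES_LOADED
FROM SNOWFLAKE.ACCOUNT_USAGE.COPY_HISTORY
WHERE LAST_LOAD_TIME >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY 1
HAVING AVG(FILE_SIZE)/1024/1024 < 100  -- tables loading files below the ~100MB floor
ORDER BY FILES_LOADED DESC;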

Key Takeaways

  • Optimal file size: 100-250MB compressed
  • Always use compression (GZIP minimum, Parquet ideal)
  • More files = better parallelization
  • Parquet is fastest for large analytical datasets
  • Can achieve 80%+ cost reduction with proper optimization


Start Optimizing Your Snowflake Costs Today

Uncover hidden inefficiencies and start reducing Snowflake spend in minutes: no disruption, no risk.