Data Loading
Jan 21, 2026

Optimize COPY Commands: Load Data 3x Faster, Use 50% Fewer Credits

File size, compression, and parallel loading significantly impact load performance and costs. Discover the optimal configuration for your data pipeline.

Raj
CEO, MaxMyCloud


The Problem

Many organizations use default COPY settings, loading data inefficiently. This results in slow loads, wasted compute, and higher costs than necessary.

File Size Optimization

  • Too Small (<100MB): excessive per-file overhead and poor parallelization
  • Optimal (100-250MB compressed): best parallelization (a quick check of staged file sizes is shown after this list)
  • Too Large (>1GB): reduced parallelization and higher retry costs
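
To confirm that your staged files fall in this range, list the stage and look at the size column (reported in bytes). A minimal check, reusing the stage path from the COPY example later in this post:

LIST @my_stage/path/;
-- Returns name, size (bytes), md5, and last_modified for each staged file.
-- Sizes well under ~100MB suggest merging files; sizes over ~1GB suggest splitting.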

Compression Matters

GZIP compression typically provides a 3-5x size reduction. Loading the same dataset uncompressed versus compressed:

  • 10GB uncompressed: 15 minutes, 20 credits
  • 2GB compressed: 8 minutes, 10 credits
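
If the source system does not compress the files for you, SnowSQL's PUT command can gzip them as they are uploaded to the stage. A minimal sketch, assuming local files under a hypothetical /data/daily/ path and the same stage used in the COPY example below (PUT runs from SnowSQL or a client driver, not from the web worksheet):

-- Upload local CSVs to the stage, gzipping each file on the way up
PUT file:///data/daily/*.csv @my_stage/path/
    AUTO_COMPRESS = TRUE  -- files land in the stage as .csv.gz
    PARALLEL = 8;         -- number of upload threads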

Optimal COPY Configuration

COPY INTO target_table
FROM @my_stage/path/
FILE_FORMAT = (
    TYPE = CSV
    COMPRESSION = GZIP
    FIELD_DELIMITER = ','
    SKIP_HEADER = 1
)
SIZE_LIMIT = 250000000 -- cap (in bytes) on the total data loaded by this COPY
ON_ERROR = CONTINUE
PURGE = TRUE;

Note that SIZE_LIMIT caps the total amount of data a single COPY statement loads; it does not split files. Splitting data into 100-250MB files has to happen before the files are staged.
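
Once a load has run, the COPY_HISTORY table function shows how the work was spread across files, which makes oversized or undersized files easy to spot. A sketch against the target table above, looking back 24 hours:

SELECT file_name,
       file_size,            -- bytes; ideally clustered around 100-250MB compressed
       row_count,
       status,
       first_error_message
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'TARGET_TABLE',
    START_TIME => DATEADD('hour', -24, CURRENT_TIMESTAMP())))
ORDER BY last_load_time DESC;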

Best Practices

  1. File Size: 100-250MB compressed per file
  2. Compression: Always use GZIP or ZSTD
  3. Parallelization: Split large files before loading
  4. Error Handling: Use ON_ERROR = CONTINUE for resilience (see the validation query after this list)
  5. Cleanup: Always use PURGE = TRUE
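
Because ON_ERROR = CONTINUE silently skips bad rows, it is worth checking what was rejected after each load. One way is Snowflake's VALIDATE table function against the most recent COPY into the table:

-- Rows rejected by the most recent COPY into target_table
SELECT error, file, line, rejected_record
FROM TABLE(VALIDATE(target_table, JOB_ID => '_last'));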

Real-World Example

A data engineering team was loading 100GB of daily data using 10 large uncompressed files (10GB each). Loads took 45 minutes and consumed 60 credits. By compressing files with GZIP (reduced to 20GB total), splitting into 80 files of 250MB each, and optimizing COPY settings, load time dropped to 12 minutes and credit consumption to 20 credits - a 67% cost reduction.

Key Takeaways

  • Optimal file size is 100-250MB compressed
  • Always use compression (GZIP or ZSTD)
  • More files = better parallelization (to a point)
  • Use PURGE = TRUE to clean up staged files


Start Optimizing Your Snowflake Costs Today

Uncover hidden inefficiencies and start reducing Snowflake spend in minutes, with no disruption and no risk.