Query Patterns
Jan 21, 2026

Incremental Processing: Avoid Full Table Scans

Process only new or changed data instead of reprocessing entire tables to dramatically reduce compute costs and improve ETL efficiency.

Raj
CEO, MaxMyCloud


The Problem

Full table rewrites are expensive. If you're processing the same historical data repeatedly, you're wasting compute.

Full vs Incremental Processing

Full Rewrite (Expensive):

CREATE OR REPLACE TABLE summary AS
SELECT
    product_id,
    SUM(sales) AS total_sales
FROM fact_sales  -- Scans all historical data
GROUP BY 1;

-- Daily cost: Process 10TB = 100 credits/day
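To see what a run like this actually scans, one quick check is to read the query stats right after the statement finishes, using the session's last query ID:

-- Inspect the scan size of the statement that just ran in this session
SELECT query_text,
       bytes_scanned / POWER(1024, 3) AS gb_scanned,
       total_elapsed_time / 1000      AS elapsed_s
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_id = LAST_QUERY_ID();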

Incremental Processing (Efficient):

-- Use streams to track changes
CREATE STREAM sales_stream ON TABLE fact_sales;

CREATE TASK update_summary
  WAREHOUSE = etl_wh
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
  WHEN SYSTEM$STREAM_HAS_DATA('sales_stream')
AS
MERGE INTO summary t
USING (
    SELECT product_id, SUM(sales) AS new_sales
    FROM sales_stream
    GROUP BY 1
) s ON t.product_id = s.product_id
WHEN MATCHED THEN UPDATE SET t.total_sales = t.total_sales + s.new_sales
WHEN NOT MATCHED THEN INSERT VALUES (s.product_id, s.new_sales);

-- Daily cost: Process 100GB new data = 1 credit/day
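Two operational details worth knowing: tasks are created in a suspended state, so the schedule only starts once you resume the task, and for an insert-only fact table the stream can be declared append-only, which is cheaper for Snowflake to maintain:

-- Tasks start suspended; resume to begin the schedule
-- (requires the EXECUTE TASK privilege)
ALTER TASK update_summary RESUME;

-- For insert-only sources, an append-only stream skips delete tracking
CREATE OR REPLACE STREAM sales_stream ON TABLE fact_sales APPEND_ONLY = TRUE;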

Implementing Incremental Patterns

1. Watermark-Based

-- Track last processed timestamp
CREATE TABLE etl_watermarks (
    table_name VARCHAR,
    last_processed_timestamp TIMESTAMP
);

-- Process only new records
INSERT INTO target
SELECT * FROM source
WHERE updated_at > (
    SELECT last_processed_timestamp
    FROM etl_watermarks
    WHERE table_name = 'source'
);

-- Update watermark
UPDATE etl_watermarks
SET last_processed_timestamp = CURRENT_TIMESTAMP()
WHERE table_name = 'source';
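
One caveat with the version above: rows that commit between the INSERT and the watermark UPDATE, or whose updated_at lags CURRENT_TIMESTAMP(), are silently skipped on the next run. A safer sketch (same table names as above) derives the watermark from the rows actually read and groups the load and the watermark update in a single transaction; the staging step comes first because DDL auto-commits in Snowflake:

-- Stage the batch once so the load and the new watermark agree
CREATE OR REPLACE TEMPORARY TABLE batch AS
SELECT * FROM source
WHERE updated_at > (
    SELECT last_processed_timestamp
    FROM etl_watermarks
    WHERE table_name = 'source'
);

BEGIN TRANSACTION;

INSERT INTO target SELECT * FROM batch;

-- Advance the watermark to the newest row actually processed,
-- not CURRENT_TIMESTAMP(), so late-arriving rows aren't skipped
UPDATE etl_watermarks
SET last_processed_timestamp = (SELECT MAX(updated_at) FROM batch)
WHERE table_name = 'source'
  AND EXISTS (SELECT 1 FROM batch);

COMMIT;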

2. Stream-Based (Preferred)

CREATE STREAM source_changes ON TABLE source;

-- The stream records INSERTs, DELETEs, and UPDATEs (an update appears as a
-- DELETE/INSERT pair with METADATA$ISUPDATE = TRUE). Filter out the DELETE
-- half of each update so a target row never matches two source rows.
MERGE INTO target t
USING (
    SELECT * FROM source_changes
    WHERE NOT (METADATA$ACTION = 'DELETE' AND METADATA$ISUPDATE)
) s
ON t.id = s.id
WHEN MATCHED AND s.METADATA$ACTION = 'DELETE'
    THEN DELETE
WHEN MATCHED AND s.METADATA$ACTION = 'INSERT'
    THEN UPDATE SET t.value = s.value
WHEN NOT MATCHED AND s.METADATA$ACTION = 'INSERT'
    THEN INSERT (id, value) VALUES (s.id, s.value);
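
A plain SELECT against a stream does not consume it; the offset only advances when the stream is read by DML inside a committed transaction. That makes it easy to inspect pending changes before the MERGE runs:

-- Peek at pending changes without consuming the stream
SELECT id, value, METADATA$ACTION, METADATA$ISUPDATE
FROM source_changes
LIMIT 100;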

Real-World Example

An ETL process regenerated an entire 5 TB summary table from a 50 TB fact table every day. Scanning 50 TB daily consumed roughly 500 credits/day, about $1,500/day at $3/credit, or $45,000/month.

After switching to stream-based incremental processing, the job read only ~200 GB of new data per day: 2 credits/day, about $6/day, or $180/month. Savings: $44,820/month, a 99.6% reduction.
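
To find similar candidates in your own account, one approach is to rank recurring queries by total bytes scanned over the last month. A sketch assuming access to the ACCOUNT_USAGE share (QUERY_PARAMETERIZED_HASH groups runs of the same statement shape):

-- Recurring queries with the heaviest scans: candidates for incremental processing
SELECT query_parameterized_hash,
       ANY_VALUE(query_text)               AS sample_query,
       COUNT(*)                            AS runs,
       SUM(bytes_scanned) / POWER(1024, 4) AS tb_scanned
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY 1
ORDER BY tb_scanned DESC
LIMIT 20;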

Best Practices

  1. Use Streams for CDC (Change Data Capture)
  2. Implement watermark tracking for external sources
  3. Use MERGE instead of DELETE + INSERT
  4. Process only changed data
  5. Test incremental logic thoroughly (see the reconciliation sketch below)
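
For the last item, a useful test is a periodic reconciliation that recomputes the aggregate from scratch and diffs it against the incrementally maintained table; since this is itself a full scan, run it occasionally or on a sample rather than on every load. A sketch using the summary and fact_sales tables from earlier:

-- Reconciliation: any rows returned indicate drift in the incremental logic
WITH full_recompute AS (
    SELECT product_id, SUM(sales) AS total_sales
    FROM fact_sales
    GROUP BY 1
)
SELECT COALESCE(f.product_id, s.product_id) AS product_id,
       f.total_sales AS recomputed,
       s.total_sales AS incremental
FROM full_recompute f
FULL OUTER JOIN summary s ON f.product_id = s.product_id
WHERE NOT EQUAL_NULL(f.total_sales, s.total_sales);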

Key Takeaways

  • Incremental processing reduces compute by 90-99%
  • Use Streams for automatic change tracking
  • Watermarks work for external data sources
  • MERGE is more efficient than DELETE + INSERT
  • Massive cost savings with minimal complexity
