Accelerating Analytics with pg_duckdb for Self-Hosted Supabase

Turn your self-hosted Supabase into an analytics powerhouse with pg_duckdb - run analytical queries up to 600x faster without data exports.


PostgreSQL is outstanding for transactional workloads—inserts, updates, lookups, and carefully indexed queries. But ask it to aggregate millions of rows for an analytics dashboard, and you'll be staring at a spinning cursor. This is the classic OLTP vs OLAP divide, and it's why many teams maintain separate analytics databases.

If you're self-hosting Supabase, you have an advantage that Supabase Cloud users don't: the freedom to install any PostgreSQL extension you want. Enter pg_duckdb—an extension that embeds DuckDB's columnar-vectorized analytics engine directly inside your Postgres, accelerating analytical queries by up to 600x without exporting data or managing a separate system.

What is pg_duckdb?

pg_duckdb is an open-source PostgreSQL extension developed in collaboration with Hydra and MotherDuck. In 2026, Supabase acquired Hydra—the team behind pg_duckdb—to build Supabase Warehouse, signaling that embedded analytics is becoming a first-class citizen in the Supabase ecosystem.

The extension works by intercepting analytical queries and executing them through DuckDB's OLAP engine instead of Postgres's row-based executor. Your data stays in PostgreSQL tables, but computation happens in a format optimized for scanning and aggregating large datasets.

Key capabilities include:

  • No data export required: Query your existing PostgreSQL tables directly
  • Transparent execution: Enable with a simple SET duckdb.force_execution = true;
  • Data lake access: Read Parquet, CSV, JSON, Iceberg, and Delta Lake from S3, GCS, Azure, and R2
  • MIT licensed: Fully open source, hosted in the official DuckDB GitHub repository

Why Self-Hosted Users Have the Advantage

Supabase Cloud restricts certain extensions for security and multi-tenancy reasons. pg_duckdb isn't available on managed Supabase—you can't install custom extensions that embed another execution engine.

Self-hosted Supabase removes these constraints. You control the Postgres instance, which means you can install pg_duckdb and unlock analytics performance that would otherwise require a separate data warehouse. This is one of those features missing from Supabase Cloud that makes self-hosting compelling for data-intensive applications.

Setting Up pg_duckdb on Self-Hosted Supabase

There are two approaches depending on your infrastructure preferences.

Option 1: Using the pg_duckdb Docker Image

The fastest path is to use the official pg_duckdb Docker image instead of the standard Supabase Postgres image:

```bash
docker run -d \
  --name supabase-analytics \
  -e POSTGRES_PASSWORD=your_secure_password \
  -p 5432:5432 \
  pgduckdb/pgduckdb:18-v1.1.1
```

This image includes PostgreSQL 18 with pg_duckdb pre-installed. You can then configure it as your Supabase database backend.
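Once the container is up, it's worth confirming that the extension actually loads before wiring the database into Supabase. Connecting with psql using the credentials from the `docker run` command above, a quick sanity check looks like this:

```sql
-- Run inside psql, e.g.: psql -h localhost -p 5432 -U postgres
CREATE EXTENSION IF NOT EXISTS pg_duckdb;

-- Verify the extension is registered and check its version
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_duckdb';
```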

Option 2: Installing on Existing Supabase Postgres

If you're running self-hosted Supabase with Docker Compose, you'll need to build a custom Postgres image with pg_duckdb included.

First, create a Dockerfile:

```dockerfile
FROM supabase/postgres:15.6.1

# Install pg_duckdb dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    cmake \
    git \
    postgresql-server-dev-15

# Clone and build pg_duckdb
RUN git clone https://github.com/duckdb/pg_duckdb.git /tmp/pg_duckdb \
    && cd /tmp/pg_duckdb \
    && make \
    && make install

# Clean up
RUN rm -rf /tmp/pg_duckdb \
    && apt-get remove -y build-essential cmake git \
    && apt-get autoremove -y \
    && rm -rf /var/lib/apt/lists/*
```

Update your docker-compose.yml to use this custom image and add pg_duckdb to shared_preload_libraries (for example, by passing `-c shared_preload_libraries=pg_duckdb` in the service command). After restarting the database, enable the extension:

```sql
CREATE EXTENSION IF NOT EXISTS pg_duckdb;
```

Option 3: Using Pigsty for Production Deployments

For production environments, Pigsty offers a batteries-included PostgreSQL distribution with pg_duckdb pre-packaged. Pigsty v3.1 includes pg_duckdb alongside other OLAP extensions like pg_analytics and pg_mooncake, and supports running Supabase on top of highly-available PostgreSQL clusters.

Running Your First Analytical Query

Once pg_duckdb is installed, running analytical queries is straightforward. The key is the duckdb.force_execution setting:

```sql
-- Enable DuckDB execution for this session
SET duckdb.force_execution = true;

-- Run your analytical query
SELECT
    date_trunc('day', created_at) as day,
    COUNT(*) as order_count,
    SUM(total_amount) as revenue,
    AVG(total_amount) as avg_order_value
FROM orders
WHERE created_at >= NOW() - INTERVAL '90 days'
GROUP BY date_trunc('day', created_at)
ORDER BY day;
```

With duckdb.force_execution = true, pg_duckdb intercepts this aggregation query and executes it through DuckDB's vectorized engine. For tables with millions of rows, you'll see dramatic performance improvements—often 10x to 100x faster than standard PostgreSQL.
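To confirm that a query is actually being routed through DuckDB rather than silently falling back to the Postgres executor, inspect its plan. When pg_duckdb takes over, the plan is typically dominated by a DuckDB custom scan node rather than the usual Seq Scan and Aggregate tree (exact node names vary by version, so treat this as a sketch):

```sql
SET duckdb.force_execution = true;

-- If DuckDB handles this query, EXPLAIN shows a Custom Scan node
-- instead of a standard Seq Scan / Aggregate plan.
EXPLAIN
SELECT date_trunc('day', created_at), COUNT(*)
FROM orders
GROUP BY 1;
```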

Practical Use Cases

Dashboard Analytics

Real-time dashboards that aggregate large datasets are the sweet spot for pg_duckdb. Instead of pre-computing metrics with materialized views or external ETL, you can query live data:

```sql
SET duckdb.force_execution = true;

-- Customer cohort analysis
WITH monthly_signups AS (
    SELECT
        date_trunc('month', created_at) as cohort_month,
        user_id
    FROM users
),
monthly_revenue AS (
    SELECT
        user_id,
        date_trunc('month', order_date) as revenue_month,
        SUM(amount) as revenue
    FROM orders
    GROUP BY user_id, date_trunc('month', order_date)
)
SELECT
    cohort_month,
    revenue_month,
    COUNT(DISTINCT ms.user_id) as users,
    SUM(revenue) as total_revenue
FROM monthly_signups ms
JOIN monthly_revenue mr ON ms.user_id = mr.user_id
GROUP BY cohort_month, revenue_month
ORDER BY cohort_month, revenue_month;
```

Data Lake Queries

pg_duckdb can query external data sources directly, making it useful for analyzing logs, events, or archived data stored in object storage:

```sql
-- Query Parquet files from S3
SELECT *
FROM read_parquet('s3://your-bucket/events/*.parquet')
WHERE event_date >= '2026-01-01';
```

This integrates naturally with S3-compatible storage configurations you might already have for backups.
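Note that object-storage credentials must be configured before S3 paths resolve; pg_duckdb manages these inside Postgres, and the exact mechanism varies by release, so check the documentation for your installed version. Once credentials are in place, external files behave like tables, so ordinary aggregation patterns apply. The bucket path and column names below are placeholders:

```sql
-- Placeholder bucket/columns; assumes S3 credentials are already configured
SELECT event_type, COUNT(*) AS events
FROM read_parquet('s3://your-bucket/events/*.parquet')
WHERE event_date >= '2026-01-01'
GROUP BY event_type
ORDER BY events DESC;
```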

Ad-Hoc Exploration

Product teams and analysts often need to run exploratory queries that weren't anticipated when indexes were designed. pg_duckdb handles these gracefully without requiring index maintenance:

```sql
SET duckdb.force_execution = true;

-- Find products with unusual return rates
SELECT
    p.category,
    p.name,
    COUNT(o.id) as orders,
    COUNT(r.id) as returns,
    ROUND(100.0 * COUNT(r.id) / NULLIF(COUNT(o.id), 0), 2) as return_rate
FROM products p
LEFT JOIN order_items oi ON p.id = oi.product_id
LEFT JOIN orders o ON oi.order_id = o.id
LEFT JOIN returns r ON o.id = r.order_id
GROUP BY p.category, p.name
HAVING COUNT(o.id) > 100
ORDER BY return_rate DESC;
```

Performance Considerations

When pg_duckdb Helps

pg_duckdb excels at:

  • Full table scans: Aggregating across large portions of a table
  • Complex aggregations: GROUP BY with multiple columns and aggregate functions
  • Joins on large tables: Particularly when indexes aren't available
  • Ad-hoc queries: Exploratory analysis without pre-built indexes

When to Stick with PostgreSQL

Standard PostgreSQL execution remains optimal for:

  • Point lookups: Fetching individual rows by primary key
  • Indexed range queries: WHERE clauses that hit covering indexes
  • Small result sets: Queries returning a few hundred rows
  • Transactional writes: INSERT, UPDATE, DELETE operations

Resource Usage

pg_duckdb runs within your PostgreSQL process and shares memory. For production deployments, consider:

  • Dedicated replica: Run pg_duckdb queries against a read replica to avoid impacting transactional workloads
  • Memory allocation: DuckDB can consume significant memory for large queries; monitor usage with your observability stack
  • Connection isolation: Use a separate connection pool for analytics queries
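Session-level settings can help contain memory and CPU pressure. The setting names below reflect recent pg_duckdb releases but are an assumption; verify them against the documentation for your installed version before relying on them:

```sql
-- Assumed GUC names; confirm with SHOW ALL or the pg_duckdb docs
SET duckdb.max_memory = '4GB';  -- cap memory available to DuckDB queries
SET duckdb.threads = 4;         -- limit DuckDB parallelism so OLTP queries keep CPU
```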

Comparing Alternatives

pg_duckdb vs Materialized Views

Materialized views pre-compute results, which is faster for repeated queries but requires refresh scheduling and storage overhead. pg_duckdb queries live data without maintenance burden.

| Aspect | pg_duckdb | Materialized Views |
|---|---|---|
| Data freshness | Real-time | Stale until refresh |
| Maintenance | None | Refresh scheduling |
| Storage | None | Duplicates data |
| Ad-hoc queries | Excellent | Not supported |
| Known queries | Good | Excellent |

For dashboards with fixed queries, materialized views may be more efficient. For exploratory analysis, pg_duckdb wins.
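The materialized-view side of this trade-off is plain Postgres. A minimal sketch, reusing the orders table from the earlier examples:

```sql
-- Pre-computed daily revenue: fast to read, but stale until refreshed
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT
    date_trunc('day', created_at) AS day,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue
FROM orders
GROUP BY 1;

-- Needs a scheduled refresh (e.g. via pg_cron) to stay current
REFRESH MATERIALIZED VIEW daily_revenue;
```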

pg_duckdb vs External Data Warehouses

BigQuery, Snowflake, and similar services require data export pipelines and additional infrastructure. pg_duckdb keeps everything in Postgres:

| Aspect | pg_duckdb | External Warehouse |
|---|---|---|
| Setup complexity | Low | High |
| Data latency | Zero | Minutes to hours |
| Additional cost | None | Usage-based |
| Query capabilities | Good | Excellent |
| Scale | Single node | Unlimited |

For most self-hosted Supabase deployments, pg_duckdb provides sufficient analytics capability without the operational complexity of maintaining a separate data warehouse.

Integration with Supascale

If you're using Supascale to manage your self-hosted Supabase deployment, adding pg_duckdb requires building a custom Postgres image as described above. Supascale's selective service deployment capabilities let you run only the services you need, and you can integrate a pg_duckdb-enabled Postgres instance into your existing setup.

The automated backup system continues to work normally—pg_duckdb is just an extension that enhances query execution without changing how data is stored.

Limitations and Trade-offs

Be aware of these constraints:

  1. Not all queries are accelerated: pg_duckdb works best with analytical patterns; some queries fall back to PostgreSQL
  2. Extension management: You're responsible for updates and compatibility with Postgres versions
  3. Single-node scaling: Unlike distributed warehouses, pg_duckdb is limited to your server's resources
  4. Memory pressure: Large analytical queries can compete with transactional workloads if run on the same instance

Getting Started Today

If your self-hosted Supabase application needs faster analytics without the complexity of a separate data warehouse, pg_duckdb is worth exploring:

  1. Test with the official Docker image on a development instance
  2. Run your slowest dashboard queries with SET duckdb.force_execution = true;
  3. Compare execution times against standard PostgreSQL
  4. If results are promising, plan a production deployment using a read replica
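Step 3 is easy to do interactively in psql, where \timing reports wall-clock time for each statement. The query below mirrors the earlier dashboard example:

```sql
\timing on

-- Baseline: standard PostgreSQL executor
SET duckdb.force_execution = false;
SELECT COUNT(*), SUM(total_amount) FROM orders;

-- Same query through DuckDB's vectorized engine
SET duckdb.force_execution = true;
SELECT COUNT(*), SUM(total_amount) FROM orders;
```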

For teams already managing multiple Supabase projects, adding an analytics-optimized replica with pg_duckdb can serve all projects without duplicating infrastructure.
