Logical Replication for Self-Hosted Supabase: Stream Data to External Systems

Set up PostgreSQL logical replication on self-hosted Supabase to sync data with analytics platforms, warehouses, and external databases.

Running a self-hosted Supabase instance gives you full control over your data—but what happens when that data needs to flow elsewhere? Whether you're feeding a data warehouse, syncing with a legacy system, or powering real-time analytics dashboards, PostgreSQL's logical replication is your escape hatch from data silos.

This guide walks through setting up logical replication on self-hosted Supabase to stream changes to external PostgreSQL databases, data warehouses, and analytics platforms. No managed service lock-in required.

Why Logical Replication Matters for Self-Hosted Users

Physical replication copies the entire database at the binary level—great for read replicas, but inflexible. Logical replication works differently: it streams row-level changes (INSERT, UPDATE, DELETE) using the Write-Ahead Log (WAL), letting you:

  • Sync specific tables to external systems without copying everything
  • Transform data in flight before it reaches the destination
  • Replicate across PostgreSQL versions (even major version differences)
  • Feed non-PostgreSQL destinations using CDC tools like Debezium

For self-hosted Supabase users, this opens up powerful integrations that Supabase Cloud's managed replication (currently in private alpha) doesn't yet fully support.

Real Use Cases from the Community

The Supabase community on GitHub Discussions has documented several practical applications:

  • Analytics pipelines: Streaming operational data to BigQuery or ClickHouse for reporting
  • Multi-database architectures: Keeping a legacy MySQL system in sync during migration
  • Compliance and auditing: Replicating to an immutable audit database
  • Cross-region distribution: Syncing data to PostgreSQL instances closer to users

Prerequisites

Before diving into configuration, ensure your self-hosted Supabase setup meets these requirements:

PostgreSQL Configuration

Logical replication requires specific PostgreSQL settings. If you deployed Supabase using our installation guide, you'll need to verify these settings in your postgresql.conf:

-- Check current settings
SHOW wal_level;          -- Must be 'logical'
SHOW max_replication_slots;  -- At least 4 (1 per subscription + buffer)
SHOW max_wal_senders;    -- At least 4
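
You can also check all three at once via pg_settings, which exposes the same values as SHOW:

-- Check all three settings in one query
SELECT name, setting
FROM pg_settings
WHERE name IN ('wal_level', 'max_replication_slots', 'max_wal_senders');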

If wal_level isn't set to logical, you'll need to modify your Docker configuration. In your docker-compose.yml, add these command arguments to the db service:

db:
  image: supabase/postgres:15.6.1.145
  command: >
    postgres
    -c wal_level=logical
    -c max_replication_slots=10
    -c max_wal_senders=10

Important: Changing wal_level requires a PostgreSQL restart. Plan for brief downtime.

Network Requirements

Your destination database needs to reach the source Supabase instance. For self-hosted setups, this typically means:

  • Opening port 5432 (or your custom PostgreSQL port) in your firewall
  • Ensuring the destination supports IPv6, or configuring IPv4 access
  • Using direct connections—connection poolers (PgBouncer/Supavisor) don't support replication protocols

Setting Up Logical Replication

The setup involves three components working together:

  1. Publication: Defines which tables to replicate on the source
  2. Replication Slot: Tracks WAL position for the subscriber
  3. Subscription: Created on the destination to pull changes

Step 1: Create a Publication on Supabase

Connect to your Supabase PostgreSQL instance and create a publication for the tables you want to replicate:

-- Replicate specific tables
CREATE PUBLICATION my_analytics_pub FOR TABLE 
  public.orders,
  public.users,
  public.products;

-- Or replicate every table in a schema (PostgreSQL 15+)
CREATE PUBLICATION all_public FOR TABLES IN SCHEMA public;

Best practice: Start with specific tables rather than FOR ALL TABLES. This reduces WAL traffic and makes it easier to manage schema changes.
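
The table list isn't fixed; you can adjust an existing publication later. A quick sketch (public.invoices is a hypothetical table used purely for illustration):

-- Add or remove tables from an existing publication
ALTER PUBLICATION my_analytics_pub ADD TABLE public.invoices;
ALTER PUBLICATION my_analytics_pub DROP TABLE public.products;

-- After adding a table, the destination must run:
-- ALTER SUBSCRIPTION my_analytics_sub REFRESH PUBLICATION;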

Step 2: Create a Replication User

Don't use your superuser credentials for replication. Create a dedicated user with minimal required permissions:

-- Create replication user
CREATE USER replication_user WITH REPLICATION LOGIN PASSWORD 'secure_password_here';

-- Grant access to the publication
GRANT USAGE ON SCHEMA public TO replication_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO replication_user;

-- For future tables
ALTER DEFAULT PRIVILEGES IN SCHEMA public 
  GRANT SELECT ON TABLES TO replication_user;
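
If you'd rather not expose the whole schema, a narrower variant grants SELECT only on the tables in the publication from Step 1:

-- Tighter alternative: grant SELECT only on the published tables
GRANT USAGE ON SCHEMA public TO replication_user;
GRANT SELECT ON public.orders, public.users, public.products TO replication_user;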

Step 3: Configure pg_hba.conf for Replication Access

Add a line to pg_hba.conf allowing replication connections from your destination. Note that logical replication connections are matched by the actual database name, not the replication keyword (which only matches physical replication):

# Allow logical replication from a specific IP
hostssl postgres replication_user 203.0.113.50/32 scram-sha-256

# Or from a subnet (less secure)
hostssl postgres replication_user 10.0.0.0/8 scram-sha-256

Reload the configuration to apply it:

docker exec supabase-db psql -U postgres -c "SELECT pg_reload_conf();"

Step 4: Create Subscription on Destination

On your destination PostgreSQL database, create the subscription:

CREATE SUBSCRIPTION my_analytics_sub
  CONNECTION 'host=your-supabase-host.com port=5432 dbname=postgres user=replication_user password=secure_password_here sslmode=require'
  PUBLICATION my_analytics_pub;

PostgreSQL will:

  1. Create a replication slot on the source automatically
  2. Copy existing table data (initial sync)
  3. Begin streaming new changes
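
You can watch the initial table copy from the destination; pg_subscription_rel tracks a per-table sync state:

-- On the destination: per-table sync state
-- srsubstate: i = initializing, d = copying data, s = synchronized, r = ready (streaming)
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel;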

Step 5: Verify Replication Status

Check that replication is working on the source:

-- View active replication slots
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;

-- Monitor replication lag
SELECT 
  client_addr,
  state,
  sent_lsn,
  write_lsn,
  flush_lsn,
  replay_lsn,
  pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;

On the destination, verify data is flowing:

-- Check subscription status
SELECT subname, subenabled, subslotname 
FROM pg_subscription;

-- View subscription stats
SELECT * FROM pg_stat_subscription;

Handling Common Challenges

Schema Changes (DDL)

Here's the hard truth: PostgreSQL logical replication doesn't replicate schema changes. If you add a column on your Supabase source, the destination won't automatically get it.

Workarounds:

  1. Manual synchronization: Apply DDL changes to both databases within your deployment pipeline (a sketch follows this list)
  2. Use tools like pgroll: Automates schema migrations across replicated databases
  3. Schema versioning tables: Track schema versions and alert when they diverge
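
As a concrete sketch of the manual approach (the discount_code column is hypothetical): apply new columns on the subscriber first, since the subscriber may carry extra columns but must have every column the publisher sends. And whenever a new table is added to the publication, the subscriber has to refresh its subscription:

-- 1. On the destination (subscriber) first
ALTER TABLE public.orders ADD COLUMN discount_code text;

-- 2. Then on the Supabase source (publisher)
ALTER TABLE public.orders ADD COLUMN discount_code text;

-- After adding a new table to the publication, refresh on the destination:
ALTER SUBSCRIPTION my_analytics_sub REFRESH PUBLICATION;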

Primary Key Requirements

Every replicated table needs a primary key (or an explicit replica identity). Without one, PostgreSQL can't identify which row to update or delete, and UPDATE and DELETE statements against that table will fail on the source once it's published:

-- Check for tables without primary keys
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints tc 
  ON t.table_schema = tc.table_schema 
  AND t.table_name = tc.table_name 
  AND tc.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema = 'public'
  AND tc.constraint_name IS NULL;
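
For any table that turns up, either add a primary key or, as a fallback, set its replica identity to FULL, which logs the entire old row and increases WAL volume (public.events is a hypothetical example table):

-- Option A: add a primary key (preferred)
ALTER TABLE public.events ADD PRIMARY KEY (id);

-- Option B: fall back to logging the whole old row
ALTER TABLE public.events REPLICA IDENTITY FULL;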

WAL Retention and Disk Space

Replication slots prevent WAL segments from being deleted until the subscriber confirms receipt. If your subscriber goes offline, WAL files accumulate:

-- Check WAL retention per slot
SELECT slot_name, 
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;

Configure a maximum WAL retention to prevent disk exhaustion:

# In postgresql.conf, or as another -c argument in docker-compose.yml
max_slot_wal_keep_size = '10GB'

When this limit is exceeded, the slot is invalidated and you'll need to recreate the subscription and re-sync the destination.
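
With max_slot_wal_keep_size set, you can watch how close each slot is to that limit (wal_status and safe_wal_size require PostgreSQL 13 or newer, which the supabase/postgres 15.x image satisfies):

-- 'lost' in wal_status means the slot has been invalidated
SELECT slot_name,
       active,
       wal_status,
       pg_size_pretty(safe_wal_size) AS wal_headroom
FROM pg_replication_slots;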

Replicating to Non-PostgreSQL Destinations

For data warehouses like BigQuery, Snowflake, or ClickHouse, you'll need a CDC (Change Data Capture) tool to read the WAL and translate it:

Option 1: Debezium

Debezium is the industry standard for CDC. It connects to your PostgreSQL WAL and outputs changes to Kafka, which can then feed any destination:

# docker-compose addition for Debezium
debezium:
  image: debezium/connect:2.5
  environment:
    BOOTSTRAP_SERVERS: kafka:9092
    GROUP_ID: supabase-cdc
    CONFIG_STORAGE_TOPIC: debezium_configs
    OFFSET_STORAGE_TOPIC: debezium_offsets

This is covered in more detail in our Change Data Capture guide.

Option 2: Supabase's Open-Source ETL

Supabase's managed replication is built on an open-source ETL tool. For self-hosted users, you can run it yourself:

git clone https://github.com/supabase/etl
cd etl
# Configure destination in config.yaml
docker-compose up -d

This is particularly useful if you're eventually planning to use Supabase Warehouse (currently in development).

Monitoring Your Replication Pipeline

Once replication is running, you need visibility into its health. Here's a minimal monitoring query you can run periodically or integrate with your monitoring stack:

-- Replication health check
SELECT 
  s.slot_name,
  s.active,
  s.confirmed_flush_lsn,
  pg_current_wal_lsn() AS current_lsn,
  pg_wal_lsn_diff(pg_current_wal_lsn(), s.confirmed_flush_lsn) AS lag_bytes,
  CASE 
    WHEN pg_wal_lsn_diff(pg_current_wal_lsn(), s.confirmed_flush_lsn) > 1073741824 
    THEN 'CRITICAL: >1GB lag'
    WHEN pg_wal_lsn_diff(pg_current_wal_lsn(), s.confirmed_flush_lsn) > 104857600 
    THEN 'WARNING: >100MB lag'
    ELSE 'OK'
  END AS status
FROM pg_replication_slots s
WHERE s.slot_type = 'logical';

Set up alerts when:

  • active becomes false (subscriber disconnected)
  • lag_bytes exceeds your threshold (data falling behind)
  • Replication slot doesn't exist (was dropped)

Performance Considerations

Logical replication adds overhead to your Supabase instance. Here's what to watch:

CPU Impact

Decoding WAL for logical replication uses CPU. On busy databases, consider:

  • Deploying on instances with higher CPU allocation
  • Keeping tables at REPLICA IDENTITY DEFAULT (only old primary-key values go into the WAL for updates and deletes) rather than FULL (the entire old row); the query below shows each table's current setting
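
A quick way to see where each table stands (relreplident comes from pg_class: d = default/primary key, f = full, n = nothing, i = index):

-- Replica identity for every ordinary table in public
SELECT c.relname, c.relreplident
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind = 'r';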

Network Bandwidth

Initial sync copies all existing data. For large tables:

  • Schedule initial sync during low-traffic periods
  • Use copy_data = false in the subscription if the destination already has the data (example below)
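
For example, if the destination was seeded from a dump or backup, the subscription can skip the initial copy entirely (connection details reuse the values from Step 4):

CREATE SUBSCRIPTION my_analytics_sub
  CONNECTION 'host=your-supabase-host.com port=5432 dbname=postgres user=replication_user password=secure_password_here sslmode=require'
  PUBLICATION my_analytics_pub
  WITH (copy_data = false);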

Write Amplification

Each write to a replicated table generates WAL entries that must be:

  1. Written to disk
  2. Read by the WAL sender
  3. Sent over the network
  4. Applied at the destination

For write-heavy workloads, this can impact overall throughput.

Managing Replication with Supascale

Setting up logical replication manually works, but managing multiple Supabase projects with different replication targets gets complicated. Supascale helps by providing:

  • Centralized configuration: Manage PostgreSQL settings across projects
  • Backup coordination: Ensure backups include replication slot state for clean failover
  • Monitoring integration: Track replication lag alongside other metrics

The platform handles the operational complexity while you focus on what data goes where.

Troubleshooting

"Replication slot does not exist"

The destination lost its slot reference. On the source:

-- Create slot manually if needed
SELECT pg_create_logical_replication_slot('my_slot_name', 'pgoutput');

Then recreate the subscription pointing to the existing slot.
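
A sketch of that, assuming the destination still holds the previously replicated data (hence copy_data = false) and reusing the connection string from Step 4:

CREATE SUBSCRIPTION my_analytics_sub
  CONNECTION 'host=your-supabase-host.com port=5432 dbname=postgres user=replication_user password=secure_password_here sslmode=require'
  PUBLICATION my_analytics_pub
  WITH (create_slot = false, slot_name = 'my_slot_name', copy_data = false);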

"Could not start WAL streaming"

Usually a network or authentication issue. Verify:

  1. Firewall allows connections on port 5432
  2. pg_hba.conf has correct entry for replication
  3. SSL settings match between source and destination

Replication Lag Keeps Growing

The destination can't keep up. Check:

  • Destination database performance (CPU, disk I/O)
  • Network latency between source and destination
  • Complex triggers or constraints on destination tables slowing inserts
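
On the destination, pg_stat_subscription shows when the subscriber last received and applied messages, which helps separate a stalled apply worker from a source-side problem:

-- On the destination: time since the last message arrived from the source
SELECT subname,
       received_lsn,
       latest_end_lsn,
       now() - last_msg_receipt_time AS time_since_last_message
FROM pg_stat_subscription;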

Conclusion

Logical replication transforms self-hosted Supabase from an isolated database into a data hub that feeds your entire infrastructure. Whether you're building analytics pipelines, maintaining compliance systems, or architecting multi-database applications, the ability to stream changes to external systems is essential.

The setup requires careful attention to PostgreSQL configuration, network security, and monitoring—but once running, it's remarkably reliable. Start with a single table, validate the pipeline end-to-end, then expand to more tables as you build confidence.

For teams managing multiple Supabase projects with complex data flows, Supascale provides the operational layer that makes this manageable at scale.


Further Reading