Working with large datasets in self-hosted Supabase requires different strategies than single-row operations. Whether you're migrating data from another platform, importing CSV files from users, or syncing data between systems, bulk operations can make the difference between a process that takes minutes versus hours.
This guide covers practical techniques for optimizing batch operations in your self-hosted Supabase instance, from basic batch inserts to advanced import workflows.
Why Bulk Operations Matter for Self-Hosted Deployments
The Supabase REST API (powered by PostgREST) handles single-row operations efficiently, but naive approaches to bulk data fall apart quickly:
- Network overhead compounds: 10,000 individual INSERT requests means 10,000 round trips
- Transaction overhead adds up: Each statement creates its own transaction context
- Connection pool exhaustion: Rapid-fire requests can saturate your connection pooling setup
- Rate limiting kicks in: Even self-hosted deployments have practical limits
The good news: PostgreSQL (and by extension, self-hosted Supabase) offers powerful primitives for bulk operations. You just need to know how to use them.
Batch INSERT: The Foundation
Basic Batch Insert via supabase-js
The simplest improvement over single-row inserts is passing an array to the insert method:
// Instead of this (slow)
for (const user of users) {
await supabase.from('users').insert(user)
}
// Do this (10-100x faster)
await supabase.from('users').insert(users)
The Supabase JavaScript client batches these into a single HTTP request, which PostgREST converts into a multi-row INSERT statement.
Practical limits:
- Keep batches between 500-1,000 rows for optimal performance
- Larger batches (10,000+ rows) can cause memory pressure and lock contention
- Monitor your request payload size—very wide tables may need smaller batches
Chunking Large Datasets
For datasets exceeding your optimal batch size:
function chunk<T>(array: T[], size: number): T[][] {
const chunks: T[][] = []
for (let i = 0; i < array.length; i += size) {
chunks.push(array.slice(i, i + size))
}
return chunks
}
async function bulkInsert(records: any[], batchSize = 500) {
const batches = chunk(records, batchSize)
for (const batch of batches) {
const { error } = await supabase.from('users').insert(batch)
if (error) throw error
}
}
For better user experience with progress tracking:
async function bulkInsertWithProgress(
records: any[],
batchSize = 500,
onProgress?: (completed: number, total: number) => void
) {
const batches = chunk(records, batchSize)
let completed = 0
for (const batch of batches) {
const { error } = await supabase.from('users').insert(batch)
if (error) throw error
completed += batch.length
onProgress?.(completed, records.length)
}
}
Bulk Upsert: Insert or Update in One Operation
When you need to insert new records while updating existing ones, upsert operations eliminate the need for separate insert and update queries:
await supabase
.from('products')
.upsert(products, {
onConflict: 'sku',
ignoreDuplicates: false
})
This translates to PostgreSQL's INSERT ... ON CONFLICT syntax, which is significantly faster than checking existence and branching.
Critical requirement: You need a unique constraint or primary key on the conflict column. For self-hosted deployments, verify your indexes are properly configured.
Handling Partial Updates with Upsert
If you only want to update specific columns on conflict:
-- Create a function for selective upsert
CREATE OR REPLACE FUNCTION bulk_upsert_products(
product_data jsonb
)
RETURNS void AS $$
BEGIN
INSERT INTO products (sku, name, price, updated_at)
SELECT
(item->>'sku')::text,
(item->>'name')::text,
(item->>'price')::numeric,
now()
FROM jsonb_array_elements(product_data) AS item
ON CONFLICT (sku) DO UPDATE SET
price = EXCLUDED.price,
updated_at = now()
WHERE products.price != EXCLUDED.price;
END;
$$ LANGUAGE plpgsql;
Call it via RPC:
await supabase.rpc('bulk_upsert_products', {
product_data: products
})
The COPY Command: Maximum Import Speed
For truly large imports (100,000+ rows), nothing beats PostgreSQL's COPY command. It bypasses the SQL parser overhead and writes data directly to tables.
Using COPY for Self-Hosted Supabase
Since you control your self-hosted instance, you have direct database access:
# Connect directly to PostgreSQL psql "postgresql://postgres:your-password@your-server:5432/postgres" # Import CSV file \copy users(email, name, created_at) FROM '/path/to/users.csv' WITH (FORMAT csv, HEADER true)
Performance benchmarks (approximate):
| Method | 100K rows | 1M rows |
|---|---|---|
| Single INSERTs | 45 min | 7+ hours |
| Batch INSERT (1000 rows) | 2 min | 20 min |
| COPY command | 15 sec | 2.5 min |
Remote COPY via pg_dump/psql
For migrating data between servers:
# Export from source pg_dump -h source-server -U postgres -t users --data-only -F c > users.dump # Import to self-hosted Supabase pg_restore -h your-server -U postgres -d postgres --data-only users.dump
RPC Functions for Complex Bulk Operations
For operations that span multiple tables or require business logic, database functions called via RPC are ideal:
CREATE OR REPLACE FUNCTION process_order_batch(
orders jsonb
)
RETURNS TABLE(order_id uuid, status text) AS $$
DECLARE
order_record jsonb;
new_order_id uuid;
BEGIN
FOR order_record IN SELECT * FROM jsonb_array_elements(orders)
LOOP
-- Insert order
INSERT INTO orders (customer_id, total, created_at)
VALUES (
(order_record->>'customer_id')::uuid,
(order_record->>'total')::numeric,
now()
)
RETURNING id INTO new_order_id;
-- Insert order items
INSERT INTO order_items (order_id, product_id, quantity, price)
SELECT
new_order_id,
(item->>'product_id')::uuid,
(item->>'quantity')::integer,
(item->>'price')::numeric
FROM jsonb_array_elements(order_record->'items') AS item;
-- Update inventory
UPDATE products
SET stock = stock - (item->>'quantity')::integer
FROM jsonb_array_elements(order_record->'items') AS item
WHERE products.id = (item->>'product_id')::uuid;
order_id := new_order_id;
status := 'processed';
RETURN NEXT;
END LOOP;
END;
$$ LANGUAGE plpgsql;
This wraps multiple operations in a single transaction with proper error handling.
Optimizing Self-Hosted Import Performance
Temporarily Disable Triggers
Triggers on every row insertion add significant overhead during bulk imports:
-- Disable triggers before import ALTER TABLE users DISABLE TRIGGER ALL; -- Run your bulk import -- ... -- Re-enable triggers ALTER TABLE users ENABLE TRIGGER ALL;
Warning: This skips trigger-based audit logging and cascade updates. Only use for controlled import scenarios.
Drop and Recreate Indexes
Building indexes incrementally during large inserts is expensive:
-- Save index definitions SELECT indexdef FROM pg_indexes WHERE tablename = 'users'; -- Drop non-essential indexes DROP INDEX idx_users_email; DROP INDEX idx_users_created_at; -- Perform bulk import -- ... -- Recreate indexes (this runs faster on existing data) CREATE INDEX idx_users_email ON users(email); CREATE INDEX idx_users_created_at ON users(created_at);
Tune PostgreSQL Parameters
For import-heavy workloads, temporarily adjust these settings in your self-hosted deployment:
-- Increase memory for sorting and hashing SET work_mem = '256MB'; -- Reduce fsync overhead during bulk load SET synchronous_commit = off; -- Increase checkpoint distance SET checkpoint_completion_target = 0.9;
Remember to reset these after import if they differ from your production tuning.
Handling CSV Imports from Users
A common pattern is allowing users to upload CSV files through your application. Here's a production-ready approach:
Edge Function for CSV Processing
// supabase/functions/import-csv/index.ts
import { serve } from 'https://deno.land/[email protected]/http/server.ts'
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2'
import { parse } from 'https://deno.land/[email protected]/csv/mod.ts'
serve(async (req) => {
const formData = await req.formData()
const file = formData.get('file') as File
const text = await file.text()
const records = parse(text, { skipFirstRow: true })
const supabase = createClient(
Deno.env.get('SUPABASE_URL')!,
Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!
)
// Process in batches
const batchSize = 500
let processed = 0
let errors: any[] = []
for (let i = 0; i < records.length; i += batchSize) {
const batch = records.slice(i, i + batchSize).map(row => ({
email: row[0],
name: row[1],
// Transform as needed
}))
const { error } = await supabase.from('users').insert(batch)
if (error) {
errors.push({ batch: i / batchSize, error: error.message })
} else {
processed += batch.length
}
}
return new Response(JSON.stringify({
processed,
errors,
total: records.length
}))
})
For files larger than Edge Function memory limits, consider streaming the CSV or using a background job queue.
Transactions for Atomicity
When bulk operations must succeed or fail as a unit, wrap them in a transaction via RPC:
CREATE OR REPLACE FUNCTION atomic_bulk_insert( users_data jsonb, orders_data jsonb ) RETURNS void AS $$ BEGIN -- Insert users INSERT INTO users (email, name) SELECT item->>'email', item->>'name' FROM jsonb_array_elements(users_data) AS item; -- Insert orders (references users) INSERT INTO orders (user_email, total) SELECT item->>'user_email', (item->>'total')::numeric FROM jsonb_array_elements(orders_data) AS item; -- If anything fails, entire transaction rolls back END; $$ LANGUAGE plpgsql;
Monitoring Bulk Operation Performance
Use PostgreSQL's explain analyze to understand query performance:
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) INSERT INTO users (email, name) SELECT * FROM unnest( ARRAY['[email protected]', '[email protected]'], ARRAY['User 1', 'User 2'] );
For ongoing monitoring, check your observability setup for:
- Insert rates per second
- Lock wait times
- WAL generation rate during imports
- Connection pool utilization
Common Pitfalls and Solutions
Row-Level Security Overhead
RLS policies are evaluated for each row during insert. For trusted import processes, consider using the service role key or a dedicated import user:
// For bulk imports only const adminClient = createClient( process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY! )
Foreign Key Validation
Foreign key checks on each row slow bulk inserts. If you control the data integrity:
-- Temporarily defer constraint checks SET CONSTRAINTS ALL DEFERRED; -- Run bulk insert -- ... -- Constraints checked at commit COMMIT;
Disk Space During Large Imports
Large imports generate significant WAL (write-ahead log) data. Monitor your disk space and consider:
- Pre-allocating disk space
- Adjusting
max_wal_sizefor import workloads - Running CHECKPOINT after large imports
When to Use Supascale for Bulk Operations
Managing bulk imports on self-hosted Supabase involves coordination across multiple services and careful performance tuning. Supascale simplifies this by providing:
- One-click backup and restore for safe data operations
- Resource monitoring to track import impact
- Project management for maintaining separate environments (test bulk imports before production)
Check out the features overview or view pricing to see how Supascale can streamline your self-hosted data operations.
Further Reading
- Database Indexing for Self-Hosted Supabase - Optimize query performance after bulk imports
- PostgreSQL Performance Tuning - System-level tuning for heavy workloads
- Connection Pooling Guide - Handle concurrent bulk operations
- Supabase Documentation: Import Data - Official import guidance
