
Stacks-up Sync: Reliable Real-Time Event Streaming Beyond pg_net

An in-depth benchmark of PostgreSQL's pg_net extension reveals its performance pitfalls in Docker versus bare-metal setups, uncovering timeouts, ephemeral port exhaustion, and the fixes needed for reliable delivery.


PostgreSQL's pg_net extension promises asynchronous HTTP capabilities for database-driven webhooks, but extensive performance testing reveals critical limitations that compromise production reliability. This technical analysis examines pg_net's performance characteristics, debugging challenges, and infrastructure dependencies through comprehensive benchmarking methodology.

Methodology

For testing, we used the official Supabase Postgres Docker image with pg_net installed: Postgres 15.6.1 and pg_net 0.10.1 [1]. The benchmark ran locally against a simple Elixir Phoenix server acting as the webhook target; its controller action simulates processing by returning a 200 response immediately, typically in ~400 µs [1].

We used pgbench to perform transactions against the PostgreSQL database:

pgbench -c 10 -j 2 -t 1000 -P 5 -h localhost -U postgres -f bench.sql postgres

With -c 10 and -t 1000, this configuration runs the bench.sql script 1,000 times per client across 10 concurrent clients, 10,000 transactions in total, simulating concurrent operations against the target tables.
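The contents of bench.sql are not reproduced here; a minimal script exercising the webhook trigger might look like the following (table columns are illustrative, not the actual schema):

-- bench.sql (hypothetical sketch): each pgbench transaction inserts one row,
-- which fires the spice_harvests_webhook trigger defined in a later section.
INSERT INTO spice_harvests (harvester, quantity_kg, harvested_at)
VALUES ('Sietch ' || (1 + floor(random() * 100))::int, random() * 500, now());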

Benchmark Scenarios

The evaluation focused on three critical scenarios:

  1. Single insert performance - Individual transaction throughput
  2. Batch insert performance - Bulk operation scalability
  3. Batch insert performance with retries - Error handling under load

For each scenario, we measured two key metrics: throughput (requests per second) and request success rate. Crucially, we also identified the throughput at which each configuration sustained a reliable ~100% success rate.
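Success rates were derived from the responses pg_net records; a sketch of the kind of query used (assuming pg_net's net._http_response table, where failed requests carry a NULL status code):

-- Summarize delivered vs. failed webhook requests recorded by pg_net.
SELECT
    count(*) FILTER (WHERE status_code = 200)   AS succeeded,
    count(*) FILTER (WHERE status_code IS NULL) AS failed,
    round(100.0 * count(*) FILTER (WHERE status_code = 200) / count(*), 2) AS success_rate_pct
FROM net._http_response;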

The Bench

We implemented a trigger function attached to the spice_harvests table that initiates API requests via pg_net:

CREATE OR REPLACE FUNCTION spice_harvests_webhook()
   RETURNS TRIGGER
   LANGUAGE plpgsql
AS $$
BEGIN
   -- Queue an asynchronous POST; pg_net's background worker sends it later.
   PERFORM net.http_post(
       url := 'http://host.docker.internal:4000/api/spice-harvest',
       body := to_jsonb(NEW),
       headers := '{"Content-Type": "application/json"}'::jsonb
   );
   RETURN NEW;
END;
$$;
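The function is attached to the table with an AFTER INSERT row-level trigger along these lines (trigger name assumed):

CREATE TRIGGER spice_harvests_webhook_trigger
AFTER INSERT ON spice_harvests
FOR EACH ROW
EXECUTE FUNCTION spice_harvests_webhook();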

pgbench itself typically finished its transactions in about a second; we then allowed pg_net time to work through the net.http_request_queue table [1].
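Draining progress can be watched by polling that queue directly:

-- Requests still waiting for pg_net's background worker.
SELECT count(*) AS pending FROM net.http_request_queue;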

First Issues

Initial benchmarking with pg_net's default configuration revealed concerning results:

  • Success Rate: 94.72%
  • Throughput: 74.31 requests/second
  • Failed Requests: 528 with NULL status codes

Using the out-of-the-box Supabase Docker image for local development led to significant performance issues, including a high rate of request failures and throughput far below expectations. These issues were compounded by the fact that pg_net is challenging to debug when something goes wrong [1].

Error analysis revealed three primary failure modes:

  • "Timeout was reached" (80 requests)
  • "Couldn't connect to server" (53 requests)
  • "Server returned nothing (no headers, no data)" (395 requests)

As a sanity check, we verified our webhook server could handle the expected load. A direct HTTP test achieved 100% success rate at 1,707 requests per second, confirming the failures originated between PostgreSQL and the server.

Timeout Was Reached

pg_net exposes a handful of configuration parameters, most notably pg_net.batch_size (default: 200), the number of queued requests the background worker picks up per iteration, and pg_net.ttl (default: 6 hours), which controls how long processed requests are retained before the worker deletes them [2]. The timeout errors themselves originate from libcurl, pg_net's underlying HTTP driver, pointing to low-level networking bottlenecks rather than the database.
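These settings are ordinary Postgres GUCs; a sketch of how the batch size can be lowered (superuser access assumed; depending on the pg_net version, the background worker may need a restart for the new value to apply):

-- Process fewer queued requests per worker iteration.
ALTER SYSTEM SET pg_net.batch_size = 100;
SELECT pg_reload_conf();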

Lowering batch_size from 200 to 100 improved performance:

  • Success Rate: 99.15%
  • Throughput: 50.26 requests/second
  • Failed Requests: 17 (all timeout errors)

Further reducing batch_size to 50 finally achieved a 100% success rate at 97.36 requests/second.

Increasing timeout_milliseconds to 30 seconds while keeping the default batch_size achieved a ~98% success rate at ~100 req/sec, but significant errors persisted. The most common error became "Server returned nothing (no headers, no data)", indicating connections being closed at the libcurl level.
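The timeout is passed per request, so in our setup the trigger's call looks roughly like this:

-- Same call as before, with an explicit 30-second timeout.
PERFORM net.http_post(
    url := 'http://host.docker.internal:4000/api/spice-harvest',
    body := to_jsonb(NEW),
    headers := '{"Content-Type": "application/json"}'::jsonb,
    timeout_milliseconds := 30000
);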

Ephemeral Port Exhaustion

Network analysis using netstat revealed over 1,000 connections in the TIME_WAIT state:

tcp4       0      0  127.0.0.1.59696        127.0.0.1.4000         TIME_WAIT
tcp4       0      0  127.0.0.1.59656        127.0.0.1.4000         TIME_WAIT

This points to ephemeral port exhaustion: the operating system runs out of available ports for establishing new TCP connections. Each closed TCP connection lingers in TIME_WAIT to catch late-arriving packets, temporarily consuming a port from the ephemeral pool.

With high connection rates, TIME_WAIT sockets accumulate faster than they're released, causing new connection attempts to fail. We attempted system-level TCP optimizations:

sudo sysctl -w net.inet.tcp.msl=1000
sudo sysctl -w net.inet.ip.portrange.hifirst=32768
sudo sysctl -w net.inet.tcp.keepidle=10000

Unfortunately, this barely helped performance, with success rates remaining at 93.87% and throughput at 42.83 requests/second [1].

Docker?

To address the performance issues, we pulled the Postgres + pg_net setup out of Docker and ran Postgres on bare metal, installed via Homebrew on macOS. This required building pg_net from source and enabling it in postgresql.conf [1].

The pg_net repository provides build instructions, though we encountered initial clang compilation errors requiring Xcode developer tools installation.
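Once built, enabling pg_net on a self-managed Postgres comes down to roughly the following (a sketch; consult the pg_net README for version-specific steps):

-- In postgresql.conf (or via ALTER SYSTEM), preload the background worker:
--   shared_preload_libraries = 'pg_net'
-- After restarting Postgres, enable the extension in the target database:
CREATE EXTENSION IF NOT EXISTS pg_net;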

Running Postgres with pg_net on bare metal achieved the promised ~1k req/sec throughput at a 100% success rate [1]:

  • Success Rate: 100.00%
  • Throughput: 937.19 requests/second

Conclusion

This benchmarking exercise demonstrated that pg_net can deliver the high throughput and reliability it promises, but only under the right conditions. When running on a bare-metal PostgreSQL installation, we achieved over 900 requests per second with a 100% success rate, aligning with advertised performance metrics [1].

However, the journey to this point was far from straightforward. The standard Supabase Docker image led to significant performance issues and high request failure rates. These were compounded by pg_net's limited debugging visibility and the need to dig into system-level networking details such as ephemeral port exhaustion [1].

This experience highlights a critical shortcoming: without robust retry mechanisms, you cannot guarantee that webhooks will reach their target HTTP endpoints [1]. For production systems requiring reliable real-time data synchronization between databases, CRMs, and ERPs, purpose-built platforms like Stacksync eliminate these infrastructure-level challenges while providing guaranteed delivery, comprehensive error handling, and bi-directional sync capabilities that pg_net cannot offer.

The default Supabase Docker image may not be sufficient for reliable development or production use without significant tweaks and deep system-level understanding. Additional benchmark scenarios will be covered in future posts [1] where we'll explore batch insert performance and larger payload handling.