July 30, 2020

PostgreSQL packet sniffing

Several months back I needed to troubleshoot a problem that looked like a slow query, but the query wasn't appearing in the slow query log. It was as if there was some kind of pause between preparing and executing the query. I suspected that the Java application code was up to some funny business but I couldn't prove it. The system was (still is) quite busy and many of the queries are prepared at daemon start time so "normal" packet sniffing didn't help me much.

I tried a few things including some Wireshark PostgreSQL plugins but they weren't stateful. I really needed something to link the relevant TCP packets together for me. In a fit of pique I hacked up a first sketch of pgtrace the stateful packet sniffer. I used pgtrace to figure out what was going on with the weird pausing queries.

Since then I've used pgtrace several times to troubleshoot other anomalies faster than I could with a generic packet sniffer.

Getting pgtrace

https://github.com/matthewnourse/pgtrace

Tested on CentOS 7 after compiling with GCC.

Using pgtrace

There's not much to it.

pgtrace device_to_sniff pcap_filter_string

or

pgtrace pcap_file

pgtrace works whether you're sniffing on the client or the server machine. It assumes that the PostgreSQL server is listening on port 5432 and it uses this assumption to figure out which side of the communication is the server.

For example:

pgtrace eth0 "port 2222"

will sniff all network traffic on port 2222. In this scenario pgtrace will assume that 2222 is the client's source port and that the server is listening on port 5432. This would be useful if you have a long-running PostgreSQL connection that you need to sniff.

pgtrace eth0 "port 5432"

will sniff all network traffic on port 5432.

pgtrace dump.pcap

will replay all the data from the packet capture file, as prepared by tcpdump. This can be useful if you're troubleshooting a busy system that generates packets faster than pgtrace can parse them.

pgtrace output

It helps to have the PostgreSQL Frontent/Backend Protocol specs on hand for this.

For example:

# pgtrace lo "port 5432"
pgtrace: 0 Self-test complete. device_or_file='lo' filter='port 5432'
pgtrace: 1596111995311480 source_port=17386 dest_port=5432 seq=3376136114 ack=2164505585 window=3635 size_payload=70 flags=0x18
1596111995311480 17386 fe Bind 15 .S_1.......
1596111995311480 17386 fe Execute 9 .....
pgtrace: 1596111995312776 source_port=16792 dest_port=5432 seq=3795050389 ack=3794399569 window=3635 size_payload=52 flags=0x18
1596111995312776 16792 fe Parse 16 .SELECT 1...
1596111995312776 16792 fe Bind 12 ........
1596111995312776 16792 fe Describe 6 P.
1596111995312776 16792 fe Execute 9 .....

Walking through the above...

# pgtrace lo "port 5432"

Start pgtrace listening on the loopback device on port 5432

pgtrace: 0 Self-test complete. device_or_file='lo' filter='port 5432'

The first output line. pgtrace runs a brief self-test on every run. Lines that aren't PostgreSQL messages start with "pgtrace: "

pgtrace: 1596111995311480 source_port=17386 dest_port=5432 seq=3376136114 ack=2164505585 window=3635 size_payload=70 flags=0x18

pgtrace has sniffed a TCP packet at 1596111995311480 microseconds after the Unix epoch. pgtrace uses the timestamp supplied by libpcap so that the timestamps in pcap files make sense.

1596111995311480 17386 fe Bind 15 .S_1.......

Bind a prepared query previously saved as S_1. The message was received on port 17386.

1596111995311480 17386 fe Execute 9 .....

Execute the prepared query.

pgtrace: 1596111995312776 source_port=16792 dest_port=5432 seq=3795050389 ack=3794399569 window=3635 size_payload=52 flags=0x18

pgtrace has sniffed another TCP packet.

1596111995312776 16792 fe Parse 16 .SELECT 1...
1596111995312776 16792 fe Bind 12 ........
1596111995312776 16792 fe Describe 6 P.
1596111995312776 16792 fe Execute 9 .....

Execute the "SELECT 1" SQL statement.

Limitations

Happy sniffing!