My PostgreSQL Cheat Sheet

I use PostgreSQL heavily, and I freaking love it, so I am writing down things that I refer to frequently.

At Clarisights, we run a pretty big PostgreSQL server, to give you an idea, we ingest around 2+ Billion events on a single node PostgreSQL server( Size: 4+ TB), and these numbers are from November 2019

We heavily tuned PostgreSQL to handle this kind of workload, I think our config is workload-specific and probably not very useful if you don’t have similar workload.

I am documenting some commands that I refer pretty frequently, hopefully they are useful for other PostgreSQL users πŸ™‚.

See time taken to execute your query

Use the \timing command to see the time taken to execute your query.

test_db=# \timing
Timing is on.
test_db=# SELECT * FROM test_table WHERE test_id = 156 ORDER BY created_at DESC LIMIT 500;
Time: 3.298 ms
test_db=# \timing
Timing is off.

Connect to a database with a user

psql -d test_db -U test_user

See psql docs for more details: PostgreSQL Docs: psql

Locks and blocking queries

pg_locks table stores all info current lock and waiting locks, see pg_locks docs for more details

See waiting locks

SELECT relation::regclass, * FROM pg_locks WHERE NOT GRANTED;

get more details with pg_stat_activity

SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event is NOT NULL;
pid  | wait_event_type |  wait_event
------+-----------------+---------------
 2540 | Lock            | relation
 6644 | LWLock          | ProcArrayLock
(2 rows)

Use SQL Queries on the wiki to see blocking queries: PostgreSQL Wiki: Lock Monitoring

Use pg_stat_activity to see the current state of the system.

The Statistics Collector

I can’t do justice to Stats Collector in one paragraph, so please check out official docs: PostgreSQL Docs: monitoring-stats

EXPLAIN and ANALYZE

EXPLAIN and ANALYZE is used to figure our how a query will be executed, these commands comes handy when you are debugging slow queries.

Use EXPLAIN to see query plan for a query

EXPLAIN SELECT * FROM foo;

Using ANALYZE with EXPLAIN will result in actual execution of query, and it will show actual run times and other statistics along with query plan

EXPLAIN ANALYZE SELECT * FROM foo;

You can get more information about internal I/O, by using BUFFERS along with EXPLAIN ANALYZE

EXPLAIN (ANALYZE,BUFFERS) SELECT * FROM foo;

It will print out Buffers:, which shows how much I/O PostgreSQL is doing. EXPLAIN has more flags for getting more info and formatting, See more: PostgreSQL Docs: EXPLAIN

auto_explain

Load auto_explain module to log slow statements automatically, it will log queries which run longer then configured duration. You can configure it to log queries with EXPLAIN or EXPLAIN ANALYZE output as well. See more: PostgreSQL Docs: auto_explain

Index Maintenance and Stats

Over time database usage pattern will evolve and that will mean some of your indicies will be unused, and you will need to add some new indicies

PostgreSQL collects information about your indexes, use it to remove unused indexes, and see which index is used how much

See more: PostgreSQL Wiki: Index Maintenance

Find duplicates in a table

unique index with duplicate data will fail, so here is handly query which you can use to find duplicates

-- find ids of duplicates, for adding unique index on item_id and location_id
SELECT
  item_id as a,
  location_id as b,
  COUNT(*),
  ARRAY_AGG(id)
FROM item_location_reports
GROUP BY (a, b)
HAVING COUNT(*) > 1
ORDER BY item_id;

Performance

PostgreSQL Wiki contains list of snippets which you can use to analyze and tune performance

See more: PostgreSQL Wiki: Performance Snippets

Tools

List of tools that comes handy when running PostgreSQL in production

Deep Dive

Literature that goes beyond basics, and explores internals of PostgreSQL

Further Reading


Continue Reading