My PostgreSQL Cheat Sheet
I use PostgreSQL heavily, and I freaking love it, so I am writing down things that I refer to frequently.
At Clarisights, we run a pretty big PostgreSQL server, to give you an idea, we ingest around 2+ Billion events on a single node PostgreSQL server( Size: 4+ TB), and these numbers are from November 2019
We heavily tuned PostgreSQL to handle this kind of workload, I think our config is workload-specific and probably not very useful if you donβt have similar workload.
I am documenting some commands that I refer pretty frequently, hopefully they are useful for other PostgreSQL users π.
See time taken to execute your query
Use the \timing
command to see the time taken to execute your query.
test_db=# \timing
Timing is on.
test_db=# SELECT * FROM test_table WHERE test_id = 156 ORDER BY created_at DESC LIMIT 500;
Time: 3.298 ms
test_db=# \timing
Timing is off.
Connect to a database with a user
psql -d test_db -U test_user
See psql docs for more details: PostgreSQL Docs: psql
Locks and blocking queries
pg_locks
table stores all info current lock and waiting locks, see pg_locks docs for more details
See waiting locks
SELECT relation::regclass, * FROM pg_locks WHERE NOT GRANTED;
get more details with pg_stat_activity
SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event is NOT NULL;
pid | wait_event_type | wait_event
------+-----------------+---------------
2540 | Lock | relation
6644 | LWLock | ProcArrayLock
(2 rows)
Use SQL Queries on the wiki to see blocking queries: PostgreSQL Wiki: Lock Monitoring
Use pg_stat_activity
to see the current state of the system.
The Statistics Collector
I canβt do justice to Stats Collector in one paragraph, so please check out official docs: PostgreSQL Docs: monitoring-stats
EXPLAIN and ANALYZE
EXPLAIN and ANALYZE is used to figure our how a query will be executed, these commands comes handy when you are debugging slow queries.
Use EXPLAIN
to see query plan for a query
EXPLAIN SELECT * FROM foo;
Using ANALYZE
with EXPLAIN
will result in actual execution of query, and it will
show actual run times and other statistics along with query plan
EXPLAIN ANALYZE SELECT * FROM foo;
You can get more information about internal I/O, by using BUFFERS
along with EXPLAIN ANALYZE
EXPLAIN (ANALYZE,BUFFERS) SELECT * FROM foo;
It will print out Buffers:
, which shows how much I/O PostgreSQL is doing.
EXPLAIN
has more flags for getting more info and formatting, See more: PostgreSQL Docs: EXPLAIN
auto_explain
Load auto_explain
module to log slow statements automatically, it will log queries
which run longer then configured duration. You can configure it to log queries with
EXPLAIN
or EXPLAIN ANALYZE
output as well.
See more: PostgreSQL Docs: auto_explain
Index Maintenance and Stats
Over time database usage pattern will evolve and that will mean some of your indicies will be unused, and you will need to add some new indicies
PostgreSQL collects information about your indexes, use it to remove unused indexes, and see which index is used how much
See more: PostgreSQL Wiki: Index Maintenance
Find duplicates in a table
unique index with duplicate data will fail, so here is handly query which you can use to find duplicates
-- find ids of duplicates, for adding unique index on item_id and location_id
SELECT
item_id as a,
location_id as b,
COUNT(*),
ARRAY_AGG(id)
FROM item_location_reports
GROUP BY (a, b)
HAVING COUNT(*) > 1
ORDER BY item_id;
Performance
PostgreSQL Wiki contains list of snippets which you can use to analyze and tune performance
See more: PostgreSQL Wiki: Performance Snippets
Tools
List of tools that comes handy when running PostgreSQL in production
- Percona Monitoring and Management
- PgHero
- Pgtune, web view
- HypoPG
- ottertune
- PostgreSQL: Software Catalogue - Product Categories, the page links to tons of software that works with PostgreSQL
Deep Dive
Literature that goes beyond basics, and explores internals of PostgreSQL
- PostgreSQL Wiki: Backend flowchart contains details of internals and how they interact with each other
- The Internals of PostgreSQL - one of the best resources to learn more on internals
Further Reading
- Collecting metrics with PostgreSQL monitoring tools - DataDog Blog
- Planet PostgreSQL is a PostgreSQL related blog aggregation service run by the PostgreSQL community.
- Continuous Archiving and Point-in-Time Recovery (PITR)
- High Availability, Load Balancing, and Replication