I use PostgreSQL heavily, and I freaking love it, so I am writing down things that I refer to frequently.
At Clarisights, we run a pretty big PostgreSQL server, to give you an idea, we ingest around 2+ Billion events on a single node PostgreSQL server( Size: 4+ TB), and these numbers are from November 2019
We heavily tuned PostgreSQL to handle this kind of workload, I think our config is workload-specific and probably not very useful if you don’t have similar workload.
I am documenting some commands that I refer pretty frequently, hopefully they are useful for other PostgreSQL users 🙂.
See time taken to execute your query
\timing command to see the time taken to execute your query.
test_db=# \timing Timing is on. test_db=# SELECT * FROM test_table WHERE test_id = 156 ORDER BY created_at DESC LIMIT 500; Time: 3.298 ms test_db=# \timing Timing is off.
Connect to a database with a user
psql -d test_db -U test_user
See psql docs for more details: PostgreSQL Docs: psql
Locks and blocking queries
pg_locks table stores all info current lock and waiting locks, see pg_locks docs for more details
See waiting locks
SELECT relation::regclass, * FROM pg_locks WHERE NOT GRANTED;
get more details with pg_stat_activity
SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event is NOT NULL; pid | wait_event_type | wait_event ------+-----------------+--------------- 2540 | Lock | relation 6644 | LWLock | ProcArrayLock (2 rows)
Use SQL Queries on the wiki to see blocking queries: PostgreSQL Wiki: Lock Monitoring
pg_stat_activity to see the current state of the system.
The Statistics Collector
I can’t do justice to Stats Collector in one paragraph, so please check out official docs: PostgreSQL Docs: monitoring-stats
EXPLAIN and ANALYZE
EXPLAIN and ANALYZE is used to figure our how a query will be executed, these commands comes handy when you are debugging slow queries.
EXPLAIN to see query plan for a query
EXPLAIN SELECT * FROM foo;
EXPLAIN will result in actual execution of query, and it will
show actual run times and other statistics along with query plan
EXPLAIN ANALYZE SELECT * FROM foo;
You can get more information about internal I/O, by using
BUFFERS along with
EXPLAIN (ANALYZE,BUFFERS) SELECT * FROM foo;
It will print out
Buffers:, which shows how much I/O PostgreSQL is doing.
EXPLAIN has more flags for getting more info and formatting, See more: PostgreSQL Docs: EXPLAIN
auto_explain module to log slow statements automatically, it will log queries
which run longer then configured duration. You can configure it to log queries with
EXPLAIN ANALYZE output as well.
See more: PostgreSQL Docs: auto_explain
Index Maintenance and Stats
Over time database usage pattern will evolve and that will mean some of your indicies will be unused, and you will need to add some new indicies
PostgreSQL collects information about your indexes, use it to remove unused indexes, and see which index is used how much
See more: PostgreSQL Wiki: Index Maintenance
Find duplicates in a table
unique index with duplicate data will fail, so here is handly query which you can use to find duplicates
-- find ids of duplicates, for adding unique index on item_id and location_id SELECT item_id as a, location_id as b, COUNT(*), ARRAY_AGG(id) FROM item_location_reports GROUP BY (a, b) HAVING COUNT(*) > 1 ORDER BY item_id;
PostgreSQL Wiki contains list of snippets which you can use to analyze and tune performance
See more: PostgreSQL Wiki: Performance Snippets
List of tools that comes handy when running PostgreSQL in production
- Percona Monitoring and Management
- Pgtune, web view
- PostgreSQL: Software Catalogue - Product Categories, the page links to tons of software that works with PostgreSQL
Literature that goes beyond basics, and explores internals of PostgreSQL
- PostgreSQL Wiki: Backend flowchart contains details of internals and how they interact with each other
- The Internals of PostgreSQL - one of the best resources to learn more on internals
- Collecting metrics with PostgreSQL monitoring tools - DataDog Blog
- Planet PostgreSQL is a PostgreSQL related blog aggregation service run by the PostgreSQL community.
- Continuous Archiving and Point-in-Time Recovery (PITR)
- High Availability, Load Balancing, and Replication