[PostgreSQL]: How to check Replication lag

We can calculate the replication lag by comparing the current WAL write location on the primary with the last WAL location received/replayed by the standby.
They can be retrieved using pg_current_xlog_location on the primary and the pg_last_xlog_receive_location/pg_last_xlog_replay_location on the standby, respectively.

[ postgres @ masterhost : /root ] psql -c “SELECT pg_current_xlog_location()”
could not change directory to “/root”: Permission denied
pg_current_xlog_location
————————–
0/205B598
(1 row)

[ postgres @ masterhost : /root ]

[ postgres @ slavehost : /root ] psql -c “select pg_last_xlog_receive_location()”
could not change directory to “/root”: Permission denied
pg_last_xlog_receive_location
——————————-
0/205B598
(1 row)

[ postgres @ slavehost : /root ] psql -c “select pg_last_xlog_receive_location()”
could not change directory to “/root”: Permission denied
pg_last_xlog_receive_location
——————————-
0/205B598
(1 row)

[ postgres @ slavehost : /root ]
This can be achieved by comparing pg_last_xlog_receive_location() and pg_last_xlog_replay_location() on the slave, and if they are the same it returns 0, otherwise it runs the above query again:

postgres=# SELECT
postgres-# CASE
postgres-# WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0
postgres-# ELSE EXTRACT (EPOCH FROM now() – pg_last_xact_replay_timestamp())::INTEGER
postgres-# END
postgres-# AS replication_lag;
replication_lag
—————–
0
(1 row)

postgres=#
This query is all good, but the problem is that it is not safe. If for some reason the master stops sending transaction logs, this query will continue to return 0 and you will think the replication is working, when it is not.
Other way …
============

At Master :

postgres=# SELECT pg_last_xlog_receive_location();
pg_last_xlog_receive_location
——————————-
0/205B3E8
(1 row)
At Slave :

postgres=# SELECT pg_last_xlog_receive_location();
pg_last_xlog_receive_location
——————————-
0/205B3E8
(1 row)

Other way …
============

Below query works great and it is a very good query to give you the lag in seconds. The problem is if the master is not active, it doesn’t mean a thing. So you need to first check if two servers are in sync and if they are, return 0.

postgres=# SELECT EXTRACT(EPOCH FROM (now() – pg_last_xact_replay_timestamp()))::INT;
date_part
———–
3173
(1 row)
Other way …
============

Solution#1 would be combined with a check if the wal receiver process is running before running that query with something like:

You can also check the progress of streaming replication by using ps command
[ postgres @ masterhost : /root ] ps -ef | grep sender
postgres 5167 26754 0 04:15 pts/1 00:00:00 grep sender
postgres 5295 4517 0 Jul24 ? 00:00:01 postgres: wal sender process rep 10.116.175.107(49770) streaming 0/205B598
[ postgres @ masterhost : /root ]
[ postgres @ masterhost : /root ]
[ postgres @ slavehost : /root ] ps -ef | grep receiver
postgres 3821 3808 0 Jul24 ? 00:01:27 postgres: wal receiver process streaming 0/205B598
postgres 5276 2515 0 04:15 pts/1 00:00:00 grep receiver
[ postgres @ slavehost : /root ]

  • Ask Question