repmgr Command to Manage and Maintenance Your Database Cluster PostgreSQL 13

2 years ago SETHA THAY 6861
repmgr Command to Manage and Maintenance Your Database Cluster PostgreSQL 13

In the previous post, Managing High Availability Database Cluster with Repmgr, you have learned how to set up a PostgreSQL database cluster which consists of a master and two standby in the same server (localhost). In this post, we will introduce a few repmgr commands to manage and maintain the created cluster. As you may know that by having your database clustered will let you achieve more high availability of your database and system. Therefore, knowledge for managing your cluster is a must to understand well the solution to apply when there is any issue happen. Let's start !!!!

1. View Cluster

repmgr cluster show — display information about each registered node in the replication cluster

This command will show info of each registered server, role (primary/standby), and status. you can execute this command on any node in the cluster. It also provides useful connectivity from a particular node.

This command requires either a valid repmgr.conf file; no additional arguments are needed.

/app/postgres/product/13.3/db_1/bin/repmgr -f /app/repmgr/5.2/repmgr.conf cluster show
 ID | Name        | Role    | Status    | Upstream  | Location | Priority | Timeline | Connection string
----+-------------+---------+-----------+-----------+----------+----------+----------+----------------------------------------------------------------------
 1  | master-db   | primary | * running |           | default  | 100      | 1        | host=localhost port=5432 user=repmgr dbname=repmgr connect_timeout=2
 2  | standby1-db | standby |   running | master-db | default  | 100      | 1        | host=localhost port=5433 user=repmgr dbname=repmgr connect_timeout=2
 3  | standby2-db | standby |   running | master-db | default  | 100      | 1        | host=localhost port=5434 user=repmgr dbname=repmgr connect_timeout=2

To show database connection errors when polling nodes, run the command in --verbose mode

/app/postgres/product/13.3/db_1/bin/repmgr -f /app/repmgr/5.2/repmgr.conf cluster show --verbose
NOTICE: using provided configuration file "/app/repmgr/5.2/repmgr.conf"
INFO: connecting to database
 ID | Name        | Role    | Status    | Upstream  | Location | Priority | Timeline | Connection string
----+-------------+---------+-----------+-----------+----------+----------+----------+----------------------------------------------------------------------
 1  | master-db   | primary | * running |           | default  | 100      | 1        | host=localhost port=5432 user=repmgr dbname=repmgr connect_timeout=2
 2  | standby1-db | standby |   running | master-db | default  | 100      | 1        | host=localhost port=5433 user=repmgr dbname=repmgr connect_timeout=2
 3  | standby2-db | standby |   running | master-db | default  | 100      | 1        | host=localhost port=5434 user=repmgr dbname=repmgr connect_timeout=2

To only display significant info of the cluster, run the command in --compact mode.

/app/postgres/product/13.3/db_1/bin/repmgr -f /app/repmgr/5.2/repmgr.conf cluster show --compact
 ID | Name        | Role    | Status    | Upstream  | Location | Prio. | TLI
----+-------------+---------+-----------+-----------+----------+-------+-----
 1  | master-db   | primary | * running |           | default  | 100   | 1
 2  | standby1-db | standby |   running | master-db | default  | 100   | 1
 3  | standby2-db | standby |   running | master-db | default  | 100   | 1

2. Manual Standby Promotion to Master

In this scenario, we suppose that the master database is having trouble giving service or having downtime which cannot solve by restart the service nor restart the server. We can solve this by promoting either standby 1 or standby 2 as the master in order to continuously operate the business without any problem.

In this example, we will stop the database service on the master node and view the DB cluster by the following command

sudo systemctl stop postgresql

Viewing cluster on standby 1 or standby 2

/app/postgres/product/13.3/db_1/bin/repmgr -f /app/repmgr/5.2/repmgr2.conf cluster show --compact
 ID | Name        | Role    | Status        | Upstream    | Location | Prio. | TLI
----+-------------+---------+---------------+-------------+----------+-------+-----
 1  | master-db   | primary | ? unreachable | ?           | default  | 100   |
 2  | standby1-db | standby |   running     | ? master-db | default  | 100   | 1
 3  | standby2-db | standby |   running     | ? master-db | default  | 100   | 1

WARNING: following issues were detected
  - unable to connect to node "master-db" (ID: 1)
  - node "master-db" (ID: 1) is registered as an active primary but is unreachable
  - unable to connect to node "standby1-db" (ID: 2)'s upstream node "master-db" (ID: 1)
  - unable to determine if node "standby1-db" (ID: 2) is attached to its upstream node "master-db" (ID: 1)
  - unable to connect to node "standby2-db" (ID: 3)'s upstream node "master-db" (ID: 1)
  - unable to determine if node "standby2-db" (ID: 3) is attached to its upstream node "master-db" (ID: 1)

HINT: execute with --verbose option to see connection error messages

As you can see in the table, the primary database is unreachable now because we just stop its services. Now we will use promote command to let standby 1 become primary

repmgr standby promote — promote a standby to a primary

This command will promote a standby to a primary if the current primary has failed. This command requires a valid repmgr.conf file for the standby, no additional arguments are required. We will run the standby promote command with --dry-run first to check the possibility of standby 1 to become master.

/app/postgres/product/13.3/db_1/bin/repmgr -f /app/repmgr/5.2/repmgr2.conf standby promote --dry-run
INFO: node is a standby
INFO: no active primary server found in this replication cluster
WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified
DETAIL: these nodes will remain attached to the current primary:
  standby2-db (node ID: 3)
INFO: 0 walsenders required, 10 available
INFO: node will be promoted using the "pg_promote()" function
INFO: prerequisites for executing STANDBY PROMOTE are met

NOW, the prerequisites for executing STANDBY PROMOTE are met. Let's run the command without option --dry-run

/app/postgres/product/13.3/db_1/bin/repmgr -f /app/repmgr/5.2/repmgr2.conf standby promote
WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified
DETAIL: these nodes will remain attached to the current primary:
  standby2-db (node ID: 3)
NOTICE: promoting standby to primary
DETAIL: promoting server "standby1-db" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "standby1-db" (ID: 2) was successfully promoted to primary

Let's check the database cluster again, you will now see the standby 1 is running as role primary. That should be fine but you can see there is a problem with standby 2 with the upstream

/app/postgres/product/13.3/db_1/bin/repmgr -f /app/repmgr/5.2/repmgr2.conf cluster show --compact
 ID | Name        | Role    | Status    | Upstream    | Location | Prio. | TLI
----+-------------+---------+-----------+-------------+----------+-------+-----
 1  | master-db   | primary | - failed  | ?           | default  | 100   |
 2  | standby1-db | primary | * running |             | default  | 100   | 2
 3  | standby2-db | standby |   running | ? master-db | default  | 100   | 1

WARNING: following issues were detected
  - unable to connect to node "master-db" (ID: 1)
  - unable to connect to node "standby2-db" (ID: 3)'s upstream node "master-db" (ID: 1)
  - unable to determine if node "standby2-db" (ID: 3) is attached to its upstream node "master-db" (ID: 1)

HINT: execute with --verbose option to see connection error messages

Normally, the upstream of standby 2 is the broken or unavailable master and now we have promoted standby 1 as a new primary instance. Therefore, we have to change upstream of standby 2 to follow this new primary (standby 1) with the following command

repmgr standby follow — attach a running standby to a new upstream node

This command is used to attach the standby to a new upstream node. Typically this will be the primary, but this command can also be used to attach the standby to another standby. This command requires a valid repmgr.conf file for the standby, no additional arguments are required.

Let's run the command with option --dry-run

/app/postgres/product/13.3/db_1/bin/repmgr -f /app/repmgr/5.2/repmgr3.conf standby follow --dry-run
NOTICE: attempting to find and follow current primary
INFO: connected to node 2, checking for current primary
INFO: follow target is primary node "standby1-db" (ID: 2)
INFO: replication connection to the local node was successful
INFO: replication connection to the follow target node was successful
INFO: local and follow target system identifiers match
DETAIL: system identifier is 6976164936838428849
INFO: local node 3 can attach to follow target node 2
DETAIL: local node's recovery point: 0/60000A0; follow target node's fork point: 0/60000A0
INFO: prerequisites for executing STANDBY FOLLOW are met

Now everything looks fine, let's run the command without option --dry-run and check cluster again

/app/postgres/product/13.3/db_1/bin/repmgr -f /app/repmgr/5.2/repmgr3.conf standby follow
NOTICE: attempting to find and follow current primary
INFO: local node 3 can attach to follow target node 2
DETAIL: local node's recovery point: 0/60000A0; follow target node's fork point: 0/60000A0
NOTICE: setting node 3's upstream to node 2
WARNING: node "standby2-db" not found in "pg_stat_replication"
WARNING: node "standby2-db" not found in "pg_stat_replication"
WARNING: node "standby2-db" not found in "pg_stat_replication"
WARNING: node "standby2-db" not found in "pg_stat_replication"
WARNING: node "standby2-db" not found in "pg_stat_replication"
WARNING: node "standby2-db" not found in "pg_stat_replication"
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "standby1-db" (ID: 2)

CURRENT STATUS OF CLUSTER

/app/postgres/product/13.3/db_1/bin/repmgr -f /app/repmgr/5.2/repmgr3.conf cluster show --compact
 ID | Name        | Role    | Status    | Upstream    | Location | Prio. | TLI
----+-------------+---------+-----------+-------------+----------+-------+-----
 1  | master-db   | primary | - failed  | ?           | default  | 100   |
 2  | standby1-db | primary | * running |             | default  | 100   | 2
 3  | standby2-db | standby |   running | standby1-db | default  | 100   | 2

WARNING: following issues were detected
  - unable to connect to node "master-db" (ID: 1)

HINT: execute with --verbose option to see connection error messages

Now you have promoted standby 1 to be a new primary database instance to serve the service for clients and asking standby 2 to follow the new primary. This means that your cluster is ok now. As a database administrator, all you have to do now is fixing the problems of the broken primary database server.

3. Rejoin Operation

Rejoin operation is done when you have done fixing the problems of the broken primary database server and would like to include it in the cluster again. This inclusion will not make its role as primary again, this broken primary which is just fixed will become just a standby. 

repmgr node rejoin — rejoin a dormant (stopped) node to the replication cluster

This command will enable a dormant (stopped) node to be rejoined to the replication cluster. This can optionally use pg_rewind to re-integrate a node that has diverged from the rest of the cluster, typically a failed primary.

Let's do --dry-run option on the fixed node

/app/repmgr/5.2$ /app/postgres/product/13.3/db_1/bin/repmgr node rejoin -f /app/repmgr/5.2/repmgr1.conf -d 'host=localhost user=repmgr dbname=repmgr port=5433' --force-rewind=/app/postgres/product/13.3/db_1/bin/pg_rewind --dry-run
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 6976164936838428849
INFO: prerequisites for using pg_rewind are met
INFO: pg_rewind would now be executed
DETAIL: pg_rewind command is:
  /app/postgres/product/13.3/db_1/bin/pg_rewind -D '/data01' --source-server='host=localhost port=5433 user=repmgr dbname=repmgr connect_timeout=2'
INFO: prerequisites for executing NODE REJOIN are met

Let's do without --dry-run option on the fixed node

/app/repmgr/5.2$ /app/postgres/product/13.3/db_1/bin/repmgr node rejoin -f /app/repmgr/5.2/repmgr1.conf -d 'host=localhost user=repmgr dbname=repmgr port=5433' --force-rewind=/app/postgres/product/13.3/db_1/bin/pg_rewind
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/app/postgres/product/13.3/db_1/bin/pg_rewind -D '/data01' --source-server='host=localhost port=5433 user=repmgr dbname=repmgr connect_timeout=2'"
NOTICE: 0 files copied to /data01
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=localhost port=5432 user=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "sudo systemctl start postgresql"
[sudo] password for postgres:
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2

CURRENT STATUS OF CLUSTER

/app/repmgr/5.2$ /app/postgres/product/13.3/db_1/bin/repmgr -f /app/repmgr/5.2/repmgr1.conf cluster show --compact
 ID | Name        | Role    | Status    | Upstream    | Location | Prio. | TLI
----+-------------+---------+-----------+-------------+----------+-------+-----
 1  | master-db   | standby |   running | standby1-db | default  | 100   | 1
 2  | standby1-db | primary | * running |             | default  | 100   | 2
 3  | standby2-db | standby |   running | standby1-db | default  | 100   | 2

4. Manual Switch Over

Now as you can see in the above cluster list, "standby1-db" is running as role primary, and the other two are running as standby. If you want to switch over to let "master-db" running as primary again, you have to do switch over repmgr operation as below

repmgr standby switchover — promote a standby to primary and demote the existing primary to a standby

This command promotes a standby to primary and demotes the existing primary to standby. This command must be run on standby to be promoted and requires a passwordless SSH connection to the current primary. If other nodes are connected to the demotion candidate, repmgr can instruct these to follow the new primary if the option --siblings-follow is specified.

/app/postgres/product/13.3/db_1/bin/repmgr standby switchover -f /app/repmgr/5.2/repmgr1.conf --siblings-follow --dry-run
NOTICE: checking switchover on node "master-db" (ID: 1) in --dry-run mode
INFO: SSH connection to host "localhost" succeeded
INFO: able to execute "repmgr" on remote host "localhost"
INFO: all sibling nodes are reachable via SSH
INFO: 2 walsenders required, 10 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: 0 pending archive files
INFO: replication lag on this standby is 0 seconds
NOTICE: local node "master-db" (ID: 1) would be promoted to primary; current primary "standby1-db" (ID: 2) would be demoted to standby
INFO: following shutdown command would be run on node "standby1-db":
  "/app/postgres/product/13.3/db_1/bin/pg_ctl -D /data02 -l logfile stop"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met

Let's do without --dry-run option

/app/postgres/product/13.3/db_1/bin/repmgr standby switchover -f /app/repmgr/5.2/repmgr1.conf --siblings-follow
NOTICE: executing switchover on node "master-db" (ID: 1)
NOTICE: local node "master-db" (ID: 1) will be promoted to primary; current primary "standby1-db" (ID: 2) will be demoted to standby
NOTICE: stopping current primary node "standby1-db" (ID: 2)
NOTICE: issuing CHECKPOINT on node "standby1-db" (ID: 2)
DETAIL: executing server command "/app/postgres/product/13.3/db_1/bin/pg_ctl -D /data02 -l logfile stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/7000028
NOTICE: promoting standby to primary
DETAIL: promoting server "master-db" (ID: 1) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "master-db" (ID: 1) was successfully promoted to primary
INFO: local node 2 can attach to rejoin target node 1
DETAIL: local node's recovery point: 0/7000028; rejoin target node's fork point: 0/70000A0
NOTICE: setting node 2's upstream to node 1
WARNING: unable to ping "host=localhost port=5433 user=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/app/postgres/product/13.3/db_1/bin/pg_ctl -D /data02 -l logfile start"
NOTICE: NODE REJOIN successful
DETAIL: node 2 is now attached to node 1
NOTICE: node  "master-db" (ID: 1) promoted to primary, node "standby1-db" (ID: 2) demoted to standby
NOTICE: executing STANDBY FOLLOW on 1 of 1 siblings
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes
NOTICE: switchover was successful
DETAIL: node "master-db" is now primary and node "standby1-db" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully

CURRENT STATUS OF CLUSTER

/app/postgres/product/13.3/db_1/bin/repmgr -f /app/repmgr/5.2/repmgr1.conf cluster show --compact
 ID | Name        | Role    | Status    | Upstream  | Location | Prio. | TLI
----+-------------+---------+-----------+-----------+----------+-------+-----
 1  | master-db   | primary | * running |           | default  | 100   | 3
 2  | standby1-db | standby |   running | master-db | default  | 100   | 2
 3  | standby2-db | standby |   running | master-db | default  | 100   | 2

Now as you can see in the list, the status of the cluster has been changed to the original state when it was set up. Hopefully, these commands of repmgr can provide you the ease of usage as well as benefit to manage and maintain your PostgreSQL database cluster. yes

 

THANK YOU!!!

Find Us @
Facebook
Telegram
Twitter
LinkedIn


About author

Author Profile

SETHA THAY

Software Engineer & Project Manager. I am willing to share IT knowledge, technical experiences and investment to financial freedom. Feel free to ask and contact me.



Scroll to Top