Lesson 15: Recovering failed nodes

Rejoin a failed primary to the cluster using the pg_rewind mechanism, and rebuild replicas from backup when necessary.


Objectives

After this lesson, you will be able to:

  • Rejoin an old primary after failover
  • Use pg_rewind to resynchronize data
  • Rebuild a replica with pg_basebackup
  • Xử lý timeline divergence
  • Recover từ split-brain scenarios
  • Automate recovery với Patroni

1. Node Recovery Overview

1.1. Recovery Scenarios

When does a node need to be recovered?

Scenario 1: Old primary sau failover

Before:
  node1 (primary) → FAILS
  node2 (replica) → promoted to primary

After:
  node1: Needs rejoin as replica
  node2: Current primary

Scenario 2: Replica disconnected

Before:
  node3 (replica) → Network partition / Crash

After:
  node3: Needs to catch up with primary

Scenario 3: Hardware replacement

Before:
  node2: Disk failure

After:
  node2: New disk, needs full rebuild

Scenario 4: Timeline divergence

Before:
  node1 accepted writes AFTER losing leader lock

After:
  node1: Diverged timeline, conflicts with cluster

1.2. Recovery Methods

Method            When to use                       Time        Data loss
Auto-rejoin       Node was cleanly shut down        ~10s        None
pg_rewind         Timeline divergence               ~1-5 min    None
pg_basebackup     Major corruption / full rebuild   ~30 min+    None
Manual recovery   Complex split-brain scenarios     Varies      Possible

2. Auto-Rejoin (Patroni Default)

2.1. How auto-rejoin works

When node comes back online:

1. Patroni starts
2. Checks DCS for cluster state
3. Finds current leader (e.g., node2)
4. Compares local timeline with cluster timeline
5. If compatible → auto-rejoin as replica
6. If diverged → need pg_rewind or reinit
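
The cluster state that Patroni reads from the DCS in steps 2-4 can also be inspected by hand. A minimal sketch using the etcd v3 CLI, assuming the /service/postgres scope used later in this lesson:

# List the keys Patroni keeps for this cluster (members, leader, config, ...)
ETCDCTL_API=3 etcdctl get --prefix /service/postgres/ --keys-only

# Show which member currently holds the leader key
ETCDCTL_API=3 etcdctl get /service/postgres/leader --print-value-only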

2.2. Example: Clean rejoin

Setup:

# Current cluster state
patronictl list postgres

# + Cluster: postgres ----+----+-----------+
# | Member | Host        | Role    | State   | TL | Lag in MB |
# +--------+-------------+---------+---------+----+-----------+
# | node1  | 10.0.1.11   | Leader  | running |  2 |           |
# | node2  | 10.0.1.12   | Replica | running |  2 |         0 |
# | node3  | 10.0.1.13   | Replica | running |  2 |         0 |
# +--------+-------------+---------+---------+----+-----------+

Simulate node3 failure:

# On node3: Stop Patroni cleanly
sudo systemctl stop patroni

# Cluster now:
# | node1  | 10.0.1.11   | Leader  | running |  2 |           |
# | node2  | 10.0.1.12   | Replica | running |  2 |         0 |
# | node3  | 10.0.1.13   | -       | stopped |  - |           | ← Down

Recovery:

# On node3: Start Patroni
sudo systemctl start patroni

# Watch logs
sudo journalctl -u patroni -f

Log output:

2024-11-25 10:00:00 INFO: Starting Patroni...
2024-11-25 10:00:01 INFO: Connected to DCS (etcd)
2024-11-25 10:00:02 INFO: Cluster timeline: 2, local timeline: 2 ✅
2024-11-25 10:00:03 INFO: Current leader: node1
2024-11-25 10:00:04 INFO: Rejoining as replica
2024-11-25 10:00:05 INFO: Starting PostgreSQL in recovery mode
2024-11-25 10:00:08 INFO: Replication started, streaming from node1
2024-11-25 10:00:10 INFO: Successfully rejoined cluster ✅

Verify:

patronictl list postgres

# + Cluster: postgres ----+----+-----------+
# | Member | Host        | Role    | State   | TL | Lag in MB |
# +--------+-------------+---------+---------+----+-----------+
# | node1  | 10.0.1.11   | Leader  | running |  2 |           |
# | node2  | 10.0.1.12   | Replica | running |  2 |         0 |
# | node3  | 10.0.1.13   | Replica | running |  2 |         0 | ← Rejoined!
# +--------+-------------+---------+---------+----+-----------+

Time: ~10 seconds ✅

2.3. Configuration for auto-rejoin

# In patroni.yml
postgresql:
  use_pg_rewind: true  # Enable automatic pg_rewind if needed
  remove_data_directory_on_rewind_failure: false  # Safety
  remove_data_directory_on_diverged_timelines: false  # Safety

# Patroni will attempt:
# 1. Auto-rejoin (if timelines match)
# 2. pg_rewind (if timeline diverged but recoverable)
# 3. Full reinit (if pg_rewind fails and auto-reinit enabled)

3. Using pg_rewind

3.1. What is pg_rewind?

pg_rewind = Tool to resync a PostgreSQL instance that diverged from the current timeline.

When needed:

Scenario: Old primary received writes AFTER failover

Timeline:
  T+0: node1 (primary), node2 (replica)
  T+1: Network partition
  T+2: node2 promoted (timeline: 1 → 2)
  T+3: node1 still thinks it's primary, accepts writes (timeline: 1)
  T+4: Network restored
  T+5: Conflict! node1 timeline=1, cluster timeline=2

Solution: pg_rewind node1 to match node2's timeline

How it works:

1. Find the last common checkpoint before the timelines diverged
2. Scan the target's WAL to identify blocks changed after that point
3. Copy those blocks (and any new or changed files) from the new primary
4. On startup, the node replays WAL and rejoins as a replica on the new timeline
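
Before touching anything, pg_rewind can report what it would do. A minimal sketch, assuming the hosts and data directory used throughout this lesson (run on the node to be rewound, with PostgreSQL stopped):

# --dry-run analyzes the divergence but does not modify the data directory
sudo -u postgres pg_rewind \
  --target-pgdata=/var/lib/postgresql/18/data \
  --source-server="host=10.0.1.12 port=5432 user=replicator dbname=postgres" \
  --dry-run --progress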

3.2. Prerequisites for pg_rewind

Requirements:

# In patroni.yml → postgresql.parameters
wal_log_hints: 'on'  # Required! (unless the cluster was initialized with data checksums)

# Or use data checksums (set during initdb):
# initdb --data-checksums

# Also ensure:
max_wal_senders: 10  # For replication
wal_level: replica   # For replication

Why wal_log_hints?

Without wal_log_hints:
  pg_rewind cannot determine which blocks changed
  → Cannot resync
  → Must use full rebuild (pg_basebackup)

With wal_log_hints:
  PostgreSQL tracks all block changes
  → pg_rewind can identify divergence
  → Fast resync ✅

Trade-off: ~1-2% write performance overhead
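
A quick way to confirm the prerequisite on a running node (either setting is sufficient for pg_rewind):

# Check pg_rewind prerequisites
sudo -u postgres psql -Atc "SHOW wal_log_hints;"    # should be 'on', or ...
sudo -u postgres psql -Atc "SHOW data_checksums;"   # ... this should be 'on'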

3.3. Manual pg_rewind

Scenario: node1 (old primary) needs resync after failover.

Step 1: Stop PostgreSQL on node1

# On node1
sudo systemctl stop patroni
sudo systemctl stop postgresql

Step 2: Run pg_rewind

# On node1: Rewind to match node2 (current primary)
sudo -u postgres pg_rewind \
  --target-pgdata=/var/lib/postgresql/18/data \
  --source-server="host=10.0.1.12 port=5432 user=replicator dbname=postgres" \
  --progress \
  --debug

# Output:
# connected to server
# servers diverged at WAL location 0/3000000 on timeline 1
# rewinding from last common checkpoint at 0/2000000 on timeline 1
# reading source file list
# reading target file list
# reading WAL in target
# need to copy 124 MB (total source directory size is 2048 MB)
# creating backup label and updating control file
# syncing target data directory
# Done!

Step 3: Create standby.signal

# On node1: Mark as standby
sudo -u postgres touch /var/lib/postgresql/18/data/standby.signal

Step 4: Update primary_conninfo

# On node1: Point to new primary (node2)
sudo -u postgres tee /var/lib/postgresql/18/data/postgresql.auto.conf <<EOF
primary_conninfo = 'host=10.0.1.12 port=5432 user=replicator password=replica_password'
EOF

Step 5: Start PostgreSQL

# On node1
sudo systemctl start patroni

# Patroni will start PostgreSQL in recovery mode

Step 6: Verify

patronictl list postgres

# node1 should now be a Replica following node2 ✅

Time: ~1-5 minutes (depends on divergence size)

3.4. Automatic pg_rewind (Patroni)

Enable in patroni.yml:

# Patroni will automatically run pg_rewind if needed
postgresql:
  use_pg_rewind: true
  
  parameters:
    wal_log_hints: 'on'  # Required!

Behavior:

When node rejoins after failover:
  1. Patroni detects timeline divergence
  2. Automatically runs pg_rewind
  3. Restarts PostgreSQL as replica
  4. Node rejoins cluster

No manual intervention needed! ✅

Example log:

2024-11-25 10:05:00 INFO: Local timeline 1, cluster timeline 2
2024-11-25 10:05:01 WARNING: Timeline divergence detected
2024-11-25 10:05:02 INFO: use_pg_rewind enabled, attempting rewind...
2024-11-25 10:05:03 INFO: Running pg_rewind...
2024-11-25 10:05:45 INFO: pg_rewind completed successfully
2024-11-25 10:05:46 INFO: Starting PostgreSQL as replica
2024-11-25 10:05:50 INFO: Rejoined cluster ✅

4. Full Rebuild with pg_basebackup

4.1. When to use pg_basebackup

Use cases:

  1. pg_rewind failed - Data too diverged
  2. Corruption detected - Data integrity issues
  3. After a major version upgrade - Rebuild on the new binaries (streaming replication cannot span major versions)
  4. New node - Adding fresh replica to cluster
  5. Disk replaced - Empty data directory
  6. Paranoid safety - Want guaranteed clean state

Trade-off: Slower (~30min-2hrs for large DB) but guaranteed clean.
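
Before starting a rebuild, it is worth checking that the node has room for a full copy. A rough sketch, assuming the data directory used in this lesson:

# On the current primary: approximate size of what will be copied
sudo -u postgres psql -Atc \
  "SELECT pg_size_pretty(sum(pg_database_size(datname))) FROM pg_database;"

# On the node to rebuild: free space on the data directory's filesystem
df -h /var/lib/postgresql/18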

4.2. Manual pg_basebackup

Step 1: Stop and clean node

# On node to rebuild (e.g., node3)
sudo systemctl stop patroni
sudo systemctl stop postgresql

# Remove old data directory
sudo rm -rf /var/lib/postgresql/18/data/*

Step 2: Take base backup from primary

# On node3: Backup from current primary (node2)
sudo -u postgres pg_basebackup \
  -h 10.0.1.12 \
  -p 5432 \
  -U replicator \
  -D /var/lib/postgresql/18/data \
  -Fp \
  -Xs \
  -P \
  -R

# Flags:
# -h: Host (primary)
# -U: Replication user
# -D: Target data directory
# -Fp: Plain format (not tar)
# -Xs: Stream WAL during backup
# -P: Show progress
# -R: Create standby.signal and replication config

Output:

Password: [enter replicator password]
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/4000000 on timeline 2
pg_basebackup: starting background WAL receiver
24567/24567 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/4000168
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed

Step 3: Verify configuration

# On node3: Check standby.signal created
ls /var/lib/postgresql/18/data/standby.signal

# Check primary_conninfo
cat /var/lib/postgresql/18/data/postgresql.auto.conf | grep primary_conninfo

Step 4: Start node

# On node3
sudo systemctl start patroni

# Node will rejoin as replica

Step 5: Verify

patronictl list postgres

# node3 should be streaming from primary ✅

Time: ~30min-2hrs (depends on database size)

4.3. Patroni automatic reinit

Enable auto-reinit:

# In patroni.yml
postgresql:
  use_pg_rewind: true
  
  # If pg_rewind fails, auto-reinit
  remove_data_directory_on_rewind_failure: true
  remove_data_directory_on_diverged_timelines: true

# WARNING: Data directory will be DELETED and recreated
# Only enable if you trust automation!

Behavior:

When node rejoins:
  1. Try auto-rejoin → FAILED (diverged)
  2. Try pg_rewind → FAILED (corruption)
  3. Automatically remove data directory
  4. Run pg_basebackup from current primary
  5. Rejoin as replica

Fully automated! But destructive! ⚠️
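
Before enabling these flags in production, it may be worth confirming where they are actually set. A quick check, assuming patroni.yml lives at /etc/patroni.yml on every node:

# Show the destructive options per node
for node in node1 node2 node3; do
  echo "=== $node ==="
  ssh $node "grep -E 'remove_data_directory_on_(rewind_failure|diverged_timelines)' /etc/patroni.yml"
done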

4.4. Patroni reinit command

Manual trigger:

# Force reinit on node3
patronictl reinit postgres node3

# Patroni will:
# 1. Stop PostgreSQL on node3
# 2. Remove data directory
# 3. Run pg_basebackup from leader
# 4. Start as replica

# Prompt:
# Are you sure you want to reinitialize members node3? [y/N]: y

Monitor progress:

# On node3: Watch logs
sudo journalctl -u patroni -f

# Expected:
# INFO: Removing data directory...
# INFO: Running pg_basebackup...
# INFO: Backup completed (24 GB in 15 minutes)
# INFO: Starting PostgreSQL...
# INFO: Rejoined cluster ✅

5. Timeline Divergence Resolution

5.1. Understanding timelines

Timeline = History branch counter

Initial:
  Timeline 1 (all nodes)

After first failover:
  Old primary: Timeline 1
  New primary: Timeline 2 ← Incremented

After second failover:
  Timeline 3 ← Incremented again

Why timelines exist:

Prevent data conflict:
  If two nodes both think they're primary,
  they write on different timelines.
  → Conflict detected
  → Manual intervention required
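
The branch history is recorded in timeline history files inside pg_wal. A small illustration, assuming the node is currently on timeline 3 (the contents shown are hypothetical):

# Each line: parent timeline, WAL position of the switch, reason
sudo -u postgres cat /var/lib/postgresql/18/data/pg_wal/00000003.history

# Example:
# 1   0/3000000   no recovery target specified
# 2   0/5000000   no recovery target specified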

5.2. Detecting timeline divergence

Check local timeline:

# On any node
sudo -u postgres psql -c "
  SELECT timeline_id 
  FROM pg_control_checkpoint();
"

# Example:
# timeline_id
# ------------
#           2

Check cluster timeline:

# Via Patroni: the TL column in the member list shows each node's timeline
patronictl list postgres

# (The long number next to the cluster name is the PostgreSQL system identifier,
#  not the timeline.)

# Or via REST API
curl -s http://10.0.1.12:8008/patroni | jq '.timeline'
# Output: 2

Compare:

# If node timeline ≠ cluster timeline
# → Node needs pg_rewind or reinit
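
A small sketch that automates this comparison, combining pg_control_checkpoint() on the local node with the leader's REST API (leader address 10.0.1.12 assumed, as in this lesson):

# Compare the local timeline with the cluster timeline reported by the leader
LOCAL_TL=$(sudo -u postgres psql -Atc "SELECT timeline_id FROM pg_control_checkpoint();")
CLUSTER_TL=$(curl -s http://10.0.1.12:8008/patroni | jq -r '.timeline')

if [ "$LOCAL_TL" = "$CLUSTER_TL" ]; then
  echo "Timelines match ($LOCAL_TL) - node can rejoin normally"
else
  echo "Divergence: local=$LOCAL_TL cluster=$CLUSTER_TL - pg_rewind or reinit needed"
fi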

5.3. Scenario: Timeline divergence after split-brain

Setup:

T+0: 3-node cluster, node1 = primary (timeline 2)
T+1: Network partition splits node1 from node2/node3
T+2: node1 thinks it's still primary (timeline 2)
T+3: node2/node3 elect node2 as primary (timeline 3)
T+4: Both node1 and node2 accept writes!
  - node1: timeline 2, accepting writes ❌
  - node2: timeline 3, accepting writes ✅
  - Split-brain! ⚠️
T+5: Network restored
T+6: Conflict detected

Resolution:

# Step 1: Verify which timeline is "correct"
patronictl list postgres

# + Cluster: postgres ----+----+-----------+
# | Member | Host        | Role    | State   | TL | Lag in MB |
# +--------+-------------+---------+---------+----+-----------+
# | node1  | 10.0.1.11   | -       | stopped |  2 |           | ← WRONG timeline
# | node2  | 10.0.1.12   | Leader  | running |  3 |           | ← CORRECT
# | node3  | 10.0.1.13   | Replica | running |  3 |         0 |
# +--------+-------------+---------+---------+----+-----------+

# Step 2: Save diverged data from node1 (if needed)
# (node1's PostgreSQL must be running for this; start it outside Patroni if it is stopped)
sudo -u postgres pg_dumpall -h 10.0.1.11 > /backup/node1-diverged-data.sql

# Step 3: Rewind node1 to match timeline 3
# If pg_rewind works:
patronictl reinit postgres node1

# If pg_rewind fails (likely due to significant divergence):
# Manual pg_basebackup required
sudo systemctl stop patroni  # On node1
sudo rm -rf /var/lib/postgresql/18/data/*
sudo -u postgres pg_basebackup -h 10.0.1.12 -D /var/lib/postgresql/18/data -U replicator -R -P
sudo systemctl start patroni

# Step 4: Manually reconcile diverged data (if important)
# Review /backup/node1-diverged-data.sql
# Manually merge important transactions into node2

Prevention:

# Configure Patroni to narrow the split-brain window
bootstrap:
  dcs:
    # A primary that loses the leader lock demotes itself once the TTL expires
    ttl: 30
    retry_timeout: 10

    # Let Patroni manage synchronous replication so an isolated primary
    # cannot acknowledge commits on its own
    synchronous_mode: true

    postgresql:
      parameters:
        synchronous_commit: 'remote_apply'

6. Split-Brain Prevention and Recovery

6.1. How Patroni prevents split-brain

Mechanism: DCS Leader Lock

Primary MUST hold leader lock in DCS:

If primary loses DCS connection:
  1. Cannot renew leader lock
  2. TTL expires (e.g., 30 seconds)
  3. Primary DEMOTES itself (becomes read-only)
  4. Replicas detect no leader
  5. Election begins

Key: Primary NEVER operates without DCS lock ✅

Code flow (pseudo):

while True:
    if is_leader:
        if can_renew_leader_lock():
            # Still leader, continue
            accept_writes()
        else:
            # Lost DCS connection!
            log.error("Lost leader lock, DEMOTING!")
            demote_to_replica()
            reject_writes()
    
    sleep(loop_wait)
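
From the outside, the same leader lock can be observed through Patroni's health-check endpoints: a node answering /leader with HTTP 200 currently holds the lock. A small sketch, using the node addresses from this lesson:

# 200 = this node holds the leader lock, 503 = it does not
for host in 10.0.1.11 10.0.1.12 10.0.1.13; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "http://$host:8008/leader")
  echo "$host -> HTTP $code"
done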

6.2. Fencing mechanisms

PostgreSQL-level fencing:

-- When demoted, set read-only
ALTER SYSTEM SET default_transaction_read_only = 'on';
SELECT pg_reload_conf();

-- All new transactions will fail:
-- ERROR: cannot execute INSERT in a read-only transaction

OS-level fencing (advanced):

# STONITH (Shoot The Other Node In The Head)
# Via callbacks in patroni.yml

callbacks:
  on_start: /var/lib/postgresql/callbacks/on_start.sh
  on_stop: /var/lib/postgresql/callbacks/on_stop.sh
  on_role_change: /var/lib/postgresql/callbacks/on_role_change.sh

# on_role_change.sh example:
#!/bin/bash
# Patroni invokes callbacks as: <script> <action> <role> <cluster-name>
ACTION=$1   # here: "on_role_change"
ROLE=$2     # "master"/"primary" or "replica"
CLUSTER=$3

if [ "$ROLE" == "replica" ]; then
  # Lost leadership: ensure no client writes are possible
  # by blocking incoming connections to PostgreSQL
  sudo iptables -A INPUT -p tcp --dport 5432 -j REJECT
fi

if [ "$ROLE" == "master" ] || [ "$ROLE" == "primary" ]; then
  # Gained leadership: allow connections again
  sudo iptables -D INPUT -p tcp --dport 5432 -j REJECT
fi

6.3. Scenario: Recover from split-brain

Detection:

# Symptoms:
# - Multiple nodes claim to be primary
# - Patroni shows errors
# - Applications seeing inconsistent data

# Check cluster state
patronictl list postgres

# If you see multiple "Leader" or conflicts:
# SPLIT-BRAIN DETECTED! ⚠️

Recovery steps:

# Step 1: STOP ALL NODES immediately
for node in node1 node2 node3; do
  ssh $node "sudo systemctl stop patroni"
done

# Step 2: Determine the "source of truth"
# Usually: the node with the most recent data / highest timeline
# (PostgreSQL is stopped everywhere, so read the control file instead of querying;
#  pg_controldata may need its full path from the PostgreSQL bin directory)
for node in node1 node2 node3; do
  echo "=== $node ==="
  ssh $node "sudo -u postgres pg_controldata /var/lib/postgresql/18/data | grep -E 'TimeLineID|checkpoint location'"
done

# Step 3: Choose winner (e.g., node2 has highest timeline)
WINNER="node2"

# Step 4: Backup diverged data from the losers
# (pg_dumpall needs a running server: start PostgreSQL temporarily outside
#  Patroni on each loser, dump, then stop it again)
ssh node1 "sudo -u postgres pg_dumpall > /backup/node1-diverged.sql"
ssh node3 "sudo -u postgres pg_dumpall > /backup/node3-diverged.sql"

# Step 5: Wipe losers and rebuild from winner
for node in node1 node3; do
  ssh $node "sudo rm -rf /var/lib/postgresql/18/data/*"
  ssh $node "sudo -u postgres pg_basebackup \
    -h $WINNER \
    -D /var/lib/postgresql/18/data \
    -U replicator -R -P"
done

# Step 6: Clear DCS state (fresh start)
etcdctl del --prefix /service/postgres/

# Step 7: Start winner first
ssh $WINNER "sudo systemctl start patroni"

# Wait for winner to become leader
sleep 10

# Step 8: Start other nodes
ssh node1 "sudo systemctl start patroni"
ssh node3 "sudo systemctl start patroni"

# Step 9: Verify cluster
patronictl list postgres

# Should show:
# node2: Leader
# node1: Replica (following node2)
# node3: Replica (following node2)
# All same timeline ✅

# Step 10: Reconcile diverged data manually
# Review /backup/*-diverged.sql files
# Merge critical transactions if needed

7. Monitoring Node Recovery

7.1. Key metrics

-- Replication status
SELECT application_name, 
       state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes,
       replay_lag,
       sync_state
FROM pg_stat_replication;

-- Timeline check
SELECT timeline_id FROM pg_control_checkpoint();

-- Recovery status (on replica)
SELECT pg_is_in_recovery(),
       pg_last_wal_receive_lsn(),
       pg_last_wal_replay_lsn(),
       pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS replay_lag_bytes;

7.2. Patroni REST API monitoring

# Check node status
curl -s http://10.0.1.11:8008/patroni | jq

# Key fields:
# {
#   "state": "running",
#   "role": "replica",
#   "timeline": 3,
#   "replication": [
#     {
#       "usename": "replicator",
#       "application_name": "node1",
#       "state": "streaming",
#       "sync_state": "async",
#       "replay_lsn": "0/5000000"
#     }
#   ]
# }

7.3. Alerting on recovery issues

# Prometheus alert
groups:
  - name: node_recovery
    rules:
      - alert: PatroniNodeDown
        expr: up{job="patroni"} == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Patroni node {{ $labels.instance }} is down"
      
      - alert: PatroniTimelineMismatch
        expr: |
          # Fires when members of one cluster report more than one distinct timeline
          count by (cluster) (
            count_values by (cluster) ("timeline", patroni_timeline)
          ) > 1
        labels:
          severity: critical
        annotations:
          summary: "Timeline mismatch detected - possible split-brain"
      
      - alert: PatroniReplicationLagHigh
        expr: patroni_replication_lag_bytes > 104857600  # 100MB
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Replication lag > 100MB on {{ $labels.instance }}"

8. Best Practices

✅ DO

  1. Enable wal_log_hints - Required for pg_rewind
  2. Test recovery regularly - Monthly drills
  3. Monitor timelines - Alert on divergence
  4. Have backups - Before risky operations
  5. Document procedures - Recovery runbooks
  6. Use Patroni auto-recovery - Less manual intervention
  7. Verify after recovery - Test replication, queries
  8. Keep DCS healthy - etcd cluster critical
  9. Log everything - Audit trail for incidents
  10. Practice split-brain recovery - Hope never needed, but be ready

❌ DON'T

  1. Don't skip wal_log_hints - pg_rewind will fail
  2. Don't assume auto-recovery works - Test it!
  3. Don't ignore timeline mismatches - Critical issue
  4. Don't manually promote during recovery - Let Patroni handle
  5. Don't delete data without backup - Diverged data may be important
  6. Don't run split-brain clusters - Fix immediately
  7. Don't forget callbacks - Fencing prevents split-brain
  8. Don't over-automate reinit - Risk data loss

9. Lab Exercises

Lab 1: Auto-rejoin after clean shutdown

Tasks:

  1. Stop one replica: sudo systemctl stop patroni
  2. Make changes on primary
  3. Start replica: sudo systemctl start patroni
  4. Verify auto-rejoin and lag catch-up
  5. Time the recovery

Lab 2: pg_rewind after simulated failover

Tasks:

  1. Record current primary
  2. Manually stop primary: sudo systemctl stop patroni
  3. Wait for failover to complete
  4. Start old primary (should auto-rewind)
  5. Verify old primary rejoined as replica
  6. Check timeline increment

Lab 3: Full rebuild with pg_basebackup

Tasks:

  1. Stop a replica
  2. Delete data directory: sudo rm -rf /var/lib/postgresql/18/data/*
  3. Manually run pg_basebackup from primary
  4. Start replica
  5. Verify replication restored
  6. Measure rebuild time

Lab 4: Patroni reinit command

Tasks:

  1. Use patronictl reinit postgres node3
  2. Monitor logs during process
  3. Verify automated rebuild
  4. Compare time vs manual pg_basebackup

Lab 5: Timeline divergence simulation

Tasks:

  1. Create a network partition with iptables (see the sketch after this list)
  2. Wait for failover
  3. Manually promote old primary (force split-brain)
  4. Write different data to both "primaries"
  5. Restore network
  6. Observe conflict detection
  7. Practice recovery procedure
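
A minimal sketch for task 1, run on the node to isolate (the current primary). It assumes the other members, and the etcd instances they host, are reachable at 10.0.1.12 and 10.0.1.13 as in this lesson:

# Isolate this node from the rest of the cluster
for peer in 10.0.1.12 10.0.1.13; do
  sudo iptables -A OUTPUT -d $peer -j DROP
  sudo iptables -A INPUT  -s $peer -j DROP
done

# ... observe failover, then restore connectivity
for peer in 10.0.1.12 10.0.1.13; do
  sudo iptables -D OUTPUT -d $peer -j DROP
  sudo iptables -D INPUT  -s $peer -j DROP
done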

10. Troubleshooting

Issue: pg_rewind fails

Error: pg_rewind: fatal: could not find common ancestor

Cause: wal_log_hints not enabled or data too diverged.

Solution:

# Check wal_log_hints
sudo -u postgres psql -c "SHOW wal_log_hints;"

# If off, enable:
sudo -u postgres psql -c "ALTER SYSTEM SET wal_log_hints = on;"
sudo systemctl restart postgresql

# If still fails, use pg_basebackup instead
patronictl reinit postgres node1

Issue: Replica stuck in recovery

Symptoms: Replica shows "running" but high lag.

Diagnosis:

# Check replication status
sudo -u postgres psql -h 10.0.1.11 -c "
  SELECT * FROM pg_stat_replication;
"

# Check replica logs
sudo journalctl -u postgresql -n 100

Common causes:

  • WAL receiver crashed
  • Network issues
  • Disk full on replica
  • Archive restore errors

Solution:

# Restart replication
sudo systemctl restart patroni

# If persists, reinit
patronictl reinit postgres node3

Issue: Cannot connect after recovery

Error: FATAL: the database system is starting up

Cause: PostgreSQL still replaying WAL.

Solution: Wait for recovery to complete, or check logs for errors.

# Check recovery progress
sudo -u postgres psql -h 10.0.1.13 -c "
  SELECT pg_is_in_recovery(),
         pg_last_wal_receive_lsn(),
         pg_last_wal_replay_lsn();
"

11. Summary

Recovery Methods Summary

Method            Speed     Data loss   Use case
Auto-rejoin       Fastest   None        Clean shutdown/restart
pg_rewind         Fast      None        Timeline divergence
pg_basebackup     Slow      None        Corruption, major divergence
Manual recovery   Varies    Possible    Split-brain, complex issues

Key Concepts

✅ Auto-rejoin - Patroni handles clean recovery automatically

✅ pg_rewind - Resync after timeline divergence (requires wal_log_hints)

✅ pg_basebackup - Full rebuild from primary (slow but safe)

✅ Timeline - History branch, increments on failover

✅ Split-brain - Multiple primaries (prevented by DCS leader lock)

Recovery Checklist

  •  Node failure detected
  •  Determine recovery method needed
  •  Backup diverged data (if any)
  •  Execute recovery (auto or manual)
  •  Verify timeline matches cluster
  •  Verify replication streaming
  •  Test read/write operations
  •  Check replication lag
  •  Update monitoring/documentation
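
A minimal verification sketch covering the timeline, streaming, and read/write items above. Host addresses follow this lesson, and recovery_check is a throwaway helper table:

# On the recovered node: in recovery, and on the cluster timeline?
sudo -u postgres psql -Atc "SELECT pg_is_in_recovery();"
sudo -u postgres psql -Atc "SELECT timeline_id FROM pg_control_checkpoint();"

# On the current primary: is the node streaming, and how far behind is it?
sudo -u postgres psql -c "
  SELECT application_name, state, sync_state,
         pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
  FROM pg_stat_replication;"

# Smoke test: write on the primary (10.0.1.12), read on the recovered replica (10.0.1.13)
sudo -u postgres psql -h 10.0.1.12 -c "CREATE TABLE IF NOT EXISTS recovery_check(ts timestamptz);
  INSERT INTO recovery_check VALUES (now());"
sudo -u postgres psql -h 10.0.1.13 -c "SELECT max(ts) FROM recovery_check;"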

Next Steps

Lesson 16 will cover Backup and Point-in-Time Recovery:

  • pg_basebackup strategies
  • WAL archiving configuration
  • Point-in-Time Recovery (PITR) procedures
  • Backup automation and scheduling
  • Disaster recovery planning