Lesson 14: Planned Switchover
Distinguishing planned switchover from failover, when a switchover is needed, zero-downtime maintenance, and performing a switchover safely.
Objectives
After this lesson, you will be able to:
- Distinguish switchover from failover
- Perform a planned switchover safely
- Understand graceful vs immediate switchover
- Minimize downtime during maintenance
- Automate switchover for rolling updates
- Handle switchover in production
1. Switchover Overview
1.1. What is a switchover?
Switchover = a planned, controlled promotion of a replica to primary.
So sánh với Failover:
| Aspect | Failover | Switchover |
|---|---|---|
| Trigger | Primary failure (unplanned) | Manual/scheduled (planned) |
| Downtime | 30-60 seconds | 0-10 seconds |
| Data loss | Possible (if async) | Zero (controlled) |
| Control | Automatic | Manual/scripted |
| Timing | Unpredictable | Scheduled |
1.2. When do you need a switchover?
Common scenarios:
A. Hardware maintenance
Scenario: Need to replace failing disk on primary server
→ Switchover to replica
→ Perform maintenance on old primary
→ Keep as replica or switchover back
B. Software upgrades
Scenario: OS kernel update requires reboot
→ Switchover to replica
→ Update & reboot old primary
→ Verify, then switchover back (optional)
C. Database migration
Scenario: Move database to larger server
→ Add new server as replica
→ Switchover to new server
→ Remove old server
D. Datacenter migration
Scenario: Move from DC1 to DC2
→ Setup replicas in DC2
→ Switchover primary to DC2
→ Decommission DC1 nodes
E. Testing
Scenario: Test HA readiness before production
→ Perform switchover in staging
→ Validate application behavior
→ Measure downtime
1.3. Switchover Benefits
✅ Zero data loss - All transactions committed before switch
✅ Controlled timing - During maintenance window
✅ Lower risk - Coordinated, tested process
✅ Minimal downtime - 0-10 seconds vs 30-60 for failover
✅ Reversible - Can switchover back if issues
2. Types of Switchover
2.1. Graceful Switchover (Default)
Process:
1. Verify cluster healthy
2. Wait for replication lag = 0
3. Stop new connections to old primary
4. Wait for active transactions to complete
5. Promote new primary
6. Reconfigure old primary as replica
Downtime: ~5-10 seconds ✅
Data loss: None ✅
Command:
patronictl switchover postgres
2.2. Immediate Switchover
Process:
1. Terminate active connections on the old primary (fast shutdown)
2. Demote old primary (force if needed)
3. Immediately promote the replica
Downtime: ~2-5 seconds ✅
Data loss: None for committed data; in-flight (uncommitted) transactions are rolled back ⚠️
Command:
patronictl switchover postgres --force
(--force itself only skips patronictl's confirmation prompt; the abrupt connection termination comes from the fast shutdown used during demotion.)
2.3. Scheduled Switchover
Process:
1. Schedule switchover at specific time
2. Patroni waits until scheduled time
3. Performs graceful switchover automatically
Downtime: ~5-10 seconds ✅
Automation: Full ✅
Command:
patronictl switchover postgres --scheduled 2024-11-25T02:00:00
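Rather than hardcoding the timestamp, you can compute it at run time. A minimal sketch, assuming GNU date (the %:z format emits the UTC offset that Patroni's ISO 8601 parser accepts):
# Schedule a switchover 30 minutes from now
WHEN=$(date -d '+30 minutes' +%Y-%m-%dT%H:%M:%S%:z)
patronictl switchover postgres \
  --master node1 \
  --candidate node2 \
  --scheduled "$WHEN" \
  --force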
3. Switchover Prerequisites
3.1. Cluster health check
# 1. Verify all nodes running
patronictl list postgres
# Expected:
# + Cluster: postgres (7001234567890123456) ----+----+-----------+
# | Member | Host | Role | State | TL | Lag in MB |
# +--------+---------------+---------+---------+----+-----------+
# | node1 | 10.0.1.11:5432| Leader | running | 2 | |
# | node2 | 10.0.1.12:5432| Replica | running | 2 | 0 | ✅
# | node3 | 10.0.1.13:5432| Replica | running | 2 | 0 | ✅
# +--------+---------------+---------+---------+----+-----------+
# All nodes must be:
# - State: running ✅
# - Lag: 0 or very low ✅
# - Same timeline ✅
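To script the same-timeline check instead of eyeballing the table, a small sketch against the REST API; it assumes the /patroni status endpoint exposes a top-level timeline field and that jq is installed:
# Collect the timeline reported by each node and assert they match
timelines=$(for h in 10.0.1.11 10.0.1.12 10.0.1.13; do
  curl -s "http://$h:8008/patroni" | jq -r '.timeline'
done | sort -u)
if [ "$(echo "$timelines" | wc -l)" -eq 1 ]; then
  echo "✅ All nodes on timeline $timelines"
else
  echo "❌ Timeline mismatch: $timelines"
fi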
3.2. Replication lag check
# Check lag on all replicas
sudo -u postgres psql -h 10.0.1.11 -c "
SELECT application_name,
client_addr,
state,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes,
replay_lag
FROM pg_stat_replication
ORDER BY lag_bytes DESC;
"
# Desired:
# application_name | client_addr | state | lag_bytes | replay_lag
# -----------------+-------------+-----------+-----------+------------
# node2 | 10.0.1.12 | streaming | 0 | 00:00:00 ✅
# node3 | 10.0.1.13 | streaming | 0 | 00:00:00 ✅
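If lag is not yet zero, you can poll until it drains instead of re-running the query by hand. A sketch, reusing the primary address from above:
# Block until every replica has replayed all WAL (Ctrl+C to abort)
while :; do
  lag=$(sudo -u postgres psql -h 10.0.1.11 -At -c \
    "SELECT COALESCE(MAX(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)), 0) FROM pg_stat_replication;")
  [ "$lag" = "0" ] && { echo "✅ Lag is zero"; break; }
  echo "Waiting, max lag: $lag bytes"
  sleep 2
done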
3.3. Target candidate check
# Check whether the target carries the nofailover tag. Tags are set
# per node in patroni.yml, not in the dynamic configuration, so check
# the candidate's local config (adjust the path to your installation):
ssh node2 'grep -A5 "^tags:" /etc/patroni/patroni.yml'
# Target node should have:
tags:
  nofailover: false        # ✅ Can be promoted
  failover_priority: 100   # Higher = preferred (Patroni 3.0+)
# NOT:
tags:
  nofailover: true   # ❌ Cannot be promoted
3.4. Connection availability
# Test connection to target
psql -h 10.0.1.12 -U postgres -c "SELECT 1;"
# Test application user
psql -h 10.0.1.12 -U app_user -d myapp -c "SELECT 1;"
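The same check, looped over every member so that a node with a broken pg_hba.conf or firewall rule surfaces before the switchover rather than after (hosts and credentials reuse this lesson's layout):
# Verify client connectivity to every member before proceeding
for h in 10.0.1.11 10.0.1.12 10.0.1.13; do
  if psql -h "$h" -U app_user -d myapp -Atc "SELECT 1;" >/dev/null 2>&1; then
    echo "✅ $h reachable"
  else
    echo "❌ $h unreachable"
  fi
done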
4. Performing Switchover
4.1. Interactive Switchover (Recommended)
Step-by-step:
# 1. Initiate switchover
patronictl switchover postgres
# Patroni prompts:
Master [node1]: ← Current primary (press Enter to accept)
Candidate ['node2', 'node3'] []: ← Type target, e.g., "node2"
When should the switchover take place (e.g. 2024-11-25T10:00 ) [now]: ← Press Enter for immediate
Are you sure you want to switchover cluster postgres, demoting current master node1? [y/N]: y
Output:
2024-11-25 10:30:00.123 UTC [INFO]: Switching over from node1 to node2
2024-11-25 10:30:02.456 UTC [INFO]: Waiting for replica node2 to catch up...
2024-11-25 10:30:02.789 UTC [INFO]: Replica node2 lag: 0 bytes ✅
2024-11-25 10:30:03.012 UTC [INFO]: Promoting node2...
2024-11-25 10:30:05.234 UTC [INFO]: node2 promoted successfully
2024-11-25 10:30:06.567 UTC [INFO]: Demoting node1...
2024-11-25 10:30:08.890 UTC [INFO]: node1 reconfigured as replica
2024-11-25 10:30:10.123 UTC [INFO]: Switchover completed ✅
Total time: 10 seconds
4.2. Non-interactive Switchover
Direct command:
# Specify master and candidate explicitly
patronictl switchover postgres \
--master node1 \
--candidate node2 \
--force
# --force: Skip confirmation prompt
# Note: newer Patroni releases deprecate --master in favor of --leader
4.3. Scheduled Switchover
Schedule for maintenance window:
# Schedule switchover at 2 AM
patronictl switchover postgres \
--master node1 \
--candidate node2 \
--scheduled "2024-11-25T02:00:00"
# Patroni will automatically execute at scheduled time
Verify scheduled switchover:
# Check pending actions
curl -s http://10.0.1.11:8008/patroni | jq '.scheduled_switchover'
# Output:
# {
# "at": "2024-11-25T02:00:00+00:00",
# "from": "node1",
# "to": "node2"
# }
Cancel scheduled switchover:
# If plans change
patronictl flush postgres switchover
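The REST API offers an equivalent: Patroni documents DELETE /switchover for removing a scheduled switchover.
# Cancel via the REST API (send to the leader)
curl -s -X DELETE http://10.0.1.11:8008/switchover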
4.4. Switchover with REST API
Trigger via API:
# POST to current leader
curl -X POST http://10.0.1.11:8008/switchover \
-H "Content-Type: application/json" \
-d '{
"leader": "node1",
"candidate": "node2"
}'
# Response:
# {
# "status": "ok",
# "message": "Switchover scheduled"
# }
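The same endpoint accepts a scheduled_at field (per the Patroni REST API docs), so scheduling also works over HTTP; the timestamp below is only an example:
curl -X POST http://10.0.1.11:8008/switchover \
  -H "Content-Type: application/json" \
  -d '{
    "leader": "node1",
    "candidate": "node2",
    "scheduled_at": "2024-11-25T02:00:00+00:00"
  }'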
5. Switchover Timeline
5.1. Detailed flow
T+0s: INITIATE SWITCHOVER
Command: patronictl switchover postgres --master node1 --candidate node2
T+0.5s: PRE-CHECKS
✓ node1 is current leader
✓ node2 is healthy replica
✓ node2 replication lag: 0 bytes
✓ node2 timeline matches: 2
T+1s: PREPARE OLD PRIMARY (node1)
- Issue CHECKPOINT to shorten the upcoming shutdown
T+2s: WAIT FOR LAG = 0
- Monitor: pg_stat_replication.replay_lag
- node2 lag: 0 bytes ✅
- All WAL replayed
T+3s: DEMOTE OLD PRIMARY (node1)
- PostgreSQL stopped with a fast shutdown: new connections refused,
  in-flight transactions rolled back, all committed WAL flushed
  and streamed to the replicas (shutdown checkpoint record)
- Leader lock released in the DCS
T+5s: PROMOTE NEW PRIMARY (node2)
- Confirm node2 has received the old primary's final WAL position
- Acquire leader lock in DCS
- Promote PostgreSQL
- Timeline: 2 → 3
- Run callback: on_role_change
T+7s: VERIFY NEW PRIMARY
- pg_is_in_recovery() → false ✅
- Accepting connections
- Timeline = 3
T+8s: RECONFIGURE OLD PRIMARY (node1)
- Update primary_conninfo → node2:5432
- Create standby.signal
- Start PostgreSQL in recovery mode
- Follows timeline 3
T+10s: REPLICATION RESTORED
- node1 now streaming from node2
- node3 updated to stream from node2
- All replicas timeline = 3
T+10s: SWITCHOVER COMPLETE ✅
Primary: node2 (was replica)
Replica: node1 (was primary)
Replica: node3
Total downtime: ~5-10 seconds
Data loss: None ✅
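To observe this downtime from a client's point of view, run a probe loop during the switchover. A sketch; point it at the address your application actually uses (a VIP or pooler if you have one, otherwise the old primary as here):
# Probe once per second; RW = accepts writes, RO = replica, DOWN = no connection
while :; do
  state=$(psql -h 10.0.1.11 -U postgres -Atc \
    "SELECT CASE WHEN pg_is_in_recovery() THEN 'RO' ELSE 'RW' END;" 2>/dev/null || echo "DOWN")
  echo "$(date '+%H:%M:%S') $state"
  sleep 1
done
Against the old primary you should see RW flip briefly to DOWN, then settle at RO once it rejoins as a replica.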
5.2. What happens to active connections?
During switchover:
Client connections to the old primary (node1):
In both modes, demotion stops PostgreSQL with a fast shutdown:
- New connections: REFUSED once shutdown begins
- Active queries: CANCELED; in-flight transactions ROLLED BACK
- Committed data: flushed and streamed to the replicas before promotion ✅
To let active work drain first, quiesce the application before initiating the switchover (for example, PAUSE the pool in PgBouncer, or wait until pg_stat_activity shows no active writes).
Application behavior:
# Well-written application with retry logic
import time
import psycopg2

def execute_query():
    retries = 3
    for i in range(retries):
        try:
            conn = psycopg2.connect("host=10.0.1.11 ...")
            try:
                cursor = conn.cursor()
                cursor.execute("SELECT * FROM users;")
                return cursor.fetchall()
            finally:
                conn.close()
        except psycopg2.OperationalError:
            if i < retries - 1:
                time.sleep(1)  # Wait and retry
                continue
            raise
6. Verification After Switchover
6.1. Cluster status
patronictl list postgres
# Expected:
# + Cluster: postgres (7001234567890123456) ----+----+-----------+
# | Member | Host | Role | State | TL | Lag in MB |
# +--------+---------------+---------+---------+----+-----------+
# | node1 | 10.0.1.11:5432| Replica | running | 3 | 0 | ← Was Leader
# | node2 | 10.0.1.12:5432| Leader | running | 3 | | ← Was Replica
# | node3 | 10.0.1.13:5432| Replica | running | 3 | 0 |
# +--------+---------------+---------+---------+----+-----------+
# Check:
# ✅ node2 is now Leader
# ✅ Timeline changed: 2 → 3
# ✅ All nodes running
# ✅ Replication lag = 0
6.2. Replication status
# On new primary (node2)
sudo -u postgres psql -h 10.0.1.12 -c "
SELECT application_name, client_addr, state, sync_state
FROM pg_stat_replication;
"
# Expected:
# application_name | client_addr | state | sync_state
# -----------------+-------------+-----------+------------
# node1 | 10.0.1.11 | streaming | async
# node3 | 10.0.1.13 | streaming | async
# Both replicas should be streaming from node2 ✅
6.3. Write test
# Insert on new primary
sudo -u postgres psql -h 10.0.1.12 -d testdb -c "
INSERT INTO test_table (data, created_at)
VALUES ('After switchover', NOW())
RETURNING *;
"
# Verify on replicas
sudo -u postgres psql -h 10.0.1.11 -d testdb -c "
SELECT * FROM test_table ORDER BY created_at DESC LIMIT 1;
"
sudo -u postgres psql -h 10.0.1.13 -d testdb -c "
SELECT * FROM test_table ORDER BY created_at DESC LIMIT 1;
"
# Should see the new row on both replicas ✅
6.4. Timeline verification
# Check timeline on all nodes
for node in 10.0.1.11 10.0.1.12 10.0.1.13; do
  echo "=== $node ==="
  sudo -u postgres psql -h $node -c "
  SELECT timeline_id, pg_is_in_recovery() AS is_replica
  FROM pg_control_checkpoint();
  "
done
# All nodes should report timeline_id = 3;
# is_replica is f on the primary and t on the replicas:
# timeline_id | is_replica
# ------------+------------
#           3 | f   ← primary
#           3 | t   ← replicas
7. Switchover Best Practices
7.1. Pre-switchover checklist
#!/bin/bash
# pre-switchover-check.sh
echo "=== Pre-Switchover Checks ==="
# 1. Cluster health
echo "1. Checking cluster health..."
# Every member must be healthy (key names per `patronictl list -f json`)
not_ok=$(patronictl list postgres -f json | jq '[.[] | select(.State != "running" and .State != "streaming")] | length')
[ "$not_ok" -eq 0 ] || { echo "❌ $not_ok node(s) not healthy"; exit 1; }
echo "✅ All nodes running"
# 2. Replication lag
echo "2. Checking replication lag..."
lag=$(sudo -u postgres psql -h 10.0.1.11 -At -c "
SELECT COALESCE(MAX(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)), 0)
FROM pg_stat_replication;
" | tr -d '[:space:]')
if [ "$lag" -gt 1048576 ]; then # 1 MB
  echo "❌ Lag too high: $lag bytes"
  exit 1
fi
echo "✅ Lag acceptable: $lag bytes"
# 3. Target candidate available
echo "3. Checking target candidate..."
patronictl list postgres | grep node2 | grep -qE "running|streaming" || { echo "❌ node2 not available"; exit 1; }
echo "✅ Target candidate available"
# 4. No scheduled maintenance
echo "4. Checking scheduled actions..."
curl -s http://10.0.1.11:8008/patroni | jq -e '.scheduled_switchover == null' > /dev/null || {
  echo "⚠️ Another switchover already scheduled"
}
echo ""
echo "✅ All pre-checks passed. Safe to proceed."
7.2. Minimize downtime strategies
A. Connection pooler
Use PgBouncer/HAProxy between app and database:
App → PgBouncer → Primary
↓
Replicas
During switchover:
1. PgBouncer detects primary change
2. Reconnects to new primary automatically
3. Application sees minimal disruption
B. Read-replica routing
Route read queries to replicas during switchover:
- Write queries: Wait for new primary
- Read queries: Continue on replicas (may be slightly stale)
Result: Partial availability during switchover
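Both strategies lean on Patroni's REST health checks: GET /primary returns HTTP 200 only on the leader, and GET /replica only on healthy replicas, which is what HAProxy-style httpchk probes typically query. A quick sketch to classify nodes by hand:
# Classify each node by its Patroni health endpoints
for h in 10.0.1.11 10.0.1.12 10.0.1.13; do
  if [ "$(curl -s -o /dev/null -w '%{http_code}' http://$h:8008/primary)" = "200" ]; then
    echo "$h → primary (route writes here)"
  elif [ "$(curl -s -o /dev/null -w '%{http_code}' http://$h:8008/replica)" = "200" ]; then
    echo "$h → replica (route reads here)"
  fi
done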
C. Application-level retry
# Implement exponential backoff
import time
from psycopg2 import OperationalError

def execute_with_retry(query, max_retries=3):
    for i in range(max_retries):
        try:
            return execute_query(query)
        except OperationalError:
            if i == max_retries - 1:
                raise
            time.sleep(2 ** i)  # 1s, 2s, 4s
7.3. Communication plan
Before switchover:
T-24h: Announce maintenance window
- Email: ops@, dev@, stakeholders
- Slack: #incidents, #ops
- Status page: Update with scheduled maintenance
T-1h: Reminder notification
- Final checks
- Confirm go/no-go
T-5min: Begin maintenance
- Start switchover
- Monitor dashboards
During switchover:
- Real-time updates in ops channel
- Monitor metrics (latency, error rate)
- Have rollback plan ready
After switchover:
- Verify all systems operational
- Post-switchover validation
- Update documentation
- Send completion notification
8. Troubleshooting Switchover
8.1. Issue: Switchover command hangs
Symptoms: patronictl switchover never completes.
Diagnosis:
# Check what Patroni is waiting for
sudo journalctl -u patroni -f
# Common causes:
# A. High replication lag
sudo -u postgres psql -h 10.0.1.11 -c "
SELECT application_name,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM pg_stat_replication;
"
# If lag > 0, Patroni waits for lag = 0
# B. Active long-running queries
sudo -u postgres psql -h 10.0.1.11 -c "
SELECT pid, usename, state, query_start, query
FROM pg_stat_activity
WHERE state = 'active' AND query_start < now() - interval '5 minutes';
"
# Kill blocking queries:
# SELECT pg_terminate_backend(pid);
Solution:
# Option 1: Wait for lag to catch up (recommended)
# Option 2: Use --force to skip wait (risk data loss)
# Option 3: Cancel and reschedule
Ctrl+C # Cancel current switchover attempt
8.2. Issue: Candidate not eligible
Symptoms: Error "candidate is not eligible".
Diagnosis:
# Tags live in each node's patroni.yml; check the candidate directly
# (adjust the config path to your installation):
ssh node2 'grep -A5 "^tags:" /etc/patroni/patroni.yml'
# If output shows:
# tags:
#   nofailover: true ← Problem!
Solution:
# Edit the tags section in node2's patroni.yml (per-node tags are local
# settings; patronictl edit-config only manages the dynamic configuration):
tags:
  nofailover: false # Change to false
# Restart Patroni on node2
sudo systemctl restart patroni
8.3. Issue: Old primary won't demote
Symptoms: Switchover fails, old primary still leader.
Diagnosis:
# Check Patroni logs on old primary
sudo journalctl -u patroni -n 100 | grep -i "demote\|error"
# Possible causes:
# - PostgreSQL won't stop
# - Active transactions won't terminate
# - File permission issues
Solution:
# Restart PostgreSQL on the old primary via the REST API; when it comes
# back up, Patroni re-reads the leader key and demotes the node
curl -X POST http://10.0.1.11:8008/restart
# Or manually:
sudo -u postgres psql -h 10.0.1.11 -c "
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid != pg_backend_pid();
"
sudo systemctl restart patroni
8.4. Issue: Replication broken after switchover
Symptoms: Old primary not replicating from new primary.
Diagnosis:
# Check replication status
patronictl list postgres
# e.g. node1 shows "stopped", "crashed", or "start failed"
# Check logs
sudo journalctl -u patroni -u postgresql -n 100
Solution:
# A. Restart Patroni (usually auto-fixes)
sudo systemctl restart patroni
# B. Manual reinit if needed
patronictl reinit postgres node1
# Patroni will:
# 1. Stop PostgreSQL on node1
# 2. Remove data directory
# 3. pg_basebackup from node2
# 4. Start as replica
9. Switchover Automation
9.1. Scripted switchover
#!/bin/bash
# automated-switchover.sh
set -e
CLUSTER="postgres"
OLD_PRIMARY="node1"
NEW_PRIMARY="node2"
echo "=== Starting Automated Switchover ==="
echo "From: $OLD_PRIMARY → To: $NEW_PRIMARY"
# Pre-checks
echo "Running pre-checks..."
./pre-switchover-check.sh || exit 1
# Perform switchover
echo "Executing switchover..."
patronictl switchover $CLUSTER \
--master $OLD_PRIMARY \
--candidate $NEW_PRIMARY \
--force
# Wait for completion
echo "Waiting for switchover to complete..."
sleep 15
# Post-checks
echo "Running post-checks..."
new_leader=$(patronictl list $CLUSTER | grep Leader | awk '{print $2}')
if [ "$new_leader" == "$NEW_PRIMARY" ]; then
echo "✅ Switchover successful!"
echo "New leader: $new_leader"
else
echo "❌ Switchover failed!"
echo "Current leader: $new_leader"
exit 1
fi
# Verify replication
echo "Verifying replication..."
patronictl list $CLUSTER
echo "=== Switchover Complete ==="
9.2. Ansible playbook
# switchover.yml
---
- name: Perform Patroni switchover
  hosts: localhost
  gather_facts: no
  vars:
    cluster_name: postgres
    old_primary: node1
    new_primary: node2
  tasks:
    - name: Pre-check cluster health
      command: patronictl list {{ cluster_name }}
      register: cluster_status
      changed_when: false
    - name: Verify all nodes running
      assert:
        that:
          - "'running' in cluster_status.stdout"
        fail_msg: "Not all nodes are running"
    - name: Execute switchover
      command: >
        patronictl switchover {{ cluster_name }}
        --master {{ old_primary }}
        --candidate {{ new_primary }}
        --force
      register: switchover_result
    - name: Wait for switchover completion
      pause:
        seconds: 15
    - name: Verify new leader
      command: patronictl list {{ cluster_name }}
      register: final_status
      changed_when: false
    - name: Display result
      debug:
        msg: "{{ final_status.stdout_lines }}"
    - name: Verify leadership
      assert:
        that:
          - "'{{ new_primary }}' in final_status.stdout"
          - "'Leader' in final_status.stdout"
        fail_msg: "Switchover failed"
        success_msg: "Switchover successful"
Run:
ansible-playbook switchover.yml
9.3. CI/CD integration
# .github/workflows/db-maintenance.yml
name: Database Maintenance Switchover
on:
  schedule:
    - cron: '0 2 * * 0' # Every Sunday at 2 AM
  workflow_dispatch: # Manual trigger
jobs:
  switchover:
    runs-on: self-hosted
    steps:
      - name: Notify start
        run: |
          curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
            -d '{"text": "Starting scheduled database switchover"}'
      - name: Pre-checks
        run: ./scripts/pre-switchover-check.sh
      - name: Execute switchover
        run: |
          patronictl switchover postgres \
            --master node1 \
            --candidate node2 \
            --force
      - name: Verify
        run: ./scripts/post-switchover-verify.sh
      - name: Notify completion
        if: always()
        run: |
          curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
            -d '{"text": "Switchover completed: ${{ job.status }}"}'
10. Rolling Updates with Switchover
10.1. Update strategy
Scenario: apply a minor-version update, e.g. PostgreSQL 17.0 → 17.2. (A major-version jump such as 17 → 18 cannot be rolled this way: streaming replication requires the same major version on all nodes, so major upgrades need pg_upgrade or logical replication.)
Steps:
1. Update replica node3 (least critical)
- Stop Patroni
- Upgrade PostgreSQL
- Start Patroni
- Verify replication
2. Update replica node2
- Stop Patroni
- Upgrade PostgreSQL
- Start Patroni
- Verify replication
3. Switchover to node2 (now updated)
- patronictl switchover postgres --master node1 --candidate node2
4. Update old primary node1
- Stop Patroni
- Upgrade PostgreSQL
- Start Patroni (now replica)
- Verify replication
5. Optionally switchover back to node1
- patronictl switchover postgres --master node2 --candidate node1
Result: Zero-downtime upgrade ✅
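A sketch of step 1 for a single replica, assuming yum-based packaging and postgresql17 package names (adjust to your distribution and installation):
# Upgrade one replica in place (minor version only)
ssh node3 'sudo systemctl stop patroni'
ssh node3 'sudo yum update -y postgresql17 postgresql17-server'
ssh node3 'sudo systemctl start patroni'
# Wait for the node to rejoin and stream again
until patronictl list postgres | grep node3 | grep -qE "running|streaming"; do
  echo "Waiting for node3..."
  sleep 5
done
echo "✅ node3 upgraded and replicating"
Repeat for each replica, switch over, then upgrade the old primary the same way.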
10.2. Kernel update example
#!/bin/bash
# rolling-kernel-update.sh
NODES=("node1" "node2" "node3")
PRIMARY=$(patronictl list postgres | grep Leader | awk '{print $2}')
echo "Current primary: $PRIMARY"
# Update replicas first
for node in "${NODES[@]}"; do
if [ "$node" == "$PRIMARY" ]; then
continue # Skip primary for now
fi
echo "=== Updating $node ==="
ssh $node 'sudo yum update -y kernel && sudo reboot'
echo "Waiting for $node to come back..."
sleep 60
# Wait for node to rejoin
until patronictl list postgres | grep $node | grep -q "running"; do
echo "Waiting for $node..."
sleep 10
done
echo "✅ $node updated and rejoined"
done
# Now switchover from primary
NEW_PRIMARY=${NODES[1]} # Pick a replica
if [ "$NEW_PRIMARY" == "$PRIMARY" ]; then
NEW_PRIMARY=${NODES[2]}
fi
echo "=== Switching over from $PRIMARY to $NEW_PRIMARY ==="
patronictl switchover postgres \
--master $PRIMARY \
--candidate $NEW_PRIMARY \
--force
sleep 15
# Update old primary
echo "=== Updating $PRIMARY ==="
ssh $PRIMARY 'sudo yum update -y kernel && sudo reboot'
echo "Waiting for $PRIMARY to rejoin as replica..."
sleep 60
until patronictl list postgres | grep $PRIMARY | grep -qE "running|streaming"; do
  echo "Waiting for $PRIMARY..."
  sleep 10
done
echo "✅ All nodes updated!"
patronictl list postgres
11. Lab Exercises
Lab 1: Basic switchover
Tasks:
- Check current primary: patronictl list postgres
- Perform switchover: patronictl switchover postgres
- Measure downtime with a continuous query loop (see the probe sketch at the end of section 5.1)
- Verify new topology
- Document observations
Lab 2: Scheduled switchover
Tasks:
- Schedule switchover for 2 minutes from now
- Monitor logs during wait period
- Observe automatic execution
- Cancel a scheduled switchover (repeat and test cancel)
Lab 3: Forced vs graceful
Tasks:
- Create a long-running query: SELECT pg_sleep(300);
- Attempt a graceful switchover (observe the wait)
- Cancel and retry with --force
- Compare behavior and downtime
Lab 4: Rolling update simulation
Tasks:
- Start with 3-node cluster
- "Update" node3 (simulate by restarting)
- "Update" node2
- Switchover to node2
- "Update" node1
- Verify all nodes operational
Lab 5: Switchover under load
Tasks:
- Start pgbench: pgbench -c 10 -T 300
- During load, perform switchover
- Analyze pgbench output for errors
- Calculate success rate
- Test with connection pooler (PgBouncer)
12. Summary
Key Concepts
✅ Switchover = Planned, controlled role change
✅ Graceful = Wait for transactions (slower, safer)
✅ Immediate = Force termination (faster, riskier)
✅ Scheduled = Automated at specific time
✅ Zero downtime = Achievable with proper architecture
Switchover vs Failover
| Aspect | Switchover | Failover |
|---|---|---|
| Planning | Scheduled | Unplanned |
| Control | Manual | Automatic |
| Downtime | 0-10s | 30-60s |
| Data loss | None | Possible |
| Reversible | Yes | No |
Best Practices
- ✅ Test in staging first
- ✅ Schedule during low-traffic windows
- ✅ Use graceful mode (default)
- ✅ Verify lag = 0 before switchover
- ✅ Monitor during process
- ✅ Have rollback plan
- ✅ Communicate with stakeholders
- ✅ Document procedure
Next Steps
Lesson 15 will cover Recovering Failed Nodes:
- Rejoin old primary after failover
- pg_rewind usage and scenarios
- Full rebuild with pg_basebackup
- Timeline divergence resolution
- Split-brain recovery