Bài 8: Cấu hình Patroni chi tiết
Phân tích sâu file patroni.yml từng section: bootstrap, PostgreSQL parameters, authentication, tags và constraints cho cluster.
Bài 8: Cấu hình Patroni chi tiết
Mục tiêu
Sau bài học này, bạn sẽ:
- Hiểu sâu về từng section trong file
patroni.yml - Cấu hình bootstrap options
- Tuning PostgreSQL parameters cho HA
- Cấu hình authentication và security
- Sử dụng tags và constraints
- Tối ưu các timing parameters
1. Tổng quan về Patroni Configuration
1.1. Configuration layers
Patroni có nhiều layers cấu hình:
┌─────────────────────────────────────┐
│ 1. Command line arguments │ (Highest priority)
│ patroni --config-file=... │
├─────────────────────────────────────┤
│ 2. Environment variables │
│ PATRONI_SCOPE=postgres │
├─────────────────────────────────────┤
│ 3. Configuration file │
│ /etc/patroni/patroni.yml │
├─────────────────────────────────────┤
│ 4. DCS (Dynamic configuration) │
│ Stored in etcd │
└─────────────────────────────────────┘
↓
Merged configuration
Priority order: Command line > Environment > Config file > DCS
1.2. Static vs Dynamic configuration
Static configuration (trong patroni.yml):
- Node-specific settings (name, addresses)
- etcd connection info
- Data directory, bin directory
- Restart required để apply changes
Dynamic configuration (trong DCS):
- PostgreSQL parameters
- Bootstrap settings
- TTL, loop_wait, retry_timeout
- Có thể update runtime:
patronictl edit-config
2. Section: Scope và Namespace
2.1. Scope (Cluster name)
scope: postgres
Scope là tên unique của cluster trong DCS.
Ý nghĩa:
- Tất cả nodes trong cùng cluster phải có cùng
scope - DCS keys được prefix với scope:
/service/postgres/... - Cho phép nhiều clusters trên cùng etcd cluster
Best practices:
# Development
scope: postgres-dev
# Staging
scope: postgres-staging
# Production
scope: postgres-prod
# Multi-tenant
scope: customer1-postgres
scope: customer2-postgres
2.2. Namespace
namespace: /service/
Namespace là prefix cho tất cả keys trong DCS.
Full DCS key structure:
/service/postgres/leader
/service/postgres/members/node1
/service/postgres/config
Multiple clusters example:
/service/postgres-prod/leader
/service/postgres-staging/leader
/application/myapp-db/leader
3. Section: Node Information
3.1. Node name
name: node1
Requirements:
- Unique trong cluster
- Không đổi sau khi bootstrap
- Nên dùng hostname hoặc FQDN
Ví dụ naming conventions:
# Simple
name: node1
name: node2
name: node3
# With datacenter
name: dc1-node1
name: dc2-node1
# With role hint (not recommended)
name: pg-primary-01 # ❌ Role changes
name: pg-db-01 # ✅ Better
3.2. Host information
# Optional: Override hostname
host: 10.0.1.11
Patroni tự động detect hostname, nhưng có thể override nếu cần.
4. Section: REST API
4.1. Basic configuration
restapi:
listen: 0.0.0.0:8008
connect_address: 10.0.1.11:8008
Parameters:
listen: Interface và port để bind (0.0.0.0 = all interfaces)connect_address: Address mà các nodes khác dùng để connect
4.2. Authentication
restapi:
listen: 0.0.0.0:8008
connect_address: 10.0.1.11:8008
authentication:
username: admin
password: secret_password
Khi nào cần authentication?:
- REST API exposed ra internet
- Compliance requirements
- Multi-tenant environments
Sử dụng với curl:
curl -u admin:secret_password http://10.0.1.11:8008/
4.3. SSL/TLS
restapi:
listen: 0.0.0.0:8008
connect_address: 10.0.1.11:8008
certfile: /etc/patroni/ssl/server.crt
keyfile: /etc/patroni/ssl/server.key
cafile: /etc/patroni/ssl/ca.crt
verify_client: required # none, optional, required
Generate self-signed certificates:
# CA
openssl genrsa -out ca.key 4096
openssl req -new -x509 -days 3650 -key ca.key -out ca.crt
# Server certificate
openssl genrsa -out server.key 4096
openssl req -new -key server.key -out server.csr
openssl x509 -req -days 365 -in server.csr -CA ca.crt -CAkey ca.key -set_serial 01 -out server.crt
4.4. REST API endpoints
Health check endpoints:
# General health
GET /health
# Returns: 200 if healthy, 503 if not
# Primary check
GET /primary
GET /master # deprecated
# Returns: 200 if primary, 503 if not
# Replica check
GET /replica
# Returns: 200 if replica, 503 if primary or unhealthy
# Read-only check (replica or sync standby)
GET /read-only
# Returns: 200 if can serve reads
# Synchronous standby check
GET /synchronous
GET /sync
# Returns: 200 if synchronous standby
Management endpoints:
# Restart PostgreSQL
POST /restart
# Reload configuration
POST /reload
# Reinitialize
POST /reinitialize
5. Section: Bootstrap
5.1. DCS settings
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
maximum_lag_on_syncnode: 1048576
synchronous_mode: false
synchronous_mode_strict: false
TTL (Time To Live):
ttl: 30 # seconds
- Leader lock expiration time
- Nếu leader không renew trong TTL → lock expires
- Tradeoff:
- TTL thấp (10s): Failover nhanh, nhưng risk false positives
- TTL cao (60s): Ổn định hơn, nhưng downtime lâu hơn
- Recommended: 30 seconds
loop_wait:
loop_wait: 10 # seconds
- Interval giữa các health checks
- Leader renews lock mỗi
loop_waitseconds - Recommended: 10 seconds (1/3 của TTL)
retry_timeout:
retry_timeout: 10 # seconds
- Timeout cho DCS operations
- Nếu DCS không respond trong timeout → consider failed
- Recommended: 10 seconds
maximum_lag_on_failover:
maximum_lag_on_failover: 1048576 # bytes (1MB)
- Max replication lag để eligible for promotion
- Replica với lag > threshold sẽ không được chọn làm primary
- 0 = no limit (any replica can be promoted)
- Recommended: 1MB cho zero data loss preference
synchronous_mode:
synchronous_mode: false
synchronous_mode_strict: false
false: Asynchronous replication (default)true: Enable synchronous replicationsynchronous_mode_strict: Strict sync mode (no writes if no sync standby)
5.2. PostgreSQL bootstrap parameters
bootstrap:
dcs:
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
# Replication
wal_level: replica
hot_standby: "on"
wal_keep_size: "1GB"
max_wal_senders: 10
max_replication_slots: 10
wal_log_hints: "on"
# Archiving (optional)
archive_mode: "on"
archive_timeout: 300
archive_command: "cp %p /var/lib/postgresql/18/archive/%f"
# Performance
shared_buffers: "2GB"
effective_cache_size: "6GB"
maintenance_work_mem: "512MB"
checkpoint_completion_target: 0.9
wal_buffers: "16MB"
default_statistics_target: 100
random_page_cost: 1.1
effective_io_concurrency: 200
work_mem: "16MB"
min_wal_size: "1GB"
max_wal_size: "4GB"
max_worker_processes: 4
max_parallel_workers_per_gather: 2
max_parallel_workers: 4
max_parallel_maintenance_workers: 2
use_pg_rewind:
use_pg_rewind: true
- Enable automatic recovery với pg_rewind
- Faster recovery khi rejoining cluster
- Requires:
wal_log_hints = onhoặc data checksums
use_slots:
use_slots: true
- Create replication slots tự động
- Prevent WAL deletion khi replica lag
- Recommended: true
5.3. initdb options
bootstrap:
initdb:
- encoding: UTF8
- locale: en_US.UTF-8
- data-checksums
- auth-host: scram-sha-256
- auth-local: peer
Common options:
encoding: Character encoding (UTF8 recommended)locale: System localedata-checksums: Enable page checksums (detect corruption)auth-host: Default authentication method for host connectionsauth-local: Default authentication method for local connections
Lưu ý: initdb chỉ chạy khi bootstrap cluster lần đầu.
5.4. pg_hba configuration
bootstrap:
pg_hba:
# Local connections
- local all all peer
- local all all md5
# Localhost
- host all all 127.0.0.1/32 scram-sha-256
- host all all ::1/128 scram-sha-256
# Replication connections
- host replication replicator 10.0.1.11/32 scram-sha-256
- host replication replicator 10.0.1.12/32 scram-sha-256
- host replication replicator 10.0.1.13/32 scram-sha-256
# Application connections
- host all all 10.0.1.0/24 scram-sha-256
# Allow from specific app servers
- host myapp myapp_user 10.0.2.0/24 scram-sha-256
Best practices:
- ✅ Use
scram-sha-256(most secure) - ✅ Specific IP addresses/subnets
- ✅ Separate users for different purposes
- ❌ Avoid
trustmethod - ❌ Avoid
0.0.0.0/0unless necessary
5.5. Bootstrap users
bootstrap:
users:
admin:
password: admin_password_here
options:
- createrole
- createdb
myapp:
password: myapp_password_here
options:
- login
monitoring:
password: monitoring_password_here
options:
- login
User types:
- admin: Administrative tasks
- application: Application database user
- monitoring: Prometheus exporter, etc.
- replication: Already handled by Patroni
5.6. Post-bootstrap scripts
bootstrap:
post_bootstrap: /etc/patroni/scripts/post_bootstrap.sh
post_init: /etc/patroni/scripts/post_init.sh
post_bootstrap: Chạy sau khi bootstrap cluster (chỉ trên primary) post_init: Chạy sau khi initialize database
Example script (/etc/patroni/scripts/post_bootstrap.sh):
#!/bin/bash
# Create extensions
psql -U postgres -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements;"
psql -U postgres -c "CREATE EXTENSION IF NOT EXISTS pg_trgm;"
# Create databases
psql -U postgres -c "CREATE DATABASE myapp;"
# Grant permissions
psql -U postgres -d myapp -c "GRANT ALL ON SCHEMA public TO myapp;"
6. Section: PostgreSQL
6.1. Connection settings
postgresql:
listen: 0.0.0.0:5432
connect_address: 10.0.1.11:5432
proxy_address: 10.0.1.100:5432 # Optional: VIP address
listen: Interface để PostgreSQL listen connect_address: Address để replication connections proxy_address: Virtual IP (HAProxy, pgBouncer)
6.2. Data and binary directories
postgresql:
data_dir: /var/lib/postgresql/18/data
bin_dir: /usr/lib/postgresql/18/bin
config_dir: /etc/postgresql/18/main # Optional
pgpass: /var/lib/postgresql/.pgpass # Optional
Lưu ý:
data_dir: Nơi lưu database filesbin_dir: Nơi chứa PostgreSQL binaries (psql, pg_ctl, etc.)config_dir: Nếu config files ở nơi khác với data_dir
6.3. Authentication
postgresql:
authentication:
replication:
username: replicator
password: replicator_password
superuser:
username: postgres
password: postgres_password
rewind:
username: rewind_user
password: rewind_password
replication: User cho streaming replication superuser: Patroni dùng để manage PostgreSQL rewind: User cho pg_rewind (optional, có thể dùng superuser)
Security best practice: Store passwords in environment variables hoặc secrets manager.
6.4. Runtime parameters
postgresql:
parameters:
# Connection
max_connections: 200
superuser_reserved_connections: 3
# Memory
shared_buffers: "4GB" # 25% of RAM
effective_cache_size: "12GB" # 50-75% of RAM
maintenance_work_mem: "1GB"
work_mem: "20MB"
# WAL
wal_buffers: "16MB"
min_wal_size: "2GB"
max_wal_size: "8GB"
wal_compression: "on"
# Checkpoints
checkpoint_timeout: "15min"
checkpoint_completion_target: 0.9
# Query planning
default_statistics_target: 100
random_page_cost: 1.1 # SSD
effective_io_concurrency: 200 # SSD
# Parallel query
max_worker_processes: 8
max_parallel_workers_per_gather: 4
max_parallel_workers: 8
max_parallel_maintenance_workers: 4
# Logging
log_line_prefix: "%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h "
log_checkpoints: "on"
log_connections: "on"
log_disconnections: "on"
log_lock_waits: "on"
log_temp_files: 0
log_autovacuum_min_duration: 0
# Auto-vacuum
autovacuum: "on"
autovacuum_max_workers: 3
autovacuum_naptime: "10s"
# Locale
lc_messages: "en_US.UTF-8"
lc_monetary: "en_US.UTF-8"
lc_numeric: "en_US.UTF-8"
lc_time: "en_US.UTF-8"
# Extensions
shared_preload_libraries: "pg_stat_statements"
Memory sizing guide:
Total RAM: 16GB
shared_buffers: 4GB (25%)
effective_cache_size: 12GB (75%)
maintenance_work_mem: 1GB
work_mem: 20MB × max_connections = 4GB max
6.5. Additional pg_hba entries
postgresql:
pg_hba:
# Additional entries beyond bootstrap
- hostssl all all 10.0.3.0/24 scram-sha-256
- host replication replicator 10.0.4.0/24 scram-sha-256
Merge với entries từ bootstrap.pg_hba.
6.6. Callback scripts
postgresql:
callbacks:
on_reload: /etc/patroni/scripts/on_reload.sh
on_restart: /etc/patroni/scripts/on_restart.sh
on_role_change: /etc/patroni/scripts/on_role_change.sh
on_start: /etc/patroni/scripts/on_start.sh
on_stop: /etc/patroni/scripts/on_stop.sh
on_role_change example:
#!/bin/bash
# /etc/patroni/scripts/on_role_change.sh
ROLE=$1 # 'master' or 'replica'
CLUSTER=$2
LEADER=$3
if [ "$ROLE" = "master" ]; then
echo "$(date): Promoted to PRIMARY" >> /var/log/patroni/role_changes.log
# Update HAProxy
curl -X POST http://haproxy:9999/update
# Send notification
curl -X POST https://slack.webhook.url \
-d "{\"text\": \"PostgreSQL node promoted to PRIMARY\"}"
else
echo "$(date): Demoted to REPLICA" >> /var/log/patroni/role_changes.log
fi
6.7. Custom configuration files
postgresql:
custom_conf: /etc/postgresql/18/main/custom.conf
Include custom configuration file.
Example (custom.conf):
# Custom settings
statement_timeout = 30000
idle_in_transaction_session_timeout = 60000
6.8. Remove data directory on failover
postgresql:
remove_data_directory_on_rewind_failure: true
remove_data_directory_on_diverged_timelines: true
Cẩn thận: Xóa data directory nếu recovery fail.
7. Section: Tags
7.1. Failover tags
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
nofailover:
nofailover: false # Node có thể become primary
nofailover: true # Node KHÔNG BAO GIỜ become primary
Use case: Replica chỉ dùng cho reporting, analytics.
noloadbalance:
noloadbalance: false # Node có thể serve read queries
noloadbalance: true # Node KHÔNG serve read queries
Use case: Node đang maintenance hoặc có issue.
clonefrom:
clonefrom: false # Node có thể làm source cho basebackup
clonefrom: true # Node ưu tiên làm source
Use case: Designated backup node.
nosync:
nosync: false # Node có thể become synchronous standby
nosync: true # Node KHÔNG become synchronous standby
Use case: Async replica ở datacenter khác.
7.2. Custom tags
tags:
datacenter: dc1
environment: production
application: myapp
version: v1.0.0
rack: rack1
zone: us-east-1a
Use cases:
- Monitoring và labeling
- Custom failover logic
- Geographic routing
- Multi-tenant identification
7.3. Priority tag
tags:
nofailover: false
# Higher number = higher priority for promotion
failover_priority: 100
Example cluster:
node1: failover_priority: 100 ← Preferred primary
node2: failover_priority: 50
node3: failover_priority: 10 ← Last resort
8. Section: Watchdog
8.1. Basic watchdog configuration
watchdog:
mode: required # off, automatic, required
device: /dev/watchdog
safety_margin: 5
Modes:
off: Disable watchdogautomatic: Use if availablerequired: Fail if watchdog không available
8.2. Hardware watchdog
Check watchdog availability:
ls -l /dev/watchdog
# crw------- 1 root root 10, 130 ... /dev/watchdog
Load watchdog module:
# Load softdog module
sudo modprobe softdog
# Make persistent
echo "softdog" | sudo tee -a /etc/modules
# Verify
lsmod | grep dog
# softdog ...
Grant access to postgres user:
# Create udev rule
sudo tee /etc/udev/rules.d/60-watchdog.rules << EOF
KERNEL=="watchdog", OWNER="postgres", GROUP="postgres", MODE="0660"
EOF
# Reload udev
sudo udevadm control --reload-rules
sudo udevadm trigger
8.3. Why use watchdog?
Split-brain prevention:
- Patroni hangs nhưng PostgreSQL vẫn chạy
- Network issue: Patroni loses DCS but node alive
- Watchdog reboots node → Prevent zombie primary
Flow:
1. Patroni healthy → Kicks watchdog every 10s
2. Patroni hangs/loses DCS → Stops kicking
3. Watchdog timeout (safety_margin) → Reboot node
4. Node reboots → No zombie primary
9. Section: Synchronous Replication
9.1. Enable synchronous mode
bootstrap:
dcs:
synchronous_mode: true
synchronous_mode_strict: false
synchronous_node_count: 1
synchronous_mode: Enable sync replication synchronous_mode_strict: Primary refuses writes nếu không có sync standby synchronous_node_count: Số sync standbys (≥1)
9.2. Synchronous mode variants
Async (default):
synchronous_mode: false
- Fast writes
- Risk data loss nếu primary fails
Synchronous:
synchronous_mode: true
synchronous_mode_strict: false
- Wait for 1 standby confirmation
- Degrade to async nếu không có standbys
Strict synchronous:
synchronous_mode: true
synchronous_mode_strict: true
- REFUSE writes nếu không có sync standby
- Zero data loss guarantee
- Risk availability impact
9.3. Multiple synchronous standbys
bootstrap:
dcs:
synchronous_mode: true
synchronous_node_count: 2 # Wait for 2 standbys
PostgreSQL 18 supports:
synchronous_standby_names = 'FIRST 2 (node2, node3, node4)'
-- or
synchronous_standby_names = 'ANY 2 (node2, node3, node4)'
10. Complete Configuration Example
10.1. Production-grade patroni.yml
scope: postgres-prod
namespace: /service/
name: node1
restapi:
listen: 0.0.0.0:8008
connect_address: 10.0.1.11:8008
authentication:
username: admin
password: ${PATRONI_RESTAPI_PASSWORD}
certfile: /etc/patroni/ssl/server.crt
keyfile: /etc/patroni/ssl/server.key
etcd:
hosts: 10.0.1.11:2379,10.0.1.12:2379,10.0.1.13:2379
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
maximum_lag_on_syncnode: 1048576
synchronous_mode: true
synchronous_mode_strict: false
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
# Replication
wal_level: replica
hot_standby: "on"
wal_keep_size: "2GB"
max_wal_senders: 10
max_replication_slots: 10
wal_log_hints: "on"
# Performance
max_connections: 200
shared_buffers: "4GB"
effective_cache_size: "12GB"
maintenance_work_mem: "1GB"
work_mem: "20MB"
wal_buffers: "16MB"
checkpoint_completion_target: 0.9
# Logging
logging_collector: "on"
log_directory: "log"
log_filename: "postgresql-%Y-%m-%d_%H%M%S.log"
log_line_prefix: "%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h "
log_checkpoints: "on"
log_connections: "on"
log_disconnections: "on"
log_min_duration_statement: 1000
# Extensions
shared_preload_libraries: "pg_stat_statements"
initdb:
- encoding: UTF8
- locale: en_US.UTF-8
- data-checksums
pg_hba:
- local all all peer
- host replication replicator 10.0.1.11/32 scram-sha-256
- host replication replicator 10.0.1.12/32 scram-sha-256
- host replication replicator 10.0.1.13/32 scram-sha-256
- host all all 10.0.1.0/24 scram-sha-256
- hostssl all all 0.0.0.0/0 scram-sha-256
users:
admin:
password: ${ADMIN_PASSWORD}
options:
- createrole
- createdb
postgresql:
listen: 0.0.0.0:5432
connect_address: 10.0.1.11:5432
data_dir: /var/lib/postgresql/18/data
bin_dir: /usr/lib/postgresql/18/bin
authentication:
replication:
username: replicator
password: ${REPLICATION_PASSWORD}
superuser:
username: postgres
password: ${POSTGRES_PASSWORD}
parameters:
unix_socket_directories: "/var/run/postgresql"
callbacks:
on_role_change: /etc/patroni/scripts/on_role_change.sh
on_start: /etc/patroni/scripts/on_start.sh
on_stop: /etc/patroni/scripts/on_stop.sh
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
datacenter: dc1
environment: production
failover_priority: 100
watchdog:
mode: automatic
device: /dev/watchdog
safety_margin: 5
10.2. Environment variables
# /etc/patroni/patroni.env
export PATRONI_RESTAPI_PASSWORD="secure_api_password"
export ADMIN_PASSWORD="secure_admin_password"
export REPLICATION_PASSWORD="secure_replication_password"
export POSTGRES_PASSWORD="secure_postgres_password"
Load in systemd:
[Service]
EnvironmentFile=/etc/patroni/patroni.env
11. Tổng kết
Key Takeaways
✅ Configuration layers: Command line > Env > Config file > DCS
✅ Static config: Node-specific, requires restart
✅ Dynamic config: Cluster-wide, update via patronictl edit-config
✅ Bootstrap: One-time initialization settings
✅ Tags: Control failover behavior và node roles
✅ Sync replication: Balance between durability và availability
Best Practices Checklist
- Use environment variables cho passwords
- Enable
use_pg_rewindvớiwal_log_hints: on - Set appropriate
ttl,loop_wait,retry_timeout - Configure
maximum_lag_on_failoverfor zero data loss - Use
data-checksumstrong initdb - Set up callback scripts cho notifications
- Configure watchdog cho split-brain prevention
- Use
scram-sha-256authentication - Document custom tags và their meanings
- Regular backup of configuration files
Chuẩn bị cho Bài 9
Bài 9 sẽ bootstrap cluster lần đầu với configuration đã chuẩn bị:
- Start Patroni trên 3 nodes
- Verify cluster formation
- Test basic operations
- Troubleshoot common issues