Lesson 6: Installing and configuring an etcd cluster
Download, install, and configure a 3-node etcd cluster, create a systemd service, and check cluster health with etcdctl commands.
Objectives
After this lesson, you will be able to:
- Understand the role of etcd in the Patroni architecture
- Download and install etcd on 3 nodes
- Configure an etcd cluster with Raft consensus
- Create a systemd service for etcd
- Check the health of the etcd cluster
- Use basic etcdctl commands
1. Introduction to etcd
1.1. What is etcd?
etcd is a distributed, reliable key-value store that uses the Raft consensus algorithm. It was originally developed by CoreOS and is now a CNCF (Cloud Native Computing Foundation) project.
Key characteristics:
- 🔐 Strongly consistent: Raft guarantees consistency across members
- 🚀 Fast: Low read latency (can be sub-millisecond on a local network)
- 🔄 Distributed: Runs as a multi-node cluster with quorum
- 📡 Watch mechanism: Real-time notifications for changes
- 🔒 TTL support: Automatic key expiration through leases (used for leader locks)
- 🌐 gRPC + HTTP API: Easy integration
1.2. etcd in the Patroni architecture
┌──────────────────────────────────┐
│      etcd Cluster (3 nodes)      │
│   ┌─────┐   ┌─────┐   ┌─────┐    │
│   │etcd1│───│etcd2│───│etcd3│    │
│   └──┬──┘   └──┬──┘   └──┬──┘    │
│      │         │         │       │
│      └─────────┴─────────┘       │
│          Raft Consensus          │
└──────────────────────────────────┘
       │         │         │
  ┌────┴────┐    │   ┌─────┴─────┐
  ▼         ▼    ▼   ▼           ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│Patroni 1│ │Patroni 2│ │Patroni 3│
└─────────┘ └─────────┘ └─────────┘
etcd stores the following keys for Patroni:
- /service/postgres/leader: Leader lock (TTL 30s)
- /service/postgres/members/: Node information
- /service/postgres/config: Cluster configuration
- /service/postgres/initialize: Bootstrap state
- /service/postgres/failover: Failover instructions
2. Downloading and installing etcd
2.1. Architecture considerations
Cluster size recommendations:
- 3 nodes: Recommended for production, tolerates 1 failure
- 5 nodes: High availability, tolerates 2 failures
- 7+ nodes: Overkill for most use cases
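The failure tolerances above follow from Raft's majority rule: a cluster of N voting members needs floor(N/2) + 1 of them to agree. The following is a minimal sketch of that arithmetic; the function names `quorum` and `fault_tolerance` are illustrative helpers, not part of etcd.

```shell
#!/usr/bin/env bash
# Quorum and fault tolerance for a Raft cluster of N voting members.
# quorum = floor(N/2) + 1; tolerated failures = N - quorum.

quorum() {
  local n=$1
  echo $(( n / 2 + 1 ))
}

fault_tolerance() {
  local n=$1
  echo $(( n - (n / 2 + 1) ))
}

# Print the numbers for a few cluster sizes.
for n in 1 2 3 4 5 7; do
  printf '%d nodes: quorum=%d, tolerated failures=%d\n' \
    "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
```

Running the loop shows why odd sizes are preferred: going from 3 to 4 nodes raises the quorum from 2 to 3 but still tolerates only 1 failure.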
Deployment topology:
Option 1: etcd on separate servers (Recommended)
┌──────────┐  ┌──────────┐  ┌──────────┐
│  etcd1   │  │  etcd2   │  │  etcd3   │
└──────────┘  └──────────┘  └──────────┘
      ▲             ▲             ▲
      └─────────────┴─────────────┘
      │             │             │
┌──────────┐  ┌──────────┐  ┌──────────┐
│Patroni 1 │  │Patroni 2 │  │Patroni 3 │
│  + PG    │  │  + PG    │  │  + PG    │
└──────────┘  └──────────┘  └──────────┘
Option 2: etcd co-located (For labs/dev)
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│    etcd1     │ │    etcd2     │ │    etcd3     │
│  Patroni 1   │ │  Patroni 2   │ │  Patroni 3   │
│      PG      │ │      PG      │ │      PG      │
└──────────────┘ └──────────────┘ └──────────────┘
This lab uses Option 2 (co-located) to save resources.
2.2. Installing etcd on Ubuntu/Debian
Run these steps on ALL 3 nodes.
Step 1: Download the etcd binary
# Set version
ETCD_VER=v3.5.11
# Download
wget https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz
# Extract
tar xzvf etcd-${ETCD_VER}-linux-amd64.tar.gz
# Move binaries to PATH
sudo mv etcd-${ETCD_VER}-linux-amd64/etcd /usr/local/bin/
sudo mv etcd-${ETCD_VER}-linux-amd64/etcdctl /usr/local/bin/
sudo mv etcd-${ETCD_VER}-linux-amd64/etcdutl /usr/local/bin/
# Verify
etcd --version
etcdctl version
Output:
etcd Version: 3.5.11
Git SHA: ...
Go Version: go1.20.12
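The download command above hardcodes the amd64 build. If some lab VMs run on ARM, the archive name has to change accordingly. The sketch below maps `uname -m` output to the release suffix and builds the URL in the same pattern as the wget line; `etcd_arch` and `etcd_download_url` are hypothetical helper names, and only the two architectures shown are handled.

```shell
#!/usr/bin/env bash
# Map `uname -m` output to the architecture suffix used in etcd release
# file names. Only the two mappings below are handled in this sketch.
etcd_arch() {
  case "$1" in
    x86_64)          echo amd64 ;;
    aarch64 | arm64) echo arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build the download URL in the same pattern as the wget command above.
etcd_download_url() {
  local ver=$1 arch=$2
  echo "https://github.com/etcd-io/etcd/releases/download/${ver}/etcd-${ver}-linux-${arch}.tar.gz"
}

# Print the URL for the current machine.
ETCD_VER=v3.5.11
arch=$(etcd_arch "$(uname -m)") && etcd_download_url "$ETCD_VER" "$arch"
```

The printed URL can be passed to wget in place of the hardcoded one.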
Step 2: Create the etcd user and directories
# Create a system user with no login shell
sudo useradd -r -s /bin/false etcd
# Create data and config directories
sudo mkdir -p /var/lib/etcd
sudo mkdir -p /etc/etcd
# Set ownership
sudo chown -R etcd:etcd /var/lib/etcd
sudo chown -R etcd:etcd /etc/etcd
2.3. Installing on CentOS/RHEL
# Download (same as Ubuntu)
ETCD_VER=v3.5.11
wget https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf etcd-${ETCD_VER}-linux-amd64.tar.gz
sudo mv etcd-${ETCD_VER}-linux-amd64/etcd* /usr/local/bin/
# Create user and directories
sudo useradd -r -s /sbin/nologin etcd
sudo mkdir -p /var/lib/etcd /etc/etcd
sudo chown -R etcd:etcd /var/lib/etcd /etc/etcd
3. Configuring the 3-node etcd cluster
3.1. Network topology
node1 (etcd1): 10.0.1.11:2379,2380
node2 (etcd2): 10.0.1.12:2379,2380
node3 (etcd3): 10.0.1.13:2379,2380
Port 2379: Client communication (Patroni connects here)
Port 2380: Peer communication (etcd cluster internal)
3.2. Create the configuration file
Node 1 (10.0.1.11) - /etc/etcd/etcd.conf
# Member name
ETCD_NAME="etcd1"
# Data directory
ETCD_DATA_DIR="/var/lib/etcd/etcd1.etcd"
# Listen URLs
ETCD_LISTEN_PEER_URLS="http://10.0.1.11:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.0.1.11:2379,http://127.0.0.1:2379"
# Advertise URLs (what other nodes use to connect)
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.1.11:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.1.11:2379"
# Cluster configuration
ETCD_INITIAL_CLUSTER="etcd1=http://10.0.1.11:2380,etcd2=http://10.0.1.12:2380,etcd3=http://10.0.1.13:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-patroni"
# Logging
ETCD_LOG_LEVEL="info"
Node 2 (10.0.1.12) - /etc/etcd/etcd.conf
ETCD_NAME="etcd2"
ETCD_DATA_DIR="/var/lib/etcd/etcd2.etcd"
ETCD_LISTEN_PEER_URLS="http://10.0.1.12:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.0.1.12:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.1.12:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.1.12:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://10.0.1.11:2380,etcd2=http://10.0.1.12:2380,etcd3=http://10.0.1.13:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-patroni"
ETCD_LOG_LEVEL="info"
Node 3 (10.0.1.13) - /etc/etcd/etcd.conf
ETCD_NAME="etcd3"
ETCD_DATA_DIR="/var/lib/etcd/etcd3.etcd"
ETCD_LISTEN_PEER_URLS="http://10.0.1.13:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.0.1.13:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.1.13:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.1.13:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://10.0.1.11:2380,etcd2=http://10.0.1.12:2380,etcd3=http://10.0.1.13:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-patroni"
ETCD_LOG_LEVEL="info"
3.3. Parameter reference
| Parameter | Meaning |
|---|---|
| ETCD_NAME | Unique name of the member within the cluster |
| ETCD_DATA_DIR | Directory where etcd stores its data |
| ETCD_LISTEN_PEER_URLS | URL to listen on for peer communication (port 2380) |
| ETCD_LISTEN_CLIENT_URLS | URL to listen on for client connections (port 2379) |
| ETCD_INITIAL_ADVERTISE_PEER_URLS | URL that other peers use to connect to this member |
| ETCD_ADVERTISE_CLIENT_URLS | URL that clients use to connect to this member |
| ETCD_INITIAL_CLUSTER | List of all members at bootstrap |
| ETCD_INITIAL_CLUSTER_STATE | new (first bootstrap) or existing (adding a member) |
| ETCD_INITIAL_CLUSTER_TOKEN | Unique token for the cluster (prevents mixing up clusters) |
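The three config files differ only in the member name and IP address, so hand-editing them invites copy-paste mistakes. One way to avoid that, sketched below, is to render the file from two arguments. `render_etcd_conf` is a hypothetical helper, and the peer list is fixed to this lab's three nodes.

```shell
#!/usr/bin/env bash
# Render /etc/etcd/etcd.conf content for one member.
# Arguments: member name, member IP. The peer list is fixed to the lab's
# three nodes; adjust INITIAL_CLUSTER for a different topology.
INITIAL_CLUSTER="etcd1=http://10.0.1.11:2380,etcd2=http://10.0.1.12:2380,etcd3=http://10.0.1.13:2380"

render_etcd_conf() {
  local name=$1 ip=$2
  cat <<EOF
ETCD_NAME="${name}"
ETCD_DATA_DIR="/var/lib/etcd/${name}.etcd"
ETCD_LISTEN_PEER_URLS="http://${ip}:2380"
ETCD_LISTEN_CLIENT_URLS="http://${ip}:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://${ip}:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://${ip}:2379"
ETCD_INITIAL_CLUSTER="${INITIAL_CLUSTER}"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-patroni"
ETCD_LOG_LEVEL="info"
EOF
}

# Print the config for node 2. To write it to disk, pipe it instead:
#   render_etcd_conf etcd2 10.0.1.12 | sudo tee /etc/etcd/etcd.conf
render_etcd_conf etcd2 10.0.1.12
```

Keeping ETCD_INITIAL_CLUSTER in a single variable means all three nodes are guaranteed to receive the same member list.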
4. Creating the systemd service
Create the file /etc/systemd/system/etcd.service on ALL 3 nodes:
[Unit]
Description=etcd distributed reliable key-value store
Documentation=https://etcd.io/docs/
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
User=etcd
Group=etcd
# Load environment variables from config file
EnvironmentFile=/etc/etcd/etcd.conf
# Start etcd with config
ExecStart=/usr/local/bin/etcd
# Restart on failure
Restart=on-failure
RestartSec=5
# Limits
LimitNOFILE=65536
LimitNPROC=65536
# Security
NoNewPrivileges=true
ProtectHome=true
ProtectSystem=strict
ReadWritePaths=/var/lib/etcd
[Install]
WantedBy=multi-user.target
Reload systemd and enable the service:
sudo systemctl daemon-reload
sudo systemctl enable etcd
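5. Starting the etcd cluster
5.1. Start etcd on the nodes
Important: Start etcd on all three nodes AT THE SAME TIME, or within about 30 seconds of each other, so the cluster can form. A new cluster has no leader until a majority (2 of 3) of the members are running, and because the unit uses Type=notify, systemctl start on the first node blocks until the cluster is ready (systemd's default start timeout is 90 seconds).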
Terminal 1 (node1):
sudo systemctl start etcd
sudo systemctl status etcd
Terminal 2 (node2):
sudo systemctl start etcd
sudo systemctl status etcd
Terminal 3 (node3):
sudo systemctl start etcd
sudo systemctl status etcd
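Since the first node cannot become ready on its own, it can be convenient to start the service without waiting and then poll until the cluster reports healthy. The sketch below shows one way to do that; `wait_for_cluster` is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```shell
#!/usr/bin/env bash
# Poll until `etcdctl endpoint health --cluster` succeeds or attempts run out.
# Arguments: max attempts (default 30), delay in seconds (default 2).
wait_for_cluster() {
  local attempts=${1:-30} delay=${2:-2} i
  for (( i = 1; i <= attempts; i++ )); do
    if etcdctl endpoint health --cluster >/dev/null 2>&1; then
      echo "cluster healthy after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "cluster not healthy after ${attempts} attempts" >&2
  return 1
}

# Usage on each node (commented out so the sketch has no side effects):
#   sudo systemctl start --no-block etcd
#   wait_for_cluster 30 2
```

With `--no-block`, systemctl returns immediately instead of waiting for etcd's readiness notification, so the same two commands can be pasted into all three terminals in quick succession.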
5.2. Check the logs
sudo journalctl -u etcd -f
Successful startup logs:
... etcd1 became leader at term 2
... established a TCP streaming connection with peer etcd2
... established a TCP streaming connection with peer etcd3
... ready to serve client requests
6. Checking etcd cluster health
6.1. Check cluster members
# From any node
etcdctl member list
# Output:
# 8e9e05c52164694d, started, etcd1, http://10.0.1.11:2380, http://10.0.1.11:2379, false
# 91bc3c398fb3c146, started, etcd2, http://10.0.1.12:2380, http://10.0.1.12:2379, false
# fd422379fda50e48, started, etcd3, http://10.0.1.13:2380, http://10.0.1.13:2379, false
6.2. Check cluster health
etcdctl endpoint health --cluster
# Output:
# http://10.0.1.11:2379 is healthy: successfully committed proposal: took = 2.345678ms
# http://10.0.1.12:2379 is healthy: successfully committed proposal: took = 1.234567ms
# http://10.0.1.13:2379 is healthy: successfully committed proposal: took = 2.123456ms
6.3. Check endpoint status
etcdctl endpoint status --cluster --write-out=table
# Output:
# +------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
# +------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | 10.0.1.11:2379 | 8e9e05c52164694d | 3.5.11 | 20 kB | true | false | 2 | 8 | 8 | |
# | 10.0.1.12:2379 | 91bc3c398fb3c146 | 3.5.11 | 20 kB | false | false | 2 | 8 | 8 | |
# | 10.0.1.13:2379 | fd422379fda50e48 | 3.5.11 | 20 kB | false | false | 2 | 8 | 8 | |
# +------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Reading the output:
- IS LEADER: etcd1 is currently the leader
- RAFT TERM: Election term (increments on every election)
- RAFT INDEX: Index of the latest entry in the Raft log
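For scripting, the leader can be extracted from the default comma-separated output rather than the table. The sketch below assumes that output has the same columns, in the same order, as the table above (endpoint first, IS LEADER fifth); `find_leader` is a hypothetical helper, and the sample input is modelled on the table rather than captured from a real run.

```shell
#!/usr/bin/env bash
# Read `etcdctl endpoint status --cluster` output in the default
# comma-separated format on stdin and print the leader's endpoint.
# Assumes column 1 is the endpoint and column 5 is IS LEADER.
find_leader() {
  awk -F', ' '$5 == "true" { print $1 }'
}

# Sample input modelled on the table above (not captured from a real run).
find_leader <<'EOF'
http://10.0.1.11:2379, 8e9e05c52164694d, 3.5.11, 20 kB, true, false, 2, 8, 8,
http://10.0.1.12:2379, 91bc3c398fb3c146, 3.5.11, 20 kB, false, false, 2, 8, 8,
http://10.0.1.13:2379, fd422379fda50e48, 3.5.11, 20 kB, false, false, 2, 8, 8,
EOF
```

Against a live cluster the same function would be used as `etcdctl endpoint status --cluster | find_leader`.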
7. Basic etcdctl commands
7.1. Set environment (optional)
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=http://10.0.1.11:2379,http://10.0.1.12:2379,http://10.0.1.13:2379
# Add to ~/.bashrc to make the settings persistent
echo 'export ETCDCTL_API=3' >> ~/.bashrc
echo 'export ETCDCTL_ENDPOINTS=http://10.0.1.11:2379,http://10.0.1.12:2379,http://10.0.1.13:2379' >> ~/.bashrc
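Running the echo commands above more than once leaves duplicate lines in ~/.bashrc. A small guard avoids that; `append_once` below is a hypothetical helper, demonstrated on a temporary file so the sketch does not touch the real shell profile.

```shell
#!/usr/bin/env bash
# Append a line to a file only if that exact line is not already present.
append_once() {
  local line=$1 file=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a temporary file instead of ~/.bashrc.
rc=$(mktemp)
append_once 'export ETCDCTL_API=3' "$rc"
append_once 'export ETCDCTL_API=3' "$rc"   # second call is a no-op
cat "$rc"
rm -f "$rc"
```

In the lab, the second argument would be ~/.bashrc and the first would be each export line from this section.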
7.2. Basic operations
Put/Get/Delete keys
# Write a key
etcdctl put /test/key1 "Hello etcd"
# Read a key
etcdctl get /test/key1
# Output:
# /test/key1
# Hello etcd
# Get with details
etcdctl get /test/key1 --write-out=json
# Delete a key
etcdctl del /test/key1
List keys with prefix
# Put some test keys
etcdctl put /service/postgres/test1 "value1"
etcdctl put /service/postgres/test2 "value2"
# List all keys under /service/postgres/
etcdctl get /service/postgres/ --prefix
# Output:
# /service/postgres/test1
# value1
# /service/postgres/test2
# value2
Watch for changes
# Terminal 1: Watch for changes
etcdctl watch /service/postgres/ --prefix
# Terminal 2: Make changes
etcdctl put /service/postgres/leader "node1"
# Terminal 1 will display:
# PUT
# /service/postgres/leader
# node1
TTL keys (used for leader locks)
# Create a lease with 30 seconds TTL
etcdctl lease grant 30
# Output: lease 7587869125995748410 granted with TTL(30s)
# Put key with lease
etcdctl put /test/ttl-key "value" --lease=7587869125995748410
# The key is deleted automatically once the lease expires (30 seconds unless it is renewed)
# Keep lease alive
etcdctl lease keep-alive 7587869125995748410
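The lease ID changes on every grant, so copying it by hand does not scale to scripts. The sketch below pulls the ID out of the grant message, assuming the message has the form shown above; `lease_id_from_grant` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Extract the lease ID from `etcdctl lease grant` output, which has the
# form: "lease <id> granted with TTL(<n>s)".
lease_id_from_grant() {
  awk '$1 == "lease" && $3 == "granted" { print $2 }'
}

# Example with the output line shown above:
echo 'lease 7587869125995748410 granted with TTL(30s)' | lease_id_from_grant

# Against a live cluster this could be combined as (not run here):
#   lease_id=$(etcdctl lease grant 30 | lease_id_from_grant)
#   etcdctl put /test/ttl-key "value" --lease="$lease_id"
#   etcdctl lease timetolive "$lease_id"
```

This mirrors, in a simplified form, the pattern a leader lock relies on: the key lives only as long as its owner keeps the lease alive.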
7.3. Advanced operations
Transaction (atomic operations)
# Atomic compare-and-swap
# Non-interactive input is three blocks separated by blank lines:
# compares, then success requests, then failure requests.
etcdctl txn <<'EOF'
value("/test/key1") = "old_value"

put /test/key1 "new_value"

get /test/key1

EOF
Snapshot backup
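# Create snapshot (must target exactly one endpoint, even if ETCDCTL_ENDPOINTS lists all three)
ETCDCTL_ENDPOINTS=http://10.0.1.11:2379 etcdctl snapshot save /tmp/etcd-backup.db
# Verify snapshot (etcdutl replaces the deprecated "etcdctl snapshot status" in v3.5)
etcdutl snapshot status /tmp/etcd-backup.db --write-out=table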
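For recurring backups, a fixed file name such as /tmp/etcd-backup.db gets overwritten each time. The sketch below wraps the save and verify steps in one function that writes a timestamped file; `backup_etcd` is a hypothetical helper and the directory in the usage line is only an example.

```shell
#!/usr/bin/env bash
# Save a snapshot from ONE endpoint to a timestamped file and verify it.
# Arguments: endpoint URL, backup directory. Prints the file path on success.
backup_etcd() {
  local endpoint=$1 dir=$2 file
  file="${dir}/etcd-$(date +%Y%m%d-%H%M%S).db"
  # Tool output goes to stderr so that stdout carries only the file path.
  ETCDCTL_ENDPOINTS="$endpoint" etcdctl snapshot save "$file" >&2 || return 1
  etcdutl snapshot status "$file" --write-out=table >&2 || return 1
  echo "$file"
}

# Usage (not run here; the directory is an example and must already exist):
#   backup_etcd http://10.0.1.11:2379 /var/backups/etcd
```

The endpoint is passed through the environment for this one command so that it overrides any multi-endpoint ETCDCTL_ENDPOINTS exported in the shell.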
8. Lab: Setting up a complete etcd cluster
8.1. Lab objectives
- ✅ Install etcd on 3 nodes
- ✅ Configure the cluster
- ✅ Verify cluster health
- ✅ Test basic operations
- ✅ Simulate node failure
8.2. Step-by-step lab guide
1. Install etcd on all nodes
Already done in Section 2.
2. Create config files
Already done in Section 3.
3. Create the systemd service
Already done in Section 4.
4. Start cluster
# On all 3 nodes (at the same time)
sudo systemctl start etcd
# Check status
sudo systemctl status etcd
5. Verify cluster
# Member list
etcdctl member list
# Health check
etcdctl endpoint health --cluster
# Status
etcdctl endpoint status --cluster --write-out=table
6. Test write/read
# On node1: Write
etcdctl put /test/mykey "Hello from etcd cluster"
# On node2: Read
etcdctl get /test/mykey
# Should see: Hello from etcd cluster
# On node3: Verify
etcdctl get /test/mykey
# Should see: Hello from etcd cluster
7. Test leader election
# Identify current leader
etcdctl endpoint status --cluster --write-out=table
# Note which node IS LEADER = true
# Stop leader node
sudo systemctl stop etcd # On leader node
# Wait 5-10 seconds
# Check from another node
etcdctl endpoint status --cluster --write-out=table
# New leader should be elected
# Restart stopped node
sudo systemctl start etcd # On stopped node
# Verify rejoined
etcdctl member list
8. Test data persistence
# Write some data
etcdctl put /persistent/key "This should survive restart"
# Restart every node, one at a time, and wait for it to report healthy before moving on
sudo systemctl restart etcd # Run on each node in turn
# Verify data
etcdctl get /persistent/key
# Should still see: This should survive restart
8.3. Troubleshooting common issues
Issue 1: Cluster won't form
# Symptom
journalctl -u etcd -n 50
# Error: "request cluster ID mismatch"
# Solution: Clear the data directory and restart
# WARNING: this deletes all etcd data on the node; use it only for a fresh bootstrap
sudo systemctl stop etcd
sudo rm -rf /var/lib/etcd/*
sudo systemctl start etcd
Issue 2: Cannot connect to etcd
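# Check if etcd is listening
sudo ss -tlnp | grep etcd
# Should see ports 2379 and 2380
# Check firewall
sudo firewall-cmd --list-all # CentOS/RHEL
sudo ufw status # Ubuntu
# Add firewall rules if needed (Ubuntu)
sudo ufw allow 2379/tcp
sudo ufw allow 2380/tcp
# Add firewall rules if needed (CentOS/RHEL)
sudo firewall-cmd --permanent --add-port=2379-2380/tcp
sudo firewall-cmd --reload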
Issue 3: Node won't join cluster
# Check ETCD_INITIAL_CLUSTER in config
grep INITIAL_CLUSTER /etc/etcd/etcd.conf
# Verify network connectivity
ping 10.0.1.11
telnet 10.0.1.11 2380
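telnet is not installed by default on many minimal server images. As an alternative, the sketch below tests TCP reachability with bash's built-in /dev/tcp redirection. `check_port` and `report_ports` are hypothetical helpers, and the 2-second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
# Relies on bash's /dev/tcp redirection and the coreutils `timeout` command.
check_port() {
  local host=$1 port=$2
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print the state of the client and peer ports for each host given.
report_ports() {
  local host port
  for host in "$@"; do
    for port in 2379 2380; do
      if check_port "$host" "$port"; then
        echo "${host}:${port} open"
      else
        echo "${host}:${port} unreachable"
      fi
    done
  done
}

# Usage (not run here, since it needs the lab network):
#   report_ports 10.0.1.11 10.0.1.12 10.0.1.13
```

A port reported as unreachable while etcd is listening on that node usually points to a firewall rule or a wrong listen address.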
Issue 4: Split-brain or multiple leaders
# Check cluster status
etcdctl endpoint status --cluster --write-out=table
# Raft allows at most one leader per term, so two leaders usually mean the nodes
# were bootstrapped as separate clusters. If that happens:
# 1. Stop all etcd instances
sudo systemctl stop etcd # On all nodes
# 2. Clear data on all nodes (WARNING: destroys all data stored in etcd)
sudo rm -rf /var/lib/etcd/*
# 3. Restart cluster (bootstrap again)
# Start all nodes within 30 seconds
9. Performance tuning
9.1. etcd tuning parameters
# Add to /etc/etcd/etcd.conf
# Heartbeat interval (default: 100ms)
ETCD_HEARTBEAT_INTERVAL="100"
# Election timeout (default: 1000ms)
ETCD_ELECTION_TIMEOUT="1000"
# Snapshot count (default in v3.5: 100000)
# Number of committed transactions that triggers a snapshot to disk; a lower value reduces memory use
ETCD_SNAPSHOT_COUNT="10000"
# Quota backend bytes (default: 2GB)
# Max database size
ETCD_QUOTA_BACKEND_BYTES="2147483648"
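The heartbeat interval and election timeout should be changed together. etcd is understood to reject a configuration whose election timeout is less than five times the heartbeat interval; treat that factor as an assumption to verify against the etcd documentation for the installed version. The sketch below checks a pair of values before they go into etcd.conf; `check_timeouts` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Check that the election timeout is at least 5x the heartbeat interval.
# Arguments: heartbeat interval (ms), election timeout (ms).
# The factor of 5 is an assumption about etcd's startup validation.
check_timeouts() {
  local heartbeat=$1 election=$2
  if (( election < 5 * heartbeat )); then
    echo "election timeout ${election}ms is below 5x heartbeat ${heartbeat}ms" >&2
    return 1
  fi
  echo "ok: election=${election}ms heartbeat=${heartbeat}ms ratio=$(( election / heartbeat ))x"
}

# The defaults from this section pass the check.
check_timeouts 100 1000
```

For example, raising the heartbeat to 300 ms on a slow network while leaving the election timeout at 1000 ms would fail this check.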
9.2. Monitoring etcd
Key metrics to monitor:
- Latency (99th percentile < 50ms)
- Disk fsync duration (< 10ms)
- Leader changes (should be rare)
- Database size
- Failed proposals
Check metrics:
curl http://10.0.1.11:2379/metrics
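# Key metrics:
# etcd_server_has_leader
# etcd_server_leader_changes_seen_total
# etcd_server_proposals_failed_total
# etcd_disk_wal_fsync_duration_seconds
# etcd_disk_backend_commit_duration_seconds
# etcd_mvcc_db_total_size_in_bytes
# etcd_network_peer_round_trip_time_seconds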
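The /metrics endpoint returns hundreds of lines, so a filter helps when checking a node by hand. The sketch below keeps only a chosen set of metric names (leader state, leader changes, backend commit duration, and peer round-trip time); `filter_key_metrics` is a hypothetical helper, and the sample values are made up for illustration.

```shell
#!/usr/bin/env bash
# Keep only selected metric lines from Prometheus text output on stdin.
# Comment lines (# HELP / # TYPE) are dropped because they start with '#'.
filter_key_metrics() {
  grep -E '^(etcd_server_has_leader|etcd_server_leader_changes_seen_total|etcd_disk_backend_commit_duration_seconds|etcd_network_peer_round_trip_time_seconds)'
}

# Demonstration with made-up sample lines.
filter_key_metrics <<'EOF'
# TYPE etcd_server_has_leader gauge
etcd_server_has_leader 1
etcd_server_leader_changes_seen_total 1
go_goroutines 120
EOF

# Usage against a live node (not run here):
#   curl -s http://10.0.1.11:2379/metrics | filter_key_metrics
```

Further metric names can be added to the alternation in the regular expression as needed.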
10. Summary
Key Takeaways
✅ etcd cluster: 3-node cluster for production HA
✅ Ports: 2379 (client), 2380 (peer)
✅ Raft consensus: Automatic leader election and data replication
✅ Quorum: 2 of 3 nodes must be up for the cluster to operate
✅ TTL keys: Used for Patroni leader locks
✅ etcdctl: CLI tool for management and troubleshooting
Post-lab checklist
- etcd cluster with 3 nodes is running
- etcdctl member list shows all 3 members
- etcdctl endpoint health --cluster reports every endpoint healthy
- There is 1 leader and 2 followers
- etcd service is enabled and will start automatically on reboot
- Firewall allows ports 2379 and 2380
Current architecture
✅ 3 VMs prepared (Lesson 4)
✅ PostgreSQL 15 installed (Lesson 5)
✅ etcd cluster running (Lesson 6)
Next: Install Patroni and bootstrap the HA cluster
Preparing for Lesson 7
The next lesson installs Patroni and integrates it with the etcd cluster set up here.