Lesson 6: Installing and configuring an etcd cluster

Download, install and configure a 3-node etcd cluster, create a systemd service, and check cluster health with etcdctl commands.

XDEV ASIA


Objectives

After this lesson, you will be able to:

  • Understand the role of etcd in the Patroni architecture
  • Download and install etcd on 3 nodes
  • Configure an etcd cluster with Raft consensus
  • Create a systemd service for etcd
  • Check the health of the etcd cluster
  • Use basic etcdctl commands

1. Introduction to etcd

1.1. What is etcd?

etcd is a distributed, reliable key-value store that uses the Raft consensus algorithm. It was originally developed by CoreOS and is now a CNCF (Cloud Native Computing Foundation) project.

Key features:

  • 🔐 Strongly consistent: Consistency is guaranteed by Raft
  • 🚀 Fast: Reads typically complete in under a millisecond
  • 🔄 Distributed: Runs as a multi-node cluster with a quorum
  • 📡 Watch mechanism: Real-time notifications of changes
  • 🔒 TTL support: Automatic key expiration via leases (used for leader locks)
  • 🌐 gRPC + HTTP API: Easy integration

1.2. etcd in the Patroni architecture

┌──────────────────────────────────┐
│      etcd Cluster (3 nodes)      │
│  ┌─────┐   ┌─────┐   ┌─────┐     │
│  │etcd1│───│etcd2│───│etcd3│     │
│  └──┬──┘   └──┬──┘   └──┬──┘     │
│     │         │         │        │
│     └─────────┴─────────┘        │
│        Raft Consensus            │
└────┬───────────┬───────────┬─────┘
     │           │           │
     ▼           ▼           ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│Patroni 1│ │Patroni 2│ │Patroni 3│
└─────────┘ └─────────┘ └─────────┘

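Patroni stores the following keys in etcd (paths assume the default /service namespace and a Patroni cluster scope named postgres):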

  • /service/postgres/leader: Leader lock (TTL 30s)
  • /service/postgres/members/: Node information
  • /service/postgres/config: Cluster configuration
  • /service/postgres/initialize: Bootstrap state
  • /service/postgres/failover: Failover instructions

2. Downloading and installing etcd

2.1. Architecture considerations

Cluster size recommendations:

  • 3 nodes: Recommended for production; tolerates 1 node failure
  • 5 nodes: Higher availability; tolerates 2 node failures
  • 7+ nodes: Overkill for most use cases
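The failure counts above follow from Raft's majority rule: an n-member cluster needs floor(n/2) + 1 members (a quorum) to commit a write, so it can lose n minus that quorum and keep working. A minimal bash sketch of the arithmetic; quorum and tolerance are illustrative helper names, not etcd commands:

```shell
#!/usr/bin/env bash
# Illustrative helpers for Raft majority arithmetic (not part of etcd).

# Quorum: the majority of members that must agree before a write commits.
quorum() {
  local n=$1
  echo $(( n / 2 + 1 ))
}

# Number of members that can fail while a quorum still remains.
tolerance() {
  local n=$1
  echo $(( n - $(quorum "$n") ))
}

# Print the table for a few common cluster sizes.
for n in 1 2 3 4 5 7; do
  printf 'members=%d quorum=%d tolerated_failures=%d\n' \
    "$n" "$(quorum "$n")" "$(tolerance "$n")"
done
```

Running it shows that a 4-member cluster tolerates no more failures than a 3-member one (1 in both cases), which is why odd cluster sizes are the usual choice.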

Deployment topology:

Option 1: etcd on separate servers (Recommended)
┌──────────┐  ┌──────────┐  ┌──────────┐
│  etcd1   │  │  etcd2   │  │  etcd3   │
└──────────┘  └──────────┘  └──────────┘
      ▲             ▲             ▲
      ├─────────────┼─────────────┤
      │             │             │
┌──────────┐  ┌──────────┐  ┌──────────┐
│Patroni 1 │  │Patroni 2 │  │Patroni 3 │
│  + PG    │  │  + PG    │  │  + PG    │
└──────────┘  └──────────┘  └──────────┘

Option 2: etcd co-located (For labs/dev)
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   etcd1      │  │   etcd2      │  │   etcd3      │
│   Patroni 1  │  │   Patroni 2  │  │   Patroni 3  │
│   PG         │  │   PG         │  │   PG         │
└──────────────┘  └──────────────┘  └──────────────┘

This lab uses Option 2 (co-located) to save resources.

2.2. Installing etcd on Ubuntu/Debian

Run these steps on ALL 3 nodes.

Step 1: Download the etcd binary

# Set version
ETCD_VER=v3.5.11

# Download
wget https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz

# Extract
tar xzvf etcd-${ETCD_VER}-linux-amd64.tar.gz

# Move binaries to PATH
sudo mv etcd-${ETCD_VER}-linux-amd64/etcd /usr/local/bin/
sudo mv etcd-${ETCD_VER}-linux-amd64/etcdctl /usr/local/bin/
sudo mv etcd-${ETCD_VER}-linux-amd64/etcdutl /usr/local/bin/

# Verify
etcd --version
etcdctl version

Output:

etcd Version: 3.5.11
Git SHA: ...
Go Version: go1.20.12

Step 2: Create the etcd user and directories

# Create a system user with no login shell
sudo useradd -r -s /bin/false etcd

# Create directories
sudo mkdir -p /var/lib/etcd
sudo mkdir -p /etc/etcd

# Set ownership
sudo chown -R etcd:etcd /var/lib/etcd
sudo chown -R etcd:etcd /etc/etcd

2.3. Installing on CentOS/RHEL

# Download (same as Ubuntu)
ETCD_VER=v3.5.11
wget https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz

tar xzvf etcd-${ETCD_VER}-linux-amd64.tar.gz

sudo mv etcd-${ETCD_VER}-linux-amd64/etcd* /usr/local/bin/

# Create user and directories
sudo useradd -r -s /sbin/nologin etcd
sudo mkdir -p /var/lib/etcd /etc/etcd
sudo chown -R etcd:etcd /var/lib/etcd /etc/etcd

3. Configuring a 3-node etcd cluster

3.1. Network topology

node1 (etcd1): 10.0.1.11:2379,2380
node2 (etcd2): 10.0.1.12:2379,2380
node3 (etcd3): 10.0.1.13:2379,2380

Port 2379: Client communication (Patroni connects here)
Port 2380: Peer communication (etcd cluster internal)

3.2. Create the configuration files

Node 1 (10.0.1.11) - /etc/etcd/etcd.conf

# Member name
ETCD_NAME="etcd1"

# Data directory
ETCD_DATA_DIR="/var/lib/etcd/etcd1.etcd"

# Listen URLs
ETCD_LISTEN_PEER_URLS="http://10.0.1.11:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.0.1.11:2379,http://127.0.0.1:2379"

# Advertise URLs (what other nodes use to connect)
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.1.11:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.1.11:2379"

# Cluster configuration
ETCD_INITIAL_CLUSTER="etcd1=http://10.0.1.11:2380,etcd2=http://10.0.1.12:2380,etcd3=http://10.0.1.13:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-patroni"

# Logging
ETCD_LOG_LEVEL="info"

Node 2 (10.0.1.12) - /etc/etcd/etcd.conf

ETCD_NAME="etcd2"
ETCD_DATA_DIR="/var/lib/etcd/etcd2.etcd"

ETCD_LISTEN_PEER_URLS="http://10.0.1.12:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.0.1.12:2379,http://127.0.0.1:2379"

ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.1.12:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.1.12:2379"

ETCD_INITIAL_CLUSTER="etcd1=http://10.0.1.11:2380,etcd2=http://10.0.1.12:2380,etcd3=http://10.0.1.13:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-patroni"

ETCD_LOG_LEVEL="info"

Node 3 (10.0.1.13) - /etc/etcd/etcd.conf

ETCD_NAME="etcd3"
ETCD_DATA_DIR="/var/lib/etcd/etcd3.etcd"

ETCD_LISTEN_PEER_URLS="http://10.0.1.13:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.0.1.13:2379,http://127.0.0.1:2379"

ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.0.1.13:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.0.1.13:2379"

ETCD_INITIAL_CLUSTER="etcd1=http://10.0.1.11:2380,etcd2=http://10.0.1.12:2380,etcd3=http://10.0.1.13:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-patroni"

ETCD_LOG_LEVEL="info"
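The three files differ only in the member name and IP address. To avoid copy-paste mistakes, one option is to render each file from a single member table. A bash 4+ sketch; render_etcd_conf, initial_cluster and the MEMBERS array are hypothetical names, and the values simply mirror the lab topology:

```shell
#!/usr/bin/env bash
# Sketch: print /etc/etcd/etcd.conf for one member of the lab cluster.
# Hypothetical member table: name -> IP (same values as the configs above).
declare -A MEMBERS=( [etcd1]=10.0.1.11 [etcd2]=10.0.1.12 [etcd3]=10.0.1.13 )

# Builds "etcd1=http://10.0.1.11:2380,..." in sorted member order.
initial_cluster() {
  local name out=""
  for name in $(printf '%s\n' "${!MEMBERS[@]}" | sort); do
    out+="${out:+,}${name}=http://${MEMBERS[$name]}:2380"
  done
  echo "$out"
}

# Prints the env-file contents for the member named in $1.
render_etcd_conf() {
  local name=$1
  local ip=${MEMBERS[$1]:?unknown member: $1}
  cat <<EOF
ETCD_NAME="${name}"
ETCD_DATA_DIR="/var/lib/etcd/${name}.etcd"
ETCD_LISTEN_PEER_URLS="http://${ip}:2380"
ETCD_LISTEN_CLIENT_URLS="http://${ip}:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://${ip}:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://${ip}:2379"
ETCD_INITIAL_CLUSTER="$(initial_cluster)"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-patroni"
ETCD_LOG_LEVEL="info"
EOF
}

# Example: preview node2's file.
render_etcd_conf etcd2
```

On node1 you could then install the file with render_etcd_conf etcd1 | sudo tee /etc/etcd/etcd.conf. Generating all three files from one table keeps ETCD_INITIAL_CLUSTER identical on every node.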

3.3. Parameter reference

Parameter                          Meaning
ETCD_NAME                          Unique name of this member within the cluster
ETCD_DATA_DIR                      Directory where etcd stores its data
ETCD_LISTEN_PEER_URLS              URL(s) to listen on for peer traffic (port 2380)
ETCD_LISTEN_CLIENT_URLS            URL(s) to listen on for client connections (port 2379)
ETCD_INITIAL_ADVERTISE_PEER_URLS   Peer URL that other members use to reach this member
ETCD_ADVERTISE_CLIENT_URLS         Client URL advertised to clients
ETCD_INITIAL_CLUSTER               List of all members at bootstrap time
ETCD_INITIAL_CLUSTER_STATE         new (initial bootstrap) or existing (adding a member to a running cluster)
ETCD_INITIAL_CLUSTER_TOKEN         Unique token for this cluster during bootstrap (keeps separate clusters from being mixed up)
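Two of these parameters must agree with each other: the pair ETCD_NAME=ETCD_INITIAL_ADVERTISE_PEER_URLS has to appear verbatim in ETCD_INITIAL_CLUSTER, otherwise the member cannot find itself in the bootstrap list. A sketch of a pre-start check, assuming a simple KEY="value" env file with a single peer URL per member; check_etcd_conf is a hypothetical helper, not an etcd tool:

```shell
#!/usr/bin/env bash
# Sketch: sanity-check an etcd env file before the first start.
# check_etcd_conf is a hypothetical helper (not shipped with etcd).

check_etcd_conf() {
  local file=$1
  (
    # Load the ETCD_* variables in a subshell so they do not leak.
    # shellcheck disable=SC1090
    . "$file"
    local ok=0
    # This member's own name=peerURL pair must be in the initial cluster list.
    case ",${ETCD_INITIAL_CLUSTER}," in
      *",${ETCD_NAME}=${ETCD_INITIAL_ADVERTISE_PEER_URLS},"*) ;;
      *) echo "ETCD_INITIAL_CLUSTER lacks ${ETCD_NAME}=${ETCD_INITIAL_ADVERTISE_PEER_URLS}" >&2
         ok=1 ;;
    esac
    # Only "new" and "existing" are valid cluster states.
    case "${ETCD_INITIAL_CLUSTER_STATE}" in
      new|existing) ;;
      *) echo "invalid ETCD_INITIAL_CLUSTER_STATE: ${ETCD_INITIAL_CLUSTER_STATE}" >&2
         ok=1 ;;
    esac
    exit "$ok"
  )
}
```

Usage: check_etcd_conf /etc/etcd/etcd.conf && echo "config looks consistent"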

4. Creating the systemd service

Create /etc/systemd/system/etcd.service on ALL 3 nodes:

[Unit]
Description=etcd distributed reliable key-value store
Documentation=https://etcd.io/docs/
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
User=etcd
Group=etcd

# Load environment variables from config file
EnvironmentFile=/etc/etcd/etcd.conf

# Start etcd with config
ExecStart=/usr/local/bin/etcd

# Restart on failure
Restart=on-failure
RestartSec=5

# Limits
LimitNOFILE=65536
LimitNPROC=65536

# Security
NoNewPrivileges=true
ProtectHome=true
ProtectSystem=strict
ReadWritePaths=/var/lib/etcd

[Install]
WantedBy=multi-user.target

Reload systemd and enable the service:

sudo systemctl daemon-reload
sudo systemctl enable etcd

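5. Starting the etcd cluster

5.1. Start etcd on each node

Important: Start etcd on all nodes at the same time, or within about 30 seconds of each other, so the cluster can form. Because the unit uses Type=notify, etcd only reports itself ready to systemd once the cluster has a quorum, so systemctl start on the first node may appear to hang until a second node is started.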

Terminal 1 (node1):

sudo systemctl start etcd
sudo systemctl status etcd

Terminal 2 (node2):

sudo systemctl start etcd
sudo systemctl status etcd

Terminal 3 (node3):

sudo systemctl start etcd
sudo systemctl status etcd

5.2. Check the logs

sudo journalctl -u etcd -f

Successful startup logs:

... etcd1 became leader at term 2
... established a TCP streaming connection with peer etcd2
... established a TCP streaming connection with peer etcd3
... ready to serve client requests

6. Checking etcd cluster health

6.1. Check cluster members

# From any node
etcdctl member list

# Output:
# 8e9e05c52164694d, started, etcd1, http://10.0.1.11:2380, http://10.0.1.11:2379, false
# 91bc3c398fb3c146, started, etcd2, http://10.0.1.12:2380, http://10.0.1.12:2379, false
# fd422379fda50e48, started, etcd3, http://10.0.1.13:2380, http://10.0.1.13:2379, false

6.2. Check cluster health

etcdctl endpoint health --cluster

# Output:
# http://10.0.1.11:2379 is healthy: successfully committed proposal: took = 2.345678ms
# http://10.0.1.12:2379 is healthy: successfully committed proposal: took = 1.234567ms
# http://10.0.1.13:2379 is healthy: successfully committed proposal: took = 2.123456ms

6.3. Check endpoint status

etcdctl endpoint status --cluster --write-out=table

# Output:
# +-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# |       ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
# +-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | http://10.0.1.11:2379 | 8e9e05c52164694d |  3.5.11 |   20 kB |      true |      false |         2 |          8 |                  8 |        |
# | http://10.0.1.12:2379 | 91bc3c398fb3c146 |  3.5.11 |   20 kB |     false |      false |         2 |          8 |                  8 |        |
# | http://10.0.1.13:2379 | fd422379fda50e48 |  3.5.11 |   20 kB |     false |      false |         2 |          8 |                  8 |        |
# +-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

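How to read the output:

  • IS LEADER: etcd1 is currently the leader
  • RAFT TERM: Election term (incremented with every new election)
  • RAFT INDEX: Index of the latest entry in the Raft log; matching values across members mean they are caught up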
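To turn the IS LEADER column into a quick pass/fail check (a healthy cluster shows exactly one leader), the table can be parsed with awk. A minimal sketch; count_leaders is a hypothetical helper, and it assumes the column order shown above, where IS LEADER is the fifth column (the sixth field when splitting on "|"):

```shell
#!/usr/bin/env bash
# Sketch: count members reporting IS LEADER = true in the output of
#   etcdctl endpoint status --cluster --write-out=table
# count_leaders is a hypothetical helper; it reads the table on stdin.

count_leaders() {
  awk -F'|' '
    # Data rows start with "|"; skip "+---" borders and the header row.
    /^[|]/ && $2 !~ /ENDPOINT/ {
      gsub(/ /, "", $6)            # field 6 holds the IS LEADER value
      if ($6 == "true") n++
    }
    END { print n + 0 }
  '
}

# Usage against a running cluster (expect 1):
#   etcdctl endpoint status --cluster --write-out=table | count_leaders
```

If a member is down, etcdctl reports an error for that endpoint instead of a table row, so the count only covers reachable members.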

7. Basic etcdctl commands

7.1. Set environment (optional)

export ETCDCTL_API=3   # v3 API (already the default since etcd 3.4)
export ETCDCTL_ENDPOINTS=http://10.0.1.11:2379,http://10.0.1.12:2379,http://10.0.1.13:2379

# Append to ~/.bashrc so the settings persist across sessions
echo 'export ETCDCTL_API=3' >> ~/.bashrc
echo 'export ETCDCTL_ENDPOINTS=http://10.0.1.11:2379,http://10.0.1.12:2379,http://10.0.1.13:2379' >> ~/.bashrc
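Running the two echo lines more than once appends duplicate exports to ~/.bashrc. A sketch of an idempotent alternative; append_once is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: append a line to a file only if that exact line is not already
# present, so re-running the setup does not duplicate exports.
# append_once is a hypothetical helper name.

append_once() {
  local line=$1 file=$2
  # -x: whole-line match, -F: fixed string (not a regex), -q: quiet
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

For example: append_once 'export ETCDCTL_API=3' ~/.bashrc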

7.2. Basic operations

Put/Get/Delete keys

# Write a key
etcdctl put /test/key1 "Hello etcd"

# Read a key
etcdctl get /test/key1
# Output:
# /test/key1
# Hello etcd

# Get with details
etcdctl get /test/key1 --write-out=json

# Delete a key
etcdctl del /test/key1

List keys with prefix

# Put some test keys
etcdctl put /service/postgres/test1 "value1"
etcdctl put /service/postgres/test2 "value2"

# List all keys under /service/postgres/
etcdctl get /service/postgres/ --prefix

# Output:
# /service/postgres/test1
# value1
# /service/postgres/test2
# value2

Watch for changes

# Terminal 1: Watch for changes
etcdctl watch /service/postgres/ --prefix

# Terminal 2: Make changes
etcdctl put /service/postgres/leader "node1"

# Terminal 1 will print:
# PUT
# /service/postgres/leader
# node1

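TTL keys (used for leader locks)

# Create a lease with a 30-second TTL
etcdctl lease grant 30
# Output (etcdctl prints the lease ID in hexadecimal):
# lease 32695410dcc0ca06 granted with TTL(30s)

# Attach the key to the lease
etcdctl put /test/ttl-key "value" --lease=32695410dcc0ca06

# The key is deleted automatically when the lease expires,
# i.e. after 30 seconds unless the lease is renewed

# Keep the lease alive (runs in the foreground and renews it until interrupted)
etcdctl lease keep-alive 32695410dcc0ca06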
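Copying the lease ID by hand is error-prone. A sketch that extracts it from the lease grant output with awk, assuming the output format shown above (lease <id> granted with TTL(...)); parse_lease_id is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: capture the lease ID printed by "etcdctl lease grant".
# parse_lease_id is a hypothetical helper.

# Reads a line such as "lease 32695410dcc0ca06 granted with TTL(30s)"
# and prints the second field (the hex lease ID).
parse_lease_id() {
  awk '$1 == "lease" && $3 == "granted" { print $2 }'
}

# Usage against a running cluster:
#   LEASE_ID=$(etcdctl lease grant 30 | parse_lease_id)
#   etcdctl put /test/ttl-key "value" --lease="$LEASE_ID"
```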

7.3. Advanced operations

Transaction (atomic operations)

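# Atomic compare-and-swap
# Non-interactive format read from stdin: compare lines, a blank line,
# success requests, a blank line, failure requests, then a final blank line
etcdctl txn <<< 'value("/test/key1") = "old_value"

put /test/key1 "new_value"

get /test/key1

'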

Snapshot backup

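# Create snapshot (snapshot save accepts exactly one endpoint, so target the
# local member rather than the 3-endpoint ETCDCTL_ENDPOINTS list)
etcdctl --endpoints=http://127.0.0.1:2379 snapshot save /tmp/etcd-backup.db

# Verify snapshot (in etcd 3.5, "etcdctl snapshot status" is deprecated in favour of etcdutl)
etcdutl snapshot status /tmp/etcd-backup.db --write-out=table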

8. Lab: Set up a complete etcd cluster

8.1. Lab objectives

  • ✅ Install etcd on 3 nodes
  • ✅ Configure the cluster
  • ✅ Verify cluster health
  • ✅ Test basic operations
  • ✅ Simulate node failure

8.2. Step-by-step lab guide

1. Install etcd on all nodes

Done in Section 2.

2. Create the config files

Done in Section 3.

3. Create the systemd service

Done in Section 4.

4. Start cluster

# On all 3 nodes (at the same time)
sudo systemctl start etcd

# Check status
sudo systemctl status etcd

5. Verify cluster

# Member list
etcdctl member list

# Health check
etcdctl endpoint health --cluster

# Status
etcdctl endpoint status --cluster --write-out=table

6. Test write/read

# On node1: Write
etcdctl put /test/mykey "Hello from etcd cluster"

# On node2: Read
etcdctl get /test/mykey
# Should see: Hello from etcd cluster

# On node3: Verify
etcdctl get /test/mykey
# Should see: Hello from etcd cluster

7. Test leader election

# Identify current leader
etcdctl endpoint status --cluster --write-out=table
# Note which node IS LEADER = true

# Stop leader node
sudo systemctl stop etcd  # On leader node

# Wait 5-10 seconds

# Check from another node
etcdctl endpoint status --cluster --write-out=table
# New leader should be elected

# Restart stopped node
sudo systemctl start etcd  # On stopped node

# Verify rejoined
etcdctl member list

8. Test data persistence

# Write some data
etcdctl put /persistent/key "This should survive restart"

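# Restart ALL nodes one at a time (wait until each node is healthy again
# before restarting the next, so the cluster never loses quorum)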
sudo systemctl restart etcd

# Verify data
etcdctl get /persistent/key
# Should still see: This should survive restart

8.3. Troubleshooting common issues

Issue 1: Cluster won't form

# Symptom
journalctl -u etcd -n 50
# Error: "request cluster ID mismatch"

# Solution (only during the initial bootstrap: this deletes all etcd data on the node):
# clear the data directory and restart
sudo systemctl stop etcd
sudo rm -rf /var/lib/etcd/*
sudo systemctl start etcd

Issue 2: Cannot connect to etcd

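# Check if etcd is listening (ss replaces netstat on systems without net-tools)
sudo ss -tlnp | grep etcd
# Should see ports 2379 and 2380

# Check firewall
sudo firewall-cmd --list-all  # CentOS/RHEL
sudo ufw status                # Ubuntu

# Add firewall rules if needed
sudo ufw allow 2379/tcp        # Ubuntu
sudo ufw allow 2380/tcp
sudo firewall-cmd --permanent --add-port=2379-2380/tcp   # CentOS/RHEL
sudo firewall-cmd --reload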

Issue 3: Node won't join cluster

# Check ETCD_INITIAL_CLUSTER in config
cat /etc/etcd/etcd.conf | grep INITIAL_CLUSTER

# Verify network connectivity
ping 10.0.1.11
telnet 10.0.1.11 2380
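telnet is often missing on minimal servers. As an alternative, bash can open a TCP connection through its /dev/tcp pseudo-path. A sketch, assuming bash and the coreutils timeout command are available; check_port is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: TCP reachability check without telnet.
# check_port is a hypothetical helper.

check_port() {
  local host=$1 port=$2
  # Redirecting to /dev/tcp/HOST/PORT makes bash attempt a TCP connect;
  # timeout aborts after 3 seconds if packets are silently dropped.
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} reachable"
  else
    echo "${host}:${port} NOT reachable" >&2
    return 1
  fi
}

# Usage: check the client and peer ports of every lab member
#   for ip in 10.0.1.11 10.0.1.12 10.0.1.13; do
#     check_port "$ip" 2379; check_port "$ip" 2380
#   done
```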

Issue 4: Split-brain or multiple leaders

# Check cluster status
etcdctl endpoint status --cluster --write-out=table

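# Raft allows only one leader per term, so a second IS LEADER = true
# should disappear within an election timeout.
# If multiple leaders persist (shouldn't happen with proper setup):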
# 1. Stop all etcd instances
sudo systemctl stop etcd  # On all nodes

# 2. Clear data on all nodes
sudo rm -rf /var/lib/etcd/*

# 3. Restart cluster (bootstrap again)
# Start all nodes within 30 seconds

9. Performance tuning

9.1. etcd tuning parameters

# Add to /etc/etcd/etcd.conf

# Heartbeat interval in milliseconds (default: 100)
ETCD_HEARTBEAT_INTERVAL="100"

# Election timeout in milliseconds (default: 1000)
ETCD_ELECTION_TIMEOUT="1000"

# Snapshot count (default: 100000 since etcd 3.2)
# Number of committed transactions that triggers a snapshot to disk
ETCD_SNAPSHOT_COUNT="100000"

# Quota backend bytes (default: 2 GiB)
# Max database size; beyond it etcd raises a NOSPACE alarm and rejects writes
ETCD_QUOTA_BACKEND_BYTES="2147483648"
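The heartbeat interval and election timeout are linked: as far as we know, etcd refuses to start if the election timeout is less than five times the heartbeat interval, and the quota is given in raw bytes. A small bash sketch that checks the ratio and computes the byte value; check_tuning and quota_bytes are hypothetical helpers, and the 50000 ms upper bound is an assumption based on etcd's startup validation:

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the tuning values before restarting etcd.
# check_tuning and quota_bytes are hypothetical helpers; the 5x rule and
# the 50000 ms cap reflect etcd's startup validation to our understanding.

check_tuning() {
  local heartbeat_ms=$1 election_ms=$2
  if (( election_ms < 5 * heartbeat_ms )); then
    echo "election timeout ${election_ms}ms < 5 x heartbeat ${heartbeat_ms}ms" >&2
    return 1
  fi
  if (( election_ms > 50000 )); then
    echo "election timeout ${election_ms}ms exceeds 50000ms" >&2
    return 1
  fi
}

# ETCD_QUOTA_BACKEND_BYTES is a byte count: N GiB = N * 1024^3 bytes.
quota_bytes() {
  local gib=$1
  echo $(( gib * 1024 * 1024 * 1024 ))
}

check_tuning 100 1000 && echo "heartbeat/election ratio OK"
echo "ETCD_QUOTA_BACKEND_BYTES=\"$(quota_bytes 2)\""
```

The etcd tuning guide suggests a heartbeat interval close to the round-trip time between members and an election timeout of at least ten times that round-trip time; the 5x check above is only the hard floor.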

9.2. Monitoring etcd

Key metrics to monitor:

  • Latency (99th percentile < 50ms)
  • Disk WAL fsync duration (99th percentile < 10ms)
  • Leader changes (should be rare)
  • Database size
  • Failed proposals

Check metrics:

curl http://10.0.1.11:2379/metrics

# Key metrics:
# etcd_server_has_leader
# etcd_server_leader_changes_seen_total
# etcd_disk_wal_fsync_duration_seconds
# etcd_disk_backend_commit_duration_seconds
# etcd_network_peer_round_trip_time_seconds
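Rather than scanning the whole /metrics page, a single gauge can be checked: etcd_server_has_leader is 1 when the member currently sees a leader and 0 otherwise. A sketch that reads Prometheus text-format metrics on stdin; metrics_has_leader is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: report whether this member sees a leader, based on the
# etcd_server_has_leader gauge in Prometheus text format (read on stdin).
# metrics_has_leader is a hypothetical helper.

metrics_has_leader() {
  awk '
    # "# HELP" and "# TYPE" lines start with "#", so they never match.
    $1 == "etcd_server_has_leader" { v = $2 }
    END {
      if (v == 1) { print "has_leader=1"; exit 0 }
      print "has_leader=" (v == "" ? "missing" : v)
      exit 1
    }'
}

# Usage against a running member:
#   curl -s http://127.0.0.1:2379/metrics | metrics_has_leader
```

The non-zero exit status makes the function usable in cron jobs or simple health scripts.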

10. Summary

Key Takeaways

✅ etcd cluster: A 3-node cluster for production HA

✅ Ports: 2379 (client), 2380 (peer)

✅ Raft consensus: Automatic leader election and data replication

✅ Quorum: At least 2 of the 3 nodes must be up for the cluster to operate

✅ TTL keys: Used for Patroni leader locks

✅ etcdctl: CLI tool for managing and troubleshooting etcd

Post-lab checklist

  • The 3-node etcd cluster is running
  • etcdctl member list shows all 3 members
  • etcdctl endpoint health --cluster reports every endpoint as healthy
  • There is 1 leader and 2 followers
  • The etcd service is enabled and starts automatically on reboot
  • The firewall allows ports 2379 and 2380

Current architecture

✅ 3 VMs prepared (Lesson 4)
✅ PostgreSQL 15 installed (Lesson 5)
✅ etcd cluster running (Lesson 6)

Next: Install Patroni and bootstrap the HA cluster

Preparing for Lesson 7

The next lesson installs Patroni and integrates it with the etcd cluster set up in this lesson.

