Lesson 4: Infrastructure Preparation

A detailed guide to hardware requirements, network/firewall configuration, setting up 3 VMs/servers, and time synchronization for an HA cluster.


Objectives

After this lesson, you will be able to:

  • Understand the hardware and software requirements for a Patroni cluster
  • Configure the network and firewall
  • Set up 3 VMs/servers (VirtualBox/VMware/Cloud)
  • Set up SSH key-based authentication
  • Synchronize time with NTP/chrony

1. Hardware & Software Requirements

Lab Architecture

We will set up a cluster of 3 nodes:

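In text form, the lab topology (addresses follow the IP plan in section 2):

                    Clients
                       │
                 ┌──────▼──────┐
                 │   HAProxy   │  10.0.1.10 (optional)
                 └──────┬──────┘
       ┌───────────────┼───────────────┐
┌──────▼─────┐  ┌──────▼─────┐  ┌──────▼─────┐
│  pg-node1  │  │  pg-node2  │  │  pg-node3  │
│ 10.0.1.11  │  │ 10.0.1.12  │  │ 10.0.1.13  │
│ PostgreSQL │  │ PostgreSQL │  │ PostgreSQL │
│ + Patroni  │  │ + Patroni  │  │ + Patroni  │
│ + etcd     │  │ + etcd     │  │ + etcd     │
└────────────┘  └────────────┘  └────────────┘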

Hardware Requirements (per node)

Minimum (Lab/Dev):

  • CPU: 2 cores
  • RAM: 4 GB
  • Disk: 20 GB (OS) + 20 GB (PostgreSQL data)
  • Network: 1 Gbps

Recommended (Production):

  • CPU: 4-8 cores
  • RAM: 8-32 GB (depends on workload)
  • Disk:
    • OS: 50 GB SSD
    • PostgreSQL data: 100+ GB NVMe SSD
    • WAL: Separate disk (optional, for performance)
  • Network: 10 Gbps, redundant NICs

Storage recommendations:

/dev/sda  → OS (Ubuntu 22.04)
/dev/sdb  → PostgreSQL data (/var/lib/postgresql)
/dev/sdc  → WAL files (/var/lib/postgresql/pg_wal) [optional]
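A minimal sketch for preparing the data disk under this layout, assuming /dev/sdb as shown (confirm your device names with lsblk first):

# Format and mount the PostgreSQL data disk (adjust /dev/sdb to your system)
lsblk
sudo mkfs.ext4 -L pgdata /dev/sdb
sudo mkdir -p /var/lib/postgresql
echo 'LABEL=pgdata /var/lib/postgresql ext4 defaults,noatime 0 2' | sudo tee -a /etc/fstab
sudo mount -a && df -h /var/lib/postgresql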

Software Requirements

Operating System:

  • Ubuntu 22.04 LTS (recommended)
  • Rocky Linux 9 / AlmaLinux 9
  • Debian 12

Software Stack:

Component            Version   Purpose
─────────────────────────────────────────────────
PostgreSQL           18.x      Database
Patroni              3.x       HA orchestration
etcd                 3.5.x     DCS
Python               3.9+      Patroni runtime
HAProxy (optional)   2.8+      Load balancer
PgBouncer (optional) 1.21+     Connection pooler

Network Requirements

Latency:

  • Between PostgreSQL nodes: < 10ms (same datacenter)
  • Between etcd nodes: < 5ms (critical!)
  • Client to database: < 50ms

Bandwidth:

  • Replication: Depends on write load (a rough way to measure it is sketched below)
  • etcd: Low bandwidth, but low latency is critical
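The replication bandwidth you need tracks how fast the primary generates WAL. Once PostgreSQL is installed (Lesson 5), a minimal sketch to measure it over one minute, using the standard pg_current_wal_lsn()/pg_wal_lsn_diff() functions:

# Run on the primary under a typical write load
START=$(sudo -u postgres psql -At -c "SELECT pg_current_wal_lsn()")
sleep 60
END=$(sudo -u postgres psql -At -c "SELECT pg_current_wal_lsn()")
sudo -u postgres psql -At -c \
  "SELECT pg_size_pretty(pg_wal_lsn_diff('$END', '$START')) || ' of WAL per minute'"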

Ports to open:

Service            Port   Protocol   Purpose
──────────────────────────────────────────────────────────
PostgreSQL         5432   TCP        Database connections
Patroni REST API   8008   TCP        Health checks, management
etcd client        2379   TCP        Client-to-etcd communication
etcd peer          2380   TCP        etcd cluster communication
SSH                22     TCP        Remote administration
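Once the corresponding services are installed and running (later lessons), a quick way to confirm each node listens on the expected ports:

# SSH (22) should already be listening; the rest appear as services come up
sudo ss -tlnp | grep -E ':(22|5432|8008|2379|2380)\b'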

2. Network and Firewall Configuration

IP Planning

Node assignments:

Hostname    IP Address    Role
─────────────────────────────────────
pg-node1    10.0.1.11     PostgreSQL + Patroni + etcd
pg-node2    10.0.1.12     PostgreSQL + Patroni + etcd
pg-node3    10.0.1.13     PostgreSQL + Patroni + etcd

Optional components:

haproxy     10.0.1.10     Load balancer (VIP)
monitoring  10.0.1.20     Prometheus + Grafana

Hostname Configuration

On each node:

# Set hostname
sudo hostnamectl set-hostname pg-node1  # Change for each node

# Edit /etc/hosts
sudo tee -a /etc/hosts << EOF
10.0.1.11   pg-node1
10.0.1.12   pg-node2
10.0.1.13   pg-node3
EOF

# Verify
hostname -f
ping -c 2 pg-node2
ping -c 2 pg-node3

Firewall Configuration (UFW)

On Ubuntu:

# Allow SSH first so enabling the firewall does not cut off your session
sudo ufw allow 22/tcp

# Enable UFW
sudo ufw enable

# PostgreSQL
sudo ufw allow from 10.0.1.0/24 to any port 5432

# Patroni REST API
sudo ufw allow from 10.0.1.0/24 to any port 8008

# etcd client port
sudo ufw allow from 10.0.1.0/24 to any port 2379

# etcd peer port
sudo ufw allow from 10.0.1.0/24 to any port 2380

# Verify rules
sudo ufw status numbered

Expected output:

Status: active

     To                         Action      From
     --                         ------      ----
[ 1] 22/tcp                     ALLOW IN    Anywhere
[ 2] 5432                       ALLOW IN    10.0.1.0/24
[ 3] 8008                       ALLOW IN    10.0.1.0/24
[ 4] 2379                       ALLOW IN    10.0.1.0/24
[ 5] 2380                       ALLOW IN    10.0.1.0/24

Firewall Configuration (firewalld)

On Rocky Linux / AlmaLinux:

# Enable firewalld
sudo systemctl enable --now firewalld

# Add ports (5432 is opened only to the local subnet via the rich rule below)
sudo firewall-cmd --permanent --add-port=8008/tcp
sudo firewall-cmd --permanent --add-port=2379/tcp
sudo firewall-cmd --permanent --add-port=2380/tcp

# Allow from specific subnet
sudo firewall-cmd --permanent --add-rich-rule='
  rule family="ipv4"
  source address="10.0.1.0/24"
  port protocol="tcp" port="5432" accept'

# Reload
sudo firewall-cmd --reload

# Verify
sudo firewall-cmd --list-all

Network Performance Testing

Test latency between nodes:

# Install tools
sudo apt install -y iputils-ping netcat-openbsd iperf3

# Test ping latency
ping -c 10 pg-node2
# Expected: < 1ms same datacenter, < 10ms same region

# Test TCP connectivity
nc -zv pg-node2 5432
nc -zv pg-node2 2379

# Test bandwidth (on receiver node2)
iperf3 -s

# From sender node1
iperf3 -c pg-node2 -t 10
# Expected: > 500 Mbps on 1Gbps network

3. Setting Up 3 VMs/Servers

Option 1: VirtualBox (Local Development)

Create VM template:

# Download Ubuntu 22.04 ISO
wget https://releases.ubuntu.com/22.04/ubuntu-22.04.3-live-server-amd64.iso

# VirtualBox CLI
VBoxManage createvm --name "pg-node1" --ostype Ubuntu_64 --register

VBoxManage modifyvm "pg-node1" \
  --memory 4096 \
  --cpus 2 \
  --nic1 bridged \
  --bridgeadapter1 en0 \
  --boot1 disk

VBoxManage createhd --filename ~/VirtualBox\ VMs/pg-node1/pg-node1.vdi --size 40960

VBoxManage storagectl "pg-node1" --name "SATA Controller" --add sata --controller IntelAHCI
VBoxManage storageattach "pg-node1" --storagectl "SATA Controller" --port 0 --device 0 \
  --type hdd --medium ~/VirtualBox\ VMs/pg-node1/pg-node1.vdi

# Install OS, then clone for other nodes
VBoxManage clonevm "pg-node1" --name "pg-node2" --register
VBoxManage clonevm "pg-node1" --name "pg-node3" --register

Configure network:

# Edit /etc/netplan/00-installer-config.yaml
network:
  ethernets:
    enp0s3:
      addresses:
        - 10.0.1.11/24
      routes:
        - to: default
          via: 10.0.1.1
      nameservers:
        addresses: [8.8.8.8, 8.8.4.4]
  version: 2

# Apply (prefer 'sudo netplan try' when connected over SSH;
# it rolls back automatically if the new config cuts you off)
sudo netplan apply

Option 2: VMware Workstation

Create VM:

  1. New Virtual Machine → Custom
  2. Hardware compatibility: Workstation 17.x
  3. Install from: ISO image (Ubuntu 22.04)
  4. Guest OS: Linux → Ubuntu 64-bit
  5. VM name: pg-node1
  6. Processors: 2 cores
  7. Memory: 4096 MB
  8. Network: Bridged or NAT with port forwarding
  9. Disk: 40 GB, single file
  10. Finish and install OS

Clone for other nodes:

  • Right-click VM → Manage → Clone
  • Create linked clone or full clone
  • Change VM name and network settings
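The clones can also be scripted with the vmrun CLI bundled with VMware Workstation; a hedged sketch (the .vmx paths are assumptions for your layout):

# Full-clone pg-node1 into pg-node2 and pg-node3; the source VM must be powered off
vmrun -T ws clone "$HOME/vmware/pg-node1/pg-node1.vmx" \
  "$HOME/vmware/pg-node2/pg-node2.vmx" full -cloneName=pg-node2
vmrun -T ws clone "$HOME/vmware/pg-node1/pg-node1.vmx" \
  "$HOME/vmware/pg-node3/pg-node3.vmx" full -cloneName=pg-node3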

Post-Installation Steps (All Platforms)

Update system:

# Ubuntu/Debian
sudo apt update && sudo apt upgrade -y

# Rocky Linux/AlmaLinux
sudo dnf update -y

# Install essential tools
sudo apt install -y \
  curl \
  wget \
  vim \
  git \
  net-tools \
  htop \
  iotop \
  sysstat \
  build-essential

Disable swap (recommended for databases):

# Check current swap
free -h

# Disable swap
sudo swapoff -a

# Remove from /etc/fstab
sudo sed -i '/swap/d' /etc/fstab

# Verify
free -h

Set system limits:

# Edit /etc/security/limits.conf
sudo tee -a /etc/security/limits.conf << EOF
postgres soft nofile 65536
postgres hard nofile 65536
postgres soft nproc 8192
postgres hard nproc 8192
EOF

# Edit /etc/sysctl.conf
sudo tee -a /etc/sysctl.conf << EOF
# PostgreSQL optimizations
vm.swappiness = 1
# With overcommit_memory=2 and swap disabled, raise vm.overcommit_ratio
# (default 50) so the commit limit is not capped at half of RAM
vm.overcommit_memory = 2
vm.overcommit_ratio = 90
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
net.ipv4.tcp_keepalive_time = 200
net.ipv4.tcp_keepalive_intvl = 200
net.ipv4.tcp_keepalive_probes = 5
EOF

# Apply
sudo sysctl -p
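A quick check that the kernel settings took effect. Note that limits.conf applies to PAM login sessions only; systemd-managed services (such as the Patroni unit later on) need LimitNOFILE= set in their unit file instead.

# Kernel parameters should report the new values
sysctl vm.swappiness vm.overcommit_memory vm.dirty_background_ratio vm.dirty_ratio

# Open-file limit for the current login session
ulimit -n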

4. SSH key-based authentication

Generate SSH keys

On your local machine/jump server:

# Generate SSH key pair
ssh-keygen -t ed25519 -C "patroni-cluster" -f ~/.ssh/patroni_cluster

# Output:
# ~/.ssh/patroni_cluster (private key)
# ~/.ssh/patroni_cluster.pub (public key)

# Set permissions
chmod 600 ~/.ssh/patroni_cluster
chmod 644 ~/.ssh/patroni_cluster.pub

Copy keys to all nodes

# Copy to each node
ssh-copy-id -i ~/.ssh/patroni_cluster.pub ubuntu@10.0.1.11
ssh-copy-id -i ~/.ssh/patroni_cluster.pub ubuntu@10.0.1.12
ssh-copy-id -i ~/.ssh/patroni_cluster.pub ubuntu@10.0.1.13

# Or manually
for node in pg-node1 pg-node2 pg-node3; do
  cat ~/.ssh/patroni_cluster.pub | ssh ubuntu@$node \
    "mkdir -p ~/.ssh && chmod 700 ~/.ssh && \
     cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys"
done

Configure SSH client

Edit ~/.ssh/config:

cat >> ~/.ssh/config << EOF
Host pg-node*
  User ubuntu
  IdentityFile ~/.ssh/patroni_cluster
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null

Host pg-node1
  HostName 10.0.1.11

Host pg-node2
  HostName 10.0.1.12

Host pg-node3
  HostName 10.0.1.13
EOF

chmod 600 ~/.ssh/config

Test SSH connectivity

# Test password-less SSH
ssh pg-node1 "hostname && date"
ssh pg-node2 "hostname && date"
ssh pg-node3 "hostname && date"

# Should connect without password prompt

Set up inter-node SSH (for the postgres user)

On each node:

# As the postgres user (after PostgreSQL installation in Lesson 5)
sudo -u postgres mkdir -p /var/lib/postgresql/.ssh
sudo -u postgres ssh-keygen -t ed25519 -N "" -f /var/lib/postgresql/.ssh/id_ed25519

# Copy the public key to the other nodes
# (ssh-copy-id needs the postgres account to accept password logins once;
#  alternatively, append the key to authorized_keys via your admin user)
for node in pg-node1 pg-node2 pg-node3; do
  sudo -u postgres ssh-copy-id -i /var/lib/postgresql/.ssh/id_ed25519.pub postgres@$node
done
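Verify that the postgres user can now reach the other nodes without a password:

# Should print the remote hostnames without prompting
sudo -u postgres ssh -o BatchMode=yes postgres@pg-node2 hostname
sudo -u postgres ssh -o BatchMode=yes postgres@pg-node3 hostname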

5. Time synchronization (NTP/chrony)

Why is time sync critical?

Importance:

  • Distributed systems rely on consistent time
  • etcd lease TTLs (which back Patroni's leader key) assume clocks that tick at the same rate
  • PostgreSQL WAL includes timestamps (used, for example, by recovery_target_time)
  • Monitoring and debugging require accurate timestamps

Acceptable drift: < 500ms (ideally < 100ms)
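You can read a node's current offset directly from chrony once it is configured (next subsection):

# Estimated offset of the system clock from NTP time
chronyc tracking | grep 'System time'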

Ubuntu/Debian:

# Install chrony
sudo apt install -y chrony

# Edit /etc/chrony/chrony.conf
sudo vim /etc/chrony/chrony.conf

Configuration:

# Use public NTP servers
pool ntp.ubuntu.com iburst maxsources 4
pool 0.ubuntu.pool.ntp.org iburst maxsources 1
pool 1.ubuntu.pool.ntp.org iburst maxsources 1
pool 2.ubuntu.pool.ntp.org iburst maxsources 2

# Or use local NTP server
# server 10.0.1.1 iburst

# Record the rate at which the system clock gains/loses time
driftfile /var/lib/chrony/chrony.drift

# Allow NTP client access from local network
allow 10.0.1.0/24

# Serve time even if not synchronized to a time source
local stratum 10

# Specify directory for log files
logdir /var/log/chrony

# Select which information is logged
log measurements statistics tracking

Start and enable:

# Start chrony
sudo systemctl enable --now chrony

# Check status
sudo systemctl status chrony

# Verify time synchronization
chronyc sources -v
chronyc tracking

Expected output:

MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^* time.cloudflare.com           3   6   377    32   +123us[ +456us] +/-   20ms
^+ ntp.ubuntu.com                2   6   377    33   -234us[ -101us] +/-   15ms

Alternative: systemd-timesyncd (Simpler)

Ubuntu/Debian:

# Install (usually pre-installed)
sudo apt install -y systemd-timesyncd

# Edit /etc/systemd/timesyncd.conf
sudo vim /etc/systemd/timesyncd.conf

Configuration:

[Time]
NTP=ntp.ubuntu.com 0.ubuntu.pool.ntp.org 1.ubuntu.pool.ntp.org
FallbackNTP=time.cloudflare.com

Enable and verify:

# Enable
sudo systemctl enable --now systemd-timesyncd

# Check status
timedatectl status
systemctl status systemd-timesyncd

# Should show "System clock synchronized: yes"

Verify time synchronization across cluster

Create verification script:

#!/bin/bash
# check_time_sync.sh

echo "Checking time synchronization across cluster..."
echo "================================================"

for node in pg-node1 pg-node2 pg-node3; do
  echo -n "$node: "
  ssh $node "date '+%Y-%m-%d %H:%M:%S.%N %Z'"
done

echo ""
echo "Time difference check:"
# Note: samples are taken sequentially, so SSH round-trip time inflates the differences
time1=$(ssh pg-node1 "date +%s%N")
time2=$(ssh pg-node2 "date +%s%N")
time3=$(ssh pg-node3 "date +%s%N")

diff12=$(( (time1 - time2) / 1000000 ))  # Convert to milliseconds
diff13=$(( (time1 - time3) / 1000000 ))
diff23=$(( (time2 - time3) / 1000000 ))

echo "node1 vs node2: ${diff12}ms"
echo "node1 vs node3: ${diff13}ms"
echo "node2 vs node3: ${diff23}ms"

if [ ${diff12#-} -lt 100 ] && [ ${diff13#-} -lt 100 ] && [ ${diff23#-} -lt 100 ]; then
  echo "✓ Time synchronization is good (< 100ms)"
else
  echo "✗ WARNING: Time drift detected! Please fix NTP configuration"
fi

Make the script executable and run it:

chmod +x check_time_sync.sh
./check_time_sync.sh

6. Lab: Complete Infrastructure Setup

Lab Objectives

  • Set up 3 VMs with correct networking
  • Configure the firewall for all required ports
  • Set up passwordless SSH authentication
  • Synchronize time with NTP
  • Verify connectivity between the nodes

Lab Steps

Step 1: Verify VM specifications

# On each node
ssh pg-node1 "cat /etc/os-release | grep PRETTY_NAME"
ssh pg-node1 "nproc"
ssh pg-node1 "free -h"
ssh pg-node1 "df -h"

# Repeat for node2, node3

Step 2: Network connectivity test

# Create test script
cat > test_connectivity.sh << 'EOF'
#!/bin/bash

NODES=("pg-node1" "pg-node2" "pg-node3")
PORTS=(22 5432 8008 2379 2380)

for node in "${NODES[@]}"; do
  echo "Testing connectivity to $node..."
  for port in "${PORTS[@]}"; do
    if nc -zv -w 2 $node $port 2>&1 | grep -q succeeded; then
      echo "  ✓ Port $port: OK"
    else
      echo "  ✗ Port $port: FAILED"
    fi
  done
  echo ""
done
EOF

chmod +x test_connectivity.sh
./test_connectivity.sh

Step 3: Verify SSH authentication

# Test SSH without password
for node in pg-node1 pg-node2 pg-node3; do
  echo "Testing SSH to $node..."
  ssh -o BatchMode=yes $node "echo 'SSH OK'" || echo "SSH FAILED"
done

Step 4: Check time synchronization

./check_time_sync.sh

Step 5: Run comprehensive validation

cat > validate_infrastructure.sh << 'EOF'
#!/bin/bash

RED='\033[0;31m'
GREEN='\033[0;32m'
NC='\033[0m' # No Color

NODES=("pg-node1" "pg-node2" "pg-node3")

echo "========================================="
echo "Infrastructure Validation Report"
echo "========================================="
echo ""

for node in "${NODES[@]}"; do
  echo "Checking $node..."
  
  # Hostname
  hostname=$(ssh $node "hostname")
  echo "  Hostname: $hostname"
  
  # IP Address
  ip=$(ssh $node "hostname -I | awk '{print \$1}'")
  echo "  IP: $ip"
  
  # CPU/RAM
  cpu=$(ssh $node "nproc")
  ram=$(ssh $node "free -h | grep Mem | awk '{print \$2}'")
  echo "  CPU: ${cpu} cores, RAM: ${ram}"
  
  # Disk
  disk=$(ssh $node "df -h / | tail -1 | awk '{print \$4}'")
  echo "  Disk free: $disk"
  
  # Firewall
  firewall=$(ssh $node "sudo ufw status | grep Status | awk '{print \$2}'")
  echo "  Firewall: $firewall"
  
  # Time sync
  timesync=$(ssh $node "timedatectl | grep 'System clock synchronized' | awk '{print \$4}'")
  if [ "$timesync" == "yes" ]; then
    echo -e "  Time sync: ${GREEN}✓${NC}"
  else
    echo -e "  Time sync: ${RED}✗${NC}"
  fi
  
  echo ""
done

echo "========================================="
echo "Connectivity Matrix"
echo "========================================="

for src in "${NODES[@]}"; do
  for dst in "${NODES[@]}"; do
    if [ "$src" != "$dst" ]; then
      if ssh $src "ping -c 1 -W 1 $dst" > /dev/null 2>&1; then
        echo -e "$src → $dst: ${GREEN}✓${NC}"
      else
        echo -e "$src → $dst: ${RED}✗${NC}"
      fi
    fi
  done
done

echo ""
echo "========================================="
echo "Validation Complete"
echo "========================================="
EOF

chmod +x validate_infrastructure.sh
./validate_infrastructure.sh

Expected output (all green checkmarks):

=========================================
Infrastructure Validation Report
=========================================

Checking pg-node1...
  Hostname: pg-node1
  IP: 10.0.1.11
  CPU: 2 cores, RAM: 4.0Gi
  Disk free: 25G
  Firewall: active
  Time sync: ✓

[... similar for node2, node3 ...]

=========================================
Connectivity Matrix
=========================================
pg-node1 → pg-node2: ✓
pg-node1 → pg-node3: ✓
pg-node2 → pg-node1: ✓
pg-node2 → pg-node3: ✓
pg-node3 → pg-node1: ✓
pg-node3 → pg-node2: ✓

7. Summary

Infrastructure Checklist

Before moving on to Lesson 5, make sure:

✅ 3 VMs/Servers ready with sufficient CPU, RAM, and disk

✅ Networking configured: static IPs, /etc/hosts

✅ Firewall rules: ports 22, 5432, 8008, 2379, 2380

✅ SSH keys deployed, passwordless authentication works

✅ Time sync configured with chrony/timesyncd

✅ System optimized: swap disabled, kernel parameters tuned

✅ Connectivity verified: all nodes can reach each other

Troubleshooting

Problem: SSH connection refused

# Check SSH service
sudo systemctl status sshd

# Check firewall
sudo ufw status | grep 22

Problem: Time drift detected

# Force time sync
sudo chronyc makestep

# Or restart chrony
sudo systemctl restart chrony

Problem: Network unreachable

# Check network interface
ip addr show

# Check routing
ip route show

# Restart networking
sudo systemctl restart systemd-networkd

Review Questions

  1. Why does a Patroni cluster need at least 3 nodes?
  2. Which ports must the firewall open, and why?
  3. Why is time synchronization important in a distributed system?
  4. Should swap be enabled on a PostgreSQL server? Why or why not?
  5. What latency should you aim for between etcd nodes?

Preparing for the Next Lesson

Lesson 5 covers installing PostgreSQL:

  • Installing PostgreSQL from the package repository
  • Configuring postgresql.conf
  • Setting up pg_hba.conf
  • Lab: installing PostgreSQL on all 3 nodes
