Lesson 26: Automation with Ansible
Build Ansible playbooks for deployment, configuration management, and automated testing, and integrate them into CI/CD for a PostgreSQL HA cluster.
Objectives
After this lesson, you will be able to:
- Automate Patroni cluster deployment with Ansible
- Manage configuration with playbooks
- Implement automated testing
- Integrate database changes into CI/CD
- Use Infrastructure as Code principles
1. Ansible Basics for PostgreSQL
1.1. Install Ansible
# Install Ansible
sudo apt-get update
sudo apt-get install -y ansible
# Or via pip
pip3 install ansible
# Verify
ansible --version
# ansible [core 2.15.5]
1.2. Inventory file
# inventory.ini
[postgres_cluster]
pg-node1 ansible_host=10.0.1.11 ansible_user=ubuntu
pg-node2 ansible_host=10.0.1.12 ansible_user=ubuntu
pg-node3 ansible_host=10.0.1.13 ansible_user=ubuntu
[etcd_cluster]
etcd-node1 ansible_host=10.0.1.11 ansible_user=ubuntu
etcd-node2 ansible_host=10.0.1.12 ansible_user=ubuntu
etcd-node3 ansible_host=10.0.1.13 ansible_user=ubuntu
[all:vars]
ansible_python_interpreter=/usr/bin/python3
ansible_ssh_private_key_file=~/.ssh/id_rsa
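The etcd-node* entries reuse the addresses of the pg-node* entries, so each machine is reachable under two inventory names, and a play that targets "all" visits it twice. The following is a minimal Python sketch, assuming the simple INI layout shown above, that groups inventory names by ansible_host. The helper name hosts_by_address is hypothetical, and the parser covers only this layout, not the full Ansible inventory syntax.

```python
from collections import defaultdict

# Copy of the inventory above (the SSH key line is omitted for brevity)
INVENTORY = """\
[postgres_cluster]
pg-node1 ansible_host=10.0.1.11 ansible_user=ubuntu
pg-node2 ansible_host=10.0.1.12 ansible_user=ubuntu
pg-node3 ansible_host=10.0.1.13 ansible_user=ubuntu
[etcd_cluster]
etcd-node1 ansible_host=10.0.1.11 ansible_user=ubuntu
etcd-node2 ansible_host=10.0.1.12 ansible_user=ubuntu
etcd-node3 ansible_host=10.0.1.13 ansible_user=ubuntu
[all:vars]
ansible_python_interpreter=/usr/bin/python3
"""


def hosts_by_address(text):
    """Map each ansible_host address to the inventory names that use it."""
    result = defaultdict(list)
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section is None or ":" in section:
            continue  # skip [group:vars] and [group:children] sections
        name, *pairs = line.split()
        options = dict(p.split("=", 1) for p in pairs if "=" in p)
        if "ansible_host" in options:
            result[options["ansible_host"]].append(name)
    return dict(result)


if __name__ == "__main__":
    for address, names in hosts_by_address(INVENTORY).items():
        print(address, names)
```

Any address that maps to more than one name is a machine that group-wide plays will touch once per alias.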
1.3. Ansible configuration
# ansible.cfg
[defaults]
inventory = inventory.ini
host_key_checking = False
retry_files_enabled = False
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
pipelining = True
2. Complete Patroni Deployment Playbook
2.1. Main playbook
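# site.yml
---
- name: Deploy PostgreSQL HA Cluster with Patroni
  # etcd runs on the PostgreSQL nodes; "all" would also match the etcd-node*
  # aliases and apply every role twice to the same machines
  hosts: postgres_cluster
  become: yes
  vars_files:
    - vars/main.yml
  roles:
    # Role-level tags are what --tags "postgresql,patroni" selects
    - { role: common, tags: [common] }
    - { role: etcd, tags: [etcd] }
    - { role: postgresql, tags: [postgresql] }
    - { role: patroni, tags: [patroni] }
    - { role: haproxy, tags: [haproxy] }
    - { role: monitoring, tags: [monitoring] }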
2.2. Variables
# vars/main.yml
---
# PostgreSQL
postgresql_version: 18
postgresql_data_dir: /var/lib/postgresql/{{ postgresql_version }}/data
postgresql_bin_dir: /usr/lib/postgresql/{{ postgresql_version }}/bin
# Patroni
patroni_scope: postgres-cluster
patroni_namespace: /service/
# etcd
etcd_version: 3.5.11
etcd_data_dir: /var/lib/etcd
etcd_initial_cluster_token: etcd-cluster-token
# Cluster
cluster_nodes:
  - { name: pg-node1, ip: 10.0.1.11 }
  - { name: pg-node2, ip: 10.0.1.12 }
  - { name: pg-node3, ip: 10.0.1.13 }
# Passwords (use Ansible Vault in production!)
postgres_password: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  ...encrypted...
replicator_password: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  ...encrypted...
2.3. Common role
# roles/common/tasks/main.yml
---
- name: Update apt cache
  apt:
    update_cache: yes
    cache_valid_time: 3600
- name: Install common packages
  apt:
    name:
      - curl
      - wget
      - vim
      - git
      - htop
      - net-tools
      - python3
      - python3-pip
    state: present
- name: Set timezone
  timezone:
    name: Asia/Ho_Chi_Minh
- name: Configure sysctl for PostgreSQL
  sysctl:
    name: "{{ item.name }}"
    value: "{{ item.value }}"
    state: present
    reload: yes
  loop:
    - { name: 'vm.swappiness', value: '1' }
    - { name: 'vm.overcommit_memory', value: '2' }
    - { name: 'vm.dirty_ratio', value: '10' }
    - { name: 'vm.dirty_background_ratio', value: '3' }
    - { name: 'net.ipv4.tcp_keepalive_time', value: '60' }
    - { name: 'net.ipv4.tcp_keepalive_intvl', value: '10' }
    - { name: 'net.ipv4.tcp_keepalive_probes', value: '6' }
- name: Set system limits
  pam_limits:
    domain: postgres
    limit_type: "{{ item.type }}"
    limit_item: "{{ item.item }}"
    value: "{{ item.value }}"
  loop:
    - { type: 'soft', item: 'nofile', value: '65536' }
    - { type: 'hard', item: 'nofile', value: '65536' }
    - { type: 'soft', item: 'nproc', value: '8192' }
    - { type: 'hard', item: 'nproc', value: '8192' }
2.4. etcd role
# roles/etcd/tasks/main.yml
---
- name: Create etcd user
  user:
    name: etcd
    shell: /bin/false
    system: yes
    home: "{{ etcd_data_dir }}"
- name: Download etcd
  get_url:
    url: "https://github.com/etcd-io/etcd/releases/download/v{{ etcd_version }}/etcd-v{{ etcd_version }}-linux-amd64.tar.gz"
    dest: /tmp/etcd.tar.gz
- name: Extract etcd
  unarchive:
    src: /tmp/etcd.tar.gz
    dest: /tmp
    remote_src: yes
- name: Install etcd binaries
  copy:
    src: "/tmp/etcd-v{{ etcd_version }}-linux-amd64/{{ item }}"
    dest: "/usr/local/bin/{{ item }}"
    mode: '0755'
    remote_src: yes
  loop:
    - etcd
    - etcdctl
- name: Create etcd data directory
  file:
    path: "{{ etcd_data_dir }}"
    state: directory
    owner: etcd
    group: etcd
    mode: '0755'
- name: Template etcd systemd service
  template:
    src: etcd.service.j2
    dest: /etc/systemd/system/etcd.service
  notify: restart etcd
- name: Start and enable etcd
  systemd:
    name: etcd
    state: started
    enabled: yes
    daemon_reload: yes
{# roles/etcd/templates/etcd.service.j2 #}
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
Type=notify
User=etcd
ExecStart=/usr/local/bin/etcd \
--name {{ ansible_hostname }} \
--data-dir {{ etcd_data_dir }} \
--initial-advertise-peer-urls http://{{ ansible_default_ipv4.address }}:2380 \
--listen-peer-urls http://{{ ansible_default_ipv4.address }}:2380 \
--listen-client-urls http://{{ ansible_default_ipv4.address }}:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://{{ ansible_default_ipv4.address }}:2379 \
--initial-cluster-token {{ etcd_initial_cluster_token }} \
--initial-cluster {% for node in cluster_nodes %}{{ node.name }}=http://{{ node.ip }}:2380{% if not loop.last %},{% endif %}{% endfor %} \
--initial-cluster-state new
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
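The --initial-cluster flag is built by a Jinja2 loop over cluster_nodes, and etcd expects each member's --name to appear in that list. The following minimal Python sketch reproduces the string the loop renders and checks a hostname against it. It assumes the cluster_nodes values from vars/main.yml; the function names are hypothetical.

```python
# Same data as cluster_nodes in vars/main.yml
cluster_nodes = [
    {"name": "pg-node1", "ip": "10.0.1.11"},
    {"name": "pg-node2", "ip": "10.0.1.12"},
    {"name": "pg-node3", "ip": "10.0.1.13"},
]


def initial_cluster(nodes, peer_port=2380):
    """Build the comma-separated value rendered for --initial-cluster."""
    return ",".join(f"{n['name']}=http://{n['ip']}:{peer_port}" for n in nodes)


def name_is_listed(hostname, nodes):
    """Return True if the --name value (ansible_hostname) is a listed member."""
    return hostname in {n["name"] for n in nodes}


if __name__ == "__main__":
    print(initial_cluster(cluster_nodes))
    print(name_is_listed("pg-node1", cluster_nodes))
```

If the machines' real hostnames differ from the names in cluster_nodes, the --name passed to etcd will not match any entry in --initial-cluster, so it is worth checking this before the first run.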
2.5. PostgreSQL role
# roles/postgresql/tasks/main.yml
---
- name: Add PostgreSQL apt key
  apt_key:
    url: https://www.postgresql.org/media/keys/ACCC4CF8.asc
    state: present
- name: Add PostgreSQL repository
  apt_repository:
    repo: "deb http://apt.postgresql.org/pub/repos/apt/ {{ ansible_distribution_release }}-pgdg main"
    state: present
- name: Install PostgreSQL
  apt:
    name:
      - "postgresql-{{ postgresql_version }}"
      - "postgresql-contrib-{{ postgresql_version }}"
      - "postgresql-server-dev-{{ postgresql_version }}"
    state: present
    update_cache: yes
- name: Stop and disable PostgreSQL (managed by Patroni)
  systemd:
    name: "postgresql@{{ postgresql_version }}-main"
    state: stopped
    enabled: no
  ignore_errors: yes
- name: Create PostgreSQL directories
  file:
    path: "{{ item }}"
    state: directory
    owner: postgres
    group: postgres
    mode: '0700'
  loop:
    - "{{ postgresql_data_dir }}"
    - /var/lib/postgresql/wal_archive
    - /var/lib/postgresql/backups
2.6. Patroni role
# roles/patroni/tasks/main.yml
---
- name: Install Python dependencies
  pip:
    name:
      - patroni[etcd]
      - psycopg2-binary
      - python-etcd
    state: present
    executable: pip3
- name: Create Patroni configuration directory
  file:
    path: /etc/patroni
    state: directory
    owner: postgres
    group: postgres
    mode: '0755'
- name: Template Patroni configuration
  template:
    src: patroni.yml.j2
    dest: /etc/patroni/patroni.yml
    owner: postgres
    group: postgres
    mode: '0600'
  notify: restart patroni
- name: Template Patroni systemd service
  template:
    src: patroni.service.j2
    dest: /etc/systemd/system/patroni.service
  notify: restart patroni
- name: Start and enable Patroni
  systemd:
    name: patroni
    state: started
    enabled: yes
    daemon_reload: yes
- name: Wait for Patroni to be ready
  wait_for:
    # The REST API binds to the node address, not to wait_for's default 127.0.0.1
    host: "{{ ansible_default_ipv4.address }}"
    port: 8008
    timeout: 60
{# roles/patroni/templates/patroni.yml.j2 #}
scope: {{ patroni_scope }}
name: {{ ansible_hostname }}
restapi:
  listen: {{ ansible_default_ipv4.address }}:8008
  connect_address: {{ ansible_default_ipv4.address }}:8008
{# etcd3 section: etcd 3.5 disables the v2 API by default. One host per line: with Ansible's trim_blocks, a line that ends in a block tag loses its newline. #}
etcd3:
  hosts:
{% for node in cluster_nodes %}
    - {{ node.ip }}:2379
{% endfor %}
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 100
        shared_buffers: 256MB
        effective_cache_size: 1GB
        maintenance_work_mem: 64MB
        checkpoint_completion_target: 0.9
        wal_buffers: 16MB
        default_statistics_target: 100
        random_page_cost: 1.1
        effective_io_concurrency: 200
        work_mem: 2621kB
        min_wal_size: 1GB
        max_wal_size: 4GB
        max_worker_processes: 4
        max_parallel_workers_per_gather: 2
        max_parallel_workers: 4
        max_parallel_maintenance_workers: 2
        wal_level: replica
        max_wal_senders: 10
        max_replication_slots: 10
        hot_standby: "on"
        archive_mode: "on"
        archive_command: 'test ! -f /var/lib/postgresql/wal_archive/%f && cp %p /var/lib/postgresql/wal_archive/%f'
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 0.0.0.0/0 scram-sha-256
    - host all all 0.0.0.0/0 scram-sha-256
postgresql:
  listen: 0.0.0.0:5432
  connect_address: {{ ansible_default_ipv4.address }}:5432
  data_dir: {{ postgresql_data_dir }}
  bin_dir: {{ postgresql_bin_dir }}
  authentication:
    replication:
      username: replicator
      password: "{{ replicator_password }}"
    superuser:
      username: postgres
      password: "{{ postgres_password }}"
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
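The ttl, loop_wait, and retry_timeout values in the dcs section are related: the Patroni documentation asks that loop_wait + 2 * retry_timeout stay at or below ttl, so the leader key cannot expire during a normal retry cycle. The following minimal Python sketch checks that rule before a change is rolled out. The function name is hypothetical, and the values are the ones from the template above.

```python
def timing_is_consistent(ttl, loop_wait, retry_timeout):
    """Check the Patroni rule: loop_wait + 2 * retry_timeout <= ttl."""
    return loop_wait + 2 * retry_timeout <= ttl


if __name__ == "__main__":
    dcs = {"ttl": 30, "loop_wait": 10, "retry_timeout": 10}
    # 10 + 2 * 10 = 30, which is exactly the ttl, so the rule holds
    print(timing_is_consistent(**dcs))
    # Raising retry_timeout to 15 without raising ttl breaks the rule
    print(timing_is_consistent(ttl=30, loop_wait=10, retry_timeout=15))
```

A check like this could run in the lint stage so that a timing change in the variables file fails early instead of causing spurious failovers.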
3. Deployment
3.1. Run playbook
# Dry run (check mode)
ansible-playbook site.yml --check
# Execute
ansible-playbook site.yml
# With verbose output
ansible-playbook site.yml -vvv
# Specific tags
ansible-playbook site.yml --tags "postgresql,patroni"
3.2. Verify deployment
# verify.yml
---
- name: Verify Patroni cluster
  hosts: postgres_cluster
  become: yes  # needed for the systemd task and for become_user below
  vars_files:
    - vars/main.yml  # provides postgres_password
  tasks:
    - name: Check Patroni service
      systemd:
        name: patroni
        state: started
      register: patroni_status
    - name: Get cluster status
      command: patronictl -c /etc/patroni/patroni.yml list
      register: cluster_status
      changed_when: false
    - name: Display cluster status
      debug:
        var: cluster_status.stdout_lines
    - name: Check PostgreSQL connectivity
      community.postgresql.postgresql_ping:
        db: postgres
        login_host: localhost
        login_user: postgres
        login_password: "{{ postgres_password }}"
      become_user: postgres
ansible-playbook verify.yml
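The same health check can be made without parsing patronictl output, because Patroni's REST API exposes the cluster view as JSON at /cluster. The following is a minimal Python sketch that counts leaders in such a response. The sample payload is hand-written for illustration and shows only the fields the code reads; the function names and the URL in the usage comment are assumptions.

```python
import json
from urllib.request import urlopen


def fetch_cluster(base_url, timeout=5):
    """GET <base_url>/cluster from a Patroni node and decode the JSON body."""
    with urlopen(f"{base_url}/cluster", timeout=timeout) as resp:
        return json.load(resp)


def count_leaders(cluster):
    """Count members that report the role 'leader'."""
    return sum(1 for m in cluster.get("members", []) if m.get("role") == "leader")


# Hand-written sample, not captured from a real cluster
SAMPLE = {
    "members": [
        {"name": "pg-node1", "role": "leader", "state": "running"},
        {"name": "pg-node2", "role": "replica", "state": "streaming"},
        {"name": "pg-node3", "role": "replica", "state": "streaming"},
    ]
}

if __name__ == "__main__":
    print(count_leaders(SAMPLE))
    # Against a live node (address taken from the inventory):
    # print(count_leaders(fetch_cluster("http://10.0.1.11:8008")))
```

A healthy cluster should report exactly one leader; zero or more than one is worth failing the verification run.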
4. Configuration Management
4.1. Dynamic configuration update
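# update_config.yml
---
- name: Update Patroni configuration
  hosts: postgres_cluster
  become: yes
  vars_files:
    - vars/main.yml  # provides patroni_scope
  vars:
    new_max_connections: 200
  tasks:
    - name: Update DCS configuration
      # --force skips the interactive confirmation prompt
      shell: |
        patronictl -c /etc/patroni/patroni.yml edit-config --force --apply - <<EOF
        postgresql:
          parameters:
            max_connections: {{ new_max_connections }}
        EOF
      run_once: true
    - name: Check for members pending restart
      # Members pick up the change on their next HA loop, so the flag can lag by a few seconds
      command: patronictl -c /etc/patroni/patroni.yml list
      register: cluster_list
      changed_when: false
      run_once: true
    - name: Restart nodes if needed
      shell: patronictl -c /etc/patroni/patroni.yml restart {{ patroni_scope }} {{ ansible_hostname }} --force
      throttle: 1  # restart one member at a time
      when: "'Pending restart' in cluster_list.stdout"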
4.2. Backup automation
# backup.yml
---
- name: Perform PostgreSQL backup
  hosts: postgres_cluster[0]  # First inventory host, which is not necessarily the current leader
  become: yes
  become_user: postgres
  vars:
    backup_dir: /var/lib/postgresql/backups
    backup_retention_days: 7
  tasks:
    - name: Create backup directory
      file:
        path: "{{ backup_dir }}"
        state: directory
        mode: '0700'
    - name: Run pg_basebackup
      # No "creates" guard: a glob such as backup_* matches the first backup
      # and would make every later run skip this task
      shell: |
        pg_basebackup -D {{ backup_dir }}/backup_$(date +%Y%m%d_%H%M%S) \
          -Ft -z -Xs -P
    - name: Find old backups
      find:
        paths: "{{ backup_dir }}"
        patterns: "backup_*"
        file_type: directory  # match whole backup directories, not the files inside them
        age: "{{ backup_retention_days }}d"
      register: old_backups
    - name: Delete old backups
      file:
        path: "{{ item.path }}"
        state: absent
      loop: "{{ old_backups.files }}"
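The retention step can be reasoned about separately from Ansible. The following minimal Python sketch selects expired backups from the backup_YYYYmmdd_HHMMSS names that the pg_basebackup task produces. It decides by the timestamp in the name, whereas the find module's age option looks at modification time; the function name and sample names are hypothetical.

```python
from datetime import datetime, timedelta


def expired_backups(names, now, retention_days=7):
    """Return backup names whose embedded timestamp is older than the cutoff."""
    cutoff = now - timedelta(days=retention_days)
    expired = []
    for name in names:
        try:
            taken = datetime.strptime(name, "backup_%Y%m%d_%H%M%S")
        except ValueError:
            continue  # ignore anything that does not follow the naming scheme
        if taken < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = ["backup_20240101_020000", "backup_20240105_020000", "notes.txt"]
    print(expired_backups(names, now=datetime(2024, 1, 10, 2, 0, 0)))
```

With a seven-day retention and a run on 10 January, only the 1 January backup falls before the cutoff; the unrelated file is never considered for deletion.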
5. Testing Automation
5.1. Molecule for testing
# Install Molecule
pip3 install molecule molecule-plugins[docker]
# Initialize Molecule scenario
cd roles/patroni
molecule init scenario --driver-name docker
# roles/patroni/molecule/default/molecule.yml
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: pg-node1
    image: ubuntu:22.04
    pre_build_image: true
  - name: pg-node2
    image: ubuntu:22.04
    pre_build_image: true
  - name: pg-node3
    image: ubuntu:22.04
    pre_build_image: true
provisioner:
  name: ansible
verifier:
  name: ansible
# roles/patroni/molecule/default/verify.yml
---
- name: Verify
  hosts: all
  tasks:
    - name: Check Patroni is running
      systemd:
        name: patroni
        state: started
      register: result
      failed_when: result.status.ActiveState != 'active'
    - name: Check cluster has leader
      shell: patronictl -c /etc/patroni/patroni.yml list | grep Leader
      register: leader_check
      changed_when: false  # read-only check
      failed_when: leader_check.rc != 0
# Run tests
molecule test
5.2. Testinfra for validation
# The project is published on PyPI as pytest-testinfra
pip3 install pytest-testinfra
# tests/test_patroni.py
import testinfra


def test_patroni_service(host):
    """Test Patroni service is running"""
    service = host.service("patroni")
    assert service.is_running
    assert service.is_enabled


def test_postgresql_port(host):
    """Test PostgreSQL port is listening"""
    assert host.socket("tcp://0.0.0.0:5432").is_listening


def test_patroni_rest_api(host):
    """Test Patroni REST API (bound to the node address, not 0.0.0.0)"""
    sockets = host.socket.get_listening_sockets()
    assert any(s.startswith("tcp://") and s.endswith(":8008") for s in sockets)


def test_etcd_connectivity(host):
    """Test etcd cluster health"""
    cmd = host.run("etcdctl endpoint health")
    assert cmd.rc == 0
    # Some etcdctl versions print the health line on stderr
    assert "healthy" in cmd.stdout + cmd.stderr


def test_cluster_has_leader(host):
    """Test cluster has exactly one leader"""
    # patroni.yml is readable only by postgres (mode 0600)
    cmd = host.run("sudo -u postgres patronictl -c /etc/patroni/patroni.yml list")
    assert cmd.rc == 0
    assert cmd.stdout.count("Leader") == 1


def test_replication_lag(host):
    """Test replication lag is low"""
    cmd = host.run("sudo -u postgres psql -Atc \"SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) FROM pg_stat_replication;\"")
    # The query fails on replicas and returns one row per replica on the leader
    if cmd.rc == 0:
        for line in cmd.stdout.split():
            assert int(line) < 1048576  # < 1MB lag
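# Run tests against the cluster nodes (without --hosts, testinfra checks the local machine)
pytest --hosts=ansible://postgres_cluster --ansible-inventory=inventory.ini tests/test_patroni.py -v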
6. CI/CD Integration
6.1. GitLab CI example
# .gitlab-ci.yml
stages:
  - lint
  - test
  - deploy_staging
  - deploy_production
variables:
  ANSIBLE_FORCE_COLOR: "true"
lint:
  stage: lint
  image: python:3.11
  before_script:
    - pip install ansible-lint yamllint
  script:
    - ansible-lint site.yml
    - yamllint .
  only:
    - merge_requests
    - main
test:
  stage: test
  image: python:3.11
  services:
    - docker:dind  # the Molecule docker driver needs a Docker daemon (privileged runner)
  variables:
    DOCKER_HOST: tcp://docker:2375
    DOCKER_TLS_CERTDIR: ""
  before_script:
    - pip install molecule molecule-plugins[docker] pytest-testinfra
  script:
    - cd roles/patroni  # the Molecule scenario lives under the role
    - molecule test
  only:
    - merge_requests
    - main
deploy_staging:
  stage: deploy_staging
  image: python:3.11
  before_script:
    - pip install ansible
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
  script:
    - ansible-playbook -i inventory/staging.ini site.yml
  only:
    - main
  environment:
    name: staging
deploy_production:
  stage: deploy_production
  image: python:3.11
  before_script:
    - pip install ansible
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
  script:
    - ansible-playbook -i inventory/production.ini site.yml
  only:
    - tags
  when: manual
  environment:
    name: production
6.2. GitHub Actions example
# .github/workflows/deploy.yml
name: Deploy Patroni Cluster
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          pip install ansible-lint yamllint
      - name: Run linters
        run: |
          ansible-lint site.yml
          yamllint .
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install Molecule
        run: |
          pip install molecule molecule-plugins[docker]
      - name: Run Molecule tests
        # The Molecule scenario lives under the role
        working-directory: roles/patroni
        run: |
          molecule test
  deploy_staging:
    needs: [lint, test]
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install Ansible
        run: pip install ansible
      - name: Deploy to staging
        env:
          ANSIBLE_HOST_KEY_CHECKING: "False"
        run: |
          echo "${{ secrets.SSH_PRIVATE_KEY }}" > private_key
          chmod 600 private_key
          ansible-playbook -i inventory/staging.ini site.yml --private-key=private_key
7. Best Practices
✅ DO
- Use Ansible Vault - Encrypt secrets
- Idempotent playbooks - Can run multiple times
- Test in Molecule - Before production
- Version control - Git for all playbooks
- Document variables - Clear README
- Use roles - Modular organization
- Tag tasks - Selective execution
- CI/CD integration - Automated testing
- Dry runs - Always --check first
- Backup before changes - Safety net
❌ DON'T
- Don't hardcode secrets - Use Vault
- Don't skip testing - Staging first
- Don't use shell when module exists - Use PostgreSQL modules
- Don't ignore failed tasks - Handle errors
- Don't run without backups - Always backup first
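The idempotency rule can be checked mechanically: run the playbook twice and require that the second run reports changed=0 for every host. The following minimal Python sketch parses the PLAY RECAP section of ansible-playbook output. The sample output is hand-written for illustration, and the function names are hypothetical.

```python
import re

# One recap line looks like: "host : ok=12 changed=0 unreachable=0 failed=0 ..."
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap(output):
    """Return {host: {counter: value}} from the PLAY RECAP section."""
    stats = {}
    in_recap = False
    for line in output.splitlines():
        if line.startswith("PLAY RECAP"):
            in_recap = True
            continue
        if not in_recap:
            continue
        match = RECAP_LINE.match(line.strip())
        if match:
            pairs = (p.split("=") for p in match.group("counters").split())
            stats[match.group("host")] = {k: int(v) for k, v in pairs}
    return stats


def is_idempotent(output):
    """True if every host in the recap reports changed=0 and failed=0."""
    stats = parse_recap(output)
    return bool(stats) and all(
        s.get("changed", 0) == 0 and s.get("failed", 0) == 0 for s in stats.values()
    )


# Hand-written sample of a second run in which one task still reports a change
SECOND_RUN = """\
PLAY RECAP *********************************************************************
pg-node1                   : ok=12   changed=0    unreachable=0    failed=0    skipped=1
pg-node2                   : ok=12   changed=1    unreachable=0    failed=0    skipped=1
"""

if __name__ == "__main__":
    print(is_idempotent(SECOND_RUN))
```

In this sample pg-node2 still reports one change on the second run, so the check returns False; tasks built on shell or command without changed_when are a common cause.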
8. Lab Exercises
Lab 1: Deploy cluster with Ansible
Tasks:
- Setup inventory for 3 nodes
- Create playbook with roles
- Deploy etcd cluster
- Deploy PostgreSQL + Patroni
- Verify cluster health
Lab 2: Configuration management
Tasks:
- Update max_connections via playbook
- Automate nightly backups
- Create playbook for DR failover
- Test configuration rollback
- Document all playbooks
Lab 3: CI/CD pipeline
Tasks:
- Setup GitLab/GitHub Actions
- Add linting stage
- Add Molecule testing
- Deploy to staging automatically
- Manual approval for production
Lab 4: Testing with Molecule
Tasks:
- Initialize Molecule scenario
- Write verification tests
- Test role in Docker containers
- Validate cluster functionality
- Integrate into CI pipeline
9. Summary
Automation Benefits
Manual vs Automated (illustrative figures, not measurements):
- Deployment time: 4 hours → 15 minutes
- Error rate: 30% → < 1%
- Consistency: Variable → 100%
- Documentation: Outdated → Self-documenting
- Repeatability: Difficult → Trivial
Key Ansible Concepts
Inventory: Define hosts
Playbooks: Define tasks
Roles: Modular organization
Variables: Configuration data
Vault: Secret management
Modules: Reusable components
Handlers: Triggered actions
Tags: Selective execution
Next Steps
Lesson 27 covers Disaster Recovery Drills:
- DR planning procedures
- Testing methodologies
- Incident response workflows
- Post-mortem analysis
- Full DR simulation labs