Lesson 12: Monitoring and Logging in NGINX

1. Access Logs và Error Logs

1.1. Default Log Configuration

http {
    # Default log format
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    
    # Access log
    access_log /var/log/nginx/access.log main;
    
    # Error log with level
    error_log /var/log/nginx/error.log warn;
    
    server {
        listen 80;
        server_name example.com;
        
        # Server-specific logs
        access_log /var/log/nginx/example.com.access.log main;
        error_log /var/log/nginx/example.com.error.log;
        
        location / {
            root /var/www/html;
        }
        
        # Disable logging for specific location
        location /health {
            access_log off;
            return 200 "OK\n";
        }
        
        # Location-specific log
        location /api/ {
            access_log /var/log/nginx/api.access.log main;
            proxy_pass http://backend;
        }
    }
}

1.2. Error Log Levels

# Error log levels (highest to lowest severity)
error_log /var/log/nginx/error.log emerg;   # System is unusable
error_log /var/log/nginx/error.log alert;   # Action must be taken immediately
error_log /var/log/nginx/error.log crit;    # Critical conditions
error_log /var/log/nginx/error.log error;   # Error conditions (default)
error_log /var/log/nginx/error.log warn;    # Warning conditions
error_log /var/log/nginx/error.log notice;  # Normal but significant
error_log /var/log/nginx/error.log info;    # Informational messages
error_log /var/log/nginx/error.log debug;   # Debug messages

# Recommended for production
error_log /var/log/nginx/error.log warn;

# For debugging (debug output requires an nginx binary built with --with-debug)
error_log /var/log/nginx/error.log debug;
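
The level acts as a threshold: nginx writes messages at the configured severity and anything more severe. A minimal Python sketch of that rule follows; the `is_logged` helper is hypothetical and exists only to illustrate the ordering, it is not part of nginx.

```python
# Severity order used by the error_log directive, most severe first.
LEVELS = ["emerg", "alert", "crit", "error", "warn", "notice", "info", "debug"]

def is_logged(message_level: str, configured_level: str) -> bool:
    """Return True if a message at message_level is written when
    error_log is configured with configured_level."""
    return LEVELS.index(message_level) <= LEVELS.index(configured_level)

# Show which severities reach the log when the threshold is "warn".
for level in LEVELS:
    print(f"{level:7s} written at 'warn': {is_logged(level, 'warn')}")
```

With `warn` configured, `error` and `crit` messages are kept while `notice` and `info` are dropped, which is why `warn` is a common production choice.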

1.3. Custom Log Formats

Detailed log format:

http {
    log_format detailed '$remote_addr - $remote_user [$time_local] '
                       '"$request" $status $body_bytes_sent '
                       '"$http_referer" "$http_user_agent" '
                       'rt=$request_time uct="$upstream_connect_time" '
                       'uht="$upstream_header_time" urt="$upstream_response_time"';
    
    access_log /var/log/nginx/access.log detailed;
}
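
The timing fields in the detailed format can be pulled back out with a regular expression. The sketch below assumes lines written exactly in the `detailed` format above; the sample lines are made up, and `request_times` and `percentile` are hypothetical helper names. Upstream fields contain "-" when no upstream was contacted, so only `rt` is converted to a number here.

```python
import re
from statistics import mean

# Matches the timing suffix produced by the "detailed" log_format.
TIMING_RE = re.compile(
    r'rt=(?P<rt>[\d.]+) uct="(?P<uct>[^"]*)" '
    r'uht="(?P<uht>[^"]*)" urt="(?P<urt>[^"]*)"'
)

def request_times(lines):
    """Yield $request_time (seconds) for each line that has timing fields."""
    for line in lines:
        match = TIMING_RE.search(line)
        if match:
            yield float(match.group("rt"))

def percentile(values, pct):
    """Nearest-rank percentile; returns None for an empty list."""
    if not values:
        return None
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * pct // 100))  # ceil(len * pct / 100)
    return ordered[int(rank) - 1]

# Illustrative lines only, not real traffic.
sample = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 612 '
    '"-" "curl/8.0" rt=0.120 uct="0.001" uht="0.100" urt="0.110"',
    '203.0.113.6 - - [10/Oct/2024:13:55:37 +0000] "GET /a.css HTTP/1.1" 200 90 '
    '"-" "curl/8.0" rt=0.004 uct="-" uht="-" urt="-"',
]
times = list(request_times(sample))
print(f"avg={mean(times):.3f}s p95={percentile(times, 95):.3f}s")
```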

JSON log format:

http {
    log_format json_combined escape=json
    '{'
        '"time_local":"$time_local",'
        '"remote_addr":"$remote_addr",'
        '"remote_user":"$remote_user",'
        '"request":"$request",'
        '"status": "$status",'
        '"body_bytes_sent":"$body_bytes_sent",'
        '"request_time":"$request_time",'
        '"http_referrer":"$http_referer",'
        '"http_user_agent":"$http_user_agent",'
        '"http_x_forwarded_for":"$http_x_forwarded_for",'
        '"upstream_addr":"$upstream_addr",'
        '"upstream_status":"$upstream_status",'
        '"upstream_response_time":"$upstream_response_time",'
        '"upstream_connect_time":"$upstream_connect_time",'
        '"upstream_header_time":"$upstream_header_time"'
    '}';
    
    access_log /var/log/nginx/access.log json_combined;
}
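
One benefit of the JSON format is that each line can be parsed without a custom pattern. A minimal sketch, assuming one JSON object per line as produced by `json_combined`; the sample lines are invented and `status_classes` is a hypothetical helper. Because every value in the format above is quoted, numeric fields arrive as strings and need converting before arithmetic.

```python
import json
from collections import Counter

def status_classes(lines):
    """Count responses per status class (2xx, 3xx, ...) from JSON log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            counts["unparsed"] += 1
            continue
        if not isinstance(entry, dict):
            counts["unparsed"] += 1
            continue
        status = str(entry.get("status", ""))
        if status[:1].isdigit():
            counts[status[0] + "xx"] += 1
        else:
            counts["unknown"] += 1
    return counts

# Illustrative lines shaped like the json_combined format (not real traffic).
sample = [
    '{"time_local":"10/Oct/2024:13:55:36 +0000","remote_addr":"203.0.113.5",'
    '"request":"GET / HTTP/1.1","status": "200","request_time":"0.005"}',
    '{"time_local":"10/Oct/2024:13:55:37 +0000","remote_addr":"203.0.113.6",'
    '"request":"GET /x HTTP/1.1","status": "502","request_time":"1.204"}',
    'not json',
]
print(dict(status_classes(sample)))
```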

Cache status log:

http {
    log_format cache_status '$remote_addr - [$time_local] "$request" '
                           '$status $body_bytes_sent '
                           '"$http_referer" "$http_user_agent" '
                           'cache_status=$upstream_cache_status '
                           'response_time=$request_time';
    
    access_log /var/log/nginx/cache.log cache_status;
}

Performance tracking log:

http {
    log_format performance '$time_iso8601 $remote_addr '
                          '"$request" $status $body_bytes_sent '
                          'rt=$request_time '
                          'ua="$upstream_addr" '
                          'us=$upstream_status '
                          'ut=$upstream_response_time '
                          'ul="$upstream_response_length" '
                          'cs=$upstream_cache_status';
    
    access_log /var/log/nginx/performance.log performance;
}

Security log:

http {
    log_format security '$remote_addr [$time_local] '
                       '"$request" $status '
                       'user_agent="$http_user_agent" '
                       'referer="$http_referer" '
                       'forwarded_for="$http_x_forwarded_for" '
                       'host="$host"';
    
    # Log only suspicious requests
    map $status $loggable {
        ~^[23] 0;
        default 1;
    }
    
    access_log /var/log/nginx/security.log security if=$loggable;
}

1.4. Conditional Logging

http {
    # Don't log successful health checks
    map $request_uri $loggable_request {
        ~^/health$ 0;
        ~^/ping$ 0;
        default 1;
    }
    
    # Don't log static files
    map $request_uri $loggable_static {
        ~*\.(jpg|jpeg|png|gif|ico|css|js)$ 0;
        default 1;
    }
    
    # Combine conditions
    map "$loggable_request:$loggable_static" $final_loggable {
        "0:0" 0;
        "0:1" 0;
        "1:0" 0;
        default 1;
    }
    
    server {
        listen 80;
        
        access_log /var/log/nginx/access.log combined if=$final_loggable;
        
        # Or per-location
        location /api/ {
            access_log /var/log/nginx/api.log combined;
            proxy_pass http://backend;
        }
        
        location /static/ {
            access_log off;
            root /var/www;
        }
    }
}
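
To check which requests the combined condition actually suppresses, the three maps can be modeled outside nginx. This is a rough Python model of the logic above, not how nginx evaluates it internally; `final_loggable` is a hypothetical function name. It also shows a side effect worth knowing: `$request_uri` includes the query string, so `/health?x=1` and `/app.js?v=3` do not match the patterns and are still logged.

```python
import re

# Same patterns as the two map blocks; "~*" in nginx means case-insensitive.
HEALTH_RE = re.compile(r"^/(health|ping)$")
STATIC_RE = re.compile(r"\.(jpg|jpeg|png|gif|ico|css|js)$", re.IGNORECASE)

def final_loggable(request_uri: str) -> bool:
    """Model of $final_loggable: log only if both conditions allow it."""
    loggable_request = HEALTH_RE.search(request_uri) is None
    loggable_static = STATIC_RE.search(request_uri) is None
    return loggable_request and loggable_static

for uri in ["/api/users", "/health", "/img/logo.PNG", "/app.js?v=3"]:
    print(f"{uri:15s} logged: {final_loggable(uri)}")
```

If query strings should not affect the decision, mapping on `$uri` instead of `$request_uri` is one option to consider.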

1.5. Log Variables

Available variables:

# Client information
$remote_addr          # Client IP address
$remote_user          # Client username (HTTP auth)
$remote_port          # Client port

# Request information
$request              # Full request line
$request_method       # GET, POST, etc.
$request_uri          # Full URI with arguments
$uri                  # URI without arguments
$args                 # Query string
$scheme               # http or https
$server_protocol      # HTTP version (HTTP/1.1, HTTP/2.0)

# Response information
$status               # Response status code
$body_bytes_sent      # Bytes sent to client
$bytes_sent           # Total bytes sent (including headers)

# Timing information
$request_time         # Total request processing time
$upstream_response_time    # Backend response time
$upstream_connect_time     # Time to connect to backend
$upstream_header_time      # Time to receive headers from backend

# Upstream information
$upstream_addr        # Backend server address
$upstream_status      # Backend response status
$upstream_cache_status     # Cache status (HIT, MISS, etc.)

# Headers
$http_referer         # Referer header
$http_user_agent      # User-Agent header
$http_x_forwarded_for # X-Forwarded-For header

# Time
$time_local           # Local time
$time_iso8601         # ISO 8601 format
$msec                 # Unix time in seconds, with millisecond resolution

2. Log Rotation

2.1. Logrotate Configuration

# /etc/logrotate.d/nginx
# Comments sit on their own lines: the logrotate man page only documents
# comments whose first non-whitespace character is '#'.
/var/log/nginx/*.log {
    # Rotate daily
    daily
    # Don't error if log is missing
    missingok
    # Keep 14 days of logs
    rotate 14
    # Compress rotated logs
    compress
    # Leave the newest rotated file uncompressed until the next rotation
    delaycompress
    # Don't rotate if empty
    notifempty
    # Create new file with permissions
    create 0640 nginx adm
    # Run postrotate once for all logs
    sharedscripts
    # Signal nginx to reopen its log files
    postrotate
        if [ -f /var/run/nginx.pid ]; then
            kill -USR1 `cat /var/run/nginx.pid`
        fi
    endscript
}

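Weekly rotation (an alternative to the daily config above; logrotate reports a duplicate log entry when two configs match the same file):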

# /etc/logrotate.d/nginx-weekly
/var/log/nginx/access.log
/var/log/nginx/error.log {
    weekly
    rotate 52
    compress
    delaycompress
    notifempty
    create 0640 nginx adm
    sharedscripts
    postrotate
        [ -f /var/run/nginx.pid ] && kill -USR1 `cat /var/run/nginx.pid`
    endscript
}

Size-based rotation (use in place of a time-based config, not alongside it):

# /etc/logrotate.d/nginx-size
/var/log/nginx/*.log {
    # Rotate when file reaches 100MB
    size 100M
    # Keep 10 rotated files
    rotate 10
    compress
    delaycompress
    notifempty
    create 0640 nginx adm
    sharedscripts
    postrotate
        [ -f /var/run/nginx.pid ] && kill -USR1 `cat /var/run/nginx.pid`
    endscript
}

Test logrotate:

# Test configuration
sudo logrotate -d /etc/logrotate.d/nginx

# Force rotation
sudo logrotate -f /etc/logrotate.d/nginx

# Check status
cat /var/lib/logrotate/status

2.2. Manual Log Rotation Script

#!/bin/bash
# rotate_nginx_logs.sh

LOG_DIR="/var/log/nginx"
BACKUP_DIR="/var/log/nginx/archive"
DAYS_TO_KEEP=30

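# Create backup directory
mkdir -p "$BACKUP_DIR"

# Get current date
DATE=$(date +%Y%m%d-%H%M%S)

# Move the live logs aside and create empty replacements.
# Nginx keeps writing to the moved files until it reopens its logs.
for log in access.log error.log; do
    if [ -f "$LOG_DIR/$log" ]; then
        mv "$LOG_DIR/$log" "$BACKUP_DIR/${log%.*}-$DATE.log"
        touch "$LOG_DIR/$log"
        chmod 640 "$LOG_DIR/$log"
        chown nginx:adm "$LOG_DIR/$log"
    fi
done

# Tell Nginx to reopen its log files (this is not a configuration reload)
nginx -s reopen
sleep 1

# Compress only after Nginx has switched to the new files,
# otherwise lines written in between would be lost
for archived in "$BACKUP_DIR"/*-"$DATE".log; do
    [ -f "$archived" ] && gzip "$archived"
done

# Delete old logs
find "$BACKUP_DIR" -name "*.gz" -mtime +"$DAYS_TO_KEEP" -delete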

echo "Log rotation complete: $DATE"

Cron job:

# /etc/cron.d/nginx-logrotate
0 0 * * * root /usr/local/bin/rotate_nginx_logs.sh >> /var/log/nginx-rotation.log 2>&1

3. Log Analysis Tools

3.1. GoAccess (Real-time Web Log Analyzer)

Install GoAccess:

# Ubuntu/Debian
sudo apt install goaccess

# CentOS/RHEL
sudo yum install goaccess

# macOS
brew install goaccess

Analyze logs:

# Real-time terminal dashboard
sudo goaccess /var/log/nginx/access.log -c

# Generate HTML report
sudo goaccess /var/log/nginx/access.log -o /var/www/html/report.html --log-format=COMBINED

# Real-time HTML dashboard
sudo goaccess /var/log/nginx/access.log -o /var/www/html/report.html --log-format=COMBINED --real-time-html

# With custom log format
sudo goaccess /var/log/nginx/access.log --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u"' --date-format=%d/%b/%Y --time-format=%H:%M:%S

GoAccess configuration:

# /etc/goaccess/goaccess.conf

# Log format
log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
date-format %d/%b/%Y
time-format %H:%M:%S

# UI options
color-scheme 1
hl-header true

# Output options
html-prefs {"theme":"bright","perPage":10}
html-report-title "Nginx Statistics"

# Enable/disable panels
enable-panel VISITORS
enable-panel REQUESTS
enable-panel REQUESTS_STATIC
enable-panel NOT_FOUND
enable-panel HOSTS
enable-panel OS
enable-panel BROWSERS
enable-panel STATUS_CODES
enable-panel REFERRING_SITES
enable-panel KEYPHRASES
enable-panel GEO_LOCATION

3.2. AWK Scripts for Log Analysis

Requests per second:

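#!/bin/bash
# requests_per_second.sh

# Field 4 looks like [10/Oct/2024:13:55:36, so colon-separated parts 1-4
# keep the timestamp down to the second (parts 1-3 would group by minute).
# The log is already in time order, so no final sort is needed.
awk '{print $4}' /var/log/nginx/access.log | \
    cut -d: -f1-4 | \
    tr -d '[' | \
    uniq -c | \
    awk '{print $2, $1}'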

Top 10 IP addresses:

#!/bin/bash
# top_ips.sh

awk '{print $1}' /var/log/nginx/access.log | \
    sort | \
    uniq -c | \
    sort -rn | \
    head -10

Top 10 URLs:

#!/bin/bash
# top_urls.sh

awk '{print $7}' /var/log/nginx/access.log | \
    sort | \
    uniq -c | \
    sort -rn | \
    head -10

Status code distribution:

#!/bin/bash
# status_codes.sh

awk '{print $9}' /var/log/nginx/access.log | \
    sort | \
    uniq -c | \
    sort -rn

Average response time:

#!/bin/bash
# avg_response_time.sh

# Assumes $request_time is the last field on every line
# (for example a log_format that ends with ' $request_time')
awk '{sum+=$NF; count++} END {if (count) print "Average:", sum/count "s"}' /var/log/nginx/access.log

Requests by hour:

#!/bin/bash
# requests_by_hour.sh

awk '{print $4}' /var/log/nginx/access.log | \
    cut -d: -f2 | \
    sort -n | \
    uniq -c

404 errors:

#!/bin/bash
# 404_errors.sh

awk '$9 == "404" {print $7}' /var/log/nginx/access.log | \
    sort | \
    uniq -c | \
    sort -rn | \
    head -20

Bandwidth by IP:

#!/bin/bash
# bandwidth_by_ip.sh

awk '{ip[$1]+=$10} END {for (i in ip) print i, ip[i]/1024/1024 "MB"}' /var/log/nginx/access.log | \
    sort -k2 -rn | \
    head -10
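
The AWK one-liners rely on fixed field positions, which shift when a field contains unexpected spaces. As an alternative sketch, the same summaries can be computed in Python with a regular expression for the combined format. The pattern, the `summarize` helper and the sample lines are assumptions made for illustration; lines that do not match are simply skipped.

```python
import re
from collections import Counter

# One named group per field of the "combined" access log format.
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def summarize(lines, top=10):
    """Return top client IPs, status code counts and bytes sent per IP."""
    ips, statuses, bytes_by_ip = Counter(), Counter(), Counter()
    for line in lines:
        match = COMBINED_RE.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        ips[match["ip"]] += 1
        statuses[match["status"]] += 1
        if match["bytes"] != "-":
            bytes_by_ip[match["ip"]] += int(match["bytes"])
    return {
        "top_ips": ips.most_common(top),
        "statuses": dict(statuses),
        "bytes_by_ip": dict(bytes_by_ip),
    }

# Illustrative lines only, not real traffic.
sample = [
    '198.51.100.7 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2024:13:55:37 +0000] "GET /missing HTTP/1.1" 404 153 "-" "Mozilla/5.0"',
    '203.0.113.9 - alice [10/Oct/2024:13:55:38 +0000] "POST /api/login HTTP/1.1" 500 0 "https://example.com/" "curl/8.0"',
    'garbage line',
]
summary = summarize(sample)
print(summary)
```

To run it against a real file, pass an open file object instead of the sample list, for example `summarize(open("/var/log/nginx/access.log"))`.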

3.3. Complete Log Analysis Script

#!/bin/bash
# analyze_nginx_logs.sh

LOG_FILE="/var/log/nginx/access.log"
OUTPUT_DIR="/var/www/reports"
DATE=$(date +%Y-%m-%d)

mkdir -p $OUTPUT_DIR

echo "Nginx Log Analysis - $DATE" > $OUTPUT_DIR/report-$DATE.txt
echo "=====================================" >> $OUTPUT_DIR/report-$DATE.txt
echo "" >> $OUTPUT_DIR/report-$DATE.txt

# Total requests
echo "Total Requests:" >> $OUTPUT_DIR/report-$DATE.txt
wc -l < $LOG_FILE >> $OUTPUT_DIR/report-$DATE.txt
echo "" >> $OUTPUT_DIR/report-$DATE.txt

# Top 10 IPs
echo "Top 10 IP Addresses:" >> $OUTPUT_DIR/report-$DATE.txt
awk '{print $1}' $LOG_FILE | sort | uniq -c | sort -rn | head -10 >> $OUTPUT_DIR/report-$DATE.txt
echo "" >> $OUTPUT_DIR/report-$DATE.txt

# Top 10 URLs
echo "Top 10 URLs:" >> $OUTPUT_DIR/report-$DATE.txt
awk '{print $7}' $LOG_FILE | sort | uniq -c | sort -rn | head -10 >> $OUTPUT_DIR/report-$DATE.txt
echo "" >> $OUTPUT_DIR/report-$DATE.txt

# Status codes
echo "Status Code Distribution:" >> $OUTPUT_DIR/report-$DATE.txt
awk '{print $9}' $LOG_FILE | sort | uniq -c | sort -rn >> $OUTPUT_DIR/report-$DATE.txt
echo "" >> $OUTPUT_DIR/report-$DATE.txt

# Top user agents
echo "Top 10 User Agents:" >> $OUTPUT_DIR/report-$DATE.txt
awk -F'"' '{print $6}' $LOG_FILE | sort | uniq -c | sort -rn | head -10 >> $OUTPUT_DIR/report-$DATE.txt
echo "" >> $OUTPUT_DIR/report-$DATE.txt

# Requests by hour
echo "Requests by Hour:" >> $OUTPUT_DIR/report-$DATE.txt
awk '{print $4}' $LOG_FILE | cut -d: -f2 | sort -n | uniq -c >> $OUTPUT_DIR/report-$DATE.txt
echo "" >> $OUTPUT_DIR/report-$DATE.txt

# Top 404 errors
echo "Top 404 Errors:" >> $OUTPUT_DIR/report-$DATE.txt
awk '$9 == "404" {print $7}' $LOG_FILE | sort | uniq -c | sort -rn | head -10 >> $OUTPUT_DIR/report-$DATE.txt

echo "Report generated: $OUTPUT_DIR/report-$DATE.txt"

4. Prometheus + Grafana Integration

4.1. Nginx Prometheus Exporter

Install nginx-prometheus-exporter:

# Download latest release
wget https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v0.11.0/nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz

# Extract
tar -xzf nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz

# Move to bin
sudo mv nginx-prometheus-exporter /usr/local/bin/

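# Create a dedicated system user (the unit below runs as nginx-exporter)
sudo useradd --system --no-create-home --shell /usr/sbin/nologin nginx-exporter

# Create systemd service
sudo nano /etc/systemd/system/nginx-exporter.service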

Systemd service:

[Unit]
Description=Nginx Prometheus Exporter
After=network.target

[Service]
Type=simple
User=nginx-exporter
Group=nginx-exporter
ExecStart=/usr/local/bin/nginx-prometheus-exporter \
    -nginx.scrape-uri=http://localhost:8080/stub_status \
    -web.listen-address=:9113

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Enable stub_status in Nginx:

server {
    listen 8080;
    server_name localhost;
    
    location /stub_status {
        stub_status;
        access_log off;
        allow 127.0.0.1;
        deny all;
    }
}

Start exporter:

sudo systemctl daemon-reload
sudo systemctl start nginx-exporter
sudo systemctl enable nginx-exporter

# Check status
sudo systemctl status nginx-exporter

# Test metrics endpoint
curl http://localhost:9113/metrics
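
The metrics endpoint returns plain text in the Prometheus exposition format: comment lines starting with `#`, then `name value` samples. A minimal sketch of reading it in Python follows. The sample text is illustrative rather than captured output, `parse_metrics` is a hypothetical helper, and the parser assumes label values contain no spaces.

```python
def parse_metrics(text):
    """Parse simple "name value" samples from Prometheus text format.
    Comment lines (# HELP / # TYPE) and unparsable lines are skipped."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            samples[parts[0]] = float(parts[1])
        except ValueError:
            continue
    return samples

# Illustrative payload in the shape the exporter serves.
sample_text = """\
# HELP nginx_connections_active Active client connections
# TYPE nginx_connections_active gauge
nginx_connections_active 3
# HELP nginx_http_requests_total Total http requests
# TYPE nginx_http_requests_total counter
nginx_http_requests_total 1520
nginx_up 1
"""
metrics = parse_metrics(sample_text)
print(metrics)
```

Against a live exporter the text could be fetched with `urllib.request.urlopen("http://localhost:9113/metrics").read().decode()` and passed to the same function.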

4.2. Prometheus Configuration

# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'nginx'
    static_configs:
      - targets: ['localhost:9113']
        labels:
          instance: 'nginx-server-1'
          environment: 'production'
  
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
        labels:
          instance: 'nginx-server-1'

4.3. VTS Module (Advanced Metrics)

Install Nginx with VTS module:

# Clone VTS module
git clone https://github.com/vozlt/nginx-module-vts.git

# Download Nginx source
wget https://nginx.org/download/nginx-1.24.0.tar.gz
tar -xzf nginx-1.24.0.tar.gz
cd nginx-1.24.0

# Configure with VTS module
./configure --add-module=../nginx-module-vts \
    --prefix=/etc/nginx \
    --sbin-path=/usr/sbin/nginx \
    --conf-path=/etc/nginx/nginx.conf

# Compile and install
make
sudo make install

Configure VTS:

http {
    vhost_traffic_status_zone;
    
    server {
        listen 80;
        
        location /status {
            vhost_traffic_status_display;
            vhost_traffic_status_display_format html;
            access_log off;
            allow 127.0.0.1;
            deny all;
        }
    }
}

4.4. Grafana Dashboard

Install Grafana:

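# Ubuntu/Debian
sudo apt install -y apt-transport-https software-properties-common wget

# Import the Grafana signing key into its own keyring (apt-key is deprecated)
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null

# Add the repository, then install
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install grafana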

# Start Grafana
sudo systemctl start grafana-server
sudo systemctl enable grafana-server

Access Grafana:

URL: http://localhost:3000
Default login: admin/admin (Grafana prompts for a new password on first login)

Import Nginx dashboard:

Go to Dashboards → Import
Enter dashboard ID: 12708 (Nginx Prometheus Exporter)
Select Prometheus datasource
Import

Custom Grafana dashboard queries:

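# The stub_status-based exporter from 4.1 exposes connection metrics and an
# unlabeled nginx_http_requests_total only. Queries marked [other exporter]
# assume a different metrics source (for example a VTS- or log-based exporter)
# that provides a status label, request durations and byte counters; adjust
# the metric names to whatever that exporter actually exposes.

# Request rate
rate(nginx_http_requests_total[5m])

# Active connections
nginx_connections_active

# Error rate [other exporter]
rate(nginx_http_requests_total{status=~"5.."}[5m])

# Average response time [other exporter]
rate(nginx_http_request_duration_seconds_sum[5m]) /
rate(nginx_http_request_duration_seconds_count[5m])

# Request rate by status code [other exporter]
sum(rate(nginx_http_requests_total[5m])) by (status)

# Bandwidth [other exporter]
rate(nginx_http_request_bytes_total[5m])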
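
Most of these queries wrap a counter in `rate(...[5m])`, which turns an ever-growing total into a per-second figure. The sketch below approximates that calculation in Python under simplifying assumptions: it divides the total increase by the elapsed time and treats any drop in value as a counter reset (for example after an nginx restart). Prometheus additionally extrapolates to the window edges, which is omitted here, so results can differ slightly. `simple_rate` is a hypothetical name.

```python
def simple_rate(samples):
    """Approximate PromQL rate() for one counter series.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Returns the average per-second increase, or None if it cannot be
    computed (fewer than two samples or no elapsed time).
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value >= previous:
            increase += value - previous
        else:
            # Counter went backwards: assume it restarted from zero.
            increase += value
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None

# Scrapes every 15 s, 30 new requests between each scrape.
steady = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 220)]
print(simple_rate(steady))
```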

5. ELK Stack Integration

5.1. Elasticsearch Installation

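# Import Elasticsearch GPG key into its own keyring (apt-key is deprecated)
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg

# Add repository
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list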

# Install Elasticsearch
sudo apt update
sudo apt install elasticsearch

# Configure
sudo nano /etc/elasticsearch/elasticsearch.yml

Elasticsearch configuration:

# /etc/elasticsearch/elasticsearch.yml
cluster.name: nginx-logs
node.name: node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: localhost
http.port: 9200
discovery.type: single-node

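# Security (disabled so the plain-HTTP examples in this lesson work;
# only do this on an isolated lab machine, never on a reachable host)
xpack.security.enabled: false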

Start Elasticsearch:

sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch

# Test
curl http://localhost:9200

5.2. Logstash Configuration

Install Logstash:

sudo apt install logstash

Logstash pipeline:

# /etc/logstash/conf.d/nginx.conf
input {
  file {
    path => "/var/log/nginx/access.log"
    start_position => "beginning"
    type => "nginx-access"
  }
  
  file {
    path => "/var/log/nginx/error.log"
    start_position => "beginning"
    type => "nginx-error"
  }
}

filter {
  if [type] == "nginx-access" {
    grok {
      match => {
        "message" => '%{IPORHOST:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] "%{WORD:request_method} %{DATA:request_path} HTTP/%{NUMBER:http_version}" %{NUMBER:status} %{NUMBER:body_bytes_sent} "%{DATA:http_referer}" "%{DATA:http_user_agent}"'
      }
    }
    
    date {
      match => [ "time_local", "dd/MMM/yyyy:HH:mm:ss Z" ]
      target => "@timestamp"
    }
    
    geoip {
      source => "remote_addr"
      # A target is required when ECS compatibility is on (the Logstash 8 default)
      target => "geoip"
    }
    
    mutate {
      convert => {
        "status" => "integer"
        "body_bytes_sent" => "integer"
      }
    }
  }
  
  if [type] == "nginx-error" {
    grok {
      match => {
        "message" => "(?<timestamp>%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}[- ]%{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER:threadid}\: \*%{NUMBER:connectionid} %{GREEDYDATA:errormessage}"
      }
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "nginx-logs-%{+YYYY.MM.dd}"
  }
  
  # Debug output
  # stdout { codec => rubydebug }
}
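
Before deploying a grok pattern it helps to test the same idea against sample lines. The sketch below is a Python regular expression that roughly mirrors the nginx-error pattern above; it is an approximation written for illustration, not the grok pattern itself, and the sample lines are invented. One difference is deliberate: the connection id (`*42`) is optional here, whereas the grok pattern requires it, so error log lines without a connection id would not match that pattern.

```python
import re
from datetime import datetime

# Timestamp, [severity], pid#tid:, optional *connection id, then the message.
ERROR_RE = re.compile(
    r'(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) '
    r'\[(?P<severity>\w+)\] (?P<pid>\d+)#(?P<tid>\d+): '
    r'(?:\*(?P<connection>\d+) )?(?P<message>.*)'
)

def parse_error_line(line):
    """Return a dict of fields for one error log line, or None if no match."""
    match = ERROR_RE.match(line)
    if not match:
        return None
    entry = match.groupdict()
    entry["timestamp"] = datetime.strptime(entry["timestamp"], "%Y/%m/%d %H:%M:%S")
    return entry

# Illustrative lines only.
with_conn = ('2024/10/10 13:55:36 [error] 1234#1234: *42 open() '
             '"/var/www/html/missing" failed (2: No such file or directory), '
             'client: 203.0.113.5, server: example.com')
without_conn = '2024/10/10 13:55:00 [notice] 1#1: signal process started'

print(parse_error_line(with_conn)["severity"])
print(parse_error_line(without_conn)["message"])
```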

Start Logstash:

# Test configuration first
sudo -u logstash /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t

# Then start and enable the service
sudo systemctl start logstash
sudo systemctl enable logstash

5.3. Kibana Installation

# Install Kibana
sudo apt install kibana

# Configure
sudo nano /etc/kibana/kibana.yml

Kibana configuration:

# /etc/kibana/kibana.yml
server.port: 5601
server.host: "localhost"
elasticsearch.hosts: ["http://localhost:9200"]

Start Kibana:

sudo systemctl start kibana
sudo systemctl enable kibana

# Access at: http://localhost:5601

Configure Nginx reverse proxy for Kibana:

server {
    listen 80;
    server_name kibana.example.com;
    
    location / {
        proxy_pass http://localhost:5601;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}

5.4. Kibana Visualizations

Create index pattern:

Management → Index Patterns (named Data Views in Kibana 8)
Create index pattern: nginx-logs-*
Select time field: @timestamp

Sample visualizations:

Request Rate Over Time
- Type: Line chart
- Y-axis: Count
- X-axis: Date Histogram (@timestamp)

Status Code Distribution
- Type: Pie chart
- Slice by: status (converted to an integer by the mutate filter in 5.2, so it has no status.keyword sub-field)

Top 10 URLs
- Type: Data table
- Metrics: Count
- Buckets: request_path.keyword

Geographic Distribution
- Type: Coordinate map
- Geohash: geoip.location

Error Rate
- Type: Metric
- Aggregation: Count
- Filter: status >= 400

5.5. Filebeat Alternative

# Install Filebeat
sudo apt install filebeat

# Configure
sudo nano /etc/filebeat/filebeat.yml

Filebeat configuration:

# /etc/filebeat/filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log
  fields:
    log_type: nginx-access
  
- type: log
  enabled: true
  paths:
    - /var/log/nginx/error.log
  fields:
    log_type: nginx-error

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

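setup.template.settings:
  index.number_of_shards: 1

# A custom index name also needs a matching template name and pattern
setup.template.name: "filebeat-nginx"
setup.template.pattern: "filebeat-nginx-*"

output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "filebeat-nginx-%{+yyyy.MM.dd}"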

setup.kibana:
  host: "localhost:5601"

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

Enable Nginx module (use it in place of the manual inputs above; with both enabled, each log file is read twice):

sudo filebeat modules enable nginx

# Configure module
sudo nano /etc/filebeat/modules.d/nginx.yml

# /etc/filebeat/modules.d/nginx.yml
- module: nginx
  access:
    enabled: true
    var.paths: ["/var/log/nginx/access.log"]
  
  error:
    enabled: true
    var.paths: ["/var/log/nginx/error.log"]

Start Filebeat:

# Setup
sudo filebeat setup -e

# Start
sudo systemctl start filebeat
sudo systemctl enable filebeat

6. Alerting Systems

6.1. Prometheus Alertmanager

Install Alertmanager:

wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar -xzf alertmanager-0.26.0.linux-amd64.tar.gz
sudo mv alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
sudo mv alertmanager-0.26.0.linux-amd64/amtool /usr/local/bin/

Alertmanager configuration:

# /etc/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'your-password'

route:
  group_by: ['alertname', 'cluster']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'email-notifications'

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'admin@example.com'
        headers:
          Subject: 'Nginx Alert: {{ .GroupLabels.alertname }}'
  
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#alerts'
        title: 'Nginx Alert'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']

Prometheus alert rules:

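# /etc/prometheus/rules/nginx_alerts.yml
# Loaded only if prometheus.yml lists it under rule_files, for example:
#   rule_files: ['/etc/prometheus/rules/*.yml']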
groups:
  - name: nginx
    interval: 30s
    rules:
      # High error rate (needs an exporter that labels requests by status;
      # the stub_status-based exporter does not)
      - alert: NginxHighErrorRate
        expr: rate(nginx_http_requests_total{status=~"5.."}[5m]) > 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate on {{ $labels.instance }}"
          description: "Error rate is {{ $value }} errors/sec"
      
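      # High response time (needs an exporter that exposes request duration
      # quantiles; the stub_status-based exporter does not)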
      - alert: NginxHighResponseTime
        expr: nginx_http_request_duration_seconds{quantile="0.99"} > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time on {{ $labels.instance }}"
          description: "99th percentile response time is {{ $value }}s"
      
      # Nginx down
      - alert: NginxDown
        expr: up{job="nginx"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Nginx is down on {{ $labels.instance }}"
          description: "Nginx exporter is not responding"
      
      # High connection count
      - alert: NginxHighConnections
        expr: nginx_connections_active > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High connection count on {{ $labels.instance }}"
          description: "Active connections: {{ $value }}"
      
      # Low request rate (possible issue)
      - alert: NginxLowRequestRate
        expr: rate(nginx_http_requests_total[5m]) < 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Low request rate on {{ $labels.instance }}"
          description: "Request rate is {{ $value }} req/sec"
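
Each rule combines an expression with a `for:` duration: the condition must stay true across evaluations for that long before the alert fires, which filters out short spikes. The sketch below models that behavior in simplified form for a "value above threshold" rule. `alert_states` is a hypothetical helper, and real Prometheus evaluation has more detail (per-series state, staleness handling) than shown here.

```python
def alert_states(samples, threshold, for_seconds):
    """Model the state of a "value > threshold" alert with a for: duration.

    samples: list of (timestamp_seconds, value), one per rule evaluation.
    Returns one state per evaluation: "inactive", "pending" (condition true
    but not yet for long enough) or "firing".
    """
    states = []
    active_since = None
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp
            held_for = timestamp - active_since
            states.append("firing" if held_for >= for_seconds else "pending")
        else:
            active_since = None  # condition cleared, the timer restarts
            states.append("inactive")
    return states

# Evaluations every 30 s against a rule like nginx_connections_active > 1000, for: 60s.
sustained = [(0, 1200), (30, 1200), (60, 1200), (90, 1200)]
spike = [(0, 1200), (30, 1200), (60, 900), (90, 1200)]
print(alert_states(sustained, 1000, 60))
print(alert_states(spike, 1000, 60))
```

The second series never fires because the dip at 60 s resets the timer, which is the reason to pick `for:` values longer than normal traffic fluctuations.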

6.2. Email Alerts Script

#!/bin/bash
# nginx_alert.sh

THRESHOLD_ERROR_RATE=100
THRESHOLD_RESPONSE_TIME=5
EMAIL="admin@example.com"
LOG_FILE="/var/log/nginx/access.log"

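# Count 5xx responses (field 9 is $status in the combined format).
# Matching the field avoids counting byte sizes such as " 512 ".
# The count covers the whole file, i.e. everything since the last rotation.
ERROR_COUNT=$(awk '$9 ~ /^5[0-9][0-9]$/ {n++} END {print n+0}' "$LOG_FILE")

if [ "$ERROR_COUNT" -gt "$THRESHOLD_ERROR_RATE" ]; then
    echo "ALERT: High error rate detected: $ERROR_COUNT errors" | \
        mail -s "Nginx Alert: High Error Rate" "$EMAIL"
fi

# Check average response time (assumes $request_time is the last field)
AVG_RESPONSE_TIME=$(awk '{sum+=$NF; count++} END {if (count) print sum/count; else print 0}' "$LOG_FILE")

if (( $(echo "$AVG_RESPONSE_TIME > $THRESHOLD_RESPONSE_TIME" | bc -l) )); then
    echo "ALERT: High response time: ${AVG_RESPONSE_TIME}s" | \
        mail -s "Nginx Alert: High Response Time" "$EMAIL"
fi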

6.3. Slack Webhook Integration

#!/bin/bash
# slack_alert.sh

WEBHOOK_URL="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
LOG_FILE="/var/log/nginx/error.log"

# Count lines in the error log (a total since the last rotation, not a rate)
ERROR_COUNT=$(wc -l < "$LOG_FILE")

if [ "$ERROR_COUNT" -gt 100 ]; then
    MESSAGE="⚠️ Nginx Alert: $ERROR_COUNT errors detected!"
    
    curl -X POST "$WEBHOOK_URL" \
        -H 'Content-Type: application/json' \
        -d "{\"text\":\"$MESSAGE\"}"
fi
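
Building the JSON body by string interpolation works for a fixed message, but it breaks as soon as the text contains a double quote or a newline (for example a quoted error log line). A minimal Python sketch that lets the JSON encoder handle escaping is shown below. `build_payload` and `send_slack` are hypothetical helper names, and the webhook URL is the same placeholder used in the script above.

```python
import json
import urllib.request

def build_payload(message: str) -> bytes:
    """Encode the message as a Slack webhook JSON body, escaping as needed."""
    return json.dumps({"text": message}).encode("utf-8")

def send_slack(webhook_url: str, message: str, timeout: float = 5.0) -> int:
    """POST the message to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=build_payload(message),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status

# Quotes and newlines survive the round trip.
tricky = 'upstream said "502 Bad Gateway"\nsecond line'
print(build_payload(tricky).decode("utf-8"))
```

Calling `send_slack("https://hooks.slack.com/services/YOUR/WEBHOOK/URL", tricky)` would send it once a real webhook URL is substituted.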

6.4. Automated Response Script

#!/bin/bash
# auto_response.sh

LOG_FILE="/var/log/nginx/error.log"
THRESHOLD=100

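# Count errors logged in the last 5 minutes by comparing each line's
# timestamp (nginx writes "YYYY/MM/DD HH:MM:SS" first). Requires GNU date.
# Counting every line of a recently modified file would overstate the number.
CUTOFF=$(date -d '5 minutes ago' '+%Y/%m/%d %H:%M:%S')
ERROR_COUNT=$(awk -v cutoff="$CUTOFF" \
    '$1 ~ /^[0-9][0-9][0-9][0-9]\// && ($1 " " $2) >= cutoff {n++} END {print n+0}' "$LOG_FILE")

if [ "$ERROR_COUNT" -gt "$THRESHOLD" ]; then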
    echo "High error rate detected: $ERROR_COUNT errors"
    
    # Reload Nginx
    echo "Reloading Nginx..."
    systemctl reload nginx
    
    # Send alert
    echo "High error rate triggered Nginx reload: $ERROR_COUNT errors" | \
        mail -s "Nginx Auto-Response" admin@example.com
    
    # Log action
    echo "$(date): Auto-reload triggered due to $ERROR_COUNT errors" >> /var/log/nginx-auto-response.log
fi

7. Real-time Monitoring Dashboard

7.1. Custom HTML Dashboard

<!DOCTYPE html>
<html>
<head>
    <title>Nginx Real-time Monitor</title>
    <meta http-equiv="refresh" content="5">
    <style>
        body {
            font-family: Arial, sans-serif;
            margin: 20px;
            background: #f5f5f5;
        }
        .container {
            max-width: 1200px;
            margin: 0 auto;
        }
        .metric {
            background: white;
            padding: 20px;
            margin: 10px 0;
            border-radius: 5px;
            box-shadow: 0 2px 4px rgba(0,0,0,0.1);
        }
        .metric h2 {
            margin: 0 0 10px 0;
            color: #333;
        }
        .value {
            font-size: 32px;
            font-weight: bold;
            color: #007bff;
        }
        .alert {
            background: #ff4444;
            color: white;
        }
        .warning {
            background: #ffaa00;
            color: white;
        }
        .ok {
            background: #00c851;
            color: white;
        }
    </style>
</head>
<body>
    <div class="container">
        <h1>Nginx Real-time Monitor</h1>
        
        <?php
        // Read Nginx stats; fall back to an empty string if the endpoint is unreachable
        $stats = @file_get_contents('http://localhost:8080/stub_status');
        if ($stats === false) {
            $stats = '';
        }
        
        preg_match('/Active connections: (\d+)/', $stats, $active);
        preg_match('/(\d+)\s+(\d+)\s+(\d+)/', $stats, $totals);
        preg_match('/Reading: (\d+) Writing: (\d+) Waiting: (\d+)/', $stats, $current);
        
        // Default to 0 when a value could not be parsed
        $activeConnections = (int) ($active[1] ?? 0);
        $accepts = (int) ($totals[1] ?? 0);
        $handled = (int) ($totals[2] ?? 0);
        $requests = (int) ($totals[3] ?? 0);
        $reading = (int) ($current[1] ?? 0);
        $writing = (int) ($current[2] ?? 0);
        $waiting = (int) ($current[3] ?? 0);
        
        // Determine status
        $status = 'ok';
        if ($activeConnections > 1000) $status = 'alert';
        elseif ($activeConnections > 500) $status = 'warning';
        ?>
        
        <div class="metric <?php echo $status; ?>">
            <h2>Active Connections</h2>
            <div class="value"><?php echo $activeConnections; ?></div>
        </div>
        
        <div class="metric">
            <h2>Total Requests</h2>
            <div class="value"><?php echo number_format($requests); ?></div>
        </div>
        
        <div class="metric">
            <h2>Connection Details</h2>
            <p>Reading: <?php echo $reading; ?></p>
            <p>Writing: <?php echo $writing; ?></p>
            <p>Waiting: <?php echo $waiting; ?></p>
        </div>
        
        <div class="metric">
            <h2>Server Stats</h2>
            <p>Accepts: <?php echo number_format($accepts); ?></p>
            <p>Handled: <?php echo number_format($handled); ?></p>
            <p>Requests per connection: <?php echo $handled > 0 ? round($requests/$handled, 2) : 0; ?></p>
        </div>
        
        <p><small>Last updated: <?php echo date('Y-m-d H:i:s'); ?></small></p>
    </div>
</body>
</html>

7.2. Python Real-time Monitor

#!/usr/bin/env python3
# nginx_monitor.py

import requests
import time
import re
from datetime import datetime

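def get_nginx_stats():
    """Fetch Nginx stub_status"""
    try:
        # A timeout keeps the loop from hanging if Nginx stops answering
        response = requests.get('http://localhost:8080/stub_status', timeout=5)
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        print(f"Error fetching stats: {e}")
        return None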

def parse_stats(stats):
    """Parse stub_status output"""
    if not stats:
        return None
    
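    active = re.search(r'Active connections: (\d+)', stats)
    totals = re.search(r'(\d+)\s+(\d+)\s+(\d+)', stats)
    current = re.search(r'Reading: (\d+) Writing: (\d+) Waiting: (\d+)', stats)
    
    # Bail out if the response does not look like stub_status output
    if not (active and totals and current):
        return None
    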
    return {
        'active_connections': int(active.group(1)),
        'accepts': int(totals.group(1)),
        'handled': int(totals.group(2)),
        'requests': int(totals.group(3)),
        'reading': int(current.group(1)),
        'writing': int(current.group(2)),
        'waiting': int(current.group(3)),
        'timestamp': datetime.now()
    }

def display_stats(stats):
    """Display stats in terminal"""
    print("\033[2J\033[H")  # Clear screen
    print("=" * 50)
    print("Nginx Real-time Monitor")
    print("=" * 50)
    print(f"Time: {stats['timestamp'].strftime('%Y-%m-%d %H:%M:%S')}")
    print()
    print(f"Active Connections: {stats['active_connections']}")
    print(f"Total Requests: {stats['requests']:,}")
    print()
    print(f"Reading: {stats['reading']}")
    print(f"Writing: {stats['writing']}")
    print(f"Waiting: {stats['waiting']}")
    print()
    print(f"Accepts: {stats['accepts']:,}")
    print(f"Handled: {stats['handled']:,}")
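    # Avoid division by zero right after Nginx starts, when handled is still 0
    ratio = stats['requests'] / stats['handled'] if stats['handled'] else 0.0
    print(f"Requests/Connection: {ratio:.2f}")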
    print()
    
    # Alerts
    if stats['active_connections'] > 1000:
        print("⚠️  ALERT: High connection count!")
    elif stats['active_connections'] > 500:
        print("⚠️  WARNING: Elevated connection count")
    else:
        print("✓ Status: OK")

def main():
    print("Starting Nginx monitor...")
    print("Press Ctrl+C to exit")
    time.sleep(2)
    
    try:
        while True:
            stats_text = get_nginx_stats()
            stats = parse_stats(stats_text)
            
            if stats:
                display_stats(stats)
            
            time.sleep(5)
    except KeyboardInterrupt:
        print("\nMonitor stopped")

if __name__ == '__main__':
    main()
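
The accepts, handled, and requests counters in stub_status are cumulative since Nginx started, so the monitor above shows totals rather than current load. A minimal sketch of how two successive samples could be turned into per-second rates follows. It assumes the dict layout returned by parse_stats above; the function name compute_rates and the keys of its result are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def compute_rates(prev, curr):
    """Return per-second rates between two parsed stub_status samples.

    Both arguments are dicts with 'timestamp', 'accepts', 'handled' and
    'requests' keys. Returns None when no rate can be computed.
    """
    elapsed = (curr["timestamp"] - prev["timestamp"]).total_seconds()
    if elapsed <= 0:
        return None  # Samples are out of order or taken at the same instant

    d_requests = curr["requests"] - prev["requests"]
    d_accepts = curr["accepts"] - prev["accepts"]
    d_handled = curr["handled"] - prev["handled"]
    if min(d_requests, d_accepts, d_handled) < 0:
        return None  # Counters went backwards, e.g. Nginx was restarted

    return {
        "requests_per_sec": d_requests / elapsed,
        "connections_per_sec": d_accepts / elapsed,
        # Connections accepted but not handled during this interval
        "dropped_connections": d_accepts - d_handled,
    }


if __name__ == "__main__":
    # Hand-written sample values, not real measurements
    t0 = datetime(2024, 1, 1, 12, 0, 0)
    prev = {"timestamp": t0, "accepts": 100, "handled": 100, "requests": 1000}
    curr = {"timestamp": t0 + timedelta(seconds=5),
            "accepts": 150, "handled": 148, "requests": 1250}
    print(compute_rates(prev, curr))
```

In the monitor loop, this would mean keeping the previous sample in a variable and calling compute_rates(previous, stats) on each iteration. Alerting on a request rate instead of a raw connection count may suit some workloads better, but the right thresholds depend on the site.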

Summary

In this lesson, you learned:

✅ Access log and error log configuration
✅ Custom log formats (JSON, detailed, performance)
✅ Log rotation with logrotate
✅ Log analysis tools (GoAccess, AWK scripts)
✅ Prometheus + Grafana monitoring
✅ ELK Stack integration (Elasticsearch, Logstash, Kibana)
✅ Alerting systems (Alertmanager, email, Slack)
✅ Real-time dashboards
✅ Performance metrics and troubleshooting

Key takeaways:

Use structured logging (JSON) for better analysis
Implement log rotation to manage disk space
Set up real-time monitoring for quick issue detection
Create dashboards for visualization
Configure alerts for critical issues
Automate log analysis with scripts
Integrate with centralized logging systems
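
To illustrate the first and sixth takeaways, here is a rough sketch of how a JSON access log could be summarized in a few lines of Python. It assumes one JSON object per line and the key names "status" and "request_time"; these keys are assumptions and must match whatever your own log_format defines. The function name summarize_json_log is hypothetical.

```python
import json
from collections import Counter


def summarize_json_log(lines):
    """Count responses per status code and average the request time.

    `lines` is any iterable of strings, such as an open log file.
    Lines that are not valid JSON objects are counted and skipped.
    """
    status_counts = Counter()
    total_time = 0.0
    timed = 0
    skipped = 0

    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1
            continue
        if not isinstance(entry, dict):
            skipped += 1
            continue

        # Key names are assumptions; adjust them to your log_format
        status_counts[str(entry.get("status", "unknown"))] += 1
        try:
            total_time += float(entry["request_time"])
            timed += 1
        except (KeyError, TypeError, ValueError):
            pass  # Entry has no usable request_time

    return {
        "status_counts": dict(status_counts),
        "avg_request_time": total_time / timed if timed else None,
        "skipped": skipped,
    }


if __name__ == "__main__":
    # Hand-written sample lines, not real log data
    sample = [
        '{"status": 200, "request_time": 0.1}',
        '{"status": "200", "request_time": "0.3"}',
        '{"status": 502}',
        'not json',
    ]
    print(summarize_json_log(sample))
```

With a plain-text format, the same summary needs a regular expression or AWK field positions that break whenever the format changes. With JSON, each field is looked up by name.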

Monitoring checklist:

✅ Access and error logs configured
✅ Log rotation enabled
✅ Prometheus exporter running
✅ Grafana dashboards created
✅ Alerting rules configured
✅ ELK stack (optional) integrated
✅ Regular log analysis scheduled
✅ Backup and retention policies defined

Next lesson: High Availability and Advanced Load Balancing - Nginx Plus features, health checks, session persistence, active-active setups, failover strategies, and disaster recovery to maximize uptime in production environments.

DUY TRAN
