Lesson 12: Monitoring and Logging in NGINX

A lesson on monitoring and logging in Nginx: access log analysis, custom log formats, log rotation with logrotate, real-time monitoring tools, Prometheus + Grafana integration, the ELK Stack (Elasticsearch, Logstash, Kibana), alerting systems, performance metrics, and troubleshooting.


1. Access Logs và Error Logs

1.1. Default Log Configuration

http {
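    # The "main" format from the stock nginx.conf (the built-in default
    # format is "combined", which is the same without $http_x_forwarded_for)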
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    
    # Access log
    access_log /var/log/nginx/access.log main;
    
    # Error log with level
    error_log /var/log/nginx/error.log warn;
    
    server {
        listen 80;
        server_name example.com;
        
        # Server-specific logs
        access_log /var/log/nginx/example.com.access.log main;
        error_log /var/log/nginx/example.com.error.log;
        
        location / {
            root /var/www/html;
        }
        
        # Disable logging for specific location
        location /health {
            access_log off;
            return 200 "OK\n";
        }
        
        # Location-specific log
        location /api/ {
            access_log /var/log/nginx/api.access.log main;
            proxy_pass http://backend;
        }
    }
}
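To sanity-check what the `main` format produces, a minimal Python sketch can split one line back into its fields. The regex, the function name `parse_main`, and the sample line are illustrative assumptions, not part of Nginx; the sketch assumes the format exactly as defined above and Nginx's default escaping.

```python
import re
from typing import Optional

# Matches one line of the "main" log_format above. Assumes the default
# escaping, where Nginx writes a double quote inside a value as \x22, so
# quoted fields never contain a raw '"'.
MAIN_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)"'
)


def parse_main(line: str) -> Optional[dict]:
    """Return the fields of one 'main'-format line, or None if it doesn't match."""
    m = MAIN_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Convert the numeric fields; the rest stay strings ("-" means empty)
    fields["status"] = int(fields["status"])
    fields["body_bytes_sent"] = int(fields["body_bytes_sent"])
    return fields


sample = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0700] "GET /api/users HTTP/1.1" '
          '200 512 "-" "curl/8.0" "-"')
print(parse_main(sample))
```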

1.2. Error Log Levels

# Error log levels (most to least severe). Each error_log directive takes
# one level; messages at that level and at every more severe level are
# written.
# error_log /var/log/nginx/error.log emerg;   # System is unusable
# error_log /var/log/nginx/error.log alert;   # Action must be taken immediately
# error_log /var/log/nginx/error.log crit;    # Critical conditions
# error_log /var/log/nginx/error.log error;   # Error conditions (default)
# error_log /var/log/nginx/error.log warn;    # Warning conditions
# error_log /var/log/nginx/error.log notice;  # Normal but significant
# error_log /var/log/nginx/error.log info;    # Informational messages
# error_log /var/log/nginx/error.log debug;   # Debug messages

# Recommended for production
error_log /var/log/nginx/error.log warn;

# For debugging (requires an nginx binary built with --with-debug;
# check with: nginx -V 2>&1 | grep -- --with-debug)
error_log /var/log/nginx/error.log debug;
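The level rule above (a configured level enables itself plus every more severe level) fits in a few lines of Python. This is only an illustration of the documented ordering; `LEVELS` and `is_logged` are hypothetical names, not an Nginx API.

```python
# Severity order used by error_log, most severe first.
LEVELS = ["emerg", "alert", "crit", "error", "warn", "notice", "info", "debug"]


def is_logged(message_level: str, configured_level: str) -> bool:
    """True if a message at message_level is written when error_log uses configured_level."""
    return LEVELS.index(message_level) <= LEVELS.index(configured_level)


print([lvl for lvl in LEVELS if is_logged(lvl, "warn")])
# → ['emerg', 'alert', 'crit', 'error', 'warn']
```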

1.3. Custom Log Formats

Detailed log format:

http {
    log_format detailed '$remote_addr - $remote_user [$time_local] '
                       '"$request" $status $body_bytes_sent '
                       '"$http_referer" "$http_user_agent" '
                       'rt=$request_time uct="$upstream_connect_time" '
                       'uht="$upstream_header_time" urt="$upstream_response_time"';
    
    access_log /var/log/nginx/access.log detailed;
}
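The upstream timing fields in the `detailed` format are not always a single number: when Nginx tries several upstream servers, the values are separated by commas (and by colons across internal redirects), and a missing value is logged as `-`. A minimal Python sketch that extracts `rt`, `uct`, `uht` and `urt` and sums each field's parts could look like this; the function names and the sample line are illustrative assumptions.

```python
import re
from typing import Dict, Optional

# rt is unquoted; the upstream fields are quoted because they can hold
# several comma-separated values.
TIMING_RE = re.compile(r'\b(rt|uct|uht|urt)=(?:"([^"]*)"|([^"\s]+))')


def sum_times(value: str) -> Optional[float]:
    """Add up the parts of a timing value such as "0.002, 0.010".

    Each upstream attempt contributes one value (comma-separated); internal
    redirects add ':' separators. Non-numeric parts such as '-' are skipped.
    """
    times = []
    for part in re.split(r"[,:]", value):
        try:
            times.append(float(part.strip()))
        except ValueError:
            continue  # "-" or an empty part
    return round(sum(times), 3) if times else None


def parse_timings(line: str) -> Dict[str, Optional[float]]:
    """Extract rt/uct/uht/urt from a line in the 'detailed' format."""
    result = {}
    for key, quoted, bare in TIMING_RE.findall(line):
        # Later matches win, so the real fields at the end of the line
        # override lookalike text earlier in it (e.g. inside a referer)
        result[key] = sum_times(quoted or bare)
    return result


line = ('198.51.100.4 - - [10/Oct/2023:13:55:36 +0700] "GET /api HTTP/1.1" 200 42 '
        '"-" "curl/8.0" rt=0.015 uct="0.001" uht="0.010, 0.004" urt="0.012, 0.005"')
print(parse_timings(line))
```

Summing is one possible choice; for "time spent on the last attempt" you would take the final value instead.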

JSON log format:

http {
    log_format json_combined escape=json
    '{'
        '"time_local":"$time_local",'
        '"remote_addr":"$remote_addr",'
        '"remote_user":"$remote_user",'
        '"request":"$request",'
        '"status": "$status",'
        '"body_bytes_sent":"$body_bytes_sent",'
        '"request_time":"$request_time",'
        '"http_referrer":"$http_referer",'
        '"http_user_agent":"$http_user_agent",'
        '"http_x_forwarded_for":"$http_x_forwarded_for",'
        '"upstream_addr":"$upstream_addr",'
        '"upstream_status":"$upstream_status",'
        '"upstream_response_time":"$upstream_response_time",'
        '"upstream_connect_time":"$upstream_connect_time",'
        '"upstream_header_time":"$upstream_header_time"'
    '}';
    
    access_log /var/log/nginx/access.log json_combined;
}
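The payoff of the JSON format is that each line can be parsed without a regex. As a sketch (the function name and the trimmed sample lines are illustrative assumptions), this counts status classes and slow requests; note that the format above quotes every value, so numbers arrive as strings.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_json_log(lines: Iterable[str], slow_threshold: float = 1.0) -> Tuple[Dict[str, int], int]:
    """Count status classes (2xx, 5xx, ...) and requests at or above slow_threshold seconds."""
    classes: Counter = Counter()
    slow = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # Values are JSON strings in this format, so convert explicitly
        classes[entry["status"][0] + "xx"] += 1
        if float(entry["request_time"]) >= slow_threshold:
            slow += 1
    return dict(classes), slow


# Sample entries, trimmed to the fields the function reads
log = [
    '{"status": "200", "request_time": "0.012", "request": "GET / HTTP/1.1"}',
    '{"status": "502", "request_time": "3.001", "request": "GET /api HTTP/1.1"}',
    '{"status": "200", "request_time": "1.500", "request": "GET /big HTTP/1.1"}',
]
print(summarize_json_log(log))  # → ({'2xx': 2, '5xx': 1}, 2)
```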

Cache status log:

http {
    log_format cache_status '$remote_addr - [$time_local] "$request" '
                           '$status $body_bytes_sent '
                           '"$http_referer" "$http_user_agent" '
                           'cache_status=$upstream_cache_status '
                           'response_time=$request_time';
    
    access_log /var/log/nginx/cache.log cache_status;
}
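A common use of the `cache_status` log is the cache hit ratio. A minimal sketch, assuming the `cache_status=` field format above: a value of `-` means the request never went through the cache, so it is excluded from the denominator. Only `HIT` counts as a hit here; whether `STALE` or `UPDATING` (also served from cache) should count is a judgment call. The function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

CACHE_RE = re.compile(r"cache_status=(\S+)")


def cache_hit_ratio(lines: Iterable[str]) -> Tuple[Counter, float]:
    """Count $upstream_cache_status values and return HIT / cache-eligible requests."""
    counts: Counter = Counter()
    for line in lines:
        m = CACHE_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    # "-" means the request was not handled by the cache at all
    eligible = sum(n for status, n in counts.items() if status != "-")
    ratio = counts["HIT"] / eligible if eligible else 0.0
    return counts, ratio


sample = [f'203.0.113.7 - [10/Oct/2023:13:55:36 +0700] "GET /a HTTP/1.1" 200 10 '
          f'"-" "curl/8.0" cache_status={s} response_time=0.001'
          for s in ["HIT", "HIT", "MISS", "-"]]
counts, ratio = cache_hit_ratio(sample)
print(dict(counts), round(ratio, 3))
```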

Performance tracking log:

http {
    log_format performance '$time_iso8601 $remote_addr '
                          '"$request" $status $body_bytes_sent '
                          'rt=$request_time '
                          'ua="$upstream_addr" '
                          'us=$upstream_status '
                          'ut=$upstream_response_time '
                          'ul="$upstream_response_length" '
                          'cs=$upstream_cache_status';
    
    access_log /var/log/nginx/performance.log performance;
}

Security log:

http {
    log_format security '$remote_addr [$time_local] '
                       '"$request" $status '
                       'user_agent="$http_user_agent" '
                       'referer="$http_referer" '
                       'forwarded_for="$http_x_forwarded_for" '
                       'host="$host"';
    
    # Log only requests whose status is not 2xx/3xx (mostly 4xx/5xx errors)
    map $status $loggable {
        ~^[23] 0;
        default 1;
    }
    
    access_log /var/log/nginx/security.log security if=$loggable;
}

1.4. Conditional Logging

http {
    # Don't log health-check requests
    # ($uri excludes the query string, so /health?check=1 also matches)
    map $uri $loggable_request {
        ~^/health$ 0;
        ~^/ping$ 0;
        default 1;
    }
    
    # Don't log static files
    map $uri $loggable_static {
        ~*\.(jpg|jpeg|png|gif|ico|css|js)$ 0;
        default 1;
    }
    
    # Combine conditions: log only when both maps return 1
    map "$loggable_request:$loggable_static" $final_loggable {
        "1:1" 1;
        default 0;
    }
    
    server {
        listen 80;
        
        access_log /var/log/nginx/access.log combined if=$final_loggable;
        
        # Or per-location
        location /api/ {
            access_log /var/log/nginx/api.log combined;
            proxy_pass http://backend;
        }
        
        location /static/ {
            access_log off;
            root /var/www;
        }
    }
}
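Because map regexes are easy to get subtly wrong, it can help to mirror the decision in Python and test it before deploying. This sketch takes the path the maps inspect and reuses the same patterns (`~` is case-sensitive, `~*` is not); `should_log` is a hypothetical helper, not something Nginx provides.

```python
import re

# The same regexes as the maps above
HEALTH_PATTERNS = [re.compile(r"^/health$"), re.compile(r"^/ping$")]
STATIC_PATTERN = re.compile(r"\.(jpg|jpeg|png|gif|ico|css|js)$", re.IGNORECASE)


def should_log(path: str) -> bool:
    """Mirror of $final_loggable: log only if neither map returned 0."""
    is_health = any(p.search(path) for p in HEALTH_PATTERNS)
    is_static = STATIC_PATTERN.search(path) is not None
    return not is_health and not is_static


for path in ["/health", "/healthz", "/api/users", "/img/logo.PNG"]:
    print(path, should_log(path))
```

The `/healthz` case shows why the `$` anchor matters: without it, that path would also be excluded from the log.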

1.5. Log Variables

Commonly used variables (a subset of what Nginx provides):

# Client information
$remote_addr          # Client IP address
$remote_user          # Client username (HTTP auth)
$remote_port          # Client port

# Request information
$request              # Full request line
$request_method       # GET, POST, etc.
$request_uri          # Full URI with arguments
$uri                  # URI without arguments
$args                 # Query string
$scheme               # http or https
$server_protocol      # HTTP version (HTTP/1.1, HTTP/2.0)

# Response information
$status               # Response status code
$body_bytes_sent      # Bytes sent to client
$bytes_sent           # Total bytes sent (including headers)

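# Timing information (seconds, with millisecond resolution; with several
# upstream attempts the $upstream_* values are comma-separated)
$request_time         # From the first bytes read from the client until the log entry is written
$upstream_response_time    # Time to receive the full response from the backend
$upstream_connect_time     # Time to connect to backend
$upstream_header_time      # Time to receive headers from backend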

# Upstream information
$upstream_addr        # Backend server address
$upstream_status      # Backend response status
$upstream_cache_status     # Cache status (HIT, MISS, etc.)

# Headers
$http_referer         # Referer header
$http_user_agent      # User-Agent header
$http_x_forwarded_for # X-Forwarded-For header

# Time
$time_local           # Local time
$time_iso8601         # ISO 8601 format
$msec                 # Unix timestamp with milliseconds
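The three time variables encode the same instant differently. A small Python sketch shows how each can be parsed; the function names are illustrative, and the sample values are made up but consistent with each other.

```python
from datetime import datetime, timezone


def parse_time_local(value: str) -> datetime:
    """$time_local, e.g. '10/Oct/2023:13:55:36 +0700'."""
    # %b expects English month abbreviations, which is what Nginx writes;
    # this holds under Python's default C locale
    return datetime.strptime(value, "%d/%b/%Y:%H:%M:%S %z")


def parse_time_iso8601(value: str) -> datetime:
    """$time_iso8601, e.g. '2023-10-10T13:55:36+07:00'."""
    return datetime.fromisoformat(value)


def parse_msec(value: str) -> datetime:
    """$msec, e.g. '1696920936.123': seconds since the epoch, ms resolution."""
    return datetime.fromtimestamp(float(value), tz=timezone.utc)


# All three describe the same instant
a = parse_time_local("10/Oct/2023:13:55:36 +0700")
b = parse_time_iso8601("2023-10-10T13:55:36+07:00")
c = parse_msec("1696920936.000")
print(a == b == c)  # → True
```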

2. Log Rotation

2.1. Logrotate Configuration

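# /etc/logrotate.d/nginx
# logrotate only treats lines that start with "#" as comments, so each
# comment sits on its own line
/var/log/nginx/*.log {
    # Rotate daily and keep 14 rotated files
    daily
    rotate 14
    # Don't error if a log is missing; don't rotate empty logs
    missingok
    notifempty
    # Compress rotated logs, but leave the newest rotated file (.1)
    # uncompressed until the next rotation
    compress
    delaycompress
    # Recreate the log with these permissions
    # (Debian/Ubuntu packages use www-data instead of nginx)
    create 0640 nginx adm
    # Run postrotate once for all matched logs, not once per file
    sharedscripts
    postrotate
        # USR1 makes the Nginx master process reopen its log files
        if [ -f /var/run/nginx.pid ]; then
            kill -USR1 "$(cat /var/run/nginx.pid)"
        fi
    endscript
}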
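To see what `rotate`, `compress` and `delaycompress` leave on disk, this Python sketch lists the files present once logrotate has run at least `rotate` times. It assumes the default numbered naming (no `dateext`) and gzip compression; `rotated_files` is a hypothetical helper.

```python
from typing import List


def rotated_files(base: str, rotate: int, delaycompress: bool = True) -> List[str]:
    """Files present after at least `rotate` rotations (numbered names, gzip).

    With delaycompress the newest rotated file (.1) stays uncompressed, so a
    process that has not yet reopened its log can keep writing to it.
    """
    files = [base]
    for n in range(1, rotate + 1):
        compressed = not (delaycompress and n == 1)
        files.append(f"{base}.{n}" + (".gz" if compressed else ""))
    return files


print(rotated_files("access.log", 3))
# → ['access.log', 'access.log.1', 'access.log.2.gz', 'access.log.3.gz']
```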

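Weekly rotation (an alternative to the daily config; each log file may appear in only one logrotate config, otherwise logrotate reports a duplicate-entry error):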

# /etc/logrotate.d/nginx-weekly
/var/log/nginx/access.log
/var/log/nginx/error.log {
    weekly
    rotate 52
    compress
    delaycompress
    notifempty
    create 0640 nginx adm
    sharedscripts
    postrotate
        [ -f /var/run/nginx.pid ] && kill -USR1 `cat /var/run/nginx.pid`
    endscript
}

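Size-based rotation (another alternative; use only one of these three configs):

# /etc/logrotate.d/nginx-size
/var/log/nginx/*.log {
    # Rotate when a file reaches 100MB and keep 10 rotated files.
    # logrotate itself usually runs once a day (cron or systemd timer),
    # so a file can grow past 100MB between runs.
    size 100M
    rotate 10
    compress
    delaycompress
    notifempty
    create 0640 nginx adm
    sharedscripts
    postrotate
        [ -f /var/run/nginx.pid ] && kill -USR1 `cat /var/run/nginx.pid`
    endscript
}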

Test logrotate:

# Test configuration
sudo logrotate -d /etc/logrotate.d/nginx

# Force rotation
sudo logrotate -f /etc/logrotate.d/nginx

# Check when each log was last rotated
# (on RHEL/CentOS the file is /var/lib/logrotate/logrotate.status)
cat /var/lib/logrotate/status

2.2. Manual Log Rotation Script

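#!/bin/bash
# rotate_nginx_logs.sh

LOG_DIR="/var/log/nginx"
BACKUP_DIR="/var/log/nginx/archive"
DAYS_TO_KEEP=30

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Get current date
DATE=$(date +%Y%m%d-%H%M%S)

# Move the current logs aside. Nginx keeps writing to the moved files
# (through its open file descriptors) until it is told to reopen them.
ROTATED=()
for log in access.log error.log; do
    if [ -f "$LOG_DIR/$log" ]; then
        dest="$BACKUP_DIR/${log%.*}-$DATE.log"
        mv "$LOG_DIR/$log" "$dest"
        ROTATED+=("$dest")
    fi
done

# Reopen log files (same as "kill -USR1 <master pid>"; this is not a reload).
# The master process creates fresh access.log/error.log files owned by the
# worker user, so no touch/chmod/chown is needed.
nginx -s reopen

# Compress only after Nginx has switched to the new files, so no entries
# are written to a file while it is being compressed
sleep 1
for f in "${ROTATED[@]}"; do
    gzip "$f"
done

# Delete old logs
find "$BACKUP_DIR" -name "*.gz" -mtime +"$DAYS_TO_KEEP" -delete

echo "Log rotation complete: $DATE"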
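The retention step relies on a detail of `find -mtime +N`: find drops the fractional part of a file's age in days, so `-mtime +30` matches only files at least 31 full days old. A Python sketch of the same selection rule (helper names are hypothetical; unlike find, it does not recurse into subdirectories):

```python
import math
import os
import time
from typing import Iterable, List, Tuple

SECONDS_PER_DAY = 86400


def expired(entries: Iterable[Tuple[str, float]], days_to_keep: int, now: float) -> List[str]:
    """Select .gz paths that `find -name "*.gz" -mtime +days_to_keep` would match."""
    return [
        path for path, mtime in entries
        if path.endswith(".gz")
        # Fractional days are ignored, as find does
        and math.floor((now - mtime) / SECONDS_PER_DAY) > days_to_keep
    ]


def expired_archives(directory: str, days_to_keep: int) -> List[str]:
    """Apply the rule to the regular files in one directory (non-recursive)."""
    entries = [(e.path, e.stat().st_mtime) for e in os.scandir(directory) if e.is_file()]
    return expired(entries, days_to_keep, time.time())
```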

Cron job:

# /etc/cron.d/nginx-logrotate
0 0 * * * root /usr/local/bin/rotate_nginx_logs.sh >> /var/log/nginx-rotation.log 2>&1

3. Log Analysis Tools

3.1. GoAccess (Real-time Web Log Analyzer)

Install GoAccess:

# Ubuntu/Debian
sudo apt install goaccess

# CentOS/RHEL (the package comes from EPEL)
sudo yum install epel-release
sudo yum install goaccess

# macOS
brew install goaccess

Analyze logs:

# Real-time terminal dashboard
sudo goaccess /var/log/nginx/access.log -c

# Generate HTML report
sudo goaccess /var/log/nginx/access.log -o /var/www/html/report.html --log-format=COMBINED

# Real-time HTML dashboard
sudo goaccess /var/log/nginx/access.log -o /var/www/html/report.html --log-format=COMBINED --real-time-html

# With custom log format
sudo goaccess /var/log/nginx/access.log --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u"' --date-format=%d/%b/%Y --time-format=%H:%M:%S

GoAccess configuration:

# /etc/goaccess/goaccess.conf

# Log format
log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
date-format %d/%b/%Y
time-format %H:%M:%S

# UI options
color-scheme 1
hl-header true

# Output options
html-prefs {"theme":"bright","perPage":10}
html-report-title "Nginx Statistics"

# Enable/disable panels
enable-panel VISITORS
enable-panel REQUESTS
enable-panel REQUESTS_STATIC
enable-panel NOT_FOUND
enable-panel HOSTS
enable-panel OS
enable-panel BROWSERS
enable-panel STATUS_CODES
enable-panel REFERRING_SITES
enable-panel KEYPHRASES
enable-panel GEO_LOCATION

3.2. AWK Scripts for Log Analysis

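Requests per second:

#!/bin/bash
# requests_per_second.sh

# $4 looks like "[10/Oct/2023:13:55:36"; strip the "[" and count identical
# timestamps. uniq -c only merges adjacent lines, which works because
# Nginx appends entries in (roughly) chronological order.
awk '{print substr($4, 2)}' /var/log/nginx/access.log | \
    uniq -c | \
    awk '{print $2, $1}'

# Requests per minute: keep only "dd/Mon/yyyy:HH:MM" (17 characters)
# awk '{print substr($4, 2, 17)}' /var/log/nginx/access.log | uniq -c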
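The same bucketing can be done in Python with a `Counter`, which, unlike `uniq -c`, also merges matching timestamps that are not adjacent. The sketch assumes the `$time_local` field is the first bracketed value on each line, as in the combined and main formats; the names are illustrative.

```python
from collections import Counter
from typing import Iterable

# Length of the timestamp prefix inside "[10/Oct/2023:13:55:36 +0700]"
BUCKET_WIDTH = {"second": 20, "minute": 17, "hour": 14}


def requests_per_bucket(lines: Iterable[str], per: str = "second") -> Counter:
    """Count requests per second/minute/hour using the $time_local field."""
    width = BUCKET_WIDTH[per]
    counts: Counter = Counter()
    for line in lines:
        start = line.find("[")
        if start == -1:
            continue  # not an access-log line
        counts[line[start + 1:start + 1 + width]] += 1
    return counts


lines = [
    '1.1.1.1 - - [10/Oct/2023:13:55:36 +0700] "GET / HTTP/1.1" 200 1 "-" "-"',
    '1.1.1.2 - - [10/Oct/2023:13:55:36 +0700] "GET / HTTP/1.1" 200 1 "-" "-"',
    '1.1.1.3 - - [10/Oct/2023:13:55:37 +0700] "GET / HTTP/1.1" 200 1 "-" "-"',
]
for bucket, n in requests_per_bucket(lines).items():
    print(bucket, n)
```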

Top 10 IP addresses:

#!/bin/bash
# top_ips.sh

awk '{print $1}' /var/log/nginx/access.log | \
    sort | \
    uniq -c | \
    sort -rn | \
    head -10

Top 10 URLs:

#!/bin/bash
# top_urls.sh

awk '{print $7}' /var/log/nginx/access.log | \
    sort | \
    uniq -c | \
    sort -rn | \
    head -10

Status code distribution:

#!/bin/bash
# status_codes.sh

awk '{print $9}' /var/log/nginx/access.log | \
    sort | \
    uniq -c | \
    sort -rn

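Average response time:

#!/bin/bash
# avg_response_time.sh

# Assumes the LAST field of every line is a bare $request_time value
# (e.g. a log_format ending in "$request_time"). Lines whose last field is
# not a number, such as the combined format's quoted user agent, are skipped.
awk '$NF ~ /^[0-9.]+$/ {sum+=$NF; count++}
     END {if (count) printf "Average: %.3fs over %d requests\n", sum/count, count}' /var/log/nginx/access.log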
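An average can hide a slow tail, so percentiles are often more useful. This Python sketch uses the nearest-rank method and the same assumption as the script above (the last field is a bare `$request_time`); the function names are illustrative.

```python
import math
from typing import Dict, Iterable, List, Optional


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile (p between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def response_time_summary(lines: Iterable[str]) -> Optional[Dict[str, float]]:
    """Average and tail percentiles of the last field of each line."""
    times = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            times.append(float(fields[-1]))
        except ValueError:
            continue  # last field is not a bare number
    if not times:
        return None
    return {
        "avg": sum(times) / len(times),
        "p50": percentile(times, 50),
        "p95": percentile(times, 95),
        "p99": percentile(times, 99),
    }
```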

Requests by hour:

#!/bin/bash
# requests_by_hour.sh

awk '{print $4}' /var/log/nginx/access.log | \
    cut -d: -f2 | \
    sort -n | \
    uniq -c

404 errors:

#!/bin/bash
# 404_errors.sh

awk '$9 == "404" {print $7}' /var/log/nginx/access.log | \
    sort | \
    uniq -c | \
    sort -rn | \
    head -20

Bandwidth by IP:

#!/bin/bash
# bandwidth_by_ip.sh

awk '{ip[$1]+=$10} END {for (i in ip) print i, ip[i]/1024/1024 "MB"}' /var/log/nginx/access.log | \
    sort -k2 -rn | \
    head -10
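The one-liners above each read the whole log again. As a sketch, the same statistics can be gathered in a single pass in Python, using the same field positions as the awk commands (`$1`, `$7`, `$9`, `$10` of the combined format); `summarize_combined` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def summarize_combined(lines: Iterable[str], top: int = 10) -> Dict[str, List[Tuple[str, int]]]:
    """One pass over combined-format lines, using the same fields as the awk scripts."""
    ips, urls, statuses, not_found, bytes_by_ip = (Counter() for _ in range(5))
    for line in lines:
        f = line.split()                 # whitespace split, like awk's default
        if len(f) < 10:
            continue                     # truncated or non-access-log line
        ip, url, status, size = f[0], f[6], f[8], f[9]   # awk $1, $7, $9, $10
        ips[ip] += 1
        urls[url] += 1
        statuses[status] += 1
        if status == "404":
            not_found[url] += 1
        if size.isdigit():
            bytes_by_ip[ip] += int(size)
    return {
        "top_ips": ips.most_common(top),
        "top_urls": urls.most_common(top),
        "status_codes": statuses.most_common(),
        "top_404": not_found.most_common(top),
        "bandwidth_by_ip": bytes_by_ip.most_common(top),   # bytes
    }
```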

3.3. Complete Log Analysis Script

#!/bin/bash
# analyze_nginx_logs.sh

LOG_FILE="/var/log/nginx/access.log"
OUTPUT_DIR="/var/www/reports"
DATE=$(date +%Y-%m-%d)
REPORT="$OUTPUT_DIR/report-$DATE.txt"

mkdir -p "$OUTPUT_DIR"

# Everything inside { ... } goes to the report through one redirection
{
    echo "Nginx Log Analysis - $DATE"
    echo "====================================="
    echo ""

    echo "Total Requests:"
    wc -l < "$LOG_FILE"
    echo ""

    echo "Top 10 IP Addresses:"
    awk '{print $1}' "$LOG_FILE" | sort | uniq -c | sort -rn | head -10
    echo ""

    echo "Top 10 URLs:"
    awk '{print $7}' "$LOG_FILE" | sort | uniq -c | sort -rn | head -10
    echo ""

    echo "Status Code Distribution:"
    awk '{print $9}' "$LOG_FILE" | sort | uniq -c | sort -rn
    echo ""

    echo "Top 10 User Agents:"
    awk -F'"' '{print $6}' "$LOG_FILE" | sort | uniq -c | sort -rn | head -10
    echo ""

    echo "Requests by Hour:"
    awk '{print $4}' "$LOG_FILE" | cut -d: -f2 | sort -n | uniq -c
    echo ""

    echo "Top 404 Errors:"
    awk '$9 == "404" {print $7}' "$LOG_FILE" | sort | uniq -c | sort -rn | head -10
} > "$REPORT"

echo "Report generated: $REPORT"

4. Prometheus + Grafana Integration

4.1. Nginx Prometheus Exporter

Install nginx-prometheus-exporter:

# Download latest release
wget https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v0.11.0/nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz

# Extract
tar -xzf nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz

# Move to bin
sudo mv nginx-prometheus-exporter /usr/local/bin/

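# Create an unprivileged system user for the service
sudo useradd --system --no-create-home --shell /usr/sbin/nologin nginx-exporter

# Create systemd service
sudo nano /etc/systemd/system/nginx-exporter.service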

Systemd service:

[Unit]
Description=Nginx Prometheus Exporter
After=network.target

[Service]
Type=simple
User=nginx-exporter
Group=nginx-exporter
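# v0.11.x takes single-dash flags; newer 1.x releases expect double-dash
# flags (--nginx.scrape-uri, --web.listen-address)
ExecStart=/usr/local/bin/nginx-prometheus-exporter \
    -nginx.scrape-uri=http://127.0.0.1:8080/stub_status \
    -web.listen-address=:9113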

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Enable stub_status in Nginx:

server {
    listen 8080;
    server_name localhost;
    
    location /stub_status {
        stub_status;
        access_log off;
        allow 127.0.0.1;
        deny all;
    }
}

Start exporter:

sudo systemctl daemon-reload
sudo systemctl start nginx-exporter
sudo systemctl enable nginx-exporter

# Check status
sudo systemctl status nginx-exporter

# Test metrics endpoint
curl http://localhost:9113/metrics
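The endpoint returns the Prometheus text format: `#` comment lines followed by `name value` samples. A minimal Python sketch for reading the unlabeled samples (enough for a quick health check script) could look like this; `parse_metrics` is a hypothetical helper, and the sample text is abbreviated with made-up values.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Return {name: value} for unlabeled samples in Prometheus text format."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, "# HELP"/"# TYPE" comments and labeled samples
        if not line or line.startswith("#") or "{" in line:
            continue
        name, value = line.split()[:2]  # an optional timestamp may follow
        metrics[name] = float(value)
    return metrics


sample = """\
# TYPE nginx_up gauge
nginx_up 1
nginx_connections_active 3
nginx_http_requests_total 1024
"""
m = parse_metrics(sample)
print(m["nginx_up"], m["nginx_connections_active"])  # → 1.0 3.0
```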

4.2. Prometheus Configuration

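# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Alert rules (see section 6.1) and where to send fired alerts
rule_files:
  - /etc/prometheus/rules/*.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

scrape_configs:
  - job_name: 'nginx'
    static_configs:
      - targets: ['localhost:9113']
        labels:
          instance: 'nginx-server-1'
          environment: 'production'
  
  # Requires node_exporter listening on :9100
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
        labels:
          instance: 'nginx-server-1'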

4.3. VTS Module (Advanced Metrics)

Install Nginx with VTS module:

# Clone VTS module
git clone https://github.com/vozlt/nginx-module-vts.git

# Download Nginx source
wget http://nginx.org/download/nginx-1.24.0.tar.gz
tar -xzf nginx-1.24.0.tar.gz
cd nginx-1.24.0

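# Configure with VTS module. Pass the same options as your current build
# (shown by "nginx -V"), otherwise modules such as SSL may be missing.
./configure --add-module=../nginx-module-vts \
    --prefix=/etc/nginx \
    --sbin-path=/usr/sbin/nginx \
    --conf-path=/etc/nginx/nginx.conf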

# Compile and install
make
sudo make install

Configure VTS:

http {
    vhost_traffic_status_zone;
    
    server {
        listen 80;
        
        location /status {
            vhost_traffic_status_display;
            vhost_traffic_status_display_format html;   # other formats: json, jsonp, prometheus
            access_log off;
            allow 127.0.0.1;
            deny all;
        }
    }
}

4.4. Grafana Dashboard

Install Grafana:

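# Ubuntu/Debian (add the signing key before the repository)
sudo apt install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install grafana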

# Start Grafana
sudo systemctl start grafana-server
sudo systemctl enable grafana-server

Access Grafana:

  • URL: http://localhost:3000
  • Default login: admin/admin (Grafana asks you to change the password on first login)

Import Nginx dashboard:

  1. Go to Dashboards → Import
  2. Enter dashboard ID: 12708 (Nginx Prometheus Exporter)
  3. Select Prometheus datasource
  4. Import

Custom Grafana dashboard queries:

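# --- Available from nginx-prometheus-exporter (stub_status) ---

# Request rate (requests/sec)
rate(nginx_http_requests_total[5m])

# Active connections
nginx_connections_active

# Connections accepted but not handled (normally 0)
rate(nginx_connections_accepted[5m]) - rate(nginx_connections_handled[5m])

# --- Need an exporter with per-status and timing data ---
# stub_status reports no status codes, latencies or byte counts, so the
# stub_status exporter has no "status" label and no duration/bytes metrics.
# The names below are placeholders: adapt them to the exporter you use
# (e.g. the VTS module's Prometheus output).

# Error rate
rate(nginx_http_requests_total{status=~"5.."}[5m])

# Average response time
rate(nginx_http_request_duration_seconds_sum[5m]) / 
rate(nginx_http_request_duration_seconds_count[5m])

# Request rate by status code
sum(rate(nginx_http_requests_total[5m])) by (status)

# Bandwidth
rate(nginx_http_request_bytes_total[5m])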
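Most of these queries use `rate()`, which turns an ever-growing counter into a per-second value. The simplified Python sketch below shows the core idea, including how a counter reset (for example after an Nginx restart) is handled. It is an approximation: Prometheus's real `rate()` also extrapolates to the edges of the time window. `simple_rate` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def simple_rate(samples: List[Tuple[float, float]]) -> Optional[float]:
    """Per-second increase of a counter from (timestamp, value) samples.

    A value lower than the previous one is treated as a counter reset,
    so counting resumes from zero.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Scraped every 15s: 60 new requests in 30s → 2.0 requests/sec
print(simple_rate([(0, 100), (15, 130), (30, 160)]))
```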

5. ELK Stack Integration

5.1. Elasticsearch Installation

# Import Elasticsearch GPG key
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg

# Add repository
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list

# Install Elasticsearch
sudo apt update
sudo apt install elasticsearch

# Configure
sudo nano /etc/elasticsearch/elasticsearch.yml

Elasticsearch configuration:

# /etc/elasticsearch/elasticsearch.yml
cluster.name: nginx-logs
node.name: node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: localhost
http.port: 9200
discovery.type: single-node

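# Security: disabled only for a local, single-machine test setup.
# Elasticsearch 8.x enables authentication and TLS by default; keep them
# on for any node reachable from other hosts.
xpack.security.enabled: false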

Start Elasticsearch:

sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch

# Test
curl http://localhost:9200

5.2. Logstash Configuration

Install Logstash:

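sudo apt install logstash

# Let the logstash user read the Nginx logs (group adm on Debian/Ubuntu)
sudo usermod -aG adm logstash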

Logstash pipeline:

# /etc/logstash/conf.d/nginx.conf
input {
  file {
    path => "/var/log/nginx/access.log"
    start_position => "beginning"
    type => "nginx-access"
  }
  
  file {
    path => "/var/log/nginx/error.log"
    start_position => "beginning"
    type => "nginx-error"
  }
}

filter {
  if [type] == "nginx-access" {
    grok {
      match => {
        "message" => '%{IPORHOST:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] "%{WORD:request_method} %{DATA:request_path} HTTP/%{NUMBER:http_version}" %{NUMBER:status} %{NUMBER:body_bytes_sent} "%{DATA:http_referer}" "%{DATA:http_user_agent}"'
      }
    }
    
    date {
      match => [ "time_local", "dd/MMM/yyyy:HH:mm:ss Z" ]
      target => "@timestamp"
    }
    
    geoip {
      source => "remote_addr"
    }
    
    mutate {
      convert => {
        "status" => "integer"
        "body_bytes_sent" => "integer"
      }
    }
  }
  
  if [type] == "nginx-error" {
    grok {
      match => {
        "message" => "(?<timestamp>%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}[- ]%{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER:threadid}\: \*%{NUMBER:connectionid} %{GREEDYDATA:errormessage}"
      }
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "nginx-logs-%{+YYYY.MM.dd}"
  }
  
  # Debug output
  # stdout { codec => rubydebug }
}
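The error-log grok pattern above requires a `*connection` number, which process-level messages (startup, signals, reloads) do not have, so those lines will not match it. A Python sketch of a regex that makes the connection part optional is handy for testing lines before changing the pipeline; the pattern and function name are assumptions, not Logstash's.

```python
import re
from typing import Optional

# "*<connection>" is optional: process-level messages carry no connection number
ERROR_RE = re.compile(
    r"^(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>[a-z]+)\] "
    r"(?P<pid>\d+)#(?P<tid>\d+): "
    r"(?:\*(?P<connection>\d+) )?"
    r"(?P<message>.*)$"
)


def parse_error_line(line: str) -> Optional[dict]:
    """Split one Nginx error-log line into its parts, or return None."""
    m = ERROR_RE.match(line)
    return m.groupdict() if m else None


print(parse_error_line("2023/10/10 13:55:36 [notice] 1234#1234: signal process started"))
```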

Test the configuration, then start Logstash:

# Test configuration
sudo -u logstash /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t

sudo systemctl start logstash
sudo systemctl enable logstash

5.3. Kibana Installation

# Install Kibana
sudo apt install kibana

# Configure
sudo nano /etc/kibana/kibana.yml

Kibana configuration:

# /etc/kibana/kibana.yml
server.port: 5601
server.host: "localhost"
elasticsearch.hosts: ["http://localhost:9200"]

Start Kibana:

sudo systemctl start kibana
sudo systemctl enable kibana

# Access at: http://localhost:5601

Configure Nginx reverse proxy for Kibana:

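server {
    listen 80;
    server_name kibana.example.com;
    
    # With Elasticsearch security disabled, Kibana has no login of its own,
    # so require HTTP basic auth (create the file with the htpasswd tool)
    auth_basic "Kibana";
    auth_basic_user_file /etc/nginx/.htpasswd;
    
    location / {
        proxy_pass http://localhost:5601;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}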

5.4. Kibana Visualizations

Create an index pattern (called a "data view" in Kibana 8.x):

  1. Stack Management → Data Views (Management → Index Patterns in older versions)
  2. Create index pattern: nginx-logs-*
  3. Select time field: @timestamp

Sample visualizations:

  1. Request Rate Over Time
    • Type: Line chart
    • Y-axis: Count
    • X-axis: Date Histogram (@timestamp)
  2. Status Code Distribution
    • Type: Pie chart
    • Slice by: Terms on status (the Logstash mutate filter converts it to an integer, so there is no status.keyword field)
  3. Top 10 URLs
    • Type: Data table
    • Metrics: Count
    • Buckets: request_path.keyword
  4. Geographic Distribution
    • Type: Coordinate map (deprecated in recent Kibana versions in favor of the Maps app)
    • Geohash: geoip.location
  5. Error Rate
    • Type: Metric
    • Aggregation: Count
    • Filter: status >= 400

5.5. Filebeat Alternative

# Install Filebeat
sudo apt install filebeat

# Configure
sudo nano /etc/filebeat/filebeat.yml

Filebeat configuration:

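# /etc/filebeat/filebeat.yml
# Use EITHER these inputs OR the nginx module below, not both,
# otherwise every log line is shipped twice.
filebeat.inputs:
- type: filestream          # replaces the deprecated "log" input type
  id: nginx-access
  enabled: true
  paths:
    - /var/log/nginx/access.log
  fields:
    log_type: nginx-access
  
- type: filestream
  id: nginx-error
  enabled: true
  paths:
    - /var/log/nginx/error.log
  fields:
    log_type: nginx-error

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.template.settings:
  index.number_of_shards: 1

output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "filebeat-nginx-%{+yyyy.MM.dd}"

# Required whenever output.elasticsearch.index is customized
setup.template.name: "filebeat-nginx"
setup.template.pattern: "filebeat-nginx-*"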

setup.kibana:
  host: "localhost:5601"

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

Enable Nginx module:

sudo filebeat modules enable nginx

# Configure module
sudo nano /etc/filebeat/modules.d/nginx.yml

Module configuration:

# /etc/filebeat/modules.d/nginx.yml
- module: nginx
  access:
    enabled: true
    var.paths: ["/var/log/nginx/access.log"]
  
  error:
    enabled: true
    var.paths: ["/var/log/nginx/error.log"]

Start Filebeat:

# Setup
sudo filebeat setup -e

# Start
sudo systemctl start filebeat
sudo systemctl enable filebeat

6. Alerting Systems

6.1. Prometheus Alertmanager

Install Alertmanager:

wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar -xzf alertmanager-0.26.0.linux-amd64.tar.gz
sudo mv alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
sudo mv alertmanager-0.26.0.linux-amd64/amtool /usr/local/bin/

Alertmanager configuration:

# /etc/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
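  smtp_auth_password: 'your-password'   # e.g. a Gmail app password; keep this file private

route:
  group_by: ['alertname', 'cluster']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'email-notifications'
  # A receiver is only used if a route points to it: critical alerts go to
  # Slack and, because of continue: true, every alert also goes to email
  routes:
    - matchers:
        - severity="critical"
      receiver: 'slack-notifications'
      continue: true
    - receiver: 'email-notifications'

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'admin@example.com'
        headers:
          Subject: 'Nginx Alert: {{ .GroupLabels.alertname }}'
  
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#alerts'
        title: 'Nginx Alert'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'

inhibit_rules:
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    equal: ['alertname', 'instance']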
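The inhibit rule above suppresses a `warning` alert only while a `critical` alert with the same `alertname` and `instance` labels is firing. The Python sketch below models that matching on plain dictionaries; it is a simplification of Alertmanager's behavior, and `inhibited` is a hypothetical name. Note that in the rule file that follows, each alert name uses a single severity, so this rule only has an effect once the same alert is defined with both a warning and a critical threshold.

```python
from typing import Dict, List, Sequence


def inhibited(alerts: List[Dict[str, str]],
              equal: Sequence[str] = ("alertname", "instance")) -> List[Dict[str, str]]:
    """Warning alerts muted by a firing critical alert with matching `equal` labels."""
    critical = [a for a in alerts if a.get("severity") == "critical"]
    muted = []
    for alert in alerts:
        if alert.get("severity") != "warning":
            continue
        # A label missing on both sides counts as equal (both empty)
        if any(all(alert.get(k) == c.get(k) for k in equal) for c in critical):
            muted.append(alert)
    return muted


alerts = [
    {"alertname": "NginxHighErrorRate", "instance": "n1", "severity": "critical"},
    {"alertname": "NginxHighErrorRate", "instance": "n1", "severity": "warning"},
    {"alertname": "NginxHighErrorRate", "instance": "n2", "severity": "warning"},
]
print(inhibited(alerts))
```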

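Prometheus alert rules (Prometheus loads this file through `rule_files` in prometheus.yml and must list Alertmanager, which listens on port 9093 by default, under `alerting.alertmanagers`):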

# /etc/prometheus/rules/nginx_alerts.yml
groups:
  - name: nginx
    interval: 30s
    rules:
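      # High error rate
      # NOTE: the stub_status exporter from 4.1 exports nginx_http_requests_total
      # without a "status" label, so this expression needs an exporter that
      # provides per-status counters (adapt the metric and label names)
      - alert: NginxHighErrorRate
        expr: rate(nginx_http_requests_total{status=~"5.."}[5m]) > 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate on {{ $labels.instance }}"
          description: "Error rate is {{ $value }} errors/sec"
      
      # High response time
      # NOTE: also assumes an exporter that publishes latency quantiles;
      # stub_status has no timing data
      - alert: NginxHighResponseTime
        expr: nginx_http_request_duration_seconds{quantile="0.99"} > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time on {{ $labels.instance }}"
          description: "99th percentile response time is {{ $value }}s"
      
      # Nginx down: nginx_up is 0 when the exporter cannot reach stub_status;
      # up{job="nginx"} is 0 when Prometheus cannot scrape the exporter itself
      - alert: NginxDown
        expr: nginx_up == 0 or up{job="nginx"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Nginx is down on {{ $labels.instance }}"
          description: "Nginx or its exporter is not responding"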
      
      # High connection count
      - alert: NginxHighConnections
        expr: nginx_connections_active > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High connection count on {{ $labels.instance }}"
          description: "Active connections: {{ $value }}"
      
      # Low request rate (possible issue)
      - alert: NginxLowRequestRate
        expr: rate(nginx_http_requests_total[5m]) < 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Low request rate on {{ $labels.instance }}"
          description: "Request rate is {{ $value }} req/sec"
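The `for:` field means an alert is first `pending` and only becomes `firing` once its expression has stayed true for that long at every evaluation; a single false evaluation resets it. A Python sketch of this state machine, assuming evaluations every 30s as set by `interval: 30s` above (`alert_states` is a hypothetical name):

```python
from typing import List, Optional, Tuple


def alert_states(evaluations: List[Tuple[float, bool]], for_seconds: float) -> List[str]:
    """State after each (timestamp, expression_is_true) evaluation."""
    states = []
    active_since: Optional[float] = None
    for ts, condition in evaluations:
        if not condition:
            active_since = None          # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        states.append("firing" if ts - active_since >= for_seconds else "pending")
    return states


# for: 1m with 30s evaluations
print(alert_states([(0, True), (30, True), (60, True), (90, False), (120, True)], 60))
```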

6.2. Email Alerts Script

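#!/bin/bash
# nginx_alert.sh
# Meant to run from cron (e.g. every 5 minutes). Only the most recent
# SAMPLE_LINES requests are checked, so old errors don't re-trigger alerts.
# Requires the "mail" command (mailutils/mailx) and bc.

THRESHOLD_ERRORS=100        # max 5xx responses in the sample
THRESHOLD_RESPONSE_TIME=5   # seconds
SAMPLE_LINES=10000
EMAIL="admin@example.com"
LOG_FILE="/var/log/nginx/access.log"

RECENT=$(tail -n "$SAMPLE_LINES" "$LOG_FILE")

# Count 5xx responses ($9 is the status field in the combined/main formats)
ERROR_COUNT=$(printf '%s\n' "$RECENT" | awk '$9 ~ /^5[0-9][0-9]$/' | wc -l)

if [ "$ERROR_COUNT" -gt "$THRESHOLD_ERRORS" ]; then
    echo "ALERT: $ERROR_COUNT 5xx responses in the last $SAMPLE_LINES requests" | \
        mail -s "Nginx Alert: High Error Rate" "$EMAIL"
fi

# Average response time; assumes the last field is a bare $request_time
AVG_RESPONSE_TIME=$(printf '%s\n' "$RECENT" | \
    awk '$NF ~ /^[0-9.]+$/ {sum+=$NF; n++} END {if (n) printf "%.3f", sum/n; else print 0}')

if (( $(echo "$AVG_RESPONSE_TIME > $THRESHOLD_RESPONSE_TIME" | bc -l) )); then
    echo "ALERT: High average response time: ${AVG_RESPONSE_TIME}s" | \
        mail -s "Nginx Alert: High Response Time" "$EMAIL"
fi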

6.3. Slack Webhook Integration

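#!/bin/bash
# slack_alert.sh

WEBHOOK_URL="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
LOG_FILE="/var/log/nginx/error.log"
THRESHOLD=100

# Count [error]-or-worse entries written during the current hour. Error log
# lines start with "YYYY/MM/DD HH:MM:SS [level]", so matching the date and
# hour prefix ignores older entries as well as warn/notice lines.
HOUR_PREFIX=$(date +'%Y/%m/%d %H:')
ERROR_COUNT=$(grep -cE "^${HOUR_PREFIX}[0-9]{2}:[0-9]{2} \[(error|crit|alert|emerg)\]" "$LOG_FILE")

if [ "$ERROR_COUNT" -gt "$THRESHOLD" ]; then
    MESSAGE="⚠️ Nginx Alert: $ERROR_COUNT errors in the current hour!"
    
    curl -sS -X POST "$WEBHOOK_URL" \
        -H 'Content-Type: application/json' \
        -d "{\"text\":\"$MESSAGE\"}"
fi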
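Building the JSON body by string interpolation, as in the shell script, breaks as soon as the message contains a double quote. A Python sketch that lets `json.dumps` handle the escaping, using only the standard library (`build_slack_request` is a hypothetical name; the webhook URL is the same placeholder as above):

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """POST request for a Slack incoming webhook; json.dumps handles escaping."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_slack_request(
    "https://hooks.slack.com/services/YOUR/WEBHOOK/URL",
    'Nginx Alert: upstream "api" returned 120 errors',
)
print(req.data.decode("utf-8"))
# Sending it is one more call (not run here): urllib.request.urlopen(req, timeout=10)
```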

6.4. Automated Response Script

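#!/bin/bash
# auto_response.sh

LOG_FILE="/var/log/nginx/error.log"
THRESHOLD=100

# Count [error]-or-worse entries from the last 5 minutes. Error log
# timestamps ("YYYY/MM/DD HH:MM:SS") sort correctly as plain strings, so a
# string comparison against the cutoff works. "date -d" is GNU date syntax.
CUTOFF=$(date -d '5 minutes ago' +'%Y/%m/%d %H:%M:%S')
ERROR_COUNT=$(awk -v cutoff="$CUTOFF" \
    '($1 " " $2) >= cutoff && $3 ~ /^\[(error|crit|alert|emerg)\]$/' "$LOG_FILE" | wc -l)

if [ "$ERROR_COUNT" -gt "$THRESHOLD" ]; then
    echo "High error rate detected: $ERROR_COUNT errors"
    
    # Reload only if the configuration is valid. A reload replaces the worker
    # processes but does not fix failing upstreams, so treat it as a last resort.
    echo "Reloading Nginx..."
    nginx -t && systemctl reload nginx
    
    # Send alert
    echo "High error rate triggered Nginx reload: $ERROR_COUNT errors" | \
        mail -s "Nginx Auto-Response" admin@example.com
    
    # Log action
    echo "$(date): Auto-reload triggered due to $ERROR_COUNT errors" >> /var/log/nginx-auto-response.log
fi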

7. Real-time Monitoring Dashboard

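7.1. Custom HTML Dashboard

The page below embeds PHP, so save it with a .php extension and serve it through PHP (for example PHP-FPM); the meta refresh tag reloads it every 5 seconds.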

<!DOCTYPE html>
<html>
<head>
    <title>Nginx Real-time Monitor</title>
    <meta http-equiv="refresh" content="5">
    <style>
        body {
            font-family: Arial, sans-serif;
            margin: 20px;
            background: #f5f5f5;
        }
        .container {
            max-width: 1200px;
            margin: 0 auto;
        }
        .metric {
            background: white;
            padding: 20px;
            margin: 10px 0;
            border-radius: 5px;
            box-shadow: 0 2px 4px rgba(0,0,0,0.1);
        }
        .metric h2 {
            margin: 0 0 10px 0;
            color: #333;
        }
        .value {
            font-size: 32px;
            font-weight: bold;
            color: #007bff;
        }
        .alert {
            background: #ff4444;
            color: white;
        }
        .warning {
            background: #ffaa00;
            color: white;
        }
        .ok {
            background: #00c851;
            color: white;
        }
    </style>
</head>
<body>
    <div class="container">
        <h1>Nginx Real-time Monitor</h1>
        
        <?php
        // Read Nginx stats. Use 127.0.0.1 to match the "allow 127.0.0.1" rule
        // ("localhost" may resolve to ::1). Requires allow_url_fopen.
        $stats = @file_get_contents('http://127.0.0.1:8080/stub_status');
        if ($stats === false) {
            $stats = '';
        }
        
        preg_match('/Active connections: (\d+)/', $stats, $active);
        preg_match('/(\d+)\s+(\d+)\s+(\d+)/', $stats, $totals);
        preg_match('/Reading: (\d+) Writing: (\d+) Waiting: (\d+)/', $stats, $current);
        
        // Fall back to 0 when stub_status could not be read or parsed
        $activeConnections = (int)($active[1] ?? 0);
        $accepts = (int)($totals[1] ?? 0);
        $handled = (int)($totals[2] ?? 0);
        $requests = (int)($totals[3] ?? 0);
        $reading = (int)($current[1] ?? 0);
        $writing = (int)($current[2] ?? 0);
        $waiting = (int)($current[3] ?? 0);
        
        // Determine status
        $status = 'ok';
        if ($activeConnections > 1000) $status = 'alert';
        elseif ($activeConnections > 500) $status = 'warning';
        ?>
        
        <div class="metric <?php echo $status; ?>">
            <h2>Active Connections</h2>
            <div class="value"><?php echo $activeConnections; ?></div>
        </div>
        
        <div class="metric">
            <h2>Total Requests</h2>
            <div class="value"><?php echo number_format($requests); ?></div>
        </div>
        
        <div class="metric">
            <h2>Connection Details</h2>
            <p>Reading: <?php echo $reading; ?></p>
            <p>Writing: <?php echo $writing; ?></p>
            <p>Waiting: <?php echo $waiting; ?></p>
        </div>
        
        <div class="metric">
            <h2>Server Stats</h2>
            <p>Accepts: <?php echo number_format($accepts); ?></p>
            <p>Handled: <?php echo number_format($handled); ?></p>
            <p>Requests per connection: <?php echo $handled > 0 ? round($requests / $handled, 2) : 0; ?></p>
        </div>
        
        <p><small>Last updated: <?php echo date('Y-m-d H:i:s'); ?></small></p>
    </div>
</body>
</html>

7.2. Python Real-time Monitor

#!/usr/bin/env python3
# nginx_monitor.py

import requests
import time
import re
from datetime import datetime

# 127.0.0.1 matches the "allow 127.0.0.1" rule ("localhost" may resolve to ::1)
STATUS_URL = 'http://127.0.0.1:8080/stub_status'

def get_nginx_stats():
    """Fetch Nginx stub_status"""
    try:
        response = requests.get(STATUS_URL, timeout=5)
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        print(f"Error fetching stats: {e}")
        return None

def parse_stats(stats):
    """Parse stub_status output"""
    if not stats:
        return None
    
    active = re.search(r'Active connections: (\d+)', stats)
    totals = re.search(r'(\d+)\s+(\d+)\s+(\d+)', stats)
    current = re.search(r'Reading: (\d+) Writing: (\d+) Waiting: (\d+)', stats)
    if not (active and totals and current):
        return None  # unexpected format, e.g. an HTML error page
    
    return {
        'active_connections': int(active.group(1)),
        'accepts': int(totals.group(1)),
        'handled': int(totals.group(2)),
        'requests': int(totals.group(3)),
        'reading': int(current.group(1)),
        'writing': int(current.group(2)),
        'waiting': int(current.group(3)),
        'timestamp': datetime.now()
    }

def display_stats(stats):
    """Display stats in terminal"""
    print("\033[2J\033[H")  # Clear screen
    print("=" * 50)
    print("Nginx Real-time Monitor")
    print("=" * 50)
    print(f"Time: {stats['timestamp'].strftime('%Y-%m-%d %H:%M:%S')}")
    print()
    print(f"Active Connections: {stats['active_connections']}")
    print(f"Total Requests: {stats['requests']:,}")
    print()
    print(f"Reading: {stats['reading']}")
    print(f"Writing: {stats['writing']}")
    print(f"Waiting: {stats['waiting']}")
    print()
    print(f"Accepts: {stats['accepts']:,}")
    print(f"Handled: {stats['handled']:,}")
    print(f"Requests/Connection: {stats['requests'] / max(stats['handled'], 1):.2f}")
    print()
    
    # Alerts
    if stats['active_connections'] > 1000:
        print("⚠️  ALERT: High connection count!")
    elif stats['active_connections'] > 500:
        print("⚠️  WARNING: Elevated connection count")
    else:
        print("✓ Status: OK")

def main():
    print("Starting Nginx monitor...")
    print("Press Ctrl+C to exit")
    time.sleep(2)
    
    try:
        while True:
            stats_text = get_nginx_stats()
            stats = parse_stats(stats_text)
            
            if stats:
                display_stats(stats)
            
            time.sleep(5)
    except KeyboardInterrupt:
        print("\nMonitor stopped")

if __name__ == '__main__':
    main()

Summary

In this lesson, you learned:

  • ✅ Access log and error log configuration
  • ✅ Custom log formats (JSON, detailed, performance)
  • ✅ Log rotation with logrotate
  • ✅ Log analysis tools (GoAccess, AWK scripts)
  • ✅ Prometheus + Grafana monitoring
  • ✅ ELK Stack integration (Elasticsearch, Logstash, Kibana)
  • ✅ Alerting systems (Alertmanager, email, Slack)
  • ✅ Real-time dashboards
  • ✅ Performance metrics and troubleshooting

Key takeaways:

  • Use structured logging (JSON) for better analysis
  • Implement log rotation to manage disk space
  • Set up real-time monitoring for quick issue detection
  • Create dashboards for visualization
  • Configure alerts for critical issues
  • Automate log analysis with scripts
  • Integrate with centralized logging systems

Monitoring checklist:

  • ✅ Access and error logs configured
  • ✅ Log rotation enabled
  • ✅ Prometheus exporter running
  • ✅ Grafana dashboards created
  • ✅ Alerting rules configured
  • ✅ ELK stack (optional) integrated
  • ✅ Regular log analysis scheduled
  • ✅ Backup and retention policies defined

Next lesson: Advanced High Availability and Load Balancing. It covers Nginx Plus features, health checks, session persistence, active-active setups, failover strategies, and disaster recovery to maximize uptime in production environments.
