Lesson 12: Monitoring and Logging in NGINX
A lesson on monitoring and logging in Nginx: access log analysis, custom log formats, log rotation with logrotate, real-time monitoring tools, Prometheus + Grafana integration, the ELK Stack (Elasticsearch, Logstash, Kibana), alerting systems, performance metrics, and troubleshooting.
1. Access Logs and Error Logs
1.1. Default Log Configuration
http {
# Default log format
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
# Access log
access_log /var/log/nginx/access.log main;
# Error log with level
error_log /var/log/nginx/error.log warn;
server {
listen 80;
server_name example.com;
# Server-specific logs
access_log /var/log/nginx/example.com.access.log main;
error_log /var/log/nginx/example.com.error.log;
location / {
root /var/www/html;
}
# Disable logging for specific location
location /health {
access_log off;
return 200 "OK\n";
}
# Location-specific log
location /api/ {
access_log /var/log/nginx/api.access.log main;
proxy_pass http://backend;
}
}
}
1.2. Error Log Levels
# Error log levels (highest to lowest severity)
error_log /var/log/nginx/error.log emerg; # System is unusable
error_log /var/log/nginx/error.log alert; # Action must be taken immediately
error_log /var/log/nginx/error.log crit; # Critical conditions
error_log /var/log/nginx/error.log error; # Error conditions (default)
error_log /var/log/nginx/error.log warn; # Warning conditions
error_log /var/log/nginx/error.log notice; # Normal but significant
error_log /var/log/nginx/error.log info; # Informational messages
error_log /var/log/nginx/error.log debug; # Debug messages
# Recommended for production
error_log /var/log/nginx/error.log warn;
# For debugging
error_log /var/log/nginx/error.log debug;
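The severity level appears in square brackets in each error-log line, so a short awk one-liner can tally entries per level. A minimal sketch, using a hypothetical sample file with illustrative log lines:

```shell
# Build a small sample error log (lines are illustrative, not real Nginx output)
printf '%s\n' \
  '2024/01/15 10:00:01 [error] 123#0: *1 open() failed' \
  '2024/01/15 10:00:02 [warn] 123#0: *2 buffer warning' \
  '2024/01/15 10:00:03 [error] 123#0: *3 upstream timed out' \
  > /tmp/sample_error.log

# Split on "[" and "]" so field 2 is the level, then count per level
awk -F'[][]' '{count[$2]++} END {for (l in count) print l, count[l]}' /tmp/sample_error.log
```

Pointing the same one-liner at /var/log/nginx/error.log gives a quick severity breakdown when deciding whether the configured log level is too noisy.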
1.3. Custom Log Formats
Detailed log format:
http {
log_format detailed '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent" '
'rt=$request_time uct="$upstream_connect_time" '
'uht="$upstream_header_time" urt="$upstream_response_time"';
access_log /var/log/nginx/access.log detailed;
}
JSON log format:
http {
log_format json_combined escape=json
'{'
'"time_local":"$time_local",'
'"remote_addr":"$remote_addr",'
'"remote_user":"$remote_user",'
'"request":"$request",'
'"status": "$status",'
'"body_bytes_sent":"$body_bytes_sent",'
'"request_time":"$request_time",'
'"http_referrer":"$http_referer",'
'"http_user_agent":"$http_user_agent",'
'"http_x_forwarded_for":"$http_x_forwarded_for",'
'"upstream_addr":"$upstream_addr",'
'"upstream_status":"$upstream_status",'
'"upstream_response_time":"$upstream_response_time",'
'"upstream_connect_time":"$upstream_connect_time",'
'"upstream_header_time":"$upstream_header_time"'
'}';
access_log /var/log/nginx/access.log json_combined;
}
Cache status log:
http {
log_format cache_status '$remote_addr - [$time_local] "$request" '
'$status $body_bytes_sent '
'"$http_referer" "$http_user_agent" '
'cache_status=$upstream_cache_status '
'response_time=$request_time';
access_log /var/log/nginx/cache.log cache_status;
}
Performance tracking log:
http {
log_format performance '$time_iso8601 $remote_addr '
'"$request" $status $body_bytes_sent '
'rt=$request_time '
'ua="$upstream_addr" '
'us=$upstream_status '
'ut=$upstream_response_time '
'ul="$upstream_response_length" '
'cs=$upstream_cache_status';
access_log /var/log/nginx/performance.log performance;
}
Security log:
http {
log_format security '$remote_addr [$time_local] '
'"$request" $status '
'user_agent="$http_user_agent" '
'referer="$http_referer" '
'forwarded_for="$http_x_forwarded_for" '
'host="$host"';
# Log only suspicious requests
map $status $loggable {
~^[23] 0;
default 1;
}
access_log /var/log/nginx/security.log security if=$loggable;
}
1.4. Conditional Logging
http {
# Don't log successful health checks
map $request_uri $loggable_request {
~^/health$ 0;
~^/ping$ 0;
default 1;
}
# Don't log static files ($uri excludes the query string, so the extension match stays reliable)
map $uri $loggable_static {
~*\.(jpg|jpeg|png|gif|ico|css|js)$ 0;
default 1;
}
# Combine conditions
map "$loggable_request:$loggable_static" $final_loggable {
"0:0" 0;
"0:1" 0;
"1:0" 0;
default 1;
}
server {
listen 80;
access_log /var/log/nginx/access.log combined if=$final_loggable;
# Or per-location
location /api/ {
access_log /var/log/nginx/api.log combined;
proxy_pass http://backend;
}
location /static/ {
access_log off;
root /var/www;
}
}
}
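The combined map above is a logical AND: a request is logged only when both flags are 1, i.e. only the default "1:1" branch returns 1. A quick sketch of the same decision in plain Python (the regexes mirror the two maps; the function is purely illustrative):

```python
import re

def loggable(path: str) -> bool:
    """Mirror of the combined nginx maps: log only if no rule suppresses the request."""
    # $loggable_request: health/ping endpoints are suppressed
    health = 0 if re.match(r"^/(health|ping)$", path) else 1
    # $loggable_static: common static-file extensions are suppressed
    static = 0 if re.search(r"\.(jpg|jpeg|png|gif|ico|css|js)$", path, re.I) else 1
    # The "$a:$b" map returns 1 only for the default branch, i.e. "1:1"
    return (health, static) == (1, 1)

for uri in ["/health", "/app/logo.png", "/api/users"]:
    print(uri, loggable(uri))
```

Running this prints False for /health and /app/logo.png and True for /api/users, which matches what the `if=$final_loggable` condition would do.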
1.5. Log Variables
Available variables:
# Client information
$remote_addr # Client IP address
$remote_user # Client username (HTTP auth)
$remote_port # Client port
# Request information
$request # Full request line
$request_method # GET, POST, etc.
$request_uri # Full URI with arguments
$uri # URI without arguments
$args # Query string
$scheme # http or https
$server_protocol # HTTP version (HTTP/1.1, HTTP/2.0)
# Response information
$status # Response status code
$body_bytes_sent # Bytes sent to client
$bytes_sent # Total bytes sent (including headers)
# Timing information
$request_time # Total request processing time
$upstream_response_time # Backend response time
$upstream_connect_time # Time to connect to backend
$upstream_header_time # Time to receive headers from backend
# Upstream information
$upstream_addr # Backend server address
$upstream_status # Backend response status
$upstream_cache_status # Cache status (HIT, MISS, etc.)
# Headers
$http_referer # Referer header
$http_user_agent # User-Agent header
$http_x_forwarded_for # X-Forwarded-For header
# Time
$time_local # Local time
$time_iso8601 # ISO 8601 format
$msec # Unix timestamp with milliseconds
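One advantage of the json_combined format defined earlier is that each line is machine-readable without a fragile field-position parser. A minimal sketch of consuming one such line (the sample values are made up; note that `escape=json` leaves every value a string, so numeric fields need explicit conversion):

```python
import json

# One line as the json_combined format would emit it (values are illustrative)
line = ('{"time_local":"15/Jan/2024:10:00:01 +0000","remote_addr":"203.0.113.5",'
        '"request":"GET /api/users HTTP/1.1","status":"502",'
        '"body_bytes_sent":"512","request_time":"0.734",'
        '"upstream_response_time":"0.731"}')

entry = json.loads(line)

# All values arrive as strings; convert before doing arithmetic or comparisons
status = int(entry["status"])
request_time = float(entry["request_time"])

print("server error:", status >= 500)        # server error: True
print("slow request:", request_time > 0.5)   # slow request: True
```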
2. Log Rotation
2.1. Logrotate Configuration
# /etc/logrotate.d/nginx
/var/log/nginx/*.log {
daily # Rotate daily
missingok # Don't error if log is missing
rotate 14 # Keep 14 days of logs
compress # Compress rotated logs
delaycompress # Compress after one rotation
notifempty # Don't rotate if empty
create 0640 nginx adm # Create new file with permissions
sharedscripts # Run postrotate once for all logs
postrotate
if [ -f /var/run/nginx.pid ]; then
kill -USR1 `cat /var/run/nginx.pid`
fi
endscript
}
Weekly rotation:
# /etc/logrotate.d/nginx-weekly
/var/log/nginx/access.log
/var/log/nginx/error.log {
weekly
rotate 52
compress
delaycompress
notifempty
create 0640 nginx adm
sharedscripts
postrotate
[ -f /var/run/nginx.pid ] && kill -USR1 `cat /var/run/nginx.pid`
endscript
}
Size-based rotation:
# /etc/logrotate.d/nginx-size
/var/log/nginx/*.log {
size 100M # Rotate when file reaches 100MB
rotate 10 # Keep 10 rotated files
compress
delaycompress
notifempty
create 0640 nginx adm
sharedscripts
postrotate
[ -f /var/run/nginx.pid ] && kill -USR1 `cat /var/run/nginx.pid`
endscript
}
Test logrotate:
# Test configuration
sudo logrotate -d /etc/logrotate.d/nginx
# Force rotation
sudo logrotate -f /etc/logrotate.d/nginx
# Check status
cat /var/lib/logrotate/status
2.2. Manual Log Rotation Script
#!/bin/bash
# rotate_nginx_logs.sh
LOG_DIR="/var/log/nginx"
BACKUP_DIR="/var/log/nginx/archive"
DAYS_TO_KEEP=30
# Create backup directory
mkdir -p $BACKUP_DIR
# Get current date
DATE=$(date +%Y%m%d-%H%M%S)
# Rotate logs
for log in access.log error.log; do
if [ -f "$LOG_DIR/$log" ]; then
# Move log file
mv "$LOG_DIR/$log" "$BACKUP_DIR/${log%.*}-$DATE.log"
# Compress
gzip "$BACKUP_DIR/${log%.*}-$DATE.log"
# Create new empty log
touch "$LOG_DIR/$log"
chmod 640 "$LOG_DIR/$log"
chown nginx:adm "$LOG_DIR/$log"
fi
done
# Tell Nginx to reopen its log files (equivalent to sending USR1 to the master process)
nginx -s reopen
# Delete old logs
find $BACKUP_DIR -name "*.gz" -mtime +$DAYS_TO_KEEP -delete
echo "Log rotation complete: $DATE"
Cron job:
# /etc/cron.d/nginx-logrotate
0 0 * * * root /usr/local/bin/rotate_nginx_logs.sh >> /var/log/nginx-rotation.log 2>&1
3. Log Analysis Tools
3.1. GoAccess (Real-time Web Log Analyzer)
Install GoAccess:
# Ubuntu/Debian
sudo apt install goaccess
# CentOS/RHEL
sudo yum install goaccess
# macOS
brew install goaccess
Analyze logs:
# Real-time terminal dashboard
sudo goaccess /var/log/nginx/access.log -c
# Generate HTML report
sudo goaccess /var/log/nginx/access.log -o /var/www/html/report.html --log-format=COMBINED
# Real-time HTML dashboard
sudo goaccess /var/log/nginx/access.log -o /var/www/html/report.html --log-format=COMBINED --real-time-html
# With custom log format
sudo goaccess /var/log/nginx/access.log --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u"' --date-format=%d/%b/%Y --time-format=%H:%M:%S
GoAccess configuration:
# /etc/goaccess/goaccess.conf
# Log format
log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
date-format %d/%b/%Y
time-format %H:%M:%S
# UI options
color-scheme 1
hl-header true
# Output options
html-prefs {"theme":"bright","perPage":10}
html-report-title "Nginx Statistics"
# Enable/disable panels
enable-panel VISITORS
enable-panel REQUESTS
enable-panel REQUESTS_STATIC
enable-panel NOT_FOUND
enable-panel HOSTS
enable-panel OS
enable-panel BROWSERS
enable-panel STATUS_CODES
enable-panel REFERRING_SITES
enable-panel KEYPHRASES
enable-panel GEO_LOCATION
3.2. AWK Scripts for Log Analysis
Requests per second:
#!/bin/bash
# requests_per_second.sh
# $4 is "[dd/Mon/yyyy:HH:MM:SS", which already has one-second resolution
awk '{print $4}' /var/log/nginx/access.log | \
uniq -c | \
awk '{print $2, $1}'
Top 10 IP addresses:
#!/bin/bash
# top_ips.sh
awk '{print $1}' /var/log/nginx/access.log | \
sort | \
uniq -c | \
sort -rn | \
head -10
Top 10 URLs:
#!/bin/bash
# top_urls.sh
awk '{print $7}' /var/log/nginx/access.log | \
sort | \
uniq -c | \
sort -rn | \
head -10
Status code distribution:
#!/bin/bash
# status_codes.sh
awk '{print $9}' /var/log/nginx/access.log | \
sort | \
uniq -c | \
sort -rn
Average response time:
#!/bin/bash
# avg_response_time.sh
# Assuming response time is logged
awk '{sum+=$NF; count++} END {print "Average:", sum/count "s"}' /var/log/nginx/access.log
Requests by hour:
#!/bin/bash
# requests_by_hour.sh
awk '{print $4}' /var/log/nginx/access.log | \
cut -d: -f2 | \
sort -n | \
uniq -c
404 errors:
#!/bin/bash
# 404_errors.sh
awk '$9 == "404" {print $7}' /var/log/nginx/access.log | \
sort | \
uniq -c | \
sort -rn | \
head -20
Bandwidth by IP:
#!/bin/bash
# bandwidth_by_ip.sh
awk '{ip[$1]+=$10} END {for (i in ip) print i, ip[i]/1024/1024 "MB"}' /var/log/nginx/access.log | \
sort -k2 -rn | \
head -10
3.3. Complete Log Analysis Script
#!/bin/bash
# analyze_nginx_logs.sh
LOG_FILE="/var/log/nginx/access.log"
OUTPUT_DIR="/var/www/reports"
DATE=$(date +%Y-%m-%d)
mkdir -p $OUTPUT_DIR
echo "Nginx Log Analysis - $DATE" > $OUTPUT_DIR/report-$DATE.txt
echo "=====================================" >> $OUTPUT_DIR/report-$DATE.txt
echo "" >> $OUTPUT_DIR/report-$DATE.txt
# Total requests
echo "Total Requests:" >> $OUTPUT_DIR/report-$DATE.txt
wc -l < $LOG_FILE >> $OUTPUT_DIR/report-$DATE.txt
echo "" >> $OUTPUT_DIR/report-$DATE.txt
# Top 10 IPs
echo "Top 10 IP Addresses:" >> $OUTPUT_DIR/report-$DATE.txt
awk '{print $1}' $LOG_FILE | sort | uniq -c | sort -rn | head -10 >> $OUTPUT_DIR/report-$DATE.txt
echo "" >> $OUTPUT_DIR/report-$DATE.txt
# Top 10 URLs
echo "Top 10 URLs:" >> $OUTPUT_DIR/report-$DATE.txt
awk '{print $7}' $LOG_FILE | sort | uniq -c | sort -rn | head -10 >> $OUTPUT_DIR/report-$DATE.txt
echo "" >> $OUTPUT_DIR/report-$DATE.txt
# Status codes
echo "Status Code Distribution:" >> $OUTPUT_DIR/report-$DATE.txt
awk '{print $9}' $LOG_FILE | sort | uniq -c | sort -rn >> $OUTPUT_DIR/report-$DATE.txt
echo "" >> $OUTPUT_DIR/report-$DATE.txt
# Top user agents
echo "Top 10 User Agents:" >> $OUTPUT_DIR/report-$DATE.txt
awk -F'"' '{print $6}' $LOG_FILE | sort | uniq -c | sort -rn | head -10 >> $OUTPUT_DIR/report-$DATE.txt
echo "" >> $OUTPUT_DIR/report-$DATE.txt
# Requests by hour
echo "Requests by Hour:" >> $OUTPUT_DIR/report-$DATE.txt
awk '{print $4}' $LOG_FILE | cut -d: -f2 | sort -n | uniq -c >> $OUTPUT_DIR/report-$DATE.txt
echo "" >> $OUTPUT_DIR/report-$DATE.txt
# Top 404 errors
echo "Top 404 Errors:" >> $OUTPUT_DIR/report-$DATE.txt
awk '$9 == "404" {print $7}' $LOG_FILE | sort | uniq -c | sort -rn | head -10 >> $OUTPUT_DIR/report-$DATE.txt
echo "Report generated: $OUTPUT_DIR/report-$DATE.txt"
4. Prometheus + Grafana Integration
4.1. Nginx Prometheus Exporter
Install nginx-prometheus-exporter:
# Download latest release
wget https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v0.11.0/nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz
# Extract
tar -xzf nginx-prometheus-exporter_0.11.0_linux_amd64.tar.gz
# Move to bin
sudo mv nginx-prometheus-exporter /usr/local/bin/
# Create systemd service
sudo nano /etc/systemd/system/nginx-exporter.service
Systemd service:
[Unit]
Description=Nginx Prometheus Exporter
After=network.target
[Service]
Type=simple
User=nginx-exporter
Group=nginx-exporter
ExecStart=/usr/local/bin/nginx-prometheus-exporter \
-nginx.scrape-uri=http://localhost:8080/stub_status \
-web.listen-address=:9113
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Enable stub_status in Nginx:
server {
listen 8080;
server_name localhost;
location /stub_status {
stub_status;
access_log off;
allow 127.0.0.1;
deny all;
}
}
Start exporter:
sudo systemctl daemon-reload
sudo systemctl start nginx-exporter
sudo systemctl enable nginx-exporter
# Check status
sudo systemctl status nginx-exporter
# Test metrics endpoint
curl http://localhost:9113/metrics
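The endpoint returns the Prometheus text exposition format: comment lines starting with `#` plus simple `name value` samples. A sketch of parsing the flat samples into a dict (the payload below is an illustrative fragment, not real exporter output):

```python
import re

# Fragment of what the exporter's /metrics endpoint returns (values are illustrative)
metrics_text = """\
# HELP nginx_connections_active Active client connections
# TYPE nginx_connections_active gauge
nginx_connections_active 7
nginx_http_requests_total 31070465
nginx_up 1
"""

def parse_metrics(text: str) -> dict:
    """Parse unlabelled 'name value' lines of the Prometheus text format."""
    metrics = {}
    for line in text.splitlines():
        m = re.match(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)\s+([0-9.eE+-]+)$", line)
        if m:
            metrics[m.group(1)] = float(m.group(2))
    return metrics

stats = parse_metrics(metrics_text)
print(stats["nginx_connections_active"])  # 7.0
print(stats["nginx_up"] == 1)             # True
```

In practice a Prometheus server does this scraping for you; a hand-rolled parser like this is mainly useful for quick health scripts.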
4.2. Prometheus Configuration
# /etc/prometheus/prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'nginx'
static_configs:
- targets: ['localhost:9113']
labels:
instance: 'nginx-server-1'
environment: 'production'
- job_name: 'node'
static_configs:
- targets: ['localhost:9100']
labels:
instance: 'nginx-server-1'
4.3. VTS Module (Advanced Metrics)
Install Nginx with VTS module:
# Clone VTS module
git clone https://github.com/vozlt/nginx-module-vts.git
# Download Nginx source
wget http://nginx.org/download/nginx-1.24.0.tar.gz
tar -xzf nginx-1.24.0.tar.gz
cd nginx-1.24.0
# Configure with VTS module
./configure --add-module=../nginx-module-vts \
--prefix=/etc/nginx \
--sbin-path=/usr/sbin/nginx \
--conf-path=/etc/nginx/nginx.conf
# Compile and install
make
sudo make install
Configure VTS:
http {
vhost_traffic_status_zone;
server {
listen 80;
location /status {
vhost_traffic_status_display;
vhost_traffic_status_display_format html;
access_log off;
allow 127.0.0.1;
deny all;
}
}
}
4.4. Grafana Dashboard
Install Grafana:
# Ubuntu/Debian
sudo apt install -y software-properties-common
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo apt update
sudo apt install grafana
# Start Grafana
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
Access Grafana:
- URL: http://localhost:3000
- Default login: admin/admin
Import Nginx dashboard:
- Go to Dashboards → Import
- Enter dashboard ID: 12708 (Nginx Prometheus Exporter)
- Select Prometheus datasource
- Import
Custom Grafana dashboard queries (note: the basic stub_status exporter exposes only connection and total-request counters; status- and latency-labelled series like some of those below require the VTS module or NGINX Plus):
# Request rate
rate(nginx_http_requests_total[5m])
# Error rate
rate(nginx_http_requests_total{status=~"5.."}[5m])
# Average response time
rate(nginx_http_request_duration_seconds_sum[5m]) /
rate(nginx_http_request_duration_seconds_count[5m])
# Active connections
nginx_connections_active
# Request rate by status code
sum(rate(nginx_http_requests_total[5m])) by (status)
# Bandwidth
rate(nginx_http_request_bytes_total[5m])
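The `rate()` function used above is, roughly, the per-second increase of a monotonic counter between scrapes. A toy illustration of the arithmetic with two made-up counter samples:

```python
# Two samples of a monotonically increasing counter, taken 15 seconds apart
# (what consecutive Prometheus scrapes of nginx_http_requests_total might return)
sample_t0 = 31_070_465
sample_t1 = 31_071_215
interval_s = 15

# rate() ~ counter increase divided by elapsed time
req_per_sec = (sample_t1 - sample_t0) / interval_s
print(req_per_sec)  # 50.0
```

Real `rate()` additionally handles counter resets and extrapolates over the full range window, but the core idea is this division.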
5. ELK Stack Integration
5.1. Elasticsearch Installation
# Import Elasticsearch GPG key
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
# Add repository
echo "deb https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
# Install Elasticsearch
sudo apt update
sudo apt install elasticsearch
# Configure
sudo nano /etc/elasticsearch/elasticsearch.yml
Elasticsearch configuration:
# /etc/elasticsearch/elasticsearch.yml
cluster.name: nginx-logs
node.name: node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: localhost
http.port: 9200
discovery.type: single-node
# Security
xpack.security.enabled: false
Start Elasticsearch:
sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch
# Test
curl http://localhost:9200
5.2. Logstash Configuration
Install Logstash:
sudo apt install logstash
Logstash pipeline:
# /etc/logstash/conf.d/nginx.conf
input {
file {
path => "/var/log/nginx/access.log"
start_position => "beginning"
type => "nginx-access"
}
file {
path => "/var/log/nginx/error.log"
start_position => "beginning"
type => "nginx-error"
}
}
filter {
if [type] == "nginx-access" {
grok {
match => {
"message" => '%{IPORHOST:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] "%{WORD:request_method} %{DATA:request_path} HTTP/%{NUMBER:http_version}" %{NUMBER:status} %{NUMBER:body_bytes_sent} "%{DATA:http_referer}" "%{DATA:http_user_agent}"'
}
}
date {
match => [ "time_local", "dd/MMM/yyyy:HH:mm:ss Z" ]
target => "@timestamp"
}
geoip {
source => "remote_addr"
}
mutate {
convert => {
"status" => "integer"
"body_bytes_sent" => "integer"
}
}
}
if [type] == "nginx-error" {
grok {
match => {
"message" => "(?<timestamp>%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}[- ]%{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER:threadid}\: \*%{NUMBER:connectionid} %{GREEDYDATA:errormessage}"
}
}
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "nginx-logs-%{+YYYY.MM.dd}"
}
# Debug output
# stdout { codec => rubydebug }
}
Start Logstash:
sudo systemctl start logstash
sudo systemctl enable logstash
# Test configuration
sudo -u logstash /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t
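For a sense of what the access-log grok pattern above actually extracts, here is a rough Python regex equivalent applied to one illustrative combined-format line (the named groups mirror the grok field names; this is a sketch, not Logstash's exact semantics):

```python
import re

# Approximate Python translation of the nginx-access grok pattern
PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request_method>\S+) (?P<request_path>\S+) HTTP/(?P<http_version>[\d.]+)" '
    r'(?P<status>\d+) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)

line = ('203.0.113.5 - - [15/Jan/2024:10:00:01 +0000] '
        '"GET /api/users HTTP/1.1" 200 512 "-" "curl/8.0"')

m = PATTERN.match(line)
fields = m.groupdict()
print(fields["status"], fields["request_path"])  # 200 /api/users
```

Testing the pattern this way before deploying it in Logstash helps catch mismatches (e.g. requests without a referer) without restarting the pipeline.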
5.3. Kibana Installation
# Install Kibana
sudo apt install kibana
# Configure
sudo nano /etc/kibana/kibana.yml
Kibana configuration:
# /etc/kibana/kibana.yml
server.port: 5601
server.host: "localhost"
elasticsearch.hosts: ["http://localhost:9200"]
Start Kibana:
sudo systemctl start kibana
sudo systemctl enable kibana
# Access at: http://localhost:5601
Configure Nginx reverse proxy for Kibana:
server {
listen 80;
server_name kibana.example.com;
location / {
proxy_pass http://localhost:5601;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
proxy_cache_bypass $http_upgrade;
}
}
5.4. Kibana Visualizations
Create index pattern:
- Management → Index Patterns
- Create index pattern: nginx-logs-*
- Select time field: @timestamp
Sample visualizations:
- Request Rate Over Time
  - Type: Line chart
  - Y-axis: Count
  - X-axis: Date Histogram (@timestamp)
- Status Code Distribution
  - Type: Pie chart
  - Slice by: status.keyword
- Top 10 URLs
  - Type: Data table
  - Metrics: Count
  - Buckets: request_path.keyword
- Geographic Distribution
  - Type: Coordinate map
  - Geohash: geoip.location
- Error Rate
  - Type: Metric
  - Aggregation: Count
  - Filter: status >= 400
5.5. Filebeat Alternative
# Install Filebeat
sudo apt install filebeat
# Configure
sudo nano /etc/filebeat/filebeat.yml
Filebeat configuration:
# /etc/filebeat/filebeat.yml
filebeat.inputs:
- type: log
enabled: true
paths:
- /var/log/nginx/access.log
fields:
log_type: nginx-access
- type: log
enabled: true
paths:
- /var/log/nginx/error.log
fields:
log_type: nginx-error
filebeat.config.modules:
path: ${path.config}/modules.d/*.yml
reload.enabled: false
setup.template.settings:
index.number_of_shards: 1
output.elasticsearch:
hosts: ["localhost:9200"]
index: "filebeat-nginx-%{+yyyy.MM.dd}"
setup.kibana:
host: "localhost:5601"
processors:
- add_host_metadata: ~
- add_cloud_metadata: ~
Enable Nginx module:
sudo filebeat modules enable nginx
# Configure module
sudo nano /etc/filebeat/modules.d/nginx.yml
# /etc/filebeat/modules.d/nginx.yml
- module: nginx
access:
enabled: true
var.paths: ["/var/log/nginx/access.log"]
error:
enabled: true
var.paths: ["/var/log/nginx/error.log"]
Start Filebeat:
# Setup
sudo filebeat setup -e
# Start
sudo systemctl start filebeat
sudo systemctl enable filebeat
6. Alerting Systems
6.1. Prometheus Alertmanager
Install Alertmanager:
wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz
tar -xzf alertmanager-0.26.0.linux-amd64.tar.gz
sudo mv alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
sudo mv alertmanager-0.26.0.linux-amd64/amtool /usr/local/bin/
Alertmanager configuration:
# /etc/alertmanager/alertmanager.yml
global:
resolve_timeout: 5m
smtp_smarthost: 'smtp.gmail.com:587'
smtp_from: 'alerts@example.com'
smtp_auth_username: 'alerts@example.com'
smtp_auth_password: 'your-password'
route:
group_by: ['alertname', 'cluster']
group_wait: 10s
group_interval: 10s
repeat_interval: 12h
receiver: 'email-notifications'
receivers:
- name: 'email-notifications'
email_configs:
- to: 'admin@example.com'
headers:
Subject: 'Nginx Alert: {{ .GroupLabels.alertname }}'
- name: 'slack-notifications'
slack_configs:
- api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
channel: '#alerts'
title: 'Nginx Alert'
text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
inhibit_rules:
- source_match:
severity: 'critical'
target_match:
severity: 'warning'
equal: ['alertname', 'instance']
Prometheus alert rules:
# /etc/prometheus/rules/nginx_alerts.yml
groups:
- name: nginx
interval: 30s
rules:
# High error rate
- alert: NginxHighErrorRate
expr: rate(nginx_http_requests_total{status=~"5.."}[5m]) > 10
for: 5m
labels:
severity: critical
annotations:
summary: "High error rate on {{ $labels.instance }}"
description: "Error rate is {{ $value }} errors/sec"
# High response time (latency metrics require a VTS- or Plus-based exporter)
- alert: NginxHighResponseTime
expr: nginx_http_request_duration_seconds{quantile="0.99"} > 5
for: 5m
labels:
severity: warning
annotations:
summary: "High response time on {{ $labels.instance }}"
description: "99th percentile response time is {{ $value }}s"
# Nginx down
- alert: NginxDown
expr: up{job="nginx"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Nginx is down on {{ $labels.instance }}"
description: "Nginx exporter is not responding"
# High connection count
- alert: NginxHighConnections
expr: nginx_connections_active > 1000
for: 5m
labels:
severity: warning
annotations:
summary: "High connection count on {{ $labels.instance }}"
description: "Active connections: {{ $value }}"
# Low request rate (possible issue)
- alert: NginxLowRequestRate
expr: rate(nginx_http_requests_total[5m]) < 1
for: 10m
labels:
severity: warning
annotations:
summary: "Low request rate on {{ $labels.instance }}"
description: "Request rate is {{ $value }} req/sec"
6.2. Email Alerts Script
#!/bin/bash
# nginx_alert.sh
THRESHOLD_ERROR_RATE=100
THRESHOLD_RESPONSE_TIME=5
EMAIL="admin@example.com"
LOG_FILE="/var/log/nginx/access.log"
# Check error rate ($9 is the status field in the combined log format;
# matching the field directly avoids false positives from byte counts)
ERROR_COUNT=$(awk '$9 ~ /^5[0-9][0-9]$/' $LOG_FILE | wc -l)
if [ $ERROR_COUNT -gt $THRESHOLD_ERROR_RATE ]; then
echo "ALERT: High error rate detected: $ERROR_COUNT errors" | \
mail -s "Nginx Alert: High Error Rate" $EMAIL
fi
# Check response time (if logged)
AVG_RESPONSE_TIME=$(awk '{sum+=$NF; count++} END {print sum/count}' $LOG_FILE)
if (( $(echo "$AVG_RESPONSE_TIME > $THRESHOLD_RESPONSE_TIME" | bc -l) )); then
echo "ALERT: High response time: ${AVG_RESPONSE_TIME}s" | \
mail -s "Nginx Alert: High Response Time" $EMAIL
fi
6.3. Slack Webhook Integration
#!/bin/bash
# slack_alert.sh
WEBHOOK_URL="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
LOG_FILE="/var/log/nginx/error.log"
# Count lines in the error log (grows until the log is rotated)
ERROR_COUNT=$(wc -l < $LOG_FILE)
if [ $ERROR_COUNT -gt 100 ]; then
MESSAGE="⚠️ Nginx Alert: $ERROR_COUNT errors detected!"
curl -X POST $WEBHOOK_URL \
-H 'Content-Type: application/json' \
-d "{\"text\":\"$MESSAGE\"}"
fi
6.4. Automated Response Script
#!/bin/bash
# auto_response.sh
LOG_FILE="/var/log/nginx/error.log"
THRESHOLD=100
# Count lines in error logs modified within the last 5 minutes
# (approximation: the whole file is counted, not only the recent entries)
ERROR_COUNT=$(find /var/log/nginx -name "error.log" -mmin -5 -exec wc -l {} \; | awk '{sum+=$1} END {print sum+0}')
if [ $ERROR_COUNT -gt $THRESHOLD ]; then
echo "High error rate detected: $ERROR_COUNT errors"
# Reload Nginx
echo "Reloading Nginx..."
systemctl reload nginx
# Send alert
echo "High error rate triggered Nginx reload: $ERROR_COUNT errors" | \
mail -s "Nginx Auto-Response" admin@example.com
# Log action
echo "$(date): Auto-reload triggered due to $ERROR_COUNT errors" >> /var/log/nginx-auto-response.log
fi
7. Real-time Monitoring Dashboard
7.1. Custom PHP/HTML Dashboard
<!DOCTYPE html>
<html>
<head>
<title>Nginx Real-time Monitor</title>
<meta http-equiv="refresh" content="5">
<style>
body {
font-family: Arial, sans-serif;
margin: 20px;
background: #f5f5f5;
}
.container {
max-width: 1200px;
margin: 0 auto;
}
.metric {
background: white;
padding: 20px;
margin: 10px 0;
border-radius: 5px;
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
}
.metric h2 {
margin: 0 0 10px 0;
color: #333;
}
.value {
font-size: 32px;
font-weight: bold;
color: #007bff;
}
.alert {
background: #ff4444;
color: white;
}
.warning {
background: #ffaa00;
color: white;
}
.ok {
background: #00c851;
color: white;
}
</style>
</head>
<body>
<div class="container">
<h1>Nginx Real-time Monitor</h1>
<?php
// Read Nginx stats
$stats = file_get_contents('http://localhost:8080/stub_status');
preg_match('/Active connections: (\d+)/', $stats, $active);
preg_match('/(\d+)\s+(\d+)\s+(\d+)/', $stats, $totals);
preg_match('/Reading: (\d+) Writing: (\d+) Waiting: (\d+)/', $stats, $current);
$activeConnections = $active[1];
$accepts = $totals[1];
$handled = $totals[2];
$requests = $totals[3];
$reading = $current[1];
$writing = $current[2];
$waiting = $current[3];
// Determine status
$status = 'ok';
if ($activeConnections > 1000) $status = 'alert';
elseif ($activeConnections > 500) $status = 'warning';
?>
<div class="metric <?php echo $status; ?>">
<h2>Active Connections</h2>
<div class="value"><?php echo $activeConnections; ?></div>
</div>
<div class="metric">
<h2>Total Requests</h2>
<div class="value"><?php echo number_format($requests); ?></div>
</div>
<div class="metric">
<h2>Connection Details</h2>
<p>Reading: <?php echo $reading; ?></p>
<p>Writing: <?php echo $writing; ?></p>
<p>Waiting: <?php echo $waiting; ?></p>
</div>
<div class="metric">
<h2>Server Stats</h2>
<p>Accepts: <?php echo number_format($accepts); ?></p>
<p>Handled: <?php echo number_format($handled); ?></p>
<p>Requests per connection: <?php echo round($requests/$handled, 2); ?></p>
</div>
<p><small>Last updated: <?php echo date('Y-m-d H:i:s'); ?></small></p>
</div>
</body>
</html>
7.2. Python Real-time Monitor
#!/usr/bin/env python3
# nginx_monitor.py
import requests
import time
import re
from datetime import datetime
def get_nginx_stats():
"""Fetch Nginx stub_status"""
try:
response = requests.get('http://localhost:8080/stub_status')
return response.text
except Exception as e:
print(f"Error fetching stats: {e}")
return None
def parse_stats(stats):
"""Parse stub_status output"""
if not stats:
return None
active = re.search(r'Active connections: (\d+)', stats)
totals = re.search(r'(\d+)\s+(\d+)\s+(\d+)', stats)
current = re.search(r'Reading: (\d+) Writing: (\d+) Waiting: (\d+)', stats)
return {
'active_connections': int(active.group(1)),
'accepts': int(totals.group(1)),
'handled': int(totals.group(2)),
'requests': int(totals.group(3)),
'reading': int(current.group(1)),
'writing': int(current.group(2)),
'waiting': int(current.group(3)),
'timestamp': datetime.now()
}
def display_stats(stats):
"""Display stats in terminal"""
print("\033[2J\033[H") # Clear screen
print("=" * 50)
print("Nginx Real-time Monitor")
print("=" * 50)
print(f"Time: {stats['timestamp'].strftime('%Y-%m-%d %H:%M:%S')}")
print()
print(f"Active Connections: {stats['active_connections']}")
print(f"Total Requests: {stats['requests']:,}")
print()
print(f"Reading: {stats['reading']}")
print(f"Writing: {stats['writing']}")
print(f"Waiting: {stats['waiting']}")
print()
print(f"Accepts: {stats['accepts']:,}")
print(f"Handled: {stats['handled']:,}")
print(f"Requests/Connection: {stats['requests']/stats['handled']:.2f}")
print()
# Alerts
if stats['active_connections'] > 1000:
print("⚠️ ALERT: High connection count!")
elif stats['active_connections'] > 500:
print("⚠️ WARNING: Elevated connection count")
else:
print("✓ Status: OK")
def main():
print("Starting Nginx monitor...")
print("Press Ctrl+C to exit")
time.sleep(2)
try:
while True:
stats_text = get_nginx_stats()
stats = parse_stats(stats_text)
if stats:
display_stats(stats)
time.sleep(5)
except KeyboardInterrupt:
print("\nMonitor stopped")
if __name__ == '__main__':
main()
Summary
In this lesson, you learned:
- ✅ Access and error log configuration
- ✅ Custom log formats (JSON, detailed, performance)
- ✅ Log rotation with logrotate
- ✅ Log analysis tools (GoAccess, AWK scripts)
- ✅ Prometheus + Grafana monitoring
- ✅ ELK Stack integration (Elasticsearch, Logstash, Kibana)
- ✅ Alerting systems (Alertmanager, email, Slack)
- ✅ Real-time dashboards
- ✅ Performance metrics and troubleshooting
Key takeaways:
- Use structured logging (JSON) for better analysis
- Implement log rotation to manage disk space
- Set up real-time monitoring for quick issue detection
- Create dashboards for visualization
- Configure alerts for critical issues
- Automate log analysis with scripts
- Integrate with centralized logging systems
Monitoring checklist:
- ✅ Access and error logs configured
- ✅ Log rotation enabled
- ✅ Prometheus exporter running
- ✅ Grafana dashboards created
- ✅ Alerting rules configured
- ✅ ELK stack (optional) integrated
- ✅ Regular log analysis scheduled
- ✅ Backup and retention policies defined
Next lesson: High Availability and Advanced Load Balancing - Nginx Plus features, health checks, session persistence, active-active setups, failover strategies, and disaster recovery to ensure maximum uptime in production environments.