📊 Monitoring

Rocky Linux Monitoring & Observability

Know what your server is doing at all times — CPU, RAM, disk I/O, network, log anomalies, and service health. Build a full observability stack on Rocky Linux 9.

Real-Time Metrics · Log Aggregation · Alerting · System Health

Key Metrics to Monitor

These are the critical metrics every Rocky Linux server administrator should track. The readings below are example values, each with a note on how to interpret it.

CPU usage: 68% (healthy below 80% sustained)
Memory (RAM): 54% of 8 GB used (comfortable)
Disk I/O wait: 12% (low iowait; storage is not the bottleneck)
Disk usage, /: 41% of the 100 GB root partition
Disk usage, /var: 73% (watch this; logs and databases grow here)
Network RX: 38% (380 Mbps of a 1 Gbps link)
Load average: 55% (2.2 on 4 CPUs)
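To check readings like these from a shell before any monitoring stack is installed, the minimal sketch below recomputes three of them from /proc/meminfo, /proc/loadavg, and `df -P`. It is an illustration only, not part of any tool described on this page; the function names are hypothetical, and it assumes the standard Linux /proc layout. "Used" memory is taken as MemTotal minus MemAvailable, which is one common definition.

```shell
# Minimal sketch: recompute three of the readings above from standard Linux
# sources. Function names are hypothetical; file arguments are optional so the
# functions can also be pointed at saved copies.

# Memory used % = (MemTotal - MemAvailable) / MemTotal * 100
mem_used_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%.0f\n", (total - avail) * 100 / total }' \
    "${1:-/proc/meminfo}"
}

# Load % = 1-minute load average / CPU count * 100
# $1: loadavg file (default /proc/loadavg), $2: CPU count (default: nproc)
load_pct() {
  awk -v cpus="${2:-$(nproc)}" '{ printf "%.0f\n", $1 * 100 / cpus }' \
    "${1:-/proc/loadavg}"
}

# Print "mountpoint use%" for each filesystem whose use% exceeds $1.
# Reads `df -P` output on stdin (the first line is the header and is skipped).
disks_over() {
  awk -v limit="$1" 'NR > 1 {
    pct = $5; sub(/%/, "", pct)
    if (pct + 0 > limit + 0) print $6, $5
  }'
}
```

On a live server, `mem_used_pct`, `load_pct`, and `df -P | disks_over 85` print the current values. Mount points containing spaces would confuse the `df` parsing, which is acceptable for a quick check.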

Netdata

Zero-config, real-time per-second metrics. Runs as a lightweight daemon. Best for instant visibility on any Rocky server.

Prometheus + Node Exporter

Pull-based metrics collection. Node Exporter exposes 600+ OS metrics. Store in Prometheus TSDB, visualise in Grafana.

Grafana Dashboards

Beautiful dashboards over Prometheus data. Pre-built Rocky Linux / Node Exporter dashboards are available in the grafana.com dashboard library.

journald + journalctl

Systemd journal: all service logs centralised. Query by time, unit, priority. Forward to Loki or Elasticsearch for aggregation.

Alertmanager

Route Prometheus alerts to email, Slack, PagerDuty, or webhook. Define inhibition rules to prevent alert storms.

Loki + Promtail

Grafana Loki is like Prometheus but for logs. Promtail ships journal and file logs. Query with LogQL alongside your metrics.

Netdata: what it monitors
CPU: per-core usage, steal, softirq
Memory: RAM, swap, slab, cache detail
Disk: read/write IOPS, bandwidth, latency
Network: per-interface packets, errors, drops
Processes: fork rate, running, blocked
Systemd: per-service CPU/RAM usage
MySQL/PostgreSQL: query rate, slow queries, connections
nginx/Apache: requests/s, error rate, response time

Netdata — Real-Time Monitoring

Netdata is a zero-config monitoring agent that starts collecting per-second metrics immediately on install. It has built-in dashboards, anomaly detection, and plugin support for MySQL, PostgreSQL, nginx, and 600+ more applications.

Install Netdata on Rocky Linux
# One-line installer (official)
[root@rocky ~]$ wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh
[root@rocky ~]$ bash /tmp/netdata-kickstart.sh --stable-channel
 
# Or via DNF (EPEL)
[root@rocky ~]$ dnf install -y epel-release && dnf install -y netdata
[root@rocky ~]$ systemctl enable --now netdata
 
# Dashboard at http://SERVER_IP:19999 (open this port only to trusted networks; see the warning below)
[root@rocky ~]$ firewall-cmd --permanent --add-port=19999/tcp && firewall-cmd --reload
💡
Restrict access: Never expose the Netdata port to the internet. Bind Netdata to localhost and put an nginx reverse proxy with authentication in front of it, or use Netdata Cloud for secure remote access.
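After binding Netdata to localhost (for example with `bind to = 127.0.0.1` in the `[web]` section of netdata.conf), it is worth confirming that nothing is still listening on port 19999 on a public address. This sketch assumes iproute2's `ss`, whose `-Htln` output (no header, numeric, TCP listeners) puts the local address in the fourth column; `public_listeners` is a hypothetical helper name.

```shell
# Sketch: list non-loopback addresses on which a TCP port is listening.
# Reads `ss -Htln` output on stdin; $1 is the port number.
# Empty output means the port is bound only to 127.0.0.1 / [::1].
public_listeners() {
  awk -v suffix=":$1" '
    {
      addr = $4                                   # "Local Address:Port" column
      tail = substr(addr, length(addr) - length(suffix) + 1)
      if (tail == suffix && addr !~ /^127\./ && addr !~ /^\[::1\]/)
        print addr
    }'
}
```

Usage: `ss -Htln | public_listeners 19999`. Any address it prints (such as `0.0.0.0:19999` or `[::]:19999`) means the dashboard is still reachable from outside the host, subject to the firewall.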

Prometheus + Node Exporter

Prometheus scrapes metrics from Node Exporter (running on each Rocky Linux server) at a fixed interval, 15 seconds in the scrape config below, and stores them in its time-series database. Query them with PromQL and alert via Alertmanager.

Install Node Exporter on monitored servers
[root@rocky ~]$ dnf install -y golang-github-prometheus-node-exporter
[root@rocky ~]$ systemctl enable --now node_exporter
# Metrics at http://SERVER:9100/metrics
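Before wiring up Prometheus, it can help to confirm that Node Exporter is actually serving data. The /metrics endpoint returns plain text with one `name{labels} value` sample per line plus `# HELP` and `# TYPE` comment lines, so a small awk filter is enough to pull out one value. `metric_value` is a hypothetical helper, and it assumes label values contain no spaces.

```shell
# Sketch: print the value of one series from Prometheus text exposition on stdin.
# $1 must match the series exactly as exposed, including any {labels}.
# Comment lines start with "#", so they never match a metric name.
metric_value() {
  awk -v series="$1" '$1 == series { print $2 }'
}
```

Usage: `curl -s http://SERVER:9100/metrics | metric_value node_load1`.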
prometheus.yml — scrape config
global:
  scrape_interval: 15s
 
scrape_configs:
  - job_name: 'rocky-servers'
    static_configs:
      - targets: ['192.168.1.10:9100', '192.168.1.11:9100']
 
  # Port 9104 is served by mysqld_exporter, which must be installed separately
  - job_name: 'mysql'
    static_configs:
      - targets: ['192.168.1.10:9104']
Useful PromQL queries
# CPU usage %
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
 
# Available memory MB (MemAvailable counts reclaimable cache; MemFree alone understates usable RAM)
node_memory_MemAvailable_bytes / 1024 / 1024
 
# Disk usage %
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100
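The CPU query measures how much idle time accumulates per second and subtracts that share from 100. The same arithmetic can be done by hand on two samples of the aggregate `cpu` line in /proc/stat, the kernel counters Node Exporter reads on Linux. Like the PromQL, this sketch counts every mode except idle (including iowait) as busy; `cpu_busy_pct` is a hypothetical name.

```shell
# Sketch: the arithmetic behind the "CPU usage %" PromQL query, done by hand.
# $1 and $2 are the aggregate "cpu ..." line from /proc/stat, sampled some
# seconds apart. Fields 2-9 are user nice system idle iowait irq softirq steal;
# guest time is already included in user/nice, so fields 10-11 are skipped.
cpu_busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      total = 0
      for (i = 2; i <= 9; i++) total += $i
      tot[NR] = total; idle[NR] = $5
    }
    END {
      dt = tot[2] - tot[1]; di = idle[2] - idle[1]
      if (dt > 0) printf "%.0f\n", 100 - di * 100 / dt   # 100 - idle share
    }'
}
```

For example: `a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat); cpu_busy_pct "$a" "$b"`.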

Grafana Dashboard Setup

Connect Grafana to your Prometheus instance and import community dashboards for instant visibility.

Install Grafana on Rocky Linux
[root@rocky ~]$ cat > /etc/yum.repos.d/grafana.repo << 'EOF'
[grafana]
name=Grafana
baseurl=https://rpm.grafana.com
enabled=1
repo_gpgcheck=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
EOF
[root@rocky ~]$ dnf install -y grafana
[root@rocky ~]$ systemctl enable --now grafana-server
# UI at http://SERVER:3000 (default login admin/admin; Grafana asks you to change it at first login)
💡
Dashboard ID 1860, the "Node Exporter Full" dashboard on grafana.com, gives you 20+ panels covering the core Node Exporter metrics out of the box. Import it in Grafana under Dashboards → New → Import.
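Instead of adding the Prometheus data source by hand in the UI, Grafana can load it from a provisioning file when the service starts. The sketch below writes such a file; it assumes Prometheus listens on its default port 9090 on the same host, and the function name and file name prometheus.yaml are arbitrary choices.

```shell
# Sketch: provision Prometheus as Grafana's default data source from a file.
# $1: provisioning directory (default /etc/grafana/provisioning/datasources)
# $2: Prometheus URL (default http://localhost:9090, an assumption)
write_prometheus_datasource() {
  dir="${1:-/etc/grafana/provisioning/datasources}"
  url="${2:-http://localhost:9090}"
  mkdir -p "$dir" || return 1
  # Unquoted EOF so that $url is expanded into the YAML
  cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
}
```

Run it as root, then `systemctl restart grafana-server` so Grafana reads the new file.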
journalctl: common queries
# Last 50 entries from the SSH service
[root@rocky ~]$ journalctl -u sshd -n 50 --no-pager
 
# Messages of priority err or worse from the last hour
[root@rocky ~]$ journalctl -p err -S "1 hour ago"
 
# Kernel messages from the current boot, since midnight
[root@rocky ~]$ journalctl -k -S today
 
# Disk space used by the journal
[root@rocky ~]$ journalctl --disk-usage
Archived and active journals take up 1.2G in the file system.
 
# Delete archived journal files until the total is under 500 MB
[root@rocky ~]$ journalctl --vacuum-size=500M
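`journalctl --disk-usage` reports a human-readable size, which is awkward to compare in a script. The hypothetical helper below converts sizes such as 1.2G into bytes, assuming base-1024 units with B, K, M, G, or T suffixes, so a cron job or check script could compare journal usage against a limit.

```shell
# Sketch: convert a journald-style size such as "1.2G" into bytes.
# Assumes base-1024 units; a bare number or a "B" suffix means bytes.
size_to_bytes() {
  printf '%s\n' "$1" | awk '
    {
      n = $0 + 0                       # leading numeric part, e.g. 1.2
      u = substr($0, length($0), 1)    # trailing unit letter
      m = 1
      if (u == "K") m = 1024
      else if (u == "M") m = 1024 ^ 2
      else if (u == "G") m = 1024 ^ 3
      else if (u == "T") m = 1024 ^ 4
      printf "%.0f\n", n * m
    }'
}
```

Usage: `size_to_bytes "$(journalctl --disk-usage | grep -o '[0-9.][0-9.]*[BKMGT]')"`.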

journald — Centralised Log Management

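Systemd's journald collects all service logs, kernel messages, and boot events into a structured binary journal. On Rocky Linux it is the primary place to query service logs, although a default install still runs rsyslog, which writes the traditional /var/log/ text files such as /var/log/messages and /var/log/secure.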

/etc/systemd/journald.conf: retention settings
[Journal]
Storage=persistent
Compress=yes
# Maximum disk space the journal may use
SystemMaxUse=2G
# Always leave at least this much free on the filesystem
SystemKeepFree=1G
MaxRetentionSec=90day
ForwardToSyslog=no
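journald also reads drop-in files from /etc/systemd/journald.conf.d/, which keeps local settings separate from the packaged journald.conf. The sketch below writes the retention settings shown above into such a drop-in; the function name and the file name 60-retention.conf are arbitrary choices. Comments go on their own lines because journald.conf does not support trailing comments after a value.

```shell
# Sketch: write the journald retention settings as a drop-in file.
# $1: drop-in directory (default /etc/systemd/journald.conf.d)
write_journald_retention() {
  dir="${1:-/etc/systemd/journald.conf.d}"
  mkdir -p "$dir" || return 1
  # Quoted EOF: the body is written literally, with no shell expansion
  cat > "$dir/60-retention.conf" <<'EOF'
[Journal]
Storage=persistent
Compress=yes
# Cap total journal size, and always leave this much disk free
SystemMaxUse=2G
SystemKeepFree=1G
MaxRetentionSec=90day
ForwardToSyslog=no
EOF
}
```

Run it as root, then `systemctl restart systemd-journald` to apply the settings.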
💡
Loki integration: Install promtail to ship journal entries to Grafana Loki. Then query logs with LogQL directly alongside your Prometheus metrics in the same Grafana dashboard.

Alert Rules — When to Get Paged

Prometheus evaluates alert rules and sends firing alerts to Alertmanager, which notifies your on-call team via email, Slack, or PagerDuty.

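Alert rule | Condition (PromQL) | Severity | Action
High CPU load | node_load1 / count without (cpu, mode) (node_cpu_seconds_total{mode="idle"}) > 0.9 | WARNING | Check top, ps auxf
Disk > 85% | (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 85 | WARNING | Clean logs, expand LV
Disk > 95% | (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 95 | CRITICAL | URGENT: free space now
Target down | up == 0 | CRITICAL | Check the exporter with systemctl status, then the host and network
High memory | node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.1 | WARNING | Check for memory leaks
OOM killer active | increase(node_vmstat_oom_kill[5m]) > 0 | CRITICAL | A process was killed; check journalctl -k
RAID degraded | node_md_disks_required - ignoring(state) node_md_disks{state="active"} > 0 | CRITICAL | Replace the failed disk
fail2ban bans > 20 | increase(fail2ban_banned_ip_total[5m]) > 20 | WARNING | Check fail2ban, firewall

The fail2ban metric is not exposed by Node Exporter; it assumes a separate fail2ban exporter, and the exact metric name depends on which exporter you run.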
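Prometheus loads alert rules from YAML rule files listed under `rule_files` in prometheus.yml. The sketch below writes three rows of the table into such a file; the function name, group name, path, `for` durations, and summary text are illustrative assumptions rather than required values.

```shell
# Sketch: write a few rows of the alert table as a Prometheus rules file.
# $1: output path (default /etc/prometheus/rules/rocky-node.yml)
write_alert_rules() {
  out="${1:-/etc/prometheus/rules/rocky-node.yml}"
  mkdir -p "$(dirname "$out")" || return 1
  # Quoted EOF keeps {{ $labels.* }} templates literal for Prometheus
  cat > "$out" <<'EOF'
groups:
  - name: rocky-node
    rules:
      - alert: DiskSpaceWarning
        expr: (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.mountpoint }} on {{ $labels.instance }} is above 85%"
      - alert: TargetDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
      - alert: OOMKillerActive
        expr: increase(node_vmstat_oom_kill[5m]) > 0
        labels:
          severity: critical
EOF
}
```

To use it, list the file under `rule_files:` in prometheus.yml, point the `alerting:` section at Alertmanager (port 9093 by default), and validate with `promtool check rules /etc/prometheus/rules/rocky-node.yml` before reloading Prometheus.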
