Rocky Linux Monitoring — Netdata, Prometheus, Grafana & Journald Stack

System Health

Key Metrics to Monitor

These are the critical metrics every Rocky Linux server administrator should track — with alert thresholds.

CPU Usage 68%

Healthy below 80% sustained

Memory (RAM) 54%

54% of 8GB used — comfortable

Disk I/O Wait 12%

Low iowait — storage not bottlenecked

Disk Usage / 41%

Root partition — 41% of 100GB

Disk Usage /var 73%

Watch this — logs and databases

Network RX 38%

380 Mbps of 1Gbps

Load Average 55%

2.2 / 4 CPUs — 55% load
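The load figure above is simple arithmetic: the 1-minute load average divided by the CPU count. A minimal sketch of that calculation follows; `load_pct` is a hypothetical helper name, not part of any monitoring tool.

```shell
#!/usr/bin/env bash
# Sketch: express a load average as a percentage of the CPU count.
# load_pct is a hypothetical helper, not part of any monitoring tool.
load_pct() {
  # $1 = load average, $2 = number of CPUs
  awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.0f\n", load / cpus * 100 }'
}

# Worked example from the panel above: load 2.2 on 4 CPUs.
load_pct 2.2 4    # prints 55

# On a live Linux host, read the real values instead.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
  read -r load1 _ < /proc/loadavg
  echo "current load: $(load_pct "$load1" "$(nproc)")% of $(nproc) CPUs"
fi
```

A value that stays above 100 means more runnable tasks than CPUs, which is usually the point where it is worth investigating.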

⚡

Netdata

Zero-config, real-time per-second metrics. Runs as a lightweight daemon. Best for instant visibility on any Rocky server.

🔥

Prometheus + Node Exporter

Pull-based metrics collection. Node Exporter exposes 600+ OS metrics. Store in Prometheus TSDB, visualise in Grafana.

📈

Grafana Dashboards

Dashboards over Prometheus data. Pre-built Node Exporter dashboards that work on Rocky Linux are available in the dashboard catalogue on grafana.com.

📋

journald + journalctl

Systemd journal: all service logs centralised. Query by time, unit, priority. Forward to Loki or Elasticsearch for aggregation.

🚨

Alertmanager

Route Prometheus alerts to email, Slack, PagerDuty, or webhook. Define inhibition rules to prevent alert storms.

🔍

Loki + Promtail

Grafana Loki is like Prometheus but for logs. Promtail ships journal and file logs. Query with LogQL alongside your metrics.

Netdata — what it monitors

CPU: Per-core usage, steal, softirq
Memory: RAM, swap, slab, cache detail
Disk: Read/write IOPS, bandwidth, latency
Network: Per-interface packets, errors, drops
Processes: Fork rate, running, blocked
Systemd: Per-service CPU/RAM usage
MySQL/PostgreSQL: Query rate, slow queries, connections
nginx/Apache: Requests/s, error rate, response time

Monitoring · Netdata

Netdata — Real-Time Monitoring

Netdata is a zero-config monitoring agent that starts collecting per-second metrics immediately on install. It has built-in dashboards, anomaly detection, and plugin support for MySQL, PostgreSQL, nginx, and 600+ more applications.

Install Netdata on Rocky Linux

# One-line installer (official)

[root@rocky ~]$ wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh

[root@rocky ~]$ bash /tmp/netdata-kickstart.sh --stable-channel

# Or via DNF (EPEL)

[root@rocky ~]$ dnf install -y epel-release && dnf install -y netdata

[root@rocky ~]$ systemctl enable --now netdata

# Dashboard at http://SERVER_IP:19999 (open the port only on a trusted internal network)

[root@rocky ~]$ firewall-cmd --permanent --add-port=19999/tcp && firewall-cmd --reload

💡

Restrict access: Never expose the Netdata port to the internet. Bind it to localhost and put an nginx reverse proxy with authentication in front of it, or use Netdata Cloud for secure remote access.
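One way to bind the dashboard to localhost is the `bind to` option in the `[web]` section of netdata.conf. The sketch below stages that fragment in a scratch directory for review rather than writing to /etc directly. `OUT_DIR` is an illustrative variable, and /etc/netdata/netdata.conf is the assumed location for package installs.

```shell
#!/usr/bin/env bash
# Sketch: stage a netdata.conf fragment that binds the dashboard to loopback only.
# OUT_DIR is a scratch directory (assumption), so nothing under /etc is touched here.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

cat > "$OUT_DIR/netdata.conf" << 'EOF'
[web]
    # Listen on loopback only; a reverse proxy on the same host forwards to it.
    bind to = 127.0.0.1
EOF

echo "Review $OUT_DIR/netdata.conf, merge it into /etc/netdata/netdata.conf, then:"
echo "  systemctl restart netdata"
echo "  firewall-cmd --permanent --remove-port=19999/tcp && firewall-cmd --reload"
```

With the agent on loopback, the firewall rule for port 19999 is no longer needed; only the proxy's port has to be reachable.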

Monitoring · Prometheus

Prometheus + Node Exporter

Prometheus scrapes metrics from Node Exporter (running on each Rocky Linux server) every 15 seconds and stores them in its time-series database. Query with PromQL and alert via Alertmanager.

Install Node Exporter on monitored servers

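# Packaged in EPEL: run dnf install -y epel-release first

[root@rocky ~]$ dnf install -y golang-github-prometheus-node-exporter

[root@rocky ~]$ systemctl enable --now node_exporter

# If systemd cannot find the unit, the package may use a different name:
# systemctl list-unit-files | grep -i node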

# Metrics at http://SERVER:9100/metrics
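To confirm the exporter is serving data, it helps to pull a single value out of the /metrics output. The sketch below parses the Prometheus text exposition format with awk. `metric_value` is a hypothetical helper, and the sample text is illustrative, not real server output.

```shell
#!/usr/bin/env bash
# Sketch: pull one un-labelled metric out of Prometheus text exposition output.
# metric_value is a hypothetical helper; it reads /metrics text on stdin.
metric_value() {
  # "# HELP" and "# TYPE" lines never match because their first field is "#".
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Illustrative sample in the exposition format (not real server output).
sample='# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 2.2
node_memory_MemTotal_bytes 8.589934592e+09'

printf '%s\n' "$sample" | metric_value node_load1    # prints 2.2

# Against a live exporter (SERVER is a placeholder):
#   curl -s http://SERVER:9100/metrics | metric_value node_load1
```

This only handles metrics without labels; series such as `node_cpu_seconds_total{cpu="0",mode="idle"}` need a pattern match on the label set instead.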

prometheus.yml — scrape config

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'rocky-servers'
    static_configs:
      - targets: ['192.168.1.10:9100', '192.168.1.11:9100']

  - job_name: 'mysql'
    static_configs:
      - targets: ['192.168.1.10:9104']

Useful PromQL queries

# CPU usage %

100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Available memory (MiB), including reclaimable cache

node_memory_MemAvailable_bytes / 1024 / 1024

# Disk usage %

(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100
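The CPU query is easier to reason about once the arithmetic is spelled out: `rate()` over the idle counter gives the idle fraction, and the query subtracts that from 100. The sketch below does roughly the same calculation from two counter samples. `cpu_busy_pct` is a hypothetical helper and the sample numbers are made up.

```shell
#!/usr/bin/env bash
# Sketch: the arithmetic behind the "CPU usage %" query, from two counter samples.
# cpu_busy_pct is a hypothetical helper. Arguments are cumulative CPU seconds:
#   $1 idle at t0, $2 total at t0, $3 idle at t1, $4 total at t1
cpu_busy_pct() {
  awk -v i0="$1" -v t0="$2" -v i1="$3" -v t1="$4" 'BEGIN {
    idle_fraction = (i1 - i0) / (t1 - t0)   # share of the interval spent idle
    printf "%.1f\n", 100 - idle_fraction * 100
  }'
}

# Made-up samples: of 300 s of CPU time in the window, 96 s were idle.
cpu_busy_pct 1000 1500 1096 1800    # prints 68.0
```

The two "total" samples must differ, otherwise the division fails; Prometheus avoids this by needing at least two scrapes inside the `[5m]` window.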

Monitoring · Grafana

Grafana Dashboard Setup

Connect Grafana to your Prometheus instance and import community dashboards for instant visibility.

Install Grafana on Rocky Linux

[root@rocky ~]$ cat > /etc/yum.repos.d/grafana.repo << 'EOF'
[grafana]
name=Grafana
baseurl=https://rpm.grafana.com
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
EOF

[root@rocky ~]$ dnf install -y grafana

[root@rocky ~]$ systemctl enable --now grafana-server

# UI at http://SERVER:3000 (default login admin/admin; Grafana asks for a new password on first login)
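Grafana can also be connected to Prometheus with a provisioning file instead of the UI. The sketch below stages one in a scratch directory. It assumes Prometheus listens on localhost:9090 and that Grafana reads /etc/grafana/provisioning/datasources/, the default for RPM installs; `OUT_DIR` is an illustrative variable.

```shell
#!/usr/bin/env bash
# Sketch: stage a Grafana data source provisioning file for Prometheus.
# Assumptions: Prometheus listens on localhost:9090; OUT_DIR is a scratch directory.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

cat > "$OUT_DIR/prometheus.yaml" << 'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF

echo "Copy $OUT_DIR/prometheus.yaml to /etc/grafana/provisioning/datasources/"
echo "then run: systemctl restart grafana-server"
```

Provisioned data sources are recreated on every start, which keeps the setup reproducible across servers.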

💡

Dashboard ID 1860, the "Node Exporter Full" dashboard from the grafana.com dashboard catalogue, gives you 20+ panels covering the main Node Exporter metrics out of the box. Import it by ID under Dashboards, then Import.

journalctl — common queries
[root@rocky ~]$ journalctl -u sshd -n 50 --no-pager
Last 50 SSH service log entries
 
[root@rocky ~]$ journalctl -p err -S "1 hour ago"
All errors in the last hour
 
[root@rocky ~]$ journalctl -k -S today
Kernel messages today
 
[root@rocky ~]$ journalctl --disk-usage
Disk space used by archived and active journals (for example 1.2G)
 
[root@rocky ~]$ journalctl --vacuum-size=500M
Trim journal to 500MB

Monitoring · Logs

journald — Centralised Log Management

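Systemd's journald collects all service logs, kernel messages, and boot events into a structured binary journal. On Rocky Linux it is the primary log store for most services; rsyslog, which is normally installed as well, reads from the journal to write the traditional text files under /var/log/.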

/etc/systemd/journald.conf — retention settings

[Journal]
Storage=persistent
Compress=yes
# Maximum disk space the journal may use
SystemMaxUse=2G
# Always leave this much free on the filesystem
SystemKeepFree=1G
MaxRetentionSec=90day
ForwardToSyslog=no
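systemd only treats a line as a comment when it starts with `#` or `;`. A comment placed after a value becomes part of that value and the setting is ignored. The sketch below flags such lines before journald is restarted; `trailing_comments` is a hypothetical helper, not a systemd tool.

```shell
#!/usr/bin/env bash
# Sketch: flag "key=value # comment" lines, which systemd reads as part of the value.
# trailing_comments is a hypothetical helper; it reads a config file on stdin.
trailing_comments() {
  # Drop whole-line comments (# or ;), section headers and blank lines,
  # then report any remaining line that contains whitespace followed by "#".
  grep -Ev '^[[:space:]]*([#;]|\[|$)' | grep -E '[[:space:]]#' || true
}

printf '%s\n' '[Journal]' '# own-line comment is fine' \
  'SystemMaxUse=2G # max disk use' 'Compress=yes' | trailing_comments
# prints: SystemMaxUse=2G # max disk use

# On a real host:
#   trailing_comments < /etc/systemd/journald.conf
#   systemctl restart systemd-journald && journalctl --disk-usage
```

No output means every comment sits on its own line and the file is safe to apply.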

💡

Loki integration: Install promtail to ship journal entries to Grafana Loki. Then query logs with LogQL directly alongside your Prometheus metrics in the same Grafana dashboard.
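A minimal Promtail configuration for shipping the journal might look like the sketch below, staged in a scratch directory for review. The Loki URL, the positions path, and the label names are assumptions for illustration; `OUT_DIR` is an illustrative variable.

```shell
#!/usr/bin/env bash
# Sketch: stage a Promtail config that reads the systemd journal and pushes to Loki.
# Assumptions: Loki on localhost:3100, positions path, label names, OUT_DIR scratch dir.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

cat > "$OUT_DIR/promtail-journal.yaml" << 'EOF'
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /var/lib/promtail/positions.yaml
clients:
  - url: http://localhost:3100/loki/api/v1/push
scrape_configs:
  - job_name: journal
    journal:
      max_age: 12h
      labels:
        job: systemd-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: unit
EOF

echo "Promtail config staged at $OUT_DIR/promtail-journal.yaml"
```

With the `unit` label in place, a LogQL selector such as `{job="systemd-journal", unit="sshd.service"}` narrows the stream to one service.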

Monitoring · Alerting

Alert Rules — When to Get Paged

Define Prometheus alert rules that trigger Alertmanager to notify your on-call team via email, Slack, or PagerDuty.

Alert Rule	Condition (PromQL)	Severity	Action
High CPU Load	`node_load1 / on(instance) count by(instance) (node_cpu_seconds_total{mode="idle"}) > 0.9`	WARNING	Check top, ps auxf
Disk > 85%	`(1 - node_filesystem_avail_bytes/node_filesystem_size_bytes)*100 > 85`	WARNING	Clean logs, expand LV
Disk > 95%	`(1 - node_filesystem_avail_bytes/node_filesystem_size_bytes)*100 > 95`	CRITICAL	URGENT: free space now
Service Down	`up == 0`	CRITICAL	Check systemctl status
High Memory	`node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes < 0.1`	WARNING	Check memory leaks
OOM Killer Active	`increase(node_vmstat_oom_kill[5m]) > 0`	CRITICAL	Process killed — check logs
RAID Degraded	`node_md_disks{state="failed"} > 0`	CRITICAL	Replace failed disk
SSH Bans > 20 in 5 min	`increase(fail2ban_banned_ip_total[5m]) > 20`	WARNING	Check fail2ban, firewall (metric assumes a fail2ban exporter is being scraped)
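Rows from the table become entries in a Prometheus rule file. The sketch below stages two of them in a scratch directory and validates the file when promtool is available. The group name, `for` durations, and annotation text are illustrative assumptions; `OUT_DIR` is an illustrative variable.

```shell
#!/usr/bin/env bash
# Sketch: stage a Prometheus rule file with two of the alerts from the table.
# Assumptions: group name, "for" durations and annotations are illustrative.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

cat > "$OUT_DIR/rocky-alerts.yml" << 'EOF'
groups:
  - name: rocky-servers
    rules:
      - alert: ServiceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} is not answering scrapes"
      - alert: HighMemory
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Less than 10% memory available on {{ $labels.instance }}"
EOF

# Validate only if promtool (shipped with Prometheus) is installed.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$OUT_DIR/rocky-alerts.yml"
fi
echo "Rule file staged at $OUT_DIR/rocky-alerts.yml"
```

Prometheus loads the file once its path is listed under `rule_files:` in prometheus.yml. The `for` clause makes an alert wait until the condition has held for that long, which filters out brief spikes.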
