Overview
This script checks disk usage, CPU & memory utilization, and watches specific processes
you care about. It logs everything (rotating logs) and can alert via console, macOS desktop
notification, and Slack (optional webhook). Thresholds and watchlists are configured in
config.yaml.
Repo & Quick Start
GitHub: cmwalls/system-health-checker
# clone & run (macOS/Linux)
git clone https://github.com/cmwalls/system-health-checker.git
cd system-health-checker
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python health_check.py
Project Structure
system-health-checker/
├─ health_check.py # main script
├─ config.yaml # thresholds, process watchlist, alert settings
├─ requirements.txt
├─ logs/ # rotating logs (gitignored)
└─ .gitignore
Configuration (config.yaml)
Edit thresholds and watchlists without touching code:
thresholds:
  cpu_percent: 85
  memory_percent: 85
  disk_percent: 90
process_watchlist:
  required: [] # must be running, e.g., ["redis-server", "postgres"]
  monitor:
    - name: "redis-server" # alert if over CPU/MEM limits
      cpu_percent: 70
      memory_percent: 20
logging:
  file: "logs/health_check.log"
  level: "INFO"
alerts:
  desktop: true # macOS desktop notifications
  slack: true # requires SLACK_WEBHOOK env var
  console: true # print to stdout
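To show how the thresholds block might drive the checks, here is a minimal sketch that compares a set of readings against the configured limits. It assumes the YAML has already been parsed into a plain dict; the function name find_breaches and the "at or above the limit" rule are illustrative assumptions, not taken from health_check.py.

```python
from typing import Dict, List

# Mirrors the `thresholds` block of config.yaml shown above.
DEFAULT_THRESHOLDS: Dict[str, float] = {
    "cpu_percent": 85,
    "memory_percent": 85,
    "disk_percent": 90,
}


def find_breaches(readings: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return one message per reading that is at or above its threshold.

    Hypothetical helper: the real script may use a strict '>' comparison.
    """
    breaches = []
    for key, limit in thresholds.items():
        value = readings.get(key)
        # Readings that were not collected are skipped rather than treated as failures.
        if value is not None and value >= limit:
            breaches.append(f"{key} at {value:.1f}% (threshold {limit}%)")
    return breaches


if __name__ == "__main__":
    sample = {"cpu_percent": 91.2, "memory_percent": 40.0, "disk_percent": 90.0}
    for message in find_breaches(sample, DEFAULT_THRESHOLDS):
        print(message)
```

Keeping the comparison in one function means a new metric only needs a new key in config.yaml and a matching reading.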
What the Script Does
- Disk Check: Calculates % used on root (/) and compares to disk_percent.
- CPU & Memory: Samples CPU (1s interval) and reads system memory percent, compares to thresholds.
- Process Watchlist:
  - required: alerts if any listed process isn’t running.
  - monitor: alerts when named processes exceed per-process CPU/MEM limits.
- Top Processes: Logs the top N processes by memory for quick triage.
- Exit Codes: Exits non-zero if any alert fired (handy for CI/cron monitoring).
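The disk check and the exit-code behaviour listed above can be sketched with the standard library alone. This is an illustration under stated assumptions: the signatures below are hypothetical, and the project may compute the percentage differently (tools such as df or psutil exclude reserved blocks, so their figures can differ slightly from used/total).

```python
import shutil
from typing import List


def disk_percent_used(path: str = "/") -> float:
    """Percent of the filesystem at `path` that is in use (used / total)."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # guard against pseudo-filesystems that report no capacity
    return usage.used / usage.total * 100


def check_disk(path: str = "/", limit: float = 90.0) -> List[str]:
    """Return a one-item alert list when usage reaches `limit`, else an empty list.

    `limit` stands in for disk_percent from config.yaml.
    """
    used = disk_percent_used(path)
    if used >= limit:
        return [f"Disk usage on {path} is {used:.1f}% (threshold {limit}%)"]
    return []


def exit_code_for(alerts: List[str]) -> int:
    """0 when nothing fired, 1 otherwise, so cron or CI can detect a failed run."""
    return 1 if alerts else 0
```

A script built this way could finish with sys.exit(exit_code_for(alerts)) after collecting the alerts from every check.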
Alerts
- Console: Always prints.
- Desktop (macOS): Uses osascript notifications if enabled.
- Slack: Export a webhook and enable in config:
export SLACK_WEBHOOK="https://hooks.slack.com/services/..."
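For reference, a Slack incoming webhook accepts a JSON body with a text field, so the Slack alert can be sketched with the standard library. Whether health_check.py uses urllib or a third-party HTTP client is not stated here; the function names below are hypothetical, and only the SLACK_WEBHOOK variable comes from the configuration above.

```python
import json
import os
import urllib.request


def build_slack_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a Slack text payload."""
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_alert(message: str, timeout: float = 5.0) -> bool:
    """Send `message` if SLACK_WEBHOOK is set; return True when Slack accepts it."""
    webhook_url = os.environ.get("SLACK_WEBHOOK")
    if not webhook_url:
        return False  # Slack is optional: skip quietly when no webhook is exported
    request = build_slack_request(webhook_url, message)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses; a failed alert
        # should not crash the health check itself.
        return False
```

Separating request construction from sending keeps the payload easy to inspect without touching the network.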
Logging
Rotating file logs (default logs/health_check.log, ~500 KB, 3 backups) + console output for visibility.
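A rotating file handler paired with a console handler, as described above, might be set up roughly like this. The maxBytes value of 500,000 approximates the "~500 KB" figure, and the logger name and format string are assumptions rather than the project's actual choices.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def setup_logger(log_file: str = "logs/health_check.log", level: str = "INFO") -> logging.Logger:
    """Create a logger that writes to a rotating file and to the console."""
    log_dir = os.path.dirname(log_file)
    if log_dir:
        os.makedirs(log_dir, exist_ok=True)  # logs/ is gitignored, so it may not exist yet

    logger = logging.getLogger("health_check")  # assumed logger name
    logger.setLevel(getattr(logging, level.upper(), logging.INFO))
    if logger.handlers:
        return logger  # avoid attaching duplicate handlers on repeated calls

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    # Roll over at roughly 500 KB and keep three old files (.1, .2, .3).
    file_handler = RotatingFileHandler(log_file, maxBytes=500_000, backupCount=3)
    file_handler.setFormatter(formatter)

    console_handler = logging.StreamHandler()
    console_handler.setFormatter(formatter)

    logger.addHandler(file_handler)
    logger.addHandler(console_handler)
    return logger
```

With backupCount=3 the log directory stays bounded at about four files, which suits a job that cron runs every few minutes.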
Scheduling (cron)
Run every 10 minutes (adjust the absolute path):
crontab -e
*/10 * * * * /bin/zsh -lc 'cd /ABSOLUTE/PATH/system-health-checker && source .venv/bin/activate && python health_check.py'
Code Walkthrough (selected)
- load_config(): loads YAML config.
- setup_logger(): rotating file handler + console handler.
- check_disk(): computes percent used of root.
- check_cpu_mem(): CPU sample over a 1-second interval and current memory %.
- evaluate_process_watchlist(): ensures required processes exist and monitored ones stay under per-process thresholds.
- alert(): central alerting (console, macOS notification, Slack if webhook present).
- main(): orchestrates checks, logs results, prints top processes, sets exit code if any alerts fired.
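The watchlist evaluation is the least obvious of these, so here is a minimal sketch of its logic. To stay self-contained it takes process snapshots as plain dicts instead of querying the system; the real function presumably gathers them from the OS, and this signature and message wording are assumptions.

```python
from typing import Dict, List


def evaluate_process_watchlist(processes: List[Dict], watchlist: Dict) -> List[str]:
    """Return alerts for missing required processes and over-limit monitored ones.

    `processes` holds snapshots such as
    {"name": "redis-server", "cpu_percent": 12.0, "memory_percent": 3.5}.
    `watchlist` mirrors the process_watchlist block of config.yaml.
    """
    alerts = []
    running = {proc["name"] for proc in processes}

    # required: every listed name must appear among the running processes.
    for name in watchlist.get("required") or []:
        if name not in running:
            alerts.append(f"Required process not running: {name}")

    # monitor: compare each matching process against its per-process limits.
    for rule in watchlist.get("monitor") or []:
        for proc in processes:
            if proc["name"] != rule["name"]:
                continue
            for metric in ("cpu_percent", "memory_percent"):
                limit = rule.get(metric)
                value = proc.get(metric, 0.0)
                if limit is not None and value > limit:
                    alerts.append(f"{proc['name']} {metric} {value:.1f}% exceeds {limit}%")
    return alerts


if __name__ == "__main__":
    watch = {
        "required": ["postgres"],
        "monitor": [{"name": "redis-server", "cpu_percent": 70, "memory_percent": 20}],
    }
    snapshot = [{"name": "redis-server", "cpu_percent": 82.5, "memory_percent": 3.0}]
    for line in evaluate_process_watchlist(snapshot, watch):
        print(line)
```

Passing snapshots in as data makes the rule logic testable without starting or stopping real processes.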
Why This Matters (Employer Signal)
- Proactive mindset: preventing incidents by monitoring basics.
- Config-driven: thresholds & alerts without code edits.
- Operational polish: rotating logs, exit codes, cron-ready, Slack hooks.
- Extensible: add email, Windows toast notifications, or systemd timers easily.