GPU Reporting Utility

gpumanager

GitHub: https://happilee12.github.io/gpu-util-webhook/

PyPI page: https://pypi.org/project/gpumanager/

Sample NVIDIA GPU utilization, store each snapshot as a CSV file, aggregate usage over a configurable window, and send a Slack report on your own schedule. gpumanager is meant to be used together with a Slack incoming webhook.

pipx install gpumanager
pipx ensurepath
source ~/.bashrc

gpumanager install-systemd
gpumanager init
gpumanager test-sample
gpumanager test-report

What it does

  • Collects GPU utilization data through nvidia-smi.
  • Writes one CSV file per sample so the data remains simple and inspectable.
  • Aggregates average utilization by GPU and sends a Slack summary.
  • Supports interactive setup, runtime status checks, and user-level systemd timers.
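For illustration, a minimal Python sketch of the sampling step might look like the following. It assumes utilization is read with nvidia-smi's CSV query mode (--query-gpu=index,utilization.gpu --format=csv,noheader,nounits); the function names are hypothetical, and gpumanager's actual query and parsing may differ.

```python
import subprocess

# Assumed query: nvidia-smi prints one "index, utilization" line per GPU.
# These flags exist in nvidia-smi; whether gpumanager uses them exactly is an assumption.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(output: str) -> dict[int, float]:
    """Parse lines like '0, 31' into {gpu_index: utilization_percent} (hypothetical helper)."""
    usage: dict[int, float] = {}
    for line in output.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, util = (field.strip() for field in line.split(","))
        usage[int(index)] = float(util)
    return usage


def sample_gpus() -> dict[int, float]:
    """Run nvidia-smi once and return the current utilization per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_utilization(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it testable on machines without a GPU.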

Quick start

gpumanager install-systemd
gpumanager init

If you install with pipx, run pipx ensurepath and reload your shell once so the gpumanager command is available.

Core commands

  • gpumanager init sets up the Slack webhook, storage directory, sample interval, report time, report window, timezone, and server name.
  • gpumanager test-sample collects the current GPU utilization and writes a CSV snapshot.
  • gpumanager test-report aggregates existing CSV data and sends a Slack report immediately.
  • gpumanager install-systemd installs the user services and timers.
  • gpumanager disable-sample and gpumanager disable-report stop sampling or reporting independently.
  • gpumanager reload runs systemctl --user daemon-reload and restarts the installed gpumanager timers.
  • gpumanager status shows the active configuration path, timers, webhook URL, local runtime details, and the next sample/report trigger times.
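Writing one CSV file per sample keeps every snapshot independently readable. A minimal sketch of what the write step behind test-sample could look like, assuming a hypothetical layout of one row per GPU (timestamp, gpu, utilization) and a timestamp-based file name; gpumanager's real file naming and columns may differ.

```python
import csv
from datetime import datetime
from pathlib import Path


def write_snapshot(csv_dir: Path, usage: dict[int, float], taken_at: datetime) -> Path:
    """Write one sample to its own CSV file (hypothetical layout: one row per GPU)."""
    csv_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one file per sample, named by its timestamp.
    path = csv_dir / f"{taken_at.strftime('%Y%m%dT%H%M%S')}.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "gpu", "utilization"])
        for gpu, util in sorted(usage.items()):
            writer.writerow([taken_at.isoformat(), gpu, util])
    return path


def read_snapshot(path: Path) -> dict[int, float]:
    """Read a file written by write_snapshot back into {gpu: utilization}."""
    with path.open(newline="") as f:
        return {int(row["gpu"]): float(row["utilization"]) for row in csv.DictReader(f)}
```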

Configuration example

[slack]
webhook_url = "https://hooks.slack.com/services/..."

[storage]
csv_dir = "/var/lib/gpumanager"

[sample]
interval = "30s"

[report]
report_time = "0 * * * *"
interval = "1h"

[general]
timezone = "Asia/Seoul"
server_name = "my_server"

Recommended setup

1. Real-time report

Every 10 minutes, report the average utilization over the last minute for a near-real-time view of GPU activity.

[sample]
interval = "1m"

[report]
report_time = "*/10 * * * *"
interval = "1m"

2. Daily average report

Every day at 09:00, send the average utilization over the last 24 hours. This mirrors the default daily setup.

[sample]
interval = "1m"

[report]
report_time = "0 9 * * *"
interval = "1d"

3. Weekly report

Every Monday at 09:00, send one summary covering the last 7 days.

[sample]
interval = "1m"

[report]
report_time = "0 9 * * 1"
interval = "7d"
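Because one CSV file is written per sample, dividing the report window by the sample interval gives roughly how many snapshots each report reads. A quick worked check for the three presets above (the helper name is hypothetical):

```python
from datetime import timedelta

# (sample interval, report window) for the three recommended presets.
PRESETS = {
    "realtime": (timedelta(minutes=1), timedelta(minutes=1)),
    "daily": (timedelta(minutes=1), timedelta(days=1)),
    "weekly": (timedelta(minutes=1), timedelta(days=7)),
}


def samples_per_window(sample_every: timedelta, window: timedelta) -> int:
    """Hypothetical helper: approximate number of snapshots in one report window."""
    return window // sample_every  # timedelta // timedelta yields an int


for name, (every, window) in PRESETS.items():
    print(f"{name}: ~{samples_per_window(every, window)} snapshots per report")
```

With these settings the real-time preset averages roughly one snapshot per report, the daily preset about 1,440, and the weekly preset about 10,080, which is worth keeping in mind when sizing csv_dir. Exact counts can be off by one depending on when sampling and reporting fire.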

Report format

[my_server] 2025.09.06 16:49:32 KST
Window: last 1h
GPU 0: 31.38%
GPU 1: 29.39%
GPU 2: 31.57%
GPU 3: 56.36%
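The report is a per-GPU average over the configured window. A minimal sketch of that aggregation and formatting, assuming snapshots are already loaded as (timestamp, {gpu: utilization}) pairs with timezone-aware timestamps; build_report is a hypothetical name, and the two-decimal rounding is inferred from the sample output above.

```python
from datetime import datetime, timedelta
from statistics import mean


def build_report(
    server_name: str,
    snapshots: list[tuple[datetime, dict[int, float]]],
    now: datetime,
    window: timedelta,
    window_label: str,
) -> str:
    """Average utilization per GPU over [now - window, now] and format it like the sample report."""
    per_gpu: dict[int, list[float]] = {}
    for taken_at, usage in snapshots:
        if now - window <= taken_at <= now:  # keep only snapshots inside the window
            for gpu, util in usage.items():
                per_gpu.setdefault(gpu, []).append(util)
    lines = [
        f"[{server_name}] {now.strftime('%Y.%m.%d %H:%M:%S %Z')}",
        f"Window: last {window_label}",
    ]
    for gpu in sorted(per_gpu):
        lines.append(f"GPU {gpu}: {mean(per_gpu[gpu]):.2f}%")
    return "\n".join(lines)
```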

Troubleshooting

Test

After finishing the configuration, collect one sample and send a test report:

gpumanager test-sample
gpumanager test-report

If the Slack message arrives normally, the setup is working.

If the test message arrives but scheduled reports do not, the systemd timers may not be installed yet; run gpumanager install-systemd. To confirm, run gpumanager status and check sample_timer_installed, report_timer_installed, sample.next_trigger, and report.next_trigger.
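The timers can also be checked directly: systemctl --user list-timers lists the current user's timers with their next trigger times. A small sketch that filters that output, assuming the installed timer units contain gpumanager in their names (the exact unit names are not documented here, and both helpers are hypothetical):

```python
import shutil
import subprocess


def gpumanager_timer_lines(list_timers_output: str) -> list[str]:
    """Keep only the rows of `systemctl --user list-timers` output that mention gpumanager."""
    return [line.strip() for line in list_timers_output.splitlines() if "gpumanager" in line]


def check_timers() -> list[str]:
    """Return gpumanager timer rows for the current user, or [] if systemctl is unavailable."""
    if shutil.which("systemctl") is None:
        return []
    result = subprocess.run(
        ["systemctl", "--user", "list-timers", "--all", "--no-pager"],
        capture_output=True,
        text=True,
        check=False,  # an empty result is handled by the caller
    )
    return gpumanager_timer_lines(result.stdout)
```

An empty list from check_timers() suggests the timers were never installed or were disabled.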

How to set up a Slack webhook


gpumanager sends notifications through a Slack incoming webhook.

Official Slack guide: Sending messages using incoming webhooks
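A Slack incoming webhook accepts an HTTP POST whose JSON body contains a text field. A standard-library sketch of sending a report this way; build_payload and send_to_slack are hypothetical helpers, not part of gpumanager:

```python
import json
import urllib.request


def build_payload(text: str) -> bytes:
    """Encode a plain-text message as the JSON body an incoming webhook expects."""
    return json.dumps({"text": text}).encode("utf-8")


def send_to_slack(webhook_url: str, text: str, timeout: float = 10.0) -> int:
    """POST the message to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, call send_to_slack with the webhook_url value from the [slack] section and the report text. Treat the webhook URL as a secret, since anyone who has it can post to the channel.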

Run example


Slack Webhook Demo Image