GPU Reporting Utility

gpumanager

GitHub: https://happilee12.github.io/gpu-util-webhook/

PyPI page: https://pypi.org/project/gpumanager/

Sample NVIDIA GPU utilization, store each snapshot as a CSV file, aggregate usage over a configurable window, and send a Slack report on your own schedule. gpumanager is meant to be used together with a Slack incoming webhook.

pipx install gpumanager
pipx ensurepath
source ~/.bashrc

gpumanager install-systemd
gpumanager init
gpumanager sample
gpumanager test

What it does

  • Collects GPU utilization data through nvidia-smi.
  • Writes one CSV file per sample so the data remains simple and inspectable.
  • Aggregates average utilization by GPU and sends a Slack summary.
  • Supports interactive setup, runtime status checks, and user-level systemd timers.
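The sampling step can be sketched in Python as follows. This is a minimal illustration, not gpumanager's code. The nvidia-smi query (--query-gpu=index,utilization.gpu --format=csv,noheader,nounits) is a real nvidia-smi invocation. The CSV columns (timestamp, gpu, utilization), the UTC-based file names, and the helper names parse_utilization, snapshot_name, write_snapshot and sample are all hypothetical.

```python
from __future__ import annotations

import csv
import subprocess
from datetime import datetime, timezone
from pathlib import Path

# Real nvidia-smi query flags; the CSV layout written below is hypothetical.
QUERY = ["nvidia-smi", "--query-gpu=index,utilization.gpu",
         "--format=csv,noheader,nounits"]


def parse_utilization(text: str) -> list[tuple[int, float]]:
    """Turn nvidia-smi lines such as '0, 31' into (gpu_index, percent).

    Assumes every GPU reports a numeric value (no '[N/A]' entries).
    """
    rows = []
    for line in text.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        rows.append((int(index), float(util)))
    return rows


def snapshot_name(now: datetime) -> str:
    """One file per sample, named after its UTC timestamp (hypothetical scheme)."""
    return now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ") + ".csv"


def write_snapshot(rows: list[tuple[int, float]], csv_dir: Path,
                   now: datetime) -> Path:
    """Write a single snapshot CSV with columns timestamp,gpu,utilization."""
    csv_dir.mkdir(parents=True, exist_ok=True)
    path = csv_dir / snapshot_name(now)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "gpu", "utilization"])
        for gpu, util in rows:
            writer.writerow([now.isoformat(), gpu, util])
    return path


def sample(csv_dir: Path) -> Path:
    """Query nvidia-smi once and store the result (needs an NVIDIA driver)."""
    output = subprocess.run(QUERY, capture_output=True, text=True,
                            check=True).stdout
    return write_snapshot(parse_utilization(output), csv_dir,
                          datetime.now(timezone.utc))
```

Keeping one small file per sample means each snapshot can be opened in any text editor or spreadsheet without special tooling.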

Quick start

gpumanager install-systemd
gpumanager init

If you install with pipx, run pipx ensurepath and reload your shell once so the gpumanager command is available.

Core commands

  • gpumanager init sets up the Slack webhook, storage directory, sample interval, report time, report window, timezone, and server name.
  • gpumanager sample collects the current GPU utilization and writes a CSV snapshot.
  • gpumanager test uses existing CSV data and sends a test Slack message.
  • gpumanager install-systemd installs the user services and timers.
  • gpumanager disable-sample and gpumanager disable-report stop sampling or reporting independently.
  • gpumanager status shows the active configuration path, timers, webhook URL, and local runtime details.

Configuration example

[slack]
webhook_url = "https://hooks.slack.com/services/..."

[storage]
csv_dir = "/var/lib/gpumanager"

[sample]
interval = "30s"

[report]
report_time = "0 * * * *"
interval = "1h"

[general]
timezone = "Asia/Seoul"
server_name = "my_server"
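The report_time values look like standard five-field cron expressions (minute, hour, day of month, month, day of week). The sketch below assumes that reading and shows how an expression such as "0 * * * *" or "0 9 * * 1" selects a moment. It is hypothetical: the actual schedule runs through the installed systemd timers, and field_matches and cron_matches are illustrative names. The sketch supports only *, */n, single numbers and comma lists, with day of week 0 to 6 starting on Sunday.

```python
from datetime import datetime


def field_matches(expr: str, value: int, low: int) -> bool:
    """Match one cron field: '*', '*/n', 'n', or a comma-separated list.

    `low` is the field's first valid value (0 for minutes, 1 for days),
    so '*/n' counts steps from the start of the range as cron does.
    """
    for part in expr.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - low) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, moment: datetime) -> bool:
    """Check a five-field cron expression against a minute-resolution time.

    Simplification: all fields are ANDed; classic cron ORs day-of-month and
    day-of-week when both are restricted. Day of week is 0-6, Sunday = 0.
    """
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (moment.weekday() + 1) % 7  # Python Monday=0 -> cron Monday=1
    return (field_matches(minute, moment.minute, 0)
            and field_matches(hour, moment.hour, 0)
            and field_matches(dom, moment.day, 1)
            and field_matches(month, moment.month, 1)
            and field_matches(dow, cron_dow, 0))
```

For example, "0 * * * *" matches 16:00 but not 16:49, and "0 9 * * 1" matches only Mondays at 09:00.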

Recommended setup

1. Near-realtime report

Every 10 minutes, report the average utilization over the most recent minute.

[sample]
interval = "1m"

[report]
report_time = "*/10 * * * *"
interval = "1m"

2. Daily average report

This matches the default-style daily setup: sample every minute, then report the average over the last day at 09:00 every day.

[sample]
interval = "1m"

[report]
report_time = "0 9 * * *"
interval = "1d"

3. Weekly report

Send one summary every Monday at 09:00, aggregating the last 7 days.

[sample]
interval = "1m"

[report]
report_time = "0 9 * * 1"
interval = "7d"
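Two settings control each report. The [sample] interval sets how often a snapshot is taken, and the [report] interval sets how far back the report looks. Dividing the second by the first gives the number of snapshots behind each average. The sketch below assumes interval strings are an integer followed by s, m, h or d, as in the examples above. parse_interval and samples_per_window are hypothetical helpers, not gpumanager functions.

```python
import re

# Seconds per unit suffix used in the example configurations.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_interval(text: str) -> int:
    """Convert strings such as '30s', '1m', '1h', '7d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def samples_per_window(sample_interval: str, report_window: str) -> int:
    """Approximate number of snapshots that fall inside one report window."""
    return parse_interval(report_window) // parse_interval(sample_interval)


# (sample interval, report window) for the three recommended setups.
SETUPS = {
    "realtime": ("1m", "1m"),
    "daily": ("1m", "1d"),
    "weekly": ("1m", "7d"),
}

for name, (sample, window) in SETUPS.items():
    print(f"{name}: ~{samples_per_window(sample, window)} samples per report")
```

Running it prints about 1, 1440 and 10080 samples per report. Each near-realtime report is therefore based on essentially a single snapshot, while each weekly report averages roughly ten thousand CSV files, one per sample.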

Report format

[my_server] 2025.09.06 16:49:32 KST
Window: last 1h
GPU 0: 31.38%
GPU 1: 29.39%
GPU 2: 31.57%
GPU 3: 56.36%
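The aggregation behind this report can be sketched in three steps. First, keep the snapshots whose timestamps fall inside the window. Next, average utilization per GPU. Finally, print one line per GPU with two decimal places. This is an illustrative sketch, not gpumanager's implementation. The CSV layout (columns timestamp, gpu, utilization, with ISO 8601 timestamps) and all function names are assumptions.

```python
from __future__ import annotations

import csv
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from pathlib import Path


def read_rows(csv_dir: Path):
    """Yield (timestamp, gpu, utilization) from every snapshot file.

    Assumes a hypothetical layout: header timestamp,gpu,utilization.
    """
    for path in sorted(csv_dir.glob("*.csv")):
        with path.open(newline="") as f:
            for record in csv.DictReader(f):
                yield (datetime.fromisoformat(record["timestamp"]),
                       int(record["gpu"]), float(record["utilization"]))


def average_by_gpu(rows, since: datetime) -> dict[int, float]:
    """Average utilization per GPU over rows with timestamp >= since."""
    totals: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for ts, gpu, util in rows:
        if ts >= since:
            totals[gpu] += util
            counts[gpu] += 1
    return {gpu: totals[gpu] / counts[gpu] for gpu in sorted(totals)}


def format_report(server: str, now: datetime, window_label: str,
                  averages: dict[int, float]) -> str:
    """Render the report in the layout shown above."""
    lines = [f"[{server}] {now.strftime('%Y.%m.%d %H:%M:%S %Z')}",
             f"Window: last {window_label}"]
    lines += [f"GPU {gpu}: {avg:.2f}%" for gpu, avg in averages.items()]
    return "\n".join(lines)
```

In real use, the current time would come from the configured timezone, for example datetime.now(ZoneInfo("Asia/Seoul")), which formats with the KST abbreviation.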

Troubleshooting

Test

After finishing the configuration, send a test report.

gpumanager sample
gpumanager test

If the Slack message arrives normally, the setup is working.

If the test message arrives but scheduled reports do not, the systemd timers may not be installed yet. Run gpumanager status and check whether sample_timer_installed and report_timer_installed show both timers as installed. If they do not, run gpumanager install-systemd.

How to set up a Slack webhook

gpumanager sends notifications through a Slack incoming webhook.

Official Slack guide: Sending messages using incoming webhooks
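An incoming webhook accepts an HTTP POST with a JSON body, and the simplest payload is a single "text" field. The sketch below sends a report that way using only the Python standard library. It illustrates the Slack webhook protocol and is not gpumanager's code. The function names build_slack_request and send_slack_message are hypothetical.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying a plain-text Slack message."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(webhook_url: str, text: str, timeout: float = 10.0) -> int:
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, send_slack_message(webhook_url, report) posts a report string to the URL stored in [slack] webhook_url. Keep that URL private, because anyone who has it can post to the channel.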

Run example

Slack Webhook Demo Image