Have you ever wanted to monitor your NVIDIA GPU directly from your Python code? I’ve been there, and I’m going to show you exactly how to do it. I first discovered NVML Python when I participated in Google Summer of Code several years back, working on Ganglia’s GPU monitoring module. Trust me, once you learn these techniques, you’ll never go back to the old ways of checking your GPU stats.
NVML (NVIDIA Management Library) is a powerful C-based API that gives you direct access to monitor and manage NVIDIA GPU devices. Think of it as the engine behind the popular nvidia-smi command-line tool – but now you can access all that data programmatically in Python!
The incredible thing about NVML Python is that it offers complete control over your GPU monitoring without any complex C programming. You get instant access to critical metrics like:

- GPU temperature
- memory usage (total, used, and free)
- GPU and memory utilization rates
- power consumption
- the processes currently running on each GPU
This is absolutely essential knowledge for data scientists, machine learning engineers, and anyone working with GPU-accelerated applications.
Before diving into the code, you need to install the Python bindings for NVML. The setup process is straightforward:
The most up-to-date package is available on PyPI. Simply run:
```bash
pip install nvidia-ml-py
```

This package provides Python bindings to the NVIDIA Management Library. Make sure you have NVIDIA drivers properly installed on your system before proceeding.
Tip 💡: It’s a good practice to set up a Python virtual environment for your project first and install dependencies there.
After installation, you can import the library into your Python script:
```python
from pynvml import *
```

The library exposes all the functionality you need to interact with your GPUs. Let's start coding!
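Before going any further, it's worth running a quick end-to-end sanity check. The snippet below is just a sketch: it initializes NVML, prints the installed driver version via nvmlSystemGetDriverVersion (part of the standard NVML API), and shuts down again. If anything is misconfigured, it will raise an NVMLError. Depending on your version of the bindings, the version string may come back as bytes rather than str.

```python
from pynvml import nvmlInit, nvmlSystemGetDriverVersion, nvmlShutdown

# If the bindings and driver are working, this prints the driver version
# and exits cleanly; otherwise an NVMLError is raised.
nvmlInit()
print(f"Driver version: {nvmlSystemGetDriverVersion()}")
nvmlShutdown()
```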
The first step in any NVML Python application is establishing a connection to the GPU. This must be handled with proper exception management:
```python
import sys
from pynvml import *

try:
    nvmlInit()
    print("NVML initialized successfully")
except NVMLError as err:
    print(f"Failed to initialize NVML: {err}")
    sys.exit(1)
```

Always wrap your initialization in a try-except block. If no compatible GPUs are found or there's a driver issue, this will catch the error and prevent your application from crashing.
Properly closing the connection is just as important as initializing it:
```python
try:
    nvmlShutdown()
    print("NVML shutdown successful")
except NVMLError as err:
    print(f"Error shutting down NVML: {err}")
    sys.exit(1)
```

A clean shutdown ensures system resources are properly released. Always include this in your application's cleanup routine.
NVML makes it easy to detect how many GPUs are available in your system:
```python
def get_gpu_count():
    try:
        gpu_count = nvmlDeviceGetCount()
        print(f"Found {gpu_count} GPU devices")
        return gpu_count
    except NVMLError as err:
        print(f"Error getting GPU count: {err}")
        return 0
```

This function returns the number of NVIDIA GPUs that NVML can access and control.
To interact with a specific GPU, you need to obtain a reference to it using its index:
```python
def get_gpu_by_index(gpu_id):
    try:
        handle = nvmlDeviceGetHandleByIndex(gpu_id)
        return handle
    except NVMLError as err:
        print(f"Error accessing GPU {gpu_id}: {err}")
        return None
```

GPU indices are zero-based, so your first GPU has index 0, the second has index 1, and so on.
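To see how these building blocks fit together, here is a minimal sketch that uses the get_gpu_count and get_gpu_by_index helpers defined above to grab a handle for every detected GPU:

```python
nvmlInit()
try:
    for gpu_id in range(get_gpu_count()):
        handle = get_gpu_by_index(gpu_id)
        if handle is not None:
            print(f"Obtained a handle for GPU {gpu_id}")
finally:
    # Release NVML even if something above raised an exception
    nvmlShutdown()
```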
Once you have a GPU handle, you can extract various metrics and information:
```python
def get_gpu_info(handle):
    try:
        name = nvmlDeviceGetName(handle)
        uuid = nvmlDeviceGetUUID(handle)
        # Serial numbers are typically only reported on data-center and workstation boards
        serial = nvmlDeviceGetSerial(handle)
        print(f"GPU Name: {name}")
        print(f"GPU UUID: {uuid}")
        print(f"Serial Number: {serial}")
    except NVMLError as err:
        print(f"Error getting GPU info: {err}")
```

Temperature is one of the most critical metrics for GPU health monitoring:
```python
def get_gpu_temperature(handle):
    try:
        temp = nvmlDeviceGetTemperature(handle, NVML_TEMPERATURE_GPU)
        print(f"GPU Temperature: {temp}°C")
        return temp
    except NVMLError as err:
        print(f"Error getting temperature: {err}")
        return None
```
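If you want to act on the reading rather than just print it, a small wrapper can flag overheating. The 80°C limit below is an arbitrary example threshold, not a value defined by NVML:

```python
# Hypothetical alert threshold in degrees Celsius (tune it for your hardware)
TEMP_ALERT_C = 80

def check_gpu_temperature(handle, limit=TEMP_ALERT_C):
    # Prints a warning if the current GPU temperature meets or exceeds the limit
    temp = nvmlDeviceGetTemperature(handle, NVML_TEMPERATURE_GPU)
    if temp >= limit:
        print(f"Warning: GPU is running hot ({temp}°C >= {limit}°C)")
    return temp
```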
Memory utilization is crucial for optimizing GPU applications:

```python
def get_memory_info(handle):
    try:
        mem_info = nvmlDeviceGetMemoryInfo(handle)
        total = mem_info.total / 1024 / 1024  # Convert to MB
        used = mem_info.used / 1024 / 1024    # Convert to MB
        free = mem_info.free / 1024 / 1024    # Convert to MB
        print(f"Total Memory: {total:.2f} MB")
        print(f"Used Memory: {used:.2f} MB")
        print(f"Free Memory: {free:.2f} MB")
        print(f"Memory Utilization: {(used/total)*100:.2f}%")
        return mem_info
    except NVMLError as err:
        print(f"Error getting memory info: {err}")
        return None
```
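A common use of this information is checking whether a GPU has enough headroom before launching a job. Here is a rough sketch; the 2048 MB requirement is just an illustrative number:

```python
def has_enough_free_memory(handle, required_mb=2048):
    # Returns True if the GPU has at least `required_mb` megabytes free
    mem_info = nvmlDeviceGetMemoryInfo(handle)
    return mem_info.free / 1024 / 1024 >= required_mb
```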
Monitoring utilization helps identify performance bottlenecks:

```python
def get_utilization_rates(handle):
    try:
        util = nvmlDeviceGetUtilizationRates(handle)
        print(f"GPU Utilization: {util.gpu}%")
        print(f"Memory Utilization: {util.memory}%")
        return util
    except NVMLError as err:
        print(f"Error getting utilization: {err}")
        return None
```
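Because utilization can swing from one instant to the next, it often helps to sample it over a short window and average the readings. A simple sketch (the 5-second window and 0.5-second step are arbitrary choices):

```python
import time

def average_gpu_utilization(handle, window_s=5, step_s=0.5):
    # Samples GPU utilization every `step_s` seconds for `window_s` seconds
    samples = []
    end = time.time() + window_s
    while time.time() < end:
        samples.append(nvmlDeviceGetUtilizationRates(handle).gpu)
        time.sleep(step_s)
    return sum(samples) / len(samples)
```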
Power consumption metrics are valuable for energy efficiency monitoring:

```python
def get_power_usage(handle):
    try:
        power = nvmlDeviceGetPowerUsage(handle) / 1000.0  # Convert to Watts
        print(f"Power Usage: {power:.2f} W")
        return power
    except NVMLError as err:
        print(f"Error getting power usage: {err}")
        return None
```

Let's put everything together into a practical monitoring script:
```python
import sys
import time
from pynvml import *

def monitor_gpus(interval=1, duration=10):
    try:
        # Initialize NVML
        nvmlInit()

        # Get number of GPUs
        gpu_count = nvmlDeviceGetCount()
        print(f"Found {gpu_count} GPU devices")

        # Monitor for the specified duration
        end_time = time.time() + duration
        while time.time() < end_time:
            print("\n" + "=" * 50)
            print(f"Timestamp: {time.strftime('%Y-%m-%d %H:%M:%S')}")

            # Iterate through all GPUs
            for i in range(gpu_count):
                handle = nvmlDeviceGetHandleByIndex(i)
                name = nvmlDeviceGetName(handle)
                print(f"\nGPU {i}: {name}")
                print("-" * 30)

                # Get temperature
                temp = nvmlDeviceGetTemperature(handle, NVML_TEMPERATURE_GPU)
                print(f"Temperature: {temp}°C")

                # Get memory info
                mem_info = nvmlDeviceGetMemoryInfo(handle)
                print(f"Memory: {mem_info.used/1024/1024:.2f} MB / {mem_info.total/1024/1024:.2f} MB "
                      f"({mem_info.used*100/mem_info.total:.2f}%)")

                # Get utilization
                util = nvmlDeviceGetUtilizationRates(handle)
                print(f"Utilization: GPU {util.gpu}%, Memory {util.memory}%")

                # Get power (if available)
                try:
                    power = nvmlDeviceGetPowerUsage(handle) / 1000.0
                    print(f"Power: {power:.2f} W")
                except NVMLError:
                    pass

            # Wait for the next interval
            time.sleep(interval)

        # Shutdown NVML
        nvmlShutdown()

    except NVMLError as err:
        print(f"NVML Error: {err}")
        sys.exit(1)

# Run monitoring for 30 seconds, refreshing every 2 seconds
if __name__ == "__main__":
    monitor_gpus(interval=2, duration=30)
```

This script provides a comprehensive view of your GPU status, updated at regular intervals. Perfect for monitoring during model training or benchmarking!
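If you would rather keep a record than watch the console, the same loop can write each sample to a CSV file instead. This is just a sketch of one variation; the gpu_log.csv filename and column layout are examples, not anything prescribed by NVML:

```python
import csv
import time
from pynvml import (nvmlInit, nvmlShutdown, nvmlDeviceGetCount,
                    nvmlDeviceGetHandleByIndex, nvmlDeviceGetTemperature,
                    nvmlDeviceGetMemoryInfo, nvmlDeviceGetUtilizationRates,
                    NVML_TEMPERATURE_GPU)

def log_gpus_to_csv(path="gpu_log.csv", interval=2, duration=30):
    nvmlInit()
    try:
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["timestamp", "gpu", "temp_c", "mem_used_mb", "util_pct"])
            end_time = time.time() + duration
            while time.time() < end_time:
                ts = time.strftime("%Y-%m-%d %H:%M:%S")
                for i in range(nvmlDeviceGetCount()):
                    handle = nvmlDeviceGetHandleByIndex(i)
                    temp = nvmlDeviceGetTemperature(handle, NVML_TEMPERATURE_GPU)
                    mem = nvmlDeviceGetMemoryInfo(handle)
                    util = nvmlDeviceGetUtilizationRates(handle)
                    writer.writerow([ts, i, temp, round(mem.used / 1024 / 1024, 2), util.gpu])
                time.sleep(interval)
    finally:
        # Always release NVML, even if logging fails partway through
        nvmlShutdown()
```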
One powerful feature of NVML is the ability to see which processes are using your GPU:
```python
def get_gpu_processes(handle):
    try:
        # Get compute processes
        processes = nvmlDeviceGetComputeRunningProcesses(handle)
        print(f"Found {len(processes)} processes running on GPU")

        for proc in processes:
            pid = proc.pid
            # usedGpuMemory can be None on some platforms or driver setups
            if proc.usedGpuMemory is not None:
                used_mem = proc.usedGpuMemory / 1024 / 1024  # Convert to MB
                print(f"Process ID: {pid}, Memory Usage: {used_mem:.2f} MB")
            else:
                print(f"Process ID: {pid}, Memory Usage: unavailable")

            # On Linux, you can get the process name via psutil
            try:
                import psutil
                process = psutil.Process(pid)
                print(f"Process Name: {process.name()}")
                print(f"Command Line: {' '.join(process.cmdline())}")
            except Exception:
                pass

        return processes
    except NVMLError as err:
        print(f"Error getting process info: {err}")
        return None
```

This function requires the psutil library to get process names on Linux systems. Install it with pip install psutil.
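You can run this helper against every GPU in the system with a few extra lines (assuming the wildcard import from earlier):

```python
nvmlInit()
try:
    for i in range(nvmlDeviceGetCount()):
        print(f"\nProcesses on GPU {i}:")
        get_gpu_processes(nvmlDeviceGetHandleByIndex(i))
finally:
    nvmlShutdown()
```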
The NVML Python API opens up tremendous possibilities for GPU monitoring and management: live status displays during model training and benchmarking, custom monitoring solutions like the Ganglia GPU module mentioned earlier, and logging or alerting on temperature, memory, utilization, and power.
When working with the NVML Python API, you might encounter these common issues:
If you see errors like “NVML not initialized,” make sure you’re calling nvmlInit() before any other NVML functions.
If NVML reports no GPUs, check that:

- the NVIDIA driver is installed and loaded correctly
- your GPU is visible to the driver (try running nvidia-smi in your terminal)

If you notice memory leaks, ensure you're properly calling nvmlShutdown() when your application ends.
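One way to make the shutdown impossible to forget is to wrap initialization in a context manager. This helper is not part of pynvml; it is just a convenience sketch:

```python
from contextlib import contextmanager
from pynvml import nvmlInit, nvmlShutdown

@contextmanager
def nvml_session():
    # Guarantees nvmlShutdown() runs even if the body raises an exception
    nvmlInit()
    try:
        yield
    finally:
        nvmlShutdown()

# Usage:
# with nvml_session():
#     ...query your GPUs here...
```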
The NVML Python API provides unprecedented access to monitor and manage your NVIDIA GPUs directly from Python code. Whether you’re developing machine learning applications, running compute-intensive simulations, or building custom monitoring solutions, these tools give you fine-grained control over your GPU resources.
I hope this guide helps you get started with NVML Python! Remember, the key to mastering GPU utilization is proper monitoring and management. With these tools, you’ll be able to optimize your applications and maintain peak GPU performance.
Happy coding! 🚀