
Have you ever wanted to monitor your NVIDIA GPU directly from your Python code? I’ve been there, and I’m going to show you exactly how to do it. I first discovered NVML Python when I participated in Google Summer of Code several years back, working on Ganglia’s GPU monitoring module. Trust me, once you learn these techniques, you’ll never go back to the old ways of checking your GPU stats.
What is NVML and Why Should You Care?
NVML (NVIDIA Management Library) is a powerful C-based API that gives you direct access to monitor and manage NVIDIA GPU devices. Think of it as the engine behind the popular nvidia-smi command-line tool – but now you can access all that data programmatically in Python!
The incredible thing about NVML Python is that it offers complete control over your GPU monitoring without any complex C programming. You get instant access to critical metrics like:
- GPU utilization
- Temperature readings
- Memory usage statistics
- Process information
- Power consumption
This is absolutely essential knowledge for data scientists, machine learning engineers, and anyone working with GPU-accelerated applications.
Setting Up Your Environment
Before diving into the code, you need to install the Python bindings for NVML. The setup process is straightforward:
Step 1: Install the Package
The most up-to-date package is available on PyPI. Simply run:
```
pip install nvidia-ml-py
```
This package provides Python bindings to the NVIDIA Management Library. Make sure you have NVIDIA drivers properly installed on your system before proceeding.
Tip 💡: It’s a good practice to set up a Python virtual environment for your project first and install dependencies there.
Step 2: Import the Library
After installation, you can import the library into your Python script:
```
from pynvml import *
```
The library exposes all the functionality you need to interact with your GPUs. Let's start coding!
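Before going further, it can help to confirm that the bindings can actually talk to your driver. Here is a minimal sanity-check sketch (not from the original tutorial, just an illustration) that prints the driver and NVML versions plus the number of visible GPUs:
```
from pynvml import *

nvmlInit()
try:
    # Depending on your bindings version, these may return bytes instead of str
    print("Driver version:", nvmlSystemGetDriverVersion())
    print("NVML version:", nvmlSystemGetNVMLVersion())
    print("GPUs visible:", nvmlDeviceGetCount())
finally:
    nvmlShutdown()
```
If this prints sensible values, your environment is ready for the rest of the examples.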
Essential NVML Python Operations
Initializing the Connection
The first step in any NVML Python application is establishing a connection to the GPU. This must be handled with proper exception management:
```
import sys
from pynvml import *

try:
    nvmlInit()
    print("NVML initialized successfully")
except NVMLError as err:
    print(f"Failed to initialize NVML: {err}")
    sys.exit(1)
```
Always wrap your initialization in a try-except block. If no compatible GPUs are found or there's a driver issue, this will catch the error and prevent your application from crashing.
Terminating the Connection
Properly closing the connection is just as important as initializing it:
```
def shutdown_nvml():
    try:
        nvmlShutdown()
        print("NVML shutdown successful")
        return 0
    except NVMLError as err:
        print(f"Error shutting down NVML: {err}")
        return 1
```
A clean shutdown ensures system resources are properly released. Always include this in your application's cleanup routine.
Discovering Available GPUs
NVML makes it easy to detect how many GPUs are available in your system:
```
def get_gpu_count():
    try:
        gpu_count = nvmlDeviceGetCount()
        print(f"Found {gpu_count} GPU devices")
        return gpu_count
    except NVMLError as err:
        print(f"Error getting GPU count: {err}")
        return 0
```
This function returns the number of NVIDIA GPUs that NVML can access and control.
Getting a GPU Handle
To interact with a specific GPU, you need to obtain a reference to it using its index:
```
def get_gpu_by_index(gpu_id):
    try:
        handle = nvmlDeviceGetHandleByIndex(gpu_id)
        return handle
    except NVMLError as err:
        print(f"Error accessing GPU {gpu_id}: {err}")
        return None
```
GPU indices are zero-based, so your first GPU has index 0, the second has index 1, and so on.
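As a quick usage sketch (not part of the original snippets), the two helpers above combine naturally to collect a handle for every detected GPU, assuming nvmlInit() has already been called:
```
# Usage sketch: gather a handle for every GPU the helpers above can see.
# Assumes nvmlInit() has already been called.
handles = []
for gpu_id in range(get_gpu_count()):
    handle = get_gpu_by_index(gpu_id)
    if handle is not None:
        handles.append(handle)

print(f"Collected {len(handles)} GPU handles")
```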
Retrieving GPU Information
Once you have a GPU handle, you can extract various metrics and information:
Basic GPU Information
```
def get_gpu_info(handle):
    try:
        name = nvmlDeviceGetName(handle)
        uuid = nvmlDeviceGetUUID(handle)
        serial = nvmlDeviceGetSerial(handle)  # not supported on every GPU model
        print(f"GPU Name: {name}")
        print(f"GPU UUID: {uuid}")
        print(f"Serial Number: {serial}")
    except NVMLError as err:
        print(f"Error getting GPU info: {err}")
```
Temperature Monitoring
Temperature is one of the most critical metrics for GPU health monitoring:
```
def get_gpu_temperature(handle):
    try:
        temp = nvmlDeviceGetTemperature(handle, NVML_TEMPERATURE_GPU)
        print(f"GPU Temperature: {temp}°C")
        return temp
    except NVMLError as err:
        print(f"Error getting temperature: {err}")
        return None
```
Memory Usage Statistics
Memory utilization is crucial for optimizing GPU applications:
```
def get_memory_info(handle):
    try:
        mem_info = nvmlDeviceGetMemoryInfo(handle)
        total = mem_info.total / 1024 / 1024  # Convert to MB
        used = mem_info.used / 1024 / 1024    # Convert to MB
        free = mem_info.free / 1024 / 1024    # Convert to MB
        print(f"Total Memory: {total:.2f} MB")
        print(f"Used Memory: {used:.2f} MB")
        print(f"Free Memory: {free:.2f} MB")
        print(f"Memory Utilization: {(used/total)*100:.2f}%")
        return mem_info
    except NVMLError as err:
        print(f"Error getting memory info: {err}")
        return None
```
GPU Utilization Rates
Monitoring utilization helps identify performance bottlenecks:
```
def get_utilization_rates(handle):
    try:
        util = nvmlDeviceGetUtilizationRates(handle)
        print(f"GPU Utilization: {util.gpu}%")
        print(f"Memory Utilization: {util.memory}%")
        return util
    except NVMLError as err:
        print(f"Error getting utilization: {err}")
        return None
```
Power Usage
Power consumption metrics are valuable for energy efficiency monitoring:
```
def get_power_usage(handle):
    try:
        power = nvmlDeviceGetPowerUsage(handle) / 1000.0  # Convert to Watts
        print(f"Power Usage: {power:.2f} W")
        return power
    except NVMLError as err:
        print(f"Error getting power usage: {err}")
        return None
```
Practical Example: Complete GPU Monitoring Script
Let’s put everything together into a practical monitoring script:
```
import sys
import time
from pynvml import *

def monitor_gpus(interval=1, duration=10):
    try:
        # Initialize NVML
        nvmlInit()

        # Get number of GPUs
        gpu_count = nvmlDeviceGetCount()
        print(f"Found {gpu_count} GPU devices")

        # Monitor for the specified duration
        end_time = time.time() + duration
        while time.time() < end_time:
            print("\n" + "=" * 50)
            print(f"Timestamp: {time.strftime('%Y-%m-%d %H:%M:%S')}")

            # Iterate through all GPUs
            for i in range(gpu_count):
                handle = nvmlDeviceGetHandleByIndex(i)
                name = nvmlDeviceGetName(handle)
                print(f"\nGPU {i}: {name}")
                print("-" * 30)

                # Get temperature
                temp = nvmlDeviceGetTemperature(handle, NVML_TEMPERATURE_GPU)
                print(f"Temperature: {temp}°C")

                # Get memory info
                mem_info = nvmlDeviceGetMemoryInfo(handle)
                print(f"Memory: {mem_info.used/1024/1024:.2f} MB / {mem_info.total/1024/1024:.2f} MB "
                      f"({mem_info.used*100/mem_info.total:.2f}%)")

                # Get utilization
                util = nvmlDeviceGetUtilizationRates(handle)
                print(f"Utilization: GPU {util.gpu}%, Memory {util.memory}%")

                # Get power (if available)
                try:
                    power = nvmlDeviceGetPowerUsage(handle) / 1000.0
                    print(f"Power: {power:.2f} W")
                except NVMLError:
                    pass

            # Wait for the next interval
            time.sleep(interval)

        # Shutdown NVML
        nvmlShutdown()

    except NVMLError as err:
        print(f"NVML Error: {err}")
        sys.exit(1)

# Run monitoring for 30 seconds, refreshing every 2 seconds
if __name__ == "__main__":
    monitor_gpus(interval=2, duration=30)
```
This script provides a comprehensive view of your GPU status, updated at regular intervals. Perfect for monitoring during model training or benchmarking!
Advanced Usage: Process Monitoring
One powerful feature of NVML is the ability to see which processes are using your GPU:
```
def get_gpu_processes(handle):
    try:
        # Get compute processes
        processes = nvmlDeviceGetComputeRunningProcesses(handle)
        print(f"Found {len(processes)} processes running on GPU")

        for proc in processes:
            pid = proc.pid
            # usedGpuMemory can be None on some platforms/driver configurations
            used_mem = proc.usedGpuMemory / 1024 / 1024 if proc.usedGpuMemory else 0
            print(f"Process ID: {pid}, Memory Usage: {used_mem:.2f} MB")

            # On Linux, you can get the process name via psutil
            try:
                import psutil
                process = psutil.Process(pid)
                print(f"Process Name: {process.name()}")
                print(f"Command Line: {' '.join(process.cmdline())}")
            except Exception:
                # psutil not installed, or the process has already exited
                pass

        return processes
    except NVMLError as err:
        print(f"Error getting process info: {err}")
        return None
```
This function requires the psutil library to get process names on Linux systems. Install it with pip install psutil.
Real-world Applications
The NVML Python API opens up tremendous possibilities for GPU monitoring and management:
- Custom Monitoring Dashboards: Create your own GPU monitoring solution with visualizations and alerts
- Resource Optimization: Track GPU usage patterns to optimize workloads
- Auto-scaling Applications: Dynamically adjust batch sizes based on available GPU memory
- Cluster Management: Distribute workloads based on GPU availability and utilization
- System Health Monitoring: Set up automated alerts for temperature or memory thresholds (see the sketch just after this list)
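To make that last idea concrete, here is a minimal alerting sketch built on the same NVML calls covered above. The check_gpu_health name and the threshold values are illustrative choices for this post, not part of any library:
```
from pynvml import *

# Illustrative thresholds - tune these for your own hardware
TEMP_LIMIT_C = 85
MEM_LIMIT_PCT = 90

def check_gpu_health():
    """Return a list of warning strings for GPUs above the thresholds."""
    warnings = []
    nvmlInit()
    try:
        for i in range(nvmlDeviceGetCount()):
            handle = nvmlDeviceGetHandleByIndex(i)
            temp = nvmlDeviceGetTemperature(handle, NVML_TEMPERATURE_GPU)
            mem = nvmlDeviceGetMemoryInfo(handle)
            mem_pct = mem.used * 100 / mem.total
            if temp > TEMP_LIMIT_C:
                warnings.append(f"GPU {i}: temperature {temp}°C exceeds {TEMP_LIMIT_C}°C")
            if mem_pct > MEM_LIMIT_PCT:
                warnings.append(f"GPU {i}: memory usage {mem_pct:.1f}% exceeds {MEM_LIMIT_PCT}%")
    finally:
        nvmlShutdown()
    return warnings

if __name__ == "__main__":
    for warning in check_gpu_health():
        print(warning)
```
From here you could email the warnings, push them to a chat webhook, or feed them into whatever alerting system you already use.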
Troubleshooting Common Issues
When working with the NVML Python API, you might encounter these common issues:
NVML Not Initialized
If you see errors like “NVML not initialized,” make sure you’re calling nvmlInit() before any other NVML functions.
No GPUs Found
If NVML reports no GPUs, check:
- NVIDIA drivers are properly installed
- The GPU is recognized by the system (try running nvidia-smi in your terminal)
- Your user has permissions to access the GPU devices
Memory Leak Issues
If you notice memory leaks, ensure you’re properly calling nvmlShutdown() when your application ends.
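One way to guarantee that nvmlShutdown() always runs is to wrap the NVML session in a small context manager. The nvml_session helper below is just an illustrative pattern, not something pynvml ships:
```
from contextlib import contextmanager
from pynvml import *

@contextmanager
def nvml_session():
    """Initialize NVML on entry and always shut it down on exit."""
    nvmlInit()
    try:
        yield
    finally:
        nvmlShutdown()

# Usage: NVML is shut down even if the body raises an exception
with nvml_session():
    for i in range(nvmlDeviceGetCount()):
        handle = nvmlDeviceGetHandleByIndex(i)
        print(nvmlDeviceGetName(handle))
```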
Conclusion
The NVML Python API provides unprecedented access to monitor and manage your NVIDIA GPUs directly from Python code. Whether you’re developing machine learning applications, running compute-intensive simulations, or building custom monitoring solutions, these tools give you fine-grained control over your GPU resources.
I hope this guide helps you get started with NVML Python! Remember, the key to mastering GPU utilization is proper monitoring and management. With these tools, you’ll be able to optimize your applications and maintain peak GPU performance.
Happy coding! 🚀