How we fixed CUDA Error 101: invalid device ordinal ... torch._C._cuda_getDeviceCount() > 0 🤯

This article details how a team fixed a server issue where one out of eight GPUs went offline due to a loose power connector. Attempts to bypass the problem via configuration adjustments failed. Success came from directly unbinding the troublesome GPU from the NVIDIA driver, a quick fix that got the server running again without needing a reboot. The story emphasizes simple, effective solutions in tech troubleshooting.
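The unbind step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the team's actual commands: it assumes the standard Linux sysfs layout (`/sys/bus/pci/drivers/<driver>/unbind`), assumes the driver directory is named `nvidia`, and uses a made-up PCI address. The helper names are hypothetical. By default it only returns the equivalent shell command, so nothing is changed unless `dry_run=False` is passed and the script runs as root.

```python
import re
from pathlib import Path

# Assumption: standard Linux sysfs location for PCI driver bindings.
SYSFS_PCI_DRIVERS = Path("/sys/bus/pci/drivers")

# Full PCI address form: domain:bus:device.function, e.g. 0000:3b:00.0
PCI_ADDRESS_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def unbind_path(driver: str = "nvidia") -> Path:
    """Return the sysfs file that detaches a device from the given driver."""
    return SYSFS_PCI_DRIVERS / driver / "unbind"


def unbind_gpu(pci_address: str, driver: str = "nvidia", dry_run: bool = True) -> str:
    """Detach one PCI device from its driver (hypothetical helper).

    With dry_run=True (the default) nothing is written; the equivalent
    shell command is returned so it can be reviewed first.
    """
    if not PCI_ADDRESS_RE.match(pci_address):
        raise ValueError(f"not a full PCI address: {pci_address!r}")

    target = unbind_path(driver)
    command = f"echo {pci_address} > {target}"
    if not dry_run:
        # Requires root. Writing the address asks the kernel to unbind the device.
        target.write_text(pci_address)
    return command


if __name__ == "__main__":
    # Placeholder address; the real one would come from a tool such as lspci.
    print(unbind_gpu("0000:3b:00.0"))
```

Once the faulty device is detached from the driver, the remaining GPUs can be re-enumerated without rebooting, which matches the outcome the summary describes.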

Saravana Rathinam

March 23, 2024

Puppet™ and Puppetry™ are trademarks of ELBO AI Inc.

© 2024 ELBO AI Inc. All rights reserved.