Why can't Linux with Nvidia drivers just work!
What happened?
I shut down my dev work laptop last night, and this morning, when I booted it up, this happened!

Luckily, I didn't have any urgent work commitments.
So, what's going on? I'm not running very old hardware, and I don't think there's anything special about it: it's a Dell Inc. XPS 15 9510 with Ubuntu 24.04.4 LTS. However, it has an NVIDIA GeForce RTX™ 3050 Ti Laptop GPU, so I suspect the Nvidia drivers or a Linux kernel upgrade are somehow involved. This is such a pain!
Using my other laptop, I did some searching and found a workaround. Luckily, I can boot into an older kernel, which is this version:
nolan-veed@nolan-veed:/boot$ uname -r
6.14.0-37-generic
So I don't need to go into rescue mode, and I don't have to go hunting for that USB flash drive.
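If you're not sure which older kernels GRUB can offer, the menu titles live in the generated GRUB config. Here is a minimal sketch that lists them; grub_entries is a hypothetical helper, and it assumes Ubuntu's default /boot/grub/grub.cfg with single-quoted entry titles (reading that file may need sudo on some setups).

```shell
# List the menu entries (and submenus) that update-grub generated.
# Assumption: Ubuntu's default config path and single-quoted entry titles.
grub_entries() {
  local cfg="${1:-/boot/grub/grub.cfg}"
  # Split each line on single quotes, so $2 is the quoted title.
  awk -F"'" '
    /^[ \t]*submenu /   { print "[submenu] " $2 }
    /^[ \t]*menuentry / { print "  " $2 }
  ' "$cfg"
}

# Usage (hypothetical helper):
#   grub_entries | grep 'with Linux'
```

Older kernels normally appear under the "Advanced options for Ubuntu" submenu, which is where you'd pick one at boot.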
Diagnosing the problem
Now, my new kernel, 6.17.0-14-generic, doesn't work. In the /boot directory, the initrd image shows up as broken:

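One way to check this from a shell is to compare each kernel image against its initramfs. A minimal sketch, assuming Ubuntu's vmlinuz-<version> / initrd.img-<version> naming; check_initrds is a hypothetical helper. It reports, per kernel, whether a non-empty initramfs exists, and flags the initrd.img symlink if the file it points to is missing.

```shell
# For each kernel image in a boot directory, report whether a matching
# non-empty initramfs exists, and flag a dangling initrd.img symlink.
# Assumption: Ubuntu's naming (vmlinuz-<ver>, initrd.img-<ver>).
check_initrds() {
  local boot="${1:-/boot}" k ver
  for k in "$boot"/vmlinuz-*; do
    [ -e "$k" ] || continue            # the glob matched nothing
    ver="${k##*/vmlinuz-}"
    if [ -s "$boot/initrd.img-$ver" ]; then
      echo "OK      $ver"
    else
      echo "MISSING $ver"
    fi
  done
  # -L: it is a symlink; ! -e: the file it points to does not exist.
  if [ -L "$boot/initrd.img" ] && [ ! -e "$boot/initrd.img" ]; then
    echo "DANGLING $boot/initrd.img -> $(readlink "$boot/initrd.img")"
  fi
}

# Usage (hypothetical helper): check_initrds /boot
```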
So, something got upgraded and broke it. If I try to reinstall the kernel, I see the following:
nolan-veed@nolan-veed:/boot$ sudo apt install --reinstall linux-image-generic-hwe-24.04
...
Setting up nvidia-dkms-575 (575.57.08-0ubuntu1) ...
update-initramfs: deferring update (trigger activated)
update-initramfs: Generating /boot/initrd.img-6.14.0-37-generic
INFO:Enable nvidia
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/dell_latitude
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/lenovo_thinkpad
Removing old nvidia/575.57.08 DKMS files...
Module nvidia/575.57.08 for kernel 6.14.0-37-generic (x86_64):
Before uninstall, this module version was ACTIVE on this kernel.
Deleting /lib/modules/6.14.0-37-generic/updates/dkms/nvidia.ko.zst
Deleting /lib/modules/6.14.0-37-generic/updates/dkms/nvidia-modeset.ko.zst
Deleting /lib/modules/6.14.0-37-generic/updates/dkms/nvidia-drm.ko.zst
Deleting /lib/modules/6.14.0-37-generic/updates/dkms/nvidia-uvm.ko.zst
Deleting /lib/modules/6.14.0-37-generic/updates/dkms/nvidia-peermem.ko.zst
Running depmod... done.
Deleting module nvidia/575.57.08 completely from the DKMS tree.
Loading new nvidia/575.57.08 DKMS files...
Building for 6.14.0-37-generic and 6.17.0-14-generic
It then proceeds to build the kernel modules for the old and new kernels...
Building initial module nvidia/575.57.08 for 6.14.0-37-generic
Sign command: /usr/bin/kmodsign
Signing key: /var/lib/shim-signed/mok/MOK.priv
Public certificate (MOK): /var/lib/shim-signed/mok/MOK.der
Building module(s)............. done.
Signing module /var/lib/dkms/nvidia/575.57.08/build/nvidia.ko
Signing module /var/lib/dkms/nvidia/575.57.08/build/nvidia-modeset.ko
Signing module /var/lib/dkms/nvidia/575.57.08/build/nvidia-drm.ko
Signing module /var/lib/dkms/nvidia/575.57.08/build/nvidia-uvm.ko
Signing module /var/lib/dkms/nvidia/575.57.08/build/nvidia-peermem.ko
Installing /lib/modules/6.14.0-37-generic/updates/dkms/nvidia.ko.zst
Installing /lib/modules/6.14.0-37-generic/updates/dkms/nvidia-modeset.ko.zst
Installing /lib/modules/6.14.0-37-generic/updates/dkms/nvidia-drm.ko.zst
Installing /lib/modules/6.14.0-37-generic/updates/dkms/nvidia-uvm.ko.zst
Installing /lib/modules/6.14.0-37-generic/updates/dkms/nvidia-peermem.ko.zst
Running depmod... done.
Building initial module nvidia/575.57.08 for 6.17.0-14-generic
Sign command: /usr/bin/kmodsign
Signing key: /var/lib/shim-signed/mok/MOK.priv
Public certificate (MOK): /var/lib/shim-signed/mok/MOK.der
Building module(s)............(bad exit status: 2)
Failed command:
'make' -j16 KERNEL_UNAME=6.17.0-14-generic IGNORE_CC_MISMATCH=1 SYSSRC=/lib/modules/6.17.0-14-generic/build LD=/usr/bin/ld.bfd CONFIG_X86_KERNEL_IBT= modules
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/nvidia-kernel-source-575.0.crash'
Error! Bad return status for module build on kernel: 6.17.0-14-generic (x86_64)
Consult /var/lib/dkms/nvidia/575.57.08/build/make.log for more information.
dpkg: error processing package nvidia-dkms-575 (--configure):
installed nvidia-dkms-575 package post-installation script subprocess returned error exit status 10
dpkg: dependency problems prevent configuration of nvidia-driver-575:
nvidia-driver-575 depends on nvidia-dkms-575 (= 575.57.08-0ubuntu1); however:
Package nvidia-dkms-575 is not configured yet.
dpkg: error processing package nvidia-driver-575 (--configure):
dependency problems - leaving unconfigured
No apport report written because the error message indicates its a followup error from a previous failure.
Setting up linux-image-6.17.0-14-generic (6.17.0-14.14~24.04.1) ...
Setting up linux-image-generic-hwe-24.04 (6.17.0-14.14~24.04.1) ...
Processing triggers for initramfs-tools (0.142ubuntu25.8) ...
update-initramfs: Generating /boot/initrd.img-6.14.0-37-generic
Processing triggers for linux-image-6.17.0-14-generic (6.17.0-14.14~24.04.1) ...
/etc/kernel/postinst.d/dkms:
Sign command: /usr/bin/kmodsign
Signing key: /var/lib/shim-signed/mok/MOK.priv
Public certificate (MOK): /var/lib/shim-signed/mok/MOK.der
Autoinstall of module nvidia/575.57.08 for kernel 6.17.0-14-generic (x86_64)
Building module(s).............(bad exit status: 2)
Failed command:
'make' -j16 KERNEL_UNAME=6.17.0-14-generic IGNORE_CC_MISMATCH=1 SYSSRC=/lib/modules/6.17.0-14-generic/build LD=/usr/bin/ld.bfd CONFIG_X86_KERNEL_IBT= modules
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/nvidia-kernel-source-575.0.crash'
Error! Bad return status for module build on kernel: 6.17.0-14-generic (x86_64)
Consult /var/lib/dkms/nvidia/575.57.08/build/make.log for more information.
Autoinstall on 6.17.0-14-generic failed for module(s) nvidia(10).
Error! One or more modules failed to install during autoinstall.
Refer to previous errors for more information.
run-parts: /etc/kernel/postinst.d/dkms exited with return code 1
dpkg: error processing package linux-image-6.17.0-14-generic (--configure):
installed linux-image-6.17.0-14-generic package post-installation script subprocess returned error exit status 1
Errors were encountered while processing:
nvidia-dkms-575
nvidia-driver-575
linux-image-6.17.0-14-generic
E: Sub-process /usr/bin/dpkg returned an error code (1)
The build succeeds for the old kernel but fails against the newer one, and the build log shows where the Nvidia module breaks:
In file included from nvidia-uvm/uvm_common.h:43,
from nvidia-uvm/uvm_pmm_gpu.c:163:
/usr/src/linux-headers-6.17.0-14-generic/include/linux/pci-p2pdma.h: In function ‘pci_p2pdma_state’:
nvidia-uvm/uvm_linux.h:390:32: error: ‘struct page’ has no member named ‘pgmap’
390 | #define page_pgmap(page) (page)->pgmap
| ^~
/usr/src/linux-headers-6.17.0-14-generic/include/linux/pci-p2pdma.h:170:37: note: in expansion of macro ‘page_pgmap’
170 | if (state->pgmap != page_pgmap(page))
| ^~~~~~~~~~
make[4]: *** [/usr/src/linux-headers-6.17.0-14-generic/scripts/Makefile.build:287: nvidia-uvm/uvm_pmm_gpu.o] Error 1
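DKMS build logs are long, so it helps to pull out just the errors. A minimal sketch: dkms_errors is a hypothetical helper whose default path is the 575.57.08 make.log named in the output above, and it simply greps for compiler "error:" lines and failing make targets.

```shell
# Print compiler errors and failing make targets from a DKMS build log,
# with line numbers. Default path: the 575.57.08 log named in the output above.
dkms_errors() {
  local log="${1:-/var/lib/dkms/nvidia/575.57.08/build/make.log}"
  grep -n -E 'error:|\*\*\* \[' "$log"
}

# Usage (hypothetical helper):
#   dkms status          # which module versions are built/installed per kernel
#   dkms_errors | head
```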
It's possible that the kernel was upgraded automatically, and the 575 driver simply doesn't compile against the new 6.17 kernel headers.
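To confirm whether (and when) the kernel was upgraded, APT's history log is a good place to look. A minimal sketch, assuming the usual Start-Date / Commandline / Install / Upgrade entry layout of /var/log/apt/history.log; kernel_history is a hypothetical helper, and entries without a Commandline line are shown as "(not recorded)".

```shell
# Show when kernel packages were installed or upgraded according to APT's
# history log, with the command that triggered it when one was recorded.
kernel_history() {
  [ "$#" -gt 0 ] || set -- /var/log/apt/history.log
  awk '
    /^Start-Date:/  { when = substr($0, 13); cmd = "(not recorded)" }
    /^Commandline:/ { cmd = substr($0, 14) }
    /^(Install|Upgrade):/ && /linux-image/ {
      action = $1; sub(/:$/, "", action)
      print when " | " action " | " cmd
    }
  ' "$@"
}

# Usage (hypothetical helper); rotated logs are gzipped, "-" reads stdin:
#   kernel_history
#   zcat /var/log/apt/history.log.*.gz | kernel_history -
```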
What if I try to move to a newer driver? According to https://endoflife.date/nvidia, the new LTSB is 580, so that seems like a sensible choice:

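Before picking a branch, you can check which numbered nvidia-driver metapackages your configured APT sources actually offer. A minimal sketch; nvidia_branches is a hypothetical helper, and it assumes apt-cache search prints lines of the form "nvidia-driver-580 - description".

```shell
# From `apt-cache search` output on stdin, print the numbered nvidia-driver
# branches, highest first. Variants such as -open or -server are skipped.
nvidia_branches() {
  sed -n 's/^nvidia-driver-\([0-9][0-9]*\) .*/\1/p' | sort -rnu
}

# Usage (hypothetical helper):
#   apt-cache search --names-only '^nvidia-driver-' | nvidia_branches
#   apt-cache policy nvidia-driver-580    # candidate version and its source
```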
So, I try to install that:
nolan-veed@nolan-veed:~$ sudo apt install nvidia-driver-580
...
Removing nvidia-driver-575 (575.57.08-0ubuntu1) ...
Removing nvidia-dkms-575 (575.57.08-0ubuntu1) ...
Removing all DKMS Modules
Done.
INFO:Disable nvidia
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/dell_latitude
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/lenovo_thinkpad
update-initramfs: deferring update (trigger activated)
Removing libnvidia-gl-575:amd64 (575.57.08-0ubuntu1) ...
Removing nvidia-utils-575 (575.57.08-0ubuntu1) ...
Removing xserver-xorg-video-nvidia-575 (575.57.08-0ubuntu1) ...
dpkg: libnvidia-compute-575:amd64: dependency problems, but removing anyway as you requested:
nvidia-compute-utils-575 depends on libnvidia-compute-575.
Removing libnvidia-compute-575:amd64 (575.57.08-0ubuntu1) ...
dpkg: libnvidia-cfg1-575:amd64: dependency problems, but removing anyway as you requested:
nvidia-persistenced depends on libnvidia-cfg1; however:
Package libnvidia-cfg1 is not installed.
Package libnvidia-cfg1-575:amd64 which provides libnvidia-cfg1 is to be removed.
...
Selecting previously unselected package libnvidia-gl-580:amd64.
Preparing to unpack .../06-libnvidia-gl-580_580.126.16-1ubuntu1_amd64.deb ...
Unpacking libnvidia-gl-580:amd64 (580.126.16-1ubuntu1) ...
dpkg: error processing archive /tmp/apt-dpkg-install-ezE9Pi/06-libnvidia-gl-580_580.126.16-1ubuntu1_amd64.deb (--unpack):
trying to overwrite '/usr/lib/x86_64-linux-gnu/gbm/nvidia-drm_gbm.so', which is also in package libnvidia-extra-575:amd64 575.57.08-0ubuntu1
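Sigh! From then on, I kept getting dpkg errors, which meant I was unable to move forward: the new libnvidia-gl-580 package wants to overwrite a file that the old libnvidia-extra-575 package still owns.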
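When several of these conflicts pile up, it helps to list each contested file next to the package that still owns it. A minimal sketch; dpkg_conflicts is a hypothetical helper that only parses the "trying to overwrite ..., which is also in package ..." wording shown above, and reading the failure back from /var/log/apt/term.log is an assumption about where APT logged it.

```shell
# Pull "file <- owning package" pairs out of dpkg "trying to overwrite" errors.
dpkg_conflicts() {
  sed -n "s/.*trying to overwrite '\([^']*\)', which is also in package \([^ ]*\).*/\1 <- \2/p"
}

# Usage (hypothetical helper; assumes APT's term.log captured the failure):
#   sudo cat /var/log/apt/term.log | dpkg_conflicts | sort -u
#   dpkg -S /usr/lib/x86_64-linux-gnu/gbm/nvidia-drm_gbm.so   # confirm the owner
```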
So, it was time to remove the drivers as per Nvidia's latest instructions:
nolan-veed@nolan-veed:~$ sudo apt remove --autoremove --purge -V \
cuda-compat\* \
cuda-drivers\* \
libnvidia-cfg1\* \
libnvidia-compute\* \
libnvidia-decode\* \
libnvidia-encode\* \
libnvidia-extra\* \
libnvidia-fbc1\* \
libnvidia-gl\* \
libnvidia-gpucomp\* \
libnvidia-nscq\* \
libnvsdm\* \
libxnvctrl\* \
nvidia-dkms\* \
nvidia-driver\* \
nvidia-fabricmanager\* \
nvidia-firmware\* \
nvidia-headless\* \
nvidia-imex\* \
nvidia-kernel\* \
nvidia-modprobe\* \
nvidia-open\* \
nvidia-persistenced\* \
nvidia-settings\* \
nvidia-xconfig\* \
xserver-xorg-video-nvidia\*
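Before rebooting, it's worth checking that nothing half-installed is left behind. A minimal sketch; nvidia_leftovers is a hypothetical helper that filters dpkg -l output for NVIDIA-related names in the installed (ii), unpacked or half-configured (iU, iF, iH), or config-files-only (rc) states. Its name filter is broad, so review what it prints rather than treating every hit as a problem.

```shell
# Read `dpkg -l` output on stdin and print NVIDIA-related packages that are
# installed, partially installed, or removed with config files still present.
nvidia_leftovers() {
  awk '$1 ~ /^(ii|iU|iF|iH|rc)$/ && $2 ~ /nvidia|cuda-drivers|libxnvctrl|libnvsdm/ { print $1, $2, $3 }'
}

# Usage (hypothetical helper): dpkg -l | nvidia_leftovers
```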
Once they were removed, I rebooted into the latest kernel, and the display was running on the Intel UHD graphics.
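To see which kernel driver each GPU is actually bound to, lspci -k is handy. A minimal sketch; gpu_drivers is a hypothetical helper that assumes lspci's layout of an unindented device line followed by indented "Kernel driver in use:" details, and prints "(none)" for a display device with no driver bound.

```shell
# From `lspci -k` output on stdin, print each display device with the kernel
# driver bound to it, or "(none)" when no driver is in use.
gpu_drivers() {
  awk '
    function report(   d) {
      if (dev != "") { d = (drv != "") ? drv : "(none)"; print dev " => " d }
    }
    /^[^ \t]/ {                  # an unindented line starts a new device
      report(); dev = ""; drv = ""
      if ($0 ~ /VGA compatible controller|3D controller|Display controller/) dev = $0
      next
    }
    dev != "" && /Kernel driver in use:/ { drv = $NF }
    END { report() }
  '
}

# Usage (hypothetical helper): lspci -k | gpu_drivers
```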
I was then able to install the newer drivers successfully:
nolan-veed@nolan-veed:~$ sudo apt install cuda-drivers-580
...
Setting up nvidia-dkms-580 (580.126.16-1ubuntu1) ...
update-initramfs: deferring update (trigger activated)
INFO:Enable nvidia
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/dell_latitude
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/put_your_quirks_here
DEBUG:Parsing /usr/share/ubuntu-drivers-common/quirks/lenovo_thinkpad
Loading new nvidia/580.126.16 DKMS files...
Building for 6.14.0-37-generic and 6.17.0-14-generic
Building initial module nvidia/580.126.16 for 6.14.0-37-generic
Sign command: /usr/bin/kmodsign
Signing key: /var/lib/shim-signed/mok/MOK.priv
Public certificate (MOK): /var/lib/shim-signed/mok/MOK.der
Building module(s)........... done.
Signing module /var/lib/dkms/nvidia/580.126.16/build/nvidia.ko
Signing module /var/lib/dkms/nvidia/580.126.16/build/nvidia-modeset.ko
Signing module /var/lib/dkms/nvidia/580.126.16/build/nvidia-drm.ko
Signing module /var/lib/dkms/nvidia/580.126.16/build/nvidia-uvm.ko
Signing module /var/lib/dkms/nvidia/580.126.16/build/nvidia-peermem.ko
Installing /lib/modules/6.14.0-37-generic/updates/dkms/nvidia.ko.zst
Installing /lib/modules/6.14.0-37-generic/updates/dkms/nvidia-modeset.ko.zst
Installing /lib/modules/6.14.0-37-generic/updates/dkms/nvidia-drm.ko.zst
Installing /lib/modules/6.14.0-37-generic/updates/dkms/nvidia-uvm.ko.zst
Installing /lib/modules/6.14.0-37-generic/updates/dkms/nvidia-peermem.ko.zst
Running depmod... done.
Building initial module nvidia/580.126.16 for 6.17.0-14-generic
Sign command: /usr/bin/kmodsign
Signing key: /var/lib/shim-signed/mok/MOK.priv
Public certificate (MOK): /var/lib/shim-signed/mok/MOK.der
Building module(s)............. done.
Signing module /var/lib/dkms/nvidia/580.126.16/build/nvidia.ko
Signing module /var/lib/dkms/nvidia/580.126.16/build/nvidia-modeset.ko
Signing module /var/lib/dkms/nvidia/580.126.16/build/nvidia-drm.ko
Signing module /var/lib/dkms/nvidia/580.126.16/build/nvidia-uvm.ko
Signing module /var/lib/dkms/nvidia/580.126.16/build/nvidia-peermem.ko
Installing /lib/modules/6.17.0-14-generic/updates/dkms/nvidia.ko.zst
Installing /lib/modules/6.17.0-14-generic/updates/dkms/nvidia-modeset.ko.zst
Installing /lib/modules/6.17.0-14-generic/updates/dkms/nvidia-drm.ko.zst
Installing /lib/modules/6.17.0-14-generic/updates/dkms/nvidia-uvm.ko.zst
Installing /lib/modules/6.17.0-14-generic/updates/dkms/nvidia-peermem.ko.zst
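Before rebooting into the new kernel, you can confirm DKMS really installed the module for it. A minimal sketch; dkms_installed_for is a hypothetical helper, and it assumes the DKMS 3.x status format shown in its comment (older DKMS versions print a different layout).

```shell
# Succeed if `dkms status` output on stdin shows MODULE installed for KERNEL.
# Assumes DKMS 3.x lines like: nvidia/580.126.16, 6.17.0-14-generic, x86_64: installed
dkms_installed_for() {
  awk -F', ' -v m="$1" -v k="$2" '
    index($1, m "/") == 1 && $2 == k && $3 ~ /: installed/ { found = 1 }
    END { exit !found }
  '
}

# Usage (hypothetical helper):
#   dkms status | dkms_installed_for nvidia 6.17.0-14-generic && echo "ready to reboot"
```

After the reboot, nvidia-smi should list the GPU if the driver loaded.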
Summary
When it comes to Linux and Nvidia drivers, I've run into these pain points several times. The upgrade paths aren't perfect. I'm lucky to have worked with these systems long enough to know my way around, but for some folks, it's far from an attractive setup. Still, there are ways out of this mess: removing and reinstalling the Nvidia drivers is simple enough and has always worked well for me. Thanks.