Fixed my AMD Ryzen 5000x series random reboots

After running fine for two years, my Ryzen 5950X became unstable and started rebooting at random. No errors were reported, and Windows Event Viewer just showed a kernel power loss event. The distinguishing feature was that the reboots usually happened when the CPU was lightly loaded or idle, or a few seconds after a heavy workload finished. The machine passed stress tests without an issue.
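If you want to see how often this is happening to you, here is a rough Python sketch that lists recent unexpected-reboot entries using Windows' built-in wevtutil tool. I'm assuming the event in question is Kernel-Power Event ID 41 (the usual "rebooted without cleanly shutting down" entry) and that wevtutil's text output has a "Date:" line per event. The function names are made up for illustration.

```python
import subprocess

# Assumption: the "kernel power loss" entry is Kernel-Power event ID 41.
QUERY = "*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] and (EventID=41)]]"


def build_command(max_events=10):
    """Return the wevtutil arguments for the newest `max_events` matching events."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{QUERY}",        # XPath filter on the System log
        f"/c:{max_events}",   # cap the number of events returned
        "/rd:true",           # read newest first
        "/f:text",            # plain-text output instead of XML
    ]


def extract_dates(text):
    """Pull the timestamps out of the 'Date:' lines in wevtutil text output."""
    return [
        line.split(":", 1)[1].strip()
        for line in text.splitlines()
        if line.strip().startswith("Date:")
    ]


def recent_unexpected_reboots(max_events=10):
    """Run wevtutil (Windows only) and return the timestamps of matching events."""
    result = subprocess.run(
        build_command(max_events), capture_output=True, text=True, check=True
    )
    return extract_dates(result.stdout)
```

On a Windows machine, calling recent_unexpected_reboots() and comparing the timestamps with what you were doing at the time should show whether the reboots cluster around idle or light load.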

TLDR: After trying every BIOS adjustment under the sun, installing a different OS, and swapping every component except the motherboard for old spare parts, I was left with only one option: it had to be the motherboard. So I bought a new one (MSI MAG X570S Torpedo Max), and yes, everything just works great now.

For a long time I thought it was the CPU, until I swapped in my old 3900X without improvement. Trawling the internet, I found this is a common problem across AMD's 3000x, 5000x and (sadly) new 7000x CPUs. There's no official acknowledgement of the problem by AMD (sound familiar?), and there doesn't seem to be a single reliable fix. The consensus seems to be that the CPU is undervolting below tolerance under light/idle workloads, so people are experimenting with many different BIOS settings to try and fix it. But I just couldn't get it to work, no matter what I tried.

So if you are having this problem, consider that it might not be the CPU. If you can borrow a compatible CPU or motherboard, or have a spare sitting around, try swapping it in to narrow down the culprit!

AM4 boards are starting to disappear, but I got the MSI MAG X570S Torpedo Max mainly because its VRM was highly rated (S-tier) for its ability to deliver power. If you have a juice-hungry machine, be aware that boards vary widely in their capacity to supply power, and product names can be misleading. The Gigabyte X570 Aorus Elite, for example, sounds like it should be really good but is actually a mediocre B-tier performer.

It's early days, but I set an arbitrary all-core undervolt of -15 in advanced PBO. The machine seems stable and is giving slightly above average scores in OCCT benchmarks. In the default OCCT torture test it sustains 4.5 GHz with all cores engaged at 100% workload, while running at a mere 63°C thanks to a new Arctic Liquid Freezer III 280 cooler.

