tl;dr

  • ThinkPad X1 Carbon Gen 8 with Intel i7-10510U
  • Ubuntu 24.04
  • extensive throttling down to 400 MHz even if temperature is low
  • GOAL: be able to squeeze the max out of my CPU while keeping the temperature at the top limit (I don't care if it dies, it's old anyway)

EDIT

After trying everything else, I finally tried completely disabling intel_pstate and forcing the acpi-cpufreq driver. I've enabled Intel SpeedStep in my BIOS and set everything to Max Performance (even for battery).

At first this seemed like it fixed my problem but not for long. But at least the issue now seems consistent and it is possible to temporarily fix it.

So when I start my PC and start stressing the CPU (I use stress for that), it works fine for a few minutes: the temperature is stuck at 97 °C, the clock at 2.6 GHz, and even under heavy load everything is responsive and I can use the PC. After a few minutes, the clock falls to 1.9 GHz and the temperature starts falling. At that point, the PC becomes slow and sluggish. Even though the temperature is now lower (around 80 °C), the clock remains at 1.9 GHz.

Then if I switch the power mode from Performance to Balanced and back to Performance, the clock jumps to 2.6 GHz again and, again, it lasts a few minutes before it goes back to 1.9 GHz. So the temporary fix is to just switch modes every time I notice the clock has fallen.
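
The mode switch described above can be scripted. This is only a minimal sketch, assuming powerprofilesctl is installed (it is the tool listed among the things tried in this question). PROFILE_CMD and BOUNCE_DELAY are hypothetical knobs added here for dry runs; they are not part of any tool.

```shell
#!/bin/sh
# Sketch of the manual workaround: bounce the power profile from
# Balanced back to Performance to bring the clock back up.
# PROFILE_CMD and BOUNCE_DELAY are hypothetical overrides for dry runs;
# by default the real powerprofilesctl tool is called.
bounce_profile() {
    cmd="${PROFILE_CMD:-powerprofilesctl}"
    $cmd set balanced &&
    sleep "${BOUNCE_DELAY:-1}" &&
    $cmd set performance
}

# Usage on the affected machine:
#   bounce_profile
# Dry run that only prints the two calls:
#   PROFILE_CMD=echo BOUNCE_DELAY=0 bounce_profile
```

This only automates the temporary workaround; it does not address why the clock drops in the first place.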

Long story

I have a Lenovo ThinkPad X1 Carbon Gen 8 with Intel i7-10510U. Currently running a freshly installed Ubuntu 24.04 LTS (a week ago I wiped my disk and installed everything from scratch with all the default settings). 3 weeks ago the laptop was cleaned up (removed dust, replaced thermal paste).

But I can't get rid of nasty thermal throttling issues that have been bothering me for quite a while now (I would say more than a year). Everything started after one of the Ubuntu upgrades, I don't remember which one exactly, and it has been there ever since through various Ubuntu versions (upgrades) and now with a clean install as well.

When I start my laptop, connected to AC and in Performance mode, everything works fine until I start torturing the CPU with max load. Then it reaches 96-97 °C and works fine for a while until the throttling kicks in. And here comes the issue: once the throttling kicks in, the system seems unable to recover from it and the CPU remains throttled even after the temperature has fallen by 20+ °C. I can temporarily solve the problem by switching from Performance to Balanced and then back again -> it will work for a while (e.g. it can remain at 96 °C for around 5 minutes and then it will start again). Balanced mode doesn't seem to be affected, but the performance is rather bad there as well, so I want to use Performance mode.

Sometimes it will throttle to e.g. 2.6 GHz or 1.6 GHz, which is OKish, but sometimes it gets all the way down to 400 MHz and the machine becomes unusable, even though the temperature is e.g. at 50 °C. So why doesn't it return to the high clock once the temperature has decreased?

Before this started, it was working normally. If I was heavily stressing the CPU, it stayed at 96 °C for hours but it never throttled so much that it became unusable. It would throttle just enough to keep it below 100 °C, which is the absolute max, nothing more. And I want this behavior back.

I've read dozens of threads already, I tried everything that I could but nothing helped. So I tried:

  • full cleanup of laptop internals from dust + thermal paste replacement (done by professionals)
  • full clean install (full disk wipe + clean install) of Ubuntu 24.04 LTS
  • I uninstalled thermald
  • I added GRUB_CMDLINE_LINUX_DEFAULT="intel_pstate=enable quiet splash" to the /etc/default/grub file
  • I tried sudo echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor -> this gets lost after reboot
  • I tried sudo echo performance | sudo tee /sys/firmware/acpi/platform_profile
  • I tried cpufreq-set -r --governor performance
  • I tried sudo powerprofilesctl set performance
  • I installed cpupower-gui and tried to set there both min and max frequency to 4 GHz, I even tried to set it to something sane (like minimum to 2 GHz and max to 4 GHz) -> this gets lost after reboot, but it doesn't do anything even before reboot
  • I tried to play with settings in BIOS for anything related to power and CPU; first I set everything to Performance, and then I disabled everything, but with no help
  • I checked the lap detection mode with cat /sys/devices/platform/thinkpad_acpi/dytc_lapmode but it's 0 all the time
  • cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver returns intel_pstate for all 8 cores
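
For reference, the sysfs values checked in the list above (driver, governor, frequency limits) can be collected in one go. This is a rough sketch, assuming the standard cpufreq sysfs layout. CPU_ROOT is a hypothetical override used only so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Print driver, governor and frequency limits for every CPU.
# CPU_ROOT is a hypothetical override for testing; the default is the
# standard sysfs location.
show_cpufreq() {
    root="${CPU_ROOT:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        cpu=$(basename "$(dirname "$dir")")
        driver=$(cat "$dir/scaling_driver" 2>/dev/null)
        governor=$(cat "$dir/scaling_governor" 2>/dev/null)
        min=$(cat "$dir/scaling_min_freq" 2>/dev/null)
        max=$(cat "$dir/scaling_max_freq" 2>/dev/null)
        cur=$(cat "$dir/scaling_cur_freq" 2>/dev/null)
        # sysfs reports kHz; divide by 1000 to get MHz
        echo "$cpu driver=$driver governor=$governor min=$(( ${min:-0} / 1000 ))MHz max=$(( ${max:-0} / 1000 ))MHz cur=$(( ${cur:-0} / 1000 ))MHz"
    done
}

show_cpufreq
```

Running it before and after a throttling episode makes it easier to see whether the governor or the limits changed, or only the current frequency.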

So basically I tried everything that I've found on the internet, looking through dozens of threads on AskUbuntu and Reddit without success, and I'm now totally frustrated. Could this be a hardware failure? Should I get a new laptop?

The only thing that comes to my mind currently is to try to update the BIOS/firmware. But this used to happen automatically in the past, so I would say it's probably on the latest version now. How can I check that and, if needed, upgrade/downgrade when I don't have Windows on that laptop (I do have it on another one if needed)?
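
On the version check: the installed BIOS version can usually be read from Linux directly, without Windows. A minimal sketch, assuming the kernel exposes the DMI fields under /sys/class/dmi/id. DMI_ROOT is a hypothetical override for testing. Whether an update is actually offered through fwupd depends on the vendor publishing it there, so the fwupdmgr lines are left as comments.

```shell
#!/bin/sh
# Print the BIOS vendor, version and date as reported by the kernel.
# DMI_ROOT is a hypothetical override for testing.
bios_info() {
    dmi="${DMI_ROOT:-/sys/class/dmi/id}"
    for f in bios_vendor bios_version bios_date; do
        [ -r "$dmi/$f" ] && echo "$f: $(cat "$dmi/$f")"
    done
    return 0
}

bios_info

# To look for firmware updates with fwupd (not run here):
#   fwupdmgr refresh
#   fwupdmgr get-updates
#   fwupdmgr update
```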

Additional facts

  • I'm using original Lenovo 65W charger
  • the battery inside is fully functional (84% health, 630 cycles, no errors and still keeps a few hours of work in one charge)
  • the issue also happens on battery
  • the more I tinker with it, the worse it gets (usually it works fine after reboot until I start killing the CPU, and then it starts with less throttling until I start changing all those profiles -> then it starts to reach 400, 600 or 800 MHz, which renders the machine unusable)
  • after reboot, cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor always returns powersave even if I've manually set it to performance before reboot
XploD

1 Answer

tl;dr thermald was not working properly from the start, and later I even uninstalled it, thinking that it was causing the issue. It turned out that it's needed for things to work properly. Installing it again and making it work fixed the issue. To me it looks like there was some kind of conflict between the thermald service and Lenovo's lap detection feature, which prevented thermald from starting properly.

Full answer

I think I managed to fix it! It was the da*n thermald, which appears to be needed after all. Uninstalling it was the first thing I tried because I found many answers here saying that it is the one usually causing issues. But after I tried every single remaining option without success, I decided to try installing this thingy again.

So I installed it with:

sudo apt install thermald

I started the service successfully and checked the status with

systemctl status thermald

And saw the following errors:

[/sys/devices/platform/thinkpad_acpi/dytc_lapmode] present: Thermald can't run on this platform
Unsupported cpu model or platform

Quick Google brought me to this: https://github.com/intel/thermal_daemon/issues/268

So I edited the service file for thermald:

sudo vi /usr/lib/systemd/system/thermald.service

and replaced --adaptive with --ignore-cpuid-check --workaround-enabled so the ExecStart line looks like this:

ExecStart=/usr/sbin/thermald --systemd --dbus-enable --ignore-cpuid-check --workaround-enabled

After that I restarted the service:

sudo systemctl daemon-reload
sudo systemctl restart thermald 

And when I checked the status again with

sudo systemctl status thermald

I saw that it's running normally now. It still shows some warnings/errors but the last line is the most important - it shows that the daemon was started after all:

Jul 12 18:58:14 thinkpad systemd[1]: Starting thermald.service - Thermal Daemon Service...
Jul 12 18:58:14 thinkpad thermald[1104]: sensor id 13 : No temp sysfs for reading raw temp
Jul 12 18:58:14 thinkpad thermald[1104]: sensor id 13 : No temp sysfs for reading raw temp
Jul 12 18:58:14 thinkpad thermald[1104]: sensor id 13 : No temp sysfs for reading raw temp
Jul 12 18:58:14 thinkpad thermald[1104]: Config file /etc/thermald/thermal-conf.xml does not exist
Jul 12 18:58:14 thinkpad thermald[1104]: Config file /etc/thermald/thermal-conf.xml does not exist
Jul 12 18:58:14 thinkpad thermald[1104]: Config file /etc/thermald/thermal-conf.xml does not exist
Jul 12 18:58:14 thinkpad thermald[1104]: Polling mode is enabled: 4
Jul 12 18:58:14 thinkpad systemd[1]: Started thermald.service - Thermal Daemon Service.

And voila! It works exactly as I want it to work and as it's supposed to work! I'm currently running stress with all cores at 100%, the CPU temp is constantly at 96-97 °C, the clock is mostly at 2.6 GHz with occasional variations, and it has been running for 15 minutes without any issue. You can see it occasionally dropping below 2 GHz, but only for a second, and then it goes back to 2.6 GHz (probably to keep the temperature from going above 97 °C, because TCC Offset is set to 3), and it also occasionally goes above 3 GHz.
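
To watch the clock and temperature while stress is running, something like the following can be used. It is only a sketch, assuming the standard cpufreq and thermal sysfs files (kHz and millidegrees Celsius). SYS_ROOT is a hypothetical override for testing, and the hottest thermal zone is not necessarily the CPU package sensor.

```shell
#!/bin/sh
# Print the average core clock (MHz) and the hottest thermal zone (°C).
# SYS_ROOT is a hypothetical override so the function can be tested
# against a fake directory tree.
sample() {
    sys="${SYS_ROOT:-/sys}"
    # scaling_cur_freq is in kHz; average over all CPUs
    mhz=$(cat "$sys"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq 2>/dev/null |
          awk '{ sum += $1; n++ } END { if (n) printf "%d", sum / n / 1000; else printf "n/a" }')
    # thermal zone temperatures are in millidegrees Celsius
    temp=$(cat "$sys"/class/thermal/thermal_zone*/temp 2>/dev/null |
           awk '$1 > max { max = $1 } END { if (NR) printf "%d", max / 1000; else printf "n/a" }')
    echo "avg_mhz=$mhz max_temp_c=$temp"
}

sample
```

In a second terminal, `while sleep 2; do sample; done` prints one line every two seconds, which is enough to see whether the clock recovers after a dip.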

I tried to reboot the machine, and it continued to work, everything remained. I also tried to switch power modes but I was not able to break it, it just works!
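
One caveat on persistence: the file under /usr/lib/systemd/system belongs to the thermald package, so a later package upgrade may overwrite the edit. A systemd drop-in under /etc is the usual way around that. The sketch below only prints the drop-in content, reusing the exact ExecStart line from above. The file name override.conf is just the conventional name, and the apply steps are left as comments because they need root.

```shell
#!/bin/sh
# Emit a systemd drop-in that replaces thermald's ExecStart.
# The empty "ExecStart=" line clears the packaged command first;
# without it systemd would reject a second ExecStart for this service.
thermald_dropin() {
    cat <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/sbin/thermald --systemd --dbus-enable --ignore-cpuid-check --workaround-enabled
EOF
}

thermald_dropin

# To apply (requires root, not run here):
#   sudo mkdir -p /etc/systemd/system/thermald.service.d
#   thermald_dropin | sudo tee /etc/systemd/system/thermald.service.d/override.conf
#   sudo systemctl daemon-reload
#   sudo systemctl restart thermald
```

`sudo systemctl edit thermald` opens an editor on the same drop-in file, so the three lines can also be pasted there instead.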

Note:

I did many other things but nothing affected or solved the issue in any way, so I do believe that it was just the missing/not working thermald. But in case this alone doesn't help others, here is what else I did today:

  • I replaced intel_pstate with acpi-cpufreq
  • I've turned on Intel SpeedStep and Intel Power Management in BIOS and set everything to Max Performance, even for the battery

But I do believe that just making sure thermald is installed and running will fix the issue.

I've learned A LOT with this!

XploD