
I just discovered thermald for preventing machines from overheating. I'd like some basic suggestions on how to modify the XML configuration file. Below is the one I have at /etc/thermald/thermal-conf.xml. From some examples I have skimmed online, it appears to be set to start throttling at 55 C (if I am reading the <Temperature>55000</Temperature> line correctly), but my cores reach 94 C even with the fans running.

I am using an Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz machine.

<?xml version="1.0"?>

<!-- use "man thermal-conf.xml" for details -->

<!-- BEGIN -->
<ThermalConfiguration>
<Platform>
    <Name>Generic X86 Laptop Device</Name>
    <ProductName>EXAMPLE_SYSTEM</ProductName>
    <Preference>QUIET</Preference>
    <ThermalSensors>
        <ThermalSensor>
            <Type>TSKN</Type>
            <AsyncCapable>1</AsyncCapable>
        </ThermalSensor>
    </ThermalSensors>
    <ThermalZones>
        <ThermalZone>
            <Type>SKIN</Type>
            <TripPoints>
                <TripPoint>
                    <SensorType>TSKN</SensorType>
                    <Temperature>55000</Temperature>
                    <type>passive</type>
                    <ControlType>SEQUENTIAL</ControlType>
                    <CoolingDevice>
                        <index>1</index>
                        <type>rapl_controller</type>
                        <influence> 100 </influence>
                        <SamplingPeriod> 16 </SamplingPeriod>
                    </CoolingDevice>
                    <CoolingDevice>
                        <index>2</index>
                        <type>intel_powerclamp</type>
                        <influence> 100 </influence>
                        <SamplingPeriod> 12 </SamplingPeriod>
                    </CoolingDevice>
                </TripPoint>
            </TripPoints>
        </ThermalZone>
    </ThermalZones>
</Platform>

<!-- Thermal configuration example only -->
<Platform>
    <Name>Example Platform Name</Name>
    <!-- UUID is optional, if present this will be matched -->
    <!-- Both product name and UUID can contain wild card "*", which matches any platform -->
    <UUID>Example UUID</UUID>
    <ProductName>Example Product Name</ProductName>
    <Preference>QUIET</Preference>
    <ThermalSensors>
        <ThermalSensor>
            <!-- New Sensor with a type and path -->
            <Type>example_sensor_1</Type>
            <Path>/some_path</Path>
            <AsyncCapable>0</AsyncCapable>
        </ThermalSensor>
        <ThermalSensor>
            <!-- Already present in thermal sysfs, enable this or add/change config.
                 For example, here we are indicating that sensor can do
                 async events to avoid polling -->
            <Type>example_thermal_sysfs_sensor</Type>
            <!-- If async capable, then we don't need to poll -->
            <AsyncCapable>1</AsyncCapable>
        </ThermalSensor>
        <ThermalSensor>
            <!-- Example of a virtual sensor. This sensor depends on other real sensor
                 or virtual sensor. E.g. here the temp will be
                 temp of example_sensor_1 * 0.5 + 10 -->
            <Type>example_virtual_sensor</Type>
            <Virtual>1</Virtual>
            <SensorLink>
                <SensorType>example_sensor_1</SensorType>
                <Multiplier> 0.5 </Multiplier>
                <Offset> 10 </Offset>
            </SensorLink>
        </ThermalSensor>

    </ThermalSensors>
    <ThermalZones>
        <ThermalZone>
            <Type>Example Zone type</Type>
            <TripPoints>
                <TripPoint>
                    <SensorType>example_sensor_1</SensorType>
                    <!-- Temperature at which to take action -->
                    <Temperature> 75000 </Temperature>
                    <!-- max/passive/active
                        If a MAX type is specified, then
                        daemon will use PID control
                        to aggressively throttle to avoid
                        reaching this temp.
                     -->
                    <type>max</type>
                    <!-- SEQUENTIAL | PARALLEL
                    When a trip point temp is violated, then
                    a number of cooling devices can be activated.
                    If control type is SEQUENTIAL then
                    it will exhaust the first cooling device before trying
                    the next.
                    -->
                    <ControlType>SEQUENTIAL</ControlType>
                    <CoolingDevice>
                        <index>1</index>
                        <type>example_cooling_device</type>
                        <!-- Influence will be used to order cooling devices.
                            First cooling device will be used, which has
                            highest influence.
                        -->
                        <influence> 100 </influence>
                        <!-- Delay in using this cdev, this takes some time
                        to actually cool a zone
                        -->
                        <SamplingPeriod> 12 </SamplingPeriod>
                    </CoolingDevice>
                </TripPoint>

            </TripPoints>
        </ThermalZone>
    </ThermalZones>
    <CoolingDevices>
        <CoolingDevice>
            <!--
                Cooling device can be specified
                by a type and optionally a sysfs path
                If the type already present in thermal sysfs
                no need of a path.
                Compensation can use min/max and step size
                to increasingly cool the system.
                Debounce period can be used to force
                a waiting period for action
            -->
            <Type>example_cooling_device</Type>
            <MinState>0</MinState>
            <IncDecStep>10</IncDecStep>
            <ReadBack> 0 </ReadBack>
            <MaxState>50</MaxState>
            <DebouncePeriod>5000</DebouncePeriod>
            <!--
                If there are no PID parameters,
                compensation increases step wise and exponentially
                if a single step is not able to change the trend.
                Alternatively PID parameters can be specified,
                then the next step will use PID calculation using
                the provided PID constants.
            -->
            <PidControl>
                <kp>0.001</kp>
                <kd>0.0001</kd>
                <ki>0.0001</ki>
            </PidControl>
        </CoolingDevice>
    </CoolingDevices>

</Platform>
</ThermalConfiguration>
<!-- END -->
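
As a cross-check on the reading above: thermald temperatures are given in millidegrees Celsius, so 55000 is indeed 55 C. The following is a minimal Python sketch, not part of thermald, that lists the trip points in a config of this shape. The function name `list_trip_points` and the trimmed `SAMPLE` string are illustrative assumptions; only the tag names come from the file above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the first <Platform> block above, used as sample input.
SAMPLE = """<?xml version="1.0"?>
<ThermalConfiguration>
  <Platform>
    <Name>Generic X86 Laptop Device</Name>
    <ThermalZones>
      <ThermalZone>
        <Type>SKIN</Type>
        <TripPoints>
          <TripPoint>
            <SensorType>TSKN</SensorType>
            <Temperature>55000</Temperature>
            <type>passive</type>
            <CoolingDevice>
              <index>1</index>
              <type>rapl_controller</type>
            </CoolingDevice>
            <CoolingDevice>
              <index>2</index>
              <type>intel_powerclamp</type>
            </CoolingDevice>
          </TripPoint>
        </TripPoints>
      </ThermalZone>
    </ThermalZones>
  </Platform>
</ThermalConfiguration>
"""


def list_trip_points(xml_text):
    """Return (zone, sensor, temp_celsius, trip_type, cooling_devices) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for zone in root.iter("ThermalZone"):
        zone_type = (zone.findtext("Type") or "").strip()
        for trip in zone.iter("TripPoint"):
            sensor = (trip.findtext("SensorType") or "").strip()
            # Values may be padded with spaces, e.g. " 75000 ".
            millideg = int((trip.findtext("Temperature") or "0").strip())
            trip_type = (trip.findtext("type") or "").strip()
            cdevs = [(c.findtext("type") or "").strip()
                     for c in trip.findall("CoolingDevice")]
            rows.append((zone_type, sensor, millideg / 1000.0, trip_type, cdevs))
    return rows


if __name__ == "__main__":
    for row in list_trip_points(SAMPLE):
        print(row)
```

Note that this trip point is attached to the TSKN (skin) sensor, not to the CPU package sensor, which may be part of why core temperatures are not held near 55 C.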

Following the suggestion from @heynnema, I have deleted the configuration file, stopped thermald, and run sudo thermald --no-daemon --loglevel=info. The output is below, but how is this supposed to help me construct a new, more efficient configuration file?

$ sudo thermald --no-daemon --loglevel=info
[1649408071][INFO]RAPL domain count 1
[1649408071][INFO]RAPL domain count 1
[1649408071][MSG]22 CPUID levels; family:model:stepping 0x6:a5:2 (6:165:2)
[1649408071][INFO]Running on a vanilla kernel
[1649408071][MSG]Polling mode is enabled: 4
[1649408071][INFO]sensor_update: type TSKN
[1649408071][INFO]sensor_update: type acpitz
[1649408071][INFO]sensor_update: type x86_pkg_temp
[1649408071][INFO]sensor_update: type pch_cometlake
[1649408071][INFO]sensor_update: type NGFF
[1649408071][INFO]sensor_update: type TMEM
[1649408071][INFO]sensor_update: type B0D4
[1649408071][INFO]sensor_update: type TVGA
[1649408071][INFO]thd_read_default_thermal_sensors loaded 8 sensors 
[1649408071][INFO]dts /sys/devices/platform/coretemp.0/name doesn't exist
[1649408071][WARN]sensor id 11 : No temp sysfs for reading raw temp
[1649408071][WARN]sensor id 11 : No temp sysfs for reading raw temp
[1649408071][WARN]sensor id 11 : No temp sysfs for reading raw temp
[1649408071][INFO]INT3400 Base path is 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]TRT/ART read failed
[1649408071][INFO]Using config file /etc/thermald/thermal-conf.xml
I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
[1649408071][WARN]error: could not parse file /etc/thermald/thermal-conf.xml
[1649408071][INFO]sensor index:2 TSKN /sys/class/thermal/thermal_zone2/ Async:0 
[1649408071][INFO]sensor index:0 acpitz /sys/class/thermal/thermal_zone0/ Async:0 
[1649408071][INFO]sensor index:7 x86_pkg_temp /sys/class/thermal/thermal_zone7/ Async:1 
[1649408071][INFO]sensor index:5 pch_cometlake /sys/class/thermal/thermal_zone5/ Async:0 
[1649408071][INFO]sensor index:3 NGFF /sys/class/thermal/thermal_zone3/ Async:0 
[1649408071][INFO]sensor index:1 TMEM /sys/class/thermal/thermal_zone1/ Async:0 
[1649408071][INFO]sensor index:6 B0D4 /sys/class/thermal/thermal_zone6/ Async:0 
[1649408071][INFO]sensor index:4 TVGA /sys/class/thermal/thermal_zone4/ Async:0 
[1649408071][INFO]sensor index:8 hwmon /sys/class/hwmon/hwmon5/temp1_input Async:0 
[1649408071][INFO]sensor index:9 hwmon /sys/class/hwmon/hwmon5/temp2_input Async:0 
[1649408071][INFO]sensor index:10 hwmon /sys/class/hwmon/hwmon5/temp3_input Async:0 
[1649408071][INFO]thd_read_default_cooling devices loaded 14 cdevs 
[1649408071][INFO]ppcc limits max:47000000 min:10000000  min_win:28000000 step:1000000
[1649408071][INFO]set_pid_param 14 [-1000.100,10]
[1649408071][INFO]Use Default pstate drv settings
[1649408071][INFO]sysfs create failed 
[1649408071][INFO]INT3400 Base path is 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]TRT/ART read failed
[1649408071][INFO]Using config file /etc/thermald/thermal-conf.xml
I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
[1649408071][WARN]error: could not parse file /etc/thermald/thermal-conf.xml
[1649408071][INFO]name = package-0
[1649408071][INFO]name = dram
[1649408071][INFO]sysfs read failed /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:2/constraint_0_max_power_uw
[1649408071][INFO]:powercap RAPL invalid max power limit range 
[1649408071][INFO]Calculate dynamically phy_max 
[1649408071][INFO]set_pid_param 18 [-0.4.0,0]
[1649408071][INFO]13: ath10k_thermal, C:0 MN: 0 MX:100 ST:1 pt:/sys/class/thermal/ rd_bk 1 
[1649408071][INFO]1: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]11: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]8: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]6: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]4: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]2: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]12: intel_powerclamp, C:-1 MN: 0 MX:50 ST:5 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]0: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]10: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]9: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]7: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]5: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]3: Processor, C:0 MN: 0 MX:3 ST:1 pt:/sys/class/thermal/ rd_bk 0 
[1649408071][INFO]14: rapl_controller, C:47000000 MN: 47000000 MX:10000000 Inc ST:-2000000 Dec ST:-1000000 pt:/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/ rd_bk 1 
[1649408071][INFO]15: intel_pstate, C:0 MN: 0 MX:10 ST:1 pt:/sys/devices/system/cpu/intel_pstate/ rd_bk 1 
[1649408071][INFO]16: rapl_controller_dram, C:100000000 MN: 100000000 MX:0 ST:-500000 pt:/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:2/ rd_bk 1 
[1649408071][INFO]17: LCD, C:0 MN: 0 MX:120000 ST:12000 pt:/sys/class/backlight/intel_backlight/ rd_bk 1 
[1649408071][INFO]18: amdgpu, C:0 MN: 0 MX:0 ST:0 pt: rd_bk 1 
[1649408071][INFO]thd_read_default_thermal_zones loaded 7 zones 
[1649408071][INFO]INT3400 Base path is 
[1649408071][INFO]zone cpu will be created 
[1649408071][INFO]dts zone /sys/devices/platform/coretemp.0/name doesn't exist
[1649408071][INFO]/sys/class/hwmon/hwmon6/name->dell_smm
[1649408071][INFO]/sys/class/hwmon/hwmon4/name->pch_cometlake
[1649408071][INFO]/sys/class/hwmon/hwmon2/name->BAT0
[1649408071][INFO]/sys/class/hwmon/hwmon0/name->AC
[1649408071][INFO]/sys/class/hwmon/hwmon7/name->ath10k_hwmon
[1649408071][INFO]/sys/class/hwmon/hwmon5/name->coretemp
[1649408071][INFO]Buggy max temp: to close to critical 90000
[1649408071][INFO]Core temp DTS :critical 100000, max 90000, psv 95000
[1649408071][INFO]node type: Element, name: CoolingDevice value: rapl_controller
[1649408071][INFO]node type: Element, name: CoolingDevice value: intel_pstate
[1649408071][INFO]node type: Element, name: CoolingDevice value: intel_powerclamp
[1649408071][INFO]node type: Element, name: CoolingDevice value: cpufreq
[1649408071][INFO]node type: Element, name: CoolingDevice value: Processor
[1649408071][INFO]CDEVS order specified in thermal-cpu-cdev-order.xml
[1649408071][INFO]/sys/class/hwmon/hwmon3/name->nouveau
[1649408071][INFO]/sys/class/hwmon/hwmon1/name->acpitz
[1649408071][INFO]INT3400 Base path is 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]failed to open /dev/acpi_thermal_rel 
[1649408071][INFO]TRT/ART read failed
[1649408071][INFO]Using config file /etc/thermald/thermal-conf.xml
I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
[1649408071][WARN]error: could not parse file /etc/thermald/thermal-conf.xml
[1649408071][INFO]

ZONE DUMP BEGIN
[1649408071][INFO]
[1649408071][INFO]Zone 8: cpu, Active:1 Bind:0 Sensor_cnt:1
[1649408071][INFO]..sensors..
[1649408071][INFO]sensor index:7 x86_pkg_temp /sys/class/thermal/thermal_zone7/ Async:1
[1649408071][INFO]..trips..
[1649408071][INFO]index 0: type:passive temp:95000 hyst:0 zone id:8 sensor id:65535 control_type:1 cdev size:4
[1649408071][INFO]cdev[0] rapl_controller, Sampling period: 0
[1649408071][INFO] target_state:not defined
[1649408071][INFO]cdev[1] intel_pstate, Sampling period: 0
[1649408071][INFO] target_state:not defined
[1649408071][INFO]cdev[2] intel_powerclamp, Sampling period: 0
[1649408071][INFO] target_state:not defined
[1649408071][INFO]cdev[3] Processor, Sampling period: 0
[1649408071][INFO] target_state:not defined
[1649408071][INFO]index 1: type:polling temp:85500 hyst:0 zone id:8 sensor id:7 control_type:0 cdev size:0
[1649408071][INFO]
[1649408071][INFO]

ZONE DUMP END
[1649408071][INFO]Current user preference is 0
[1649408071][INFO]thd_engine_thread begin

After the edit, this is my configuration file, yet core temperatures still go up to 90 C:

~$ cat /etc/thermald/thermal-conf.xml
<?xml version="1.0"?>
<ThermalConfiguration>
<Platform>
        <Name>Generic X86 Laptop Device</Name>
        <ProductName>*</ProductName>
        <Preference>QUIET</Preference>
        <ThermalZones>
                <ThermalZone>
                        <Type>cpu</Type>
                        <TripPoints>
                                <TripPoint>
                                        <SensorType>x86_pkg_temp</SensorType>
                                        <Temperature>55000</Temperature>
                                        <type>passive</type>
                                        <ControlType>PARALLEL</ControlType>
                                </TripPoint>
                        </TripPoints>
                </ThermalZone>
        </ThermalZones>
</Platform>
</ThermalConfiguration>

Additional info:

~$ ls -al /etc/thermald
total 32
drwxr-xr-x   2 root      root       4096 Apr  8 16:32 .
drwxr-xr-x 159 root      root      12288 Apr  5 09:03 ..
-rw-r--r--   1 root      root       4605 Jan 15  2019 backup
-rw-rw-r--   1 username username   816 Apr  8 16:32 thermal-conf.xml
-rw-r--r--   1 root      root        508 Jan 15  2019 thermal-cpu-cdev-order.xml

And also this seems relevant (thermald inactive?):

$ sudo systemctl status thermald
● thermald.service - Thermal Daemon Service
     Loaded: loaded (/lib/systemd/system/thermald.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Fri 2022-04-08 10:54:28 CEST; 1 weeks 0 days ago
   Main PID: 1328 (code=exited, status=0/SUCCESS)

Apr 07 11:51:51 Precision-3551 thermald[1328]: error: could not parse file /etc/thermald/thermal-conf.xml
Apr 07 11:51:51 Precision-3551 thermald[1328]: I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
Apr 07 11:51:51 Precision-3551 thermald[1328]: error: could not parse file /etc/thermald/thermal-conf.xml
Apr 07 11:51:51 Precision-3551 thermald[1328]: I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
Apr 07 11:51:51 Precision-3551 thermald[1328]: error: could not parse file /etc/thermald/thermal-conf.xml
Apr 08 10:54:26 Precision-3551 systemd[1]: Stopping Thermal Daemon Service...
Apr 08 10:54:26 Precision-3551 thermald[1328]: Terminating ...
Apr 08 10:54:27 Precision-3551 thermald[1328]: terminating on user request ..
Apr 08 10:54:28 Precision-3551 systemd[1]: thermald.service: Succeeded.
Apr 08 10:54:28 Precision-3551 systemd[1]: Stopped Thermal Daemon Service.

I have now reactivated it with sudo service thermald restart, and the status is now:

$ sudo systemctl status thermald
● thermald.service - Thermal Daemon Service
     Loaded: loaded (/lib/systemd/system/thermald.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2022-04-15 22:26:23 CEST; 2s ago
   Main PID: 609438 (thermald)
      Tasks: 2 (limit: 18622)
     Memory: 1.3M
     CGroup: /system.slice/thermald.service
             └─609438 /usr/sbin/thermald --systemd --dbus-enable --adaptive

Apr 15 22:26:23 Precision-3551 systemd[1]: Starting Thermal Daemon Service...
Apr 15 22:26:23 Precision-3551 systemd[1]: Started Thermal Daemon Service.
Apr 15 22:26:23 Precision-3551 thermald[609438]: 22 CPUID levels; family:model:stepping 0x6:a5:2 (6:165:2)
Apr 15 22:26:23 Precision-3551 thermald[609438]: 22 CPUID levels; family:model:stepping 0x6:a5:2 (6:165:2)
Apr 15 22:26:23 Precision-3551 thermald[609438]: Polling mode is enabled: 4
Apr 15 22:26:23 Precision-3551 thermald[609438]: sensor id 11 : No temp sysfs for reading raw temp
Apr 15 22:26:23 Precision-3551 thermald[609438]: sensor id 11 : No temp sysfs for reading raw temp
Apr 15 22:26:23 Precision-3551 thermald[609438]: sensor id 11 : No temp sysfs for reading raw temp

Py-ser
  • 687

1 Answer


From the comments:

A lesson on how to configure thermald could take a while. First check man thermald and man thermal-conf.xml. The thermal-conf.xml file that you used is a generic one that's an example only. First remove it altogether, and restart thermald. It'll try to run in a default configuration if it doesn't find the .xml file. See how that works. Otherwise, stop thermald, run it manually with sudo thermald --no-daemon --loglevel=info, let thermald tell you itself what sensors and cooling devices it finds, and use that to write your own .xml file.
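
To make the "use that to write your own .xml file" step more concrete, here is a minimal Python sketch that pulls the sensor names and cooling devices out of the --loglevel=info output. The regular expressions are assumptions based only on the log lines shown in the question, not on any documented thermald log format, and the names `summarize`, `SENSOR_RE`, and `CDEV_RE` are illustrative.

```python
import re

# Patterns inferred from the log in the question; other thermald versions
# may format these lines differently.
SENSOR_RE = re.compile(r"\]sensor index:(\d+) (\S+) (\S+) Async:(\d+)")
CDEV_RE = re.compile(r"\](\d+): (\S+), C:(-?\d+) MN: ?(-?\d+) MX: ?(-?\d+)")

# Three lines copied from the question's log, used as sample input.
LOG = """\
[1649408071][INFO]sensor index:7 x86_pkg_temp /sys/class/thermal/thermal_zone7/ Async:1
[1649408071][INFO]12: intel_powerclamp, C:-1 MN: 0 MX:50 ST:5 pt:/sys/class/thermal/ rd_bk 0
[1649408071][INFO]14: rapl_controller, C:47000000 MN: 47000000 MX:10000000 Inc ST:-2000000 Dec ST:-1000000 pt:/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/ rd_bk 1
"""


def summarize(log_text):
    """Return ({index: (sensor_type, path)}, {index: (cdev_type, min, max)})."""
    sensors, cdevs = {}, {}
    for line in log_text.splitlines():
        match = SENSOR_RE.search(line)
        if match:
            # Keyed by index, so a sensor repeated in the zone dump is stored once.
            sensors[int(match.group(1))] = (match.group(2), match.group(3))
            continue
        match = CDEV_RE.search(line)
        if match:
            cdevs[int(match.group(1))] = (
                match.group(2), int(match.group(4)), int(match.group(5)))
    return sensors, cdevs


if __name__ == "__main__":
    found_sensors, found_cdevs = summarize(LOG)
    print("sensors:", found_sensors)
    print("cooling devices:", found_cdevs)
```

The sensor type names (for example x86_pkg_temp) are what go into <SensorType>, and the cooling device names (for example rapl_controller or intel_powerclamp) are candidates for a <CoolingDevice> <type> entry.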

Here's my thermal-conf.xml file...

<?xml version="1.0"?>
<ThermalConfiguration>
<Platform>
        <Name>Dell Inspiron-7700-AIO</Name>
        <ProductName>*</ProductName>
        <Preference>QUIET</Preference>
        <ThermalZones>
                <ThermalZone>
                        <Type>cpu</Type>
                        <TripPoints>
                                <TripPoint>
                                        <SensorType>x86_pkg_temp</SensorType>
                                        <Temperature>65000</Temperature>
                                        <type>passive</type>
                                        <ControlType>PARALLEL</ControlType>
                                        <CoolingDevice>
                                                <index>0</index>
                                                <type>Fan</type>
                                                <influence>30</influence>
                                                <SamplingPeriod>10</SamplingPeriod>
                                        </CoolingDevice>
                                        <CoolingDevice>
                                                <index>5</index>
                                                <type>Processor</type>
                                                <influence>80</influence>
                                                <SamplingPeriod>5</SamplingPeriod>
                                        </CoolingDevice>
                                        <CoolingDevice>
                                                <index>13</index>
                                                <type>intel_powerclamp</type>
                                                <influence>100</influence>
                                                <SamplingPeriod>5</SamplingPeriod>
                                        </CoolingDevice>
                                </TripPoint>
                        </TripPoints>
                </ThermalZone>
        </ThermalZones>
</Platform>
</ThermalConfiguration>

Update #1:

Minimal thermal-conf.xml file...

Just edit the <Name>, <SensorType>, and <Temperature> values. Then restart thermald as a daemon, or run it manually to observe what goes on.

<?xml version="1.0"?>
<ThermalConfiguration>
<Platform>
        <Name>Generic</Name>
        <ProductName>*</ProductName>
        <Preference>QUIET</Preference>
        <ThermalZones>
                <ThermalZone>
                        <Type>cpu</Type>
                        <TripPoints>
                                <TripPoint>
                                        <SensorType>x86_pkg_temp</SensorType>
                                        <Temperature>55000</Temperature>
                                </TripPoint>
                        </TripPoints>
                </ThermalZone>
        </ThermalZones>
</Platform>
</ThermalConfiguration>
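
Before restarting thermald, it may be worth confirming that the edited file exists and is well-formed. In the question's log, the "failed to load external entity" warning appears to mean the file could not be opened at all (it had been deleted), which is a different problem from a syntax error. A minimal Python sketch of that check follows; it only tests XML well-formedness, not whether thermald accepts the contents, and the function name `check_config` is illustrative.

```python
import os
import xml.etree.ElementTree as ET


def check_config(path):
    """Return 'missing', 'malformed: <reason>', or 'ok' for a config path."""
    if not os.path.isfile(path):
        return "missing"
    try:
        ET.parse(path)  # raises ParseError on unbalanced tags, stray text, etc.
    except ET.ParseError as exc:
        return f"malformed: {exc}"
    return "ok"


if __name__ == "__main__":
    print(check_config("/etc/thermald/thermal-conf.xml"))
```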

To stress the CPU and observe what happens with the temps, first install Vitals (https://extensions.gnome.org/extension/1460/vitals/) and set it to display CPU package temps and fan speed. Then run yes in a terminal (the command is lowercase) and watch what happens to the CPU temp. You can also install the stress app to do the same thing as yes, but with more control.
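
If a desktop extension is not an option, the same temperatures can be read directly from sysfs. This is a minimal Python sketch, assuming the thermal zones live under /sys/class/thermal as in the log in the question, with each zone exposing a type file and a temp file in millidegrees Celsius. The function name `read_thermal_zones` is illustrative.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_type: temp_celsius} for every thermal_zone* under base."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # skip zones that are unreadable or report no number
        readings[name] = millideg / 1000.0
    return readings


if __name__ == "__main__":
    for name, celsius in read_thermal_zones().items():
        print(f"{name}: {celsius:.1f} C")
```

Running it repeatedly while the CPU is under load shows whether x86_pkg_temp levels off near the configured trip point.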

heynnema
  • 73,649