Closed Bug 1545308 Opened 4 months ago Closed 3 months ago

Some packet.net instances are slow: /proc/cpuinfo shows reduced MHz

Categories

(Taskcluster :: General, defect)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gbrown, Unassigned)

References

(Blocks 5 open bugs)

Details

(Keywords: leave-open)

Attachments

(1 file)

In bug 1540280 we noticed that some Android test tasks running on packet.net fail due to reduced performance and that poor performance is strongly associated with particular instances: machine-13 initially, but more recently machine-20, machine-7, and perhaps machine-12.

Bug 1474758 has a good collection of these test failures due to poor performance:

https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-04-10&endday=2019-04-17&tree=trunk&bug=1474758

Android test tasks create an artifact called "android-performance.log" which includes a dump of /proc/cpuinfo. The /proc/cpuinfo from poor-test-performance logs shows very low "cpu MHz": around 800 MHz vs 3000+ MHz for normal performance runs.

Low MHz instances:
https://taskcluster-artifacts.net/VcmCJJ7HTj22PmZ9kSp7tg/0/public/test_info//android-performance.log
https://taskcluster-artifacts.net/EldBuqS9RdeIwMvvtdHXlA/0/public/test_info//android-performance.log
https://taskcluster-artifacts.net/doK_4p7NQbuMhJBDbMw0Bw/0/public/test_info//android-performance.log

e.g.

Host /proc/cpuinfo:
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 94
model name	: Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz
stepping	: 3
microcode	: 0x6a
cpu MHz		: 799.941  <<<===
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs		:
bogomips	: 7007.88
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

Normal MHz instances:
https://taskcluster-artifacts.net/ZCYF9RmrQm6o1wU6AQxNmw/0/public/test_info//android-performance.log
https://taskcluster-artifacts.net/BGlX0vd9SKiMTHJH-XGIqg/0/public/test_info//android-performance.log
https://taskcluster-artifacts.net/FmVpczkERQuqgTGDCZ6L5A/0/public/test_info//android-performance.log

e.g.

Host /proc/cpuinfo:
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 94
model name	: Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz
stepping	: 3
microcode	: 0x8a
cpu MHz		: 3700.156 <<<===
cache size	: 8192 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs		:
bogomips	: 7008.63
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

Is that Linux CPU scaling?

Blocks: 1338040

See especially https://bugzilla.mozilla.org/show_bug.cgi?id=1474758#c41: "we may be facing resource starvation in shared tasks. The solution would be to reduce worker capacity and spawn more instances."

While I have some concerns (why didn't we see this effect when we initially experimented with a very small pool? why did this seem to start fairly suddenly a few weeks ago, when there were no big changes to load?), it seems the best explanation. Can we change to 2 workers per instance?

Flags: needinfo?(coop)

(In reply to Geoff Brown [:gbrown] from comment #2)

While I have some concerns (why didn't we see this effect when we initially experimented with a very small pool? why did this seem to start fairly suddenly a few weeks ago, when there were no big changes to load?), it seems the best explanation. Can we change to 2 workers per instance?

It will take a while to recreate all the instances. I'll start tomorrow AM. I'll also want to socialize the associated increase in cost with Travis.

Assignee: nobody → coop
Status: NEW → ASSIGNED
Flags: needinfo?(coop)

I've bumped the total number of instances up to 40 from 25.

I started recreating machine-0, but it took 5 attempts to recreate that single instance due to the flakiness of bootstrapping from scratch every time. Rather than recreate the existing instances, I've decided to provision the new, higher-number instances first instead. Once we have that new capacity, I'll return and recreate the low-numbered ones.

So far we have 4 instances that are each running only 2 workers instead of 4. These are:

machine-0
machine-25
machine-30
machine-34

That gives you some indication of how often provisioning works on the first pass. :/

I'll continue slogging through this.

machine-[24-39] are all running with 2 workers each. I'll now start recreating the existing instances.

There are now 40 packet.net instances, each of which is running 2 workers.

If we are still seeing timeouts and slowdowns in this new configuration, we can drop to a single worker per instance, but at that point, we've pretty much invalidated the reason for pursuing packet.net instances in the first place.

Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED

:coop -- Despite your efforts here, this condition continues; in fact, I see no difference.

From bug 1474758, all of these recent (April 28, 29) failures' android-performance.log artifacts show /proc/cpuinfo around 800 MHz:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=243333700&repo=autoland&lineNumber=14416
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=243310537&repo=autoland&lineNumber=12875
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=243217995&repo=mozilla-inbound&lineNumber=28534
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=243217990&repo=mozilla-inbound&lineNumber=27638
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=243217501&repo=autoland&lineNumber=29276

eg https://taskcluster-artifacts.net/H71E7qBJRbGwhymOkqE4YQ/0/public/test_info//android-performance.log

Host /proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Xeon(R) CPU E3-1240 v6 @ 3.70GHz
stepping : 9
microcode : 0x8e
cpu MHz : 799.980

If we really are running max 2 workers/instance now, then I think there is no correlation between the reduced cpuinfo MHz and # workers/instance.

Flags: needinfo?(coop)

I would be surprised to see such a correspondence -- nothing in docker apportions MHz between containers.

This sounds a lot more like CPU throttling has somehow been enabled.

I just opened a new ticket in packet for the CPU throttling problem.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

For reference: a few years ago :garndt and I worked on running talos at packet.net and we saw the same thing. Specific workers were running significantly slower than the majority of the other workers, and we were only doing 1 process/machine. We spent a few weeks working with our contact at packet.net at the time and didn't get anywhere; they were confused and couldn't explain it.

If we were to work around this while running 4 workers/machine, that means all workers on an affected machine would be "auto retried/failed".

Does this change over time, as in: one machine works fine at 9am but is running slower at 11am? Was the machine rebooted in between, etc.?

I got a response from packet.net:

So after reviewing what could be the possible cause for this, I dug deeper into c1.small.x86 capabilities instead.
I highly suggest tuning your CPU to verify this; if you haven't already, our best guide is here:
https://support.packet.com/kb/articles/cpu-tuning

This can be verified/confirmed by

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

I am working on another bug and will dig into this afterward.

Verify setting of scaling_governor by adding it to existing log.
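A minimal sketch of what that logging addition could look like (the artifact path, function name, and SYSFS override are assumptions, not the actual patch):

```shell
#!/bin/bash
# Sketch: append the host's scaling_governor settings to the existing
# android-performance.log artifact. SYSFS and the log path are overridable
# so the function can be exercised against a fake sysfs tree without root.
log_governors() {
  local sysfs="$1" log="$2" f
  echo "Host cpufreq/scaling_governor:" >> "$log"
  for f in "$sysfs"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
    # Skip silently if cpufreq is not exposed (e.g. inside some containers)
    [ -r "$f" ] && echo "$f: $(cat "$f")" >> "$log"
  done
  return 0
}
log_governors "${SYSFS:-/sys}" "${LOG:-android-performance.log}"
```

Pointing SYSFS at a scratch directory tree makes the same function easy to test off-host.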

Keywords: leave-open

I've seen this on multiple Intel based machines - desktops and servers.

It's related to the scaling governor: the default appears to throttle the processor down to 800 MHz (usually) if it's not modified. You may have to tweak the following scripts depending on your processor type.

First to see what mode the scaling governor is in:

#!/bin/bash

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

----- call this mode.sh --------

Next to set all the cores for max performance - try this:

#!/bin/bash
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor >/dev/null

------ call this script perf.sh --------

Next - rerun the "mode.sh" script above and you should see:

performance

printed out once for every core you have access to.

Lastly - if you want to save energy, you can set the scaling governor to save power with:

#!/bin/bash

echo powersave | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor >/dev/null

------- call this psave.sh -------

These scripts are copied from a server with a 6-core/12-thread Xeon processor on an Intel motherboard. I've hunted through all the BIOS settings and set everything up to "go fast" - that is - no throttling. Yet I have to run the perf.sh script every time the machine reboots!!

Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d6416b899841
Add cpufreq/scaling_governor info to android-performance.log; r=wcosta

Thanks Al - sounds like we are on the right track.

And my diagnostic patch confirms that we usually see "powersave".

I can't write to scaling_governor in the same place -- no privileges. Maybe that's better done in the worker? Hoping wcosta or coop can sort that out...

I just redeployed instances with CPU governor set to "performance". :gbrown, could you please confirm the slowness issue is gone?

Flags: needinfo?(gbrown)

I still see "powersave" reported by recent tasks:

https://treeherder.mozilla.org/logviewer.html#?job_id=244855542&repo=autoland

https://taskcluster-artifacts.net/aA8f776nSMqt3jS99rDvzQ/0/public/test_info//android-performance.log

/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu6/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor: powersave

Flags: needinfo?(gbrown)

Two things:

  1. Given that reducing the # of workers per instance didn't fix this problem, are we back to running 4 workers/instance with 25 instances total again? I never landed my change to switch to 2 workers/instance with 40 instances when it became clear that wasn't helping, and AFAICT there's been no other change to the terraform file: https://github.com/taskcluster/taskcluster-infrastructure/blob/master/docker-worker.tf Just want to make sure that the github repo is representative of the current state.

  2. The sooner we can get to image-based deployments in packet.net, the better. This iteration cycle is going to be painful otherwise. Wander: can you pick up bug 1523569 once sccache in GCP is done, please? Maybe a git repo is overkill, but having some sort of local filestore for images hosted in packet.net will be required for bug 1508790 anyway.

Flags: needinfo?(coop) → needinfo?(wcosta)

(In reply to Geoff Brown [:gbrown] from comment #19)

I still see "powersave" reported by recent tasks:

https://treeherder.mozilla.org/logviewer.html#?job_id=244855542&repo=autoland

https://taskcluster-artifacts.net/aA8f776nSMqt3jS99rDvzQ/0/public/test_info//android-performance.log

/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu6/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor: powersave

There was a bustage in the code; I redeployed and it now has "performance".

Flags: needinfo?(wcosta)

(In reply to Wander Lairson Costa [:wcosta] from comment #21)

There was a bustage in the code; I redeployed and it now has "performance".

Sorry, but I still see "powersave" reported by all recent tasks:

https://treeherder.mozilla.org/logviewer.html#?job_id=245395212&repo=autoland (Started: Wed, May 8, 13:04:33)

https://taskcluster-artifacts.net/Ghl2qrptRyujTc4L3ecniQ/0/public/test_info//android-performance.log

/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu6/cpufreq/scaling_governor: powersave
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor: powersave

(In reply to Wander Lairson Costa [:wcosta] from comment #21)

There was a bustage in the code; I redeployed and it now has "performance".

Relevant PR is here: https://github.com/taskcluster/taskcluster-infrastructure/pull/46

I suggested to Wander in IRC that we should try fixing this by hand, i.e. ssh to each machine and manually set scaling_governor to performance. We can then iterate on making the deployment automation do this automatically.

The latest tasks have "performance" now:

https://treeherder.mozilla.org/logviewer.html#?job_id=245429039&repo=autoland

https://taskcluster-artifacts.net/Cug6JQYTT-2QwWLnjJR6tQ/0/public/test_info//android-performance.log

/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor: performance
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor: performance
/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor: performance
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor: performance
/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor: performance
/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor: performance
/sys/devices/system/cpu/cpu6/cpufreq/scaling_governor: performance
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor: performance

Self ni to check on associated intermittent failures tomorrow.

Flags: needinfo?(gbrown)

Oh darn.

From bug 1474758,

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=245433579&repo=autoland&lineNumber=13527

https://taskcluster-artifacts.net/SOjCQvELT7OoeYGgaVYy-w/0/public/test_info//android-performance.log

Host cpufreq/scaling_governor:
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor: performance
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor: performance
/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor: performance
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor: performance
/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor: performance
/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor: performance
/sys/devices/system/cpu/cpu6/cpufreq/scaling_governor: performance
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor: performance

Host /proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel(R) Xeon(R) CPU E3-1240 v5 @ 3.50GHz
stepping : 3
microcode : 0x6a
cpu MHz : 799.941
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs :
bogomips : 7008.65
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

Notice "performance": this happened after yesterday's change.

Notice "cpu MHz : 799.941"!!

Flags: needinfo?(gbrown)

(In reply to Geoff Brown [:gbrown] from comment #25)

Oh darn.

Notice "cpu MHz : 799.941"!!

OK, that's concerning.

Wander: can you double-check the pool of instances to see how pervasive this is, and then reach out to packet for next steps here?

I've NI-ed Al who is already on this bug too.

Flags: needinfo?(wcosta)
Flags: needinfo?(al)

I see 3 new examples so far:

https://treeherder.mozilla.org/logviewer.html#?job_id=245544363&repo=autoland
https://treeherder.mozilla.org/logviewer.html#?job_id=245433579&repo=autoland
https://treeherder.mozilla.org/logviewer.html#?job_id=245434394&repo=mozilla-central

2 of the 3 examples above have worker id "machine-4". Can that worker id be translated into a packet.net instance that someone could ssh into to investigate further?

The fix I provided in comment 13 will not persist after a reboot.
Ensuring that the scaling governor stays in performance mode across reboots requires the following steps:

Add the following line:

GOVERNOR="performance"

in

/etc/init.d/cpufrequtils

On Ubuntu 18.04 you need to run:

sudo apt-get install cpufrequtils
sudo systemctl disable ondemand


Flags: needinfo?(al)

I spotted frequencies going down to 800 MHz even with the scaling governor set to performance. So besides setting the governor to performance, I now also set the minimum CPU frequency to 3.5 GHz. :gbrown, could you please keep an eye on failing tasks?
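A sketch of that tuning step, assuming the standard per-policy cpufreq sysfs layout (the 3500000 kHz floor matches the E3-1240 v5 base clock; the function name and SYSFS override are mine, not the deployed code):

```shell
#!/bin/bash
# Sketch: set the governor to performance and raise the minimum frequency on
# every cpufreq policy. On a real host this must run as root; SYSFS is
# overridable so the logic can be tested against a fake tree.
set_perf() {
  local base="$1" pol
  for pol in "$base"/devices/system/cpu/cpufreq/policy*; do
    # Skip policies we cannot write (non-root, or cpufreq not exposed)
    [ -w "$pol/scaling_governor" ] || continue
    echo performance > "$pol/scaling_governor"
    echo 3500000 > "$pol/scaling_min_freq"   # value is in kHz
  done
  return 0
}
set_perf "${SYSFS:-/sys}"
```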

Flags: needinfo?(wcosta)

Update: even so, I can see machines running with 800 MHz.

Associated test failures definitely continue and remain a big concern. Logs show "performance" and ~800 MHz.

Do we have more ideas? Any work in progress?

I am tempted to have the task fail and retry when it finds this condition.
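A sketch of such a guard, run at task start (the 1000 MHz threshold, the helper name, and the retry mechanism are all assumptions; this is not the implementation that eventually landed):

```shell
#!/bin/bash
# Sketch: find the lowest "cpu MHz" value in /proc/cpuinfo and flag the host
# as throttled if it is below an assumed threshold, so the harness could fail
# the task early and let the queue retry it on another worker.
check_min_mhz() {
  # Print the smallest integer "cpu MHz" value found in the given cpuinfo file
  awk -F':' '/^cpu MHz/ { mhz = int($2); if (min == "" || mhz < min) min = mhz }
             END { if (min != "") print min }' "$1"
}

min=$(check_min_mhz "${CPUINFO:-/proc/cpuinfo}")
if [ -n "$min" ] && [ "$min" -lt 1000 ]; then
  echo "Host reports ${min} MHz; task should exit so it can be retried"
  # A real implementation would exit here with whatever status the worker
  # treats as retriable (that status is worker-specific).
fi
```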

Do we know how often this happens? Is it happening on all workers or only for some of those? Also which scaling driver is actually running? Maybe there is a bug we are just hitting here?

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #32)

Do we know how often this happens?

Not really. It is not very frequent, but there are about 10 cases found in bug 1474758 each day; that is only jsreftests, which account for maybe 10% of packet.net tasks, so a very gross estimate would be 100 cases of low MHz per day.

Is it happening on all workers or only for some of those?

There is a correlation with certain worker-ids for a period of time, but the affected worker-ids seem to change from day to day. Look at the "Machine name" column of https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-05-09&endday=2019-05-16&tree=trunk&bug=1474758 to see what I mean.

Also which scaling driver is actually running?

When a task cats /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor, it sees "performance" now.

Maybe there is a bug we are just hitting here?

A scaling driver / governor bug?

(In reply to Geoff Brown [:gbrown] from comment #33)

Not really. It is not very frequent, but there are about 10 cases found in bug 1474758 each day; that is only jsreftests, which account for maybe 10% of packet.net tasks, so a very gross estimate would be 100 cases of low MHz per day.

Is it happening on all workers or only for some of those?

There is a correlation with certain worker-ids for a period of time, but the affected worker-ids seem to change from day to day. Look at the "Machine name" column of https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-05-09&endday=2019-05-16&tree=trunk&bug=1474758 to see what I mean.

It looks like it happens mostly for the workers named 7, 8, 34, and 36; others only appear once in that list. Maybe someone could check one of those manually? Can we blacklist them (take them out of the pool) for now?

Also which scaling driver is actually running?

When a task cats /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor, it sees "performance" now.

So comment 23 referenced: https://github.com/taskcluster/taskcluster-infrastructure/pull/46/files

Where in this patch do we actually set the governor to performance? We only start the cpufreq service, don't we? And if a task sets it by hand, doesn't that conflict with the service?

A scaling driver / governor bug?

Yes, so it would be good to know which driver is actually used. Is it intel_pstate?
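One way to answer that from a task, assuming the standard cpufreq sysfs layout (the helper name is mine):

```shell
#!/bin/bash
# Sketch: report the active cpufreq scaling driver (e.g. intel_pstate or
# acpi-cpufreq) for each CPU, alongside the governor we already log.
show_driver() {
  local f
  for f in "$1"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_driver; do
    [ -r "$f" ] && echo "$f: $(cat "$f")"
  done
  return 0
}
show_driver "${SYSFS:-/sys}"
```

This matters because intel_pstate in active mode implements its own "powersave"/"performance" governors, so the answer changes which knobs apply.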

See Also: → 1552334

Bug 1552334 recognizes the slow instances and retries the affected task. That is effective in avoiding test failures, but sometimes delays test runs and is inefficient in terms of worker use: It would still be great to see this bug resolved properly.

(In reply to Geoff Brown [:gbrown] from comment #35)

sometimes delays test runs and is inefficient in terms of worker use

As an example, in

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&searchStr=android%2Copt%2Cx86_64%2Cwpt2&revision=b74e5737da64a7af28ab4f81f996950917aa71c5

this task retried 4 times (getting worker 8 or worker 34 each time). The task takes about 5 minutes to detect the poor performance (this could be improved) + time for rescheduling, so the start of the successful task was delayed by about 30 minutes in total.

So maybe we estimate 7 minutes/retry, then calculate the total retries (or % retries, or retries/day); from that we could determine how many workers we need.

Actually if we have this retry in place, possibly we could consider running talos on linux @packet.net :)

Assignee: coop → nobody

This seems to have stopped!

I see no retries due to reduced MHz since May 25. Wonderful!

Let's keep this bug open, unless we know how it was fixed...

Self note: the correct command line to set the cpu governor for intel_pstate driver is:

echo performance | tee /sys/devices/system/cpu/cpufreq/policy*/scaling_governor

Status: REOPENED → RESOLVED
Closed: 4 months ago → 3 months ago
Resolution: --- → FIXED

Beginning May 30, tasks consistently reported "powersave" governors. Intermittent retries for reduced MHz were noticed today.
