Closed Bug 1492929 Opened 3 years ago Closed 3 years ago

30x slowdown on startup and page load time with MOZ_CHAOSMODE=0

Categories: Core :: General, defect (Unspecified, Windows, defect, Not set, normal)

Tracking: RESOLVED FIXED, mozilla64
Tracking Status
firefox-esr60 --- unaffected
firefox62 --- unaffected
firefox63 --- unaffected
firefox64 --- fixed

People: Reporter: cpeterson, Assigned: decoder
Keywords: regression
Attachments: 1 file

While probably contraindicated, I run Nightly with MOZ_CHAOSMODE=0 in my Windows environment variables all the time. I've never had any unusual problems (or found any bugs only reproducible in Chaos Mode, TBH) until bug 1486400 (add task dispatch/run delays for ChaosMode) landed. cnn.com used to load in 2 seconds but now it takes over a minute. The Sleep(1) when dispatching tasks is probably too much.

https://hg.mozilla.org/integration/mozilla-inbound/rev/1dd7a055bae0

This bug is probably a low priority because it only affects people or tests with MOZ_CHAOSMODE=0.

Nathan said in Slack:

"30-40x slowdown from that seems pretty unusual, though I guess that's on both the dispatch and the run. We should probably only wait on the dispatch when we're not on the current thread, though I don't know how expensive that would be to check."
Blocks: 1486400
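Nathan's suggestion above (only wait on dispatch when the target is a different thread) could be sketched roughly as follows. This is a minimal illustration in standard C++, not Gecko code: `ShouldDelayDispatch` and `MaybeDelayDispatch` are hypothetical names, and the real dispatch path would compare against the target event queue's thread rather than a bare `std::thread::id`.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper (not a Gecko API): a dispatch-side delay only helps
// when the dispatcher and the target are different threads, because only
// then can the target run the task while the dispatcher waits.
bool ShouldDelayDispatch(std::thread::id aTargetThread) {
  return std::this_thread::get_id() != aTargetThread;
}

// Hypothetical dispatch hook: sleep briefly, but only for cross-thread
// dispatches. Returns true if a delay was actually applied.
bool MaybeDelayDispatch(std::thread::id aTargetThread,
                        std::chrono::microseconds aDelay) {
  assert(aDelay.count() >= 0);
  if (!ShouldDelayDispatch(aTargetThread)) {
    return false;  // same-thread dispatch: waiting would only add latency
  }
  std::this_thread::sleep_for(aDelay);
  return true;
}
```

The check itself is a single thread-id comparison, so in this sketch it is cheap; whether the equivalent check is as cheap inside the real dispatch code is the open question from the comment.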
So, does this slowdown only reproduce if MOZ_CHAOSMODE=0 or also with the MOZ_CHAOSMODE flags that are now enabled by default?

Those task dispatch/run delays are now enabled by default in ASan Nightly.
Flags: needinfo?(cpeterson)
(In reply to Christian Holler (:decoder) from comment #1)
> So, does this slowdown only reproduce if MOZ_CHAOSMODE=0 or also with the
> MOZ_CHAOSMODE flags that are now enabled by default?

Which chaos mode flags are enabled by default? AFAICT, the default is ChaosFeature None (0x0), not to be confused with environment variable MOZ_CHAOSMODE=0 which is parsed as ChaosFeature Any (-1).

Bug 1486400 added two new ChaosFeature flags:

  // Delay dispatching threads to encourage dispatched tasks to run.
  TaskDispatching = 0x40,
  // Delay task running to encourage sending threads to run.
  TaskRunning = 0x80,


With MOZ_CHAOSMODE unset or 0x3F (all ChaosFeatures except the new TaskDispatching and TaskRunning), Firefox takes about 2 seconds to launch and finish loading about:home on my Windows 10 laptop.

With MOZ_CHAOSMODE=0x40, it takes about 6 seconds.
With MOZ_CHAOSMODE=0x80, it takes about 10 seconds.
With MOZ_CHAOSMODE=0xC0 (TaskDispatching | TaskRunning) or 0 (Any), it takes about 16 seconds.
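
The mapping from environment values to feature sets described in this comment can be illustrated with a small sketch. The flag values for TaskDispatching and TaskRunning are the ones quoted above, and None and Any follow the description in this comment. `ParseChaosModeEnv` and `IsActive` are hypothetical names, and parsing the value as base 16 is an assumption made for this sketch, not a statement about how Firefox actually reads the variable.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Flag values as quoted in this bug; other ChaosFeature flags are omitted.
enum ChaosFeature : uint32_t {
  None = 0x0,
  TaskDispatching = 0x40,
  TaskRunning = 0x80,
  Any = 0xffffffff,  // i.e. -1 as an unsigned 32-bit value
};

// Hypothetical parser: an unset variable means None, a value that parses
// to zero means Any, and anything else is used as a bitmask.
// Assumption: the value is read as hexadecimal (an optional "0x" is accepted).
uint32_t ParseChaosModeEnv(const char* aValue) {
  if (!aValue) {
    return None;
  }
  unsigned long parsed = std::strtoul(aValue, nullptr, 16);
  return parsed ? static_cast<uint32_t>(parsed) : Any;
}

// True if aFeature is part of the enabled set.
bool IsActive(uint32_t aEnabled, ChaosFeature aFeature) {
  assert(aFeature != None);  // asking whether "None" is active is meaningless
  return (aEnabled & aFeature) != 0;
}
```

Under these assumptions, 0x3F leaves both new delays off, 0x40 and 0x80 each enable one, and both 0xC0 and 0 enable the pair, which matches the four timing cases listed above.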
Flags: needinfo?(cpeterson)
Thanks. We enable TaskDispatching, TaskRunning and some others on ASan Nightly only, hence my question.

Nathan, can we maybe lower some of the delays a bit so this gets better? Or can you point me to them so I can make a patch? Not sure what reasonable values would be, but I assumed we would maybe have to fine-tune this further. Both of the reports I have received so far are from Windows; maybe we do something more there.
Flags: needinfo?(nfroyd)
We can't do very much on Windows, because we can't (natively) sleep at finer than millisecond resolution.  If you wanted to adopt the approach that the profiler uses, we could try that:

https://searchfox.org/mozilla-central/source/tools/profiler/core/platform-win32.cpp#251-262

But that's liable to make power usage go up somewhat.
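
One common way to get sub-millisecond waits when the native sleep is limited to whole milliseconds is to sleep for the whole-millisecond part and busy-wait for the remainder. The sketch below shows that general technique in portable C++. It is not a copy of the linked profiler code, `HybridDelay` is a hypothetical name, and the busy-wait is exactly the part that would raise power usage.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: wait for at least aDelay. Whole milliseconds are
// handled by a coarse sleep; the sub-millisecond remainder is handled by
// spinning on a monotonic clock. Returns the time actually waited.
std::chrono::microseconds HybridDelay(std::chrono::microseconds aDelay) {
  using Clock = std::chrono::steady_clock;
  assert(aDelay.count() >= 0);

  const auto start = Clock::now();
  const auto end = start + aDelay;

  // Coarse part: sleep for as many whole milliseconds as possible.
  const auto wholeMs =
      std::chrono::duration_cast<std::chrono::milliseconds>(aDelay);
  if (wholeMs.count() > 0) {
    std::this_thread::sleep_for(wholeMs);
  }

  // Fine part: spin until the deadline. This keeps a core busy, which is
  // the power cost mentioned in the comment.
  while (Clock::now() < end) {
    std::this_thread::yield();
  }

  return std::chrono::duration_cast<std::chrono::microseconds>(Clock::now() -
                                                               start);
}
```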
Flags: needinfo?(nfroyd)
Duplicate of this bug: 1493050
(In reply to Nathan Froyd [:froydnj] from comment #4)
> We can't do very much on Windows, because we can't (natively) sleep at finer
> than millisecond resolution.  If you wanted to adopt the approach that the
> profiler uses, we could try that:
> 
> https://searchfox.org/mozilla-central/source/tools/profiler/core/platform-win32.cpp#251-262
> 
> But that's liable to make power usage go up somewhat.

It seems like overkill to go to such lengths for precision on a value that was randomly-generated anyway. But doing nothing at all also seems too extreme to me. Can we do something in between? How about a Sleep(0)? (For proper code it's a bad API, but this is chaos mode after all!)
Comment on attachment 9010811 [details]
Bug 1492929 - Disable ChaosMode sleep delay on Windows. r?froydnj

Nathan Froyd [:froydnj] has approved the revision.
Attachment #9010811 - Flags: review+
Pushed by choller@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6b44a1ec79c5
Disable ChaosMode sleep delay on Windows. r=froydnj
https://hg.mozilla.org/mozilla-central/rev/6b44a1ec79c5
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
(In reply to David Major [:dmajor] from comment #7)
> (In reply to Nathan Froyd [:froydnj] from comment #4)
> > We can't do very much on Windows, because we can't (natively) sleep at finer
> > than millisecond resolution.  If you wanted to adopt the approach that the
> > profiler uses, we could try that:
> > 
> > https://searchfox.org/mozilla-central/source/tools/profiler/core/platform-win32.cpp#251-262
> > 
> > But that's liable to make power usage go up somewhat.
> 
> It seems like overkill to go to such lengths for precision on a value that
> was randomly-generated anyway. But doing nothing at all also seems too
> extreme to me. Can we do something in between? How about a Sleep(0)? (For
> proper code it's a bad API, but this is chaos mode after all!)

Even with `Sleep(0)` we should probably only randomly call it with a rather low probability. Depending on the system timer interval, we could be waiting up to 15ms if there's another thread waiting.
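
The "low probability" idea above could look something like the following sketch. `MaybeYield` is a hypothetical helper, the probability is left to the caller because no value was proposed in this bug, and `std::this_thread::yield()` is used as a rough portable stand-in for `Sleep(0)` rather than an exact equivalent.

```cpp
#include <cassert>
#include <random>
#include <thread>

// Hypothetical helper: give up the rest of the time slice with probability
// aProbability, otherwise do nothing. Returns true if a yield happened.
template <typename Rng>
bool MaybeYield(Rng& aRng, double aProbability) {
  assert(aProbability >= 0.0 && aProbability <= 1.0);
  std::bernoulli_distribution shouldYield(aProbability);
  if (!shouldYield(aRng)) {
    return false;  // the common case: no delay at all
  }
  // Rough portable stand-in for Sleep(0) on Windows.
  std::this_thread::yield();
  return true;
}
```

Called from the dispatch or run path with a small probability, this would keep some scheduling perturbation while bounding how often a thread can lose a full timer tick.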
Assignee: nobody → choller