Closed Bug 1650629 Opened 5 years ago Closed 5 years ago

Mozilla Firefox Nightly randomly hangs on opening website pages (TaskController busy-loops in background processes while one process is busy)

Categories

(Core :: Performance: General, defect, P2)

80 Branch
x86_64
All
defect

Tracking


VERIFIED FIXED
mozilla80
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox78 --- unaffected
firefox79 --- unaffected
firefox80 --- verified

People

(Reporter: Virtual, Assigned: bas.schouten)

References

(Regression)

Details

(Keywords: nightly-community, regression, reproducible)

Attachments

(2 files)

STR:

  1. Open Mozilla Firefox Nightly
  2. Open Bugzilla website page

The browser then hangs for a while (0.5-3 s).

If it does not reproduce, open several website pages, each in a new tab.

Unfortunately, I cannot capture a profile with the Firefox Profiler because of bug #1650627,
but I will try to find a regression range with mozregression-gui using my old Mozilla Firefox Nightly profile.

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → Gecko Profiler
Product: Firefox → Core
Component: Gecko Profiler → Untriaged
Product: Core → Firefox

Profile - https://share.firefox.dev/2CaOuid

I'm still searching for regression range.

First searches show that probably:
bad - 2020-07-03
good - 2020-07-04

Looks like a really long (~3s) wait on the content process main thread waiting for async paints to flush.

Component: Untriaged → Graphics
Product: Firefox → Core

@markus: Can you see something in the profile that would be causing this?

Putting severity to S2 for now.

Blocks: gfx-triage
Severity: -- → S2
Flags: needinfo?(mstange.moz)
Priority: -- → P2

@virtual_manPL: Can you attach your about:support information?

Flags: needinfo?(Virtual)
Attached file about;support.txt

(In reply to Kris Taeleman (:ktaeleman) from comment #5)

@virtual_manPL: Can you attach your about:support information?

Sure.

Flags: needinfo?(Virtual)

That looks different. Instead of a graphics issue, this looks like about 13s is spent waiting for a socket to access the network.

I agree with Mike, two very different issues.

In the first profile, with the long "flushing async paints", I can see a very busy "Privileged content process" during the same time that seems to be caught spinning in the new task scheduler. Maybe it's starving CPU resources from the paint thread?

In the second profile, the network request to bugzilla is delayed by uBlock Origin, which seems to be waiting for another network request that it started earlier, to "https://hosts-file.net/.%5Cad_servers.txt?_=8". That network request then errors out after a long time. I don't know why the request would take so long, and the URL seems wrong, too. Maybe there's some corrupted state in the IndexedDB database, like there was for profiler.firefox.com .

Flags: needinfo?(mstange.moz)

Moving to performance team as these don't look like Graphics issues.

Component: Graphics → Performance

The first issue is bug 1649976.

I believe I've found the root cause of this issue. I haven't been able to reproduce it myself yet, but I have a patch. I'm still looking for a way to reproduce it so I can verify my fix.

Virtual, are you able to reproduce this reliably? If so, perhaps you could test a build from Bas to see if it helps.

[uBlock Origin] seems to be waiting for another network request that it started earlier, to https://hosts-file.net/.%5Cad_servers.txt?_=8. That network request then errors out after a long time.

That resource has been removed from uBO's stock lists (it has never been enabled by default) a long while ago since the list no longer exists:
https://github.com/uBlockOrigin/uBlock-issues/issues/971

People should definitely remove it from their selection of lists.

(In reply to Mike Conley (:mconley) (:⚙️) from comment #13)

Virtual, are you able to reproduce this reliably? If so, perhaps you could test a build from Bas to see if it helps.

Windows build is here: target.zip (try push)

Flags: needinfo?(Virtual)

So, what's happening here is two-fold:

  1. The TaskController can return its dummy event even when it only has idle tasks, i.e. when it isn't going to do any work. As a result, we can wrongly tell callers that work has been done. This needs fixing.
  2. If mayWait is true and it turns out there are only idle events, we should block inside TaskController waiting for non-idle events, rather than just doing a no-op. This is the behavior as it was before.
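
For readers unfamiliar with the event loop, the two points above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Gecko's implementation: the class `ToyTaskController` and the methods `add_task` and `process_next_event` are hypothetical names invented for this example, and only the `mayWait` flag and the "was work done" return value are taken from the comment above.

```python
import threading
from collections import deque


class ToyTaskController:
    """Hypothetical, simplified model; NOT Gecko's TaskController."""

    def __init__(self):
        self._cond = threading.Condition()
        self._normal = deque()  # non-idle tasks
        self._idle = deque()    # idle tasks (left queued in this toy model)

    def add_task(self, fn, idle=False):
        with self._cond:
            (self._idle if idle else self._normal).append(fn)
            if not idle:
                # Wake a caller blocked in process_next_event(may_wait=True).
                self._cond.notify()

    def process_next_event(self, may_wait):
        """Run at most one non-idle task; return True only if one ran."""
        with self._cond:
            if may_wait:
                # Point 2: block until a non-idle task exists instead of
                # returning immediately, which would make the caller spin.
                while not self._normal:
                    self._cond.wait()
            if not self._normal:
                # Point 1: only idle tasks (or nothing) are queued, so
                # report honestly that no work was done.
                return False
            task = self._normal.popleft()
        task()  # run outside the lock
        return True
```

In this model, a caller looping on `process_next_event(may_wait=True)` sleeps on the condition variable while only idle tasks are queued. The pre-fix behavior described above would correspond to returning `True` without running anything, so the caller would immediately call again and burn CPU.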

(In reply to Markus Stange [:mstange] from comment #9)

[...]
In the second profile, the network request to bugzilla is delayed by uBlock Origin, which seems to be waiting for another network request that it started earlier, to "https://hosts-file.net/.%5Cad_servers.txt?_=8". That network request then errors out after a long time. I don't know why the request would take so long, and the URL seems wrong, too. Maybe there's some corrupted state in the IndexedDB database, like there was for profiler.firefox.com .

Looks like I had some issue with the uBlock Origin extension's database too, as it stayed on the 1.27.11b4 pre-release version for some time. It would not update itself via auto-update, and even downloading the XPI file and installing it manually did not help. I tried again with the extension disabled, and it finally updated to the latest version when I manually installed the previously downloaded XPI file.

I also disabled a few obsolete and outdated filter lists that no longer work, as Markus Stange [:mstange] above and Raymond Hill [:gorhill] below wrote:
(In reply to rhill@raymondhill.net from comment #14)

[uBlock Origin] seems to be waiting for another network request that it started earlier, to https://hosts-file.net/.%5Cad_servers.txt?_=8. That network request then errors out after a long time.

That resource has been removed from uBO's stock lists (it has never been enabled by default) a long while ago since the list no longer exists:
https://github.com/uBlockOrigin/uBlock-issues/issues/971

People should definitely remove it from their selection of lists.

@ Bas Schouten (:bas.schouten) + Mike Conley (:mconley) (:⚙️) + Markus Stange [:mstange] - Awesome! Thank you very much! I will test this try build thoroughly for a few days and report back with results by the end of the weekend. The bug mostly happens randomly, but in my very long browser sessions I get dozens of hangs, so I will definitely see whether it is fixed or not.

Summary: Mozilla Firefox Nightly randomly hangs on opening website pages → Mozilla Firefox Nightly randomly hangs on opening website pages (TaskController busy-loops in background processes while one process is busy)
OS: Windows 7 → All
Assignee: nobody → bas
Status: NEW → ASSIGNED
Regressed by: 1606706
Pushed by bschouten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/63b8480782c8 Correctly wait when requested and report whether work was done by ProcessNextEvent when using TaskController. r=mstange
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla80

Set release status flags based on info from the regressing bug 1606706

Thank you very much for fixing this! \o/

I haven't noticed any more hangs, so I'm confirming that the bug looks fixed in this try build, and also in Mozilla Firefox Nightly starting with 80.0a1 (2020-07-10). I only hope that performance with the patch from this bug combined with the patch from bug #1606706 is the same as or better than it was before both landed.

Has Regression Range: --- → yes
Has STR: --- → yes
Flags: needinfo?(Virtual)
Regressions: 1652684