Closed Bug 1402689 Opened 2 years ago Closed 2 years ago

Firefox 57 regression: infinite loading animation on error pages

Categories

(Core :: DOM: Core & HTML, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 --- unaffected
firefox57 --- wontfix
firefox58 --- fixed

People

(Reporter: soeren.hentzschel, Assigned: freesamael)

References

Details

(Keywords: regression, Whiteboard: [necko-triaged])

Attachments

(2 files)

Attached video regression
[Tracking Requested - why for this release]:

Bug 1376895 introduced a regression on error pages if the WebExtension from Adblock Plus is installed.

STR:

1. Install Adblock Plus WebExtension from development channel (https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/, current version: 2.99.0.1838beta)
2. open about:foo

Expected:

no infinite loading animation in tab

Actual:

see the attached video
I forgot to include the mozregression output:

12:06.38 INFO: Last good revision: dd4bf41ca47d3e3036b194a61af4983c58dd14b2
12:06.38 INFO: First bad revision: 1bc06a4d147320b0976467a9771ec4411dbe66d1
12:06.38 INFO: Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=dd4bf41ca47d3e3036b194a61af4983c58dd14b2&tochange=1bc06a4d147320b0976467a9771ec4411dbe66d1
Hi Gabor,
Can you help take a look at this one?
Flags: needinfo?(gkrizsanits)
Component: General → Tabbed Browser
Hi Andy, this seems like a bad regression. Could you please help find an owner?
Flags: needinfo?(amckay)
Tracked for now
I can't reproduce. A web extension should never be able to interfere with the tabbed browser animation. I'm at a bit of a loss here.
Flags: needinfo?(amckay)
I am still able to reproduce with my working profile and with a new profile on macOS, but only after installing the WebExtension.

I tested further and it's more interesting: the problem still occurs after disabling or uninstalling the WebExtension:

1. about:foo -> no bug
2. install ABP WebExtension
3. about:foo -> bug
4. uninstall ABP WebExtension (I even restarted Firefox)
5. about:foo -> still a bug

I tested with another WebExtension (New Tab Override) and there were no problems. Then I tested with uBlock Origin and the same bug occured.
Sorry, wasn't thinking clearly about comment 5. ABP does intercept incoming web requests and can cause any page to take longer to load, so could cause any tab spinner to take longer. Why it occurs on error pages, I don't know.
Component: Tabbed Browser → WebExtensions: Compatibility
Priority: -- → P5
Product: Firefox → Toolkit
I will get to this within a day or two.
Hi Andy, thanks for looking into this. It was tagged as a P5. Will we get around to fixing a P5 before 57 goes live? I am just wondering whether this is a nice to fix for 57 or a must? Seems like a must to me just based on the severity of the bug and the popularity of the Webext.
Flags: needinfo?(amckay)
It is a popular add-on but the bug doesn't cause performance problems or data loss. Further more the number of people going to about:[some 404] is very, very rare. When they do, they'll like close the page. I've been able to reproduce it on beta now, but I can't reproduce it any other page.
Flags: needinfo?(amckay)
I could not reproduce it on a nightly debug build, and based on Andy's comment I don't think I will work on this.

Bug 1376895 makes sure that for new tabs, if its browser was preloaded, the first time the user leaves the about:newtab page we rerun the process selecting algorithm. I guess whatever API handles the page request interruption should be a bit more careful around this case. If someone wants to fix this issue and knows that code, let me know and I'm happy to provide help to figure something out.
Flags: needinfo?(gkrizsanits)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #11)
> I could not reproduce it on a nightly debug build, and based on Andy's
> comment I don't think I will work on this.
> 
> Bug 1376895 makes sure that for new tabs, if its browser was preloaded, the
> first time the user leaves the about:newtab page we rerun the process
> selecting algorithm. I guess whatever API handles the page request
> interruption should be a bit more careful around this case. If someone wants
> to fix this issue and knows that code, let me know and I'm happy to provide
> help to figure something out.

In case it's helpful: I *think* that the internal browser/webProgress.loadURI() call in the child process throws an error (synchronously) for non-existing about: pages. It doesn't do that in many other cases, which is why we run into edgecases in this space semi-regularly. TBH, I'm tempted to suggest that if we want to do anything here, we should make the internal APIs behave differently and treat it just like a "normal" broken http load where we show an error page, instead of whatever we're doing right now, rather than playing whack-a-mole with all the consumers that aren't prepared to handle the edgecase.
Note, Soren also filed bug 1389784, which reports ghost tabs that don't load content which might be webextension related as well. We have a ni to kmag over there who said he'd look into it.

Andy, not sure what to do here. bug 1389784 is much more serious, if it's the same issue, we might have a blocker here. Can you help me triage these for 57?
Flags: needinfo?(amckay)
Could be, Kris is looking at bug 1389784. A couple of us haven't been able to reproduce that bug either yet. We'll see what we can find.
Flags: needinfo?(amckay)
I can reproduce this reliably without any add-ons installed.

STR:

1) Open a new window
2) visit about:oops
3) ctril-t
4) visit about:oops in the new tab

Note: first tab has proper error icon, second tab displays endless throbber.

I don't think this is extension related.
Component: WebExtensions: Compatibility → Tabbed Browser
Priority: P5 → P2
Product: Toolkit → Firefox
Would appreciate some help triaging this for 57. I personally dont' see it as a blocker, but I'm betting it's a pretty easy bug to fix.
Flags: needinfo?(dao+bmo)
Summary: Firefox 57 regression: infinite loading animation on error pages with Adblock Plus WebExtension installed → Firefox 57 regression: infinite loading animation on error pages
Moving to networking based on comment 12.
Component: Tabbed Browser → Networking
Flags: needinfo?(dao+bmo)
Priority: P2 → --
Product: Firefox → Core
Honza can you take a look?
Flags: needinfo?(honzab.moz)
Not really.  Let's leave this untriaged for now, this is a very low priority thing.  We show the error page, which is good, and there is no big deal (IMO) when the load icon spins.
Flags: needinfo?(honzab.moz)
We should fix this on 58.
Jason, can you find someone?
Flags: needinfo?(jduell.mcbugs)
Priority: -- → P1
Whiteboard: [necko-triaged]
Seeing if someone in Taipei can take this. Looks like comment 15 has the best STR.
Flags: needinfo?(jduell.mcbugs) → needinfo?(swu)
Kershaw can look at this bug.
Assignee: nobody → kechang
Status: NEW → ASSIGNED
Flags: needinfo?(swu)
Note that this bug only happens when e10s is enabled.

I think this is not a networking bug, since all networking code works as expected according to the STR in comment 15. 

The first tab's error icon can show correctly because loadURIWithOptions returns NS_ERROR_MALFORMED_URI at [1]. At the second time, loadURIWithOptions returns NS_OK instead. So, I dug into nsDocShell::LoadURIWithOptions to figure out the difference between the first and second load. The first load of "about:oops" failed in DoURILoad at [2] due to NS_ERROR_MALFORMED_URI returned by NS_NewChannelInternal. For the second load, the doc shell returned NS_OK at [3], which is before calling DoURILoad. It looks like that E10SUtils.shouldLoadURI [4] decide not to load "about:oops" in this process, and lastly, "about:oops" will be loaded again by [5].

Tentatively move the component to DOM and ni :smaug to take a look. Thanks.

[1] https://searchfox.org/mozilla-central/rev/d30462037ffea383e74c42542c820cf65b2b144e/toolkit/content/browser-child.js#359
[2] https://searchfox.org/mozilla-central/rev/d30462037ffea383e74c42542c820cf65b2b144e/docshell/base/nsDocShell.cpp#11219
[3] https://searchfox.org/mozilla-central/rev/d30462037ffea383e74c42542c820cf65b2b144e/docshell/base/nsDocShell.cpp#10715
[4] https://searchfox.org/mozilla-central/rev/d30462037ffea383e74c42542c820cf65b2b144e/browser/base/content/tab-content.js#704
[5] https://searchfox.org/mozilla-central/rev/d30462037ffea383e74c42542c820cf65b2b144e/browser/modules/E10SUtils.jsm#262
Component: Networking → DOM
Flags: needinfo?(bugs)
I don't think I have time to look at this any time soon, and it wasn't my bug which regressed this.
Should we backout the regressing bug?

Samael, perhaps you have some time?
Flags: needinfo?(sawang)
Flags: needinfo?(gkrizsanits)
Flags: needinfo?(bugs)
Thanks for Kershaw's analysis on comment 24.
Assignee: kechang → nobody
Status: ASSIGNED → NEW
(In reply to Olli Pettay [:smaug] from comment #25)
> I don't think I have time to look at this any time soon, and it wasn't my
> bug which regressed this.
> Should we backout the regressing bug?

Backing it out is not really an option. That would be quite a deal breaker for activity stream and would make a pretty massive performance regression.

Comment 24 is correct, and the next step is that the parent process decided if it's needed to be loaded in the parent or in a child process in the later case it creates a new browser for the tab and loads it there. And this is all expected behavior, what I don't know is who/what controls the icon. If anyone knows where can I find that code and maybe briefly explain me how it works I'm happy to figure out what's going on and find a fix for this issue.
Flags: needinfo?(gkrizsanits)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #27)
> anyone knows where can I find that code and maybe briefly explain me how it
> works I'm happy to figure out what's going on and find a fix for this issue.

http://searchfox.org/mozilla-central/rev/dd47bee6468de7e1221b4d006342ad6b9813d0e5/browser/base/content/tabbrowser.xml#757

http://searchfox.org/mozilla-central/rev/dd47bee6468de7e1221b4d006342ad6b9813d0e5/browser/base/content/tabbrowser.xml#770
I found that the busy attribute was set by session store [1], as LoadInOtherProcess eventually invokes SessionStore.navigateAndRestore, and it seems the loading of about:oops and the subsequent error page about:neterror wouldn't cause any state change with STATE_IS_NETWORK, so the busy attribute never gets cleared (the tab progress listener in tabbrowser only clears busy attribute when it receives state change with `STATE_IS_NETWORK & STATE_STOP`).

It did cause onLocationChange with LOCATION_CHANGE_ERROR_PAGE, so probably we can make the tab progress listener clear busy attribute on LOCATION_CHANGE_ERROR_PAGE? I assume :gabor hasn't been working on it yet so I'll provide a patch.

BTW, it seems we're applying nsBrowserStatusFilter on both the parent and child sides, and the filter in OnStateChange [2] is potentially problemtatic in this case. I'm noticing that the child often sends `STATE_STOP & STATE_IS_REQUEST` without a corresponding STATE_START being sent first, which makes mFinishedRequests exceed mTotalRequests on parent side.

[1] http://searchfox.org/mozilla-central/rev/40e8eb46609dcb8780764774ec550afff1eed3a5/browser/components/sessionstore/SessionStore.jsm#2954
[2] http://searchfox.org/mozilla-central/rev/21363323fd4aa21db074c808fb5358a46df6d698/toolkit/components/statusfilter/nsBrowserStatusFilter.cpp#187-190
Assignee: nobody → sawang
Flags: needinfo?(sawang)
(In reply to Samael Wang [:freesamael] from comment #29)
> I found that the busy attribute was set by session store [1], as
> LoadInOtherProcess eventually invokes SessionStore.navigateAndRestore, and
> it seems the loading of about:oops and the subsequent error page
> about:neterror wouldn't cause any state change with STATE_IS_NETWORK, so the
> busy attribute never gets cleared (the tab progress listener in tabbrowser
> only clears busy attribute when it receives state change with
> `STATE_IS_NETWORK & STATE_STOP`).
> 
> It did cause onLocationChange with LOCATION_CHANGE_ERROR_PAGE, so probably
> we can make the tab progress listener clear busy attribute on
> LOCATION_CHANGE_ERROR_PAGE? I assume :gabor hasn't been working on it yet so
> I'll provide a patch.

I looked into it a bit as well and that's how far I've got... My plan was to just
fake a state change event, but your plan might be even better. Thanks for taking it
over!
Duplicate of this bug: 1401323
Comment on attachment 8923310 [details]
Bug 1402689 - Clear tab's "busy" attribute on LOCATION_CHANGE_ERROR_PAGE.

https://reviewboard.mozilla.org/r/194506/#review200410
Attachment #8923310 - Flags: review?(dao+bmo) → review+
Pushed by dgottwald@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/312722e5da38
Clear tab's "busy" attribute on LOCATION_CHANGE_ERROR_PAGE. r=dao
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/312722e5da38
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
(In reply to Samael Wang [:freesamael] from comment #29)
> BTW, it seems we're applying nsBrowserStatusFilter on both the parent and
> child sides, and the filter in OnStateChange [2] is potentially problemtatic
> in this case. I'm noticing that the child often sends `STATE_STOP &
> STATE_IS_REQUEST` without a corresponding STATE_START being sent first,
> which makes mFinishedRequests exceed mTotalRequests on parent side.

Should there be a followup issue to address this?
Flags: needinfo?(sawang)
Filed bug 1414745
Flags: needinfo?(sawang)
See Also: → 1418920
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.