Permanent comm/mail/test/browser/message-header/browser_messageHeader.js | waiting for vsync to be disabled - timed out after 50 tries
Categories
(Thunderbird :: Upstream Synchronization, defect)
Tracking
(thunderbird_esr102 unaffected, thunderbird104 unaffected)
| Tracking | Status | |
|---|---|---|
| thunderbird_esr102 | --- | unaffected |
| thunderbird104 | --- | unaffected |
People
(Reporter: darktrojan, Assigned: florian)
References
(Regression)
Details
(Keywords: intermittent-failure, regression, Whiteboard: [stockwell disable-recommended])
Attachments
(4 files)
This use of the accessibility service causes vsync to be enabled forever and the test fails at the check added by bug 1742842.
| Assignee | ||
Comment 1•3 years ago
|
||
This reminds me bug 1745024.
Somehow a refresh observer is added, and never removed.
I added a profiler marker to see stacks for all refresh observer additions (I'm attaching the patch that adds this marker, in case you would like to use it).
Then using ./mach test comm/mail/test/browser/message-header/browser_messageHeader.js --profiler I got this profile of the test: https://share.firefox.dev/3QcdZQW
Filtering markers for "accessibility,test" (https://share.firefox.dev/3blTxyj) I could find that there's one RefreshObserverStart marker that doesn't match the beginning of a RefreshObserver marker, so that's a RefreshObserver that isn't removed before the end of the profile.
From the Marker Table we can see the full stack: https://share.firefox.dev/3cSJ7X3
The RefreshObserver addition was triggered by an Element.setAttribute call from set headerValue chrome://messenger/content/header-fields.js:631, which points to https://searchfox.org/comm-central/rev/a35c27ab8f68ca38fa1584932e5e9f5dfecfe002/mail/base/content/widgets/header-fields.js#643-646
That's as far as I could go with a quick inspection of the test and its behavior in the profiler. I think this points to a real bug in Thunderbird that wastes power (hopefully only for users who have the accessibility service initialized).
| Reporter | ||
Comment 2•3 years ago
|
||
I'm well out of my depth, but I don't think the setAttribute call is (directly) to blame. If you take it away the problem remains. In fact some crude debugging showed several instances of mozilla::a11y::NotificationController that called AddRefreshObserver more times than RemoveRefreshObserver. (That is the real problem, right?)
I don't know anything about the a11y code so I'm stuck for where to go from here.
| Comment hidden (Intermittent Failures Robot) |
| Assignee | ||
Comment 4•3 years ago
|
||
(In reply to Geoff Lankow (:darktrojan) from comment #2)
I'm well out of my depth, but I don't think the
setAttributecall is (directly) to blame.
My understanding is that this setAttribute call is where the problem starts, but is not the problem. There's likely something related to that document (or how it's loaded, or unloaded) that makes the accessibility code forget to unregister its refresh observer.
If you take it away the problem remains. In fact some crude debugging showed several instances of
mozilla::a11y::NotificationControllerthat calledAddRefreshObservermore times thanRemoveRefreshObserver. (That is the real problem, right?)
I think it is, yes.
I don't know anything about the a11y code so I'm stuck for where to go from here.
I think your next step will be to ask for help from people who know the accessibility code. Jamie, do you have suggestions for how Geoff could continue his investigations here, or guesses about what might be happening?
Comment 5•3 years ago
|
||
If the a11y refresh observer isn't getting removed, that suggests a11y thinks there is something pending it needs to wait for. I'd start by checking whether any of the various early returns in a11y::NotificationController::WillRefresh are being hit. That should tell you why the refresh observer isn't being removed (it gets removed at the bottom of that function). That might lead to an obvious answer... or it might not, but even if it doesn't, knowing which early return is being hit will help me figure out how to direct you next.
| Reporter | ||
Comment 6•3 years ago
|
||
That makes sense.
It's not returning early, it's falling out the bottom of the function without calling RemoveRefreshObserver, because mDocument->HasLoadState(DocAccessible::eCompletelyLoaded) is false. I'm assuming that the not-completely-loaded document isn't messenger.xhtml (mDocument) but one of the subdocuments.
| Reporter | ||
Comment 7•3 years ago
|
||
I enabled lifecycle logging in the test (with as much removed as possible) and this is what it said. As far as I can make out, messenger.xhtml is loaded completely, then starts loading again.
| Comment hidden (Intermittent Failures Robot) |
Comment 9•3 years ago
|
||
Thanks for the log. Unfortunately, I can't see why this might be failing from this.
There are a couple of things I'd suggest trying:
- I'm guessing this log is missing some of the "document initialize" snippets. I'd look for "document initialize" snippets with a DOM document address that don't have corresponding "
document loaded *completely*" snippets with the same DOM document address. That should tell you which document never gets the complete state. - Add an
elseblock at the bottom of the finalifcondition in a11y::NotificationController::WillRefresh. Inside that, print the URI of the document so you can see which one is never loading. This will spam you with output, but if you get an entry which repeats forever, that should tell you which document it is. Something like this:
printf("a11y didn't remove refresh, doc %p, URI %s\n", mDocument, mDocument->DocumentNode()->GetDocumentURI()->GetSpecOrDefault().get());
| Comment hidden (Intermittent Failures Robot) |
| Reporter | ||
Comment 11•3 years ago
|
||
Here's a complete log of the test instead of just the a11y bits. This time I turned the logging on right at the start of the program instead of just in the test, and left in a bit of printf debugging too. What happens between 0:03.03 TEST_START and 0:03.07 INFO Entering setup bound seems weird to me.
| Comment hidden (Intermittent Failures Robot) |
Comment 13•3 years ago
|
||
Urgh. It also looks like there's some logging missing, which I don't understand.
0:03.06 GECKO(3601475) a11y added refresh, doc 0x7f25562ab700, URI chrome://messenger/content/messenger.xhtml
But I don't see any entry in the log for 0x7f25562ab700 before that point. There should be a A11Y DOCCREATE: document initialize at the very least. And then there's this:
0:03.07 GECKO(3601475) module enabled: doclifecycle
I'm not sure why a11y logging would get reinitialized there. That, plus the destroy for messenger.xhtml above, makes me wonder whether the a11y engine is being shut down and then restarted for some reason... but I don't understand how or why that would happen, especially if it's enabled due to client detection or force enabling at startup, which means it can't be shut down.
I'm thinking it might be useful to printf the child documents that aren't ready along with their URIs so we can see which doc is holding us up. Probably the easiest place to do this (or copy) is here. You can print the URI the same way as for mDocument.
| Reporter | ||
Comment 14•3 years ago
|
||
I'm not sure why a11y logging would get reinitialized there.
I restarted it. But why did it shut down in the first place?
I'm thinking it might be useful to printf the child documents that aren't ready along with their URIs so we can see which doc is holding us up. Probably the easiest place to do this (or copy) is here. You can print the URI the same way as for mDocument.
Okay I'll have a look when I get a moment.
| Comment hidden (Intermittent Failures Robot) |
Comment 16•3 years ago
|
||
(In reply to Geoff Lankow (:darktrojan) from comment #14)
I restarted it. But why did it shut down in the first place?
How are you starting the a11y engine at Firefox startup? Do you have it force enabled via pref or does it just get enabled for you?
How are you starting a11y logging: via the environment variable or via JS?
How did you restart logging?
If the only consumer of the accessibility service was XPCOM and all XPCOM references were released, that would cause the service to shut down. However, I'm not sure that could happen in your case because something seems to have started the engine at startup.
| Assignee | ||
Comment 17•3 years ago
|
||
Updated•3 years ago
|
| Assignee | ||
Comment 18•3 years ago
|
||
(In reply to James Teh [:Jamie] from comment #16)
If the only consumer of the accessibility service was XPCOM and all XPCOM references were released, that would cause the service to shut down.
Ahah! A reference to the service is kept in a global variable set at the top level:
const gAccService = Cc["@mozilla.org/accessibilityService;1"].getService(
Ci.nsIAccessibilityService
);
Dropping that reference at the end of the test makes the test finish cleanly. Thanks a lot for your help Jamie!
Updated•3 years ago
|
Comment 19•3 years ago
|
||
Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/8d55155cbc33
Release the reference to the accessibility service at the end of the test to unobserve vsync. r=mkmelin
Comment 20•3 years ago
|
||
(In reply to Florian Quèze [:florian] (PTO until August 29) from comment #18)
Ahah! A reference to the service is kept in a global variable set at the top level:
Dropping that reference at the end of the test makes the test finish cleanly.
Note that this doesn't fix the real problem here; it wallpapers over it. The a11y engine shouldn't keep VSync enabled while the engine is running unless there are pending updates and there really shouldn't be pending updates forever. If you ran this test on a system where a11y is always enabled and can't be shut down (e.g. because the user was using an assistive technology), we'd probably still see the VSync problem in this test.
That said, given that this is currently only something we're concerned about in this test, it's probably not worth pursuing this further.
| Reporter | ||
Comment 21•3 years ago
|
||
(In reply to James Teh [:Jamie] from comment #16)
(In reply to Geoff Lankow (:darktrojan) from comment #14)
I restarted it. But why did it shut down in the first place?
How are you starting the a11y engine at Firefox startup? Do you have it force enabled via pref or does it just get enabled for you?
How are you starting a11y logging: via the environment variable or via JS?
How did you restart logging?If the only consumer of the accessibility service was XPCOM and all XPCOM references were released, that would cause the service to shut down. However, I'm not sure that could happen in your case because something seems to have started the engine at startup.
I added code to start the logging very early in the program. All references being released would explain why it shut down, I think. I started the logging again when it got to the test code. So nothing to worry about there.
(In reply to James Teh [:Jamie] from comment #20)
Note that this doesn't fix the real problem here; it wallpapers over it. The a11y engine shouldn't keep VSync enabled while the engine is running unless there are pending updates and there really shouldn't be pending updates forever.
Correct, there's still a problem here. Probably worth another bug but for now I'll add what I've worked out:
- The document in question is the main window, messenger.xhtml.
- It loses the
eDOMLoadedflag here when an email message begins loading. - The only place (that I could find) where it could get
eDOMLoadedback isDocAccessible::ProcessLoad, but we never get there because this block won't be entered withouteDOMLoaded.
Comment 22•3 years ago
•
|
||
(In reply to Geoff Lankow (:darktrojan) from comment #21)
- The document in question is the main window, messenger.xhtml.
- It loses the
eDOMLoadedflag here when an email message begins loading.
That's intriguing. Does the email message replace messenger.xhtml or is it loaded as a browser/iframe? Normally, a document should only get the loading state if it is reloading or if another document is about to replace it. And if another document replaces it, we should shut the old document down.
- The only place (that I could find) where it could get
eDOMLoadedback isDocAccessible::ProcessLoad, but we never get there because this block won't be entered withouteDOMLoaded.
If loading completes or stops, we will also set eDOMLoaded in mozilla::a11y::DocAccessible::NotifyOfLoad, triggered when we get nsIWebProgressListener::OnStateChange with STATE_STOP. It seems like this isn't happening.
| Reporter | ||
Comment 23•3 years ago
|
||
(In reply to James Teh [:Jamie] from comment #22)
That's intriguing. Does the email message replace messenger.xhtml or is it loaded as a browser/iframe?
It's in a (parent process) content browser. Everything seems to happen normally there, AFAICT from the logging.
If loading completes or stops, we will also set eDOMLoaded in mozilla::a11y::DocAccessible::NotifyOfLoad, triggered when we get nsIWebProgressListener::OnStateChange with STATE_STOP. It seems like this isn't happening.
Ah yeah, I knew about that but forgot. It happens for the inner document but not messenger.xhtml.
| Comment hidden (Intermittent Failures Robot) |
Description
•