Closed Bug 1425932 Opened 2 years ago Closed 2 years ago

Intermittent accessible/tests/browser/tree/browser_aria_owns.js | Test timed out -

Categories

(Core :: Disability Access APIs, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox-esr52 --- unaffected
firefox57 --- unaffected
firefox58 --- unaffected
firefox59 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: samuel.thibault)

References

Details

(Keywords: intermittent-failure, regression, Whiteboard: [stockwell fixed:product])

Attachments

(1 file, 4 obsolete files)

Filed by: ncsoregi [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=152147403&repo=mozilla-inbound

https://queue.taskcluster.net/v1/task/NC-61BYUS4OiqjSe3i6HBA/runs/0/artifacts/public/logs/live_backing.log

INFO -  Buffered messages finished
3239
18:39:11    ERROR -  152 INFO TEST-UNEXPECTED-FAIL | accessible/tests/browser/tree/browser_aria_owns.js | Test timed out -
3240
18:39:11     INFO -  153 INFO TEST-FAIL | accessible/tests/browser/tree/browser_aria_owns.js | Assertion count 2 is greater than expected range 0-0 assertions. -
3241
18:39:11     INFO -  GECKO(3360) | MEMORY STAT | vsize 850MB | vsizeMaxContiguous 442MB | residentFast 270MB | heapAllocated 81MB
This actually fails more than 50% of the time, non-e10s (which is Win7-debug only only because that's the only place we run non-e10s tests), and is caused by the patch for bug 1260598.
Blocks: 1260598
Keywords: regression
Priority: P5 → --
Attached patch patch (obsolete) — Splinter Review
The test is about event ordering within the document itself, so I guess in the end it may be safer to use the condition I introduced in an earlier version of the patch, to avoid too many side effects while still fixing the original issue. The attached patch should be doing it, it would be useful to run the test with it (I'm not sure I will be able to run tests on such recent version of firefox here).
Flags: needinfo?(surkov.alexander)
I managed to run the test, the attached patch doesn't fix the issue.
Yura, any speculations what may happen here? How tweaking the event ordering between different documents may affect on this particular test? Should we enable extending logging for more data?
Flags: needinfo?(surkov.alexander) → needinfo?(yzenevich)
Attached patch patch (obsolete) — Splinter Review
The attached patch, however, does fix the issue. It actually goes back to my original attempt: making the WillRefresh() call of the child do the call for the parent first.
Attachment #8937674 - Attachment is obsolete: true
It looks like another case of missing calling WillRefresh from the timer for some queuing oddity reason: during the test, the document only gets WillRefresh called once, even if it returned without processing anything. Perhaps we are missing requeuing a trigger for the timer which makes the WillRefresh calls?
I mean: when WillRefresh returns without doing anything (for whatever reason), it doesn't call RemoveRefreshObserver, so the NotificationController is still in the RefreshDriver array, but EnsureTimerStarted() is not called either. And indeed, while I see the RefreshDriver for chrome running Tick, that does not happen for the RefreshDriver for the document. So I guess in that precise case we can't afford missing the refresh opportunity, and calling WillRefresh for the parent is the right thing to do? We don't see regressions any more with that approach (thanks to the fix about the mContentInsertions fix), I'm currently running the FF 55 testsuite with it (I don't have the toolchain working for FF 57 testsuite)
Attached patch fix (obsolete) — Splinter Review
Here is a properly formated patch which fixes this issue. We are not seeing regressions in our user conditions, and the FF 55 testsuite does not show regressions.
Attachment #8937701 - Attachment is obsolete: true
Attachment #8937813 - Flags: review?(surkov.alexander)
(In reply to Samuel Thibault from comment #7)
> I mean: when WillRefresh returns without doing anything (for whatever
> reason), it doesn't call RemoveRefreshObserver, so the
> NotificationController is still in the RefreshDriver array, but
> EnsureTimerStarted() is not called either.

I miss EnsureTimerStarted part, more details please?

> And indeed, while I see the
> RefreshDriver for chrome running Tick, that does not happen for the
> RefreshDriver for the document.

what do you mean by chrome and document? It is not a document for browser UI and document for a tab, right it is something else?

> So I guess in that precise case we can't
> afford missing the refresh opportunity, and calling WillRefresh for the
> parent is the right thing to do?

I think you're right. If there are no layout changes then refresh driver never triggers, so if we reject WillRefresh for some reason, we may be not called again until user makes something like mouse move.

> We don't see regressions any more with that
> approach (thanks to the fix about the mContentInsertions fix), I'm currently
> running the FF 55 testsuite with it (I don't have the toolchain working for
> FF 57 testsuite)

True. However I'm scared about while() thing from the current patch, since I'm not confident that we are guaranteed that it will never fall into endless loop under some crazy circumstances.

I guess it could work as a hot fix, but I'd really wish if we had one WillRefresh() that walked from top level parent through all sub documents. That should be a more straightforward logic, easier for understanding. Would you be interested to take a shot?
Flags: needinfo?(yzenevich)
Comment on attachment 8937813 [details] [diff] [review]
fix

Review of attachment 8937813 [details] [diff] [review]:
-----------------------------------------------------------------

::: accessible/base/NotificationController.cpp
@@ +638,1 @@
>    }

If we force the parent's update, then I'm not sure how WaitingForParent() can return false. If it can, then I suppose that we may block the thread for forever, no?
Attached patch fix (obsolete) — Splinter Review
EnsureTimerStarted is called by AddRefreshObserver, and the name suggests to me that it's something that ensures that there will be a caller of Tick(). I however tried to call RemoveRefreshObserver+AddRefreshObserver before returning, but that didn't help.
  
  By "chrome" and "document", I mean the DocAccessible objects with NativeRole() == roles::CHROME_WINDOW and roles::DOCUMENT.
  
  About the while loop, it seems we can replace it with an if, and bug 1260598 still be fixed, here is the updated patch.

  About going through all sub documents, do we really want this? It seems expensive to look through the whole tree on each WillRefresh. Or is the tree of subdocuments not too big even on very complex webpages?
Attachment #8937813 - Attachment is obsolete: true
Attachment #8937813 - Flags: review?(surkov.alexander)
Attachment #8938005 - Flags: review?(surkov.alexander)
(In reply to Samuel Thibault from comment #12)

>   About the while loop, it seems we can replace it with an if, and bug
> 1260598 still be fixed, here is the updated patch.

yep, it looks better

>   About going through all sub documents, do we really want this? It seems
> expensive to look through the whole tree on each WillRefresh. Or is the tree
> of subdocuments not too big even on very complex webpages?

There's no many subdocuments, I don't think that a typical web page has more than few of them, I guess it could reach dozen, but I'm skeptic it can go higher.

Anyway, it's not really important because each document has own WillRefresh now and they all are triggered all together by refresh driver. The suggestion is to have one WillRefresh and traverse all documents from top to bottom. So the amount of document updates is not changed in this case, but it will get rid of problem of random order document updates - the issue you fight with.
Attachment #8938005 - Flags: review?(surkov.alexander) → review+
Not all documents register for WillRefresh all the time, they only do so when there is something queued and ScheduleProcessing is called. If there aren't so many documents it shouldn't harm too much to traverse the whole tree. On websites with some ads etc., I could usually see like 20 documents.

However, on e.g. leboncoin.fr (very famous in France), I'm getting dozens of documents, it seems to actually increase while browsing, easily reaching hundreds of documents and counting :/ On the other hand WillRefresh calls don't increase accordingly, it remains at the same load level, meaning only a constant few of those documents request ScheduleProcessing. Browsing the whole documents tree would however make the browsing more and more sluggish.

BTW, do I need to resend the small patch with the r=surkov tag for it to be commited in the meanwhile?
(In reply to Samuel Thibault from comment #15)
> Not all documents register for WillRefresh all the time, they only do so
> when there is something queued and ScheduleProcessing is called. 

you have a point

If there
> aren't so many documents it shouldn't harm too much to traverse the whole
> tree. On websites with some ads etc., I could usually see like 20 documents.

> However, on e.g. leboncoin.fr (very famous in France), I'm getting dozens of
> documents, it seems to actually increase while browsing, easily reaching
> hundreds of documents and counting :/ On the other hand WillRefresh calls
> don't increase accordingly, it remains at the same load level, meaning only
> a constant few of those documents request ScheduleProcessing. Browsing the
> whole documents tree would however make the browsing more and more sluggish.

I'd be also interesting to know how those documents are static, i.e. how often they are updated, and how long the refresh queue for this web site.

Anyway, I would be surprised if one hundred long list traversal made the browser any slower, but it doesn't sound optimal for sure.

> BTW, do I need to resend the small patch with the r=surkov tag for it to be
> commited in the meanwhile?

Sure.

It makes sense to file a bug to explore from top to down tree traversal idea, and list there your concerns.
Attached patch fixSplinter Review
> > BTW, do I need to resend the small patch with the r=surkov tag for it to be
> > commited in the meanwhile?
> 
> Sure.

Ok, I wasn't sure because https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/How_to_Submit_a_Patch seemed to imply that the committer would add the tag anyway.

Here is the updated short fix.
Attachment #8938005 - Attachment is obsolete: true
Attachment #8938379 - Flags: review?(surkov.alexander)
(In reply to Samuel Thibault from comment #17)
> Created attachment 8938379 [details] [diff] [review]
> fix
> 
> > > BTW, do I need to resend the small patch with the r=surkov tag for it to be
> > > commited in the meanwhile?
> > 
> > Sure.
> 
> Ok, I wasn't sure because
> https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/
> How_to_Submit_a_Patch seemed to imply that the committer would add the tag
> anyway.

hm, apparently my r+ didn't work out properly, will fix it.
Attachment #8938379 - Flags: review?(surkov.alexander) → review+
AIUI, review+ is not enough, and somebody also needs to set checkin-needed for a committer to have a look.
(FI, I submitted the tree traversal TODO to bug 1426687)
Assignee: nobody → samuel.thibault
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/09e57c36e8da
Explicitly process chrome events before processing document events. r=surkov
Keywords: checkin-needed
Whiteboard: [stockwell fixed:product]
https://hg.mozilla.org/mozilla-central/rev/09e57c36e8da
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Depends on: 1426868
You need to log in before you can comment on or make changes to this bug.