Closed
Bug 1283947
Opened 8 years ago
Closed 7 years ago
Intermittent docshell/test/navigation/test_sessionhistory.html | Test timed out.
Categories
(Core :: DOM: Navigation, defect, P3)
Core
DOM: Navigation
Tracking
()
RESOLVED
FIXED
mozilla55
People
(Reporter: intermittent-bug-filer, Assigned: freesamael)
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell fixed])
Attachments
(3 files, 5 obsolete files)
6.28 KB,
patch
|
mattwoodrow
:
review+
|
Details | Diff | Splinter Review |
6.28 KB,
patch
|
gchang
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
6.21 KB,
patch
|
lizzard
:
approval-mozilla-beta-
|
Details | Diff | Splinter Review |
Filed by: philringnalda@gmail.com https://treeherder.mozilla.org/logviewer.html#?job_id=31085907&repo=mozilla-inbound http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-linux64-asan/1467395420/mozilla-inbound_ubuntu64-asan_vm_test-mochitest-e10s-1-bm67-tests1-linux64-build539.txt.gz
Comment 1•8 years ago
|
||
This is easily hit on a Linux64 debug Try push with --disable-optimize, FWIW.
Comment 2•8 years ago
|
||
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 22•7 years ago
|
||
:smaug - This test has been failing intermittently for a long time. Can you make it more reliable? Or would you consider skipping it on some platforms? It looks like failures are all linux and all e10s, mostly debug.
Flags: needinfo?(bugs)
Comment 23•7 years ago
|
||
Guess fix: since both hashchange event firing and animation frame callback firing are async, but don't depend on each others, try to enforce some particular order. This is the place where the test timeouts.
Assignee | ||
Comment 24•7 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #23) > Created attachment 8830202 [details] [diff] [review] > guess fix > > Guess fix: since both hashchange event firing and animation frame callback > firing are async, but don't depend on each others, try to enforce some > particular order. > This is the place where the test timeouts. I rr recorded a successful run and found in that recording nsGlobalWindow::FireHashchange also happens after nsIDocument::TakeFrameRequestCallbacks, so maybe the sequence doesn't really matter, I guess. One thing interesting is that in all failure logs of comment 21 it reads > INFO - Error: Unable to restore focus, expect failures and timeouts. Could it be requestAnimationFrame taking too much time due to the whole browser window is somehow in the background?
Flags: needinfo?(bugs)
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 27•7 years ago
|
||
Comment on attachment 8830202 [details] [diff] [review] guess fix Review of attachment 8830202 [details] [diff] [review]: ----------------------------------------------------------------- I can reproduce the timeout with mochitest --repeat 200 on my local Linux machine. I tried to reproduce the failure multiple times. With or without the patch, it always times out at around loop 100 ~ 150, so I think this patch doesn't fix the problem. (but I didn't try to reproduce it on try server) Some observations here: 1. It does seem to be easier to reproduce the issue when compiling with --disable-optimize. If not building with --disable-optimize it can run up to 150 ~ 200 loops on my machine before first timeout happens. 2. The visible symptom is that the browser window of file_scrollRestoration.html becomes blank, nothing shows in that window. Resizing the window before test timeout fixes it -- the window recovers to show the page content and test continues. 3. I rr recorded the timeout with & without the patch. In both cases, there were some pending transactions (nsRefreshDriver::mPendingTransaction was 2 or 3 and nsRefreshDriver::mWaitingForTransaction was true) that never finish, so nsRefreshDriver::Tick always gets early return at [1] and the frame request callback never gets invoked. Through without sufficient knowledge I couldn't figure out what the root cause is... BTW is there a faster way to rr replay to a given event? Replaying to the failure on my machine took 2.5 hrs... [1] http://searchfox.org/mozilla-central/rev/e95e4ed8b5229a29dacc32c0b90968df5e7a6ff3/layout/base/nsRefreshDriver.cpp#1742
Attachment #8830202 -
Flags: review?(sawang) → review-
Comment 28•7 years ago
|
||
Thanks for testing this, since I haven't managed to reproduce. Sounds like the issue is somewhere in graphics.
Assignee: bugs → nobody
Comment 29•7 years ago
|
||
matt, does comment 27 ring any bells? Why would we waiting for DidComposite forever?
Flags: needinfo?(matt.woodrow)
Comment 30•7 years ago
|
||
Not really, I guess the compositor didn't send back the notification for some reason. The rr recording is probably the best way to track this down. Normally ShadowLayerForwarder::EndTransaction calls SendUpdate(NoSwap) with aScheduleComposite=true. That calls through to LayerTransactionParent::RecvUpdate -> CrossProcessCompositorBridgeParent::ShadowLayersUpdated (for the content process) -> CompositorBridgeParent::NotifyShadowTreeTransaction and then that calls ScheduleComposition. That should result in a call to CompositorBridgeParent::CompositeToTarget and the DidComposite message being sent. I'd suggest getting the rr replay to the nsRefreshDriver::GetTransactionId() call that increments mPendingTransaction to the value that it's stuck at, and then following through the code I described above to see which piece isn't happening.
Flags: needinfo?(matt.woodrow)
Assignee | ||
Comment 31•7 years ago
|
||
I may have found the reason. It seems the problematic sequence was 1. The nsRefreshDriver(1) of "file_scrollRestoration.html" ticks, triggers SendUpdate(), and starts waiting for the transaction. 2. Test script sets window.location to "data:text/html,", causing nsRefreshDriver(1)::Freeze() as it's moving to bfcache. 3. The other nsRefreshDriver(2) of "data:text/html," becomes active. ClientLayerManager::mTransactionIdAllocator sets to nsRefreshDriver(2). 4. Chrome process notifies DidComposite, eventually calls nsRefreshDriver(2)::NotifyTransactionCompleted(). 5. Test script performs history.back(). nsRefreshDriver(2) is frozen and nsRefreshDriver(1) becomes active again. 6. Test script registers a callback through requestAnimationFrame(). However the transaction nsRefreshDriver(1) is waiting has been delivered to nsRefreshDriver(2), so it waits forever and never invokes frame callbacks.
Assignee | ||
Comment 32•7 years ago
|
||
Assignee | ||
Comment 33•7 years ago
|
||
Comment on attachment 8833256 [details] [diff] [review] v1 Hi Matt, I'm super unfamiliar with gfx code. Does this look the right way to solve the problem?
Attachment #8833256 -
Flags: review?(matt.woodrow)
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → sawang
Comment hidden (Intermittent Failures Robot) |
Comment 35•7 years ago
|
||
Comment on attachment 8833256 [details] [diff] [review] v1 Review of attachment 8833256 [details] [diff] [review]: ----------------------------------------------------------------- Can we instead clear the waiting for transaction state (set mCompletedTransaction to mPendingTransaction, and set mWaitingForTransaction to false) when we assign a different ClientLayerManager to the refresh driver?
Attachment #8833256 -
Flags: review?(matt.woodrow)
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Updated•7 years ago
|
Attachment #8830202 -
Attachment is obsolete: true
Assignee | ||
Updated•7 years ago
|
Attachment #8833256 -
Attachment is obsolete: true
Assignee | ||
Comment 37•7 years ago
|
||
Update to address review comment.
Assignee | ||
Comment 38•7 years ago
|
||
Comment on attachment 8836579 [details] [diff] [review] v2 Hi Matt, Would you take another review?
Attachment #8836579 -
Flags: review?(matt.woodrow)
Updated•7 years ago
|
Attachment #8836579 -
Flags: review?(matt.woodrow) → review+
Assignee | ||
Updated•7 years ago
|
Keywords: checkin-needed
Comment 39•7 years ago
|
||
Pushed by ihsiao@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/69bc1e04bb2c Clear pending transactions when ClientLayerManager is assigning to a different refresh driver. r=mattwoodrow
Keywords: checkin-needed
Comment 40•7 years ago
|
||
Backed out for build bustage: https://hg.mozilla.org/integration/mozilla-inbound/rev/79cea5a0238246a9dfe4ed8cc57161b09376998d Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=4c483e15fb91e4aaa6fb99e548d48516de194a78 Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=77520366&repo=mozilla-inbound [task 2017-02-15T09:51:29.215067Z] 09:51:29 INFO - /home/worker/workspace/build/src/obj-firefox/_virtualenv/bin/python /home/worker/workspace/build/src/config/expandlibs_gen.py -o libdom_file.a.desc Unified_cpp_dom_file0.o [task 2017-02-15T09:51:29.388317Z] 09:51:29 INFO - In file included from /home/worker/workspace/build/src/obj-firefox/gfx/layers/Unified_cpp_gfx_layers4.cpp:38:0: [task 2017-02-15T09:51:29.388767Z] 09:51:29 INFO - /home/worker/workspace/build/src/gfx/layers/client/ClientLayerManager.cpp:513:1: error: redefinition of 'void mozilla::layers::ClientLayerManager::SetTransactionIdAllocator(mozilla::layers::TransactionIdAllocator*)' [task 2017-02-15T09:51:29.389094Z] 09:51:29 INFO - ClientLayerManager::SetTransactionIdAllocator(TransactionIdAllocator* aAllocator) [task 2017-02-15T09:51:29.389133Z] 09:51:29 INFO - ^ [task 2017-02-15T09:51:29.389279Z] 09:51:29 INFO - In file included from /home/worker/workspace/build/src/gfx/layers/client/ClientCanvasLayer.h:10:0, [task 2017-02-15T09:51:29.389533Z] 09:51:29 INFO - from /home/worker/workspace/build/src/gfx/layers/client/ClientCanvasLayer.cpp:6, [task 2017-02-15T09:51:29.389607Z] 09:51:29 INFO - from /home/worker/workspace/build/src/obj-firefox/gfx/layers/Unified_cpp_gfx_layers4.cpp:2: [task 2017-02-15T09:51:29.389996Z] 09:51:29 INFO - /home/worker/workspace/build/src/gfx/layers/client/ClientLayerManager.h:223:16: note: 'virtual void mozilla::layers::ClientLayerManager::SetTransactionIdAllocator(mozilla::layers::TransactionIdAllocator*)' previously defined here [task 2017-02-15T09:51:29.390131Z] 09:51:29 INFO - virtual void SetTransactionIdAllocator(TransactionIdAllocator* aAllocator) override [task 2017-02-15T09:51:29.390332Z] 09:51:29 INFO - ^ [task 2017-02-15T09:51:29.390402Z] 09:51:29 INFO - /home/worker/workspace/build/src/config/rules.mk:1016: recipe for target 'Unified_cpp_gfx_layers4.o' failed [task 2017-02-15T09:51:29.390551Z] 09:51:29 INFO - gmake[5]: *** [Unified_cpp_gfx_layers4.o] Error 1
Flags: needinfo?(sawang)
Assignee | ||
Comment 41•7 years ago
|
||
Made a silly mistake on rebasing before...
Assignee | ||
Updated•7 years ago
|
Attachment #8836579 -
Attachment is obsolete: true
Flags: needinfo?(sawang)
Assignee | ||
Updated•7 years ago
|
Attachment #8837537 -
Flags: review+
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 43•7 years ago
|
||
rebase for landing
Assignee | ||
Updated•7 years ago
|
Attachment #8837537 -
Attachment is obsolete: true
Assignee | ||
Updated•7 years ago
|
Attachment #8839065 -
Flags: review+
Assignee | ||
Updated•7 years ago
|
Keywords: checkin-needed
Comment 44•7 years ago
|
||
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/3a8c78a6146f Clear pending transactions when ClientLayerManager is assigning to a different refresh driver. r=mattwoodrow
Keywords: checkin-needed
Comment 45•7 years ago
|
||
Backed out for failing browser_animation_pseudo_elements.js on Linux opt without e10s: https://hg.mozilla.org/integration/mozilla-inbound/rev/cc5a5a7e5331255b5dda147e09ea4d85a9bd8d56 Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=3a8c78a6146fa9fb3fa9f73d0547b6520e26a470&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=78876097&repo=mozilla-inbound [task 2017-02-20T20:18:39.010986Z] 20:18:39 INFO - TEST-START | devtools/client/animationinspector/test/browser_animation_pseudo_elements.js [task 2017-02-20T20:19:34.590530Z] 20:19:34 INFO - TEST-INFO | started process screentopng [task 2017-02-20T20:19:35.113422Z] 20:19:35 INFO - TEST-INFO | screentopng: exit 0 [task 2017-02-20T20:19:35.113547Z] 20:19:35 INFO - Buffered messages logged at 20:18:39 [task 2017-02-20T20:19:35.114394Z] 20:19:35 INFO - Entering test bound [task 2017-02-20T20:19:35.115954Z] 20:19:35 INFO - Adding a new tab with URL: http://example.com/browser/devtools/client/animationinspector/test/doc_pseudo_elements.html [task 2017-02-20T20:19:35.118351Z] 20:19:35 INFO - Tab added and finished loading [task 2017-02-20T20:19:35.119839Z] 20:19:35 INFO - Loading the helper frame script chrome://mochitests/content/browser/devtools/client/animationinspector/test/doc_frame_script.js [task 2017-02-20T20:19:35.121185Z] 20:19:35 INFO - Loading the helper frame script chrome://devtools/content/shared/frame-script-utils.js [task 2017-02-20T20:19:35.122534Z] 20:19:35 INFO - Opening the inspector [task 2017-02-20T20:19:35.123684Z] 20:19:35 INFO - Opening the toolbox [task 2017-02-20T20:19:35.124880Z] 20:19:35 INFO - Buffered messages finished [task 2017-02-20T20:19:35.126605Z] 20:19:35 INFO - TEST-UNEXPECTED-FAIL | devtools/client/animationinspector/test/browser_animation_pseudo_elements.js | Test timed out - [task 2017-02-20T20:19:35.127270Z] 20:19:35 INFO - Removing tab. [task 2017-02-20T20:19:35.128132Z] 20:19:35 INFO - Waiting for event: 'TabClose' on [object XULElement]. [task 2017-02-20T20:19:35.130505Z] 20:19:35 INFO - Got event: 'TabClose' on [object XULElement]. [task 2017-02-20T20:19:35.131219Z] 20:19:35 INFO - Tab removed and finished closing
Flags: needinfo?(sawang)
Updated•7 years ago
|
Whiteboard: [stockwell unknown]
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 48•7 years ago
|
||
Assignee | ||
Updated•7 years ago
|
Attachment #8833256 -
Attachment description: Clear mWaitingForTransaction on Thaw → v1
Assignee | ||
Updated•7 years ago
|
Attachment #8836579 -
Attachment description: Clear pending transactions when ClientLayerManager is assigning to a different refresh driver → v2
Assignee | ||
Updated•7 years ago
|
Attachment #8837537 -
Attachment description: Clear pending transactions when ClientLayerManager is assigning to a different refresh driver → v3
Assignee | ||
Updated•7 years ago
|
Attachment #8839065 -
Attachment description: Clear pending transactions when ClientLayerManager is assigning to a different refresh driver → v4
Attachment #8839065 -
Attachment is patch: false
Assignee | ||
Updated•7 years ago
|
Attachment #8839065 -
Attachment is obsolete: true
Attachment #8839065 -
Attachment is patch: true
Assignee | ||
Comment 49•7 years ago
|
||
Comment on attachment 8845335 [details] [diff] [review] Clear pending transactions when ClientLayerManager is assigning to a different refresh driver Looks I forgot to check aAllocator != mTransactionIdAllocator. In the meantime, given that we already know DidComposite could happen after ClientLayerManager has switched to another refresh driver, I'm thinking we should also reset the transaction id of newly adopted refresh driver to a higher value so it doesn't get confused. Would like to know your opinion on this part. Does it sound reasonable to you?
Flags: needinfo?(sawang)
Attachment #8845335 -
Flags: review?(matt.woodrow)
Comment 50•7 years ago
|
||
Comment on attachment 8845335 [details] [diff] [review] Clear pending transactions when ClientLayerManager is assigning to a different refresh driver Review of attachment 8845335 [details] [diff] [review]: ----------------------------------------------------------------- Yeah, sounds like a good idea.
Attachment #8845335 -
Flags: review?(matt.woodrow) → review+
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 52•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a09513bc18b635152fa47f29b628389f6332a654&group_state=expanded&selectedJob=82977175
Keywords: checkin-needed
Comment 53•7 years ago
|
||
Pushed by cbook@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/0fa3d7b36a7b Clear pending transactions when ClientLayerManager is assigning to a different refresh driver. r=mattwoodrow
Keywords: checkin-needed
Updated•7 years ago
|
Whiteboard: [stockwell unknown] → [stockwell fixed]
Comment 54•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/0fa3d7b36a7b
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Updated•7 years ago
|
status-firefox52:
--- → wontfix
status-firefox53:
--- → affected
status-firefox54:
--- → affected
status-firefox-esr52:
--- → affected
Comment 55•7 years ago
|
||
No occurrences in the last week. Please request Aurora/Beta/ESR52 approval on this when you get a chance.
Flags: needinfo?(sawang)
Assignee | ||
Comment 56•7 years ago
|
||
rebase on aurora
Assignee | ||
Comment 57•7 years ago
|
||
rebase on beta
Assignee | ||
Updated•7 years ago
|
Attachment #8850863 -
Attachment description: Clear pending transactions when ClientLayerManager is assigning to a different refresh driver → patch for aurora
Assignee | ||
Updated•7 years ago
|
Attachment #8850865 -
Attachment description: Clear pending transactions when ClientLayerManager is assigning to a different refresh driver → patch for beta
Assignee | ||
Comment 58•7 years ago
|
||
Making try runs on rebased patches. On ESR52 it doesn't look quite green: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e5ed8d52cf41174fdc22d3905776a2f640f2385a&group_state=expanded Waiting for aurora / beta rebased try runs: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fcbd33fe85b09862f3f28781899d73e811e63d6c&group_state=expanded https://treeherder.mozilla.org/#/jobs?repo=try&revision=2cc1e9db1497b9850c0205609de19c5b22d6669f&group_state=expanded
Flags: needinfo?(sawang)
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 60•7 years ago
|
||
Comment on attachment 8850863 [details] [diff] [review] patch for aurora Approval Request Comment [Feature/Bug causing the regression]: I think it's since we introduced e10s. [User impact if declined]: In very rare case history navigation could result in a blank page. It's recoverable if a subsequent composite request happens, such as resizing, minimizing the window and back, or just refreshing the page. [Is this code covered by automated tests?]: Yes. It fixes automation timeout. [Has the fix been verified in Nightly?]: Yes. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: Slightly. [Why is the change risky/not risky?]: It's just a small change but still changes to graphics / refresh driver could be relatively risky to cause regression. [String changes made/needed]: None.
Attachment #8850863 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 61•7 years ago
|
||
Comment on attachment 8850865 [details] [diff] [review] patch for beta Approval Request Comment [Feature/Bug causing the regression]: I think it's since we introduced e10s. [User impact if declined]: In very rare case history navigation could result in a blank page. It's recoverable if a subsequent composite request happens, such as resizing, minimizing the window and back, or just refreshing the page. [Is this code covered by automated tests?]: Yes. It fixes automation timeout. [Has the fix been verified in Nightly?]: Yes. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: Slightly. [Why is the change risky/not risky?]: It's just a small change but still changes to graphics / refresh driver could be relatively risky to cause regression. [String changes made/needed]: None.
Attachment #8850865 -
Flags: approval-mozilla-beta?
Comment 62•7 years ago
|
||
Comment on attachment 8850863 [details] [diff] [review] patch for aurora Fix an intermittent-failure. Let's take it in aurora first. Aurora54+.
Attachment #8850863 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 63•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/cbc61557e638
Comment 64•7 years ago
|
||
Comment on attachment 8850865 [details] [diff] [review] patch for beta I'd like to let this ride with aurora 54 since we are getting near to the end of the beta cycle. Nice to see this long standing test failure fixed!
Attachment #8850865 -
Flags: approval-mozilla-beta? → approval-mozilla-beta-
Updated•7 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•