Closed Bug 995688 Opened 6 years ago Closed 5 years ago

Intermittent test_multi_sharedWorker_lifetimes.html | Still have data stored - got undefined, expected 0123456789abcdefghijklmnopqrstuvwxyz

Categories

(Core :: DOM: Workers, defect)

x86
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla34
Tracking Status
firefox31 --- wontfix
firefox32 --- fixed
firefox33 --- fixed
firefox34 --- fixed
firefox-esr24 --- unaffected
firefox-esr31 --- fixed
b2g-v1.4 --- unaffected
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: RyanVM, Assigned: bkelly)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file, 3 obsolete files)

https://tbpl.mozilla.org/php/getParsedLog.php?id=37692125&tree=B2g-Inbound

b2g_ubuntu32_vm b2g-inbound opt test mochitest-1 on 2014-04-12 05:54:29 PDT for push c08e4287123b
slave: tst-linux32-spot-338

06:06:32     INFO -  9410 INFO TEST-PASS | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Got a MessageEvent
06:06:32     INFO -  9411 INFO TEST-PASS | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Correct window got the event
06:06:32     INFO -  9412 INFO TEST-PASS | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Got a result message
06:06:32     INFO -  9413 INFO TEST-PASS | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Got stored data
06:06:32     INFO -  9414 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Navigating to about:blank
06:06:32     INFO -  9415 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Navigating to multi_sharedWorker_frame.html
06:06:32     INFO -  9416 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Posting message: {"command":"retrieve"}
06:06:32     INFO -  9417 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Worker message: {"type":"result"}
06:06:32     INFO -  9418 INFO TEST-PASS | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Got a MessageEvent
06:06:32     INFO -  9419 INFO TEST-PASS | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Correct window got the event
06:06:32     INFO -  9420 INFO TEST-PASS | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Got a result message
06:06:32     INFO -  9421 INFO TEST-PASS | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | No data stored
06:06:32     INFO -  9422 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Posting message: {"command":"store","data":"0123456789abcdefghijklmnopqrstuvwxyz"}
06:06:32     INFO -  9423 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Posting message: {"command":"retrieve"}
06:06:32     INFO -  9424 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Worker message: {"type":"result","data":"0123456789abcdefghijklmnopqrstuvwxyz"}
06:06:32     INFO -  9425 INFO TEST-PASS | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Got a MessageEvent
06:06:32     INFO -  9426 INFO TEST-PASS | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Correct window got the event
06:06:32     INFO -  9427 INFO TEST-PASS | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Got a result message
06:06:32     INFO -  9428 INFO TEST-PASS | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Got stored data
06:06:32     INFO -  9429 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Enabling 'browser.sessionhistory.cache_subframes' pref
06:06:32     INFO -  9430 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Navigating to about:blank
06:06:32     INFO -  9431 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Running GC
06:06:32     INFO -  9432 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Waiting the event queue to clear
06:06:32     INFO -  9433 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Running GC
06:06:32     INFO -  9434 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Waiting the event queue to clear
06:06:32     INFO -  9435 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Running GC
06:06:32     INFO -  9436 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Waiting the event queue to clear
06:06:32     INFO -  9437 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Navigating to multi_sharedWorker_frame.html
06:06:32     INFO -  9438 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Posting message: {"command":"retrieve"}
06:06:32     INFO -  9439 INFO TEST-INFO | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Worker message: {"type":"result"}
06:06:32     INFO -  9440 INFO TEST-PASS | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Got a MessageEvent
06:06:32     INFO -  9441 INFO TEST-PASS | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Correct window got the event
06:06:32     INFO -  9442 INFO TEST-PASS | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Got a result message
06:06:32     INFO -  9443 INFO TEST-UNEXPECTED-FAIL | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | Still have data stored - got undefined, expected 0123456789abcdefghijklmnopqrstuvwxyz
06:06:32     INFO -  9444 INFO TEST-INFO | MEMORY STAT vsize after test: 733900800
06:06:32     INFO -  9445 INFO TEST-INFO | MEMORY STAT residentFast after test: 141307904
06:06:32     INFO -  9446 INFO TEST-INFO | MEMORY STAT heapAllocated after test: 47595604
06:06:32     INFO -  9447 INFO TEST-END | /tests/dom/workers/test/test_multi_sharedWorker_lifetimes.html | finished in 734ms
FWIW, this is permaorange for me locally on OSX 10.9, against this revision:
  https://hg.mozilla.org/mozilla-central/rev/3d8e8213e08e
It fails when run with
  ./mach mochitest-plain dom/workers/test/test_multi_sharedWorker_lifetimes.html
Or as part of the entire directory.
(git commit d160b3db0af98ae25f584bc0e50fa0090d930edc)
2 months in and this bug hasn't fixed itself yet. Ben, can you please take a look at this frequent failure?
Flags: needinfo?(bent.mozilla)
Blocks: 1005907
OK, I think this bug has sat long enough now without going anywhere. I'll be disabling this test in 2 days unless there's new activity suggesting that someone is looking into it.
Ryan asked me to look at this.  Will try to do some investigation in the next few days.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
So I can run the test locally, but I have not been able to reproduce so far.

Looking at the test, it appears the javascript context is being lost for the SharedWorker when navigating away from a page even though its in the BF cache.
As I said in comment 18, this fails for me every time on OSX, if that helps.  I can also poke around in lldb if that would help, though I'm not familiar with any of this code.
Thanks!  I missed your comment in the sea of tbpl stacks.
Thanks to Andrew, I can reproduce this failure on my macbook.
So it appears that when this fails on Mac OS X its due to a new shared worker thread being spun up instead of picking up the existing shared worker thread.

I want to see if this is the same thing happening on b2g-desktop, so here's a try push:

  https://tbpl.mozilla.org/?tree=Try&rev=6d429a69f180
(In reply to Ben Kelly [:bkelly] from comment #273)
> So it appears that when this fails on Mac OS X its due to a new shared
> worker thread being spun up instead of picking up the existing shared worker
> thread.

On the Mac this is happening because ForgetSharedWorker() is clearing the DomainLoadInfo structure after the first GC/wait in the test.
And on the mac ForgetSharedWorker is getting called because the WorkerPrivateParent is being cycle collected.
If I'm reading this right, it seems WorkerPrivateParent is normally kept alive by virtue of being DOMEventTargetHelper which adds itself to a window.  So that window must be getting killed or clearing its reference for some reason.
No, the edge from the window to the event target is weak.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #277)
> No, the edge from the window to the event target is weak.

That makes sense.  I guess nsDocShell preserving the presentation is supposed to hold on to it then.  I'll poke at that bit of code.

Also, the try build is failing to trigger with the debug added.  I may just have to solve this on Mac and hope the b2g-desktop issue is the same.
Cycle collection was a red herring.  There are multiple worker objects in play and that CC was for a different object.

It appears UnregisterSharedWorker is getting called and triggering Cancel().
On mac this is failing because the style attribute is getting changed on the scrollbar.  This in turn causes nsSHEntryShared to DropPresentation().  DropPresentation then leads to FreeInnerObjects() which does an UnregisterSharedWorker.
Here's another try build that should hopefully change the timing less and allow the failure to reproduce:

  https://tbpl.mozilla.org/?tree=Try&rev=605e6f06b56b
So the stack for the document getting evicted from the BF cache is below.  Basically the ScrollbarActivity is changing the opacity in order to do some fade-in or fade-out effect.

Talking to Ollie, we could check aElement->IsInNativeAnonymousSubtree() in nsSHEntryShared::AttributeChanged() before evicting from the BF cache.  This would ignore changes to browser chrome.

Alternatively, Ollie points out that the ScrollbarActivity should not be in the refresh driver list any more.  So the correct fix is to fix that by itself or in combination with the check.

    439 INFO Assertion failure: false, at /Users/bkelly/devel/hg/mozilla-central/docshell/shistory/src/nsSHEntryShared.cpp:356
    440 INFO mozilla::dom::Element::SetAttrAndNotify(int, nsIAtom*, nsIAtom*, nsAttrValue const&, nsAttrValue&, unsigned char, bool, bool, bool)+0x000006A5 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.3.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x019C7615]
    441 INFO nsStyledElementNotElementCSSInlineStyle::SetInlineStyleRule(mozilla::css::StyleRule*, nsAString_internal const*, bool)+0x00000162 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.3.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x01AC0672]
    442 INFO nsDOMCSSAttributeDeclaration::SetCSSDeclaration(mozilla::css::Declaration*)+0x00000155 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.3.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x01FB35C5]
    443 INFO nsDOMCSSDeclaration::ParsePropertyValue(nsCSSProperty, nsAString_internal const&, bool)+0x0000012C [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.3.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x01FB3F8C]
    444 INFO mozilla::layout::SetOpacityOnElement(nsIContent*, double)+0x0000016C [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.3.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x0215543C]
    445 INFO mozilla::layout::ScrollbarActivity::UpdateOpacity(mozilla::TimeStamp)+0x000000CF [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.3.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x02154C9F]
    446 INFO mozilla::layout::ScrollbarActivity::WillRefresh(mozilla::TimeStamp)+0x0000009D [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.3.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x02154B7D]
    447 INFO nsRefreshDriver::Tick(long long, mozilla::TimeStamp)+0x00000548 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.3.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x020889C8]
    448 INFO mozilla::RefreshDriverTimer::Tick()+0x00000168 [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.3.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x0208B9D8]
    449 INFO nsTimerImpl::Fire()+0x000003AE [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.3.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x000B940E]
    450 INFO nsTimerEvent::Run()+0x0000010F [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.3.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x000B970F]
    451 INFO nsThread::ProcessNextEvent(bool, bool*)+0x000004FB [/Users/bkelly/devel/hg/mozilla-central/obj-x86_64-apple-darwin13.3.0-debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x000B4A9B]
Olli, here is a patch to check for the scrollbar style change in nsSHEntryShared.  It seems we need this no matter what since its always possible that the next ScrollbarActivity event is already in the event queue when we put the document in the BF cache.

I did not see an obvious place where nsDocShell stops the ScrollbarActivity animation.  Is that wholely missing?  It seems perhaps it should observe the DOM_WINDOW_FROZEN_TOPIC in order to implement that.

Here is a try build:

  https://tbpl.mozilla.org/?tree=Try&rev=1f9422da66e8

I'll do a bunch of triggers to see if this fixes the intermittent or not.
Attachment #8468051 - Flags: review?(bugs)
Flags: needinfo?(bent.mozilla)
Attachment #8468051 - Attachment description: bug995688_bfcache_scrollbar.patch → Don't evict from BF cache due to browser chrome scrollbar style changes.
Comment on attachment 8468051 [details] [diff] [review]
Don't evict from BF cache due to browser chrome scrollbar style changes.

aElement->IsInNativeAnonymousSubtree() &&
aElement->IsXUL(nsGkAtoms::scrollbar) &&
aNameSpaceID == kNameSpaceID_None &&
aAttribute == nsGkAtoms::style

with that, r+

And remove the useless MOZ_ASSERTs. We crash anyway if aElement or aAttribute is null.
Attachment #8468051 - Flags: review?(bugs) → review+
Thanks Olli!  Definitely better than a bunch of string compares.

Here's a new try:

  https://tbpl.mozilla.org/?tree=Try&rev=4ac682c5002e

And for comparison, here's a try without the patch:

  https://tbpl.mozilla.org/?tree=Try&rev=faea79ae1936

Lets see what kind of improvement we get.

That being said, even if the b2g-desktop failure persists we should probably still land this as it breaks local mac tests.
Attachment #8468051 - Attachment is obsolete: true
Attachment #8468136 - Flags: review+
Canceled those try builds because the patch doesn't work.  Just noticed it also fails on my local build.  Checking to see if we locked something down too much.
After twiddling the style attribute a number of times we finally get a change to the active attribute.

Olli, what do you think about allowing that attribute to change as well?

Also, I'll file a follow-up bug to fix scrollbar activity, although I won't have time to look at it.  It seems stopping and restarting the animation correctly may be a bit tricky.
Attachment #8468136 - Attachment is obsolete: true
Attachment #8468167 - Flags: review?(bugs)
See Also: → 1049277
Attachment #8468167 - Flags: review?(bugs) → review+
But actually, don't we have a pref to disable the fadin/out
"layout.testing.overlay-scrollbars.always-visible", but I don't see anything for the active case.
(In reply to Olli Pettay [:smaug] from comment #290)
> "layout.testing.overlay-scrollbars.always-visible", but I don't see anything
> for the active case.

I think perhaps the scrollbar never goes active if its not fading in or out.  The test passes on mac with just setting this pref.

Lets use this approach in this test and then the fix for the underlying issue can go in bug 1049277.
Patch that just flips the scrollbar pref in the test to avoid this issue.  Verified this works locally on my mac.

Try build:

  https://tbpl.mozilla.org/?tree=Try&rev=b5c440285e72
Attachment #8468167 - Attachment is obsolete: true
Attachment #8468608 - Flags: review?(bugs)
Attachment #8468608 - Flags: review?(bugs) → review+
The particular tests we care about passed in the try build.  I'll land once mozilla-inbound opens up again.
https://hg.mozilla.org/mozilla-central/rev/9ff7c80c4790
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Really makes me wonder how often we get burned in other tests by similar issues. Can we disable this at the harness level instead of just in the one test?
QA Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.