Closed Bug 992270 Opened 6 years ago Closed 5 years ago

browser_google_behavior.js & browser_bing_behavior.js fails when run in chunks on Cedar

Categories

(Firefox :: Search, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Iteration:
36.1
Tracking Status
firefox29 --- fixed
firefox30 --- fixed
firefox31 --- fixed
firefox-esr24 --- unaffected
b2g-v1.2 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed

People

(Reporter: RyanVM, Assigned: MattN)

References

Details

Attachments

(2 files, 1 obsolete file)

We are currently running mochitest-bc on Cedar in chunks and with the devtools tests split out into their own suite. One consequence of this work is that we are seeing test interdependencies come to light as failures when things don't run before or after that were expected. These failures are blocking our ability to get these big improvements into production.

One common failure we are seeing is the following in browser_google_behavior.js on Linux and Windows:

https://tbpl.mozilla.org/php/getParsedLog.php?id=37281944&tree=Cedar

08:26:55     INFO -  TEST-START | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js
08:26:55     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | Google is installed
08:26:55     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | Check that Google is the default search engine
08:26:55     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | Check search URL for 'foo'
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | Running : context menu search
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | onStateChange
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | received document start
08:26:55     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | req is a channel
08:26:55     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | search URL was loaded
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | Actual URI: https://www.google.com/search?q=foo&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:unofficial&client=firefox-a&channel=rcs
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | Running : keyword search
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | onStateChange
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | onStateChange
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | received document start
08:26:55     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | req is a channel
08:26:55     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | search URL was loaded
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | Actual URI: https://www.google.com/search?q=foo&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:unofficial&client=firefox-a&channel=fflb
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | Running : search bar search
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | onStateChange
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | onStateChange
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | received document start
08:26:55     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | req is a channel
08:26:55     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | search URL was loaded
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | Actual URI: https://www.google.com/search?q=foo&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:unofficial&client=firefox-a&channel=sb
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | Running : home page search
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | Waiting for about:home load
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | onStateChange
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | onStateChange
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | received document start
08:26:55     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | req is a channel
08:26:55     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | search URL was loaded
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | Actual URI: https://www.google.com/search?q=foo&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:unofficial&client=firefox-a&channel=np&source=hp
08:26:55  WARNING -  TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | uncaught exception - NotFoundError: The operation failed because the requested database object could not be found. For example, an object store did not exist but was being opened. at chrome://browser/content/abouthome/aboutHome.js:254
08:26:55     INFO -  Stack trace:
08:26:55     INFO -      JS frame :: chrome://mochikit/content/tests/SimpleTest/SimpleTest.js :: simpletestOnerror :: line 1324
08:26:55     INFO -      native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0
08:26:55     INFO -  JavaScript error: chrome://browser/content/abouthome/aboutHome.js, line 254: NotFoundError: The operation failed because the requested database object could not be found. For example, an object store did not exist but was being opened.
08:26:55     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | Console message: [JavaScript Error: "NotFoundError: The operation failed because the requested database object could not be found. For example, an object store did not exist but was being opened." {file: "chrome://browser/content/abouthome/aboutHome.js" line: 254}]
08:26:55     INFO -  TEST-INFO | MEMORY STAT vsize after test: 868421632
08:26:55     INFO -  TEST-INFO | MEMORY STAT residentFast after test: 164909056
08:26:55     INFO -  TEST-INFO | MEMORY STAT heapAllocated after test: 74275296
08:26:55     INFO -  INFO TEST-END | chrome://mochitests/content/browser/browser/components/search/test/browser_google_behavior.js | finished in 394ms
Version: 26 Branch → unspecified
Bug 975528 comment 25 - 27 has some comments on this failure.
Flags: firefox-backlog?
Same error case as bug 972684 (see bug 972684 comment 15). Looking for that particular exception in bug comments shows it showing up in a bunch of tests somewhat randomly.
Flags: firefox-backlog? → firefox-backlog+
Bug 890409 made browser_aboutHome.js ignore related Indexed DB uncaught exceptions in the test with ignoreAllUncaughtExceptions(). If about:home's own test doesn't care about these (which I don't really agree with), why should search tests? It's at least better than disabling the test altogether while chunking. I couldn't reproduce the problem locally in order to debug the root cause.
Assignee: nobody → MattN+bmo
Status: NEW → ASSIGNED
Attachment #8402239 - Flags: review?(gavin.sharp)
Summary: browser_google_behavior.js fails when run in chunks on Cedar → browser_google_behavior.js & browser_bing_behavior.js fails when run in chunks on Cedar
Comment on attachment 8402239 [details] [diff] [review]
v.1 ignoreAllUncaughtExceptions in the about:home subtest

I wonder if instead we can just wrap this test's call to "doc.getElementById("searchSubmit").click();" in an executeSoon - triggering the search from the mutation observer could be causing a strange interaction (aboutHome itself has the same mutation observer which ends up doing indexedDB calls via ensureSnippetsMapThen).

Can we try that first?
(In reply to :Gavin Sharp (email gavin@gavinsharp.com) from comment #4)
> Can we try that first?

Sure, gimme a few minutes.
FWIW, I'm happy to run any patches through Cedar if you want to verify that it works given your troubles reproducing locally. Just needinfo me.
Ryan, can you check if this fixes the issue? Thanks
Attachment #8402255 - Flags: feedback?(ryanvm)
Comment on attachment 8402255 [details] [diff] [review]
Approach 2 - Delay the searchSubmit click on about:home

Not good :(

https://tbpl.mozilla.org/?tree=Cedar&jobname=mochitest-browser-chrome-2&rev=c7d51e394c53
Attachment #8402255 - Flags: feedback?(ryanvm) → feedback-
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #8)
> Comment on attachment 8402255 [details] [diff] [review]
> Approach 2 - Delay the searchSubmit click on about:home
> 
> Not good :(

OK, mind trying a really high delay of 2s?
Attachment #8402255 - Attachment is obsolete: true
Attachment #8402400 - Flags: feedback?(ryanvm)
Comment on attachment 8402400 [details] [diff] [review]
Approach 2 - v2 - 2s delay for the searchSubmit click on about:home

https://tbpl.mozilla.org/?tree=Cedar&jobname=mochitest-browser-chrome-2&rev=c27138be046a

Pretty darn green :)
Attachment #8402400 - Flags: feedback?(ryanvm) → feedback+
Comment on attachment 8402400 [details] [diff] [review]
Approach 2 - v2 - 2s delay for the searchSubmit click on about:home

(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #10)
> Comment on attachment 8402400 [details] [diff] [review]
> Approach 2 - v2 - 2s delay for the searchSubmit click on about:home
>
> Pretty darn green :)

Thanks Ryan

Gavin, so executeSoon still failed and 2 seconds works. Do you want to just land a setTimeout or add some way for the test to know if needs to wait?
Attachment #8402400 - Flags: feedback?(gavin.sharp)
Looks like we can still hit this intermittently even with Approach 2.
https://tbpl.mozilla.org/php/getParsedLog.php?id=37373994&tree=Cedar
There shouldn't be anything inherently time-based here, unless it's related to async indexedDB file I/O. If an executeSoon (or two) doesn't help then we're probably looking at a flaky hackaround at best.

If these help reduce the frequency to tolerable levels, I guess we can go with a setTimeout hack as a temporary solution, but I'd really rather fix things properly.

Looking at the aboutHome.js code:
http://hg.mozilla.org/mozilla-central/annotate/5fa70bd90a8b/browser/base/content/abouthome/aboutHome.js#l225

We have an openRequest.onerror handler that calls deleteDatabase, an openRequest.onupgradeneeded handler that calls createObjectStore(SNIPPETS_OBJECTSTORE_NAME), and the line that's throwing NOT_FOUND_ERR is .objectStore(SNIPPETS_OBJECTSTORE_NAME) call in a openRequest.onsuccess handler.

Do any indexedDB experts see potential problems/races with that indexedDB-using code that could cause this? Consider that multiple about:home pages are frequently opened/closed loaded/unloaded very quickly during tests.
Flags: needinfo?(khuey)
Flags: needinfo?(jonas)
Flags: needinfo?(bent.mozilla)
And these tests in particular load about:home, which invokes that aboutHome.js indexedDB code, and then quickly trigger a search that navigates the same tab to google.com. Introducing the delay before triggering the search is what comment 11 was doing, which seemed to mitigate the problem.
Comment on attachment 8402239 [details] [diff] [review]
v.1 ignoreAllUncaughtExceptions in the about:home subtest

I'm fine with landing one of these to mitigate in the short term, but I'd rather avoid it if possible.
Attachment #8402239 - Flags: review?(gavin.sharp) → feedback+
Attachment #8402400 - Flags: feedback?(gavin.sharp) → feedback+
I think when you deleteDatabase you should wait for the onsuccess/onerror handlers of the request to fire before invoking your callbacks.
Flags: needinfo?(khuey)
Flags: needinfo?(jonas)
Flags: needinfo?(bent.mozilla)
Mak, we're opening to get this work into production by the end of the week. What are your thoughts? :)
Flags: needinfo?(mak77)
The last time we hit such issues, we decided the only test using about:home should be the about:home test, that is ignoring exceptions as expected. As discussed in the original bug, it's very hard, probably even unwanted, to try to solve this in a better way.
First of all, it's not an error we care about, we are fine with losing data and we don't want to slow down the page opening/closing to cope with something we don't care about.
Second, we are moving the search field into about:newTab, that makes me think we should not spend time improving test-ability of current about:home (since newtab is privileged it probably doesn't even need indexedDB). My suspect is that about:home is not going to live much longer.
Third, as previously discussed, there's no interest into adding a global notification stating that idb has completed all pending work (may be complex and has a maintenance cost), so the only way would be to manually track every single transaction, and delay onunload (not really nice for the user). That code would probably be error-prone expensive to maintain.
In the end, I don't think it's worth spending any time on that.

Re comment 16, it may be correct to wait for the deleteDatabase request, but in this case I don't see how it's going to help, on open error we deleteDatabase and we just move on without a database, we don't use it anymore. the success handler of openConnection should not be invoked since we don't retry to open it.
At any time the tab may still be closed and idb connection may be open and rollbacking a not completed transaction, and maybe the next test is hitting that.

It wouldn't be an issue for a user, snippets may not be shown in some quite rare case.

ignoring exceptions for the subtest sounds ok, moving those tests to abouthome test (that should also be splitted since too long) would be even better, imo.
Flags: needinfo?(mak77)
(In reply to Marco Bonardo [:mak] from comment #19)
> First of all, it's not an error we care about, we are fine with losing data
> and we don't want to slow down the page opening/closing to cope with
> something we don't care about.

If we don't care about this error, can we have about:home deal with it more gracefully e.g. by adding a try…catch around it?
error events will bubble to window.onerror and then the error console unless event.preventDefault() is called. Maybe about:home should add those.
Hey Matt, should we go with the Approach 2 patch for now or is an updated patch forthcoming? We want to get the Cedar work into production by tomorrow, which realistically means that all patches need to be landed by today.
Flags: needinfo?(MattN+bmo)
I was hoping we'd take the Approach 1 or Marco would do a proper fix. I don't see why a proper fix needs to block this bug when the main test for about:home is doing the same. I need to focus on Australis so won't have time to investigate this.
Flags: needinfo?(MattN+bmo)
Thanks Matt. Gavin, how would you like us to proceed here?
Flags: needinfo?(gavin.sharp)
(In reply to Matthew N. [:MattN] from comment #20)
> (In reply to Marco Bonardo [:mak] from comment #19)
> > First of all, it's not an error we care about, we are fine with losing data
> > and we don't want to slow down the page opening/closing to cope with
> > something we don't care about.
> 
> If we don't care about this error, can we have about:home deal with it more
> gracefully e.g. by adding a try…catch around it?

you cannot try catch this afaik... IIRC I also tried to get onError but was unable cause the error happened after the page was already gone. I tried some time ago fwiw, not sure if anything changed.
I'm confused about how this is causing test failures to begin with. The exception in comment 0 is triggered by a content script (aboutHome.js) - how is it reaching the test harness' chrome window onerror handler? Do content exceptions like that typically bubble to chrome onerror listeners?

And why wouldn't try/catching the .objectStore access in about:home work? Can we make sure to try that before landing a worse workaround like v.1?

If the v.1 patch mitigates the problem, we can do that in the near term. Let's just make sure we leave a bug open to sort out the underlying problems here.
Flags: needinfo?(gavin.sharp)
Comment on attachment 8402239 [details] [diff] [review]
v.1 ignoreAllUncaughtExceptions in the about:home subtest

Appears to be working well on Cedar. Mind turning that f+ into an r+, Gavin? :)
Attachment #8402239 - Flags: review?(gavin.sharp)
Funny story, Kyle's disabling of browser_aboutHome.js on inbound caused these exact failures to start happening there as well. So I went ahead and pushed Approach 1 as a bustage fix with the f+ and comment 26 as an implicit OK for landing it as a short-term measure. Setting leave-open on the bug for a better fix once available.

https://hg.mozilla.org/integration/mozilla-inbound/rev/edf0e2b97f99
Keywords: leave-open
Attachment #8402239 - Flags: review?(gavin.sharp) → review+
Landed Approach 1 on all affected release branches due to this blocking bug 995041. Calling it "fixed" for tracking purposes, but feel free to set the flags back to affected when there's a final patch ready to land.

https://hg.mozilla.org/releases/mozilla-aurora/rev/4f032968994e
https://hg.mozilla.org/releases/mozilla-beta/rev/0e107cfcd3fd
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/f2a5c55476bd
https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/04241f0d2298
Whiteboard: [qa-]
The rest of this is being handled in bug 1082133.
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Keywords: leave-open
Resolution: --- → FIXED
Iteration: --- → 36.1
Flags: qe-verify-
Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.