Closed Bug 974270 Opened 7 years ago Closed 6 years ago

Fix or disable datastore mochitests on b2g emulator

Categories

(Core :: DOM: Device Interfaces, defect)

x86
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla34

People

(Reporter: philor, Assigned: ehsan)

References

Details

Attachments

(2 files, 2 obsolete files)

When I was trying (and failing) to retrigger my way back to before they got bad, I was seeing failure rates between 30% and 80%, neither of which are at all acceptable.

One very good possibility is that I couldn't find a time when I could retrigger green because they don't much like the m1.medium AWS instances that we switched back to recently, and were one of the few things that were happier on the m3.mediums.

At any rate, even though it's hard to see them as one thing while they are happening, as an overall pattern it's clear: we time out in two datastore tests, fail with "Error callback invoked" in a bunch more, then time out in tests_downloads_basic.html and tests_downloads_large.html (or more rarely, for variety, time out in just one datastore test, and add in a third downloads test). Then, because the mochitest harness thinks that four timeouts means the run has gone to hell, we stop running any more tests, which means that the last 116 mochitests in mochitest-5 only get to run between 70% and 20% of the time, which even more than the pain in the ass constant starring means these either need to be fixed, or need to be disabled.

If uninstalling an app (which seems to be where they are when they time out) just takes a long time, it's possible that they are honest timeouts from tests that just take too long, and the "fix" (other than making uninstalling apps faster) is just to requestLongerTimeout(2) or (5) or (10).
(In reply to Phil Ringnalda (:philor) from comment #0)
> If uninstalling an app (which seems to be where they are when they time out)
> just takes a long time, it's possible that they are honest timeouts from
> tests that just take too long, and the "fix" (other than making uninstalling
> apps faster) is just to requestLongerTimeout(2) or (5) or (10).

I think we should definitely try this idea out on try.  Andrea, can you please take this?  Thanks!
Flags: needinfo?(amarchesini)
Attached patch timeout.patch (obsolete) — Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=b2c2e18e8d92
Flags: needinfo?(amarchesini)
it seems green.
Attachment #8378187 - Flags: review?(ehsan)
Comment on attachment 8378187 [details] [diff] [review]
timeout.patch

Review of attachment 8378187 [details] [diff] [review]:
-----------------------------------------------------------------

The patch looks good, but I don't think that a single try run is enough evidence that this actually fixes the oranges.  I have triggered 20 runs or so of the ICS emulator test which runs the data store tests in both opt and debug modes, please wait for those try runs to finish before landing this.  Thanks!
Attachment #8378187 - Flags: review?(ehsan) → review+
The tests are still timing out. :(
Sorry, debugging these is going to have to move to Try :(. Disabled.

https://hg.mozilla.org/mozilla-central/rev/3762e1037b59
Assignee: nobody → ryanvm
Whiteboard: [test disabled on B2G][leave open]
Assignee: ryanvm → amarchesini
Dunno whether it also dropped the frequency or left it exactly the same, but that disabling seems to have shifted it from two datastore and two download timeouts resulting in the run being killed to four download timeouts resulting in the run being killed. What do datastore and download tests have in common?
it turned out that this issue is 99% a duplicate of Bug 965677.
Of course we see

1016B2GRunner TEST-UNEXPECTED-FAIL | /tests/dom/datastore/tests/test_readonly.html | application timed out after 330.0 seconds with no output

But the full log shows:

02:29:24     INFO -  1027 INFO TEST-START | /tests/dom/datastore/tests/test_readonly.html
02:29:24     INFO -  1028 INFO TEST-INFO | /tests/dom/datastore/tests/test_readonly.html | permissions
02:29:24     INFO -  1029 INFO TEST-INFO | /tests/dom/datastore/tests/test_readonly.html | configured
02:29:24     INFO -  1030 INFO TEST-INFO | /tests/dom/datastore/tests/test_readonly.html | installed
02:29:24    ERROR - Return code: 247

Does somebody know where this 'return code: 247' comes from?
Flags: needinfo?(philringnalda)
I would assume from the mochitest harness saying "it's been 330 seconds since you last produced any output, you must be hung, time to kill you." Saying this is a duplicate of bug 965677 is like a coroner saying "well, he's dead, and all those corpses in that pile are dead, he must have died of death."
Flags: needinfo?(philringnalda)
Andrea, have you tried to see if you can reproduce locally?
Yes. I cannot reproduce it. And I tested it a looot of times.
I don't think what philor is saying is 100% correct. Think about this:

https://tbpl.mozilla.org/?tree=Try&rev=50072732a5bd - the first orange shows:

...
10:40:43     INFO -  933 INFO TEST-INFO | /tests/dom/datastore/tests/test_arrays.html | app executed
10:40:43     INFO -  934 INFO TEST-PASS | /tests/dom/datastore/tests/test_arrays.html | Message from app: OK Get - Data matches: 29 29 == 30 (elided 100 passes or known failures)
10:40:43     INFO -  -- webapps.js uninstall http://test/tests/dom/datastore/tests/file_app.sjs?testToken=file_arrays.html
10:40:43    ERROR - Return code: 247
10:40:43     INFO - dumping logcat
10:40:43     INFO - Running command: ['cat', '/builds/slave/test/build/emulator-5554.log']
10:40:43     INFO - Copy/paste: cat /builds/slave/test/build/emulator-5554.log

No mochitests are in timeout but we still have this "247" error code.
Here are two suggestions:

1. Try to ask in #ateam whether someone knows what exactly the exit code 247 means.

2. File a bug to get access to a test slave so that you can run the test inside the emulator on that machine to see if you can reproduce.  Sometimes test failures like this won't happen locally because they are caused by some kind of a timing that only happens on our test machines.
Attached patch patch (obsolete) — Splinter Review
This should fix the issue.
Attachment #8378187 - Attachment is obsolete: true
Attachment #8382150 - Flags: review?(ehsan)
See comment 6. Your patch won't actually re-enable them.
Comment on attachment 8382150 [details] [diff] [review]
patch

Review of attachment 8382150 [details] [diff] [review]:
-----------------------------------------------------------------

r+ on the file_basic.html part, but please don't land the other part yet.  We still have more investigation to do.

Also, r=me on adding requestCompleteLog() on all data store tests.
Attachment #8382150 - Flags: review?(ehsan) → review+
Attached patch patchSplinter Review
Attachment #8382150 - Attachment is obsolete: true
Attachment #8382323 - Flags: review?(ehsan)
Attachment #8382323 - Flags: review?(ehsan) → review+
Andrea, with bug 981085 fixed, can you please give these tests another spin on the try server and see how we're doing?  Thanks!
Flags: needinfo?(amarchesini)
Still the same problem. This issue is still there.
We still have the following in the log:

10:38:28     INFO -  ************************************************************
10:38:28     INFO -  * Call to xpconnect wrapped JSObject produced this error:  *
10:38:28     INFO -  [Exception... "Component returned failure code: 0x80570016 (NS_ERROR_XPC_GS_RETURNED_FAILURE) [nsIJSCID.getService]"  nsresult: "0x80570016 (NS_ERROR_XPC_GS_RETURNED_FAILURE)"  location: "JS frame :: resource://gre/modules/XPCOMUtils.jsm :: XPCU_serviceLambda :: line 202"  data: no]
10:38:28     INFO -  ************************************************************
10:38:28     INFO -  ############################### browserElementPanning.js loaded
10:38:29     INFO -  ######################## BrowserElementChildPreload.js loaded
10:38:31     INFO -  -- webapps.js uninstall http://test/tests/dom/datastore/tests/file_app.sjs?testToken=file_app_install.html
10:38:34     INFO -  723 INFO TEST-INFO | MEMORY STAT vsize after test: 124121088
10:38:34     INFO -  724 INFO TEST-INFO | MEMORY STAT residentFast after test: 66154496
10:38:34     INFO -  725 INFO TEST-INFO | MEMORY STAT heapAllocated after test: 32738988
10:38:34     INFO -  726 INFO TEST-END | /tests/dom/datastore/tests/test_app_install.html | finished in 9912ms
10:38:35     INFO -  727 INFO TEST-START | /tests/dom/datastore/tests/test_arrays.html
10:38:40     INFO -  ************************************************************
10:38:40     INFO -  * Call to xpconnect wrapped JSObject produced this error:  *
10:38:40     INFO -  [Exception... "[JavaScript Error: "DOMApplicationRegistry is not defined" {file: "resource://gre/modules/BrowserElementParent.jsm" line: 238}]'[JavaScript Error: "DOMApplicationRegistry is not defined" {file: "resource://gre/modules/BrowserElementParent.jsm" line: 238}]' when calling method: [nsIObserver::observe]"  nsresult: "0x80570021 (NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS)"  location: "JS frame :: http://mochi.test:8888/tests/dom/datastore/tests/test_arrays.html :: testApp :: line 65"  data: yes]
10:38:40     INFO -  ************************************************************

Fabrice said in bug 961317 comment 130 that Webapps.jsm should never be loaded in a content process.  Bug 902165 is intended to fix that.  We should probably wait for bug 902165 to get fixed and test this again.
Depends on: 902165
I have no idea what has fixed these tests, but the try server says
they are fixed.  We always suspected that this is caused by a bug
in the mozapps code, and there has been some development there, and
elsewhere.
https://tbpl.mozilla.org/?tree=Try&rev=e01e1adb5f46

There's of course individual tests that I will examine separately.
Assignee: amarchesini → ehsan
Attachment #8462307 - Flags: review?(khuey)
https://hg.mozilla.org/integration/b2g-inbound/rev/7c5efe8157e0
Whiteboard: [test disabled on B2G][leave open]
https://hg.mozilla.org/mozilla-central/rev/7c5efe8157e0
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
See Also: → 1047522
You need to log in before you can comment on or make changes to this bug.