Fix and re-enable dom/downloads tests on B2G

RESOLVED WORKSFORME

Status

()

defect
P1
normal
RESOLVED WORKSFORME
5 years ago
4 months ago

People

(Reporter: RyanVM, Unassigned)

Tracking

({intermittent-failure})

unspecified
ARM
Gonk (Firefox OS)
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(tracking-b2g:backlog)

Details

(Whiteboard: [systemsfe][p=8][priority])

Attachments

(1 attachment)

The dom/downloads tests are timeout-prone, often leading to the test suite aborting the run. The timeouts are frequent enough to cause the mochitest suite they run in to not meet visibility standards. Therefore, they are being disabled until they can be fixed and re-enabled.
Note that these tests are failure prone in both emulator and B2G desktop builds.

https://hg.mozilla.org/integration/mozilla-inbound/rev/d4b5b3c9f947
Keywords: leave-open
Hi all, I'll try and look into this again ASAP.
Status: NEW → ASSIGNED
Whiteboard: [systemfe][p=8]
Actually, Ted is going to look at this. :)
Assignee: nobody → tclancy
Whiteboard: [systemfe][p=8] → [systemsfe][p=8]
Target Milestone: --- → 1.4 S3 (14mar)
Attachment #8387581 - Flags: review?(ryanvm) → review+
Comment 6 only re-disabled them on desktop builds. Should have been all B2G builds.
https://hg.mozilla.org/integration/b2g-inbound/rev/395537cf3310
Target Milestone: 1.4 S3 (14mar) → 1.4 S4 (28mar)
Whiteboard: [systemsfe][p=8] → [systemsfe][p=8][priority]
blocking-b2g: --- → backlog
Target Milestone: 1.4 S4 (28mar) → ---
Priority: -- → P1
It's probably worth calling out that these tests don't run *anywhere* since dom/downloads only gets built "if CONFIG['MOZ_B2G']" (per dom/moz.build) and the mochitest.ini is now "skip-if = buildapp == 'mulet' || buildapp == 'b2g'"
Assignee: tclancy → nobody
(setting to NEW since there is no longer an assignee)

So, all the referenced logs are now dead and the TBPL robot excerpts aren't super helpful, but while running these tests on a local emulator-kk build I am seeing test_downloads_basic.html fail on this assertion intermittently:

15 INFO TEST-UNEXPECTED-FAIL | dom/downloads/tests/test_downloads_basic.html | Download current size is zero - got 1024, expected 0

Which seems to be based on the assumption that the add event will be emitted before any download progress happens.  This is clearly incorrect per the following debug observed from the same run:

-*- DownloadsAPI.jsm : onDownloadAdded ({totalBytes:1024, currentBytes:1024, url:"http://mochi.test:8888/tests/dom/downloads/tests/serve_file.sjs?contentType=application/octet-stream&size=1024", path:"/storage/sdcard/downloads/test.bin", contentType:"application/octet-stream", startTime:1424187832007, sourceAppManifestURL:(void 0), id:"download-1", state:"downloading"})

It seems like if the test really wants to assert that check, it needs to be able to be able to communicate with serve_file.sjs to tell it to release the data only after it hears about the download starting.  Unfortunately that's a bit of a hassle since a dummy XPCOM object with wrappedJSObject would need to be used.  Right now server_file.sjs uses a sketchy hard-coded 1000ms delay before it starts writing, which explains why this would be so intermittent.


I find the whole download codebase a bit confusing in terms of what the normal control flow is.  Handy for anyone else looking into this in the future is that test_downloads_basic.html's click() event results in the following toSerializable() download representation being notified (emphasis on the "copy" saver being used rather than the "legacy" saver):
{source:{url:"http://mochi.test:8888/tests/dom/downloads/tests/serve_file.sjs?contentType=application/octet-stream&size=1024", referrer:"http://mochi.test:8888/tests/dom/downloads/tests/test_downloads_basic.html"}, target:{path:"/storage/sdcard/downloads/test.bin", partFilePath:"/storage/sdcard/downloads/test.bin.part"}, saver:{type:"copy", entityID:"/1024/"}, startTime:"2015-02-17T15:49:05.850Z", tryToKeepPartialData:true, contentType:"application/octet-stream"}
Status: ASSIGNED → NEW
I reproduced the test_downloads_large.html failure too that looks like:
93 INFO TEST-UNEXPECTED-FAIL | dom/downloads/tests/test_downloads_large.html | Start with an empty download list. - expected PASS

At least in my reproduction, the problem here is that the mozDownloads API thinks removeFinished() returns a promise.  It does not, even though it's using Task.spawn, which is probably an oversight.  So clearAllDone's subsequent call to list.getAll() "races" the highly-promise-using removeFinished() and remove() implementations.  Race is probably the wrong phrase since it seems more like a deterministic interleaving since none of them involve any I/O.  If the download list is long enough, it seems certain for clearAllDone to fail.  (Say, like if I added a new test that adds a download.)  This probably explains the test_downloads_pause_resume failure instances that follow a test_downloads_pause_remove timeout, since that test tries to remove a download, and if it fails, it could push the following test over the limit.
:paolo, re: my comment 12 here, removeFinished uses Task.spawn but does not return a promise.  see: 
https://dxr.mozilla.org/mozilla-central/source/toolkit/components/jsdownloads/src/DownloadList.jsm#235

Would it be okay to modify that code to just return the Task.spawn promise?  (I'm basically looking for an rubber-stamp on making just that change as part of a bigger patch that might make these mozDownload tests happier.
Flags: needinfo?(paolo.mozmail)
Hi Andrew, thanks for looking into this. I wasn't involved in the implementation of the DOM API for B2G but took a quick look around and I see your overall task might more complex than it initially seemed (although this specific issue about removeFinished seems to have a clear way forward).

(In reply to Andrew Sutherland [:asuth] from comment #13)
> removeFinished uses Task.spawn but does not return a promise.
> Would it be okay to modify that code to just return the Task.spawn promise?

removeFinished does not return anything by design (note how we catch and report any async errors to the console). The UI is expected to be updated because it receives removal notifications, and there is no special event to wait for after removeFinished is called. This is because the call cannot guarantee, by the time it ends, that all downloads in the list are not finished, since other downloads may have finished in the meantime.

Tests should just check that the expected notifications happen. See how we test the function in the back-end:

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/jsdownloads/test/unit/test_DownloadList.js#405

In fact, the caller of clearAllDone does not expect it to return a Promise as well, and the mock object in Gaia tests also returns undefined:

http://mxr.mozilla.org/gaia/search?string=clearAllDone

(There's a comment about bug 1097435 but I don't think it's related to anything in this bug).

I think our internal mozDownloadManager.clearAllDone DOM API should just be changed to return void.

In fact, the current implementation does not do what is expected anyways. The documentation says the promise resolves with the list of removed downloads:

http://mxr.mozilla.org/mozilla-central/source/dom/downloads/DownloadsAPI.jsm#190

But it currently resolves with the list of _remaining_ downloads:

http://mxr.mozilla.org/mozilla-central/source/dom/webidl/Downloads.webidl#39

Fortunately we ignore the result.

Two drive-by comments on the above code as well:
1) The second "list = yield Downloads.getList(Downloads.ALL);" just returns the same DownloadList object as the first call. A snapshot is taken only when calling the getAll method, see <https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/Downloads.jsm/DownloadList#getAll%28%29>.
2) Error handling is not quite right in that if an error occurs, we report it to the console but don't send a message back to the other process, so we keep a dangling promise in the global list in the other process.
Flags: needinfo?(paolo.mozmail)
(In reply to :Paolo Amadini from comment #14)
> http://mxr.mozilla.org/mozilla-central/source/dom/downloads/DownloadsAPI.
> jsm#190
> http://mxr.mozilla.org/mozilla-central/source/dom/webidl/Downloads.webidl#39

Obviously these links were swapped.

I'd also add that changing removeFinished to return a Promise, even assuming it was the right thing which isn't, would not be a quick fix at all because all the consumers would need to be updated to wait on the returned promise and report any errors, and specific tests would need to be added for the new behavior. And we'd have in some way to include the finalization of each download in the equation as well, which we can just ignore for now.
(In reply to Andrew Sutherland [:asuth] from comment #11)
> Which seems to be based on the assumption that the add event will be emitted
> before any download progress happens.  This is clearly incorrect

Yeah, DownloadList and Download status events are completely orthogonal.

> It seems like if the test really wants to assert that check, it needs to be
> able to be able to communicate with serve_file.sjs to tell it to release the
> data only after it hears about the download starting.  Unfortunately that's
> a bit of a hassle since a dummy XPCOM object with wrappedJSObject would need
> to be used.  Right now server_file.sjs uses a sketchy hard-coded 1000ms
> delay before it starts writing, which explains why this would be so
> intermittent.

Ouch. See our httpd.js based implementation and registerInterruptibleHandler() / continueResponses().

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/jsdownloads/test/unit/head.js#588

Actually, you may be able to simplify things by updating tests to focus on the DOM API and avoid checking anything that the back-end already guarantees for you (I haven't looked at the code in detail though, so I'm not talking necessarily about this specific case).

> (emphasis on the "copy" saver being used rather than the "legacy" saver):

It would be great to get rid of the "legacy" saver, but we still need to use it for all downloads initiated from existing request channels. Legacy downloads that are restarted, or new downloads that are initiated directly using the Downloads.jsm API, use the "copy" saver by default instead. That's why a "legacy" saver serializes as a "copy" saver, "legacy" savers can't be deserialized meaningfully.
Thanks for the super-prompt and thorough response!

(In reply to :Paolo Amadini from comment #14)
> (There's a comment about bug 1097435 but I don't think it's related to
> anything in this bug).

Agreed, although I don't actually understand how the situation the bug implies could happen based on the current implementation.  Maybe it got fixed?

> I think our internal mozDownloadManager.clearAllDone DOM API should just be
> changed to return void.

Okay, I'll do this and fix-up the test to use a helper method that operates in the same fashion as the test you linked to.  Specifically, it will:
- run getList
- run clearAllDone, expecting each known DOMDownload in the "succeded"/"stopped" states to experience a change event that marks it finalized.  (mozDownloadManager has no other notification of removal.)  Any download in the "downloading" state will generate a failure since it means another test smeared into the the test in question in an unexpected fashion that will likely break the test further.

This should allow test_downloads_large and its friends to still test getDownloads() the way they expect and arguably also serve as a better check of clearAllDone than what exists now.  (Although, as you say elsewhere, I'm not sure mozDownloadManager really needs to be second-guessing jsdownloads.)
 
> In fact, the current implementation does not do what is expected anyways.
> The documentation says the promise resolves with the list of removed
> downloads:

Yeah, I saw that too and updated the comment in my patch on bug 825318 that adds the adoptDownload method to mozDownloadManager.  I'll update it now to say it's void and such.

> Two drive-by comments on the above code as well:
> 1) The second "list = yield Downloads.getList(Downloads.ALL);" just returns
> the same DownloadList object as the first call. A snapshot is taken only
> when calling the getAll method, see
> <https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/
> Downloads.jsm/DownloadList#getAll%28%29>.

(This will be mooted by the change.)

> 2) Error handling is not quite right in that if an error occurs, we report
> it to the console but don't send a message back to the other process, so we
> keep a dangling promise in the global list in the other process.

I think this will also be mooted since the function will no longer need to return a value or have a tracking promise, etc.

(In reply to :Paolo Amadini from comment #16)
> Ouch. See our httpd.js based implementation and
> registerInterruptibleHandler() / continueResponses().
> 
> http://mxr.mozilla.org/mozilla-central/source/toolkit/components/jsdownloads/
> test/unit/head.js#588

Yes, the xpcshell tests seem to be doing the right thing there, but it's also simpler.  I think you grokked what I said, but for clarity, I was referencing https://developer.mozilla.org/en-US/docs/Mochitest#How_do_I_keep_state_across_loads_of_different_server-side_scripts.3F which indicates we need to jump through some hoops since the mochitest implementation involves significantly more sandboxes and execution contexts.
 
> Actually, you may be able to simplify things by updating tests to focus on
> the DOM API and avoid checking anything that the back-end already guarantees
> for you (I haven't looked at the code in detail though, so I'm not talking
> necessarily about this specific case).

Yes.  I've addressed this issue now by not checking on the currentBytes size in the test and adding a comment about how the check was not expected to be reliable.

> > (emphasis on the "copy" saver being used rather than the "legacy" saver):
> 
> It would be great to get rid of the "legacy" saver, but we still need to use
> it for all downloads initiated from existing request channels. Legacy
> downloads that are restarted, or new downloads that are initiated directly
> using the Downloads.jsm API, use the "copy" saver by default instead. That's
> why a "legacy" saver serializes as a "copy" saver, "legacy" savers can't be
> deserialized meaningfully.

Interesting!  Mainly, I had, and still have no idea how the b2g download started via a DOM click() was triggering the download to start.  I had missed the nuance that the legacy saver serializes as a copy saver, though, so maybe that explains things.  (Not that I was trying that hard to find out :)
(In reply to Andrew Sutherland [:asuth] from comment #17)
> Thanks for the super-prompt and thorough response!

Thanks to you for working on this :-)

> we need to jump through some hoops since the mochitest
> implementation involves significantly more sandboxes and execution contexts.

Yeah, unfortunately mochitest is often more complicated than xpcshell, and can't reuse most code from other suites.

> Mainly, I had, and still have no idea how the b2g download
> started via a DOM click() was triggering the download to start.

If you're curious, when a "Content-Disposition" header or the "download" attribute on the "a" element is seen, the load is redirected through "uriloader/exthandler/nsExternalHelperAppService.cpp", that then triggers the Downloads code using the nsITransfer interface. One day I'll write a wiki page instead of saying this on bugs every time :-)
I'm experiencing some OS.File.stat problems in the emulator debug build that is making the adoptDownload test intermittently timeout now that we check which is my current blocker.  (try closures also caused some delays)

A "good" run looks like this where it takes us 3 seconds to actually run the expected-to-fail OS.File.stat call:
16:11:07     INFO -  -*- DownloadsAPI.jsm : stat
16:11:10     INFO -  -*- DownloadsAPI.jsm : error branch
(from http://ftp.mozilla.org/pub/mozilla.org/b2g/try-builds/bugmail@asutherland.org-1d8c791a8b6b/try-emulator-debug/try_ubuntu64_vm-b2g-emulator-debug_test-mochitest-debug-7-bm116-tests1-linux64-build9.txt.gz)

A bad run just seems to have stuff disappear into the aether:
02:15:37     INFO -  -*- DownloadsAPI.jsm : stat
02:16:29     INFO -  [Parent 702] WARNING: A control runnable was posted to a worker that is already shutting down!: file ../../../gecko/dom/workers/WorkerPrivate.cpp, line 2417
(from http://ftp.mozilla.org/pub/mozilla.org/b2g/try-builds/bugmail@asutherland.org-1d8c791a8b6b/try-emulator-debug/try_ubuntu64_vm-b2g-emulator-debug_test-mochitest-debug-7-bm115-tests1-linux64-build18.txt.gz)

I'm spinning a local debug emulator build so I can ideally figure out what is going on.  I have also formulated the backup plan of becoming some type of reclusive hermit.
(In reply to Andrew Sutherland [:asuth] from comment #19)
> I'm experiencing some OS.File.stat problems in the emulator debug build that
> is making the adoptDownload test intermittently timeout now that we check
> which is my current blocker.  (try closures also caused some delays)

I identified a bug in OS.File on b2g that particularly likes to manifest on emulators because they are so sloooooooow.  It is bug 1135113.  I think I know how to fix the problem, but I'm still liking the hermit idea.
(clearing dep; that bug actually got duped across to bug 1125989, but the OS.File usage is specific to my adoptDownload enhancement on bug 825318 so this bug shouldn't depend on OS.File working)
No longer depends on: 1135113
I believe I've corrected the intermittent failures in this test with https://hg.mozilla.org/mozilla-central/rev/4d0ef24554ca from https://bugzilla.mozilla.org/show_bug.cgi?id=825318#c116

Resolving WFM.  Please reopen or refile if this comes back.

Do note that, as also described on https://bugzilla.mozilla.org/show_bug.cgi?id=825318#c109, we do not run the tests on b2g-desktop anymore because we do not expect mozDownloadManager to work on b2g-desktop correctly until bug 1130264 is resolved.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME
blocking-b2g: backlog → ---
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.