Closed
Bug 1272425
Opened 8 years ago
Closed 8 years ago
test_chrome_ext_downloads_misc.html is permafailing on cached downloads
Categories
(WebExtensions :: Untriaged, defect)
Tracking
(firefox49 fixed)
RESOLVED
FIXED
mozilla49
People
(Reporter: aswan, Assigned: aswan)
Details
(Whiteboard: [downloads] triaged)
Attachments
(3 files, 4 obsolete files)
As of https://hg.mozilla.org/integration/mozilla-inbound/rev/546b3763d1b6d2c3d64b34b9d9d25d346b805996, downloads created from the downloads tests are failing since the downloaded objects are cached.
Assignee
Comment 1•8 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/bdeeb5e0c9a6e77c57e0affa17e6b033e7e3ab9c Bug 1272425: disable partial download tests pending bug fix r=me
Assignee
Comment 2•8 years ago
MozReview-Commit-ID: 1ngLyDxvhRz
Assignee
Comment 3•8 years ago
Quick background: this test covers the WebExtensions downloads API, which uses DownloadCore.jsm to start downloads. It has a few different test cases that use a custom .sjs to create an object that we can reliably fetch only part of (it's a 30-byte object; the handler only sends the first 15 bytes for a regular GET, but handles Range headers to get bytes 16-30 in a subsequent request).

As of the change from the bug description, the second test case that tried to fetch this object began to fail. I thought it would be as simple as adding cache-control headers to prevent this object from getting cached, but that doesn't seem to work.

The wip patch I attached adds a dump statement to the onProgress handler that DownloadCore attaches to the http channel. When I run test_chrome_ext_downloads_misc.html with this patch, I can see the partial request go by with tcpdump on the loopback interface, but the dump statement is never printed for the second test case (the first test case passes, so of course the dump statement is displayed). I'm a little stumped, any thoughts?
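As a rough sketch of the serving behavior described above (the real interruptible.sjs is not shown in this bug; the function and constant names here are hypothetical, and this only models the byte accounting, not the httpd.js response API):

```javascript
// Hypothetical model of the test server's logic: a 30-byte object where a
// plain GET deliberately delivers only the first 15 bytes, while a Range
// request like "bytes=15-" delivers the remainder so a download can resume.
const BODY = "0123456789ABCDE0123456789ABCDE"; // 30 bytes total

function sliceForRequest(rangeHeader) {
  if (rangeHeader) {
    // Expect a header of the form "bytes=<start>-"
    const m = /^bytes=(\d+)-$/.exec(rangeHeader);
    if (m) {
      const start = Number(m[1]);
      return { status: 206, body: BODY.slice(start) };
    }
  }
  // Plain GET: truncate halfway through on purpose.
  return { status: 200, body: BODY.slice(0, BODY.length / 2) };
}
```

Under this model, the first request gets a 200 with only 15 bytes of body, and the resumed request sends `Range: bytes=15-` and gets the remaining 15 bytes with a 206.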
Flags: needinfo?(mcmanus)
Assignee
Updated•8 years ago
Keywords: leave-open
Comment 4•8 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/bdeeb5e0c9a6
Comment 5•8 years ago
(In reply to Andrew Swan [:aswan] from comment #3)
> I'm a little stumped, any thoughts?

I would have to dig into it to say more, but have you seen an NSPR log for it? That often helps. https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
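For reference, a capture using the NSPR logging environment variables from the linked MDN page might look like the following (the commented-out mochitest invocation is illustrative, not exact; adjust it to your tree and runner):

```shell
# Enable verbose HTTP logging as described on the MDN HTTP logging page,
# then run the failing test with the variables exported.
export NSPR_LOG_MODULES="timestamp,nsHttp:5"
export NSPR_LOG_FILE="/tmp/http.log"
# ./mach mochitest toolkit/components/extensions/test/mochitest/test_chrome_ext_downloads_misc.html
```

The resulting /tmp/http.log is what gets attached to bugs like this one.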
Flags: needinfo?(mcmanus)
Assignee
Updated•8 years ago
Whiteboard: [downloads] → [downloads] triaged
Assignee
Comment 6•8 years ago
I've attached a lengthy NSPR log. I took a glance and a couple of things stood out. There's this at line 4842:

> 2016-05-17 18:19:55.546208 UTC - [Main Thread]: D/nsHttp Must validate since response contains 'no-cache' header

I assume "validate" means "make a request to the origin server", but it doesn't appear to actually do that. Without seeing the "http request" logged, I do see the following in the vicinity of line 4850:

> 2016-05-17 18:19:55.547906 UTC - [Main Thread]: V/nsHttp Trading cache input stream for output stream [channel=12fc4f800]
> ...
> 2016-05-17 18:19:55.547940 UTC - [Main Thread]: V/nsHttp nsHttpChannel::OnDataAvailable [this=12fc4f800 request=11c2a88b0 offset=0 count=15]
> 2016-05-17 18:19:55.547943 UTC - [Main Thread]: V/nsHttp sending progress and status notification [this=12fc4f800 status=804b0006 progress=15/31]

This concerns me for two reasons:
1. There's a no-cache header, but data is actually being served from the cache?
2. It claims to be sending a progress notification, but we're not getting the notification.

Then this at line 4892:

> 2016-05-17 18:19:55.589116 UTC - [Main Thread]: V/nsHttp nsHttpChannel::Connect 128040800 AwaitingCacheCallbacks forces async

Again it appears to be waiting for the cache even though the object is marked no-cache. I let the test sit quietly deadlocked for about 10 seconds, but then noticed that when I quit the browser, during shutdown we finally got all this:

> 2016-05-17 18:20:09.266310 UTC - [Main Thread]: V/nsHttp nsHttpChannel::FinalizeCacheEntry [this=12fc4f800]
> 2016-05-17 18:20:09.266312 UTC - [Main Thread]: V/nsHttp calling OnStopRequest
> 2016-05-17 18:20:09.266424 UTC - [Main Thread]: V/nsHttp nsHttpChannel::CloseCacheEntry [this=12fc4f800] mStatus=80004004 mCacheEntryIsWriteOnly=1
> 2016-05-17 18:20:09.266454 UTC - [Main Thread]: V/nsHttp dooming cache entry!!
> 2016-05-17 18:20:09.268692 UTC - [Main Thread]: V/nsHttp nsHttpChannel::OnCacheEntryAvailable [this=128040800 entry=0 new=0 appcache=0 status=804b003d mAppCache=0 mAppCacheForWrite=0]
> 2016-05-17 18:20:09.268702 UTC - [Main Thread]: V/nsHttp nsHttpChannel::SetupTransaction [this=128040800]
> 2016-05-17 18:20:09.268713 UTC - [Main Thread]: D/nsHttp Creating nsHttpTransaction @11730e400
> 2016-05-17 18:20:09.268714 UTC - [Main Thread]: V/nsHttp nsHttpChannel 128040800 created nsHttpTransaction 11730e400
> 2016-05-17 18:20:09.268789 UTC - [Main Thread]: D/nsHttp nsHttpTransaction::Init [this=11730e400 caps=1]
> 2016-05-17 18:20:09.268824 UTC - [Main Thread]: I/nsHttp http request [
> 2016-05-17 18:20:09.268829 UTC - [Main Thread]: I/nsHttp GET http://mochi.test:8888/chrome/toolkit/components/extensions/test/mochitest/interruptible.sjs HTTP/1.1
> 2016-05-17 18:20:09.268831 UTC - [Main Thread]: I/nsHttp Host: mochi.test:8888
> 2016-05-17 18:20:09.268832 UTC - [Main Thread]: I/nsHttp User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) Gecko/20100101 Firefox/49.0
> 2016-05-17 18:20:09.268834 UTC - [Main Thread]: I/nsHttp Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> 2016-05-17 18:20:09.268835 UTC - [Main Thread]: I/nsHttp Accept-Language: en-US,en;q=0.5
> 2016-05-17 18:20:09.268836 UTC - [Main Thread]: I/nsHttp Accept-Encoding: gzip, deflate
> 2016-05-17 18:20:09.268838 UTC - [Main Thread]: I/nsHttp Connection: keep-alive
> 2016-05-17 18:20:09.268839 UTC - [Main Thread]: I/nsHttp ]

Which looks even more like that connection was stuck waiting for the cache.

For further background on this test: one test case (test_cancel) starts a GET request but cancels it after the response is half complete. The next test case (test_pauseresume) tries to fetch the same object. When it is working properly, we should repeat the same behavior (cancel the GET request after receiving half the data), then issue a new GET request with a Range header to get the remaining data.

I hope I'm not too far off base with the above, but I'm now out of my depth with this code. Can you take it further from here?
Flags: needinfo?(mcmanus)
Comment 7•8 years ago
There are times we block some channels on pending cache entries (hoping to reuse the cache entry). There are flags to avoid this (XHR uses them, for example). :mayhemer could give you more information with a focused question; he's the cache expert.
Flags: needinfo?(mcmanus)
Assignee
Comment 8•8 years ago
MozReview-Commit-ID: HhYr1wxsrW4
Assignee
Updated•8 years ago
Attachment #8751846 - Attachment is obsolete: true
Assignee
Comment 9•8 years ago
Attachment #8753460 - Attachment is obsolete: true
Assignee
Comment 10•8 years ago
The attached patch now has a much simpler test case that produces the same problem. I've also updated the NSPR log, so there's now less other stuff to get in the way. I think the observations from comment 6 still apply, though the line numbers from that comment are all different in the new patch.
Flags: needinfo?(honzab.moz)
Comment 11•8 years ago
I need two things:
- tell me what URL to look for (the one that fails)
- also add the cache2:5 module to the NSPR_LOG_MODULES variable
Flags: needinfo?(honzab.moz) → needinfo?(aswan)
Comment 12•8 years ago
(In reply to Honza Bambas (:mayhemer) from comment #11)
> I need two things:
> - tell me what URL to look for (the one that fails)
> - add also cache2:5 module to the NSPR_LOG_MODULES variable

(and rerun the test with the log captured ;))
Assignee
Comment 13•8 years ago
Thanks for taking a look, new log that includes cache2 is attached. The URL used by the test is http://mochi.test:8888/chrome/toolkit/components/extensions/test/mochitest/interruptible.sjs
Attachment #8753590 - Attachment is obsolete: true
Flags: needinfo?(aswan) → needinfo?(honzab.moz)
Comment 14•8 years ago
Thanks. The log says the following:

- The first channel reads from the server and gets the following response (which is apparently non-resumable, btw, but more about that below):

2016-05-18 17:53:46.943045 UTC - [Socket Thread]: I/nsHttp http response [
2016-05-18 17:53:46.943053 UTC - [Socket Thread]: I/nsHttp HTTP/1.1 200 OK
2016-05-18 17:53:46.943056 UTC - [Socket Thread]: I/nsHttp Content-Type: text/plain
2016-05-18 17:53:46.943057 UTC - [Socket Thread]: I/nsHttp Content-Length: 31
2016-05-18 17:53:46.943059 UTC - [Socket Thread]: I/nsHttp Cache-Control: no-cache
2016-05-18 17:53:46.943061 UTC - [Socket Thread]: I/nsHttp Connection: close
2016-05-18 17:53:46.943063 UTC - [Socket Thread]: I/nsHttp Server: httpd.js
2016-05-18 17:53:46.943065 UTC - [Socket Thread]: I/nsHttp Date: Wed, 18 May 2016 17:53:46 GMT
2016-05-18 17:53:46.943066 UTC - [Socket Thread]: I/nsHttp ]

- The second channel is created while the first is still waiting for the response.
- The second channel is blocked waiting for the cache entry because the first one is still writing to it (and the entry needs to be revalidated):

2016-05-18 17:53:46.963366 UTC - [Main Thread]: V/nsHttp wait for entry completion, response is not resumable
2016-05-18 17:53:46.963393 UTC - [Main Thread]: D/nsHttp Must validate since response contains 'no-cache' header

- The first channel's transaction's connection gets closed for some reason with ABORT (4004), about 20 seconds after the last activity:

Patrick? ^^^

2016-05-18 17:54:06.020247 UTC - [Socket Thread]: V/nsHttp nsHttpConnection::Close [this=12c8e6460 reason=80004004]

- The cache entry is incomplete and non-resumable (it never was!) and hence is not reused by the second channel.

To make a response resumable you must add the following headers:

* Accept-Ranges: bytes
* ETag: myetag

...and not "no-cache", which just means to always validate with the server, even if the cached response is not stale.
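A minimal sketch of the header fix comment 14 describes, expressed as a plain header map (hypothetical helper name; the real .sjs would set each of these on its response object):

```javascript
// Headers that mark a response as resumable per comment 14: advertise
// byte-range support and provide a stable validator. The etag value is
// the caller's choice; "no-cache" is deliberately absent.
function resumableHeaders(etag) {
  return {
    "Content-Type": "text/plain",
    "Accept-Ranges": "bytes", // server accepts Range: bytes=... requests
    "ETag": etag,             // validator so a ranged re-request can be matched
  };
}
```

With these headers, a half-written cache entry can be completed by a later Range request instead of blocking the second channel.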
Flags: needinfo?(honzab.moz) → needinfo?(mcmanus)
Assignee
Comment 15•8 years ago
(In reply to Honza Bambas (:mayhemer) from comment #14)
> - the second channel is created while the first is still waiting for the response
> - the second channel is blocked on waiting for the cache entry because the first one is still writing to it (and the entry needs to be revalidated)

Okay, let me see if I can track this down; I thought that the first channel should have been closed before the second one was created.

But the second point is still confusing me. Knowing nothing about how the cache is organized, I would have expected the entry for this URL to just be something indicating that it is uncacheable, and once the headers have been received there is enough information to finalize that entry. I'm surprised that the (uncacheable) data has to be fully received to finalize the cache entry.

> - the first channel's transaction's connection gets from some reason closed with ABORT (4004), it's about 20 seconds after the last activity:
>
> Patrick? ^^^

I let the test hang around deadlocked for about 20 seconds before killing it, so I think the ABORT must have come from the shutdown process; that doesn't seem surprising to me.
Flags: needinfo?(mcmanus)
Comment 16•8 years ago
Re cache behavior: that would mean explaining how the HTTP cache works; long story.

Re connection cancel: aha! I missed that you killed the test. OK, then the problem is apparently in the server: it doesn't close the connection, and the channel hangs forever waiting for the remaining 15 or so bytes. The sjs has to kill the connection. HTTP/2 will just wait indefinitely. The test is simply wrong.

BTW, wasn't your test the last one before Patrick added the two more tests of his?
Comment 17•8 years ago
(In reply to Honza Bambas (:mayhemer) from comment #16)
> BTW, wasn't your test the last one before patrick added the two more tests of his?

Ignore this line; the structure of the test is apparently a bit different than I thought.
Assignee
Comment 18•8 years ago
(In reply to Honza Bambas (:mayhemer) from comment #16)
> Re connection cancel: aha! I missed that you killed the test. OK, then the problem is apparently in the server - it doesn't close the connection and the channel hangs forever waiting for the remaining 15 or so bytes. The sjs has to kill the connection. HTTP/2 will just wait indefinitely. The test is simply wrong.

Ah, so the fact that we've closed the connection from the client side doesn't matter as long as the server hasn't closed it? This seems potentially bad: a buggy server could trigger this scenario for a "real" download or fetch. But regardless, I should be able to fix this now...
Comment 19•8 years ago
The test is trying to simulate a partial response to the first (non-range) request: it returns Content-Length: 31 but only delivers 15 bytes. It goes via HTTP/2, AFAIU. In the HTTP/1 world a partial response may only happen when the connection is torn down or the server is killed (whatever that means) before it finishes sending the whole response to the client. How to simulate a partial response in HTTP/2 is something I don't know; contact either Patrick (our HTTP/2 guru) or the author of the tests.
Assignee
Comment 20•8 years ago
Review commit: https://reviewboard.mozilla.org/r/53634/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/53634/
Attachment #8757126 - Flags: review?(kmaglione+bmo)
Assignee
Comment 21•8 years ago
See the bug for extended discussion; trying to fix this with re-used URLs doesn't seem practical, so just use unique URLs for partial downloads.

Review commit: https://reviewboard.mozilla.org/r/55674/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/55674/
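The "unique URLs" approach from this comment can be sketched as follows (the helper name and query-parameter name are hypothetical, not taken from the actual patch):

```javascript
// Sketch of the fix's idea: give each partial-download test case its own
// URL so it gets a fresh cache entry and can never block on a half-written
// entry left behind by an earlier, cancelled request.
let counter = 0;
function uniqueURL(base) {
  counter += 1;
  // A monotonic counter plus a timestamp keeps URLs distinct even across
  // fast successive calls within the same millisecond.
  return `${base}?nocache=${counter}-${Date.now()}`;
}
```

The design choice here is to sidestep the cache-contention problem entirely rather than tune cache headers, which comment 14 showed is fragile for non-resumable, partially-written entries.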
Comment 22•8 years ago
Comment on attachment 8757126 [details] MozReview Request: Bug 1272425 Fix downloads misc tests r?kmag https://reviewboard.mozilla.org/r/55674/#review52558
Attachment #8757126 - Flags: review?(kmaglione+bmo) → review+
Comment 23•8 years ago
https://reviewboard.mozilla.org/r/53634/#review52560 You can avoid creating review requests for temporary commits by passing -c to `hg push`.
Assignee
Updated•8 years ago
Attachment #8757125 - Attachment is obsolete: true
Assignee
Comment 24•8 years ago
(In reply to Kris Maglione [:kmag] from comment #23)
> You can avoid creating review requests for temporary commits by passing -c to `hg push`.

Yeah, I absent-mindedly did -r instead; let me fix that.
Assignee
Updated•8 years ago
Keywords: leave-open
Comment 26•8 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/29c8e0e38c63
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox49: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Updated•6 years ago
Product: Toolkit → WebExtensions