Closed Bug 1272425 Opened 8 years ago Closed 8 years ago

test_chrome_ext_downloads_misc.html is permafailing on cached downloads

Categories

(WebExtensions :: Untriaged, defect)

defect
Not set
normal

Tracking

(firefox49 fixed)

RESOLVED FIXED
mozilla49
Tracking Status
firefox49 --- fixed

People

(Reporter: aswan, Assigned: aswan)

Details

(Whiteboard: [downloads] triaged)

Attachments

(3 files, 4 obsolete files)

As of https://hg.mozilla.org/integration/mozilla-inbound/rev/546b3763d1b6d2c3d64b34b9d9d25d346b805996, downloads created from the downloads tests are failing since the downloaded objects are cached.
Attached patch wip (obsolete) — Splinter Review
MozReview-Commit-ID: 1ngLyDxvhRz
Quick background, this test covers the webextensions downloads api which uses DownloadCore.jsm to start downloads.  It has a few different test cases that use a custom .sjs to create an object that we can reliably fetch only part of (it's a 30 byte object and the handler only sends the first 15 bytes for a regular GET but handles Range headers to get bytes 16-30 in a subsequent request).

As of the change from the bug description, the second test case that tried to fetch this object began to fail.  I thought it would be as simple as adding cache-control headers to prevent this object from getting cached.  But that doesn't seem to work.  The wip patch I attached adds a dump statement to the onProgress handler that DownloadCore attaches to the http channel.  When I run test_chrome_ext_downloads_misc.html with this patch, I can see the partial request go by with tcpdump on the loopback interface, but the dump statement is never printed for the second test case (the first test case passes so of course the dump statement is displayed).

I'm a little stumped, any thoughts?
Flags: needinfo?(mcmanus)
Keywords: leave-open
(In reply to Andrew Swan [:aswan] from comment #3)
> Quick background, this test covers the webextensions downloads api which
> uses DownloadCore.jsm to start downloads.  It has a few different test cases
> that use a custom .sjs to create an object that we can reliably fetch only
> part of (it's a 30 byte object and the handler only sends the first 15 bytes
> for a regular GET but handles Range headers to get bytes 16-30 in a
> subsequent request).
> 
> As of the change from the bug description, the second test case that tried
> to fetch this object began to fail.  I thought it would be as simple as
> adding cache-control headers to prevent this object from getting cached. 
> But that doesn't seem to work.  The wip patch I attached adds a dump
> statement to the onProgress handler that DownloadCore attaches to the http
> channel.  When I run test_chrome_ext_downloads_misc.html with this patch, I
> can see the partial request go by with tcpdump on the loopback interface,
> but the dump statement is never printed for the second test case (the first
> test case passes so of course the dump statement is displayed).
> 
> I'm a little stumped, any thoughts?

I would have to dig into it to say more - but have you seen an NSPR log for it? That often helps.

https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
Flags: needinfo?(mcmanus)
Whiteboard: [downloads] → [downloads] triaged
Attached file HTTP Log (obsolete) —
I've attached a lengthy NSPR log.
I took a glance and a couple of things stood out.  There's this at line 4842:
> 2016-05-17 18:19:55.546208 UTC - [Main Thread]: D/nsHttp Must validate since response contains 'no-cache' header

I assume "validate" means "make a request to the origin server" but it doesn't appear to actually do that.  Without seeing the "http request" logged, I do see the following in the vicinity of line 4850:

> 2016-05-17 18:19:55.547906 UTC - [Main Thread]: V/nsHttp Trading cache input stream for output stream [channel=12fc4f800]
...
> 2016-05-17 18:19:55.547940 UTC - [Main Thread]: V/nsHttp nsHttpChannel::OnDataAvailable [this=12fc4f800 request=11c2a88b0 offset=0 count=15]
> 2016-05-17 18:19:55.547943 UTC - [Main Thread]: V/nsHttp sending progress and status notification [this=12fc4f800 status=804b0006 progress=15/31]

This concerns me for two reasons: 1. there's a no-cache header but data is actually being served from cache?  2. It claims to be sending progress notification but we're not getting the notification.

Then this at line 4892:
> 2016-05-17 18:19:55.589116 UTC - [Main Thread]: V/nsHttp nsHttpChannel::Connect 128040800 AwaitingCacheCallbacks forces async

Again it appears to be waiting for the cache even though the object is marked no-cache.

I let the test sit quietly deadlocked for about 10 seconds but then noticed then when I quite the browser, during shutdown we finally got all this:

> 2016-05-17 18:20:09.266310 UTC - [Main Thread]: V/nsHttp nsHttpChannel::FinalizeCacheEntry [this=12fc4f800]
> 2016-05-17 18:20:09.266312 UTC - [Main Thread]: V/nsHttp   calling OnStopRequest
> 2016-05-17 18:20:09.266424 UTC - [Main Thread]: V/nsHttp nsHttpChannel::CloseCacheEntry [this=12fc4f800] mStatus=80004004 mCacheEntryIsWriteOnly=1
> 2016-05-17 18:20:09.266454 UTC - [Main Thread]: V/nsHttp   dooming cache entry!!
> 2016-05-17 18:20:09.268692 UTC - [Main Thread]: V/nsHttp nsHttpChannel::OnCacheEntryAvailable [this=128040800 entry=0 new=0 appcache=0 status=804b003d mAppCache=0 mAppCacheForWrite=0]
> 2016-05-17 18:20:09.268702 UTC - [Main Thread]: V/nsHttp nsHttpChannel::SetupTransaction [this=128040800]
> 2016-05-17 18:20:09.268713 UTC - [Main Thread]: D/nsHttp Creating nsHttpTransaction @11730e400
> 2016-05-17 18:20:09.268714 UTC - [Main Thread]: V/nsHttp nsHttpChannel 128040800 created nsHttpTransaction 11730e400
> 2016-05-17 18:20:09.268789 UTC - [Main Thread]: D/nsHttp nsHttpTransaction::Init [this=11730e400 caps=1]
> 2016-05-17 18:20:09.268824 UTC - [Main Thread]: I/nsHttp http request [
> 2016-05-17 18:20:09.268829 UTC - [Main Thread]: I/nsHttp   GET http://mochi.test:8888/chrome/toolkit/components/extensions/test/mochitest/interruptible.sjs HTTP/1.1
> 2016-05-17 18:20:09.268831 UTC - [Main Thread]: I/nsHttp   Host: mochi.test:8888
> 2016-05-17 18:20:09.268832 UTC - [Main Thread]: I/nsHttp   User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) Gecko/20100101 Firefox/49.0
> 2016-05-17 18:20:09.268834 UTC - [Main Thread]: I/nsHttp   Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> 2016-05-17 18:20:09.268835 UTC - [Main Thread]: I/nsHttp   Accept-Language: en-US,en;q=0.5
> 2016-05-17 18:20:09.268836 UTC - [Main Thread]: I/nsHttp   Accept-Encoding: gzip, deflate
> 2016-05-17 18:20:09.268838 UTC - [Main Thread]: I/nsHttp   Connection: keep-alive
> 2016-05-17 18:20:09.268839 UTC - [Main Thread]: I/nsHttp ]

Which looks even more like that connection was stuck waiting for the cache.
For further background on this test, one test case (test_cancel) starts a GET request but after the response is half complete, it cancels the request.  The next test case (test_pauseresume) tries to fetch the same object.  When it is working properly, we should get repeat the same behavior (cancel the GET request after receiving half the data), then issue a new GET request with a Range header to get the remaining data.

I hope I'm not too far off base with the above but I'm now out of my depth with this code, can you take it further from here?
Flags: needinfo?(mcmanus)
there are times we block some channels on pending cache entries (hoping to reuse the cache entry). There are flags to avoid this (XHR uses it, for example).

:mayhemer could give you more information with a focused question - he's the cache expert.
Flags: needinfo?(mcmanus)
Attached patch wipSplinter Review
MozReview-Commit-ID: HhYr1wxsrW4
Attachment #8751846 - Attachment is obsolete: true
Attached file HTTP Log (obsolete) —
Attachment #8753460 - Attachment is obsolete: true
The attached patch now has a much simpler test case that produces the same problem.  I've also updated the NSPR log so there's now less other stuff to get in the way.  I think the observations from comment 6 still apply though the line numbers from that comment are all different in the new patch.
Flags: needinfo?(honzab.moz)
I need two things:
- tell me what URL to look for (the one that fails)
- add also cache2:5 module to the NSPR_LOG_MODULES variable
Flags: needinfo?(honzab.moz) → needinfo?(aswan)
(In reply to Honza Bambas (:mayhemer) from comment #11)
> I need two things:
> - tell me what URL to look for (the one that fails)
> - add also cache2:5 module to the NSPR_LOG_MODULES variable

(and rerun the test with log captured ;))
Attached file nspr.txt
Thanks for taking a look, new log that includes cache2 is attached.  The URL used by the test is
http://mochi.test:8888/chrome/toolkit/components/extensions/test/mochitest/interruptible.sjs
Attachment #8753590 - Attachment is obsolete: true
Flags: needinfo?(aswan) → needinfo?(honzab.moz)
Thanks.  The log says the following:
- the first channel reads from the server
- gets the following response (which is apparently non-resumable btw, but about that more below)

2016-05-18 17:53:46.943045 UTC - [Socket Thread]: I/nsHttp http response [
2016-05-18 17:53:46.943053 UTC - [Socket Thread]: I/nsHttp   HTTP/1.1 200 OK
2016-05-18 17:53:46.943056 UTC - [Socket Thread]: I/nsHttp   Content-Type: text/plain
2016-05-18 17:53:46.943057 UTC - [Socket Thread]: I/nsHttp   Content-Length: 31
2016-05-18 17:53:46.943059 UTC - [Socket Thread]: I/nsHttp   Cache-Control: no-cache
2016-05-18 17:53:46.943061 UTC - [Socket Thread]: I/nsHttp   Connection: close
2016-05-18 17:53:46.943063 UTC - [Socket Thread]: I/nsHttp   Server: httpd.js
2016-05-18 17:53:46.943065 UTC - [Socket Thread]: I/nsHttp   Date: Wed, 18 May 2016 17:53:46 GMT
2016-05-18 17:53:46.943066 UTC - [Socket Thread]: I/nsHttp ]

- the second channel is created while the first is still waiting for the response
- the second channel is blocked on waiting for the cache entry because the first one is still writing to it (and the entry needs to be revalidated):

2016-05-18 17:53:46.963366 UTC - [Main Thread]: V/nsHttp   wait for entry completion, response is not resumable
2016-05-18 17:53:46.963393 UTC - [Main Thread]: D/nsHttp Must validate since response contains 'no-cache' header

- the first channel's transaction's connection gets from some reason closed with ABORT (4004), it's about 20 seconds after the last activity:

Patrick? ^^^

2016-05-18 17:54:06.020247 UTC - [Socket Thread]: V/nsHttp nsHttpConnection::Close [this=12c8e6460 reason=80004004]

- the cache entry is incomplete and non-resumable (never was!) and hence is not reused by the second channel


To make a response resumable you must add following headers:
* Accept-ranges: bytes
* ETag: myetag

..and not "no-cache", which just means to always validate with the server, even if the cached response is not stale.
Flags: needinfo?(honzab.moz) → needinfo?(mcmanus)
(In reply to Honza Bambas (:mayhemer) from comment #14)
> - the second channel is created while the first is still waiting for the
> response
> - the second channel is blocked on waiting for the cache entry because the
> first one is still writing to it (and the entry needs to be revalidated):

Okay, let me see if I can track this down, I thought that the first channel should have been closed before the second one was created.  But the second point is still confusing me, knowing nothing about the cache is organized, I would have expected the entry for this URL to just be something indicating that it is uncacheable, and once the headers have been received there is enough information to finalize that entry.  I'm surprised that the (uncacheable) data has to be fully received to finalize the cache entry.

> - the first channel's transaction's connection gets from some reason closed
> with ABORT (4004), it's about 20 seconds after the last activity:
> 
> Patrick? ^^^

I let the test hang around deadlocked for about 20 seconds before killing it, I think the ABORT must have come from the shutdown process, that doesn't seem surprising to me.
Flags: needinfo?(mcmanus)
Re cache behavior: it would mean to explain how the http cache works, long story.
Re connection cancel: aha!  I missed that you killed the test.  OK, then the problem is apparently in the server - it doesn't close the connection and the channel hangs forever waiting for the remaining 15 or so bytes.  The sjs has to kill the connection.  HTTP/2 will just wait indefinitely.  The test is simply wrong.

BTW, wasn't your test the last one before patrick added the two more tests of his?
(In reply to Honza Bambas (:mayhemer) from comment #16)
> BTW, wasn't your test the last one before patrick added the two more tests
> of his?

Ignore this line, the structure of the test is apparently a bit different than I've thought.
(In reply to Honza Bambas (:mayhemer) from comment #16)
> Re connection cancel: aha!  I missed that you killed the test.  OK, then the
> problem is apparently in the server - it doesn't close the connection and
> the channel hangs forever waiting for the remaining 15 or so bytes.  The sjs
> has to kill the connection.  HTTP/2 will just wait indefinitely.  The test
> is simply wrong.

Ah, so the fact that we've closed the connection from the client side doesn't matter as long as the server hasn't closed?  This seems potentially bad, a buggy server could trigger this scenario for a "real" download or fetch.  But regardless, I should be able to fix this now...
The test is trying to simulate a partial response to the first (non-range) requests.  Is return Content-Length: 31 but only delivers 15 bytes.  It goes via HTTP/2, AFAIU.  In HTTP/1 world a partial response may only happen when the connection is torn down or the server is killed (whatever it means) before it finishes sending the whole response to the client.

How to simulate partial response in HTTP/2 is something I don't know.  Contact either Patrick (our HTTP/2 guru) or author of the tests.
See the bug for extended discussion, trying to fix this with re-used
URLs doesn't seem practical so just use unique URLs for partial downloads.

Review commit: https://reviewboard.mozilla.org/r/55674/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/55674/
Comment on attachment 8757126 [details]
MozReview Request: Bug 1272425 Fix downloads misc tests r?kmag

https://reviewboard.mozilla.org/r/55674/#review52558
Attachment #8757126 - Flags: review?(kmaglione+bmo) → review+
https://reviewboard.mozilla.org/r/53634/#review52560

You can avoid creating review requests for temporary commits by passing -c to `hg push`.
Attachment #8757125 - Attachment is obsolete: true
(In reply to Kris Maglione [:kmag] from comment #23)
> https://reviewboard.mozilla.org/r/53634/#review52560
> 
> You can avoid creating review requests for temporary commits by passing -c
> to `hg push`.

yeah, i absent-mindedly did -r instead, let me fix that.
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/29c8e0e38c63
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Product: Toolkit → WebExtensions
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: