Closed Bug 1117763 Opened 5 years ago Closed 5 years ago

JavaScript is not executed after range request (due to partly cached file)

Categories

(Core :: Networking: Cache, defect)

34 Branch
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox38 --- fixed

People

(Reporter: julisys, Assigned: mayhemer)

Details

(Keywords: reproducible, testcase)

Attachments

(3 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5

Steps to reproduce:

In some cases our GWT application does not start up. GWT creates large js-Files which are cached by Firefox.
We observed that
- the file is not cached completely for some reason (about:cache)
- while reloading firefox is trying to get the missing bytes by performing a range request
- that javascript file will never be executed

We isolated the core problem with plain javascript:

- generate large js files with 3 lines. 
    first line calls js-code (add(i, 'start');).
    second line contains lots of spaces (~30mb). 
    third line calls js-code again (add(i, 'end');).
  We did this using the following groovy script

    def size = 31450000
    for (i in 0..12) {
        def f = new File("/Users/user/apache/htdocs/big${i}.cache.js")
        f.delete()
        f << "add($i, 'start');\n" << " ".multiply(size) << "\nadd($i, 'end');"
    }

- copy ct.html (see attached file) in your htdocs next to the big.cache.js files
- clear firefox cache
- navigate to ct.html
- after ct.html is loaded completely, reload the page


Actual results:

At least one of the big.cache.js files was not cached completely (more than zero but less than the expected size. seen in about:cache). Thus, firefox performs a range request for the rest of the file, which we can see on Firebugs network sheet (Status: 206 Partial Content).
The affected files javascript will not be executed.


Expected results:

No matter if the request is partial or not (due to previous caching), the javascript file should be fetched and executed.
Additional info:

Apache is configured according to
http://www.gwtproject.org/doc/latest/DevGuideCompilingAndDebugging.html#perfect_caching
which means for files containing ".cache." apache sends an "Expires" header of one year via the expires module:

<Files *.cache.*>
  ExpiresActive on
  ExpiresDefault "now plus 1 year"
</Files>
OS: Mac OS X → All
Hardware: x86 → All
Flags: needinfo?(gijskruitbosch+bugs)
Product: Firefox → Core
NB: cache instruction in comment #1 is important in order to reproduce this. :-)

Honza, looking at this through the network monitor, it seems the requests complete, but on the other hand the page loading icon never stops spinning, and the status tooltip keeps saying "transferring data from localhost..." which doesn't make much sense to me. And indeed, in those cases the JS from those files isn't actually run. Based on the confusion between the network monitor and the spinner, my first suspicion is an issue with networking/cache... can you take a look?
Status: UNCONFIRMED → NEW
Component: Untriaged → Networking: Cache
Ever confirmed: true
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(honzab.moz)
Attached file testcase.zip
This is funnily inefficient, but bugzilla doesn't care, disk is cheap, and if you don't have groovy, this is easier to set up - just unzip somewhere that's served by apache (including the .htaccess) and you should be OK.
Attached patch v1 (obsolete) — Splinter Review
Thanks for the test case!

cause:
- 10 parallel js files floods the cache write back log (even on a fast SSD when in debug build)
- we break most of them with OUT_OF_MEMORY (expected and correct)
- those are left partial but CacheFiles are left in failure state since the write was forcibly broken
- trying to use it again: C-L != entry.size -> do a range request
- we setup the range request (add headers)
- we fail to open input stream of the entry (correct)
- we leave the range request headers
=> we don't expect 206 => everlasting spin

Simple (upliftable) fix is to remove the headers on input stream open failure.  Better (more efficient) fix OTOH could be to *not* treat the OOM error as a fatal one on CacheFile.  Input stream would then open.  In reality, the entry, when purged out of memory and later reloaded, will be usable since some content has been stored.

Michal, would that work?
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Flags: needinfo?(honzab.moz)
Attachment #8551524 - Flags: review?(michal.novotny)
Michal, this is a pretty serious regression!  Can you please take a ASAP?  Thanks.
Flags: needinfo?(michal.novotny)
Comment on attachment 8551524 [details] [diff] [review]
v1

Review of attachment 8551524 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Honza Bambas (:mayhemer) from comment #4)
> failure.  Better (more efficient) fix OTOH could be to *not* treat the OOM
> error as a fatal one on CacheFile.  Input stream would then open.  In
> reality, the entry, when purged out of memory and later reloaded, will be
> usable since some content has been stored.
> 
> Michal, would that work?

I doubt this can be done easily. E.g. if we're going to try to survive OOM in the output stream, who is going to doom the entry if it's not resumable? We don't have this information on this level. Given that OOM in the output stream doesn't happen too often (see NETWORK_CACHE_V2_OUTPUT_STREAM_STATUS telemetry), I wouldn't complicate the code.

OTOH, it seems that we don't doom the entry immediately when CacheFileOutputStream::Write() fails so the entry exists in the failed state long enough that another channel gets this failed entry. So I would prefer to doom the entry as soon as possible to eliminate the possibility that opening input stream fails.

::: netwerk/protocol/http/nsHttpChannel.h
@@ +366,5 @@
>                       bool ignoreMissingPartialLen = false) const;
>      nsresult MaybeSetupByteRangeRequest(int64_t partialLen, int64_t contentLength,
>                                          bool ignoreMissingPartialLen = false);
>      nsresult SetupByteRangeRequest(int64_t partialLen);
> +    void UnsetupByteRangeRequest();

Please replace "unsetup" with some other word. I don't think it is a valid word in English.
Attachment #8551524 - Flags: review?(michal.novotny) → review+
Flags: needinfo?(michal.novotny)
(In reply to Michal Novotny (:michal) from comment #6)
> Given that OOM in the output
> stream doesn't happen too often (see NETWORK_CACHE_V2_OUTPUT_STREAM_STATUS
> telemetry), I wouldn't complicate the code.

It's 411.56M (0.1%) for Release 34 (desktop).  The concern is that we must doom the entry when it's not resumable, but on the level of CacheFile this information is not known (cannot be known).  Hence, it would be far more complicated, right.

> 
> OTOH, it seems that we don't doom the entry immediately when
> CacheFileOutputStream::Write() fails so the entry exists in the failed state
> long enough that another channel gets this failed entry. 

It can get it, but won't be able to open the input stream!  So we don't break.  Bug 1066726.

> So I would prefer
> to doom the entry as soon as possible to eliminate the possibility that
> opening input stream fails.

Don't we call back to CacheService to doom the entry?  Anyway, the primary goal is not to cache but to let things work w/o any visible user impact.

> 
> ::: netwerk/protocol/http/nsHttpChannel.h
> @@ +366,5 @@
> >                       bool ignoreMissingPartialLen = false) const;
> >      nsresult MaybeSetupByteRangeRequest(int64_t partialLen, int64_t contentLength,
> >                                          bool ignoreMissingPartialLen = false);
> >      nsresult SetupByteRangeRequest(int64_t partialLen);
> > +    void UnsetupByteRangeRequest();
> 
> Please replace "unsetup" with some other word. I don't think it is a valid
> word in English.

"Untie" could be better.

Thanks.
Attached patch v1.1 (obsolete) — Splinter Review
Attachment #8551524 - Attachment is obsolete: true
Attachment #8562939 - Flags: review+
could you check the try run, if the failures are related ?
Flags: needinfo?(honzab.moz)
Keywords: checkin-needed
Attached patch v1.1 (ci msg)Splinter Review
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c34ac09f8042
Attachment #8562939 - Attachment is obsolete: true
Flags: needinfo?(honzab.moz)
Attachment #8564174 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/20ffb3ebb7c5
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
You need to log in before you can comment on or make changes to this bug.