Unable to load from alternate data after re-saving it.

VERIFIED FIXED in Firefox 55

Status

()

Core
Networking: Cache
VERIFIED FIXED
8 months ago
7 months ago

People

(Reporter: nbp, Assigned: valentin)

Tracking

(Blocks: 1 bug)

unspecified
mozilla55
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox55 fixed)

Details

(Whiteboard: [necko-active])

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(3 attachments)

(Reporter)

Description

8 months ago
For the bytecode cache implementation, we currently save the content of the bytecode under the alternative output stream "javascript/moz-bytecode".

This content includes the Sub-Resource Integrity (SRI) hash.  This hash cannot be inferred ahead of time.  Also, even if we were to take the hash provided by the document, we would still want to be able to load the same resource, even if no SRI hash were provided.  Thus it cannot be part of the name of the alternate data name.

The test case[1] checks that the nsScriptLoader behave as expected by instrumenting the code of the nsScriptLoader function for emitting events on the script tag.  These events are post-process by the test case which follow the logic of a state machine to provide a unique name to each of the potential code paths.

When running the test case, with e10s enabled; it either fails when we attempt to save the javascript/moz-bytecode a second time, with the following signature:

IPDL protocol error: actor has been |delete|d
[Child 12904] ###!!! ABORT: IPDL error [PNeckoChild]: "actor has been |delete|d". abort()ing as a result.: file /home/nicolas/mozilla/contrib-push/ipc/glue/ProtocolUtils.cpp, line 302
#01: mozilla::net::PNeckoChild::Write(mozilla::net::PHttpChannelChild const*, IPC::Message*, bool) (/home/nicolas/mozilla/_build/firefox/bugzil.la/js-startup-cache/wip/x64/clang/dbg/ipc/ipdl/PNeckoChild.cpp:4193)
#02: mozilla::net::HttpChannelChild::OpenAlternativeOutputStream(nsACString_internal const&, nsIOutputStream**) (/home/nicolas/mozilla/contrib-push/netwerk/protocol/http/HttpChannelChild.cpp:2438)
#03: non-virtual thunk to mozilla::net::HttpChannelChild::OpenAlternativeOutputStream(nsACString_internal const&, nsIOutputStream**) (/home/nicolas/mozilla/contrib-push/netwerk/protocol/http/HttpChannelChild.cpp:2434)
#04: nsScriptLoader::EvaluateScript(nsScriptLoadRequest*) (/home/nicolas/mozilla/contrib-push/dom/base/nsScriptLoader.cpp:2389)

Or it fails because it cannot load the bytecode after re-saving it:

<---- await [3] ---->
ScriptLoadRequest (357bf10): mProgress = 0
ScriptLoadRequest (357bf10): Maybe request bytecode
nsScriptLoadHandler::OnIncrementalData, failed to decode SRI, restart request
ScriptLoadRequest (357bf10): mProgress = 1
ScriptLoadRequest (357bf10): Stream complete (url = …/file_js_cache.js)
ScriptLoadRequest (357bf10): Source length = 0
ScriptLoadRequest (357bf10): Query nsICacheInfoChannel = 3563f10
## ScriptLoader event: scriptloader_load_bytecode
## ScriptLoader event: scriptloader_fallback
## ScriptLoader event: scriptloader_load_source
ScriptLoadRequest (357bf10): Stream complete (url = …/file_js_cache.js)
ScriptLoadRequest (357bf10): Source length = 69
ScriptLoadRequest (357bf10): Query nsICacheInfoChannel = 2e38aa0
ScriptLoadRequest (357bf10): Sync (off-thread parsing) and Execute
<---- might crash here, with the above stack trace ---->
ScriptLoadRequest (357bf10): Write bytecode cache (rv = 0, length = 809, written = 809)
ScriptLoadRequest (357bf10): Closing (rv = 0)
## ScriptLoader event: scriptloader_encode_and_execute
## ScriptLoader event: scriptloader_bytecode_saved
<---- await [4] & await [5] ---->
ScriptLoadRequest (26b1b20): mProgress = 0
ScriptLoadRequest (26b1b20): Maybe request bytecode
ScriptLoadRequest (31e0d30): mProgress = 0
ScriptLoadRequest (31e0d30): Maybe request bytecode
ScriptLoadRequest (26b1b20): Stream complete (url = …/file_js_cache.js)
ScriptLoadRequest (26b1b20): Source length = 69
ScriptLoadRequest (26b1b20): Query nsICacheInfoChannel = 2cf6cf0
## ScriptLoader event: scriptloader_load_source
ScriptLoadRequest (31e0d30): Stream complete (url = …/file_js_cache.js)
ScriptLoadRequest (31e0d30): Source length = 69
ScriptLoadRequest (31e0d30): Query nsICacheInfoChannel = 2e38aa0
ScriptLoadRequest (26b1b20): Sync (off-thread parsing) and Execute
ScriptLoadRequest (26b1b20): Write bytecode cache (rv = 0, length = 809, written = 809)
ScriptLoadRequest (26b1b20): Closing (rv = 0)
ScriptLoadRequest (31e0d30): Sync (off-thread parsing) and Execute
ScriptLoadRequest (31e0d30): Write bytecode cache (rv = 0, length = 761, written = 761)
ScriptLoadRequest (31e0d30): Closing (rv = 0)
## ScriptLoader event: scriptloader_encode_and_execute
## ScriptLoader event: scriptloader_bytecode_saved

The "ScriptLoadRequest" messages are coming from the MOZ_LOG evironment variable and report the state of the nsScriptLoadHandler / nsScriptLoader.  The "## ScriptLoader event:" messages are the event queued by the script loader to let the test case trace the execution.

In this case, the test "[4]" in the test case [2] is expecting a trace which loads from the bytecode cache, instead of a trace which loads from the source.  Note, loading a single time, or twice does not seems to change the outcome of the test case.

Note, the same test already check ("[1]" & "[2]") that we are able to write alternate data to the cache, and to load them correctly.

[1]
The test case can be viewed:
  https://github.com/mozilla/gecko-dev/compare/master...nbp:bugzil.la/js-startup-cache/wip-test-1

You can check it out by doing a:
  git clone https://github.com/nbp/gecko-dev/
  git checkout bugzil.la/js-startup-cache/wip

After building, the test case can be executed with:
  MOZ_LOG='ScriptLoader:5,timestamp,sync' \
  mach mochitest dom/base/test/test_script_loader_js_cache.html

[2]
https://github.com/mozilla/gecko-dev/compare/master...nbp:bugzil.la/js-startup-cache/wip-test-1#diff-1542576ff0e9e767a91cda89a047b409R156
Flags: needinfo?(valentin.gosu)
Valentin, can you help here please?
Assignee: nobody → valentin.gosu
Whiteboard: [necko-active]
(Assignee)

Comment 2

8 months ago
Nicolas, did you maybe forget to add file_js_cache_with_sri.html to the branch?
Flags: needinfo?(valentin.gosu) → needinfo?(nicolas.b.pierron)
(Reporter)

Comment 3

8 months ago
(In reply to Valentin Gosu [:valentin] from comment #2)
> Nicolas, did you maybe forget to add file_js_cache_with_sri.html to the
> branch?

I did, sorry.
I pushed the file to the same branch on github, you should see it now if you pull the new changes.
Flags: needinfo?(nicolas.b.pierron)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Not a single word of explanation of the patch/patches.  What is the cause of this bug?  Also, what about the two other [mq] patches?  Are they part of the fix too?
Flags: needinfo?(valentin.gosu)
FYI, bug1341343-overwrite-test.patch is a test that fails on rewriting alt-data. The problem is that CacheFile::Truncate doesn't truncate the entry correctly because it doesn't update chunk hashes. I'm working on the fix.
(Assignee)

Comment 10

8 months ago
(In reply to Honza Bambas (:mayhemer) from comment #8)
> Not a single word of explanation of the patch/patches.  What is the cause of
> this bug?  Also, what about the two other [mq] patches?  Are they part of
> the fix too?

The patch I asked you for review is quite simple. We should not send an IPC message if the IPC channel has been shut down, otherwise we get the crash described in comment 0.
The next two patches are unrelated to the first, but are connected to Nicolas's use case, which I reduced to a xpcshell-test.   One of the issues is that after we save alt-data, we changed the value of the Content-Length header, which wasn't right, and caused the cache entry not to be used. It also uncovered the bug Michal described in comment 9.
Flags: needinfo?(valentin.gosu)
(In reply to Valentin Gosu [:valentin] from comment #10)
> (In reply to Honza Bambas (:mayhemer) from comment #8)
> > Not a single word of explanation of the patch/patches.  What is the cause of
> > this bug?  Also, what about the two other [mq] patches?  Are they part of
> > the fix too?
> 
> The patch I asked you for review is quite simple. We should not send an IPC
> message if the IPC channel has been shut down, otherwise we get the crash
> described in comment 0.

OK, to get rid of the crash, yes.  But is it correct to just fail here?  Is IPC expected to be closed at this point?  Do we have a solution here to save alt-data even that late?  Is it needed or is it OK to loose it?

> The next two patches are unrelated to the first, but are connected to
> Nicolas's use case, which I reduced to a xpcshell-test.  

Aha.

> One of the issues
> is that after we save alt-data, we changed the value of the Content-Length
> header, which wasn't right, and caused the cache entry not to be used. 

Is there a separate bug for that?

> It
> also uncovered the bug Michal described in comment 9.

OK, separate bug as well, I presume.

Thanks, now it makes sense.

Comment 12

8 months ago
mozreview-review
Comment on attachment 8839957 [details]
Bug 1341343 - Don't send AltDataOutputStreamConstructor IPC message if mIPCOpen is false

https://reviewboard.mozilla.org/r/114536/#review117144
Attachment #8839957 - Flags: review?(honzab.moz) → review+
Depends on: 1343916
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 16

7 months ago
mozreview-review
Comment on attachment 8840567 [details]
Bug 1341343 - Test for overwriting alt-data

https://reviewboard.mozilla.org/r/115034/#review121286
Attachment #8840567 - Flags: review?(michal.novotny) → review+

Comment 17

7 months ago
mozreview-review
Comment on attachment 8839957 [details]
Bug 1341343 - Don't send AltDataOutputStreamConstructor IPC message if mIPCOpen is false

https://reviewboard.mozilla.org/r/114536/#review121288
Attachment #8839957 - Flags: review+

Comment 18

7 months ago
mozreview-review
Comment on attachment 8840566 [details]
Bug 1341343 - Make sure the channel returns the correct contentLength for alt-data without saving it to the metadata

https://reviewboard.mozilla.org/r/115032/#review121292
Attachment #8840566 - Flags: review?(michal.novotny) → review+
Nicolas, these patches should fix the bug when rewriting alt-data. Anyway, rewriting whole alt-data only because you want to add something like hash is very inefficient. This should be part of the metadata. Could you please describe exactly the use case so we can find a better solution?

Also please note that opening alt-data output stream fails if there is any alt-data input stream opened. So let's say there is some entry with alt-data without the hash and you want to rewrite it to contain the hash. You won't be able to do it in case the same entry is opened in another tab due to opened input stream.
Flags: needinfo?(nicolas.b.pierron)
(Assignee)

Comment 20

7 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=332791408ac8
(Reporter)

Comment 21

7 months ago
(In reply to Michal Novotny (:michal) from comment #19)
> Nicolas, these patches should fix the bug when rewriting alt-data. Anyway,
> rewriting whole alt-data only because you want to add something like hash is
> very inefficient. This should be part of the metadata. Could you please
> describe exactly the use case so we can find a better solution?

The problem I faced is described in Bug 1288104 comment 0, to make it short:
 - We have different hash function, even if we have most likely one per kind per source.
 - The hash is computed on the downloaded bits, which cannot be recovered from the bytecode. The bytecode lose the ability to compute the hashes as we convert the input to 16 bits chars.

For the moment, I have no idea if this is a frequent case, and I should probably add telemetry to check if we fallback on this case.

If this is a frequent case, then we should probably take the most common hash function and store this hash by default.

> Also please note that opening alt-data output stream fails if there is any
> alt-data input stream opened. So let's say there is some entry with alt-data
> without the hash and you want to rewrite it to contain the hash. You won't
> be able to do it in case the same entry is opened in another tab due to
> opened input stream.

Ok, so it sounds that one can prevent saving the bytecode cache, by racing with the bytecode cache saving of scripts.  I would not expect anybody to do that on purpose, and the fact that this is a race should limit the impact of it?

I guess, that I should monitor that with the telemetry probes to be added.
Flags: needinfo?(nicolas.b.pierron)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 25

7 months ago
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/7ea73c4bb75a
Test for overwriting alt-data r=michal
https://hg.mozilla.org/integration/autoland/rev/ef345da93a6e
Don't send AltDataOutputStreamConstructor IPC message if mIPCOpen is false r=mayhemer,michal
https://hg.mozilla.org/integration/autoland/rev/1b4dd2b452dd
Make sure the channel returns the correct contentLength for alt-data without saving it to the metadata r=michal

Comment 26

7 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/7ea73c4bb75a
https://hg.mozilla.org/mozilla-central/rev/ef345da93a6e
https://hg.mozilla.org/mozilla-central/rev/1b4dd2b452dd
Status: NEW → RESOLVED
Last Resolved: 7 months ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
(Reporter)

Comment 27

7 months ago
(In reply to Michal Novotny (:michal) from comment #19)
> Nicolas, these patches should fix the bug when rewriting alt-data.

I confirm, the test case from comment 0 fully pass now :)
Thanks.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.