Closed Bug 1368887 Opened 3 years ago Closed 2 years ago

Crash in mozilla::net::CacheFile::IsKilled

Categories

(Core :: Networking: Cache, defect, critical)

55 Branch
x86
Windows 10
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla56
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 --- disabled
firefox56 --- verified

People

(Reporter: ananuti, Assigned: michal)

References

Details

(Keywords: crash, Whiteboard: [necko-active])

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-810dd8bc-cf1e-4107-8d0f-398d00170531.
=============================================================

It crashes randomly if dom.script_loader.bytecode_cache.enabled=true.
Flags: needinfo?(nicolas.b.pierron)
Ekanan, Thanks for testing this flag and reporting this issue.  This is really helpful :)
Did you enabled dom.script_loader.bytecode_cache.eager as well?

I will note that as of today (before Bug 1368675 landing), enabling dom.script_loader.bytecode_cache.enabled is doing nothing else than requesting the bytecode alternate data type.  We should not even attempt to save any alternate data content, which makes me wonder about this crash on a WriteEvent, when the CacheFile got freed.

I look if I can find a way to reproduce this issue locally.

Valentin, maybe you have more ideas on what might be going wrong?
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(ananuti)
(In reply to Nicolas B. Pierron [:nbp] from comment #1)
> Ekanan, Thanks for testing this flag and reporting this issue.  This is
> really helpful :)
> Did you enabled dom.script_loader.bytecode_cache.eager as well?

I didn't.

I got this crash before bug 1364118 landed on nightly bp-9212bd2e-39c0-4abe-8c30-0bf170170511 so I think it doesn't matter.
Flags: needinfo?(ananuti)
I don't know the code terribly well, but it looks to me that WriteEvent has a CacheFileChunk as listener on which it calls IsKilled.
The IsKilled implementation returns mFile->IsKilled, but mFile may be nulled out in [1]
We probably only need a null check. Michal, is my logic correct?

[1] http://searchfox.org/mozilla-central/rev/b318c7dca7392bd16c0b11929f55b1be133f0b31/netwerk/cache2/CacheFile.cpp#1724
Flags: needinfo?(valentin.gosu) → needinfo?(michal.novotny)
(In reply to Ekanan Ketunuti from comment #2)
> I got this crash before bug 1364118 landed on nightly
> bp-9212bd2e-39c0-4abe-8c30-0bf170170511 so I think it doesn't matter.

Before bug 1364118, when enabled, the cache was eagerly generating and saving bytecode only for large scripts (≳ 100kB).
(In reply to Valentin Gosu [:valentin] from comment #3)
> The IsKilled implementation returns mFile->IsKilled, but mFile may be nulled
> out in [1]
> We probably only need a null check. Michal, is my logic correct?

This shouldn't happen, because the chunk is referenced by mDiscardedChunks and the write event, so CacheFile::DeactivateChunk wouldn't be called for this chunk.

Ekanan, if you can reproduce it easily, could you please provide a log? Use following log modules: timestamp,sync,rotate:500,cache2:5
Flags: needinfo?(michal.novotny) → needinfo?(ananuti)
Attached file bug-1368887.log.tar.xz
(In reply to Michal Novotny (:michal) from comment #5)
> Ekanan, if you can reproduce it easily, could you please provide a log? Use
> following log modules: timestamp,sync,rotate:500,cache2:5

Hard to reproduce. Yesterday I tried to play with this all day long but ain't got any luck. Today it did crash for me after short browsing.

Here comes the result for
 - dom.script_loader.bytecode_cache.eager;true
 - dom.script_loader.bytecode_cache.enabled;true
 - log modules: timestamp,sync,rotate:500,cache2:5
Flags: needinfo?(ananuti) → needinfo?(michal.novotny)
Assignee: nobody → michal.novotny
Flags: needinfo?(michal.novotny)
Whiteboard: [necko-active]
The problem is that CacheFile::OnChunkWritten called for discarded chunk proceeds normally and removes the newly created chunk with the new data.
Flags: needinfo?(nicolas.b.pierron)
Attached patch patch v1Splinter Review
changes in this patch:

- fixed an edge case when we could discard a chunk that is references by an input stream
- failing listeners before discarding chunk
- CacheFile::OnChunkRead and CacheFile::OnChunkWritten handle discarded chunks correctly
- fixed CacheFileInputStream::CanRead to respect end of the normal data
- added few assertions to make sure we truncate at the alt-data offset and that no input stream is beyond this point
Attachment #8879256 - Flags: review?(valentin.gosu)
Comment on attachment 8879256 [details] [diff] [review]
patch v1

Review of attachment 8879256 [details] [diff] [review]:
-----------------------------------------------------------------

It looks good to me, but I'm not comfortable enough with the code to be the only reviewer. Hopefully Honza can help with a quick review.
Attachment #8879256 - Flags: review?(valentin.gosu)
Attachment #8879256 - Flags: review?(honzab.moz)
Attachment #8879256 - Flags: review+
Comment on attachment 8879256 [details] [diff] [review]
patch v1

Review of attachment 8879256 [details] [diff] [review]:
-----------------------------------------------------------------

has this patch been through try?  I don't see any link to try in the bug.

OK, I could star into this for hours and not getting it much anyway.  If this has been tested (I presume an automated crash test would be too complicated?) then let's land it.

::: netwerk/cache2/CacheFile.cpp
@@ +1927,5 @@
>  
>    MOZ_ASSERT(aOffset <= mDataSize);
> +  // If we ever need to truncate on non alt-data boundary, we need to handle
> +  // existing input streams.
> +  MOZ_ASSERT(aOffset == mAltDataOffset, "Truncating normal data not implemented");

so, this patch removes the ability to truncate normal data?  or it never was implemented (or used?)

do we still need the MOZ_ASSERT(aOffset <= mDataSize); assertion?

@@ +1971,5 @@
> +    if (maxInputChunk < inputChunk) {
> +      maxInputChunk = inputChunk;
> +    }
> +
> +    MOZ_RELEASE_ASSERT(mInputs[i]->GetPosition() <= aOffset);

how can this ever be ensured?
Attachment #8879256 - Flags: review?(honzab.moz) → review+
(In reply to Honza Bambas (:mayhemer) from comment #12)
> has this patch been through try?  I don't see any link to try in the bug.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=16656d502af724a9c1ab9934b33885f0901a4b39

> OK, I could star into this for hours and not getting it much anyway.  If
> this has been tested (I presume an automated crash test would be too
> complicated?) then let's land it.

It's almost impossible to make an automated test. We would need to expose some debugging API that would synchronize all the involved components.

> ::: netwerk/cache2/CacheFile.cpp
> @@ +1927,5 @@
> >  
> >    MOZ_ASSERT(aOffset <= mDataSize);
> > +  // If we ever need to truncate on non alt-data boundary, we need to handle
> > +  // existing input streams.
> > +  MOZ_ASSERT(aOffset == mAltDataOffset, "Truncating normal data not implemented");
> 
> so, this patch removes the ability to truncate normal data?  or it never was
> implemented (or used?)

This was never needed, so it wasn't implemented.

> do we still need the MOZ_ASSERT(aOffset <= mDataSize); assertion?

It is probably redundant.

> @@ +1971,5 @@
> > +    if (maxInputChunk < inputChunk) {
> > +      maxInputChunk = inputChunk;
> > +    }
> > +
> > +    MOZ_RELEASE_ASSERT(mInputs[i]->GetPosition() <= aOffset);
> 
> how can this ever be ensured?

Truncate() is called from 2 places and both check that there is no input stream for alt-data. So all existing input stream are reading normal data and the position cannot go beyond mAltDataOffset.
Pushed by mnovotny@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/acd2949f5415
Crash in mozilla::net::CacheFile::IsKilled, r=valentin, r=honzab
https://hg.mozilla.org/mozilla-central/rev/acd2949f5415
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.