Closed Bug 1066726 Opened 5 years ago Closed 5 years ago

Concurrent cache read and write issues

Categories

(Core :: Networking: Cache, defect, major)

32 Branch
defect
Not set
major

Tracking

()

VERIFIED FIXED
mozilla35
Tracking Status
firefox32 --- wontfix
firefox33 --- verified
firefox34 --- verified
firefox35 --- verified
firefox-esr31 --- unaffected
b2g-v2.0 --- affected
b2g-v2.1 --- fixed
b2g-v2.2 --- fixed

People

(Reporter: mayhemer, Assigned: mayhemer)

References

Details

Attachments

(3 files, 2 obsolete files)

This is a cover bug for bug 1064408, bug 1036939 and bug 1048348.  They are all pretty connected together, I will dup them all to this single bug and have a single patch for all of them here.
Duplicate of this bug: 1064408
Duplicate of this bug: 1036939
Duplicate of this bug: 1048348
Attached patch v1Splinter Review
Patch description:
- bug 1064408 comment 15
- bug 1048348 comment 4

In general:
- when the output stream of the entry is closed with a fatal error, set state of the file to that error
- forbid opening input stream anymore (it's an atomic way to indicate the entry is in a bad shape and useless that http channel can handle well w/o any more changes needed)
- don't throw the entry on SetPredictedSize failure since it:
  - would crash (bug 1048348)
  - it wouldn't engage the interrupted concurrent write resume

The rest should be commented in the code and covered by tests


https://tbpl.mozilla.org/?tree=Try&rev=62da06db24ad
Attachment #8488773 - Flags: review?(michal.novotny)
Blocks: cache2enable
Duplicate of this bug: 1067103
Comment on attachment 8488773 [details] [diff] [review]
v1

Review of attachment 8488773 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/test/unit/test_cache2-28-concurrent_read_resumable_entry_size_zero.js
@@ +54,5 @@
> +
> +  var chan1 = make_channel(URL + "/content");
> +  chan1.asyncOpen(new ChannelListener(firstTimeThrough, null), null);
> +  var chan2 = make_channel(URL + "/content");
> +  chan2.asyncOpen(new ChannelListener(secondTimeThrough, null), null);

Is it always ensured that the second channel gets the cache entry before the entry is doomed by the first channel?
Attachment #8488773 - Flags: review?(michal.novotny) → review+
Duplicate of this bug: 1067564
(In reply to Michal Novotny (:michal) from comment #6)
> Comment on attachment 8488773 [details] [diff] [review]
> v1
> 
> Review of attachment 8488773 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> :::
> netwerk/test/unit/test_cache2-28-concurrent_read_resumable_entry_size_zero.js
> @@ +54,5 @@
> > +
> > +  var chan1 = make_channel(URL + "/content");
> > +  chan1.asyncOpen(new ChannelListener(firstTimeThrough, null), null);
> > +  var chan2 = make_channel(URL + "/content");
> > +  chan2.asyncOpen(new ChannelListener(secondTimeThrough, null), null);
> 
> Is it always ensured that the second channel gets the cache entry before the
> entry is doomed by the first channel?

Definitely is, the second channel gets the entry (OnCheck/OnAvail calls) when the first channel fills the metadata on the entry (what happens on the main thread in OnStart).  Dooming happens later in OnData of chan1, by that time chan2 already pump-reads the entry.
https://hg.mozilla.org/mozilla-central/rev/1bb80b72541f
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Comment on attachment 8488773 [details] [diff] [review]
v1

Approval Request Comment
[Feature/regressing bug #]: cache2enable
[User impact if declined]: broken or missing web content delivery to user's screen even freshly loading from servers (no work around with clearing cache or so)
[Describe test coverage new/current, TBPL]: patch adds a bunch of tests checking all corner cases the patch is fixing ; otherwise deeply tested code (main HTTP web content load)
[Risks and why]: hard to say, this code is complex
[String/UUID change made/needed]: none
Attachment #8488773 - Flags: approval-mozilla-beta?
Attachment #8488773 - Flags: approval-mozilla-aurora?
For QA, I think this can be tested using the 3 bugs listed in comment #0.
Attachment #8488773 - Flags: approval-mozilla-beta?
Attachment #8488773 - Flags: approval-mozilla-beta+
Attachment #8488773 - Flags: approval-mozilla-aurora?
Attachment #8488773 - Flags: approval-mozilla-aurora+
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #13)
> https://hg.mozilla.org/releases/mozilla-aurora/rev/83a473a0e907
> https://hg.mozilla.org/releases/mozilla-beta/rev/8a1cffa4c130
> 
> Is this something that needs tracking for B2G v2.0 (b2g32)?

Probably yes.
Flags: needinfo?(honzab.moz)
In that case, please nominate this patch for b2g32 approval :). Might want to nominate it for v2.0 blocking status as well if you feel it's warranted.
Comment on attachment 8488773 [details] [diff] [review]
v1

Comment 11
Attachment #8488773 - Flags: approval-mozilla-b2g32?
Flags: qe-verify+
Reproduced with Nightly 2014-07-20 on Windows 7 64bit with STR from bug 1036939.
Verified as fixed with Firefox 33 beta 6 (Build ID: 20140922173023), latest Aurora 34.0a2 (Build ID: 20140923004002) and latest Nightly 35.0a1 (Build ID: 20140923030204) on Windows 7 64-bit, Linux 14.04 32-bit and Mac OS X 10.9.5 with STR from bug 1036939 and bug 1064408.
Also, in Socorro there are no crashes after fixing this issue for [@ mozilla::net::nsHttpChannel::OnDoneReadingPartialCacheEntry(bool*)] signature (bug 1048348): https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Anet%3A%3AnsHttpChannel%3A%3AOnDoneReadingPartialCacheEntry%28bool%2A%29&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&hang_type=any&date=2014-09-23+14%3A00%3A00&range_value=1#tab-reports
Attachment #8488773 - Flags: approval-mozilla-b2g32? → approval-mozilla-b2g32+
The patch is technically dependent on bug 1000338 that has been uplifted on 2014-10-05 14:46:11.  I think you can reland now (I will first locally check the build and let you know).
Depends on: 1000338
Flags: needinfo?(honzab.moz)
I tried to land both at the same time.
(In reply to Honza Bambas (:mayhemer) from comment #20)
> The patch is technically dependent on bug 1000338 that has been uplifted on
> 2014-10-05 14:46:11.  I think you can reland now (I will first locally check
> the build and let you know).

Err... the bug is actually bug 1013395.  That one is not worth an uplift, but the parts needed to compile are small enough, I'll backport them.  However, there is some risk this patch won't be stable (since it is completely untested) w/o all other patches landed on 33+, this is simply a disclaimer.
Depends on: 1013395
No longer depends on: 1000338
This builds, but tests are consistently failing around the code this patch is touching - something I was afraid of.  If we want to backport this to b2g/32 we need more patches to go along with it.  I have to search for all bugs that we have fixed for 33 and select those needed to make this patch work (best all of them).
Attachment #8501227 - Attachment is patch: true
Attached file Missing bugs for backport (obsolete) —
This is the list of cache2 bugs landed on Beta and not on B2G/32.  Not sure we can just select few to make the patch for this bug safely back-portable...
Flags: needinfo?(ryanvm)
Probably more a question for Bhavana.
Flags: needinfo?(ryanvm) → needinfo?(bbajaj)
(In reply to Honza Bambas (:mayhemer), not reading bugmail until Oct 28 from comment #24)
> Created attachment 8501348 [details]
> Missing bugs for backport
> 
> This is the list of cache2 bugs landed on Beta and not on B2G/32.  Not sure
> we can just select few to make the patch for this bug safely back-portable...

I am surprised, because everything that lands on beta should be backported to b2g32. Ryan , any thoughts here?
Flags: needinfo?(bbajaj) → needinfo?(ryanvm)
I'm not entirely sure how many of the ones in that list landed on 33+ vs. when 32 was on beta.
Flags: needinfo?(ryanvm)
OK, so basically the bottom 5 landed on 32 when it was still on beta. Everything from bug 1013638 (rev 8f03e0ef5809) and up landed on 33+. I can't help but think that a mass uplift isn't what we're going to want to do here given where we are in the B2G v2.0 release cycle.

So I guess that leaves either wontfixing this or figuring out what the minimum branch patch is that could be safely uplifted to the fix the bug. Honza, thoughts?
Flags: needinfo?(honzab.moz)
issue is resolved, removing outdated qa-wanted tag
Keywords: qawanted
Duplicate of this bug: 1065466
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #28)
> OK, so basically the bottom 5 landed on 32 when it was still on beta.
> Everything from bug 1013638 (rev 8f03e0ef5809) and up landed on 33+. I can't
> help but think that a mass uplift isn't what we're going to want to do here
> given where we are in the B2G v2.0 release cycle.
> 
> So I guess that leaves either wontfixing this or figuring out what the
> minimum branch patch is that could be safely uplifted to the fix the bug.
> Honza, thoughts?

I think finding a minimum patch would be very complicated and risky.  I vote for all or nothing.
Flags: needinfo?(honzab.moz)
Honza, you're the best-qualified person to weigh the risks of doing that in comparison to the severity of the bugs it'd fix. B2G v2.0 is at a point where we don't want to be taking any unnecessary risks, so what's your take on wontfixing vs. proceeding with a mass uplift?
Flags: needinfo?(honzab.moz)
I'm against the mass uplift.  Those patches are touching also a different code and there can easily be dependencies we are not aware of and totally break the stable b2g 2.0 branch.

There are two issue when this bug is left unfixed:
- when disk cache size is set to 0, there are some problems with showing favicons, images, sometimes pages (1036939 1064408 1067103); this is a non-standard setting that some people from some reason used to set to disable the cache
- null crash in nsHttpChannel::OnDoneReadingPartialCacheEntry

The first one can be left as a known issue because of the non-standard setting.

The second one can be worked around by one of the options:
1. we can null check and fail on null entry (can lead to a failed response and not sure of all other consequences)
2. we can disobey a single entry limit when doing a partial request (safe and easy to do)

I'll create a branch patch with the second solution (one line).
Flags: needinfo?(honzab.moz)
Thanks, sounds like the way to go :)
- mCacheEntry->SetPredictedDataSize not called for a partial response (avoid the crash, bug 1048348)
- concurrent cache reading completely disabled (explanation in the code comment)
- test_partial_response_entry_size_smart_shrink.js has been slightly modified to accept missing bug 237623 on this branch (comments in the test)
Attachment #8501227 - Attachment is obsolete: true
Attachment #8501348 - Attachment is obsolete: true
Attachment #8517537 - Flags: review?(michal.novotny)
(In reply to Honza Bambas (:mayhemer) from comment #36)
> Comment on attachment 8517537 [details] [diff] [review]
> for mozilla-b2g32_v2_0, v2
> 
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=bee6dd0d450c

As bad as this looks, most of the noise is expected based on being an old branch trying to run tests it wasn't designed to run. That said, https://treeherder.mozilla.org/ui/logviewer.html#?job_id=2961089&repo=try looks real to me.
Attachment #8517537 - Flags: review?(michal.novotny) → review+
Idea: we could disable the disk cache when capacity is 0 (internally, disabling the cache completely is something else then setting the capacity to 0 bytes right now).  I'll provide a second patch here.
Comment on attachment 8517537 [details] [diff] [review]
for mozilla-b2g32_v2_0, v2

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): the new HTTP cache
User impact if declined: comment 33
Testing completed: tbpl, has own automated tests added
Risk to taking this patch (and alternatives if risky): low, it disables some not highly significant perfomance features of the new cache
String or UUID changes made by this patch: none
Attachment #8517537 - Flags: approval-mozilla-b2g32?
Comment on attachment 8517537 [details] [diff] [review]
for mozilla-b2g32_v2_0, v2

Does this address comment 37?
Flags: needinfo?(honzab.moz)
Attachment #8488773 - Flags: approval-mozilla-b2g32+
Consider cache completely disabled when the capacity is set to 0.  I play with an idea to add this patch to m-c as well, Michal, what would you think?
Attachment #8519955 - Flags: review?(michal.novotny)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #40)
> Comment on attachment 8517537 [details] [diff] [review]
> for mozilla-b2g32_v2_0, v2
> 
> Does this address comment 37?

Hmm.. not sure what was there to be addressed.  I don't know how run tests on the b2g32. is there a separate try repo for it?
Flags: needinfo?(honzab.moz)
You can push it to Try like any other branch. Just don't expect sane results from test suites that normally don't run on it.

As for what needs addressing:
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #37)
> That said, https://treeherder.mozilla.org/ui/logviewer.html#?job_id=2961089&repo=try
> looks real to me.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #43)
> You can push it to Try like any other branch. Just don't expect sane results
> from test suites that normally don't run on it.
> 
> As for what needs addressing:
> (In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #37)
> > That said, https://treeherder.mozilla.org/ui/logviewer.html#?job_id=2961089&repo=try
> > looks real to me.

So, what is the problem?
Flags: needinfo?(ryanvm)
Sorted it out on IRC.
I was confused by comment 37 - there is a real _issue_ in the run:

14635 10:33:56 INFO - 2589 INFO TEST-UNEXPECTED-FAIL | /tests/netwerk/test/mochitests/test_partially_cached_content.html | First response should exist
14636 10:33:56 INFO - 2590 INFO TEST-UNEXPECTED-FAIL | /tests/netwerk/test/mochitests/test_partially_cached_content.html | Second response should exist

going to look into it.
Comment on attachment 8517537 [details] [diff] [review]
for mozilla-b2g32_v2_0, v2

Dropping based on the last comment
Attachment #8517537 - Flags: approval-mozilla-b2g32?
Flags: needinfo?(ryanvm)
Comment on attachment 8517537 [details] [diff] [review]
for mozilla-b2g32_v2_0, v2

Review of attachment 8517537 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Honza Bambas (:mayhemer) from comment #41)
> Created attachment 8519955 [details] [diff] [review]
> for mozilla-b2g32_v2_0, v2, improvement 1, v1
> 
> Consider cache completely disabled when the capacity is set to 0.  I play
> with an idea to add this patch to m-c as well, Michal, what would you think?

Agree.
Attachment #8519955 - Flags: review?(michal.novotny) → review+
(In reply to Honza Bambas (:mayhemer) from comment #49)
> Trying again also with the second patch:
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=c29e55c08865

Guys, I don't know how to properly test this.  I need some help here.
Flags: needinfo?(bbajaj)
(In reply to Honza Bambas (:mayhemer) from comment #50)
> (In reply to Honza Bambas (:mayhemer) from comment #49)
> > Trying again also with the second patch:
> > https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=c29e55c08865
> 
> Guys, I don't know how to properly test this.  I need some help here.

NI some build experts here you may be able help debug the failures or provide input..
Flags: needinfo?(ted)
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(bbajaj)
Marcia, is this something you're able to help with?
Flags: needinfo?(mozillamarcia.knous)
(In reply to Liz Henry :lizzard from comment #52)
> Marcia, is this something you're able to help with?

Not really. Guessing tedm is the best one to help here.
Flags: needinfo?(mozillamarcia.knous)
Unfortunately it looks like you're running into problems because the build automation has changed since that branch. I don't know if there's an easy way to make things work, maybe mshal has some idea.
Flags: needinfo?(ted) → needinfo?(mshal)
This is the same issue I'm trying to fix in bug 1102974. Unfortunately I don't think there's an easy way to fix it - I have to find the right patches to uplift. I'm having issues figuring out what those are though without causing other problems to surface :/
Flags: needinfo?(mshal)
Well, I think I was able to address the build issues at least. Can you try to rebase on top of b2g32 and see if it's any better? I'm not sure if the test job failures are related or some other issue though.
(In reply to Michael Shal [:mshal] from comment #56)
> Well, I think I was able to address the build issues at least. Can you try
> to rebase on top of b2g32 and see if it's any better? I'm not sure if the
> test job failures are related or some other issue though.

Thank you.  Here it is:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=65d165912d34
(In reply to Honza Bambas (:mayhemer) from comment #57)
> Thank you.  Here it is:
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=65d165912d34

Looks like the builds are green, at least. Is that sufficient for you? I don't think we'll be able to get all the tests green on try, since it will always be using m-c's test configuration on try, not b2g32's. Some of them may be easy to address (for example, it looks like the android tests just need an updated tooltool manifest), but from what I can gather I don't think we'll get much further without a lot of effort.
Don't worry about Android builds/tests on b2g32. Completely unsupported anyway :)

Honza, this looks like a real failure? I see it on WinXP debug as well. Everything else looks safe to ignore.
https://treeherder.mozilla.org/ui/logviewer.html#?job_id=3374596&repo=try
Flags: needinfo?(mh+mozilla)
Duplicate of this bug: 1058062
You need to log in before you can comment on or make changes to this bug.