Closed Bug 1247432 Opened 9 years ago Closed 8 years ago

Don't do any unnecessary I/O in cache2 after shutdown

Categories

(Core :: Networking: Cache, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox45 --- affected
firefox46 + fixed
firefox47 + fixed

People

(Reporter: mayhemer, Assigned: mayhemer)

References

Details

(Whiteboard: [necko-active])

Attachments

(1 file)

- don't read/open
- ignore doomed handles (already moved to doomed/ dir)
- write and close only for 2 seconds after shutdown

Still, I'm not sure at all this will help bug 1215970.  I believe there is some heavy I/O blockade happening from Skype (or whoever is the culprit) causing any op taking more than the time for shutdown.  So I think there is no help with this patch.  But we can try to at least make it a non-top crash :)
Whiteboard: [necko-active]
The shutdown-commenced point should be considered at profile-change-net-teardown and not xpcom-shutdown.  When reproducing bug 1215970 I can see the following:

11:21:42.066000 UTC - 0[2f11140]: nsHttpHandler::Observe [topic="profile-change-net-teardown"]
11:22:22.875000 UTC - 0[2f11140]: nsHttpHandler::Observe [topic="xpcom-shutdown"]

So, some 40 seconds later.  During these 40 seconds, cache should be muted for reading/opening.  Writes should be terminated up to xpcom-shutdown + 2 seconds.
Looking into the code this seems as trivial as shutting the cache io manager sooner from "profile-change-net-teardown".  since that point entries cannot be open on the high level, internal read/open operations will be bypassed (with failure), writing has the 2 second limit that will begin sooner as well.
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Attached patch v1Splinter Review
- shutdown IO manager at profile-net-change-teardown
- leaving other topics as well since this one is IMO not 100% in all cases to arrive
- redundancy is OK, this always happens on the main thread

https://treeherder.mozilla.org/#/jobs?repo=try&revision=469b207dea7c
Attachment #8718915 - Flags: review?(michal.novotny)
Attachment #8718915 - Flags: review?(michal.novotny) → review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/9d5fa6d7558a
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
I had to back this out for apparently causing a spike in ASAN failures in test_browserElement_oop_getWebManifest.html

https://treeherder.mozilla.org/logviewer.html#?job_id=21801214&repo=mozilla-inbound
https://hg.mozilla.org/integration/mozilla-inbound/rev/eda42451649d
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Wes Kocher (:KWierso) from comment #7)
> I had to back this out for apparently causing a spike in ASAN failures in
> test_browserElement_oop_getWebManifest.html
> 
> https://treeherder.mozilla.org/logviewer.html#?job_id=21801214&repo=mozilla-
> inbound
> https://hg.mozilla.org/integration/mozilla-inbound/rev/eda42451649d

Thanks Wes!  I'm currently investigating it persuaded it's from a different patch (not yet landed).
Status: REOPENED → NEW
11:03:13     INFO -  ==5232==ERROR: AddressSanitizer: heap-use-after-free on address 0x6270000ff900 at pc 0x440bf3 bp 0x7fde3c742750 sp 0x7fde3c742730
11:03:13     INFO -  READ of size 13552 at 0x6270000ff900 thread T15 (Cache2 I/O)
11:03:14     INFO -      #0 0x440bf2 in write /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:325
11:03:14     INFO -      #1 0x7fde59dd52ef in pt_Write /builds/slave/try-l64-asan-00000000000000000/build/src/nsprpub/pr/src/pthreads/ptio.c:1315
11:03:15     INFO -      #2 0x7fde3f5782cf in (anonymous namespace)::interposedWrite(PRFileDesc*, void const*, int) /builds/slave/try-l64-asan-00000000000000000/build/src/xpcom/build/NSPRInterposer.cpp:72
11:03:15     INFO -      #3 0x7fde3f95aacf in mozilla::net::CacheFileIOManager::WriteInternal(mozilla::net::CacheFileHandle*, long, char const*, int, bool, bool) /builds/slave/try-l64-asan-00000000000000000/build/src/netwerk/cache2/CacheFileIOManager.cpp:2024
11:03:15     INFO -      #4 0x7fde3f9aa9c6 in mozilla::net::WriteEvent::Run() /builds/slave/try-l64-asan-00000000000000000/build/src/netwerk/cache2/CacheFileIOManager.cpp:725
11:03:15     INFO -      #5 0x7fde3f980b89 in mozilla::net::CacheIOThread::LoopOneLevel(unsigned int) /builds/slave/try-l64-asan-00000000000000000/build/src/netwerk/cache2/CacheIOThread.cpp:294
11:03:15     INFO -      #6 0x7fde3f98038c in mozilla::net::CacheIOThread::ThreadFunc() /builds/slave/try-l64-asan-00000000000000000/build/src/netwerk/cache2/CacheIOThread.cpp:224
11:03:15     INFO -      #7 0x7fde3f97fdd3 in mozilla::net::CacheIOThread::ThreadFunc(void*) /builds/slave/try-l64-asan-00000000000000000/build/src/netwerk/cache2/CacheIOThread.cpp:173
11:03:15     INFO -      #8 0x7fde59ddd3cf in _pt_root /builds/slave/try-l64-asan-00000000000000000/build/src/nsprpub/pr/src/pthreads/ptthread.c:216
11:03:15     INFO -      #9 0x7fde5e368e99 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7e99)
11:03:15     INFO -      #10 0x7fde5d4782ec (/lib/x86_64-linux-gnu/libc.so.6+0xf42ec)
11:03:15     INFO -  ASAN:SIGSEGV

Michal, do you have any idea from top of your head what this could be?  If not, I will look further.

Trying to obtain logs (not sure this will be reproducible with sync logging) at https://treeherder.mozilla.org/#/jobs?repo=try&revision=ea3283e4df75
Flags: needinfo?(michal.novotny)
I guess we post write event via CacheFileIOManager::Write(), but we release the buffer that holds the data sooner than WriteEvent::Run() is executed. But I have no idea how this can happen since WriteEvent holds the reference to the caller (mCallback) so the caller shouldn't be destructed. If no callback is passed to the WriteEvent, then the event takes buffer ownership and releases it in the destructor.
Flags: needinfo?(michal.novotny)
Depends on: 1248958
Michal, can you please look at bug 1248958 soon?  Thanks.
(In reply to Wes Kocher (:KWierso) from comment #7)
> I had to back this out for apparently causing a spike in ASAN failures in
> test_browserElement_oop_getWebManifest.html
> 
> https://treeherder.mozilla.org/logviewer.html#?job_id=21801214&repo=mozilla-
> inbound
> https://hg.mozilla.org/integration/mozilla-inbound/rev/eda42451649d

Trying with bug 1248958 again: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7ba867dfa9bd
Bug 1248958 has landed on m-i.
Status: NEW → ASSIGNED
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/2bfa3b1f1c74
Status: ASSIGNED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Shoudn't we uplift this patch (together with bug 1248958)? Would be nice to have it in the same release as bug 1249811.
Flags: needinfo?(honzab.moz)
> in the same release as bug 1249811.

I meant 913822.
(In reply to Michal Novotny (:michal) from comment #16)
> Shoudn't we uplift this patch (together with bug 1248958)? Would be nice to
> have it in the same release as bug 1249811.

Agree.  Wanted to do it actually.
Flags: needinfo?(honzab.moz)
Comment on attachment 8718915 [details] [diff] [review]
v1

Approval Request Comment
[Feature/regressing bug #]: The new HTTP cache
[User impact if declined]: This patch greatly helps with shutdown hangs and timeout crashes
[Describe test coverage new/current, TreeHerder]: landed on m-c while ago, tested on every browser/platform shutdown
[Risks and why]: after several test runs I can say zero when landed together with bug 1248958
[String/UUID change made/needed]: none

IMPORTANT: Land bug 1248958 *before* this bug on m-a !!!
Attachment #8718915 - Flags: approval-mozilla-aurora?
Tracking for 46 (and maybe 45?) as this should help our shutdown crashes including the one that looks like it's from skype click to play.
Comment on attachment 8718915 [details] [diff] [review]
v1

This may help with shutdown crashes.  Please uplift after bug 1248958
Attachment #8718915 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Depends on: 1251130
This should be backed out because it causes lot of shutdown crashes in the cache code (bugs 1251130, 1248389, 1254720, 1252746). We can re-land it once we find a proper fix for CacheFileIOManager shutdown.
Status: RESOLVED → REOPENED
Flags: needinfo?(honzab.moz)
Resolution: FIXED → ---
(In reply to Michal Novotny (:michal) from comment #23)
> This should be backed out because it causes lot of shutdown crashes in the
> cache code (bugs 1251130, 1248389, 1254720, 1252746). We can re-land it once
> we find a proper fix for CacheFileIOManager shutdown.

I can see only one bug we have here - bug 1251130 that needs a proper fix.

Do you believe there will be more even if we find a proper fix for it?


Otherwise, we can back *this* one out, but we'll again start crashing on shutdown timeout.  Hence, similar bad impact on users.

Let's try to find a proper fix for bug 1251130 first.  If that is going to be complicated or hard to uplift, let's back this out (from branches or from nightly as well - what would make really sad).
Flags: needinfo?(honzab.moz) → needinfo?(michal.novotny)
We are soon going to have a patch for bug 1251130.
Status: REOPENED → RESOLVED
Closed: 9 years ago8 years ago
Flags: needinfo?(michal.novotny)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: