Closed Bug 1395938 Opened 7 years ago Closed 7 years ago

Crash in RtlEnterCriticalSection | mozilla::net::CacheStorageService::CacheQueueSize

Categories

(Core :: Networking: Cache, defect)

Unspecified
Windows 10
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla57
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 --- unaffected
firefox57 --- verified

People

(Reporter: marcia, Assigned: valentin)

References

Details

(Keywords: crash, Whiteboard: [necko-active])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-0f58b76f-7fc7-436d-b758-d19f50170901.
=============================================================

Seen while looking at nightly crashes: http://bit.ly/2vQ2iWA. Crashes appear to have started with 20170831220208. 121 crashes so far.

Possible regression range based on Build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ab2d700fda2b4934d24227216972dce9fac19b74&tochange=13d241d08912be31884f9d0d0e805b25343d6c0a

ni on Valentin for ideas, since it looks as if he worked in this area of code.
Flags: needinfo?(valentin.gosu)
Assignee: nobody → valentin.gosu
Whiteboard: [necko-active]
I would say this is caused by bug 1358060, which shows up in the regression range.
mozilla::net::nsHttpChannel::OnTailUnblock is in the callstack for the crash.
It seems that requests are unblocked during/after shutdown, which causes us to crash here because CacheFileIOManager::IOThread() returns a null pointer after shutdown.
Depends on: tailing
Flags: needinfo?(valentin.gosu)
The request context service should observer some early shutdown topic, cancel all context with a failure and disallow accepting new requests to the queue (simply don't tail).  Do you think you could fix that Valentin?  Or find someone who could?  I would like to prevent backing the tail patch out while this seems simple to fix.
(In reply to Honza Bambas (:mayhemer) from comment #2)
> Do you think you could fix that Valentin?

Sure, I'll try to fix it.
Comment on attachment 8903888 [details]
Bug 1395938 - Prevent crash when we are missing an IOThread during shutdown

https://reviewboard.mozilla.org/r/175672/#review180712

::: netwerk/base/RequestContextService.cpp:530
(Diff revision 1)
>    NS_ENSURE_ARG_POINTER(rc);
>    *rc = nullptr;
>  
> +  if (mShutdown) {
> +    return NS_ERROR_ILLEGAL_DURING_SHUTDOWN;
> +  }

that's one possibility, but you always have to add this condition to IsContextTailBlocked method (can be called later than a channel asks for rc)

Thanks for a patch!
This is the #3 Windows topcrash in Nightly 20170901220209. Please review and land ASAP!
This is also showing up on Mac with the signature `[@ pthread_mutex_lock | mozilla::detail::MutexImpl::lock ]`. It's the #1 topcrash on Mac.
(In reply to Nicholas Nethercote [:njn] from comment #8)
> This is also showing up on Mac with the signature `[@ pthread_mutex_lock |
> mozilla::detail::MutexImpl::lock ]`. It's the #1 topcrash on Mac.

I filed bug 1396451 to skip past the second entry, which would give a better signature in that case.
Comment on attachment 8903888 [details]
Bug 1395938 - Prevent crash when we are missing an IOThread during shutdown

https://reviewboard.mozilla.org/r/175672/#review181034
Attachment #8903888 - Flags: review?(michal.novotny) → review+
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/d54673f840a4
Cancel RequestContext::RequestContext on shutdown r=michal
https://hg.mozilla.org/mozilla-central/rev/d54673f840a4
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Blocks: 1396377
The crashes with signature "RtlEnterCriticalSection | mozilla::net::CacheStorageService::CacheQueueSize" are still present in nightly 57 with buildids 20170905100117 (79 crashes) and 20170905220108 (88 crashes).
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I think the problem here is that we shutdown the CacheIOThread on several different topics [1] whereas we only shutdown the RequestContextService on "xpcom-shutdown".

[1] http://searchfox.org/mozilla-central/rev/4d8e389498a08668cce9ebf6232cc96be178c3e4/netwerk/cache2/CacheObserver.cpp#500-502

I guess the easiest way of fixing this one would be to guard for the existence of the IOThread in CacheStorageService::CacheQueueSize.
Blocks: 1397317
Crash Signature: [@ RtlEnterCriticalSection | mozilla::net::CacheStorageService::CacheQueueSize] → [@ RtlEnterCriticalSection | mozilla::net::CacheStorageService::CacheQueueSize] [@ RtlEnterCriticalSection | mozilla::detail::MutexImpl::lock | mozilla::net::CacheIOThread::QueueSize]
Crash Signature: [@ RtlEnterCriticalSection | mozilla::net::CacheStorageService::CacheQueueSize] [@ RtlEnterCriticalSection | mozilla::detail::MutexImpl::lock | mozilla::net::CacheIOThread::QueueSize] → [@ RtlEnterCriticalSection | mozilla::net::CacheStorageService::CacheQueueSize] [@ RtlEnterCriticalSection | mozilla::detail::MutexImpl::lock | mozilla::net::CacheIOThread::QueueSize] [@ pthread_mutex_lock | mozilla::detail::MutexImpl::lock | mozilla::n…
QA in Taipei will setup automation environment to reproduce this crash and capture corresponding log for analysis.

@valentin do you have any suggestion on the test step and logging category?
I ask QA to add "rotate:50,cache2:5" into our default logging module in about:networking, and the STR is to repetitively load a web page in new tab.
Flags: needinfo?(valentin.gosu)
Not sure if this needs QA attention. The problem here is fairly obvious.
If anything, I could use some help getting logs for bug 1396307.
Flags: needinfo?(valentin.gosu)
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d6e1c5f5a5c3
Prevent crash when we are missing an IOThread during shutdown r=michal
https://hg.mozilla.org/integration/mozilla-inbound/rev/d6e1c5f5a5c3548e06135b851c7e39af080b492e
Bug 1395938 - Prevent crash when we are missing an IOThread during shutdown r=michal
https://hg.mozilla.org/mozilla-central/rev/d6e1c5f5a5c3
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
No more crashes with these signatures on Socorro.
Status: RESOLVED → VERIFIED
No longer blocks: 1396377
No longer blocks: tailing
You need to log in before you can comment on or make changes to this bug.