Closed
Bug 1504774
Opened 7 years ago
Closed 7 years ago
Crash in shutdownhang | mozilla::SyncRunnable::DispatchToThread
Categories
(Toolkit :: Safe Browsing, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla66
People
(Reporter: lizzard, Assigned: dimi)
References
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
|
47 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
|
Details | Review |
Followup bug for crashes in 64 beta.
+++ This bug was initially created as a clone of Bug #1453038 +++
This bug was filed from the Socorro interface and is
report bp-53ff0449-a421-4467-b2c2-5b1e70180410.
=============================================================
Windows only crash: https://bit.ly/2HqEfF4. Some user comments mention continuous crashes.
Top 10 frames of crashing thread:
0 ntdll.dll NtWaitForAlertByThreadId
1 ntdll.dll RtlSleepConditionVariableCS
2 kernelbase.dll SleepConditionVariableCS
3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:58
4 xul.dll mozilla::CondVar::Wait xpcom/threads/CondVar.h:68
5 xul.dll mozilla::SyncRunnable::DispatchToThread xpcom/threads/SyncRunnable.h:70
6 xul.dll mozilla::SyncRunnable::DispatchToThread xpcom/threads/SyncRunnable.h:98
7 xul.dll nsUrlClassifierDBService::Shutdown toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:2539
8 xul.dll nsUrlClassifierDBService::Observe toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:2480
9 xul.dll nsObserverList::NotifyObservers xpcom/ds/nsObserverList.cpp:112
=============================================================
Comment 1•7 years ago
|
||
At least for the crash in comment 0, the crashing thread is a lie. The crash reason is:
MOZ_CRASH Reason MOZ_CRASH(Shutdown hanging before starting.)
and so the crashing thread is something else:
Thread 50, Name: Shutdown Hang Terminator
Frame Module Signature Source
0 xul.dll mozilla::`anonymous namespace'::RunWatchdog toolkit/components/terminator/nsTerminator.cpp:206
1 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397
2 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:137
3 ucrtbase.dll thread_start<unsigned int (__stdcall*)(void*)>
4 kernel32.dll BaseThreadInitThunk
5 mozglue.dll patched_BaseThreadInitThunk mozglue/build/WindowsDllBlocklist.cpp:838
But the thread that's taking too long to do something, i.e. is stuck, is the main thread waiting on the URL classifier service.
Looking at the first couple reports in:
https://crash-stats.mozilla.com/signature/?signature=shutdownhang%20%7C%20mozilla%3A%3ASyncRunnable%3A%3ADispatchToThread&date=%3E%3D2018-11-01T10%3A36%3A06.000Z&date=%3C2018-11-08T09%3A36%3A06.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-version&_sort=-date&page=1
this seems to be the common pattern. What is taking so long here?
Flags: needinfo?(francois)
| Assignee | ||
Comment 2•7 years ago
|
||
This happened when URL classifier was doing a SafeBrowsing update which copies the blocklist database files to a tempory folder, update them, and then move them back to the original folder.
We have added several checkpoints so we can abort the update as soon as possible while shutting down.
The crash report in Comment 0 is from Firefox 59, that happened while we were copying the entire database directory. That should be fixed in Bug 1481819 which landed in 64.
I'll see if I can find anything by checking crash reports in firefox 64, 65.
Updated•7 years ago
|
Assignee: nobody → dlee
Status: NEW → ASSIGNED
Flags: needinfo?(francois)
| Assignee | ||
Comment 3•7 years ago
|
||
In Bug 1453038, |mUpdateInterrupted| is set in Classifer::Close() which is
called by PreShutdown to abort an ongoing update. That doesn't handle
all the cases.
The SafeBrowsing update involves two threads, worker thread, and update
thread. Update thread takes care of most of the update work, when it finishes
its task, it posts a task back to the worker thread to apply the updated database
and also do some cleanup stuff. Then the update is complete.
The fix in Bug 1453038 doesn't abort an update if the woker thread is doing
the job. This is because the |mUpdateInterrupted| flag is set in the
worker thread. The PreShutdown event which eventually sets the flag has to
wait until the worker thread's current task is done.
In this patch:
1. Check nsUrlClassifierDBService::ShutdownHasStarted() to abort shutdown.
This is set by main thread so both worker thread and update thread can
be interrupted now.
2. mIsClosed is now replaced by the mUpdateInterrupted. The semantics of
mUpdateInterrupted is now changed to abort update for any internal APIs
which should cause an update to abort.
3. Remove |mUpdateInterrupted| and |ShutdownHasStarted()| checks and
unify with |ShouldAbort()|
| Assignee | ||
Comment 4•7 years ago
|
||
This patch addressed an issue I found while checking this bug. But it doesn't cover all the cases.
To be honest, most of the crashes are pretty strange and I still don't have any clue.
Comment 5•7 years ago
|
||
Is this patch ready to land?
| Assignee | ||
Comment 6•7 years ago
|
||
(In reply to Julien Cristau [:jcristau] from comment #5)
> Is this patch ready to land?
Yes, I'll test on try again and land if everything goes well.
| Assignee | ||
Comment 7•7 years ago
|
||
Pushed by dlee@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5b1c8bfb5cd2
Fix url-classifier worker thread is not aborted while shutting down. r=francois
Comment 9•7 years ago
|
||
| bugherder | ||
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox66:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Updated•7 years ago
|
status-firefox-esr60:
--- → affected
Comment 10•7 years ago
|
||
Please request Beta approval on this when you get a chance.
Flags: needinfo?(dlee)
| Assignee | ||
Comment 11•7 years ago
|
||
Comment on attachment 9025961 [details]
Bug 1504774 - Fix url-classifier worker thread is not aborted while shutting down. r?francois
[Beta/Release Uplift Approval Request]
Feature/Bug causing the regression: None
User impact if declined: crash while shutting down
Is this code covered by automated tests?: Yes
Has the fix been verified in Nightly?: Yes
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Medium
Why is the change risky/not risky? (and alternatives if risky): This bug is really related to the timing while shutting down.
It is not easy to verify every scenario.
String changes made/needed: no
Flags: needinfo?(dlee)
Attachment #9025961 -
Flags: approval-mozilla-beta?
Comment 12•7 years ago
|
||
Comment on attachment 9025961 [details]
Bug 1504774 - Fix url-classifier worker thread is not aborted while shutting down. r?francois
[Triage Comment]
Improved shutdown handling to avoid crashes. Approved for 65.0b8.
Attachment #9025961 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 13•7 years ago
|
||
| bugherder uplift | ||
Comment 14•7 years ago
|
||
(In reply to Dimi Lee [:dimi][:dlee] from comment #11)
> Is this code covered by automated tests?: Yes
> Needs manual test from QE?: No
Setting qe-verify- since it is covered by automated tests.
Flags: qe-verify-
Hi Dimi, since ESR60 is affected, should we consider uplifting this patch to ESR60 repo?
Flags: needinfo?(dlee)
| Assignee | ||
Comment 16•7 years ago
|
||
(In reply to Ritu Kothari (:ritu) from comment #15)
Hi Dimi, since ESR60 is affected, should we consider uplifting this patch to ESR60 repo?
Hi Ritu,
We can't uplift this to ESR60 because it depends on lots of things we have done for the past 6 months.
Flags: needinfo?(dlee)
Thanks! In that case, let's update this to a wontfix.
You need to log in
before you can comment on or make changes to this bug.
Description
•