Startup Crash in [@ mozilla::intl::LocaleService::GetAppLocalesAsLangTags]
Categories
(Core :: Internationalization, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox81 | --- | unaffected |
firefox82 | --- | unaffected |
firefox83 | blocking | verified |
firefox84 | + | verified |
People
(Reporter: aryx, Assigned: m_kato)
References
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
47 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-beta+
|
Details | Review |
During the past week, there were 4 crashes with this signature on 4 different devices. They hit in the first 3 minutes after the application got launched.
Crash report: https://crash-stats.mozilla.org/report/index/629e45c6-0fad-4f1e-9936-ba27b0200924
Top 10 frames of crashing thread:
0 xul.dll mozilla::intl::LocaleService::GetAppLocalesAsLangTags intl/locale/LocaleService.cpp:400
1 xul.dll mozilla::intl::LocaleService::GetAppLocaleAsLangTag intl/locale/LocaleService.cpp:427
2 xul.dll nsChromeRegistryChrome::GetBaseURIFromPackage chrome/nsChromeRegistryChrome.cpp:330
3 xul.dll nsChromeRegistry::ConvertChromeURL chrome/nsChromeRegistry.cpp:253
4 xul.dll nsChromeProtocolHandler::NewChannel chrome/nsChromeProtocolHandler.cpp:127
5 xul.dll mozilla::net::nsIOService::NewChannelFromURIWithProxyFlagsInternal netwerk/base/nsIOService.cpp:1134
6 xul.dll mozilla::net::nsIOService::NewChannelFromURIWithClientAndController netwerk/base/nsIOService.cpp:1076
7 xul.dll NS_NewChannelInternal netwerk/base/nsNetUtil.cpp:404
8 xul.dll NS_NewChannel netwerk/base/nsNetUtil.cpp:345
9 xul.dll nsStringBundleBase::ParseProperties intl/strres/nsStringBundle.cpp:446
Zibi, any idea what started this?
Comment 1•4 years ago
|
||
Zibi, any idea what started this?
I don't, but I'm pretty sure that historically nsTArray<nsCString> mAppLocales
should never be nullptr, and that there has been some work on nsTArray
recently, so maybe :sg or :froydnj will know since they're on the log as last people touching nsTArray
?
It seems that something changed about nsTArray
which makes it possible to be nullptr
and it was not the case before? That's my first guess.
Comment 2•4 years ago
|
||
I tried to look at the dumps of three of the crash reports (https://crash-stats.mozilla.org/report/index/27ad659c-321a-4d7d-898f-d14530200928, https://crash-stats.mozilla.org/report/index/cb234549-9183-4890-b592-8f3e50200928, https://crash-stats.mozilla.org/report/index/967137e2-bfca-4e8a-9ad8-fe59b0200927) with Visual Studio, but unfortunately, it doesn't load xul.dll symbols for any of those properly. Not sure what that means.
I am not sure what exactly you suspect to be nullptr
here. The nsTArray
mLocales
, which is a member of LocaleService
is nothing what would be affected by changes to nsTArray
. It's not exactly clear, since there might be inline frames involved, but to me it looks as if LocaleService::GetInstance()
returns nullptr
at https://hg.mozilla.org/mozilla-central/annotate/b7717ee20ba9d676a6f559efbcb45027db0003c6/chrome/nsChromeRegistryChrome.cpp#l330
This can happen when the call to ClearOnShutdown
gets into https://searchfox.org/mozilla-central/source/xpcom/base/ClearOnShutdown.cpp#21 from https://searchfox.org/mozilla-central/source/intl/locale/LocaleService.cpp#152, which doesn't check again if sLocaleService
was cleared immediately.
Updated•4 years ago
|
Comment 3•4 years ago
|
||
Thank you for your help and investigation!
I am not sure what exactly you suspect to be nullptr here. The nsTArray mLocales, which is a member of LocaleService is nothing what would be affected by changes to nsTArray.
Thanks, that's what it looked like, but I wasn't sure.
It's not exactly clear, since there might be inline frames involved, but to me it looks as if LocaleService::GetInstance() returns nullptr at https://hg.mozilla.org/mozilla-central/annotate/b7717ee20ba9d676a6f559efbcb45027db0003c6/chrome/nsChromeRegistryChrome.cpp#l330
How did you conclude that from the stack?
The top frame is https://searchfox.org/mozilla-central/source/intl/locale/LocaleService.cpp#400
Comment 4•4 years ago
|
||
Sure, here are some more details on my analysis when looking at https://crash-stats.mozilla.org/report/index/cb234549-9183-4890-b592-8f3e50200928:
- The top frame shows https://hg.mozilla.org/mozilla-central/annotate/4668c7b5e842a7afef9e4c183c8a85af950eb1ad/intl/locale/LocaleService.cpp#l400, which calls
mAppLocales.IsEmpty()
. Assuming this is not affected by other optimizations, sincensTArray_base::IsEmpty
is not a virtual function, this can't happen when tying to access the vtable, but we are probably inside that function, which is likely to have been inlined here, since it is very small. So https://searchfox.org/mozilla-central/rev/dfd9c0f72f9765bd4a187444e0c1e19e8834a506/xpcom/ds/nsTArray.h#416 callsLength
which accessesmHdr
at https://searchfox.org/mozilla-central/rev/dfd9c0f72f9765bd4a187444e0c1e19e8834a506/xpcom/ds/nsTArray.h#413. It then dereferencesmHdr
, so this could cause an access violation like the one mentioned ifmHdr
isnullptr
, but I think this is rather unlikely (ifnsTArray
had some bug that causes this to benullptr
we would probably have seen this elsewhere). The other possibility is that the base address of thensTArray
object itself is alreadynullptr + x
, so trying to accessmHdr
will try to accessnullptr + x + offsetof(nsTArray<...>, mHdr)
. I then went on to see if the pointer of the object containing thensTArray
might be anullptr
. - The second stack frame just delegates to the first.
- The third stack frame is at https://hg.mozilla.org/mozilla-central/annotate/4668c7b5e842a7afef9e4c183c8a85af950eb1ad/chrome/nsChromeRegistryChrome.cpp#l330 and is actually calling a method on the object containing the
nsTArray
through a pointer obtained by callingLocaleService::GetInstance()
. If that returnednullptr
, this would explain the access violation. - So I looked into https://searchfox.org/mozilla-central/rev/dfd9c0f72f9765bd4a187444e0c1e19e8834a506/intl/locale/LocaleService.cpp#135. At first sight, this looks as if it does
if (!sInstance) { sInstance= new LocaleService{...}; } return sInstance;
which would never returnnullptr
. However, there's the call toClearOnShutdown
at https://searchfox.org/mozilla-central/rev/dfd9c0f72f9765bd4a187444e0c1e19e8834a506/intl/locale/LocaleService.cpp#152. ClearOnShutdown
callsInsertIntoShutdownList
at https://searchfox.org/mozilla-central/rev/dfd9c0f72f9765bd4a187444e0c1e19e8834a506/xpcom/base/ClearOnShutdown.h#117InsertIntoShutdownList
checksPastShutdownPhase(aPhase)
and in that case callsaObserver->Shutdown();
at https://searchfox.org/mozilla-central/rev/dfd9c0f72f9765bd4a187444e0c1e19e8834a506/xpcom/base/ClearOnShutdown.cpp#21aObserver
is aPointerClearer
forsInstance
, so callingShutdown
will setsInstance
tonullptr
, which makesLocaleService::GetInstance
returnnullptr
, which explains the access violation.
What I now also saw in the crash metadata is that it has a GraphicsCriticalError
saying |[0][GFX1-]: Killing GPU process due to IPC reply timeout (t=502.452) |[1][GFX1-]: Failed as lost WebRenderBridgeChild. (t=502.452) |[2][GFX1-]: Failed to create remote compositor (t=504.172) |[3][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=545.582)
, so this sounds as if we are actually shutting down.
So it seems to me as if nsChromeRegistryChrome::GetBaseURIFromPackage
should check the return value of LocaleService::GetInstance()
before dereferencing.
Comment 5•4 years ago
|
||
Tracking for 83 as startup crashes means loosing users.
Comment 6•4 years ago
|
||
Thanks so much for the analysis and guidance through the stack trace debugging! Super helpful :)
I take your conclusion as actionable!
Comment 7•4 years ago
|
||
Setting P2 because it's a crash and a startup one at that, the fix is relatively simple but the volume is low, so below P1s.
Assignee | ||
Updated•4 years ago
|
Comment 8•4 years ago
|
||
Zibi, do you have a fix coming for this crash? Thanks
Comment 9•4 years ago
|
||
Not at the moment, I don't have cycles.
Comment 10•4 years ago
|
||
Insofar the root cause for this is the GPU process problems mentioned in comment 4, maybe a graphics person should give this a look?
Comment 11•4 years ago
|
||
Changing the priority to p2 as the bug is tracked by a release manager for the current nightly.
See What Do You Triage for more information
Comment 12•4 years ago
|
||
Forwarding to JEff. But I think gfx is just a red herring, and isn't related to the problem at hand
(In reply to Simon Giesecke [:sg] [he/him] from comment #4)
ClearOnShutdown
callsInsertIntoShutdownList
at https://searchfox.org/mozilla-central/rev/dfd9c0f72f9765bd4a187444e0c1e19e8834a506/xpcom/base/ClearOnShutdown.h#117InsertIntoShutdownList
checksPastShutdownPhase(aPhase)
and in that case callsaObserver->Shutdown();
at https://searchfox.org/mozilla-central/rev/dfd9c0f72f9765bd4a187444e0c1e19e8834a506/xpcom/base/ClearOnShutdown.cpp#21aObserver
is aPointerClearer
forsInstance
, so callingShutdown
will setsInstance
tonullptr
, which makesLocaleService::GetInstance
returnnullptr
, which explains the access violation.
I think you're on the money, we've started shutdown and something is calling GetInstance()
So it seems to me as if
nsChromeRegistryChrome::GetBaseURIFromPackage
should check the return value ofLocaleService::GetInstance()
before dereferencing.
The easiest will be to dispatch a task on the main thread to call ClearOnShutdown(&sInstance, ShutdownPhase::Shutdown) then.
worse case it leaks a bit because we failed to dispatch the task, and as we're shutting down it doesn't matter.
Comment 14•4 years ago
|
||
The startup crashes are going up since 83 moved to beta, makoto could we get somebody assigned on this one to have time to find a fix and uplift it to beta? Thanks
Assignee | ||
Comment 15•4 years ago
|
||
Zibi, could you look this issue? Since shutting down process won't get XPCOM instance, LocaleService and nsChromeRegistryChrome should consider that these are called during shunting down.
Updated•4 years ago
|
Comment 16•4 years ago
|
||
Hundreds of startup crashes on Beta, I consider it as a blocker for the 83 release. Makoto, Zibi, this needs an owner and a fix we can uplift in beta. Thanks.
Assignee | ||
Comment 17•4 years ago
|
||
Related to bug 1663747. I think that nsStringBundle::LoadProperties
should check shutdown phase like bug 1663747.
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 18•4 years ago
|
||
ClearOnShutdown is called before shut down observer is notified. So when this
is notified, some services are already shut down.
So string bundle won't get the value since Locale service is shut down, so we
should return error when shutting down status.
Comment 19•4 years ago
|
||
Pushed by m_kato@ga2.so-net.ne.jp: https://hg.mozilla.org/integration/autoland/rev/83bf4fd3b1fb String bundle should return error after ClearOnShutdown is called. r=zbraniecki
Comment 20•4 years ago
|
||
bugherder |
Comment 21•4 years ago
|
||
Makoto, could you request an uplift to beta? Thanks
Assignee | ||
Comment 22•4 years ago
|
||
Comment on attachment 9184143 [details]
Bug 1667613 - String bundle should return error after ClearOnShutdown is called.
Beta/Release Uplift Approval Request
- User impact if declined: When user quits Firefox, Firefox may crash. Although we don't know how to reproduce this, but this is potential fix.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): String bundle returns error instead when shutting down is started. After user quits Firefox, remained HTTP request (most is captive portals service) will have no Accept-Language header. But it is unnecessary during shutdown.
- String changes made/needed: No
Comment 23•4 years ago
|
||
Comment on attachment 9184143 [details]
Bug 1667613 - String bundle should return error after ClearOnShutdown is called.
Approved for 83 beta 6, thanks.
Comment 24•4 years ago
|
||
bugherder uplift |
Comment 25•4 years ago
|
||
No crashes in beta 6 and 7.
Description
•