Closed Bug 1543331 Opened 5 years ago Closed 4 years ago

Crash in [@ @0x0 | nsHostResolver::FlushCache]

Categories

(Core :: Networking: DNS, defect, P3)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
mozilla75
Tracking Status
thunderbird_esr68 --- affected
firefox-esr60 --- unaffected
firefox-esr68 --- fixed
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- wontfix
firefox73 --- wontfix
firefox74 --- wontfix
firefox75 --- fixed

People

(Reporter: calixte, Assigned: valentin)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression, Whiteboard: [necko-triaged])

Crash Data

Attachments

(2 files)

This bug is for crash report bp-deea109f-ecbc-43cd-9ccf-3d3f40190331.

Top 10 frames of crashing thread:

0  @0x0 
1 dalvik-jit-code-cache_2359_2359 (deleted) dalvik-jit-code-cache_2359_2359 @0x7b7e628 
2 libxul.so nsHostResolver::FlushCache netwerk/dns/nsHostResolver.cpp:740
3 libxul.so nsDNSService::Observe netwerk/dns/nsDNSService2.cpp:1134
4 libxul.so nsObserverList::NotifyObservers xpcom/ds/nsObserverList.cpp:66
5 libxul.so nsObserverService::NotifyObservers xpcom/ds/nsObserverService.cpp:286
6 libxul.so mozilla::GeckoNetworkManager::OnStatusChanged widget/android/GeckoNetworkManager.h:37
7 libxul.so mozilla::detail::RunnableFunction<mozilla::jni::detail::ProxyNativeCall<GeckoAppShellSupport, mozilla::java::GeckoAppShell, true, false, mozilla::jni::StringParam const&> >::Run widget/android/jni/Natives.h:395
8 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1161
9 libxul.so NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:474

There are 63 crashes (from 61 installations) in 66.0.2 with buildid 20190326175229. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1450893.

[1] https://hg.mozilla.org/releases/mozilla-release/rev?node=1805599da5fd

Flags: needinfo?(valentin.gosu)

I see a few of them affecting Fennec 65.0.1:
https://crash-stats.mozilla.com/report/index/5dbf9fa0-3d45-4ac7-a89e-db83a0190406
And it crashes at the same spot.

The patch shouldn't have changed the behaviour at all if TRR is off.
Dragana, did I miss something in the patch?

Flags: needinfo?(valentin.gosu) → needinfo?(dd.mozilla)

(In reply to Valentin Gosu [:valentin] from comment #1)

I see a few of them affecting Fennec 65.0.1:
https://crash-stats.mozilla.com/report/index/5dbf9fa0-3d45-4ac7-a89e-db83a0190406
And it crashes at the same spot.

The patch shouldn't have changed the behaviour at all if TRR is off.
Dragana, did I miss something in the patch?

This is interesting. We have some random crashes in nsHostResolver they are mostly on Fennec. I have spent some time looking at them but I have not come to any conclusion and the code is same on all platforms, why it is affecting only Fennec.

What is interesting here is that the small change you made cause this crash to appear.
We do not call ClearCache with parameter true that could actually change the previous behavior.

That's what I found confusing as well. I'm inclined to say this is caused by a compiler bug, or maybe random bitflips, but I have no evidence to support that.

Assigning to myself to keep an eye on it. Given bug 1549398 it is possible we have an Android only DNS bug.

Assignee: nobody → valentin.gosu
Priority: -- → P3
Whiteboard: [necko-triaged]

Only 98 crashes on release - relatively low volume compared to the bug referenced in Comment 4.

Flags: needinfo?(dd.mozilla)

frame #5 of report https://crash-stats.mozilla.org/report/index/4dca6cb1-8d45-4bf5-8836-216810200217

This crash was rather obvious in retrospect, but I missed it because I was
looking at the wrong thing. We're not actually crashing in FlushCache,
instead mHostResolver is null in nsDNSService::Observe

What made it obvious is frame #5 of report https://crash-stats.mozilla.org/report/index/4dca6cb1-8d45-4bf5-8836-216810200217
Included here because crash reports expire:

1   libxul.so   nsHostResolver::FlushCache(bool)  netwerk/dns/nsHostResolver.cpp:740
2   libxul.so   nsDNSService::Observe(nsISupports*, char const*, char16_t const*)   netwerk/dns/nsDNSService2.cpp:1132
3   libxul.so   nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*)   xpcom/ds/nsObserverList.cpp:66
4   libxul.so   nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*)  xpcom/ds/nsObserverService.cpp:295
5   libxul.so   DecreasePrivateDocShellCount()  docshell/base/nsDocShell.cpp:306
6   libxul.so   nsDocShell::Destroy()   docshell/base/nsDocShell.cpp:5076

See the code points to this line:
https://hg.mozilla.org/releases/mozilla-esr68/annotate/ef373efc995d9350a676c4c231b344d173423e8a/docshell/base/nsDocShell.cpp#l306

As we can see, it emits the "last-pb-context-exited" notification,
and nsDNSService tries to call FlushCache.
However, it appears this notification may be called after we get the shutdown
notification and we null out the pointer. It's unclear why this crash was not
noticed before bug 1450893 landed.

Depends on D63107

Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/9fbafe41107e
clang-format nsDNSService2.cpp r=dragana
https://hg.mozilla.org/integration/autoland/rev/1e00af5facbe
Add a null check before calling mHostResolver->FlushCache() r=dragana
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla75

Comment on attachment 9127144 [details]
Bug 1543331 - Add a null check before calling mHostResolver->FlushCache() r=dragana

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Fixes crash on Fennec caused by null pointer dereference.
  • User impact if declined: Crashes on Fennec. Fenix doesn't seem to get these - unclear if it's a difference in architecture or if it just doesn't have enough users yet.
  • Fix Landed on Version: 75
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Just adds a null check. Very low risk.
  • String or UUID changes made by this patch:
Attachment #9127144 - Flags: approval-mozilla-esr68?

Comment on attachment 9127144 [details]
Bug 1543331 - Add a null check before calling mHostResolver->FlushCache() r=dragana

Fixes a low-volume Fennec crash. Approved for 68.6esr.

Attachment #9127144 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: