Closed Bug 1334105 Opened 3 years ago Closed 3 years ago
Crash in wrap
_strdup | mozilla::Compare Versions
9.19 KB, text/plain
96.41 KB, text/plain
18.46 KB, text/plain
1.22 KB, patch
|Details | Diff | Splinter Review|
This bug was filed from the Socorro interface and is report bp-9674279b-b13f-41b6-89b3-d465c2170126. ============================================================= Crashing Thread (0) Frame Module Signature Source 0 mozglue.dll wrap_strdup memory/build/mozmemory_wrap.c:86 1 xul.dll mozilla::CompareVersions(char const*, char const*) xpcom/glue/nsVersionComparator.cpp:349 2 xul.dll mozilla::Telemetry::Common::IsExpiredVersion(char const*) toolkit/components/telemetry/TelemetryCommon.cpp:26 3 xul.dll TelemetryEvent::InitializeGlobalState(bool, bool) toolkit/components/telemetry/TelemetryEvent.cpp:366 4 xul.dll `anonymous namespace'::TelemetryImpl::CreateTelemetryInstance toolkit/components/telemetry/Telemetry.cpp:1922 5 xul.dll `anonymous namespace'::nsITelemetryConstructor toolkit/components/telemetry/Telemetry.cpp:2265 6 xul.dll nsComponentManagerImpl::CreateInstanceByContractID(char const*, nsISupports*, nsID const&, void**) xpcom/components/nsComponentManager.cpp:1170 7 xul.dll nsComponentManagerImpl::GetServiceByContractID(char const*, nsID const&, void**) xpcom/components/nsComponentManager.cpp:1526 8 xul.dll nsTHashtable<nsBaseHashtableET<nsCStringHashKey, nsFactoryEntry*> >::s_HashKey(void const*) obj-firefox/dist/include/nsTHashtable.h:376 9 xul.dll nsCOMPtr_base::assign_from_gs_contractid(nsGetServiceByContractID, nsID const&) xpcom/glue/nsCOMPtr.cpp:95 10 xul.dll mozilla::Telemetry::Init() toolkit/components/telemetry/Telemetry.cpp:2909 11 xul.dll nsLocalFile::GetParent(nsIFile**) xpcom/io/nsLocalFileWin.cpp:2857 12 xul.dll mozilla::ClearOnShutdown<mozilla::StaticRefPtr<nsPrintingProxy> >(mozilla::StaticRefPtr<nsPrintingProxy>*, mozilla::ShutdownPhase) obj-firefox/dist/include/mozilla/ClearOnShutdown.h:115 13 xul.dll mozilla::dom::ContentChild::Init(MessageLoop*, unsigned long, IPC::Channel*) dom/ipc/ContentChild.cpp:605 14 xul.dll mozilla::dom::ContentProcess::Init() dom/ipc/ContentProcess.cpp:138 15 xul.dll XRE_InitChildProcess toolkit/xre/nsEmbedFunctions.cpp:669 crashes with this signature are increasing in firefox 52 builds and later.
Priority: -- → P1
From the signature summary and skimming through reports: (1) the crash is limited to content processes (2) limited to Fx 52 (3) limited to beta channel (4) limited to crashes under "TelemetryEvent::InitializeGlobalState" (5) always crashing on `strdup(aStrB)` in `CompareVersions()`, with a null ptr read (2) makes sense as the first regularly recording event landed in 52 (bug 1316281). In 53, we made event recording default to disabled (bug 1329139). (3) is strange, why don't we see any crashes from Nightly & Aurora? I wonder if this is some odd build? For (4), comparing this to the scalar and histogram implementations, the event initialization is the only one checking expiry on initialization. For scalars & histograms we do this lazily, also using `Telemetry::Common::IsExpiredVersion()`: https://dxr.mozilla.org/mozilla-central/search?q=path%3ATelemetryHistogram.cpp+IsExpiredVersion https://dxr.mozilla.org/mozilla-central/search?q=path%3ATelemetryScalar.cpp+IsExpiredVersion ... so they could also be affected, but hidden by this crash. (5) is strange, coming from: * current_version = mozilla::Version(MOZ_APP_VERSION) * (mozilla::Version(aExpiration) <= current_version) * this calls: CompareVersions(versionContent, aRhs.ReadContent()) * CompareVersions(const char* aStrA, const char* aStrB) * char* B2 = strdup(aStrB); (see e.g. bp-9674279b-b13f-41b6-89b3-d465c2170126) I can't see how we can get to a 0 pointer there - strdup(MOZ_APP_VERSION) seems like it should always be valid. The other expiration parameter is asserted to be non-0.
All crashes do have the same buildid, which is legit - it's this build: http://ftp.mozilla.org/pub/firefox/candidates/52.0b1-candidates/build2/win32_info.txt
Another theory would be OOM or similar, i'm not sure how that shows up in crash reports now. Benjamin, do you know who can help with odd crashers like this these days?
it's also a crash at startup (<60s uptime) in all the reported cases last week.
That part is not surprising, this is initialization code for the Telemetry module running at/around startup.
Just checking the stack, it looks like "B" string passed to CompareVersions is null, and by debugging the minidump I've verified that static mozilla::Version current_version = mozilla::Version(MOZ_APP_VERSION); is null. It's probably not a good idea to have a function-scope static variable like that. It doesn't add a static destructor, but it does add a threadsafe dynamic constructor check to this codepath. Here is the disassembly for initializing current_version: 57832F40 push offset string "52.0" (59332560h) 57832F45 mov dword ptr ds:[5998C5FCh],eax 57832F4A call dword ptr [__imp__strdup (5921E39Ch)] 57832F50 mov dword ptr [current_version (599A9B30h)],eax 57832F55 mov dword ptr [esp],offset `mozilla::Telemetry::Common::IsExpiredVersion'::`2'::`dynamic atexit destructor for 'current_version'' (57B75E21h) 57832F5C call atexit (57B0310Bh) 57832F61 pop ecx 57832F62 mov ecx,dword ptr [esp+8] 57832F66 jmp mozilla::Telemetry::Common::IsExpiredVersion+1Bh (57832EBEh) I don't think strdup is infallible in our codebase, although it's obviously rather unlikely that it would fail during process startup like this. I don't know if glandium has thoughts about what might be happening with the CRT and allocators.
Flags: needinfo?(benjamin) → needinfo?(mh+mozilla)
Bonus points if a fix here can also address bug 959745.
There shouldn't be a problem with that strdup there. It would be worth double checking what dumpbin -imports says about __imp_strdup. It should come from mozglue.dll. Either way, it should not have returned NULL. But maybe the problem is that it wasn't called at all? what does the dynamic constructor check look like?
(In reply to Mike Hommey [:glandium] from comment #8) > There shouldn't be a problem with that strdup there. It would be worth > double checking what dumpbin -imports says about __imp_strdup. It should > come from mozglue.dll. It does (per attachment 8833920 [details], thanks Alessio). > Either way, it should not have returned NULL. But > maybe the problem is that it wasn't called at all? what does the dynamic > constructor check look like? Benjamin, can you answer that one? Thanks!
I can't spend more time on this now, but you should be able to answer this by disassembling this function xul.dll for the affected build(s).
Thanks, here it is: https://pastebin.mozilla.org/8974145 Glandium, does that tell you something?
Copy/pasting only the relevant instruction that lead to the static variable initialization: 10412ea4 a1fcc55612 mov eax,dword ptr [xul!`nsTHashtable<nsBaseHashtableET<nsStringHashKey,nsAutoPtr<`anonymous namespace'::HangReports::AnnotationInfo> > >::Ops'::`2'::`local static guard'+0x4 (1256c5fc)] 10412eaa 33db xor ebx,ebx 10412eaf 43 inc ebx 10412eb6 84c3 test bl,al 10412eb8 0f8480000000 je xul!mozilla::Telemetry::Common::IsExpiredVersion+0x9b (10412f3e) So, it loads the guard value at 0x1256c5fc into eax, sets ebx to 1, and compares the lower bits of both registers, and goes to the initialization code depending on the result of the comparison. The x86 test instruction is kind of weird. It sets flags depending on the result of a bitwise AND of the two given values. Assuming the uninitialized guard is 0, 0 & 1 is 0, which sets the ZF flag to 1. The je instruction jumps when the ZF flags is 1, so the jump happens if the guard is 0. If the guard is 1, then 1 & 1 is 1, which sets ZF to 0, and the jump doesn't happen. Then, in the initialization code: 10412f3e 0bc3 or eax,ebx 10412f45 a3fcc55612 mov dword ptr [xul!`nsTHashtable<nsBaseHashtableET<nsStringHashKey,nsAutoPtr<`anonymous namespace'::HangReports::AnnotationInfo> > >::Ops'::`2'::`local static guard'+0x4 (1256c5fc)],eax Since ebx is always 1, essentially, the or just sets eax to 1. Then eax is put into the guard value. All in all, the guard looks normal. Now the question is whether the guard is 0 or 1 when this all happens, and we likely don't have that info in the minidump, although, maybe we do, I don't know what minidumps include on windows. Surely the contents of the (modified) data segments of libraries would be interesting to have in minidump.
(In reply to Mike Hommey [:glandium] from comment #15) > All in all, the guard looks normal. Now the question is whether the guard is > 0 or 1 when this all happens, and we likely don't have that info in the > minidump, although, maybe we do, I don't know what minidumps include on > windows. Surely the contents of the (modified) data segments of libraries > would be interesting to have in minidump. Per Benjamin the minidump can't tell us that as its heap memory. Do you have an idea on next steps we can take here?
Speculative fix idea: Version has an operator<=(const char* aRhs), so how about just comparing (mozilla::Version(aExpiration) <= MOZ_APP_VERSION)?
Comment 17 sounds good.
Thanks for the help. I would love to know what is actually going on here, but i will push this fix up now. Chris, can you take a look?
Attachment #8834875 - Flags: review?(chutten)
Comment on attachment 8834875 [details] [diff] [review] Work around startup crashes in Version(MOZ_APP_VERSION) checks Review of attachment 8834875 [details] [diff] [review]: ----------------------------------------------------------------- Seems legit.
Attachment #8834875 - Flags: review?(chutten) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/66e0fa002986 Work around startup crashes in Version(MOZ_APP_VERSION) checks. r=chutten
Going off crash-stats, I assume we're not going to get any real sense of whether this patch helps or not unless we uplift to Beta, so please nominate it for Aurora/Beta approval when you get a chance.
Yes, that is the plan.
Comment on attachment 8834875 [details] [diff] [review] Work around startup crashes in Version(MOZ_APP_VERSION) checks Approval Request Comment [Feature/Bug causing the regression]: Event Telemetry [User impact if declined]: Startup crashes probably fixed. We will need to uplift to Beta to confirm though. [Is this code covered by automated tests?]: Yes. [Has the fix been verified in Nightly?]: Yes. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: No. [Why is the change risky/not risky?]: This is a trivial change that is presumed to work around the startup crash here. [String changes made/needed]: None.
Comment on attachment 8834875 [details] [diff] [review] Work around startup crashes in Version(MOZ_APP_VERSION) checks Fix a crash. Aurora53+.
Attachment #8834875 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
We will probably only see confirmation of a fix on 52 beta here.
Comment on attachment 8834875 [details] [diff] [review] Work around startup crashes in Version(MOZ_APP_VERSION) checks startup crash fix for beta52
Attachment #8834875 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Setting qe-verify- per comment 26.
You need to log in before you can comment on or make changes to this bug.