Closed
Bug 1428612
Opened 6 years ago
Closed 6 years ago
Crash in OOM | small in TelemetryHistogram::InitializeGlobalState
Categories
(Toolkit :: Telemetry, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla60
People
(Reporter: philipp, Assigned: erahm)
Details
(Keywords: crash)
Crash Data
Attachments
(2 files)
1.42 KB,
patch
|
gfritzsche
:
review+
lizzard
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
1.18 KB,
patch
|
gfritzsche
:
review+
lizzard
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-b64045fe-3046-426d-ae81-59cbc0180106. ============================================================= Top 10 frames of crashing thread: 0 xul.dll NS_ABORT_OOM xpcom/base/nsDebugImpl.cpp:620 1 xul.dll nsTHashtable<nsBaseHashtableET<nsCStringHashKey, mozilla::Telemetry::HistogramID> >::s_InitEntry xpcom/ds/nsTHashtable.h:456 2 xul.dll PLDHashTable::Add xpcom/ds/PLDHashTable.cpp:590 3 xul.dll TelemetryHistogram::InitializeGlobalState toolkit/components/telemetry/TelemetryHistogram.cpp:1719 4 xul.dll `anonymous namespace'::TelemetryImpl::CreateTelemetryInstance toolkit/components/telemetry/Telemetry.cpp:1202 5 xul.dll `anonymous namespace'::nsITelemetryConstructor toolkit/components/telemetry/Telemetry.cpp:1572 6 xul.dll nsComponentManagerImpl::CreateInstanceByContractID xpcom/components/nsComponentManager.cpp:1086 7 xul.dll nsComponentManagerImpl::GetServiceByContractID xpcom/components/nsComponentManager.cpp:1446 8 xul.dll nsCOMPtr_base::assign_from_gs_contractid xpcom/base/nsCOMPtr.cpp:95 9 xul.dll mozilla::Telemetry::Init toolkit/components/telemetry/Telemetry.cpp:1968 ============================================================= these seem to be persistent crashes on startup that hit individual users repeatedly. some user comments: http://bit.ly/2CQzwxm
Updated•6 years ago
|
Flags: needinfo?(chutten)
Flags: needinfo?(alessio.placitelli)
Priority: -- → P1
Updated•6 years ago
|
tracking-firefox58:
--- → ?
Comment 1•6 years ago
|
||
To my knowledge no telemetry patches made it into 57.0.3 (which seems the most likely vector for the change that introduced this crash) Frame 3 is where gNameToHistogramIDMap is being populated with string hashes to histogram ids. It is limited by HistogramCount which can be at most std::numeric_limits<uint32_t>::max() (4294967295) but is actually on the order of about 1700[1]. Not sure how that would OOM unless something subtle was at play. [1]: https://searchfox.org/mozilla-central/source/__GENERATED__/toolkit/components/telemetry/TelemetryHistogramEnums.h#1863
Flags: needinfo?(chutten)
Comment 2•6 years ago
|
||
Since the signature isn't too helpful in this case, here's some links: Here's a report of it happening: https://crash-stats.mozilla.com/report/index/7f4cd699-a200-4a18-afc7-5c06b0180103 Here's a different report where we OOM a little later when init'ing TelemetryEvent: https://crash-stats.mozilla.com/report/index/6f2a3528-3d91-44f7-9045-f997e0180105 I'm seeing only Windows crashes for this so far.
Comment 3•6 years ago
|
||
The volume seems to become high on Dec 27: https://crash-stats.mozilla.com/signature/?signature=OOM%20%7C%20small&date=%3E%3D2017-10-08T15%3A55%3A17.000Z&date=%3C2018-01-08T14%3A55%3A17.000Z#graphs However, we deleted crash data from Dec 24 due to another bug. Dec 24 + boxing day(s) should see low usage volume.
Comment 4•6 years ago
|
||
The reports from comment 1 are a bit surprising to me (bp-7f4cd699-a200-4a18-afc7-5c06b0180103 and bp-6f2a3528-3d91-44f7-9045-f997e0180105): - both request tiny allocations, 6 bytes & 23 bytes - both have physical memory available, 2.33GB & 2.66GB
Comment 5•6 years ago
|
||
(In reply to Georg Fritzsche [:gfritzsche] from comment #4) > The reports from comment 1 are a bit surprising to me > (bp-7f4cd699-a200-4a18-afc7-5c06b0180103 and > bp-6f2a3528-3d91-44f7-9045-f997e0180105): > - both request tiny allocations, 6 bytes & 23 bytes > - both have physical memory available, 2.33GB & 2.66GB Is there *any* way that the amount you're requesting can overflow? Because that sounds like what's happening here, especially on 64-bit.
Comment 6•6 years ago
|
||
Doesn't look it? https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/TelemetryHistogram.cpp#1719
Comment 7•6 years ago
|
||
In the two specific cases here i personally don't see it: - hashtable insertion in bp-7f4cd699-a200-4a18-afc7-5c06b0180103: - https://hg.mozilla.org/releases/mozilla-release/annotate/c548334d172b/toolkit/components/telemetry/TelemetryHistogram.cpp#l1719 - https://hg.mozilla.org/releases/mozilla-release/annotate/c548334d172b/xpcom/ds/nsTHashtable.h#l456 - string construction for bp-6f2a3528-3d91-44f7-9045-f997e0180105: - https://hg.mozilla.org/releases/mozilla-release/annotate/c548334d172b/toolkit/components/telemetry/TelemetryEvent.cpp#l702 - https://hg.mozilla.org/releases/mozilla-release/annotate/c548334d172b/xpcom/string/nsTString.h#l82 - https://hg.mozilla.org/releases/mozilla-release/annotate/c548334d172b/xpcom/string/nsTSubstring.cpp#l366
Updated•6 years ago
|
Flags: needinfo?(alessio.placitelli)
Comment 8•6 years ago
|
||
We're out of our depth here. :jimm, do you have any ideas? The tl;dr is that we're seeing an OOM|small startup crash within Telemetry init despite the report saying there's plenty of available mem. It happens in various places we're allocating memory.
Flags: needinfo?(jmathies)
Comment 9•6 years ago
|
||
https://crash-stats.mozilla.com/search/?proto_signature=~TelemetryHistogram%3A%3AInitializeGlobalState&uptime=%3C10&product=Firefox&date=%3E%3D2018-01-10T11%3A55%3A11.000Z&date=%3C2018-01-17T11%3A55%3A11.000Z&_sort=-date&_facets=signature&_facets=version&_facets=total_virtual_memory&_facets=available_virtual_memory&_facets=available_physical_memory&_facets=memory_vsize_max_contiguous&_facets=system_memory_use_percentage&_facets=oom_allocation_size&_facets=memory_heap_allocated&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature * affects 55 and up so this isn't 57 specific * memory parameter facets don't point to anything obvious * affected systems don't appear to be experiencing memory stress. * I'm unable to get virtual memory max contiguous data out of these reports. :/ * uptime generally is less than 10 seconds for these crashes volume for these crashes is really low, currently 95 reports in 57 across 7 days. Eric, any thoughts on what could cause these rather unexpected OOMs here?
Flags: needinfo?(jmathies) → needinfo?(erahm)
Assignee | ||
Comment 10•6 years ago
|
||
The stack is a little weird, I'm assuming we're OOMing in the string allocation [1]. OOMing that small that early is *highly* unusual, it could be a sign of corruption which would go along with my next comment. Some of the other crashes that are in the `nsDependentCString` constructor feel like a corruption issue, they're crashing due to EXCEPTION_ACCESS_VIOLATION_READ, eg: 69bb96ff-49e7-4471-a34a-408310180116 0eb9cd62-4ee2-46c4-bd07-8bf1e0180116 b2ead80b-f587-47cc-836a-bb4b10180116 That would mean either the string table pointer is junk or the offset is junk [2]. One of the addresses had `fee` in it which could mean we're poking memory that was `HeapFree`'d. There are also a few OOM small instances that are quite large when allocating the gigantic backing table [3]: d97b0c06-88ad-4a3c-8d4a-8387a0180115 f2ba8008-665f-466b-8689-683fc0180115 I guess that's not super helpful but there are some improvements that can be made: 1) Split up that ginormous array 2) `gNameToHistogramIDMap` actually contains heap allocated strings, contrary to comments claiming it doesn't. We could switch to `nsDepCharHashKey` instead, but that *could* have perf implications. We could also add a `nsDependentCStringHashKey` that doesn't copy, but I think we have to fix our string classes for that to work. 3) We could add an assert that that address returned for name is w/in gHistogramStringTable & gHistogramStringTable + sizeof( gHistogramStringTable) [1] https://searchfox.org/mozilla-central/rev/48cbb200aa027a0a379b6004b6196a167344b865/xpcom/string/nsTSubstring.cpp#441 [2] https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/toolkit/components/telemetry/TelemetryHistogram.cpp#439 [3] https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/toolkit/components/telemetry/TelemetryHistogram.cpp#1729-1730
Flags: needinfo?(erahm)
Assignee | ||
Comment 11•6 years ago
|
||
This switches from `nsDependentCString` to `nsLiteralCString` when adding histograms to the table. It will help save some memory and should avoid the OOM smalls we have been seeing.
Attachment #8943459 -
Flags: review?(gfritzsche)
Assignee | ||
Updated•6 years ago
|
Assignee: nobody → erahm
Status: NEW → ASSIGNED
Assignee | ||
Comment 12•6 years ago
|
||
This adds a dignostic assertion that the name pointer is actually within the static name table.
Attachment #8943460 -
Flags: review?(gfritzsche)
Assignee | ||
Comment 13•6 years ago
|
||
Suggestions 2 and 3 were pretty easy so I went ahead and took care of them.
Comment 14•6 years ago
|
||
From comment 9 it sounds like the volume is not huge, so I'll call this wontfix for 58.
Comment 15•6 years ago
|
||
Comment on attachment 8943459 [details] [diff] [review] Part 1: Use nsLiteralCString to wrap histogram names Review of attachment 8943459 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for the help Eric. This looks good to me.
Attachment #8943459 -
Flags: review?(gfritzsche) → review+
Comment 16•6 years ago
|
||
Comment on attachment 8943460 [details] [diff] [review] Part 2: Add an assertion to check the histogram name pointer Review of attachment 8943460 [details] [diff] [review]: ----------------------------------------------------------------- Thanks!
Attachment #8943460 -
Flags: review?(gfritzsche) → review+
Comment 17•6 years ago
|
||
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #10) > I guess that's not super helpful but there are some improvements that can be > made: > > 1) Split up that ginormous array Do you think this contiguous array size is problematic? We end up using `ptr-size * 17170` and `ptr-size * 8585` respectively. I can file a follow-up for breaking this down if that will avoid issues.
Flags: needinfo?(erahm)
Assignee | ||
Comment 18•6 years ago
|
||
(In reply to Georg Fritzsche [:gfritzsche] from comment #17) > (In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment > #10) > > I guess that's not super helpful but there are some improvements that can be > > made: > > > > 1) Split up that ginormous array > > Do you think this contiguous array size is problematic? > We end up using `ptr-size * 17170` and `ptr-size * 8585` respectively. > I can file a follow-up for breaking this down if that will avoid issues. Its probably worth trying as a followup, there were a few crashes: bp-d97b0c06-88ad-4a3c-8d4a-8387a0180115 bp-f2ba8008-665f-466b-8689-683fc0180115
Flags: needinfo?(erahm)
Updated•6 years ago
|
Keywords: checkin-needed
Comment 20•6 years ago
|
||
Pushed by erahm@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/13493806c047 Part 1: Use nsLiteralCString to wrap histogram names. r=gfritzsche https://hg.mozilla.org/integration/mozilla-inbound/rev/6931047d462f Part 2: Add an assertion to check the histogram name pointer. r=gfritzsche
Assignee | ||
Updated•6 years ago
|
Flags: needinfo?(erahm)
Comment 21•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/13493806c047 https://hg.mozilla.org/mozilla-central/rev/6931047d462f
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox60:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Comment 22•6 years ago
|
||
Please request Beta approval on this when you get a chance.
Assignee | ||
Comment 23•6 years ago
|
||
Comment on attachment 8943459 [details] [diff] [review] Part 1: Use nsLiteralCString to wrap histogram names Approval Request Comment [Feature/Bug causing the regression]: Possible memory corruption, incorrect wrapping of static strings. [User impact if declined]: Slightly higher memory usage, random crashes. [Is this code covered by automated tests?]: Yes. [Has the fix been verified in Nightly?]: Yes, been on nightly for a few days. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: Part 2 of this patch. [Is the change risky?]: Low risk. [Why is the change risky/not risky?]: This just properly wraps static strings and the next patch introduces a diagnostic assert that will help track down crashes. [String changes made/needed]: N/A
Flags: needinfo?(erahm)
Attachment #8943459 -
Flags: approval-mozilla-beta?
Assignee | ||
Comment 24•6 years ago
|
||
Comment on attachment 8943460 [details] [diff] [review] Part 2: Add an assertion to check the histogram name pointer (See comment 23)
Attachment #8943460 -
Flags: approval-mozilla-beta?
Comment 25•6 years ago
|
||
Comment on attachment 8943459 [details] [diff] [review] Part 1: Use nsLiteralCString to wrap histogram names Since this might help us avoid or diagnose some OOM crashes, let's uplift to beta. This should be in 59 beta 6 by the end of the week.
Attachment #8943459 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 26•6 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/1ced99ea80c9 https://hg.mozilla.org/releases/mozilla-beta/rev/cc295c6c964d
Updated•6 years ago
|
Attachment #8943460 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in
before you can comment on or make changes to this bug.
Description
•