Closed
Bug 1320134
Opened 7 years ago
Closed 6 years ago
Crash in xul.dll@0x4d768 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0...
Categories
(Core :: Audio/Video: Playback, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla55
People
(Reporter: philipp, Assigned: cyu)
References
Details
(Keywords: crash, regression, Whiteboard: [ps-radar])
Crash Data
Attachments
(5 files, 7 obsolete files)
3.43 KB,
patch
|
Details | Diff | Splinter Review | |
2.56 KB,
patch
|
cyu
:
review+
gchang
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
8.51 KB,
patch
|
cyu
:
review+
gchang
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
1.62 KB,
patch
|
cyu
:
review+
gchang
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
11.88 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-ba31674f-32a7-427e-916d-99e2d2161124. ============================================================= Crashing Thread (33) Frame Module Signature Source 0 xul.dll xul.dll@0x4d768 1 xul.dll xul.dll@0x4d739 2 xul.dll xul.dll@0x4d739 3 xul.dll xul.dll@0x4d739 4 xul.dll xul.dll@0x4d739 5 xul.dll xul.dll@0x4d739 6 xul.dll xul.dll@0x4d739 7 xul.dll xul.dll@0x4d739 8 xul.dll xul.dll@0x4d739 9 xul.dll xul.dll@0x4d739 10 xul.dll xul.dll@0x4d739 11 xul.dll xul.dll@0x4d739 12 xul.dll xul.dll@0x4d739 13 xul.dll xul.dll@0x4d739 14 xul.dll xul.dll@0x4d739 15 xul.dll xul.dll@0x4d739 16 xul.dll xul.dll@0x4d739 17 xul.dll xul.dll@0x4d739 18 xul.dll xul.dll@0x4d739 19 xul.dll xul.dll@0x4d739 20 xul.dll xul.dll@0x4d739 21 xul.dll xul.dll@0x4d739 22 xul.dll xul.dll@0x4d739 23 xul.dll xul.dll@0x4d739 24 xul.dll xul.dll@0x4d739 25 xul.dll xul.dll@0x4d739 26 xul.dll xul.dll@0x4d739 27 xul.dll xul.dll@0x4d739 28 xul.dll xul.dll@0x4d739 29 xul.dll xul.dll@0x4d739 30 xul.dll xul.dll@0x4d739 31 xul.dll xul.dll@0x4d739 32 xul.dll xul.dll@0x4d739 33 xul.dll xul.dll@0x4d739 34 xul.dll xul.dll@0x4d739 35 xul.dll xul.dll@0x4d739 36 xul.dll xul.dll@0x4d739 37 xul.dll xul.dll@0x4d739 38 xul.dll xul.dll@0x4d739 39 xul.dll xul.dll@0x4d739 40 xul.dll xul.dll@0x4d739 41 xul.dll xul.dll@0x4d739 42 xul.dll xul.dll@0x4d739 43 xul.dll xul.dll@0x4d739 44 xul.dll xul.dll@0x4d739 45 xul.dll xul.dll@0x4d739 46 xul.dll xul.dll@0x4d739 47 xul.dll xul.dll@0x4d739 48 xul.dll xul.dll@0x4d739 49 xul.dll xul.dll@0x4d739 50 xul.dll xul.dll@0x4d739 51 xul.dll xul.dll@0x4d739 52 xul.dll xul.dll@0x4d739 53 xul.dll xul.dll@0x4d739 54 xul.dll xul.dll@0x4d739 55 xul.dll xul.dll@0x4d739 56 xul.dll xul.dll@0x4d739 57 xul.dll xul.dll@0x4d739 58 xul.dll xul.dll@0x4d739 59 xul.dll xul.dll@0x4d739 60 xul.dll xul.dll@0x4d739 61 xul.dll xul.dll@0x4d739 62 xul.dll xul.dll@0x4d739 63 xul.dll xul.dll@0x4d739 64 xul.dll xul.dll@0x4d739 65 xul.dll xul.dll@0x4d739 66 xul.dll xul.dll@0x4d739 67 xul.dll xul.dll@0x4d739 68 xul.dll xul.dll@0x4d739 69 xul.dll xul.dll@0x4d739 70 xul.dll xul.dll@0x4d739 71 xul.dll xul.dll@0x4d739 72 xul.dll xul.dll@0x4d739 73 xul.dll xul.dll@0x4d739 74 xul.dll xul.dll@0x4d739 75 xul.dll xul.dll@0x4d739 76 xul.dll xul.dll@0x4d739 77 xul.dll xul.dll@0x4d739 78 xul.dll xul.dll@0x4d739 79 xul.dll xul.dll@0x4d739 80 xul.dll xul.dll@0x4d739 81 xul.dll xul.dll@0x4d739 82 xul.dll xul.dll@0x4d739 83 xul.dll xul.dll@0x4d739 84 xul.dll xul.dll@0x4d739 85 xul.dll xul.dll@0x4d739 86 xul.dll xul.dll@0x4d739 87 xul.dll xul.dll@0x4d739 88 xul.dll xul.dll@0x4d739 89 xul.dll xul.dll@0x4d739 187 xul.dll xul.dll@0x4d739 188 xul.dll xul.dll@0x4d739 189 xul.dll xul.dll@0x4d739 190 xul.dll mp4_demuxer::MP4MetadataRust::MP4MetadataRust(mp4_demuxer::Stream*) media/libstagefright/binding/MP4Metadata.cpp:589 191 xul.dll mp4_demuxer::MP4Metadata::MP4Metadata(mp4_demuxer::Stream*) media/libstagefright/binding/MP4Metadata.cpp:157 192 xul.dll mozilla::MP4Demuxer::Init() dom/media/fmp4/MP4Demuxer.cpp:140 193 xul.dll mozilla::TrackBuffersManager::InitializationSegmentReceived() dom/media/mediasource/TrackBuffersManager.cpp:901 194 nss3.dll nss3.dll@0x490f 195 nss3.dll PR_GetThreadPrivate nsprpub/pr/src/threads/prtpd.c:204 196 xul.dll nsThreadManager::GetCurrentThread() xpcom/threads/nsThreadManager.cpp:222 this signature is the #5 crash in the content process in 51.0b so far (around 2.35% of all content crashes) and occurring on win7 & 8.1.
Reporter | ||
Comment 1•7 years ago
|
||
Correlations for Firefox Beta (100.0% in signature vs 00.39% overall) reason = EXCEPTION_STACK_OVERFLOW (98.78% in signature vs 22.53% overall) Module "msmpeg2adec.dll" = true (98.78% in signature vs 22.93% overall) Module "slc.dll" = true (89.02% in signature vs 13.87% overall) Module "sqmapi.dll" = true (99.09% in signature vs 26.67% overall) Module "ksuser.dll" = true (98.78% in signature vs 27.15% overall) Module "atl.dll" = true (98.78% in signature vs 28.85% overall) Module "RpcRtRemote.dll" = true (100.0% in signature vs 31.10% overall) Module "wevtapi.dll" = true (100.0% in signature vs 31.76% overall) Module "msmpeg2vdec.dll" = true (100.0% in signature vs 32.21% overall) Module "mf.dll" = true (100.0% in signature vs 32.22% overall) Module "evr.dll" = true (100.0% in signature vs 32.75% overall) Module "Wpc.dll" = true (98.78% in signature vs 30.57% overall) Module "WSHTCPIP.DLL" = true (99.70% in signature vs 32.07% overall) Module "samcli.dll" = true (100.0% in signature vs 32.85% overall) Module "dxva2.dll" = true (100.0% in signature vs 32.86% overall) Module "mfplat.dll" = true (100.0% in signature vs 33.08% overall) Module "avrt.dll" = true (98.78% in signature vs 31.92% overall) Module "samlib.dll" = true (100.0% in signature vs 34.65% overall) Module "mozavcodec.dll" = true (100.0% in signature vs 34.66% overall) Module "mozavutil.dll" = true (98.17% in signature vs 32.03% overall) Module "netutils.dll" = true (98.78% in signature vs 33.44% overall) Module "dhcpcsvc.dll" = true (100.0% in signature vs 37.86% overall) dom_ipc_enabled = 1 (100.0% in signature vs 39.95% overall) Module "pnrpnsp.dll" = true (100.0% in signature vs 40.02% overall) Module "NapiNSP.dll" = true (05.18% in signature vs 67.33% overall) submitted_from_infobar = true (100.0% in signature vs 40.05% overall) Module "nlaapi.dll" = true (100.0% in signature vs 42.77% overall) Module "bcrypt.dll" = true (100.0% in signature vs 43.19% overall) Module "cryptsp.dll" = true (100.0% in signature vs 46.33% overall) Module "rsaenh.dll" = true (100.0% in signature vs 47.37% overall) Module "browsercomps.dll" = true (100.0% in signature vs 47.73% overall) Module "winrnr.dll" = true (100.0% in signature vs 48.20% overall) startup_crash = null (90.24% in signature vs 36.44% overall) Module "icm32.dll" = true (100.0% in signature vs 49.72% overall) Module "firefox.exe" = true (100.0% in signature vs 51.26% overall) Module "dnsapi.dll" = true (65.85% in signature vs 18.91% overall) os_arch = x86 (98.78% in signature vs 57.59% overall) Module "Wldap32.dll" = true (64.63% in signature vs 19.97% overall) platform_version = 6.1.7600 (80.49% in signature vs 37.01% overall) Module "softokn3.dll" = true (80.49% in signature vs 37.39% overall) Module "freebl3.dll" = true (98.78% in signature vs 59.58% overall) platform_pretty_version = Windows 7 (34.76% in signature vs 00.08% overall) adapter_device_id = 0x053b
Updated•7 years ago
|
Component: General → Audio/Video
Comment 2•7 years ago
|
||
Manish, should we duplicate this one to bug 1318943 ?
Flags: needinfo?(manishearth)
Comment 3•7 years ago
|
||
Ralph, Could you help have a look?
Component: Audio/Video → Audio/Video: Playback
Flags: needinfo?(giles)
Comment 4•7 years ago
|
||
(In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #2) > Manish, should we duplicate this one to bug 1318943 ? Nah, this is an mp4 crash, that was a rust-url crash (in non-shared code)
Flags: needinfo?(manishearth)
Comment 5•7 years ago
|
||
On beta, MP4Metadata.cpp:589 is `mRustParser.reset(mp4parse_new(&io))` in the MP4MetadataRust ctor. Since it's initial construction, it's presumedly the mp4parse_new() call which is failing rather than the mp4parse_free() invoked by UniquePtr::reset when replacing an established pointer. `io` is a local struct, so it should be valid. The rest of the stack could be the rust panic loop we've seen before. So what failed in mp4parse_new? This is a constructor as well; it just does (small) allocations and initialization. Is there a correlation with very low or fragmented memory?
Flags: needinfo?(giles)
Comment 6•7 years ago
|
||
I bet this is the same thing as bug 1305315, just presenting differently (the stack frames are missing symbols for some reason). The pseudo stacktrace I generated for a crash from that bug looked like: https://gist.github.com/luser/650057bfc2ca2f88ea5a703fca0c77bb There's a `xul.dll!mp4parse_new [local.rs:91faf7ec36cd : 208 + 0x14]` frame in there, calling into some std library code, which fails in initialization and then panics, and then fails during panic, etc.
Updated•7 years ago
|
Crash Signature: ...] → ...]
[@ xul.dll@0x4d7e8 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4…
Comment 8•7 years ago
|
||
(In reply to Marcia Knous [:marcia - use ni] from comment #7) > from Triage team: ni on marcia to follow up on this Marco and I tried to reprocess this signature, but unfortunately the signature hasn't changed - so I guess we are still missing symbols here.
Flags: needinfo?
Updated•7 years ago
|
Crash Signature: xul.dll@0x4d7b9 | ...] → xul.dll@0x4d7b9 | ...]
[@ xul.dll@0x4d778 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4…
Comment 9•7 years ago
|
||
I believe this bug should be flag as P1 because of the majority of crashes revealed after 51.0b8 is out. We might need either a clear STR or go over all uplifted FF51 patches regarding to demuxer to keep the investigation running. Blake, thoughts ?
Flags: needinfo?(bwu)
Priority: -- → P1
Comment 10•7 years ago
|
||
Yeah. I think the same way. Alfredo, Per discussing today, before Ralph comes back next week can you have a look to see if there is anything we can help?
Flags: needinfo?(bwu) → needinfo?(ayang)
Comment 11•7 years ago
|
||
(In reply to Blake Wu [:bwu][:blakewu] from comment #10) > Yeah. I think the same way. > Alfredo, > Per discussing today, before Ralph comes back next week can you have a look > to see if there is anything we can help? It crashes at MP4Metadata trying to allocate the rust demuxer. As we talked before, it could be something leaked or corrupted the memory. I believe rust code is the victim in this case. I agree with comment 6, this is the same one as bug 1305315.
Flags: needinfo?(ayang)
Comment 12•7 years ago
|
||
[Tracking Requested - why for this release]: high volume crash on Firefox 51 with Window 7.
tracking-firefox51:
--- → ?
Updated•7 years ago
|
Crash Signature: ...]
[@ xul.dll@0x4d7e8 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4… → ...]
[@ xul.dll@0x4d7f8 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4…
Comment 13•7 years ago
|
||
Note that the crash has always been there in 51 Beta, it didn't regress in Beta 8.
Comment 14•7 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #6) > I bet this is the same thing as bug 1305315, just presenting differently Ted, if this is the case, what's your suggestion to fix this issue ?
Flags: needinfo?(ted)
See Also: → 1305315
Updated•7 years ago
|
Comment 15•7 years ago
|
||
(In reply to Astley Chen [:astley] UTC+8 from comment #14) > Ted, if this is the case, what's your suggestion to fix this issue ? I ran out of ideas in that bug, so I don't have anything better to offer here, sadly.
Flags: needinfo?(ted)
Comment 17•6 years ago
|
||
Is this crash still happening? If so, how can we make it actionable?
Flags: needinfo?(madperson)
Flags: needinfo?(giles)
Reporter | ||
Comment 18•6 years ago
|
||
it's a bit difficult to answer, since super search doesn't play nice with these kinds of crashes unfortunately. they are clearly still around in 52.0b and make up ~2% of all content crashes, but on aurora & nightly i'm not sure... on 52.0b1 these crash stacks look like described in comment #0: https://crash-stats.mozilla.com/report/index/f0affd59-140d-4083-b054-ff7c72170209 afterwards it seems to get a little bit different: https://crash-stats.mozilla.com/report/index/c3fed71f-7afb-4bc8-b34c-c7a5d2170209
Flags: needinfo?(madperson)
Reporter | ||
Updated•6 years ago
|
Crash Signature: xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | ...] → xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | ...]
[@ xul.dll@0x499ee | xul.dll@0x499c4 | xul.dll@0x499c4 | xul.dll@0x499c4 | xul.dll@0x499c4 | xul.dll@0x4…
Comment 19•6 years ago
|
||
I still have no insight. Cervantes, what do you think about pursuing your TlsAlloc diagnostic idea from https://bugzilla.mozilla.org/show_bug.cgi?id=1305315#c24 on this bug?
Flags: needinfo?(giles) → needinfo?(cyu)
Assignee | ||
Comment 20•6 years ago
|
||
I can write the diagnostic patch.
Assignee: nobody → cyu
Flags: needinfo?(cyu)
Assignee | ||
Comment 21•6 years ago
|
||
This tracks allocation/free of TLS slots on Windows by interposing TlsAlloc()/TlsFree() calls. When observed TLS slot allocations has reached a high watermark, we start recording the stacks of TlsAlloc() calls (and forget the stack if it's TlsFree()'d). When we observe that TlsAlloc() returns TLS_OUT_OF_INDEXES, we serialize the recorded data to a string ready for crash annotation.
Attachment #8839094 -
Flags: review?(nfroyd)
Assignee | ||
Comment 22•6 years ago
|
||
This enables TlsAllocationTracker in the child process, inside which the crash for this bug is observed.
Assignee | ||
Updated•6 years ago
|
Attachment #8839095 -
Flags: review?(nfroyd)
Assignee | ||
Comment 23•6 years ago
|
||
This adds an annotation "TlsAllocations" to the crash report on TLS slots running out.
Attachment #8839100 -
Flags: review?(ted)
Assignee | ||
Comment 24•6 years ago
|
||
Patch to test TlsAllocationTracker. Not for review. #define TEST_TLS_TRACKER in TlsAllocationTracker.h and then the content process will record the first 10 TlsAlloc() stacks. Then crashinject.exe the content process, you'll see the annotation like the following. Then we can decode the stacks offline to see why we run out of TLS slots. TlsAllocations=[0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf70a6ae1,0x7ffaf8dd63b0,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf70abd0b,0x7ffaf70af76b,0x7ffaf70b9df3,0x7ffaf8dd644d,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf8dd6718,0x7ffaf70caf6e,0x7ffaf8034b87,0x7ffaf8dd64c9,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf8dd6718,0x7ffaf70caf6e,0x7ffaf8034b87,0x7ffaf8dd64c9,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf6da67ab,0x7ffaf8dd6718,0x7ffaf70caf6e,0x7ffaf8034b87,0x7ffaf8dd64c9,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf8dd6718,0x7ffaf70caf6e,0x7ffaf8034b87,0x7ffaf8dd64c9,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf72b58bf,0x7ffaf72c76d9,0x7ffaf72ca9b6,0x7ffaf72d6568,0x7ffaf72c7ca9,0x7ffaf8583a1d,0x7ffaf6d7d57e,0x7ffaf6d7c4a6,0x7ffaf6d7bcfe,0x7ffaf6d7f995,0x7ffaf6d7f6b5,0x7ffaf6d3fd23,0x7ffaf6d8000e,0x7ffaf6d7db40,0x7ffaf6d810e0,0x7ffaf6d7e545,0x7ffaf6d7db90,0x7ffaf6d81096,0x7ffaf6d7e545,0x7ffaf6d7ebca,0x7ffaf6d7d10a,0x7ffaf6da67e6,0x7ffaf8dd6718,0x7ffaf70caf6e,0x7ffaf8034b87,0x7ffaf8dd64c9,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffb3fbb48ac,0x7ffb43e3a35f,0x7ffb43e1771a,0x7ffb43e17567,0x7ffb43e1d33d,0x7ffb43e37099,0x7ffb43e36add,0x7ffb43e19efc,0x7ffb35172b32,0x7ffb40abcd7f,0x7ffb41e1462a,0x7ffb41e1322a,0x7ffb41e1acf4,0x7ffb41e1a4af,0x7ffb0c787a55,0x7ffb0c78370b,0x7ffaf6dd4b0c,0x7ffaf6dcab69,0x7ffaf6e1a2f9,0x7ffaf6d90ecb,0x7ffaf6d9051f,0x7ffaf70c4122,0x7ffaf70aba9f,0x7ffaf70ab86e,0x7ffaf6d938e6,0x7ffb0c79512e,0x7ffb0c7882da,0x7ffb4119cab0,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf8586488,0x7ffaf8583a2c,0x7ffaf6d7d57e,0x7ffaf6d7c4a6,0x7ffaf6d7bcfe,0x7ffaf6d7f995,0x7ffaf6d7f6b5,0x7ffaf6d3fd23,0x7ffaf6d8000e,0x7ffaf6d7db40,0x7ffaf6d810e0,0x7ffaf6d7e545,0x7ffaf6d7db90,0x7ffaf6d81096,0x7ffaf6d7e545,0x7ffaf6d7ebca,0x7ffaf6d7d10a,0x7ffaf6da67e6,0x7ffaf8dd6718,0x7ffaf70caf6e,0x7ffaf8034b87,0x7ffaf8dd64c9,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf8dd6718,0x7ffaf70caf6e,0x7ffaf8034b87,0x7ffaf8dd64c9,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],
Assignee | ||
Comment 25•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6873be9d761b
![]() |
||
Comment 26•6 years ago
|
||
Comment on attachment 8839094 [details] [diff] [review] Part 1 - Tracking TLS allocations for diagnosing TLS slots run out on Windows. Review of attachment 8839094 [details] [diff] [review]: ----------------------------------------------------------------- Lots of little comments. Are we hoping that the last ~40 allocations are going to be the problematic ones, rather than something allocating several hundred TLS indices before we even start recording stacks? ::: xpcom/build/TlsAllocationTracker.cpp @@ +24,5 @@ > +namespace { > + > +using ::mozilla::Move; > +using ::mozilla::UniquePtr; > +using ::mozilla::Atomic; Are these really necessary, since we're already inside the mozilla:: namespace at this point? @@ +30,5 @@ > +std::string sTlsAllocationStacks; > + > +// Start recording TlsAlloc() call stacks when we observed kStartTrackingTlsAt > +// allocations. > +const uint32_t kStartTrackingTlsAt = 1040; Maybe add another sentence describing why we chose the particular value here: it's close to the maximum permitted number of allocated TLS slots. @@ +33,5 @@ > +// allocations. > +const uint32_t kStartTrackingTlsAt = 1040; > + > +using stack_t = std::vector<uintptr_t>; > +using stacks_t = std::map<DWORD, stack_t>; I would strongly prefer using XPCOM datastructures here and for sTlsAllocationStacks. @@ +34,5 @@ > +const uint32_t kStartTrackingTlsAt = 1040; > + > +using stack_t = std::vector<uintptr_t>; > +using stacks_t = std::map<DWORD, stack_t>; > +static UniquePtr<stacks_t> sRecentTlsAllocationStacks; Can we use StaticAutoPtr here to avoid the static constructor? @@ +41,5 @@ > +static CRITICAL_SECTION sMutex; > +static Atomic<uint32_t> sCurrentTlsSlots{0}; > + > +// Windows DLL interceptor > +static WindowsDllInterceptor sKernel32DllInterceptor{}; WDYT about moving this object into InitTlsAllocationTracker to avoid the static constructor? @@ +52,5 @@ > +class CriticalSectionAutoEnter > +{ > +public: > + explicit CriticalSectionAutoEnter(CRITICAL_SECTION* aCriticalSection) : > + mCriticalSection(aCriticalSection) The indentation for these methods seems a bit deeper than the two-space indents we normally use. I think it's also the style to format the initialization list like so: explicit CriticalSectionAutoEnter(...) : mCriticalSection(...) { ... @@ +81,5 @@ > + > + stack_t rawStack; > + auto callback = [](uint32_t, void* aPC, void*, void* aClosure) { > + std::vector<uintptr_t>* stack = > + static_cast<std::vector<uintptr_t>*>(aClosure); How about |auto* stack = static_cast<stack_t*>(aClosure);|? @@ +85,5 @@ > + static_cast<std::vector<uintptr_t>*>(aClosure); > + stack->push_back(reinterpret_cast<uintptr_t>(aPC)); > + }; > + MozStackWalk(callback, /* skip 2 frames */ 2, > + /* maxFrames */ 0, reinterpret_cast<void*>(&rawStack), 0, nullptr); You shouldn't need to reinterpret_cast to void*; static_cast should be fine. @@ +91,5 @@ > + CriticalSectionAutoEnter lock(&sMutex); > + if (!sRecentTlsAllocationStacks) { > + return; > + } > + sRecentTlsAllocationStacks->emplace(aTlsIndex, rawStack); Can we Move(rawStack) instead of copying it here? @@ +129,5 @@ > + const stack_t& stack = iter->second; > + output << "["; > + > + for (auto it = stack.begin(); it != stack.end(); ++it) { > + output << "0x" << *it << ","; Is it worth making this output valid JSON so it's easy to parse? @@ +137,5 @@ > + > + sTlsAllocationStacks = Move(output.str()); > +} > + > +DWORD NTAPI This should be WINAPI, for consistency with InterposedTlsFree and the Windows documentation. @@ +172,5 @@ > + if (sInitialized) { > + return; > + } > + > + InitializeCriticalSection(&sMutex); I think we need to address https://dxr.mozilla.org/mozilla-central/source/mozglue/misc/Mutex_windows.cpp#19 here as well, whether by using InitializeCriticalSectionEx or just using mozilla::detail::MutexImpl directly.
Attachment #8839094 -
Flags: review?(nfroyd) → feedback+
![]() |
||
Comment 27•6 years ago
|
||
Comment on attachment 8839095 [details] [diff] [review] Part 2 - Initialize the TLS allocation tracker for child process. Review of attachment 8839095 [details] [diff] [review]: ----------------------------------------------------------------- r=me assuming the answer to the question below is reasonable. ::: toolkit/xre/nsEmbedFunctions.cpp @@ +357,5 @@ > #endif > > #if defined(XP_WIN) > + // XXX Bug 1320134: added for diagnosing the crash because of TLS index run > + // out on Windows. Remove after the root cause is found. Nit: "...the crashes because we're running out of TLS indices on Windows. Remove after..." @@ +358,5 @@ > > #if defined(XP_WIN) > + // XXX Bug 1320134: added for diagnosing the crash because of TLS index run > + // out on Windows. Remove after the root cause is found. > + mozilla::InitTlsAllocationTracker(); Have we allocated any TLS indices prior to this point? Otherwise we might not start recording TLS index allocation stacks until it's too late...
Attachment #8839095 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 28•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=61a3b6c6a77d
Comment 29•6 years ago
|
||
Comment on attachment 8839100 [details] [diff] [review] Part 3 - Annotate the crash report with TLS allocation stacks on running out of TLS slots. Review of attachment 8839100 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/crashreporter/nsExceptionHandler.cpp @@ +1423,5 @@ > } > + > +#ifdef XP_WIN > + const std::string& tlsAllocations = mozilla::GetTlsAllocationStacks(); > + if (tlsAllocations.length()) { Nit: I would have used `if (!tlsAllocations.empty())`
Attachment #8839100 -
Flags: review?(ted) → review+
Assignee | ||
Comment 30•6 years ago
|
||
Changes to V1: - Lower the value of kStartTrackingTlsAt to 950. There are a couple of allocations before we start tracking TLS allocations. The values is chosen so that we don't add a very long annotation to the .extra file. For out of TLS indexes, we should catch something meaningful in the most recent 100 allocations that we record. If a long annotation to the .extra file is a nonissue, we can choose an even lower value like 800 or 700, as long as it doesn't impact normal users. - Output the annotation in JSON format. - Remove constructions in the global scope - Use StaticMutex. - Use XPCOM strings and data structures.
Attachment #8839094 -
Attachment is obsolete: true
Attachment #8840830 -
Flags: review?(nfroyd)
Assignee | ||
Comment 31•6 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #27) > Comment on attachment 8839095 [details] [diff] [review] > Part 2 - Initialize the TLS allocation tracker for child process. > > Review of attachment 8839095 [details] [diff] [review]: > ----------------------------------------------------------------- > > r=me assuming the answer to the question below is reasonable. > > @@ +358,5 @@ > > > > #if defined(XP_WIN) > > + // XXX Bug 1320134: added for diagnosing the crash because of TLS index run > > + // out on Windows. Remove after the root cause is found. > > + mozilla::InitTlsAllocationTracker(); > > Have we allocated any TLS indices prior to this point? Otherwise we might > not start recording TLS index allocation stacks until it's too late... You are right, in my test I can only observe ~1034 TlsAlloc() calls, which means about 50 TlsAlloc() calls are already made before we initialize the tracker. I lower the value of kStartTrackingTlsAt in part 1 to 950. We can lower the value if we want to have more stacks in the annotation.
Assignee | ||
Comment 32•6 years ago
|
||
Updated the comment.
Attachment #8839095 -
Attachment is obsolete: true
Attachment #8840838 -
Flags: review+
Assignee | ||
Comment 33•6 years ago
|
||
Updated for the interface change in part 1: use const char* return type instead of const std::string& because part1 uses nsCString internally.
Attachment #8839100 -
Attachment is obsolete: true
Attachment #8840840 -
Flags: review+
Assignee | ||
Comment 34•6 years ago
|
||
The JSON annotation now looks like: TlsAllocations={[1941097036, 1941096875, 1999824270, 1999638086, 1999547132, 1999546779, 1999572918, 1999635099, 1999621524, 1999545935, 1866344146, 1946064184, 1957586952, 1957642032, 1957677238, 1836443619, 1836428643, 1836428079, 1522326734, 1522294004, 1522548268, 1522111547, 1522109636, 1524802161, 1524729866, 1524729356, 1522119717, 1836489082, 1836445497, 1996084719, 1983996612, 1999769561, 1999769508], [1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1548020693, 1524821998, 1536856748, 1548020225, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1524821998, 1536856748, 1548020225, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1524730291, 1524741301, 1524772615, 1548017948, 1548020168, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1526424988, 1522046144, 1522057842, 1522057381, 1521850734, 1522059320, 1522052576, 1522063141, 1522046643, 1522052658, 1522063090, 1522046643, 1522054288, 1522055271, 1522050528, 1522174494, 1548020693, 1524821998, 1536856748, 1548020225, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1548020693, 1524821998, 1536856748, 1548020225, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1522046144, 1522057842, 1522057381, 1521850734, 1522059320, 1522052576, 1522063141, 1522046643, 1522052658, 1522063090, 1522046643, 1522054288, 1522055271, 1522050528, 1522174494, 1548020693, 1524821998, 1536856748, 1548020225, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1548020015, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1548020693, 1524821998, 1536856748, 1548020225, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1524821998, 1536856748, 1548020225, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508]}
Comment 35•6 years ago
|
||
We could make this more compact at the expense of making it more of a pain to work with in Socorro by writing it out as part of the minidump, so it'd just be `N * sizeof(void*)` bytes extra in the minidump. That would mean we'd have to extract it from the minidump in the stackwalker to do anything useful with it, though.
Comment 36•6 years ago
|
||
Oh, I missed the fact that we're capturing full stacks here. That'd make it larger still. Do you have a plan for getting useful data out of this annotation? It strikes me that it might actually be better to put this in the minidump so we could symbolicate it with the symbols we already have during minidump processing.
![]() |
||
Comment 37•6 years ago
|
||
Comment on attachment 8840830 [details] [diff] [review] Part 1 - Tracking TLS allocations for diagnosing TLS slots run out on Windows. (v2) Review of attachment 8840830 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for the changes. This is fine as far as it goes, but we should have some discussion and consensus about the best way to send this data along for easy processing.
Attachment #8840830 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 38•6 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #36) > Oh, I missed the fact that we're capturing full stacks here. That'd make it > larger still. > > Do you have a plan for getting useful data out of this annotation? It At the very least, we can load the minidump in Visual Studio debugger and symbolicate manually in it. Or we can synthesize a request to http://symbolapi.mozilla.org as the gecko profiler addon does. > strikes me that it might actually be better to put this in the minidump so > we could symbolicate it with the symbols we already have during minidump > processing. I don't get it. Are you saying that we add more stacks to the minidump using the TLS allocation stacks we captured and make socorro decode the stacks for us? I am not sure if we can do this on Windows.
Assignee | ||
Comment 39•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b5b5a3f59b3b
Assignee | ||
Comment 40•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7e4b575d59d9
Comment 41•6 years ago
|
||
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #38) > I don't get it. Are you saying that we add more stacks to the minidump using > the TLS allocation stacks we captured and make socorro decode the stacks for > us? I am not sure if we can do this on Windows. Sort of--they wouldn't be stacks the way the other thread stacks are in the minidump, we could just add additional streams to the minidump that were arrays of instruction pointers, and then use the already-loaded symbols in the stackwalker to turn them into function names+source locations and put that in the stackwalker output. It'd be a bit more work though, so I think you should just land what you have and see what the data looks like. If it turns out to be difficult to work with we can revisit this.
Assignee | ||
Comment 42•6 years ago
|
||
Try bustage: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d29182af4a8f8a7f3ef7428b7c70a4a94b79e8ef&selectedJob=81105147 WindowsDllInterceptor fails to patch the function because it encounters unknown instruction byte 0xeb (Jmp rel8), both on x86 and x64, in patching TlsFree(), but local Windows build works fine.
Assignee | ||
Comment 43•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=19a80fbba7e6
If bug 1345856 is a duplicate of this issue, this is still a high volume crash on 53 beta 1.
tracking-firefox52:
--- → +
tracking-firefox53:
--- → +
Assignee | ||
Comment 45•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=baf8941f947f
Assignee | ||
Comment 46•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dfc9502528b2
Updated•6 years ago
|
Whiteboard: [ps-radar]
Assignee | ||
Comment 47•6 years ago
|
||
Carryover r+ with the following changes/fixes: - Interface for shutting down the TLS tracker - Synchronization in GetTlsAllocationStacks() It looks like that the span of mutex lock in this function is not enough and needs to extend to nsExceptionHandler after it's written out to the crash annotation file, but the string annotation, sTlsAllocationStacks, will only be written by the first thread that observes TLS_OUT_OF_INDEXES. - Don't touch the wrapper functionality after shutdown.
Attachment #8840830 -
Attachment is obsolete: true
Attachment #8846571 -
Flags: review+
Assignee | ||
Comment 48•6 years ago
|
||
Carryover r+ with the following changes: - Shutdown before exiting the main function. - Only enable in non-debug build. This is to avoid the hard-to-debug crashes in try pushes because kernel32.dll!TlsFree() starts with an unknow instruction byte.
Attachment #8840838 -
Attachment is obsolete: true
Attachment #8846580 -
Flags: review+
Assignee | ||
Comment 49•6 years ago
|
||
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #46) > https://treeherder.mozilla.org/#/jobs?repo=try&revision=dfc9502528b2 Tests on Windows are green. I'll land with the above updates.
Assignee | ||
Comment 50•6 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/5fd991c5f12076490a52a2c0232af3275e1bae12 Bug 1320134 - Part 1: Tracking TLS allocations for diagnosing out-of-TLS-slots crashes on Windows. r=froydnj https://hg.mozilla.org/integration/mozilla-inbound/rev/e91e04dadeb3d8950dc7c3327fde93e3ec26045a Bug 1320134 - Part 2: Initialize and shutdown the TLS allocation tracker in the content process. r=froydnj https://hg.mozilla.org/integration/mozilla-inbound/rev/9ba65a580135e445066222199eec1fc4abb5a57c Bug 1320134 - Part 3: Annotate the crash report with TLS allocation stacks on running out of TLS slots. r=ted
Comment 51•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/5fd991c5f120 https://hg.mozilla.org/mozilla-central/rev/e91e04dadeb3 https://hg.mozilla.org/mozilla-central/rev/9ba65a580135
Status: NEW → RESOLVED
Closed: 6 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Updated•6 years ago
|
Cervantes, do you want to request uplift for this fix? If we can confirm that it's useful, and you don't think it's too risky of a change, I'd like to get it into beta 3.
Flags: needinfo?(cyu)
Comment 53•6 years ago
|
||
Note that the dependency on bug 1347130 was from Alice White seeing this fingered by mozregression as the source, so we may want to hold for a few days on uplift. Note that bug 1347130 seems to cause 'odd' mozregression results, with people getting different (and unlikely) results; perhaps due to something trashing memory. Perhaps an ASAN linux build could catch it, but since the crash is in GL code probably not.
Assignee | ||
Comment 54•6 years ago
|
||
Liz, the landed patches are *not* a fix. They are intended to be shipped to users for diagnosing the root cause of the crash.(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #52) > Cervantes, do you want to request uplift for this fix? If we can confirm > that it's useful, and you don't think it's too risky of a change, I'd like > to get it into beta 3. (In reply to Randell Jesup [:jesup] from comment #53) > Note that the dependency on bug 1347130 was from Alice White seeing this > fingered by mozregression as the source, so we may want to hold for a few > days on uplift. Note that bug 1347130 seems to cause 'odd' mozregression > results, with people getting different (and unlikely) results; perhaps due > to something trashing memory. > > Perhaps an ASAN linux build could catch it, but since the crash is in GL > code probably not. I can confirm that this bug causes the crash in bug 1347130. ASAN won't catch it because this is only enabled on Windows builds. It performs DLL interception to kernel32.dll. I'll backout the patches reland with the crash fiexed.
Flags: needinfo?(cyu)
Assignee | ||
Comment 55•6 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/7012620c3aee5e36beb7156ec3620de897b03c1f Backed out changeset 5fd991c5f120 (bug 1320134) for crashing webgl on Windows. r=backout https://hg.mozilla.org/integration/mozilla-inbound/rev/5389367569cc29a30a1631dda01606f1981d3bfc Backed out changeset e91e04dadeb3 (bug 1320134) for crashing webgl on Windows. r=backout https://hg.mozilla.org/integration/mozilla-inbound/rev/ef6728e3e1364596180a25268d705c52e03eb4f3 Backed out changeset 9ba65a580135 (bug 1320134) for crashing webgl on Windows. r=backout
Comment hidden (obsolete) |
Assignee | ||
Comment 57•6 years ago
|
||
Scratch comment #56. It's wrong calling convention. The function pointer prototypes miss the WINAPI calling convention and then mess up the stack after the call to gOriginalTlsFree(aTlsIndex). The interceptor of TlsAlloc() doesn't have this crash because it doesn't have parameter. After adding WINAPI to the function pointer type aliases, the crash doesn't show up locally. I'll test on try and reland after the tests are green.
Comment 59•6 years ago
|
||
backout bugherder |
also backed out from m-c https://hg.mozilla.org/mozilla-central/rev/7012620c3aee
Assignee | ||
Comment 60•6 years ago
|
||
I added the calling convention fix and repushed, but the server still have problems with us patching kernel32!TlsAlloc(): https://treeherder.mozilla.org/#/jobs?repo=try&revision=0a5860e4271dcc0b6b18b3d68e12fd43164fbf57 Another push to dump the instruction bytes on interception failure: https://treeherder.mozilla.org/#/jobs?repo=try&revision=85236a9fd4999afb2a5fe17821f42595068db040 My current plan is: let's see if the failure in patching TlsAlloc() can be fixed quickly. If not then I'll fix bug 1347130 and only enable on release build.
Updated•6 years ago
|
status-firefox54:
--- → affected
status-firefox-esr45:
--- → unaffected
status-firefox-esr52:
--- → affected
Assignee | ||
Comment 61•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=00bd544387de
Assignee | ||
Comment 62•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fcec4dc410e3
Assignee | ||
Comment 63•6 years ago
|
||
Filed bug 1348747 for the crash we saw in intercepting TlsAlloc() and TlsFree().
Depends on: 1348747
Assignee | ||
Comment 64•6 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/f760842b14a25c8de500472ead62fcf50ee36c0b Bug 1320134 - Part 1: Tracking TLS allocations for diagnosing out-of-TLS-slots crashes on Windows. r=froydnj https://hg.mozilla.org/integration/mozilla-inbound/rev/051b765ca8f2a21093ebdd9161351af2b4ccd7ab Bug 1320134 - Part 2: Initialize and shutdown the TLS allocation tracker in the content process. r=froydnj https://hg.mozilla.org/integration/mozilla-inbound/rev/01125b5142e587fb8d8bd40c8dbaf6aaef71a9f9 Bug 1320134 - Part 3: Annotate the crash report with TLS allocation stacks on running out of TLS slots. r=ted
Reporter | ||
Updated•6 years ago
|
Crash Signature: [@ xul.dll@0x4d768 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | … → [@ std::sys::imp::thread_local::create ]
[@ RtlFindClearBits | RtlFindClearBitsAndSet | TlsAlloc ]
[@ xul.dll@0x4d768 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll…
Comment 66•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f760842b14a2 https://hg.mozilla.org/mozilla-central/rev/051b765ca8f2 https://hg.mozilla.org/mozilla-central/rev/01125b5142e5
Assignee | ||
Comment 67•6 years ago
|
||
Attachment #8846571 -
Attachment is obsolete: true
Attachment #8851452 -
Flags: review+
Assignee | ||
Updated•6 years ago
|
Attachment #8851452 -
Attachment description: 8846571: Part 1 - Tracking TLS allocations for diagnosing out-of-TLS-slots crashes on Windows → Part 1 - Tracking TLS allocations for diagnosing out-of-TLS-slots crashes on Windows
Assignee | ||
Comment 68•6 years ago
|
||
Attachment #8840840 -
Attachment is obsolete: true
Attachment #8851453 -
Flags: review+
Assignee | ||
Comment 69•6 years ago
|
||
Comment on attachment 8851452 [details] [diff] [review] Part 1 - Tracking TLS allocations for diagnosing out-of-TLS-slots crashes on Windows Approval Request Comment [Feature/Bug causing the regression]: No bug, or the bug resulting in this crash is still unknown. [User impact if declined]: We won't be able to fix this crash. This is a debug patch for finding the root cause of this crash. We need to ship this patch to our users for catching information about the crash and really fix this crash. [Is this code covered by automated tests?]: No. This is hard to automate because running out of system resource, like TLS in this bug, make the running process very unstable and the behavior is nondeterministic. The code is tested locally. [Has the fix been verified in Nightly?]: Yes. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: The change is intrusive and risky to some extent because it hooks some Windows system calls. However, most users, this patch is not risky because most users don't run out of TLS, and what this patch does is merely increment or decrement of an internal counter. [Why is the change risky/not risky?]: [String changes made/needed]: None.
Attachment #8851452 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 70•6 years ago
|
||
Comment on attachment 8846580 [details] [diff] [review] Part 2 - Initialize/shutdown the TLS allocation tracker in the content process. Approval Request Comment [Feature/Bug causing the regression]: No bug, or the bug resulting in this crash is still unknown. [User impact if declined]: We won't be able to fix this crash. This is a debug patch for finding the root cause of this crash. We need to ship this patch to our users for catching information about the crash and really fix this crash. [Is this code covered by automated tests?]: No. This is hard to automate because running out of system resource, like TLS in this bug, make the running process very unstable and the behavior is nondeterministic. The code is tested locally. [Has the fix been verified in Nightly?]: Yes. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: The change is intrusive and risky to some extent because it hooks some Windows system calls. However, most users, this patch is not risky because most users don't run out of TLS, and what this patch does is merely increment or decrement of an internal counter. [Why is the change risky/not risky?]: [String changes made/needed]: None.
Attachment #8846580 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 71•6 years ago
|
||
Comment on attachment 8851453 [details] [diff] [review] Part 3 - Annotate the crash report with TLS allocation stacks on running out of TLS slots. Approval Request Comment [Feature/Bug causing the regression]: No bug, or the bug resulting in this crash is still unknown. [User impact if declined]: We won't be able to fix this crash. This is a debug patch for finding the root cause of this crash. We need to ship this patch to our users for catching information about the crash and really fix this crash. [Is this code covered by automated tests?]: No. This is hard to automate because running out of system resource, like TLS in this bug, make the running process very unstable and the behavior is nondeterministic. The code is tested locally. [Has the fix been verified in Nightly?]: Yes. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: The change is intrusive and risky to some extent because it hooks some Windows system calls. However, most users, this patch is not risky because most users don't run out of TLS, and what this patch does is merely increment or decrement of an internal counter. [Why is the change risky/not risky?]: [String changes made/needed]: None.
Attachment #8851453 -
Flags: approval-mozilla-aurora?
Comment 72•6 years ago
|
||
Comment on attachment 8846580 [details] [diff] [review] Part 2 - Initialize/shutdown the TLS allocation tracker in the content process. This is a long-standing bug. We may use the debug patch to gather more information. Let's take it in Aurora54. Aurora54+. If anything goes wrong, we can backout the patch.
Attachment #8846580 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•6 years ago
|
Attachment #8851452 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•6 years ago
|
Attachment #8851453 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 73•6 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/5165e5a79b8c https://hg.mozilla.org/releases/mozilla-aurora/rev/b0e00e839fc2 https://hg.mozilla.org/releases/mozilla-aurora/rev/2c72c7f926a4
Comment 74•6 years ago
|
||
Mass wontfix for bugs affecting firefox 52.
Hi Jesup, would you consider uplifting this fix to 53? I am leaning towards wontfix'ing this for 53. Please let me know if you disagree.
Flags: needinfo?(rjesup)
Comment 76•6 years ago
|
||
I agree with wontfix for 53. -- and this patchset is diagnostic anyways.
Flags: needinfo?(rjesup)
![]() |
||
Comment 77•6 years ago
|
||
Cervantes, now that the diagnostics are in, can you look at the crashes to see if we can get any farther with this?
Flags: needinfo?(cyu)
Assignee | ||
Comment 78•6 years ago
|
||
We've seen crashes with the diagnostics, but they doesn't come with the annotation we expect. I seems DLL patching is not working on 32bit Windows 7 builds. I am looking into it and will add a followup fix for that.
Flags: needinfo?(cyu)
Assignee | ||
Comment 79•6 years ago
|
||
This patch fail to gather the information we need because of bug 1348747, but unfortunately, all crashes are from 32 bit firefox running on 32 or 64 bit Windows 7. No crashes seen on Window 8/8.1/10. We still need bug 1348747 to gather the information we need.
Looks like the work in bug 1348747 landed for 55. Does that help ? Did the signatures shift in some way?
Assignee | ||
Comment 81•6 years ago
|
||
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #80) > Looks like the work in bug 1348747 landed for 55. Does that help ? Did the > signatures shift in some way? I am still waiting for more results to tell. All Windows 7 crash reports are still on 54. There are 2 non-Windows 7 reports, but unluckily both are on 53. The signatures are still the same.
Flags: needinfo?(cyu)
Assignee | ||
Comment 82•6 years ago
|
||
From socorro, this crash only happens before 55. I think it's fixed on 55, very likely in the rust part. The debug patch is no longer necessary. I'll work on removing the patches.
Assignee | ||
Comment 83•6 years ago
|
||
Attachment #8898729 -
Flags: review?(nfroyd)
![]() |
||
Updated•6 years ago
|
Attachment #8898729 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 84•6 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/a8bf3954befdd08ed7e60cd863d755b244f19333 Bug 1320134 - Part 4: Remove the debug patches for running out of TLS slots on Windows. r=froydnj
Assignee | ||
Updated•6 years ago
|
Keywords: leave-open
Comment 85•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/a8bf3954befd
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Updated•6 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•