Closed Bug 1320134 Opened 7 years ago Closed 6 years ago

Crash in xul.dll@0x4d768 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0...

Categories

(Core :: Audio/Video: Playback, defect, P1)

51 Branch
x86
Windows 7
defect

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox-esr45 --- unaffected
firefox50 --- unaffected
firefox51 + wontfix
firefox52 + wontfix
firefox-esr52 --- wontfix
firefox53 + wontfix
firefox54 --- wontfix
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 --- fixed

People

(Reporter: philipp, Assigned: cyu)

References

Details

(Keywords: crash, regression, Whiteboard: [ps-radar])

Crash Data

Attachments

(5 files, 7 obsolete files)

3.43 KB, patch
Details | Diff | Splinter Review
2.56 KB, patch
cyu
: review+
Details | Diff | Splinter Review
8.51 KB, patch
cyu
: review+
Details | Diff | Splinter Review
1.62 KB, patch
cyu
: review+
Details | Diff | Splinter Review
11.88 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
This bug was filed from the Socorro interface and is 
report bp-ba31674f-32a7-427e-916d-99e2d2161124.
=============================================================
Crashing Thread (33)
Frame 	Module 	Signature 	Source
0 	xul.dll 	xul.dll@0x4d768 	
1 	xul.dll 	xul.dll@0x4d739 	
2 	xul.dll 	xul.dll@0x4d739 	
3 	xul.dll 	xul.dll@0x4d739 	
4 	xul.dll 	xul.dll@0x4d739 	
5 	xul.dll 	xul.dll@0x4d739 	
6 	xul.dll 	xul.dll@0x4d739 	
7 	xul.dll 	xul.dll@0x4d739 	
8 	xul.dll 	xul.dll@0x4d739 	
9 	xul.dll 	xul.dll@0x4d739 	
10 	xul.dll 	xul.dll@0x4d739 	
11 	xul.dll 	xul.dll@0x4d739 	
12 	xul.dll 	xul.dll@0x4d739 	
13 	xul.dll 	xul.dll@0x4d739 	
14 	xul.dll 	xul.dll@0x4d739 	
15 	xul.dll 	xul.dll@0x4d739 	
16 	xul.dll 	xul.dll@0x4d739 	
17 	xul.dll 	xul.dll@0x4d739 	
18 	xul.dll 	xul.dll@0x4d739 	
19 	xul.dll 	xul.dll@0x4d739 	
20 	xul.dll 	xul.dll@0x4d739 	
21 	xul.dll 	xul.dll@0x4d739 	
22 	xul.dll 	xul.dll@0x4d739 	
23 	xul.dll 	xul.dll@0x4d739 	
24 	xul.dll 	xul.dll@0x4d739 	
25 	xul.dll 	xul.dll@0x4d739 	
26 	xul.dll 	xul.dll@0x4d739 	
27 	xul.dll 	xul.dll@0x4d739 	
28 	xul.dll 	xul.dll@0x4d739 	
29 	xul.dll 	xul.dll@0x4d739 	
30 	xul.dll 	xul.dll@0x4d739 	
31 	xul.dll 	xul.dll@0x4d739 	
32 	xul.dll 	xul.dll@0x4d739 	
33 	xul.dll 	xul.dll@0x4d739 	
34 	xul.dll 	xul.dll@0x4d739 	
35 	xul.dll 	xul.dll@0x4d739 	
36 	xul.dll 	xul.dll@0x4d739 	
37 	xul.dll 	xul.dll@0x4d739 	
38 	xul.dll 	xul.dll@0x4d739 	
39 	xul.dll 	xul.dll@0x4d739 	
40 	xul.dll 	xul.dll@0x4d739 	
41 	xul.dll 	xul.dll@0x4d739 	
42 	xul.dll 	xul.dll@0x4d739 	
43 	xul.dll 	xul.dll@0x4d739 	
44 	xul.dll 	xul.dll@0x4d739 	
45 	xul.dll 	xul.dll@0x4d739 	
46 	xul.dll 	xul.dll@0x4d739 	
47 	xul.dll 	xul.dll@0x4d739 	
48 	xul.dll 	xul.dll@0x4d739 	
49 	xul.dll 	xul.dll@0x4d739 	
50 	xul.dll 	xul.dll@0x4d739 	
51 	xul.dll 	xul.dll@0x4d739 	
52 	xul.dll 	xul.dll@0x4d739 	
53 	xul.dll 	xul.dll@0x4d739 	
54 	xul.dll 	xul.dll@0x4d739 	
55 	xul.dll 	xul.dll@0x4d739 	
56 	xul.dll 	xul.dll@0x4d739 	
57 	xul.dll 	xul.dll@0x4d739 	
58 	xul.dll 	xul.dll@0x4d739 	
59 	xul.dll 	xul.dll@0x4d739 	
60 	xul.dll 	xul.dll@0x4d739 	
61 	xul.dll 	xul.dll@0x4d739 	
62 	xul.dll 	xul.dll@0x4d739 	
63 	xul.dll 	xul.dll@0x4d739 	
64 	xul.dll 	xul.dll@0x4d739 	
65 	xul.dll 	xul.dll@0x4d739 	
66 	xul.dll 	xul.dll@0x4d739 	
67 	xul.dll 	xul.dll@0x4d739 	
68 	xul.dll 	xul.dll@0x4d739 	
69 	xul.dll 	xul.dll@0x4d739 	
70 	xul.dll 	xul.dll@0x4d739 	
71 	xul.dll 	xul.dll@0x4d739 	
72 	xul.dll 	xul.dll@0x4d739 	
73 	xul.dll 	xul.dll@0x4d739 	
74 	xul.dll 	xul.dll@0x4d739 	
75 	xul.dll 	xul.dll@0x4d739 	
76 	xul.dll 	xul.dll@0x4d739 	
77 	xul.dll 	xul.dll@0x4d739 	
78 	xul.dll 	xul.dll@0x4d739 	
79 	xul.dll 	xul.dll@0x4d739 	
80 	xul.dll 	xul.dll@0x4d739 	
81 	xul.dll 	xul.dll@0x4d739 	
82 	xul.dll 	xul.dll@0x4d739 	
83 	xul.dll 	xul.dll@0x4d739 	
84 	xul.dll 	xul.dll@0x4d739 	
85 	xul.dll 	xul.dll@0x4d739 	
86 	xul.dll 	xul.dll@0x4d739 	
87 	xul.dll 	xul.dll@0x4d739 	
88 	xul.dll 	xul.dll@0x4d739 	
89 	xul.dll 	xul.dll@0x4d739 	
187 	xul.dll 	xul.dll@0x4d739 	
188 	xul.dll 	xul.dll@0x4d739 	
189 	xul.dll 	xul.dll@0x4d739 	
190 	xul.dll 	mp4_demuxer::MP4MetadataRust::MP4MetadataRust(mp4_demuxer::Stream*) 	media/libstagefright/binding/MP4Metadata.cpp:589
191 	xul.dll 	mp4_demuxer::MP4Metadata::MP4Metadata(mp4_demuxer::Stream*) 	media/libstagefright/binding/MP4Metadata.cpp:157
192 	xul.dll 	mozilla::MP4Demuxer::Init() 	dom/media/fmp4/MP4Demuxer.cpp:140
193 	xul.dll 	mozilla::TrackBuffersManager::InitializationSegmentReceived() 	dom/media/mediasource/TrackBuffersManager.cpp:901
194 	nss3.dll 	nss3.dll@0x490f 	
195 	nss3.dll 	PR_GetThreadPrivate 	nsprpub/pr/src/threads/prtpd.c:204
196 	xul.dll 	nsThreadManager::GetCurrentThread() 	xpcom/threads/nsThreadManager.cpp:222

this signature is the #5 crash in the content process in 51.0b so far (around 2.35% of all content crashes) and occurring on win7 & 8.1.
Correlations for Firefox Beta
(100.0% in signature vs 00.39% overall) reason = EXCEPTION_STACK_OVERFLOW
(98.78% in signature vs 22.53% overall) Module "msmpeg2adec.dll" = true
(98.78% in signature vs 22.93% overall) Module "slc.dll" = true
(89.02% in signature vs 13.87% overall) Module "sqmapi.dll" = true
(99.09% in signature vs 26.67% overall) Module "ksuser.dll" = true
(98.78% in signature vs 27.15% overall) Module "atl.dll" = true
(98.78% in signature vs 28.85% overall) Module "RpcRtRemote.dll" = true
(100.0% in signature vs 31.10% overall) Module "wevtapi.dll" = true
(100.0% in signature vs 31.76% overall) Module "msmpeg2vdec.dll" = true
(100.0% in signature vs 32.21% overall) Module "mf.dll" = true
(100.0% in signature vs 32.22% overall) Module "evr.dll" = true
(100.0% in signature vs 32.75% overall) Module "Wpc.dll" = true
(98.78% in signature vs 30.57% overall) Module "WSHTCPIP.DLL" = true
(99.70% in signature vs 32.07% overall) Module "samcli.dll" = true
(100.0% in signature vs 32.85% overall) Module "dxva2.dll" = true
(100.0% in signature vs 32.86% overall) Module "mfplat.dll" = true
(100.0% in signature vs 33.08% overall) Module "avrt.dll" = true
(98.78% in signature vs 31.92% overall) Module "samlib.dll" = true
(100.0% in signature vs 34.65% overall) Module "mozavcodec.dll" = true
(100.0% in signature vs 34.66% overall) Module "mozavutil.dll" = true
(98.17% in signature vs 32.03% overall) Module "netutils.dll" = true
(98.78% in signature vs 33.44% overall) Module "dhcpcsvc.dll" = true
(100.0% in signature vs 37.86% overall) dom_ipc_enabled = 1
(100.0% in signature vs 39.95% overall) Module "pnrpnsp.dll" = true
(100.0% in signature vs 40.02% overall) Module "NapiNSP.dll" = true
(05.18% in signature vs 67.33% overall) submitted_from_infobar = true
(100.0% in signature vs 40.05% overall) Module "nlaapi.dll" = true
(100.0% in signature vs 42.77% overall) Module "bcrypt.dll" = true
(100.0% in signature vs 43.19% overall) Module "cryptsp.dll" = true
(100.0% in signature vs 46.33% overall) Module "rsaenh.dll" = true
(100.0% in signature vs 47.37% overall) Module "browsercomps.dll" = true
(100.0% in signature vs 47.73% overall) Module "winrnr.dll" = true
(100.0% in signature vs 48.20% overall) startup_crash = null
(90.24% in signature vs 36.44% overall) Module "icm32.dll" = true
(100.0% in signature vs 49.72% overall) Module "firefox.exe" = true
(100.0% in signature vs 51.26% overall) Module "dnsapi.dll" = true
(65.85% in signature vs 18.91% overall) os_arch = x86
(98.78% in signature vs 57.59% overall) Module "Wldap32.dll" = true
(64.63% in signature vs 19.97% overall) platform_version = 6.1.7600
(80.49% in signature vs 37.01% overall) Module "softokn3.dll" = true
(80.49% in signature vs 37.39% overall) Module "freebl3.dll" = true
(98.78% in signature vs 59.58% overall) platform_pretty_version = Windows 7
(34.76% in signature vs 00.08% overall) adapter_device_id = 0x053b
See Also: → 1318943
Component: General → Audio/Video
Manish, should we duplicate this one to bug 1318943 ?
Flags: needinfo?(manishearth)
Ralph, 
Could you help have a look?
Component: Audio/Video → Audio/Video: Playback
Flags: needinfo?(giles)
(In reply to Kan-Ru Chen [:kanru] (UTC+8) from comment #2)
> Manish, should we duplicate this one to bug 1318943 ?

Nah, this is an mp4 crash, that was a rust-url crash (in non-shared code)
Flags: needinfo?(manishearth)
On beta, MP4Metadata.cpp:589 is `mRustParser.reset(mp4parse_new(&io))` in the MP4MetadataRust ctor. Since it's initial construction, it's presumedly the mp4parse_new() call which is failing rather than the mp4parse_free() invoked by UniquePtr::reset when replacing an established pointer.

`io` is a local struct, so it should be valid. The rest of the stack could be the rust panic loop we've seen before. So what failed in mp4parse_new? This is a constructor as well; it just does (small) allocations and initialization. Is there a correlation with very low or fragmented memory?
Flags: needinfo?(giles)
I bet this is the same thing as bug 1305315, just presenting differently (the stack frames are missing symbols for some reason). The pseudo stacktrace I generated for a crash from that bug looked like:
https://gist.github.com/luser/650057bfc2ca2f88ea5a703fca0c77bb

There's a `xul.dll!mp4parse_new [local.rs:91faf7ec36cd : 208 + 0x14]` frame in there, calling into some std library code, which fails in initialization and then panics, and then fails during panic, etc.
Crash Signature: ...] → ...] [@ xul.dll@0x4d7e8 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4…
from Triage team: ni on marcia to follow up on this
Flags: needinfo?
(In reply to Marcia Knous [:marcia - use ni] from comment #7)
> from Triage team: ni on marcia to follow up on this

Marco and I tried to reprocess this signature, but unfortunately the signature hasn't changed - so I guess we are still missing symbols here.
Flags: needinfo?
Crash Signature: xul.dll@0x4d7b9 | ...] → xul.dll@0x4d7b9 | ...] [@ xul.dll@0x4d778 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4…
I believe this bug should be flag as P1 because of the majority of crashes revealed after 51.0b8 is out.
We might need either a clear STR or go over all uplifted FF51 patches regarding to demuxer to keep the investigation running.

Blake, thoughts ?
Flags: needinfo?(bwu)
Priority: -- → P1
Yeah. I think the same way. 
Alfredo, 
Per discussing today, before Ralph comes back next week can you have a look to see if there is anything we can help?
Flags: needinfo?(bwu) → needinfo?(ayang)
(In reply to Blake Wu [:bwu][:blakewu] from comment #10)
> Yeah. I think the same way. 
> Alfredo, 
> Per discussing today, before Ralph comes back next week can you have a look
> to see if there is anything we can help?

It crashes at MP4Metadata trying to allocate the rust demuxer. As we talked before, it could be something leaked or corrupted the memory. I believe rust code is the victim in this case.

I agree with comment 6, this is the same one as bug 1305315.
Flags: needinfo?(ayang)
[Tracking Requested - why for this release]: high volume crash on Firefox 51 with Window 7.
Crash Signature: ...] [@ xul.dll@0x4d7e8 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4… → ...] [@ xul.dll@0x4d7f8 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4…
Note that the crash has always been there in 51 Beta, it didn't regress in Beta 8.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #6)
> I bet this is the same thing as bug 1305315, just presenting differently

Ted, if this is the case, what's your suggestion to fix this issue ?
Flags: needinfo?(ted)
See Also: → 1305315
(In reply to Astley Chen [:astley] UTC+8 from comment #14)
> Ted, if this is the case, what's your suggestion to fix this issue ?

I ran out of ideas in that bug, so I don't have anything better to offer here, sadly.
Flags: needinfo?(ted)
Track 51+ as the volume of crash is high.
Is this crash still happening? If so, how can we make it actionable?
Flags: needinfo?(madperson)
Flags: needinfo?(giles)
it's a bit difficult to answer, since super search doesn't play nice with these kinds of crashes unfortunately.
they are clearly still around in 52.0b and make up ~2% of all content crashes, but on aurora & nightly i'm not sure...

on 52.0b1 these crash stacks look like described in comment #0: https://crash-stats.mozilla.com/report/index/f0affd59-140d-4083-b054-ff7c72170209
afterwards it seems to get a little bit different:
https://crash-stats.mozilla.com/report/index/c3fed71f-7afb-4bc8-b34c-c7a5d2170209
Flags: needinfo?(madperson)
Crash Signature: xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | ...] → xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | xul.dll@0x4d7b9 | ...] [@ xul.dll@0x499ee | xul.dll@0x499c4 | xul.dll@0x499c4 | xul.dll@0x499c4 | xul.dll@0x499c4 | xul.dll@0x4…
I still have no insight. Cervantes, what do you think about pursuing your TlsAlloc diagnostic idea from https://bugzilla.mozilla.org/show_bug.cgi?id=1305315#c24 on this bug?
Flags: needinfo?(giles) → needinfo?(cyu)
I can write the diagnostic patch.
Assignee: nobody → cyu
Flags: needinfo?(cyu)
This tracks allocation/free of TLS slots on Windows by interposing TlsAlloc()/TlsFree() calls. When observed TLS slot allocations has reached a high watermark, we start recording the stacks of TlsAlloc() calls (and forget the stack if it's TlsFree()'d). When we observe that TlsAlloc() returns TLS_OUT_OF_INDEXES, we serialize the recorded data to a string ready for crash annotation.
Attachment #8839094 - Flags: review?(nfroyd)
This enables TlsAllocationTracker in the child process, inside which the crash for this bug is observed.
Attachment #8839095 - Flags: review?(nfroyd)
This adds an annotation "TlsAllocations" to the crash report on TLS slots running out.
Attachment #8839100 - Flags: review?(ted)
Patch to test TlsAllocationTracker. Not for review.

#define TEST_TLS_TRACKER in TlsAllocationTracker.h and then the content process will record the first 10 TlsAlloc() stacks. Then crashinject.exe the content process, you'll see the annotation like the following. Then we can decode the stacks offline to see why we run out of TLS slots.

TlsAllocations=[0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf70a6ae1,0x7ffaf8dd63b0,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf70abd0b,0x7ffaf70af76b,0x7ffaf70b9df3,0x7ffaf8dd644d,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf8dd6718,0x7ffaf70caf6e,0x7ffaf8034b87,0x7ffaf8dd64c9,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf8dd6718,0x7ffaf70caf6e,0x7ffaf8034b87,0x7ffaf8dd64c9,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf6da67ab,0x7ffaf8dd6718,0x7ffaf70caf6e,0x7ffaf8034b87,0x7ffaf8dd64c9,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf8dd6718,0x7ffaf70caf6e,0x7ffaf8034b87,0x7ffaf8dd64c9,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf72b58bf,0x7ffaf72c76d9,0x7ffaf72ca9b6,0x7ffaf72d6568,0x7ffaf72c7ca9,0x7ffaf8583a1d,0x7ffaf6d7d57e,0x7ffaf6d7c4a6,0x7ffaf6d7bcfe,0x7ffaf6d7f995,0x7ffaf6d7f6b5,0x7ffaf6d3fd23,0x7ffaf6d8000e,0x7ffaf6d7db40,0x7ffaf6d810e0,0x7ffaf6d7e545,0x7ffaf6d7db90,0x7ffaf6d81096,0x7ffaf6d7e545,0x7ffaf6d7ebca,0x7ffaf6d7d10a,0x7ffaf6da67e6,0x7ffaf8dd6718,0x7ffaf70caf6e,0x7ffaf8034b87,0x7ffaf8dd64c9,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffb3fbb48ac,0x7ffb43e3a35f,0x7ffb43e1771a,0x7ffb43e17567,0x7ffb43e1d33d,0x7ffb43e37099,0x7ffb43e36add,0x7ffb43e19efc,0x7ffb35172b32,0x7ffb40abcd7f,0x7ffb41e1462a,0x7ffb41e1322a,0x7ffb41e1acf4,0x7ffb41e1a4af,0x7ffb0c787a55,0x7ffb0c78370b,0x7ffaf6dd4b0c,0x7ffaf6dcab69,0x7ffaf6e1a2f9,0x7ffaf6d90ecb,0x7ffaf6d9051f,0x7ffaf70c4122,0x7ffaf70aba9f,0x7ffaf70ab86e,0x7ffaf6d938e6,0x7ffb0c79512e,0x7ffb0c7882da,0x7ffb4119cab0,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf8586488,0x7ffaf8583a2c,0x7ffaf6d7d57e,0x7ffaf6d7c4a6,0x7ffaf6d7bcfe,0x7ffaf6d7f995,0x7ffaf6d7f6b5,0x7ffaf6d3fd23,0x7ffaf6d8000e,0x7ffaf6d7db40,0x7ffaf6d810e0,0x7ffaf6d7e545,0x7ffaf6d7db90,0x7ffaf6d81096,0x7ffaf6d7e545,0x7ffaf6d7ebca,0x7ffaf6d7d10a,0x7ffaf6da67e6,0x7ffaf8dd6718,0x7ffaf70caf6e,0x7ffaf8034b87,0x7ffaf8dd64c9,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],[0x7ffaf8dd6718,0x7ffaf70caf6e,0x7ffaf8034b87,0x7ffaf8dd64c9,0x7ff676571613,0x7ff6765712f5,0x7ff676571adf,0x7ff676595f79,0x7ffb41a58364,0x7ffb43e770d1,],
Comment on attachment 8839094 [details] [diff] [review]
Part 1 - Tracking TLS allocations for diagnosing TLS slots run out on Windows.

Review of attachment 8839094 [details] [diff] [review]:
-----------------------------------------------------------------

Lots of little comments.  Are we hoping that the last ~40 allocations are going to be the problematic ones, rather than something allocating several hundred TLS indices before we even start recording stacks?

::: xpcom/build/TlsAllocationTracker.cpp
@@ +24,5 @@
> +namespace {
> +
> +using ::mozilla::Move;
> +using ::mozilla::UniquePtr;
> +using ::mozilla::Atomic;

Are these really necessary, since we're already inside the mozilla:: namespace at this point?

@@ +30,5 @@
> +std::string sTlsAllocationStacks;
> +
> +// Start recording TlsAlloc() call stacks when we observed kStartTrackingTlsAt
> +// allocations.
> +const uint32_t kStartTrackingTlsAt = 1040;

Maybe add another sentence describing why we chose the particular value here: it's close to the maximum permitted number of allocated TLS slots.

@@ +33,5 @@
> +// allocations.
> +const uint32_t kStartTrackingTlsAt = 1040;
> +
> +using stack_t = std::vector<uintptr_t>;
> +using stacks_t = std::map<DWORD, stack_t>;

I would strongly prefer using XPCOM datastructures here and for sTlsAllocationStacks.

@@ +34,5 @@
> +const uint32_t kStartTrackingTlsAt = 1040;
> +
> +using stack_t = std::vector<uintptr_t>;
> +using stacks_t = std::map<DWORD, stack_t>;
> +static UniquePtr<stacks_t> sRecentTlsAllocationStacks;

Can we use StaticAutoPtr here to avoid the static constructor?

@@ +41,5 @@
> +static CRITICAL_SECTION sMutex;
> +static Atomic<uint32_t> sCurrentTlsSlots{0};
> +
> +// Windows DLL interceptor
> +static WindowsDllInterceptor sKernel32DllInterceptor{};

WDYT about moving this object into InitTlsAllocationTracker to avoid the static constructor?

@@ +52,5 @@
> +class CriticalSectionAutoEnter
> +{
> +public:
> +    explicit CriticalSectionAutoEnter(CRITICAL_SECTION* aCriticalSection) :
> +        mCriticalSection(aCriticalSection)

The indentation for these methods seems a bit deeper than the two-space indents we normally use.  I think it's also the style to format the initialization list like so:

  explicit CriticalSectionAutoEnter(...)
    : mCriticalSection(...)
  {
    ...

@@ +81,5 @@
> +
> +  stack_t rawStack;
> +  auto callback = [](uint32_t, void* aPC, void*, void* aClosure) {
> +    std::vector<uintptr_t>* stack =
> +      static_cast<std::vector<uintptr_t>*>(aClosure);

How about |auto* stack = static_cast<stack_t*>(aClosure);|?

@@ +85,5 @@
> +      static_cast<std::vector<uintptr_t>*>(aClosure);
> +    stack->push_back(reinterpret_cast<uintptr_t>(aPC));
> +  };
> +  MozStackWalk(callback, /* skip 2 frames */ 2,
> +              /* maxFrames */ 0, reinterpret_cast<void*>(&rawStack), 0, nullptr);

You shouldn't need to reinterpret_cast to void*; static_cast should be fine.

@@ +91,5 @@
> +  CriticalSectionAutoEnter lock(&sMutex);
> +  if (!sRecentTlsAllocationStacks) {
> +    return;
> +  }
> +  sRecentTlsAllocationStacks->emplace(aTlsIndex, rawStack);

Can we Move(rawStack) instead of copying it here?

@@ +129,5 @@
> +    const stack_t& stack = iter->second;
> +    output << "[";
> +
> +    for (auto it = stack.begin(); it != stack.end(); ++it) {
> +      output << "0x" << *it << ",";

Is it worth making this output valid JSON so it's easy to parse?

@@ +137,5 @@
> +
> +  sTlsAllocationStacks = Move(output.str());
> +}
> +
> +DWORD NTAPI

This should be WINAPI, for consistency with InterposedTlsFree and the Windows documentation.

@@ +172,5 @@
> +  if (sInitialized) {
> +    return;
> +  }
> +
> +  InitializeCriticalSection(&sMutex);

I think we need to address https://dxr.mozilla.org/mozilla-central/source/mozglue/misc/Mutex_windows.cpp#19 here as well, whether by using InitializeCriticalSectionEx or just using mozilla::detail::MutexImpl directly.
Attachment #8839094 - Flags: review?(nfroyd) → feedback+
Comment on attachment 8839095 [details] [diff] [review]
Part 2 - Initialize the TLS allocation tracker for child process.

Review of attachment 8839095 [details] [diff] [review]:
-----------------------------------------------------------------

r=me assuming the answer to the question below is reasonable.

::: toolkit/xre/nsEmbedFunctions.cpp
@@ +357,5 @@
>  #endif
>  
>  #if defined(XP_WIN)
> +  // XXX Bug 1320134: added for diagnosing the crash because of TLS index run
> +  // out on Windows. Remove after the root cause is found.

Nit: "...the crashes because we're running out of TLS indices on Windows.  Remove after..."

@@ +358,5 @@
>  
>  #if defined(XP_WIN)
> +  // XXX Bug 1320134: added for diagnosing the crash because of TLS index run
> +  // out on Windows. Remove after the root cause is found.
> +  mozilla::InitTlsAllocationTracker();

Have we allocated any TLS indices prior to this point?  Otherwise we might not start recording TLS index allocation stacks until it's too late...
Attachment #8839095 - Flags: review?(nfroyd) → review+
Comment on attachment 8839100 [details] [diff] [review]
Part 3 -  Annotate the crash report with TLS allocation stacks on running out of TLS slots.

Review of attachment 8839100 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +1423,5 @@
>    }
> +
> +#ifdef XP_WIN
> +  const std::string& tlsAllocations = mozilla::GetTlsAllocationStacks();
> +  if (tlsAllocations.length()) {

Nit: I would have used `if (!tlsAllocations.empty())`
Attachment #8839100 - Flags: review?(ted) → review+
Changes to V1:
- Lower the value of kStartTrackingTlsAt to 950. There are a couple of allocations before we start tracking TLS allocations. The values is chosen so that we don't add a very long annotation to the .extra file. For out of TLS indexes, we should catch something meaningful in the most recent 100 allocations that we record. If a long annotation to the .extra file is a nonissue, we can choose an even lower value like 800 or 700, as long as it doesn't impact normal users.
- Output the annotation in JSON format.
- Remove constructions in the global scope
- Use StaticMutex.
- Use XPCOM strings and data structures.
Attachment #8839094 - Attachment is obsolete: true
Attachment #8840830 - Flags: review?(nfroyd)
(In reply to Nathan Froyd [:froydnj] from comment #27)
> Comment on attachment 8839095 [details] [diff] [review]
> Part 2 - Initialize the TLS allocation tracker for child process.
> 
> Review of attachment 8839095 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r=me assuming the answer to the question below is reasonable.
> 
> @@ +358,5 @@
> >  
> >  #if defined(XP_WIN)
> > +  // XXX Bug 1320134: added for diagnosing the crash because of TLS index run
> > +  // out on Windows. Remove after the root cause is found.
> > +  mozilla::InitTlsAllocationTracker();
> 
> Have we allocated any TLS indices prior to this point?  Otherwise we might
> not start recording TLS index allocation stacks until it's too late...
You are right, in my test I can only observe ~1034 TlsAlloc() calls, which means about 50 TlsAlloc() calls are already made before we initialize the tracker. I lower the value of kStartTrackingTlsAt in part 1 to 950. We can lower the value if we want to have more stacks in the annotation.
Updated the comment.
Attachment #8839095 - Attachment is obsolete: true
Attachment #8840838 - Flags: review+
Updated for the interface change in part 1: use const char* return type instead of const std::string& because part1 uses nsCString internally.
Attachment #8839100 - Attachment is obsolete: true
Attachment #8840840 - Flags: review+
The JSON annotation now looks like:

TlsAllocations={[1941097036, 1941096875, 1999824270, 1999638086, 1999547132, 1999546779, 1999572918, 1999635099, 1999621524, 1999545935, 1866344146, 1946064184, 1957586952, 1957642032, 1957677238, 1836443619, 1836428643, 1836428079, 1522326734, 1522294004, 1522548268, 1522111547, 1522109636, 1524802161, 1524729866, 1524729356, 1522119717, 1836489082, 1836445497, 1996084719, 1983996612, 1999769561, 1999769508], [1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1548020693, 1524821998, 1536856748, 1548020225, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1524821998, 1536856748, 1548020225, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1524730291, 1524741301, 1524772615, 1548017948, 1548020168, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1526424988, 1522046144, 1522057842, 1522057381, 1521850734, 1522059320, 1522052576, 1522063141, 1522046643, 1522052658, 1522063090, 1522046643, 1522054288, 1522055271, 1522050528, 1522174494, 1548020693, 1524821998, 1536856748, 1548020225, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1548020693, 1524821998, 1536856748, 1548020225, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1522046144, 1522057842, 1522057381, 1521850734, 1522059320, 1522052576, 1522063141, 1522046643, 1522052658, 1522063090, 1522046643, 1522054288, 1522055271, 1522050528, 1522174494, 1548020693, 1524821998, 1536856748, 1548020225, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1548020015, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1548020693, 1524821998, 1536856748, 1548020225, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508], [1524821998, 1536856748, 1548020225, 1548049934, 11146465, 11145907, 11147568, 11259493, 1983996612, 1999769561, 1999769508]}
We could make this more compact at the expense of making it more of a pain to work with in Socorro by writing it out as part of the minidump, so it'd just be `N * sizeof(void*)` bytes extra in the minidump. That would mean we'd have to extract it from the minidump in the stackwalker to do anything useful with it, though.
Oh, I missed the fact that we're capturing full stacks here. That'd make it larger still.

Do you have a plan for getting useful data out of this annotation? It strikes me that it might actually be better to put this in the minidump so we could symbolicate it with the symbols we already have during minidump processing.
Comment on attachment 8840830 [details] [diff] [review]
Part 1 - Tracking TLS allocations for diagnosing TLS slots run out on Windows. (v2)

Review of attachment 8840830 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the changes.  This is fine as far as it goes, but we should have some discussion and consensus about the best way to send this data along for easy processing.
Attachment #8840830 - Flags: review?(nfroyd) → review+
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #36)
> Oh, I missed the fact that we're capturing full stacks here. That'd make it
> larger still.
> 
> Do you have a plan for getting useful data out of this annotation? It
At the very least, we can load the minidump in Visual Studio debugger and symbolicate manually in it. Or we can synthesize a request to http://symbolapi.mozilla.org as the gecko profiler addon does.

> strikes me that it might actually be better to put this in the minidump so
> we could symbolicate it with the symbols we already have during minidump
> processing.
I don't get it. Are you saying that we add more stacks to the minidump using the TLS allocation stacks we captured and make socorro decode the stacks for us? I am not sure if we can do this on Windows.
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #38)
> I don't get it. Are you saying that we add more stacks to the minidump using
> the TLS allocation stacks we captured and make socorro decode the stacks for
> us? I am not sure if we can do this on Windows.

Sort of--they wouldn't be stacks the way the other thread stacks are in the minidump, we could just add additional streams to the minidump that were arrays of instruction pointers, and then use the already-loaded symbols in the stackwalker to turn them into function names+source locations and put that in the stackwalker output.

It'd be a bit more work though, so I think you should just land what you have and see what the data looks like. If it turns out to be difficult to work with we can revisit this.
Try bustage: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d29182af4a8f8a7f3ef7428b7c70a4a94b79e8ef&selectedJob=81105147

WindowsDllInterceptor fails to patch the function because it encounters unknown instruction byte 0xeb (Jmp rel8), both on x86 and x64, in patching TlsFree(), but local Windows build works fine.
If bug 1345856 is a duplicate of this issue, this is still a high volume crash on 53 beta 1.
Whiteboard: [ps-radar]
Carryover r+ with the following changes/fixes:
- Interface for shutting down the TLS tracker
- Synchronization in GetTlsAllocationStacks()
  It looks like that the span of mutex lock in this function is not enough and needs to extend to nsExceptionHandler after it's written out to the crash annotation file, but the string annotation, sTlsAllocationStacks, will only be written by the first thread that observes TLS_OUT_OF_INDEXES.
- Don't touch the wrapper functionality after shutdown.
Attachment #8840830 - Attachment is obsolete: true
Attachment #8846571 - Flags: review+
Carryover r+ with the following changes:
- Shutdown before exiting the main function.
- Only enable in non-debug build.
  This is to avoid the hard-to-debug crashes in try pushes because kernel32.dll!TlsFree() starts with an unknow instruction byte.
Attachment #8840838 - Attachment is obsolete: true
Attachment #8846580 - Flags: review+
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #46)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=dfc9502528b2

Tests on Windows are green. I'll land with the above updates.
https://hg.mozilla.org/integration/mozilla-inbound/rev/5fd991c5f12076490a52a2c0232af3275e1bae12
Bug 1320134 - Part 1: Tracking TLS allocations for diagnosing out-of-TLS-slots crashes on Windows. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/e91e04dadeb3d8950dc7c3327fde93e3ec26045a
Bug 1320134 - Part 2: Initialize and shutdown the TLS allocation tracker in the content process. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/9ba65a580135e445066222199eec1fc4abb5a57c
Bug 1320134 - Part 3: Annotate the crash report with TLS allocation stacks on running out of TLS slots. r=ted
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
Cervantes, do you want to request uplift for this fix?  If we can confirm that it's useful, and you don't think it's too risky of a change,  I'd like to get it into beta 3.
Flags: needinfo?(cyu)
Note that the dependency on bug 1347130 was from Alice White seeing this fingered by mozregression as the source, so we may want to hold for a few days on uplift.  Note that bug 1347130 seems to cause 'odd' mozregression results, with people getting different (and unlikely) results; perhaps due to something trashing memory.

Perhaps an ASAN linux build could catch it, but since the crash is in GL code probably not.
Liz, the landed patches are *not* a fix. They are intended to be shipped to users for diagnosing the root cause of the crash.(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #52)

> Cervantes, do you want to request uplift for this fix?  If we can confirm
> that it's useful, and you don't think it's too risky of a change,  I'd like
> to get it into beta 3.

(In reply to Randell Jesup [:jesup] from comment #53)
> Note that the dependency on bug 1347130 was from Alice White seeing this
> fingered by mozregression as the source, so we may want to hold for a few
> days on uplift.  Note that bug 1347130 seems to cause 'odd' mozregression
> results, with people getting different (and unlikely) results; perhaps due
> to something trashing memory.
> 
> Perhaps an ASAN linux build could catch it, but since the crash is in GL
> code probably not.

I can confirm that this bug causes the crash in bug 1347130. ASAN won't catch it because this is only enabled on Windows builds. It performs DLL interception to kernel32.dll. I'll backout the patches reland with the crash fiexed.
Flags: needinfo?(cyu)
https://hg.mozilla.org/integration/mozilla-inbound/rev/7012620c3aee5e36beb7156ec3620de897b03c1f
Backed out changeset 5fd991c5f120 (bug 1320134) for crashing webgl on Windows. r=backout

https://hg.mozilla.org/integration/mozilla-inbound/rev/5389367569cc29a30a1631dda01606f1981d3bfc
Backed out changeset e91e04dadeb3 (bug 1320134) for crashing webgl on Windows. r=backout

https://hg.mozilla.org/integration/mozilla-inbound/rev/ef6728e3e1364596180a25268d705c52e03eb4f3
Backed out changeset 9ba65a580135 (bug 1320134) for crashing webgl on Windows. r=backout
Scratch comment #56. It's wrong calling convention. The function pointer prototypes miss the WINAPI calling convention and then mess up the stack after the call to gOriginalTlsFree(aTlsIndex). The interceptor of TlsAlloc() doesn't have this crash because it doesn't have parameter. After adding WINAPI to the function pointer type aliases, the crash doesn't show up locally.

I'll test on try and reland after the tests are green.
also backed out from m-c
https://hg.mozilla.org/mozilla-central/rev/7012620c3aee
I added the calling convention fix and repushed, but the server still have problems with us patching kernel32!TlsAlloc():
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0a5860e4271dcc0b6b18b3d68e12fd43164fbf57

Another push to dump the instruction bytes on interception failure:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=85236a9fd4999afb2a5fe17821f42595068db040

My current plan is: let's see if the failure in patching TlsAlloc() can be fixed quickly. If not then I'll fix bug 1347130 and only enable on release build.
Filed bug 1348747 for the crash we saw in intercepting TlsAlloc() and TlsFree().
Depends on: 1348747
https://hg.mozilla.org/integration/mozilla-inbound/rev/f760842b14a25c8de500472ead62fcf50ee36c0b
Bug 1320134 - Part 1: Tracking TLS allocations for diagnosing out-of-TLS-slots crashes on Windows. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/051b765ca8f2a21093ebdd9161351af2b4ccd7ab
Bug 1320134 - Part 2: Initialize and shutdown the TLS allocation tracker in the content process. r=froydnj

https://hg.mozilla.org/integration/mozilla-inbound/rev/01125b5142e587fb8d8bd40c8dbaf6aaef71a9f9
Bug 1320134 - Part 3: Annotate the crash report with TLS allocation stacks on running out of TLS slots. r=ted
Crash Signature: [@ xul.dll@0x4d768 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | … → [@ std::sys::imp::thread_local::create ] [@ RtlFindClearBits | RtlFindClearBitsAndSet | TlsAlloc ] [@ xul.dll@0x4d768 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll@0x4d739 | xul.dll…
Attachment #8851452 - Attachment description: 8846571: Part 1 - Tracking TLS allocations for diagnosing out-of-TLS-slots crashes on Windows → Part 1 - Tracking TLS allocations for diagnosing out-of-TLS-slots crashes on Windows
Comment on attachment 8851452 [details] [diff] [review]
Part 1 - Tracking TLS allocations for diagnosing out-of-TLS-slots crashes on Windows

Approval Request Comment
[Feature/Bug causing the regression]: No bug, or the bug resulting in this crash is still unknown.
[User impact if declined]: We won't be able to fix this crash. This is a debug patch for finding the root cause of this crash. We need to ship this patch to our users for catching information about the crash and really fix this crash.
[Is this code covered by automated tests?]: No. This is hard to automate because running out of system resource, like TLS in this bug, make the running process very unstable and the behavior is nondeterministic. The code is tested locally.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: The change is intrusive and risky to some extent because it hooks some Windows system calls. However, most users, this patch is not risky because most users don't run out of TLS, and what this patch does is merely increment or decrement of an internal counter.
[Why is the change risky/not risky?]:
[String changes made/needed]: None.
Attachment #8851452 - Flags: approval-mozilla-aurora?
Comment on attachment 8846580 [details] [diff] [review]
Part 2 - Initialize/shutdown the TLS allocation tracker in the content process.

Approval Request Comment
[Feature/Bug causing the regression]: No bug, or the bug resulting in this crash is still unknown.
[User impact if declined]: We won't be able to fix this crash. This is a debug patch for finding the root cause of this crash. We need to ship this patch to our users for catching information about the crash and really fix this crash.
[Is this code covered by automated tests?]: No. This is hard to automate because running out of system resource, like TLS in this bug, make the running process very unstable and the behavior is nondeterministic. The code is tested locally.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: The change is intrusive and risky to some extent because it hooks some Windows system calls. However, most users, this patch is not risky because most users don't run out of TLS, and what this patch does is merely increment or decrement of an internal counter.
[Why is the change risky/not risky?]:
[String changes made/needed]: None.
Attachment #8846580 - Flags: approval-mozilla-aurora?
Comment on attachment 8851453 [details] [diff] [review]
Part 3 - Annotate the crash report with TLS allocation stacks on running out of TLS slots.

Approval Request Comment
[Feature/Bug causing the regression]: No bug, or the bug resulting in this crash is still unknown.
[User impact if declined]: We won't be able to fix this crash. This is a debug patch for finding the root cause of this crash. We need to ship this patch to our users for catching information about the crash and really fix this crash.
[Is this code covered by automated tests?]: No. This is hard to automate because running out of system resource, like TLS in this bug, make the running process very unstable and the behavior is nondeterministic. The code is tested locally.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: The change is intrusive and risky to some extent because it hooks some Windows system calls. However, most users, this patch is not risky because most users don't run out of TLS, and what this patch does is merely increment or decrement of an internal counter.
[Why is the change risky/not risky?]:
[String changes made/needed]: None.
Attachment #8851453 - Flags: approval-mozilla-aurora?
Comment on attachment 8846580 [details] [diff] [review]
Part 2 - Initialize/shutdown the TLS allocation tracker in the content process.

This is a long-standing bug. We may use the debug patch to gather more information. Let's take it in Aurora54. Aurora54+. If anything goes wrong, we can backout the patch.
Attachment #8846580 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8851452 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8851453 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Mass wontfix for bugs affecting firefox 52.
Hi Jesup, would you consider uplifting this fix to 53? I am leaning towards wontfix'ing this for 53. Please let me know if you disagree.
Flags: needinfo?(rjesup)
I agree with wontfix for 53. -- and this patchset is diagnostic anyways.
Flags: needinfo?(rjesup)
Cervantes, now that the diagnostics are in, can you look at the crashes to see if we can get any farther with this?
Flags: needinfo?(cyu)
We've seen crashes with the diagnostics, but they doesn't come with the annotation we expect. I seems DLL patching is not working on 32bit Windows 7 builds. I am looking into it and will add a followup fix for that.
Flags: needinfo?(cyu)
This patch fail to gather the information we need because of bug 1348747, but unfortunately, all crashes are from 32 bit firefox running on 32 or 64 bit Windows 7. No crashes seen on Window 8/8.1/10. We still need bug 1348747 to gather the information we need.
Looks like the work in bug 1348747 landed for 55. Does that help ?  Did the signatures shift in some way?
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #80)
> Looks like the work in bug 1348747 landed for 55. Does that help ?  Did the
> signatures shift in some way?

I am still waiting for more results to tell. All Windows 7 crash reports are still on 54. There are 2 non-Windows 7 reports, but unluckily both are on 53. The signatures are still the same.
Flags: needinfo?(cyu)
From socorro, this crash only happens before 55. I think it's fixed on 55, very likely in the rust part. The debug patch is no longer necessary. I'll work on removing the patches.
Attachment #8898729 - Flags: review?(nfroyd)
Attachment #8898729 - Flags: review?(nfroyd) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/a8bf3954befdd08ed7e60cd863d755b244f19333
Bug 1320134 - Part 4: Remove the debug patches for running out of TLS slots on Windows. r=froydnj
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/a8bf3954befd
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.