startup crash in log via mozilla::Telemetry::Accumulate

RESOLVED FIXED in Firefox 39

Status

()

--
critical
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: tracy, Assigned: dmajor)

Tracking

({crash, topcrash-thunderbird, topcrash-win})

Trunk
mozilla41
Unspecified
Windows 7
crash, topcrash-thunderbird, topcrash-win
Points:
---

Firefox Tracking Flags

(firefox39+ fixed, firefox40+ fixed, firefox41 fixed)

Details

(Whiteboard: [tbird topcrash][startupcrash], crash signature)

Attachments

(4 attachments)

(Reporter)

Description

4 years ago
Startup crasher is number 3 on the 3 day top crash list.  First affected Nightly build was 2015042703 on Windows 7 amd64 only.

This bug was filed from the Socorro interface and is 
report bp-2f8ef536-1f91-4c00-9413-6a3a92150427.
=============================================================

Frame 	Module 	Signature 	Source
0 	msvcr120.dll 	log 	f:\dd\vctools\crt\fpw32\tran\amd64\log.asm:429
1 	mozglue.dll 	moz_xmalloc 	memory/mozalloc/mozalloc.cpp
2 	xul.dll 	base::Histogram::InitializeBucketRange() 	ipc/chromium/src/base/histogram.cc
3 	xul.dll 	base::Histogram::FactoryGet(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int, unsigned __int64, base::Histogram::Flags) 	ipc/chromium/src/base/histogram.cc
4 	xul.dll 	`anonymous namespace'::HistogramGet(char const*, char const*, unsigned int, unsigned int, unsigned int, unsigned int, bool, base::Histogram**) 	toolkit/components/telemetry/Telemetry.cpp
5 	xul.dll 	`anonymous namespace'::GetHistogramByEnumId(mozilla::Telemetry::ID, base::Histogram**) 	toolkit/components/telemetry/Telemetry.cpp
6 	xul.dll 	mozilla::Telemetry::Accumulate(mozilla::Telemetry::ID, unsigned int) 	toolkit/components/telemetry/Telemetry.cpp
7 	xul.dll 	nsSocketTransportService::DoPollIteration(bool, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>*) 	netwerk/base/nsSocketTransportService2.cpp
8 	xul.dll 	nsSocketTransportService::Run() 	netwerk/base/nsSocketTransportService2.cpp
9 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp
10 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/glue/nsThreadUtils.cpp
11 	xul.dll 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp
12 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc
13 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc
14 	xul.dll 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp
15 	nss3.dll 	PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c
16 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c
17 	msvcr120.dll 	_callthreadstartex 	f:\dd\vctools\crt\crtw32\startup\threadex.c:376
18 	msvcr120.dll 	_threadstartex 	f:\dd\vctools\crt\crtw32\startup\threadex.c:354
19 	kernel32.dll 	BaseThreadInitThunk 	
20 	ntdll.dll 	RtlUserThreadStart 	
21 	kernel32.dll 	BasepReportFault 	
22 	kernel32.dll 	BasepReportFault

Updated

4 years ago
Summary: crash in log → crash in log via mozilla::Telemetry::Accumulate

Updated

4 years ago
Keywords: topcrash-thunderbird
Summary: crash in log via mozilla::Telemetry::Accumulate → startup crash in log via mozilla::Telemetry::Accumulate
Whiteboard: [tbird topcrash][startupcrash]
(Assignee)

Comment 1

4 years ago
The crash happens when the CPU does not support the "vpsrlq" instruction. Normally an alternate codepath would be used for these processors, but there is a bug in MSVCR120: https://connect.microsoft.com/VisualStudio/feedback/details/987093/x64-log-function-uses-vpsrlq-avx-instruction-without-regard-to-operating-system-so-it-crashes-on-vista-x64

It seems the only workaround is to disable these fast-path computations. That could affect perf, so we should try to disable only where necessary.
Assignee: nobody → dmajor
(Assignee)

Comment 3

4 years ago
Created attachment 8601767 [details] [diff] [review]
Microsoft's suggested workaround

(It's ok to call cpuid7 when it's not supported, as long as you don't use the result.)
Attachment #8601767 - Flags: review?(nfroyd)
Attachment #8601767 - Flags: review?(nfroyd) → review+
(Assignee)

Updated

4 years ago
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/3bf223d7eedd
Status: NEW → RESOLVED
Last Resolved: 4 years ago
status-firefox40: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
(Reporter)

Comment 6

4 years ago
There haven't been any reports of this bug with builds since the fix here was landed.
Status: RESOLVED → VERIFIED
status-firefox40: fixed → verified
(Assignee)

Comment 7

4 years ago
I think I missed the nightly by a few hours so really it's tomorrow's stats that will be interesting.
(Assignee)

Comment 8

4 years ago
Er, what I mean is that there hasn't yet been a nightly with this fix out in the wild.
(Reporter)

Comment 9

4 years ago
ok, I'll circle back to reconfirm crash stats on Monday.
(Assignee)

Comment 10

4 years ago
bp-4d516b25-457a-463a-a848-d83482150508

The fix isn't working, and I know why. My patch put the code in firefox.exe, which static-links the CRT. We'll also need to disable FMA3 in the DLL version of the CRT.
Status: VERIFIED → REOPENED
status-firefox40: verified → affected
Resolution: FIXED → ---
https://hg.mozilla.org/mozilla-central/rev/f7a79386ab63
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago4 years ago
status-firefox40: affected → fixed
Resolution: --- → FIXED
(Reporter)

Comment 13

4 years ago
no crashes reported of this signature with builds from yesterday, 20150510.
Status: RESOLVED → VERIFIED
status-firefox40: fixed → verified
(Assignee)

Comment 14

4 years ago
(Ah sorry Tracy, I was hoping to save you the trouble, but we mid-aired.)

I still see this happening on 20150510: bp-2a095a65-1f29-4966-aee3-622652150511

I notice that all the reports are Win7 RTM, build 7600. It seems that Windows did not gain support for these instructions until Win7 SP1: http://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Operating_system_support
Status: VERIFIED → REOPENED
status-firefox40: verified → affected
Resolution: FIXED → ---
(Reporter)

Comment 15

4 years ago
dmajor, how is it you're seeing reports, but I don't in the signature summary > table?
(Reporter)

Comment 16

4 years ago
ok, lesson learned: don't trust the table.  I see the crashes with 20150501030205 in the reports list.

Comment 17

4 years ago
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #15)
> dmajor, how is it you're seeing reports, but I don't in the signature
> summary > table?

Can you tell me exactly what tab on what URL you are looking at? The "Table" tab is known to be buggy.
(Assignee)

Comment 18

4 years ago
Created attachment 8604253 [details] [diff] [review]
Also disable on Win7RTM, and move to mozglue

Flagging this one for another review since I'm doing something slightly yucky.

I just realized that XRE_main doesn't run in content processes. Instead of adding a third instance of this code, I decided to move it to DllBlocklist_Initialize. It doesn't fit the blocklist theme, but it's run by chrome and content processes alike, and it has the added benefit of being the very first line of code we run after loading MSVCR120.dll.
Attachment #8604253 - Flags: review?(nfroyd)
(Assignee)

Comment 19

4 years ago
Doh, trailing space. Already fixed locally.
Comment on attachment 8604253 [details] [diff] [review]
Also disable on Win7RTM, and move to mozglue

Review of attachment 8604253 [details] [diff] [review]:
-----------------------------------------------------------------

So, just to make sure I understand what's going on here:

- We initially put the code only in firefox.exe (nsWindowsWMain.cpp);
- That actually wasn't good enough, so we had to duplicate the code into nsAppRunner.cpp so libxul's calls into the CRT would be covered;
- (I admit to not really understanding why we have different CRTs for firefox.exe and xul.dll, but I trust there's good reasons for it)
- But that *still* doesn't cover content processes, because they start with another call in libxul (and they load a different CRT?);
- So rather than duplicate the code again for the content process's entry point, put it someplace where it's more-or-less covering firefox.exe, libxul, and plugin-container.exe?

Or did it not really matter that we had a call in nsWindowsWMain.cpp once we had the call in nsAppRunner.cpp?  So we could really get away with removing the bits in nsWindowsWMain.cpp and pasting them into the appropriate plugin-container.exe entry point?

I think the patch is reasonable, but I'd appreciate some short explanation that covers the above questions, if you could.
Attachment #8604253 - Flags: review?(nfroyd) → review+
(Assignee)

Comment 21

4 years ago
So, the first problem is that there are two copies of the CRT in the process. One is static-linked into firefox.exe (unfortunate fallout from the XPSP2 hack) and the other is a dynamic-linked instance of msvcr120.dll shared by the other binaries. If code in firefox.exe calls |log|, the use of FMA3 will be controlled by a flag located in firefox.exe memory. If code anywhere else calls |log|, FMA3 will be gated by a flag in msvcr120.dll's globals. So to be on the safe side, we need to disable both.

The other problem is that the particular place where I tried to disable msvcr120.dll's copy of the flag, was in a codepath that isn't executed by child processes. The blocklist code is executed on both types of processes, so it's a better place to disable the msvcr120 flag.
https://hg.mozilla.org/mozilla-central/rev/730d9ce9776b
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago4 years ago
status-firefox41: --- → fixed
Resolution: --- → FIXED
(Assignee)

Comment 24

4 years ago
Finally! The crashes are gone from nightly. In the meantime version 40 has moved to aurora, but there haven't been any builds released yet. Hopefully I can get this uplifted before the first build goes out.

Despite a lack of reports on beta, I'd like to uplift there too, because I don't see why the issue couldn't happen there as well.

Comment 25

4 years ago
[Tracking Requested - why for this release]:
Given this is a startup crash, requesting tracking. Ideally, we should get this into 40 today so it will be in tomorrow when we want to turn on updates on Dev Edition.
tracking-firefox40: --- → ?
(Assignee)

Comment 26

4 years ago
Created attachment 8605870 [details] [diff] [review]
patch for aurora
Attachment #8605870 - Flags: review+
(Assignee)

Comment 27

4 years ago
Comment on attachment 8605870 [details] [diff] [review]
patch for aurora

Approval Request Comment
[Feature/regressing bug #]: Microsoft C library
[User impact if declined]: Startup crashes on win64
[Describe test coverage new/current, TreeHerder]: confirmed by crash-stats
[Risks and why]: Low risk, and win64 only
[String/UUID change made/needed]: none
Attachment #8605870 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 28

4 years ago
Setting 39 status to "presumed affected" despite a lack of reports, since the tools key off this flag.
status-firefox39: --- → affected
(Assignee)

Comment 29

4 years ago
Created attachment 8605880 [details] [diff] [review]
patch for beta
Attachment #8605880 - Flags: review+
(Assignee)

Comment 30

4 years ago
Comment on attachment 8605880 [details] [diff] [review]
patch for beta

Approval Request Comment
[Feature/regressing bug #]: Microsoft C library
[User impact if declined]: Possible startup crashes on win64
[Describe test coverage new/current, TreeHerder]: confirmed by crash-stats
[Risks and why]: Low risk, and win64 only
[String/UUID change made/needed]: none
Attachment #8605880 - Flags: approval-mozilla-beta?
Comment on attachment 8605870 [details] [diff] [review]
patch for aurora

Verified fix. Good work on this one. Aurora+
Attachment #8605870 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Nomming for 39 as per comment 28 this may affect the 39 release as well.
tracking-firefox39: --- → ?
tracking-firefox40: ? → +
Target Milestone: mozilla40 → mozilla41
Comment on attachment 8605880 [details] [diff] [review]
patch for beta

Approved for uplift to beta (39). No reported crashes for 39 but based on comment 28, sounds like this needs to be on 39 as well.
Attachment #8605880 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
tracking-firefox39: ? → +
Depends on: 1279769
You need to log in before you can comment on or make changes to this bug.