Closed
Bug 1160148
Opened 10 years ago
Closed 10 years ago
startup crash in log via mozilla::Telemetry::Accumulate
Categories
(Toolkit :: Telemetry, defect)
Tracking
()
RESOLVED
FIXED
mozilla41
People
(Reporter: tracy, Assigned: away)
References
Details
(Keywords: crash, topcrash-thunderbird, topcrash-win, Whiteboard: [tbird topcrash][startupcrash])
Crash Data
Attachments
(4 files)
1.49 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
3.61 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
2.65 KB,
patch
|
away
:
review+
lmandel
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
2.74 KB,
patch
|
away
:
review+
lizzard
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
Startup crasher is number 3 on the 3 day top crash list. First affected Nightly build was 2015042703 on Windows 7 amd64 only.
This bug was filed from the Socorro interface and is
report bp-2f8ef536-1f91-4c00-9413-6a3a92150427.
=============================================================
Frame Module Signature Source
0 msvcr120.dll log f:\dd\vctools\crt\fpw32\tran\amd64\log.asm:429
1 mozglue.dll moz_xmalloc memory/mozalloc/mozalloc.cpp
2 xul.dll base::Histogram::InitializeBucketRange() ipc/chromium/src/base/histogram.cc
3 xul.dll base::Histogram::FactoryGet(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int, unsigned __int64, base::Histogram::Flags) ipc/chromium/src/base/histogram.cc
4 xul.dll `anonymous namespace'::HistogramGet(char const*, char const*, unsigned int, unsigned int, unsigned int, unsigned int, bool, base::Histogram**) toolkit/components/telemetry/Telemetry.cpp
5 xul.dll `anonymous namespace'::GetHistogramByEnumId(mozilla::Telemetry::ID, base::Histogram**) toolkit/components/telemetry/Telemetry.cpp
6 xul.dll mozilla::Telemetry::Accumulate(mozilla::Telemetry::ID, unsigned int) toolkit/components/telemetry/Telemetry.cpp
7 xul.dll nsSocketTransportService::DoPollIteration(bool, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>*) netwerk/base/nsSocketTransportService2.cpp
8 xul.dll nsSocketTransportService::Run() netwerk/base/nsSocketTransportService2.cpp
9 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp
10 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp
11 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp
12 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc
13 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc
14 xul.dll nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp
15 nss3.dll PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c
16 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c
17 msvcr120.dll _callthreadstartex f:\dd\vctools\crt\crtw32\startup\threadex.c:376
18 msvcr120.dll _threadstartex f:\dd\vctools\crt\crtw32\startup\threadex.c:354
19 kernel32.dll BaseThreadInitThunk
20 ntdll.dll RtlUserThreadStart
21 kernel32.dll BasepReportFault
22 kernel32.dll BasepReportFault
Updated•10 years ago
|
Summary: crash in log → crash in log via mozilla::Telemetry::Accumulate
Updated•10 years ago
|
Keywords: topcrash-thunderbird
Summary: crash in log via mozilla::Telemetry::Accumulate → startup crash in log via mozilla::Telemetry::Accumulate
Whiteboard: [tbird topcrash][startupcrash]
The crash happens when the CPU does not support the "vpsrlq" instruction. Normally an alternate codepath would be used for these processors, but there is a bug in MSVCR120: https://connect.microsoft.com/VisualStudio/feedback/details/987093/x64-log-function-uses-vpsrlq-avx-instruction-without-regard-to-operating-system-so-it-crashes-on-vista-x64
It seems the only workaround is to disable these fast-path computations. That could affect perf, so we should try to disable only where necessary.
Assignee: nobody → dmajor
We should check: CPUID.(EAX=07H, ECX=0H):EBX.AVX2[bit 5]==1
Per: https://software.intel.com/sites/default/files/article/405250/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family.pdf
(It's ok to call cpuid7 when it's not supported, as long as you don't use the result.)
Attachment #8601767 -
Flags: review?(nfroyd)
Updated•10 years ago
|
Attachment #8601767 -
Flags: review?(nfroyd) → review+
Keywords: checkin-needed
Keywords: checkin-needed
Comment 5•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Reporter | ||
Comment 6•10 years ago
|
||
There haven't been any reports of this bug with builds since the fix here was landed.
Status: RESOLVED → VERIFIED
I think I missed the nightly by a few hours so really it's tomorrow's stats that will be interesting.
Er, what I mean is that there hasn't yet been a nightly with this fix out in the wild.
Reporter | ||
Comment 9•10 years ago
|
||
ok, I'll circle back to reconfirm crash stats on Monday.
Assignee | ||
Comment 10•10 years ago
|
||
bp-4d516b25-457a-463a-a848-d83482150508
The fix isn't working, and I know why. My patch put the code in firefox.exe, which static-links the CRT. We'll also need to disable FMA3 in the DLL version of the CRT.
Comment 11•10 years ago
|
||
Comment 12•10 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 13•10 years ago
|
||
no crashes reported of this signature with builds from yesterday, 20150510.
Status: RESOLVED → VERIFIED
Assignee | ||
Comment 14•10 years ago
|
||
(Ah sorry Tracy, I was hoping to save you the trouble, but we mid-aired.)
I still see this happening on 20150510: bp-2a095a65-1f29-4966-aee3-622652150511
I notice that all the reports are Win7 RTM, build 7600. It seems that Windows did not gain support for these instructions until Win7 SP1: http://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Operating_system_support
Reporter | ||
Comment 15•10 years ago
|
||
dmajor, how is it you're seeing reports, but I don't in the signature summary > table?
Reporter | ||
Comment 16•10 years ago
|
||
ok, lesson learned: don't trust the table. I see the crashes with 20150501030205 in the reports list.
Comment 17•10 years ago
|
||
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #15)
> dmajor, how is it you're seeing reports, but I don't in the signature
> summary > table?
Can you tell me exactly what tab on what URL you are looking at? The "Table" tab is known to be buggy.
Assignee | ||
Comment 18•10 years ago
|
||
Flagging this one for another review since I'm doing something slightly yucky.
I just realized that XRE_main doesn't run in content processes. Instead of adding a third instance of this code, I decided to move it to DllBlocklist_Initialize. It doesn't fit the blocklist theme, but it's run by chrome and content processes alike, and it has the added benefit of being the very first line of code we run after loading MSVCR120.dll.
Attachment #8604253 -
Flags: review?(nfroyd)
Assignee | ||
Comment 19•10 years ago
|
||
Doh, trailing space. Already fixed locally.
Comment 20•10 years ago
|
||
Comment on attachment 8604253 [details] [diff] [review]
Also disable on Win7RTM, and move to mozglue
Review of attachment 8604253 [details] [diff] [review]:
-----------------------------------------------------------------
So, just to make sure I understand what's going on here:
- We initially put the code only in firefox.exe (nsWindowsWMain.cpp);
- That actually wasn't good enough, so we had to duplicate the code into nsAppRunner.cpp so libxul's calls into the CRT would be covered;
- (I admit to not really understanding why we have different CRTs for firefox.exe and xul.dll, but I trust there's good reasons for it)
- But that *still* doesn't cover content processes, because they start with another call in libxul (and they load a different CRT?);
- So rather than duplicate the code again for the content process's entry point, put it someplace where it's more-or-less covering firefox.exe, libxul, and plugin-container.exe?
Or did it not really matter that we had a call in nsWindowsWMain.cpp once we had the call in nsAppRunner.cpp? So we could really get away with removing the bits in nsWindowsWMain.cpp and pasting them into the appropriate plugin-container.exe entry point?
I think the patch is reasonable, but I'd appreciate some short explanation that covers the above questions, if you could.
Attachment #8604253 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 21•10 years ago
|
||
So, the first problem is that there are two copies of the CRT in the process. One is static-linked into firefox.exe (unfortunate fallout from the XPSP2 hack) and the other is a dynamic-linked instance of msvcr120.dll shared by the other binaries. If code in firefox.exe calls |log|, the use of FMA3 will be controlled by a flag located in firefox.exe memory. If code anywhere else calls |log|, FMA3 will be gated by a flag in msvcr120.dll's globals. So to be on the safe side, we need to disable both.
The other problem is that the particular place where I tried to disable msvcr120.dll's copy of the flag, was in a codepath that isn't executed by child processes. The blocklist code is executed on both types of processes, so it's a better place to disable the msvcr120 flag.
Comment 22•10 years ago
|
||
Comment 23•10 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
status-firefox41:
--- → fixed
Resolution: --- → FIXED
Assignee | ||
Comment 24•10 years ago
|
||
Finally! The crashes are gone from nightly. In the meantime version 40 has moved to aurora, but there haven't been any builds released yet. Hopefully I can get this uplifted before the first build goes out.
Despite a lack of reports on beta, I'd like to uplift there too, because I don't see why the issue couldn't happen there as well.
Comment 25•10 years ago
|
||
[Tracking Requested - why for this release]:
Given this is a startup crash, requesting tracking. Ideally, we should get this into 40 today so it will be in tomorrow when we want to turn on updates on Dev Edition.
tracking-firefox40:
--- → ?
Assignee | ||
Comment 26•10 years ago
|
||
Attachment #8605870 -
Flags: review+
Assignee | ||
Comment 27•10 years ago
|
||
Comment on attachment 8605870 [details] [diff] [review]
patch for aurora
Approval Request Comment
[Feature/regressing bug #]: Microsoft C library
[User impact if declined]: Startup crashes on win64
[Describe test coverage new/current, TreeHerder]: confirmed by crash-stats
[Risks and why]: Low risk, and win64 only
[String/UUID change made/needed]: none
Attachment #8605870 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 28•10 years ago
|
||
Setting 39 status to "presumed affected" despite a lack of reports, since the tools key off this flag.
status-firefox39:
--- → affected
Assignee | ||
Comment 29•10 years ago
|
||
Attachment #8605880 -
Flags: review+
Assignee | ||
Comment 30•10 years ago
|
||
Comment on attachment 8605880 [details] [diff] [review]
patch for beta
Approval Request Comment
[Feature/regressing bug #]: Microsoft C library
[User impact if declined]: Possible startup crashes on win64
[Describe test coverage new/current, TreeHerder]: confirmed by crash-stats
[Risks and why]: Low risk, and win64 only
[String/UUID change made/needed]: none
Attachment #8605880 -
Flags: approval-mozilla-beta?
Comment 31•10 years ago
|
||
Comment on attachment 8605870 [details] [diff] [review]
patch for aurora
Verified fix. Good work on this one. Aurora+
Attachment #8605870 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 32•10 years ago
|
||
Nomming for 39 as per comment 28 this may affect the 39 release as well.
tracking-firefox39:
--- → ?
Updated•10 years ago
|
Target Milestone: mozilla40 → mozilla41
Comment 33•10 years ago
|
||
Comment 34•10 years ago
|
||
Comment on attachment 8605880 [details] [diff] [review]
patch for beta
Approved for uplift to beta (39). No reported crashes for 39 but based on comment 28, sounds like this needs to be on 39 as well.
Attachment #8605880 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Updated•10 years ago
|
Comment 35•10 years ago
|
||
You need to log in
before you can comment on or make changes to this bug.
Description
•