Closed Bug 1160148 Opened 10 years ago Closed 10 years ago

startup crash in log via mozilla::Telemetry::Accumulate

Categories

(Toolkit :: Telemetry, defect)

Unspecified
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox39 + fixed
firefox40 + fixed
firefox41 --- fixed

People

(Reporter: tracy, Assigned: away)

References

Details

(Keywords: crash, topcrash-thunderbird, topcrash-win, Whiteboard: [tbird topcrash][startupcrash])

Crash Data

Attachments

(4 files)

Startup crasher is number 3 on the 3 day top crash list. First affected Nightly build was 2015042703 on Windows 7 amd64 only. This bug was filed from the Socorro interface and is report bp-2f8ef536-1f91-4c00-9413-6a3a92150427. ============================================================= Frame Module Signature Source 0 msvcr120.dll log f:\dd\vctools\crt\fpw32\tran\amd64\log.asm:429 1 mozglue.dll moz_xmalloc memory/mozalloc/mozalloc.cpp 2 xul.dll base::Histogram::InitializeBucketRange() ipc/chromium/src/base/histogram.cc 3 xul.dll base::Histogram::FactoryGet(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int, unsigned __int64, base::Histogram::Flags) ipc/chromium/src/base/histogram.cc 4 xul.dll `anonymous namespace'::HistogramGet(char const*, char const*, unsigned int, unsigned int, unsigned int, unsigned int, bool, base::Histogram**) toolkit/components/telemetry/Telemetry.cpp 5 xul.dll `anonymous namespace'::GetHistogramByEnumId(mozilla::Telemetry::ID, base::Histogram**) toolkit/components/telemetry/Telemetry.cpp 6 xul.dll mozilla::Telemetry::Accumulate(mozilla::Telemetry::ID, unsigned int) toolkit/components/telemetry/Telemetry.cpp 7 xul.dll nsSocketTransportService::DoPollIteration(bool, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>*) netwerk/base/nsSocketTransportService2.cpp 8 xul.dll nsSocketTransportService::Run() netwerk/base/nsSocketTransportService2.cpp 9 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp 10 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp 11 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp 12 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc 13 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc 14 xul.dll nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp 15 nss3.dll PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c 16 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c 17 msvcr120.dll _callthreadstartex f:\dd\vctools\crt\crtw32\startup\threadex.c:376 18 msvcr120.dll _threadstartex f:\dd\vctools\crt\crtw32\startup\threadex.c:354 19 kernel32.dll BaseThreadInitThunk 20 ntdll.dll RtlUserThreadStart 21 kernel32.dll BasepReportFault 22 kernel32.dll BasepReportFault
Summary: crash in log → crash in log via mozilla::Telemetry::Accumulate
Summary: crash in log via mozilla::Telemetry::Accumulate → startup crash in log via mozilla::Telemetry::Accumulate
Whiteboard: [tbird topcrash][startupcrash]
The crash happens when the CPU does not support the "vpsrlq" instruction. Normally an alternate codepath would be used for these processors, but there is a bug in MSVCR120: https://connect.microsoft.com/VisualStudio/feedback/details/987093/x64-log-function-uses-vpsrlq-avx-instruction-without-regard-to-operating-system-so-it-crashes-on-vista-x64 It seems the only workaround is to disable these fast-path computations. That could affect perf, so we should try to disable only where necessary.
Assignee: nobody → dmajor
(It's ok to call cpuid7 when it's not supported, as long as you don't use the result.)
Attachment #8601767 - Flags: review?(nfroyd)
Attachment #8601767 - Flags: review?(nfroyd) → review+
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
There haven't been any reports of this bug with builds since the fix here was landed.
Status: RESOLVED → VERIFIED
I think I missed the nightly by a few hours so really it's tomorrow's stats that will be interesting.
Er, what I mean is that there hasn't yet been a nightly with this fix out in the wild.
ok, I'll circle back to reconfirm crash stats on Monday.
bp-4d516b25-457a-463a-a848-d83482150508 The fix isn't working, and I know why. My patch put the code in firefox.exe, which static-links the CRT. We'll also need to disable FMA3 in the DLL version of the CRT.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
no crashes reported of this signature with builds from yesterday, 20150510.
Status: RESOLVED → VERIFIED
(Ah sorry Tracy, I was hoping to save you the trouble, but we mid-aired.) I still see this happening on 20150510: bp-2a095a65-1f29-4966-aee3-622652150511 I notice that all the reports are Win7 RTM, build 7600. It seems that Windows did not gain support for these instructions until Win7 SP1: http://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Operating_system_support
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
dmajor, how is it you're seeing reports, but I don't in the signature summary > table?
ok, lesson learned: don't trust the table. I see the crashes with 20150501030205 in the reports list.
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #15) > dmajor, how is it you're seeing reports, but I don't in the signature > summary > table? Can you tell me exactly what tab on what URL you are looking at? The "Table" tab is known to be buggy.
Flagging this one for another review since I'm doing something slightly yucky. I just realized that XRE_main doesn't run in content processes. Instead of adding a third instance of this code, I decided to move it to DllBlocklist_Initialize. It doesn't fit the blocklist theme, but it's run by chrome and content processes alike, and it has the added benefit of being the very first line of code we run after loading MSVCR120.dll.
Attachment #8604253 - Flags: review?(nfroyd)
Doh, trailing space. Already fixed locally.
Comment on attachment 8604253 [details] [diff] [review] Also disable on Win7RTM, and move to mozglue Review of attachment 8604253 [details] [diff] [review]: ----------------------------------------------------------------- So, just to make sure I understand what's going on here: - We initially put the code only in firefox.exe (nsWindowsWMain.cpp); - That actually wasn't good enough, so we had to duplicate the code into nsAppRunner.cpp so libxul's calls into the CRT would be covered; - (I admit to not really understanding why we have different CRTs for firefox.exe and xul.dll, but I trust there's good reasons for it) - But that *still* doesn't cover content processes, because they start with another call in libxul (and they load a different CRT?); - So rather than duplicate the code again for the content process's entry point, put it someplace where it's more-or-less covering firefox.exe, libxul, and plugin-container.exe? Or did it not really matter that we had a call in nsWindowsWMain.cpp once we had the call in nsAppRunner.cpp? So we could really get away with removing the bits in nsWindowsWMain.cpp and pasting them into the appropriate plugin-container.exe entry point? I think the patch is reasonable, but I'd appreciate some short explanation that covers the above questions, if you could.
Attachment #8604253 - Flags: review?(nfroyd) → review+
So, the first problem is that there are two copies of the CRT in the process. One is static-linked into firefox.exe (unfortunate fallout from the XPSP2 hack) and the other is a dynamic-linked instance of msvcr120.dll shared by the other binaries. If code in firefox.exe calls |log|, the use of FMA3 will be controlled by a flag located in firefox.exe memory. If code anywhere else calls |log|, FMA3 will be gated by a flag in msvcr120.dll's globals. So to be on the safe side, we need to disable both. The other problem is that the particular place where I tried to disable msvcr120.dll's copy of the flag, was in a codepath that isn't executed by child processes. The blocklist code is executed on both types of processes, so it's a better place to disable the msvcr120 flag.
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
Finally! The crashes are gone from nightly. In the meantime version 40 has moved to aurora, but there haven't been any builds released yet. Hopefully I can get this uplifted before the first build goes out. Despite a lack of reports on beta, I'd like to uplift there too, because I don't see why the issue couldn't happen there as well.
[Tracking Requested - why for this release]: Given this is a startup crash, requesting tracking. Ideally, we should get this into 40 today so it will be in tomorrow when we want to turn on updates on Dev Edition.
Attached patch patch for auroraSplinter Review
Attachment #8605870 - Flags: review+
Comment on attachment 8605870 [details] [diff] [review] patch for aurora Approval Request Comment [Feature/regressing bug #]: Microsoft C library [User impact if declined]: Startup crashes on win64 [Describe test coverage new/current, TreeHerder]: confirmed by crash-stats [Risks and why]: Low risk, and win64 only [String/UUID change made/needed]: none
Attachment #8605870 - Flags: approval-mozilla-aurora?
Setting 39 status to "presumed affected" despite a lack of reports, since the tools key off this flag.
Attached patch patch for betaSplinter Review
Attachment #8605880 - Flags: review+
Comment on attachment 8605880 [details] [diff] [review] patch for beta Approval Request Comment [Feature/regressing bug #]: Microsoft C library [User impact if declined]: Possible startup crashes on win64 [Describe test coverage new/current, TreeHerder]: confirmed by crash-stats [Risks and why]: Low risk, and win64 only [String/UUID change made/needed]: none
Attachment #8605880 - Flags: approval-mozilla-beta?
Comment on attachment 8605870 [details] [diff] [review] patch for aurora Verified fix. Good work on this one. Aurora+
Attachment #8605870 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Nomming for 39 as per comment 28 this may affect the 39 release as well.
Target Milestone: mozilla40 → mozilla41
Comment on attachment 8605880 [details] [diff] [review] patch for beta Approved for uplift to beta (39). No reported crashes for 39 but based on comment 28, sounds like this needs to be on 39 as well.
Attachment #8605880 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Depends on: 1279769
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: