Bug 771083 - Shutdown telemetry causes "Assertion failure: r == count, at xpcom/build/mozPoisonWriteMac.cpp:194"
Status: RESOLVED FIXED
Keywords: assertion, regression
Product: Core
Classification: Components
Component: XPCOM
Version: Trunk
Platform: x86 Mac OS X
Importance: -- normal
Target Milestone: mozilla16
Assigned To: Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
Mentors:
Duplicates: 771920
Depends on:
Blocks: 732173 753461

Reported: 2012-07-05 03:14 PDT by Panos Astithas [:past]
Modified: 2012-07-09 06:50 PDT
CC List: 5 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
White list the shutdown time fd (1.51 KB, patch)
2012-07-05 14:11 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
benjamin: review+

Description Panos Astithas [:past] 2012-07-05 03:14:47 PDT
My latest fx-team tip build (changeset ea890a6eed56) crashes on shutdown with this message. Stack:

Thread 0 Crashed:: Main Thread  Dispatch queue: com.apple.main-thread
0   XUL                             0x0000000102ebc076 (anonymous namespace)::AbortOnBadWrite(int, void const*, unsigned long) + 774 (mozPoisonWriteMac.cpp:183)
1   XUL                             0x0000000102ebc292 (anonymous namespace)::wrap_write(int, void const*, unsigned long) + 34 (mozPoisonWriteMac.cpp:40)
2   libnspr4.dylib                  0x0000000100080da2 pt_Write + 98 (ptio.c:1315)
3   libnspr4.dylib                  0x0000000100069f05 PR_vfprintf + 69 (prstdio.c:32)
4   libnspr4.dylib                  0x0000000100069fcb PR_fprintf + 155 (prstdio.c:22)
5   XUL                             0x00000001027b744b __tcf_0 + 315 (nsAppStartup.cpp:308)
6   libsystem_c.dylib               0x00007fff86f9f7c8 __cxa_finalize + 274
7   libsystem_c.dylib               0x00007fff86f9f652 exit + 18
8   firefox-bin                     0x000000010000111b start + 59

Full info at:

http://past.pastebin.mozilla.org/1691309
Comment 1 Jesse Ruderman 2012-07-05 12:31:43 PDT
I hit this on mozilla-central as well, if I set
  user_pref("toolkit.telemetry.enabled", true);

The stack trace in comment 0 is somehow missing a few function names, despite having correct line numbers.  Here's a better stack:

> Thread 0 Crashed:: Main Thread  Dispatch queue: com.apple.main-thread
> 0   XUL                           	0x00000001031454f8 (anonymous namespace)::AbortOnBadWrite(int, void const*, unsigned long) + 664 (mozPoisonWriteMac.cpp:194)
> 1   XUL                           	0x00000001031457d3 _ZN12_GLOBAL__N_115wrap_write_tempILZNS_10write_dataEEEEliPKvm + 35 (mozPoisonWriteMac.cpp:39)
> 2   XUL                           	0x00000001031457a3 (anonymous namespace)::wrap_write(int, void const*, unsigned long) + 35 (mozPoisonWriteMac.cpp:85)
> 3   libnspr4.dylib                	0x000000010009f4e4 pt_Write + 84 (ptio.c:1315)
> 4   libnspr4.dylib                	0x000000010006e906 PR_Write + 54 (priometh.c:114)
> 5   libnspr4.dylib                	0x0000000100079943 PR_vfprintf + 99 (prstdio.c:70)
> 6   libnspr4.dylib                	0x000000010007989c PR_fprintf + 364 (prstdio.c:19)
> 7   XUL                           	0x000000010294b37c _ZL26RecordShutdownEndTimeStampv + 316 (nsAppStartup.cpp:308)
> 8   XUL                           	0x000000010294bd11 RecordShutdownEndTimeStampHelper::~RecordShutdownEndTimeStampHelper() + 17 (nsAppStartup.cpp:324)
> 9   XUL                           	0x000000010294b535 RecordShutdownEndTimeStampHelper::~RecordShutdownEndTimeStampHelper() + 21 (nsAppStartup.cpp:324)
> 10  libsystem_c.dylib             	0x00007fff87ab37c8 __cxa_finalize + 274
> 11  libsystem_c.dylib             	0x00007fff87ab3652 exit + 18
> 12  firefox-bin                   	0x0000000100000e4b start + 59
Comment 2 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-07-05 13:22:14 PDT
Interesting, I do wonder why this doesn't fail on m-c.
Comment 3 :Ehsan Akhgari (busy, don't ask for review please) 2012-07-05 13:29:37 PDT
(In reply to comment #2)
> Interesting, I do wonder why this doesn't fail on m-c.

I think telemetry is disabled by default in debug builds.
Comment 4 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-07-05 14:11:26 PDT
Created attachment 639465
White list the shutdown time fd

The patch also switches to using good old FILEs as I could not find how to say fileno in mozillian.
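To illustrate the idea of whitelisting a file descriptor, here is a minimal sketch of a write-poisoning check that lets one known fd through. It is not the code in mozPoisonWriteMac.cpp or in the attached patch: the `sketch` namespace and the names `WhitelistFd`, `WhitelistFile`, `PoisonWrites`, and `IsWriteAllowed` are all hypothetical. The only real API used is POSIX `fileno`, which is what makes a plain `FILE*` convenient here.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <stdio.h>  // FILE, fileno (POSIX)

// Hypothetical names for illustration; not Mozilla's identifiers.
namespace sketch {

std::mutex gLock;
std::set<int> gWhitelistedFds;  // fds that may still be written after poisoning
bool gPoisoned = false;

// Mark a raw descriptor as allowed to receive writes after poisoning.
void WhitelistFd(int fd) {
  assert(fd >= 0);
  std::lock_guard<std::mutex> guard(gLock);
  gWhitelistedFds.insert(fd);
}

// Whitelist the descriptor that backs a stdio stream.
void WhitelistFile(FILE* f) {
  WhitelistFd(fileno(f));
}

// From this point on, only whitelisted descriptors may be written.
void PoisonWrites() {
  std::lock_guard<std::mutex> guard(gLock);
  gPoisoned = true;
}

// A write wrapper would call this and abort when it returns false.
bool IsWriteAllowed(int fd) {
  std::lock_guard<std::mutex> guard(gLock);
  if (!gPoisoned) {
    return true;
  }
  return gWhitelistedFds.count(fd) != 0;
}

}  // namespace sketch
```

In this sketch, the shutdown-time recorder would call `WhitelistFile` on its `FILE*` before writing, so that a wrapped `write` consulting `IsWriteAllowed` does not abort on that one descriptor.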

There are some alternatives for how to fix this. We could disable it on debug builds, as the collected time is not relevant for debug builds, but we should also consider what release builds will look like in the end.

On a release build, we will either _exit(0) early or poison writes, just like in a debug build. In the _exit(0) case the correct answer is clear: we should record the time just before the _exit(0) call and don't have to care about poisoning. In the case where we do a full shutdown, what should we write to disk: the time at which _exit(0) would have been called, or the actual time it took us to shut down? I guess the second option is probably better, in which case this code has to handle poisoning anyway.
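The two cases above differ only in when the end timestamp is taken, which a small sketch can make concrete. Everything here is an assumption for illustration: the names `ElapsedMs` and `RecordShutdownDuration`, the use of `std::chrono::steady_clock`, and the output format (one millisecond count per line) are not taken from nsAppStartup.cpp or the patch.

```cpp
#include <cassert>
#include <chrono>
#include <stdio.h>

// Hypothetical helpers for illustration only.
namespace sketch {

using Clock = std::chrono::steady_clock;

// Whole milliseconds between the start of shutdown and the chosen end point.
long long ElapsedMs(Clock::time_point start, Clock::time_point end) {
  return std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
      .count();
}

// Write the elapsed time to an already-open stream and flush it.
// Returns true only if both the write and the flush succeeded.
bool RecordShutdownDuration(FILE* out, Clock::time_point start,
                            Clock::time_point end) {
  assert(out != nullptr);
  if (fprintf(out, "%lld\n", ElapsedMs(start, end)) < 0) {
    return false;
  }
  return fflush(out) == 0;
}

}  // namespace sketch
```

Under these assumptions, the early-exit path would call `RecordShutdownDuration` with `Clock::now()` immediately before `_exit(0)`, while the full-shutdown path would call it from the last static destructor, after writes have been poisoned, which is why the stream's descriptor would need to be whitelisted in that case.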
Comment 5 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-07-06 17:45:27 PDT
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=17305771f594
Comment 6 Ryan VanderMeulen [:RyanVM] 2012-07-07 12:00:34 PDT
https://hg.mozilla.org/mozilla-central/rev/17305771f594
Comment 7 Boris Zbarsky [:bz] 2012-07-08 09:36:36 PDT
*** Bug 771920 has been marked as a duplicate of this bug. ***
Comment 8 Nathan Froyd [:froydnj] 2012-07-08 17:04:08 PDT
(In reply to Ehsan Akhgari [:ehsan] from comment #3)
> (In reply to comment #2)
> > Interesting, I do wonder why this doesn't fail on m-c.
> 
> I think telemetry is disabled by default in debug builds.

JFTR, telemetry is disabled by default in non-official builds, period.
Comment 9 Boris Zbarsky [:bz] 2012-07-08 21:48:48 PDT
Uh... my builds that were hitting this bug are very much non-official last I checked!
Comment 10 Nathan Froyd [:froydnj] 2012-07-09 06:50:39 PDT
(In reply to Boris Zbarsky (:bz) from comment #9)
> Uh... my builds that were hitting this bug are very much non-official last I
> checked!

Hm, maybe we are missing some telemetry-enabled checks somewhere, then...
