Closed Bug 826029 Opened 7 years ago Closed 7 years ago

Assertion in mozPoisonWriteMac due to Mac camera code trying to write a defaults file on exit | Assertion failure: ok, at ../../../xpcom/build/mozPoisonWriteMac.cpp:90

Categories

(Core :: WebRTC: Audio/Video, defect, critical)

Version: 17 Branch
Platform: x86_64 macOS
Type: defect
Priority: Not set
Severity: critical

Tracking


VERIFIED FIXED
mozilla20

People

(Reporter: jesup, Assigned: espindola)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, Whiteboard: [getUserMedia][blocking-gum+][qa-])

Attachments

(2 files, 3 obsolete files)

This appears to be writing a device preference file on exit from a global object.  We need to try to identify what it's writing and see if we can force it to write before we call _exit() in opt shutdown.

Per espindola, it may need CACFPreferences::Synchronize


(gdb) bt
#0  (anonymous namespace)::ValidWriteAssert (ok=false) at /Users/ekr/dev/mozilla-inbound/xpcom/build/mozPoisonWriteMac.cpp:90
#1  0x000000010388a431 in (anonymous namespace)::AbortOnBadWrite (fd=4, wbuf=0x100384c00, count=889) at /Users/ekr/dev/mozilla-inbound/xpcom/build/mozPoisonWriteMac.cpp:353
#2  0x000000010388b043 in _ZN12_GLOBAL__N_115wrap_write_tempILZNS_10write_dataEEEEliPKvm (fd=4, buf=0x100384c00, count=889) at /Users/ekr/dev/mozilla-inbound/xpcom/build/mozPoisonWriteMac.cpp:213
#3  0x00007fff8a6b1a1b in -[CFXPreferencesPropertyListSource writePlistToDisk] ()
#4  0x00007fff8a6871a0 in -[CFXPreferencesPropertyListSource synchronize] ()
#5  0x00007fff8a6ae90b in CFPreferencesSynchronize ()
#6  0x00007fff85136809 in CACFPreferences::Synchronize ()
#7  0x00007fff85131978 in MIO::DAL::DALDefaultDevice::UpdateDefaultDevice ()
#8  0x00007fff85131bf5 in MIO::DAL::DALDefaultDevice::UpdateDefaultDevices ()
#9  0x00007fff8512f83b in MIO::DAL::System::ObjectsPublishedAndDied ()
#10 0x00007fff8512a5fc in TundraObjectsPublishedAndDied ()
#11 0x0000000130216630 in dyld_stub_vvpowf ()
#12 0x0000000130216a11 in dyld_stub_vvpowf ()
#13 0x00000001302223e0 in AppleDALScreenInputDeviceNewPlugIn ()
#14 0x00007fff8512fe2d in MIO::DAL::PlugIn::Teardown ()
#15 0x00007fff85132cac in MIO::DAL::PlugInManagement::Teardown ()
#16 0x00007fff8512ed34 in MIO::DAL::System::AtExitHandler ()
#17 0x00007fff85131010 in MIO::DAL::AtExit::AtExitHandler ()
#18 0x00007fff8545f37f in __cxa_finalize ()
#19 0x00007fff8545f28c in exit ()
#20 0x0000000100000e4b in start ()
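For readers unfamiliar with write poisoning, here is a minimal, hypothetical C++ sketch of the idea behind this assertion. All names in it (sketch::PoisonWrites, CheckedWrite, DisableWritePoisoning, gBadWrites) are illustrative assumptions, not Mozilla's API. The real mozPoisonWriteMac.cpp wraps write() itself, as the wrap_write_temp, AbortOnBadWrite and ValidWriteAssert frames show, and asserts instead of counting.

```cpp
#include <atomic>
#include <cassert>
#include <unistd.h>  // write(), pipe(), ssize_t, size_t

// Hypothetical sketch: once shutdown reaches a point where no more disk
// writes are expected, any further write is treated as a bug.
namespace sketch {

std::atomic<bool> gWritesPoisoned{false};
std::atomic<int> gBadWrites{0};

// Called at the point in shutdown after which writes are forbidden.
void PoisonWrites() { gWritesPoisoned.store(true); }

// Lets writes through again (the role the patch in this bug plays).
void DisableWritePoisoning() { gWritesPoisoned.store(false); }

// All writes go through here in this model. Instead of aborting like
// ValidWriteAssert(ok=false), a poisoned write is counted and refused.
ssize_t CheckedWrite(int fd, const void* buf, size_t count) {
  assert(buf != nullptr || count == 0);
  if (gWritesPoisoned.load()) {
    gBadWrites.fetch_add(1);
    return -1;
  }
  return ::write(fd, buf, count);
}

}  // namespace sketch
```

Once PoisonWrites() has run, a late CheckedWrite() (like the preferences flush issued from the exit handler in the backtrace) returns -1 and is counted, while DisableWritePoisoning() lets writes through again.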
This failure happens all the time on the Alder branch, but only for OS X 10.6 debug builds. I have also seen the same failure on inbound today when I tried to enable crashtests by default:

https://tbpl.mozilla.org/php/getParsedLog.php?id=18395381&tree=Alder
Summary: Assertion in mozPoisonWriteMac due to Mac camera code trying to write a defaults file on exit → Assertion in mozPoisonWriteMac due to Mac camera code trying to write a defaults file on exit | Assertion failure: ok, at ../../../xpcom/build/mozPoisonWriteMac.cpp:90
Whiteboard: [getUserMedia], [blocking-gum+] → [getUserMedia][blocking-gum+][automation-blocked]
(In reply to Henrik Skupin (:whimboo) from comment #1)
> This failure happens all the time on the Alder branch for OS X 10.6 debug
> builds only. Same I have also seen on inbound today when i tried to enable
> crashtests per default:
> 
> https://tbpl.mozilla.org/php/getParsedLog.php?id=18395381&tree=Alder

Is it the same backtrace?
As Randell said on IRC, it's very likely the same issue, even though we do not get any stack trace from try server runs.
I think this might fix it, but my build is still going.
Assignee: rjesup → respindola
Status: NEW → ASSIGNED
This is the last point at which we can do it in XUL. I will also try exporting a symbol to see what that looks like.
Attachment #697199 - Attachment is obsolete: true
Adding r? vladan for when we re-enable writes.
Attachment #697218 - Attachment is obsolete: true
Attachment #697247 - Flags: review?(vdjeric)
Comment on attachment 697247 [details] [diff] [review]
Enable writes again at the end of main.

f? to ekr to check whether this works, and r? to ehsan for the XRE_* bits.
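The approach of this patch, re-enabling writes at the end of main so that handlers run later by exit() may write, can be modeled as below. This is a hypothetical sketch: BrowserMain, NoteWrite, RunExitHandlers and gExitHandlers are illustrative stand-ins, and DisableWritePoisoning only plays the role of XRE_DisableWritePoisoning, whose real signature is not shown in this bug.

```cpp
#include <cassert>
#include <functional>
#include <vector>

namespace sketch {

bool gWritesPoisoned = false;
int gBadWrites = 0;

// Stands in for handlers run by exit()/__cxa_finalize after main() returns,
// such as MIO::DAL::AtExit::AtExitHandler in the backtrace.
std::vector<std::function<void()>> gExitHandlers;

// Called by code that writes to disk, e.g. a preferences synchronize.
void NoteWrite() {
  if (gWritesPoisoned) ++gBadWrites;
}

// Hypothetical stand-in for XRE_DisableWritePoisoning().
void DisableWritePoisoning() { gWritesPoisoned = false; }

// Models main(): shutdown poisons writes (placement is an assumption), and
// if reenableWrites is set, writes are allowed again just before returning.
int BrowserMain(bool reenableWrites) {
  gWritesPoisoned = true;
  if (reenableWrites) {
    DisableWritePoisoning();  // last point still inside our own code
  }
  return 0;
}

// Models what happens after main() returns: the exit handlers run.
void RunExitHandlers() {
  for (auto& handler : gExitHandlers) {
    assert(handler);  // every registered handler must be callable
    handler();
  }
}

}  // namespace sketch
```

Registering NoteWrite as an exit handler and calling BrowserMain(false) then RunExitHandlers() counts one bad write; with BrowserMain(true) the count stays at zero.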
Attachment #697247 - Flags: review?(ehsan)
Attachment #697247 - Flags: feedback?(ekr)
Attachment #697247 - Flags: review?(ehsan) → review+
Attachment #697247 - Flags: review?(vdjeric) → feedback+
Don't forget to file a bug to re-enable this later, and make it block shutdown-faster.
Comment on attachment 697247 [details] [diff] [review]
Enable writes again at the end of main.

Just ran three tests in a row and it looks good.
Attachment #697247 - Flags: feedback?(ekr) → feedback+
Backed out on suspicion of causing WINNT xpcshell failures. The push before this one had no builds, but it was a merge from m-c, which was green there, so unless this is in fact clobber-needed bustage, this bug is the cause:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=xpcshell&rev=bfeb3bc3da4e

https://hg.mozilla.org/integration/mozilla-inbound/rev/bf3fd2911e14
I did a try push to
https://tbpl.mozilla.org/?tree=Try&rev=9eaac6b0861d

I am also trying to reproduce this locally. In my first attempt the tests just passed. I am now doing a clean build with the "same" mozconfig that try uses. As far as I can tell the only difference is the msvc install path.
It turns out that the crash on Windows happened because we tried to call a XUL function after XUL was unloaded when running Firefox with -process-updates.

I was curious why this was not exploding everywhere. The reasons are:
* On OS X we never unload XUL.
* On Linux we try to unload it, but fail because of bug 826567.
* I have no idea why the previous patch works on a regular Windows run.

The new patch tries to enable writes again only on OS X.
https://tbpl.mozilla.org/?tree=Try&rev=a823021b0e2d
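The Windows crash and the OS X-only fix can be illustrated with a small, hypothetical model. XulLibrary and ReenableWritesAtEndOfMain are invented names; the loaded flag models whether the XRE_* entry points resolved at startup are still valid, and the isMacOSX parameter stands in for a compile-time platform check so the logic can be exercised anywhere.

```cpp
#include <cassert>

// Hypothetical model of the launcher's view of libxul. Once XUL is
// unloaded, a previously resolved entry point would dangle; calling it is
// what crashed on Windows with -process-updates.
struct XulLibrary {
  bool loaded = false;
  void (*disableWritePoisoning)() = nullptr;

  // Clearing the pointer makes the stale state explicit in this model.
  void Unload() {
    assert(loaded);  // only a loaded library can be unloaded
    loaded = false;
    disableWritePoisoning = nullptr;
  }
};

// Re-enable writes only on OS X, where XUL is never unloaded, and only
// while the entry point is still valid.
bool ReenableWritesAtEndOfMain(const XulLibrary& xul, bool isMacOSX) {
  if (!isMacOSX || !xul.loaded || xul.disableWritePoisoning == nullptr) {
    return false;
  }
  xul.disableWritePoisoning();
  return true;
}
```

On a non-OS X platform the call is skipped entirely, so no code path depends on XUL still being mapped at the end of main.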
Attachment #697247 - Attachment is obsolete: true
Attachment #697758 - Flags: review?(ehsan)
Attachment #697758 - Flags: review?(ehsan) → review+
This patch does not appear to apply on trunk:

[454] hg qpush --move 1
applying a823021b0e2d
patching file browser/app/nsBrowserApp.cpp
Hunk #1 FAILED at 101
Hunk #2 FAILED at 274
2 out of 2 hunks FAILED -- saving rejects to file browser/app/nsBrowserApp.cpp.rej
patching file toolkit/xre/nsAppRunner.cpp
patching file xpcom/build/nsXULAppAPI.h
patch failed to apply
toolkit/xre/nsAppRunner.cpp
xpcom/build/nsXULAppAPI.h
patch failed, rejects left in working dir
errors during apply, please fix and refresh a823021b0e2d
I tried to resolve the conflicts and now I get:

/Users/ekr/dev/mozilla-inbound/obj-x86_64-apple-darwin10.8.0/_virtualenv/bin/python /Users/ekr/dev/mozilla-inbound/config/pythonpath.py -I../../config /Users/ekr/dev/mozilla-inbound/config/expandlibs_exec.py --depend .deps/firefox.pp --target firefox --uselist --  /opt/local/bin/ccache /opt/local/bin/clang++-mp-3.2 -o firefox -Qunused-arguments  -Qunused-arguments -Wall -Wpointer-arith -Woverloaded-virtual -Werror=return-type -Wtype-limits -Wempty-body -Wno-ctor-dtor-privacy -Wno-overlength-strings -Wno-invalid-offsetof -Wno-variadic-macros -Wno-c++0x-extensions -Wno-extended-offsetof -Wno-unknown-warning-option -Wno-return-type-c-linkage -Wno-mismatched-tags -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -std=gnu++0x -pthread -DNO_X11 -pipe  -DDEBUG -D_DEBUG -DTRACING -g -fno-omit-frame-pointer  nsBrowserApp.o   -framework Cocoa -lobjc  -framework ExceptionHandling -Wl,-executable_path,/Users/ekr/dev/mozilla-inbound/obj-x86_64-apple-darwin10.8.0/dist/bin  -L../../dist/bin -L../../dist/lib   /Users/ekr/dev/mozilla-inbound/obj-x86_64-apple-darwin10.8.0/dist/lib/libxpcomglue.a  -L/Users/ekr/dev/mozilla-inbound/obj-x86_64-apple-darwin10.8.0/dist/lib -lmozglue    
Undefined symbols for architecture x86_64:
  "_XRE_DisableWritePoisoning", referenced from:
      _main in nsBrowserApp.o
      kXULFuncs in nsBrowserApp.o
ld: symbol(s) not found for architecture x86_64
I think I may have missed something... Retrying
Assignee: respindola → ekr
I deconflicted the patch, tested it, and it seems to work. Revised patch uploaded above.
Assignee: ekr → respindola
I'm hitting this crash all over the place with the latest tinderbox debug build, even for profiles where gUM is NOT enabled. So I'm not sure the assumption that the camera code triggers the assertion is correct.

http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64-debug/1357229626/

Sadly I don't get a helpful stack: bp-8661f4b3-78b4-4547-9db2-8f91e2130104
Tinderbox debug builds don't have symbols uploaded for crash-stats, so you won't get a useful crash report out of them.
https://hg.mozilla.org/mozilla-central/rev/9e0448803282
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Whiteboard: [getUserMedia][blocking-gum+][automation-blocked] → [getUserMedia][blocking-gum+][automation-blocked][qa-]
The report in comment 26 was made with an early version of espindola's patch, which was later backed out. The final version didn't hit m-c until 30 hours after this push to Alder.
This issue doesn't happen anymore and we are green now on tbpl:
https://tbpl.mozilla.org/?tree=Alder&rev=df98893d9f90

Rafael, what is the follow-up bug to re-enable that feature again? See comment 10.
Status: RESOLVED → VERIFIED
(In reply to Henrik Skupin (:whimboo) from comment #28)
> This issue doesn't happen anymore and we are green now on tbpl:
> https://tbpl.mozilla.org/?tree=Alder&rev=df98893d9f90
> 
> Rafael, what is the follow-up bug to re-enable that feature again? See
> comment 10.

bug 826143.
Flags: in-testsuite-
Whiteboard: [getUserMedia][blocking-gum+][automation-blocked][qa-] → [getUserMedia][blocking-gum+][qa-]