Closed Bug 980800 Opened 7 years ago Closed 6 years ago
Firefox startup crashes when trying to create a local socket via ctypes [@ pt_SetSocketOption ]
For a while now we have been having problems with our socket connection in Mozmill, which is used to communicate between the Python and extension parts of Jsbridge. Intermittently we see disconnects right after startup when we initiate the server socket via ctypes and NSS:
https://github.com/mozilla/mozmill/blob/master/jsbridge/jsbridge/extension/resource/modules/Sockets.jsm#L98

As noticed today, those disconnects are actually caused by Firefox crashing. So we submitted a report for now:

Crash report: bp-a51d2cc8-2af4-47b1-aff1-e0a072140307
Crash Reason: EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE
Crash Address: 0xb500000

Here are the first 10 stack frames:

0  libnss3.dylib     pt_SetSocketOption
1  libmozglue.dylib  arena_malloc  memory/mozjemalloc/jemalloc.c
2  XUL  ffi_call  js/src/ctypes/libffi/src/x86/ffi64.c
3  XUL  js::ctypes::FunctionType::Call  js/src/ctypes/CTypes.cpp
4  XUL  js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct)  js/src/jscntxtinlines.h
5  XUL  js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>)  js/src/vm/Interpreter.cpp
6  XUL  js::DirectProxyHandler::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&)  js/src/jsproxy.cpp
7  XUL  js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&)  js/src/jswrapper.cpp
8  XUL  js::Proxy::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&)  js/src/jsproxy.cpp
9  XUL  proxy_Call  js/src/jsproxy.cpp
10 XUL  js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct)  js/src/jscntxtinlines.h

We haven't found a reproducible pattern yet, but we will continue to investigate. For now I will mark this bug as a security-sensitive one, given that we access some weird memory location.

Andreea, can you please check which platforms are affected here? Any startup disconnect for jsbridge should be this problem, so mainly all failures we see at the moment.
Whiteboard: [mozmill] → [mozmill][qa-automation-blocked]
Crash Signature: [@ pt_SetSocketOption ]
I checked several machines from the last 3 weeks which failed with jsbridge and submitted crash reports, but only one more crash on the same machine (mm-osx-109-4, from February 14th) had the same signature. So until now only OS X is affected, on 10.9.2.
Given the callstack, the reason is almost certainly that the JS function you passed to C++ has been GCed. See the second big warning here: https://developer.mozilla.org/en-US/docs/Mozilla/js-ctypes/js-ctypes_reference/Callbacks
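As an aside, the callback-lifetime pitfall from that MDN warning can be sketched with Python's ctypes, which behaves analogously to js-ctypes here. This is only an illustration under assumed conditions — libc's qsort stands in for whichever NSPR function would receive the callback:

```python
import ctypes
import ctypes.util

# Load the C library; on Linux CDLL(None) would also expose libc symbols.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# A qsort comparator: int (*)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

def py_cmp(a, b):
    return (a[0] > b[0]) - (a[0] < b[0])

# The crucial part: keep a Python-level reference to the callback wrapper.
# If the only reference were the temporary passed into the C call and the
# C side retained the function pointer, GC could free the wrapper and a
# later invocation would jump into freed memory -- the class of crash
# suspected in the comment above.
cmp_ref = CMPFUNC(py_cmp)

arr = (ctypes.c_int * 5)(5, 1, 4, 2, 3)
libc.qsort(arr, len(arr), ctypes.sizeof(ctypes.c_int), cmp_ref)
print(list(arr))  # [1, 2, 3, 4, 5]
```

qsort only uses the comparator for the duration of the call, so it is safe either way; the rule matters for C APIs that store the pointer for later use.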
Sounds like this test code is using ctypes incorrectly, which isn't really a security issue. This should get moved to whatever component is appropriate for this test code, and can be opened up.
Assignee: nobody → nobody
Component: Libraries → Mozmill
Product: NSS → Testing
QA Contact: hskupin
Version: 3.15.6 → unspecified
Opening this bug up until I have more time to work on it.
Bobby, so we are defining an NSS object which contains a helper method to call 'PR_SetSocketOption()':
https://github.com/mozilla/mozmill/blob/master/jsbridge/jsbridge/extension/resource/modules/NSS.jsm#L150

When we call that method I don't see that any callback is in use:
https://github.com/mozilla/mozmill/blob/master/jsbridge/jsbridge/extension/resource/modules/Sockets.jsm#L116

So I'm not sure I understand the problem. Could it possibly mean that fd is invalid?
Oh yeah, my bad - I read the callstack backwards :P Yeah, this looks like it should all work. Someone needs to fire up a debugger and figure out what's going on
Bobby, so should we move this back into the NSS component? Andreea, have we ever seen this crash again? Hopefully with Mozmill 2.0.6 we will see the crash exposed on OS X now. Let's keep an eye on it. Would be good to get some reproducible steps.
(In reply to Henrik Skupin (:whimboo) from comment #7) > Bobby, so we should move this back into the NSS component? Hard to say. It's probably not js-ctypes, but it could be either NSS or Mozmill (using NSS incorrectly). > Andreea, have we ever seen this crash again? Hopefully with Mozmill 2.0.6 we > will see the crash exposed on OS X now. Lets have an eye on it. Would be > good to get some reproducible steps. Yeah, STR would be good here.
Firefox crashed today on our mm-osx-107-4 with the exact same signature (Aurora de). https://crash-stats.mozilla.com/report/index/b3d18e86-97d9-45de-9166-4f6812140520 It happened with a functional testrun when running this test: tests/functional/restartTests/testAddons_changeTheme/test2.js
Failed twice on mm-osx-107-4, after tests/functional/restartTests/testAddons_changeTheme/test1.js
https://crash-stats.mozilla.com/report/index/0d1f5ab5-834e-44c8-97ba-711902140610

I tried to reproduce it by running the changeTheme tests in a loop (x10) and by running a complete testrun on the affected node.
Fairly low volume, which actually doesn't block us from testing with Mozmill.
Whiteboard: [mozmill][qa-automation-blocked] → [mozmill]
Crashed 3 times today, all of them on OS X 10.6 with Firefox 33:
https://crash-stats.mozilla.com/report/index/667ceceb-4310-442b-ab14-cc3112140923
https://crash-stats.mozilla.com/report/index/1d49b6b9-33c9-4d5c-8969-273c62140923
https://crash-stats.mozilla.com/report/index/58c04bce-0172-43a9-a56b-5c9db2140923
With the latest runs on beta, crashed 10 times on the same machine - mm-osx-106-3. The test that fails is: /restartTests/testAddons_uninstallExtension/test3.js | test3.js

I will try to run it in a loop, as well as the testrun, to see if it reproduces.
http://mozmill-release.blargon7.com/#/functional/failure?app=All&branch=All&platform=All&from=2014-02-03&test=%2FrestartTests%2FtestAddons_uninstallExtension%2Ftest3.js&func=test3.js

bp-1e04d5e7-4387-4e99-95c2-3d7f52141001
bp-46f879ce-1a74-4318-80e2-790782141001
bp-b5dd3389-a001-4c72-8ef1-d38a52141001
bp-ee510279-1b61-448e-a411-31f402141001
bp-f39c2f6d-42c3-46df-ad40-8864e2141001
bp-9d18eb0a-1a13-4984-a89b-b4f022141001
bp-963a148b-c459-46d9-842a-d03ab2141001
bp-d6601e36-4ffe-417b-9612-b9e762141001
bp-922267cc-69bd-4d85-89d0-018d82141001
bp-b919ffa1-0fd2-462d-aa7a-f69582141001
We've had a recent surge in crashes, all of them pointing to a fault in the XUL library. It seems Firefox OS is also experiencing lots of crashes with today's build. They received a large merge from m-c with the following pushlog:
http://hg.mozilla.org/integration/b2g-inbound/pushloghtml?fromchange=4355feecf4bd&tochange=f4e8988b3881

Something from that pushlog is likely to be the regressor.
The new app bundle structure for v2 signing landed on m-c yesterday. If we have a much higher crash rate due to this change, we should get this investigated while we still have the chance. So maybe bug 1047584 plays a role here.
(In reply to Andrei Eftimie from comment #15) > Seems FirefoxOS is also experiencing lots of crashes with todays build. They > received a large merge from mc with the following pushlog: Wait. What do you mean with Firefox OS here? And which version on which platforms? Any links? Please give more details.
(In reply to Henrik Skupin (:whimboo) from comment #17) > (In reply to Andrei Eftimie from comment #15) > > Seems FirefoxOS is also experiencing lots of crashes with todays build. They > > received a large merge from mc with the following pushlog: > > Wait. What do you mean with Firefox OS here? And which version on which > platforms? Any links? Please give more details. Found it: bug 1075387 Not sure if it's clear from the report, but they see multiple signature crashes, in different places of their tests, all pointing to the libxul.so (which is what we've seen in the last week).
Libxul contains nearly everything, so pointing at it implies only a minimal relation to this issue. If we see some of those other issues, mark the filed bugs appropriately (cc me) or file new ones. This one is separate.
I couldn't reproduce the crash by running the test about 50 times and the testrun about 5 times on the affected machine. I have put it back online now.
Crashed 9 times on the latest beta today with the same signature, all on the same machine (see the link in comment 14). So it may be something related to the machine's configuration. I'm not sure, given that it always happens on the same test. I'll continue trying to reproduce it.
A good hint with mm-osx-106-3 here; I think it's a good idea to concentrate on it. Btw, does it vary when it crashes, or does it always happen after the exact same test?
Whiteboard: [mozmill] → [mozmill][mm-osx-106-3 only]
We haven't seen this crash for a long time, and it's unlikely that we would fix it in Mozmill proper given that we are transitioning to Marionette now. Closing as wontfix.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
For reference, this is due to mozmill using structure declarations in ctypes that don't actually match the C structures on 64-bit platforms, so PR_SetSocketOption tries to read the new option value from after the end of a heap allocation, which sometimes crashes and otherwise just reads garbage; see bug 1223302 comment #9. If mozmill is sufficiently end-of-life that that's not worth fixing then this could stay closed.
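To illustrate that failure mode, here is a sketch using Python's ctypes as an analogue of js-ctypes. The struct and field names are made up for illustration — this is not NSPR's actual PRSocketOptionData layout — but it shows how a mismatched declaration under-allocates on a 64-bit platform:

```python
import ctypes

# What the C side actually reads: a struct whose second member is
# pointer-sized (hypothetical layout, for illustration only).
class OptionDataC(ctypes.Structure):
    _fields_ = [("option", ctypes.c_int),
                ("value", ctypes.c_void_p)]   # 8 bytes on 64-bit, plus padding

# A mismatched script-side declaration of the "same" struct.
class OptionDataJS(ctypes.Structure):
    _fields_ = [("option", ctypes.c_int),
                ("value", ctypes.c_int)]      # only 4 bytes, no padding

print(ctypes.sizeof(OptionDataC), ctypes.sizeof(OptionDataJS))
# On a typical LP64 ABI this prints "16 8": a buffer allocated from the
# mismatched declaration is 8 bytes short, so the callee reads past the
# end of the heap allocation -- sometimes crashing, otherwise reading
# garbage, exactly as described above.
```

Allocating an OptionDataJS and passing its address where the C function expects an OptionDataC reproduces the out-of-bounds read in miniature.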
Yes, Mozmill reached its EOL a while ago. No further releases will be made. If tests still using Mozmill break due to changes in Firefox, they will need to be reimplemented using Marionette.
Product: Testing → Testing Graveyard