Closed Bug 1161717 Opened 10 years ago Closed 10 years ago

[Page Thumbnails] Startup crash for localized Firefox builds [@ mozilla::net::PNeckoChild::SendPHttpChannelConstructor]

Categories

(Core :: Networking: HTTP, defect)

Product: Core
Component: Networking: HTTP
Version: 38 Branch
Type: defect
Priority: Not set
Severity: critical

Tracking

Status: RESOLVED INCOMPLETE

Tracking Flags:

firefox38: tracking ---, status wontfix
firefox39: tracking -, status wontfix
firefox-esr38: tracking -, status wontfix

People

(Reporter: whimboo, Unassigned)

Details

(Keywords: crash, Whiteboard: [mozmill][qa-automation-blocked])

Attachments

(3 files)

log.txt, 10 years ago, Henrik Skupin [:whimboo][⌚️UTC+1], 6.31 KB, text/plain
http_log.txt, 10 years ago, Henrik Skupin [:whimboo][⌚️UTC+1], 320.15 KB, text/plain
http_log.txt.child-1, 10 years ago, Henrik Skupin [:whimboo][⌚️UTC+1], 4.59 KB, text/plain

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Description

•

10 years ago

While investigating the test results from our Mozmill tests from the last 4 weeks I have seen that we have a new top crasher on OS X 10.8. It seems to crash the build each time during startup between two tests. It started on April 24th (20150424064750).

Report: bp-9ff2f47e-5b93-4b16-a5e6-c7a572150505

Crash Reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash Address: 0x0

First ten stack frames:

0 libmozalloc.dylib mozalloc_abort(char const*) memory/mozalloc/mozalloc_abort.cpp
1 XUL Abort xpcom/base/nsDebugImpl.cpp
2 XUL NS_DebugBreak xpcom/base/nsDebugImpl.cpp
3 XUL mozilla::net::PNeckoChild::SendPHttpChannelConstructor(mozilla::net::PHttpChannelChild*, mozilla::dom::PBrowserOrId const&, IPC::SerializedLoadContext const&, mozilla::net::HttpChannelCreationArgs const&) obj-firefox/x86_64/ipc/ipdl/PNeckoChild.cpp
4 XUL mozilla::net::HttpChannelChild::ContinueAsyncOpen() netwerk/protocol/http/HttpChannelChild.cpp
5 XUL mozilla::net::HttpChannelChild::AsyncOpen(nsIStreamListener*, nsISupports*) netwerk/protocol/http/HttpChannelChild.cpp
6 XUL nsScriptLoader::StartLoad(nsScriptLoadRequest*, nsAString_internal const&, bool) dom/base/nsScriptLoader.cpp
7 XUL nsScriptLoader::ProcessScriptElement(nsIScriptElement*) dom/base/nsScriptLoader.cpp
8 XUL nsScriptElement::MaybeProcessScript() dom/base/nsScriptElement.cpp
9 XUL mozilla::dom::HTMLScriptElement::BindToTree(nsIDocument*, nsIContent*, nsIContent*, bool) dom/html/HTMLScriptElement.cpp
10 XUL nsINode::doInsertChildAt(nsIContent*, unsigned int, bool, nsAttrAndChildArray&) dom/base/nsINode.cpp

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 1

•

10 years ago

It might be a regression between 38b6 and 38b7. Pushlog: https://hg.mozilla.org/releases/mozilla-release/pushloghtml?fromchange=c68a6293bb0d&tochange=504ec068cc33 I don't see anything obvious here.

Patrick McManus [:mcmanus]

Comment 2

•

10 years ago

That's e10s code asserting failure in a non-e10s (38) build. Since this is between two tests, could the prefs be in a weird state?

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 3

•

10 years ago

For Mozmill tests we always have e10s turned off because this framework doesn't support it. So not sure why this code is getting run in that case. https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/__init__.py#L125

Patrick McManus [:mcmanus]

Comment 4

•

10 years ago

ideas?

Flags: needinfo?(jduell.mcbugs)

Flags: needinfo?(honzab.moz)

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 5

•

10 years ago

The same also applies to all the Linux boxes we own in SCL3. Sadly I'm not able to submit any of those reports via the crash reporter. But ted helped me and gave me details for the dmp files:

Thread 0 (crashed)
0 libmozalloc.so!mozalloc_abort(char const*) [mozalloc_abort.cpp:504ec068cc33 : 37 + 0x0]
    rbx = 0x00007fa52ebad868 r12 = 0x00007fa52ebad868
    r13 = 0x0000000000000000 r14 = 0x00007fa53202bdb4
    r15 = 0x0000000000000139 rip = 0x00007fa5344fbfc9
    rsp = 0x00007fff8eac6b70 rbp = 0x0000000000000003
    Found by: given as instruction pointer in context
1 libxul.so!NS_DebugBreak [nsDebugImpl.cpp:504ec068cc33 : 469 + 0x7]
    rbx = 0x00007fff8eac6bc0 r12 = 0x00007fa52ebad868
    r13 = 0x0000000000000000 r14 = 0x00007fa53202bdb4
    r15 = 0x0000000000000139 rip = 0x00007fa53039ee85
    rsp = 0x00007fff8eac6b80 rbp = 0x0000000000000003
    Found by: call frame info
2 libxul.so!mozilla::net::PNeckoChild::SendPHttpChannelConstructor(mozilla::net::PHttpChannelChild*, mozilla::dom::PBrowserOrId const&, IPC::SerializedLoadContext const&, mozilla::net::HttpChannelCreationArgs const&) [PNeckoChild.cpp:504ec068cc33 : 313 + 0x4]
    rbx = 0x0000000000000000 r12 = 0x00007fff8eac7290
    r13 = 0x00007fff8eac70a0 r14 = 0x00007fff8eac7090
    r15 = 0x00007fa5339c1a50 rip = 0x00007fa530615f73
    rsp = 0x00007fff8eac6ff0 rbp = 0x00007fa5197c71c0
    Found by: call frame info
3 libxul.so!mozilla::net::HttpChannelChild::ContinueAsyncOpen() [HttpChannelChild.cpp:504ec068cc33 : 1644 + 0x43]
    rbx = 0x00007fa51765f000 r12 = 0x00007fa51d424800
    r13 = 0x00007fa5339b2990 r14 = 0x00000000cd140014
    r15 = 0x00007fa5339c1a50 rip = 0x00007fa5304e9424
    rsp = 0x00007fff8eac7040 rbp = 0x0000000000000000
    Found by: call frame info
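For triaging many dumps like the one above, a small frame parser can help. This is a minimal sketch, assuming frames of the form `N module!function [file:revision : line + offset]` as they appear in this comment. The names `FRAME_RE` and `parse_frames` are illustrative and not part of any Mozilla tool.

```python
import re

# Matches "N module!function [file:revision : line + 0xoffset]".
# The pattern is derived only from the frames quoted in this bug.
FRAME_RE = re.compile(
    r"(?P<index>\d+)\s+(?P<module>\S+)!(?P<function>.+?)\s+"
    r"\[(?P<file>[^\s:\]]+):(?P<revision>[0-9a-f]+) : "
    r"(?P<line>\d+) \+ (?P<offset>0x[0-9a-f]+)\]"
)


def parse_frames(dump_text):
    """Return one dict per stack frame found in dump_text."""
    frames = []
    for match in FRAME_RE.finditer(dump_text):
        frame = match.groupdict()
        frame["index"] = int(frame["index"])
        frame["line"] = int(frame["line"])
        frames.append(frame)
    return frames


sample = (
    "0 libmozalloc.so!mozalloc_abort(char const*) "
    "[mozalloc_abort.cpp:504ec068cc33 : 37 + 0x0]\n"
    "1 libxul.so!NS_DebugBreak [nsDebugImpl.cpp:504ec068cc33 : 469 + 0x7]\n"
)
for frame in parse_frames(sample):
    print(frame["index"], frame["module"], frame["function"], frame["line"])
```

Register lines and "Found by" lines are skipped because they do not contain the `module!function [...]` shape.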

OS: Mac OS X → All

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 6

•

10 years ago

Marking as qa-automation-blocked because we have too many instances of those crashes. Right now it's kind of hard to nail down the problem because this crash does not always happen, but only in about one out of every 5 full test runs.
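To get a feel for how many runs it takes to hit an intermittent crash like this, here is a rough sketch. It assumes each full test run crashes independently with probability 1/5, which is only an approximation of the estimate above, and the function name `runs_needed` is made up for illustration.

```python
import math


def runs_needed(p_crash, confidence):
    """Smallest n with 1 - (1 - p_crash)**n >= confidence.

    Assumes every run crashes independently with probability p_crash.
    """
    if not (0.0 < p_crash < 1.0) or not (0.0 < confidence < 1.0):
        raise ValueError("p_crash and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_crash))


# One crash per five runs, 95% chance of seeing at least one crash.
print(runs_needed(0.2, 0.95))
```

Under these assumptions, about 14 full runs are needed to see the crash at least once with 95% probability, so a handful of clean runs says little about whether a change fixed it.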

Hardware: x86_64 → All

Whiteboard: [mozmill] → [mozmill][qa-automation-blocked]

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 7

•

10 years ago

Actually I can see some of those lines in our add-on related Mozmill tests for Firefox 38.0 RC:

> [Child 27528] ###!!! ABORT: constructor for actor failed: file ./PNeckoChild.cpp, line 313

It's the same file as for the crash, so I wonder if that is related.
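To check other test logs for the same abort, a scan along these lines might work. It is a minimal sketch that assumes abort lines always follow the format quoted above. `ABORT_RE` and `find_aborts` are hypothetical names, not part of Mozmill.

```python
import re

# Pattern based only on the single ABORT line quoted in this comment.
ABORT_RE = re.compile(
    r"\[(?P<process>\w+) (?P<pid>\d+)\] ###!!! ABORT: (?P<message>.*?): "
    r"file (?P<file>\S+), line (?P<line>\d+)"
)


def find_aborts(log_text):
    """Return one dict per ABORT line found in log_text."""
    results = []
    for match in ABORT_RE.finditer(log_text):
        entry = match.groupdict()
        entry["pid"] = int(entry["pid"])
        entry["line"] = int(entry["line"])
        results.append(entry)
    return results


sample = (
    "[Child 27528] ###!!! ABORT: constructor for actor failed: "
    "file ./PNeckoChild.cpp, line 313"
)
for abort in find_aborts(sample):
    print(abort["process"], abort["pid"], abort["file"], abort["line"])
```

Grouping the results by `file` and `line` would show whether every abort points at the same generated IPC constructor.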

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Updated

•

10 years ago

Blocks: 1150242

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Updated

•

10 years ago

No longer blocks: 1150242

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Updated

•

10 years ago

Blocks: 1163181

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 8

•

10 years ago

This crash is the most frequent crash for our automated Mozmill test runs of Firefox 38. In detail it means that really every locale except en-US is crashing with this stack. This happens for 38.0 build 3 and for today's 38.0.5b1.

http://mozmill-release.blargon7.com/#/functional/failure?app=Firefox&branch=38&platform=Linux&from=2015-05-10&to=2015-05-11&test=%2FtestToolbar%2FtestHomeButton.js&func=testHomeButton.js

We collect crashes at the end of the testrun, so I cannot say precisely where the crash is occurring. But one scenario I have already found can be triggered by running the following two tests in sequence:

* testMenu_quitApplication (http://hg.mozilla.org/qa/mozmill-tests/file/mozilla-release/firefox/tests/functional/restartTests/testMenu_quitApplication/test1.js)
* testChangeTheme (http://hg.mozilla.org/qa/mozmill-tests/file/mozilla-release/firefox/tests/functional/testAddons/testChangeTheme.js)

Summary: Startup crash in mozilla::net::PNeckoChild::SendPHttpChannelConstructor → Startup crash for localized Firefox builds [@ mozilla::net::PNeckoChild::SendPHttpChannelConstructor]

Version: 39 Branch → 38 Branch

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 9

•

10 years ago

The link to our dashboard should have been the following to also include the OS X crashes and the 38.0 build 3 results:

http://mozmill-release.blargon7.com/#/functional/failure?app=Firefox&branch=38&platform=All&from=2015-05-8&to=2015-05-11&test=%2FtestToolbar%2FtestHomeButton.js&func=testHomeButton.js

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 10

•

10 years ago

And typo again :(

http://mozmill-release.blargon7.com/#/functional/failure?app=Firefox&branch=38&platform=All&from=2015-05-08&to=2015-05-11&test=%2FtestToolbar%2FtestHomeButton.js&func=testHomeButton.js

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 11

•

10 years ago

The tests which trigger the crash are doing the following:

* Quit Firefox via the menu's quit entry
* Close all tabs
* Open 'addons/install.html?addon=themes/plain.jar' via the local httpd.js webserver (which is identical to http://mozqa.com/data/firefox/addons/install.html?addon=themes/plain.jar)
* Click the link to install the theme and proceed through the install dialog
* Open the Add-on Manager, select the theme pane, and restart Firefox

During that restart Firefox crashes directly during start-up.

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 12

•

10 years ago

Attached file log.txt

Log file with some JavaScript warnings/errors which might help to investigate this problem.

Dragana Damjanovic [:dragana]

Comment 13

•

10 years ago

Can you make an HTTP log? Thanks

Flags: needinfo?(hskupin)

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 14

•

10 years ago

Attached file http_log.txt

Sure! I totally forgot about that. Here it is.

Flags: needinfo?(hskupin)

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 15

•

10 years ago

Attached file http_log.txt.child-1

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 16

•

10 years ago

After further testing, this problem seems to exist only when about:newtab is used for newly opened tabs. The crash is gone when I make use of about:blank in the Mozmill test. We have already had a couple of problems with this page in the past, mostly with background thumbnailing. Given the situation on this bug, I feel that this could be closely related.

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 17

•

10 years ago

Interestingly this is not happening on Windows, maybe because the content sandbox is disabled via bug 1158849?

Flags: needinfo?(bobowen.code)

Dragana Damjanovic [:dragana]

Comment 18

•

10 years ago

Looking at the log, I can only see that after the restart it is using e10s, as if the pref had been changed. And there is a log for a child process, so the e10s pref must have changed.

Bob Owen (:bobowen)

Comment 19

•

10 years ago

(In reply to Henrik Skupin (:whimboo) from comment #17)
> Interestingly this is not happening on Windows, maybe because the content
> sandbox is disabled via bug 1158849?

As far as I can tell, the content process (and therefore the thumbnail process) is only sandboxed on Nightly for both OSX and Windows now. So, I don't think so. The thumbnail process was being sandboxed in branches other than Nightly by mistake on Windows, which is what bug 1158849 fixed.

Flags: needinfo?(bobowen.code)

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 20

•

10 years ago

(In reply to Dragana Damjanovic [:dragana] from comment #18)
> Looking at the log, I can only see that after restart is using e10s as if
> the pref is changed.
> And there is a log for child process so e10s pref had changed.

The pref you are referring to here should be browser.tabs.remote.autostart, right? And in this case it should be true?

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 21

•

10 years ago

If that is the pref, it is still set to false after restart. Just tested.
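A check like this can also be scripted against the profile after the restart. The following is a minimal sketch, assuming the profile's prefs.js contains lines of the form `user_pref("name", value);`. The helper name `read_user_prefs` is hypothetical. A pref that is missing from the file usually just means it is at its default value.

```python
import json
import re

# One user_pref("name", value); statement per line.
PREF_RE = re.compile(r'^user_pref\("(?P<name>[^"]+)",\s*(?P<value>.+)\);$')


def read_user_prefs(prefs_js_text):
    """Parse the text of a prefs.js file into a dict of pref name -> value."""
    prefs = {}
    for line in prefs_js_text.splitlines():
        match = PREF_RE.match(line.strip())
        if not match:
            continue  # comments, blank lines, anything unexpected
        raw = match.group("value")
        try:
            # true/false, integers and double-quoted strings parse as JSON.
            prefs[match.group("name")] = json.loads(raw)
        except ValueError:
            prefs[match.group("name")] = raw  # keep unparsed text as-is
    return prefs


sample = 'user_pref("browser.tabs.remote.autostart", false);\n'
prefs = read_user_prefs(sample)
print(prefs.get("browser.tabs.remote.autostart", "not set (default)"))
```

Running this against the profile before and after the restart would show whether the stored value of browser.tabs.remote.autostart actually changes between the two tests.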

Dragana Damjanovic [:dragana]

Comment 22

•

10 years ago

about:blank takes a slightly different path, which is why it is not crashing. Why it is not crashing on Windows I do not know at this moment.

The pref that I was talking about is browser.tabs.remote.autostart.

The place where the necko code decides whether or not to use the child is: http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpHandler.cpp#1840

I do not see much in the log.

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 23

•

10 years ago

(In reply to Dragana Damjanovic [:dragana] from comment #22)
> about:blank is taking a bit different path that is why it is not crashing.

Right, because it doesn't trigger the background thumbnail process. I simply used that for now to stop the crash for our tests of Firefox 38 builds.

> Why it is not crashing on windows ant this moment i do not know.
>
> The pref that i was talking about is browser.tabs.remote.autostart.
>
> The place where necko code decides to call child or not child is:
> http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/
> nsHttpHandler.cpp#1840

Also I would like to say that e10s on Aurora is not enabled by default. So are those child processes created because of an asynchronous background process?

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 24

•

10 years ago

I found bug 726347 which added a pref for disabling the background thumbnail process. I disabled that while leaving the newtab page active, and indeed it stops crashing Firefox.

Flags: needinfo?(ttaubert)

Summary: Startup crash for localized Firefox builds [@ mozilla::net::PNeckoChild::SendPHttpChannelConstructor] → [Page Thumbnails] Startup crash for localized Firefox builds [@ mozilla::net::PNeckoChild::SendPHttpChannelConstructor]

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 25

•

10 years ago

Just for information: our machines where this crash is occurring are located in SCL3 behind a proxy. Maybe that has an influence here. But I am not sure why it's only happening for localized builds and not en-US, and not on Windows. It's fun, and I do not have time to investigate it further. If it turns out to be really important I can have another look, but for now I will continue with other important things.

Tim Taubert [:ttaubert] (inactive)

Comment 26

•

10 years ago

Sorry, no time to investigate this in the near future. Drew has been working mostly on the background thumbnailer though. You might also want to loop in a few e10s folks who know about the HTTP channel implementation.

Flags: needinfo?(ttaubert)

Comment 27

•

10 years ago

Sounds like we've got steps to reproduce. We should do that and attach a debugger, and see what happens in the code in comment 22 (i.e. where we create HTTP channels).

We create an e10s child channel if IsNeckoChild() returns true. But if that's happening, something really weird is going on with XRE_GetProcessType() and we should ask :bent how it could ever return the wrong answer.

Note that IsNeckoChild() caches its result, so if there's a race/window where XRE_GetProcessType() returns the wrong answer, we'd keep it forever. But really, processes shouldn't change type (?). I'm fine with getting rid of the cached result if it "fixes" things here.
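The caching concern can be illustrated with a small language-neutral sketch, written here in Python. It is not Gecko's code: `CachedChildCheck`, the `"content"` and `"default"` strings, and the callable `get_process_type` are hypothetical stand-ins for IsNeckoChild(), the process types, and XRE_GetProcessType(). It only shows how a memoized answer outlives a transient wrong reading.

```python
class CachedChildCheck:
    """Memoizes the first answer, like a check that caches its result."""

    def __init__(self, get_process_type):
        self._get_process_type = get_process_type
        self._cached = None  # unknown until the first call

    def __call__(self):
        if self._cached is None:
            # Only the very first reading is ever consulted.
            self._cached = self._get_process_type() == "content"
        return self._cached


def uncached_child_check(get_process_type):
    """Asks for the process type on every call."""
    return get_process_type() == "content"


# Simulate a window where the first reading is wrong and later ones are right.
readings = iter(["content", "default", "default"])
cached = CachedChildCheck(lambda: next(readings))
print([cached(), cached(), cached()])  # the first (wrong) answer sticks

readings = iter(["content", "default", "default"])
print([uncached_child_check(lambda: next(readings)) for _ in range(3)])
```

In the cached variant the early wrong reading is kept for the lifetime of the object, while the uncached variant recovers as soon as the reading is correct. Whether such a window can exist in Firefox at all is exactly the open question raised above.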

Flags: needinfo?(jduell.mcbugs)

Flags: needinfo?(honzab.moz)

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 28

•

10 years ago

I assume someone would have to create an l10n debug build of the most recent Firefox 38.0. I don't have the capacity to look that deep into it. But I would be happy to run a debugger if I get such a build, e.g. via try.

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 29

•

10 years ago

This is still a top crasher for our Mozmill tests. We get hundreds of reports for each release. :(

Sylvestre Ledru [:Sylvestre]

Comment 30

•

10 years ago

OK, tracking for 39.

status-firefox39: --- → affected

tracking-firefox39: --- → +

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 31

•

10 years ago

[Tracking Requested - why for this release]: It would be good if we can get this fixed for 38ESR given that we will have more releases here. I have to say that we have not seen this crash for 39 builds yet.

status-firefox-esr38: --- → affected

tracking-firefox-esr38: --- → ?

Dragana Damjanovic [:dragana]

Comment 32

•

10 years ago

I have tried to reproduce it locally, but I couldn't. I have even used a proxy, but it made no difference. From a chat with :whimboo: he will start replacing the Mozmill tests with Marionette, so we will see in a week. I will try to figure out how to make the build from comment #28.

Flags: needinfo?(hskupin)

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 33

•

10 years ago

Due to a lot of failures in the Mozmill tests and no one who could fix or at least analyze the problems, we decided to shut down most of them about 2 weeks ago. Since then we no longer see this crash. I feel it's not worth the time to dig more into this crash given that all reports so far came from our testing machines. I would close it as incomplete for now, with the option to reopen if it comes back with the Marionette tests.

Status: NEW → RESOLVED

Closed: 10 years ago

Flags: needinfo?(hskupin)

Resolution: --- → INCOMPLETE

Liz Henry (:lizzard) (privacy/fingerprinting team)

Comment 34

•

10 years ago

Dropping tracking for this since the crash sounds very specific to particular test machines and no one is working on it. We aren't seeing this in crash-stats.

tracking-firefox39: + → -

tracking-firefox-esr38: ? → -

Ryan VanderMeulen [:RyanVM]

Updated

•

3 years ago

status-firefox38: affected → wontfix

status-firefox39: affected → wontfix

status-firefox-esr38: affected → wontfix
