Closed Bug 1524035 Opened 5 years ago Closed 4 years ago

Assertions in socket process do not output a stack trace

Categories

(Core :: Networking, enhancement, P2)

Tracking

()

RESOLVED FIXED
mozilla79
Tracking Status
firefox79 --- fixed

People

(Reporter: bwc, Assigned: kershaw)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged])

Attachments

(1 file)

Due to timing differences, I am hitting an assertion in the ICE stack in my try pushes, but there is no stack trace in the logging. There's probably something special we need to do when setting up the socket process for this to happen. Given how hard this makes it to diagnose/fix failures on try, I think we need to fix this before landing bug 1521879.

(In reply to Byron Campen [:bwc] from comment #0)

Do you have a try link? Does this happen on all platforms, or just linux64?

Flags: needinfo?(docfaraday)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=247f5549eba37f2b8ac7bfe5dbb891b3ca68e240&selectedJob=225021263

The assertion happens intermittently on all Linux debug builds, and in no case do I see a stack trace. Other platforms may have the same problem, but they do not crash, so I can't tell.

Flags: needinfo?(docfaraday)

OK, so there are a couple of different problems here.

One is that for content process crashes, we immediately exit if MOZ_CRASHREPORTER_SHUTDOWN is present in the environment:

https://searchfox.org/mozilla-central/rev/78cd247b5d7a08832f87d786541d3e2204842e8e/browser/modules/ContentCrashHandlers.jsm#105-153

triggered from:

https://searchfox.org/mozilla-central/rev/78cd247b5d7a08832f87d786541d3e2204842e8e/dom/ipc/ContentParent.cpp#1688

But we don't do the same for other process types, AFAICT. So that's why the testsuite just keeps going. This would be a problem for the RDD process, too, I guess.

I'm not entirely sure where to add the logic to do the same thing on socket process crashes...I guess somewhere in SocketProcessParent::ActorDestroy?
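For illustration only, here is a minimal standalone sketch of the behaviour described above: when a child process actor is destroyed abnormally and MOZ_CRASHREPORTER_SHUTDOWN is present in the environment, the parent asks to quit. Everything except the environment variable name is a hypothetical stand-in (SocketProcessParentSketch, DestroyReason, the quit callback); this is not Gecko code and not necessarily how a real patch would be structured.

```cpp
#include <cstdlib>
#include <functional>
#include <utility>

// Hypothetical stand-in for the reason an IPC actor was torn down.
enum class DestroyReason { NormalShutdown, AbnormalShutdown };

// True when MOZ_CRASHREPORTER_SHUTDOWN is present in the environment.
// Assumption: "present" means the variable exists, whatever its value.
inline bool ShutdownOnCrashRequested() {
  return std::getenv("MOZ_CRASHREPORTER_SHUTDOWN") != nullptr;
}

// Hypothetical model of the parent-side actor for the socket process.
class SocketProcessParentSketch {
 public:
  // `quit` stands in for whatever the real parent process would call
  // to shut the application down.
  explicit SocketProcessParentSketch(std::function<void()> quit)
      : mQuit(std::move(quit)) {}

  // Called when the actor goes away. Only an abnormal teardown (a crash)
  // combined with the environment variable triggers a shutdown request.
  void ActorDestroy(DestroyReason why) {
    if (why == DestroyReason::AbnormalShutdown && ShutdownOnCrashRequested()) {
      mQuit();
    }
  }

 private:
  std::function<void()> mQuit;
};
```

With the variable unset, or on a normal shutdown, the callback is never invoked, so ordinary runs are unaffected. Only a crash under a harness that sets the variable would stop the run early.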

But! Those minidumps do get processed...they just get processed at the end of the test run, which you can see in the complete log:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=225021181&repo=try&lineNumber=49268

You may have to scroll down slightly. You may also notice that those crashes don't have symbols (o.O), which is bug 1524051.

But! Linux32 symbols do work:

https://taskcluster-artifacts.net/IkA_DFaHTw-JBnNq4BPD4A/0/public/logs/live_backing.log

and that says the main process appears to be crashing in:

INFO - Crash reason: SIGABRT
INFO - Crash address: 0x5c0
INFO - Process uptime: not available
INFO -
INFO - Thread 3 (crashed)
INFO - 0 linux-gate.so + 0xcd9
INFO - eip = 0xf76edcd9 esp = 0xe8754a78 ebp = 0xe8754c48 ebx = 0x000005c0
INFO - esi = 0xf76c6000 edi = 0xe8754b34 eax = 0x00000000 ecx = 0x000005c7
INFO - edx = 0x00000006 efl = 0x00000206
INFO - Found by: given as instruction pointer in context
INFO - 1 libxul.so!nr_ice_candidate_pair_do_triggered_check [ice_candidate_pair.c:247f5549eba37f2b8ac7bfe5dbb891b3ca68e240 : 491 + 0xd]
INFO - eip = 0xf0e2b859 esp = 0xe8754c50 ebp = 0xe8754c88
INFO - Found by: previous frame's frame pointer
INFO - 2 libxul.so!nr_ice_component_handle_triggered_check [ice_component.c:247f5549eba37f2b8ac7bfe5dbb891b3ca68e240 : 819 + 0xc]
INFO - eip = 0xf0e35a7f esp = 0xe8754c90 ebp = 0xe8754cb8 ebx = 0xf53ab000
INFO - esi = 0xe8162c0c edi = 0xf7021acc
INFO - Found by: call frame info
INFO - 3 libxul.so!nr_ice_component_process_incoming_check [ice_component.c:247f5549eba37f2b8ac7bfe5dbb891b3ca68e240 : 892 + 0x12]
INFO - eip = 0xf0e2de27 esp = 0xe8754cc0 ebp = 0xe8754d08 ebx = 0xf53ab000
INFO - esi = 0xf7021acc edi = 0xe8162c0c
INFO - Found by: call frame info
INFO - 4 libxul.so!nr_ice_component_stun_server_cb [ice_component.c:247f5549eba37f2b8ac7bfe5dbb891b3ca68e240 : 989 + 0x14]
INFO - eip = 0xf0e2e60a esp = 0xe8754d10 ebp = 0xe8754eb8 ebx = 0xf53ab000
INFO - esi = 0xf7021acc edi = 0xe8754ef8
INFO - Found by: call frame info
INFO - 5 libxul.so!nr_stun_server_process_request [stun_server_ctx.c:247f5549eba37f2b8ac7bfe5dbb891b3ca68e240 : 327 + 0x18]
INFO - eip = 0xf0e3dd51 esp = 0xe8754ec0 ebp = 0xe87552a8 ebx = 0xf53ab000
INFO - esi = 0xe806d40c edi = 0xe83ef00c
INFO - Found by: call frame info
INFO - 6 libxul.so!nr_ice_socket_readable_cb [ice_socket.c:247f5549eba37f2b8ac7bfe5dbb891b3ca68e240 : 120 + 0x2a]
INFO - eip = 0xf0e3378a esp = 0xe87552b0 ebp = 0xe8757b28 ebx = 0xf53ab000
INFO - esi = 0xf70e8cac edi = 0xe851b3dc
INFO - Found by: call frame info
INFO - 7 libxul.so!mozilla::NrSocketBase::fire_callback(int) [nr_socket_prsock.cpp:247f5549eba37f2b8ac7bfe5dbb891b3ca68e240 : 339 + 0x13]
INFO - eip = 0xedbe1cba esp = 0xe8757b30 ebp = 0xe8757b58 ebx = 0xf53ab000
INFO - esi = 0x00000000 edi = 0xe8064040
INFO - Found by: call frame info
INFO - 8 libxul.so!mozilla::NrSocket::OnSocketReady(PRFileDesc*, short) [nr_socket_prsock.cpp:247f5549eba37f2b8ac7bfe5dbb891b3ca68e240 : 347 + 0x12]
INFO - eip = 0xedbe1f8f esp = 0xe8757b60 ebp = 0xe8757b78 ebx = 0xf53ab000
INFO - esi = 0xe8064040 edi = 0x00000001
INFO - Found by: call frame info
INFO - 9 libxul.so!non-virtual thunk to mozilla::NrSocket::OnSocketReady(PRFileDesc*, short) [nr_socket_prsock.cpp:247f5549eba37f2b8ac7bfe5dbb891b3ca68e240 : 0 + 0x5]
INFO - eip = 0xedbe1ff1 esp = 0xe8757b80 ebp = 0xe8757b98 ebx = 0xf53ab000
INFO - esi = 0x0000000f edi = 0x000000b4
INFO - Found by: call frame info
INFO - 10 libxul.so!mozilla::net::nsSocketTransportService::DoPollIteration(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>*) [nsSocketTransportService2.cpp:247f5549eba37f2b8ac7bfe5dbb891b3ca68e240 : 1210 + 0x19]
INFO - eip = 0xed1e4e29 esp = 0xe8757ba0 ebp = 0xe8757bf8 ebx = 0xf53ab000
INFO - esi = 0x0000000f edi = 0x000000b4
INFO - Found by: call frame info
INFO - 11 libxul.so!mozilla::net::nsSocketTransportService::Run() [nsSocketTransportService2.cpp:247f5549eba37f2b8ac7bfe5dbb891b3ca68e240 : 972 + 0xf]
INFO - eip = 0xed1e44c5 esp = 0xe8757c00 ebp = 0xe8757d08 ebx = 0xf53ab000
INFO - esi = 0x00000000 edi = 0xf70612e0
INFO - Found by: call frame info
INFO - 12 libxul.so!non-virtual thunk to mozilla::net::nsSocketTransportService::Run() [nsSocketTransportService2.cpp:247f5549eba37f2b8ac7bfe5dbb891b3ca68e240 : 0 + 0x5]
INFO - eip = 0xed1e5255 esp = 0xe8757d10 ebp = 0xe8757d18 ebx = 0xf53ab000
INFO - esi = 0xe8757d90 edi = 0xf7060180
INFO - Found by: call frame info

There are a ton of child process crashes/leaks, which I assume is a side-effect of the socket process crashes?

Nathan, is there anything the Necko team has to do to address the original problem reported here, "missing stack trace in output"?

Thanks.

Flags: needinfo?(nfroyd)

(In reply to Honza Bambas (:mayhemer) from comment #4)

Nathan, is there anything the Necko team has to do to address the original problem reported here, "missing stack trace in output"?

Yes, the Necko team would need to figure out how to address this bit of comment 3:

(In reply to Nathan Froyd [:froydnj] from comment #3)

OK, so there are a couple of different problems here.

One is that for content process crashes, we immediately exit if MOZ_CRASHREPORTER_SHUTDOWN is present in the environment:

https://searchfox.org/mozilla-central/rev/78cd247b5d7a08832f87d786541d3e2204842e8e/browser/modules/ContentCrashHandlers.jsm#105-153

triggered from:

https://searchfox.org/mozilla-central/rev/78cd247b5d7a08832f87d786541d3e2204842e8e/dom/ipc/ContentParent.cpp#1688

But we don't do the same for other process types, AFAICT. So that's why the testsuite just keeps going. This would be a problem for the RDD process, too, I guess.

I'm not entirely sure where to add the logic to do the same thing on socket process crashes...I guess somewhere in SocketProcessParent::ActorDestroy?

so that socket process crashes exit the parent process while running tests.

Flags: needinfo?(nfroyd)

thanks!

P2 based on the priority of the bug this one blocks.

Whiteboard: [necko-triage]
Whiteboard: [necko-triage] → [necko-triaged]
Priority: -- → P2
Blocks: socket-proc
Assignee: nobody → kershaw
Component: DOM: Networking → Networking
Pushed by kjang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ba5056606cbb
Make parent process shut down when socket process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set r=dragana
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79