Closed Bug 1085266 Opened 10 years ago Closed 10 years ago

Intermittent test_storageConstructor.html | application crashed [@ PR_Close][@ nsUDPSocket::Close()]

Categories

(Core :: Networking, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla36
Tracking Status
firefox34 --- fixed
firefox35 --- fixed
firefox36 --- fixed
firefox-esr31 --- wontfix

People

(Reporter: cbook, Assigned: dragana)

References

(Depends on 1 open bug, )

Details

(Keywords: crash, intermittent-failure)

Attachments

(1 file, 1 obsolete file)

Ubuntu VM 12.04 x64 fx-team pgo test mochitest-e10s-3

https://treeherder.mozilla.org/ui/logviewer.html#?job_id=940424&repo=fx-team

04:42:20 INFO - 3655 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/tests/mochitest/localstorage/test_storageConstructor.html | application terminated with exit code 11 

04:42:34 WARNING - PROCESS-CRASH | /tests/dom/tests/mochitest/localstorage/test_storageConstructor.html | application crashed [@ PR_Close]
04:42:34 INFO - Crash dump filename: /tmp/tmpA8Rjr3.mozrunner/minidumps/24a5bece-7cfb-a782-20311a1f-5cc81b59.dmp
04:42:34 INFO - Operating system: Linux
04:42:34 INFO - 0.0.0 Linux 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64
04:42:34 INFO - CPU: amd64
04:42:34 INFO - family 6 model 62 stepping 4
04:42:34 INFO - 1 CPU
04:42:34 INFO - Crash reason: SIGSEGV
04:42:34 INFO - Crash address: 0x0
04:42:34 INFO - Thread 0 (crashed)
04:42:34 INFO - 0 libnspr4.so!PR_Close [priometh.c:37e5f630049b : 104 + 0x4]
04:42:34 INFO - rbx = 0x00007f36a81286e0 r12 = 0x00007f36a8128700
04:42:34 INFO - r13 = 0x00007f36d254b6f0 r14 = 0x00007f36a81286f8
04:42:34 INFO - r15 = 0x00007f36a81286f8 rip = 0x00007f36d72d5e7c
04:42:34 INFO - rsp = 0x00007fffef6f28d8 rbp = 0x00007fffef6f2900
04:42:34 INFO - Found by: given as instruction pointer in context
04:42:34 INFO - 1 libxul.so!nsUDPSocket::Close() [nsUDPSocket.cpp:37e5f630049b : 655 + 0x4]
04:42:34 INFO - rbx = 0x00007f36a81286e0 r12 = 0x00007f36a8128700
04:42:34 INFO - r13 = 0x00007f36d254b6f0 r14 = 0x00007f36a81286f8
04:42:34 INFO - r15 = 0x00007f36a81286f8 rip = 0x00007f36d25f25ab
04:42:34 INFO - rsp = 0x00007fffef6f28e0 rbp = 0x00007fffef6f2900
04:42:34 INFO - Found by: call frame info
04:42:34 INFO - 2 libxul.so!nsUDPSocket::~nsUDPSocket() [nsUDPSocket.cpp:37e5f630049b : 277 + 0x4]
04:42:34 INFO - rbx = 0x00007f36a81286e0 r12 = 0x00007f36a8128700
04:42:34 INFO - r13 = 0x00007f36d254b6f0 r14 = 0x00007f36a81286f8
04:42:34 INFO - r15 = 0x00007f36a81286f8 rip = 0x00007f36d25f28e5
04:42:34 INFO - rsp = 0x00007fffef6f2910 rbp = 0x00007fffef6f2920
04:42:34 INFO - Found by: call frame info
04:42:34 INFO - 3 libxul.so!nsUDPSocket::~nsUDPSocket() [nsUDPSocket.cpp:37e5f630049b : 280 + 0x4]
04:42:34 INFO - rbx = 0x00007f36a81286e0 r12 = 0x00007f36a8128700
04:42:34 INFO - r13 = 0x00007f36d254b6f0 r14 = 0x00007f36a81286f8
04:42:34 INFO - r15 = 0x00007f36a81286f8 rip = 0x00007f36d25f2934
04:42:34 INFO - rsp = 0x00007fffef6f2930 rbp = 0x00007fffef6f2940
04:42:34 INFO - Found by: call frame info
Hey Jason, any suggestions for who can look into this?
Component: DOM → Networking
Flags: needinfo?(jduell.mcbugs)
Summary: Intermittent test_storageConstructor.html | application terminated with exit code 11 | application crashed [@ PR_Close] → Intermittent test_storageConstructor.html | application crashed [@ PR_Close][@ nsUDPSocket::Close()]
I will take a look
Flags: needinfo?(jduell.mcbugs)
Assignee: nobody → dd.mozilla
I could not reproduce it not once locally, so it was rather hard to figure out what is wrong.

This part:
https://hg.mozilla.org/integration/fx-team/annotate/37e5f630049b/toolkit/modules/secondscreen/SimpleServiceDiscovery.jsm#l163
until line 165 (there is always an error message for line 164 or 165)
are called really late when everything is shut down and also SocketTransportService is shutdown (there are errors showing that) which shuts down NetworkActivityMonitor which is attached to UDP socket as an layer and since it does not exist it crashes :)

Hi Mark,
I do not know anything about SimpleServiceDiscovery. 
I saw you reviewing this, maybe you are not the right person but you probably know who is. Can you take a look at this?

I can fix the crash in UDPSocket destructor, but i thing that SimpleServiceDiscovery.jsm#l163 should not be called so late in shutdown.
Flags: needinfo?(mark.finkle)
The discovery system does a polling broadcast every 2 minutes, by default. One of those is happening during shutdown. We could check to see if we are shutting down and avoid the new polling broadcast.

One way that comes to mind is registering for an "xpcom-shutdown" or "application-quit" notification when starting a search. If someone could file a new bug for it, we could probably take a look sometime soon-ish.
Flags: needinfo?(mark.finkle)
Although, this error:
JavaScript error: resource://gre/modules/SimpleServiceDiscovery.jsm, line 165: NS_NOINTERFACE: Component does not have requested interface [nsIConsoleService.logStringMessage] 

could probably be fixed by changing:
let log = Cc["@mozilla.org/consoleservice;1"].getService(Ci.nsIConsoleService).logStringMessage

to:
let log = function(msg) { Services.console.logStringMessage(msg); }
The problem are not log messages but the crash that is happening because ff is shutting down and the SocketTransportService is
Depends on: 1089556
So I have not finished last sentence.

The problem is that SocketTransportService is already closed and therefore, already initialized UDPSockets can not be closed properly.
SocketTransportService shuts down on:
xpcom-shutdown so Discovery Service should shutdown on xpcom-will-shutdown 
but SocketTransportService also shuts down on:
profile-change-net-teardown and there is no events before this one.

The problem here are udp sockets that are not attached to sockettransport but already initialized. So latest on AttachSocket there will be an error and the udp sockets will try to close underlying sockets. Above the raw socket there is  NetworkActivityMonitor that is destroy on SocketTransporService shut down. Closing a udp socket, closing underlying socket, after SocketTransporService shutdown will cause a crash.

I can introduce a check if SocketTransporService is shutdown before closing underlying sockets - not a perfect solution. Or shutdown Discovery Service on profile-change-net-teardown, but in which order are listeners notify, can the order be guarantied (they are all called on the same thread)?

Honza, you know the architecture much better than I do, is there an easy way to fix this?
Flags: needinfo?(honzab.moz)
sorry i meant notify udpsocket on profile-change-net-teardown so that they can close underlying sockets.
Flags: needinfo?(honzab.moz)
Hi Patrick, you know the architecture much better then i do. Am I missing something?

So the problem with UDPSockets are that they need to be closed before SocketTransportService shuts down because NetworkActivityMonitor is destroyed in SocketTransportService shutdown. 

SocketTransportService is shut down on xpcom-shutdown (this is not a problem udpsockets must be closed on an event before this one) and profile-change-net-teardown. I notice that profile-change-net-teardown is sometimes sent without any notification being sent beforehand.

Should we introduce a new notification (probably not that easy) or fix NetworkActivityMonitor.
Flags: needinfo?(mcmanus)
Attached patch fix v1 (obsolete) — Splinter Review
I think you have implemented NetworkActivityMonitor.
I have made this changes because some services creates nsUDPSocket when SocketTransportService is in shutdown or right before shutdown. So the sockets are not attached to sts and trying to destroy the sockets after sts is shut down leads to crash.

The change in nsUDPSocket it is only because at this point there is no sense in dispatching anything (if mListener is set)
Attachment #8515905 - Flags: review?(michal.novotny)
Flags: needinfo?(mcmanus)
Attachment #8515905 - Flags: review?(michal.novotny) → review+
Attached patch fix v1Splinter Review
Only the commit message has been changed.
Attachment #8515905 - Attachment is obsolete: true
Attachment #8516559 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/f3a860f6dedd
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
How far back does this code defect go?
Flags: needinfo?(dd.mozilla)
I am not sure. Actually, the crash is cause by discovery service because it is not stopping on shutdown. other things that use UDPSocket are closing on time so they do not cause a crash, or at least I do not know any other problems, all crashes in this bug and in Bug 1088076 are cause by discovery service. 

I do not know if they change something in discovery service to start causing this crash often (or at all).

Mark,
Has something changed in discovery service that start casing this crash, maybe you are better person to answer how far back this defect goes.
Flags: needinfo?(dd.mozilla) → needinfo?(mark.finkle)
(In reply to Dragana Damjanovic from comment #24)

> Mark,
> Has something changed in discovery service that start casing this crash,
> maybe you are better person to answer how far back this defect goes.

Not that I know about. The only change I think happened was making the discovery service run on Desktop, where previously it only ran on Android.

I'm not against adding some code to the discovery service to listen for shutdown. We never did this on Android because we never really shutdown on Android. The OS just kills the application.
Flags: needinfo?(mark.finkle)
Local backporting attempts say this goes back at least as far as esr31, but it's probably not worth taking that far back. I think Aurora and Beta would be reasonable assuming the risk is low.
Approval Request Comment
[Feature/regressing bug #]: The Discovery service initialize UDPsocket during shutdown and transport service is already in shut down and partially destroyed. This has probably started with making the discovery service run on Desktop (comment 25)
[User impact if declined]: A crash can happen during shutdown.
[Describe test coverage new/current, TBPL]: hard to test, because it needs the exact timing 
[Risks and why]: risk is low
[String/UUID change made/needed]: none
Attachment #8516559 - Flags: approval-mozilla-beta?
Attachment #8516559 - Flags: approval-mozilla-aurora?
Comment on attachment 8516559 [details] [diff] [review]
fix v1

Let's take this in beta7. If there is any fallout, we can back this out in beta8.
Attachment #8516559 - Flags: approval-mozilla-beta?
Attachment #8516559 - Flags: approval-mozilla-beta+
Attachment #8516559 - Flags: approval-mozilla-aurora?
Attachment #8516559 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: