Closed Bug 1321449 Opened 4 years ago Closed 4 years ago

Crash in IPCError-browser | ShutDownKill

Categories

(Toolkit :: Safe Browsing, defect, P1)

53 Branch
x86_64
Windows 10
defect

Tracking

RESOLVED DUPLICATE of bug 1279293

People

(Reporter: euthanasia_waltz, Unassigned)

Details

This bug was filed from the Socorro interface and is 
report bp-8f3a0117-c898-478a-b7b6-faf742161201.
=============================================================

STR:
1. Start firefox with a new profile
2. Visit twitter.com (e.g. https://twitter.com/firefox)
3. Exit firefox
4. Start firefox

ER:
Firefox starts normally
AR:
"You have an unsent crash report" notification appears

mozregression:
13:57.04 INFO: Last good revision: f3a0522f6dd8af40dae5da95b3d35c8a2ff5ea29
13:57.04 INFO: First bad revision: a468efda20dfb3a29bb37ef56292febd992bba4e
13:57.04 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=f3a0522f6dd8af40dae5da95b3d35c8a2ff5ea29&tochange=a468efda20dfb3a29bb37ef56292febd992bba4e

13:58.14 INFO: Looks like the following bug has the changes which introduced the regression:
https://bugzilla.mozilla.org/show_bug.cgi?id=1315386
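
For anyone re-checking the range, the pushlog link in the mozregression output above is just the two revisions placed in the `fromchange` and `tochange` query parameters. A minimal sketch, assuming the autoland pushlog base URL shown in the log; the helper name `pushlog_url` is hypothetical and not part of mozregression:

```python
from urllib.parse import urlencode

# Base URL copied from the mozregression output in this report.
PUSHLOG_BASE = "https://hg.mozilla.org/integration/autoland/pushloghtml"


def pushlog_url(last_good, first_bad, base=PUSHLOG_BASE):
    """Build a pushlog URL covering last_good..first_bad (hypothetical helper)."""
    query = urlencode({"fromchange": last_good, "tochange": first_bad})
    return base + "?" + query


url = pushlog_url(
    "f3a0522f6dd8af40dae5da95b3d35c8a2ff5ea29",
    "a468efda20dfb3a29bb37ef56292febd992bba4e",
)
print(url)
```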
Blocks: 1315386
Priority: -- → P1
I am looking into this one, but I don't see any code related to dbservice/classifier in the crash report.
Oddly, it breaks in IPC, so I am going through the IPC paths that dbservice is involved in.
- 1: nsIURIClassifier support in the content process was added recently, and the content-process tasks in [1] should be stopped if the app is shutting down (apologies for not rebasing and adding shutdown awareness to that change). But I don't think this is the cause of the problem, because we already have shutdown awareness in the parent process. Also, the IPC call at [2] should return false (and we would see an IPC error in the log).

[1] https://hg.mozilla.org/mozilla-central/diff/f80a175c7f3f/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
[2] https://hg.mozilla.org/mozilla-central/file/3cb3e4ebc788/dom/ipc/URLClassifierParent.cpp#l26

- 2: dbservice completed its work, but the channel was still using the classifier result and sending/receiving IPC (for example [3], [4]). However, I did not see any log like that in the crash report either, so this is likely not the problem.
[3] https://dxr.mozilla.org/mozilla-central/source/netwerk/base/nsChannelClassifier.cpp#683
[4] https://dxr.mozilla.org/mozilla-central/source/netwerk/base/nsChannelClassifier.cpp#519

- 3: We are handling the shutdown hang in bug 1310060. It occurs when shutdown takes too long, and we hit it if any random task is still running during shutdown (unfortunately dbservice did take a really long time to do its update). Hopefully bug 1315386 fixed that; as a result, most of the hangs in bug 1310060 would be transformed into this one (or into a hang in another module, which would prove that we still have similar shutdown hang issues somewhere).
Looking at the graph in [5], it seems "Crash in IPCError-browser | ShutDownKill" has been occurring continuously for Firefox 53.0a1. I am not sure why the mozregression tool pointed to bug 1315386 as introducing the regression (I am still investigating).
[5] https://crash-stats.mozilla.com/signature/?product=Firefox&version=53.0a1&signature=IPCError-browser%20%7C%20ShutDownKill&date=%3E%3D2016-11-01T03%3A42%3A51.000Z&date=%3C2016-12-01T03%3A42%3A51.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=version&page=1#graphs

I lean toward case 3 because I don't see anything dbservice-related in the report.
(In reply to Thomas Nguyen[:tnguyen] (use ni? plz) from comment #2)
> module, prove that we still have similar hang shutdown issues somewhere) 
> I see the graph in [5], seems "Crash in IPCError-browser | ShutDownKill"
> occurs continuously for Firefox 53.0a1. I am not sure why mozregression tool
> pointed to bug 1315386 as introducing regression (still keep investigating
> it)

I have just thought of the case in which Firefox shut down while doing an update and was then restarted. Firefox then restores data, and something could be corrupted. But the big question remains: I don't see any dbservice code in the report.
Francois may have a better idea. Francois, what do you think about this?
Flags: needinfo?(francois)
Looking through the crash reports, I see there are many of these. They tend to show the content process being slow to shut down, at random places.
We might find some common signatures there, but apparently none of them are from dbservice.
See Also: → 1319906
It appears to be the same issue as bug 1319906, with the same signature.
(In reply to Jim Jeffery not reading bug-mail 1/2/11 from comment #7)

> Dupe of https://bugzilla.mozilla.org/show_bug.cgi?id=1279293 ?

Agreed, I believe it's a dup of bug 1279293.
(In reply to atlanto from comment #0)

> STR:
> 1. Start firefox with a new profile
> 2. Visit twitter.com (i.e. https://twitter.com/firefox)
> 3. Exit firefox
> 4. Start firefox
> 
The steps are quite similar to bug 1319906 (with the same signature). Is there any freeze or hang in this case?
Flags: needinfo?(euthanasia_waltz)
But I have no idea why mozregression points to it as a regression from https://bugzilla.mozilla.org/show_bug.cgi?id=1315386. This crash signature had been occurring for a while before 1315386 landed.
Before I submitted this report, I guessed the nearest bug report was bug 1319906, because twitter.com and facebook.com are the problematic sites that cause this crash; I don't know of any other problematic sites. But I'm not seeing any freeze/hang at all, and this crash started after the 2016/11/29 nightly build for me. That is why I reported it here.

BTW, I don't know the root cause of this crash. This may or may not be a dup of bug 1279293. (I have not experienced that crash, and I cannot reproduce it, even now.)
I cannot guarantee the mozregression result, but I know that some bugfix changes can reveal hidden bugs, so the result may not point to the true cause in some situations.
Flags: needinfo?(euthanasia_waltz)
The crash on the problematic sites could not be reproduced recently in this bug (or in bug 1319906, from what I just looked at). I see ipc_channel_error = ShutDownKill in the correlations of the reports, and the timeout we set for this kill is sometimes pretty short: https://dxr.mozilla.org/mozilla-central/source/dom/ipc/ContentParent.cpp#1673
(timeout is only 5 secs) https://dxr.mozilla.org/mozilla-central/source/modules/libpref/init/all.js#2838
I would like to mark this bug as a dup of 1279293, but feel free to reopen it if you find anything else.
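
To illustrate why a short kill timeout produces this signature wherever the content process happens to be, here is a minimal model of the behaviour described above. It is only a sketch, not the ContentParent.cpp implementation: a thread stands in for the content process, and the names `shutdown_child`, `shutdown_work_secs` and `kill_timeout_secs` are hypothetical. Only the 5 second default comes from this comment.

```python
import threading
import time


def shutdown_child(shutdown_work_secs, kill_timeout_secs=5.0):
    """Model a parent waiting for a child to finish shutting down.

    Returns "clean exit" if the child finishes before the timer fires,
    otherwise "ShutDownKill" (the parent gives up and kills the child).
    """
    done = threading.Event()

    def child():
        # Stand-in for whatever work the child still has pending at shutdown.
        time.sleep(shutdown_work_secs)
        done.set()

    threading.Thread(target=child, daemon=True).start()

    # Event.wait returns True if the event was set before the timeout expired.
    if done.wait(timeout=kill_timeout_secs):
        return "clean exit"
    return "ShutDownKill"


# Short timeouts keep the demo fast; the real value discussed here is 5 seconds.
fast = shutdown_child(shutdown_work_secs=0.0, kill_timeout_secs=0.5)
slow = shutdown_child(shutdown_work_secs=1.0, kill_timeout_secs=0.05)
print(fast, slow)
```

In this model the outcome depends only on how long the pending work takes relative to the timer, not on which module is doing the work, which is consistent with the reports showing no dbservice frames.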
Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: IPCError_ShutDownKill
Flags: needinfo?(francois)
Bug #1315386 has nothing to do with this crash log signature, so I'm unblocking it.
No longer blocks: 1315386
Crash Signature: [@ IPCError-browser | ShutDownKill]