Closed
Bug 1345978
Opened 8 years ago
Closed 8 years ago
Check if the parent and child processes have different build ids
Categories
(Core :: IPC, enhancement)
Core
IPC
Tracking
()
RESOLVED
FIXED
mozilla55
| Tracking | Status | |
|---|---|---|
| firefox55 | --- | fixed |
People
(Reporter: mccr8, Assigned: mccr8)
References
Details
Attachments
(2 files)
As seen in bug 1345872, it is possible for the updater to run while Firefox is open, and if we then open a new content process then they can end up having different build ids, which can cause weird crashes. We should explicitly detect this condition. Ideally, we'd then crash the parent process so that users are not stuck with a weird zombie state where they can't actually load any tabs. (This could happen with other process types, but maybe just checking the content process is enough.)
| Assignee | ||
Comment 1•8 years ago
|
||
Bill walked me through enough of how this could work I think I can work on it.
Assignee: nobody → continuation
| Comment hidden (mozreview-request) |
| Comment hidden (mozreview-request) |
| Assignee | ||
Comment 4•8 years ago
|
||
try run hasn't finished yet but it'll probably be okay:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=95da734ea2fa0e62c90a4d593a9554ff7dd7fc98
I tested what happens when the child process intentionally sends a bad buildid, but I didn't test the real scenario of an update happening while the browser is running.
Comment 5•8 years ago
|
||
Can you add telemetry to know how often this happens? We don't want to be in the dark about this.
| Assignee | ||
Comment 6•8 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #5)
> Can you add telemetry to know how often this happens? We don't want to be in
> the dark about this.
Well, the way the patch works now, when we detect this error we just crash the parent process, which should generate a crash report (the idea being that the only real way to fix it is for the user to restart Firefox). Is that sufficient?
Ideally this would just quit Firefox nicely and then we'd need telemetry I suppose...
Flags: needinfo?(benjamin)
Comment 7•8 years ago
|
||
| mozreview-review | ||
Comment on attachment 8846746 [details]
Bug 1345978, part 1 - Add a way to get the platform build id without the service manager.
https://reviewboard.mozilla.org/r/119754/#review121628
Oh man, I have been having the exact difficulty this bug addresses. Yay!
Attachment #8846746 -
Flags: review?(nfroyd) → review+
Comment 8•8 years ago
|
||
Oh, this would still generate a crash report? That's sufficient I suppose, as long as we have a signature we can point to. I figured it did something like _exit(123).
Flags: needinfo?(benjamin)
| Assignee | ||
Comment 9•8 years ago
|
||
The patch has this which is where it will crash:
MOZ_RELEASE_ASSERT(parentBuildID == childBuildID);
The signature will probably be kind of crummy (it crashes inside some random IPC method). I could split out a separate NEVER_INLINE method if that would be better.
Comment 10•8 years ago
|
||
That would probably make it a lot more useful, yes.
| Comment hidden (mozreview-request) |
| Assignee | ||
Comment 12•8 years ago
|
||
I've outlined the checking code. It is also good because that means the IPC method isn't as junked up.
Comment 13•8 years ago
|
||
| mozreview-review | ||
Comment on attachment 8846747 [details]
Bug 1345978, part 2 - Check build ids early in content startup.
https://reviewboard.mozilla.org/r/119756/#review121662
Thanks so much!
::: ipc/glue/MessageChannel.cpp:850
(Diff revision 2)
> + IPC::WriteParam(msg, buildID);
> +
> + MOZ_RELEASE_ASSERT(!msg->is_sync());
> + MOZ_RELEASE_ASSERT(msg->nested_level() != IPC::Message::NESTED_INSIDE_SYNC);
> +
> + CxxStackFrame frame(*this, OUT_MESSAGE, msg);
I don't think you need this.
::: ipc/glue/MessageChannel.cpp:885
(Diff revision 2)
> +MOZ_NEVER_INLINE static void
> +CheckChildProcessBuildID(const IPC::Message& aMsg)
> +{
> + MOZ_ASSERT(XRE_IsParentProcess());
> + nsCString childBuildID;
> + PickleIterator msgIter = PickleIterator(aMsg);
PickleIterator msgIter(aMsg) works just as well.
Attachment #8846747 -
Flags: review?(wmccloskey) → review+
| Comment hidden (mozreview-request) |
| Comment hidden (mozreview-request) |
Comment 16•8 years ago
|
||
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7b3260b7e95a
part 1 - Add a way to get the platform build id without the service manager. r=froydnj
https://hg.mozilla.org/integration/autoland/rev/4ccdf8cf5a14
part 2 - Check build ids early in content startup. r=billm
Comment 17•8 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/7b3260b7e95a
https://hg.mozilla.org/mozilla-central/rev/4ccdf8cf5a14
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
| Assignee | ||
Comment 18•8 years ago
|
||
I was thinking about uplifting this, but I haven't seen any evidence of these crashes yet. I'll try testing on one of my Windows machines. (OSX didn't seem to trigger the issue when I killed the content process after applying an update.)
| Assignee | ||
Comment 19•8 years ago
|
||
I tested on Windows, by killing the child process while an update was waiting, and it didn't work either.
Comment 20•8 years ago
|
||
One way I've been able to reliably trigger updates while Firefox is running is launching the browser debugger while there is an update pending.
| Assignee | ||
Comment 21•8 years ago
|
||
This crash should show up as CheckChildProcessBuildID(), or maybe something with the crash reason as "parentBuildID == childBuildID".
Comment 22•8 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #21)
> This crash should show up as CheckChildProcessBuildID(), or maybe something
> with the crash reason as "parentBuildID == childBuildID".
I see some crashes with [@ mozilla::ipc::MessageChannel::SendBuildID | mozilla::dom::ContentChild::Init], but they all seem to be with Build ID 20170330030213. Not sure if those are related to this bug but the top source lines led me to this bug.
| Assignee | ||
Comment 23•8 years ago
|
||
(In reply to Marcia Knous [:marcia - use ni] from comment #22)
> I see some crashes with [@ mozilla::ipc::MessageChannel::SendBuildID |
> mozilla::dom::ContentChild::Init], but they all seem to be with Build ID
> 20170330030213. Not sure if those are related to this bug but the top source
> lines led me to this bug.
Yeah that's a regression from this bug. I guess I should fail more gracefully.
Comment 24•8 years ago
|
||
I have hit this bug while a browser debugger is open.
Firefox apparently updated in the background (macOS) and when I was at about:preferences, opened a new tab (or pasted a data:image/png,-URL in a tab and pressed Ctrl-Enter; I cannot remember which of the two I did), the browser crashed.
Is it really necessary to crash the whole browser, and not just a single tab and prompt the user for a restart? I had multiple debugging sessions in different tabs, and lost all of that due to this assertion...
https://crash-stats.mozilla.com/report/index/c5dfc6ec-31d2-4132-8465-6aac30170901
You need to log in
before you can comment on or make changes to this bug.
Description
•