Closed Bug 1345978 Opened 8 years ago Closed 8 years ago

Check if the parent and child processes have different build ids

Categories

(Core :: IPC, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: mccr8, Assigned: mccr8)

References

Details

Attachments

(2 files)

As seen in bug 1345872, it is possible for the updater to run while Firefox is open, and if we then open a new content process then they can end up having different build ids, which can cause weird crashes. We should explicitly detect this condition. Ideally, we'd then crash the parent process so that users are not stuck with a weird zombie state where they can't actually load any tabs. (This could happen with other process types, but maybe just checking the content process is enough.)
Bill walked me through enough of how this could work I think I can work on it.
Assignee: nobody → continuation
try run hasn't finished yet but it'll probably be okay: https://treeherder.mozilla.org/#/jobs?repo=try&revision=95da734ea2fa0e62c90a4d593a9554ff7dd7fc98 I tested what happens when the child process intentionally sends a bad buildid, but I didn't test the real scenario of an update happening while the browser is running.
Can you add telemetry to know how often this happens? We don't want to be in the dark about this.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #5) > Can you add telemetry to know how often this happens? We don't want to be in > the dark about this. Well, the way the patch works now, when we detect this error we just crash the parent process, which should generate a crash report (the idea being that the only real way to fix it is for the user to restart Firefox). Is that sufficient? Ideally this would just quit Firefox nicely and then we'd need telemetry I suppose...
Flags: needinfo?(benjamin)
Comment on attachment 8846746 [details] Bug 1345978, part 1 - Add a way to get the platform build id without the service manager. https://reviewboard.mozilla.org/r/119754/#review121628 Oh man, I have been having the exact difficulty this bug addresses. Yay!
Attachment #8846746 - Flags: review?(nfroyd) → review+
Oh, this would still generate a crash report? That's sufficient I suppose, as long as we have a signature we can point to. I figured it did something like _exit(123).
Flags: needinfo?(benjamin)
The patch has this which is where it will crash: MOZ_RELEASE_ASSERT(parentBuildID == childBuildID); The signature will probably be kind of crummy (it crashes inside some random IPC method). I could split out a separate NEVER_INLINE method if that would be better.
That would probably make it a lot more useful, yes.
I've outlined the checking code. It is also good because that means the IPC method isn't as junked up.
Comment on attachment 8846747 [details] Bug 1345978, part 2 - Check build ids early in content startup. https://reviewboard.mozilla.org/r/119756/#review121662 Thanks so much! ::: ipc/glue/MessageChannel.cpp:850 (Diff revision 2) > + IPC::WriteParam(msg, buildID); > + > + MOZ_RELEASE_ASSERT(!msg->is_sync()); > + MOZ_RELEASE_ASSERT(msg->nested_level() != IPC::Message::NESTED_INSIDE_SYNC); > + > + CxxStackFrame frame(*this, OUT_MESSAGE, msg); I don't think you need this. ::: ipc/glue/MessageChannel.cpp:885 (Diff revision 2) > +MOZ_NEVER_INLINE static void > +CheckChildProcessBuildID(const IPC::Message& aMsg) > +{ > + MOZ_ASSERT(XRE_IsParentProcess()); > + nsCString childBuildID; > + PickleIterator msgIter = PickleIterator(aMsg); PickleIterator msgIter(aMsg) works just as well.
Attachment #8846747 - Flags: review?(wmccloskey) → review+
Pushed by amccreight@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7b3260b7e95a part 1 - Add a way to get the platform build id without the service manager. r=froydnj https://hg.mozilla.org/integration/autoland/rev/4ccdf8cf5a14 part 2 - Check build ids early in content startup. r=billm
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
I was thinking about uplifting this, but I haven't seen any evidence of these crashes yet. I'll try testing on one of my Windows machines. (OSX didn't seem to trigger the issue when I killed the content process after applying an update.)
I tested on Windows, by killing the child process while an update was waiting, and it didn't work either.
One way I've been able to reliably trigger updates while Firefox is running is launching the browser debugger while there is an update pending.
This crash should show up as CheckChildProcessBuildID(), or maybe something with the crash reason as "parentBuildID == childBuildID".
(In reply to Andrew McCreight [:mccr8] from comment #21) > This crash should show up as CheckChildProcessBuildID(), or maybe something > with the crash reason as "parentBuildID == childBuildID". I see some crashes with [@ mozilla::ipc::MessageChannel::SendBuildID | mozilla::dom::ContentChild::Init], but they all seem to be with Build ID 20170330030213. Not sure if those are related to this bug but the top source lines led me to this bug.
(In reply to Marcia Knous [:marcia - use ni] from comment #22) > I see some crashes with [@ mozilla::ipc::MessageChannel::SendBuildID | > mozilla::dom::ContentChild::Init], but they all seem to be with Build ID > 20170330030213. Not sure if those are related to this bug but the top source > lines led me to this bug. Yeah that's a regression from this bug. I guess I should fail more gracefully.
Depends on: 1352458
Depends on: 1360329
Depends on: 1370091
Depends on: 1389760
I have hit this bug while a browser debugger is open. Firefox apparently updated in the background (macOS) and when I was at about:preferences, opened a new tab (or pasted a data:image/png,-URL in a tab and pressed Ctrl-Enter; I cannot remember which of the two I did), the browser crashed. Is it really necessary to crash the whole browser, and not just a single tab and prompt the user for a restart? I had multiple debugging sessions in different tabs, and lost all of that due to this assertion... https://crash-stats.mozilla.com/report/index/c5dfc6ec-31d2-4132-8465-6aac30170901
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: