1366808 - Crash in MOZ_RELEASE_ASSERT(parentBuildID == childBuildID) after updating with multiple profiles (content process update)

Reporter

Description

•

7 years ago

This bug was filed from the Socorro interface and is 
report bp-d31fb160-b65a-4914-9dd5-edfff0170516.
=============================================================

Seen while looking at Mac crashes in Nightly - this one appears to have started around 20170401030204 on 55 - so far a small amount of crashes (but typical of Mac volume: http://bit.ly/2qORDu0

Possible regression range based on Build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8df9fabf2587b7020889755acb9e75b664fe13cf&tochange=00a166a8640dffa2e0f48650f966d75ca3c1836e

Bug 1344629 touched some code. ni on dmajor in case he has ideas about this crash.

Comments:

trying to open a file: url for some rustdoc output 

One URL: https://developer.mozilla.org/en-US/docs/Web/CSS/image-resolution

Flags: needinfo?(dmajor)

(Away)

Comment 1

•

7 years ago

> This bug was filed from the Socorro interface and is 
> report bp-d31fb160-b65a-4914-9dd5-edfff0170516.
> ============================================================

Wow, there's something wrong with those source links. Not one of them is correct. Ted, is there anything we can do about that?

The actual crash is:
    // This assert can fail if the child process has been updated
    // to a newer version while the parent process was running.
    MOZ_RELEASE_ASSERT(parentBuildID == childBuildID);
mccr8, want to take a look?

Flags: needinfo?(ted)

Flags: needinfo?(dmajor)

Flags: needinfo?(continuation)

(Away)

Comment 2

•

7 years ago

Actually I wonder if the buildID mismatch is in fact the cause of the bad sources...

Andrew McCreight [:mccr8]

Comment 3

•

7 years ago

This is probably just a new signature for bug 1112937. (Though most of these crashes are on OSX which will need a different patch, apparently.)

Depends on: 1112937

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Comment 4

•

7 years ago

Though based on comment 0, it is possible this is some new interaction with the separate process for file: URIs.

Andrew McCreight [:mccr8]

Updated

•

7 years ago

Component: String → Application Update

Product: Core → Toolkit

(not currently active) Ted Mielczarek

Comment 5

•

7 years ago

(In reply to David Major [:dmajor] from comment #1)
> Wow, there's something wrong with those source links. Not one of them is
> correct. Ted, is there anything we can do about that?

I don't think there's anything *incorrect* about them, they're just not super useful. I think this is just inlining making life hard. Breakpad does not have any real support for dealing with inlined frames.

The top frame is doubly unfortunate due to our weird decision to unpack things like our toolchain into $topsrcdir, which makes our scripts that try to map source file names into repository links think that they are part of the hg repo.

Flags: needinfo?(ted)

(Away)

Comment 6

•

7 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #5)
> I don't think there's anything *incorrect* about them, they're just not
> super useful. I think this is just inlining making life hard. Breakpad does
> not have any real support for dealing with inlined frames.

If it were inlining you'd expect to see the source lines on (or after) some function call related to the work that's going on. In this case the lines are just plain bad. And I'm going to say it's not ICF either because these are nontrivial functions. I really think the build mismatch is causing this.

(not currently active) Ted Mielczarek

Comment 7

•

7 years ago

Oh, yeah, I suppose that's possible. If the browser had been updated out from under the process, it's possible we could be looking at the wrong files to get the debug_ids we use to match up symbols.

Andrew McCreight [:mccr8]

Updated

•

7 years ago

Summary: Crash in mozilla::ipc::MessageChannel::MaybeInterceptSpecialIOMessage → Crash in MOZ_RELEASE_ASSERT(parentBuildID == childBuildID)

Sylvestre Ledru [:Sylvestre]

Updated

•

7 years ago

OS: Mac OS X → All

Andrew McCreight [:mccr8]

Updated

•

7 years ago

Crash Signature: [@ mozilla::ipc::MessageChannel::MaybeInterceptSpecialIOMessage] → [@ mozilla::ipc::MessageChannel::MaybeInterceptSpecialIOMessage] [@ mozilla::ipc::CheckChildProcessBuildID ]

Andrew McCreight [:mccr8]

Updated

•

7 years ago

Depends on: 1370677

Julien Wajsberg [:julienw]

Comment 11

•

7 years ago

Bug 1112937 is Windows only, should this bug be its alter ego for Mac, and file one for Linux ?

Andrew McCreight [:mccr8]

Comment 12

•

7 years ago

In the 9-13 OSX build, this showed up with a new signature.

Crash Signature: [@ mozilla::ipc::MessageChannel::MaybeInterceptSpecialIOMessage] [@ mozilla::ipc::CheckChildProcessBuildID ] → [@ mozilla::ipc::MessageChannel::MaybeInterceptSpecialIOMessage] [@ mozilla::ipc::CheckChildProcessBuildID ] [@ mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::Open ]

Andrew McCreight [:mccr8]

Updated

•

7 years ago

Crash Signature: [@ mozilla::ipc::MessageChannel::MaybeInterceptSpecialIOMessage] [@ mozilla::ipc::CheckChildProcessBuildID ] [@ mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::Open ] → [@ mozilla::ipc::MessageChannel::MaybeInterceptSpecialIOMessage] [@ mozilla::ipc::CheckChildProcessBuildID ] [@ mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::Open ] [@ mozilla::ipc::MessageChannel::OnMessageReceivedFromLink ]

Ting-Yu Chou [:ting] (away)

Updated

•

7 years ago

Crash Signature: [@ mozilla::ipc::MessageChannel::MaybeInterceptSpecialIOMessage] [@ mozilla::ipc::CheckChildProcessBuildID ] [@ mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::Open ] [@ mozilla::ipc::MessageChannel::OnMessageReceivedFromLink ] → [@ mozilla::ipc::MessageChannel::MaybeInterceptSpecialIOMessage] [@ mozilla::ipc::CheckChildProcessBuildID ] [@ mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::Open ] [@ mozilla::ipc::MessageChannel::OnMessageReceivedFromLink ] [@ …

Kan-Ru Chen [:kanru] (UTC+9)

Comment 14

•

7 years ago

A new signature for Mac.

Crash Signature: mozilla::ipc::MessageChannel::ShouldDeferMessage] → mozilla::ipc::MessageChannel::ShouldDeferMessage] [@ mozilla::ipc::MessageChannel::RejectPendingPromisesForActor ]

Andrew McCreight [:mccr8]

Comment 15

•

7 years ago

Another new Mac signature.

Crash Signature: mozilla::ipc::MessageChannel::ShouldDeferMessage] [@ mozilla::ipc::MessageChannel::RejectPendingPromisesForActor ] → mozilla::ipc::MessageChannel::ShouldDeferMessage] [@ mozilla::ipc::MessageChannel::RejectPendingPromisesForActor ] [@ mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::OnOpenAsSlave ]

(no longer active)

Comment 16

•

7 years ago

FWIW I just got this crash: https://crash-stats.mozilla.com/report/index/09cbf135-c74c-4250-9d49-284560171003

What happened was that my Nightly was open for a few days, and I had ran Nightly in a new profile manually like below:

$ /path/to/nightly/firefox -no-remote -profile /path/to/profile

this had caused Nightly to update while the main browser instance was running.  Then I clicked on a tab which wasn't restored yet from the previous session and the browser crashed with the above signature.

Julien Wajsberg [:julienw]

Comment 17

•

7 years ago

Yep. My way of avoiding this now is having 2 different nightly installs: one for work, one for navigation.

Boris Zbarsky [:bzbarsky]

Comment 18

•

7 years ago

I ran into this today without any manual new-profile stuff, just using our UI.  My steps to reproduce:

1)  Run nightly for a few days.
2)  Make sure "devtools.chrome.enabled" is true.
3)  Tools > Web Developer > Browser Toolbox
4)  Keep browsing.

Step 3 starts a new Firefox instance, installs an update, and then the next time you start a new child process you lose.  We should probably disable the devtool menuitems that trigger a separate instance when there is an update pending or something.

Calixte Denizet (:calixte)

Comment 19

•

7 years ago

There are 211 crashes in nightly 58 for the pattern signature "libxul.so.moz-backup (deleted)@"0x[0-9a-fA-F]+ (see [1]).
The moz crash reason is "MOZ_RELEASE_ASSERT(parentBuildID == childBuildID)" for 91% of those crashes.

[1] http://bit.ly/2gymszf

status-firefox58: --- → affected

Andrew McCreight [:mccr8]

Comment 20

•

7 years ago

Another new signature for this.

Crash Signature: mozilla::ipc::MessageChannel::ShouldDeferMessage] [@ mozilla::ipc::MessageChannel::RejectPendingPromisesForActor ] [@ mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::OnOpenAsSlave ] → mozilla::ipc::MessageChannel::ShouldDeferMessage] [@ mozilla::ipc::MessageChannel::RejectPendingPromisesForActor ] [@ mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::OnOpenAsSlave ] [@ mozilla::ipc::ProcessLink::Open ]

Andrew McCreight [:mccr8]

Updated

•

7 years ago

Depends on: 1408448

Nathan Froyd [:froydnj]

Comment 21

•

7 years ago

CheckChildProcessBuildID, the function where the assert comes from, is already MOZ_NEVER_INLINE, and it's only called in one place.  So I could see the MaybeInterceptSpecialIOMessage signature (where CheckChildProcessBuildID is called from) and the CheckChildProcessBuildID signature (obviously) being valid, but I can't understand where all the other signatures are coming from...unless there is some wild amount of inlining going on.  MaybeInterceptSpecialIOMessage is only being called from one place (MessageChannel::OnMessageReceivedFromLink)...maybe there's a couple of other levels of inlining at play here and/or the debug information is getting mangled by the compiler and/or our symbol processing somehow?

Nathan Froyd [:froydnj]

Comment 22

•

7 years ago

We could try moving CheckChildProcessBuildID out into its own file and putting that file in SOURCES, so the compiler couldn't see from MessageChannel.cpp that the function only gets called once and therefore would be forced to do a full call.  LTO would defeat that, but a quick skim of signatures looks like this is Mac-only, where we don't do LTO, so we'd be "safe"?

Another option would be to make CheckChildProcessBuildID non-static--maybe there's some heuristic in the compiler inliner that says "if the function is static and only called once, inline", and that heuristic fires before it checks for MOZ_NEVER_INLINE"?

Moving it out into a separate file seems like a good start, at least.

Andrew McCreight [:mccr8]

Comment 23

•

7 years ago

Yeah, I don't know what is going on. The stack for the latest signature is really bizarre: bp-cc1803f9-7a3a-418c-9e5a-2e8060171012

Nathan Froyd [:froydnj]

Comment 24

•

7 years ago

(In reply to Andrew McCreight [:mccr8] from comment #23)
> Yeah, I don't know what is going on. The stack for the latest signature is
> really bizarre: bp-cc1803f9-7a3a-418c-9e5a-2e8060171012

0 	XUL 	mozilla::ipc::ProcessLink::Open(IPC::Channel*, MessageLoop*, mozilla::ipc::Side) 	mfbt/RefPtr.h:79
1 	XUL 	mozilla::ipc::MessageChannel::Clear() 	clang/include/c++/v1/deque:2588
2 	XUL 	mozilla::ipc::MessageChannel::Open(mozilla::ipc::MessageChannel*, nsIEventTarget*, mozilla::ipc::Side) 	ipc/glue/MessageChannel.cpp:842
3 	XUL 	mozilla::ipc::MessageChannel::PostErrorNotifyTask() 	mfbt/AlreadyAddRefed.h:72

The locations for those functions are flat-out nonsense, with the possible exception of MessageChannel::Open...

(not currently active) Ted Mielczarek

Comment 25

•

7 years ago

Per comment 6 / comment 7, I think this is an artifact of this type of crash--the app binaries have been updated out from under the running process, so when it goes to write a crash it's looking at the files for the *new* version to read the mach headers and get the debug identifiers.

On Linux this manifests differently (per comment 19) in that we read /proc/self/maps and that tracks what the actual binaries are.

Calixte Denizet (:calixte)

Comment 26

•

7 years ago

The signature "mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::Send | mozilla::ipc::ActorIdReadError" just arised today.

Crash Signature: mozilla::ipc::MessageChannel::ShouldDeferMessage] [@ mozilla::ipc::MessageChannel::RejectPendingPromisesForActor ] [@ mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::OnOpenAsSlave ] [@ mozilla::ipc::ProcessLink::Open ] → mozilla::ipc::MessageChannel::ShouldDeferMessage] [@ mozilla::ipc::MessageChannel::RejectPendingPromisesForActor ] [@ mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::OnOpenAsSlave ] [@ mozilla::ipc::ProcessLink::Open ] [@ mozilla::ip…

Calixte Denizet (:calixte)

Comment 27

•

7 years ago

There are 157 crashes in 57.0b8 and nothing before for signature "mozilla::ipc::MessageChannel::RepostAllMessages".
The pushlog for beta 8 is:
https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=b92b69f3503e&tochange=d96198d960f3

Crash Signature: mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::Send | mozilla::ipc::ActorIdReadError ] → mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::Send | mozilla::ipc::ActorIdReadError ] [@ mozilla::ipc::MessageChannel::RepostAllMessages ]

status-firefox57: --- → affected

[:philipp]

Updated

•

7 years ago

Crash Signature: mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::Send | mozilla::ipc::ActorIdReadError ] [@ mozilla::ipc::MessageChannel::RepostAllMessages ] → mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::Send | mozilla::ipc::MessageChannel::Send | mozilla::ipc::ActorIdReadError ] [@ mozilla::ipc::MessageChannel::RepostAllMessages ] [@ mozilla::ipc::MessageChannel::CancelCurrentTransacti…

Timothy Nikkel (:tnikkel)

Comment 28

•

7 years ago

I get this crash somewhat regularly. I haven't tested it, but my speculation for what would be steps to reproduce (this is on OS X):
1) start Firefox in profile 1
2) wait until an update is available, complete the update of Firefox from a different profile
3) navigate to a file:// uri in the still running Firefox with profile 1.
4) crash

I've crashed many times doing something like this. I guess we spawn a new process to handle the file uri and it uses the new build?

Andrew McCreight [:mccr8]

Comment 29

•

7 years ago

Yes, the basic problem is updating when running with multiple profiles, then starting a new process. See bug 1112937.

Andrew McCreight [:mccr8]

Updated

•

7 years ago

Summary: Crash in MOZ_RELEASE_ASSERT(parentBuildID == childBuildID) → Crash in MOZ_RELEASE_ASSERT(parentBuildID == childBuildID) after updating with multiple profiles

Robert Strong (they/them - no direct email)

Updated

•

7 years ago

Priority: -- → P2

Marcia Knous [:marcia]

Reporter

Comment 30

•

7 years ago

Adding another signature that affects Mac and Linux.

Crash Signature: mozilla::ipc::MessageChannel::CancelCurrentTransaction ] → mozilla::ipc::MessageChannel::CancelCurrentTransaction ] [@ parentBuildID != childBuildID]

Attila Soki

Comment 31

•

7 years ago

i don't know if this signature is already added, but here is a crash report from 57.0b11->57.0b12
bp-b62a6272-5fcd-476c-93f9-87d100171030

in my case, the crash occurred within the same profile as the already running instance. I started accidentally a second copy of Firefox.
After the second instance updated the binaries, a warning was displayed with "another copy of firefox is already running".
The first instance crashed on the next click.

Marcia Knous [:marcia]

Reporter

Comment 32

•

7 years ago

parentBuildID != childBuildID has moved up to #7 on 58. Combination of Linux + Mac crashes.

Douglas Bagnall

Comment 33

•

7 years ago

"parentBuildID != childBuildID" affects linux users who have distro packages and have something like unattended-upgrades running. Firefox is replaced in the background (often overnight) and there is no UI warning.

It is not necessary to be running multiple profiles or advanced versions. This hits people with regular 55->56 and 56->57 upgrades.

Brian Birtles (:birtles, Feb 26~ parental leave)

Comment 34

•

7 years ago

I hit this on Nightly on Ubuntu 17.04 just now. I had two profiles running, only one of them showed it had a pending update (maybe I'd already updated the other and forgotten about it?). After I updated the one showing a pending update, I opened a new tab in the other profile and it crashed.

Pierre Rudloff

Comment 36

•

6 years ago

Like Douglas Bagnall said, this happens every time I update the Firefox package on Ubuntu and then open a new tab. It doesn't even have to be a major update.

Hiroyuki Ikezoe (:hiro)

Comment 37

•

6 years ago

This crash happened now.  From what I can tell this happens when firefox was updated in a different profile (I open a new firefox with '-P -no-remote' option and the nightly was updated automatically) while another firefox instance is alive, and after the update I open a new tab on the another firefox (which is the previous nightly instance), then crashed.

Oh, now I realized Brian commented the same thing in comment 34. :)

Boris Zbarsky [:bzbarsky]

Updated

•

6 years ago

Summary: Crash in MOZ_RELEASE_ASSERT(parentBuildID == childBuildID) after updating with multiple profiles → Crash in MOZ_RELEASE_ASSERT(parentBuildID == childBuildID) after updating with multiple profiles (content process update)

Stephen A Pohl [:spohl]

Assignee

Comment 38

•

6 years ago

Is my understanding correct that this is a parent process crash? Would it be possible to fork a child process rather than starting an entirely new process? Are there situations where we do not have a child process available to fork?

Flaburgan

Comment 39

•

6 years ago

Hi everyone, firefox nightly just crashed (https://crash-stats.mozilla.com/report/index/7a97bfd9-03a5-487c-8a53-7cb6f0180211) so this leaded me here. I lost my tabs, including the pinned ones btw :(

To give some context to this crash, I just downloaded a video with Video Download Helper, which notified me the download ended. I then closed firefox, but IIRC the notification was still here (and it didn't look like the ones of my system, not too sure if it the firefox process was the one displaying it). Then I suspend Ubuntu for a few minutes, and then open Firefox, which crashed (see bug report above) and reopen itself without tabs (my history is still there though).

Wayne Mery (:wsmwk)

Comment 40

•

6 years ago

bp-fb75554f-13c8-437b-b2f3-bd8320180205 linux

(not currently active) Ted Mielczarek

Comment 41

•

6 years ago

(In reply to Stephen A Pohl [:spohl] from comment #38)
> Is my understanding correct that this is a parent process crash? Would it be
> possible to fork a child process rather than starting an entirely new
> process? Are there situations where we do not have a child process available
> to fork?

Correct. The child process apparently sends its build ID over in an IPC message on startup, and the parent process does this assertion:
https://dxr.mozilla.org/mozilla-central/rev/2b7d42d527af582a465e49187fe387442d524a7c/ipc/glue/MessageChannel.cpp#1046

Forking seems fraught with peril given all the existing Gecko threads in the parent process, and the sandboxing setup we do for child processes, but we did have a similar feature for B2G where we would preload child processes by forking. Additionally, we can't fork on Windows, so it doesn't help there.

I would think the best solution here would be to detect that we've applied an update *before* trying to launch a new child process, and if we have, just reuse existing content processes rather than wind up crashing the whole browser.

Stephen A Pohl [:spohl]

Assignee

Comment 42

•

6 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #41)
> I would think the best solution here would be to detect that we've applied
> an update *before* trying to launch a new child process, and if we have,
> just reuse existing content processes rather than wind up crashing the whole
> browser.

I'm going to start by looking at the best way to detect that an update has occurred. Ideally, we could avoid launching a child process to get its build ID. I'm not sure if this is going to be possible.

Assignee: nobody → spohl.mozilla.bugs

Priority: P2 → P1

(not currently active) Ted Mielczarek

Comment 43

•

6 years ago

(In reply to Stephen A Pohl [:spohl] from comment #42)
> I'm going to start by looking at the best way to detect that an update has
> occurred. Ideally, we could avoid launching a child process to get its build
> ID. I'm not sure if this is going to be possible.

You should be able to just check application.ini in the install directory. You could cache the mtime on startup and then do a fairly cheap stat to check if it has changed since then, and if so actually read the file and check the build id.

Haik Aftandilian [:haik]

Comment 44

•

6 years ago

(In reply to Stephen A Pohl [:spohl] from comment #42)
> (In reply to Ted Mielczarek [:ted.mielczarek] from comment #41)
> > I would think the best solution here would be to detect that we've applied
> > an update *before* trying to launch a new child process, and if we have,
> > just reuse existing content processes rather than wind up crashing the whole
> > browser.

We'd have to handle a crashed tab by moving the session to an existing running process. If we restarted a crashed tab, we would hit the parent crash. That would probably also be true for launching media/plugin processes which both get spawned on-demand. If we move work to existing processes, I'd worry that we might end up with degraded performance because more work than normal is being done by fewer processes.

> I'm going to start by looking at the best way to detect that an update has
> occurred. Ideally, we could avoid launching a child process to get its build
> ID. I'm not sure if this is going to be possible.

I think the ideal solution would be to have updates not overwrite existing versions and let multiple versions exist at the same time on disk. Parent process version X would only ever launch child process version X. When an update is downloaded (or at some other opportune time) we could delete versions for which there is no in-use profile.

(not currently active) Ted Mielczarek

Comment 45

•

6 years ago

(In reply to Haik Aftandilian [:haik] from comment #44)
> (In reply to Stephen A Pohl [:spohl] from comment #42)
> > (In reply to Ted Mielczarek [:ted.mielczarek] from comment #41)
> > > I would think the best solution here would be to detect that we've applied
> > > an update *before* trying to launch a new child process, and if we have,
> > > just reuse existing content processes rather than wind up crashing the whole
> > > browser.
> 
> We'd have to handle a crashed tab by moving the session to an existing
> running process. If we restarted a crashed tab, we would hit the parent
> crash. That would probably also be true for launching media/plugin processes
> which both get spawned on-demand. If we move work to existing processes, I'd
> worry that we might end up with degraded performance because more work than
> normal is being done by fewer processes.

Agreed, but degraded performance is certainly better than crashing the entire browser. :)

> > I'm going to start by looking at the best way to detect that an update has
> > occurred. Ideally, we could avoid launching a child process to get its build
> > ID. I'm not sure if this is going to be possible.
> 
> I think the ideal solution would be to have updates not overwrite existing
> versions and let multiple versions exist at the same time on disk. Parent
> process version X would only ever launch child process version X. When an
> update is downloaded (or at some other opportune time) we could delete
> versions for which there is no in-use profile.

This does sound good, but would require changes to the updater. I believe this is basically what Chrome does.

Stephen A Pohl [:spohl]

Assignee

Comment 46

•

6 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #45)
> > > I'm going to start by looking at the best way to detect that an update has
> > > occurred. Ideally, we could avoid launching a child process to get its build
> > > ID. I'm not sure if this is going to be possible.
> > 
> > I think the ideal solution would be to have updates not overwrite existing
> > versions and let multiple versions exist at the same time on disk. Parent
> > process version X would only ever launch child process version X. When an
> > update is downloaded (or at some other opportune time) we could delete
> > versions for which there is no in-use profile.
> 
> This does sound good, but would require changes to the updater. I believe
> this is basically what Chrome does.

I don't believe that having multiple versions installed in parallel is going to be a solution due to security and maintenance/cleanup concerns. Chrome may be able to get away with it because it is installed in a per-user location (if I recall correctly). I'm going to expand the current checks for pending updates in our startup code to also check for currently running Firefox processes (only for the path that we're trying to update). We may be able to get away with postponing updates. Or we could notify the user of the pending update and offer the option to forcibly shut down existing Firefox processes. The details here are tbd.

Thank you for all the input so far!

Marcia Knous [:marcia]

Reporter

Comment 47

•

6 years ago

Adding 59 and 60 as affected. On 60 nightly this is the #3 overall browser crash.

status-firefox59: --- → affected

status-firefox60: --- → affected

Stephen A Pohl [:spohl]

Assignee

Comment 48

•

6 years ago

(In reply to Marcia Knous [:marcia - needinfo? me] from comment #47)
> Adding 59 and 60 as affected. On 60 nightly this is the #3 overall browser
> crash.

Fyi, this is explained by the higher frequency of updates on nightly. There will be a steep dropoff when 60 moves to beta, and even more when it moves to release.

Stephen A Pohl [:spohl]

Assignee

Comment 49

•

6 years ago

Attached image chrome page mockup — Details

Romain, I put together a document with different solutions to address this crash:

https://docs.google.com/document/d/1dUhc1fS6vHTIJXMLcs6vTatKmrb2FXtyiMZbtriZGME

The recommended solution is to display a chrome page similar to the attached mockup to notify the user of the fact that an update has occurred in the background and that Firefox needs to be restarted to continue. Would you be able to help us find someone in UX to nail down the specifics of this chrome page? Thank you!

Flags: needinfo?(rtestard)

Stephen A Pohl [:spohl]

Assignee

Updated

•

6 years ago

URL: https://docs.google.com/document/d/1d...

Romain Testard [:RT]

Comment 50

•

6 years ago

(In reply to Stephen A Pohl [:spohl] from comment #49)
> Created attachment 8960281 [details]
> chrome page mockup
> 
> Romain, I put together a document with different solutions to address this
> crash:
> 
> https://docs.google.com/document/d/
> 1dUhc1fS6vHTIJXMLcs6vTatKmrb2FXtyiMZbtriZGME
> 
> The recommended solution is to display a chrome page similar to the attached
> mockup to notify the user of the fact that an update has occurred in the
> background and that Firefox needs to be restarted to continue. Would you be
> able to help us find someone in UX to nail down the specifics of this chrome
> page? Thank you!

Thanks for the details there.
Do we have an idea of how often this happens on the different channels (crashes per KHours of usage or absolute number of crashes in a day)?
The recommendation sounds right so long as this is not a frequent thing for users to see. Also seeing it on Nightly is obviously more acceptable than release.
I NI Bram from UX who designed the update UI - Bram do you have views on this? Maybe there is a way this can fit into the existing update UI (I was thinking we re-use the flow where we display a door hanger asking users to restart if they have not restarted X hours after update was downloaded - this currently applies to users who enabled "check for updates but lets you choose to install them").

Flags: needinfo?(spohl.mozilla.bugs)

Flags: needinfo?(rtestard)

Flags: needinfo?(bram)

Douglas Bagnall

Comment 51

•

6 years ago

(In reply to Romain Testard [:RT] from comment #50)

> Do we have an idea of how often this happens on the different channels
> (crashes per KHours of usage or absolute number of crashes in a day)?

For people using the standard release channel this happens once per major version,
and sometimes with a bugfix release.

(In reply to Stephen A Pohl [:spohl] from comment #50)

> > The recommended solution is to display a chrome page similar to the attached
> > mockup to notify the user of the fact that an update has occurred in the
> > background and that Firefox needs to be restarted to continue. Would you be
> > able to help us find someone in UX to nail down the specifics of this chrome
> > page? Thank you!

I agree that this is a good solution.

Bram Pitoyo [:bram]

Comment 52

•

6 years ago

(In reply to Romain Testard [:RT] from comment #50)
> […] Maybe there is a way this can fit into the existing update UI (I was
> thinking we re-use the flow where we display a door hanger asking users to
> restart if they have not restarted X hours after update was downloaded -
> this currently applies to users who enabled "check for updates but lets you
> choose to install them").

Yes. I would prefer to show a doorhanger (rather than an in-content page) that asks users to restart Firefox some time after the update was downloaded.

They may choose “Not Now”. Doing so will simply minimise the doorhanger to be inside the menu (the message inside the menu is “Restart to Update Firefox”. The menu is now flagged with a green up arrow icon.

In other words, same behaviour as the one we have now.

Flags: needinfo?(bram)

Stephen A Pohl [:spohl]

Assignee

Comment 53

•

6 years ago

(In reply to Douglas Bagnall from comment #51)
> (In reply to Romain Testard [:RT] from comment #50)
> 
> > Do we have an idea of how often this happens on the different channels
> > (crashes per KHours of usage or absolute number of crashes in a day)?
> 
> For people using the standard release channel this happens once per major
> version,
> and sometimes with a bugfix release.

This would be worst case scenario, yes. However, this should only affect people who run multiple profiles, who open the browser toolbox (which runs with a different user profile) or who run under multiple user accounts. Most users are expected to run a single profile under one user account. I will try to extract actual numbers from Socorro to give us a slightly better idea here.

(In reply to Bram Pitoyo [:bram] from comment #52)
> (In reply to Romain Testard [:RT] from comment #50)
> > […] Maybe there is a way this can fit into the existing update UI (I was
> > thinking we re-use the flow where we display a door hanger asking users to
> > restart if they have not restarted X hours after update was downloaded -
> > this currently applies to users who enabled "check for updates but lets you
> > choose to install them").
> 
> Yes. I would prefer to show a doorhanger (rather than an in-content page)
> that asks users to restart Firefox some time after the update was downloaded.
> 
> They may choose “Not Now”. Doing so will simply minimise the doorhanger to
> be inside the menu (the message inside the menu is “Restart to Update
> Firefox”. The menu is now flagged with a green up arrow icon.
> 
> In other words, same behaviour as the one we have now.

The doorhanger may not be a particularly good solution here for the following reason: This crash occurs when an update has occurred in the background (without the user's awareness) and a new content process is spawned. From a user's persective, this typically happens when a new tab is opened. When a user runs into this situation, there isn't anything to display as content in the tab. If we display the doorhanger, and especially if we allow the user to dismiss the doorhanger with "Not Now", the user would be left with a blank tab. Any further new tabs will equally display a blank page and basically lead to a broken experience until the browser is restarted. Only existing tabs will continue to work. From a user's perspective, this scenario resembles the "Gah! Your tab has crashed." scenario, which is why the chrome page was suggested.

Flags: needinfo?(spohl.mozilla.bugs) → needinfo?(bram)

Bram Pitoyo [:bram]

Comment 54

•

6 years ago

I didn’t know that failing to take action will lead to a broken experience. I had thought that the user can say “Not Now” and continue browsing without a problem – the update will install on next restart.

In this case, the in-page solution is the right one. But it’s quite in-your-face and can be jarring, so I hope that users won’t have to see it very often. From the discussions above, it sounds like they don’t.

We can say something like:

> Sorry. We just need to do one small thing to keep going.
> 
> We have just installed an update in the background. Click Restart Firefox to finish applying it.
> 
> We will restore all your pages, windows and tabs afterwards, so you can be on your way quickly.

For illustration, we can reuse error-connection-failure.svg.

Caveat: if we expect a significant number of users to see this error, then it becomes a bigger problem. Let’s consult with Content Strategy.

Flags: needinfo?(bram)

Bram Pitoyo [:bram]

Comment 55

•

6 years ago

Attached image update applied — Details

Bram Pitoyo [:bram]

Comment 56

•

6 years ago

Mockup attached.

Jim Mathies [:jimm]

Updated

•

6 years ago

Blocks: 1352192

Jim Mathies [:jimm]

Comment 58

•

6 years ago

(In reply to Bram Pitoyo [:bram] from comment #54)
> I didn’t know that failing to take action will lead to a broken experience.
> I had thought that the user can say “Not Now” and continue browsing without
> a problem – the update will install on next restart.
> 
> In this case, the in-page solution is the right one. But it’s quite
> in-your-face and can be jarring, so I hope that users won’t have to see it
> very often. From the discussions above, it sounds like they don’t.

Safe bet, I doubt this will be seen by anyone other than power users. I can think of two typical scenarios that might bring this up:

1) multiple users uner multiple accounts are on the same device and are using Firefox concurrently
2) power users who run two or more Firefox profiles at the same time within their session.

Trevor Skywalker

Comment 59

•

6 years ago

(In reply to Jim Mathies [:jimm] from comment #58)
> (In reply to Bram Pitoyo [:bram] from comment #54)
> > I didn’t know that failing to take action will lead to a broken experience.
...
> Safe bet, I doubt this will be seen by anyone other than power users. I can
> think of two typical scenarios that might bring this up:
> 
> 1) multiple users uner multiple accounts are on the same device and are
> using Firefox concurrently
> 2) power users who run two or more Firefox profiles at the same time within
> their session.

I do both those things. I use multiple windows user accounts (switch user) and multiple firefox profiles. I also run multiple firefox profiles within a single user account on Windows. Although my problems with hangs and crashes also occur when I'm only running a single profile, a single firefox instance. 

If I run Nightly concurrently with beta or release there is an advantage when looking at the Windows Resource Monitor because Nightly gets a distinct name (under Description). So I can tell which instance of firefox is hung, or is consuming high CPU or RAM. 

Image		Description
firefox.exe	Firefox
firefox.exe	Firefox Nightly

Windows 7 on a few laptops and desktops.

Anthony DeRobertis

Comment 60

•

6 years ago

(In reply to Jim Mathies [:jimm] from comment #58)

> Safe bet, I doubt this will be seen by anyone other than power users. I can
> think of two typical scenarios that might bring this up:

I don't think it has been mentioned in this bug (couldn't find it in the comments so far) and it's not in :spohl's document, so: those scenarios are Windows (and probably Mac OS X) -specific scenarios.

On Linux, Firefox is often installed through and updated by the distro's package manager; and thus even on single-user desktops the upgrade can wind up happening while Firefox is running w/o the user being aware (and unattended upgrades are more likely to be in use on non-power-user desktops). Definitely happens on Debian (from first-hand experience), and should on Ubuntu as well. This also means that approach #2 from the document won't work on Linux, but the recommended approach #3 does.

[On Linux of course there should be another approach — keep using the old executables, as they're not really deleted as long as they're open. But I'm guessing that'd require a bunch of platform-specific code. So personally I'd consider #3 fine.]

PS: Remember Bugzilla comments should add additional information useful to solving the bug; each comment winds up being emailed to a lot of people,

Julien Wajsberg [:julienw]

Comment 61

•

6 years ago

(In reply to Jim Mathies [:jimm] from comment #58)
> 
> 1) multiple users uner multiple accounts are on the same device and are
> using Firefox concurrently
> 2) power users who run two or more Firefox profiles at the same time within
> their session.

A few other cases have been mentioned:

3) running a new content process

I believe this can happen for example if a content process has crashed (or the user killed it because for example s/he saw a big CPU usage for this process because of some website).
Or if not every content process were launched before an update started -- I think this can happen with very lowtech users, that for example open firefox to visit one website only, firefox updates in background, and the user opens a new tab at that moment.

I've definitely seen this scenario myself, but as a Nightly user this is triggered differently: when restarting after an update, not all content processes are run at restart, then I get a new update right away because it's Nightly, then I open a new tab.

4) power users running the browser toolbox (OK, this is very unusual on release).

Stephen A Pohl [:spohl]

Assignee

Comment 62

•

6 years ago

Attached patch 1. about:restartrequired page (obsolete) — Details — Splinter Review

This adds the new about:restartrequired page. :jaws, I'm requesting review from you since you've reviewed the about:tabcrashed page, which this page is modeled after. Please feel free to redirect the review as you see fit. Thank you!

Attachment #8965351 - Flags: review?(jaws)

Stephen A Pohl [:spohl]

Assignee

Comment 63

•

6 years ago

Attached patch 2. Handle buildID mismatch (obsolete) — Details — Splinter Review

The parent and child are considered of different buildIDs until the child confirms that the buildIDs match. This allows us to shut down the child immediately if a buildID mismatch is detected and will allow the parent to display the correct about: page when it detects that the child has shut down. If the parent never received a "buildIDs match" message, it will display the new about:restartrequired page. Otherwise, it will be treated as a regular content process crash and continue to display the expected about:tabcrashed page.

I'm requesting review as follows, but please feel free to forward reviews as you see fit:
:jimm - IPC code
:bz - docshell
:jaws - frontend

Attachment #8965357 - Flags: review?(jmathies)

Attachment #8965357 - Flags: review?(jaws)

Attachment #8965357 - Flags: review?(bzbarsky)

Jared Wein [:jaws] (please needinfo? me)

Updated

•

6 years ago

Attachment #8965351 - Flags: review?(jaws) → review?(felipc)

Jared Wein [:jaws] (please needinfo? me)

Updated

•

6 years ago

Attachment #8965357 - Flags: review?(jaws) → review?(felipc)

Boris Zbarsky [:bzbarsky]

Comment 64

•

6 years ago

Comment on attachment 8965357 [details] [diff] [review]
2. Handle buildID mismatch

r=me on the docshell bit

Attachment #8965357 - Flags: review?(bzbarsky) → review+

Jim Mathies [:jimm]

Comment 65

•

6 years ago

Comment on attachment 8965357 [details] [diff] [review]
2. Handle buildID mismatch

Review of attachment 8965357 [details] [diff] [review]:
-----------------------------------------------------------------

For the /ipc and dom/ipc stuff in TabParent - lgtm.

Attachment #8965357 - Flags: review?(jmathies) → review+

Stephen A Pohl [:spohl]

Assignee

Comment 66

•

6 years ago

Attached patch 3. Telemetry probe (obsolete) — Details — Splinter Review

The way we're currently "tracking" cases of buildID mismatches is via crash reports, since every buildID mismatch is currently crashing the browser. We should make sure that we still have a way to track the number of times that these buildID mismatches occur after we land the patches here, but now via telemetry instead. This data can help us determine if users run into this too frequently and if so, we may want to explore additional remedies.

This is modeled after the work in bug 1269961.

Attachment #8965521 - Flags: review?(jmathies)

Stephen A Pohl [:spohl]

Assignee

Updated

•

6 years ago

Attachment #8965357 - Attachment filename: bug1366808-buildIdMismatch.diff → 2. Patch

Stephen A Pohl [:spohl]

Assignee

Updated

•

6 years ago

Attachment #8965351 - Attachment description: about:restartrequired page → 1. about:restartrequired page

Stephen A Pohl [:spohl]

Assignee

Updated

•

6 years ago

Attachment #8965357 - Attachment description: bug1366808-buildIdMismatch.diff → 2. bug1366808-buildIdMismatch.diff

Attachment #8965357 - Attachment filename: 2. Patch → Patch

Stephen A Pohl [:spohl]

Assignee

Updated

•

6 years ago

Attachment #8965521 - Attachment description: Telemetry probe → 3. Telemetry probe

Jim Mathies [:jimm]

Updated

•

6 years ago

Attachment #8965521 - Flags: review?(jmathies) → review+

:Felipe Gomes (needinfo for replies!)

Comment 67

•

6 years ago

Comment on attachment 8965357 [details] [diff] [review]
2. Handle buildID mismatch

Review of attachment 8965357 [details] [diff] [review]:
-----------------------------------------------------------------

r+ for the JS parts

Attachment #8965357 - Flags: review?(felipc) → review+

:Felipe Gomes (needinfo for replies!)

Updated

•

6 years ago

Attachment #8965351 - Flags: review?(felipc) → review+

Stephen A Pohl [:spohl]

Assignee

Updated

•

6 years ago

Attachment #8965357 - Attachment description: 2. bug1366808-buildIdMismatch.diff → 2. Handle buildID mismatch

Stephen A Pohl [:spohl]

Assignee

Comment 68

•

6 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/0649f0d9884b6e6cf126db83589c6c2778df3a88
Bug 1366808: Add about:restartrequired page for situations when a background update has occurred and restarting Firefox is required to continue. r=felipe

https://hg.mozilla.org/integration/mozilla-inbound/rev/1fc82af3a155bcfcb4744a8462b1698f4a19e81c
Bug 1366808: Properly detect buildID mismatches between parent and child processes and display about:restartrequired to prompt the user to restart Firefox before proceeding. r=jimm,felipe,bz

https://hg.mozilla.org/integration/mozilla-inbound/rev/bf2262b6aca8650eb733e930e17886e3f3854d0c
Bug 1366808: Add telemetry probe for about:restartrequired page. r=jimm

Stephen A Pohl [:spohl]

Assignee

Comment 69

•

6 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/23f60e5acaa2543d623ae4cbe5a12331db1ce93c
Bug 1366808: Fix Eslint failures on inbound. CLOSED TREE r=me

Stephen A Pohl [:spohl]

Assignee

Comment 70

•

6 years ago

There appear to be some leaks on Windows that I haven't seen before. I've requested a backout of the patches and I'll take another look tomorrow.

Andreea Pavel [:apavel]

Comment 71

•

6 years ago

Backed out for Windows GPU leakchecks

Push that caused the failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=bf2262b6aca8650eb733e930e17886e3f3854d0c

Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=172789757&repo=mozilla-inbound&lineNumber=8368

Backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/5a326d6404a76e6c80e1a74eb7a8eb6b07943fad

Flags: needinfo?(spohl.mozilla.bugs)

Andreea Pavel [:apavel]

Comment 72

•

6 years ago

Also noticed these failures occurring: https://treeherder.mozilla.org/logviewer.html#?job_id=172789686&repo=mozilla-inbound&lineNumber=11220

Francesco Lodolo [:flod]

Comment 73

•

6 years ago

Comment on attachment 8965351 [details] [diff] [review]
1. about:restartrequired page

Review of attachment 8965351 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/locales/en-US/chrome/browser/aboutRestartRequired.dtd
@@ +5,5 @@
> +<!ENTITY restartRequired.title "Restart Required">
> +
> +<!ENTITY restartRequired.header "Sorry. We just need to do one small thing to keep going.">
> +<!ENTITY restartRequired.description "
> +<p>We have just installed an update in the background. Click Restart Firefox to finish applying it.</p>

Firefox should be &brandShortName; here

@@ +9,5 @@
> +<p>We have just installed an update in the background. Click Restart Firefox to finish applying it.</p>
> +<p>We will restore all your pages, windows and tabs afterwards, so you can be on your way quickly.</p>
> +">
> +
> +<!ENTITY restartRequired.label "Restart Firefox">

Please pick better string IDs. 

This is a button, restartButton.label sounds like a better choice than restartRequired.label

Attachment #8965351 - Flags: feedback-

Francesco Lodolo [:flod]

Comment 74

•

6 years ago

(In reply to Francesco Lodolo [:flod] from comment #73)
> Please pick better string IDs. 

This came out snarky… Background: localizers see one string at a time, in lack of localization comments, the string ID helps giving context to the message they're localizing.

Having consistent naming within the file (restartRequired.*) is less important than a string ID that clearly explain the role of the associated element.

Stephen A Pohl [:spohl]

Assignee

Comment 75

•

6 years ago

(In reply to Francesco Lodolo [:flod] from comment #74)
> (In reply to Francesco Lodolo [:flod] from comment #73)
> > Please pick better string IDs. 
> 
> This came out snarky… Background: localizers see one string at a time, in
> lack of localization comments, the string ID helps giving context to the
> message they're localizing.
> 
> Having consistent naming within the file (restartRequired.*) is less
> important than a string ID that clearly explain the role of the associated
> element.

No worries at all. I appreciate your clarifications. Ironically, earlier patches of mine actually had better string IDs. I changed the IDs to have consistent naming in the file after looking at the files for the "tabcrashed" page, where all IDs start with "tabCrashed.". That'll teach me... ;-)

I've already applied your suggested changes to the patch locally and will upload once I've isolated and fixed the failures that we were seeing on inbound yesterday.

Joel Maher ( :jmaher ) (UTC -8)

Comment 76

•

6 years ago

this bug looks to have generated an increase in a benchmark:
== Change summary for alert #12615 (as of Tue, 10 Apr 2018 01:59:28 GMT) ==

Improvements:

  3%  stylebench windows7-32 opt e10s stylo     50.22 -> 51.83

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12615

Alessio Placitelli [:Dexter]

Comment 77

•

6 years ago

Hi Stephen and sorry for the fly-by ;)

Please note that that adding new probes requires a data-review from a Data Collection Peer (https://wiki.mozilla.org/Firefox/Data_Collection#Requesting_Approval).

Moreover, please note that "count" histograms are deprecated and should not be used (see https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/collection/histograms.html#count). You should be using scalars instead (that's something the data-review would have catched ;) )

Since this was backed out, it could be a good opportunity to fix the above :-P Please let me know if you have further questions!

Jim Mathies [:jimm]

Comment 78

•

6 years ago

(In reply to Alessio Placitelli [:Dexter] from comment #77)
> Hi Stephen and sorry for the fly-by ;)
> 
> Please note that that adding new probes requires a data-review from a Data
> Collection Peer
> (https://wiki.mozilla.org/Firefox/Data_Collection#Requesting_Approval).
> 
> Moreover, please note that "count" histograms are deprecated and should not
> be used (see
> https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/
> telemetry/collection/histograms.html#count). You should be using scalars
> instead (that's something the data-review would have catched ;) )
> 
> Since this was backed out, it could be a good opportunity to fix the above
> :-P Please let me know if you have further questions!

Oops, my mistake. I should have flagged this for data review instead of r+'ing.

Alessio Placitelli [:Dexter]

Comment 79

•

6 years ago

(In reply to Jim Mathies [:jimm] from comment #78)
> Oops, my mistake. I should have flagged this for data review instead of
> r+'ing.

No problem at all! This can happen :) I wish we had hooks in place to prevent this, but we're not there yet :(

Stephen A Pohl [:spohl]

Assignee

Comment 80

•

6 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f05bf493819ec804a129dc32501794acfe27b66e

Stephen A Pohl [:spohl]

Assignee

Comment 81

•

6 years ago

Attached patch 3. Start passing parentBuildID to GPU process (obsolete) — Details — Splinter Review

The GPU test failures on inbound were due to the fact that we didn't pass parentBuildID to the GPU process, resulting in a perceived buildID mismatch in the GPU process and a quick exit of said process. Green try run in comment 80.

Attachment #8965351 - Attachment is obsolete: true

Attachment #8965357 - Attachment is obsolete: true

Attachment #8965521 - Attachment is obsolete: true

Flags: needinfo?(spohl.mozilla.bugs)

Attachment #8968927 - Flags: review?(jmathies)

Stephen A Pohl [:spohl]

Assignee

Comment 82

•

6 years ago

Attached patch 1. Add about:restartrequired page (obsolete) — Details — Splinter Review

Addressed feedback and updated for current trunk. Carrying over r+.

Attachment #8968928 - Flags: review+

Stephen A Pohl [:spohl]

Assignee

Comment 83

•

6 years ago

Attached patch 2. Detect and handle buildID mismatches — Details — Splinter Review

Addressed feedback and updated for current trunk. Carrying over r+.

Attachment #8968929 - Flags: review+

Stephen A Pohl [:spohl]

Assignee

Comment 84

•

6 years ago

I'll be posting a new telemetry patch shortly and request the required data review. There is also a test-only patch forthcoming.

Stephen A Pohl [:spohl]

Assignee

Comment 85

•

6 years ago

Attached patch 4. Telemetry (obsolete) — Details — Splinter Review

Addressed feedback, carrying over r+. Will request data review next.

Attachment #8969344 - Flags: review+

Jim Mathies [:jimm]

Comment 86

•

6 years ago

Comment on attachment 8968927 [details] [diff] [review]
3. Start passing parentBuildID to GPU process

Review of attachment 8968927 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/ipc/GPUProcessHost.h
@@ +54,5 @@
>    // Launch the subprocess asynchronously. On failure, false is returned.
>    // Otherwise, true is returned, and the OnProcessLaunchComplete listener
>    // callback will be invoked either when a connection has been established, or
>    // if a connection could not be established due to an asynchronous error.
> +  bool Launch(StringVector aExtraOpts);

nit - comment about the new param

Attachment #8968927 - Flags: review?(jmathies) → review+

Stephen A Pohl [:spohl]

Assignee

Comment 87

•

6 years ago

Comment on attachment 8969344 [details] [diff] [review]
4. Telemetry

Review of attachment 8969344 [details] [diff] [review]:
-----------------------------------------------------------------

Request for data collection review form

All questions are mandatory. You must receive review from a data steward peer on your responses to these questions before shipping new data collection.

1. What questions will you answer with this data?

This data is to maintain the ability to detect the frequency of buildID mismatches between parent and child processes. The frequency of these mismatches could previously only be analyzed via crash stats on Socorro. Telemetry is a better and more appropriate way of gathering/analyzing this data.

2. Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements?

We need to know the frequency of buildID mismatches occurring in the wild. Once this bug (bug 1366808) lands, we will start displaying an error page. Currently, this is believed to be a sufficient remedy because it is believed that the frequency of these mismatches is small. Should the frequency turn out to be higher than expected, we may have to invest additional resources to find different technical remedies.

3. What alternative methods did you consider to answer these questions? Why were they not sufficient?

The only other method was the previous method via crash stats in Socorro, which isn't an appropriate way of gathering/analyzing this data.

4. Can current instrumentation answer these questions?

No (there are no existing telemetry probes that can be used to answer this question).

5. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories on the Mozilla wiki.

Measurement Description: Frequency of buildID mismatches
Data Collection Category: Category 1
Tracking Bug #: Bug 1366808

6. How long will this data be collected? Choose one of the following:

I want to permanently monitor this data, as changes to installers/updaters may change the frequency of buildID mismatches.

7. What populations will you measure?

All

8. If this data collection is default on, what is the opt-out mechanism for users?

Default opt-out mechanism for opt-out telemetry probes.

9. Please provide a general description of how you will analyze this data.

This telemetry probe maintains the ability to detect the frequency of buildID mismatches between parent and child processes that was provided by Socorro and crash reports before. Interested parties can analyze this data using dashboards etc.

10. Where do you intend to share the results of your analysis?

Internally, to guide decisions about allocation of engineering resources to develop additional remedies.

Attachment #8969344 - Flags: review?(chutten)

Stephen A Pohl [:spohl]

Assignee

Comment 88

•

6 years ago

Attached patch 3. Start passing parentBuildID to GPU process — Details — Splinter Review

Addressed feedback, carrying over r+.

Attachment #8968927 - Attachment is obsolete: true

Attachment #8969360 - Flags: review+

Chris H-C :chutten

Comment 89

•

6 years ago

Comment on attachment 8969344 [details] [diff] [review]
4. Telemetry

Review of attachment 8969344 [details] [diff] [review]:
-----------------------------------------------------------------

This isn't the actual data review, but a drive-by. The data review'll be the next comment or so.

::: browser/modules/ContentCrashHandlers.jsm
@@ +331,5 @@
> +
> +    // Make sure to only count once even if there are multiple windows
> +    // that will all show about:restartrequired.
> +    if (this._crashedTabCount == 1) {
> +      Services.telemetry.keyedScalarAdd("dom.contentprocess.buildID_mismatch",

This isn't a keyed scalar, so this likely will error out and fail to work. You probably want scalarAdd instead.

You may wish to add an automated test to ensure this continues to be accumulated in the way you expect even as others manipulate the code. You can read parent-process scalars using Services.telemetry.snapshotScalars(Ci.nsITelemetry.DATASET_RELEASE_CHANNEL_OPTIN) and looking in the "parent" portion for the scalar by name.

Attachment #8969344 - Flags: review?(chutten) → review-

Chris H-C :chutten

Comment 90

•

6 years ago

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate? (see here, here, and here for examples). Refer to the appendix for "documentation" if more detail about documentation standards is needed.

Yes, standard Telemetry considerations apply.

Is there a control mechanism that allows the user to turn the data collection on and off? (Note, for data collection not needed for security purposes, Mozilla provides such a control mechanism) Provide details as to the control mechanism available.

Yes, standard Telemetry considerations apply.

If the request is for permanent data collection, is there someone who will monitor the data over time?**

Yes, :spohl.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1. (You could -almost- make a case for Category 2 given that it's user interaction that installs the update that causes the mismatch... but that's a weak case, so Cat1 it is)

Is the data collection request for default-on or default-off?

Default-on

Does the instrumentation include the addition of any new identifiers (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?

No, just a count.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does there need to be a check-in in the future to determine whether to renew the data?

No.

---
Result: datareview+

Stephen A Pohl [:spohl]

Assignee

Comment 91

•

6 years ago

Thank you for your feedback!

(In reply to Chris H-C :chutten from comment #90)
> DATA COLLECTION REVIEW RESPONSE:
> 
>     Using the category system of data types on the Mozilla wiki, what
> collection type of data do the requested measurements fall under?
> 
> Category 1. (You could -almost- make a case for Category 2 given that it's
> user interaction that installs the update that causes the mismatch... but
> that's a weak case, so Cat1 it is)

This bug actually deals with background updates that didn't involve user interaction, with the exception of some extremely rare edge cases when a developer intentionally triggered an update with running Firefox processes. These situations used to crash the browser. We now display an about: page instead. Hopefully, this makes it a bit clearer why Cat1 was suggested for this telemetry probe.

Chris H-C :chutten

Comment 92

•

6 years ago

...you know what, I thought I'd written "it's user interaction that spawns the updated content process" until I reread that right now. So not only was I editorializing for no purpose, I wrote it poorly. :(

I'm sorry my (incorrectly-written and unnecessary) parenthetical ramblings made it sound as though I was casting doubt on your categorization of the data. To be clear (for once): I'm not. Your decision was perfectly correct, and I support it 100%.

Stephen A Pohl [:spohl]

Assignee

Comment 93

•

6 years ago

Attached patch 1. Add about:restartrequired page — Details — Splinter Review

Attachment #8968928 - Attachment is obsolete: true

Attachment #8970396 - Flags: review+

Stephen A Pohl [:spohl]

Assignee

Comment 94

•

6 years ago

Attached patch 4. Telemetry (obsolete) — Details — Splinter Review

Attachment #8969344 - Attachment is obsolete: true

Attachment #8970401 - Flags: review?(chutten)

Stephen A Pohl [:spohl]

Assignee

Comment 95

•

6 years ago

Attached patch 5. Tests (obsolete) — Details — Splinter Review

Attachment #8970544 - Flags: review?(felipc)

Stephen A Pohl [:spohl]

Assignee

Comment 96

•

6 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=0cb268568ccbac48143a9dfcf7e991730233e4c8

Stephen A Pohl [:spohl]

Assignee

Comment 97

•

6 years ago

Changing product & component since the current fix is now in IPC, rather than app update.

Component: Application Update → IPC

Product: Toolkit → Core

Stephen A Pohl [:spohl]

Assignee

Comment 98

•

6 years ago

Attached patch 5. Tests — Details — Splinter Review

Remove unused var in test.

Attachment #8970544 - Attachment is obsolete: true

Attachment #8970544 - Flags: review?(felipc)

Attachment #8970560 - Flags: review?(felipc)

Stephen A Pohl [:spohl]

Assignee

Comment 99

•

6 years ago

Attached patch 4. Telemetry — Details — Splinter Review

Updated telemetry test to only check preconditions that we control in the test itself.

Attachment #8970401 - Attachment is obsolete: true

Attachment #8970401 - Flags: review?(chutten)

Attachment #8970573 - Flags: review?(chutten)

Stephen A Pohl [:spohl]

Assignee

Comment 100

•

6 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=dbd297d0ee4879f5762761dee9143f087beef07c

Chris H-C :chutten

Updated

•

6 years ago

Attachment #8970573 - Flags: review?(chutten) → review+

:Felipe Gomes (needinfo for replies!)

Updated

•

6 years ago

Attachment #8970560 - Flags: review?(felipc) → review+

Stephen A Pohl [:spohl]

Assignee

Comment 101

•

6 years ago

Thanks, everyone! Release management has requested that we hold off on landing these patches for now due to soft code freeze. I'm planning to land these patches on May 8th.

Jim Mathies [:jimm]

Updated

•

6 years ago

status-firefox53: affected → wontfix

status-firefox55: affected → wontfix

status-firefox57: affected → wontfix

status-firefox58: affected → wontfix

status-firefox59: affected → wontfix

status-firefox60: affected → wontfix

status-firefox61: --- → wontfix

Stephen A Pohl [:spohl]

Assignee

Comment 102

•

6 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/805ebcdc2928e34a283deaeb41dde1fa6316dd20
Bug 1366808: Add about:restartrequired page for situations when a background update has occurred and restarting Firefox is required to continue. r=felipe

https://hg.mozilla.org/integration/mozilla-inbound/rev/8a249088360fdd7fe5c8b962e96a4415dbf71289
Bug 1366808: Properly detect buildID mismatches between parent and child processes and display about:restartrequired to prompt the user to restart Firefox before proceeding. r=jimm,felipe,bz

https://hg.mozilla.org/integration/mozilla-inbound/rev/bf0dad1ea2c4891fd0f47f3a20c066259ea08bb7
Bug 1366808: Start passing parentBuildID to GPU process to detect buildID mismatches. r=jimm

https://hg.mozilla.org/integration/mozilla-inbound/rev/65f5cd4ca5d8776c9d958a44af388c8c1f2ece88
Bug 1366808: Add telemetry probe for about:restartrequired page. r=jimm,chutten

https://hg.mozilla.org/integration/mozilla-inbound/rev/eb036f55167d9369d12c131ae019a1a633986009
Bug 1366808: Add tests for about:restartrequired page. r=felipe

Andrei Ciure[:aciure]

Comment 103

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/805ebcdc2928
https://hg.mozilla.org/mozilla-central/rev/8a249088360f
https://hg.mozilla.org/mozilla-central/rev/bf0dad1ea2c4
https://hg.mozilla.org/mozilla-central/rev/65f5cd4ca5d8
https://hg.mozilla.org/mozilla-central/rev/eb036f55167d

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox62: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla62

Ryan VanderMeulen [:RyanVM]

Updated

•

6 years ago

status-firefox-esr52: --- → wontfix

status-firefox-esr60: --- → wontfix

Gingerbread Man

Updated

•

6 years ago

Depends on: 1460127

Jim Mathies [:jimm]

Comment 104

•

6 years ago

Confirming using latest Windows Nightly, this work! \o/

Nils Ohlmeier [:drno]

Comment 105

•

6 years ago

I can also confirm that it works, but it took some explanation from MattN for me to realize why Firefox was showing me this "aggressive" update message. Just by itself I didn't understand that the message was there to protect me from further harm.

•

6 years ago

Blocks: 1474386

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

No longer blocks: 1474386

Jared Wein [:jaws] (please needinfo? me)

Updated

•

6 years ago

Depends on: 1475873

Robin Steuber (they/them) [:bytesized]

I keep running into this in Firefox ESR 60 on debian linux. For ESR users, it's a long wait for the fix.

Report ID Date Submitted
bp-f11eb6c7-22dc-40c6-84b0-d856c0190527
5/27/19 9:35 AM
bp-2484b95e-bf92-455c-af71-dcf7f0190513
5/13/19 9:42 AM
bp-00d3603f-7a8c-44de-bc40-b32a40190405
4/5/19 12:19 PM
bp-a3ad02fe-4d7e-463a-a71f-acce10190401
4/1/19 1:34 PM
bp-ac03cedd-7874-4cfe-8af7-8635c0190304
3/4/19 12:01 PM
bp-db919a48-cd8e-47a7-bf5a-e68590190211
2/11/19 12:35 PM

BugBot [:suhaib / :marco/ :calixte]

Updated

•

5 years ago

Keywords: regression

Jim Mathies [:jimm]

Updated

•

5 years ago

Updated

•

4 years ago

Regressions: 1594325

•

2 years ago

No longer depends on: 1464441

Regressions: 1464441

Mathew Hodson

Updated

•

2 years ago

No longer regressions: 1594325

Mathew Hodson

Updated

•

2 years ago

Blocks: 1685387, 1475873

No longer depends on: 1475873, 1685387

chrome page mockup 6 years ago Stephen A Pohl [:spohl] 22.05 KB, image/png		Details
update applied 6 years ago Bram Pitoyo [:bram] 92.89 KB, image/png		Details
1. about:restartrequired page 6 years ago Stephen A Pohl [:spohl] 13.77 KB, patch	Felipe : review+ flod : feedback-	Details \| Diff \| Splinter Review
2. Handle buildID mismatch 6 years ago Stephen A Pohl [:spohl] 27.01 KB, patch	bzbarsky : review+ jimm : review+ Felipe : review+	Details \| Diff \| Splinter Review
3. Telemetry probe 6 years ago Stephen A Pohl [:spohl] 3.43 KB, patch	jimm : review+	Details \| Diff \| Splinter Review
3. Start passing parentBuildID to GPU process 6 years ago Stephen A Pohl [:spohl] 3.58 KB, patch	jimm : review+	Details \| Diff \| Splinter Review
1. Add about:restartrequired page 6 years ago Stephen A Pohl [:spohl] 13.53 KB, patch	spohl : review+	Details \| Diff \| Splinter Review
2. Detect and handle buildID mismatches 6 years ago Stephen A Pohl [:spohl] 26.92 KB, patch	spohl : review+	Details \| Diff \| Splinter Review
4. Telemetry 6 years ago Stephen A Pohl [:spohl] 2.27 KB, patch	spohl : review+ chutten : review-	Details \| Diff \| Splinter Review
3. Start passing parentBuildID to GPU process 6 years ago Stephen A Pohl [:spohl] 3.67 KB, patch	spohl : review+	Details \| Diff \| Splinter Review
1. Add about:restartrequired page 6 years ago Stephen A Pohl [:spohl] 13.53 KB, patch	spohl : review+	Details \| Diff \| Splinter Review
4. Telemetry 6 years ago Stephen A Pohl [:spohl] 4.15 KB, patch		Details \| Diff \| Splinter Review
5. Tests 6 years ago Stephen A Pohl [:spohl] 7.45 KB, patch		Details \| Diff \| Splinter Review
5. Tests 6 years ago Stephen A Pohl [:spohl] 7.40 KB, patch	Felipe : review+	Details \| Diff \| Splinter Review
4. Telemetry 6 years ago Stephen A Pohl [:spohl] 4.06 KB, patch	chutten : review+	Details \| Diff \| Splinter Review