Last Comment Bug 731613 - No way to exit safe mode after it's triggered by crashes when Nightly is started with the -profilemanager switch.
: No way to exit safe mode after it's triggered by crashes when Nightly is star...
Status: VERIFIED FIXED
: regression
Product: Toolkit
Classification: Components
Component: Startup and Profile System (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla14
Assigned To: Matthew N. [:MattN] (In Taipei until Sep. 23)
:
Mentors:
Depends on:
Blocks: 294260
  Show dependency treegraph
 
Reported: 2012-02-29 07:47 PST by ferongr
Modified: 2012-03-28 15:19 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed


Attachments
v.1 Don't check if last startup crashed if XRE_PROFILE_PATH is set (1.33 KB, patch)
2012-03-15 00:35 PDT, Matthew N. [:MattN] (In Taipei until Sep. 23)
dtownsend: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
Screenshot of telemetry showing drop in STARTUP_CRASH_DETECTED (71.07 KB, image/png)
2012-03-27 14:15 PDT, Matthew N. [:MattN] (In Taipei until Sep. 23)
no flags Details

Description ferongr 2012-02-29 07:47:13 PST
STR:

1: Start Nightly with a new profile
2: Kill the process
3: Start it again and kill it again
4: Start Nightly again. Safe mode dialog should show up, click Continue...
5: Exit Nightly normally (i.e. not killing it)
6: Start it again

Expected result: Start normally.

Actual result: Safe mode dialog shows up again.

Doing some digging into the prefs it seems like that "toolkit.startup.recent_crashes" does not expire after it goes over the threshold set "toolkit.startup.max_resumed_crashes" (by default 2) when a successful startup happens, and neither does the value in "toolkit.startup.last_success" change.

In effect that means that no matter how many times you open close and close Fx in quick succession, you won't be able to exit safe mode. The only way is to either wait a long time for recent_crashes to lower by itself or reset that pref and quickly restart before it's restored. This has the potential to create confusion (it certainly confused me).

Furthermore, it seems that when you restart before recent_crashes expires and is subsequently removed, the pref is re set again at +1 the value max_resume_crashes is set.

Changing the value of max_resumed_crashes while recent_crashes exists will also change recent_crashes to +1 of that value.

This code is whack, yo!
Comment 1 Alice0775 White 2012-02-29 08:22:17 PST
confirmed
Safe mode dialog shows up again.
http://hg.mozilla.org/mozilla-central/rev/30b4f99a137c
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20120229 Firefox/13.0a1 ID:20120229031108
Comment 2 Matthew N. [:MattN] (In Taipei until Sep. 23) 2012-02-29 16:07:02 PST
(In reply to John Volikas from comment #0)
> STR:
> 
> 1: Start Nightly with a new profile

Are you sure you didn't do a successful startup before the first time that you killed firefox?  Startup crash tracking is skipped if there is no last_success pref set so these STR shouldn't trigger safe mode at all as written.

> 2: Kill the process
> 3: Start it again and kill it again

I'm assuming you're missing another "Start it again and kill it again" here in order to go over the threshold.

> Doing some digging into the prefs it seems like that
> "toolkit.startup.recent_crashes" does not expire after it goes over the
> threshold set "toolkit.startup.max_resumed_crashes" (by default 2) when a
> successful startup happens, and neither does the value in
> "toolkit.startup.last_success" change.

That's correct, recent_crashes is a counter so it keeps incrementing until a successful startup (when last_success changes).  This shouldn't cause the bug you're reporting though.

> In effect that means that no matter how many times you open close and close
> Fx in quick succession, you won't be able to exit safe mode. The only way is
> to either wait a long time for recent_crashes to lower by itself or reset
> that pref and quickly restart before it's restored. This has the potential
> to create confusion (it certainly confused me).

There is a bug so this is not the intended behaviour.

> Furthermore, it seems that when you restart before recent_crashes expires
> and is subsequently removed, the pref is re set again at +1 the value
> max_resumed_crashes is set.

Perhaps part of the confusion is that the recent_crashes pref is only incremented on the startup following a crash (as we obviously can't increment the counter when we crash). This sounds like intended behaviour if last_success doesn't match the previous startup and thus a crash is detected.

> Changing the value of max_resumed_crashes while recent_crashes exists will
> also change recent_crashes to +1 of that value.

When are you changing the value, before or during Firefox execution?  When does the recent_crashes get incremented, in regular startup or safe mode?

> This code is whack, yo!

Based on your analysis, I suspect there is a bit of a misunderstanding of how it's supposed to work. I agree there is a bug though but I need more data since I can't reproduce the problem.

After step 5 can you gather the following information: 
1) The modification time of parent.lock in your profile folder.
2) all toolkit.startup.* prefs from prefs.js in your profile folder.

Then start the browser and confirm that you were forced into safe mode again.
Comment 3 Matthew N. [:MattN] (In Taipei until Sep. 23) 2012-03-02 16:13:25 PST
If anyone can reproduce the problem using the modified STR in comment 2 and include the following information before and after step 5 then it would be very helpful.  

* parent.lock file modification time
* toolkit.startup.* values from prefs.js

I'd like to get this fixed before it moves to Aurora. Thanks.
Comment 4 ferongr 2012-03-02 16:43:28 PST
Apoogies, for some reason bugmail didn't notify me of Comment 2.

After entering safe-mode for the first time
------------------------------------------
Today, 03 March, 2012, 02:39:34

user_pref("toolkit.startup.last_success", 1330735135);
user_pref("toolkit.startup.recent_crashes", 3);


After graceful exit
-------------------
Today, 03 March, 2012, 02:39:34

user_pref("toolkit.startup.last_success", 1330735173);
user_pref("toolkit.startup.recent_crashes", 2);


After next startup, getting prompted for safemode again
-------------------------------------------------------
Today, 03 March, 2012, 02:40:49

user_pref("toolkit.startup.last_success", 1330735173);
user_pref("toolkit.startup.recent_crashes", 3);
Comment 5 ferongr 2012-03-02 16:59:00 PST
My original STR are flawed. Revised STR:

1: Create new profile
2: Start Fx
3: Close Fx normally (to save post first run config?)
4: Start Fx
5: Kill Fx
6: Repeat 4 and 5. Note that as I wrote the original comment, only 2kills are necessary to see the dialog. I'm not sure why, but according to the default setting it should take 3.
7: Start Fx. See dialog, click "Continue in Safe mode"
8: Quit normally (Meny->Exit or window controls)
9: Start Fx. See dialong...
Comment 6 Matthew N. [:MattN] (In Taipei until Sep. 23) 2012-03-02 17:16:46 PST
Thank you very much John

What operating system are you using?

Can you startup Firefox from the command-line after the graceful exit and see if you get any NS_WARNING's related to startup crash detection or "nsAppStartup.cpp"?

Also, are you seeing the crash reporter at all during the STR?

I'm assuming you are at GMT+0200.  "After graceful exit [from safe mode]"  toolkit.startup.last_success == Today, 03 March, 2012, 02:39:33 GMT+0200 and since there is a 10s buffer allowed, we should detect that it was a graceful shutdown so I'm not sure why it's incrementing from 2 to 3 afterwards.

To clarify some confusion from earlier, the reason "toolkit.startup.recent_crashes" = 2 after existing safe mode is because we want to give the user one more chance in normal mode in case they didn't fix their problem in safe mode. Therefore, it gets set to max_resumed_crashes. Where max_resumed_crashes defaults to 2. We go into safe mode when recent_crashes > max_resumed_crashes.

(In reply to John Volikas from comment #5)
> 3: Close Fx normally (to save post first run config?)
Yes, this sets toolkit.startup.last_success for the first time. Without toolkit.startup.last_success set we can't detect startup crashes.

> 4: Start Fx
> 5: Kill Fx
> 6: Repeat 4 and 5. Note that as I wrote the original comment, only 2kills
> are necessary to see the dialog. I'm not sure why, but according to the
> default setting it should take 3.

Hmm.  That's interesting and may be useful for diagnosis.
Comment 7 ferongr 2012-03-02 17:33:41 PST
OS is XP SP3

See updated description. Shame on me for not crossing my mind to try and reproduce removing the switch, and to include that in the bug info.

To summarize. Everything works as intended when not using the profile manager. 3 kills are needed, and after one graceful exit you don't get the safe-mode dialog anymore.

When using a shortcut with the -profilemanager switch to start Fx every time is where the bug appers.

Apologies for the confusion :(
Comment 8 Matthew N. [:MattN] (In Taipei until Sep. 23) 2012-03-02 17:39:45 PST
OK, thanks very much for your investigation.  I understand the problem now and will have a patch shortly.
Comment 9 Matthew N. [:MattN] (In Taipei until Sep. 23) 2012-03-15 00:35:45 PDT
Created attachment 606130 [details] [diff] [review]
v.1 Don't check if last startup crashed if XRE_PROFILE_PATH is set

This solution should work as long as the XRE_PROFILE_PATH doesn't persist after a crash on some platform. I'm going to test that more. The result is that someone who uses the profile manager for every startup doesn't get startup crash tracking which I think is acceptable since profile manager is already an edge case that will be replaced (see bug 214675).

Other solutions I considered:
1) Add a method to the profile lock to be able to detect if the profile is locked without actually locking it.  This will also fix bug 726759.
2) Add an argument to unlock which also deletes the lock file. – This seems like it's exposing an implementation detail unnecessarily.
3) Through some code which can get an nsILocalFile for the lock file, revert the modification time after unlocking for the profile manager.
4) Add a new environment variable or command line argument to indicate that the application was launched from the profile manager.

Solution #1 and #3 support startup crash tracking for people using the profile manager. The others would have to skip checking the previous startup.

I think the attached patch is lower risk than most alternatives which would be suitable for Aurora uplift.
Comment 10 Matthew N. [:MattN] (In Taipei until Sep. 23) 2012-03-15 19:51:55 PDT
Note that until this is fixed, the telemetry probe STARTUP_CRASH_DETECTED reports a crash anytime the profile manager is used.
Comment 11 Matthew N. [:MattN] (In Taipei until Sep. 23) 2012-03-21 13:32:20 PDT
Thanks.

https://hg.mozilla.org/integration/mozilla-inbound/rev/610a13a58050
Comment 12 Marco Bonardo [::mak] 2012-03-22 06:34:00 PDT
https://hg.mozilla.org/mozilla-central/rev/610a13a58050
Comment 13 Matthew N. [:MattN] (In Taipei until Sep. 23) 2012-03-26 15:33:21 PDT
Comment on attachment 606130 [details] [diff] [review]
v.1 Don't check if last startup crashed if XRE_PROFILE_PATH is set

[Approval Request Comment]
Regression caused by (bug #): 294260
User impact if declined: Users of the profile manager will get forced safe mode earlier than expected.  Once forced safe mode is triggered, the user will continually start in safe mode if they continue using the profile manager (rather than getting one chance in normal mode).
Testing completed (on m-c, etc.): m-c (comment 12)
Risk to taking this patch (and alternatives if risky): Possibility that startup crash detection will be disabled in some additional cases.
String changes made by this patch: None
Comment 14 Matthew N. [:MattN] (In Taipei until Sep. 23) 2012-03-27 14:15:26 PDT
Created attachment 609880 [details]
Screenshot of telemetry showing drop in STARTUP_CRASH_DETECTED

Telemetry confirms that the number of detected startup crashes was approximately cut in half after this landed on m-c.
Comment 15 Alex Keybl [:akeybl] 2012-03-28 14:56:25 PDT
Comment on attachment 606130 [details] [diff] [review]
v.1 Don't check if last startup crashed if XRE_PROFILE_PATH is set

[Triage Comment]
Only risk is to changing the startup crash coverage - which wouldn't be a regression at the time of final release. Approved for Aurora 13.
Comment 16 Matthew N. [:MattN] (In Taipei until Sep. 23) 2012-03-28 15:19:30 PDT
Thanks.

https://hg.mozilla.org/releases/mozilla-aurora/rev/787aedcb98eb

Note You need to log in before you can comment on or make changes to this bug.