Closed Bug 640652 Opened 9 years ago Closed 9 years ago

Firefox 4.0 Crash [@ nsEditor::Init(nsIDOMDocument*, nsIPresShell*, nsIContent*, nsISelectionController*, unsigned int) ] with nsXULElement::UnsetAttr on the stack

Categories

(Core :: DOM: Editor, defect, critical)

x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla5
Tracking Status
blocking2.0 --- Macaw+
status2.0 --- .1-fixed

People

(Reporter: marcia, Assigned: ehsan)

References

Details

(5 keywords, Whiteboard: [watching crash-stats][fx4-rc-ridealong][regressed after b12])

Crash Data

Attachments

(2 files, 3 obsolete files)

Seen while reviewing RC crash stats. Currently ranked #4 in early crash stats. http://tinyurl.com/4scvp26 to the crashes which are all Windows. Comments: "Crashed while attempting to move toolbars" or doing something with the menu bar.

https://crash-stats.mozilla.com/report/index/390769b4-e17d-44b3-b6b2-3df382110309

Frame 	Module 	Signature [Expand] 	Source
0 	xul.dll 	nsEditor::Init 	editor/libeditor/base/nsEditor.cpp:262
1 	xul.dll 	nsPlaintextEditor::Init 	editor/libeditor/text/nsPlaintextEditor.cpp:157
2 	xul.dll 	nsTextEditorState::PrepareEditor 	content/html/content/src/nsTextEditorState.cpp:1234
3 	xul.dll 	PrepareEditorEvent::Run 	content/html/content/src/nsTextEditorState.cpp:1053
4 	xul.dll 	nsContentUtils::RemoveScriptBlocker 	content/base/src/nsContentUtils.cpp:4728
5 	xul.dll 	nsXULElement::UnsetAttr 	content/xul/content/src/nsXULElement.cpp:1499
6 	xul.dll 	nsXULElement::RemoveAttribute 	content/xul/content/src/nsXULElement.h:561
7 	xul.dll 	nsIDOMElement_RemoveAttribute 	obj-firefox/js/src/xpconnect/src/dom_quickstubs.cpp:4621
8 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:4799
9 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:653
10 	mozjs.dll 	js::Invoke 	js/src/jsinterp.cpp:740
11 	mozjs.dll 	js::ExternalInvoke 	js/src/jsinterp.cpp:863
12 	mozjs.dll 	JS_CallFunctionValue 	js/src/jsapi.cpp:5173
13 	xul.dll 	nsXPCWrappedJSClass::CallMethod 	js/src/xpconnect/src/xpcwrappedjsclass.cpp:1672
14 	xul.dll 	nsXPCWrappedJS::CallMethod 	js/src/xpconnect/src/xpcwrappedjs.cpp:588
15 	xul.dll 	PrepareAndDispatch 	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:114
16 	xul.dll 	SharedStub 	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:141
17 	xul.dll 	nsEventListenerManager::HandleEventSubType 	content/events/src/nsEventListenerManager.cpp:1127
low volume residual crash that exploded yesterday with the release of 4.0 rc1

20110301 1 3.6.132010120307 	1 , 
20110302 1 4.0b13pre2011030203 	1 , 
20110303   
20110304 4  	3 3.6.142011021812, 
        		1 4.0b13pre2011030312, 
20110305   
20110306 2  	1 4.0b122011022221, 
        		1 4.02011030319, 
20110307 1 4.0b122011022221 	1 , 
20110308 4  	2 4.02011030319, 
        		1 4.0b122011022221, 	1 3.6.132010120307, 
20110309 34  	24 4.02011030319, 
        		7 4.0b122011022221, 	2 4.0b13pre2011030803, 
        		1 4.0b13pre2011030312, 

volume continues high as of mid-day today another 34 crashes.

Correlation to startup or time of session
34 total crashes for nsEditor::Init.nsIDOMDocument...nsIPresShell...nsIContent...nsISelectionController...unsigned.int. on 20110309-crashdata.csv
13 startup crashes inside 30 sec.
20 startup crashes inside 3 min.
15 repeated crashes inside 3 min. of last crash

os breakdown
nsEditor::Init.nsIDOMDocument...nsIPresShell...nsIContent...nsISelectionController...unsigned.int.Total 34
Win5.1  0.32
Win6.0  0.09
Win6.1  0.59

wide distribution in the urls with no pattern there.  I think the user comments about interaction with the menu bar are the best thing to go on:

menu bar crashed it
Crashed while attempting to move toolbars.
I was toggling the menu toolbar.
I moved the address-tool to the menu bar...
possible RC regression that we need to watch closely.
blocking2.0: --- → ?
safe null crash looking at the stack
Ehsan: can you look?
Whiteboard: [watching crash-stats]
Assignee: nobody → ehsan
I could reproduce this on Windows 7 using these steps:

1. Go to the toolbar customization dialog, and drag the address bar to the menu bar.
2. Close the customization dialog.
3. From Firefox > Options, select Menu Bar.  Firefox crashes: bp-6f74d9f8-3b90-4824-a589-736232110310

I still don't know if this is the only case where this could happen.  On the surface, the crash looks bad, but if further investigation shows that this is the only case where this could happen, then this doesn't block Firefox 4 final.

I need more data here; investigating.
Keywords: reproducible
A bad thing about this crash is that if the user manages to keep the menu bar shown, they will probably keep hitting this over and over again on every startup.
I can confirm that this crash doesn't happen in beta12.
Regression range:

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=919f15d71153&tochange=e79b2b60d394

Nothing jumps at me in that range immediately.  Bisecting...
Bug 637214 has exposed this by adding a removable script blocker to nsXULElement::UnsetAttr: <http://hg.mozilla.org/mozilla-central/diff/23cf0cedfd4a/content/xul/content/src/nsXULElement.cpp>.

This code is triggered from updateAppButtonDisplay <http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser.js#5143>, which causes a scheduled PrepareEditorEvent to run when it's not supposed to.  I'm investigating why this happens...
Blocks: 637214
Keywords: regression
This stack shows where things start going bad.  http://pastebin.mozilla.org/1145111

Inside nsTextEditorState::PrepareEditor, we call nsTextControlFrame::UpdateValueDisplay, which sets the text on the textnode inside our anonymous div, and that triggers the wonderful XBL binding manager, which causes us to attach this binding chrome://global/content/bindings/toolbar.xml#toolbar, which triggers this code <http://mxr.mozilla.org/mozilla-central/source/toolkit/content/widgets/toolbar.xml#243> under this js stack:

0 set_currentSet(val = "menubar-items,urlbar-container") ["chrome://global/content/bindings/toolbar.xml":243]
    nodeToAdd = undefined
    children = [object NodeList @ 0x7a2ab28 (native @ 0x6093d50)]
    paletteChildren = [object NodeList @ 0x7a2a738 (native @ 0x5027378)]
    palette = [object XULElement @ 0x58e9038 (native @ 0x5028550)]
    added = [object Object]
    paletteItems = [object Object]
    nodeidx = 1
    ids = menubar-items,urlbar-container
    this = [object XULElement @ 0x5dd42b8 (native @ 0x5d19800)]
1 _init() ["chrome://global/content/bindings/toolbar.xml":172]
    currentSet = "menubar-items,urlbar-container"
    node = undefined
    toolbox = [object XULElement @ 0x5dcf438 (native @ 0x5d196b8)]
    this = [object XULElement @ 0x5dd42b8 (native @ 0x5d19800)]
2 () ["chrome://global/content/bindings/toolbar.xml":127]
    this = [object XULElement @ 0x5dd42b8 (native @ 0x5d19800)]

In short, we attempt to inject something before the textfield while PrepareEditor is in progress, which causes it to unbind the frame while PrepareEditor is in progress, which causes us to crash later on.
Attached patch Patch (v1) (obsolete) — Splinter Review
OK, here's the root cause of the problem.  Currently, the nsAutoRemovableScriptBlocker gets destroyed before the mozAutoDocUpdate object, which means that when nsDocument::EndDocument gets called, we're no longer under a script blocker.  We don't want that.

This patch fixes the crash.  I'm struggling to write a test for this.  I'll attach it if that effort porves fruitful.
Attachment #518542 - Flags: review?(jonas)
This crash only happens if you remove the chromemargins attribute from a XUL element which has an XBL binding which causes a reframe on a text control in its constructor... In other words, only in the STR in comment 5.  I'd say it should block .x, and given the trivial fix, it can be an RC2 ridealong, if there's ever going to be an RC2.
Status: NEW → ASSIGNED
blocking2.0: ? → .x+
Whiteboard: [watching crash-stats] → [watching crash-stats][fx4-rc-ridealong]
Getting this crash under a test seems to be a lot trickier than I realized...
(In reply to comment #11)
> OK, here's the root cause of the problem.  Currently, the
> nsAutoRemovableScriptBlocker gets destroyed before the mozAutoDocUpdate object,
> which means that when nsDocument::EndDocument gets called, we're no longer
> under a script blocker.  We don't want that.

The explanation of what's going wrong is, well, wrong!

When nsAutoRemovableScriptBlocker goes out of scope, the mozAutoDocUpdate has died, which means that the document update nesting level is 0, which means that the XBL construction stuff happens.  Changing the order of destruction for these two objects makes it so that the doc update nesting level wouldn't be 0 when the script blocker is being removed, which makes sure that nsDocument::EndUpdate won't be called down the line.

The patch is still correct though.  :-)
Attached file Stack in comment 10
Attachment #518582 - Attachment mime type: application/octet-stream → text/plain
Attached patch Patch (v2) (obsolete) — Splinter Review
Attachment #518542 - Attachment is obsolete: true
Attachment #518542 - Flags: review?(jonas)
Attachment #518583 - Flags: review?(jonas)
hsan,  are the problenms getting the crash to happen under automated tests
also mean there is some difficulty in users hitting it as well, or do you think
its easy for users to hit with a variety of menu/tool bar interactions?

seems like this should be in consideration to trigger the need for RC2 if its
fairly easy to hit.

A regression since the last beta in the RC that causes the kind of problems in
comment 6 looks pretty serious.  http://www.mozilla.com/en-US/firefox/RC/faq/
lists the kind of customizability in the crash comments as the #3 feature area
we are promoting.

We'll have some more data tomorrow morning to understand the frequency better.

volume has slowed in the last few hours.
(In reply to comment #17)
> hsan,  are the problenms getting the crash to happen under automated tests
> also mean there is some difficulty in users hitting it as well, or do you think
> its easy for users to hit with a variety of menu/tool bar interactions?

No, it's very easy for users to hit this.  But in order to hit this, users should hide/show the menubar (which triggers the display/hiding of the Firefox button).

The crash is 100% reproducible though, and as I said in comment 6, it can be hard to recover from.

> seems like this should be in consideration to trigger the need for RC2 if its
> fairly easy to hit.

Well, if users put either the search box or the address bar in the menu bar, then there is a very high chance that they would hit this crash.

> A regression since the last beta in the RC that causes the kind of problems in
> comment 6 looks pretty serious.  http://www.mozilla.com/en-US/firefox/RC/faq/
> lists the kind of customizability in the crash comments as the #3 feature area
> we are promoting.
> 
> We'll have some more data tomorrow morning to understand the frequency better.
> 
> volume has slowed in the last few hours.

I'm gonna renom this so that it shows up in the queries for tomorrow's triage.
> volume has slowed in the last few hours.

I was tricked by marcia's query that had a frozen date range. use this one instead to track recent reports and comments.

    https://crash-stats.mozilla.com/report/list?signature=nsEditor%3A%3AInit%28nsIDOMDocument*%2C%20nsIPresShell*%2C%20nsIContent*%2C%20nsISelectionController*%2C%20unsigned%20int%29&version=Firefox%3A4.0

already well over 130 crashes today.  guessing this will move higher in the rankings.
blocking2.0: .x+ → ?
If you sort by date you can see that its fewer people crashing over and over. Its probably people trying to fiddle with the toolbar and crashing and then eventually giving up twiddling with it (looks like 3-4 attempts until people give up).
Attachment #518583 - Flags: approval2.0?
(In reply to comment #21)
> If you sort by date you can see that its fewer people crashing over and over.
> Its probably people trying to fiddle with the toolbar and crashing and then
> eventually giving up twiddling with it (looks like 3-4 attempts until people
> give up).

That sounds like a plausible theory to me.  That said, this is now the #6 topcrasher on 2.0.
Comment on attachment 518583 [details] [diff] [review]
Patch (v2)

Approved for landing on mozilla-central (not release branch). The potential for conflicts with mobile code we've taken seems minimal, and nightly data will help us make decisions.
Attachment #518583 - Flags: approval2.0? → approval2.0+
Attached patch For check-in on mozilla-central (obsolete) — Splinter Review
Attachment #518583 - Attachment is obsolete: true
Keywords: checkin-needed
Attachment #518656 - Attachment is obsolete: true
http://hg.mozilla.org/mozilla-central/rev/e00e8ee0aeb7
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
We might want to keep this open since you are requesting a ride-along.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Can we get a regression window for this? How long has this been broken? And do we know what broke this? I think that would help assessing severity.
daily volume is in the second column of comment 1. there is a residual low volume underlying crash for several previous 4.0x betas.  it really didn't show up  in volume until the RC was pushed to the beta testers and it exposed the easier to reproduce crash that regressed since b12.

I'm not sure landing on mozilla-central is going to tell us much. those testers aren't as likely to be playing with the tool bar customization as new users that are seeing 4.0 for the first time.  at best we will just go back to seeing the residual crashes.
(In reply to comment #28)
> Can we get a regression window for this? How long has this been broken? And do
> we know what broke this? I think that would help assessing severity.

Comment 8 has the regression window, and comment 9 has the exact regression changeset.  We know that this is a regression from bug 637214.
Whiteboard: [watching crash-stats][fx4-rc-ridealong] → [watching crash-stats][fx4-rc-ridealong][fixed on mozilla-central]
And it has regressed after b12.  In other words, the first RC1 is the first release with this bug.
Whiteboard: [watching crash-stats][fx4-rc-ridealong][fixed on mozilla-central] → [watching crash-stats][fx4-rc-ridealong][fixed on m-c][regressed after b12]
(In reply to comment #29)
> daily volume is in the second column of comment 1. there is a residual low
> volume underlying crash for several previous 4.0x betas.  it really didn't show
> up  in volume until the RC was pushed to the beta testers and it exposed the
> easier to reproduce crash that regressed since b12.

Note that crashes with this signature on versions before 4.0 are *not* this bug.  This bug is about the crash stack in comment 0.  Specifically, frames 3, 4 and 5 of the stack in comment 0 are crucial for a crash to be this bug.

> I'm not sure landing on mozilla-central is going to tell us much. those testers
> aren't as likely to be playing with the tool bar customization as new users
> that are seeing 4.0 for the first time.  at best we will just go back to seeing
> the residual crashes.

Those residual crashes are different bugs, as I mentioned.  So we should just watch for stacks such as this one in the nightlies.  However, I agree that nightly users are not very likely to play with toolbar customization; otherwise we'd have probably discovered this before RC...

...which makes me think: this might be a crash that users who upgrade to 4.0 will hit a lot if they start changing the toolbar customization.  Specifically, this happens while showing the menu bar, so if a user doesn't like the Firefox button and want to go back to the menu bar, they are likely to hit this crash.
Summary: Firefox 4.0 Crash [@ nsEditor::Init(nsIDOMDocument*, nsIPresShell*, nsIContent*, nsISelectionController*, unsigned int) ] → Firefox 4.0 Crash [@ nsEditor::Init(nsIDOMDocument*, nsIPresShell*, nsIContent*, nsISelectionController*, unsigned int) ] with nsXULElement::UnsetAttr on the stack
just to be clear the volume of the crash is a function of the number of people
playing with toolbar customizations.

here is the scenario that we want to avoid.

1. people hear about firefox and think of upgrading.
2. the message in the press/blogs/and our faq is a) lots of changes to the UI
that we hope you like. but b) if you don't you can customize.
3. people download, have a look, decide they don't like, and start customizing
4. they crash and send us comments like:

-menu bar crashed it
-Crashed while attempting to move toolbars.
-I was toggling the menu toolbar.
-I moved the address-tool to the menu bar..
-wanted to show the menu from right click after customizing the toolbars
-Clicked 'Menu Bar' while customizing toolbars
-Playing around with customizing the toolbars when this occured.

5. and the crash again, and again
- I was experimenting with disabling the menu bar. This is the second crash in
a row doing that. The crash happened exactly when I tried to uncheck "Menu
bar".
- Third time for this error. I'm going to guess it happened because I had the
Add-ons Manager page open while trying to uncheck Menu Barin the Toolbars Menu
- Fourth time for this error. Seems not to be connected to have the Add-ons
Manager page opened as I surmised previously. Just had a Salon website page
open. I'm going to restart Firefox and then close it before trying this again.
- Fifth time with this error of crashing when trying to disable the Menu bar.
The crash happened while not doing anything and with only a blank page
displayed. I'll try a restart of Windows 7.

6. and if they are unlucky as mentioned in comment 6 they completely hose their
installation. 
--A bad thing about this crash is that if the user manages to keep the menu bar
shown, they will probably keep hitting this over and over again on every
startup.

all this adds up to a broken brand promise (users can customize if they don't
like the new UI) at a mininum, and potentially a lost user out of frustration
of crash early in adoption, in a highly visible area, and maybe a totally
unusable browser.

if this hits a reporter or blogger the message gets amplified.

date     crashes at
        
nsEditor::Init.nsIDOMDocument.,.nsIPresShell.,.nsIContent.,.nsISelectionController.,.unsigned.int.

20110301 1
20110302 1
20110303 5
20110304 4
20110305 0
20110306 2
20110307 2
20110308 5                   ~14,338 users
20110309 34 <- RC1 released  ~25,360 users
20110310 150                 ~320,448 users

 also another 42 crashes since midnight last night.
Keywords: qawanted
(In reply to comment #21)
> If you sort by date you can see that its fewer people crashing over and over.
> Its probably people trying to fiddle with the toolbar and crashing and then
> eventually giving up twiddling with it (looks like 3-4 attempts until people
> give up).

Do we know that these people who are crashing only 3-4 times aren't experiencing it on startup? If they do, that could be an indication that they're giving up on Firefox 4, rather than just giving up on the idea of customizing the toolbar.
I just did some testing. This is easy to reproduce.
1. Customize toolbar
2. Drag location or search bar into menu bar
3. Click done
4. Show the menu bar
5. Crash

On all of my tests, I get the crash reporter and click "Restart Firefox". When Firefox starts, my toolbars are reset and don't crash again. If however, I say, "that didn't stick, let me try it again" I'll certainly crash again (and again until I get tired of that and give up).
I don't see a lot of startup crashes for this. My menu changes didn't stick when I reproduced this bug and I didn't crash after restarting.
I'll revise my forumla for crash volume on this.

 crash_volume =   (volume & effectiveness, of the customizability mkting msg)
                * (users that aren't as comfortable with any of the new UI )
                * (users that learn of/play with customization in post b12 blds)

dail up any of those and crash volume increases.  the only control we have on this after release is the first item, and I'm not sure we want to take customization off the table.
(In reply to comment #34)
> (In reply to comment #21)
> > If you sort by date you can see that its fewer people crashing over and over.
> > Its probably people trying to fiddle with the toolbar and crashing and then
> > eventually giving up twiddling with it (looks like 3-4 attempts until people
> > give up).
> 
> Do we know that these people who are crashing only 3-4 times aren't
> experiencing it on startup? If they do, that could be an indication that
> they're giving up on Firefox 4, rather than just giving up on the idea of
> customizing the toolbar.

I agree with Verdi.  I haven't managed to trigger this crash at startup so far.

I think Verdi's theory in comment 35 makes sense.
Would it be a good idea for us to ask the QA team to verify this fix on mozilla-central, to make sure that all of the crashes when customizing the toolbar have been fixed?  (In case we decide to get this on RC2).
I have been testing Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b13pre) Gecko/20110311 Firefox/4.0b13pre this morning and haven't yet been able to repro the crash. We should make sure to check Vista and XP as there are crashes there too.

(In reply to comment #39)
> Would it be a good idea for us to ask the QA team to verify this fix on
> mozilla-central, to make sure that all of the crashes when customizing the
> toolbar have been fixed?  (In case we decide to get this on RC2).
The tests I did earlier were with RC1 on Windows 7.

I went back and tested RC1 on XP and can reproduce if you first hide the menu bar and then repeat the steps in Comment 35.
This really seems pretty bad to me, and it's very scary that we regressed this so late in the game which means that we have less data on how it affects people.

One possible solution would be to back out the patch in bug 637214. This would be safer and require less testing if we respin, but of course it also fixes an important bug.

I think I'm in favor of respinning to take this :(
Here's a risk analysis on the patch for this bug.

The patch modifies the UnsetAttr call on nsXULElement.  This affects all places in Gecko where we remove an attribute on a XUL element.  (Therefore, it has no effect on web content).

This bug can only be triggered when we remove the chromemargin attribute from a XUL element (most likely the document element).  The only place in Firefox when we do this <http://mxr.mozilla.org/mozilla-central/search?string=chromemargin> seems to be when calling updateAppButtonDisplay in browser.js <http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser.js#5143>, which is done when a new window is opened, when you toggle the display of a toolbar (including the menu bar), and when you enter and exit print preview.  The code in questions only runs on Windows and Linux, as far as I can tell.

The fix essentially makes sure that when the blocked script runners run, we can't execute scripts as a result of XBL bindings being instantiated.  The patch moves the auto update batch object a little bit earlier in the function.  So, the only potential effects of the patch can be on one of the below two function calls:

* nsDocument::RemoveFromIdTable <http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsDocument.cpp#2635>.  Investigating this function, the only interesting part seems to be the call to nsIdentifierMapEntry::FireChangeCallbacks.  The only cases where that is a not a no-op is when we have an HTML form element with an ID (which doesn't happen in Firefox's XUL code), or when we have SVG or SMILE effects which reference another element in the XUL document with an ID attribute, and that ID attribute is removed from the XUL document.  I investigated all of the SVG files which we use in the theme, but I couldn't find any place where we do this.

* nsAttrAndChildArray::IndexOfAttr <http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsAttrAndChildArray.cpp#538> which is no-op as far as this is concerned.


With this detailed analysis, I think that the patch is very safe to take on the relbranch.  I'm certain that this addresses the crash correctly, since I know the exact cause of the crash, and have verified the fix in extensive testing.  And it also preserves the protection added to nsXULElement::UnsetAttr in bug 637214.

So, to conclude, I'd recommend taking this on the relbranch if the drivers deem this bug as a blocker for the release of Firefox 4.
First and foremost, thanks for all the detailed analysis.

It is my understanding (confirmed by my own local testing) that this crash only happens:

 - on Windows only,
 - when a user customizes their Firefox UI such that the Location Bar or Search Bar (or presumably any other widget that taxes text input) is placed in the menuBar,
 - and when the user then toggles the setting of "View > Menu Bar"

Note that if the menuBar is hidden and the user customizes their UI to put the location bar in the menuBar, then presses Alt to show the menuBar, everything works as one might expect. It's only if the user then toggles the setting to show/hide the menuBar that there is a crash. When the user restores, in my experience (though this might be because I toggle the setting right away) the browser is restored to a good state previous to the customization that caused the crash.

This is definitely user-doc-needed so that the SUMO guys know about it, but there is no chance we would hold the release for it.

Feel free to renominate if I got any of that wrong. :)
blocking2.0: ? → .x+
(In reply to comment #40)
> I have been testing Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b13pre)
I guess I assumed this might be in the latest nightly, but after reproducing it using http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6a898157f842&tochange=823105711a3b and viewing http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6a898157f842&tochange=823105711a3b, I guess the checkin did not make the latest nightly.

> Gecko/20110311 Firefox/4.0b13pre this morning and haven't yet been able to
> repro the crash. We should make sure to check Vista and XP as there are crashes
> there too.
> 
> (In reply to comment #39)
> > Would it be a good idea for us to ask the QA team to verify this fix on
> > mozilla-central, to make sure that all of the crashes when customizing the
> > toolbar have been fixed?  (In case we decide to get this on RC2).
(In reply to comment #44)
> First and foremost, thanks for all the detailed analysis.
> 
> It is my understanding (confirmed by my own local testing) that this crash only
> happens:
> 
>  - on Windows only,
>  - when a user customizes their Firefox UI such that the Location Bar or Search
> Bar (or presumably any other widget that taxes text input) is placed in the
> menuBar,
>  - and when the user then toggles the setting of "View > Menu Bar"

That's correct.

> Note that if the menuBar is hidden and the user customizes their UI to put the
> location bar in the menuBar, then presses Alt to show the menuBar, everything
> works as one might expect. It's only if the user then toggles the setting to
> show/hide the menuBar that there is a crash.

Specifically, the thing which triggers the crash is toggling the display of the Firefox button (your description is also correct, wanted to be more explicit).

> When the user restores, in my
> experience (though this might be because I toggle the setting right away) the
> browser is restored to a good state previous to the customization that caused
> the crash.

Yes, I think that's correct, because the crash happens a bit before we get a chance to write the new state to disk for the next startup.

> This is definitely user-doc-needed so that the SUMO guys know about it, but
> there is no chance we would hold the release for it.

Should we also relnote it?

> Feel free to renominate if I got any of that wrong. :)

Seems like you've got all of the facts right.  :-)
blocking2.0: .x+ → ?
(In reply to comment #45)
> (In reply to comment #40)
> > I have been testing Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b13pre)
> I guess I assumed this might be in the latest nightly, but after reproducing it
> using
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6a898157f842&tochange=823105711a3b
> and viewing
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6a898157f842&tochange=823105711a3b,
> I guess the checkin did not make the latest nightly.

Yes, unfortunately the nightly was built from the previous check-in.  The fix will be in tomorrow's nightly.
Why was this re-nominated again?
(In reply to comment #48)
> Why was this re-nominated again?

Sorry, my mistake.
blocking2.0: ? → .x+
(In reply to comment #46)
> (In reply to comment #44)
> > This is definitely user-doc-needed so that the SUMO guys know about it, but
> > there is no chance we would hold the release for it.

Verdi, let's discuss how we'll document this on SUMO in http://support.mozilla.com/forums/knowledge-base-articles. This should probably be a new article, but we'll have to make it clear for localizers that this bug won't exist very long so they're aware of that when localizing it...

> 
> Should we also relnote it?

I definitely think we should.
(In reply to comment #44)
> First and foremost, thanks for all the detailed analysis.
> 
> It is my understanding (confirmed by my own local testing) that this crash only
> happens:
> 
>  - on Windows only,
>  - when a user customizes their Firefox UI such that the Location Bar or Search
> Bar (or presumably any other widget that taxes text input) is placed in the
> menuBar,
>  - and when the user then toggles the setting of "View > Menu Bar"

One thing to keep in mind is that the menu bar shows up automatically when customizing the toolbars. This means that it's probably more common than we might think that a user will drop things on that seemingly empty space above the normal toolbar -- probably thinking it will make the whole layout more space efficient. When done customizing, that whole row of stuff you just moved completely disappears. From an uninformed user's point of view, I can see how that might be very confusing and cause you to look for where it went.

My point is that the user crash scenario isn't as rare as it might seem. The good thing is that it's easy to restore the toolbars after you've crashed, so it's not a very critical crash in that sense -- but the typical recovery will probably be to try to show the menu bar, fail, try again, fail, and then hopefully restore the toolbar layout (because if not, you'll effectively lose the piece of UI you moved to the menu bar).

On SUMO, we should probably look for both users complaining about crashing as well as users complaining about toolbar UI elements being gone.
Some preliminary testing using Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b13pre) Gecko/20110312 Firefox/4.0b13pre looks good - I have not yet been able to crash using the STR in Comment 5.
This has been fixed on mozilla-central for some time.
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Whiteboard: [watching crash-stats][fx4-rc-ridealong][fixed on m-c][regressed after b12] → [watching crash-stats][fx4-rc-ridealong][regressed after b12]
Target Milestone: --- → mozilla2.2
Just as a noted, this spiked significantly on FF4 release day, to 1127 total crashes or 188 crashes per million 4.0* ADU, ranking as the #16 topcrash on 4.0* on that day.
blocking2.0: .x+ → Macaw
Attachment #518657 - Flags: approval2.0?
Attachment #518583 - Flags: approval2.0+
Comment on attachment 518657 [details] [diff] [review]
For check-in on mozilla-central

Approved for the mozilla2.0 repository, a=dveditz for release-drivers
Attachment #518657 - Flags: approval2.0? → approval2.0+
Crash Signature: [@ nsEditor::Init(nsIDOMDocument*, nsIPresShell*, nsIContent*, nsISelectionController*, unsigned int) ]
You need to log in before you can comment on or make changes to this bug.