Firefox crashes during Addons -> Find Update [@nsQueryInterface::operator()]

VERIFIED FIXED in mozilla1.9beta3

Status

()

Toolkit
Add-ons Manager
P1
critical
VERIFIED FIXED
10 years ago
7 years ago

People

(Reporter: aja+bugzilla, Assigned: jst)

Tracking

({crash, regression})

Trunk
mozilla1.9beta3
crash, regression
Points:
---
Dependency tree / graph
Bug Flags:
blocking1.9 +
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

(crash signature)

Attachments

(5 attachments, 2 obsolete attachments)

(Reporter)

Description

10 years ago
crashes during addons -> Find Update

Regression info:
20080129_0533_firefox-3.0b3pre.en-US.win32.zip OK
20080129_0733_firefox-3.0b3pre.en-US.win32.zip Crashing

which looks like it would narrow it down to either:

Bug 402252 - application details should be accessible from prefs UI.

or

Bug 391002 - "broadcaster/command element failed to re-forward all attributes to the target element"
Flags: blocking-firefox3?
Keywords: crash
Priority: -- → P1
Target Milestone: --- → Firefox 3 beta3
(Reporter)

Comment 1

10 years ago
Apparently on Linux as well
OS: Windows XP → All
(Reporter)

Comment 2

10 years ago
Report ID                        	Date Submitted
7c6add80-cee2-11dc-9071-001a4bd43e5c	2008-01-29	21:21
f59751a3-cee0-11dc-ac82-001a4bd43ef6	2008-01-29	21:10
98ab0614-cee0-11dc-bc3d-001a4bd43ef6	2008-01-29	21:08
5871effa-cedf-11dc-b077-001a4bd43ed6	2008-01-29	20:59
30a73d78-ced9-11dc-9421-001a4bd43e5c	2008-01-29	20:15
00044838-ced8-11dc-98b8-001a4bd43ed6	2008-01-29	20:06
7b8a3c28-ced4-11dc-9861-001a4bd43ed6	2008-01-29	19:41
20db8863-ced3-11dc-84e0-001a4bd43ef6	2008-01-29	19:31
1a8ef4fd-ced1-11dc-aa73-001a4bd43ef6	2008-01-29	19:17
5d3d4ef8-ced0-11dc-b590-001a4bd46e84	2008-01-29	19:12
Please can you provide breakpad ids from a nightly build.
(In reply to comment #3)
> Please can you provide breakpad ids from a nightly build.

It just landed this morning...
(Reporter)

Comment 5

10 years ago
My extension list, in case it may help:

Console² 0.3.9
DOM Inspector 1.9b3pre
Firebug 1.1.0b10 [DISABLED]
Firefox Accessibility Extension 1.3.0.2
Link Widgets 1.6
Nightly Tester Tools 1.3b4
Operator 0.9b
printpdf 0.75
RAMBack 1.0
Update Channel Selector 1.0.2
Web Developer 1.1.4
YSlow 0.9.2 [DISABLED]
Severity: normal → major
Peter, are you seeing this as well?
(Reporter)

Comment 7

10 years ago
FWIW, pal-moz confirmed on win, hhh on linux.

Updated

10 years ago
Severity: major → critical
Keywords: regression
Blocking for now. Is the crash 100% reproducible?
Flags: blocking-firefox3? → blocking-firefox3+
Flags: blocking-firefox3+ → blocking-firefox3?
See comment #8 - blocking was granted, then someone set the 'request blocking' flag after blocking was granted and knocked off the blocking flag.

Needs to be reset.

(Reporter)

Comment 10

10 years ago
from IRC:

[07:22]	<arno>	reed: I made a trace, and yes, it really looks like my patch causes bug 414747
[07:22]	<arno>	I'm at my paid work, I will have time to investigate more tonight

---

Will post a few crash reports using the nightly soon.
(Reporter)

Updated

10 years ago
Blocks: 391002

Updated

10 years ago
Duplicate of this bug: 414805

Comment 14

10 years ago
I confirm the crash too, but it wasnt because of Find Update.   This one just happened for me as well on Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9b3pre) Gecko/2008013004 Minefield/3.0b3pre.   Please change the bug title to reflect the crash.

Stack:
http://crash-stats.mozilla.com/report/index/ee79c517-cf57-11dc-8307-001a4bd43ef6


0  	nsQueryInterface::operator()(nsID const&, void**) const  	 nsCOMPtr.cpp:47
1 	nsCOMPtr_base::assign_from_qi(nsQueryInterface, nsID const&) 	nsCOMPtr.cpp:96
2 	nsXULDocument::AttributeChanged(nsIDocument*, nsIContent*, int, nsIAtom*, int, unsigned int) 	mozilla/content/xul/document/src/nsXULDocument.cpp:645
3 	nsNodeUtils::AttributeChanged(nsIContent*, int, nsIAtom*, int, unsigned int) 	mozilla/content/base/src/nsNodeUtils.cpp:107
4 	nsGenericElement::SetAttrAndNotify(int, nsIAtom*, nsIAtom*, nsAString_internal const&, nsAttrValue&, int, int, int) 	mozilla/content/base/src/nsGenericElement.cpp:3716
5 	nsGenericElement::SetAttr(int, nsIAtom*, nsIAtom*, nsAString_internal const&, int) 	mozilla/content/base/src/nsGenericElement.cpp:3647
6 	nsGenericElement::SetAttribute(nsAString_internal const&, nsAString_internal const&) 	mozilla/content/base/src/nsGenericElement.cpp:396

Updated

10 years ago
Hardware: PC → All
Summary: crashes during addons -> Find Update → Firefox crashes during Addons -> Find Update [@nsQueryInterface::operator()]
Since this requires users to check for updates, which isn't a hugely common action, I'm not sure we *need* it to ship. Very interested in a patch if one comes along, though, please nominate for approval1.9b3?
Flags: blocking-firefox3? → blocking-firefox3+
Target Milestone: Firefox 3 beta3 → Firefox 3 beta4
(In reply to comment #15)
> Since this requires users to check for updates, which isn't a hugely common
> action, I'm not sure we *need* it to ship. Very interested in a patch if one
> comes along, though, please nominate for approval1.9b3?

There are alternate STR in bug 414805 that make this more of a real problem and not just limited to manually checking for updates, so I think this needs to block beta 3. If we can't come up with a fix for it, we could probably just back out bug 391002.
Target Milestone: Firefox 3 beta4 → Firefox 3 beta3
Hm, combining the windows and OSX stacks (which seem to fail at different steps) this becomes a topcrasher for beta 3. I think Reed's right!

Comment 18

10 years ago
Created attachment 300493 [details]
testcase

the bug seems to appear in nsXULDocument::AttributeChanged when you synchronize broadcasters and listeners. If a broadcaster is still associated (with mBroadcasterMap) with a listener that has been deleted, as you try to access it, Firefox sometimes crashes. Sometimes, it does not crash, so you may need to reload testcases a few times.
This is a really nasty testcase arno. When I want to close the tab which holds the testcase by middle-click or via the close button Firefox also crashes. The crasherid is not available yet. Just adding it for further investigation:

bp-2f963b45-cf85-11dc-a22f-001a4bd46e84

Comment 20

10 years ago
Created attachment 300500 [details]
testcase v2

sorry, here is a testcase that tries to crash only if you explicitly asks for it.
Attachment #300493 - Attachment is obsolete: true

Updated

10 years ago
Blocks: 395671

Comment 21

10 years ago
Created attachment 300520 [details] [diff] [review]
modification that seems to fix crash

I think that bug is a revival of bug 18127. 18127's testcase (slightly modified) made my firefox crash. That bug occurs because mBroadcasterMap entries do not have an owning reference to listeners. Crash was even "predicted": http://mxr.mozilla.org/seamonkey/source/docshell/base/nsDocShell.cpp#4368

After applying that small patch, I could not reproduce crash. But I don't understand xpcom enough to known if it could cause some memory leaks.

Comment 22

10 years ago
Created attachment 300522 [details] [diff] [review]
 modification that seems to fix crash
Attachment #300520 - Attachment is obsolete: true
(Assignee)

Comment 23

10 years ago
Comment on attachment 300522 [details] [diff] [review]
 modification that seems to fix crash  

Unless proven otherwise, I say we can't take this due to the leaks it will introduce.

If we do this we'd need to add cycle collection magic to this code too. Peterv says that wouldn't be too hard...
Attachment #300522 - Flags: superreview-
(Assignee)

Comment 24

10 years ago
Created attachment 300546 [details] [diff] [review]
Same thing, with cycle collection to avoid leaks.
Attachment #300546 - Flags: superreview?(peterv)
Attachment #300546 - Flags: review?(peterv)
Attachment #300546 - Flags: superreview?(peterv)
Attachment #300546 - Flags: superreview+
Attachment #300546 - Flags: review?(peterv)
Attachment #300546 - Flags: review+
(Assignee)

Updated

10 years ago
Attachment #300546 - Flags: approval1.9b3?

Comment 25

10 years ago
When updating from Firefox 3.0 beta 2 to Firefox 3.0 beta 3 and Firefox asks the user if they want to check for updates, because they have incompatable add-ons, will Firefox crash if the user tells Firefox to check for an update?
If so, this should definently be a beta 3 blocker.  Especially if the user should update an extension to fix a leak, security issue or whatever else you can think up that makes for a good need to update add-ons situation.
Comment on attachment 300546 [details] [diff] [review]
Same thing, with cycle collection to avoid leaks.

a=beltzner for beta 3
Attachment #300546 - Flags: approval1.9b3? → approval1.9b3+
Keywords: checkin-needed
Landed.

Checking in nsXULDocument.cpp;
/cvsroot/mozilla/content/xul/document/src/nsXULDocument.cpp,v  <--  nsXULDocument.cpp
new revision: 1.799; previous revision: 1.798
done
Keywords: checkin-needed

Updated

10 years ago
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
Flags: in-testsuite?
(Reporter)

Comment 28

10 years ago
WFM now with:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b3pre) Gecko/2008013022 Minefield/3.0b3pre ID:2008013022
Status: RESOLVED → VERIFIED
Assignee: nobody → jst
An alternative solution would have been to use nsWeakRef as all nodes support that now. I'm actually a little worried that the current solution will introduce leaks given that i've seen cycles involving broadcast listeners and controllers before. In fact we manually still have to break cycles involving controllers from nsDocument::Destroy. Not sure if broadcast listener cycles are less likely to involve random other objects though.

Comment 30

10 years ago
Yes, this is of course not the right fix, just wallpaper.

The problem was caused because we had a stale BroadcastListener object. When
removing the button from the document, this breaks the broadcaster relation,
which arno decided should remove all the broadcast attributes. However, we also
check the broadcaster hookup when any attribute changes. This promptly restores
the broadcaster we just removed.

If instead we only check the broadcaster hookup when an observes or command
attribute changes then the problem should not occur.

We should also assert in ~nsXULDocument that the broadcaster hash is empty.

Comment 31

10 years ago
> The problem was caused because we had a stale BroadcastListener object. When
> removing the button from the document, this breaks the broadcaster relation,
> which arno decided should remove all the broadcast attributes.

A first, I found it more coherent to remove broadcast attributes (see bug 391002 comment 13).  But that seems to cause a lot of problems, including bug 414846 and bug 414907. So, now, I think it's better to not remove those attributes. Shall I reopen bug 391002 ?

Comment 32

10 years ago
Created attachment 300642 [details] [diff] [review]
Attempt to fix the underlying cause of the bug

Note that although I don't crash I still see the assertion on shutdown.
Perhaps I should just change it to a warning?
Attachment #300642 - Flags: superreview?(jst)
Attachment #300642 - Flags: review?(jst)

Comment 33

10 years ago
Oh, and in case it's not clear from the CVS diff, my build doesn't have attachment 300456 [details] [diff] [review], and I still don't crash with my patch.
(Assignee)

Comment 34

10 years ago
Comment on attachment 300642 [details] [diff] [review]
Attempt to fix the underlying cause of the bug

r+sr=jst. And after talking to Jonas we should also undo the first fix for this bug and replace the strong reference with an nsWeakRef to avoid adding new cycles, whether they're right now potentially causing leaks or not. And that way we won't be keeping objects alive longer etc either.

Neil, wanna revert the first patch and replace the strong reference that it added with an nsWeakRef while you're here? If not, let me know and I'll do that part...
Attachment #300642 - Flags: superreview?(jst)
Attachment #300642 - Flags: superreview+
Attachment #300642 - Flags: review?(jst)
Attachment #300642 - Flags: review+

Comment 35

10 years ago
(In reply to comment #34)
>Neil, wanna revert the first patch and replace the strong reference that it
>added with an nsWeakRef while you're here?
I'm happy to revert the first patch, but until I crash, I can't be bothered to switch to an nsWeakRef ;-)

I'm also not sure whether my assertion is bogus yet. The testcase itself doesn't trigger it, but the chrome sure does!
Please don't add back the raw pointer. It even has a comment indicating that it's a crash waiting to happen. We've had crashes with broadcast classes in the past due to rawpointers. It would suck having to do a security drill just because we didn't fix it now.
Attachment #300642 - Flags: approval1.9b3?
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Comment on attachment 300642 [details] [diff] [review]
Attempt to fix the underlying cause of the bug

a=beltzner
Attachment #300642 - Flags: approval1.9b3? → approval1.9b3+

Updated

10 years ago
Blocks: 413631
I asked jst on IRC about the last patch, and he said it could go ahead and land, so I went ahead and landed it to get some time baking before b3 builds start. jst said to leave the first patch in for now, so I didn't back it out. The other issues need to be resolved, though, so I'm leaving this open until then (unless somebody wants to file a follow-up).

Checking in content/xul/document/src/nsXULDocument.cpp;
/cvsroot/mozilla/content/xul/document/src/nsXULDocument.cpp,v  <--  nsXULDocument.cpp
new revision: 1.800; previous revision: 1.799
done
Neil's patch caused test failures with the test added in bug 391002, so I backed it out. :(

*** 11551 INFO Running /tests/content/xul/document/test/test_bug391002.xul...
*** 11552 INFO PASS | undefined
*** 11553 INFO PASS | undefined
*** 11554 ERROR FAIL | undefined | got "", expected "cmd2" | /tests/content/xul/document/test/test_bug391002.xul
*** 11555 ERROR TODO WORKED? | undefined | /tests/content/xul/document/test/test_bug391002.xul
I'm really worried that we're introducing a pile of leaks right before shipping beta3. It should be trivial to use an nsWeakRef rather than an nsCOMPtr. I guess I'll whip up a patch myself.

Comment 41

10 years ago
As far as I understand, neil's patches is useful because at the moment:
when removing an broadcast/listener relation, it removes attributes, and therefore, triggers CheckBroadcasterHookup, and causes broadcast/listener relation to be set again.

But in patch for bug 414907, we do not remove attributes anymore. So is neil's patch redudant with bug 414907 ?
(Assignee)

Comment 43

10 years ago
Comment on attachment 301391 [details] [diff] [review]
Use nsWeakRef

r+sr=jst, but you could leave the changes like this in:

     BroadcasterMapEntry* entry =
         static_cast<BroadcasterMapEntry*>
                    (PL_DHashTableOperate(mBroadcasterMap, aBroadcaster,
-                                         PL_DHASH_LOOKUP));
+                                            PL_DHASH_LOOKUP));
Attachment #301391 - Flags: superreview?(jst)
Attachment #301391 - Flags: superreview+
Attachment #301391 - Flags: review?(jst)
Attachment #301391 - Flags: review+
Comment on attachment 301391 [details] [diff] [review]
Use nsWeakRef

a=mconnor on behalf of drivers
Attachment #301391 - Flags: approval1.9b3+

Comment 45

10 years ago
Should this be marked as fixed?
i'm not crashing anymore after the fix was checked in yesterday with the steps to reproduce and Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9b3) Gecko/2008020418 Firefox/3.0b3
(Reporter)

Comment 47

10 years ago
(In reply to comment #45)
> Should this be marked as fixed?
> 

Resolving FIXED.  If this was being kept open for additional broadcast/listener reasons, please feel free to reopen / morph the title / file followups.
As per comment 28, it's been fixed for me since attachment 300546 [details] [diff] [review] landed. 
Status: REOPENED → RESOLVED
Last Resolved: 10 years ago10 years ago
Resolution: --- → FIXED

Comment 48

10 years ago
*** VERIFIED

Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9b3) Gecko/2008020514 Firefox/3.0b3

Updated

10 years ago
Status: RESOLVED → VERIFIED
Product: Firefox → Toolkit
This is for the old UI. We have tests that verify clicking the equivalent button in the new UI which is probably all we can do here I think.
Flags: in-testsuite? → in-testsuite+
Crash Signature: [@nsQueryInterface::operator()]
You need to log in before you can comment on or make changes to this bug.