Closed Bug 414747 Opened 17 years ago Closed 17 years ago

Firefox crashes during Addons -> Find Update [@nsQueryInterface::operator()]

Categories

(Toolkit :: Add-ons Manager, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla1.9beta3

People

(Reporter: aja+bugzilla, Assigned: jst)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(5 files, 2 obsolete files)

crashes during addons -> Find Update Regression info: 20080129_0533_firefox-3.0b3pre.en-US.win32.zip OK 20080129_0733_firefox-3.0b3pre.en-US.win32.zip Crashing which looks like it would narrow it down to either: Bug 402252 - application details should be accessible from prefs UI. or Bug 391002 - "broadcaster/command element failed to re-forward all attributes to the target element"
Flags: blocking-firefox3?
Keywords: crash
Priority: -- → P1
Target Milestone: --- → Firefox 3 beta3
Apparently on Linux as well
OS: Windows XP → All
Report ID Date Submitted 7c6add80-cee2-11dc-9071-001a4bd43e5c 2008-01-29 21:21 f59751a3-cee0-11dc-ac82-001a4bd43ef6 2008-01-29 21:10 98ab0614-cee0-11dc-bc3d-001a4bd43ef6 2008-01-29 21:08 5871effa-cedf-11dc-b077-001a4bd43ed6 2008-01-29 20:59 30a73d78-ced9-11dc-9421-001a4bd43e5c 2008-01-29 20:15 00044838-ced8-11dc-98b8-001a4bd43ed6 2008-01-29 20:06 7b8a3c28-ced4-11dc-9861-001a4bd43ed6 2008-01-29 19:41 20db8863-ced3-11dc-84e0-001a4bd43ef6 2008-01-29 19:31 1a8ef4fd-ced1-11dc-aa73-001a4bd43ef6 2008-01-29 19:17 5d3d4ef8-ced0-11dc-b590-001a4bd46e84 2008-01-29 19:12
Please can you provide breakpad ids from a nightly build.
(In reply to comment #3) > Please can you provide breakpad ids from a nightly build. It just landed this morning...
My extension list, in case it may help: Console² 0.3.9 DOM Inspector 1.9b3pre Firebug 1.1.0b10 [DISABLED] Firefox Accessibility Extension 1.3.0.2 Link Widgets 1.6 Nightly Tester Tools 1.3b4 Operator 0.9b printpdf 0.75 RAMBack 1.0 Update Channel Selector 1.0.2 Web Developer 1.1.4 YSlow 0.9.2 [DISABLED]
Severity: normal → major
Peter, are you seeing this as well?
FWIW, pal-moz confirmed on win, hhh on linux.
Severity: major → critical
Keywords: regression
Blocking for now. Is the crash 100% reproducible?
Flags: blocking-firefox3? → blocking-firefox3+
Flags: blocking-firefox3+ → blocking-firefox3?
See comment #8 - blocking was granted, then someone set the 'request blocking' flag after blocking was granted and knocked off the blocking flag. Needs to be reset.
from IRC: [07:22] <arno> reed: I made a trace, and yes, it really looks like my patch causes bug 414747 [07:22] <arno> I'm at my paid work, I will have time to investigate more tonight --- Will post a few crash reports using the nightly soon.
Blocks: 391002
I confirm the crash too, but it wasnt because of Find Update. This one just happened for me as well on Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9b3pre) Gecko/2008013004 Minefield/3.0b3pre. Please change the bug title to reflect the crash. Stack: http://crash-stats.mozilla.com/report/index/ee79c517-cf57-11dc-8307-001a4bd43ef6 0 nsQueryInterface::operator()(nsID const&, void**) const nsCOMPtr.cpp:47 1 nsCOMPtr_base::assign_from_qi(nsQueryInterface, nsID const&) nsCOMPtr.cpp:96 2 nsXULDocument::AttributeChanged(nsIDocument*, nsIContent*, int, nsIAtom*, int, unsigned int) mozilla/content/xul/document/src/nsXULDocument.cpp:645 3 nsNodeUtils::AttributeChanged(nsIContent*, int, nsIAtom*, int, unsigned int) mozilla/content/base/src/nsNodeUtils.cpp:107 4 nsGenericElement::SetAttrAndNotify(int, nsIAtom*, nsIAtom*, nsAString_internal const&, nsAttrValue&, int, int, int) mozilla/content/base/src/nsGenericElement.cpp:3716 5 nsGenericElement::SetAttr(int, nsIAtom*, nsIAtom*, nsAString_internal const&, int) mozilla/content/base/src/nsGenericElement.cpp:3647 6 nsGenericElement::SetAttribute(nsAString_internal const&, nsAString_internal const&) mozilla/content/base/src/nsGenericElement.cpp:396
Hardware: PC → All
Summary: crashes during addons -> Find Update → Firefox crashes during Addons -> Find Update [@nsQueryInterface::operator()]
Since this requires users to check for updates, which isn't a hugely common action, I'm not sure we *need* it to ship. Very interested in a patch if one comes along, though, please nominate for approval1.9b3?
Flags: blocking-firefox3? → blocking-firefox3+
Target Milestone: Firefox 3 beta3 → Firefox 3 beta4
(In reply to comment #15) > Since this requires users to check for updates, which isn't a hugely common > action, I'm not sure we *need* it to ship. Very interested in a patch if one > comes along, though, please nominate for approval1.9b3? There are alternate STR in bug 414805 that make this more of a real problem and not just limited to manually checking for updates, so I think this needs to block beta 3. If we can't come up with a fix for it, we could probably just back out bug 391002.
Target Milestone: Firefox 3 beta4 → Firefox 3 beta3
Hm, combining the windows and OSX stacks (which seem to fail at different steps) this becomes a topcrasher for beta 3. I think Reed's right!
Attached file testcase (obsolete) —
the bug seems to appear in nsXULDocument::AttributeChanged when you synchronize broadcasters and listeners. If a broadcaster is still associated (with mBroadcasterMap) with a listener that has been deleted, as you try to access it, Firefox sometimes crashes. Sometimes, it does not crash, so you may need to reload testcases a few times.
This is a really nasty testcase arno. When I want to close the tab which holds the testcase by middle-click or via the close button Firefox also crashes. The crasherid is not available yet. Just adding it for further investigation: bp-2f963b45-cf85-11dc-a22f-001a4bd46e84
Attached file testcase v2
sorry, here is a testcase that tries to crash only if you explicitly asks for it.
Attachment #300493 - Attachment is obsolete: true
I think that bug is a revival of bug 18127. 18127's testcase (slightly modified) made my firefox crash. That bug occurs because mBroadcasterMap entries do not have an owning reference to listeners. Crash was even "predicted": http://mxr.mozilla.org/seamonkey/source/docshell/base/nsDocShell.cpp#4368 After applying that small patch, I could not reproduce crash. But I don't understand xpcom enough to known if it could cause some memory leaks.
Attachment #300520 - Attachment is obsolete: true
Comment on attachment 300522 [details] [diff] [review] modification that seems to fix crash Unless proven otherwise, I say we can't take this due to the leaks it will introduce. If we do this we'd need to add cycle collection magic to this code too. Peterv says that wouldn't be too hard...
Attachment #300522 - Flags: superreview-
Attachment #300546 - Flags: superreview?(peterv)
Attachment #300546 - Flags: review?(peterv)
Attachment #300546 - Flags: superreview?(peterv)
Attachment #300546 - Flags: superreview+
Attachment #300546 - Flags: review?(peterv)
Attachment #300546 - Flags: review+
Attachment #300546 - Flags: approval1.9b3?
When updating from Firefox 3.0 beta 2 to Firefox 3.0 beta 3 and Firefox asks the user if they want to check for updates, because they have incompatable add-ons, will Firefox crash if the user tells Firefox to check for an update? If so, this should definently be a beta 3 blocker. Especially if the user should update an extension to fix a leak, security issue or whatever else you can think up that makes for a good need to update add-ons situation.
Comment on attachment 300546 [details] [diff] [review] Same thing, with cycle collection to avoid leaks. a=beltzner for beta 3
Attachment #300546 - Flags: approval1.9b3? → approval1.9b3+
Landed. Checking in nsXULDocument.cpp; /cvsroot/mozilla/content/xul/document/src/nsXULDocument.cpp,v <-- nsXULDocument.cpp new revision: 1.799; previous revision: 1.798 done
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Flags: in-testsuite?
WFM now with: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b3pre) Gecko/2008013022 Minefield/3.0b3pre ID:2008013022
Status: RESOLVED → VERIFIED
Assignee: nobody → jst
An alternative solution would have been to use nsWeakRef as all nodes support that now. I'm actually a little worried that the current solution will introduce leaks given that i've seen cycles involving broadcast listeners and controllers before. In fact we manually still have to break cycles involving controllers from nsDocument::Destroy. Not sure if broadcast listener cycles are less likely to involve random other objects though.
Yes, this is of course not the right fix, just wallpaper. The problem was caused because we had a stale BroadcastListener object. When removing the button from the document, this breaks the broadcaster relation, which arno decided should remove all the broadcast attributes. However, we also check the broadcaster hookup when any attribute changes. This promptly restores the broadcaster we just removed. If instead we only check the broadcaster hookup when an observes or command attribute changes then the problem should not occur. We should also assert in ~nsXULDocument that the broadcaster hash is empty.
> The problem was caused because we had a stale BroadcastListener object. When > removing the button from the document, this breaks the broadcaster relation, > which arno decided should remove all the broadcast attributes. A first, I found it more coherent to remove broadcast attributes (see bug 391002 comment 13). But that seems to cause a lot of problems, including bug 414846 and bug 414907. So, now, I think it's better to not remove those attributes. Shall I reopen bug 391002 ?
Note that although I don't crash I still see the assertion on shutdown. Perhaps I should just change it to a warning?
Attachment #300642 - Flags: superreview?(jst)
Attachment #300642 - Flags: review?(jst)
Oh, and in case it's not clear from the CVS diff, my build doesn't have attachment 300456 [details] [diff] [review], and I still don't crash with my patch.
Comment on attachment 300642 [details] [diff] [review] Attempt to fix the underlying cause of the bug r+sr=jst. And after talking to Jonas we should also undo the first fix for this bug and replace the strong reference with an nsWeakRef to avoid adding new cycles, whether they're right now potentially causing leaks or not. And that way we won't be keeping objects alive longer etc either. Neil, wanna revert the first patch and replace the strong reference that it added with an nsWeakRef while you're here? If not, let me know and I'll do that part...
Attachment #300642 - Flags: superreview?(jst)
Attachment #300642 - Flags: superreview+
Attachment #300642 - Flags: review?(jst)
Attachment #300642 - Flags: review+
(In reply to comment #34) >Neil, wanna revert the first patch and replace the strong reference that it >added with an nsWeakRef while you're here? I'm happy to revert the first patch, but until I crash, I can't be bothered to switch to an nsWeakRef ;-) I'm also not sure whether my assertion is bogus yet. The testcase itself doesn't trigger it, but the chrome sure does!
Please don't add back the raw pointer. It even has a comment indicating that it's a crash waiting to happen. We've had crashes with broadcast classes in the past due to rawpointers. It would suck having to do a security drill just because we didn't fix it now.
Attachment #300642 - Flags: approval1.9b3?
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Comment on attachment 300642 [details] [diff] [review] Attempt to fix the underlying cause of the bug a=beltzner
Attachment #300642 - Flags: approval1.9b3? → approval1.9b3+
Blocks: 413631
I asked jst on IRC about the last patch, and he said it could go ahead and land, so I went ahead and landed it to get some time baking before b3 builds start. jst said to leave the first patch in for now, so I didn't back it out. The other issues need to be resolved, though, so I'm leaving this open until then (unless somebody wants to file a follow-up). Checking in content/xul/document/src/nsXULDocument.cpp; /cvsroot/mozilla/content/xul/document/src/nsXULDocument.cpp,v <-- nsXULDocument.cpp new revision: 1.800; previous revision: 1.799 done
Neil's patch caused test failures with the test added in bug 391002, so I backed it out. :( *** 11551 INFO Running /tests/content/xul/document/test/test_bug391002.xul... *** 11552 INFO PASS | undefined *** 11553 INFO PASS | undefined *** 11554 ERROR FAIL | undefined | got "", expected "cmd2" | /tests/content/xul/document/test/test_bug391002.xul *** 11555 ERROR TODO WORKED? | undefined | /tests/content/xul/document/test/test_bug391002.xul
I'm really worried that we're introducing a pile of leaks right before shipping beta3. It should be trivial to use an nsWeakRef rather than an nsCOMPtr. I guess I'll whip up a patch myself.
As far as I understand, neil's patches is useful because at the moment: when removing an broadcast/listener relation, it removes attributes, and therefore, triggers CheckBroadcasterHookup, and causes broadcast/listener relation to be set again. But in patch for bug 414907, we do not remove attributes anymore. So is neil's patch redudant with bug 414907 ?
Comment on attachment 301391 [details] [diff] [review] Use nsWeakRef r+sr=jst, but you could leave the changes like this in: BroadcasterMapEntry* entry = static_cast<BroadcasterMapEntry*> (PL_DHashTableOperate(mBroadcasterMap, aBroadcaster, - PL_DHASH_LOOKUP)); + PL_DHASH_LOOKUP));
Attachment #301391 - Flags: superreview?(jst)
Attachment #301391 - Flags: superreview+
Attachment #301391 - Flags: review?(jst)
Attachment #301391 - Flags: review+
Comment on attachment 301391 [details] [diff] [review] Use nsWeakRef a=mconnor on behalf of drivers
Attachment #301391 - Flags: approval1.9b3+
Should this be marked as fixed?
i'm not crashing anymore after the fix was checked in yesterday with the steps to reproduce and Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9b3) Gecko/2008020418 Firefox/3.0b3
(In reply to comment #45) > Should this be marked as fixed? > Resolving FIXED. If this was being kept open for additional broadcast/listener reasons, please feel free to reopen / morph the title / file followups. As per comment 28, it's been fixed for me since attachment 300546 [details] [diff] [review] landed.
Status: REOPENED → RESOLVED
Closed: 17 years ago17 years ago
Resolution: --- → FIXED
*** VERIFIED Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9b3) Gecko/2008020514 Firefox/3.0b3
Status: RESOLVED → VERIFIED
Product: Firefox → Toolkit
This is for the old UI. We have tests that verify clicking the equivalent button in the new UI which is probably all we can do here I think.
Flags: in-testsuite? → in-testsuite+
Crash Signature: [@nsQueryInterface::operator()]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: