Closed Bug 718284 Opened 12 years ago Closed 12 years ago

Cycle collector crash when using the default Wikipedia(en) search plugin with HTTPS-Everywhere

Categories

(Core :: DOM: Events, defect)

Version: 11 Branch
Type: defect
Priority: Not set
Severity: critical

Tracking

()

VERIFIED FIXED
mozilla13
Tracking Status
firefox10 --- unaffected
firefox11 + verified
firefox12 --- verified
firefox13 --- verified

People

(Reporter: Fanolian+BMO, Assigned: mccr8)

References

Details

(5 keywords, Whiteboard: [qa!])

Crash Data

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0a1) Gecko/20120114 Firefox/12.0a1
Build ID: 20120114031054

Steps to reproduce:

1. In a new profile, install HTTPS-Everywhere 1.2.2 or 2.0development.4.
2. In the HTTPS-Everywhere options, make sure "Wikipedia" is checked (enabled).
3. From the default Wikipedia(en) search plugin in the searchbar, search for "cat", "dog", "doll" or any keywords (without quotes).


Actual results:

More than half of the time, Nightly crashes with the signature GCGraphBuilder::NoteXPCOMChild(nsISupports*). If it does not crash the first time, reopen Nightly in a new session and search from the searchbar again; Nightly will crash eventually.
I haven't been able to reproduce it in a vanilla Nightly yet, nor by disabling the Wikipedia ruleset in HTTPS-Everywhere.


Expected results:

Even if it is the HTTPS-Everywhere ruleset that is responsible for the malfunction, Nightly should not crash.


Here is the crash report following the above steps:
https://crash-stats.mozilla.com/report/index/bp-894f0833-ff02-4701-82ec-2916f2120115
Crash report in my main profile:
https://crash-stats.mozilla.com/report/index/bp-b9d31430-1522-4e43-a8db-8e9602120115

Selected crash reports from other users similar to my case (Wikipedia + HTTPS-Everywhere), found in the Comments section of https://crash-stats.mozilla.com/report/list?query_search=signature&query_type=contains&reason_type=contains&range_value=4&range_unit=weeks&hang_type=any&process_type=any&signature=GCGraphBuilder%3A%3ANoteXPCOMChild%28nsISupports*%29

"wikipedia, ssl everywhereenabled": https://crash-stats.mozilla.com/report/index/99cfaf69-aad2-4512-a31d-d8d682111221

"This crash occurs every time I attempt to search Wikipedia via the search bar plugin. It does not occur when I use other searches from the same bar. It does not occur when I navigate to Wikipedia using the URL bar or search via the Wikipedia landing page.": https://crash-stats.mozilla.com/report/index/8f05f424-ef7b-42f9-826b-486502120108

"Opening Wikipedia search results page from search engine box.": https://crash-stats.mozilla.com/report/index/d831313b-0a15-4dc2-8842-32dee2120104

"I used the search bar to search a term on wikipedia. When I saw the result, the browser crashed. It happened sometimes on wiktionary too. However, I didn't encounter the same issue when I used other search engines.": https://crash-stats.mozilla.com/report/index/07f1cf15-c04c-407f-9c7e-e1a0c2120106

(This one is without any extensions) "This crash (along with the one submitted from my same machine about 2.5 hours ago, and along with the previous six crashes for which I didn't submit crash reports) was triggered by entering something into the #nav-bar's search bar, then clicking/pressng enter to tell Firefox to search, with Wikipedia as the selected search engine. (It used to seem associated with alt-tabbed away while still loading, but in this case Firefox still had focus.)": https://crash-stats.mozilla.com/report/index/6c08e4f9-3a8e-4489-8064-f31a42120109
I'm marking this as major because it involves crashes triggered by a popular default search plugin combined with a (seemingly) popular extension.
Severity: normal → major
Crash Signature: GCGraphBuilder::NoteXPCOMChild(nsISupports*)
Amendment:
Crash report following the steps in comment#0:
https://crash-stats.mozilla.com/report/index/bp-007565f8-2df6-48e7-800d-ea4cf2120115
Thanks for the steps to reproduce! I set it to critical because this is a crash.

The stacks look like this:
0 GCGraphBuilder::NoteXPCOMChild 	xpcom/base/nsCycleCollector.cpp:1724
1 nsDOMEvent::cycleCollection::Traverse 	content/events/src/nsDOMEvent.cpp:239
2 nsCycleCollector::MarkRoots 	xpcom/base/nsCycleCollector.cpp:1920

The NoteXPCOMChild line is this:
   if (!child || !(child = canonicalize(child)))
The Traverse line is this:
   NS_IMPL_CYCLE_COLLECTION_TRAVERSE_NSCOMPTR(mEvent->target)
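
To make the two quoted lines easier to read together, here is a minimal sketch of the general pattern, not the Gecko code. `Node`, `GraphBuilder`, `Event`, `Canonical` and `RunSafeCases` are all hypothetical stand-ins invented for illustration. The sketch assumes that canonicalizing a child involves a virtual call on it, which is why the null check can protect against a null child but not against a pointer to an object that has already been destroyed. Nothing at this point in the thread establishes that this is the cause of the crash.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for nsISupports. The only operation modeled is
// canonicalization, assumed here to be a virtual call on the object.
struct Node {
    virtual ~Node() = default;
    virtual Node* Canonical() { return this; }
};

// Hypothetical stand-in for the cycle collector's graph builder.
class GraphBuilder {
public:
    // Same shape as the quoted check: skip a null child, otherwise
    // canonicalize it and record an edge to the result.
    void NoteChild(Node* child) {
        if (!child || !(child = child->Canonical())) {
            return;
        }
        mEdges.push_back(child);
    }
    std::size_t EdgeCount() const { return mEdges.size(); }

private:
    std::vector<Node*> mEdges;
};

// Hypothetical stand-in for an event whose target is a raw,
// non-owning pointer.
struct Event {
    Node* target = nullptr;
    void Traverse(GraphBuilder& cb) { cb.NoteChild(target); }
};

// Exercises only the two safe cases and returns the recorded edge count.
std::size_t RunSafeCases() {
    GraphBuilder cb;
    Event ev;
    ev.Traverse(cb);  // null target: skipped by the null check
    Node n;
    ev.target = &n;
    ev.Traverse(cb);  // live target: one edge recorded
    // If `n` were destroyed while ev.target still pointed at it, the
    // virtual call in NoteChild would read freed memory. The null check
    // cannot detect that case, so it is deliberately not exercised here.
    return cb.EdgeCount();
}
```

In this model, a crash inside the canonicalization step with a non-null address would point at a stale pointer rather than a missing one, which is one reason to look at who owns the event's target.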
Blocks: 469267
Severity: major → critical
Status: UNCONFIRMED → NEW
Component: Untriaged → DOM: Events
Ever confirmed: true
Product: Firefox → Core
QA Contact: untriaged → events
Summary: Crash when using the default Wikipedia(en) search plugin with HTTPS-Everywhere → Cycle collector crash when using the default Wikipedia(en) search plugin with HTTPS-Everywhere
Crash Signature: GCGraphBuilder::NoteXPCOMChild(nsISupports*) → [@ GCGraphBuilder::NoteXPCOMChild(nsISupports*)] [@ @0x0 | GCGraphBuilder::NoteXPCOMChild(nsISupports*)] [@ GCGraphBuilder::NoteXPCOMChild] [@ @0x0 | GCGraphBuilder::NoteXPCOMChild]
Keywords: crash
OS: Windows 7 → All
Hardware: x86_64 → All
I do not know if these are related, but here are some other crash reports with different signatures when I was testing the crash.

XPCWrappedNative::FlatJSObjectFinalized(JSContext*)
https://crash-stats.mozilla.com/report/index/bp-b8b98caf-d428-4a92-86d3-6846c2120115
https://crash-stats.mozilla.com/report/index/bp-dd855df4-2066-49f8-8eb0-7ff2f2111226
(This happened on the other day, but it was the first crash of that day using the Wikipedia search plugin.)

nsEventListenerManager::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)
https://crash-stats.mozilla.com/report/index/bp-30a00671-6180-4c8d-b0b4-713062120115
https://crash-stats.mozilla.com/report/index/bp-e9b6dd0b-f875-4fc6-aaa6-761002120115
That's interesting.  Those are the same signatures that that other reporter (and me, twice) was getting when reloading html5test.com.  Which I unfortunately haven't been able to reproduce since I made it crash twice.  But that was without HTTPS-everywhere, which suggests there is an underlying problem even in the absence of extensions.
Keywords: reproducible
The FlatJSObjectFinalized crash is in a QI:
  CallQueryInterface(mIdentity, &cache);

The eventListenerManager crash is in this line:
  PRUint32 count = tmp->mListeners.Length();
One of these was a null deref, the other was a deref of 0x3f000000.
Does it happen in 10.0b5? I ask because it ranks higher as a top crasher in 10.0 Beta than in 9.0.1.
Hypothesis 1: is it possible that changes in FF 12 cause search plugins to raise some of the same kinds of compatibility problems that HTTPS Everywhere has encountered with other extensions:  https://trac.torproject.org/projects/tor/ticket/3190 ?

(In that case a fix should be fairly easy)

Hypothesis 2: HTTPS Everywhere has also had some other crash bugs that have appeared in FF 12 (https://bugzilla.mozilla.org/show_bug.cgi?id=715496), and which we don't yet know how to address.  Is it possible this is related?
Well, there were some reports of this crash in 11 with people with the addon.  It does look related to that other bug.

The crashes in this case are similar to some seen on html5test.com without any addons, so it probably is just a very reproducible case of an underlying bug.  That would be my guess.
Keywords: topcrash
Assignee: nobody → continuation
I tried out Nightly 13, Aurora 11.0a2 and they both crashed every time when I searched for cat in the Wikipedia bar.  It did not happen immediately, presumably because a CC or GC was needed.  There are two crash signatures I am seeing, NoteXPCOMChild (invoked from nsDOMEvent::cycleCollection::Traverse) and  XPCWrappedNative::FlatJSObjectFinalized.

I also tried out Beta 10, release 10 and Release 9 and none of them crashed.  I tried multiple searches for each.

In bug 715496, bsmith says "There were big changes in how we handle SSL connections in Firefox 11", so perhaps that is related.
Crash Signature: [@ GCGraphBuilder::NoteXPCOMChild(nsISupports*)] [@ @0x0 | GCGraphBuilder::NoteXPCOMChild(nsISupports*)] [@ GCGraphBuilder::NoteXPCOMChild] [@ @0x0 | GCGraphBuilder::NoteXPCOMChild] → [@ GCGraphBuilder::NoteXPCOMChild(nsISupports*)] [@ @0x0 | GCGraphBuilder::NoteXPCOMChild(nsISupports*)] [@ GCGraphBuilder::NoteXPCOMChild] [@ @0x0 | GCGraphBuilder::NoteXPCOMChild] [@ XPCWrappedNative::FlatJSObjectFinalized]
It would be good to get a regression window.  This is really easy to reproduce.
Crash Signature: [@ GCGraphBuilder::NoteXPCOMChild(nsISupports*)] [@ @0x0 | GCGraphBuilder::NoteXPCOMChild(nsISupports*)] [@ GCGraphBuilder::NoteXPCOMChild] [@ @0x0 | GCGraphBuilder::NoteXPCOMChild] [@ XPCWrappedNative::FlatJSObjectFinalized] → [@ GCGraphBuilder::NoteXPCOMChild(nsISupports*)] [@ @0x0 | GCGraphBuilder::NoteXPCOMChild(nsISupports*)] [@ GCGraphBuilder::NoteXPCOMChild] [@ @0x0 | GCGraphBuilder::NoteXPCOMChild] [@ XPCWrappedNative::FlatJSObjectFinalized] [@ XPCWrappedNative::Fla…
Andrew, let me see if I can get QA to help with this. Any chance you guys could help isolate the regression window? Maybe start by looking at the correlation bsmith suggests in comment 11.
Going to track this for 11.
I think I narrowed the window to around 12-10 to 12-13, but I was getting some other kind of crash in there so I'm not sure exactly what is going on.
Oops, didn't mean to clear the flag.
I'm getting a crash with this signature sometimes: Firefox 11.0a1 Crash Report [@ XUL@0xadc236 | XUL@0xae535a | XUL@0x13ea445 | gettimeofday ]
https://crash-stats.mozilla.com/report/index/bp-2f11e01c-7824-4052-b280-126652120203

Not sure what the deal with that is.
SPDY landed on December 3.
The SSL thread changes were made on December 1.
XPCOM proxy removal in PSM happened in November.

There were more minor changes that happened since December 3. Dumb question: how do I see all the checkins that change security/* between December 10 and December 13?
Hmm.  I'm having trouble getting a regression range.  I seem to hit more crashes from gettimeofday than anything else, such as on the 16th.
(In reply to Brian Smith (:bsmith) from comment #18)
> how do I see all the checkins that changes security/* between December 10
> and December 13?

 hg log -d "2011-12-10 to 2011-12-13" security/

Add -p to see the patches.
Perhaps this is a problem with network.http.spdy.coalesce-hostnames=true. We should test with network.http.spdy.coalesce-hostnames=false to see if that fixes the problem. I wouldn't be surprised if the extension is doing something like expecting every distinct hostname to have a different connection. (Note that https://wikipedia.org uses a *.wikipedia.org certificate.)

But, SPDY is off by default. We should try with both network.http.spdy.enabled=true and network.http.spdy.enabled=false. 

changeset:   82562:cf0b31ff2b6d
parent:      82499:271d2711b66c
user:        Patrick McManus <mcmanus@ducksong.com>
date:        Tue Dec 13 10:55:50 2011 -0500
summary:     bug 528288 - reland spdy after libxul weightloss a=khuey CLOSED TREE

changeset:   82409:dc48c0992358
user:        Ed Morley <bmo@edmorley.co.uk>
date:        Sat Dec 10 22:36:26 2011 +0000
summary:     Backout SPDY to keep us under the MSVC virtual address space limit during win PGO builds (bug 709193)
Well, Wikipedia isn't using SPDY so if the above checkins affect this, it would be because of some change in the non-SPDY-specific code (which includes some of the changes in security/, for sure).
(In reply to Brian Smith (:bsmith) from comment #21)
> Perhaps this is a problem with network.http.spdy.coalesce-hostnames=true. We
> should test with network.http.spdy.coalesce-hostnames=false to see if that
> fixes the problem. I wouldn't be surprised if the extension is doing
> something like expecting every distinct hostname to have a different
> connection. (Note that https://wikipedia.org uses a *.wikipedia.org
> certificate.)
> 
> But, SPDY is off by default. We should try with both
> network.http.spdy.enabled=true and network.http.spdy.enabled=false. 

Including some QA folks to help with identifying a root cause. Would we like them to focus on the STR in https://bugzilla.mozilla.org/show_bug.cgi?id=718284#c0 with FF11 while toggling network.http.spdy.coalesce-hostnames and network.http.spdy.enabled?
Keywords: qawanted
I've possibly found what is causing this.  I've filed a separate bug for it.
Depends on: 726777
Fixed by the patch in bug 726777?
Yes.  I guess this should be marked fixed, too?
Status: NEW → RESOLVED
Closed: 12 years ago
Keywords: qawanted
Resolution: --- → FIXED
Is there a plan to land the patch of bug 726777 in Aurora and Beta?
Target Milestone: --- → mozilla13
Version: Trunk → 11 Branch
Many thanks for getting this fixed before FF 11 went stable!
Whiteboard: [qa+]
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20100101 Firefox/11.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:11.0) Gecko/20100101 Firefox/11.0
Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0
Mozilla/5.0 (X11; Linux i686; rv:11.0) Gecko/20100101 Firefox/11.0 

Verified Wikipedia search (from search bar) with HTTPS-Everywhere on latest beta builds (11beta4) and no crash occurred.
Marking as verified for Firefox 11.
Keywords: verified-beta
Whiteboard: [qa+] → [qa!:11] [qa+]
Verified on FF 12 Aurora.
Verified on Nightly.
Status: RESOLVED → VERIFIED
Whiteboard: [qa!:11] [qa+] → [qa!]