2.0b3 crash [@objc_msgSend | -[KeychainBrowserListener onLoadingCompleted:] ] or [@ objc_msgSend | -[KeychainBrowserListener fillDOMWindow:] ]

REOPENED

Status

defect
--
critical
REOPENED
10 years ago
8 years ago

People

(Reporter: alqahira, Assigned: stuart.morgan+bugzilla)

Tracking

({crash})

unspecified
All
macOS
Bug Flags:
camino2.0.2 +
camino2.0.4 -
camino2.0.5 +
camino2.0.6 -

Details

(Whiteboard: [camino-2.0.2][camino-2.0.5], crash signature, URL)

Attachments

(4 attachments)

This is our #20 topcrash in 2.0.1 now.

Interestingly, they seem to be a mix of crashes triggered during quit:

bp-b64d1a5e-bc08-447c-baa5-4f71a2091207
bp-7450d47c-baf1-455f-8691-98cbd2091207
bp-395c2904-56c6-447e-830f-8e2402091205

and crashes triggered during actual keychain autocompletion:

bp-fa19c6fb-a78d-452a-b71a-96c232091207
bp-8e038df7-f9d5-469c-a41f-fb7932091205

Not sure what these two were, but they don't really fall into either category:

bp-895cee91-cb43-420b-8a2a-22d152091207
bp-ca98340f-9ac6-45c7-9f90-bf9b22091206
(Assignee)

Comment 4

9 years ago
This is really speculative, but it seems worth a shot. Rationale:
- CHBrowserView owns CHBrowserListener, which is an addref'd core object
- BrowserWrapper creates a KeychainBrowserListener, which has a weak reference to the BW's CHBrowserView and passes sole ownership to that CHBrowserView's CHBrowserListener object.
- BrowserView calls -[CHBrowserView destroyBrowser] during the teardown process, and that de-ref's the CHBrowserListener object. However, there are public calls that allow other objects to get addref'd references to the CHBrowserListener, so it's not clear that we can actually count on CHBrowserListener being *destroyed* at that point (thus dealloc'ing the keychain listener with its soon-to-be-invalid weak ref).

This just makes it so that we explicitly unregister the keychain listener during teardown, so there's no chance of it outliving the CHBrowserView, or even just the other core bits destroyed during teardown.
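
To make the rationale above concrete, here is a minimal C++ sketch of the lifetime hazard. It is not Camino's code: `BrowserViewModel`, `KeychainListenerModel`, and `CoreListenerModel` are hypothetical stand-ins for CHBrowserView, KeychainBrowserListener, and CHBrowserListener. The view is only flagged as torn down, not freed, so the model itself stays well-defined.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for CHBrowserView. It is only flagged as torn down
// (never freed) so that this model does not itself touch freed memory.
struct BrowserViewModel {
  bool tornDown = false;
};

// Stand-in for KeychainBrowserListener: holds a weak (non-owning) view pointer.
struct KeychainListenerModel {
  explicit KeychainListenerModel(BrowserViewModel* view) : view_(view) {}
  bool viewIsTornDown() const { return view_->tornDown; }
  BrowserViewModel* view_;  // weak: does not keep the view alive
};

// Stand-in for CHBrowserListener: reference-counted, and the sole owner of
// the keychain listener.
struct CoreListenerModel {
  std::unique_ptr<KeychainListenerModel> keychain;

  void removeKeychainListener() { keychain.reset(); }

  // Simulates a late load-completed callback. Returns true if it reached a
  // keychain listener whose view has already been torn down.
  bool dispatchLoadCompleted() const {
    return keychain && keychain->viewIsTornDown();
  }
};

// Returns true if a callback after teardown hits a stale keychain listener.
bool lateCallbackHitsStaleListener(bool unregisterDuringTeardown) {
  BrowserViewModel view;
  std::shared_ptr<CoreListenerModel> viewRef =
      std::make_shared<CoreListenerModel>();
  viewRef->keychain.reset(new KeychainListenerModel(&view));

  // Some other object obtained its own addref'd reference via a public call.
  std::shared_ptr<CoreListenerModel> outsideRef = viewRef;

  // Teardown: optionally unregister explicitly, then drop the view's reference.
  if (unregisterDuringTeardown) viewRef->removeKeychainListener();
  viewRef.reset();
  view.tornDown = true;

  // The core listener is still alive because of the outside reference.
  assert(outsideRef.use_count() == 1);
  return outsideRef->dispatchLoadCompleted();
}
```

In this model, `lateCallbackHitsStaleListener(false)` returns true (dropping the view's reference alone is not enough) and `lateCallbackHitsStaleListener(true)` returns false, which is the effect the explicit unregistration is aiming for.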
Attachment #422702 - Flags: superreview?(mikepinkerton)
Comment on attachment 422702
speculative change

sr=pink

good eval.
Attachment #422702 - Flags: superreview?(mikepinkerton) → superreview+
Landed on cvs trunk and CAMINO_2_0_BRANCH.

Marking this FIXED since we landed something, but we'll have to monitor 2.0.2 crashes once released (there are too few 2.1a1pre reports of this to tell anything, and none for 2.0.2pre).
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Flags: camino2.0.2+
Resolution: --- → FIXED
Whiteboard: [camino-2.0.2]
"I logged into my facebook account and opened ZooWorld (Flash Game) and down he was :/"

This is just a completely baseless guess, but is it possible that the page is tearing down a password field (or itself) while we're trying to fill?

We know from Ilya that facebook is just one giant modification of the same page (URL), and another place that people seem to be hitting this these days is Gmail, which is similar in that respect.

We have people on 10.6 hitting this; if we didn't have Breakpad (or if Breakpad had http://code.google.com/p/google-breakpad/issues/detail?id=332 ), we'd be able to get the selector from the OS, and maybe get new hints.
https://crash-stats.mozilla.com/report/list?version=Camino%3A2.0.3&build_id=&query_search=signature&query_type=exact&query=&date=2010-06-23%2009%3A00%3A00&range_value=2&range_unit=weeks&hang_type=any&process_type=any&plugin_field=&plugin_query_type=&plugin_query=&do_query=&signature=objc_msgSend%20|%20-[KeychainBrowserListener%20onLoadingCompleted%3A]&missing_sig=&page=1

"went to facebook homepage and then it stoped working" (facebook.com)
"just watching a show online" (www.southparkstudios.com)
"I was logging into my e-mail" (www.yahoo.com)
"I lookedup replacement colorado license" (www.google.com)
"farmville flash" (apps.facebook.com)

On the plus side, we seem to be catching the right URLs, since the URLs/domains recorded for all of these reports appear to match up with the users' comments. On the minus side, this kicked up a bunch this week, and we still have no idea what's going on :(
Recent comments:

"why did that close for not reson i was on petville a game on facebook and then it closed plz fix it- before i was playing" (facebook.com)
"wasa on Facebook when it suddenly quit" (facebook.com)
"Surfing... nothing special" (mashable.com)
"nothing special happened before the crash .... just clicked on a link to return to that page (cnn.com)" (cnn.com)
"no i will not" (cbo/start)
"It froze so hard even the force quit wouldn't work" (facebook.com)
"I was in the process of signing into my email with road runner." (rr.com)
"i was clicking to see a website for nassau coliseum fair" (google.com)
I missed getting flags on this for 2.0.4, but it's been #2 or #3 behind Flash, and recently #1, topcrash for 2.0.3.
Flags: camino2.0.4-
There are also a handful of  [@ objc_msgSend | -[KeychainBrowserListener fillDOMWindow:] ] crashes, with -[KeychainBrowserListener onLoadingCompleted:] as the next frame below that; I presume they're related.

Comments on -[KeychainBrowserListener onLoadingCompleted:] continue to look broadly like comment 10; some interesting ones this week were

"closing window" (facebook.com)
"Tried to scroll to top of page with scroll bar" (caminoplanet, which has no form fields at all)
(loosely, from German) "switching tabs" (nsfw image site)
(In reply to comment #12)
> There are also a handful of  [@ objc_msgSend | -[KeychainBrowserListener
> fillDOMWindow:] ] crashes, with -[KeychainBrowserListener onLoadingCompleted:]
> as the next frame below that; I presume they're related.

fillDOMWindow: crashes are at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/camino/src/formfill/KeychainService.mm&rev=1.48.2.3&mark=1370#1370, which is [self fillDOMWindow:frameNode];

onLoadingCompleted: crashes are at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/camino/src/formfill/KeychainService.mm&rev=1.48.2.3&mark=1386#1386, which is [self fillDOMWindow:domWin];

We null-check the domWin and frameNode just before calling those.

It seems like, in spite of the ownership changes in comment 4, there's still some way for us to lose |self| while this code is running, so that we end up sending these messages to a |self| that no longer exists and crash :(
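
For readers without the source handy, the shape of the two call sites can be sketched roughly as below. This is a hypothetical C++ model, not the actual Objective-C++ in KeychainService.mm; `WindowModel` and `KeychainFillModel` are invented names. It only illustrates that the null checks guard the argument, not the receiver, and that the frame case is a recursive call into the same method.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a DOM window and its child frames.
struct WindowModel {
  std::vector<WindowModel*> frames;  // may contain null entries
};

struct KeychainFillModel {
  int fillCalls = 0;  // how many windows/frames were visited
  int maxDepth = 0;   // deepest recursion level reached

  // Models fillDOMWindow:. Visits this window, then recurses into each frame.
  void fillWindow(const WindowModel* win, int depth = 1) {
    assert(win != nullptr);  // callers null-check the argument first
    ++fillCalls;
    if (depth > maxDepth) maxDepth = depth;
    for (const WindowModel* frame : win->frames) {
      // Recursive call site (the line-1370 case): the argument is checked,
      // but nothing here verifies that the receiver is still valid.
      if (frame) fillWindow(frame, depth + 1);
    }
  }

  // Models onLoadingCompleted:.
  void onLoadingCompleted(const WindowModel* win) {
    // Outer call site (the line-1386 case): again only the argument is checked.
    if (win) fillWindow(win);
  }
};
```

For a page with one child frame, `onLoadingCompleted` reaches `fillWindow` twice, once directly and once through the recursive call.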

https://crash-stats.mozilla.com/report/index/76f2498d-1acf-42b3-bea6-c93bd2100815 is an interesting report; somehow this code was running on http://www.mozilla.org/projects/camino/homepage.html while the user was trying to quit!
To capture from irc from last week:

[4:53pm] smorgan: Would logging that crash help? We don't have any live users, do we?
[4:53pm] ardissone: no, but we could mine crash-stats for emails
[4:54pm] smorgan: Ah, right
[4:54pm] ardissone: given the numbers, we're likely to have at least a few emails
[4:55pm] ardissone: of the 19 with comments, 7 have emails
[4:55pm] smorgan: Nice
[4:56pm] ardissone: it's much more difficult to mine from the rest of the 135 crashes, but we can
[4:56pm] ardissone: i have a bug on file about making that easier, iirc
[4:56pm] ardissone: or maybe for URLs; i really can't remember any more 
[4:57pm] ardissone: i file about 1 bug/week on crash-stats 
[5:11pm] smorgan: I hate this crash :P
[5:12pm] ardissone: :
[5:12pm] ardissone: er :P
[5:12pm] ardissone: don't we all :(
[5:12pm] smorgan: I don't believe those lines, but I don't see any obvious tail call optimization opportunities either
[5:13pm] smorgan: In fact I would expect *that* frame to be tail-call'd out
[5:13pm] smorgan: Maybe I should check the assembly
[6:24pm] smorgan: ardissone: as I suspected, the problem isn't that self has somehow magically gone bad within a few lines: http://crash-stats.mozilla.com/report/index/7d3caba7-0ef9-47cf-9799-10ec22100908
[6:24pm] smorgan: It's probably 'sploding inside fill
[6:25pm] ardissone: so, the stacks are all bogus?
[6:26pm] smorgan: I don't think they are bogus exactly, just short a frame
[6:26pm] smorgan: Which meshes with the second version being the recursive call
[6:26pm] smorgan: If it's actually inside the fill method, then both forms are the same crash, but one having recursed one more time
[6:27pm] ardissone: both forms being KBL and CHBL, or?
[6:28pm] smorgan: ardissone: both forms being the two crashes you mention at the end of the bug
[6:28pm] smorgan: Both KBL
[6:29pm] ardissone: smorgan:  oh, gotcah
[6:30pm] ardissone: but still no idea what's dying?
[6:30pm] smorgan: ardissone: not yet. I was just testing a theory
[6:30pm] ardissone: k

Also, fwiw, we've had weird crashes in the *Listeners' on* functions before: bug 377203, bug 351504
Summary: 2.0b3 crash [@objc_msgSend | -[KeychainBrowserListener onLoadingCompleted:] ] → 2.0b3 crash [@objc_msgSend | -[KeychainBrowserListener onLoadingCompleted:] ] or [@ objc_msgSend | -[KeychainBrowserListener fillDOMWindow:] ]
(Assignee)

Comment 15

9 years ago
The shutdown version of this looks really obvious, actually. The crash is happening inside of NS_TermEmbedding, which runs after we send the shutdown notification that causes the shared KeychainService instance to be released; from that point on, the shared-instance pointer holds a garbage value.

(I'm deliberately not setting it to nil in this patch because that would be just as bad, since it would cause us to try to re-build a bunch of core listeners during XPCOM shutdown.)
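
The trade-off described above (a garbage pointer versus a nil pointer that gets lazily rebuilt) can be illustrated with a small C++ model of a lazily created shared instance. This is only a sketch under assumptions: `SharedServiceModel` and its methods are hypothetical, and the `sShutDown` guard is one possible way to avoid both problems, not necessarily what the attached patch does.

```cpp
#include <cassert>

// Hypothetical model of a lazily created shared service; not Camino's code.
class SharedServiceModel {
 public:
  // Creates the instance on first use, like a typical shared-instance accessor.
  static SharedServiceModel* instance() {
    if (sShutDown) return nullptr;  // assumed guard: never rebuild after shutdown
    if (!sInstance) {
      sInstance = new SharedServiceModel();
      ++sTimesCreated;
    }
    return sInstance;
  }

  // Naive cleanup: release and clear. The next instance() call rebuilds it.
  static void releaseAndClear() {
    delete sInstance;
    sInstance = nullptr;
  }

  // Guarded cleanup: release, clear, and remember that shutdown happened.
  static void shutDown() {
    releaseAndClear();
    sShutDown = true;
  }

  static int timesCreated() { return sTimesCreated; }

 private:
  SharedServiceModel() {}
  static SharedServiceModel* sInstance;
  static bool sShutDown;
  static int sTimesCreated;
};

SharedServiceModel* SharedServiceModel::sInstance = nullptr;
bool SharedServiceModel::sShutDown = false;
int SharedServiceModel::sTimesCreated = 0;

// Walks through both cleanup styles in order.
void demoShutdownOrdering() {
  assert(SharedServiceModel::instance() != nullptr);
  assert(SharedServiceModel::timesCreated() == 1);

  // Clearing alone means a late caller silently rebuilds the service.
  SharedServiceModel::releaseAndClear();
  assert(SharedServiceModel::instance() != nullptr);
  assert(SharedServiceModel::timesCreated() == 2);

  // With the guard, late callers get nullptr and nothing is rebuilt.
  SharedServiceModel::shutDown();
  assert(SharedServiceModel::instance() == nullptr);
  assert(SharedServiceModel::timesCreated() == 2);
}
```

Calling `demoShutdownOrdering()` exercises both paths; in the guarded case, a late caller would still need to tolerate a null result.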
Attachment #477936 - Flags: superreview?(mikepinkerton)
(Assignee)

Comment 16

9 years ago
I played with some local breakpad dumps of deliberate crashes in release builds, by the way, and a bad objc_msgSend pretty much anywhere in that function seems to be missing the top fillDOMWindow: frame, so crashing on the first use of [KeychainService instance] there would give the stack we see for shutdown.
Comment on attachment 477936
shutdown fix [landed]

sr=pink
Attachment #477936 - Flags: superreview?(mikepinkerton) → superreview+
(Assignee)

Comment 18

9 years ago
Comment on attachment 477936
shutdown fix [landed]

Shutdown fix landed as http://hg.mozilla.org/camino/rev/77f937b50297
Attachment #477936 - Attachment description: shutdown fix → shutdown fix [landed]
(Assignee)

Updated

9 years ago
Flags: camino2.0.5? → camino2.0.5+
(In reply to comment #18)
> Comment on attachment 477936 [details] [diff] [review]
> shutdown fix [landed]
> 
> Shutdown fix landed as http://hg.mozilla.org/camino/rev/77f937b50297

And on CAMINO_2_0_BRANCH for 2.0.5.
Whiteboard: [camino-2.0.2] → [camino-2.0.2][camino-2.0.5]
bp-861d074f-01d6-47e9-aefc-f7a2d2101026 [@ _objc_search_builtins ] looks related, too. (2.0.4 had exactly 1 instance of this in the past 4 weeks, too: bp-60114a6f-dabd-41a5-b0d5-cca3a2101022)
Flagging so we don't forget about this (heh!) for release tracking purposes, although there may be nothing new ready for 2.0.6.
Flags: camino2.0.6?
Flags: camino2.0.6? → camino2.0.6-
I dug into this as best I could with the tools available (which means, not much).

The only graph I can get is attached; it shows which builds had crashes with this signature in the past 12 weeks. Roughly, it tells us that during the past 12 weeks (a window that starts at about the time we released 2.0.6), we have a few people still using 2.0, 2.0.1, 2.0.2, 2.0.3 and 2.0.4; more people still using 2.0.5; a bunch of people using 2.0.6; and a couple of nightly users :P

By way of comparison, we landed the speculative lifetime fix for 2.0.2, and the shutdown fix for 2.0.5; it was during 2.0.5's lifetime that this crash spiked and regularly beat out all of Flash combined for our top crash (for 2.0.4 and early 2.0.6, it played second fiddle to Flash but was usually still the single-signature winner).

After the new year (or, at least, after we returned from the end-of-year meeting hiatus), it was all but gone, and overall Camino 2.0.x/Camino 2.0.6 usage hadn't declined appreciably (it grew slightly in Jan, as KBL continued to decline).
Actually, I realized I could scrape the data from the reports table and come up with the graph I wanted, manually.

Here's KBL crashes for 2.0.5 and 2.0.6--yellow line--from 13 Nov (the beginning of this dataset) until 5 Feb (the day I ran the query).  2.0.6 RC'd late on 11 Nov and was released on 16 Nov.  The blue line is total 2.0.5/2.0.6 update pings for the days in question.

Go figure.
Huh. Attachment 510219 is very interesting because it implies that nothing we did in this bug fixed anything and, very likely, something on the web changed on the 17th of December.

I'm curious if Firefox has had a similar crash (though it'd have a very different signature). It's very possible that there's some malware or a bad script out there that was causing this and it got fixed.

It'd be very useful (as Smokey mentioned) to get all URLs (in aggregate) for this signature. That's bug 550538. It'd help us determine if there was a specific page (or set of pages) causing this bug... and if there were just a few, what they were. Maybe we'd discover some malware.

(Hi chofmann, any chance you can lean on bug 550538 and research if Firefox saw a similar change in crash levels around this time for one or more signatures? It wouldn't be *every* signature and it'd likely only be on Mac.)
(Assignee)

Comment 25

8 years ago
> It's very possible that there's some malware or a bad
> script out there that was causing this and it got fixed.

Malware seems unlikely; it's more likely that some specific, reasonably popular login page triggered a latent Camino bug due to something about the way it worked, and they happened to change it.
We had a handful of these this week, so I looked at the URLs manually: google search, myspace, uol.com.br, amazon music download, hamptoninn.com logging out

…And "Chose a bookmark to fivethirtyeight.com" (and we show he crashed on http://www.fivethirtyeight.com/; that redirects to http://fivethirtyeight.blogs.nytimes.com/, which could have been loading and not yet have updated our URL)
Crash Signature: [@objc_msgSend | -[KeychainBrowserListener onLoadingCompleted:] ] or [@ objc_msgSend | -[KeychainBrowserListener fillDOMWindow:] ]
"Just" had this crash myself in yesterday's nightly while making hotel reservations: bp-d9e8bd62-9bb8-496e-ac8b-cfddb2110716

I had logged into our InterContinental Hotels Group account via a form in a "div" that "pops out" when clicking on a link (e.g., the "Sign In" link on https://secure.hiexpress.com/hotels/us/en/reservation).

Thereafter, I noticed that <select>s were "stuck" and not popping open when I filled out the remainder of my reservation; in fact, the first <select> popped open when I hit the submit button on that page of the reservation process.  (Re)selected the appropriate option to close the <select> and proceeded with the reservation successfully and printed, and thought nothing further of the oddness.

Switched to another tab in the same window, where I had a Choice Hotels page open (and had previously logged in to the account there), and set out making a reservation at a Quality Inn.  When I clicked the submit button on the page that actually does the final reservation submission, Camino crashed (apparently before transmitting the data).

So it still seems like some kind of lifetime issue, but I'm not sure how weirdness in two different tabs is related, and I have no way of reproducing this :/
I noticed in bug 343232 comment 7 that Firefox does some teardown stuff on pagehide; any chance that these crashes are related to that (and that we need to be tearing down our KBL on pagehide, aka the page going into bfcache)?