Closed Bug 526545 Opened 11 years ago Closed 11 years ago

Crash reporter still can send wrong URL when crashing during pageload

Categories

(Toolkit :: Crash Reporting, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9.3a1
Tracking Status
status1.9.2 --- beta4-fixed

People

(Reporter: dietrich, Assigned: dietrich)

References

(Blocks 1 open bug)

Details

(Whiteboard: [crashkill])

Attachments

(1 file, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #411930 +++

The fix in bug 411930 doesn't go far enough for some crash testcases. We need to start notifying the crash reporter in various places both closer to the front-end and further down into the plumbing. Some places might be:

- location bar handler, after a user hits enter
- after URI is fixed up
- network, when a URL load request is first received
- network, possibly when a redirect is handled, since the redirect target might be the crash cause

This is blocking since it's imperative that the crash reports show the correct URL, and bug 411930 blocked and this bug is an extension of it.
Flags: blocking1.9.2+
Summary: Crash reporter sends wrong URL when crashing during pageload → Crash reporter still can send wrong URL when crashing during pageload
Dietrich, There are a number of reports from yesterday which I submitted for the test attached to bug 411930. If you have access to the crash dumps, you can find them for 3.7 with comments beginning with bc:. The reported urls ranged from null, to about:blank, to a previous url to the actual one.
is status1.9.2 beta2-fixed a carry over from the bug this was cloned from? I don't seen any indication this is really fixed anywhere.
(In reply to comment #2)
> is status1.9.2 beta2-fixed a carry over from the bug this was cloned from? I
> don't seen any indication this is really fixed anywhere.

Yes.
Assignee: paul → dietrich
Attached patch v1 (obsolete) — Splinter Review
put it in docshell. seems to be early stage wrt to knowing the uri to be loaded, while also catching the broadest number of scenarios of anywhere that i tested.
Attachment #411840 - Flags: review?(bzbarsky)
Whiteboard: [crashkill][has patch][needs review bz]
So I'm trying to understand what this is actually doing...  Is the idea that if we crash before another AnnotateCrashReport("URL") happens then the URL we're passing here will be used as the crash url?  That is, does crash reporter basically maintain a single "current URI" member and we're synchronously updating it here?

If so, what other callsites update that member?
AnnotateCrashReport just updates an internal hashtable, overwriting any data that was previously stored for that key. The only place that currently sets the "URL" annotation is in sessionstore code:
http://mxr.mozilla.org/mozilla-central/source/browser/components/sessionstore/src/nsSessionStore.js#2753
(Also, we should probably be careful about how often we call this, because it not only updates the hash table, but serializes the entire thing to a string on every call to AnnotateCrashReport, so it doesn't have to allocate memory in the exception handler.)
I saw the sessionstore caller; where is that in the sessionstore timeline?

It looks like this patch sets the url right before we start the network load for the page.  Would it be better to set it right before we start parsing the page instead?
Yes, I think it's strange that we would set the URL during network load before the page actually starts loading. Ideally I think we'd wait until the unload handlers of the previous page were called, but there may be some overlap there, I'm not sure.
Overlap in what sense?
would producing a short doc that outlines some of the key events once a user has initiated a request to load a page by clicking a url or hitting return in the awsome bar help here?

all performance and other consideration being equal (and maybe they are not), I'd think we would want to update the recorded url with the change as close to time the user has signaled a request to load a new page as possible.   that approach might help us to understand any crashes in dns or other early activity that happens well before we start parsing the page.  does that make sense?

at any rate, having some good timeline doc about what kind of events are covered when we see the url in the crash data, and which events are not (or might not be), would have long lasting value.
I do not think that we want to change the URL as soon as the user has made the request (clicking on a link or choosing something in the awesome bar). We'd be trading off crashes caused by network activity (before parsing/pageload started) against crashes in unload handlers, which I suspect are at least as prevalent.

The typical flow is something like:
1. user starts navigation
2. network load begins
3. when network data starts arriving, we fire the unload handlers from the old page
4. then do an internal navigation to the new page and being loading the new content

I think we want to change the crashreporter URL inbetween #3 and #4. I believe that we should be able to do this from a webprogresslistener STATE_START | STATE_IS_DOCUMENT observer, no?
Or onLocationChange, I'd think.  Or we can hack it directly into nsDocShell::Embed if we have to.
Can we keep track of two urls?  One being the one we started loading (set at point 1 from comment 12) and one set between points 3 and 4?  How serious is the performance concern?
You can set any arbitrary key name, we just have to ensure that the server collects the data. The AnnotateCrashReport implementation is here:
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/nsExceptionHandler.cpp#727

I've never profiled it, I just assume that enumerating a hashtable and serializing all keys and values is not something you want to do anywhere time-critical. The hashtable probably isn't very big, so maybe it's not a real worry as long as we're not doing it in a tight loop anywhere.
sounds pretty cool of we could have data that says

   if two urls are in the crash report this might be a page transition bug
   if one url then its happening on the new page load.

I think that still hold true to the idea that we want to collect as little data as possible and keep it focused on helping us to understand what might be causing the crash, and what state the browser is in.

the two url idea might be getting to complicated for 3.6; so maybe we do that as future enhancement?

the most important thing for 3.6 is to get at least some (maybe small?) improvement in the quailty of the urls we can use in crash testing.  we really need that badly.
I'll test a couple of the options discussed above for single URL tracking between bsmedberg's steps 3 & 4.
Whiteboard: [crashkill][has patch][needs review bz] → [crashkill]
It seems to me the two url thing would be pretty simple to do in Gecko code; it might require a bit of server support to expose both, but that's ok...
nsIWebProgressListener.on*Change does not catch things like tabs opened in the background.

nsDocShell:internalLoad (current patch) does record loads of not-top-level URLs as well, which might not be desirable.

Step 1 could be in browser.xml:loadURI(WithFlags), which would cover top-level URL loads for all windows/tabs regardless of foreground/background.

Boris, is nsDocShell::Embed the spot for catching URL between step 3 & 4?
> nsIWebProgressListener.on*Change does not catch things like tabs opened in the
> background.

Uh... it doesn't?  Is that just the <tabbrowser> messing with you?  Because those notifications are certainly dispatched for all loads.

> nsDocShell:internalLoad (current patch) does record loads of not-top-level
> URLs

That's pretty easy to fix if desired, actually.

> Step 1 could be in browser.xml:loadURI(WithFlags), which would cover top-level
> URL loads for all windows/tabs regardless of foreground/background.

It wouldn't cover loads due to link clicks, right?

> Boris, is nsDocShell::Embed the spot for catching URL between step 3 & 4?

It's a possible spot, for sure.  Embed is what changes the current document in the window, etc.
(In reply to comment #20)
> Uh... it doesn't?  Is that just the <tabbrowser> messing with you?  Because
> those notifications are certainly dispatched for all loads.

User error. I hooked into one of the listeners in browser.js, which was filtering out some loads.

> > Step 1 could be in browser.xml:loadURI(WithFlags), which would cover top-level
> > URL loads for all windows/tabs regardless of foreground/background.
> 
> It wouldn't cover loads due to link clicks, right?

Bah, yes. I now remember that it's exactly this which led to me to patch this into docshell in the first place :P
To be clear, I'm fine with this code being in docshell if needed (e.g. if the progress listeners aren't sufficient for our purposes).  My only question in comment 8 was whether this is the right place in docshell.
looking like a progress listener might suffice. however, i'm still not getting STATE_START for background tab loads, when using gBrowser.addTabsProgressListener(). here's the sequence:

TPL::onLocationChange(): about:blank
TPL::onStateChange(): about:blank for states: STATE_STOP, STATE_IS_NETWORK, STATE_IS_WINDOW, STATE_SECURE_HIGH,
browser.xml:loadURIWithFlags(http://planet.mozilla.org/)
docshell::Embed(): http://planet.mozilla.org/
++DOMWINDOW == 15 (04F91378) [serial = 16] [outer = 04F90F58]
TPL::onLocationChange(): http://planet.mozilla.org/
TPL::onStateChange(): http://planet.mozilla.org/ for states: STATE_STOP, STATE_IS_NETWORK, STATE_IS_WINDOW, STATE_SECURE_HIGH,
onLocationChange does catch all scenarios that i tested. However, it fires much later than Embed(), which seems like too late.
ok, finally figured out what was going on w/ STATE_START. it wasn't missing, just that i was only dumping entries if aRequest.URI was not null... because it *was* for some notifications.

apparently the aRequest passed to onStateChange doesn't expose wrapped nsIChannel *if* it's a combination of STATE_START loading in a *background* tab. QI'ing to nsIChannel resolved it and i can get access to the URI.
Attached patch v2Splinter Review
this catches all top-level URLs for each load scenario that i tested, and is early enough to annotate the crash report with the proper URL in crash scenarios that the session restore annotation was not.

hm, that said, i don't think there's any reason to keep the session restore annotation any more afaict.
Attachment #411840 - Attachment is obsolete: true
Attachment #412333 - Flags: review?
Attachment #411840 - Flags: review?(bzbarsky)
Attachment #412333 - Flags: review? → review?(benjamin)
> However, it fires much later than Embed()

That's a bit surprising to me, actually...
Whiteboard: [crashkill] → [crashkill][has patch][needs review bsmedberg]
Attachment #412333 - Flags: review?(benjamin) → review+
Whiteboard: [crashkill][has patch][needs review bsmedberg] → [crashkill][has patch]
http://hg.mozilla.org/mozilla-central/rev/eb28fe3a6172
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [crashkill][has patch] → [crashkill][baking]
https://bugzilla.mozilla.org/attachment.cgi?id=410259 correctly showed up in the 20091117 crashdata. Thank you!
Whiteboard: [crashkill][baking] → [crashkill]
r- on the woo!  ->  WWWWOOOOOHHHHHOOO!  ;=) 
 
tomcat,  warp 10 on the url crash testing vm's with this new data.
lets go find some more click to crash pages.
thanks to everyone who pulled hair out on this one.
from the data from 11/17 dump the only reproducible crash for 1.9.3 was https://bugzilla.mozilla.org/attachment.cgi?id=410259

but will still monitoring the urls we get :)) so also woooo ! :)
http://www.tusfotoscaseras.com/2007/09/29/adolescentes-modelos-teens-tetonas-y-culonas/ crashes on load on win 1.9.2 and mac 1.9.3 at least (may be nsfw, I don't know).

when the session restore comes up, the previous url for the tab is displayed instead of this one. bp-abec7a01-0e8d-4d5a-98ac-729642091118 

if this bug is fixed, would session restore show the correct url in the crashed tab?
(In reply to comment #33)
> http://www.tusfotoscaseras.com/2007/09/29/adolescentes-modelos-teens-tetonas-y-culonas/
> crashes on load on win 1.9.2 and mac 1.9.3 at least (may be nsfw, I don't
> know).

That is definitely NSFW.

> if this bug is fixed, would session restore show the correct url in the crashed
> tab?

Not likely. Unless we can manage to squeeze a write-to-disk in between clicking and crashing, session restore won't have the new url saved.

This matches up with what we saw in bug 411930 where we tried to do it when sessionstore knew about the URL, but that didn't work - thus this bug :)
AFAIK this bug was not intended to fix sessionstore, just the crash reporter's reported URL. I don't think it even makes sense to optimize sessionstore to try to restore URLs that crash during load.
right, but until i can get the logs for today I wasn't sure if this was a signal that there might be additional issues. iirc, one of the symptoms of the previous test case was an inconsistency in sessionstore. We'll see in the morning.
Depends on: 529622
url in crash from comment 33 is recorded correctly in 20091118 logs. w00t!
Depends on: 530056
Could someone please comment on what to delete to resolve the regression this caused with existing profiles?  See https://bugzilla.mozilla.org/show_bug.cgi?id=529952#c4

Thanks
Depends on: 529952
Depends on: 530266
Duplicate of this bug: 530266
Blocks: FF2SM
Target Milestone: --- → mozilla1.9.3a1
You need to log in before you can comment on or make changes to this bug.