Closed Bug 516307 Opened 16 years ago Closed 16 years ago

Crash [@nsImageLoadingContent::OnStopDecode(imgIRequest*, unsigned int, unsigned short const*) ]

Categories

(Core :: Graphics: ImageLib, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.3a1
Tracking Status
status1.9.2 --- unaffected
status1.9.1 --- unaffected

People

(Reporter: aja+bugzilla, Assigned: bholley)

References

Details

(Keywords: regression, Whiteboard: [sg:dos] [decodeondraw])

Attachments

(1 file, 1 obsolete file)

I just pushed bug 516311 (f34cc41267d8), which disables decode-on-draw and discarding for the moment. In general this should be fixable by a simple null check. However, I'm curious as to why it's null (hoping bz can help here). aja - once that build rolls, can you see if you can reproduce the crash?
Attached patch patch v1 (obsolete) — Splinter Review
Added a patch, flagging bz for review. Hopefully he can weigh in on whether we can assume that we have a document.
Assignee: nobody → bobbyholley
Attachment #400415 - Flags: review?(bzbarsky)
> + nsIPresShell* shell; I think you want to initialize this to null.
Attached patch patch v2Splinter Review
(In reply to comment #3) > > + nsIPresShell* shell; > > I think you want to initialize this to null. yeah just saw that. fixed patch. dveditz thinks this might not be a null deref issue. Indeed, one of the crash stacks has a non-null address. So I'm not seeking review on this patch yet. Ideas?
Attachment #400415 - Attachment is obsolete: true
Attachment #400415 - Flags: review?(bzbarsky)
All the stacks referenced in comment 0 are non-null. If we're trying to call a function on a random chunk of memory that could be bad. (Is this really an "imagelib" bug? the crash is in content)
Group: core-security
Whiteboard: [sg:critical?]
A testcase would be really helpful, especially one that results in a non-null crash.
Keywords: testcase-wanted
Comment on attachment 400418 [details] [diff] [review] patch v2 >+ nsIPresShell* shell = nsnull; >+ if (doc) >+ shell = doc->GetPrimaryShell(); What about using this format instead: nsIPresShell* shell = doc ? doc->GetPrimaryShell() : nsnull; Just seems more readable to me...
> Perhaps these mentions of crashiness are related as well; > http://forums.mozillazine.org/viewtopic.php?p=7505565#p7505565 At least some of the other crashes discussed here are treebuilder-related http://forums.mozillazine.org/viewtopic.php?p=7506845#p7506845
Whiteboard: [sg:critical?] → [sg:critical?] [decodeondraw]
(In reply to comment #1) > I just pushed bug 516311 (f34cc41267d8), which disables decode-on-draw and > discarding for the moment. > > In general this should be fixable by a simple null check. However, I'm curious > as to why it's null (hoping bz can help here). > > aja - once that build rolls, can you see if you can reproduce the crash? http://hg.mozilla.org/mozilla-central/rev/f34cc41267d8 <== this hourly build still crashes (no symbols, obviously) http://crash-stats.mozilla.com/report/index/bp-486e2b7d-7e2d-4c7a-8407-64e6f2090913
(In reply to comment #9) > http://hg.mozilla.org/mozilla-central/rev/f34cc41267d8 <== this hourly build > still crashes (no symbols, obviously) > http://crash-stats.mozilla.com/report/index/bp-486e2b7d-7e2d-4c7a-8407-64e6f2090913 aja - thanks, that's really useful to know. What this tells us is that the behavior isn't related to decode-on-draw, and instead is isolated to the code in bug 512435 (there were 5 patches pushed at once in the decode-on-draw push). That bug is, indeed, where the crash is, but this removes suspicion from all the new decode-on-draw machinery as the cause of somehow getting a bogus document from GetOurDocument(). In summary - a new feature was added to support decode-on-draw, and that feature is crashing. But the feature crashes regardless of whether decode-on-draw is on or off.
Blocks: 512435
No longer blocks: 435296
Hmm. GetOurDocument() _could_ conceivably return null if the node has outlived the document. It should never be returning garbage. Looking at the stacks (the ones with symbols), they all look like Firebug or some similar extension is in use (in particular, something that is intercepting all HTTP network traffic). Is that the case for those who can reproduce this problem? Does starting up in safe mode work ok?
Here's the list of extensions used for most of my crashes...though it also crashed in -safe-mode: about:accessibilityenabled 1.2 About:Tab 0.0.36 Console² 0.5 Content Security Policy 0.2.0 DOM Inspector 2.0.3 Firebug 1.5X.0a24 Firefox Accessibility Extension 1.5.53.0 Force-TLS 1.0.3 IE8 Activities (Accelerators) for Firefox 0.7.6 KallOut - Accelerators for Firefox 1.1.21.26 Link Widgets 1.6.6 Live HTTP headers 0.15 mozNetworkPrioritizer 0.1 Nightly Tester Tools 2.0.2 Operator 0.9.4.1 Page Speed 1.2 RequestPolicy 0.5.8 Update Channel Selector 1.5 Weave 0.7pre2 Web Developer 1.1.8 WebChunks 0.30 YSlow 2.0.0
erm....the -safe-mode crash had same crash signature, and first several lines looked the same
It did? It had the nsStreamListenerWrapper::OnStopRequest?
> It did? It had the nsStreamListenerWrapper::OnStopRequest? Yes http://crash-stats.mozilla.com/report/index/bp-70d71633-00a2-406c-b406-290402090913 this one was in -safe-mode ^^^
OK, that stack does NOT have nsStreamListenerWrapper, which restores my sanity somewhat... Do you happen to know what websites (if any) are coming up when you start the browser? Does the crash happen if you start against a brand-new profile? You can use something like: mkdir /tmp/test-profile firefox -profile /tmp/test-profile to test that.
So far, aja is the only instance of crashes on non-null addresses[1] (I think, anyways.) It'd be really interesting to know what about his/her environment is causing it to happen... http://tr.im/yHSU
should I seek review and try to land the null patch then, and see whether it fixes most of the cases? bz did mention that it's possible for GetOurDocument() to return null...
So, I found this on Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.3a1pre) Gecko/20090914 Minefield/3.7a1pre when I was restarting the browser. It was incredibly reproducible with my current set of open tabs in two windows when I did the following: 1. Start up the browser 2. Ensure weave is syncing while all pages are loading 3. Click on the > to scroll the tab strip to the right and hold it down. 4. Click on any specific site to see it load in the content pane Repeat steps 3 and 4 while everything else is still loading until the browser crashes. = Crashes = * http://crash-stats.mozilla.com/report/index/f7529c99-7bb4-4a6b-adff-febff2090915?p=1 * http://crash-stats.mozilla.com/report/index/b6d1edf2-709b-4c07-8ce0-df6202090915?p=1 * http://crash-stats.mozilla.com/report/index/d3a96e95-f842-4e66-ab4c-5c2702090915?p=1
Damn, hit commit too soon. And I disabled weave and it no longer occurs - same set of open pages. weave 0.6 is my version. It's hard to tell if it is actually something in weave or if it is merely that weave slows down the browser enough that it causes the crash to become apparent. Sorry for the spam.
I hit this crash today after I updated to the latest nightly build - http://crash-stats.mozilla.com/report/index/bp-d5b35d9f-5134-4352-a91a-2f8d92090916. No extensions in my profile.
Comment on attachment 400418 [details] [diff] [review] patch v2 Make it look like comment 7, and r=me. Ted says the crash address for breakpad on Linux is the address of the instruction, not the place the pointer is pointing. So I'm 99% sure aja is just seeing null derefs. That's certainly what dbaron just saw when he caught it in the debugger.
Attachment #400418 - Flags: review+
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a1
dbaron has convinced me I was misinterpreting Linux crash-stats, and that it's a null deref like all the mac stacks.
Whiteboard: [sg:critical?] [decodeondraw] → [sg:dos] [decodeondraw]
Got one on a win build today, too: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.3a1pre) Gecko/20090916 Minefield/3.7a1pre (.NET CLR 3.5.30729) ID:20090916050534 http://crash-stats.mozilla.com/report/index/c383bd05-4102-471b-8561-a40212090916
OS: Linux → All
No crashes so far with Linux and win32 hourlies, and Linux nightly since the patch landed.
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: