Closed Bug 910881 (CVE-2013-5596)
Opened 11 years ago
Closed 11 years ago
Crash on long pages (neverending reddit with images enabled)
Categories: Core :: Graphics: ImageLib, defect
Tracking: RESOLVED FIXED in mozilla26
People: Reporter: ezra; Assigned: seth
Keywords: crash, regression, sec-high
Whiteboard: [adv-main25+][adv-esr24-1+]
Attachments
(4 files)
48.49 KB, application/octet-stream
31.46 KB, application/octet-stream
32.45 KB, application/octet-stream
5.62 KB, patch (jdm: review+; abillings: approval-mozilla-aurora+, approval-mozilla-beta+, approval-mozilla-esr24+, sec-approval+)
1. Install Reddit Enhancement Suite
2. Enable Neverending Reddit
3. Enable Inline Images
4. Click on 'Show Images'
5. Start browsing (17+ pages opened)
6. CRASH
I don't think this is because of Reddit Enhancement Suite; rather, Firefox simply can't handle a page that gets this long with this many images and this much other content loaded.
That smells like an OOM crash. What's the memory use before the crash?
Could you test with a clean profile (see https://support.mozilla.org/en-US/kb/profile-manager-create-and-remove-firefox-profiles) and only with your STR, then go to about:memory and save the memory log.
Flags: needinfo?(ezra)
Reporter
Comment 2•11 years ago
Memory Report at page 1
Reporter
Comment 3•11 years ago
Memory report at page 16
Reporter
Comment 5•11 years ago
Attached 3 memory reports, Fx crashed around page 26-27 this time.
Sadly the crash reporter could not find anything usable: https://crash-stats.mozilla.com/report/index/bb130fb3-9154-451d-ae93-b53e12130831
This was the previous crash report (without a clean profile): https://crash-stats.mozilla.com/report/index/e87be6fc-9d8a-4b11-9d1d-ee8f72130829
Comment 6•11 years ago
It *smells* like OOM alright, but the crash reports say that quite a bit of virtual and physical memory is still available.
Comment 7•11 years ago
Indeed, the memory usage indicators all look alright, and I don't see an OOMAllocationSize in the raw JSON, so there are none of the usual OOM hallmarks (aside from the empty dump). I can probably get a good stack out of bp-e87be6fc-9d8a-4b11-9d1d-ee8f72130829 with a little fiddling.
Comment 8•11 years ago
Stack from bp-e87be6fc-9d8a-4b11-9d1d-ee8f72130829:
Operating system: Windows NT
6.1.7601 Service Pack 1
CPU: x86
AuthenticAMD family 20 model 1 stepping 0
2 CPUs
Crash reason: EXCEPTION_BREAKPOINT
Crash address: 0xff7ff56
Thread 20 (crashed)
0 xul.dll!NS_CycleCollectorSuspect2 [nsCycleCollector.cpp:a55c55edf302 : 2938 + 0x0]
eip = 0x0ff7ff56 esp = 0x0efff670 ebp = 0x0efff6a8 ebx = 0x0efff76c
esi = 0x00000000 edi = 0x175534c0 eax = 0x00000000 ecx = 0x0000002e
edx = 0x00000008 efl = 0x00000246
Found by: given as instruction pointer in context
1 xul.dll!imgStatusTracker::SyncNotifyState(nsTObserverArray<imgRequestProxy *> &,bool,unsigned int,nsIntRect &,bool) [imgStatusTracker.cpp:a55c55edf302 : 514 + 0x40]
eip = 0x0fbcd88f esp = 0x0efff6b0 ebp = 0x0efff6c8
Found by: previous frame's frame pointer
2 xul.dll!imgStatusTracker::SyncNotifyDifference(imgStatusTracker::StatusDiff) [imgStatusTracker.cpp:a55c55edf302 : 581 + 0x27]
eip = 0x0fbcd97c esp = 0x0efff6d0 ebp = 0x0efff75c
Found by: call frame info
3 xul.dll!mozilla::image::RasterImage::FinishedSomeDecoding(mozilla::image::RasterImage::eShutdownIntent,mozilla::image::RasterImage::DecodeRequest *) [RasterImage.cpp:a55c55edf302 : 3510 + 0x18]
eip = 0x0fbbf9b2 esp = 0x0efff764 ebp = 0x0efff7d0
Found by: call frame info
4 xul.dll!mozilla::image::RasterImage::DoError() [RasterImage.cpp:a55c55edf302 : 3363 + 0xb]
eip = 0x0fbbc8aa esp = 0x0efff7d8 ebp = 0x0efff810
Found by: call frame info
5 xul.dll!mozilla::image::RasterImage::WriteToDecoder(char const *,unsigned int) [RasterImage.cpp:a55c55edf302 : 2692 + 0x42]
eip = 0x0fbbd203 esp = 0x0efff7ec ebp = 0x0efff810
Found by: call frame info
6 xul.dll!mozilla::image::RasterImage::DecodeSomeData(unsigned int) [RasterImage.cpp:a55c55edf302 : 3305 + 0xe]
eip = 0x0fbbd2f4 esp = 0x0efff7fc ebp = 0x0efff810
Found by: call frame info
7 xul.dll!mozilla::image::RasterImage::DecodePool::DecodeSomeOfImage(mozilla::image::RasterImage *,mozilla::image::RasterImage::DecodePool::DecodeType,unsigned int) [RasterImage.cpp:a55c55edf302 : 3845 + 0x8]
eip = 0x0fbbd46e esp = 0x0efff818 ebp = 0x0efff880
Found by: call frame info
8 xul.dll!mozilla::image::RasterImage::DecodePool::DecodeJob::Run() [RasterImage.cpp:a55c55edf302 : 3714 + 0x17]
eip = 0x0fbbeb06 esp = 0x0efff888 ebp = 0x0efff8a0
Found by: call frame info
9 xul.dll!nsThreadPool::Run() [nsThreadPool.cpp:a55c55edf302 : 194 + 0x8]
eip = 0x0fabf2c2 esp = 0x0efff8a8 ebp = 0x0efff8c4
Found by: call frame info
10 xul.dll!nsThread::ProcessNextEvent(bool,bool *) [nsThread.cpp:a55c55edf302 : 627 + 0x14]
eip = 0x0fa5fe96 esp = 0x0efff8d4 ebp = 0x0efff8c4
Found by: call frame info
11 xul.dll!nsThread::ThreadFunc(void *) [nsThread.cpp:a55c55edf302 : 265 + 0x1d]
eip = 0x0fab05f8 esp = 0x0efff94c ebp = 0x0efff8c4
Found by: call frame info
12 nss3.dll!_PR_NativeRunThread [pruthr.c:a55c55edf302 : 397 + 0x9]
eip = 0x638d37e0 esp = 0x0efff974 ebp = 0x0efff8c4
Found by: call frame info
13 nss3.dll!pr_root [w95thred.c:a55c55edf302 : 90 + 0xd]
eip = 0x638d4bcd esp = 0x0efff998 ebp = 0x0efff9d0
Found by: call frame info
Comment 9•11 years ago
Frame 0 is hitting the MOZ_CRASH here: http://hg.mozilla.org/releases/mozilla-release/annotate/a55c55edf302/xpcom/base/nsCycleCollector.cpp#l2938
via frame 1 here: http://hg.mozilla.org/releases/mozilla-release/annotate/a55c55edf302/image/src/imgStatusTracker.cpp#l514
Comment 10•11 years ago
This is calling NS_CycleCollectorSuspect2 on a decoding thread, which doesn't have a cycle collector. Entirely a graphics-system bug. According to blame, joe and sfowler are the most likely candidates for dealing with this.
Status: UNCONFIRMED → NEW
Component: General → ImageLib
Ever confirmed: true
Product: Firefox → Core
Comment 11•11 years ago
That probably means that a cycle collected object is being Release'd in the decoding thread.
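For illustration, the failure mode described here looks roughly like the sketch below: for a cycle-collected class, Release() reports the object to the current thread's cycle collector, and the decode-pool threads have none. SomeCycleCollectedClass is a made-up stand-in and the NS_CycleCollectorSuspect2 call is simplified; this is not the actual generated refcounting code.

NS_IMETHODIMP_(nsrefcnt)
SomeCycleCollectedClass::Release()
{
  nsrefcnt count = --mRefCnt;
  if (count == 0) {
    delete this;
    return 0;
  }
  // A cycle-collected Release also "suspects" the object so the cycle
  // collector can later check whether it is part of a garbage cycle.
  // A decode-pool thread has no cycle collector, so this is roughly
  // where the MOZ_CRASH from frame 0 of the stack above fires.
  NS_CycleCollectorSuspect2(this);
  return count;
}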
Comment 12•11 years ago
Does this happen if you set image.multithreaded_decoding.enabled to false? What about if it's left at the default, but image.multithreaded_decoding.limit is set to 1?
Reporter
Comment 13•11 years ago
Sadly, Fx crashed even when image.multithreaded_decoding.enabled was set to false, and to make it even worse, the crash dump was corrupted :(
https://crash-stats.mozilla.com/report/index/490aefff-0b07-4dcf-9903-9f5f22130908
Reporter
Comment 14•11 years ago
I'm still suspecting OOM actually, when browsing reddit for that long Fx's memory rises to 1.7GB according to task manager, and when switching to another tab (about:memory) for example, memory drops to about 700MB, and after switching back to reddit it increases to 1.5GB again.
Could this explain why the about:memory reports don't show that Fx is running very high on memory?
Every time shortly after memory hits around 1.7GB of memory Fx crashes.
Comment 15•11 years ago
There's a MOZ_ASSERT in RasterImage::FinishedSomeDecoding that we're running on the main thread, which is a big warning flag here. I'm not sure where in the stack we switch from threadsafe to non-threadsafe code, but DoError looks possible to me.
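For context, the assertion pattern referred to here looks roughly like the sketch below (the signature matches frame 3 of the stack above; the body is omitted and the return type is assumed). MOZ_ASSERT compiles away in release builds, so nothing actually stops an off-main-thread caller in the field.

nsresult
RasterImage::FinishedSomeDecoding(eShutdownIntent aIntent, DecodeRequest* aRequest)
{
  // This method notifies observers via imgStatusTracker, which touches
  // cycle-collected, non-threadsafe state, so it must run on the main thread.
  MOZ_ASSERT(NS_IsMainThread());
  // ...
}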
Reporter
Comment 16•11 years ago
> What about if it's left at the default, but image.multithreaded_decoding.limit is set to 1?
Also crashed :-(
https://crash-stats.mozilla.com/report/index/6827dca4-493d-40b7-9962-1673d2130908
Comment 17•11 years ago
(In reply to Ezra from comment #14)
> I'm still suspecting OOM actually, when browsing reddit for that long Fx's
> memory rises to 1.7GB according to task manager, and when switching to
> another tab (about:memory) for example, memory drops to about 700MB, and
> after switching back to reddit it increases to 1.5GB again.
>
> Could this explain why the about:memory reports don't show that Fx is
> running very high on memory?
>
> Every time shortly after memory hits around 1.7GB of memory Fx crashes.
Try opening about:memory in a new window instead of a different tab; that will prevent image data from being discarded when the problem tab becomes a background tab.
Comment 18•11 years ago
(In reply to Ezra from comment #14)
> I'm still suspecting OOM actually, when browsing reddit for that long Fx's
> memory rises to 1.7GB according to task manager, and when switching to
> another tab (about:memory) for example, memory drops to about 700MB, and
> after switching back to reddit it increases to 1.5GB again.
>
> Could this explain why the about:memory reports don't show that Fx is
> running very high on memory?
>
> Every time shortly after memory hits around 1.7GB of memory Fx crashes.
Oh, and we also have some improved memory handling for images starting in 24 if you want to try beta (or aurora or nightly).
Updated•11 years ago
Group: core-security
Comment 20•11 years ago
(In reply to Josh Matthews [:jdm] from comment #15)
> There's a MOZ_ASSERT in RasterImage::FinishedSomeDecoding that we're running
> on the main thread which is a big warning flag here. I'm not sure where in
> the stack we switch from threadsafe to non-threadsafe code, but DoError
> looks possible to me.
Jeff identified that WriteToDecoder should be on the decoding thread and FinishedSomeDecoding should be on the main thread, so we just need to figure out whether DoError should be on the decoding thread or the main thread and make sure the right thing happens. It doesn't look like we "designed" for this; it may not have been obvious what DoError would cause.
Assignee
Comment 21•11 years ago
I'm working on this.
One lame thing is that this bug will look different on different platforms. This call stack acquires mDecodingMutex recursively, which will work on Windows since the underlying EnterCriticalSection primitive is recursive, but will deadlock on platforms where we use pthreads.
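To make the platform difference concrete: a Windows critical section keeps a per-thread recursion count, while a normal (non-recursive) pthread mutex does not, so re-entering the same lock on the same thread hangs. A minimal standalone illustration, not Gecko code; the function names are made-up stand-ins for the outer and inner frames on the stack above:

#include <pthread.h>

static pthread_mutex_t sDecodingMutex = PTHREAD_MUTEX_INITIALIZER;  // non-recursive

void Inner()
{
  // The calling thread already holds sDecodingMutex, so this lock never
  // succeeds: the thread deadlocks against itself.
  pthread_mutex_lock(&sDecodingMutex);
  pthread_mutex_unlock(&sDecodingMutex);
}

void Outer()
{
  pthread_mutex_lock(&sDecodingMutex);
  Inner();  // the error path re-enters the lock while it is still held
  pthread_mutex_unlock(&sDecodingMutex);
}

On Windows, the equivalent EnterCriticalSection/LeaveCriticalSection pair succeeds because the critical section simply increments its recursion count for the owning thread.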
Flags: needinfo?(seth)
Assignee
Comment 22•11 years ago
Unfortunately, even following the STR, I didn't have much luck reproducing this locally. This may be due to the platform difference or due to a dependence on the specific images we happened to come across (this *is* error-handling code, after all).
It's pretty apparent that RasterImage::|DoError| is not currently safe to call off-main-thread, though. This patch adds a |HandleErrorWorker| which handles an error on the main thread in these cases.
I added a new |mPendingError| flag to prevent further decoding work even before the main thread handles the error, and to prevent more than one |HandleErrorWorker| from being dispatched. Although this overlaps somewhat with the duties of |mError|, just setting |mError| to true and unconditionally executing the error-handling code when |HandleErrorWorker| runs would invite concurrency bugs, since |DoError| might be called from another thread as well; we might end up with duplicate notifications and similar issues.
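The shape of that approach is roughly the sketch below. |HandleErrorWorker| and |mPendingError| are the names used in this comment; the surrounding details (member types, the nsRunnable base class, the exact dispatch call) are assumptions for illustration, not the actual patch.

#include "nsThreadUtils.h"  // NS_IsMainThread, NS_DispatchToMainThread

class RasterImage::HandleErrorWorker : public nsRunnable
{
public:
  explicit HandleErrorWorker(RasterImage* aImage) : mImage(aImage) { }

  NS_IMETHOD Run()
  {
    // By the time this runs we are on the main thread, so the normal
    // (non-threadsafe) error path is safe to execute.
    mImage->DoError();
    return NS_OK;
  }

private:
  nsRefPtr<RasterImage> mImage;
};

void
RasterImage::DoError()
{
  if (mError) {
    return;  // The error has already been handled.
  }

  if (!NS_IsMainThread()) {
    // On a decode thread: record that an error is pending so no further
    // decoding work happens, and hand the real error handling to the main
    // thread. mPendingError also ensures at most one worker is dispatched.
    if (!mPendingError) {
      mPendingError = true;
      NS_DispatchToMainThread(new HandleErrorWorker(this));
    }
    return;
  }

  // Main thread: shut down the decoder and notify observers here.
  // ...
}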
Attachment #801954 -
Flags: review?(josh)
Assignee
Updated•11 years ago
Assignee: nobody → seth
Assignee
Comment 23•11 years ago
Try job here:
https://tbpl.mozilla.org/?tree=Try&rev=48ac7c6129ee
Assignee
Comment 24•11 years ago
Try looks OK. Ezra, could you check if you still observe the problem with this build? You can grab it here:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mfowler@mozilla.com-48ac7c6129ee/try-win32/firefox-26.0a1.en-US.win32.installer.exe
Flags: needinfo?(ezra)
Comment 25•11 years ago
We were looking at rating this bug during pre-CritSmash today but are unsure how controllable a hypothetical exploit could be. Generally threading bugs involving writing can be sec-high-ish. Seth, Milan, would you rate this sec-high? (Feel free to set the rating)
Comment 26•11 years ago
I want to say that anything in the cycle collector is (at least) sec-high, but I'm not sure I can recall all the reasons why I think that.
Assignee
Comment 27•11 years ago
I'd say that sec-high is reasonable to me. Marked.
Keywords: sec-high
Updated•11 years ago
Attachment #801954 -
Flags: review?(josh) → review+
Assignee
Comment 28•11 years ago
Pushed. Leaving open for the time being; I'd like to hear back from Ezra if possible.
https://hg.mozilla.org/integration/mozilla-inbound/rev/8af98453b511
Whiteboard: [leave open]
Reporter
Comment 29•11 years ago
I have not been able to test your try build yet, will do so tonight, sorry about that.
Comment 30•11 years ago
Reporter
Comment 31•11 years ago
Just tested seth's try build, it seems to be working great!
Scrolled for a loooooong time; memory didn't rise as fast as it used to, which is great. Repeatedly got to a point where RES didn't want to scroll any further.
Reloaded the page a couple of times to keep the memory usage high and keep scrolling; finally reached the point of 1.7GB. Fx did stall for a little while before dropping back to 600MB of memory and continuing on its merry way.
So I guess it's fixed, awesome :-)
Flags: needinfo?(ezra)
Assignee
Comment 32•11 years ago
(In reply to Ezra from comment #31)
> So I guess it's fixed, awesome :-)
I'm glad to hear that! Thanks for reporting the bug and for testing the fix.
Feel free to reopen if you run across the bug again. For now, I'm marking this bug as resolved.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [leave open]
Comment 33•11 years ago
Patches for security bugs have a landing policy to prevent 0-daying our Release users
https://wiki.mozilla.org/Security/Bug_Approval_Process
What old branches are affected? Clearly we need this on Beta and ESR24, do we also need to worry about b2g18 (which we'll be supporting until March)? Or is this a more recent regression?
status-b2g18: --- → ?
status-firefox24: --- → affected
status-firefox25: --- → affected
status-firefox27: --- → fixed
status-firefox-esr17: --- → ?
status-firefox-esr24: --- → affected
tracking-firefox25: --- → +
tracking-firefox26: --- → +
tracking-firefox-esr24: --- → 25+
Flags: needinfo?(seth)
Comment 34•11 years ago
This should have been tracking much sooner. Please get an approval nomination up, for the possibility of landing in FF25 (currently Beta, and <2wk from release).
Assignee
Comment 35•11 years ago
This is a regression from bug 716140 AFAICT, which means it has been around for a while. I'm honestly not certain about b2g18, but given when bug 716140 landed, this affects FF22+.
Flags: needinfo?(seth)
Assignee
Comment 36•11 years ago
Comment on attachment 801954 [details] [diff] [review]
Do not call RasterImage::|DoError| off the main thread.
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: Potential vulnerability.
Fix Landed on Version: FF26
Risk to taking this patch (and alternatives if risky): Low. This has been on m-c/Aurora for a while and doesn't seem to have caused issues.
String or UUID changes made by this patch: None.
(Note that I guess Release _is_ ESR24 right now, so requesting both is perhaps redundant...)
See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #801954 -
Flags: approval-mozilla-release?
Attachment #801954 -
Flags: approval-mozilla-esr24?
Attachment #801954 -
Flags: approval-mozilla-beta?
Attachment #801954 -
Flags: approval-mozilla-aurora?
Comment 38•11 years ago
Seth, can you please set sec-approval? on your patch and answer the security questions?
As a sec-high, it needs to go through sec-approval. See https://wiki.mozilla.org/Security/Bug_Approval_Process
Flags: needinfo?(seth)
Assignee
Comment 39•11 years ago
Comment on attachment 801954 [details] [diff] [review]
Do not call RasterImage::|DoError| off the main thread.
[Security approval request comment]
How easily could an exploit be constructed based on the patch?
Not entirely sure. The bug involves unsafe parallel access to data structures which are not directly accessible from JavaScript, but which JavaScript certainly has some indirect influence over.
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
Both the check-in comment and comments in the patch point to the fact that calling RasterImage::DoError off main thread is unsafe. They don't make clear exactly what's unsafe about it, and DoError involves a fair amount of internal complexity. I wouldn't describe this as a bulls-eye but it does give a good starting point for further investigation.
Which older supported branches are affected by this flaw?
This was caused by bug 716140 which means it affects versions FF22 and onward.
If not all supported branches, which bug introduced the flaw?
Bug 716140.
Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
I'd expect the same patch, perhaps with minor changes, should work as far back as FF22.
How likely is this patch to cause regressions; how much testing does it need?
It's been on m-c for a while with no known regressions. (It seems that I should have requested sec-approval before landing it; sorry about that. I'll do that in the future.)
Attachment #801954 -
Flags: sec-approval?
Assignee
Updated•11 years ago
Flags: needinfo?(seth)
Comment 40•11 years ago
Comment on attachment 801954 [details] [diff] [review]
Do not call RasterImage::|DoError| off the main thread.
sec-approval+. I somehow missed that this has wound up on trunk already.
Since this is in and hasn't caused regressions, and it's a sec-high shipping in all of the upcoming branches and ESR24, I'm approving it for those as well.
Attachment #801954 -
Flags: sec-approval?
Attachment #801954 -
Flags: sec-approval+
Attachment #801954 -
Flags: approval-mozilla-esr24?
Attachment #801954 -
Flags: approval-mozilla-esr24+
Attachment #801954 -
Flags: approval-mozilla-beta?
Attachment #801954 -
Flags: approval-mozilla-beta+
Attachment #801954 -
Flags: approval-mozilla-aurora?
Attachment #801954 -
Flags: approval-mozilla-aurora+
Updated•11 years ago
Attachment #801954 -
Flags: approval-mozilla-release?
Comment 41•11 years ago
https://hg.mozilla.org/releases/mozilla-beta/rev/5a9c43923a4d
https://hg.mozilla.org/releases/mozilla-esr24/rev/2c94c477f1fb
Landed on trunk when it was mozilla26, so Aurora already has the fix.
Updated•11 years ago
Whiteboard: [adv-main25+][adv-esr24-1+]
Comment 42•11 years ago
I tried to reproduce this issue on Firefox 24.0 and verify it on Firefox 25.0b8 (20131017174213) and Firefox 26.0a2 (20131018004002).
This is what happens on all three builds, when I reproduce the steps in comment 0 (in this exact order):
- the memory consumption keeps increasing,
- memory consumption oscillates around 1.5GB for a little while (about 1-2 minutes more on the builds with the fix than on Fx24.0),
- Firefox crashes with an EMPTY signature. Examples:
Fx24.0 https://crash-stats.mozilla.com/report/index/23b3c119-eb59-44af-8fae-6ddf92131018
Fx25.0b8 https://crash-stats.mozilla.com/report/index/b9b28bb0-963b-42f2-a381-42c732131018
Fx26.0a2 https://crash-stats.mozilla.com/report/index/6fb8a45a-04c6-435c-983f-d85dc2131018
Updated•11 years ago
Alias: CVE-2013-5596
Comment 43•11 years ago
Ioana, can you assign someone to verify this bug fix? Ideally in as many branches as possible, your call. Thanks.
QA Contact: ioana.budnar
Comment 44•11 years ago
I see the same crashes on Firefox 24.1.0 and Firefox 27 as in comment 42.
Firefox 27 (10/22): https://crash-stats.mozilla.com/report/index/2e3ada8b-fb04-4ea0-96da-e0c652131023
Firefox 24.1.0 ESR: https://crash-stats.mozilla.com/report/index/d5f2b141-8726-48b5-b8e5-3a06f2131023
Considering that the bug reproduces exactly the same for me as in Firefox 24.0 (which is "wontfix"), I can't mark this bug as verified. The problem is that I always get empty crash reports, so I can't say for sure that it actually is the same bug.
Updated•11 years ago
Comment 45•11 years ago
Seth, can you address Ioana's concerns from comment 44? We'd like to ship this with confidence but this still leaves us with some uncertainty. Thank you.
Flags: needinfo?(seth)
Assignee
Comment 46•11 years ago
Sorry Matt, I just got back from a workweek.
I'm sorta concerned that there are multiple bugs here.
The patch in this bug should, to my knowledge, fix the issue described in comment 9.
I've again tried to reproduce a crash using the STR on my machine and I'm able to load every page available (92 of them) without a crash. Memory usage at that point on my machine is about 2.2 GB, with vsize at 5.8 GB. Is this the problem? (I know crashes with empty dumps have been associated with running out of address space in the past.) It appears, although I'm not 100% sure, that everyone encountering this crash is on a Windows machine, where we aren't 64-bit, and although the crash reports don't report zero remaining virtual address space it does look low.
Flags: needinfo?(seth)
Comment 47•11 years ago
I've been trying to get this crash to reproduce - both before and after the fix - and I can't.
Ioana, are you on Windows 7, 32 or 64 bit? We might want to see if we can get this to reproduce on another PC to see what could be different. Otherwise, we may just have to give up on it for now.
Comment 48•11 years ago
(In reply to Matt Wobensmith from comment #47)
> I've been trying to get this crash to reproduce - both before and after the
> fix - and I can't.
>
> Ioana, are you on Windows 7, 32 or 64 bit? We might want to see if we can
> get this to reproduce on another PC to see what could be different.
> Otherwise, we may just have to give up on it for now.
I was on Windows 7 32bit, which, from what I remember, is where this bug reproduced most.
Comment 49•11 years ago
I haven't seen this crash in Socorro post-Firefox 23.0.1.
Ezra, can you please try to verify this is fixed in Firefox 26, 27 or 24.1.1 ESR?
You can find the necessary builds here:
ftp.mozilla.org/pub/mozilla.org/firefox/nightly/
Flags: needinfo?(ezra)
Keywords: verifyme
Reporter
Comment 50•11 years ago
Just tested it with Fx 25.0.1 with RES 4.3.0.4, after 40+ pages no crash, could not get the memory above 1.2GB.
I tried to get it as high as possible, because as stated in comment 14 it would crash at around 1.7GB, but it would keep at a steady 1.2GB this time.
I'll try the nightly builds too asap.
Reporter
Comment 51•11 years ago
Can't reproduce this on Fx 24.1.1ESR, same as above basically, can't really verify if this is because of the fix or because memory never reaches 1.7GB as it did with 23.
Reporter
Comment 52•11 years ago
Can't reproduce this on Fx 26, same as above basically, can't really verify if this is because of the fix or because memory never reaches 1.7GB as it did with 23.
Probably because of the changes that unload images from memory if they are not within the viewport, which were not in Fx23?
So no crash, but I can't reliably say it's because of the fix or the other changes.
Flags: needinfo?(ezra)
Updated•10 years ago
Group: core-security