Closed Bug 910881 (CVE-2013-5596)
Opened 11 years ago
Closed 11 years ago
Crash on long pages (neverending reddit with images enabled)
Categories: Core :: Graphics: ImageLib, defect
Tracking: RESOLVED FIXED in mozilla26
People: Reporter: ezra; Assigned: seth
Keywords: crash, regression, sec-high
Whiteboard: [adv-main25+][adv-esr24-1+]
Attachments
(4 files)
48.49 KB, application/octet-stream
31.46 KB, application/octet-stream
32.45 KB, application/octet-stream
5.62 KB, patch (jdm: review+; abillings: approval-mozilla-aurora+, approval-mozilla-beta+, approval-mozilla-esr24+, sec-approval+)
1. Install Reddit Enhancement Suite
2. Enable Neverending Reddit
3. Enable Inline Images
4. Click on 'Show Images'
5. Start browsing (17+ pages opened)
6. CRASH
I don't think this is because of Reddit Enhancement Suite; rather, Firefox simply can't handle a page that gets this long with this many images and this much other content loaded.
That smells like an OOM crash. What's the memory use before the crash?
Could you test with a clean profile (see https://support.mozilla.org/en-US/kb/profile-manager-create-and-remove-firefox-profiles) and only with your STR, then go to about:memory and save the memory log.
Flags: needinfo?(ezra)
Reporter
Comment 2•11 years ago
Memory Report at page 1
Reporter
Comment 3•11 years ago
Memory report at page 16
Reporter
Comment 5•11 years ago
Attached 3 memory reports, Fx crashed around page 26-27 this time.
Sadly the crash reporter could not find anything usable: https://crash-stats.mozilla.com/report/index/bb130fb3-9154-451d-ae93-b53e12130831
This was the previous crash report (without a clean profile): https://crash-stats.mozilla.com/report/index/e87be6fc-9d8a-4b11-9d1d-ee8f72130829
Comment 6•11 years ago
It *smells* like OOM alright, but the crash reports say that quite a bit of virtual and physical memory is still available.
Comment 7•11 years ago
Indeed, the memory usage indicators all look alright, and I don't see an OOMAllocationSize in the raw JSON, so there are none of the usual OOM hallmarks (aside from the empty dump). I can probably get a good stack out of bp-e87be6fc-9d8a-4b11-9d1d-ee8f72130829 with a little fiddling.
Comment 8•11 years ago
Stack from bp-e87be6fc-9d8a-4b11-9d1d-ee8f72130829:
Operating system: Windows NT
6.1.7601 Service Pack 1
CPU: x86
AuthenticAMD family 20 model 1 stepping 0
2 CPUs
Crash reason: EXCEPTION_BREAKPOINT
Crash address: 0xff7ff56
Thread 20 (crashed)
0 xul.dll!NS_CycleCollectorSuspect2 [nsCycleCollector.cpp:a55c55edf302 : 2938 + 0x0]
eip = 0x0ff7ff56 esp = 0x0efff670 ebp = 0x0efff6a8 ebx = 0x0efff76c
esi = 0x00000000 edi = 0x175534c0 eax = 0x00000000 ecx = 0x0000002e
edx = 0x00000008 efl = 0x00000246
Found by: given as instruction pointer in context
1 xul.dll!imgStatusTracker::SyncNotifyState(nsTObserverArray<imgRequestProxy *> &,bool,unsigned int,nsIntRect &,bool) [imgStatusTracker.cpp:a55c55edf302 : 514 + 0x40]
eip = 0x0fbcd88f esp = 0x0efff6b0 ebp = 0x0efff6c8
Found by: previous frame's frame pointer
2 xul.dll!imgStatusTracker::SyncNotifyDifference(imgStatusTracker::StatusDiff) [imgStatusTracker.cpp:a55c55edf302 : 581 + 0x27]
eip = 0x0fbcd97c esp = 0x0efff6d0 ebp = 0x0efff75c
Found by: call frame info
3 xul.dll!mozilla::image::RasterImage::FinishedSomeDecoding(mozilla::image::RasterImage::eShutdownIntent,mozilla::image::RasterImage::DecodeRequest *) [RasterImage.cpp:a55c55edf302 : 3510 + 0x18]
eip = 0x0fbbf9b2 esp = 0x0efff764 ebp = 0x0efff7d0
Found by: call frame info
4 xul.dll!mozilla::image::RasterImage::DoError() [RasterImage.cpp:a55c55edf302 : 3363 + 0xb]
eip = 0x0fbbc8aa esp = 0x0efff7d8 ebp = 0x0efff810
Found by: call frame info
5 xul.dll!mozilla::image::RasterImage::WriteToDecoder(char const *,unsigned int) [RasterImage.cpp:a55c55edf302 : 2692 + 0x42]
eip = 0x0fbbd203 esp = 0x0efff7ec ebp = 0x0efff810
Found by: call frame info
6 xul.dll!mozilla::image::RasterImage::DecodeSomeData(unsigned int) [RasterImage.cpp:a55c55edf302 : 3305 + 0xe]
eip = 0x0fbbd2f4 esp = 0x0efff7fc ebp = 0x0efff810
Found by: call frame info
7 xul.dll!mozilla::image::RasterImage::DecodePool::DecodeSomeOfImage(mozilla::image::RasterImage *,mozilla::image::RasterImage::DecodePool::DecodeType,unsigned int) [RasterImage.cpp:a55c55edf302 : 3845 + 0x8]
eip = 0x0fbbd46e esp = 0x0efff818 ebp = 0x0efff880
Found by: call frame info
8 xul.dll!mozilla::image::RasterImage::DecodePool::DecodeJob::Run() [RasterImage.cpp:a55c55edf302 : 3714 + 0x17]
eip = 0x0fbbeb06 esp = 0x0efff888 ebp = 0x0efff8a0
Found by: call frame info
9 xul.dll!nsThreadPool::Run() [nsThreadPool.cpp:a55c55edf302 : 194 + 0x8]
eip = 0x0fabf2c2 esp = 0x0efff8a8 ebp = 0x0efff8c4
Found by: call frame info
10 xul.dll!nsThread::ProcessNextEvent(bool,bool *) [nsThread.cpp:a55c55edf302 : 627 + 0x14]
eip = 0x0fa5fe96 esp = 0x0efff8d4 ebp = 0x0efff8c4
Found by: call frame info
11 xul.dll!nsThread::ThreadFunc(void *) [nsThread.cpp:a55c55edf302 : 265 + 0x1d]
eip = 0x0fab05f8 esp = 0x0efff94c ebp = 0x0efff8c4
Found by: call frame info
12 nss3.dll!_PR_NativeRunThread [pruthr.c:a55c55edf302 : 397 + 0x9]
eip = 0x638d37e0 esp = 0x0efff974 ebp = 0x0efff8c4
Found by: call frame info
13 nss3.dll!pr_root [w95thred.c:a55c55edf302 : 90 + 0xd]
eip = 0x638d4bcd esp = 0x0efff998 ebp = 0x0efff9d0
Found by: call frame info
Comment 9•11 years ago
Frame 0 is hitting the MOZ_CRASH here: http://hg.mozilla.org/releases/mozilla-release/annotate/a55c55edf302/xpcom/base/nsCycleCollector.cpp#l2938
via frame 1 here: http://hg.mozilla.org/releases/mozilla-release/annotate/a55c55edf302/image/src/imgStatusTracker.cpp#l514
Comment 10•11 years ago
This is calling NS_CycleCollectorSuspect2 on a decoding thread, which doesn't have a cycle collector. Entirely a graphics-system bug. According to blame, joe and sfowler are the most likely candidates for dealing with this.
Status: UNCONFIRMED → NEW
Component: General → ImageLib
Ever confirmed: true
Product: Firefox → Core
Comment 11•11 years ago
That probably means that a cycle collected object is being Release'd in the decoding thread.
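For illustration, the failure mode described here looks roughly like the sketch below: for a cycle-collected class, Release() reports the object to the current thread's cycle collector, and the decode-pool threads have none. SomeCycleCollectedClass is a made-up stand-in and the NS_CycleCollectorSuspect2 call is simplified; this is not the actual generated refcounting code.

NS_IMETHODIMP_(nsrefcnt)
SomeCycleCollectedClass::Release()
{
  nsrefcnt count = --mRefCnt;
  if (count == 0) {
    delete this;
    return 0;
  }
  // A cycle-collected Release also "suspects" the object so the cycle
  // collector can later check whether it is part of a garbage cycle.
  // A decode-pool thread has no cycle collector, so this is roughly
  // where the MOZ_CRASH from frame 0 of the stack above fires.
  NS_CycleCollectorSuspect2(this);
  return count;
}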
Comment 12•11 years ago
Does this happen if you set image.multithreaded_decoding.enabled to false? What about if it's left at the default, but image.multithreaded_decoding.limit is set to 1?
Reporter
Comment 13•11 years ago
Sadly, Fx crashed even when image.multithreaded_decoding.enabled was set to false, and to make it even worse, the crash dump was corrupted :(
https://crash-stats.mozilla.com/report/index/490aefff-0b07-4dcf-9903-9f5f22130908
Reporter
Comment 14•11 years ago
I'm still suspecting OOM actually, when browsing reddit for that long Fx's memory rises to 1.7GB according to task manager, and when switching to another tab (about:memory) for example, memory drops to about 700MB, and after switching back to reddit it increases to 1.5GB again.
Could this explain why the about:memory reports don't show that Fx is running very high on memory?
Every time shortly after memory hits around 1.7GB of memory Fx crashes.
Comment 15•11 years ago
There's a MOZ_ASSERT in RasterImage::FinishedSomeDecoding that we're running on the main thread, which is a big warning flag here. I'm not sure where in the stack we switch from threadsafe to non-threadsafe code, but DoError looks possible to me.
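For context, the assertion pattern referred to here looks roughly like the sketch below (the signature matches frame 3 of the stack above; the body is omitted and the return type is assumed). MOZ_ASSERT compiles away in release builds, so nothing actually stops an off-main-thread caller in the field.

nsresult
RasterImage::FinishedSomeDecoding(eShutdownIntent aIntent, DecodeRequest* aRequest)
{
  // This method notifies observers via imgStatusTracker, which touches
  // cycle-collected, non-threadsafe state, so it must run on the main thread.
  MOZ_ASSERT(NS_IsMainThread());
  // ...
}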
Reporter
Comment 16•11 years ago
> What about if it's left at the default, but image.multithreaded_decoding.limit is set to 1?
Also crashed :-(
https://crash-stats.mozilla.com/report/index/6827dca4-493d-40b7-9962-1673d2130908
Comment 17•11 years ago
(In reply to Ezra from comment #14)
> I'm still suspecting OOM actually, when browsing reddit for that long Fx's
> memory rises to 1.7GB according to task manager, and when switching to
> another tab (about:memory) for example, memory drops to about 700MB, and
> after switching back to reddit it increases to 1.5GB again.
>
> Could this explain why the about:memory reports don't show that Fx is
> running very high on memory?
>
> Every time shortly after memory hits around 1.7GB of memory Fx crashes.
Try opening about:memory in a new window instead of a different tab; that will prevent image data from being discarded when the problem tab becomes a background tab.
Comment 18•11 years ago
(In reply to Ezra from comment #14)
> I'm still suspecting OOM actually, when browsing reddit for that long Fx's
> memory rises to 1.7GB according to task manager, and when switching to
> another tab (about:memory) for example, memory drops to about 700MB, and
> after switching back to reddit it increases to 1.5GB again.
>
> Could this explain why the about:memory reports don't show that Fx is
> running very high on memory?
>
> Every time shortly after memory hits around 1.7GB of memory Fx crashes.
Oh, and we also have some improved memory handling for images starting in 24 if you want to try beta (or aurora or nightly).
Updated•11 years ago
Group: core-security
Comment 20•11 years ago
(In reply to Josh Matthews [:jdm] from comment #15)
> There's a MOZ_ASSERT in RasterImage::FinishedSomeDecoding that we're running
> on the main thread which is a big warning flag here. I'm not sure where in
> the stack we switch from threadsafe to non-threadsafe code, but DoError
> looks possible to me.
Jeff identified that WriteToDecoder should be on the decoding thread and FinishedSomeDecoding should be on the main thread, so we just need to figure out whether DoError should be on the decoding thread or the main thread and make sure the right thing happens. It doesn't look like we "designed" for this; it may not have been obvious what DoError would cause.
Assignee
Comment 21•11 years ago
I'm working on this.
One lame thing is that this bug will look different on different platforms. This call stack acquires mDecodingMutex recursively, which will work on Windows since the underlying EnterCriticalSection primitive is recursive, but will deadlock on platforms where we use pthreads.
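To make the platform difference concrete: a Windows critical section keeps a per-thread recursion count, while a normal (non-recursive) pthread mutex does not, so re-entering the same lock on the same thread hangs. A minimal standalone illustration, not Gecko code; the function names are made-up stand-ins for the outer and inner frames on the stack above:

#include <pthread.h>

static pthread_mutex_t sDecodingMutex = PTHREAD_MUTEX_INITIALIZER;  // non-recursive

void Inner()
{
  // The calling thread already holds sDecodingMutex, so this lock never
  // succeeds: the thread deadlocks against itself.
  pthread_mutex_lock(&sDecodingMutex);
  pthread_mutex_unlock(&sDecodingMutex);
}

void Outer()
{
  pthread_mutex_lock(&sDecodingMutex);
  Inner();  // the error path re-enters the lock while it is still held
  pthread_mutex_unlock(&sDecodingMutex);
}

On Windows, the equivalent EnterCriticalSection/LeaveCriticalSection pair succeeds because the critical section simply increments its recursion count for the owning thread.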
Flags: needinfo?(seth)
Assignee
Comment 22•11 years ago
Unfortunately, even following the STR, I didn't have much luck reproducing this locally. This may be due to the platform difference or due to a dependence on the specific images we happened to come across (this *is* error-handling code, after all).
It's pretty apparent that RasterImage::|DoError| is not currently safe to call off-main-thread, though. This patch adds a |HandleErrorWorker| which handles an error on the main thread in these cases.
I added a new |mPendingError| flag to prevent further decoding work even before the main thread handles the error, and to prevent more than one |HandleErrorWorker| from being dispatched. Although this overlaps somewhat with the duties of |mError|, just setting |mError| to true and unconditionally executing the error-handling code when |HandleErrorWorker| runs would invite concurrency bugs, since |DoError| might be called from another thread as well; we might end up with duplicate notifications and similar issues.
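The shape of that approach is roughly the sketch below. |HandleErrorWorker| and |mPendingError| are the names used in this comment; the surrounding details (member types, the nsRunnable base class, the exact dispatch call) are assumptions for illustration, not the actual patch.

#include "nsThreadUtils.h"  // NS_IsMainThread, NS_DispatchToMainThread

class RasterImage::HandleErrorWorker : public nsRunnable
{
public:
  explicit HandleErrorWorker(RasterImage* aImage) : mImage(aImage) { }

  NS_IMETHOD Run()
  {
    // By the time this runs we are on the main thread, so the normal
    // (non-threadsafe) error path is safe to execute.
    mImage->DoError();
    return NS_OK;
  }

private:
  nsRefPtr<RasterImage> mImage;
};

void
RasterImage::DoError()
{
  if (mError) {
    return;  // The error has already been handled.
  }

  if (!NS_IsMainThread()) {
    // On a decode thread: record that an error is pending so no further
    // decoding work happens, and hand the real error handling to the main
    // thread. mPendingError also ensures at most one worker is dispatched.
    if (!mPendingError) {
      mPendingError = true;
      NS_DispatchToMainThread(new HandleErrorWorker(this));
    }
    return;
  }

  // Main thread: shut down the decoder and notify observers here.
  // ...
}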
Attachment #801954 -
Flags: review?(josh)
Assignee
Updated•11 years ago
Assignee: nobody → seth
Assignee
Comment 23•11 years ago
Try job here:
https://tbpl.mozilla.org/?tree=Try&rev=48ac7c6129ee
Assignee
Comment 24•11 years ago
Try looks OK. Ezra, could you check if you still observe the problem with this build? You can grab it here:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mfowler@mozilla.com-48ac7c6129ee/try-win32/firefox-26.0a1.en-US.win32.installer.exe
Flags: needinfo?(ezra)
Comment 25•11 years ago
We were looking at rating this bug during pre-CritSmash today but are unsure how controllable a hypothetical exploit could be. Generally threading bugs involving writing can be sec-high-ish. Seth, Milan, would you rate this sec-high? (Feel free to set the rating)
Comment 26•11 years ago
I want to say that anything in the cycle collector is (at least) sec-high, but I'm not sure I can recall all the reasons why I think that.
Assignee
Comment 27•11 years ago
I'd say that sec-high is reasonable to me. Marked.
Keywords: sec-high
Updated•11 years ago
Attachment #801954 -
Flags: review?(josh) → review+
Assignee
Comment 28•11 years ago
Pushed. Leaving open for the time being; I'd like to hear back from Ezra if possible.
https://hg.mozilla.org/integration/mozilla-inbound/rev/8af98453b511
Whiteboard: [leave open]
Reporter
Comment 29•11 years ago
I have not been able to test your try build yet, will do so tonight, sorry about that.
Comment 30•11 years ago
Reporter
Comment 31•11 years ago
Just tested seth's try build, it seems to be working great!
Scrolled for a loooooong time; memory didn't rise as fast as it used to, which is great. Repeatedly got to a point where RES didn't want to scroll any further.
Reloaded the page a couple of times to keep the memory usage high and keep scrolling; finally reached the point of 1.7GB. Fx did stall for a little while before dropping back to 600MB of memory and continuing on its merry way.
So I guess it's fixed, awesome :-)
Flags: needinfo?(ezra)
Assignee
Comment 32•11 years ago
(In reply to Ezra from comment #31)
> So I guess it's fixed, awesome :-)
I'm glad to hear that! Thanks for reporting the bug and for testing the fix.
Feel free to reopen if you run across the bug again. For now, I'm marking this bug as resolved.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [leave open]
Comment 33•11 years ago
Patches for security bugs have a landing policy to prevent 0-daying our Release users
https://wiki.mozilla.org/Security/Bug_Approval_Process
What old branches are affected? Clearly we need this on Beta and ESR24, do we also need to worry about b2g18 (which we'll be supporting until March)? Or is this a more recent regression?
status-b2g18: --- → ?
status-firefox24: --- → affected
status-firefox25: --- → affected
status-firefox27: --- → fixed
status-firefox-esr17: --- → ?
status-firefox-esr24: --- → affected
tracking-firefox25: --- → +
tracking-firefox26: --- → +
tracking-firefox-esr24: --- → 25+
Flags: needinfo?(seth)
Comment 34•11 years ago
This should have been tracking much sooner. Please get an approval nomination up, for the possibility of landing in FF25 (currently Beta, and <2wk from release).
Assignee
Comment 35•11 years ago
This is a regression from bug 716140 AFAICT, which means it has been around for a while. I'm honestly not certain about b2g18, but given when bug 716140 landed, this affects FF22+.
Flags: needinfo?(seth)
Assignee
Comment 36•11 years ago
Comment on attachment 801954 [details] [diff] [review]
Do not call RasterImage::|DoError| off the main thread.
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: Potential vulnerability.
Fix Landed on Version: FF26
Risk to taking this patch (and alternatives if risky): Low. This has been on m-c/Aurora for a while and doesn't seem to have caused issues.
String or UUID changes made by this patch: None.
(Note that I guess Release _is_ ESR24 right now, so requesting both is perhaps redundant...)
See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #801954 -
Flags: approval-mozilla-release?
Attachment #801954 -
Flags: approval-mozilla-esr24?
Attachment #801954 -
Flags: approval-mozilla-beta?
Attachment #801954 -
Flags: approval-mozilla-aurora?
Comment 38•11 years ago
Seth, can you please set sec-approval? on your patch and answer the security questions?
As a sec-high, it needs to go through sec-approval. See https://wiki.mozilla.org/Security/Bug_Approval_Process
Flags: needinfo?(seth)
Assignee
Comment 39•11 years ago
Comment on attachment 801954 [details] [diff] [review]
Do not call RasterImage::|DoError| off the main thread.
[Security approval request comment]
How easily could an exploit be constructed based on the patch?
Not entirely sure. The bug involves unsafe parallel access to data structures which are not directly accessible from JavaScript, but which JavaScript certainly has some indirect influence over.
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
Both the check-in comment and comments in the patch point to the fact that calling RasterImage::DoError off main thread is unsafe. They don't make clear exactly what's unsafe about it, and DoError involves a fair amount of internal complexity. I wouldn't describe this as a bulls-eye but it does give a good starting point for further investigation.
Which older supported branches are affected by this flaw?
This was caused by bug 716140 which means it affects versions FF22 and onward.
If not all supported branches, which bug introduced the flaw?
Bug 716140.
Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
I'd expect the same patch, perhaps with minor changes, should work as far back as FF22.
How likely is this patch to cause regressions; how much testing does it need?
It's been on m-c for a while with no known regressions. (It seems that I should have requested sec-approval before landing it; sorry about that. I'll do that in the future.)
Attachment #801954 -
Flags: sec-approval?
Assignee
Updated•11 years ago
Flags: needinfo?(seth)
Comment 40•11 years ago
Comment on attachment 801954 [details] [diff] [review]
Do not call RasterImage::|DoError| off the main thread.
sec-approval+. I somehow missed that this has wound up on trunk already.
Since this is in and hasn't caused regressions, and it's a sec-high shipping in all of the upcoming branches and ESR24, I'm approving it for those as well.
Attachment #801954 -
Flags: sec-approval?
Attachment #801954 -
Flags: sec-approval+
Attachment #801954 -
Flags: approval-mozilla-esr24?
Attachment #801954 -
Flags: approval-mozilla-esr24+
Attachment #801954 -
Flags: approval-mozilla-beta?
Attachment #801954 -
Flags: approval-mozilla-beta+
Attachment #801954 -
Flags: approval-mozilla-aurora?
Attachment #801954 -
Flags: approval-mozilla-aurora+
Updated•11 years ago
Attachment #801954 -
Flags: approval-mozilla-release?
Comment 41•11 years ago
https://hg.mozilla.org/releases/mozilla-beta/rev/5a9c43923a4d
https://hg.mozilla.org/releases/mozilla-esr24/rev/2c94c477f1fb
Landed on trunk when it was mozilla26, so Aurora already has the fix.
Updated•11 years ago
Whiteboard: [adv-main25+][adv-esr24-1+]
Comment 42•11 years ago
I tried to reproduce this issue on Firefox 24.0 and verify it on Firefox 25.0b8 (20131017174213) and Firefox 26.0a2 (20131018004002).
This is what happens on all three builds, when I reproduce the steps in comment 0 (in this exact order):
- the memory consumption keeps increasing,
- memory consumption oscillates around 1.5GB for a little while (about 1-2 minutes more on the builds with the fix than on Fx24.0),
- Firefox crashes with an EMPTY signature. Examples:
Fx24.0 https://crash-stats.mozilla.com/report/index/23b3c119-eb59-44af-8fae-6ddf92131018
Fx25.0b8 https://crash-stats.mozilla.com/report/index/b9b28bb0-963b-42f2-a381-42c732131018
Fx26.0a2 https://crash-stats.mozilla.com/report/index/6fb8a45a-04c6-435c-983f-d85dc2131018
Updated•11 years ago
Alias: CVE-2013-5596
Comment 43•11 years ago
Ioana, can you assign someone to verify this bug fix? Ideally in as many branches as possible, your call. Thanks.
QA Contact: ioana.budnar
Comment 44•11 years ago
I see the same crashes on Firefox 24.1.0 and Firefox 27 as in comment 42.
Firefox 27 (10/22): https://crash-stats.mozilla.com/report/index/2e3ada8b-fb04-4ea0-96da-e0c652131023
Firefox 24.1.0 ESR: https://crash-stats.mozilla.com/report/index/d5f2b141-8726-48b5-b8e5-3a06f2131023
Considering that the bug reproduces exactly the same for me as in Firefox 24.0 (which is "wontfix"), I can't mark this bug as verified. The problem is that I always get empty crash reports, so I can't say for sure that it actually is the same bug.
Updated•11 years ago
Comment 45•11 years ago
Seth, can you address Ioana's concerns from comment 44? We'd like to ship this with confidence but this still leaves us with some uncertainty. Thank you.
Flags: needinfo?(seth)
Assignee
Comment 46•11 years ago
Sorry Matt, I just got back from a workweek.
I'm sorta concerned that there are multiple bugs here.
The patch in this bug should, to my knowledge, fix the issue described in comment 9.
I've again tried to reproduce a crash using the STR on my machine and I'm able to load every page available (92 of them) without a crash. Memory usage at that point on my machine is about 2.2 GB, with vsize at 5.8 GB. Is this the problem? (I know crashes with empty dumps have been associated with running out of address space in the past.) It appears, although I'm not 100% sure, that everyone encountering this crash is on a Windows machine, where we aren't 64-bit, and although the crash reports don't report zero remaining virtual address space it does look low.
Flags: needinfo?(seth)
Comment 47•11 years ago
I've been trying to get this crash to reproduce - both before and after the fix - and I can't.
Ioana, are you on Windows 7, 32 or 64 bit? We might want to see if we can get this to reproduce on another PC to see what could be different. Otherwise, we may just have to give up on it for now.
Comment 48•11 years ago
(In reply to Matt Wobensmith from comment #47)
> I've been trying to get this crash to reproduce - both before and after the
> fix - and I can't.
>
> Ioana, are you on Windows 7, 32 or 64 bit? We might want to see if we can
> get this to reproduce on another PC to see what could be different.
> Otherwise, we may just have to give up on it for now.
I was on Windows 7 32bit, which, from what I remember, is where this bug reproduced most.
Comment 49•11 years ago
I haven't seen this crash in Socorro post-Firefox 23.0.1.
Ezra, can you please try to verify this is fixed in Firefox 26, 27 or 24.1.1 ESR?
You can find the necessary builds here:
ftp.mozilla.org/pub/mozilla.org/firefox/nightly/
Flags: needinfo?(ezra)
Keywords: verifyme
Reporter
Comment 50•11 years ago
Just tested it with Fx 25.0.1 with RES 4.3.0.4, after 40+ pages no crash, could not get the memory above 1.2GB.
I tried to get it as high as possible, because as stated in comment 14 it would crash at around 1.7GB, but it would keep at a steady 1.2GB this time.
I'll try the nightly builds too asap.
Reporter
Comment 51•11 years ago
Can't reproduce this on Fx 24.1.1ESR, same as above basically, can't really verify if this is because of the fix or because memory never reaches 1.7GB as it did with 23.
Reporter
Comment 52•11 years ago
Can't reproduce this on Fx 26, same as above basically, can't really verify if this is because of the fix or because memory never reaches 1.7GB as it did with 23.
Probably because of the changes that unload images from memory if they are not within the viewport, which were not in Fx23?
So no crash, but I can't reliably say it's because of the fix or the other changes.
Flags: needinfo?(ezra)
Updated•10 years ago
Group: core-security