639885 - Crash [@ memmove ] via mozilla::layers::ReadbackManagerD3D10::ProcessTasks

Reporter

Description

•

14 years ago

I hit this crash three times in the past two days:

 	mozcrt19.dll!memmove(unsigned char * dst=0x67e23fa0, unsigned char * src=0x67e23fa4, unsigned long count=4294967292)  Line 188	Asm
 	xul.dll!nsTArray_base<nsTArrayDefaultAllocator>::ShiftData(unsigned int start=0, unsigned int oldLen=0, unsigned int newLen=0, unsigned int elemSize=4)  Line 180 + 0xd bytes	C++
 	xul.dll!nsTArray<nsAutoPtr<mozilla::layers::ReadbackTask>,nsTArrayDefaultAllocator>::RemoveElementsAt(unsigned int start=1736935553, unsigned int count=1)  Line 836	C++
 	xul.dll!nsTArray<nsAutoPtr<mozilla::layers::ReadbackTask>,nsTArrayDefaultAllocator>::RemoveElementAt(unsigned int index=1736935646)  Line 841	C++
>	xul.dll!mozilla::layers::ReadbackManagerD3D10::ProcessTasks()  Line 232	C++
 	xul.dll!mozilla::layers::StartTaskThread(void * aManager=0x124f7730)  Line 135	C++
 	kernel32.dll!@BaseThreadInitThunk@12()  + 0x12 bytes	
 	ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes	
 	ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes	

The worst part is that before we crash, the memmove manages to overwrite a static member variable that Breakpad needs to function properly, so the exception handler crashes, so you just get a Windows crash dialog instead of our crash reporter.

All three times I managed to reproduce this I had another instance of Firefox running (with -no-remote). I hit it twice today under those circumstances. (The second time I attached a debugger and set a data breakpoint, which is how I determined that the member variable was being overwritten here.)

Benjamin Smedberg

Updated

•

14 years ago

blocking2.0: --- → ?

Andreas Gal :gal

Updated

•

14 years ago

Group: core-security

Andreas Gal :gal

Comment 1

•

14 years ago

Hidden until further analysis.

Robert Sayre

Comment 2

•

14 years ago

Need more data on whether we have a frequent crash. Leaving the nomination now.

Group: core-security

Joe Drew (not getting mail)

Updated

•

14 years ago

Assignee: nobody → bas.schouten

Robert Sayre

Updated

•

14 years ago

Group: core-security

Bas Schouten (:bas.schouten)

Assignee

Comment 3

•

14 years ago

This crash is very strange, I've got a theory on what might cause something like this, but it should only occur on a device driver hang/reset or some 10-20 seconds after startup.

I will write a fix for that issue and hope it takes care of this. Sadly considering this shows a weakness in our crash reporter, we have no idea how often this happens in the wild.

I can see no way this can be exploited.

Benjamin Smedberg

Comment 4

•

14 years ago

Given the codepath here, this is probably only related to windowless plugins on D3D10, which is most commonly testable on hulu and blip... most youtube videos are windowed.

chris hofmann

Comment 5

•

14 years ago

> Need more data on whether we have a frequent crash.

I don't get any matches for a stack or signature looking like this when trying to find this in reports in socorro, or searching the top ten lines of all the stacks in the top 300 for beta 12 or 3.6.14

http://people.mozilla.org/crash_stacks/stack-summary-4.0b12.txt
http://people.mozilla.org/crash_stacks/stack-summary-3.6.14.txt

Benjamin Smedberg

Comment 6

•

14 years ago

Yes, that's because the bug here causes no crash reporter to come up.

Bas Schouten (:bas.schouten)

Assignee

Comment 7

•

14 years ago

Attached patch Fix potential access-after-free (obsolete) — Details — Splinter Review

This fixes the potential access-after-free bug here. We call terminate before releasing our last external reference inside the DeviceAttachments.

Terminate signals the task thread to stop after checking whether it has grabbed a reference to the ReadbackManager yet. That reference will be released when the task thread terminates.

This should fix the only cause I can think of for this crash, outside of someone else corrupting the readbackmanager.

Attachment #517848 - Flags: review?(jmuizelaar)

Joe Drew (not getting mail)

Updated

•

14 years ago

Attachment #517848 - Flags: review?(jones.chris.g)

Andreas Gal :gal

Comment 8

•

14 years ago

Is this read or write after free? Write after free is almost always exploitable in a multi-threaded system with user control over background threads.

Bas Schouten (:bas.schouten)

Assignee

Comment 9

•

14 years ago

This is a write after free. But I don't see how anyone could affect what's being written to from JS or HTML or anything like that.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 10

•

14 years ago

Comment on attachment 517848 [details] [diff] [review]
Fix potential access-after-free

This looks mostly OK, comments follow.

> struct DeviceAttachments
> {
>+  ~DeviceAttachments() {
>+    if (mReadbackManager) {
>+      mReadbackManager->Terminate();
>+    }

I would remove this ...

>+  mMayTerminate = ::CreateEventA(NULL, TRUE, FALSE, "MayTerminateEvent");

... rename this to mReadbackThreadHasRef ...

>+void
>+ReadbackManagerD3D10::Terminate()
>+{
>+  // Practically always this event should have been set by the time we reach
>+  // this, we need to do this to prevent the rare race condition where we'd
>+  // get terminated and released on the main thread before we take a reference
>+  // to ourselves.
>+  ::WaitForSingleObject(mMayTerminate, INFINITE);

... remove this ...

... and in ReadbackManagerD3D10::ReadbackManagerD3D10(), after
creating the readback thread, block on

  ::WaitForSingleObject(mReadbackThreadHasRef, INFINITE);

Your impl here looks OK, but I don't like distributing the invariant
that the readback thread has a reference around this code.  This will
mean that we block on startup of the readback thread, but I would hope
that's not too slow on windows.

Jeff Muizelaar [:jrmuizel]

Comment 11

•

14 years ago

It looks like this is:
  memmove(base, base+4, 4294967292);
where base is a heap address.

So move everything on the heap forward 4 bytes until we reach the end of the heap.
I'm sort of surprised that the static member variables for crash reporter are in memory contiguous with the heap.

Bas Schouten (:bas.schouten)

Assignee

Comment 12

•

14 years ago

Attached patch Fix potential access-after-free v2 (obsolete) — Details — Splinter Review

Addressed cjones' comments as per IRC discussion.

Attachment #517848 - Attachment is obsolete: true

Attachment #517848 - Flags: review?(jones.chris.g)

Attachment #517848 - Flags: review?(jmuizelaar)

Attachment #517877 - Flags: review?(jones.chris.g)

Attachment #517877 - Flags: review?(jmuizelaar)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 13

•

14 years ago

Comment on attachment 517877 [details] [diff] [review]
Fix potential access-after-free v2

I still think the "mMayTerminate" name is ambiguous and confusing, but don't need to hold an sg on that.

>+void
>+ReadbackManagerD3D10::Terminate()
>+{

Assert main-thread here.

Attachment #517877 - Flags: review?(jones.chris.g) → review+

Mike Beltzner [:beltzner, not reading bugmail]

Comment 14

•

14 years ago

In order to make a judgement on blocking status, the following questions need to be answered:

 - is this a recent regression?
 - do we see any secondary indicators that the crash is common?
 - can it be trivially reproduced on Windows systems running common plugins on common websites?

JP Rosevear [:jpr]

Comment 15

•

14 years ago

(In reply to comment #14)
> In order to make a judgement on blocking status, the following questions need
> to be answered:
> 
>  - is this a recent regression?

Probably since this code landed only in b12.

>  - do we see any secondary indicators that the crash is common?

MS crash data will hopefully provide the answer.  Asa found nothing concrete in the input database (see release-drivers).

>  - can it be trivially reproduced on Windows systems running common plugins on
> common websites?

Not yet at least.

Jeff Muizelaar [:jrmuizel]

Comment 16

•

14 years ago

Bas, can you give a step-by-step walk through about your what happens with the current code, provided your theory about the problem is true?

Jeff Muizelaar [:jrmuizel]

Comment 17

•

14 years ago

Also, this patch seems to assume that DeviceAttachments will be the last thing to hold the reference. This stack seems to suggest otherwise:

 xul.dll!mozilla::layers::ReadbackManagerD3D10::~ReadbackManagerD3D10()  Line 148	C++
 xul.dll!mozilla::layers::ReadbackManagerD3D10::`scalar deleting destructor'()  + 0xf bytes	C++
 xul.dll!mozilla::layers::ReadbackManagerD3D10::Release()  Line 213 + 0x1c bytes	C++
 xul.dll!nsRefPtr<mozilla::layers::ReadbackManagerD3D10>::~nsRefPtr<mozilla::layers::ReadbackManagerD3D10>()  Line 970	C++
 xul.dll!mozilla::layers::LayerManagerD3D10::~LayerManagerD3D10()  Line 117 + 0x37 bytes	C++
 xul.dll!mozilla::layers::LayerManagerD3D10::`vector deleting destructor'()  + 0x4d bytes	C++
 xul.dll!mozilla::layers::LayerManager::Release()  Line 256 + 0x141 bytes	C++
 xul.dll!nsRefPtr<mozilla::layers::LayerManager>::assign_assuming_AddRef(mozilla::layers::LayerManager * newPtr=0x00000000)  Line 958	C++
 xul.dll!nsRefPtr<mozilla::layers::LayerManager>::assign_with_AddRef(mozilla::layers::LayerManager * rawPtr=0x00000000)  Line 942	C++
 xul.dll!nsRefPtr<mozilla::layers::LayerManager>::operator=(mozilla::layers::LayerManager * rhs=0x00000000)  Line 1026	C++
 xul.dll!nsWindow::Destroy()  Line 760	C++
 xul.dll!nsXULWindow::Destroy()  Line 556	C++

Jeff Muizelaar [:jrmuizel]

Comment 18

•

14 years ago

Ehsan and I have artificially reproduced the crash by changing the CreateSemaphore initial count to be one less than the maximum and visiting hulu.com

Jeff Muizelaar [:jrmuizel]

Comment 19

•

14 years ago

(In reply to comment #18)
> Ehsan and I have artificially reproduced the crash by changing the
> CreateSemaphore initial count to be one less than the maximum and visiting
> hulu.com

However, the reproduction may be a red herring...

Jeff Muizelaar [:jrmuizel]

Comment 20

•

14 years ago

Ted, did you notice a browser hang of about 5 seconds before the crash?

(no longer active)

Comment 21

•

14 years ago

Attached patch Abort in case of unsafe program state (for Gecko 2.0) (obsolete) — Details — Splinter Review

One possible failure scenario here is if the thread takes more than 5 seconds to shut down.  <http://mxr.mozilla.org/mozilla-central/source/gfx/layers/d3d10/ReadbackManagerD3D10.cpp#160>

In this case we end up with a running thread touching the ReadbackManagerD3D10 object.  Note that the destructor for nsTArray calls Clear on it, which sets mLength to 0, which might lead to the exact crash in comment 0.

I think it's best for us to change things so that we do a runtime abort if this happens.  This enables us to get crash data from our users who hit this, and might help us in debugging this bug further and fixing it in a future release.  This is what this patch does.

This code is really hard to reason about.  Jeff and I discussed this code and Bas patch for several hours today, and in the end neither of us seemed comfortable with taking Bas' patch here.  I highly recommend against taking that patch for Firefox 4, unless a safe fix which has been tested on the nightlies for some time can be backported to the 2.0 branch in a future dot-release.

Attachment #517941 - Flags: review?(jones.chris.g)

Attachment #517941 - Flags: review?(jmuizelaar)

Attachment #517941 - Flags: review?(bas.schouten)

(no longer active)

Comment 22

•

14 years ago

(In reply to comment #21)
> One possible failure scenario here is if the thread takes more than 5 seconds
> to shut down. 

Possible examples of the cases where this might happen is having lots of windowless plugins on a page (pages with ads?), or an extremely busy system (possibly thrashing stuff to disk).  Basically either a large number of jobs to process, or the starvation of the background thread.

(no longer active)

Comment 23

•

14 years ago

And let me also add that this code has a lot of other problems as well (such as using named objects, not doing enough error checking, using a semaphore as a means of tracking the number of objects in the queue, using CreateThread directly, etc).  I think those problems should be dealt with in other bugs though.

(no longer active)

Comment 24

•

14 years ago

Note that if the scenario in comment 21 is correct, this might be exploitable, because we'd be accessing |this| after the object has gone away, and it would be possible to cause the memmove to overwrite a vtable, which means that we're owned.  Determining if this can really happen requires understanding the code way better than I do.

(not currently active) Ted Mielczarek

Reporter

Comment 25

•

14 years ago

(In reply to comment #20)
> Ted, did you notice a browser hang of about 5 seconds before the crash?

Can't say that I did, but I'm not a reliable observer, I don't think. This is on a Core i7 machine running Windows 7 x64, FWIW. The machine is not otherwise under heavy load, just browsing.

Bas Schouten (:bas.schouten)

Assignee

Comment 26

•

14 years ago

I've setup a reporting structure for Ted that keeps reporting how many tasks are in the task queue. If it happens for him again, or maybe even if it doesn't, if we see a rising trend on the amount of tasks in the queue this could be responsible. It sounds very unlikely though. Even if this was the cause we'd simply start leaking tasks (the semaphore will refuse to increment while the queue grows).

It will be interesting debug information anyway, so let's hope it crashes and let's see what comes out of it.

Bas Schouten (:bas.schouten)

Assignee

Comment 27

•

14 years ago

(In reply to comment #24)
> Note that if the scenario in comment 21 is correct, this might be exploitable,
> because we'd be accessing |this| after the object has gone away, and it would
> be possible to cause the memmove to overwrite a vtable, which means that we're
> owned.  Determining if this can really happen requires understanding the code
> way better than I do.

I agree with all this, I don't see a reasonable way for anyone to control what memory the memmove moves where though. But I don't see any attack vectors to inject anything into the vtable. Nor a way to get this vtable to be called after object destruction (the task function itself makes no virtual function calls, i.e. there's no vtable access after free).

So yes, memory can be moved into an area that once was a vtable, but not a vtable that's going to be used for anything.

Bas Schouten (:bas.schouten)

Assignee

Comment 28

•

14 years ago

(In reply to comment #21)
> This code is really hard to reason about.  Jeff and I discussed this code and
> Bas patch for several hours today, and in the end neither of us seemed
> comfortable with taking Bas' patch here.  I highly recommend against taking
> that patch for Firefox 4, unless a safe fix which has been tested on the
> nightlies for some time can be backported to the 2.0 branch in a future
> dot-release.

Although any lifecycle management changes in a multithreaded environment are risky. I personally think this is a pretty safe patch (yes, it would require a QA cycle). If we do have any future RC I'd certainly recommend taking it as this is a pretty nasty bug, even though I can't imagine the prevalence being high.

(no longer active)

Comment 29

•

14 years ago

(In reply to comment #27)
> (In reply to comment #24)
> > Note that if the scenario in comment 21 is correct, this might be exploitable,
> > because we'd be accessing |this| after the object has gone away, and it would
> > be possible to cause the memmove to overwrite a vtable, which means that we're
> > owned.  Determining if this can really happen requires understanding the code
> > way better than I do.
> 
> I agree with all this, I don't see a reasonable way for anyone to control what
> memory the memmove moves where though. But I don't see any attack vectors to
> inject anything into the vtable. Nor a way to get this vtable to be called
> after object destruction (the task function itself makes no virtual function
> calls, i.e. there's no vtable access after free).
> 
> So yes, memory can be moved into an area that once was a vtable, but not a
> vtable that's going to be used for anything.

I was not merely talking about the vtables for these objects.  The fault memmove call goes ahead and corrupts a large portion of the heap this way, so a value which is stored adjacent to an object pointer can be replaced with the pointer, for example, and that value could be constructed to point to the attacker's favorite vtable coming from an image, etc.  :-)

(no longer active)

Comment 30

•

14 years ago

(In reply to comment #28)
> (In reply to comment #21)
> > This code is really hard to reason about.  Jeff and I discussed this code and
> > Bas patch for several hours today, and in the end neither of us seemed
> > comfortable with taking Bas' patch here.  I highly recommend against taking
> > that patch for Firefox 4, unless a safe fix which has been tested on the
> > nightlies for some time can be backported to the 2.0 branch in a future
> > dot-release.
> 
> Although any lifecycle management changes in a multithreaded environment are
> risky. I personally think this is a pretty safe patch (yes, it would require a
> QA cycle). If we do have any future RC I'd certainly recommend taking it as
> this is a pretty nasty bug, even though I can't imagine the prevalence being
> high.

Can you reply to comment 16, please?

(no longer active)

Comment 31

•

14 years ago

Try server push: http://tbpl.mozilla.org/?tree=MozillaTry&rev=0ff4f7282a24

Bas Schouten (:bas.schouten)

Assignee

Comment 32

•

14 years ago

Attached patch Part 2: Stop using named objects [checkin: comment 58] — Details — Splinter Review

I don't agree with all of Ehsan's problems with the code, however he's certainly right that the use of named objects here is a -very- bad thing here! Let's stop doing it now.

I wrote this with in my mind that this was a singleton class, where it would've been benign, but not really that informative anyway. However under some conditions (driver crashes/etc.), it's not, if the shutdown then also takes longer than 5 seconds (I'll detail this situation as requested by Jeff in a separate comment), this will linger the old objects. This means that if the semaphore lingers, our new semaphore will share its state with the old one! This is a big problem since this could make the worker thread on one object process an event that was posted on the other. It would access the wrong array and not find any task there, and crash. This situation should also be extremely rare, and indeed if it does occur we're already stuck with an access-after-free without my other patch.

However it is very bad, as such I'm posting this here as we certainly want to fix this, quickly, as well.

Attachment #517958 - Flags: review?(jmuizelaar)

Bas Schouten (:bas.schouten)

Assignee

Comment 33

•

14 years ago

(In reply to comment #16)
> Bas, can you give a step-by-step walk through about your what happens with the
> current code, provided your theory about the problem is true?

Ehsan gives a pretty good description in comment 21.

Essentially what happens is this:

1) The destructor is called, this tells the thread to shutdown when it unblocks
2) The destructor waits for 5 seconds, and in order not to freeze the browser if the video driver is acting up, it continues execution, it finalizes and deallocates the object out under the task thread
3) The taskthread unblocks, and accesses the freed member variables

Bas Schouten (:bas.schouten)

Assignee

Comment 34

•

14 years ago

Comment on attachment 517941 [details] [diff] [review]
Abort in case of unsafe program state (for Gecko 2.0)

As a measurement device this is a fine solution. I'm r+'ing this so we can go forward with it if this approach is the consensus.

Attachment #517941 - Flags: review?(bas.schouten) → review+

Jeff Muizelaar [:jrmuizel]

Updated

•

14 years ago

Attachment #517941 - Flags: review?(jmuizelaar) → review+

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Updated

•

14 years ago

Attachment #517941 - Flags: review?(jones.chris.g) → review+

(no longer active)

Comment 35

•

14 years ago

Attached patch Abort in case of unsafe program state (for Gecko 2.0) [check-in: comment 51] — Details — Splinter Review

This version changes the patch header so that if we decide to land it, it's hg import'able.

Attachment #517941 - Attachment is obsolete: true

Mike Beltzner [:beltzner, not reading bugmail]

Comment 36

•

14 years ago

Comment on attachment 517979 [details] [diff] [review]
Abort in case of unsafe program state (for Gecko 2.0) [check-in: comment 51]

r+a=beltzner for landing on mozilla-central to get nightly feedback

Attachment #517979 - Flags: review+

Attachment #517979 - Flags: approval2.0+

Mike Beltzner [:beltzner, not reading bugmail]

Comment 37

•

14 years ago

(the r+ is carried over from Bas, whoops)

Mike Beltzner [:beltzner, not reading bugmail]

Updated

•

14 years ago

Whiteboard: [needs to land on mozilla-central]

(not currently active) Ted Mielczarek

Reporter

Comment 38

•

14 years ago

Okay, so, I found STR thanks to Bas:
1) Run Firefox, load Hulu
2) Run another Firefox -no-remote, load Hulu
3) The first browser will crash

(not currently active) Ted Mielczarek

Reporter

Comment 39

•

14 years ago

I think this means that it's only reproducible when running two instances of Firefox, so this shouldn't be an issue that most people run into.

Jeff Muizelaar [:jrmuizel]

Comment 40

•

14 years ago

(In reply to comment #39)
> I think this means that it's only reproducible when running two instances of
> Firefox, so this shouldn't be an issue that most people run into.

And also suggests that this has to do with the named synchronization primitives.

Bas Schouten (:bas.schouten)

Assignee

Comment 41

•

14 years ago

So, the second hunk of Ehsan's patch will catch what the actual bug is here. I've figured it out after thinking about it more.

The evil here is the named objects (indirectly: Nice catch Ehsan!), if you have two browser sessions open (i.e. two instances of firefox.exe) open in a single user login, the second will open the semaphore of the first (after all the name already exists within that Desktop Session). Then when one process gets a task processed, either one of the two can get released, if the wrong one is released mPendingReadbackTasks will be empty, and everything will fall apart. The fix for this is trivial, safe, and already part 2 on this bug.

This particular bug will only ever occur for users running -two or more- instances of firefox.exe (i.e. starting one with -no-remote), and having used windowless plugins in both of them.

Benjamin Smedberg

Comment 42

•

14 years ago

Not a blocker.

blocking2.0: ? → .x+

Whiteboard: [needs to land on mozilla-central] → [needs to land on mozilla-central][fx4-rc-ridealong]

Benjamin Smedberg

Updated

•

14 years ago

Keywords: relnote

(no longer active)

Comment 43

•

14 years ago

(In reply to comment #41)
> So, the second hunk of Ehsan's patch will catch what the actual bug is here.
> I've figured it out after thinking about it more.
> 
> The evil here is the named objects (indirectly: Nice catch Ehsan!), if you have
> two browser sessions open (i.e. two instances of firefox.exe) open in a single
> user login, the second will open the semaphore of the first (after all the name
> already exists within that Desktop Session). Then when one process gets a task
> processed, either one of the two can get released, if the wrong one is released
> mPendingReadbackTasks will be empty, and everything will fall apart. The fix
> for this is trivial, safe, and already part 2 on this bug.
> 
> This particular bug will only ever occur for users running -two or more-
> instances of firefox.exe (i.e. starting one with -no-remote), and having used
> windowless plugins in both of them.

Is there any other scenario under which we allocate a second instance of ReadbackManagerD3D10 without the first instance's dtor being called?  For example, what happens if the graphics driver crashes/gets reset?

And BTW, does this mean that my abort patch is no longer needed?

(no longer active)

Comment 44

•

14 years ago

And I still think that a scenario such as comment 21 needs to be addressed too.  In fact, I don't think why we're waiting 5 seconds for the thread to finish.  Why can't that be INFINITE?  I think whatever we do here, we need to address this case too, and my patch may help to see if comment 21 happens in practice, even with attachment 517958 [details] [diff] [review].

Andreas Gal :gal

Comment 45

•

14 years ago

I am not particularly confident in the patch, based on the discussion in the bug. We don't seem to understand at all still how this really triggers.

Andreas Gal :gal

Comment 46

•

14 years ago

Comment on attachment 517958 [details] [diff] [review]
Part 2: Stop using named objects [checkin: comment 58]

This seems to be the patch we really want to take instead.

Andreas Gal :gal

Comment 47

•

14 years ago

Also if comment 39 is correct, then this is most definitely not a blocker.

(no longer active)

Comment 48

•

14 years ago

(In reply to comment #45)
> I am not particularly confident in the patch, based on the discussion in the
> bug. We don't seem to understand at all still how this really triggers.

Which patch do you mean?

Andreas Gal :gal

Comment 49

•

14 years ago

The one that is ready to land and got approval. Thats not going to fix anything. The unreviewed ones are the ones we need, right?

(no longer active)

Comment 50

•

14 years ago

(In reply to comment #49)
> The one that is ready to land and got approval. Thats not going to fix
> anything. The unreviewed ones are the ones we need, right?

The purpose of that patch is not to fix anything.  Its purpose is to make the crashes visible in our crash data.  Please see the release-drivers thread which discusses why we need this patch at length.

The "part 2" patch fixes at least one manifestation of this problem, but I'm not convinced that it's all that we need to do in this bug yet.

(no longer active)

Comment 51

•

14 years ago

Attachment 517979 [details] [diff] landed on mozilla-central as <http://hg.mozilla.org/mozilla-central/rev/cae1cd4aae73>.

(I initially landed it on the relbranch by mistake: <http://hg.mozilla.org/mozilla-central/rev/30a98bd29e53> and immediately backed it out: <http://hg.mozilla.org/mozilla-central/rev/864d3507de6e>)

Whiteboard: [needs to land on mozilla-central][fx4-rc-ridealong] → [fx4-rc-ridealong]

(no longer active)

Updated

•

14 years ago

Attachment #517979 - Attachment description: Abort in case of unsafe program state (for Gecko 2.0) → Abort in case of unsafe program state (for Gecko 2.0) [check-in: comment 51]

Jeff Muizelaar [:jrmuizel]

Comment 52

•

14 years ago

Comment on attachment 517958 [details] [diff] [review]
Part 2: Stop using named objects [checkin: comment 58]

This is certainly an improvement, but why do we use win32 primitives instead of nspr here?

Attachment #517958 - Flags: review?(jmuizelaar) → review+

Bas Schouten (:bas.schouten)

Assignee

Comment 53

•

14 years ago

(In reply to comment #52)
> Comment on attachment 517958 [details] [diff] [review]
> Part 2: Stop using named objects
> 
> This is certainly an improvement, but why do we use win32 primitives instead of
> nspr here?

This is platform specific code, roc told me it was fine to use platform primitives directly in these cases.

NSPR only offers us condition variables for use here, these (particularly our implementation), have a not-insignificant overhead. Implementing this using condition variables is also slightly less elegant. Semaphores are better suited for the Producer/Consumer problem and are, at least at this point, not available in an efficient mapping to Win32 APIs.

The main reason was to make the code as thin and light-weight as possible.

Daniel Veditz [:dveditz] (OOO: back Aug 26)

Comment 54

•

14 years ago

The memory corruption memmove/memcpy can cause is usually exploitable, but getting your victim to run two copies of Firefox at the same time is a pretty big hurdle. Guessing [sg:low] severity in practice.

Probably don't need to keep this hidden.

Group: core-security

Whiteboard: [fx4-rc-ridealong] → [sg:low][fx4-rc-ridealong]

Joe Drew (not getting mail)

Updated

•

14 years ago

Whiteboard: [sg:low][fx4-rc-ridealong] → [sg:low][fx4-rc-ridealong][approved-patches-landed]

Stuart Parmenter

Updated

•

14 years ago

Attachment #517958 - Flags: approval2.0+

Joe Drew (not getting mail)

Comment 56

•

14 years ago

Stuart has given us the go-ahead to land attachment 517958 [details] [diff] [review] on mozilla-central. Bas, can you do so? (Or anyone else - adding checkin-needed.)

Keywords: checkin-needed

Whiteboard: [sg:low][fx4-rc-ridealong][approved-patches-landed] → [sg:low][fx4-rc-ridealong]

(no longer active)

Comment 57

•

14 years ago

If there was ever another product using Gecko 2.0, users who ran those programs at the same time as Firefox could potentially run into the same issue as well!

(no longer active)

Comment 58

•

14 years ago

http://hg.mozilla.org/mozilla-central/rev/4173c1b88642

After landing I realized that I had committed this under my name.  Going through bash log shows why:

hg qref -U 'Bas Schouten <bas.schouten@live.nl>'

(Note the uppercase U).

Sorry about that.  And in the future, please add a From header to the patches that you attach so that this won't happen again.  Thanks!

Keywords: checkin-needed

(no longer active)

Updated

•

14 years ago

Attachment #517958 - Attachment description: Part 2: Stop using named objects → Part 2: Stop using named objects [checkin: comment 58]

Joel

Comment 59

•

14 years ago

Hi I Just had my first bug marked a dup of this one (sorry)

I can reproduce it is every time in 5 min or less if that helps.

Here's my dup https://bugzilla.mozilla.org/show_bug.cgi?id=641293

Let me know if there is anything a non-coding nOOb can do...

(no longer active)

Comment 60

•

14 years ago

This bug in its original form has been fixed.  Further patches should move to their own bugs.

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Daniel Veditz [:dveditz] (OOO: back Aug 26)

Comment 61

•

14 years ago

Ehsan: what are the risks in taking the two approved patches on the mozilla2.0 repo?

blocking2.0: .x+ → ?

Whiteboard: [sg:low][fx4-rc-ridealong] → [sg:moderate][fx4-rc-ridealong] requires two instances of Firefox

Benjamin Smedberg

Comment 62

•

14 years ago

I believe it is absolutely essential to take these on the branch if there are going to be any other apps such as tbird running at the same time. Otherwise, this is going to end up as a guaranteed crash.

(no longer active)

Comment 63

•

14 years ago

(In reply to comment #61)
> Ehsan: what are the risks in taking the two approved patches on the mozilla2.0
> repo?

The *only* patch needed to fix this bug is attachment 517958 [details] [diff] [review].  It is as safe as a patch could get.  We basically create unnamed event objects, which is how event objects should be created by default, so I think this patch is a very safe change to take on a stable branch, and as Benjamin points out in comment 62, we should definitely take it.

(no longer active)

Updated

•

14 years ago

Attachment #517877 - Attachment is obsolete: true

Attachment #517877 - Flags: review?(jmuizelaar)

(no longer active)

Comment 64

•

14 years ago

Comment on attachment 517958 [details] [diff] [review]
Part 2: Stop using named objects [checkin: comment 58]

Asking for approval to take this patch on mozilla-2.0.

Attachment #517958 - Flags: approval2.0+ → approval2.0?

Daniel Veditz [:dveditz] (OOO: back Aug 26)

Updated

•

14 years ago

blocking2.0: ? → Macaw

status2.0: --- → wanted

Daniel Veditz [:dveditz] (OOO: back Aug 26)

Comment 65

•

14 years ago

Comment on attachment 517958 [details] [diff] [review]
Part 2: Stop using named objects [checkin: comment 58]

Approved for the mozilla2.0 repository, a=dveditz for release-drivers

Attachment #517958 - Flags: approval2.0? → approval2.0+

(no longer active)

Comment 66

•

14 years ago

http://hg.mozilla.org/releases/mozilla-2.0/rev/d4b16b847d92

status2.0: wanted → .1-fixed

Joel

Comment 67

•

14 years ago

Not fixed for me. went away for a week or so. now it's back. Forcing me back to Chrome, opera and safari.

Could someone please change the status or un-dup my bug.

(no longer active)

Comment 68

•

14 years ago

(In reply to comment #67)
> Not fixed for me. went away for a week or so. now it's back. Forcing me back to
> Chrome, opera and safari.

Ca you please go to about:crashes, and paste the contents of that page here?

Ryan Pertusio

Comment 69

•

14 years ago

Verifying that crash no longer occurs for the following sites: Hulu, Pogo, Blip (which were 3 easy sites to test the crash).

CRASHES  with 4.0pre (2011-04-01) (20110401030210)or earlier
NO CRASH with 4.0pre (2011-04-02) (20110402030209)

Good job!

Joel

Comment 70

•

14 years ago

No new reports in About:crashes. Ie, it's not leaving a crash report. The only crashes there have the wrong time/date.... It's in the but I lodged that was marked as a dupe (I'm not clever enough to confirm the dupe, Everything on this page goes straight over my head.)

It was fixed in RC2 yet since final release (and I don't think I updated?) it's crashing again.

When it crashes, its really fast, the window just closes (sometime as I type) with no warning and no crash report.

I'm mostly on Facebook and Mafia Wars (with both windows/profiles) when it happens. But it can happen with the Firefox start page as the only tab in two profiles. I don't think it's site related....

about:crashes copy/paste:
Report ID
Date Submitted

bp-84f2e4a9-0e5b-47f9-8faf-9b23021104012/04/20118:41 AM10340c57-8d37-4baf-bb76-ea5d44fac63331/03/20118:16 AM96ca20bc-26d8-4091-9fc5-464d2e65cd555/03/20115:19 PM22e9683e-15df-4426-9f5d-e2cfac2a13a327/01/20114:42 PMc5156629-7bbf-48f8-992b-3b69721ac0392/01/201112:02 PMa2e23d5f-7566-4be2-8381-9ae5538932192/01/20119:28 AMbp-b4cc27c8-d36e-4875-b649-7a0ae210122930/12/201011:22 AMbp-1cfb0957-1991-4e2c-92e1-be444210122727/12/201010:03 PMbp-ad475ca0-4a9d-4237-9500-794d5210122727/12/20109:58 PM9a812b12-70d5-458a-be1c-04b3032884c626/12/20101:57 PM77126794-b2ec-4d92-b538-2724394144ac26/12/201012:19 PMbp-2e0fe8e4-f330-43c7-b01c-fb422210122324/12/201012:34 PMbp-180b52eb-c175-48aa-b4de-c8f6c210122323/12/20108:00 PM

as you can see these crashes go back months, yet I have encountered multiple profile crashes at least daily for the last few weeks (except for a few days between RC2 and final FF4, when it worked fine).... The last one was on 31/3/11 just after 9pm, it was the third in under an hour (I went back to chrome for second profile after that). There should be strings of crash reports 5-10 min apart......

Hope this helps.

Fix potential access-after-free 14 years ago Bas Schouten (:bas.schouten) 6.81 KB, patch		Details \| Diff \| Splinter Review
Fix potential access-after-free v2 14 years ago Bas Schouten (:bas.schouten) 6.64 KB, patch	cjones : review+	Details \| Diff \| Splinter Review
Abort in case of unsafe program state (for Gecko 2.0) 14 years ago (no longer active) 1.96 KB, patch	jrmuizel : review+ cjones : review+ bas.schouten : review+	Details \| Diff \| Splinter Review
Part 2: Stop using named objects [checkin: comment 58] 14 years ago Bas Schouten (:bas.schouten) 1.11 KB, patch	jrmuizel : review+ dveditz : approval2.0+	Details \| Diff \| Splinter Review
Abort in case of unsafe program state (for Gecko 2.0) [check-in: comment 51] 14 years ago (no longer active) 1.99 KB, patch	beltzner : review+ beltzner : approval2.0+	Details \| Diff \| Splinter Review