reftests crash during test run in CompareCanvas

RESOLVED FIXED

Status

()

Core
Canvas: 2D
RESOLVED FIXED
8 years ago
8 years ago

People

(Reporter: jmaher, Assigned: Brian Crowder)

Tracking

Trunk
ARM
Windows Mobile 6 Professional
Points:
---

Firefox Tracking Flags

(status1.9.2 beta5-fixed)

Details

Attachments

(1 attachment, 13 obsolete attachments)

7.66 KB, patch
Brian Crowder
: review+
Brian Crowder
: superreview+
beltzner
: approval1.9.2+
Details | Diff | Splinter Review
(Reporter)

Description

8 years ago
While trying to run reftests on windows mobile builds for fennec, I get a crash in xul.dll which points me to this source:
 http://mxr.mozilla.org/mozilla-central/source/dom/base/nsDOMWindowUtils.cpp#639

This is while running the layout/reftets/image/reftest.list batch of reftests (3 total, first test is problematic).

Running reftests have worked in the past (3 weeks ago) on windows mobile, so something has regressed.  I am running these without the python script, but instead by launching them directory with cli arguments to fennec.exe.
What are the values/variables there?
(Reporter)

Comment 2

8 years ago
I put a debug build on my device and this is what I see:
v = memcmp(img1->Data(), img2->Data(), size.width * size.height * 4);

img1->mRawPtr->mData = null
img2->mRawPtr->mData = null

Both img1 and img2 have identical mSize attributes(800x1000).

There is no call stack other than:
0x03f9ded8
xul.dll!nsDOMWindowUtils::CompareCanvases(aCanvas1 = 0x00000000, aCanvas2 = 0x0030d3fc, aMaxDifference = 0x00000000, retVal = 0x35bbeab8)
asmXPTC_InvokeByIndex
(Reporter)

Comment 3

8 years ago
I am still seeing this on a build from 20090910, from my debugger:

>	xul.dll!nsDOMWindowUtils::CompareCanvases(nsIDOMHTMLCanvasElement* aCanvas1 = 0x01bd6a3c, nsIDOMHTMLCanvasElement* aCanvas2 = 0x01de7a4c, unsigned int* aMaxDifference = 0x2dbbbb38, unsigned int* retVal = 0x2dbbbb48) Line: 639, Byte Offsets: 0x284	C++

Comment 4

8 years ago
joel, can you reproduce this one test case without running any other ref test and without browsing any other pages?  The reason I ask, is that I am wondering if this is a Out-of-memory (OOM) probably caused by stuff that leads up to the test.
(Reporter)

Comment 5

8 years ago
this is actually reproduced by doing this:

\tests\fennec\fennec.exe --environ:NO_EM_RESTART=1 -no-remote -profile \tests\reftest\temp1\ -reftest \tests\reftest\tests\layout\reftests\image\reftest.list

I can put whatever path or reftest.list in here and it seems to crash.  This is before a reftest even loads, and on the device the titlebar says reftest: 0/3 (0%), but the main display area has never drawn a fennec window, it is the same background as before running the test.
(Assignee)

Updated

8 years ago
Assignee: nobody → crowder
(Assignee)

Comment 6

8 years ago
Created attachment 406579 [details] [diff] [review]
helps

This was working for a few moments today, I was able to run reftests, but was crashing on shutdown.  Something I did broke that, though, and now I am getting a crash in pixman_walk_composite_region() where it is calling *compositeRect with what seems to be a bogus function ptr.  More later...
(Assignee)

Comment 7

8 years ago
Oh, I think I have an idea what's wrong here.  Something is still trying to use the cached data, following the resize.
(Assignee)

Comment 8

8 years ago
Created attachment 406697 [details] [diff] [review]
better

This one doesn't crash until shutdown, in the cycle-collector somewhere.  Still diagnosing.
Attachment #406579 - Attachment is obsolete: true
(Assignee)

Comment 9

8 years ago
Incidentally, I was wrong in comment #7.  The issue in comment #6 was that we were building a pixman image for the DDraw surface which was sized at the source image size, not the size of the DDraw surface.
(Assignee)

Comment 10

8 years ago
Created attachment 406821 [details] [diff] [review]
This one survives startup and shutdown

I'll expound on the jemalloc mayhem tomorrow, as well as do the obvious cleanup this patch needs.
Attachment #406697 - Attachment is obsolete: true
Comment on attachment 406821 [details] [diff] [review]
This one survives startup and shutdown

> #if defined(MOZ_MEMORY_WINCE) && !defined(MOZ_MEMORY_WINCE6)
>-#define	CHUNK_2POW_DEFAULT	21
>+#define	CHUNK_2POW_DEFAULT	20
> #else
> #define	CHUNK_2POW_DEFAULT	20
> #endif

For windows mobile we need to reserve 2Mb (or more) at a time in order to not be limited to the 32Mb of per-process address space.  If we take this change knocking it down to 1Mb we'll end up with the "checkerboard of death" all over again.
(Assignee)

Comment 12

8 years ago
I don't see why/how VirtualAlloc()ing larger chunks somehow allows us to exceed the 32MB limit.
(Assignee)

Comment 13

8 years ago
Created attachment 406868 [details] [diff] [review]
the correct incantation for VirtualFree()

After consulting the bones, ritually sacrificing two goats, a small and very cute bunny, and one chicken, I have discovered this dark virtual-memory magic.  The mystical number 21 remains ascendant, though I still don't understand how its invocation improves our sorcery.
Attachment #406821 - Attachment is obsolete: true
(Assignee)

Comment 14

8 years ago
Comment on attachment 406868 [details] [diff] [review]
the correct incantation for VirtualFree()

With this patch, I can run the image reftests in debug and release with no crashes.
Attachment #406868 - Flags: superreview?(pavlov)
Attachment #406868 - Flags: review?(bugmail)
Comment on attachment 406868 [details] [diff] [review]
the correct incantation for VirtualFree()

>Bug 506926 - unable to run reftests on Windows Mobile
>
>diff --git a/memory/jemalloc/jemalloc.c b/memory/jemalloc/jemalloc.c
>--- a/memory/jemalloc/jemalloc.c
>+++ b/memory/jemalloc/jemalloc.c
>@@ -2215,7 +2215,12 @@ pages_unmap(void *addr, size_t size)
> 		if (GetLastError() == ERROR_INVALID_PARAMETER) {
> 			MEMORY_BASIC_INFORMATION info;
> 			VirtualQuery(addr, &info, sizeof(info));
>-			if (VirtualFree(info.AllocationBase, 0, MEM_RELEASE))
>+
>+      // Had trouble!  Decommit unilaterally, then release.
>+      if (VirtualFree(info.AllocationBase, 
>+                      size + ((char *)info.BaseAddress -
>+                      (char *)info.AllocationBase), MEM_DECOMMIT) &&
>+          VirtualFree(info.AllocationBase, 0, MEM_RELEASE))
> 				return;
> 		}

two nits, white space.  Align the if with the VirtualQuery. Also I think changing "Decommit unilaterally" to  "Force decommit." would be more clear.

Also, I'd like to note that according to MSDN's documentation, decommitting before releasing is redundant since VirtualFree called with MEM_RELEASE is supposed to decommit the pages then release them.

Finally, since it wasn't clear to you why we use 2Mb chunks, we should probably have a comment about that. The relevant documentation is here: http://msdn.microsoft.com/en-us/library/aa908768.aspx Specifically this section from the remarks
> Before Windows Embedded CE 6.0 , if you call VirtualAlloc with dwSize >= 2 MB, flAllocationType  set to MEM_RESERVE, and flProtect set to PAGE_NOACCESS, it automatically reserves memory at the shared memory region. This preserves per-process virtual memory.

To translate a bit, prior to WinCe 6.0 (Windows Mobile uses WinCe 5.x), all calls to VirtualAlloc with a size <2Mb reserve memory in the process's private address space. That space is only 32Mb, so we're limited to a total of 32Mb of allocations with a chunk size less than 2Mb.

I want to run with this patch a  bit before reviewing the widget part of it.
(Assignee)

Comment 16

8 years ago
(In reply to comment #15)
> two nits, white space.  Align the if with the VirtualQuery. Also I think
> changing "Decommit unilaterally" to  "Force decommit." would be more clear.

Mind showing me how you want the if () clause with both virtual queries aligned?

> Also, I'd like to note that according to MSDN's documentation, decommitting
> before releasing is redundant since VirtualFree called with MEM_RELEASE is
> supposed to decommit the pages then release them.

The problem is (I think) that because of the way we've aligned our memory with two allocations (one of which leaves a large chunk of pages set w/ protected-memory flags), one deallocation isn't enough to release and free all of it.  I tried a number of different techniques for releasing memory both inside and outside of jemalloc (in my own app, as you suggested) and this is the only one that seems to be consistently effective against our weird alloc->alloc-aligned code.
 

> Finally, since it wasn't clear to you why we use 2Mb chunks, we should
> probably have a comment about that.

Agreed.


> I want to run with this patch a  bit before reviewing the widget part of it.

Fair enough.
(In reply to comment #16)

> Mind showing me how you want the if () clause with both virtual queries
> aligned?
this is what I had in mind:

	if (VirtualFree(addr, 0, MEM_RELEASE) == 0) {
#if defined(MOZ_MEMORY_WINCE) && !defined(MOZ_MEMORY_WINCE6)
		if (GetLastError() == ERROR_INVALID_PARAMETER) {
			MEMORY_BASIC_INFORMATION info;
			VirtualQuery(addr, &info, sizeof(info));

			// Had trouble!  Force decommit, then release.
			if (VirtualFree(info.AllocationBase, 
					size + ((char *)info.BaseAddress -
						(char *)info.AllocationBase), 
					MEM_DECOMMIT) &&
			    VirtualFree(info.AllocationBase, 0, MEM_RELEASE))
				return;
		}
#endif


> 
> > Also, I'd like to note that according to MSDN's documentation, decommitting
> > before releasing is redundant since VirtualFree called with MEM_RELEASE is
> > supposed to decommit the pages then release them.
> 
> The problem is (I think) that because of the way we've aligned our memory with
> two allocations (one of which leaves a large chunk of pages set w/
> protected-memory flags), one deallocation isn't enough to release and free all
> of it.  I tried a number of different techniques for releasing memory both
> inside and outside of jemalloc (in my own app, as you suggested) and this is
> the only one that seems to be consistently effective against our weird
> alloc->alloc-aligned code.
> 

It wouldn't be the first time that the msdn documentation wasn't spot on, so I trust you.  Decommitting before releasing shouldn't break anything even if the APIs behaved exactly as described, so I think this change is pretty safe. Also, the expectation is that the unaligned allocation is the odd case since once we get an aligned allocation (2Mb + offset), the system allocator should be aligned to our liking, so it shouldn't really hurt us in terms of performance either.
(Assignee)

Comment 18

8 years ago
Created attachment 407346 [details] [diff] [review]
cleanup

I cleaned up the EnsureSize routine as we discussed on IRC and am offering what I hope is a cleaner stab at the whitespacing.
Attachment #406868 - Attachment is obsolete: true
Attachment #407346 - Flags: superreview?
Attachment #407346 - Flags: review?(bugmail)
Attachment #406868 - Flags: superreview?(pavlov)
Attachment #406868 - Flags: review?(bugmail)
(Assignee)

Updated

8 years ago
Attachment #407346 - Flags: superreview? → superreview?(pavlov)
(Assignee)

Comment 19

8 years ago
Created attachment 407358 [details] [diff] [review]
whitespace fixing for the rest of this function

Fixing tabs in the rest of this function
Attachment #407346 - Attachment is obsolete: true
Attachment #407358 - Flags: superreview?(pavlov)
Attachment #407358 - Flags: review?(bugmail)
Attachment #407346 - Flags: superreview?(pavlov)
Attachment #407346 - Flags: review?(bugmail)
(Assignee)

Comment 20

8 years ago
Comment on attachment 407358 [details] [diff] [review]
whitespace fixing for the rest of this function

Argh, missed some.
Attachment #407358 - Attachment is obsolete: true
Attachment #407358 - Flags: superreview?(pavlov)
Attachment #407358 - Flags: review?(bugmail)
(Assignee)

Comment 21

8 years ago
Created attachment 407359 [details] [diff] [review]
one more swing
Attachment #407359 - Flags: superreview?(pavlov)
Attachment #407359 - Flags: review?(bugmail)
(Assignee)

Comment 22

8 years ago
Created attachment 407363 [details] [diff] [review]
mayhem!
Attachment #407359 - Attachment is obsolete: true
Attachment #407363 - Flags: superreview?(pavlov)
Attachment #407363 - Flags: review?(bugmail)
Attachment #407359 - Flags: superreview?(pavlov)
Attachment #407359 - Flags: review?(bugmail)
(Assignee)

Comment 23

8 years ago
Created attachment 407364 [details] [diff] [review]
tabs are the bane of my existence
Attachment #407363 - Attachment is obsolete: true
Attachment #407364 - Flags: superreview?(pavlov)
Attachment #407364 - Flags: review?(bugmail)
Attachment #407363 - Flags: superreview?(pavlov)
Attachment #407363 - Flags: review?(bugmail)
(Assignee)

Comment 24

8 years ago
Created attachment 407365 [details] [diff] [review]
With the 1 << 21 chunksize comment
Attachment #407364 - Attachment is obsolete: true
Attachment #407365 - Flags: superreview?(pavlov)
Attachment #407365 - Flags: review?(bugmail)
Attachment #407364 - Flags: superreview?(pavlov)
Attachment #407364 - Flags: review?(bugmail)
Attachment #407365 - Flags: review?(bugmail) → review+
Comment on attachment 407365 [details] [diff] [review]
With the 1 << 21 chunksize comment

this patch doesn't compile
Attachment #407365 - Flags: review+ → review-
(Assignee)

Comment 26

8 years ago
Created attachment 407767 [details] [diff] [review]
compiles
Attachment #407365 - Attachment is obsolete: true
Attachment #407365 - Flags: superreview?(pavlov)
(Assignee)

Comment 27

8 years ago
yeah, I'd fixed the compilation bug locally, but hadn't uploaded yet.  This is better (bugzilla interdiff works)
(Assignee)

Updated

8 years ago
Attachment #407767 - Flags: superreview?(pavlov)
Attachment #407767 - Flags: review?(bugmail)
(Assignee)

Comment 28

8 years ago
Created attachment 407885 [details] [diff] [review]
tabs restored!
Attachment #407767 - Attachment is obsolete: true
Attachment #407767 - Flags: superreview?(pavlov)
Attachment #407767 - Flags: review?(bugmail)
(Assignee)

Updated

8 years ago
Attachment #407885 - Flags: review?(bugmail)
I can run the reftests without the changes to jemalloc in this patch. Perhaps I'm missing something, but if you can as well I'd prefer not to change what doesn't need to be changed.
Attachment #407885 - Flags: review?(vladimir)
Comment on attachment 407885 [details] [diff] [review]
tabs restored!

r=me without the changes to jemalloc
Attachment #407885 - Flags: review?(bugmail) → review+
Comment on attachment 407885 [details] [diff] [review]
tabs restored!

This looks mostly fine, just a few questions; no comment on the jemalloc stuff, as that's all black magic :-)

>diff --git a/widget/src/windows/nsWindow.h b/widget/src/windows/nsWindow.h
>--- a/widget/src/windows/nsWindow.h
>+++ b/widget/src/windows/nsWindow.h
>@@ -470,9 +470,6 @@ protected:
>   // Graphics
>   HDC                   mPaintDC; // only set during painting
> 
>-  static nsAutoPtr<PRUint8> sSharedSurfaceData;
>-  static gfxIntSize     sSharedSurfaceSize;
>-

Any reason to pull these out of here?  They were in here to avoid polluting the global space, EnsureSharedSurface() could become static in here to give it the right access.  (Or, the globals should just be static.)

>--- a/widget/src/windows/nsWindowGfx.cpp
>+++ b/widget/src/windows/nsWindowGfx.cpp

>+#define WORDSSIZE(x) ((x).width * (x).height)

This scares me a little; are we sure that width*height will never overflow?  (That is, it won't ever come from a user, right?)

>+PRBool
>+EnsureSharedSurfaceSize(gfxIntSize size)
>+{
>+  gfxIntSize screenSize;
>+  screenSize.height = GetSystemMetrics(SM_CYSCREEN);
>+  screenSize.width = GetSystemMetrics(SM_CXSCREEN);
>+
>+  if (WORDSSIZE(screenSize) > WORDSSIZE(size))
>+    size = screenSize;
>+
>+  if (!sSharedSurfaceData || (WORDSSIZE(size) > WORDSSIZE(sSharedSurfaceSize))) {
>+    sSharedSurfaceSize = size;
>+    sSharedSurfaceData = nsnull;
>+    sSharedSurfaceData = (PRUint8 *)malloc(WORDSSIZE(sSharedSurfaceSize) * 4);

No need to assign nsnull first.

Also, for surfaces that are bigger than the screen (which shouldn't happen), this means that we'll keep around a surface of that large size -- that seems bad.  Are we hitting that case?  I'd be more comfortable if there was at least a debug build warning in here that said if size > screensize to warn that that shouldn't happen, and we should figure out why it's happening.  (I don't think it ever makes sense to paint an area larger than the screen in this path.)
Attachment #407885 - Flags: review+ → review?
(Assignee)

Comment 32

8 years ago
I'll make the Ensure function and the two globals all static.  Thanks for catching that.

Surface width and height should never overflow, the data always comes from the engine, not from a user (ie., we don't use these for canvas surfaces in content, afaik -- if that's wrong, please show me how).

Actually, I -do- want to assign nsnull first, otherwise (please check my order-of-evaluation and logic here), the data won't be released until after the malloc (that is, the assignment will happen after malloc(), and the original memory won't be released until after the new bits are allocated), which could yield greater fragmentation than otherwise, or even (in limited memory situations) a failure to allocate memory where we might otherwise have been able to, had we released the surface earlier.  Since these surfaces are very large allocations (the largest we ever make in my debugging), I think the extra work/code is worth this potential improvement.

I believe we only get surfaces larger than the screen while running reftests...  does this code need to be smarter about releasing large surfaces where possible?

I found myself wishing that the gfx code actually held a reference to these surfaces, would that be a better approach (for a follow-up bug)?
(Assignee)

Comment 33

8 years ago
As discussed on IRC, I'll gut the jemalloc changes before landing.  They can come in a follow-up bug, if I'm able to reproduce the shutdown crash I was seeing before.  As it is, I cannot currently reproduce because of a separate nanojit bug while running my debug build.
(Assignee)

Comment 34

8 years ago
Created attachment 408901 [details] [diff] [review]
no jemalloc changes at all

Carrying forward blassey's review, and this also includes the static changes discussed previously.
Attachment #407885 - Attachment is obsolete: true
Attachment #408901 - Flags: superreview?
Attachment #408901 - Flags: review+
Attachment #407885 - Flags: review?(vladimir)
Attachment #407885 - Flags: review?
(Assignee)

Updated

8 years ago
Attachment #408901 - Flags: superreview? → superreview?(vladimir)
Attachment #408901 - Flags: superreview?(vladimir) → superreview+
Comment on attachment 408901 [details] [diff] [review]
no jemalloc changes at all

Looks fine, but please add the NS_WARNING if the requested size is bigger than the screen size.
(Assignee)

Updated

8 years ago
Blocks: 525058
(Assignee)

Updated

8 years ago
No longer blocks: 525058
(Assignee)

Comment 36

8 years ago
Created attachment 408920 [details] [diff] [review]
the thing to land

Carrying reviews over.
Attachment #408901 - Attachment is obsolete: true
Attachment #408920 - Flags: superreview+
Attachment #408920 - Flags: review+
pushed http://hg.mozilla.org/mozilla-central/rev/1225e71411ad
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
(Assignee)

Comment 38

8 years ago
Does this need to land on 1.9.2 as well?

Comment 39

8 years ago
(In reply to comment #38)
> Does this need to land on 1.9.2 as well?
Yes, I think so.  We're running all the tests for winmo and wince from 1.9.2 since that is the branch they will be released from.
(Assignee)

Updated

8 years ago
Attachment #408920 - Flags: approval1.9.2?
Comment on attachment 408920 [details] [diff] [review]
the thing to land

a192=beltzner
Attachment #408920 - Flags: approval1.9.2? → approval1.9.2+
(Assignee)

Comment 41

8 years ago
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/ccf05dcea711
status1.9.2: --- → final-fixed
You need to log in before you can comment on or make changes to this bug.