Closed
Bug 506926
Opened 15 years ago
Closed 15 years ago
reftests crash during test run in CompareCanvas
Categories
(Core :: Graphics: Canvas2D, defect)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
status1.9.2 | --- | beta5-fixed |
People
(Reporter: jmaher, Assigned: crowderbt)
References
Details
Attachments
(1 file, 13 obsolete files)
7.66 KB,
patch
|
crowderbt
:
review+
crowderbt
:
superreview+
beltzner
:
approval1.9.2+
|
Details | Diff | Splinter Review |
While trying to run reftests on windows mobile builds for fennec, I get a crash in xul.dll which points me to this source: http://mxr.mozilla.org/mozilla-central/source/dom/base/nsDOMWindowUtils.cpp#639 This is while running the layout/reftets/image/reftest.list batch of reftests (3 total, first test is problematic). Running reftests have worked in the past (3 weeks ago) on windows mobile, so something has regressed. I am running these without the python script, but instead by launching them directory with cli arguments to fennec.exe.
What are the values/variables there?
Reporter | ||
Comment 2•15 years ago
|
||
I put a debug build on my device and this is what I see: v = memcmp(img1->Data(), img2->Data(), size.width * size.height * 4); img1->mRawPtr->mData = null img2->mRawPtr->mData = null Both img1 and img2 have identical mSize attributes(800x1000). There is no call stack other than: 0x03f9ded8 xul.dll!nsDOMWindowUtils::CompareCanvases(aCanvas1 = 0x00000000, aCanvas2 = 0x0030d3fc, aMaxDifference = 0x00000000, retVal = 0x35bbeab8) asmXPTC_InvokeByIndex
Reporter | ||
Comment 3•15 years ago
|
||
I am still seeing this on a build from 20090910, from my debugger:
> xul.dll!nsDOMWindowUtils::CompareCanvases(nsIDOMHTMLCanvasElement* aCanvas1 = 0x01bd6a3c, nsIDOMHTMLCanvasElement* aCanvas2 = 0x01de7a4c, unsigned int* aMaxDifference = 0x2dbbbb38, unsigned int* retVal = 0x2dbbbb48) Line: 639, Byte Offsets: 0x284 C++
Comment 4•15 years ago
|
||
joel, can you reproduce this one test case without running any other ref test and without browsing any other pages? The reason I ask, is that I am wondering if this is a Out-of-memory (OOM) probably caused by stuff that leads up to the test.
Reporter | ||
Comment 5•15 years ago
|
||
this is actually reproduced by doing this: \tests\fennec\fennec.exe --environ:NO_EM_RESTART=1 -no-remote -profile \tests\reftest\temp1\ -reftest \tests\reftest\tests\layout\reftests\image\reftest.list I can put whatever path or reftest.list in here and it seems to crash. This is before a reftest even loads, and on the device the titlebar says reftest: 0/3 (0%), but the main display area has never drawn a fennec window, it is the same background as before running the test.
Assignee | ||
Updated•15 years ago
|
Assignee: nobody → crowder
Assignee | ||
Comment 6•15 years ago
|
||
This was working for a few moments today, I was able to run reftests, but was crashing on shutdown. Something I did broke that, though, and now I am getting a crash in pixman_walk_composite_region() where it is calling *compositeRect with what seems to be a bogus function ptr. More later...
Assignee | ||
Comment 7•15 years ago
|
||
Oh, I think I have an idea what's wrong here. Something is still trying to use the cached data, following the resize.
Assignee | ||
Comment 8•15 years ago
|
||
This one doesn't crash until shutdown, in the cycle-collector somewhere. Still diagnosing.
Attachment #406579 -
Attachment is obsolete: true
Assignee | ||
Comment 9•15 years ago
|
||
Incidentally, I was wrong in comment #7. The issue in comment #6 was that we were building a pixman image for the DDraw surface which was sized at the source image size, not the size of the DDraw surface.
Assignee | ||
Comment 10•15 years ago
|
||
I'll expound on the jemalloc mayhem tomorrow, as well as do the obvious cleanup this patch needs.
Attachment #406697 -
Attachment is obsolete: true
Comment 11•15 years ago
|
||
Comment on attachment 406821 [details] [diff] [review] This one survives startup and shutdown > #if defined(MOZ_MEMORY_WINCE) && !defined(MOZ_MEMORY_WINCE6) >-#define CHUNK_2POW_DEFAULT 21 >+#define CHUNK_2POW_DEFAULT 20 > #else > #define CHUNK_2POW_DEFAULT 20 > #endif For windows mobile we need to reserve 2Mb (or more) at a time in order to not be limited to the 32Mb of per-process address space. If we take this change knocking it down to 1Mb we'll end up with the "checkerboard of death" all over again.
Assignee | ||
Comment 12•15 years ago
|
||
I don't see why/how VirtualAlloc()ing larger chunks somehow allows us to exceed the 32MB limit.
Assignee | ||
Comment 13•15 years ago
|
||
After consulting the bones, ritually sacrificing two goats, a small and very cute bunny, and one chicken, I have discovered this dark virtual-memory magic. The mystical number 21 remains ascendant, though I still don't understand how its invocation improves our sorcery.
Attachment #406821 -
Attachment is obsolete: true
Assignee | ||
Comment 14•15 years ago
|
||
Comment on attachment 406868 [details] [diff] [review] the correct incantation for VirtualFree() With this patch, I can run the image reftests in debug and release with no crashes.
Attachment #406868 -
Flags: superreview?(pavlov)
Attachment #406868 -
Flags: review?(bugmail)
Comment 15•15 years ago
|
||
Comment on attachment 406868 [details] [diff] [review] the correct incantation for VirtualFree() >Bug 506926 - unable to run reftests on Windows Mobile > >diff --git a/memory/jemalloc/jemalloc.c b/memory/jemalloc/jemalloc.c >--- a/memory/jemalloc/jemalloc.c >+++ b/memory/jemalloc/jemalloc.c >@@ -2215,7 +2215,12 @@ pages_unmap(void *addr, size_t size) > if (GetLastError() == ERROR_INVALID_PARAMETER) { > MEMORY_BASIC_INFORMATION info; > VirtualQuery(addr, &info, sizeof(info)); >- if (VirtualFree(info.AllocationBase, 0, MEM_RELEASE)) >+ >+ // Had trouble! Decommit unilaterally, then release. >+ if (VirtualFree(info.AllocationBase, >+ size + ((char *)info.BaseAddress - >+ (char *)info.AllocationBase), MEM_DECOMMIT) && >+ VirtualFree(info.AllocationBase, 0, MEM_RELEASE)) > return; > } two nits, white space. Align the if with the VirtualQuery. Also I think changing "Decommit unilaterally" to "Force decommit." would be more clear. Also, I'd like to note that according to MSDN's documentation, decommitting before releasing is redundant since VirtualFree called with MEM_RELEASE is supposed to decommit the pages then release them. Finally, since it wasn't clear to you why we use 2Mb chunks, we should probably have a comment about that. The relevant documentation is here: http://msdn.microsoft.com/en-us/library/aa908768.aspx Specifically this section from the remarks > Before Windows Embedded CE 6.0 , if you call VirtualAlloc with dwSize >= 2 MB, flAllocationType set to MEM_RESERVE, and flProtect set to PAGE_NOACCESS, it automatically reserves memory at the shared memory region. This preserves per-process virtual memory. To translate a bit, prior to WinCe 6.0 (Windows Mobile uses WinCe 5.x), all calls to VirtualAlloc with a size <2Mb reserve memory in the process's private address space. That space is only 32Mb, so we're limited to a total of 32Mb of allocations with a chunk size less than 2Mb. I want to run with this patch a bit before reviewing the widget part of it.
Assignee | ||
Comment 16•15 years ago
|
||
(In reply to comment #15) > two nits, white space. Align the if with the VirtualQuery. Also I think > changing "Decommit unilaterally" to "Force decommit." would be more clear. Mind showing me how you want the if () clause with both virtual queries aligned? > Also, I'd like to note that according to MSDN's documentation, decommitting > before releasing is redundant since VirtualFree called with MEM_RELEASE is > supposed to decommit the pages then release them. The problem is (I think) that because of the way we've aligned our memory with two allocations (one of which leaves a large chunk of pages set w/ protected-memory flags), one deallocation isn't enough to release and free all of it. I tried a number of different techniques for releasing memory both inside and outside of jemalloc (in my own app, as you suggested) and this is the only one that seems to be consistently effective against our weird alloc->alloc-aligned code. > Finally, since it wasn't clear to you why we use 2Mb chunks, we should > probably have a comment about that. Agreed. > I want to run with this patch a bit before reviewing the widget part of it. Fair enough.
Comment 17•15 years ago
|
||
(In reply to comment #16) > Mind showing me how you want the if () clause with both virtual queries > aligned? this is what I had in mind: if (VirtualFree(addr, 0, MEM_RELEASE) == 0) { #if defined(MOZ_MEMORY_WINCE) && !defined(MOZ_MEMORY_WINCE6) if (GetLastError() == ERROR_INVALID_PARAMETER) { MEMORY_BASIC_INFORMATION info; VirtualQuery(addr, &info, sizeof(info)); // Had trouble! Force decommit, then release. if (VirtualFree(info.AllocationBase, size + ((char *)info.BaseAddress - (char *)info.AllocationBase), MEM_DECOMMIT) && VirtualFree(info.AllocationBase, 0, MEM_RELEASE)) return; } #endif > > > Also, I'd like to note that according to MSDN's documentation, decommitting > > before releasing is redundant since VirtualFree called with MEM_RELEASE is > > supposed to decommit the pages then release them. > > The problem is (I think) that because of the way we've aligned our memory with > two allocations (one of which leaves a large chunk of pages set w/ > protected-memory flags), one deallocation isn't enough to release and free all > of it. I tried a number of different techniques for releasing memory both > inside and outside of jemalloc (in my own app, as you suggested) and this is > the only one that seems to be consistently effective against our weird > alloc->alloc-aligned code. > It wouldn't be the first time that the msdn documentation wasn't spot on, so I trust you. Decommitting before releasing shouldn't break anything even if the APIs behaved exactly as described, so I think this change is pretty safe. Also, the expectation is that the unaligned allocation is the odd case since once we get an aligned allocation (2Mb + offset), the system allocator should be aligned to our liking, so it shouldn't really hurt us in terms of performance either.
Assignee | ||
Comment 18•15 years ago
|
||
I cleaned up the EnsureSize routine as we discussed on IRC and am offering what I hope is a cleaner stab at the whitespacing.
Attachment #406868 -
Attachment is obsolete: true
Attachment #407346 -
Flags: superreview?
Attachment #407346 -
Flags: review?(bugmail)
Attachment #406868 -
Flags: superreview?(pavlov)
Attachment #406868 -
Flags: review?(bugmail)
Assignee | ||
Updated•15 years ago
|
Attachment #407346 -
Flags: superreview? → superreview?(pavlov)
Assignee | ||
Comment 19•15 years ago
|
||
Fixing tabs in the rest of this function
Attachment #407346 -
Attachment is obsolete: true
Attachment #407358 -
Flags: superreview?(pavlov)
Attachment #407358 -
Flags: review?(bugmail)
Attachment #407346 -
Flags: superreview?(pavlov)
Attachment #407346 -
Flags: review?(bugmail)
Assignee | ||
Comment 20•15 years ago
|
||
Comment on attachment 407358 [details] [diff] [review] whitespace fixing for the rest of this function Argh, missed some.
Attachment #407358 -
Attachment is obsolete: true
Attachment #407358 -
Flags: superreview?(pavlov)
Attachment #407358 -
Flags: review?(bugmail)
Assignee | ||
Comment 21•15 years ago
|
||
Attachment #407359 -
Flags: superreview?(pavlov)
Attachment #407359 -
Flags: review?(bugmail)
Assignee | ||
Comment 22•15 years ago
|
||
Attachment #407359 -
Attachment is obsolete: true
Attachment #407363 -
Flags: superreview?(pavlov)
Attachment #407363 -
Flags: review?(bugmail)
Attachment #407359 -
Flags: superreview?(pavlov)
Attachment #407359 -
Flags: review?(bugmail)
Assignee | ||
Comment 23•15 years ago
|
||
Attachment #407363 -
Attachment is obsolete: true
Attachment #407364 -
Flags: superreview?(pavlov)
Attachment #407364 -
Flags: review?(bugmail)
Attachment #407363 -
Flags: superreview?(pavlov)
Attachment #407363 -
Flags: review?(bugmail)
Assignee | ||
Comment 24•15 years ago
|
||
Attachment #407364 -
Attachment is obsolete: true
Attachment #407365 -
Flags: superreview?(pavlov)
Attachment #407365 -
Flags: review?(bugmail)
Attachment #407364 -
Flags: superreview?(pavlov)
Attachment #407364 -
Flags: review?(bugmail)
Updated•15 years ago
|
Attachment #407365 -
Flags: review?(bugmail) → review+
Comment 25•15 years ago
|
||
Comment on attachment 407365 [details] [diff] [review] With the 1 << 21 chunksize comment this patch doesn't compile
Attachment #407365 -
Flags: review+ → review-
Assignee | ||
Comment 26•15 years ago
|
||
Attachment #407365 -
Attachment is obsolete: true
Attachment #407365 -
Flags: superreview?(pavlov)
Assignee | ||
Comment 27•15 years ago
|
||
yeah, I'd fixed the compilation bug locally, but hadn't uploaded yet. This is better (bugzilla interdiff works)
Assignee | ||
Updated•15 years ago
|
Attachment #407767 -
Flags: superreview?(pavlov)
Attachment #407767 -
Flags: review?(bugmail)
Assignee | ||
Comment 28•15 years ago
|
||
Attachment #407767 -
Attachment is obsolete: true
Attachment #407767 -
Flags: superreview?(pavlov)
Attachment #407767 -
Flags: review?(bugmail)
Assignee | ||
Updated•15 years ago
|
Attachment #407885 -
Flags: review?(bugmail)
Comment 29•15 years ago
|
||
I can run the reftests without the changes to jemalloc in this patch. Perhaps I'm missing something, but if you can as well I'd prefer not to change what doesn't need to be changed.
Updated•15 years ago
|
Attachment #407885 -
Flags: review?(vladimir)
Comment 30•15 years ago
|
||
Comment on attachment 407885 [details] [diff] [review] tabs restored! r=me without the changes to jemalloc
Attachment #407885 -
Flags: review?(bugmail) → review+
Comment on attachment 407885 [details] [diff] [review] tabs restored! This looks mostly fine, just a few questions; no comment on the jemalloc stuff, as that's all black magic :-) >diff --git a/widget/src/windows/nsWindow.h b/widget/src/windows/nsWindow.h >--- a/widget/src/windows/nsWindow.h >+++ b/widget/src/windows/nsWindow.h >@@ -470,9 +470,6 @@ protected: > // Graphics > HDC mPaintDC; // only set during painting > >- static nsAutoPtr<PRUint8> sSharedSurfaceData; >- static gfxIntSize sSharedSurfaceSize; >- Any reason to pull these out of here? They were in here to avoid polluting the global space, EnsureSharedSurface() could become static in here to give it the right access. (Or, the globals should just be static.) >--- a/widget/src/windows/nsWindowGfx.cpp >+++ b/widget/src/windows/nsWindowGfx.cpp >+#define WORDSSIZE(x) ((x).width * (x).height) This scares me a little; are we sure that width*height will never overflow? (That is, it won't ever come from a user, right?) >+PRBool >+EnsureSharedSurfaceSize(gfxIntSize size) >+{ >+ gfxIntSize screenSize; >+ screenSize.height = GetSystemMetrics(SM_CYSCREEN); >+ screenSize.width = GetSystemMetrics(SM_CXSCREEN); >+ >+ if (WORDSSIZE(screenSize) > WORDSSIZE(size)) >+ size = screenSize; >+ >+ if (!sSharedSurfaceData || (WORDSSIZE(size) > WORDSSIZE(sSharedSurfaceSize))) { >+ sSharedSurfaceSize = size; >+ sSharedSurfaceData = nsnull; >+ sSharedSurfaceData = (PRUint8 *)malloc(WORDSSIZE(sSharedSurfaceSize) * 4); No need to assign nsnull first. Also, for surfaces that are bigger than the screen (which shouldn't happen), this means that we'll keep around a surface of that large size -- that seems bad. Are we hitting that case? I'd be more comfortable if there was at least a debug build warning in here that said if size > screensize to warn that that shouldn't happen, and we should figure out why it's happening. (I don't think it ever makes sense to paint an area larger than the screen in this path.)
Attachment #407885 -
Flags: review+ → review?
Assignee | ||
Comment 32•15 years ago
|
||
I'll make the Ensure function and the two globals all static. Thanks for catching that. Surface width and height should never overflow, the data always comes from the engine, not from a user (ie., we don't use these for canvas surfaces in content, afaik -- if that's wrong, please show me how). Actually, I -do- want to assign nsnull first, otherwise (please check my order-of-evaluation and logic here), the data won't be released until after the malloc (that is, the assignment will happen after malloc(), and the original memory won't be released until after the new bits are allocated), which could yield greater fragmentation than otherwise, or even (in limited memory situations) a failure to allocate memory where we might otherwise have been able to, had we released the surface earlier. Since these surfaces are very large allocations (the largest we ever make in my debugging), I think the extra work/code is worth this potential improvement. I believe we only get surfaces larger than the screen while running reftests... does this code need to be smarter about releasing large surfaces where possible? I found myself wishing that the gfx code actually held a reference to these surfaces, would that be a better approach (for a follow-up bug)?
Assignee | ||
Comment 33•15 years ago
|
||
As discussed on IRC, I'll gut the jemalloc changes before landing. They can come in a follow-up bug, if I'm able to reproduce the shutdown crash I was seeing before. As it is, I cannot currently reproduce because of a separate nanojit bug while running my debug build.
Assignee | ||
Comment 34•15 years ago
|
||
Carrying forward blassey's review, and this also includes the static changes discussed previously.
Attachment #407885 -
Attachment is obsolete: true
Attachment #408901 -
Flags: superreview?
Attachment #408901 -
Flags: review+
Attachment #407885 -
Flags: review?(vladimir)
Attachment #407885 -
Flags: review?
Assignee | ||
Updated•15 years ago
|
Attachment #408901 -
Flags: superreview? → superreview?(vladimir)
Attachment #408901 -
Flags: superreview?(vladimir) → superreview+
Comment on attachment 408901 [details] [diff] [review] no jemalloc changes at all Looks fine, but please add the NS_WARNING if the requested size is bigger than the screen size.
Assignee | ||
Comment 36•15 years ago
|
||
Carrying reviews over.
Attachment #408901 -
Attachment is obsolete: true
Attachment #408920 -
Flags: superreview+
Attachment #408920 -
Flags: review+
Comment 37•15 years ago
|
||
pushed http://hg.mozilla.org/mozilla-central/rev/1225e71411ad
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 38•15 years ago
|
||
Does this need to land on 1.9.2 as well?
Comment 39•15 years ago
|
||
(In reply to comment #38) > Does this need to land on 1.9.2 as well? Yes, I think so. We're running all the tests for winmo and wince from 1.9.2 since that is the branch they will be released from.
Assignee | ||
Updated•15 years ago
|
Attachment #408920 -
Flags: approval1.9.2?
Comment 40•15 years ago
|
||
Comment on attachment 408920 [details] [diff] [review] the thing to land a192=beltzner
Attachment #408920 -
Flags: approval1.9.2? → approval1.9.2+
Assignee | ||
Comment 41•15 years ago
|
||
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/ccf05dcea711
status1.9.2:
--- → final-fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•