Closed Bug 1286213 Opened 4 years ago Closed 2 years ago

Crash in `anonymous namespace''::Sk4px::MapDstSrc<T>

Categories

(Core :: Graphics, defect, P3)

46 Branch
x86
Windows XP
defect

Tracking

()

RESOLVED WORKSFORME
mozilla48
Tracking Status
firefox47 --- wontfix
firefox48 --- fixed
firefox49 --- fixed
firefox50 --- fixed

People

(Reporter: bogdan_maris, Assigned: lsalzman)

References

Details

(Keywords: crash, regression, Whiteboard: [gfx-noted])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-1c76cd13-9589-4640-a23e-ad2472160712.
=============================================================

Platforms Affected:
- Only Windows XP 32bit (did not check 64bit)

Versions Affected:
- Firefox 47.0.1
- Firefox 48 beta 6
-- Was unable to reproduce on latest Nightly.

This are not 100% reliable steps, sometimes it does not crash.
STR:
1. Start Firefox with a clean profile
2. Visit http://www23.statcan.gc.ca/imdb-bmdi/pub/instrument/3901_Q2_V3-eng.pdf

Expected: PDF is loaded successfully.

Actual: Just before PDF finishes loading it will cause a Firefox crash.

Additional Notes:
- Not sure if this is the right component to log this crash.
according to crash stats data this seems to be regressing since firefox 46 - is that due to bug 1082598?
Flags: needinfo?(lsalzman)
Keywords: regression
Bogdan, can you try to track down what fixed this? |mozregression --find-fix| should do the trick.
Component: PDF Viewer → Graphics
Flags: needinfo?(bogdan.maris)
Product: Firefox → Core
Version: Trunk → 46 Branch
Given that there have been several Skia updates between then and now, it is more than likely it was one of the updates that did fix it. A mozregression result would confirm, though.
Flags: needinfo?(lsalzman)
46 crashes in the past week, if the fix is simple, we could take it, otherwise we can just ride this out.
Mason Chang and I tried to repro this, but couldn't get it to happen.

So either we need to find a way to more reliably trigger this or hope that mozgression pinpoints something I may have fixed along the way.
Did a bit more investigating, and since most of the bug reports are showing an illegal instruction exception, and in a place where I expect it is using "_mm_storeu_si128", which generates an MOVDQU instruction.

Now, on all of these bug reports, it is reporting Windows XP. And also it is usually reporting a processor that on paper supports SSE2, like a Core i7-2600k.

But yet mysteriously we're still getting an illegal instruction exception. There are only a few possibilities there:
1) the stack traces could be pointing in the wrong direction and it is some other instruction then MOVDQU.
2) the instruction manual hints that if the OSFXSR bit of the CR4 control register is not set by the operating system, indicating it does not support saving and restoring of extended floating-point context state (and which it should be setting to support SSE2 at all), then the instruction can generate the exception

Without being able to reproduce it here, though, it's hard to really pinpoint what's going on beyond those two hypothetical guesses, unless anyone's got better ideas.
(In reply to Lee Salzman [:lsalzman] from comment #6)
> Did a bit more investigating, and since most of the bug reports are showing
> an illegal instruction exception, and in a place where I expect it is using
> "_mm_storeu_si128", which generates an MOVDQU instruction.
> 
> Now, on all of these bug reports, it is reporting Windows XP. And also it is
> usually reporting a processor that on paper supports SSE2, like a Core
> i7-2600k.
> 
> But yet mysteriously we're still getting an illegal instruction exception.
> There are only a few possibilities there:
> 1) the stack traces could be pointing in the wrong direction and it is some
> other instruction then MOVDQU.
> 2) the instruction manual hints that if the OSFXSR bit of the CR4 control
> register is not set by the operating system, indicating it does not support
> saving and restoring of extended floating-point context state (and which it
> should be setting to support SSE2 at all), then the instruction can generate
> the exception
> 
> Without being able to reproduce it here, though, it's hard to really
> pinpoint what's going on beyond those two hypothetical guesses, unless
> anyone's got better ideas.

Actually, this crash report is using VMOVUPS, see also a dump, this is the compiled code:

            if (n >= 8) {
029E128E  cmp         ecx,8  
029E1291  jl          `anonymous namespace'::Sk4px::MapDstSrc<`anonymous namespace'::Src>+25h (029E12AAh)  
                Sk4px dst0 = fn(Load4(dst+0), Load4(src+0)),
                      dst4 = fn(Load4(dst+4), Load4(src+4));
                dst0.store4(dst+0);
029E1293  vmovups     ymm0,ymmword ptr [eax]  
029E1297  vmovups     ymmword ptr [edx],ymm0  
                dst4.store4(dst+4);
                dst += 8; src += 8; n -= 8;
029E129B  sub         ecx,8  
029E129E  add         edx,20h  
029E12A1  add         eax,20h  
029E12A4  test        ecx,ecx  
029E12A6  jg          `anonymous namespace'::Sk4px::MapDstSrc<`anonymous namespace'::Src>+9h (029E128Eh)  
029E12A8  jmp         `anonymous namespace'::Sk4px::MapDstSrc<`anonymous namespace'::Src>+5Eh (029E12E3h)  
                continue;  // Keep our stride at 8 pixels as long as possible.
            }

Unconditional execution of AVX seems weird and like it should cause more issues? Sandy Bridge should support AVX though.
Flags: needinfo?(lsalzman)
(In reply to Bas Schouten (:bas.schouten) from comment #7)
> (In reply to Lee Salzman [:lsalzman] from comment #6)
> > Did a bit more investigating, and since most of the bug reports are showing
> > an illegal instruction exception, and in a place where I expect it is using
> > "_mm_storeu_si128", which generates an MOVDQU instruction.
> > 
> > Now, on all of these bug reports, it is reporting Windows XP. And also it is
> > usually reporting a processor that on paper supports SSE2, like a Core
> > i7-2600k.
> > 
> > But yet mysteriously we're still getting an illegal instruction exception.
> > There are only a few possibilities there:
> > 1) the stack traces could be pointing in the wrong direction and it is some
> > other instruction then MOVDQU.
> > 2) the instruction manual hints that if the OSFXSR bit of the CR4 control
> > register is not set by the operating system, indicating it does not support
> > saving and restoring of extended floating-point context state (and which it
> > should be setting to support SSE2 at all), then the instruction can generate
> > the exception
> > 
> > Without being able to reproduce it here, though, it's hard to really
> > pinpoint what's going on beyond those two hypothetical guesses, unless
> > anyone's got better ideas.
> 
> Actually, this crash report is using VMOVUPS, see also a dump, this is the
> compiled code:
> 
>             if (n >= 8) {
> 029E128E  cmp         ecx,8  
> 029E1291  jl          `anonymous namespace'::Sk4px::MapDstSrc<`anonymous
> namespace'::Src>+25h (029E12AAh)  
>                 Sk4px dst0 = fn(Load4(dst+0), Load4(src+0)),
>                       dst4 = fn(Load4(dst+4), Load4(src+4));
>                 dst0.store4(dst+0);
> 029E1293  vmovups     ymm0,ymmword ptr [eax]  
> 029E1297  vmovups     ymmword ptr [edx],ymm0  
>                 dst4.store4(dst+4);
>                 dst += 8; src += 8; n -= 8;
> 029E129B  sub         ecx,8  
> 029E129E  add         edx,20h  
> 029E12A1  add         eax,20h  
> 029E12A4  test        ecx,ecx  
> 029E12A6  jg          `anonymous namespace'::Sk4px::MapDstSrc<`anonymous
> namespace'::Src>+9h (029E128Eh)  
> 029E12A8  jmp         `anonymous namespace'::Sk4px::MapDstSrc<`anonymous
> namespace'::Src>+5Eh (029E12E3h)  
>                 continue;  // Keep our stride at 8 pixels as long as
> possible.
>             }
> 
> Unconditional execution of AVX seems weird and like it should cause more
> issues? Sandy Bridge should support AVX though.

(Note I'm pretty impressed by the optimizer figuring this out.. it's just a little sad we're getting AVX code unconditionally)

A quick glance shows no other AVX code is being generated for Sk4px.h, suggesting this is probably just fixed by some random changes and the compiler no longer optimizing to AVX code here, it's worrying that it happened at all though.
Bogdan, can you try the following experimental build to see if it fixes the issue for you?
http://archive.mozilla.org/pub/firefox/try-builds/lsalzman@mozilla.com-1000cf4169ece9d767ebf78b4723325af7e7ca1c/try-win32/

This is assuming that there is some link-time code generation issue that is causing the linker to merge AVX versions of the code with code paths that should not be executing AVX code at all, and so this build tries to resolve that issue...
Flags: needinfo?(lsalzman)
(In reply to Lee Salzman [:lsalzman] from comment #9)
> Bogdan, can you try the following experimental build to see if it fixes the
> issue for you?
> http://archive.mozilla.org/pub/firefox/try-builds/lsalzman@mozilla.com-
> 1000cf4169ece9d767ebf78b4723325af7e7ca1c/try-win32/
> 
> This is assuming that there is some link-time code generation issue that is
> causing the linker to merge AVX versions of the code with code paths that
> should not be executing AVX code at all, and so this build tries to resolve
> that issue...

I tried a couple of times to reproduce the initial crash using this trybuild and I can report that I haven't received the same signature. I did manage to crash it though three times:

bp-64f6b19b-a319-4d64-9d59-8b13c2160721
bp-7a09d573-b7af-4731-a510-a79422160721
bp-2c208bcb-93ec-4f67-a98a-c83c92160720
Flags: needinfo?(bogdan.maris) → needinfo?(lsalzman)
The freebl3 crashes are worrying because they indicate AVX may be causing crashes elsewhere, in this case the nss library.

Go to about:telemetry -> Environment Data -> system. Tell me what the cpu.extensions line says.
Flags: needinfo?(lsalzman) → needinfo?(bogdan.maris)
(In reply to Lee Salzman [:lsalzman] from comment #11)
> The freebl3 crashes are worrying because they indicate AVX may be causing
> crashes elsewhere, in this case the nss library.
> 
> Go to about:telemetry -> Environment Data -> system. Tell me what the
> cpu.extensions line says.

cpu.extensions	[hasMMX, hasSSE, hasSSE2, hasSSE3, hasSSSE3, hasSSE4A, hasSSE4_1, hasSSE4_2]
Flags: needinfo?(bogdan.maris)
This removes the AVX optimization paths from Skia. Skia upstream removed these as they seemingly did not offer strong enough benefits to bother keeping them. We no longer use these in aurora or nightly. So let's just disable them here since that is safer and will likely address this bug.
Assignee: nobody → lsalzman
Status: NEW → ASSIGNED
Attachment #8773904 - Flags: review?(milan)
Attachment #8773904 - Flags: review?(milan) → review+
Comment on attachment 8773904 [details] [diff] [review]
remove unnecessary AVX optimizations from Skia

Approval Request Comment
[Feature/regressing bug #]: Bug 1082598
[User impact if declined]: Canvas-related crashes on Windows XP.
[Describe test coverage new/current, TreeHerder]: crashtests, reftests, mochitests
[Risks and why]: Little to no risk. We already removed the AVX paths in the Skia version we use in aurora and nightly, and the reporter can not reproduce the issue in those versions. So we're better off removing this problematic optimization path in beta too.
[String/UUID change made/needed]: None
Attachment #8773904 - Flags: approval-mozilla-beta?
In a local beta build using VS2013, vmovups & ymm combinations disappear after this patch.
Comment on attachment 8773904 [details] [diff] [review]
remove unnecessary AVX optimizations from Skia

Review of attachment 8773904 [details] [diff] [review]:
-----------------------------------------------------------------

This patch fixes a Canvas-related crashes. Let's take it in 48 beta RC.
Attachment #8773904 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
 https://hg.mozilla.org/releases/mozilla-beta/rev/f3d7abb885c2

Lee, btw can you make sure next time the commit message is proper to the requirements like Bug Number - Title and r=reviewer 

This would help to save sometime for the one who do the checkin, thanks! :)
Target Milestone: --- → mozilla48
Did something slip through the cracks for 49 here?
Flags: needinfo?(lsalzman)
I believe the bug just didn't get tagged with the patch that landed.
(In reply to Ryan VanderMeulen [:RyanVM] from comment #18)
> Did something slip through the cracks for 49 here?

When we updated to Skia in bug 1265131, which was within 49, there were no longer any AVX codepaths in upstream Skia to remove, so it is not afflicted by this bug.

So now that 48 has been patched, and provided we've not noticed any new occurrences of this bug related to this cause, I would consider this fixed.
Flags: needinfo?(lsalzman)
See Also: → 1265131
Priority: -- → P3
Whiteboard: [gfx-noted]
No recent occurrences of this bug. So marking it as resolved for now.
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.