Open Bug 1255130 Opened 4 years ago Updated 9 months ago

Firefox crashes when opening a specific page on thepointsguy.com

Categories

(Core :: Graphics: Layers, defect, P3, critical)

Unspecified
All
defect

Tracking

()

REOPENED
Tracking Status
e10s - ---
firefox46 --- wontfix
firefox47 --- wontfix
firefox48 --- wontfix
firefox-esr52 --- wontfix
firefox-esr60 --- fix-optional
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix

People

(Reporter: mathieu.marquer, Unassigned)

References

()

Details

(Keywords: crash, regression, Whiteboard: gfx-noted)

Crash Data

Attachments

(2 files, 3 obsolete files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0
Build ID: 20160309030419

Steps to reproduce:

Open http://thepointsguy.com/2015/11/google-flights-updates-trains-lufthansa/ and wait a few seconds


Actual results:

Firefox crashes, even with a clean new profile.
Crash report: https://crash-stats.mozilla.com/report/index/67bbfc44-ce33-4929-b135-d042e2160309


Expected results:

Firefox should not crash.
Crash Signature: [@ libc-2.21.so@0x172488 ]
Product: Firefox → Core
I can reproduce on Ubuntu14.04 32bit.
The crash seems to occur only with both e10s and APZ enabled.

Steps to reproduce:
1. Open http://thepointsguy.com/2015/11/google-flights-updates-trains-lufthansa/
2. Scroll up/down by dragging thumb and arrow button

bp-5321aafb-0c6e-4f21-ac63-cae0a2160310
bp-55a04191-f9f5-4af6-9f6f-938992160310
Severity: normal → critical
tracking-e10s: --- → ?
crash in:

mozilla::layers::ShmemTextureData::Create(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits>, mozilla::gfx::SurfaceFormat, mozilla::gfx::BackendType, mozilla::layers::TextureFlags, mozilla::layers::TextureAllocationFlags, mozilla::layers::ISurfaceAllocator*)

I can't find other crashes in 'ShmemTextureData::Create' in crashstats.
Component: Untriaged → Graphics
OS: Unspecified → Linux
When I repro this on Linux I get an assertion failure in HitTestingTreeNode.cpp:88 about aChild->GetApzc() != parent. I suspect we are getting a bad layer tree, same as in bug 1255725 which was duped to bug 1250718. Marking that as a dependency.
Status: UNCONFIRMED → NEW
Component: Graphics → Graphics: Layers
Depends on: 1250718
Ever confirmed: true
Attached file Layers dump
Here's the layers dump. Layer 0x7f8756e38800, scrollId=3 has a child 0x7f8756e39800 which is also scrollId=3.
Firefox: 48.0a1, Build ID: 20160316030233
User Agent  Mozilla/5.0 (X11; Linux i686; rv:48.0) Gecko/20100101 Firefox/48.0

Hi, 

I have tested this issue on the latest Nightly (48.0a1) build and the latest Firefox (45.0) release, but I could not reproduce it. I opened the provided page and scrolled up and down many times, but the browser did not crash. I also tested on Windows 7 x64, Ubuntu 12.04 x32, and Ubuntu 14.04 x64. e10s and APZ were enabled during the tests.

I am willing to perform a regression on this issue, but since I was unable to reproduce it maybe someone who is able to reproduce this issue can also find a regression window (http://mozilla.github.io/mozregression/). 

Thanks,
Cosmin.
This seems to be fixed in m-c with bug 1250718. I'm duping it over; it should be fixed in tomorrow's nightly build (March 19).
Status: NEW → RESOLVED
Closed: 4 years ago
No longer depends on: 1250718
Resolution: --- → DUPLICATE
Duplicate of bug: 1250718
Still crashes, but the signature is now [@ mozilla::layers::ShmemTextureData::Create ]
Crash Signature: [@ libc-2.21.so@0x172488 ] → [@ libc-2.21.so@0x172488 ] [@ mozilla::layers::ShmemTextureData::Create ]
Reopening per bug 1250718 comment 29, 30.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Nical, looks like something you might be interested in. Reproducible crash while memset'ing a shmem buffer?
Flags: needinfo?(nical.bugzilla)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #9)
> Nical, looks like something you might be interested in. Reproducible crash
> while memset'ing a shmem buffer?

I haven't been able to reproduce on my computer, but from reading the code, all I can say is:
 - It's a SIGBUS on a "very aligned" address (0x7f1ba5e4b000), which looks like the beginning of a page.
 - The same uint32_t bufSize value is passed to both AllocUnsafeShmem and InitBuffer, neither of which truncates it to an integer type with fewer bits, so there is no risk of overflow causing a difference between what we ask of AllocUnsafeShmem and what we memset right after.
 - If AllocUnsafeShmem had failed, it should have returned false (it didn't, though).

So I assume the problem is somewhere in the shmem allocation code. I am not used to seeing SIGBUS on something that is not a trivially misaligned address, so this part is interesting. Perhaps it failed somewhere but didn't return false. If so, it could be that the allocation itself failed (then where is this address coming from? If the shmem was not assigned some value, it would have a null pointer from its default ctor), or that it allocated something smaller than what was asked and the memset ended up stepping on the beginning of the guard page at the end of the shmem, or that the shmem was properly allocated but some pages were not marked as writable as they should have been (would these trigger SIGBUS or SIGSEGV?).
I don't know the shmem allocation code, so my very hand-wavy theory doesn't go further than that.
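The "very aligned" observation above is easy to check by hand. A minimal sketch, assuming 4096-byte pages (the helper name `is_page_aligned` is made up for illustration):

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def is_page_aligned(addr: int, page_size: int = PAGE_SIZE) -> bool:
    """Return True when addr is the first byte of a page."""
    return addr % page_size == 0


fault_addr = 0x7F1BA5E4B000  # faulting address quoted in the comment above
print("page aligned:", is_page_aligned(fault_addr))
```

A fault exactly on a page boundary suggests the memset was walking forward and hit the first byte of a page it could not touch, rather than dereferencing a stray pointer.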
Flags: needinfo?(nical.bugzilla)
For the record the latest crash reports submitted by Mathieu (in bug 1250718) are:

https://crash-stats.mozilla.com/report/index/b8dbc58a-ab06-4612-9c31-c53d22160319
https://crash-stats.mozilla.com/report/index/803dcbcd-1c56-4e72-b2e8-75dea2160319

I'm wondering if the bufSize is too large relative to the actual buffer, and so the memset runs off the end of the buffer and hits the SIGBUS from that.
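For reference, one well-known way to get SIGBUS (rather than SIGSEGV) on a perfectly aligned address is to touch a page of a shared mapping that has no backing storage behind it. The sketch below is only an illustration of that general behaviour on Linux, not a claim about what happens in this bug. It runs the faulting code in a child process so the parent survives; the names `CHILD_SOURCE` and `run_child` are made up for the example.

```python
import signal
import subprocess
import sys

# Child program: map one page of a temporary file, shrink the file to zero
# bytes, then write through the mapping. The mapping itself is still valid,
# but the page no longer has anything behind it.
CHILD_SOURCE = """
import mmap, os, tempfile
fd, path = tempfile.mkstemp()
os.unlink(path)
os.ftruncate(fd, mmap.PAGESIZE)
view = mmap.mmap(fd, mmap.PAGESIZE)  # shared, read/write by default
os.ftruncate(fd, 0)                  # backing object shrinks under the mapping
view[0] = 0                          # first touch of the page faults
"""


def run_child() -> int:
    """Run the child; a negative return value means it was killed by that signal."""
    return subprocess.run([sys.executable, "-c", CHILD_SOURCE]).returncode


status = run_child()
print("child killed by SIGBUS:", status == -signal.SIGBUS)
```

Writing to a page that is mapped but not writable would normally raise SIGSEGV instead, so the signal type alone already narrows down the candidates listed earlier in this bug.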
I'm trying to produce a crash report with every nightly, in case the bug disappears somewhere.

The address is always a "very aligned" address, but not always the same.
https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Alayers%3A%3AShmemTextureData%3A%3ACreate#tab-reports

Tried today on a Windows x86_64 computer, couldn't get Firefox to crash, could be a Linux issue then?
Possibly. I spent some time looking at the code and couldn't see any obvious problems. I think the next step is to put together a build with additional logging that you can run and hopefully it will give us some more useful information. I can do this, but nical, feel free to pre-empt me since you know this code much better.
Mathieu, can you reproduce the problem with the build at [1], collect the output to stderr, and attach it to this bug? Thanks!

[1] http://archive.mozilla.org/pub/firefox/try-builds/kgupta@mozilla.com-21f73c75fa45497c4221f7f6b7a81e8dc034c973/try-linux64-debug/firefox-48.0a1.en-US.linux-x86_64.tar.bz2
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #14)
> collect the
> output to stderr,

Actually, just collect all output (stdout and stderr), there might be some useful stuff in stdout too.
Attached file stdout_stderr_comment16.txt (obsolete) —
Here it is. Surprisingly (or not?), FF doesn't crash, but just hangs, so the last lines of the output appear when I force close Firefox.

Created within a clean FF profile.
Thanks! This is odd, you're running into the assertion failure that I was seeing back in comment 3 and that should have been fixed by bug 1250718.
Ugh, my try push was apparently based on an old version of mozilla-central (from the 17th), so it doesn't have that fix. Sorry about that, I'll make a new one with the latest code.
Parking this bug with me for now.
Assignee: nobody → bugmail.mozilla
Attached file stdout_stderr_comment21.txt (obsolete) —
So... I tried twice, and it looks like I got two different traces. On the first try (here), FF just stopped responding.
Attachment #8733018 - Attachment is obsolete: true
Attached file stdout_stderr_comment22.txt (obsolete) —
And on the second try (here), FF did show the "oops, this tab has crashed" screen, I chose not to restore it, and FF exited apparently normally on its own.
Thanks! The first trace you got (comment 21) looks like an unrelated bug - an assertion failure in Skia which caused the child process to crash with a SIGSEGV, so it got caught by the debugger hook. I'll file another bug for that.

The second trace looks like it's the one we care about. The child process died, but it must have died with something other than SIGSEGV because the debugger hook didn't get tripped. It also happened right after printing

Bug1255130: Requested unsafe shmem for 1270x879 (bufSize 4465328 pageSize 4096), got 0x7fda75da7000

but before the corresponding "Bug1255130: InitBuffer passed" output, so it most likely died in the InitBuffer, same as the stacks we have from crash-stats.

The problem is, this unsafe shmem allocation doesn't appear to be any different from the hundreds that preceded it. Each one appears to be requesting a shmem of size 1270x879 and the addresses keep moving down by 0x445000 bytes. It might be that we're just hitting some magical boundary in the address space, or maybe we're leaking so many of these shmems/handles that the OS decides to kill the process. It's not really clear to me. I'll think about it some more.
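The numbers in the log line can be cross-checked with a little arithmetic. This is a sketch under two assumptions that are not confirmed anywhere in this bug: 4 bytes per pixel, and the buffer size being rounded up to a multiple of 16 bytes. They are chosen only because they reproduce the logged bufSize.

```python
PAGE_SIZE = 4096  # pageSize from the log line


def round_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple of `multiple`."""
    return -(-value // multiple) * multiple


width, height = 1270, 879
bytes_per_pixel = 4  # assumption: 32-bit pixel format
alignment = 16       # assumption: size rounded up to 16 bytes

buf_size = round_up(width * height * bytes_per_pixel, alignment)
mapped = round_up(buf_size, PAGE_SIZE)  # whole pages needed for the pixels
stride = 0x445000                       # observed distance between allocations
extra_pages = (stride - mapped) // PAGE_SIZE

print(buf_size, hex(mapped), extra_pages)
```

Under those assumptions the pixel data needs 0x443000 bytes of whole pages, which leaves two pages per allocation unaccounted for. That would be consistent with per-shmem bookkeeping or guard pages, but that interpretation is a guess.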
Given that there are very few crashes with this signature in crash-stats, it's unlikely we'll uplift a fix to 46, assuming we even get a fix. Marking as wontfix for that release.
I'm totally grasping at straws here, but here's another build for you to try:

http://archive.mozilla.org/pub/firefox/try-builds/kgupta@mozilla.com-a124fd489ae04da65fb348de6e5f4a8fbf7d5285/try-linux64-debug/firefox-48.0a1.en-US.linux-x86_64.tar.bz2

This one will also dump stuff from /proc/self/maps every time it allocates one of these shmems but before it calls InitBuffer on it, so maybe it will turn up something interesting.
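For anyone reading such a dump, the check is whether the address returned by the allocation falls inside a mapping that is writable and large enough. A rough sketch of that lookup follows; the two sample lines are invented for illustration and are not taken from the attached logs, and `parse_maps` and `find_mapping` are hypothetical helper names.

```python
from typing import List, NamedTuple, Optional


class Mapping(NamedTuple):
    start: int
    end: int    # exclusive
    perms: str  # for example "rw-s"
    path: str


def parse_maps(text: str) -> List[Mapping]:
    """Parse text in the /proc/<pid>/maps format into Mapping records."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(part, 16) for part in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], path))
    return mappings


def find_mapping(mappings: List[Mapping], addr: int) -> Optional[Mapping]:
    """Return the mapping that contains addr, or None if it is unmapped."""
    for mapping in mappings:
        if mapping.start <= addr < mapping.end:
            return mapping
    return None


# Invented sample in the maps format; real input would come from
# open("/proc/self/maps").read() on Linux.
SAMPLE = """\
7fda75da6000-7fda761ec000 rw-s 00000000 00:13 424242 /dev/shm/hypothetical-segment (deleted)
7fda761ec000-7fda761ed000 r--p 00000000 08:01 1234 /usr/lib/hypothetical.so
"""

hit = find_mapping(parse_maps(SAMPLE), 0x7FDA75DA7000)
print(hit)
```

A mapping can look entirely normal in this listing and still fault on first touch, because the listing shows address ranges and permissions, not whether each page can actually be backed.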
Results attached.
Attachment #8733083 - Attachment is obsolete: true
Attachment #8733084 - Attachment is obsolete: true
Thanks for the quick response! Unfortunately the results look totally normal, and again I have no idea why this is failing. I don't know what else to try here.

I'm gonna put back the regressionwindow-wanted that I removed in comment 6; given this is a different issue than I thought it was then, getting a regression window would be useful.

If that doesn't turn up anything we can try asking one of the ipc/shmem experts like :billm, except he's away for a few weeks right now.
Assignee: bugmail.mozilla → nobody
I tried Mozregression, but I'm not 100% sure of the results, because during testing in the last days there were some tries for which I couldn't get FF to crash. After waiting a bit and/or a few refreshes, FF would crash again.

Mozregression 1 : 
https://hg.mozilla.org/mozilla-central/rev/432ef38dab95

After setting gfx.xrender.enabled to false, Mozregression 2 :
https://hg.mozilla.org/mozilla-central/rev/30742281c223

I'll try again later with layers.async-pan-zoom.enabled = true for previous builds.
OK so I'm getting https://hg.mozilla.org/mozilla-central/rev/c6765de566a3
Also, whenever the FF created by Mozregression would crash, my "normal" FF instance would crash at the same time, see https://crash-stats.mozilla.com/report/index/cea3ae4a-243b-4ec8-857c-bb7e02160326 for example.
Crash Signature: [@ libc-2.21.so@0x172488 ] [@ mozilla::layers::ShmemTextureData::Create ] → [@ libc-2.21.so@0x172488 ] [@ mozilla::layers::ShmemTextureData::Create ] [@ mozilla::ipc::Shmem::Alloc ]
(In reply to Mathieu Marquer from comment #29)
> OK so I'm getting https://hg.mozilla.org/mozilla-central/rev/c6765de566a3

Markus, this would seem to implicate your patch in bug 1203190. Can you please take a look?
Depends on: 1203190
Flags: needinfo?(mstange)
Version: 48 Branch → 44 Branch
Bug 1203190 may have resulted in an increase in shmem usage, which is worth investigating. But the fact that we crash at all when using shmems is a separate bug and not caused by bug 1203190.
Flags: needinfo?(mstange)
(In reply to Markus Stange [:mstange] from comment #31)
> The fact that we crash at all when using shmems is a separate bug and not caused by bug 1203190.

Thanks Markus.

Kats, is this crash in your area of responsibility? If not, who would be best to take a look at this? If so, is there more information you need to debug this further?

Mathieu, how easy is this to reproduce for you? I'm only seeing one report of this crash over the last week which seems to indicate this is an isolated incident.
At this point I think billm should probably take a look. Bill: a quick summary is that the code is crashing while trying to memset a shmem buffer to 0. It's reproducible and the shmem address seems fine. I even got the reporter to run a build with extra logging that dumped parts of /proc/self/maps (see comment 25 and 26) and as far as I could tell everything looked normal. Do you have any idea what might be going on?
Flags: needinfo?(wmccloskey)
Unfortunately (or not?) it seems like I can't get FF to crash anymore (even with past builds), but I'm not sure if it's because of a change to the website content (unlikely, I would say) or because of a bug in the Linux kernel that has since been fixed.

Not sure if it's related, but I'm getting a lot of "Adobe Flash has crashed" banners, but crash reports don't seem to contain much usable information: https://crash-stats.mozilla.com/report/index/9cb42219-2fc6-4cae-a94f-34d412160503
Status: REOPENED → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME
Dropping needinfo since if the issue has gone away there's not much we can do.
Flags: needinfo?(wmccloskey)
Version: 44 Branch → unspecified
Windows is also impacted
OS: Linux → All
Not seeing this crash in 63 release at all. It may have been fixed in a patch in another bug.
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #39)
> Not seeing this crash in 63 release at all. It may have been fixed in a
> patch in another bug.

There is a handful of crashes on 63 and esr but nothing to be worried about. No crashes yet on 64/65.