Closed Bug 1236266 Opened 8 years ago Closed 8 years ago

Assertion failure: !r->IsEmpty() at GfxMessageUtils.h:380

Categories

(Core :: Graphics: Layers, defect)

44 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
firefox44 --- fixed
firefox45 --- fixed
firefox46 --- fixed
b2g-v2.5 --- fixed

People

(Reporter: bc, Assigned: billm)

References

()

Details

(Keywords: assertion, regression)

Attachments

(3 files, 2 obsolete files)

Attached file minidump crash report
1. Install Spider extension http://bclary.com/projects/spider/spider-0.1.0.3-sm+tb+fx+an+fn.xpi

This appears to be required. If this makes the bug invalid, please make it so and I apologize for the spam.

2. firefox -spider -start -quit -wait 0 -depth 1 -url http://player.chicagopublicmedia.org/#wbez -domain player.chicagopublicmedia.org

Media Warning: ["HTTP "Content-Type" of "audio/mpeg" is not supported. Load of media resource http://stream.wbez.org/wbez64.mp3 failed." {file: "http://player.chicagopublicmedia.org/#wbez" line: 0}]. Source File: http://player.chicagopublicmedia.org/#wbez, Line: 0.

Assertion failure: !r->IsEmpty(), at /builds/slave/m-cen-l64-d-000000000000000000/build/src/obj-firefox/dist/include/mozilla/GfxMessageUtils.h:380

Bughunter can reproduce with Beta/44, Aurora/45, Nightly/46 on Linux, OSX, Windows.

The assertion was not entirely 100% reproducible, so I used 10 attempts for each test to find:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=e04c6beb23e19f9721b73763efe082d7697c6c11&tochange=7529d41e6f1b32d7767d8027b9941e61ec38591b

-> Bug 1224209 - Fix missing propagation of 'fixedPositionSides' to the compositor, and add it to the layers dump.

The regression range doesn't appear to include Beta/44 however so I'm somewhat skeptical of it. Perhaps it only made an existing bug more susceptible.

Bughunter also found crashes related to stacks like mozilla::ipc::FatalError mozilla::layers::PLayerTransactionParent::Read also occur in opt builds on urls from this site.
(In reply to Bob Clary [:bc:] from comment #0)
> -> Bug 1224209 - Fix missing propagation of 'fixedPositionSides' to the
> compositor, and add it to the layers dump.

Note this is the wrong bug number, it was backed out and re-pushed with the right number. The right bug is bug 1214267.
Blocks: 1214267
No longer blocks: 1224209
:tn, thanks! that makes more sense as it has landed in 44.
I tried to repro using a local debug build from recent m-c on Linux but couldn't after ~10 attempts. I see the Media Warning that you posted but no assertion/crash.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #3)
> Was this on Linux?

Oh, never mind. You said it happens on all of Linux/Windows/OS X. I'll try with a stock nightly as well.
Kartikaya, did you use the Spider extension? It appears to be required to reproduce.
(In reply to Bob Clary [:bc:] from comment #6)
> Kartikaya, did you use the Spider extension? It appears to be required to
> reproduce.

Yes.

With the nightly on Linux I still wasn't able to reproduce. Although I suspect the website might be blocking me now since at some point I stopped getting the Media Warning as well.

Loading the URL in Aurora on OS X without the spider plugin did trigger a crash for me, https://crash-stats.mozilla.com/report/index/b399b1ae-ab4e-4bc7-93af-021d72160104. It might be specific to something on the page (like an ad) that's getting rotated on each page load.

Looking at the code, it looks like this assertion would fire if a region being sent over IPC had an empty rect inside, and the patches from bug 1214267 don't affect any regions. So I'm not convinced that bug 1214267 is to blame here - at least it's not clear to me how that change would trigger this assertion.
I can reproduce with a build from m-c this morning on Fedora 23 x86_64 using

firefox -spider -start -quit -url 'http://player.chicagopublicmedia.org/#wbez' -depth 1 -domain player.chicagopublicmedia 2>&1 | grep Assertion.*IsEmpty

and can capture it in gdb. The assertion doesn't crash the full browser, you need to look for the message or run it under gdb to catch it.
Can you try loading the page in the browser, saving it, and seeing if you can repro it on the saved version? I strongly suspect it has to do with which ad you get in the bottom-right corner of the page. Most of the time now I am seeing the same "Avery Coonley School" ad which doesn't trigger it.
Ah, I found it. The profile I was using has Privacy -> Never Remember History set with the previous Cache cleared. I can't reproduce it without that set. Sorry for failing to notice/tell you about this before.
I'm able to reproduce this (without Bob's change in comment 10). The problem is that we have a region that violates pixman's invariants. It's not considered empty, but it has zero width. This invalid region is generated here:
https://dxr.mozilla.org/mozilla-central/source/gfx/cairo/libpixman/src/pixman-region.c#1331
If we pass in a rectangle with zero width, we'll end up with a region with zero width. Instead, we should get the special empty region, which has data == &pixman_region_empty_data.

Jeff points out that this came up in https://bugzilla.mozilla.org/show_bug.cgi?id=1022612#c97. It looks like roc landed something there, but I don't think it fully fixes the problem.

I think the right fix is:
1. Change the IPC code so that it doesn't depend on our region invariants. We should just count the number of rectangles up front and encode that in the stream. That's much safer, which is important since this code is super security critical.
2. Fix the pixman bug.
Attached patch ipc (obsolete) — Splinter Review
This patch ensures that we don't mess up IPC just because a region is a little bit wrong.
Assignee: nobody → wmccloskey
Status: NEW → ASSIGNED
Attachment #8703891 - Flags: review?(jmuizelaar)
Attached patch pixman (obsolete) — Splinter Review
This is a patch for pixman. You can see that it already does a similar thing for union, so I don't think the maintainers would object to this patch.
Attachment #8703892 - Flags: review?(jmuizelaar)
Filed bug 1236735 to add some assertions to prevent this sort of thing at the IPC level. I suspect we should also add more assertions to the region code, but someone else should probably deal with that.
Comment on attachment 8703891 [details] [diff] [review]
ipc

Review of attachment 8703891 [details] [diff] [review]:
-----------------------------------------------------------------

It seems like this will make bugs harder to find and to avoid security problems we should make sure the receiver handles this properly.
Attachment #8703891 - Flags: review?(jmuizelaar) → review-
(In reply to Jeff Muizelaar [:jrmuizel] from comment #15)
> Comment on attachment 8703891 [details] [diff] [review]
> ipc
> 
> Review of attachment 8703891 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> It seems like this will make bugs harder to find and to avoid security
> problems we should make sure the receiver handles this properly.

Thinking it over, we can still leave the assertion in. However, I think the code is correct. It shouldn't change the meaning of a region to omit an empty rectangle.

If you'd prefer, we could make this a release assertion instead. However, I'm worried that we would end up with a lot of crashes and no way to diagnose them.
(In reply to Bill McCloskey (:billm) from comment #16)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #15)
> > Comment on attachment 8703891 [details] [diff] [review]
> > ipc
> > 
> > Review of attachment 8703891 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > It seems like this will make bugs harder to find and to avoid security
> > problems we should make sure the receiver handles this properly.
> 
> Thinking it over, we can still leave the assertion in. However, I think the
> code is correct. It shouldn't change the meaning of a region to omit an
> empty rectangle.

Since it doesn't change the meaning why do we need to avoid sending them?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #17)
> Since it doesn't change the meaning why do we need to avoid sending them?

Because the empty region is used as a sentinel. You can see the read-side code a bit below:
https://dxr.mozilla.org/mozilla-central/source/gfx/ipc/GfxMessageUtils.h#388
If we send the empty region, then no other regions will be read.
(In reply to Bill McCloskey (:billm) from comment #18)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #17)
> > Since it doesn't change the meaning why do we need to avoid sending them?
> 
> Because the empty region is used as a sentinel. You can see the read-side
> code a bit below:
> https://dxr.mozilla.org/mozilla-central/source/gfx/ipc/GfxMessageUtils.h#388
> If we send the empty region, then no other regions will be read.

Ah, I see. My preference is still to leave the code unchanged. If the region has empty rects it's considered corrupt and while dropping them is probably safer than truncating the region, I'd prefer things to blow up more drastically than for us to try to get by.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #19)
> Ah, I see. My preference is still to leave the code unchanged. If the region
> has empty rects it's considered corrupt and while dropping them is probably
> safer than truncating the region, I'd prefer things to blow up more
> drastically than for us to try to get by.

There are two problems with that approach:

1. Things blow up in a way that tells us nothing about the actual problem. I'm pretty confident that this bug is the cause of bug 1208226. We've been trying to determine the cause of that bug for months and had no luck.

2. Any bug in the IPC serialization code basically allows an attacker to compromise the chrome process. Once we turn on e10s and sandboxing, a bug like this will allow an attacker to escape the sandbox. Since this Write method here essentially makes the correctness of the IPC code depend on the correctness of this region invariant, an attacker who finds a bug in our region code can escape the sandbox. That's really dumb, especially since we don't even maintain pixman ourselves.
(In reply to Bill McCloskey (:billm) from comment #20)
> 2. Any bug in the IPC serialization code basically allows an attacker to
> compromise the chrome process. Once we turn on e10s and sandboxing, a bug
> like this will allow an attacker to escape the sandbox.

Doesn't that mean that an attacker that has code execution in sandbox can escape
the sandbox by just sending a bad region?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #21)
> (In reply to Bill McCloskey (:billm) from comment #20)
> > 2. Any bug in the IPC serialization code basically allows an attacker to
> > compromise the chrome process. Once we turn on e10s and sandboxing, a bug
> > like this will allow an attacker to escape the sandbox.
> 
> Doesn't that mean that an attacker that has code execution in sandbox can
> escape
> the sandbox by just sending a bad region?

Until we fix this problem, yes. As far as I can tell, the patch here fixes it.
(In reply to Bill McCloskey (:billm) from comment #22)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #21)
> > (In reply to Bill McCloskey (:billm) from comment #20)
> > > 2. Any bug in the IPC serialization code basically allows an attacker to
> > > compromise the chrome process. Once we turn on e10s and sandboxing, a bug
> > > like this will allow an attacker to escape the sandbox.
> > 
> > Doesn't that mean that an attacker that has code execution in sandbox can
> > escape
> > the sandbox by just sending a bad region?
> 
> Until we fix this problem, yes. As far as I can tell, the patch here fixes
> it.

If the attacker has execution then can send what ever garbage they want to the chrome process so I don't see how changing the sending code helps plug that hole.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #23)
> (In reply to Bill McCloskey (:billm) from comment #22)
> > (In reply to Jeff Muizelaar [:jrmuizel] from comment #21)
> > > (In reply to Bill McCloskey (:billm) from comment #20)
> > > > 2. Any bug in the IPC serialization code basically allows an attacker to
> > > > compromise the chrome process. Once we turn on e10s and sandboxing, a bug
> > > > like this will allow an attacker to escape the sandbox.
> > > 
> > > Doesn't that mean that an attacker that has code execution in sandbox can
> > > escape
> > > the sandbox by just sending a bad region?
> > 
> > Until we fix this problem, yes. As far as I can tell, the patch here fixes
> > it.
> 
> If the attacker has execution then can send what ever garbage they want to
> the chrome process so I don't see how changing the sending code helps plug
> that hole.

You're right, I guess my thinking had become muddled. The IPC code checks that you don't overrun the message when reading data out of it. So the only security-sensitive that can go wrong here is if the compositor gets confused when part of the region is read out as some other part of the message. But that's already a possibility we have to consider if the attacker can send anything.

Aside from security, my first point from comment 20 still stands. We can use a release assertion here to make the problem more apparent when it happens, but I'm worried that we have other region bugs that we won't be able to diagnose without an STR. So I think my patch still makes sense.
(In reply to Bill McCloskey (:billm) from comment #24)
> 
> Aside from security, my first point from comment 20 still stands. We can use
> a release assertion here to make the problem more apparent when it happens,
> but I'm worried that we have other region bugs that we won't be able to
> diagnose without an STR. So I think my patch still makes sense.

I'd be happy to take a release assert instead of not writing the empty rectangle. If we end up seeing crashes that we don't know how to handle we can downgrade to your patch? How does that sound?
Attached patch release assertSplinter Review
OK, sounds good.
Attachment #8703891 - Attachment is obsolete: true
Attachment #8704343 - Flags: review?(jmuizelaar)
Comment on attachment 8703892 [details] [diff] [review]
pixman

Review of attachment 8703892 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/cairo/libpixman/src/pixman-region.c
@@ +1338,5 @@
> +    {
> +        if (BAD_RECT (&region.extents))
> +            _pixman_log_error (FUNC, "Invalid rectangle passed");
> +        PREFIX (_init) (dest);
> +        return;

return TRUE;
Attachment #8703892 - Flags: review?(jmuizelaar) → review+
Attachment #8704343 - Flags: review?(jmuizelaar) → review+
Hello, this has caused Static Checking Build bustage like this

https://treeherder.mozilla.org/logviewer.html#?job_id=19350945&repo=mozilla-inbound

 22:12:49     INFO -  /builds/slave/m-in-m64-st-an-d-0000000000000/build/src/gfx/cairo/libpixman/src/pixman-region.c:1342:9: error: non-void function '_moz_pixman_region_intersect_rect' should return a value [-Wreturn-type]
 22:12:49     INFO -          return;
 22:12:49     INFO -          ^
 22:12:49     INFO -  1 error generated.
 22:12:49     INFO -  make[5]: *** [pixman-region16.o] Error 1
 22:12:49     INFO -  make[4]: *** [gfx/cairo/libpixman/src/target] Error 2
 22:12:49     INFO -  make[4]: *** Waiting for unfinished jobs....
 22:12:51     INFO -  libembedding_components_webbrowserpersist.a.desc
 22:12:51     INFO -  rm -f libembedding_components_webbrowserpersist.a
 22:12:51     INFO -  /builds/slave/m-in-m64-st-an-d-0000000000000/build/src/obj-firefox/_virtualenv/bin/python /builds/slave/m-in-m64-st-an-d-0000000000000/build/src/config/expandlibs_gen.py -o libembedding_components_webbrowserpersist.a.desc WebBrowserPersistDocumentChild.o WebBrowserPersistDocumentParent.o WebBrowserPersistLocalDocument.o WebBrowserPersistRemoteDocument.o WebBrowserPersistResourcesChild.o WebBrowserPersistResourcesParent.o WebBrowserPersistSerializeChild.o WebBrowserPersistSerializeParent.o nsWebBrowserPersist.o
 22:12:51     INFO -  libnetwerk_protocol_websocket.a.desc
 22:12:51     INFO -  rm -f libnetwerk_protocol_websocket.a
 22:12:51     INFO -  /builds/slave/m-in-m64-st-an-d-0000000000000/build/src/obj-firefox/_virtualenv/bin/python /builds/slave/m-in-m64-st-an-d-0000000000000/build/src/config/expandlibs_gen.py -o libnetwerk_protocol_websocket.a.desc Unified_cpp_protocol_websocket0.o
 22:12:53     INFO -  libdom_messagechannel.a.desc
 22:12:53     INFO -  rm -f libdom_messagechannel.a
 22:12:53     INFO -  /builds/slave/m-in-m64-st-an-d-0000000000000/build/src/obj-firefox/_virtualenv/bin/python /builds/slave/m-in-m64-st-an-d-0000000000000/build/src/config/expandlibs_gen.py -o libdom_messagechannel.a.desc Unified_cpp_dom_messagechannel0.o
 22:12:54     INFO -  libtoolkit_components_alerts.a.desc
 22:12:54     INFO -  rm -f libtoolkit_components_alerts.a
 22:12:54     INFO -  /builds/slave/m-in-m64-st-an-d-0000000000000/build/src/obj-firefox/_virtualenv/bin/python /builds/slave/m-in-m64-st-an-d-0000000000000/build/src/config/expandlibs_gen.py -o libtoolkit_components_alerts.a.desc Unified_cpp_components_alerts0.o
 22:12:57     INFO -  libextensions_spellcheck_hunspell_glue.a.desc
 22:12:57     INFO -  rm -f libextensions_spellcheck_hunspell_glue.a
 22:12:57     INFO -  /builds/slave/m-in-m64-st-an-d-0000000000000/build/src/obj-firefox/_virtualenv/bin/python /builds/slave/m-in-m64-st-an-d-0000000000000/build/src/config/expandlibs_gen.py -o libextensions_spellcheck_hunspell_glue.a.desc RemoteSpellCheckEngineChild.o RemoteSpellCheckEngineParent.o mozHunspell.o mozHunspellDirProvider.o
 22:12:57     INFO -  make[3]: *** [compile] Error 2
 22:12:57     INFO -  make[2]: *** [default] Error 2
 22:12:57     INFO -  make[1]: *** [realbuild] Error 2
 22:12:57     INFO -  make: *** [build] Error 2
 22:12:57     INFO -  262 compiler warnings present.
 22:12:59     INFO -  Notification center failed: Install terminal-notifier to get a notification when the build finishes.
 22:12:59    ERROR - Return code: 2
 22:12:59  WARNING - setting return code to 2
 22:12:59    FATAL - 'mach build' did not run successfully. Please check log for errors.
 22:12:59    FATAL - Running post_fatal callback...
 22:12:59    FATAL - Exiting -1
 22:12:59     INFO - Running post-action listener: influxdb_recording_post_action
 22:12:59     INFO - Resetting dropped connection: goldiewilson-onepointtwentyone-1.c.influxdb.com
 /builds/slave/m-in-m64-st-an-d-0000000000000/build/venv/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
   InsecurePlatformWarning
 22:12:59     INFO - Running post-action listener: record_mach_stats
 22:12:59     INFO - No build_resources.json found, not logging stats
 22:12:59     INFO - Running post-run listener: _summarize
Ah, nevermind. You fixed it :)
Backed out for Leaks on ASAN

TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at alloc_data, _moz_pixman_region32_copy, Copy, operator=

https://treeherder.mozilla.org/logviewer.html#?job_id=19353821&repo=mozilla-inbound
Comment on attachment 8703892 [details] [diff] [review]
pixman

Review of attachment 8703892 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/cairo/libpixman/src/pixman-region.c
@@ +1336,5 @@
>  
> +    if (!GOOD_RECT (&region.extents))
> +    {
> +        if (BAD_RECT (&region.extents))
> +            _pixman_log_error (FUNC, "Invalid rectangle passed");

There should be a FREE_DATA (dest); here to avoid the leak. Sorry for not catching that sooner.
Attachment #8703892 - Flags: review+ → review-
Attached patch pixman v2Splinter Review
Attachment #8703892 - Attachment is obsolete: true
Attachment #8704684 - Flags: review?(jmuizelaar)
Attachment #8704684 - Flags: review?(jmuizelaar) → review+
https://hg.mozilla.org/mozilla-central/rev/0a1121061fa8
https://hg.mozilla.org/mozilla-central/rev/51c62defb597
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Comment on attachment 8704684 [details] [diff] [review]
pixman v2

Approval Request Comment
[Feature/regressing bug #]: bug 570096 (ancient)
[User impact if declined]: crashes (bug 1198765, bug 1208226)
[Describe test coverage new/current, TreeHerder]: Been on m-c for a while.
[Risks and why]: Low risk patch.
[String/UUID change made/needed]: None
Attachment #8704684 - Flags: approval-mozilla-beta?
Attachment #8704684 - Flags: approval-mozilla-aurora?
Comment on attachment 8704684 [details] [diff] [review]
pixman v2

I confirmed with Bill that this addresses crashes in non-e10s scenario as well so taking it in Beta44, Aurora45.
Attachment #8704684 - Flags: approval-mozilla-beta?
Attachment #8704684 - Flags: approval-mozilla-beta+
Attachment #8704684 - Flags: approval-mozilla-aurora?
Attachment #8704684 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: