Closed Bug 1388165 Opened 2 years ago Closed 2 years ago

Stylo: DevTools test using XBL triggers crashes in ServoStyleSet::StyleRuleMap

Categories

(Core :: CSS Parsing and Computation, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: jryans, Assigned: TYLin)

References

Details

Crash Data

Attachments

(1 file)

(This is same test from bug 1386865, which now fails in a different way.)

DevTools test devtools/client/inspector/markup/test/browser_markup_anonymous_04.js triggers[1] a crash in Stylo during ServoStyleSet::StyleRuleMap.

(The test is currently skipped in DevTools Stylo test runs.)

[1]: https://treeherder.mozilla.org/logviewer.html#?job_id=121475765&repo=try&lineNumber=25302
:xidorn, maybe you have an idea about this since you worked on bug 1386865?
Flags: needinfo?(xidorn+moz)
Weird... that test passes for me locally without crash. Can you reproduce the crash locally?

That source line looks like mPresContext->Document() returns nullptr? How can that happen? And without triggering the assertion before?
Flags: needinfo?(xidorn+moz)
OK, so I can reproduce this locally if I run another test before this test.

It seems in my local case, mPresContext is somehow a dangling pointer, which is referring to an area which contains non-sensible data for an nsPresContext. I have no idea why this happens... It seems to be a more general issue that we are holding a dangling pointer in a live object :/
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #3)
> OK, so I can reproduce this locally if I run another test before this test.
> 
> It seems in my local case, mPresContext is somehow a dangling pointer, which
> is referring to an area which contains non-sensible data for an
> nsPresContext. I have no idea why this happens... It seems to be a more
> general issue that we are holding a dangling pointer in a live object :/

I happened to be working on ASAN for Stylo in another bug, and it appears to agree with you:

https://treeherder.mozilla.org/logviewer.html#?job_id=121527470&repo=try&lineNumber=2795

ServoStyleSet::StyleRuleMap tries to get the document from the pres context, but the pres context was previously freed by the cycle collector.
(This led me to wonder why the test is still running at all, but I only skipped it on debug.  Outside of ASAN, there is also bug 1388194 about this issue on opt test runs.)
See Also: → 1388194
Top #6 of Nightly 20170813183258 on Mac, e.g., bp-92c086ca-af81-4851-b361-6271a0170814. The first observation was from build 20170808114032. Possible regression range: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=47248637eafa9a38dade8dc3aa6c4736177c8d8d&tochange=a921bfb8a2cf3db4d9edebe9b35799a3f9d035da
Crash Signature: [@ mozilla::ServoStyleSet::StyleRuleMap]
Looking at the code, I think the reason of the use-after-free is that, nsXBLPrototypeResources is shared among different documents, but it holds a pointer to ServoStyleSet, which holds the pointer to mPresContext. This pointer becomes dangling pointer when the pres context gets destroyed.

I guess the ServoStyleSet is supposed to be destroyed in nsXBLPrototypeResources::ComputeServoStyleSet when a new document wants to style element with the given binding, but the style set seems to somehow be leaked to the new document.

TYLin, what do you think about this issue? Leaving a dangling pointer somewhere doesn't seem to be something sensible anyway... We probably should destroy nsXBLPrototypeResources::mServoStyleSet when pres context goes away somehow.
Flags: needinfo?(tlin)
FWIW, to reproduce this issue, you need to first remove the skip-if = stylo of devtools/client/inspector/markup/test/browser_markup_anonymous_04.js, then run something like
> ./mach mochitest devtools/client/inspector/markup/test/browser_markup_anonymous_*.js
so that we have another test run before it.
I can reproduce the crash by using STR in comment 8. Looking.
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #7)
> Looking at the code, I think the reason of the use-after-free is that,
> nsXBLPrototypeResources is shared among different documents, but it holds a
> pointer to ServoStyleSet, which holds the pointer to mPresContext. This
> pointer becomes dangling pointer when the pres context gets destroyed.

Yes. I think this observation is correct.

> I guess the ServoStyleSet is supposed to be destroyed in
> nsXBLPrototypeResources::ComputeServoStyleSet when a new document wants to
> style element with the given binding, but the style set seems to somehow be
> leaked to the new document.

Destroying the XBL ServoStyleSet after the bound document goes away might ruin the binding cache mechanism. 

The prototype bindings are cached once they're loaded, so the mPresContext in XBL ServoStyleSet are actually the PresContext which loads the binding the first time. Any ServoStyleSet operations involving mPresContext after it's created in [1] are dangerous given the PresContext might be destroyed.

The XBL stylesheets don't change dynamically, so I guess we don't really need to call any method that uses mPresContext.

> TYLin, what do you think about this issue? Leaving a dangling pointer
> somewhere doesn't seem to be something sensible anyway... We probably should
> destroy nsXBLPrototypeResources::mServoStyleSet when pres context goes away
> somehow.

To solve this completely, we'll need to invent nsCSSRuleProcessor for servo (bug 1370446), and use it instead of ServoStyleSet in nsXBLPrototypeResources. That might help bug 1382078 if we're done it right.

WeakPtr to PresContext cannot be used here because it doesn't support multi-thread which is needed by stylo. For now, we might need to workaround that by eagerly creating StyleRuleMap for XBL ServoStyleSet and not observing the bounding document.

[1] http://searchfox.org/mozilla-central/rev/48ea452803907f2575d81021e8678634e8067fc2/dom/xbl/nsXBLPrototypeResources.cpp#168
Flags: needinfo?(tlin)
If XBL stylesheets never change dynamically, I think the right approach is to just skip addig observer for them.

Also keeping a dangling pointer around is generally harmful. So I suggest that we clear the pointer when we should, and simply avoid adding observer when mPresContext is null.

I don't think we should eagerly create the map for xbl if possible, because that is used even more rarely than devtools -- it is behind a devtools option which is off by default.
Comment on attachment 8899678 [details]
Bug 1388165 - Clear mPresContext after creating XBL styleset.

https://reviewboard.mozilla.org/r/170986/#review176154
Attachment #8899678 - Flags: review?(xidorn+moz)
Comment on attachment 8899678 [details]
Bug 1388165 - Clear mPresContext after creating XBL styleset.

https://reviewboard.mozilla.org/r/170986/#review176210
Attachment #8899678 - Flags: review?(xidorn+moz) → review+
Assignee: nobody → tlin
Status: NEW → ASSIGNED
Pushed by tlin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9a93541800d3
Clear mPresContext after creating XBL styleset. r=xidorn
https://hg.mozilla.org/mozilla-central/rev/9a93541800d3
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in before you can comment on or make changes to this bug.