Closed Bug 1114058 Opened 9 years ago Closed 9 years ago

Crash [@ js::RegExpShared::~RegExpShared]

Categories

(Core :: JavaScript Engine: JIT, defect)

x86
macOS
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla37
Tracking Status
firefox34 --- wontfix
firefox35 --- wontfix
firefox36 + fixed
firefox37 --- verified
firefox-esr31 --- unaffected
b2g-v1.4 --- unaffected
b2g-v2.0 --- fixed
b2g-v2.0M --- fixed
b2g-v2.1 --- fixed
b2g-v2.2 --- fixed

People

(Reporter: gkw, Assigned: terrence)

References

Details

(4 keywords, Whiteboard: [jsbugmon:update][adv-main36+])

Crash Data

Attachments

(4 files, 1 obsolete file)

// Randomly chosen test: js/src/jit-test/tests/auto-regress/bug812235.js
gcslice(3)
// Randomly chosen test: js/src/tests/test262/ch07/7.8/7.8.5/S7.8.5_A2.1_T2.js
var hex = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"];
for (var i1 = 0; i1 < 16; i1++) {
    for (var i2 = 0; i2 < 16; i2++) {
        for (var i3 = 0; i3 < 16; i3++) {
            for (var i4 = 0; i4 < 16; i4++) {
                try {
                    var uu = hex[i1] + hex[i2] + hex[i3] + hex[i4];
                    eval("/" + String.fromCharCode("0x" + uu) + "/");
                } catch (e) {}
            }
        }
    }
}
// Randomly chosen test: js/src/jit-test/tests/debug/bug-826669.js
gczeal(9, 2)
newGlobal();

crashes 32-bit js debug shell on m-c changeset 490f124d7dea with --fuzzing-safe --no-threads --ion-eager at js::RegExpShared::~RegExpShared.

Debug configure options:

LD=ld CROSS_COMPILE=1 CC="clang -Qunused-arguments -msse2 -mfpmath=sse -arch i386" RANLIB=ranlib CXX="clang++ -Qunused-arguments -msse2 -mfpmath=sse -arch i386" AS=$CC AR=ar STRIP="strip -x -S" HOST_CC="clang -Qunused-arguments -msse2 -mfpmath=sse" AUTOCONF=/usr/local/Cellar/autoconf213/2.13/bin/autoconf213 HOST_CXX="clang++ -Qunused-arguments -msse2 -mfpmath=sse" sh /Users/skywalker/trees/mozilla-central/js/src/configure --target=i386-apple-darwin9.2.0 --enable-macos-target=10.5 --enable-debug --enable-optimize --enable-nspr-build --enable-more-deterministic --with-ccache --enable-gczeal --enable-debug-symbols --disable-tests

This was found by combining random js tests together with jsfunfuzz, the specific file(s) is/are:

http://hg.mozilla.org/mozilla-central/file/490f124d7dea/js/src/jit-test/tests/auto-regress/bug812235.js
http://hg.mozilla.org/mozilla-central/file/490f124d7dea/js/src/tests/test262/ch07/7.8/7.8.5/S7.8.5_A2.1_T2.js
http://hg.mozilla.org/mozilla-central/file/490f124d7dea/js/src/jit-test/tests/debug/bug-826669.js

autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/a57b5d713883
parent:      213775:4e0a21777c55
user:        Terrence Cole
date:        Wed Oct 29 14:17:45 2014 -0700
summary:     Bug 1074961 - Part 11: Use a ChunkPool to manage available Chunks list; r=sfink

Setting s-s because this seems to involve gcslice and gczeal.

Terrence, is bug 1074961 a likely regressor?
Flags: needinfo?(terrence)
Attached file stack
(lldb) bt 5
* thread #1: tid = 0x46ba0, 0x00761a96 js-dbg-opt-32-dm-nsprBuild-darwin-490f124d7dea`js::RegExpShared::~RegExpShared() [inlined] JSString::isPermanentAtom() const at String.h:441, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x7e4b950)
  * frame #0: 0x00761a96 js-dbg-opt-32-dm-nsprBuild-darwin-490f124d7dea`js::RegExpShared::~RegExpShared() [inlined] JSString::isPermanentAtom() const at String.h:441
    frame #1: 0x00761a96 js-dbg-opt-32-dm-nsprBuild-darwin-490f124d7dea`js::RegExpShared::~RegExpShared() [inlined] JSString::writeBarrierPre(JSString*) + 4 at String.h:504
    frame #2: 0x00761a92 js-dbg-opt-32-dm-nsprBuild-darwin-490f124d7dea`js::RegExpShared::~RegExpShared() [inlined] js::InternalGCMethods<JSAtom*>::preBarrier(v=0x07e4b950) at Barrier.h:300
    frame #3: 0x00761a92 js-dbg-opt-32-dm-nsprBuild-darwin-490f124d7dea`js::RegExpShared::~RegExpShared() [inlined] js::BarrieredBase<JSAtom*>::pre() + 2 at Barrier.h:452
    frame #4: 0x00761a90 js-dbg-opt-32-dm-nsprBuild-darwin-490f124d7dea`js::RegExpShared::~RegExpShared() [inlined] js::BarrieredBase<JSAtom*>::~BarrieredBase() at Barrier.h:421
(lldb)
This also repros on normal linux32 debug build with --no-threads and --no-baseline. What's going on here is that we are cleaning up the RegExpShared's source atom one or more GC's before the RegExpShared itself gets cleaned up. As usual, the failure path is extremely complicated. The facts so far:

* We keep a regexp shared if it is marked and if the source atom is about to die. From RegExpCompartment::sweep:
        bool keep = shared->marked() &&
                    !IsStringAboutToBeFinalizedFromAnyThread(shared->source.unsafeGet());
* When visiting the relevant table entry: shared->source->isMarked(BLACK) == false HOWEVER IsStringAboutToBeFinalized(shared->source) == false.
* In IsAboutToBeFinalized, we only actually check the mark bits if zone->isGCSweeping().
* This was true at the re-introdution of IsAboutToBeFinalized in bug 790338 (Oct 2012) and differed from the existing IsMarked function, which continues to check zone->isCollecting(). Presumable, there was no breakage at that time.
* The breaking edge is a cross zone edge from the regexp compartment to the atoms zone.
* At the time of the IsStringAboutToBeFinalized call, the atoms zone state is MARK.
* The call to IsStringAboutToBeFinalized that is miss-firing was introduced by bug 1010441 (May 2014), keeping jitcode for regexp around when keeping jitcode for a compartment.
* The SCC contains an edge from the impacted zone to the atoms zone during the failing GC. However, the impacted zone-group does not contain the atoms zone. I think this is expected as we shouldn't have any edges out of the atoms zone.

After reflection, I don't understand how the zone->isGCSweeping() test can be correct. I think the intent may have been along the lines of: SCC constructs zone groups which reflect all edges; if it is possible to visit an edge during sweeping, it should either be in the current zone group or be in an already-swept group; if the edge is in the current zone group, then the target zone state must have been set to sweeping; if the edge was already swept, then the zone state should still be SWEEP. However, this last is wrong as endMarkingZoneGroup resets the state to MARK for reasons that aren't really clear to me.

If I swap the zone->isGCSweeping test out for zone->isGCScheduled, the crash, naturally, goes away. Jon, is this the right solution, or am I missing something critical to the way the SCC works?
Flags: needinfo?(terrence)
Attached patch bug-1114058-v0.diff (obsolete) — Splinter Review
Attachment #8540390 - Flags: review?(jcoppeard)
Comment on attachment 8540390 [details] [diff] [review]
bug-1114058-v0.diff

Oh, very interesting! With this patch applied, auto-regress/bug746376.js starts failing when some regex jit-code disappears under us.
Attachment #8540390 - Flags: review?(jcoppeard)
Keywords: sec-high
(In reply to Terrence Cole [:terrence] from comment #2)
> * The breaking edge is a cross zone edge from the regexp compartment to the
> atoms zone.

Ok this is interesting, I didn't think about the case of IsAboutToBeFinalized() being called on an object in a different zone group to the entity being swept.

> After reflection, I don't understand how the zone->isGCSweeping() test can
> be correct. 

The idea is to only sweep entities associated with the zone group currently being swept. E.g. if there is a table containing entries related to objects in several different groups, this would be swept once for each zone group but only the entries related to objects that were being swept in the current group would be removed each time.

I think the solution here is probably to use |IsMarked(source)| rather than |!IsAboutToBeFinalized(source)|.
(In reply to Jon Coppeard (:jonco) (PTO until 29th Dec) from comment #5)
> (In reply to Terrence Cole [:terrence] from comment #2)
> The idea is to only sweep entities associated with the zone group currently
> being swept. E.g. if there is a table containing entries related to objects
> in several different groups, this would be swept once for each zone group
> but only the entries related to objects that were being swept in the current
> group would be removed each time.
> 
> I think the solution here is probably to use |IsMarked(source)| rather than
> |!IsAboutToBeFinalized(source)|.

Yes, that would also work, but I thought it would be a too much of a hack. Maybe not, but there are still two things that I really do not understand.

1) In this case (and it seems in the case of any weak pointer), by looking at the target to choose whether to keep the pointer, we are effectively marking in the opposite direction of the pointer. The SCC does not currently take into account these reversed "edges". Should it? If not, why not?

2) Why does this bug not bite us constantly? Even if the SCC algorithm takes into account the forward weak edges, the SCC algorithm does not guarantee that these edges, if cross compartment, will land in the same ZoneGroup. Thus, it seems like calling IsAboutAboutToBeFinalized on any cross-compartment edge should assert so that we can switch them all to IsMarked?
Flags: needinfo?(jcoppeard)
(In reply to Terrence Cole [:terrence] from comment #6)
> 1) In this case (and it seems in the case of any weak pointer), by looking
> at the target to choose whether to keep the pointer, we are effectively
> marking in the opposite direction of the pointer. The SCC does not currently
> take into account these reversed "edges". Should it? If not, why not?

The issue that grouping the zones into SCCs solves is to ensure that for all strong cross compartment edges A -> B we don't sweep A's zone after B's zone, which would allow the possibility of A being marked after B had been already been swept.

So yes, thinking about weak edges it does seem like the SCC computation should reverse the edge, since we need to know if the target is marked before we know whether to sweep the source pointer.

In this case if we use IsMarked() we could remove a RegExpShared unnecessarily if the source atom was not marked when we swept it, but became marked later.  I don't know if this is actually a problem for RegExpShared in particular.  (Also I'm not sure whether this one actually needs to be weak pointer at all or whether we could mark the source atom if the RegExpShared itself was marked.)

In general we could reverse the SCC edges for weak pointers but this runs the risk of pulling everything into the same zone group and defeating incremental sweeping, especially if we have weak references into the atoms compartment as probably every compartment has strong edges into there.

> 2) Why does this bug not bite us constantly? Even if the SCC algorithm takes
> into account the forward weak edges, the SCC algorithm does not guarantee
> that these edges, if cross compartment, will land in the same ZoneGroup.
> Thus, it seems like calling IsAboutAboutToBeFinalized on any
> cross-compartment edge should assert so that we can switch them all to
> IsMarked?

Well as in the example in my last comment, sometimes it's ok to call IsAboutToBeFinalized() on something in a zone we're not sweeping at the moment.  So I'm not sure if we could assert this.

I'm trying to think how many cross compartment weak pointers we actually have...  For weakmaps with key delegates in a different zone we already add SCC edges to ensure that we finish marking the delegates' zones before the key zones, so that case is handled.  Maybe these are the only cases?
Flags: needinfo?(jcoppeard)
(In reply to Jon Coppeard (:jonco) (PTO until 29th Dec) from comment #7)
> I'm trying to think how many cross compartment weak pointers we actually
> have...  For weakmaps with key delegates in a different zone we already add
> SCC edges to ensure that we finish marking the delegates' zones before the
> key zones, so that case is handled.  Maybe these are the only cases?

Exactly! We're already jumping through hoops to make this case work for WeakMaps; could we introduce a new WeakPointer primitive that guarantees that weak edges get the same treatment? What we're doing with relying on manual IsAboutToBeFinalized and nulling during sweeping is only going to get more fragile as we incrementalize further. At the very least, it would be nice to have a linked-list in DEBUG so that we could add some solid verifications that the ad-hoc approach is working.
Let's switch this user to IsMarked and uplift to solve the proximate cause now and work on a more general fix elsewhere. I've verified that this fixes the crash and doesn't cause other issues.
Assignee: nobody → terrence
Attachment #8540390 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8540982 - Flags: review?(jcoppeard)
Attachment #8540982 - Flags: review?(jcoppeard) → review+
Comment on attachment 8540982 [details] [diff] [review]
bug-1114058-v1.diff

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
Not easily.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
Nope.

Which older supported branches are affected by this flaw?
The potential for the bug to exist goes back to 2012, but I think the only actual instance is from May 2014. So I think we need to backport everywhere that sourced from m-c around that time. 

If not all supported branches, which bug introduced the flaw?
Bug 1010441

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
It's a one-liner and I don't think the code has changed much since May.

How likely is this patch to cause regressions; how much testing does it need?
It should be very low risk.
Attachment #8540982 - Flags: sec-approval?
Comment on attachment 8540982 [details] [diff] [review]
bug-1114058-v1.diff

sec-approval+.

We should get this checked into trunk and nominated for Beta and Aurora branches ASAP as it is running the risk of not getting into Beta (and therefore not into 35).
Attachment #8540982 - Flags: sec-approval? → sec-approval+
https://hg.mozilla.org/mozilla-central/rev/f1c8fc215969
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
Status: RESOLVED → VERIFIED
JSBugMon: This bug has been automatically verified fixed.
So this has almost-certainly missed Fx35 now (the RC build is due to be spun today). Can we at least get approval requests for Aurora?
Flags: needinfo?(terrence)
Comment on attachment 8540982 [details] [diff] [review]
bug-1114058-v1.diff

Approval Request Comment
[Feature/regressing bug #]: Bug 1010441.
[User impact if declined]: Potential security risk.
[Describe test coverage new/current, TBPL]: On m-i for a week.
[Risks and why]: Low - this switches the caller to a different, but existing, primitive, so is unlikely to break other things or introduce new bugs. 
[String/UUID change made/needed]: None.
Flags: needinfo?(terrence)
Attachment #8540982 - Flags: approval-mozilla-aurora?
Attachment #8540982 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Terrence, can you please help rebase this for b2g34/b2g32?
Flags: needinfo?(terrence)
Attached patch backport-32.diffSplinter Review
Attachment #8546830 - Flags: review+
Attachment #8546830 - Flags: checkin?(ryanvm)
Attached patch backport-34.diffSplinter Review
Flags: needinfo?(terrence)
Attachment #8546841 - Flags: review+
Attachment #8546841 - Flags: checkin?(ryanvm)
Whiteboard: [jsbugmon:update] → [jsbugmon:update][adv-main36+]
Group: core-security → core-security-release
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.