Closed Bug 803730 Opened 7 years ago Closed 7 years ago

<HTML element> instanceof Node sometimes returns true and sometimes false for the same element, and turning on Firebug makes the bug go away

Categories

(Core :: JavaScript Engine, defect, major)

x86_64
Linux
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla19
Tracking Status
firefox17 --- unaffected
firefox18 + fixed
firefox19 + fixed
firefox-esr10 --- unaffected
firefox-esr17 --- unaffected

People

(Reporter: rain1, Assigned: djvj)

References

Details

(Keywords: regression, sec-high, Whiteboard: [ion:p1][adv-main18-])

Attachments

(4 files, 1 obsolete file)

I'm seeing a bug in one of the Facebook internal web tools where for some HTML elements, <element> instanceof Node *sometimes* returns false. As expected, this breaks the tool.

Turning on Firebug for the page makes the bug go away. Adding a console.log(obj instanceof Node) doesn't make the bug go away, but it does make the bug appear in a different place. Seems to be a nasty Heisenbug.

The amazing thing is that it sometimes returns true and sometimes false. Here's the structure of the code:

in one module:

var foo = {
...
  isNode: function(obj) {
    return !!(obj instanceof Node);
  }
...
}

and in the caller:

var elem = document.createElement("span");
var trueCount = 0;
var falseCount = 0;
for (var x = 0; x < 100000; x++) {
  if (foo.isNode(elem))
    trueCount++;
  else
    falseCount++;
}
console.log("trueCount: " + trueCount + ", falseCount: " + falseCount);

prints out

trueCount: 10352, falseCount: 89648

deterministically.

I'm seeing this on Nightly (19.0a1) and Aurora (18.0a2) at least. I'm trying to get a reduced testcase but failing at it, unfortunately.
I'm testing on Ubuntu 12.04 x64 with the x64 Firefox nightly from nightly.mozilla.org, by the way.
Summary: <HTML element> instanceof Node returns false in some cases, and turning on Firebug makes the bug go away → <HTML element> instanceof Node sometimes returns true and sometimes false for the same element, and turning on Firebug makes the bug go away
Works in 2012-09-11 (96287ad60bef), broken in 2012-09-12 (d260fcec71ce). Regression range: http://hg.mozilla.org/mozilla-central/pushloghtml?from=96287ad60bef&to=d260fcec71ce
Looks like IonMonkey broke it.
Severity: normal → major
Attached file testcase
Here's a testcase.
Turning off ion makes the bug go away, indeed.  Thank you for the testcase!
Assignee: nobody → general
Component: DOM → JavaScript Engine
Keywords: regression
One other note.  My counts are deterministically:

  trueCount: 10337, falseCount: 89663.

in a Mac nightly with ion enabled.  Not sure why they're just slightly off from yours...
Ah, I should have mentioned that I see 10337/89663 with this testcase too. It's with the internal tool that I saw 10352/89648.
Flags: in-testsuite?
Can we get this assigned to someone? This is something we need to fix also in Aurora.
someone really wants this bug fixed, I keep getting this targeted ad on facebook linking to it..  heh
Assignee: general → kvijayan
Found the issue.  We're passing an unboxed pointer into js::HasInstance, as if it were a boxed value.  An unboxed pointer treated like a boxedval looks like a boxed double, so HasInstance thinks that the value it's checking is a double, which is obviously not an instance of Node.

Testing a fix for this now.
Attachment #677468 - Flags: review?(sstangl)
Attached patch Fix. (obsolete) — Splinter Review
The main change here is to fix up the stub generator for instanceOf (which is shared between both LGetInstanceOfO and LGetInstanceOfV), to box its LHS side if it's working on the "InstanceOfO" variant.

The remaining changes are just to add the supporting methods to the arch-specific assembler and codegen.
Attachment #677472 - Flags: review?(sstangl)
Comment on attachment 677468 [details] [diff] [review]
Patch to add mochitest for bug.

Review of attachment 677468 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/tests/mochitest/test_bug803730.html
@@ +17,5 @@
> +<pre id="test">
> +<script type="application/javascript">
> +
> +/** Test for Bug 803730 **/
> +SimpleTest.waitForExplicitFinish();

This is used for testing asynchronous code, and requires a call to SimpleTest.finish() for the test to be marked as successful. Removing this function call should be all that's necessary.
Attachment #677468 - Flags: review?(sstangl) → review+
Comment on attachment 677472 [details] [diff] [review]
Fix.

Review of attachment 677472 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/CodeGenerator.cpp
@@ +4046,5 @@
> +    // if we're handling a LInstanceOfO.
> +
> +    OutOfLineCode *call = oolCallVM(
> +        HasInstanceInfo,
> +        ins,

Breaks style guidelines -- put "HasInstanceInfo, ins," immediately after "oolCallVM(". OK to not line up the other lines exactly with the arguments.

Code structured as in the patch generally is completely formatted as below, but that is even further from our engine style.

> foo (
>   a,
>   b
> );

@@ +4047,5 @@
> +
> +    OutOfLineCode *call = oolCallVM(
> +        HasInstanceInfo,
> +        ins,
> +        (ArgList(), rhs, lhsIsValue ? ToValue(ins, 0) : ToTempValue(ins, 0)),

"ToTempValue()" is a confusing name -- it sounds like it's interpreting non-Value operands as a temporary Value. Maybe "TempsAsValue()"? Another option would be to define "0" as a static named const, e.g., "FirstTemp".

@@ +4079,5 @@
> +        // calling into js::HasInstance.
> +        Label dontCallHasInstance;
> +        masm.j(Assembler::Equal, &dontCallHasInstance);
> +        masm.boxNonDouble(JSVAL_TYPE_OBJECT,
> +                          ToRegister(ins->getOperand(0)),

Prefer ins->lhs(). Also below.

@@ +4080,5 @@
> +        Label dontCallHasInstance;
> +        masm.j(Assembler::Equal, &dontCallHasInstance);
> +        masm.boxNonDouble(JSVAL_TYPE_OBJECT,
> +                          ToRegister(ins->getOperand(0)),
> +                          ToTempValue(ins, 0));

Column limit is 99 -- arguments can share the same line. Also below.

@@ +4163,5 @@
> +        // If the input LHS is raw object pointer, it must be boxed before
> +        // calling into js::HasInstance.
> +        Label dontCallHasInstance;
> +        masm.branch32(Assembler::NotEqual, lhsTmp, Imm32(1),
> +                      &dontCallHasInstance);

&dontCallHasInstance can be on the line above -- max 99 columns. It's not particularly important that this code be shared, even though it's a duplication, but factoring out the cmpPtr()/j() from the branch32() would allow it to be structured as above, possibly permitting sharing.
Attachment #677472 - Flags: review?(sstangl) → review+
(In reply to Sean Stangl from comment #15)
> "ToTempValue()" is a confusing name -- it sounds like it's interpreting
> non-Value operands as a temporary Value. Maybe "TempsAsValue()"? Another
> option would be to define "0" as a static named const, e.g., "FirstTemp".

The naming is in keeping with the currently existing |ToValue| and |ToOutValue| which construct ValueOperand representations of inputs and outputs in the same way.

I agree that the naming is confusing, but I think fixing that inconsistently is worse.  Leaving the existing names as-is and having this variant of it named differently is the least desirable option.  The naming stays a bit confusing, but now it'll also be not predictable even after you've learned the odd convention.

With the current naming, having encountered and understood ToValue will tell the reader what to expect from ToTempValue.  That seems to be a desirable property to go for.

That said, I agree with your criticism of the name - but we should fix all of them to be something better sounding (e.g. InputAsValue, OutputAsValue, TempAsValue?).

What do you think?
Flags: needinfo?(sstangl)
Flags: needinfo?(sstangl)
Comment on attachment 677472 [details] [diff] [review]
Fix.

[Security approval request comment]
How easily can the security issue be deduced from the patch?

Someone familiar with both our Value boxing format could probably recognize the problem. As Kannan describes it, we accidentally interpret random bits in memory as a register, and then use that register as part of a JS-visible value. Very difficult to exploit, but possible.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

Not per se, you have to read in between the lines.

Which older supported branches are affected by this flaw?

Firefox 18.

If not all supported branches, which bug introduced the flaw?

IonMonkey.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

This patch should apply to Firefox 18.

How likely is this patch to cause regressions; how much testing does it need?

Very unlikely, our test suites should cover it.
Attachment #677472 - Flags: sec-approval?
Attachment #677472 - Flags: sec-approval? → sec-approval+
Putting needinfo back (see comment 16).
Flags: needinfo?(sstangl)
Try run with review comments addressed: https://tbpl.mozilla.org/?tree=Try&rev=da264e8dae3e

The one prior to addressing review comments was entirely green.  And the line of oth oranges in this one are also present on the tbpl for the mozilla-inbound checkin it's based on.
Marking as sec-high because treating random values in memory as JS values sounds bad, even if this particular symptom isn't that bad. Feel free to adjust up or down as needed.
Keywords: sec-high
(In reply to Kannan Vijayan [:djvj] from comment #16)
> That said, I agree with your criticism of the name - but we should fix all
> of them to be something better sounding (e.g. InputAsValue, OutputAsValue,
> TempAsValue?).
> 
> What do you think?

"InputAsValue" is significantly longer than "ToValue", and is the common case, so I would prefer to keep ToValue is it is now. If that means keeping ToTempValue(), that's fine.
Flags: needinfo?(sstangl)
(In reply to Kannan Vijayan [:djvj] from comment #22)
> MochiTest:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/a5e214d4f8b0
> Fix: https://hg.mozilla.org/integration/mozilla-inbound/rev/a9cfcf4b62d9

The test was landed before the fix, and in a different push, which turned the tree orange. Please can you land fix-first in the future (even if in the same push), since it makes future bisecting easier.

Thanks :-)
(In reply to Kannan Vijayan [:djvj] from comment #22)
> MochiTest:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/a5e214d4f8b0
> Fix: https://hg.mozilla.org/integration/mozilla-inbound/rev/a9cfcf4b62d9

For future reference: best practices for security bugs are to _wait_ on pushing testcases until the fix has made it to release (& ESR, if applicable), so that our trunk-landing doesn't give attackers some ready-made test to help them exploit the bug.  That includes Try pushes, too -- best not to publish the testcase at all, if possible. (and just test locally)

Also:
(In reply to Kannan Vijayan [:djvj] from comment #19)
> Try run with review comments addressed:
> https://tbpl.mozilla.org/?tree=Try&rev=da264e8dae3e

Technically, any Try runs for security bugs shouldn't have any mention of the bug number, either, so they don't stand out to attackers as "here's a chunk of code that's linked to a hidden bug -- so there might be something exploitable here that I should try to hammer on." 

(The testcase in that Try run had the bug number in its filename)

Anyway, not a big deal, as long this isn't super-exploitable -- just want to make sure you're aware of those best practices, for future security bugs.
Flags: in-testsuite? → in-testsuite+
Status: NEW → ASSIGNED
Version: unspecified → Trunk
Flags: in-testsuite+ → in-testsuite?
Sorry guys, will remember the best practices hints in the future.

I don't understand why it got backed out though.  If the test is intermittently failing, then it indicates that there still exists some other bug that's causing issues, and a bug should be probably be filed for that.

I'm pushing my fix back in and leaving the test out for now until I get clarification on how I should proceed with checking it in.
(In reply to Kannan Vijayan [:djvj] from comment #26)
> Sorry guys, will remember the best practices hints in the future.

No worries!

> I don't understand why it got backed out though.

Generally, when a patch lands w/ tests and the tests fail immediately or nearly-immediately, the whole thing is backed out to be on the safe side, and what happens next can be discussed on the bug (here).

> If the test is
> intermittently failing, then it indicates that there still exists some other
> bug that's causing issues and a bug should be probably be filed for that.

Or, it could indicate that the original bug (this bug) isn't as fixed as we thought it was. That's a judgement call for the patch-author or reviewer to make, not for the tree-sheriffs -- hence, tree-sheriffs will tend to back out eagerly, and let you the patch author sort it out.

If you're confident that the original issue is fixed by your patch here and that remaining test-issues should be tracked in a followup bug, that's absolutely your call. (and you're probably the best-informed/qualified to file the followup) (I imagine that followup should be security-sensitive, though, if it ends up discussing details of this security-sensitive bug?)

> I'm pushing my fix back in and leaving the test out for now until I get
> clarification on how I should proceed with checking it in.

That sounds fine, *if* you're still confident that your patch fixes the underlying issue. (I'm not sure about that, given that the test is failing some fraction of the time -- but I haven't looked at the code involved, so I don't really know)
"Intermittent" is probably the wrong word; there weren't a lot of m4 runs when I did that backout.

However, your new testcase consistently fails on WinXP, debug or opt, and 32-bit Linux, debug or opt.  Those failures show up on your try run, too.  So either your test is busted and needs to be fixed, or the fix is busted and needs to be fixed.  Please back out.
(In reply to Daniel Holbert [:dholbert] from comment #27)
> Or, it could indicate that the original bug (this bug) isn't as fixed as we
> thought it was. That's a judgement call for the patch-author or reviewer to
> make, not for the tree-sheriffs -- hence, tree-sheriffs will tend to back
> out eagerly, and let you the patch author sort it out.

What he said.

(In reply to Nathan Froyd (:froydnj) from comment #28)
> "Intermittent" is probably the wrong word; there weren't a lot of m4 runs
> when I did that backout.
> 
> However, your new testcase consistently fails on WinXP, debug or opt, and
> 32-bit Linux, debug or opt.  Those failures show up on your try run, too. 
> So either your test is busted and needs to be fixed, or the fix is busted
> and needs to be fixed.  Please back out.

And he said.
I'm sure that the testcase failure is not caused by the bug I just fixed in Ion.

If that issue was still manifesting, it would be showing up in a much different way in the output of the testcase.  Looking through the sporadic oranges on M4, the outputs all say "elem instanceof Node working correctly. - got 1, expected 0".  If this ion bug was manifesting, then that output would be "got X, expected 0", where X should be some very high number around 90,000.

The fact that the testcase is consistently reporting that exactly one of the "instanceOf" tests is failing out of the total 100,000 that ran, then it's something else failing in a different way.

Unless the fix patch causes oranges, I don't think I'll back it out.  I have access to a windows machine so I'll take a look at why the testcase is messing up on win64 debug, and file a related bug.

There's a small chance that I'm wrong about this and the ion bug is still manifesting, but given the symptoms I seriously doubt it.
You could make the test check that the value is either 1 or 0. It would at least prevent this particular bug from regressing.
I'm looking to exactly how the test is failing right now and verifying my earlier statement that my patch fixes the issue and I'm not just missing something.  If I'm wrong I'll back out the patch push promptly.

I tested it via mochitest on my local machine (linux64) and it ran the test fine.  Verifying on windows machine now.
I hope you realize that if the test is indeed perma-orange, myself, philor, or edmorley will back it out. It's not acceptable to leave the tree in that condition. If you need to wallpaper the test for now, that's fine, but please push it sooner rather than later.
The test hasn't been pushed - the fix for the issue I found in Ion has as it addresses a legitimate security issue.  I'm investigating what's up with the test right now.
So I was wrong.  The patch does indeed seem to not be fixing the issues on 32-bit systems.  On 64-bit it does fix the problem.  As I'm not sure what's happening on 32-bit, and don't want to introduce new issues, I'm backing out the patch until I can resolve the problem.

Checkin: https://hg.mozilla.org/integration/mozilla-inbound/rev/72078b2daa84
Backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/ff7dbd3e59fd
Thanks for backing out.  I apologize for not looking at the issues and pushes involved a little more closely to give you a better description of the rationale.  I learned a lot from backing out somebody else's patches for the first time today. :)
No worries :)
Found the issue with the patch.  |masm.boxNonDouble| on 32-bit was using the JSVAL_TYPE_* enum directly as the tag value instead of the actual tag corresponding to that type.

I'm going to get an r+ on IRC for this super-minor change and re-pushing on the basis of the old review, since nothing is fundamentally changing with this fix.
https://hg.mozilla.org/mozilla-central/rev/e08357289bed
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
Blocks: LandIon
[Approval Request Comment]
Bug caused by (feature/regressing bug #):
Bug 803730

User impact if declined:
Possibly exploitable security issue.  Incorrect behaviour of "instanceof" javascript operator on websites.  Incorrect behaviour has already been noticed by third parties.

Testing completed (on m-c, etc.):
Green on m-i and m-c.  Mochitest case passes on try. (https://tbpl.mozilla.org/?tree=Try&rev=c8ea528bf34f)

Risk to taking this patch (and alternatives if risky):
Minimal to none.  Pure plugging of potential security hole and incorrect behaviour.

String or UUID changes made by this patch: 
None
Attachment #677472 - Attachment is obsolete: true
Attachment #682319 - Flags: approval-mozilla-aurora?
Comment on attachment 682319 [details] [diff] [review]
Fix boxing of object-inputs for instanceOf.

[Triage Comment]
low risk sec-high fix, approving for Aurora.  Please land before Monday morning PT to make it in before the merge.
Attachment #682319 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Whiteboard: [ion:p1] → [ion:p1][adv-main18-]
Group: core-security
You need to log in before you can comment on or make changes to this bug.