Closed Bug 1028990 Opened 10 years ago Closed 10 years ago

Broken null check for object property

Categories

(Core :: JavaScript Engine: JIT, defect)

30 Branch
x86_64
Windows 7
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla33
Tracking Status
firefox31 - wontfix
firefox32 + verified
firefox33 --- verified
firefox-esr31 + fixed

People

(Reporter: michael.frischeisen-koehler, Assigned: bhackett1024)

References

Details

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0 (Beta/Release)
Build ID: 20140605174243

Steps to reproduce:

1. Launch http://live.yworks.com/yfiles-for-html/1.2/demos/graphviewer/demo.yfiles.graph.viewer/index.html
2. Quickly choose the "project-application" entry from the combo box


Actual results:

A null reference exception is thrown when trying to access a property that has been checked for null in the preceding line. 
The unobfuscated code where the exception is thrown: 

if (this.$localSerializationProperties !== null) {
  /*final*/ var value = {};
  // The exception is thrown when trying to access this.$localSerializationProperties here, 
  // even though this property has been checked for null two lines above. 
  if (this.$localSerializationProperties.tryGetValue(key, value)) { 
     // ...
  }
  // ...
}

The property is initialized like this: 
'$localSerializationProperties': null, 

Note that the exception doesn't occur if 

if (this.$localSerializationProperties !== null)

is replaced with

if (this.$localSerializationProperties)

It seems like the property is undefined on first access and null on second access, even though there is no way the value can change between the null check and the property access. 

Unfortunately, this problem can't be debugged - it doesn't occur anymore when the debugger is active. I think this clearly indicates that it is a firefox bug. 

The problem also occurs with the current nightly build (33.0a1). 

The problem doesn't occur when waiting about 10 seconds before choosing the mentioned combo box value.



Expected results:

No exception (as in all other browsers)
> it doesn't occur anymore when the debugger is active.

Does it also stop occurring if you flip the "javascript.options.ion" preference to false in about:config?  This sounds like a JIT issue...
Status: UNCONFIRMED → NEW
Component: JavaScript Engine → JavaScript Engine: JIT
Ever confirmed: true
Flags: needinfo?(michael.frischeisen-koehler)
(In reply to Boris Zbarsky [:bz] from comment #1)
> > it doesn't occur anymore when the debugger is active.
> 
> Does it also stop occurring if you flip the "javascript.options.ion"
> preference to false in about:config?  This sounds like a JIT issue...

Yes, it seems like it doesn't occur anymore when "javascript.options.ion" is turned off.
Flags: needinfo?(michael.frischeisen-koehler)
Flags: needinfo?(jdemooij)
Alice, maybe you're interested in bisecting this? :)
I can reproduce in Firfox27.0 but not Firefox26.

Regression window(m-c)
Good:
http://hg.mozilla.org/mozilla-central/rev/1a2d9a04ffb2
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0 ID:20130927090306
Bad:
http://hg.mozilla.org/mozilla-central/rev/e1914e294152
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0 ID:20130927191437
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1a2d9a04ffb2&tochange=e1914e294152

Regression window(m-i)
Good:
http://hg.mozilla.org/integration/mozilla-inbound/rev/19af7baaf26e
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0 ID:20130927102235
Bad:
http://hg.mozilla.org/integration/mozilla-inbound/rev/3f8e57e07eee
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0 ID:20130927103035
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=19af7baaf26e&tochange=3f8e57e07eee
Triggered by:
3f8e57e07eee	Brian Hackett — Bug 920689 - Only include types for 'own' properties in heap type sets, r=jandem.
Blocks: 920689
(In reply to Alice0775 White from comment #4)
> I can reproduce in Firfox27.0 but not Firefox26.
...
> Triggered by:
> 3f8e57e07eee	Brian Hackett — Bug 920689 - Only include types for 'own'
> properties in heap type sets, r=jandem.

Once again, thanks for tracking this down within an hour. You're the best.

Brian can you take a look?
Flags: needinfo?(jdemooij) → needinfo?(bhackett1024)
I'll be on vacation the next couple weeks but will look at this as soon as I get back.
While I really appreciate the very quick investigation and the willingness to look into this into the future, I would like to kindly ask if someone else might be able to shed some more light into this?

From what I can see, now, I cannot trust the Javascript engine in FF anymore at all, unless we use the switch to toggle "ion" off, which is not an option for our corporate customers. 

I would like to at least better understand what exactly *might* be broken here: is it everything that is inherited from the prototype that sometypes may just "not be there" - is it the comparison against null only, the first access only? If I have to rewrite my code in order not to be affected in FF, what is it we would have to do? Should we cache all property accesses in local variables first? 
The problem is that we have literally hundreds but more likely thousands of such code pieces in our code base (checking for null and accessing properties that may have been defined in the prototype) and I really wonder why it was only (until now) this specific piece of code that broke - we certainly tested our library and software with FF betas and nightlies and did not come across this problem, but since we have that specific demo online we got those error reports regularly. I would really hate to tell our customers that they cannot use FF at all anymore because the JS interpreter cannot be trusted anymore at all :-(
Sebastian, chances are the answer to exactly what's broken is going to need to wait on Brian getting back.  :(

Fundamentally, it's a matter of the JIT optimizing things incorrectly in some way; this can be very specific to the particular code around a particular property access, as well as the history of observed objects coming through the access point, which is why it didn't come up in other cases.

And yes, I agree that compiler bugs really suck to deal with...
Attached patch potential patchSplinter Review
This seems like a bug in the JIT which was tickled by bug 920689, and not really related to TI.  The type information here is such that the $localSerializationProperties property on the 'this' object either holds an object or is not present, and the prototype can have a null $localSerializationProperties property.  When we compile the first $localSerializationProperties access we use a GetPropertyPolymorphic MIR node, which only gets 'own' properties, so it's valid for this node to produce a known object (with the != null test then presumably constant folded).  Were the property access to produce a null value it would mean we fetched the value off the prototype, and we'll bailout before that happens.  This behavior is new as of bug 920689, which improved precision so that we recognized a null value could not be produced by an own property lookup here.

The problem seems to be that we either eliminate this first GetPropertyName, or fold it into the later $localSerializationProperties access.  We do bailout, but seem to end up at the second access when we get back to baseline, with the != null check already having been incorrectly performed.  The attached patch fixes this, though I don't know if it has performance ramifications.  I'm also not sure why we're able to eliminate the first GetPropertyName in the first place (this has been annoying to debug as it only repros for me in opt builds), as it is consumed by the compare and we don't seem to constant fold compares until lowering.
Assignee: nobody → bhackett1024
Attachment #8456423 - Flags: review?(jdemooij)
Flags: needinfo?(bhackett1024)
Thanks for investigating this further, Brian!
Since I believe it could take a while until the fixed reaches our customers (feel free to correct me if I'm wrong! ;-) ), could you please help us understand the bug from a users/programmers point of view, better? I am trying to understand how we can change our code base (not just the line in question, but probably other places, too), so that this kind of bug cannot happen anymore - is it enough to copy "null" properties from the prototype to the instance upon instance initialization so that the null becomes an "own property"? From your description I would also like to understand whether it's just "object" properties that are null, that can be affected, or whether primitive fields/properties could also be affected? Any idea why the bug does not occur after 10 seconds of waiting? Is there an optimization or deoptimization run in between? 
I think others might be interested in that information, too, although it's probably hard to make the connection with this bugreport for someone who runs into a similar issue...
Thanks again for your help!
Flags: needinfo?(bhackett1024)
(In reply to Brian Hackett (:bhackett) from comment #9)
> When we compile the first
> $localSerializationProperties access we use a GetPropertyPolymorphic MIR
> node, which only gets 'own' properties, so it's valid for this node to
> produce a known object (with the != null test then presumably constant
> folded).  Were the property access to produce a null value it would mean we
> fetched the value off the prototype, and we'll bailout before that happens.

Shouldn't getPropTryInlineAccess end up in pushTypeBarrier in this case, where we add a TypeBarrier that's set as guard?
(In reply to Jan de Mooij [:jandem] from comment #11)
> Shouldn't getPropTryInlineAccess end up in pushTypeBarrier in this case,
> where we add a TypeBarrier that's set as guard?

No, because there isn't a type barrier added.  The type barrier is only needed to capture the cases where execution can generate new types without bailing out --- this is why, for example, property reads which might hit a missing property and integer additions which might overflow don't need a barrier.  MGetPropertyPolymorphic can only fetch from 'own' properties of the object; if the object has such an 'own' property it will produce an object, and not require a barrier, and if the object does not have an 'own' property then the instruction will bail out.

This patch is pretty unsatisfying to me though, because the other operations mentioned above don't need to be marked as guards, and I think we really should just not be eliminating the first property read since it is used in the compare.  I could look again and try harder to see what's going on, but before that do you have ideas about what could cause this behavior?  Debugging this in an opt build is annoying.
Flags: needinfo?(bhackett1024)
(In reply to Sebastian Müller from comment #10)
> Thanks for investigating this further, Brian!
> Since I believe it could take a while until the fixed reaches our customers
> (feel free to correct me if I'm wrong! ;-) ), could you please help us
> understand the bug from a users/programmers point of view, better? I am
> trying to understand how we can change our code base (not just the line in
> question, but probably other places, too), so that this kind of bug cannot
> happen anymore - is it enough to copy "null" properties from the prototype
> to the instance upon instance initialization so that the null becomes an
> "own property"? From your description I would also like to understand
> whether it's just "object" properties that are null, that can be affected,
> or whether primitive fields/properties could also be affected?

Unfortunately the investigation isn't quite complete and I don't have a good idea for what could trigger this behavior.  I'm hoping I'll be able to look at this more later today or tomorrow.

> Any idea why
> the bug does not occur after 10 seconds of waiting? Is there an optimization
> or deoptimization run in between? 

After 10 seconds the browser has probably performed a garbage collection, which will usually wipe out all the optimized JIT code.
(In reply to Brian Hackett (:bhackett) from comment #12)
> This patch is pretty unsatisfying to me though, because the other operations
> mentioned above don't need to be marked as guards, and I think we really
> should just not be eliminating the first property read since it is used in
> the compare.  I could look again and try harder to see what's going on, but
> before that do you have ideas about what could cause this behavior? 
> Debugging this in an opt build is annoying.

From comment 0:

  if (this.$localSerializationProperties !== null) {

If the property read is a GetPropertyPolymorphic and its type can't be null, GVN will fold the comparison and then we can DCE the property read. If you set the isGuard() flag, we shouldn't eliminate the read. We can still fold the comparison and the second property read, but we shouldn't eliminate the first one.

It also doesn't repro in debug builds when disabling off-thread compilation?
(In reply to Jan de Mooij [:jandem] from comment #14)
> From comment 0:
> 
>   if (this.$localSerializationProperties !== null) {
> 
> If the property read is a GetPropertyPolymorphic and its type can't be null,
> GVN will fold the comparison and then we can DCE the property read. If you
> set the isGuard() flag, we shouldn't eliminate the read. We can still fold
> the comparison and the second property read, but we shouldn't eliminate the
> first one.

OK, I just saw MCompare::tryFold, which is invoked during lowering, but didn't see it is also invoked during GVN.

Maybe this patch is the right fix after all.  MGuardShape and MGuardShapePolymorphic also setGuard(), and MGetPropertyPolymorphic is effectively a combination of a shape guard and slot dispatch.  Certainly these instructions are ones that should never be eliminated for a getprop, as after the shape guard bails out they might not just get a value of a new type but also have side effects by invoking a getter on the proto etc.
Comment on attachment 8456423 [details] [diff] [review]
potential patch

Review of attachment 8456423 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, I agree this seems to be the right fix.
Attachment #8456423 - Flags: review?(jdemooij) → review+
https://hg.mozilla.org/mozilla-central/rev/b10531b4a946
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Thank you all for working on this issue and fixing it!

Is there any chance this can included in versions not so far in the future? Can't this be backported to 31, safely?

If not, now that you found a fix - can you explain to Joe average programmer how to write the code so that he can avoid running into this issue. Telling our corporate customers that they should either work with nightly, alpha, or beta builds or go back to version 26 is not really a great option for us, so I would either like to see the fix hit stable earlier or at least be able to rewrite our code so that we can be rather sure that it does not fail randomly at some place.

Thank you very much for your great work and time!
Setting tracking flags. 31 is probably too late (release next week), but we should get this on ESR and 32, the fix is very simple.
Brian, could you request an uplift for 32?

Jan, could you detail the impact to users before making an evaluation for the ESR tracking?
Flags: needinfo?(bhackett1024)
(In reply to Sylvestre Ledru [:sylvestre] from comment #21)
> Jan, could you detail the impact to users before making an evaluation for
> the ESR tracking?

This bug can break (corporate) websites (see this bug) and the fix is a very simple one-liner, I think we should take it.
OK. thanks. Tracking then.

Brian, please could you also fill an uplift request for ESR? Thanks.
Comment on attachment 8456423 [details] [diff] [review]
potential patch

Approval Request Comment
[Feature/regressing bug #]: old
[User impact if declined]: broken websites, see comment 22
[Describe test coverage new/current, TBPL]: none
[Risks and why]: none
Attachment #8456423 - Flags: approval-mozilla-esr31?
Attachment #8456423 - Flags: approval-mozilla-aurora?
Flags: needinfo?(bhackett1024)
(In reply to Sebastian Müller from comment #19)
> If not, now that you found a fix - can you explain to Joe average programmer
> how to write the code so that he can avoid running into this issue. Telling
> our corporate customers that they should either work with nightly, alpha, or
> beta builds or go back to version 26 is not really a great option for us, so
> I would either like to see the fix hit stable earlier or at least be able to
> rewrite our code so that we can be rather sure that it does not fail
> randomly at some place.

Well, the bug should just affect comparisons like the one in comment 0, where there is an 'if (x.f != null)' or 'if (x.f)' or so forth, where x.f normally holds an object or string.  There are a couple ways to work around this.  Option (a) is to inhibit Ion compilation in functions containing this sort of comparison, by adding a 'with ({}) {}' statement or similar to the function (functions containing 'with' statements are not Ion compiled).  Option (b) is to degrade the type information we have about these properties, by first writing a null value to the property when it is assigned to, e.g. instead of 'x.f = g' you would have 'x.f = null; x.f = g'.
Attachment #8456423 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
QA Whiteboard: [good first verify]
Reproduced the issue on Nightly (2014-06-23) (Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:33.0) Gecko/20100101 Firefox/33.0, build ID:20140623030201). 

Verified as fixed on latest Aurora (2014-07-30) (Mozilla/5.0 (Windows NT 6.1; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0, Build ID:20140730004005)
Status: RESOLVED → VERIFIED
QA Whiteboard: [good first verify] → [good first verify] [bugday-20140730
QA Whiteboard: [good first verify] [bugday-20140730 → [good first verify] [bugday-20140730]
Verified as fixed also on Firefox 32 Beta 2 (buildID: 20140728123914) under Windows 7 64bit.
Blocks: 1046969
Attachment #8456423 - Flags: approval-mozilla-esr31? → approval-mozilla-esr31+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: