Closed Bug 556124 Opened 14 years ago Closed 13 years ago

layout/reftests/bugs/508908-1.xul reftest persistent failure since ff6b54ac276d

Categories

(Core :: XUL, defect)

defect
Not set
major

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- -

People

(Reporter: Waldo, Assigned: neil)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Bug 497789 apparently causes a reftest with very little script to fail.  I'd dismiss it as bogus, but it's happening everywhere and every time, so it clearly can't be such.
Blocks: 497789
confirming that this is a regression from bug 497789
Boris suggests something XBL does with JS_SetPrototype could be involved:

$ grep JS_SetPrototype content/xbl/*/*.cpp
content/xbl/src/nsXBLBinding.cpp:              ::JS_SetPrototype(cx, base, grandProto);
content/xbl/src/nsXBLBinding.cpp:    if (!::JS_SetPrototype(cx, obj, proto)) {

I will attack here.

/be
If we had some way of recording exactly what js runs, you could try opening the test and ref as -chrome with and without the patch from bug 497789 and see what changed...
In a failing build, frame trees are different in the test and reference. In the test:

      Menu(menulist)(1)@0x14508b8 {120,300,36360,1320} [state=80d40010] [content=0x1ee9fe00] [overflow=-240,-240,36840,1800] [sc=0x1450520]<
        Box(hbox)(-1)@0x1451040 {300,60,34740,1080} [state=80400000] [content=0x1eecb6f0] [sc=0x1450980]<
          ImageBox(image)(0)@0x1453f68 {0,540,0,0} [content=0x1eecb720]
          TextBox(label)(1)[value=]@0x1468710 {180,60,34380,960} [state=01000010] [content=0x1eed36e0]
        >
      >

In the reference:

      Menu(menulist)(0)@0x144b960 {120,300,36360,1200} [state=80d40010] [content=0x1ef91c70] [overflow=-240,-240,36840,1680] [sc=0x144b5c8]<
        Box(hbox)(-1)@0x144c0e8 {300,60,34740,960} [state=80400000] [content=0x1efa2f10] [sc=0x144ba28]<
          ImageBox(image)(0)@0x1453c10 {0,480,0,0} [content=0x1ef9c6e0]
          XULLabel(label)(1)@0x1453c90 [content=0x1ef9b750] {180,480,34380,0} [state=00d00000] sc=0x144c260(i=0,b=0)<
          >
        >
      >

Note TextBox vs XULLabel frames.
Labels get a TextBox frame if they have a "value" attribute and get a XULLabel frame otherwise.
OK.  So in a build that shows the problem, I see only one call to the selectedItem setter in menulist.xml, passing the Embed.  And that's actually exactly what I see in a build that doesn't show the problem too, as far as I can tell...
OK, so in a build that does NOT show the bug, that attribute is removed under te same selectedItem setter with this stack:

#0  nsXULElement::UnsetAttr (this=0x7fffe3ebf4c0, aNameSpaceID=0, aName=
    0x7fffe793aa60, aNotify=0)
    at ../../../../../mozilla/content/xul/content/src/nsXULElement.cpp:1271
#1  0x00007ffff65422a7 in nsXULDocument::SynchronizeBroadcastListener (this=
    0x7fffe0d78800, aBroadcaster=0x7fffe5718260, aListener=0x7fffe3ebf508, aAttr=
    ...) at ../../../../../mozilla/content/xul/document/src/nsXULDocument.cpp:763
#2  0x00007ffff65426bc in nsXULDocument::AddBroadcastListenerFor (this=
    0x7fffe0d78800, aBroadcaster=0x7fffe5718260, aListener=0x7fffe3ebf508, aAttr=
    ...) at ../../../../../mozilla/content/xul/document/src/nsXULDocument.cpp:859

with the addBroadcastListenerFor call happening at chrome://global/content/bindings/menulist.xml line 233.

Same thing in a build with the bug.  Note the aNotify == 0 (sounds like an XBL bug to me).

So the sequence of events in a working build is:

1)  Enter this function.
2)  Set the value attr; post a restyle event.
3)  Unset the value attr.
4)  Process restyle events.

The sequence of events in a broken build is:

1)  Enter this function.
2)  Set the value attr; post a restyle event.
3)  Process restyle events.
4)  Unset the value attr.

Since the unset call doesn't have aNotify=1 it doesn't trigger a restyle and we get the observed bug.

The extra "process restyle events" happens with this stack:

#0  nsCSSFrameConstructor::ProcessPendingRestyles (this=0x154e21e0) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/layout/base/nsCSSFrameConstructor.cpp:11147
#1  0x1253e628 in PresShell::FlushPendingNotifications (this=0x154e1dc0, aType=Flush_Layout) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/layout/base/nsPresShell.cpp:4781
#2  0x127fdb4c in nsDocument::FlushPendingNotifications (this=0x126bc00, aType=Flush_Layout) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/content/base/src/nsDocument.cpp:6339
#3  0x127fdafe in nsDocument::FlushPendingNotifications (this=0x1b51b400, aType=Flush_Layout) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/content/base/src/nsDocument.cpp:6334
#4  0x12865044 in nsObjectLoadingContent::GetExistingFrame (this=0x1f4d8e04, aFlushType=nsObjectLoadingContent::eFlushLayout) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/content/base/src/nsObjectLoadingContent.cpp:1761
#5  0x12868766 in nsObjectLoadingContent::EnsureInstantiation (this=0x1f4d8e04, aInstance=0xbfffa9fc) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/content/base/src/nsObjectLoadingContent.cpp:862
#6  0x12b2d1e4 in nsHTMLPluginObjElementSH::GetPluginInstanceIfSafe (wrapper=0x1f4505d0, obj=0x200cde80, _result=0xbfffa9fc) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/dom/base/nsDOMClassInfo.cpp:9280
#7  0x12b3323a in nsHTMLPluginObjElementSH::NewResolve (this=0x1f481370, wrapper=0x1f4505d0, cx=0x147fc00, obj=0x200cde80, id=354076164, flags=1, objp=0xbfffab38, _retval=0xbfffab3c) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/dom/base/nsDOMClassInfo.cpp:9656
#8  0x11868d12 in XPC_WN_Helper_NewResolve (cx=0x147fc00, obj=0x200cde80, idval=354076164, flags=1, objp=0xbfffabc4) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/js/src/xpconnect/src/xpcwrappednativejsops.cpp:1221
#9  0x003a1718 in js_LookupPropertyWithFlags (cx=0x147fc00, obj=0x200cde80, id=354076164, flags=1, objp=0xbfffac34, propp=0xbfffac30) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/js/src/jsobj.cpp:4599
#10 0x003a519f in js_GetPropertyHelper (cx=0x147fc00, obj=0x200cde80, id=354076164, getHow=3, vp=0xbfffaf60) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/js/src/jsobj.cpp:5008
#11 0x003a560a in js_GetMethod (cx=0x147fc00, obj=0x200cde80, id=354076164, getHow=3, vp=0xbfffaf60) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/js/src/jsobj.cpp:5104
#12 0x0037890d in js_Interpret (cx=0x147fc00) at jsops.cpp:1621
#13 0x0038fb0b in js_Invoke (cx=0x147fc00, argc=1, vp=0x2188b8f4, flags=0) at jsinterp.cpp:1356
#14 0x003900db in js_InternalInvoke (cx=0x147fc00, obj=0x200cbd80, fval=537711296, flags=0, argc=1, argv=0xbfffb7f0, rval=0xbfffb7f0) at jsinterp.cpp:1413
#15 0x003901f5 in js_InternalGetOrSet (cx=0x147fc00, obj=0x200cbd80, id=354528516, fval=537711296, mode=JSACC_WRITE, argc=1, argv=0xbfffb7f0, rval=0xbfffb7f0) at jsinterp.cpp:1450
#16 0x003afde3 in JSScopeProperty::set (this=0x15a4030, cx=0x147fc00, obj=0x200cbd80, vp=0xbfffb7f0) at jsscope.h:996
#17 0x003a31f5 in js_SetPropertyHelper (cx=0x147fc00, obj=0x200cbd80, id=354528516, defineHow=1, vp=0xbfffb7f0) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/js/src/jsobj.cpp:5278
#18 0x00379f8b in js_Interpret (cx=0x147fc00) at jsops.cpp:1872
#19 0x0038fb0b in js_Invoke (cx=0x147fc00, argc=1, vp=0x2188b8c8, flags=0) at jsinterp.cpp:1356
#20 0x003900db in js_InternalInvoke (cx=0x147fc00, obj=0x200cbd80, fval=537711232, flags=0, argc=1, argv=0xbfffc080, rval=0xbfffc080) at jsinterp.cpp:1413
#21 0x003901f5 in js_InternalGetOrSet (cx=0x147fc00, obj=0x200cbd80, id=354066020, fval=537711232, mode=JSACC_WRITE, argc=1, argv=0xbfffc080, rval=0xbfffc080) at jsinterp.cpp:1450
#22 0x003afde3 in JSScopeProperty::set (this=0x15a4050, cx=0x147fc00, obj=0x200cbd80, vp=0xbfffc080) at jsscope.h:996
#23 0x003a31f5 in js_SetPropertyHelper (cx=0x147fc00, obj=0x200cbd80, id=354066020, defineHow=1, vp=0xbfffc080) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/js/src/jsobj.cpp:5278
#24 0x00379f8b in js_Interpret (cx=0x147fc00) at jsops.cpp:1872
#25 0x0038fb0b in js_Invoke (cx=0x147fc00, argc=0, vp=0x2188b820, flags=0) at jsinterp.cpp:1356
#26 0x003900db in js_InternalInvoke (cx=0x147fc00, obj=0x200cbd80, fval=537713632, flags=0, argc=0, argv=0x0, rval=0xbfffc580) at jsinterp.cpp:1413
#27 0x002f9e8d in JS_CallFunctionValue (cx=0x147fc00, obj=0x200cbd80, fval=537713632, argc=0, argv=0x0, rval=0xbfffc580) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/js/src/jsapi.cpp:4977
#28 0x12a78c09 in nsXBLProtoImplAnonymousMethod::Execute (this=0x1f47cbf0, aBoundElement=0x1c55b3b0) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/content/xbl/src/nsXBLProtoImplMethod.cpp:329

Looks like we're on line 229 of chrome://global/content/bindings/menulist.xml (in the selectedItem setter).
OK.  The reason the unset call has aNotify=0 is that nsXULDocument::SynchronizeBroadcastListener passes false to aNotify.

The reason we flush is that given this code in the selectedItem setter:

225       this.setAttribute('value', val.getAttribute('value'));
226       this.setAttribute('image', val.getAttribute('image'));
227       this.setAttribute('label', val.getAttribute('label'));
228       this.setAttribute('description', val.getAttribute('description'));

we miss the propcache for "getAttribute" in line 228, end up in the plugin obj resolve hook, and flush layout.  So we end up creating the bogus frame before the broadcaster synchronize happens.

That said, the fact that SynchronizeBroadcastListener is passing a false aNotify to the attr change call is broken.  It seems to be doing that because mInitialLayoutComplete is false, and that happens because we're under our StartLayout call.  We need to be notifying here any time we might have frames, I would think.  ccing folks who might know about this code.

Brendan, I still wonder why we used to hit the propcache here... worth looking into?
(In reply to comment #8)
> Brendan, I still wonder why we used to hit the propcache here... worth looking
> into?

As discussed on IRC, it's not surprising. Before bug 497789 was fixed a grand-proto or further hit would be keyed by (pc, kobj) where kobj was the direct (key) object. After 497789 was fixed it would be (pc, kshape) where kshape was the direct object's shape at that program point when the cache was filled.

The fact that we may resolve on miss, and resolve can have effects, is as usual a design flaw -- a bug, in short.

I'm gather propcache.stats with and without 497789's fix, will attach a diff.

/be
Right, that's the thing.  I would have expected us to always miss here before, since kobj would be new every time we entered this code....
(In reply to comment #10)
> Right, that's the thing.  I would have expected us to always miss here before,
> since kobj would be new every time we entered this code....

I misspoke in comment 9: key without 497789 patched was (katom, kobj), not (pc, kshape). So pc-independent, no need to fill once by running through the same code path (or similar one that hit the fill point).

Thanks to bz for all the debugging help. This seems to be a latent XUL bug (see comment 8, penultimate paragraph) that my patch for bug 497789 exposed. bz hoped one of the Neils or Smaug could take it. It should block 1.9.3.

/be
Assignee: brendan → nobody
blocking2.0: --- → ?
Component: JavaScript Engine → XUL
QA Contact: general → xptoolkit.widgets
Moreover beyond blocking 193, it blocks a merge of current TraceMonkey to mozilla-central, so it's important that it be fixed sooner rather than later (not quite "as soon as possible", I think, but we shouldn't dally on a fix lest the TM merge become much more monolithic than it already is).
(In reply to comment #11)
> I misspoke in comment 9: key without 497789 patched was (katom, kobj), not (pc,
> kshape). So pc-independent, no need to fill once by running through the same
> code path (or similar one that hit the fill point).

The old way was bad for grand-proto method call sites, where lots of different objects were used as the direct (key, receiver) object. You'd always miss. So the bug 497789 patch is an improvement in general, for important grand- (and great^8- grand-, in gmail's case :-P) proto-based method calling code.

But it is interesting that the old way would compete for direct-mapped property cache space between (kobj, atom) and (kshape, pc) key types, the former for deep (not direct or immediate proto) accesses. The (kobj, atom) entries were good for many possible get/set/call sites (pcs).

This means there will be misses where we used to get hits, for singleton-object access patterns.

(In reply to comment #12)
> Moreover beyond blocking 193, it blocks a merge of current TraceMonkey to
> mozilla-central, so it's important that it be fixed sooner rather than later
> (not quite "as soon as possible", I think, but we shouldn't dally on a fix lest
> the TM merge become much more monolithic than it already is).

The test can be disabled immediately, reenabled when this bug is fixed.

/be
(In reply to comment #8)
> We need to be notifying here any time we might have frames
But presumably not before we've even started frame construction, so maybe the fix is as simple as using mDocumentLoaded instead of mInitialLayoutComplete?
> But presumably not before we've even started frame construction

Sure (though that's a performance optimization of questionable correctness).

Using mDocumentLoaded would fix this bug, I think.  Is it ok otherwise?  I have to admit to not having a good handle on the broadcaster stuff and the invariants it expects.
(In reply to comment #15)
> Using mDocumentLoaded would fix this bug, I think.  Is it ok otherwise?  I have
> to admit to not having a good handle on the broadcaster stuff and the
> invariants it expects.
I don't think the broadcaster stuff cares. The idea is simply to avoid spurious notifications while overlays are being applied to broadcaster elements.
blocking2.0: ? → beta1+
Switched it from fails to random in http://hg.mozilla.org/mozilla-central/rev/f414fcdd192d since as comment 19 (and the next two runs) points out, it unexpectedly passes on the 10.5.8 talos slaves that have taken over running reftests on mozilla-central.
Blocks: 438871
blocking2.0: beta1+ → beta2+
Neil, can you post a patch for comment #14?
Assignee: nobody → neil
Attached patch Possible patchSplinter Review
My workflow doesn't support push-to-try but the browser starts, so...
jst?
Attachment #454996 - Flags: review+
Pushed changeset 386da94a0c0e to mozilla-central.

Does the reftest.list need to be restored?
> Does the reftest.list need to be restored?

You mean unmark the test as random?  I'd think so, yes...
Moving to beta3, as the underlying issue is fixed, and this is just holding on removing the workaround, AIUI.
blocking2.0: beta2+ → beta3+
I just reenabled the test:  http://hg.mozilla.org/mozilla-central/rev/04dbdc3370d0
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #37)
> Can we resolve this?

It's still happening, although very infrequently.
Is it really?  The only occurrence since September was a crash on Thread 21, which makes me think it might be different.
Well, we can always reopen I guess.  :-)
Status: REOPENED → RESOLVED
Closed: 14 years ago13 years ago
Resolution: --- → FIXED
Whiteboard: [orange]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: