Last Comment Bug 196603 - devedge.netscape.com - menus and "customize" do not work in recent nightlies
: devedge.netscape.com - menus and "customize" do not work in recent nightlies
Status: RESOLVED FIXED
[patch]
: regression, testcase
Product: Core
Classification: Components
Component: CSS Parsing and Computation (show other bugs)
: Trunk
: All All
: P1 major (vote)
: mozilla1.4alpha
Assigned To: David Baron :dbaron: ⌚️UTC-7 (busy September 14-25)
: Hixie (not reading bugmail)
Mentors:
http://devedge.netscape.com/
Depends on: 171830
Blocks:
  Show dependency treegraph
 
Reported: 2003-03-09 14:05 PST by Marek Stępień [:marcoos, inactive]
Modified: 2003-03-15 20:52 PST (History)
11 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
testcase (716 bytes, text/html)
2003-03-09 20:35 PST, Andrew Schultz
no flags Details
testcase that's more clear (1.09 KB, text/html)
2003-03-10 21:12 PST, Andrew Schultz
no flags Details
smaller testcase (840 bytes, text/html)
2003-03-13 00:25 PST, Boris Zbarsky [:bz] (TPAC)
no flags Details
patch (14.76 KB, patch)
2003-03-15 11:43 PST, David Baron :dbaron: ⌚️UTC-7 (busy September 14-25)
roc: review+
roc: superreview+
Details | Diff | Splinter Review
potential bustage fix (6.51 KB, patch)
2003-03-15 17:08 PST, David Baron :dbaron: ⌚️UTC-7 (busy September 14-25)
no flags Details | Diff | Splinter Review
macro-ization (7.40 KB, patch)
2003-03-15 19:13 PST, David Baron :dbaron: ⌚️UTC-7 (busy September 14-25)
no flags Details | Diff | Splinter Review

Description Marek Stępień [:marcoos, inactive] 2003-03-09 14:05:54 PST
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4a) Gecko/20030308
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4a) Gecko/20030308

The redesigned devedge.netscape.com does not work properly in recent nightlies.
It uses Eric Meyer's CSS menu, but modified with some JavaScript to show the
pull down menus and to display a <div> with the option to change stylesheet.
This worked in 20030303 nightly build of Mozilla, and does not work
in the following builds:

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4a) Gecko/20030308
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3b) Gecko/20030307 Phoenix/0.5 

Reproducible: Always

Steps to Reproduce:
1. Go to http://devedge.netscape.com/
2. Hover the "Customize" button [1] or the menu ("View Source | Archive | Tech
Centrall") [2]


Actual Results:  
Nothing happens.

Expected Results:  
Show the <div> with option to change the stylesheet [1]
or display the contents of the menu [2].


I was also notified by MozillaZine readers (
http://www.mozillazine.org/forums/viewtopic.php?t=7089 ), that this problem is
reproducible also in:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.3b) Gecko/20030308 Phoenix/0.5

but the page works fine in
Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.4a) Gecko/20030306

so this has become broken probably on March 7th or March 8th.
Comment 1 Chris Casciano 2003-03-09 15:32:50 PST
WFM branch build:
Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.3) Gecko/20030308
Comment 2 Marek Stępień [:marcoos, inactive] 2003-03-09 16:13:33 PST
Yes, I forgot to mention here, that this problem exists only in 1.4a trunk
builds, not in the 1.3 branch.
Comment 3 Andrew Schultz 2003-03-09 19:10:13 PST
confirmed with linux turnk 20030308
regression bewteen linux trunk 2003030605 and 2003030705

==> Layout
Comment 4 Andrew Schultz 2003-03-09 20:35:46 PST
Created attachment 116737 [details]
testcase

mousing over "International" works the first time, mousing out hides it, but
after that it stops working.
Comment 5 Andrew Schultz 2003-03-09 21:03:41 PST
regression from bug 171830?
Comment 6 Boris Zbarsky [:bz] (TPAC) 2003-03-09 21:25:54 PST
Hmm.. ok, I see the problem on the devedge site.  I have to try very hard to get
the testcase to reproduce it (have to move over to the right, not down; then
sometimes the mouseout fires without me actually mousing out of the <p> and the
bug appears).

ccing jkeiser in case he knows why the mouseout fires when it should not.....

In any case, I'm not likely to be able to seriously debug this for at least a
few more days....
Comment 7 Andrew Schultz 2003-03-10 05:33:22 PST
backing out bug 171830 fixed the URL and testcase
Comment 8 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2003-03-10 07:08:25 PST
The patch for bug 171830 *should* have changed what happens here from a reflow
to a repaint -- but could it somehow have caused a reframe instead?
Comment 9 Boris Zbarsky [:bz] (TPAC) 2003-03-10 08:05:35 PST
OK, so I think I sorta know what's going on...  Here's what I see when I stick
some printfs in the code (the ParsePropertyValue is in
nsDOMCSSAttrDeclaration.cpp, the other is in the frame constructor; MaxHint is
the hint the reresolve returned).

ParsePropertyValue 'padding-bottom' to '0' Old Value '0pt'
MaxHint: 14
ParsePropertyValue 'left' to '6px' Old Value '6px'
MaxHint: 14
ParsePropertyValue 'top' to '22px' Old Value '22px'
MaxHint: 14
ParsePropertyValue 'visibility' to 'visible' Old Value 'hidden'
MaxHint: 14

So the problem is in fact that we just don't repaint that area (going to a
different virtual terminal and coming back make the menu show up fine).

If I change nsStyleContext::CalcStyleDifference to use GetStyleData throughout
instead of using PeekStyleData, I get the right change hints and the right behavior.

What I suspect is happening is this.

1)  We start out with a style context (Context1).
2)  We change some inline style, creating Context2; diff context1 and context2
    and _stop_ at the point when we find what the difference is (when the
    optimizations in CalcStyleDifference kick in).  We post a reflow or
    invalidate or something for the frame.
3)  Before this is processed, we change some more inline style creating
    Context3; we diff context2 and context3.  Now the reflow/repaint/whatever
    has not happened yet, so we only have the style structs on context2 that
    were created in step #2.  None of those structs changed, so we move on down
    the list and see no change at all.

E.g. if step #2 triggered a reflow and step #3 triggered a reflow or repaint
further down the list, the second reflow or repaint would not get processed
properly...

Another possibility is that the hint from step #2 _is_ being processed fully,
but just not requesting all the structs that it probably should....
Comment 10 Boris Zbarsky [:bz] (TPAC) 2003-03-10 08:47:19 PST
Needless to say, a minimal testcase that demonstrates the problem consistently
(as the page does) would help immensely in debugging this.... ;)  I'll try to
construct one tomorrow, but if someone has time before that and is willing to do
it....
Comment 11 Andrew Schultz 2003-03-10 09:02:02 PST
the testcase works consistently for me.  The one thing to watch out for is that
if you mouseover the hidden element (Français), it will become visible.  Also
the mouseover area is 100% width, so mouseover left-right is not real effective
unless your mouse actually leaves the Mozilla window.  Coming in from the top
and then leaving through the top triggers the bug every time for me.

(I'm a bit confused by "have to move over to the right, not down")
Comment 12 Andrew Schultz 2003-03-10 21:12:34 PST
Created attachment 116836 [details]
testcase that's more clear

with this testcase, it should be easier to see what is and is not happening.
Comment 13 Boris Zbarsky [:bz] (TPAC) 2003-03-10 21:21:41 PST
Thanks!  Yeah, I see the problem with the first testcase if I go up.

Will try to debug some more later this week....
Comment 14 Pascal Chevrel:pascalc 2003-03-12 15:29:44 PST
Seeing the problem with build 2003031208 on WinXP too, changing OS to ALL
Comment 15 Boris Zbarsky [:bz] (TPAC) 2003-03-13 00:25:16 PST
Created attachment 117059 [details]
smaller testcase

OK, this one is probably about minimal...

This testcase, and others in this bug, fail even if I change from PeekStyleData
to GetStyleData, though that change fixes the devedge site.

Notes:

1)  I can reproduce the bug with visibility or -moz-opacity, but not with
color;
    I'm betting that's because the color struct is after the padding struct in
    the order we compute changes in.
2)  When I add a printf to the point right after I get the "otherVis" struct in

    nsStyleContext::CalcStyleDifference as follows:

      fprintf(stderr, "this: %p, other: %p, vis: %p, othervis: %p, diff: %d\n",

		       this, aOther, vis, otherVis,
		       vis->CalcDifference(*otherVis));

    I get the following when I mouseover the text:

this: 0x876c040, other: 0x876c5f4, vis: 0x862cfa8, othervis: 0x862cfa8, diff: 0

this: 0x876c768, other: 0x876c140, vis: 0x876c200, othervis: 0x876c7c8, diff: 0


and the following when I mouseout:

this: 0x876c140, other: 0x876c794, vis: 0x876c7c8, othervis: 0x876c200, diff:
94

If I take out the line that sets the padding, I instead get:

this: 0x87658e0, other: 0x8765d44, vis: 0x8765860, othervis: 0x8765ca4, diff:
94 (mouseover)
this: 0x8765d44, other: 0x8765834, vis: 0x8765ca4, othervis: 0x8765860, diff:
94 (mouseout)

It may be that caching the style structs themselves somewhere in the ruletree,
so that even though the content gets a new style rule and hence a new
rulenode/stylecontext it's ending up with the same struct as it used to
have.... Not sure why that's breaking things, though....
Comment 16 Chris Casciano 2003-03-14 07:14:03 PST
Setting platform to All (Mac/OS X)
Comment 17 Pham 2003-03-15 06:56:53 PST
I'm seeing this too. What I can do is hover over the menu(download etc) and
press ctrl+ -/= and the menu drops down, but when I go over them, they
dissapear. This problem also dissapears if I disable javascript for navigator
for the moment.
Comment 18 Boris Zbarsky [:bz] (TPAC) 2003-03-15 08:02:29 PST
You can't set that flag if you're not a driver.  Or rather you can, but you'll
be ignored.

David, do you have any idea what's up here?  I may be able to debug this on
Tuesday a bit, but then I'm gone till early April...
Comment 19 Boris Zbarsky [:bz] (TPAC) 2003-03-15 10:09:43 PST
OK, here's what's going on:

1)  We change padding on the parent node.   This triggers a style reresolve down
    the content/frame tree from that point.  All the kids get new style
    contexts, but none of those style contexts get any structs, because of the  
    "it's in the same position in the rule tree and has the same rulenode"
    optimization.
2)  Since the change hint comes out to NS_STYLE_HINT_NONE, there is no actual
    reflow or anything, so those structs remain unfilled (before bug 171830 the
    hint here would have been REFLOW).
3)  Another change comes along (for visibility) targeted at one of the kids and
    the structs are either null (since they don't exist) or get computed based
    on the nsCSSDeclaration. But the nsCSSDeclaration has _changed_ (since we
    don't clone the decl).  So our change hint here is always
    NS_STYLE_HINT_NONE.

Removing the optimization in item #1 fixes this bug (since the structs are now
filled).  Just cloning the decl fixes some testcases, but not all.  Just using
GetStyleData instead of PeekStyleData fixes some testcases, but not all. 
Cloning the decl _and_ using GetStyleData does fix all the testcases (since it
forces the creation of structs with the right style data in them).

Ho-hum.  So thoughts?  Note that we've always ended up in this situation where
style structs are blown away from child contexts by a parent's inline style
changing and things are broken thereafter; it just didn't affect inline style
changes, since those didn't depend on the resolved change hint (but _did_
affect, eg, :hover, alternate sheet switches, etc).
Comment 20 Boris Zbarsky [:bz] (TPAC) 2003-03-15 10:40:56 PST
So to sum up the situation, there are three possible approaches to the fix that
I can think of:

1)  Go back to doing a reflow even in cases when it's not really needed
2)  Make sure the new style contexts created for the kids in Step 1 of comment
    19 have all the needed style structs.  One way would be to change the
    optimization to always GetStyleData if PeekStyleData returned something but
    to only call CalcDifference if the rulenodes are different.
3)  Make sure that we don't touch the old style data in any way (Clone() the
    nsCSSDeclaration in inline style changes) and replace PeekStyleData with
    GetStyleData.

I'm not actually sure which of these is most performant.....  My personal
preference lies with #2, sort of, but real style rule immutability would sorta
point to #3....  Of course then cloning an nsCSSDeclaration needs to be cheap.
Comment 21 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2003-03-15 10:53:07 PST
I think I prefer #2 as well.  (Doesn't that just involve removing the
"break-out" optimization in CalcStyleDifference by moving the NS_IsHintSubset
tests in with the |struct != otherStruct| test?)  How does that disagree with
style rule immutability?
Comment 22 Boris Zbarsky [:bz] (TPAC) 2003-03-15 11:06:35 PST
Actually, it means combining the mRuleNode == aOther.mRuleNode test with the
struct != otherstruct test.

We could move the hint subset stuff there too, but if that test fails then we'll
be doing reflow or whatever and reconstructing the structs anyway (or so the
theory goes).  It's safer to combine all three tests, of course....

The immutability issue is that in this case we _are_ mutating the old
nsIStyleRule.  But I guess it does not matter since we also stop pointing to it
once the re-resolve completes...

OK, so let's go with #2.  David, will you have time to do this in the next day
or two?  If not, I can implement it on Tuesday.
Comment 23 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2003-03-15 11:29:02 PST
I'll attach a patch to do #2 shortly (once my build finishes and I can test it).
Comment 24 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2003-03-15 11:43:52 PST
Created attachment 117333 [details] [diff] [review]
patch

OK, this is fun.  If I have porting trouble with this, I'll add a bogus 6th
parameter to the inline function for type deduction and get rid of the explicit
typing.
Comment 25 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2003-03-15 11:50:26 PST
Actually, hold on, there's a slightly better way to do this...
Comment 26 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2003-03-15 11:51:38 PST
Actually, never mind.
Comment 27 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2003-03-15 13:53:12 PST
But when I check in I should remember to add the accessor to nsStyleSVG, since
for some reason it doesn't have one.
Comment 28 Robert O'Callahan (:roc) (email my personal email if necessary) 2003-03-15 16:08:31 PST
Comment on attachment 117333 [details] [diff] [review]
patch

nice! But I also wonder about the portability of the explicit template
parameter.
Comment 29 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2003-03-15 16:16:03 PST
We use the trick of adding an extra parameter to template functions all over in
the string code, so it clearly works on our set of compilers.
Comment 30 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2003-03-15 17:08:43 PST
Created attachment 117356 [details] [diff] [review]
potential bustage fix

Here's what I'd do if there are problems with what I checked in (in case some
obscure port breaks in the middle of the night after 5 hours thinking about
it).
Comment 31 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2003-03-15 18:12:35 PST
I used the potential bustage fix since IRIX and OS/2 VACPP didn't like the other
way.
Comment 32 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2003-03-15 18:48:15 PST
And from the codesize diffs it looks like VC++ (Windows) compiled it but did the
wrong thing.
Comment 33 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2003-03-15 18:53:33 PST
Well, fix checked in to trunk (although given the state of optimization on
current C++ compilers, judging from codesize logs, I'm tempted to just convert
this to macros).
Comment 34 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2003-03-15 19:13:45 PST
Created attachment 117359 [details] [diff] [review]
macro-ization

Is this better or worse?  It probably messes up debuggers more, but it should
reduce the codesize penalty...
Comment 35 Robert O'Callahan (:roc) (email my personal email if necessary) 2003-03-15 19:59:18 PST
This is ugly ... are you going to just check this in to see what it does to code
size on various platforms?

I don't think you need to whack the names of 'this' and 'other', since they're
scoped to the enclosing block. That would reduce the ugliness a tiny bit.
Comment 36 Robert O'Callahan (:roc) (email my personal email if necessary) 2003-03-15 20:01:35 PST
The code looks good though, r+sr etc
Comment 37 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2003-03-15 20:02:32 PST
I vaguely remember some compiler not liking variables of the same name across
scoped blocks.  It's a pretty vague memory, though.

I'm pretty sure it will improve codesize.  It looks like Visual C++ isn't inlining.
Comment 38 Brendan Eich [:brendan] 2003-03-15 20:41:43 PST
Comment on attachment 117359 [details] [diff] [review]
macro-ization

It's better style to use PR_BEGIN_MACRO and PR_END_MACRO instead of braces --
then you can use ; after each call, even in an unbraced then statement between
if (condition) and else.

/be
Comment 39 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2003-03-15 20:52:34 PST
bryner said Windows builds were having serious problems even with the second
patch (I presume due to some compiler stupidity), so I checked in the macroization.

Note You need to log in before you can comment on or make changes to this bug.