Closed Bug 392049 Opened 17 years ago Closed 17 years ago

Camino trunk orange on Tdhtml after bug 386265

Categories

(Camino Graveyard :: General, defect)

PowerPC
macOS
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mark, Unassigned)

References

()

Details

Attachments

(1 file)

The Camino trunk tinderbox has turned orange due to not completing Tdhtml.  The DHTML performance test completes the first two pages (Gecko Technology Text Transitions Fades Without Opacity and the diagonally-moving ball thing) but never makes it to the third page (Netscape Communications blahblahblah).  This test runs by directing the browser to http://www.mozilla.org/performance/test-cases/dhtml/runTests.html .  I have verified that this problem is not isolated to the tinderbox, and occurs with current trunk builds, such as the official trunk nightlies (http://ftp.mozilla.org/pub/mozilla.org/camino/nightly/latest-trunk/Camino.dmg).

Tinderbox page showing the last green and the first orange build (we're interested in boxset CmTrunk): http://tinderbox.mozilla.org/showbuilds.cgi?tree=Camino&hours=24&maxdate=1186891082&legend=0

The checkin window:

http://bonsai.mozilla.org/cvsquery.cgi?module=Camino&date=explicit&mindate=1186863120&maxdate=1186865099

That implicates bug 386265.
What happens with camino when it is pointed to http://www.mozilla.org/performance/test-cases/dhtml/runTests.html ? Does it go to an endless loop iterating through all the tests? Or does it stops at a particular test?
See comment 0.
(In reply to comment #2)
> See comment 0.
> 
Sorry for that, I was reading the comment too quickly. Still few more questions:

What happens when one opens the second and the third tests separately? Does they complete normally?

Are there any errors/warnings in the error console?

Is it possible to get a stack tarce when the second test stops?
Attached file "sample" output
Here's a sample of the app when it's stopped doing anything.

The second test is diagball, the third test is fadespacing.  See http://www.mozilla.org/performance/test-cases/dhtml/ .

If the third test (http://www.mozilla.org/performance/test-cases/dhtml/fadespacing/index.html) is loaded individually first, it operates correctly.  If runTests.html is then used, the third test operates correctly in that environment.

If runTests.html is loaded first, the test will stop after diagball, stuck on a white screen, without any fadespacing activity.  fadespacing is loaded, evidenced by "view frame source."  The application is quiescent, without any activity, as evidenced by the attached sample.  The application responds normally, but will not perform the fadespacing test, even if navigated to directly (as above).

There's nothing in the console, but I'm running a release build.  There's nothing interesting on any of the stacks, but let me know if you'd like to try breaking at a specific moment.
Also: if I manually load diagball first, then I can't manually load fadespacing.  (This simulates what runTests.html does, but is slightly reduced.)

The animation in diagball is not smooth: the ball stops once or twice along the way.
My current idea is that the optimization from bug 386265 affected the memory allocation pattern exposing a hidden bug. But this is just a wild guess.

This to proceed farther it is necessary to cut the test case to find out which JS triggers the bug. The idea is to run the tests from the local file system and then cut its pieces to find a minimal set that triggers the regression. Unfortunately I can not do it since since do not access to a Mac.

Mark, could you make a local build without the patch and verify that it was indeed caused the problem? 

Here is the list of command to remove the patch:

cvs update -j1.35 -j1.34 mozilla/js/src/jsxdrapi.c
cvs update -j3.46 -j3.45 mozilla/js/src/jsstr.h
cvs update -j3.160 -j3.159 mozilla/js/src/jsstr.c
cvs update -j3.153 -j3.152 mozilla/js/src/jsscript.c
cvs update -j3.48 -j3.47 mozilla/js/src/jsscope.h
cvs update -j3.67 -j3.66 mozilla/js/src/jsscope.c
cvs update -j3.27 -j3.26 mozilla/js/src/jsprvtd.h
cvs update -j3.261 -j3.260 mozilla/js/src/jsopcode.c
cvs update -j3.370 -j3.369 mozilla/js/src/jsobj.c
cvs update -j3.71 -j3.70 mozilla/js/src/jsiter.c
cvs update -j3.74 -j3.73 mozilla/js/src/jsgc.h
cvs update -j3.234 -j3.233 mozilla/js/src/jsgc.c
cvs update -j3.209 -j3.208 mozilla/js/src/jsfun.c
cvs update -j3.272 -j3.271 mozilla/js/src/jsemit.c
cvs update -j3.100 -j3.99 mozilla/js/src/jsdbgapi.c
cvs update -j3.76 -j3.75 mozilla/js/src/jsatom.h
cvs update -j3.106 -j3.105 mozilla/js/src/jsatom.c
cvs update -j3.347 -j3.346 mozilla/js/src/jsapi.c
cvs update -j3.161 -j3.160 mozilla/js/src/js.c
cvs update -j1.126 -j1.125 mozilla/js/src/xpconnect/src/nsXPConnect.cpp
cvs update -j1.338 -j1.337 mozilla/dom/src/base/nsJSEnvironment.cpp
Mark, but why it was decided that it was bug 386265 that caused the problem? 

I do not see a difference between the log before my checkin,
http://tinderbox.mozilla.org/showlog.cgi?log=Camino/1186739820.26624.gz

and after,

http://tinderbox.mozilla.org/showlog.cgi?log=Camino/1186830840.24455.gz

The check in itself came with a bunch of other bugs, http://bonsai.mozilla.org/cvsquery.cgi?module=Camino&date=explicit&mindate=1186743781.
Igor, the tinderbox we use for trunk tests is boxset.  maya builds the trunk but tests are currently broken.  Please look at the boxset trunk build.  The last green build began at 08-11 13:12.  Your checkins were at 13:25 and 13:40.  The first orange build began at 08-11 13:45.

Last green build: http://tinderbox.mozilla.org/showlog.cgi?log=Camino/1186863120.12857.gz&fulltext=1

First orange build: http://tinderbox.mozilla.org/showlog.cgi?log=Camino/1186865100.16690.gz&fulltext=1

The cvs update log from the orange build confirms that your changes were the only changes between these two builds (except for a packager.mk change by bsmedberg that does not affect us).
Still, if a Camino builder could try reverting Igor's patch and reproducing the failure, that would be great.

If it's indeed a latent impurity that was exposed by Igor's change, it will be hard to track down without some memory safety analysis. What does Mac offer? I'd reach for valgrind on Linux, purify on Windows. Perhaps just setting some Malloc* envariables would turn up something?

/be
(In reply to comment #10)
> Still, if a Camino builder could try reverting Igor's patch and reproducing the
> failure, that would be great.

That would be really nice. 

I checked the patch one more time and modified the code so interning strings and doubles is forced to search the hashtable the second time. This allowed to test otherwise very unlikely to be taken code path when GC during allocation of a new string/double would increase JSAtomState.tablegen. That have not shown any regressions.
Yes, a local backout following the "cvs update" commands suggested by Igor, plus an additional "cvs up -j3.75 -j3.74 mozilla/js/src/jsgc.h", turned the test back to green again.  You can't currently see these results on the tinderbox page because its SMTP relay is currently rejecting mail.  That's a separate issue I'm working on.
Thanks, Mike.

Now is it possible to cut http://www.mozilla.org/performance/test-cases/dhtml/runTests.html to a minimal JS that still points the problem? For example, if it modifies to perform just  "diagball" and "fadespacing" withot performace measures, would it still show the problem? After that minimal "diagball" and "fadespacing" would be nice.


Er, it's Mark.

Yes, I've tested diagball and fadespacing back-to-back, and that does show the problem (comment 5).
(In reply to comment #14)
> Er, it's Mark.

Sorry about that.

> 
> Yes, I've tested diagball and fadespacing back-to-back, and that does show the
> problem (comment 5).

But could you also cut the test cases to see which exactly JS causes the problem with the tests?
For example, given the current source for the boll example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head>
<!-- MOZ_INSERT_CONTENT_HOOK -->
<title>Moving Ball</title>
  
  <script type="text/javascript">
    var xPos = 0;
    var objBall;
    
    function mzDhtmlStart()
    {
      moveBall();
    }   
    
    function moveBall()
    {
      objBall = document.getElementById("ball");
      objBall.style.left = xPos + 'px';
      objBall.style.top = xPos + 'px';
      xPos++;
      if (xPos<=150)
        setTimeout("moveBall()", parent == window ? 0 : parent.settime);
      else
        if (window.parent != window) window.parent.mzDhtmlStop();
    }
  </script>
</head><body onload="if (window.parent == window) { mzDhtmlStart(); }">
  <img src="index_files/ball.gif" width="70" height="70" id="ball" style="position: absolute; left: 0px; top: 0px;">
</body></html>


would the following modification still stop the fading from working as described in comments 5:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head>
<!-- MOZ_INSERT_CONTENT_HOOK -->
<title>Moving Ball</title>
  
  <script type="text/javascript">
    var xPos = 0;
    var objBall;
    
    function mzDhtmlStart()
    {
      moveBall();
    }   
    
    function moveBall()
    {
      objBall = document.getElementById("ball");
      objBall.style.left = xPos + 'px';
      objBall.style.top = xPos + 'px';
      xPos++;
      if (xPos<=150)
        setTimeout("moveBall()", parent == window ? 0 : parent.settime);
    }
  </script>
</head><body onload="mzDhtmlStart();">
  <img src="index_files/ball.gif" width="70" height="70" id="ball" style="position: absolute; left: 0px; top: 0px;">
</body></html>

Bug 392305 is reproducible regression on my box. I will work on it first. 
Depends on: 392305
Igor, your suggested modification makes no difference.

The only thing I've found that makes a difference is by adjusting the upper limit on xPos.  If the value at diagball/index.html:21 is 12 or less, the fadespacing test is subsequently able to run.  If the value is 13 or more, fadespacing cannot be performed.

It makes no difference whether the moveBall function operates on objBall or not.  The bug is still encountered if diagball/index.html is reduced to:

--
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html><head>
<!-- MOZ_INSERT_CONTENT_HOOK -->
<title>Moving Ball</title>
  
  <script type="text/javascript">
    var xPos = 0;
    
    function mzDhtmlStart()
    {
      moveBall();
    }   
    
    function moveBall()
    {
      xPos++;
      if (xPos<=150)
        setTimeout("moveBall()", 0);
    }
  </script>
</head><body onload="mzDhtmlStart();">
</body></html>
--

Also note that for a period of time after a problematic diagball test is loaded, the UI becomes slightly laggy from time to time for several seconds - if I put the cursor in the location bar and hold down the delete key, characters are not deleted at a consistent rate, but there are periodic pauses.  I haven't yet sampled to see what's going on in there.
No longer depends on: 392305
(In reply to comment #18)
> The bug is still encountered if diagball/index.html is reduced to:
> 
> --
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> <html><head>
> <!-- MOZ_INSERT_CONTENT_HOOK -->
> <title>Moving Ball</title>
> 
>   <script type="text/javascript">
>     var xPos = 0;
> 
>     function mzDhtmlStart()
>     {
>       moveBall();
>     }   
> 
>     function moveBall()
>     {
>       xPos++;
>       if (xPos<=150)
>         setTimeout("moveBall()", 0);
>     }
>   </script>
> </head><body onload="mzDhtmlStart();">
> </body></html>
> --

What would happen if you rename xPos and/or mzDhtmlStart here?
I hope this should be fixed now with a fix from bug 392305.

Mark, sorry for all the time and energy you have to spend helping to track this down.
boxset is green again.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Thanks for the fix, Igor.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: