Closed Bug 93603 Opened 23 years ago Closed 23 years ago

win32/linux/mac post-8/03 build runaway thread at end of ibench test

Categories

(Core :: Networking: HTTP, defect, P1)

x86
Windows 98
defect

Tracking

()

RESOLVED WORKSFORME
mozilla0.9.8

People

(Reporter: jrgmorrison, Assigned: darin.moz)

Attachments

(1 file)

I've run today's windows build twice on win98/266MHz/128MB for the ibench
HTML loading test, and both times, at the end of the test, at the point it
is submitting the final results (to show the final measurement):

a) the form submission fails to retrieve the response document, and the 
   page is left saying "Posting results ..."

b) one of the necko threads is off churning at 95+% of CPU. (Or, by inference
   it is a necko thread, since the UI thread is almost completely starved, 
   e.g., opening the File menu takes a couple of dozen seconds to respond).

Note: also with today's builds/tests, 

(a) the Mac build failed to complete this test (once again, bug 81480 "[mac] 
Results page not being posted to client after ibench run" and/or bug 91725 
"7/20 Mac trunk "hangs" running page-loader test; won't complete full test"),

and (b) bug 93561 "Cache is not being created (for a new profile; as of 8/03 
builds)", which affects mac/linux/win32.
Severity: normal → major
-> me
Assignee: neeti → darin
Hung again tonight with the 8/06 am build on win98. I also tried to reproduce 
this on a win2k opt. build, but it didn't hang there.
Darin, any insights?  We need to have working ibench.
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla0.9.4
Okay, so with today's build (with the cache back in place), this did not 
"freeze" on Windows. However, on both win98, and on Linux, there is a thread
spinning away at ~66% of CPU, while the browser is just sitting idle on the 
results page. [It just occurred to me, Doh, and what happens if you load 
another page, say about:blank ... well, I'll try that next time].

As for the Mac ... I think it has this same bug, although I don't have a 
ps/top like tool on the Mac to see the details. At any rate, the Mac is 
completely locked up, and has to be force-quit.
Summary: windows 8/03 build freezes at end of ibench test → win32/linux/mac post-8/03 build runaway thread at end of ibench test
i'm not able to reproduce this on my winnt build from yesterday evening... i
wonder if this is somehow a side-effect of the NSPR 4.2 landing.  jrgm: can you
still repro this?
Oh, ****. I didn't specifically check the CPU on linux/win32 at the end of the 
test last night, as I had been doing (monotony and all). I'll check later 
today.

[Note: Mac, though, was completely locked up and had to be force-quit. That may 
be this bug, or it may be something unique to the Mac].
So, darin and I looked at an opt. build on the test machine while this was 
in "spin mode" at ~70% cpu. It turns out that some js/dom code is being 
triggered repetitively, on the main thread, off of a timer. I'll attach 
the output of gdb in a moment. 

Punting to pavlov, to see if there are any recent timer changes that might 
be involved in this. 
Assignee: darin → pavlov
Status: ASSIGNED → NEW
nope, no timer changes in a long time.
Assignee: pavlov → darin
jonny, can you help take a look at this bug, especially the stack trace attached?
Unfortunately the attached stack doesn't help much, it shows that some JS is
being executed and JS is calling in through XPConnect. It looks like a JS
timeout or interval is executing, but the stack has bogus symbol names in it (an
awful lot of NSGetModule() calls, which must be bogus).

One thing I can think of that could help debug this a bit is to reproduce
this again, break in the debugger, make sure some JS code is running, and then
call DumpJSStack(); (defined in nsXPConnect.cpp) from the debugger to see
what JS code is currently executing.

Can someone who's able to reproduce this try that and put the output of
DumpJSStack() here?

(I'll be out of town most of this week so I won't be very responsive, but I'm
hoping I'll be able to check email at least once a day.)
I'll give this a try, but can you elaborate on how to call DumpJSStack()?
Do I need to load something in gdb, or ...
As long as libxpc3250.so is loaded, which it must be if you have xpconnect on
the stack, then calling the method DumpJSStack(); in gdb should be a matter of
just typing:

  DumpJSStack();

or:

  print DumpJSStack();

(I don't know gdb very well, but according to akkana the above should work). In
devstudio calling a function is as simple as typing the function call in a watch
window.

DumpJSStack() will print the JS stack on the console.
actually, in GDB you'd want:

(gdb) call DumpJSStack()

note: no semicolon
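
For anyone following along, the steps discussed above could be batched into a gdb command file, roughly as sketched below. This is only an illustration under stated assumptions: the file name dump-js-stack.gdb and the PID variable are hypothetical, and DumpJSStack() is assumed to be available in the attached process.

```shell
#!/bin/sh
# Sketch only: writes a gdb command file that collects native backtraces
# and then asks XPConnect for the JS stack. The file name is hypothetical.
cat > dump-js-stack.gdb <<'EOF'
# Show every thread so the spinning one can be identified.
info threads
# Native backtrace for all threads.
thread apply all bt
# Print the JS stack on the console; gdb's call takes no trailing semicolon.
call DumpJSStack()
# Leave the browser running afterwards.
detach
quit
EOF

echo "wrote dump-js-stack.gdb"
# Usage (assumption: PID holds the process id of the spinning browser):
#   gdb -p "$PID" -x dump-js-stack.gdb
```

The JS stack is only meaningful if the process happens to be stopped while JS code is on the stack of the selected thread, so it may take a few attempts to catch it.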
Heh, DumpJSStack is debug-only (you knew that; I didn't). I was getting this 
in an opt. build during my normal run of this test, and that's where we got 
that stack trace. Guess we'll need to try a debug build. Can anyone put a 
tarball out somewhere that I can grab? Otherwise, I'll need to kick off a 
build for myself.
jrgm: i have a debug linux build... i can try to debug this further tomorrow.
Cool. If you can't reproduce on your system, can you tar up dist and I'll try 
again on this low-end machine.
so, i can't repro this with a linux debug build from today.  perhaps we could
try it on the lab machine, but i really suspect that this is an "optimized
version" only bug.
wasn't able to repro this with a debug win32 build either.
jrgm: are you still seeing this?

pushing out to 0.9.5
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Yeah, I still see this. Given that closing the current window will stop this 
spin, I don't consider this a super, major bug, but it would be good to know 
what the heck is happening.
Blocks: 99142
Gagan/Darin - Should we remove the mozilla0.9.4 keyword, since this is now targeted
for 9.5? Is this a stop ship bug (i.e. should we nsbranch+ it)?
yup. removing 9.4 keywords since this is now 9.5
jrgm: you are still seeing this right?  if so, then this is definitely a nsbranch 
candidate.
Status: NEW → ASSIGNED
I need to specifically check this again, but yeah, I think this is still 
happening. I will check later today.
Interesting -- bug 101870 sounds very much like this.
hmm.. but this bug is/was definitely XP.
bumping this one _again_ (why does it only show up on the slower systems?)

-> mozilla 0.9.6
Target Milestone: mozilla0.9.5 → mozilla0.9.6
not seeing this as a high priority right now.
Target Milestone: mozilla0.9.6 → mozilla0.9.8
can we close this one?  it doesn't seem like anyone has reported anything
similar in ages... marking WORKSFORME.  please reopen otherwise.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME
haven't seen it recently because I've been a lazy bastard :-]
Will reopen if I can reproduce this again (by like running that test).