Closed Bug 78428 Opened 23 years ago Closed 23 years ago

Browse during install results in crash/freeze

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla0.9.1

People

(Reporter: jimmykenlee, Assigned: jband_mozilla)

Details

(Keywords: crash, regression, Whiteboard: patch; r=?; sr=brendan)

Attachments

(5 files)

Build: 2001-05-01-06-trunk(WIN), 2001-05-01-04-trunk(MAC), 
2001-05-01-08-trunk(LINUX)

1. From http://jimbob/trigger3.html, enter manyfiles.xpi in Trigger URL field
2. Click Trigger button
3. Click OK button from confirm dialog
4. During installation, use browser to open new site

RESULT:
Freeze/crash.

EXPECTED RESULT:
No crash.  Browser may be used.  Install completes gracefully.
Nominating for Beta.  Browser use is expected especially during longer 
downloads and installations.  I wonder if this is related to Bug 76396.
Keywords: crash, nsbeta1
Keywords: nsbeta1nsbeta1+
Target Milestone: --- → mozilla0.9.1
manyfiles.xpi? I get a blank dialog (e.g., no packages to install). 
Let me clarify.

http://jimbob/jars/manyfiles.xpi
Keywords: regression
Whiteboard: [smartupdate]
Why is this bug marked as "smartupdate"? Does this only occur when you get the 
browser from SmartUpdate, or does it happen regardless of where you get it from 
(eg. in the XPInstall engine)...
None of the bugzilla bugs are only about the SmartUpdate site otherwise they 
would be filed in bugscape. These are bugs in the product which are of interest 
to the smartupdate team (usually because they will block its use).
Target Milestone: mozilla0.9.1 → mozilla0.9.2
New info... You don't actually have to open a new site.  Simply got the the url
bar and start typing in a new URL.  The program will eventually freeze.  No
crash, just freeze like it's in an infinite loop someplace.
What's the main thread doing?

/be
Looks like a deadlock.  Several threads are in PR_Wait conditions.
Ok, looks like more JS_SuspendRequest/JS_ResumeRequest foo.  I just bracketed
the AddSubcomponent stuff with sus/res and I'm able to run Jimmy's manyfiles.xpi
test while browsing around.  I'll post the diff in a sec.
dbragg: that's not the main thread, it's the memory flusher thread (as its lower
stack frames suggest).

/be
Sorry 'bout that.  I saw nsThread::Main
 and didn't look above it.

Do you still need to see the main thread stack?  Bracketing the xpinstall call
in JS_SuspendRequest and JS_ResumeRequest fixed this.  I also noted that our
execute function wasn't bracketed.  I thought I'd fixed that a while ago.
I believe you, bring on the patch with more suspend/resume!  But I like to see
all threads' stacks in a deadlock, otherwise we're ass-u-ming.

/be
You'll notice in the patch that I also surrounded the Execute native call with
the JS_Suspend/JS_Resume stuff.  I had thought that went in a while ago as it
was related to another bug.  Anyway, Execute could be a blocking call so it
should be bracketed by suspend/resume.
Yeah, deja vu -- didn't you fix this a milestone or two ago, at least for
Execute?  Oh well, r/sr=brendan@mozilla.org.

/be
Whiteboard: [smartupdate] → [smartupdate] requested drivers a= 4:41PM 5/23 needs another review?
a= asa@mozilla.org on behalf of drivers.
Whiteboard: [smartupdate] requested drivers a= 4:41PM 5/23 needs another review? → [smartupdate] fix ready to check in when tree opens.
Fix checked in.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Build: 2001-05-30-06-trunk(WIN), 2001-05-30-04-trunk(MAC), 
2001-05-30-08-trunk(LINUX)

Appears to look fine for Windows NT and Windows 98.

However, we still have a problem with Macintosh and Linux.  We still crash and 
burn.

Linux
Incident ID: 31104452

Stack Trace
gc_find_flags() 
js_MarkGCThing() 
js_MarkGCThing() 
js_MarkGCThing() 
js_MarkGCThing() 
js_GC() 
js_ForceGC() 
JS_GC() 
nsJSContext::GC() 
GlobalWindowImpl::SetNewDocument() 
DocumentViewerImpl::Init() 
nsDocShell::SetupNewViewer() 
nsWebShell::SetupNewViewer() 
nsDocShell::Embed() 
nsWebShell::Embed() 
nsDocShell::CreateContentViewer() 
nsDSURIContentListener::DoContent() 
nsDocumentOpenInfo::DispatchContent() 
nsDocumentOpenInfo::OnStartRequest() 
nsStreamListenerTee::OnStartRequest() 
nsHttpChannel::ProcessNormal() 
nsHttpChannel::ProcessResponse() 
nsHttpChannel::OnStartRequest() 
nsOnStartRequestEvent::HandleEvent() 
nsARequestObserverEvent::HandlePLEvent() 
PL_HandleEvent() 
PL_ProcessPendingEvents() 
nsEventQueueImpl::ProcessPendingEvents() 
event_processor_callback() 
our_gdk_io_invoke() 
libglib-1.2.so.0 + 0x101a0 (0x403601a0) 
libglib-1.2.so.0 + 0x11987 (0x40361987) 
libglib-1.2.so.0 + 0x12001 (0x40362001) 
libglib-1.2.so.0 + 0x121cc (0x403621cc) 
libgtk-1.2.so.0 + 0x93e57 (0x40278e57) 
nsAppShell::Run() 
nsAppShellService::Run() 
main1() 
main() 
libc.so.6 + 0x1bb65 (0x4048db65)

Macintosh
Incident ID 31104588

Stack Trace
JavaScript + 0x22d20 (0x1eb781b0) 
JavaScript + 0x22dac (0x1eb7823c) 
JavaScript + 0x22dac (0x1eb7823c) 
JavaScript + 0x22dac (0x1eb7823c) 
JavaScript + 0x22dac (0x1eb7823c) 
JavaScript + 0x2321c (0x1eb786ac) 
JavaScript + 0x22efc (0x1eb7838c) 
JavaScript + 0x30f0 (0x1eb58580) 
DOM_DLL + 0x4c70 (0x1e59c2f0) 
DOM_DLL + 0x7b0c (0x1e59f18c) 
CONTENT_DLL + 0x1bed8c (0x1dfb742c) 
WEB_DLL + 0x21078 (0x1e609fb8)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Brendan, I really don't know what to do here.  The bug fix worked on windows and
it's XP.  But according to Jimmy it still doesn't work on Mac and Linux.  Any ideas?
More info...

The Linux mozilla build crashes similar to the commercial build.  However, it 
seems that Macintosh mozilla build does not crash unlike the commercial build.
Still more info!  It's not browsing that causes the crash.  You can reproduce
the crash by just switching between the Download Dialog on the main window.
Steps to reproduce:
1. Run http://jimbob/trigger3.html
2. Enter http://jibob/jars/manyfiles.xpi in the Trigger URL field and click the
Trigger button.
3. Confirm the install
4. When the Download dialog gets to the "Finishing Install Please Wait" phase
click on the browser window to bring it forward.
5. CRASH with:

/u/dbragg/mojo/mozilla/dist/bin/run-mozilla.sh: line 72:  1383 Segmentation
fault      $prog ${1+"$@"}

The finalize install is already "bracketed" by the
JS_SuspendRequest/JS_ResumeRequest code so that ain't it.  I suspect this is a
windowing problem on Linux.
Hmm. I don't know what to say. I hate bugs that don't bite windows too. The 
stack implies that we've got a bogus object being marked. The Mac output hints 
that xpconnect might be involved. I'll give this a try under Purify on Windows 
and maybe a clue will appear.
No clues from Purify.

Don: I don't see "Finishing Install Please Wait" on Windows. I just see the 
dialog come up with a "resolving host jibob" in the status bar and then go away 
fairly quickly. Clicking before or after this had no effect. Is there different 
UI behaviour on the non-Windows platform in this regard? or Am I doing something 
wrong?
Thanks for looking at this John.  The behavior should be the same across
platforms.  I was describing the Linux problem.  Try it twice in a row.  I
forgot that part.  (sorry, I've run the test so many times) When you run it more
than once, the "finalize" step takes a lot longer because all the files are
already on the system and the xpinstall code has to deal with it.

Did you see the manyfile directory off of bin and was it full of files?  Did you
even crash?
Don: It looks like the reason it was so fast was that your url has a typo:

 s/jibob/jimbob/

I just copy and pasted the text and didn't notice. Interesting that this fails 
silently when it can't resolve the host!

I'll try again.
I'm chagrined.  Sorry 'bout that John.  (did I spell chagrined right?) ;-)

Muchos Gracias again.
Just a refinement of my previous info...
Running the test and leaving the Download dialog in front or brining the browser
window to the front BEFORE the finalze step all works fine.
The following will crash:
1. Bring browser to the front during the finalize step
2. If browser is already in front, clicking in the url bar during the finalize
step crashes.  Clicking anywhere else seems to be ok.
In looking at mozilla/xpinstall/src/nsJSInstall.cpp I saw that FinalizeInstall
is part of the JSClass InstallClass...

JSClass InstallClass = {
  "Install",
  JSCLASS_HAS_PRIVATE,
  ...,
  FinalizeInstall
};

I think dveditz wrote this code originally so I don't know what the reasoning was.

Could this have something to do with the odd crashing?  I think that we used to
have the FileOp object in there too and I had to remove it and call
JS_DefineProperty to "add" it to the install object.  (Please forgive my
nomenclature here.  I'm pretty far from a javascript engine expert.)

Might we have to do something similar with FinalizeInstall?
(I'm not here; you didn't read this; but danm should help if he can -- this now
strongly reeks of unbalanced JS_PushArguments/JS_PopArguments or some similar
such OpenDialog foo.

/be)
I've got this stopped in the NT debugger with a Purify build at the point where 
it does an ABR in js_MarkGCThing. Some of the local vars are hard to puzzle out 
(optimized into registers). But it seems to be reading off the end of the slots 
array. The type of the object is Function. It has 5 slots (measured by 
map->nslots and obj->[slots-1] and the contents of what the slots array points 
to in memory). It *is* its own scope's object so js_Mark is using map->freeslots 
to set the 'end' pointer that will terminate the slot marking loop. But 
map->freeslots is 6! I have to figure that the reserved slot logic is screwed up 
(we reserve two slots for functions). I have to leave right now, but I'll get 
back to this and see if I can work it out.
AFAICT there is no logic error in manipulating the freeslot value. The thing 
that changed with the addition of the reserved slots code (I think) is the idea 
that freeslots could *ever* be larger than nslots. This is an initial state 
for Function object now. Some of the places that should care about this state 
were not updated. The thing that surprised me is that it is so hard to find 
objects that are in this state. In our tree only Function object maps are going 
to be in this state at create time. And we generally add props to them right 
away. I have yet to puzzle out exactly how this testcase produces objects still 
in this state at gc time. The case I can see is a Function for js_obj_toSource 
and its global (in the parant chain) is the "Install" object.

Leaving aside my curiousity about why this is not more common, I have a proposed 
fix I'll attach. It uses JS_MIN to get the lower of freeslot or nslots in the 
places I could see where it matters. I also got rid of a use of JS_MAX in a 
place where its result was always predetermined.
Ok, now we just need someone more qualified than me to do the review on this. ;-)
This is fallout from the reserved slot extension, from bug 73843.  Sorry about
that, I can't recall what I was thinking.  Jband's patch looks good to me,
sr=brendan@mozilla.org.

/be
First of all, thanks John for tracking this down!  Should this bug be assigned
to you now?
reassigning to me. I requested review from bread and rogerl. I suppose we want 
this on the mozilla0.9.1 branch?
Assignee: syd → jband
Status: REOPENED → NEW
Component: Installer: XPInstall Engine → Javascript Engine
Indeed we do!  Please check it in to both the branch and trunk!
Keywords: patch
Whiteboard: [smartupdate] fix ready to check in when tree opens. → [have fix]
Target Milestone: mozilla0.9.2 → mozilla0.9.1
Whiteboard: [have fix] → patch; r=?; sr=brendan
r=beard
a= asa@mozilla.org for checkin to the 0.9.1 branch and the trunk.
(on behalf of drivers)
fix checked in to 0.9.1 branch and trunk.
Status: NEW → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
Build: 2001-06-05-11-0.9.1(WIN), 2001-06-05-03-0.9.1(MAC), 
2001-06-05-04-0.9.1(LINUX)

Very nice.  This looks great on the branch.

Adding vtrunk keyword to be verified on trunk.
Keywords: vtrunk
verified in builds 
win 20010062204 trunk, 
mac 2001062503 (branch and trunk)
linux 2001062508 trunk, 2001062706 branch
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: