Closed Bug 59246 Opened 24 years ago Closed 24 years ago

Segfault/dangling pointer after registering JS nsIWebProgressListener

Categories

(SeaMonkey :: UI Design, defect, P3)

x86
Linux
defect

Tracking

(Not tracked)

VERIFIED WORKSFORME
mozilla0.9

People

(Reporter: jag+mozbugs, Assigned: paulkchen)

References

Details

Attachments

(4 files)

I created a JS object in mini-nav.js which implements nsIWebProgressListener.

If I register this object with the DocShell's DocLoader, I segfault at browser
launch (-chrome chrome://embed/content/mini-nav.xul).

Sometimes it gets through that and actually displays a mini navigator, but if I
press reload or type an URL + <enter>, it segfaults after an assertion:
ASSERTION: bad param: 'p', file xpcwrappednative.cpp, line 442

Side-note: instead of a segfault I sometimes get "illegal instruction".

I've added |NS_WARNING|s in nsDocLoader and |dump|s in my js implementation
object and it happens somewhere between the |NS_WARNING| just before the
|listener->OnStatusChange(...)| call to my js object in |FireOnStatusChange|,
and the dump at the beginning of |onStatusChange|.

I'll attach mini-nav.xul and mini-nav.js.
jag, when you get this narrowed down a bit, let's figure out who should get this
bug.
bill, could you take a look at what's going on here?

assigning to law per don.
Assignee: don → law
OK; can I use a branch build or is the problem limited to the trunk?  Probably
doesn't matter, I'd guess.
I'm suspecting it doesn't matter. I've only tried trunk so far though.
I'm not getting this crash on today's trunk build with the attached files
I suck.

Sorry guys. I commented out the statement causing the crashing in the file I
uploaded. In mini-nav.js, look for: 
    //webProgress.addProgressListener(webProgressListener);

Uncomment that line. Damn, I suck.
How do I set it up so that chrome: url works?  I tried replacing those files in
mozilla\embedding\browser\chrome\content but all I get is lots of "Chrome
Registration:" messages and no windows open (the splash screen is left standing).

I didn't crash, though :-).
Well what I do is copy this over the mini-nav.* in
mozilla/chrome/embed/content/embed/, rm mozilla/chrome/*.jar and edit
installed-chrome.txt to s/\.*jar.//g (remove "jar:" and ".jar!"), then call
mozilla -chrome chrome://embed/content/mini-nav.xul

With the line commented out you should just get a strongly reduced version of
our well known browser. With the line uncommented, behaviour is undefined
(though you'll most likely crash).
This doesn't seem to work for me.  I changed the files in the source tree.  I 
also tried to edit installed-chrome.txt.  All I get is a splash screen which 
never goes away.  Here's my console log:

Type Manifest File: E:\mozilla\dist\WIN32_D.OBJ\bin\components\xpti.dat
nsNativeComponentLoader: autoregistering begins.
nsNativeComponentLoader: autoregistering succeeded
Initialized app shell component {33e569b0-40f8-11d4-9a41-000064657374}, 
rv=0x00000000
Initialized app shell component {18c2f989-b09f-11d2-bcde-00805f0e1353}, 
rv=0x00000000
WEBSHELL+ = 1
CSSLoaderImpl::LoadAgentSheet: Load of URL 
'file:///C:/WINNT/PROFILES/LAW/APPLICATION%20DATA/Mozilla/Users50/mini-nav/kb2nj
7gy.slt/chrome/userChrome.css' failed.  Error code: 18
CSSLoaderImpl::LoadAgentSheet: Load of URL 
'file:///C:/WINNT/PROFILES/LAW/APPLICATION%20DATA/Mozilla/Users50/mini-nav/kb2nj
7gy.slt/chrome/userContent.css' failed.  Error code: 18
plugins at: E:\mozilla\dist\WIN32_D.OBJ\bin\plugins
plugins at: C:\Program Files\Netscape\Communicator\Program\Plugins
WEBSHELL+ = 2
WARNING: chrome: failed to get base url for chrome://embed/content/mini-nav.xul 
-- using wacky default, file e:\mozilla\rdf\chrome\src\nsChromeRegistry.cpp, 
line 511
Enabling Quirk StyleSheet
Note: verifyreflow is disabled
Note: styleverifytree is disabled
Note: frameverifytree is disabled
WARNING: chrome: failed to get base url for 
chrome://necko/locale/necko.properties -- using wacky default, file 
e:\mozilla\rdf\chrome\src\nsChromeRegistry.cpp, line 511
###!!! ASSERTION: nsRepeatService not thread-safe: 'owningThread == 
NS_CurrentThread()', file e:\mozilla\xpcom\base\nsDebug.cpp, line 490

I must be doing something wrong.  Any ideas?
And that is with the line uncommented?

What if you leave the line commented?

Does the original mini-nav.xul load for you?

If you have the time, find me in #mozilla
nav triage team:

Should look into this, marking nsbeta1, mozilla0.9, reassigning to pchen
Assignee: law → pchen
Keywords: nsbeta1
Target Milestone: --- → mozilla0.9
I've reduced the testcase and moved the JS into the xul file, so all you need to
do now is put mini-nav.xul in chrome/comm/content/navigator/ (or add it to
chrome/comm.jar!/content/navigator/) and call mozilla with: ./mozilla -chrome
chrome://navigator/content/mini-nav.xul
jag, I get the same non-crash now with this as I got when we emailed before:

"It does not crash for me on NT4. I get a couple of
NS_ENSURE_TRUE failures in 
nsXULWindow::LoadTitleFromXUL and
nsXULWindow::LoadPositionAndSizeFromXUL because
GetWindowDOMElement fetches nsnull. When I 'ignore' those asserts
I get a full screen window that does not redraw, but does
shutdown the app when I hit the close box. Commenting out the
line you point out makes no difference."

If you can get a stack trace then please attach it.

You said it asserts in "xpcwrappednative.cpp, line 442". I see that this is in  
nsXPCWrappedNative::GetArbitraryScriptable(nsIXPCScriptable** p) which is the 
implementation of "readonly attribute nsIXPCScriptable ArbitraryScriptable;" 
from
http://lxr.mozilla.org/seamonkey/source/js/src/xpconnect/idl/nsIXPConnect.idl

Now, the interesting thing is that lxr does not show *any* code in the tree that 
actually calls this version of GetArbitraryScriptable - the version that takes a 
paramater as opposed to the versions that take no params.

http://lxr.mozilla.org/seamonkey/search?string=GetArbitraryScriptable

This implies that some virtual call somewhere may be either calling the wrong 
method on the right object, or maybe the wrong object altogether.

There could be something wrong with the code OR there may just be something 
wrong with your tree (has anyone else reproduced this yet?). A signature 
mismatch between caller and callee can cause this. An out-of-date xpt file can 
do funny things too.

Would it be possible for you to try this on a fresh clobber build?

Again, a stack trace would help.
jband: I just saw what you're seeing... I get that when I have mozilla already
running and then try to run mini-nav.xul. Could you try closing all mozilla
instances and then do this?
I don't know. I just tried it again (on NT) with a fresh debug build from 
today's tree. I verified that no other mozilla processes were running and I see 
the same non-crash I described above. I'm still waiting to hear that anyone else 
has reproduced this or see a stack.
alecf saw it, I'll ask him to attach a stack...
Okay, so I talked to jband and here's what's going on:

When you register the JS object, a native wrapper object is created, which gets
added to the list, and then disappears when you exit the function since there
isn't an owning reference keeping it alive.

To fix this AddProgressListener should use owning references, which also means
fixing other clients so they cleanly RemoveProgresslistener where appropriate.

Ugh.
what I don't understand is why a global variable isn't enough to keep a
reference. jband?
alecf: The JSObject will be rooted as a property of global object. But, the 
xpconnect wrapper that is built to be used by the native code lives only as long 
as its reference count is > 0. In this case, the wrapper is transparently 
created by xpconnect with a refcount of 1 to be used as an 'in' param to 
AddProgressListener. Since AddProgressListener does not addref the copy of the 
pointer it stores, xpconnect's Release on the wrapper causes the wrapper's 
destruction.

jag: I'm not so sure that forcing the use of a strong reference is the best way 
to go. If it were up to me I'd convert the progress listener stuff to use the 
same pattern as nsIObserverList. That system has all the support needed to 
transparently work with nsIObservers that support nsISupportsWeakReference (or 
not). You could look into cloning that pattern for use with 
nsIWebProgressListener. Then you'd need to make the other listeners either 
implement nsISupportsWeakReference or change their use to make them deal with 
the doc loader holding a strong reference.

Right this moment xpconnect wrapped JSObjects do not support 
nsISupportsWeakReference. But I'm working on it. See bug 66950.
I like jband's suggestion of using weak-when-available and strong refs, then
fixing up the callers. That probably wouldn't be particularily hard either.

My one concern is that when JS objects DO support weak refs, we'll be in a
different, broken situation: we'll call addProgressListener, the progress
listener won't hold a strong ref, neither will the JS wrapper class, and then
the object will just die, leaving us with an expired weak reference. we won't
crash, but the listener won't stay attached either.

However, I'm imagining that fixing bug 66950 is going to cause all sorts of
similar problems with other JS-implemented observer mechanisms, including
nsIObserverService and friends. (i.e. it's not specific to this bug, so jag we
should just go ahead with jband's suggestion anyway)
No, I belief the idea is that the JS wrapper won't go away as long as there's a
weak reference to the wrapper and the JS object isn't about to be GC'ed. 

Anyway, see bug 65925 which has already done most of the work of making
addProgressListener use weak references.

Marking this depend on that.
Depends on: 65925
I think alecf was refering to cases where there is nothing rooting the referent 
JSObject *except* the strong reference via xpconnect. When those references 
become weak then the JSObject is subject to gc. See my (2001-01-30 13:04) 
comment in bug 66950 about giving the JSObjects some control over whether or not 
a given wrapper will support nsISupportsWeakReference.
AddProgressListener now expects the listener to support weak references.

I guess we can wait for bug 66950 to be fixed, add strong ref support to
AddProgressListener like what's in nsObserversList::AddObserver, or do both.
Depends on: 66950
we should do both. we aren't dependant on 66950, and in fact we really need to
make sure addprogresslistener does do strong references in order to fix this.
QA Contact: sairuh → ckritzer
I thought this was basically fixed now that we support proper strong refs in
addprogresslistener?
We do?

All I see so far is proper support for weak refs.
argh, I meant proper weak, not proper strong :)
Yes, it is fixed in that we don't crash anymore :-) The wrapper goes away, and
thanks to the weak ref this is detected in a safe manner. Unfortunately, we
don't want the wrapper to go away in the first place ;-)

I could open a new bug on not being able to register a JS web progress listener
without creating a kung fu deathgrip nsISupportsArray from JS ;-)
66950 is now fixed; does this problem go away w/ that fix (and after you've made
your js object support nsIWeakReference)? observers should be held weakly (by
definition) IMO. If we're going to special case this to handle observers that
don't support weak refs, we should just pull the weak ref stuff altogether.
maintaining both models just adds confusion.

IMO, observers should be in complete control of their lifetime, we shouldn't be
gripping them to death.
Well, the original problem is fixed, we don't crash any longer.

But, the native wrapper object for the JS listener object still goes away
because there's nothing holding a strong reference to it, so my problem isn't
solved yet.

Also, since you need to hold a strong reference to the native wrapper object,
you can't just do something like this from JS:

var gListeners = []; gListeners.push(myListener);

since that will only hold a JS reference (? I need to improve my vocabulary!) to
the JS object, not the native wrapper.

One alternative solution is to do something like this from JS:
1) create nsISupportsArray
2) add listener to supports array (strong ref to wrapper object)
3) register listener

and then at an appropriate time the listener could be unregistered and should be
removed from the supports array.

This of course is assuming our code reuses the wrapper object created for the
reference in the nsISupportsArray.

> If we're going to special case this to handle observers that
> don't support weak refs, we should just pull the weak ref stuff altogether.
> maintaining both models just adds confusion.

Understood. Note however that this is a generic problem when JS objects need to
register themselves with a non-JS object which won't hold a strong ref. Unless
something other object is holding a strong reference to the native wrapper
object, the wrapper object will go away, probably much to the JS object's
surprise.

I would love to see a more generic and less ugly solution than
gKungFuDeathGripISupportsArray.

Thoughts? Suggestions?
Trying to merge garbage collecting w/ passive/weak references is going to lead
to ugliness :-/. IMO, if an implementation wants to use weak references, it
should, and it's the users/callers challenge to work w/ that. If the user/caller
*needs* a strong ref somewhere, then it's their responsibility to provide one
(nsISupportsArray for example) and clean it up when the appropriate time comes.
I know that's kludgy, but there's got to be sacrifice somewhere when you try and
merge these two models.

The alternative, as far as I can tell, is to revert our listener list impls to
be strong refs. If we go this route, I'd argue we should make listener lists,
observerlists and nsIObserverService (which is probably using observerlist under
the covers anyway) strong refs only also. the current nsObserverList.cpp impl is
kludgy. I can support weak ref or not, we should pick one and go w/ it,
otherwise it's confusing.

Personally, I think that listeners/observers should be able to own their
lifespan. listening or observing is a passive operation. I'd also argue that
banking on observer/listener registration to maintain your lifespan is wrong.
For example, you don't necessarily know when that list will go away, and it may
go away at an in-opportune time for you; point being the semantics surrounding
the observer/listener list's lifespan aren't even defined (though we can make
some decent assumptions most of the time).
Agreed. I'm all for weak references, but then we should document this problem
and a clean solution for it somewhere.

Also, this point may actually be moot. I'm now running with a simplified browser
which registers itself without holding a strong ref to the native wrapper
object, and it just works, the wrapper object isn't going away and I'm getting
progress notifications. I guess I'll go read bug 66950 once more to see if I
missed something, or that I'm just being extremely lucky and this is an accident
waiting to happen :-)
Yes jag, maybe you were just confused? Or maybe there is a bug in my support in 
xpconnect? OR maybe your listener service is not really requesting a weak 
reference beofore letting the storng reference free? See the comment...
http://lxr.mozilla.org/seamonkey/source/js/src/xpconnect/src/xpcwrappedjs.cpp#97
...that says how the xpconnect support is supposed to be working. If you find 
something wrong with that functional;ity then let me know.

Also note that the reference you are holding in JS to your JS object will be 
wiped away when the global object (window) properties are cleared when the 
document is unloaded.
jband: I think I was just confused.

So let me verify this:
1) JS objects can now add nsISupportsWeakReference to their QI implementation so
a weak reference can be held to them, and if the JS object is GC'ed, the wrapper
object will go away cleanly.
2) If a weak reference is held to the wrapper object, the wrapper object won't
go away until the wrapped JS object is about to be garbage collected.
Correct.
Yay! Thanks!

Closing bug as WFM.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → WORKSFORME
Uh...

Why was this QA_Assigned to me? Not JS Security...
;-)

jrgm?  Sounds like your expertise...correct me if I'm misinformed, sending to
wrong person, etc.

And I promise, no eggnog has been harmed in the process <grin>
QA Contact: ckritzer → jrgm
Lots of verifications.

If I had to test stuff, it was on Win2K, build id 2001052404.
Status: RESOLVED → VERIFIED
Product: Core → Mozilla Application Suite
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: