Closed Bug 116232 Opened 23 years ago Closed 23 years ago

[viewpoint] crash because plugin is destroyed during setwindow call

Categories

(Plugins Graveyard :: Viewpoint Media Player, defect)

x86
Windows NT
defect
Not set
critical

Tracking

(Not tracked)

VERIFIED INVALID

People

(Reporter: aberger, Assigned: peterlubczynski-bugs)

References

()

Details

(Keywords: crash, topembed)

Attachments

(7 files, 3 obsolete files)

From Bugzilla Helper: User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) BuildID: We need to get the URL of the html page that is starting up our plugin. During the first SetWindow call that we get, we make a GetURLNotify call on "javascript:doucment.location". 99% of the time this works great. On one piece of content only (I can't find anything special with the content- is this just a race condition?) this causes Mozilla to crash. Here is what I have found: Our GetURLNotify goes through the callstack to Our GetUrlNotify goes down through your call stack (attached) to nsHTMLDocument::FlushPendingNotifcations. It hits the lines: if ((isSafeToFlush) && (mParser)) { nsCOMPtr<nsIContentSink> sink; // XXX Ack! Parser doesn't addref sink before passing it back sink = mParser->GetContentSink(); if (sink) { result = sink->FlushPendingNotifications(); } } In 99% of our content, we don't go into this branch because mParser has already been used and set to NULL. On this piece of content, it is still a valid pointer so we go into that branch. This leads us up the stack ContentRemoved, RemoveFrame, nsObjectFrameDestroy, up to a call to NPP_Destroy. We don't like this call to NPP_Destroy in the middle of our SetWindow call. You like it even less because as the call stack unwinds, we get back to our setwindow call, but we have already been destroyed! Once we return you crash. So I have 2 questions: 1) Why is the call causing the page to get destroyed? Can I get around this by specifying some target? 2) Is there any way you can avoid this problem in general (destroying a plguin while it is being called?) Is there some refcounting problem? Reproducible: Always Steps to Reproduce: 1.Install our plugin (see http://cole.viewpoint.com/~aberger/readme.txt) I have gotten one report from av that he has beeen having problems hitting files on our server. If others have similar problems, please let me know and I will try to find another server to put our test files on). 2.View URL: http://cole.viewpoint.com/~ddavies/gecko/interactor1/index.html Actual Results: Browser crashes. Note: I am using the 0.9.4 branch.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: crash
Summary: crash because plugin is destroyed during setwindow call → [viewpoint] crash because plugin is destroyed during setwindow call
I do not see how the description and the stack trace relate to the steps to reproduce the crash. According to the steps you crash not on leaving the page but rather on entering it. I tried and crashed right away in layout, no plugin stuff shown in the stack trace (follows).
I'm nominating edt0.9.4 based on our embedding criteria.
Keywords: edt0.9.4
looks like Andrei's been looking at this...over to him
Assignee: peterl → av
Andrei- Perhaps I was not clear. We do crash while entering the page. enering the page causes the setwindow call, which causes (for some reason) the frame to get destroyed. When returning from the setwindow call, mozilla crashes. Perhaps I mischaracterized my call stack as the stack in which we crash- it is not. It is the stack just prior to the crash. Its purpose is to demonstrate that the SetWindow call leads to the destruction of the plugin. I crash after the stack unwinds and we return from setwindow. Is it possible that our call stacks are really the same- that yours is just after the Setwindow call (which cauesed the destroyframe) returned?
The reason the page is destroyed is that the plugin calls GetURL with null target, which means the new content is going to load in the same browser window thus leaving the current page. All this will be irrelevant after I land those painting fixes: there will be no SetWindow call in nsObjectFrame::DidReflow any more. So we should hold on this one for now. I'll try to check it in tonight. Thanks to serge who investigated this problem.
A few questions: When we send a javascript call to some script (whether just calling a function on the page or asking for document.location), is there a target that we can supply so will not leave the page? Why does the plugin not get destroyed any other time we ask for javascript to be evaluated? (we are moving to a different method of javascript evaluation, but we have been using GetURL to run the script. Are you concerned about the browser crashing if you leave the page when it is being set up? Could I cause a crash by requesting a URL during NPP_New or something? Or is this more narrow than that? If the paint fixes take care of this, that would be fantastic.
Ari, you are using 0.9.4 branch, is this true? I cannot reproduce the crash on the trunk. In fact, it crashes in a different place, as I posted earlier. But I can see what you see on the branch. Paint fixes did not help much, but it catches the exeption now and allows you to proceed, so I was able to see the page.
Status: NEW → ASSIGNED
Please disregard what I said in my comment #7. I was thinking of _current target apparently. Null-target seems to be the right target to use in this case.
Ari, could you please describe briefly what exactly the plugin does on this test case, so I could model the situation and investigate it. What I see is NPN_GetURLNotify called when processing NPP_SetWindow call. Could you please more detail?
av- I am not sure exactly what information will help you. I am not sure (other than timing) what makes this test case different from any of our other content. What you said is pretty much the whole story. The first time we receive a SetWindow call, we want to figure out the URL of our content. We do a Get_URLNotify on the URL "javascript:document.location". When we receive the stream, we read the bytes which contain our URL. In this case we crash before we get the stream. We are moving towards a different method of evaluating javascript, but it will still require an initial GetURLNotify to set it up.
Depends on: 117777
Ari says that the crash is gone. This might be an effect of check in to the bug 36997. I am updating my tree to confirm.
my 2 cents: on the trunk, this still crashed for me when I applied the patch from bug 36997.
Could be that (as av mentioned a few weeks back) the exception is just being caught now and so I don't crash anymore. I will check.
Shrirang, on the trunk you most likely see bug 117777.
yes, i just noticed that bug.
Ari, if no special prefs set, you will see the warning message when exception is caught. If you don't see the popup that means it does really not crash.
Ari, how do I tell that I have the latest version of the plugin?
Every couple of days we do an 'official build'. I'll keep putting up both a full installer of our plugin and also the dll and xpt for you guys: http://cole.viewpoint.com/~aberger/VMPFullInstall.exe http://cole.viewpoint.com/~aberger/npViewpoint.dll http://cole.viewpoint.com/~aberger/npViewpoint.xpt Also, if I fix a serious bug that is worth giving to you guys, I will just upload the new dll from my machine. If you would like an email notification whenever I put up a new build, just let me know. Right now, the current full installer is version 3.0.10.8. I put up a newer version - dll only- it is from my machine so the build number is wrong- I think 3.0.8.7 (size=167,936 bytes). Either of those builds should be recent enough to see the same version that I have.
I have v.3.0.9.6 and still see the crash (wrapped with this warning dialog box). Again, for some reason the link to the dll you just posted does not work for me very well -- the downloaded file is broken. If the version I have is not what I should look at, please let me know.
av- I have a newer version up- I wonder why you have problems downloading it. I zipped it up for you- that should work: http://cole.viewpoint.com/~aberger/npViewpoint.zip
I don't know. I am just trying to Save Link As... with the right mouse click, it saves the file but spoiled somehow. Zipped version is better. With this newer version I still see the 'illegal operation by a plugin' message and 'recursive painting' console output: ###!!! ASSERTION: recursive painting not permitted: '!(PR_TRUE == mPainting)', file y:\mozilla\view\src\nsViewManager.cpp, line 831 To be continued...
The crash in the plugin happens when we call NPP_SetWindow with null npinstance.pdata field. Indeed, we should not do that and I am trying to figure out why this is happening, but it would not hurt the plugin to null-check this parameter before derefencing it. In fact, I do it in every NPP_* call in our sample code in the SDK.
Please ignore my last comment (#25). Looks like this value becomes null _during_ this NPP_SetWindow call.
The latest news. For some reason when during the first NPP_SetWindow call the plugin sends NPN_GetURL with javascript string for evaluation, layout tries to recreate the frame which hosts the plugin as if the page is being reloaded. I stop on breakpoint in |nsObjectFrame::Destroy| right after |js_Interpret| call in |js_Execute| (see js/src/jsinterp.c). On most occasions the plugin does not survive, sometimes it does, and sometimes all this just does not happen and the plugin does not go through multiple cycles of reinstantiation, but I see the situation on my debug build, say, 90% of times, so it is pretty consistent. We should at least try to fix the crash. The underlying problem does not probably belong to plugins. I suspect race condition which is hard to debug and predict. Shrirand, can you please test this bug with recent release builds, note that the trunk will not exhitbit this bug, branch only. If it does not happen on release builds, we can perhaps reduce it severity and downgrade it to non-showstopper, the last word being ViewPoint people's.
I have news: On 0106 branch 0.9.4 build, with the latest plugin files (attached in this bug), I actually got a crash(different trace) only the first time when I visited the testcase here: http://cole.viewpoint.com/~ddavies/gecko/interactor1/index.html The next time I tried it, I got an 'illegal operation performed by the plugin'. I dismissed that dialog and the page is loading nicely thereafter and I can see the 3 cubes... Stack trace for crash : nsPluginInstanceOwner::Paint [d:\builds\seamonkey\mozilla\layout\html\base\src\nsObjectFrame.cpp, line 3303] nsObjectFrame::Paint [d:\builds\seamonkey\mozilla\layout\html\base\src\nsObjectFrame.cpp, line 1767] PresShell::Paint [d:\builds\seamonkey\mozilla\layout\html\base\src\nsPresShell.cpp, line 5521] nsView::Paint [d:\builds\seamonkey\mozilla\view\src\nsView.cpp, line 275] nsViewManager::RenderDisplayListElement [d:\builds\seamonkey\mozilla\view\src\nsViewManager.cpp, line 1441] nsViewManager::RenderViews [d:\builds\seamonkey\mozilla\view\src\nsViewManager.cpp, line 1366] nsViewManager::Refresh [d:\builds\seamonkey\mozilla\view\src\nsViewManager.cpp, line 901] nsViewManager::DispatchEvent [d:\builds\seamonkey\mozilla\view\src\nsViewManager.cpp, line 1973] HandleEvent [d:\builds\seamonkey\mozilla\view\src\nsView.cpp, line 68] nsWindow::DispatchEvent [d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 730] nsWindow::DispatchWindowEvent [d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 752] nsWindow::OnPaint [d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 4131] nsWindow::ProcessMessage [d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 3081] nsWindow::WindowProc [d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 995] USER32.dll + 0x19d0 (0x77e719d0) USER32.dll + 0x1982 (0x77e71982) ntdll.dll + 0x163a3 (0x77f763a3)
I think that's what I see too. The top call in the stack nsPluginInstanceOwner::paint happens the very next line to NPP_SetWindow call inside which all those bad things happen -- destroying the frame, recreating the plugin, no wonder it crashes since it thinks it is still the same plugin.
This is the call stack at the moment when during |NPP_SetWindow| plugins calls back with javascript evaluation request, and a break point in |nsObjectFrame::Destroy| reached.
Layout folks, can somebody shed light on why we destroy the frame here? What to do to avoid it?
Adding brendan and waterson. Please advise. Briefly: nsObjectFrame::Paint method calls a plugin function. During this call plugin evaluates javascript using NPN_GetURL with null target. Somehow this results in frame destruction and recreating so that when the control returns to the originating nsObjectFrame::Paint method the very |this| object is already nuked. That's what I see is happening. The stack trace in the second previous comment shows pretty much everything I found related so far.
The current version of the plugin uses the attached javascript, but in previous versions the same crash happened with just "javascript:document.location".
If you want me to put up an old version of the plugin (that does send javascript.document.location as the first geturl, let me know.
Whiteboard: [NEED ETA]
Whiteboard: [NEED ETA] → [NO ETA]
-->peterl
Assignee: av → peterlubczynski
Status: ASSIGNED → NEW
I can't reproduce this with our windowless plugin sample. I put a NPN_GetURLNotify("javascript:document.location",...) in the PAINT method and even used a document.write for the embed tag. This ONLY happens with viewpoint. Install their plugin and put breakpoints in GetURLNotify and nsObjectFrame::Destroy and notice the destory gets called after the geturl with the same stack as Andrei in attchment #63909. I also tried changing the javascript command in memory to window.location and that seemed not to crash. Ari, is there something strange going on in this testcase?
Attached file testcase with frameset
Strange.....when using a frameset and targeting that frame with NPN_GetURL seems to do the right thing...
The problem here is that a NULL target is acting like a "_self" and causing the results of the NPN_GetURL to replace the current frame hosting the plugin (bad, bad, bad!!). I'm not sure, but it may be because the plugin doesn't know what context it's inside in this testcase. I wonder if this bug is related to how the plugin is created through Javascript: http://cole.viewpoint.com/~ddavies/MTS3Interface.js Ari, what happens if you use just a simple, static tag? What's really strange is that only on the trunk, without the plugin installed, the testcase causes a crash in layout!
This is bug 117777 I filed couple of days ago.
Attached patch possible 1 line patch (obsolete) — Splinter Review
Looks like the problme is that it's not safe to do a flush during paiting: http://lxr.mozilla.org/mozilla/source/layout/html/base/src/nsPresShell.cpp#5043 Even though we are painting, we don't go through this code path! Here's a one-line patch that may do the trick, but I don't now what I've broken. Please try it out. Jonny, is there any harm in also flushing reflows while in |nsHTMLDocument::ResolveName| like I do in this patch?
Wow! Does it fix the exception?
That did it! Thank you Peter.
Peter, with this fix, is the plugin still being destroyed and recreated?
Comment on attachment 64859 [details] [diff] [review] possible 1 line patch Unfortunately this patch is not acceptable. See the comment above that line of code? Changing the PR_FALSE to PR_TRUE will make it even worse, and it will make that code a perf killer even after the document is done loading.
Attachment #64859 - Flags: needs-work+
Attached patch patch take 2 (obsolete) — Splinter Review
okay, what about this approach? I think it may always be evil to call |FlushPendingNotifications| while painting.
Attachment #64859 - Attachment is obsolete: true
No longer depends on: 117777
*** Bug 117777 has been marked as a duplicate of this bug. ***
That seems like the wrong thing to do, IMO. Flushing the sink has nothing to do with painting, flushing reflows potentially does however. There shouldn't be any reason for flushing the sink not being safe while painting, flushing the sink will cause frames to be constructed, but no reflow should happen. Seems to me that the problem is the fact that we end up flushing from within a Paint() call in the first place, if that's what's really causing the crash here. I'm not happy about limiting when we flush even more than we do now, not flushing when we need to flush will cause incorrect behavior in JS.
This is only reproducable when the plugin is inside a large table because we normally skip the flush because |mParser| is null. Why are large tables special and how can I turn "off" the parser manually? The code inside the plugin is causing the flush to happen. During our Paint, we call SetWindow on the plugin with it's updated location. They in turn, in SetWindow, need to call into Javascript. Only when there is a large table, that call into Javascript results in a |FlushPendingNotifications| because mParser is not null and we didn't do the painting saftey check. I think flushing the sink is bad during painting, at least for plugins, because it may indirectly cause the destruction of the plugin. I think that's what is going on here. One of the crashes I see is when we dereference an invalid content pointer in the object frame.
One small note on Peter's comment- I don't think that it has to do with us being IN a large table- it has to do with the table being on the page with us. The plugin is not inside the table. But it is true that when you remove the table the problem goes away.
The fact that there's a table on this page shouldn't matter, any large chunk of content should cause the same thing to happen. I bet we're dealing with a timming issue here where depending on the existance of the table (in this case) causes the plugin to paint before we're done parsing the HTML, that's why we still have a parser. I don't see how we could change anything in the flushing logic w/o breaking other things, I think we should look for a different approach to this problem. The fact that the plugin is deleted while a call is made through the plugin seems like a problem that the plugin code should protect against. Likewise, knowing that a call to SetWindow() on the plugin can cause the nsObjectFrame to go away, the nsObjectFrame code must protect against things that would go wrong if that happens, i.e. all references to memeber variables after the call to SetWindow() must be avoided, if references to some members are needed after the call to SetWindow() those references need to be aquired before the call and stored on the stack so that the code doesn't need to reference |this| after we know it's not safe to do so. This would obviously need to be well commented in the code. This would take some hacking on the plugin and nsObjectFrame side of this, but it seems very much doable to me, and this would also take care of this same problem next time we run into a case where the nsObjectFrame (and plugin) ends up being destroyed from within a call through the plugin. Thoughts?
This isn't limited to just SetWindow(), that's just the testcase here. It's possible that the plugin will cause this to happen at any time while calling NPN_GetURL(). I wonder if it would be an acceptable fix to the timing issue if we returned error to the plugin if it called NPN_GetURL() while the parser is still active?
I don't think that would be acceptable, that would cause seemingly random behavior. If SetWindow() is not the only entry point into the plugins where this could be a problem, then we need to protect against this around all calls into the plugin. Doesn't seem too unreasonable to me.
That's another, more complicated, bug about de-coupling the object frame from the plugin to allow for things like a reframe. If a plugin chooses to destroy itself, then it will crash, but that's not the case here. The problem here is that the document shoudn't even be changing at all because the plugin asked for the results back by specifiying a null target frame. But for some reason, an |nsHTMLUnknownElement| is causing |nsCSSFrameConstructor::WipeContainingBlock| to destroy the plugin. Using a frameset and manually retargeting the plugin to another frame doesn't crash and also doesn't change the other frame. We setup the stream here in NewPluginURLStream: http://lxr.mozilla.org/mozilla/source/modules/plugin/base/src/nsPluginHostImpl.cpp#5221 I don't understand what would be wrong with rejecting calls from the plugin into Javascript (via NPN_GetURL) while the document is being parsed?
Ignoring the javascript call would surprise our plugin- we'd have to maintain a queue of javascripts to keep trying to call until it succeeds? Sounds a little strange.
This patch also seems to stop the crash for me. Please check it out. This is a timing issue when the plugin is calling into Javascript too early. In the debugger, I can see many of the members in nsHTMLDocument have not been initialized yet. This patch attempts to prevent this condition by returning failure to the plugin if it tries a javascript URL too early. A possible better solution than returning error in this case would be to queue up the javascript url ourselves and execute it asynchronously like we do for non-NULL target cases. This is done in layout with an nsILinkHandler. The only problem is I'm not sure how to hook up the streams so that the plugin gets the result instead of the target frame.
Attachment #64875 - Attachment is obsolete: true
The plugin shouldn't be forced to queue these up when we can do it. This patch uses a PLEvent to schedule (and delay) the execution of an URL that begins with "javascript:" in that case that we are still loading the document. It'll execute it once the document is loaded. Ari, please try this out and let me know if it's acceptable.
Attachment #65390 - Attachment is obsolete: true
This patch seems to fix my problem. Once we get a build with this, I'll get our QA on it. But it works on my machine.
Comment on attachment 65400 [details] [diff] [review] better patch, using PLEvents r=av, looks like it is doing the right thing and addresses jst's concernes.
Attachment #65400 - Flags: review+
Comment on attachment 65400 [details] [diff] [review] better patch, using PLEvents As an alternative solution you could make the plugin host an nsIWebProgressObserver and queue up the requests to load javascript: URL's until you're notified that the document is done loading. This way you wouldn't need to create events and possibly (likely?) end up creating and firing more than one event for the same request. If there's no time to do that, or it doesn't make sense, then I'm ok with the current proposal with a few modifications. - In PluginURLStreamEvent, make mURL an nsString in stead of an nsString* to save one allocation per event. Make mInstance and mListener nsCOMPtr's. Also, if we for some reason don't get an event Q in the constructor of PluginURLStreamEvent we'll leak the event. Unlikely, could be fixed by moving the event Q code into an Init() method that would let the caller know about the failure. I'll leave it up to you to decide. - In the last part of the diff, could |peer| be made an nsCOMPtr to avoid having to worry about releasing before every early return? Also, else-after-return: + if (ev) + return NS_OK; + else + return NS_ERROR_OUT_OF_MEMORY; How about: + if (!ev) + return NS_ERROR_OUT_OF_MEMORY; + + return NS_OK; in stead? sr=jst with that fixed.
Attachment #65400 - Flags: superreview+
I have a patch with Jonny's changes in it, but it may not be needed after all. The true problem seems to be any document.write() AFTER the document.write() of the plugin in the case of a large document. By the plugin calling GetURL() while the document is still loading, it causes a flush of the extra content which is causing the frame and plugin to be destroyed and recreated. This is kinda bad, but especialy bad while painting. My patch prevents the crash by moving the creation/destruction so that at least it's not in paint. So, there are several ways to prevent this race-condition crash: 1) Ordering the document.write for the plugin so that nothing writes after it (content-side fix) 2) Don't FlushPendingNotifications while the plugin is painting (my patch) 3) Prevent the object frame from being destroyed/created when the flush happens (possible fix deep in layout?)
Whiteboard: [NO ETA] → [POSSIBLE CLIENT-SIDE FIX]
In response to Peter's 3 suggestions: -1) Ordering the document.write for the plugin so that nothing writes after it -(content-side fix) Unfortunately, a content fix like this is not acceptable- we do not always have control over the html on a page that embeds us. We can't really dictate "No document.writes on large web pages after our embed". 2) Don't FlushPendingNotifications while the plugin is painting (my patch) This is an acceptable workaround, but not really correct. The plugin still gets destroyed and recreated. Doesn't hurt too much, but is probably bad performance-wise. 3) Prevent the object frame from being destroyed/created when the flush happens (possible fix deep in layout?) This seems to be the correct fix to me.
Comments on the above fix proposals: #3 seems to risky for the embedding customers using 0.9.4; it should be considered for the trunk only (unless it's an easy, low risk fix) #2 if fixes the problem then it should be done for 0.9.4
Peter helped us track down that the cause of the problem is in our js interface. We use document.write to put together our embed tag. Instead of writing in something like this: <EMBED height="350" width="200" src="test.mtx" properties="string" ... > We had something like this: <EMBED height="350" width="200" src="test.mtx" <> properties="string" ... > which got parsed into: <EMBED height="350" width="200" src="test.mtx" < properties="string" ... > And so the block tag was inside the embed tag- bad... While I feel that bad content shouldn't crash the browser, that is another story for another day. When we fix our content, no crash, which is the bottom line for now. Peter- thanks for your help.
--->INVALID The general problem of tieing plugin lifetime to the frames will be resolved n bug 90268. This will prevent further reframe crashes. Feel free to open another bug if you think we are incorrectly parsing invalid HTML markup in document.write() calls.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Keywords: edt0.9.4edt0.9.4-
Resolution: --- → INVALID
Whiteboard: [POSSIBLE CLIENT-SIDE FIX]
Keywords: topembed
spam : marking invalid bugs 'verif'. reopen if u disagree, reporter!
Status: RESOLVED → VERIFIED
Component: Plug-ins → Viewpoint Media Player
Product: Core → Plugins
QA Contact: shrir → viewpoint-mediaplayer
Version: Trunk → unspecified
Product: Plugins → Plugins Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: