Closed Bug 116232 Opened 20 years ago Closed 20 years ago
[viewpoint] crash because plugin is destroyed during setwindow call
11.11 KB, text/plain
3.51 KB, text/plain
11.85 KB, text/plain
6.97 KB, text/plain
200 bytes, text/plain
287 bytes, text/html
6.49 KB, patch
|Details | Diff | Splinter Review|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: crash because plugin is destroyed during setwindow call → [viewpoint] crash because plugin is destroyed during setwindow call
I do not see how the description and the stack trace relate to the steps to reproduce the crash. According to the steps you crash not on leaving the page but rather on entering it. I tried and crashed right away in layout, no plugin stuff shown in the stack trace (follows).
I'm nominating edt0.9.4 based on our embedding criteria.
looks like Andrei's been looking at this...over to him
Assignee: peterl → av
Andrei- Perhaps I was not clear. We do crash while entering the page. enering the page causes the setwindow call, which causes (for some reason) the frame to get destroyed. When returning from the setwindow call, mozilla crashes. Perhaps I mischaracterized my call stack as the stack in which we crash- it is not. It is the stack just prior to the crash. Its purpose is to demonstrate that the SetWindow call leads to the destruction of the plugin. I crash after the stack unwinds and we return from setwindow. Is it possible that our call stacks are really the same- that yours is just after the Setwindow call (which cauesed the destroyframe) returned?
The reason the page is destroyed is that the plugin calls GetURL with null target, which means the new content is going to load in the same browser window thus leaving the current page. All this will be irrelevant after I land those painting fixes: there will be no SetWindow call in nsObjectFrame::DidReflow any more. So we should hold on this one for now. I'll try to check it in tonight. Thanks to serge who investigated this problem.
Ari, you are using 0.9.4 branch, is this true? I cannot reproduce the crash on the trunk. In fact, it crashes in a different place, as I posted earlier. But I can see what you see on the branch. Paint fixes did not help much, but it catches the exeption now and allows you to proceed, so I was able to see the page.
Status: NEW → ASSIGNED
Please disregard what I said in my comment #7. I was thinking of _current target apparently. Null-target seems to be the right target to use in this case.
Ari, could you please describe briefly what exactly the plugin does on this test case, so I could model the situation and investigate it. What I see is NPN_GetURLNotify called when processing NPP_SetWindow call. Could you please more detail?
Ari says that the crash is gone. This might be an effect of check in to the bug 36997. I am updating my tree to confirm.
my 2 cents: on the trunk, this still crashed for me when I applied the patch from bug 36997.
Could be that (as av mentioned a few weeks back) the exception is just being caught now and so I don't crash anymore. I will check.
Shrirang, on the trunk you most likely see bug 117777.
yes, i just noticed that bug.
Ari, if no special prefs set, you will see the warning message when exception is caught. If you don't see the popup that means it does really not crash.
Ari, how do I tell that I have the latest version of the plugin?
Every couple of days we do an 'official build'. I'll keep putting up both a full installer of our plugin and also the dll and xpt for you guys: http://cole.viewpoint.com/~aberger/VMPFullInstall.exe http://cole.viewpoint.com/~aberger/npViewpoint.dll http://cole.viewpoint.com/~aberger/npViewpoint.xpt Also, if I fix a serious bug that is worth giving to you guys, I will just upload the new dll from my machine. If you would like an email notification whenever I put up a new build, just let me know. Right now, the current full installer is version 126.96.36.199. I put up a newer version - dll only- it is from my machine so the build number is wrong- I think 188.8.131.52 (size=167,936 bytes). Either of those builds should be recent enough to see the same version that I have.
I have v.184.108.40.206 and still see the crash (wrapped with this warning dialog box). Again, for some reason the link to the dll you just posted does not work for me very well -- the downloaded file is broken. If the version I have is not what I should look at, please let me know.
av- I have a newer version up- I wonder why you have problems downloading it. I zipped it up for you- that should work: http://cole.viewpoint.com/~aberger/npViewpoint.zip
I don't know. I am just trying to Save Link As... with the right mouse click, it saves the file but spoiled somehow. Zipped version is better. With this newer version I still see the 'illegal operation by a plugin' message and 'recursive painting' console output: ###!!! ASSERTION: recursive painting not permitted: '!(PR_TRUE == mPainting)', file y:\mozilla\view\src\nsViewManager.cpp, line 831 To be continued...
The crash in the plugin happens when we call NPP_SetWindow with null npinstance.pdata field. Indeed, we should not do that and I am trying to figure out why this is happening, but it would not hurt the plugin to null-check this parameter before derefencing it. In fact, I do it in every NPP_* call in our sample code in the SDK.
Please ignore my last comment (#25). Looks like this value becomes null _during_ this NPP_SetWindow call.
I have news: On 0106 branch 0.9.4 build, with the latest plugin files (attached in this bug), I actually got a crash(different trace) only the first time when I visited the testcase here: http://cole.viewpoint.com/~ddavies/gecko/interactor1/index.html The next time I tried it, I got an 'illegal operation performed by the plugin'. I dismissed that dialog and the page is loading nicely thereafter and I can see the 3 cubes... Stack trace for crash : nsPluginInstanceOwner::Paint [d:\builds\seamonkey\mozilla\layout\html\base\src\nsObjectFrame.cpp, line 3303] nsObjectFrame::Paint [d:\builds\seamonkey\mozilla\layout\html\base\src\nsObjectFrame.cpp, line 1767] PresShell::Paint [d:\builds\seamonkey\mozilla\layout\html\base\src\nsPresShell.cpp, line 5521] nsView::Paint [d:\builds\seamonkey\mozilla\view\src\nsView.cpp, line 275] nsViewManager::RenderDisplayListElement [d:\builds\seamonkey\mozilla\view\src\nsViewManager.cpp, line 1441] nsViewManager::RenderViews [d:\builds\seamonkey\mozilla\view\src\nsViewManager.cpp, line 1366] nsViewManager::Refresh [d:\builds\seamonkey\mozilla\view\src\nsViewManager.cpp, line 901] nsViewManager::DispatchEvent [d:\builds\seamonkey\mozilla\view\src\nsViewManager.cpp, line 1973] HandleEvent [d:\builds\seamonkey\mozilla\view\src\nsView.cpp, line 68] nsWindow::DispatchEvent [d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 730] nsWindow::DispatchWindowEvent [d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 752] nsWindow::OnPaint [d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 4131] nsWindow::ProcessMessage [d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 3081] nsWindow::WindowProc [d:\builds\seamonkey\mozilla\widget\src\windows\nsWindow.cpp, line 995] USER32.dll + 0x19d0 (0x77e719d0) USER32.dll + 0x1982 (0x77e71982) ntdll.dll + 0x163a3 (0x77f763a3)
I think that's what I see too. The top call in the stack nsPluginInstanceOwner::paint happens the very next line to NPP_SetWindow call inside which all those bad things happen -- destroying the frame, recreating the plugin, no wonder it crashes since it thinks it is still the same plugin.
Layout folks, can somebody shed light on why we destroy the frame here? What to do to avoid it?
Assignee: av → peterlubczynski
Status: ASSIGNED → NEW
Strange.....when using a frameset and targeting that frame with NPN_GetURL seems to do the right thing...
This is bug 117777 I filed couple of days ago.
Looks like the problme is that it's not safe to do a flush during paiting: http://lxr.mozilla.org/mozilla/source/layout/html/base/src/nsPresShell.cpp#5043 Even though we are painting, we don't go through this code path! Here's a one-line patch that may do the trick, but I don't now what I've broken. Please try it out. Jonny, is there any harm in also flushing reflows while in |nsHTMLDocument::ResolveName| like I do in this patch?
Wow! Does it fix the exception?
That did it! Thank you Peter.
Peter, with this fix, is the plugin still being destroyed and recreated?
Comment on attachment 64859 [details] [diff] [review] possible 1 line patch Unfortunately this patch is not acceptable. See the comment above that line of code? Changing the PR_FALSE to PR_TRUE will make it even worse, and it will make that code a perf killer even after the document is done loading.
Attachment #64859 - Flags: needs-work+
okay, what about this approach? I think it may always be evil to call |FlushPendingNotifications| while painting.
Attachment #64859 - Attachment is obsolete: true
*** Bug 117777 has been marked as a duplicate of this bug. ***
That seems like the wrong thing to do, IMO. Flushing the sink has nothing to do with painting, flushing reflows potentially does however. There shouldn't be any reason for flushing the sink not being safe while painting, flushing the sink will cause frames to be constructed, but no reflow should happen. Seems to me that the problem is the fact that we end up flushing from within a Paint() call in the first place, if that's what's really causing the crash here. I'm not happy about limiting when we flush even more than we do now, not flushing when we need to flush will cause incorrect behavior in JS.
One small note on Peter's comment- I don't think that it has to do with us being IN a large table- it has to do with the table being on the page with us. The plugin is not inside the table. But it is true that when you remove the table the problem goes away.
The fact that there's a table on this page shouldn't matter, any large chunk of content should cause the same thing to happen. I bet we're dealing with a timming issue here where depending on the existance of the table (in this case) causes the plugin to paint before we're done parsing the HTML, that's why we still have a parser. I don't see how we could change anything in the flushing logic w/o breaking other things, I think we should look for a different approach to this problem. The fact that the plugin is deleted while a call is made through the plugin seems like a problem that the plugin code should protect against. Likewise, knowing that a call to SetWindow() on the plugin can cause the nsObjectFrame to go away, the nsObjectFrame code must protect against things that would go wrong if that happens, i.e. all references to memeber variables after the call to SetWindow() must be avoided, if references to some members are needed after the call to SetWindow() those references need to be aquired before the call and stored on the stack so that the code doesn't need to reference |this| after we know it's not safe to do so. This would obviously need to be well commented in the code. This would take some hacking on the plugin and nsObjectFrame side of this, but it seems very much doable to me, and this would also take care of this same problem next time we run into a case where the nsObjectFrame (and plugin) ends up being destroyed from within a call through the plugin. Thoughts?
This isn't limited to just SetWindow(), that's just the testcase here. It's possible that the plugin will cause this to happen at any time while calling NPN_GetURL(). I wonder if it would be an acceptable fix to the timing issue if we returned error to the plugin if it called NPN_GetURL() while the parser is still active?
I don't think that would be acceptable, that would cause seemingly random behavior. If SetWindow() is not the only entry point into the plugins where this could be a problem, then we need to protect against this around all calls into the plugin. Doesn't seem too unreasonable to me.
Attachment #64875 - Attachment is obsolete: true
Attachment #65390 - Attachment is obsolete: true
This patch seems to fix my problem. Once we get a build with this, I'll get our QA on it. But it works on my machine.
Comment on attachment 65400 [details] [diff] [review] better patch, using PLEvents r=av, looks like it is doing the right thing and addresses jst's concernes.
Attachment #65400 - Flags: review+
Attachment #65400 - Flags: superreview+
I have a patch with Jonny's changes in it, but it may not be needed after all. The true problem seems to be any document.write() AFTER the document.write() of the plugin in the case of a large document. By the plugin calling GetURL() while the document is still loading, it causes a flush of the extra content which is causing the frame and plugin to be destroyed and recreated. This is kinda bad, but especialy bad while painting. My patch prevents the crash by moving the creation/destruction so that at least it's not in paint. So, there are several ways to prevent this race-condition crash: 1) Ordering the document.write for the plugin so that nothing writes after it (content-side fix) 2) Don't FlushPendingNotifications while the plugin is painting (my patch) 3) Prevent the object frame from being destroyed/created when the flush happens (possible fix deep in layout?)
Whiteboard: [NO ETA] → [POSSIBLE CLIENT-SIDE FIX]
In response to Peter's 3 suggestions: -1) Ordering the document.write for the plugin so that nothing writes after it -(content-side fix) Unfortunately, a content fix like this is not acceptable- we do not always have control over the html on a page that embeds us. We can't really dictate "No document.writes on large web pages after our embed". 2) Don't FlushPendingNotifications while the plugin is painting (my patch) This is an acceptable workaround, but not really correct. The plugin still gets destroyed and recreated. Doesn't hurt too much, but is probably bad performance-wise. 3) Prevent the object frame from being destroyed/created when the flush happens (possible fix deep in layout?) This seems to be the correct fix to me.
Comments on the above fix proposals: #3 seems to risky for the embedding customers using 0.9.4; it should be considered for the trunk only (unless it's an easy, low risk fix) #2 if fixes the problem then it should be done for 0.9.4
Peter helped us track down that the cause of the problem is in our js interface. We use document.write to put together our embed tag. Instead of writing in something like this: <EMBED height="350" width="200" src="test.mtx" properties="string" ... > We had something like this: <EMBED height="350" width="200" src="test.mtx" <> properties="string" ... > which got parsed into: <EMBED height="350" width="200" src="test.mtx" < properties="string" ... > And so the block tag was inside the embed tag- bad... While I feel that bad content shouldn't crash the browser, that is another story for another day. When we fix our content, no crash, which is the bottom line for now. Peter- thanks for your help.
--->INVALID The general problem of tieing plugin lifetime to the frames will be resolved n bug 90268. This will prevent further reframe crashes. Feel free to open another bug if you think we are incorrectly parsing invalid HTML markup in document.write() calls.
spam : marking invalid bugs 'verif'. reopen if u disagree, reporter!
Status: RESOLVED → VERIFIED
Component: Plug-ins → Viewpoint Media Player
Product: Core → Plugins
QA Contact: shrir → viewpoint-mediaplayer
Version: Trunk → unspecified
You need to log in before you can comment on or make changes to this bug.