Closed
Bug 156711
Opened 22 years ago
Closed 22 years ago
Startup crash [@ xptiInterfaceEntry::ResolveLocked] on Trunk
Categories
(Core :: XPConnect, defect, P1)
Tracking
()
VERIFIED
FIXED
mozilla1.2alpha
People
(Reporter: greer, Assigned: leaf)
References
Details
(Keywords: crash, qawanted, topcrash)
Crash Data
Attachments
(2 files, 1 obsolete file)
|
1.42 KB,
text/plain
|
Details | |
|
847 bytes,
patch
|
Details | Diff | Splinter Review |
Since the 07-04 Trunk build we have seen the xptiInterfaceEntry::ResolveLocked
signature in the Talkback topcrash reports. I don't see any checkins for any of
the modules of the top 6-7 frames of the stack in that time.
Since it's a crash in the opening seconds of the launch, there aren't any
particular repro steps. Here's what we've got...
Min/Max Runtime: 0 - 22
Crash data range: 2002-07-04 to 2002-07-09
Build ID range: 2002070404 to 2002070813
Stack Trace:
xptiInterfaceEntry::ResolveLocked
[c:/builds/seamonkey/mozilla/xpcom/reflect/xptinfo/src/xptiInterfaceInfo.cpp
line 193]
xptiInterfaceEntry::Resolve
[c:/builds/seamonkey/mozilla/xpcom/reflect/xptinfo/src/xptiInterfaceInfo.cpp
line 146]
xptiInterfaceEntry::GetConstantCount
[c:/builds/seamonkey/mozilla/xpcom/reflect/xptinfo/src/xptiInterfaceInfo.cpp
line 307]
xptiInterfaceInfo::GetConstantCount
[c:/builds/seamonkey/mozilla/xpcom/reflect/xptinfo/src/xptiprivate.h line 704]
nsScriptNameSpaceManager::RegisterInterface
[c:/builds/seamonkey/mozilla/dom/src/build/nsScriptNameSpaceManager.cpp line 386]
nsScriptNameSpaceManager::FillHashWithDOMInterfaces
[c:/builds/seamonkey/mozilla/dom/src/build/nsScriptNameSpaceManager.cpp line 273]
nsScriptNameSpaceManager::Init
[c:/builds/seamonkey/mozilla/dom/src/build/nsScriptNameSpaceManager.cpp line 444]
nsJSContext::InitContext
[c:/builds/seamonkey/mozilla/dom/src/base/nsJSEnvironment.cpp line 1153]
NS_CreateScriptContext
[c:/builds/seamonkey/mozilla/dom/src/base/nsJSEnvironment.cpp line 1757]
nsDOMSOFactory::NewScriptContext
[c:/builds/seamonkey/mozilla/dom/src/build/nsDOMFactory.cpp line 156]
nsDocShell::EnsureScriptEnvironment
[c:/builds/seamonkey/mozilla/docshell/base/nsDocShell.cpp line 6232]
nsWebShell::GetInterface
[c:/builds/seamonkey/mozilla/docshell/base/nsWebShell.cpp line 296]
nsGetInterface::operator()
[c:/builds/seamonkey/mozilla/xpcom/glue/nsIInterfaceRequestorUtils.cpp line 55]
nsCOMPtr_base::assign_from_helper
[c:/builds/seamonkey/mozilla/xpcom/glue/nsCOMPtr.cpp line 81]
nsDocShell::InternalLoad
[c:/builds/seamonkey/mozilla/docshell/base/nsDocShell.cpp line 4500]
nsDocShell::LoadURI
[c:/builds/seamonkey/mozilla/docshell/base/nsDocShell.cpp line 663]
nsDocShell::LoadURI
[c:/builds/seamonkey/mozilla/docshell/base/nsDocShell.cpp line 2399]
nsWebShellWindow::Initialize
[c:/builds/seamonkey/mozilla/xpfe/appshell/src/nsWebShellWindow.cpp line 343]
nsAppShellService::JustCreateTopWindow
[c:/builds/seamonkey/mozilla/xpfe/appshell/src/nsAppShellService.cpp line 716]
nsAppShellService::CreateHiddenWindow
[c:/builds/seamonkey/mozilla/xpfe/appshell/src/nsAppShellService.cpp line 419]
main1
[c:/builds/seamonkey/mozilla/xpfe/bootstrap/nsAppRunner.cpp line 1429]
main
[c:/builds/seamonkey/mozilla/xpfe/bootstrap/nsAppRunner.cpp line 1808]
WinMain
[c:/builds/seamonkey/mozilla/xpfe/bootstrap/nsAppRunner.cpp line 1826]
WinMainCRTStartup()
kernel32.dll + 0x1eb69 (0x77e7eb69)
Source File :
c:/builds/seamonkey/mozilla/xpcom/reflect/xptinfo/src/xptiInterfaceInfo.cpp line
: 193
(8100501) Comments: just trying to start it
(8093730) URL: mp3.com
(8030051) Comments: loading Mozilla
Comment 2•22 years ago
|
||
xpconnect
Assignee: dougt → dbradley
Component: XPCOM → XPConnect
QA Contact: scc → pschwartau
Comment 3•22 years ago
|
||
Given the value in eax, ffff0066 or similar value in the couple of incidents I looked at, this looks like it thinks this has already been resolved, and using xptiTypelibGuts instead of xptiTypelib. xptiTypelib contains two 16 bit integers, one often set to ffff and the other the method index. I'll look into how this could have happened.
Status: NEW → ASSIGNED
Priority: -- → P1
Comment 4•22 years ago
|
||
Ugh, that was a little bit off, I got xptiInterfaceGuts with xptiTypelibGuts. But it does look like somehow the mDescriptor of xptiInterfaceGuts got overloaded with the xptiTypelib image. I'm suspicious if this isn't more of the problem we've seen in the past with overlaying a build over an existing install when interface changes occur. I'll see what I can find.
Comment 5•22 years ago
|
||
This looks like more problems with people copying over an existing build. The crash occurs because we think we've resolved this entry, but we haven't. So we attempt to use the mInterface pointer which instead holds the mTypelib structure. There are only two states that I can think of that would cause this problem. One is an interface in the manifest that doesn't actually exist. The second is if xptiInterfaceGuts::NewGuts would return null. The later seems unlikely, although Bug 149032 patch to add CPP_THROW_NEW does touch this, but I still think it's unlikely. If someone copied over an existing installation and the manifest contained an entry for an old interface, in this case a DOM interface, it would never find it. and get into this partially resolved state. CC'ing jst, in case he knows off the top of his head if any DOM interface/method was removed. Unlike our other problems with copying over existing installs, we could add code to detect this state and correct. It would add a little time to the resolving logic.
Comment 6•22 years ago
|
||
Talked with jst and there was a case where an nsIDOM interface was added and then removed/renamed around that time. So I'm 90% sure this is the case. The patch I'm attaching will allow a more graceful exit and reporting of this condition. It will add additional time to the function for checking the state flag, but this should be on the order of nano-seconds.
Comment 7•22 years ago
|
||
I shouldn't have been looking at the assert when I wrote the patch.
Attachment #91380 -
Attachment is obsolete: true
Comment 8•22 years ago
|
||
Considering the fact that this is a topcrasher and we have a patch...are we going to get this on the Trunk soon?
Comment 9•22 years ago
|
||
Seeking r/sr, mail message request sent as well. Soon as we get r/sr I'll seek an a=.
Comment 10•22 years ago
|
||
Comment on attachment 91385 [details] [diff] [review] Version with the correct test There is no reason to format the string in a non-debug build since it will not be used. What will the effect of returning false be? Does this not just paper over the real problem? Will this help people figure that they have a BROKEN installation? The fact that we have a top crasher that apparently can only happen if people have incorrectly installed a non-installer build is *itself* a problem. Why is talkback distributed as a zip rather than as an installer build. People are clearly doing the wrong thing with these builds. How much extra work and confusion is this causing? Can't we fix the real problem? - i.e. distrubute talkback as installer builds rather than as zip?
Attachment #91385 -
Flags: needs-work+
Comment 11•22 years ago
|
||
Given this is mostly with talkback builds, the patch doesn't buy us much other than identifying the offender in a debug build which I don't know if thats occurs a lot. Is there any way to report errors in a release build? I think given this is affected by external factors and errant code, I think it warrants more than an assert. But if no such facility exists, then it may be best to leave it be and focus on the biggest offender, the way talkback builds are distributed.
Comment 12•22 years ago
|
||
Reporting errors in release builds doesn't buy much. No coherent installation will have this problem. If this problem is detected the thing to do is to delete the manifest (so that the *next* run of the app will build a new non-broken manfest) and return false. See an example here: http://lxr.mozilla.org/seamonkey/source/xpcom/reflect/xptinfo/src/xptiInterfaceInfoManager.cpp#498 I think the existing assertion is correct and an additional non-debug check would not hurt. However, I say again that the real problem is people not doing a proper install. I have a quick and dirty answer to that... You should have the build generate a non-valid components/xpti.dat file and this should be included in the zip. The real installer both deletes stale files and installs new files. With the zip file we can only add or overwrite files. Stale files can still be a problem if people just unzip on top of an old installation. But, at least for xpti.dat, we can force a new autoreg and building of xpti.dat by having the unzip overwrite the old xpti.dat with a new one. This buildtime generated copy need not (and should not!) have valid contents. Just a file with that name containing a single '\n' will ensure that the old copy is overwritten and then during the first run the xpti system will detect that the manifest is not valid and will then rebuild a vaild one by scanning all the existing xpt files. This is an easy solution to implement and should not have bad side effects on debug or non-debug builds. I'd expect it to solve most of the false problems we see from talkback builds being unzipped by people over existing installations. We should *still* try to fix that (since stale dlls and xpt files can still add noise to our crash data). But this simple change should help with most of the cases we see caused by the xpti.dat not matching the info in the installed xpt files.
| Assignee | ||
Comment 13•22 years ago
|
||
would a blank xpti.dat file do the trick, jband? I can add that to the windows automation pretty easily.
Comment 14•22 years ago
|
||
leaf: I believe it would (a little testing would be good:) I'd vote for sticking a /n in the file to avoid any zero length file issues that *might* come up.
Comment 15•22 years ago
|
||
To you Leaf to address the xpti.dat file. For the heck of it, I tried this on the Mac and this approach worked there as well. I don't know if this affects the Mac talkbacks or not.
Assignee: dbradley → leaf
Status: ASSIGNED → NEW
Comment 16•22 years ago
|
||
*** Bug 161503 has been marked as a duplicate of this bug. ***
Comment 17•22 years ago
|
||
Leaf: This crash hasn't been around for a while now. Did you make the changes to the build automation (has the xpti.dat file been created)? If so, can we mark this fixed?
Comment 18•22 years ago
|
||
This is exactly the kind of problem I wanted to avoid by not creating non-installer talkback builds. I still believe we should only be putting talkback in the installer. For this bug we should have used the mail feature of talkback so that anyone that crashes with this stacktrace would get an email saying their installation is corrupt. Is it too late to put this in place rather than trying to paper over the real problem?
Updated•22 years ago
|
Target Milestone: --- → mozilla1.2alpha
Comment 19•22 years ago
|
||
I haven't seen this crash for a long time. Marking fixed.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Updated•13 years ago
|
Crash Signature: [@ xptiInterfaceEntry::ResolveLocked]
You need to log in
before you can comment on or make changes to this bug.
Description
•