Closed Bug 156711 Opened 22 years ago Closed 22 years ago

Startup crash [@ xptiInterfaceEntry::ResolveLocked] on Trunk

Categories

(Core :: XPConnect, defect, P1)

x86
Windows 98
defect

Tracking

()

VERIFIED FIXED
mozilla1.2alpha

People

(Reporter: greer, Assigned: leaf)

References

Details

(Keywords: crash, qawanted, topcrash)

Crash Data

Attachments

(2 files, 1 obsolete file)

Since the 07-04 Trunk build we have seen the xptiInterfaceEntry::ResolveLocked
signature in the Talkback topcrash reports. I don't see any checkins for any of
the modules of the top 6-7 frames of the stack in that time.

Since it's a crash in the opening seconds of the launch, there aren't any
particular repro steps. Here's what we've got...

Min/Max Runtime: 0 - 22
Crash data range: 2002-07-04 to 2002-07-09
Build ID range: 2002070404 to 2002070813
 
Stack Trace: 

	 xptiInterfaceEntry::ResolveLocked
[c:/builds/seamonkey/mozilla/xpcom/reflect/xptinfo/src/xptiInterfaceInfo.cpp 
line 193]
	 xptiInterfaceEntry::Resolve
[c:/builds/seamonkey/mozilla/xpcom/reflect/xptinfo/src/xptiInterfaceInfo.cpp 
line 146]
	 xptiInterfaceEntry::GetConstantCount
[c:/builds/seamonkey/mozilla/xpcom/reflect/xptinfo/src/xptiInterfaceInfo.cpp 
line 307]
	 xptiInterfaceInfo::GetConstantCount
[c:/builds/seamonkey/mozilla/xpcom/reflect/xptinfo/src/xptiprivate.h  line 704]
	 nsScriptNameSpaceManager::RegisterInterface
[c:/builds/seamonkey/mozilla/dom/src/build/nsScriptNameSpaceManager.cpp  line 386]
	 nsScriptNameSpaceManager::FillHashWithDOMInterfaces
[c:/builds/seamonkey/mozilla/dom/src/build/nsScriptNameSpaceManager.cpp  line 273]
	 nsScriptNameSpaceManager::Init
[c:/builds/seamonkey/mozilla/dom/src/build/nsScriptNameSpaceManager.cpp  line 444]
	 nsJSContext::InitContext
[c:/builds/seamonkey/mozilla/dom/src/base/nsJSEnvironment.cpp  line 1153]
	 NS_CreateScriptContext
[c:/builds/seamonkey/mozilla/dom/src/base/nsJSEnvironment.cpp  line 1757]
	 nsDOMSOFactory::NewScriptContext
[c:/builds/seamonkey/mozilla/dom/src/build/nsDOMFactory.cpp  line 156]
	 nsDocShell::EnsureScriptEnvironment
[c:/builds/seamonkey/mozilla/docshell/base/nsDocShell.cpp  line 6232]
	 nsWebShell::GetInterface
[c:/builds/seamonkey/mozilla/docshell/base/nsWebShell.cpp  line 296]
	 nsGetInterface::operator()
[c:/builds/seamonkey/mozilla/xpcom/glue/nsIInterfaceRequestorUtils.cpp  line 55]
	 nsCOMPtr_base::assign_from_helper
[c:/builds/seamonkey/mozilla/xpcom/glue/nsCOMPtr.cpp  line 81]
	 nsDocShell::InternalLoad
[c:/builds/seamonkey/mozilla/docshell/base/nsDocShell.cpp  line 4500]
	 nsDocShell::LoadURI
[c:/builds/seamonkey/mozilla/docshell/base/nsDocShell.cpp  line 663]
	 nsDocShell::LoadURI
[c:/builds/seamonkey/mozilla/docshell/base/nsDocShell.cpp  line 2399]
	 nsWebShellWindow::Initialize
[c:/builds/seamonkey/mozilla/xpfe/appshell/src/nsWebShellWindow.cpp  line 343]
	 nsAppShellService::JustCreateTopWindow
[c:/builds/seamonkey/mozilla/xpfe/appshell/src/nsAppShellService.cpp  line 716]
	 nsAppShellService::CreateHiddenWindow
[c:/builds/seamonkey/mozilla/xpfe/appshell/src/nsAppShellService.cpp  line 419]
	 main1
[c:/builds/seamonkey/mozilla/xpfe/bootstrap/nsAppRunner.cpp  line 1429]
	 main
[c:/builds/seamonkey/mozilla/xpfe/bootstrap/nsAppRunner.cpp  line 1808]
	 WinMain
[c:/builds/seamonkey/mozilla/xpfe/bootstrap/nsAppRunner.cpp  line 1826]
	 WinMainCRTStartup()
	 kernel32.dll + 0x1eb69 (0x77e7eb69)
 
 	Source File :
c:/builds/seamonkey/mozilla/xpcom/reflect/xptinfo/src/xptiInterfaceInfo.cpp line
: 193
     (8100501)	Comments: just trying to start it
     (8093730)	URL: mp3.com
     (8030051)	Comments: loading Mozilla
Keywords: crash, qawanted, topcrash
Attached file OS breakdown
FWIW, a deeper breakdown of the OS info.
xpconnect
Assignee: dougt → dbradley
Component: XPCOM → XPConnect
QA Contact: scc → pschwartau
Given the value in eax, ffff0066 or similar value in the couple of incidents I
looked at, this looks like it thinks this has already been resolved, and using
xptiTypelibGuts instead of xptiTypelib. xptiTypelib contains two 16 bit
integers, one often set to ffff and the other the method index. I'll look into
how this could have happened.
Status: NEW → ASSIGNED
Priority: -- → P1
Ugh, that was a little bit off, I got xptiInterfaceGuts with xptiTypelibGuts.
But it does look like somehow the mDescriptor of xptiInterfaceGuts got
overloaded with the xptiTypelib image. I'm suspicious if this isn't more of the
problem we've seen in the past with overlaying a build over an existing install
when interface changes occur. I'll see what I can find.
This looks like more problems with people copying over an existing build. The
crash occurs because we think we've resolved this entry, but we haven't. So we
attempt to use the mInterface pointer which instead holds the mTypelib structure.

There are only two states that I can think of that would cause this problem. One
is an interface in the manifest that doesn't actually exist. The second is if
xptiInterfaceGuts::NewGuts would return null. The later seems unlikely, although
Bug 149032 patch to add CPP_THROW_NEW does touch this, but I still think it's
unlikely.

If someone copied over an existing installation and the manifest contained an
entry for an old interface, in this case a DOM interface, it would never find
it. and get into this partially resolved state. CC'ing jst, in case he knows off
the top of his head if any DOM interface/method was removed.

Unlike our other problems with copying over existing installs, we could add code
to detect this state and correct. It would add a little time to the resolving logic.
Talked with jst and there was a case where an nsIDOM interface was added and
then removed/renamed around that time. So I'm 90% sure this is the case. The
patch I'm attaching will allow a more graceful exit and reporting of this
condition.

It will add additional time to the function for checking the state flag, but
this should be on the order of nano-seconds.
I shouldn't have been looking at the assert when I wrote the patch.
Attachment #91380 - Attachment is obsolete: true
Considering the fact that this is a topcrasher and we have a patch...are we
going to get this on the Trunk soon?  
Seeking r/sr, mail message request sent as well.

Soon as we get r/sr I'll seek an a=.
Comment on attachment 91385 [details] [diff] [review]
Version with the correct test

There is no reason to format the string in a non-debug build since it will not
be used.

What will the effect of returning false be? 

Does this not just paper over the real problem? Will this help people figure
that they have a BROKEN installation?

The fact that we have a top crasher that apparently can only happen if people
have incorrectly installed a non-installer build is *itself* a problem. Why is
talkback distributed as a zip rather than as an installer build. People are
clearly doing the wrong thing with these builds. How much extra work and
confusion is this causing? Can't we fix the real problem?
 - i.e. distrubute talkback as installer builds rather than as zip?
Attachment #91385 - Flags: needs-work+
Given this is mostly with talkback builds, the patch doesn't buy us much other
than identifying the offender in a debug build which I don't know if thats
occurs a lot.

Is there any way to report errors in a release build? I think given this is
affected by external factors and errant code, I think it warrants more than an
assert. But if no such facility exists, then it may be best to leave it be and
focus on the biggest offender, the way talkback builds are distributed.
Reporting errors in release builds doesn't buy much.

No coherent installation will have this problem. If this problem is detected the
thing to do is to delete the manifest (so that the *next* run of the app will
build a new non-broken manfest) and return false. See an example here:
http://lxr.mozilla.org/seamonkey/source/xpcom/reflect/xptinfo/src/xptiInterfaceInfoManager.cpp#498

I think the existing assertion is correct and an additional non-debug check
would not hurt.

However, I say again that the real problem is people not doing a proper install.
I have a quick and dirty answer to that...

You should have the build generate a non-valid components/xpti.dat file and this
should be included in the zip. The real installer both deletes stale files and
installs new files. With the zip file we can only add or overwrite files. Stale
files can still be a problem if people just unzip on top of an old installation.
But, at least for xpti.dat, we can force a new autoreg and building of xpti.dat
by having the unzip overwrite the old xpti.dat with a new one. This buildtime
generated copy need not (and should not!) have valid contents. Just a file with
that name containing a single '\n' will ensure that the old copy is overwritten
and then during the first run the xpti system will detect that the manifest is
not valid and will then rebuild a vaild one by scanning all the existing xpt
files. This is an easy solution to implement and should not have bad side
effects on debug or non-debug builds. I'd expect it to solve most of the false
problems we see from talkback builds being unzipped by people over existing
installations. We should *still* try to fix that (since stale dlls and xpt files
can still add noise to our crash data). But this simple change should help with
most of the cases we see caused by the xpti.dat not matching the info in the
installed xpt files.
would a blank xpti.dat file do the trick, jband? I can add that to the windows
automation pretty easily.
leaf: I believe it would (a little testing would be good:)  I'd vote for
sticking a  /n in the file to avoid any zero length file issues that *might*
come up.
To you Leaf to address the xpti.dat file.

For the heck of it, I tried this on the Mac and this approach worked there as
well.  I don't know if this affects the Mac talkbacks or not.
Assignee: dbradley → leaf
Status: ASSIGNED → NEW
*** Bug 161503 has been marked as a duplicate of this bug. ***
Leaf:  This crash hasn't been around for a while now.  Did you make the changes
to the build automation (has the xpti.dat file been created)?  If so, can we
mark this fixed?
This is exactly the kind of problem I wanted to avoid by not creating
non-installer talkback builds.  I still believe we should only be putting
talkback in the installer.

For this bug we should have used the mail feature of talkback so that anyone
that crashes with this stacktrace would get an email saying their installation
is corrupt.  Is it too late to put this in place rather than trying to paper
over the real problem?
Target Milestone: --- → mozilla1.2alpha
I haven't seen this crash for a long time.  Marking fixed.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Rubber-stamp vrfy
Status: RESOLVED → VERIFIED
Crash Signature: [@ xptiInterfaceEntry::ResolveLocked]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: