Closed Bug 24312 Opened 25 years ago Closed 24 years ago

some component has bad registration behavior (optimized mac RegXPCom crashes)

Categories

(Core :: XPCOM, defect, P3)

PowerPC
Mac System 8.5

Tracking

()

RESOLVED DUPLICATE of bug 64978
Future

People

(Reporter: jj.enser, Assigned: sfraser_bugs)

References

Details

(Keywords: crash, helpwanted)

Attachments

(2 files)

"PowerPC unmapped memory exception at NQDGetPort+00048" Bumping up the memory partition from 10 to 12 MB puts the app back on its feet.
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Target Milestone: M13
fixed. minimum/preferred memory size bumped up to 16MB in mozilla/xpcom/tools/registry/macbuild/RegXPCOM.mcp
Reopening since RegXPCOM crashed again in today's verification. I will bump the executable memory up to 18 MB as a temporary fix, but we need a better long-term solution.
Status: RESOLVED → REOPENED
dp, can you help?
Assignee: jj → dp
Status: REOPENED → NEW
Resolution: FIXED → ---
This is purely a mac binary executable issue. I have no clue. We should do what we do to navigator in this regard, i.e. if we increase nav, we should increase regxpcom too. Scc/Simon Fraser, can you help?
Assignee: dp → scc
Target Milestone: M13 → M14
adding sfraser to the cc list. This is becoming critical, as we now still crash with a 20MB partition.
Assigning to simon fraser. Simon, scc is not in town until tomorrow. If you can help jj that would be super.
Assignee: scc → sfraser
it might be useful to turn off all optimization, just to make sure we aren't getting killed by the compiler trying to be too clever.
Well, this is heinous. I found a serious bug in nsLocalFileMac.cpp, where a full path handle was assumed to be null terminated, when it was not. This could have caused all kinds of random behaviour. Here's a diff for that:
Index: nsLocalFileMac.cpp
===================================================================
RCS file: /cvsroot/mozilla/xpcom/io/nsLocalFileMac.cpp,v
retrieving revision 1.3
diff -r1.3 nsLocalFileMac.cpp
940d939
< OSErr err;
942,944c941,942
< err = FSpGetFullPath(&mResolvedSpec, &fullPathLen, & fullPathHandle);
< *_retval = (char*) nsAllocator::Clone(*fullPathHandle, fullPathLen+1);
< DisposeHandle(fullPathHandle);
---
> (void)::FSpGetFullPath(&mResolvedSpec, &fullPathLen, &fullPathHandle);
> if (!fullPathHandle) return NS_ERROR_OUT_OF_MEMORY;
946c944,945
< ((*_retval)+fullPathLen)[0] = 0;
---
> char* fullPath = (char *)nsAllocator::Alloc(fullPathLen + 1);
> if (!fullPath) return NS_ERROR_OUT_OF_MEMORY;
947a947,952
> ::HLock(fullPathHandle);
> nsCRT::memcpy(fullPath, *fullPathHandle, fullPathLen);
> fullPath[fullPathLen] = '\0';
>
> *_retval = fullPath;
> ::DisposeHandle(fullPathHandle);
But even with this fix, RegXPCOM is still crashing in non-debug builds.
Status: NEW → ASSIGNED
cc-ing sdagley. Steve, are there other cases where strings are assumed to be terminated? Steve, can you work with simon to fix these problems?
dougt: the deal here is that FSpGetFullPath() returns a Mac handle, which is not null-terminated. I used lxr to look for other calls to this function, and this is the only one that looks bad.
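Editorial sketch: the pattern in the patch above (allocate length + 1, copy exactly the counted bytes, terminate by hand) can be illustrated in portable C++. This is only an illustration under stated assumptions. CloneCountedBytes is a hypothetical helper that does not exist in the Mozilla tree, and it uses malloc/memcpy from the standard library in place of the Mac handle and nsAllocator calls in the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not from the Mozilla tree): copies `len` bytes from a
// buffer that carries no terminator into a freshly allocated C string.
// Returns nullptr on allocation failure; the caller releases it with std::free().
char* CloneCountedBytes(const char* src, std::size_t len)
{
    assert(src != nullptr || len == 0);  // a null source is only valid for an empty copy

    // One extra byte for the terminator that the source buffer does not have.
    char* out = static_cast<char*>(std::malloc(len + 1));
    if (!out) {
        return nullptr;
    }

    // Copy only the counted bytes; never read src[len].
    if (len > 0) {
        std::memcpy(out, src, len);
    }
    out[len] = '\0';
    return out;
}
```

The point of the design is that the source is never read past its counted length, which is what the pre-patch Clone(*fullPathHandle, fullPathLen+1) call did.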
That patch above has been checked in to nsLocalFileMac.cpp. The quit crash remains, however.
smfr, do you have a stack crawl?
It crashes in a call to free() coming out of JS garbage collection.
ok, we got this now. sdagley was writing one off the end of a buffer in nsLocalFileMac.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 25 years ago
Resolution: --- → FIXED
Adding crash keyword
Keywords: crash
Jan, is this reopened??
I don't think so. I haven't seen RegXPCom crashing on the release build mac for a while. Jan just updated the keyword
actually, this did just happen again this morning. we'll have to wait and see if it continues to be a problem before revisiting this.
Reopening. This happened again this morning. We have no mac verification builds.
Severity: critical → blocker
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: M14 → M16
Putting on smoketest keyword.
Keywords: smoketest
do you have a stack crawl?
what kind of stack crawl? output from stdlog?
And this shouldn't stop delivery of builds. If you run the browser, it should generate the component.reg that you can bundle.
that would be fine. just attached the stdlog to this bug.
the mac goes into macsbug, you es out of it, various things time out, then you try to do anything at all and it crashes. repeat as necessary. unless you catch the regxpcom crash almost immediately, it is impossible to get a build. I'll get a stdlog the next time it crashes.
This really isn't my bug. Reassigning to xpcom owner.
Assignee: sfraser → dp
Status: REOPENED → NEW
Putting on [dogfood+] radar
Whiteboard: [dogfood+]
Scott can you handle this.
Assignee: dp → scc
I'm on it.
Status: NEW → ASSIGNED
not going to hold the tree for this, removing smoketest keyword.
Keywords: smoketest
RegXPCOM didn't crash today during verif. builds... Scott, is it magic or did you do something about it? (don't see any checkin though)
yes it did. there's a stdlog from this morning on the desktop. i just caught it in time so we didn't have to respin and all the builds finished.
ok, here's the current status:
- regXPCOM keeps crashing every morning when run against the mozilla build (2nd attempt against ns build succeeds though)
- no stdlog possible as Macsbug sez "File system busy"
- unless we "es" from Macsbug quickly enough (within 5 minutes after the crash), the entire build process times out and no builds are delivered to QA
Given all this, I removed the call to RegXPCOM from the verification build script until it gets _really_ fixed. This will allow the morning verification build to be delivered in a timely manner, though it will significantly increase the startup time of the app.
Whiteboard: [dogfood+] → [dogfood+] regXPCOM removed from Mac build
Scott, can you provide status on this? It has been 3 days since this got assigned to you and "blockers" get priority over everything else.
This shouldn't really be marked `blocker' since we have a work-around in place. I think I agree with the `dogfood+' label. I'm kind of at a loss, so far. Everyone who has looked at this bug has come up empty. I will continue to work on it, but all easy avenues have been explored.
Scott, the workaround is costly (startup time) and the fact that all "easy avenues" have been explored is the very reason why we now need to dig further. I just want to remind all that this has been bugging us for over 4 months now... If dp agrees that we can live without running RegXPCOM in the Mac release build, then we might as well mark this bug invalid and forget about it. Otherwise, let's nail this baby! We discussed trying to run a debug build of RegXPCOM against one of the morning verification builds (optimized) where it usually crashes. Would that help make some progress?
I would like to keep this a blocker. The goal here was that this step helps improve installation and startup time immensely. Autoreg, a costly step, won't happen on the customer's machine at all. Even if it did, it won't take as much time, as no dlls would have changed. Scott, I think we should try fixing it. You said this was optimized only. Do you have a stack trace? I don't see any explanation from you of why this is failing.
mass re-assigning to my new bugzilla account
Assignee: scc → scc
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
Why can't we just generate the components registry by just running the app once?
(the app == mozilla or Netscape)
This is not a dogfood bug. According to dveditz@netscape.com, the pregenerated reg file is being removed from the build. According to him, we should (or will soon be in the state of) not be running RegXPCom anymore. Yes, whatever module is causing the bustage should be found and fixed, but this is definitely _not_ dogfood. Taking RegXPCom out of the build process is the real fix to this blocking situation, according to dveditz. This bug needs to transform into a low priority bug to find and fix whatever component it is that's screwing us up at the moment. Please remove the [dogfood+] status. The remaining low priority task can probably be assigned to rayw as the owner of components ... once he knows which component is failing (dp suggests using a binary search of the component space :-) he can reassign the bug to the owner of the bad piece.
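Editorial sketch: the "binary search of the component space" suggested above could look roughly like the following. It rests on two assumptions that the thread does not establish: exactly one component triggers the failure, and the failure is deterministic for any subset containing it. FindBadComponent and the `fails` callback are hypothetical names. In practice the callback would have to run registration against only the given subset and report whether it crashed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using Components = std::vector<std::string>;
// Hypothetical probe: registers only `subset` and returns true if that run fails.
using FailsFn = std::function<bool(const Components& subset)>;

// Narrows the half-open range [lo, hi) that must contain the single bad
// component, halving it on each probe. Assumes `all` is non-empty and
// contains exactly one component that makes `fails` return true.
std::string FindBadComponent(const Components& all, const FailsFn& fails)
{
    assert(!all.empty());

    std::size_t lo = 0;
    std::size_t hi = all.size();
    while (hi - lo > 1) {
        const std::size_t mid = lo + (hi - lo) / 2;
        const Components firstHalf(all.begin() + static_cast<std::ptrdiff_t>(lo),
                                   all.begin() + static_cast<std::ptrdiff_t>(mid));
        if (fails(firstHalf)) {
            hi = mid;  // culprit is in [lo, mid)
        } else {
            lo = mid;  // culprit is in [mid, hi)
        }
    }
    return all[lo];
}
```

With N components this needs about log2(N) registration runs instead of N. If the crash depends on load order or machine state rather than on one component, the assumptions above do not hold and the result is not meaningful.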
So... based on conversation and email exchange, I'm removing [dogfood+] status from this bug, lowering the priority, changing the summary and target milestone, re-assigning it to rayw, and cc'ing sgehani@netscape.com (samir). This is the correct action, right? dp? dveditz? samir?
Assignee: scc → rayw
Severity: blocker → normal
Status: ASSIGNED → NEW
Summary: mac RegXPCom crashes during verification build → some component has bad registration behavior (optimized mac RegXPCom crashes)
Whiteboard: [dogfood+] regXPCOM removed from Mac build
Target Milestone: M16 → M17
Not so sure this is low priority. As dp points out, the pre-generated component registry is shipped because the initial Mac browser startup time was rather sluggish (due to autoreg). A couple of reasons not to ship the pre-generated component registry:
1> So we pick up third party components (but, this happens anyways since we check timestamps during autoreg every time we startup and hence pick up new components -- needs clarification as to why it is a problem).
2> A larger registry means a larger footprint since we hold multiple copies of the registry in memory as I recall based on a discussion with Dan (dveditz). The argument here is that if we only install navigator (say without mail etc.) then we only have the navigator components registered (smaller registry size == lower memory footprint). However, I don't know how relevant this really is: if it is the case that the majority of the components are installed when we install navigator only, the registry size savings may not be significant if we autoreg at install time.
As of now, dp's argument is making a lot of sense and I believe we should be shipping a pre-generated component registry. But, there's probably information I'm missing that Dan/dp may fill in.
Renominating for dogfood. If it is important that we ship the Mac with a component.reg then the build process must not regularly crash while creating it. I'm concerned about blaming a component for the crash -- if that were so why wouldn't the product itself crash? Or perhaps it means the Component Registry is being created in the *build* directory with all the test crap rather than a deliver/staging area, which will bloat the registry horribly.
Keywords: dogfood
Putting on [dogfood-] radar. Was the beta1 performance contingent upon this? We must be equal to beta1 performance.
Whiteboard: [dogfood-]
Unless we're planning on doing away with autoreg, we'll have to ship a components.reg file. The whole point of regxpcom is to improve startup time. startup time on the mac is atrocious. we can take this out of the build process, it already is out of the build process. as soon as we do the next set of performance tests though, people are going to scream about the mac startup time and we'll be right back here again.
No-one answered my earlier question. Why can't we just run Mozilla/Netscape to generate the component registry?
We've had long discussions over email. The outcome was that we should keep RegXPCOM'ing at build time and ship a minimal Components Registry on the Mac. JJ is aware of this. Rayw was cc'ed on the discussion. If we don't ship a Components Registry we will regress in performance compared to beta1. Please consult dp if you have further questions. Nominating for nsbeta2 since it is arguably not dogfood. JJ, Is it possible for us to execute on Simon's suggestion in your build system? If so, please remove the nsbeta2 nomination and swap in running RegXPCOM with running the app. Thanks.
Keywords: nsbeta2
I'm a bit skeptical about running Netscape for the following reasons:
1) it doesn't quit automatically, like RegXPCOM did. How do I know when to continue with the packaging? Then, how do I quit Netscape (does it support the 'quit' appleevent yet?)
2) Running netscape has proven to be risky in the past. I don't want to hose the packaging process because of an obsolete profile, prefs, or registry. On the other hand, if Netscape is DOA, I'll be the first to know, and there's no real need to package it!
3) If I can 'safely' run RegXPCOM against a _reduced_ set of components like we discussed, why not go this way? We can try this solution "manually" to see if it works. All I need is the minimum set of components that must be pre-registered.
Jonathan: <startup time on the mac is atrocious> is it really that bad? do we have precise data on this and did we measure the kind of improvement obtained with an existing Registry versus without? Sorry to re-open the debate!
you'd have to ask leger as I haven't seen startup times in a while and wasn't able to find them on the QA page. But as I recall the difference in startup between having a registry and not was significant. Easy way to test it would be to start it up with a registry, then remove it and restart it, and compare the times.
If I understand the comments on this correctly: 1. Dogfood should be removed as a keyword. 2. The problem to be solved is a poorly-behaved component that crashes on Mac, which I do not currently have the ability to do a binary test on and find the misbehaved component, lacking the hardware. 3. Is there still a debate on whether a registry should be precreated? Would a registry creator that makes a smaller footprint than Netscape be of any use here?
Status: NEW → ASSIGNED
No. There is no debate. 1> We would like RegXPCOM to work because jj has found running the mozilla binary to be unreliable for his automated release build process. 2> This should not be dogfood but certainly nsbeta2 (based on plenty of mail with plenty of qualified folks).
Putting on nsbeta2- radar. We will be focusing on performance in beta 3. Reassigning to sfraser. Ray has no mac.
Assignee: rayw → sfraser
Status: ASSIGNED → NEW
Keywords: nsbeta3, perf
Whiteboard: [dogfood-] → [dogfood-], [nsbeta2-]
there are workarounds for this issue and it is specific to making builds, this will not be encountered by the end user, marking as later
Status: NEW → RESOLVED
Closed: 25 years ago → 25 years ago
Resolution: --- → LATER
ok. just remember this bug when we start doing startup performance testing again and the mac takes a minute or longer to start, and that's on a fast system...
If you can get me a reproducible case for this bug, where I can run RegXPCOM by hand and consistently see a crash, then I'd be glad to look at it. As it stands, the bug is too hard to track down. And I still don't understand why the workarounds I suggested above (i.e. just run Mozilla/Netscape to generate the components reg) were rejected.
Reopening bug. Just because it's a build issue doesn't mean it's not a problem.
Status: RESOLVED → REOPENED
Resolution: LATER → ---
Marking milestone "Future" in the spirit of the "LATER" resolution. Simon, the reason we can't just run Mozilla/Netscape 6 to generate this is that it cannot be automated as part of the build process: we could start Mozilla but it wouldn't shut down and the build would not continue. regxpcom shouldn't be doing anything differently than AutoReg in Mozilla, so the crash is quite odd.
Target Milestone: M17 → Future
If we can't run then quit mozilla using AppleEvents then that's a bug that we should certainly fix.
Well *that's* an interesting thought. Couldn't really do that on win or linux so I didn't think about it, but I guess it could be a workaround for the Mac build process. Doesn't obviate the need to make sure regxpcom works, though, as we should be shipping that so developers can register drop-in components. AutoReg does not run in optimized builds to detect new components. AutoReg will run if people use XPInstall to add the components, but not everyone is going to want to do that.
a little more input:
- bug reproducibility: it crashes on first run after build completion (_every_ build). subsequent runs of RegXPCOM are successful. I noticed that even after a crash I get a Component Registry file, about 20K shorter than if the process completes. Maybe analyzing that incomplete registry would give us a better idea of where the crash occurs. Simon, you can always poke around with the build machine after the verification build is delivered and see the bug in action.
- Mozilla/Netscape scriptability: I haven't tried to see if it actually supports the quit AppleEvent, but assuming that it does, how would the script know when the startup process is done before quitting the app and continuing the packaging?
Mozilla won't quit until it has created the components registry, because it doesn't get to the main event loop until after that. So a simple script to run then quit should just work. See, however, bug 43163.
jj: please send a copy of the incomplete registry to dveditz, so he can analyze where the failure occurred.
After spending some hours debugging this on the verification machine, I have a better understanding of what is going on. First, the bug is reproducible only the first time you run RegXPCOM after rebooting the machine. The crash occurs after all component DLLs have been registered, and while we are registering JS components. We actually crash while reading in default pref files, which happens because the JSLoader code loads the scriptsecuritymanager service, which in turn loads the prefs service. I'll attach the call stack just before we crash. What's odd is that we don't crash loading the first default prefs file, but the 2nd one after sorting them (mailnews.js).
Why the crash occurs I have no idea. Perhaps there are other NSPR threads running, and we yield in the async read call to another thread? Or maybe there is some stack corruption going on here. It's too early to tell.
CC'ing mstoltz in case the scriptsecuritymanager service involvement was more than coincidental.
clean up keywords, adding help wanted
Whiteboard: [dogfood-], [nsbeta2-]
adding help wanted keyword
Status: REOPENED → ASSIGNED
spam: Adding crash keyword...
Keywords: crash
Blocks: 43000
Chris, did you mean bug 46000 instead of 43000?
Yes, I did. Thanks, dveditz.
Blocks: 46000
No longer blocks: 43000
OK, here's how to make it not crash -- 2 options.
1. Don't have RegXPCOM register JS components.
2. Move the default prefs folder out of the way before running RegXPCOM so that it doesn't try to load the default prefs files.
(2) is in my reach. Why does regxpcom need to load default prefs files? Isn't it supposed to focus on XPCOM components?
Temporary fix:
- rename viewer/Defaults to <anything> to avoid hitting Defaults/Pref
- run RegXPCOM -> no crash, Component Registry created
- rename "Defaults" back.
After testing this manually with success, I updated the release build automation and will watch it over the weekend. Even though this is an ugly patch, it will get us moving forward and include a component.reg with the mac build, hence reducing initial startup time. Feel free to mark this bug fixed if this solution is good enough in the long run.
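Editorial sketch: the rename / run / rename-back sequence above is easy to get wrong if the middle step fails, since the folder then stays renamed. One way to make the restore automatic is a scope guard. The following is a minimal C++17 illustration using std::filesystem, not what the build automation actually used. ScopedMoveAside and the ".aside" suffix are hypothetical choices.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical guard: renames a file or folder out of the way on construction
// and renames it back on destruction, even if the code in between returns
// early or throws.
class ScopedMoveAside {
public:
    explicit ScopedMoveAside(const fs::path& original)
        : mOriginal(original), mAside(original)
    {
        assert(!original.empty());
        mAside += ".aside";  // arbitrary suffix for the temporary name

        std::error_code ec;
        fs::rename(mOriginal, mAside, ec);
        mMoved = !ec;  // remember whether there is anything to restore
    }

    ~ScopedMoveAside()
    {
        if (mMoved) {
            std::error_code ec;
            fs::rename(mAside, mOriginal, ec);  // best effort; errors are ignored here
        }
    }

    ScopedMoveAside(const ScopedMoveAside&) = delete;
    ScopedMoveAside& operator=(const ScopedMoveAside&) = delete;

    bool moved() const { return mMoved; }

private:
    fs::path mOriginal;
    fs::path mAside;
    bool mMoved = false;
};
```

A caller would construct the guard on the Defaults folder, check moved(), run the registration step inside the same scope, and let the destructor put the folder back.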
So now we just need to add the component registry back into the package list (browser.xpi section) and we're all set. Ideally this would be a component reg for the browser components only to cut down on unnecessary footprint for mail and aim components for users who don't install those options, but I'll take what we can get at this point.
Bad news: the patch in place (renaming the 'Defaults' folder before running RegXPCOM) worked fine for just a week... but we're dealing with a tough guy here. RegXPCOM is crashing again every now and then (mostly now :-) However, this time the crash doesn't seem as deep now as it used to, cuz I can log a stack trace. I attach one from today's crash: http://bugzilla.mozilla.org/showattachment.cgi?attach_id=13327
Is this bug the reason there is no regxpcom installed with mac moz/n6? If so, this blocks xpcom plugin uninstallers (plugins can be removed but since the plugin reg entries still exist, moz/n6 thinks the plugin still exists). If not, sorry for the spam.
I don't think regxpcom was ever designed to be shipped with either mozilla or ns6. it's an internal tool whose only purpose is to generate a component registry without having to launch the app. no installer / uninstaller should refer to regxpcom. Simon, you can mark this as "worksforme" if you think it's ok now that #46000 is fixed (running the app instead of regxpcom)
Right; RegXPCOM was never meant to be external. jj: I'd still like to understand why this happens. If it crashes RegXPCOM, it'll probably crash someone's embedding app at some point.
thanks for setting me straight. Does anyone have any ideas how to unregister an xpcom component? Is deleting the component registry the preferred way of unregistering xpcom components?
dp is no longer @netscape.com. reassigning qa contact to default for this component
QA Contact: dp → rayw
Depends on: 64978
No longer depends on: 64978
For the skinny on this, see bug 64978.
Dupping this to bug 64978, which contains much better data. *** This bug has been marked as a duplicate of 64978 ***
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 24 years ago
Resolution: --- → DUPLICATE
Component: XPCOM Registry → XPCOM
QA Contact: rayw → xpcom