Closed Bug 457024 Opened 11 years ago Closed 11 years ago

Crash during shutdown [@nsXPCWrappedJS::Release][@nsCOMPtr<calITimezone>::~nsCOMPtr<calITimezone>]

Categories

(Calendar :: General, defect, critical)

defect
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: ssitter, Assigned: dbo)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file, 2 obsolete files)

Lightning 0.6a1 (ID 20080925085848) with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b1pre) Gecko/20080925034523 Shredder/3.0b1pre

Steps to Reproduce:
1. Start Thunderbird with clean profile
2. Install Lightning and restart Thunderbird
3. Create an event and exit Thunderbird

Actual Results: 
A popup dialog is displayed:

    -------------------------------------------------
    thunderbird.exe
    -------------------------------------------------
    The instruction at "0x004ba2f7" referenced memory
    at "0x000000d0". The memory could not be "read".

    Click on OK to terminate the program
    --------------------------------------------------
    OK   
    --------------------------------------------------

Expected Results: 
No error.
This is a regression caused by Bug 437944.

Regression range: Works in Sunbird 0.6a1 (2008090718)
                  Fails in Sunbird 0.6a1 (2008090818)

The crash doesn't occur if I remove the line that was added in Bug 437944:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/calendar/base/content/calendar-management.js&rev=1.38&mark=163#153


If I attach WinDbg to my sunbird comm-central win32 build it produces the following stack trace for the crash:

00 xpc3250!nsXPCWrappedJS::Release+0x3b
     [e:\src\mozilla\js\src\xpconnect\src\xpcwrappedjs.cpp @ 232]
http://hg.mozilla.org/mozilla-central/annotate/4d57bfc569f5/js/src/xpconnect/src/xpcwrappedjs.cpp#l232

01 xpcom_core!nsXPTCStubBase::Release+0xd
     [e:\src\mozilla\xpcom\reflect\xptcall\src\xptcall.cpp @ 66]
http://hg.mozilla.org/mozilla-central/annotate/4d57bfc569f5/xpcom/reflect/xptcall/src/xptcall.cpp#l66

02 calbscmp!nsCOMPtr<calITimezone>::~nsCOMPtr<calITimezone>+0x1f
     [e:\obj\sb-dbg\mozilla\dist\include\xpcom\nscomptr.h @ 525]
http://hg.mozilla.org/mozilla-central/annotate/4d57bfc569f5/xpcom/glue/nsCOMPtr.h#l525

03 calbscmp!doexit+0xab
     [f:\rtm\vctools\crt_bld\self_x86\crt\src\crt0dat.c @ 553]

...
Blocks: 437944
Keywords: regression
Berend, could you take a look at this?
Flags: blocking-calendar1.0+
Duplicate of this bug: 459486
Summary: Crash or Invalid memory read during shutdown → Crash during shutdown [@nsXPCWrappedJS::Release][@nsCOMPtr<calITimezone>::~nsCOMPtr<calITimezone>]
Crashes are critical issues esp. if they can be reproduced.
Severity: major → critical
Attached patch fix - v1 (obsolete) — Splinter Review
Assignee: nobody → daniel.boelzle
Attachment #344250 - Flags: review?(Berend.Cornelius)
OS: Windows XP → All
Hardware: PC → All
Blocks: 460408
Comment on attachment 344250 [details] [diff] [review]
fix - v1

>     setStatusObserver: function(aStatusObserver, aWindow){
>+        if (this.mStatusObserver) {
>+            this.mStatusObserver.init(null);
>+        }
>         this.mStatusObserver = aStatusObserver;
>+        if (aStatusObserver) {
>+            aStatusObserver.initialize(aWindow);

init vs initialize?
Comment on attachment 344250 [details] [diff] [review]
fix - v1

no, a typo ;-)
Attachment #344250 - Attachment is obsolete: true
Attachment #344250 - Flags: review?(Berend.Cornelius)
What's particularly strange is:
If I comment out the mentioned line, then the program no longer crashes. If (in addition) I comment out calendar-statusbar.js's gCalendarStatusFeedback, it crashes (again).
This makes me guess gCalendarStatusFeedback (being held by the composite calendar) acts as some type of glue holding maybe holding other stuff that relies on it. Even if I strip down gCalendarStatusFeedback to not hold any references to the status bar elements, it still crashes.
Duplicate of this bug: 461193
I found the underlying error here, with some help from biesi: we use static nsCOMPtr's in calUtils.cpp, which is evil. Taking those accessors apart got rid of the segfault.

Daniel, if you want to take over the actual fix, go ahead. I'll take care on the weekend or next week otherwise.
After a nights sleep, I've come to the conclusion I'd rather like to fix this myself to brush up on my mozilla C++ :-)
Assignee: daniel.boelzle → philipp
Thanks. Static nsCOMPtr's per se shouldn't do harm, if (and only if) they refer to a free-store object. This has been initially the case. But since we've shifted the timezone objects into js land some months ago (timezone service and sqlite), those now refer to a wrapped js object. And since the js engine presumably dies before the static de-initializer list runs, we crash. Is that the case?

What's wondering me however is why this seems related to the statusbar code. I'd expect to run into the above problem either way, since mozilla doesn't end using _exit()?

Finally, the static's have been used on purpose to cache commonly used UTC and floating. Please don't just remove those, but install an xpcom-shutdown handler to reset them.
> What's wondering me however is why this seems related to the statusbar code.

As mentioned in Bug 458190 Comment #5 and Bug 458190 Comment #6 the same crash seems to happen for xpcshell.exe when running the unittests. Maybe the statusbar code only changes timing or order of object destroying.
(In reply to comment #13)
> > What's wondering me however is why this seems related to the statusbar code.
> 
> As mentioned in Bug 458190 Comment #5 and Bug 458190 Comment #6 the same crash
> seems to happen for xpcshell.exe when running the unittests. Maybe the
> statusbar code only changes timing or order of object destroying.

The statusbar thing definitely has impact, because commenting the mentioned line out made the crash disappear.
Speaking of timing here doesn't convince me w.r.t. de-initializer list, because I assume it is filled up and executed the same order in both cases. I more think the (leaking) statusbar code hinders the whole js engine from shutting down (though I don't know enough whether the js engine really cares on shutdown), and thus the nsCOMPtr static can be desctructed properly.
Attached patch let it leak (obsolete) — Splinter Review
Another possibility that Philipp saw in some other code is to let the wrapped js object statics just leak (in a documented way though).
Duplicate of this bug: 461648
I wonder if we really need to cache the services. afaik, xpcom itself already does that (for services). For javascript, the win is in that you don't want to do xpconnect wrapping every time. But for c++ code, that's not an issue.
I want benchmarks ;)
(In reply to comment #17)
It's not only services (which end up being a hash lookup, too), but also getting the UTC and floating timezone object which would cross xpconnect every time. We could also consider to implement those objects in native code (as it's initially been), because those are used mainly from native code.
> I want benchmarks ;)
How would those look like? I agree we need to avoid premature optimization, but since date-time is a heavily used core object, any optimization is welcome.
Given that the number of C++ callers is limited (most code is javacsript), can we get away with chaching the timezones in the callers itself? It won't be duplicated that much, and fixes the problem by storing the pointers in classes, not as a static.
This of course depends on mostly having services call those utility functions. If a often-instantiated object uses the utility functions, this ideas reduces the use of the cache.

about benchmarks: I have no real idea. Hence the smiley.
Attached patch fix - v1Splinter Review
Thinking further, I think we should get rid of the statics now (and fix the bug, because it's annoying), and watch out for performance impacts (as always). I doubt that those getters are a real hotspot, anyway.
This patch removes the statics, and adds some minor additional tweaks.

@mvl: As far as I see that doesn't buy us anything, because the native calDatetime code would need to use a static, too for UTC/floating.
Assignee: philipp → daniel.boelzle
Attachment #344754 - Attachment is obsolete: true
Attachment #344805 - Flags: review?(philipp)
Status: NEW → ASSIGNED
Comment on attachment 344805 [details] [diff] [review]
fix - v1

>+/**
>+ * Gets the global console service.
>+ */
>+inline nsCOMPtr<nsIConsoleService> getConsoleService() {
>+    return do_GetService("@mozilla.org/consoleservice;1");
>+}
I'd prefer names like do_GetTimezoneService(), similar to do_GetIOService(). We also might want to move those functions to the actual service for our services. Also consider the following alternative:

inline const nsGetServiceByContractIDWithError
do_GetTimezoneService(nsresult* error = 0)
{
    return nsGetServiceByContractIDWithError(CAL_TIMEZONESERVICE_CONTRACTID, error);
}


>+inline nsCOMPtr<calITimezone> UTC() {
>+    nsCOMPtr<calITimezone> tz;
>+    getTimezoneService()->GetUTC(getter_AddRefs(tz));
>+    return tz;
>+}
> 
> /**
>  * Gets the "floating" timezone
>  */
>+inline nsCOMPtr<calITimezone> floating() {
>+    nsCOMPtr<calITimezone> tz;
>+    getTimezoneService()->GetFloating(getter_AddRefs(tz));
>+    return tz;
>+}
If you already addref these, shouldn't it be inline already_AddRefed<calITimezone> floating() ?

r=philipp
Attachment #344805 - Flags: review?(philipp) → review+
(In reply to comment #21)
> I'd prefer names like do_GetTimezoneService(), similar to do_GetIOService(). We
> also might want to move those functions to the actual service for our services.
I don't like those names, especially w.r.t. the |cal| namespace, i.e. cal::do_Getsomething(). I think the current cal::getSomething() reads better.

> Also consider the following alternative:
> 
> inline const nsGetServiceByContractIDWithError
> do_GetTimezoneService(nsresult* error = 0)
> {
>     return nsGetServiceByContractIDWithError(CAL_TIMEZONESERVICE_CONTRACTID,
> error);
> }

This disallows to directly use the function for calling, e.g. getTimezoneService()->GetTimezone(...).


> >+inline nsCOMPtr<calITimezone> UTC() {
> >+    nsCOMPtr<calITimezone> tz;
> >+    getTimezoneService()->GetUTC(getter_AddRefs(tz));
> >+    return tz;
> >+}
> > 
> > /**
> >  * Gets the "floating" timezone
> >  */
> >+inline nsCOMPtr<calITimezone> floating() {
> >+    nsCOMPtr<calITimezone> tz;
> >+    getTimezoneService()->GetFloating(getter_AddRefs(tz));
> >+    return tz;
> >+}
> If you already addref these, shouldn't it be inline
> already_AddRefed<calITimezone> floating() ?
This would lead to leaks unless users explicitly construct an nsCOMPtr<> out of it.

I think we should stick to using nsCOMPtr<> as before, because it's safe to use.

Pushed to comm-central <http://hg.mozilla.org/comm-central/rev/043bc1d53edb>

-> FIXED
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → 1.0
VERIFIED FIXED using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b3pre) Gecko/20081127 Lightning/1.0pre Shredder/3.0b1pre.
Status: RESOLVED → VERIFIED
Target Milestone: 1.0 → 1.0b1
Crash Signature: [@nsXPCWrappedJS::Release] [@nsCOMPtr<calITimezone>::~nsCOMPtr<calITimezone>]
You need to log in before you can comment on or make changes to this bug.