Closed Bug 351504 Opened 18 years ago Closed 17 years ago

Camino 1.1.x topcrash involving the download manager [@ libobjc.A.dylib.227.0.0] [@ nsDownloadListener::OnStateChange]

Categories

(Camino Graveyard :: Downloading, defect)

1.8 Branch
PowerPC
macOS
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED
Camino1.5

People

(Reporter: ispiked, Assigned: stuart.morgan+bugzilla)

References

()

Details

(Keywords: crash, topcrash, Whiteboard: qawanted)

Crash Data

Attachments

(6 files, 1 obsolete file)

This is showing up as the #1 topcrash for Camino 1.1.x. This is very similar to bug 322374, but that is supposedly fixed.

Incident ID: 22945047
Stack Signature	libobjc.A.dylib.227.0.0 + 0x50f0 (0x90a3e0f0) 3d809023
Product ID	Camino11
Build ID	2006090504
Trigger Time	2006-09-05 20:09:53.0
Platform	MacOSX
Operating System	Darwin 8.7.0
Module	libobjc.A.dylib.227.0.0 + (000050f0)
URL visited	
User Comments	
Since Last Crash	24373 sec
Total Uptime	31948 sec
Trigger Reason	SIGSEGV: Segmentation Violation: (signal 11)
Source File, Line No.	N/A
Stack Trace 	
libobjc.A.dylib.227.0.0 + 0x50f0 (0x90a3e0f0)
nsDownloadListener::OnStateChange()  [mozilla/extensions/transformiix/source/xslt/txXSLTNumberCounters.cpp]
nsHttpChannel::OnStopRequest()  [mozilla/netwerk/protocol/http/src/nsHttpChannel.cpp, line 713]
nsInputStreamPump::OnStateStop()  [mozilla/extensions/transformiix/source/xslt/txXSLTNumberCounters.cpp, l]
nsInputStreamPump::OnInputStreamReady()  [mozilla/extensions/transformiix/source/xslt/txXSLTNumberCounters]
nsInputStreamReadyEvent::EventHandler()
PL_HandleEvent()
PL_ProcessPendingEvents()
__CFRunLoopDoSources0()
__CFRunLoopRun()
CFRunLoopRunSpecific()
RunCurrentEventLoopInMode()
ReceiveNextEventCommon()
BlockUntilNextEventMatchingListInMode()
_DPSNextEvent()
-   -   NSApplicationMain()
_start()   start()
> libobjc.A.dylib.227.0.0 + 0x50f0 (0x90a3e0f0)
> nsDownloadListener::OnStateChange() 
> [mozilla/extensions/transformiix/source/xslt/txXSLTNumberCounters.cpp]
> nsHttpChannel::OnStopRequest() 
> [mozilla/netwerk/protocol/http/src/nsHttpChannel.cpp, line 713]
> nsInputStreamPump::OnStateStop() 
> [mozilla/extensions/transformiix/source/xslt/txXSLTNumberCounters.cpp, l]
> nsInputStreamPump::OnInputStreamReady() 
> [mozilla/extensions/transformiix/source/xslt/txXSLTNumberCounters]
> nsInputStreamReadyEvent::EventHandler()
> PL_HandleEvent()
> PL_ProcessPendingEvents()
> __CFRunLoopDoSources0()
> __CFRunLoopRun()
> CFRunLoopRunSpecific()
> RunCurrentEventLoopInMode()
> ReceiveNextEventCommon()
> BlockUntilNextEventMatchingListInMode()
> _DPSNextEvent()
> -   -   NSApplicationMain()
> _start()   start()
> 

I'm not to sure how accurate this stack is, Adam didn't we find different stack traces than this one?
Yeah, I have my doubts as well. nsDownloadListener::OnStateChange() doesn't directly call any Obj-C methods, unless something else was inlined.
Many of the comments mention that two files were being downloaded at once.
This was an interesting comment in a recent TB report:

"When a download completes and it attempts to update with a Download Completed-type message, if there is any other activity ongoing within Camino -- another download that finishes concurrently, a page loading, etc. -- the browser will crash."

Also, TB24092346x also gives a slightly different second frame (CHBrowserListener::OnLocationChange() instead), which may or may not address comment 2.

I'm still not convinced that this crash is a top crasher; "crashes in libobjc.dylib" is a topcrasher, but this particular crash is only one of many.
Target Milestone: --- → Camino1.1
Hey all, the TB comment quoted in Comment 4 was mine, actually. I see this bug ALL THE TIME. Now that I've found a bugid for it, I know where to look for updates and where to update you with info on anything else you might need to fix the problem.

Is there anything at this time I can provide you to help with root-causing this bug? This is the thorn in my backside and I will be VERY happy to see it resolved!
(In reply to comment #5)
> Is there anything at this time I can provide you to help with root-causing this
> bug? This is the thorn in my backside and I will be VERY happy to see it
> resolved!

If you find a way to reproduce it every time, using simple steps, that would help a lot.
(In reply to comment #6)
> If you find a way to reproduce it every time, using simple steps, that would
> help a lot.
> 

It tends to be intermittent. I can tell intuitively when it's coming because it's happened so many times now. Usually, I know I'm in trouble if all of the conditions are met:

(a) two or more downloads both say "less than 5 seconds remaining";
(b) I have two or more tabs open in SWM [I always do, so I don't know whether it impacts this bug];
(c) some activity is happening in the main browser window, such as scrolling, page loading, switching to another tab, etc.

The more concurrent downloads in progress, the more likely it is to encounter the bug, especially for downloads in size I would estimate between 1 MB and 50 MB; smaller downloads are less likely to cause it. Download size as a contributing factor, though, could be dependent on speed of connection.

I've encountered this bug many, many times before when I didn't bother sending in a TB, but I will definitely do so for every one now. I usually download a new branch nightly every 1-2 days, so my Camino is pretty fresh. [I'm at 100722 now.]
Billy, thanks for the info.

What I mean is that finding a way to 100% reproduce this every time will help a developer debug it, and track down the cause.

We already know what triggers the bug (downloading many files at once), but a slimmed down testcase, where you can repro it every time, would help here.
(In reply to comment #8)
> Billy, thanks for the info.
> 
> What I mean is that finding a way to 100% reproduce this every time will help a
> developer debug it, and track down the cause.

Hi Håkan,
Unfortunately I don't have any insight on that. I used to be a Unix kernel programmer myself, working in exactly the same role fixing bugs (usually other developers' regressions). I know how frustrating it is trying to fix an intermittent problem or one without a defined path to reproduce. If I do discover a definite way to reproduce it, I'll be sure to update here.
Just posted TB24446361Y with another encounter of the issue. Also, I've got screenshots of the download window and the main browser window if that would help in any way.
Another crash, with a more recent build, filed at TB24446987E. Screenshot is at http://paxoo.com/misc/camino-bug351504-screenshot03.jpeg.

Are these screenshots helping? Should I keep posting them? I'm not really sure how else I can help root-cause this bug.
Judging by the last few stacks from those screen shots, the problem originates from:

-[NSObject(GrowlCaminoPatch) myOnEndDownload:statusCode];

So people are crashing because of that growl patch. (not surprising)
Please don't ever file bugs or submit TalkBack reports when you are running haxies inside of Camino.

Is there evidence that anyone has ever seen this who wasn't running a third-party haxie, or can we just close this as INVALID?
FWIW, this bug was opened the day after GrowlCamino was introduced, so it's certainly possible that they're all growl related.
(In reply to comment #13)
> Is there evidence that anyone has ever seen this who wasn't running a
> third-party haxie, or can we just close this as INVALID?
As much as I'd love to blame this all on GrowlCamino and close this INVALID, Nick and Adam saw enough download-related crashes in Talkback on the day GrowlCamino was first released (Sept 4) to reopen the old bug 322374.  Since I didn't look at Talkback then to see if all the download-related reports came in that same day, I suppose it's possible they were all from first-day GrowlCamino users, but it does seem a bit unlikely (my experience is it takes 24 hours for Talkback's Topcrash reports to cycle...).

That said, it may also be that some other haxie is also involved here (and taking *all* the libobjc.A.dylib.227.0.0 crashes on the branch—not just ones related to downloading—there are only 25 unique users involved).

I'd love to close this now, but I think it might be prudent to watch it through a1 and make sure we don't see a spike and/or some reproducible steps not involving haxies, and then make a ruling.
These TB stacks don't seem to be helpful, and all the discussion in this bug has turned about to be about a haxied environment.  I think everything here is essentially noise, and we'd be better off opening a new bug if and when we have information suggesting there's an actual Camino bug.
(In reply to comment #16)
> These TB stacks don't seem to be helpful, and all the discussion in this bug
> has turned about to be about a haxied environment.  I think everything here is
> essentially noise, and we'd be better off opening a new bug if and when we have
> information suggesting there's an actual Camino bug.
> 

agreed.
I actually installed the GrowlCamino plugin BECAUSE of the bug. I was encountering the bug LONG, LONG before I installed it. I had hoped that by having it not open the Download window it would circumvent the bug -- at least isolating the section of code which contains the bug.

I maintain that there does, in fact, exist a bug inside Camino, and this can't simply be blamed in Growl Camino.

If it would make you happy, I'll gladly uninstall GrowlCamino and any other plugins/input managers you'd like and replicate it just as easily.
(In reply to comment #18)
> If it would make you happy, I'll gladly uninstall GrowlCamino and any other
> plugins/input managers you'd like and replicate it just as easily.

Doing that and then attaching the crashlog here (since TB isn't always especially helpful) would actually be very helpful.
Also, what are your download prefs?  Does it seem to matter what those 4 items are set to, or does it crash regardless?
(In reply to comment #20)
> Also, what are your download prefs?  Does it seem to matter what those 4 items
> are set to, or does it crash regardless?
> 

browser.download.autoDispatch false
browser.download.autoDownload true
browser.download.downloadRemoveAction 2
browser.download.progressDnldDialog.bringToFront false
browser.download.progressDnldDialog.keepAlive false
Unfortunately I'm at a conference this week and can't properly look into the GrowlCamino aspect of this for a few days.  I just wanted to say a few things:

* *Do not* submit bugs in bugzilla or crash reports in talkback if you have any plugins installed.  It sends the crash information to the wrong people, wastes their time and annoys them.  And the right person (in this case, me) never gets to hear about it.

* I will put a message in the GrowlCamino (etc) installer that explains the above

* The crashes in the screenshots are plainly GrowlCamino's fault, though I'm unable to reproduce the crash.

* I too have seen crashes implicating libojbc/objc_msgSend, in the days before GrowlCamino was even a gleam in my eye.  Since this signature is a catch-all, I suspect there is more than one issue here.

* Is it possible for onEndDownload to be called from more than one thread?  My code is not threadsafe.  Is the vanilla Camino code?  From a little probing, it looks like it is always called in the same thread, but I can't reproduce the crash either, so that tells us little.
As requested, here's a screenshot of another crash -- after having removed GrowlCamino, CamiScript, MoreCamino, UnifyCamino, and CamiTools -- showing exactly the same behavior. Also in the ZIP is the crash log generated by OS X's crashdump utility.
(In reply to comment #23)
> Created an attachment (id=242151) [edit]
> Screenshot and crashlog without GrowlCamino (or other Camino add-ons) installed
> 
> As requested, here's a screenshot of another crash -- after having removed
> GrowlCamino, CamiScript, MoreCamino, UnifyCamino, and CamiTools -- showing
> exactly the same behavior. Also in the ZIP is the crash log generated by OS X's
> crashdump utility.
> 

Thanks for this stack report, it should be useful. Are you running PPC? If so I  will wip up a build that has some extra logging to help trace this down. (Since we can't reproduce on our end).

Also do you have your hard drive cache off (Correct me if I'm wrong on my terminology here, we do have a bug on the disk drive cache if I remember correctly).
It is PPC.  I looked through the assembly for OnStateChange and DownloadDone to figure out what's going on with the stack, and DownloadDone's call to [mDownloadDisplay onEndDownload:statusCode:] has been tail call optimized, which is why we are seeing an ObjC call apparently showing up in a function that doesn't do any ObjC calling.  So we are probably dealing with DownloadDone's 
  [mDownloadDisplay onEndDownload:(NS_SUCCEEDED(aStatus) && !mUserCanceled) statusCode:aStatus];
Although from the register values, especially r5, I'm wondering if we're actually another level below that, at
  [mProgressWindowController didEndDownload:self withSuccess:completedOK statusCode:aStatus];
in onEndDownload:statusCode:
(In reply to comment #24)

> Thanks for this stack report, it should be useful. Are you running PPC? If so I
>  will wip up a build that has some extra logging to help trace this down.
> (Since we can't reproduce on our end).
> 
> Also do you have your hard drive cache off (Correct me if I'm wrong on my
> terminology here, we do have a bug on the disk drive cache if I remember
> correctly).

Yes, I'm running a PowerBook G4 1.67GHz (-mcpu=7450).

browser.cache.check_doc_frequency 3 (default)
browser.cache.disk.capacity 51200 (default)
browser.cache.disk.enable true (default)
browser.cache.disk_cache_ssl false (default)
browser.cache.memory -1 (user set)
browser.cache.memory.enable true (default)

If you want to send me over a debug build, I'll gladly test it.
(In reply to comment #25)
> It is PPC.  I looked through the assembly for OnStateChange and DownloadDone to
> figure out what's going on with the stack, and DownloadDone's call to
> [mDownloadDisplay onEndDownload:statusCode:] has been tail call optimized,
> which is why we are seeing an ObjC call apparently showing up in a function
> that doesn't do any ObjC calling.  So we are probably dealing with
> DownloadDone's 
>   [mDownloadDisplay onEndDownload:(NS_SUCCEEDED(aStatus) && !mUserCanceled)
> statusCode:aStatus];
> Although from the register values, especially r5, I'm wondering if we're
> actually another level below that, at
>   [mProgressWindowController didEndDownload:self withSuccess:completedOK
> statusCode:aStatus];
> in onEndDownload:statusCode:
> 

Not that I know ObjC or PPC assembly, but from my knowledge of C and UltraSPARC assembly, this sounds perfectly reasonable. It's also perfectly inline (pardon the pun) with what I'm seeing: even though Camino crashes on download, the download does complete and is a valid file with correct checksums; i.e., it's usable. Unfortunately, any other downloads that happen to be going on at the same time are not.
Flags: camino1.1?
Whiteboard: qawanted
Since kreeger hasn't had time, I'll try to put together and post a build that does extra logging for you to reproduce with. Sorry about the delay here.
In the first page or so of libobjc crashes, there were a couple mentioning downloading, and a couple of them actually had stacks.

I ran a query for the tell-tale download listener thing, and http://tinyurl.com/y2udtp came up with about 20 crashes since 01 Dec, which is about 1/day.  It's hard to tell how significant that is without being able to correlate unique users.

Billy, are you still seeing this crash?  If so, we'll make sure to get a logging build up for you after the holidays.  Sorry this got lost....
I'm really sorry about that. It completely fell off my radar when I got busy. Here's the logging I thought it would be interesting to see; if someone could make sure it compiles and put together a build that would be great, otherwise I'll make a build when I have a chance (probably in about a week).
(In reply to comment #29)
> Billy, are you still seeing this crash?  If so, we'll make sure to get a
> logging build up for you after the holidays.  Sorry this got lost....
> 

Hey Smokey,
I'm actually using Firefox primarily these days instead of Camino. When I uninstalled CamiTools, MoreCamino, etc., the crash happened much less frequently, but as can you can see from comment #23 it did still occur.

If there's still interest in finding and fixing this bug, I'll run a debug Camino (but miss my xpi extensions all the while) in order to help track it down. I've been successful at building my own Firefox from CVS, so I should be able to build Camino as well with your logging patch, the patron saint of gcc willing. I'll work on building it today and leave a comment if I have problems.
Billy, have you had a chance to try building Camino with the logging patch? If not, we'll make a renewed effort to get one posted for you to try, since we would definitely like to track this down now.
(In reply to comment #33)
> I've posted a PPC-only build of today's branch code plus Stuart's patch at
> http://www.ardisson.org/smokey/moz/Camino-1.1a2+logging.dmg
> 

Hi Smokey,
I have been busy and not feeling well. Personal matters always get in the way, yes? Anyway, I've just downloaded your debug build, and I'll start running with it to attempt to reproduce this bug. I'll update again by the end of the week with my status.
That didn't take long. I've got two crashes for you already. No third-party extensions installed. Just plain ol' Camino from Smokey's/Stuart's logging version. (I couldn't find a log... can you point me to it?) I'm uploading the crashlogs for the two crashes, along with a few other files you might find useful -- or not.

These crashlogs were generated with the logging version provided by Smokey in comment #33. Also attached are my org.camino.plist, prefs.js, and WindowState.plist files. (The plist files have .xml appended to play nicely everywhere.)
The logging will be in your console.log file (/Library/Logs/Console/<your user id>/console.log); if you could post everything from it containing "onEndDownload:" or "DownloadDone:" that would be great.
Here's the relevant Console.log, with a grep of "Camino":

2007-01-17 04:49:41.372 Camino[2602] DownloadDone: mDownloadDisplay is 165533840
2007-01-17 04:49:41.372 Camino[2602] onEndDownload: mProgressWindowController is 165478944
2007-01-17 04:51:41.234 Camino[2602] DownloadDone: mDownloadDisplay is 170926848
2007-01-17 04:51:41.234 Camino[2602] onEndDownload: mProgressWindowController is 165478944
Jan 17 04:51:49 Tonks crashdump[2735]: Camino crashed
Jan 17 04:51:50 Tonks crashdump[2735]: crash report written to: /Users/billifer/Library/Logs/CrashReporter/Camino.crash.log
2007-01-17 05:00:06.587 Camino[2815] DownloadDone: mDownloadDisplay is 120919952
2007-01-17 05:00:06.587 Camino[2815] onEndDownload: mProgressWindowController is 120939232
2007-01-17 05:01:03.107 Camino[2815] DownloadDone: mDownloadDisplay is 143177104
2007-01-17 05:01:03.107 Camino[2815] onEndDownload: mProgressWindowController is 120939232
2007-01-17 05:01:26.742 Camino[2815] DownloadDone: mDownloadDisplay is 121149008
2007-01-17 05:01:26.743 Camino[2815] onEndDownload: mProgressWindowController is 120939232
2007-01-17 05:02:23.019 Camino[2815] DownloadDone: mDownloadDisplay is 111635792
2007-01-17 05:02:23.019 Camino[2815] onEndDownload: mProgressWindowController is 120939232
2007-01-17 05:03:16.227 Camino[2815] DownloadDone: mDownloadDisplay is 143462784
2007-01-17 05:03:16.227 Camino[2815] onEndDownload: mProgressWindowController is 120939232
2007-01-17 05:03:55.732 Camino[2815] DownloadDone: mDownloadDisplay is 112090944
2007-01-17 05:03:55.732 Camino[2815] onEndDownload: mProgressWindowController is 120939232
Jan 17 05:04:00 Tonks crashdump[2947]: Camino crashed
Jan 17 05:04:01 Tonks crashdump[2947]: crash report written to: /Users/billifer/Library/Logs/CrashReporter/Camino.crash.log
fileName: 'Camino-debug.app'
2007-01-17 05:31:41.815 Camino[3429] DownloadDone: mDownloadDisplay is 114300176
2007-01-17 05:31:41.815 Camino[3429] onEndDownload: mProgressWindowController is 114382000
2007-01-17 05:32:58.647 Camino[3429] DownloadDone: mDownloadDisplay is 178208064
2007-01-17 05:32:58.647 Camino[3429] onEndDownload: mProgressWindowController is 114382000
2007-01-17 05:32:59.829 Camino[3429] DownloadDone: mDownloadDisplay is 114843408
2007-01-17 05:32:59.830 Camino[3429] onEndDownload: mProgressWindowController is 114382000
2007-01-17 05:33:10.117 Camino[3429] DownloadDone: mDownloadDisplay is 111470880
2007-01-17 05:33:10.117 Camino[3429] onEndDownload: mProgressWindowController is 114382000
Jan 17 05:33:14 Tonks crashdump[3534]: Camino crashed
Jan 17 05:33:15 Tonks crashdump[3534]: crash report written to: /Users/billifer/Library/Logs/CrashReporter/Camino.crash.log
fileName: 'Camino-Crash'

Note that I did NOT include the first crashlog in attachment 251781 [details] because I had mistakenly left two InputManagers (both download managers) installed. The second and third crashes, however, are valid and can be trusted; I obliterated the IMs before attempting to reproduce, then got the 2nd and 3rd crashes. As you can see from the crashlogs, there's nothing in memory for the Camino process except what should be. (I think.)
Unfortunately, the mProgressWindowController value isn't showing up anywhere in these crashes, so I guess I didn't log deeply enough. I'm going to have to make a new logging patch for each line of didEndDownload:withSuccess:statusCode: and impose on you again, I'm afraid.

I don't see how we could be crashing directly in there, but we should be able to at least figure out how far through it's getting and which method to target next.
Smokey, could you put together a new build with this?
Billy, have you had a chance to try the build in the previous comment?

TB28841913x has a comment that implies that we can crash when trying to show the Downloads window, too: 

"Trying to download a file, and the program crashed when the download manager was trying to load."
Hi Smokey, for some reason, I didn't get an email notification that you had built the "logging 2" version, so I hadn't even downloaded and run it. Thanks for pinging me on it. I'll do it now and update the bug with status as soon as I have it.
The kqueue patch exacerbated this in such a way that I'm almost positive I now see what's going wrong. I'll have a patch up soon.
Assignee: nobody → stuart.morgan
Attached patch fix for new version (obsolete) — Splinter Review
This will definitely fix the new version of this crash, which was triggered by the kqueue patch; PVCs were being dealloc'd when they asked to be removed, but then continued to execute code when |self| was no longer valid. It's a very band-aidy fix because the download logic is so convoluted I'm afraid to make more significant changes this close to a release.

Having looked at the code more I'm no longer sure that this will fix the original version of this crash, so we should keep this open and keep an eye on it.

Billy, if you happen to have hit this again with the new logging, it would still be useful to see those logs.
Attachment #255493 - Flags: review?(nick.kreeger)
Comment on attachment 255493 [details] [diff] [review]
fix for new version

Looks fine to me, at least this close to a release.

Just two tiny-nits:

>+    if ([self shouldRemoveFromDownloadList]) {
>+      [mProgressWindowController removeDownload:self suppressRedraw:NO];
>+    }
>+    else if (!mDownloadFailed) {
>+      [self setupFileSystemNotification];
>+    }

Remove curlies one the one-liners


>+  // This is sometimes called by code that thinks it can continue
>+  // to use progressDisplay. Extended the lifetime slightly as a
>+  // band-aid, but this logic should really be reworked.
>+  [[progressDisplay retain] autorelease];

Can you add '|' around progressDisplay, I know that's picky, but I think it makes the code easier to read.
Attachment #255493 - Flags: review?(nick.kreeger) → review+
Done and done. Thanks for catching the curlies; they were cruft from an earlier version of my fix where they were needed.
Attachment #255493 - Attachment is obsolete: true
Attachment #255498 - Flags: superreview?(joshmoz)
Attachment #255498 - Flags: superreview?(joshmoz) → superreview+
Comment on attachment 255498 [details] [diff] [review]
fix v2 [checked in]

Checked in on trunk and MOZILLA_1_8_BRANCH
Attachment #255498 - Attachment description: fix v2 → fix v2 [checked in]
FWIW, if anyone is still seeing this, there's an updated (PPC-only) logging build (1.1beta + "logging 2" patch) at

http://www.ardisson.org/smokey/moz/Camino-1.1b+logging.dmg

Stuart also suggested that if you see this you should try running with zombies enabled and report back with any zombie mentions from your console.log: 

MallocScribble=1 MallocPreScribble=1 MallocGuardEdges=1 NSZombieEnabled=YES /Applications/Camino.app/Contents/MacOS/Camino

(that's all one line and obviously assumes your Camino is in the standard location of /Applications)
This doesn't seem to be showing up at all in Talkback anymore, and no one's mentioned anything that I've heard.

FIXED by attachment 255498 [details] [diff] [review].

If you're still seeing this crash, please let us know (preferably with logging info obtained with the methods in the previous comment).
Status: NEW → RESOLVED
Closed: 17 years ago
Flags: camino1.1? → camino1.1+
Resolution: --- → FIXED
Crash Signature: [@ libobjc.A.dylib.227.0.0] [@ nsDownloadListener::OnStateChange]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: