598007 - Start-up crash under Windows XP [@ nsDiskCacheMap::Open(nsILocalFile*) ]

Reporter

Description

•

15 years ago

Build: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b7pre) Gecko/20100919 Firefox/4.0b7pre

This is a residual crash signature that exists in trunk builds. It is #57 top crasher for 4.0b7pre for the last two weeks.

Signature: nsDiskCacheMap::Open(nsILocalFile*)
UUID: c6ceb095-65f9-447c-8768-85eed2100920
Time: 2010-09-20 06:20:51.277412
Uptime: 1
Last Crash: 2 seconds before submission
Install Age: 40712 seconds (11.3 hours) since version was first installed.
Product: Firefox
Version: 4.0b7pre
Build ID: 20100919042023
Branch: 2.0
OS: Windows NT
OS Version: 5.1.2600 Service Pack 3
CPU: x86
CPU Info: GenuineIntel family 6 model 23 stepping 6
Crash Reason: EXCEPTION_ACCESS_VIOLATION_READ
Crash Address: 0xffffffff80000000
App Notes: AdapterVendorID: 10de, AdapterDeviceID: 0622

Crashing Thread
Frame  Module  Signature  Source
0  @0x80000000
1  xul.dll  nsDiskCacheMap::Open  netwerk/cache/nsDiskCacheMap.cpp:155
2  xul.dll  nsDiskCacheDevice::OpenDiskCache
3  xul.dll  nsDiskCacheDevice::Init  netwerk/cache/nsDiskCacheDevice.cpp:384
4  xul.dll  nsCacheService::CreateDiskDevice  netwerk/cache/nsCacheService.cpp:1305
5  xul.dll  nsCacheService::SearchCacheDevices  netwerk/cache/nsCacheService.cpp:1718
6  xul.dll  nsCacheService::ActivateEntry  netwerk/cache/nsCacheService.cpp:1627
7  xul.dll  nsCacheService::ProcessRequest  netwerk/cache/nsCacheService.cpp:1490
8  xul.dll  nsProcessRequestEvent::Run  netwerk/cache/nsCacheService.cpp:913
9  xul.dll  nsThread::ProcessNextEvent  xpcom/threads/nsThread.cpp:547
10  xul.dll  nsThread::ThreadFunc  xpcom/threads/nsThread.cpp:263
11  nspr4.dll  _PR_NativeRunThread  nsprpub/pr/src/threads/combined/pruthr.c:426
12  nspr4.dll  pr_root  nsprpub/pr/src/md/windows/w95thred.c:122
13  mozcrt19.dll  _callthreadstartex  obj-firefox/memory/jemalloc/crtsrc/threadex.c:348
14  mozcrt19.dll  _threadstartex  obj-firefox/memory/jemalloc/crtsrc/threadex.c:326
15  kernel32.dll  BaseThreadStart

Benjamin Smedberg

Updated

•

15 years ago

Assignee: nobody → honzab.moz

Keywords: regression

Honza Bambas (:mayhemer)

Comment 1

•

15 years ago

This looks like something that has been rotten for a long time, apparently a race condition. There are similar, much older reports from this area of code:

http://crash-stats.mozilla.com/report/index/b0cc7822-a4cb-429e-b758-cdea22100906
http://crash-stats.mozilla.com/report/index/9e270a67-27f6-4b4c-b98b-9b16f2100918
http://crash-stats.mozilla.com/report/index/97cc456a-1361-48b9-bd8f-bb9662100830
http://crash-stats.mozilla.com/report/index/07761768-0cbf-492d-8b3e-7d0242100907

Something has just woken this up so that it happens more often. I will look for the regression range.

Jason Duell

Assignee

Comment 2

•

15 years ago

I wouldn't be surprised if this has something to do with the smart_size changes in bug 559942, though I don't exactly see how. If we're lucky this *might* be fixed by bug 596476 or 595413. Alas, the former won't make it into Beta7.

Scoobidiver (away)

Reporter

Updated

•

15 years ago

Depends on: 595413, 596476

chris hofmann

Comment 3

•

15 years ago

#2 top crash in 4.0b7pre early data from yesterday. We should figure out how to mitigate this. Is bug 596476 still on track? It sounds like bug 595413 has been fixed as of Sept 15. We need to check deeper to see if that fix has helped, but it looks like it may not have.

blocking2.0: --- → ?

chris hofmann

Comment 4

•

15 years ago

Actually, it looks like this got worse on builds from Sept 17, maybe after trunk users got the patches in https://bugzilla.mozilla.org/show_bug.cgi?id=596476#c7 or https://bugzilla.mozilla.org/show_bug.cgi?id=595413#c8

date tl crashes at, count build, count build, ...
nsDiskCacheMap::Open.nsILocalFile..
20100910 2 3.62010011514, 2 ,,
20100911 ,,
20100912 2 3.0b12007110904, 2 ,,
20100913 16 ,, 13 3.0b12007110904, 2 3.0.52008120122, 1 3.0b22007121120,
20100914 14 ,, 8 3.0b22007121120, 5 3.0b12007110904, 1 3.6.92010082415,
20100915 2 ,, 1 3.6.92010082415, 1 3.62010011514,
20100916 2 3.6.92010082415, 2 ,,
20100917 12 ,, 10 4.0b7pre2010091704, 1 4.0b62010091408, 1 3.0b12007110904,
20100918 8 ,, 7 4.0b7pre2010091704, 1 3.0b12007110904,
20100919 46 ,, 42 4.0b7pre2010091704, 3 4.0b62010091408, 1 3.6.102010091412,
20100920 79 , 67 4.0b7pre2010091704, 7 4.0b7pre2010091904, 4 3.0b12007110904,
20100921 163 , 67 4.0b7pre2010091704, 60 4.0b7pre2010092004, 31 4.0b7pr20100919

Jason Duell

Assignee

Comment 5

•

15 years ago

Notes:

1) Every instance of this crash is happening only on "Windows NT 5.1.2600 Service Pack 3". I'm guessing this reduces the importance of this bug, though we should obviously fix it.

2) It seems to be causing fewer crashes in the last few days. But that could be an artifact of jitter in the number of NT 5.1 boxes running beta7--I don't know how many such boxes there are out there, so we may have high variance.

3) Honza is correct in comment 1 that this bug has been triggered for a while. The main change seems to be that it used to get (infrequently) hit via a synchronous codepath from AsyncOpen->Connect->OpenCacheEntry->ProcessRequest, whereas now it's getting (more frequently) hit via the async cache read path created in bug 513008 (eliminate sync reads from cache). So if we get desperate, that's the bug to back out (I'd really hate to back that out, though).

Still looking into the cause of this by poring over the stack trace. One thing I don't understand is the segfault happening at nsDiskCacheMap.cpp:155: that's a function call, and shouldn't segfault. How accurate are our crash stack traces? (I notice there's a frame 0 with just an addr listed.) I assume we're crashing in OpenBlockFiles() somewhere.

Honza Bambas (:mayhemer)

Comment 6

•

15 years ago

(In reply to comment #5)
> So if we get
> desperate, that's the bug to back out (I'd really hate to back that out,
> though)

I would like to avoid that too.

> How accurate are our crash stack traces

On Windows, the position recorded on the stack is just after the line that is being executed. So you have to find the line that was actually executing "manually" by going upward in the source code.

Jason Duell

Assignee

Comment 7

•

15 years ago

> On windows you get the cursor on a stack put after the line it is being executed. So, you have to find a line executed "manually" by going upward in the source code.

Sorry, having trouble understanding. So the stack trace is

0  @0x80000000
1  xul.dll  nsDiskCacheMap::Open  netwerk/cache/nsDiskCacheMap.cpp:155
2  xul.dll  nsDiskCacheDevice::OpenDiskCache

And line 155 is a call to OpenBlockFiles(). So are you saying that means the crash happened somewhere in OpenDiskCache, or in nsDiskCacheMap::Open somewhere above line 155?

Status: NEW → ASSIGNED

Benjamin Smedberg

Comment 8

•

15 years ago

Is this breakpad, or MSVC? breakpad often skips the next-to-top frame when the top frame is a numeric address.

chris hofmann

Comment 9

•

15 years ago

This should probably block b7. It's now the #1 topcrash in b7pre and a regression from b6. Can someone mark the blocking status so we make sure it's on the release radar?

Johnny Stenback (:jst)

Comment 10

•

15 years ago

Blocking beta7.

blocking2.0: ? → beta7+

Keywords: topcrash

Jason Duell

Assignee

Comment 11

•

15 years ago

Oddly enough, sometimes the error is EXCEPTION_ACCESS_VIOLATION_READ, and sometimes EXCEPTION_ACCESS_VIOLATION_EXEC. Has anyone ever seen that before? Given that all errors are on x86 systems, which don't even support separate read/exec page permissions, is that a red herring? FWIW this looks like the same as bug 595957 (which goes back as far as 3.0b1): it also seems to be affecting only Windows NT machines in Russia, and has essentially the same stack trace. The only difference I see is that async cache reads weren't landed yet in 3.6.x, and that some of the errors are EXCEPTION_ACCESS_VIOLATION_WRITE, which we don't seem to be getting any more with b7pre. Very weird. I'd love to hear ideas on how to proceed (other than staring at code, which I'm still doing). Do we have a Windows NT box somewhere?

Summary: start-up crash under Windows XP [@ nsDiskCacheMap::Open(nsILocalFile*) ] → start-up crash under Windows NT [@ nsDiskCacheMap::Open(nsILocalFile*) ]

Jason Duell

Assignee

Comment 12

•

15 years ago

Wild guess #1: This is a problem with appending ASCII to a Cyrillic filename, and/or passing a Cyrillic filename to an NSPR I/O function. I don't understand charsets (and maybe XPCOM) well enough to know.

    nsresult
    nsDiskCacheMap::GetBlockFileForIndex(PRUint32 index, nsILocalFile ** result)
    {
        if (!mCacheDirectory) return NS_ERROR_NOT_AVAILABLE;

        nsCOMPtr<nsIFile> file;
        nsresult rv = mCacheDirectory->Clone(getter_AddRefs(file));
        if (NS_FAILED(rv)) return rv;

        char name[32];
        ::sprintf(name, "_CACHE_%03d_", index + 1);
        rv = file->AppendNative(nsDependentCString(name));
        if (NS_FAILED(rv)) return rv;

        nsCOMPtr<nsILocalFile> localFile = do_QueryInterface(file, &rv);
        NS_IF_ADDREF(*result = localFile);
        return rv;
    }

The IDL for AppendNative says that the argument must be in the native charset of the filesystem (in our error case, Russian Cyrillic). If for some reason converting ASCII "_CACHE_001_" to wchar and appending it (AppendNative does the conversion to wchar) returns NS_OK, but then the QI back to nsILocalFile fails, we'll return NS_OK without having touched 'result', which is a stack variable and thus garbage, which could then segfault when OpenBlockFiles calls Open() with it.

But ascii usually converts to wchar fine, right? And I don't see any reason why the QI back to nsILocalFile could fail: mCacheDirectory is an nsCOMPtr<nsILocalFile>, so we're just going from that to nsIFile and back. There's nothing fancy about nsLocalFileWin.cpp's implementation of QI:

    NS_IMPL_THREADSAFE_ISUPPORTS4(nsLocalFile, nsILocalFile, nsIFile, nsILocalFileWin, nsIHashable)

Wild guess #2: We could get past GetBlockFileForIndex OK, and die in nsDiskCacheBlockFile::Open(), which passes the file to OpenNSPRFileDesc(), which calls the Windows SDK functions GetFileInfo() and CreateFileW(). MSDN doesn't mention GetFileInfo() supporting unicode. Perhaps some of our Russian users have home directories with characters in them that trigger some sort of crash (only on Windows NT)?
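For illustration only, here is a minimal C++ sketch of the hazard described in wild guess #1: a function that reports a status while leaving its out-parameter untouched hands the caller whatever garbage was on the stack. The sketch clears the out-parameter before anything can fail. The names `Status`, `BlockFile`, `gBlockFiles` and `GetBlockFile` are hypothetical and do not come from the Mozilla tree; this shows the general defensive pattern, not the actual cache code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins; none of these names exist in the Mozilla tree.
enum Status { kOk = 0, kNotAvailable = 1 };

struct BlockFile {
    int index;
};

// Pretend store holding three block files (compare _CACHE_001_ .. _CACHE_003_).
static BlockFile gBlockFiles[3] = {{0}, {1}, {2}};

// Returns kOk and sets *result on success. On every failure path *result is
// nullptr, never whatever garbage the caller's stack variable held.
Status GetBlockFile(std::size_t index, BlockFile** result) {
    assert(result != nullptr);
    *result = nullptr;  // clear the out-parameter before anything can fail
    if (index >= 3) {
        return kNotAvailable;
    }
    *result = &gBlockFiles[index];
    return kOk;
}
```

With this shape, a caller that forgets to check the status dereferences a null pointer, which fails predictably, instead of jumping through an uninitialized one.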

Benjamin Smedberg

Comment 13

•

15 years ago

AppendNative should be fine here, it's always ASCII-compatible. The obvious way to check is to create a profile in a Cyrillic-named directory and run against it. GetFileInfo is not a win32 API, it's http://mxr.mozilla.org/mozilla-central/source/xpcom/io/nsLocalFileWin.cpp#473 and it is unicode-safe. This is WinXP, so if you don't have a VM of it, we can arrange for one, or you can get somebody in the QA lab to run some experiments for you.

Benjamin Smedberg

Updated

•

15 years ago

Summary: start-up crash under Windows NT [@ nsDiskCacheMap::Open(nsILocalFile*) ] → start-up crash under Windows XP [@ nsDiskCacheMap::Open(nsILocalFile*) ]

chris hofmann

Comment 14

•

15 years ago

sample of OS versions from yesterday

87 Windows NT 5.1. nsDiskCacheMap::Open(nsILocalFile*)
83 0.954023 Windows NT5.1.2600 Service Pack 3
4 0.045977 Windows NT5.1.2600 Service Pack 2

Damon Sicore (:damons)

Comment 15

•

15 years ago

Can we get an ETA for a patch here? Or, will this be fixed by bug 596476? Also, are we still sure this should block beta 7?

Jason Duell

Assignee

Comment 16

•

15 years ago

I no longer think bug 596476 is relevant--this is much older than smart sizing. I can't give an ETA, because I still have no clue what's going on. I've asked for help from the Mozilla Russia folks, and am trying to repro on an XP box I've set up with Cyrillic. Re: blocking beta 7: this only appears to affect Russian Windows XP boxes. It also seems to have tapered off in frequency from 300 crashes/day on 9/17 to 20-30 per day in the last few days. http://tinyurl.com/28hrqvz Alas, I have no idea why the decline is happening, so it could go back up. I wouldn't personally keep the train at the station for this, but I'm not a release driver and don't know how much we care about the Russian audience for the beta.

No longer depends on: 596476, 595413

Johnny Stenback (:jst)

Comment 17

•

15 years ago

Leaving this as a blocker so we keep investigating (though it's not clear to me that it actually needs to block, or that it's even something we can fix), but this should not block beta7, not given the decline in crashes and the fact that this has been around seemingly forever.

blocking2.0: beta7+ → betaN+

chris hofmann

Comment 18

•

15 years ago

It's been around forever in low volume, but the crashes happening now are almost exclusively 4.0b7pre. Also, we are under some kind of spike related to crashes from Russia or the Cyrillic problems noted in bug 599126 and bug 597260, but those seem unconnected in time and in the releases they apply to. Here are the latest stats on which builds were hit by this in the last few days.

date tl crashes at, count build, count build, ...
nsDiskCacheMap::Open.nsILocalFile..
20100920 79 ,, 67 4.0b7pre2010091704, 7 4.0b7pre2010091904, 4 3.0b12007110904, 1 4.0b7pre2010091804,
20100921 163 ,, 67 4.0b7pre2010091704, 60 4.0b7pre2010092004, 31 4.0b7pre2010091904, 3 3.0b12007110904, 1 3.6.92010082415, 1 3.6.102010091412,
20100922 136 ,, 66 4.0b7pre2010091704, 27 4.0b7pre2010092104, 19 3.0b12007110904, 9 4.0b7pre2010092204, 6 4.0b7pre2010091904, 4 3.0b22007121120, 3 4.0b7pre2010091804, 1 4.0b7pre2010092004, 1 3.0.52008120122,
20100923 87 ,, 32 4.0b7pre2010091704, 27 3.0b12007110904, 8 4.0b7pre2010092204, 8 4.0b7pre2010092004, 4 4.0b7pre2010091904, 3 4.0b7pre2010092104, 2 4.0b7pre2010091804, 1 4.0b7pre2010092304, 1 3.6.62010062523, 1 3.6.102010091412,

Honza Bambas (:mayhemer)

Comment 19

•

15 years ago

Hmm.. I don't see that creation/access to nsCacheService::mDiskDevice would be synchronized... There is some nsCacheService::mLock and the ref counter is thread safe, but what happens when we enter the code on two threads concurrently?

Honza Bambas (:mayhemer)

Comment 20

•

15 years ago

Exactly: executing nsCacheService::SearchCacheDevices.

Honza Bambas (:mayhemer)

Comment 21

•

15 years ago

There are a lot of comments in German in the latest crashes. I tried creating an account with some Czech letters in the name, but had no luck reproducing.

Honza Bambas (:mayhemer)

Comment 22

•

15 years ago

(In reply to comment #19)
> Hmm.. I don't see that creation/access to nsCacheService::mDiskDevice would be
> synchronized... There is some nsCacheService::mLock and the ref counter is
> thread safe, but what happens when we enter the code on two threads
> concurrently?

Taking back... Just checked that all code paths leading to access to mDiskDevice are protected by nsCacheService::mLock.

(In reply to comment #21)
> I was trying to create an account with some Czech letters in the name, no luck
> to reproduce.

And the system was Windows XP SP3 [5.1.2600]
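As a generic illustration of what "protected by the lock" means for a lazily created device, here is a minimal C++ sketch using the standard library. `CacheServiceSketch`, `DiskDevice` and `CreateCount` are hypothetical names chosen for the example; this is not the nsCacheService implementation, only the pattern under the assumption that every access goes through the same mutex.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a disk cache device; not the Mozilla class.
struct DiskDevice {
    bool initialized = false;
};

class CacheServiceSketch {
public:
    // Creates the device on first use. Because the check and the creation
    // happen under the same lock, two threads entering concurrently cannot
    // both observe "no device" and both create one.
    DiskDevice* GetOrCreateDiskDevice() {
        std::lock_guard<std::mutex> guard(mLock);
        if (!mDiskDevice) {
            mDiskDevice.reset(new DiskDevice());
            mDiskDevice->initialized = true;
            ++mCreateCount;
        }
        assert(mDiskDevice != nullptr);
        return mDiskDevice.get();
    }

    // Number of times a device was actually constructed.
    int CreateCount() {
        std::lock_guard<std::mutex> guard(mLock);
        return mCreateCount;
    }

private:
    std::mutex mLock;
    std::unique_ptr<DiskDevice> mDiskDevice;
    int mCreateCount = 0;
};
```

If any code path reached the device without taking the lock, the check-then-create sequence could interleave between threads, which is the scenario comment 19 was worried about.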

juan becerra [:juanb]

Comment 23

•

15 years ago

I installed the multilingual user interface package for Russian, created a user account with Cyrillic characters, and created a profile in a folder with Cyrillic characters. I've been trying to reproduce through general browsing, but no luck so far. None of the comments I saw say much in the way of reproducing the problem.

Bjarne (:bjarne)

Comment 24

•

15 years ago

Just a thought, referring to bug #595957, comment #4: Is there any way we could get hold of the fx-binaries from a user who has experienced this and check if there is a trojan involved? (Or alternatively: Is there a way we can guarantee that no trojan is mucking things up in this particular case?)

chris hofmann

Comment 25

•

15 years ago

It's possible that malware is involved, but that rarely shows up as the #1 top crash, and even more rarely as the #1 topcrash affecting trunk users. Another area to look at would be to make sure we've looked at all the changes on trunk that could have affected cache operations just prior to Sept 17, when this ramped up exclusively on 4.0b7pre builds. Honza started that in comment 1, but it's not clear that anything conclusive was found.

Summary: start-up crash under Windows XP [@ nsDiskCacheMap::Open(nsILocalFile*) ] → spike in 4.0b7pre start-up crash under Windows XP [@ nsDiskCacheMap::Open(nsILocalFile*) ]

chris hofmann

Comment 26

•

15 years ago

I don't see any mention of

b47978b94fc9 2010-09-16 20:21 -0700 Bjarne Herland - Bug 596808 - nsDiskCacheDevice::Init() called twice resulting in no disk cache available r=jduell, a=betaN

which landed shortly before this started appearing. I wonder if it might be worth backing that out for b7, or for a few days on trunk, to see if it makes the volume drop back down. What would be the trade-off there?

chris hofmann

Comment 27

•

15 years ago

If we think about investigating and trying the backout of bug 596808, we should flip the blocking "betaN+" flag to blocking b7+ so it gets on the radar to hold the release.

Mike Beltzner [:beltzner, not reading bugmail]

Comment 28

•

15 years ago

Let's try the backout and see what it does to the stats.

blocking2.0: betaN+ → beta7+

Whiteboard: [trying a backout]

Bjarne (:bjarne)

Comment 29

•

15 years ago

You might have found the issue, although I don't see the relevance of Cyrillic profiles... The patch for bug #596808 was supposed to initialize the disk-device earlier than it used to. I believe the issue here is that this actually fails (because of the check for existence of the disk-device object in nsCacheService::OnProfileChanged() !) and that this has consequences for later requests, which actually create and initialize the disk-device. The reason the patch resolves bug #596808 is simply that it avoids initializing the disk-device twice (it fails). IMO, the solution is to ensure the disk-device is created in nsCacheService::OnProfileChanged(). I can come up with a patch for this later, or Honza or Michal could do it.

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 30

•

15 years ago

Yeah, there was a huge spike in this crash on the 17th, although there were a few on the 14th: http://crash-stats.mozilla.com/report/list?range_value=4&range_unit=weeks&signature=nsDiskCacheMap%3A%3AOpen%28nsILocalFile*%29&branch=2.0&product=Firefox Since it seems like a startup crash, it probably does seem important to fix for beta7.

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 31

•

15 years ago

For what it's worth, there's also a second cache change in the one-day window when this started: http://hg.mozilla.org/mozilla-central/rev/26e2971eeec9

Bjarne (:bjarne)

Comment 32

•

15 years ago

Could someone offer an explanation why the number of these crashes drops dramatically in nightlies *after* the 17th (see also comment #16) ? Also observe that a crash-profile described with nsDiskCacheDevice::OpenDiskCache() is on the top-crasher list of 3.6.9, mainly on WinNT 5.1 SP3 (with lots of Cyrillic fonts in the comments). The stacks from these crashes look very similar to the stacks for this issue. I'm not so convinced that the patch for bug #596808 is the culprit anymore. IMO it is likely that the earlier initialization performed in this patch exposes something lurking in other parts of the code, and I believe we should try to track down and fix the real issue. It might be worth backing it out to see if it makes a difference in the stats but there are not many crashes with this signature anymore, so I'm not convinced we will see anything.

Mike Beltzner [:beltzner, not reading bugmail]

Comment 33

•

15 years ago

Is it possible the decline came because people were crashing on startup so they stopped using the browser? Seems like a reasonable reaction to me.

Scoobidiver (away)

Reporter

Comment 34

•

15 years ago

> Is it possible the decline came because people were crashing on startup so
> they stopped using the browser? Seems like a reasonable reaction to me.

According to crash stats, the number of users increases:

2010-09-27  1,824  40,509  100%  4.5%
2010-09-26  1,876  32,320  100%  5.8%
2010-09-25  1,867  30,232  100%  6.18%
2010-09-24  1,864  33,378  100%  5.58%
2010-09-23  2,081  32,737  100%  6.36%
2010-09-22  2,431  30,958  100%  7.85%
2010-09-21  2,571  28,803  100%  8.93%
2010-09-20  2,040  25,458  100%  8.01%
2010-09-19  1,653  20,031  100%  8.25%
2010-09-18  1,714  18,371  100%  9.33%
2010-09-17  2,519  20,792  100%  12.12%
2010-09-16  738  18,556  100%  3.98%
2010-09-15  22  11,565  100%  0.19%
2010-09-14  1,081  2,601  100%  41.56%

chris hofmann

Comment 35

•

15 years ago

> Could someone offer an explanation why the number of these crashes drops
> dramatically in nightlies *after* the 17th (see also comment #16) ?

That's an interesting point, but I'm not sure we can say the crashes have "dropped" without understanding how fast people might be rolling forward. The core of our nightly testers move forward pretty routinely and aggressively, but we have had several tech press articles with "feature X lands on mozilla nightlies" lately. One of these articles might have skewed the pool of users on builds from the 17th, or changed the nightly tester composition, and maybe more people got stuck on Sept 17 or just gave up.

Here are updated stats. Crashes are showing up on 0924 and 0925 builds, but it's true they are still 1/2 the rate of 0917.

20100916 2 3.6.92010082415 2 ,
20100917 12 10 4.0b7pre2010091704, 1 4.0b62010091408, 1 3.0b12007110904,
20100918 8 7 4.0b7pre2010091704, 1 3.0b12007110904,
20100919 46 42 4.0b7pre2010091704, 3 4.0b62010091408, 1 3.6.102010091412,
20100920 79 67 4.0b7pre2010091704, 7 4.0b7pre2010091904, 4 3.0b12007110904, 1 4.0b7pre2010091804,
20100921 163 67 4.0b7pre2010091704, 60 4.0b7pre2010092004, 31 4.0b7pre2010091904, 3 3.0b12007110904, 1 3.6.92010082415, 1 3.6.102010091412,
20100922 136 66 4.0b7pre2010091704, 27 4.0b7pre2010092104, 19 3.0b12007110904, 9 4.0b7pre2010092204, 6 4.0b7pre2010091904, 4 3.0b22007121120, 3 4.0b7pre2010091804, 1 4.0b7pre2010092004, 1 3.0.52008120122,
20100923 87 32 4.0b7pre2010091704, 27 3.0b12007110904, 8 4.0b7pre2010092204, 8 4.0b7pre2010092004, 4 4.0b7pre2010091904, 3 4.0b7pre2010092104, 2 4.0b7pre2010091804, 1 4.0b7pre2010092304, 1 3.6.62010062523, 1 3.6.102010091412,
20100924 66 34 4.0b7pre2010091704, 15 4.0b7pre2010092404, 6 3.0b12007110904, 5 4.0b7pre2010092004, 3 4.0b7pre2010091904, 2 4.0b7pre2010091804, 1 3.6.102010091412,
20100925 87 47 4.0b7pre2010091704, 20 4.0b7pre2010092404, 6 3.0b12007110904, 5 4.0b7pre2010092304, 4 4.0b7pre2010092312, 3 3.6.102010091412, 2 3.0.52008120122,
20100926 85 40 4.0b7pre2010091704, 25 4.0b7pre2010092504, 10 4.0b7pre2010092004, 5 4.0b7pre2010092404, 4 3.0b12007110904, 1 3.6.102010091412,
20100927 89 51 4.0b7pre2010091704, 12 4.0b7pre2010092204, 9 4.0b7pre2010092604, 9 4.0b7pre2010092312, 4 3.0b12007110904, 3 4.0b7pre2010092404, 1 3.6.102010091412,

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 36

•

15 years ago

(In reply to comment #35)
> that's an interesting point, but I'm not sure we can to say the crashes have
> "dropped", without understand how fast people might be rolling forward. The
> core of our nightly testers move forward pretty routinely and agressively, but
> we have had several tech press articles with "feature X lands on mozilla
> nightlies" lately. One of these articles might have skewed the pool of users
> on builds from the 17, or changed the nightly tester composition, and maybe
> more people got stuck on sept 17 or just gave up. here are updated stats.

It seems likely that people got stuck on the Sept. 17 build, since this seems to be a startup crash.

Jason Duell

Assignee

Comment 37

•

15 years ago

Status: Spent much of the day staring at minidumps w/dbaron and sicking. Didn't get much traction. Just checked in a version bump of the HTTP cache: http://hg.mozilla.org/mozilla-central/rev/a9d1ad0bc386 This will cause nightly users to have their cache re-created. We wanted to do this anyway so that nightly users get the fallocate optimization from bug 592520. But also, since landing 592520 coincided with the crash spike for beta7 (comment 31), we may wind up seeing either a crash or a dropoff in the crash count. Seemed worth trying. I'm also planning to land the patches for bug 596476 tomorrow--they clean up the smart size logic, and might help reduce the crash rate if we're lucky, though they're almost definitely not going to completely fix this.

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 38

•

15 years ago

I think we were able to rule a few things out from the minidumps:

The most notable is the theory that it's related to having Cyrillic characters in the username. In a bunch of the minidumps (maybe even all?), there were parts of file paths for a cache map file on the stack, and those paths were for the user name Admin.

It's perhaps also of interest that the crashes for this bug are *off* the main thread, and during the crash, the main thread is waiting for the cache lock. This made it seem like the bug on making nsCacheProfilePrefObserver::GetSmartCacheSize (which runs off the main thread) not call NS_GetSpecialDirectory might help, although we couldn't really see how.

We didn't come to a conclusion about whether or not this is the same as bug 595957. They have a whole bunch of similarities, though: most user comments are Cyrillic, spiked around the same time (although not exactly). It's possible that both are related to malware circulating in Russia, the Ukraine, and Poland.

Bjarne (:bjarne)

Comment 39

•

15 years ago

(In reply to comment #35)
> crashes are showing up on 0924 and 0925 builds, but its true they are still 1/2
> the rate of 0917

I'm sorry, but we're probably looking at different data... I tend to look at the link provided in comment #30, then choose the "Table" tab. I see 4 crashes on the 14th, 536 on the 17th, 43 on the 24th and 25 on the 25th. Am I looking at the wrong thing?

(In reply to comment #37)
> Just checked in a version bump of the HTTP cache:

Brilliant idea! :) If we see another spike, I'd suggest to bump again and back out #596808 (it should probably be fixed more thoroughly anyway).

(In reply to comment #38)
> The most notable is that it's related to having Cyrillic characters in the
> username. In a bunch of the minidumps (maybe even all?), there were parts of
> file paths for a cache map file on the stack, and those paths were for the user
> name Admin.

Admin means elevated privileges on Windows, right? Virus/Malware...?

A few holes in the story still:

- do we know if the users who experience this crash run the beta again without the crash, or do we even know that this is on the first run? (The version-bump may provide insight here.)

- is there really no relation to this crash http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&signature=nsDiskCacheDevice%3A%3AOpenDiskCache%28%29&version=Firefox%3A3.6.9 which also has a spike on the 17th

Bjarne (:bjarne)

Comment 40

•

15 years ago

(In reply to comment #38)
> We didn't come to a conclusion about whether or not this is the same as bug
> 595957. They have a whole bunch of similarities, though: most user comments
> are Cyrillic, spiked around the same time (although not exactly). It's
> possible that both are related to malware circulating in Russia, the Ukraine,
> and Poland.

Sorry - I missed the fact that this is the same as the 3.6.9-crash I was referring to in previous comment. AFAICS the crashes for these two issues seem to both revolve around the statement

rv = mCacheMap.Open(mCacheDirectory)

Benjamin Smedberg

Comment 41

•

15 years ago

The theory about the bad off-main-thread usage of the directory service is very likely, bug 597658. There's a patch in bug 596476 to fix it.

Depends on: 596476

Jason Duell

Assignee

Comment 42

•

15 years ago

> do we know if the users who experience this crash run the beta again without
> the crash, or do we even know that this is on the first run?

We're seeing a lot of repeat crashes with the same hour:minute timestamp--usually from 2-6 in a row, which suggests it may be users crashing repeatedly and then giving up.

> the crashes for these two issues seem to both revolve around
>
> rv = mCacheMap.Open(mCacheDirectory)

which is calling nsDiskCacheMap::OpenBlockFiles(), which calls nsDiskCacheMap::GetBlockFileForIndex() three times (to get nsILocalFiles for _CACHE_001,2, and then 3). I believe we kept seeing "_CACHE_001_" in the disassembly on the stack; if true we're dying after the first call.

I'm going to write a patch for the potential segfault mentioned in comment 12 just in case that helps.

Jason Duell

Assignee

Comment 43

•

15 years ago

I take it back. The code mentioned in comment 12 already returns any error from QI, so that theory is bunk. Will land 596476 once I get jst's (or anyone's) +r for the directory service patch. Oh, hmm--we're still seeing crashes from the build after my cache version bump (build 20100928041914): the crash stack (and exception addr) are still the same, but the exception is now always EXCEPTION_ACCESS_VIOLATION_EXEC (before it was almost always a READ exception, with a few EXEC's thrown in).

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 44

•

15 years ago

(In reply to comment #39)
> (In reply to comment #35)
> > crashes are showing up on 0924 and 0925 builds, but its true they are still 1/2
> > the rate of 0917
>
> I'm sorry, but we're probably looking at different data... I tend to look at
> the link provided in comment #30, then choose the "Table" tab. I see 4 crashes
> on the 14th, 536 on the 17th, 43 on the 24th and 25 on the 25th. Am I looking
> at the wrong thing?

There are two different notions of time: (1) the build ID, and (2) the crash date. chofmann is saying that *for current crash dates*, half the crashes are still from the build ID of the 17th. The data in comment 35 are a matrix showing *both* of these notions of time. Each entry is a date-of-crash, formatted like:

date-of-crash total-count-on-date
build-id-1 crashes-that-date-on-build-id-1
build-id-2 crashes-that-date-on-build-id-2
etc.
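To make the matrix layout concrete, here is a rough C++ sketch that splits one such row into its crash date, its total, and its per-build counts. It assumes the count-then-build order that the rows in comment 35 appear to use (for example "66 4.0b7pre2010091704"), with entries separated by commas. `CrashRow`, `ParseCrashRow` and `SumPerBuild` are hypothetical names made up for the example, and the version and build ID are kept as the single fused token seen in the data.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One row of the matrix: a crash date, the total for that date, and the
// per-build breakdown as (build token, count) pairs.
struct CrashRow {
    std::string crashDate;
    int total = 0;
    std::vector<std::pair<std::string, int>> perBuild;
};

// Parses a row such as
//   "20100917 12 10 4.0b7pre2010091704, 1 4.0b62010091408,"
// Empty entries (for example from a stray ",,") are skipped.
CrashRow ParseCrashRow(const std::string& line) {
    CrashRow row;
    std::istringstream in(line);
    in >> row.crashDate >> row.total;

    std::string rest;
    std::getline(in, rest);

    std::istringstream entries(rest);
    std::string entry;
    while (std::getline(entries, entry, ',')) {
        std::istringstream fields(entry);
        int count = 0;
        std::string build;
        if (fields >> count >> build) {
            row.perBuild.emplace_back(build, count);
        }
    }
    return row;
}

// Adds up the per-build counts so they can be compared with the row total.
int SumPerBuild(const CrashRow& row) {
    int sum = 0;
    for (const auto& item : row.perBuild) {
        sum += item.second;
    }
    return sum;
}
```

Comparing `SumPerBuild` with `total` is a quick sanity check that a row was read in the intended order.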

Bjarne (:bjarne)

Comment 45

•

15 years ago

Thanks for the clarification! So that means that e.g on Sept.22nd there were 136 total crashes with this signature, 66 from the 0917-build, 27 from the 0921-build, 9 from the 0922-build etc... ok. (Quite useful, I must say :) ) However, IMO it still doesn't explain why the builds after 0917 produce fewer crashes...

Benjamin Smedberg

Comment 46

•

15 years ago

Well... because nightly users are stuck on the 09-17 build, I'll bet!

Bjarne (:bjarne)

Comment 47

•

15 years ago

Why would they be stuck? In particular: if it crashes at startup, why would anyone continue using it?

Benjamin Smedberg

Comment 48

•

15 years ago

They keep hitting their Minefield icon in the taskbar and then remember that it crashes. They're stuck because if we can't launch, we can't update. Anyway, let's land the fix we know is a problem and see if this crash signature goes away.

Bjarne (:bjarne)

Comment 49

•

15 years ago

(In reply to comment #48)
> They keep hitting their Minefield icon in the taskbar and then remember that it
> crashes. They're stuck because if we can't launch, we can't update.

How would we ever get them back? :)

Seriously: So the theory is that a number of nightly users has the 0917-build installed and do not manage to upgrade from it? In fact, there are so many of these that the crashes they generate after 11 days (and builds) still dominate this type of crash? Counter-intuitive to me, but I'll accept it if established experience say that this is how it works...

> Anyway, let's land the fix we know is a problem and see if this crash signature
> goes away.

Definitely! :)

(In reply to comment #43)
> Oh, hmm--we're still seeing crashes from the build after my cache version bump
> (build 20100928041914): the crash stack (and exception addr) are still the
> same, but the exception is now always EXCEPTION_ACCESS_VIOLATION_EXEC (before
> it was almost always a READ exception, with a few EXEC's thrown in).

But the number of crashes did not jump? I.e. the act of re-creating the cache does not seem to be the problem (yet)?

Anyone who knows what EXCEPTION_ACCESS_VIOLATION_EXEC in fact means? Illegal instruction?

Bjarne (:bjarne)

Comment 50

•

15 years ago

(In reply to comment #49)
> In fact, there are so many of
> these that the crashes they generate after 11 days (and builds) still dominate
> this type of crash?

... and, btw, they all use Cyrillic keyboards?

Jason Duell

Assignee

Comment 51

•

15 years ago

Landed 596476--let's see from the nightlies tomorrow if the directory service was indeed the culprit.

> what does EXCEPTION_ACCESS_VIOLATION_EXEC mean?

I believe it means a bad address was used as an instruction (instead of a read/write). Really not sure what that means here. A little odd given that the stack frame and addr are the same.

So far 16 crashes today with the build from last night. Hard to say if this is an improvement, as our Slavic XP user base may or may not be trying it in large numbers (some may be stuck on the build from the 17th, or have given up on nightlies, etc.)

> How would we ever get [those users] back? :)

We can let them switch to Chrome for a while, then realize FF 4 is better.

(not currently active) Ted Mielczarek

Comment 52

•

15 years ago

(In reply to comment #51)
> Landed 596476--let's see from the nightlies tomorrow if the directory service
> was indeed the culprit.
>
> > what does EXCEPTION_ACCESS_VIOLATION_EXEC mean?
>
> I believe it means a bad address was used as an instruction (instead of a
> read/write). Really not sure what that means here. A little odd given that
> the stack frame and addr are the same.

That's exactly what it means:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/minidump_processor.cc#723

If you look at:
http://crash-stats.mozilla.com/report/index/deb500a9-e87b-4f4b-926c-b0b0b2100924

The top of the stack is the crash address, yes, which means that something caused us to jump to a bad address in non-executable memory. Saved by DEP!

Interestingly, frame 1 is missing source info, which probably means it's in a "cold" block of that function. I've investigated this in the past: when VC++ does PGO optimization it will separate functions out into "hot" and "cold" blocks, and put all the hot blocks in one set of pages, and the cold ones in another set of pages. Unfortunately VC2005 then fails to write out source line info in the PDB for the cold blocks. (VC2010 fixes this, at least.)
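
For anyone following along, here is a rough sketch of how the READ/WRITE/EXEC suffix can be derived from a Windows exception record. It is not Breakpad's actual code. The helper name `crash_reason` and the table name are made up for illustration. The only facts relied on are that EXCEPTION_ACCESS_VIOLATION is 0xC0000005 and that the first ExceptionInformation entry is 0 for a read, 1 for a write and 8 for a DEP (execute) violation.

```python
# Access-violation code and the meaning of ExceptionInformation[0],
# as documented for the Windows EXCEPTION_RECORD structure.
EXCEPTION_ACCESS_VIOLATION = 0xC0000005
ACCESS_KINDS = {0: "READ", 1: "WRITE", 8: "EXEC"}


def crash_reason(exception_code, exception_information):
    """Return a Breakpad-style reason string (hypothetical helper).

    exception_information is the list of ExceptionInformation values;
    for an access violation, entry 0 is the access kind and entry 1 is
    the faulting address.
    """
    if exception_code != EXCEPTION_ACCESS_VIOLATION:
        # Anything else is just reported as a raw code in this sketch.
        return "0x%08x" % exception_code
    reason = "EXCEPTION_ACCESS_VIOLATION"
    if exception_information:
        kind = ACCESS_KINDS.get(exception_information[0])
        if kind is not None:
            reason += "_" + kind
    return reason


# Same faulting address, different access kind: the situation seen in this bug.
read_reason = crash_reason(0xC0000005, [0, 0x80000000])
exec_reason = crash_reason(0xC0000005, [8, 0x80000000])
```

So the same stack and the same address can show up as either READ or EXEC depending only on that one flag, which is why the change in reason alone does not say much about the root cause.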

Bjarne (:bjarne)

Comment 53

•

15 years ago

So something made us try executing from a bad address? Could this be caused by e.g. calling a method on a dangling pointer to an object?

chris hofmann

Comment 54

•

15 years ago

Re comment 45:
> So that means that on day A there were X crashes on Y build.... . (Quite useful, I must say :)

Bug 600534 tracks trying to get this view in the web interface of Socorro.

Honza Bambas (:mayhemer)

Comment 55

•

15 years ago

(In reply to comment #53)
> So something made us try executing from a bad address? Could this be caused by
> e.g. calling a method on a dangling pointer to an object?

No, I'd say, unless the object has virtual methods; nsDiskCacheBlockFile doesn't have any. This all seems to me more like stack corruption, with the EXEC violation occurring when a RET jumps to a bad address. But I'm not much of an expert in deep debugging...

Jason Duell

Assignee

Comment 56

•

15 years ago

Well, we're at 10 crashes so far today with the build from last night, so the directory service fix hasn't made this go away. We're still down from the 9/17 spike, but hard to know what sort of prevalence we'd see if this shipped in beta7. Error is now back to EXCEPTION_ACCESS_VIOLATION_READ. (Is it just me, or does the combo of same crash stack + different access error + Slavic XP only == probably some sort of malware problem?) Will look at some minidumps of yesterday and today as soon as I can get my hands on some.

Bjarne (:bjarne)

Comment 57

•

15 years ago

(In reply to comment #56)
> (Is it just me, or does
> the combo of same crash stack + different access error + Slavic XP only ==
> probably some sort of malware problem?)

No (i.e. it's not just you). Could we conclude that there was no new spike after the new version bump?

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 58

•

15 years ago

Still 31 crashes in the 2010-09-30 build.

Jason Duell

Assignee

Comment 59

•

15 years ago

21 of those 31 crashes for the 9/30 build are in rapid succession, so probably just a very persistent user crashing over and over at startup. We have alas made very little headway on this bug. Opening the crashdumps causes my copy of devstudio to load the blue screen of death, which is making it hard for me at least to get anywhere. Given the crash levels are pretty low since 9/17 do we want to mark this betaN?

Alexander L. Slovesnik

Comment 60

•

15 years ago

FWIW, I found this blog post in Russian: http://translate.google.com/translate?hl=ru&sl=ru&tl=en&u=http%3A%2F%2Fsibilev.net%2F%3Fp%3D3573 that describes either this bug or probably Bug 595957 (Firefox constantly crashes on startup and tries to send a message about the crash). The cause of the Firefox crash on startup is a virus loaded through HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon\Userinit. Once the virus is removed, the Firefox crash is gone. A few users on our local forum indicate that the method of virus removal described in this blog post fixed their problem with the Firefox startup crash.

Jason Duell

Assignee

Comment 61

•

15 years ago

Alexander,

Thanks very much for this information! I am not clear from the translation of the blog post whether the "DrWeb" software mentioned was part of the problem (caused the virus), or was just part of an attempt to fix it.

It looks like the problem here is that a malicious program is somehow inserted into HKEY_LOCAL_MACHINE \ SOFTWARE \ Microsoft \ Windows NT \ CurrentVersion \ Winlogon \ Userinit, and then presumably run at startup. It's not clear how it winds up affecting Firefox, but when the registry key is removed, the crashes go away. One possibility is that the filesystem I/O syscalls are being intercepted, but presumably it could be lots of different things.

We've looked over the "interesting_modules" file for the crash, and we don't see any clear .dll file that's associated with this. So we don't have a .dll name that we can block. Not sure if there's anything else to do here.

At JST's behest, marking INVALID and removing as blocker, since the crash numbers have stayed low since the spike on 9/17:
http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&signature=nsDiskCacheMap%3A%3AOpen(nsILocalFile*)&version=Firefox%3A4.0b7pre

Assignee: honzab.moz → jduell.mcbugs

Status: ASSIGNED → RESOLVED

blocking2.0: beta7+ → ---

Closed: 15 years ago

Keywords: regression

Resolution: --- → INVALID

Whiteboard: [trying a backout]

Alexander L. Slovesnik

Comment 62

•

15 years ago

(In reply to comment #61)
> I am not clear from the translation of the blog post whether the "DrWeb"
> software mentioned was part of the problem (caused the virus), or was just part
> of an attempt to fix it.

For clarity, Dr.Web is antivirus software popular in Russia: http://www.drweb.com/?lng=en

timeless

Comment 64

•

15 years ago

http://technet.microsoft.com/en-us/library/cc939862.aspx

Specifies the programs that Winlogon runs when a user logs on. By default, Winlogon runs Userinit.exe, which runs logon scripts, reestablishes network connections, and then starts Explorer.exe, the Windows user interface.

Think of it as like ~/.profile or something, a happy place for bad guys to ask to run really early.

chris hofmann

Comment 65

•

15 years ago

We are getting 10,000-14,000 crashes per day on this. Let's not call it invalid, and let's try to figure out what we can do to drive those numbers lower.

date     crashes at HeapDestroy bug 597960
20101001 5943
20101002 6788
20101003 5717
20101004 5288
20101005 4787

172:crashdata chofmann$ ./stacktrend.sh nsDiskCacheDevice::OpenDiskCache 201010*
date     crashes at nsDiskCacheDevice::OpenDiskCache bug 595957
20101001 5503
20101002 7564
20101003 7252
20101004 6486
20101005 6324

172:crashdata chofmann$ ./stacktrend.sh nsDiskCacheMap::Open.nsILocalFile.. 201010*
date     crashes at nsDiskCacheMap::Open.nsILocalFile..
20101001 63
20101002 20
20101003 23
20101004 38
20101005 44

I suggested that we post something on SUMO and try and drive traffic to that article with press, but cww says visits by Russian users to SUMO are low. What are the support venues that we should be hitting? Should we try and ramp up some press on this with instructions on how to repair?

I've sent mail to contacts at Kaspersky to maybe get them involved in blocking/repairing this malware; are there other contacts like that we should reach out to?

The e-mail responder feature should be going online with Socorro 1.7 tomorrow night. This is a good candidate to use for responding to users that add e-mails to crash reports on these three signatures.
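
To make the per-day total concrete, here is a small sketch that adds up the three signature counts quoted in this comment. The dictionary and function names are made up for illustration; the numbers are copied from the tables above.

```python
# Daily crash counts per signature, copied from the comment above.
heap_destroy = {
    "20101001": 5943, "20101002": 6788, "20101003": 5717,
    "20101004": 5288, "20101005": 4787,
}
open_disk_cache = {
    "20101001": 5503, "20101002": 7564, "20101003": 7252,
    "20101004": 6486, "20101005": 6324,
}
disk_cache_map_open = {
    "20101001": 63, "20101002": 20, "20101003": 23,
    "20101004": 38, "20101005": 44,
}


def daily_totals(*tables):
    """Sum the per-date counts across any number of signature tables."""
    totals = {}
    for table in tables:
        for date, count in table.items():
            totals[date] = totals.get(date, 0) + count
    return totals


totals = daily_totals(heap_destroy, open_disk_cache, disk_cache_map_open)
for date in sorted(totals):
    print(date, totals[date])
```

The combined totals land between roughly 11,000 and 14,400 per day for these five dates, which is in line with the 10,000-14,000 estimate.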

Status: RESOLVED → REOPENED

Keywords: user-doc-needed

Resolution: INVALID → ---

Alexander L. Slovesnik

Comment 66

•

15 years ago

(In reply to comment #65)
> I suggested that we post something on SUMO and try and drive traffic to that
> article with press, but cww says visits by Russian users to SUMO are low.
> What are the support venues that we should be hitting?

I guess it's possible to add detection to the crash reporter that Firefox has crashed several times on startup. After the crash reporter has detected that Firefox has crashed several times on startup, it could launch another browser (most probably Internet Explorer) and open in it a SUMO article explaining what should be done if it's not possible to launch Firefox.

Some pitfalls:
1) Most viruses block user access to the websites of antivirus companies. They could easily block access to SUMO too. We could ship a SUMO web page bundled with the browser, but it will hurt distribution size.
2) Some computers don't have another browser installed (thanks to the European Union browser choice initiative). Thankfully Russia is not part of the EU.
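
A very rough sketch of the detection idea, only to show how small the logic could be. Nothing here exists in the crash reporter: the function name, the threshold of 3 crashes and the 10-second uptime limit are all assumptions. It relies only on the fact that crash reports already carry an uptime value.

```python
def should_offer_help(crash_uptimes, threshold=3, max_uptime_seconds=10):
    """Decide whether to point the user at a help article (hypothetical).

    crash_uptimes: uptimes in seconds of the recorded crashes, oldest
    first. Help is offered only when the last `threshold` crashes all
    happened within `max_uptime_seconds` of launch.
    """
    recent = crash_uptimes[-threshold:]
    if len(recent) < threshold:
        return False
    return all(uptime <= max_uptime_seconds for uptime in recent)


# Three crashes in a row right after launch: looks like a startup crash loop.
stuck = should_offer_help([1, 2, 1])
# A long session in between means the browser is still usable.
not_stuck = should_offer_help([1, 3600, 1])
```

Requiring consecutive short-uptime crashes is a deliberate choice in this sketch: a single crash at startup is too common to justify opening another browser.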

chris hofmann

Comment 67

•

15 years ago

Yeah, we can't ensure that we will reach users with any of these channels, but we need to try! The impact of this is probably far greater than the 10,000-14,000 users I mentioned above. That's the number of users that crash per day from the buggy malware. We will probably run into more as we find more signatures related to this problem. The larger problem might be for users where the malware runs as designed and does not crash and the system is compromised.

Alexander L. Slovesnik

Comment 68

•

15 years ago

(In reply to comment #67)
> The larger problem might be for users where the malware runs
> as designed and does not crash and the system is compromised.

Well, Firefox already has a Safe Browsing feature, thanks to Google. It would be helpful to add some basic virus/malware detection feature to Firefox that could indicate that the system has been compromised (maybe some antivirus company would be interested). At least Mozilla developers wouldn't waste their precious time on bugs caused by malware. Though I guess this discussion doesn't belong in this bug.

Emanuel Hoogeveen [:ehoogeveen]

Comment 69

•

15 years ago

(In reply to comment #67)
> The larger problem might be for users where the malware runs
> as designed and does not crash and the system is compromised.

Do we have a name for the malware concerned at this point? E.g. could we get a copy from Dr.Web for sandboxed analysis?

[:Cww]

Comment 70

•

15 years ago

Alexander L. Slovesnik: where do most Russians go for tech support (and by extension, where do they go for tech support with Firefox?) Is the front page of http://mozilla-russia.org/ a good place for a notice about this issue? You seem to have a much more active community than we do. FWIW, since it's a startup crash, the primary driver of traffic to SUMO -- the built-in Help button -- is not usable so we should be looking at messaging in other places. I have no sense for the scale of this problem either... is it bad enough that we should try to send an official notice to the Technology ministry in Russia? Should we try to get Microsoft to release a security update to address this? Comment 67 makes it seem like a much larger percentage of users are affected than the crash reports we see.

chris hofmann

Updated

•

15 years ago

Blocks: malware-attacks

chris hofmann

Comment 71

•

15 years ago

Translated version of the support forum at http://mozilla-russia.org/
http://translate.google.com/translate?js=n&prev=_t&hl=en&ie=UTF-8&layout=2&eotf=1&sl=ru&tl=en&u=http%3A%2F%2Fmozilla-russia.org%2F
shows the symptoms of this bug.

translated post name                                           posts  views
Mozilla Firefox will not start - Chara [ 1 2 3 4 ]             80     24660
firefox does not run a report of an unexpected error - axe     14     1195
Permanent fall browser - KReoN                                 8      154

Alexander L. Slovesnik

Comment 72

•

15 years ago

(In reply to comment #70)
> Alexander L. Slovesnik: where do most Russians go for tech support (and by
> extension, where do they go for tech support with Firefox?) Is the front page
> of http://mozilla-russia.org/ a good place for a notice about this issue? You
> seem to have a much more active community than we do.

The Russian Mozilla forum is http://forum.mozilla-russia.org/. I've created a post in our local FAQ on this issue at http://forum.mozilla-russia.org/viewtopic.php?id=46369 However, malware removal is a very tricky business and I'm reluctant to convert the Mozilla support forum into a malware removal support forum. Antivirus company support and specialized forums are more qualified to deal with malware issues.

> I have no sense for the scale of this problem either... is it bad enough that
> we should try to send an official notice to the Technology ministry in Russia?

FWIW, it's not only a Russian problem. On http://crash-stats.mozilla.com/report/list?signature=nsDiskCacheDevice::OpenDiskCache%28%29 there are some comments in Italian and German.

> Should we try to get Microsoft to release a security update to address this?
> Comment 67 makes it seem like a much larger percentage of users are affected
> than the crash reports we see.

There is nothing that indicates that this is a Microsoft issue.

[:Cww]

Comment 73

•

15 years ago

Microsoft, as part of monthly security updates, pushes out a malware scanner... I don't know if it works really well but if chofmann is right, this is affecting tons of users (who are not crashing) and causing loss of personal data and we should leverage whatever resources we can to help them. Another question: Is there anything you think Mozilla should do to help? You probably have a better sense of your locale than we do and I'd be happy to do what we can. However, you are much better qualified to say what steps/outreach is necessary.

Alexander L. Slovesnik

Comment 74

•

15 years ago

(In reply to comment #73)
> Microsoft, as part of monthly security updates, pushes out a malware scanner...
> I don't know if it works really well but if chofmann is right, this is
> affecting tons of users (who are not crashing) and causing loss of personal
> data and we should leverage whatever resources we can to help them.

Unfortunately, a lot of users disable Microsoft Update on pirated Windows installations.

> Another question: Is there anything you think Mozilla should do to help? You
> probably have a better sense of your locale than we do and I'd be happy to do
> what we can. However, you are much better qualified to say what
> steps/outreach is necessary.

I've posted a kind of plan in comment 66. Additionally, Mozilla could contact antivirus companies (http://translate.google.com/translate?js=n&prev=_t&hl=en&ie=UTF-8&layout=2&eotf=1&sl=ru&tl=en&u=http%3A%2F%2Fwww.anti-malware.ru%2Frussian_antivirus_market_2009_2010 shows some stats on the antivirus market in Russia) to ask them for any data on the Firefox start-up crash issue. I guess they can correlate Firefox crash statistics with malware spread statistics.

[:Cww]

Comment 75

•

15 years ago

The plan in comment 66 is a good long term idea but would minimally require a new version of Firefox to work (and maybe a lot of work in the socorro backend). Is there anything that we can do without making changes to Firefox?

chris hofmann

Comment 76

•

15 years ago

yes, https://bugzilla.mozilla.org/show_bug.cgi?id=585593 outlines the plan for changes going into socorro that will allow finding the crash signatures like in this bug and the two other related bugs, then pulling e-mail address where users provided them, then e-mailing with instructions on how to avoid the crash they just hit. this won't require any changes to firefox. the message that we construct for the e-mail ought to have information in Russian and English and sounds like maybe Italian and German with maybe links in the e-mail with instructions on how to avoid the crash in each of these languages.

Alexander L. Slovesnik

Comment 77

•

15 years ago

(In reply to comment #76)
> yes, https://bugzilla.mozilla.org/show_bug.cgi?id=585593 outlines the plan for
> changes going into socorro that will allow finding the crash signatures like in
> this bug and the two other related bugs, then pulling e-mail address where
> users provided them, then e-mailing with instructions on how to avoid the crash
> they just hit.

Can you estimate the percentage of users who have provided their e-mail addresses in crash reports? Are we talking about 1%, 10% or 90%?

chris hofmann

Comment 78

•

15 years ago

Yeah, the projections for the number of users that we can reach with this technique are low, but it's still one more tool to get the word out. Some quick checks indicate that we might be able to reach just over 1,000 users per day that are hitting these crashes. Here is a sample from Oct 6:

HeapDestroy
6319 reports - no e-mail provided
516 yes, have e-mail address

nsDiskCacheDevice::OpenDiskCache
8269 no e-mail provided
549 yes, have e-mail

nsDiskCacheMap::Open.nsILocalFile..
68 no e-mail

This is probably a good bug to test the rollout of the e-mail responder system.
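
As a quick answer to the percentage question in comment 77, the Oct 6 sample can be turned into shares like this. It is a sketch under one assumption: the "no e-mail" and "have e-mail" counts are disjoint, so their sum is the total for that signature. The nsDiskCacheMap::Open signature is left out because no "have e-mail" figure was given for it. All names are made up for illustration.

```python
# (reports without e-mail, reports with e-mail) from the Oct 6 sample above.
sample = {
    "HeapDestroy": (6319, 516),
    "nsDiskCacheDevice::OpenDiskCache": (8269, 549),
}


def email_share(without_email, with_email):
    """Fraction of reports that carry an e-mail address."""
    return with_email / (without_email + with_email)


shares = {sig: email_share(*counts) for sig, counts in sample.items()}
reachable = sum(with_email for _, with_email in sample.values())
overall = reachable / sum(a + b for a, b in sample.values())

for sig, share in shares.items():
    print("%s: %.1f%%" % (sig, share * 100))
print("reachable per day: %d (%.1f%% overall)" % (reachable, overall * 100))
```

Under that assumption the share is somewhere between 6% and 8% for both signatures, and the two "have e-mail" counts add up to 1,065 reports, matching the "just over 1,000 per day" estimate.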

Bjarne (:bjarne)

Comment 79

•

15 years ago

I'm no expert in runtime C++ and only use Windows if I'm forced to, but would it be possible to add exception handling (possibly for Windows only) in the appropriately coarse-grained places in the code which loads/bootstraps Firefox modules? Just to catch stuff like this and pop up some reasonable message?

timeless

Comment 80

•

15 years ago

not like that. we don't know if a library is poisoning our process and running away, or if a process is attacking our process, or if a kernel driver is ruining us. there's also another minor detail... a rogue piece of code could hurt any random file i/o, not just the one we pick. ignoring that, assuming the process actually does care about us, this is a losing battle.

chris hofmann

Updated

•

15 years ago

Depends on: 585593

chris hofmann

Comment 81

•

15 years ago

Still currently running at about ten thousand crashes per day on
Bug 597960 - crash under Windows XP [@ HeapDestroy ] mainly on start-up

Plus another 8,000 per day with the nsDiskCacheDevice::OpenDiskCache.. signature, plus another 100 or so per day on this signature, would bring the total to 19,000 crashes per day of the crash reports we process.

Mike Beltzner [:beltzner, not reading bugmail]

Comment 82

•

15 years ago

I'm not seeing anything at 10,000 crashes a day on http://crash-stats.mozilla.com/products/Firefox/versions/4.0b8pre - where are we seeing this volume?

chris hofmann

Comment 83

•

15 years ago

This is one of several bugs where we are affected by the same possible malware, which spans all releases. This particular signature applies only to trunk, so it's low volume. One of the bugs is duped against this bug, so I figured we were concentrating comments here. Maybe we should spin up a tracking bug to cover common stats and attributes of all the bugs. Here is the first comment for the tracking bug.

This bug's stats:

date tl crashes at, count build, count build, ...
nsDiskCacheMap::Open.nsILocalFile..
20101020 33
    12 4.0b7pre^\2010100204,
    10 4.0b8pre^\2010101804,
    6 4.0b8pre^\2010102004,
    2 4.0b8pre^\2010101904,
    1 4.0b8pre^\2010101104,
    1 4.0b8pre^\2010100704,
    1 4.0b7pre^\2010100304,
20101021 60
    53 4.0b7pre^\2010100204,
    2 4.0b4^\2010081813,
    2 3.6.10^\2010091412,
    1 4.0b8pre^\2010101604,
    1 4.0b8pre^\2010100904,
    1 3.6.11^\2010101211,

Bug 595957 - Sept 10-12, Spike in Firefox Crashes for Russian Users [@ nsDiskCacheDevice::OpenDiskCache() ]

date tl crashes at, count build, count build, ...
nsDiskCacheDevice::OpenDiskCache..
20101020 4063
    2173 3.6.10^\2010091412,
    343 4.0b6^\2010091408,
    260 3.0.19^\2010031422,
    150 3.6.11^\2010101211,
    149 3.6^\2010011514,
    141 3.5.13^\2010091413,
    125 3.5.5^\2009110215,
    89 3.6.3^\2010040108,
    72 3.6.8^\2010072215,
    59 3.0b5^\2008032620,
    55 3.0.1^\2008070208,
    47 4.0b4^\2010081813,
    47 4.0b2^\2010072019,
    31 3.6.9^\2010082415,
    25 3.0.5^\2008120122,
    <releases where volume is less than 30 crashes per day snipped>

Bug 597960 - crash under Windows XP [@ HeapDestroy ]

date tl crashes at, count build, count build, ...
HeapDestroy
20101020 17985
    7901 3.6.10^\2010091412,
    1517 3.5.13^\2010091413,
    901 3.6.8^\2010072215,
    844 3.6.11^\2010101211,
    762 4.0b6^\2010091408,
    751 3.6^\2010011514,
    522 3.6.3^\2010040108,
    503 3.0.19^\2010031422,
    334 3.0.6^\2009011913,
    297 3.5.6^\2009120122,
    283 3.6.6^\2010062523,
    222 3.5.5^\2009110215,
    201 3.5.3^\2009082410,
    184 3.5.2^\2009072922,
    183 3.7a1pre^\2009082804,
    148 3.0.1^\2008070208,
    137 4.0b7pre^\2010100204,
    131 3.0.5^\2008120122,
    111 3.0^\2008052906,
    101 4.0b4^\2010081813,
    96 4.0a1pre^\2008051003,
    <releases where volume is under 100 per day snipped>

Wayne Mery (:wsmwk)

Comment 84

•

15 years ago

Similar crash in Thunderbird. All are Win XP.

bp-0d4166c4-99dc-42c3-a130-2e3e42101109
"Opens up again but breaks down right with the 1st click. Firefox doesn't even open anymore."

0 @0xf195b58c
1 thunderbird.exe nsDiskCacheMap::OpenBlockFiles netwerk/cache/src/nsDiskCacheMap.cpp:617
2 thunderbird.exe nsDiskCacheMap::Open netwerk/cache/src/nsDiskCacheMap.cpp:155
3 thunderbird.exe nsDiskCacheDevice::OpenDiskCache netwerk/cache/src/nsDiskCacheDevice.cpp:896
4 thunderbird.exe nsDiskCacheDevice::Init netwerk/cache/src/nsDiskCacheDevice.cpp:374
5 thunderbird.exe nsCacheService::CreateDiskDevice netwerk/cache/src/nsCacheService.cpp:966
6 thunderbird.exe nsCacheService::SearchCacheDevices netwerk/cache/src/nsCacheService.cpp:1362
7 thunderbird.exe nsCacheService::ActivateEntry netwerk/cache/src/nsCacheService.cpp:1271
8 thunderbird.exe nsCacheService::ProcessRequest netwerk/cache/src/nsCacheService.cpp:1151
9 thunderbird.exe nsCacheService::OpenCacheEntry netwerk/cache/src/nsCacheService.cpp:1236
10 thunderbird.exe nsCacheSession::OpenCacheEntry netwerk/cache/src/nsCacheSession.cpp:98
11 thunderbird.exe nsHttpChannel::OpenCacheEntry netwerk/protocol/http/src/nsHttpChannel.cpp:1832

bp-ec0241d2-852e-42fe-b18d-b13fe2101110 (e.biehl)
bp-45bb010c-5f03-4f11-b2cb-3e5022101111 (g.birkle)

David Tenser [:djst]

Comment 85

•

15 years ago

We'd like to use this as a test pilot for reaching out to users suffering from a crash where there's a known workaround but not a fix in place in Firefox. Based on the Russian forum thread, here's my attempt to translate the instructions to English. Can anyone confirm that this is an accurate translation (and clarification)?

1. Open regedit (click Start, then Run..., and then type "regedit" and press Enter).
2. Locate the key: HKEY_LOCAL_MACHINE \ SOFTWARE \ Microsoft \ Windows NT \ CurrentVersion \ Winlogon.
3. Find the entry called "Userinit". It should only have the value of "C:\WINDOWS\system32\userinit.exe". If there is a comma and more text after it, this is a virus. Remember the part after the comma, which might look like this: "C:\WINDOWS\system32\3abcde04.exe".
4. Open My Computer and navigate to the folder containing the virus. In the example above, this is "C:\Windows\system32".
5. Completely remove the virus file by selecting it ("3abcde04.exe" in the example above) and pressing the Delete key while holding down the Shift key.
6. Go back to regedit and remove the part of the entry "Userinit" so it only includes "C:\WINDOWS\system32\userinit.exe".
7. Restart the computer.
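
The check in step 3 can be sketched in a few lines, in case it helps when writing the support article or a diagnostic script. This is only an illustration: the function name and constant are made up, the expected path is the one from the steps above, and treating a trailing comma with nothing after it as harmless is an assumption. The sketch only parses a value string; on Windows the value itself could be read with the standard-library `winreg` module.

```python
import ntpath

# Expected Userinit value, taken from the instructions above.
EXPECTED_USERINIT = r"C:\WINDOWS\system32\userinit.exe"


def extra_userinit_entries(value, expected=EXPECTED_USERINIT):
    """Return the Userinit entries that are not the expected userinit.exe.

    The registry value is a comma-separated list of programs. Empty
    entries (for example after a trailing comma) are ignored, and paths
    are compared case-insensitively.
    """
    entries = [part.strip().strip('"') for part in value.split(",")]
    wanted = ntpath.normcase(expected)
    return [e for e in entries if e and ntpath.normcase(e) != wanted]


# The infected example from step 3.
infected = extra_userinit_entries(
    r"C:\WINDOWS\system32\userinit.exe,C:\WINDOWS\system32\3abcde04.exe"
)
# A clean value yields no extra entries.
clean = extra_userinit_entries(r"C:\Windows\system32\userinit.exe,")
```

Anything returned by the function is what steps 4 to 6 ask the user to delete from disk and remove from the registry value.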

Alexander L. Slovesnik

Comment 86

•

15 years ago

(In reply to comment #85)
> Based on the Russian forum thread, here's my attempt to translate the
> instructions to English. Can anyone confirm that this is an accurate
> translation (and clarification)?

Translation looks good.

Nobody; OK to take it and work on it

Updated

•

14 years ago

Crash Signature: [@ nsDiskCacheMap::Open(nsILocalFile*) ]

Scoobidiver (away)

Reporter

Comment 87

•

14 years ago

It's now a low volume crash: only 11 crashes in 8.0 over the last week.

Keywords: topcrash

Summary: spike in 4.0b7pre start-up crash under Windows XP [@ nsDiskCacheMap::Open(nsILocalFile*) ] → Start-up crash under Windows XP [@ nsDiskCacheMap::Open(nsILocalFile*) ]

BMO Automation

Updated

•

10 years ago

Crash Signature: [@ nsDiskCacheMap::Open(nsILocalFile*) ] → [@ nsDiskCacheMap::Open(nsILocalFile*) ] [@ nsDiskCacheMap::Open ]

Wayne Mery (:wsmwk)

Comment 88

•

10 years ago

zero examples with nsDiskCacheMap::Open in signature in the past week for any version

Status: REOPENED → RESOLVED

Closed: 15 years ago → 10 years ago

Resolution: --- → WORKSFORME