Note: There are a few cases of duplicates in user autocompletion which are being worked on.

crash nsLocalFile::CopyMove

VERIFIED FIXED in Firefox 8

Status

()

Core
Networking: File
--
critical
VERIFIED FIXED
6 years ago
6 years ago

People

(Reporter: Mats Palmgren (vacation - back in August), Assigned: michal)

Tracking

(5 keywords)

8 Branch
mozilla10
x86
Windows 7
crash, qawanted, regression, verified-aurora, verified-beta
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox8+ fixed, firefox9+ fixed)

Details

(Whiteboard: [inbound][qa!], crash signature)

Attachments

(2 attachments)

This bug was filed from the Socorro interface and is 
report bp-2ba15c7b-9b36-4a0a-a1f4-840c72110822 .
============================================================= 

Currently at #9 on the "Top Crashers for Firefox 8.0a2" list
with 52 crashes in the past 7 days (all on Windows).

Comment 1

6 years ago
Jason - any ideas here? I think you modified this code somewhat recently.
Yes, I changed the code with this patch in bug 670911, and the crashes seem to only happen after then

  http://hg.mozilla.org/mozilla-central/rev/f5272524806b

But I've got no clue as to how this code could be causing a segfault.  The patch is quite simple, and doesn't cause any new data to be read, just tests against '!aParentDir', which is allowed to be null but is never read from if so.  My patch also changes the call stack here so that aParentDir is now null (used to be non-null), but that should be fine too (and was on the Windows box I tested with).

Been staring at it for a while now and am at a loss.  Any chance this is some malware issue?  In the past that's generally been the issue for necko crashes that don't make any sense at all.

I'm also puzzled by the one user comment we have "watching a streaming video":  this code happens when the cache is first opened, so it seems like it would hard to have it happen while watching a video.

Comment 3

6 years ago
(In reply to Jason Duell (:jduell) from comment #2)
> I'm also puzzled by the one user comment we have "watching a streaming
> video":  this code happens when the cache is first opened, so it seems like
> it would hard to have it happen while watching a video.
If you look at that stack, thread 0 (UI thread) was explicitly clearing the cache from script. The other crash reports show the cache being cleared on a background thread.

Comment 4

6 years ago
None of the linked stacks were able to identify the source line for the top frame, nor do they hand out the actual EIP for any of the stack frames :-(

Comment 5

6 years ago
I'm going to nominate this one for tracking. It looks like a regression to me. Can we can some further action on this?
tracking-firefox8: --- → ?
Keywords: regression
From what I can see in crash stats, this crash signature has been around in low volume since the 3.0 days. In the last week there were 252 crashes across all versions, Windows only.
Breaking this down a bit further:

127 crashes in Aurora in the last week on Windows. First instance of Aurora crashes showed up in crash stats using the 2011082600 build.

16 crashes in on the trunk in the last week on Windows. First instance of trunk crashes showed up in crash stats using the 2011090800 build.

The bug noted in Comment 2 never landed on Aurora and is only on mozilla Central, so there may possibly be different causes for this crash?
Keywords: qawanted, regressionwindow-wanted
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=09935ede3c77&tochange=b7d269a291b6 is the regression range on the trunk according to crash stats - forgot to add it.

Comment 9

6 years ago
It's currently #17 top crasher in 8.0b1 and happens mainly close to startup.

Comment 10

6 years ago
(In reply to Marcia Knous [:marcia] from comment #7)
> First instance of Aurora crashes showed up in crash stats using the 2011082600
> build.
I see crashes before:
https://crash-stats.mozilla.com/report/list?range_value=30&range_unit=days&signature=nsLocalFile%3A%3ACopyMove%28nsIFile*%2C%20nsAString_internal%20const%26%2C%20int%2C%20int%29&version=Firefox:8.0a2

> First instance of trunk crashes showed up in crash stats using the 2011090800
> build.
I see crashes before:
https://crash-stats.mozilla.com/report/list?range_value=30&range_unit=days&signature=nsLocalFile%3A%3ACopyMove%28nsIFile*%2C%20nsAString_internal%20const%26%2C%20int%2C%20int%29&version=Firefox:9.0a1

The regression range is probably in 8.0a1.
Version: unspecified → 8 Branch

Comment 11

6 years ago
(In reply to Marcia Knous [:marcia] from comment #7)
> The bug noted in Comment 2 never landed on Aurora and is only on mozilla
> Central, so there may possibly be different causes for this crash?
The patch landed in 8.0a1, so it could be the culprit because this crash was already existing in 8.0a2 (see comment 10).

One comment says it doesn't happen in Private Browsing.

Updated

6 years ago
Component: Networking: File → Networking: Cache
QA Contact: networking.file → networking.cache

Comment 12

6 years ago
It first appeared in 8.0a1/20110729:
https://crash-stats.mozilla.com/report/list?range_value=30&range_unit=days&date=2011-08-17&signature=nsLocalFile%3A%3ACopyMove%28nsIFile*%2C%20nsAString_internal%20const%26%2C%20int%2C%20int%29&version=Firefox:8.0a1
The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a627b24e684e&tochange=cef1817c3b13
The likely culprit is bug 673808.
Blocks: 673808
Keywords: regressionwindow-wanted
Assignee: nobody → michal.novotny

Comment 13

6 years ago
It rised to #13 top crasher in 8.0b2.
(Assignee)

Comment 14

6 years ago
Created attachment 567105 [details] [diff] [review]
fix

This crash occurs when the directory is deleted after the call to Exists() at http://hg.mozilla.org/mozilla-central/annotate/ca73f057dab7/xpcom/io/nsLocalFileWin.cpp#l1633 and before call to GetDirectoryEntries() at http://hg.mozilla.org/mozilla-central/annotate/ca73f057dab7/xpcom/io/nsLocalFileWin.cpp#l1651.

The patch fixes just the crash and doesn't fix the race condition in the cache deleting logic. IMO it needs to be completely rewritten and I'll file a follow-up bug for this issue.
Attachment #567105 - Flags: review?(benjamin)
(Assignee)

Updated

6 years ago
Assignee: michal.novotny → nobody
Component: Networking: Cache → Networking: File
QA Contact: networking.cache → networking.file

Updated

6 years ago
Assignee: nobody → michal.novotny

Comment 15

6 years ago
This is now the #8 topcrash in 8.*, I hope we can get that reviewed and landed really soon!
Comment on attachment 567105 [details] [diff] [review]
fix

Review of attachment 567105 [details] [diff] [review]:
-----------------------------------------------------------------

Michal, were you able to reproduce the crash on Windows? And, were you able to verify experimentally that the fix avoids the crash?

::: xpcom/io/nsLocalFileWin.cpp
@@ +1649,5 @@
>  
>              nsCOMPtr<nsISimpleEnumerator> targetIterator;
>              rv = target->GetDirectoryEntries(getter_AddRefs(targetIterator));
> +            if (NS_FAILED(rv)
> +                return rv;

There is a missing closing parenthesis here.
(Assignee)

Comment 17

6 years ago
(In reply to Brian Smith (:bsmith) from comment #16)
> Michal, were you able to reproduce the crash on Windows? And, were you able
> to verify experimentally that the fix avoids the crash?

I was able to reproduce the crash on Windows exactly at this place, so this patch should fix it. I'll try to run my test with this patch, but as I said it doesn't fix the race condition while deleting the trash. I expect that with this patch the disk cache will stop working when MoveToNative() fails, because we won't initialize the disk device in nsDiskCacheDevice::ClearDiskCache(). BTW the follow-up bug is #695003.
(Assignee)

Comment 18

6 years ago
I've verified that and it works as I described. The patch fixes the crash but the disk device stays in an uninitialized state.
Comment on attachment 567105 [details] [diff] [review]
fix

Review of attachment 567105 [details] [diff] [review]:
-----------------------------------------------------------------

r=bsmith for preventing the crash. Bug 695003 is the follow-up bug for actually fixing the race condition.

nsLocalFileUnix.cpp already contains this error check.
Attachment #567105 - Flags: review+
Michal, make sure you fix the syntax error I mentioned in comment 16 before you check this in.
(Assignee)

Comment 21

6 years ago
http://hg.mozilla.org/integration/mozilla-inbound/rev/1eb208e16555
Whiteboard: [inbound]
Target Milestone: --- → mozilla10

Comment 22

6 years ago
Comment on attachment 567105 [details] [diff] [review]
fix

This is one of the top 10 crashers in Firefox 8, so requesting approval for Aurora and Beta. Even if that only fixes the crash and not the underlying problem, I think we're better off with some problems deleting the cache than with such an action crashing the whole application.
Attachment #567105 - Flags: approval-mozilla-beta?
Attachment #567105 - Flags: approval-mozilla-aurora?
> I expect that with this patch the disk cache will stop working when 
> MoveToNative() fails

Note to drivers: the above sounds scary, but is actually much better than a crash (the browser won't use the disk cache for their current session only:  the RAM cache still works, so it's not a huge perf hit, and then things return to normal at the next restart).  IMO worth landing to aurora/beta to fix the crashes.
https://hg.mozilla.org/mozilla-central/rev/1eb208e16555
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED

Updated

6 years ago
Crash Signature: [@ nsLocalFile::CopyMove(nsIFile*, nsAString_internal const&, int, int)] → [@ nsLocalFile::CopyMove(nsIFile*, nsAString_internal const&, int, int)] [@ nsLocalFile::CopyMove(nsIFile*, nsAString_internal const&, bool, bool) ]
(Assignee)

Comment 25

6 years ago
Created attachment 568434 [details]
testcase to reproduce the crash
Comment on attachment 567105 [details] [diff] [review]
fix

Approved per todays triage meeting.
Attachment #567105 - Flags: approval-mozilla-beta?
Attachment #567105 - Flags: approval-mozilla-beta+
Attachment #567105 - Flags: approval-mozilla-aurora?
Attachment #567105 - Flags: approval-mozilla-aurora+

Comment 27

6 years ago
---------------------------------[ Triage Comment ]---------------------------------

Tracking as this is a top 10 crasher for Firefox 8 while in beta.
tracking-firefox8: ? → +
tracking-firefox9: --- → +
Attachment #567105 - Flags: review?(benjamin) → review+

Comment 28

6 years ago
http://hg.mozilla.org/releases/mozilla-aurora/rev/4da434beb537
http://hg.mozilla.org/releases/mozilla-beta/rev/a3491d0f5e7b
status-firefox8: --- → fixed
status-firefox9: --- → fixed
Blocks: 695003
Other than checking Socorro, is there a testcase QA can use to verify this fix?
Whiteboard: [inbound] → [inbound][qa?]
There is a testcase in the attachments. I cannot crash using Mozilla/5.0 (Windows NT 5.1; rv:11.0a1) Gecko/20111122 Firefox/11.0a1 - my Win 7 VM is not working ATM.

(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #29)
> Other than checking Socorro, is there a testcase QA can use to verify this
> fix?
Whiteboard: [inbound][qa?] → [inbound][qa+]
I could not reproduce the crash on Firefox 9.0b2, the latest Nighlty and Aurora using the test case from Comment 25, but checking in Socoro I found some crash reports on builds released after the fixes.

https://crash-stats.mozilla.com/report/index/0062b77f-8dfd-4732-80a9-6d73e2111122

https://crash-stats.mozilla.com/report/index/ab255e67-254c-44c8-8c1a-4009f2111122
https://crash-stats.mozilla.com/report/index/db84a35e-373b-4d1b-b72b-fa8da2111121
https://crash-stats.mozilla.com/report/index/fc2ff216-170a-467f-9064-a29002111121
https://crash-stats.mozilla.com/report/index/86384a39-1b67-47b1-a387-13f362111121
https://crash-stats.mozilla.com/report/index/243b4c87-3f96-490d-b540-f5eee2111121
https://crash-stats.mozilla.com/report/index/1cc4c319-78fa-4237-89f2-19af52111121
https://crash-stats.mozilla.com/report/index/696d050f-13a2-492d-b757-f2c882111120
https://crash-stats.mozilla.com/report/index/696d050f-13a2-492d-b757-f2c882111120
https://crash-stats.mozilla.com/report/index/86961e9b-a06e-4cae-a6ed-adc832111120
https://crash-stats.mozilla.com/report/index/789b4d8a-408d-4a71-9ba1-643de2111120
https://crash-stats.mozilla.com/report/index/5c7798ea-bb49-40f3-a70e-6fa762111119
https://crash-stats.mozilla.com/report/index/530ba01f-66fc-46f9-8063-83bfd2111119

Can this bug be reopened?

Thank you!

Comment 32

6 years ago
(In reply to Simona B [QA] from comment #31)
> Can this bug be reopened?
File a new one.

Comment 33

6 years ago
(In reply to Scoobidiver from comment #32)
> (In reply to Simona B [QA] from comment #31)
> > Can this bug be reopened?
> File a new one.

Would be better to explain why here to not sound rude. :)

If it's the same issue and not (completely) fixed, reopening is in order. If the original case was fixed but there's a different one with similar symptoms (which may be the case here), a new bug is the best thing to do, as the solution might need to be completely different.
(Assignee)

Comment 34

6 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #33)
> Would be better to explain why here to not sound rude. :)

These crashes have a completely different backtrace. It could be that there is still some problem in nsLocalFile::CopyMove() but it is definitely on a different line.

Comment 35

6 years ago
(In reply to Michal Novotny (:michal) from comment #34)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #33)
> > Would be better to explain why here to not sound rude. :)
> 
> These crashes have a completely different backtrace. It could be that there
> is still some problem in nsLocalFile::CopyMove() but it is definitely on a
> different line.

Yes, sure. I was just explaining why a new bug is useful here, as I found comment #32 to sound rude without such an explanation.

Comment 36

6 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #35)
> Yes, sure. I was just explaining why a new bug is useful here, as I found
> comment #32 to sound rude without such an explanation.
Sorry for that. I filed bug 704817 instead.
Verified issue using the test case attached in Comment 25. I cannot crash using: 
Mozilla/5.0 (Windows NT 6.1; rv:9.0) Gecko/20100101 Firefox/9.0
Mozilla/5.0 (Windows NT 6.1; rv:10.0a2) Gecko/20111128 Firefox/10.0a2
Mozilla/5.0 (Windows NT 6.1; rv:11.0a1) Gecko/20111127 Firefox/11.0a1

Based on Comment 34 setting resolution to VERIFIED FIXED.
Status: RESOLVED → VERIFIED
Keywords: verified-aurora, verified-beta
Whiteboard: [inbound][qa+] → [inbound][qa!]
You need to log in before you can comment on or make changes to this bug.