Closed Bug 681407 Opened 8 years ago Closed 8 years ago

crash nsLocalFile::CopyMove

Categories

(Core :: Networking: File, defect, critical)

8 Branch
x86
Windows 7
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla10
Tracking Status
firefox8 + fixed
firefox9 + fixed

People

(Reporter: mats, Assigned: michal)

References

Details

(5 keywords, Whiteboard: [inbound][qa!])

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-2ba15c7b-9b36-4a0a-a1f4-840c72110822 .
============================================================= 

Currently at #9 on the "Top Crashers for Firefox 8.0a2" list
with 52 crashes in the past 7 days (all on Windows).
Jason - any ideas here? I think you modified this code somewhat recently.
Yes, I changed the code with this patch in bug 670911, and the crashes seem to only happen after then

  http://hg.mozilla.org/mozilla-central/rev/f5272524806b

But I've got no clue as to how this code could be causing a segfault.  The patch is quite simple, and doesn't cause any new data to be read, just tests against '!aParentDir', which is allowed to be null but is never read from if so.  My patch also changes the call stack here so that aParentDir is now null (used to be non-null), but that should be fine too (and was on the Windows box I tested with).

Been staring at it for a while now and am at a loss.  Any chance this is some malware issue?  In the past that's generally been the issue for necko crashes that don't make any sense at all.

I'm also puzzled by the one user comment we have "watching a streaming video":  this code happens when the cache is first opened, so it seems like it would hard to have it happen while watching a video.
(In reply to Jason Duell (:jduell) from comment #2)
> I'm also puzzled by the one user comment we have "watching a streaming
> video":  this code happens when the cache is first opened, so it seems like
> it would hard to have it happen while watching a video.
If you look at that stack, thread 0 (UI thread) was explicitly clearing the cache from script. The other crash reports show the cache being cleared on a background thread.
None of the linked stacks were able to identify the source line for the top frame, nor do they hand out the actual EIP for any of the stack frames :-(
I'm going to nominate this one for tracking. It looks like a regression to me. Can we can some further action on this?
Keywords: regression
From what I can see in crash stats, this crash signature has been around in low volume since the 3.0 days. In the last week there were 252 crashes across all versions, Windows only.
Breaking this down a bit further:

127 crashes in Aurora in the last week on Windows. First instance of Aurora crashes showed up in crash stats using the 2011082600 build.

16 crashes in on the trunk in the last week on Windows. First instance of trunk crashes showed up in crash stats using the 2011090800 build.

The bug noted in Comment 2 never landed on Aurora and is only on mozilla Central, so there may possibly be different causes for this crash?
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=09935ede3c77&tochange=b7d269a291b6 is the regression range on the trunk according to crash stats - forgot to add it.
It's currently #17 top crasher in 8.0b1 and happens mainly close to startup.
(In reply to Marcia Knous [:marcia] from comment #7)
> First instance of Aurora crashes showed up in crash stats using the 2011082600
> build.
I see crashes before:
https://crash-stats.mozilla.com/report/list?range_value=30&range_unit=days&signature=nsLocalFile%3A%3ACopyMove%28nsIFile*%2C%20nsAString_internal%20const%26%2C%20int%2C%20int%29&version=Firefox:8.0a2

> First instance of trunk crashes showed up in crash stats using the 2011090800
> build.
I see crashes before:
https://crash-stats.mozilla.com/report/list?range_value=30&range_unit=days&signature=nsLocalFile%3A%3ACopyMove%28nsIFile*%2C%20nsAString_internal%20const%26%2C%20int%2C%20int%29&version=Firefox:9.0a1

The regression range is probably in 8.0a1.
Version: unspecified → 8 Branch
(In reply to Marcia Knous [:marcia] from comment #7)
> The bug noted in Comment 2 never landed on Aurora and is only on mozilla
> Central, so there may possibly be different causes for this crash?
The patch landed in 8.0a1, so it could be the culprit because this crash was already existing in 8.0a2 (see comment 10).

One comment says it doesn't happen in Private Browsing.
Component: Networking: File → Networking: Cache
QA Contact: networking.file → networking.cache
Assignee: nobody → michal.novotny
It rised to #13 top crasher in 8.0b2.
Attached patch fixSplinter Review
This crash occurs when the directory is deleted after the call to Exists() at http://hg.mozilla.org/mozilla-central/annotate/ca73f057dab7/xpcom/io/nsLocalFileWin.cpp#l1633 and before call to GetDirectoryEntries() at http://hg.mozilla.org/mozilla-central/annotate/ca73f057dab7/xpcom/io/nsLocalFileWin.cpp#l1651.

The patch fixes just the crash and doesn't fix the race condition in the cache deleting logic. IMO it needs to be completely rewritten and I'll file a follow-up bug for this issue.
Attachment #567105 - Flags: review?(benjamin)
Assignee: michal.novotny → nobody
Component: Networking: Cache → Networking: File
QA Contact: networking.cache → networking.file
Assignee: nobody → michal.novotny
This is now the #8 topcrash in 8.*, I hope we can get that reviewed and landed really soon!
Comment on attachment 567105 [details] [diff] [review]
fix

Review of attachment 567105 [details] [diff] [review]:
-----------------------------------------------------------------

Michal, were you able to reproduce the crash on Windows? And, were you able to verify experimentally that the fix avoids the crash?

::: xpcom/io/nsLocalFileWin.cpp
@@ +1649,5 @@
>  
>              nsCOMPtr<nsISimpleEnumerator> targetIterator;
>              rv = target->GetDirectoryEntries(getter_AddRefs(targetIterator));
> +            if (NS_FAILED(rv)
> +                return rv;

There is a missing closing parenthesis here.
(In reply to Brian Smith (:bsmith) from comment #16)
> Michal, were you able to reproduce the crash on Windows? And, were you able
> to verify experimentally that the fix avoids the crash?

I was able to reproduce the crash on Windows exactly at this place, so this patch should fix it. I'll try to run my test with this patch, but as I said it doesn't fix the race condition while deleting the trash. I expect that with this patch the disk cache will stop working when MoveToNative() fails, because we won't initialize the disk device in nsDiskCacheDevice::ClearDiskCache(). BTW the follow-up bug is #695003.
I've verified that and it works as I described. The patch fixes the crash but the disk device stays in an uninitialized state.
Comment on attachment 567105 [details] [diff] [review]
fix

Review of attachment 567105 [details] [diff] [review]:
-----------------------------------------------------------------

r=bsmith for preventing the crash. Bug 695003 is the follow-up bug for actually fixing the race condition.

nsLocalFileUnix.cpp already contains this error check.
Attachment #567105 - Flags: review+
Michal, make sure you fix the syntax error I mentioned in comment 16 before you check this in.
http://hg.mozilla.org/integration/mozilla-inbound/rev/1eb208e16555
Whiteboard: [inbound]
Target Milestone: --- → mozilla10
Comment on attachment 567105 [details] [diff] [review]
fix

This is one of the top 10 crashers in Firefox 8, so requesting approval for Aurora and Beta. Even if that only fixes the crash and not the underlying problem, I think we're better off with some problems deleting the cache than with such an action crashing the whole application.
Attachment #567105 - Flags: approval-mozilla-beta?
Attachment #567105 - Flags: approval-mozilla-aurora?
> I expect that with this patch the disk cache will stop working when 
> MoveToNative() fails

Note to drivers: the above sounds scary, but is actually much better than a crash (the browser won't use the disk cache for their current session only:  the RAM cache still works, so it's not a huge perf hit, and then things return to normal at the next restart).  IMO worth landing to aurora/beta to fix the crashes.
https://hg.mozilla.org/mozilla-central/rev/1eb208e16555
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Crash Signature: [@ nsLocalFile::CopyMove(nsIFile*, nsAString_internal const&, int, int)] → [@ nsLocalFile::CopyMove(nsIFile*, nsAString_internal const&, int, int)] [@ nsLocalFile::CopyMove(nsIFile*, nsAString_internal const&, bool, bool) ]
Comment on attachment 567105 [details] [diff] [review]
fix

Approved per todays triage meeting.
Attachment #567105 - Flags: approval-mozilla-beta?
Attachment #567105 - Flags: approval-mozilla-beta+
Attachment #567105 - Flags: approval-mozilla-aurora?
Attachment #567105 - Flags: approval-mozilla-aurora+
---------------------------------[ Triage Comment ]---------------------------------

Tracking as this is a top 10 crasher for Firefox 8 while in beta.
Attachment #567105 - Flags: review?(benjamin) → review+
Blocks: 695003
Other than checking Socorro, is there a testcase QA can use to verify this fix?
Whiteboard: [inbound] → [inbound][qa?]
There is a testcase in the attachments. I cannot crash using Mozilla/5.0 (Windows NT 5.1; rv:11.0a1) Gecko/20111122 Firefox/11.0a1 - my Win 7 VM is not working ATM.

(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #29)
> Other than checking Socorro, is there a testcase QA can use to verify this
> fix?
Whiteboard: [inbound][qa?] → [inbound][qa+]
(In reply to Simona B [QA] from comment #31)
> Can this bug be reopened?
File a new one.
(In reply to Scoobidiver from comment #32)
> (In reply to Simona B [QA] from comment #31)
> > Can this bug be reopened?
> File a new one.

Would be better to explain why here to not sound rude. :)

If it's the same issue and not (completely) fixed, reopening is in order. If the original case was fixed but there's a different one with similar symptoms (which may be the case here), a new bug is the best thing to do, as the solution might need to be completely different.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #33)
> Would be better to explain why here to not sound rude. :)

These crashes have a completely different backtrace. It could be that there is still some problem in nsLocalFile::CopyMove() but it is definitely on a different line.
(In reply to Michal Novotny (:michal) from comment #34)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #33)
> > Would be better to explain why here to not sound rude. :)
> 
> These crashes have a completely different backtrace. It could be that there
> is still some problem in nsLocalFile::CopyMove() but it is definitely on a
> different line.

Yes, sure. I was just explaining why a new bug is useful here, as I found comment #32 to sound rude without such an explanation.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #35)
> Yes, sure. I was just explaining why a new bug is useful here, as I found
> comment #32 to sound rude without such an explanation.
Sorry for that. I filed bug 704817 instead.
Verified issue using the test case attached in Comment 25. I cannot crash using: 
Mozilla/5.0 (Windows NT 6.1; rv:9.0) Gecko/20100101 Firefox/9.0
Mozilla/5.0 (Windows NT 6.1; rv:10.0a2) Gecko/20111128 Firefox/10.0a2
Mozilla/5.0 (Windows NT 6.1; rv:11.0a1) Gecko/20111127 Firefox/11.0a1

Based on Comment 34 setting resolution to VERIFIED FIXED.
Status: RESOLVED → VERIFIED
Whiteboard: [inbound][qa+] → [inbound][qa!]
You need to log in before you can comment on or make changes to this bug.