Closed Bug 1125940 Opened 10 years ago Closed 10 years ago

Monkey Crash [@ mozilla::dom::File::IsFile]

Categories

(Core :: DOM: Core & HTML, defect, P3)

ARM
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED FIXED
mozilla38
blocking-b2g 2.2+
Tracking Status
firefox36 --- wontfix
firefox37 --- fixed
firefox38 --- fixed
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: ntroast, Assigned: baku)

References

Details

(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 490][caf priority: p1][CR 786703])

Crash Data

Attachments

(13 files, 2 obsolete files)

1001 bytes, patch (khuey: review+)
We have been observing the following crash during monkey runs.

[@ mozilla::dom::File::IsFile | mozilla::dom::File::QueryInterface | nsQueryInterface::operator() | nsCOMPtr_base::assign_from_qi ]

STR not available. cafbot will upload minidump.
Attached file decoded minidump -
We've seen this crash signature on both kk and L builds, around 100 times total over past couple of weeks.
Flags: needinfo?(bbajaj)
Whiteboard: [CR 786703]
Whiteboard: [CR 786703] → [caf priority: p1][CR 786703]
Whiteboard: [caf priority: p1][CR 786703] → [b2g-crash][caf-crash 490][caf priority: p1][CR 786703]
Keywords: crash
Observed on: 

Device: msm8909
Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.054
Moz BuildID: 20150120002507
B2G Version: 2.2
Gecko Version: 37.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=f5b3d1b6cfa3e702033f613915ae637cb735cbfb
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=bb6c4d3fc51281f1a2ee0ff471221dbc1d0d1035
Patches: bug 1067629, bug 1117862, bug 1122162, bug 1119980, bug 1091307, bug 1106637
:sku/ken,

Can we please find an owner for this?
blocking-b2g: 2.2? → 2.2+
Flags: needinfo?(sku)
Flags: needinfo?(kchang)
Flags: needinfo?(bbajaj)
Hi Aknow,
  Can you please take this bug? If you are not available, please ni back to me.
Thanks.
Flags: needinfo?(kchang) → needinfo?(szchen)
(bumping priority as we now see this crash as part of basic build sanity testing)
Severity: major → blocker
Priority: -- → P1
Hi,
 Could you help clarify the question below?

This crash was hit while generating a memory report.
Does your monkey script dump memory info?


Any of the following might be the root cause of this issue:
1. Someone requested a memory dump, but the File object was deleted before the dump finished.
2. The data partition is full, so there is no space available for the memory dump.
3. Some other timing issue.


0|0|libxul.so|mozilla::dom::File::IsFile|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/dom/base/File.cpp|315|0x0
0|1|libxul.so|mozilla::dom::File::QueryInterface|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/dom/base/File.cpp|155|0x5
0|2|libxul.so|nsQueryInterface::operator()|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/xpcom/glue/nsCOMPtr.cpp|14|0x5
0|3|libxul.so|nsCOMPtr_base::assign_from_qi|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/xpcom/glue/nsCOMPtr.cpp|59|0x7
0|4|libxul.so|xpc::OrphanReporter::sizeOfIncludingThis|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/out/target/product/msm8610/obj/objdir-gecko/dist/include/nsCOMPtr.h|482|0x3
0|5|libxul.so|StatsCellCallback<(Granularity)0u>|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/js/src/vm/MemoryMetrics.cpp|426|0x7
0|6|libxul.so|IterateCompartmentsArenasCells|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/js/src/gc/Iteration.cpp|49|0x3
0|7|libxul.so|js::IterateZonesCompartmentsArenasCells|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/js/src/gc/Iteration.cpp|66|0x11
0|8|libxul.so|JS::CollectRuntimeStats|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/js/src/vm/MemoryMetrics.cpp|722|0x3
0|9|libxul.so|xpc::JSReporter::CollectReports|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/js/xpconnect/src/XPCJSRuntime.cpp|2824|0xd
0|10|libxul.so|nsWindowMemoryReporter::CollectReports|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/dom/base/nsWindowMemoryReporter.cpp|547|0xf
0|11|libxul.so|nsMemoryReporterManager::GetReportsForThisProcessExtended|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/xpcom/base/nsMemoryReporterManager.cpp|1270|0x9
0|12|libxul.so|nsMemoryReporterManager::StartGettingReports|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/xpcom/base/nsMemoryReporterManager.cpp|1206|0xf
0|13|libxul.so|nsMemoryReporterManager::GetReportsExtended|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/xpcom/base/nsMemoryReporterManager.cpp|1183|0x5
0|14|libxul.so|DumpMemoryInfoToFile|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/xpcom/base/nsMemoryInfoDumper.cpp|670|0x17
0|15|libxul.so|nsMemoryInfoDumper::DumpMemoryInfoToTempDir|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/xpcom/base/nsMemoryInfoDumper.cpp|760|0xd
0|16|libxul.so|DumpMemoryInfoToTempDirRunnable::Run|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/xpcom/base/nsMemoryInfoDumper.cpp|76|0xd
0|17|libxul.so|nsThread::ProcessNextEvent|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/xpcom/threads/nsThread.cpp|855|0x5
0|18|libxul.so|NS_ProcessNextEvent|/local/mnt/workspace/lnxbuild/project/trees_in_use/free_tree_b2g_manifest_LF.BF.1.1_msm8610_commander_7081765/checkout/gecko/xpcom/glue/nsThreadUtils.cpp|265|0xb
Flags: needinfo?(ntroast)
Observed on: 

Device: msm8909
Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.060
Moz BuildID: 20150201002504
B2G Version: 2.2
Gecko Version: 37.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=d6141fa3208f224393269e17c39d1fe53b7e6a05
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=29ba8859a5dcf7c207ba2f846a99dbe3a4232bb5
Patches: bug 1067629, bug 1117862, bug 1091307, bug 1106637
Our monkey script is collecting gecko memory reports.

We found that a file was filling the /data partition, so it is likely the condition that causes Gecko to crash when collecting a memory report.

Gecko memory report should gracefully fail if the /data partition is full, or if the /data partition is filled during the collection process.
Flags: needinfo?(ntroast)
Hi Nicholas,
 Can you please share the available storage on the problematic device using the command below?

> adb shell df /data
Filesystem               Size     Used     Free   Blksize
/data                    2.0G    53.2M     2.0G   4096

1. If /data really is being filled by memory reports (triggered by monkey), I don't think this should be a P1 issue.
2. From the user's perspective, there is no way for a user to fill /data completely, because of the 5MB low-storage threshold.

In short, if case #1 is real, this issue should be fixed as a long-term solution, but it is not a P1 blocker.
Flags: needinfo?(ntroast)
(In reply to shawn ku [:sku] from comment #11)
> In short, if #1 case is real, this issue should be fixed as a long term
> solution, but not P1 blocker.

Yep, I agree too.  It's not a blocker but just something to resolve for general device stability -- programs (not the user) could potentially fill up /data and cause the same crash in the field.
Severity: blocker → normal
Priority: P1 → P3
Clearing my ni because Shawn has already been working on this and has provided some feedback. Also, it seems the bug is not a blocker now.
Flags: needinfo?(szchen)
/data had 0.0G Free when I checked it on the problematic device
Flags: needinfo?(ntroast)
[Blocking Requested - why for this release]:

Hi Ken:
 Please help assign someone to study and finish this task, per the conversation we have had on this bug.

I expect this issue can be fixed on m-c first by gracefully handling the zero-free-space condition. Even if users will not hit this, programs might fill the partition from the Gecko, Gonk, or even kernel side.

Thanks!!
Shawn
blocking-b2g: 2.2+ → 3.0?
Flags: needinfo?(sku) → needinfo?(kchang)
Hi Thinker,
  Who is familiar with this part? It seems that we need to check the remaining storage on the data partition before creating a memory report.
Flags: needinfo?(kchang) → needinfo?(tlee)
Or instead of checking available disk space beforehand, just gracefully exit if a write fails, etc.  Checking space beforehand is no guarantee another program won't take it away later.
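A minimal sketch of the "fail gracefully on write error" approach suggested above, in plain C++ rather than Gecko code. `WriteReport` and its parameters are hypothetical names chosen for illustration; the point is only that every step that can hit a full partition (open, write, flush, close) is checked, so no up-front free-space query is needed.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper (not Gecko code): writes `data` to `path` and reports
// any failure through `error` instead of assuming the write succeeded.
bool WriteReport(const std::string& path, const std::string& data,
                 std::string* error) {
  assert(error != nullptr);

  FILE* f = std::fopen(path.c_str(), "wb");
  if (!f) {
    *error = std::string("open failed: ") + std::strerror(errno);
    return false;
  }

  // Remember the first failing step and the errno it produced.
  bool failed = false;
  int err = 0;
  auto note = [&](bool stepFailed) {
    if (stepFailed && !failed) {
      failed = true;
      err = errno;
    }
  };

  note(std::fwrite(data.data(), 1, data.size(), f) != data.size());
  note(std::fflush(f) != 0);  // buffered data may only fail (e.g. ENOSPC) here
  note(std::fclose(f) != 0);

  if (failed) {
    std::remove(path.c_str());  // do not leave a truncated report behind
    *error = std::string("write failed: ") + std::strerror(err);
    return false;
  }
  return true;
}
```

A caller would log `error` and abandon the report when `WriteReport` returns false. Because the check happens at write time, it still holds if another program fills the partition after the report has started.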
Observed on: 

Device: msm8909
Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.063
Moz BuildID: 20150204002509
B2G Version: 3.0
Gecko Version: 37.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=a4c4cc86303a554facb8f45b7e764e5c4473c3de
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=5523db61e51c5263191d5df2f4b40f78c82e8746
Patches: bug 1067629, bug 1117862, bug 1091307, bug 1106637
(In reply to Michael Vines [:m1] [:evilmachines] from comment #17)
> Or instead of checking available disk space beforehand, just gracefully exit
> if a write fails, etc.  Checking space beforehand is no guarantee another
> program won't take it away later.

I agree with you partly!
All we need to do is print an error message explaining the reason, and crash the program at that point immediately.
Flags: needinfo?(tlee) → needinfo?(sku)
Flags: needinfo?(sku)
This crash has been observed on v2.2 when /data is not full.
blocking-b2g: 3.0? → 2.2?
triage: blocking CAF-FL-2.2
blocking-b2g: 2.2? → 2.2+
Comment on attachment 8563255 [details]
EXTRA file attachment - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.071

(cafbot screwed up here)
Attachment #8563255 - Attachment is obsolete: true
Comment on attachment 8563256 [details]
decoded minidump - AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.071

(cafbot screwed up here)
Attachment #8563256 - Attachment is obsolete: true
Can we get somebody on this bug please?
Flags: needinfo?(tlee)
Hi Michael:
 There is no new attachment for the case mentioned in comment 21.
Please have someone update Bugzilla first.

Besides that, a few things to update:
1. How much storage is available on the device?
2. Since there is no clear STR, can Mozilla set up monkey to repro this issue ourselves? Who can help with this?
Flags: needinfo?(mvines)
(In reply to shawn ku [:sku] from comment #30)
> Hi Michael:
>  There is no new attachment for case mentioned in comment 21.
> Please have a person to update bugzilla first.

cafbot just uploaded the latest occurrence of this crash, from a build based on Mozilla build ID 20150211183853.  We see this crash consistently a couple times a day still

> Besides, few things to update too.
> 1. available storage left on device?

Filesystem               Size     Used     Free   Blksize
/dev                   437.5M    52.0K   437.4M   4096
/sys/fs/cgroup         437.5M    12.0K   437.4M   4096
/mnt/asec              437.5M     0.0K   437.5M   4096
/mnt/obb               437.5M     0.0K   437.5M   4096
/system                  1.2G   300.4M   880.4M   4096
/data                    1.7G    54.7M     1.6G   4096
/cache                 248.0M   156.0K   247.8M   4096
/persist                27.5M    92.0K    27.4M   4096
/firmware               64.0M    48.5M    15.4M   16384
/mnt/media_rw/sdcard     7.4G   291.0M     7.1G   32768
/mnt/secure/asec         7.4G   291.0M     7.1G   32768
/storage/sdcard          7.4G   291.0M     7.1G   32768

> 2. Since there is no clear STR, can Mozilla setup monkey to repo. this issue
> by our-self? Who can help this topic?

https://github.com/mozilla-b2g/orangutan can probably help.  I'm a little surprised the Mozilla QA team isn't already running monkey on v2.2 though.
Flags: needinfo?(mvines)
The last logcat entries before the crash (which are consistent with other instances of this crash as well) from the .extra file might contain a hint?

---
01-01 00:07:56.831   944   944 D slogger : Triggered Gecko memory report for iteration 0
01-01 00:07:56.832  3356  3370 I Gecko:DumpUtils: FifoWatcher(command:memory report) dispatching memory report runnable.
01-01 00:07:56.832  3356  3370 I Gecko:DumpUtils: FifoWatcher closing and re-opening fifo.
01-01 00:07:56.834  3356  3356 I DMD     : opened /data/local/tmp/memory-reports/dmd-476-4122.json.gz for writing
01-01 00:07:56.837  3356  3356 I DMD     : opened /data/local/tmp/memory-reports/dmd-476-3935.json.gz for writing
01-01 00:07:56.840  3356  3356 I DMD     : opened /data/local/tmp/memory-reports/dmd-476-3356.json.gz for writing
01-01 00:07:59.103  3356  3701 E libsuspend: Error reading from /sys/power/wakeup_count: Interrupted system call
---
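The FifoWatcher lines in this log correspond to a command string arriving on Gecko's debug FIFO. As a rough sketch of what a writer on the other end might look like in C++ with POSIX calls: `SendTrigger` is a hypothetical helper, and the exact framing the watcher expects (for example whether a trailing newline matters) is an assumption not confirmed in this bug.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <string>

// Hypothetical helper: writes `command` to the trigger at `path`.
// O_NONBLOCK makes open() fail (ENXIO) on a FIFO that has no reader,
// instead of blocking the caller until one appears.
bool SendTrigger(const std::string& path, const std::string& command) {
  assert(!path.empty());

  int fd = open(path.c_str(), O_WRONLY | O_NONBLOCK);
  if (fd < 0) {
    return false;  // no such path, or nobody is listening on the FIFO
  }

  ssize_t n = write(fd, command.data(), command.size());
  close(fd);

  // Treat a short or failed write as failure; the command must go out whole.
  return n == static_cast<ssize_t>(command.size());
}
```

On a device this would be called as `SendTrigger("/data/local/debug_info_trigger", "memory report")`, using the path and command text reported in this bug.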

"slogger" is a logging daemon (which has been unmodified and active for months before this crash was first observed) that is requesting a Gecko memory report by writing "memory report" to the /data/local/debug_info_trigger pipe, much like what this code does -- https://github.com/mozilla-b2g/B2G/blob/3a3278e0f3bac37741a81c9a793dd836e7622f9a/tools/get_about_memory.py#L212-L220
(In reply to Michael Vines [:m1] [:evilmachines] from comment #33)
> (In reply to shawn ku [:sku] from comment #30)
> > Hi Michael:
> >  There is no new attachment for case mentioned in comment 21.
> > Please have a person to update bugzilla first.
> 
> cafbot just uploaded the latest occurrence of this crash, from a build based
> on Mozilla build ID 20150211183853.  We see this crash consistently a couple
> times a day still
> 
> > Besides, few things to update too.
> > 1. available storage left on device?
> 
> Filesystem               Size     Used     Free   Blksize
> /dev                   437.5M    52.0K   437.4M   4096
> /sys/fs/cgroup         437.5M    12.0K   437.4M   4096
> /mnt/asec              437.5M     0.0K   437.5M   4096
> /mnt/obb               437.5M     0.0K   437.5M   4096
> /system                  1.2G   300.4M   880.4M   4096
> /data                    1.7G    54.7M     1.6G   4096
> /cache                 248.0M   156.0K   247.8M   4096
> /persist                27.5M    92.0K    27.4M   4096
> /firmware               64.0M    48.5M    15.4M   16384
> /mnt/media_rw/sdcard     7.4G   291.0M     7.1G   32768
> /mnt/secure/asec         7.4G   291.0M     7.1G   32768
> /storage/sdcard          7.4G   291.0M     7.1G   32768
> 
> > 2. Since there is no clear STR, can Mozilla setup monkey to repo. this issue
> > by our-self? Who can help this topic?
> 
> https://github.com/mozilla-b2g/orangutan can probably help.  I'm a little
> surprised the Mozilla QA team isn't already running monkey on v2.2 though.

Michael:
 For the QA part, I need to check with them about the status of monkey.
The ni is just to avoid more back-and-forth.

Thanks for your information.
After talking with Shawn, the situation with this bug is more complicated than I had realized. It seems to include multiple issues. Shawn will follow up on this bug to figure out all of them; then we may open more bugs and assign them to someone.
Flags: needinfo?(tlee)
(In reply to Thinker Li [:sinker] from comment #36)
> After talk with Shawn, the situation of this bug more complicate than what I
> had acknowledged.  It seems includes multiple issues.  Shawn would follow up
> this bug to figure out all of them, then we may open more bugs and assign
> bugs to someone.

Sounds a little nebulous, but thanks.  What are the next steps Shawn?  We continue to see this crash multiple times a day in our automation.   Who will carry this forward while Taipei is on holiday?
Flags: needinfo?(sku)
Mike:
 Due to the CNY holidays, there is no one available in TPE to look into this issue. Could you please help find someone to check the issue first?
Flags: needinfo?(mlee)
Andrew,

Can you have someone on the DOM team help investigate this as the Crash signature points to mozilla::dom::File::IsFile and mozilla::dom::File::QueryInterface?

Thanks,
Mike
Component: Stability → DOM
Flags: needinfo?(mlee) → needinfo?(overholt)
Product: Firefox OS → Core
A file that's already been unlinked (and thus had mImpl set to null), perhaps?
Flags: needinfo?(overholt) → needinfo?(amarchesini)
We probably just should not unlink mImpl.  We do similar stuff in IDB.  As long as the object graph that survives cycle collection is acyclic it'll all get torn down appropriately in the end anyways.
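To make the hypothesis concrete, here is a simplified model in standalone C++. `FakeFile` and `FakeFileImpl` are hypothetical stand-ins, not the Gecko classes, and the sketch assumes (as the stack suggests) that `IsFile()` forwards to `mImpl` and that the QueryInterface path calls `IsFile()`. It shows why an object that has been unlinked but is still reachable, for example by a memory reporter, is only safe to query if unlink leaves `mImpl` in place.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for FileImpl and File; not the Gecko classes.
struct FakeFileImpl {
  bool isFile;
  bool IsFile() const { return isFile; }
};

class FakeFile {
 public:
  explicit FakeFile(bool isFile)
      : mImpl(std::make_shared<FakeFileImpl>(FakeFileImpl{isFile})) {}

  // Models cycle-collection unlink. dropImpl == true mimics clearing mImpl
  // during unlink; dropImpl == false mimics leaving mImpl alone.
  void Unlink(bool dropImpl) {
    if (dropImpl) {
      mImpl.reset();
    }
  }

  bool HasImpl() const { return mImpl != nullptr; }

  // Forwards to the impl. With a null mImpl this would be the null
  // dereference in the crash signature; the sketch makes it an explicit
  // assertion instead.
  bool IsFile() const {
    assert(mImpl && "IsFile() called after mImpl was unlinked");
    return mImpl->IsFile();
  }

  // Models the QueryInterface path: the answer depends on IsFile().
  bool SupportsFileInterface() const { return IsFile(); }

 private:
  std::shared_ptr<FakeFileImpl> mImpl;
};
```

With `Unlink(false)` the object can still answer `SupportsFileInterface()` afterwards. With `Unlink(true)` the same call would trip the assertion, which mirrors the reported crash.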
Nathan,

Can you help investigate this crash or get this to the engineer who can? You appear to have done the code review for the section of code referenced in the crash signature: nsCOMPtr_base::assign_from_qi

Thanks,
Mike
Flags: needinfo?(nfroyd)
(In reply to Mike Lee [:mlee] from comment #54)
> Can you help investigate this crash or get this to the engineer who can? You
> appear to have done the code review for the section of code referenced in
> the crash signature: nsCOMPtr_base::assign_from_qi

It looks much more likely that the problem resides in mozilla::dom::File, so I think the request to Andrea is sufficient here.
Flags: needinfo?(nfroyd)
Attached patch file.patch (Splinter Review)
Flags: needinfo?(amarchesini)
Attachment #8566174 - Flags: review?(khuey)
Comment on attachment 8566174 [details] [diff] [review]
file.patch

Review of attachment 8566174 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/File.cpp
@@ +128,5 @@
>  
>  NS_IMPL_CYCLE_COLLECTION_CLASS(File)
>  
>  NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN(File)
> +  // No unlink for mImpl bacause FileImpl is not CC-able and it's needed for QI.

I would just drop this comment entirely.

@@ +134,5 @@
>    NS_IMPL_CYCLE_COLLECTION_UNLINK_PRESERVED_WRAPPER
>  NS_IMPL_CYCLE_COLLECTION_UNLINK_END
>  
>  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_BEGIN(File)
>    // No traverse for mImpl bacause FileImpl is not CC-able.

and this one.
Attachment #8566174 - Flags: review?(khuey) → review+
Setting the assignee to :baku since he is working on the patch.
Assignee: nobody → amarchesini
Comment on attachment 8566174 [details] [diff] [review]
file.patch

We pulled this patch into a couple monkey runs overnight and they could not reproduce the crash.  More testing tonight, but this looks good so far.  Thanks!
Attachment #8566174 - Flags: feedback+
Flags: needinfo?(sku)
https://hg.mozilla.org/mozilla-central/rev/3fea2967b275
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Comment on attachment 8566174 [details] [diff] [review]
file.patch

User impact if declined:  Their device will crash 
Testing completed: Crash has not been reproduced by hours of monkey testing with this patch applied
Attachment #8566174 - Flags: approval-mozilla-b2g37?
Crash Signature: [@ mozilla::dom::File::IsFile | mozilla::dom::File::QueryInterface | nsQueryInterface::operator() | nsCOMPtr_base::assign_from_qi ] → [@ mozilla::dom::File::IsFile | mozilla::dom::File::QueryInterface | nsQueryInterface::operator() | nsCOMPtr_base::assign_from_qi ] [@ mozilla::dom::File::IsFile() const ]
Attachment #8566174 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Comment on attachment 8566174 [details] [diff] [review]
file.patch

We should take this on branch too.
Attachment #8566174 - Flags: approval-mozilla-beta?
Does this fix address a crash in Firefox desktop/mobile as well, or is the uplift about keeping the branches in sync?
Flags: needinfo?(khuey)
Yes, this could crash any product when taking memory reports.
Flags: needinfo?(khuey)
Comment on attachment 8566174 [details] [diff] [review]
file.patch

Let's take this in 37 Beta as per khuey's request in comment 65. Simple crash fix. Beta+
Attachment #8566174 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
No longer blocks: CAF-v3.0-FL-metabug
Component: DOM → DOM: Core & HTML