Closed
Bug 1038461
Opened 11 years ago
Closed 10 years ago
Failed to map ion memory in the client on gonk
Categories
(Core :: Graphics: Layers, defect, P1)
People
(Reporter: tkundu, Assigned: sotaro)
References
Details
(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 217][caf priority: p1][CR 686674])
Attachments
(8 files, 4 obsolete files)
- 4.94 KB, patch
- 4.08 MB, application/x-bzip
- 144.02 KB, text/plain
- 7.20 KB, patch
- 3.01 KB, patch
- 1.12 MB, application/x-bzip
- 4.60 KB, patch
- 3.03 KB, patch
+++ This bug was initially created as a clone of Bug #1034294 +++
This bug is created from Bug 1034294 comment 32.
Logs :
---------------------------------------------------------
07-14 16:25:00.210 8506 8536 E qdmemalloc: ion: Failed to map memory in the client: Invalid argument
07-14 16:25:00.210 8506 8536 E qdgralloc: Could not mmap handle 0xb0c55c40, fd=45 (Invalid argument)
07-14 16:25:00.210 8506 8536 E qdgralloc: gralloc_register_buffer: gralloc_map failed
07-14 16:25:00.210 8506 8536 W GraphicBufferMapper: registerBuffer(0xb0c55c40) failed -22 (Invalid argument)
07-14 16:25:00.210 8506 8536 E GraphicBuffer: unflatten: registerBuffer failed: Invalid argument (-22)
07-14 16:25:00.210 8506 8536 I Gecko : ParamTraits<MagicGrallocBufferHandle>::Read() failed to get gralloc buffer
07-14 16:25:00.220 8506 8536 I Gecko : IPDL protocol error: Error deserializing 'MaybeMagicGrallocBufferHandle'
07-14 16:25:00.220 8506 8536 I Gecko : [Child 8506] ###!!! ABORT: IPDL error [PSharedBufferManagerChild]: "Error deserializing 'MaybeMagicGrallocBufferHandle'". abort()ing as a result.: file ../../../../../../../../gecko/ipc/glue/ProtocolUtils.cpp, line 198
07-14 16:25:00.230 232 910 W Adreno-GSL: <gsl_ldd_control:412>: ioctl fd 141 code 0xc02c093d (IOCTL_KGSL_SUBMIT_COMMANDS) failed: errno 22 Invalid argument
07-14 16:25:00.230 8506 8536 E Gecko : mozalloc_abort: [Child 8506] ###!!! ABORT: IPDL error [PSharedBufferManagerChild]: "Error deserializing 'MaybeMagicGrallocBufferHandle'". abort()ing as a result.: file ../../../../../../../../gecko/ipc/glue/ProtocolUtils.cpp, line 198
Crash logs: Please see attachment in bug 1034294 comment 32
Updated•11 years ago (Reporter)
Blocks: CAF-v2.0-FC-metabug
blocking-b2g: --- → 2.0?
Updated•11 years ago (Reporter)
Flags: needinfo?(sotaro.ikeda.g)
Comment 1•11 years ago
Observed on:
Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.029
Moz BuildID: 20140710000201
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=35a9b715e7348ec738ff6c8a59f50190390a06f2
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=2fb60c777d3f82d580cba249e5e01a167a01de39
Comment 2•11 years ago (Assignee)
I am not sure that this crash is caused by a problem in SharedBufferManagerParent. Bug 1034294 was caused by a locking problem in SharedBufferManagerParent. The crash log did not contain a log like the one in Comment 0.
I do not see how SharedBufferManagerParent could cause a failure like the one in the following log. It says that the file descriptor delivered to the client side is invalid. I wonder whether the problem has a cause other than SharedBufferManagerParent.
> E qdmemalloc: ion: Failed to map memory in the client: Invalid argument
Flags: needinfo?(sotaro.ikeda.g)
Comment 3•11 years ago (Assignee)
The log in comment 0 reminds me of Bug 1036905, although that bug happens only on the Peak device and is easy to reproduce.
Updated•11 years ago
Assignee: nobody → sotaro.ikeda.g
Comment 4•11 years ago (Assignee)
Updated the summary to a more accurate one.
Summary: Fix SharedBufferManagerParent → Failed to map ion memory in the client on gonk
Updated•11 years ago
blocking-b2g: 2.0? → 2.0+
Comment 5•11 years ago
How/when did we move from pmem to ion?
Comment 6•11 years ago (Assignee)
(In reply to Milan Sreckovic [:milan] from comment #5)
> How/when did we move from pmem to ion?
Since JB, Android provides ion. pmem is vendor-specific RAM.
Comment 7•11 years ago (Assignee)
Here are some links that explain ion:
- http://groleo.wordpress.com/2012/07/24/ion-buffer-sharing-mechanism/
- https://wiki.linaro.org/BenjaminGaignard/ion
- http://lwn.net/Articles/480055/
Comment 8•11 years ago
:nical, take a look, is there anything that stands out here?
Flags: needinfo?(nical.bugzilla)
Comment 9•11 years ago
(In reply to Milan Sreckovic [:milan] from comment #8)
> :nical, take a look, is there anything that stands out here?
Not much at this point. Looking at mmap in bionic, it can fail if the passed offset isn't 4096-byte aligned. I suppose there is a sanity check on the size somewhere too, but since these two parameters are generated by Android's gralloc code, it's hard to believe the issue is here. There is also __mmap2, which can fail, but I don't think I have access to its source code.
Somewhere in the XDA forums I read about someone seeing gralloc errors caused by some shared memory that was opened twice when it should have been opened once (the description was as blurry as this), but we are opening a freshly created GraphicBuffer, so it doesn't look like a good candidate either.
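To illustrate the alignment failure mode mentioned in this comment, here is a minimal sketch in C. It is only an illustration of generic mmap() behavior on a scratch file, not the gralloc/ion code path and not a reproduction of this bug; the helper name mmap_errno_for_offset is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map one page of a scratch file at the given
 * file offset. Returns 0 on success, or the errno set by the call
 * that failed. */
int mmap_errno_for_offset(off_t offset)
{
    FILE *tmp = tmpfile();               /* scratch file, removed on close */
    assert(tmp != NULL);
    int fd = fileno(tmp);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    /* Grow the file so that a failure is caused by the offset alone,
     * not by mapping beyond the end of the file. */
    if (ftruncate(fd, (off_t)(4 * page)) != 0) {
        int err = errno;
        fclose(tmp);
        return err;
    }

    int err = 0;
    void *p = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, offset);
    if (p == MAP_FAILED)
        err = errno;                     /* EINVAL for a misaligned offset */
    else
        munmap(p, page);

    fclose(tmp);
    return err;
}
```

Calling mmap_errno_for_offset(0) should succeed, while an offset that is not a multiple of the page size, such as 1, should be rejected with EINVAL, the same errno seen in the log. This only shows that a misaligned offset is one possible source of EINVAL; it does not show that it is the source here.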
Flags: needinfo?(nical.bugzilla)
Comment 10•11 years ago
I don't think it makes a difference but sBufferKey is uint64 while the key we pass in GrallocBufferRef is int64. We should use the same type if only for clarity.
GrallocBufferRef has a default constructor with "invalid" values in it but should always have its members set by the time we deserialize it. It'd be interesting to double check when deserializing that the GrallocBufferRef we receive isn't equal to a default constructed GrallocBufferRef.
Perhaps the mOwner member in SharedBufferManagerParent has an incorrect value? I don't see checks but I haven't read through all of the code.
Comment 11•11 years ago (Assignee)
Bug 1039883 reduces gralloc buffer allocations and ion mappings in the application process. It might address the problem.
Comment 12•11 years ago (Assignee)
Tapas, can you confirm whether Bug 1039883 addresses the problem?
Flags: needinfo?(tkundu)
Comment 13•11 years ago (Reporter)
(In reply to Sotaro Ikeda [:sotaro PTO July/25 - Aug/3] from comment #12)
> Tapas, can you confirm if Bug 1039883 address the problem?
Sure. Our internal test team is testing with that patch. We will confirm soon.
Comment 14•11 years ago
Observed on:
Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.035
Moz BuildID: 20140713000201
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=ca022f811bcbbda0f89086094a9e92bb220fea18
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=be6908fec84d3e39453275da96c031336f58f23d
Comment 15•11 years ago
(In reply to Tapas Kumar Kundu from comment #13)
> (In reply to Sotaro Ikeda [:sotaro PTO July/25 - Aug/3] from comment #12)
> > Tapas, can you confirm if Bug 1039883 address the problem?
>
> Sure. Our internal test team is testing with that patch. We will confirm
> soon.
Tapas, can we ignore the cafbot comment in comment #14, since we got a newer patch in Bug 1039883? Does that sound reasonable?
Comment 16•11 years ago (Reporter)
(In reply to bhavana bajaj [:bajaj] [On PTO until July 27 ] from comment #15)
> Tapas ignoring the cafbot comment in comment #14 as we got a latest patch in
> Bug 1039883 ? Sounds reasonable ?
Yeah, you are correct. Please also note that we found a new memleak in bug 1041751.
Flags: needinfo?(tkundu)
Updated•11 years ago
status-b2g-v2.0:
--- → affected
status-b2g-v2.1:
--- → affected
Comment 17•11 years ago (Reporter)
We should wait till bug 1041751 is fixed.
Comment 18•11 years ago
Observed on:
Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.040
Moz BuildID: 20140716000201
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=5f8b1b8a2da9e3b531eee817a669f57fa4d9b9c6
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=e00f7e464333689fcf54edb4945ece94f97f930b
Comment 19•11 years ago
Observed on:
Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.042
Moz BuildID: 20140721000201
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=8cb1a949f2e9650bb2c5598e78a6f24a58bbaf97
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=5f27d3ee3ccf01ac91a3efacb5e3e22ea62fd73c
Updated•11 years ago
Whiteboard: [caf priority: p1][CR 686674] → [b2g-crash][caf-crash 217][caf priority: p1][CR 686674]
Comment 20•11 years ago
Observed on:
Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.044
Moz BuildID: 20140724160208
B2G Version: 2.0
Gecko Version: 32.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=68226b3fd4eba752307daa5e917238bde253f5ab
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=b07c8ef448ee2c96955dee2a715f575faaaa72bc
Comment 21•11 years ago
Taking this bug while Sotaro is away.
(In reply to cafbot (PoC: ggrisco) from comment #20)
> Observed on:
>
> Device: msm8610
> Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.044
> Moz BuildID: 20140724160208
> B2G Version: 2.0
> Gecko Version: 32.0
> Gaia:
> http://git.mozilla.org/?p=releases/gaia.git;a=commit;
> h=68226b3fd4eba752307daa5e917238bde253f5ab
> Gecko:
> http://git.mozilla.org/?p=releases/gecko.git;a=commit;
> h=b07c8ef448ee2c96955dee2a715f575faaaa72bc
This is against Gecko 32, but bug 1041751 only landed on master for now.
Assignee: sotaro.ikeda.g → lissyx+mozillians
Comment 22•11 years ago
(In reply to Tapas Kumar Kundu from comment #17)
> We should wait till bug 1041751 is fixed
Landed in 2.0 on 7/28, so the next 2.0 nightly should contain the fix.
Comment 23•11 years ago
:milan -- We had cherry-picked the fix from bug 1041751 but we are still seeing this issue.
Comment 24•11 years ago
Tapas, Inder, do you mind sharing STR for this? I could not find proper ones in the other bugs.
Flags: needinfo?(tkundu)
Flags: needinfo?(ikumar)
Comment 25•11 years ago (Reporter)
(In reply to Alexandre LISSY :gerard-majax from comment #24)
> Tapas, Inder, do you mind sharing STR for this ? I could not find some
> proper in other bugs.
STR: Run call, sms, music, video, camera, camcorder, airplane on/off, wifi on/off etc. testcases for 48 hours.
I know that with this STR the issue will be very difficult to reproduce. But we still have at least one unresolved memleak, bug 1044514.
We will upload the latest logs soon, which will make it clear whether or not there is a memleak in the system when this happens. I hope this will help us move in the right direction towards a solution here.
Flags: needinfo?(tkundu)
Flags: needinfo?(ikumar)
Updated•11 years ago (Reporter)
Flags: needinfo?(tkundu)
Comment 26•11 years ago
(In reply to Tapas Kumar Kundu from comment #25)
> (In reply to Alexandre LISSY :gerard-majax from comment #24)
> > Tapas, Inder, do you mind sharing STR for this ? I could not find some
> > proper in other bugs.
>
> STR: Run call, sms, music, video, camera, camcorder, airplane on/off, wifi
> on/off etc testcases for 48 hours.
Do you just open/close those apps or do you perform specific steps inside?
>
> I knew that STR will be very difficult to reproduce. But we still have at
> least one mem leak bug 1044514 unresolved.
>
> We will upload latest log soon which will make it clear whether there is a
> memleak in system when this happens or not. I hope that this will help us to
> move proper direction towards solution here.
It's in 2.0 now, I'm waiting for your feedback :)
Comment 27•11 years ago
Observed on:
Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.048
Moz BuildID: 20140728000238
B2G Version: 2.0
Gecko Version: 32.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=0a864988f5dce7f9f3dea9609e8ef054679c30ff
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=745b486db495248e4d4503039e374cb8d5bb244f
Comment 28•11 years ago
Tapas, are you still seeing this?
Comment 29•11 years ago
Yes, we are still seeing this issue; in the last stability run we saw it happen twice. We need to have a fix for this ASAP.
Tapas -- please upload the latest logs.
Flags: needinfo?(lissyx+mozillians)
Comment 30•11 years ago
(In reply to Inder from comment #29)
> Yes, we are still seeing this issue, in last stability run we saw it happen
> twice. We need to have a fix for this asap.
> Tapas -- please upload the latest logs.
And can you make sure it includes the fix from bug 1044514? Can you also reply to my questions in comment 26?
Flags: needinfo?(lissyx+mozillians)
Comment 31•11 years ago
Tapas, can you make sure your tests include the fix from bug 1044514?
Flags: needinfo?(tkundu)
Comment 32•11 years ago (Reporter)
(In reply to Alexandre LISSY :gerard-majax from comment #31)
> Tapas, can you make sure your tests includes bug 1044514 ?
We tested with that patch too, but we are still hitting this issue with the gaia/gecko revisions below:
https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/v2.0&id=0a864988f5dce7f9f3dea9609e8ef054679c30ff
https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/v2.0&id=745b486db495248e4d4503039e374cb8d5bb244f
Could you please provide us with a patch that logs how graphic buffer FDs are created and destroyed by the Gecko layer tree? This may help us debug it further. Please suggest.
Flags: needinfo?(lissyx+mozillians)
Comment 33•11 years ago
Thanks. Can you please reply to my questions from comment 26 ?
Flags: needinfo?(lissyx+mozillians) → needinfo?(tkundu)
Comment 34•11 years ago (Reporter)
(In reply to Alexandre LISSY :gerard-majax from comment #26)
> (In reply to Tapas Kumar Kundu from comment #25)
> > (In reply to Alexandre LISSY :gerard-majax from comment #24)
> > > Tapas, Inder, do you mind sharing STR for this ? I could not find some
> > > proper in other bugs.
> >
> > STR: Run call, sms, music, video, camera, camcorder, airplane on/off, wifi
> > on/off etc testcases for 48 hours.
>
> Do you just open/close those apps or do you perform specific steps inside?
>
We are not just opening and closing apps. The stability team runs a set of tests, and the STR vary from one stability run to the next.
This issue was last seen with the STR below:
1. Made an outgoing call and got connected.
2. Turned Wifi ON and downloaded games.
3. Toggled Data ON/OFF for some time.
4. Toggled Wifi ON/OFF for some time.
5. Toggled BT ON/OFF for some time.
6. While opening Settings, the device crashed.
But there is no guarantee that you will see this issue if you try these STR, so please don't try to reproduce it on your device as it may waste your time.
> >
> > I knew that STR will be very difficult to reproduce. But we still have at
> > least one mem leak bug 1044514 unresolved.
> >
> > We will upload latest log soon which will make it clear whether there is a
> > memleak in system when this happens or not. I hope that this will help us to
> > move proper direction towards solution here.
>
> It's in 2.0 now, I'm waiting for your feedback :)
I already confirmed this in comment 32. I am confirming it again here :). We are still seeing this issue with the patch from bug 1044514.
Could you please provide us with a logging patch as I suggested in comment 32?
Flags: needinfo?(tkundu) → needinfo?(lissyx+mozillians)
Comment 35•11 years ago
Thanks. I had a look at the bionic and kernel code for the codepaths that may lead to EINVAL. Do you mind hacking libc/bionic/mmap.cpp and adding some logcat output, so we know whether it's the first offset test that returns -EINVAL or whether we are getting this from mmap2 directly?
I also noticed that all the runs reporting the error are on the KitKat codebase. How can we block 2.0 on a KK issue?
Flags: needinfo?(lissyx+mozillians) → needinfo?(tkundu)
Comment 36•11 years ago
To help debug bug 1038461, where bad things are happening with file
descriptors, we hack this to expose the content of the FDs we are
passing to and receiving from the underlying libs. Filtering logcat on
'gfxfds' should be enough to gather those logs.
Comment 37•11 years ago
Tapas, I've looked at the code and checked with nical. With attachment 8467704 [details] [diff] [review] we should be covering all the call sites in gfx/ that make use of fds. This will dump to logcat, and I hope it will help us identify something we can work on :)
Flags: needinfo?(tkundu)
Comment 38•11 years ago
> I also noticed that all the runs reporting the error are on Kitkat codebase,
> how can we block 2.0 on a KK issue ?
KK is the official baseline for 2.0 release
So, to confirm we don't need logs in bionic with your patch in comment 36, right?
Comment 39•11 years ago
(In reply to Inder from comment #38)
> > I also noticed that all the runs reporting the error are on Kitkat codebase,
> > how can we block 2.0 on a KK issue ?
> KK is the official baseline for 2.0 release
All the 2.0 I'm aware of are on JB
>
> So, to confirm we don't need logs in bionic with your patch in comment 36,
> right?
I think it can still be useful to keep an eye on what happens at this level, so if you can, it would be useful.
Blocks: CAF-v2.0-CC-metabug
Updated•11 years ago
Flags: needinfo?(ikumar)
Comment 41•11 years ago
Sotaro, when this information comes back, can you see if you can help Alexandre if he needs it at this point? It would be very beneficial if we can get this taken care of this week, as we start bumping against CC milestone otherwise.
I do hope this is not specific to KK, as our Flame KK builds are not in the best of shape.
Flags: needinfo?(sotaro.ikeda.g)
Comment 42•11 years ago (Reporter)
(In reply to Milan Sreckovic [:milan] (PTO 8/11 - 8/15) from comment #41)
> Sotaro, when this information comes back, can you see if you can help
> Alexandre if he needs it at this point? It would be very beneficial if we
> can get this taken care of this week, as we start bumping against CC
> milestone otherwise.
>
> I do hope this is not specific to KK, as our Flame KK builds are not in the
> best of shape.
We reproduced it with additional logs. Please note that the Settings app is crashing here when it fails to map ion memory.
The exact timestamp in the log is: 08-07 17:20:24.367
08-07 17:20:24.367 11186 11246 E qdmemalloc: ion: Failed to map memory in the client: Invalid argument
I can also see that dmesg has the following logs:
<7>[ 1178.275354] BufferMgrChild: unhandled page fault (11) at 0x00000000, code 0x817
<1>[ 1178.275375] pgd = c0e14000
<1>[ 1178.277329] [00000000] *pgd=08340831, *pte=00000000, *ppte=00000000
<6>[ 1178.283290]
<6>[ 1178.283301] Pid: 11246, comm: BufferMgrChild
<6>[ 1178.283312] CPU: 0 Tainted: G O (3.4.0-g015041d #1)
<6>[ 1178.283324] PC is at 0xb64e3e32
<6>[ 1178.283331] LR is at 0xb64e3e2f
<6>[ 1178.283341] pc : [<b64e3e32>] lr : [<b64e3e2f>] psr: 200f0030
<6>[ 1178.283345] sp : b2fef668 ip : 00000003 fp : b1f83288
<6>[ 1178.283354] r10: 00000000 r9 : b66a31da r8 : b53b5da9
<6>[ 1178.283363] r7 : ffffffff r6 : 00000001 r5 : b2fef6a4 r4 : 00000000
<6>[ 1178.283372] r3 : 00000000 r2 : 0000007b r1 : ac72dc86 r0 : 000000f3
<6>[ 1178.283382] Flags: nzCv IRQs on FIQs on Mode USER_32 ISA Thumb Segment user
<6>[ 1178.283392] Control: 10c5387d Table: 00e1406a DAC: 00000015
<6>[ 1178.283423] [<c010bd94>] (unwind_backtrace+0x0/0xf8) from [<c01116a0>] (__do_user_fault+0xbc/0x134)
<6>[ 1178.283443] [<c01116a0>] (__do_user_fault+0xbc/0x134) from [<c0884848>] (do_page_fault+0x290/0x400)
<6>[ 1178.283469] [<c0884848>] (do_page_fault+0x290/0x400) from [<c010029c>] (do_DataAbort+0x34/0x98)
<6>[ 1178.283486] [<c010029c>] (do_DataAbort+0x34/0x98) from [<c08830b4>] (__dabt_usr+0x34/0x40)
<6>[ 1178.283498] Exception stack(0xc624dfb0 to 0xc624dff8)
<6>[ 1178.283509] dfa0: 000000f3 ac72dc86 0000007b 00000000
<6>[ 1178.283524] dfc0: 00000000 b2fef6a4 00000001 ffffffff b53b5da9 b66a31da 00000000 b1f83288
<6>[ 1178.283537] dfe0: 00000003 b2fef668 b64e3e2f b64e3e32 200f0030 ffffffff
But the Settings app should be killed by LMK if it is running very low on memory.
Device memory when the crash happened:
Every 5s: b2g-info 2014-08-07 17:20:23
                                       |       megabytes       |
NAME              PID PPID CPU(s) NICE  USS  PSS  RSS SWAP VSIZE OOM_ADJ USER
b2g               231    1  436.5    0 33.7 35.4 40.9 17.4 237.4       0 root
(Nuwa)           1066  231    2.0    0  0.0  0.1  0.8  7.7  54.1       0 root
Camera           1715 1066   35.4   18  1.6  2.4  6.3 17.9  80.7      11 u0_a1715
Homescreen       7746 1066    6.2   18  0.0  0.6  4.2 15.7  70.4       8 u0_a7746
Usage           11002 1066    6.9   18  1.3  2.2  6.4 12.7  67.6      11 u0_a11002
Settings        11186  231  251.0    1 13.1 14.9 20.2  6.0  84.8       2 u0_a11186
Calendar        11187 1066    3.7   18  0.0  0.8  5.1 14.5  66.3      10 u0_a11187
(Preallocated a 28481 1066    1.1   18  0.0  0.6  4.2 11.0  61.2       1 u0_a28481
System memory info:
Total 167.6 MB
SwapTotal 192.0 MB
Used - cache 145.8 MB
B2G procs (PSS) 56.9 MB
Non-B2G procs 88.9 MB
Free + cache 21.8 MB
Free 3.1 MB
Cache 18.7 MB
SwapFree 89.6 MB
It is also not a memleak issue. My guess is that fd=74 is somehow getting closed by some process. Could you please confirm?
Flags: needinfo?(tkundu)
Flags: needinfo?(ikumar)
Comment 43•11 years ago (Assignee)
(In reply to Milan Sreckovic [:milan] (PTO 8/11 - 8/15) from comment #41)
> Sotaro, when this information comes back, can you see if you can help
> Alexandre if he needs it at this point?
Okay :-)
Flags: needinfo?(sotaro.ikeda.g)
Comment 44•11 years ago (Assignee)
> I can also see that dmesg has following logs :
>
> <7>[ 1178.275354] BufferMgrChild: unhandled page fault (11) at 0x00000000,
> code 0x817
> <1>[ 1178.275375] pgd = c0e14000
> <1>[ 1178.277329] [00000000] *pgd=08340831, *pte=00000000, *ppte=00000000
> <6>[ 1178.283290]
> <6>[ 1178.283301] Pid: 11246, comm: BufferMgrChild
> <6>[ 1178.283312] CPU: 0 Tainted: G O (3.4.0-g015041d #1)
> <6>[ 1178.283324] PC is at 0xb64e3e32
> <6>[ 1178.283331] LR is at 0xb64e3e2f
> <6>[ 1178.283341] pc : [<b64e3e32>] lr : [<b64e3e2f>] psr: 200f0030
> <6>[ 1178.283345] sp : b2fef668 ip : 00000003 fp : b1f83288
> <6>[ 1178.283354] r10: 00000000 r9 : b66a31da r8 : b53b5da9
> <6>[ 1178.283363] r7 : ffffffff r6 : 00000001 r5 : b2fef6a4 r4 : 00000000
> <6>[ 1178.283372] r3 : 00000000 r2 : 0000007b r1 : ac72dc86 r0 : 000000f3
> <6>[ 1178.283382] Flags: nzCv IRQs on FIQs on Mode USER_32 ISA Thumb
> Segment user
> <6>[ 1178.283392] Control: 10c5387d Table: 00e1406a DAC: 00000015
> <6>[ 1178.283423] [<c010bd94>] (unwind_backtrace+0x0/0xf8) from [<c01116a0>]
> (__do_user_fault+0xbc/0x134)
> <6>[ 1178.283443] [<c01116a0>] (__do_user_fault+0xbc/0x134) from
> [<c0884848>] (do_page_fault+0x290/0x400)
> <6>[ 1178.283469] [<c0884848>] (do_page_fault+0x290/0x400) from [<c010029c>]
> (do_DataAbort+0x34/0x98)
> <6>[ 1178.283486] [<c010029c>] (do_DataAbort+0x34/0x98) from [<c08830b4>]
> (__dabt_usr+0x34/0x40)
> <6>[ 1178.283498] Exception stack(0xc624dfb0 to 0xc624dff8)
> <6>[ 1178.283509] dfa0: 000000f3
> ac72dc86 0000007b 00000000
> <6>[ 1178.283524] dfc0: 00000000 b2fef6a4 00000001 ffffffff b53b5da9
> b66a31da 00000000 b1f83288
> <6>[ 1178.283537] dfe0: 00000003 b2fef668 b64e3e2f b64e3e32 200f0030 ffffffff
The above page fault seems to come from the following ABORT call.
> 08-07 17:20:24.387 11186 11246 E Gecko : mozalloc_abort: [Child 11186] ###!!! ABORT: IPDL error [PSharedBufferManagerChild]: "Error deserializing 'MaybeMagicGrallocBufferHandle'". abort()ing as a result.: file ../../../../../../../../gecko/ipc/glue/ProtocolUtils.cpp, line 198
Comment 45•11 years ago
Nicolas, as soon as I see IPDL, I CC you on the bug - we're trying to accelerate this, so if there is something in the logs that catches your eye, let us know.
Sotaro, at first I thought this may be related to the Fence::merge() issues, but your last comment suggests otherwise?
Flags: needinfo?(nical.bugzilla)
Comment 46•11 years ago
> 08-07 17:20:24.367 11186 11246 E qdmemalloc: ion: Failed to map memory in the client: Invalid argument
> 08-07 17:20:24.367 11186 11246 E qdgralloc: Could not mmap handle 0xb12dc100, fd=74 (Invalid argument)
> 08-07 17:20:24.367 11186 11246 E qdgralloc: gralloc_register_buffer: gralloc_map failed
> 08-07 17:20:24.367 11186 11246 W GraphicBufferMapper: registerBuffer(0xb12dc100) failed -22 (Invalid argument)
> 08-07 17:20:24.367 11186 11246 E GraphicBuffer: unflatten: registerBuffer failed: Invalid argument (-22)
> 08-07 17:20:24.367 11186 11246 I Gecko : ParamTraits<MagicGrallocBufferHandle>::Read() failed to get gralloc buffer
Tapas, I don't understand how to read those logs, especially the "Could not mmap handle" line. Is it the bionic logging you added?
Does it mean that the offset check in bionic/libc/bionic/mmap.cpp is returning -EINVAL, or is it happening later, when calling __mmap2?
Flags: needinfo?(tkundu)
Comment 47•11 years ago
Looking at dmesg I see several strange things:
> 298 <3>[ 313.107074] msm_vfe32_process_error_status: camif error status: 0x80000000
> 299 <3>[ 313.113116] msm_vfe32_process_error_status: violation
> 300 <3>[ 313.118179] msm_vfe32_process_violation_status: black violation
> 301 <3>[ 313.168431] msm_vfe32_process_error_status: camif error status: 0x80000000
Also, Fence timing out:
> 316 <6>[ 360.418365] fence timeout on [c5fd7a00] after 3000ms
> 317 <6>[ 360.418405] fence:
> 318 <6>[ 360.418408] --------------
> 319 <6>[ 360.418412] [c5fd7a00] kgsl-fence: active
> 320 <6>[ 360.418416] kgsl-3d0_b2g(231)-Compositor(968)-1_pt active: 28012 / 28009 retired:28009
> 321 <6>[ 360.418424]
> 322 <4>[ 360.418436] mdss_fb_wait_for_fence: mdp-fence: sync_fence_wait timed out! Waiting 10 more seconds
> 323 <6>[ 360.671207] fence timeout on [c5fd7a80] after 3000ms
> 324 <6>[ 360.671241] fence:
> 325 <6>[ 360.671243] --------------
> 326 <6>[ 360.671246] [c5fd7a80] TextureHostOGL: active
> 327 <6>[ 360.671249] mdss_fb_0_pt signaled@357.415841: 8148 / 8148
> 328 <6>[ 360.671252] kgsl-3d0_b2g(231)-Compositor(968)-1_pt active: 28012 / 28009 retired:28009
> 329 <6>[ 360.671257]
> 330 <6>[ 370.418362] fence timeout on [c5fd7a00] after 10000ms
> 331 <6>[ 370.418391] fence:
> 332 <6>[ 370.418393] --------------
> 333 <6>[ 370.418395] [c5fd7a00] kgsl-fence: active
> 334 <6>[ 370.418398] kgsl-3d0_b2g(231)-Compositor(968)-1_pt active: 28012 / 28009 retired:28009
> 335 <6>[ 370.418403]
> 336 <3>[ 370.418433] mdss_fb_wait_for_fence: mdp-fence: sync_fence_wait failed! ret = ffffffc2
Updated•11 years ago (Assignee)
Assignee: lissyx+mozillians → sotaro.ikeda.g
Comment 48•11 years ago (Reporter)
(In reply to Alexandre LISSY :gerard-majax from comment #46)
> > 08-07 17:20:24.367 11186 11246 E qdmemalloc: ion: Failed to map memory in the client: Invalid argument
> > 08-07 17:20:24.367 11186 11246 E qdgralloc: Could not mmap handle 0xb12dc100, fd=74 (Invalid argument)
> > 08-07 17:20:24.367 11186 11246 E qdgralloc: gralloc_register_buffer: gralloc_map failed
> > 08-07 17:20:24.367 11186 11246 W GraphicBufferMapper: registerBuffer(0xb12dc100) failed -22 (Invalid argument)
> > 08-07 17:20:24.367 11186 11246 E GraphicBuffer: unflatten: registerBuffer failed: Invalid argument (-22)
> > 08-07 17:20:24.367 11186 11246 I Gecko : ParamTraits<MagicGrallocBufferHandle>::Read() failed to get gralloc buffer
>
> Tapas, I don't understand how to read those logs, especially the "Could not
> mmap handle" line. Is it the bionic logging you added ?
> Does it means that it is the offset check in bionic/libc/bionic/mmap.cpp
> that is returning -EINVAL, or is it happening after, when calling __mmap2 ?
It comes from the display HAL. Gecko calls this API to map the ION handle.
Look into https://www.codeaurora.org/cgit/quic/la/platform/hardware/qcom/display/tree/libgralloc/ionalloc.cpp?h=b2g_kk_3.5#n154
But it can happen only if this FD is invalid. So my guess is that some process (b2g?) has invalidated this fd somewhere by mistake.
Flags: needinfo?(tkundu)
Comment 49•11 years ago (Assignee)
(In reply to Tapas Kumar Kundu from comment #48)
>
> It comes from display HAL. Gecko calls this API to map ION handle.
>
> Look into
> https://www.codeaurora.org/cgit/quic/la/platform/hardware/qcom/display/tree/
> libgralloc/ionalloc.cpp?h=b2g_kk_3.5#n154
>
> But it can happen only if this FD is invalid. So my guess is that some
> process (b2g ?) has invalided this fd somewhere by mistake.
It seems better to call it the gralloc HAL. I am not sure the log alone shows that the fd is actually invalidated. The error log comes from gralloc::IonAlloc::map_buffer(). That function calls mmap(), and mmap() seems to invoke the __NR_mmap2 system call. I assume the error originates in the ion driver in the kernel.
Comment 50•11 years ago
Sotaro is working on a patch to get us more logging around ION driver.
Flags: needinfo?(nical.bugzilla)
Comment 51•11 years ago (Assignee)
The function call sequence is as follows.
ParamTraits<MagicGrallocBufferHandle>::Read()
->GraphicBuffer::unflatten()
->GraphicBufferMapper::registerBuffer()
->gralloc_register_buffer() // gralloc hal
->gralloc_map() // gralloc hal
->IonAlloc::map_buffer() // gralloc hal
->mmap() //bionic
->__mmap2() //bionic
->system call with __NR_mmap2
->sys_mmap2() //kernel
->sys_mmap_pgoff()
->do_mmap_pgoff()
->mmap_region()
->ion_mmap()
Comment 52•11 years ago (Assignee)
(In reply to Tapas Kumar Kundu from comment #48)
>
> But it can happen only if this FD is invalid. So my guess is that some
> process (b2g ?) has invalided this fd somewhere by mistake.
If the fd is invalid, mmap() seems to fail with "-EBADF".
Comment 53•11 years ago (Assignee)
(In reply to Sotaro Ikeda [:sotaro] from comment #52)
> (In reply to Tapas Kumar Kundu from comment #48)
> >
> > But it can happen only if this FD is invalid. So my guess is that some
> > process (b2g ?) has invalided this fd somewhere by mistake.
>
> If fd is invalid, mmap() seems to fail by "-EBADF".
sys_mmap_pgoff() returns "-EBADF" if the fd is invalid.
https://github.com/mozilla-b2g/codeaurora_kernel_msm/blob/master/mm/mmap.c#L1088
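The distinction drawn here can be checked from user space. The following is a minimal sketch in C, assuming a Linux system and using an ordinary scratch file rather than an ion buffer; the helper name mmap_errno_for_closed_fd is hypothetical. It maps through a descriptor that has already been closed and reports the resulting errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: call mmap() on a descriptor that was valid but
 * has since been closed. Returns the errno reported by mmap(), or 0 if
 * the mapping unexpectedly succeeds. */
int mmap_errno_for_closed_fd(void)
{
    FILE *tmp = tmpfile();               /* scratch file, removed on close */
    assert(tmp != NULL);

    int fd = dup(fileno(tmp));           /* private copy we close ourselves */
    assert(fd >= 0);
    fclose(tmp);
    close(fd);                           /* fd is now a stale number */

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    if (p != MAP_FAILED) {
        munmap(p, page);
        return 0;
    }
    return errno;                        /* EBADF, not EINVAL */
}
```

In a single-threaded program this should return EBADF, which is consistent with the reasoning that the EINVAL in the log points at something other than a closed descriptor. It says nothing about a descriptor number that was closed and then reused for a different file, which would be a separate case.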
Comment 54•11 years ago (Assignee)
ion_mmap() outputs an error log when an error happens, but the kernel log in attachment 8469555 [details] does not include that error. This suggests one of the following possibilities:
- The kernel error log was not captured correctly.
- The error happens between sys_mmap_pgoff() and mmap_region().
https://www.codeaurora.org/cgit/external/gigabyte/kernel/msm/tree/drivers/gpu/ion/ion.c?h=caf/b2g_kk_3.5#n1110
Comment 55•11 years ago (Assignee)
Comment 56•11 years ago (Assignee)
Tapas, can you test again by applying attachment 8470280 [details] [diff] [review]?
Flags: needinfo?(tkundu)
Comment 57•11 years ago (Assignee)
(In reply to Sotaro Ikeda [:sotaro] from comment #56)
> Tapas, can you test again by applying attachment 8470280 [details] [diff] [review]
> [review]?
The patch writes its output to the kernel log. To analyze the problem, both the logcat log and the kernel log are necessary. Thanks.
Comment 58•11 years ago (Assignee)
(In reply to Sotaro Ikeda [:sotaro] from comment #57)
> (In reply to Sotaro Ikeda [:sotaro] from comment #56)
> > Tapas, can you test again by applying attachment 8470280 [details] [diff] [review]
> > [review]?
>
> The log is output to kernel log. To analyze the problem. logcat log and
> kernel log are necessary. Thanks.
Is it possible to do the testing during the weekend?
Comment 59•11 years ago
Not sure who's "on call" with codeaurora, so adding Michael on NI as well.
Updated•11 years ago
Flags: needinfo?(mvines)
Comment 61•11 years ago (Reporter)
(In reply to Sotaro Ikeda [:sotaro] from comment #58)
> (In reply to Sotaro Ikeda [:sotaro] from comment #57)
> > (In reply to Sotaro Ikeda [:sotaro] from comment #56)
> > > Tapas, can you test again by applying attachment 8470280 [details] [diff] [review]?
> >
> > The log is output to the kernel log. To analyze the problem, both the logcat log and
> > the kernel log are necessary. Thanks.
>
> Is it possible to do the testing during the weekend?
Yes. We will put your patch in our build and get back to you with logs ASAP. Thanks a lot for helping us.
Assignee | ||
Comment 62•11 years ago
|
||
(In reply to Tapas Kumar Kundu from comment #61)
> (In reply to Sotaro Ikeda [:sotaro] from comment #58)
> > (In reply to Sotaro Ikeda [:sotaro] from comment #57)
> > > (In reply to Sotaro Ikeda [:sotaro] from comment #56)
> > > > Tapas, can you test again by applying attachment 8470280 [details] [diff] [review]?
> > >
> > > The log is output to the kernel log. To analyze the problem, both the logcat log and
> > > the kernel log are necessary. Thanks.
> >
> > Is it possible to do the testing during the weekend?
>
> Yes. We will put your patch in our build and get back to you with logs ASAP.
> Thanks a lot for helping us.
Thanks, testing with both attachment 8467704 [details] [diff] [review] and attachment 8470280 [details] [diff] [review] seems good.
Assignee | ||
Comment 63•11 years ago
|
||
File descriptors are sent from the b2g process to the content process via a Unix socket. The function call sequence seems to be as follows. New file descriptors are allocated under recvmsg().
Channel::ChannelImpl::ProcessOutgoingMessages() //gecko ipc
->sendmsg() // bionic
->system call with __NR_sendmsg
->sys_sendmsg()
->__sys_sendmsg()
->sock_sendmsg_nosec()
->__sock_sendmsg_nosec()
->unix_stream_sendmsg()
->scm_send()
->__scm_send()
->scm_fp_copy()
->// Verify the descriptors and increment the usage count.
->sock_alloc_send_skb() // Grab a buffer
->unix_scm_to_skb() // Send the fds
->unix_attach_fds()
Channel::ChannelImpl::ProcessIncomingMessages() //gecko ipc
->recvmsg() // bionic
->system call with __NR_recvmsg
->sys_recvmsg()
->__sys_recvmsg()
->sock_recvmsg()
->__sock_recvmsg_nosec()
->unix_stream_recvmsg()
->scm_detach_fds()
->new_fd = get_unused_fd_flags()
->__alloc_fd() // allocate a file descriptor
->put_user(new_fd,..);
->get_file(fp[i]);// Bump the usage count
->fd_install(new_fd, fp[i]); // install the file
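Editorial illustration: the sequence above is what runs underneath a user-space SCM_RIGHTS transfer. The following is a minimal sketch, not Gecko's IPC code; the helper names send_fd(), recv_fd() and fd_passing_roundtrip() are hypothetical. It passes one end of a pipe over a socketpair inside a single process and checks that the receiver gets a new descriptor number that refers to the same file.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one descriptor, aligned for cmsghdr. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one descriptor over a Unix socket as SCM_RIGHTS ancillary data. */
int send_fd(int sock, int fd)
{
    char byte = 'F'; /* one data byte accompanies the descriptor */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; the kernel installs a new fd number in the receiver. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    int fd = -1;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Returns 0 if the received fd is a different number for the same pipe. */
int fd_passing_roundtrip(void)
{
    int sv[2], p[2];
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0 || pipe(p) != 0)
        return -1;
    if (send_fd(sv[0], p[0]) != 0)
        return -1;
    int got = recv_fd(sv[1]);
    if (got < 0 || got == p[0])
        return -1;
    /* Data written to the pipe is readable through the received fd. */
    if (write(p[1], "x", 1) != 1 || read(got, &c, 1) != 1)
        return -1;
    close(got); close(p[0]); close(p[1]); close(sv[0]); close(sv[1]);
    return c == 'x' ? 0 : -1;
}
```

Calling fd_passing_roundtrip() exercises both halves; in the real case the sender and receiver are different processes, so the received number lives only in the receiver's descriptor table.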
Assignee | ||
Comment 64•11 years ago
|
||
(In reply to Tapas Kumar Kundu from comment #48)
>
> Look into
> https://www.codeaurora.org/cgit/quic/la/platform/hardware/qcom/display/tree/
> libgralloc/ionalloc.cpp?h=b2g_kk_3.5#n154
>
> But it can happen only if this FD is invalid. So my guess is that some
> process (b2g ?) has invalided this fd somewhere by mistake.
From comment 63, the new fd belongs only to the content process. It seems that no process other than the content process can invalidate the fd.
Assignee | ||
Comment 65•10 years ago
|
||
The patch is a workaround for the crash caused by the unflatten() failure. It turns the crash into a gralloc allocation failure. The patch could prevent an application crash, but it causes rendering problems because of the gralloc buffer allocation failure. Therefore, it seems better not to use this patch.
Reporter | ||
Comment 66•10 years ago
|
||
Unfortunately, we got only partial logs when we reproduced this again. I attached the partial logs, but I am waiting for our test team to reproduce it again with full logcat logs.
@sotaro: Can you please check these partial logcat logs and let us know if you find something useful?
Flags: needinfo?(sotaro.ikeda.g)
Comment 67•10 years ago
|
||
Observed on:
Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.058
Moz BuildID: 20140807000201
B2G Version: 2.0
Gecko Version: 32.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=8cc28fd31905a0ea2b2e15d13e80a0eab2feb1ba
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=f7bd772b1e42774708a4ede13b149a1706a59b25
Assignee | ||
Comment 68•10 years ago
|
||
(In reply to Tapas Kumar Kundu from comment #66)
> Created attachment 8471019 [details]
> logs from .extra file
>
> Unfortunately, we got only partial logs when we reproduced this again. I
> attached the partial logs, but I am waiting for our test team to reproduce it
> again with full logcat logs.
>
> @sotaro: Can you please check these partial logcat logs and let us know if you
> find something useful?
The partial log does not include a kernel log, so it has no information about where the problem happens in the kernel. The logcat log (user-side log) includes the error, but the error code is different from before. This time it failed with a "Permission denied" error.
-------------------------------
08-11 17:59:37.491 8895 8905 E qdmemalloc: ion: Failed to map memory in the client: Permission denied
08-11 17:59:37.491 8895 8905 E qdgralloc: Could not mmap handle 0xb1a6f650, fd=60 (Permission denied)
08-11 17:59:37.491 8895 8905 E qdgralloc: gralloc_register_buffer: gralloc_map failed
08-11 17:59:37.491 8895 8905 W GraphicBufferMapper: registerBuffer(0xb1a6f650) failed -13 (Permission denied)
Flags: needinfo?(sotaro.ikeda.g)
Assignee | ||
Comment 69•10 years ago
|
||
Add more logs to cover "Permission denied" error case.
Attachment #8470280 -
Attachment is obsolete: true
Assignee | ||
Comment 70•10 years ago
|
||
Add an additional debug log.
Attachment #8471252 -
Attachment is obsolete: true
Assignee | ||
Comment 71•10 years ago
|
||
Tapas, can you replace the logging patch with attachment 8471264 [details] [diff] [review]?
Assignee | ||
Comment 72•10 years ago
|
||
ION memory allocation proceeds as follows.
IonAlloc::alloc_buffer()
->ioctl() with ION_IOC_ALLOC
->ion_alloc()
->// plist_for_each_entry(heap, &dev->heaps, node) //try to allocate ion for each heap
->ion_buffer_create()
->heap->ops->allocate() // try to allocate for a heap
->mutex_init(&buffer->lock);
->ion_buffer_add()
->ion_handle_create()
->ion_handle_add()
->ioctl() with ION_IOC_MAP
->ion_share_dma_buf_fd() // given an ion client, create a dma-buf fd
->ion_share_dma_buf() // share buffer as dma-buf
->ion_handle_validate()
->ion_buffer_get()
->dma_buf_export(buffer, &dma_buf_ops, buffer->size, O_RDWR)
// Creates a new dma_buf, and associates an anon file with a buffer
->dma_buf_fd() // returns a file descriptor for the given dma_buf
->get_unused_fd_flags()
->fd_install()
->ioctl() with ION_IOC_FREE
->ion_handle_validate(client, data.handle);
->ion_free(client, data.handle)
Assignee | ||
Comment 73•10 years ago
|
||
The following checks in do_mmap_pgoff() return "-EACCES", which causes "Permission denied". Attachment 8471019 [details] does not have a kernel log, so it is not clear whether the error happened there.
https://www.codeaurora.org/cgit/external/gigabyte/kernel/msm/tree/mm/mmap.c?h=caf/b2g_kk_3.5#n1035
https://www.codeaurora.org/cgit/external/gigabyte/kernel/msm/tree/mm/mmap.c?h=caf/b2g_kk_3.5#n1042
The ion memory file is created with O_RDWR by dma_buf_export(), which seems to contradict the above.
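Editorial illustration: the two mmap() error codes discussed in this bug can be reproduced from user space with an ordinary file instead of an ion buffer. This is a minimal sketch, assuming a Linux system with a writable /tmp; the helper names and the temporary path are hypothetical and are not taken from any patch attached here. A closed descriptor yields EBADF, and a shared writable mapping of a file opened read-only yields EACCES.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Try to map one page of fd; return 0 on success or the errno of the failure. */
int mmap_errno(int fd, int prot)
{
    void *p = mmap(NULL, 4096, prot, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return errno;
    munmap(p, 4096);
    return 0;
}

/* EBADF case: the number no longer refers to an open file. */
int mmap_closed_fd(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    close(p[0]);
    close(p[1]);
    return mmap_errno(p[0], PROT_READ);
}

/* EACCES case: PROT_WRITE with MAP_SHARED on a file opened O_RDONLY. */
int mmap_readonly_fd_for_write(void)
{
    char path[64];
    snprintf(path, sizeof(path), "/tmp/mmap_errno_demo_%ld", (long)getpid());

    int fd = open(path, O_RDONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    unlink(path); /* the file lives on only through fd */

    int err = mmap_errno(fd, PROT_READ | PROT_WRITE);
    close(fd);
    return err;
}
```

A file opened with O_RDWR, as dma_buf_export() does for ion buffers, would pass the access-mode check that fails in the second helper, which is why "Permission denied" hints that the descriptor may no longer refer to the expected file.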
Assignee | ||
Comment 74•10 years ago
|
||
Tapas, the recent log was taken without the user-side log patch applied. Can the test be done with both the user-side log patch (attachment 8467704 [details] [diff] [review]) and the kernel-side log patch (attachment 8471264 [details] [diff] [review]) applied?
Assignee | ||
Updated•10 years ago
|
Flags: needinfo?(tkundu)
Assignee | ||
Updated•10 years ago
|
Flags: needinfo?(tkundu)
Reporter | ||
Comment 75•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #74)
> Tapas, the recent log was taken without the user-side log patch applied. Can
> the test be done with both the user-side log patch (attachment 8467704 [details] [diff] [review])
> and the kernel-side log patch (attachment 8471264 [details] [diff] [review]) applied?
We asked our test team and we are waiting for them to get those logs for us. Thanks a lot for your help.
Assignee | ||
Comment 76•10 years ago
|
||
I re-checked attachment 8469555 [details]. I found a weird log in "logcat_09-01-1970-05-21-14.log.txt". The flatten() log prints out 2 fds, but they are both 108.
> 08-07 17:20:24.367 231 11200 I Gecko : gfxfds 19 flatten(0xaa7ffba8, 0, 0xaa7ffa60, 0)
> 08-07 17:20:24.367 231 11200 I Gecko : gfxfds 19 list, dumping 0 fds:
> 08-07 17:20:24.367 231 11200 I Gecko :
> 08-07 17:20:24.367 231 11200 I Gecko : gfxfds Write() using one fd: 108 at 108
> 08-07 17:20:24.367 231 231 D btif_config_util: btif_config_save_file(L188): in file name:/data/misc/bluedroid/bt_config.new
> 08-07 17:20:24.367 231 11200 I Gecko : gfxfds Write() using one fd: 108 at 108
Assignee | ||
Comment 77•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #76)
> I re-checked attachment 8469555 [details]. I found a weird log in
> "logcat_09-01-1970-05-21-14.log.txt". The flatten() log prints out 2 fds, but
> they are both 108.
In "logcat_09-01-1970-05-21-14.log.txt", the duplicated fd seems to appear only at the gralloc allocation related to the crash.
Assignee | ||
Comment 78•10 years ago
|
||
(In reply to Tapas Kumar Kundu from comment #48)
> Look into
> https://www.codeaurora.org/cgit/quic/la/platform/hardware/qcom/display/tree/
> libgralloc/ionalloc.cpp?h=b2g_kk_3.5#n154
>
> But it can happen only if this FD is invalid. So my guess is that some
> process (b2g ?) has invalided this fd somewhere by mistake.
From comment 76, it might be possible that a task in the b2g process invalidates an fd that belongs to the b2g process.
Assignee | ||
Updated•10 years ago
|
Attachment #8467704 -
Attachment description: Dumping flatten/unflatten and Read/Write fds to logcat → log patch - Dumping flatten/unflatten and Read/Write fds to logcat
Assignee | ||
Updated•10 years ago
|
Attachment #8471264 -
Attachment description: patch v3 - Add log around kernel mmap → log patch v3 - Add log around kernel mmap
Assignee | ||
Comment 79•10 years ago
|
||
Add log to bionic's close() to check Comment 78.
Assignee | ||
Comment 80•10 years ago
|
||
Tapas, I added a bionic log patch. Can you use the following patches for testing?
- gecko log patch: attachment 8467704 [details] [diff] [review]
- bionic log patch: attachment 8472044 [details] [diff] [review]
- kernel mmap log patch: attachment 8471264 [details] [diff] [review]
Flags: needinfo?(tkundu)
Assignee | ||
Updated•10 years ago
|
Flags: needinfo?(tkundu)
Reporter | ||
Comment 81•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #80)
> Tapas, I added a bionic log patch. Can you use the following patches for
> testing?
> - gecko log patch: attachment 8467704 [details] [diff] [review]
> - bionic log patch: attachment 8472044 [details] [diff] [review]
> - kernel mmap log patch: attachment 8471264 [details] [diff] [review]
I asked our internal team to test again with these patches. I will update ASAP.
Assignee | ||
Comment 82•10 years ago
|
||
Using the bionic log patch (attachment 8472044 [details] [diff] [review]), I found an invalid fd close in the b2g process. I am going to create a bug for it. It is not clear whether the invalid fd close happens in only one place, so I am going to create a bug for each one.
Reporter | ||
Comment 83•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #82)
> It is not clear whether the invalid fd close happens in only one place.
Nice find. Could you please consider adding logging to the b2g framework for the invalid fd close issue you just found? This would save a lot of time on nightmare stability issues in 2.1.
Assignee | ||
Comment 84•10 years ago
|
||
(In reply to Tapas Kumar Kundu from comment #83)
> Nice find. Could you please consider adding logging to the b2g framework for
> the invalid fd close issue you just found? This would save a lot of time on
> nightmare stability issues in 2.1.
Bug 1053277 was created for it.
Reporter | ||
Comment 85•10 years ago
|
||
Here is the log from the previous stability run, which contains the logs from comment 62.
We have another build running the stability test with the logs from comment 82.
@sotaro: Could you please go through these logs? We have full logs here.
Flags: needinfo?(sotaro.ikeda.g)
Assignee | ||
Comment 86•10 years ago
|
||
(In reply to Tapas Kumar Kundu from comment #85)
> Created attachment 8472482 [details]
> mozilla_logs.tar.bz2
>
> @sotaro: Could you please go through these logs? We have full logs here.
Thanks for the log :-)
The logcat log says that the GraphicBuffer's 2 fds are the same. gralloc allocates two ion buffers, so the two fds should be different. From the log, it seems possible that someone closed at least the first fd.
> 08-13 16:39:08.169 240 2709 I Gecko : gfxfds Write() using one fd: 173 at 173
> 08-13 16:39:08.169 240 2709 I Gecko : gfxfds Write() using one fd: 173 at 173
The kernel log says that "file->f_op->mmap(file, vma)" failed. If the file were an ion buffer, "ion_mmap()" would be called at "file->f_op->mmap(file, vma)", and ion_mmap() prints a kernel error log when it fails. The kernel log has no such error log, so the fd seems to refer to a file other than an ion buffer.
> <3>[ 115.144449] mmap_region: mmap() failed
Flags: needinfo?(sotaro.ikeda.g)
Assignee | ||
Comment 87•10 years ago
|
||
From attachment 8472482 [details], "fd close" or "fd replacement to other file" seems to happen. Bug 1053204 could be a cause of the problem. But it is not clear yet if bug 1053204 could fix all problems.
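Editorial illustration: both suspected failure modes follow from how descriptor numbers are recycled. This is a minimal single-threaded sketch with hypothetical helper names, not code from b2g. The first helper shows "fd replacement": after a close(), the next allocation reuses the lowest free number, so a stale copy of that number silently refers to a different file. The second shows "fd close": code that closes a number it does not own makes the rightful owner fail with EBADF.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Returns 1 if a stale descriptor number refers to a different file after
 * the original was closed, 0 if not, -1 on setup failure.
 */
int stale_fd_is_reused(void)
{
    int a[2], b[2];
    if (pipe(a) != 0 || pipe(b) != 0)
        return -1;

    int stale = a[0];        /* number that some component keeps around */
    close(a[0]);             /* the real owner closes it */

    int reused = dup(b[1]);  /* lowest free number is handed out again */
    if (reused < 0)
        return -1;

    int aliased = (reused == stale); /* stale now names pipe b, not pipe a */
    close(reused);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return aliased;
}

/* Returns the errno the rightful owner sees after a stray close(). */
int errno_after_stray_close(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    int stray = p[1];        /* same number, held by unrelated code */
    close(stray);            /* unrelated code closes it by mistake */

    int err = (write(p[1], "x", 1) == -1) ? errno : 0; /* owner now fails */
    close(p[0]);
    return err;
}
```

In a multi-threaded process such as b2g the reuse is a race rather than a certainty, which would be consistent with a crash that only shows up in long stability runs.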
Assignee | ||
Updated•10 years ago
|
Attachment #8470909 -
Attachment is obsolete: true
Comment 88•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #87)
> From attachment 8472482 [details], "fd close" or "fd replacement to other
> file" seems to happen. Bug 1053204 could be a cause of the problem. But it
> is not clear yet if bug 1053204 could fix all problems.
Thanks Sotaro. We are trying the patch from bug 1053204, but meanwhile I hope you are continuing to look into it to confirm that we have fixed all cases.
Comment 89•10 years ago
|
||
NI: sotaro for comment #88, to see if there is any way to verify that the patch in bug 1053204 helps?
Flags: needinfo?(sotaro.ikeda.g)
Assignee | ||
Comment 90•10 years ago
|
||
(In reply to bhavana bajaj [:bajaj] from comment #89)
> NI: sotaro for comment #88, to see if there is any way to verify that the patch in
> bug 1053204 helps?
Apply the patch in Bug 1053204 and check whether the open() close failure happens in the b2g process. I already did that today, and I have not seen an invalid file descriptor close failure since applying the patch.
Flags: needinfo?(sotaro.ikeda.g)
Updated•10 years ago
|
Blocks: CAF-v2.1-FC-metabug
Comment 91•10 years ago
|
||
Observed on:
Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.060
Moz BuildID: 20140810000201
B2G Version: 2.0
Gecko Version: 32.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=de28796a8956a48bb98ca67df6a33e0622d642d1
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=2b27becae85092d46bfadcd4fb5605e82e1e1093
Updated•10 years ago
|
Flags: in-moztrap?(bzumwalt)
Comment 92•10 years ago
|
||
STR makes it difficult to reproduce on a testrun. It appears a test case for this issue is unnecessary.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
This reproduced in AU 60 but per bug 1053204 comment 18 that patch did not land until AU 63.
Updated•10 years ago
|
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage?][2.0-signoff-need+]
Updated•10 years ago
|
QA Whiteboard: [QAnalyst-Triage?][2.0-signoff-need+] → [QAnalyst-Triage+][2.0-signoff-need+]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(bzumwalt)
Flags: in-moztrap-
Comment 94•10 years ago
|
||
We are not seeing this issue anymore.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•10 years ago
|
Target Milestone: --- → 2.1 S3 (29aug)
Comment 95•10 years ago
|
||
Observed on:
Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.066
Moz BuildID: 20140810160202
B2G Version: 2.0
Gecko Version: 32.0
Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=de28796a8956a48bb98ca67df6a33e0622d642d1
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=2b27becae85092d46bfadcd4fb5605e82e1e1093
Assignee | ||
Comment 96•10 years ago
|
||
(In reply to cafbot (PoC: ggrisco) from comment #95)
> Observed on:
>
> Device: msm8226
> Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.066
> Moz BuildID: 20140810160202
> B2G Version: 2.0
> Gecko Version: 32.0
> Gaia:
> http://git.mozilla.org/?p=releases/gaia.git;a=commit;
> h=de28796a8956a48bb98ca67df6a33e0622d642d1
> Gecko:
> http://git.mozilla.org/?p=releases/gecko.git;a=commit;
> h=2b27becae85092d46bfadcd4fb5605e82e1e1093
This gecko does not have the fix.
Reporter | ||
Comment 97•10 years ago
|
||
We have frozen imports of gaia and gecko code, but we have cherry-picked your fix into AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.066.
So this issue is happening again, even with your fix.
Assignee | ||
Comment 98•10 years ago
|
||
Assignee | ||
Comment 99•10 years ago
|
||
Reduce the log output. Log close() only when an error happens.
Attachment #8472044 -
Attachment is obsolete: true
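Editorial illustration: the idea of logging close() only on failure can be sketched in user space as below. This is not the bionic patch from the attachment; logged_close() and double_close_errno() are hypothetical names, and the sketch writes to stderr rather than to the Android log. A second close() of the same number fails with EBADF and is the only call that produces output.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Call close() and report only failures, preserving errno for the caller. */
int logged_close(int fd)
{
    int ret = close(fd);
    if (ret != 0) {
        int saved = errno;
        fprintf(stderr, "close(%d) failed: %s\n", fd, strerror(saved));
        errno = saved;
    }
    return ret;
}

/* Close the same descriptor twice; return the errno of the second attempt. */
int double_close_errno(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    if (logged_close(p[0]) != 0) { /* first close succeeds silently */
        close(p[1]);
        return -1;
    }
    int err = (logged_close(p[0]) == -1) ? errno : 0; /* second one is logged */
    close(p[1]);
    return err;
}
```

Such a wrapper only catches closes of numbers that are currently unused; a stray close that hits a number already reassigned to another file succeeds and stays silent.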
Assignee | ||
Comment 100•10 years ago
|
||
The patch is a temporary patch just to implement close() in C.
Assignee | ||
Updated•10 years ago
|
Attachment #8472044 -
Attachment is obsolete: false
Assignee | ||
Updated•10 years ago
|
Attachment #8477772 -
Attachment is obsolete: true
Blocks: 1058366
No longer blocks: 1058366