Failed to map ion memory in the client on gonk

RESOLVED FIXED in 2.1 S3 (29aug)

Status


P1
normal
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: tkundu, Assigned: sotaro)

Tracking

({crash})

32 Branch
2.1 S3 (29aug)
ARM
Gonk (Firefox OS)
crash
Points:
---
Bug Flags:
in-moztrap -

Firefox Tracking Flags

(blocking-b2g:2.0+, b2g-v2.0 fixed, b2g-v2.1 fixed)

Details

(Whiteboard: [b2g-crash][caf-crash 217][caf priority: p1][CR 686674])

Attachments

(8 attachments, 4 obsolete attachments)

4.94 KB, patch
4.08 MB, application/x-bzip
144.02 KB, text/plain
7.20 KB, patch
3.01 KB, patch
1.12 MB, application/x-bzip
4.60 KB, patch
3.03 KB, patch
+++ This bug was initially created as a clone of Bug #1034294 +++

This bug is created from Bug 1034294 comment 32. 

Logs:
---------------------------------------------------------
07-14 16:25:00.210  8506  8536 E qdmemalloc: ion: Failed to map memory in the client: Invalid argument
07-14 16:25:00.210  8506  8536 E qdgralloc: Could not mmap handle 0xb0c55c40, fd=45 (Invalid argument)
07-14 16:25:00.210  8506  8536 E qdgralloc: gralloc_register_buffer: gralloc_map failed
07-14 16:25:00.210  8506  8536 W GraphicBufferMapper: registerBuffer(0xb0c55c40) failed -22 (Invalid argument)
07-14 16:25:00.210  8506  8536 E GraphicBuffer: unflatten: registerBuffer failed: Invalid argument (-22)
07-14 16:25:00.210  8506  8536 I Gecko   : ParamTraits<MagicGrallocBufferHandle>::Read() failed to get gralloc buffer
07-14 16:25:00.220  8506  8536 I Gecko   : IPDL protocol error: Error deserializing 'MaybeMagicGrallocBufferHandle'
07-14 16:25:00.220  8506  8536 I Gecko   : [Child 8506] ###!!! ABORT: IPDL error [PSharedBufferManagerChild]: "Error deserializing 'MaybeMagicGrallocBufferHandle'". abort()ing as a result.: file ../../../../../../../../gecko/ipc/glue/ProtocolUtils.cpp, line 198
07-14 16:25:00.230   232   910 W Adreno-GSL: <gsl_ldd_control:412>: ioctl fd 141 code 0xc02c093d (IOCTL_KGSL_SUBMIT_COMMANDS) failed: errno 22 Invalid argument
07-14 16:25:00.230  8506  8536 E Gecko   : mozalloc_abort: [Child 8506] ###!!! ABORT: IPDL error [PSharedBufferManagerChild]: "Error deserializing 'MaybeMagicGrallocBufferHandle'". abort()ing as a result.: file ../../../../../../../../gecko/ipc/glue/ProtocolUtils.cpp, line 198


Crash logs: please see the attachment in bug 1034294 comment 32.
Blocks: 1011657
blocking-b2g: --- → 2.0?
Flags: needinfo?(sotaro.ikeda.g)
Observed on: 

Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.029
Moz BuildID: 20140710000201
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=35a9b715e7348ec738ff6c8a59f50190390a06f2
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=2fb60c777d3f82d580cba249e5e01a167a01de39
(Assignee)

Comment 2

5 years ago
I am not sure whether this crash is caused by a SharedBufferManagerParent problem. Bug 1034294 was caused by a locking problem in SharedBufferManagerParent, and its crash log did not contain output like the log in comment 0.

I have no idea how SharedBufferManagerParent could cause a problem like the one in the following log, which says that the file descriptor delivered to the client side is invalid. I suspect the problem has a cause other than SharedBufferManagerParent.

> E qdmemalloc: ion: Failed to map memory in the client: Invalid argument
Flags: needinfo?(sotaro.ikeda.g)
(Assignee)

Comment 3

5 years ago
The log in comment 0 reminds me of bug 1036905, although that one happens only on the Peak device and is easy to reproduce.
Assignee: nobody → sotaro.ikeda.g
(Assignee)

Comment 4

5 years ago
Updating the summary to a more accurate one.
Summary: Fix SharedBufferManagerParent → Failed to map ion memory in the client on gonk
blocking-b2g: 2.0? → 2.0+
How/when did we move from pmem to ion?
(Assignee)

Comment 6

5 years ago
(In reply to Milan Sreckovic [:milan] from comment #5)
> How/when did we move from pmem to ion?

Android has provided ion since JB. pmem is vendor-specific RAM.
:nical, take a look, is there anything that stands out here?
Flags: needinfo?(nical.bugzilla)
(In reply to Milan Sreckovic [:milan] from comment #8)
> :nical, take a look, is there anything that stands out here?

Not much at this point. Looking at mmap in bionic, it can fail if the passed offset isn't 4096-byte aligned. I suppose there is a sanity check on the size somewhere too, but since these two parameters are generated by Android's gralloc code, it's hard to believe the issue is here. There is also __mmap2, which can fail, but I don't think I have access to its source code.
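For illustration, the alignment requirement can be observed with a minimal C sketch, assuming a Linux/Android userspace where the page size reported by sysconf(_SC_PAGESIZE) is the mmap offset granularity (4096 bytes on these devices). The helpers try_map() and make_two_page_file() are hypothetical and not part of gralloc or bionic. An ordinary temporary file is used, so no ion buffer is involved: an aligned offset maps fine, while an unaligned one fails with EINVAL.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map one page of `fd` at `offset`, read-only.
 * Returns 0 on success, otherwise the errno value set by mmap(). */
static int try_map(int fd, off_t offset) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, offset);
    if (p == MAP_FAILED)
        return errno;
    munmap(p, page);
    return 0;
}

/* Create an unlinked temporary file two pages long; returns its fd or -1. */
static int make_two_page_file(void) {
    char path[] = "/tmp/mmap-align-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file stays alive until fd is closed */
    if (ftruncate(fd, (off_t)(2 * sysconf(_SC_PAGESIZE))) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With this, try_map(fd, 0) and try_map(fd, page_size) succeed, while try_map(fd, 100) returns EINVAL.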
Somewhere in the XDA forums I read about someone seeing gralloc errors caused by some shared memory that was opened twice when it should have been opened once (the description was as blurry as this), but we are opening a freshly created GraphicBuffer, so it doesn't look like a good candidate either.
Flags: needinfo?(nical.bugzilla)
I don't think it makes a difference but sBufferKey is uint64 while the key we pass in GrallocBufferRef is int64. We should use the same type if only for clarity.
GrallocBufferRef has a default constructor with "invalid" values in it but should always have its members set by the time we deserialize it. It'd be interesting to double check when deserializing that the GrallocBufferRef we receive isn't equal to a default constructed GrallocBufferRef.
Perhaps the mOwner member in SharedBufferManagerParent has an incorrect value? I don't see any checks, but I haven't read through all of the code.
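A minimal sketch of the checks suggested above, written in C rather than Gecko's C++ for brevity. The struct buffer_ref, its fields, and its sentinel values are hypothetical stand-ins for GrallocBufferRef, not Gecko's actual definitions. The idea shown is that the key uses one unsigned 64-bit type end to end, and that the receiver flags a ref whose owner or key still holds its default value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for GrallocBufferRef; field names and
 * sentinel values are illustrative, not Gecko's definitions. */
typedef struct {
    int32_t  owner; /* id of the owning process; -1 means "unset" */
    uint64_t key;   /* buffer key; 0 means "unset" */
} buffer_ref;

/* Equivalent of a default-constructed ref: every field holds its sentinel. */
static buffer_ref buffer_ref_default(void) {
    buffer_ref r = { -1, 0 };
    return r;
}

/* Hand out keys from an unsigned 64-bit counter, the same type stored
 * in buffer_ref, so no signed/unsigned conversion happens. */
static uint64_t buffer_key_next(uint64_t *counter) {
    return ++*counter;
}

/* Run after deserializing: true if any field still holds its default
 * sentinel, i.e. the sender never filled the ref in completely. */
static bool buffer_ref_looks_unset(const buffer_ref *r) {
    const buffer_ref d = buffer_ref_default();
    return r->owner == d.owner || r->key == d.key;
}
```

A receiver would log and reject any ref for which buffer_ref_looks_unset() returns true instead of passing it on to gralloc.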
(Assignee)

Comment 11

5 years ago
Bug 1039883 reduces gralloc buffer allocations and ion mappings in the application process. It might address the problem.
(Assignee)

Comment 12

5 years ago
Tapas, can you confirm if Bug 1039883 address the problem?
Flags: needinfo?(tkundu)
(In reply to Sotaro Ikeda [:sotaro PTO July/25 - Aug/3] from comment #12)
> Tapas, can you confirm if Bug 1039883 address the problem?

Sure. Our internal test team is testing with that patch. We will confirm soon.
Observed on: 

Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.035
Moz BuildID: 20140713000201
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=ca022f811bcbbda0f89086094a9e92bb220fea18
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=be6908fec84d3e39453275da96c031336f58f23d
(In reply to Tapas Kumar Kundu from comment #13)
> (In reply to Sotaro Ikeda [:sotaro PTO July/25 - Aug/3] from comment #12)
> > Tapas, can you confirm if Bug 1039883 address the problem?
> 
> Sure. Our internal test team is testing with that patch. We will confirm
> soon.

Tapas, can we ignore the cafbot comment in comment #14, since we got the latest patch in bug 1039883? Sounds reasonable?
(In reply to bhavana bajaj [:bajaj] [On PTO until July 27 ] from comment #15)
> Tapas ignoring the cafbot comment in comment #14 as we got a latest patch in
> Bug 1039883 ? Sounds reasonable ?

Yeah, you are correct. Please also note that we found a new memleak in bug 1041751.
Flags: needinfo?(tkundu)
status-b2g-v2.0: --- → affected
status-b2g-v2.1: --- → affected
We should wait till bug 1041751 is fixed
(Assignee)

Updated

5 years ago
Depends on: 1039883
Observed on: 

Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.040
Moz BuildID: 20140716000201
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=5f8b1b8a2da9e3b531eee817a669f57fa4d9b9c6
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=e00f7e464333689fcf54edb4945ece94f97f930b
Observed on: 

Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.042
Moz BuildID: 20140721000201
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=8cb1a949f2e9650bb2c5598e78a6f24a58bbaf97
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=5f27d3ee3ccf01ac91a3efacb5e3e22ea62fd73c
Whiteboard: [caf priority: p1][CR 686674] → [b2g-crash][caf-crash 217][caf priority: p1][CR 686674]
Keywords: crash
Observed on: 

Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.044
Moz BuildID: 20140724160208
B2G Version: 2.0
Gecko Version: 32.0
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=68226b3fd4eba752307daa5e917238bde253f5ab
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=b07c8ef448ee2c96955dee2a715f575faaaa72bc
Taking this bug while Sotaro is away.

(In reply to cafbot (PoC: ggrisco) from comment #20)
> Observed on: 
> 
> Device: msm8610
> Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.044
> Moz BuildID: 20140724160208
> B2G Version: 2.0
> Gecko Version: 32.0
> Gaia: 
> http://git.mozilla.org/?p=releases/gaia.git;a=commit;
> h=68226b3fd4eba752307daa5e917238bde253f5ab
> Gecko:
> http://git.mozilla.org/?p=releases/gecko.git;a=commit;
> h=b07c8ef448ee2c96955dee2a715f575faaaa72bc

This is against Gecko 32, but bug 1041751 only landed on master for now.
Assignee: sotaro.ikeda.g → lissyx+mozillians
(In reply to Tapas Kumar Kundu from comment #17)
> We should wait till bug 1041751 is fixed

Landed in 2.0 on 7/28, so the next 2.0 nightly should contain the fix.

Comment 23

4 years ago
:milan, we had cherry-picked the fix from bug 1041751, but we are still seeing this issue.
Tapas, Inder, do you mind sharing STR for this? I could not find proper ones in the other bugs.
Flags: needinfo?(tkundu)
Flags: needinfo?(ikumar)
(In reply to Alexandre LISSY :gerard-majax from comment #24)
> Tapas, Inder, do you mind sharing STR for this ? I could not find some
> proper in other bugs.

STR: Run call, SMS, music, video, camera, camcorder, airplane mode on/off, Wi-Fi on/off, etc. test cases for 48 hours.

I know these STR will be very difficult to reproduce. But we still have at least one memory leak, bug 1044514, unresolved.

We will upload the latest log soon, which will make it clear whether there is a memleak in the system when this happens. I hope this will help us move in the right direction towards a solution here.
Flags: needinfo?(tkundu)
Flags: needinfo?(ikumar)
Flags: needinfo?(tkundu)
(In reply to Tapas Kumar Kundu from comment #25)
> (In reply to Alexandre LISSY :gerard-majax from comment #24)
> > Tapas, Inder, do you mind sharing STR for this ? I could not find some
> > proper in other bugs.
> 
> STR: Run call, sms, music, video, camera, camcorder, airplane on/off, wifi
> on/off etc testcases for 48 hours. 

Do you just open/close those apps or do you perform specific steps inside?

> 
> I knew that STR will be very difficult to reproduce. But we still have at
> least one mem leak bug 1044514 unresolved. 
> 
> We will upload latest log soon which will make it clear whether there is a
> memleak in system when this happens or not. I hope that this will help us to
> move proper direction towards solution here.

It's in 2.0 now, I'm waiting for your feedback :)
Observed on: 

Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.048
Moz BuildID: 20140728000238
B2G Version: 2.0
Gecko Version: 32.0
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=0a864988f5dce7f9f3dea9609e8ef054679c30ff
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=745b486db495248e4d4503039e374cb8d5bb244f
Tapas, are you still seeing this?

Comment 29

4 years ago
Yes, we are still seeing this issue, in last stability run we saw it happen twice. We need to have a fix for this asap.
Tapas -- please upload the latest logs.
Flags: needinfo?(lissyx+mozillians)
(In reply to Inder from comment #29)
> Yes, we are still seeing this issue, in last stability run we saw it happen
> twice. We need to have a fix for this asap.
> Tapas -- please upload the latest logs.

And can you make sure it includes bug 1044514? Can you reply to my questions in comment 26?
Flags: needinfo?(lissyx+mozillians)
Tapas, can you make sure your tests include bug 1044514?
Flags: needinfo?(tkundu)
(In reply to Alexandre LISSY :gerard-majax from comment #31)
> Tapas, can you make sure your tests includes bug 1044514 ?

We tested with that patch too, but we are still hitting this issue with the following Gaia/Gecko:

https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/v2.0&id=0a864988f5dce7f9f3dea9609e8ef054679c30ff
https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/v2.0&id=745b486db495248e4d4503039e374cb8d5bb244f


Could you please provide a patch that logs how graphic buffer FDs are created and destroyed by the Gecko layer tree? This may help us debug it further. Please advise.
Flags: needinfo?(lissyx+mozillians)
Thanks. Can you please reply to my questions from comment 26?
Flags: needinfo?(lissyx+mozillians) → needinfo?(tkundu)
(In reply to Alexandre LISSY :gerard-majax from comment #26)
> (In reply to Tapas Kumar Kundu from comment #25)
> > (In reply to Alexandre LISSY :gerard-majax from comment #24)
> > > Tapas, Inder, do you mind sharing STR for this ? I could not find some
> > > proper in other bugs.
> > 
> > STR: Run call, sms, music, video, camera, camcorder, airplane on/off, wifi
> > on/off etc testcases for 48 hours. 
> 
> Do you just open/close those apps or do you perform specific steps inside?
> 

We are not just opening and closing apps. The stability team runs a set of tests, and the STR varies from one stability run to the next.

This issue was last seen with the following STR:
1. Make an outgoing call and get connected.
2. Turn Wi-Fi ON and download games.
3. Turn data ON/OFF for some time.
4. Turn Wi-Fi ON/OFF for some time.
5. Turn BT ON/OFF for some time.
6. While opening Settings, the device crashed.

But there is no guarantee that you will see this issue if you follow these STR, so please don't try to reproduce it on your device, as it may waste your time.
> > 
> > I knew that STR will be very difficult to reproduce. But we still have at
> > least one mem leak bug 1044514 unresolved. 
> > 
> > We will upload latest log soon which will make it clear whether there is a
> > memleak in system when this happens or not. I hope that this will help us to
> > move proper direction towards solution here.
> 
> It's in 2.0 now, I'm waiting for your feedback :)

I already confirmed this in comment 32, and I am confirming it again here :) We are still seeing this issue with the patch from bug 1044514.

Could you please provide us a logging patch as I suggested in comment 32 ?
Flags: needinfo?(tkundu) → needinfo?(lissyx+mozillians)
Thanks. I had a look at the bionic and kernel code for the code path that may lead to EINVAL. Do you mind hacking libc/bionic/mmap.cpp and adding some logcat output, so we know whether it's the initial offset check that returns -EINVAL or whether we are getting it from mmap2 directly?
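One way to get the answer asked for above is a wrapper that records which stage rejected the mapping. This is a sketch under stated assumptions: logged_mmap() and map_stage are hypothetical names, output goes to stderr rather than logcat, and the 4096-byte offset check described in this bug is re-implemented in front of the real mmap() instead of patching bionic's mmap.cpp itself.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Which step rejected the mapping request. */
typedef enum {
    MAP_STAGE_OK,
    MAP_STAGE_OFFSET_CHECK, /* rejected before any system call */
    MAP_STAGE_SYSCALL       /* the kernel returned an error */
} map_stage;

#define OFFSET_ALIGN_MASK ((off_t)4095) /* 4096-byte offset granularity */

static void *logged_mmap(void *addr, size_t size, int prot, int flags,
                         int fd, off_t offset, map_stage *stage) {
    if ((offset & OFFSET_ALIGN_MASK) != 0) {
        fprintf(stderr, "mmapdbg: offset check failed fd=%d offset=%lld\n",
                fd, (long long)offset);
        *stage = MAP_STAGE_OFFSET_CHECK;
        errno = EINVAL;
        return MAP_FAILED;
    }
    void *p = mmap(addr, size, prot, flags, fd, offset);
    if (p == MAP_FAILED) {
        int err = errno; /* fprintf() may clobber errno */
        fprintf(stderr, "mmapdbg: mmap syscall failed fd=%d size=%zu errno=%d\n",
                fd, size, err);
        *stage = MAP_STAGE_SYSCALL;
        errno = err;
        return MAP_FAILED;
    }
    *stage = MAP_STAGE_OK;
    return p;
}
```

If the failing call in comment 0 reported MAP_STAGE_SYSCALL with EINVAL, the offset check would be ruled out and the kernel path would be the place to look.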

I also noticed that all the runs reporting the error are on the KitKat codebase. How can we block 2.0 on a KK issue?
Flags: needinfo?(lissyx+mozillians) → needinfo?(tkundu)
Created attachment 8467704 [details] [diff] [review]
log patch - Dumping flatten/unflatten and Read/Write fds to logcat

To help debug bug 1038461, where bad things are happening with file
descriptors, we hack this to expose the content of the FDs we are
passing to and receiving from the underlying libs. Filtering logcat on
'gfxfds' should be enough to gather those logs.
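The log patch itself is not reproduced here. As a rough sketch of the same idea, assuming a Linux /proc filesystem, a helper can record what an fd refers to at each send/receive point by reading the /proc/self/fd/<n> symlink. describe_fd(), log_fd() and the "fddump" tag are hypothetical; a closed fd shows up as an error string instead of a path.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a description of what `fd` refers to into `out` (out_len > 0):
 * the target of /proc/self/fd/<fd>, or "<invalid: ...>" if that fails. */
static void describe_fd(int fd, char *out, size_t out_len) {
    char link[64];
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    ssize_t n = readlink(link, out, out_len - 1);
    if (n < 0) {
        snprintf(out, out_len, "<invalid: %s>", strerror(errno));
        return;
    }
    out[n] = '\0'; /* readlink() does not NUL-terminate */
}

/* Log one fd at a named call site, e.g. before sending or after receiving. */
static void log_fd(const char *site, int fd) {
    char desc[256];
    describe_fd(fd, desc, sizeof desc);
    fprintf(stderr, "fddump: %s fd=%d -> %s\n", site, fd, desc);
}
```

Logging the same buffer on both the sending and the receiving side would show whether the fd already pointed to something unexpected, or was already closed, before gralloc tried to map it.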
Tapas, I've looked at the code and checked with nical. With attachment 8467704 [details] [diff] [review] we should be covering all the call sites in gfx/ that make use of fds. This will dump to logcat, and I hope it will help us identify something we can work on :)
Flags: needinfo?(tkundu)

Comment 38

4 years ago
> I also noticed that all the runs reporting the error are on Kitkat codebase,
> how can we block 2.0 on a KK issue ?
KK is the official baseline for the 2.0 release.

So, to confirm: we don't need logs in bionic with your patch from comment 36, right?
(In reply to Inder from comment #38)
> > I also noticed that all the runs reporting the error are on Kitkat codebase,
> > how can we block 2.0 on a KK issue ?
> KK is the official baseline for 2.0 release

All the 2.0 builds I'm aware of are on JB.

> 
> So, to confirm we don't need logs in bionic with your patch in comment 36,
> right?

I think it can still be useful to keep an eye on what happens at this level, so if you can, it would help.

Updated

4 years ago
Blocks: 1041241
Can we get a log with the patch applied here?
Flags: needinfo?(tkundu)
Flags: needinfo?(ikumar)
Sotaro, when this information comes back, can you see if you can help Alexandre if he needs it at this point? It would be very beneficial if we can get this taken care of this week, as otherwise we start bumping against the CC milestone.

I do hope this is not specific to KK, as our Flame KK builds are not in the best of shape.
Flags: needinfo?(sotaro.ikeda.g)
Created attachment 8469555 [details]
Additional logs when issue has reproduced

(In reply to Milan Sreckovic [:milan] (PTO 8/11 - 8/15) from comment #41)
> Sotaro, when this information comes back, can you see if you can help
> Alexandre if he needs it at this point?  It would be very beneficial if we
> can get this taken care of this week, as we start bumping against CC
> milestone otherwise.
> 
> I do hope this is not specific to KK, as our Flame KK builds are not in the
> best of shape.

We reproduced it with additional logs. Please note that the Settings app is crashing here when it fails to map ion memory.

Exact timestamp in log is: 08-07 17:20:24.367
08-07 17:20:24.367 11186 11246 E qdmemalloc: ion: Failed to map memory in the client: Invalid argument


I can also see that dmesg has the following logs:

<7>[ 1178.275354] BufferMgrChild: unhandled page fault (11) at 0x00000000, code 0x817
<1>[ 1178.275375] pgd = c0e14000
<1>[ 1178.277329] [00000000] *pgd=08340831, *pte=00000000, *ppte=00000000
<6>[ 1178.283290] 
<6>[ 1178.283301] Pid: 11246, comm:       BufferMgrChild
<6>[ 1178.283312] CPU: 0    Tainted: G           O  (3.4.0-g015041d #1)
<6>[ 1178.283324] PC is at 0xb64e3e32
<6>[ 1178.283331] LR is at 0xb64e3e2f
<6>[ 1178.283341] pc : [<b64e3e32>]    lr : [<b64e3e2f>]    psr: 200f0030
<6>[ 1178.283345] sp : b2fef668  ip : 00000003  fp : b1f83288
<6>[ 1178.283354] r10: 00000000  r9 : b66a31da  r8 : b53b5da9
<6>[ 1178.283363] r7 : ffffffff  r6 : 00000001  r5 : b2fef6a4  r4 : 00000000
<6>[ 1178.283372] r3 : 00000000  r2 : 0000007b  r1 : ac72dc86  r0 : 000000f3
<6>[ 1178.283382] Flags: nzCv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
<6>[ 1178.283392] Control: 10c5387d  Table: 00e1406a  DAC: 00000015
<6>[ 1178.283423] [<c010bd94>] (unwind_backtrace+0x0/0xf8) from [<c01116a0>] (__do_user_fault+0xbc/0x134)
<6>[ 1178.283443] [<c01116a0>] (__do_user_fault+0xbc/0x134) from [<c0884848>] (do_page_fault+0x290/0x400)
<6>[ 1178.283469] [<c0884848>] (do_page_fault+0x290/0x400) from [<c010029c>] (do_DataAbort+0x34/0x98)
<6>[ 1178.283486] [<c010029c>] (do_DataAbort+0x34/0x98) from [<c08830b4>] (__dabt_usr+0x34/0x40)
<6>[ 1178.283498] Exception stack(0xc624dfb0 to 0xc624dff8)
<6>[ 1178.283509] dfa0:                                     000000f3 ac72dc86 0000007b 00000000
<6>[ 1178.283524] dfc0: 00000000 b2fef6a4 00000001 ffffffff b53b5da9 b66a31da 00000000 b1f83288
<6>[ 1178.283537] dfe0: 00000003 b2fef668 b64e3e2f b64e3e32 200f0030 ffffffff


If the device were running very low on memory, the Settings app should have been killed by the LMK instead.
Device memory when the crash happened:

Every 5s: b2g-info                                          2014-08-07 17:20:23

                           |     megabytes     |
           NAME   PID PPID CPU(s) NICE  USS  PSS  RSS SWAP VSIZE OOM_ADJ USER     
            b2g   231    1  436.5    0 33.7 35.4 40.9 17.4 237.4       0 root     
         (Nuwa)  1066  231    2.0    0  0.0  0.1  0.8  7.7  54.1       0 root     
         Camera  1715 1066   35.4   18  1.6  2.4  6.3 17.9  80.7      11 u0_a1715 
     Homescreen  7746 1066    6.2   18  0.0  0.6  4.2 15.7  70.4       8 u0_a7746 
          Usage 11002 1066    6.9   18  1.3  2.2  6.4 12.7  67.6      11 u0_a11002
       Settings 11186  231  251.0    1 13.1 14.9 20.2  6.0  84.8       2 u0_a11186
       Calendar 11187 1066    3.7   18  0.0  0.8  5.1 14.5  66.3      10 u0_a11187
(Preallocated a 28481 1066    1.1   18  0.0  0.6  4.2 11.0  61.2       1 u0_a28481

System memory info:

            Total 167.6 MB
        SwapTotal 192.0 MB
     Used - cache 145.8 MB
  B2G procs (PSS)  56.9 MB
    Non-B2G procs  88.9 MB
     Free + cache  21.8 MB
             Free   3.1 MB
            Cache  18.7 MB
         SwapFree  89.6 MB

So this is also not a memleak issue. My guess is that fd=74 is somehow getting closed by some process. Could you please confirm?
Flags: needinfo?(tkundu)
Flags: needinfo?(ikumar)
(Assignee)

Comment 43

4 years ago
(In reply to Milan Sreckovic [:milan] (PTO 8/11 - 8/15) from comment #41)
> Sotaro, when this information comes back, can you see if you can help
> Alexandre if he needs it at this point?

Okay :-)
Flags: needinfo?(sotaro.ikeda.g)
(Assignee)

Comment 44

4 years ago
> I can also see that dmesg has following logs : 
> 
> <7>[ 1178.275354] BufferMgrChild: unhandled page fault (11) at 0x00000000,
> code 0x817
> <1>[ 1178.275375] pgd = c0e14000
> <1>[ 1178.277329] [00000000] *pgd=08340831, *pte=00000000, *ppte=00000000
> <6>[ 1178.283290] 
> <6>[ 1178.283301] Pid: 11246, comm:       BufferMgrChild
> <6>[ 1178.283312] CPU: 0    Tainted: G           O  (3.4.0-g015041d #1)
> <6>[ 1178.283324] PC is at 0xb64e3e32
> <6>[ 1178.283331] LR is at 0xb64e3e2f
> <6>[ 1178.283341] pc : [<b64e3e32>]    lr : [<b64e3e2f>]    psr: 200f0030
> <6>[ 1178.283345] sp : b2fef668  ip : 00000003  fp : b1f83288
> <6>[ 1178.283354] r10: 00000000  r9 : b66a31da  r8 : b53b5da9
> <6>[ 1178.283363] r7 : ffffffff  r6 : 00000001  r5 : b2fef6a4  r4 : 00000000
> <6>[ 1178.283372] r3 : 00000000  r2 : 0000007b  r1 : ac72dc86  r0 : 000000f3
> <6>[ 1178.283382] Flags: nzCv  IRQs on  FIQs on  Mode USER_32  ISA Thumb 
> Segment user
> <6>[ 1178.283392] Control: 10c5387d  Table: 00e1406a  DAC: 00000015
> <6>[ 1178.283423] [<c010bd94>] (unwind_backtrace+0x0/0xf8) from [<c01116a0>]
> (__do_user_fault+0xbc/0x134)
> <6>[ 1178.283443] [<c01116a0>] (__do_user_fault+0xbc/0x134) from
> [<c0884848>] (do_page_fault+0x290/0x400)
> <6>[ 1178.283469] [<c0884848>] (do_page_fault+0x290/0x400) from [<c010029c>]
> (do_DataAbort+0x34/0x98)
> <6>[ 1178.283486] [<c010029c>] (do_DataAbort+0x34/0x98) from [<c08830b4>]
> (__dabt_usr+0x34/0x40)
> <6>[ 1178.283498] Exception stack(0xc624dfb0 to 0xc624dff8)
> <6>[ 1178.283509] dfa0:                                     000000f3
> ac72dc86 0000007b 00000000
> <6>[ 1178.283524] dfc0: 00000000 b2fef6a4 00000001 ffffffff b53b5da9
> b66a31da 00000000 b1f83288
> <6>[ 1178.283537] dfe0: 00000003 b2fef668 b64e3e2f b64e3e32 200f0030 ffffffff


The above page fault seems to come from the following ABORT call.

> 08-07 17:20:24.387 11186 11246 E Gecko   : mozalloc_abort: [Child 11186] ###!!! ABORT: IPDL error [PSharedBufferManagerChild]: "Error deserializing 'MaybeMagicGrallocBufferHandle'". abort()ing as a result.: file ../../../../../../../../gecko/ipc/glue/ProtocolUtils.cpp, line 198
Nicolas, as soon as I see IPDL, I CC you on the bug - we're trying to accelerate this, so if there is something in the logs that catches your eye, let us know.

Sotaro, at first I thought this may be related to the Fence::merge() issues, but your last comment suggests otherwise?
Flags: needinfo?(nical.bugzilla)
> 08-07 17:20:24.367 11186 11246 E qdmemalloc: ion: Failed to map memory in the client: Invalid argument
> 08-07 17:20:24.367 11186 11246 E qdgralloc: Could not mmap handle 0xb12dc100, fd=74 (Invalid argument)
> 08-07 17:20:24.367 11186 11246 E qdgralloc: gralloc_register_buffer: gralloc_map failed
> 08-07 17:20:24.367 11186 11246 W GraphicBufferMapper: registerBuffer(0xb12dc100) failed -22 (Invalid argument)
> 08-07 17:20:24.367 11186 11246 E GraphicBuffer: unflatten: registerBuffer failed: Invalid argument (-22)
> 08-07 17:20:24.367 11186 11246 I Gecko   : ParamTraits<MagicGrallocBufferHandle>::Read() failed to get gralloc buffer

Tapas, I don't understand how to read those logs, especially the "Could not mmap handle" line. Is it the bionic logging you added ?
Does it mean that it is the offset check in bionic/libc/bionic/mmap.cpp that is returning -EINVAL, or is it happening afterwards, when calling __mmap2?
Flags: needinfo?(tkundu)
Looking at dmesg I see several strange things:

>  298 <3>[  313.107074] msm_vfe32_process_error_status: camif error status: 0x80000000
>  299 <3>[  313.113116] msm_vfe32_process_error_status: violation
>  300 <3>[  313.118179] msm_vfe32_process_violation_status: black violation
>  301 <3>[  313.168431] msm_vfe32_process_error_status: camif error status: 0x80000000

Also, Fence timing out:

>   316 <6>[  360.418365] fence timeout on [c5fd7a00] after 3000ms
>   317 <6>[  360.418405] fence:
>   318 <6>[  360.418408] --------------
>   319 <6>[  360.418412] [c5fd7a00] kgsl-fence: active
>   320 <6>[  360.418416]   kgsl-3d0_b2g(231)-Compositor(968)-1_pt active: 28012 / 28009 retired:28009
>   321 <6>[  360.418424]
>   322 <4>[  360.418436] mdss_fb_wait_for_fence: mdp-fence: sync_fence_wait timed out! Waiting 10 more seconds
>   323 <6>[  360.671207] fence timeout on [c5fd7a80] after 3000ms
>   324 <6>[  360.671241] fence:
>   325 <6>[  360.671243] --------------
>   326 <6>[  360.671246] [c5fd7a80] TextureHostOGL: active
>   327 <6>[  360.671249]   mdss_fb_0_pt signaled@357.415841: 8148 / 8148
>   328 <6>[  360.671252]   kgsl-3d0_b2g(231)-Compositor(968)-1_pt active: 28012 / 28009 retired:28009
>   329 <6>[  360.671257]
>   330 <6>[  370.418362] fence timeout on [c5fd7a00] after 10000ms
>   331 <6>[  370.418391] fence:
>   332 <6>[  370.418393] --------------
>   333 <6>[  370.418395] [c5fd7a00] kgsl-fence: active
>   334 <6>[  370.418398]   kgsl-3d0_b2g(231)-Compositor(968)-1_pt active: 28012 / 28009 retired:28009
>   335 <6>[  370.418403]
>   336 <3>[  370.418433] mdss_fb_wait_for_fence: mdp-fence: sync_fence_wait failed! ret = ffffffc2
(Assignee)

Updated

4 years ago
Assignee: lissyx+mozillians → sotaro.ikeda.g
(In reply to Alexandre LISSY :gerard-majax from comment #46)
> > 08-07 17:20:24.367 11186 11246 E qdmemalloc: ion: Failed to map memory in the client: Invalid argument
> > 08-07 17:20:24.367 11186 11246 E qdgralloc: Could not mmap handle 0xb12dc100, fd=74 (Invalid argument)
> > 08-07 17:20:24.367 11186 11246 E qdgralloc: gralloc_register_buffer: gralloc_map failed
> > 08-07 17:20:24.367 11186 11246 W GraphicBufferMapper: registerBuffer(0xb12dc100) failed -22 (Invalid argument)
> > 08-07 17:20:24.367 11186 11246 E GraphicBuffer: unflatten: registerBuffer failed: Invalid argument (-22)
> > 08-07 17:20:24.367 11186 11246 I Gecko   : ParamTraits<MagicGrallocBufferHandle>::Read() failed to get gralloc buffer
> 
> Tapas, I don't understand how to read those logs, especially the "Could not
> mmap handle" line. Is it the bionic logging you added ?
> Does it means that it is the offset check in bionic/libc/bionic/mmap.cpp
> that is returning -EINVAL, or is it happening after, when calling __mmap2 ?

It comes from the display HAL. Gecko calls this API to map the ION handle.

Look into https://www.codeaurora.org/cgit/quic/la/platform/hardware/qcom/display/tree/libgralloc/ionalloc.cpp?h=b2g_kk_3.5#n154

But it can happen only if this FD is invalid. So my guess is that some process (b2g?) has invalidated this fd somewhere by mistake.
Flags: needinfo?(tkundu)
(Assignee)

Comment 49

4 years ago
(In reply to Tapas Kumar Kundu from comment #48)
> 
> It comes from display HAL. Gecko calls this API to map ION handle.
> 
> Look into
> https://www.codeaurora.org/cgit/quic/la/platform/hardware/qcom/display/tree/
> libgralloc/ionalloc.cpp?h=b2g_kk_3.5#n154
> 
> But it can happen only if this FD is invalid. So my guess is that some
> process (b2g ?) has invalided this fd somewhere by mistake.

It seems better to call it the gralloc HAL. From the log alone, I am not sure whether the fd is actually invalidated. The error log comes from gralloc::IonAlloc::map_buffer(). That function calls mmap(), and mmap() seems to invoke __NR_mmap2, which is a system call. I assume the error originates in the ion driver in the kernel.
Sotaro is working on a patch to get us more logging around ION driver.
Flags: needinfo?(nical.bugzilla)
(Assignee)

Comment 51

4 years ago
The function call sequence is like the following.

ParamTraits<MagicGrallocBufferHandle>::Read()
->GraphicBuffer::unflatten()
->GraphicBufferMapper::registerBuffer()
->gralloc_register_buffer() // gralloc hal
->gralloc_map() // gralloc hal
->IonAlloc::map_buffer()  // gralloc hal
->mmap() //bionic
->__mmap2()  //bionic
->system call with __NR_mmap2
->sys_mmap2() //kernel
->sys_mmap_pgoff()
->do_mmap_pgoff()
->mmap_region()
->ion_mmap()
(Assignee)

Comment 52

4 years ago
(In reply to Tapas Kumar Kundu from comment #48)
> 
> But it can happen only if this FD is invalid. So my guess is that some
> process (b2g ?) has invalided this fd somewhere by mistake.

If the fd is invalid, mmap() seems to fail with -EBADF.
(Assignee)

Comment 53

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #52)
> (In reply to Tapas Kumar Kundu from comment #48)
> > 
> > But it can happen only if this FD is invalid. So my guess is that some
> > process (b2g ?) has invalided this fd somewhere by mistake.
> 
> If fd is invalid, mmap() seems to fail by "-EBADF".

sys_mmap_pgoff() returns -EBADF if the fd is invalid.
  https://github.com/mozilla-b2g/codeaurora_kernel_msm/blob/master/mm/mmap.c#L1088
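To tell the two failure modes apart from userspace, a small C sketch, assuming Linux semantics: fd_is_open() is a hypothetical helper that probes an fd with fcntl(F_GETFD), and map_errno() (also hypothetical) shows that mapping an fd that has already been closed fails with EBADF, not the EINVAL seen in comment 0.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <unistd.h>

/* True if `fd` is currently an open descriptor in this process. */
static bool fd_is_open(int fd) {
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Attempt a one-page shared, read-only mapping of `fd`;
 * return 0 on success or the errno value set by mmap(). */
static int map_errno(int fd) {
    void *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return errno;
    munmap(p, 4096);
    return 0;
}
```

Because a closed or never-valid fd yields EBADF, the EINVAL in the log suggests the fd was still open when gralloc mapped it, which is consistent with the reasoning above.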
(Assignee)

Comment 54

4 years ago
ion_mmap() outputs an error log when an error happens, but the kernel log in attachment 8469555 [details] does not include that error. This suggests the following possibilities:
- Kernel error log is not correctly captured. 
- Error happens between sys_mmap_pgoff() and mmap_region()

https://www.codeaurora.org/cgit/external/gigabyte/kernel/msm/tree/drivers/gpu/ion/ion.c?h=caf/b2g_kk_3.5#n1110
(Assignee)

Comment 55

4 years ago
Created attachment 8470280 [details] [diff] [review]
patch - Add log around kernel mmap
(Assignee)

Comment 56

4 years ago
Tapas, can you test again by applying attachment 8470280 [details] [diff] [review]?
Flags: needinfo?(tkundu)
(Assignee)

Comment 57

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #56)
> Tapas, can you test again by applying attachment 8470280 [details] [diff] [review]
> [review]?

The log is output to the kernel log. To analyze the problem, both the logcat log and the kernel log are necessary. Thanks.
(Assignee)

Comment 58

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #57)
> (In reply to Sotaro Ikeda [:sotaro] from comment #56)
> > Tapas, can you test again by applying attachment 8470280 [details] [diff] [review]
> > [review]?
> 
> The log is output to kernel log. To analyze the problem. logcat log and
> kernel log are necessary. Thanks.

Is it possible to do testing over the weekend?
Not sure who's "on call" with codeaurora, so adding Michael on NI as well.
Flags: needinfo?(mvines)
(:tk'll be around to pick this up)
Flags: needinfo?(mvines)
(In reply to Sotaro Ikeda [:sotaro] from comment #58)
> (In reply to Sotaro Ikeda [:sotaro] from comment #57)
> > (In reply to Sotaro Ikeda [:sotaro] from comment #56)
> > > Tapas, can you test again by applying attachment 8470280 [details] [diff] [review]
> > > [review]?
> > 
> > The log is output to kernel log. To analyze the problem. logcat log and
> > kernel log are necessary. Thanks.
> 
> Is it possible to do testing during weekend?

Yes. We will put your patch in our build and get back to you with logs asap. Thanks a lot for helping us.
(Assignee)

Comment 62

4 years ago
(In reply to Tapas Kumar Kundu from comment #61)
> (In reply to Sotaro Ikeda [:sotaro] from comment #58)
> > (In reply to Sotaro Ikeda [:sotaro] from comment #57)
> > > (In reply to Sotaro Ikeda [:sotaro] from comment #56)
> > > > Tapas, can you test again by applying attachment 8470280?
> > > 
> > > The log is output to kernel log. To analyze the problem. logcat log and
> > > kernel log are necessary. Thanks.
> > 
> > Is it possible to do testing during weekend?
> 
> Yes. We will put your patch in our build and confirm you with logs asap.
> Thanks a lot for helping us.

Thanks. Testing with both attachment 8467704 and attachment 8470280 applied seems good.
(Assignee)

Comment 63

4 years ago
File descriptors are sent from the b2g process to the content process over a Unix domain socket. The function call sequence appears to be the following. New file descriptors are allocated in the receiving process under recvmsg().

Channel::ChannelImpl::ProcessOutgoingMessages() //gecko ipc
->sendmsg() // bionic
 ->system call with __NR_sendmsg
  ->sys_sendmsg()
   ->__sys_sendmsg()
    ->sock_sendmsg_nosec()
     ->__sock_sendmsg_nosec()
      ->unix_stream_sendmsg()
       ->scm_send()
        ->__scm_send()
         ->scm_fp_copy()
          ->// Verify the descriptors and increment the usage count.
       ->sock_alloc_send_skb() // Grab a buffer
       ->unix_scm_to_skb() // Send the fds
        ->unix_attach_fds()

Channel::ChannelImpl::ProcessIncomingMessages()  //gecko ipc
->recvmsg() // bionic
 ->system call with __NR_recvmsg
  ->sys_recvmsg()
   ->__sys_recvmsg()
    ->sock_recvmsg()
     ->__sock_recvmsg_nosec()
      ->unix_stream_recvmsg()
       ->scm_detach_fds()
        ->new_fd = get_unused_fd_flags()
         ->__alloc_fd() // allocate a file descriptor
        ->put_user(new_fd,..);
        ->get_file(fp[i]);// Bump the usage count
        ->fd_install(new_fd, fp[i]); // install the file
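The fd handoff traced above can be reproduced in miniature with plain POSIX sockets. This is not Gecko's IPC code: `send_fd()`, `recv_fd()` and `received_fd_is_new_number()` are hypothetical helpers that pass one descriptor as SCM_RIGHTS ancillary data over a connected Unix domain socket. What it illustrates is the point of the receive-side trace: the receiver gets a freshly allocated fd number in its own table (scm_detach_fds() -> get_unused_fd_flags()), not the sender's number.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one fd over a connected AF_UNIX socket as SCM_RIGHTS. */
static int send_fd(int sock, int fd_to_send) {
    assert(fd_to_send >= 0);
    char byte = 'x';                        /* at least one byte of normal data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));
    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;           /* kernel: scm_send() -> scm_fp_copy() */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; the kernel installs it under a new number. */
static int recv_fd(int sock) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);
    if (recvmsg(sock, &msg, 0) != 1)        /* kernel: scm_detach_fds() */
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Round trip inside one process. Returns 1 if the received number differs from the
 * sent one (it must, since both stay open at once), 0 if not, -1 on error. */
static int received_fd_is_new_number(void) {
    int sv[2], p[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;
    if (pipe(p) != 0) { close(sv[0]); close(sv[1]); return -1; }
    int rc = -1;
    if (send_fd(sv[0], p[0]) == 0) {
        int got = recv_fd(sv[1]);
        if (got >= 0) { rc = (got != p[0]); close(got); }
    }
    close(p[0]); close(p[1]); close(sv[0]); close(sv[1]);
    return rc;
}
```

Across processes the sender's and receiver's numbers are simply unrelated, which is why only code in the receiving (content) process can close or invalidate the received fd.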
(Assignee)

Comment 64

4 years ago
(In reply to Tapas Kumar Kundu from comment #48)
> 
> Look into
> https://www.codeaurora.org/cgit/quic/la/platform/hardware/qcom/display/tree/
> libgralloc/ionalloc.cpp?h=b2g_kk_3.5#n154
> 
> But it can happen only if this FD is invalid. So my guess is that some
> process (b2g ?) has invalided this fd somewhere by mistake.

From comment 63, the new fd belongs only to the content process. It seems that no process other than the content process can invalidate that fd.
(Assignee)

Comment 65

4 years ago
Created attachment 8470909
patch - workaround to a crash

The patch is a workaround for the crash caused by the unflatten() failure. It turns the crash into a gralloc allocation failure. The patch could prevent an application crash, but it causes rendering problems because of the gralloc buffer allocation failure. Therefore, it seems better not to use this patch.
Created attachment 8471019
logs from .extra file

Unfortunately, we only got partial logs when we reproduced this again. I attached the partial logs, but I am waiting for our test team to reproduce it again with full logcat logs.

@sotaro: Can you please check these partial logcat logs and let us know if you find something useful?
Flags: needinfo?(sotaro.ikeda.g)
Observed on: 

Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.058
Moz BuildID: 20140807000201
B2G Version: 2.0
Gecko Version: 32.0
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=8cc28fd31905a0ea2b2e15d13e80a0eab2feb1ba
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=f7bd772b1e42774708a4ede13b149a1706a59b25
(Assignee)

Comment 68

4 years ago
(In reply to Tapas Kumar Kundu from comment #66)
> Created attachment 8471019
> logs from .extra file
> 
> Unfortunately, we got only partial logs when we reproduced this again. I
> attached partial logs but I am waiting for our test team to reproduce again
> with full logcat logs.
> 
> @sotaro: Can you please check this partial logcat logs and confirm us if you
> find something useful ?

The partial log does not include the kernel log, so it does not show where in the kernel the problem happens. The logcat log (user-side log) includes the error, but the error code is different from before: it failed with a "Permission denied" error.

-------------------------------
08-11 17:59:37.491  8895  8905 E qdmemalloc: ion: Failed to map memory in the client: Permission denied
08-11 17:59:37.491  8895  8905 E qdgralloc: Could not mmap handle 0xb1a6f650, fd=60 (Permission denied)
08-11 17:59:37.491  8895  8905 E qdgralloc: gralloc_register_buffer: gralloc_map failed
08-11 17:59:37.491  8895  8905 W GraphicBufferMapper: registerBuffer(0xb1a6f650) failed -13 (Permission denied)
Flags: needinfo?(sotaro.ikeda.g)
(Assignee)

Comment 69

4 years ago
Created attachment 8471252
patch v2 - Add log around kernel mmap

Add more logs to cover the "Permission denied" error case.
Attachment #8470280 - Attachment is obsolete: true
(Assignee)

Comment 70

4 years ago
Created attachment 8471264
log patch v3 - Add log around kernel mmap

Add an additional debug log.
Attachment #8471252 - Attachment is obsolete: true
(Assignee)

Comment 71

4 years ago
Tapas, can you replace the logging patch with attachment 8471264?
(Assignee)

Comment 72

4 years ago
ION memory allocation works as follows.

IonAlloc::alloc_buffer()
->ioctl() with ION_IOC_ALLOC
 ->ion_alloc()
  ->// plist_for_each_entry(heap, &dev->heaps, node) //try to allocate ion for each heap
   ->ion_buffer_create()
    ->heap->ops->allocate() // try to allocate for a heap
    ->mutex_init(&buffer->lock);
    ->ion_buffer_add()
  ->ion_handle_create()
  ->ion_handle_add()
->ioctl() with ION_IOC_MAP
 ->ion_share_dma_buf_fd() // given an ion client, create a dma-buf fd
  ->ion_share_dma_buf() // share buffer as dma-buf
   ->ion_handle_validate()
   ->ion_buffer_get()
   ->dma_buf_export(buffer, &dma_buf_ops, buffer->size, O_RDWR)
     // Creates a new dma_buf, and associates an anon file with a buffer
  ->dma_buf_fd() // returns a file descriptor for the given dma_buf
   ->get_unused_fd_flags()
   ->fd_install()
->ioctl() with ION_IOC_FREE
 ->ion_handle_validate(client, data.handle);
 ->ion_free(client, data.handle)
(Assignee)

Comment 73

4 years ago
The following lines in do_mmap_pgoff() return -EACCES, which causes "Permission denied". Attachment 8471019 does not include a kernel log, so it is not clear whether the error happened there.
https://www.codeaurora.org/cgit/external/gigabyte/kernel/msm/tree/mm/mmap.c?h=caf/b2g_kk_3.5#n1035
https://www.codeaurora.org/cgit/external/gigabyte/kernel/msm/tree/mm/mmap.c?h=caf/b2g_kk_3.5#n1042

The ion memory file is created with O_RDWR by dma_buf_export(), which seems to contradict the above: a file opened O_RDWR should not fail these access checks.
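One documented way for mmap() to fail with EACCES is a shared, writable mapping of a file that was not opened for writing (see the mmap(2) man page). The sketch below reproduces that from user space with an ordinary temporary file rather than an ion buffer, so it only shows what such a check looks like from the caller's side; it does not establish which check fired in this crash. `open_temp_with_flags()` and `try_shared_rw_map()` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create a temporary file of `len` bytes, then reopen it with
 * `flags` (e.g. O_RDONLY or O_RDWR). Returns the new fd, or -1 on error. */
static int open_temp_with_flags(size_t len, int flags) {
    char path[] = "/tmp/mmap_eacces_XXXXXX";
    int creator = mkstemp(path);            /* mkstemp opens the file O_RDWR */
    if (creator < 0) return -1;
    int fd = -1;
    if (ftruncate(creator, (off_t)len) == 0)
        fd = open(path, flags);
    close(creator);
    unlink(path);                           /* the reopened fd keeps the inode alive */
    return fd;
}

/* Hypothetical helper: map `len` bytes of `fd` with MAP_SHARED and
 * PROT_READ | PROT_WRITE, i.e. a CPU-writable shared mapping.
 * Returns 0 on success, or the errno value reported by mmap(). */
static int try_shared_rw_map(int fd, size_t len) {
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return errno;                       /* EACCES when fd lacks write access */
    munmap(p, len);
    return 0;
}
```

With a page-sized file, the O_RDONLY fd fails with EACCES while the O_RDWR fd maps fine, which is why an fd that really came from dma_buf_export() with O_RDWR would not be expected to hit this particular check.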
(Assignee)

Comment 74

4 years ago
Tapas, the recent log was not taken with the user-side log patch applied. Can the test be done with both the user-side log patch (attachment 8467704) and the kernel-side log patch (attachment 8471264) applied?
(Assignee)

Updated

4 years ago
Flags: needinfo?(tkundu)
(Assignee)

Updated

4 years ago
Flags: needinfo?(tkundu)
(In reply to Sotaro Ikeda [:sotaro] from comment #74)
> Tapas, the recent log does not apply user side log patch. Can the test be
> done by applying both user side log patch attachment 8467704 and kernel side
> log patch attachment 8471264 ?

We asked our test team, and we are waiting for them to get those logs for us. Thanks a lot for your help.
(Assignee)

Comment 76

4 years ago
I re-checked attachment 8469555 and found a weird log in "logcat_09-01-1970-05-21-14.log.txt": the flatten() log prints out 2 fds, but they are both 108.

> 08-07 17:20:24.367   231 11200 I Gecko   : gfxfds 19 flatten(0xaa7ffba8, 0, 0xaa7ffa60, 0)
> 08-07 17:20:24.367   231 11200 I Gecko   : gfxfds 19 list, dumping 0 fds:
> 08-07 17:20:24.367   231 11200 I Gecko   : 
> 08-07 17:20:24.367   231 11200 I Gecko   : gfxfds Write() using one fd: 108 at 108
> 08-07 17:20:24.367   231   231 D btif_config_util: btif_config_save_file(L188): in file name:/data/misc/bluedroid/bt_config.new
> 08-07 17:20:24.367   231 11200 I Gecko   : gfxfds Write() using one fd: 108 at 108
(Assignee)

Comment 77

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #76)
> I re-checked attachment 8469555. I found weird log in
> "logcat_09-01-1970-05-21-14.log.txt". flatten() log print out 2 fds, but
> they are both 108.

In "logcat_09-01-1970-05-21-14.log.txt", the duplicated fds seem to appear only in the gralloc allocation related to the crash.
(Assignee)

Comment 78

4 years ago
(In reply to Tapas Kumar Kundu from comment #48)
> Look into
> https://www.codeaurora.org/cgit/quic/la/platform/hardware/qcom/display/tree/
> libgralloc/ionalloc.cpp?h=b2g_kk_3.5#n154
> 
> But it can happen only if this FD is invalid. So my guess is that some
> process (b2g ?) has invalided this fd somewhere by mistake.

From comment 76, it seems possible that a task in the b2g process invalidates an fd that belongs to the b2g process.
(Assignee)

Updated

4 years ago
Attachment #8467704 - Attachment description: Dumping flatten/unflatten and Read/Write fds to logcat → log patch - Dumping flatten/unflatten and Read/Write fds to logcat
(Assignee)

Updated

4 years ago
Attachment #8471264 - Attachment description: patch v3 - Add log around kernel mmap → log patch v3 - Add log around kernel mmap
(Assignee)

Comment 79

4 years ago
Created attachment 8472044
log patch - Add log to close()

Add logging to bionic's close() to check comment 78.
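The idea behind logging close() can be sketched as a wrapper that behaves like close() but reports failures together with the caller's name. This is a hypothetical illustration, not the contents of attachment 8472044: `checked_close()` is not a bionic function, and it writes to stderr as a stand-in for logcat. A close() that fails with EBADF typically means the fd was already closed (or never opened); the more dangerous case, where the second close() succeeds because the number has already been reused, cannot be caught by a wrapper like this alone.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper: same contract as close(), but logs only when it fails,
 * so the call site that closes an invalid fd can be identified. */
static int checked_close(int fd, const char *caller) {
    assert(caller != NULL);
    int rc = close(fd);
    if (rc != 0) {
        int saved = errno;                  /* fprintf() may clobber errno */
        fprintf(stderr, "close(%d) from %s failed: %s\n", fd, caller, strerror(saved));
        errno = saved;                      /* hand the original error back */
    }
    return rc;
}
```

A call site would use something like `checked_close(fd, __func__)` so the log names the function that issued the bad close.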
(Assignee)

Comment 80

4 years ago
Tapas, I added a bionic log patch. Can you use the following patches for testing?
 - gecko log patch: attachment 8467704
 - bionic log patch: attachment 8472044
 - kernel mmap log patch: attachment 8471264
Flags: needinfo?(tkundu)
(Assignee)

Updated

4 years ago
Flags: needinfo?(tkundu)
(In reply to Sotaro Ikeda [:sotaro] from comment #80)
> Tapas, I added bionic's log patch. Can you use the following patch for
> testing?
>  - gecko log patch: attachment 8467704
>  - bionic log patch: attachment 8472044
>  - kernel mmap log patch: attachment 8471264

I asked our internal team to test again with these patches. I will update ASAP.
(Assignee)

Comment 82

4 years ago
Using the bionic log patch (attachment 8472044), I found an invalid fd close in the b2g process. I am going to create a bug for it. It is not clear whether invalid fd closes happen in only one place, so I am going to create a bug for each one.
(Assignee)

Updated

4 years ago
Depends on: 1053204
(In reply to Sotaro Ikeda [:sotaro] from comment #82)
> It is not clear that invalid fd close happen at only one place.

Nice find. Could you please consider adding logging in the b2g framework for the invalid fd close issue you just found? This will save a lot of time on nightmare stability issues in 2.1.
(Assignee)

Comment 84

4 years ago
(In reply to Tapas Kumar Kundu from comment #83)
> Nice find. Could you please think for adding log in b2g framework for
> invalid fd close issue (which you found now) ? This will saves lot of time
> in stability nightmare issues in 2.1

Bug 1053277 was created for it.
(Assignee)

Updated

4 years ago
See Also: → bug 1053277
Created attachment 8472482
mozilla_logs.tar.bz2

Here is the log from the previous stability run, which contains the logs from comment 62.

We have another build that is running the stability test with the logs from comment 82.

@sotaro: Could you please go through these logs? We have full logs here.
Flags: needinfo?(sotaro.ikeda.g)
(Assignee)

Comment 86

4 years ago
(In reply to Tapas Kumar Kundu from comment #85)
> Created attachment 8472482
> mozilla_logs.tar.bz2
> 
> @sotaro : Could you please go through these logs ? we have full logs  here

Thanks for the log :-)
The logcat log says that the GraphicBuffer's 2 fds are the same. gralloc allocates two ion buffers, so the two fds should be different. From the log, it seems possible that someone at least closed the first fd.

> 08-13 16:39:08.169   240  2709 I Gecko   : gfxfds Write() using one fd: 173 at 173
> 08-13 16:39:08.169   240  2709 I Gecko   : gfxfds Write() using one fd: 173 at 173


The kernel log says that "file->f_op->mmap(file, vma)" failed. If the file were an ion buffer, ion_mmap() would be called from "file->f_op->mmap(file, vma)", and if ion_mmap() failed, it would print a kernel error log. The kernel log has no such error log, which suggests that the fd refers to a file other than an ion buffer.
 
> <3>[  115.144449] mmap_region: mmap() failed
Flags: needinfo?(sotaro.ikeda.g)
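To check what a suspect fd actually refers to at the moment of failure, a process can read its /proc/self/fd/<n> symlink. This is a hypothetical diagnostic, not something used in this bug; `describe_fd()` and `looks_like_dmabuf()` are hypothetical names. Assumption: on 3.x kernels like this one, dma_buf_export() creates its file through anon_inode_getfile("dmabuf", ...), so a dma-buf fd should read back as `anon_inode:dmabuf`; newer kernels format this differently, so the expected string should be verified on the target device.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write the target of /proc/self/fd/<fd> into `out`.
 * Returns 0 on success, -1 if the fd is not open (or /proc is unavailable). */
static int describe_fd(int fd, char *out, size_t out_len) {
    assert(out != NULL && out_len > 1);
    char link[64];
    snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
    ssize_t n = readlink(link, out, out_len - 1);
    if (n < 0)
        return -1;
    out[n] = '\0';                          /* readlink() does not NUL-terminate */
    return 0;
}

/* Hypothetical check; the "anon_inode:dmabuf" prefix is an assumption (see above). */
static int looks_like_dmabuf(const char *target) {
    static const char prefix[] = "anon_inode:dmabuf";
    return strncmp(target, prefix, sizeof(prefix) - 1) == 0;
}
```

Logging `describe_fd()` output next to the "Could not mmap handle" error would show directly whether the fd still points at a dma-buf or has been replaced by an unrelated file.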
(Assignee)

Comment 87

4 years ago
From attachment 8472482, an "fd close" or an "fd replacement with another file" seems to happen. Bug 1053204 could be a cause of the problem, but it is not yet clear whether fixing bug 1053204 would fix all of the problems.
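A mechanism consistent with comments 76 and 86 is fd-number reuse. POSIX requires open() to return the lowest-numbered unused descriptor, so if one task closes an fd it does not own, the next open() anywhere in the process can receive that same number, and the original holder now silently refers to a different file. That could also explain both of a GraphicBuffer's fds showing the same value (173). The sketch below shows the reuse with /dev/null and /dev/zero standing in for the ion buffers; `reopen_after_stale_close()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: returns the fd number that a later, unrelated open() receives
 * after the first fd was closed "by mistake"; *stale_fd_out gets the old number. */
static int reopen_after_stale_close(int *stale_fd_out) {
    int first = open("/dev/null", O_RDONLY);   /* stands in for an ion buffer fd */
    assert(first >= 0);
    close(first);                              /* the erroneous close elsewhere */
    int second = open("/dev/zero", O_RDONLY);  /* an unrelated open() */
    assert(second >= 0);
    *stale_fd_out = first;                     /* old holder still has this number */
    return second;
}
```

In a single thread the two numbers are equal by the lowest-free-descriptor rule; in a real multi-threaded process the reuse depends on timing, which fits a crash that only shows up in long stability runs.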
(Assignee)

Updated

4 years ago
Attachment #8470909 - Attachment is obsolete: true

Comment 88

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #87)
> From attachment 8472482, "fd close" or "fd replacement to other
> file" seems to happen. Bug 1053204 could be a cause of the problem. But it
> is not clear yet if bug 1053204 could fix all problems.

Thanks, Sotaro. We are trying the patch from bug 1053204, but meanwhile I hope you will continue looking into this to confirm that we have fixed all cases.

Updated

4 years ago
Flags: needinfo?(tkundu)
NI: sotaro for comment #88, to see if there is any way to verify that the patch in bug 1053204 helps?
Flags: needinfo?(sotaro.ikeda.g)
(Assignee)

Comment 90

4 years ago
(In reply to bhavana bajaj [:bajaj] from comment #89)
> NI: sotaro for comment #88, to see if there is anyway to verify the patch in
> 1053204 helps?

Apply the patch in bug 1053204 and check whether close() failures happen in the b2g process. I already did this today, and I have not seen any invalid file descriptor close failure since applying the patch.
Flags: needinfo?(sotaro.ikeda.g)

Updated

4 years ago
Blocks: 1025317
Observed on: 

Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.060
Moz BuildID: 20140810000201
B2G Version: 2.0
Gecko Version: 32.0
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=de28796a8956a48bb98ca67df6a33e0622d642d1
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=2b27becae85092d46bfadcd4fb5605e82e1e1093
Flags: in-moztrap?(bzumwalt)
The STR make this difficult to reproduce in a test run, so a test case for this issue appears unnecessary.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
This reproduced in AU 60, but per bug 1053204 comment 18, that patch did not land until AU 63.
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage?][2.0-signoff-need+]
QA Whiteboard: [QAnalyst-Triage?][2.0-signoff-need+] → [QAnalyst-Triage+][2.0-signoff-need+]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(bzumwalt)
Flags: in-moztrap-

Comment 94

4 years ago
We are not seeing this issue anymore.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
status-b2g-v2.0: affected → fixed
status-b2g-v2.1: affected → fixed
Target Milestone: --- → 2.1 S3 (29aug)
Observed on: 

Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.066
Moz BuildID: 20140810160202
B2G Version: 2.0
Gecko Version: 32.0
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=de28796a8956a48bb98ca67df6a33e0622d642d1
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=2b27becae85092d46bfadcd4fb5605e82e1e1093
(Assignee)

Comment 96

4 years ago
(In reply to cafbot (PoC: ggrisco) from comment #95)
> Observed on: 
> 
> Device: msm8226
> Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.066
> Moz BuildID: 20140810160202
> B2G Version: 2.0
> Gecko Version: 32.0
> Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=de28796a8956a48bb98ca67df6a33e0622d642d1
> Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=2b27becae85092d46bfadcd4fb5605e82e1e1093

This gecko does not have the fix.
We have frozen importing gaia and gecko code, but we have cherry-picked your fix into AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.066.

So this issue is still happening, even with your fix.
(Assignee)

Comment 98

4 years ago
Created attachment 8477396
log patch - Dumping flatten/unflatten and Read/Write fds to logcat since Bug 1042387 fix

Un-bitrot attachment 8467704 after the bug 1042387 fix.
(Assignee)

Comment 99

4 years ago
Created attachment 8477769
log patch v2 - Add log to close()

Reduce the log output: only log close() when an error happens.
Attachment #8472044 - Attachment is obsolete: true
(Assignee)

Comment 100

4 years ago
Created attachment 8477772
temporary patch - Implement close() with c

This is a temporary patch just to implement close() in C.
(Assignee)

Updated

4 years ago
Attachment #8472044 - Attachment is obsolete: false
(Assignee)

Updated

4 years ago
Attachment #8477772 - Attachment is obsolete: true