Closed
Bug 886217
Opened 11 years ago
Closed 11 years ago
[General] Memory leakage induces a crash on RPC Modem
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(blocking-b2g:leo+, b2g18 fixed)
| | Tracking | Status |
|---|---|---|
| b2g18 | --- | fixed |
People
(Reporter: leo.bugzilla.gecko, Unassigned)
References
Details
(Keywords: verifyme, Whiteboard: [TD-59414][MemShrink:P2] QARegressExclude)
Attachments
(5 files)
If gecko (the b2g process) contains a memory leak, we will see its RSS continue to grow like this.
At some point the process will likely grow large enough that it prevents the kernel from responding to modem RPC
requests, and the modem will time out and induce a crash.
Reporter
Updated•11 years ago
blocking-b2g: --- → leo?
Target Milestone: --- → 1.1 QE3 (24jun)
Comment 1•11 years ago
Is this speculation or did you actually reproduce the crash?
Comment 2•11 years ago
Comment 3•11 years ago
(In reply to Jason Smith [:jsmith] from comment #1)
> Is this speculation or did you actually reproduce the crash?
Not yet; we are checking now.
The relevant lines from the attached kernel log are:
0) : <6>[2022-06-19 21:21:32 KST][43496.202695] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
1) : <6>[2022-06-19 17:53:18 KST][31001.723033] [ 150] 0 150 112034 69527 0 0 0 b2g
2) : <6>[2022-06-19 21:21:32 KST][43496.202922] [ 150] 0 150 128515 84430 0 0 0 b2g
b2g's RSS gradually increased from 69527 to 84430.
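The growth quoted above can be put into more familiar units with a short script. The following is a minimal sketch, not something taken from the attached logs: it assumes that the `rss` and `total_vm` columns of the oom-killer task dump are page counts and that pages are 4 KiB on this device. The names `TASK_LINE`, `parse_task` and `rss_growth` are made up for illustration.

```python
import re

# Field layout follows the header line of the dump quoted above:
# [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
TASK_LINE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+"        # kernel uptime in seconds
    r"\[\s*(?P<pid>\d+)\]\s+"                # pid in brackets
    r"(?P<uid>\d+)\s+(?P<tgid>\d+)\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+"
    r"\d+\s+-?\d+\s+-?\d+\s+"                # cpu, oom_adj, oom_score_adj
    r"(?P<name>\S+)"
)

PAGE_KIB = 4  # assumption: 4 KiB pages

SAMPLES = [
    "1) : <6>[2022-06-19 17:53:18 KST][31001.723033] [ 150] 0 150 112034 69527 0 0 0 b2g",
    "2) : <6>[2022-06-19 21:21:32 KST][43496.202922] [ 150] 0 150 128515 84430 0 0 0 b2g",
]


def parse_task(line):
    """Return uptime, rss and name for a task line, or None for other lines."""
    m = TASK_LINE.search(line)
    if m is None:
        return None
    return {
        "uptime": float(m["uptime"]),
        "rss": int(m["rss"]),
        "total_vm": int(m["total_vm"]),
        "name": m["name"],
    }


def rss_growth(first, second):
    """Return (pages, MiB, MiB per hour) of RSS growth between two samples."""
    pages = second["rss"] - first["rss"]
    hours = (second["uptime"] - first["uptime"]) / 3600
    mib = pages * PAGE_KIB / 1024
    return pages, mib, mib / hours


if __name__ == "__main__":
    first, second = [parse_task(line) for line in SAMPLES]
    pages, mib, rate = rss_growth(first, second)
    print(f"{pages} pages, {mib:.1f} MiB, {rate:.1f} MiB/hour")
```

Under those assumptions, the two b2g lines differ by 14903 pages, roughly 58 MiB, over about 3.5 hours of uptime.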
Comment 4•11 years ago
0) : <6>[2022-06-19 21:21:32 KST][43496.202695] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
1) : <6>[2022-06-19 17:53:18 KST][31001.723033] [  150]     0   150   112034    69527   0       0             0 b2g
2) : <6>[2022-06-19 21:21:32 KST][43496.202922] [  150]     0   150   128515    84430   0       0             0 b2g
Sorry, fixed the column widths.
Reporter
Comment 5•11 years ago
(In reply to Jason Smith [:jsmith] from comment #1)
> Is this speculation or did you actually reproduce the crash?
Actually, the crash happened while we were running a Marionette script to test functionality on a Leo device.
We are now trying to reproduce it and analyzing it to figure out the root cause.
If we find additional information, we will attach it here.
Comment 6•11 years ago
I don't think the growth of b2g's RSS necessarily means there is a memory leak in b2g. What about the PSS?
Second, from the kernel log, this is a case where the oom-killer was invoked. If the PSS of b2g and the apps doesn't increase much compared to normal cases, checking for a memory leak or fragmentation in the kernel may be a possible direction.
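To make the RSS versus PSS distinction concrete: PSS charges each shared page to a process in proportion to how many processes map it, so memory shared with other processes counts fully toward RSS but only partly toward PSS. The sketch below sums the `Rss:` and `Pss:` fields from the text of `/proc/<pid>/smaps`. The sample mapping data is invented for illustration and is not taken from this bug's logs; `rss_pss_kb` is a hypothetical helper name.

```python
import re

# Matches the per-mapping "Rss:" and "Pss:" lines of /proc/<pid>/smaps.
FIELD = re.compile(r"^(Rss|Pss):\s+(\d+)\s+kB$", re.MULTILINE)

# Invented sample: one private mapping and one heavily shared library.
SAMPLE = """\
00008000-0001a000 r-xp 00000000 b3:15 1234 /system/b2g/b2g
Rss:                  72 kB
Pss:                  72 kB
40000000-40100000 r-xp 00000000 b3:15 5678 /system/lib/libc.so
Rss:                 400 kB
Pss:                  50 kB
"""


def rss_pss_kb(smaps_text):
    """Sum Rss and Pss (in kB) over all mappings in an smaps dump."""
    totals = {"Rss": 0, "Pss": 0}
    for name, kb in FIELD.findall(smaps_text):
        totals[name] += int(kb)
    return totals


if __name__ == "__main__":
    print(rss_pss_kb(SAMPLE))
```

Comparing the two totals across samples taken over time would show whether the growth is in memory private to b2g or in pages it shares with other processes.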
Comment 7•11 years ago
Gabriele can you keep an eye on this bug while Alan is going to be away for the week?
Flags: needinfo?(gsvelto)
Comment 8•11 years ago
(In reply to Wayne Chang [:wchang] from comment #7)
> Gabriele can you keep an eye on this bug while Alan is going to be away for
> the week?
Yes, I can have a look. Comment 5 suggests that this happens only when using Marionette. If that is the case, I know of a recent Marionette problem that looked like a leak but was in fact caused by the GC being delayed indefinitely, which effectively let b2g consume all memory. I'll try to dig out that bug and see if it's related.
Flags: needinfo?(gsvelto)
Comment 9•11 years ago
We previously had a similar problem that was fixed as part of bug 825802. The patch was uplifted to mozilla-b2g18 five months ago so you might be experiencing a different issue but it's probably worth checking.
Comment 10•11 years ago
We should try to grab an about:memory dump of the processes right before the crash happens; you can find instructions on how to do it in this tutorial:
https://wiki.mozilla.org/B2G/Debugging_OOMs
Comment 11•11 years ago
I'm using this script to get memory reports of b2g when the oom-killer is activated.
I register this script as a service and run the automated test via Marionette.
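The attached script itself is not reproduced in this thread. As a rough illustration of the idea only, the sketch below scans kernel log lines for the kernel's "invoked oom-killer" message and calls a handler for each hit. The function name `watch_for_oom`, the handler, and the sample log lines are all hypothetical; a real service would read the kernel log continuously and have the handler collect a memory report.

```python
def watch_for_oom(lines, on_oom):
    """Call on_oom(line) for each kernel log line that reports an oom-killer run.

    `lines` is any iterable of log lines; returns the number of hits.
    """
    hits = 0
    for line in lines:
        if "invoked oom-killer" in line:
            on_oom(line)
            hits += 1
    return hits


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    "<6>[43490.000000] some unrelated driver message",
    "<4>[43496.190000] b2g invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0",
    "<6>[43496.202695] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name",
]

if __name__ == "__main__":
    captured = []
    count = watch_for_oom(SAMPLE_LOG, captured.append)
    print(count, captured)
```

Keeping the trigger separate from the handler means the same loop can be tested against a saved log before being pointed at a live device.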
Comment 12•11 years ago
I made a service from the oom_trap.sh script that I attached earlier.
With that service running, the Marionette test that caused the modem crash was run repeatedly for over 5 hours.
The Leo device did not show a modem crash, but there were some signs of oom-killer activation.
So I think it's worth digging into.
According to the logs, there is a huge number of PNG images.
I don't think that's normal.
Anyone have idea for this?
Comment 13•11 years ago
(In reply to Changbin Park from comment #12)
> Anyone have idea for this?
Checking your memory report dumps, the image that shows up duplicated many times is "...", so this is the same problem we've been investigating as part of bug 851626.
I'll close this one as a duplicate and add a comment there to point to your logs; I'll also CC everybody else there. Thanks for the thorough testing!
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Updated•11 years ago
blocking-b2g: leo? → ---
Comment 14•11 years ago
Bug 851626 Comment 93
Before the Marionette test:
│ │ │ │ │ ├──1.03 MB (02.83%) -- huge
│ │ │ │ │ │ ├──0.19 MB (00.52%) ── string(length=23510, "...") [4]
│ │ │ │ │ │ ├──0.16 MB (00.43%) ── string(length=18514, "...") [4]
│ │ │ │ │ │ ├──0.16 MB (00.43%) ── string(length=20438, "...") [4]
│ │ │ │ │ │ ├──0.10 MB (00.27%) ── string(length=9114, "...") [5]
│ │ │ │ │ │ ├──0.09 MB (00.26%) ── string(length=10914, "...") [4]
│ │ │ │ │ │ ├──0.09 MB (00.26%) ── string(length=7074, "...") [6]
│ │ │ │ │ │ ├──0.08 MB (00.21%) ── string(length=8054, "...") [5]
│ │ │ │ │ │ ├──0.05 MB (00.13%) ── string(length=4654, "...") [4]
│ │ │ │ │ │ ├──0.04 MB (00.11%) ── string(length=2522, "...") [5]
│ │ │ │ │ │ ├──0.04 MB (00.11%) ── string(length=3062, "...") [5]
│ │ │ │ │ │ └──0.04 MB (00.11%) ── string(length=3926, "...") [5]
After the Marionette test:
│ │ │ │ │ ├──1.55 MB (03.72%) -- huge
│ │ │ │ │ │ ├──0.23 MB (00.56%) ── string(length=9114, "...") [12]
│ │ │ │ │ │ ├──0.20 MB (00.49%) ── string(length=7074, "...") [13]
│ │ │ │ │ │ ├──0.19 MB (00.45%) ── string(length=23510, "...") [4]
│ │ │ │ │ │ ├──0.19 MB (00.45%) ── string(length=8054, "...") [12]
│ │ │ │ │ │ ├──0.16 MB (00.38%) ── string(length=18514, "...") [4]
│ │ │ │ │ │ ├──0.16 MB (00.38%) ── string(length=20438, "...") [4]
│ │ │ │ │ │ ├──0.09 MB (00.23%) ── string(length=10914, "...") [4]
│ │ │ │ │ │ ├──0.09 MB (00.23%) ── string(length=2522, "...") [12]
│ │ │ │ │ │ ├──0.09 MB (00.23%) ── string(length=3062, "...") [12]
│ │ │ │ │ │ ├──0.09 MB (00.23%) ── string(length=3926, "...") [12]
│ │ │ │ │ │ └──0.05 MB (00.11%) ── string(length=4654, "...") [4]
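The two excerpts are easier to compare mechanically. The sketch below pulls the string length and the bracketed copy count out of each line and lists the entries whose count grew. It assumes the bracketed number is the number of copies and that, since the string contents are elided here, the length is enough to identify a string. `copies_by_length` and `grown` are hypothetical helper names, and only a few lines of each excerpt are included as sample input.

```python
import re

# Captures the size in MB, the string length, and the bracketed copy count.
ENTRY = re.compile(r'([\d.]+) MB .*string\(length=(\d+), "[^"]*"\) \[(\d+)\]')

BEFORE = '''
│ │ │ │ │ │ ├──0.19 MB (00.52%) ── string(length=23510, "...") [4]
│ │ │ │ │ │ ├──0.10 MB (00.27%) ── string(length=9114, "...") [5]
│ │ │ │ │ │ ├──0.09 MB (00.26%) ── string(length=7074, "...") [6]
'''

AFTER = '''
│ │ │ │ │ │ ├──0.23 MB (00.56%) ── string(length=9114, "...") [12]
│ │ │ │ │ │ ├──0.20 MB (00.49%) ── string(length=7074, "...") [13]
│ │ │ │ │ │ ├──0.19 MB (00.45%) ── string(length=23510, "...") [4]
'''


def copies_by_length(report):
    """Map string length -> copy count for each string entry in a report."""
    counts = {}
    for line in report.splitlines():
        m = ENTRY.search(line)
        if m:
            counts[int(m.group(2))] = int(m.group(3))
    return counts


def grown(before, after):
    """Return {length: (count_before, count_after)} for entries that grew."""
    return {
        length: (before.get(length, 0), n)
        for length, n in after.items()
        if n > before.get(length, 0)
    }


if __name__ == "__main__":
    diff = grown(copies_by_length(BEFORE), copies_by_length(AFTER))
    for length, (old, new) in sorted(diff.items()):
        print(f"length={length}: {old} -> {new} copies")
```

On these sample lines, the entries of length 9114 and 7074 grow while the entry of length 23510 stays at 4 copies, which matches the pattern visible in the full excerpts.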
Comment 15•11 years ago
Bug 851626 Comment 94
Running the gallery-camera stress test on an m-c/master build, the test actually completed, though the b2g parent process had ballooned to the point where only it and the Gallery app could fit in memory at the same time.
After 100 iterations, I see 205 copies of the Marketplace app icon, ~2 per iteration.
(jlebar, this is without your DOMRequest fixes--I'll run that test overnight.)
I know next to nothing about how marionette works, but I wonder how it could be very specifically leaking data: URI icons.
Updated•11 years ago
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Updated•11 years ago
blocking-b2g: --- → leo+
Whiteboard: [TD-59414]
Target Milestone: 1.1 QE3 (26jun) → 1.1 QE4 (15jul)
Comment 16•11 years ago
Icon duplicate issue might be caused by marionette.
This one is not reproduced by manual test.
Do you know who is a right person for this one?
Flags: needinfo?(tchung)
Comment 17•11 years ago
(In reply to jongsoo.oh from comment #16)
> Icon duplicate issue might be caused by marionette.
> This one is not reproduced by manual test.
> Do you know who is a right person for this one?
Marionette knowledge would be best addressed by jonathan griffin. jgriffin, any known issues?
Flags: needinfo?(tchung) → needinfo?(jgriffin)
Comment 18•11 years ago
(In reply to Tony Chung [:tchung] from comment #17)
> (In reply to jongsoo.oh from comment #16)
> > Icon duplicate issue might be caused by marionette.
> > This one is not reproduced by manual test.
> > Do you know who is a right person for this one?
>
> Marionette knowledge would be best addressed by jonathan griffin. jgriffin,
> any known issues?
More context: it seems bug 889261 is the actual issue that Leo is tracking. The first comment notes that they are using Marionette to launch apps.
Comment 19•11 years ago
That is wrong. Bug 885158 is not related to Marionette.
It uses orangutan because of the Marionette icon issue.
Comment 20•11 years ago
There aren't any known Marionette issues related to icon duplication. It's possible that there is a Marionette bug, or it's possible that there is a Gaia bug which only occurs when events are dispatched very quickly (as would happen during a Marionette test) and not when events happen more slowly as would happen during a manual test.
If you can show us the Marionette test you're using, we may be able to investigate.
Flags: needinfo?(jgriffin)
Updated•11 years ago
Whiteboard: [TD-59414] → [TD-59414][MemShrink]
Comment 21•11 years ago
The memory report here looks the same as bug 851626.
See bug 889990 for the analysis we've already done. It's not clear to me yet how this is Marionette's fault, but given that we only see this with Marionette, I think it's pretty likely.
We're leaking DOM application objects. If you do a gc/cc dump, you can see exactly how we're leaking them.
Comment 22•11 years ago
Leo, can you attach the Marionette script you're using to reproduce this issue, or describe what it's doing? I'd like to reproduce it, so I can attempt to fix it, if indeed it's Marionette-related.
Comment 23•11 years ago
If there's some leak when running Marionette tests involving manifests, it may be due to the gaiatest atoms at https://github.com/mozilla/gaia-ui-tests/blob/master/gaiatest/atoms/gaia_apps.js, but I'd need to know if you're using gaiatest or not.
Updated•11 years ago
Whiteboard: [TD-59414][MemShrink] → [TD-59414][MemShrink:P2]
Comment 24•11 years ago
We're marking this MemShrink P2 because we're not sure whether this is a marionette-only issue or not.
If this is caused by Marionette, we should leo- this bug. OTOH if it's not, we should probably make this a P1.
Like jgriffin said in comment 22, we need the test script Leo was running.
Reporter
Comment 25•11 years ago
I attach the Marionette test script we reproduce this issue.
Comment 26•11 years ago
(In reply to leo.bugzilla.gecko from comment #25)
> Created attachment 780177 [details]
> test_scripts.7z
>
> I attach the Marionette test script we reproduce this issue.
Thank you; is it also possible to attach the lglib files that these tests use, so I can locally reproduce the problem?
Also, which of the attached tests were you running? All of them in sequence?
Comment 27•11 years ago
If the only relevant leak here is the icon url leak, that's been isolated in bug 897684.
Reporter
Comment 28•11 years ago
I'm sorry, I thought you only needed the test scripts.
So I am attaching the lglib files for Marionette testing.
I am also attaching some scripts that we tested. These files, which are used from Git Bash, are numbered in order.
Comment 29•11 years ago
needinfo'ing jgriffin here to help understand whether this is a Marionette issue alone, given the lglib files in the above comment and keeping in mind comment 24.
Flags: needinfo?(jgriffin)
Comment 30•11 years ago
AFAIK, it isn't a Marionette issue; it's caused at least in part by bug 897684 (a gecko bug). Once that's fixed, we can determine if there are other leaks involved.
Flags: needinfo?(jgriffin)
Updated•11 years ago
Comment 31•11 years ago
QA Wanted for what purpose?
Probably don't need verify me here - the bug isn't fixed yet.
Keywords: verifyme
Comment 32•11 years ago
Talked with Preeti in person - don't need to do anything here yet for qawanted. When bug 889984 gets uplifted, a retest will be requested on this bug to see if it's fixed.
Keywords: qawanted
Comment 33•11 years ago
Bug 889984 will not be fixed for 1.1. Let's wait for bug 900221 to be uplifted instead. QA, please test whether fixing 900221 fixes this issue (886217) as well. Thanks!!
Whiteboard: [TD-59414][MemShrink:P2] → [TD-59414][MemShrink:P2] QARegressExclude
Updated•11 years ago
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
status-b2g18: --- → fixed
Updated•11 years ago