Closed Bug 880940 Opened 6 years ago Closed 6 years ago

[unagi weekly build 13.05.02]run unagi/tara monkeytest about 12 hours, b2g memory leak, homescreen can't launch

Categories

(Firefox OS Graveyard :: General, defect, critical)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

(blocking-b2g:-)

RESOLVED WORKSFORME
1.1 QE4 (15jul)
blocking-b2g -

People

(Reporter: james.zhang, Assigned: mccr8)

References

Details

(Whiteboard: [MemShrink:P2])

Attachments

(3 files)

Attached file about memory 0
Hi Justin,
This build includes the Bug 856080 patch. After running the monkey test for 12 hours, the homescreen can't launch.
Attached file about memory 1
Hi Justin, can you point me to any documentation or wiki page about how to analyze memory leak issues?
I'm not sure there's any documentation, unfortunately.  You just have to look at a number of them so you can tell what is unusual.  You can load memory-reports in a regular desktop browser by going to about:memory and clicking on "load" to load the file.

In about-memory-0, heap-unclassified is quite large:
├──26.92 MB (37.10%) ── heap-unclassified

The wifi worker is also pretty big:
│   ├──7.20 MB (09.93%) -- worker(resource://gre/modules/wifi_worker.js, 0x44544000)

explicit is only 72 MB, which doesn't seem like an unusually large amount?

virtual size is 225 MB, which seems pretty high.  This line is the largest:
│  ├──158.36 MB (70.27%) ── anonymous, outside brk() [rw-p] [119]

Nothing else seems too high to me, though I'm not very used to looking at B2G reports and its tiny memories.
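Reading these dumps mostly comes down to comparing each node's size and percentage against its parent. As a minimal sketch, assuming the text layout quoted above (`<size> MB (<percent>%) ── <name>`, with `--` marking subtrees), a hypothetical `parse_line` helper can pull those numbers out, and the percentage lets you back out the parent's total:

```python
import re

# Hypothetical helper for one line of an about:memory text dump, e.g.
#   "├──26.92 MB (37.10%) ── heap-unclassified"
# Assumes the "<size> MB (<pct>%) ── <name>" shape quoted in this bug;
# "──" marks a leaf and "--" a subtree. Other dumps may use other units.
LINE_RE = re.compile(
    r"(?P<mb>[\d,]+\.\d+) MB \((?P<pct>\d+\.\d+)%\) (?:──|--) (?P<name>.+)$"
)

def parse_line(line):
    """Return (megabytes, percent, name), or None if the line doesn't match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    mb = float(m.group("mb").replace(",", ""))
    return mb, float(m.group("pct")), m.group("name").strip()

def implied_total(mb, pct):
    """Size of the parent tree implied by a child's size and its percentage."""
    return mb / (pct / 100.0)

if __name__ == "__main__":
    mb, pct, name = parse_line("├──26.92 MB (37.10%) ── heap-unclassified")
    print(name, round(implied_total(mb, pct), 1))
```

Applied to the heap-unclassified line, 26.92 MB at 37.10% implies an explicit total of roughly 72.6 MB, consistent with the 72 MB figure above.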
Assignee: nobody → continuation
Did you mean to assign this to yourself Andrew?
Flags: needinfo?(continuation)
Depends on: 870588
Yes, I was going to analyze the log a bit.  If somebody else who is more experienced with these wants to look at it, that's fine too.
Flags: needinfo?(continuation)
There are about 6.5 MB of heap-unclassified from IPC under mozilla::dom::TabParent::SendRealTouchEvent, which seems bad.  I've seen this before, too; it is filed under bug 870588.

There's also 1.8 MB of Pickle stuff under mozilla::system::nsVolumeService::BroadcastVolume, which looks similar.
Whiteboard: [MemShrink]
I'd like to confirm that the homescreen is failing to launch because of OOM, since the memory usage of the main process, although high, is not ridiculous.  We may be hitting something else that's causing the homescreen to crash on startup.

To check for OOM, try to load the homescreen, and then run |adb shell dmesg|.  You should see something about sending SIGKILL to the homescreen process.  Please attach the output of |adb shell dmesg| and |adb logcat -v time| to the bug.
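To scan the dmesg output for such kills, here is a minimal sketch. It assumes the Android low-memory killer's log wording `send sigkill to <pid> (<name>), ...` and a process name of `Homescreen`; both are assumptions that may not match every kernel or build, and the helper names are hypothetical.

```python
import re

# Assumption: the Android low-memory killer in kernels of this era logs lines like
#   "send sigkill to 1234 (Homescreen), adj 2, size 5678"
# Exact wording varies by kernel version, so treat this pattern as a starting point.
SIGKILL_RE = re.compile(r"send sigkill to (\d+) \(([^)]*)\)", re.IGNORECASE)

def find_lmk_kills(dmesg_text):
    """Return (pid, process_name) pairs for every low-memory-killer SIGKILL line."""
    return [(int(pid), name) for pid, name in SIGKILL_RE.findall(dmesg_text)]

def homescreen_was_oom_killed(dmesg_text, name="Homescreen"):
    """True if any low-memory-killer SIGKILL line names the given process."""
    return any(proc == name for _, proc in find_lmk_kills(dmesg_text))
```

Save the output with `adb shell dmesg > dmesg.txt` and pass the file's contents to `homescreen_was_oom_killed`. The kernel truncates process names to 15 characters, so longer app names may need a prefix match instead.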
Whiteboard: [MemShrink]
Hi Justin,

    I'll run the monkey test with the bug 885158 patch to verify.
(In reply to James Zhang from comment #9)
> Hi Justin,
> 
>     I'll run monkey test with bug 885158 patch to verify.

Thanks!  Note that bug 885158 may not fix your whole problem there (in fact, it probably will not), but the DMD report shouldn't include a ton of pickled data anymore.
Hi Justin,
   Attached is a new memory leak log, including about:memory, adb logcat, adb dmesg, a snapshot, and a bugreport.
   We ran the monkey test for about 48 hours with the Bug 885158 patch. The homescreen process exists, but no icons are displayed.
Hmm, the patch in bug 885158 does not seem to have improved the situation.  There's 25 MB of stuff with Pickle::WriteBytes in the stack...
(In reply to Andrew McCreight [:mccr8] from comment #12)
> Hmm, the patch in bug 885158 does not seem to have improved the situation. 
> There's 25mb of stuff with Pickle::WriteBytes in the stack...

Yes.

Unreported: ~3,821 blocks from ~36 stack trace records in stack frame record 7 of 3,123
 ~15,995,792 bytes (~15,901,136 requested / ~94,656 slop)
 28.50% of the heap;  0.37% of unreported
 PC is
   Pickle::Resize(unsigned int) /media/500G_disk/sp8810eabase_512x256_hvga/B2G/gecko/ipc/chromium/src/base/pickle.cc:531 (0x413e9546 libxul.so+0xab3546)

Unreported: ~3,785 blocks from ~26 stack trace records in stack frame record 8 of 3,123
 ~15,848,444 bytes (~15,753,788 requested / ~94,656 slop)
 28.24% of the heap;  0.37% of unreported
 PC is
   Pickle::WriteBytes(void const*, int, unsigned int) /media/500G_disk/sp8810eabase_512x256_hvga/B2G/gecko/ipc/chromium/src/base/pickle.cc:453 (0x413e97fa libxul.so+0xab37fa)

Unreported: ~3,785 blocks from ~26 stack trace records in stack frame record 9 of 3,123
 ~15,848,444 bytes (~15,753,788 requested / ~94,656 slop)
 28.24% of the heap;  0.37% of unreported
 PC is
   Pickle::BeginWrite(unsigned int, unsigned int) /media/500G_disk/sp8810eabase_512x256_hvga/B2G/gecko/ipc/chromium/src/base/pickle.cc:420 (0x413e97be libxul.so+0xab37be)
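The three records above describe nearly the same bytes: DMD stack frame records appear to group blocks by a single frame's PC, so a block allocated through WriteBytes, then BeginWrite, then Resize is counted under each of those frames. A minimal sketch that parses the size lines (hypothetical `parse_size` helper, format taken from the output above) and checks that usable size equals requested bytes plus slop:

```python
import re

# Parses the size line of a DMD stack frame record, e.g.
#   " ~15,995,792 bytes (~15,901,136 requested / ~94,656 slop)"
# The format is taken from the output quoted in this bug; other DMD versions may differ.
SIZE_RE = re.compile(
    r"~?([\d,]+) bytes \(~?([\d,]+) requested / ~?([\d,]+) slop\)"
)

def parse_size(line):
    """Return (usable, requested, slop) in bytes."""
    m = SIZE_RE.search(line)
    if m is None:
        raise ValueError("not a DMD size line: %r" % line)
    return tuple(int(g.replace(",", "")) for g in m.groups())

records = {
    "Pickle::Resize": " ~15,995,792 bytes (~15,901,136 requested / ~94,656 slop)",
    "Pickle::WriteBytes": " ~15,848,444 bytes (~15,753,788 requested / ~94,656 slop)",
    "Pickle::BeginWrite": " ~15,848,444 bytes (~15,753,788 requested / ~94,656 slop)",
}

for frame, line in records.items():
    usable, requested, slop = parse_size(line)
    # Usable size is the requested bytes plus allocator slop.
    assert usable == requested + slop
    print(frame, round(usable / 2**20, 1), "MiB")
```

Because the records overlap, their percentages (28.50%, 28.24%, 28.24%) should not be added; the largest single record, about 15.3 MiB here, is a better estimate of the Pickle allocations.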
Okay, we obviously need a better tool here than DMD.  We're looking at resurrecting the tool in bug 704240; that might help us.

If it's any consolation, I'm not aware of us seeing this leak in production, so perhaps it's an artifact of the monkey test somehow.
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P2]
Duplicate of this bug: 889261
blocking-b2g: --- → leo+
Target Milestone: --- → 1.1 QE4 (15jul)
Fabrice, this has been identified as an extremely urgent bug.  I don't think it's going to be completely fixed by the pickle issue.  We need to figure out the DOMApplication object issue posthaste.
Depends on: 893012
Bug 893172 just landed. Do we need to re-run tests here?
(In reply to Alex Keybl [:akeybl] from comment #17)
> Bug 893172 just landed. Do we need to re-run tests here?

Someone needs to look at the memory report again to see if it's mostly heap-unclassified (possibly fixed by bug 893172), or mostly strings (likely not fixed by bug 893172, needs bug 889990).

Just download the attachment, open about:memory, load the file, and look under the main process.
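The same check can be scripted against a saved report. This is a minimal sketch, assuming the saved file is gzipped JSON with a `reports` list whose entries carry `process`, `path`, `units` and `amount` fields, that byte-sized reports have `units == 0`, and that the main process name starts with `Main Process`; all of these are assumptions to verify against your own file, and the helper names are hypothetical.

```python
import gzip
import json

# Assumed value of the "units" field for byte-sized reports.
UNITS_BYTES = 0

def load_reports(filename):
    """Load the "reports" list from a gzipped JSON memory report (assumed layout)."""
    with gzip.open(filename, "rt", encoding="utf-8") as f:
        return json.load(f)["reports"]

def main_process_breakdown(reports, process_prefix="Main Process"):
    """Return (explicit, heap_unclassified, strings) byte totals for one process.

    "strings" counts every explicit/ path containing "/strings" (a hypothetical
    heuristic; real string paths vary between Gecko versions).
    """
    explicit = heap_unclassified = strings = 0
    for r in reports:
        if not r["process"].startswith(process_prefix) or r["units"] != UNITS_BYTES:
            continue
        path = r["path"]
        if not path.startswith("explicit/"):
            continue  # skip non-explicit measurements such as "resident"
        explicit += r["amount"]
        if path == "explicit/heap-unclassified":
            heap_unclassified += r["amount"]
        elif "/strings" in path:
            strings += r["amount"]
    return explicit, heap_unclassified, strings
```

The heap-unclassified share is `heap_unclassified / explicit`; if that ratio dominates and `strings` is small, the report points at the heap-unclassified case rather than the strings case.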
We should probably wait until bug 893012 and dependencies are fixed.
This might be worth testing now.
Well, okay, I'll do what comment 18 suggests first.
Status: UNCONFIRMED → NEW
Ever confirmed: true
48% (45 MB) of the heap in the 6-24 attachment is heap-unclassified, and there aren't many strings now, so I think that, in light of bug 893012 being closed, this would be worth retesting.
James, can you please help re-run the test as comment #22 suggests?
Flags: needinfo?(james.zhang)
(In reply to bhavana bajaj [:bajaj] from comment #23)
> James, can you please help with the re-run of the test run as comment #22
> suggests?

Hi bajaj,

   We're working on the gecko master branch (FFOS v1.2) now. Can you give me the commit revision for this fix? I can test it on the master branch.
Flags: needinfo?(james.zhang)
(In reply to James Zhang from comment #24)
> (In reply to bhavana bajaj [:bajaj] from comment #23)
> > James, can you please help with the re-run of the test run as comment #22
> > suggests?
> 
> Hi bajaj,
> 
>    We're working on gecko master branch (FFOS v1.2) now, can you give me
> this commit revision? I can test it on master branch.

It looks like https://bugzilla.mozilla.org/show_bug.cgi?id=893172#c21 landed a month ago; I would recommend trying the latest build to see if the issue is fixed.
James,

Did you get a chance to test on the suggested build?
Flags: needinfo?(james.zhang)
(In reply to Preeti Raghunath(:Preeti) from comment #26)
> James,
> 
> Did you get a chance to test on the suggested build?

No, the latest build always crashes because the camera and video don't work right now.
The monkey test needs a stable build that can run for over 12 hours.
Flags: needinfo?(james.zhang)
QA,

Please check if this issue has been fixed on the suggested build
Keywords: qawanted
(In reply to Preeti Raghunath(:Preeti) from comment #28)
> QA,
> 
> Please check if this issue has been fixed on the suggested build

Can't do that. We don't do monkey tests within QA. You need to follow up with partners here.
Keywords: qawanted
James,

Please check with the latest build, using your own setup to run the monkey tests; Mozilla will not be running them. Please re-nominate based on the test results.
blocking-b2g: leo+ → -
Flags: needinfo?(james.zhang)
Hi Ying,

Please run the monkey test on the latest build today to check this issue. Thanks.
Flags: needinfo?(james.zhang)
Partners never responded.  Can we close this?
We have run the monkey test on a fugu device with the v1.2 branch for a few days.
So far we have not found any memory leak problems.
Please close this bug.
Sorry for the delayed response.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME