Closed Bug 454731 Opened 13 years ago Closed 11 years ago

pageloader.js should watch for oom

Categories

(Release Engineering :: General, defect, P5)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dougt, Assigned: aki)

References

Details

(Whiteboard: [talos])

Attachments

(3 files, 3 obsolete files)

Attached file pageloader.js
pageloader.js should watch for oom.  when oom happens, we should just stop the cycler and report back what output occurred up to the OOM event.

below is the modified pageloader.js i have tested.

the important bits are:

    // setup oom watcher:
    var os = Cc["@mozilla.org/observer-service;1"].getService(Ci.nsIObserverService);
    os.addObserver(gPageLoaderStopper, "memory-pressure", false);


and of course:

const gPageLoaderStopper = {
 observe: function(subject, topic, data)
 {
   dumpLine('page loader: running low on memory.  stopping...');
   cycle = NUM_CYCLES; // make sure it is done.  (zero based)

   plStop(false);
 },
};
this will block fennec talos runs (unless we change the page set)
Component: Release Engineering → Release Engineering: Talos
triaging to future for now. Should be able to deal with this in early q4.
Component: Release Engineering: Talos → Release Engineering: Future
Priority: -- → P3
Assignee: nobody → anodelman
Priority: P3 → P2
I'm currently able to run talos for 1 cycle (minus tjss currently) on an n810 without a custom pageloader, but with 128mb swap.  I haven't tried pushing to the point where I get oom errors...

I think this is a good to have, but it's looking like I can get talos numbers without it, if we end up running out of time.
Throwing back in the pool.
Assignee: anodelman → nobody
This isn't blocking my nokia talos work... removing dependency.
No longer blocks: 455755
We might want to revisit; it's highly possible that Tp4/Tp4 nochrome on Maemo are crashing regularly due to oom.  It would be nice to exit gracefully and get some post-mortem info before rebooting.

I'd be surprised if we can get to this this quarter though.
Blocks: maemo4
Mass move of bugs from Release Engineering:Future -> Release Engineering. See
http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.
Component: Release Engineering: Future → Release Engineering
Priority: P2 → P3
Whiteboard: [talos]
Blocks: mobile-pool
This may be a part of the get-n900s-and-n810s-green project, depending.
Priority: P3 → P5
This'll need testing in staging.  However, it might help us with Maemo Tp4 crashes.

Vlad, I'm under the impression you wrote the original pageloader; does this look good to you?
Attachment #444696 - Flags: review?(vladimir)
Comment on attachment 444696 [details] [diff] [review]
attaching dougt's pageloader as a diff

Dunno where the original pageloader that this was diffed against came from, but it's nowhere near what's current in trunk.  Can you post a new diff against whatever's in trunk?  You'll likely need the whole updated set of stuff in the pageloader module.
Attachment #444696 - Flags: review?(vladimir) → review-
Sorry, I checked out mozilla/testing/tools/pageloader instead of mozilla/layout/tools/pageloader previously.
Attachment #444696 - Attachment is obsolete: true
Attachment #444929 - Flags: review?(vladimir)
Comment on attachment 444929 [details] [diff] [review]
diffing against mozilla/layout/tools/pageloader/pageloader.js

That looks fine, except for the change in plRecordTime -- no need to get rid of the printing of the next page.
Attachment #444929 - Flags: review?(vladimir) → review+
Oh also -- 

+   dumpLine('page loader: running low on memory.  stopping...');

That should really be 'ERROR: page loader: ....'.  Not completing the full run is an error; the results aren't going to be useful.
Attached patch with vlad's corrections (obsolete) — Splinter Review
Updated patch with Vlad's corrections; carrying over r+.
Assignee: nobody → aki
Attachment #444929 - Attachment is obsolete: true
Attachment #444958 - Flags: review+
Whiteboard: [talos] → [talos][needs staging]
Attached file new pageloader.xpi
This should be copied to http://build.mozilla.org/talos/xpis/pageloader.xpi during the downtime.
Attachment #470801 - Attachment is patch: false
Attachment #470801 - Attachment mime type: text/plain → application/octet-stream
Attached patch hg-friendly diffSplinter Review
This is an hg-friendly diff, for ease of landing in http://hg.mozilla.org/build/pageloader .
Carrying forward Vlad's r+.

I ran this overnight in both mobile and Firefox desktop Talos staging; looks good.
Attachment #444958 - Attachment is obsolete: true
Attachment #470803 - Flags: review+
Blocks: 591055
Whiteboard: [talos][needs staging] → [talos]
Comment on attachment 470801 [details]
new pageloader.xpi

Landed this on build.m.o
Attachment #470801 - Flags: checked-in+
I am actually not sure we want this anymore.  Two years ago when I wrote the patch, it was important because we'd run out of memory (due to bloat, and leaks).  Running out of memory on the device would reboot it. :(

Today, if we run out of memory, devices don't reboot.  We are also alot better at leaks and bloat.

Can we remove this, or pref disable?

(Also, 2 years?  wow.  has it been that long!)
> I am actually not sure we want this anymore.  Two years ago when I wrote the
> patch, it was important because we'd run out of memory (due to bloat, and
> leaks).  Running out of memory on the device would reboot it. :(

This is not true on N900s?

> Today, if we run out of memory, devices don't reboot.  We are also alot better
> at leaks and bloat.

Actually, today if we finish the test run with any status (success, failure, warning) we reboot. In the case of the N900, we also reformat the partition the test was run on at boot. How do we tell what the cause of, say, a Tp4 failure is?  With this landed, we can at least rule out oom.

Also, does this patch negatively affect anything?

> Can we remove this, or pref disable?
 
Is there a reason you waited until after the work was complete to request this? There is a decent amount of activity on the bug this calendar year [1], and as the filer, you should have been CC'ed.

> (Also, 2 years?  wow.  has it been that long!)

Yes.
We're making a concerted effort to clean out our pre-2010 bugs this quarter.
I checked, out of curiosity; it looks like Fennec has 335 bugs opened before January 1 [2].

[1] https://bugzilla.mozilla.org/show_activity.cgi?id=454731
[2] http://bit.ly/cOlj43
> This is not true on N900s?

I don't think so.

> Is there a reason you waited until after the work was complete to request this?

After two years, I basically gave up and forgot about it.  I should have followed closer, clearly.
oh, and... 

We probably can just leave this in and see if it hurts anything.  If we do get OOM notifications, please file bugs and we'll fix fennec.
Will do.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.