Closed Bug 1001206 Opened 6 years ago Closed 6 years ago

OOM Logger should have some additional regexps

Categories

(Core :: Hardware Abstraction Layer (HAL), defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: dhylands, Assigned: ahameez)

References

Details

(Whiteboard: [good first bug][mentor=dhylands][lang=C++])

Attachments

(1 file, 3 obsolete files)

Currently the oom logger (found in GonkHal.cpp) looks for kernel message from this kernel function:
http://lxr.linux.no/#linux+v3.4/drivers/staging/android/lowmemorykiller.c#L66

Apparently, some phones also produce messages from this file as well:
http://lxr.linux.no/#linux+v3.4/mm/oom_kill.c

So we should add regexps to cover the messages produced from oom_kill.c
See Also: → 939372
Whiteboard: [good first bug][mentor=dhylands][lang=C++]
I have added the required regular expressions in the patch.
Looking for feedback, thanks.
Comment on attachment 8421054 [details] [diff] [review]
Patch including additional regexp for OOM Logger

Review of attachment 8421054 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. When looking for feedback, you should put a f? on the patch, and ideally fill in someones name (you can also generally use : followed by an IRC nick. Bugzilla will fill in an email for IRC nicks it knows about). In this case, since I volunteered to be a mentor for this bug, it's appropriate to ask for feedback from me.

I (and I'm guessing most devs here) have filters setup to put feedback & review requests into special folders so that I don't otherwise miss them.

::: hal/gonk/GonkHal.cpp
@@ +1110,5 @@
>    const char* const regexes_raw[] = {
>      ".*select.*to kill.*",
>      ".*send sigkill to.*",
>      ".*lowmem_shrink.*, return",
> +    ".*lowmem_shrink.*, ofree.*",

Let's combine a bunch of these. Rather than making the regexps really specific, lets try to make them more generic. In the worst case, this means that we log a little bit too much, but as a general guideline this will help us to adapt to minor changes made in the kernel.

So, for example, I would log anything with lowmem_shrink
And I'd probably go so far as to say lets log any lines which has "[Oo]ut of memory"

@@ +1117,5 @@
> +    "Out of memory (oom_kill_allocating_task)",
> +    "Memory cgroup out of memory",
> +    "Out of memory: (compulsory|system-wide) panic_on_oom is enabled",
> +    "Kill process [0-9]* \((.*)\) sharing same memory",
> +    "Killed process [0-9]* \((.*)\) total-vm:[0-9]*kB, anon-rss:[0-9]*kB, file-rss:[0-9]*kB",

Lets log all messages of the form "Kill process" and/or "Killed process"

@@ +1119,5 @@
> +    "Out of memory: (compulsory|system-wide) panic_on_oom is enabled",
> +    "Kill process [0-9]* \((.*)\) sharing same memory",
> +    "Killed process [0-9]* \((.*)\) total-vm:[0-9]*kB, anon-rss:[0-9]*kB, file-rss:[0-9]*kB",
> +    ".*: Kill process [0-9]* \((.*)\) score -?[0-9]* or sacrifice child",
> +    ".* invoked oom-killer: gfp_mask=0x[0-9a-d]*, order=-?[0-9]*, oom_adj=-?[0-9]*, oom_score_adj=-?[0-9]*",

Lets log any line which has "oom-killer" in it.

@@ +1120,5 @@
> +    "Kill process [0-9]* \((.*)\) sharing same memory",
> +    "Killed process [0-9]* \((.*)\) total-vm:[0-9]*kB, anon-rss:[0-9]*kB, file-rss:[0-9]*kB",
> +    ".*: Kill process [0-9]* \((.*)\) score -?[0-9]* or sacrifice child",
> +    ".* invoked oom-killer: gfp_mask=0x[0-9a-d]*, order=-?[0-9]*, oom_adj=-?[0-9]*, oom_score_adj=-?[0-9]*",
> +    "\[-?[0-9]*\]-?[0-9]*-?[0-9]*[0-9]*[0-9]*[0-9]*-?[0-9]*-?[0-9]*.*"

I think that this regexp needs a comment, i.e. include the type of line you're trying to grab.

@@ +1121,5 @@
> +    "Killed process [0-9]* \((.*)\) total-vm:[0-9]*kB, anon-rss:[0-9]*kB, file-rss:[0-9]*kB",
> +    ".*: Kill process [0-9]* \((.*)\) score -?[0-9]* or sacrifice child",
> +    ".* invoked oom-killer: gfp_mask=0x[0-9a-d]*, order=-?[0-9]*, oom_adj=-?[0-9]*, oom_score_adj=-?[0-9]*",
> +    "\[-?[0-9]*\]-?[0-9]*-?[0-9]*[0-9]*[0-9]*[0-9]*-?[0-9]*-?[0-9]*.*"
> +    "[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name"};

I'm guessing that this is a title line for the previous output.

Let's group these (i.e. put a comment line that mentions that the following 2 regexps are grabbing the output of dump_task, , the title line regexp, and then the body line regexp.
And also, you should probably put yourself down as the assignee. If you don't have enough priviledge to do that, then ask. I'll go ahead and assign you to the bug.

I'm guilty of forgetting to do this for bugs I work on as well :)
Assignee: nobody → althaf.mozilla
Thanks for the great feedback :) 
I have updated the patch file with the changes requested.
Attachment #8421054 - Attachment is obsolete: true
Attachment #8421401 - Flags: review?(dhylands)
Comment on attachment 8421401 [details] [diff] [review]
updated Patch including additional regexp for OOM Logger

Review of attachment 8421401 [details] [diff] [review]:
-----------------------------------------------------------------

I decided to try and test this and see how well it works. I think that that script should be correct, but feel free to correct it if I made a mistake. I tested on a flame, but it shouldn't matter.

Script:
----- Start of script -----
#!/bin/bash

klog() {
   adb shell "echo '$1' > /dev/kmsg"
}

klog 'select to kill'
klog 'send sigkill to'
klog 'lowmem_shrink'
klog 'out of memory'
klog 'Out of memory'
klog 'Killed Process'
klog 'Kill process'
klog 'kill Process'
klog 'oom-killer'
klog '[ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name'
klog '[    1]     2     3      4         5    6      7       8        foobar'
klog '[    1]     2     3      4         5    6     -7       8        foobar'
klog '[    1]     2     3      4         5    6      7      -8        foobar'
----- End of script -----


I ran that on the host, and then launched and killed an app (like clock) which triggers the oom logger. What I saw in logcat was the following:

11:13:10.881 326/326                 OomLogger  E   [Kill]: select to kill
11:13:10.881 326/326                 OomLogger  E   [Kill]: send sigkill to
11:13:10.881 326/326                 OomLogger  E   [Kill]: lowmem_shrink
11:13:10.881 326/326                 OomLogger  E   [Kill]: out of memory
11:13:10.881 326/326                 OomLogger  E   [Kill]: Out of memory
11:13:10.891 326/326                 OomLogger  E   [Kill]: oom-killer

Looking at the documentation for regcomp 
http://linux.die.net/man/3/regcomp

it suggests that the "POSIX Basic Regular Expression syntax is used" which appears to be this: http://www.regular-expressions.info/posix.html

So it appears that ? is only supported as part of the "POSIX Extended Regular Expression syntax". So you'll either need to rewrite the regexps to use the basic syntax or try using REG_EXTENDED flag.

So I've r-'d the patch since it doesn't quite work.
Attachment #8421401 - Flags: review?(dhylands) → review-
Thanks for the test, I didn't realize which syntax of regular expression was being used. I'll attempt to fix the patch and get back to you.
So I updated it using the POSIX Basic syntax.
Luckily I could test it since grep uses the same syntax.
I think it should work fine now.
Attachment #8421401 - Attachment is obsolete: true
Attachment #8422785 - Flags: review?(dhylands)
Comment on attachment 8422785 [details] [diff] [review]
updated using POSIX Basic regex syntax

Review of attachment 8422785 [details] [diff] [review]:
-----------------------------------------------------------------

Much better, but still not quite right.

When I run the test script, I get everything but the title line (in particular, this line:
> [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name

If you can't figure it out then here's a hint: http://hastebin.com/quzazotoce.vala

For your next submission, please add r=dhylands to the bug description (in the patch - use "hg qref -e" to update the description).

::: hal/gonk/GonkHal.cpp
@@ +1118,5 @@
> +    // The regexes below are for the output of dump_task from oom_kill.c
> +    // 1st - title 2nd - body lines (8 ints and a string)
> +    // oom_adj and oom_score_adj can be negative
> +    "\[ pid \]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name",
> +    "\[.*[0-9][0-9]*\][ ]*[0-9][0-9]*[ ]*[0-9][0-9]*[ ]*[0-9][0-9]*[ ]*[0-9][0-9]*[ ]*[0-9][0-9]*[ ]*.[0-9][0-9]*[ ]*.[0-9][0-9]*.*"};

nit: put the closing curly brace on the next line.
Attachment #8422785 - Flags: review?(dhylands)
Attachment #8422785 - Attachment is obsolete: true
Attachment #8424140 - Flags: review?(dhylands)
Comment on attachment 8424140 [details] [diff] [review]
bug1001206_additionalregex.diff

Review of attachment 8424140 [details] [diff] [review]:
-----------------------------------------------------------------

Excellent - the test script passes.
Attachment #8424140 - Flags: review?(dhylands) → review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/c5d95022f856
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
You need to log in before you can comment on or make changes to this bug.