Closed Bug 726002 Opened 12 years ago Closed 12 years ago

Potential OOM in new UrlClassifier.

Categories

(Toolkit :: Safe Browsing, defect)

13 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 17

People

(Reporter: gcp, Assigned: gcp)

Details

Attachments

(4 files, 4 obsolete files)

The new code from bug 673470 has a path that should've used a fallible array but didn't. Patch attached.

After Bug 719531 landed, I now see URLCLASSIFIER_OOM Telemetry hits. So we're still OOMing here in some cases. It's too bad not crashing here makes us unable to use the extra information Bug 720444 or Bug 716638 would bring. This is a top 20 crasher in 10.0 so I would've liked to understand the underlying issue. 

Taras, does it make sense to add Telemetry for what Bug 720444 gathers? We could then look at the Telemetry submissions with OOM=1 and see if they are really low on memory. I think reordering the Telemetry recording (done in the patch) before the memory allocation will give us information on the allocation size similar to bug 716638 does.
Attachment #596051 - Flags: review?(dcamp)
Attachment #596051 - Flags: feedback?(justin.lebar+bug)
Assignee: nobody → gpascutto
> Taras, does it make sense to add Telemetry for what Bug 720444 gathers?

Problem is, if we collect this data when telemetry ping runs, it's not necessarily going to reflect the state of things when we ran out of memory here.

Why don't we, in a separate bug, temporarily turn all the relevant fallible TArrays into infallible arrays and then look at the resulting crash reports?
>Problem is, if we collect this data when telemetry ping runs, it's not necessarily 
>going to reflect the state of things when we ran out of memory here.

Good point.

What about making the OOM not a boolean, but transmit the free-RAM value instead? (I'd guess that will confuse our backend right now, so it'd have to be renamed)

>Why don't we, in a separate bug, temporarily turn all the relevant fallible 
>TArrays into infallible arrays and then look at the resulting crash reports?

There aren't that many OOMs (I see 4 pings so far, which might have been from 1 person for all we know), so we might have to do this on Aurora, and maybe even Beta, then back out again.

That's not very pleasant.
> What about making the OOM not a boolean, but transmit the free-RAM value instead? (I'd guess that 
> will confuse our backend right now, so it'd have to be renamed)

We could do that, but we'd lose the stack trace and data on exactly which TArray operation was causing the failure.  (I guess you could add one histogram for each possible failure point.)

I guess we could send the available-commit-space number and investigate further only if it's abnormally high...
Comment on attachment 596051 [details] [diff] [review]
Patch 4. Fix a potential OOM in the new code. Reorder telemetry.

This looks sane, although I really don't know this code.  Note that mCompletions (used in LookupCache::Build) is an *infallible* TArray.

gcp and I discussed on IRC that it might be possible to reduce the working memory of LookupCache::Build by destroying aAddCompletes immediately after constructing mCompletions.

Indeed, you could go further than that and free aAddCompletes *while* you build mCompletions.  Add an item, remove the corresponding item, and periodically call TArray::Compact().  You could do the same thing in ConstructPrefixSet.
Attachment #596051 - Flags: feedback?(justin.lebar+bug) → feedback+
Attachment #596051 - Flags: review?(dcamp) → review+
This basically implements the easiest part of jlebar's suggestion. And I found another place where we keep a temporary big array longer than needed.

Will deal with OOM telemetry in a separate bug.

https://tbpl.mozilla.org/?tree=Try&rev=3fadf2578425
Attachment #596341 - Flags: review?(dcamp)
Attachment #596341 - Flags: review?(dcamp) → review+
Comment on attachment 596051 [details] [diff] [review]
Patch 4. Fix a potential OOM in the new code. Reorder telemetry.

Funnily enough, nsTArray::SetCapacity returns a boolean. Please backout.
Attachment #596051 - Attachment is patch: true
Attachment #596051 - Flags: review-
This is a trivial fix - let's do a followup instead.
Attachment #596667 - Flags: review?(bugs)
It'll be better if the fix actually compiles.

Previous patches already landed in inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e0e2cc5570ac
https://hg.mozilla.org/integration/mozilla-inbound/rev/91eed468cff8
Attachment #596667 - Attachment is obsolete: true
Attachment #596667 - Flags: review?(bugs)
Attachment #596669 - Flags: review?(bugs)
Attachment #596669 - Flags: review?(bugs) → review+
Are you landing or not?
The second fix also doesn't compile (I'm glad I checked before landing...). I'm going to back out.

(error_bailout assumes rv contains the error value, so the SetCapacity check should set it, that's *another* error)
Attachment #596051 - Attachment is obsolete: true
Attachment #596341 - Attachment is obsolete: true
Attachment #596669 - Attachment is obsolete: true
Attachment #596697 - Flags: review?(dcamp)
https://tbpl.mozilla.org/?tree=Try&rev=b25710891998

I also did some manual checking that this doesn't break the protection. Our testsuite doesn't actually exercise the "update online from Google" path, and I think the first patch here actually did break that. Good that it was caught...
Attachment #596699 - Flags: review?(dcamp)
Attachment #596697 - Flags: review?(dcamp) → review+
Attachment #596699 - Flags: review?(dcamp) → review+
https://hg.mozilla.org/mozilla-central/rev/a010dcf1a973
https://hg.mozilla.org/mozilla-central/rev/35bf0d62cc30
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 13
[Approval Request Comment]
Backout due to bug 744993:

a01cf079ee0b Bug 730247
1a6d008acb4f Bug 729928
f8bf3795b851 Bug 729640
35bf0d62cc30 Bug 726002
a010dcf1a973 Bug 726002
e9291f227d63 Bug 725597
db52b4916cde Bug 673470
173f90d397a8 Bug 673470
Attachment #616611 - Flags: approval-mozilla-central?
Attachment #616611 - Flags: approval-mozilla-aurora?
Attachment #616612 - Flags: approval-mozilla-central?
Attachment #616612 - Flags: approval-mozilla-aurora?
Attachment #616611 - Flags: approval-mozilla-central? → approval-mozilla-central+
Attachment #616612 - Flags: approval-mozilla-central? → approval-mozilla-central+
Attachment #616611 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Attachment #616612 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Comment on attachment 616611 [details] [diff] [review]
Patch. Backout part 1

Sorry - thought this bug was reopened because the backout was backed out. My mistake. Approved for Aurora 13 (or Beta 13 if the merge occurs before we land).
Attachment #616611 - Flags: approval-mozilla-aurora- → approval-mozilla-aurora+
Attachment #616612 - Flags: approval-mozilla-aurora- → approval-mozilla-aurora+
https://hg.mozilla.org/mozilla-central/rev/3dbf00bb0d70
https://hg.mozilla.org/mozilla-central/rev/aa57f7757c97
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
Target Milestone: Firefox 13 → Firefox 17
Product: Firefox → Toolkit
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: