Potential OOM in new UrlClassifier.

RESOLVED FIXED in Firefox 17

Status

Product: Toolkit
Component: Safe Browsing
Status: RESOLVED FIXED
Opened: 6 years ago
Updated: 3 years ago

People

(Reporter: gcp, Assigned: gcp)

Tracking

Version: 13 Branch
Target Milestone: Firefox 17
Points: ---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 4 obsolete attachments)

(Assignee)

Description

6 years ago
Created attachment 596051 [details] [diff] [review]
Patch 4. Fix a potential OOM in the new code. Reorder telemetry.

The new code from bug 673470 has a path that should've used a fallible array but didn't. Patch attached.
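
For readers unfamiliar with the distinction, here is a minimal sketch in standard C++, not the Gecko code. std::vector stands in for nsTArray, and TryReserve and BuildPrefixList are hypothetical names. The idea is that a fallible path reports allocation failure to the caller, who has to check the result, instead of letting the failure take down the process.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdexcept>
#include <vector>

// Hypothetical helper: attempt to reserve space and report failure to the
// caller instead of letting the exception terminate the program.
// std::vector is only a stand-in for a fallible array type here.
template <typename T>
bool TryReserve(std::vector<T>& aArray, std::size_t aCount) {
  try {
    aArray.reserve(aCount);
  } catch (const std::bad_alloc&) {
    return false;  // The allocator could not satisfy the request.
  } catch (const std::length_error&) {
    return false;  // The request exceeds what the container can ever hold.
  }
  return true;
}

// Callers must check the result and unwind cleanly on failure.
bool BuildPrefixList(std::vector<std::uint32_t>& aOut, std::size_t aCount) {
  if (!TryReserve(aOut, aCount)) {
    return false;
  }
  assert(aOut.capacity() >= aCount);
  for (std::size_t i = 0; i < aCount; ++i) {
    aOut.push_back(static_cast<std::uint32_t>(i));
  }
  return true;
}
```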

After Bug 719531 landed, I now see URLCLASSIFIER_OOM Telemetry hits, so we're still OOMing here in some cases. Unfortunately, because we don't crash here, we can't use the extra information that Bug 720444 or Bug 716638 would bring. This is a top-20 crasher in 10.0, so I would've liked to understand the underlying issue.

Taras, does it make sense to add Telemetry for what Bug 720444 gathers? We could then look at the Telemetry submissions with OOM=1 and see if they are really low on memory. I think moving the Telemetry recording before the memory allocation (done in the patch) will give us information on the allocation size, similar to what bug 716638 does.
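
The reordering idea can be sketched as follows. This is an illustration under assumptions, not the patch itself: SizeHistogram is a hypothetical stand-in for a Telemetry histogram, ReserveWithTelemetry is a made-up name, and std::vector stands in for the array being sized. The point is only the ordering: the requested size is recorded before the allocation is attempted, so it is available even when the allocation fails.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a telemetry histogram: it just remembers samples.
struct SizeHistogram {
  std::vector<std::size_t> samples;
  void Accumulate(std::size_t aValue) { samples.push_back(aValue); }
};

// Record the requested element count first, then attempt the allocation.
// If the allocation fails, the size that triggered the failure has already
// been recorded, and the OOM flag is set alongside it.
bool ReserveWithTelemetry(std::vector<std::uint32_t>& aArray,
                          std::size_t aCount,
                          SizeHistogram& aSizes,
                          bool& aOomFlag) {
  aSizes.Accumulate(aCount);
  try {
    aArray.reserve(aCount);
  } catch (const std::bad_alloc&) {
    aOomFlag = true;
    return false;
  } catch (const std::length_error&) {
    aOomFlag = true;
    return false;
  }
  assert(aArray.capacity() >= aCount);
  aOomFlag = false;
  return true;
}
```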
(Assignee)

Updated

6 years ago
Attachment #596051 - Flags: review?(dcamp)
Attachment #596051 - Flags: feedback?(justin.lebar+bug)
(Assignee)

Updated

6 years ago
Assignee: nobody → gpascutto

Comment 1

> Taras, does it make sense to add Telemetry for what Bug 720444 gathers?

Problem is, if we collect this data when telemetry ping runs, it's not necessarily going to reflect the state of things when we ran out of memory here.

Why don't we, in a separate bug, temporarily turn all the relevant fallible TArrays into infallible arrays and then look at the resulting crash reports?
(Assignee)

Comment 2

6 years ago
>Problem is, if we collect this data when telemetry ping runs, it's not necessarily 
>going to reflect the state of things when we ran out of memory here.

Good point.

What about making the OOM not a boolean, but transmit the free-RAM value instead? (I'd guess that will confuse our backend right now, so it'd have to be renamed)

>Why don't we, in a separate bug, temporarily turn all the relevant fallible 
>TArrays into infallible arrays and then look at the resulting crash reports?

There aren't that many OOMs (I see 4 pings so far, which might have been from 1 person for all we know), so we might have to do this on Aurora, and maybe even Beta, then back out again.

That's not very pleasant.

Comment 3

> What about making the OOM not a boolean, but transmit the free-RAM value instead? (I'd guess that 
> will confuse our backend right now, so it'd have to be renamed)

We could do that, but we'd lose the stack trace and data on exactly which TArray operation was causing the failure.  (I guess you could add one histogram for each possible failure point.)

I guess we could send the available-commit-space number and investigate further only if it's abnormally high...

Comment 4

Comment on attachment 596051 [details] [diff] [review]
Patch 4. Fix a potential OOM in the new code. Reorder telemetry.

This looks sane, although I really don't know this code.  Note that mCompletions (used in LookupCache::Build) is an *infallible* TArray.

gcp and I discussed on IRC that it might be possible to reduce the working memory of LookupCache::Build by destroying aAddCompletes immediately after constructing mCompletions.

Indeed, you could go further than that and free aAddCompletes *while* you build mCompletions.  Add an item, remove the corresponding item, and periodically call TArray::Compact().  You could do the same thing in ConstructPrefixSet.
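
The "add an item, remove the corresponding item, and periodically compact" idea could look roughly like this. It is a sketch in standard C++ rather than the actual LookupCache code: std::vector stands in for nsTArray, shrink_to_fit() stands in for Compact(), and DrainInto is a hypothetical name. Elements are taken from the back because that is the cheap end of a std::vector to remove from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Move every element of aSource into aDest, shrinking the source's buffer
// in steps as the destination grows. std::vector stands in for nsTArray
// and shrink_to_fit() for Compact(); shrink_to_fit() is a non-binding
// request and may itself reallocate, so the saving is not guaranteed.
template <typename T>
void DrainInto(std::vector<T>& aSource, std::vector<T>& aDest,
               std::size_t aCompactEvery) {
  assert(aCompactEvery > 0);
  const std::size_t start = aDest.size();
  std::size_t sinceCompact = 0;
  while (!aSource.empty()) {
    // Take from the back: removing the last element is O(1).
    aDest.push_back(std::move(aSource.back()));
    aSource.pop_back();
    if (++sinceCompact == aCompactEvery) {
      aSource.shrink_to_fit();
      sinceCompact = 0;
    }
  }
  aSource.shrink_to_fit();
  // Elements arrived back to front, so restore the original order in place.
  std::reverse(aDest.begin() + static_cast<std::ptrdiff_t>(start),
               aDest.end());
}
```

How often to compact is a trade-off: compacting more often lowers the overlap between the two buffers but costs more reallocation work.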
Attachment #596051 - Flags: feedback?(justin.lebar+bug) → feedback+

Updated

6 years ago
Attachment #596051 - Flags: review?(dcamp) → review+
(Assignee)

Comment 5

6 years ago
Created attachment 596341 [details] [diff] [review]
Patch 5. Clear some nsTArrays as quickly as possible

This basically implements the easiest part of jlebar's suggestion. And I found another place where we keep a temporary big array longer than needed.
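
As a rough illustration of freeing a temporary array as soon as it is no longer needed, here is a sketch in standard C++. It is not the patch: std::vector stands in for nsTArray, and ReleaseNow and BuildTwoStage are hypothetical names. For std::vector specifically, clear() keeps the capacity allocated, so the sketch swaps with an empty temporary to release the buffer; how nsTArray behaves in this respect is not covered in this bug.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Release a vector's heap buffer immediately rather than waiting for the
// variable to go out of scope. Swapping with an empty temporary hands the
// buffer to the temporary, which frees it at the end of the statement.
template <typename T>
void ReleaseNow(std::vector<T>& aArray) {
  std::vector<T>().swap(aArray);
}

// Hypothetical two-stage build: the large temporary is only needed to
// compute the first stage, so it is released before the second stage
// allocates its own storage.
std::vector<std::uint32_t> BuildTwoStage(std::size_t aCount) {
  std::vector<std::uint32_t> temporary(aCount, 1u);

  std::uint32_t sum = 0;
  for (std::uint32_t value : temporary) {
    sum += value;
  }
  ReleaseNow(temporary);  // The temporary's buffer is gone from here on.
  assert(temporary.empty());

  return std::vector<std::uint32_t>(aCount, sum);
}
```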

Will deal with OOM telemetry in a separate bug.

https://tbpl.mozilla.org/?tree=Try&rev=3fadf2578425
Attachment #596341 - Flags: review?(dcamp)

Updated

6 years ago
Attachment #596341 - Flags: review?(dcamp) → review+

Comment 6

Comment on attachment 596051 [details] [diff] [review]
Patch 4. Fix a potential OOM in the new code. Reorder telemetry.

Funnily enough, nsTArray::SetCapacity returns a boolean. Please back out.
Attachment #596051 - Attachment is patch: true
Attachment #596051 - Flags: review-
(Assignee)

Comment 7

6 years ago
Created attachment 596667 [details] [diff] [review]
Patch 6. Followup for wrong return value.

This is a trivial fix - let's do a followup instead.
Attachment #596667 - Flags: review?(bugs)
(Assignee)

Comment 8

6 years ago
Created attachment 596669 [details] [diff] [review]
Patch 6. Followup for wrong return value.

It'll be better if the fix actually compiles.

Previous patches already landed in inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e0e2cc5570ac
https://hg.mozilla.org/integration/mozilla-inbound/rev/91eed468cff8
Attachment #596667 - Attachment is obsolete: true
Attachment #596667 - Flags: review?(bugs)
Attachment #596669 - Flags: review?(bugs)

Updated

6 years ago
Attachment #596669 - Flags: review?(bugs) → review+

Comment 9

Are you landing or not?
(Assignee)

Comment 10

6 years ago
The second fix also doesn't compile (I'm glad I checked before landing...). I'm going to back out.

(error_bailout assumes rv contains the error value, so the SetCapacity check should set it; that's *another* error)
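
The two mistakes discussed here (treating a boolean return as a result code, and jumping to a shared bailout label without setting rv) can be sketched as below. Everything in the sketch is a stand-in: Result is a hypothetical substitute for nsresult, FallibleArray wraps std::vector rather than nsTArray, and Prepare is a made-up function. Only the shape of the check mirrors what the comments describe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdexcept>
#include <vector>

// Hypothetical result codes standing in for nsresult values.
enum class Result { Ok, OutOfMemory };

// Stand-in for an array whose SetCapacity reports success as a bool.
struct FallibleArray {
  std::vector<std::uint32_t> storage;
  bool SetCapacity(std::size_t aCount) {
    try {
      storage.reserve(aCount);
    } catch (const std::bad_alloc&) {
      return false;
    } catch (const std::length_error&) {
      return false;
    }
    return true;
  }
};

// The shared cleanup path reports whatever is in rv, so every failure
// branch has to store an error code in rv before jumping to it.
Result Prepare(FallibleArray& aArray, std::size_t aCount, bool& aCleanedUp) {
  Result rv = Result::Ok;

  if (!aArray.SetCapacity(aCount)) {
    rv = Result::OutOfMemory;  // Without this line the bailout returns Ok.
    goto error_bailout;
  }
  return rv;

error_bailout:
  assert(rv != Result::Ok);
  aCleanedUp = true;  // Placeholder for real cleanup work.
  return rv;
}
```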
(Assignee)

Comment 11

6 years ago
Backed out:

https://hg.mozilla.org/integration/mozilla-inbound/rev/bd04ba7c9996
https://hg.mozilla.org/integration/mozilla-inbound/rev/442b3399ff1a
(Assignee)

Comment 12

6 years ago
Created attachment 596697 [details] [diff] [review]
Patch 1. Fix a potential OOM in the new code. Reorder telemetry.
Attachment #596051 - Attachment is obsolete: true
Attachment #596341 - Attachment is obsolete: true
Attachment #596669 - Attachment is obsolete: true
Attachment #596697 - Flags: review?(dcamp)
(Assignee)

Comment 13

6 years ago
Created attachment 596699 [details] [diff] [review]
Patch 2. Reduce max memory by freeing arrays as early as possible.

https://tbpl.mozilla.org/?tree=Try&rev=b25710891998

I also did some manual checking that this doesn't break the protection. Our test suite doesn't actually exercise the "update online from Google" path, and I think the first patch here actually did break that. Good that it was caught...
Attachment #596699 - Flags: review?(dcamp)

Updated

6 years ago
Attachment #596697 - Flags: review?(dcamp) → review+

Updated

6 years ago
Attachment #596699 - Flags: review?(dcamp) → review+
(Assignee)

Comment 14

6 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/a010dcf1a973
https://hg.mozilla.org/integration/mozilla-inbound/rev/35bf0d62cc30
https://hg.mozilla.org/mozilla-central/rev/a010dcf1a973
https://hg.mozilla.org/mozilla-central/rev/35bf0d62cc30
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 13
(Assignee)

Comment 16

5 years ago
Created attachment 616611 [details] [diff] [review]
Patch. Backout part 1

[Approval Request Comment]
Backout due to bug 744993:

a01cf079ee0b Bug 730247
1a6d008acb4f Bug 729928
f8bf3795b851 Bug 729640
35bf0d62cc30 Bug 726002
a010dcf1a973 Bug 726002
e9291f227d63 Bug 725597
db52b4916cde Bug 673470
173f90d397a8 Bug 673470
Attachment #616611 - Flags: approval-mozilla-central?
Attachment #616611 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 17

5 years ago
Created attachment 616612 [details] [diff] [review]
Patch. Backout part 2
Attachment #616612 - Flags: approval-mozilla-central?
Attachment #616612 - Flags: approval-mozilla-aurora?
Attachment #616611 - Flags: approval-mozilla-central? → approval-mozilla-central+
Attachment #616612 - Flags: approval-mozilla-central? → approval-mozilla-central+
(Assignee)

Comment 18

5 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/984300ab1282
https://hg.mozilla.org/integration/mozilla-inbound/rev/e5af6f193e2a
https://hg.mozilla.org/mozilla-central/rev/984300ab1282
https://hg.mozilla.org/mozilla-central/rev/e5af6f193e2a
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Updated

5 years ago
Attachment #616611 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-

Updated

5 years ago
Attachment #616612 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Comment on attachment 616611 [details] [diff] [review]
Patch. Backout part 1

Sorry - thought this bug was reopened because the backout was backed out. My mistake. Approved for Aurora 13 (or Beta 13 if the merge occurs before we land).
Attachment #616611 - Flags: approval-mozilla-aurora- → approval-mozilla-aurora+

Updated

5 years ago
Attachment #616612 - Flags: approval-mozilla-aurora- → approval-mozilla-aurora+
(Assignee)

Comment 21

5 years ago
https://hg.mozilla.org/releases/mozilla-beta/rev/cfd1455c6330
https://hg.mozilla.org/releases/mozilla-beta/rev/c70e9dd2057b
(Assignee)

Comment 22

5 years ago
Relanding after fixes in bug 673470 to fix bug 744993.
https://hg.mozilla.org/integration/mozilla-inbound/rev/3dbf00bb0d70
https://hg.mozilla.org/integration/mozilla-inbound/rev/aa57f7757c97
https://hg.mozilla.org/mozilla-central/rev/3dbf00bb0d70
https://hg.mozilla.org/mozilla-central/rev/aa57f7757c97
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago → 5 years ago
Resolution: --- → FIXED
Target Milestone: Firefox 13 → Firefox 17
Component: Phishing Protection → Phishing Protection
Product: Firefox → Toolkit