Closed Bug 1376410 Opened 7 years ago Closed 7 years ago

Crash in OOM | large | NS_ABORT_OOM | nsACString::Replace

Categories

(Toolkit :: Safe Browsing, defect, P1)

All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 + fixed
firefox57 --- fixed

People

(Reporter: baffclan, Assigned: tnguyen)

References

Details

(Keywords: crash, regression, Whiteboard: #sbv4-m9)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-c47fb23e-cc7e-468e-8bbd-b7f460170627.
=============================================================

Crashing Thread (46), Name: Classifier Update
Frame 	Module 	Signature 	Source
0 	xul.dll 	NS_ABORT_OOM(unsigned int) 	xpcom/base/nsDebugImpl.cpp:610
1 	xul.dll 	nsACString::Replace(unsigned int, unsigned int, char const*, unsigned int) 	xpcom/string/nsTSubstring.cpp:564
2 	xul.dll 	mozilla::safebrowsing::AppendPrefixToMap 	toolkit/components/url-classifier/LookupCacheV4.cpp:195
3 	xul.dll 	mozilla::safebrowsing::LookupCacheV4::ApplyUpdate(mozilla::safebrowsing::TableUpdateV4*, nsClassHashtable<nsUint32HashKey, nsCString>&, nsClassHashtable<nsUint32HashKey, nsCString>&) 	toolkit/components/url-classifier/LookupCacheV4.cpp:287
4 	xul.dll 	mozilla::safebrowsing::Classifier::UpdateTableV4(nsTArray<mozilla::safebrowsing::TableUpdate*>*, nsACString const&) 	toolkit/components/url-classifier/Classifier.cpp:1358
5 	xul.dll 	mozilla::safebrowsing::Classifier::ApplyUpdatesBackground(nsTArray<mozilla::safebrowsing::TableUpdate*>*, nsACString&) 	toolkit/components/url-classifier/Classifier.cpp:829
6 	xul.dll 	<lambda_b69fb69cf3ba961377d01f45253c0d68>::operator() 	toolkit/components/url-classifier/Classifier.cpp:746
7 	xul.dll 	mozilla::detail::RunnableFunction<<lambda_b69fb69cf3ba961377d01f45253c0d68> >::Run() 	obj-firefox/dist/include/nsThreadUtils.h:461
8 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1428
9 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/threads/nsThreadUtils.cpp:477
10 	xul.dll 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:368
11 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:313
12 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:293
13 	xul.dll 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp:503
14 	nss3.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:397
15 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:95
16 	ucrtbase.dll 	thread_start<unsigned int (__stdcall*)(void*)> 	
17 	kernel32.dll 	BaseThreadInitThunk 	
18 	mozglue.dll 	patched_BaseThreadInitThunk 	mozglue/build/WindowsDllBlocklist.cpp:849
19 	ntdll.dll 	__RtlUserThreadStart 	
20 	ntdll.dll 	_RtlUserThreadStart 	


Application Basics:
Name: Firefox
Version: 56.0a1
Build ID: 20170621102301
Update Channel: nightly
User Agent: Mozilla/5.0 (Windows NT 10.0; rv:56.0) Gecko/20100101 Firefox/56.0
OS: Windows_NT 10.0
this looks related to safebrowsing v4.
Component: General → Safe Browsing
Flags: needinfo?(francois)
Keywords: regression
OS: Windows 10 → Windows
Hardware: x86 → All
I took a glance at dump and it's interesting that 
https://hg.mozilla.org/releases/mozilla-beta/annotate/da6760885a24/toolkit/components/url-classifier/LookupCacheV4.cpp#l195
the length is quite big (4194292).
If there's an oom, basically we can avoid OOM and return error (we did handle oom when updating SB list).
But if there's an error in our logic, we have to find and fix that.
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #2)
> I took a glance at dump and it's interesting that 
> https://hg.mozilla.org/releases/mozilla-beta/annotate/da6760885a24/toolkit/
> components/url-classifier/LookupCacheV4.cpp#l195
> the length is quite big (4194292).

If there's a way to chunk that large allocation into smaller pieces, that would be a good option too.
Flags: needinfo?(francois)
Priority: -- → P1
Assignee: nobody → tnguyen
Status: NEW → ASSIGNED
The legal v4 prefix should range from 4 to 32 so it's definitely not a real OOM issue.
(In reply to Henry Chang [:hchang] from comment #4)
> The legal v4 prefix should range from 4 to 32 so it's definitely not a real
> OOM issue.

Oops my fault!

static void
AppendPrefixToMap(PrefixStringMap& prefixes, nsDependentCSubstring& prefix)
{
  if (!prefix.Length()) {
    return;
  }

  nsCString* prefixString = prefixes.LookupOrAdd(prefix.Length());
  prefixString->Append(prefix.BeginReading(), prefix.Length());
}

If the OOM came from |prefixString|, it might be not that unreasonable 
since |prefixString| is supposed to be a lengthy string to store all 
same-length prefixes. 4194292 means we have ~1M 4-byte prefix which is
not so ridiculous.

-rw-r--r--@ 1 henry  staff       67 Aug 11 14:40 goog-badbinurl-proto.metadata
-rw-r--r--@ 1 henry  staff    73822 Aug 11 14:40 goog-badbinurl-proto.pset
-rw-r--r--@ 1 henry  staff       65 Aug 11 14:40 goog-downloadwhite-proto.metadata
-rw-r--r--@ 1 henry  staff    15929 Aug 11 14:40 goog-downloadwhite-proto.pset
-rw-r--r--@ 1 henry  staff       67 Aug 11 14:40 goog-malware-proto.metadata
-rw-r--r--@ 1 henry  staff  1113028 Aug 11 14:40 goog-malware-proto.pset
-rw-rw-r--@ 1 henry  staff       67 Aug 11 14:40 goog-unwanted-proto.metadata
-rw-rw-r--@ 1 henry  staff   350678 Aug 11 14:40 goog-unwanted-proto.pset

On mac the typical blacklist size is around 1MB (up to 0.25M prefixes) but
I don't know how big it would be on windows. (Also we need to consider the
string internal buffer realloc growing factor.)
Per offline discussion with Thomas, the OOM came from unexpected lengthy prefix.Length() 
(should range from 4 to 32 but appeared to be 4194292). So, please ignore my comment 5 ^^"
I took randomly 10 dumps and two of them have wrong prefix.length() (I am using VS2015 to open the dump) at line
https://hg.mozilla.org/releases/mozilla-beta/annotate/da6760885a24/toolkit/components/url-classifier/LookupCacheV4.cpp#l195
Anyway, it's safe and necessary to add a condition check to make sure we have correct prefix size (when we are going to load prefixes from file, and updating sb list).
Hi gcp, Francois is on PTO for few fays, could you please take a look?
FWIW, memory seems to be exhausted when the issue occurs
2,047.94 MB (100.0%) -- address-space
├──1,547.57 MB (75.57%) -- commit
│  ├──1,313.98 MB (64.16%) -- private
│  │  ├──1,312.48 MB (64.09%) ── readwrite(segments=1781)
│  │  └──────1.50 MB (00.07%) ++ (6 tiny)
│  ├────156.34 MB (07.63%) -- image
│  │    ├───99.39 MB (04.85%) ── execute-read(segments=132)
│  │    ├───53.89 MB (02.63%) ── readonly(segments=288)
│  │    └────3.05 MB (00.15%) ++ (3 tiny)
│  └─────77.25 MB (03.77%) -- mapped
│        ├──72.80 MB (03.55%) ── readonly(segments=77)
│        └───4.45 MB (00.22%) ── readwrite(segments=26)
├────349.47 MB (17.06%) -- reserved
│    ├──336.11 MB (16.41%) ── private(segments=496)
│    └───13.36 MB (00.65%) ── mapped(segments=4)
└────150.89 MB (07.37%) ── free(segments=315)
....
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #8)
> I took randomly 10 dumps and two of them have wrong prefix.length() (I am
> using VS2015 to open the dump) at line
> https://hg.mozilla.org/releases/mozilla-beta/annotate/da6760885a24/toolkit/
> components/url-classifier/LookupCacheV4.cpp#l195
> Anyway, it's safe and necessary to add a condition check to make sure we
> have correct prefix size (when we are going to load prefixes from file, and
> updating sb list).
> Hi gcp, Francois is on PTO for few fays, could you please take a look?

Relooked and VS2015 showed prefix.mLength as 4194292 at line
https://hg.mozilla.org/releases/mozilla-beta/annotate/da6760885a24/toolkit/components/url-classifier/LookupCacheV4.cpp#l195
But after that it is rolled back to 4 in the line 
http://searchfox.org/mozilla-central/rev/e5b13e6224dbe3182050cf442608c4cb6a8c5c55/xpcom/string/nsTSubstring.cpp#555
So probably it was a potential issue when VS2015 was trying to read register's value. So we only need to care about the OOM case of update
prefixString->Append(prefix.BeginReading(), prefix.Length()); 
In the context of SafeBrowsing, that failing to update due to fallible malloc should be handled.
[Tracking Requested - why for this release]: crash when doing large Safe Browsing V4 updates under low-memory situations
Whiteboard: #sbv4-m9
Attachment #8896189 - Flags: review?(gpascutto)
Comment on attachment 8896189 [details]
Bug 1376410 - Handle OOM when appending prefix to map

https://reviewboard.mozilla.org/r/167476/#review174292

::: toolkit/components/url-classifier/LookupCacheV4.cpp:193
(Diff revision 1)
>  AppendPrefixToMap(PrefixStringMap& prefixes, nsDependentCSubstring& prefix)
>  {
> -  if (!prefix.Length()) {
> -    return;
> +  uint32_t len = prefix.Length();
> +  if (len < PREFIX_SIZE || len > COMPLETE_SIZE) {
> +    return NS_ERROR_FAILURE;
> +  }

IMO this is unnecessary but probably worth given that we haven't be clear the actual root cause.

::: toolkit/components/url-classifier/LookupCacheV4.cpp:291
(Diff revision 1)
>        // If the number of picks from old map matches the removalIndex, then this prefix
>        // will be removed by not merging it to new map.
>        if (removalIndex < removalArray.Length() &&
>            numOldPrefixPicked == removalArray[removalIndex]) {
>          removalIndex++;
> -      } else {
> +      } else if (!smallestOldPrefix.IsEmpty()){

Why do we have to do this check here? Does an empty list make any issue to |AppendPrefixToMap|?

::: toolkit/components/url-classifier/LookupCacheV4.cpp:300
(Diff revision 1)
> +        }
> +
>          UpdateChecksum(crypto, smallestOldPrefix);
>        }
>        smallestOldPrefix.SetLength(0);
> -    } else {
> +    } else if (!smallestAddPrefix.IsEmpty()){

As above. Why do we need this specific check?
Comment on attachment 8896189 [details]
Bug 1376410 - Handle OOM when appending prefix to map

https://reviewboard.mozilla.org/r/167476/#review174292

> Why do we have to do this check here? Does an empty list make any issue to |AppendPrefixToMap|?

The empty list is skipped but not throw explicit update error in the old code logic. I just don't want to change that logic.
Anyway, as discussed, if we dont really need the prefix size check, the code would be reverted back to old logic
The string internal buffer growth rate can be found in [1] so appending 
1M 4-byte prefixes requires up to 22 times re-allocation.

[1] http://searchfox.org/mozilla-central/rev/13148faaa91a1c823a7d68563d9995480e714979/xpcom/string/nsTSubstring.cpp#107
(In reply to Henry Chang [:hchang] from comment #14)
> The string internal buffer growth rate can be found in [1] so appending 
> 1M 4-byte prefixes requires up to 22 times re-allocation.
> 
> [1]
> http://searchfox.org/mozilla-central/rev/
> 13148faaa91a1c823a7d68563d9995480e714979/xpcom/string/nsTSubstring.cpp#107

Yep, that almost happened every time we do partial update SB list V4, particularly goog-phish-proto, goog-malware-proto (I could see it did 18 times realloc). But that's ok for our logic, realloc ~20 times in a loop total > 10^6 appending task.

I was trying to preallocate the output but seems we could not have an exact number (it depends on how many indexes which match removal array). So go back to simpler version.
Henry, could you please take a look?
Comment on attachment 8896189 [details]
Bug 1376410 - Handle OOM when appending prefix to map

https://reviewboard.mozilla.org/r/167476/#review174400

Looks good but I would rather leave only the following changes:

1) return type of |AppendPrefixToMap|
2) Add |fallible| to the use of nsCString::Append
3) return NS_ERROR_OUT_OF_MEMORY if nsCString::Append returns false

to keep the fix as isolated as possible if we are focusing on OOM in this bug.

That said, this is just my personal opinion. You can check if Francois or gcp has different thought.

Anyways, thanks for your investigation on this issue!
Attachment #8896189 - Flags: review?(hchang) → review+
Attachment #8896189 - Flags: review?(francois)
Comment on attachment 8896189 [details]
Bug 1376410 - Handle OOM when appending prefix to map

https://reviewboard.mozilla.org/r/167476/#review174778
Attachment #8896189 - Flags: review?(francois) → review+
hg error in cmd: hg push -r . -f try: pushing to ssh://hg.mozilla.org/try
searching for changes
remote: abort: abandoned transaction found!
remote: (run 'hg recover' to clean up transaction)
abort: stream ended unexpectedly (got 0 bytes, expected 4)
hg error in cmd: hg push -r . -f try: pushing to ssh://hg.mozilla.org/try
searching for changes
remote: abort: abandoned transaction found!
remote: (run 'hg recover' to clean up transaction)
abort: stream ended unexpectedly (got 0 bytes, expected 4)
Track 56+ as the volume of crashes is high.
Keywords: checkin-needed
Comment on attachment 8896189 [details]
Bug 1376410 - Handle OOM when appending prefix to map

Approval Request Comment
[Feature/Bug causing the regression]: Enable SB V4 only in FF 56 
[User impact if declined]: High rate of crash 
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Checking-in needed
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: No
[Is the change risky?]: No
[Why is the change risky/not risky?]: Fallible of OOM string append
[String changes made/needed]: No
Attachment #8896189 - Flags: approval-mozilla-beta?
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/0368d3403a34
Handle OOM when appending prefix to map r=francois,hchang
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/0368d3403a34
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Comment on attachment 8896189 [details]
Bug 1376410 - Handle OOM when appending prefix to map

The crash doesn't show up much in nightly. So let's uplift and see if this helps in beta. May land for beta 5 or later this week for beta 6.
Attachment #8896189 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #23)
> [Is this code covered by automated tests?]: No
> [Has the fix been verified in Nightly?]: Checking-in needed
> [Needs manual test from QE? If yes, steps to reproduce]: No

Setting qe-verify- based on Thomas's assessment on manual testing needs.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.