Closed Bug 1552900 Opened 5 years ago Closed 8 months ago

ICU OOM error handling is broken in some cases

Categories

(Core :: JavaScript: Internationalization API, defect, P2)

defect

Tracking

()

RESOLVED FIXED
118 Branch
Tracking Status
firefox118 --- fixed

People

(Reporter: jorendorff, Assigned: dminor)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file)

Many places within ICU fail to check for malloc returning null, leading to undefined behavior. We get occasional crash reports (see this bug's dependency tree).

In particular see bug 1495244 comment 5 for anba's guess that this needs to be fixed in "dozens of places" in ICU.

What to do about crashing bugs in a hard dependency? Two thoughts:

  1. Maybe we should just fix them all upstream, or pay someone else to do it.

  2. We could decide that returning nullptr from our allocation function to ICU is simply too dangerous—we can't fix ICU and can't trust it to handle nullptr correctly—and therefore MOZ_CRASH on OOM instead.

  3. Or, for less crashing: Do option #2, but also establish some kind of quota/ballast. In our allocation function, if the quota is spent / the ballast is tapped, then allocate anyway (so as to return a valid pointer), but set an icu_oom flag somewhere. After the ICU function returns, check the flag and report OOM ourselves. If ICU manages to burn through the whole ballast, or actually get malloc to fail, fine, we crash. This does an end run around ICU's error checking. It seems a bit extreme though, and "now you have two problems" (our after-action checks could just as easily have bugs of omission).

Priority: P3 → P2
Severity: normal → S3
Depends on: 1806246

This is an ongoing problem. It's not practical for us to fix everything upstream, and the issues we have opened upstream seem to have stalled [1] [2].

I think it would be interesting to try option #2, and MOZ_CRASH on OOM in ICUReporter [3]. We could try this for Nightly only builds to begin with, and monitor for a spike in crashes. Since these problems are low volume in crash reports, hopefully we won't see a dramatic increase with this change. Crashing isn't ideal, but it's better than potential security problems that could result from getting in an inconsistent state in ICU.

[1] https://unicode-org.atlassian.net/browse/ICU-20367
[2] https://unicode-org.atlassian.net/browse/ICU-22057
[3] https://searchfox.org/mozilla-central/rev/892475f3ba2b959aeaef19d1d8602494e3f2ae32/xpcom/build/XPCOMInit.cpp#165-171

We're seeing inconsistent handling of OOMs in the ICU library. This
patch changes the behaviour to crash on OOM rather than allowing
ICU to handle the allocation failure. The inconsistent handling in ICU
could lead to ICU being in an inconsistent state which could in turn
cause security problems. The safer alternative is to crash, but it's
possible this will lead to too high of crash rate. For now, we'll try
this on Nightly only and monitor crash reports to see what impact this
change has.

Assignee: nobody → dminor
Status: NEW → ASSIGNED

I ran a small try job here: https://treeherder.mozilla.org/jobs?repo=try&revision=f7e49f0b7e1f540a1e67d8d4809f432eef508c35. It's hard to predict offhand what tests this could affect, so I'm going to try it on autoland rather than run a huge try job.

Pushed by dminor@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0fa9348134da
Crash on OOM in ICU on Nightly builds; r=jandem,platform-i18n-reviewers
Status: ASSIGNED → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → 118 Branch
Blocks: 1858496
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: