Open
Bug 810718
Opened 12 years ago
Updated 2 years ago
Enforce pldhash RECURSION_LEVEL checks dynamically: abort on mutating a hash during iteration
Categories
(Core :: XPCOM, enhancement)
Core
XPCOM
Tracking
()
NEW
People
(Reporter: jruderman, Unassigned)
References
Details
(Keywords: sec-want)
Attachments
(2 files, 2 obsolete files)
15.53 KB,
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
3.31 KB,
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
A search for "ALL su,desc:RECURSION_LEVEL" gives me 57 bugs. And quite a few of them have been identified as security bugs -- recent examples include bug 786111 and bug 803093. (Are they all security bugs??)
> https://bugzilla.mozilla.org/buglist.cgi?bugidtype=include;order=bugs.bug_id;bug_id=803093,803093,802982,802982,786111,786111,784253,784253,779829,779829,779828,779828,779471,779471,764122,764122,755290,755290,733014,733014,731191,731191,714420,714420,705961,705961,700984,700984,697657,697657,696333,696333,681170,681170,669112,669112,668989,668989,660440,660440,651615,651615,647938,647938,647490,647490,646395,646395,637214,637214,634578,634578,631772,631772,629861,629861,627985,627985,624268,624268,615149,615149,614480,614480,613564,613564,605953,605953,605457,605457,590494,590494,580790,580790,528518,528518,525161,525161,517363,517363,501934,501934,478336,478336,475976,475976,472909,472909,469004,469004,463289,463289,456272,456272,449233,449233,439475,439475,437325,437325,421671,421671,415192,415192,398466,398466,398083,398083,378866,378866,368528,368528,334180,334180
How expensive would it be to turn RECURSION_LEVEL failures into aborts in non-debug builds? I'm thinking that hash tables are already fairly expensive so it wouldn't be terrible.
(In Rust you can enforce "no mutation during iteration" statically most of the time, but I'm guessing that's a lost cause for pldhash in C++.)
My fuzzer is currently ignoring these assertions because of bug 705961. But even with that fixed I wouldn't want to rely on fuzzing to catch everything.
Comment 1•12 years ago
|
||
Try it on nightly, but make it configurable? Make sure that we can find these in crash-stats easily by adding a comment/annotation. I bet things will start showing up and we'll have a better sense of the size of the problem.
There are two ways to hit this assertion: one is to use the hash from multiple threads. The other is to actually recursively modify during iteration on the main thread. I'm sure both will show up.
Reporter | ||
Comment 2•12 years ago
|
||
Configurable how? I think pldhash is too low-level to rely on the prefs service.
The failure path can't call AppendAppNotesToCrashReport, because that function can't be called off the main thread in content processes (!?). So I guess the failure path should call a MOZ_NEVER_INLINE function, whose name will appear at the top of crash reports. (See also bug 788069.)
By the way, these crashes will probably be easier to find in crash-stats than the crashes caused by *not* having dynamic checks!
Since these hash tables can be used from multiple threads, ++RECURSION_LEVEL should become PR_ATOMIC_INCREMENT(RECUSION_LEVEL).
Comment 3•12 years ago
|
||
> Configurable how?
If the goal is to turn it on only for Nightly, or only for non-release builds, we can just have a build-time pref...
Comment 4•12 years ago
|
||
NS_RUNTIMEABORT will annotate correctly (when it can), so let's just use that.
Comment 5•12 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #4)
> NS_RUNTIMEABORT will annotate correctly (when it can), so let's just use
> that.
Is there some alternative?
We're trying to use pldhash inside newdmd (bug 717853). For weird reasons, newdmd has to live outside libxul. But we want a hashtable (and STL's unordered_map is C++11-only), so we're re-compiling pldhash.cpp for newdmd.
If pldhash.cpp calls NS_RUNTIMEABORT, we have to re-compile nsDebugImpl.cpp (for NS_DebugBreak), and that causes us to pull other stuff in...it's a total mess.
In fact, we'd like to remove the NS_RUNTIMEABORT calls from nsTHashtable as well, for the same reason.
I guess MOZ_CRASH doesn't work for you, because it doesn't annotate properly?
Comment 6•12 years ago
|
||
Gah. You could of course add another set of ifdefs specifically for the DMD case and use NS_RUNTIMEABORT when available and MOZ_CRASH when not. We'd only be compiling pldhash.cpp *5* times per build then! ;-)
But also, what platforms are we trying to build DMD on which don't have an available unordered_map?
Comment 7•12 years ago
|
||
> But also, what platforms are we trying to build DMD on which don't have an available
> unordered_map?
Huh, I did not expect that to work on bionic, but it seems to. I'll ask in the bug if we can use that. There's something to be said for using the thing we're familiar with, though...
Comment 8•12 years ago
|
||
FWIW I think we're going to use unordered_map or the JS hashtables, so feel free to add whatever to pldhash.
Comment 9•11 years ago
|
||
Update:
- DMD ended up using the JS hash tables, so that's not an issue any more.
- Bug 815467 will cause |recursionLevel| to be present in non-debug builds (at no space cost!).
So this bug should be easy to do now. bsmedberg, should I change lines like this:
MOZ_ASSERT(RECURSION_LEVEL_SAFE_TO_FINISH(table));
to this:
if (!RECURSION_LEVEL_SAFE_TO_FINISH(table)) {
NS_RUNTIMEABORT("recursion level error in pldhash");
}
? (If so, I'll probably wrap that up in a macro for conciseness.)
(And Jesse's point about using PR_ATOMIC_INCREMENT() is a good one.)
Comment 10•11 years ago
|
||
> MOZ_ASSERT(RECURSION_LEVEL_SAFE_TO_FINISH(table));
>
> to this:
>
> if (!RECURSION_LEVEL_SAFE_TO_FINISH(table)) {
> NS_RUNTIMEABORT("recursion level error in pldhash");
> }
Yes.
> (And Jesse's point about using PR_ATOMIC_INCREMENT() is a good one.)
I'm not sure what that means, actually. Since the hashtables themselves aren't threadsafe, any access needs to be lock-protected, and that should be enough synchronization that we don't also need to use atomics here.
Flags: needinfo?(benjamin)
Comment 11•11 years ago
|
||
Attachment #825640 -
Flags: review?(benjamin)
Updated•11 years ago
|
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
Comment 12•11 years ago
|
||
Just some minor comment tweaks.
Attachment #825740 -
Flags: review?(benjamin)
Updated•11 years ago
|
Attachment #825640 -
Attachment is obsolete: true
Attachment #825640 -
Flags: review?(benjamin)
Reporter | ||
Comment 13•11 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #10)
> > (And Jesse's point about using PR_ATOMIC_INCREMENT() is a good one.)
>
> I'm not sure what that means, actually. Since the hashtables themselves
> aren't threadsafe, any access needs to be lock-protected, and that should be
> enough synchronization that we don't also need to use atomics here.
Are there cases where multiple threads are reading, but none are writing, that might not currently be protected by locks?
Comment 14•11 years ago
|
||
I cannot think of a case like that. It would basically have to be an initialized table that is never touched subsequently.
Comment 15•11 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #14)
> I cannot think of a case like that. It would basically have to be an
> initialized table that is never touched subsequently.
Never touched? What if it's filled in one thread and then read again in multiple threads? E.g. some kind of atoms table.
Comment 16•11 years ago
|
||
As far as I know we don't have any init-only hashtables which are then frozen and read from multiple threads. The XPT and atom tables are all runtime-modifiable.
Comment 17•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #15)
> (In reply to Benjamin Smedberg [:bsmedberg] from comment #14)
> > I cannot think of a case like that. It would basically have to be an
> > initialized table that is never touched subsequently.
>
> Never touched? What if it's filled in one thread and then read again in
> multiple threads? E.g. some kind of atoms table.
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Telemetry.cpp#959
is one such case.
Comment 18•11 years ago
|
||
Comment on attachment 825740 [details] [diff] [review]
Enforce pldhash RECURSION_LEVEL checks in all builds and abort on failure.
Given my hypothetical example and froydnj's actual example, I will update my patch to use atomic arithmetic.
Attachment #825740 -
Flags: review?(benjamin)
Comment 19•11 years ago
|
||
This is just like the previous patch, but |recursionLevel| is now Atomic<>.
Because Atomic<uint16_t> isn't possible, I had to make it Atomic<uint32_t>,
which increases sizeof(PLDHashTable). If this patch gets r+, I will write a
follow-up patch that reduces it by shrinking |entrySize| to 16 bits.
Attachment #829004 -
Flags: review?(benjamin)
Updated•11 years ago
|
Attachment #825740 -
Attachment is obsolete: true
Updated•11 years ago
|
Attachment #829004 -
Flags: review?(benjamin) → review+
Comment 20•11 years ago
|
||
Attempting to create a pldhash with an entrySize greater than 64KB is
sufficiently unlikely that I decided to just crash instead of returning
failure.
Attachment #832005 -
Flags: review?(benjamin)
Updated•11 years ago
|
Attachment #832005 -
Flags: review?(benjamin) → review+
Comment 21•11 years ago
|
||
Comment 22•11 years ago
|
||
Alas...
Regression: Mozilla-Inbound - Number of Constructors - CentOS release 5 (Final) - 3.17% increase
------------------------------------------------------------------------------------------------
Previous: avg 126.000 stddev 0.000 of 12 runs up to revision 8b676ce79d10
New : avg 130.000 stddev 0.000 of 12 runs since revision db7cbf61b001
Change : +4.000 (3.17% / z=0.000)
Graph : http://mzl.la/17EidKt
I think I changed PLDHashTable from a POD to a non-POD type by adding the Atomic<> member. And we have several global PDLHashTable variables. I'm not sure how to fix this.
Comment 23•11 years ago
|
||
Backed out:
https://hg.mozilla.org/integration/mozilla-inbound/rev/18d9216b6677
At the time of writing, it has caused the following additional performance regressions:
Regression: Mozilla-Inbound - Number of Constructors - CentOS (x86_64) release 5 (Final) - 3.17% increase
---------------------------------------------------------------------------------------------------------
Previous: avg 126.000 stddev 0.000 of 12 runs up to revision 8b676ce79d10
New : avg 130.000 stddev 0.000 of 12 runs since revision db7cbf61b001
Change : +4.000 (3.17% / z=0.000)
Graph : http://mzl.la/17Eks0f
Regression: Mozilla-Inbound-Non-PGO - a11y Row Major MozAfterPaint - Ubuntu HW 12.04 x64 - 2.7% increase
--------------------------------------------------------------------------------------------------------
Previous: avg 287.208 stddev 2.624 of 12 runs up to revision 8b676ce79d10
New : avg 294.958 stddev 1.602 of 12 runs since revision db7cbf61b001
Change : +7.750 (2.7% / z=2.954)
Graph : http://mzl.la/17EmAFe
Regression: Mozilla-Inbound - Tp5 Optimized - MacOSX 10.8 - 7.49% increase
--------------------------------------------------------------------------
Previous: avg 232.476 stddev 5.161 of 12 runs up to revision 8b676ce79d10
New : avg 249.885 stddev 5.640 of 12 runs since revision db7cbf61b001
Change : +17.409 (7.49% / z=3.373)
Graph : http://mzl.la/18HArFQ
Regression: Mozilla-Inbound - a11y Row Major MozAfterPaint - MacOSX 10.8 - 19% increase
---------------------------------------------------------------------------------------
Previous: avg 198.375 stddev 13.046 of 12 runs up to revision 8b676ce79d10
New : avg 236.125 stddev 6.314 of 12 runs since revision db7cbf61b001
Change : +37.750 (19% / z=2.894)
Graph : http://mzl.la/18HAxgC
Regression: Mozilla-Inbound - a11y Row Major MozAfterPaint - MacOSX 10.7 - 6.83% increase
-----------------------------------------------------------------------------------------
Previous: avg 323.917 stddev 2.193 of 12 runs up to revision 8b676ce79d10
New : avg 346.042 stddev 1.544 of 12 runs since revision e24e7d00b56d
Change : +22.125 (6.83% / z=10.087)
Graph : http://mzl.la/18HCA4d
Comment 24•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #22)
> Alas...
>
> Regression: Mozilla-Inbound - Number of Constructors - CentOS release 5
> (Final) - 3.17% increase
> -----------------------------------------------------------------------------
> -------------------
> Previous: avg 126.000 stddev 0.000 of 12 runs up to revision 8b676ce79d10
> New : avg 130.000 stddev 0.000 of 12 runs since revision db7cbf61b001
> Change : +4.000 (3.17% / z=0.000)
> Graph : http://mzl.la/17EidKt
>
> I think I changed PLDHashTable from a POD to a non-POD type by adding the
> Atomic<> member. And we have several global PDLHashTable variables. I'm
> not sure how to fix this.
Atomic<> globals should be fine because of the constructors being constexpr. It wouldn't suprise me if you could fix this by just adding a constexpr default ctor to pldhash (compiler generated ones aren't constexpr, and for some compilers that actually matters...)
Comment 25•11 years ago
|
||
Just for completeness:
Regression: Mozilla-Inbound - Tp5 Optimized - MacOSX 10.6 (rev4) - 2.59% increase
---------------------------------------------------------------------------------
Previous: avg 311.291 stddev 1.784 of 12 runs up to revision 8b676ce79d10
New : avg 319.352 stddev 0.853 of 12 runs since revision 78fd5daacc01
Change : +8.062 (2.59% / z=4.518)
Graph : http://mzl.la/182te8M
Regression: Mozilla-Inbound - SVG, Opacity Row Major - MacOSX 10.6 (rev4) - 2.47% increase
------------------------------------------------------------------------------------------
Previous: avg 416.000 stddev 2.836 of 12 runs up to revision 8b676ce79d10
New : avg 426.292 stddev 3.071 of 12 runs since revision 78fd5daacc01
Change : +10.292 (2.47% / z=3.628)
Graph : http://mzl.la/17EJaxE
Regression: Mozilla-Inbound - a11y Row Major MozAfterPaint - MacOSX 10.6 (rev4) - 6.82% increase
------------------------------------------------------------------------------------------------
Previous: avg 325.083 stddev 1.917 of 12 runs up to revision 8b676ce79d10
New : avg 347.250 stddev 2.624 of 12 runs since revision 78fd5daacc01
Change : +22.167 (6.82% / z=11.564)
Graph : http://mzl.la/17EHbJC
Comment 26•11 years ago
|
||
Happily, I got nine "Improvement" emails after backing out, which matches the nine "Regression" emails from the original landing.
I don't see how to proceed further here. Even if the constructors issue can be worked out (as per comment 24), there are still the other regressions, which presumably(?) came from the extra recursionLevel updates.
Updated•11 years ago
|
Assignee: n.nethercote → nobody
Comment 28•6 years ago
|
||
No assignee, updating the status.
Comment 29•6 years ago
|
||
No assignee, updating the status.
Comment 30•6 years ago
|
||
No assignee, updating the status.
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•