Closed Bug 917918 Opened 11 years ago Closed 9 years ago

overflow-prone binary search algorithms

Categories

(Core :: MFBT, defect, P2)

defect
Points:
13

Tracking

()

RESOLVED FIXED
Tracking Status
firefox32 --- wontfix
firefox33 --- affected
firefox34 --- affected
firefox35 + fixed
firefox-esr24 --- wontfix

People

(Reporter: sunfish, Assigned: gfritzsche)

References

Details

(Keywords: sec-audit, Whiteboard: [adv-main35-])

Attachments

(9 files, 12 obsolete files)

8.89 KB, patch
Waldo
: review+
gfritzsche
: checkin+
Details | Diff | Splinter Review
10.84 KB, text/plain
Details
7.50 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
2.93 KB, text/plain
Details
42.64 KB, patch
gfritzsche
: review+
gfritzsche
: checkin+
Details | Diff | Splinter Review
2.24 KB, text/x-c++src
Details
7.18 KB, patch
gfritzsche
: review+
gfritzsche
: checkin+
Details | Diff | Splinter Review
18.26 KB, patch
gfritzsche
: review+
gfritzsche
: checkin+
Details | Diff | Splinter Review
1.43 KB, patch
gfritzsche
: review+
Details | Diff | Splinter Review
While working on bug 917841, I was curious what I'd find grepping all of mozilla-central for code patterns which look like the overflow-prone binary search pattern [0]. Here's an incomplete list: https://hg.mozilla.org/mozilla-central/file/c486dec6e968/layout/base/nsGenConList.cpp#l144 https://hg.mozilla.org/mozilla-central/file/c486dec6e968/layout/xul/base/src/nsSprocketLayout.cpp#l959 https://hg.mozilla.org/mozilla-central/file/c486dec6e968/parser/htmlparser/src/nsParser.cpp#l723 https://hg.mozilla.org/mozilla-central/file/c486dec6e968/widget/shared/x11/keysym2ucs.c#l853 https://hg.mozilla.org/mozilla-central/file/c486dec6e968/security/nss/lib/sqlite/sqlite3.c#l132719 https://hg.mozilla.org/mozilla-central/file/c486dec6e968/security/nss/cmd/shlibsign/shlibsign.c#l515 https://hg.mozilla.org/mozilla-central/file/c486dec6e968/accessible/src/base/ARIAMap.cpp#l745 This is not a comprehensive list; this is just what I found with a few minutes of grepping for ad-hoc patterns and scanning through them by hand. These all use the overflow-prone pattern. While some or all of these may actually be safe, none have comments or immediately obvious measures which assure the casual reader of correctness. I am not currently working on patches for any of these; I'm just filing this in case the information is useful. [0] http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html
Flags: sec-review?
Made seccurity-sensitive per bug 917918 comment 3
Group: core-security
This would want to be in mfbt, probably in a BinarySearch.h header or something.
Component: General → MFBT
Benjamin: Is this something for your team as a cross-tree code-quality issue?
Flags: needinfo?(benjamin)
Keywords: sec-audit
Doesn't need a "security review" per se, Dan's already found a problem. We need this assigned to someone to do the clean-up
Flags: sec-review?
Assignee: nobody → georg.fritzsche
Flags: needinfo?(benjamin)
Priority: -- → P2
Attached file Suspects (obsolete) —
Preliminary list, just casually inspected for sorting so far. The affected binary searches in our source don't seem to be too many. We should definitely introduce a common function here though (at least for the C++ parts). There is also 3rd party code affected, ICU stands out with 23 hits. Also, yay, 2 different sqlite versions in our tree. Do we want to spend time on patches for upstream or just report those without sensible invariants?
Attached patch WIP (obsolete) — Splinter Review
Jeff, how does this BinarySearch.h approach look to you in principle? We can't use lambdas yet (still need to support GCC 4.4 for B2G at least), so this is a little uglier then it could be. Note: this is WIP, styling/correctness/etc. not... entirely there yet.
Attachment #8347248 - Flags: feedback?(jwalden+bmo)
(In reply to Georg Fritzsche [:gfritzsche] from comment #7) > We can't use lambdas yet (still need to support GCC 4.4 for B2G at least), > so this is a little uglier then it could be. Hm, looks like using local types for comparators etc. are also out until GCC 4.5.
Comment on attachment 8347248 [details] [diff] [review] WIP Review of attachment 8347248 [details] [diff] [review]: ----------------------------------------------------------------- Yup, the local-class thing was first on my list of questions. :-) This looks broadly pretty reasonable, except that I think probably callers should just pass a dummy themselves, rather than having BinarySearchIf to use. ::: gfx/qcms/transform_util.c @@ +283,5 @@ > // Seems not a degenerated case... apply binary search > > while (r > l) { > > + x = l + ((r - l) / 2); Wonderful. This won't stick out at all when the ultimate patch here lands. :-\ (Versus everything else, where it looks a lot more like commoning up code, and you'd have to check case by case to see any difference.) It looks like this might be fixable by converting those values to size_t, or to unsigned if not that. I'd prefer if we tried something along those lines, maybe smuggled into another patch somehow to disguise this, if we're unwilling to standardize on C++ here (like most of the rest of Mozilla code). ::: gfx/thebes/gfxFontUtils.cpp @@ +1221,5 @@ > // binary search; if not found, set language to ANY and try again > + const size_t where = BinarySearch(gMacFontNameCharsets, 0, > + ArrayLength(gMacFontNameCharsets), > + searchValue); > + if (where != ArrayLength(gMacFontNameCharsets)) { SIZE_MAX seems a better sentinel value to return. No need to compute a custom value for every call to this -- I see a lot of .Count(), ArrayLength, etc. to compare for this, and SIZE_MAX lets you avoid that. ::: mfbt/BinarySearch.h @@ +13,5 @@ > + > +namespace mozilla { > + > +/** > + * Blah blah. This is perhaps okay, but honestly a part of evaluating this is seeing its documentation and seeing how well the documentation reads. Maybe doesn't need to be super-polished on a feedback request, but seems to me if something half-decent can't be quickly written up, with simple examples, that'd indicate a problem with the interface. I'd say count the process of writing that up as *part* of the process of gathering feedback -- just self-feedback. @@ +16,5 @@ > +/** > + * Blah blah. > + */ > + > +template<class Container, class T> typename T, as it won't always be a class. (Often won't be, even.) @@ +19,5 @@ > + > +template<class Container, class T> > +size_t > +BinarySearch(const Container& container, const size_t first, > + const size_t last, const T& value, s/last/ifNotFound/ maybe? @@ +20,5 @@ > +template<class Container, class T> > +size_t > +BinarySearch(const Container& container, const size_t first, > + const size_t last, const T& value, > + size_t& insertionPoint) Generally there's a preference for outparams being pointers, so that the potential for mutation is more obvious. Style-wise, this signature may be more readable as template<class Container, typename T> size_t BinarySearch(const Container& container, const size_t first, const size_t last, const T& value, size_t* insertionPoint) to not split first/last across two lines. I'm not fully convinced the constness of first/last are worth much here, myself, but it certainly doesn't much matter. @@ +30,5 @@ > + > + while (low < high) { > + const size_t mid = low + ((high - low) / 2); > + const T& item = container[mid]; > + Seems to me we should add minimal, cheap, non-complexity-changing assertions about the input data being sorted: MOZ_ASSERT(container[low] <= container[mid]); MOZ_ASSERT(container[mid] <= container[high]); MOZ_ASSERT(container[low] <= container[high]); Just in case. @@ +51,5 @@ > +BinarySearch(const Container& container, const size_t first, > + const size_t last, const T& value) > +{ > + size_t dummy; > + return BinarySearch(container, first, last, value, dummy); I think we should just have one interface instead of two here. Make callers that don't care about the insertion point pass in their own dummy, often named that way. Keep the interface of this header simpler. Then you don't have to wonder how BinarySearchIf differs from BinarySearch, you just see at the call site that |size_t dummy;| is only ever passed by address to one method, and you know you can ignore it. @@ +57,5 @@ > + > +template<class Container, class Comparator> > +size_t > +BinarySearchIf(const Container& container, const size_t first, > + const size_t last, Comparator compare, Rather than multiply-copy this whole algorithm for the comparator/no-comparator case, I wonder if we shouldn't make the comparator argument have a default, so that you can just leave it off if you want the default comparator. Actually: template<class Container, typename U, class Comparator = detail::DefaultComparator<U> > size_t BinarySearch(const Container& container, size_t first, size_t last, const U& value, size_t* insertionPoint) { Comparator c(value); return first + c.compare(); } Or at least, you'd do that if we could rely on function template default arguments. Which we can't, because MSVC 2013. Hmm. Maybe we do have to have two BinarySearch functions. :-( (Could use a class to get default arguments, but then you don't get type parameter inference, which seems worse overall than an extra algorithm copy here.)
Attachment #8347248 - Flags: feedback?(jwalden+bmo) → feedback+
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #9) > ::: gfx/qcms/transform_util.c > @@ +283,5 @@ > > // Seems not a degenerated case... apply binary search > > > > while (r > l) { > > > > + x = l + ((r - l) / 2); > > Wonderful. This won't stick out at all when the ultimate patch here lands. > :-\ (Versus everything else, where it looks a lot more like commoning up > code, and you'd have to check case by case to see any difference.) > > It looks like this might be fixable by converting those values to size_t, or > to unsigned if not that. I'd prefer if we tried something along those > lines, maybe smuggled into another patch somehow to disguise this, if we're > unwilling to standardize on C++ here (like most of the rest of Mozilla code). I'll check. I'm thinking that we can land this in >=2 parts - one for mfbt/BinarySearch ("just some code cleanup" on another bug) and then something sneakier for the rest. My understanding is that qcms tries to stay self-contained C-only. > ::: gfx/thebes/gfxFontUtils.cpp > @@ +1221,5 @@ > > // binary search; if not found, set language to ANY and try again > > + const size_t where = BinarySearch(gMacFontNameCharsets, 0, > > + ArrayLength(gMacFontNameCharsets), > > + searchValue); > > + if (where != ArrayLength(gMacFontNameCharsets)) { > > SIZE_MAX seems a better sentinel value to return. No need to compute a > custom value for every call to this -- I see a lot of .Count(), ArrayLength, > etc. to compare for this, and SIZE_MAX lets you avoid that. > > ::: mfbt/BinarySearch.h > @@ +13,5 @@ > > + > > +namespace mozilla { > > + > > +/** > > + * Blah blah. > > This is perhaps okay, but honestly a part of evaluating this is seeing its > documentation and seeing how well the documentation reads. Maybe doesn't > need to be super-polished on a feedback request, but seems to me if > something half-decent can't be quickly written up, with simple examples, > that'd indicate a problem with the interface. I'd say count the process of > writing that up as *part* of the process of gathering feedback -- just > self-feedback. > > @@ +16,5 @@ > > +/** > > + * Blah blah. > > + */ > > + > > +template<class Container, class T> > > typename T, as it won't always be a class. (Often won't be, even.) > > @@ +19,5 @@ > > + > > +template<class Container, class T> > > +size_t > > +BinarySearch(const Container& container, const size_t first, > > + const size_t last, const T& value, > > s/last/ifNotFound/ maybe? Hm, i don't think that matches the usage here - it's the index range [first,last) like standard library algorithms. > I'm not fully convinced the constness of first/last are worth much here, > myself, but it certainly doesn't much matter. To be honest i just went with the style of some functions in mfbt, happy to drop it. :) > Or at least, you'd do that if we could rely on function template default > arguments. Which we can't, because MSVC 2013. Hmm. Maybe we do have to > have two BinarySearch functions. :-( (Could use a class to get default > arguments, but then you don't get type parameter inference, which seems > worse overall than an extra algorithm copy here.) We could just make BinarySearch() call BinarySearchIf() with a comparator, that gets rid of the duplication and the template default argument. However i was wondering whether that can/will be fully optimized away.
(In reply to Georg Fritzsche [:gfritzsche] from comment #10) > My understanding is that qcms tries to stay self-contained C-only. Yeah, that's what I figured. :-( (Notwithstanding that having everyone on the same page, even if they want to stick to C-ish idioms as a general rule, makes it easier for everyone to contribute, but I digress.) > Hm, i don't think that matches the usage here - it's the index range > [first,last) like standard library algorithms. Er, oops, yeah, I misread the meaning. It does seem to me like the standard library functions returning |last| rather than |SIZE_MAX| or so are a bad idea, tho -- too easy to confuse with being valid indexes. (The insertionPoint bit should be first or last, tho, whichever is correct.) Another possibility might be a boolean return plus another outparam. That at least has the virtue of being |if (BinarySearch(...))| with it being clearer that you're depending on a success condition or so. But I'm mostly just playing with this stuff mentally, might work better or worse in code. > To be honest i just went with the style of some functions in mfbt, happy to > drop it. :) Heh. > We could just make BinarySearch() call BinarySearchIf() with a comparator, > that gets rid of the duplication and the template default argument. That seems like it could work, at a glance. Maybe. (I could think about this more coherently if I were editing it in a text editor, I think -- as it is I get myself into knots trying to think too hard without playing around with it.) > However i was wondering whether that can/will be fully optimized away. With the definitions of all this stuff exposed in compilation units that use it, it should be. In practice, I suspect the call overhead isn't going to matter against the overhead of the search, in any case.
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #11) > Another possibility might be a boolean return plus another outparam. That > at least has the virtue of being |if (BinarySearch(...))| with it being > clearer that you're depending on a success condition or so. Hm, then i think i'll go for boolean return plus one outparam (index of found item or insertion point) - that should cover the concerns and not be too annoying to use.
Flags: sec-bounty?
Status: NEW → ASSIGNED
Iteration: --- → 34.1
QA Whiteboard: [qa?]
Flags: firefox-backlog+
Hi Georg, can you provide a point value and indicate if the bug should be marked as [qa+] or [qa-]
Flags: needinfo?(georg.fritzsche)
13 might be over-estimating, but there were quite a few potentially touchy use-cases across the code-base.
Points: --- → 13
Flags: needinfo?(georg.fritzsche)
Sounds good. Should it be marked as [qa+] or [qa-]? (In reply to Georg Fritzsche [:gfritzsche] from comment #14) > 13 might be over-estimating, but there were quite a few potentially touchy > use-cases across the code-base.
QA Whiteboard: [qa?] → [qa-]
Part of this happened in bug 998507, but that didn't implement the second form/signature required here.
Depends on: 998507
Iteration: 34.1 → 34.2
Catching up here, we still need the binary search algorithm with the comparator. Note that if want to use a single algorith (with a default comparator for the standard BinarySearch()), then we have to drop the ordering assertions as we have to take a unary comparison functor to cover all use-cases.
Attachment #8347248 - Attachment is obsolete: true
Attachment #8474616 - Flags: review?(jwalden+bmo)
Attached file Suspects
I'm looking to fix the issues in our source first, then worry about a follow-up that tracks what we'll do about 3rd-party source.
Attachment #8343276 - Attachment is obsolete: true
Iteration: 34.2 → 34.3
QA Whiteboard: [qa-]
Flags: qe-verify-
Attached patch accessible/ (obsolete) — Splinter Review
Attachment #8478244 - Flags: review?(surkov.alexander)
Attached patch content/ (obsolete) — Splinter Review
Attachment #8478245 - Flags: review?(bugs)
Comment on attachment 8478244 [details] [diff] [review] accessible/ Holding things off for better reviewer choice.
Attachment #8478244 - Flags: review?(surkov.alexander)
Attachment #8478245 - Flags: review?(bugs)
Attachment #8478244 - Attachment is obsolete: true
Comment on attachment 8478245 [details] [diff] [review] content/ Some nits anyhow, >+namespace { >+ struct PositionComparator { this { should be on its own line, and why the namespace { ? >+ Element* const mElement; >+ PositionComparator(Element* const element) : mElement(element) {} s/element/aElement/ Similar also elsewhere
(In reply to Olli Pettay [:smaug] from comment #22) > >+namespace { > >+ struct PositionComparator { > this { should be on its own line, > and why the namespace { ? The namespace is to avoid potential linkage conflicts.
Attachment #8478245 - Attachment is obsolete: true
Attachment #8479078 - Flags: review?(jwalden+bmo)
Attached patch The obvious direct fixups (obsolete) — Splinter Review
The more obvious parts that are directly fixed up. There is parts in dom, layout & xpcom that i should be able to replace with BinarySearch()/BinarySearchIf(), but i haven't had a non-failing try run with those yet.
Attachment #8479085 - Flags: review?(jwalden+bmo)
Comment on attachment 8474616 [details] [diff] [review] Add BinarySearchIf() Review of attachment 8474616 [details] [diff] [review]: ----------------------------------------------------------------- Assuming we're trying to disguise the impact of this, filing a separate open bug to land this, then only citing that in comments and all, might be good. ::: mfbt/BinarySearch.h @@ +13,5 @@ > > namespace mozilla { > > /* > + * The BinarySearch() algorithm searches the given container |aContainer| over the sorted Needs rewrapping to 80ch. @@ +29,5 @@ > * if (BinarySearch(sortedInts, 0, sortedInts.length(), 13, &match)) { > * printf("found 13 at %lu\n", match); > * } > + * > + * The BinarySearchIf() version behaves similar, but takes |aComparator|, Looks like it should be wrapped to 80ch too. @@ +35,5 @@ > + * That functor should take one argument - the value to compare - and > + * return an |int| with the comparison result: > + * * 0, if the argument is equal to, > + * * -1, if the argument is greater than, > + * * 1, if the argument is less than Please put blank lines before/after this bullet list, and indent the bullets two spaces. This all blends into the surrounding text and left margin, as it is now. There's no reason not to say 0 if equal, less than 0 if greater than, +1 if less than, right? Right now the interface claims only -1/0/1 but accepts values outside that range. We should be consistent. @@ +92,5 @@ > return false; > } > > +namespace detail { > + template<class T> Don't indent the contents of the namespace, please. @@ +93,5 @@ > } > > +namespace detail { > + template<class T> > + class BinarySearchDefaultComparator { { on new line. ::: mfbt/tests/TestBinarySearch.cpp @@ +10,5 @@ > > using mozilla::Vector; > using mozilla::BinarySearch; > +using mozilla::BinarySearchIf; > +using mozilla::ArrayLength; Please alphabetize these. @@ +26,5 @@ > explicit GetAge(Vector<Person>& aV) : mV(aV) {} > int operator[](size_t index) const { return mV[index].mAge; } > }; > > +struct RangeFinder { { on new line, please. @@ +47,5 @@ > v1.append(4); > v1.append(6); > v1.append(8); > > + #define A(a) MOZ_RELEASE_ASSERT(a) Just leave these as they were, I think, in terms of A/MOZ_RELEASE_ASSERT. @@ +89,5 @@ > A( BinarySearch(GetAge(v3), 0, v3.length(), 6, &m) && m == 2); > A(!BinarySearch(GetAge(v3), 0, v3.length(), 7, &m) && m == 3); > + > + const int v4[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }; > + const size_t len = ArrayLength(v4); It might be nice to put the BinarySearch tests in one method and the BinarySearchIf methods in another, TestBinarySearch and TestBinarySearchIf. A little nicer for debugging and all. And it accords with what we've generally done in other bugs.
Attachment #8474616 - Flags: review?(jwalden+bmo) → review+
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #26) > Assuming we're trying to disguise the impact of this, filing a separate open > bug to land this, then only citing that in comments and all, might be good. Yes, that was the plan - i'll file one on BinarySearchIf() now and request review there, can you mirror your response there? I'll follow up afterwards with another bug for adopting the mfbt binary searches in more places for the next patch.
Trying here as well - dveditz, do you agree on the "replacements" patch being fine to land in a separate patch without additional cloaking? Any ideas on stealth landing for the "obvious" patch?
Flags: needinfo?(dveditz)
Comment on attachment 8479085 [details] [diff] [review] The obvious direct fixups Review of attachment 8479085 [details] [diff] [review]: ----------------------------------------------------------------- Sadly we'll have to file a separate bug for the 3 security/nss changes since those need to first go into an upstream repo.
(In reply to Georg Fritzsche [:gfritzsche] from comment #28) > Trying here as well - dveditz, do you agree on the "replacements" patch > being fine to land in a separate patch without additional cloaking? Yes. If we can get it landed on Aurora as well right away that'd be great. ("It just missed the merge but we'd like this clean up" maybe?) > Any ideas on stealth landing for the "obvious" patch? We shouldn't land those bits until we can land on all the branches. That means waiting for the BinarySearch bits to make it up to beta and then check in the rest somewhere in the middle of that cycle. We can set appropriate release tracking flags here to remind us to do that.
Flags: needinfo?(dveditz)
Iteration: 34.3 → 35.1
See Also: → 1061701
Attachment #8479078 - Attachment is obsolete: true
Attachment #8479078 - Flags: review?(jwalden+bmo)
Attachment #8482749 - Flags: review?(jwalden+bmo)
Attachment #8479085 - Attachment is obsolete: true
Attachment #8479085 - Flags: review?(jwalden+bmo)
Attachment #8482750 - Flags: review?(jwalden+bmo)
Blocks: 1061701
Comment on attachment 8482749 [details] [diff] [review] BinarySearch()-/BinarySearchIf()-based replacements, rebased Review of attachment 8482749 [details] [diff] [review]: ----------------------------------------------------------------- Sorry for the delay. The comparator interface and returning <0/0/>0 is just horrible. :-( (Although, better it than upteen binary-search implementations inlined individually into callers. Just, reviewing the transition is excruciating.) Wish there were something better. I'll see if I can get a review of the other patch later, um, today. I'm deliberately making a relatively lazy Saturday to rest a slightly tweaked ankle, which should give me some time for the review. ::: content/base/src/nsPlainTextSerializer.cpp @@ +1878,5 @@ > + if (mUcs > combining.last) > + return 1; > + if (mUcs < combining.first) > + return -1; > + return 0; MOZ_ASSERT(combining.first <= mUcs); MOZ_ASSERT(mUcs <= combining.last); for consistency with the old code, which would have, um, ilooped in this case. :-) Presumably that table is consistent in this way, but I didn't look. @@ +1881,5 @@ > + return -1; > + return 0; > + } > +}; > +} // namespace Blank line before this, to match the one at the start of the namespace. ::: content/xul/templates/src/nsXULTreeBuilder.cpp @@ +1076,5 @@ > + nsIXULTemplateResult* const mResult; > + ResultComparator(nsXULTreeBuilder* aTreebuilder, nsIXULTemplateResult* aResult) > + : mTreebuilder(aTreebuilder), mResult(aResult) {} > + int operator()(const nsTreeRows::Row& aSubtree) const { > + return - mTreebuilder->CompareResults(aSubtree.mMatch->mResult, mResult); Negation here is only confusing. :-) Remove it and flip the arguments, please. ::: extensions/spellcheck/src/mozInlineSpellWordUtil.cpp @@ +663,5 @@ > +// Find the last value, if any, such that mSoftTextOffset <= aSoftTextOffset. > +// If not found, returns false, otherwise true and its index in aIndex. > +template<class T> > +bool > +FindLastSmallerOffset(const nsTArray<T>& aContainer, int32_t aSoftTextOffset, int32_t* aIndex) Why are the changes in this file not using BinarySearch or BinarySearchIf? Nothing here particularly looks like it couldn't use BinarySearchIf to determine the ultimate index and found-ness. ::: gfx/thebes/gfxFont.cpp @@ +7489,5 @@ > +{ > + const uint32_t mOffset; > + CharacterOffsetComparator(uint32_t aOffset) : mOffset(aOffset) {} > + int operator()(const gfxTextRun::GlyphRun& aRun) const { > + return (aRun.mCharacterOffset <= mOffset) ? 1 : -1; Reverse the comparison and return values, to be more consistent with the usual "this < argument means cmpReturn < 0, this > argument means cmpReturn > 1" interface description. Probably worth a comment here that because want to find the *first* (last?) glyph run, we never want to return 0 here and stop on a run that matches this offset, because others might appear before/after it. @@ +7506,4 @@ > return mGlyphRuns.Length(); > + } > + > + // Find the last glyph run containing aOffset Uh. FindFirstGlyphRunContaining and "Find the last glyph run containing" do not seem consistent! But this does seem to be what was being done previously. Please file a bug to make this comment and code, and the function name, consistent, or to explain why the "last" glyph run is really "first" here. @@ +7510,5 @@ > + size_t index = 0; > + CharacterOffsetComparator cmp(aOffset); > + bool found = BinarySearchIf(mGlyphRuns, 0, mGlyphRuns.Length(), > + cmp, &index); > + if (!found && index > 0) { Something is weird here. Our comparator never returns 0, so we should never ever find anything here. We should thus have (with first/last adjustment): MOZ_ASSERT(!found, "CharacterOffsetComparator shouldn't have reported a match, " "because it's searching for the {first,last} glyph run"); MOZ_ASSERT(index > 0, "should have found at least one glyph run containing aOffset"); return index - 1; Given the assertion has never failed (judging by no bugs found searching for "mGlyphRuns" in the summary), it seems safe to upgrade the latter to a real assertion. It may be worth renaming |index| to |firstAfter| or so, since it's not finding an item but rather the end of a series of them. ::: gfx/thebes/gfxFontUtils.cpp @@ +1247,5 @@ > + typedef gfxFontUtils::MacFontNameCharsetMapping MacFontNameCharsetMapping; > + const MacFontNameCharsetMapping& mSearchValue; > + MacCharsetMappingComparator(const MacFontNameCharsetMapping& aSearchValue) > + : mSearchValue(aSearchValue) {} > + int operator()(const MacFontNameCharsetMapping& aEntry) const { Rewrite as if (mSearchValue < aEntry) return -1; if (aEntry < mSearchValue) return 1; return 0; to lean on the "this < argument means cmpReturn < 0" and vice versa understanding. Pity we can't have mSearchValue < aEntry for the latter, but I guess that's what having only operator< defined does. @@ +1278,5 @@ > for (uint32_t i = 0; i < 2; ++i) { > + size_t idx; > + bool found = BinarySearchIf(gMacFontNameCharsets, 0, ArrayLength(gMacFontNameCharsets), > + MacCharsetMappingComparator(searchValue), &idx); > + if (found) { Preference for if (BinarySearchIf(...)) { so readers don't have to internalize found-ness. @@ +1284,3 @@ > } > > + // no match, so try again finding one in any language So, this is not strictly identical to what happened before. Before we would search 0 to end, and if we didn't find a match, we'd search from *lo* to end the second time through. Given the size of the table this micro-optimization, if indeed it was one, doesn't seem super-important to preserve. But we should be aware we're discarding it, for sure. ::: gfx/thebes/gfxSkipChars.cpp @@ +11,5 @@ > + const uint32_t mOffset; > + SkippedRangeStartComparator(const uint32_t aOffset) : mOffset(aOffset) {} > + int operator()(const gfxSkipChars::SkippedRange& aRange) const { > + if (mOffset < aRange.Start()) { > + return -1; Ternary expression? @@ +74,5 @@ > + const uint32_t mOffset; > + SkippedRangeOffsetComparator(const uint32_t aOffset) : mOffset(aOffset) {} > + int operator()(const gfxSkipChars::SkippedRange& aRange) const { > + if (mOffset < aRange.SkippedOffset()) { > + return -1; Ternary? ::: intl/locale/nsUConvPropertySearch.cpp @@ -12,5 @@ > int32_t aNumberOfProperties, > const nsACString& aKey, > nsACString& aValue) > { > - const char* key = PromiseFlatCString(aKey).get(); Uh... * A |nsPromiseFlat[C]String| doesn't necessarily own the characters it * promises. You must never use it to promise characters out of a string * with a shorter lifespan. The typical use will be something like this: * * SomeOSFunction( PromiseFlatCString(aCSubstring).get() ); // GOOD * * Here's a BAD use: * * const char* buffer = PromiseFlatCString(aCSubstring).get(); * SomeOSFunction(buffer); // BAD!! |buffer| is a dangling pointer http://hg.mozilla.org/mozilla-central/annotate/25efb1b16a5e/xpcom/string/nsTPromiseFlatString.h#l33 So this is a potential dangling reference. A semi-"safe" one depending whether freeing the string rearranges the heap (and a use of poisoned memory in paranoid builds), because this method doesn't do any allocations or anything, but still a dangling reference. We should fix this by making PropertyComparator take a |const nsACString&| and having it store a PromiseFlatCString, then operator() should use Compare() against nsDependentCString(aProperty[0]) to do the comparison. @@ +27,5 @@ > const nsACString& aKey, > nsACString& aValue) > { > + const nsCString& flat = PromiseFlatCString(aKey); > + size_t index; const nsCString& flat = ..; is probably even worse than the current code, in that the current code is storing a pointer to possibly-freed memory, but this is likely storing a pointer to deallocated *stack* space -- much likelier to be overwritten in subsequent code. ::: intl/unicharutil/nsUnicodeNormalizer.cpp @@ +293,5 @@ > * So we can use binary search. > */ > + size_t idx; > + bool found = mozilla::BinarySearch(SequenceAdaptor(cseq), 0, n, c2, &idx); > + if (found) { if (mozilla::BinarySearch(...)) { ::: layout/generic/nsTextFrame.cpp @@ +140,5 @@ > TabWidthStore::ApplySpacing(gfxTextRun::PropertyProvider::Spacing *aSpacing, > uint32_t aOffset, uint32_t aLength) > { > + const size_t len = mWidths.Length(); > + size_t i = 0; Minor preference for other ordering of these.
Attachment #8482749 - Flags: review?(jwalden+bmo) → review-
Comment on attachment 8482750 [details] [diff] [review] The obvious direct fixups, without security/nss Review of attachment 8482750 [details] [diff] [review]: ----------------------------------------------------------------- So this appears fine enough, without going into detail examining context for every case to be sure binary searches are exactly what's being done. But, what plans do we have to ensure we're updating all the binary search implementations that actually exist? Can we do something like a clang plugin searching for (x + y ) / 2 patterns inside of loops, then manually examining all of them to be sure we haven't missed any? Or something else, I don't know what. A half-fix here is not all that much better than no fix, because an attacker who knows this is a problem only needs to find that one place we missed to take advantage of this. (At least that one place has to be web-reachable to matter, but that's not much comfort to me, honestly.)
Attachment #8482750 - Flags: review?(jwalden+bmo) → review+
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #34) > But, what plans do we have to ensure we're updating all the binary search > implementations that actually exist? Can we do something like a clang > plugin searching for (x + y ) / 2 patterns inside of loops, then manually > examining all of them to be sure we haven't missed any? Or something else, > I don't know what. A half-fix here is not all that much better than no fix, > because an attacker who knows this is a problem only needs to find that one > place we missed to take advantage of this. (At least that one place has to > be web-reachable to matter, but that's not much comfort to me, honestly.) So, i'm not sure if we have a good semantic-aware way to search - that is, without spending time setting up some clang plugin or similar. I've done a regex-search for the expected pattern, but i'm happy for input on improving that approach if it doesn't take a lot of effort. At this point i also want to wrap this up and move additional work to follow-ups where possible.
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #33) > Sorry for the delay. The comparator interface and returning <0/0/>0 is just > horrible. :-( (Although, better it than upteen binary-search > implementations inlined individually into callers. Just, reviewing the > transition is excruciating.) Wish there were something better. Right, but at least we can improve things much easier once we have things more uniform from by this. > ::: content/base/src/nsPlainTextSerializer.cpp > @@ +1878,5 @@ > > + if (mUcs > combining.last) > > + return 1; > > + if (mUcs < combining.first) > > + return -1; > > + return 0; > > MOZ_ASSERT(combining.first <= mUcs); > MOZ_ASSERT(mUcs <= combining.last); > > for consistency with the old code, which would have, um, ilooped in this > case. :-) Presumably that table is consistent in this way, but I didn't > look. > > @@ +1881,5 @@ > > + return -1; > > + return 0; > > + } > > +}; > > +} // namespace > > Blank line before this, to match the one at the start of the namespace. > > ::: content/xul/templates/src/nsXULTreeBuilder.cpp > @@ +1076,5 @@ > > + nsIXULTemplateResult* const mResult; > > + ResultComparator(nsXULTreeBuilder* aTreebuilder, nsIXULTemplateResult* aResult) > > + : mTreebuilder(aTreebuilder), mResult(aResult) {} > > + int operator()(const nsTreeRows::Row& aSubtree) const { > > + return - mTreebuilder->CompareResults(aSubtree.mMatch->mResult, mResult); > > Negation here is only confusing. :-) Remove it and flip the arguments, > please. > > ::: extensions/spellcheck/src/mozInlineSpellWordUtil.cpp > @@ +663,5 @@ > > +// Find the last value, if any, such that mSoftTextOffset <= aSoftTextOffset. > > +// If not found, returns false, otherwise true and its index in aIndex. > > +template<class T> > > +bool > > +FindLastSmallerOffset(const nsTArray<T>& aContainer, int32_t aSoftTextOffset, int32_t* aIndex) > > Why are the changes in this file not using BinarySearch or BinarySearchIf? > Nothing here particularly looks like it couldn't use BinarySearchIf to > determine the ultimate index and found-ness. I don't think we can map the |while (end - start > 1)| ... or i'm missing something obvious here. > @@ +7510,5 @@ > > + size_t index = 0; > > + CharacterOffsetComparator cmp(aOffset); > > + bool found = BinarySearchIf(mGlyphRuns, 0, mGlyphRuns.Length(), > > + cmp, &index); > > + if (!found && index > 0) { > > Something is weird here. Our comparator never returns 0, so we should never > ever find anything here. We should thus have (with first/last adjustment): > > MOZ_ASSERT(!found, > "CharacterOffsetComparator shouldn't have reported a match, " > "because it's searching for the {first,last} glyph run"); > MOZ_ASSERT(index > 0, > "should have found at least one glyph run containing aOffset"); > return index - 1; > > Given the assertion has never failed (judging by no bugs found searching for > "mGlyphRuns" in the summary), it seems safe to upgrade the latter to a real > assertion. Hm, is not finding mGlyphRuns in bugzilla a sufficiently reliable indicator for this being fine? > It may be worth renaming |index| to |firstAfter| or so, since it's not > finding an item but rather the end of a series of them. |index| seems generic enough, no? :) > ::: intl/locale/nsUConvPropertySearch.cpp > @@ -12,5 @@ > > int32_t aNumberOfProperties, > > const nsACString& aKey, > > nsACString& aValue) > > { > > - const char* key = PromiseFlatCString(aKey).get(); > > Uh... > > * A |nsPromiseFlat[C]String| doesn't necessarily own the characters it > * promises. You must never use it to promise characters out of a string > * with a shorter lifespan. The typical use will be something like this: > * > * SomeOSFunction( PromiseFlatCString(aCSubstring).get() ); // GOOD > * > * Here's a BAD use: > * > * const char* buffer = PromiseFlatCString(aCSubstring).get(); > * SomeOSFunction(buffer); // BAD!! |buffer| is a dangling pointer > > http://hg.mozilla.org/mozilla-central/annotate/25efb1b16a5e/xpcom/string/ > nsTPromiseFlatString.h#l33 > > So this is a potential dangling reference. A semi-"safe" one depending > whether freeing the string rearranges the heap (and a use of poisoned memory > in paranoid builds), because this method doesn't do any allocations or > anything, but still a dangling reference. > > @@ +27,5 @@ > > const nsACString& aKey, > > nsACString& aValue) > > { > > + const nsCString& flat = PromiseFlatCString(aKey); > > + size_t index; > > const nsCString& flat = ..; > > is probably even worse than the current code, in that the current code is > storing a pointer to possibly-freed memory, but this is likely storing a > pointer to deallocated *stack* space -- much likelier to be overwritten in > subsequent code. I changed behavior there exactly because of the lifetime issues. The comment there goes on to say: * ``What if I need to keep a promise * around for a little while?'' you might ask. In that case, you can keep a * reference, like so * * const nsCString& flat = PromiseFlatString(aCSubstring); * // this reference holds the anonymous temporary alive, but remember, * // it must _still_ have a lifetime shorter than that of |aCSubstring| We fulfill that criteria, so i don't see an issue with lifetime. We should get rid of the plain C-style strings there though, good point.
Addressed feedback per above comment.
Attachment #8482749 - Attachment is obsolete: true
Attachment #8485842 - Flags: review?(jwalden+bmo)
(In reply to Georg Fritzsche [:gfritzsche] from comment #35) > So, i'm not sure if we have a good semantic-aware way to search - that is, > without spending time setting up some clang plugin or similar. At some point it's probably worth our time to do this. Unfortunately every individual security-fix is probably going to beg off this work. :-( Thus is it ever for audit-ish code-quality efforts. > I've done a regex-search for the expected pattern, but i'm happy for input > on improving that approach if it doesn't take a lot of effort. At this point > i also want to wrap this up and move additional work to follow-ups where > possible. Yeah, it'd be nice to wrap it up, but if it's not *actually* wrapped up, checking the item off the to-do list is kind of meaningless. I wonder if jcranmer has any ideas here, for searching for something like |(x + y) / 2| within a loop where x and y are both set in various places in the loop.
Poking jcranmer to find out if he knows anything about tools for doing such a search.
I have more than ideas, but an actual prototype LLVM pass that marks everything that looks like a binary search. I'm currently run a build on my (Linux x64) machine on revision 1e0b75f27196 (~last Friday).
Here's the list of apparent binary searches. Note that I defined binary search as (low + high) {/ 2, >> 1}, where low and high are not loop-invariant. This seems to pick up a few extras (I already trashed the ICU ones from this list, some others may be third-party that I'm not fully aware of). nsTArray.h:1206:7 accessible/base/ARIAMap.cpp:754:7 accessible/generic/HyperTextAccessible.cpp:1860:9 content/base/src/nsDocument.cpp:576:5 content/html/content/src/HTMLFormElement.cpp:1109:7 content/html/content/src/HTMLFormElement.cpp:2317:9 content/xul/templates/src/nsXULContentBuilder.cpp:1956:17 content/xul/templates/src/nsXULTreeBuilder.cpp:1177:21 extensions/spellcheck/hunspell/src/csutil.cxx:221:7 extensions/spellcheck/hunspell/src/replist.cxx:39:7 extensions/spellcheck/src/mozInlineSpellWordUtil.cpp:674:5 extensions/spellcheck/src/mozInlineSpellWordUtil.cpp:720:5 gfx/harfbuzz/src/hb-open-type-private.hh:943:7 gfx/skia/trunk/include/core/SkTSearch.h:52:9 gfx/skia/trunk/src/core/SkGlyphCache.cpp:296:13 gfx/skia/trunk/src/core/SkTSearch.cpp:32:9 gfx/skia/trunk/src/utils/SkParseColor.cpp:407:9 gfx/thebes/gfxFont.cpp:7496:9 gfx/thebes/gfxFontUtils.cpp:1357:21 gfx/thebes/gfxFontUtils.cpp:717:9 gfx/thebes/gfxFontUtils.cpp:740:9 gfx/thebes/gfxMathTable.cpp:372:9 gfx/thebes/gfxMathTable.cpp:395:9 gfx/thebes/gfxSkipChars.cpp:35:9 gfx/thebes/gfxSkipChars.cpp:84:9 intl/locale/nsUConvPropertySearch.cpp:20:5 intl/unicharutil/nsUnicodeNormalizer.cpp:285:3 layout/base/nsGenConList.cpp:145:9 layout/generic/MathMLTextRunFactory.cpp:200:5 layout/generic/nsColumnSetFrame.cpp:873:5 layout/generic/nsTextFrame.cpp:137:7 media/libstagefright/frameworks/av/media/libstagefright/SampleTable.cpp:530:9 parser/html/jArray.h:40:7 parser/htmlparser/nsParser.cpp:687:9 xpcom/threads/TimerThread.cpp:205:5
Here's the LLVM pass that I used to find candidates. Running it requires LLVM/Clang 3.5, -load'ing the .so as appropriate, and adding -Rpass=binary-search to the command line. So far, I've only found one false positive.
Comment on attachment 8485842 [details] [diff] [review] BinarySearch()-/BinarySearchIf()-based replacements, v2 Review of attachment 8485842 [details] [diff] [review]: ----------------------------------------------------------------- (In reply to Georg Fritzsche [:gfritzsche] from comment #36) > I don't think we can map the |while (end - start > 1)| ... or i'm missing > something obvious here. I'm not sure, my quick skim suggested this was just binary search, but maybe I was mistaken. Tell you what -- this bit is going to take longer to review than the rest of this, so here's r+-with-nits for everything *except* this file. I'll follow up with a second comment/review of just this file's changes. In the meantime, you can look into all the other binary-search implementations jcranmer pointed out in comment 41. Ultimately we want all of to land as a single patch/revision. But there's no reason we can't split it up and have separately-reviewable individual pieces of it, prior to landing. We can call this piece with nits fixed, minus mozInlineSpellWordUtil.cpp, as ready to go. > Hm, is not finding mGlyphRuns in bugzilla a sufficiently reliable indicator > for this being fine? I think it is. The problem with having it the other way, is that readers have to start worrying/wondering about why the decrement is only sometimes necessary, and people have to start thinking about how likely it is for the condition to not hold, and it turns into a mess. The more restrictive the behavior, the clearer what will happen and when. So I think this absolutely should be upgraded. If it's been there years without a bug report, I think that's pretty strong proof. (And if it turns out not to be, downgrading is easy.) > > It may be worth renaming |index| to |firstAfter| or so, since it's not > > finding an item but rather the end of a series of them. > > |index| seems generic enough, no? :) Generic is sort of the problem -- "index" is so bland as to convey almost nothing about the meaning. "firstAfter", in contrast, makes clear that it's always greater than one, so we can always subtract one from it without underflow. > I changed behavior there exactly because of the lifetime issues. The comment > there goes on to say: I wouldn't have believed that comment if I hadn't done spec research and discovered that temporaries are ascribed longer lifetime when bound to references. I'm inclined (in another bug) to go and add a chapter-and-verse citation to that comment, for greater believability. ::: content/base/src/nsPlainTextSerializer.cpp @@ +1875,5 @@ > + const char16_t mUcs; > + CombiningComparator(char16_t ucs) : mUcs(ucs) {} > + int operator()(const interval& combining) const { > + MOZ_ASSERT(combining.first <= mUcs); > + MOZ_ASSERT(mUcs <= combining.last); Wait -- I meant for these to occur in the case where we return 0. Placed here I'm pretty sure the entire thing goes up in flames basically immediately -- either that or this last assertion implies that the |if (mUcs > combining.last) return 1;| is dead code. ::: gfx/thebes/gfxFont.cpp @@ +7521,5 @@ > + int operator()(const gfxTextRun::GlyphRun& aRun) const { > + // We intentionally never return 0 here: we want to find the first run > + // containing the offset, so we have to identify the first offset > + // that exceeds it. > + return (mOffset < aRun.mCharacterOffset) ? -1 : 1; Wait, you changed this to < from <= in the previous patch. That makes this return the first-containing run, which is consistent with the function name but inconsistent with the last-glyph-run-containing language in the comment added before the BinarySearchIf call. Given this is a security issue, we should keep the behavior as unchanged as possible *except* wrt the security issue. So leave this as <=, have the comment say "last", and add an XXX comment next to it saying "last" contradicts the function name. The separate, non-security bug can fix the inconsistency. @@ +7538,5 @@ > return mGlyphRuns.Length(); > + } > + > + // Find the last glyph run containing aOffset > + size_t index = 0; Don't initialize -- this implies the initial value is necessary somehow, which is more confusing than helpful. ::: gfx/thebes/gfxFontUtils.cpp @@ +1370,5 @@ > MacFontNameCharsetMapping searchValue = { aScript, aLanguage, nullptr }; > for (uint32_t i = 0; i < 2; ++i) { > + size_t idx; > + if (BinarySearchIf(gMacFontNameCharsets, 0, ArrayLength(gMacFontNameCharsets), > + MacCharsetMappingComparator(searchValue), &idx)) { Indentation is off on this second line. @@ +1376,3 @@ > } > > + // no match, so try again finding one in any language On second look I'd unroll this loop, but it doesn't really matter for this bug's purposes. ::: intl/locale/nsUConvPropertySearch.cpp @@ +12,5 @@ > +struct PropertyComparator > +{ > + const nsCString& mKey; > + PropertyComparator(const nsCString& aKey) : mKey(aKey) {} > + int operator()(const char** aProperty) const { On second look. Can we make this |const char* (&aProperty)[3]| so as to better correspond with what we're actually looking at? (The reference bit prevents the arrays from decaying into pointers, FWIW.) @@ +26,5 @@ > int32_t aNumberOfProperties, > const nsACString& aKey, > nsACString& aValue) > { > + const nsCString& flat = PromiseFlatCString(aKey); I'd kind of prefer to put the PromiseFlatCString as a member of PropertyComparator, because temporary lifetime being extended through assignment to local reference is an exception to normal temporary rules (and a bit spooky). (I didn't know this was a thing before doing a bunch of research for this! Learn something new every day.) But this does work. @@ +28,5 @@ > nsACString& aValue) > { > + const nsCString& flat = PromiseFlatCString(aKey); > + size_t index; > + bool found = mozilla::BinarySearchIf(aProperties, 0, aNumberOfProperties, I'd add |using mozilla::BinarySearchIf;| to files that don't have this already in scope in them, to eliminate qualification, but that's just a style nit. @@ +32,5 @@ > + bool found = mozilla::BinarySearchIf(aProperties, 0, aNumberOfProperties, > + PropertyComparator(flat), > + &index); > + > + if (found) { Again I'd get rid of found as intermediate here.
Attachment #8485842 - Flags: review?(jwalden+bmo) → review+
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #43) > (In reply to Georg Fritzsche [:gfritzsche] from comment #36) > > I don't think we can map the |while (end - start > 1)| ... or i'm missing > > something obvious here. > > I'm not sure, my quick skim suggested this was just binary search, but maybe > I was mistaken. Tell you what -- this bit is going to take longer to review > than the rest of this, so here's r+-with-nits for everything *except* this > file. I'll follow up with a second comment/review of just this file's > changes. In the meantime, you can look into all the other binary-search > implementations jcranmer pointed out in comment 41. > > Ultimately we want all of to land as a single patch/revision. But there's > no reason we can't split it up and have separately-reviewable individual > pieces of it, prior to landing. We can call this piece with nits fixed, > minus mozInlineSpellWordUtil.cpp, as ready to go. Ok, sounds good. Do we necessary need to land the mfbt-based changes as a single patch? If parts of it take longer, it shouldn't matter so much if we land it all later or only those parts. On the other hand, bitrot. > > Hm, is not finding mGlyphRuns in bugzilla a sufficiently reliable indicator > > for this being fine? > > I think it is. The problem with having it the other way, is that readers > have to start worrying/wondering about why the decrement is only sometimes > necessary, and people have to start thinking about how likely it is for the > condition to not hold, and it turns into a mess. The more restrictive the > behavior, the clearer what will happen and when. So I think this absolutely > should be upgraded. If it's been there years without a bug report, I think > that's pretty strong proof. (And if it turns out not to be, downgrading is > easy.) Alright, that sounds reasonable, going for it. > > > It may be worth renaming |index| to |firstAfter| or so, since it's not > > > finding an item but rather the end of a series of them. > > > > |index| seems generic enough, no? :) > > Generic is sort of the problem -- "index" is so bland as to convey almost > nothing about the meaning. "firstAfter", in contrast, makes clear that it's > always greater than one, so we can always subtract one from it without > underflow. Ok. > > I changed behavior there exactly because of the lifetime issues. The comment > > there goes on to say: > > I wouldn't have believed that comment if I hadn't done spec research and > discovered that temporaries are ascribed longer lifetime when bound to > references. I'm inclined (in another bug) to go and add a chapter-and-verse > citation to that comment, for greater believability. Ah, i thought that was common knowledge, but i guess not everyone stumbled over that. Maybe we should add least at a note for the "temporary bound to const-ref" fact there, not sure if we need to throw spec refs in. > ::: content/base/src/nsPlainTextSerializer.cpp > @@ +1875,5 @@ > > + const char16_t mUcs; > > + CombiningComparator(char16_t ucs) : mUcs(ucs) {} > > + int operator()(const interval& combining) const { > > + MOZ_ASSERT(combining.first <= mUcs); > > + MOZ_ASSERT(mUcs <= combining.last); > > Wait -- I meant for these to occur in the case where we return 0. Placed > here I'm pretty sure the entire thing goes up in flames basically > immediately -- either that or this last assertion implies that the |if (mUcs > > combining.last) return 1;| is dead code. Yikes, that's what happens when i address comments in a tired state. > ::: gfx/thebes/gfxFont.cpp > @@ +7521,5 @@ > > + int operator()(const gfxTextRun::GlyphRun& aRun) const { > > + // We intentionally never return 0 here: we want to find the first run > > + // containing the offset, so we have to identify the first offset > > + // that exceeds it. > > + return (mOffset < aRun.mCharacterOffset) ? -1 : 1; > > Wait, you changed this to < from <= in the previous patch. That makes this > return the first-containing run, which is consistent with the function name > but inconsistent with the last-glyph-run-containing language in the comment > added before the BinarySearchIf call. Given this is a security issue, we > should keep the behavior as unchanged as possible *except* wrt the security > issue. So leave this as <=, have the comment say "last", and add an XXX > comment next to it saying "last" contradicts the function name. The > separate, non-security bug can fix the inconsistency. Note that i also changed the operands order per your request to match "this < arg means -1" more closely. Previously: (aRun.mCharacterOffset <= mOffset) ? 1 : -1 Now: (mOffset < aRun.mCharacterOffset) ? -1 : 1
Moved out mozInlineSpellWordUtil.cpp, addressed comments, carrying over r+.
Attachment #8485842 - Attachment is obsolete: true
Attachment #8487142 - Flags: review+
(In reply to Joshua Cranmer [:jcranmer] from comment #41) > Here's the list of apparent binary searches. Note that I defined binary > search as (low + high) {/ 2, >> 1}, where low and high are not > loop-invariant. This seems to pick up a few extras (I already trashed the > ICU ones from this list, some others may be third-party that I'm not fully > aware of). Thanks, that matches with my hits and adds some (didn't consider >>). Can you still gives us the 3rd-party ones so i can check it for the 3rd-party vulns follow-up bug?
Flags: needinfo?(Pidgeot18)
This is mozInlineSpellWordUtil.cpp moved out of part 1.
Attachment #8487206 - Flags: review?(jwalden+bmo)
Comment on attachment 8487206 [details] [diff] [review] BinarySearch()-/BinarySearchIf()-based replacements, part 2, v1 Review of attachment 8487206 [details] [diff] [review]: ----------------------------------------------------------------- Okay, after a bit of staring, some coding and testing, &c. I have this as a working solution that uses BinarySearchIf. Let's run with this, unless I missed something here (but I don't think I did). ::: extensions/spellcheck/src/mozInlineSpellWordUtil.cpp @@ +659,5 @@ > } > > +namespace { > + > +// Find the last value, if any, such that mSoftTextOffset <= aSoftTextOffset. Compute the index of the last element whose mSoftTextOffset <= aSoftTextOffset, or zero if for all elements mSoftTextOffset > aSoftTextOffset. @@ +660,5 @@ > > +namespace { > + > +// Find the last value, if any, such that mSoftTextOffset <= aSoftTextOffset. > +// If not found, returns false, otherwise true and its index in aIndex. "not found" isn't the right description of the condition that causes this -- it's that the container is empty. |start >= end| is a loop invariant. It's true on entry for any container length. Each time either start or end is modified. But, the modification is to a value *between* start/end. |(end - start) / 2| is positive because |end - start > 1| per loop condition, so the value is greater than start. And the value is less than end, because half rounded down of |end - start| never equals |end - start| -- unless |end == start|, but then we didn't enter the loop. So if no loop entry, |start == end| and if loop entry, |start != end| and |start < end|. @@ +667,5 @@ > +FindLastSmallerOffset(const nsTArray<T>& aContainer, int32_t aSoftTextOffset, int32_t* aIndex) > +{ > + int32_t start = 0; > + int32_t end = aContainer.Length(); > + On closer inspection, I remain convinced we can reuse the binary search stuff. We just need to find the first index whose element has mSoftTextOffset > aSoftTextOffset, then subtract one if we found such an element. This pattern seems exactly like the FindFirstGlyphContaining thing in the rest of the patch, except there we were guaranteed to always have at least one qualifying element and so decremented unconditionally. I think this will work: namespace { template<class T> class FirstLargerOffset { int32_t mSoftTextOffset; public: FirstLargerOffset(int32_t aSoftTextOffset) : mSoftTextOffset(aSoftTextOffset) {} int operator()(const T& t) const { // We want the *last* not-greater offset, so never return 0 (which would // short-circuit evaluation before finding the last such offset). return mSoftTextOffset < t.mSoftTextOffset ? -1 : 1; } }; } // anonymous namespace template<class T> bool FindLastNongreaterOffset(const nsTArray<T>& aContainer, int32_t aSoftTextOffset, size_t* aIndex) { if (aContainer.Length() == 0) { return false; } BinarySearchIf(aContainer, 0, aContainer.Length(), FirstLargerOffset<T>(aSoftTextOffset), aIndex); if (*aIndex > 0) { // There was at least one mapping with offset <= aSoftTextOffset. Step back // to find the last element with |mSoftTextOffset <= aSoftTextOffset|. *aIndex -= 1; } else { // Every mapping had offset greater than aSoftTextOffset. MOZ_ASSERT(aContainer[*aIndex].mSoftTextOffset > aSoftTextOffset); } return true; } I did some local experiemntation with this system (will post my test file, with -I $objdir/dist/include needed to work), and it seems to do the right thing. @@ +693,5 @@ > if (!mSoftTextValid) > return NodeOffset(nullptr, -1); > + > + // Find the last mapping, if any, such that mSoftTextOffset <= aSoftTextOffset > + int32_t index; |index| is never used as an int32_t, so this can/should be size_t.
Attachment #8487206 - Flags: review?(jwalden+bmo)
(In reply to Georg Fritzsche [:gfritzsche] from comment #44) > Do we necessary need to land the mfbt-based changes as a single patch? If > parts of it take longer, it shouldn't matter so much if we land it all later > or only those parts. We want to land it all at once so that anyone watching has as little time to react to the changes as possible. Landing just a little bit clues them off that changes are afoot and gives time to attack users that haven't yet been fixed. > On the other hand, bitrot. I doubt much of this is going to bitrot, unless someone else decides to replace one of these binary searches with BinarySearch on their own. That's (sadly) an unlikely occurrence. > Ah, i thought that was common knowledge, but i guess not everyone stumbled > over that. > Maybe we should add least at a note for the "temporary bound to const-ref" > fact there, not sure if we need to throw spec refs in. It's possible I just missed this when learning C++, of course -- not like I took a super-formalized class or anything. :-) I suspect, more than it being common knowledge, most people haven't thought the lifetime bits of this through that far to consider whether this is or isn't supposed to work. I'm a consult-the-spec sort of person, so refs speak clearly to me. And to everyone else, they either make it easy to double-check work, *or* as a very-precise checkable thing they indicate the writer wasn't just making a handwavy claim but had (or at least thought he had) ironclad basis for it. Filed bug 1065774 with a fix. > Note that i also changed the operands order per your request to match "this > < arg means -1" more closely. > Previously: (aRun.mCharacterOffset <= mOffset) ? 1 : -1 > Now: (mOffset < aRun.mCharacterOffset) ? -1 : 1 Oh, right. I hate this comparator interface thing. :-(
This is from what jcranmers run found, the parts using MFBT.
Attachment #8487949 - Flags: review?(jwalden+bmo)
Attached patch Obvious fixups, part 2 (obsolete) — Splinter Review
From jcranmers finds, the obvious/non-sneaky fix.
Attachment #8487957 - Flags: review?(jwalden+bmo)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #50) > (In reply to Georg Fritzsche [:gfritzsche] from comment #44) > > Do we necessary need to land the mfbt-based changes as a single patch? If > > parts of it take longer, it shouldn't matter so much if we land it all later > > or only those parts. > > We want to land it all at once so that anyone watching has as little time to > react to the changes as possible. Landing just a little bit clues them off > that changes are afoot and gives time to attack users that haven't yet been > fixed. Per the preceding discussion, this will ride the trains anyway (the mftb-based parts), but just breaking it up already helps a lot :) > I'm a consult-the-spec sort of person, so refs speak clearly to me. And to > everyone else, they either make it easy to double-check work, *or* as a > very-precise checkable thing they indicate the writer wasn't just making a > handwavy claim but had (or at least thought he had) ironclad basis for it. > Filed bug 1065774 with a fix. Right, doesn't hurt, good to have this.
ICU ones intl/icu/source/common/ucase.cpp:372:9: intl/icu/source/common/ucnv_bld.cpp:381:9: intl/icu/source/common/ucnv_io.cpp:574:9: intl/icu/source/common/unames.cpp:492:9: intl/icu/source/common/uset.cpp:447:17: intl/icu/source/common/uvector.cpp:466:9: intl/icu/source/common/uvectr32.cpp:306:9: intl/icu/source/i18n/alphaindex.cpp:100:9: intl/icu/source/i18n/alphaindex.cpp:149:13: intl/icu/source/i18n/gregocal.cpp:1156:17: intl/icu/source/i18n/japancal.cpp:407:13: intl/icu/source/i18n/nfrs.cpp:452:13: intl/icu/source/i18n/ucol_bld.cpp:134:9: intl/icu/source/i18n/ucurr.cpp:1192:8: intl/icu/source/i18n/ucurr.cpp:1213:21: intl/icu/source/i18n/ucurr.cpp:1244:21: intl/icu/source/tools/toolutil/package.cpp:908:9:
Flags: needinfo?(Pidgeot18)
Attachment #8487206 - Attachment is obsolete: true
Attachment #8489020 - Flags: review?(jwalden+bmo)
Comment on attachment 8489020 [details] [diff] [review] BinarySearch()-/BinarySearchIf()-based replacements, part 2, v2 Review of attachment 8489020 [details] [diff] [review]: ----------------------------------------------------------------- ::: extensions/spellcheck/src/mozInlineSpellWordUtil.cpp @@ +668,5 @@ > + > +public: > + FirstLargerOffset(int32_t aSoftTextOffset) : mSoftTextOffset(aSoftTextOffset) {} > + int operator()(const T& t) const { > + // We want the *last* not-greater offset, so never return 0 (which would I guess this should be "*first* larger offset", probably. Seeing as this comparator finds that, and the method below is what adjusts the computed index to what we actually want. @@ +680,5 @@ > +FindLastNongreaterOffset(const nsTArray<T>& aContainer, int32_t aSoftTextOffset, size_t* aIndex) > +{ > + if (aContainer.Length() == 0) { > + return false; > + } Misindented. And, it seems good to have a blank line between the length-zero check and the rest of the method. We're handling length-zero qualitatively differently from how we implement the rest of the method. Visual sparation is good for that.
Attachment #8489020 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8487957 [details] [diff] [review] Obvious fixups, part 2 Review of attachment 8487957 [details] [diff] [review]: ----------------------------------------------------------------- ::: layout/generic/nsColumnSetFrame.cpp @@ +869,5 @@ > // that we are crawling through. > maybeContinuousBreakingDetected = true; > } > > + nscoord nextGuess = aConfig.mKnownFeasibleHeight + (aConfig.mKnownInfeasibleHeight - aConfig.mKnownFeasibleHeight) / 2; I think this should be nscoord nextGuess = aConfig.mKnownInfeasibleHeight + (aConfig.mKnownFeasibleHeight - aConfig.mKnownFeasibleHeight) / 2; because per the check a bit above knownInfeasible < knownFeasible - 1. So conceptually we should start with the low value, then add the difference. This might be mathematically identical, but it starts with high and adds a negative (I double-checked this in a debugger), which is less understandable than the other. Another thought (as followup) is that perhaps we should add a mozilla::Midpoint(lo, hi) template method to MathAlgorithms.h. That would make loops of this sort not do the wrong thing (even if it shouldn't be used for most binary searches, rather BinarySearch directly).
Attachment #8487957 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 8487949 [details] [diff] [review] BinarySearch()-/BinarySearchIf()-based replacements, part 3, v1 Review of attachment 8487949 [details] [diff] [review]: ----------------------------------------------------------------- ::: accessible/generic/HyperTextAccessible.cpp @@ +1850,5 @@ > > int32_t > HyperTextAccessible::GetChildIndexAtOffset(uint32_t aOffset) const > { > + using mozilla::BinarySearch; I mildly prefer putting this at top level in C++ files, if it's going to be done. Supposing another use were made elsewhere in this file, it'd be unfortunate if it did this same thing again, without realizing the two should have been commoned up. @@ +1856,2 @@ > uint32_t lastOffset = 0; > + Again mildly prefer start-index computation followed by end-index computation, so keep to the other order. ::: gfx/thebes/gfxFontUtils.cpp @@ +733,3 @@ > const Format14Cmap *cmap14 = reinterpret_cast<const Format14Cmap*>(aBuf); > > + // search in varSelectorRecords Comment seems superfluous without the open-coded algorithm. @@ +743,4 @@ > const NonDefUVSTable *table = reinterpret_cast<const NonDefUVSTable*> > (aBuf + nonDefUVSOffset); > > + // search in uvsMappings Same. ::: gfx/thebes/gfxMathTable.cpp @@ +5,5 @@ > #include "gfxMathTable.h" > > #include "MathTableStructures.h" > #include "harfbuzz/hb.h" > +#include <mozilla/BinarySearch.h> Use "" to include, please. ::: layout/generic/MathMLTextRunFactory.cpp @@ +207,5 @@ > // aTable must be an array, whose length is specified by aNumElements > static uint32_t > MathvarMappingSearch(uint32_t aKey, const MathVarMapping* aTable, uint32_t aNumElements) > { > + using mozilla::BinarySearch; This shouldn't be necessary -- I see a |using namespace mozilla;| at the top of the patch. ::: xpcom/threads/TimerThread.cpp @@ +186,5 @@ > + > +struct IntervalComparator > +{ > + int operator()(PRIntervalTime aInterval) const { > + if (aInterval != 0) { 0 < aInterval is a way to write this that accords more closely with the "common" way to write comparators, I think. I'd probably string this into a ternary, too. @@ +214,5 @@ > MonitorAutoLock lock(mMonitor); > > // We need to know how many microseconds give a positive PRIntervalTime. This > // is platform-dependent, we calculate it at runtime now. > // First we find a value such that PR_MicrosecondsToInterval(high) = 1 We kind of lost the binary-search comment with this patch's changes. I'd do something like ...calculate it at runtime, finding a value |v| such that |PR_MicrosecondsToInterval(v) > 0| and then binary-searching in the range [0, v) to find the ms-to-interval scale. @@ +215,5 @@ > > // We need to know how many microseconds give a positive PRIntervalTime. This > // is platform-dependent, we calculate it at runtime now. > // First we find a value such that PR_MicrosecondsToInterval(high) = 1 > + PRUint32 usForPosInterval = 1; ...PRUint32, really? :-P uint32_t, kplzthx. @@ +222,5 @@ > } > + > + size_t usIntervalResolution; > + BinarySearchIf(MicrosecondsToInterval(), 0, usForPosInterval, IntervalComparator(), &usIntervalResolution); > + ++usIntervalResolution; I'd bracket this increment like so, as a final verification: MOZ_ASSERT(PR_MicrosecondsToInterval(usIntervalResolution) == 0); ++usIntervalResolution; MOZ_ASSERT(PR_MicrosecondsToInterval(usIntervalResolution) == 1); Searching for the first/last element matching some characteristics using binary search is very hard to read as to whether the first/last such element is found. Something like this, that verifies it to the reader, is extremely useful. @@ +227,4 @@ > > // Half of the amount of microseconds needed to get positive PRIntervalTime. > // We use this to decide how to round our wait times later > + int32_t halfMicrosecondsIntervalResolution = usIntervalResolution >> 1; Could you change this to / 2 for readability? No compiler is going to fail to strength-reduce this.
Attachment #8487949 - Flags: review?(jwalden+bmo) → review+
Iteration: 35.1 → 35.2
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #57) > Another thought (as followup) is that perhaps we should add a > mozilla::Midpoint(lo, hi) template method to MathAlgorithms.h. That would > make loops of this sort not do the wrong thing (even if it shouldn't be used > for most binary searches, rather BinarySearch directly). Hm, i was thinking that too, but then we'd either pay for compare+branching or still be prone to lo/hi being switched. Or do you think assertions would cover us sufficiently?
Attachment #8489020 - Attachment is obsolete: true
final (?) try run for the mfbt-based parts with latest comments addressed: https://tbpl.mozilla.org/?tree=Try&rev=b50fe9e1a7c1
Attachment #8487957 - Attachment is obsolete: true
Attachment #8490036 - Flags: review+
So, i'd like to put the MFBT-based parts up on bug 1067989, trying to reproduce most of the review process from here. Waldo, are you fine with copying most of this over? Maybe lets coordinate on IRC later.
See Also: → 1069859
Comment on attachment 8474616 [details] [diff] [review] Add BinarySearchIf() Landed in bug 1059179.
Attachment #8474616 - Flags: checkin+
Comment on attachment 8487142 [details] [diff] [review] BinarySearch()-/BinarySearchIf()-based replacements, part 1, v3 The mfbt-based parts are landing in bug 1067989. gfxFont et al saw a big refactoring, so one part didn't apply anymore: http://hg.mozilla.org/mozilla-central/rev/9528e6149978 Reading through the refactoring i noticed that we missed nsTArrays BinaryIndexOf() so far, i filed bug 1071489 on it.
Attachment #8487142 - Flags: checkin+
Attachment #8490020 - Flags: checkin+
Attachment #8490022 - Flags: checkin+
Blocks: 1069859
(In reply to Georg Fritzsche [:gfritzsche] from comment #59) > (In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #57) > > Another thought (as followup) is that perhaps we should add a > > mozilla::Midpoint(lo, hi) template method to MathAlgorithms.h. That would > > make loops of this sort not do the wrong thing (even if it shouldn't be used > > for most binary searches, rather BinarySearch directly). > > Hm, i was thinking that too, but then we'd either pay for compare+branching > or still be prone to lo/hi being switched. > Or do you think assertions would cover us sufficiently? I think assertions would cover us sufficiently. It's kind of rare, I expect, to have two values, the midpoint of which is being computed, not have some consistent, latent order. I would certainly not have a compare-and-branch just to determine the smaller of them.
[Tracking Requested - why for this release]: This won't make 34 as a sneaky "oh, that just missed uplift" doesnt apply anymore. Its on the trains for 35 now.
Iteration: 35.2 → ---
Whiteboard: [adv-main35-]
Group: core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: