Closed Bug 629858 Opened 14 years ago Closed 13 years ago

strict warning "function f does not always return a value" can cause buffer overreads

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
firefox5 --- fixed
blocking2.0 --- -
status2.0 --- wontfix
blocking1.9.2 --- .18+
status1.9.2 --- .18-fixed
status1.9.1 --- wanted

People

(Reporter: gkw, Assigned: n.nethercote)

Details

(Keywords: crash, privacy, testcase, Whiteboard: [sg:moderate], [needs 1.9.2 landing] fixed-in-tracemonkey)

Attachments

(5 files, 1 obsolete file)

I tried but somehow failed to get a Windows environment with symbols for a js shell, but nonetheless attaching what I have.

The testcase is extremely fragile but somewhat reproducible. It doesn't seem to affect debug shells. Assuming worse-case [sg:critical?] pending further analysis.
Tested with -m and -j on TM changeset 3f23c7719100. It seems to only occur with both -m and -j but I'm not sure.

This issue (or its variants) seems pretty common on Windows when running jsfunfuzz, but getting a somewhat reliable testcase is an extreme PITA.

How one wishes to have Breakpad for js shells.
blocking2.0: --- → ?
Summary: INVALID_POINTER_READ with testcase → TM/JM: Crash [@ js::TypedArray::operator] INVALID_POINTER_READ with testcase
Could be a dup of bug 625141. Calling it a hardblocker pending a fix from there or further analysis.
Whiteboard: [sg:critical?] → [sg:critical?][hardblocker]
blocking2.0: ? → final+
Demoting because I can not repro this on either tip or 3f23c7719100.
Whiteboard: [sg:critical?][hardblocker] → [sg:critical?][softblocker]
(In reply to comment #4)
> Demoting because I can not repro this on either tip or 3f23c7719100.

Definitely still reproduces on Windows 7 changeset 9459c4b15890. Just keep trying:

$ ./js-opt -m -j testcase.js

and one will crash once in every 5-6 runs of the command.
(In reply to comment #5)
> $ ./js-opt -m -j testcase.js

Of course, the binary should have a .exe extension on windows...
Gary, do you still see this now that bug 625141 is fixed-in-tracemonkey?
(In reply to comment #7)
> Gary, do you still see this now that bug 625141 is fixed-in-tracemonkey?

Yes, I do still see it (once every 5-6 tries, with the testcase in comment 0) with the following changeset on tip:

http://hg.mozilla.org/tracemonkey/rev/a6d56af51d69

This includes the checkin for bug 625141.
Is your js shell 32-bit or 64-bit?  ./js -j -e "print(jitstats.archIs64BIT)"

Can you reproduce on any platform other than Windows?

Does using a loop (around the entire block of tryItOut calls) make the crash reliable enough to use Lithium?
32-bit:

$ file js-opt-32-tm-nt.exe
js-opt-32-tm-nt.exe: PE executable for MS Windows (console) Intel 80386 32-bit

$ ./js-opt-32-tm-nt.exe -j -e "print(jitstats.archIs64BIT)"
-e:1: ReferenceError: jitstats is not defined

> Can you reproduce on any platform other than Windows?

Nope, but again, I'm not entirely sure, it doesn't blow on Linux and Mac for me no matter how many times I try but the problem might still be there.

> Does using a loop (around the entire block of tryItOut calls) make the crash reliable enough to use Lithium?

Tried that, nope, it doesn't make it reliable.

Tested on TM changeset 3b3710520c0e.
> -e:1: ReferenceError: jitstats is not defined
Oh, right, jitstats is debug-only and this crash is opt-only.
This may crash after 1-2 minutes, or it may not. Either way, it's further reduced. The previous earlier unreduced testcase takes 5-6 seconds to crash if it decides to.

Both tested on Windows 7 opt build on TM changeset a1a8cd4accba.
This is a sg:critical bug, with a testcase. That say it needs to be a hardblocker. If this is not reproducible enough, then we can argue about that, but it sounds like this reproduces for Gary, so let's at least give it a shot.
Whiteboard: [sg:critical?][softblocker] → [sg:critical?][hardblocker]
I can't reproduce a crash with signature js::TypedArray::operator.

I can crash with either test case if I do many runs, but the crash is in a js.cpp error reporter function. By itself, that would definitely not be sg:critical, but the reason we crash there is that we create a report object with a bad pointer for |tokenptr|. mxr seems to say that |tokenptr| isn't used in the browser, so this wouldn't affect the browser.

Gary, can you still reproduce the original crash signature?
This patch stops the shell crash, which could at least be helpful for fuzzing. I'm not sure of the exact cause of the error yet, but it has something to with having a 0 character in the code. The line-handling code ends up thinking the line is short, but tokenpos data refers to locations after the 0. I'm not sure if the tokenpos is for the right line, or a different line, or what. Not sure it's that important at this point, it's kind of a corner case and it's easy to step on it this way.
With the patch for the error reporter issue, I'm not able to repro this in 1000 runs of the original test case or 10 runs of the longer-running one. This is on Windows 7 with a 32-bit opt shell (TM 9ecbccbaff7b, VS2008, ../configure  --enable-debug-symbols). So, unblocking for now.
blocking2.0: final+ → .x
(In reply to comment #15)
> Created attachment 511566 [details] [diff] [review]
> Patch for shell error reporter crash
> 
> This patch stops the shell crash, which could at least be helpful for fuzzing.
> I'm not sure of the exact cause of the error yet, but it has something to with
> having a 0 character in the code. The line-handling code ends up thinking the
> line is short, but tokenpos data refers to locations after the 0. I'm not sure
> if the tokenpos is for the right line, or a different line, or what. Not sure
> it's that important at this point, it's kind of a corner case and it's easy to
> step on it this way.

Yes, this gets stepped on by the fuzzer on Windows very often, and the patch does immensely help to stop the crash. I'm not sure why I got a crash with another signature, but for the testcases here it seems to stop crashes, so let's get the patch in and fuzz-testing on Windows can resume :)
I think I noticed this on ARM as well, just that it is very difficult to get a reliable testcase as per above..
OS: Windows 7 → All
Hardware: x86 → All
Attachment #511566 - Flags: review?(jwalden+bmo)
See bug 634444, which may be related.
(In reply to comment #19)
> See bug 634444, which may be related.

Yeah, if dmandelin's patch helps it's almost certainly bug 634444, which I've officially decided should be backed out and not included in Firefox 4.0 (see bug 635235).
Oh, I didn't realize this was triggering before bug 634444 landed.  That's not good.

It's possible that the patch for bug 588648 caused this, though it's strange that it would only start happening in late January, because that patch landed in early December.  Gary, can you either back that patch out or revert to just before it landed and see if you can reproduce the problem?  Thanks.
(In reply to comment #21)
> Gary, can you either back that patch out

(In your local repository, I mean! :)
> It's possible that the patch for bug 588648 caused this, though it's strange
> that it would only start happening in late January, because that patch landed
> in early December.

I only started to occasionally fuzz-test on Windows from late Jan onwards..
(In reply to comment #15)
> 
> This patch stops the shell crash, which could at least be helpful for fuzzing.
> I'm not sure of the exact cause of the error yet, but it has something to with
> having a 0 character in the code. The line-handling code ends up thinking the
> line is short, but tokenpos data refers to locations after the 0.

You mean a NUL char, ie. a char with value of 0?  Where does that occur in the test case?

> I'm not sure
> if the tokenpos is for the right line, or a different line, or what. Not sure
> it's that important at this point, it's kind of a corner case and it's easy to
> step on it this way.

There's definitely some occasional weirdness in the parser and/or scanner to do with the 'index' value used in reportCompileErrorNumberVA().  Bug 635235 (see the e4x_TCO++ patch) may be the root cause, we should fix that.  If that's not the root cause, it would be good to put the above patch in as a defensive measure (with a comment explaining why the extra test is needed).
Comment on attachment 511566 [details] [diff] [review]
Patch for shell error reporter crash

Whee, I actually understand the patch's code now!  (I didn't when I glanced at this when it was first posted, but with bug 635235 under my belt I grok it a fair bit more.)

Bug 634444 relied on every token's information always being correct, as did previous (now again-current) code, but the getToken state space is large enough more failures-to-set and the like may lurk.  So this might be bug 635235, or it might be something else.  It's impossible to say without a rigorous proof that the E4X failure is the only one in existence, which seems a difficult thing to produce, and arguably isn't worth the time to do so.

Either way, the patch here seems to be a hackaround.  Further, given the memory-writing behavior I observed in bug 635235, I don't believe this hack would actually protect against all badness of these sorts.

One thing that might help with diagnosing failure-to-initialize-token bugs would be to not rely on overwriting default-initialized values (like the end position's lineno) for special cases.  If the end lineno hadn't been set in that case, it would have been a valgrind uninitialized-value copy that would have been easier to see.  But that means the simple cases require more cleanup drudge.  That seems a hard tradeoff to make either way.  Maybe someone else sees another idea for how to make the token-generation stuff more robust to producing faulty tokens?
Attachment #511566 - Flags: review?(jwalden+bmo) → review-
(In reply to comment #25)
> 
> Bug 634444 relied on every token's information always being correct

True.  I didn't realize defensive programming was necessary here :/

> One thing that might help with diagnosing failure-to-initialize-token bugs
> would be to not rely on overwriting default-initialized values (like the end
> position's lineno) for special cases.  If the end lineno hadn't been set in
> that case, it would have been a valgrind uninitialized-value copy that would
> have been easier to see.  But that means the simple cases require more cleanup
> drudge.  That seems a hard tradeoff to make either way.  Maybe someone else
> sees another idea for how to make the token-generation stuff more robust to
> producing faulty tokens?

There's a small circular token buffer (TokenStream::tokens) and old tokens get overwritten all the time, so Valgrind wouldn't warn against uninitialized values without some annotations.  We could overwrite with some canary values instead.  But as you say, newToken() does a lot of default initialization, which defeats canaries.

One idea is to add a Token sanity checker.  It would do things like:

- If |begin| and |end| are on different lines, check it's a token that's allowed to be on multiple lines (eg. strings, xml tokens, nothing else?)
- If |begin| and |end| are on the same line, check that begin < end.

That's all I can think of right now.  There's scope for more assertions in other places too, I think.
I've worked this out.  I think it shouldn't be a hardblocker because (a) the bug is present in 1.9.1 and 1.9.2, so it's not a regression, and (b) it can only occur in the browser if javascript.options.strict is enabled.  (Note that this turns on SpiderMonkey's "strict warnings", which is similar in spirit to ES5's "strict mode" but looks for different things.)

Here's a small reproducer;  in the shell, you need to run with |-o strict| to enable strict warnings.

  function f(a) {
      if (a > 3)
          return 644;
  }

It doesn't crash for me on Linux but Valgrind complains about overrunning a buffer.  (I got the same Valgrind complaint with Gary's original test.)  It appears it can only crash the shell on Windows, which is presumably why Gary only just found it even though it's been present for ages.  As for why it manifests in js::TypedArray::operator, who knows... memory corruption?

The problem is that TokenStream::reportCompileErrorNumberVA() uses a combination of the current token (from tokens[cursor]) and the current state in TokenStream when reporting the error.  In this case, ';' is the current token, and from that we get the index (which is the column number minus one), which is 18.  But TokenStream is already on the next line by the time the strict warning is issued, and so the line length is computed as 2 (the '}' plus the newline).  We end up copying 2 chars from line 4 into the buffer used for the error, but try to index 18 chars into it.  You can see this in the actual warning:

  b.js:4: strict warning: function f does not always return a value:
  b.js:4: strict warning: }
  b.js:4: strict warning: ..................^

The carat is pointing 18 chars into line 4.

The patch I'm about to attach fixes the problem by passing in a JSParseNode, which causes its position to be used instead of the current token's position, and we end up with this warning:

  b.js:1: strict warning: function f does not always return a value

which is clearer anyway.  There are three other places where ReportBadReturn() is called, so I passed in what seemed like the right JSParseNode to each of them too.  I confirmed that Valgrind doesn't complain with Gary's original testcase once this patch is applied.

Though I think it shouldn't be a hardblocker, I'll ask for 2.0 approval once I get r+ because the patch is very small and IMO safe.
Assignee: general → nnethercote
Status: NEW → ASSIGNED
Keywords: regression
(I removed the [hardblocker] marking because it's been changed to blocking2.x since bug 634444 -- which caused the bug to be exposed more -- was backed out.)
Whiteboard: [sg:critical?][hardblocker] → [sg:critical?]
Attachment #511566 - Attachment is obsolete: true
Attachment #513723 - Flags: review?(jwalden+bmo)
Summary: TM/JM: Crash [@ js::TypedArray::operator] INVALID_POINTER_READ with testcase → strict warning "function f does not always return a value" can cause buffer overruns
Comment on attachment 513723 [details] [diff] [review]
patch (against TM 62529:4e085ba15d4c)

>diff --git a/js/src/jsparse.cpp b/js/src/jsparse.cpp

> static JSBool
>-ReportBadReturn(JSContext *cx, JSTreeContext *tc, uintN flags, uintN errnum,
>+ReportBadReturn(JSContext *cx, JSTreeContext *tc, JSParseNode *pn, uintN flags, uintN errnum,
>                 uintN anonerrnum)
> {
>+    JS_ASSERT(pn);          /* we must have a pn -- see bug 629858 comment 27 */

Nix the comment, doesn't seem to add much here.  This would seem like a place for pass-by-reference -- but for the reference haters around here, alas.  ;-)

The name pn is canonical and also canonically unhelpfully informative, for a method whose behavior is not trivially obvious.  I don't have a good suggestion for a better name.


>@@ -1539,17 +1540,17 @@ Parser::functionBody()
>     } else {
>         pn = UnaryNode::create(tc);
>         if (pn) {
>             pn->pn_kid = assignExpr();
>             if (!pn->pn_kid) {
>                 pn = NULL;
>             } else {
>                 if (tc->flags & TCF_FUN_IS_GENERATOR) {
>-                    ReportBadReturn(context, tc, JSREPORT_ERROR,
>+                    ReportBadReturn(context, tc, pn, JSREPORT_ERROR,
>                                     JSMSG_BAD_GENERATOR_RETURN,
>                                     JSMSG_BAD_ANON_GENERATOR_RETURN);

For any future readers wondering how this can get hit:

js> function f() ((yield 7), 5)
;
typein:10: TypeError: generator function f returns a value:
typein:10: ;
typein:10: ..........................^

Yes, really.


Perhaps we can't test these directly (unless we wrote a shell error reporter that would save the information to the side -- maybe useful, doesn't seem important to do it now), but we could write tests that demonstrating this stuff's Valgrind-safe.  r=me if you write some testcases that trigger Valgrind warnings pre-patch and don't post-patch and dump them in a js/src/tests/*/extensions directory.
Attachment #513723 - Flags: review?(jwalden+bmo) → review+
Jeff: const refs are not hated, mutable refs are.

OTOH, asserting pn non-null is against house style and seems silly here. The single-file scope imposed by static on ReportBadReturn makes for a small-world codebase, where one can grep for all the calls. Anyway crashing on a near null constant-offset dereference is safe.

/be
Okay.

The assertion for non-null doesn't seem silly to me, because the ReportCompileError method permits a NULL pn to be passed into it.  That's what the assertion guards against: RCE trying to be clever when it's not explicitly told where to look.  The non-NULL requirement would not be clear to me, knowing that RCE accepts NULL pn, without that assertion.
The assertion only guards by crashing a debug build in a controlled way. But so does a guaranteed NPE. Thus the don't-assert-not-null style rule.

For a real language-level guarantee, I agree a const ref could be used. But we don't use those elsewhere in jsparse.cpp, and nul pn being valid is unusual (you're right that the report-compile-error path is where that comes up).

I suggest consulting with jimb, luke, and jorendorff, and having a plan to use const refs more consistently, rather than injecting them piecemeal, and in this bug.

/be
(In reply to comment #33)
> The assertion only guards by crashing a debug build in a controlled way. But so
> does a guaranteed NPE. Thus the don't-assert-not-null style rule.

Quite true.  I have no beef with that as long as it doesn't "unduly" hinder readability.  The vast majority of the time it doesn't.  (The exception I'm thinking of is where the pointer gets passed down half a dozen levels or so before a deref happens, or perhaps where it only *might* happen.  Again, that's very rare.)

> I suggest consulting with jimb, luke, and jorendorff, and having a plan to use
> const refs more consistently, rather than injecting them piecemeal, and in this
> bug.

I'm fine with or without them as a general rule, don't need or not-need a plan one way or the other.  One just seems appropriate here, in this narrow case, for making the requirement crystal-clear in the lowest-mental-overhead way.

(Why does it feel like we've talked past each other in almost every interaction we've had in the last couple weeks?  :-( )
(In reply to comment #32)
> Okay.
> 
> The assertion for non-null doesn't seem silly to me, because the
> ReportCompileError method permits a NULL pn to be passed into it.  That's what
> the assertion guards against: RCE trying to be clever when it's not explicitly
> told where to look.  The non-NULL requirement would not be clear to me, knowing
> that RCE accepts NULL pn, without that assertion.

That's why I put it there *and* added a comment -- to strongly indicate that it wasn't just random paranoia.  But I can remove the assertion.


(In reply to comment #30)
> 
> For any future readers wondering how this can get hit:
> 
> js> function f() ((yield 7), 5)
> ;
> typein:10: TypeError: generator function f returns a value:
> typein:10: ;
> typein:10: ..........................^
> 
> Perhaps we can't test these directly (unless we wrote a shell error reporter
> that would save the information to the side -- maybe useful, doesn't seem
> important to do it now), but we could write tests that demonstrating this
> stuff's Valgrind-safe.  r=me if you write some testcases that trigger Valgrind
> warnings pre-patch and don't post-patch and dump them in a
> js/src/tests/*/extensions directory.

Thanks for the test case!  I was thinking about storing this extra context info somewhere that it's testable anyway (Shaver suggested an extra property in SyntaxError) because I've been making various changes to the scanner lately and am a bit nervous due to the lack of regression testing of error messages.
I think an assertion is the best choice here:

- In cases where a pointer will be dereferenced anyway, it's fine to let the machine's null pointer checking do the work for us --- but in this case the pointer will not be dereferenced if it is null.

- Readers should be able to infer invariants themselves by reading the code --- but ReportCompileErrorNumber permits a null pointer, misleading such efforts.

- One can't create a reference to NULL without dereferencing a NULL pointer, which is undefined, so 'const foo &' is a type-ish way of indicating that an argument must null. But nowhere do we pass JSParseNodes by reference this way.

So while I don't want us to go crazy with assertions either, the alternatives to and arguments against assertions that I'm aware of don't seem to apply here.

Contra Waldo, I would like the assertion to have a brief comment to explain why we can't just pass along RCEN's default behavior --- perhaps "This error is often detected after the token stream has moved to a different line, so without a parse node, RCEN guesses the error position poorly"?

(I probably comment too much, but I do think a reference to bugzilla should be a second choice to a brief, clear, in-place explanation.)
fourth para, please s/must null/must not be null/.
Truly I don't care that much whether there is, or isn't, a comment.  Actually I made the suggestion because I thought Brendan would be happier without it, because it's not hugely more informative than /* require non-null pn */ would be, especially in concert with hg annotate.  So if it helps, pretend I never said anything about the assertion at all, and just add the couple valgrind-anticipating tests.  :-(
I'm sad so much ink was spilled over this. It's true that we use pointers known to be nullable or non-null, with conventions, going back many years and due to C not C++ in large part. A bit more code like that wouldn't break us old-timers, but you who are newer are within your rights to object, and the C++ switch is a big win we shouldn't overlook.

So, I agree with Jeff: use const refs to get the compiler covering your back at compile time. Do not bloat the source with comments and assertions that at best fail hard at runtime.

/be
I'm marking this as [sg:moderate] because it's very similar to bug 635235 -- in comment 27, if you make the value returned longer you can cause a read past the end of a buffer of arbitrary length, so it's a privacy risk.  Albeit a smaller one than bug 635235 because you need javascript.options.strict enabled, which is an unusual option.
Whiteboard: [sg:critical?] → [sg:moderate]
http://hg.mozilla.org/tracemonkey/rev/17273c2c0eda

I ended up removing the assertion.  I tried using a ref-to-const but then I would have had to do a cast to get rid of the const-ness when passing 'pn' on, which was ugly, so I didn't use that.

I added a test, but it's not run under Valgrind.  The problem was that Valgrind jit-tests get their stderr output printed on the console, so you can see any Valgrind warnings.  But SpiderMonkey was also printing a warning, which is console pollution we don't want.  Non-valgrind jit-tests don't have stderr printed to the console.  In compensation, I made the return value a very long string, which maximizes the likelihood of a crash when the fix isn't present.
Whiteboard: [sg:moderate] → [sg:moderate], fixed-in-tracemonkey
Attached patch landed patchSplinter Review
Requesting this for 1.9.1 and 1.9.2.  It should also go into 2.0.1 (or whatever it's called) -- the approval2.0? request is for 2.x.
Attachment #516741 - Flags: approval2.0?
Attachment #516741 - Flags: approval1.9.2.15?
Attachment #516741 - Flags: approval1.9.1.18?
Comment on attachment 516741 [details] [diff] [review]
landed patch

Oh, it's already marked blocking2.x;  I'll remove the approval2.0 request.
Attachment #516741 - Flags: approval2.0?
(In reply to comment #41)
> http://hg.mozilla.org/tracemonkey/rev/17273c2c0eda
> 
> I ended up removing the assertion.  I tried using a ref-to-const but then I
> would have had to do a cast to get rid of the const-ness when passing 'pn' on,
> which was ugly, so I didn't use that.

Would rather const-ipate harder. But yeah.

/be
blocking1.9.1: --- → needed
blocking1.9.2: --- → needed
Comment on attachment 516741 [details] [diff] [review]
landed patch

We don't want to land this on the branches until we know we can get it into a shipping FF4 update, holding off on the approval requests for now.
(In reply to comment #40)
> I'm marking this as [sg:moderate] because it's very similar to bug 635235
> [...] you can cause a read past the end of a buffer of arbitrary length

So the summary is better as a buffer "overread" than "overrun"?
Summary: strict warning "function f does not always return a value" can cause buffer overruns → strict warning "function f does not always return a value" can cause buffer overreads
Keywords: privacy
(In reply to comment #46)
> 
> So the summary is better as a buffer "overread" than "overrun"?

Sure.
http://hg.mozilla.org/mozilla-central/rev/17273c2c0eda
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment on attachment 516741 [details] [diff] [review]
landed patch

Moving request to next cycle
Attachment #516741 - Flags: approval1.9.2.18?
Attachment #516741 - Flags: approval1.9.2.17?
Attachment #516741 - Flags: approval1.9.1.20?
Attachment #516741 - Flags: approval1.9.1.19?
blocking1.9.1: needed → ---
blocking2.0: .x+ → -
status2.0: --- → wontfix
Comment on attachment 516741 [details] [diff] [review]
landed patch

Approved for 1.9.2.18, a=dveditz for release-drivers
Attachment #516741 - Flags: approval1.9.2.18?
Attachment #516741 - Flags: approval1.9.2.18+
Attachment #516741 - Flags: approval1.9.1.20?
blocking1.9.2: needed → .18+
Whiteboard: [sg:moderate], fixed-in-tracemonkey → [sg:moderate], [needs 1.9.2 landing] fixed-in-tracemonkey
Comment on attachment 516741 [details] [diff] [review]
landed patch

Requesting approval for 2.0 (patch applies cleanly to branch).
Attachment #516741 - Flags: approval2.0?
Comment on attachment 516741 [details] [diff] [review]
landed patch

Approved for the mozilla2.0 repository, a=dveditz for release-drivers
Attachment #516741 - Flags: approval2.0? → approval2.0+
Group: core-security
A testcase for this bug was automatically identified at js/src/jit-test/tests/basic/bug629858.js.
Flags: in-testsuite+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: