Closed Bug 452357 Opened 16 years ago Closed 13 years ago

TM: trace recorder error handling needs auditing and fixing

Categories

(Core :: JavaScript Engine, defect, P3)

defect

Tracking

()

RESOLVED WONTFIX
mozilla1.9.1

People

(Reporter: brendan, Assigned: graydon)

References

Details

Attachments

(1 file)

Some errors (e.g. from js_FindPropertyHelper or js_LookupPropertyWithFlags in TraceRecorder::test_property_cache) abort recording but do not turn into errors in the interpreter. Other abort cases are not errors. There is code to check cx->throwing in js_StartRecorder that could be emulated elsewhere.

Need to fix this before shipping.

/be
Blocks final release.  blocking1.9.1+.
Flags: blocking1.9.1+
Priority: -- → P3
Graydon, this is needed before we ship, so by b2 preferably, by final for sure (hence P3 targeting 1.9.1). Would be good to have fresh eyes on it. Andreas, David and I are happy to help.

Besides wrongly-suppressed exceptions left pending without false/null return meaning failure propagating, we have OOM issues, but those are tracked over in bug 449534 (see bug 456826 for a particular bug). Thanks,

/be
Assignee: general → graydon
Perhaps this is overzealous, but I don't terribly trust my eyes and I wanted to give the static analysis system a spin.

The attached patch implements static error-code check enforcement. With it, you can declare a C++ function as JS_SETS_ERR or JS_CLEARS_ERR, and the analysis will reject any program that:

  - Calls a SETS_ERR function when an error is already pending and unchecked.
  - Calls a SETS_ERR function and drops the return code on the floor.
  - Calls a SETS_ERR function and -- on the paths where the return code is false
    or NULL -- attempts to return in any of these cases:
      - Returning without clearing the error in a non-marked function.
      - Returning a non-error value in a function itself marked SETS_ERROR.
        (Returning the error value itself is ok, as ESP knows what state it must
        be in)

All this and more: it also simultaneously tracks up to 32 separate JS_SETS_FOO / JS_CLEARS_FOO flag bits, so you can model OOM handling separate from trace-abort handling and whatever other failure modes we're worried about.

The effect is roughly analogous to having Java's checked exception enforcement, except errors are considered null pointer returns or false boolean returns, and adhere to values rather than any exception-handling mechanisms like try/catch.

Comments? Questions? Review? It's my first static analysis pass and so it's probably a bit silly in places. Lots of copy-n-paste from other passes :)

It depends on the pending ESP patches in Bug 473430 and Bug 473492, to improve the Zero-Nonzero analysis a bit.
Attachment #356877 - Flags: review?(dmandelin)
Depends on: 473492, 473430
Comment on attachment 356877 [details] [diff] [review]
A static analysis pass for error-code check enforcement

Detailed comments below. It looks good. r+=me on documenting the special assumptions or surprises I noted below.

The comment is inaccurate. But the misunderstanding makes it clear that our JS map API needs better documentation. There are two map implementations. One, like a traditional chaining hashmap implementation, requires the user to give a hash function (except the hash value is really a string) and an equality test. The other only requires a hash function but the hash value has to be unique to each key. Because the hash values are strings, this is really easy to do in practice, so we usually use that kind.

>+// Tell MapFactory we don't need multimaps (a speed optimization).
>+MapFactory.use_injective = true;

I would like to see these functions brought into the Treehydra libs.

>+  function attrs(tree) {
>+  function hasUserAttribute(tree, attrname) {
>+  function getLocation(node, stack, skiptop) {

>+    let cfg = function_decl_cfg(fndecl);
>+    log("func " + decl_name(fndecl));
>+    for (let bb in cfg_bb_iterator(cfg)) {
>+      log("{");
>+      for (let isn in bb_isn_iterator(bb)) {
>+        isn.setsErr = 0;
>+        isn.clearsErr = 0;
>+        walk_tree(isn, function(t, stack) {
>+          let indent = "\t" + (["\t" for (i in stack)].join(""));
>+          log(indent + isn_display(t));

This next part is unnecessary and should be deleted. If you don't know why, please ask. I just tried cutting it out and it didn't cause any test case failures.

>+          // Make a psvar for everything z/nz might want to track.
>+          if (DECL_P(t)) {
>+            state_vars.push(new ESP.PropVarSpec(t, false, ESP.TOP));
>+          }

I like this design of computing the semantics of the important statements in a first pass. Very nice.

>+          if (TREE_CODE(t) == CALL_EXPR) {
>+
>+            let callee = call_function_decl(t);
>+
>+            for (i in ERRS) {
>+              let err = ERRS[i];
>+              if (setsErr(callee, err)) {
>+                let calleeName = dehydra_convert(callee).name;
>+                isn.setsErr |= (1<<i);
>+                isn.errInfo = [calleeName, getLocation(t, stack, false)];
>+                self.hasErr = true;
>+              }
>+              if (clearsErr(callee, err)) {
>+                let calleeName = dehydra_convert(callee).name;
>+                isn.clearsErr |= (1<<i);
>+                isn.errInfo = [calleeName, getLocation(t, stack, false)];
>+                self.hasErr = true;
>+              }
>+            }
>+
>+            if (isn.setsErr & isn.clearsErr) {
>+              error("Statement both sets and clears the same error",
>+                    getLocation(t, stack, false));
>+            }
>+          }
>+        });

This function isn't going to give you the right results if you split a "bits"-type variable with abstract value TOP. But it might not matter in practice. 

>+  ErrCheck.prototype.split = function(vbl, v) {
>+    log("ErrCheck: split called");
>+    if (isBits(v)) {
>+      let vs = [];
>+      for (let i = 0; i < 32; i++) {
>+        if (v.bits & (1<<i)) {
>+          vs.push(bits(1<<i));
>+        }
>+      }
>+      return vs;
>+    }
>+    // Delegate.
>+    return ESP.Analysis.prototype.split(vbl, v);
>+  }

These will give funny results if one value is a "bits" value and the other is not. But I think you ensure that never happens. And you already said how you would like to be able to have different lattices, and I agreed.

>+  ErrCheck.prototype.join = function(v1, v2) {
>+    log("ErrCheck: join called");
>+    if (isBits(v1) && isBits(v2)) {
>+      return bits(v1.bits | v2.bits);
>+    }
>+    // Delegate.
>+    return Zero_NonZero.join(v1,v2);
>+  }
>+
>+  ErrCheck.prototype.meet = function(v1, v2) {
>+    log("ErrCheck: meet called");
>+    if (isBits(v1) && isBits(v2)) {
>+      return bits(v1.bits & v2.bits);
>+    }
>+    // Delegate.
>+    return Zero_NonZero.meet(v1,v2);
>+  }

Brief header comment saying what this is testing for would be nice.

>+  function returnsZero(ss, isn) {
>+    let retval = isn.operands()[0];
>+    if (TREE_CODE(retval) == GIMPLE_MODIFY_STMT) {
>+      let [res,rhs] = retval.operands();
>+      if (DECL_P(rhs)) {
>+        if (ss.get(rhs) == 0)
>+          return true;
>+      }
>+    }
>+    return false;
>+  }
>+
>+  ErrCheck.prototype.flowState = function(isn, state) {
>+
>+    log("ErrCheck: +++ flowState");
>+
>+    let statevar = this._state_var_decl;
>+    let fndecl = this._fndecl;
>+    let didZnz = false;
>+
>+    state.update(function(ss) {
>+
>+      let v = ss.get(statevar);
>+      let currFlags = isBits(v) ? v.bits : 0;
>+
>+      for (i in ERRS) {
>+        let err = ERRS[i];
>+        let flag = 1<<i;

It was surprising to me to see these checks here. If you check a statement inside a loop, I think it's possible you'll get multiple error reports, but I think it is OK unless a problem is found in practice.

>+        // Check #1: if error N is set, setting it again while it's still pending is a no-no.
>+        if ((isn.setsErr & flag) && (currFlags & flag)) {
>+          error("Unhandled pending " + err + " error when calling " + isn.errInfo[0], isn.errInfo[1]);
>+        }
>+
>+        // Check #2: if error N is not set, clearing it is a no-no.
>+        if ((isn.clearsErr & flag) && !(currFlags & flag)) {
>+          error("Clearing un-set error " + err + " when calling " + isn.errInfo[0], isn.errInfo[1]);
>+        }
>+
>+        // Check #3: if returning with error N set, must have declared sets-error-N on self and
>+        // must be returning 0/false/null.
>+        if (TREE_CODE(isn) == RETURN_EXPR 
>+            && (flag & currFlags)) {
>+          if (!(fndecl.setsErr & flag))
>+            error("Returning from function that does not set " + err + ", with unhandled error")
>+          else if (!returnsZero(ss, isn))     

I think this error message is confusing.
       
>+            error("Mis-propagating error " + err + " via nonzero-return")
>+        }
>+
>+      }
>+
>+      let substates = [];
>+
>+      // Note the flags cleared by this isn (or just the incoming state, if no flags are cleared).
>+      currFlags &= ~(isn.clearsErr);
>+      ss.assignValue(statevar, bits(currFlags), isn);
>+      substates.push(ss);
>+
>+      /*
>+       * When the program-under-analysis runs, the occurrence of a setsErr is coupled with those
>+       * paths in which this isn is a call that "returns false" (or null): ZERO in the z/nz analysis,
>+       * rather than NONZERO. So if setsErr != 0, this is an isn in which z/nz status will split the
>+       * substate in two, one associated with NONZERO (that keeps currFlags as-is) and one associated with
>+       * ZERO (that sets the state-var to have all the setsErr flags raised).
>+       *
>+       * ESP.TOP should be the abstract z/nz state of the LHS of the isn in the current substate.
>+       */
>+      if ((currFlags | isn.setsErr) != currFlags) {
>+        currFlags |= isn.setsErr;
>+        log("ErrCheck: Error-flag transition -> " + currFlags);
>+        let [lhs,rhs] = TREE_CHECK(isn, GIMPLE_MODIFY_STMT).operands();
>+        if (DECL_P(lhs)) {

I don't understand this check. Are you really insisting that the lhs for an assignment from an error-returning function is uninitialized?

>+          let znz_val = ss.get(lhs);
>+          if (znz_val != ESP.TOP)
>+            error("Zero-Nonzero value for LHS is not ESP.TOP as expected");

Looks good.

>+          /*
>+           * The two substates:
>+           *   ss is "nonzero return => no flags change"
>+           *   ss_err is "zero return => flags change"
>+           */
>+
>+          let ss_err = ss.copy();
>+          ss.assignValue(lhs, Zero_NonZero.Lattice.NONZERO, isn);
>+          ss_err.assignValue(lhs, Zero_NonZero.Lattice.ZERO, isn);
>+          ss_err.assignValue(statevar, bits(currFlags), isn);
>+          substates.push(ss_err);
>+          didZnz = true;
>+          log("ErrCheck: Split substate");
>+        }
>+      }
>+
>+      return substates;
>+    });
>+
>+    // Delegate.
>+    if (!didZnz) {
>+      log("ErrCheck: Delegating to ZNZ");
>+      this._zeroNonzero.flowState(isn, state);
>+    }
>+    log("ErrCheck: --- flowState");
>+  }
>+
>+  return {ErrCheck: ErrCheck};
>+}();
Attachment #356877 - Flags: review?(dmandelin) → review+
Thanks for the feedback, will update before proceeding.

Question to the audience on this bug: this is not P1 right now and I'm likely to go focus on those for a while. When I return to it, shall I focus (in addition to addressing comments here and in the ESP patches) on annotating functions in jstracer.js to conform to this sort of marked-function analysis, or should I play with trying to implement user attributes on types, and modify this analysis to use the presence/absence of a return value of an "error code" type in place of a function annotation?
Depends on: 488018
In the past months I've managed to set aside 3 days to "audit" the file jstracer.cpp, in the sense of printing the entire thing out and reading it with a red pen. I made a lot of notes about things to clean up, and a lot of notes about things I didn't understand or were too complex to follow.

I did not specifically find any error-handling bugs, for the most part because sizable chunks of the file fall in the latter category: too complex to follow. I've attempted to write up notes on what I found and understood <a href="https://developer.mozilla.org/En/SpiderMonkey/Internals/Tracing_JIT#SpiderMonkey-specific_high_level_component.3a_jstracer.*">over here</a>, but that supposed understanding (a) doesn't actually prove that the file obeys the intended state transitions strictly and (b) doesn't cover all the sub-states of type stabilization and tree-calling, which is where most of the "stuff too complex to follow" shows up.

In short, while I understand the system much more now than I did a few months ago, I am not significantly closer to trusting my eyeball judgment of its correctness than I was back in january. I'm not sure whether to continue hacking on the error-code static analysis here (which I'm currently doing) or perhaps follow some other route towards resolving this bug.

Is there some more narrow sense in which others would like me to do a focused re-reading / auditing?
I don't think we can block on this. If there are specific issues that must be fixed for release, please do speak up.
Flags: blocking1.9.1+ → blocking1.9.1-
These bugs are all part of a search I made for js bugs that are getting lost in transit:

http://tinyurl.com/jsDeadEndBugs

They all have a review+'ed, non-obsoleted patch and are not marked fixed-in-tracemonkey or checkin-needed but have not seen any activity in 300 days. Some of these got lost simply because the assignee/patch provider never requested a checkin, or just because they were forgotten about.
Obsolete with the removal of tracejit.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: