Closed Bug 976850 Opened 10 years ago Closed 9 years ago

[email/backend] Refactor/clean-up GELAM tests to be more intuitive; distinguish front-end from back-end, helper steps, etc.

Categories

(Firefox OS Graveyard :: Gaia::E-Mail, defect)

Platform: x86_64 Linux
Type: defect
Priority: Not set
Severity: normal
Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: asuth, Assigned: mcav)

Details

Attachments

(6 files)

We got a little thread going on in bug 975298 about confusing tests and have had some brainstorming.  Moving the conversation here.

Discussion starts at https://bugzilla.mozilla.org/show_bug.cgi?id=975298#c3 with my first brain-dump at https://bugzilla.mozilla.org/show_bug.cgi?id=975298#c5 which I'm including here inline:


Agreed.  Here's some ideas:

=== Go somewhat overkill on namespacing and being explicit.

So examples might be:

- let {frontend, backend, servers} = unavoidableMagic.gimmeContext();

- T.magicSteps(unavoidableMagic.makeAndSaveToNamespaces({ what: 'account', name: 'myacct', haveFoldersWithMessages: [ { name: 'bob', messageSpec: ... } ] }))

- frontend.myacct.realObj.clearProblems()

- backend.myacct.rootAccount.actor.expectlog_clearAccountProblems() which was testAccount.eAccount.expect_clearAccountProblems() and where rootAccount corresponds to the composite account.  (Unfortunately, to transparently have the same test work for ActiveSync, we can't really name-check CompositeAccount)

- backend.myacct.folderAccount.actor.expectlog_createConnection()

- backend.myacct.helper.expect_saveState()

- T.magicSteps(servers.myacct.createTestFolder('bob')) creates a folder only on the server.  Whether the frontend/backend know about it depends when this happened, etc.

- T.magicSteps(frontend.mailapi.helper.makesteps_viewFolders(unavoidableMagic.saveToNamespaces('folders')))

- T.magicSteps(frontend.mailapi.helper.makesteps_openFolderView(unavoidableMagic.saveToNamespaces('manipView'), frontend.folders.realObj.getFirstFolderWithName('bob')))


One potentially useful thing from having helper methods explicitly manipulate these namespaces is that the unit test logs/results themselves could have documentation that explains WTF the things in the namespaces actually are.  So if you're looking up the test, you can say: oh, backend.myacct has a "folderAccount" that refers to the IMAP "incoming" part which also has an "opAccount" alias, plus a "smtpAccount" that refers to the SMTPAccount.

=== Stop cramming all contexts into one.

Change up the test execution model and control-flow so that there are separate functions that are invoked in back-end and front-end contexts, etc.  We use toString() here in all cases to screw over lexical scoping.  You only have access to what is in the context.

T.step({
  name: 'blah',
  // we'd run these in order, so it's okay for inFrontend to take an action.  Alternatively,
  // we break these out into prepare and do variants.
  inBackend: function(bectx) {
    bectx.myacct.opAccount.helper.expect_runOp('modtags');
  },
  inFrontend: function(fectx) {
    var msg = fectx.manipView.realObj.items[0];
    fectx.manipView.helper.expect_headerChanges([msg, 'isStarred', true]);
    msg.setStarred(true);
  },
  // we'd have some type of server context too
});

We could potentially then do the realistic thing and actually run the front-end steps in the actual page rather than looping it back to the worker.

An interesting question, if this isn't insane: would this also be a better way to write Marionette JS tests?

I've given this some thought this afternoon. I played around with some
of your ideas from above; I like what you're trying to do with them,
especially the idea of shuttling across test functions into the
frontend/backend contexts explicitly.

I noodled around with some APIs in the style that you describe,
storing things on the context and passing them to the next step. The
problem I kept running into was that the test helpers right now aren't
composable, because of the flat structure of our current test steps.
It became very difficult to try to abstract that in a way that
substantially improved understandability.

I think the only way to really clean up the tests would be to allow
the helpers to be composed, essentially requiring us to do something
different about the way we currently do test steps.

I've come up with a sketch of one possible pseudo-API that attempts to
address those problems, as well as incorporating your suggestions from
above. While I think it's clearer, it does imply removing the notion
that Actors and expectations control the flow of tests (instead using
Promises to indicate completion of tests; more below). So if you're
invested in the current control flow paradigm, this may not be up
your alley.

An upside is that it would make it much easier to integrate directly
into Mocha tests, and it keeps all of the ArbPL goodness except for
test control flow.

I haven't really given a lot of thought to what real-world complex
tests would look like with this framework. The upside is that it
allows composability, so we could hide stuff behind helper methods
without bounds. It's promises all the way down.

Of course you may have practical and reasonable reasons why we should
instead stick to something more resembling the current architecture,
which is fine by me; in thinking about this problem, it was easier for
me to sketch out a prototype API than to remain bounded by the
limitations of reality and practicality.



This gist contains the code sample as reproduced below:
https://gist.github.com/mcav/9220539



/**
 * This is a sketch of one alternative implementation of the GELAM
 * test harness. At a high level, it does three things:
 *
 * 1. It still uses the ArbPL/Loggest framework of loggers and
 *    expectations (mocked out below with slightly different naming).
 *
 * 2. It replaces the serial "test step" framework with a flow based
 *    on Promises.
 *
 * 3. It explicitly shuttles functions to frontend/backend contexts.
 *
 * Tests run in an undetermined context -- _not_ the frontend, not the
 * backend, not the server. You shouldn't have any access to any of
 * those things at test-runner level. Instead, you must 'shell out' to
 * the frontend/backend/server explicitly. This is done via
 * Function.toString() so as to avoid variable leakage.
 *
 * Instead of serial 'test steps' as before, test flow is controlled
 * through promises. The function that runs in the other context
 * should return a promise; to exit the context, you resolve the
 * promise as you would in normal JS code. In this way, a test step
 * for GELAM would follow an execution flow similar to those in Gaia;
 * a test's final 'done' callback could just be hooked onto the end of
 * the promise chain.
 *
 * Removing these serial test steps means we must have an alternate
 * way to keep logging from getting all mushed together:
 *
 * To keep the detailed level of logging with ArbPL, we use a special
 * type of promise called a LoggerPromise. This promise resolves
 * normally, like any other promise. However, it does two additional
 * things: one, it allows you to grab references to Actors; two, it
 * itself acts as a lazyLogger for convenience.
 *
 * When the LoggerPromise would resolve normally -- for instance,
 * after doing work in the Frontend context -- it checks its
 * expectations. If it succeeded, the promise resolves as normal and
 * the test continues. But if an actor's expectations went unmet, it
 * fails the Promise, logging and cascading the error as we would in
 * our current infrastructure. To mimic our current notion of a test
 * step or test group, the LoggerPromise could be named.
 *
 * In theory, this preserves a hierarchy of test steps. You can still
 * define a set of expectations that you must complete before moving
 * on to the next step, but you aren't restrained to one flat serial
 * level. Similarly, these tests could integrate seamlessly with the
 * Marionette tests, because we're explicit about which parts of the
 * code get run in different contexts.
 *
 * Example of sorts follows.
 *
 */

mail.onFrontend(function(frontend) {
  // Create a LoggerPromise, which is really just a Promise.
  var p = new frontend.LoggerPromise('Opens a Folder');
  // You can grab a reference to an Actor from a LoggerPromise,
  // and specify things that the actor must log.
  p.actor('ImapFolderConn').mustLog('syncDateRange');
  // Alternately, p could simultaneously be a lazy logger:
  p.mustLog('some thing');
  frontend.MailAPI.viewFolders(function(folders) {
    p.log('some thing');
    // Rather than implicitly completing a "step" when all
    // expectations have been resolved, the LoggerPromise will
    // explicitly raise hell if expectations are unmet when the
    // promise would have otherwise normally been resolved, here:
    p.resolve(folders);
  });
  return p;
}).onBackend(function(folders) { // << JSONified through
  // In this function, the result of the promise has been shuttled
  // through to the backend context. You could then instantiate more
  // LoggerPromise expectations, etc.
}).then(function() {
  // Or we could run stuff in the testing context, e.g. if we're
  // running in Marionette and we want to do things in la-la land.

  // We know the complete test has finished after we've resolved the
  // final promise; the test is done here.
}).then(testDone, testDone);
Yes!  I like what you propose super-much.  Specifically awesome:

- This makes it clear when a step ends from reading the code.  The whole concept of an active actor and having to explicitly specify it in the test step was magic and annoying.

- A whole family of foot-guns is avoided thanks to the data-flow:
-- Since variables aren't closed over, there's no snafu with referencing something at step creation-time versus run-time.
-- Because (as I read it), you are specifically requiring the frontend step of the test to complete and then hand off to the back-end part, it's much harder to screw up and have a front-end based condition and a back-end condition that potentially race.

- I think you're capturing the good bits of writing expectations against loggers while ditching the brittleness.  The current "you must explicitly specify everything that gets logged in sequence or explicitly ignore, and we do have set-matching if you want to be risky" is a huge hassle.  I think explicit opt-ins to say mustLog/mustNotLog/mustNotLogAnything should get us at least 80% of the way.  And when hardcore logging constraints are required, we can still have those.

- This is oriented towards composition which should help avoid the giant th_main per-actor file soup where there's really way too much going on in the one file.


== Agreeing suggestions:

- The LoggerPromise could potentially have something like resolveWhenLogged() or callWhenLogged(function() {}) for those cases where there's something going on that doesn't have an explicit callback we can easily use/interpose.
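A minimal sketch of what that could look like, assuming a LoggerPromise that tracks mustLog expectations and checks them when it settles; the class and every method name here are hypothetical, not the real Loggest API:

```javascript
// Hypothetical sketch: a LoggerPromise that doubles as a lazy logger
// and can resolve itself when a named event is logged, for code paths
// with no explicit callback to interpose on.
class LoggerPromise {
  constructor(name) {
    this.name = name;
    this.settled = null;          // 'resolved' | 'rejected' | null
    this._expected = new Set();   // events registered via mustLog()
    this._logged = new Set();     // events actually logged
    this._resolveOn = null;       // resolveWhenLogged() target event
    this.promise = new Promise((resolve, reject) => {
      this._doResolve = resolve;
      this._doReject = reject;
    });
  }
  mustLog(event) { this._expected.add(event); return this; }
  log(event) {
    this._logged.add(event);
    // resolveWhenLogged() support: auto-resolve on this event.
    if (this._resolveOn === event) this.resolve(event);
  }
  resolveWhenLogged(event) { this._resolveOn = event; return this.promise; }
  resolve(value) {
    // Raise hell if any expectation went unmet at resolution time.
    for (const e of this._expected) {
      if (!this._logged.has(e)) {
        this.settled = 'rejected';
        return this._doReject(new Error('unmet expectation: ' + e));
      }
    }
    this.settled = 'resolved';
    this._doResolve(value);
  }
}
```

A test would then write `p.resolveWhenLogged('saveState')` instead of wiring up a callback, and unmet `mustLog` expectations fail the promise at exactly the moment it would otherwise have resolved.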


== Real-world complex tests:

A lot of the complexity of tests has already been shoved down into th_main.js.  The high-level do_* helpers can be wrapped/superficially transformed to conform to using the promises without a rewrite.

In my mind, the most terrifying things going on right now in th_main are:
- There's a lot of "expect this if this flag, but not on POP3, and not if this flag, etc."
- The logic related to maintaining a representation in the test framework of what we think the client should know versus what we think is on the server, etc. is not trivial and it has its tendrils in many places.  It's an example of where aspect-oriented programming would not be horrible.

The combinatorial flag-splosion can likely be made better by:
- Reducing the strictness of our logging checks and just leaving it up to the tests that care to be more explicit.
- Finish out the test helper that says "I just want to be in a folder with these messages and I don't care about how many messages get synced/flags changed/etc.".  Right now the folder creation/message addition/syncing is its own boilerplate that most tests just don't care about.
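One possible shape for that helper, sketched with tiny in-memory fakes so it is runnable; all names here are illustrative, not the real GELAM/th_main API:

```javascript
// Fake server: folders holding plain message objects.
function makeFakeServer() {
  const folders = new Map();
  return {
    getFolder: (name) => folders.get(name) || null,
    createFolder(name) {
      const folder = {
        name,
        messages: [],
        addMessage(msg) { this.messages.push(msg); },
      };
      folders.set(name, folder);
      return folder;
    },
  };
}

// Fake account: just records which folders it was asked to sync.
function makeFakeAccount() {
  return {
    synced: [],
    async syncFolder(name) { this.synced.push(name); },
  };
}

// The hypothetical helper: create, populate, sync -- asserting nothing
// about how many messages got synced or which flags changed, because
// the caller said they don't care.
async function ensureFolderWithMessages(server, account, spec) {
  const folder =
    server.getFolder(spec.name) || server.createFolder(spec.name);
  while (folder.messages.length < spec.count) {
    folder.addMessage({ subject: 'msg ' + folder.messages.length });
  }
  await account.syncFolder(folder.name);
  return folder;
}
```

The point of the sketch is the call-site: a test says `ensureFolderWithMessages(server, account, { name: 'bob', count: 3 })` and the creation/addition/sync boilerplate disappears behind one composable promise.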


== Potential new foot-guns:

- Promises are not a panacea, especially for understanding when things go wrong.  I did some visualization/debugging stuff for Q back in the day (http://www.visophyte.org/blog/?p=652) and the way promises would be used to resolve other promises generated poor debugging info.  The great news, of course, is that devtools are on top of this, and if anyone can address it (and ideally it's people who are not us addressing it), it's the devtools!  So this is the right call, just want to recognize the new dominant footgun explicitly.


== While we're at it... simplify the loggers 

Currently the loggers require you to define a schema ahead-of-time.  This:
- Lets us explicitly specify that a set of arguments should be discarded for privacy reasons and requires no work at the call-site.  This gives us our 3-tier logging: no logging, private-information excluding logging that can probably be shared on a bug, private-information including logging.
- Lets our logger use look like function calls: bob.foo(a, b, c) instead of bob.log('foo', a, b, c)
- Is/allows for somewhat of a telemetry/state-tracking system.  Because we maintain a formal logger hierarchy and state can be latched, at runtime you can look at the logger hierarchy and theoretically see the status of our connections, error counts/etc.  I suspect this was important to me when thinking about us running a potentially large network of deuxdrop servers.
-- The original architecture was designed to allow code-generation to make this especially efficient; the current mechanism relies on closing over parameters to achieve the same functional result but is not going to get the polymorphic-inline-cache benefits unless the JS engine is absurdly clever.
-- Note: the formal hierarchy is a bit of a pain for our circular logger implementation, and gets in the way if we just want to stream log results.


I think it's worth adopting a more traditional log4blah-type solution that produces a stream of simple JSON objects.  Something like https://github.com/flatiron/winston or https://github.com/trentm/node-bunyan or really anything that has an existing ecosystem of logging reporters and consumers.

Examples first because my reply is very long and dry.

  account.log('runOp:begin', { mode: mode, type: op.type, err: null, pvtOp: op });
  account.log('runOp:end', { mode: mode, type: op.type, err: err, pvtOp: op });
  account.log('createFolder', { pvtName: folderName });
  account.error('folderAlreadyHasConn');
  // bunyan-style mixing in of some info.  no magic implied right now
  imapConn.logId = magic.makeMeAOneUpIdOrSomething();
  imapConn.log = account.log.child({type: 'ImapConnection', id: imapConn.logId });
  imapConn.log('born', { owner: account.logId });

Key things:

- Do use explicit priorities, at least log/warn/error.

- Do emit ownership information that could be automatically used to help us know what a logger is associated with.  We could fold this in with a little bit of magic or at least a helper method.

- Recognize that whenever we turn any logging on, we're not particularly concerned about performance.  Traversing over the object to filter out the private stuff isn't all that difficult or expensive.  So a prefix is a potentially workable solution and is arguably more self-explanatory than a second object dict.  Other options would be 'censor', etc.

- A lot of these logging frameworks also potentially include string formatting to present human readable strings.  I don't think we need or want that.

- Ideally the implementation would be fairly simple and we could use it in the front-end too, even if that's a more pared-down lookalike version and we just require that the front-end only ever log public stuff and it just uses dump() to stdout like we do now.  As someone who trawls a lot of our logs, I think machine-parseable log output from the UI would be handy.
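A sketch of the prefix-based censoring combined with the JSON-stream output, following the pvtOp/pvtName convention from the examples above; the function names are hypothetical:

```javascript
// Drop any detail key carrying the 'pvt' prefix, giving the
// private-information-excluding tier of the 3-tier logging scheme.
function censorPrivate(details) {
  const out = {};
  for (const key of Object.keys(details)) {
    if (!key.startsWith('pvt')) out[key] = details[key];
  }
  return out;
}

// Emit a stream of simple JSON objects, log4blah-style; private keys
// survive only when includePrivate logging is explicitly turned on.
function makeLogger(write, { includePrivate = false } = {}) {
  return function log(event, details = {}) {
    write(JSON.stringify({
      event,
      details: includePrivate ? details : censorPrivate(details),
    }));
  };
}
```

Because the filtering traverses the details object at log time, it costs nothing at the call-site, and the 'pvt' prefix is self-explanatory when reading the code.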
Awesome, I agree 200% with all of that. It seems like it will make the tests a lot more approachable and flexible, too.

When do we get to start? :-)
While getting the cronsync tests going on bug 987931 I think I've determined we probably also want some standard idiom for stalling/intercepting a call or otherwise verifying a control-flow/"interestingness" barrier.

For example, in cronsync, we have a bug where we aren't waiting for a save to complete.  One of the important things the test is doing is intercepting the save call so that we can make sure that the sync process doesn't keep going.  I've also done some other simple dirty tricks like stealing the mutex for folders so I can control when those things get going.


Now is the time to look at the attached image.  It's an attempt to visualize the general control flow of a cronsync operation.  The cluster boxes represent high-level semantic operations we can trigger.  Inside these boxes can be yet more semantic operations.

What I try to do via trickery in bug 987931 is slice things up so that each of the smaller boxes happens in one or more test steps.  And in many ways that's what the idea of the whole test framework is/was.  Break all complex stuff up into sanely-sized test steps where the logs are somewhat human comprehensible.

It would be nice in our newer/better world to not need to engage in as much trickery, or to standardize it somewhat more.

When working on jstut/narscribblus I attempted to accomplish at least the understanding aspect of it by using labeled promises which could be logged and then visualized.  Pictures and stuff can be seen at http://www.visophyte.org/blog/?p=652


I haven't totally thought it through, but I think perhaps we could accomplish what I'm after and clean things up by doing something similar with promises.  Their very nature makes it possible for a testing framework to arbitrarily meddle.

For example, saveAccount could do (pseudocode):
  new LoggedLabeledPromise(accountLogger, 'saveAccount', ...)
and then testing code could do:
  p.firstAccount.interceptPromise('saveAccount').then(function(intercepted) {
    console.log('they said!', intercepted.resolvedWith);
    intercepted.okayResolveItForRealsYo();
  });
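One way the interception machinery could be wired up underneath that pseudocode, with every name hypothetical: the code under test routes its resolutions through a labeled choke point, and a registered intercept holds the resolution hostage until the test releases it.

```javascript
// Sketch of promise interception for tests (names illustrative).
function makeInterceptable() {
  const intercepts = new Map();

  // Test side: register interest in a labeled resolution; the returned
  // promise delivers an interception handle when the label fires.
  function interceptPromise(label) {
    let grab;
    const handle = new Promise((resolve) => { grab = resolve; });
    intercepts.set(label, grab);
    return handle;
  }

  // Code-under-test side: resolve normally, unless a test is holding
  // this label, in which case resolution is deferred until the test
  // calls okayResolveItForRealsYo().
  function labeledResolve(label, value, resolve) {
    const grab = intercepts.get(label);
    if (grab) {
      intercepts.delete(label);
      grab({
        resolvedWith: value,
        okayResolveItForRealsYo: () => resolve(value),
      });
    } else {
      resolve(value);
    }
  }

  return { interceptPromise, labeledResolve };
}
```

With no intercept registered the resolution passes straight through, so production code pays essentially nothing; a test that registers `interceptPromise('saveAccount')` gets to inspect `resolvedWith` and decide when the save is allowed to complete.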


It might also be possible to have a semi-magical "barrier" helper which does the following:
- intercepts a promise
- makes sure nothing happens, or a certain class of things don't happen, possibly specifically enumerated
- there's the question of "how long is long enough to make sure nothing has happened?".  Options include:
  - be diligent about tracking outstanding timers/IndexedDB queries/other async APIs that might still be working in the background
  - rely on the execution container to tell us that the coast is clear; aka our cpu load has hit zero and we have no pending I/O/etc.
  - just wait for a few seconds.  we only do this when running "full"/"serious" testing mode or something.  Most changes to the code are unlikely to cause the type of breakage the barriers would be needed to detect, so even requiring a human doing a review to run with the knob turned up isn't that bad.
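The first option above (diligently tracking outstanding work) could be sketched as a simple counter-based barrier; the names are illustrative and real timer/IndexedDB tracking would hook begin/end into those APIs:

```javascript
// Quiescence barrier sketch: count pending async operations and
// resolve the barrier only once the count drops to zero.
function makeQuiescenceBarrier() {
  let pending = 0;
  let waiter = null;
  return {
    get pending() { return pending; },
    begin() { pending++; },
    end() {
      pending--;
      if (pending === 0 && waiter) {
        const resolve = waiter;
        waiter = null;
        resolve();                 // the coast is clear
      }
    },
    // Resolves once no tracked work remains outstanding.
    whenQuiescent() {
      return new Promise((resolve) => {
        if (pending === 0) resolve();
        else waiter = resolve;
      });
    },
  };
}
```

A test would wrap suspect async entry points so they call begin()/end(), then await whenQuiescent() as its "make sure nothing else happens" checkpoint instead of sleeping for a few seconds.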


Anywho, more for me to think on this, and :mcav, I'm most definitely interested in your thoughts on this!
Assignee: nobody → m
Attachment #8580437 - Flags: review?(bugmail)
Comment on attachment 8580437 [details] [review]
[gaia-email-libs-and-more] mcav:logic-tests > mozilla-b2g:master

Fantastic work.  Some small requests in the pull request, most of which can just be follow-ups.  I trust you to address this all so r=asuth now, but feel free to re-flag if there's something you really want my eye on before landing.
Attachment #8580437 - Flags: review?(bugmail) → review+
Comment on attachment 8580437 [details] [review]
[gaia-email-libs-and-more] mcav:logic-tests > mozilla-b2g:master

A few more changes that I'd like you to review, partially for sanity but also so you can see the changes before they go in; you can just view the latest commit: https://github.com/mozilla-b2g/gaia-email-libs-and-more/commit/a6618d7c136027c3606285b876a80ebf99f13789

I went back over the list of comments now and I think I addressed everything; all of the suggested followup bugs were small enough to warrant just including here.
Attachment #8580437 - Flags: review+ → review?(bugmail)
Attachment #8580437 - Flags: review?(bugmail) → review+
Backed out due to b2g-inbound failing[1] on esprima r.js parsing. Note that the gaia-try run was green[2], as well as local runs, indicating something is not the same on b2g-inbound.

Investigating with :kgrandon now.

[1] https://treeherder.mozilla.org/logviewer.html#?job_id=1632837&repo=b2g-inbound

[2] https://treeherder.mozilla.org/#/jobs?repo=gaia&revision=cd69c93b1764a46cf0b17ff050f8896c0c217850
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 8588454 [details] [review]
[gaia] mcav:refactor-gelam > mozilla-b2g:master

r+ for autolander
Attachment #8588454 - Flags: review+
Keywords: checkin-needed
Attempting to reland now that https://bugzilla.mozilla.org/show_bug.cgi?id=1087811 (fixing the minification issue directly, rather than the b2g-inbound bug) has landed.
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Backed out again due to the same failure:

https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/p8LPMPwuSfWVp1_y22f2Nw/2/public/logs/live_backing.log

Not sure what's going on, need to talk to kgrandon/jlal/jrburke.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #8589260 - Flags: review+
Keywords: checkin-needed
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED