Closed Bug 1287365 Opened 8 years ago Closed 7 years ago

Design for the future of TPS and end-to-end Sync testing

Categories

(Cloud Services :: QA: Test Automation, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kthiessen, Assigned: kthiessen)

Details

Creating a public bug to bring out the conversation from secure bug 1280716 about how we should best make use of the tests in TPS.

Ryan, Mark, Thom, Stuart:  Please feel free to tag in other folks via cc if you think they would have relevant insights.
Ryan and I came up with a vague plan:

* Add a new test to TPS whose primary concern is to make sure that Sync works in a very general and minimal sense, and have only this single test run against prod, on a schedule yet to be determined. This would probably involve exactly 2 logins each time it is run, so it shouldn't be noticeable in prod.

* Run the full TPS suite against a non-prod environment. Ideally we do this after bug 1249520 lands, so that we can add a cmdline param naming the server to use for the config (although if that bug gets blocked for any reason, we could do so without it). Exactly which "non-prod" environment is TBD, but for TPS failures to be noticed and actioned, it would probably need to be very close to the production code and have relatively high availability.
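
To make the split in the two bullets above concrete, here is a rough sketch of how a wrapper script might pick the test set per environment. Everything except test_sync.js is an assumption for illustration: the environment labels, the other test file names, and the RunPlan/plan_for helpers are hypothetical and are not part of TPS.

```python
from dataclasses import dataclass

# Hypothetical file names: only test_sync.js is mentioned in this bug.
# The minimal prod test had not been written at this point in the thread.
MINIMAL_TESTS = ("test_minimal_sync.js",)
FULL_SUITE = ("test_sync.js", "test_example_a.js", "test_example_b.js")


@dataclass(frozen=True)
class RunPlan:
    """Which tests to run against which environment (illustrative only)."""
    environment: str
    tests: tuple


def plan_for(environment: str) -> RunPlan:
    """Prod gets only the single minimal test; any non-prod
    environment gets the full suite."""
    if environment == "prod":
        return RunPlan(environment, MINIMAL_TESTS)
    return RunPlan(environment, FULL_SUITE)
```

The point of keeping the prod plan to one test is the one made above: a couple of logins per run should be invisible in production, while the heavier suite stays on a non-prod server.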

Thoughts?
test_sync.js is a reasonable approximation of "is sync working" AFAIK, but I'm not sure if it's minimal (or thorough) enough to be a good candidate for this. https://dxr.mozilla.org/mozilla-central/source/services/sync/tests/tps/test_sync.js
> Run the full TPS suite against a non-prod environment.

Stage seems like as good a non-prod environment as any, and has the advantage of being as close to production as possible.  :jrgm how would you feel about running TPS against stage rather than prod?

An alternative would be to deploy a dedicated FxA stack just for TPS, but I don't think we need the overhead of keeping yet another stack up and running and up-to-date.
Flags: needinfo?(jrgm)
Blocks: 1219456
Flags: needinfo?(jrgm)
An update here: TPS is now running a very gentle test against Prod once a day around midnight Pacific time.  TPS is also running against Stage thrice daily, around 09:00, 13:00, and 17:00 Pacific.  The Stage test uses the entire suite.  Mail is sent to sync-core@mozilla.com on failures -- repeated failures will generally cause me to disable jobs manually until problems have been cleared up.
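
For reference, the schedule described in this update can be written down as data. This is only a sketch: the job names and the jobs_due helper are made up for illustration, and the real jobs are not necessarily configured this way. Times are Pacific local time, as stated above.

```python
from datetime import time

# Hypothetical job names; run times are Pacific local time.
SCHEDULE = {
    "prod-minimal": [time(0, 0)],
    "stage-full": [time(9, 0), time(13, 0), time(17, 0)],
}


def jobs_due(now: time) -> list:
    """Return the names of jobs that have a run scheduled
    within the same clock hour as `now`."""
    return sorted(
        name
        for name, runs in SCHEDULE.items()
        if any(run.hour == now.hour for run in runs)
    )
```

With this table, jobs_due(time(13, 5)) picks out only the Stage job, and an hour with nothing scheduled gives an empty list.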

All in all, this is pretty hackish automation, but it's groundwork.

Folks are invited to chime in either here or on sync-dev if there are any comments/questions.  If there aren't, I plan to close this bug as RESOLVED/FIXED in one week's time.
Thom suggests moving Stage to once a day, at 09:00 UTC.  That config change has been made, and I will await his word that his current changes have landed and made it into Nightly before I re-enable the Stage job.
Thanks to all, but especially :tcsc, for pushing this over the finish line.  Once we have the QA/TestEngineering Jenkins up and stable, we may move these jobs there, but for now there is no hurry.

Thanks again, everybody.  I hope these jobs prove useful.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED