Output our test/device status mapping into a table

RESOLVED FIXED

Status

defect
P2
normal
RESOLVED FIXED
6 years ago
4 years ago

People

(Reporter: zcampbell, Assigned: Bebe)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 5 obsolete attachments)

Reporter

Description

6 years ago
We have a little bit of a problem with test transparency in that a lot of people don't realise what test coverage we have and when/where it is enabled.

It would be great to have some kind of public facing 'status' page which shows all of our tests and then a matrix of what is enabled and where.

This will enables devs to know how much then can trust Travis and TBPL when doing check-ins; whether their pull request will be covered.

I am envisaging (although open to ideas) a table with rows for each test (grouped by Class) and columns for "Hamachi" , "Travis", "TBPL" and perhaps emulator in the future.

In each cell would be some kind of colour legend to match the HTML report:
Green - enabled
Red - expected fail
Yellow - Skipped 

At the top of the page should be a broad summary of how the tests are run, for example summarizing the restart/cleanUp/fresh profile at the start of each new test and perhaps the capabilities of each device.

We can use the ManifestDestiny parser to parse the manifest.ini files with Python:
https://mozbase.readthedocs.org/en/latest/manifestdestiny.html

This *needs* to be 100% automated as the state of the tests changes way too much on each daily basis for it to be maintainable. However we can reach the 100% automation in steps if need be.

I am not sure where we can store this online just yet, we need a webqa server somewhere (aws?).
Assignee: nobody → florin.strugariu
Posted file Is this a good start? (obsolete) —
Attachment #8344780 - Flags: feedback?(zcampbell)
Attachment #8344780 - Flags: feedback?(stephen.donner)
Attachment #8344780 - Flags: feedback?(dave.hunt)
Reporter

Comment 2

6 years ago
Comment on attachment 8344780 [details]
Is this a good start?

Hell yeah that's a really good start Bebe!

test_clock_set_alarm_time seems wrong, I know that test is disabled but for Hamachi it says "Running". It looks like "Expected to" and "Hamachi columns are mixed up.
We don't need "Expected to" anyway as it's device specific.

Also change the "class" column to be "App" I think that's more appropriate as that's how we group the tests.

I think we should change the tense of the labels too:
Running = Enabled
Failed = Expected to fail

Do you think disabled should be red colour? I think red is more serious; it should be disabled.
Attachment #8344780 - Flags: feedback?(zcampbell)
Oh my, yes.  This, please.  Stat.  It'll at least help me be a /tad/ bit more sane :-\  Great, great work; TY SO MUCH.
(In reply to Zac C (:zac) from comment #2)
> Comment on attachment 8344780 [details]
> Is this a good start?
> test_clock_set_alarm_time seems wrong, I know that test is disabled but for
> Hamachi it says "Running". 
I think you are wrong on this Zac:
https://github.com/mozilla-b2g/gaia/blob/master/tests/python/gaia-ui-tests/gaiatest/tests/functional/clock/manifest.ini#L21
Also what do you guys think if we would add a "reason" tag in the manifest files?

This would help us better identify why the tests have some marks 
It would look something like this:

[test_cost_control_data_alert_mobile.py]
skip-if = device == "desktop"
wifi = false
expected = fail
reason = Bug 942730 - The send_keys() method of input element (type is number) will raise MarionetteException
Reporter

Comment 6

6 years ago
(In reply to Florin Strugariu [:Bebe] from comment #4)

> I think you are wrong on this Zac:
> https://github.com/mozilla-b2g/gaia/blob/master/tests/python/gaia-ui-tests/
> gaiatest/tests/functional/clock/manifest.ini#L21

It says there "disabled=" ?
Reporter

Comment 7

6 years ago
(In reply to Florin Strugariu [:Bebe] from comment #5)
> Also what do you guys think if we would add a "reason" tag in the manifest
> files?
> 

I agree this information would be very useful in this report but we should probably fix the inconsistency in the framework rather than do a hack job on the manifest files.

The inconsistency is where we have:
disabled = <reason>
but for expected fail, for example, we have to do:
# <reason>
expected = fail
Comment on attachment 8344780 [details]
Is this a good start?

Looks great, but I'm finding it difficult to follow. You have an 'Expected To' column but these are often conditional. Shouldn't this be incorporated into the target columns?

I would expect something like:

-------------------------------------------
|      | Device          | Desktop        |
|      |-----------------|----------------|
| Test | Inari | Hamachi | Travis | TBPL  |
-------------------------------------------
| Name | State | State   | State  | State |
-------------------------------------------

Where 'State' would be one of:
* Enabled
* Disabled
* Skipped
* XFailed

I think it would also be worthwhile allowing State to be empty, or '--' to indicate that it is not included in the manifest at all.

I don't think we need the test class, but would be good to have a tooltip when hovering over the test name for the full path.

Let's exclude emulator for now as we're not currently running any tests against it.
Attachment #8344780 - Flags: feedback?(dave.hunt) → feedback+
(In reply to Florin Strugariu [:Bebe] from comment #5)
> Also what do you guys think if we would add a "reason" tag in the manifest
> files?
> 
> This would help us better identify why the tests have some marks 
> It would look something like this:
> 
> [test_cost_control_data_alert_mobile.py]
> skip-if = device == "desktop"
> wifi = false
> expected = fail
> reason = Bug 942730 - The send_keys() method of input element (type is
> number) will raise MarionetteException

This wouldn't work if we had different reasons for skip-if and fail-if. I agree with Zac that the best solution would be to enhance ManifestDestiny to allow specifying a reason for expected, skip-if, and fail-if. This is bug 922581.
I forgot to suggest in comment 8 that the disabled reason would work well as a tooltip on the state so we can use the full reason without compromising the layout. It would also be a good idea to include a couple of sentences before the report to help people using it for the first time.
Reporter

Comment 11

6 years ago
I like the extra Device/Desktop distinction there but I reckon we can give Inari a miss for now, it's not really part of our target audience. However in the future there may be two devices supported so we have to keep that in mind to be able to add that.

+1 tooltip when not blocked by 922581 :(
(In reply to Zac C (:zac) from comment #11)
> I reckon we can give Inari a miss for now

Sure, makes sense to me.
Comment on attachment 8344780 [details]
Is this a good start?

+1 for comment 8
Attachment #8344780 - Flags: feedback?(stephen.donner) → feedback+
Attachment #8344780 - Attachment is obsolete: true
Posted file report.html (obsolete) —
Take a look and add suggestions
If this is OK we should figure it out where to have it?
Attachment #8349505 - Flags: feedback?(zcampbell)
Attachment #8349505 - Flags: feedback?(stephen.donner)
Attachment #8349505 - Flags: feedback?(dave.hunt)
Attachment #8349505 - Attachment mime type: text/plain → text/HTML
Comment on attachment 8349505 [details]
report.html

Very cool Bebe, I can see a couple of tests that should be added to the TBPL manifest from this!

When I hover over the test name I'm expecting to see the full path of the test, but at the moment it looks like it's showing the parent directory. Could you fix this?

Also, there's no tooltips for the states as far as I can tell. I should be able to hover over a 'disabled' state to see the reason for it being disabled. Perhaps some don't have reasons? Maybe we should indicate those with reasons by adding a * or similar.

I would make 'disabled' grey instead of red. To me, red would indicate something is failing and needs action. We could promote xfail to red, as that's the one most likely to need attention.

I think the next step would be to attach the code that generates this report, and we can review that.
Attachment #8349505 - Flags: feedback?(dave.hunt) → feedback+
Reporter

Comment 16

6 years ago
Hey what happened to the App column? 

I said red for disabled because disabled denotes a test that should be running but it's blocked by either a Gaia or Marionette bug, but this test should be considered part of the suite. It also needs attention for that test to be re-enabled if possible and red draws your attention to that in the way grey does not.

I guess that kind of means the same as xfailed -they both should be enabled but are blocked from passing by something that can be fixed. 
Where as grey means it's not running just because it's just not expected to on that platform.

Both disabled and xfailed should be red?
(In reply to Zac C (:zac) from comment #16)
> I said red for disabled because disabled denotes a test that should be
> running but it's blocked by either a Gaia or Marionette bug, but this test
> should be considered part of the suite. It also needs attention for that
> test to be re-enabled if possible and red draws your attention to that in
> the way grey does not.

As you mention below, it could also be disabled because it's not appropriate to the target. For example, we disable carrier tests when running against the desktop build.

> I guess that kind of means the same as xfailed -they both should be enabled
> but are blocked from passing by something that can be fixed. 
> Where as grey means it's not running just because it's just not expected to
> on that platform.
> 
> Both disabled and xfailed should be red?

There would be a lot of red :)
Reporter

Comment 18

6 years ago
(In reply to Dave Hunt (:davehunt) from comment #17)
> 
> As you mention below, it could also be disabled because it's not appropriate
> to the target. For example, we disable carrier tests when running against
> the desktop build.
> 

I guess this is difficult because we're not using the manifest file properly for desktopb2g. We're just skipping the whole test rather than skipping it on the command line with the type parameters.

Thus there is not enough distinction between a test that can't run on desktopb2g and one that is disabled because of a blocking bug. It is useful to bring the latter to attention here.
Attachment #8349505 - Attachment is obsolete: true
Attachment #8349505 - Flags: feedback?(zcampbell)
Attachment #8349505 - Flags: feedback?(stephen.donner)
Attachment #8350023 - Flags: review?(zcampbell)
Attachment #8350023 - Flags: review?(dave.hunt)
Posted file This is how it would look like (obsolete) —
hope all looks OK now
Attachment #8350025 - Flags: feedback?(zcampbell)
Attachment #8350025 - Flags: feedback?(stephen.donner)
Attachment #8350025 - Flags: feedback?(dave.hunt)
Attachment #8350025 - Attachment mime type: text/plain → text/HTML
(In reply to Zac C (:zac) from comment #18)
> I guess this is difficult because we're not using the manifest file properly
> for desktopb2g. We're just skipping the whole test rather than skipping it
> on the command line with the type parameters.

I wouldn't say we're misusing the manifest here. You're right that there are at least two different reasons for disabling. Either the test is failing and we don't want to run it or xfail it (correct me if I'm wrong, but probably because it's intermittently failing), or it's not appropriate for the target. At the moment I don't think there's a good way to differentiate these. The latter are usually disabled via a skip-if, but I'm not sure I'd want any reporting to be based on that assumption.

> Thus there is not enough distinction between a test that can't run on
> desktopb2g and one that is disabled because of a blocking bug. It is useful
> to bring the latter to attention here.

Fair enough.
Comment on attachment 8350025 [details]
This is how it would look like

It's looking good, Bebe. I'll continue by reviewing your pull request.
Attachment #8350025 - Flags: feedback?(dave.hunt) → feedback+
Comment on attachment 8350023 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia/pull/14845

See pull request for comments. It would also be good to initially sort the tests, and we can remove the 'State' table cells. These were meant to represent values in comment 8, not column headers.
Attachment #8350023 - Flags: review?(dave.hunt) → review-
Comment on attachment 8350023 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia/pull/14845

Updated
Attachment #8350023 - Flags: review- → review?(dave.hunt)
Comment on attachment 8350023 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia/pull/14845

I'm going to defer to Zac as he's more familiar with the requirements. I don't think all of my comments from the previous review were addressed but it's really difficult to tell when the commits are rebased!
Attachment #8350023 - Flags: review?(dave.hunt)
Comment on attachment 8350023 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia/pull/14845

I made some changes but I don't think it's ready can you take a look pls
Attachment #8350023 - Flags: feedback?(bob.silverberg)
Bebe, are you going to revisit this and generate a json file instead?
Reporter

Comment 28

5 years ago
Comment on attachment 8350025 [details]
This is how it would look like

I'm really happy with how this looks now. f+!
I expect we'll find some issues in real world use so I'm keen to get it up somewhere like the WebQA dashboard for now and then we can refine it.

bsilverberg also has some ideas for the data/backend of it I understand.

Let's go to the next step!
Attachment #8350025 - Flags: feedback?(zcampbell)
Attachment #8350025 - Flags: feedback?(stephen.donner)
Attachment #8350025 - Flags: feedback+
Flags: needinfo?(bob.silverberg)
Reporter

Comment 29

5 years ago
Comment on attachment 8350023 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia/pull/14845

I think use `device` instead of `hamachi` so the variable names don't rot.
Attachment #8350023 - Flags: review?(zcampbell) → review-
Comment on attachment 8350023 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia/pull/14845

Looks good to me except for a couple of nits. Note that we still need a way of publishing this.

Even though it doesn't match the format, we could still add it to the MozWebQA dashboard as a tab.
Attachment #8350023 - Flags: feedback?(bob.silverberg) → feedback+
Flags: needinfo?(bob.silverberg)
I rebased the commit and now it looks OK.
It should work in jenkins with the command:

python manifest.py --manifest=<Path to manifest> --tbpl_manifest=<path to tbpl manifest> --output=<path where we want the generated html>
Bebe, what's left to do here?  Publish, still?  Bob, can you or someone else help with that?
Flags: needinfo?(florin.strugariu)
yes we need to publish this somewhere and create a job or something in Jenkins 

I could try to do it but don't know where to publish it
Flags: needinfo?(florin.strugariu)
Yes, we can use the same approach that we are using for publishing the other dashboard data extracts. It sounds like this one could use a cron job on a separate machine, as is currently being done for the xfails data. Raymond Would be the one to set that up. If we need it to run on Jenkins (I don't think we do) then it could use the same technique that we are going to use for the Marketplace-Tests dashboard data. Raymond is going to set up a job for that.

Pinging Raymond for both of these tasks.
Flags: needinfo?(mozbugs.retornam)
(In reply to Bob Silverberg [:bsilverberg] from comment #34)
> Yes, we can use the same approach that we are using for publishing the other
> dashboard data extracts. It sounds like this one could use a cron job on a
> separate machine, as is currently being done for the xfails data. Raymond
> Would be the one to set that up. If we need it to run on Jenkins (I don't
> think we do) then it could use the same technique that we are going to use
> for the Marketplace-Tests dashboard data. Raymond is going to set up a job
> for that.
> 
> Pinging Raymond for both of these tasks.

What commands should the cron job run?
Flags: needinfo?(mozbugs.retornam)
Bebe: Can you please get Raymond the info he needs? As I understand it, the job will need to clone the repo from Github, in order to have access to the manifest and the report-generating code. Then it will need to run the code to generate the report, after which the report html file will need to be pushed to Github, just like what is currently being done with the xfails dashboard data.

I will likely edit the html at some point to make it fit in a bit nicer with the current dashboard, but that can be done at any time after the job has been set up.
Flags: needinfo?(florin.strugariu)
Yes Bob those steps are correct.

1. Clone the gaia repo
2. Run the code in the pull request:
python manifest.py --manifest=<path to manifest file> --tbpl_manifest=<path to tbpl manifest> --output=<path to output file>
3. The script will create a html file in the path specified
4. Upload html file to hosting server
Flags: needinfo?(florin.strugariu)
Guys can you hep getting this in the dashboard
Flags: needinfo?(mozbugs.retornam)
Flags: needinfo?(bob.silverberg)
Raymond, this sounds like a good candidate for Dylan, as he has already done some work on the dashboard.
Flags: needinfo?(bob.silverberg)
(In reply to Bob Silverberg [:bsilverberg] from comment #39)
> Raymond, this sounds like a good candidate for Dylan, as he has already done
> some work on the dashboard.

I have a few questions.

I tok a quick look at https://github.com/mozilla-b2g/gaia/pull/14845/files which has some Hamachi specific stuff. Will this have to be modified if we switch everything over to the FLAME devices? Does this have coverage for  the Tarako devices that we have?
Flags: needinfo?(mozbugs.retornam) → needinfo?(florin.strugariu)
Raymond yes this needs update to work on Flame. I can update tomorrow if you gays want to.

There is no coverage for Tarako or any other devices.
Flags: needinfo?(florin.strugariu)
QA Whiteboard: [fxosqa-auto-backlog+]
QA Whiteboard: [fxosqa-auto-backlog+] → [fxosqa-auto-s4]
I updated this
Attachment #8350023 - Attachment is obsolete: true
Attachment #8523954 - Flags: review?(gmealer)
Attachment #8523954 - Flags: review?(dave.hunt)
Demo result
Attachment #8350025 - Attachment is obsolete: true
Comment on attachment 8523957 [details]
manifest_status.html

Geo can you take a look over this?

What other info should we have here?
Attachment #8523957 - Flags: feedback?(gmealer)
Comment on attachment 8523957 [details]
manifest_status.html

Looks pretty good to me as it is. I don't know this is the sum total of everything we'll ever want, but I'm a fan of getting this started with a basic set of info (like this) then adding incrementally if/when we need to.
Attachment #8523957 - Flags: feedback?(gmealer) → feedback+
It makes me wonder about the difference between disabled =, skip-if = and fail-if =.
I guess with fail-if =, one can still run that test, but then the result will be ignored.
But disabled = equals to skip-if = True, right?
(In reply to Martijn Wargers [:mwargers] (QA) from comment #46)
> It makes me wonder about the difference between disabled =, skip-if = and
> fail-if =.

disabled skips the test and provides a reason for the test being disabled.
skip-if skips the test if the logic resolves true.
fail-if marks the test as expected to fail if the logic is true.

> But disabled = equals to skip-if = True, right?

Yes, but without the ability to provide a reason. I would avoid using skip-if = True.
QA Whiteboard: [fxosqa-auto-s4] → [fxosqa-auto-s4+]
QA Whiteboard: [fxosqa-auto-s4+] → [fxosqa-auto-s4]
Comment on attachment 8523954 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia/pull/26196

The report looks okay but I'm not sure I'm the right person to review. Who is going to be the consumer of this report? Also, how often are we going to produce this report and where are we planning on generating and hosting it? If it was something we could include with our readthedocs then that might be useful.

That said, this does seem to overlap a bit with Test Informant [1] and there's already bug 1085402 for adding Gaia UI tests to this.

[1] https://wiki.mozilla.org/Auto-tools/Projects/Test-Informant
Attachment #8523954 - Flags: review?(dave.hunt) → review-
Comment on attachment 8523954 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia/pull/26196

Updated
Attachment #8523954 - Flags: review- → review?(dave.hunt)
Test Informant may be the better long-term solution. 

In my discussion with Bebe, I was really looking for something that we could even just run ad-hoc periodically when we wanted to get a summary of the current state, or that we could park somewhere on a cron job. 

I wasn't ready to consider--and don't really think we should block on--what this would look like in a full workflow, nor do I think this has to be officially developed, approved, and accepted etc. I honestly expected something closer to a zipfile with a tool than a Bugzilla entry with a review request. :)
Comment on attachment 8523954 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia/pull/26196

I'm r+'ing this, modulo my comments, neither of which expressed required changes.

However, I'm considering this as an ad-hoc tool when I r+. If the intention is to fold it into an official workflow, we need to consider Dave's points above. I'd personally like to see us roll out something quick and easy up front though.
Attachment #8523954 - Flags: review?(gmealer) → review+
Comment on attachment 8523954 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia/pull/26196

(In reply to Geo Mealer [:geo] from comment #51)
> However, I'm considering this as an ad-hoc tool when I r+. If the intention
> is to fold it into an official workflow, we need to consider Dave's points
> above. I'd personally like to see us roll out something quick and easy up
> front though.

I agree with Geo here. Perhaps zip up what you have and attach it to this bug. Anyone wanting to generate the report can download the attachment and run the script to generate the report. If you want it version controlled then perhaps create a gist on github, or maintain it in a fork of gaia.

That said, if you do want to land it in the main gaia repository then I think we at least need documentation for this, but I'm happy for Geo to be the reviewer.
Attachment #8523954 - Flags: review?(dave.hunt)
Comment on attachment 8523954 [details] [review]
Link to Github pull-request: https://github.com/mozilla-b2g/gaia/pull/26196

I moved the code in a separate git repo 

https://github.com/bebef1987/gaia-UI-tests-status-mapping
Attachment #8523954 - Attachment is obsolete: true
Geo can you take a look over the code and see if I need to make any improvements
Flags: needinfo?(gmealer)
I added comments to your first commit. TLDR: looks good, though I had a couple of suggestions for making it a little more robust and clear. They're all minor though.
Flags: needinfo?(gmealer)
QA Whiteboard: [fxosqa-auto-s4] → [fxosqa-auto-s4+]
QA Whiteboard: [fxosqa-auto-s4+] → [fxosqa-auto-from-s4] [fxosqa-auto-s5]
Flags: needinfo?(gmealer)
This looks terrific! Let's generate another ad-hoc one for the beginning of the year.
Flags: needinfo?(gmealer)
This is done and works as expected.
The code can be found in:
https://github.com/bebef1987/gaia-UI-tests-status-mapping

If we need any updates please open git hub issues
Status: NEW → UNCONFIRMED
Ever confirmed: false
Summary: Output our test/device status mapping into a table and make it public → Output our test/device status mapping into a table
Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.