1255802 - Reftests don't produce stable test ids in structured logs

Reporter

Description

•

8 years ago

This makes the logs rather useless for many purposes e.g. autoclassification will be ineffective for reftests, and I was unable to compare results between e10s and non-e10s builds.

The main issue seems to be the use of absolute paths, rather than relative, platform-normalised, paths, and the use of random port numbers in localhost urls.

(not currently active) Ted Mielczarek

Comment 1

•

8 years ago

This is a little goofy as currently implemented:
https://dxr.mozilla.org/mozilla-central/rev/102886e9ac63b3fa33a6f1b394aea054abce2dfd/layout/tools/reftest/reftest.js#1065

The tests are always treated as URLs, and the paths in the manifests are resolved relative to the manifests, which are also treated as URLs. The manifest paths get from the Python harness into the JS harness by way of the reftest.manifests pref:
https://dxr.mozilla.org/mozilla-central/rev/102886e9ac63b3fa33a6f1b394aea054abce2dfd/layout/tools/reftest/runreftest.py#271

And that gets passed down from mozharness here:
https://dxr.mozilla.org/mozilla-central/rev/102886e9ac63b3fa33a6f1b394aea054abce2dfd/testing/mozharness/configs/unittests/linux_unittest.py#216

We'd have to also pass in that "tests/reftest/tests/" bit of the path as the root test directory, and pass it down into ReadManifest so we could resolve the test paths relative to it, then use those as the pretty paths. Sounds like a pain, but it should work.

Kyle Lahnakoski [:ekyle]

Comment 2

•

8 years ago

For me, the solution is to let the ETL pipeline deal with the translation:  We can have the test framework (with absolute URLs), remain decoupled from the analysis database (with normalized URLs).   URLs are probably not the only format issue, and the pipeline can perform all the data scrubbing and cross referencing required to produce the quality required.  Since the pipeline code is stateless and functional, it can be replayed, debugged, and coded faster than either the data producer or the data consumer.  Having an explicit pipeline simplifies the producer; letting the code focus on the task, rather than data munging.  Other consumers benefit too:  The autoclassifcation database does not demand format changes to a data stream being consumed by other systems, like ActiveData.  Consumers can make high level demands about what metrics should be available, but should have no say in the output format. 

Ted does bring up point about "goofy" code:  If it is the wish of the reftest coder to cleanup the output, then that is commendable, and a legitimate reason for changing the output format.  

I am concerned with the tight coupling between our data producers and consumers; leading to producers changing formats for no good reason (other than to (T)ransform the data for some consumer), and every producer knowing about every consumer, and their particular data needs.

Andrew Halberstadt [:ahal]

Comment 3

•

8 years ago

I perhaps agree with this sentiment, but disagree about what is the producer and consumer in this context. IMO, the structured log is the end point here. ActiveData isn't necessarily the only consumer of these structured logs, so it would be good to have the paths normalized before the structured log is written out.

I think we could either produce normalized output from the get-go in reftest.js, or perhaps the python OutputHandler could normalize it after the fact. The former seems slightly preferable.

Andrew Halberstadt [:ahal]

Comment 4

•

8 years ago

Another thing to consider.. We'll still want to keep the 'path' in the raw data. The mozlog protocol has a 'path' attribute on the test_start action we can use. But given reftests potentially require two paths, we might need to turn this into a list?

James Graham [:jgraham]

Reporter

Comment 5

•

8 years ago

Right, we can't make downstream consumers clean up the data because there are so many different consumers. The point of putting requirements on the producers is exactly to reduce the coupling; a consumer should be able to assume that it will get a mozlog file that can be usefully processed without knowledge of the specific testsuites involved (except where they happen to know that the testsuite uses an extension point).

Kyle Lahnakoski [:ekyle]

Comment 6

•

8 years ago

As a human being, The structured logs have perfectly acceptable URLs; I can see past the interesting string prefixes and perform a comparison between platforms. This means the reftest output is just fine for analysis. This discussion on URLs is only superficial reformatting to suit a specific use case for specific consumer. I do not believe consumers should be dictating format.

Normalizing the URLs run the risk of information destruction: What happens to the scheme and path prefixes? Are they just dropped? Is the developer running reftests using those URLs to jump to local artifacts? Are we going to add more properties so we have both a absolute and relative URLs?

I find it suspicious that the URLs are suddenly unacceptable now that we have a new consumer. I fear this is the beginning of significant reformatting, information destruction, and introducing new (unstable) code, without adding value.

On the subject of "structured logs": I see it as an effort to overcome the parsing problems of the text logs; every little effort toward structuring the remaining text will help machine ingestion immensely. Yet, using "structured logs" as the one true test format is less laudable: Different suites are now coupled to each other's format, each forcing values into properties that would not make sense on their own, or producing properties that have no meaning, or dropping inconvenient information, or dumping inconvenient information into a property string. Reftests (and the other tests too) could probably tell us more if it wasn't limited to a limited set of properties. The strict "structured log" format is preventing us from fully structuring the whole log, one suite at a time, so our machines can fully digest them.

Andrew Halberstadt [:ahal]

Comment 7

•

8 years ago

(In reply to Kyle Lahnakoski [:ekyle] from comment #6)
> As a human being, The structured logs have perfectly acceptable URLs; I can
> see past the interesting string prefixes and perform a comparison between
> platforms.  This means the reftest output is just fine for analysis.  This
> discussion on URLs is only superficial reformatting to suit a specific use
> case for specific consumer.  I do not believe consumers should be dictating
> format.  

This isn't consumers dictating requirements. A test id should be the same no matter what platform the test gets run on. Running the test on windows instead of linux doesn't make it a completely different test.

> Normalizing the URLs run the risk of information destruction:  What happens
> to the scheme and path prefixes?  Are they just dropped?  Is the developer
> running reftests using those URLs to jump to local artifacts?  Are we going
> to add more properties so we have both a absolute and relative URLs?

Yes, this is what I was talking about in comment 4. We don't want to lose any information.

> I find it suspicious that the URLs are suddenly unacceptable now that we
> have a new consumer.  I fear this is the beginning of significant
> reformatting, information destruction, and introducing new (unstable) code,
> without adding value.

They aren't suddenly unacceptable. Reftest structured logging just landed recently and nothing other than ActiveData is consuming them. It was unacceptable from the start, I just didn't have time to do that part when I landed it. Before structured logging, there were no consumers so we never really had to worry about it.

> On the subject of "structured logs": I see it as an effort to overcome the
> parsing problems of the text logs; every little effort toward structuring
> the remaining text will help machine ingestion immensely.  Yet, using
> "structured logs" as the one true test format is less laudable:  Different
> suites are now coupled to each other's format, each forcing values into
> properties that would not make sense on their own, or producing properties
> that have no meaning, or dropping inconvenient information, or dumping
> inconvenient information into a property string.  Reftests (and the other
> tests too) could probably tell us more if it wasn't limited to a limited set
> of properties.  The strict "structured log" format is preventing us from
> fully structuring the whole log, one suite at a time, so our machines can
> fully digest them.

This is true, and some balance needs to be struck. Certain log actions already have an 'extra' dict that data can be stuffed into. A place to store arbitrary data should only be used as a last resort though. Most of the time, it's easy enough to mold the existing schema to fit whatever new requirements a test harness might have.

James Graham [:jgraham]

Reporter

Comment 8

•

8 years ago

> I find it suspicious that the URLs are suddenly unacceptable now that we have a new consumer

This isn't the case. The output has always been unacceptable compared with the intent of the format (that "test" contains a stable, unique identifier for the testcase) and doesn't work with any existing consumer that wants to compare tests or test results between runs, for example:

* The wptview test result viewer
* The e10s/non-e10s comparison tool
* The autoclassification feature in treeherder

I don't recall if test-informant tries to list which tests have been added or removed, but if it does that will also be affected.

The fact that converting the actual URL loaded into something stable causes information loss simply isn't a huge concern to me. It's generally not possible to control the unstable parts of these urls across runs (e.g. I assume the ports are being picked at random, and the source path is what it is), and there is a great deal of more relevant information that we make no attempt to store (the value of environment variables, the physical hardware being used, etc.). Optimising for the possibility of bugs that happen only when the source is in a specific location, for example, seems unreasonable. Plus the log will generally contain action="log" messages with additional information that a human can use to reconstruct more of the environment.

As far as structured logging not being able to represent everything, it intentionally has defined extension points, and should be forward-compatible enough that we can extend the format as required.

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Bugzilla

Quick Search

Reftests don't produce stable test ids in structured logs

Categories

(Testing :: Reftest, defect)

Tracking

(Not tracked)

People

(Reporter: jgraham, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Updated