Closed Bug 1020516 Opened 10 years ago Closed 8 years ago

Mochitest log messages dumps on TBPL lose timing information

Categories

(Testing :: Mochitest, defect)

defect
Not set
normal

Tracking

(firefox52 fixed)

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: drno, Assigned: ahal)

References

Details

Attachments

(4 files, 2 obsolete files)

When SimpleTest.js is used on TBPL is buffers all the log messages instead of printing them right away.
In case of a test failure the buffer content gets dumped into the log file, but the timestamps of these message are all the time of when the mass dumping happens. This not helpful at all in terms of figuring out when the mochitest log statements happened, or how much time passed in between these buffered log messages.
Initial version which adds the original timestamp to the dumped logs messages.
Attachment #8434353 - Flags: review?(hskupin)
Comment on attachment 8434353 [details] [diff] [review]
bug_1020516_add_timestamps_to_buffered_log.patch

Review of attachment 8434353 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not that familiar with dump calls for mochitests. Ted will be a much better reviewer here.

::: dom/media/tests/mochitest/pc.js
@@ +1937,5 @@
>        return n;
>      }
>  
> +    var ex = unexpectedEventAndFinish("Hello World", "exception");
> +    ex();

This is only in for testing, right?
Attachment #8434353 - Flags: review?(hskupin) → review?(ted)
Removed the development code which whimboo pointed out. Thanks!
Attachment #8434353 - Attachment is obsolete: true
Attachment #8434353 - Flags: review?(ted)
Attachment #8434476 - Flags: review?(ted)
Could you give some example log output with this change? (Just want to make sure it doesn't break log parsing)
Here is my test run on Try:
https://tbpl.mozilla.org/php/getParsedLog.php?id=40976200&tree=Try&full=1#error0
Right above that error message.

E.g.
12:19:20     INFO -  721 INFO 12:19:20 TEST-INFO | /tests/dom/media/tests/identity/test_peerConnection_peerIdentity.html | Analyzing element: pcRemote_remote_audiovideo

As it contains multiple time and "INFO" it might confuse log parsers. But obviously we can still change the output format, e.g. put the recorded timestamp at the end of the line, or mark the recorded timestamp with a keyword,...

BTW I'm also not a big fan that the same function converts all TEST-PASS into INFO's. I could try to clean that up as well, but it is not such a big pain point for me right now. But if it helps with the log parsing, let me know
I'm not really wild about having the timestamps duplicated, that seems confusing. I looked into what would be required to swap your saved timestamp for the older timestamp and it seems a little invasive. You'd have to add a timestamp argument to the TestRunner.log function:
http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/tests/SimpleTest/TestRunner.js#210

and then to the LogController.log function:
http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/tests/SimpleTest/LogController.js#60

and the logWithLevel function that it calls, and then into createLogMessage, which is where the timestamp is currently populated.
Sure two timestamps, especially as presented in the current patch are not super nice.
But changing the timestamp of the logged event could also lead to all kind of confusion. I know there is no guarantee that your log is precisely sorted by time, but if you suddenly see log messages with timestamps from several seconds or even minutes ago...

How about we log something like this instead:

12:19:20     INFO -  721 TEST-INFO | /tests/dom/media/tests/identity/test_peerConnection_peerIdentity.html | Analyzing element: pcRemote_remote_audiovideo -- (event time 12:19:20)
I think that bug 886570 is going to make this irrelevant. Ahmed: is my suspicion correct? If so it might be better to just wait for that to land.
Flags: needinfo?(akachkach)
Yes, bug 886570 re-implements the buffering logic on the Python side so we should probably wait for it to land before we make any changes.
Flags: needinfo?(akachkach)
Comment on attachment 8434476 [details] [diff] [review]
bug_1020516_add_timestamps_to_buffered_log.patch

In light of that I'm going to drop this and we can revisit it after that bug lands.
Attachment #8434476 - Flags: review?(ted)
(In reply to Ahmed Kachkach [:akachkach] from comment #9)
> Yes, bug 886570 re-implements the buffering logic on the Python side so we
> should probably wait for it to land before we make any changes.

As this has landed now, can we revisit this per bug 1051840?
Flags: needinfo?(akachkach)
Yes!

So currently, the mochitest logs don't show any timing information (the time you can see before the logs is added by mozharness I believe, which is just the time at which the log was emited by Python).

One thing we could do is use the "time" field present in all logs (== time the log message was created, in the JS side) and the "message" field: it's present for "log", "test_status" and "test_end" actions (so basically all actions we use in mochitest). We could just do the following when dumping buffered messages:

  message['message'] = "{} event time {}".format(message['message'], message['time']))

Note that I'm still working on refactoring the buffering on bug 1055765
Flags: needinfo?(akachkach)
Blocks: 1170176
FYI the incorrect timestamps are still causing the WebRTC team pain. We would highly appreciate if someone could take a serious look at this.
I think duplicate timestamps is likely the way to go. As long as they are only printed for the buffered messages that get dumped on test failure. Alternatively, we could move away from a mozharness imposed log format. I'd prefer the latter, but it is probably very involved.

As a temporary work around, you could download the raw (unstructured) log and pipe it through one of the mozlog formatters that does have timestamps:

  $ pip install mozlog
  $ wget -q -O- <raw log url> | gunzip | structlog format mach

You can find the raw log by clicking the job and looking for "artifact uploaded" in the bottom left of treeherder.
(In reply to Andrew Halberstadt [:ahal] from comment #14)
> I think duplicate timestamps is likely the way to go. As long as they are
> only printed for the buffered messages that get dumped on test failure.
> Alternatively, we could move away from a mozharness imposed log format. I'd
> prefer the latter, but it is probably very involved.

I would prefer the quicker solution, but understand if a better solution is preferred. As long as we have at least a solution at some point :-)

> As a temporary work around, you could download the raw (unstructured) log
> and pipe it through one of the mozlog formatters that does have timestamps:
> 
>   $ pip install mozlog
>   $ wget -q -O- <raw log url> | gunzip | structlog format mach
> 
> You can find the raw log by clicking the job and looking for "artifact
> uploaded" in the bottom left of treeherder.

Thanks. That should help for now :-)
(In reply to Nils Ohlmeier [:drno] from comment #15)
> > As a temporary work around, you could download the raw (unstructured) log
> > and pipe it through one of the mozlog formatters that does have timestamps:
> > 
> >   $ pip install mozlog
> >   $ wget -q -O- <raw log url> | gunzip | structlog format mach
> > 
> > You can find the raw log by clicking the job and looking for "artifact
> > uploaded" in the bottom left of treeherder.
> 
> Thanks. That should help for now :-)

Hmm that does not do much. The output is empty for this log:
https://treeherder.mozilla.org/logviewer.html#?job_id=1574400&repo=mozilla-central

And manually looking at the downloaded raw log does not reveal any of the raw information being saved.
Ah, looks like the blobber uploads were moved to the "Job details" tab. So for jobs that failed, you'll need to switch over to it from the "Failure summary" tab. Look for something called <suite>_raw.log. For the one you linked, the url you'd use is:
http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-central/sha512/324af9fe03f4fface80becd2078dc4a9098409632c02e99ff6a994ed5552b5058032c004355c5a27982e540c6db8b7d0abc047745c6a94d8918653b587abe30e
Attachment #8434476 - Attachment is obsolete: true
So this problem is still causing severe pain for the WebRTC testing. E.g. on Android we are running all mochitests without NSPR and nICEr logs. So if things fail we, just with the output from mochitest, we have no idea about the timing of events. E.g. did one of our events arrived 2min late, probably because the emulator is over loaded.
FYI we are actively discussing to de-activate all WebRTC PeerConnection tests on Android emulator, because it's so hard to debug them.

This feels like trying to fly an airplane blind folded, just listening to the voice warnings from the auto pilot.

As this hasn't moved a single bit I tried to take another crack at it as shown in the attached MozReview. My new idea is to only add the "real time" time stamps to the log lines if the test requested it explicitly. So any pain from these extra time stamps could then be limited to the WebRTC tests, which would all request these extra time stamps.
Now the new structured logging code is really not that easy to understand (the previous patch took me not long to come up with) how/where I could write the added time stamp to the log line. So if someone could please point me to the code where this could be done so that I can add it I would appreciate it.
Bug 1273314 is yet another case where having the actual time of when a log line got created would be really helpful.

Who can help me to point me to the right code to implement this somehow with the new structured logger?
Flags: needinfo?(ted)
Flags: needinfo?(ahmed.kachkach)
Flags: needinfo?(ahalberstadt)
See Also: → 1273314
Sorry this stagnated for so long, must have slipped through the cracks. Nils, does the patch I attached good enough for your case? It logs a timestamp on a new log line, and then all proceeding logs have that timestamp. It only works with a granularity of seconds though, so if this isn't good enough, we can certainly log it for each line.

I think I'd rather log a second timestamp (in a bracket at the end) than introduce a new API to enable timestamps. I agree that buffered logs should just have their timestamp reported automatically.
Flags: needinfo?(ted)
Flags: needinfo?(drno)
Flags: needinfo?(ahmed.kachkach)
Flags: needinfo?(ahalberstadt)
Assignee: nobody → ahalberstadt
Status: NEW → ASSIGNED
Thanks Andrew for responding so quickly.

Proposing a new API just an attempt to get more people on board for getting the original event time stamps somehow into the logs. Any solution without a new API is/would be better.

Granularity of seconds is good enough. In most cases we need to know if 1s or 60s passed by.
I'll be happy if this lands.
Flags: needinfo?(drno)
Summary: Mochitest log messages dumps on TBPL loose timing information → Mochitest log messages dumps on TBPL lose timing information
Blocks: 1307564
Comment on attachment 8798303 [details]
Bug 1020516 - Pass options.quiet into mochitest MessageLogger via constructor,

https://reviewboard.mozilla.org/r/83818/#review85218
Attachment #8798303 - Flags: review?(ted) → review+
Comment on attachment 8798504 [details]
Bug 1020516 - Only buffer mochitest log messages that happen between a TEST-START/TEST-END,

https://reviewboard.mozilla.org/r/83974/#review85232

I had to reason about this a bit, but it seems sensible.
Attachment #8798504 - Flags: review?(ted) → review+
Comment on attachment 8798504 [details]
Bug 1020516 - Only buffer mochitest log messages that happen between a TEST-START/TEST-END,

https://reviewboard.mozilla.org/r/83974/#review85232

I had to reason about this a bit, but it seems sensible.
Comment on attachment 8798505 [details]
Bug 1020516 - Add timestamp messages when dumping buffered messages in mochitest,

https://reviewboard.mozilla.org/r/83976/#review85236

This seems like the best we can do short of plumbing the structured log message timing info all the way through to mozharness. Thanks!
Attachment #8798505 - Flags: review?(ted) → review+
Comment on attachment 8798504 [details]
Bug 1020516 - Only buffer mochitest log messages that happen between a TEST-START/TEST-END,

https://reviewboard.mozilla.org/r/83974/#review85232

Yeah, it was suprisingly difficult to reason about. Fwiw, I did a lot of testing/trial and error and am pretty confident it's correct.
Can we please land this? :-)
Flags: needinfo?(ahalberstadt)
Sigh.. I autolanded this, but then autoland failed to rebase automatically. We should really get notifications going for when that happens :(

Sorry about that.
Flags: needinfo?(ahalberstadt)
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7f2168187bd4
Pass options.quiet into mochitest MessageLogger via constructor, r=ted
https://hg.mozilla.org/integration/autoland/rev/e79656a8ce0d
Only buffer mochitest log messages that happen between a TEST-START/TEST-END, r=ted
https://hg.mozilla.org/integration/autoland/rev/2e502ac7212c
Add timestamp messages when dumping buffered messages in mochitest, r=ted
https://hg.mozilla.org/mozilla-central/rev/7f2168187bd4
https://hg.mozilla.org/mozilla-central/rev/e79656a8ce0d
https://hg.mozilla.org/mozilla-central/rev/2e502ac7212c
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
This change eliminated TEST-PASS messages from successful robocop logs (all messages are buffered and dumped on failure). That makes robocop logs quieter, more similar to mochitest logs - perhaps not a bad change, but maybe not intentional? I think we can leave it as it is, but thought I'd mention it in case others are curious or alarmed.
You need to log in before you can comment on or make changes to this bug.