Closed Bug 661897 Opened 13 years ago Closed 13 years ago

Talos Txul seems to be posting double entries

Categories

(Release Engineering :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: armenzg)

Details

Attachments

(1 file)

For instance, we can see that for every post there is a second one for the same changeset:
http://graphs.mozilla.org/#tests=[[17,1,858]]&sel=1306684617,1307100956

We can even see it more clearly in the new graph server:
http://graphs-new.mozilla.org/graph.html#tests=[[17,1,19]]&sel=1307020397740.9355,1307125749999&displayrange=7&datatype=running

rhelmer says that he can see it in the JSON as well:
http://graphs-new.mozilla.org/api/test/runs?id=17&branchid=1&platformid=19

This happens for Win7 as well, but there the first and second posts differ only slightly, so the graph looks smoother and the duplication is less noticeable.
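
A minimal sketch of how the double entries could be spotted programmatically, assuming the runs have already been loaded into a list of dicts. The record shape (`changeset`, `date_run`, `value`), the second changeset, and all the values are hypothetical placeholders, not the actual graph server response format:

```python
from collections import defaultdict


def find_double_posts(runs):
    """Group runs by changeset and keep only changesets posted more than once."""
    by_changeset = defaultdict(list)
    for run in runs:
        by_changeset[run["changeset"]].append(run)
    return {cs: entries for cs, entries in by_changeset.items() if len(entries) > 1}


# Made-up sample data; only the changeset 48e72227c2fa comes from this bug.
sample_runs = [
    {"changeset": "48e72227c2fa", "date_run": "2011-06-23 05:03", "value": 100.0},
    {"changeset": "48e72227c2fa", "date_run": "2011-06-23 10:17", "value": 140.0},
    {"changeset": "000000000000", "date_run": "2011-06-23 11:00", "value": 101.0},
]

doubles = find_double_posts(sample_runs)
for changeset, entries in doubles.items():
    print(changeset, [entry["date_run"] for entry in entries])
```

If every changeset shows up in the result, the duplication is systematic rather than the work of one misbehaving machine.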

On another note, could the new graph server help us know which slave did the post? It would be useful if we want to spot a misbehaving slave.
Could also be a sign of bug 592244 not really being fixed.  If we could check to see if the results are from different slaves that would indicate that two sendchanges were generated for the same changeset.
Looking over the double entries, they carry different timestamps, so they should come from different machines; the talos side of the testing system is therefore not the one generating them.

Moving to releng for further investigation.
Component: Talos → Release Engineering
Product: Testing → mozilla.org
QA Contact: talos → release
Version: unspecified → other
anode, it cannot be bug 592244: the 5 Windows 64-bit test machines are attached to the same master and, on top of that, it could not happen for every single changeset.

rhelmer, after you read my reply, would you be fine with me moving this bug to webtools graphserver?

Let me know if my analysis makes sense or if I am missing anything.

I think I have spotted 2 different problems.
One of them is the double posting and the other one is that the timestamps are off.

#### Let's look at the double posting

Could the double posting be because Txul is run within chrome and nochrome? (txul is run within twinopen, right?)
line 46 -- 'chrome': GRAPH_CONFIG + ['--activeTests', 'ts:tdhtml:twinopen:tsspider'],
line 47 -- 'nochrome': GRAPH_CONFIG + ['--activeTests', 'tdhtml:twinopen:tsspider', '--noChrome'],

For changeset 48e72227c2fa:
* chrome finished at 22:14 (t-r3-w764-003) and nochrome at 22:04 (t-r3-w764-001) [1] [2]

#### Let's look at the timestamps

rhelmer where do the timestamps come from?

For instance for 48e72227c2fa I see 2 posts at the following times:
* Jun 23rd 5:03
* Jun 23rd 10:17

And that confuses me a LOT and a LOT!

As mentioned before, the chrome and nochrome jobs finished at 22:14 & 22:04 on the *22nd*.
This (IIURC) means that for close to 7 hours on the graph server there was no evidence of any of these twinopen posts.
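
The gap can be worked out from the log lines in [2]. This is a rough sketch that assumes the log times and the graph server times are in the same timezone and that the graph posts are from 2011; neither assumption is confirmed here:

```python
from datetime import datetime

LOG_FORMAT = "%a, %d %b %Y %H:%M:%S"

# "Started" times of the two twinopen transmissions, taken from the logs in [2].
log_starts = [
    datetime.strptime("Wed, 22 Jun 2011 22:04:51", LOG_FORMAT),
    datetime.strptime("Wed, 22 Jun 2011 22:14:04", LOG_FORMAT),
]

# Post times as displayed on the graph server (year assumed to be 2011).
graph_posts = [datetime(2011, 6, 23, 5, 3), datetime(2011, 6, 23, 10, 17)]

# Time between each log transmission and the earliest graph server entry.
first_post = min(graph_posts)
gaps = [first_post - start for start in log_starts]
for start, gap in zip(log_starts, gaps):
    print(start, "->", first_post, "gap:", gap)
```

Under those assumptions the gaps come out at 6:58:09 and 6:48:56, which is where the "close to 7 hours" comes from.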

[1] https://build.mozilla.org/buildapi/revision/mozilla-central/48e72227c2fa
[2] Transmitting test: twinopen: 
		Started Wed, 22 Jun 2011 22:04:51
    Transmitting test: twinopen: 
		Started Wed, 22 Jun 2011 22:14:04
Note, this happens for all other OSes as well.
(In reply to comment #3)
> anode it cannot be bug 592244 as the 5 Windows 64-bit test machines are
> attached to the same master and on top of that it could not happen for every
> single changeset.
> 
> rhelmer after you read my reply would you be fine if I move this bug to
> webtools graphserver?


Hmm, I don't think I have enough info yet to know where the problem is or how to fix it. I'd prefer we leave it here for now, since I will need some help :)

 
> Let me know if my analysis makes sense or if I am missing anything.
> 
> I think I have spotted 2 different problems.
> One of them is the double posting and the other one is that the timestamps
> are off.
> 
> #### Let's look at the double posting
> 
> Could the double posting be because Txul is run within chrome and nochrome?


Possible, I'll look into it. This isn't the only test like this though, right? Do we see duplicates in tdhtml or tsspider too?


> (txul is run within twinopen, right?)


I think txul and twinopen were basically two names for the same thing:
https://wiki.mozilla.org/Performance:Tinderbox_Tests#Txul:_XUL_window_open_time


> line 46 -- 'chrome': GRAPH_CONFIG + ['--activeTests',
> 'ts:tdhtml:twinopen:tsspider'],
> line 47 -- 'nochrome': GRAPH_CONFIG + ['--activeTests',
> 'tdhtml:twinopen:tsspider', '--noChrome'],
> 
> For changeset 48e72227c2fa:
> * chrome finished at 22:14 (t-r3-w764-003) and nochrome at 22:04
> (t-r3-w764-001) [1] [2]
> 
> #### Let's look at the timestamps
> 
> rhelmer where do the timestamps come from?


Pretty sure they are all HTTP POSTed along with the results (date_run in https://wiki.mozilla.org/Buildbot/Talos/DataFormat)


> For instance for 48e72227c2fa I see 2 posts at the following times:
> * Jun 23rd 5:03
> * Jun 23rd 10:17
> 
> And that confuses me a LOT and a LOT!
> 
> As mentioned before, the chrome and nochrome jobs finished at 22:14 & 22:04
> on the *22nd*.
> This (IIURC) means that for close to 7 hours on the graph server there was
> no evidence of any of these twinopen posts.
> 
> [1] https://build.mozilla.org/buildapi/revision/mozilla-central/48e72227c2fa
> [2] Transmitting test: twinopen: 
> 		Started Wed, 22 Jun 2011 22:04:51
>     Transmitting test: twinopen: 
> 		Started Wed, 22 Jun 2011 22:14:04


So I think this should be fairly easy to track down if I could see what is being posted and what the database looks like. Is it possible to submit these to graphs-stage.mozilla.org instead?

If not, I get a nightly snapshot of the prod DB and can take a look at that (but logging the POSTs would be even more helpful).
(In reply to comment #5)
> > Could the double posting be because Txul is run within chrome and nochrome?
> 
> 
> Possible, I'll look into it.. this isn't the only test like this though
> right? Do we see duplicates in tdhtml or tsspider too?
> 
On graphs-new I can select the chrome version of each suite.
For instance, "DHTML" and "DHTML Chrome"
txul does not have this problem.
e.g. http://graphs-new.mozilla.org/graph.html#tests=[[25,1,17],[18,1,17]]&sel=none&displayrange=7&datatype=running


> > [2] Transmitting test: twinopen: 
> > 		Started Wed, 22 Jun 2011 22:04:51
> >     Transmitting test: twinopen: 
> > 		Started Wed, 22 Jun 2011 22:14:04
> 
> 
> So I think this should be fairly easy to track down if I could see what is
> being posted and what the database looks like, is it possible to submit
> these to the graphs-stage.mozilla.org instead?
> 
> If not, I get a nightly snapshot of the prod DB, I can take a look at that
> (but logging the POSTs would be even more helpful)

It could be possible, but it would take quite some time to set up.
If the info can be acquired from the prod DB, that would be great; otherwise, I will have to set it up for graphs-stage.
There is no txul nochrome test.  Though, it looks like we attempt to run txul as if it had a nochrome option - that could definitely explain what we are seeing.

Here are the offending lines in config.py:

    'chrome': GRAPH_CONFIG + ['--activeTests', 'ts:tdhtml:twinopen:tsspider'],
    'nochrome': GRAPH_CONFIG + ['--activeTests', 'tdhtml:twinopen:tsspider', '--noChrome'],

twinopen should be pulled from the nochrome list.
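
As a rough sketch of what that change amounts to (the attached patch itself is not reproduced here), the snippet below compares the two lists before and after dropping twinopen from nochrome. `GRAPH_CONFIG` is left as an empty placeholder because its real contents are not shown in this bug, and `suites_running` is a hypothetical helper written only for this illustration:

```python
# Placeholder: the real GRAPH_CONFIG contents are not shown in this bug.
GRAPH_CONFIG = []


def suites_running(test_name, suites):
    """Return the sorted suite names whose --activeTests list includes test_name."""
    found = []
    for suite, args in suites.items():
        # The colon-separated test list is the argument right after --activeTests.
        active = args[args.index('--activeTests') + 1].split(':')
        if test_name in active:
            found.append(suite)
    return sorted(found)


# The two lines as they appear in config.py today.
before = {
    'chrome': GRAPH_CONFIG + ['--activeTests', 'ts:tdhtml:twinopen:tsspider'],
    'nochrome': GRAPH_CONFIG + ['--activeTests', 'tdhtml:twinopen:tsspider', '--noChrome'],
}

# The same lines with twinopen pulled from the nochrome list.
after = {
    'chrome': GRAPH_CONFIG + ['--activeTests', 'ts:tdhtml:twinopen:tsspider'],
    'nochrome': GRAPH_CONFIG + ['--activeTests', 'tdhtml:tsspider', '--noChrome'],
}

print(suites_running('twinopen', before))  # ['chrome', 'nochrome']
print(suites_running('twinopen', after))   # ['chrome']
```

With twinopen listed only under chrome, each changeset should produce a single twinopen post per platform.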
Assignee: nobody → armenzg
Status: NEW → ASSIGNED
Attachment #541450 - Flags: review?(anodelman)
Attachment #541450 - Flags: review?(anodelman) → review+
Comment on attachment 541450 [details] [diff] [review]
remove txul/twinopen from nochrome

http://hg.mozilla.org/build/buildbot-configs/rev/1a559685ef39

It will be picked up on the next rescheduled reconfig.
Attachment #541450 - Flags: checked-in+
This got merged to production and we should see it fixed sometime in the next few hours.
http://hg.mozilla.org/build/buildbot-configs/rev/8304529dfa83

rhelmer, if you still want to chase down why the timestamps in the logs differ from those in graphs, please file a bug and we can follow up there.
So far it has had no effect on tools or developers, as far as I know.
It is smooth now:
http://graphs-new.mozilla.org/graph.html#tests=[[17,1,12],[17,1,1]]&sel=1309249936694.7437,1309358378572&displayrange=7&datatype=running
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering