Closed Bug 438708 Opened 16 years ago Closed 16 years ago

new front end doesn't properly categorize leopard/vista

Categories

(Webtools Graveyard :: Graph Server, defect, P2)

defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: anodelman, Assigned: coop)

References

()

Details

Attachments

(4 files, 3 obsolete files)

Leopard and Vista test results show up as being from an "Unknown" OS.  They should correctly be placed under Mac 10.5 and Vista, respectively.
Assignee: nobody → ccooper
Target Milestone: --- → 0.3
Going to use the build inventory to determine the list of strings that we *should* be mapping here.
Status: NEW → ASSIGNED
Priority: -- → P2
Alice: is there a list of which machines are Vista/Leopard and are not labeled as such? Would there be uproar if we tried to rename those machines to make it easier to tell at a glance (and via script) what OS they were running?
Looking over my machine listings, the only mislabeled machines are qm-pmac-trunk04/05/06, which are Leopard machines. All the Vista machines have 'vista' somewhere in their machine name.

We could do a renaming if we wrote a query to rename the machines in the database - otherwise we would end up splitting them off from their previously collected results.  So, renaming would take:

- machines renamed
- buildbot.tac on machines updated with new names
- master.cfg for talos redone to know new names
- production graph server db updated with new names

I think that is totally doable, but it would take a little planning.
Moving to 0.4 since 0.3 is the 'get stable' milestone.
Target Milestone: 0.3 → 0.4
A few questions:

* do we want to display a more precise OS type when we can figure it out from the machine name, e.g.  display "MacOS X 10.4" for *tiger* instead of "MacOS X" ?

* given that it is only 3 problem machines, is it easier to special-case those in js/graph.js, or do we want to rename the problem machines as Alice suggests? I would vote for renaming the machines despite the extra work because it will make things cleaner going forward.
I think that it would be reasonable to do a more precise OS display - since end users could then make more meaningful comparisons.

We can go the re-naming route, we just need to plan it out and schedule a downtime.
Based on some experience with the build system, I beefed up the platform regexps to be a little more future-proof, as well as breaking out 10.4 and 10.5 into their own platforms.

Still planning on going the machine-renaming route, but this patch can land in advance of that.
Attachment #333416 - Flags: review?(anodelman)
Attachment #333420 - Flags: review?(anodelman)
Attachment #333421 - Flags: review?(anodelman)
Comment on attachment 333416 [details] [diff] [review]
Identify Vista, and 10.4 vs 10.5; simplify regexp for platformclass 

Let's get this to lowercase first so that we can handle machines called 'Linux_mozilla-central', etc.

@@ -415,17 +415,21 @@ function platformFromData(t)
         var m = t.machine;
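The lowercase-first approach suggested in this review could look roughly like the following. This is a minimal sketch, not the code in js/graph.js: the helper name `platformFromMachine`, the rule table, and the display labels are hypothetical, chosen to mirror the platforms discussed in this bug (Vista, 10.4 "tiger", 10.5 "leopard").

```javascript
// Hypothetical sketch; the real rules live in platformFromData() in js/graph.js.
// Each rule maps a regexp over the lowercased machine name to a display label.
// Order matters: more specific patterns are listed before general ones.
const PLATFORM_RULES = [
  [/vista/, "Windows Vista"],
  [/leopard/, "MacOS X 10.5"],
  [/tiger/, "MacOS X 10.4"],
  [/linux/, "Linux"],
  [/xp/, "Windows XP"],
];

function platformFromMachine(machine) {
  // Lowercase first so 'Linux_mozilla-central' and 'linux-foo' both match /linux/.
  const m = String(machine).toLowerCase();
  for (const [re, label] of PLATFORM_RULES) {
    if (re.test(m)) return label;
  }
  // Names that carry no OS hint (e.g. the old qm-pmac-trunk04) fall through here.
  return "Unknown";
}
```

Lowercasing once up front keeps every regexp in the table simple, instead of sprinkling `/i` flags or case variants through each pattern.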
Attachment #333416 - Flags: review?(anodelman) → review-
Comment on attachment 333420 [details]
SQL script for updating machine names in the db 

Unfortunately, qm-pleopard-trunk04/05 already exist.  trunk04 is up and reporting on stage.

Also - should we line up this change with IT so that the actual machine names are updated in DNS?  I don't want us to get into too confusing a state where we have to maintain a mapping of machine names -> machine physical names.
Attachment #333420 - Flags: review?(anodelman) → review-
(In reply to comment #11)
> Unfortunately, qm-pleopard-trunk04/05 already exist.  trunk04 is up and
> reporting on stage.

So, um, what's the lowest # leopard machine name available?

> Also - should we line up this change with IT so that the actual machine names
> are updated in DNS?  I don't want us to get into too confusing a state where we
> have to maintain a mapping of machine names -> machine physical names.

BTW, I wasn't advocating pulling the trigger on the change yet, I just wanted to have the script in place for when our outage window does happen.

What does IT expect for Talos changes? Should I be filing a new bug or just CCing them here?
Attachment #333437 - Flags: review?(anodelman) → review+
I think that for IT this would just be a DNS update - so we could probably just cc them here.

I believe that the next available leopard name would be qm-pleopard-trunk06.
db script to be run *after* test machine names have been changed locally and in DNS, and the buildbot master has been updated. Uses next available leopard machine names.
Attachment #333420 - Attachment is obsolete: true
Attachment #333462 - Flags: review?(anodelman)
Uses next available leopard machine names.
Attachment #333463 - Flags: review?(anodelman)
Alice: when's our next outage window...Thursday? Could this tag along then?

cc-ing server-ops so they can take care of the required DNS changes during our downtime window, whenever that ends up being.

The DNS changes we'll need:

qm-pmac-trunk04 -> qm-pleopard-trunk06
qm-pmac-trunk05 -> qm-pleopard-trunk07
qm-pmac-trunk06 -> qm-pleopard-trunk08
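The mapping above follows from picking the lowest unused qm-pleopard-trunkNN numbers. A minimal sketch of that selection, plus rendering it as UPDATE statements, is below. It assumes trunk01 through trunk05 are already taken, and the `machines` table and `name` column are hypothetical placeholders; the schema used by the actual SQL attachment is not shown in this bug.

```javascript
// Hypothetical sketch: give each old Leopard box the next unused
// qm-pleopard-trunkNN name, skipping numbers that already exist
// (trunk04/05 were already reporting on stage).
function planRenames(oldNames, existingNames, prefix = "qm-pleopard-trunk") {
  const taken = new Set(existingNames);
  const plan = [];
  let n = 1;
  for (const oldName of oldNames) {
    let candidate;
    do {
      candidate = prefix + String(n).padStart(2, "0");
      n += 1;
    } while (taken.has(candidate));
    taken.add(candidate);
    plan.push([oldName, candidate]);
  }
  return plan;
}

// Render the plan as SQL. "machines" and "name" are assumed, not the real schema.
function renameSql(plan) {
  const quote = (s) => "'" + s.replace(/'/g, "''") + "'";
  return plan.map(
    ([from, to]) => `UPDATE machines SET name = ${quote(to)} WHERE name = ${quote(from)};`
  );
}
```

With trunk01 through trunk05 in `existingNames`, the three qm-pmac boxes map to trunk06, trunk07 and trunk08, and the first statement is `UPDATE machines SET name = 'qm-pleopard-trunk06' WHERE name = 'qm-pmac-trunk04';`.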
Attachment #333421 - Attachment is obsolete: true
Attachment #333421 - Flags: review?(anodelman)
(In reply to comment #17)
> Alice: when's our next outage window...Thursday? Could this tag along then?

We have one tonight...
Reed's right - IT has a window tonight but you can have your own to match your schedules.
Comment on attachment 333462 [details]
[already run] SQL script for updating machine names in the db

I just talked with Mark about this query. Why are we locking all tables? We didn't see a reason to lock any with a simple update like this.
(In reply to comment #20)
> (From update of attachment 333462 [details])
> I just talked with Mark about this query. Why are we locking all tables? We
> didn't see a reason to lock any with a simple update like this.

The purist in me likes locking tables. If that's an unacceptable hit on the production server, by all means forgo locking.

(In reply to comment #3)
> - machines renamed
> - buildbot.tac on machines updated with new names
> - master.cfg for talos redone to know new names
> - production graph server db updated with new names

The outage window will only apply to the master.cfg update and subsequent master restart and *should* be very brief. We can take the affected slaves down at any point prior to that and do the requisite renaming.

I'd like to aim for early morning PDT (say 6am?) on Thursday Aug 14th to restart the master. 

IT: can you please give us aliases to the new machine names as soon as convenient, and then I can just restart the master when I start work in EDT on Thursday? We can remove the old DNS entries after the fact.
Attachment #333462 - Flags: review?(anodelman) → review+
Attachment #333463 - Flags: review?(anodelman) → review+
Re-assigning to IT to get the DNS aliases created ASAP. Please re-assign back once they're in place.
Assignee: ccooper → server-ops
Severity: normal → major
Status: ASSIGNED → NEW
Alice: I can't even connect to qm-pmac-trunk04 or qm-pmac-trunk05 (qm-pmac-trunk06 is fine). Is this a known issue?
qm-pmac-trunk04/05 being down is not a known issue, though it is definitely something that has happened in the past (we can't seem to keep Leopard machines up consistently). File a bug to have them rebooted; unfortunately we can't do a remote reboot, so this entails having someone in IT go and press the button.
Depends on: 450520
Assignee: server-ops → phong
All we need (for now) are 3 DNS aliases set up for these boxes. Can we get this done today to unblock the rest of the work, please?

(In reply to comment #17)
> The DNS changes we'll need:
> 
> qm-pmac-trunk04 -> qm-pleopard-trunk06
> qm-pmac-trunk05 -> qm-pleopard-trunk07
> qm-pmac-trunk06 -> qm-pleopard-trunk08

Severity: major → critical
Comment on attachment 333437 [details] [diff] [review]
[checked in] Identify Vista, and 10.4 vs 10.5; simplify regexp for platformclass; compare as lowercase

changeset:   101:18cb7fe462d0
Attachment #333437 - Attachment description: Identify Vista, and 10.4 vs 10.5; simplify regexp for platformclass; compare as lowercase → [checked in] Identify Vista, and 10.4 vs 10.5; simplify regexp for platformclass; compare as lowercase
> > qm-pmac-trunk04 -> qm-pleopard-trunk06
> > qm-pmac-trunk05 -> qm-pleopard-trunk07
> > qm-pmac-trunk06 -> qm-pleopard-trunk08

Updated and pushing out...

Assignee: phong → ccooper
Severity: critical → normal
Attachment #333463 - Attachment description: Change slavenames in perfmaster master.cfg → [checked in] Change slavenames in perfmaster master.cfg
Alice helped with the slave renaming today, and xb95 made the db changes, so the bulk is done.

There are some machine names that are still ending up as "Unknown," so I'm rolling a patch to catch those cases now.
Attachment #333462 - Attachment description: SQL script for updating machine names in the db → [already run] SQL script for updating machine names in the db
We were still failing to parse the platform from machine names like 'WINNT_*' and 'OS_X_*' after the first patch. 

Also, we now use slightly different colors for each OS sub-platform, just like we were already doing for XP and Vista.
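A rough sketch of a follow-up like this: version-aware rules for names that begin with 'WINNT_' or 'OS_X_', each sub-platform carrying its own color within its OS family. The exact name formats (e.g. 'WINNT_5.2_...', 'OS_X_10.5_...'), the `classifyMachine` helper, and the hex values are all hypothetical; only the NT version numbers (5.1 XP, 5.2 Server 2003, 6.0 Vista) are standard.

```javascript
// Hypothetical sketch: sub-platform rules for 'WINNT_*' and 'OS_X_*' names,
// with a slightly different color per sub-platform inside each OS family.
// Name formats, labels and hex values are illustrative assumptions.
const SUBPLATFORMS = [
  { re: /^winnt_5\.1/, label: "Windows XP", color: "#3366cc" },
  { re: /^winnt_5\.2/, label: "Windows Server 2003", color: "#3f72d6" },
  { re: /^winnt_6\.0/, label: "Windows Vista", color: "#4a7fe0" },
  { re: /^os_x_10\.4/, label: "MacOS X 10.4", color: "#cc3333" },
  { re: /^os_x_10\.5/, label: "MacOS X 10.5", color: "#d64a4a" },
];

function classifyMachine(machine) {
  // Compare lowercased, as in the earlier patch.
  const m = String(machine).toLowerCase();
  const hit = SUBPLATFORMS.find((p) => p.re.test(m));
  return hit
    ? { platform: hit.label, color: hit.color }
    : { platform: "Unknown", color: "#999999" };
}
```

Keeping label and color in the same table row means a newly recognized sub-platform cannot end up classified but uncolored.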
Attachment #334772 - Flags: review?(anodelman)
Comment on attachment 334772 [details] [diff] [review]
[checked in] Catch Windows Server and Leopard for missed boxes

Are they pretty colours?
Attachment #334772 - Flags: review?(anodelman) → review+
Status: NEW → ASSIGNED
Comment on attachment 334772 [details] [diff] [review]
[checked in] Catch Windows Server and Leopard for missed boxes

(In reply to comment #32)
> Are they pretty colours?

Totally.

changeset:   102:929e4adf6026
Attachment #334772 - Attachment description: Catch Windows Server and Leopard for missed boxes → [checked in] Catch Windows Server and Leopard for missed boxes
Is this bug done? If so let's close it.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
"Unknown" is still a platform value on http://graphs-stage.mozilla.org/; should this be reopened, given that comment 31 attempted to address a few machines ending up as "Unknown", or were those machines added post-bugfix?
Looking at graphs-stage all the machines labeled 'Unknown' are of the type "dougt_tester" and "graphs_tester" - these are just quick test runs that don't have machine types associated with them.

I think that we are in a good state now, but it's true that we should keep an eye on things to ensure that we can properly categorize any new machines that start reporting.
(In reply to comment #36)
> Looking at graphs-stage all the machines labeled 'Unknown' are of the type
> "dougt_tester" and "graphs_tester" - these are just quick test runs that don't
> have machine types associated with them.
> 
> I think that we are in a good state now, but it's true that we should keep an
> eye on things to ensure that we can properly categorize any new machines that
> start reporting.

Thanks, Alice.  I just wanted to be sure I was doing the right thing.

Verified FIXED, then, and we can spin off separate bugs for new issues.
Status: RESOLVED → VERIFIED
Product: Webtools → Webtools Graveyard