Talos swaps reporting of privatebytes and RSS on Mac

RESOLVED FIXED

Status

Product: Release Engineering
Component: General
Priority: P1
Severity: normal
Status: RESOLVED FIXED
Opened: 8 years ago
Updated: 4 years ago

People

(Reporter: jrmuizel, Assigned: coop)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 4 obsolete attachments)

(Reporter)

Description

8 years ago
Talos currently reports VSZ as RSS and RSS as PrivateBytes because it uses
the wrong indexes into the psData array.

We should fix the reporting and perhaps change the code so that indexing happens during parsing and not in a separate function. This would have avoided this error in the first place.

Something like the following perhaps:

(pid, vss, rss) = line.split() ?


Also, private bytes is not a very good name for virtual size.
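
A minimal sketch of the parse-time unpacking suggested above, assuming each ps output line holds exactly three whitespace-separated columns (pid, virtual size, resident size). The names PsSample and parse_ps_line are hypothetical and are not taken from cmanager_mac.py.

```python
from collections import namedtuple

# Hypothetical record type; field order mirrors the assumed ps column order.
PsSample = namedtuple("PsSample", ["pid", "vsz", "rss"])


def parse_ps_line(line):
    """Name each column at parse time so callers never index by position."""
    # Unpacking raises ValueError if the column count is not exactly three,
    # which surfaces a format change instead of silently misreporting.
    pid, vsz, rss = line.split()
    return PsSample(int(pid), int(vsz), int(rss))


sample = parse_ps_line("  4242 2516320  81234")
```

Callers would then read sample.vsz and sample.rss by name, so a wrong index cannot swap the two.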
(Assignee)

Comment 1

8 years ago
Created attachment 401037 [details] [diff] [review]
Use correct indices for virtual size and resident size.

Not sure how keen Alice is about disturbing this code more deeply than this. Most of it seems to be Annie-era. The patch gets the indices right at least.
Assignee: nobody → ccooper
Status: NEW → ASSIGNED
Attachment #401037 - Flags: review?(anodelman)
This'll be a bit of a flag-day event if it lands, right? Data from before the fix will be backwards? Is it feasible to contemplate a graph server DB operation to "fix history" as well?
(Assignee)

Comment 3

8 years ago
(In reply to comment #2)
> This'll be a bit of a flag-day event if it lands, right? Data from before the
> fix will be backwards? Is it feasible to contemplate a graph server DB
> operation to "fix history" as well?

It should be simple to run a db query that transposes the two numbers in all historical Mac data. A real downtime would make it easier.
I don't think that this should be landed without a) a downtime and b) a graph server update to repair old data.
Attachment #401037 - Flags: review?(anodelman) → review+
(Assignee)

Comment 5

8 years ago
Created attachment 402080 [details]
Python script to swap the existing private bytes (VSS) and RSS values in the graphserver db

I didn't actually see how long this took to run to completion...killed my screen session by accident this morning. It was still running after 4 hours on the staging db last night, though. :(

I'm not really familiar with MySQLdb in Python, but there were example scripts in the repo already and this seemed like a straightforward application. I'm more familiar with prepared statements in Perl, so if there's a better way to cache the statement in Python for re-use, please let me know.

The meat of the script is simple:
* find all the private bytes (VSS) results on mac machines;
* iterate over them one at a time, looking for a matching RSS result set (matched based on date and machine);
* in the executemany, change the VSS result ids to 1, change the RSS result ids to the original VSS result id, and change results with id 1 to the original RSS result id.

I use 1 as the placeholder ID for the swap here because I assume that older data with that ID will have been cleared out long ago. I can change it pretty trivially if that's not the case in production.
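
The three-step swap through a placeholder id can be sketched as follows. This is an illustration only, not the attached script: sqlite3 from the standard library stands in for MySQLdb, and the table sample_values, its columns, and the function swap_result_ids are hypothetical.

```python
import sqlite3

PLACEHOLDER_ID = 1  # assumed to be unused by any live data


def swap_result_ids(conn, pairs):
    """Swap the ids of each (vss_id, rss_id) pair via the placeholder id."""
    params = []
    for vss_id, rss_id in pairs:
        params.append((PLACEHOLDER_ID, vss_id))  # VSS -> placeholder
        params.append((vss_id, rss_id))          # RSS -> original VSS id
        params.append((rss_id, PLACEHOLDER_ID))  # placeholder -> original RSS id
    with conn:  # commit on success, roll back on error
        # executemany runs the statement once per tuple, in list order.
        conn.executemany(
            "UPDATE sample_values SET result_id = ? WHERE result_id = ?", params
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_values (result_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO sample_values VALUES (?, ?)", [(10, 500.0), (11, 90.0)]
)
swap_result_ids(conn, [(10, 11)])
rows = conn.execute(
    "SELECT result_id, value FROM sample_values ORDER BY result_id"
).fetchall()
```

The placeholder is only safe if no real rows carry that id while the swap is in flight, which is the assumption stated above.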

One possible bright spot: justdave gave me 115,385,147 as the number of rows in the values table in production. The staging db has 264,134,321 in the same table.
Attachment #402080 - Flags: review?(catlee)
(Assignee)

Updated

8 years ago
Attachment #402080 - Attachment mime type: application/octet-stream → text/plain
(Assignee)

Comment 6

8 years ago
Comment on attachment 402080 [details]
Python script to swap the existing private bytes (VSS) and RSS values in the graphserver db

I'm not writing to the right schema, apparently. :(
Attachment #402080 - Attachment is obsolete: true
Attachment #402080 - Flags: review?(catlee)
(Assignee)

Comment 7

8 years ago
Created attachment 402117 [details]
Python script to swap the existing private bytes (VSS) and RSS values in the graphserver db, v2

OK, think I'm targeting the correct schema now.

I did a test run in staging with the script limited to 10000 rows. Here's the tail of the output from that run:

# VSS rows: 10000
# dupes: 41
# unmatched: 1
# swaps: 9958
total # rows: 3550000

real    2m29.331s
user    0m2.885s
sys     0m6.892s

The "total # rows" is for all 3 updates (VSS->1, RSS->VSS, 1->RSS). Some rough math based on dividing that row count by 3 gives me just over 4 hours to process all 115 million rows in the production db using the script in its current incarnation.
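
The rough math can be reproduced as below. It is a sketch that assumes the per-row cost seen in the 10000-row staging run scales linearly to production; the variable names are illustrative.

```python
# Figures from the staging test run above.
rows_touched = 3550000              # "total # rows", summed over all 3 updates
elapsed_seconds = 2 * 60 + 29.331   # "real" wall-clock time of the run

# Production row count for the values table, as quoted earlier in this bug.
production_rows = 115385147

rows_per_pass = rows_touched / 3.0                # rows handled by one update
seconds_per_row = elapsed_seconds / rows_per_pass
estimated_hours = production_rows * seconds_per_row / 3600

print("estimated hours: %.2f" % estimated_hours)
```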

This script also doesn't do anything about re-mapping tests where there is no corresponding RSS test (unmatched) or where there are multiple corresponding RSS tests (dupes). Not sure how aggressively we want to target those.
Attachment #402117 - Flags: review?(catlee)
(Assignee)

Comment 8

8 years ago
Created attachment 402262 [details]
Python script to swap the existing private bytes (VSS) and RSS test runs in the graphserver db

OK, now that I'm writing queries against the correct schema, things are much easier. We leave the data in place and just swap the test_id in the test_runs table instead, which makes for a much speedier process.

The new script swapped all relevant rows (47376) in the staging test_runs table in 9 seconds, so I don't foresee any problems for the downtime on Thursday.
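
A minimal sketch of swapping test_id in place, assuming a simplified test_runs table. This is not the attached script: sqlite3 stands in for MySQL, and the test ids 7 and 8 and the function swap_test_ids are hypothetical.

```python
import sqlite3

# Hypothetical test ids for the VSS and RSS tests.
VSS_TEST_ID, RSS_TEST_ID = 7, 8


def swap_test_ids(conn, first_id, second_id):
    """Exchange two test_id values in test_runs with a single UPDATE."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE test_runs "
            "SET test_id = CASE test_id WHEN :a THEN :b ELSE :a END "
            "WHERE test_id IN (:a, :b)",
            {"a": first_id, "b": second_id},
        )
    return cur.rowcount  # number of runs whose test_id changed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_runs (id INTEGER PRIMARY KEY, test_id INTEGER)")
conn.executemany(
    "INSERT INTO test_runs (id, test_id) VALUES (?, ?)", [(1, 7), (2, 8), (3, 9)]
)
swapped = swap_test_ids(conn, VSS_TEST_ID, RSS_TEST_ID)
runs = conn.execute("SELECT id, test_id FROM test_runs ORDER BY id").fetchall()
```

Because only the run-to-test mapping changes, the much larger table of measured values is never rewritten, which is consistent with the speedup reported above.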
Attachment #402117 - Attachment is obsolete: true
Attachment #402262 - Flags: review?(catlee)
Attachment #402117 - Flags: review?(catlee)
Comment on attachment 402262 [details]
Python script to swap the existing private bytes (VSS) and RSS test runs in the graphserver db

Looks good.

We should back up (or have IT back up) the database right before running this.
Attachment #402262 - Flags: review?(catlee) → review+
(Assignee)

Comment 10

8 years ago
Comment on attachment 402262 [details]
Python script to swap the existing private bytes (VSS) and RSS test runs in the graphserver db

Dave: we'll need someone from IT to run this script against the production graph server database during our planned releng downtime tomorrow (thurs sept 24).
We'll also want IT to perform a back-up of the db prior to running the script, just in case.

Can you give the script a quick once-over, and also let me know who is likely to be handling the IT side of things tomorrow AM during the downtime? Thanks.
Attachment #402262 - Flags: review?(justdave)
(In reply to comment #10)
> (From update of attachment 402262 [details])
> Dave: we'll need someone from IT to run this script against the production
> graph server database during our planned releng downtime tomorrow (thurs sept
> 24).
> We'll also want IT to perform a back-up of the db prior to running the script,
> just in case.
> 
> Can you give the script a quick once-over, and also let me know who is likely
> to be handling the IT side of things tomorrow AM during the downtime? Thanks.

We're looking at 8am-11am EDT currently for our downtime.

Updated

8 years ago
Attachment #401037 - Flags: checked-in+
Comment on attachment 401037 [details] [diff] [review]
Use correct indices for virtual size and resident size.

Checking in cmanager_mac.py;
/cvsroot/mozilla/testing/performance/talos/cmanager_mac.py,v  <--  cmanager_mac.py
new revision: 1.6; previous revision: 1.5
done
Ran a backup dump of the DB, followed by the script, after catlee gave the go-ahead.
Created attachment 402582 [details]
Updated script
Attachment #402262 - Attachment is obsolete: true
Attachment #402262 - Flags: review?(justdave)

Updated

8 years ago
Attachment #402582 - Attachment mime type: text/x-python → text/plain
(Assignee)

Comment 15

8 years ago
Production graphs are correctly showing the swapped values now.
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
The script managed to switch rss and pbytes for all platforms instead of just mac. The resulting db corruption needs to be fixed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Updated

8 years ago
Status: REOPENED → ASSIGNED
Priority: -- → P1
(Assignee)

Comment 17

8 years ago
Created attachment 402705 [details]
Updated catlee's script to use a sub-select, more verbose output
Attachment #402582 - Attachment is obsolete: true
Attachment #402705 - Flags: review?(anodelman)
Comment on attachment 402705 [details]
Updated catlee's script to use a sub-select, more verbose output

I'm willing to give this a try.
Attachment #402705 - Flags: review?(anodelman) → review+
Comment on attachment 402705 [details]
Updated catlee's script to use a sub-select, more verbose output

This looks ok to me.
Attachment #402705 - Flags: review+ → review?
(Assignee)

Comment 20

8 years ago
Aravind ran the updated script and it seems to have worked. Spikes are gone from the linux and win32 graphs, and Mac remains the same.
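
For illustration, a sub-select that limits the swap to one platform might look like the sketch below. The attachment's actual query is not reproduced in this bug, so the machines table, its platform column, and the test ids are all hypothetical, and sqlite3 stands in for MySQL.

```python
import sqlite3

# Swap two test ids, but only for runs on machines of the given platform.
SWAP_FOR_PLATFORM = """
UPDATE test_runs
SET test_id = CASE test_id WHEN :a THEN :b ELSE :a END
WHERE test_id IN (:a, :b)
  AND machine_id IN (SELECT id FROM machines WHERE platform = :platform)
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE machines (id INTEGER PRIMARY KEY, platform TEXT);
CREATE TABLE test_runs (
    id INTEGER PRIMARY KEY, machine_id INTEGER, test_id INTEGER
);
INSERT INTO machines VALUES (1, 'mac'), (2, 'linux');
INSERT INTO test_runs VALUES (1, 1, 7), (2, 1, 8), (3, 2, 7), (4, 2, 8);
""")

with conn:  # commit on success, roll back on error
    conn.execute(SWAP_FOR_PLATFORM, {"a": 7, "b": 8, "platform": "linux"})

runs = conn.execute("SELECT id, test_id FROM test_runs ORDER BY id").fetchall()
```

In this toy data only the linux runs change, while the mac runs keep their test ids.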
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
(Assignee)

Updated

8 years ago
Attachment #402705 - Flags: review?
Product: mozilla.org → Release Engineering