Closed Bug 988643 Opened 10 years ago Closed 10 years ago

Add index for looking up assignment records by node name.

Categories

(Cloud Services Graveyard :: Server: Token, defect, P1)

defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: rfkelly, Assigned: rfkelly)

References

Details

(Whiteboard: [qa+])

Attachments

(3 files, 2 obsolete files)

Per Bug 985145 Comment 26, we should add an index to simplify the "find all rows assigned to this node" operation that's required when doing node migrations.

Sadly, the "node" column is currently a VARCHAR(64) rather than an integer reference into the "nodes" table.  It may be worth refactoring this, but the alterations it would imply on the live data in prod database scare me...
Whiteboard: [qa+]
Gimme the SQL to run and I'll make it happen. :) We're using MySQL 5.6 and it has online DDL (alter table) operations.
Depends on: 988134
Attached patch wimms-clear-node-assignment.diff (obsolete) — Splinter Review
Let's not normalize the node name out into a separate table for now, unless profiling proves we need to.  It will pessimize the read case into requiring a join or a second query.

This patch to WIMMS adds the index declaration and some interface methods for using it.  I can them roll a tokenserver management script on top of these methods.

The corresponding DDL for modifying the production databases is:

    CREATE INDEX node_idx ON users (service, node);

Adding this will increase our write load on the tokenserver db, we'll have to see how it fares in practise.  We could consider indexing only a prefix of the node name in order to decrease the size of the index.
Attachment #8398334 - Flags: review?(telliott)
Comment on attachment 8398334 [details] [diff] [review]
wimms-clear-node-assignment.diff

Review of attachment 8398334 [details] [diff] [review]:
-----------------------------------------------------------------

::: wimms/sql.py
@@ +440,5 @@
>          res.close()
>  
> +    def remove_node(self, service, node, timestamp=None):
> +        """Remove definition for a node."""
> +        self.unassign_node(service, node, timestamp)

Race condition? Shouldn't we delete the node, then unassign the users? Otherwise you run into a danger that someone gets assigned in between.
> Shouldn't we delete the node, then unassign the users?
> Otherwise you run into a danger that someone gets assigned in between.

Yes, this makes sense, I'll switch those around.  I was implicitly assuming that the node would be markd as "down" before doing this, but there's no such facility in this patch so I shouldn't assume that.
Updated with the ordering reversed, otherwise unchanged.
Attachment #8398334 - Attachment is obsolete: true
Attachment #8398334 - Flags: review?(telliott)
Attachment #8402889 - Flags: review?(telliott)
Attachment #8402889 - Flags: review?(telliott) → review+
https://github.com/mozilla-services/wimms/commit/46e8fc28c0f35bb8d511795ef9d867113b2cf327
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Re-opening this as :mostlygeek has expressed strong concerns about the size of this new index, and would prefer referencing the node by its id
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Yes, I think it'll just take up too much RAM to keep the index in memory. I would much prefer the index is as small as possible w/ two ints.
Note that this index is not used by the webapp itself, only be backend node-management scripts that have to operate on "all users assigned to a node".  Which might not make any difference in the scheme of things, but worth pointing out explicitly.
Priority: -- → P1
Depends on: 777650
Attached patch tokenserver-nodeid-index.diff (obsolete) — Splinter Review
OK, here's another take on this based on :mostlygeek concerns about the previous approach.  It adds a "nodeid" column to the users table and indexes on it for use by the backend cleanup scripts.  The acutal look-up-the-node-for-a-user query still uses the string-base "node" column.  Once we've got the new column successfully deployed we can do another migration to drop the old one.

The migration is the trickiest part here, as we need to populate the column for existing users.  There may be a small window where newly-created user records get a mismatched nodeid/nodename pair; we can work around this by either (1) running the same update again after we have finished the deployment, or (2) eating some downtime and disabling the service during the update.

Thoughts?
Attachment #8405899 - Flags: review?(telliott)
Attachment #8405899 - Flags: feedback?(bwong)
Blocks: 993537
Actually, let's take a more nuanced and downtime-avoiding approach here.  I'm obsoleting the previous patch with a two-part process.

The first is a migration that just adds the "nodeid" column.  It doesn't do anything to it or with it, so all nodeids will initially be NULL.

The second patch updates the code to actually write the "nodeid" column, as well as to use it for the desired node-clearing operation.  It includes a migration to set the nodeid column for existing users and then add the index on it.

The idea is that we can deploy as follows:

 1) deploy and apply the first migration, so the column is available for use
 2) deploy the second patch but don't apply the migration; get it running on
    all webheads and correctly writing to the nodeid column.
 3) apply the second migration, which will fixup the existing data and created
    the index.

We must do any node migrations between (2) and (3) but other than that this should be a safe update.
Attachment #8405899 - Attachment is obsolete: true
Attachment #8405899 - Flags: review?(telliott)
Attachment #8405899 - Flags: feedback?(bwong)
Attachment #8405910 - Flags: review?(telliott)
Attachment #8405910 - Flags: feedback?(bwong)
Attaching part 2.

Additional improvement: there's no need to index on (service, nodeid) since each nodeid is specific to a service already.  We can just index the nodeid column in isolation.

We can consider dropping the string "node" column once this is live, but it gets a bit fiddly....
Attachment #8405911 - Flags: review?(telliott)
No longer blocks: 993537
Attachment #8405910 - Flags: review?(telliott) → review+
Attachment #8405911 - Flags: review?(telliott) → review+
https://github.com/mozilla-services/tokenserver/commit/a6f6c95e11c29be85bb512a63a3bc60f34e3dec1
https://github.com/mozilla-services/tokenserver/commit/71a590f85a27eff6ee4cddfe25aaeb33d210baa2

:jbonacci in order to verify this, we need to get them all the way to prod and see that the new index works ok.  See Bug 996915 and an as-yet-unfiled deployment bug for the second step of the procedure.
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
Comment on attachment 8405910 [details] [diff] [review]
ts-add-nodeid-column.diff

chatted briefly with benson IRL, clearing feedback flag
Attachment #8405910 - Flags: feedback?(bwong)
Blocks: 1002295
Verified via load testing in Stage: bug 1002295
Status: RESOLVED → VERIFIED
Product: Cloud Services → Cloud Services Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: