Switch to database-independent IDs



2 years ago
11 months ago


(Reporter: jwhitlock, Assigned: jwhitlock)



(Whiteboard: [bc:infra][bc:milestone=bicycle])



2 years ago
What problem would this feature solve?
Resources need a permanent, database-independent ID. The current auto-incrementing IDs will not work because they reflect the order of insertion:

- A 'copy' of the production API will get slightly different IDs
- The (eventual) staging API will have different IDs than the production API
- A third-party API instance will not be able to stay in sync with the MDN production API, or merge local changes up to the production API.
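
The insertion-order problem can be reproduced with a small sketch. It uses Python's built-in sqlite3 module rather than the project's actual database, and the `browser` table and slugs are hypothetical stand-ins for real resources:

```python
import sqlite3


def load(slugs):
    """Insert slugs into a fresh in-memory database and return {slug: id}."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE browser (id INTEGER PRIMARY KEY AUTOINCREMENT, slug TEXT)"
    )
    db.executemany("INSERT INTO browser (slug) VALUES (?)", [(s,) for s in slugs])
    ids = dict(db.execute("SELECT slug, id FROM browser"))
    db.close()
    return ids


# The same data, loaded in a different order (hypothetical slugs).
production = load(["chrome", "firefox", "safari"])
copy = load(["firefox", "chrome", "safari"])

# production["chrome"] is 1, but copy["chrome"] is 2.
print(production)
print(copy)
```

The two instances hold identical data but disagree on which ID names which resource, so an ID taken from one is meaningless in the other.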

Who has this problem?
Core contributors to MDN

How do you know that the users identified above have this problem?
A subset of data is used for integration tests, and small changes that affect creation order result in big changes in IDs [1].  Code reviewers have to periodically wipe out their database to keep up with API data changes, instead of re-syncing with the production data [2].

[1] https://github.com/mdn/browsercompat/commit/31b1e8a3634a163fc0d1cd986be163ab34f17a8b
[2] https://github.com/mdn/browsercompat-data 

How are the users identified above solving this problem now?
Slugs were supposed to be write-once, and are currently used as database-independent IDs.  Whenever they are looked at in detail, changes are made (bug 1078699, bug 1128525), so they are unsuitable for this purpose. Once they are freed of the "never change" requirement, they can more freely be changed as needs evolve, or be dropped entirely.

Do you have any suggestions for solving the problem? Please explain in detail.
UUIDs [3] are a common solution to the problem of distributed identification. Random UUIDs (version 4) would work well for our use case.

1. UUIDs would be generated as alternate IDs for each resource
2. Modify tools to use the UUID
3. Drop old IDs

[3] https://en.wikipedia.org/wiki/Universally_unique_identifier
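
The three steps above could look roughly like the following. This is a minimal sketch using plain dictionaries instead of the real database models; the resource data and field names are assumptions made for illustration:

```python
import uuid

# Hypothetical resources keyed by auto-incrementing database ID.
resources = {
    1: {"slug": "chrome"},
    2: {"slug": "firefox"},
}

# Step 1: generate a random (version 4) UUID as an alternate ID.
for record in resources.values():
    record["uuid"] = str(uuid.uuid4())

# Step 2: tools look resources up by UUID instead of database ID.
by_uuid = {record["uuid"]: record for record in resources.values()}

# Step 3: drop the old IDs; the UUID is now the only identifier.
resources = by_uuid
```

Because the UUID is stored with the record, it survives being copied to another instance regardless of insertion order.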

Is there anything else we should know?
Python has native support for generating v4 UUIDs, and they are supported by Django and PostgreSQL. There are well-supported JavaScript libraries [4] as well.

Clients could then generate their own UUIDs when creating resources, so they would know a resource's ID before the server responds.

[4] https://github.com/broofa/node-uuid
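
A server that accepts client-generated UUIDs would need to validate them. The sketch below shows one possible check using Python's standard uuid module. The function name `resource_id` and the rule that only version 4 UUIDs are accepted are assumptions, not part of any existing API:

```python
import uuid


def resource_id(client_id=None):
    """Return a canonical UUID string for a new resource.

    Uses the client-supplied ID when it is a valid version 4 UUID,
    and generates a fresh one when the client sent none.
    """
    if client_id is None:
        return str(uuid.uuid4())
    try:
        parsed = uuid.UUID(client_id)
    except (ValueError, AttributeError, TypeError):
        raise ValueError("client ID is not a UUID: %r" % (client_id,))
    if parsed.version != 4:
        # Assumed policy: accept only random (version 4) UUIDs.
        raise ValueError("client ID is not a version 4 UUID: %r" % (client_id,))
    return str(parsed)
```

Returning `str(parsed)` normalizes the accepted spellings (uppercase, braces, missing hyphens) to a single canonical form before storage.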


2 years ago
Blocks: 996570
Severity: enhancement → major
Keywords: in-triage


2 years ago
Blocks: 1240757
No longer blocks: 996570
Keywords: in-triage
OS: Other → All
Summary: [Compat Data] Switch to database-independent IDs → Switch to database-independent IDs
Whiteboard: [specification][type:feature] → [bc:infra][bc:milestone=bicycle]
Component: General → BrowserCompat


2 years ago
Assignee: nobody → jwhitlock


2 years ago
Blocks: 1224345

Comment 1

2 years ago
The JSON API v1.0 spec has some requirements on using UUIDs in creating resources:


A 204 No Content response is allowed if the created resource matches the POSTed resource. For the v2 API, creating a resource also includes generating a historical link, so a 201 Created will always be returned.
Severity: major → normal
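
The choice between 201 and 204 described in this comment can be sketched as follows. The function name, the `history` link key, and the example URL are hypothetical; only the status code rule comes from the comment:

```python
import uuid


def create_response(posted, history_link=None):
    """Return (status, body) for a POST that carried a client-generated ID."""
    created = dict(posted)
    if history_link is not None:
        # The server adds data the client did not send (hypothetical key).
        created["links"] = {"history": history_link}
    if created == posted:
        # Nothing was added, so 204 No Content is allowed.
        return 204, None
    return 201, {"data": created}


posted = {
    "type": "browsers",
    "id": str(uuid.uuid4()),
    "attributes": {"slug": "firefox"},
}

# Hypothetical history URL; a v2-style server always adds one, so it returns 201.
status, body = create_response(posted, "/api/v2/historical_browsers/1")
```

Calling `create_response(posted)` without a history link returns `(204, None)`, which is the case the v2 API would never hit.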

Comment 2

2 years ago
To speed import, Feature resources should use MDN-generated UUIDs. This work is covered in bug 1246967. The sequence will be something like:

- Add UUID fields to all resources, in addition to the database-generated ID fields. Generate UUIDs for resources other than features
- MDN adds UUIDs to page API
- An import process assigns MDN UUIDs to features representing pages
- UUIDs are generated for the rest of the feature resources
- The database ID is dropped and the UUID becomes the primary key
Depends on: 1246967
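
The sequence above could be sketched with in-memory records as shown below. All of the data, the `mdn_page` field, and the shape of the MDN page-to-UUID mapping are assumptions made for illustration; the real work is tracked in bug 1246967:

```python
import uuid

# Hypothetical resources; only the first feature represents an MDN page.
features = [
    {"id": 1, "slug": "css-display", "mdn_page": "/docs/Web/CSS/display"},
    {"id": 2, "slug": "css-display-grid", "mdn_page": None},
]
browsers = [{"id": 1, "slug": "firefox"}]

# Stand-in for UUIDs published by the MDN page API (assumed shape).
mdn_page_uuids = {"/docs/Web/CSS/display": "16fd2706-8baf-433b-82eb-8c7fada847da"}

# Add a UUID field to all resources; generate values for non-features.
for feature in features:
    feature["uuid"] = None
for browser in browsers:
    browser["uuid"] = str(uuid.uuid4())

# The import process assigns MDN UUIDs to features representing pages.
for feature in features:
    if feature["mdn_page"] in mdn_page_uuids:
        feature["uuid"] = mdn_page_uuids[feature["mdn_page"]]

# UUIDs are generated for the rest of the feature resources.
for feature in features:
    if feature["uuid"] is None:
        feature["uuid"] = str(uuid.uuid4())


def rekey(records):
    """Drop the database ID and key each record by its UUID."""
    return {
        r["uuid"]: {k: v for k, v in r.items() if k not in ("id", "uuid")}
        for r in records
    }


features_by_uuid = rekey(features)
browsers_by_uuid = rekey(browsers)
```

Assigning the MDN UUIDs before generating the remaining ones keeps a page-backed feature from receiving a random UUID that would later have to be replaced.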

Comment 3

11 months ago
The BrowserCompat project is canceled.  See https://github.com/mdn/browsercompat for current effort. Bulk status change includes the random word TEMPOTHRONE.
Last Resolved: 11 months ago
Resolution: --- → WONTFIX