Closed Bug 1582766 Opened 6 years ago Closed 6 years ago

Resigning collections in certain situations sets status to `null`

Categories

(Cloud Services :: Server: Remote Settings, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: leplatrem, Assigned: leplatrem)

Details

We have (again) two collections with an invalid status.

{
  "main/normandy-recipes-capabilities": "unexpected status 'to-resign'",
  "main/tracking-protection-lists": "unexpected status 'to-resign'"
}
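The report above maps each collection ID to an error message. Below is a minimal sketch of a check that produces that shape. It assumes the check receives a mapping of collection ID to current status and treats only `signed` as healthy. The real consistency-check lambda may accept other statuses, and the function name `find_unexpected_statuses` is hypothetical.

```python
from typing import Dict, Iterable, Optional


def find_unexpected_statuses(
    statuses: Dict[str, Optional[str]],
    expected: Iterable[str] = ("signed",),  # assumption: only "signed" counts as healthy
) -> Dict[str, str]:
    """Hypothetical check: report every collection whose status is not expected."""
    allowed = set(expected)
    return {
        cid: f"unexpected status '{status}'"
        for cid, status in sorted(statuses.items())
        if status not in allowed
    }


report = find_unexpected_statuses({
    "main/normandy-recipes-capabilities": "to-resign",
    "main/tracking-protection-lists": "to-resign",
    "main/some-other-collection": "signed",  # hypothetical healthy collection
})
print(report)
```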

By looking at the tracking-protection-lists collection history, I can see that the status oscillates between null and to-resign.
Same for normandy-recipes-capabilities.

This is not the case for other collections.

There is something wrong with this code: https://github.com/Kinto/kinto-signer/blob/master/kinto_signer/listeners.py#L187-L191
If the old status is `None`, we should set it to `signed` and not `None`.
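The proposed fix can be sketched as a small rule. This assumes that, once a `to-resign` refresh completes, the listener restores the status the collection had before the refresh, which is what the null/to-resign oscillation suggests. `status_after_resign` is a hypothetical helper, not kinto-signer's actual code.

```python
from typing import Optional

SIGNED = "signed"


def status_after_resign(old_status: Optional[str]) -> str:
    """Hypothetical helper: the status to restore after a 'to-resign' refresh.

    The previous status is kept, except when it was None (collection created
    without a status). Restoring None is what leaves the status null, so
    "signed" is used instead.
    """
    if old_status is None:
        return SIGNED
    return old_status


for old in (None, "signed", "to-review"):
    print(repr(old), "->", status_after_resign(old))
```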

We found out what is going on.

Collections are created without a status. When the signature refresh runs, the status is set to null.

The creation part will be fixed (status set to signed on creation) in kinto-dist 17.3.0.
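The sequence described above can be modelled in a few lines. This is a simplified sketch, not kinto-signer's code. `create_collection` and `refresh_signature` are hypothetical, and the refresh is assumed to restore whatever status the collection had before it ran (None when the key is absent).

```python
from typing import Any, Dict, Optional


def create_collection(data: Dict[str, Any],
                      default_status: Optional[str] = "signed") -> Dict[str, Any]:
    """Hypothetical creation hook: give new collections an explicit status.

    default_status=None reproduces the old behaviour (no status key at all).
    """
    created = dict(data)
    if default_status is not None:
        created.setdefault("status", default_status)
    return created


def refresh_signature(collection: Dict[str, Any]) -> Dict[str, Any]:
    """Simplified refresh: mark as 'to-resign', then restore the previous status."""
    previous = collection.get("status")  # None if the collection never had a status
    collection = {**collection, "status": "to-resign"}  # re-signing happens in this state
    return {**collection, "status": previous}


before = refresh_signature(create_collection({"id": "cfr-srg"}, default_status=None))
after = refresh_signature(create_collection({"id": "cfr-srg"}))
print(before["status"], after["status"])  # None signed
```

With a status set on creation, the refresh has a real value to restore, so the collection no longer ends up with a null status.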

We could either wait for these collections to gain a status (e.g. when their first records are created), or force it now (e.g. using the rollback feature) if we want the consistency check lambda to become green again ASAP. Wei, what do you think?

Flags: needinfo?(wezhou)

I think we should wait. I'd prefer not to touch production data when it is not an actual error.

Thanks!

Flags: needinfo?(wezhou)

As a quick note, the currently affected PROD collections are:

  "main/cfr-srg": "unexpected status 'to-resign'",
  "main/normandy-recipes-capabilities": "unexpected status 'to-resign'",
  "main/tracking-protection-lists": "unexpected status 'to-resign'"

And on STAGE:

  "main/cfr-srg": "unexpected status 'to-resign'",

The teams haven't used these collections since their creation, so the status was not fixed on its own.

Status can be fixed by doing:

echo '{"data":{"status":"to-rollback"}}' | http PATCH https://settings-writer.${ENV}.mozaws.net/v1/buckets/main-workspace/collections/${CID} -a admin:s3cr3t
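For scripting the same fix, a minimal Python sketch builds the equivalent request with the standard library. It mirrors the URL pattern, JSON body and basic-auth credentials of the httpie command. `CID` is the bare collection ID (e.g. `cfr-srg`), because the URL already names the `main-workspace` bucket. `build_rollback_request` is a hypothetical helper, and the credentials are the same placeholders as above.

```python
import base64
import json
import urllib.request


def build_rollback_request(env: str, cid: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH that sets a workspace collection to 'to-rollback'."""
    url = (
        f"https://settings-writer.{env}.mozaws.net/v1/"
        f"buckets/main-workspace/collections/{cid}"
    )
    body = json.dumps({"data": {"status": "to-rollback"}}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


req = build_rollback_request("stage", "cfr-srg", "admin", "s3cr3t")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```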

poucave already has a number of checks regarding certificate expiration etc. It will help...

Flags: needinfo?(wezhou)

What should the CID values be in comment #4 above? Should it be /main/cfr-srg or /main-workspace/cfr-srg, for example?

Also, in -stage, the validate-signature lambda runs successfully, even though main-workspace/collections/cfr-srg is in to-resign status:

$ curl -s -u "$userpass" https://settings-writer.stage.mozaws.net/v1/buckets/main-workspace/collections/cfr-srg |jq .
{
  "permissions": {
    "write": [
      "/buckets/main-workspace/groups/cfr-srg-editors",
      "account:cloudservices_kinto_prod",
      "account:admin",
      "/buckets/main-workspace/groups/cfr-srg-reviewers"
    ]
  },
  "data": {
    "status": "to-resign",
    "id": "cfr-srg",
    "last_modified": 1571929864720
  }
}

The question is: do we still need to patch its status to to-rollback in this case?

Flags: needinfo?(wezhou)

Wei, the commands would be these ones on prod:

echo '{"data":{"status":"to-rollback"}}' | http PATCH https://settings-writer.prod.mozaws.net/v1/buckets/main-workspace/collections/cfr-srg -a admin:s3cr3t

echo '{"data":{"status":"to-rollback"}}' | http PATCH https://settings-writer.prod.mozaws.net/v1/buckets/main-workspace/collections/tracking-protection-lists -a admin:s3cr3t

You can try them on STAGE if you want to. They basically wipe out any work in progress and pending review, and set the status back to signed. It's the last bit (status being set to signed) that interests us in prod for these 2 collections.
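A simplified model of the rollback, assuming it resets the workspace records to the published ones. `rollback` and the record layout (record ID to record body) are hypothetical, not kinto-signer's API. The model makes the trade-off visible: for collections untouched since creation, nothing is discarded and only the status changes.

```python
from typing import Any, Dict, Set, Tuple

Records = Dict[str, Dict[str, Any]]  # hypothetical layout: record id -> record body


def rollback(workspace: Records, published: Records) -> Tuple[Records, str, Set[str]]:
    """Simplified model of a 'to-rollback' request.

    Returns the new workspace records (a copy of the published ones), the new
    status, and the IDs of records whose pending changes are discarded
    (edits, additions and deletions alike).
    """
    discarded = {
        rid for rid in workspace.keys() | published.keys()
        if workspace.get(rid) != published.get(rid)
    }
    restored = {rid: dict(body) for rid, body in published.items()}
    return restored, "signed", discarded


untouched = {"r1": {"v": 1}}
print(rollback(untouched, dict(untouched))[2])  # set()
```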

Oh, it looks like you did it 19 minutes ago. Sorry!

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED

Yes, you can ignore comment #5 above. I figured it out later. :)
