Closed Bug 778994 Opened 12 years ago Closed 11 years ago

Review security of integer millisecond timestamps in Sync2.0/AITC

Categories: Cloud Services Graveyard :: Server: Sync
Hardware/OS: x86 / Linux
Type: defect
Priority: Not set
Severity: normal

Tracking: Not tracked

Status: VERIFIED WONTFIX

People: Reporter: rfkelly; Assigned: rfkelly

Details: Whiteboard: [qa?]

Attachments: 1 file

It appears that sending millisecond-resolution timestamps from our servers is not allowed due to a security risk.  Snipping some IRC chatter from #identity:

---------
<_6a68> lloyd | warner: just read the scrollback. what about multiplying the JS timestamp by 1000 before pushing up to mysql? that would give unit test-friendly ms resolution
<warner> _6a68: um, would that give us database-side timestamps in the .. (2012-1970)*1000 = 4400AD era?
* benadida has quit (Quit: benadida)
<warner> er, 44000 AD
<atoll> _6a68: ugh, sync did that and we've regretted it forever
<_6a68> haha, ok
<atoll> and infrasec forced us to reduce all outputs to tenth second
<atoll> because it's a security risk to give up the raw timestamp for some reason
<warner> I suspect that mysql will be unhappy with timestamp abuse :)
<_6a68> slow unit tests suck mightily, is the thing.
<lloyd> atoll: corellation?  fingerprintng?
<warner> probably
<atoll> lloyd: something like that
---------
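warner's back-of-envelope figure can be checked directly. A quick sketch (the sample `Date.now()` value below is an assumption, chosen to fall in mid-2012):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

js_ms = 1_343_000_000_000   # assumed Date.now() reading from mid-2012, in milliseconds
stored = js_ms * 1000       # the proposed "multiply by 1000 before pushing up to mysql"

# Anything that later reads `stored` back as milliseconds-since-epoch
# effectively treats the original millisecond count as whole seconds:
seconds_as_read = stored / 1000
year = 1970 + seconds_as_read / SECONDS_PER_YEAR
print(int(year))            # 44527, i.e. roughly the 44000 AD era warner computed
```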


Sync2.0 and AITC both currently use integer millisecond timestamps, and do not truncate them to any lower precision.  Sounds like we should truncate them to hundredths-of-a-second precision when they are generated?
Attaching a dump of some relevant IRC chatter.

I think the first decision we need to make is this:  Will we keep the protocol definition in terms of integer millisecond timestamps?

If so, then there is no point in trying to come up with a solution that gives us more than 1ms granularity of writes.  If the client-facing values of X-Last-Modified and X-If-Unmodified-Since are in milliseconds, we must reject simultaneous writes to a collection that occur in the same millisecond.  Doing otherwise will make it impossible to enforce consistency of reads/writes via precondition headers.
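As an illustration of that constraint, a minimal precondition check might look like this (a hypothetical helper, not the actual server code):

```python
def check_precondition(collection_last_modified_ms, if_unmodified_since_ms):
    """Decide whether a conditional write may proceed.

    With X-Last-Modified / X-If-Unmodified-Since expressed in integer
    milliseconds, two writes that land in the same millisecond are
    indistinguishable to the client, so the server must ensure each
    accepted write bumps the collection timestamp by at least 1 ms.
    """
    if if_unmodified_since_ms is None:
        return True  # unconditional write
    # Reject if the collection changed after the client's snapshot.
    return collection_last_modified_ms <= if_unmodified_since_ms
```

For example, a write carrying `X-If-Unmodified-Since: 1000` against a collection last modified at 1001 would be rejected with a precondition-failure response.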

If not, then how exactly are we going to define them?  We could specify that they are opaque server-generated "version numbers".  But at that point we wouldn't need timestamps in there at all and could move to a plain auto-increment solution.

(I would argue that the client should always treat them as though they are opaque version numbers, but the protocol does currently say "these are integer millisecond timestamps")
I'm a strong proponent of referring to things as incrementing version numbers. In fact, I've been implementing the Firefox client using this terminology: "clientVersion", "serverVersion", etc.
cc'ing Anant, since as currently implemented, any change we make in sync server will be inherited by AitC automatically.

> I'm a strong proponent of referring to things as incrementing version
> numbers. In fact, I've been implementing the Firefox client using this
> terminology. "clientVersion" "serverVersion" etc.

Is it useful for the client to be able to compare two version numbers for ordering, or is it sufficient to be able to compare them for equality?  Obviously easy to implement either way, but if we do change the spec then we should explicitly call this out.
(In reply to Ryan Kelly [:rfkelly] from comment #3)
> Is it useful for the client to be able to compare two version numbers for
> ordering, or is it sufficient to be able to compare them for equality?
> Obviously easy to implement either way, but if we do change the spec then we
> should explicitly call this out.

I would assume ordering. How would you implement comparison for equality on just version numbers? Hashes? That's certainly a tempting feature to add. Although, I'm not sure how we'd make use of it without too much deviation from the existing protocol.
(In reply to Gregory Szorc [:gps] from comment #4)
> How would you implement comparison for equality on
> just version numbers? Hashes? That's certainly a tempting feature to add.

Sorry to be unclear.  I was thinking of a simple "has the version number changed?" test implemented as string equality, versus "which of these two version numbers is the newer?" implemented as an ordered comparison of some kind.

Not comparing equality of the *contents* of each version.  Content-addressable versions would be awesome, but that's sync3.0 at least :-)
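To make the two comparisons concrete (illustrative helper names, not anything from the spec):

```python
def has_changed(local_version, server_version):
    # Equality-only test: works even if versions are opaque tokens,
    # since the client only needs to know "did anything change?".
    return local_version != server_version

def is_newer(candidate, baseline):
    # Ordered comparison: only valid if the spec guarantees that
    # version values increase monotonically over time.
    return candidate > baseline
```

Opaque tokens can only support the first test; the second requires the spec to promise monotonically increasing values.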
Whiteboard: [qa?]
:atoll and :telliott, there are some things in your IRC chat that don't quite parse for me, and I don't want to push ahead with this unless we're all on the same page.  Consider:

    <atoll> sure, but pkey bigint is terrific vs. pkey anything char

    <atoll> this is a universal bigint pkey solution
    <telliott> I was thinking per user, but that's possible too
    <atoll> otherwise you don't get time order across all users

    <atoll> more efficient
    <atoll> healthier indexes, based on a single 8-byte bigint pkey

I guess "pkey" here refers to "primary key", but this doesn't line up with the current database schema.  The table storing BSOs currently has:

    Primary Key: (object id VARCHAR, collection id INT, user id INT)
    Index:  (user id INT, collection id INT, modified timestamp BIGINT)
    Index:  (ttl INT)

Are you suggesting a restructuring of this schema to go along with a change in timestamp format?  For example, switching to an event-logging type table that is keyed by timestamp rather than object id?

This certainly sounds worth exploring, but it would be good to tease it out into a new bug.
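For reference, the schema described above would look roughly like this in MySQL DDL (column names and types are my assumptions, reconstructed from the index list, not the actual migration):

```sql
-- Hypothetical reconstruction of the current BSO table; names/types assumed.
CREATE TABLE bso (
  id          VARCHAR(64)  NOT NULL,   -- object id
  collection  INT          NOT NULL,   -- collection id
  userid      INT          NOT NULL,   -- user id
  modified    BIGINT       NOT NULL,   -- integer millisecond timestamp
  ttl         INT          NOT NULL,
  payload     MEDIUMTEXT,
  PRIMARY KEY (id, collection, userid),
  KEY bso_usr_col_mod (userid, collection, modified),
  KEY bso_ttl (ttl)
) ENGINE=InnoDB;
```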
(In reply to Ryan Kelly [:rfkelly] from comment #6)
> Are you suggesting a restructuring of this schema to go along with a change
> in timestamp format?  For example, switching to an event-logging type table
> that is keyed by timestamp rather than object id?
> 
> This certainly sounds worth exploring, but it would be good to tease it out
> into a new bug.

I am *describing*, but not actively advising, an alternative schema design that dual-purposes a bigint pkey as both timestamp and as auto increment.

At our current data density it would not be valuable to delay Sync 2.x for such a change. A new bug is definitely relevant if it seems like the # of rows in a given wall-clock second per database would exceed some 2^x threshold, maybe around 2^18 or so.
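As a sketch of how such a dual-purpose key could work (my illustration of the idea, not a concrete proposal from this bug), using the ~2^18 rows-per-second figure above as the counter width:

```python
COUNTER_BITS = 18  # 2^18 = 262,144 distinct rows per wall-clock second

def make_pkey(epoch_seconds, counter):
    """Pack a wall-clock second and a per-second counter into a single
    integer that serves as both timestamp and auto-increment (fits
    comfortably in a signed 64-bit bigint: ~31 + 18 bits)."""
    assert 0 <= counter < (1 << COUNTER_BITS)
    return (epoch_seconds << COUNTER_BITS) | counter

def pkey_seconds(pkey):
    """Recover the wall-clock second from a packed key."""
    return pkey >> COUNTER_BITS
```

Keys remain time-ordered across all users, since a later second always sorts above an earlier one, and the counter breaks ties within a second.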
(In reply to Richard Soderberg [:atoll] from comment #7)
> I am *describing*, but not actively advising, an alternative schema design
> that dual-purposes a bigint pkey as both timestamp and as auto increment.

OK thanks, understood.

From a protocol point of view, I think the status-quo of using integer milliseconds is good enough.  If we synthesize the last few digits via an autoincrement, we can meet security requirements while maintaining 1ms write resolution per collection.

Looking to a future scheme like the one suggested above, it would be *better* if the protocol could use opaque version identifiers rather than timestamps.  I will do up some docs patches as an experiment, to see just how much protocol surgery would be required.
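One way to sketch the "synthesize the last few digits" idea (an assumption about the approach, not an implemented patch): truncate the real clock to tenths of a second and fill the last two millisecond digits from a counter, so clients still see unique integer-millisecond values while the raw clock is only exposed at 0.1 s resolution.

```python
import time

class VersionStamper:
    """Hypothetical sketch: millisecond-looking timestamps whose last
    two digits are a per-tenth-second counter rather than real clock."""

    def __init__(self):
        self._last_tenth = 0
        self._counter = 0

    def next_version(self):
        tenth = int(time.time() * 10)       # clock truncated to 0.1 s
        if tenth != self._last_tenth:
            self._last_tenth = tenth
            self._counter = 0
        elif self._counter < 99:
            self._counter += 1
        return tenth * 100 + self._counter  # looks like integer milliseconds
```

Note the counter saturates after 100 writes within one tenth of a second; a real implementation would need to block or spill into the next tenth to preserve uniqueness.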
I think this was the original bug regarding the 1/10th-second restriction: bug #546715

I don't remember the original concern about using a more granular timestamp. There could be an issue for JPAKE if the key data is generated from system time; however, the rest of the Sync code uses client-generated secrets, AFAIK.

Personally, I don't think there is any risk in using the raw timestamps in the current Sync workflows.
The original concern that led to using a millisecond-granularity timestamp was that the client devs were very concerned about collisions and collision handling, so we moved it to a level that made collisions incredibly unlikely. As we got actual data in, and a more robust client, collision danger went down, which is why we considered 10ths of a second. 

The suggestion here doesn't change the schema much, it just changes the BIGINT to seconds + autoinc rather than milliseconds. The advantage to this is that each entry will have a unique timestamp, making the offset work a lot easier.
(In reply to Toby Elliott [:telliott] from comment #10)
> The suggestion here doesn't change the schema much, it just changes the
> BIGINT to seconds + autoinc rather than milliseconds. The advantage to this
> is that each entry will have a unique timestamp, making the offset work a
> lot easier.

The protocol currently guarantees that items uploaded in a batch will all receive the same timestamp.  I think this makes a lot of sense in conjunction with X-If-Unmodified-Since, X-Last-Modified etc.

We could pad out the bigint with an autoinc to help with indexing etc, but exposing that intra-batch granularity to the client would require a protocol change.
closing due to deadness of relevant projects
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
Status: RESOLVED → VERIFIED
Product: Cloud Services → Cloud Services Graveyard