Closed Bug 745072 Opened 12 years ago Closed 6 years ago

better tracking of collection-level modified/deleted times

Categories: Cloud Services Graveyard :: Server: Sync, defect, P4
Platform: x86 Linux
Tracking: (Not tracked)
Status: RESOLVED WONTFIX
People: (Reporter: rfkelly, Unassigned)
Whiteboard: [qa+]
Attachments: (1 file, 1 obsolete file)

This patch is a pretty serious change to the way we track timestamps in the database.

Currently, the last-modified time of a collection is calculated as MAX(modified) over the bso table.  This works as long as rows are never deleted from the table, but that is not guaranteed.  Clients can delete rows explicitly, and the ttl-pruning script can delete rows in the background.
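
The current calculation is roughly the following (a sketch; the real query templates run against a sharded bso table):

    -- current approach: derive the collection timestamp from surviving rows
    SELECT MAX(modified)
    FROM bso
    WHERE userid = :user_id AND collection = :collection;
    -- after rows are deleted, this value can silently move backwards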

The patch adds another table called "collection_timestamps" that allows us to track the modification and deletion times of collections explicitly.  It looks like this:

    collection_timestamps:
        userid INTEGER NOT NULL
        collection INTEGER NOT NULL
        last_modified BIGINT NOT NULL
        last_deleted BIGINT NOT NULL
        PRIMARY KEY (userid, collection)

Whenever we change something in a collection, we set last_modified to the current timestamp.  Whenever we delete a collection, we set both last_modified and last_deleted to the current timestamp.
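
In SQL terms the maintenance would look something like this (a sketch, not the literal statements from the patch):

    -- on any write into a collection:
    UPDATE collection_timestamps
    SET last_modified = :timestamp
    WHERE userid = :user_id AND collection = :collection;

    -- on deletion of a collection:
    UPDATE collection_timestamps
    SET last_modified = :timestamp, last_deleted = :timestamp
    WHERE userid = :user_id AND collection = :collection;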

As well as giving us more accurate semantics for modification times, this change paves the way for two other outstanding bugs: it provides the defer-deletes-to-operational-discretion behaviour of Bug 693375 for free, and it provides a suitable per-user-per-collection row on which to perform the locking described in Bug 735085.

On the downside, we're now doing an extra UPDATE statement for each write, and a more complicated SELECT query for each read.  We'll need to benchmark carefully how it performs under load.  But I think the overhead is worth paying to fix the edge cases in our semantics.

I'll also note that we ultimately want some explicit transaction management around these writes, so that "INSERT INTO bso" and "UPDATE collection_timestamps" occur atomically.  I haven't done it in this patch because it'll require some refactoring of how we use sqlalchemy, and the race is no worse than the one we already have when writing the timestamps into memcache.
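
A minimal sketch of what that transaction management might look like, assuming a SQLAlchemy engine (the helper name and column list are illustrative, not from the patch):

    from sqlalchemy import text

    def insert_item(engine, user_id, collection, item_id, payload, ts):
        # Run both statements on one connection so they commit (or roll
        # back) together, instead of as two independent autocommits.
        with engine.begin() as conn:
            conn.execute(text(
                "INSERT INTO bso (userid, collection, id, payload, modified) "
                "VALUES (:user_id, :collection, :id, :payload, :ts)"),
                {"user_id": user_id, "collection": collection,
                 "id": item_id, "payload": payload, "ts": ts})
            conn.execute(text(
                "UPDATE collection_timestamps SET last_modified = :ts "
                "WHERE userid = :user_id AND collection = :collection"),
                {"user_id": user_id, "collection": collection, "ts": ts})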

Thoughts?
Attachment #614671 - Flags: feedback?(telliott)
Attachment #614671 - Flags: feedback?(rsoderberg)
Attachment #614671 - Flags: feedback?(petef)
Whiteboard: [qa-]
Whiteboard: [qa-] → [qa+]
Blocks: 693375
Notes from a trial we did as part of the AITC loadtesting.

1) As expected, for a single-item PUT the write workload approximately doubles.  We'd expect less overhead for Sync, where writes occur in large batches.

2) The writes should be performed inside a transaction before we push this live - not necessarily for data integrity, but because it'll mean less fsync'ing for the database and thus reduce the above-mentioned overhead.

3) We have some queries that use "WHERE ttl > :ttl", but there is no index on the ttl column, which could become a problem for large datasets (a possible index is sketched after this list).

4) atoll adds on IRC: "the IN () clause is also a risk at large scale with mysql 5.1"
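
For item 3, a candidate index would be along these lines (sketch only; the name is made up and the column order should be checked against the actual query shapes):

    -- hypothetical index to support "WHERE userid = :user_id AND ttl > :ttl"
    CREATE INDEX bso_user_ttl ON bso (userid, ttl);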

Unfortunately this patch produced some connection-pool failures under load, so we'll keep working on it but defer deployment for now.
Comment on attachment 614671 [details] [diff] [review]
patch to explicitly track per-collection modified timestamps

I got confused for a bit, because this is really two separate projects being handled by the same table. I guess there will be minimal aggregation on the table, so merging them makes sense and is unlikely to lead to table contention, but the bug title is misleading.

As noted before, please use better keys than "stamp" and "stamps" for memcache.

It's an ongoing problem, but I want to flag what this update doesn't address: if a user deletes a row, how do the other clients know this has happened? The timestamp updates. Other clients see that the collection has been modified and request the changes. They get... no data. That probably means a delete, but short of requesting the entire collection and doing reconciliation, the client isn't going to be able to tell what changed. This is why we've had tombstones in the past. 

It looks to me, in trying to parse the wretched syntax that is sqlalchemy, that the last deleted request is done as a subquery on a get. If so, that shouldn't be too bad, but if not (and it may not be possible in nonrelational systems) I would seriously consider making last_delete an interface parameter into a get request so that we can leverage memcache if we decide to go that way.

I suspect there may be legal issues with DELETE_USER, and that's a case where it isn't acceptable to defer deletes to operational discretion.
Attachment #614671 - Flags: feedback?(telliott) → feedback+
(In reply to Toby Elliott [:telliott] from comment #2)
> I got confused for a bit, because this is really two separate projects being
> handled by the same table. 

Well, two separate-but-related projects.  You have to have *something* to track timestamps of deleted collections, because they affect the overall timestamp of the storage.  So it turned out to be simpler to implement deferred deletes than not to.

> As noted before, please use better keys than "stamp" and "stamps" for
> memcache.

Yep, noted.
 
> It's an ongoing problem, but I want to flag what this update doesn't
> address: if a user deletes a row, how do the other clients know this has
> happened? The timestamp updates. Other clients see that the collection has
> been modified and request the changes. They get... no data. That probably
> means a delete, but short of requesting the entire collection and doing
> reconciliation, the client isn't going to be able to tell what changed. This
> is why we've had tombstones in the past.

Hmmm...I guess we could have server-side tombstones and a convention for reporting them.  Maybe GET /storage/collection could return an object with multiple keys, like:

    {
      "modified": ["A", "B", "C"],
      "deleted": ["X", "Y"]
    }

Or we could just return things that have been modified *or* deleted and let the client sort it out with further requests.

If this is something we're interested in adding to the protocol at some point, maybe I should resurrect the previous approach using explicit tombstones for each item.

> It looks to me, in trying to parse the wretched syntax that is sqlalchemy,
> that the last deleted request is done as a subquery on a get. If so, that
> shouldn't be too bad, but if not (and it may not be possible in
> nonrelational systems) I would seriously consider making last_delete an
> interface parameter into a get request so that we can leverage memcache if
> we decide to go that way.

Yes, it's currently looked up via a subquery.  Work continues on my death-to-sqlalchemy branch, which would replace this with a plain SQL query.

Passing values from memcached into the queries is something I want to look at for managing last-modified as well, so this may be a possibility.

> I suspect there may be legal issues with DELETE_USER, and that's a case
> where it isn't acceptable to defer deletes to operational discretion.

I have vague recollections of discussing a deferred-delete option for AITC in spite of these legal issues, with a guaranteed maximum of "your data will be purged within XYZ days".  Definitely something that needs to be clarified.
Summary: better tracking of collection-level modified times → better tracking of collection-level modified/deleted times
Comment on attachment 614671 [details] [diff] [review]
patch to explicitly track per-collection modified timestamps

Review of attachment 614671 [details] [diff] [review]:
-----------------------------------------------------------------

::: syncstorage/storage/memcachedsql.py
@@ +12,1 @@
>  """

I flagged "stamp" as a typo of "stamps", but reading below it turns out that they're actually two different variables. Consider changing "stamp" to something that is not a singular form of an existing key, so that they can be distinguished from each other. We have encountered issues in the past with singular/plural and I'd like to avoid them.

::: syncstorage/storage/queries.py
@@ +36,5 @@
> +
> +    'COLLECTIONS_TIMESTAMPS': 'SELECT collection, last_modified '
> +                              'FROM collection_timestamps '
> +                              'WHERE userid=:user_id '
> +                              'AND last_modified != last_deleted',

Under what circumstances would the condition "last_modified == last_deleted" occur?

@@ +51,5 @@
> +                'FROM %(bso)s b WHERE userid=:user_id AND '
> +                'ttl>:ttl AND modified > '
> +                '  (SELECT last_deleted FROM collection_timestamps ts '
> +                '   WHERE ts.userid=:user_id AND ts.collection=b.collection) '
> +                'GROUP BY collection',

If the order of the rows returned by these queries is not important, using "GROUP BY ... ORDER BY null" will save an unnecessary tempsort for some queries. It's not a significant thing with only a few collections, merely a hint to the query engine that returned order is irrelevant.
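
Concretely, the suggestion is a trailing "ORDER BY NULL" (a generic example; on MySQL 5.x this suppresses the implicit sort that GROUP BY otherwise performs):

    SELECT collection, MAX(modified)
    FROM bso
    WHERE userid = :user_id AND ttl > :ttl
    GROUP BY collection
    ORDER BY NULL;  -- grouped rows may come back in any order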

::: syncstorage/storage/sql.py
@@ +298,5 @@
>      def _cache_collection_data(self, collection_id, collection_name):
>          if len(self._collections_by_name) > MAX_COLLECTIONS_CACHE_SIZE:
> +            msg = "More than %d collections have been created, "\
> +                  "refusing to cache them all"
> +            self.logger.warn(msg % (MAX_COLLECTIONS_CACHE_SIZE,))

Unnecessary trailing comma observed; worth fixing since this line is being changed anyway.

@@ +519,5 @@
> +            try:
> +                self._safe_execute(query, **values)
> +            except IntegrityError:
> +                # Someone else inserted it at the same time.
> +                pass

Is it safe to trust the database error, or should we retry TOUCH_COLLECTION + rowcount != 1?
Attachment #614671 - Flags: feedback?(rsoderberg) → feedback+
Thanks for the feedback, Richard.  For completeness, I'll note that the details of the current patch will require substantial changes after the refactorings in Bug 770162 and Bug 770159 land, but the basic ideas and flow of control will stay the same.

(In reply to Richard Soderberg [:atoll] from comment #4)
> Comment on attachment 614671 [details] [diff] [review]
> patch to explicitly track per-collection modified timestamps
> 
> Review of attachment 614671 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: syncstorage/storage/memcachedsql.py
> @@ +12,1 @@
> >  """
> 
> I flagged "stamp" as a typo of "stamps", but reading below it turns out that
> they're actually two different variables. Consider changing "stamp" to
> something that is not a singular form of an existing key, so that they can
> be distinguished from each other. We have encountered issues in the past
> with singular/plural and I'd like to avoid them.

A very good point.  Part of the refactoring in Bug 770162 includes getting rid of the stamp/stamps ambiguity.

> ::: syncstorage/storage/queries.py
> @@ +36,5 @@
> > +
> > +    'COLLECTIONS_TIMESTAMPS': 'SELECT collection, last_modified '
> > +                              'FROM collection_timestamps '
> > +                              'WHERE userid=:user_id '
> > +                              'AND last_modified != last_deleted',
> 
> Under what circumstances would the condition "last_modified == last_deleted"
> occur?

When a collection has been deleted and not subsequently re-created by storing new items into it.  So "last_modified == last_deleted" essentially means "the collection is currently deleted".

Having the deletion of a collection set both last_modified and last_deleted simplifies some other queries, such as the one that does MAX(last_modified) to get the modification timestamp of the entire store.
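
For example, the store-level timestamp then reduces to a single aggregate over the new table (sketch):

    SELECT MAX(last_modified)
    FROM collection_timestamps
    WHERE userid = :user_id;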

> @@ +51,5 @@
> > +                'FROM %(bso)s b WHERE userid=:user_id AND '
> > +                'ttl>:ttl AND modified > '
> > +                '  (SELECT last_deleted FROM collection_timestamps ts '
> > +                '   WHERE ts.userid=:user_id AND ts.collection=b.collection) '
> > +                'GROUP BY collection',
> 
> If the order of the rows returned by these queries is not important, using
> "GROUP BY ... ORDER BY null" will save an unnecessary tempsort for some
> queries. It's not a significant thing with only a few collections, merely a
> hint to the query engine that returned order is irrelevant.

This is good to know, thanks!

> > +            try:
> > +                self._safe_execute(query, **values)
> > +            except IntegrityError:
> > +                # Someone else inserted it at the same time.
> > +                pass
> 
> Is it safe to trust the database error, or should we retry TOUCH_COLLECTION + rowcount != 1?

Hmm.  I think you're right: we should retry to make sure that the recorded timestamp is at least as recent as the one we're trying to write.
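
The shape of that retry might be something like the following sketch (the query names and the "never deleted" sentinel of 0 are assumptions, not taken from the patch):

    from sqlalchemy import text
    from sqlalchemy.exc import IntegrityError

    TOUCH = text(
        "UPDATE collection_timestamps SET last_modified = :ts "
        "WHERE userid = :user_id AND collection = :collection "
        "AND last_modified < :ts")

    INIT = text(
        "INSERT INTO collection_timestamps "
        "(userid, collection, last_modified, last_deleted) "
        "VALUES (:user_id, :collection, :ts, 0)")

    def touch_collection(conn, user_id, collection, ts):
        params = {"user_id": user_id, "collection": collection, "ts": ts}
        # The guarded UPDATE only ever moves the timestamp forwards.
        if conn.execute(TOUCH, params).rowcount == 0:
            # Either no row exists yet, or a newer timestamp is already
            # recorded; try to create the row, and on a concurrent insert
            # fall back to the guarded UPDATE once more.
            try:
                conn.execute(INIT, params)
            except IntegrityError:
                conn.execute(TOUCH, params)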
Attachment #614671 - Flags: feedback?(petef)
It's been a while since the last patch here.  The refactor in Bug 770159 has taken care of part of this bug, in that last-modified timestamps are now explicitly tracked in the database.

I'd like to follow it up with explicit tracking of last-deletion timestamps for each collection - see attached.  This fixes a semantic discrepancy between the SQL and Memcache+SQL backends, where the SQL backend would not adjust modification times in response to deleting a collection.

Per Toby's notes in Comment 2, I've removed a lot of complexity associated with deferred deletes.  In this new patch the individual items are still deleted from the DB; it's just that a second timestamp is tracked for each collection.

Of course, it also becomes an enabler for a proper implementation of deferred deletes in Bug 693375.
Attachment #614671 - Attachment is obsolete: true
Attachment #689053 - Flags: review?(telliott)
Oops, please ignore the spurious "ALL_COLLECTIONS" query definition in this patch; it was a debugging aid that I forgot to remove before posting.
Comment on attachment 689053 [details] [diff] [review]
patch to explicitly track last-deleted time for each collection

Review of attachment 689053 [details] [diff] [review]:
-----------------------------------------------------------------

We're clearly not going to do this at this time, so clearing it out of my queue. We can revisit if it becomes a priority.
Attachment #689053 - Flags: review?(telliott) → review-
Priority: -- → P3
Priority: P3 → P4
This isn't causing us any particular problems in production, so I'm going to close this very old bug.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Product: Cloud Services → Cloud Services Graveyard