Closed Bug 1162176 Opened 4 years ago Closed 4 years ago

Stop sending full IndexedDB database names as part of SlowSQL telemetry

Categories

(Toolkit :: Telemetry, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox39 --- affected
firefox40 --- affected
firefox41 --- fixed
firefox-esr31 --- affected
firefox-esr38 --- wontfix
b2g-v2.0 --- disabled
b2g-v2.0M --- disabled
b2g-v2.1 --- disabled
b2g-v2.1S --- disabled
b2g-v2.2 --- disabled
b2g-master --- fixed

People

(Reporter: bent.mozilla, Assigned: bent.mozilla)

References

(Blocks 1 open bug)

Details

(Keywords: privacy, sec-moderate, Whiteboard: [post-critsmash-triage][adv-main41-])

Attachments

(2 files, 6 obsolete files)

We're sending the full database filename now, which can easily be reverse-munged to see the name that a website gave the database. Most names are probably not a problem (beyond privacy leaks) but there's nothing stopping a website from naming one something like "email@server.com" or "username:password".

As discussed in bug 1161224 the plan is to normalize the names we send in telemetry to "db1", "db2", etc.
This allows storage consumers to specify a different database name for telemetry, and also allows consumers to report telemetry even if they don't appear in the whitelist.
Assignee: nobody → bent.mozilla
Status: NEW → ASSIGNED
Attachment #8603862 - Flags: review?(mak77)
This gives every non-chrome IndexedDB database a session-unique name for telemetry reporting purposes.
Attachment #8603863 - Flags: review?(Jan.Varga)
Comment on attachment 8603863 [details] [diff] [review]
Part 2: Mask IndexedDB database names for non-chrome databases

Review of attachment 8603863 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!
Just one comment below.

::: dom/indexedDB/ActorsParent.cpp
@@ +8638,5 @@
> +  // Any databases in these directories are owned by the application and should
> +  // not have their filenames masked. Hopefully they also appear in the
> +  // Telemetry.cpp whitelist.
> +  if (origin.EqualsLiteral("chrome") ||
> +      origin.EqualsLiteral("moz-safe-about+home")) {

What about indexeddb:// origins (addon-sdk uses it) ?
You could use QuotaManager::IsOriginWhitelistedForPersistentStorage() is case they should be included here.
Attachment #8603863 - Flags: review?(Jan.Varga) → review+
(In reply to Jan Varga [:janv] from comment #3)
> What about indexeddb:// origins (addon-sdk uses it) ?

Yeah, I considered that. We'll still see the data for any statements that get run but the name will be obfuscated. Maybe someday we'll have a need to see more database names but I don't think I care right now.
Unless I'm misunderstanding this doesn't sound like a serious security problem since we don't know of any bad names that could result in telemetry website vulnerabilities. Clearly it's serious from a privacy standpoint though.
I think we might rather modify the whitelist code to accept "wildcards", so that it will track databases starting with indexedDB- or other prefixes.
Tracking both a special name and alwaysRecordTelemetry looks like an unneeded complication, to me it's pretty clear if the consumer takes care of providing an alternative telemetry name, it wants to opt-in to telemetry (so in any case I'd remove the alwaysRecordStatement property and just check if telemetryFilename is defined).
Avoiding a plain whitelist bypass would also allow us to retain more control about what we care about.
Attachment #8603862 - Flags: review?(mak77)
(In reply to Marco Bonardo [::mak] from comment #6)
> I think we might rather modify the whitelist code to accept "wildcards", so
> that it will track databases starting with indexedDB- or other prefixes.
> Tracking both a special name and alwaysRecordTelemetry looks like an
> unneeded complication, to me it's pretty clear if the consumer takes care of
> providing an alternative telemetry name, it wants to opt-in to telemetry (so
> in any case I'd remove the alwaysRecordStatement property and just check if
> telemetryFilename is defined).

But we want to see actual db names for chrome databases since there's no privacy concern and we can make specific changes to those dbs if data shows we need to... I suppose that I could just always set the telemetryFilename option to the db filename in that case, is that what you're saying?

> Avoiding a plain whitelist bypass would also allow us to retain more control
> about what we care about.

Well, then every person that makes a chrome indexedDB database has to remember to add it to the whitelist. Someone remembered to do that for about:home (belatedly), but the folks who did the 'push' database still haven't figured out that they should do it. As the number of databases used in chrome increases I think we'll have more forgetfulness like this...
Flags: needinfo?(mak77)
(In reply to Daniel Veditz [:dveditz] from comment #5)
> Unless I'm misunderstanding this doesn't sound like a serious security
> problem since we don't know of any bad names that could result in telemetry
> website vulnerabilities. Clearly it's serious from a privacy standpoint
> though.

Yes!
The only difference here is that this gets rid of the special 'always' flag being passed in via the query params. I didn't do the whitelist change yet because of the issues I raised in comment 7, would love to know what you think about those.
Attachment #8603862 - Attachment is obsolete: true
Attachment #8605377 - Flags: review?(mak77)
This just updates the IndexedDB code to not send the 'always' flag to storage any more.
Attachment #8603863 - Attachment is obsolete: true
Attachment #8605383 - Flags: review?(Jan.Varga)
Attachment #8605383 - Flags: review?(Jan.Varga) → review+
Attachment #8605384 - Attachment is obsolete: true
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #7)
> But we want to see actual db names for chrome databases since there's no
> privacy concern and we can make specific changes to those dbs if data shows
> we need to... I suppose that I could just always set the telemetryFilename
> option to the db filename in that case, is that what you're saying?

Sorry, maybe I'm misunderstanding.
What I understood is that nothing is changing for currently tracked dbs, the only thing we want is to replace names of indexedDB databases with things like indexedDB-1, indexedDB-2 and so on.
Telemetry methods get the dbName so telemetry itself doesn't care, it will just trust what we pass.
So yes, we could pass indexedDB-1, -2 and so on, for chrome dbs we could pass their filename. This means people will have to keep adding dbs to the whitelist if they want telemetry (see below).
The whitelisting could not be too complex, we can decide for a special char that separates the prefix from the numbering, RFind and split on it, just putting the prefix in the whitelist hash.

> > Avoiding a plain whitelist bypass would also allow us to retain more control
> > about what we care about.
> 
> Well, then every person that makes a chrome indexedDB database has to
> remember to add it to the whitelist. Someone remembered to do that for
> about:home (belatedly), but the folks who did the 'push' database still
> haven't figured out that they should do it. As the number of databases used
> in chrome increases I think we'll have more forgetfulness like this...

True, but this is also valid if they add a normal .sqlite database. I think it's the module/feature owner responsibility to figure out if they can benefit from telemetry and we could disable less important probes to save resources, if we wish. It's not complicated to add to the whitelist, nor to review patches doing that. For example I don't think about:home telemetry is useful, due to the simplicify of its contents, and I'd have no problems disabling it if we'd need to save on resources.

The same way, by having a whitelist we can check nobody reports a db name that can hurt users privacy, while if we provide the bypass, nobody will check if someone bypasses the whitelist with user@domain.sqlite (luckily we are deprecating binary add-ons, or they could have used the bypass...)

I'm not going to block you on this, since both approaches have some pro and cons, but I would like to get your opinion on retaining more control over what we report, rather than providing a bypass of our control.
Flags: needinfo?(mak77)
Yeah, I'm fine maintaining the whitelist approach and matching based on prefix. I guess if we see untracked chrome indexedDB databases in telemetry then we can just file a bug on the owner to get it in the whitelist. Until then it's not vital that we get every chrome database reported anyway.

I talked with vladan and he was fine with our whitelist not being a runtime-generated hashtable but just a static array. I went the extra mile and did compile-time length generation too so we don't have to do strlen() any more so this should be pretty zippy.
Attachment #8605377 - Attachment is obsolete: true
Attachment #8605377 - Flags: review?(mak77)
Attachment #8605960 - Flags: review?(mak77)
Similar to the way it used to be.
Attachment #8605383 - Attachment is obsolete: true
Attachment #8605963 - Flags: review?(Jan.Varga)
Attachment #8605963 - Flags: review?(Jan.Varga) → review+
Attachment #8605964 - Attachment is obsolete: true
Comment on attachment 8605960 [details] [diff] [review]
Part 1: Add dynamic whitelisting to SlowSQL telemetry, v2

Review of attachment 8605960 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM!
Thank you for making this.

::: storage/src/mozStorageConnection.cpp
@@ +604,5 @@
>      mDBConn = nullptr;
>      return convertResultCode(srv);
>    }
>  
> +  // Leave mDatabaseFile and mFileURL empty to get a 'memory' database.

The comment as-is sounds a little bit strange without nothing happening after it...
Maybe something more verbose like:

// Do not populate mDatabaseFile and mFileURL here, since this is a "memory" database.

@@ +634,5 @@
>      return convertResultCode(srv);
>    }
>  
> +  // Leave mFileURL empty to get a disk-backed database with no additional
> +  // options.

ditto (even if here it's clearer since we populate mDatabaseFile)

@@ +666,5 @@
>      return convertResultCode(srv);
>    }
>  
> +  // Set both mDatabaseFile and mFileURL to get a disk-backed database with
> +  // additional options.

ditto (here I think "to get a disk-backed..." is a little unclear since populating these fields it's unlikely needed to "get" the database)

I think using in all the three cases a variation of the comment I suggested above, could work.

@@ +684,3 @@
>  
> +  if (mFileURL) {
> +    const char* dbPath = sqlite3_db_filename(mDBConn, "main");

please prefix with ::

@@ +686,5 @@
> +    const char* dbPath = sqlite3_db_filename(mDBConn, "main");
> +    MOZ_ASSERT(dbPath);
> +
> +    const char* telemetryFilename =
> +      sqlite3_uri_parameter(dbPath, "telemetryFilename");

ditto

@@ +699,5 @@
> +  if (mTelemetryFilename.IsEmpty()) {
> +    mTelemetryFilename = getFilename();
> +  }
> +
> +  MOZ_ASSERT(!mTelemetryFilename.IsEmpty());

nit: the empty line above this doesn't look useful
Attachment #8605960 - Flags: review?(mak77) → review+
GCC doesn't like that constexpr...
https://hg.mozilla.org/mozilla-central/rev/280dae3918d9
https://hg.mozilla.org/mozilla-central/rev/fba019b3618f
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
The bustage fix was simple:

struct TrackedDBEntry
{
  const char* mName;
  const uint32_t mNameLength;
+
+  MOZ_CONSTEXPR
+  TrackedDBEntry(const char* aName, uint32_t aNameLength)
+    : mName(aName)
+    , mNameLength(aNameLength)
+  { }

  // This struct isn't meant to be used beyond the static arrays below.
  TrackedDBEntry() = delete;
  TrackedDBEntry(TrackedDBEntry&) = delete;
};
Do we need this on Aurora/Beta/ESR38/B2G?
Flags: needinfo?(dveditz)
We probably don't need to back-port this.

Fabrice says current b2g master sends telemetry, but shipping devices don't. That argues against the need to fix this on old b2g branches, also.
(In reply to Daniel Veditz [:dveditz] from comment #26)
> We probably don't need to back-port this.
> 
> Fabrice says current b2g master sends telemetry, but shipping devices don't.
> That argues against the need to fix this on old b2g branches, also.

Sounds good to me.
Flags: needinfo?(ptheriault)
Bug 922898 should be marked as a dup of this once this bug is public.
Whiteboard: [post-critsmash-triage]
Group: core-security → core-security-release
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main41+]
Alias: CVE-2015-4517
Whiteboard: [post-critsmash-triage][adv-main41+] → [post-critsmash-triage][adv-main411]
Alias: CVE-2015-4517
Whiteboard: [post-critsmash-triage][adv-main411] → [post-critsmash-triage][adv-main41-]
Duplicate of this bug: 922898
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.