Collect about:memory reports without private data

RESOLVED FIXED in mozilla33

Status

()

defect
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: dmajor, Assigned: njn)

Tracking

unspecified
mozilla33
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 2 obsolete attachments)

In order to send about:memory with crash reports we'll need to remove private information, such as:

- Window URLs (maybe with exceptions? about:blank might be useful)
- Notable strings

What else?
Flags: needinfo?(n.nethercote)
> - Window URLs (maybe with exceptions? about:blank might be useful)

Chrome windows in general are probably ok.

> - Notable strings

"script-sources" -- it lists filenames.

add-ons? Or are they already sent in crash reports?

image locations, if they get added (both bug 921300 and bug 1009097 are about this).

Something that scares me a bit about this is that I don't have a clear idea what the mechanism for scrubbing this information will look like. I'm worried that we might implement something that covers all the relevant cases, but then in the future someone adds a new memory reporter without appropriately updating the scrubber. (And it's probably worth mentioning bug 709326 in passing here.)
Flags: needinfo?(n.nethercote)
Since this is important, let's solve our fears. We have at least two technical solution I can think of:

1) build this scrubbing into the memory-reporter system itself as a flag (passed to collectReports, or a state flag stored on nsIMemoryReporterCallback).

2) scrub as a separate step

#1 sounds much more future-proof to me, although it may involve more work along the way. Nick do you have somebody who can take this on? If we do #2, are there monitoring tricks we can use to verify that domains aren't appearing?
Flags: needinfo?(n.nethercote)
Yeah, #1 does sound better -- a separate scrubber would need updating anyway if any memory reporter started reporting some new kind of sensitive thing.

I can do this work. I'll need to think about the best way to indicate the preference. And we can probably have some simple checker somewhere that at least looks for obvious things in paths like "http://".
Assignee: nobody → n.nethercote
Flags: needinfo?(n.nethercote)
You'll need to scrub file:// too, and likely others.
It's probably best to whitelist chrome:// and maybe a couple others.
Here's a preliminary patch that sets up all the plumbing -- I added a new
"mode" argument to CollectReports. It also scrubs URLs in the
nsWindowMemoryReporter, and I've annotated all the other places that might need
scrubbing with comments.

One tricky case: at least one add-on (DownThemAll!) implements its own memory
reporter. There's no way for old versions of that add-on to respect the mode
argument to CollectReports(). I don't know if DownThemAll!'s memory reporter
exposes privacy-sensitive data, but it's certainly possible.
Can we just disable addon memory reporters entirely for this?  That would be the safest option, if possible.
Generally, I don't think we want Firefox's privacy sensitivity to depend on code written by a random author, code we never even see.  You could probably flag a reporter registered by whatever addon interface there is, and then not run them at all.
> Generally, I don't think we want Firefox's privacy sensitivity to depend on
> code written by a random author, code we never even see. 

I entirely agree, but I don't know how to enforce it. Add-ons just use the standard registration hook, via nsIMemoryReporterManager.

The only foolproof mechanism I can think of is to eliminate the ability to register memory reporters from JS. (Though, would that prevent binary add-ons from registering reporters?) But that would break DownThemAll!'s reporters, as well as the one reporter within Firefox that's implemented in JS.

All this stuff contributes to my concerns about this feature.
bsmedberg, what's your take on comment 9? Is there a way to distinguish memory reporters that are in add-ons?
Flags: needinfo?(benjamin)
If there's only one in-tree reporter that is registered from JS, it seems like it would be enough to just exclude JS-implemented ones.  The case of binary addons implementing reporters doesn't seem too likely, and maybe that could be covered by not exporting a symbol or something.
Addons *can* already add arbitrary data to telemetry or crash-stats payloads, so we shouldn't worry about preventing the problem. I think the only question to answer is how to solve the specific case here. Can we just change the name of the registration method so that the addon won't report to about:memory until it is updated?
Flags: needinfo?(benjamin)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #12)
> Addons *can* already add arbitrary data to telemetry or crash-stats
> payloads, so we shouldn't worry about preventing the problem. I think the
> only question to answer is how to solve the specific case here.

Sorry, which specific case? If we don't have to worry about add-ons then there isn't really a problem -- all the reporters in our codebase can be modified easily, including the one implemented in JS.
Another question: is the list of add-ons a user has installed considered privacy-sensitive?
I mean the specific addon which is currently adding private data to about:memory. We don't need to worry about future-unknown addons doing something wrong.
The list of addons is already collected in telemetry, FHR, and crash reports.
Here's a draft patch that basically works. There are a bunch of things I'm
unsure about -- many of which have "njn" comments on them -- so I'm asking for
feedback.

- I added a |int32_t mode| argument to collectReports() and similar API
  functions in nsIMemoryReporterManager and nsIMemoryInfoDumper. I'm undecided
  about whether MODE_* constants are better than a bool argument. Some people
  greatly dislike bool args in APIs, but we have bool |minimizeMemoryUsage|
  argument used in these APIs anyway, and it means I just have to convert the
  mode to a bool in numerous places.

- Currently about:memory is hard-coded to censor reports when you press either
  "Measure" or "Measure and save...". This obviously needs changing. I'm unsure
  if we need a way to trigger censoring from about:memory, or if just being
  able to do it via IDL calls suffices. I imagine the crash reporter will be
  using nsIMemoryInfoDumper directly...

- There's no testing yet.

- Here are the things I have censored.
  - nsWindowMemoryReporter: all window URIs
  - JSMainRuntimeCompartmentsReporter: compartment names
  - JSReporter: compartment names, notable strings, notable script sources
  - imgMemoryReporter: raster image and vector doc URIs [still to be done,
    waiting on bug 1009097]
  - WorkerPrivate::MemoryReporter: worker domains and URLs

  Some examples:

> ├───6,504,544 B (06.57%) -- top(<window-3>, id=3)
> │   ├──6,182,600 B (06.25%) -- active
> │   │  ├──5,909,120 B (05.97%) -- window(<window-4>)
> │   │  │  ├──3,003,072 B (03.04%) -- js-compartment(<compartment-181>)

- Here are the things I'm unsure on whether they need censoring.
  - nsDOMMemoryFileDataOwnerMemoryReporter: sha1 hashes of files?
  - nsWindowMemoryReporter: do we need to censor chrome URIs?
  - BlobURLsReporter: the owner and the JS stack
  - ContentParentsMemoryReporter: content process names?
  - SettingsManager: observer topics
  - JSMainRuntimeCompartmentsReporter, JSReporter: do we need to censor system
    compartment names?
  - storage::Service: sqlite file names
  - nsObserverService: observer topics
Attachment #8426043 - Flags: feedback?(benjamin)
Attachment #8423680 - Attachment is obsolete: true
> - I added a |int32_t mode| argument to collectReports() and similar API
>   functions in nsIMemoryReporterManager and nsIMemoryInfoDumper. I'm
> undecided
>   about whether MODE_* constants are better than a bool argument. Some people
>   greatly dislike bool args in APIs, but we have bool |minimizeMemoryUsage|
>   argument used in these APIs anyway, and it means I just have to convert the
>   mode to a bool in numerous places.

I don't have a strong preference about whether it's a bool or a mode, but I do think that if we're going to use a mode it should use bitflags, so e.g.

MODE_PRIVACY = 0x1

Future modes would be 0x2 / 0x4 etc.

Then use 0 for the default mode and combine future modes with MODE_PRIVACY | MODE_WHATEVER.

> - Currently about:memory is hard-coded to censor reports when you press
> either
>   "Measure" or "Measure and save...". This obviously needs changing. I'm
> unsure
>   if we need a way to trigger censoring from about:memory

The only reason to have a checkbox to let you censor from about:memory is to make it easier to have people QA this feature and make sure it does reasonable things.

> - Here are the things I have censored.
>   - nsWindowMemoryReporter: all window URIs

Does this include chrome URIs? I think we should whitelist/report chrome URIs.

>   - JSMainRuntimeCompartmentsReporter: compartment names

ditto if this affects chrome compartments. Crash reports already have the list of extensions, so including that data is not a problem and we'd definitely like to know if extensions cause memory issues.

> - Here are the things I'm unsure on whether they need censoring.
>   - nsDOMMemoryFileDataOwnerMemoryReporter: sha1 hashes of files?

Yes, I think we should censor these.

>   - nsWindowMemoryReporter: do we need to censor chrome URIs?

Aha. We should not.

>   - BlobURLsReporter: the owner and the JS stack

JS stack if it's content, yes. Not sure what owner means here.

>   - ContentParentsMemoryReporter: content process names?

I don't know.

>   - SettingsManager: observer topics

I don't know. If this is chrome-and-extensions only, then it's ok.

>   - JSMainRuntimeCompartmentsReporter, JSReporter: do we need to censor
> system
>     compartment names?

No.

>   - storage::Service: sqlite file names

I don't think so.

>   - nsObserverService: observer topics

No.
Attachment #8426043 - Flags: feedback?(benjamin)
> The only reason to have a checkbox to let you censor from about:memory is to make it easier to have people QA this feature and make sure it does reasonable things.

Just the other day, I did see somebody decline to attach an about:memory report because it contains a lot of private information, so I don't think it would be totally useless to make it user-visible.
(In reply to Andrew McCreight [:mccr8] from comment #19)
> > The only reason to have a checkbox to let you censor from about:memory is to make it easier to have people QA this feature and make sure it does reasonable things.
> 
> Just the other day, I did see somebody decline to attach an about:memory
> report because it contains a lot of private information, so I don't think it
> would be totally useless to make it user-visible.

Agreed, I ran into that in bug 1016264. Just like we have verbose we should probably have an anonymize flag in about:memory.
> Agreed, I ran into that in bug 1016264. Just like we have verbose we should
> probably have an anonymize flag in about:memory.

The difficulty with that is that "verbose" relates to the presentation of the data, and so applies to "Measure", "Load", "Load and diff" and "Read from clipboard". That's why the "verbose" checkbox is inside the bubble with those buttons.

In contrast, "anonymize" relates to the measurement, and so applies to "Measure" and "Measure and save" buttons, which are in different bubbles. I can't think of a good way to represent this in the UI.
OS: Windows 7 → All
Hardware: x86_64 → All
> >   - ContentParentsMemoryReporter: content process names?
> 
> I don't know.

Since apps are implemented via content processes, and apps are roughly equivalent to websites, I will anonymize them.
Here's an updated version ready for review. What a lot of plumbing.

Things to note:

- I'm using "anonymize" to describe this. I think it's better than "censor".

- Sample output:

> ├───20.36 MB (17.57%) -- window-objects
> │   ├──10.83 MB (09.35%) -- top(<anonymized-14>, id=14)
> │   │  ├───9.95 MB (08.59%) -- active
> │   │  │   ├──9.95 MB (08.59%) -- window(<anonymized-18>)

- The following things are anonymized.
  - nsWindowMemoryReporter: content window URIs [chrome window URIs aren't
    anonymized]
  - JSMainRuntimeCompartmentsReporter: user compartment names [system
    compartments aren't anonymized]
  - JSReporter: user compartment names, notable strings, notable script sources
  - imgMemoryReporter: URIs of content raster images and vector image docs
    [chrome ones aren't anonymized]
  - WorkerPrivate::MemoryReporter: worker domains and URLs, for workers that
    have a non-empty domain (empty domains means it's a chrome worker, AFAICT)
  - BlobURLsReporter: the owner, JS stack, and URL
  - ContentParentsMemoryReporter: content process names (excluding built-in
    ones like "(Nuwa)", "(Preallocated)", and "Browser").
  - nsDOMMemoryFileDataOwnerMemoryReporter: sha1 hashes of files
  - SettingsManager: topics

- I've added a "anonymize" checkbox to about:memory. It's within the "Save
  memory reports" bubble, but it applies to both the "Measure" and "Measure and
  save..." buttons. I'm still not happy with this, and hope to come up with
  something better before landing.

- I ended up using a boolean for the new parameter in collectReports() and the
  related functions.

- There is some light testing:
  - In normal mode, "anonymized" shouldn't appear in any memory reporter
    paths.
  - In anonymized mode, "http:" and "https:" shouldn't appear in any memory
    reporter paths.
Attachment #8426043 - Attachment is obsolete: true
Attachment #8434614 - Flags: review?(benjamin)
Comment on attachment 8434614 [details] [diff] [review]
Allow memory reports to be anonymized

This is great. I'm sorry for the delay. My only comment is that perhaps we should add some more detail to the doccomment in nsIMemoryInfoDumper.idl or elsewhere explaining what anonymization means: that it shouldn't record user URLs/domains, but does include extension information, and that anonymized reports may be sent with crash-stats and in the future as a special telemetry ping.
Attachment #8434614 - Flags: review?(benjamin) → review+
Hmm, we can have paths like this:

> compartment([System Principal], file:///home/njn/moz/mi6/co64/dist/bin/components/nsURLFormatter.js

Is the username in that path privacy-sensitive? Perhaps I should anonymize it like this:

> compartment([System Principal], file://<anonymized-path>/nsURLFormatter.js
Flags: needinfo?(benjamin)
Yes, we should anonymize that.
Flags: needinfo?(benjamin)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #26)
> Yes, we should anonymize that.

I've done that. try run looks good: https://tbpl.mozilla.org/?tree=Try&rev=895e3b493ea4
https://hg.mozilla.org/mozilla-central/rev/258916327d96
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Depends on: 1028217
It looks like this bug caused a massive drop on AWSY (which seems rather unlikely to be real). Is there a bug open about that?
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #31)
> It looks like this bug caused a massive drop on AWSY (which seems rather
> unlikely to be real). Is there a bug open about that?

No, though I emailed johns and co. about it yesterday. Care to file a new bug?
Depends on: 1122322
Depends on: 1125989
Depends on: 1137659
You need to log in before you can comment on or make changes to this bug.