Telemetry for image decode count

RESOLVED FIXED in mozilla9

Status

()

Core
ImageLib
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: jrmuizel, Assigned: jrmuizel)

Tracking

(Blocks: 2 bugs)

unspecified
mozilla9
x86
Mac OS X
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 1 obsolete attachment)

Comment hidden (empty)
(Assignee)

Updated

6 years ago
Depends on: 673292
(Assignee)

Comment 1

6 years ago
Created attachment 557693 [details] [diff] [review]
Add the ability to subtract from histograms
Attachment #557693 - Flags: review?(tglek)
(Assignee)

Comment 2

6 years ago
Created attachment 557696 [details] [diff] [review]
Keep a histogram of how often we've decoded a particular image
Attachment #557696 - Flags: review?(justin.lebar+bug)
Sid, is the policy to do a privacy review on every new histogram?  This one seems obviously fine to me, but marking privacy-review-needed just in case.
Keywords: privacy-review-needed
Comment on attachment 557696 [details] [diff] [review]
Keep a histogram of how often we've decoded a particular image

>@@ -186,16 +187,17 @@ RasterImage::RasterImage(imgStatusTracke
>   mFrameDecodeFlags(DECODE_FLAGS_DEFAULT),
>   mAnim(nsnull),
>   mLoopCount(-1),
>   mObserver(nsnull),
>   mLockCount(0),
>   mDecoder(nsnull),
>   mWorker(nsnull),
>   mBytesDecoded(0),
>+  mDecodeCount(0),
> #ifdef DEBUG
>   mFramesNotified(0),
> #endif
>   mHasSize(PR_FALSE),
>   mDecodeOnDraw(PR_FALSE),
>   mMultipart(PR_FALSE),
>   mDiscardable(PR_FALSE),
>   mHasSourceData(PR_FALSE),
>@@ -203,16 +205,17 @@ RasterImage::RasterImage(imgStatusTracke
>   mHasBeenDecoded(PR_FALSE),
>   mWorkerPending(PR_FALSE),
>   mInDecoder(PR_FALSE),
>   mAnimationFinished(PR_FALSE)
> {
>   // Set up the discard tracker node.
>   mDiscardTrackerNode.curr = this;
>   mDiscardTrackerNode.prev = mDiscardTrackerNode.next = nsnull;
>+  Telemetry::GetHistogramById(Telemetry::IMAGE_DECODE_COUNT)->Add(mDecodeCount);

This was kind of confusing to me.  Maybe Add(0) would be more clear?

> //******************************************************************************
> RasterImage::~RasterImage()
> {
>@@ -240,16 +243,17 @@ RasterImage::~RasterImage()
>   DiscardTracker::Remove(&mDiscardTrackerNode);
> 
>   // If we have a decoder open, shut it down
>   if (mDecoder) {
>     nsresult rv = ShutdownDecoder(eShutdownIntent_Interrupted);
>     if (NS_FAILED(rv))
>       NS_WARNING("Failed to shut down decoder in destructor!");
>   }
>+  Telemetry::GetHistogramById(Telemetry::IMAGE_DECODE_COUNT)->Subtract(mDecodeCount);

Hm, is this right?  If we never decode the image, we'll Add(0) in the constructor, but then we shouldn't ever Subtract(0).  Or am I misunderstanding?
Attachment #557696 - Flags: review?(justin.lebar+bug) → review-
(In reply to Justin Lebar [:jlebar] from comment #3)
> Sid, is the policy to do a privacy review on every new histogram?  This one
> seems obviously fine to me, but marking privacy-review-needed just in case.

Thanks... we should just do a quick once-over on new histograms, nothing too in depth for most.  Can you explain what's being measured in more detail?  Are you measuring time it takes to decode *any* image, or specific images?  Are we collecting information about specific images in a way that will tell us what specific image files users have loaded?
When a page contains a compressed image (e.g. a jpeg), we have to decode the jpeg before we can display it onscreen.  But those decoded images take up a lot more memory than the compressed images do, so we throw the decoded images away after the user switches away from the tab containing the images.  When the user switches back to the tab, we re-decode the images.

This histogram measures the distribution over "how many times did we decode this image?".  So histogram values of 0 -> 100, 1 -> 200, 2 -> 50 would indicate that we never decoded 100 images, that we decoded 200 images once, and that we decoded 50 images twice.
Okay, so if there's nothing being transmitted about individual images (hashes of the URL fingerprints or something), there doesn't seem to be anything privacy-risky here.  If I understand correctly you're, tallying how many images were decoded in each of the decoding passes.
> Okay, so if there's nothing being transmitted about individual images (hashes of the URL 
> fingerprints or something)

Correct, none of that stuff.

> If I understand correctly you're, tallying how many images were decoded in each of the decoding 
> passes.

We're tallying how many times a compressed image was thrown away and then re-decoded. (Plus whether it was ever decoded in the first place.)
(a bit of "teach noob sid via IRC" happened since the last comment). 

I now understand what we're measuring: the distribution of "reincarnations" of images.  

If there's one image that's incredibly common on one type of site and it gets re-decoded far more frequently than all others (like a hypothetical tracking beacon present on on most porn sites) we could discern from your histogram whether or not you've encountered this "most interesting" image and thus, whether or not you're spending time on that segment of the web.  

Not sure if that's a *giant* privacy risk (it is hypothetical), but we're not intending to collect that information, but are nevertheless learning it via this histogram.

What are we hoping to learn/fix by measuring this?
(Assignee)

Comment 10

6 years ago
(In reply to Sid Stamm [:geekboy] from comment #9)
> (a bit of "teach noob sid via IRC" happened since the last comment). 
> 
> I now understand what we're measuring: the distribution of "reincarnations"
> of images. 

It depends what you mean by "reincarnations". The decision to discard or decode is largely not controllable by content. i.e. there's no reason we need to redecode when loading a new a page.

> If there's one image that's incredibly common on one type of site and it
> gets re-decoded far more frequently than all others (like a hypothetical
> tracking beacon present on on most porn sites) we could discern from your
> histogram whether or not you've encountered this "most interesting" image
> and thus, whether or not you're spending time on that segment of the web.  

I think this would be very difficult to pull off. First, to pull this specific image out of the noise of other images stored in the histogram would be tricky. Second, there's no direct correlation between how frequent an image is and how often we redecode it. 

> Not sure if that's a *giant* privacy risk (it is hypothetical), but we're
> not intending to collect that information, but are nevertheless learning it
> via this histogram.
> 
> What are we hoping to learn/fix by measuring this?

We want to track if were decoding images the right amount. i.e we can tune things so we redecode too much or not enough. This also helps show if there are a lot of images that don't need to be decoded at all.
(Assignee)

Comment 11

6 years ago
(In reply to Justin Lebar [:jlebar] from comment #4)
> >   mDiscardTrackerNode.curr = this;
> >   mDiscardTrackerNode.prev = mDiscardTrackerNode.next = nsnull;
> >+  Telemetry::GetHistogramById(Telemetry::IMAGE_DECODE_COUNT)->Add(mDecodeCount);
> 
> This was kind of confusing to me.  Maybe Add(0) would be more clear?

I can switch to Add(0).


> >+  Telemetry::GetHistogramById(Telemetry::IMAGE_DECODE_COUNT)->Subtract(mDecodeCount);
> 
> Hm, is this right?  If we never decode the image, we'll Add(0) in the
> constructor, but then we shouldn't ever Subtract(0).  Or am I
> misunderstanding?


Yeah, this hunk should just go. It was from a previous version of this statistic that only tracked live images. For now, I'd like to keep the information from dead images around.
(Assignee)

Comment 12

6 years ago
Created attachment 558477 [details] [diff] [review]
Keep a histogram of how often we've decoded images v2
Assignee: nobody → jmuizelaar
Attachment #557696 - Attachment is obsolete: true
Attachment #558477 - Flags: review?(justin.lebar+bug)
> We want to track if were decoding images the right amount. i.e we can tune things so we redecode too 
> much or not enough.

Sid suggested that a histogram showing the amount of time between a tab being unfocused and refocused might be more useful for tuning this.

> This also helps show if there are a lot of images that don't need to be decoded at all.

Well, it shows us if there are a lot of images that we *don't* decode, which isn't the same thing as showing us that there are a lot of images that we *could choose* not to decode.  IOW, it's good for seeing whether changes we made to become lazier about decoding images had an impact, but it's not good for showing us whether we need to become lazier.
Comment on attachment 558477 [details] [diff] [review]
Keep a histogram of how often we've decoded images v2

r=me; the code looks right, and there's at least a plausible use-case for this data.  But I think we should look into tab focused/unfocused timings, as those might be more useful for tuning the discard timer.
Attachment #558477 - Flags: review?(justin.lebar+bug) → review+
(Assignee)

Updated

6 years ago
Blocks: 683286
I agree with what :jlebar said in comment 13 and comment 14.  I there's no serious privacy risk here (removing privacy-review-needed keyword), but have a hunch there might be other, more helpful measurements that can solve our problems.
Keywords: privacy-review-needed

Comment 16

6 years ago
Comment on attachment 557693 [details] [diff] [review]
Add the ability to subtract from histograms

># HG changeset patch
># Parent 3801b68949daee91c886c2e3e0a4f1d4a8a82b96
>Bug 684091. Add the ability to subtract values from Histograms
>
>diff --git a/ipc/chromium/src/base/histogram.cc b/ipc/chromium/src/base/histogram.cc
>--- a/ipc/chromium/src/base/histogram.cc
>+++ b/ipc/chromium/src/base/histogram.cc
>@@ -125,16 +125,27 @@ void Histogram::Add(int value) {
>   if (value < 0)
>     value = 0;
>   size_t index = BucketIndex(value);
>   DCHECK_GE(value, ranges(index));
>   DCHECK_LT(value, ranges(index + 1));
>   Accumulate(value, 1, index);
> }
> 
>+void Histogram::Subtract(int value) {
>+  if (value > kSampleType_MAX - 1)
>+    value = kSampleType_MAX - 1;
>+  if (value < 0)
>+    value = 0;
>+  size_t index = BucketIndex(value);
>+  DCHECK_GE(value, ranges(index));
>+  DCHECK_LT(value, ranges(index + 1));
>+  Accumulate(value, -1, index);
>+}
>+

I would prefer if instead of copying code here, you stuck the common logic into a int Normalize(int value) utility function

However it's a matter of taste, so r+ either way. 

Sorry for the slow review.
Attachment #557693 - Flags: review?(tglek) → review+
http://hg.mozilla.org/mozilla-central/rev/0f7583869bbd
http://hg.mozilla.org/mozilla-central/rev/eb8a6747f683
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla9

Updated

6 years ago
Blocks: 683284
You need to log in before you can comment on or make changes to this bug.