Last Comment Bug 791765 - telemetry may collect a lot of saved ping files
: telemetry may collect a lot of saved ping files
Status: RESOLVED FIXED
:
Product: Toolkit
Classification: Components
Component: Telemetry (show other bugs)
: Trunk
: x86_64 Windows 8.1
: -- normal (vote)
: mozilla18
Assigned To: Nathan Froyd [:froydnj]
:
Mentors:
Depends on: 792039
Blocks:
  Show dependency treegraph
 
Reported: 2012-09-17 11:18 PDT by Jim Mathies [:jimm]
Modified: 2014-07-24 11:06 PDT (History)
4 users (show)
ryanvm: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed


Attachments
telpings zip (8.87 KB, application/x-zip-compressed)
2012-09-18 09:40 PDT, Jim Mathies [:jimm]
no flags Details
patch (1015 bytes, patch)
2012-09-19 08:55 PDT, Nathan Froyd [:froydnj]
taras.mozilla: review+
bajaj.bhavana: approval‑mozilla‑aurora+
Details | Diff | Review

Description Jim Mathies [:jimm] 2012-09-17 11:18:29 PDT
We appear to be losing a lot of telemetry data assuming these are unsent pings.

On my system after four days of testing I have 157 telemetry pings in the saved-telementry-pings folder. 

Need to debug telemetry / idle service to figure out what these are and if pending, why they are not getting sent.
Comment 1 Nathan Froyd [:froydnj] 2012-09-17 11:28:52 PDT
Guessing this is somewhat similar to bug 790616, if not outright the same bug.

How long do you have this browser open for at a time, out of curiosity?  IIRC, we won't send pings until the browser's been up for ~30 minutes or so, possibly less, so that could easily explain why you have so many.
Comment 2 Jim Mathies [:jimm] 2012-09-17 11:53:24 PDT
(In reply to Nathan Froyd (:froydnj) from comment #1)
> Guessing this is somewhat similar to bug 790616, if not outright the same
> bug.
> 
> How long do you have this browser open for at a time, out of curiosity? 
> IIRC, we won't send pings until the browser's been up for ~30 minutes or so,
> possibly less, so that could easily explain why you have so many.

30 minutes will never get hit. Even if the user is running win8 on a desktop. The odds that the full screen metro browser will be open and idle for 30 minutes are slim to none.

Metro apps are full screen so if they are forefront, they are in use. When they are in the background the process is suspended.

Can we manually trigger this somehow when we know it's ok? Maybe before a suspend?
Comment 3 Jim Mathies [:jimm] 2012-09-17 12:33:42 PDT
I think I'm going to fix this from the idle service side. I might need to tweak telemetry's IDLE_TIMEOUT_SECONDS too.
Comment 4 Jim Mathies [:jimm] 2012-09-17 12:58:25 PDT
It doesn't look like this is a 30 minute wait. The idle service fires an idle-daily once a day, assuming the system is idle. Once the daily timeout expires, idle service waits DAILY_SIGNIFICANT_IDLE_SERVICE_SEC (5 min.) to fire idle-daily. On the telemetry side, we receive idle-daily, trigger an accumulate and wait IDLE_TIMEOUT_SECONDS (5 min.) before sending pings.

So we have to reach the daily timeout (not an issue) and then stay idle for ten minutes. I'm going to modify DAILY_SIGNIFICANT_IDLE_SERVICE_SEC for metro so that it is less, maybe 1 minute. I'm also probably going to want to shrink IDLE_TIMEOUT_SECONDS to around 2 minutes, assuming that isn't going to be an issue.

My tablet is designed to sleep (default setting) after seven minutes when idle. So if we can fit this all into about three minutes, we'll likely get better results.

However, I am still concerned about ping file accumulation. 160 pings is pretty bad and could happen even with the above changes. From looking at the code, it looks like we read these all in on startup? That seems to be a pretty bad scenario.
Comment 5 Jim Mathies [:jimm] 2012-09-17 20:48:39 PDT
So one of the problems here is that I'm working with local builds a lot. Local builds aren't official, so they don't send telemetry data. But for some reason ping files are still saved out from my sessions. These files, afaict, can never be sent and aren't cleared. I think we need a Telemetry.canSend check in the ping saving code similar to what we do before calling send().
Comment 6 (dormant account) 2012-09-18 04:27:54 PDT
(In reply to Nathan Froyd (:froydnj) from comment #1)
> Guessing this is somewhat similar to bug 790616, if not outright the same
> bug.
> 
> How long do you have this browser open for at a time, out of curiosity? 
> IIRC, we won't send pings until the browser's been up for ~30 minutes or so,
> possibly less, so that could easily explain why you have so many.

Nathan,
I've kept my browser up for multiple days. TELEMETRY_SUCCESS histogram indicates I sent 2 pings successfully,yet i have 78 pings in my saved directory. Something is very wrong. Oldest one is dated Aug 29.
Comment 7 Jim Mathies [:jimm] 2012-09-18 07:00:46 PDT
Here's another twist to the story - on forced shutdown, I see the following:

shuting down due to close gesture.
[JavaScript Error: "[Exception... "Component returned failure code: 0x80520008 (NS_ERROR_FILE_ALREADY_EXISTS) [nsIFileOutputStream.write]"  nsresult: "0x80520008 (NS_ERROR_FILE_ALREADY_EXISTS)"  location: "JS frame :: file:///T:/Mozilla/ELM-REL/dist/bin/components/TelemetryPing.js :: savePingToFile :: line 845"  data: no]" {file: "file:///T:/Mozilla/ELM-REL/dist/bin/components/TelemetryPing.js" line: 845}]
[71A5390] MetroWidget::Destroy mWnd=2505C2 type=0
MetroAppShell::Exit
mozilla::widget::winrt::MetroApp::ShutdownXPCOM: IsMainThread:1 ThreadId:7F8
Comment 8 Jim Mathies [:jimm] 2012-09-18 07:12:57 PDT
I've managed to force telemetry reporting on in a local build, and the ping files still stick around. From looking at TelemetryPing.js, I don't think we ever delete these files after they are created.
Comment 9 Nathan Froyd [:froydnj] 2012-09-18 07:15:57 PDT
The files should be written conditionally on Telemetry.canSend, yes.  I'll file a separate bug for that.

We don't delete the ping files unless they are sent successfully.  There is an open bug (bug 783005) on deleting them after N days...the question is why aren't they getting sent in the first place.

The error in comment 7 is something else that needs poking at; I will do that.
Comment 10 Nathan Froyd [:froydnj] 2012-09-18 09:17:01 PDT
(In reply to Jim Mathies [:jimm] from comment #7)
> Here's another twist to the story - on forced shutdown, I see the following:
> 
> shuting down due to close gesture.
> [JavaScript Error: "[Exception... "Component returned failure code:
> 0x80520008 (NS_ERROR_FILE_ALREADY_EXISTS) [nsIFileOutputStream.write]" 
> nsresult: "0x80520008 (NS_ERROR_FILE_ALREADY_EXISTS)"  location: "JS frame
> :: file:///T:/Mozilla/ELM-REL/dist/bin/components/TelemetryPing.js ::
> savePingToFile :: line 845"  data: no]" {file:
> "file:///T:/Mozilla/ELM-REL/dist/bin/components/TelemetryPing.js" line: 845}]
> [71A5390] MetroWidget::Destroy mWnd=2505C2 type=0
> MetroAppShell::Exit
> mozilla::widget::winrt::MetroApp::ShutdownXPCOM: IsMainThread:1 ThreadId:7F8

This error seems sort of bizarre; I don't see where writing would trigger this sort of error.  Can you set breakpoints in nsLocalFileWin.cpp lines 1302 (returning already-exists for directories) and 500 (converting windows errors to NS_ERROR_FILE_ALREADY_EXISTS) and provide stacks if either of those get hit (hopefully TelemetryPing.js is the only client that hits them)?
Comment 11 Jim Mathies [:jimm] 2012-09-18 09:40:06 PDT
code added below CreateFile:

wchar_t buf[2048];
memset(buf, 0, sizeof(buf));
wsprintf(buf, L"[%d] %s\n",  GetLastError(), name.get());
OutputDebugStringW(buf);

output:

[0] C:\Users\jim\AppData\Roaming\Mozilla\MetroFirefox\Profiles\xolcmrbe.default\saved-telemetry-pings\91234ff6-4997-4521-8803-ea0b8bff379c

[80] C:\Users\jim\AppData\Roaming\Mozilla\MetroFirefox\Profiles\xolcmrbe.default\saved-telemetry-pings\26646894-a3cc-4266-9b2c-34a077064724

[JavaScript Error: "[Exception... "Component returned failure code: 0x80520008 (NS_ERROR_FILE_ALREADY_EXISTS) [nsIFileOutputStream.write]"  nsresult: "0x80520008 (NS_ERROR_FILE_ALREADY_EXISTS)"  location: "JS frame :: file:///T:/Mozilla/ELM-REL/dist/bin/components/TelemetryPing.js :: savePingToFile :: line 842"  data: no]" {file: "file:///T:/Mozilla/ELM-REL/dist/bin/components/TelemetryPing.js" line: 842}]
Comment 12 Jim Mathies [:jimm] 2012-09-18 09:40:27 PDT
Created attachment 662198 [details]
telpings zip
Comment 13 Jim Mathies [:jimm] 2012-09-18 09:42:13 PDT
I can add additional debug output to TelementryPing.js, just let me know. dump statements are hooked up to my remote debugger.
Comment 14 Nathan Froyd [:froydnj] 2012-09-18 09:52:02 PDT
OK, I think what's happening here is that we expect the output stream initialization to complain:

http://dxr.lanedo.com/mozilla-central/toolkit/components/telemetry/TelemetryPing.js.html#l798

but we are using DEFER_OPEN when initializing the stream, which means that the file doesn't get opened until the first operation on it that requires it to be open.

DEFER_OPEN is nice for async writing, but we never use the async writing, except for testing purposes.  Maybe we should just ditch those bits.

That explains the error, but not the multitude of ping files.
Comment 15 Jim Mathies [:jimm] 2012-09-18 09:58:21 PDT
> That explains the error, but not the multitude of ping files.

Isn't this just bug 792039?

Does the ping file format have something in it that would identify whether it was written by an official build vs. a local one?
Comment 16 Jim Mathies [:jimm] 2012-09-18 10:00:11 PDT
Note, in the local build I'm currently testing in, every time I open and close a new file is added. This probably expains why I collect so many, I open and shut the browser *a lot* while debugging. :)
Comment 17 Nathan Froyd [:froydnj] 2012-09-18 10:01:18 PDT
(In reply to Jim Mathies [:jimm] from comment #15)
> > That explains the error, but not the multitude of ping files.
> 
> Isn't this just bug 792039?

Maybe.  I was assuming that, e.g. Taras's build in comment 6 was an official one.  If official builds are writing a multitude of ping files, there's still something else at play.

> Does the ping file format have something in it that would identify whether
> it was written by an official build vs. a local one?

No.

(In reply to Jim Mathies [:jimm] from comment #16)
> Note, in the local build I'm currently testing in, every time I open and
> close a new file is added. This probably expains why I collect so many, I
> open and shut the browser *a lot* while debugging. :)

Yup! :)
Comment 18 Nathan Froyd [:froydnj] 2012-09-19 08:55:00 PDT
Created attachment 662578 [details] [diff] [review]
patch

So of course, at one point, we had planned to have an array of pending pings.  Except that, for some reason, we would only ever send off the first ping in that list...and that never got changed.  So every time we send off an official ping, we'd only send off one ping on the pending list.  Which, depending on how often you open and close your browser, means that you could easily accumulate a backlog of unsent pings.

Obvious fix: loop over all the pending pings instead of just the first.
Comment 19 (dormant account) 2012-09-20 00:14:10 PDT
Comment on attachment 662578 [details] [diff] [review]
patch

this needs to be uplifted.
Comment 20 (dormant account) 2012-09-20 00:15:24 PDT
The code fix looks good however. I'm still not sure if this will fix my problem since it looks like only one ping is sent when I force a ping...which is probably the idle-daily ping.

Guess I'll test when this lands.
Comment 21 Jim Mathies [:jimm] 2012-09-20 02:15:50 PDT
Is it an issue that we might be uplifting lots of local build ping data? Say for example if I share a profile between local builds and release builds, the pings stored by local builds get saved and sent by release builds. This might be best handled in
Comment 22 Jim Mathies [:jimm] 2012-09-20 02:17:15 PDT
(In reply to Jim Mathies [:jimm] from comment #21)
> Is it an issue that we might be uplifting lots of local build ping data? Say
> for example if I share a profile between local builds and release builds,
> the pings stored by local builds get saved and sent by release builds. This
> might be best handled in... 
bug 792039.
Comment 24 Ryan VanderMeulen [:RyanVM] 2012-10-01 19:00:19 PDT
https://hg.mozilla.org/mozilla-central/rev/dc17abff955a

Should this have a test?
Comment 25 Nathan Froyd [:froydnj] 2012-10-05 09:04:40 PDT
Comment on attachment 662578 [details] [diff] [review]
patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Various changes made to telemetry.
User impact if declined: Saved telemetry ping files may accumulate without limit in a user's profile directory, taking up disk space and presenting a privacy risk.
Testing completed (on m-c, etc.): On m-c for ~1 week.
Risk to taking this patch (and alternatives if risky): Very low. 
String or UUID changes made by this patch: None.
Comment 26 bhavana bajaj [:bajaj] 2012-10-05 15:45:56 PDT
 Approved for aurora . Please land before monday oct 8th merge.

Note You need to log in before you can comment on or make changes to this bug.