telemetry may collect a lot of saved ping files

RESOLVED FIXED in Firefox 17

Status

()

Toolkit
Telemetry
RESOLVED FIXED
5 years ago
3 years ago

People

(Reporter: jimm, Assigned: froydnj)

Tracking

Trunk
mozilla18
x86_64
Windows 8.1
Points:
---
Bug Flags:
in-testsuite ?

Firefox Tracking Flags

(firefox17 fixed)

Details

Attachments

(2 attachments)

(Reporter)

Description

5 years ago
We appear to be losing a lot of telemetry data assuming these are unsent pings.

On my system after four days of testing I have 157 telemetry pings in the saved-telementry-pings folder. 

Need to debug telemetry / idle service to figure out what these are and if pending, why they are not getting sent.
(Assignee)

Comment 1

5 years ago
Guessing this is somewhat similar to bug 790616, if not outright the same bug.

How long do you have this browser open for at a time, out of curiosity?  IIRC, we won't send pings until the browser's been up for ~30 minutes or so, possibly less, so that could easily explain why you have so many.
(Reporter)

Comment 2

5 years ago
(In reply to Nathan Froyd (:froydnj) from comment #1)
> Guessing this is somewhat similar to bug 790616, if not outright the same
> bug.
> 
> How long do you have this browser open for at a time, out of curiosity? 
> IIRC, we won't send pings until the browser's been up for ~30 minutes or so,
> possibly less, so that could easily explain why you have so many.

30 minutes will never get hit. Even if the user is running win8 on a desktop. The odds that the full screen metro browser will be open and idle for 30 minutes are slim to none.

Metro apps are full screen so if they are forefront, they are in use. When they are in the background the process is suspended.

Can we manually trigger this somehow when we know it's ok? Maybe before a suspend?
(Reporter)

Comment 3

5 years ago
I think I'm going to fix this from the idle service side. I might need to tweak telemetry's IDLE_TIMEOUT_SECONDS too.
(Reporter)

Comment 4

5 years ago
It doesn't look like this is a 30 minute wait. The idle service fires an idle-daily once a day, assuming the system is idle. Once the daily timeout expires, idle service waits DAILY_SIGNIFICANT_IDLE_SERVICE_SEC (5 min.) to fire idle-daily. On the telemetry side, we receive idle-daily, trigger an accumulate and wait IDLE_TIMEOUT_SECONDS (5 min.) before sending pings.

So we have to reach the daily timeout (not an issue) and then stay idle for ten minutes. I'm going to modify DAILY_SIGNIFICANT_IDLE_SERVICE_SEC for metro so that it is less, maybe 1 minute. I'm also probably going to want to shrink IDLE_TIMEOUT_SECONDS to around 2 minutes, assuming that isn't going to be an issue.

My tablet is designed to sleep (default setting) after seven minutes when idle. So if we can fit this all into about three minutes, we'll likely get better results.

However, I am still concerned about ping file accumulation. 160 pings is pretty bad and could happen even with the above changes. From looking at the code, it looks like we read these all in on startup? That seems to be a pretty bad scenario.
(Reporter)

Comment 5

5 years ago
So one of the problems here is that I'm working with local builds a lot. Local builds aren't official, so they don't send telemetry data. But for some reason ping files are still saved out from my sessions. These files, afaict, can never be sent and aren't cleared. I think we need a Telemetry.canSend check in the ping saving code similar to what we do before calling send().
(Reporter)

Updated

5 years ago
Whiteboard: metro-preview

Comment 6

5 years ago
(In reply to Nathan Froyd (:froydnj) from comment #1)
> Guessing this is somewhat similar to bug 790616, if not outright the same
> bug.
> 
> How long do you have this browser open for at a time, out of curiosity? 
> IIRC, we won't send pings until the browser's been up for ~30 minutes or so,
> possibly less, so that could easily explain why you have so many.

Nathan,
I've kept my browser up for multiple days. TELEMETRY_SUCCESS histogram indicates I sent 2 pings successfully,yet i have 78 pings in my saved directory. Something is very wrong. Oldest one is dated Aug 29.
(Reporter)

Comment 7

5 years ago
Here's another twist to the story - on forced shutdown, I see the following:

shuting down due to close gesture.
[JavaScript Error: "[Exception... "Component returned failure code: 0x80520008 (NS_ERROR_FILE_ALREADY_EXISTS) [nsIFileOutputStream.write]"  nsresult: "0x80520008 (NS_ERROR_FILE_ALREADY_EXISTS)"  location: "JS frame :: file:///T:/Mozilla/ELM-REL/dist/bin/components/TelemetryPing.js :: savePingToFile :: line 845"  data: no]" {file: "file:///T:/Mozilla/ELM-REL/dist/bin/components/TelemetryPing.js" line: 845}]
[71A5390] MetroWidget::Destroy mWnd=2505C2 type=0
MetroAppShell::Exit
mozilla::widget::winrt::MetroApp::ShutdownXPCOM: IsMainThread:1 ThreadId:7F8
(Reporter)

Comment 8

5 years ago
I've managed to force telemetry reporting on in a local build, and the ping files still stick around. From looking at TelemetryPing.js, I don't think we ever delete these files after they are created.
(Assignee)

Comment 9

5 years ago
The files should be written conditionally on Telemetry.canSend, yes.  I'll file a separate bug for that.

We don't delete the ping files unless they are sent successfully.  There is an open bug (bug 783005) on deleting them after N days...the question is why aren't they getting sent in the first place.

The error in comment 7 is something else that needs poking at; I will do that.
(Assignee)

Updated

5 years ago
Depends on: 792039
(Assignee)

Comment 10

5 years ago
(In reply to Jim Mathies [:jimm] from comment #7)
> Here's another twist to the story - on forced shutdown, I see the following:
> 
> shuting down due to close gesture.
> [JavaScript Error: "[Exception... "Component returned failure code:
> 0x80520008 (NS_ERROR_FILE_ALREADY_EXISTS) [nsIFileOutputStream.write]" 
> nsresult: "0x80520008 (NS_ERROR_FILE_ALREADY_EXISTS)"  location: "JS frame
> :: file:///T:/Mozilla/ELM-REL/dist/bin/components/TelemetryPing.js ::
> savePingToFile :: line 845"  data: no]" {file:
> "file:///T:/Mozilla/ELM-REL/dist/bin/components/TelemetryPing.js" line: 845}]
> [71A5390] MetroWidget::Destroy mWnd=2505C2 type=0
> MetroAppShell::Exit
> mozilla::widget::winrt::MetroApp::ShutdownXPCOM: IsMainThread:1 ThreadId:7F8

This error seems sort of bizarre; I don't see where writing would trigger this sort of error.  Can you set breakpoints in nsLocalFileWin.cpp lines 1302 (returning already-exists for directories) and 500 (converting windows errors to NS_ERROR_FILE_ALREADY_EXISTS) and provide stacks if either of those get hit (hopefully TelemetryPing.js is the only client that hits them)?
(Reporter)

Comment 11

5 years ago
code added below CreateFile:

wchar_t buf[2048];
memset(buf, 0, sizeof(buf));
wsprintf(buf, L"[%d] %s\n",  GetLastError(), name.get());
OutputDebugStringW(buf);

output:

[0] C:\Users\jim\AppData\Roaming\Mozilla\MetroFirefox\Profiles\xolcmrbe.default\saved-telemetry-pings\91234ff6-4997-4521-8803-ea0b8bff379c

[80] C:\Users\jim\AppData\Roaming\Mozilla\MetroFirefox\Profiles\xolcmrbe.default\saved-telemetry-pings\26646894-a3cc-4266-9b2c-34a077064724

[JavaScript Error: "[Exception... "Component returned failure code: 0x80520008 (NS_ERROR_FILE_ALREADY_EXISTS) [nsIFileOutputStream.write]"  nsresult: "0x80520008 (NS_ERROR_FILE_ALREADY_EXISTS)"  location: "JS frame :: file:///T:/Mozilla/ELM-REL/dist/bin/components/TelemetryPing.js :: savePingToFile :: line 842"  data: no]" {file: "file:///T:/Mozilla/ELM-REL/dist/bin/components/TelemetryPing.js" line: 842}]
(Reporter)

Comment 12

5 years ago
Created attachment 662198 [details]
telpings zip
(Reporter)

Comment 13

5 years ago
I can add additional debug output to TelementryPing.js, just let me know. dump statements are hooked up to my remote debugger.
(Assignee)

Comment 14

5 years ago
OK, I think what's happening here is that we expect the output stream initialization to complain:

http://dxr.lanedo.com/mozilla-central/toolkit/components/telemetry/TelemetryPing.js.html#l798

but we are using DEFER_OPEN when initializing the stream, which means that the file doesn't get opened until the first operation on it that requires it to be open.

DEFER_OPEN is nice for async writing, but we never use the async writing, except for testing purposes.  Maybe we should just ditch those bits.

That explains the error, but not the multitude of ping files.
(Reporter)

Comment 15

5 years ago
> That explains the error, but not the multitude of ping files.

Isn't this just bug 792039?

Does the ping file format have something in it that would identify whether it was written by an official build vs. a local one?
(Reporter)

Comment 16

5 years ago
Note, in the local build I'm currently testing in, every time I open and close a new file is added. This probably expains why I collect so many, I open and shut the browser *a lot* while debugging. :)
(Assignee)

Comment 17

5 years ago
(In reply to Jim Mathies [:jimm] from comment #15)
> > That explains the error, but not the multitude of ping files.
> 
> Isn't this just bug 792039?

Maybe.  I was assuming that, e.g. Taras's build in comment 6 was an official one.  If official builds are writing a multitude of ping files, there's still something else at play.

> Does the ping file format have something in it that would identify whether
> it was written by an official build vs. a local one?

No.

(In reply to Jim Mathies [:jimm] from comment #16)
> Note, in the local build I'm currently testing in, every time I open and
> close a new file is added. This probably expains why I collect so many, I
> open and shut the browser *a lot* while debugging. :)

Yup! :)
(Assignee)

Comment 18

5 years ago
Created attachment 662578 [details] [diff] [review]
patch

So of course, at one point, we had planned to have an array of pending pings.  Except that, for some reason, we would only ever send off the first ping in that list...and that never got changed.  So every time we send off an official ping, we'd only send off one ping on the pending list.  Which, depending on how often you open and close your browser, means that you could easily accumulate a backlog of unsent pings.

Obvious fix: loop over all the pending pings instead of just the first.
Attachment #662578 - Flags: review?(taras.mozilla)

Comment 19

5 years ago
Comment on attachment 662578 [details] [diff] [review]
patch

this needs to be uplifted.
Attachment #662578 - Flags: review?(taras.mozilla) → review+

Comment 20

5 years ago
The code fix looks good however. I'm still not sure if this will fix my problem since it looks like only one ping is sent when I force a ping...which is probably the idle-daily ping.

Guess I'll test when this lands.
(Reporter)

Comment 21

5 years ago
Is it an issue that we might be uplifting lots of local build ping data? Say for example if I share a profile between local builds and release builds, the pings stored by local builds get saved and sent by release builds. This might be best handled in
Assignee: nobody → nfroyd
(Reporter)

Updated

5 years ago
Summary: metrofx collects a lot of saved ping files → telemetry may collect a lot of saved ping files
(Reporter)

Comment 22

5 years ago
(In reply to Jim Mathies [:jimm] from comment #21)
> Is it an issue that we might be uplifting lots of local build ping data? Say
> for example if I share a profile between local builds and release builds,
> the pings stored by local builds get saved and sent by release builds. This
> might be best handled in... 
bug 792039.
(Assignee)

Comment 23

5 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/dc17abff955a
Status: NEW → ASSIGNED
https://hg.mozilla.org/mozilla-central/rev/dc17abff955a

Should this have a test?
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
(Assignee)

Comment 25

5 years ago
Comment on attachment 662578 [details] [diff] [review]
patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Various changes made to telemetry.
User impact if declined: Saved telemetry ping files may accumulate without limit in a user's profile directory, taking up disk space and presenting a privacy risk.
Testing completed (on m-c, etc.): On m-c for ~1 week.
Risk to taking this patch (and alternatives if risky): Very low. 
String or UUID changes made by this patch: None.
Attachment #662578 - Flags: approval-mozilla-aurora?

Updated

5 years ago
Attachment #662578 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
 Approved for aurora . Please land before monday oct 8th merge.
https://hg.mozilla.org/releases/mozilla-aurora/rev/a75e458e617f
status-firefox17: --- → fixed
OS: Windows 8 Metro → Windows 8.1
You need to log in before you can comment on or make changes to this bug.