Closed Bug 707320 Opened 13 years ago Closed 13 years ago

Allow histograms to persist to disk

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla13

Tracking Flags:

Tracking

Status

firefox12

unaffected

firefox13

---

fixed

People

(Reporter: taras.mozilla, Assigned: froydnj)

References

(Blocks 1 open bug)

Details

(Whiteboard: [qa-])

Attachments

(5 files, 13 obsolete files)

0. provide reflection for separate Histograms and SampleSets 13 years ago Nathan Froyd [:froydnj] 1.21 KB, patch	taras.mozilla : review+	Details \| Diff \| Splinter Review
1. C++ interface bits for saving histograms to disk 13 years ago Nathan Froyd [:froydnj] 6.52 KB, patch		Details \| Diff \| Splinter Review
1. C++ interface bits for saving histograms to disk 13 years ago Nathan Froyd [:froydnj] 10.98 KB, patch	taras.mozilla : review-	Details \| Diff \| Splinter Review
2. JS bits to send previously-saved session bits 13 years ago Nathan Froyd [:froydnj] 6.12 KB, patch	taras.mozilla : review-	Details \| Diff \| Splinter Review
3. Tests for C++ load/save 13 years ago Nathan Froyd [:froydnj] 2.04 KB, patch	taras.mozilla : review+	Details \| Diff \| Splinter Review
1. C++ interface bits for saving histograms to disk, v2 13 years ago Nathan Froyd [:froydnj] 14.53 KB, patch		Details \| Diff \| Splinter Review
2. JS bits to send previously-saved session bits, v2 13 years ago Nathan Froyd [:froydnj] 5.70 KB, patch		Details \| Diff \| Splinter Review
part 1 - provide reflection for separate Histograms and SampleSets 13 years ago Nathan Froyd [:froydnj] 1.23 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
part 2 - C++ bits for saving histograms to disk, v3 13 years ago Nathan Froyd [:froydnj] 11.98 KB, patch	taras.mozilla : review+	Details \| Diff \| Splinter Review
part 3 - JS bits to send previously-saved session bits, v3 13 years ago Nathan Froyd [:froydnj] 7.55 KB, patch	taras.mozilla : review+	Details \| Diff \| Splinter Review
part 4 - tests for C++ load/save 13 years ago Nathan Froyd [:froydnj] 2.02 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
part 2 - C++ bits for saving histograms to disk, v3 13 years ago Nathan Froyd [:froydnj] 13.72 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
part 3 - JS bits to send previously-saved session bits, v4 13 years ago Nathan Froyd [:froydnj] 9.90 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
part 4 - tests for C++ load/save, v4 13 years ago Nathan Froyd [:froydnj] 2.57 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
fixed backout patch 13 years ago Nathan Froyd [:froydnj] 27.59 KB, patch	taras.mozilla : review+ akeybl : approval-mozilla-aurora+ mccr8 : checkin+	Details \| Diff \| Splinter Review
part 2 - C++ bits for saving histograms to disk 13 years ago Nathan Froyd [:froydnj] 13.44 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
part 3 - JS bits to send previously-saved session bits 13 years ago Nathan Froyd [:froydnj] 10.34 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
part 4 - tests for C++ load/save 13 years ago Nathan Froyd [:froydnj] 2.53 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review

(dormant account)

Reporter

Description

•

13 years ago

This way we can track stuff on shutdown.

Nathan Froyd [:froydnj]

Assignee

Comment 1

•

13 years ago

I guess this would block bug 706647; we're not going to want to add a telemetry ping to shutdown to report on how long it took to shutdown plugins?

Blocks: 706647

Nathan Froyd [:froydnj]

Assignee

Updated

•

13 years ago

Blocks: 708674

(dormant account)

Reporter

Comment 2

•

13 years ago

I was talking to Nathan/Sid about this. Seems like the way to do this is to serialize data on shutdown, then on startup: a) if there is no serialized telemetry data, send a ping same as we do now b) if there is serialized data: b1) send serialized data b2) reset UID, wipe all histograms

Keywords: privacy-review-needed

Summary: Allow certain histograms to persist to disk → Allow histograms to persist to disk

Nathan Froyd [:froydnj]

Assignee

Comment 3

•

13 years ago

This is basically what I'm implementing. I'm planning on marking certain histograms as "persistent" so we're not saving several tens of histograms to disk. Then we can send those the data from those histograms as an initial ping after startup, along with the saved UUID. We don't have to worry about combining histograms--I'm planning on just reflecting the serialized data into JS--and we don't have to deal with clearing histograms, either.

(dormant account)

Reporter

Comment 4

•

13 years ago

Nathan, after reflecting on this for a while: what's the benefit of not storing certain histograms on disk? Disk used by histograms should be pretty minimal, we can further reduce it with zlib compression(if needed).

Nathan Froyd [:froydnj]

Assignee

Comment 5

•

13 years ago

Attached patch 0. provide reflection for separate Histograms and SampleSets (obsolete) — Details — Splinter Review

Posting a WIP. This patch, however trivial, is ready for review. This patch is driven by a couple of observations: 1. We're only going to serialize SampleSets at shutdown; 2. We don't want to mix previous-session SampleSets and current-session SampleSets; 3. We want to reflect previous-session information into JS similarly to how histogramSnapshots works now. All of these mean that we need an interface to reflect SampleSets that didn't come from Histogram::SnapshotSample. This patch provides that interface.

Attachment #581294 - Flags: review?(mozilla)

Nathan Froyd [:froydnj]

Assignee

Comment 6

•

13 years ago

Attached patch 1. C++ interface bits for saving histograms to disk (obsolete) — Details — Splinter Review

This patch implements the actual save-and-load histogram bits. It persists all histograms to disk, rather than marking specific histograms as persistent. Back-of-the-envelope calculations suggest that the persistent file is going to be 10s of K, so not too bad. I haven't tested it because I don't have the JavaScript bits done yet; those bits would be the next part. Should be about 50 lines or so. Feedback on the approach welcome; if you think it's good enough for r+, please indicate as such.

Attachment #581296 - Flags: feedback?(mozilla)

(dormant account)

Reporter

Comment 7

•

13 years ago

Comment on attachment 581294 [details] [diff] [review] 0. provide reflection for separate Histograms and SampleSets should ss be const?

Attachment #581294 - Flags: review?(taras.mozilla) → review+

(dormant account)

Reporter

Comment 8

•

13 years ago

Comment on attachment 581296 [details] [diff] [review] 1. C++ interface bits for saving histograms to disk >+ Histogram::SampleSet ss; >+ h->SnapshotSample(&ss); >+ >+ // Write out explicit lengths to improve corruption detection. >+ size_t name_length = strlen(gHistograms[i].id); >+ if (!(pickle.WriteData(gHistograms[i].id, name_length) >+ && ss.Serialize(&pickle))) { >+ return false; >+ } >+ } >+ >+ return true; Don't need an if statement there. Same elsewhere >+} >+ >+bool >+DeserializeHistogramData(AutoFDClose &fd, Pickle &pickle, const char *uuid, >+ void **iter, JSContext *cx, jsval *ret) >+{ >+ size_t n_histograms = ArrayLength(gHistograms); Use HistogramCount here and elsewhere

Attachment #581296 - Flags: feedback?(taras.mozilla)

(dormant account)

Reporter

Comment 9

•

13 years ago

Comment on attachment 581296 [details] [diff] [review] 1. C++ interface bits for saving histograms to disk You should make sure that you either don't serialize empty/non-existent histograms or don't deserialize them

(dormant account)

Reporter

Comment 10

•

13 years ago

Comment on attachment 581296 [details] [diff] [review] 1. C++ interface bits for saving histograms to disk Should also read/write off main thread on the C++ side. ie have the api be read(callback). For writes we can just copy data into a buffer, fire off a thread and forget about it.

(dormant account)

Reporter

Updated

•

13 years ago

Blocks: 711217

Nathan Froyd [:froydnj]

Assignee

Comment 11

•

13 years ago

(In reply to Taras Glek (:taras) from comment #8) > >+ Histogram::SampleSet ss; > >+ h->SnapshotSample(&ss); > >+ > >+ // Write out explicit lengths to improve corruption detection. > >+ size_t name_length = strlen(gHistograms[i].id); > >+ if (!(pickle.WriteData(gHistograms[i].id, name_length) > >+ && ss.Serialize(&pickle))) { > >+ return false; > >+ } > >+ } > >+ > >+ return true; > > Don't need an if statement there. Same elsewhere Why not? We had an error, no sense in writing the rest of the file if there were problems.

Nathan Froyd [:froydnj]

Assignee

Comment 12

•

13 years ago

(In reply to Taras Glek (:taras) from comment #10) > Should also read/write off main thread on the C++ side. ie have the api be > read(callback). For writes we can just copy data into a buffer, fire off a > thread and forget about it. I assume for reads, you mean something like NS_DispatchToCurrentThread? For writes, will there be problems using the JSContext we received on a different thread?

(dormant account)

Reporter

Comment 13

•

13 years ago

(In reply to Nathan Froyd (:froydnj) from comment #11) > (In reply to Taras Glek (:taras) from comment #8) > > >+ Histogram::SampleSet ss; > > >+ h->SnapshotSample(&ss); > > >+ > > >+ // Write out explicit lengths to improve corruption detection. > > >+ size_t name_length = strlen(gHistograms[i].id); > > >+ if (!(pickle.WriteData(gHistograms[i].id, name_length) > > >+ && ss.Serialize(&pickle))) { > > >+ return false; > > >+ } > > >+ } > > >+ > > >+ return true; > > > > Don't need an if statement there. Same elsewhere > > Why not? We had an error, no sense in writing the rest of the file if there > were problems. return pickle.WriteData(gHistograms[i].id, name_length) && ss.Serialize(&pickle); > > I assume for reads, you mean something like NS_DispatchToCurrentThread? For > writes, will there be problems using the JSContext we received on a > different thread? Yes and yes, you should extract data out of js on main thread.

Nathan Froyd [:froydnj]

Assignee

Comment 14

•

13 years ago

(In reply to Taras Glek (:taras) from comment #13) > (In reply to Nathan Froyd (:froydnj) from comment #11) > > Why not? We had an error, no sense in writing the rest of the file if there > > were problems. > > return pickle.WriteData(gHistograms[i].id, name_length) && > ss.Serialize(&pickle); That if statement lives in a loop over all histograms; we don't want to return after writing the first one.

Nathan Froyd [:froydnj]

Assignee

Comment 15

•

13 years ago

Attached patch 1. C++ interface bits for saving histograms to disk (obsolete) — Details — Splinter Review

Totally redid I/O to defer it to a more convenient time. TelemetrySessionData is an XPCOM object for convenience reasons; if we really want to, I suppose it could be turned into a JS object instead.

Attachment #581296 - Attachment is obsolete: true

Attachment #584610 - Flags: review?(taras.mozilla)

Nathan Froyd [:froydnj]

Assignee

Comment 16

•

13 years ago

Attached patch 2. JS bits to send previously-saved session bits (obsolete) — Details — Splinter Review

Hooks TelemetryPing up to the new C++ functionality.

Attachment #584611 - Flags: review?(taras.mozilla)

Nathan Froyd [:froydnj]

Assignee

Comment 17

•

13 years ago

Attached patch 3. Tests for C++ load/save (obsolete) — Details — Splinter Review

Basic tests that callbacks and such work.

Attachment #584613 - Flags: review?(taras.mozilla)

(dormant account)

Reporter

Comment 18

•

13 years ago

Comment on attachment 584610 [details] [diff] [review] 1. C++ interface bits for saving histograms to disk This seems a bit too complicated, but I guess using xpcom to save function references does that. TelemetrySessionData::DeserializeHistogramData needs assertions/comments. Eg it's not clear why we'd have an entry serialized, but not read it in. Don't use new/delete, use nsAutoPtr, nsAutoPtr::forget to transfer ownership. I don't think you need two separate completion interfaces. One interfaces with "saved" "loaded" would do. I think xpcom allows leaving js methods unimplemented (ie they just return some nsresult failure code).

Attachment #584610 - Flags: review?(taras.mozilla) → review-

(dormant account)

Reporter

Comment 19

•

13 years ago

Comment on attachment 584611 [details] [diff] [review] 2. JS bits to send previously-saved session bits This is pretty simple(in a good way). A couple of observations: 1) I think we should set a custom reason string for the persisted data since it should be somewhat different from short-termed idle-daily data. Call it "unpickled" or something. 2) What happens when a) submission for old data fails, b) what if we shutdown before resubmitting data. Might be worth making it more persistent..ie keep last x sessions? r-ing to address 1. Can do 2 in a followup bug.

Attachment #584611 - Flags: review?(taras.mozilla) → review-

(dormant account)

Reporter

Comment 20

•

13 years ago

Comment on attachment 584613 [details] [diff] [review] 3. Tests for C++ load/save I'm not sure if tests can pollute the temp dir. Can you ask on #developers if that's ok? Other than that looks good.

Attachment #584613 - Flags: review?(taras.mozilla) → review+

Nathan Froyd [:froydnj]

Assignee

Comment 21

•

13 years ago

(In reply to Taras Glek (:taras) from comment #18) > I don't think you need two separate completion interfaces. One interfaces > with "saved" "loaded" would do. I think xpcom allows leaving js methods > unimplemented (ie they just return some nsresult failure code). My impression from chatting on #developers about how to do callbacks suggested that the [function] interfaces needed to be single-method so that XPCOM knows which method the callback is meant to implement. We do have an instance or two of multi-method [function] interfaces in the tree, but I don't know how those work or if they work at all (might be dead code?). (In reply to Taras Glek (:taras) from comment #20) > I'm not sure if tests can pollute the temp dir. Can you ask on #developers > if that's ok? The tests for the StartupCache touch the temp dir, so there's at least precedent. The removeFile do_register_cleanup was intended to attempt to clean up after ourselves.

Nathan Froyd [:froydnj]

Assignee

Comment 22

•

13 years ago

(In reply to Taras Glek (:taras) from comment #19) > 1) I think we should set a custom reason string for the persisted data since > it should be somewhat different from short-termed idle-daily data. Call it > "unpickled" or something. AFAICS, the reason parameter to send() is never actually sent; it's just used to determine whether to send to a fixed URL or a session-unique UUID URL. Is that what you were thinking of changing? If so, did you want to include that in the ping payload?

Nathan Froyd [:froydnj]

Assignee

Updated

•

13 years ago

Blocks: 715299

(dormant account)

Reporter

Comment 23

•

13 years ago

(In reply to Nathan Froyd (:froydnj) from comment #22) > (In reply to Taras Glek (:taras) from comment #19) > > 1) I think we should set a custom reason string for the persisted data since > > it should be somewhat different from short-termed idle-daily data. Call it > > "unpickled" or something. > > AFAICS, the reason parameter to send() is never actually sent; it's just > used to determine whether to send to a fixed URL or a session-unique UUID > URL. Is that what you were thinking of changing? If so, did you want to > include that in the ping payload? You are right. http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetryPing.js#210 reason is sent here. Still seems like we should change it for picked packets. re callbacks: It's a minor enough api, existing approach is fine.

Sid Stamm [:geekboy or :sstamm]

Comment 24

•

13 years ago

removing privacy-review-needed keyword as this has been updated in the privacy review for Telemetry. froydnj: is your code still following the procedure outlined in comment 2?

Keywords: privacy-review-needed

Nathan Froyd [:froydnj]

Assignee

Comment 25

•

13 years ago

No, it's not. The scheme now is: On shutdown, save all histograms we've collected and the UUID of the session to disk. When sending a telemetry ping, check for saved data. If there is any, also send a ping with the saved data associated with the old session's UUID.

Sid Stamm [:geekboy or :sstamm]

Comment 26

•

13 years ago

Okay, so that results in two pings if there's saved data. Is the saved data and UUID erased after it is successfully transmitted? -Sid

Nathan Froyd [:froydnj]

Assignee

Comment 27

•

13 years ago

Mmmm, the saved data is erased from memory (insofar as references to it go away), but not from disk. Was going to rely on browser shutdown overwriting saved data with new saved data.

Nathan Froyd [:froydnj]

Assignee

Comment 28

•

13 years ago

Attached patch 1. C++ interface bits for saving histograms to disk, v2 (obsolete) — Details — Splinter Review

New C++ patch, with some comments addressed and a few major changes. Since pickling blasts raw memory out to disk, we need to care about endianness and word size. Therefore, as StartupCache does, we need to make the saved datafile dependent on both of those characteristics. The easiest way to do this is to move the filename bits into C++ and out of JS. I haven't quite figured whether testing is feasible with this change or not (our tests were constructing the saved filename manually before). This version also changes how things are written out to disk. - Version number for the file format, just in case. - Instead of saving all histograms (and possibly creating a bunch in the process), we now take the path of GetHistogramSnapshots and just iterate over the ones that have explicitly collected data. This makes reading somewhat more complicated, of course. This second change also improves the situation we'd find ourselves when reading histograms in. As noted in TelemetrySessionData::ReflectSamples, we need to construct Histograms for every saved histogram we read in, by virtue of needing ranges and whatnot. By only saving the "interesting" histograms, we reduce the chance of constructing new histograms at load time (since histograms that were present in the previous session are most likely present in the current session). I don't know of a good way around this, short of rewriting a fair amount of code.

Attachment #584610 - Attachment is obsolete: true

Attachment #586442 - Flags: review?(taras.mozilla)

Nathan Froyd [:froydnj]

Assignee

Comment 29

•

13 years ago

Attached patch 2. JS bits to send previously-saved session bits, v2 (obsolete) — Details — Splinter Review

Updated JS bits for review comments and for moving filename bits into C++.

Attachment #584611 - Attachment is obsolete: true

Attachment #586444 - Flags: review?(taras.mozilla)

Sid Stamm [:geekboy or :sstamm]

Comment 30

•

13 years ago

(In reply to Nathan Froyd (:froydnj) from comment #27) > Mmmm, the saved data is erased from memory (insofar as references to it go > away), but not from disk. Was going to rely on browser shutdown overwriting > saved data with new saved data. Please delete it from disk after it's successfully sent. This will minimize the chance we'll send it again by accident and minimizes the time data is persisted to disk. Also, is any of the data saved on shutdown merged with new data collected after re-start, or are the sessions (UUID and data) kept completely separate?

Nathan Froyd [:froydnj]

Assignee

Comment 31

•

13 years ago

(In reply to Sid Stamm [:geekboy] from comment #30) > Please delete it from disk after it's successfully sent. This will minimize > the chance we'll send it again by accident and minimizes the time data is > persisted to disk. Can do. > Also, is any of the data saved on shutdown merged with new data collected > after re-start, or are the sessions (UUID and data) kept completely separate? We only persist one session at the moment; there's no merging involved. bug 715299 concerns persisting multiple sessions, if you'd like to go comment prior to investigation on what to watch out for (or if the idea is undesirable from a privacy perspective)

Nathan Froyd [:froydnj]

Assignee

Updated

•

13 years ago

Depends on: 704176

(dormant account)

Reporter

Comment 32

•

13 years ago

Comment on attachment 586442 [details] [diff] [review] 1. C++ interface bits for saving histograms to disk, v2 old patch

Attachment #586442 - Flags: review?(taras.mozilla)

(dormant account)

Reporter

Comment 33

•

13 years ago

Comment on attachment 586444 [details] [diff] [review] 2. JS bits to send previously-saved session bits, v2 old patch

Attachment #586444 - Flags: review?(taras.mozilla)

Nathan Froyd [:froydnj]

Assignee

Comment 34

•

13 years ago

Attached patch part 1 - provide reflection for separate Histograms and SampleSets — Details — Splinter Review

Rebasing against HEAD. Carrying over r+.

Attachment #581294 - Attachment is obsolete: true

Attachment #589269 - Flags: review+

Nathan Froyd [:froydnj]

Assignee

Comment 35

•

13 years ago

Attached patch part 2 - C++ bits for saving histograms to disk, v3 (obsolete) — Details — Splinter Review

Refresh against HEAD. We now drop histograms that we don't recognize on the floor during deserialization--assuming that we can read their data. Partly need to do this so that tests work, but it's a good thing to do in general.

Attachment #586442 - Attachment is obsolete: true

Attachment #589270 - Flags: review?(taras.mozilla)

Nathan Froyd [:froydnj]

Assignee

Comment 36

•

13 years ago

Attached patch part 3 - JS bits to send previously-saved session bits, v3 (obsolete) — Details — Splinter Review

Refresh against HEAD. I moved the wordsize, endian-naming bits into JS; I think it's worthwhile to keep them even with bug 704176 in the tree. Plus it's somewhat less complicated in JS. Doing naming in JS means that it's easier to test, too; less clutter in the nsITelemetry interface.

Attachment #586444 - Attachment is obsolete: true

Attachment #589271 - Flags: review?(taras.mozilla)

Nathan Froyd [:froydnj]

Assignee

Comment 37

•

13 years ago

Attached patch part 4 - tests for C++ load/save (obsolete) — Details — Splinter Review

Refresh against HEAD and with everything else. Carrying over r+.

Attachment #584613 - Attachment is obsolete: true

Attachment #589272 - Flags: review+

(dormant account)

Reporter

Comment 38

•

13 years ago

Comment on attachment 589270 [details] [diff] [review] part 2 - C++ bits for saving histograms to disk, v3 This looks good. idl docs should mention writing/reading occurs on a different thread.

(dormant account)

Reporter

Updated

•

13 years ago

Attachment #589270 - Flags: review?(taras.mozilla) → review+

(dormant account)

Reporter

Updated

•

13 years ago

Attachment #589271 - Flags: review?(taras.mozilla) → review+

(dormant account)

Reporter

Comment 39

•

13 years ago

Having r+ed above, I worry that this creates a race condition(ie if we quit before we get to the async save thread). I think saveHistograms should have a sync parameter which can be set when calling it on shutdown. I think we should also do async saves periodically in case the browser crashes(ie while polling for memory usage, we can also trigger a save once an hour).

Nathan Froyd [:froydnj]

Assignee

Updated

•

13 years ago

Blocks: 719167

Nathan Froyd [:froydnj]

Assignee

Updated

•

13 years ago

Blocks: 719437

Nathan Froyd [:froydnj]

Assignee

Comment 40

•

13 years ago

Attached patch part 2 - C++ bits for saving histograms to disk, v3 (obsolete) — Details — Splinter Review

Adding isSynchronous flag to saveHistograms and tweaking IDL comments, as requested. Nothing major, so carrying over r+.

Attachment #589270 - Attachment is obsolete: true

Attachment #589901 - Flags: review+

Nathan Froyd [:froydnj]

Assignee

Comment 41

•

13 years ago

Attached patch part 3 - JS bits to send previously-saved session bits, v4 (obsolete) — Details — Splinter Review

Synchronously save histograms at quit-application-granted. Carrying over r+.

Attachment #589271 - Attachment is obsolete: true

Attachment #589902 - Flags: review+

Nathan Froyd [:froydnj]

Assignee

Comment 42

•

13 years ago

Attached patch part 4 - tests for C++ load/save, v4 (obsolete) — Details — Splinter Review

Check for the existence of the temporary file before deleting it. Nothing big, carrying over r+.

Attachment #589272 - Attachment is obsolete: true

Attachment #589903 - Flags: review+

Nathan Froyd [:froydnj]

Assignee

Updated

•

13 years ago

Keywords: checkin-needed

Andrew McCreight [:mccr8] (PTO until 12-2)

Comment 43

•

13 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/5a315a55ea7e https://hg.mozilla.org/integration/mozilla-inbound/rev/6c2b90a11ea8 https://hg.mozilla.org/integration/mozilla-inbound/rev/5c40a413d9a9 https://hg.mozilla.org/integration/mozilla-inbound/rev/51bffa83d14f

Keywords: checkin-needed

OS: Windows 7 → All

Hardware: x86 → All

Target Milestone: --- → mozilla12

Version: unspecified → Trunk

Marco Bonardo [:mak]

Comment 44

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/5a315a55ea7e https://hg.mozilla.org/mozilla-central/rev/6c2b90a11ea8 https://hg.mozilla.org/mozilla-central/rev/5c40a413d9a9 https://hg.mozilla.org/mozilla-central/rev/51bffa83d14f

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Nathan Froyd [:froydnj]

Assignee

Updated

•

13 years ago

No longer blocks: 706647

Justin Lebar (not reading bugmail)

Updated

•

13 years ago

Blocks: 722545

Scoobidiver (away)

Updated

•

13 years ago

No longer blocks: 722545

Depends on: 722545

Justin Lebar (not reading bugmail)

Comment 45

•

13 years ago

Backing this out due to startup crashes (particularly, bug 722545).

Depends on: 722625

Justin Lebar (not reading bugmail)

Comment 46

•

13 years ago

Backed out from m-c before the uplift to FF13. But this backout may not be part of the uplift, depending on where we branch. So we'll have to keep an eye on it. https://hg.mozilla.org/mozilla-central/rev/d74a924a149b

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Ed Morley [:emorley]

Comment 47

•

13 years ago

(In reply to Justin Lebar [:jlebar] from comment #46) > Backed out from m-c before the uplift to FF13. But this backout may not be > part of the uplift, depending on where we branch. So we'll have to keep an > eye on it. > > https://hg.mozilla.org/mozilla-central/rev/d74a924a149b This backout has been undone on m-c since it caused M-oth orange: https://tbpl.mozilla.org/?rev=d74a924a149b https://hg.mozilla.org/mozilla-central/rev/974b0efab5c8 akeybl is going to revert the backout on aurora as well, since d74a924a149b (the backout) made it there, but 974b0efab5c8 didn't. The startup crashes in comment 45 therefore will need to be resolved on both m-c and aurora.

status-firefox12: --- → fixed

status-firefox13: --- → fixed

tracking-firefox12: --- → ?

Nathan Froyd [:froydnj]

Assignee

Comment 48

•

13 years ago

Attached patch fixed backout patch — Details — Splinter Review

Not quite sure how to handle this, but here's a patch that correctly backs out these patches on trunk. It's not a straight revision because TelemetryPing.getSessionPayloadAndSlug is also used in the observe method. TBPL is here: https://tbpl.mozilla.org/?tree=Try&rev=bd4b87fee4f4 M-oth contains the bits that wind up calling getSessionPayloadAndSlug (and were turned orange on the first backout), and the xpcshell tests hit all the Telemetry tests.

Attachment #593599 - Flags: review?(taras.mozilla)

(dormant account)

Reporter

Updated

•

13 years ago

Attachment #593599 - Flags: review?(taras.mozilla) → review+

Vladan Djeric (:vladan)

Comment 49

•

13 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/67a219c9fb12

Matthew N. [:MattN]

Updated

•

13 years ago

Blocks: 723802

Mark Reid [:mreid]

Comment 50

•

13 years ago

I just hit an exception in getSessionPayloadAndSlug using Nightly, sync'd from mozilla-central a few days ago: Timestamp: 12-02-03 08:04:09 AM Error: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsITelemetrySessionData.snapshots]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: file:///...objdir.../dist/bin/components/TelemetryPing.js :: getSessionPayloadAndSlug :: line 432" data: no] Source File: file:///...objdir.../dist/bin/components/TelemetryPing.js Line: 432 Here's the code at and around line 432: getSessionPayloadAndSlug: function getSessionPayloadAndSlug(reason) { // Use a deterministic url for testing. let isTestPing = (reason == "test-ping"); let havePreviousSession = !!this._prevSession; let slug = (isTestPing ? reason : (havePreviousSession ? this._prevSession.uuid : this._uuid)); let payloadObj = { ver: PAYLOAD_VERSION, // Send a different reason string for previous session data. info: this.getMetadata(havePreviousSession ? "saved-session" : reason), }; if (havePreviousSession) { >>432>> payloadObj.histograms = this.getHistograms(this._prevSession.snapshots); } else { payloadObj.simpleMeasurements = getSimpleMeasurements(); payloadObj.histograms = this.getHistograms(Telemetry.histogramSnapshots); payloadObj.slowSQL = Telemetry.slowSQL; } return { previous: !!havePreviousSession, slug: slug, payload: JSON.stringify(payloadObj) }; },

Ed Morley [:emorley]

Comment 51

•

13 years ago

(In reply to Vladan Djeric (:vladan) from comment #49) > https://hg.mozilla.org/integration/mozilla-inbound/rev/67a219c9fb12 Merge of backout: https://hg.mozilla.org/mozilla-central/rev/67a219c9fb12

Ed Morley [:emorley]

Updated

•

13 years ago

Target Milestone: mozilla12 → ---

Nathan Froyd [:froydnj]

Assignee

Comment 52

•

13 years ago

Comment on attachment 593599 [details] [diff] [review] fixed backout patch [Approval Request Comment] Regression caused by (bug #): This bug. User impact if declined: Crashes on startup if telemetry is enabled. Testing completed (on m-c, etc.): Patch survived testing on mozilla-inbound, recently checked into mozilla-central. Risk to taking this patch (and alternatives if risky): Epsilon. String changes made by this patch: None.

Attachment #593599 - Flags: approval-mozilla-aurora?

Alex Keybl [:akeybl]

Comment 53

•

13 years ago

Comment on attachment 593599 [details] [diff] [review] fixed backout patch [Triage Comment] Approved for Aurora.

Attachment #593599 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Alex Keybl [:akeybl]

Updated

•

13 years ago

tracking-firefox12: ? → +

Nathan Froyd [:froydnj]

Assignee

Updated

•

13 years ago

Attachment #593599 - Flags: checkin?

Nathan Froyd [:froydnj]

Assignee

Comment 54

•

13 years ago

Please checkin the backout patch on aurora. Adding the a=akeybl would be great too, if you'd be so kind.

Keywords: checkin-needed

Andrew McCreight [:mccr8] (PTO until 12-2)

Comment 55

•

13 years ago

Comment on attachment 593599 [details] [diff] [review] fixed backout patch backed out checked in. https://hg.mozilla.org/releases/mozilla-aurora/rev/26c731b682ce

Attachment #593599 - Flags: checkin? → checkin+

Andrew McCreight [:mccr8] (PTO until 12-2)

Comment 56

•

13 years ago

back out checked in, rather...

Keywords: checkin-needed

Nathan Froyd [:froydnj]

Assignee

Comment 57

•

13 years ago

The reason that we had debug-only crashes was that the Histogram class is, by design, not thread-safe. Each Histogram, in addition to maintaining counts for its buckets, also maintains a "redundant count", a sum of all the counts we have accumulated thus far. This produces a situation where we have a race between two threads, A and B: A: Inserts a count into a histogram H B: Snapshots histogram H A: Updates the redundant count of H B now has data where sum(counts) != redundant_count. If we write the snapshot that B took to disk and read it later, we hit a debug-only check in the deserialization code that sum(counts) == redundant_count. This check was the cause of bugs like bug 722545. On opt builds, this degrades to a mere indication that deserialization failed. We take this indication to mean that *all* of the saved histogram data is corrupt. This is clearly not a good situation. Note, however, that the histogram code does allow for this possibility when checking for corruption and permits the two count measures to differ by no more than 1 when declaring a histogram not corrupt. So there is a bit of inconsistency here: the code is OK with drift in-memory, but has zero tolerance when it comes to serialization. Options for fixing this: 1) One big Telemetry lock that you have to grab before accumulating data or snapshotting data. This is suboptimal. Note that we snapshot histograms in more places than just persisting them to disk; we snapshot them for reflection into JS, for instance. 2) Per-histogram locks, effectively making accumulation and snapshotting thread-safe. The methods on Histogram for accumulating data are virtual for precisely this sort of thing, though I haven't looked to see if you can extend them easily in practice. 3) Some sort of reader-writer locking scheme where you can accumulate into any number of histograms simultaneously, but snapshotting is atomic with respect to accumulation. I like this one, but it'd be a good deal more complicated, I think, than any of the others. Note that this would leave us open to inconsistent sum(counts) and redundant_count, as two threads could be accumulating into the same histogram simultaneously. 4) Changing how snapshots are serialized, either by not serializing redundant_count or by inventing some checksum to make sure that counts and redundant_count didn't get corrupted with respect to each other. Then we can remove the sum(counts) == redundant_count check. This leaves the race in place, but makes the serialization process saner, IMHO. It's OK to read in slightly corrupted histograms, as we tolerate slightly corrupted histograms in-memory (see above discussion). This option requires hacking on chromium-imported code, which may or may not make life more difficult for future merges. At the very least, I think distinguishing between the "I didn't read all of the saved data" versus "the data looks corrupt" in the deserialization code would be useful. I think options 2 and 4 are the most reasonable; I'm leaning slightly towards option 4, but I'm curious whether anybody else has an opinion.

Nathan Froyd [:froydnj]

Assignee

Updated

•

13 years ago

Depends on: 726761

Nathan Froyd [:froydnj]

Assignee

Comment 58

•

13 years ago

Attached patch part 2 - C++ bits for saving histograms to disk — Details — Splinter Review

Rebasing.

Attachment #600912 - Flags: review+

Nathan Froyd [:froydnj]

Assignee

Comment 59

•

13 years ago

Attached patch part 3 - JS bits to send previously-saved session bits — Details — Splinter Review

Rebasing.

Attachment #589902 - Attachment is obsolete: true

Attachment #600913 - Flags: review+

Nathan Froyd [:froydnj]

Assignee

Updated

•

13 years ago

Attachment #589901 - Attachment is obsolete: true

Nathan Froyd [:froydnj]

Assignee

Updated

•

13 years ago

Whiteboard: [autoland-try:589269,600912,600913,589903]

Mozilla RelEng Bot

Updated

•

13 years ago

Whiteboard: [autoland-try:589269,600912,600913,589903] → [autoland-in-queue]

Mozilla RelEng Bot

Comment 60

•

13 years ago

Autoland Patchset: Patches: 589269, 600912, 600913, 589903 Branch: mozilla-central => try Error applying patch 589903 to mozilla-central. patching file toolkit/components/telemetry/tests/unit/test_nsITelemetry.js Hunk #1 FAILED at 99 1 out of 1 hunks FAILED -- saving rejects to file toolkit/components/telemetry/tests/unit/test_nsITelemetry.js.rej abort: patch failed to apply Could not apply and push patchset:

Mozilla RelEng Bot

Updated

•

13 years ago

Whiteboard: [autoland-in-queue]

Nathan Froyd [:froydnj]

Assignee

Comment 61

•

13 years ago

Attached patch part 4 - tests for C++ load/save — Details — Splinter Review

Update All The Patches!

Attachment #589903 - Attachment is obsolete: true

Attachment #600921 - Flags: review+

Nathan Froyd [:froydnj]

Assignee

Updated

•

13 years ago

Whiteboard: [autoland-try:589269,600912,600913,600921]

Mozilla RelEng Bot

Updated

•

13 years ago

Whiteboard: [autoland-try:589269,600912,600913,600921] → [autoland-in-queue]

Mozilla RelEng Bot

Comment 62

•

13 years ago

Autoland Patchset: Patches: 589269, 600912, 600913, 600921 Branch: mozilla-central => try Destination: http://hg.mozilla.org/try/pushloghtml?changeset=8f0fae38e776 Try run started, revision 8f0fae38e776. To cancel or monitor the job, see: https://tbpl.mozilla.org/?tree=Try&rev=8f0fae38e776

Nathan Froyd [:froydnj]

Assignee

Comment 63

•

13 years ago

The try run looks OK, despite going for 24 hours. Let's try to get this in and have some soak time before the Aurora uplift.

Keywords: checkin-needed

Justin Lebar (not reading bugmail)

Comment 64

•

13 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/a853405443e7 https://hg.mozilla.org/integration/mozilla-inbound/rev/229fe7214b40 https://hg.mozilla.org/integration/mozilla-inbound/rev/be00254af671 https://hg.mozilla.org/integration/mozilla-inbound/rev/f9eb5f6242f7

Keywords: checkin-needed

Mozilla RelEng Bot

Comment 65

•

13 years ago

Try run for 8f0fae38e776 is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=8f0fae38e776 Results (out of 181 total builds): exception: 1 success: 144 warnings: 20 failure: 16 Builds (or logs if builds failed) available at: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/autolanduser@mozilla.com-8f0fae38e776 Timed out after 12 hours without completing.

Mozilla RelEng Bot

Updated

•

13 years ago

Whiteboard: [autoland-in-queue]

Matt Brubeck (:mbrubeck)

Comment 66

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/a853405443e7 https://hg.mozilla.org/mozilla-central/rev/229fe7214b40 https://hg.mozilla.org/mozilla-central/rev/be00254af671 https://hg.mozilla.org/mozilla-central/rev/f9eb5f6242f7

Status: REOPENED → RESOLVED

Closed: 13 years ago → 13 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla13

Alex Keybl [:akeybl]

Comment 67

•

13 years ago

I've been told that this bug prevents us from getting post-startup telemetry data. If that's the case, we should consider uplifting to Aurora 12 as soon as this has had a week or so to bake. We need telemetry data to know whether or not we have perf regressions across releases.

status-firefox12: fixed → affected

Nathan Froyd [:froydnj]

Assignee

Comment 68

•

13 years ago

(In reply to Alex Keybl [:akeybl] from comment #67) > I've been told that this bug prevents us from getting post-startup telemetry > data. This bug prevents us from receiving data collected in the last ~day before a running Firefox instance is shutdown. It does *not* affect normal telemetry gathering; we can still gather telemetry data for perf regression information across releases without this bug being fixed in 12. I'm inclined to not nominate this for Aurora 12. Given the problems that showed up last time, I'd like to let this bake for a complete release cycle.

(dormant account)

Reporter

Comment 69

•

13 years ago

This was backed out of 12, so it doesn't affect 12.

status-firefox12: affected → unaffected

Alex Keybl [:akeybl]

Comment 70

•

13 years ago

(In reply to Nathan Froyd (:froydnj) from comment #68) > This bug prevents us from receiving data collected in the last ~day before a > running Firefox instance is shutdown. It does *not* affect normal telemetry > gathering; we can still gather telemetry data for perf regression > information across releases without this bug being fixed in 12. > > I'm inclined to not nominate this for Aurora 12. Given the problems that > showed up last time, I'd like to let this bake for a complete release cycle. My concern here is that we're many times investigating performance regressions that accumulate over time and cause Firefox users to frequently restart their browser - from talking with Lawrence it sounds like telemetry data on these users would be lost. Is there any targeted QA that we could do to land this fix a cycle early? Telemetry and other data gathering are the places that we're willing to take on a little more risk since it could be eye opening in our understanding of perf regressions. If this carries too much risk, I don't mean to force the issue - I just want to make sure we're on the same page that this is the only external performance data we collect and can use to resolve perf regressions (seen in both FF10 and FF11 most recently).

(dormant account)

Reporter

Comment 71

•

13 years ago

(In reply to Alex Keybl [:akeybl] from comment #70) > (In reply to Nathan Froyd (:froydnj) from comment #68) > > This bug prevents us from receiving data collected in the last ~day before a > > running Firefox instance is shutdown. It does *not* affect normal telemetry > > gathering; we can still gather telemetry data for perf regression > > information across releases without this bug being fixed in 12. > > > > I'm inclined to not nominate this for Aurora 12. Given the problems that > > showed up last time, I'd like to let this bake for a complete release cycle. > > My concern here is that we're many times investigating performance > regressions that accumulate over time and cause Firefox users to frequently > restart their browser - from talking with Lawrence it sounds like telemetry > data on these users would be lost. Telemetry only records first 5-10minutes of browsing(+ every 24hours) because of the way our idle-daily notification works. > > Is there any targeted QA that we could do to land this fix a cycle early? > Telemetry and other data gathering are the places that we're willing to take > on a little more risk since it could be eye opening in our understanding of > perf regressions. > > If this carries too much risk, I don't mean to force the issue - I just want > to make sure we're on the same page that this is the only external > performance data we collect and can use to resolve perf regressions (seen in > both FF10 and FF11 most recently). The fact that we are tracking ff10 perf is news to me. This functionality has a long teething period. Both of the 2 landings had bugs. Second landing did not regress anything, hopefully things are now working with bug 732970 fixed. We can only consider uplifting this once this has baked on trunk for a while and we have serverside evidence that things work. It takes about 7 days to get sufficient confidence that telemetry is working as expected.

Andrew McCreight [:mccr8] (PTO until 12-2)

Updated

•

13 years ago

Depends on: 733977

(dormant account)

Reporter

Updated

•

13 years ago

Blocks: 736268

u279076

Comment 72

•

13 years ago

Is there anything QA can do to verify this fix?

Nathan Froyd [:froydnj]

Assignee

Comment 73

•

13 years ago

This was all disabled in bug 756152.

u279076

Comment 74

•

13 years ago

Thanks Nathan. Marking this [qa-]. We will verify-disabled in bug 756152. If there is anything you can add to that bug to aid in our verification, please comment there. Thanks.

Whiteboard: [qa-]

You need to log in before you can comment on or make changes to this bug.