Closed Bug 1270213 Opened 8 years ago Closed 8 years ago

In telemetry JSONStore, use docID for file names rather than unique ID argument

Categories

(Firefox for Android Graveyard :: General, defect)

All
Android
defect
Not set
normal

Tracking

(firefox49 fixed)

RESOLVED FIXED
Firefox 49
Tracking Status
firefox49 --- fixed

People

(Reporter: mcomella, Assigned: mcomella)

References

Details

Attachments

(7 files)

58 bytes, text/x-review-board-request
sebastian
: review+
Details
58 bytes, text/x-review-board-request
sebastian
: review+
Details
58 bytes, text/x-review-board-request
sebastian
: review+
Details
58 bytes, text/x-review-board-request
sebastian
: review+
Details
58 bytes, text/x-review-board-request
sebastian
: review+
Details
58 bytes, text/x-review-board-request
sebastian
: review+
Details
58 bytes, text/x-review-board-request
sebastian
: review+
Details
We'll lose an data in the old format when we switch to the new system (unless we keep dead code around) so we should fix this before it leaves Nightly.
Relevant info:

(In reply to Michael Comella (:mcomella) from bug 1243585 comment #92)
> (In reply to Mark Finkle (:mfinkle) from comment #91)
> > (In reply to Michael Comella (:mcomella) from comment #45)
> > > (In reply to Mark Finkle (:mfinkle) from comment #44)
> > > > As long as we are not encoding data into the filename for extraction,
> > > > I'm fine with the plan.
> > > 
> > > For my edification, why is that?
> > 
> > Because encoding bits of useful data into the filename is a bad pattern.
> 
> Why is it a bad pattern? I see a series of trade-offs:
>  - Error-prone
>  + Data is accessible without having to read the file (useful for formats
> that require you to read the whole file or risk complexity, like JSON)
>  + Prevents duplication of data (e.g. if you need to modify it, but see
> also: error-prone)
> 
> To be explicit about what I'm doing, I'm saving a unique ID for the pings in
> the filename. This ID is not extracted to be added back to the ping, but is
> used by the store to identify specific pings, e.g. to remove them when an
> upload is completed, or (since we assume the IDs are strictly increasing)
> when pruning we can remove the oldest pings.
> 
> > Encoding data that is not even required outside the ping (like the sequence
> > number) doesn't even make sense.
> 
> The sequence number is required for the core ping and I copy it to be the
> unique ID (rather than creating another arbitrary ID). The sequence number
> may not exist in all pings, which is why the unique ID is duplicated &
> separated.
> 
> > I could see using the docId as the full filename. It's a UUID and would
> > never conflict with different ping types. The docId is actually needed in
> > the URL generation too, so using it as the filename has the benefit not
> > needing to open the ping to get any data.
> 
> I like this because it doesn't rely on the ping-creator specifying an ID
> that is potentially incorrect – non-unique or non-increasing. We can
> generate the docID ourselves under the hood and use it as the file name.
> We'll have to rely on file modified times (creation times are not accessible
> in Java 7) for pruning, but since we write the files once, this should be
> fine.
> 
> > In fact, desktop/gecko uses docId as the archive filename. Did you take a
> > look at how desktop/gecko manages archiving and resending pings?
> 
> I have not – I'll look into it.
11:34 <mfinkle> mcomella: TelemetrySend.jsm has some stuff too
11:34 <mfinkle> they have limits on how many archived pings to save
11:34 <mfinkle> for space i guess
11:34 <mfinkle> the android limit is lower than desktop, for example
12:07 <gfritzsche> mcomella: starting point is here: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/TelemetrySend.jsm#691 [ed: in submitPing function]
From a quick skim: it looks like desktop saves pings into files with the ID as their name. Tangent: they don't appear to do batch operations on the pings – they retrieve/remove with specific IDs.
These will be used by the Store in the upcoming patches.

Review commit: https://reviewboard.mozilla.org/r/51187/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/51187/
Attachment #8749847 - Flags: review?(s.kaspari)
Attachment #8749848 - Flags: review?(s.kaspari)
Attachment #8749849 - Flags: review?(s.kaspari)
Attachment #8749850 - Flags: review?(s.kaspari)
Attachment #8749851 - Flags: review?(s.kaspari)
Attachment #8749852 - Flags: review?(s.kaspari)
Attachment #8749853 - Flags: review?(s.kaspari)
I don't actually use isUUID util, but rather the static regex constants. I'm
not sure if I should remove it because:
  1) I use it to test the regex constants/patterns are correct
  2) It seems like someone might find it useful at some point

Review commit: https://reviewboard.mozilla.org/r/51191/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/51191/
Comment on attachment 8749847 [details]
MozReview Request: Bug 1270213 - Move generic file filters to FileUtil. r=sebastian

https://reviewboard.mozilla.org/r/51187/#review48065
Attachment #8749847 - Flags: review?(s.kaspari) → review+
Attachment #8749848 - Flags: review?(s.kaspari) → review+
Comment on attachment 8749848 [details]
MozReview Request: Bug 1270213 - Add FileLastModifiedComparator. r=sebastian

https://reviewboard.mozilla.org/r/51189/#review48067
Attachment #8749849 - Flags: review?(s.kaspari)
Comment on attachment 8749849 [details]
MozReview Request: Bug 1270213 - Add UUIDUtil.isUUID. r=sebastian

https://reviewboard.mozilla.org/r/51191/#review48069

::: mobile/android/base/java/org/mozilla/gecko/util/UUIDUtil.java:26
(Diff revision 1)
> +
> +    /**
> +     * Return if the given string is a UUID. A caller can use `UUID.fromString` and catch the exception
> +     * for this same behavior, but exceptions cause a performance hit and shouldn't be used for control flow.
> +     */
> +    public static boolean isUUID(@NonNull final String UUID) {

Mh. If this is actually never used then I'd remove it. If you only use the regex/pattern then why not test them in the test?

The current design allows me to change the implementation of isUUID() and the tests pass. However your regex/pattern won't have any coverage anymore.

::: mobile/android/tests/background/junit4/src/org/mozilla/gecko/util/TestUUIDUtil.java:23
(Diff revision 1)
> +    public void testIsUUID() throws Exception {
> +        for (int i = 0; i < 3; ++i) {
> +            final String uuid = UUID.randomUUID().toString();
> +            assertTrue("Generated UUID, " + uuid + ", is a UUID", UUIDUtil.isUUID(uuid));
> +        }
> +
> +        final String uuid = UUID.randomUUID().toString();
> +        for (final String str : new String[] { "not-a-uuid", uuid.substring(0, uuid.length() - 1), uuid + "lol" }) {
> +            assertFalse("Given UUID, " + str + ", is not a valid UUID", UUIDUtil.isUUID(str));
> +        }

Somehow this looks more like a test of java.util.UUID.randomUUID() than isUUID(). While I like this saniy check, I think you should at least test this against some static values in isolation. Otherwise your test just asserts that isUUID() returns true for whatever randomUUID() returns.
Comment on attachment 8749850 [details]
MozReview Request: Bug 1270213 - Factor lockAndReadJSONFromFile out from getAllPings. r=sebastian r=sebastian

https://reviewboard.mozilla.org/r/51193/#review48073
Attachment #8749850 - Flags: review?(s.kaspari) → review+
Attachment #8749851 - Flags: review?(s.kaspari) → review+
Comment on attachment 8749851 [details]
MozReview Request: Bug 1270213 - Update TelemetryPing & builders to use docID. r=sebastian

https://reviewboard.mozilla.org/r/51195/#review48075

::: mobile/android/base/java/org/mozilla/gecko/telemetry/pingbuilders/TelemetryCorePingBuilder.java:55
(Diff revision 1)
>      private static final String PROFILE_CREATION_DATE = "profileDate";
> -    public static final String SEQ = "seq";
> +    private static final String SEQ = "seq";
>      private static final String VERSION_ATTR = "v";
>  
> -    public TelemetryCorePingBuilder(final Context context, final int sequenceNumber) {
> -        super(sequenceNumber);
> +    public TelemetryCorePingBuilder(final Context context) {
> +        super();

Nit: You do not need to explicitly call super:
http://docs.oracle.com/javase/tutorial/java/IandI/super.html
"If a constructor does not explicitly invoke a superclass constructor, the Java compiler automatically inserts a call to the no-argument constructor of the superclass. [..]"
Comment on attachment 8749852 [details]
MozReview Request: Bug 1270213 - Update TelemetryUploadService to use doc ID. r=sebastian

https://reviewboard.mozilla.org/r/51197/#review48077
Attachment #8749852 - Flags: review?(s.kaspari) → review+
https://reviewboard.mozilla.org/r/51197/#review48079

> This is not yet expected to compile.

Btw. if those patches are not going to be squashed then I wonder if this will affect bisecting. There will be a bunch of revisions that do not compile and can't be tested.
Comment on attachment 8749853 [details]
MozReview Request: Bug 1270213 - Move PingStore* to doc ID filenames. r=sebastian

https://reviewboard.mozilla.org/r/51199/#review48081

::: mobile/android/base/java/org/mozilla/gecko/telemetry/stores/TelemetryJSONFilePingStore.java:104
(Diff revision 1)
> -        final SortedSet<Integer> ids = getIDsFromFileList(files);
> -        removeFilesWithSmallestIDs(ids, files.length - MAX_PING_COUNT);
> -    }
> +        final SortedSet<File> sortedFiles = new TreeSet<>(new FileLastModifiedComparator());
> +        sortedFiles.addAll(Arrays.asList(files));
> +        deleteSmallestFiles(sortedFiles, files.length - MAX_PING_COUNT);

Can this be affected by manually or automatically (ntpd) updating the time?

Does the filesystem use a constant timezone or can this be affected by travelling?
Attachment #8749853 - Flags: review?(s.kaspari) → review+
https://reviewboard.mozilla.org/r/51199/#review48081

> Can this be affected by manually or automatically (ntpd) updating the time?
> 
> Does the filesystem use a constant timezone or can this be affected by travelling?

These are good questions. It appears it all [depends on the filesystem](http://ask-leo.com/why_do_file_timestamps_compare_differently_every_time_change.html), which is inconsistent across Android devices.

That being said, the data from one day isn't any more or less significant than any other day so it's okay if we lose newer data in favor of uploading old data. However, our sequence numbers will become slightly less useful to explain outages (e.g. if we don't receive 3, 5, 6, it's not necessarily "they had no connection for 3, regained for 4, and lost for 5-6" but instead, "they could have pruned 3, 5, 6 from changed timestamps and uploaded 4 when ready"). We should ask Georg if this is significant.
Georg, these changesets simplify our ping store's implementation but use the file's timestamp to identify which pings to delete first when we store too many. How the timestamps change from clock skew as well as traveling varies depending on filesystem and thus the timestamps can't be relied upon. Practically speaking, newer data isn't any more useful than older data (assuming we're uploading data from within a few days time) so this sounds fine.

However, the sequence numbers can no longer be relied upon to consistently show outages because they can be removed in any arbitrary order – is this okay? For what it's worth, we'll still have the (untrustworthy) ping creation date, (untrustworthy) ping submission date, and the server received date to help with forensics.
Flags: needinfo?(gfritzsche)
To be explicit, I'm blocked on comment 22 before I can upload the reuploader changes (bug 1243585 & see comment 0).
Sorry, i'm a bit busy on a work-week.

This shouldn't matter much and seems like a rare edge-case.
I'd recommend sending pings in a defined order (desktop does "oldest to newest"), but in case of skew or quota-exceeding its fine to break this.
Flags: needinfo?(gfritzsche)
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: