Write new client ID file in Java before Gecko starts to prevent race condition with JS accessing client ID

RESOLVED FIXED in Firefox 47

Status

()

defect
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: mcomella, Assigned: mcomella)

Tracking

(Blocks 1 bug)

unspecified
Firefox 48
All
Android
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox47+ fixed, firefox48+ fixed, fennec47+)

Details

Attachments

(1 attachment)

So we don't have the chance of both Java and JS writing a new client ID file on top each other.

fwiw, if we only check if Gecko is running before running the java algo, it's possible Gecko will finish startup, try to get the client ID, and write over the Java data so this is only a small step in the right direction.

We'd have to do some FileLock-ing to properly fix this (though I'm not sure how to do that in JS).
Alternatively, Georg also suggested we might get the clientID in Gecko from the Java parts.
Summary: Receive client ID from Gecko if it is running & lock file before accessing clientID → Receive client ID from Gecko if it is running & lock file before accessing clientID OR get client ID from java
See https://bugzilla.mozilla.org/show_bug.cgi?id=1249491#c8 and the following comments for more details on the issue.
(In reply to Michael Comella (:mcomella) from comment #1)
> Alternatively, Georg also suggested we might get the clientID in Gecko from
> the Java parts.

We briefly talked about this at the last mobile Telemetry meeting and this seems like the simplest and most robust way here.
Summary: Receive client ID from Gecko if it is running & lock file before accessing clientID OR get client ID from java → In JS, get client ID from Java to prevent datareporting/state.json file write race condition
Georg, I'm not sure how to go about doing this. Do you think you can provide some guidance?

It seems safest to redirect directly in the getClientID call [1] so I was thinking we could have some sort of:

if (isAndroid) {
  getClientIDFromJava();
} else {
  // do what we've been doing
}

Seeing that getClientID calls out to ClientIDImpl, perhaps we can just construct a ClientIDAndroidImpl (e.g. like Java interfaces) if we're on Android instead.

It's worth noting that the calls to Java are sent as messages and are async so we'd probably have to do some trickery there.

My question: how can I tell if I'm on Android? Also, would it be considered poor design to put device-specific logic in toolkit? What if we used the "AndroidImpl" implementation strategy?

Also, if you think'd it'd be productive for you to do it (e.g. to get more context on the Android code), let me know! I can step you through sending js/java messages and getting the client ID from GeckoProfile.

[1]: https://mxr.mozilla.org/mozilla-central/source/toolkit/modules/ClientID.jsm#58
Assignee: nobody → michael.l.comella
Flags: needinfo?(gfritzsche)
(In reply to Michael Comella (:mcomella) from comment #0)
> So we don't have the chance of both Java and JS writing a new client ID file
> on top each other.

Is this even possible? On the Java side, we must create the GeckoProfile before we start Gecko. I'd rather avoid more message passing code.
(In reply to Michael Comella (:mcomella) from comment #4)
> It seems safest to redirect directly in the getClientID call [1] so I was
> thinking we could have some sort of:
> 
> if (isAndroid) {
>   getClientIDFromJava();
> } else {
>   // do what we've been doing
> }
[...]
> It's worth noting that the calls to Java are sent as messages and are async
> so we'd probably have to do some trickery there.

The current implementation is already async (due to disk I/O), so this wouldn't make a difference.
The only thing to keep in mind is to make sure that we don't end up with unresolved promises here for edge-cases like early shutdown etc.

> My question: how can I tell if I'm on Android?

You can use |AppConstants.platform == "android"|.

> Also, would it be considered
> poor design to put device-specific logic in toolkit? What if we used the
> "AndroidImpl" implementation strategy?

Branching based on platform for those non-hot paths is fine, we already do it in other similar places.

> Also, if you think'd it'd be productive for you to do it (e.g. to get more
> context on the Android code), let me know! I can step you through sending
> js/java messages and getting the client ID from GeckoProfile.

I don't think i can get here very soon given my backlog, but i'll let you know if i could get there first.
Flags: needinfo?(gfritzsche)
GeckoProfile should always run before Gecko itself inits; it's even part of GeckoThread now. Why can we not ensure that we write the client ID and win the race, just as we always write times.json?

Am I missing a good reason why it can't be that simple?
Flags: needinfo?(michael.l.comella)
Blocks: core-ping
No longer blocks: ut-android
(In reply to Richard Newman [:rnewman] from comment #7)
> GeckoProfile should always run before Gecko itself inits; it's even part of
> GeckoThread now. Why can we not ensure that we write the client ID and win
> the race, just as we always write times.json?
> 
> Am I missing a good reason why it can't be that simple?

To confirm this, GeckoProfile is created before the GeckoThread is finished launching but it seems almost coincidental:
 GeckoApp.onCreate
  GeckoThread.launch
   GeckoThread.start
...
 GeckoThread.run
  GeckoThread.getGeckoArgs (notably before GeckoLader.nativeRun)
   GeckoThread.addCustomProfileArg
    (if GAP.getGeckoInterface != null, which is set earlier in GeckoApp.onCreate)
     EITHER     
     GeckoProfile.getDir() (if in guest mode)
      GeckoProfile.forceCreate
     GeckoProfile.forceCreate (if not in guest mode)

forceCreate opens the times.json file.

That being said, if this code path changes, forceCreate is also called when the GeckoView is initialized (which occurs after the GeckoThread.launch call, but is likely to happen before Gecko finishes launching).

We can get getClientID to this forceCreate method.

Good call, Richard.
Flags: needinfo?(michael.l.comella)
As best as I can tell, this code runs before Gecko is initialized. via
bug 1255657 comment 8:

To confirm this, GeckoProfile is created before the GeckoThread is
finished launching but it seems almost coincidental:
 GeckoApp.onCreate
  GeckoThread.launch
   GeckoThread.start
...
 GeckoThread.run
  GeckoThread.getGeckoArgs (notably before GeckoLader.nativeRun)
   GeckoThread.addCustomProfileArg
    (if GAP.getGeckoInterface != null, which is set earlier in GeckoApp.onCreate)
     EITHER
     GeckoProfile.getDir() (if in guest mode)
      GeckoProfile.forceCreate
     GeckoProfile.forceCreate (if not in guest mode)

forceCreate opens the times.json file.

That being said, if this code path changes, forceCreate is also called
when the GeckoView is initialized (which occurs after the GeckoThread.launch
call, but is likely to happen before Gecko finishes launching).

We can get getClientID to this forceCreate method.

Review commit: https://reviewboard.mozilla.org/r/43073/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/43073/
Attachment #8736080 - Flags: review?(rnewman)
Summary: In JS, get client ID from Java to prevent datareporting/state.json file write race condition → Write new client ID file in Java before Gecko starts to prevent race condition with JS accessing client ID
Comment on attachment 8736080 [details]
MozReview Request: Bug 1255657 - Write client ID file before Gecko starts to prevent race. r=rnewman

https://reviewboard.mozilla.org/r/43073/#review40561

As well as making this change, you should ensure that GeckoThread coordinates, not races, with other code that creates profiles -- and that when GeckoThread is the caller that creates a profile, it does so through this code path!

The coordination we're trying to ensure is that by the time Gecko is launched with a profile, that the profile directory contains the client ID. That *should* be guaranteed either by it being written before profiles.ini, or by Gecko not being started until we've called through GeckoProfile.

::: mobile/android/base/java/org/mozilla/gecko/GeckoProfile.java:1037
(Diff revision 1)
>          }
>  
> +        // Create the client ID file before Gecko starts (we assume this method
> +        // is called before Gecko starts). If we let Gecko start, the JS telemetry
> +        // code may try to write to the file at the same time Java does and
> +        // undefined behavior can occur.

I'm not sure I'd call it "undefined" -- we know what'll happen!

::: mobile/android/base/java/org/mozilla/gecko/GeckoProfile.java:1038
(Diff revision 1)
>  
> +        // Create the client ID file before Gecko starts (we assume this method
> +        // is called before Gecko starts). If we let Gecko start, the JS telemetry
> +        // code may try to write to the file at the same time Java does and
> +        // undefined behavior can occur.
> +        getClientId();

This is a little bit silly.

You're calling `getClientId` for side-effects. But it does file IO for its return value, and it also tries to migrate old IDs! But we literally just created this profile directory (the loop on line 970 guarantees that it's a new directory), so all of that is wasted effort.

Worst case right now it does:

* Read CLIENT_ID_FILE_PATH. Fails: file doesn't exist.
* Read FHR_CLIENT_ID_FILE_PATH. Fails: file doesn't exist.
* Write CLIENT_ID_FILE_PATH. Succeeds.
* Read CLIENT_ID_FILE_PATH. Succeeds, and we discard the result.

Don't do this! Instead, just persist a new client ID:

```
persistClientId(UUID.randomUUID().toString());
```

and do this way up on line 980, immediately after creating the profile directory.
Attachment #8736080 - Flags: review?(rnewman)
Status: NEW → ASSIGNED
tracking-fennec: --- → ?
(In reply to Richard Newman [:rnewman] from comment #10)
> As well as making this change, you should ensure that GeckoThread
> coordinates, not races, with other code that creates profiles -- and that
> when GeckoThread is the caller that creates a profile, it does so through
> this code path!

I've only found one code path in Java to create GeckoProfiles: GeckoProfile.forceCreate. However, due to the fact that GeckoProfile is initialized lazily (generally through `GeckoProfile.getDir()`), it's potentially a race to see who will initialize GeckoProfile first based on some number of seemingly unrelated conditions. Note: the creation code path is mutually exclusive & thus safe – `forceCreate` is `synchronized`.

That being said, my analysis in comment 9 (which is liable to have missed something, e.g. [1] called from the UI thread) showed that this creation always occurs from the GeckoThread before Gecko has a chance to run (and start mucking about in our profiles).

However, we always need the profile ready before Gecko runs – perhaps we can just init it non-lazily! I filed bug 1262625 if you wanted to provide some input.

> The coordination we're trying to ensure is that by the time Gecko is
> launched with a profile, that the profile directory contains the client ID.
> That *should* be guaranteed either by it being written before profiles.ini,
> or by Gecko not being started until we've called through GeckoProfile.

As per the above, the latter should occur (and I have not investigated the former).

[1]: e.g. my analysis in comment 9 left out the case where GeckoApp.onCreate can call GeckoProfile.getFromArgs and call `forceCreate` on the profile under a series of conditions
Comment on attachment 8736080 [details]
MozReview Request: Bug 1255657 - Write client ID file before Gecko starts to prevent race. r=rnewman

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/43073/diff/1-2/
Attachment #8736080 - Flags: review?(rnewman)
jchen has been making a bunch of changes in here recently, could also be good to get his feedback.
[Tracking Requested - why for this release]: better telemetry numbers
tracking-fennec: ? → 47+
Attachment #8736080 - Flags: review?(rnewman) → review?(s.kaspari)
Attachment #8736080 - Flags: review?(s.kaspari) → review+
Comment on attachment 8736080 [details]
MozReview Request: Bug 1255657 - Write client ID file before Gecko starts to prevent race. r=rnewman

https://reviewboard.mozilla.org/r/43073/#review42297

Looks reasonable from what I can tell.
Comment on attachment 8736080 [details]
MozReview Request: Bug 1255657 - Write client ID file before Gecko starts to prevent race. r=rnewman

(In reply to :Margaret Leibovic from comment #13)
> jchen has been making a bunch of changes in here recently, could also be
> good to get his feedback.

I agree. Flagging Jim. :)
Attachment #8736080 - Flags: feedback?(nchen)
Comment on attachment 8736080 [details]
MozReview Request: Bug 1255657 - Write client ID file before Gecko starts to prevent race. r=rnewman

LGTM
Attachment #8736080 - Flags: feedback?(nchen) → feedback+
Comment on attachment 8736080 [details]
MozReview Request: Bug 1255657 - Write client ID file before Gecko starts to prevent race. r=rnewman

Approval Request Comment
[Feature/regressing bug #]: bug 1244295
[User impact if declined]: Under very improbable conditions, users could have their telemetry client ID corrupted if 1) they're creating a new client ID or 2) migrating from FHR. This should be resolved after restart, however.

[Describe test coverage new/current, TreeHerder]: Tested locally
[Risks and why]: Low – we add a line to  code to initialize the file before Gecko can start (and try to initialize the file, starting the race). This code already existed so is low risk and we're calling it from a place where there's a precedent to create files shared by Java and JS. This patch also has some small cosmetic changes.

[String/UUID change made/needed]: None
Attachment #8736080 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/747517785de2
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 48
Comment on attachment 8736080 [details]
MozReview Request: Bug 1255657 - Write client ID file before Gecko starts to prevent race. r=rnewman

Improves telemetry data quality in this specific scenario, Aurora47+
Attachment #8736080 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.