Glean runs as a separate instance in iOS application extensions
Categories
(Data Platform and Tools :: Glean: SDK, defect, P1)
Tracking
(Not tracked)
People
(Reporter: travis_, Assigned: travis_)
References
Details
(Whiteboard: [telemetry:glean-rs:backlog])
Recently it was noticed that we were receiving schema-rejection errors for Lockwise-iOS under the application id/document namespace of: org-mozilla-ios-Lockbox-CredentialProvider
. The expected application ID is org-mozilla-ios-Lockbox
which leads me to believe that we are seeing AppInfo.name
report a different ID where we use it to build the configuration to pass to the FFI layer when initializing Glean, but only when it's used/invoked to provide credentials, and not when the app is launched by the user.
Updated•5 years ago
|
Assignee | ||
Comment 1•5 years ago
|
||
Based on a conversation with Kayla, it looks like using AppInfo.baseBundleIdentifier
rather than AppInfo.name
might help guarantee that we always get the org-mozilla-ios-Lockbox
identifier, even if the CredentialProvider
is the bit that's running. She also helped out in giving some information on how to launch the CredentialProvider
from Xcode.
From Slack:
Kayla 12:01 PM
You build lockwise with xcode, log in with a test account or something that has a login for a website. Then you set your autofill settings in your
device or simulator to autofill with Lockwise. Then you build the CredentialProvider extension and it asks you like what app you want to build it
with and you click safari. (edited)
12:01
Then you go to the website you have the autofill password for and it will launch credential provider to fill it
12:02
I know it's a few steps, but that is how you launch and test the CredentialProvider extension and see what it is doing
The iOS team wasn't sure this would work, but it's easy enough to test. I think it's also important to ensure that AppInfo.baseBundleIdentifier
returns the same thing as AppInfo.name
so as to not cause issues with the telemetry we've already collected under that identifier.
Comment 2•5 years ago
•
|
||
(In reply to Travis Long [:travis_] from comment #1)
I think it's also important to ensure that
AppInfo.baseBundleIdentifier
returns the same thing asAppInfo.name
so as to not cause issues with the telemetry we've already collected under that identifier.
Looks like AppInfo.baseBundleIdentifier
is what was used on our iOS apps so far. Can you provide the official documentation entries for both these properties?
It sounds like using AppInfo.name
is wrong. We should probably make this bug a P1 rather than a backlog entry.
Assignee | ||
Comment 3•5 years ago
•
|
||
Currently we are using bundleIdentifier
in Glean in the Utils.swift file for AppInfo.name
After taking a look at the method used to get the "base identifier" in the searchfox link you posted, this should be returning the same bundle identifier as the way we are retrieving it and this code is manually chopping off the extension id part if it is present.
So, it looks like we would have to allow the app to set the app-id in Glean, otherwise, I don't see a way to intelligently detect and remove an extension id like the CredentialProvider
bit in the case we are looking at for this bug.
Comment 4•5 years ago
|
||
(In reply to Travis Long [:travis_] from comment #3)
So, it looks like we would have to allow the app to set the app-id in Glean, otherwise, I don't see a way to intelligently detect and remove an extension id like the
CredentialProvider
bit in the case we are looking at for this bug.
We should not allow the definition of application ids, this must be automated.
So, this code seems to be chopping off based on the number of components, relying on a fixed format of the bundle identifier. Is this correct? If so, can we use such an approach?
Comment 5•5 years ago
|
||
Looking at the linked code: What even is the "XPC!" package type? Why is it an indicator for chopping of the last component based on dots?
Did we establish that we do want these two different pieces of data under the same application ID? (I guess it makes sense because same app = same database, or not?)
(I do agree and prefer an automatic way over allowing to set application IDs)
Assignee | ||
Comment 6•5 years ago
|
||
(In reply to Alessio Placitelli [:Dexter] from comment #4)
(In reply to Travis Long [:travis_] from comment #3)
So, it looks like we would have to allow the app to set the app-id in Glean, otherwise, I don't see a way to intelligently detect and remove an extension id like the
CredentialProvider
bit in the case we are looking at for this bug.We should not allow the definition of application ids, this must be automated.
So, this code seems to be chopping off based on the number of components, relying on a fixed format of the bundle identifier. Is this correct? If so, can we use such an approach?
This particular function appears to be very app specific and I don't think we could do this in a generic way that would work for all apps, I'm afraid. We would have a difficult time detecting this and when to do it. Not every app has extensions like this that would require this, and there are even different kinds of app extensions. The names of these unknown app-id's could be unconventional making this more difficult to detect in an automated way. I think that while this is possible, it's definitely not feasible.
After speaking with Kayla and Elise on the Lockwise team, they seemed to agree that having the data in a separate table wasn't terrible and could be interesting to see what users were doing in Autofill vs. the app, but they admitted that such a decision would be best made by Data-Science/Product.
So I think the best course of action right now would be to start a proposal and involve DS on how to handle this situation in a way that allows us to best analyze the data, before we try to implement some fancy detection code for when the app-id gets things appended to it.
Comment 7•5 years ago
|
||
(In reply to Travis Long [:travis_] from comment #6)
After speaking with Kayla and Elise on the Lockwise team, they seemed to agree that having the data in a separate table wasn't terrible and could be interesting to see what users were doing in Autofill vs. the app, but they admitted that such a decision would be best made by Data-Science/Product.
I think there are two problems here:
(1) - How to deal with the lockwise case, if the best solution is to have two ids. And that's something that DS/Product should chime in.
(2) - Figure out if there's a technical solution to get the id of the application as it is advertised on the app store.
So I think the best course of action right now would be to start a proposal and involve DS on how to handle this situation in a way that allows us to best analyze the data, before we try to implement some fancy detection code for when the app-id gets things appended to it.
I think it is important for us to have a clear understanding about (2) before moving forward with (1).
So, this code seems to be chopping off based on the number of components, relying on a fixed format of the bundle identifier. Is this correct? If so, can we use such an approach?
This particular function appears to be very app specific and I don't think we could do this in a generic way that would work for all apps, I'm afraid. We would have a difficult time detecting this and when to do it. Not every app has extensions like this that would require this, and there are even different kinds of app extensions. The names of these unknown app-id's could be unconventional making this more difficult to detect in an automated way. I think that while this is possible, it's definitely not feasible.
I think that, at this point, we don't have all the elements we need to decide on this. At the very minimum, I think we need to understand the following:
- What's the XPC! package type?
- What's the usual format of the identifiers on apple platforms?
- Is there a specific format/standard for defining ids for the services/companion components? (e.g.
mozilla.ios.Lockbox.CredentialProvider
)
Assignee | ||
Comment 8•5 years ago
|
||
I think that, at this point, we don't have all the elements we need to decide on this. At the very minimum, I think we need to understand the following:
- What's the XPC! package type?
This appears to be a package type that defines lightweight services in the Apple Ecosystem. Based on this, it might be possible to detect at least these sorts of service applications that fall into the 'package' categories, but, just looking at the documentation, even this seems like it has a lot of opportunity to go wrong
- What's the usual format of the identifiers on apple platforms?
They typically take the form of reverse URLs, such as org.mozilla.ios.Lockbox
or com.apple.camera
, but there is some flexibility to this as while they are typically named in camel case, dashes and underscores are still allowed.
- Is there a specific format/standard for defining ids for the services/companion components? (e.g.
mozilla.ios.Lockbox.CredentialProvider
)
The only standard or method I could find through a quick search is that they take the "base" identifier and tack on the class name of the service or extension, so it would be dependent on the actual class name.
Comment 9•5 years ago
|
||
(In reply to Travis Long [:travis_] from comment #8)
I think that, at this point, we don't have all the elements we need to decide on this. At the very minimum, I think we need to understand the following:
- What's the XPC! package type?
This appears to be a package type that defines lightweight services in the Apple Ecosystem. Based on this, it might be possible to detect at least these sorts of service applications that fall into the 'package' categories, but, just looking at the documentation, even this seems like it has a lot of opportunity to go wrong
We can at least tell if we're in a service, that's a good start. Is there any way from the service to access the parent's bundle id?
Assignee | ||
Comment 10•5 years ago
|
||
We can at least tell if we're in a service, that's a good start. Is there any way from the service to access the parent's bundle id?
While you cannot access it through any Apple API directly, we can still do the same thing the function you linked from searchfox does by stripping off the last bit separated by the .
's.
Comment 11•5 years ago
|
||
(In reply to Travis Long [:travis_] from comment #10)
We can at least tell if we're in a service, that's a good start. Is there any way from the service to access the parent's bundle id?
While you cannot access it through any Apple API directly, we can still do the same thing the function you linked from searchfox does by stripping off the last bit separated by the
.
's.
Can services define more than one component ? e.g. mozilla.ios.lockbox.credential.provider?
Assignee | ||
Comment 12•5 years ago
|
||
(In reply to Alessio Placitelli [:Dexter] from comment #11)
Can services define more than one component ? e.g. mozilla.ios.lockbox.credential.provider?
I'm not sure if a sub-component like a service could define something another level like this or not. My instincts tell me that this is probably possible, but I will do my due diligence and research this possibility.
Assignee | ||
Updated•5 years ago
|
Comment 13•5 years ago
|
||
(In reply to Travis Long [:travis_] from comment #12)
(In reply to Alessio Placitelli [:Dexter] from comment #11)
Can services define more than one component ? e.g. mozilla.ios.lockbox.credential.provider?
I'm not sure if a sub-component like a service could define something another level like this or not. My instincts tell me that this is probably possible, but I will do my due diligence and research this possibility.
Super-worst-case: the glean "build system" define some project-wide variable that can be read by the service?
Assignee | ||
Comment 14•5 years ago
|
||
I've dug into this a bit more and I think that the approach taken by Focus (and thus by FF-iOS via their fork) in using the XPC!
CFBundlePackageType
to determine if this is an extension or not and then chopping off the last bit is probably the best approach if we decide that the extension data needs to be in the same table.
The more I think about this, the more I think it does need to be in the same table, and just be annotated somehow as :Dexter has suggested, as this solves the issue of being able to tell the difference between the telemetry generated by the app vs. the extension.
Now that we have explored this a bit (thanks a bunch Dexter!), let me put this together in a concise way and present it to Data Science and Lockwise to see what they think about the situation.
Comment 15•5 years ago
|
||
(In reply to Travis Long [:travis_] from comment #14)
Now that we have explored this a bit (thanks a bunch Dexter!), let me put this together in a concise way and present it to Data Science and Lockwise to see what they think about the situation.
I don't think this is something DS should care about.
This is an internal discussion for an SDK-related change :-)
Once we make this change, the Lockwise team might be interested, but mostly because they will need to decide whether or not they care about knowing if some data came from the service or from the app itself (i.e. they need to add a metric for that).
I feel that this internal discussion still requires some investigation, though:
- Can the service and the app run in parallel?
- If the two can run in parallel, do they run using the same data dir?
- If the above are true, does this result in concurrent writes/reads to the underlying glean dataset?
- Can the glean instances compete for ping uploads?
Assignee | ||
Comment 16•5 years ago
|
||
(In reply to Alessio Placitelli [:Dexter] from comment #15)
I feel that this internal discussion still requires some investigation, though:
- Can the service and the app run in parallel?
Yes, but the service would only ever run while the app was in the background, because the service is only invoked for other apps or for the OS, not for the parent Lockwise app. From what I have read and seen, the service is invoked and destroyed by the OS for each interaction, I don't see any evidence of the service continuing to run after Autofill is complete, but I'll do a little more digging just to be sure.
- If the two can run in parallel, do they run using the same data dir?
I believe so, but this is definitely something to find out for sure. I wonder if I can get access to a few of the discarded CredentialProvider
pings...
- If the above are true, does this result in concurrent writes/reads to the underlying glean dataset?
I think that the data directory is probably the same and I'm fairly certain that we wouldn't have both the app and autofill running at the same time, but I can see where maybe there would be some interesting interactions if they ran either concurrently or right after each other so I'll focus my investigation on whether this is a problem or not.
- Can the glean instances compete for ping uploads?
I wouldn't think that they could compete for ping uploads, but what I see happening is that the "app" is trying to collect the baseline ping while the service is setting baseline metrics.. a read/write conflict. I doubt that this would be the case, but since the service and the app run in different processes and potentially access the same data directories, it's definitely worth investigating to ensure we aren't colliding.
Assignee | ||
Comment 17•5 years ago
|
||
Here's what I found:
- The
CredentialProvider
creates its own Glean object and initializes it. - The
CredentialProvider
uses a different directory for data storage than the base Lockwise application - Both the
CredentialProvider
and the Lockwise app could potentially be running at the same time (but not in the foreground together, Lockwise cannot use theCredentialProvider
to log into iteself, which is the only way this could happen). - The application lifetime of the
CredentialProvider
is completely controlled by the OS, it is launched when the OS needs it and it is terminated as the OS sees fit. - We still seem to see the lifecycle events that we expect as the OS invokes the service, so it should work as a standalone Glean app/service.
Based on this information, I don't think there is going to be an easy way to make the embedded service use the same application ID as the base Lockwise application because they are essentially separate things. In light of this, I think that the best way forward would be to add the Lockwise CredentialProvider
application id to the pipeline in order to capture this usage. I'm not certain how bad this is to implement on the analysis side of things, but it would put some burden on it to join the data from the tables to get a full usage picture for the application.
Thoughts, Dexter?
Comment 18•5 years ago
|
||
(In reply to Travis Long [:travis_] from comment #17)
Here's what I found:
- The
CredentialProvider
creates its own Glean object and initializes it.- The
CredentialProvider
uses a different directory for data storage than the base Lockwise application
Ouch. This means that if some user uses both the app and CredentialProvider
then they'll have two instanfces of Glean with different client ids, meaning that analysis will end up potentially overcounting users.
- Any chance you can verify that?
- Can you check how legacy telemetry in Lockbox behaves in this case? (i.e. do we get the same client ids?)
Given that we ruled out potential data corruption, this is my biggest concern. Before going to product/DS, I'd recommend giving one last look at the above points to paint a complete picture of the problem. This is the reason wny Glean doesn't want to know about processes :-)
An alternative solution, for example, could be to do something similar to what lib-crash
does on Android to make sure that autofill usage is attributed to the right clients.
Assignee | ||
Comment 19•5 years ago
|
||
(In reply to Alessio Placitelli [:Dexter] from comment #18)
Ouch. This means that if some user uses both the app and
CredentialProvider
then they'll have two instanfces of Glean with different client ids, meaning that analysis will end up potentially overcounting users.
- Any chance you can verify that?
I didn't consider this when I was working on it yesterday, but yes, this looks to telemetry like two completely separate applications. Each has it's own directory, client-id, everything.
- Can you check how legacy telemetry in Lockbox behaves in this case? (i.e. do we get the same client ids?)
Legacy telemetry allows setting the application ID, this is why the "chopping off" of the CredentialProvider
bit works.
An alternative solution, for example, could be to do something similar to what
lib-crash
does on Android to make sure that autofill usage is attributed to the right clients.
Ewwww, that would also be icky as we'd have to find a good way to share data between processes. Even sharing a file like lib-crash does could potentially be problematic due to sandboxing, and I'm not sure we could even do it within Glean, it would have to be a Lockwise/app specific sort of change. The other challenge is because the telemetry code is shared between the app and the extension so I'd have to come up with some way to detect that and prevent the extension from initializing Glean.
Let me think on this a bit more to see if there is any way possible to treat all of this like a single Glean app...
Comment 20•5 years ago
|
||
(In reply to Travis Long [:travis_] from comment #19)
- Can you check how legacy telemetry in Lockbox behaves in this case? (i.e. do we get the same client ids?)
Legacy telemetry allows setting the application ID, this is why the "chopping off" of the
CredentialProvider
bit works.
Sorry if I'm being pedantic :-) Does that mean that "core" pings from CredentialProvider
and the Lockwise app report the same client_id? How would the application id affect this?
Assignee | ||
Comment 21•5 years ago
•
|
||
(In reply to Alessio Placitelli [:Dexter] from comment #20)
(In reply to Travis Long [:travis_] from comment #19)
- Can you check how legacy telemetry in Lockbox behaves in this case? (i.e. do we get the same client ids?)
Legacy telemetry allows setting the application ID, this is why the "chopping off" of the
CredentialProvider
bit works.Sorry if I'm being pedantic :-) Does that mean that "core" pings from
CredentialProvider
and the Lockwise app report the same client_id? How would the application id affect this?
They use UserDefaults
to store the client-id, which is a key-value store (similar to what our original Glean-AC used) that appears to be shared and accessible to both the app and extension. By allowing setting of the application ID explicitly when telemetry starts, they don't have to care running the app or the extension, they pull the client-id from shared key-value storage and set the application ID using their detection of the app extension and chopping off the CredentialProvider
if needed, which is then used for the telemetry URL path, ensuring that all telemetry is going to the same endpoint with the same client ID.
For Glean to be able to accomplish this, we would have to trick the extension into using the same directory as the base application (I'm not entirely certain if the extension can access the directory where the app stores Glean data due to sandboxing). That would at least mean that both were storing in the same place and using the same client-id. Unfortunately, since both could run in different processes this would mean that we would have to know when it was the extension and record data to a file so that the app could read it and process it like the hoop we jump through for lib-crash. The problem here is how do we know when it's the extension? Do we force the consuming application to worry about this? Do we try and detect it internally in Glean and always check to see if that XPC!
flag is set like Lockwise/Focus does to do the right thing, and if so are we sure we are handling it the way that the application requires?
Assignee | ||
Updated•5 years ago
|
Comment 22•5 years ago
|
||
(In reply to Travis Long [:travis_] from comment #21)
(In reply to Alessio Placitelli [:Dexter] from comment #20)
(In reply to Travis Long [:travis_] from comment #19)
- Can you check how legacy telemetry in Lockbox behaves in this case? (i.e. do we get the same client ids?)
Legacy telemetry allows setting the application ID, this is why the "chopping off" of the
CredentialProvider
bit works.Sorry if I'm being pedantic :-) Does that mean that "core" pings from
CredentialProvider
and the Lockwise app report the same client_id? How would the application id affect this?[...] they pull the client-id from shared key-value storage [...]
Ah ha! That's the trick!
For Glean to be able to accomplish this, we would have to trick the extension into using the same directory as the base application (I'm not entirely certain if the extension can access the directory where the app stores Glean data due to sandboxing). That would at least mean that both were storing in the same place and using the same client-id.
I think this may get us into comment 15 land again :(
Do we try and detect it internally in Glean and always check to see if that
XPC!
flag is set like Lockwise/Focus does to do the right thing, and if so are we sure we are handling it the way that the application requires?
"For some definition of right thing" :-) I think the SDK should not know/care about processes: this would complicate things quite a lot given how stuff differs across use-cases and platform. The fact that legacy did that, in Lockwise, doesn't mean we should. Given the status of Lockwise maybe this is not a problem we need to solve in the short run.
Maybe, given the outcome of this investigation, the best option right now is to add this id to the pipeline. Mike, thoughts?
Comment 23•5 years ago
|
||
I was just sort of in these parts for Python reasons... So I can probably answer some of these.
(In reply to Alessio Placitelli [:Dexter] from comment #15)
- Can the service and the app run in parallel?
- If the two can run in parallel, do they run using the same data dir?
- If the above are true, does this result in concurrent writes/reads to the underlying glean dataset?
LMDB explicitly handles concurrent reads/writes from multiple processes. And iOS is Unixy enough I'm confident it's correct there.
- Can the glean instances compete for ping uploads?
For this, there is good discussion over in bug 1628294.
Maybe, given the outcome of this investigation, the best option right now is to add this id to the pipeline. Mike, thoughts?
IIUC correctly, there's two separate things here:
-
The app is sending different client_ids in the two different contexts. This, to me, seems like a critical bug to fix. If each user/device is sending multiple client_ids, it will be impossible to get a real measure on just about anything.
-
The app sends to two different doctypes depending on context. This worries me less, but we'd want to document it clearly at a minimum. I still think this feels like the kind of thing where analyses could be incorrect because someone isn't aware of this behavior. I think I'd prefer trying to find a way to merge the two into the same table. If there's no good client solution, maybe there's a data platform solution to that?
Assignee | ||
Comment 24•5 years ago
|
||
(In reply to Michael Droettboom [:mdroettboom] from comment #23)
I was just sort of in these parts for Python reasons... So I can probably answer some of these.
(In reply to Alessio Placitelli [:Dexter] from comment #15)
- Can the service and the app run in parallel?
- If the two can run in parallel, do they run using the same data dir?
- If the above are true, does this result in concurrent writes/reads to the underlying glean dataset?
LMDB explicitly handles concurrent reads/writes from multiple processes. And iOS is Unixy enough I'm confident it's correct there.
- Can the glean instances compete for ping uploads?
For this, there is good discussion over in bug 1628294.
Maybe, given the outcome of this investigation, the best option right now is to add this id to the pipeline. Mike, thoughts?
IIUC correctly, there's two separate things here:
- The app is sending different client_ids in the two different contexts. This, to me, seems like a critical bug to fix. If each user/device is sending multiple client_ids, it will be impossible to get a real measure on just about anything.
In order to fix this, we will have to trick the Glean in the extension to use the directory of the base application's Glean store, and I'm not certain how trivial or even possible that this is due to sandboxing. There may be other ways of dealing with this, but it's not as simple as treating the extension like a library that uses Glean because libraries don't typically run on their own and in their own process. I'm not convinced that this is the easiest way to the solution but I'm starting to get to the point that I need to know what the requirements of analysis are in this case so that I can target solutions that get us there.
- The app sends to two different doctypes depending on context. This worries me less, but we'd want to document it clearly at a minimum. I still think this feels like the kind of thing where analyses could be incorrect because someone isn't aware of this behavior. I think I'd prefer trying to find a way to merge the two into the same table. If there's no good client solution, maybe there's a data platform solution to that?
If we can overcome the differing client-id problem, then the document-id problem is easier to fix because we have identified that it's possible to detect when it's the extension running and strip off the additional id piece to get back to the base document-id.
Comment 25•5 years ago
|
||
(In reply to Michael Droettboom [:mdroettboom] from comment #23)
Maybe, given the outcome of this investigation, the best option right now is to add this id to the pipeline. Mike, thoughts?
IIUC correctly, there's two separate things here:
- The app is sending different client_ids in the two different contexts. This, to me, seems like a critical bug to fix. If each user/device is sending multiple client_ids, it will be impossible to get a real measure on just about anything.
Yes, I concur that this problem needs to be fixed. I'm not sure the fix should be on the SDK side, though: this is an example of platform-specific concept (app extentions?) that walks outside of the process boundary (something that the SDK doesn't want to know about).
In this case I believe the lower-hanging fruit fix is in Lockwise: the extension can report back to the main application that some telemetry needs to be reported.
The fact that makes this real quirky is that the extension app (autofill) has a completely different lifecycle compared to the Lockwise application (and they can even run at the same time...?!).
Given that iOS makes it so easy to use these XPC! thingies, maybe we need a better strategy for handling this on iOS. However, this doesn't help us with the immediate problem of the data having a different client id and being in a different table.
Assignee | ||
Comment 26•5 years ago
|
||
Sounds like lightweight-proposal time? I have a couple of ideas on how to deal with this (some of which need a little more thought to be complete), ranging from documenting this behavior/drawback to more creative solutions. Considering the feedback that this has gathered so far, I think a proposal/doc might be a good place to plan out the strategy for dealing with this on iOS.
Assignee | ||
Comment 27•5 years ago
|
||
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 28•5 years ago
|
||
Updating the title to better reflect the direction this investigation has gone.
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 29•4 years ago
|
||
It appears that the investigation on this has wound down to a resolution and it doesn't look like we can do much on the Glean side of things to make this easier and will have to put the burden on the integrating application to ensure that telemetry is recorded through the base application process.
I've filed a couple of follow-up bugs, blocked by this bug, that will track documenting the problem with using extensions with Glean and will also implement something to prevent Glean from being initialized in the extension processes to prevent Glean from running and sending telemetry from them.
Closing this investigation/proposal bug as FIXED.
Description
•