Closed Bug 1865845 Opened 1 year ago Closed 10 months ago

handle channel changes for cached macOS attribution data

Categories

(Firefox :: Installer, enhancement)

enhancement

Tracking

()

RESOLVED FIXED
123 Branch
Tracking Status
firefox123 --- fixed

People

(Reporter: bhearsum, Assigned: bhearsum)

References

Details

(Whiteboard: [fidedi-ope])

Attachments

(3 files)

macOS attribution data is meant to follow a particular install for its entire lifetime. We cached this data in directories that contain the full path to the install, and in filenames that contain the channel. (The channel is included because Release, Beta, and ESR all use Firefox.app as their application name - so the install directory alone is not enough distinguish them if a paveover occurs.)

Because the channel is included, there is a risk that this data may be lost when the channel changes. There's at least a few scenarios here to consider:

  1. A paveover occurs, replacing, eg: a Release install with a Beta install.
  2. The user overrides their channel.
  3. An update changes the user's channel. This typically happens when we desupport an OS version, and migrate users on it to the ESR channel for extended support

In the paveover scenario we do not want to preserve existing attribution data - we should use whatever new attribution data is present, if it exists.

In the other scenarios we should preserve attribution data if possible. We already have some code that runs code that runs around update time when the user changes their channel - although I'm not certain if it's suited for this purpose. Ideally, we'd detect the channel change at the next startup and deal with it then.

At the moment, I'm not certain how to detect the third scenario. Robin, I took a look around https://searchfox.org/mozilla-central/source/toolkit/mozapps/update/UpdateService.sys.mjs and https://searchfox.org/mozilla-central/source/toolkit/mozapps/update/AppUpdater.sys.mjs and couldn't find anything that might help - do you have any pointers you could give me? (Any other thoughts on this would be appreciated as well.)

Blocks: 1816982
Whiteboard: [fidedi-ope]
Flags: needinfo?(bytesized)

I'm afraid that I'm not especially familiar with the channel change process, but I'll answer this the best I can.

First, let me address the code that you linked. I believe that it successfully works in a very specific case: Firefox starts downloading an update and exits before completing it, then the channel is changed and Firefox is restarted. Then it would look at the in-progress update, realize that it's for the wrong channel, and discard it. If, however, there is a pending/staged update when the channel is changed, I believe that, when launched, Firefox will install it, restart, and then activate this code to incorrectly set update.state = STATE_FAILED; when it actually succeeded. If there were no update in-progress when the channel changes, I believe that this code would never run at all. The next time that the update process runs, it will set the update object's channel to match UpdateUtils.UpdateChannel, and we will avoid this code path when _postUpdateProcessing runs next.

I also can't seem to figure out how that code doesn't get consistently activated in your third scenario. I don't currently have the time to spin up a VM to investigate this, but it seems like when this runs, update.channel would be set to release, the update would be installed and would change the channel to esr, and then that same _postUpdateProcessing code would run and set update.state = STATE_FAILED;. I kinda feel like someone would probably have noticed by now if that were happening, but I can't seem figure out why it wouldn't without spinning up a VM and investigating.

I don't know of any especially good way to detect the second and third scenarios. The best I can think of at the moment is to add an idle task that duplicates the update channel into some other pref and then to move the attribution data around when the duplicated version doesn't match the current channel. Possibly the install path needs to be recorded somewhere as part of this to ensure that a profile doesn't get used by multiple installations and get confused? Of course, this would try to move the attribution data around once per profile, not once per installation. I'm not sure if that's a problem. It also could mean that the code might be run on a substantial delay, perhaps even after the channel had changed multiple times, or changed and changed back. Again, I'm not sure how much of a problem this is.

Potentially maybe you could avoid some (maybe even all, actually) of these problems by writing the duplicated pref into update-config.json instead of into the profile.

I would be happy to look into any of this further, if you'd like, but it might be a little while before I have the capacity to do so. Just let me know.

Flags: needinfo?(bytesized)

(In reply to Robin Steuber (they/them) [:bytesized] from comment #1)

If, however, there is a pending/staged update when the channel is changed, I believe that, when launched, Firefox will install it, restart, and then activate this code to incorrectly set update.state = STATE_FAILED; when it actually succeeded.

Oh, I just stumbled across some code that I forgot existed that probably prevents this from happening in most cases. When the user changed their channel, if they didn't also update update-settings.ini, this check would fail during the update process, causing the update to fail.

But it probably still isn't great that we unconditionally set update.state = STATE_FAILED; without checking what actually happened.

Robin and I had a good chat about this the other day and started to wonder whether or not we even need a cache file for mac attribution. As I understand it, the reason we had it was because fishing information out of the quarantine database in our original implementation was either expensive, or would not stick around forever. Our new implementation relies on xattrs on the .app directory, which I suspect would take a similar amount of I/O to read and parse compared with reading and parsing data from the cache file. If this is true, and it is also true that attribution data persists after an update, removing the cache file altogether seems like it would be the most robust solution here.

These are both assumptions for now which I'll follow up on soon.

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #3)

Robin and I had a good chat about this the other day and started to wonder whether or not we even need a cache file for mac attribution. As I understand it, the reason we had it was because fishing information out of the quarantine database in our original implementation was either expensive, or would not stick around forever. Our new implementation relies on xattrs on the .app directory, which I suspect would take a similar amount of I/O to read and parse compared with reading and parsing data from the cache file. If this is true, and it is also true that attribution data persists after an update, removing the cache file altogether seems like it would be the most robust solution here.

Re: your understanding -- spot on. It's not clear to me that the xattr would persist; this seems a detail of how update stage and apply works that, IIUC, is an atomic swap that doesn't seem likely to retain xattr. Easily tested and could be implemented, of course.

These are both assumptions for now which I'll follow up on soon.

Sounds great!

(In reply to Nick Alexander :nalexander [he/him] from comment #4)

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #3)

Robin and I had a good chat about this the other day and started to wonder whether or not we even need a cache file for mac attribution. As I understand it, the reason we had it was because fishing information out of the quarantine database in our original implementation was either expensive, or would not stick around forever. Our new implementation relies on xattrs on the .app directory, which I suspect would take a similar amount of I/O to read and parse compared with reading and parsing data from the cache file. If this is true, and it is also true that attribution data persists after an update, removing the cache file altogether seems like it would be the most robust solution here.

Re: your understanding -- spot on. It's not clear to me that the xattr would persist; this seems a detail of how update stage and apply works that, IIUC, is an atomic swap that doesn't seem likely to retain xattr. Easily tested and could be implemented, of course.

As it turns out, the atomic replace happens on the Contents directory, not the .app directory - so, luckily, the xattr on the .app directory is already preserved across updates.

As for the I/O, I was able to confirm with iosnoop that reading the xattr causes 3 reads to a resource fork totaling to 20480 bytes:

STRTIME              DEVICE  MAJ MIN   UID   PID D      BLOCK     SIZE                     PATHNAME ARGS
2023 Dec 13 15:47:48 ??        1  11   502   788 R    8477725     4096 ??/xattr/xattr/..namedfork/rsrc bash\0
2023 Dec 13 15:47:48 ??        1  11   502   788 R    8477726     8192 ??/xattr/xattr/..namedfork/rsrc bash\0
2023 Dec 13 15:47:48 ??        1  11   502   788 R    8477728     8192 ??/xattr/xattr/..namedfork/rsrc bash\0

Reading the cache file, on the other hand, is 4096 bytes:

TIME STIME DELTA DEVICE INS MAJ MIN UID PID PPID D BLOCK SIZE MOUNT FILE PATH COMM ARGS
4000333 4000123 209 ?? 5 1 5 502 688 1 R 30395385 4096 Data macAttributionDataCache-nightly-try ??/Firefox Nightly/macAttributionDataCache-nightly-try firefox firefox\0

That's a 5x increase - which sounds like a lot at first - but I believe this is out of the critical path of start-up, and possibly even off main thread - so reading 16 extra kilobytes feels like it's probably not a big deal for the simplicity it brings.

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #5)

(In reply to Nick Alexander :nalexander [he/him] from comment #4)

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #3)

Robin and I had a good chat about this the other day and started to wonder whether or not we even need a cache file for mac attribution. As I understand it, the reason we had it was because fishing information out of the quarantine database in our original implementation was either expensive, or would not stick around forever. Our new implementation relies on xattrs on the .app directory, which I suspect would take a similar amount of I/O to read and parse compared with reading and parsing data from the cache file. If this is true, and it is also true that attribution data persists after an update, removing the cache file altogether seems like it would be the most robust solution here.

Re: your understanding -- spot on. It's not clear to me that the xattr would persist; this seems a detail of how update stage and apply works that, IIUC, is an atomic swap that doesn't seem likely to retain xattr. Easily tested and could be implemented, of course.

As it turns out, the atomic replace happens on the Contents directory, not the .app directory - so, luckily, the xattr on the .app directory is already preserved across updates.

Excellent news! This'll want a macOS-specific updater test eventually, since we don't want the behaviour to change.

As for the I/O, I was able to confirm with iosnoop that reading the xattr causes 3 reads to a resource fork totaling to 20480 bytes:

STRTIME              DEVICE  MAJ MIN   UID   PID D      BLOCK     SIZE                     PATHNAME ARGS
2023 Dec 13 15:47:48 ??        1  11   502   788 R    8477725     4096 ??/xattr/xattr/..namedfork/rsrc bash\0
2023 Dec 13 15:47:48 ??        1  11   502   788 R    8477726     8192 ??/xattr/xattr/..namedfork/rsrc bash\0
2023 Dec 13 15:47:48 ??        1  11   502   788 R    8477728     8192 ??/xattr/xattr/..namedfork/rsrc bash\0

Reading the cache file, on the other hand, is 4096 bytes:

TIME STIME DELTA DEVICE INS MAJ MIN UID PID PPID D BLOCK SIZE MOUNT FILE PATH COMM ARGS
4000333 4000123 209 ?? 5 1 5 502 688 1 R 30395385 4096 Data macAttributionDataCache-nightly-try ??/Firefox Nightly/macAttributionDataCache-nightly-try firefox firefox\0

That's a 5x increase - which sounds like a lot at first - but I believe this is out of the critical path of start-up, and possibly even off main thread - so reading 16 extra kilobytes feels like it's probably not a big deal for the simplicity it brings.

I 100% agree that this is not likely to have noticeable impact.

We will be relying the com.apple.application-instance xattr being preserved for attribution data to be persisted across updates and channel changes. This change adds an extra check to most of the updater tests to ensure that this is the case.

I don't think we need to do anything special here around verifying that it happens for channel changes since those are just regular MARs that happen to include a bit that changes the channel - but I'm happy to add something if it's useful.

This function has been difficult to read for awhile. This patch cleans it up in preparation to adjust it for the removal of the cache file on macOS.

Depends on D197195

This is mostly removing code and tests related to reading/writing the cache file on macOS and updating of tests. Aside from that, the most notable part is the change to setAttributionString to automatically prepend __MOZCUSTOM__ when writing an attribution code. This is mostly done to make things simpler and cleaner in the majority of the tests, but seeing as getAttributionString is also aware of it, it seems like a generally nicer way to do this.

Depends on D197203

There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:bhearsum, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit BugBot documentation.

Flags: needinfo?(nalexander)
Flags: needinfo?(bhearsum)

These will land today or tomorrow.

Flags: needinfo?(bhearsum)
Pushed by bhearsum@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9a8c396a22db verify that xattrs are on .app are preserved across updates r=bytesized,application-update-reviewers,nalexander https://hg.mozilla.org/integration/autoland/rev/e825fb03ff56 refactor attribution code to clearly separate out flows for mac, nsis, and msix r=nalexander https://hg.mozilla.org/integration/autoland/rev/4f799b89c628 stop caching mac attribution data in a separate file r=nalexander
Flags: needinfo?(bhearsum)
Pushed by bhearsum@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/93b5332d9da6 verify that xattrs are on .app are preserved across updates r=bytesized,application-update-reviewers,nalexander https://hg.mozilla.org/integration/autoland/rev/4ac74de4ecd5 refactor attribution code to clearly separate out flows for mac, nsis, and msix r=nalexander https://hg.mozilla.org/integration/autoland/rev/aa6b62d272a2 stop caching mac attribution data in a separate file r=nalexander
Status: NEW → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → 123 Branch
Flags: needinfo?(nalexander)
Flags: needinfo?(bhearsum)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: