Closed Bug 1768911 Opened 7 months ago Closed 7 months ago

PersonalityProviderWorkerClass.jsm does a lot of disk I/O

Categories

(Firefox :: New Tab Page, defect)

defect

Tracking

()

RESOLVED FIXED
102 Branch
Tracking Status
firefox102 --- fixed

People

(Reporter: florian, Assigned: thecount)

References

Details

(Keywords: perf, perf:resource-use)

Attachments

(1 file)

Every 5 minutes, 8 I/O operations are done for each of the 25 files in the <user profile>/personality-provider folder.

Here's a profile showing the I/O: https://share.firefox.dev/3yuxaQd
Zooming out, the same pattern is repeated every 5 minutes: https://share.firefox.dev/3Md3taA

There's obvious room for optimization here:

  • we don't need to repeatedly attempt to create the personality-provider folder. It could be created by the onSync method before the loop rather than by the maybeDownloadAttachment method. But actually, it would be more efficient to not attempt to create it at all, and let the other I/O functions fail (and catch the error with a try/catch). I think the right place to create the folder would be in a catch block after await IOUtils.write in _downloadAttachment. That should ensure we only attempt to create the folder once. (The IOUtils.makeDirectory call in the deleteAttachment method could also be similarly removed.)
  • there's an IOUtils.exists call followed by an IOUtils.stat call. Both do a stat under the hood. The exists call should be removed, and replaced with a try/catch around the IOUtils.stat call.
  • I haven't found why in the code, but we read all files twice. I assume the first read is from the _getFileHash function. The second read, I don't know.

Questions:

  • Do we really need to sync all these files every 5 minutes? Could it be every hour instead?
  • It looks like the first time we read the files to check the size and hash, we should cache this information instead of computing it again 5 minutes later. Is there a reason not to do so?
  • Could these files be merged into one to reduce the amount of I/O operations we need to do on them?
  • Doing all these calls to maybeDownloadAttachment in parallel (ie. without awaiting them) is interesting; it ensures the BgIOThreadPool thread remains busy and never sleeps. That probably saves a few thread wake-ups... but I wonder if it's intentional that IOUtils uses a single thread, rather than sending its I/O tasks to various threads of a thread pool. Probably off topic for this bug though.

I'm fairly certain that no, we don't need to check every 5 minutes. That seems excessive.

it's been a bit since I looked at this, but that might actually not be intended.

I'll take a look and update here with what I find.

(In reply to Florian Quèze [:florian] from comment #0)

[...]

  • Doing all these calls to maybeDownloadAttachment in parallel (ie. without awaiting them) is interesting; it ensures the BgIOThreadPool thread remains busy and never sleeps. That probably saves a few thread wake-ups... but I wonder if it's intentional that IOUtils uses a single thread, rather than sending its I/O tasks to various threads of a thread pool. Probably off topic for this bug though.

It is intentional that IOUtils uses a single background thread for IO, partially for a simple design, and partially so that you can reason about IO operations (i.e., every IO operation submitted to IOUtils will be completed in the order they are submitted).

I wonder if this has always been an issue, or if something changed at some point.

I wonder this because looking at your comment, it sounds like onSync is the primary issue? And if so, my previous understanding was that RemoteSettings(collection).on("sync", this.onSync); would only happen as a result of something in remote settings changing, and not on a 5 min timer. I could also be wrong about my assumption that it happens as a result to something changing.

(In reply to Barret Rennie [:barret] (they/them) from comment #2)

It is intentional that IOUtils uses a single background thread for IO, partially for a simple design, and partially so that you can reason about IO operations (i.e., every IO operation submitted to IOUtils will be completed in the order they are submitted).

Guaranteeing the order makes sense when all operations were triggered from the same JS context, but if a worker queues several slow operations to big files, it might be a surprise to wait for a long time for a short I/O operation triggered from the main thread. But I guess if nobody complained, it's probably fine.

So initial investigation:
The personalization code is should be overall running on a 12 hour interval, but there might be a bug here.

On top of that, I did a test locally and I am not seeing the personalization file I/O based code running every 5 minutes.

I wonder if in your case, something is causing the attachments to fail, which causes them to try again in 5 minutes, which also fails, and so on?

I'll keep digging.

(In reply to Florian Quèze [:florian] from comment #0)

  • I haven't found why in the code, but we read all files twice. I assume the first read is from the _getFileHash function. The second read, I don't know.

I think the second read is here

(In reply to Florian Quèze [:florian] from comment #0)

  • Doing all these calls to maybeDownloadAttachment in parallel (ie. without awaiting them) is interesting; it ensures the BgIOThreadPool thread remains busy and never sleeps. That probably saves a few thread wake-ups... but I wonder if it's intentional that IOUtils uses a single thread, rather than sending its I/O tasks to various threads of a thread pool. Probably off topic for this bug though.

I don't see any downside from my point of view on this in awaiting them, if it offers benefits to the main thread. That would make sense to me.

(In reply to Scott [:thecount] Downe from comment #5)

On top of that, I did a test locally and I am not seeing the personalization file I/O based code running every 5 minutes.

I do see it running every 30 minutes though.

(In reply to Scott [:thecount] Downe from comment #7)

(In reply to Florian Quèze [:florian] from comment #0)

  • Doing all these calls to maybeDownloadAttachment in parallel (ie. without awaiting them) is interesting; it ensures the BgIOThreadPool thread remains busy and never sleeps. That probably saves a few thread wake-ups... but I wonder if it's intentional that IOUtils uses a single thread, rather than sending its I/O tasks to various threads of a thread pool. Probably off topic for this bug though.

I don't see any downside from my point of view on this in awaiting them, if it offers benefits to the main thread. That would make sense to me.

I don't think there's an obvious benefit for the main thread. The things happening in parallel was just a bit confusing when reading the code so I wasn't sure it was intended. Usually when trying to run things in parallel, there's an explicit Promise.all() call. And if the use of IOUtils from a worker turns out to delay things for the main thread, I think that's an IOUtils bug.

Some notes:

  • We are receiving a lot of signature errors in the personnality-provider-* collections. (source and source). When synchronization fails, we refetch and retry. Not every 5min, but every time there is a new change on the server (see live list of latest Remote Settings changes). It would be interesting to analyze the profile that calls sync so often and check that signature verification is not broken for these collections (using Remote Settings Devtools for example).

  • The attachment download code of the PersonalityProviderWorker module was copied from an initial version of certificates intermediates. We now have a built-in API in Remote Settings that relies on IndexedDB instead of disk to download and manage attachments. See https://firefox-source-docs.mozilla.org/services/settings/index.html#file-attachments We now even have an (undocumented) way to package attachments as resource://

  • Globally, the collections related to the new page seem to be the ones which report the highest signature error rate (tippytop, personnality-provider-models, personnality-provider-recipes, message-groups, whats-new-panel). I don't want to advance conclusions too fast, but could there be some hijacking involved?

I think I figured this out and am testing a patch now.

(In reply to Mathieu Leplatre [:leplatrem] from comment #10)

Some notes:

  • We are receiving a lot of signature errors in the personnality-provider-* collections. (source and source). When synchronization fails, we refetch and retry. Not every 5min, but every time there is a new change on the server (see live list of latest Remote Settings changes). It would be interesting to analyze the profile that calls sync so often and check that signature verification is not broken for these collections (using Remote Settings Devtools for example).

After some investigation, I think this is unrelated, but less requests overall might help reduce the number of errors.

I am interested in looking into this, probably worth filing a followup ticket.

Assignee: nobody → sdowne
Attachment #9276271 - Attachment description: WIP: Bug 1768911 - Pocket newtab reduce number of I/O operations made from personality provider. → Bug 1768911 - Pocket newtab reduce number of I/O operations made from personality provider.
Pushed by sdowne@getpocket.com:
https://hg.mozilla.org/integration/autoland/rev/296bf3c7d129
Pocket newtab reduce number of I/O operations made from personality provider. r=gvn
Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → 102 Branch
You need to log in before you can comment on or make changes to this bug.