Closed Bug 1788901 Opened 2 years ago Closed 2 years ago

Folder property caching handles missing properties badly: Re-instate code from bug 1726319 and bug 1776280 removed in bug 1778888

Categories

(MailNews Core :: General, defect, P1)

Thunderbird 102

Tracking

(thunderbird_esr102+ fixed, thunderbird105 fixed)

RESOLVED FIXED
106 Branch
Tracking Status
thunderbird_esr102 + fixed
thunderbird105 --- fixed

People

(Reporter: b1, Assigned: b1)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: perf, regression)

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36 Edg/104.0.1293.70

Steps to reproduce:

We've recently migrated a profile with ~2700 folders from TB 91 to TB 102. After the panacea.dat->folderCache.json migration, only ~1300 folders had an MRUTime attribute, that is, missing attribute, not attribute null.

As a consequence operations on recent folders are painfully slow, about 8 seconds to bring up Recent Folders in the folder pane.

Bug 1778888 focused on the treatment of null values, but removing the hunks from bug 1726319 and bug 1776280 worsened the situation for missing attributes.

Accessing recent folders now will again query MSF files causing noticeable delays. Fortunately, due to the fix in bug 1787609, no MSF loss should occur.

As a workaround for affected users, the folderCache.json can be edited adding a null value for MRUTime everywhere. Observed behaviour is that that makes access to recent folder instantaneous again.

Correction, not bug 1787609, but bug 1773822 is avoiding MSF loss.

Blocks: tb102found
Keywords: regression
Regressed by: 1778888
See Also: → 672835
Summary: Folder attribute caching handles missing attributes badly: Re-instate code from bug 1726319 and bug 1776280 removed in bug 1778888 → Folder property caching handles missing properties badly: Re-instate code from bug 1726319 and bug 1776280 removed in bug 1778888

Sorry to go full circle on this issue. It was only detected on a profile with many folders, a lot of them likely imported from a different client and without time properties.

Bug 1787609 is also worth a look (conflicting with the patch here) as the gremlin that mass destroys MSF files is still alive.

Attachment #9292867 - Flags: review?(benc)
Severity: -- → S3
Keywords: perf
Priority: -- → P1
Comment on attachment 9292867 [details] [diff] [review]
1788901-handle-missing-folder-props.patch

Review of attachment 9292867 [details] [diff] [review]:
-----------------------------------------------------------------

Basically happy for this to land, but just want to run over implications and alternatives first...

::: mailnews/base/src/nsMsgDBFolder.cpp
@@ +2030,5 @@
> +        // folderCache.json is removed or becomes invalid after moving
> +        // a profile (see bug 1726660).
> +        propertyValue.Truncate();
> +        return NS_OK;
> +      }

Just to be clear - this chunk of code is saying the time properties should _never_ be read from the DB.
For all intents and purposes the folderCache _is_ the database, as far as the time properties are concerned.

This seems like an icky state of affairs. But, of course, the root cause is parsing huge numbers of mork files is too slow :-(
So I don't really have any better immediate suggestions...

@@ +2046,5 @@
> +        // Now that we have the value, store it in our cache.
> +        if (cacheElement) {
> +          cacheElement->SetCachedString(propertyName, propertyValue);
> +        }
> +      }

Is this chunk needed? It'll never be executed for time properties...

Alternatively, this chunk by itself would (somewhat) fix the issue - you'd still have an initial delay as all the mork files were probed for MRUs, but this would cache those MRUs so subsequent runs would be quicker...
(Does `folderInfo->GetCharProperty(propertyName, propertyValue)` return NS_OK and "" for missing values? I honestly can't tell from the mork code...)
Assignee: nobody → b1
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

(In reply to Ben Campbell from comment #3)

Just to be clear - this chunk of code is saying the time properties should never be read from the DB.

Correct.

For all intents and purposes the folderCache is the database, as far as
the time properties are concerned.

Not sure what you're saying here. We've seen a case with 2700 folders, over a thousand had no time properties, so the databases were opened. So the folder cache wasn't the database. Maybe you mean that the folder cache becomes the database, if not in the cache, processing stops and the non-existent cache value is taken as null/empty/never.

This seems like an icky state of affairs. But, of course, the root cause is
parsing huge numbers of mork files is too slow :-(

And unnecessary. Accessing those folders will store time stamps in the cache, their absence implies that the folder was never accessed. Or as the previous comment said: https://hg.mozilla.org/comm-central/rev/aba71e533211#l1.18
"it's not present at all since a purge check for this folder has yet to occur since cache was initialized, until now."
The case of the user removing folderCache.json or moving the profile and invalidating the cache (remember absolute native file paths as keys) can't be penalized by accessing thousands of files and the related performance hit. In this context someone should investigate what happens on profile import with folderCache.json.

Is this chunk needed? It'll never be executed for time properties...

Correct. The chunk is not executed for time properties, but it may be executed for other properties, which may not be mass queried. However, that chunk does no damage and avoids repeated queries. It also passed review previously ;-)

Alternatively, this chunk by itself would (somewhat) fix the issue - you'd
still have an initial delay as all the mork files were probed for MRUs, but
this would cache those MRUs so subsequent runs would be quicker...

Correct, after a considerable initial delay all the retrieved values would then be stored. However, the initial delay is also not a nice experience after an update (or an import).

(Does folderInfo->GetCharProperty(propertyName, propertyValue) return
NS_OK and "" for missing values? I honestly can't tell from the mork code...)

You mean in the old scheme with panacea.dat. We don't know the answer, but this is more like an academic question since the effects we're seeing in the new implementation are very clear. You'd have to create a setup with many folders, remove the cache (panacea/folderCache), then open the profile with 91 or earlier and check the delay. But 91 would delete all MSFs after the 512th it opens, so you wouldn't notice a huge delay of opening thousands of MSFs anyway.

BTW, we've submitted the patch here since we were involved in the predecessor bugs. We can only highly recommend considering further fixes in bug 1787609 which eliminate another MSF-killing code path. We noticed the issue by trying to access a news message via ID which wasn't in any database, again leading to a massive MSF exodus. One good thing about the panacea to JSON migration is that we were made aware of a few places where the system killed MSFs.

(In reply to b1 from comment #4)

(In reply to Ben Campbell from comment #3)

For all intents and purposes the folderCache is the database, as far as
the time properties are concerned.

Not sure what you're saying here. We've seen a case with 2700 folders, over a thousand had no time properties, so the databases were opened. So the folder cache wasn't the database.
Maybe you mean that the folder cache becomes the database, if not in the cache, processing stops and the non-existent cache value is taken as null/empty/never.

Yes - that's what I meant - with this change installed, the folder cache is effectively the database as far as the time properties are concerned. i.e the MRU will never ever be read from the mork DB. (it might be written, but never read).
From outside, we're still saying "the MRU is stored in the database"... but really, that's a lie now.
Not the end of the world, but just another little corner case to keep in mind.

Is this chunk needed? It'll never be executed for time properties...

Correct. The chunk is not executed for time properties, but it may be executed for other properties, which may not be mass queried. However, that chunk does no damage and avoids repeated queries. It also passed review previously ;-)

Alternatively, this chunk by itself would (somewhat) fix the issue - you'd
still have an initial delay as all the mork files were probed for MRUs, but
this would cache those MRUs so subsequent runs would be quicker...

Correct, after a considerable initial delay all the retrieved values would then be stored. However, the initial delay is also not a nice experience after an update (or an import).

(Does folderInfo->GetCharProperty(propertyName, propertyValue) return
NS_OK and "" for missing values? I honestly can't tell from the mork code...)

You mean in the old scheme with panacea.dat.

No, no - was just curious in terms of landing this second change without the first one (the one which checks for time properties).
If GetCharProperty fails for unset values then it wouldn't work - the empty value wouldn't be sent to the cache, and so the next access would also hit the mork db again...

Comment on attachment 9292867 [details] [diff] [review]
1788901-handle-missing-folder-props.patch

Review of attachment 9292867 [details] [diff] [review]:
-----------------------------------------------------------------

r+
I don't have a better suggestion.
Really, I'd just like the databases to be fast enough that we can ditch the foldercache entirely ;-)
Attachment #9292867 - Flags: review?(benc) → review+

Comment on attachment 9292867 [details] [diff] [review]
1788901-handle-missing-folder-props.patch

[Approval Request Comment]
Regression caused by (bug #): bug 1778888
User impact if declined: Hung system after upgrade or import of profile with many folders
Testing completed (on c-c, etc.): Code was already deployed via bug 1726319 and bug 1776280.
Risk to taking this patch (and alternatives if risky): Code was already deployed via bug 1726319 and bug 1776280.

Attachment #9292867 - Flags: approval-comm-esr102?
Attachment #9292867 - Flags: approval-comm-beta?
Target Milestone: --- → 106 Branch

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/6d684996c97b
handle missing folder properties better. r=benc

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED

Comment on attachment 9292867 [details] [diff] [review]
1788901-handle-missing-folder-props.patch

[Triage Comment]
Approved for beta

Attachment #9292867 - Flags: approval-comm-beta? → approval-comm-beta+
See Also: → 1787609

Comment on attachment 9292867 [details] [diff] [review]
1788901-handle-missing-folder-props.patch

[Triage Comment]
Approved for esr102 - 102.2.2 - even though it's not yet on beta, given our performance issues and the fact that this code has previously been in the release .

Attachment #9292867 - Flags: approval-comm-esr102? → approval-comm-esr102+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: