Open Bug 1701737 Opened 4 years ago Updated 9 months ago

Possible slowdown for Places queries

Categories

(Toolkit :: Places, defect, P3)

defect

Tracking

()

People

(Reporter: andreio, Unassigned)

Details

We've noticed an increase in the error rates for one of our targeting attributes used in messaging system. This calls NewTabUtils.getTopFrecentSites and starting 86 it went up significatly.
I don't have more context other than the errors are attribute_timeout meaning it took longer than 30s to complete the call and it was aborted. We don't capture the error message.
Error rate seems to have gone up across all platforms and it's only for some attributes including this one.
There's no urgency the messages relying on this call have been scheduled for removal but this might point to an existing issue.

Let me know if I could provide more info. Mak pointed out to bug 1678619. I couldn't reproduce this locally.
Some more context that might be useful: we cache the getTopFrecentSites call, this happens here. I think that one possibility for the high number of failures is that (for some reason) the call never resolves so our "cached" value is an unresolved promise that will always time out and send a telemetry report for failure. Every time the user navigates/loads a website that triggers our user messaging we would send yet another telemetry event (in other words: the query can fail once and we cache that failure for 6hours).

Flags: needinfo?(standard8)
Flags: needinfo?(mak)

(In reply to Andrei Oprea [:andreio] from comment #1)

Mak pointed out to bug 1678619. I couldn't reproduce this locally.

I was thinking that after that bug we may send more frecency changed notifications, that would end up calling onManyLinksChanged() more often. Could that cause an increase in the number of times we execute getTopFrecentSites? There is a lot of abstraction in this code, so I can't understand if there may be a direct relation among those.
I'd still be unsure why getTopFrecentSites would time out, unless the database is corrupt, and if that's the reason maybe we just invoke it too often? I'd expect the frecency changed notification to just clear some cache that will be rebuilt once the data is necessary, rather than causing immediate requeries, though.
Can you help me understand if somehow an increase in the calls to onManyLinksChanged can cause an increase in the calls to getTopFrecentSites?

Flags: needinfo?(mak) → needinfo?(andrei.br92)
Flags: needinfo?(standard8)

Andrei clarified that there should be no connection there between the notifications and the calls to getFrecentSites.

We should maybe more telemetry for when we fail to create a connection to the Places Database.

Flags: needinfo?(andrei.br92)
Severity: -- → S3
Priority: -- → P2
Priority: P2 → P3
You need to log in before you can comment on or make changes to this bug.