Closed Bug 739763 Opened 12 years ago Closed 12 years ago

First time text entering in urlbar after startup causes Firefox to hang 10s and more

Categories

(Toolkit :: Places, defect)

13 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla14

People

(Reporter: TheOne, Assigned: mak)

Details

Attachments

(1 file)

When trying to enter a url in the urlbar for the first time after Firefox start, the whole UI locks for 10sec or more. This happens as soon as I do the first key stroke in the urlbar.

Here are some stats about my Mac instance running Firefox 13:
https://gist.github.com/2219343

about:telemetry identified the problematic query:
SELECT h.id, h.favicon_id, h.guid, (SELECT h.url FROM moz_bookmarks b WHERE b.fk = h.id UNION ALL SELECT (SELECT url FROM moz_places WHERE id = COALESCE(greatgrandparent.place_id, grandparent.place_id, parent.place_id)) FROM moz_historyvisits self JOIN moz_bookmarks b ON b.fk = COALESCE(greatgrandparent.place_id, grandparent.place_id, parent.place_id) LEFT JOIN moz_historyvisits parent ON parent.id = self.from_visit LEFT JOIN moz_historyvisits grandparent ON parent.from_visit = grandparent.id AND parent.visit_type IN (5, 6) LEFT JOIN moz_historyvisits greatgrandparent ON grandparent.from_visit = greatgrandparent.id AND grandparent.visit_type IN (5, 6) WHERE self.visit_type IN (5, 6) AND self.place_id = h.id LIMIT 1 ) FROM moz_places h WHERE h.url = :page_url

Fully reproducible on Win and Mac with Firefox 12 and 13.
mak identified that this only happens if a custom homepage is set. Mine is http://www.google.com/ncr which redirects to http://www.google.com.

If I use the default homepage, the problem does not occur.

Both of these two urls have about 4,6k visits in my places.sqlite.
we need to reproduce the data distribution that is creating the issue in the first place, that may be tricky. My assumption is that a specific data distribution makes the query extremely slow.
Assignee: nobody → mak77
tried to unwrap the query, seems to bring a net improvement according to Andreas, though have to triple-check I'm not missing some pieces.

SELECT h.id, h.url, h.favicon_id, h.guid,
(SELECT url FROM moz_places WHERE id =
  (  
     SELECT fk FROM moz_bookmarks b WHERE b.fk = h.id
     UNION ALL
     SELECT greatgrandparent.place_id
     FROM moz_historyvisits greatgrandparent
     JOIN moz_historyvisits grandparent ON grandparent.from_visit = greatgrandparent.id
     JOIN moz_historyvisits parent ON parent.from_visit = grandparent.id   AND grandparent.visit_type IN (5, 6)
     JOIN moz_historyvisits dest ON dest.from_visit = parent.id  AND parent.visit_type IN (5, 6)
     WHERE EXISTS(SELECT id FROM moz_bookmarks WHERE fk = dest.place_id)
     AND dest.place_id = h.id AND dest.visit_type IN (5,6)
     UNION ALL
     SELECT grandparent.place_id
     FROM moz_historyvisits grandparent
     JOIN moz_historyvisits parent ON parent.from_visit = grandparent.id
     JOIN moz_historyvisits dest ON dest.from_visit = parent.id  AND parent.visit_type IN (5, 6)
     WHERE EXISTS(SELECT id FROM moz_bookmarks WHERE fk = dest.place_id)
     AND dest.place_id = h.id AND dest.visit_type IN (5,6)
     UNION ALL
     SELECT parent.place_id
     FROM moz_historyvisits parent
     JOIN moz_historyvisits dest ON dest.from_visit = parent.id
     WHERE EXISTS(SELECT id FROM moz_bookmarks WHERE fk = dest.place_id)
     AND dest.place_id = h.id AND dest.visit_type IN (5,6)
     LIMIT 1
)) FROM moz_places h WHERE h.url = :page_url
I think the first line of that query would be:
SELECT h.id, h.favicon_id, h.guid,

(without the "h.url")

right?
yes, I probably modified it locally to read the url.
the fk conditions should be on the parents too.
Thankfully Andreas was able to hand me an anonymized database, that I can use to debug. The issue is due to poor choice of the join of bookmarks with a temp table of visits.  The query plan shows a SCAN TABLE moz_bookmarks AS b (~1000000 rows) that is painful.
Attached patch patch v1.0Splinter Review
I'm willing to backport this at least to Aurora, since while may not hit all the users, may hit in a quite bad way (12 seconds hang doesn't sound good)
Unfortunately this falls into that category for which we need to evaluate a new SQL harness, or have slow SQL telemetry reported in a usable way.
Attachment #610518 - Flags: review?(dietrich)
Note that I'm reducing the redirection levels we support, I don't think we should pay the price for 3 levels until we have a performant way to do it.
Comment on attachment 610518 [details] [diff] [review]
patch v1.0

Review of attachment 610518 [details] [diff] [review]:
-----------------------------------------------------------------

this looks ok, r=me.

is there a way we can produce diagnostic information about punishingly broad queries like this? possibly assert for scenarios where the query can't possibly perform well?
Attachment #610518 - Flags: review?(dietrich) → review+
which kind of assert? asserting in debug mode is unlikely to catch slow SQL like this, since it's really bound to the distribution of the data, our tests don't have that many distributions to catch it.
Asserting to the user, may be painful, and hard to distinguish a query taking 2 seconds cause it has to (like a complicated migration from previous version, or maintenance) from a query running slow by chance.
Slow SQL telemetry should catch these, but we need some sort of a dashboard showing any query taking more than 1 second (for example). Afaik so far we just have an ftp folder with a bunch of unreadable text (list of queries and times in multi megabytes files).
https://hg.mozilla.org/mozilla-central/rev/5dfad57e46ac
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment on attachment 610518 [details] [diff] [review]
patch v1.0

[Approval Request Comment]
Regression caused by (bug #): exists from at least Firefox 9
User impact if declined: Users having a redirecting page as homepage hit multi-seconds (this case was 12 seconds) hangs when using the locationbar after startup
Testing completed (on m-c, etc.): m-c
Risk to taking this patch (and alternatives if risky): worst risk is some redirecting favicon may be wrong, though changes have been limited as far as possible, to avoid mistakes
String changes made by this patch: none
Attachment #610518 - Flags: approval-mozilla-aurora?
(In reply to Marco Bonardo [:mak] from comment #13)
> User impact if declined: Users having a redirecting page as homepage hit
> multi-seconds (this case was 12 seconds) hangs when using the locationbar
> after startup
> Risk to taking this patch (and alternatives if risky): worst risk is some
> redirecting favicon may be wrong, though changes have been limited as far as
> possible, to avoid mistakes

Do we know of any other user reports of this issue?
it's hard to tell, this may potentially happen everytime a user visits a page and something else (like Sync or expiration) tries to do a database write, the effect would be an hang whose causes would be hard to report apart "Firefox occasionally hangs while I'm browsing".
In the reported case it happened cause the homepage is a visit, and the locationbar init needs to do a single write-like operation (that's the only time the locationbar has to do that).
Comment on attachment 610518 [details] [diff] [review]
patch v1.0

[Triage Comment]
Let's let this ride the train since the regression has been around for a while, this isn't causing significant user pain, and the patch itself carries non-zero risk.
Attachment #610518 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
(In reply to Alex Keybl [:akeybl] from comment #17)
> Comment on attachment 610518 [details] [diff] [review]
> patch v1.0
> 
> [Triage Comment]
> Let's let this ride the train since the regression has been around for a
> while, this isn't causing significant user pain, and the patch itself
> carries non-zero risk.

Well, it hurt me so much (and still does) that I was willing to pay money to someone for looking into this. I'm using Aurora on all of my computers and everytime I think twice before I restart because the blocking UI is a huge issue for me.

I am an AMO editor, we have to restart the browser dozens of times a day when testing add-ons. This really slows me down. It wastes my already limited time unnecessarily.

I can understand that this is not critical enough for landing on beta, but for aurora I didn't expect any landing issues, especially because the patch contains very minimal risk.
That the regression is around for some time now shouldn't not have an impact on this, in either way.
(In reply to Andreas Wagner [:TheOne] from comment #18)
> Well, it hurt me so much (and still does) that I was willing to pay money to
> someone for looking into this. I'm using Aurora on all of my computers and
> everytime I think twice before I restart because the blocking UI is a huge
> issue for me.

If there's ever any confusion as to whether or not a bug should be prioritized for a specific release, please nominate for tracking while the investigation is still occurring.

> I am an AMO editor, we have to restart the browser dozens of times a day
> when testing add-ons. This really slows me down. It wastes my already
> limited time unnecessarily.

This new context does make me sympathetic, but doesn't explain why you can't simply work around the issue in the meantime when using a version of Firefox >14.

> I can understand that this is not critical enough for landing on beta, but
> for aurora I didn't expect any landing issues, especially because the patch
> contains very minimal risk.
> That the regression is around for some time now shouldn't not have an impact
> on this, in either way.

This same case can be made for any bug fix on mozilla-central. The fact that this fix had non-zero risk of regression, and only one user reporting the issue, makes getting this into an earlier train a low priority.
blocking-kilimanjaro: --- → +
blocking-kilimanjaro: + → ---
(In reply to Alex Keybl [:akeybl] from comment #19)
> If there's ever any confusion as to whether or not a bug should be
> prioritized for a specific release, please nominate for tracking while the
> investigation is still occurring.

Ah, didn't know that. Will do next time, thanks!

> > I am an AMO editor, we have to restart the browser dozens of times a day
> > when testing add-ons. This really slows me down. It wastes my already
> > limited time unnecessarily.
> 
> This new context does make me sympathetic, but doesn't explain why you can't
> simply work around the issue in the meantime when using a version of Firefox
> 14.

I'm not using 14 because using nightly for doing stability and performance tests (which is part of the add-on review) is a risky. It could taint the results. For my main Firefox profile I could use 14 theoretically as I'm restarting that a couple of times a day too, but again using nightly is too risky for that.

> 
> > I can understand that this is not critical enough for landing on beta, but
> > for aurora I didn't expect any landing issues, especially because the patch
> > contains very minimal risk.
> > That the regression is around for some time now shouldn't not have an impact
> > on this, in either way.
> 
> This same case can be made for any bug fix on mozilla-central. The fact that
> this fix had non-zero risk of regression, and only one user reporting the
> issue, makes getting this into an earlier train a low priority.

Sorry, I'm not as familiar as you are with the landing criteria. I understand the reasons and I respect them, just keep the following in mind:
Only one user reporting it does not mean necessarily that only one user is experiencing it. The majority of users don't even know what bmo is, we can not expect them to file bug reports. 
Perf problems take a lot more time and effort to be recognized as such, so don't expect a lot of people reporting them. Marco said this has been around since 9 or so and yet it took a lot of time until I was annoyed enough to start digging deeper, besides that it's often really hart to find a starting point for doing so. (I even have the "advantage" that I'm involved with Mozilla for more than three years know, and a good part of that time was just about working with places.sqlite). In contrary to a broken UI element, they don't prevent you from using the browser, they just slow you down.

Just my two cents...thw good news is that it's only two weeks till next merge :)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: