Closed Bug 1239708 Opened 9 years ago Closed 7 years ago

Improve the autofill decisions algorithms

Categories

(Toolkit :: Places, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox62 --- fixed

People

(Reporter: mak, Assigned: adw)

References

(Depends on 4 open bugs, Blocks 4 open bugs, Regressed 1 open bug)

Details

(Whiteboard: [fxsearch][unifiedcomplete])

Attachments

(7 files)

In bug 769348 we changed the algorithm to figure out the best schema for an host, we decided to go with "use a prefix if all typed pages for that host use that prefix". That was after a long discussion in bug 769994 where users needed a way to override our too aggressive algorithm that was forcing a prefix even if just one entry had that prefix. Let's make an example, supposing the host "mozilla.org". Before bug 769348 it was enough that just one typed page with host "mozilla.org" had "https" schema, for us to suggest an https autofill entry. That was nice from a security point of view, but opened a can of worms (bug 769994), mostly due to https pages returning errors or broken certificates and other situations we cannot detect in Places. So it was decided in bug 769348 to suggest a given prefix for "mozilla.org" only if all typed pages have that prefix. The second solution offers a way out to users that couldn't always use https on an host. On the other side makes extremely easy to lost secure prefix just by typing once the non secure host. Typing in reality includes various user actions: - Opening a tab from "another device" (Sync) - middle-mouse paste - typing an url - loading one (or multiple) pages from any history view On a second thought, I think both decisions were pretty extreme, what matters is that we should somehow be able to adapt to the user/page changes over time. I think we have 2 problems here. The first problem is that we are abusing the typed flag and that pollutes data. It's not even anymore a real typed flag, since clicking on any history view sets it, plus we don't set it for loads from the bookmarks views that are likely even more important? Due to this polluting, since autofill is based on typed being set, we autofill a page loaded once from history, but not a bookmarked page :( This is sub-optimal. We could try to restore the typed flag to its original intent, remove browser.urlbar.autoFill.typed (it's the default) and change the algorithm to decide what to autofill. We decided to autofill only typed domains in bug 720258, since otherwise we were autofilling pages that the user visited just once... Using typed looked like a simple solution, but it's again sub-optimal. A better decision may be based on frecency of the host, we could autofill only if it's over a given threshold (TBD). This would likely also bring some perf wins, since keeping typed in sync has a cost. The second problem is that by stating "use a schema only if all typed pages have it" does not adapt with time. it's enough to have one single wrong visit to break the profile forever. Again, I think a better solution would be to base the decision on frecency, like we could say "use a schema only if all of the 3 most frecent pages have it". Both algorithms can be tweaked by changing the frecency threshold and number of frecent pages. The work to implement these can be splitted into 3 to 4 bugs, it's not extremely complex but involves changes to the awesomebar behavior, imo they will end up being improvements, since we will be able to better adapt to each single user, but it's still touching the most common point of interaction and autofill behavior. I'd like to get some thoughts on these suggested changes. Plus, I'd like to know from managers whether it's worth I spend some time on trying to implement them, these affect awesomebar quality.
Flags: needinfo?(past)
Flags: needinfo?(paolo.mozmail)
Flags: needinfo?(dolske)
Flags: needinfo?(adw)
Priority: -- → P2
I don't have a strong opinion here. Sounds like a good improvement?
Flags: needinfo?(dolske)
I agree that we should remove the variable of whether a URI was typed using the keyboard or not in our frecency calculations. This may even simplify our code? We may obviously still want to exclude from the calculations (or even from the recording of the visit) some loads without direct interaction, like pinned tabs and tabs restored by session restore, or maybe consider them differently. Selecting the schema based on frecency is also better. We should ensure that the adaptive algorithm still works for schema changes, in other words if I'm suggested HTTP but I manually select an HTTPS base domain, results should adapt for next time. I think this may occur naturally anyways without significant code changes.
Flags: needinfo?(paolo.mozmail)
(In reply to :Paolo Amadini from comment #2) > We may obviously still want to exclude from the calculations (or even from > the recording of the visit) some loads without direct interaction, like > pinned tabs and tabs restored by session restore, or maybe consider them > differently. Currently we don't ever register visits to pages restored with a session. That is a bug in itself (bug 613126) since supposing you use that page everyday but it's always restored, it will have a very low frecency. It is sort of off-topic here, but the problem exists. > Selecting the schema based on frecency is also better. We should ensure that > the adaptive algorithm still works for schema changes, in other words if I'm > suggested HTTP but I manually select an HTTPS base domain, results should > adapt for next time. I think this may occur naturally anyways without > significant code changes. By using frecency it would happen naturally, but may take some days for the new prefix to replace the old one. It would still be an improvement over the current situation where it may take one year and typing even just once the wrong prefix restarts the "timer". We could then build improvements on top of a better situation.
I agree that this seems like a good improvement. I'm less certain about the relative priority against other awesomebar work, but I think this should be part of the Places bug list that we talked about this week.
Flags: needinfo?(past)
one of the reasons I filed this, is that due to the regression bug 1234186, we have basically polluted users autofill data for the next year.
Yes, that one is quite nasty. This one seems certainly important too, I just don't know off the top of my head what else we've got.
Whiteboard: [fxsearch][unifiedcomplete]
That sounds good to me, Marco. I think using frecency is probably better than the two alternatives you mention, but I also think there's a danger that it will become self-reinforcing and cause you to get "stuck" with certain autofills over time, even if they aren't optimal, especially if you primarily visit pages by using autofill. I don't know what to do about that though.
Flags: needinfo?(adw)
Just to annotate a further possible improvement, we have adaptive history but we only use it to add a normal entry. We could, in addition to frecency, use adaptive history for the decision. For example, if the typed word happens to be at the beginning of the host, it's a very good sign the user may want to autofill THAT host, rather than another one.
I hope this happens already: Suggest (only) https scheme for a (sub)domain if there has been a >0 HSTS header/preloading. And: If I type in "nevervisitedbefore.example" for the first time and that domain is HSTS preloaded, please suggest "Visit https://nevervisitedbefore.example/".
We try to serve secure versions when possible, but we don't have available data about HSTS to query. On the other side, the usage of HSTS makes the request even less interesting.
Blocks: 1341350
Blocks: 764062
Blocks: 1352847
Blocks: 1406654
Assignee: nobody → adw
Status: NEW → ASSIGNED
Work in progress. I've just started poking around, so part of this is me developing an understanding of how autofill works. I realized as I'm typing this that this patch isn't doing quite what I thought. So far I'm setting up moz_hosts like this: "CREATE TABLE moz_hosts (" \ " id INTEGER PRIMARY KEY" \ ", rev_host TEXT NOT NULL UNIQUE" \ ", host TEXT NOT NULL" \ ", prefix TEXT" \ ", frecency INTEGER" \ ")" \ A moz_hosts entry per rev_host. But what I really want is an entry per origin (prefix + host (and probably port too, which would cover a related bug)), not rev_host. I was thinking rev_host included the prefix/scheme (all in reverse order of course), but I don't think that's true: https://dxr.mozilla.org/mozilla-central/rev/b056526be38e96b3e381b7e90cd8254ad1d96d9d/toolkit/components/places/Helpers.cpp#186 Anyway, the idea is to have an entry per origin, each with its own frecency.
Oh and one question I had is why do we complete only hosts, not full URLs (unless your search string includes a "/")? Why not just sort all matching moz_places, sort by frecency, and autofill that URL? (And make it the heuristic result too I guess, since it looks like autofill is tied to it.) I tested with Chrome and Safari, and they also complete hosts first, so at least we're consistent with them.
Comment 0 has some details about what we did and what we could do. (In reply to Drew Willcoxon :adw from comment #16) > A moz_hosts entry per rev_host. But what I really want is an entry per > origin (prefix + host (and probably port too, which would cover a related > bug)), not rev_host. Why introducing another moz_hosts table? what are you trying to achieve that the current one doesn't provide? (In reply to Drew Willcoxon :adw from comment #17) > Oh and one question I had is why do we complete only hosts, not full URLs > (unless your search string includes a "/")? This was an original idea of Limi, back in the past, because very often users want to visit the main page of a site, even if they then move to a sub page. In most cases other browsers act similarly nowadays, so looks like the idea was inherited around. Based on that idea, we built a moz_hosts table that is made exactly for filling hosts. The main problem we hit was that we must use an index, but often the user doesn't type the scheme, so we need a table that indexes hosts without schemes. That way, when you type "mo" we can sort all the hosts and pick the ones starting with "mo", it's fast, and that was critical because autofill happens while typing. > Why not just sort all matching > moz_places, sort by frecency, and autofill that URL? Because we cannot match on moz_places, there's no good index to match. The mistakes we have in the current system imo are 2 main ones: 1. We only autofill hosts, it would be great in some cases to provide more, but we don't have a good way to match and matching the whole moz_places is too expensive. Adaptive history (moz_inputhistory table) sounds like a very good candidate though, since it's pages explicitly picked by the user when he types certain letters. if those letters exactly match the beginning of the page host, we could fill that adaptive url. It may require to generate a url_fragment field in moz_inputhistory that we can index, that could contain a partial fixed url 2. We use typed hosts. This is both under-performing and ugly, because we were never good at defining what "typed" means. We have frecency of each host, we should fill the top frecency host if its frecency is over a certain value. The problem is defining that value, because frecency is different PER USER. We could maybe define something like frecency > MAX(frecency) / 2. Additionally, a perf problem we have, is that rev_host includes "WWW" bu we don't care to differenciate anymore. That has perf implications when updating moz_hosts (bug 843357). Should we go back at filling urls? Probably yes, but it's a complex problem to solve in a very efficient way, and that's why here I suggested starting with frecency hosts picking and using moz_inputhistory. both are things that adapt to the user habits and thus look very promising for this kind of stuff.
An example about input history. Say that you type "mo" and instead of visiting "mozilla.org" you instead pick "mozilla.org/contribute" from the list. moz_inputhistory will remember that. We could store in moz_inputhistory the fact "mo" is a good autofill candidate for the url, and the next time you type "mo" we autofill "mozilla.org/contribute". Again moz_inputhistory has a score, we should thus pick a threshold.
a last note, it may also be possible to provide a url fragment (the most frecent url) per each host, adding a field to moz_hosts. it would pretty much be a suffix field (currently we have a prefix field for the scheme). This would allow to still quickly match on the host, but then provide prefix + host + suffix as a result, that is the whole url. Whether this is better or worse than the previous suggestions, is TBD. It may be a bit more expensive to keep moz_hosts in sync. (In reply to Drew Willcoxon :adw from comment #17) > Oh and one question I had is why do we complete only hosts, not full URLs > (unless your search string includes a "/")? One thing I forgot, we autofill urls if the string ends with "/" for 2 reasons: 1. is slow (no index) 2. it allows "m" + RIGHT + "c" + ENTER to go to mozilla.org/contribute.
(In reply to Marco Bonardo [::mak] from comment #18) > Comment 0 has some details about what we did and what we could do. > Why introducing another moz_hosts table? what are you trying to achieve that > the current one doesn't provide? The second problem you described in comment 0 is about schemes, choosing the best scheme for a host. That's the problem I was trying to address. If moz_hosts were keyed on scheme + host instead of simply host (but also still had a separate host column that was indexed for searching), then we could just find all the rows whose hosts match what the user is typing and then pick the most frecent. That would give us the most frecent scheme + host. (And I wouldn't be adding another moz_hosts table. I'd be migrating the current one.) I'll ruminate on all these comments, thanks for them.
One thing that comes to mind immediately from your comments is that I'm not sure we should correlate autofill with the user's typing at all. If you visit mozilla.org often, shouldn't we just autofill it when you start typing it, regardless of *how* you have visited it in the past? For example, say it's bookmarked in your toolbar and you often click it, but you have never before typed "mozilla" in the urlbar. When you do first start typing "mozilla", we should autofill it. (Or more precisely, consider it a strong candidate along with all the other candidates for autofill.) If that's the case, then would we want to base autofill on moz_inputhistory? It seems not, right?
(In reply to Drew Willcoxon :adw from comment #21) > If moz_hosts were keyed on scheme + host instead of simply host (but also still > had a separate host column that was indexed for searching), then we could > just find all the rows whose hosts match what the user is typing and then > pick the most frecent. well, it is sort of done like that, since we have a prefix column containing the scheme (+eventual www) that we use. The current algo for the scheme has already been changed from "all the pages should have that scheme" to "most of the pages have that scheme", we could go further and just use scheme of the most frecent page, maybe. That happened in the dependency bug 1341350. (In reply to Drew Willcoxon :adw from comment #22) > One thing that comes to mind immediately from your comments is that I'm not > sure we should correlate autofill with the user's typing at all. If you > visit mozilla.org often, shouldn't we just autofill it when you start typing > it, regardless of *how* you have visited it in the past? Yes, that's why we should move off from using typed (and I was suggesting a frecency threshold). On the other side, the idea behind using only typed pages, was to avoid annoing the user with useless autofills. for example if you visited only once "mycatismad.com", autofilling it on "m" could not be exactly useful. > For example, say > it's bookmarked in your toolbar and you often click it, but you have never > before typed "mozilla" in the urlbar. When you do first start typing > "mozilla", we should autofill it. Yep, frecency is how we track that kind of stuff ineed. > If that's the case, then would we want to base autofill on moz_inputhistory? > It seems not, right? moz_inputhistory has a different advantage. While typing a url doesn't mean much, typing something and explicitly picking a url from a list, is a strong sign that when typing that word you likely want to visit that url. The other advantage is that the table is smaller, typed fragments are (probably?) already indexed, and at the act of inserting into it we could already store if it's a good autofill candidate (if it's domain starts with the typed letters". Thus it would be cheap to query, and could likely often pick what the user wanted. Should inputhistory win over a very frecent host? I don't know off-hand :(
> On the other side, the idea behind using only typed pages, was to avoid annoing the user with useless autofills. for example if you visited only once "mycatismad.com", autofilling it on "m" could not be exactly useful. Unfortunately that's pretty much the current behaviour. I type a single 'j' in the location bar, Firefox autofills and provides as the first suggestion http://js1k.com/. I have never even visited that URL. I visited a single page on that site a total of twice, most recently in January 2015. The current algorithm *always* suggests some seemingly random domain's root URL first if it can, even if it's a completely irrelevant domain. Meanwhile, the second suggested URL for 'j' has been visited > 200 times. I've also visited http://jalopnik.com/ 43 times. If the top suggestion must be a root URL for some reason (and there are good reasons why it shouldn't), it should at least be the legitimately most popular one. So your idea of preferencing typed URLs looks 100% correct from here.
(In reply to Paul from comment #24) > > On the other side, the idea behind using only typed pages, was to avoid annoing the user with useless autofills. for example if you visited only once "mycatismad.com", autofilling it on "m" could not be exactly useful. > > Unfortunately that's pretty much the current behaviour. I type a single 'j' > in the location bar, Firefox autofills and provides as the first suggestion > http://js1k.com/. I have never even visited that URL. I visited a single > page on that site a total of twice, most recently in January 2015. Yes, that's the problem in using "typed" as we define it. You likely "typed" that page without being conscious about that, because we mark pages as typed also if you pick them from the history menu, for example. It's just wrong. A frecency threshold would adapt to your changed habits and would easily discard low traffic pages.
Priority: P2 → P1
Another WIP that continues from the last one. It kind of works, but not really because prefix handling is messed up.
Comment on attachment 8930304 [details] Bug 1239708: Improve awesomebar autofill. Part 0: Core changes. Marco, could you please give some feedback at this point? The code is very messy, and right now I'd just like for you to look at test_autofill_hosts.js to see what I'm going for. (Feel free to look at the code course.) The code and test do work, though. Try builds if you'd like: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e1a2bcea8f46 As you can see from the test, this doesn't drastically change how autofill/completion works, but it does make these changes: * Auto-completion is now in terms of origins (scheme + host + port), not only hosts. * The only requirement for an origin to be a candidate for completion is that it appears in moz_places. IOW if a URL is in moz_places, then its origin is a candidate for completion. No "typed" or any other criteria. * A single host can map to multiple candidates for completion, one per scheme. Each origin has its own frecency, which is the max of frecencies of all URLs in moz_places with the origin. So if a URL has http and https versions, you can make autofill use the https version if you visit it more. * Ports are now completed. (bug 764062) * If example.com is in moz_places but www.example.com is not, I don't think we should allow "www.ex" to complete to www.example.com (or example.com). The reverse isn't true though: If www.example.com is in moz_places but example.com is not, then allow "ex" to complete to www.example.com, like we do now.
Attachment #8930304 - Flags: feedback?(mak77)
(In reply to Drew Willcoxon :adw from comment #29) > * If example.com is in moz_places but www.example.com is not, I don't think > we should allow "www.ex" to complete to www.example.com (or example.com). > The reverse isn't true though: If www.example.com is in moz_places but > example.com is not, then allow "ex" to complete to www.example.com, like we > do now. Yes, this is unclear behavior. One side it's true the www. version may not exist, thus it's wrong to add it. The other side, we are used to listen everywhere things like "visit www.something.com", in most TV ads in Italy for example they always explicitly read the www. Though, for that to be a problem the user should have first visited the non www version (somehow, maybe through a link), and then try to type the www version. I don't know if that's really common, in any case he can just continue typing what he wants, we'll just not fill up. To sum up, I think it's fine to proceed with your idea and check the feedback. It should do. I'll look at the patch now and see the schema changes behind the decision. The thing I'm mostly interested in is how we match and what's the destiny of the moz_hosts table.
Some random thoughts: 1. You could probably define a compound index between origin and host, maybe even a primary key (and use WITHOUT ROWID). Though how do we handle downgrades of the change to moz_hosts? 2. the 2 indices on origin are likely to grow the db a bit (expiration will probably reduce history size). We already took a hit recently to store Activity Stream metadata. We may want to bump up the target db size to avoid a second hit. See https://telemetry.mozilla.org/new-pipeline/evo.html#!aggregates=median&cumulative=0&end_date=null&keys=&max_channel_version=nightly%252F59&measure=PLACES_MOST_RECENT_EXPIRED_VISIT_DAYS&min_channel_version=nightly%252F56&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=null&trim=1&use_submission_date=0 3. If I understand correctly, you match the string against moz_hosts, both host and origin, because the user may have typed the scheme or not. From there get the most frecent origin. The most obvious thing is that we are keeping multiple entries per host, but we'll always only need the most frecent one. This I/O loss must be compensated by a perf win, so I hope this is more efficient than the prefix/host/suffix 3 fields system (where prefix and suffix must be continuously kept in sync). The origin field in moz_places appears to be used just to match urls with origins, and keep the hosts table up-to-date. I'm mostly worried by the additional space taken by: "moz_places.origin x2" (column+index) + "moz_hosts.origin x2" (column + index) that is pretty much "origin x4". If we could avoid the moz_places field, we'd recover some space. A possible idea could be to store just the origin length, that along with the uri allows to rebuild the origin string. Then we could match on rev_host and on the origin length, the remaining results list may be small enough for a linear scan? I'll keep brainstorming.
Comment on attachment 8930304 [details] Bug 1239708: Improve awesomebar autofill. Part 0: Core changes. It's actually more than 4 times the origin, since the origin will be repeated multiple times (for each place). I think the downgrade problem and the origin field in moz_places are the 2 biggest issues I saw, this far. From a functional point of view, I still think we should have a frecency threshold to fill, to avoid filling a page the user only visited once a year ago. Maybe we could use MAX(frecency) * tbd_ratio
Attachment #8930304 - Flags: feedback?(mak77)
(In reply to Marco Bonardo [::mak] from comment #31) > Some random thoughts: > 1. You could probably define a compound index between origin and host, maybe > even a primary key (and use WITHOUT ROWID). Though how do we handle > downgrades of the change to moz_hosts? Are there any inherent benefits or downsides to using a compound key vs. a single-column key in Sqlite (ignoring ancillary concerns like info duplication with other columns)? Downgrades -- fair question, but I'm not too worried about it. If we go to keying on origins, then we're storing more info than we used to. A downgrade will necessarily have to throw away some of that info. We can pick the most frecent origin per host and save that in the downgrade. > 3. If I understand correctly, you match the string against moz_hosts, both > host and origin, because the user may have typed the scheme or not. From > there get the most frecent origin. Yes. > The most obvious thing is that we are > keeping multiple entries per host, but we'll always only need the most > frecent one. This I/O loss must be compensated by a perf win, so I hope this > is more efficient than the prefix/host/suffix 3 fields system (where prefix > and suffix must be continuously kept in sync). The origin field in > moz_places appears to be used just to match urls with origins, and keep the > hosts table up-to-date. I'm mostly worried by the additional space taken by: > "moz_places.origin x2" (column+index) + "moz_hosts.origin x2" (column + > index) that is pretty much "origin x4". If we could avoid the moz_places > field, we'd recover some space. > A possible idea could be to store just the origin length, that along with > the uri allows to rebuild the origin string. Then we could match on rev_host > and on the origin length, the remaining results list may be small enough for > a linear scan? I'll keep brainstorming. That's an interesting idea. Another idea I had was to have a moz_origins table with an integer primary key. Then we only store origins in that table. The other tables have integer foreign keys into this table. We could even break the origins table into two tables to save more space (?): one table for schemes/prefixes, one for hosts (plus ports?). Is that crazy? There would be lots of joining. (In reply to Marco Bonardo [::mak] from comment #32) > From a functional point of view, I still think we should have a frecency > threshold to fill, to avoid filling a page the user only visited once a year > ago. Maybe we could use MAX(frecency) * tbd_ratio Yeah, I think something like that makes sense. I'm glad you mentioned it again.
(In reply to Drew Willcoxon :adw from comment #33) > Are there any inherent benefits or downsides to using a compound key vs. a > single-column key in Sqlite (ignoring ancillary concerns like info > duplication with other columns)? no rowid (less useless data) and faster matching when querying both info in the same query. > Downgrades -- fair question, but I'm not too worried about it. If we go to > keying on origins, then we're storing more info than we used to. A > downgrade will necessarily have to throw away some of that info. We can > pick the most frecent origin per host and save that in the downgrade. Yes, but the old version doesn't know anything about your modified table, it will just try to use the old moz_hosts the old way. There's no downgrade path. > That's an interesting idea. Another idea I had was to have a moz_origins > table with an integer primary key. Then we only store origins in that > table. The other tables have integer foreign keys into this table. We > could even break the origins table into two tables to save more space (?): > one table for schemes/prefixes, one for hosts (plus ports?). Is that crazy? > There would be lots of joining. Yes that's another possibility, indeed if I should redesign hosts and places, I would keep an origins table apart with origin, host, rev_host and just join with moz_places. The important thing is limit repeated textual data in moz_places.
also, the compound index if it's a primary key, will avoid dupes.
(In reply to Marco Bonardo [::mak] from comment #34) > Yes, but the old version doesn't know anything about your modified table, it > will just try to use the old moz_hosts the old way. > There's no downgrade path. Ah I think I misunderstood. I was thinking we had downgrade code paths like we do for upgrades. But that's not true, is it? Regardless, I think it makes sense to make a new table for this (moz_origins), and that's what I was planning to do ultimately. That would avoid downgrade problems, right?
(In reply to Drew Willcoxon :adw from comment #36) > Ah I think I misunderstood. I was thinking we had downgrade code paths like > we do for upgrades. But that's not true, is it? Regardless, I think it > makes sense to make a new table for this (moz_origins), and that's what I > was planning to do ultimately. That would avoid downgrade problems, right? The remaining problem would be that on downgrade the old moz_hosts would be out of date, because we'd stop updating it (we now have a new table), and on a next upgrade moz_origins would be out of date because the old version updates moz_host (but we could handle that through a migration that inserts missing entries). I'm not sure how critical is the fact a downgrade may miss some host. After the removal of about:permissions, the only consumer seems to be UnifiedComplete (the moz_host in nsPermissionManager.cpp is another table with the same name, so it's also good we change its name), so likely not a big deal. If we ship this in 59 it's even better because it's an ESR.
Another WIP that uses three tables: moz_prefixes, moz_hosts, and moz_origins: #define CREATE_MOZ_PREFIXES NS_LITERAL_CSTRING( \ "CREATE TABLE moz_prefixes (" \ " id INTEGER PRIMARY KEY" \ ", prefix TEXT NOT NULL UNIQUE" \ ")" \ ) #define CREATE_MOZ_HOSTS NS_LITERAL_CSTRING( \ "CREATE TABLE moz_hosts (" \ " id INTEGER PRIMARY KEY" \ ", host TEXT NOT NULL UNIQUE" \ ")" \ ) #define CREATE_MOZ_ORIGINS NS_LITERAL_CSTRING( \ "CREATE TABLE moz_origins (" \ " id INTEGER PRIMARY KEY" \ ", prefix_id INTEGER REFERENCES moz_prefixes(id) ON DELETE CASCADE" \ ", host_id INTEGER REFERENCES moz_hosts(id) ON DELETE CASCADE" \ ", frecency INTEGER" \ ")" \ ) This means the hosts query in UnifiedComplete requires a couple of joins. I wonder what perf impact that has, and whether it's maybe a better idea to denormalize the table used for autofill even at the expense of a larger DB. (I would guess so.)
duplication in moz_origins may not be a problem. My main complain was the duplication in moz_places because there the same origin is repeated many (likely hundreds) times, while in origins there would be at most scheme/www/plain duplicates. So I guess denormalizing origins could be ok if you are looking for feedback.
Yes, thanks.
More WIP: * This keeps the tables from the previous WIP and adds a denormalized moz_autofill_origins table * Adds back support for restricting autofill to "typed" pages because I think some people might complain if we remove it totally. But I default browser.urlbar.autoFill.typed to false instead of true so that by default we don't restrict. * Fixes the bookmarked host (now origin) query * Expands the test a little * Starts to clean up * One other thing I'd like to do here is to move rev_host from moz_places to the hosts table. * More to do as discussed in the bug comments above
(In reply to Drew Willcoxon :adw from comment #43) > * Adds back support for restricting autofill to "typed" pages because I > think some people might complain if we remove it totally. But I default > browser.urlbar.autoFill.typed to false instead of true so that by default we > don't restrict. I honestly don't think we should bring on this complication, it's additional code and queries, and requires syncing the typed field that is expensive. Additionally as I previously said, our "typed" field is pretty much broken. The reasoning doesn't sound very valid, whatever behavior change we make someone may miss the old behavior, we are aiming at a behavior that can satisfy most of our users, knowing someone may not agree with it. I'm happy to discuss any other alternative to limit results. > * One other thing I'd like to do here is to move rev_host from moz_places to > the hosts table. Even if I agree with the idea, sounds like this would make downgrades ever more nightmare-ish.
I started looking at the URL query today. If we're going to add a new denormalized moz_autofill_origins table that has textual prefix and host columns, as my patch does, then it seems to me we'd want a table with a textual path column too. Or in other words, a single autofill table like this: #define CREATE_MOZ_AUTOFILL NS_LITERAL_CSTRING( \ "CREATE TABLE moz_autofill ( " \ "place_id INTEGER PRIMARY KEY REFERENCES moz_places(id) ON DELETE CASCADE, " \ "origin_id INTEGER NOT NULL REFERENCES moz_origins(id), " \ "path_id INTEGER NOT NULL REFERENCES moz_paths(id), " \ "prefix TEXT NOT NULL, " \ "host TEXT NOT NULL, " \ "path TEXT NOT NULL, " \ "frecency INTEGER NOT NULL " \ ") " \ "WITHOUT ROWID" \ ) `path` would be truncated to 32 characters or so. origin_id and path_id point to separate moz_origins and moz_paths tables. The reason for those separate tables is so that each moz_place can also point to an origin and path using a single int for each instead of storing whole text strings, and the reason each moz_place needs to keep track of origin and path at all is so that frecency in moz_autofill can be updated efficiently. What do you think?
(In reply to Drew Willcoxon :adw from comment #45) > `path` would be truncated to 32 characters or so. origin_id and path_id > point to separate moz_origins and moz_paths tables. The reason for those > separate tables is so that each moz_place can also point to an origin and > path using a single int for each instead of storing whole text strings, and > the reason each moz_place needs to keep track of origin and path at all is > so that frecency in moz_autofill can be updated efficiently. > > What do you think? It sounds like it would end up taking a lot of space, regardless of the 32 chars limit. It would probably be half the size of moz_places or something like that, depending on how often those paths are duped. Moreover if you plan to search on path (and I assume you do), that info will be completely duplicated by the index. Let's suppose 40k entries, 32 chars each, duped by the index. In the best case it's 20MB, but could even be 30 or 40 for some locales. It would cancel any advantage we had by removing the index on url, to keep the db at a sane size, we should probably half the stored history. If we decide to really go towards the direction of autofilling urls instead of hosts, we should redesign the schema around that, for example by storing indexed fragments (origin+partial_path) in moz_places instead of url, and keeping the other info (prefix, full path and rev_host) in a separate table. This would be a quite large change requiring for real a places2.sqlite, since it couldn't be downgraded in any way.
WIP: * Removed typed stuff * Simplified all the searchString-related properties in UnifiedComplete added for preloaded hosts, and simplify the preloaded hosts functions and make the preloaded-host autofill function more like how the main autofill code path works * More cleanup, mostly of UnifiedComplete.js (In reply to Marco Bonardo [::mak] (Away 16 Dec - 2 Jan) from comment #46) > If we decide to really go towards the direction of autofilling urls instead > of hosts, we should redesign the schema around that > ... > This would be a quite large change requiring for real a places2.sqlite, Yes, it's hard (for me) not to want to make drastic changes with a singular focus on autofill. I shouldn't scope creep in this bug. One thing I've noticed with Safari is that it autofills not only URLs but titles too. So you can type "new york" and have it completed to "new york times" instead of having to type the domain name, nytimes.com. And the first, auto-selected result (like their version of our heuristic result) will be nytimes.com. Maybe Chrome does that too, I don't know. Of course we also include nytimes.com if it's in your history, but it won't be the heuristic result unless you type the domain. Something to think about if/as we improve autofill and the heuristic result in the future. (nytimes.com is a good example because its title isn't in the domain.)
autofilling on title is an interesting idea we can evaluate.
More WIP work on cleaning up and simplifying UnifiedComplete.js for the origin and URL queries. I modified the URL query so that it can take into account prefixes now, like the origin query.
* More UnifiedComplete.js cleanup/refactoring * Added a test for the URL query, consolidated the core test functions for both the URL query and origin query * Fixed problem where clicking the "go" arrow button in the urlbar loaded the autofilled value instead of the final complete value (i.e., mozilla.org/ instead of https://mozilla.org/.) The main things left to do: * Incorporate a frecency threshold when choosing values to autofill. Don't autofill a value unless its frecency is >= some threshold, like we've talked about in this bug. * Handle migration * More cleanup in other files
This experiments with using the mean and standard deviation of all moz_places frecencies as the threshold for deciding whether to autofill a match. I spent some time today thinking about a good threshold to use. I think max * someRatio is too simplistic. In my real profile for example, there are only two moz_places that are >= max * 0.5. The max is orders of magnitude bigger than the vast majority of moz_places, so even max * 0.1 is bigger than all but three of my moz_places. This patch uses the mean plus one standard deviation (of all frecencies >= 0) as the threshold. 57 of my moz_places (out of 112063) are >= that. 32 of my origins (out of 7965) are >= that. I'm thinking we could have a standard deviation multiplier in a hidden pref so that people could tweak the threshold if they want.
(In reply to Drew Willcoxon :adw (Away 12/20–1/3) from comment #56) > This patch uses the mean plus one standard deviation (of all frecencies >= > 0) as the threshold. 57 of my moz_places (out of 112063) are >= that. 32 > of my origins (out of 7965) are >= that. How expensive is to calculate that and how often does that calculation happen? We likely can't recalculate the threshold for each search, even if we should still somehow adapt to large history removals... I wonder if at the beginning we should just define a fixed low value threshold for things that are unlikely to be useful. It will be imperfect but if that means landing sooner and iterate later, it could be fine. > I'm thinking we could have a standard deviation multiplier in a hidden pref > so that people could tweak the threshold if they want. I don't think this is something we should expose, if we can't make a good choice we should try harder. Most of the frecency prefs should stop being exposed, in the end users at a maximum care to give more weight to bookmarks and the current system is over complex already. This is also in line with the recent prefs removal discussion thread.
(In reply to Marco Bonardo [::mak] from comment #58) > How expensive is to calculate that and how often does that calculation > happen? I'm using a running, incremental algorithm. I added another statement to moz_places_afterupdate_frecency_trigger that calls a custom update_frecency_stats function that simply updates a few stats, in O(1) time. From those stats, the mean and standard deviation can be calculated at any time, also in O(1) time. The algorithm is described here: https://en.wikipedia.org/wiki/Standard_deviation#Rapid_calculation_methods , https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Computing_shifted_data Then, when UnifiedComplete processes rows of an autofill query, it just discards rows whose frecencies are less than the threshold, which is based on the mean and standard deviation. > I wonder if at the beginning we should just define a fixed low value > threshold for things that are unlikely to be useful. It will be imperfect > but if that means landing sooner and iterate later, it could be fine. I'm not sure what you have in mind, but using a value that's not related to the shape of the individual user's frecency distribution will give poor results. I think what I'm proposing here is a good first version to land. > I don't think this is something we should expose, if we can't make a good > choice we should try harder. OK, that's fine. Using the mean plus one standard deviation is a good threshold I think. I just wouldn't mind providing an unsupported knob for people to tweak, instead of hardcoding a 1.0 multiplier.
(In reply to Drew Willcoxon :adw from comment #59) > I'm not sure what you have in mind, but using a value that's not related to > the shape of the individual user's frecency distribution will give poor > results. Yes, that's what I said originally too. I'm only trying to understand if there is a blocking problem delaying the changes, that we could breakdown into pieces. But looks like you have a good first solution to that problem, so we are in a good shape.
This has turned into a fairly big patch. I think it might be a good idea to split it up.
More WIP. * Added a test for frecency stats calculations. * More cleanup. * Started working on migration to new schema version and tests. Still to do: * Finish migration path and related tests. * More cleanup. * Split patch into more reviewable chunks.
More WIP. Now I'm just cleaning up and fixing existing tests. This is at a point where I can push to try, see what fails, and start fixing. Probably a lot now that "typed" isn't needed anymore to autofill. I'm in the middle of fixing some autocomplete xpcshell tests locally. https://treeherder.mozilla.org/#/jobs?repo=try&revision=7ef471114859
As we've discussed offline, an appropriate choice for the frecency threshold will depend on the shape of the distribution of places frecencies, so it would be good to take a look at some sample profile data. However, I expect this distribution to be generally long-tailed for most profiles, as evidenced by Drew's observation that very few items fall within some fraction of the max value. In the case of such long-tailed distributions, the mean+sd cutoff is a reasonable approach, but it does not necessarily have the typical interpretation of covering a certain percentage of the distribution. Another consideration is that I'd expect to see a majority of items with very low frecency values (eg. sites that have only been visited once or twice) which are likely to get excluded by a fixed frecency value cutoff. As per the above discussion, this may or may not be what we want. Another approach I'd suggest is using a percentile/rank-based cutoff: restrict autocompletion candidates to items among the largest 80% of frecency values, say. The advantage here is that the interpretation as to how much of the places database it "covers" is easy to work with and doesn't depend on the shape of the distribution, which may vary from user to user. Also, if the only matching candidates are low-frecency items, they are not necessarily excluded.
Thanks Dave! Re: low-frecency items: (In reply to Dave Zeber [:dzeber] from comment #66) > Another consideration is that I'd expect to see a majority of items with > very low frecency values (eg. sites that have only been visited once or > twice) which are likely to get excluded by a fixed frecency value cutoff. > As per the above discussion, this may or may not be what we want. > > Another approach I'd suggest is using a percentile/rank-based cutoff: > ... > Also, if the only matching candidates are low-frecency items, they are not > necessarily excluded. We do want to exclude low-frecency places, as a design decision, even when most or all of your places are low frecency. Marco comments on that in the last paragraph in comment 32, and I think I agree. So I do think that mean + standard deviation is a better fit for what we want. (Keeping in mind we're talking about autofilling here, not whether or not a given place will show up in the results.)
More work on getting all unifiedcomplete xpcshell tests passing.
Blocks: 843357
All unifiedcomplete xpcshell tests pass locally with this. Working on other tests still.
Comment on attachment 8930304 [details] Bug 1239708: Improve awesomebar autofill. Part 0: Core changes. https://reviewboard.mozilla.org/r/201438/#review220360 Static analysis found 11 defects in this patch. - 4 defects found by clang-tidy - 7 defects found by mozlint You can run this analysis locally with: - `./mach static-analysis check path/to/file.cpp` (C/C++) - `./mach lint check path/to/file` (Python/Javascript/wpt) If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx ::: toolkit/components/places/UnifiedComplete.js:1595 (Diff revision 15) > + } > + > + let searchStr = this._searchString; > + if (firstSlashIndex >= 0) { > + if (firstSlashIndex != searchStr.length - 1) { > + return; Error: Async method '_matchSearchEngineUrl' expected a return value. [eslint: consistent-return] ::: toolkit/components/places/UnifiedComplete.js:2409 (Diff revision 15) > - } else { > + } else { > - query.push(typed ? SQL_TYPED_HOST_QUERY > - : SQL_HOST_QUERY); > + query.push(SQL_ORIGIN_PREFIX_QUERY); > + } > + opts.prefix = this._strippedPrefix; > + } else { > + if (bookmarked) { Error: Unexpected if as the only statement in an else block. [eslint: no-lonely-if] ::: toolkit/components/places/UnifiedComplete.js:2478 (Diff revision 15) > - } else { > + } else { > - query.push(typed ? SQL_TYPED_URL_QUERY > - : SQL_URL_QUERY); > + query.push(SQL_URL_PREFIX_QUERY); > + } > + opts.prefix = this._strippedPrefix; > + } else { > + if (bookmarked) { Error: Unexpected if as the only statement in an else block. [eslint: no-lonely-if] ::: toolkit/components/places/nsNavHistory.cpp:730 (Diff revision 15) > + sFrecencyStatsSumOfSquares += unew * unew; > + } > + > + double mean, stddev; > + nsresult rv; > + rv = GetHistoryService()->GetFrecencyMean(&mean); Warning: Value stored to 'rv' is never read [clang-tidy: clang-analyzer-deadcode.DeadStores] ::: toolkit/components/places/nsNavHistory.cpp:730 (Diff revision 15) > + sFrecencyStatsSumOfSquares += unew * unew; > + } > + > + double mean, stddev; > + nsresult rv; > + rv = GetHistoryService()->GetFrecencyMean(&mean); Warning: Value stored to 'rv' is never read [clang-tidy: clang-analyzer-deadcode.DeadStores] ::: toolkit/components/places/nsNavHistory.cpp:732 (Diff revision 15) > + > + double mean, stddev; > + nsresult rv; > + rv = GetHistoryService()->GetFrecencyMean(&mean); > + MOZ_ASSERT(rv == NS_OK); > + rv = GetHistoryService()->GetFrecencyStandardDeviation(&stddev); Warning: Value stored to 'rv' is never read [clang-tidy: clang-analyzer-deadcode.DeadStores] ::: toolkit/components/places/nsNavHistory.cpp:732 (Diff revision 15) > + > + double mean, stddev; > + nsresult rv; > + rv = GetHistoryService()->GetFrecencyMean(&mean); > + MOZ_ASSERT(rv == NS_OK); > + rv = GetHistoryService()->GetFrecencyStandardDeviation(&stddev); Warning: Value stored to 'rv' is never read [clang-tidy: clang-analyzer-deadcode.DeadStores] ::: toolkit/components/places/tests/unifiedcomplete/test_swap_protocol.js:5 (Diff revision 15) > /* This Source Code Form is subject to the terms of the Mozilla Public > * License, v. 2.0. If a copy of the MPL was not distributed with this > * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > > +//XXXadw is this right? this test disables autofill. seems like the patch Error: Expected space or tab after '//' in comment. [eslint: spaced-comment] ::: toolkit/components/places/tests/unifiedcomplete/test_swap_protocol.js:7 (Diff revision 15) > * License, v. 2.0. If a copy of the MPL was not distributed with this > * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > > +//XXXadw is this right? this test disables autofill. seems like the patch > +// shouldn't affect non-autofill matches...? > +//XXXadw and at least most/all of these info()s are now wrong, need to be Error: Expected space or tab after '//' in comment. [eslint: spaced-comment] ::: toolkit/content/widgets/autocomplete.xml:1305 (Diff revision 15) > > if (itemExists) { > item = this.richlistbox.childNodes[this._currentIndex]; > > // Url may be a modified version of value, see _adjustACItem(). > + //XXXadw "url" may not be right Error: Expected space or tab after '//' in comment. [eslint: spaced-comment] ::: toolkit/content/widgets/autocomplete.xml:2022 (Diff revision 15) > </method> > > <method name="_reuseAcItem"> > <body> > <![CDATA[ > + //XXXadw don't think "url" is right Error: Expected space or tab after '//' in comment. [eslint: spaced-comment]
Code Review Bot,,,,,,,,,,,, thank's
Comment on attachment 8930304 [details] Bug 1239708: Improve awesomebar autofill. Part 0: Core changes. https://reviewboard.mozilla.org/r/201438/#review220418 Static analysis found 7 defects in this patch. - 7 defects found by mozlint You can run this analysis locally with: - `./mach lint check path/to/file` (Python/Javascript/wpt) If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx ::: toolkit/components/places/UnifiedComplete.js:1621 (Diff revision 16) > + } > + > + let searchStr = this._searchString; > + if (firstSlashIndex >= 0) { > + if (firstSlashIndex != searchStr.length - 1) { > + return; Error: Async method '_matchSearchEngineUrl' expected a return value. [eslint: consistent-return] ::: toolkit/components/places/UnifiedComplete.js:2435 (Diff revision 16) > - } else { > + } else { > - query.push(typed ? SQL_TYPED_HOST_QUERY > - : SQL_HOST_QUERY); > + query.push(SQL_ORIGIN_PREFIX_QUERY); > + } > + opts.prefix = this._strippedPrefix; > + } else { > + if (bookmarked) { Error: Unexpected if as the only statement in an else block. [eslint: no-lonely-if] ::: toolkit/components/places/UnifiedComplete.js:2504 (Diff revision 16) > - } else { > + } else { > - query.push(typed ? SQL_TYPED_URL_QUERY > - : SQL_URL_QUERY); > + query.push(SQL_URL_PREFIX_QUERY); > + } > + opts.prefix = this._strippedPrefix; > + } else { > + if (bookmarked) { Error: Unexpected if as the only statement in an else block. [eslint: no-lonely-if] ::: toolkit/components/places/tests/unifiedcomplete/test_swap_protocol.js:5 (Diff revision 16) > /* This Source Code Form is subject to the terms of the Mozilla Public > * License, v. 2.0. If a copy of the MPL was not distributed with this > * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > > +//XXXadw is this right? this test disables autofill. seems like the patch Error: Expected space or tab after '//' in comment. [eslint: spaced-comment] ::: toolkit/components/places/tests/unifiedcomplete/test_swap_protocol.js:7 (Diff revision 16) > * License, v. 2.0. If a copy of the MPL was not distributed with this > * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > > +//XXXadw is this right? this test disables autofill. seems like the patch > +// shouldn't affect non-autofill matches...? > +//XXXadw and at least most/all of these info()s are now wrong, need to be Error: Expected space or tab after '//' in comment. [eslint: spaced-comment] ::: toolkit/content/widgets/autocomplete.xml:1305 (Diff revision 16) > > if (itemExists) { > item = this.richlistbox.childNodes[this._currentIndex]; > > // Url may be a modified version of value, see _adjustACItem(). > + //XXXadw "url" may not be right Error: Expected space or tab after '//' in comment. [eslint: spaced-comment] ::: toolkit/content/widgets/autocomplete.xml:2022 (Diff revision 16) > </method> > > <method name="_reuseAcItem"> > <body> > <![CDATA[ > + //XXXadw don't think "url" is right Error: Expected space or tab after '//' in comment. [eslint: spaced-comment]
Comment on attachment 8930304 [details] Bug 1239708: Improve awesomebar autofill. Part 0: Core changes. https://reviewboard.mozilla.org/r/201438/#review220736 Static analysis found 7 defects in this patch. - 7 defects found by mozlint You can run this analysis locally with: - `./mach lint check path/to/file` (Python/Javascript/wpt) If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx ::: toolkit/components/places/UnifiedComplete.js:1621 (Diff revision 17) > + } > + > + let searchStr = this._searchString; > + if (firstSlashIndex >= 0) { > + if (firstSlashIndex != searchStr.length - 1) { > + return; Error: Async method '_matchSearchEngineUrl' expected a return value. [eslint: consistent-return] ::: toolkit/components/places/UnifiedComplete.js:2435 (Diff revision 17) > - } else { > + } else { > - query.push(typed ? SQL_TYPED_HOST_QUERY > - : SQL_HOST_QUERY); > + query.push(SQL_ORIGIN_PREFIX_QUERY); > + } > + opts.prefix = this._strippedPrefix; > + } else { > + if (bookmarked) { Error: Unexpected if as the only statement in an else block. [eslint: no-lonely-if] ::: toolkit/components/places/UnifiedComplete.js:2504 (Diff revision 17) > - } else { > + } else { > - query.push(typed ? SQL_TYPED_URL_QUERY > - : SQL_URL_QUERY); > + query.push(SQL_URL_PREFIX_QUERY); > + } > + opts.prefix = this._strippedPrefix; > + } else { > + if (bookmarked) { Error: Unexpected if as the only statement in an else block. [eslint: no-lonely-if] ::: toolkit/components/places/tests/unifiedcomplete/test_swap_protocol.js:5 (Diff revision 17) > /* This Source Code Form is subject to the terms of the Mozilla Public > * License, v. 2.0. If a copy of the MPL was not distributed with this > * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > > +//XXXadw is this right? this test disables autofill. seems like the patch Error: Expected space or tab after '//' in comment. [eslint: spaced-comment] ::: toolkit/components/places/tests/unifiedcomplete/test_swap_protocol.js:7 (Diff revision 17) > * License, v. 2.0. If a copy of the MPL was not distributed with this > * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > > +//XXXadw is this right? this test disables autofill. seems like the patch > +// shouldn't affect non-autofill matches...? > +//XXXadw and at least most/all of these info()s are now wrong, need to be Error: Expected space or tab after '//' in comment. [eslint: spaced-comment] ::: toolkit/content/widgets/autocomplete.xml:1305 (Diff revision 17) > > if (itemExists) { > item = this.richlistbox.childNodes[this._currentIndex]; > > // Url may be a modified version of value, see _adjustACItem(). > + //XXXadw "url" may not be right Error: Expected space or tab after '//' in comment. [eslint: spaced-comment] ::: toolkit/content/widgets/autocomplete.xml:2022 (Diff revision 17) > </method> > > <method name="_reuseAcItem"> > <body> > <![CDATA[ > + //XXXadw don't think "url" is right Error: Expected space or tab after '//' in comment. [eslint: spaced-comment]
Comment on attachment 8930304 [details] Bug 1239708: Improve awesomebar autofill. Part 0: Core changes. https://reviewboard.mozilla.org/r/201438/#review221536 Static analysis found 10 defects in this patch. - 10 defects found by mozlint You can run this analysis locally with: - `./mach lint check path/to/file` (Python/Javascript/wpt) If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx ::: browser/base/content/test/urlbar/browser_autocomplete_enter_race.js:151 (Diff revision 18) > + > + > + > + > + > +//XXXadw Error: Expected space or tab after '//' in comment. [eslint: spaced-comment] ::: browser/base/content/test/urlbar/browser_autocomplete_enter_race.js:168 (Diff revision 18) > + let file = Services.dirsvc.get("ProfD", Ci.nsIFile); > + file.append("places.sqlite"); > + let dbConn = gDBConn = Services.storage.openDatabase(file); > + > + // Be sure to cleanly close this connection. > + promiseTopicObserved("profile-before-change").then(() => dbConn.asyncClose()); Error: 'promiseTopicObserved' is not defined. [eslint: no-undef] ::: toolkit/components/places/UnifiedComplete.js:1622 (Diff revision 18) > + } > + > + let searchStr = this._searchString; > + if (firstSlashIndex >= 0) { > + if (firstSlashIndex != searchStr.length - 1) { > + return; Error: Async method '_matchSearchEngineUrl' expected a return value. [eslint: consistent-return] ::: toolkit/components/places/UnifiedComplete.js:2197 (Diff revision 18) > - let separatorIndex = strippedUrl.slice(searchString.length) > - .search(/[\/\?\#]/); > - if (separatorIndex != -1) { > - separatorIndex += searchString.length; > - if (strippedUrl[separatorIndex] == "/") { > - separatorIndex++; // Include the "/" separator > + let stddevMultiplier = 1.0; > + let threshold = > + PlacesUtils.history.frecencyMean + > + (stddevMultiplier * PlacesUtils.history.frecencyStandardDeviation); > + if (frecency < threshold) { > + //XXXadw if we're currently adding the heuristic result (and i think we Error: Expected space or tab after '//' in comment. [eslint: spaced-comment] ::: toolkit/components/places/UnifiedComplete.js:2454 (Diff revision 18) > + } else { > + query.push(SQL_ORIGIN_PREFIX_QUERY); > + } > + opts.prefix = this._strippedPrefix; > + } else { > - if (bookmarked) { > + if (bookmarked) { Error: Unexpected if as the only statement in an else block. [eslint: no-lonely-if] ::: toolkit/components/places/UnifiedComplete.js:2523 (Diff revision 18) > - } else { > + } else { > - query.push(typed ? SQL_TYPED_URL_QUERY > - : SQL_URL_QUERY); > + query.push(SQL_URL_PREFIX_QUERY); > + } > + opts.prefix = this._strippedPrefix; > + } else { > + if (bookmarked) { Error: Unexpected if as the only statement in an else block. [eslint: no-lonely-if] ::: toolkit/components/places/tests/unifiedcomplete/test_swap_protocol.js:5 (Diff revision 18) > /* This Source Code Form is subject to the terms of the Mozilla Public > * License, v. 2.0. If a copy of the MPL was not distributed with this > * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > > +//XXXadw is this right? this test disables autofill. seems like the patch Error: Expected space or tab after '//' in comment. [eslint: spaced-comment] ::: toolkit/components/places/tests/unifiedcomplete/test_swap_protocol.js:7 (Diff revision 18) > * License, v. 2.0. If a copy of the MPL was not distributed with this > * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > > +//XXXadw is this right? this test disables autofill. seems like the patch > +// shouldn't affect non-autofill matches...? > +//XXXadw and at least most/all of these info()s are now wrong, need to be Error: Expected space or tab after '//' in comment. [eslint: spaced-comment] ::: toolkit/content/widgets/autocomplete.xml:1305 (Diff revision 18) > > if (itemExists) { > item = this.richlistbox.childNodes[this._currentIndex]; > > // Url may be a modified version of value, see _adjustACItem(). > + //XXXadw "url" may not be right Error: Expected space or tab after '//' in comment. [eslint: spaced-comment] ::: toolkit/content/widgets/autocomplete.xml:2022 (Diff revision 18) > </method> > > <method name="_reuseAcItem"> > <body> > <![CDATA[ > + //XXXadw don't think "url" is right Error: Expected space or tab after '//' in comment. [eslint: spaced-comment]
All the tests that failed on the last try push are fixed locally with this... https://treeherder.mozilla.org/#/jobs?repo=try&revision=0d3e2e0b556de50baefc3758d6c109090518f29a
Hi Drew, you'll probably see this on Try, but I added one more use of `moz_updatehostsinsert_temp` to `SyncedBookmarksMirror.jsm` in bug 1305563.
Thanks Kit, very kind of you to let me know.
All tests passing on try with the previous patch. This one finishes cleaning up and is rebased on the current tree, including Kit's comment 88. Another push to try to make sure I didn't screw anything up in the cleanup, and then after that I'll break this apart and ask for review. https://treeherder.mozilla.org/#/jobs?repo=try&revision=aff5492df65314e47f5571cec4c5e4f8963b82dc
Some errors on try. Let's see if this fixes them. https://treeherder.mozilla.org/#/jobs?repo=try&revision=95c512ba97e061ffcd191751b8a82e57209419a1 While manually testing this, I noticed that the urlbar sometimes drops keystrokes, so I'll have to figure that out. Strangely the results in the popup do indicate that the keystrokes are being registered and responded to -- it's just that they don't actually appear in the textbox. It's very obvious when typing "mo". The "o" gets dropped. Maybe something to do with the match value... This is the first time in several days that I've manually tested, so I'm not sure when I broke it.
OK, figured out the problem about dropped characters in the input. I was setting the default index of the search result to 0 even when the autofill result fell below the threshold, so the autocomplete controller was trying to complete a moz-action URI because that happened to be the first result. Funnily enough, I was always testing by typing "moz" and variations, and so the autocomplete controller found that what I was typing was a prefix for moz-action. The last try push was good, so next I will break this up into smaller patches for easier review. One thing I noticed while playing around in a new profile is that, once you have pages with very high frecency, it can be difficult to trigger autofill when you type a non-www version of a site that redirects to a www version. The problem is the mean + standard deviation threshold combined with how we apparently assign frecencies in that case. For example, if you type nytimes.com, it redirects to www.nytimes.com, and both end up in moz_places, but we don't give either one a very high frecency. But if you type www.washingtonpost.com, which doesn't redirect, then that gets a huge frecency. With the mean + standard deviation threshold, neither nytimes.com nor www.nytimes.com autofills. (Or something like that. I'm seeing it slightly differently now that I'm trying to reproduce it.)
Small fix and test for autofilling preloaded top sites. I happened to notice a problem with it while manually testing.
Comment on attachment 8930304 [details] Bug 1239708: Improve awesomebar autofill. Part 0: Core changes. https://reviewboard.mozilla.org/r/201438/#review223736 Static analysis found 1 defect in this patch. - 1 defect found by mozlint You can run this analysis locally with: - `./mach lint check path/to/file` (Python/Javascript/wpt) If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx ::: toolkit/components/places/UnifiedComplete.js:1257 (Diff revision 25) > - this._addMatch(match); > - return true; > - } > > - // If no strict result found - we try loose match > - // regardless of "www." in Preloaded-sites or search string > + let url = matchedSite.uri.spec; > + let [_, value] = stripPrefix(url); Error: '_' is assigned a value but never used. Allowed unused vars must match /^Cc|Ci|Cu|Cr|EXPORTED_SYMBOLS/. [eslint: no-unused-vars]
Marco, I've tried to split the patch into reviewable chunks. I suggest looking at them in order. Part 0 is the core changes to the schema and UnifiedComplete.js. It's not complete by itself because it uses SQL functions that are defined in part 1. The idea is that you can review these core changes on their own in part 0 while not getting distracted with the SQL function implementation details in part 1. Part 0 is the only such non-self-contained patch. Mozreview took part 0 to be a new revision of my WIP patches. It's not, so please compare the latest revision to "orig" to see the right diff. If you more or less accept the core changes in part 0, then the remaining patches are basically follow-ons, except for the frecency stats stuff.
There were strange, unrelated failures on that try push. I'm 100% sure I pulled a bad tree, but here's one more push with a fresh tree to make sure: https://treeherder.mozilla.org/#/jobs?repo=try&revision=05ea2da89b6877df44a5622ca5bfec79fedc16fa
Blocks: 1345133
Comment on attachment 8930304 [details] Bug 1239708: Improve awesomebar autofill. Part 0: Core changes. https://reviewboard.mozilla.org/r/201438/#review227864 Sorry, there is some bitrotting to take care of. I think it's not much though. And I have some questions/doubts. It's possible next parts can better answer them, in case just drop with a comment. ::: toolkit/components/places/UnifiedComplete.js:844 (Diff revision 26) > - this._strippedPrefix = this._trimmedOriginalSearchString.slice( > - 0, this._trimmedOriginalSearchString.length - strippedOriginalSearchString.length > - ).toLowerCase(); > > + let [prefix, suffix] = stripPrefix(this._trimmedOriginalSearchString); > + this._searchString = textURIService.unEscapeURIForUI("UTF-8", suffix); The change to stripPrefix seems to change one of the main behaviors of the address bar, that is ignoring "www.". This mean if I type "www.facebook" we won't match anymore urls like "https://facebook.com/" from history/bookmarks/open-tabs? those common queries look for this._searchTokens.join(" ") and searchToken are directly derived from _searchString. ::: toolkit/components/places/UnifiedComplete.js:1433 (Diff revision 26) > - } > - return false; > + let query, params; > + let firstSlashIndex = this._searchString.indexOf("/"); > + let lastSlashIndex = this._searchString.lastIndexOf("/"); > + if (firstSlashIndex >= 0 && > + (firstSlashIndex != lastSlashIndex || > + firstSlashIndex < this._searchString.length - 1)) { Maybe I'm misreading this, but isn't the second check enough by itself? if the first found slash is the last string char, it's also the only and last slash and firstSlashIndex == lastSlashIndex. Is this for strings like "mydomain.com//"? ::: toolkit/components/places/UnifiedComplete.js:1439 (Diff revision 26) > + [query, params] = this._urlQuery; > + } else { > + [query, params] = this._originQuery; > } > > - let gotResult = false; > + if (query) { in which case we don't have a query? ::: toolkit/components/places/UnifiedComplete.js:1518 (Diff revision 26) > - if (!match) > + // single slash that's not at the end, don't try to match. > + let firstSlashIndex = this._searchString.indexOf("/"); > + let lastSlashIndex = this._searchString.lastIndexOf("/"); > + if (firstSlashIndex != lastSlashIndex || > + (firstSlashIndex >= 0 && > + firstSlashIndex != this._searchString.length - 1)) { sounds like the code could be factored out to an helper (looksLikeHost or something like that) along with the above one for autofill. Maybe it could also return a trimmed string. ::: toolkit/components/places/UnifiedComplete.js:2259 (Diff revision 26) > + // At this point, _searchString is not a URL with a path; it does not > + // contain a slash, except for possibly at the very end. If there is > + // trailing slash, remove it when searching here to match the rest of the > + // string because it may be an origin. > + let searchStr = > + !this._searchString.endsWith("/") ? nit: inverting the condition may make it a bit more readable ::: toolkit/components/places/UnifiedComplete.js:2296 (Diff revision 26) > * database with and an object containing the params to bound. > */ > get _urlQuery() { > - // We expect this to be a full URL, not just a host. We want to extract the > - // host and use that as a guess for whether we'll get a result from a URL > - // query. > + // Assuming the search string is a URL, get the hostname, the part of the > + // search string up to either the path slash or the port colon. > + let hostMatch = /^[^/:]+/.exec(this._searchString); may be worth memoizing the regex ::: toolkit/components/places/nsPlacesTables.h:139 (Diff revision 26) > ", post_data TEXT" \ > ")" \ > ) > > +#define CREATE_MOZ_PREFIXES NS_LITERAL_CSTRING( \ > + "CREATE TABLE moz_prefixes ( " \ the table name is a bit generic, prefixes of what? ::: toolkit/components/places/nsPlacesTables.h:149 (Diff revision 26) > - " id INTEGER PRIMARY KEY" \ > - ", host TEXT NOT NULL UNIQUE" \ > + "id INTEGER PRIMARY KEY, " \ > + "host TEXT NOT NULL UNIQUE " \ > - ", frecency INTEGER" \ > - ", typed INTEGER NOT NULL DEFAULT 0" \ > - ", prefix TEXT" \ > ")" \ I didn't check the next parts for this, but off-hand this is not downgradable since an older version expects frecency, typed and prefix to exist in this table. If we keep it like this, we should relnote that profiles created with Firefox NN are not backwards compatible and we may have to push a patch to ESR to consider them corrupt. ::: toolkit/components/places/nsPlacesTables.h:168 (Diff revision 26) > + "origin_id INTEGER PRIMARY KEY REFERENCES moz_origins(id) ON DELETE CASCADE, " \ > + "prefix TEXT NOT NULL, " \ > + "host TEXT NOT NULL, " \ > + "frecency INTEGER NOT NULL " \ > + ") " \ > + "WITHOUT ROWID" \ you have an integer primary key, so using without rowid is a no-op. WITHOUT ROWID is only useful when the primary key is a string or a compound key. Otherwise ROWID is just an alias of the integer primary key (comes at no additional cost)
Attachment #8930304 - Flags: review?(mak77)
Comment on attachment 8948567 [details] Bug 1239708: Improve awesomebar autofill. Part 1: Core follow-ons. https://reviewboard.mozilla.org/r/217974/#review227878 ::: netwerk/base/nsIBrowserSearchService.idl:160 (Diff revision 1) > > /** > * Gets a string representing the hostname from which search results for a > - * given responseType are returned, minus the leading "www." (if present). > - * This can be specified as an url attribute in the engine description file, > - * but will default to host from the <Url>'s template otherwise. > + * given responseType are returned. This can be specified as an url attribute > + * in the engine description file, but will default to host from the <Url>'s > + * template otherwise. As a side note, just to spread knowledge, I think the way this is currently implemented is not great. We should have a bug on file already. A possible future idea is for engines to define a resultDomain ONLY WHEN they want to be autofilled, and that resultDomain should always be a good landing page. If they don't provide one, we should not autofill. In any case it's the n-th incomplete and disabled feature. ::: toolkit/components/places/Database.h:22 (Diff revision 1) > #include "Shutdown.h" > #include "nsCategoryCache.h" > > // This is the schema version. Update it at any schema change and add a > // corresponding migrateVxx method below. > -#define DATABASE_SCHEMA_VERSION 42 > +#define DATABASE_SCHEMA_VERSION 43 sorry, will have to move to 44. ::: toolkit/components/places/Database.cpp:1281 (Diff revision 1) > // moz_inputhistory. > rv = mMainConn->ExecuteSimpleSQL(CREATE_MOZ_INPUTHISTORY); > NS_ENSURE_SUCCESS(rv, rv); > > - // moz_hosts. > - rv = mMainConn->ExecuteSimpleSQL(CREATE_MOZ_HOSTS); > + // moz_autofill_origins. > + rv = mMainConn->ExecuteSimpleSQL(CREATE_MOZ_AUTOFILL_ORIGINS); what is MOZ_AUTOFILL_ORIGINS? I cannot find its definition. Supposing is another strings populated table, do we have an idea how much larger places.sqlite becomes with these 3 new string tables? ::: toolkit/components/places/SQLFunctions.cpp:276 (Diff revision 1) > + nsAString::const_iterator& iter) > + { > + nsAString::const_iterator end; > + spec.BeginReading(iter); > + spec.EndReading(end); > + if (!FindCharInReadable(':', iter, end)) { nit: schemes have a limited length, maybe we could limit our search to a certain dependentSubstring. The longest IANA scheme atm is 31 chars, we could double that. ::: toolkit/components/places/SQLFunctions.cpp (Diff revision 1) > fixedSpec.Rebind(fixedSpec, 8); > } else if (StringBeginsWith(fixedSpec, NS_LITERAL_CSTRING("ftp://"))) { > fixedSpec.Rebind(fixedSpec, 6); > } > > - if (StringBeginsWith(fixedSpec, NS_LITERAL_CSTRING("www."))) { This is bound to my comment in the first part about www ::: toolkit/components/places/SQLFunctions.cpp:1190 (Diff revision 1) > + } else { > + nsAString::const_iterator start, end; > + start = iter; > + spec.EndReading(end); > + Unused << FindCharInReadable('/', iter, end); > + result->SetAsAString(Substring(start, iter)); what about urls with user:pwd@? ::: toolkit/components/places/SQLFunctions.cpp:1280 (Diff revision 1) > + uint32_t numArgs; > + nsresult rv = aArgs->GetNumEntries(&numArgs); > + NS_ENSURE_SUCCESS(rv, rv); > + MOZ_ASSERT(numArgs == 0); > + > + nsNavHistory *navHistory = nsNavHistory::GetHistoryService(); You should use GetConstHistory and the bool should be atomic, otherwise this doesn't look thread-safe ::: toolkit/components/places/nsNavHistory.cpp:109 (Diff revision 1) > #define PREF_FREC_UNVISITED_TYPED_BONUS_DEF 200 > #define PREF_FREC_RELOAD_VISIT_BONUS "places.frecency.reloadVisitBonus" > #define PREF_FREC_RELOAD_VISIT_BONUS_DEF 0 > > -// This is a 'hidden' pref for the purposes of unit tests. > +// This is a 'hidden' pref for the purposes of unit tests. If you change > +// RATE_DEF, make sure you update the same value in nsPlacesTriggers.h! maybe it shouldn't be a pref, since changing the pref won't fix nsPlacesTriggers.h?
Attachment #8948567 - Flags: review?(mak77)
Comment on attachment 8948567 [details] Bug 1239708: Improve awesomebar autofill. Part 1: Core follow-ons. https://reviewboard.mozilla.org/r/217974/#review227918 ::: toolkit/components/places/Database.cpp:2057 (Diff revision 1) > + rv = mMainConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING( > + "ALTER TABLE moz_places " \ > + "ADD COLUMN origin_id INTEGER REFERENCES moz_origins(id) ON DELETE CASCADE" > + )); > + NS_ENSURE_SUCCESS(rv, rv); > + } Shouldn't the migration populate these tables too?
Maybe we should first prepare the data, like having a separate bug for the migration (creating the new tables, populating them, keeping them updated, handling downgrades, considering db growth and expiration), and then this bug would just make use of those.
Comment on attachment 8948568 [details] Bug 1239708: Improve awesomebar autofill. Part 2: Non-core follow-ons. https://reviewboard.mozilla.org/r/217976/#review228024
Attachment #8948568 - Flags: review?(mak77) → review+
Comment on attachment 8948569 [details] Bug 1239708: Improve awesomebar autofill. Part 3: Front-end changes. https://reviewboard.mozilla.org/r/217978/#review228044 ::: toolkit/content/widgets/autocomplete.xml:1297 (Diff revision 1) > let originalValue, originalText, originalType; > - let value = controller.getValueAt(this._currentIndex); > + let style = controller.getStyleAt(this._currentIndex); > + let value = > + style && > + style.includes("autofill") && > + style.includes("heuristic") ? I'd like to know a bit more about this change, was it just wrong?
Attachment #8948569 - Flags: review?(mak77) → review+
Comment on attachment 8948571 [details] Bug 1239708: Improve awesomebar autofill. Part 5: xpcshell tests. https://reviewboard.mozilla.org/r/217982/#review228062 tests outcome depend on previous parts correctness, clearing for now.
Attachment #8948571 - Flags: review?(mak77)
Comment on attachment 8948572 [details] Bug 1239708: Improve awesomebar autofill. Part 6: Browser tests. https://reviewboard.mozilla.org/r/217984/#review228064
Attachment #8948572 - Flags: review?(mak77)
Thanks Marco. Responding to the bigger questions/concerns in your comments: (In reply to Marco Bonardo [::mak] from comment #114) > ::: toolkit/components/places/UnifiedComplete.js:844 > (Diff revision 26) > > - this._strippedPrefix = this._trimmedOriginalSearchString.slice( > > - 0, this._trimmedOriginalSearchString.length - strippedOriginalSearchString.length > > - ).toLowerCase(); > > > > + let [prefix, suffix] = stripPrefix(this._trimmedOriginalSearchString); > > + this._searchString = textURIService.unEscapeURIForUI("UTF-8", suffix); > > The change to stripPrefix seems to change one of the main behaviors of the > address bar, that is ignoring "www.". This mean if I type "www.facebook" we > won't match anymore urls like "https://facebook.com/" from > history/bookmarks/open-tabs? Yes, I don't think we should match non-www domains when you type www. We talked about this in comment 29 (last bullet point) and comment 30 and you seemed to agree I think? This is one of the big changes in the patches. (But we continue to match www domains when you don't type www.) > ::: toolkit/components/places/nsPlacesTables.h:149 > (Diff revision 26) > > - " id INTEGER PRIMARY KEY" \ > > - ", host TEXT NOT NULL UNIQUE" \ > > + "id INTEGER PRIMARY KEY, " \ > > + "host TEXT NOT NULL UNIQUE " \ > > - ", frecency INTEGER" \ > > - ", typed INTEGER NOT NULL DEFAULT 0" \ > > - ", prefix TEXT" \ > > ")" \ > > I didn't check the next parts for this, but off-hand this is not > downgradable since an older version expects frecency, typed and prefix to > exist in this table. OK, I'll have to make a moz_hosts2 (or something like that) table. I didn't mean to break downgrading. (In reply to Marco Bonardo [::mak] from comment #115) > ::: toolkit/components/places/Database.cpp:1281 > (Diff revision 1) > > // moz_inputhistory. > > rv = mMainConn->ExecuteSimpleSQL(CREATE_MOZ_INPUTHISTORY); > > NS_ENSURE_SUCCESS(rv, rv); > > > > - // moz_hosts. > > - rv = mMainConn->ExecuteSimpleSQL(CREATE_MOZ_HOSTS); > > + // moz_autofill_origins. > > + rv = mMainConn->ExecuteSimpleSQL(CREATE_MOZ_AUTOFILL_ORIGINS); > > what is MOZ_AUTOFILL_ORIGINS? I cannot find its definition. It's defined in the part 0 patch. It's the non-normalized table used for autofilling origins. We touched on this in comment 39 and comment 40. > "CREATE TABLE moz_autofill_origins ( " \ > "origin_id INTEGER PRIMARY KEY REFERENCES moz_origins(id) ON DELETE CASCADE, " \ > "prefix TEXT NOT NULL, " \ > "host TEXT NOT NULL, " \ > "frecency INTEGER NOT NULL " \ > ") " \ > Supposing is another strings populated table, do we have an idea how much > larger places.sqlite becomes with these 3 new string tables? I added three new normalized tables: moz_prefixes, moz_hosts (a new version, probably to be renamed as mentioned above), moz_origins. They're normalized so there's no string duplication among them. But that does (basically) duplicate the rev_host columns we have in various tables. > ::: toolkit/components/places/SQLFunctions.cpp:1190 > (Diff revision 1) > > + } else { > > + nsAString::const_iterator start, end; > > + start = iter; > > + spec.EndReading(end); > > + Unused << FindCharInReadable('/', iter, end); > > + result->SetAsAString(Substring(start, iter)); > > what about urls with user:pwd@? Good point, I thought about that too but I didn't handle them anywhere. I'm not sure whether they're worth handling, but I should make sure that none of the string parsing is tripped up by them. > ::: toolkit/components/places/nsNavHistory.cpp:109 > (Diff revision 1) > > #define PREF_FREC_UNVISITED_TYPED_BONUS_DEF 200 > > #define PREF_FREC_RELOAD_VISIT_BONUS "places.frecency.reloadVisitBonus" > > #define PREF_FREC_RELOAD_VISIT_BONUS_DEF 0 > > > > -// This is a 'hidden' pref for the purposes of unit tests. > > +// This is a 'hidden' pref for the purposes of unit tests. If you change > > +// RATE_DEF, make sure you update the same value in nsPlacesTriggers.h! > > maybe it shouldn't be a pref, since changing the pref won't fix > nsPlacesTriggers.h? OK, makes sense. (In reply to Marco Bonardo [::mak] from comment #116) > ::: toolkit/components/places/Database.cpp:2057 > (Diff revision 1) > > + rv = mMainConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING( > > + "ALTER TABLE moz_places " \ > > + "ADD COLUMN origin_id INTEGER REFERENCES moz_origins(id) ON DELETE CASCADE" > > + )); > > + NS_ENSURE_SUCCESS(rv, rv); > > + } > > Shouldn't the migration populate these tables too? Yes, probably! (In reply to Marco Bonardo [::mak] from comment #119) > ::: toolkit/content/widgets/autocomplete.xml:1297 > (Diff revision 1) > > let originalValue, originalText, originalType; > > - let value = controller.getValueAt(this._currentIndex); > > + let style = controller.getStyleAt(this._currentIndex); > > + let value = > > + style && > > + style.includes("autofill") && > > + style.includes("heuristic") ? > > I'd like to know a bit more about this change, was it just wrong? IIRC basically, yes, but: For autofill results, the "value" is the autofilled string. So if you type moz and we autofill to mozilla.org, "mozilla.org" is the value. But the "final complete value" may actually be www.mozilla.org (more accurately https://www.mozilla.org/). So it's more correct to get the URL/value from the final complete value, especially now.
(In reply to Drew Willcoxon :adw from comment #122) > Yes, I don't think we should match non-www domains when you type www. We > talked about this in comment 29 (last bullet point) and comment 30 and you > seemed to agree I think? This is one of the big changes in the patches. > (But we continue to match www domains when you don't type www.) Heh, my comment was about autofill simplification, I didn't think we were going to stop fuzzy searching in the other sources, so I didn't brainstorm it enough. We'd be losing some search capability. What are the other browsers doing? Off-hand looks like Chrome and Edge don't ignore "www.". We also had bugs filed in the past about "I don't understand why I typed "www.fa" and I was suggestes "www.doremifa". Maybe we could re-evaluate providing suggested urls with search suggestions when www. is typed, it seems useful in Edge to type "www.f" and get some popular pages suggested. All in all, I'm not against the change, I'll think about it in background to better evaluate what we lose and what we gain. > > I didn't check the next parts for this, but off-hand this is not > > downgradable since an older version expects frecency, typed and prefix to > > exist in this table. > > OK, I'll have to make a moz_hosts2 (or something like that) table. I didn't > mean to break downgrading. Yes, this is what we were discussing originally, any change to moz_host schema will break downgrades. We can freeze this table contents, or update it with "bogus" data, the old version may return wrong results in certain cases, but it's better than being broken. We can't remove columns from it. And this is why I care about understanding the disk size of the new information, because for a while we'll have to keep around both old and new info :( Also, if we freeze the contents of the old moz_hosts, we'd have a privacy hit. So, question, what do we break if we keep the table but empty it on migration to this new version? Who is today using moz_hosts apart from autofill? > > what is MOZ_AUTOFILL_ORIGINS? I cannot find its definition. > > It's defined in the part 0 patch. It's the non-normalized table used for > autofilling origins. We touched on this in comment 39 and comment 40. Ah sorry, find in page broke for some reason :( > I added three new normalized tables: moz_prefixes, moz_hosts (a new version, > probably to be renamed as mentioned above), moz_origins. They're normalized > so there's no string duplication among them. But that does (basically) > duplicate the rev_host columns we have in various tables. Could you please take your default profile (I suppose it's around 70MB like mine) add these tables, populate them, and calculate the % of growth? We'll need to adapt expiration. We'll also need to make a plan for removal of the old data.
Comment on attachment 8948570 [details] Bug 1239708: Improve awesomebar autofill. Part 4: Frecency stats. https://reviewboard.mozilla.org/r/217980/#review228168 ::: toolkit/components/places/SQLFunctions.cpp:1329 (Diff revision 1) > + int32_t oldFrecency = aArgs->AsInt32(1); > + int32_t newFrecency = aArgs->AsInt32(2); > + > + nsNavHistory* navHistory = nsNavHistory::GetHistoryService(); > + NS_ENSURE_STATE(navHistory); > + navHistory->UpdateFrecencyStats(placeID, oldFrecency, newFrecency); I'm not 100% sure this is thread-safe, because the history service is not we have the const history service to be used off the main thread. From the const service you could likely fire a runnable/runnablemethod on whatever thread you prefer.
Attachment #8948570 - Flags: review?(mak77)
(In reply to Marco Bonardo [::mak] from comment #123) > (In reply to Drew Willcoxon :adw from comment #122) > > Yes, I don't think we should match non-www domains when you type www. We > > talked about this in comment 29 (last bullet point) and comment 30 and you > > seemed to agree I think? This is one of the big changes in the patches. > > (But we continue to match www domains when you don't type www.) > > Heh, my comment was about autofill simplification, I didn't think we were > going to stop fuzzy searching in the other sources, so I didn't brainstorm > it enough. We'd be losing some search capability. > What are the other browsers doing? Off-hand looks like Chrome and Edge don't > ignore "www.". > We also had bugs filed in the past about "I don't understand why I typed > "www.fa" and I was suggestes "www.doremifa". That wouldn't happen with this patch. You would only get domains that started exactly with "www.fa". (Would that problem even happen with Firefox today? We strip the www so all domains that start with "fa" are also eligible to be matched, but not domains that don't start with "fa" or "www.fa", right? (Speaking of domain/host matching only, not non-boundary matching.)) > Maybe we could re-evaluate providing suggested urls with search suggestions > when www. is typed, it seems useful in Edge to type "www.f" and get some > popular pages suggested. Yes, I agree, we should do that. The pre-loaded sites functionality helps with that, but I think it's turned off on release? > Also, if we freeze the contents of the old moz_hosts, we'd have a privacy > hit. Is that because you're assuming clearing history, page expiration, etc. wouldn't touch the old table? Not sure I understand. > So, question, what do we break if we keep the table but empty it on > migration to this new version? Who is today using moz_hosts apart from > autofill? I'd have to check. > Could you please take your default profile (I suppose it's around 70MB like > mine) add these tables, populate them, and calculate the % of growth? Will do.
(In reply to Drew Willcoxon :adw from comment #125) > > We also had bugs filed in the past about "I don't understand why I typed > > "www.fa" and I was suggestes "www.doremifa". > > That wouldn't happen with this patch. You would only get domains that > started exactly with "www.fa". (Would that problem even happen with Firefox > today? We strip the www so all domains that start with "fa" are also > eligible to be matched, but not domains that don't start with "fa" or > "www.fa", right? (Speaking of domain/host matching only, not non-boundary > matching.)) It does happen in Firefox today, not for autofill though. It happens for normal history/bookmarks entries. So, all in all I think, also considered consistency with other browsers, it's ok to proceed without ignoring www. > > Maybe we could re-evaluate providing suggested urls with search suggestions > > when www. is typed, it seems useful in Edge to type "www.f" and get some > > popular pages suggested. > > Yes, I agree, we should do that. The pre-loaded sites functionality helps > with that, but I think it's turned off on release? It's turned off everywhere now, because it was confusing with the current list. Maybe we should re-evaluate that feature vs search suggestions indeed. > > Also, if we freeze the contents of the old moz_hosts, we'd have a privacy > > hit. > > Is that because you're assuming clearing history, page expiration, etc. > wouldn't touch the old table? Not sure I understand. If you remove OLD moz_host updating from triggers, then OLD moz_host contents won't be removed when pages go away. that means I remove www.embarassingsite.com from history, but moz_host still contains embarassingsite.com
See Also: → 1439007
WIP unbitrotting, addressing comments
Attachment #8930304 - Flags: review?(mak77)
Attachment #8948567 - Flags: review?(mak77)
Attachment #8948570 - Flags: review?(mak77)
Attachment #8948571 - Flags: review?(mak77)
Attachment #8948572 - Flags: review?(mak77)
(In reply to Marco Bonardo [::mak] from comment #114) > ::: toolkit/components/places/UnifiedComplete.js:1433 > (Diff revision 26) > > - } > > - return false; > > + let query, params; > > + let firstSlashIndex = this._searchString.indexOf("/"); > > + let lastSlashIndex = this._searchString.lastIndexOf("/"); > > + if (firstSlashIndex >= 0 && > > + (firstSlashIndex != lastSlashIndex || > > + firstSlashIndex < this._searchString.length - 1)) { > > Maybe I'm misreading this, but isn't the second check enough by itself? > if the first found slash is the last string char, it's also the only and > last slash and firstSlashIndex == lastSlashIndex. You're right, fixed > ::: toolkit/components/places/UnifiedComplete.js:1439 > (Diff revision 26) > > + [query, params] = this._urlQuery; > > + } else { > > + [query, params] = this._originQuery; > > } > > > > - let gotResult = false; > > + if (query) { > > in which case we don't have a query? The first thing _urlQuery does is match the search string against a regexp, and if there's no match, it returns a null query. That seems like the sanest way to handle that case. The regexp is /^[^/:]+/, so if you type "/" or ":", you'll hit that case. > ::: toolkit/components/places/UnifiedComplete.js:1518 > (Diff revision 26) > > - if (!match) > > + // single slash that's not at the end, don't try to match. > > + let firstSlashIndex = this._searchString.indexOf("/"); > > + let lastSlashIndex = this._searchString.lastIndexOf("/"); > > + if (firstSlashIndex != lastSlashIndex || > > + (firstSlashIndex >= 0 && > > + firstSlashIndex != this._searchString.length - 1)) { > > sounds like the code could be factored out to an helper (looksLikeHost or > something like that) along with the above one for autofill. Maybe it could > also return a trimmed string. Done... It was a challenge to name this function. I don't want it to be mistaken for a general-purpose host-parsing function. All it does is look for slashes. It doesn't actually have any smarts for detecting hosts at all. You shouldn't pass full URLs to it, or even origins. It's very particular for these two cases. So I named it trimTrailingSlashIfOnlySlash. > ::: toolkit/components/places/UnifiedComplete.js:2259 > (Diff revision 26) > > + // At this point, _searchString is not a URL with a path; it does not > > + // contain a slash, except for possibly at the very end. If there is > > + // trailing slash, remove it when searching here to match the rest of the > > + // string because it may be an origin. > > + let searchStr = > > + !this._searchString.endsWith("/") ? > > nit: inverting the condition may make it a bit more readable Done > ::: toolkit/components/places/UnifiedComplete.js:2296 > (Diff revision 26) > > * database with and an object containing the params to bound. > > */ > > get _urlQuery() { > > - // We expect this to be a full URL, not just a host. We want to extract the > > - // host and use that as a guess for whether we'll get a result from a URL > > - // query. > > + // Assuming the search string is a URL, get the hostname, the part of the > > + // search string up to either the path slash or the port colon. > > + let hostMatch = /^[^/:]+/.exec(this._searchString); > > may be worth memoizing the regex Done > ::: toolkit/components/places/nsPlacesTables.h:139 > (Diff revision 26) > > ", post_data TEXT" \ > > ")" \ > > ) > > > > +#define CREATE_MOZ_PREFIXES NS_LITERAL_CSTRING( \ > > + "CREATE TABLE moz_prefixes ( " \ > > the table name is a bit generic, prefixes of what? Renamed to moz_origin_prefixes > ::: toolkit/components/places/nsPlacesTables.h:149 > (Diff revision 26) > > - " id INTEGER PRIMARY KEY" \ > > - ", host TEXT NOT NULL UNIQUE" \ > > + "id INTEGER PRIMARY KEY, " \ > > + "host TEXT NOT NULL UNIQUE " \ > > - ", frecency INTEGER" \ > > - ", typed INTEGER NOT NULL DEFAULT 0" \ > > - ", prefix TEXT" \ > > ")" \ > > I didn't check the next parts for this, but off-hand this is not > downgradable since an older version expects frecency, typed and prefix to > exist in this table. I kept the existing moz_hosts and renamed my new table moz_origin_hosts. So now we have moz_origins, moz_origin_prefixes, and moz_origin_hosts. > ::: toolkit/components/places/nsPlacesTables.h:168 > (Diff revision 26) > > + "origin_id INTEGER PRIMARY KEY REFERENCES moz_origins(id) ON DELETE CASCADE, " \ > > + "prefix TEXT NOT NULL, " \ > > + "host TEXT NOT NULL, " \ > > + "frecency INTEGER NOT NULL " \ > > + ") " \ > > + "WITHOUT ROWID" \ > > you have an integer primary key, so using without rowid is a no-op. > WITHOUT ROWID is only useful when the primary key is a string or a compound > key. Otherwise ROWID is just an alias of the integer primary key (comes at > no additional cost) Fixed (In reply to Marco Bonardo [::mak] from comment #115) > ::: toolkit/components/places/Database.h:22 > (Diff revision 1) > > #include "Shutdown.h" > > #include "nsCategoryCache.h" > > > > // This is the schema version. Update it at any schema change and add a > > // corresponding migrateVxx method below. > > -#define DATABASE_SCHEMA_VERSION 42 > > +#define DATABASE_SCHEMA_VERSION 43 > > sorry, will have to move to 44. Fixed > ::: toolkit/components/places/SQLFunctions.cpp:276 > (Diff revision 1) > > + nsAString::const_iterator& iter) > > + { > > + nsAString::const_iterator end; > > + spec.BeginReading(iter); > > + spec.EndReading(end); > > + if (!FindCharInReadable(':', iter, end)) { > > nit: schemes have a limited length, maybe we could limit our search to a > certain dependentSubstring. The longest IANA scheme atm is 31 chars, we > could double that. Good idea, done. (In my quick look, it looks like 30 characters is the longest.) I chose 64 as a nice round number. > ::: toolkit/components/places/SQLFunctions.cpp:1190 > (Diff revision 1) > > + } else { > > + nsAString::const_iterator start, end; > > + start = iter; > > + spec.EndReading(end); > > + Unused << FindCharInReadable('/', iter, end); > > + result->SetAsAString(Substring(start, iter)); > > what about urls with user:pwd@? I commented earlier that I intentionally didn't handle these specially -- that's still true, but I went looked at the patch to check how such URLs would be handled, and to make sure my parsing code is OK. They'll end up getting treated as separate origins. So if you have http://example.com/, http://user@example.com/, and http://user:pass@example.com/, then you will end up with three different origins: http://example.com, http://user@example.com, and http://user:pass@example.com. I think that's OK. > ::: toolkit/components/places/SQLFunctions.cpp:1280 > (Diff revision 1) > > + uint32_t numArgs; > > + nsresult rv = aArgs->GetNumEntries(&numArgs); > > + NS_ENSURE_SUCCESS(rv, rv); > > + MOZ_ASSERT(numArgs == 0); > > + > > + nsNavHistory *navHistory = nsNavHistory::GetHistoryService(); > > You should use GetConstHistory and the bool should be atomic, otherwise this > doesn't look thread-safe Whoops, fixed > ::: toolkit/components/places/nsNavHistory.cpp:109 > (Diff revision 1) > > #define PREF_FREC_UNVISITED_TYPED_BONUS_DEF 200 > > #define PREF_FREC_RELOAD_VISIT_BONUS "places.frecency.reloadVisitBonus" > > #define PREF_FREC_RELOAD_VISIT_BONUS_DEF 0 > > > > -// This is a 'hidden' pref for the purposes of unit tests. > > +// This is a 'hidden' pref for the purposes of unit tests. If you change > > +// RATE_DEF, make sure you update the same value in nsPlacesTriggers.h! > > maybe it shouldn't be a pref, since changing the pref won't fix > nsPlacesTriggers.h? It looks like it's only a pref so that tests can tweak it. The thing that I was worrying about with this is that there are two definitions of the rate in two different files. So I moved (and renamed) this define to nsPlacesTriggers.h, and then here in nsNavHistory.cpp I include nsPlacesTriggers.h and use that define. (In reply to Marco Bonardo [::mak] from comment #116) > ::: toolkit/components/places/Database.cpp:2057 > (Diff revision 1) > > + rv = mMainConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING( > > + "ALTER TABLE moz_places " \ > > + "ADD COLUMN origin_id INTEGER REFERENCES moz_origins(id) ON DELETE CASCADE" > > + )); > > + NS_ENSURE_SUCCESS(rv, rv); > > + } > > Shouldn't the migration populate these tables too? Done, and updated test_current_from_v42.js to make sure it works (In reply to Marco Bonardo [::mak] from comment #123) > So, question, what do we break if we keep the table but empty it on > migration to this new version? Who is today using moz_hosts apart from > autofill? No one! Good news. So I updated patch 1 to empty moz_hosts as part of the migration. > > I added three new normalized tables: moz_prefixes, moz_hosts (a new version, > > probably to be renamed as mentioned above), moz_origins. They're normalized > > so there's no string duplication among them. But that does (basically) > > duplicate the rev_host columns we have in various tables. > > Could you please take your default profile (I suppose it's around 70MB like > mine) add these tables, populate them, and calculate the % of growth? > We'll need to adapt expiration. > We'll also need to make a plan for removal of the old data. Still need to do this, but I've updated my patches for review in the meantime. (In reply to Marco Bonardo [::mak] from comment #124) > ::: toolkit/components/places/SQLFunctions.cpp:1329 > (Diff revision 1) > > + int32_t oldFrecency = aArgs->AsInt32(1); > > + int32_t newFrecency = aArgs->AsInt32(2); > > + > > + nsNavHistory* navHistory = nsNavHistory::GetHistoryService(); > > + NS_ENSURE_STATE(navHistory); > > + navHistory->UpdateFrecencyStats(placeID, oldFrecency, newFrecency); > > I'm not 100% sure this is thread-safe, because the history service is not we > have the const history service to be used off the main thread. > From the const service you could likely fire a runnable/runnablemethod on > whatever thread you prefer. I changed this to dispatch a runnable on the main thread. The runnable updates the stats and saves them to preferences. (Previously I had a runnable that only saved them to preferences, so now I'm doing both things in the runnable.) However, I'm still setting a member variable in this method, mWeakUpdateFrecencyStatsRunnable. I think that should be OK? The only thing calling this method is whatever thread the database is using when the trigger runs. If that's a problem, then I could forget about trying to batch frecency stats updates into a single runnable and just dispatch a new runnable per frecency change.
(In reply to Drew Willcoxon :adw from comment #141) > (In reply to Marco Bonardo [::mak] from comment #124) > > ::: toolkit/components/places/SQLFunctions.cpp:1329 > > (Diff revision 1) > > > + int32_t oldFrecency = aArgs->AsInt32(1); > > > + int32_t newFrecency = aArgs->AsInt32(2); > > > + > > > + nsNavHistory* navHistory = nsNavHistory::GetHistoryService(); > > > + NS_ENSURE_STATE(navHistory); > > > + navHistory->UpdateFrecencyStats(placeID, oldFrecency, newFrecency); > > > > I'm not 100% sure this is thread-safe, because the history service is not we > > have the const history service to be used off the main thread. > > From the const service you could likely fire a runnable/runnablemethod on > > whatever thread you prefer. > > I changed this to dispatch a runnable on the main thread. The runnable > updates the stats and saves them to preferences. (Previously I had a > runnable that only saved them to preferences, so now I'm doing both things > in the runnable.) However, I'm still setting a member variable in this > method, mWeakUpdateFrecencyStatsRunnable. I think that should be OK? The > only thing calling this method is whatever thread the database is using when > the trigger runs. I thought about this more and remembered why I did it like that originally. The member variables that are updated are all atomic, so there's no danger of their being individually not thread safe. The only thread safety issue is reading the stats on the main thread while the individual member variables that are used to compute those stats are being written on the connection thread. But that's fine IMO -- a stat's numeric value may be slightly off when the connection thread is updating the members, and that's not really a problem. What do you think?
We are still missing a measure of the DB growth with the changes, to see if we need to adapt expiration. I'll proceed with the review in the meanwhile. (In reply to Drew Willcoxon :adw from comment #142) > I thought about this more and remembered why I did it like that originally. > The member variables that are updated are all atomic, so there's no danger > of their being individually not thread safe. The only thread safety issue > is reading the stats on the main thread while the individual member > variables that are used to compute those stats are being written on the > connection thread. But that's fine IMO -- a stat's numeric value may be > slightly off when the connection thread is updating the members, and that's > not really a problem. What do you think? Apart from the variables (solvable with atomics), there are 2 risks here: 1. the nsNavHistory destructor should not run out of the main-thread. It's possible the service manager keeps it alive long enough to avoid the problem... it's worth at least to have a MOZ_ASSERT checking mainthread in the destructor. Otherwise we should kungfudeathgrip it and ProxyFinalize. 2. In the future someone could change the UpdateFrecencyStats code in a non thread-safe manner. For this we could add a big warning into it.
(In reply to Drew Willcoxon :adw from comment #141) > I commented earlier that I intentionally didn't handle these specially -- > that's still true, but I went looked at the patch to check how such URLs > would be handled, and to make sure my parsing code is OK. They'll end up > getting treated as separate origins. So if you have http://example.com/, > http://user@example.com/, and http://user:pass@example.com/, then you will > end up with three different origins: http://example.com, > http://user@example.com, and http://user:pass@example.com. there is a security/privacy problem that sooner or later we'll hit, bug 130327. The fact we currently store user and password in history is a bug, and we'll now forward that bug to this new table. The problem complicates if we then allow to store bookmarks with credentials, but not history, since then we'll have only to merge some of these entries. I think the main concern is how expensive it will be in the future to identify and remove entries with credentials that are not bookmarks, but likely it's the same cost as removing them from moz_places. Thus, I suppose we can leave this problem to the future.
Comment on attachment 8930304 [details] Bug 1239708: Improve awesomebar autofill. Part 0: Core changes. https://reviewboard.mozilla.org/r/201438/#review233914 ::: toolkit/components/places/Database.cpp (Diff revision 27) > - rv = mMainConn->CreateAsyncStatement(NS_LITERAL_CSTRING( > - "UPDATE moz_hosts " > - "SET prefix = (" HOSTS_PREFIX_PRIORITY_FRAGMENT ") " > - "WHERE host IN (SELECT host FROM moz_migrate_v32_temp) " > - ), getter_AddRefs(updateHostsStmt)); > - NS_ENSURE_SUCCESS(rv, rv); From what I got you're leaving moz_hosts as is for now, just emptying it, thus this change doesn't seem necessary. and anyway a db that is upgraded to v32 would have a normal moz_hosts table. ::: toolkit/components/places/UnifiedComplete.js:716 (Diff revision 27) > - * @param spec > - * The text to modify. > - * @return the modified spec. > - */ > -function stripPrefix(spec) { > - ["http://", "https://", "ftp://"].some(scheme => { > + * @param str > + * The possible URL to strip. > + * @return If `str` is a URL, then [prefix, remainder]. Otherwise, ["", str]. > + */ > +function stripPrefix(str) { > + let match = /^[a-zA-Z]+:(?:\/\/)?/.exec(str); likely can memoize this regex too ::: toolkit/components/places/UnifiedComplete.js:2302 (Diff revision 27) > * @return an array consisting of the correctly optimized query to search the > * database with and an object containing the params to bound. > */ > get _urlQuery() { > - // We expect this to be a full URL, not just a host. We want to extract the > - // host and use that as a guess for whether we'll get a result from a URL > + // Assuming the search string is a URL, get the hostname, the part of the > + // search string up to either the path slash or the port colon. this comment could be a bit clearer ::: toolkit/components/places/UnifiedComplete.js:2313 (Diff revision 27) > - .substring(this._strippedPrefix.length, pathIndex) > - .toLowerCase().split("").reverse().join("") + "."; > - let searchString = stripPrefix( > - this._trimmedOriginalSearchString.slice(0, pathIndex).toLowerCase() + > - this._trimmedOriginalSearchString.slice(pathIndex) > - ); > + return [null, null]; > + } > + > + let bookmarked = > + this.hasBehavior("bookmark") && > + !this.hasBehavior("history"); this can take less space than 3 lines, and likely can be moved below before its use ::: toolkit/components/places/UnifiedComplete.js:2348 (Diff revision 27) > + query.push(SQL_URL_BOOKMARKED_QUERY); > + } else { > + query.push(SQL_URL_QUERY); > } > > - query.push({ > + query.push(opts); looks like you could rewrite this more compact using early returns: if (this._strippedPrefix) { opts.prefix = this._strippedPrefix; if (bookmarked) { return [SQL_URL_PREFIX_BOOKMARKED_QUERY, opts]; } return [SQL_URL_PREFIX_QUERY, opts]; } if (bookmarked) { return [... } return [... The same can be done in get _originQuery ::: toolkit/components/places/nsPlacesTables.h:178 (Diff revision 27) > + "CREATE TABLE moz_autofill_origins ( " \ > + "origin_id INTEGER PRIMARY KEY REFERENCES moz_origins(id) ON DELETE CASCADE, " \ > + "prefix TEXT NOT NULL, " \ > + "host TEXT NOT NULL, " \ > + "frecency INTEGER NOT NULL " \ > + ")" \ Ok, so as I said on IRC, I'm confused by the schema. It looks like moz_autofill_origins and moz_origins point at the same information, one normalized one denormalized. Like it is each of prefix and host are repeated 4 times in the db file (1 data + 1 index per table). likely not a big deal for prefix, but host is a bit larger. If effectively the cardinality of moz_autofill_origins is the same as moz_origins, the normalization is not saving anything, since then we also denormalize. At this point it's better to take the pill and accept we need the denormalized table. Though, we could still normalize prefix, it will be a very small table (I think?) and Sqlite should not suffer much from that joining.
Comment on attachment 8948567 [details] Bug 1239708: Improve awesomebar autofill. Part 1: Core follow-ons. https://reviewboard.mozilla.org/r/217974/#review233974 ::: toolkit/components/places/Database.h:22 (Diff revision 3) > #include "Shutdown.h" > #include "nsCategoryCache.h" > > // This is the schema version. Update it at any schema change and add a > // corresponding migrateVxx method below. > -#define DATABASE_SCHEMA_VERSION 43 > +#define DATABASE_SCHEMA_VERSION 44 this bitrotted a couple times, so now we are at version 46. Sorry. I don't know of other imminent schema changes... ::: toolkit/components/places/Database.cpp (Diff revision 3) > NS_ENSURE_SUCCESS(rv, rv); > > mozIStorageBaseStatement *stmts[] = { > expireOrphansStmt, > deleteHostsStmt, > - updateHostsStmt, probably should not touch this, it doesn't matter, but in any case v32 operates on old dbs that should have no problems with moz_host ::: toolkit/components/places/Database.cpp:1958 (Diff revision 3) > nsresult > +Database::MigrateV44Up() { > + MOZ_ASSERT(NS_IsMainThread()); > + > + nsresult rv; > + nsCOMPtr<mozIStorageStatement> stmt; We need a guessed measure of the time for this migration on a mechanical disk... do you have a way to measure that with your personal profile? Just because if it's extremely slow we may need to act in chunks or asynchronously) ::: toolkit/components/places/Database.cpp:1968 (Diff revision 3) > + ), getter_AddRefs(stmt)); > + if (NS_FAILED(rv)) { > + rv = mMainConn->ExecuteSimpleSQL(CREATE_MOZ_ORIGIN_PREFIXES); > + NS_ENSURE_SUCCESS(rv, rv); > + rv = mMainConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING( > + "INSERT OR IGNORE INTO moz_origin_prefixes (prefix) " \ OK, this won't be funny. the problem is that a user can upgrade to this scheme, then downgrade (because the browser crashes on startup or has graphical problems), use the old version for a while, then upgrade again... That means it's true the migration should not recreate the table, but it will regardless have to add entries that the old version didn't add. Fwiw, you could just use origin_id to tell both if origin_id and tables must be created... In any case data addition has to happen.
Attachment #8948567 - Flags: review?(mak77)
Blocks: 1441561
Comment on attachment 8948570 [details] Bug 1239708: Improve awesomebar autofill. Part 4: Frecency stats. https://reviewboard.mozilla.org/r/217980/#review234102 ::: toolkit/components/places/SQLFunctions.cpp:1329 (Diff revision 3) > + uint32_t numArgs; > + nsresult rv = aArgs->GetNumEntries(&numArgs); > + NS_ENSURE_SUCCESS(rv, rv); > + MOZ_ASSERT(numArgs == 3); > + > + int64_t placeID = aArgs->AsInt64(0); nit: let's use the usual camelCase even in cpp, old code did it differently but it's unimportant. ::: toolkit/components/places/UnifiedComplete.js:2036 (Diff revision 3) > > _addAutofillMatch(autofilledValue, finalCompleteValue, frecency, extraStyles = []) { > + let threshold = > + PlacesUtils.history.frecencyMean + > + PlacesUtils.history.frecencyStandardDeviation; > + if (frecency < threshold) { Is there a reason to not filter on frecency at the query level? ::: toolkit/components/places/nsNavHistory.h:504 (Diff revision 3) > + * @param aOldFrecency > + * The old value of the frecency. > + * @param aNewFrecency > + * The new value of the frecency. > + */ > + void UpdateFrecencyStats(int64_t placeID, nit: ditto on camelCase ::: toolkit/components/places/nsNavHistory.cpp:276 (Diff revision 3) > // nsNavBookmarks::kGetChildrenIndex_Position = 19; > // nsNavBookmarks::kGetChildrenIndex_Type = 20; > // nsNavBookmarks::kGetChildrenIndex_PlaceID = 21; > > -PLACES_FACTORY_SINGLETON_IMPLEMENTATION(nsNavHistory, gHistoryService) > +static uint64_t > +GetUInt64Pref(const char *prefName) Considered this complication, is it worth to store these in prefs, rather than just have defines? Do you think we may want to tweak the values through Shield? ::: toolkit/components/places/nsNavHistory.cpp:666 (Diff revision 3) > aGUID, aHidden, > aLastVisitDate); > (void)NS_DispatchToMainThread(notif); > } > > +class PlacesUpdateFrecencyStatsRunnable : public Runnable I suspect using NewRunnableMethod would simplify the code, you wouldn't need a class and a weakptr, and you can pass arguments to the runnable ::: toolkit/components/places/nsNavHistory.cpp:690 (Diff revision 3) > +void > +nsNavHistory::UpdateFrecencyStats(int64_t placeID, > + int32_t aOldFrecency, > + int32_t aNewFrecency) > +{ > + if (placeID < 0) { should we also MOZ_ASSERT? Is this something that can happen in valid use cases? ::: toolkit/components/places/nsNavHistory.cpp:702 (Diff revision 3) > + } > + FrecencyStatsPair pair = { > + .oldFrecency = aOldFrecency, > + .newFrecency = aNewFrecency > + }; > + mFrecencyStatsPairs.AppendElement(pair); in any case, it looks like these should be set on the runnable, not on the history object? And at that point you could use ConstHistoryService, I suppose.
Attachment #8948570 - Flags: review?(mak77)
I'll skip reviewing the tests for now, until we are ok with the previous parts.
Just a status update -- working on addressing Marco's comments plus making some slight improvements. I'd post a WIP but I don't want to retrigger review requests. I should be able to request review again in a day or two.
(In reply to Marco Bonardo [::mak] from comment #146) > ::: toolkit/components/places/Database.cpp:1958 > (Diff revision 3) > > nsresult > > +Database::MigrateV44Up() { > > + MOZ_ASSERT(NS_IsMainThread()); > > + > > + nsresult rv; > > + nsCOMPtr<mozIStorageStatement> stmt; > > We need a guessed measure of the time for this migration on a mechanical > disk... do you have a way to measure that with your personal profile? > Just because if it's extremely slow we may need to act in chunks or > asynchronously) Are we set up to handle async or chunked migration steps? It doesn't look like it. Have we ever done that before? (Not arguing we should/shouldn't do this, just wondering how we would do it.) Also, there are existing migration functions that look O(COUNT(moz_places)) -- at least MigrateV40Up, MigrateV37Up, MigrateV33Up. Is there something different about this migration that warrants async/chunks beyond those older migrations?
(In reply to Drew Willcoxon :adw from comment #150) > Are we set up to handle async or chunked migration steps? It doesn't look > like it. Have we ever done that before? Favicons migration in 55 did async and chunked payloads migration. On the other side, the favicons data is far less critical. > Is there something different about this > migration that warrants async/chunks beyond those older migrations? That's something to be measured, when I wrote favicons migration I tested it on my profile and noticed it was creating a 200MB WAL journal for a 100MB db, and then trying to merge it, it was really expensive and slow. Chunking it made the wal stay around 20MB and all the process completed sooner. It's really just matter of measuring the time and I/O involved, and if it's acceptable just do a usual migration, otherwise it may need some ad-hoc trick.
Blocks: 1205528
Attachment #8948571 - Flags: review?(mak77)
Attachment #8948572 - Flags: review?(mak77)
I took some measurements on a 2009 Core 2 Duo Mac Mini with a 5400 rpm disk running Windows 10. I used about:telemetry for data, specifically the slow SQL info, browser hangs info, and firstLoadURI in simple measurements. Without my patch's migration, firstLoadURI is ~9s (this is a slow machine, everything is slow on it), there's generally no slow SQL, and there's no reported browser hangs. I do get slow SQL on apparently the second run after a new places.sqlite -- mostly for migrating favicons? e.g.: 4913ms SELECT id FROM moz_bookmarks WHERE guid = :guid 1940ms SELECT id, expire_ms, data, width, root FROM moz_icons WHERE fixed_icon_url_hash = hash()... 5880ms PRAGMA cache_size = -2048 All off the main thread. I'm including it here to maybe give a sense of how fast that takes on this machine. With my patch's migration, firstLoadURI is doubled, ~18s. There's a "10 sec" reported browser hang. There's several slow SQL statements all related to the migration, e.g.: 3383ms UPDATE moz_places SET origin_id = ... 833ms INSERT OR IGNORE INTO moz_origins ... 290ms CREATE INDEX IF NOT EXISTS moz_places_originidindex... 1127ms UPDATE moz_origins SET frecency = ... 3767ms COMMIT 75ms PRAGMA cache_size = -2048 All on the main thread of course. So it seems like even though this is a slow machine and the baseline is slow, there's a non-neglible perf impact compared to the baseline, and I should do this migration off the main thread. What do you think?
Flags: needinfo?(mak77)
My places.sqlite before migration: * 52.4 MiB * select count(*) from moz_places => 114820 After: * 57.7 MiB * select count(*) from moz_origins => 8128 * No shm or wal files
(In reply to Drew Willcoxon :adw from comment #154) > My places.sqlite before migration: > * 52.4 MiB > After: > * 57.7 MiB it's about a 10% increase, that means we should probably increase DATABASE_MAX_SIZE to 75MB We'll recover some space in the future when we fix bug 1402890
(In reply to Drew Willcoxon :adw from comment #153) > Without my patch's migration, firstLoadURI is ~9s (this is a slow machine, > everything is slow on it), there's generally no slow SQL, and there's no > reported browser hangs. I do get slow SQL on apparently the second run > after a new places.sqlite -- mostly for migrating favicons? e.g.: Hm favicon are only migrated updating from FF<55 to FF 55 or newer. Am I misunderstanding? > 4913ms SELECT id FROM moz_bookmarks WHERE guid = :guid > 1940ms SELECT id, expire_ms, data, width, root FROM moz_icons WHERE > fixed_icon_url_hash = hash()... > 5880ms PRAGMA cache_size = -2048 All of these look far worse than my worst expectations... Was this on an optimized build? Debug builds are crazily slower in Sqlite because of SQLITE_DEBUG (not even Sqlite devs suggest using it for how slow it is). > 3383ms UPDATE moz_places SET origin_id = ... > 833ms INSERT OR IGNORE INTO moz_origins ... > 290ms CREATE INDEX IF NOT EXISTS moz_places_originidindex... > 1127ms UPDATE moz_origins SET frecency = ... > 3767ms COMMIT > 75ms PRAGMA cache_size = -2048 If we leave alone the absolute value (18s is a bit insane) it doesn't look too bad compared to a normal startup. Maybe there's some space to optimize the first query yet, we can't do much about the final commit, it's an fsync, the system just has horrible I/O. Any idea if the WAL journal grew much during these executions? That would tell us if chunking may help. > So it seems like even though this is a slow machine and the baseline is > slow, there's a non-neglible perf impact compared to the baseline, and I > should do this migration off the main thread. What do you think? The problem with the async migration is that it takes more (Far more) in wall clock time, than doing the work synchronously. That means the user is more likely to interrupt it in the middle. For favicons it was less of a problem, because it's favicons, if some are missing nothing critical breaks. So, I'd first check if there's any space for optimizations in the migration queries, and whether chunking queries helps building a smaller WAL (Cheaper to merge). If we have to go the async path, we'll have interesting half-migration-done issues to think about (we'll have to do something liek setting a pref and continuing migration on next start until we're sure it's complete and so on)...
Flags: needinfo?(mak77)
(In reply to Marco Bonardo [::mak] (Away 23 Apr - 1 May) from comment #156) > Hm favicon are only migrated updating from FF<55 to FF 55 or newer. Am I > misunderstanding? No, maybe it was expiration or something else, but those statements were reported in the slow SQL. I didn't investigate. > > 4913ms SELECT id FROM moz_bookmarks WHERE guid = :guid > > 1940ms SELECT id, expire_ms, data, width, root FROM moz_icons WHERE > > fixed_icon_url_hash = hash()... > > 5880ms PRAGMA cache_size = -2048 > > All of these look far worse than my worst expectations... Was this on an > optimized build? Yes, but again everything on this machine is very slow > Any idea if the WAL journal grew much during these executions? That would > tell us if chunking may help. There was no wal file after I migrated and quit Firefox. Is there wal journaling in the main places.sqlite itself and if so how can I check? > If we have to go the async path, we'll have interesting half-migration-done > issues to think about (we'll have to do something liek setting a pref and > continuing migration on next start until we're sure it's complete and so > on)... Yeah. Maybe in this case it's not so awful because (1) we're only populating this table for autofill, which isn't critical, (2) the table can be populated using info that's already in the database, and (3) populating individual rows is idempotent -- there's no harm in re-populating the same moz_places more than once.
(In reply to Drew Willcoxon :adw from comment #157) > > Any idea if the WAL journal grew much during these executions? That would > > tell us if chunking may help. > > There was no wal file after I migrated and quit Firefox. Is there wal > journaling in the main places.sqlite itself and if so how can I check? The wal exists during operations and is merged on shutdown (thus disappears at that point), I don't have a very scientific way to track it, usually you can just observe its size changing in the file manager while the operation is ongoing. I'm sure there are better ways but didn't investigate them. > Yeah. Maybe in this case it's not so awful because (1) we're only > populating this table for autofill, which isn't critical Well, the address bar functionality is quite important for us. If later we discover the experience is sub par due to an incomplete migration, we'll have to pay that cost regardless. It would also mine our possibility in the future to reuse this data, for example to drop rev_host or drive things caring only about hosts. , (2) the table can > be populated using info that's already in the database, and (3) populating > individual rows is idempotent -- there's no harm in re-populating the same > moz_places more than once. True, but still there should be a point in time where we can call this data "reliable" and "coherent". Question, since I don't recall, is frecency of each host completely recalculated on every visit to one of its pages, or just incrementally calculated? If it's the former maybe you're right, not all the data is critical. Would it be cheaper in such a case to just create the entries WITHOUT frecency synchronously, and just populate their frecency asynchronously? IF so, we would have coherent origins data, and any missing frecency data would be filled with use
(In reply to Marco Bonardo [::mak] (Away 23 Apr - 1 May) from comment #158) > The wal exists during operations and is merged on shutdown (thus disappears > at that point), I don't have a very scientific way to track it, usually you > can just observe its size changing in the file manager while the operation > is ongoing. I'm sure there are better ways but didn't investigate them. Hmm, I can try to look at this, but it seems much less important if it's just temporary IMO. > Question, since I don't recall, is frecency of each host completely > recalculated on every visit to one of its pages, or just incrementally > calculated? If it's the former maybe you're right, not all the data is > critical. Yes, recalculated on every visit. The SQL in the trigger is: > "UPDATE moz_origins " \ > "SET frecency = ( " \ > "SELECT MAX(frecency) " \ > "FROM moz_places " \ > "WHERE moz_places.origin_id = moz_origins.id " \ > ") " \ > "WHERE id = NEW.origin_id; " \ > Would it be cheaper in such a case to just create the entries > WITHOUT frecency synchronously, and just populate their frecency > asynchronously? IF so, we would have coherent origins data, and any missing > frecency data would be filled with use It would be cheaper since it would be removing a (sync) step from the migration, but if we're populating frecency asyncly, I'm not sure why we wouldn't just populate/migrate everything asyncly. Autofill needs frecencies to work. And it seems doable to come up with a test/predicate for determining whether the migration has finished. Maybe we could keep moz_hosts and use basically the current autofill implementation while the async migration is happening. Once the migration finishes, switch to moz_origins and the new implementation/queries. Then at least autofill would keep working immediately on upgrade. But at some point the queries/algorithm would change, which would maybe be jarring for the user. That seems kind of overkill though IMO. The async migration would only take a matter of seconds on the slowest machine, wouldn't it? Hmm, we could do the above, and when the migration finishes, mark it as done. Then on the next startup, we would see that the migration has been marked done and we would switch to it from that point on. Instead of switching the implementation/queries while the app is running. That kind of seems like an all-around win except for having to keep two implementations/queries around, but maybe that's the smallest and best price to pay.
(In reply to Drew Willcoxon :adw from comment #159) > Hmm, I can try to look at this, but it seems much less important if it's > just temporary IMO. So, I mentioned it because a large WAL is exponentially more expensive than a small WAL to merge. With favicons I discovered chunking was faster because otherwise the WAL was growing up to 100MB (bigger than the db itself!) and merging it was taking a huge I/O effort. By chunking the WAL never went over 20MB. > It would be cheaper since it would be removing a (sync) step from the > migration, but if we're populating frecency asyncly, I'm not sure why we > wouldn't just populate/migrate everything asyncly. Because I'd like to have coherence of origins in relation to moz_places, rather than not knowing if given a place we have an entry in origins or not. > Maybe we could keep moz_hosts and use basically the current autofill > implementation while the async migration is happening. Once the migration > finishes, switch to moz_origins and the new implementation/queries. It sounds complex off-hand. More complex than migrating origins synchronously and calculating frecency async. > user. That seems kind of overkill though IMO. The async migration would > only take a matter of seconds on the slowest machine, wouldn't it? you measured the sync one at 9s, async is slower, could take 3 or 4 times more, depending on system load. > That kind of > seems like an all-around win except for having to keep two > implementations/queries around, but maybe that's the smallest and best price > to pay. it's a possibility but has downsides as well: 1. it's more complex because we must bring on 2 implementations 2. we'd have no clear data to know when it will be safe to remove the old code 3. while we keep both data, the db will be larger and will expire more history Personally I'd do this (then you have a closer view of all the things here, and may better evaluate): 1. on sync migration create all the origins with frecency -1 2. set a bool pref like "mustMigrateOriginsFrecency" 3. start asynchronously calculating the frecencies in chunks, when done clear the pref 4. on startup, if the pref is set, goto 3
(In reply to Marco Bonardo [::mak] (Away 23 Apr - 1 May) from comment #160) > Personally I'd do this OK, I'll give that a shot.
[I don't expect/want you to look at these while you're away this week, but they're ready for review again so I'm posting them now.] The major changes: * moz_origins is the only new table. It's basically what moz_autofill_origins was in the previous patches. * Substantial changes to the SQLFunctions.cpp part of part 1. Now using const_char_iterator since it seems const_iterator is deprecated. Also using narrow UTF-8 strings instead of wide strings. Other related changes/improvements. * Migration liked we discussed: moz_origins and moz_places.origin_id are populated syncly on migration. A pref is set on migration, and Database::InitSchema starts chunked async migration of moz_origins.frecency when the pref is set. The pref is unset when that migration finishes. I modeled this part off of your favicon migration runnable. I tested again with these patches on my Windows machine, and the perf on startup was the same, which is expected. I also saw that during startup/migration, a wal file appeared. It grew to 27 mb (half the size of the db). After migration finished, the wal file size went down to 4 mb, but a couple of times it actually increased to 32 mb. On subsequent runs/startups, the wal file was created again but it was empty. The wal file is removed on app quit in all cases. (In reply to Marco Bonardo [::mak] from comment #143) > Apart from the variables (solvable with atomics), there are 2 risks here: Yes, I see. I changed this so that the SQL function calls nsNavHistory::GetConstHistoryService()->DispatchFrecencyStatsUpdate(), which dispatches a runnable to the main thread that calls nsNavHistory->UpdateFrecencyStats(). So I got rid of the batching, and now we dispatch a runnable per frecency change. Note that we don't update frecency stats while frecency decay is happening (because the SQL trigger checks for that case). (In reply to Marco Bonardo [::mak] from comment #144) > The fact we currently store user and password in history is a bug, and we'll > now forward that bug to this new table. I modified the part-1 patch to ignore user:pass (which RFC 3986 calls "userinfo"). So we still may have them in moz_places, but not in this new moz_origins. I also modified the UnifiedComplete.js part of part 0 to strip user:pass from search strings. (In reply to Marco Bonardo [::mak] from comment #145) > Comment on attachment 8930304 [details] > Bug 1239708: Improve awesomebar autofill. Part 0: Core changes. > > https://reviewboard.mozilla.org/r/201438/#review233914 > > ::: toolkit/components/places/Database.cpp > (Diff revision 27) > > - rv = mMainConn->CreateAsyncStatement(NS_LITERAL_CSTRING( > > - "UPDATE moz_hosts " > > - "SET prefix = (" HOSTS_PREFIX_PRIORITY_FRAGMENT ") " > > - "WHERE host IN (SELECT host FROM moz_migrate_v32_temp) " > > - ), getter_AddRefs(updateHostsStmt)); > > - NS_ENSURE_SUCCESS(rv, rv); > > From what I got you're leaving moz_hosts as is for now, just emptying it, > thus this change doesn't seem necessary. and anyway a db that is upgraded to > v32 would have a normal moz_hosts table. The reason I removed this is because I removed the HOSTS_PREFIX_PRIORITY_FRAGMENT #define from nsPlacesTriggers.h, since it's not needed anymore. So I added it back directly to this migration function. > ::: toolkit/components/places/UnifiedComplete.js:716 > (Diff revision 27) > > - * @param spec > > - * The text to modify. > > - * @return the modified spec. > > - */ > > -function stripPrefix(spec) { > > - ["http://", "https://", "ftp://"].some(scheme => { > > + * @param str > > + * The possible URL to strip. > > + * @return If `str` is a URL, then [prefix, remainder]. Otherwise, ["", str]. > > + */ > > +function stripPrefix(str) { > > + let match = /^[a-zA-Z]+:(?:\/\/)?/.exec(str); > > likely can memoize this regex too Done, added `const REGEXP_STRIP_PREFIX` > ::: toolkit/components/places/UnifiedComplete.js:2302 > (Diff revision 27) > > * @return an array consisting of the correctly optimized query to search the > > * database with and an object containing the params to bound. > > */ > > get _urlQuery() { > > - // We expect this to be a full URL, not just a host. We want to extract the > > - // host and use that as a guess for whether we'll get a result from a URL > > + // Assuming the search string is a URL, get the hostname, the part of the > > + // search string up to either the path slash or the port colon. > > this comment could be a bit clearer Fixed > ::: toolkit/components/places/UnifiedComplete.js:2313 > (Diff revision 27) > > - .substring(this._strippedPrefix.length, pathIndex) > > - .toLowerCase().split("").reverse().join("") + "."; > > - let searchString = stripPrefix( > > - this._trimmedOriginalSearchString.slice(0, pathIndex).toLowerCase() + > > - this._trimmedOriginalSearchString.slice(pathIndex) > > - ); > > + return [null, null]; > > + } > > + > > + let bookmarked = > > + this.hasBehavior("bookmark") && > > + !this.hasBehavior("history"); > > this can take less space than 3 lines, and likely can be moved below before > its use Fixed > ::: toolkit/components/places/UnifiedComplete.js:2348 > (Diff revision 27) > > + query.push(SQL_URL_BOOKMARKED_QUERY); > > + } else { > > + query.push(SQL_URL_QUERY); > > } > > > > - query.push({ > > + query.push(opts); > > looks like you could rewrite this more compact using early returns: > > if (this._strippedPrefix) { > opts.prefix = this._strippedPrefix; > if (bookmarked) { > return [SQL_URL_PREFIX_BOOKMARKED_QUERY, opts]; > } > return [SQL_URL_PREFIX_QUERY, opts]; > } > if (bookmarked) { > return [... > } > return [... > > The same can be done in get _originQuery Fixed > ::: toolkit/components/places/nsPlacesTables.h:178 > (Diff revision 27) > > + "CREATE TABLE moz_autofill_origins ( " \ > > + "origin_id INTEGER PRIMARY KEY REFERENCES moz_origins(id) ON DELETE CASCADE, " \ > > + "prefix TEXT NOT NULL, " \ > > + "host TEXT NOT NULL, " \ > > + "frecency INTEGER NOT NULL " \ > > + ")" \ > > Ok, so as I said on IRC, I'm confused by the schema. > > It looks like moz_autofill_origins and moz_origins point at the same > information, one normalized one denormalized. > Like it is each of prefix and host are repeated 4 times in the db file (1 > data + 1 index per table). likely not a big deal for prefix, but host is a > bit larger. > > If effectively the cardinality of moz_autofill_origins is the same as > moz_origins, the normalization is not saving anything, since then we also > denormalize. At this point it's better to take the pill and accept we need > the denormalized table. > Though, we could still normalize prefix, it will be a very small table (I > think?) and Sqlite should not suffer much from that joining. I removed all the new tables except for moz_autofill_origins, and I renamed it moz_origins. So there's only the one denormalized table now. Thanks for your feedback. (In reply to Marco Bonardo [::mak] from comment #146) > Comment on attachment 8948567 [details] > Bug 1239708: Improve awesomebar autofill. Part 1: Core follow-ons. > > https://reviewboard.mozilla.org/r/217974/#review233974 > > ::: toolkit/components/places/Database.h:22 > (Diff revision 3) > > #include "Shutdown.h" > > #include "nsCategoryCache.h" > > > > // This is the schema version. Update it at any schema change and add a > > // corresponding migrateVxx method below. > > -#define DATABASE_SCHEMA_VERSION 43 > > +#define DATABASE_SCHEMA_VERSION 44 > > this bitrotted a couple times, so now we are at version 46. Sorry. > I don't know of other imminent schema changes... Fixed > ::: toolkit/components/places/Database.cpp > (Diff revision 3) > > NS_ENSURE_SUCCESS(rv, rv); > > > > mozIStorageBaseStatement *stmts[] = { > > expireOrphansStmt, > > deleteHostsStmt, > > - updateHostsStmt, > > probably should not touch this, it doesn't matter, but in any case v32 > operates on old dbs that should have no problems with moz_host Fixed -- this was related to the #define removal mentioned above > ::: toolkit/components/places/Database.cpp:1958 > (Diff revision 3) > > nsresult > > +Database::MigrateV44Up() { > > + MOZ_ASSERT(NS_IsMainThread()); > > + > > + nsresult rv; > > + nsCOMPtr<mozIStorageStatement> stmt; > > We need a guessed measure of the time for this migration on a mechanical > disk... do you have a way to measure that with your personal profile? > Just because if it's extremely slow we may need to act in chunks or > asynchronously) As we discussed > ::: toolkit/components/places/Database.cpp:1968 > (Diff revision 3) > > + ), getter_AddRefs(stmt)); > > + if (NS_FAILED(rv)) { > > + rv = mMainConn->ExecuteSimpleSQL(CREATE_MOZ_ORIGIN_PREFIXES); > > + NS_ENSURE_SUCCESS(rv, rv); > > + rv = mMainConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING( > > + "INSERT OR IGNORE INTO moz_origin_prefixes (prefix) " \ > > OK, this won't be funny. the problem is that a user can upgrade to this > scheme, then downgrade (because the browser crashes on startup or has > graphical problems), use the old version for a while, then upgrade again... > That means it's true the migration should not recreate the table, but it > will regardless have to add entries that the old version didn't add. Good point... I updated this migration function so that the INSERT, UPDATE, and creating the moz_places.origin_id index are always performed. > Fwiw, you could just use origin_id to tell both if origin_id and tables must > be created... I left this as-is because I think it's better to directly test the thing we need to do/create. For example, if something happens to the database between the points where the moz_origins table is created and moz_places is altered to add the origin_id column, we shouldn't skip altering moz_places because moz_origins exists (or some similar situation). (In reply to Marco Bonardo [::mak] from comment #147) > Comment on attachment 8948570 [details] > Bug 1239708: Improve awesomebar autofill. Part 4: Frecency stats. > > https://reviewboard.mozilla.org/r/217980/#review234102 > > ::: toolkit/components/places/SQLFunctions.cpp:1329 > (Diff revision 3) > > + uint32_t numArgs; > > + nsresult rv = aArgs->GetNumEntries(&numArgs); > > + NS_ENSURE_SUCCESS(rv, rv); > > + MOZ_ASSERT(numArgs == 3); > > + > > + int64_t placeID = aArgs->AsInt64(0); > > nit: let's use the usual camelCase even in cpp, old code did it differently > but it's unimportant. Fixed > ::: toolkit/components/places/UnifiedComplete.js:2036 > (Diff revision 3) > > > > _addAutofillMatch(autofilledValue, finalCompleteValue, frecency, extraStyles = []) { > > + let threshold = > > + PlacesUtils.history.frecencyMean + > > + PlacesUtils.history.frecencyStandardDeviation; > > + if (frecency < threshold) { > > Is there a reason to not filter on frecency at the query level? Good point, I added a new :frecencyThreshold param to the origin and URL queries. > ::: toolkit/components/places/nsNavHistory.h:504 > (Diff revision 3) > > + * @param aOldFrecency > > + * The old value of the frecency. > > + * @param aNewFrecency > > + * The new value of the frecency. > > + */ > > + void UpdateFrecencyStats(int64_t placeID, > > nit: ditto on camelCase Fixed > ::: toolkit/components/places/nsNavHistory.cpp:276 > (Diff revision 3) > > // nsNavBookmarks::kGetChildrenIndex_Position = 19; > > // nsNavBookmarks::kGetChildrenIndex_Type = 20; > > // nsNavBookmarks::kGetChildrenIndex_PlaceID = 21; > > > > -PLACES_FACTORY_SINGLETON_IMPLEMENTATION(nsNavHistory, gHistoryService) > > +static uint64_t > > +GetUInt64Pref(const char *prefName) > > Considered this complication, is it worth to store these in prefs, rather > than just have defines? Do you think we may want to tweak the values through > Shield? These values are the frecency stats, and they need to be persisted across app restarts. (Unless you think we should calculate them on each startup?) > ::: toolkit/components/places/nsNavHistory.cpp:666 > (Diff revision 3) > > aGUID, aHidden, > > aLastVisitDate); > > (void)NS_DispatchToMainThread(notif); > > } > > > > +class PlacesUpdateFrecencyStatsRunnable : public Runnable > > I suspect using NewRunnableMethod would simplify the code, you wouldn't need > a class and a weakptr, and you can pass arguments to the runnable Nice, thank you! While I was here, I updated DispatchFrecencyChangedNotification() to use NewRunnableMethod, too. > ::: toolkit/components/places/nsNavHistory.cpp:690 > (Diff revision 3) > > +void > > +nsNavHistory::UpdateFrecencyStats(int64_t placeID, > > + int32_t aOldFrecency, > > + int32_t aNewFrecency) > > +{ > > + if (placeID < 0) { > > should we also MOZ_ASSERT? Is this something that can happen in valid use > cases? I modified the SQL that calls this function to only call it when the ID is >= 0. And I changed the conditional to a MOZ_ASSERT. IIRC I saw this on try because it ended up getting called for some moz_place with a -1 ID. (I think I used to know what that means but I've forgotten.) And the reason that was a problem was because, for these -1 IDs, they would get called with an old frecency > 0 which triggered the MOZ_ASSERT(mFrecencyStatsCount > 0). > ::: toolkit/components/places/nsNavHistory.cpp:702 > (Diff revision 3) > > + } > > + FrecencyStatsPair pair = { > > + .oldFrecency = aOldFrecency, > > + .newFrecency = aNewFrecency > > + }; > > + mFrecencyStatsPairs.AppendElement(pair); > > in any case, it looks like these should be set on the runnable, not on the > history object? And at that point you could use ConstHistoryService, I > suppose. Fixed, now that I'm using NewRunnableMethod
All of your answers make sense, I can't review this now, but I will be on it early next week. Thanks.
There were some new failures on try. I've fixed them locally and will update the mozreview once I see that they're gone on try. https://treeherder.mozilla.org/#/jobs?repo=try&revision=83b87814f66e5f20a30099d2ded74318718a201f
Changes from the previous mozreview revision: * Fix latest test failures. * One failure in particular needs calling out because I had to make a non-trivial code change for it. browser_preferences_usage.js failed because the frecency stats prefs were accessed too much. "Too much" means > 15 times while the test ran. The prefs are read only once, on startup, so the problem must be that they are written every time a frecency changes. So I added back batching of pref writes. Only the pref writes are batched; changes to the stats member variables still aren't. I chose to simply write prefs 5 seconds after stats are updated. i.e., the first time stats are changed, I start a 5-second timer, and all subsequent stats changes within that 5 seconds are batched together and the prefs are written when the 5 seconds elapse. I don't think we need anything more complex than that. 5 seconds seems reasonable: long enough to batch many quick consecutive changes, short enough to most likely avoid data loss without having to use a shutdown blocker or other techniques. * Rebased on new tree to avoid bitrot.
(In reply to Code Review Bot [:reviewbot] from comment #182) > Error: 'addbookmark' is not defined. [eslint: no-undef] I don't know what this is talking about... It's addBookmark, not addbookmark, and it's defined in head_common.js. And lint was fine on try and locally. And this code hasn't changed since the last mozreview revision.
Comment on attachment 8930304 [details] Bug 1239708: Improve awesomebar autofill. Part 0: Core changes. https://reviewboard.mozilla.org/r/201438/#review247464 ::: toolkit/components/places/UnifiedComplete.js (Diff revision 29) > // Prefs are defined as [pref name, default value]. > const PREF_URLBAR_BRANCH = "browser.urlbar."; > const PREF_URLBAR_DEFAULTS = new Map([ > ["autocomplete.enabled", true], > ["autoFill", true], > - ["autoFill.typed", true], should be removed from firefox.js as well ::: toolkit/components/places/UnifiedComplete.js:262 (Diff revision 29) > + id > + FROM moz_origins > - WHERE host BETWEEN :searchString AND :searchString || X'FFFF' > + WHERE host BETWEEN :searchString AND :searchString || X'FFFF' > - AND frecency <> 0 > + AND frecency <> 0 > - ${conditions} > + ${conditions} > - ORDER BY frecency DESC > + UNION we can likely use UNION ALL here instead of UNION, it should be a bit more efficient, and we have a LIMIT 1 regardless ::: toolkit/components/places/UnifiedComplete.js:293 (Diff revision 29) > > -const SQL_BOOKMARKED_HOST_QUERY = bookmarkedHostQuery(); > +const SQL_ORIGIN_PREFIX_BOOKMARKED_QUERY = originQuery( > + `AND bookmarked > + AND prefix BETWEEN :prefix AND :prefix || X'FFFF'`, > + `(SELECT foreign_count > 0 FROM moz_places > + WHERE moz_places.origin_id = moz_origins.id)` doesn't this return multiple values, since it's likely there will be multiple places pointing to the same origin? also in SQL_ORIGIN_BOOKMARKED_QUERY ::: toolkit/components/places/nsPlacesTables.h:26 (Diff revision 29) > ", guid TEXT" \ > ", foreign_count INTEGER DEFAULT 0 NOT NULL" \ > ", url_hash INTEGER DEFAULT 0 NOT NULL " \ > ", description TEXT" \ > ", preview_image_url TEXT" \ > + ", origin_id INTEGER REFERENCES moz_origins(id) ON DELETE CASCADE" \ Are you sure this FK has the right direction? This ON DELETE CASCADE means removing from moz_origins will remove all the entries in moz_places. That sounds dangerous, especially considering we don't have other FK in place to cleanup leftovers referring to these places, and it may create orphans. Where are we using this? Maybe the original intent here was to have moz_origins entries go away when they are no more referenced in moz_places? And maybe this means we are lacking a test? ::: toolkit/components/places/nsPlacesTables.h:169 (Diff revision 29) > -#define CREATE_UPDATEHOSTSDELETE_TEMP NS_LITERAL_CSTRING( \ > - "CREATE TEMP TABLE moz_updatehostsdelete_temp (" \ > - " host TEXT PRIMARY KEY " \ > +// details. > +#define CREATE_UPDATEORIGINSDELETE_TEMP NS_LITERAL_CSTRING( \ > + "CREATE TEMP TABLE moz_updateoriginsdelete_temp ( " \ > + "origin_id INTEGER PRIMARY KEY, " \ > + "host TEXT " \ > ") WITHOUT ROWID " \ WITHOUT ROWID doesn't make sense anymore here since the primary key is integer, thus it's basically a rowid alias. ::: toolkit/components/places/nsPlacesTables.h:179 (Diff revision 29) > -#define CREATE_UPDATEHOSTSINSERT_TEMP NS_LITERAL_CSTRING( \ > - "CREATE TEMP TABLE moz_updatehostsinsert_temp (" \ > - " host TEXT PRIMARY KEY " \ > +#define CREATE_UPDATEORIGINSINSERT_TEMP NS_LITERAL_CSTRING( \ > + "CREATE TEMP TABLE moz_updateoriginsinsert_temp ( " \ > + "place_id INTEGER PRIMARY KEY, " \ > + "prefix TEXT NOT NULL, " \ > + "host TEXT NOT NULL " \ > ") WITHOUT ROWID " \ ditto ::: toolkit/components/places/nsPlacesTriggers.h:87 (Diff revision 29) > "END" \ > ) > > // See CREATE_PLACES_AFTERINSERT_TRIGGER. This is the trigger that we want > -// to ensure gets run for each distinct host that we insert into moz_places. > -#define CREATE_UPDATEHOSTSINSERT_AFTERDELETE_TRIGGER NS_LITERAL_CSTRING( \ > +// to ensure gets run for each origin that we insert into moz_places. > +#define CREATE_UPDATEORIGINSINSERT_AFTERDELETE_TRIGGER NS_LITERAL_CSTRING( \ nit: I find UPDATEnnnINSERT_AFTERDELETE name a little bit confusing... not your fault, but maybe we could name it better while here. Maybe just UPDATEORIGINS_AFTERDELETE
Attachment #8930304 - Flags: review?(mak77)
Comment on attachment 8948567 [details] Bug 1239708: Improve awesomebar autofill. Part 1: Core follow-ons. https://reviewboard.mozilla.org/r/217974/#review247506 ::: toolkit/components/places/Database.cpp:2300 (Diff revision 5) > return NS_OK; > } > > nsresult > +Database::MigrateV48Up() { > + MOZ_ASSERT(NS_IsMainThread()); I centralized this assert at the beginning of initSchema, so it's not necessary to check at every migration method ::: toolkit/components/places/Database.cpp:2302 (Diff revision 5) > > nsresult > +Database::MigrateV48Up() { > + MOZ_ASSERT(NS_IsMainThread()); > + > + nsresult rv; nit: declare at first use ::: toolkit/components/places/Database.cpp:2327 (Diff revision 5) > + "SELECT origin_id FROM moz_places; " > + ), getter_AddRefs(stmt)); > + if (NS_FAILED(rv)) { > + rv = mMainConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING( > + "ALTER TABLE moz_places " \ > + "ADD COLUMN origin_id INTEGER REFERENCES moz_origins(id) ON DELETE CASCADE; " ditto from the other review about the FK ::: toolkit/components/places/Database.cpp:2350 (Diff revision 5) > + // main data in moz_origins, prefix and host, are coherent in relation to > + // moz_places. > + Unused << Preferences::SetBool(PREF_MIGRATE_V48_FRECENCIES, true); > + > + // From this point on, nobody should use moz_hosts again. Empty it so that we > + // don't leak the user's history. ...but don't remove the table yet to support downgrades. ::: toolkit/components/places/Database.cpp:2385 (Diff revision 5) > + // Migration done. Clear the pref. > + Unused << Preferences::ClearUser(PREF_MIGRATE_V48_FRECENCIES); > + return NS_OK; > + } > + > + nsresult rv; nit: declare at first use
Attachment #8948567 - Flags: review?(mak77) → review+
Comment on attachment 8948570 [details] Bug 1239708: Improve awesomebar autofill. Part 4: Frecency stats. https://reviewboard.mozilla.org/r/217980/#review247528 ::: toolkit/components/places/nsINavHistoryService.idl:1400 (Diff revision 5) > */ > unsigned long long hashURL(in ACString aSpec, [optional] in ACString aMode); > + > + /** > + * The mean and standard deviation of all frecencies > 0 in the database. > + */ It's probably worth adding a longer description about how these are used and updated. ::: toolkit/components/places/nsNavHistory.cpp:597 (Diff revision 5) > PRTime aLastVisitDate) > { > MOZ_ASSERT(!aGUID.IsEmpty()); > + > + nsCOMPtr<nsIURI> uri; > + Unused << NS_NewURI(getter_AddRefs(uri), aSpec); I hope this won't be too expensive in case of multiple frecencies... but in those cases we likely dispatch manyFrecenciesChanges... ::: toolkit/components/places/tests/unit/test_frecency_stats.js:63 (Diff revision 5) > + mean(Object.values(frecenciesByURL))); > + Assert.equal(PlacesUtils.history.frecencyStandardDeviation, > + stddev(Object.values(frecenciesByURL))); > + > + // Bookmark URL 1. > + let bookmark = await addBookmark({ uri: NetUtil.newURI(urls[1]) }); The addbookmark eslint is probably due to the order of the patches, it's not defined yet at this point in the patch set.
Attachment #8948570 - Flags: review?(mak77) → review+
Comment on attachment 8948572 [details] Bug 1239708: Improve awesomebar autofill. Part 6: Browser tests. https://reviewboard.mozilla.org/r/217984/#review247532 ::: browser/base/content/test/urlbar/browser_urlbarDelete.js:6 (Diff revision 5) > add_task(async function() { > let bm = await PlacesUtils.bookmarks.insert({ parentGuid: PlacesUtils.bookmarks.unfiledGuid, > url: "http://bug1105244.example.com/", > title: "test" }); > > + Services.prefs.setBoolPref("browser.urlbar.autoFill", false); Is this fixing an intermittent? why was it not necessary before?
Attachment #8948572 - Flags: review?(mak77) → review+
Comment on attachment 8948571 [details] Bug 1239708: Improve awesomebar autofill. Part 5: xpcshell tests. https://reviewboard.mozilla.org/r/217982/#review247534 Nothing jumps at my eyes as particularly wrong, though it's a large patch will probably do another pass ::: toolkit/components/places/tests/head_common.js:939 (Diff revision 5) > + Assert.ok(!!aBookmarkObj.uri, "Bookmark object contains an uri"); > + let parentId = aBookmarkObj.parentId ? aBookmarkObj.parentId > + : PlacesUtils.unfiledBookmarksFolderId; > + > + let bm = await PlacesUtils.bookmarks.insert({ > + parentGuid: (await PlacesUtils.promiseItemGuid(parentId)), we are trying to move away from ids, please make the original object take a parentGuid instead of parentId. Since in the future we'll also add tags to the official bookmarking API, I'd also prefer if this addBookmark helper would be limited to cases where we actually need to add a bookmark with a keyword, so maybe it should move to a more restricted head than head_common.js (the original position made more sense imo). I prefer consumers to stick to official APIs than helpers when possible. ::: toolkit/components/places/tests/migration/test_current_from_v43.js:205 (Diff revision 5) > add_task(async function test_meta_exists() { > let db = await PlacesUtils.promiseDBConnection(); > await db.execute(`SELECT 1 FROM moz_meta`); > }); > + > +add_task(async function test_origins() { please add a separate test_current_from_v47.js file, It's ok if inside it you reuse placesv43.sqlite or any version before 47. ::: toolkit/components/places/tests/unifiedcomplete/head_autocomplete.js:136 (Diff revision 5) > } > > let actualStyle = result.style.split(/\s+/).sort(); > if (style) > Assert.equal(actualStyle.toString(), style.toString(), "Match should have expected style"); > - if (uri.spec.startsWith("moz-action:")) { > + if (uri && uri.spec.startsWith("moz-action:")) { in which case do we have a match without uri? ::: toolkit/components/places/tests/unifiedcomplete/head_autofill.js:1 (Diff revision 5) > +/* This Source Code Form is subject to the terms of the Mozilla Public Instead of defining a separate head, I think you could use defineLazyScriptGetter in head_autocomplete.js to lazy load addAutofillTasks from a js file when it's actually used. ::: toolkit/components/places/tests/unifiedcomplete/test_autofill_origins.js:13 (Diff revision 5) > + > +// "example.com/" should match http://example.com/. i.e., the search string > +// should be treated as if it didn't have the trailing slash. > +add_task(async function trailingSlash() { > + await PlacesTestUtils.addVisits([{ > + uri: NetUtil.newURI("http://example.com/"), it's not necessary anymore to create nsIURIs (and in any case when we do we should use Services.io, not NetUtil) ::: toolkit/components/places/tests/unit/test_origins.js:99 (Diff revision 5) > + checkDB([ > + ["http://", "example.com"], > + ["http://", "www.example.com"], > + ["http://", "www.www.example.com"], > + ]); > + await PlacesUtils.history.remove([ history.remove also accepts single urls, not just arrays. You can probably oneline most calls here. ::: toolkit/components/places/tests/unit/test_origins.js:412 (Diff revision 5) > + } > + checkDB([ > + ["http://", "example.com"], > + ["http://", "www.example.com"], > + ]); > + await PlacesUtils.bookmarks.remove(bookmarks[0].guid); you don't need to extract the guid, just pass the bookmark object to remove(). ::: toolkit/components/places/tests/unit/test_origins.js:455 (Diff revision 5) > + await cleanUp(); > +}); > + > + > +function checkDB(expectedOrigins) { > + let stmt = DBConn().createStatement(` Please use promiseDBConnection along with the Sqlite.jsm syntax if possible, we should in general limit usage of the old synchronous APIs. ::: toolkit/components/places/tests/unit/test_sql_function_origin.js:28 (Diff revision 5) > + "get_prefix": parts.slice(0, 2).join(""), > + "get_host_and_port": parts.slice(3, 5).join(""), > + "strip_prefix_and_userinfo": parts.slice(3).join(""), > + }; > + for (let [func, expectedValue] of Object.entries(funcs)) { > + let stmt = DBConn().createStatement(` ditto for avoiding synchronous storage APIs
Attachment #8948571 - Flags: review?(mak77)
(In reply to Marco Bonardo [::mak] from comment #184) > Comment on attachment 8930304 [details] > Bug 1239708: Improve awesomebar autofill. Part 0: Core changes. > > https://reviewboard.mozilla.org/r/201438/#review247464 > > ::: toolkit/components/places/UnifiedComplete.js > (Diff revision 29) > > // Prefs are defined as [pref name, default value]. > > const PREF_URLBAR_BRANCH = "browser.urlbar."; > > const PREF_URLBAR_DEFAULTS = new Map([ > > ["autocomplete.enabled", true], > > ["autoFill", true], > > - ["autoFill.typed", true], > > should be removed from firefox.js as well This is removed in the part-3 (front end) patch > ::: toolkit/components/places/UnifiedComplete.js:262 > (Diff revision 29) > > + id > > + FROM moz_origins > > - WHERE host BETWEEN :searchString AND :searchString || X'FFFF' > > + WHERE host BETWEEN :searchString AND :searchString || X'FFFF' > > - AND frecency <> 0 > > + AND frecency <> 0 > > - ${conditions} > > + ${conditions} > > - ORDER BY frecency DESC > > + UNION > > we can likely use UNION ALL here instead of UNION, it should be a bit more > efficient, and we have a LIMIT 1 regardless Done > ::: toolkit/components/places/UnifiedComplete.js:293 > (Diff revision 29) > > > > -const SQL_BOOKMARKED_HOST_QUERY = bookmarkedHostQuery(); > > +const SQL_ORIGIN_PREFIX_BOOKMARKED_QUERY = originQuery( > > + `AND bookmarked > > + AND prefix BETWEEN :prefix AND :prefix || X'FFFF'`, > > + `(SELECT foreign_count > 0 FROM moz_places > > + WHERE moz_places.origin_id = moz_origins.id)` > > doesn't this return multiple values, since it's likely there will be > multiple places pointing to the same origin? also in > SQL_ORIGIN_BOOKMARKED_QUERY Yes. Here's the query before my patch: let query = `/* do not warn (bug NA): not worth to index on (typed, frecency) */ SELECT :query_type, host || '/', IFNULL(prefix, 'http://') || host || '/', ( SELECT foreign_count > 0 FROM moz_places WHERE rev_host = get_unreversed_host(host || '.') || '.' OR rev_host = get_unreversed_host(host || '.') || '.www.' ) AS bookmarked, NULL, NULL, NULL, NULL, NULL, NULL, frecency FROM moz_hosts WHERE host BETWEEN :searchString AND :searchString || X'FFFF' AND bookmarked AND frecency <> 0 ${conditions} ORDER BY frecency DESC LIMIT 1`; My patch doesn't substantially change the `bookmarked` part. It just replaces the rev_host check with a moz_origins check. The reason this is OK is that this query is only used for matching origins. e.g., when you type "mozilla.org", we use this query, and it doesn't matter which mozilla.org page you have bookmarked. > ::: toolkit/components/places/nsPlacesTables.h:26 > (Diff revision 29) > > ", guid TEXT" \ > > ", foreign_count INTEGER DEFAULT 0 NOT NULL" \ > > ", url_hash INTEGER DEFAULT 0 NOT NULL " \ > > ", description TEXT" \ > > ", preview_image_url TEXT" \ > > + ", origin_id INTEGER REFERENCES moz_origins(id) ON DELETE CASCADE" \ > > Are you sure this FK has the right direction? > This ON DELETE CASCADE means removing from moz_origins will remove all the > entries in moz_places. That sounds dangerous, especially considering we > don't have other FK in place to cleanup leftovers referring to these places, > and it may create orphans. > Where are we using this? Maybe the original intent here was to have > moz_origins entries go away when they are no more referenced in moz_places? It seemed like the right direction to me, but I removed it (the `ON DELETE`) given your feedback here and since there's a trigger that removes from moz_origins when all its rows in moz_places go away. > ::: toolkit/components/places/nsPlacesTables.h:169 > (Diff revision 29) > > -#define CREATE_UPDATEHOSTSDELETE_TEMP NS_LITERAL_CSTRING( \ > > - "CREATE TEMP TABLE moz_updatehostsdelete_temp (" \ > > - " host TEXT PRIMARY KEY " \ > > +// details. > > +#define CREATE_UPDATEORIGINSDELETE_TEMP NS_LITERAL_CSTRING( \ > > + "CREATE TEMP TABLE moz_updateoriginsdelete_temp ( " \ > > + "origin_id INTEGER PRIMARY KEY, " \ > > + "host TEXT " \ > > ") WITHOUT ROWID " \ > > WITHOUT ROWID doesn't make sense anymore here since the primary key is > integer, thus it's basically a rowid alias. Fixed > ::: toolkit/components/places/nsPlacesTables.h:179 > (Diff revision 29) > > -#define CREATE_UPDATEHOSTSINSERT_TEMP NS_LITERAL_CSTRING( \ > > - "CREATE TEMP TABLE moz_updatehostsinsert_temp (" \ > > - " host TEXT PRIMARY KEY " \ > > +#define CREATE_UPDATEORIGINSINSERT_TEMP NS_LITERAL_CSTRING( \ > > + "CREATE TEMP TABLE moz_updateoriginsinsert_temp ( " \ > > + "place_id INTEGER PRIMARY KEY, " \ > > + "prefix TEXT NOT NULL, " \ > > + "host TEXT NOT NULL " \ > > ") WITHOUT ROWID " \ > > ditto Fixed > ::: toolkit/components/places/nsPlacesTriggers.h:87 > (Diff revision 29) > > "END" \ > > ) > > > > // See CREATE_PLACES_AFTERINSERT_TRIGGER. This is the trigger that we want > > -// to ensure gets run for each distinct host that we insert into moz_places. > > -#define CREATE_UPDATEHOSTSINSERT_AFTERDELETE_TRIGGER NS_LITERAL_CSTRING( \ > > +// to ensure gets run for each origin that we insert into moz_places. > > +#define CREATE_UPDATEORIGINSINSERT_AFTERDELETE_TRIGGER NS_LITERAL_CSTRING( \ > > nit: I find UPDATEnnnINSERT_AFTERDELETE name a little bit confusing... not > your fault, but maybe we could name it better while here. Maybe just > UPDATEORIGINS_AFTERDELETE Hmm, I agree that these names are a little confusing or at least harder to read than they could be. Can we leave this for a follow-up? "UPDATEORIGINS_AFTERDELETE" won't work because there are *two* triggers that that name could describe. And even though the names are hard to read, there's at least some logic to them, and they seem to be internally consistent. So if we renamed this one, in order to preserve that consistency we would need to rename most if not all the others too. (In reply to Marco Bonardo [::mak] from comment #185) > Comment on attachment 8948567 [details] > Bug 1239708: Improve awesomebar autofill. Part 1: Core follow-ons. > > https://reviewboard.mozilla.org/r/217974/#review247506 > > ::: toolkit/components/places/Database.cpp:2300 > (Diff revision 5) > > return NS_OK; > > } > > > > nsresult > > +Database::MigrateV48Up() { > > + MOZ_ASSERT(NS_IsMainThread()); > > I centralized this assert at the beginning of initSchema, so it's not > necessary to check at every migration method Fixed > ::: toolkit/components/places/Database.cpp:2302 > (Diff revision 5) > > > > nsresult > > +Database::MigrateV48Up() { > > + MOZ_ASSERT(NS_IsMainThread()); > > + > > + nsresult rv; > > nit: declare at first use Fixed > ::: toolkit/components/places/Database.cpp:2327 > (Diff revision 5) > > + "SELECT origin_id FROM moz_places; " > > + ), getter_AddRefs(stmt)); > > + if (NS_FAILED(rv)) { > > + rv = mMainConn->ExecuteSimpleSQL(NS_LITERAL_CSTRING( > > + "ALTER TABLE moz_places " \ > > + "ADD COLUMN origin_id INTEGER REFERENCES moz_origins(id) ON DELETE CASCADE; " > > ditto from the other review about the FK I removed the `ON DELETE` clause, as with the part-0 patch > ::: toolkit/components/places/Database.cpp:2350 > (Diff revision 5) > > + // main data in moz_origins, prefix and host, are coherent in relation to > > + // moz_places. > > + Unused << Preferences::SetBool(PREF_MIGRATE_V48_FRECENCIES, true); > > + > > + // From this point on, nobody should use moz_hosts again. Empty it so that we > > + // don't leak the user's history. > > ...but don't remove the table yet to support downgrades. Done > ::: toolkit/components/places/Database.cpp:2385 > (Diff revision 5) > > + // Migration done. Clear the pref. > > + Unused << Preferences::ClearUser(PREF_MIGRATE_V48_FRECENCIES); > > + return NS_OK; > > + } > > + > > + nsresult rv; > > nit: declare at first use Fixed (In reply to Marco Bonardo [::mak] from comment #186) > Comment on attachment 8948570 [details] > Bug 1239708: Improve awesomebar autofill. Part 4: Frecency stats. > > https://reviewboard.mozilla.org/r/217980/#review247528 > > ::: toolkit/components/places/nsINavHistoryService.idl:1400 > (Diff revision 5) > > */ > > unsigned long long hashURL(in ACString aSpec, [optional] in ACString aMode); > > + > > + /** > > + * The mean and standard deviation of all frecencies > 0 in the database. > > + */ > > It's probably worth adding a longer description about how these are used and > updated. Done > ::: toolkit/components/places/nsNavHistory.cpp:597 > (Diff revision 5) > > PRTime aLastVisitDate) > > { > > MOZ_ASSERT(!aGUID.IsEmpty()); > > + > > + nsCOMPtr<nsIURI> uri; > > + Unused << NS_NewURI(getter_AddRefs(uri), aSpec); > > I hope this won't be too expensive in case of multiple frecencies... but in > those cases we likely dispatch manyFrecenciesChanges... My patch just moves the NS_NewURI call, it doesn't add any more than were already there. > ::: toolkit/components/places/tests/unit/test_frecency_stats.js:63 > (Diff revision 5) > > + mean(Object.values(frecenciesByURL))); > > + Assert.equal(PlacesUtils.history.frecencyStandardDeviation, > > + stddev(Object.values(frecenciesByURL))); > > + > > + // Bookmark URL 1. > > + let bookmark = await addBookmark({ uri: NetUtil.newURI(urls[1]) }); > > The addbookmark eslint is probably due to the order of the patches, it's not > defined yet at this point in the patch set. Ah, thanks (In reply to Marco Bonardo [::mak] from comment #187) > Comment on attachment 8948572 [details] > Bug 1239708: Improve awesomebar autofill. Part 6: Browser tests. > > https://reviewboard.mozilla.org/r/217984/#review247532 > > ::: browser/base/content/test/urlbar/browser_urlbarDelete.js:6 > (Diff revision 5) > > add_task(async function() { > > let bm = await PlacesUtils.bookmarks.insert({ parentGuid: PlacesUtils.bookmarks.unfiledGuid, > > url: "http://bug1105244.example.com/", > > title: "test" }); > > > > + Services.prefs.setBoolPref("browser.urlbar.autoFill", false); > > Is this fixing an intermittent? why was it not necessary before? A couple of things going on here: (1) Without the patches, this test doesn't trigger autofill (because the URL isn't typed). (2) With the patches, autofill happens, and on macOS, `EventUtils.synthesizeKey("KEY_ArrowLeft", {altKey: true})` does not actually move the caret to the beginning like the test thinks it does, which causes subsequent failures. (It moves the caret between "words". Without the patches, that works fine because the text in the urlbar is a single word, "bug1105244". With the patch, there are multiple "words" separated by ".": "bug1105244.example.com".) I didn't realize (2) was a problem until your comment made me take a closer look at this test. I fixed (2) by changing altKey to metaKey. But that by itself still doesn't fix the test because the test later assumes that autofill didn't happen, so I fixed that part, too. (In reply to Marco Bonardo [::mak] from comment #188) > Comment on attachment 8948571 [details] > Bug 1239708: Improve awesomebar autofill. Part 5: xpcshell tests. > > https://reviewboard.mozilla.org/r/217982/#review247534 > > Nothing jumps at my eyes as particularly wrong, though it's a large patch > will probably do another pass > > ::: toolkit/components/places/tests/head_common.js:939 > (Diff revision 5) > > + Assert.ok(!!aBookmarkObj.uri, "Bookmark object contains an uri"); > > + let parentId = aBookmarkObj.parentId ? aBookmarkObj.parentId > > + : PlacesUtils.unfiledBookmarksFolderId; > > + > > + let bm = await PlacesUtils.bookmarks.insert({ > > + parentGuid: (await PlacesUtils.promiseItemGuid(parentId)), > > we are trying to move away from ids, please make the original object take a > parentGuid instead of parentId. > > Since in the future we'll also add tags to the official bookmarking API, I'd > also prefer if this addBookmark helper would be limited to cases where we > actually need to add a bookmark with a keyword, so maybe it should move to a > more restricted head than head_common.js (the original position made more > sense imo). I prefer consumers to stick to official APIs than helpers when > possible. I reverted this change, so now I don't touch this addBookmark function at all. The reason I moved this addBookmark helper is because I needed it for the frecency stats test, but that's in unit/, not in unifiedcomplete/, where addBookmark is defined. So now the frecency stats test adds a bookmark without using a helper function. > ::: toolkit/components/places/tests/migration/test_current_from_v43.js:205 > (Diff revision 5) > > add_task(async function test_meta_exists() { > > let db = await PlacesUtils.promiseDBConnection(); > > await db.execute(`SELECT 1 FROM moz_meta`); > > }); > > + > > +add_task(async function test_origins() { > > please add a separate test_current_from_v47.js file, It's ok if inside it > you reuse placesv43.sqlite or any version before 47. Done > ::: toolkit/components/places/tests/unifiedcomplete/head_autocomplete.js:136 > (Diff revision 5) > > } > > > > let actualStyle = result.style.split(/\s+/).sort(); > > if (style) > > Assert.equal(actualStyle.toString(), style.toString(), "Match should have expected style"); > > - if (uri.spec.startsWith("moz-action:")) { > > + if (uri && uri.spec.startsWith("moz-action:")) { > > in which case do we have a match without uri? None, IIRC I made this change because in some cases I'm not interested in checking the URI, only the value and comment > ::: toolkit/components/places/tests/unifiedcomplete/head_autofill.js:1 > (Diff revision 5) > > +/* This Source Code Form is subject to the terms of the Mozilla Public > > Instead of defining a separate head, I think you could use > defineLazyScriptGetter in head_autocomplete.js to lazy load addAutofillTasks > from a js file when it's actually used. Done > ::: > toolkit/components/places/tests/unifiedcomplete/test_autofill_origins.js:13 > (Diff revision 5) > > + > > +// "example.com/" should match http://example.com/. i.e., the search string > > +// should be treated as if it didn't have the trailing slash. > > +add_task(async function trailingSlash() { > > + await PlacesTestUtils.addVisits([{ > > + uri: NetUtil.newURI("http://example.com/"), > > it's not necessary anymore to create nsIURIs (and in any case when we do we > should use Services.io, not NetUtil) Fixed > ::: toolkit/components/places/tests/unit/test_origins.js:99 > (Diff revision 5) > > + checkDB([ > > + ["http://", "example.com"], > > + ["http://", "www.example.com"], > > + ["http://", "www.www.example.com"], > > + ]); > > + await PlacesUtils.history.remove([ > > history.remove also accepts single urls, not just arrays. > You can probably oneline most calls here. Done > ::: toolkit/components/places/tests/unit/test_origins.js:412 > (Diff revision 5) > > + } > > + checkDB([ > > + ["http://", "example.com"], > > + ["http://", "www.example.com"], > > + ]); > > + await PlacesUtils.bookmarks.remove(bookmarks[0].guid); > > you don't need to extract the guid, just pass the bookmark object to > remove(). Done > ::: toolkit/components/places/tests/unit/test_origins.js:455 > (Diff revision 5) > > + await cleanUp(); > > +}); > > + > > + > > +function checkDB(expectedOrigins) { > > + let stmt = DBConn().createStatement(` > > Please use promiseDBConnection along with the Sqlite.jsm syntax if possible, > we should in general limit usage of the old synchronous APIs. Done > ::: toolkit/components/places/tests/unit/test_sql_function_origin.js:28 > (Diff revision 5) > > + "get_prefix": parts.slice(0, 2).join(""), > > + "get_host_and_port": parts.slice(3, 5).join(""), > > + "strip_prefix_and_userinfo": parts.slice(3).join(""), > > + }; > > + for (let [func, expectedValue] of Object.entries(funcs)) { > > + let stmt = DBConn().createStatement(` > > ditto for avoiding synchronous storage APIs Done
Comment on attachment 8930304 [details] Bug 1239708: Improve awesomebar autofill. Part 0: Core changes. https://reviewboard.mozilla.org/r/201438/#review248182
Attachment #8930304 - Flags: review?(mak77) → review+
Comment on attachment 8948571 [details] Bug 1239708: Improve awesomebar autofill. Part 5: xpcshell tests. https://reviewboard.mozilla.org/r/217982/#review248186
Attachment #8948571 - Flags: review?(mak77) → review+
Some new TV failures on try...
Try all green again. I'll land this after the next merge. I think Mark is working on a bug with a migration, so it would probably be a good idea to let him land his first and then update my patches.
(In reply to Drew Willcoxon :adw from comment #218) > Try all green again. I'll land this after the next merge. I think Mark is > working on a bug with a migration, so it would probably be a good idea to > let him land his first and then update my patches. I think you're talking about bug 824502, if so, I think this is more critical to land sooner and I'm fine with rebasing on top of yours (plus mine hasn't got r+ yet). This is a bigger patch & has been in progress longer, so lets get it landed.
Oops, we had already merged. I missed the date on my calendar. One more rebase on the current tree and a try push and then I'll land this. https://treeherder.mozilla.org/#/jobs?repo=try&revision=e97142a1944e6db9d74ff2b28674d8cf06bfeb71
Pushed by dwillcoxon@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/41989bf3c107 Improve awesomebar autofill. Part 0: Core changes. r=mak https://hg.mozilla.org/integration/autoland/rev/918f531e11ce Improve awesomebar autofill. Part 1: Core follow-ons. r=mak https://hg.mozilla.org/integration/autoland/rev/37ed5dc61be4 Improve awesomebar autofill. Part 2: Non-core follow-ons. r=mak https://hg.mozilla.org/integration/autoland/rev/8b944decbb13 Improve awesomebar autofill. Part 3: Front-end changes. r=mak https://hg.mozilla.org/integration/autoland/rev/acca8e457b36 Improve awesomebar autofill. Part 4: Frecency stats. r=mak https://hg.mozilla.org/integration/autoland/rev/1338a75d206e Improve awesomebar autofill. Part 5: xpcshell tests. r=mak https://hg.mozilla.org/integration/autoland/rev/6f0272415280 Improve awesomebar autofill. Part 6: Browser tests. r=mak
Blocks: 1461691
Depends on: 1461753
Depends on: 1461736
While researching an unexpected behaviour bug, I stumbled over this bug/commit. Thank you guys for trying to improve the awesome bar! Unfortunately it isn't working 100% perfect yet: https://bugzilla.mozilla.org/show_bug.cgi?id=1462129
Depends on: 1462275
QA Contact: gwimberly
Depends on: 1462726
Depends on: 1462046
Depends on: 1463132
Depends on: 1463580
Depends on: 1464328
Depends on: 1463017
Depends on: 1464154
Depends on: 1464270
Depends on: 1466233
Depends on: 1467627
Depends on: 1467631
Depends on: 1467537
Depends on: 1469651
Depends on: 1470887
Depends on: 1474755
Depends on: 1471982
Depends on: 1478003
Depends on: 1481319
Depends on: 1478163
Depends on: 1482485
I've tested on Firefox 62 and I verified the dependencies that were Resolved Fixed and had qe-verify+/ni, I marked qe-verify- all the dependencies according to the discussions with Drew in bugs/emails. There still are 4 new/unresolved bugs which will be resolved in later releases, not in 62 (as discussed with Drew via email).
Blocks: 1485296
Bug 1408044 removed about:. Fair enough. But somehow, instead of showing the "Hmm. That address doesn’t look right." page, Firefox now goes on to search "about:" on the default search engine. Which doesn't seem desirable. I mozregressed this behavior to the landing of this bug. Before the landing, the "Hmm. That address doesn’t look right." page was still shown.
Flags: needinfo?(adw)
(In reply to Mike Hommey [:glandium] from comment #232) > Bug 1408044 removed about:. Fair enough. But somehow, instead of showing the > "Hmm. That address doesn’t look right." page, Firefox now goes on to search > "about:" on the default search engine. Which doesn't seem desirable. I'm not sure it's not desirable, it's not less useful than showing a not found page, and it doesn't look like a privacy hit. Probably a P5. Anyway, it's due to the change done to stripPrefix, before it was only stripping "http://", "https://", "ftp://", now it strips any scheme, as a consequence the actual search string became empty, hasSearchTerms is now false and we skip _matchUnknownUrl. We should evaluate whether the hasSearchTerms check before _matchUnknownUrl makes sense with the new behavior.
I don't think I agree that searching for about: (or http:) is undesirable. I'm not sure what's desirable but it doesn't seem worse than other options, and if anything better.
Flags: needinfo?(adw)
FWIW, I'd at least expect typing about: to have the same behavior as typing about:foo where foo is not a supported about: thing. And it doesn't, because about:foo shows "Hmm. That address doesn’t look right.". Even if about:foo sent to a search like about: now does, it seems completely weird, UX wise, that non supported addresses in an explicitly supported scheme work this way. BTW, moz:foo sends to https://blog.mozilla.org/opendesign/, and moz: to a search. That's probably undesirable as well. moz: sent to https://blog.mozilla.org/opendesign/ before.
(In reply to Mike Hommey [:glandium] from comment #235) > That's probably undesirable as well. moz: sent to > https://blog.mozilla.org/opendesign/ before. How is that expected or discoverable?
discoverability is kind of irrelevant. The point is: there's a protocol handler, and it's not being referred to when typing scheme:, while it might support said url. file: used to open file:///. Now it sends to a search.
Please file a separate bug, but as I said I don't think it has priority, it's very likely to only affect a minuscule percentage of tech users, as such it will end up being a P5 (patches welcome) bug. Clearly that will change as/if we receive more duplicate bugs.
Depends on: 1488879
(In reply to Marco Bonardo [::mak] from comment #238) > Please file a separate bug Bug 1489853 was filed and it looks related (probably worse due to keyword.enabled being disabled)
Depends on: 1489853
Depends on: 1489060
No longer depends on: 1489060
Depends on: 1494468
Depends on: 1507755
See Also: → 1513453
Regressions: 1591175
Regressions: 1604430
Regressions: 1613107
Blocks: 1630431
No longer depends on: 1466233
Depends on: 1597791
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: