Closed Bug 552023 Opened 10 years ago Closed 9 years ago

Kill Places partitioned temp tables

Categories

(Toolkit :: Places, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla2.0b9
Tracking Status
blocking2.0 --- betaN+
fennec - ---

People

(Reporter: mak, Assigned: sdwilsh)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fixed-in-places])

Attachments

(2 files, 18 obsolete files)

238.79 KB, patch
Details | Diff | Splinter Review
755 bytes, patch
Details | Diff | Splinter Review
There are nice reasons to do so, but we can't till we have both icons and visits adding async. Thus we need to kill LAZY_ADD.

Even if the tables avoid doing some I/O on visits adding, they make all queries unreadable, much more slower, and bookmarks adding causes a sync at each change, so the gain will be smaller than what we would like.
Once adding is async we should provide a non-temp version to Portable Firefox guys and Fennec and get feedback on perfs.
Depends on: 573492
Blocks: 449124
Blocks: 563538
Assignee: nobody → mak77
this bug should block because now that we have async visits we can get a really nice perf improvement here (in some area I expect a 20-30% improvement, a more global but smaller improvement in all the rest of Places). It will for sure reduce statements time, greatly reducing possibility of lock contentions (see bug 563538)
blocking2.0: --- → ?
will need a way for extensions (like Sync) to query temp tables when needed, but don't query them if they don't exist.
(In reply to comment #2)
> will need a way for extensions (like Sync) to query temp tables when needed,
> but don't query them if they don't exist.
We can keep the views in place I guess.
(In reply to comment #3)
> (In reply to comment #2)
> > will need a way for extensions (like Sync) to query temp tables when needed,
> > but don't query them if they don't exist.
> We can keep the views in place I guess.

that means keeping all triggers too, and temp tables. This is all Ts stuff that we could remove.
Blocks: 580825
Attached patch Code changes v1.0 (obsolete) — Splinter Review
saving work as a backup. it should work, but I'm still tweaking some query.
Attached patch Tests changes v1.0 (obsolete) — Splinter Review
Attached patch Code changes v1.1 (obsolete) — Splinter Review
Attachment #459207 - Attachment is obsolete: true
Attached patch Tests changes v1.1 (obsolete) — Splinter Review
Attachment #459208 - Attachment is obsolete: true
Attached patch Code changes v1.2 (obsolete) — Splinter Review
Attachment #459795 - Attachment is obsolete: true
Attached patch Tests changes v1.2 (obsolete) — Splinter Review
Attachment #459796 - Attachment is obsolete: true
I'm still investigating a couple oranges.
a timeout in test_history_expiration.js (dm test) that is visible only on the branch, but I can't reproduce it locally.
a failure in browser_tabMatchesInAwesomebar.js that could be fixed already by these patches.
I'm waiting for try server results now.
there is also a failure in test_history_removeAllPages.js due to a timing issue on when we fix invalid frecencies (sounds like it could run before our check)... actually we could even comment out the negative frecency check for now and file it as followup, if needed.
the download test_history_expiration.js failure that was on places branch has instead disappeared on tryserver... but I also see other failures on places branch that i've never seen on central, so hard to tell.
Attached patch Code changes v1.3 (obsolete) — Splinter Review
Attachment #459917 - Attachment is obsolete: true
Attached patch Tests changes v1.3 (obsolete) — Splinter Review
Attachment #459918 - Attachment is obsolete: true
I've splitted out Bug 581660, Bug 581657 and Bug 581648 that I've found while working on this, and could be needed before landing this.

Notice I've seen a Ts increase on Places branch, I suspect it's an fsync effect. Now Ts takes in count first visit addition, without temp tables the addition happens on disk rather than in the memory table, adding a couple fsyncs.
blocking2.0: ? → betaN+
Blocks: 490035
Blocks: 407981
Depends on: 583852
Is this ready for review?
there are still 2 random oranges and the regression, so I'm reordering patches so that WAL can land first.
I'm unsure we can do much for the regression due to additional fsyncs, but I have to check it again on the branch with the new ordering
Attached patch Code changes v1.4 (obsolete) — Splinter Review
Attachment #460027 - Attachment is obsolete: true
Attached patch Tests changes v1.4 (obsolete) — Splinter Review
All oranges should be gone. I have to fix bug 584658 before this one though because it ended up being persistent.
Attachment #460028 - Attachment is obsolete: true
Depends on: 584658
Attached patch Code changes v1.5 (obsolete) — Splinter Review
Attachment #463124 - Attachment is obsolete: true
Attachment #463296 - Flags: review?(sdwilsh)
Attached patch Tests changes v1.5 (obsolete) — Splinter Review
Attachment #463126 - Attachment is obsolete: true
Attachment #463297 - Flags: review?(sdwilsh)
Whiteboard: [needs bug 584658 for a persistent orange][WAL will land later]
Comment on attachment 463297 [details] [diff] [review]
Tests changes v1.5

For review comments with expandable code context, please see http://reviews.visophyte.org/r/463297/.

on file: docshell/test/test_bug94514.html line 47
> // Because adding visits is async, we will not be notified imemdiately.

wow...how the hell did this test pass before? :(


on file: toolkit/components/places/tests/unit/test_history_removeAllPages.js line 93
>     // Memory table has been updated, disk table has not

this comment is wrong, right?


on file: toolkit/components/places/tests/unit/test_history_removeAllPages.js line 95
>       "SELECT h.id FROM moz_places h WHERE h.frecency > 0 " +
>         "AND EXISTS (SELECT id FROM moz_bookmarks WHERE fk = h.id) LIMIT 1");

This is a behavior change that I don't see why it is needed.  Can you please
explain it?


on file: toolkit/components/places/tests/unit/test_history_removeAllPages.js line 179
>   QueryInterface: function(iid) {
>     if (iid.equals(Ci.nsINavHistoryObserver) ||
>         iid.equals(Ci.nsISupports)) {
>       return this;
>     }
>     throw Cr.NS_ERROR_NO_INTERFACE;
>   }

XPCOMUtils please


r=sdwilsh
Attachment #463297 - Flags: review?(sdwilsh) → review+
(In reply to comment #22)
> on file: toolkit/components/places/tests/unit/test_history_removeAllPages.js
> line 95
> >       "SELECT h.id FROM moz_places h WHERE h.frecency > 0 " +
> >         "AND EXISTS (SELECT id FROM moz_bookmarks WHERE fk = h.id) LIMIT 1");
> 
> This is a behavior change that I don't see why it is needed.  Can you please
> explain it?

actually it's an orange, even if it never shown up, I can't tell which order onClearHistory handlers are called, so this could run too early or too late. FixInvalidIndices can run before it and the test would fail. Plus we are going to make frecency async so that would fail regardless as it was.
FixInvalidFrecencies, I meant.
Comment on attachment 463296 [details] [diff] [review]
Code changes v1.5

This only took me like three hours.  That's not so bad at all!

For review comments with expandable code coverage, please see http://reviews.visophyte.org/r/463296/.

on file: toolkit/components/places/src/nsNavHistory.cpp line 364
> NS_IMPL_ISUPPORTS2(PlacesEvent, mozIStorageCompletionCallback, nsIRunnable)

nit: this doesn't follow our normal style


on file: toolkit/components/places/src/nsNavHistory.cpp line 846
>     // Visits triggers.
>     rv = mDBConn->ExecuteSimpleSQL(CREATE_HISTORYVISITS_AFTERINSERT_TRIGGER);
>     NS_ENSURE_SUCCESS(rv, rv);
>     rv = mDBConn->ExecuteSimpleSQL(CREATE_HISTORYVISITS_AFTERDELETE_TRIGGER);
>     NS_ENSURE_SUCCESS(rv, rv);

Hmm, did we have these before?


on file: toolkit/components/places/src/nsNavHistory.cpp line 1020
>   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>     "SELECT id, session, visit_date "
>     "FROM moz_historyvisits "
>     "WHERE place_id = (SELECT id FROM moz_places WHERE url = :page_url) "
>     "ORDER BY visit_date DESC "
>     "LIMIT 1 "

LIMIT 1 seems not needed here and adds extra opcodes


on file: toolkit/components/places/src/nsNavHistory.cpp line 1029
>   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>     "SELECT id FROM moz_historyvisits "
>     "WHERE place_id = :page_id "
>       "AND visit_date = :visit_date "
>       "AND session = :session "
>     "LIMIT 1 "

Likewise, here.  LIMIT 1 seems uneeded


on file: toolkit/components/places/src/nsNavHistory.cpp line 1696
>   if (!triggerExists) {

Please add a comment explaining that we only have to check one trigger because
both should have been removed in the past.


on file: toolkit/components/places/src/nsNavHistory.cpp line 4059
>   nsresult rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
>       "SELECT url, visit_date FROM moz_historyvisits v "
>       "JOIN moz_places h ON v.place_id = h.id "
>       "WHERE h.hidden <> 1 "
>       "ORDER BY visit_date DESC "
>       "LIMIT 1 "),

LIMIT 1 is not needed here either.


on file: toolkit/components/places/src/nsPlacesAutoComplete.js line 190
>   const SQL_BASE = "SELECT h.url, h.title, f.url, " + kBookTagSQLFragment + ", "
>                  +        "h.visit_count, h.typed, h.id, :query_type, "
>                  +        "t.open_count "
>                  + "FROM moz_places h "
>                  + "LEFT JOIN moz_favicons f ON f.id = h.favicon_id "
>                  + "LEFT JOIN moz_openpages_temp t ON t.place_id = h.id "
>                  + "WHERE h.frecency <> 0 "
>                  +   "AND AUTOCOMPLETE_MATCH(:searchString, h.url, "
>                  +                          "IFNULL(bookmark, h.title), tags, "
>                  +                          "h.visit_count, h.typed, parent, "
>                  +                          "t.open_count, "
>                  +                          ":matchBehavior, :searchBehavior) "
>                  +  "{ADDITIONAL_CONDITIONS} "
>                  + "ORDER BY h.frecency DESC, h.id DESC "
>                  + "LIMIT :maxResults";

oh wow this is easier to read!


on file: toolkit/components/places/src/nsPlacesAutoComplete.js line 237
>     return this._db.createAsyncStatement(

ugh...wow.  I thought I fixed this!


on file: toolkit/components/places/src/nsPlacesAutoComplete.js line 283
>       "/* do not warn (bug 487789) */ "

I just want to confirm that this warns still if we remove this, correct?


on file: toolkit/components/places/src/nsPlacesAutoComplete.js line 311
>       "/* do not warn (bug 487787) */ "

ditto here - I think doing the ORDER BY h.frecency means we shouldn't warn
anymore.


r=sdwilsh
Attachment #463296 - Flags: review?(sdwilsh) → review+
(In reply to comment #25)
mDBConn->ExecuteSimpleSQL(CREATE_HISTORYVISITS_AFTERINSERT_TRIGGER);
> >     NS_ENSURE_SUCCESS(rv, rv);
> >     rv = mDBConn->ExecuteSimpleSQL(CREATE_HISTORYVISITS_AFTERDELETE_TRIGGER);
> >     NS_ENSURE_SUCCESS(rv, rv);
> 
> Hmm, did we have these before?

yes!

> on file: toolkit/components/places/src/nsNavHistory.cpp line 1020
> >   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
> >     "SELECT id, session, visit_date "
> >     "FROM moz_historyvisits "
> >     "WHERE place_id = (SELECT id FROM moz_places WHERE url = :page_url) "
> >     "ORDER BY visit_date DESC "
> >     "LIMIT 1 "
> 
> LIMIT 1 seems not needed here and adds extra opcodes

is is gettin the most recent visit for that url, so it does not need to fetch more than 1 record. limiting usually helps perf in these cases because sqlite knows before to stop after first result.

> on file: toolkit/components/places/src/nsNavHistory.cpp line 1029
> >   rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
> >     "SELECT id FROM moz_historyvisits "
> >     "WHERE place_id = :page_id "
> >       "AND visit_date = :visit_date "
> >       "AND session = :session "
> >     "LIMIT 1 "
> 
> Likewise, here.  LIMIT 1 seems uneeded

probably here we can avoid it since it's hard that we have 2 visits at the same time, but covers that rare case

> on file: toolkit/components/places/src/nsNavHistory.cpp line 4059
> >   nsresult rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING(
> >       "SELECT url, visit_date FROM moz_historyvisits v "
> >       "JOIN moz_places h ON v.place_id = h.id "
> >       "WHERE h.hidden <> 1 "
> >       "ORDER BY visit_date DESC "
> >       "LIMIT 1 "),
> 
> LIMIT 1 is not needed here either.

looks like same as case 1, there are more visits for a single page, and we need just one

> on file: toolkit/components/places/src/nsPlacesAutoComplete.js line 283
> >       "/* do not warn (bug 487789) */ "
> 
> I just want to confirm that this warns still if we remove this, correct?

It should warn due to tags fragment

> on file: toolkit/components/places/src/nsPlacesAutoComplete.js line 311
> >       "/* do not warn (bug 487787) */ "
> 
> ditto here - I think doing the ORDER BY h.frecency means we shouldn't warn
> anymore.

I think we warn for the tags fragment, but will check
So, about the LIMIT stuff, I've found this comment from drh: http://www.mail-archive.com/sqlite-users@sqlite.org/msg35473.html
LIMIT can save memory if it's used with an ORDER BY that is not using an index. In our cases we should always be on an index and I can remove them.
Attached patch coalesced patch v1.6 (obsolete) — Splinter Review
Attachment #463296 - Attachment is obsolete: true
Attachment #463297 - Attachment is obsolete: true
http://hg.mozilla.org/mozilla-central/rev/b1e658a33221
and disabled test_bug500328.html with
http://hg.mozilla.org/mozilla-central/rev/83ae1826aa80
From what I see we try to add a page to moz_places twice (I guess from real visit and some other api, either registerOpenPage and/or pushstate), and the second addition fails, so visit is not added. Will file a bug for this.

Notice this also shown an increadible Ts increase on Linux, that we'll have to figure out. will file a bug for it as well, it could cause a backout though.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b4
Depends on: 585703
Depends on: 585704
Blocks: 585713
backed out both changes, due to Talos and stil la xpcshell test failure (typo), but mostly what is blocking this is the strange Ts regression on Linux.
http://hg.mozilla.org/mozilla-central/rev/d2f7b3ee3a73
http://hg.mozilla.org/mozilla-central/rev/a673a956d339
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [needs bug 584658 for a persistent orange][WAL will land later] → [Caused regressions that need investigation]
Attached patch coalesced patch v1.7 (obsolete) — Splinter Review
small unbitrot that applies on top of my long queue. Hope to be able to push something from that queue in the next hours.
Attachment #464068 - Attachment is obsolete: true
test_bug94514.html started failing intermittently after I pushed my patch to bug 580069.  I'm going to try to pull out the test fix from patch v1.7 and see if that fixes the problem.  If it does, I'll report back here.
I've been trying to pull out the fix for test_bug94514.html from this patch, but I've been causing much more orange on try than I've been fixing.

From the tryserver logs [1], it appears that the observer added in the test never gets removed.  (Notice that tests after test_bug94514.html fail *in* test_bug94514.html.)  Something must be wrong with the removeObserver call, since the line after it certainly executes [2].

I've been unable to reproduce this failure locally, unfortunately, so I don't have much more insight into what's going on.  But I'd really like to figure out how to fix it so I can resolve bug 580069.

[1] http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1282415594.1282416222.4180.gz
[2] http://hg.mozilla.org/try/pushloghtml?changeset=6c5c9a6c3ba4
looks like you did not apply the change to bug94514-postpage.html though
Ah, that would probably do it.  Thanks.
Blocks: 582703
Grabbing this since mak is busy.
Assignee: mak77 → sdwilsh
Status: REOPENED → ASSIGNED
Attached patch Code changes v1.5 (untested) (obsolete) — Splinter Review
This is the v1.5 patch just modified to apply to tip.  This does mean I need to address my review comments still.
Attachment #464835 - Attachment is obsolete: true
why not using v1.7?
Attached patch Test changes v1.5 (untested) (obsolete) — Splinter Review
Same as above, but tests this time.
fwiw I could have done additional fixed in v1.6 and v1.7, so please interdiff them with v1.5
(In reply to comment #38)
> why not using v1.7?
Largely because I didn't notice the difference and thought it was just a combined patch.  I noticed when I was uploading and had already spent two hours rebasing it. :(

(In reply to comment #40)
> fwiw I could have done additional fixed in v1.6 and v1.7, so please interdiff
> them with v1.5
Sadly, interdiff isn't going to be terribly useful
well 1.6 was combined, but unbitrotted and with your review comments fixed.Inded that's what we pushed.
1.7 was just a further unbitrot over 1.6.
Attached patch v1.7 (obsolete) — Splinter Review
v1.7 rebased.  This doesn't pass tests yet.
Attachment #474740 - Attachment is obsolete: true
Attachment #474742 - Attachment is obsolete: true
Only failing on unit\test_history_removeAllPages.js right now.  Out of time to debug, but I'll come back to this friday.
Attached patch fixups v1.0 (obsolete) — Splinter Review
Changes I used to get stuff passing.
(In reply to comment #45)
> Changes I used to get stuff passing.

These make sense, I thought I had fixed that FROM FROM typo in the last patch... looks like I did not. r=me to merge these changes to the patch.
test_bug94514.html got a double merge in your patch, my fix to that test has been already pushed by jlebar in another bug. So you should remove that change.
Minor other fixes that let tests pass locally (or at least tests in toolkit/components/places)
Attachment #474829 - Attachment is obsolete: true
Attachment #474843 - Attachment is obsolete: true
(In reply to comment #29)
> http://hg.mozilla.org/mozilla-central/rev/b1e658a33221
> and disabled test_bug500328.html with
> http://hg.mozilla.org/mozilla-central/rev/83ae1826aa80
> From what I see we try to add a page to moz_places twice (I guess from real
> visit and some other api, either registerOpenPage and/or pushstate), and the
> second addition fails, so visit is not added. Will file a bug for this.
Note that I'm still seeing this on windows on the places branch.  Commented in bug 585703 about that.
Trivial fix for test_clearHistory_shutdown.js that I'm going to roll into v1.9.
Depends on: 598721
Depends on: 598751
yeah, we won't receive sync anymore, the test fix can be merged.
The failure as I commented in bug 585703 must be something other, unless you still see a "constraint 19 failed" message
Depends on: 598781
And for the sake of interesting numbers, this is bug 573492 (WAL) and bug 552023 (kill temp tables) perf numbers: http://tinyurl.com/2dodkwm
Blocks: 599978
So, I'm renominating this for blocking because I think that since we now understand this problem a bit better and I'm starting to lean towards not blocking the release on this.  So let's start with what we know:
1) This blocks a bunch of bugs, only one of which is a blocker that absolutely can not happen without this (bug 582703).  I'm not even sure that needs to be a blocker.
2) This greatly reduces the complexity in all of our SQL queries for places, which makes our lives way easier.
3) This change nets us several performance wins:
   * Slight tp4 win on linux (1-2%)
   * Memory usage reduction on during tp4 on all platforms (0-6%)
   * tp4_shutdown on all platforms (4-30%)
   * ts on windows (2%) and 32-bit OS X (3%)
   * Slight ts_cold on 32-bit OS X (1%) and linux (0-2%)
     [note: these hold true for the generated profile tests too]
   * ts_cold_shutdown (0-16%)
     [note: these hold true for the generated profile tests too]   * txul on OS X (0-3%)
4) This change gives us some performance regressions
   * 39% tp4 regression on windows
   * ts regression on linux (18-25%)
   * ts_cold regression on 64-bit OS X (18%)
     [note: this turns into a win in ts_cold_generated_max_shutdown]
   * txul on windows (0-5%) and linux (0-2%)
5) Punting on this means we'll have to continue trying to maintain it and prevent it from bitrotting until we can get this into a workable state.

And what we don't know:
a) How much work the Sync team would also need to do to update themselves due to this change (I literally just realized this is a problem now, which is 
unfortunate.  I thought we'd be running Sync tests on our tinderbox...)
b) The cause of the tp4 regression on windows
c) The cause of the ts regression on linux, although increased number of fsyncs, even if they are not on the main thread, are highly suspect.
d) Why 32-bit and 64-bit OS X are getting different numbers.
e) What is wrong with test_bug500328.html and fix it.
f) What else might break from this that doesn't have test coverage, including add-ons that make raw SQL queries.
g) How much time it will take to get i) answers to (a)-(f) and ii) how much time solutions based on those answers might take.

What we don't know is what concerns me. I can see it taking a month or two of an engineer's time, and some of the questions would be answered far too late, like (f), to deal with at that time.  For these reasons, I'm leaning towards making this not block, but someone else should make that call.
blocking2.0: betaN+ → ?
And since I never put this in the bug, here are the perf numbers for this (on top of bug 573492 but not with it like comment 52):
http://tinyurl.com/234ewzo
(In reply to comment #53)
> a) How much work the Sync team would also need to do to update themselves due
> to this change (I literally just realized this is a problem now, which is 
> unfortunate.

Sync is ready for this ever since bug 583852 which got us from querying views back to tables + temp tables. Since this bug was very close to landing (and it did, for a brief while, land) we put the temp-less setup in for forwards compatibility (if not Sync would've been broken instantly).

> I thought we'd be running Sync tests on our tinderbox...)

We are. They're xpcshell-tests under services/sync/tests/. When they break they show up as orange X on tbpl as I painfully discovered myself recently ;)
(In reply to comment #55)
> Sync is ready for this ever since bug 583852 which got us from querying views
> back to tables + temp tables. Since this bug was very close to landing (and it
> did, for a brief while, land) we put the temp-less setup in for forwards
> compatibility (if not Sync would've been broken instantly).
Yeah, maybe that's why Places branch didn't go orange when I landed this then.

> We are. They're xpcshell-tests under services/sync/tests/. When they break they
> show up as orange X on tbpl as I painfully discovered myself recently ;)
To be clear, I meant the Places tinderbox, which may not be an exact copy of mozilla-central

Glad to know that (a) isn't an issue.
(In reply to comment #53)
> 4) This change gives us some performance regressions
>    * ts regression on linux (18-25%)
mak and I suspect this is related to an increased number of fsyncs, which would likely be mitigated by bug 599969 but there is no way we'd have time for that in 2.0.
The level of performance regression as currently stands is not something we can live with - and it's especially surprising because this change was supposed to improve performance!

Given that we don't know what caused the various problems, and the fact that these kinds of changes tend to result in many months of bugs, I'm sufficiently scared, and want to punt on it for Firefox 4.

However, this blocks 3 other blockers:

bug 582703
bug 519514
bug 563538

All of these are bad, and should be fixed. This makes me hesitant to unblock on this.

Maybe it'd be worth having Shawn or Marco spend a week in solitary to look at nothing but this?
blocking2.0: ? → betaN+
eep, reset the mobile flag somehow, and don't have privs to set it back - requesting re-blocking there.
tracking-fennec: --- → ?
(In reply to comment #58)
> However, this blocks 3 other blockers:
Sorry, I should have elaborated on these more...

> bug 582703
This bug blocks bug 563538 which is a blocker (and the reason why it became a blocker).  However, I just removed that dependency and renominated it (bug 563538 comment 5 for rationale).

> bug 519514
This would be accomplished by making an asynchronous bookmarks API.  It'd still be potentially slow even with this fixed because we'd still be doing disk I/O on the main thread.  Sadly, I missed this in my original assessment because I wasn't looking for the fennec blocking flag.  Regardless, fixing this bug won't change the approach that needs to be done to fix this bug.  It just means slightly more complicated queries, but we are already used to that.  In fact, I'm going to remove this bug from blocking it.  (see also bug 519514 comment 17)

> bug 563538
Everything in the dependency list for that bug is "oh this should help make this less of a problem".  No single thing there is going to make it go away, and missing one or two isn't likely to hurt.  The only way we can make that go away completely is to make *all* of places async, which is totally out of scope for Firefox 4.

I'm renoming this since I actually have rationale about the blockers here (down two, maybe all three) and dietrich's and my timezone offsets make it so that we can't discuss this in person.  I think the only point of contention here is ug 563538.
No longer blocks: placesAsyncBookmarks
blocking2.0: betaN+ → ?
Honestly, I'm having trouble understanding the last few comments. It seems like we have a known performance regression - if that's true, then it should block. Plain and simple.

Ultimately I agree with Dietrich, though, in that we should fix the dependent bugs first and then hopefully not need this fix for Firefox 4. It feels large, intensive and dangerous at this time.
It's not that cut and dry. Temp tables was a change in 3.6 that improved performance. But now we want to improve performance in further and broader ways, and that 3.6 change is holding us back.

Removing the 3.6 change (this bug) results in a fabulously contradictory mix of perf wins and losses, sometimes straddling platforms in double-digit deltas (eg: 2% tp4 win on linux, 39% tp4 regression on windows).
So is it the case that we have a known perf regression, or a reported-in-some instances regression? How bad is the regression? We might have to sequester Mak and sdwilsh, as you suggest.
(In reply to comment #61)
> Honestly, I'm having trouble understanding the last few comments. It seems like
> we have a known performance regression - if that's true, then it should block.
> Plain and simple.
It's a known performance regression in that we know we have one.  What we don't know, is what is causing it.  That will be at least a days worth of work, if not more to figure out.  The performance regression is also only related to the patch in this bug - it doesn't exist in the tree now.

> Ultimately I agree with Dietrich, though, in that we should fix the dependent
> bugs first and then hopefully not need this fix for Firefox 4. It feels large,
> intensive and dangerous at this time.
I'm not sure what you are referencing here to be honest.

(In reply to comment #63)
> So is it the case that we have a known perf regression, or a reported-in-some
> instances regression? How bad is the regression? We might have to sequester Mak
> and sdwilsh, as you suggest.
To restate: this patch will introduce a performance regression.  A very large one.  See http://tinyurl.com/234ewzo for all the details, or comment 53 for the summary (points (3) and (4)).
Windows Tp4 regression is what hurts, and it is so because of the way Tp4 works (a bunch of websites loaded in a short time) that is quite far from the common user's perspective. Personally I'd expect various UI Tsnappiness real benefits instead, but we trust Talos, so this can't land with that regression in.
I'd suggest to check where we will be when async frecency will land, if this will still be so bad, I fear there won't be time for this to make fx4, we could try 4.1.
(In reply to comment #65)
> Windows Tp4 regression is what hurts, and it is so because of the way Tp4 works
> (a bunch of websites loaded in a short time) that is quite far from the common
> user's perspective. Personally I'd expect various UI Tsnappiness real benefits
> instead, but we trust Talos, so this can't land with that regression in.
> I'd suggest to check where we will be when async frecency will land, if this
> will still be so bad, I fear there won't be time for this to make fx4, we could
> try 4.1.
Well, we are doing something that makes us take significantly longer to accomplish our work on windows that isn't the case on other platforms.  Even if talos doesn't measure what the user might actually see, it's measuring something here that we very likely care about.
Even better: the tp4 regression doesn't reproduce locally.  There are only two differences in my setup: I'm running win7 64-bit and the talos boxes run 32-bit, and the pageset is not what we test in production.  Gonna get the production page set now, but I'm not yet optimistic.
love to have it, but not blocking fenenc 2.0.
tracking-fennec: ? → 2.0-
So far I have been unable to reproduce the persistent orange on tinderbox locally or in a record & replay session.
(In reply to comment #53)
>    * ts regression on linux (18-25%)
Seeing 0 change in the number of fsyncs, and the number of reads and writes seems to be about the same between before and after. :(
(In reply to comment #53)
>    * txul on windows (0-5%) and linux (0-2%)
Likewise here on linux.  Sadfaces.  It is possible that this isn't a real regression though.
(In reply to comment #70)
> (In reply to comment #53)
> >    * ts regression on linux (18-25%)
> Seeing 0 change in the number of fsyncs, and the number of reads and writes
> seems to be about the same between before and after. :(

maybe a query that we run on startup that in the rewrite has become slower? Sounds strange but it's possible that sqlite optimizer is taking the wrong way like in the other case we saw.
(In reply to comment #72)
> maybe a query that we run on startup that in the rewrite has become slower?
> Sounds strange but it's possible that sqlite optimizer is taking the wrong way
> like in the other case we saw.
I would expect that to be cross platform regressions then since the optimizer is not platform specific.
http://hg.mozilla.org/projects/places/rev/676388cb7893
Whiteboard: [Caused regressions that need investigation] → [fixed-in-places]
Depends on: 602732
(In reply to comment #69)
> So far I have been unable to reproduce the persistent orange on tinderbox
> locally or in a record & replay session.
And it's not persistent orange anymore!  I haven't even seen it go orange yet.  Maybe we had a bad places tree before...
Duplicate of this bug: 602984
How soon will the glory that is this patch land on m-c?
(In reply to comment #77)
> How soon will the glory that is this patch land on m-c?
When the performance regressions are all taken care of.  I realize that's not a great estimate, but it's my number one focus right now.
Is there anything I can do to help?
(In reply to comment #79)
> Is there anything I can do to help?
I could use some help with the Ts regression on linux if you have linux hardware.  Also, if you can reproduce the Tp4 regression on windows, getting profiles may be helpful.  I can't reproduce it locally myself, which is making it hard to analyses.

Also, unclear why we have regressions on only 64-bit OS X (I have no insight on that at all).  If you want to jump on any of those, that would be helpful.
An interesting question: with temp tables we know PlacesDBFlush was initing Places pretty early. Do we know who is the king of the hill today? We could be initing Places at a point where we can avoid it. Did nsBrowserGlue take back the lead?
hm, or probably it's not nsBrowserGlue that listens for init-complete... I'm maybe thinking to tryBookmarkCharset and bug 582712 could help Ts quite a bit if that's the case.
(In reply to comment #81)
> An interesting question: with temp tables we know PlacesDBFlush was initing
> Places pretty early. Do we know who is the king of the hill today? We could be
> initing Places at a point where we can avoid it. Did nsBrowserGlue take back
> the lead?
Well, we add the startup test page to history anyway, so we are still going to init ourselves in the startup path, and there really is no way around that AFAICT.
yes but there is an interesting difference, the startup page is going to fire async stuff that can walk in parallel, while other sync stuff would hurt much more.
My current hunch on the tp numbers is that we are constantly idle, and therefore are doing expiration all the time.  Since we no longer have temp tables, that results in more writes.  I just pushed a patch to the try server that effectively disables the idle service (http://hg.mozilla.org/try/rev/a42e3a7b0d9c) to see if the Tp4 numbers go down.  Right now they are about 20-25% on windows with various other work we've landed on the places branch to fix these regressions:
http://tinyurl.com/23obk3a
if that would be true idle service would be completely messed up and we had noticed a regression when landing expiration. Actually expiration on idle should be running just once and then stopping any activity till 'back' is received. If idle would be notified multiple times I'd be pretty much sad. Also from my experience OSX is always reporting idle, but it doesn't show regression.
(In reply to comment #86)
> if that would be true idle service would be completely messed up and we had
> noticed a regression when landing expiration. Actually expiration on idle
> should be running just once and then stopping any activity till 'back' is
> received. If idle would be notified multiple times I'd be pretty much sad. Also
> from my experience OSX is always reporting idle, but it doesn't show
> regression.
I'll find out soon - builds just finished (http://hg.mozilla.org/try/rev/58adeeb9e0ca).
And...no significant change.  Back to do the drawing board on the tp4 regression :(
http://tinyurl.com/2fzktd8
Blocks: 518804
Blocks: 596546
Blocks: 412736
Blocks: 606460
blocking2.0: ? → betaN+
Depends on: 610442
Blocks: 614103
Blocks: 616556
(In reply to comment #89)
> http://hg.mozilla.org/mozilla-central/rev/676388cb7893
Thanks, but I was going to close all these bugs tomorrow with the m-c changesets ;)
Status: ASSIGNED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: mozilla2.0b4 → mozilla2.0b9
Attachment #476842 - Attachment description: v1.8 → v1.8 [Checked in: Comment 74+89]
Attachment #477632 - Attachment description: fix test_clearHistory_shutdown.js → fix test_clearHistory_shutdown.js [Checked in: Comment 74+89]
+
http://hg.mozilla.org/mozilla-central/rev/c8d8af0fcc38
"Fix test failure due to merge. _views are no longer around on the places branch."
Attachment #476842 - Attachment description: v1.8 [Checked in: Comment 74+89] → v1.8 [Checked in: Comment 74 & 89]
Attachment #477632 - Attachment description: fix test_clearHistory_shutdown.js [Checked in: Comment 74+89] → fix test_clearHistory_shutdown.js [Checked in: Comment 74 & 89]
Flags: in-testsuite+
Blocks: 524761
You need to log in before you can comment on or make changes to this bug.