Closed
Bug 968982
Opened 11 years ago
Closed 10 years ago
Investigate keeping less data for the seer to improve I/O performance
Categories
(Core :: Networking, defect)
RESOLVED
INVALID
People
(Reporter: u408661, Assigned: u408661)
Details
First note: we don't want to swing too far in the direction of reducing our prediction capabilities just to improve I/O perf, since that would make the feature useless. We're trying to find a good balance, here.
There are a few different approaches we can take, with different tradeoffs (the exact impact of those tradeoffs is yet to be seen - that's why we need to experiment).
1. "Diet Seer" - This approach drops full URIs entirely from the database, operating only on origins. Quick rundown of tradeoffs:
- Stores significantly less data (current db size is dominated by full URIs)
- To create more than 1 speculative connection (which we want to be able to do), we would have to fudge the number somehow, either by keeping track of an approximation in the db, or by just picking a number and opening that many connections.
- Reduced prediction precision for pages already visited (whether or not this makes a difference remains to be seen)
- Removes the ability to do prefetch predictions entirely (we definitely want to do this in the future, but it's not necessary now)
2. "Low-Cal Seer" - Similar to the above (we never store target URIs for predictions, only origins), but we do store source URIs (i.e., the URI in the URL bar) for increased precision on revisited pages. The tradeoffs here are the same as above, except we keep a little more data (still a significant reduction from the current scheme) in exchange for better precision on pages already visited.
3. "New Coke Seer" - This approach would give us all the advantages of the current approach with the exception of the ability to do prefetch predictions. Instead of storing full URIs for resources on a page, we would store a small (32-64 bit) hash of the URI, allowing for full-precision predictions without having to store full URIs. This also has the advantage that it could (relatively) easily be expanded to allow prefetch predictions in certain cases, with a storage increase that remains to be seen, but is guaranteed to be less than the current schema.
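To make option 3 concrete, here is a minimal sketch of the idea, using Python with zlib.crc32 as a stand-in for the mfbt hash; the table name and columns are hypothetical for illustration, not the actual predictor schema.

```python
import sqlite3
import zlib

def uri_hash(uri):
    # 32-bit hash of the URI; stand-in for the mfbt hashing algorithm.
    return zlib.crc32(uri.encode("utf-8")) & 0xFFFFFFFF

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page_resources (
        origin        TEXT NOT NULL,   -- origin of the loading page
        resource_hash INTEGER NOT NULL -- small hash instead of the full URI
    )
""")

# Record that a page on this origin loaded a resource.
conn.execute("INSERT INTO page_resources VALUES (?, ?)",
             ("https://example.com",
              uri_hash("https://cdn.example.com/app.js")))

# At prediction time, a candidate URI matches only if its hash matches,
# giving full-precision predictions without storing full URI strings.
row = conn.execute("SELECT COUNT(*) FROM page_resources WHERE resource_hash = ?",
                   (uri_hash("https://cdn.example.com/app.js"),)).fetchone()
```

The storage win comes from replacing arbitrarily long URI strings with a fixed 4-8 byte integer; the cost is that the hash can only confirm a URI we already have in hand, which is why prefetch (which needs the full target URI) falls out of this scheme.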
We should see how these approaches play out against each other *and* the current approach (which gets us some pretty awesome improvements on SpeedIndex - see https://groups.google.com/forum/#!searchin/mozilla.dev.platform/hurley/mozilla.dev.platform/tAnuc-J0DbY/snrGsOVW1MAJ ).
I will also consult with DRH on my proposed schemas for the above to make sure they're as smart as they can be.
This investigation won't involve any significant changes to transaction usage or space limiting code (except, in the case of the latter, as dictated by schema changes).
Comment 1•11 years ago
I'm happy to take a look at your schema proposals if you wish. Off-hand, option 3 seems the best approach, though I don't know why you can't do prefetch predictions with it, because I don't know the details of how predictions work (is there documentation somewhere?). Do you need partial fragments of URIs for that?
I ran some experiments on the places.sqlite db to figure out whether the mfbt hashing algorithm (the one we use for hashtables) would be usable for replacing URIs: out of 105,000 URIs I hit 2 collisions, so depending on how bad a collision would be for you, it may be an easy choice to implement. It's also quite fast. I will also experiment with the WITHOUT ROWID option in the coming weeks.
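That collision count is in line with the birthday approximation: among n uniform 32-bit hashes, the expected number of colliding pairs is roughly n(n-1)/2 divided by 2^32. A quick sanity check (assuming a uniformly distributed hash, which mfbt's is only approximately):

```python
def expected_collisions(n, bits=32):
    # Birthday approximation: expected number of colliding pairs
    # among n uniformly distributed `bits`-bit hash values.
    return n * (n - 1) / 2 / 2**bits

# For the 105,000 URIs in the experiment above:
print(round(expected_collisions(105_000), 2))  # roughly 1.28 expected pairs
```

So 2 collisions in 105,000 URIs is about what chance predicts for a 32-bit hash; widening to 64 bits drops the expectation to effectively zero for databases of this size.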
Marking invalid, as we no longer use sqlite for predictor backend.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INVALID