Closed Bug 636393 Opened 14 years ago Closed 8 years ago

mozIAsyncHistory: accept string URIs in mozIPlaceInfo objects

Categories: Toolkit :: Places, defect
Severity: normal


RESOLVED WONTFIX

Reporter: philikon; Assignee: unassigned


We talked about accepting strings for the 'uri' attribute in mozIPlaceInfo objects but never implemented it. It turns out there would be a serious advantage to creating those thousands of URI objects in C++ rather than in JavaScript: they wouldn't have to be GC'd at some point (which is expensive for tens of thousands of objects).
There is also a serious disadvantage: nsIURIs are checked for validity, so you can't really pass in a bad URL string. Even if we accepted strings, we would still have to check that they are valid URIs by creating an nsIURI, and then you would lose the advantage.
Just as a simple example, without even considering bad URIs: passing in http://www.mozilla.com and http://www.mozilla.com/ would end up creating two separate entries. nsIURI ensures this won't happen. The same goes for encoded/decoded forms and such.
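To illustrate the normalization point: the standard web `URL` class applies much the same canonicalization as nsIURI does, so it works as a stand-in here (nsIURI itself is a Mozilla-internal interface and isn't used in this sketch).

```javascript
// Illustration only: URL is a stand-in for nsIURI's canonicalization.
function normalize(spec) {
  return new URL(spec).href;
}

// Both spellings canonicalize to the same string, so the history backend
// would store one place instead of two separate entries.
normalize("http://www.mozilla.com");  // → "http://www.mozilla.com/"
normalize("http://www.mozilla.com/"); // → "http://www.mozilla.com/"
```

Raw string comparison would treat the two spellings as different, which is exactly the duplicate-entry hazard described above.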
I'm aware of that. Anything that doesn't convert to an nsIURI (one that can be added, as in CanAddURI) shouldn't be accepted. And by that I mean it should trigger the callback with an error code, not throw an exception.
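A minimal sketch of the contract being proposed here, with hypothetical names (`updatePlace`, `ERROR_INVALID_URI` are illustrative, not the real mozIAsyncHistory API, and `URL` again stands in for nsIURI construction): a bad URI string reaches the completion callback as an error code instead of throwing at the call site.

```javascript
const ERROR_INVALID_URI = 1; // assumed error code, for illustration only

// Hypothetical single-place variant of updatePlaces(): validation failures
// are reported through the callback rather than thrown.
function updatePlace(place, callback) {
  let uri;
  try {
    uri = new URL(place.uri); // stand-in for nsIURI construction
  } catch (e) {
    callback(ERROR_INVALID_URI, place); // report the bad record, don't throw
    return;
  }
  callback(0, { ...place, uri: uri.href }); // success: normalized URI string
}
```

This shape would let a batch continue past one bad record, which is the concern raised later in this thread about aborting the whole batch.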
Then, if we are going to convert to nsIURI in the backend anyway, where is the perf gain of not doing that in the caller?
(In reply to comment #4)
> then, if we are going to convert to nsIURI in the backend, where is the perf
> gain of not doing that in the caller?

No need for JS GC to collect these objects (see comment 0). Sync creates way too many objects during a first sync right now, causing GC to step in and block the main thread while it collects them all. Anything we can do to avoid creating those objects in the first place (at least in a way that has the GC free them) would be great.
I find an API that adds something different from what I passed in confusing. The above example is clear: I pass http://domain but the API adds http://domain/; now suppose I wait for a notification in JS about the URL I added. There could be other cases where nsIURI changes what is being added. It sounds error-prone, but maybe I'm just too paranoid.
(In reply to comment #6)
> I find confusing a API that adds something different than what I passed in.
> The above example is clear I pass http://domain but the api adds
> http://domain/, then suppose I wait for notification in js about the url I
> added. There could be other cases where nsIURI changes what is being added.
> Sounds error prone, but maybe I'm just too paranoid.

I think we are fine as long as we document that it will create a URI object if a string is provided.

(In reply to comment #3)
> I'm aware of that. Anything that doesn't convert to an nsIURI (which can be
> added as in CanAddURI) shouldn't be accepted. And by that I mean it should
> trigger the callback with an error code, not throw an exception.

That's counter to how we handle other errors like this, though. If it doesn't convert, it'd be an illegal argument and throw.
(In reply to comment #7)
> That's counter to how we handle other errors like this though. If it doesn't
> convert, it'd be an illegal argument and throw.

I know. I'm increasingly thinking that this wasn't such a good idea. Aborting the whole batch due to one bad URI would completely break us. Good thing we marked this API as experimental, right? :)
(In reply to comment #8)
> I know. I'm increasingly thinking that this wasn't such a good idea. Aborting
> the whole batch due to one bad URI would completely break us. Good thing we
> marked this API as experimental, right? :)

The problem is that it'd be a big change in the back-end to report an error instead of throwing.
(In reply to comment #9)
> The problem is that it'd be a big change in the back-end to report an error
> instead of throwing.

Fair enough; I realize that now. Though in my mind this only changes the risk/benefit assessment and timeline for this bug. I still think we should do it.
Dunno; if we are really creating thousands of nsIURI objects between consecutive GC sessions, there is clearly something more broken than what we can fix by changing the API.
(In reply to comment #11)
> dunno, if we are really creating thousands of nsIURI objects between
> consecutive gc sessions, there is clearly something more broken than what we
> can fix by changing the API.

GC is going to be far more expensive than internal refcounting.
...although those same objects would be reflected to the GC anyway when we do our callbacks. Are we sure this would fix anything?
(In reply to comment #11)
> dunno, if we are really creating thousands of nsIURI objects between
> consecutive gc sessions, there is clearly something more broken than what we
> can fix by changing the API.

Not sure I'd call it broken. The constraints are (correct me if I'm wrong):
* on a first sync, we're syncing down thousands of history records (capped to 5000 on mobile)
* history URIs should be verified somehow (right now we use nsIURI)

The goal here is to get this without the GC churn that we currently have from creating those nsIURI objects in JavaScript. Using nsIURI objects here might be less than ideal even in C++; if we can avoid it while still getting the validation, fine by me. My point is that any kind of object-creation churn we can avoid in JS should be avoided.
(In reply to comment #13)
> ...although, those same objects would be reflected to the gc anyway when we do
> our callbacks. Are we sure this would fix anything?

Yep, that's why I wanted an explicit error callback, remember? The placeInfo object passed back to JS is useless to us if the call succeeded. I will file a separate bug for that.
(In reply to comment #14) > * on a first sync, we're syncing down thousands of history records (capped to > 5000 on mobile) Sure, this should be done in a timeframe adapt to the device. I don't think the scope is to give user all history in the first 10 seconds. 5000 visits could not be a lot if there are multiple visits to a page, but 5000 pages are a lot. Empirically, on mobile, I'd not add at a ratio greater than 500 pages/minute (clearly in bundles, like 20-30 pages). We got reported cases where places.sqlite-wal increased up to 70MB, a not acceptable value on mobile, this can happen only with large transactions, and with Sync disabled the situation was looking better (1MB wal). If GX has to deal with 1 thousands nsIURIs, I think we should start thinking to reduce the added pages ratio, and that's a more realistic and easy thing to do. Is there a bug to evaluate the ratio and figure out if we can do better (or worse)? > * history URIs should be verified somehow (right now we use nsIURI) Right, this should be feasible by both the backend and the caller, that's why we use nsIURI, it's transparent validation to both sides. To be clear, we already had this discussion in the past about cost of nsIURI, the conclusion was that the cost was paying the data safety gain and we continued making APIs with nsIURI. If we end up finding a cheaper way to validate URIs cool, I'm all for the change, but the generic "GC is expensive" cost is hard to compare, while the costs on the other side are clear: - you could still have to create nsIURIs for other history/bookmarks APIs - you could still receive nsIURIs from notifications - comparing nsIURIs can be really cheap, comparing urlstrings is not - no more transparent data-safety across the boundary
(In reply to comment #16) > (In reply to comment #14) > > * on a first sync, we're syncing down thousands of history records (capped to > > 5000 on mobile) > > Sure, this should be done in a timeframe adapt to the device. I don't think the > scope is to give user all history in the first 10 seconds. That's not what I hear from the mobile team... Faster is better for sure since this conserves battery on mobile devices ("race to suspend"). > 5000 visits could not be a lot if there are multiple visits to a page, > but 5000 pages are a lot. They are 5000 places, each of them having between 1 and 10 visits. We can debate whether it should be 1000 or 10000 places on mobile, but this doesn't change the > Empirically, on mobile, I'd not add at a ratio greater than 500 pages/minute > (clearly in bundles, like 20-30 pages). We got reported cases where > places.sqlite-wal increased up to 70MB, a not acceptable value on mobile, this > can happen only with large transactions, and with Sync disabled the situation > was looking better (1MB wal). Right now we call mozIAsyncHistory::updatePlaces() with batches of 50. Is that too big of a batch size? > If GX has to deal with 1 thousands nsIURIs, I > think we should start thinking to reduce the added pages ratio, and that's a > more realistic and easy thing to do. > Is there a bug to evaluate the ratio and figure out if we can do better (or > worse)? There's bug 636312 about tuning the batch sizes. Is that what you mean by "ratio"? > > * history URIs should be verified somehow (right now we use nsIURI) > > Right, this should be feasible by both the backend and the caller, that's why > we use nsIURI, it's transparent validation to both sides. > To be clear, we already had this discussion in the past about cost of nsIURI, > the conclusion was that the cost was paying the data safety gain and we > continued making APIs with nsIURI. 
If we end up finding a cheaper way to > validate URIs cool, I'm all for the change, but the generic "GC is expensive" > cost is hard to compare, while the costs on the other side are clear: I think it's easy to compare: relying on GC to free lots of objects means expensive operations once in a while. In C++ you can allocate and free as you go, potentially even reuse already allocated objects. But maybe I'm misunderstanding what you mean by "compare". > - you could still have to create nsIURIs for other history/bookmarks APIs The only other history API we use at this point is CanAddURI which I'd like to push down into mozIAsyncHistory with this very bug. > - you could still receive nsIURIs from notifications Sync only needs to know about records failing to apply. See bug 636603. > - comparing nsIURIs can be really cheap, comparing urlstrings is not Where would we be comparing nsIURIs? In the end you walk up to sqlite and query or insert based on a string anyway? > - no more transparent data-safety across the boundary Can you elaborate on this point?
(In reply to comment #17)
> Right now we call mozIAsyncHistory::updatePlaces() with batches of 50. Is that
> too big of a batch size?

Do you still do this while in a runBatched callback?
(In reply to comment #18)
> (In reply to comment #17)
> > Right now we call mozIAsyncHistory::updatePlaces() with batches of 50. Is
> > that too big of a batch size?
> Do you still do this while in a runBatched callback?

Yes. sdwilsh said that didn't matter for mozIAsyncHistory::updatePlaces(), so I left it in for the 3.5/3.6 code path. Was that wrong?
(In reply to comment #17)
> That's not what I hear from the mobile team... Faster is better for sure since
> this conserves battery on mobile devices ("race to suspend").

There are 3 constraints:
- not too many, to avoid killing IO and GC
- not too few, to avoid keeping the device always ON
- a reasonable number matching what the user expects (I don't think any user would expect to have 5000 pages available after 10 seconds, nor to have to wait 10 minutes to get the last week of history).

It would be nice to have 10 identical devices, try different rates that could satisfy the user experience, repeat multiple times, then measure battery drain and pick the best compromise.

> They are 5000 places, each of them having between 1 and 10 visits. We can
> debate whether it should be 1000 or 10000 places on mobile

I have an interesting question (or maybe even an enhancement request to be filed): Places adapts the number of pages it can retain to each device; does Sync make use of this value? Can we expose it better? The problem is that if the device can hold 10 000 entries and Sync adds 20 000, then we have to expire half of them, so adding half of those entries was useless work. I realize it's hard to decide whether the visits you are going to add are more or less "important" than the existing ones, though.

> Right now we call mozIAsyncHistory::updatePlaces() with batches of 50. Is that
> too big of a batch size?
> There's bug 636312 about tuning the batch sizes. Is that what you mean by
> "ratio"?

I think there are two important parameters to tweak:
- batch size
- pauses between batches

It's not possible to give numbers without real-life experimental data (battery life, IO, and memory usage) :(

> maybe I'm misunderstanding what you mean by "compare".

Yes, sorry. I mean "we save 10 milliseconds, we save 1% battery, we save 5% memory"; the benefit so far is hard to measure.
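The two parameters above can be sketched as a small driver; this is a hypothetical illustration, not Sync's actual code, and `splitBatches`/`applyWithPauses` and the 50/100ms defaults are made-up names and numbers (50 being the batch size mentioned in this thread).

```javascript
// Split an array of history records into fixed-size batches.
function splitBatches(records, batchSize) {
  const batches = [];
  for (let i = 0; i < records.length; i += batchSize) {
    batches.push(records.slice(i, i + batchSize));
  }
  return batches;
}

const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Apply batches sequentially with a pause between them, so IO and GC
// get breathing room instead of one huge transaction.
async function applyWithPauses(records, applyBatch,
                               { batchSize = 50, pauseMs = 100 } = {}) {
  for (const batch of splitBatches(records, batchSize)) {
    await applyBatch(batch); // e.g. one updatePlaces() call per batch
    await delay(pauseMs);
  }
}
```

Tuning would then mean varying `batchSize` and `pauseMs` against the measured battery, IO, and memory costs, which is exactly the experimental data the comment says is missing.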
I think my points are more about this as a general-purpose API rather than a synchronization-specific one. Generic add-ons benefit more than Sync from data coherence across all the APIs, mostly because less quality and knowledge goes into their code (for obvious reasons).
(In reply to comment #19)
> Yes. sdwilsh said that didn't matter for mozIAsyncHistory::updatePlaces() so I
> left it in for the 3.5/3.6 code path. Was that wrong?

Well, it holds open a transaction, so it might just be wrong on mobile (if you have multiple updatePlaces calls inside of it).
(In reply to comment #21)
> Well, it holds open a transaction, so it might just be wrong on mobile

How would it affect just mobile and not desktop?

> (if you have multiple updatePlaces calls inside of it).

We do. I can easily refactor our code to not do this. Tentatively filed bug 636631.
(In reply to comment #22)
> How would it affect just mobile and not desktop?

Less memory available.
Whiteboard: [places-next-wanted]
Depends on: 648346
Whiteboard: [places-next-wanted]
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX