Open Bug 1375896 Opened 7 years ago Updated 1 year ago

Export Database::MaxUrlLength() to the JS modules

Categories

(Toolkit :: Places, task, P3)

task

Tracking

()

People

(Reporter: nanj, Unassigned)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [snt-scrubbed][places-tech-debt])

      No description provided.
Priority: -- → P1
Whiteboard: [fxsearch]
The scope of the bug is to expose Database::MaxUrlLength through nsINavHistoryService and then replace DB_URL_LENGTH_MAX in PlacesDBUtils
Likely also URI_LENGTH_MAX in nsNavHistory.h should change. Fwiw I don't think it makes sense to cut a uri, the result may just be broken, likely we should throw earlier.
Priority: P1 → P2
N.B., if these limits exist, they should be documented and applied more broadly, including being respected on iOS and Android. That's particularly true if you're changing limits or changing behavior -- you can break an existing Sync account in this way.
These limits always existed.
How does changing the limits break a sync account? wouldn't that be seen as a title change? Is it just due to the fact we use folder titles to reparent bookmarks? (but how common is a 256 chars folder title?)
Flags: needinfo?(rnewman)
(In reply to Marco Bonardo [::mak] from comment #4)

> These limits always existed.

We should still make sure they're documented as part of Sync! See Bug 1409777 as an example…

> How does changing the limits break a sync account? wouldn't that be seen as
> a title change? Is it just due to the fact we use folder titles to reparent
> bookmarks? (but how common is a 256 chars folder title?)

Content-based lookup usually isn't a problem, but hitting limits is.

Let's say I have some 1000-character data on the server.

I install a Firefox that has changed the limit, reducing it to 256.

I sign in to Sync. Places throws when we apply the existing records.

Worse: it might truncate and not tell Sync that the data changed.

Even worse: Places and Sync might proceed with a bookmark folder missing, leading to bookmarks being reparented or otherwise mashed up.

It's not OK for Firefox to reduce the set of acceptable Sync data. It can grow it, and it can accept-and-rewrite, but it can't start to reject previously valid records. That becomes an issue if these limits change.

Bug 1409777 touched on this: Firefox for iOS was able to create bookmarks that exceeded the URL length that Places would accept. iOS would store the bookmark, and Sync would store the bookmark, but it wouldn't apply on desktop. We fixed that by just truncating on iOS, because the limit was pretty high, and the desktop limit was long-standing.

If you want to truncate to 256 then this requires coordination across platforms and some consideration for how the change rolls out.
Flags: needinfo?(rnewman)
(In reply to Richard Newman [:rnewman] from comment #5)
> Let's say I have some 1000-character data on the server.
> 
> I install a Firefox that has changed the limit, reducing it to 256.
> 
> I sign in to Sync. Places throws when we apply the existing records.

This can happen for many other reasons, the url definition could slightly differ across platforms, and an url valid on platform A could be invalid on B. The same could be for other properties of the bookmark, a device may be unable to support non-ascii tags for example, for whatever reason.

> Worse: it might truncate and not tell Sync that the data changed.

Why would that be worse? if the title is not synced fully, there's still no critical information missing, apart from a 99%-non-important piece of the title.

> Even worse: Places and Sync might proceed with a bookmark folder missing,
> leading to bookmarks being reparented or otherwise mashed up.

This assumes that the folder title is over the limit, that is extremely unlikely, isn't it? Who would type and find useful a 256 chars bookmark title?

> It's not OK for Firefox to reduce the set of acceptable Sync data. It can
> grow it, and it can accept-and-rewrite, but it can't start to reject
> previously valid records. That becomes an issue if these limits change.

We usually don't discard data, at a maximum we crop it.
But even discarding data should be supported, because the devices can be so different there's no way to assume they all can understand the same exact data.

> Bug 1409777 touched on this: Firefox for iOS was able to create bookmarks
> that exceeded the URL length that Places would accept.

It could have been about other properties of the url, not just the length, for example suppose that the url parser changes (like it did recently) and the Desktop parser is updated sooner than the mobile parser (likely), then mobile could try to store on desktop urls that are no more valid for the desktop parser (I actually think something like that happened recently, even if I don't have the proof yet!).

> iOS would store the
> bookmark, and Sync would store the bookmark, but it wouldn't apply on
> desktop. We fixed that by just truncating on iOS, because the limit was
> pretty high, and the desktop limit was long-standing.

But this doesn't look like a critical issue, nor breaking a Sync account, as you said in comment 3.
While it's not great that a device misses something from another one, imo it's an acceptable outcome in edge-cases (like that url longer than 65536 chars, or the bookmark with a pointless long title).

I still fail to see a critical issue breaking a Sync account. I see an edge case causing a possible small data loss (on the synced device).

> If you want to truncate to 256 then this requires coordination across
> platforms and some consideration for how the change rolls out.

Sure, we can do this, let's start from bug 1421567, I'd like to limit the titles to 256 chars, because longer titles make no sense. I still didn't decide whether it should be a global limit, or only a limit for newly added visits.
Note that in general when cropping I would only change newer insertions, not the already existing ones.
Flags: needinfo?(rnewman)
(In reply to Marco Bonardo [::mak] from comment #6)

> This can happen for many other reasons, the url definition could slightly
> differ across platforms, and an url valid on platform A could be invalid on
> B. The same could be for other properties of the bookmark, a device may be
> unable to support non-ascii tags for example, for whatever reason.

IMO we should be trying to make that list of reasons as small as possible.

 
> > Worse: it might truncate and not tell Sync that the data changed.
> 
> Why would that be worse? if the title is not synced fully, there's still no
> critical information missing, apart from a 99%-non-important piece of the
> title.

Different behavior between platforms (e.g., different Awesomebar text matches). Altered reconciling behavior.

Convergence is a desirable property of a distributed storage system.

 
> > Even worse: Places and Sync might proceed with a bookmark folder missing,
> > leading to bookmarks being reparented or otherwise mashed up.
> 
> This assumes that the folder title is over the limit, that is extremely
> unlikely, isn't it?

If it's extremely unlikely that folder titles are long, why bother imposing a restrictive limit on them?


> Who would type and find useful a 256 chars bookmark
> title?

Users are always surprising. I can certainly imagine organized professionals — e.g., lawyers — having naming conventions that blow through 256 chars, let alone page titles.

Remember also that titles come from web pages, not just user typing.

There are 159 pages in my history with titles longer than that, including ones with useful information right up to the end:

"[@ java.lang.NullPointerException: Attempt to invoke virtual method ''boolean java.lang.String.startsWith(java.lang.String)'' on a null object reference at org.mozilla.gecko.home.BrowserSearch.access$1900(BrowserSearch.java)] - FennecAndroid 44.0b8 Crash Report - Report ID: aa12bdb8-1dca-41a2-a73b-626fd2160115"



> It could have been about other properties of the url, not just the length,
> for example suppose that the url parser changes (like it did recently) and
> the Desktop parser is updated sooner than the mobile parser (likely), then
> mobile could try to store on desktop urls that are no more valid for the
> desktop parser (I actually think something like that happened recently, even
> if I don't have the proof yet!).

IMO that's something that should cause a failure to *load*, not a failure to store; what we consider a valid URL is a property of the Sync object formats, not a property of Gecko!

Consider what will happen in that case if the desktop change is buggy and rejects valid URLs… indeed, we work around such parser differences on iOS, accommodating URLs that iOS considers invalid but desktop writes into Sync.

Mobile platforms also happily accept desktop bookmarks like "about:config", even though those URLs are meaningless.

Incidental client code changes over time should not, as a fundamental principle, alter which Sync records are valid.

Our clients should follow Postel's Law -- and for a system like Firefox Sync, that also means _preserving_ and round-tripping data.


> But this doesn't look like a critical issue, nor breaking a Sync account, as
> you said in comment 3.

Define "broken"! It would have failed to insert on desktop, and then the desktop bookmark tree would have a dangling pointer to a missing record. If that were a long folder name instead of a long URL, Places would have had to reparent bookmarks to accommodate the missing folder. That's user-visible breakage of the kind we've spent two years working to eliminate.


> While it's not great that a device misses something from another one, imo
> it's an acceptable outcome in edge-cases (like that url longer than 65536
> chars, or the bookmark with a pointless long title).

If you mean truncation, I agree (though I'd rather have fairly high limits for remote-sourced data). If you mean dropping records on the floor, I disagree.


> I still fail to see a critical issue breaking a Sync account. I see an edge
> case causing a possible small data loss (on the synced device).

Bear in mind that it's never just on the synced device -- in the case of a node reassignment, or a user restoring their device from Sync, this data needs to be reuploaded, or is no longer present elsewhere.


> Note that in general when cropping I would only change newer insertions, not
> the already existing ones.

Can we impose that restriction on data originating locally, rather than insertion from Sync?
Flags: needinfo?(rnewman)
(In reply to Richard Newman [:rnewman] from comment #7)
> > This assumes that the folder title is over the limit, that is extremely
> > unlikely, isn't it?
> 
> If it's extremely unlikely that folder titles are long, why bother imposing
> a restrictive limit on them?

I didn't phrase it well, it's extremely unlikely that it will be useful to the user.
Whether we should impose a limit on them is TBD. I'd first like to limit history titles.

> Remember also that titles come from web pages, not just user typing.

That's exactly why we should not allow pages to trash our space with pointless long titles.
I don't care much about user typing a long title, I care more about pages pushing long titles on us to make more likely to appear in Address Bar results, for example.

> There are 159 pages in my history with titles longer than that, including
> ones with useful information right up to the end:
> 
> "[@ java.lang.NullPointerException: Attempt to invoke virtual method
> ''boolean java.lang.String.startsWith(java.lang.String)'' on a null object
> reference at
> org.mozilla.gecko.home.BrowserSearch.access$1900(BrowserSearch.java)] -
> FennecAndroid 44.0b8 Crash Report - Report ID:
> aa12bdb8-1dca-41a2-a73b-626fd2160115"

How is likely you will find that page by typing the last piece of its title, that likely you didn't even see because all of our UIs only show a few chars?
Btw, 256 was just a guess pick, the point is the current 4096 chars limit has no utility and someone could abuse it.

> > Note that in general when cropping I would only change newer insertions, not
> > the already existing ones.
> 
> Can we impose that restriction on data originating locally, rather than
> insertion from Sync?

We could.
I was thinking as an initial step to just crop titles coming from the docshell visits, that should have no Sync consequences, apart from outgoing visits having a shorter title.
Though this leaves open the fact one could "abuse" Firefox on IOS storing tens of 4096-long titles, that Sync would copy back to desktop, because the normal history APIs wouldn't crop.
(In reply to Marco Bonardo [::mak] from comment #8)

> That's exactly why we should not allow pages to trash our space with
> pointless long titles.

Do they do that?

And if they do do that, how many of them are longer than 256 characters?


> How is likely you will find that page by typing the last piece of its title,
> that likely you didn't even see because all of our UIs only show a few chars?

At the risk of ratholing with N=1 examples, yes, I've found crash reports and bugs in my history by matching part of the filename, which is usually at the end of the title. (E.g., "VersioningDelegateHelper.java" => Bug 1392078)

And my 27" desktop display on my Mac can display the entirety of a page title like

"[@ java.lang.IllegalStateException: Expected to modify syncVersion for a guid, but did not at org.mozilla.gecko.db.BrowserProvider.bulkUpdateSyncVersions(BrowserProvider.java)] - FennecAndroid 57.0a1 Crash Report - Report ID: 4376cf17-b436-4722-a289-9f49a0170823"

which is 262 characters. (Throw some emoji in there and we can get up to 500 chars!)

The entire top bar of the Photon window shows title. A 30-inch display can probably display 320+ characters without altering any settings.

The reason I'm spending time discussing in this bug (and the other one!) is that while speed is a differentiator we are pursuing, _so is the awesomebar_, and so are the other ways we leverage user data.

It would be a shame to see a change degrade all of those other characteristics for an unmeasurable edge-case improvement in performance.


> Btw, 256 was just a guess pick, the point is the current 4096 chars limit
> has no utility and someone could abuse it.

That's probably true. I think another way of saying what you wrote is: the 4096 limit isn't excluding _anything_, so it's not a limit, right?

The question is: where should a limit be to maximize utility?

If I were to wildly speculate I'd say 1024, which accommodates all but a handful of titles I've seen in the wild, and leaves sufficient padding for non-Latin scripts — 
My 34" ultrawide sympathizes with your 27" screen and could very likely accomodate 500 chars.
I'd be fine with 512 or 1024 as a start. Obviously I'd prefer the stricter limit, while you'd prefer the wider one. Do you have some examples of those titles longer than 512? If they are just filled with SEO tricks, I'm not sure we should take the bait, it would make search results worse, rather than better.

How do we proceed from there to make Sync happy? If we limit cropping to newly added history, how do we manage things on other devices (Android and IOS)? Should we also fix them to do their own cropping on newly added history?
(In reply to Marco Bonardo [::mak] from comment #10)
> Do you have some examples of those
> titles longer than 512?

From my own places.sqlite, only a few. Instagram titles can be long and include emoji:

----
Nikki Richardson on Instagram: “Sorry for no activity lately I started a new job out west so I'm without any internet for 8 days at a time! so to make up for it I'm posting one of my favourite spider videos I got of a huntsmen eating another very unidentifiable spider #straya 
Hey it turns out that Bugzilla can't do emoji! 

Let's try that again.




(In reply to Marco Bonardo [::mak] from comment #10)
> Do you have some examples of those
> titles longer than 512?

From my own places.sqlite, only a few. Instagram titles can be long and include emoji:

----
Nikki Richardson on Instagram: “Sorry for no activity lately I started a new job out west so I'm without any internet for 8 days at a time! so to make up for it I'm posting one of my favourite spider videos I got of a huntsmen eating another very unidentifiable spider #straya I Took this on my iPad with the 10x Olloclip macro lens @australia _______________________________________________ _ #film #documentry #macro_vision #olloclipjapan #as_videos #macro #earthcapture #ausgeo #olloclipmacro #olloclip #ollocliplens @theellenshow #natgeo #nationalgeographic
----

Amazon listings, of course:

----
AmazonSmile: #HappyPrimeDay Special Offer JEBSENS - 4in1 3.1A Dual USB Car Charger, Cigarette Lighter Voltage Digital Panel Meter Volt Voltmeter Monitor for Auto Car Truck, with Blue LED Display - Displays Voltage, Amps and Internal Temperature (Fahrenheit) - Total of 3.1 amps, 15W, Compatible with iPhone 6, 6 Plus, 5s, 5c, 4s, 4, iPods, iPad, Samsung Galaxy Note 2, Note 3, S2 S3, S4, S5, HTC One, LG G3, Most Android/Windows Smart Cell Phones, GPS, Tablets, and Other USB-charged Devices (Black with BLUE LED): Cell Phones & Accessories
----

and scientific research papers can be 300+:

----
Hydrogen bonding in diols and binary diol–water systems investigated using DFT methods. II. Calculated infrared OH-stretch frequencies, force constants, and NMR chemical shifts correlate with hydrogen bond geometry and electron density topology. A reevaluation of geometrical criteria for hydrogen bonding - Klein - 2003 - Journal of Computational Chemistry - Wiley Online Library
----

We can speculate about others: breadcrumb trails in folder-based systems, crash reports, stock photo library keywords, map addresses…


> If they are just filled with SEO tricks, I'm not
> sure we should take the bait, it would make search results worse, rather
> than better.

N.B., Google penalizes sites that have over-long titles; I think that's a larger force than trying to game the awesomebar, so I would be surprised to see long titles outside of sites like Amazon.

https://moz.com/blog/long-title-tags


> How do we proceed from there to make Sync happy? If we limit cropping to
> newly added history, how do we manage things on other devices (Android and
> IOS)? Should we also fix them to do their own cropping on newly added
> history?

Yes, I think that's the best approach.

Essentially I'm suggesting a tiered system, inspired by the end-to-end principle:

- A storage and sync layer that has some rudimentary, and fairly stable, sanity limits (driven by Sync record sizes, column length limitations), but otherwise tries to honestly store and consistently move around whichever observations clients want to record.
- Clients that implement appropriate business logic on top.


Android already truncates URLs to 25,000 UTF-16 characters and titles to 255 UTF-16 characters (MAX_TITLE_LENGTH in browser.js) in its frontend messaging, because the message pipeline was expensive, but does not (as far as I can tell) truncate when writing to history or checking for visits via nsAndroidHistory.cpp. Presumably any change made to the Gecko history recording mechanism would apply there by default.

iOS truncates to match desktop's storage limits at the time: URLs 65K, titles 4K, descriptions 1K.
Blocks: 1433368
Severity: normal → S3

We should look at the various code points defining the url lengths and unify those.

Type: enhancement → task
Priority: P2 → P3
Whiteboard: [fxsearch] → [snt-scrubbed][places-tech-debt]
Severity: S3 → N/A

Tangentially related, because Desktop Places and Mobile Places really need to agree on this value, the Rust code defines this constant here.

You need to log in before you can comment on or make changes to this bug.