Investigate Glean.js overflow errors from Bedrock
Categories
(Data Platform and Tools :: Glean: SDK, task, P2)
Tracking
(Not tracked)
People
(Reporter: brosa, Assigned: brosa)
Details
After seeing a spike in events in Bedrock, we also noticed a spike in overflow errors on the glean.page_load event:
https://mozilla.cloud.looker.com/dashboards/1452?Event+Name=%22glean.page_load%22&App+Name=www.mozilla.org&Window+Start+Time=28+days&Channel=
We want to verify which version of Glean.js these errors are coming from and confirm that the issue itself is not with Glean.js.
Comment 1•1 year ago
After some investigation, we found that the errors are valid: they are caused by URLs coming from Bedrock (mostly on the dev server) that are over 500 bytes long. Glean.js is behaving as expected, and URLs of this length will always be truncated.
https://sql.telemetry.mozilla.org/queries/97350/source
I talked with agibson and we are going to try to find out if there is a fixed length, or at least a rough idea, of how long these URLs can get. If they are only a bit larger than 500 bytes, then there is potential to change the length limit of the extras again, but we would want to make sure it would be worth it.
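For context, the 500-byte limit applies to the UTF-8 encoded value rather than the character count, so non-ASCII query parameters hit it sooner. A minimal sketch of checking a URL against that limit (this is not Glean.js internals; the helper names and example URL are made up for illustration):

    // Hypothetical helper: check whether an extra value would exceed the
    // 500-byte limit Glean.js enforces on event extras.
    const MAX_EXTRA_LENGTH_BYTES = 500;

    function utf8ByteLength(value: string): number {
      // TextEncoder measures the UTF-8 encoded size, which is what the
      // limit applies to (not the JavaScript string length).
      return new TextEncoder().encode(value).length;
    }

    function wouldOverflow(pageUrl: string): boolean {
      return utf8ByteLength(pageUrl) > MAX_EXTRA_LENGTH_BYTES;
    }

    // Example: a long dev-server URL with a big query string (made-up values).
    const url = "https://www-dev.allizom.org/en-US/firefox/new/?" + "x".repeat(600);
    console.log(wouldOverflow(url)); // true -> the recorded value would be truncated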
Comment 2•1 year ago
A 500-byte URL is not actually that long. For example, Chrome allows URLs up to 2 MB and IE traditionally limited URLs to 2 KB. Given that we aren't batching multiple events into a single ping, I think it's valid to reconsider increasing the size limits we place on event extras.
Comment 3•1 year ago
I am not sure of the implications a decision like this has on the data platform itself, but from the Glean.js side it's an easy change once we choose the value. We just increased the limit from 100 characters to 500 bytes.
We aren't batching multiple events, but under certain circumstances we could now have an event with 50 extras, which could add up to a significant size.
Does anyone else have thoughts on this? I know we already brought it up in our team channel a bit, but consider this a more formal request for input from others on my team.
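For a rough sense of scale (my own arithmetic, not a measurement), a single event that hits both the 50-extra case mentioned above and the current 500-byte value limit carries on the order of 24 KiB of extras:

    // Rough worst case for one event under the current limits: 50 extras,
    // each filled to the 500-byte value limit (keys and JSON framing
    // ignored, so the real payload is slightly larger).
    const MAX_EXTRAS = 50;
    const CURRENT_EXTRA_LIMIT_BYTES = 500;

    console.log(MAX_EXTRAS * CURRENT_EXTRA_LIMIT_BYTES); // 25000 bytes, ~24 KiB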
Comment 4•1 year ago
I'm not opposed to increasing the limits, but I would like to understand whether there is a requirement that the URL be collected whole/intact, or whether only pieces of it are interesting. I think more detailed/constrained collection here would result in better quality data: split the data into multiple extras rather than recording a potentially > 500 byte URL string that needs further parsing for analysis.
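As an illustration of what splitting the URL into multiple extras could look like, here is a sketch using the standard URL API; the extra key names are hypothetical, not existing Bedrock or Glean metrics:

    // Sketch: decompose a page URL into smaller, analysis-friendly extras
    // instead of recording the whole string. Key names are made up.
    function urlToExtras(raw: string): Record<string, string> {
      const url = new URL(raw);
      return {
        url_host: url.hostname,                 // e.g. "www.mozilla.org"
        url_path: url.pathname,                 // e.g. "/en-US/firefox/new/"
        url_utm_source: url.searchParams.get("utm_source") ?? "",
        url_utm_campaign: url.searchParams.get("utm_campaign") ?? "",
        // ...only the query parameters that analysis actually needs
      };
    }

    console.log(urlToExtras(
      "https://www.mozilla.org/en-US/firefox/new/?utm_source=test&utm_campaign=spring"
    ));

Each piece stays well under the value limit, and analysis queries no longer need to parse a long string.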
Based on some quick research, no browser seems to support 2 MB URLs; Firefox is actually the leader here with > 300 KB:
Browser   Address bar   document.location or anchor tag
--------------------------------------------------------
Chrome    32779         >64k
Android   8192          >64k
Firefox   >300k         >300k
Safari    >64k          >64k
IE11      2047          5120
Edge 16   2047          10240
Based on other bottlenecks like search engines, etc., the current consensus is that most URLs should be < 2048 bytes to be supported across the web. Is that a reasonable limit to change this to? I'd like to avoid having to change it again in the future, but I'm also not certain that there isn't a better way to capture the information in this case.
Edit: Further research seems to contradict some of the numbers above; they were the maximum lengths each browser could display, not necessarily what it could handle. I did see "2 MB" listed as supported by Chrome, but I have existential issues with any URL that long. Allowing something that big could have impacts on other limitations we have set, like maximum ping size or storage sizes.
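To make the "impacts on other limitations" concrete, here is the same back-of-the-envelope math for a hypothetical 2048-byte per-extra limit, compared against an assumed 1 MiB ping-size cap (the cap value here is an assumption, not a documented Glean.js number; check what the SDK and pipeline actually enforce):

    // Hypothetical: how much of a ping one maxed-out event could occupy if
    // the per-extra limit were raised to the ~2 KB "web consensus" value.
    const MAX_EXTRAS = 50;
    const PROPOSED_EXTRA_LIMIT_BYTES = 2048;
    const ASSUMED_MAX_PING_BODY_BYTES = 1024 * 1024; // assumption, verify the real limit

    const worstCaseExtras = MAX_EXTRAS * PROPOSED_EXTRA_LIMIT_BYTES; // 102400 bytes, ~100 KiB
    console.log(((worstCaseExtras / ASSUMED_MAX_PING_BODY_BYTES) * 100).toFixed(1) + "% of the assumed ping cap");

Since events aren't batched into a single ping, that would be roughly 10% of the assumed cap for a single event; probably tolerable, but worth confirming against the real limits before raising anything.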