Intermittent error loading js chunk seemingly related to caching
Categories
(Core :: Networking: HTTP, defect, P2)
People
(Reporter: danielkarp, Assigned: valentin, NeedInfo)
References
(Blocks 1 open bug)
Details
(Whiteboard: [necko-triaged])
Attachments
(1 file)
2.20 MB, image/png
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36
Steps to reproduce:
Intermittently, Firefox users (and only Firefox users) of our Angular site (which lazy-loads chunks via Webpack) get ChunkLoadErrors that prevent the site from loading properly; the errors clear up when the user clears their cache. This has been maddeningly intermittent and I have not been able to reproduce it myself, but I've collected some HAR files from users with details that may point in the right direction. I can't totally rule out an issue with plugins/add-ons/extensions, but from the lists users have reported, I believe there is no plugin common to them all, and some have said they see the issue when running vanilla Firefox.
Basically, what happens is that users load our site, one JS chunk does not load properly, and so the site fails to run. They report that clearing their cache solves the issue.
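For background on the mechanics: Webpack loads lazy chunks through dynamic import(), and a failed fetch of the chunk file surfaces as a rejected import whose error is named "ChunkLoadError". A minimal TypeScript sketch of the pattern (the module path and recovery strategy here are illustrative, not our actual code):

  // Illustrative sketch of how a lazy Webpack chunk is loaded, and a
  // common client-side mitigation. Not our actual application code.
  async function loadFeature(): Promise<void> {
    try {
      // Webpack emits this module as a separate chunk file (e.g. chunk-XXXX.js)
      const mod = await import('./feature/feature-module'); // hypothetical path
      mod.start(); // hypothetical entry point
    } catch (err) {
      // Webpack's runtime names chunk fetch failures "ChunkLoadError"
      if (err instanceof Error && err.name === 'ChunkLoadError') {
        // Workaround some sites use: reload so the chunk is re-fetched.
        // Note that in our case users report even forced reloads don't always help.
        window.location.reload();
      } else {
        throw err;
      }
    }
  }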
Actual results:
Here is an example from one user. I have additional HAR files as well.
I'm not super adept at decoding HAR files, but it is interesting that when chunks fail to load, there appears to be something odd with the protocol.
I've attached three screenshots (combined into one image) from the HAR viewer, pointing out some specific issues. I'm reluctant to upload the whole files, because there is sensitive information in them, but I might be able to get permission to share them privately.
In screenshot one:
I've focused on a chunk that loaded properly (it seems).
In screenshot 2, from the same HAR, a chunk that did not load properly (notice status code 0, -1 B transferred). Notice also that the GET line in the raw Request omits HTTP/2. I believe that this file was the missing chunk that broke the page.
In screenshot 3, there is a request that seems to have loaded properly, but for some reason has status code 0. This one could just be loading from cache or something, but I'm not sure.
The key is that when the user clears their cache, the file that previously would not load loads properly (same filename, etc.). Again, this occurs only for Firefox users.
For more discussion of this: I first asked about this on Stack Overflow, here:
https://stackoverflow.com/questions/71070943/what-is-the-source-of-this-intermittent-webpack-chunkloaderror-in-firefox (the only person to answer thought this was a Firefox bug).
I also brought it up as a possible Webpack issue, here:
https://github.com/webpack/webpack/discussions/15422
Finally, you can see other user reports of the same issue in our forums:
https://boardgamegeek.com/thread/2788693/specific-videos-page-immediately-redirects-me-empt
https://boardgamegeek.com/thread/2800974/cant-open-forum-threads-firefox
Expected results:
We shouldn't be getting those chunk load errors.
Comment 1 • 3 years ago
I was one of the users experiencing this bug (very intermittently; twice in months of heavy usage). I have seen a similar issue using Adobe Assets through Firefox: reviewing a particular file fails, and clearing the site cache resolves the issue.
Comment 2 • 3 years ago
Since the issue is intermittent, it's hard for me to reproduce in a short time, but it's clear that many users are facing it.
I'm going to assign the "Core :: JavaScript Engine" component; it could be a good starting point so the engineering team can take a look at this issue.
Comment 3 • 3 years ago (Reporter)
Thank you! Anyone working on this, feel free to reach out to me, and I will, with the users' permission, share any HAR files that might be of use.
Updated • 3 years ago
Comment 4 • 3 years ago (Assignee)
Thank you for the report.
Check out our guide for reporting networking bugs
Specifically here, the HAR file and preferably an HTTP log would help. Thanks!
Comment 5 • 3 years ago (Reporter)
(In reply to Valentin Gosu [:valentin] (he/him) from comment #4)
Thank you for the report.
Check out our guide for reporting networking bugs
Specifically here, the HAR file and preferably an HTTP log would help. Thanks!
I've sent some HAR files to you directly by email, and will talk with affected users about HTTP logging.
Comment 6 • 3 years ago (Reporter)
I sent some HAR files directly to Valentin, but I'm not sure if he is the person actually working on this. Again, because the HAR files may contain sensitive information, I do not want to post them here, but I have permission to send them to whoever might be working on the bug—just let me know!
Comment 7 • 3 years ago
(In reply to danielkarp from comment #6)
I sent some HAR files directly to Valentin, but I'm not sure if he is the person actually working on this. Again, because the HAR files may contain sensitive information, I do not want to post them here, but I have permission to send them to whoever might be working on the bug—just let me know!
The Networking team will work on this bug.
I've also seen the HAR files, but unfortunately I can't figure out the root cause.
Could you try to get an HTTP log? You could send the log to necko at mozilla dot com or kershaw at mozilla dot com. Thanks.
Comment 8 • 3 years ago (Reporter)
I will try to get one of the users who can reproduce the bug to try getting an HTTP log. Does the networking team also handle HTTP caching? From the outside looking in, with no idea what the code is actually doing, the behavior they report makes me think the problem could be at the caching layer.
Comment 9 • 3 years ago (Assignee)
It is also possible this could be related to bug 1640212.
Could you also ask the reporters if setting the network.http.rcwn.enabled pref to false improves things?
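For anyone trying this: the pref can be flipped in about:config, or persisted by adding a line to a user.js file in the Firefox profile directory (a standard Firefox mechanism; a one-line sketch):

  // user.js in the Firefox profile directory
  // disables "race cache with network" (RCWN)
  user_pref("network.http.rcwn.enabled", false);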
Comment 10 • 3 years ago
Keeping the needinfo flag to indicate that we are still waiting for more information.
Comment 11 • 3 years ago (Reporter)
I have asked the users to set that flag to false, and also asked them to try to get HTTP logs. Still waiting to hear back.
Comment 12 • 3 years ago (Reporter)
D'oh! Didn't mean to clear the needinfo. I guess it was automatically checked when I replied?
Comment 13 • 3 years ago
(In reply to danielkarp from comment #12)
D'oh! Didn't mean to clear the needinfo. I guess it was automatically checked when I replied?
Yes, it's automatically checked. No worries.
Thanks for your help!
Comment 14 • 3 years ago (Reporter)
So far, it seems that setting the network.http.rcwn.enabled pref to false is working for people. No one has been able to provide an HTTP log yet.
Comment 15 • 3 years ago (Reporter)
OK, I have privately sent an HTTP log captured by one of our users. I'll forward any other HTTP logs users are able to capture as well. Is there anything else that would be helpful?
Comment 16 • 3 years ago (Reporter)
Oh, one more thing I forgot to mention. Until recently, the domain that has been giving us trouble, cf.geekdo-static.com, did not support IPv6. We added support for IPv6 last week, but it has not seemed to affect whether this bug occurs.
Comment 17 • 3 years ago (Reporter)
FYI, the user who provided that HTTP log also reported that setting network.http.rcwn.enabled to false didn't fix the problem for him. That calls into question whether he is having the same issue as the other users, since for most of them that setting seems to fix the problem.
Comment 18 • 3 years ago
From the log, I can only tell that one JS request failed.
2022-03-10 08:47:17.535000 UTC - [Parent 12576: Main Thread]: D/nsHttp HttpChannelParent::RecvCancel [this=2231b371440]
2022-03-10 08:47:17.535000 UTC - [Parent 12576: IPDL Background]: V/nsHttp HttpBackgroundChannelParent::ActorDestroy [this=2231decbee0]
2022-03-10 08:47:17.535000 UTC - [Parent 12576: Main Thread]: V/nsHttp nsHttpChannel::Cancel [this=2231b16ed00 status=804b001b]
2022-03-10 08:47:17.535000 UTC - [Parent 12576: Main Thread]: V/nsHttp nsHttpChannel::CancelInternal [this=2231b16ed00]
The failure reason is NS_ERROR_INVALID_CONTENT_ENCODING.
Comment 19 • 3 years ago (Reporter)
FYI, it continues to be the case that for most users reporting this issue, setting network.http.rcwn.enabled to false completely solves the problem. That suggests the underlying problem here might be the same as bug 1640212, which also seems to be solved by setting that pref to false.
Comment 20 • 3 years ago
I have spent some time trying to reproduce the issue and I was not able to. I tried without logging in. The resource that was failing in the log is not accessible without a login (I am getting 403 responses).
The log only shows the failing request. The request fails because it cannot be decoded, i.e. gzip decoding is failing. I am wondering why that is. One possible issue would be that we store corrupted content. That is unlikely, because changing the network.http.rcwn.enabled pref would not fix the problem but would make it worse (with RCWN disabled, a corrupted cache entry would still be served).
Maybe our RCWN feature is corrupting the content.
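To illustrate the suspected failure mode (this is a Node analogy, not Gecko code): if bytes that are not actually gzip-compressed are handed to a gzip decoder (for example, a cached, already-decoded body replayed under a response that still says Content-Encoding: gzip), the decode fails, which is what NS_ERROR_INVALID_CONTENT_ENCODING signals.

  // Node/TypeScript analogy, not Gecko code: decoding non-gzip bytes fails.
  import { gzipSync, gunzipSync } from "node:zlib";

  const body = Buffer.from("console.log('chunk');");
  gunzipSync(gzipSync(body)); // ok: the bytes really are gzip

  try {
    gunzipSync(body); // header claims gzip, bytes are plain: throws
  } catch (e) {
    // zlib reports "incorrect header check" here
    console.error("decode failed:", (e as Error).message);
  }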
How hard is it to reproduce the issue? Is it possible to record a log for a longer period? I would need one successful load followed by a failed load. I am interested to see what is happening in between.
Comment 21 • 3 years ago (Reporter)
So unfortunately, I've never been able to reproduce it myself. All of the recordings come from other users, but some of them DO seem to be able to reproduce it semi-reliably; by that I mean at most a couple of times a day. Still, I'll see if I can get one of them to record a log for a longer period.
Comment 22 • 3 years ago (Reporter)
Oh, also: you can create a free account on BGG (or I can provide you with one) if you want to keep trying to reproduce this and not having an account is part of the problem.
Comment 23 • 3 years ago (Reporter)
So the users who are willing to try to create the log have asked the following:
And then they want us to be recording a SUCCESSFUL one followed by a failure as the next one, literally? Since it's apparently random and impossible to predict in advance when the next one is going to be a failure, it's unclear how feasible that is, unless we just continually log and log and log until a failure happens. (Maybe that's feasible to do, I have no idea how big these logs get...)
I'd love to help, but as Russ mentioned, this is pretty hard. It's pretty random. Or at least, I could not notice a pattern.
Two questions:
- does it have to be the exact same page?
- can the log be on all the time, then when it happens we just have to get it? Or does it require some specific manipulation on our part each time we want to get a log?
So how big are these logs? How long is it feasible to record a log? Do you need logs that have a success followed immediately by a failure?
Comment 24 • 2 years ago
(In reply to danielkarp from comment #23)
So how big are these logs? How long is it feasible to record a log? Do you need logs that have a success followed immediately by a failure?
You can limit the size of a log by adding "rotate:200"; this would mean the max size would be 200 MB.
You can record a log for a very long time, but please add the limit, otherwise the logs will get too big.
I need a log that contains both successful and failed loads; it is not important how far apart they are.
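For reference, rotate is an option in the MOZ_LOG module string, so a long-running capture might look something like this (the module list below is the commonly requested networking set; adjust as needed):

  set MOZ_LOG=timestamp,rotate:200,nsHttp:5,cache2:5,nsSocketTransport:5
  set MOZ_LOG_FILE=%TEMP%\firefox-http.log

(On macOS/Linux, use export instead of set, then start Firefox from that same shell.)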
Comment 25 • 2 years ago
Setting the ni flag to indicate that we are waiting for more information.
Comment 26 • 2 years ago
Redirecting a needinfo that is pending on an inactive user to the triage owner.
:valentin, since the bug has recent activity, could you have a look, please?
For more information, please visit the auto_nag documentation.
Comment 27 • 2 years ago (Assignee)
This seems like a real bug, so I'm keeping it open even though the report is incomplete.
(In reply to Kershaw Chang [:kershaw] from comment #18)
From the log, I can only tell that one JS request failed.
2022-03-10 08:47:17.535000 UTC - [Parent 12576: Main Thread]: D/nsHttp HttpChannelParent::RecvCancel [this=2231b371440]
2022-03-10 08:47:17.535000 UTC - [Parent 12576: IPDL Background]: V/nsHttp HttpBackgroundChannelParent::ActorDestroy [this=2231decbee0]
2022-03-10 08:47:17.535000 UTC - [Parent 12576: Main Thread]: V/nsHttp nsHttpChannel::Cancel [this=2231b16ed00 status=804b001b]
2022-03-10 08:47:17.535000 UTC - [Parent 12576: Main Thread]: V/nsHttp nsHttpChannel::CancelInternal [this=2231b16ed00]
The failure reason is NS_ERROR_INVALID_CONTENT_ENCODING.
Hmm, this makes me think that we have the wrong headers. The only way this could happen is if we mixed network and cached responses.
I think I have an idea where to look. Might also be able to write a test for it.
Comment 28 • 2 years ago (Reporter)
Sorry I haven't been able to come through with a log. The problem is that the users who reported this issue (I've never seen it myself) have mostly discovered, in our forums, the trick of setting network.http.rcwn.enabled to false, and are mostly no longer concerned with it. It is a lot to ask them to change their settings again, start logging, and hope that the bug recurs--I had no takers the last time I asked. But I can try again.
Comment 29 • 2 years ago (Assignee)
No worries, I totally understand. :) I'll get back to you in a few days, it's possible I've actually found the bug.
Comment 30 • 2 years ago (Reporter)
(In reply to Valentin Gosu [:valentin] (he/him) from comment #29)
No worries, I totally understand. :) I'll get back to you in a few days, it's possible I've actually found the bug.
In the meantime, one of the users who was experiencing the bug actually got the logs you were asking for, I believe. I have emailed you links to the files, but you should download them soon, as those links will expire shortly.
Comment 32 • 3 months ago (Reporter)
I'd like to reengage on this issue, if possible.
Our Firefox users (and only our Firefox users) continue to have problems loading JS chunks on our site. I reported the issue to Angular, but they believe the problem is not on their end: https://github.com/angular/angular-cli/issues/28361
One user recently provided the following from his console:
Loading failed for the module with source “https://cf.geekdo-static.com/frontend/chunk-7VTT4A6L.js”. boardgamegeek.com
Loading failed for the module with source “https://cf.geekdo-static.com/frontend/chunk-KRXMVMZV.js”. boardgamegeek.com
Loading failed for the module with source “https://cf.geekdo-static.com/frontend/chunk-UOJLIIXE.js”. boardgamegeek.com
Loading failed for the module with source “https://cf.geekdo-static.com/frontend/chunk-UKMIOM3P.js”. boardgamegeek.com
Loading failed for the module with source “https://cf.geekdo-static.com/frontend/chunk-JIBBZ5SB.js”. boardgamegeek.com
Loading failed for the module with source “https://cf.geekdo-static.com/frontend/chunk-SFTBLGAN.js”. boardgamegeek.com
Loading failed for the module with source “https://cf.geekdo-static.com/frontend/chunk-LYULK75U.js”. boardgamegeek.com
I immediately checked the files--all of them were still available from the CDN, and after he went into his settings and cleared his cache for the last day, the problem went away for him. He mentioned that reloading with Ctrl-F5 did NOT fix the problem.
So it seems that this continues to be an issue.
Comment 33 • 2 months ago (Reporter)
I've had another report from a user, with some more information on this. They captured a screenshot showing a problematic load for a file. You can see it here:
https://boardgamegeek.com/image/8437775/jesterknot
Everything in there is exactly correct, as it should be, except for the "Transferred" line. The file (as of this writing) is still available on our server, and all headers match what I see in that screenshot. I'm pretty much at a loss here. The only thing I can think of that might help is to reduce our max-age for all of our scripts, so that Firefox users at least won't have problems for as long; but that would essentially force everyone to reload these files more often than necessary, and for obvious reasons that isn't a trade-off I'm in a hurry to make. (A sketch of what I mean follows below.)
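To be concrete, the mitigation would just be a shorter Cache-Control lifetime on the chunk files; the values below are illustrative, not our actual CDN configuration:

  Cache-Control: public, max-age=31536000, immutable   (long-lived caching: a bad entry can hurt for up to a year)
  Cache-Control: public, max-age=86400                 (shorter lifetime: a bad entry hurts for at most a day)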
Valentin, is this still an issue that you are working on? This continues to be a problem for our Firefox users (and only our Firefox users).