Download ("Save Page As...") with an ad-blocker fails (because some subresources are blocked) but succeeds when retried
Categories
(Core :: DOM: Serializers, defect, P3)
People
(Reporter: reibjerk, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: [tor 32225])
Attachments
(6 files)
Comment 1•7 years ago
Comment 7•7 years ago
Comment 9•5 years ago
Re-upping for QA reproduction. Otherwise, we should close this.
Comment 10•5 years ago
Alex, can you help reproduce this issue? Thanks!
Reporter
Comment 11•5 years ago
If I can help, I will.
It may help to know that I'm still experiencing this issue, but to a lesser extent.
It was originally reported on Windows 7, I think, but I'm now on the latest Windows 10, with the latest Firefox of course.
So, compared to the original report, this is on a freshly installed system (i.e. the Windows 7 result was no fluke).
Also, I no longer run ad-blocking through dummy entries in my hosts file, but indirectly through the uMatrix add-on.
The misbehaviour happens on some/many sites when I try to download them for local storage.
I just picked one site now, https://edition.cnn.com/,
and did a Ctrl-S to save it locally.
The DL icon gets this yellow dot that indicates an error. And, as before, when I retry it always succeeds.
But, and this may be related, I have issues viewing the local web copies afterwards.
They seldom work/display as desired.
It would (hopefully) be strange if I'm the only one experiencing this, but I may not be the most typical user,
using uMatrix and (totally unrelated!) having my shared ext4 partition on /dev/sdi15 :-)
Just tell me what trace/files you need.
Reporter
Comment 12•5 years ago
Or, you could just try adding this single line to your hosts file:
0.0.0.0 www.googletagmanager.com
Comment 13•5 years ago
I was able to consistently reproduce the issue on the latest Release (67.0.3 / 20190618025334), Beta (68.0b12 / 20190619234730) and Nightly (69.0a1 / 20190619214046) under Windows 10 Pro 64-bit and macOS High Sierra 10.13.6, following the provided STR.
To be more specific regarding what I’ve done to reproduce the issue, I’ve created a new/fresh profile, installed the latest version of uMatrix (version 1.3.16 from https://addons.mozilla.org/en-US/firefox/addon/umatrix/), proceeded to the mentioned websites (and a couple more) where I attempted to download the web pages via CTRL+S.
These are the results I’ve noticed when reproducing the issue:
- One of the tested pages fails to download on the first attempt even with uMatrix disabled or not installed at all. Upon retrying, the page downloads successfully. This obviously happens with the add-on installed as well; however, I believe the add-on does not influence this result, i.e. the page fails to download on the first attempt either way.
- Opening that locally stored page does not display the contents properly (see screenshots 1 and 2), regardless of whether uMatrix is enabled, disabled, or installed at all.
- Another tested page downloads successfully on the first try with uMatrix disabled or not installed at all.
- With the add-on installed, the download fails on the first attempt; upon retrying, the page downloads successfully.
- Opening that locally stored page does not display the contents properly either (see screenshots 3 to 6). For screenshots 5 and 6, uMatrix seems to still be blocking some content, which is why the page appears as depicted. Disabling the add-on and reloading the saved page displays it properly, like the original, non-downloaded page.
- Tried with https://www.facebook.com/ as well. With the add-on enabled, the page downloads only after retrying. The page is, however, displayed correctly when loaded from local storage, regardless of whether the add-on is enabled or disabled.
- With https://www.youtube.com/, the download succeeds on the first try with the add-on enabled, though the page is not displayed correctly when loaded locally, regardless of whether the add-on is enabled or disabled (the page initially loads correctly and, immediately after it is fully loaded, goes blank).
Regarding the alternate method of reproducing the issue (0.0.0.0 www.googletagmanager.com added to the hosts file), I am not sure exactly how to do this, so I would like to ask you to provide more detailed STR so I can attempt this as well, just in case. Thanks!
Comment 14•5 years ago
Comment 15•5 years ago
Comment 16•5 years ago
Comment 17•5 years ago
Comment 18•5 years ago
Comment 19•5 years ago
Reporter
Comment 20•5 years ago
On the topic (only) of adding
0.0.0.0 www.googletagmanager.com
or similar to the hosts file:
- What this does is create a local "DNS-like" lookup entry for a host name.
You thereby tell the computer what address that host can be found at. It can be useful for naming server aliases, for instance.
www.googletagmanager.com is actually at an address like 2a00:1450:400f:809::2008 (IPv6), so by claiming it is at address 0.0.0.0 (IPv4)
you are effectively disabling access to that host.
Since you may want to do this for ad servers, this becomes an ad-blocking method.
It can be done manually, as in this case, or by using a tool such as MVPS (http://winhelp2002.mvps.org/hosts.htm).
- How you do it:
The 'hosts' file is a plain text file, located in the directory
C:\Windows\System32\drivers\etc
for Windows, and /etc in Linux/Unix.
Take a backup of the original file and just use an editor to add
0.0.0.0 www.googletagmanager.com
at the end of the file. You need to be admin (root) to change this file.
You may want to test this mechanism by adding
1.2.3.4 myhost
for instance in the hosts file. Then you can do a
ping myhost
at the command line to check that the dummy host entry is working. You will not get any reply from the ping, but you will see that the host name 'myhost' resolves to the IP address 1.2.3.4, and hence the mechanism is working.
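Putting the pieces together, the additions to the hosts file described above would look like this (both entries are taken from this bug's comments; the lines starting with '#' are ordinary hosts-file comments):

  # Ad-block style entry: point the tracking host at an unroutable address.
  0.0.0.0    www.googletagmanager.com
  # Dummy entry, only used to verify the mechanism with 'ping myhost'.
  1.2.3.4    myhost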
Comment 21•5 years ago
Hello,
I have configured the ‘hosts’ file as you have detailed above, for both Windows 10 Pro 64-bit and macOS High Sierra 10.13.6 and have managed to reproduce the issue (with the same results as when using uMatrix) on the latest versions of Firefox (Release - 67.0.4 / 20190619235627; Beta - 68.0b12 / 20190619234730; Nightly – 69.0a1 / 20190620220631).
The only differences I have managed to observe are that loading the locally saved https://www.mozilla.org/en-US/ now displays its contents properly and that https://www.facebook.com/ downloads on the first attempt.
Also, I have tried another web page (https://www.timesnewroman.ro), which fails to download on the first attempt and succeeds only after retrying. Loading the saved page does not display the contents correctly.
In conclusion, the reported issue is present and consistently reproducible, either using the uMatrix extension or by modifying the ‘hosts’ file, with a wide range of websites affected.
Comment 22•5 years ago
Based on the above comments it seems that this issue can be reproduced not just with an extension but also by changing the /etc/hosts file on the system, and so it doesn't seem to be an issue specific to a WebExtensions API.
I'm moving it into the "Toolkit :: Downloads API" to be re-triaged (but it could also be that the right bugzilla component is "Firefox :: File Handling", based on the component description of "Toolkit :: Downloads API").
Comment 23•5 years ago
This looks like bug 1536530, but for any website where a subresource fails to load as a result of an adblocker / hosts block. I expect the retry works because (though I haven't verified this) we've network-cached the fact that the request failed, and somehow that doesn't break the webbrowserpersist code in the same way.
I'm not sure what we want to do here. Failing the download is in principle correct, as one (or more) of the requests that were part of the download failed. However, it's clearly not very helpful here. The crux is likely to be whether we can distinguish the nature of the failure in the webbrowserpersist code (from "real" network failures) and do something else. Luca, do you know how uMatrix and other such solutions reject these types of requests in the webrequest API, and what the resulting XPCOM error is?
As for how "correct" the resulting page is, that's not really related here -- saving a webpage locally is always tricky and best-effort. For instance, if you save a page without any scripts, some elements won't work. But if you save a page and include the script code, it might run differently (when it realizes it's not being served by a webpage at the original http(s) address), all the more so if you save the "live" DOM instead of the as-requested-from-the-server DOM. So I wouldn't worry about that in relation to this issue.
Comment 24•5 years ago
(In reply to :Gijs (he/him) from comment #23)
Luca, do you know how uMatrix and other such solutions reject these types of requests in the webrequest API, and what the resulting XPCOM error is?
From a very quick look at the uMatrix sources, it looks like blocked subresources are rejected by returning {cancel: true} from a blocking webRequest listener. That return value is then used by WebRequest.jsm to actually cancel the request, by calling the ChannelWrapper's cancel method with Cr.NS_ERROR_ABORT as the parameter.
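For illustration, a minimal blocking listener of the kind described above might look like the following sketch (the host pattern is only an example based on earlier comments, not uMatrix's actual code, and the extension would need the webRequest and webRequestBlocking permissions):

  // Background script of a hypothetical blocking extension (illustration only).
  browser.webRequest.onBeforeRequest.addListener(
    () => {
      // Returning {cancel: true} makes WebRequest.jsm cancel the underlying
      // channel with NS_ERROR_ABORT, which is the failure the save-page code
      // then sees for that subresource.
      return { cancel: true };
    },
    { urls: ["*://www.googletagmanager.com/*"] },
    ["blocking"]
  );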
Comment 25•5 years ago
Looks like bug 1493599 added some info that should allow us to distinguish some of these cases, but it also looks like there's no specific blocked status for things cancelled through the webrequest listener. It looks to me like we just don't show those resources in the network inspector. Honza, does that look right? Is there some other way to distinguish these requests, given that lots of things call nsIRequest::cancel with NS_ERROR_ABORT ?
Comment 26•5 years ago
(In reply to :Gijs (he/him) from comment #25)
Looks like bug 1493599 added some info that should allow us to distinguish some of these cases, but it also looks like there's no specific blocked status for things cancelled through the webrequest listener. It looks to me like we just don't show those resources in the network inspector. Honza, does that look right? Is there some other way to distinguish these requests, given that lots of things call nsIRequest::cancel with NS_ERROR_ABORT ?
I don't know if the changes introduced in bug 1493599 are any help here; they are for requests blocked by the platform (CORS, CSP, etc.), not by add-ons.
But there is another bug, bug 1555057, for requests blocked by add-ons, and one of the suggestions there is introducing cancelWithReason and using it in WebRequest:
https://searchfox.org/mozilla-central/rev/0671407b7b9e3ec1ba96676758b33316f26887a4/toolkit/components/extensions/webrequest/WebRequest.jsm#830
This API is also mentioned in bug 1556451 and it sounds like it could be useful for several things (including this bug report).
Honza
Comment 27•5 years ago
This obviously affects the Tor Browser too (via NoScript), see https://trac.torproject.org/projects/tor/ticket/32225#comment:9
Comment 33•5 years ago
Fixing the downloads issue shouldn't depend on devtools netmonitor changes. The webextension functionality is ready with bug 1604618.
Comment 34•5 years ago
Fixing this probably involves handling the webextension case separately at https://searchfox.org/mozilla-central/rev/cfd1cc461f1efe0d66c2fdc17c024a203d5a2fd8/dom/webbrowserpersist/nsWebBrowserPersist.cpp#1359-1364 and/or https://searchfox.org/mozilla-central/rev/cfd1cc461f1efe0d66c2fdc17c024a203d5a2fd8/dom/webbrowserpersist/nsWebBrowserPersist.cpp#1330-1333 and/or https://searchfox.org/mozilla-central/rev/cfd1cc461f1efe0d66c2fdc17c024a203d5a2fd8/dom/webbrowserpersist/nsWebBrowserPersist.cpp#1241-1243. It'd be helpful if someone could either debug or clarify from the webext side, at what point in the channel lifetime webextensions can/do cancel URI loads right now.
Comment 35•5 years ago
(In reply to :Gijs (he/him) from comment #34)
It'd be helpful if someone could either debug or clarify from the webext side, at what point in the channel lifetime webextensions can/do cancel URI loads right now.
That is documented on MDN [1]. Or is the question about translating webRequest events to httpchannel notifications? "http-on-modify-request", "http-on-before-connect" and http-on-examine-* are probably the most common, but any webRequest API documented as accepting the blocking param will potentially be able to cancel. You can see the mechanisms used for the various events here [2].
Each channel's loadinfo now has a cancel reason on it, and the property bag on the channel will contain the extension ID. So it is possible for the download code to be aware that an extension has canceled some part of a page download. I'd probably check the cancel reason on the loadinfo at any point where the download may be cancelled via the channel.
[1] https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest
[2] https://searchfox.org/mozilla-central/rev/cfd1cc461f1efe0d66c2fdc17c024a203d5a2fd8/toolkit/components/extensions/webrequest/WebRequest.jsm#1154
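As a reference for the above, these are the webRequest events whose documentation accepts "blocking"; the sketch below just registers a no-op blocking listener at each stage to show where cancellation is possible (the handler body is a placeholder, not real blocking logic):

  // Each of these events can take "blocking" in extraInfoSpec, so an extension
  // may cancel the request at that stage by returning {cancel: true}.
  const blockableEvents = ["onBeforeRequest", "onBeforeSendHeaders", "onHeadersReceived", "onAuthRequired"];
  for (const name of blockableEvents) {
    browser.webRequest[name].addListener(
      () => ({}),                      // placeholder; return {cancel: true} to block
      { urls: ["<all_urls>"] },
      ["blocking"]
    );
  }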
Comment 36•5 years ago
This has been a nuisance for me for a year or more, but I've just gotten used to clicking the downloads dropdown and using "Retry" to save the page correctly. However, over the past month or so I've noticed that for some websites the retry actually makes things worse rather than fixing the problem: for those sites I need to leave the download marked as "failed" for the saved web page to work, and if I do retry it, the saved version becomes corrupted (losing elements like formatting or some dynamically loaded components).
My system: OS X with Firefox 72.0, with the Adblock Plus extension enabled. (Yes, if I disable it, web pages save correctly, but I shouldn't be required to do that when they load correctly in the browser, and when the desired blocking is unrelated to the websites that have problems saving!)
This happens with most websites I visit (I'm guessing most websites that have dependent files to load). One easy (and popular) website to try this on is WorldCat, e.g. http://www.worldcat.org/oclc/1028407523 (but any book will do). As you can see there, when you load the page, WorldCat dynamically loads a list of nearby libraries that have the item (Adblock Plus has no effect on this when viewing!). But when I try to save the page, it says "failed" even though it works, and if I retry, that dynamically loaded section is removed from the saved file! (As a researcher, I'm often saving pages specifically to remember where rare books are located. But this is just one example of the many websites where this happens.)
This is becoming infuriating.
Comment 37•5 years ago
Hi, I've just set up Windows 10 version 1909 (I was on Windows 8.1 before), and I have had this bug since then with Firefox 75 and uBlock Origin 1.26.0.
Description:
When I try to save a FULL web page (with the pictures, not just the HTML part) that contains some blocked content, Firefox tells me that the download failed, which is wrong.
A specific URL where the issue occurs:
Steps to Reproduce :
Go to that webpage.
Do Ctrl+S, select FULL webpage
Click on Save
Expected behavior:
The download icon of Firefox should be blue, indicating that the download succeeded.
Actual behavior:
The download icon shows an orange circle and says it failed.
Your environment:
uBlock Origin version: 1.26.0
Browser name and version: Firefox 75.0, 64-bit
Operating system and version: Windows 10, 64-bit
I hope someone can fix this because it's annoying.
Comment 38•5 years ago
We're aware of this issue, we know what causes it, but it currently isn't a priority to address. I'd be happy to review a patch, though even if this was a priority it'd probably make sense to wait for the refactoring in bug 1576188 to land first.
My understanding is that there's a very easy work around: disable ublock temporarily when you save a webpage.
Comment 39•5 years ago
(In reply to :Gijs (back Tue 14; he/him) from comment #38)
My understanding is that there's a very easy work around: disable ublock temporarily when you save a webpage.
Hi, for your information disabling Ublock origin via the extension doesn't solve the issue... It must be disabled in about:addons.
Comment 40•5 years ago
(In reply to Julien L. from comment #39)
(In reply to :Gijs (back Tue 14; he/him) from comment #38)
My understanding is that there's a very easy work around: disable ublock temporarily when you save a webpage.
Hi, for your information disabling Ublock origin via the extension doesn't solve the issue... It must be disabled in about:addons.
This may happen when the "context" of the request is not preserved. AFAIK, parentFrameId and originUrl must be correctly set in the webRequest callback for uBO to know which filters to apply. If this info is lost, uBO will not know whether the request is whitelisted.
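For illustration (this is not uBO's actual code), those fields are part of the details object passed to webRequest listeners; a minimal sketch of how such request context might feed a blocking decision, with a purely hypothetical allowlist, could be:

  // Illustration only: request "context" driving a block/allow decision.
  const allowedOrigins = ["https://example.com"];   // hypothetical user allowlist

  browser.webRequest.onBeforeRequest.addListener(
    (details) => {
      // details.tabId is -1 for requests not tied to a tab (e.g. "Save Page As..."),
      // and details.originUrl / details.parentFrameId describe where the request
      // originated. If that context is missing or generic, a blocker cannot match
      // the request against its per-site rules.
      const origin = details.originUrl || "";
      const allowed = allowedOrigins.some((o) => origin.startsWith(o));
      return { cancel: !allowed };
    },
    { urls: ["<all_urls>"] },
    ["blocking"]
  );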
Comment 41•5 years ago
We won't know the (internal equivalent of the) parentFrameId here, at least for now. It's possible the refactor in bug 1576188 will help with that, though.
Comment 42•5 years ago
disabling Ublock origin via the extension doesn't solve the issue
I can't reproduce this on my side; disabling uBO on https://www.freenews.fr/ makes the "Save As..." error disappear. Also, uBO does not use parentFrameId when tabId is -1. uBO's logger should be used to validate that uBO didn't block anything at "Save As..." time after being disabled. If the logger does not show anything being blocked, then something other than uBO must have blocked the network requests, and that should be investigated.
Comment 43•5 years ago
Oh, sorry, I did not check this thoroughly; it turns out that on my side ClearURLs was responsible for this error. It was configured to only redirect, but somehow it was also blocking some requests. I reinstalled and reconfigured it and no longer see errors with uBO disabled for the page.
Comment 45•4 years ago
(In reply to :Gijs (he/him) from comment #38)
We're aware of this issue, we know what causes it, but it currently isn't a priority to address. I'd be happy to review a patch, though even if this was a priority it'd probably make sense to wait for the refactoring in bug 1576188 to land first.
My understanding is that there's a very easy work around: disable ublock temporarily when you save a webpage.
What about adding an option to disable any content-altering extensions in the page saving dialog?
Comment 46•4 years ago
(In reply to Digi from comment #45)
What about adding an option to disable any content-altering extensions in the page saving dialog?
I think the usual user expectation would be that resources blocked by the content altering extensions would also not be saved - if the page worked without them when rendered from the web, we can do the same when saved to disk, right?
Even if we added this option, some users would not use it, and the feature should Just Work in that case, and we shouldn't mark the download as failed.
Comment 47•4 years ago
(In reply to :Gijs (he/him) from comment #46)
(In reply to Digi from comment #45)
What about adding an option to disable any content-altering extensions in the page saving dialog?
I think the usual user expectation would be that resources blocked by the content altering extensions would also not be saved - if the page worked without them when rendered from the web, we can do the same when saved to disk, right?
Even if we added this option, some users would not use it, and the feature should Just Work in that case, and we shouldn't mark the download as failed.
I'd say that the usual user expectation is also that page saving doesn't fail due to obscure reasons.
Comment 48•4 years ago
PS: from the user's perspective, saving a slightly different page is much better than not saving it at all.
Comment 49•3 years ago
I'm still having this problem in v89, even when the adblocker is turned off. For instance, this page: http://rc-aviation.ru/chertplosk/91-ploskf22
Comment 50•3 years ago
(In reply to Digi from comment #48)
PS: from the user's perspective, saving a slightly different page is much better than not saving it at all.
I'd add that failed saves are often confusing in themselves. Firefox says the save failed, but I can see the page's .html file and its folder present on disk. Maybe some subresource was not saved, but the general result looks like a success that is marked as failed. So I don't know whether I can stop there, or what exactly is broken in the saved copy.