Open Bug 1007251 Opened 7 years ago Updated 23 days ago

Save Page As... > Web Page, complete does not save .htm / .html file on some sites

Categories

(Firefox :: File Handling, defect, P5)

x86_64
Windows 7
defect

People

(Reporter: theinspectoralex, Unassigned)

Details

(Whiteboard: STR in comment 15)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0 (Beta/Release)
Build ID: 20140428174145

Steps to reproduce:

Right click, Save Page As... > Web Page, complete


Actual results:

On some sites Firefox does not save the .htm / .html file; it saves only the "page title"_files folder.

not working sites:
http://forums.mozillazine.org/viewforum.php?f=49  (working with addons disabled)
http://forums.mozillazine.org/viewtopic.php?f=38&t=2621801
http://lazarsoft.info/pehdr.html

same results on firefox 30 (beta)
http://forums.mozillazine.org/viewforum.php?f=49  working with addons enabled


Expected results:

Firefox should generate an .htm / .html file

P.S. The Save Page As function works on Firefox 29 (release).
WFM on FF 30b2, 32.0a1 (2014-05-07) Win 7 x64.

(In reply to theinspectoralex from comment #0)
> http://forums.mozillazine.org/viewforum.php?f=49  (working with addons
> disabled)
It means one of your addons is causing the problem. Disable them one by one until you find the culprit and then send a notification to the addon's developer.
I tested the following 4 pages on all versions of Firefox. The results:

pages:
http://forums.mozillazine.org/viewforum.php?f=49
http://forums.mozillazine.org/viewtopic.php?f=38&t=2621801
http://lazarsoft.info/pehdr.html
https://translate.google.co.uk/	

firefox 29 (release) with addons + without addons:    all pages can be saved

ff 30 (beta) with addons + without addons, ff 31 (aurora) with addons + without addons, ff 32 (nightly) 32bit/64bit with addons + without addons:

http://lazarsoft.info/pehdr.html   cannot be saved
https://translate.google.co.uk/    cannot be saved
http://forums.mozillazine.org/viewforum.php?f=49               works sometimes (at random)
http://forums.mozillazine.org/viewtopic.php?f=38&t=2621801     works sometimes (at random)

If a page is already saved and I try to replace it, Firefox deletes the .htm file and doesn't create a new one.
Sometimes Firefox creates the .htm file and then deletes it.
Version: 32 Branch → 30 Branch
(In reply to theinspectoralex from comment #2)
> http://lazarsoft.info/pehdr.html   cannot be saved
> https://translate.google.co.uk/    cannot be saved
> If a page is already saved and try to replace it, firefox delete the .htm
> file and doesn't create a new.
I still can't reproduce, FF 32.0a1 (2014-05-11), Win 7 x64.
Please check on a new, empty profile:
http://support.mozilla.org/en-US/kb/Managing-profiles#w_starting-the-profile-manager
I tried with a new profile on FF 32.0a1 (2014-05-11), same results.

Something I noticed: in the download list, Firefox says "failed" for all these pages, even if the .htm file is saved.

http://imgur.com/S62VIuW,SoZm0En#0
http://imgur.com/S62VIuW,SoZm0En#1
Blocks: 115634
Component: Untriaged → File Handling
Product: Firefox → Core
Couldn't reproduce in 32.0a1 (2014-06-09), Win 7 x64.
Try checking for external apps which may interfere with Firefox (e.g. antivirus).
I have reproduced a similar issue consistently on Windows XP, 7 and 8, on more than 10 machines. A slight variation on what is described above will recreate it every time. It affects any page with a frame or a reference to an image (the pages I tested on were https).

Basically, load the page from the site and then click File | Save Page As, then select Web Page, Complete. Select your target directory and name the file, then Save.

Next, set the browser to Work Offline (File | Work Offline).

Next, Open the htm file from the local drive.

Next click File | Save Page As, then select Web Page, and use the same file name and directory as before (you can click on the file and it will set the name to the same one). Click Save.

The Folder with the sub pages/elements will remain but the .htm file will be deleted.  Not in the trash, not hidden, but gone.  We have been trying to use this saving locally to allow inspectors to work offline and then resync with the server later but this problem has stopped us in our tracks.

I can provide sample files and a video of this process if needed.  Please make this a priority, when Save is actually Delete, people get really upset.
(In reply to Paul Silaghi, QA [:pauly] from comment #6)
> Couldn't reproduce in 32.0a1 (2014-06-09), Win 7 x64.
> Try to look after external apps which may interfere with Firefox. (ie.
> antivirus etc)

I have pretty much confirmed this isn't related to antivirus/malware, add-ons or the OS. Would it help if I sent a zip with files and a video? This is the first bug I've contributed on and am not sure how to help escalate this, any advice would be appreciated.

Thanks!
Flags: needinfo?(dcrouch)
(In reply to David Crouch from comment #7)
> Next click File | Save Page As, then select Web Page
Web Page, complete or Web Page, HTML only?
I can reproduce it when choosing 'Web Page, HTML only' the second time, in offline mode.
33.0a1 (2014-07-13), win 7 x64

(In reply to David Crouch from comment #8)
> This is the
> first bug I've contributed on and am not sure how to help escalate this, any
> advice would be appreciated.
You did a good job by providing the steps to reproduce. Thank you for your support, and keep up the good work on other bugs !
Status: UNCONFIRMED → NEW
Ever confirmed: true
See Also: → 1061280
(In reply to Paul Silaghi, QA [:pauly] from comment #6)
> Couldn't reproduce in 32.0a1 (2014-06-09), Win 7 x64.

This bug can be reproduced as follows.  Create a text file with the following 6 lines

 function FindProxyForURL(url, host) {
   if (shExpMatch(host, "*.optimizely.com"))
     return "PROXY 127.0.0.1:80";
   else
     return "DIRECT";
 }

then place a path to this file in the "Automatic proxy configuration URL" field under Firefox Preferences > Advanced > Network > Connection Settings.  Restart Firefox and go to http://mozilla.org.  If you attempt to "Save Page As...  Web Page, complete", then the saving operation will fail and no HTML file will be saved.

The 6-line Proxy Auto Configuration file described above is a common way to block ads.  In this case, it will block all traffic from optimizely.com.  But a component of http://mozilla.org is served by optimizely.com, and when the Proxy Auto Configuration file blocks this component, the Save Page As operation fails.

In previous versions of Firefox (such as Firefox ESR 24.8.1), a dialog box would notify the user that a component of the web page couldn't be saved, but then the save operation would still continue until completion.  In recent versions of Firefox (such as Firefox ESR 31.2.0) there is no dialog box and the save operation just fails, never saving the HTML file.
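
To see which hosts the PAC rule above would divert, the decision can be mimicked outside the browser. This is only a rough sketch: it assumes that Python's `fnmatchcase` is a close enough stand-in for PAC's `shExpMatch`, the function name `find_proxy_for_url` is made up for the illustration, and the host `cdn3.optimizely.com` is borrowed from a hosts file quoted later in this bug.

```python
from fnmatch import fnmatchcase


def find_proxy_for_url(url: str, host: str) -> str:
    """Python stand-in (hypothetical) for the FindProxyForURL rule quoted above."""
    # Assumption: fnmatchcase approximates shExpMatch's shell-style wildcards.
    if fnmatchcase(host.lower(), "*.optimizely.com"):
        # The request is sent to a local port; per the report this blocks it.
        return "PROXY 127.0.0.1:80"
    return "DIRECT"


if __name__ == "__main__":
    for host in ("cdn3.optimizely.com", "www.mozilla.org", "optimizely.com"):
        print(host, "->", find_proxy_for_url("http://" + host + "/", host))
```

Note that the pattern only covers subdomains; the bare name `optimizely.com` has no leading label and therefore goes `DIRECT` in this sketch.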
See Also: → 1089321
(In reply to Matthew Szudzik from comment #10)
> (In reply to Paul Silaghi, QA [:pauly] from comment #6)
> > Couldn't reproduce in 32.0a1 (2014-06-09), Win 7 x64.
> 
> This bug can be reproduced as follows.  Create a text file with the

This bug now appears to be fixed in Firefox 38.
Actually, this bug has not been fixed in Firefox 38. 

I am currently using 38.0.5 portable (with no adblock and noscript) and it's driving me completely up the wall. 

I think the problem comes from me having a modified hosts file set specifically to break the ads in the pages, but disabling that is not an option. 

My old Firefox 25 portable (with adblock, ghostery and noscript) is able to save the html pages properly, even if frames or images are missing. I never had any problems with 21 either.

It's basically the same symptom here. I browse a site and click Save As. The download fails. I end up with a *_files folder and no html file. Sometimes it works if I save the page again; sometimes I need to reload and save the page again. And at other times it just won't work whatever I do (on the same site, but in a different thread) and I have to print the page to pdf or close my browser and use another Firefox version.

Is there no about:config setting to just force the browser to save the html file regardless of how many objects are missing inside it?

System config (probably not necessary, but in case needed):
Windows 7 Professional SP1 64bit, 8GB RAM, Non-unicode program display language and system language are Japanese.
Flags: needinfo?(silaghipaul)
I am using Safari on a Mac running OS X Yosemite 10.10.4. I just downloaded the latest version of Firefox and have no add-ons.

I have a local site I built which I use to keep track of current invoices and totals. I need to edit and re-save the page (with user-entered data) constantly. It works the first time when I select Save Page As / complete, but if I make changes and choose to re-save and replace the existing file, it deletes the .html file, the same as described above.
(In reply to BFG10K from comment #12)
> Is there no about:config setting to just force the browser to save the html
> file regardless how many objects are missing inside it?
I'm not sure, but I don't think so.

(In reply to Dylan from comment #13)
> I have a local site
Yes, I'm also reproducing the problem only with local pages.
Flags: needinfo?(paul.silaghi)
FF 40.0.2, 43.0a1 (2015-08-16) Win 7 x64.
Steps to reproduce:
1. Open google.com
2. Save Page As - Web Page, complete
3. Open the Google.htm file from the local drive
4. Save Page As - Web Page, HTML only - using the same name for the .htm file (overwrite the previous Google.htm file)

Actual results:
Google.htm is gone
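
For comparison, a common way to make an overwrite non-destructive is to write the new content to a temporary file first and only then swap it over the old one. The sketch below illustrates that general pattern; it is not how Firefox implements saving, and `safe_overwrite` is a hypothetical helper name used only here.

```python
import os
import tempfile


def safe_overwrite(target_path: str, data: bytes) -> None:
    """Replace target_path with data, keeping the old file if anything fails."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # Temporary file in the same directory, so the final rename stays on one volume.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Swap the finished file in over the old one.
        os.replace(tmp_path, target_path)
    except BaseException:
        # On any failure, discard the partial file; the original is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this ordering, a failed save leaves the previous Google.htm in place instead of removing it.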
Whiteboard: STR in comment 15
This bug affects me as well. I have to use SeaMonkey to save the pages which I cannot save with Firefox.
When I save a problematic page via SeaMonkey, for example:
http://www.analyticsmarket.com/blog/upgrading-from-urchin_js-to-ga_js
I get the message: ...\ga.js could not be saved, because the source file could not be read.

I know that some files cannot be saved, because I block some domains via the etc/hosts file.

So I believe this is the reason why Firefox is not able to save the complete web page (while it is able to save pure HTML).

I would say that no special setting is needed to force Firefox to save a page with missing parts. I find SeaMonkey's approach much better: just inform the user that some parts of the page could not be saved, but save all the other parts, mainly the HTML file itself.
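
The "report the failures but save the rest" behaviour described above could look roughly like the following. It is a minimal sketch under stated assumptions, not Firefox or SeaMonkey code: `save_page_complete`, the `resources` mapping (URL to local file name) and the injected `fetch` callable are all hypothetical, and link rewriting inside the HTML is left out.

```python
import os
from typing import Callable, Dict, List


def save_page_complete(html: str, resources: Dict[str, str],
                       fetch: Callable[[str], bytes],
                       out_dir: str, name: str) -> List[str]:
    """Save name.htm plus a name_files folder; return the URLs that failed."""
    files_dir = os.path.join(out_dir, name + "_files")
    os.makedirs(files_dir, exist_ok=True)
    failed = []
    for url, filename in resources.items():
        try:
            data = fetch(url)
        except OSError:
            # Remember the failure for the final report, but keep going.
            failed.append(url)
            continue
        with open(os.path.join(files_dir, filename), "wb") as f:
            f.write(data)
    # The HTML file is written regardless of how many resources failed.
    with open(os.path.join(out_dir, name + ".htm"), "w", encoding="utf-8") as f:
        f.write(html)
    return failed
```

The returned list is what a dialog could show to the user ("these parts could not be saved"), while the .htm file exists in every case.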
Product: Core → Firefox
Version: 30 Branch → unspecified
The bug can be reproduced on Windows 7 with Firefox 47.01.

@ Wesly, may I have your help?
Do we need to fix this bug in Firefox 50 or put it on the backlog?
Flags: needinfo?(dcrouch) → needinfo?(wehuang)
(In reply to William Hsu [:whsu] from comment #17)
> The bug can be reproduced on Windows 7 with Firefox 47.01.
> 
> @ Wesly, may I have your help?
> Do we need to fix this bug in Firefox 50 or put it on the backlog?

Thanks for checking, William. I don't think this is critical for Firefox 50, but it does bother users, so we should consider fixing it. To move forward, I'll first check within our project team whether this relates to the features we are going to implement or enhance, and so whether we can handle it:

1. If yes, put it into our backlog for work prioritization.
2. If not, reach out for further help.

Keeping the needinfo on me for follow-up.
(In reply to Wesly Huang (Firefox EPM) from comment #18)
Thanks for your reply.
No more concern about this.
:)
Flags: needinfo?(wehuang)
48.0b7: same problem here (I have to use another browser in order to save the pages). I'm not sure how it is possible NOT to consider a fix for this. Isn't saving web pages among the top 10 requirements of a web browser? I agree it wouldn't be so bad if it were efficient to have 2 or more browsers open at the same time, but CPU and memory usage by Firefox is already cumbersome, and adding other browsers just for this task is too much.

FYI, in case Mozilla decides not to fix this: I remember having a small utility that saved complete web pages, which may be an alternative to getting Mozilla to fix the problem. Or, if Mozilla is reading this, maybe they could get permission from one of those utility programmers, or find a free open-source one, to implement into the File / Save As menu if it is too difficult for their programmers to find the bug. I can even do that in Visual Basic, so I know it's a few-minute job. As easy as it is to add a menu function, bugs aren't hard to kill; they are hard to find. :)
Chuck.
(In reply to Wesly Huang (Firefox EPM) from comment #18)
After checking with the team, we don't think this is an area our project team is able to take on, so I am reaching out for help.

@Paolo:

I need your kind advice here. Do you know who might be able to look into this issue?
Flags: needinfo?(paolo.mozmail)
We don't have a specific team that owns the whole page saving process, and the code is located in different areas of the source tree.

Paul, are the steps from comment 15 the only way we could consistently reproduce this bug? Does the download fail in step 4? Are you prompted for overwriting when selecting the file?

I'd like to understand this because if it only affects saving a page over itself, and you have a clear warning, it's probably lower priority than overwriting without prompting. Any clarification about the scope of this bug will help.
Flags: needinfo?(paolo.mozmail) → needinfo?(paul.silaghi)
(In reply to :Paolo Amadini from comment #22)
> Paul, are the steps from comment 15 the only way we could consistently
> reproduce this bug?
For me, yes. I couldn't reproduce the issue with online pages with any examples in this bug.

> Does the download fail in step 4?
Yes

> Are you prompted for overwriting when selecting the file?
Yes

No errors in the browser console.
Tested on 50.0a1 (2016-07-17), Win 7 x64.
Flags: needinfo?(paul.silaghi)
Thanks. Looks like we could investigate this as an improvement to save more web pages without errors.
Priority: -- → P5
Thanks to Paul and Paolo for the prompt help!
(In reply to Paul Silaghi, QA [:pauly] from comment #23)
> (In reply to :Paolo Amadini from comment #22)
> > Paul, are the steps from comment 15 the only way we could consistently
> > reproduce this bug?
> For me, yes. I couldn't reproduce the issue with online pages with any
> examples in this bug.

My example of the bug depends on an entry in my etc/hosts file, which blocks a part of the page I want to save.

My conclusion is: Firefox cannot save the HTML file if it is supposed to save the complete web page and a part of the page is not accessible. It is not an improvement; it is a bug.
I can tell you that I still experience this problem in version 49.0.1 of FF. It consistently fails to save the URL (mentioned above) http://forums.mozillazine.org/viewforum.php?f=49
This also fails with all add-ons disabled. (I have a replaced hosts file.)

One additional comment, which may or may not be true for everyone: after the page shows up as failed when peeking inside the 'blue downloaded' arrow, the page in question (failed) ALWAYS succeeds to download when I click RETRY!!!

reibjerk
FF fails because of the modified hosts file.
If I comment out the following lines in this hosts file, then FF File Save will work.

0.0.0.0 cdn3.optimizely.com
0.0.0.0 log3.optimizely.com
0.0.0.0 log.optimizely.com
0.0.0.0 odds.optimizely.com

Why does Firefox depend on these websites?
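
To find out which parts of a page are hit by such hosts entries, the hosts file can be compared against the URLs the page loads. The following is a small sketch, assuming that entries pointing at `0.0.0.0` or `127.0.0.1` are meant as blocks; the function names are hypothetical, and the list of resource URLs has to be collected separately (for example from the browser's network panel).

```python
from urllib.parse import urlsplit

# Assumption: these addresses are used to block hosts, as in the entries above.
BLOCK_ADDRESSES = {"0.0.0.0", "127.0.0.1"}


def blocked_hosts(hosts_text: str) -> set:
    """Return the hostnames that a hosts file maps to a blocking address."""
    blocked = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if address in BLOCK_ADDRESSES:
            blocked.update(n.lower() for n in names if n.lower() != "localhost")
    return blocked


def blocked_resources(urls, hosts_text: str) -> list:
    """Return the URLs whose hostname is blocked by the hosts file."""
    blocked = blocked_hosts(hosts_text)
    return [u for u in urls if (urlsplit(u).hostname or "") in blocked]
```

If `blocked_resources` returns anything for a page, that page is a candidate for the failing "Web Page, complete" save described in this bug.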
I have to add that my two previous comments may not be entirely correct...
I cannot know for sure where it fails, but now, even with a hosts file without changes for .optimizely.com (i.e. where there is normal behaviour), I still see failures when saving some web pages...
As yet I cannot say whether there are more entries that should have been changed back to normal (i.e. an 'empty' hosts file) or what else may be the culprit.
(In reply to reibjerk from comment #27)
> One additional comment that may or may not be true for all, is that
> after the page shows up as failed when peeking inside the 'blue downloaded'
> -arrow,
> the page in question (failed) ALWAYS succeeds to download when I click RETRY

Hello,

It looks actually weirder than that…
What I have experienced so far is that when it fails because of the edited hosts file:
* sometimes, despite the ‘error’, the page has already been saved;
* sometimes it has not: either the page itself or some attached files are missing.

In both cases, I can click retry, and it then shows up as a success. I obtain the .htm file of the page, plus all the attachments in the associated folder. BUT: most of the time, when I open the page, the CSS cannot be loaded; it seems that all links between the saved page and the attachments are broken. But this is not always the case.

Example: in my hosts file I got the lines
127.0.0.1				ssl.google-analytics.com
127.0.0.1				www.google-analytics.com
127.0.0.1				google-analytics.com

Then I go to the page: http://derstandard.at/2000044792832/Akustische-Philatelie
First save, marked as failed.
The page has actually been saved and I can open it well.
Then, I hit retry, and refresh the local saved file: all CSS are broken!

Example2: I go to this page: https://lejournaldupeintre.wordpress.com/2016/09/27/lycee-bergson-paris/ and wait until it fully loads.
I save the page, it fails.
The attachment folder has been created, but the htm file is NOT saved.
I hit retry. The saved page displays all right, because all links are absolute URLs (i.e. you must be online).

This behavior is really tiring, because you must constantly check whether the page has been correctly saved before you hit retry, to avoid ending up with an incomplete page…
See Also: → 1341998

I'm having this issue with Firefox 65.0.1. If I save a webpage using "Save page as" and close the tab the only thing that will remain is the folder. The html file will simply vanish. I've started using other browsers to save pages since Firefox is completely unreliable in this regard.

My trouble started just this year.
So far (July 2019), I have found that I must check the Download Progress Icon - the downward pointing arrow - to see the color blue (good) or the color yellow (not good).
If yellow, clicking on the downwards arrow icon will open a list of downloads and whether they are complete or failed.
If a download failed, I have found that clicking "retry" on this list will successfully effect the download on this 2nd try.

Of course, I randomly check these downloads every few days just in case. So far, all is well.

(In reply to Montag from comment #32)

> If a download failed, I have found that clicking "retry" on this list will successfully effect the download on this 2nd try.

Could you please check whether, for such a download, the attachments (images, etc.) are actually saved? For me, the 2nd attempt, which does not fail, does not save them, and only a 'naked' page is stored.

Thanks

I index 100% of all my saved pages in an Excel spreadsheet by "inserting" a "hyperlink" to an "existing file or web page". I do this immediately after saving them.
If the page did not save properly, it does not appear in the "insert hyperlink" drop down list, so I immediately know whether I have a problem.

Furthermore, I will open various saved pages to see if they appear properly in the Firefox browser, and so far, so good.
This is a process I had to figure out myself; no one gave me a fix; maybe there is no one fix.
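
The manual check described above could also be done by a small script that looks for the symptom reported in this bug: a `_files` folder without a matching .htm / .html file next to it. This is only a sketch; `orphaned_files_folders` is a hypothetical name, and the `_files` suffix is taken from the reports here and may differ in other setups.

```python
import os


def orphaned_files_folders(directory: str) -> list:
    """List '<title>_files' folders that have no matching .htm/.html file."""
    orphans = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if not (os.path.isdir(path) and entry.endswith("_files")):
            continue
        title = entry[: -len("_files")]
        # A complete save should have <title>.htm or <title>.html beside the folder.
        has_page = any(
            os.path.isfile(os.path.join(directory, title + ext))
            for ext in (".htm", ".html")
        )
        if not has_page:
            orphans.append(entry)
    return orphans
```

Running it over the folder where pages are saved lists the saves that need to be repeated.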

This morning I found that pages at SOME sites are affected in the "save page as..." process by Firefox add-ons, specifically NoScript in my case.

For SOME sites I have found that disabling NoScript entirely allows pages on these particular sites to

  1. be saved immediately, and
  2. be saved correctly as "complete" and not merely as "html".

I did this on the sites "The Atlantic" and "APS Physics" (https://physics.aps.org/articles/).

IF the problem is solely within the browser, this makes sense. Other add-ons may have effects, also.

When will saving as a webpage actually function correctly? This is 6 years old? Really? What's going on in Firefox development that a glaring bug in something as basic and fundamental to web browsing on a desktop, which occurs on the latest Win and Mac (on my end), just floats around like this? I seriously want to look SOMEONE in the face who works on this and just ask why this is so hard. I'm not the expert, so please explain why this is acceptable after 6+ years; I'm dying to hear the rationale behind failing downloads being left as normal in a web browser. This is on BOTH latest mainstream OSes, people!

(In reply to Montag from comment #35)

> This morning I found that pages at SOME sites are affected in the "save page as..." process by Firefox add-ons, specifically NoScript in my case.
>
> For SOME sites I have found that disabling NoScript entirely allows pages on these particular sites to
>
>   1. be saved immediately, and
>   2. be saved correctly as "complete" and not merely as "html".
>
> I did this on the sites "The Atlantic" and "APS Physics" (https://physics.aps.org/articles/).
>
> IF the problem is solely within the browser, this makes sense. Other add-ons may have effects, also.

Why would it work fine when doing a "retry" then, which is what happens under my observation? Fails first time on SOME sites, but ALWAYS works the second time "retrying it"

Why not just have it save "the way that works" the first time?

This is just silly and a reason why people switch browsers. Normal people just think things like "That Firefox never saves my pages right" and they move on. I'm only here saying this because I love Firefox and hate that this is STILL going on.

(In reply to dandelion54 from comment #37)

> Why would it work fine when doing a "retry" then, which is what happens under my observation? Fails first time on SOME sites, but ALWAYS works the second time "retrying it"
>
> Why not just have it save "the way that works" the first time?

Hmm... to me the 'retry' is actually not all right. Yes, the second time the program does not show that the save action failed; but the second time the page is sometimes saved without the attachments. Perhaps not in all cases, but please open the saved web page and tell us whether the images, styles, etc. are served locally.
I think the behaviour is the following. The first time, Firefox saves the attachments plus the page. One of the attachments fails (no access to the host, for example, or maybe even a 404?), so the save action stops, and the .htm may not even be created. The second time (retry), Firefox knows there was an error due to an attached file, and it does a 'single page only' save action.
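
Whether a saved page really serves its images and styles locally can be checked by looking at the references inside the saved .htm file. Below is a rough sketch using Python's standard `html.parser`; the class and function names are made up for the example, and it only looks at `src` and `href` attributes of a few tags, so it is a quick diagnostic rather than a full validator.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ResourceRefs(HTMLParser):
    """Collect src/href values of tags that load resources."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script", "iframe", "frame") and attrs.get("src"):
            self.refs.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            # Includes stylesheets, but also other <link> kinds.
            self.refs.append(attrs["href"])


def classify_refs(html: str) -> dict:
    """Split resource references into local (relative) and remote (absolute)."""
    parser = ResourceRefs()
    parser.feed(html)
    result = {"local": [], "remote": []}
    for ref in parser.refs:
        scheme = urlsplit(ref).scheme
        is_remote = scheme in ("http", "https") or ref.startswith("//")
        result["remote" if is_remote else "local"].append(ref)
    return result
```

A page saved as "complete" would be expected to show mostly `local` references into its `_files` folder; a page where everything is `remote` behaves like the 'naked' save described above and needs a network connection to display correctly.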

> This is just silly and a reason why people switch browsers. Normal people just think things like "That Firefox never saves my pages right" and they move on. I'm only here saying this because I love Firefox and hate that this is STILL going on.

> When will saving as a webpage actually function correctly? This is 6 years old? Really? What's going on in Firefox development that a glaring bug in something as basic and fundamental to web browsing on a desktop, which occurs on the latest Win and Mac (on my end), just floats around like this? I seriously want to look SOMEONE in the face who works on this and just ask why this is so hard. I'm not the expert, so please explain why this is acceptable after 6+ years; I'm dying to hear the rationale behind failing downloads being left as normal in a web browser. This is on BOTH latest mainstream OSes, people!

I fear it must be "political". If you look at other browsers such as Edge, you can see there is no longer a save action! Chrome has different, but similar, issues with save actions. The truth must be that, even if this is a basic feature, and certainly a required one, there aren't many people using it, so "normal people" do not even notice "that Firefox never saves my pages right".
I wouldn't be surprised if, in the future, more and more browsers removed that feature, expecting people to access online content only. I mean, developers maintain what is mainly used, users use what they are offered, and webmasters/service providers do not really expect their pages to be saved offline. But abandoning such a feature would be a shame, as it calls into question our right to access, archive and dispose of the information. So, as I said, this is political, and we must make our voice heard!

I have to agree with guinotphil regarding this ongoing issue with saving webpages in Firefox.
I save pages that I visit when there is information I will need to retrieve again at a later time.
For example, a simple page like: https://getgreenshot.org/faq/are-there-any-dependencies-to-other-software-frameworks/
I am unable to "trust" Firefox when it says Failed or Complete. Lately I've had to open the saved page to see whether it truly works or not. This is a wasteful process, and this ongoing saving issue should be resolved sooner rather than later.
I really hope this issue is resolved, since it has been ongoing for 6 years.
