Closed Bug 115174 Opened 21 years ago Closed 14 years ago

Save Page As Web Page, HTML Only will attempt to resubmit cgi to save instead of using cached or displayed content

Categories

(Core Graveyard :: File Handling, defect, P2)

x86
All
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: bugzilla, Unassigned)

References

(Blocks 1 open bug, )

Details

(4 keywords)

Attachments

(1 file)

Bill has a counter CGI test to illustrate this, but here's a summary of the test
case he told me about:

1. go to a page that has a counter. let's say the counter displays "1013 visits
to date!"
2. save the page [ctrl+S, File > Save Page As, etc.].
3. look at the page that you had saved locally.

results: the counter now reads "1014 visits to date!"
feel free to resummarize this bug -- my language skills are kinda woggly today. ;)
Keywords: regression
Would trying PERSIST_FLAGS_FROM_CACHE before falling back on PERSIST_FLAGS_NONE
work?
Summary: saving page w/counter will resubmit cgiand save that rather than the cached version → saving page w/counter will resubmit cgi and save that rather than the cached version
Status: NEW → ASSIGNED
Priority: -- → P2
Target Milestone: --- → mozilla0.9.8
The way the IDL is documented suggests that using PERSIST_FLAGS_FROM_CACHE could
fail if the document was not cached. Isn't the default behaviour of WBP to use
cached data if it's present (which in the counter case it certainly is), Adam?
Adding PERSIST_FLAGS_FROM_CACHE should ensure the cached copy is saved if it
exists, but it depends on how the counter data is being embedded. If the site uses
JS to generate the counter, this might not work, e.g. my own site
(http://www.iol.ie/~locka/mozilla/mozilla.htm).
The problem with doing that, though, is that according to the documentation this
could cause other saves to fail if the data isn't in the cache...

based on my interpretation of the interface, the default behaviour should be
best, if it uses cached data if it exists, falling back on retrieving from
network. Is this what happens?
I believe the default behaviour should fetch from cache if the data exists there,
or from the net otherwise. Doug, can you confirm that the code below does just
that - that when I fetch a URI it will come from the cache first unless the flags
say otherwise?

http://lxr.mozilla.org/seamonkey/source/embedding/components/webbrowserpersist/src/nsWebBrowserPersist.cpp#458
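As a sketch of the fallback Ben suggests above - try PERSIST_FLAGS_FROM_CACHE first, defaulting to PERSIST_FLAGS_NONE - the caller-side decision might look like this. The flag values are assumptions based on my reading of nsIWebBrowserPersist.idl, and the wrapper function is hypothetical:

```javascript
// Hypothetical caller-side helper: prefer the cached copy when one is
// known to exist, otherwise fall back to the default behaviour so the
// save does not fail outright on a cache miss.
const PERSIST_FLAGS_NONE = 0;        // assumed value from nsIWebBrowserPersist.idl
const PERSIST_FLAGS_FROM_CACHE = 1;  // assumed value from nsIWebBrowserPersist.idl

function chooseSaveFlags(documentIsInCache) {
  return documentIsInCache ? PERSIST_FLAGS_FROM_CACHE : PERSIST_FLAGS_NONE;
}
```

The open question in the comments is whether a cache miss can be detected up front at all; if not, the fallback has to happen after a failed save attempt instead.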
I discussed this a bit with Ben yesterday and suggested that we may not be able
to use the same cache strategy for all downloads.

In the case of "save page" (and various frame-related flavors of that) and "save
image," then we most likely want whatever copy is in the cache (if there is
one).  That's "validate never" in terms of the lower-level (and older?) necko
caching flags (which are distinct from those supported on the
nsIWebBrowserPersist interface).

But in the case of "save link as," we need to do standard cache validation, I
think (i.e., respecting the caching control specified by the web server via http
response headers, and, the user prefs for validating once-per-session, etc.). 
If we don't do that, then "save link as" will result in saving stale data
sometimes.  I think "save link as" should save the same information the user
would see if they clicked on the link.

Note that this is how nsIStreamTransfer::selectFileAndTransferLocationSpec used
to be utilized.  It has a "doNotValidate" argument that was PR_TRUE for saving
pages and images, and PR_FALSE for saving links.
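The per-command cache policy described above could be sketched roughly like this. The command strings and function name are illustrative, not actual Mozilla identifiers; the boolean mirrors the old doNotValidate argument:

```javascript
// Illustrative mapping of save command to cache policy: true means take
// whatever copy is in the cache as-is ("validate never"), false means do
// standard HTTP cache validation so stale data is not written to disk.
function shouldSkipCacheValidation(command) {
  switch (command) {
    case "save-page":
    case "save-frame":
    case "save-image":
      return true;   // was doNotValidate = PR_TRUE
    case "save-link":
      return false;  // was doNotValidate = PR_FALSE
    default:
      return false;  // be conservative: validate anything unrecognised
  }
}
```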
So my question is: is it likely that you could miss the cache in any of the
proposed save situations (and thus fail to do anything - not even retrieve from
the network)?
Keywords: nsbeta1
Target Milestone: mozilla0.9.8 → mozilla0.9.9
OK, I pushed this out to .9.9 because even when I pass in the unconditional
USE_CACHE flag, the counter at the URL specified is incremented when I load the
saved version and compare it to the one on screen. Adam, this seems to imply
something is incorrect inside webbrowserpersist, unless I'm missing something?
Doug, can you confirm that webbrowserpersist is doing everything it can to ensure
data gets pulled from the cache first?

http://lxr.mozilla.org/seamonkey/source/embedding/components/webbrowserpersist/src/nsWebBrowserPersist.cpp#458

The HTTP spec <http://www.ietf.org/rfc/rfc2616.txt> does allow servers to
specify cache-control and expiry headers, so downloaded data might not be in the
cache, but in this instance there appears to be none of that.

The log of data being sent back and forth indicates that this is a
straightforward operation and the data should be retrieved from cache if it is there.

D:\personal\junkbuster\junkbstr.exe: accept connection ... OK
scan: GET /cgi-bin/counter.exe HTTP/1.1
scan: Host: law.mcom.com
scan: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.4)
Gecko/20011019 Netscape6/6.2
scan: Accept: text/xml, application/xml, application/xhtml+xml, text/html;q=0.9,
image/png, image/jpeg, image/gif;q=0.2, text/plain;q=0.8, text/css, */*;q=0.1
scan: Accept-Language: en-us
scan: Accept-Encoding: gzip, deflate, compress;q=0.9
scan: Accept-Charset: ISO-8859-1, utf-8;q=0.66, *;q=0.66
scan: Keep-Alive: 300
scan: Proxy-Connection: keep-alive crunch!
addh: Proxy-Connection: Keep-Alive
D:\personal\junkbuster\junkbstr.exe: GPC	law.mcom.com/cgi-bin/counter.exe
D:\personal\junkbuster\junkbstr.exe: connect to: law.mcom.com ... OK
scan: HTTP/1.1 200 OK
scan: Date: Wed, 16 Jan 2002 18:51:08 GMT
scan: Server: Apache/1.3.14 (Win32)
scan: Transfer-Encoding: chunked
scan: Content-Type: text/plain
D:\personal\junkbuster\junkbstr.exe: accept connection ... 
Darin should look at that.
Yes, it appears that improper load flags are being set on the HTTP channel
used to request the document.  nsIRequest::LOAD_FROM_CACHE should be set on the
channel to avoid hitting the net for the document.  In this case it appears that
this flag is not being set, and because the page is served up with a zero
freshness lifetime, it is re-requested from the server instead of being pulled
from the cache (just as it would be if the user visited the page via an href).
Ben, webbrowserpersist will pass nsIRequest::LOAD_FROM_CACHE if
PERSIST_FLAGS_FROM_CACHE is passed to it, so I think that is the thing to try
first.

If that doesn't work, I don't know what the problem could be. The load-flag
setting code in webbrowserpersist is pretty straightforward.
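A minimal sketch of the flag translation Adam describes - webbrowserpersist mapping PERSIST_FLAGS_FROM_CACHE onto the channel's nsIRequest::LOAD_FROM_CACHE load flag. The numeric values are assumptions from my reading of the IDL, not verified constants:

```javascript
// Assumed constants; check the real IDL before relying on these values.
const PERSIST_FLAGS_FROM_CACHE = 1;  // nsIWebBrowserPersist (assumed)
const LOAD_FROM_CACHE = 1 << 10;     // nsIRequest (assumed)

// If the persist flags ask for cache-only data, the HTTP channel gets
// LOAD_FROM_CACHE so the document is never re-requested from the server
// (the re-request is what re-runs the CGI and bumps the counter).
function channelLoadFlags(persistFlags) {
  let loadFlags = 0;
  if (persistFlags & PERSIST_FLAGS_FROM_CACHE)
    loadFlags |= LOAD_FROM_CACHE;
  return loadFlags;
}
```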
*** Bug 118487 has been marked as a duplicate of this bug. ***
OS: Linux -> All (according to duped bug 118487)
OS: Linux → All
nsbeta1+ per ADT triage team, assuming this means an incorrect page could be
saved in the general case.  Adding topembed keyword in case this is an embedding
issue too.
Keywords: nsbeta1 → nsbeta1+, topembed
-> 1.0
Target Milestone: mozilla0.9.9 → mozilla1.0
Isn't this a really old bug, related to view source doing the same thing as well?
Whiteboard: dupeme
nsbeta1- per ADT
Keywords: nsbeta1+ → nsbeta1-
Target Milestone: mozilla1.0 → mozilla1.2
topembed- per EDT triage.
Keywords: topembed → embed, topembed-
This is fixed in recent builds. It was related to the famous view
source/CGI/cache problem, I think.

I tested this successfully at http://www.sdsu.edu/~boyns/counter.html  with 1.0
RC1 branch Build ID: 2002050706 Windows 98. 

Updating URL, as the old one is dead.

Can we resolve this worksforme?
As a note, the view-source fix should not have affected this... so if it now
works, it's for some other reason.
nominating for buffy.

but should this go to adam or rpotts?
Keywords: nsbeta1- → nsbeta1
WFM.
*** Bug 166584 has been marked as a duplicate of this bug. ***
QA Contact: sairuh → petersen
Depends on: 177329
nsbeta1- per the nav triage team.
Keywords: nsbeta1 → nsbeta1-
Despite whatever "Cache-Control" header the server has given you, there needs to
be a way to save the CURRENT PAGE.  Ie, whatever source is being rendered on the
browser right now.

I use this feature to save invoices / order confirmations from several vendors.
 But I've "regressed" to using Internet Explorer for all orders, because Mozilla
will always resubmit the order if I try to "Save" the confirmation / invoice.

"Save Page" should save what's being displayed on the browser window, just like
"View Source" should show you the source that's being rendered, rather than
reconnecting with the website and downloading new source.  This is a Mozilla
anachronism.
mozilla@acct.whipsnap.org:

are you saying that mozilla will automatically POST data to a server without
asking you for confirmation first?  if so, have you tried increasing the size of
your memory and/or disk cache to remedy the problem?  (i'm not claiming that
this is a great solution, but i'm just trying to understand the problem you are
seeing.)  thx!
*** Bug 151124 has been marked as a duplicate of this bug. ***
> are you saying that mozilla will automatically POST data to a server

Yes.  

> without asking you for confirmation first?

Yes.

The problem is that until bug 170722 got fixed you could pass the postData to
the persistence object (so we did), but you could not pass a cache key (so it
never set the channel to only read from cache).  The result was that trying to
save a POST result page would repost silently.

Now that the persistence object takes a cache key, we should be able to fix
this.... (contentAreaUtils.js, just pass around the nsIWebPageDescriptor along
with the postdata stream....)
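The precondition Boris describes - both a cache key and a cached copy are needed before a POST result can be saved without silently re-posting - reduces to something like this. The predicate is purely illustrative; the name is invented:

```javascript
// Invented predicate summarizing the precondition: without a cache key
// the channel cannot be restricted to the cache, so saving a POST result
// silently re-posts; with a key but no cached copy the save has nothing
// safe to read.
function canSaveWithoutRepost(haveCacheKey, haveCachedCopy) {
  return haveCacheKey && haveCachedCopy;
}
```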
I spoke too soon... the postdata handling in there is _also_ broken; it just uses
the postdata for the toplevel page and breaks if you do, e.g., "save frame as".
<sigh>.
*** Bug 193110 has been marked as a duplicate of this bug. ***
I would argue for a severity increase (dataloss).  This problem can result in
people losing things like order numbers, inadvertently placing duplicate orders
and similar.  As a quick fix, warning the user that the page will be resubmitted
is essential.  Moz 1.2.1 doesn't appear to suffer this problem, but 1.3b does.

The URL needs updating as it's broken; how about http://www.sys3175.co.uk/try.php ?
The description for this bug is terrible - basically unfindable by people about to
file this as a bug. In my experience this only happens with POST pages - is this
the case? (If so, the description needs to reflect this, to make it easier for
people to find => fewer dupes.) The target milestone also needs updating.
Depends on: 84106
Removing past target.  Adam, didn't this get worked on recently on a separate
bug report?
Target Milestone: mozilla1.2alpha → ---
The patch against bug 177329 would add some extra flags to allow finer cache
control behaviour. This bug covers a lot of the same ground as that one.
Blocks: 210065
I've seen this behavior when making payments to Discover Card
(http://www.discovercard.com) and Citicards (http://www.citicards.com).  Of
course I can't give complete URLs without revealing personal banking
information.  The Citicards web server gives me an error message saying
essentially that my payment has already been submitted.  This indicates to me
that the form may be being submitted again when "Save Page As, HTML Only" is
selected.  What really concerns me about this is that, on a less smart web page,
a user's payment may be submitted twice, when all the user wanted to do was save
a copy of the payment receipt.  This is more than just annoyance, it could cause
people's checks to bounce unexpectedly.  I've seen web pages (can't remember
URLs off the top of my head, though) which say something to the effect of
"Press the Submit button ONLY ONCE, or your account may be debited or charged
twice."  Selecting "Save As" should NEVER cause a form to be re-submitted, at
least not without a warning to the user.

I first made this comment in bug 144541, but it probably belongs here.  I'm
surprised this bug isn't higher severity, as in my opinion it makes Mozilla
unsafe for doing ANYTHING involving money, as the risk of duplicate transactions
and lost confirmation pages (invoices, order numbers, etc.) is too great.  I'll
use Mozilla for browsing, but Internet Explorer gets opened when I go to my
bank's web site or an online store.
*** Bug 144541 has been marked as a duplicate of this bug. ***
Correct.  Setting dataloss/critical per Comment #33, etc.  I've saved pages that
turned out to be important and were not actually saved.  Unfortunately the two
bugs this depends on need fixing; it'll take a bunch of developer work and time.
 Perhaps the Release Notes need an entry.

Replacing summary: "saving page w/counter will resubmit cgi and save that rather
than the cached version"
with: "Save Page As... will attempt to resubmit cgi/reload (e.g.
bank/financial/purchase/post transaction is re-attempted and may fail, or even
worse, succeed; page counter will increase) and try to save that rather than the
cached or the displayed version."

Removing URL (http://www.sdsu.edu/~boyns/counter.html); it's now 404.

Marking Bug 144541 a duplicate. 'Save Page As...' should NEVER cause a form to
be re-submitted.  Bug 185368 is a symptom/dupe of this bug and the now
deprecated frame.
Severity: normal → critical
Keywords: dataloss
Summary: saving page w/counter will resubmit cgi and save that rather than the cached version → Save Page As... will attempt to resubmit cgi/reload (e.g. bank/financial/purchase/post transaction is re-attempted and may fail, or even worse, succeed; page counter will increase) and try to save that rather than the cached or the displayed version.
http://bugzilla.mozilla.org/show_bug.cgi?id=160454#c6 perhaps explains how this
bug can be fixed, though I wonder if printing a page that's being displayed but
isn't cached and mustn't be reloaded over the 'net is going to require a change
in how printing works.  Perhaps this bug should be filed under printing, not
file handling?  (Just a guess; perhaps Chris Lyon, who I note has experience
assigning, can tell us if I'm FOS.)
What does printing have to do with this? Printing can just use the in-memory DOM
and re-lay it out, while saving the page really does require the original page
from somewhere.
I haven't verified this, but it looks like maybe encrypted pages aren't cached.
Useless-UI, as "File | Save Page As" fails to save the page in these instances.

Ben Goodger is the assignee. Considering his hard work on MozillaFirebird, he
might not have time to work on this bug. Should we reassign it?
Flags: blocking1.7a?
Keywords: useless-UI
To whom, exactly?  Ben did check in the code in question, you know...
Clobbering useless-UI. Useless and not working 100% aren't the same.

For the people who are whining in this bug instead of posting patches: does
File > Save Page As... > Save as type: Web Page, complete give you the same problem?
Keywords: useless-UI
Yes, same problem.  The problem seems to occur before you even get the chance to
select 'Web page, complete'.  I'm using 1.3a, btw.

To reproduce, go to www.theprinterworks.com and add something to your shopping
cart.  When I try to save the shopping cart page, I get the error "The link
could not be saved.  The web page might have been removed or had its name
changed."  'View:Page Source' on the page seems to work, though.

If 'Work Offline' is selected, 'Save Page As...' just silently fails.
Please ignore my earlier assertion.  I have no clue how the cache works.
Erm. Nothing before 1.4 should be considered remotely supported, unless you're on
MacOS 9, in which case you should be using a vendor for tech support.

For anyone using a non-branded Mozilla, you should be using 1.5 or later.

If you're using something older than those, you should provide a reason ("Bug
XXXXX made Mozilla 1.4 and later unusable").
1.6b does the same thing, except, when working offline, I now get the same error
message as I do when online.
We should really hold the full, original, source of any page that we're still
displaying in memory.  Considering our current expansion factor, it wouldn't
increase our footprint too much, and it would allow us to solve this class of
bugs with saving pages and similar bugs with character encoding switching.
looks like we are going to miss the window for 1.6 on this one.
Flags: blocking1.7b?
Flags: blocking1.7a?
Flags: blocking1.7a-
Flags: blocking1.7b?
Flags: blocking1.7b-
Flags: blocking1.7?
It would be great to have a testcase for this bug.
Flags: blocking1.8a?
not likely for 1.7. 
Flags: blocking1.7? → blocking1.7-
Flags: blocking1.8a?
Flags: blocking1.8a-
Flags: blocking1.7a-
Flags: blocking1.8a2?
From a posting to the newsgroup...

Felix Miata wrote:

>> In a generic sense, this has happened to me before. Today in 1.7rc2 was 
>> different than I remember before. When trying to save the "thank you for 
>> placing your order" page at https://servicesales.sel.sony.com/..., which 
>> came up after placing an online order with a credit card, I was greeted with 
>> this alert:
>> 
>> The link could not be saved. The web page might have been removed or had its 
>> name changed.
>> 
>> I got the same result doing view source and trying to save that. I was able 
>> to save the page by selecting all on view source and copying that to some file.
>> 
>> I have previously been unable to save thank you pages from 
>> https://secure.computergate.com/ online purchases. There, instead of the 
>> alert I got today from Sony, the page does save, but differently than 
>> displayed, losing all relevant content and instead including the following 
>> content:
>> 
>> "Unauthorized Access Detected, Access Denied
>> 
>> You attempted to access a secure page without logging in first, OR you 
>> attempted to access a secure page in the incorrect sequence. . . ."
>> 
>> The only existing bugs that look at all related to this are bug 166786 & bug 
>> 172429, but neither seem to be on point. Does anyone know of an existing bug 
>> on point? If not, can someone explain to me what special information needs 
>> to be collected the next time in order to file a good bug?
I ran into this bug weekly in the same scenario, as I buy
vitamins/music/books/tickets/etc. on the web and save the page off for later.
This is another one of those bugs where, once I ran into it and saw that the bug
(this one) had been filed and had gone unfixed for 3 years, I went back to using IE.
As I commented on another bug, if 1.7 is to be the "long lived" branch, it seems
that major deficiencies like this in a common use case should really be fixed.
On http://www.pcimicro.com/ on the post-order-submission (confirmation) page it
isn't even possible to view source, much less save the page. Trying either gets
an empty shopping basket page.
This bug, and the same one concerning view source, make it hard to use Mozilla for
intranet applications. The source never matches what is displayed. Often, the
source is the error page for unauthenticated users or for a lost session.

Today I was simply unable to save an error page to send it to the developer.

In the same way, it is impossible to save a page that summarizes a payment
transaction, because the transaction cannot be done twice (hopefully most servers
don't make you pay twice...).
Workaround:  If you only need to save a copy of the rendered page view, use the
Print function.  Print to a .ps file, or use something like PDFcreator for .pdf.
Useful for saving invoices, banking statements, etc.  Not useful for seeing
actual page sources.

Note: In no way is this intended to discourage squashing of this bug.  It is a
bad one and IMO really needs to be fixed. 

<http://sourceforge.net/projects/pdfcreator/>
Flags: blocking1.8a2? → blocking1.8a2-
Another workaround is to save the page as text.  You lose all the formatting but
the text is OK.  I've been doing this for years.

But, yes, I agree that this is a nagging problem and needs to be fixed soon.
critical, dataloss: seems a candidate for "blocking-aviary1.0PR"
Flags: blocking-aviary1.0PR?
Flags: blocking1.7b-
Flags: blocking-aviary1.0PR? → blocking-aviary1.0PR-
I believe that I have been the victim of this bug, using Firefox 0.9.1 on MacOS
10.3.  I used "Save Page As..." on a .jsp page containing an order confirmation,
and I ended up with a duplicate credit card charge that was a pain to rescind.
I just now saw the problem on another .jsp page using Firefox 0.9.3; luckily it
was not a financial transaction.

I am rather shocked to see the long history of this bug.  If it is so hard to
fix, at least warn the users -- "Saving this page as html may re-submit the
information used to generate it: OK/Cancel".

I really can't understand why it takes years to fix this problem, which is for me
(and many others) critical and almost a showstopper for a real Firefox 1.0 or
whatever non-beta release...
Present in Firefox 1.0 release (it's caught me twice.)

Mozilla/5.0 (Windows; U; Win98; en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0

This kind of bug could generate really bad PR, as it costs people money and it
upsets the banks.  I support the just-in-case dialogue box of Comment #59, if
nothing better is possible.  For Firefox's sake a link is going in my signature.
 Time to get this one squashed.
There seems to be a completely *weird* philosophy in saving / viewing source.
Under no circumstances should either function have to access the site again.

The chosen philosophy causes an endless list of problems, ranging from annoying
to dataloss/moneyloss critical.

A security patch really should be issued for Moz and Firefox ASAP.
Can someone provide a link to a page that shows this problem in a nightly build?
And no "go buy something on eBay then try to save" or "sign up for a bank
account then transfer your money". A page that we can get to without signing up
for anything and can submit over and over without fear of going bankrupt.
http://www.duckbytes.com/dbstore_demo.php is a demo for an online store system.
 It acts like a real online store, but doesn't actually charge money (and you
can enter fake data for everything, I entered "123" for the credit card number).
 I placed an order, attempted to save the confirmation page, and the saved page
was not the displayed confirmation page but a page which said "cart is empty". 
This is exactly what you get if you hit the reload button, only you don't get a
warning when you attempt to save the page.

I'm also adding this bug to be tracked in bug 288462.
Blocks: 288462
Both Save As and View Source work for me on that page in Mozilla/5.0 (Windows;
U; Windows NT 5.1; en-US; rv:1.8b2) Gecko/20050326 Firefox/1.0+

Have you tried (can you try) a nightly build (http://www.mozilla.org/developer/)?

Also, what are your cache settings?
Ok, I downloaded the nightly build (I had been using Firefox 1.0.2).  I was able
to reproduce the problem.  However, I did notice it only happens when using "Web
Page, HTML only".  "Web Page, complete" (and also "Text Files") correctly saves
the page.  I usually like to save HTML only because I don't care about all the
image files (logos, etc.) that are typically on these types of confirmation pages.

My disk cache is set at 50 MB and I cleared the cache before doing this test.

I still would be wary about even using "Web Page, complete" (and thus I'll
continue to use IE for these types of transactions) until I can be assured that
Mozilla will never, ever, ever, ever repost form data without warning me and
without my confirmation.  To me this behavior is about as bad as an e-mail
program which opens executable attachments without warning.
Gotcha, I can reproduce with HTML only.
Summary: Save Page As... will attempt to resubmit cgi/reload (e.g. bank/financial/purchase/post transaction is re-attempted and may fail, or even worse, succeed; page counter will increase) and try to save that rather than the cached or the displayed version. → succeed; page counter will increase) and try to save that rather than the cached or the displayed version. Save Page As Web Page, HTML Only will attempt to resubmit cgi/reload (e.g. bank/financial/purchase/post transaction is re-attempted and may fail, o…
Asking to block since it is a dataloss bug.
Flags: blocking-aviary1.1?
Flags: blocking1.8a2-
Flags: blocking1.8a1-
Flags: blocking1.7-
Flags: blocking-aviary1.1?
Flags: blocking-aviary1.1-
Flags: blocking-aviary1.0PR-
*** Bug 293470 has been marked as a duplicate of this bug. ***
*** Bug 235142 has been marked as a duplicate of this bug. ***
(In reply to comment #68)
> Asking to block since it is a dataloss bug.

I'm very surprised this was minused, as it may also be a security risk. 
*** Bug 300037 has been marked as a duplicate of this bug. ***
This happened to me again today (Firefox 1.0.5), and this time it has actually
cost me money (unlike any security bug this far). Maybe the talk of CGI counters
in the beginning of this bug distracts people into thinking this is important.

At the very least it should refuse to do anything if it cannot save the cached
page. Submitting a URL is not a free operation!

I now have the regretful task of calling the suppliers premium rate number from
another continent to undo the duplicate charge (if that is at all possible).

This bug should have the 'ecommerce' keyword, but I am just a lowly user and
cannot add that.


Flags: blocking-aviary2.0?
Flags: blocking-aviary1.1?
Flags: blocking-aviary1.1-
Flags: blocking-aviary1.0.7?
Flags: blocking-aviary1.0.6?
Not a regression.  Minusing for aviary1.0.6/1.0.7.
Flags: blocking-aviary1.0.7?
Flags: blocking-aviary1.0.7-
Flags: blocking-aviary1.0.6?
Flags: blocking-aviary1.0.6-
Flags: blocking-aviary1.1? → blocking-aviary1.1-
For a clear example, go to http://www.cartex.co.uk and add some items to your
basket, then save the basket page as Web Page HTML only. Open the saved page and
- voila, another item magically appears. In the past I've actually had a
duplicate order with this company because I saved the confirmation page - and it
resubmitted my order.

Judging by how long this bug's been around, the underlying problem will take
some time to fix properly. In the meantime, as others have said, the browser
should definitely warn when you try to save this kind of page. Would it be
helpful if I filed a separate bug for the warning dialog?
Can this be fixed please?
Darin, biesi, I think this is something consumers of webbrowserpersist have to
handle (try with the "only from cache" flag, and if that fails, request
permission from the user to retry for real)...  webbrowserpersist itself has no
way to know what you mean when you call saveURI, and IMO should not be posing
any prompts.

Thoughts?
Shouldn't we instead have a way to guarantee that we have the source available
for any Web page that we're displaying?  Why should the fact that we don't have
it ever need to be "handled"?
We could just keep two copies of the source, I suppose (in cache and not in
cache).  That's the only way I can see to always have the source available given
that the cache can be disabled.
I think that this should be done:
- bug 262350 should be fixed
- store the cache token for the currently-loaded page on the docshell or
contentviewer or something
- make "save page as" use that function
- then do comment 77
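The consumer-side strategy from comment 77 - try a cache-only save first, and only go to the network after the user explicitly agrees - could be sketched as follows. All three callbacks are hypothetical stand-ins for the real save and prompt machinery:

```javascript
// Hypothetical wrapper: saveFromCacheOnly rejects on a cache miss; we
// then ask the user before re-fetching, because a network fetch may
// re-submit form data (the dangerous case in this bug).
async function savePageSafely(saveFromCacheOnly, confirmRefetch, saveFromNetwork) {
  try {
    return await saveFromCacheOnly();
  } catch (cacheMiss) {
    if (await confirmRefetch())
      return saveFromNetwork();
    throw new Error("save cancelled: document not in cache");
  }
}
```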
Depends on: 262350
I don't think we should rely on cache tokens.  For example, the cache will
refuse data that exceeds a certain byte length, so the cache token will be
invalid.  There are other cases too where we might not have a usable cache token.
Assignee: bugs → file-handling
Status: ASSIGNED → NEW
QA Contact: chrispetersen → ian
Whiteboard: dupeme
As a user, the expected result of saving or printing a page currently being
displayed is to do exactly that - not re-fetch the page first.  As others have
already pointed out, a multitude of problems can be caused by the way Mozilla
currently handles this.  Call it a technical issue, call it an HCI issue;
regardless, it needs to be fixed ASAP.
Another user here. I heard about this from Slashdot. Anyone care to give a
user-friendly status update, when we might see a fix etc? Thanks.
Please, NO MORE "USER COMMENTS" like comment 82 or comment 83.  Don't respond to
this comment either.

These make it significantly harder for developers to discuss and fix the bug
since it's harder to find the technical discussion amid all the noise.
Then how about giving people a status update every so often so that other
people don't have to post these kinds of messages? I'm fully aware that "me too"
comments shouldn't be posted, but how else can we find out what's going on?
I mean, it has been 4 years since this bug was filed.

Bugzilla isn't just for the developers. Perhaps there needs to be a new feature
added to Bugzilla to separate technical comments from other comments?
If there were technical progress being made on this bug, you would likely be
reading about it here and seeing patches attached. Community forums like
Mozillazine are a more appropriate place for comments not directly related to
fixing a bug. It's probably not the best use of developer time for them to go
around to every bug that hasn't been worked on to say "nope, haven't gotten to
this yet" every so often.
*** Bug 321037 has been marked as a duplicate of this bug. ***
I really doubt this is branch-friendly, but moving it over to the core nominations for evaluation.
Flags: blocking-aviary2? → blocking1.8.1?
Annoying, and if complete and text-only work it sounds like it won't be too invasive: 1.8.1+.
Flags: blocking1.8.1? → blocking1.8.1+
Why do you say that? Web Page, complete and Text only just serialize the DOM, while HTML only should save exactly what the server sent, so they are rather different...
How about a status update?

This bug was first reported over FIVE years ago.  My god, this is a CRITICAL issue.  I thought this was only a pain in the ass for web developers, but seeing that this affects normal users, and has implications for people's financial information?!

What does it take to light a fire under someone's ass and get some progress on this bug?  How about some news?
Note Bug 136633.

Also note for historical purposes, Bug 40867, which shows that this problem was marked as a Blocker, and fixed back in 2002.

Jonas - can you take a look at this?
Assignee: file-handling → bugmail
The only way I can see to fix this is to serialize the DOM like "Web Page, complete" does, or at least use that as a fallback when the original source is not available in the cache.

I'm not excited about the idea of pinning the source in the cache using cache tokens. Even disregarding the fact that it won't work for too-large source files, it seems like a waste of memory when for 99% of the pages you go to you're not interested in getting to the source.
What's worse than re-getting the source is re-submitting form data. Maybe fix it in two steps (1: don't re-submit; 2: don't re-get) if it cannot be done in one step (have a smart cache). Or at least pop up a warning as "step 0" before harmful operations are carried out.
(In reply to comment #94)
> it seems like a waste of memory when for 99% of the pages you go
> to you're not interested in getting to the source.

You mean a waste of disk space, right?

(In reply to comment #95)
> What's worse than re-getting the source is resubmitting form-data.

But that's exactly the same thing, is it not?
Well, if we save to disk it's a waste of cycles, which are in short supply during page load.
? We already do that. It's called "disk cache". Most pages do not end up in the memory cache.
Given recent comments it doesn't look like there is an easy low-risk fix.   If that changes please re-nom.
Flags: blocking1.8.1+ → blocking1.8.1-
So is pinning in the cache and hoping it'll end up on disk really what we want to do?
Darin?
I think that we need to not rely on the disk cache for proper functionality here.  If the network cache cannot supply the document, then we should just serialize the DOM like IE does.
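Darin's fallback - serialize the live DOM when the cache cannot supply the original bytes - might look like this in outline. The helper is hypothetical, and the serialized output is not byte-identical to what the server sent, but it never touches the network:

```javascript
// Hypothetical helper: prefer the exact bytes from the cache; if they
// are unavailable, re-serialize the in-memory DOM (as "Web Page,
// complete" already does) rather than re-fetching from the server.
function htmlSourceForSave(cachedSource, documentElement) {
  if (cachedSource !== null)
    return cachedSource;  // exact bytes the server sent
  return "<!DOCTYPE html>\n" + documentElement.outerHTML;
}
```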
*** Bug 357084 has been marked as a duplicate of this bug. ***
Summary: Save Page As Web Page, HTML Only will attempt to resubmit cgi/reload (e.g. bank/financial/purchase/post transaction is re-attempted and may fail, or even worse succeed; page counter will increase) and try to save that rather than the cached or the displa… → Save Page As Web Page, HTML Only will attempt to resubmit cgi to save instead of using cached or displayed content
 http://127.0.0.1:1051/bug.cgi 

WHATEVER THIS IS, IT'S RUINING MY LIFE!!  PLEASE HELP!!  I ALSO GET A LITTLE SQUARE BOX WITH AN X IN IT.  NO CLUE AS TO WHAT ANY OF THIS MEANS...DO YOU????

THANK YOU,
CAROL
It is not JUST a case of using the cache rather than refreshing.  Using Firefox 2.0.0.3, I keep finding that if you do a 'Save As' on even the simplest web pages (even text-only ones), the software 'appears' to do the save and reports no error, yet none of the pages has in fact been saved.  So there is also a problem with using a temp folder, in that the existence/validity of that temp folder needs to be confirmed, or AT LEAST an error message should be given to the user that the page was NOT saved.  I've repeated this test over 200 times, and Firefox has not saved ANY of the pages, as HTML-only or even text-only, yet IE6 has consistently saved those pages.
Thinking outside the square, not being able to save a web page as viewed into a single file (like IE does with '.mht' files) remains the biggest single obstacle to wide use of Firefox 2.  Instead of going down a proprietary MHT route, it would be great to be able to save any HTML page as an ODT file, or if that is too hard, a PDF, as either of those is more portable/standardised than MHT.  And you wouldn't want the interactive code stored, but simply the current screen paint of such a request from a server, so that you could keep a simple permanent record of that database retrieval (airline booking or whatever) as it was presented to you at that time, rather than the code, which would never work the same again, as the server's data would be different if re-run.  In other words, you almost want what you get with a print-screen, except that of course it would be best to have the text as text, rather than pixels... but the goal is a perfect archive of what a user saw at that time...  THAT would beat the pants off what IE6 can do!!!
I agree that Firefox needs the ability to save a web page as shown on the screen, including all text typed in input fields, selection of radio buttons, etc.  This is often vital for use as a receipt of online transactions, and currently the only way to get this is to print the page (note that solutions that allow you to print to a PDF or PostScript file are common and need not require specific support from Firefox -- they're generally done as a virtual printer).

Problems with printing a page to use as a receipt include inability to copy and paste the text later, inability to see all the text contained in TEXTAREAs with scrollbars, issues with content getting cut off on the right side of the page, etc.  Therefore, my proposed solution for this (bug 293834) is to save as HTML but to have the form inputs filled in as currently shown on the screen.  I don't see any pressing need for a single-file archive like IE's .MHT -- an HTML file plus an accompanying directory of images and other support files generally works well.

However, I don't think further discussion of this should be held on this bug, which is focused on the problem of page requests getting resubmitted when saving.  (I do think it's good for those working on this bug to know that work on bug 293834 is underway, though, since they do relate.)
Duplicate of this bug: 351000
after six years still in Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9

very irritating and can compromise security in some situations
I just wrote a simple PHP script to test the scenario described in the initial comment of this issue, and the test showed that this bug doesn't exist (at least, in Firefox 3). Both saving and viewing source (a different bug) work as expected. 

I think this issue should be closed. If this bug still shows up, conditions must be different from what's described in the initial comment and should be described in more detail, and a new issue should be opened, in my opinion.

If somebody wants to test this, here's the code (to save you time):

<?php

// load counter data
$info = simplexml_load_file("counter.xml");
$counter = (int) $info->counter + 1;

echo "you are visitor #$counter";

// update the counter
$info->counter = $counter;

// save to file
file_put_contents("counter.xml", $info->asXML());
?>


counter.xml:

<?xml version="1.0" ?>
<root>
  <counter>11</counter>
</root>
If I open a page with 300 images, some of which do not exist, but I fully let the page load, then turn off my internet connection and hit 'Save Page As', my browser freezes completely, even the OS becomes nearly unresponsive, and many to-be-saved images and the HTML file are never saved anywhere.

(on XP & FF 3.03)
Assignee: jonas → nobody
QA Contact: ian → file-handling
When I save pages or images, which are visible on the screen, Firefox 3.0.4 re-downloads them.

Especially bad when I try to save confirmation pages after purchasing something with a credit card.  I have ceased using Firefox for all credit card transactions due to this bug, instead using any other browser all of which have no trouble saving the currently viewed page.
There are some reasons that a resource may not be retrieved from cache, or stored in the cache at all.  This includes a range of cache control directives that an HTTP 1.1 server may send.  A client needs to obey these to be compliant with the HTTP spec.  (there may be security implications to changing any of Firefox's behaviour in regard to this, which we should consider)

It is acceptable for the client to use a cached copy in situations such as pressing the back button or saving/printing the current page, even if the cache-control headers originally sent with that resource indicate that it is no longer fresh or should not be retrieved from the cache for any reason.  However, cache directives such as "must-revalidate" and "no-store", if I understand correctly, may need to be honoured even in these trivial situations.

For instance, a "no-store" directive sent by the server will cause the original version of the page to never enter the cache.  Viewing the source or saving such a resource would not be possible without resubmitting the request as the original response was not stored.  There may also be other reasons that something cannot be sent to the cache.

The original post in this bug refers to a "web counter" which increments as the result of a GET request.  Despite other comments above, the correct behaviour is not a security risk brought about by the browser, because a browser is supposed to be able to re-submit a non-POST request at any time, without notifying the user, and without adverse effects on a web application (in this case, the counter).

The fact that this web counter increments as a result of such a re-request, even a duplicate request coming from the same client, is a result of the way the counter is written and is not a bug in Firefox or the HTTP spec; it is correct behaviour that such a counter be incremented again if:
- The client needs to request the resource again, and
- The original response was uncacheable, and
- It's just a GET request, and 
- The counter increments the count value on repeat requests.

The example in comment #109 does not take into account that the server may be sending different cache-control directives and may even fall into a different zone (Firefox treats localhost differently to external addresses with regard to caching).

The design of such a web counter could be improved such that it tries to detect repeat requests, though as long as it updates without POST-ing a form, then it will be susceptible to this kind of thing.
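The improvement suggested above could be sketched roughly as follows. This is only a hypothetical illustration of detecting repeat requests, not the original CGI; the client identifier and all names are invented for the example:

```python
import time

# Hypothetical sketch: only count a hit if this client has not already
# been counted within a short window, so an immediate re-GET of the same
# page (e.g. triggered by Save As) does not bump the counter again.
class Counter:
    def __init__(self, window_seconds=60):
        self.count = 0
        self.window = window_seconds
        self.last_seen = {}  # client_id -> timestamp of last counted hit

    def hit(self, client_id, now=None):
        """Count a visit; repeat requests inside the window are ignored."""
        now = time.time() if now is None else now
        last = self.last_seen.get(client_id)
        if last is not None and now - last < self.window:
            return self.count  # repeat request: return count unchanged
        self.last_seen[client_id] = now
        self.count += 1
        return self.count

counter = Counter()
print(counter.hit("1.2.3.4", now=0))   # first visit -> 1
print(counter.hit("1.2.3.4", now=10))  # re-GET within window -> still 1
print(counter.hit("5.6.7.8", now=20))  # different client -> 2
```

As the comment notes, this only mitigates the symptom; any counter updated by a plain GET remains susceptible to re-requests in general.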

On the other hand, if there are ways in which we could improve Firefox's re-use of requests without disobeying the HTTP 1.1 spec or weakening the security of any online application that depends on current caching behaviour, then such an improvement would be good.  Particularly, if the browser were re-submitting POST requests under any circumstances without warning the user, then this is a problem that would need to be fixed.
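The distinction drawn in the comments above can be made concrete with a small decision function. This is a simplified sketch of the HTTP 1.1 reasoning for history-like operations (save/print of the page already on screen), not Firefox's actual cache code, and the function name is invented for the example:

```python
def may_reuse_for_save(cache_control_header, stored_in_cache):
    """Sketch: may a retained copy be reused when the user saves or
    prints the page they are already viewing?

    Staleness alone need not force a re-fetch for these history-like
    operations, but "no-store" means the response was never retained
    in the first place, so there is nothing to reuse.
    """
    directives = {d.strip().lower()
                  for d in cache_control_header.split(",") if d.strip()}
    if "no-store" in directives:
        return False  # nothing was ever stored to reuse
    if not stored_in_cache:
        return False  # not in the cache for some other reason
    # "no-cache" / "max-age=0" make the copy stale for *navigation*,
    # but a history-like operation may still show what was shown before.
    return True

print(may_reuse_for_save("no-store", False))           # False
print(may_reuse_for_save("no-cache, max-age=0", True)) # True
print(may_reuse_for_save("", True))                    # True
```

In the `no-store` case a save would have to fall back on something other than the network cache, such as serializing the DOM, which is what earlier comments propose.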
Thomas Rutter: "There are some reasons that a resource may not be retrieved from cache, or stored in the cache at all.  This includes a range of cache control directives that an HTTP 1.1 server may send.  A client needs to obey these to be compliant with the HTTP spec.  (there may be security implications to changing any of Firefox's behaviour in regard to this, which we should consider)"

Correct.

But the cache should have nothing to do with printing the currently viewed page/image, saving the currently viewed page/image, viewing the source of the currently viewed page, or any other similar functions.  The cache is relevant when a person wishes to go to a page a second time -- not the first time.  The page being viewed in the here and now is the page being viewed in the here and now, not some to-be-downloaded-again page, that would be a *different* page, not the one being viewed when the user hits Save As, Print, View Source, etc.

Similarly, when the Back button is hit, the user expects to go "back" to the page as it was when they were first viewing it, not having it reloaded from the network, which would actually be going forward to a *new* page.  This is all the more pronounced today, when most web pages are dynamically generated content and change when reloaded.  Why should a Back (or Forward) button not go to the page the person was on previously?  Is this not how everyone understands the functioning of Back/Forward?  It is how I have always understood it.

If I do any of the functions mentioned when I am using other browsers, such as Opera or ELinks (the two I use the most apart from Firefox), it will never reload the pages from the network -- the pages are already there being viewed.

This seems pretty clear and straight-forward to me and numerous others in the comments on these many interrelated bug reports.
(In reply to comment #113)


> Similarly, when the Back button is hit, the user expects to go "back" to the
> page as it was when they were first viewing it, not having it reloaded from the
> network, which would actually be going forward to a *new* page.  This is all
> the more pronounced today, when most web pages are dynamically generated
> content and change when reloaded.  Why should a Back (or Forward) button not go
> to the page the person was on previously?  Is this not how everyone understands
> the functioning of Back/Forward?  It is how I have always understood it.

In fact, that is also how the W3C or the IETF understand it--one of the
W3C or IETF specifications (probably the HTTP RFC) says that the Back
button is expected to show what was shown before, and _not_ re-retrieve
the page to show its _current_ content.
Well, "view source" is mainly used by web developers.  Usually, they want to see the downloaded source code, not the current serialized DOM - especially because the DOM may have been parsed badly from improperly formatted source.

However, for printing and saving, it makes much more sense to save the modified DOM.  For example, many times people "save" pages to show them to people, and when those pages utilize JavaScript - as is becoming more common - that doesn't really help anyone.  Printing is similar, in that you would want the current DOM printed - just with the correct CSS applied to it.

Even when a page is not cached, the current DOM should be available of course.  So this seems like a viable solution.

Still, there's a separate bug for View Source, and I think it's a completely separate issue.  If I want to see the current source, I would use "View Selection Source" or Firebug, DOMi, etc.

-[Unknown]
Firebug doesn't display the actual source, it displays the "DOM source".  There can be a significant difference between the two, as I have discovered more than a few times, to much pain and teeth-gnashing.

I noticed that some of the pre-1.0 bugs were mentioned recently.  It's a sad state of affairs, this *was* fixed in the days before 1.0, but was broken again shortly after 1.0 was released.

It's incredibly frustrating and painful that six years later this bug still exists, and that people have refused to pay attention to RFCs, security warnings and the like, and have supported the broken functionality we continue to see today.  This is a feature that worked in the earliest days of browser development and was (and is) a critical piece of functionality in some circles.  What will it take to get it fixed?
(In reply to comment #116)
> What will it take to get it fixed?

Making this widely known. If a popular publication shows the world that somebody using Firefox has paid twice for a product or something along these lines, Mozilla developers will jump on this. Their ego won't let them allow their product to be worse than all other available browsers in an essential area. They criticize MS for having IE violate some CSS standards, but they let Firefox have bugs that can cost people money.
Peter,

That is my point.  What Firebug shows is what you want to print, and what you want to save, but not what you want to see as View Source.  In other words, the same solution cannot be used on all three of these together.

(In reply to comment #117)
> Making this widely known. If a popular publication shows the world that
> somebody using Firefox has paid twice for a product or something along these
> lines, Mozilla developers will jump on this. Their ego won't let them allow
> their product to be worse than all other available browsers in an essential
> area. They criticize MS for having IE violate some CSS standards, but they let
> Firefox have bugs that can cost people money.

I am sorry.  I am a developer and I've made a few ecommerce-type sites myself.  If any site ever charges someone twice because they posted twice, that site is violating many public and well-established standards, and is not designed properly.  Even if this bug were fixed, and no browser had this flaw, it would be a huge and CRITICAL flaw in the site for a long list of reasons.

Should such a situation exist, I would firstly and rightly blame the site author, not Firefox.  Even so, Firefox should be fixed, but only because this affects other forms.

-[Unknown]
OK.  This bug as originally filed has been fixed forever, ever since the UI actually started passing LOAD_FROM_CACHE into the core code when saving.  Comment 109 confirms this.

If you are having an issue with saving things not hitting the cache, please take the following steps:

1) Make sure that your issue is reproducible with a current trunk nightly from
   2009-02-20 or later.  That means that you have the fix for bug 84106 in your
   build, as well as the new image cache that no longer stomps on the necko
   memory cache.
2) File a bug with a description of your problem, including step by step
   directions to reproduce.  Don't just say "save the image", for example; tell
   me which exact menu items you're using in which exact menu, or which
   keyboard shortcuts you're using.
3) cc me on the bug you file; it seems that some people have been marking bugs
   about save issues with cache duplicates of unrelated bugs, so if you don't
   do this there's no guarantee that anyone competent will actually see your
   bug.

This code has been more or less abandoned for the last 6 years or so, but that doesn't mean it can't get improved.  It does mean that there are several bugs on file which are so confused (due to covering a host of unrelated problems, some of which have been fixed for years) as to be completely unusable.  Let's do a good job of tracking the problems that remain, with one bug per problem, so that we can get them fixed, get testcases added for them, and make sure the bugs don't reappear in the future.

Thanks, all.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
(In reply to comment #114)

I wrote:

> In fact, that is also how the W3C or the IETF understand it--one of the
> W3C or IETF specifications (probably the HTTP RFC) says that the Back
> button is expected to show what was shown before, and _not_ re-retrieve
> the page to show its _current_ content.

Someone found what I was remembering partially:  RFC 2068 (HTTP 1.1), 
section 13.13 (History Lists).
(In reply to Boris Zbarsky [:bz] from comment #119)
> Let's do a good job of tracking the problems that
> remain, with one bug per problem, so that we can get them fixed, get
> testcases added for them, and make sure the bugs don't reappear in the
> future.

Is there a list of these follow-up bugs somewhere?
I'm not aware of anyone filing any.
Product: Core → Core Graveyard