Open Bug 125729 Opened 18 years ago Updated 3 years ago

When saving page, add source URL address as comment ("saved from ...") [Save As, Save Page As]

Categories

(Firefox :: File Handling, enhancement)


People

(Reporter: asensi2, Assigned: adamlock)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:0.9.8) Gecko/20020204
BuildID:    2002020406

Saving the page could write the page address in the file.

Surely it has happened to you: you have saved a page, and when you later read it you see a link pointing to file:///modules.php?op=modload&name=NS-lj-issues/issue95&file=5622s1
How do you retrieve that?

If the address were written in the page when saved, like Internet Explorer does, we wouldn't have those problems.

By the way, yours is a good work!
confirming.  To adam, but this sounds like it should be handled in the encoder....
Assignee: law → adamlock
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows NT → All
Hardware: PC → All
Reporter, can you rephrase the problem in a reproducible manner?
Target Milestone: --- → Future
This is not marked as a bug in Mozilla, but as a needed enhancement. 

The problem to solve is this: when you save a page and later open it, you cannot follow the links that are relative (most of them). There is a very easy solution: record the original URL in the page.

For example: When the content of the page is saved, a line could be added in the source:
     <!-- http://www.domain.com/sub/page.html -->
or maybe at the beginning of the page
     This page was saved from: <a href="http://www.domain.com/sub/page.html">http://www.domain.com/sub/page.html</a>

and it would be done. Very easy and useful, no more problems of this kind.

Greetings
I'm confused; isn't there a base tag?
Note to self - probably have to call nsIDOMNSDocument::GetLocation to obtain the
document's original location (url), and then add a comment node inside the head
element. Would work much like nsWebBrowserPersist::SetDocumentBase does now.
Insertion should only happen when saving to file://.
*** Bug 145581 has been marked as a duplicate of this bug. ***
*** Bug 31375 has been marked as a duplicate of this bug. ***
*** Bug 154646 has been marked as a duplicate of this bug. ***
*** Bug 158580 has been marked as a duplicate of this bug. ***
I have the feeling that bugs marked as duplicates of this one do not get scoured
for the value of their commentary, so I'm copying the commentary from "my"
duplicate bug since this is the one that's getting attention...

-------------

In BeOS, it was possible to tag each downloaded file with its point of origin.
That was a feature of the filesystem and the local philosophy, and I think it's
something to borrow elsewhere.

Microsoft's Internet exploder prepends a comment to downloaded web pages
indicating where they came from...  the comment looks like this:

<!-- saved from
url=(0135)http://www.google.com/search?q=cache:www.wadada.net/alwarda/eliot/BurntNorton.htm+round++corner+deception+thrush+garlic+sapphires&hl=en
-->

I think that's great, so far as it goes...  what i generally do with that
comment is convert it into a visible citation with a perl script...  what I'd like
from mozilla (which i use increasingly now that it can save all the images along
with saved html) is the ability to choose (via preferences) between 

   1. the invisible IE-like comment-based origin tag 
      (please add a date stamp as well) 

   2. something more visible...  like google's cache headers...
      (this is also good advertising for mozilla, i think)

   3. MOZILLA {C,SH}OULD BE ABLE TO DETECT AND EXPOSE (AS GOOGLE-ISH HEADERS) 
      THE COMMENT-BASED ORIGIN TAGS OF BOTH IE AND MOZILLA IN PAGES THAT DO 
      NOT MAKE THEM VISIBLE.

the invisible annotation should be the default...

and the ability to expose mirroring comments (in both saved pages and
email-forwarded "send page" attachments) should default to a semi-silent pop-up
menu option ("go to original site of this page")

-- stig

It would be nice to be able to disable this proposed feature though --
sometimes, I would want the file saved exactly as it is on the remote server,
without any changes. (Probably by doing this as "save page as -> HTML only")
If you want pages exactly as they are on the remote server, something simpler
like wget(1) is "The Right Thing"...  As is, mozilla is currently saving all
images in a _files directory alongside the html folder and all of the image tags
get modified to point to the local copies.

I would prefer something like storing the page and its images in a .htm.d
directory where the source file retains its name but is linked to index.html
(and/or whatever file windows' file browser would need for that folder to
display as a web page...you'd still be able to access the directory with
right-click-->Explore)
*** Bug 173983 has been marked as a duplicate of this bug. ***
QA Contact: sairuh → petersen
*** Bug 198825 has been marked as a duplicate of this bug. ***
*** Bug 206258 has been marked as a duplicate of this bug. ***
*** Bug 215189 has been marked as a duplicate of this bug. ***
*** Bug 216863 has been marked as a duplicate of this bug. ***
Wouldn't the text like

<!-- Original page location:
http://bugzilla.mozilla.org/show_bug.cgi?id=125729
-->

be enough? It would not change the view of the page, but is easily retrievable.
*** Bug 224675 has been marked as a duplicate of this bug. ***
Could someone please change the summary to catch more of the search terms? Something
like "Saving the page (Save Page As) could write/add/append the page
address/URL/location in the file as HTML comment".

See all the dupes.
Updated summary as per comment 20.

Old summary "Saving the page could write the page address in the file"
Summary: Saving the page could write the page address in the file → Saving web page (Save Page As) should write/add/append URL in the file as comment
Fixing Summary to add more searchable keywords ... I couldn't find this bug either.

Was:  Saving the page could write the page address in the file
Now:  When save page, add source URL address as comment [Save As, Save Page As]
Summary: Saving web page (Save Page As) should write/add/append URL in the file as comment → When save page, add source URL address as comment [Save As, Save Page As]
*** Bug 237465 has been marked as a duplicate of this bug. ***
Sorry if this is spam; 

I was thinking something like this would be as good as IE's implementation of
this + some; if we were to save this before the <html> tags (i.e. absolute top
of a page) :

<!-- Saved from (date) : http://url -->

Perhaps more useful with the date as well.
We should save pages with URL comment in genuine MSIE-compatible form, e.g.

<!-- saved from url=(0050)http://bugzilla.mozilla.org/show_bug.cgi?id=125729 -->

in order to support MSIE-compatible third-party parsers for saved webpages.

Having four-digit character count preceding the URL is really handy, since
there's no other string separator (like "-quotes).

Though I understand that primary target is to keep this comment human-readable.
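
For illustration, here is a minimal Python sketch of the MSIE-style comment discussed above. It assumes the four-digit prefix is simply the URL's length in characters, zero-padded, which matches the examples quoted in this bug (0050 and 0032). The function names are hypothetical; this is not Mozilla or IE code.

```python
import re
from typing import Optional

# Matches the start of an IE-style origin comment and captures the
# four-digit length prefix, e.g. "<!-- saved from url=(0050)".
_SAVED_FROM = re.compile(r"<!--\s*saved from\s+url=\((\d{4})\)")


def build_saved_from_comment(url: str) -> str:
    """Return an IE-style 'saved from' comment for the given URL.

    Assumption: the count is the number of characters in the URL.
    """
    return f"<!-- saved from url=({len(url):04d}){url} -->"


def parse_saved_from_comment(html: str) -> Optional[str]:
    """Return the original URL from a saved page, or None if absent.

    The URL is read using the length prefix, so no closing delimiter
    (such as quotes) is needed.
    """
    match = _SAVED_FROM.search(html)
    if match is None:
        return None
    length = int(match.group(1))
    start = match.end()
    return html[start:start + length]
```

Because the parser relies on the character count rather than on a terminator, a URL containing spaces or other awkward characters can still be read back, which is the advantage of the count mentioned above.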
*** Bug 251894 has been marked as a duplicate of this bug. ***
>
> *** Bug 251894 has been marked as a duplicate of this bug. ***
>

This duplicate bug suggests following EXACTLY the COMMENT STYLE used by IE 
because WindowsXP SP2 includes a security feature that forces a locally saved 
HTML page to be restricted to the untrusted Internet Zone if it includes 
EXACTLY a comment of the format:

<!-- saved from url=(0032)http://geocities.yahoo.com/home/ -->

See also:
"Changes to Local Machine Zone for Windows XP Service Pack 2" (scroll down)
http://msdn.microsoft.com/workshop/security/szone/overview/overview.asp

The advantage is that local HTML pages can be downloaded and viewed from the
local computer but are still restricted to the internet zone and therefore cannot
request pages outside their domain (via frames, XMLHttpRequest, etc.) and cannot
execute local applications (via redirection through "file://", Directory
Traversal, etc.).
*** Bug 253784 has been marked as a duplicate of this bug. ***
*** Bug 254131 has been marked as a duplicate of this bug. ***
Bug 267369 is about putting the source URL in additional file properties like
"Summary" on NTFS5 (the info is stored in an alternate data stream named
?SummaryInformation or ?DocumentSummaryInformation [the question mark represents
an unprintable character, I guess 0x05]) and not in the file itself.
*** Bug 277755 has been marked as a duplicate of this bug. ***
*** Bug 280279 has been marked as a duplicate of this bug. ***
Summary: When save page, add source URL address as comment [Save As, Save Page As] → When saving page, add source URL address as comment ("saved from ...") [Save As, Save Page As]
There are several comments here about including the
<!-- saved from url=(####)webPage -->
line upon a Save As, and about how people would like to see increased utility
from Save As.

What increased utility means to me is that if I do Save As from Firefox, I can
turn around and open up the saved file in IE and see pretty much the same thing.
 But in most of the pages I save (e.g. The tree view of usenet posts found on
google's groups-beta site such as
http://groups-beta.google.com/group/microsoft.public.scripting.vbscript/browse_frm/thread/7c7dac6063446556
) this does not happen because there is some javascript involved (or other page
component) and of course when I bring up the saved file, the browser is lost.

The simple fix that I've been using is to insert, right after <head> the following:
<base href="original url" author="Csaba"> and so far this has worked very well
for me.  But it's really a pain in the neck having to copy the url, going to the
saved file, opening it up, writing in the new line, then saving it, then closing
the editor, then returning to the browser.

It would be great if I had this functionality as a Save As option (Save with
Base info)

Csaba Gabor from Vienna
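
The manual fix described above could be automated along these lines. This is only a rough Python sketch that post-processes an already saved file; the function name is hypothetical, and it uses a simple regular expression rather than a real HTML parser, so it assumes a conventional `<head>` tag is present. It is meant for "HTML only" saves; a base tag would also redirect relative links to any locally saved resources.

```python
import re
from html import escape

# Matches the opening <head> tag, with or without attributes.
# "<header>" is not matched because \b requires a word boundary after "head".
_HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def add_base_href(html: str, original_url: str) -> str:
    """Insert <base href="..."> right after the opening <head> tag.

    Returns the input unchanged if no <head> tag is found.
    """
    match = _HEAD_OPEN.search(html)
    if match is None:
        return html
    # Escape &, <, > and quotes so the URL is a valid attribute value.
    tag = f'<base href="{escape(original_url, quote=True)}">'
    return html[:match.end()] + "\n" + tag + html[match.end():]
```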
I totally agree with Csaba that this feature should be implemented asap, 
please. It is such an annoyance to know that this is done in IE without any 
problems, yet is not being done in FF! This is one of those features that, when 
you need it, you need it BAAAAAD. :)

Thanks, people.
I agree with Csaba Gabor from Vienna.  I, too, had to open the file saved using
FF and add the URL right then and there!  Although I eventually have to edit
IE-saved ones, IE saves me a heck of a lot of time and effort by having the
original URL handy.  See, when I'm doing research, I save pages to a folder. 
It's only when I go back and look through the info, sometimes as much as weeks
and months after, that I decide which pages to keep and which to discard.  I
usu. end up with only a handful of kept pages out of all those initially saved!
   I then format these and make that URL that IE writes to the file a clickable
link.  I can then at any time in future check the current info online by
clicking on that source link.  But with FF, I am forced to save the URL for
every page saved whether or not I ultimately decide to keep it.  Months ago I
gave up on FF when I'm saving pages and was just plain forced back to IE while
surfing and saving.  Really defeats the purpose ... Hope this feature happens
soon.  I see by the comments that this request is a few years old!
*** Bug 315915 has been marked as a duplicate of this bug. ***
> If you want pages exactly as they are on the remote server, something 
> simpler like wget(1) is "The Right Thing"...  

How is wget going to help when the page you want to save is the result of 
posting a form?

Mozilla(/Firefox/Seamonkey) should be able to save the downloaded entity 
to a file WITHOUT corrupting it by adding something when writing it to a 
file.

If Mozilla lets you specify to add a source-URL comment, it must also let 
you easily specify to NOT add the comment.



i think some people here are confusing the firefox gtk2 linux file save dialog with the gecko save capabilities.

if you choose save as web page, html only on windows, you get the unadulterated page. note that this includes absolutely no fixup.

personally for testcases, i always save both html and complete, so that i can undo dynamic content munging but still have all the resources i want.

people who are not mozilla developers really shouldn't be asserting that doing work on mozilla (especially anywhere near file handling) is easy.

someone asked me if there should be a localized timestamp. for the record, i said no as many web browsers (especially mozilla) do not have the ability to parse all known localized timestamps.
i should note that if i expand the gtk2 file picker i have here, i can pick the other file formats....
*** Bug 363559 has been marked as a duplicate of this bug. ***
Has anyone thought of putting original URL in the meta tag?
(In reply to comment #41)
> Has anyone thought of putting original URL in the meta tag?

See Comment 33.
Note that it may be the case that <base href=...> already exists, in which case the insertion of the original URL would not serve a functional purpose, and could be accomplished in a few different ways.

Also, in case a developer does proceed along the lines of Bug 125729#33 one should take care of repeated saves.  In particular, suppose the file is saved from the internet to A.htm locally.  Then it is opened again and resaved as A.htm.  Then the process is repeated.  There should only be a finite amount of growth from this process.
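
One way to keep repeated saves bounded, sketched in Python under the assumption that the origin is recorded as an IE-style "saved from" comment at the top of the file: if such a comment is already present, leave the file alone, so the first recorded origin survives and the file does not grow on resave. The function name is hypothetical.

```python
import re

# Detects an existing IE-style origin comment anywhere in the document.
_SAVED_FROM = re.compile(r"<!--\s*saved from\s+url=\(\d{4}\)")


def stamp_origin(html: str, url: str) -> str:
    """Prepend an origin comment unless the page already carries one.

    Saving A.htm from the web and then resaving the local copy any
    number of times yields the same bytes after the first save.
    """
    if _SAVED_FROM.search(html):
        return html  # keep the original origin; add nothing
    return f"<!-- saved from url=({len(url):04d}){url} -->\n{html}"
```

Keeping the first comment rather than replacing it means a resave of the local copy does not overwrite the web address with a file:// one.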
Duplicate of this bug: 372595
Until this bug is fixed, the SaveWithUrl extension (for Firefox) can be used as a workaround - https://nic-nac-project.de/~kaosmos/savewithurl-en.html
(In reply to comment #4)
> I'm confused; isn't there a base tag?

The base tag is deliberately commented out by Firefox when saving a complete page; if it were not, relative addresses to files in the local ..._files directory would be resolved against the base URL.
QA Contact: chrispetersen → file-handling
Sad that this issue has still not been resolved.
There is an obvious need for this functionality.
And once again, the latest update of FF (now at 27.0) has broken the addon that provides a workaround for this "BUG".
The addon, Savewithurl at https://freeshell.de/~kaosmos/savewithurl-en.html , once again if left enabled, will prevent any webpage from being saved.
SHEEESH...
The addon Savewithurl at https://freeshell.de/~kaosmos/savewithurl-en.html , was working fine through several FF updates.
But once again the latest FF update - 36.0 - has broken Savewithurl.
FF 36.0 prevents the full functionality of Savewithurl -- works only for "Complete" saving.
"HTML only" results in false saving.
(In reply to dbkh999 from comment #47)
> The addon Savewithurl at https://freeshell.de/~kaosmos/savewithurl-en.html ,
> was working fine through several FF updates.
> But once again the latest FF update - 36.0 - has broken Savewithurl.
> FF 36.0 prevents the full functionality of Savewithurl -- works only for
> "Complete" saving.
> "HTML only" results in false saving.

That's true; and people don't get warned. If this capability was already in Firefox, like it was in Internet Explorer, people wouldn't have all the problems quoted on this page.
The 0.3 version of SaveWithUrl should have fixed all the compatibility problems.
It has some changes from the previous ones; please read the details here: https://freeshell.de/~kaosmos/savewithurl-en.html
Product: Core → Firefox
Target Milestone: Future → ---
Version: Trunk → unspecified
With the expiration of not only the SaveWithURL addon but both of the alternatives I could find, this functionality needs to be baseline in Firefox now or it doesn't exist at all in any accessible form.

People have been asking for this functionality for SIXTEEN YEARS.  It's neither a difficult nor complicated nor time-consuming addition.  Its practicality cannot be questioned.  Until now its absence was mitigated by the existence of the aforementioned XUL addons, but now that the executive decision was made to eliminate that framework entirely, that mitigation is also eliminated.  There are no new addons that replace this functionality.

Just do it already!
If anybody here just wants to save the original URLs of saved pages, try the "scrapbook" extension. I am now using "scrapbook x", a fork. Scrapbook has many other good features compared to built-in saving. For example, you don't have to manually type a unique addition to the file name every time a page with the same title is saved. You can save with styles but without images.
Thanks for the summary.  
Ignore the obvious practical usability of a feature long enough and it's like an accepted bug with its tedious workarounds.  
I finally just gave up and started going through the often multi-step process of opening this or that save location and dragging the URL link there.

And thanks for the scrapbook note... It would be nice if I could just use it to replace the save without having to use a special location structure for the saved file.