The default bug view has changed. See this FAQ.

CSS not fixed up by webbrowserpersist ("save page as, complete" omits background images)

NEW
Unassigned

Status

()

Firefox
File Handling
--
minor
16 years ago
3 months ago

People

(Reporter: Dominik Dröscher, Unassigned, Mentored)

Tracking

(Blocks: 1 bug, {helpwanted, smoketest, testcase})

unspecified
helpwanted, smoketest, testcase
Points:
---
Dependency tree / graph
Bug Flags:
blocking-aviary1.5 -
in-testsuite ?
in-litmus +

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [Halloween2011Bug][lang=c++])

Attachments

(8 attachments, 3 obsolete attachments)

(Reporter)

Description

16 years ago
Save complete Webpage works great but not 100% correct. Try to download this page:
http://phpbb.sourceforge.net/phpBB2/viewforum.php?f=1

It seems to forget the table background images which are called in the CSS header.
To ben.  This has been mentioned in the fora on mozillazine.org too.
Assignee: asa → ben
Status: UNCONFIRMED → NEW
Ever confirmed: true
This applies to non-css backgrounds too.
Component: Browser-General → File Handling
QA Contact: doronr → sairuh
Summary: CSS table/cell backgrounds not saved with save complete → table/cell backgrounds not saved with save complete
*** Bug 115532 has been marked as a duplicate of this bug. ***

Comment 4

16 years ago
see bug 115532 for more in the straight html for background tags.

Comment 5

16 years ago
The webbrowserpersist object doesn't save anything from the CSS whether inline
or not.

I don't know if it is possible to walk through, fixup and generate externally
linked and inline CSS (with the minimum of effort), but it's something I will
take for the time being.
Assignee: ben → adamlock
Summary: table/cell backgrounds not saved with save complete → CSS not fixed up by webbrowserpersist

Comment 6

16 years ago
Adding "(background images not saved)" to summary.
Summary: CSS not fixed up by webbrowserpersist → CSS not fixed up by webbrowserpersist (background images not saved)

Comment 7

16 years ago
*** Bug 116660 has been marked as a duplicate of this bug. ***

Comment 8

16 years ago
Seen this on Linux 2001-12-20 (Slackware 8) changing OS to All
OS: Windows 2000 → All
Hardware: PC → All

Comment 9

16 years ago
This is regarding the <td background=''> attribute.

I have a page I saved, from www.stomped.com as of today. using 12-27 build.. the
source has a <td background="images/trans.gif">  and Mozilla renders that
correctly.  Now doing 'save page as', doesn't create a subdirectory of images
for example 'thedefaultsavedir'/www.stomped.com_files/images, and it doesn't
parse this tag attribute to retrieve the image: 'trans.gif' and save it to the:
'/www.stomped.com_files/images' subdirectory.   'Save page as' page source,
still outputs the default source tag as above, and when trying to access the
image upon viewing using open file > context menu > view background image, the
alert box shows it trying to access just 'thedefaultsavedir'/images/trans.gif.

There is also no doc type declared on this page.
Keywords: nsbeta1

Updated

15 years ago
Target Milestone: --- → Future

Comment 10

15 years ago
just spoke w/ sarah about this. minusing.
Keywords: nsbeta1 → nsbeta1-
*** Bug 120859 has been marked as a duplicate of this bug. ***

Comment 12

15 years ago
*** Bug 126307 has been marked as a duplicate of this bug. ***
Blocks: 126309
*** Bug 128843 has been marked as a duplicate of this bug. ***

Comment 14

15 years ago
I suspect there may actually be two issues at work here; I've made a few
testcases to illustrate my point.

- Pages with <td background="foo"> aren't saved completely. 
http://pantheon.yale.edu/~al262/mozilla/115107/tables/index.html
- Pages with foo{background-image:url(bar);} in the CSS aren't saved completely,
even if foo != td.   http://pantheon.yale.edu/~al262/mozilla/115107/css/index.html
- However, pages with simple <body background="foo"> tags work fine. 
http://pantheon.yale.edu/~al262/mozilla/115107/plain/index.html

Or maybe I'm missing something.  Regardless, going to zip up the whole lot and
attach it.

Comment 15

15 years ago
Created attachment 72701 [details]
Test cases possibly showing independence of CSS bug and table bug

Testcases above, packaged up (brown paper packages tied up in string / these
are a few of my favorite things...)
Andrew, the <td background="foo"> thing should be a separate bug.  See
http://lxr.mozilla.org/seamonkey/source/embedding/components/webbrowserpersist/src/nsWebBrowserPersist.cpp#1807
-- we basically never check for background attrs on <td> nodes.  That should be
simple to fix.  If we don't have a bug on that already, file one on me.

Comment 17

15 years ago
Boris, at the moment I don't read C++, so I'm going to take your word for it. 
There was a bug (Bug 115532) that seems to address your issue but it was marked
as a duplicate of this one (see this bug 115107 comment 3).

Comment 18

15 years ago
Neither TABLE nor TD elements have a background attribute (in HTML 4.0) which is
why it isn't fixed up. I could add fixup for such quirks if people feel the
practice is prevalent enough.

As for inline/external styles with url declarations such as this:

  BODY {
    color: black;
    background: url(http://foo.com/texture.gif);
  }

I don't believe there is much that can be done until we have a way to serialize
CSS from it's in-memory representation (DOM).
Adam, we _do_ have such a way.  The CSSOM should allow you to walk the
document.styleSheets array, walk the .cssRules array for each sheet, check
whether a rule has a background image set in it, change the URL if so, save the
.cssText for each rule,  recurse down @import rules, etc.  I can't actually
think of any bugs we have blocking that sequence of actions....  If you _do_ run
into bugs in that code that block this, please let me know.

Comment 20

15 years ago
Boris, do you know where the CSS serialization code lives? It's not being done
the way other DOMs are serialized via the content encoder.
Heh.  There is no pre-build serializer, yet.  Each CSSRule object has a
GetCssText() method that returns an nsAString holding a serialization of the
rule (per DOM2 Style).  The CSSOM-walking has to get done by hand,
unfortunately.... 

Comment 22

15 years ago
*** Bug 133725 has been marked as a duplicate of this bug. ***

Comment 23

15 years ago
*** Bug 157708 has been marked as a duplicate of this bug. ***

Updated

15 years ago
Keywords: testcase

Comment 24

15 years ago
I have seen this bug as well. In my case, the CSS for a page contains:

body {
        background-color: #ffffff;
        margin: 1em 1em 1em 40px;
        font-family: sans-serif;
        color: black;
        background-image: url(/images/logo-left.gif);
        background-position: top left;
        background-attachment: fixed;
        background-repeat: no-repeat;
}

The background image (/images/logo-left.gif) is not saved, when it clearly
ought to be.

Rich.
QA Contact: sairuh → petersen

Comment 25

15 years ago
Not only does the @import css not get saved to disk, when you later try to open
the saved html while the original server is unavailable, it takes forever to
load the local file while Mozilla waits for the css that never comes.

Comment 26

14 years ago
*** Bug 187590 has been marked as a duplicate of this bug. ***

Comment 27

14 years ago
Created attachment 121701 [details] [diff] [review]
Work in progress

Patch is work in progress but it attempts to fix up style properties that
typically contain an url(), e.g. background-image. I have most of the rule
searching and url extraction / fixup down but I have to clean up the node
cloning function.

I also have a horrible feeling that just asking a node for it's inline syle
drags a bunch of -moz styles into existence even if they weren't there to start
with. This means you get a mess of extra styles in the output document. I'll
have to examine this issue a bit more.
So this is based on a totally drive-by skim of the patch:

1)  NS_ARRAY_LENGTH is a nice macro.  ;)
2)  "content" and "cursor" can have URI values
3)  The parser you wrote will fail to parse something like:

     content: "url(foo)" url(foo);

    correctly.  You're right that GetPropertyCSSValue would be nice here...
4)  "foo: bar" in CSS is a declaration, not a rule (the "// Test if the inline
    style contains rules that" comment)
5)  The code for setting the URL value will not work for "content" and "cursor"
    because they can include things other than the URL.
6)  I'm not sure how putting @import in the same list as property names will
    work -- the two don't even live in the same places...

Comment 29

14 years ago
Created attachment 122047 [details] [diff] [review]
Work in progress 2

Patch addresses most of the issues but is still work in progress. Uses
NS_ARRAY_LENGTH, adds support for various "cursor-*" props but not "content",
skips quoted text in values, fixes the broken comment, removes @import.

Patch actually fixes up content now, but is busted in a couple of ways:

1. The call to cssDeclOut->SetProperty(propName, newValue, propPriority) during
fixup results in a mutant URL caused by the CSS resolving the supplied relative
url into an absolute URL using the base address of the original location. So
"url(bg.gif)" becomes "url(http://foo.com/original_path/local_path/bg.gif)".
I'll have to figure out how it is getting its base address and fix it.

2. Just touching the .style property on an element results in all kinds of
nasty extra styles to leap into existence. For example:

  <body style="background: yellow url(bg.gif)">

Becomes:

  <body style="background: yellow url(126309_files/bg.gif) repeat scroll 0%; 
    -moz-background-clip: initial; -moz-background-inline-policy: initial;
    -moz-background-origin: initial;">
  
Finally, the patch will have to be able to extract and fixup multiple urls that
the likes of "content" could contain. This is likely to complicate string
parsing matters a lot.
Attachment #121701 - Attachment is obsolete: true
> I'll have to figure out how it is getting its base address and fix it.

For inline style, this is coming from GetBaseURL on the document object.  For
style in stylesheets, this is the stylesheet's URI, which the sheet stores. 
It's set at creation time via nsICSSStyleSheet::Init().

"background" is a shorthand property.  So it will set all sorts of stuff.  As
long as you only work with "background-image" you don't really have to worry
about it...

The cursor property is "cursor".  Those things you have in the list are just
possible values.  The problem is, you can have something like:

  cursor: url(foo) crosshair;

Are there useful changes that could be made to the style system APIs that would
make this easier?  Should some of those changes become part of DOM2 CSS?
Good lord. You really don't want to be doing actual parsing of CSS here. We
already have one CSS parser, having two just means that we'll have twice as many
bugs (probably more than twice, since this code won't be tested much).

Does it deal with CSS escapes? CSS comments? etc.

Surely there's a better way to do this.
It sorta deals with CSS escapes (probably to the extent that it's needed in this
code).  Comments are a non-issue, since the only CSS we have to parse are
property values gotten via getPropertyValue().  We're basically parsing
canonicalized values here...

But yes, it would be nice if we could ask the CSS decl for that info, since it
already has it all broken down and has to trouble itself to produce CSS just so
we can reparse it....
I thought we already could? How does "view background image" work?
That gets the computed style, not the declared style.  Computed style supports
getPropertyCSSValue() to a certain extent (not live, but better than nothing).

Note that we don't implement computed "content" either... ;)  
Well in that case I would recommend writing a quick internal hack of an API. In
fact probably the simplest API would be on the stylesheet or rule interfaces,
which just serialises the entire stylesheet or rule using a new base URI.

The reason I say that, as opposed to "implement DOM2 CSS", is that DOM2 CSS is
likely to be junked in favour of a more sensible API.
Yes, I realize that.  That's why we haven't implemented it.  ;)

Serializing with a new base URI is nontrivial, since we only store absolute
URIs.  We would need to duplicate some fun code that webbrowserpersist already
has.  Further, for inline style there is no "sheet" -- there is just a
CSSDeclaration, and that implements no useful interfaces
(nsIDOMCSSStyleDeclaration/nsIDOMCSS2Properties are the closest it gets).

I suppose we could add another interface that you could QI these beasties to...
as long as we're careful, it should not be too expensive to do that...

Adam, I assume you're trying to fix this for 1.4?

Comment 37

14 years ago
I'll fix it for 1.4 if I can get it working reasonably at least for the CSS 1
stuff. I don't care so much about CSS 2 as background images are the most
visible breakage this patch tries to fix.

As I mentioned, there are a few issues to be sorted out before I would call it
ready for review. I'd like to know why these extra styles suddenly materialised
for example. I can probably live with forcing an absolute uri into the fixed up
style to make it work for now.
Comment on attachment 122047 [details] [diff] [review]
Work in progress 2

>+    "cursor-crosshair",
>+    "cursor-default",
>+    "cursor-pointer",
>+    "cursor-move",
>+    "cursor-e-resize",
>+    "cursor-ne-resize",
>+    "cursor-nw-resize",
>+    "cursor-n-resize",
>+    "cursor-se-resize",
>+    "cursor-sw-resize",
>+    "cursor-s-resize",
>+    "cursor-w-resize",
>+    "cursor-text",
>+    "cursor-wait",
>+    "cursor-help",

I've never heard of any of these.
Adam, those properties are just set by the "background" shorthand... Or are you
saying that something in your patch is making them appear? (I see nothing there
that should cause that to happen....)

Comment 40

14 years ago
I was confused about the cursor thing. I was taking the CSS2 spec to mean there
were cursor, cursor-wait, cursor-pointer etc. properties just as there is a
background and a more specific background-image. I can cut those too for the
time being.

I was trying to avoid having to deal with lines like this:

cursor: url(foo.gif), url(foo2.gif), url(foo3.gif), text;

As for those other properties, I still don't understand why they suddenly appear
like they do. They are not in the original source so why should they just
materialise when I ask for the style object? If the styles are harmless then
okay, but I suspect if I checked in a fix which has this behaviour, I'd soon see
another bug open complaining about them.
Don't forget -moz-binding: url(...); either, while you're at it.


Regarding the property explosion. Assuming the original stylesheet reads:

   background: green;

...then it is exactly equivalent to:

   barkground-color: green; background-image: none; background-position: 0% 0%;
   background-attachment: scroll; background-repeat: repeat;
   -moz-background-clip: initial; -moz-background-origin: initial;
   /* and a few more that i've forgotten */
Adam, those properties appear when you serialize the style attr.  Your patch has
nothing to do with that....

Comment 43

14 years ago
I know they have nothing to do with my patch but expect another bug nonetheless.
If someone publishes a page with an inline style and sees these new things
appear, they're not going to understand why.

I'm not sure I do either. I can appreciate that saying "background:
url(foo.gif)" might internally imply some other styles but I am not sure that
they should be serialized when they were not explicitly defined in the first place.

I can raise another bug on that issue and revise my patch to concentrate on
background images for the time being.

Comment 44

14 years ago
Boris Zbarsky in comment 36 says:
>Serializing with a new base URI is nontrivial, since we only store absolute
>URIs.  We would need to duplicate some fun code that webbrowserpersist already
>has.

Which "fun code" in webbrowserpersist would you need to duplicate?  If you want
to get a relative url from an absolute url (relative to another location), you
can do so with nsIURL's getRelativeSpec:
  AUTF8String getRelativeSpec(in nsIURI aURIToCompare)
Does that help?

Updated

14 years ago
Blocks: 167262

Comment 45

14 years ago
*** Bug 206401 has been marked as a duplicate of this bug. ***

Comment 46

14 years ago
Encountering the bug when trying to save the URL for bug 216573.
*** Bug 221532 has been marked as a duplicate of this bug. ***

Comment 48

14 years ago
Is there a fix-for-version for this yet ? 
With increasing use of CSS (the latest Dreamweaver makes much better use of it)
it is going to bite more and more people !

I think it should be a more major issue, as potentialy navigation elements etc.
on webpages may vanish for no good reason.
See the target milestone.  No one is working on this right now or will in the
immediate future.  There is a work-in-progress patch here, and now that the
style system has been changed to also store non-absolute URIs things should be a
bit easier (issue 1 in comment 29 should not be present anymore).  So someone
who cares about this bug needs to pick it up and make it work.

I'd say that a first cut could skip cursor and content and just do @import,
backgrounds, and -moz-binding...
Keywords: helpwanted
> No one is working on this right now or will in the immediate future.

Hey Boris... Where did you get that... This is one of the bugs I want to
see fixed for Nvu. Please don't reassign yet, I have plenty of stuff on
my plate right now.
glazou, I got it from earlier comments from Adam (the assignee), from the target
milestone, and from general knowledge of what I know people are working on. 
Since the details what you're doing has been shrouded in secrecy and since I
can't read minds (well, not ones that far away, at least), I hardly had a way of
knowing you were thinking of working on this... ;)

Comment 52

14 years ago
*** Bug 224801 has been marked as a duplicate of this bug. ***

Comment 53

14 years ago
I'm nominating this for 1.6 because I think saving web pages is a fairly common
task and as CSS becomes more prominent, we're seeing more and more pages that
don't save completely. dbaron thinks this might be more 1.7alpha material but it
does cause one of our smoketests, B.27, (and I suspect an increasingly
representative smoketest) to partially fail. 
Flags: blocking1.6b?
Keywords: smoketest
The way to fix this is to implement saving web pages the way that MacIE does it 
-- saving the files unchanged, annotated with their original URI, so that links 
between dependent resources still work, even if they are done via obscure ways 
like built from JavaScript (which we could never solve otherwise).

Comment 55

14 years ago
Jump ball. Who wants to try to tackle this? Glazman, can you take this?

Comment 56

14 years ago
*** Bug 226925 has been marked as a duplicate of this bug. ***

Comment 57

14 years ago
doesn't look like this is happening for 1.6
Flags: blocking1.6b? → blocking1.6b-
Blocks: 115634
http://www.alistapart.com/articles/customunderlines/

Yet another good testcase for this bug, and for bug 126309.

The whole page is broken in Firebird 0.7, Firefox 0.8. (But I am not familiar
with Mozilla 1.6, so you'll have to test it there for yourselves. Sorry.)

Comment 59

13 years ago
http://www.alistapart.com/articles/customunderlines/ WFM
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040207 Firefox/0.8
Tom Chiverton, does http://www.alistapart.com/articles/customunderlines/ work
for you even after being saved (File, Save page as, Web page, complete) and then
opened from local drive?

Comment 61

13 years ago
Sergey- No, sorry, doesn't work after save.
My bad - I thought this bug was just about on-line pages, not saved ones too.

FWIW, it looks like the saved page hasn't had a style sheet applied, which I
guess fits with what seems to be wrong with Mozilla 'fixing' URLs when it saves
pages.

Is there any chance of this bug being assigned a non-future milestone, or at
least having severity increased - this bug is going to bite a lot more people as
other comments have indicated, as saving is fairly common.

Comment 62

13 years ago
*** Bug 245799 has been marked as a duplicate of this bug. ***

Comment 63

13 years ago
(In reply to comment #62)
> *** Bug 245799 has been marked as a duplicate of this bug. ***
(sorry, searched but couldn't find this report
from my report:)

Reproducible: Always
Steps to Reproduce:
1. Go the example URL http://dromoscopio.net/
2. Menu:File/Save As.../Save As Type: Web Page, complete
3. look at saved file in a file explorer

Actual Results:  
none of the images are saved, eg
#uno div{
	background-image: url(uno.gif);
}

Expected Results:  
saved all images, even thoses declaired in CSS

default config

workaround:
1. hover mouse over desired CSS background image
2. right-mouseclick, View Background Image
3. right-mouseclick, Save Image As...

This is not not critical, but may render off-line viewing of saved webpages
impossible without editing the HTML/CSS, if the page is written with the font &
background colour the same.  (same issue when browsing some sites while the
browser is set w/images off: eg if the font color is white & the background
image is dark, but the background-color isn't set)
*** Bug 251815 has been marked as a duplicate of this bug. ***

Comment 65

13 years ago
*** Bug 254306 has been marked as a duplicate of this bug. ***

Comment 66

13 years ago
*** Bug 255838 has been marked as a duplicate of this bug. ***

Comment 67

13 years ago
This problem will appear much more often in the future - see eg.
http://plone.org and any comments on how to be XHTML+CSS2 compliant and achieve
accessibility (AAA, S508 etc).

It breaks saving pages from plone.org right now.

Comment 68

13 years ago
A propaganda website that makes massive use of this techniques, and which is
advocating it with good arguments, is

  http://www.csszengarden.com/

Comment 69

13 years ago
The problem will appear more and more often. For example now that mozilla.org
has a new style with a pretty background image : http://www.mozilla.org :(

Comment 70

13 years ago
*** Bug 268810 has been marked as a duplicate of this bug. ***

Comment 71

13 years ago
*** Bug 273218 has been marked as a duplicate of this bug. ***

Comment 72

13 years ago
*** Bug 274163 has been marked as a duplicate of this bug. ***

Updated

12 years ago
Flags: blocking-aviary1.1+

Comment 73

12 years ago
*** Bug 294976 has been marked as a duplicate of this bug. ***

Comment 74

12 years ago
Long silence here. I am not 100% sure. Does this bug here include that files
called by include in style files are not included into save page, complete? If
so, the summary is totally missleading. Maybe someone who better understands the
bug can find a better wording.

pi

Updated

12 years ago
Flags: blocking-aviary1.1+ → blocking-aviary1.1-
*** Bug 298819 has been marked as a duplicate of this bug. ***

Comment 76

12 years ago
*** Bug 299394 has been marked as a duplicate of this bug. ***
*** Bug 305438 has been marked as a duplicate of this bug. ***

Comment 78

12 years ago
*** Bug 305630 has been marked as a duplicate of this bug. ***
*** Bug 308489 has been marked as a duplicate of this bug. ***

Comment 80

12 years ago
(In reply to comment #74) 
> Long silence here. I am not 100% sure. Does this bug here include that files 
> called by include in style files are not included into save page, 
 
Yeah, that's spot on Boris. 
 

Updated

12 years ago
Flags: blocking1.9a1?
Flags: blocking-aviary2.0?
Not in the scope for Fx2, which is not going to include significant changes to Gecko.
Flags: blocking-aviary2? → blocking-aviary2-

Comment 82

11 years ago
Windows 2000, Firefox 1.5
For simplicity, I give an example. Save page http://www.mozilla.org/ to your local disk. Then open the saved page. You see that inscription "mozilla.org" and an image of the dinosaur at the top of the page disappeared.
Sometimes it significantly disfigures pages, especially forums.

*** Bug 322817 has been marked as a duplicate of this bug. ***

Updated

11 years ago
Flags: blocking-firefox2-

Updated

11 years ago
Summary: CSS not fixed up by webbrowserpersist (background images not saved) → CSS not fixed up by webbrowserpersist ("save page as, complete" omits background images)

Comment 84

11 years ago
*** Bug 328588 has been marked as a duplicate of this bug. ***

Comment 85

11 years ago
Oh my, this bug has been reported 4 years ago, and it wasn't fixed yet? How come is it so hard?
*** Bug 332899 has been marked as a duplicate of this bug. ***

Updated

11 years ago
Blocks: 337206
No longer blocks: 337206

Updated

11 years ago
Flags: blocking1.9a1? → blocking1.9-
Whiteboard: [wanted-1.9]
This bug needs someone to step up if it's going to get fixed.  All the code involved is currently unowned...  I can promise quick patch reviews on this one.
Assignee: adamlock → file-handling
QA Contact: chrispetersen → ian
So, for what it's worth, the way I think this ought to be fixed is roughly as follows:

To avoid having two separate bits of CSS serialization code, we should teach the style system to serialize with URI fixup.

For a start, this requires the nsWebBrowserPersist code to implement an interface a lot like nsIDocumentEncoderNodeFixup -- except for URI fixup.  In other words, the interface would take a URI as input (probably nsIURI) and return a URI as output (probably string), potentially enqueueing the saving of the URI in question as part of the implementation.

Then we could add a Serialize method to nsIStyleSheet that took one of these as an argument (but where the object being null would mean no URI fixup was needed).  Furthermore, we would then add Serialize methods to everything that has a GetCssText method by renaming GetCssText and adding the argument to this new interface -- and then adding a GetCssText that called Serilaize(null) -- where the null object for this new interface would mean to do no URI fixup.  This object would further have to be passed down through nsCSSDeclaration::ToString and the methods it calls -- where, again, null would mean that no fixup was needed.

I think this is highly preferable to reimplementing CSS parsing and serialization.

Boris, does this seem reasonable?

Comment 89

11 years ago
(I plan to work on this, although not in the next few days.)
Yeah, the plan in comment 88 sounds pretty good.

In fact, it makes me wish that our content serialization worked more like that... Then fixup would work for, say, SVG too...

Comment 91

10 years ago
*** Bug 364036 has been marked as a duplicate of this bug. ***

Updated

10 years ago
Duplicate of this bug: 369282

Updated

10 years ago
Duplicate of this bug: 371419

Comment 94

10 years ago
As a temp solution you might be interested to try the Save Complete [1] extension. It supports saving images referenced in CSS files. We also discuss the problem in the « "File -> Save Page As" does not save images fro » [2] MozillaZine Forums topic.

Notes :
* [1] <https://addons.mozilla.org/en-US/firefox/addon/2925>
* [2] <http://forums.mozillazine.org/viewtopic.php?t=538458>

Updated

10 years ago
Duplicate of this bug: 376597

Updated

10 years ago
Duplicate of this bug: 393054

Comment 97

10 years ago
Created attachment 279129 [details] [diff] [review]
preliminary, incomplete implementation

So I've been working on dbaron's suggestion for a week or two now. I'm leaving for the rest of the weekend in half an hour to run my first half marathon (eep!) so I figured I'd post what I have so far for preliminary review.

I've tried to avoid gratuitous refactoring of nsWebBrowserPersist, opting for small changes/hacks around an architecture not really aligned with the sequence of operations CSS serialization requires.

Right now, the biggest thing left to implement feature-wise is to make CSS and other content-serialized, not-downloaded files get the right file extension. Currently, wbp fixes up persisted files with nsIChannel-derived mime types; since we're not downloading the CSS files we persist from a remote server, that's not going to work. I haven't spent more than a few minutes looking at how much I have to work with, so I'm not sure of the best way to go about doing that. For some things it's easy (nsICSSStyleSheet should be saved as .css) but for things like background-image: url(blah), I'm not sure whether CSS knows what the mime type of blah is. Can't just take the extension from the URI, since that fails for dynamically-generated pages. For images I guess we do content-sniffing anyways, so we could save it as whatever we wanted, but still. 

Anyways, I'll be back Monday and can hopefully post a more polished/complete patch later in the week. This patch deserves at least two r-'s in its current state. At least there's enough code there to make comments on.
Attachment #279129 - Flags: review?(bzbarsky)

Comment 98

10 years ago
Oh, one other thing: we technically don't need to persist @import-ed stylesheets linked to from style elements as separate files, their child rules can be recursively serialized along with the main page stylesheet. Right now the code does both, I think, because it's in flux. Since that's the first thing in the diff I just wanted to comment on it.
I doubt I'll be able to get to this until after I get back in mid-to-late Sept.

> their child rules can be recursively serialized along with the main page
> stylesheet

Not if they contain @charset, @namespace, etc rules.

Updated

10 years ago
Duplicate of this bug: 396042

Updated

10 years ago
Duplicate of this bug: 396276
Comment on attachment 279129 [details] [diff] [review]
preliminary, incomplete implementation

>Index: layout/style/nsCSSStyleSheet.cpp
>+    aContent.Append(NS_LITERAL_STRING("/* Rules from linked stylesheet, original URL: */\n"));

As I said, this is no good if the linked stylesheets have @namespace rules (or @charset or @import, but skipping those won't necessarily break things, while skipping @namespace most definitely will).  Please make sure to write tests for this case.

What you probably want to do instead is to serialize the rules, and have @import serialization start the serialization of the imported sheet.  I believe that's what dbaron suggested too.

>+  for (i = 0; i < styleRules; ++i) {
>+    rv = GetStyleRuleAt(i, *getter_AddRefs(rule));

Declare |rule| here, not up at the beginning somewhere?

>Index: layout/style/nsHTMLCSSStyleSheet.cpp
>+  NS_IMETHOD Serialize(nsAString& aContent, nsIDocumentEncoderFixup *aFixup);

Why not just put this method on nsICSSStyleSheet, since those are all you serialize?

>Index: layout/style/nsICSSRule.h

And probably put this on nsICSSStyleRule.

I don't see any rule implementations of Serialize() in this patch.  Some of the rules will require fixup too, of course (@document rules come to mind).

Of course such rules in user sheets won't work right once saving has happened, but getting that to work is food for another bug, I think.

>Index: embedding/components/webbrowserpersist/src/nsWebBrowserPersist.cpp
>+#include "nsIFormControl.h"
>+#include "nsIDOM3Node.h"

This looks like part of another patch, right?

>+        // Leave mCurrentDataPath alone so that CSS fixup knows where to save
>+        //mCurrentDataPath = oldDataPath;

This is pretty questionable.  I'd have to dig to make sure, but I would be it's wrong.

>@@ -2504,17 +2522,37 @@ nsWebBrowserPersist::EnumPersistURIs(nsH
>+        rv = cos->Init(outputStream, nsnull, 0, 0);
>+        NS_ENSURE_SUCCESS(rv, rv);

This is wrong if the sheet has an @charset rule. Unless you plan to strip those, in which case it's wrong if the main document is not UTF-8.  Again, please add tests.

>+        PRBool wroteFullString = PR_FALSE;
>+        rv = cos->WriteString(data->mContents, &wroteFullString);

Why is wroteFullString being ignored?  Perhaps this should be called something else?

>+        rv = cos->Flush();
>+        rv = outputStream->Flush();
>+        rv = cos->Close();

Why bother assigning to rv if you plan to ignore it?

>+                    nodeAsLink->GetType(type);
>+                    if (type.EqualsLiteral("text/css")) {

This is wrong.  |type| could be an empty string for a CSS style sheet.  Further, if type is "text/css" there ould be no DOMStyleSheet attached to the node.  You probably want to just GetSheet() and then null-check it.  Again, add tests.

>+                        nsAutoString content;
>+                        ss->Serialize(content, mFixup);
...
>+                        data->mContents.Assign(content);

Why not get the |data| first and just pass data->mContents to Serialize()?

I haven't reviewed the URI-munging details.

CSS fixup should probably also be applied to "style" attributes.  Might be a separate bug.

To answer your quesions in the bug:

> I've tried to avoid gratuitous refactoring of nsWebBrowserPersist

If doing said refactoring (in a separate patch prior to implementing this) would make things better, please go for it!  Unless you think you want to get this into 1.9 and that would make it harder; in that case please file a followup on the refactoring.

> for things like background-image: url(blah), I'm not sure whether CSS knows
> what the mime type of blah is.

For images in particular, it does if they were used and we've gotten far enough in downloading them to know the type.  More precisely, given an nsCSSValue::Image you can get its imgIRequest and get the type from that.

But really, if we're persisting those images (which we should be, right?), we'll know the type because those we _do_ get via an nsIChannel.
Attachment #279129 - Flags: review?(bzbarsky) → review-

Updated

10 years ago
Depends on: 293834

Comment 103

10 years ago
Created attachment 282460 [details] [diff] [review]
mostly complete patch (actually not nearly complete)

Updated patch attached. It correctly serializes CSS images. I actually had most of this work done before your review; I should have uploaded what I had earlier, since this resulted in some amount of wasted effort for you, bz. Mea culpa.

I think I removed all the changes related to bug 293834 in this patch.

(In reply to comment #102)
> (From update of attachment 279129 [details] [diff] [review])
> >Index: layout/style/nsCSSStyleSheet.cpp
> >+    aContent.Append(NS_LITERAL_STRING("/* Rules from linked stylesheet, original URL: */\n"));
> 
> As I said, this is no good if the linked stylesheets have @namespace rules (or
> @charset or @import, but skipping those won't necessarily break things, while
> skipping @namespace most definitely will).  Please make sure to write tests for
> this case.
> 
> What you probably want to do instead is to serialize the rules, and have
> @import serialization start the serialization of the imported sheet.  I believe
> that's what dbaron suggested too.
> 

I believe the code should now do this, but haven't written tests yet.


> >+  for (i = 0; i < styleRules; ++i) {
> >+    rv = GetStyleRuleAt(i, *getter_AddRefs(rule));
> 
> Declare |rule| here, not up at the beginning somewhere?
done

> 
> >Index: layout/style/nsHTMLCSSStyleSheet.cpp
> >+  NS_IMETHOD Serialize(nsAString& aContent, nsIDocumentEncoderFixup *aFixup);
> 
> Why not just put this method on nsICSSStyleSheet, since those are all you
> serialize?
> 
> >Index: layout/style/nsICSSRule.h
> 
> And probably put this on nsICSSStyleRule.
> 
Done, though I had Serialize on nsICSSRule and not nsICSSStyleRule because the non-style rules in nsCSSRules.h require Serialize too. But it's not much different either way, I guess.

> I don't see any rule implementations of Serialize() in this patch.
Added.

> Some of the rules will require fixup too, of course
> (@document rules come to mind).
> 
> Of course such rules in user sheets won't work right once saving has happened,
> but getting that to work is food for another bug, I think.
Haven't tested/accounted for this yet.

> >Index: embedding/components/webbrowserpersist/src/nsWebBrowserPersist.cpp
> >+#include "nsIFormControl.h"
> >+#include "nsIDOM3Node.h"
> 
> This looks like part of another patch, right?
> 

Yes, they're from bug 293834, which this bug depends on to correctly persist serialized style elements.

> >+        // Leave mCurrentDataPath alone so that CSS fixup knows where to save
> >+        //mCurrentDataPath = oldDataPath;
> 
> This is pretty questionable.  I'd have to dig to make sure, but I would be it's
> wrong.
Yeah, that was leftover code from experimentation that I neglected to delete. Removed.

> 
> >@@ -2504,17 +2522,37 @@ nsWebBrowserPersist::EnumPersistURIs(nsH
> >+        rv = cos->Init(outputStream, nsnull, 0, 0);
> >+        NS_ENSURE_SUCCESS(rv, rv);
> 
> This is wrong if the sheet has an @charset rule. Unless you plan to strip
> those, in which case it's wrong if the main document is not UTF-8.  Again,
> please add tests.

What would be the correct way of persisting @charset rules? Extract the @charset and write the stream with that encoding?

> 
> >+        PRBool wroteFullString = PR_FALSE;
> >+        rv = cos->WriteString(data->mContents, &wroteFullString);
> 
> Why is wroteFullString being ignored?  Perhaps this should be called something
> else?
Er, because WriteString requires a boolean out param, and I couldn't think of what we might do if the full string wasn't written?

> 
> >+        rv = cos->Flush();
> >+        rv = outputStream->Flush();
> >+        rv = cos->Close();
> 
> Why bother assigning to rv if you plan to ignore it?
rv's removed.

> 
> >+                    nodeAsLink->GetType(type);
> >+                    if (type.EqualsLiteral("text/css")) {
> 
> This is wrong.  |type| could be an empty string for a CSS style sheet. 
> Further, if type is "text/css" there ould be no DOMStyleSheet attached to the
> node.  You probably want to just GetSheet() and then null-check it.  Again, add
> tests.
Code fixed, tests later.

> 
> >+                        nsAutoString content;
> >+                        ss->Serialize(content, mFixup);
> ...
> >+                        data->mContents.Assign(content);
> 
> Why not get the |data| first and just pass data->mContents to Serialize()?

done

> 
> I haven't reviewed the URI-munging details.
> 
> CSS fixup should probably also be applied to "style" attributes.  Might be a
> separate bug.

Yeah, I think leaving that as a separate bug will be better than squeezing it in this one.

> 
> To answer your quesions in the bug:
> 
> > I've tried to avoid gratuitous refactoring of nsWebBrowserPersist
> 
> If doing said refactoring (in a separate patch prior to implementing this)
> would make things better, please go for it!  Unless you think you want to get
> this into 1.9 and that would make it harder; in that case please file a
> followup on the refactoring.
> 


> For images in particular, it does if they were used and we've gotten far enough
> in downloading them to know the type.  More precisely, given an
> nsCSSValue::Image you can get its imgIRequest and get the type from that.
> 
> But really, if we're persisting those images (which we should be, right?),
> we'll know the type because those we _do_ get via an nsIChannel.
> 

Yes, but the problem is that, due to the way webbrowserpersist is structured, we need to know the type of what we're going to download before we download it. The "normal" flow for webbrowserpersist is this:

SaveDocument
  SaveDocumentInternal
    OnWalkDOMNode results in calls to StoreURI
  SaveGatheredURIs
    EnumPersistURIs results in fixed-up filenames based on download channel MIME type
    SaveDocuments results in calls to FixupURI, which maps original URIs to fixed-up filenames.
      
In essence, it boils down to webbrowserpersist expecting to do two passes, and to download files before determining the filename to replace the URI with, whereas dbaron's writeup is a one-pass system, where a the URI is immediately replaced by a filename. It works if you do a little munging with MIME types, it's just not pretty (thus the musing about refactoring).
Attachment #279129 - Attachment is obsolete: true
Attachment #282460 - Flags: review?(bzbarsky)
> Done, though I had Serialize on nsICSSRule

Yeah, that's what I meant.

> What would be the correct way of persisting @charset rules? Extract the
> @charset and write the stream with that encoding?

Yes.  For perfect fidelity, we'd have sheets know what charset they were parsed as, and serialize as that charset, modifying or inserting @charset rules as needed.  That's what we do for documents...

> Er, because WriteString requires a boolean out param

How about calling the stack boolean "ignored" then?  Or warning if it's false?  Or something?

> In essence, it boils down to webbrowserpersist expecting to do two passes, and
> to download files before determining the filename to replace the URI with,

Yes, that's the right way to go about it.

I doubt I'll be able to review this any time soon.  I definitely won't spend time reviewing until we have tests that this passes, but even then I really doubt that I'll be able to do it within a reasonable timeframe (measured in months).

Comment 105

10 years ago
>> In essence, it boils down to webbrowserpersist expecting to do two passes, and
>> to download files before determining the filename to replace the URI with,
>
> Yes, that's the right way to go about it.

Do you mean to say that you think that when CSS is being fixed up, FixupURI should block and not return until the original URI has been downloaded? Wouldn't forcing all the downloads to be made in serial like that have an adverse impact on performance?

Also, by "tests," do you mean automated tests, or just manually-verifiable testcases? I have no idea how to automate testing of webbrowserpersist...

Comment 106

10 years ago
Re: testing. I had a patch in bug 366072 to make use of reftest for WBP testing, but it bit-rotted since then and I didn't get to update it.
> Do you mean to say that you think that when CSS is being fixed up, FixupURI
> should block and not return until the original URI has been downloaded?

How do we handle <img> right now?  We should handle CSS the same way, basically.

> Also, by "tests," do you mean automated tests

Yes, ideally automated tests.  But even manually-verifiable ones would be a start....

Comment on attachment 282460 [details] [diff] [review]
mostly complete patch (actually not nearly complete)

Per comment 104, someone else should review this...
Attachment #282460 - Flags: review?(bzbarsky)

Updated

10 years ago
Duplicate of this bug: 358709

Updated

9 years ago
Duplicate of this bug: 409200

Comment 111

9 years ago
just test the wikipedia.org, the master css file that include another css file.

@import url('./anothercss.css');
@import url('./anothercss-two.css');

their's also must be save with the images and another css included.
Comment on attachment 282460 [details] [diff] [review]
mostly complete patch (actually not nearly complete)

On the principle that it's better off in somebody's review queue than nobody's, adding this to my review queue, although I also probably won't get to it for a bit.
Attachment #282460 - Flags: review?(dbaron)
Assignee: file-handling → web+moz
Flags: wanted1.9+
QA Contact: ian → file-handling
Whiteboard: [wanted-1.9]
Target Milestone: Future → ---
Duplicate of this bug: 421711
Just FYI; added this bug # to https://litmus.mozilla.org/show_test.cgi?id=3952 for easier reference.
Flags: in-litmus+
Defect is still reproducible on version 3.0b5pre. Firefox doesn't save background image in css style when using save page as, web page complete.
Defect is reproducible on version 3.0b5pre (XP, Vista, Mac OS X 10.4). Firefox doesn't save background image in css style when using save page as, web page complete.

Updated

9 years ago
Duplicate of this bug: 434480

Comment 118

9 years ago
Bug is still there on the latest nightly (Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.1a2pre) Gecko/2008072803 Minefield/3.1a2pre).

BTW, bug 293834 was recently fixed. Will it help to fix that bug? All the blockers are gone now.

Updated

9 years ago
Flags: blocking1.9.1?
Adding qawanted keyword to reflect the request for automated tests for this.
Keywords: qawanted
Flags: in-testsuite?
Duplicate of this bug: 451109
Flags: blocking1.9.1? → blocking1.9.1-

Updated

9 years ago
Flags: wanted1.9.1?

Updated

8 years ago
Duplicate of this bug: 467906
So I've been thinking about this a little bit, and I'm wondering how tolerable the loss of accuracy in the CSS is.  In particular, we lose properties we don't support, conditional comment hacks, etc.

An alternative approach might involve re-tokenizing the CSS but not parsing it, and then fixing up all the url() functions.  That's not trivial either, though, given that we have some context-sensitive tokenization, especially for URLs.

I guess this approach probably is ok, though.  But in the long run I think "Save Page As, Complete" is broken-by-design, and if we want to save complete Web pages we ought to support saving them into some package format.

Comment 123

8 years ago
A cross-browser package format standard would be more than welcome, but in the meantime what about integrating scrapbook's (https://addons.mozilla.org/en-US/firefox/addon/427) excellent abilities to save complete web pages into firefox ?

Updated

8 years ago
Duplicate of this bug: 474424
Will we have any progress on this bug in the near future?

Comment 126

8 years ago
Created attachment 391979 [details]
Firefox start page before the Save Page As....

Comment 127

8 years ago
My understanding is that this affects saving our current home page

http://www.google.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official

I attached the before and after shots for the "Save Page..." as web page
Complete.

This is on Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13)
Gecko/2009073022 Firefox/3.0.13

STR:
1) Install firefox
2) See start page
3) Select File -> Save Page As...
4) Make sure "save as type:" is "Web Page, complete"
5) save it to  some convenient place like your desktop
6) open the file on your desktop, called firefox by default.

Comment 128

8 years ago
Created attachment 391980 [details]
Saved firefox page after "Save Page As..." - missing elements

Comment 129

8 years ago
I won't have time to work on this in the near future, so reverting to default owner.
Assignee: eschew → nobody

Updated

8 years ago
Duplicate of this bug: 508591

Updated

8 years ago
Duplicate of this bug: 349114
Created attachment 395446 [details] [diff] [review]
mostly complete patch, mostly merged to trunk

This is mostly merged to trunk (I haven't tried compiling yet, though).  The aSerializedCloneKids change appears to have landed already.  I merged the URI serialization code in nsCSSDeclaration.  But I still need to add a replacement for the code that was patching TryBackgroundShorthand in the old patch.
So now that I'm attempting to compile this, I'm having trouble figuring out how *any* of the patches in this bug ever compiled (well, I can see how the first patch could compile... but it wouldn't link).  They both have code in nsCSSStyleSheet.cpp that calls a Serialize method on either nsICSSRule or nsICSSStyleRule.  In the first patch it was declared pure virtual on nsICSSRule but never implemented; in the second patch it's neither declared nor implemented.

And it makes far more sense to me for that method to be on nsICSSRule.  The current serialization code simply omits things like @media rules, @namespace rules, etc.


So I'd thought this patch was really ready for review, but it seems like it has some pretty major gaps in it.

Or were there changes you had in your tree that you just didn't include in your diff?
Also, the patch renames nsEncoderNodeFixup to nsEncoderFixup in its implementation, but doesn't touch its declaration... which also clearly isn't going to compile, unless, again, many entire files are missing from the patch.
Created attachment 395474 [details] [diff] [review]
more merged to trunk

I redid a bunch of the guts of the CSS changes to be much more thorough, and thus hit all CSS properties with URLs in them.  (Though it still doesn't hit downloadable fonts, but could once the issue below is fixed.)

I also filled in most of the missing pieces of code.  However, the CSS rule serialize implementations are just "FIXME: WRITE ME".
Attachment #395446 - Attachment is obsolete: true

Updated

8 years ago
Duplicate of this bug: 519084
Duplicate of this bug: 526823
Duplicate of this bug: 530801
Comment on attachment 282460 [details] [diff] [review]
mostly complete patch (actually not nearly complete)

Marking review- because this wasn't actually anywhere near complete.  (And see also other comments above.)
Attachment #282460 - Attachment description: mostly complete patch → mostly complete patch (actually not nearly complete)
Attachment #282460 - Flags: review?(dbaron) → review-

Updated

7 years ago
Duplicate of this bug: 592642

Updated

6 years ago
Duplicate of this bug: 659230
Issue is still reproducible on the latest nightly:
Mozilla/5.0 (Windows NT 5.1; rv:7.0a1) Gecko/20110704 Firefox/7.0a1

Comment 143

6 years ago
Confirming on latest Seamonkey build
Build identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0a1) Gecko/20111017 Firefox/10.0a1 SeaMonkey/2.7a1

Updated

6 years ago
Whiteboard: [Halloween2011Bug]

Comment 144

6 years ago
Additional 1.5 months and we will be able to celebrate 10th birthday of this bug. ^___^

Comment 145

5 years ago
Devs don't save pages at all as I see. 
They don't pay attention to such important bugs as this bug and Bug 653522 - saving pages when offline (or using adblock) inundates the user with error dialogs:

https://bugzilla.mozilla.org/show_bug.cgi?id=653522

Comment 146

5 years ago
Why are major bugs in core functionality like this not fixed when instead we get garbage like microsummaries, geolocation, and DNS prefetching?

Comment 147

5 years ago
Wrt. comment#146: It's likely the problem that you and I don't pay the devs to do so, and the people who do, see more value in a read-only web and user tracking. It's also usually much easier to make something new than fix old bugs.
The problem is really that there's no good approach here with the current "Save As, Complete" model.  If we fix this, we'll break the CSS working in other browsers.

A better solution would be replacing Save As, Complete (which attempts to rewrite the pages to fix up URLs) with a cross-browser package format for saved documents (which could save resources as they were).

Comment 149

5 years ago
(In reply to David Baron [:dbaron] from comment #148)
> If we fix this, we'll break the CSS working in other browsers.
Only for hacks and browser-specific properties. This is not a big deal. The same problem exists anyway for things like IE's conditional comments, or any kind of user-agent selectivity on the server, or indeed any other type of hack or browser-specific feature.

But obviously it's SO much better with the current situation that it works in no browsers at all!

Are you seriously more worried about browser-specific CSS features, which are probably expected to degrade gracefully anyway, than, oh I don't know, background-image!?

> A better solution would be replacing Save As, Complete (which attempts to
> rewrite the pages to fix up URLs) with a cross-browser package format for
> saved documents (which could save resources as they were).
And you'd STILL need to rewrite the URLs. Come on, give it half a thimble of thought.
(In reply to Anonymous from comment #149)
> (In reply to David Baron from comment #148)
> > A better solution would be replacing Save As, Complete (which attempts to
> > rewrite the pages to fix up URLs) with a cross-browser package format for
> > saved documents (which could save resources as they were).
> And you'd STILL need to rewrite the URLs.
Let's suppose for the sake of argument that we used IE's .mht format. It doesn't rewrite the URLs. Instead, each entry in the file is associated with its original URL. (What I don't know is whether it would be possible to achieve this scheme in Gecko.)

Comment 151

5 years ago
(In reply to neil@parkwaycc.co.uk from comment #150)
>Instead, each entry in the file is associated with its original URL.
Oh I see. Fair enough. But unless anyone thinks that the feature to save a web page as individually manipulable files should be completely removed, the bug is still not avoidable, and the URLs still need rewriting.
Just for the record, I do have code in BlueGriffon achieving a full rewrite
of all stylesheets attached to a given document, tweaking all URIs.
I needed it to be able to turn an arbitrary document into a reusable
well-packaged template. That code is based on my parser/serializer JSCSSP.
In other terms, a full rewrite is doable but requires considerable complexity
and footprint...

Comment 153

5 years ago
(In reply to David Baron [:dbaron] from comment #148)
> A better solution would be replacing Save As, Complete (which attempts to
> rewrite the pages to fix up URLs) with a cross-browser package format for
> saved documents

Modern pages have very big js files - 500kb or 1mb per page. I delete js files from folders of saved pages sometimes. And a lot of reasons to open folders of pages. And I will never use the browser with only archive format for save as.

Please don't add reasons not to use Gecko browsers. (Now I use SeaMonkey 1.1.19 as main browser only because of regression in save as function, which was developed by kind developers)) Bug 653522)

Comment 154

5 years ago
(In reply to David Baron [:dbaron] from comment #148)
> A better solution would be replacing Save As, Complete (which attempts to
> rewrite the pages to fix up URLs) with a cross-browser package format for
> saved documents (which could save resources as they were).

I use "Save As, Complete" to save pages complete to examine a source code. So I need a working local copy of the page, not just a package.

Comment 155

5 years ago
I agree with Emin, save the source as seen in view-source and rewrite urls to resources to a local folder next to the saved page. multiple request are definitely allowed. no reason to get any performance on this task as it is user invoked and the expectation is that the current document is saved as-is for various reasons, for later use. Not as firefox reads the document, nor as quickly as possible or to consume least space as possible - there is save document (not complete) for that.

Comment 156

5 years ago
Actually it's not enough. Complete save must save page as it is visible with all elements inserted by ajax (like comments) and if some elements are missing then they should remain missing (like advertisements with active adblock). Even all additional user CSS applied to the elements on the page must be preserved as-is.
Otherwise it is not complete.
I mean it have to crawl DOM and generate page from it instead of saving page's source with rewritten links.
I do agree there is no reason for fast performance but when you open such page it must resemble page which you have saved as much as possible. It could not be considered as complete if it will download anything from the web on open. Also I don't think preserving interactivity is expected behavior in this case. Users usually saves page to be able to read it later or send someone to read. So, it must be preserved as he seen it when decided to save it.

For investigation purposes there is separate applications like extension "DownThemAll!". You can configure how exactly it will save things and how deep will it go. It does even more then Emin requests. It not only makes local copy of the page - it could make local copy of the whole site or decent part of it.

Comment 157

5 years ago
@Lain_13 Do you mean something like PhantomJS for Gecko?
http://stackoverflow.com/questions/5490438/phantomjs-and-getting-modified-dom

Comment 158

5 years ago
Not exactly but kind of. In your example that code generates a screenshot of the site while complete copy of the page have to remain as text and linked objects to let you scale it, select, copy and everything else. The only thing which shouldn't work is interactivity. I mean scripts must not work.
BTW, Scrapbook does such a thing already. I'd just like to see ability to make complete stand-alone static copy of the page in the Fx itself.

Comment 159

5 years ago
Well, the example also does that. As does this: http://code.google.com/p/phantomjs/wiki/QuickStart#DOM_Manipulation but it might not be so clear.
This is the interesting bit:
var page = require('webpage').create(),
    url = 'http://lite.yelp.com/search?find_desc=pizza&find_loc=94040&find_submit=Search';

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        var html = page.evaluate(function() {
            return document.innerHTML; // <- get all generated and styled HTML
        });
        console.log(html);
    }
    phantom.exit();
});
There is a similar project for Gecko called offscreen but I can't see how it's possible to get the generated HTML with this though.
Project Page: http://offscreengecko.sourceforge.net/
Source code: http://hg.mozilla.org/incubator/offscreen/

Comment 160

5 years ago
(In reply to David Baron [:dbaron] from comment #148)
> The problem is really that there's no good approach here with the current
> "Save As, Complete" model.  If we fix this, we'll break the CSS working in
> other browsers.
> 
> A better solution would be replacing Save As, Complete (which attempts to
> rewrite the pages to fix up URLs) with a cross-browser package format for
> saved documents (which could save resources as they were).

Consider how some browsers are closed source and written to the interests of their owners and how they implement either CSS or JS compared to the standards ... duh

On the other hand, I would strongly vote against such a method, one reason being the examination of various resource files related to a page. 

I use an extension called Web Developer Toolbar. This extension has a neat feature called "view generated source". This contains all content generate/modified by JS as displayed. Based on this one could easily reference each resource file (image/css/etc) for saving. 

Also, saving locally a webpage, in principle, does not imply that the entire webpage functionality will be preserved for the simple reason that some features require online interractivity with the source server, and this is impossible to replicate in a local save and offline viewing scenario.
Duplicate of this bug: 852007

Comment 162

4 years ago
It would be nice to have that feature back. I also like saving pages for offline inspection later ("coding on the lake" and that sort of thing :).
This bug report/ticket exists for eleventh years. 
Is there really no one interested in fixing this bug.......?
Some remarks regarding creation of 'faithful offline snapshot of displayed page' from browser:
- Aforementioned "Save complete" addon [0] seems no longer maintained.
- There is "Mozilla Archive Format" addon [1][2] which promotes 'Faithful save system' or 'Exact snapshot' feature, seems to work quite well. Maybe worth checking.
- (Parity) Opera browser is able to successfully save page with imported resources as well.

[0] https://addons.mozilla.org/en-US/firefox/addon/save-complete-4723/
[1] http://maf.mozdev.org/ 
[2] https://addons.mozilla.org/en-US/firefox/addon/mozilla-archive-format/
I apologize for the "me too" voice here.

dbaron, Are you saying that this bug is simply not fixable?

> If we fix this, we'll break the CSS working in other browsers.

But right now the CSS doesn't work in any browser. What would be the harm in just making it work in Firefox?
Well, it could break things that aren't broken today.  Though now that prefixes are becoming less common that would be less of an issue, so I'm less worried about it than I was a few years ago.

Alternatively, we could do tokenization-level fixup, which actually might not be that bad, except it's a litte tricky for @import which takes strings in addition to url()s.

Comment 167

4 years ago
I feel inclined to develop a Firefox Add-On for this. To me, this seems overly simple.
1. Just take the download tree (e.g. as seen in Firebug ect.) and redownload it all.
2. Save all files to a folder as is (which is, if I remember correct, the current behavior).
3. Parse all css/js files and replace URI's to local path.
4. Have heaps of fun with script loaders (AMD et al.) making NO. 3 difficult.

There you have it. The page, saved in it's entirety.
Actually, the MAF add-on (https://addons.mozilla.org/en-us/firefox/addon/mozilla-archive-format/) does appear to do exactly that.

If you have it installed, Firefox Save complete page works much better.
Whiteboard: [Halloween2011Bug] → [Halloween2011Bug][mentor=dbaron][see comment 122]
Whiteboard: [Halloween2011Bug][mentor=dbaron][see comment 122] → [Halloween2011Bug][mentor=dbaron]
Whiteboard: [Halloween2011Bug][mentor=dbaron] → [Halloween2011Bug][mentor=dbaron][lang=c++]

Comment 169

3 years ago
@mkaply Yep, "Faithful to the original" is all I need.
Thanks

Comment 170

3 years ago
Removing "qawanted" since the need for an automated test is marked by "in-testsuite?".
Keywords: qawanted
(Assignee)

Updated

3 years ago
Mentor: dbaron@dbaron.org
Whiteboard: [Halloween2011Bug][mentor=dbaron][lang=c++] → [Halloween2011Bug][lang=c++]

Comment 171

2 years ago
Hello I'd like to be assigned to this bug please assign me.
Roughly what steps are you planning to take to fix this?
Flags: needinfo?(wknapek)

Comment 173

2 years ago
[Blocking Requested - why for this release]:
Flags: needinfo?(wknapek)

Comment 174

2 years ago
I read the whole thread, and since I'm new to the project i try implement solution proposed by the dotnetCarpenter. What do you think David?
Do you mean comment 167?  That sounds like a proposal to rewrite the entire "Save As, Complete" feature, of which this bug (fixing up URIs in the CSS) would be just a small part.  But it doesn't tell me anything about what you plan to do for the part covered by this bug:  fixing up URIs in the CSS.

Comment 176

2 years ago
Created attachment 8599503 [details]
Bugzilla main page test case- original

Comment 177

2 years ago
Created attachment 8599504 [details]
Bugzilla main page test case- after save

This 13-year-old bug in basic functionality is surprising - why is this still broken?  
I've attached a very local test demonstration from the Bugzilla main page, to help illustrate its status today.

Comment 178

2 years ago
UnMHT is on GPL licence ( https://addons.mozilla.org/fr/firefox/addon/unmht/license/7.3.0.5 ), is it not possible to use a part of code ? UnMHT is for me a great tool for save a complete page : HTML / CSS / JS / media

Comment 179

2 years ago
Hi all, here are my two cents... I hope it helps!

To sum up:
- I tried to use the "save as, complete web page" with https://slate.adobe.com/a/NzR3A/ and as a result background images and other CSS effects were missing.
- I tried to save this web page with https://addons.mozilla.org/en-us/firefox/addon/mozilla-archive-format/ in either MAF or MHT formats, and it failed (only the first background image and text were displayed).
- I tried to save this web page with https://addons.mozilla.org/fr/firefox/addon/unmht/ in MHT format and it worked perfectly!
 
In conclusion: a million thanks to UnMHT developers! :-)

Updated

a year ago
Duplicate of this bug: 1232103
Component: File Handling → File Handling
Product: Core → Firefox
Version: Trunk → unspecified
Duplicate of this bug: 1328204
You need to log in before you can comment on or make changes to this bug.