Last Comment Bug 115107 - CSS not fixed up by webbrowserpersist ("save page as, complete" omits background images)
: CSS not fixed up by webbrowserpersist ("save page as, complete" omits backgro...
Status: NEW
[Halloween2011Bug][lang=c++]
: helpwanted, smoketest, testcase
Product: Firefox
Classification: Client Software
Component: File Handling (show other bugs)
: unspecified
: All All
: -- minor with 121 votes (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
:
Mentors: David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch)
: 116660 120859 126307 128843 133725 157708 187590 206401 221532 224801 226925 245799 251815 254306 255838 268810 273218 274163 294976 298819 299394 305438 305630 308489 322817 328588 332899 349114 358709 364036 369282 371419 376597 393054 396042 396276 409200 421711 434480 451109 467906 474424 508591 519084 526823 530801 592642 659230 852007 1232103 (view as bug list)
Depends on: 293834
Blocks: @importSave 115634 167262
  Show dependency treegraph
 
Reported: 2001-12-13 12:40 PST by Dominik Dröscher
Modified: 2016-06-22 11:27 PDT (History)
137 users (show)
asa: blocking‑aviary1.5-
stephen.donner: in‑testsuite?
stephen.donner: in‑litmus+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Test cases possibly showing independence of CSS bug and table bug (26.41 KB, application/x-zip-compressed)
2002-03-05 17:16 PST, Andrew Lin
no flags Details
Work in progress (8.22 KB, patch)
2003-04-25 12:20 PDT, Adam Lock
no flags Details | Diff | Splinter Review
Work in progress 2 (12.45 KB, patch)
2003-04-29 14:04 PDT, Adam Lock
no flags Details | Diff | Splinter Review
preliminary, incomplete implementation (49.60 KB, patch)
2007-08-31 11:42 PDT, Ben Karel [eschew]
bzbarsky: review-
Details | Diff | Splinter Review
mostly complete patch (actually not nearly complete) (43.30 KB, patch)
2007-09-26 13:52 PDT, Ben Karel [eschew]
dbaron: review-
Details | Diff | Splinter Review
Firefox start page before the Save Page As.... (94.68 KB, image/tiff)
2009-07-31 15:37 PDT, Tim Riley [:timr]
no flags Details
Saved firefox page after "Save Page As..." - missing elements (68.05 KB, image/tiff)
2009-07-31 15:42 PDT, Tim Riley [:timr]
no flags Details
mostly complete patch, mostly merged to trunk (37.56 KB, patch)
2009-08-19 16:23 PDT, David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch)
no flags Details | Diff | Splinter Review
more merged to trunk (93.61 KB, patch)
2009-08-19 18:04 PDT, David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch)
no flags Details | Diff | Splinter Review
Bugzilla main page test case- original (181.85 KB, image/png)
2015-04-29 13:33 PDT, WBT
no flags Details
Bugzilla main page test case- after save (35.65 KB, image/png)
2015-04-29 13:36 PDT, WBT
no flags Details

Description Dominik Dröscher 2001-12-13 12:40:03 PST
Save complete Webpage works great but not 100% correct. Try to download this page:
http://phpbb.sourceforge.net/phpBB2/viewforum.php?f=1

It seems to forget the table background images which are called in the CSS header.
Comment 1 Boris Zbarsky [:bz] 2001-12-13 13:28:16 PST
To ben.  This has been mentioned in the fora on mozillazine.org too.
Comment 2 Boris Zbarsky [:bz] 2001-12-16 21:01:58 PST
This applies to non-css backgrounds too.
Comment 3 Boris Zbarsky [:bz] 2001-12-16 21:02:26 PST
*** Bug 115532 has been marked as a duplicate of this bug. ***
Comment 4 [not reading bugmail] 2001-12-16 22:31:16 PST
see bug 115532 for more in the straight html for background tags.
Comment 5 Adam Lock 2001-12-17 06:44:12 PST
The webbrowserpersist object doesn't save anything from the CSS whether inline
or not.

I don't know if it is possible to walk through, fixup and generate externally
linked and inline CSS (with the minimum of effort), but it's something I will
take for the time being.
Comment 6 Dimitrios 2001-12-17 13:03:33 PST
Adding "(background images not saved)" to summary.
Comment 7 David Hallowell 2001-12-23 05:48:01 PST
*** Bug 116660 has been marked as a duplicate of this bug. ***
Comment 8 David Hallowell 2001-12-23 05:59:01 PST
Seen this on Linux 2001-12-20 (Slackware 8) changing OS to All
Comment 9 [not reading bugmail] 2001-12-29 02:50:45 PST
This is regarding the <td background=''> attribute.

I have a page I saved, from www.stomped.com as of today. using 12-27 build.. the
source has a <td background="images/trans.gif">  and Mozilla renders that
correctly.  Now doing 'save page as', doesn't create a subdirectory of images
for example 'thedefaultsavedir'/www.stomped.com_files/images, and it doesn't
parse this tag attribute to retrieve the image: 'trans.gif' and save it to the:
'/www.stomped.com_files/images' subdirectory.   'Save page as' page source,
still outputs the default source tag as above, and when trying to access the
image upon viewing using open file > context menu > view background image, the
alert box shows it trying to access just 'thedefaultsavedir'/images/trans.gif.

There is also no doc type declared on this page.
Comment 10 Judson Valeski 2002-01-28 15:23:08 PST
just spoke w/ sarah about this. minusing.
Comment 11 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2002-01-30 20:02:43 PST
*** Bug 120859 has been marked as a duplicate of this bug. ***
Comment 12 R.K.Aa. 2002-02-18 16:59:47 PST
*** Bug 126307 has been marked as a duplicate of this bug. ***
Comment 13 Boris Zbarsky [:bz] 2002-03-05 12:57:01 PST
*** Bug 128843 has been marked as a duplicate of this bug. ***
Comment 14 Andrew Lin 2002-03-05 17:13:40 PST
I suspect there may actually be two issues at work here; I've made a few
testcases to illustrate my point.

- Pages with <td background="foo"> aren't saved completely. 
http://pantheon.yale.edu/~al262/mozilla/115107/tables/index.html
- Pages with foo{background-image:url(bar);} in the CSS aren't saved completely,
even if foo != td.   http://pantheon.yale.edu/~al262/mozilla/115107/css/index.html
- However, pages with simple <body background="foo"> tags work fine. 
http://pantheon.yale.edu/~al262/mozilla/115107/plain/index.html

Or maybe I'm missing something.  Regardless, going to zip up the whole lot and
attach it.
Comment 15 Andrew Lin 2002-03-05 17:16:51 PST
Created attachment 72701 [details]
Test cases possibly showing independence of CSS bug and table bug

Testcases above, packaged up (brown paper packages tied up in string / these
are a few of my favorite things...)
Comment 16 Boris Zbarsky [:bz] 2002-03-05 17:25:48 PST
Andrew, the <td background="foo"> thing should be a separate bug.  See
http://lxr.mozilla.org/seamonkey/source/embedding/components/webbrowserpersist/src/nsWebBrowserPersist.cpp#1807
-- we basically never check for background attrs on <td> nodes.  That should be
simple to fix.  If we don't have a bug on that already, file one on me.
Comment 17 Andrew Lin 2002-03-05 17:59:57 PST
Boris, at the moment I don't read C++, so I'm going to take your word for it. 
There was a bug (Bug 115532) that seems to address your issue but it was marked
as a duplicate of this one (see this bug 115107 comment 3).
Comment 18 Adam Lock 2002-03-06 04:01:07 PST
Neither TABLE nor TD elements have a background attribute (in HTML 4.0) which is
why it isn't fixed up. I could add fixup for such quirks if people feel the
practice is prevalent enough.

As for inline/external styles with url declarations such as this:

  BODY {
    color: black;
    background: url(http://foo.com/texture.gif);
  }

I don't believe there is much that can be done until we have a way to serialize
CSS from it's in-memory representation (DOM).
Comment 19 Boris Zbarsky [:bz] 2002-03-06 08:13:44 PST
Adam, we _do_ have such a way.  The CSSOM should allow you to walk the
document.styleSheets array, walk the .cssRules array for each sheet, check
whether a rule has a background image set in it, change the URL if so, save the
.cssText for each rule,  recurse down @import rules, etc.  I can't actually
think of any bugs we have blocking that sequence of actions....  If you _do_ run
into bugs in that code that block this, please let me know.
Comment 20 Adam Lock 2002-03-06 10:03:35 PST
Boris, do you know where the CSS serialization code lives? It's not being done
the way other DOMs are serialized via the content encoder.
Comment 21 Boris Zbarsky [:bz] 2002-03-06 10:14:45 PST
Heh.  There is no pre-build serializer, yet.  Each CSSRule object has a
GetCssText() method that returns an nsAString holding a serialization of the
rule (per DOM2 Style).  The CSSOM-walking has to get done by hand,
unfortunately.... 
Comment 22 Andrew Lin 2002-03-27 06:51:39 PST
*** Bug 133725 has been marked as a duplicate of this bug. ***
Comment 23 Alfonso Martinez 2002-07-16 07:06:51 PDT
*** Bug 157708 has been marked as a duplicate of this bug. ***
Comment 24 Richard Jones 2002-09-25 08:30:30 PDT
I have seen this bug as well. In my case, the CSS for a page contains:

body {
        background-color: #ffffff;
        margin: 1em 1em 1em 40px;
        font-family: sans-serif;
        color: black;
        background-image: url(/images/logo-left.gif);
        background-position: top left;
        background-attachment: fixed;
        background-repeat: no-repeat;
}

The background image (/images/logo-left.gif) is not saved, when it clearly
ought to be.

Rich.
Comment 25 Felix Miata 2002-12-07 12:41:13 PST
Not only does the @import css not get saved to disk, when you later try to open
the saved html while the original server is unavailable, it takes forever to
load the local file while Mozilla waits for the css that never comes.
Comment 26 Alfonso Martinez 2003-01-03 11:12:25 PST
*** Bug 187590 has been marked as a duplicate of this bug. ***
Comment 27 Adam Lock 2003-04-25 12:20:41 PDT
Created attachment 121701 [details] [diff] [review]
Work in progress

Patch is work in progress but it attempts to fix up style properties that
typically contain an url(), e.g. background-image. I have most of the rule
searching and url extraction / fixup down but I have to clean up the node
cloning function.

I also have a horrible feeling that just asking a node for it's inline syle
drags a bunch of -moz styles into existence even if they weren't there to start
with. This means you get a mess of extra styles in the output document. I'll
have to examine this issue a bit more.
Comment 28 Boris Zbarsky [:bz] 2003-04-25 12:51:24 PDT
So this is based on a totally drive-by skim of the patch:

1)  NS_ARRAY_LENGTH is a nice macro.  ;)
2)  "content" and "cursor" can have URI values
3)  The parser you wrote will fail to parse something like:

     content: "url(foo)" url(foo);

    correctly.  You're right that GetPropertyCSSValue would be nice here...
4)  "foo: bar" in CSS is a declaration, not a rule (the "// Test if the inline
    style contains rules that" comment)
5)  The code for setting the URL value will not work for "content" and "cursor"
    because they can include things other than the URL.
6)  I'm not sure how putting @import in the same list as property names will
    work -- the two don't even live in the same places...
Comment 29 Adam Lock 2003-04-29 14:04:19 PDT
Created attachment 122047 [details] [diff] [review]
Work in progress 2

Patch addresses most of the issues but is still work in progress. Uses
NS_ARRAY_LENGTH, adds support for various "cursor-*" props but not "content",
skips quoted text in values, fixes the broken comment, removes @import.

Patch actually fixes up content now, but is busted in a couple of ways:

1. The call to cssDeclOut->SetProperty(propName, newValue, propPriority) during
fixup results in a mutant URL caused by the CSS resolving the supplied relative
url into an absolute URL using the base address of the original location. So
"url(bg.gif)" becomes "url(http://foo.com/original_path/local_path/bg.gif)".
I'll have to figure out how it is getting its base address and fix it.

2. Just touching the .style property on an element results in all kinds of
nasty extra styles to leap into existence. For example:

  <body style="background: yellow url(bg.gif)">

Becomes:

  <body style="background: yellow url(126309_files/bg.gif) repeat scroll 0%; 
    -moz-background-clip: initial; -moz-background-inline-policy: initial;
    -moz-background-origin: initial;">
  
Finally, the patch will have to be able to extract and fixup multiple urls that
the likes of "content" could contain. This is likely to complicate string
parsing matters a lot.
Comment 30 Boris Zbarsky [:bz] 2003-04-29 14:28:02 PDT
> I'll have to figure out how it is getting its base address and fix it.

For inline style, this is coming from GetBaseURL on the document object.  For
style in stylesheets, this is the stylesheet's URI, which the sheet stores. 
It's set at creation time via nsICSSStyleSheet::Init().

"background" is a shorthand property.  So it will set all sorts of stuff.  As
long as you only work with "background-image" you don't really have to worry
about it...

The cursor property is "cursor".  Those things you have in the list are just
possible values.  The problem is, you can have something like:

  cursor: url(foo) crosshair;

Are there useful changes that could be made to the style system APIs that would
make this easier?  Should some of those changes become part of DOM2 CSS?
Comment 31 Hixie (not reading bugmail) 2003-04-29 14:46:31 PDT
Good lord. You really don't want to be doing actual parsing of CSS here. We
already have one CSS parser, having two just means that we'll have twice as many
bugs (probably more than twice, since this code won't be tested much).

Does it deal with CSS escapes? CSS comments? etc.

Surely there's a better way to do this.
Comment 32 Boris Zbarsky [:bz] 2003-04-29 14:51:32 PDT
It sorta deals with CSS escapes (probably to the extent that it's needed in this
code).  Comments are a non-issue, since the only CSS we have to parse are
property values gotten via getPropertyValue().  We're basically parsing
canonicalized values here...

But yes, it would be nice if we could ask the CSS decl for that info, since it
already has it all broken down and has to trouble itself to produce CSS just so
we can reparse it....
Comment 33 Hixie (not reading bugmail) 2003-04-29 14:53:56 PDT
I thought we already could? How does "view background image" work?
Comment 34 Boris Zbarsky [:bz] 2003-04-29 14:59:26 PDT
That gets the computed style, not the declared style.  Computed style supports
getPropertyCSSValue() to a certain extent (not live, but better than nothing).

Note that we don't implement computed "content" either... ;)  
Comment 35 Hixie (not reading bugmail) 2003-04-29 15:03:55 PDT
Well in that case I would recommend writing a quick internal hack of an API. In
fact probably the simplest API would be on the stylesheet or rule interfaces,
which just serialises the entire stylesheet or rule using a new base URI.

The reason I say that, as opposed to "implement DOM2 CSS", is that DOM2 CSS is
likely to be junked in favour of a more sensible API.
Comment 36 Boris Zbarsky [:bz] 2003-04-29 15:11:29 PDT
Yes, I realize that.  That's why we haven't implemented it.  ;)

Serializing with a new base URI is nontrivial, since we only store absolute
URIs.  We would need to duplicate some fun code that webbrowserpersist already
has.  Further, for inline style there is no "sheet" -- there is just a
CSSDeclaration, and that implements no useful interfaces
(nsIDOMCSSStyleDeclaration/nsIDOMCSS2Properties are the closest it gets).

I suppose we could add another interface that you could QI these beasties to...
as long as we're careful, it should not be too expensive to do that...

Adam, I assume you're trying to fix this for 1.4?
Comment 37 Adam Lock 2003-04-29 17:00:49 PDT
I'll fix it for 1.4 if I can get it working reasonably at least for the CSS 1
stuff. I don't care so much about CSS 2 as background images are the most
visible breakage this patch tries to fix.

As I mentioned, there are a few issues to be sorted out before I would call it
ready for review. I'd like to know why these extra styles suddenly materialised
for example. I can probably live with forcing an absolute uri into the fixed up
style to make it work for now.
Comment 38 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2003-04-29 17:04:02 PDT
Comment on attachment 122047 [details] [diff] [review]
Work in progress 2

>+    "cursor-crosshair",
>+    "cursor-default",
>+    "cursor-pointer",
>+    "cursor-move",
>+    "cursor-e-resize",
>+    "cursor-ne-resize",
>+    "cursor-nw-resize",
>+    "cursor-n-resize",
>+    "cursor-se-resize",
>+    "cursor-sw-resize",
>+    "cursor-s-resize",
>+    "cursor-w-resize",
>+    "cursor-text",
>+    "cursor-wait",
>+    "cursor-help",

I've never heard of any of these.
Comment 39 Boris Zbarsky [:bz] 2003-04-29 17:21:55 PDT
Adam, those properties are just set by the "background" shorthand... Or are you
saying that something in your patch is making them appear? (I see nothing there
that should cause that to happen....)
Comment 40 Adam Lock 2003-04-30 04:53:35 PDT
I was confused about the cursor thing. I was taking the CSS2 spec to mean there
were cursor, cursor-wait, cursor-pointer etc. properties just as there is a
background and a more specific background-image. I can cut those too for the
time being.

I was trying to avoid having to deal with lines like this:

cursor: url(foo.gif), url(foo2.gif), url(foo3.gif), text;

As for those other properties, I still don't understand why they suddenly appear
like they do. They are not in the original source so why should they just
materialise when I ask for the style object? If the styles are harmless then
okay, but I suspect if I checked in a fix which has this behaviour, I'd soon see
another bug open complaining about them.
Comment 41 Hixie (not reading bugmail) 2003-04-30 05:04:08 PDT
Don't forget -moz-binding: url(...); either, while you're at it.


Regarding the property explosion. Assuming the original stylesheet reads:

   background: green;

...then it is exactly equivalent to:

   barkground-color: green; background-image: none; background-position: 0% 0%;
   background-attachment: scroll; background-repeat: repeat;
   -moz-background-clip: initial; -moz-background-origin: initial;
   /* and a few more that i've forgotten */
Comment 42 Boris Zbarsky [:bz] 2003-04-30 06:50:41 PDT
Adam, those properties appear when you serialize the style attr.  Your patch has
nothing to do with that....
Comment 43 Adam Lock 2003-04-30 07:05:39 PDT
I know they have nothing to do with my patch but expect another bug nonetheless.
If someone publishes a page with an inline style and sees these new things
appear, they're not going to understand why.

I'm not sure I do either. I can appreciate that saying "background:
url(foo.gif)" might internally imply some other styles but I am not sure that
they should be serialized when they were not explicitly defined in the first place.

I can raise another bug on that issue and revise my patch to concentrate on
background images for the time being.
Comment 44 Kathleen Brade 2003-04-30 07:30:30 PDT
Boris Zbarsky in comment 36 says:
>Serializing with a new base URI is nontrivial, since we only store absolute
>URIs.  We would need to duplicate some fun code that webbrowserpersist already
>has.

Which "fun code" in webbrowserpersist would you need to duplicate?  If you want
to get a relative url from an absolute url (relative to another location), you
can do so with nsIURL's getRelativeSpec:
  AUTF8String getRelativeSpec(in nsIURI aURIToCompare)
Does that help?
Comment 45 Alfonso Martinez 2003-05-20 11:24:46 PDT
*** Bug 206401 has been marked as a duplicate of this bug. ***
Comment 46 Jerome Lacoste 2003-08-18 13:52:36 PDT
Encountering the bug when trying to save the URL for bug 216573.
Comment 47 Boris Zbarsky [:bz] 2003-10-07 17:15:05 PDT
*** Bug 221532 has been marked as a duplicate of this bug. ***
Comment 48 Tom Chiverton 2003-10-31 06:21:02 PST
Is there a fix-for-version for this yet ? 
With increasing use of CSS (the latest Dreamweaver makes much better use of it)
it is going to bite more and more people !

I think it should be a more major issue, as potentialy navigation elements etc.
on webpages may vanish for no good reason.
Comment 49 Boris Zbarsky [:bz] 2003-10-31 08:26:33 PST
See the target milestone.  No one is working on this right now or will in the
immediate future.  There is a work-in-progress patch here, and now that the
style system has been changed to also store non-absolute URIs things should be a
bit easier (issue 1 in comment 29 should not be present anymore).  So someone
who cares about this bug needs to pick it up and make it work.

I'd say that a first cut could skip cursor and content and just do @import,
backgrounds, and -moz-binding...
Comment 50 Daniel Glazman (:glazou) 2003-10-31 08:29:17 PST
> No one is working on this right now or will in the immediate future.

Hey Boris... Where did you get that... This is one of the bugs I want to
see fixed for Nvu. Please don't reassign yet, I have plenty of stuff on
my plate right now.
Comment 51 Boris Zbarsky [:bz] 2003-10-31 08:39:39 PST
glazou, I got it from earlier comments from Adam (the assignee), from the target
milestone, and from general knowledge of what I know people are working on. 
Since the details what you're doing has been shrouded in secrecy and since I
can't read minds (well, not ones that far away, at least), I hardly had a way of
knowing you were thinking of working on this... ;)
Comment 52 Bill Mason 2003-11-05 12:17:57 PST
*** Bug 224801 has been marked as a duplicate of this bug. ***
Comment 53 Asa Dotzler [:asa] 2003-11-18 13:26:57 PST
I'm nominating this for 1.6 because I think saving web pages is a fairly common
task and as CSS becomes more prominent, we're seeing more and more pages that
don't save completely. dbaron thinks this might be more 1.7alpha material but it
does cause one of our smoketests, B.27, (and I suspect an increasingly
representative smoketest) to partially fail. 
Comment 54 Hixie (not reading bugmail) 2003-11-19 06:48:42 PST
The way to fix this is to implement saving web pages the way that MacIE does it 
-- saving the files unchanged, annotated with their original URI, so that links 
between dependent resources still work, even if they are done via obscure ways 
like built from JavaScript (which we could never solve otherwise).
Comment 55 Asa Dotzler [:asa] 2003-11-25 12:40:59 PST
Jump ball. Who wants to try to tackle this? Glazman, can you take this?
Comment 56 José Jeria 2003-11-27 05:28:14 PST
*** Bug 226925 has been marked as a duplicate of this bug. ***
Comment 57 Asa Dotzler [:asa] 2003-12-01 15:16:39 PST
doesn't look like this is happening for 1.6
Comment 58 Sergey «Mithgol the Webmaster» Sokoloff 2004-03-11 21:50:33 PST
http://www.alistapart.com/articles/customunderlines/

Yet another good testcase for this bug, and for bug 126309.

The whole page is broken in Firebird 0.7, Firefox 0.8. (But I am not familiar
with Mozilla 1.6, so you'll have to test it there for yourselves. Sorry.)
Comment 59 Tom Chiverton 2004-03-12 01:27:21 PST
http://www.alistapart.com/articles/customunderlines/ WFM
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040207 Firefox/0.8
Comment 60 Sergey «Mithgol the Webmaster» Sokoloff 2004-03-13 03:01:11 PST
Tom Chiverton, does http://www.alistapart.com/articles/customunderlines/ work
for you even after being saved (File, Save page as, Web page, complete) and then
opened from local drive?
Comment 61 Tom Chiverton 2004-03-15 01:36:45 PST
Sergey- No, sorry, doesn't work after save.
My bad - I thought this bug was just about on-line pages, not saved ones too.

FWIW, it looks like the saved page hasn't had a style sheet applied, which I
guess fits with what seems to be wrong with Mozilla 'fixing' URLs when it saves
pages.

Is there any chance of this bug being assigned a non-future milestone, or at
least having severity increased - this bug is going to bite a lot more people as
other comments have indicated, as saving is fairly common.
Comment 62 Bill Mason 2004-06-07 09:42:51 PDT
*** Bug 245799 has been marked as a duplicate of this bug. ***
Comment 63 tomByrer 2004-06-07 10:41:45 PDT
(In reply to comment #62)
> *** Bug 245799 has been marked as a duplicate of this bug. ***
(sorry, searched but couldn't find this report
from my report:)

Reproducible: Always
Steps to Reproduce:
1. Go the example URL http://dromoscopio.net/
2. Menu:File/Save As.../Save As Type: Web Page, complete
3. look at saved file in a file explorer

Actual Results:  
none of the images are saved, eg
#uno div{
	background-image: url(uno.gif);
}

Expected Results:  
saved all images, even thoses declaired in CSS

default config

workaround:
1. hover mouse over desired CSS background image
2. right-mouseclick, View Background Image
3. right-mouseclick, Save Image As...

This is not not critical, but may render off-line viewing of saved webpages
impossible without editing the HTML/CSS, if the page is written with the font &
background colour the same.  (same issue when browsing some sites while the
browser is set w/images off: eg if the font color is white & the background
image is dark, but the background-color isn't set)
Comment 64 Boris Zbarsky [:bz] 2004-07-16 17:39:01 PDT
*** Bug 251815 has been marked as a duplicate of this bug. ***
Comment 65 José Jeria 2004-08-05 01:47:37 PDT
*** Bug 254306 has been marked as a duplicate of this bug. ***
Comment 66 Bill Mason 2004-08-16 23:28:54 PDT
*** Bug 255838 has been marked as a duplicate of this bug. ***
Comment 67 a_geek 2004-08-25 08:54:56 PDT
This problem will appear much more often in the future - see eg.
http://plone.org and any comments on how to be XHTML+CSS2 compliant and achieve
accessibility (AAA, S508 etc).

It breaks saving pages from plone.org right now.
Comment 68 a_geek 2004-08-26 08:47:57 PDT
A propaganda website that makes massive use of this techniques, and which is
advocating it with good arguments, is

  http://www.csszengarden.com/
Comment 69 Xavier Robin 2004-09-06 10:15:03 PDT
The problem will appear more and more often. For example now that mozilla.org
has a new style with a pretty background image : http://www.mozilla.org :(
Comment 70 Bill Mason 2004-11-10 01:07:24 PST
*** Bug 268810 has been marked as a duplicate of this bug. ***
Comment 71 Bill Mason 2004-12-05 00:50:03 PST
*** Bug 273218 has been marked as a duplicate of this bug. ***
Comment 72 Oliver Klee 2004-12-11 05:58:25 PST
*** Bug 274163 has been marked as a duplicate of this bug. ***
Comment 73 djpohly 2005-05-20 13:48:30 PDT
*** Bug 294976 has been marked as a duplicate of this bug. ***
Comment 74 Boris 'pi' Piwinger 2005-06-05 11:06:17 PDT
Long silence here. I am not 100% sure. Does this bug here include that files
called by include in style files are not included into save page, complete? If
so, the summary is totally missleading. Maybe someone who better understands the
bug can find a better wording.

pi
Comment 75 Peter van der Woude [:Peter6] 2005-06-26 08:53:22 PDT
*** Bug 298819 has been marked as a duplicate of this bug. ***
Comment 76 Zach Lipton [:zach] 2005-07-01 12:34:44 PDT
*** Bug 299394 has been marked as a duplicate of this bug. ***
Comment 77 Dave Townsend [:mossop] 2005-08-21 14:07:03 PDT
*** Bug 305438 has been marked as a duplicate of this bug. ***
Comment 78 Zook Valem 2005-08-23 19:22:29 PDT
*** Bug 305630 has been marked as a duplicate of this bug. ***
Comment 79 Uri Bernstein (Google) 2005-09-14 07:44:35 PDT
*** Bug 308489 has been marked as a duplicate of this bug. ***
Comment 80 Tom Chiverton 2005-09-14 09:17:28 PDT
(In reply to comment #74) 
> Long silence here. I am not 100% sure. Does this bug here include that files 
> called by include in style files are not included into save page, 
 
Yeah, that's spot on Boris. 
 
Comment 81 Mike Connor [:mconnor] 2005-11-14 12:07:01 PST
Not in the scope for Fx2, which is not going to include significant changes to Gecko.
Comment 82 Emin 2005-12-26 14:57:10 PST
Windows 2000, Firefox 1.5
For simplicity, I give an example. Save page http://www.mozilla.org/ to your local disk. Then open the saved page. You see that inscription "mozilla.org" and an image of the dinosaur at the top of the page disappeared.
Sometimes it significantly disfigures pages, especially forums.

Comment 83 Ria Klaassen (not reading all bugmail) 2006-01-09 07:23:24 PST
*** Bug 322817 has been marked as a duplicate of this bug. ***
Comment 84 Jesse Ruderman 2006-02-25 12:18:42 PST
*** Bug 328588 has been marked as a duplicate of this bug. ***
Comment 85 Ivo Goncalves 2006-02-25 12:24:57 PST
Oh my, this bug has been reported 4 years ago, and it wasn't fixed yet? How come is it so hard?
Comment 86 Boris Zbarsky [:bz] 2006-04-05 16:17:35 PDT
*** Bug 332899 has been marked as a duplicate of this bug. ***
Comment 87 Boris Zbarsky [:bz] 2006-08-10 15:49:03 PDT
This bug needs someone to step up if it's going to get fixed.  All the code involved is currently unowned...  I can promise quick patch reviews on this one.
Comment 88 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2006-10-15 14:21:01 PDT
So, for what it's worth, the way I think this ought to be fixed is roughly as follows:

To avoid having two separate bits of CSS serialization code, we should teach the style system to serialize with URI fixup.

For a start, this requires the nsWebBrowserPersist code to implement an interface a lot like nsIDocumentEncoderNodeFixup -- except for URI fixup.  In other words, the interface would take a URI as input (probably nsIURI) and return a URI as output (probably string), potentially enqueueing the saving of the URI in question as part of the implementation.

Then we could add a Serialize method to nsIStyleSheet that took one of these as an argument (but where the object being null would mean no URI fixup was needed).  Furthermore, we would then add Serialize methods to everything that has a GetCssText method by renaming GetCssText and adding the argument to this new interface -- and then adding a GetCssText that called Serilaize(null) -- where the null object for this new interface would mean to do no URI fixup.  This object would further have to be passed down through nsCSSDeclaration::ToString and the methods it calls -- where, again, null would mean that no fixup was needed.

I think this is highly preferable to reimplementing CSS parsing and serialization.

Boris, does this seem reasonable?
Comment 89 Nickolay_Ponomarev 2006-10-15 14:27:49 PDT
(I plan to work on this, although not in the next few days.)
Comment 90 Boris Zbarsky [:bz] 2006-10-16 10:58:38 PDT
Yeah, the plan in comment 88 sounds pretty good.

In fact, it makes me wish that our content serialization worked more like that... Then fixup would work for, say, SVG too...
Comment 91 Erik Fabert 2006-12-16 02:53:55 PST
*** Bug 364036 has been marked as a duplicate of this bug. ***
Comment 92 Marcel Dejean 2007-02-04 11:11:50 PST
*** Bug 369282 has been marked as a duplicate of this bug. ***
Comment 93 Dave Townsend [:mossop] 2007-02-23 13:10:42 PST
*** Bug 371419 has been marked as a duplicate of this bug. ***
Comment 94 Jean-Marc M. 2007-04-13 02:26:41 PDT
As a temp solution you might be interested to try the Save Complete [1] extension. It supports saving images referenced in CSS files. We also discuss the problem in the « "File -> Save Page As" does not save images fro » [2] MozillaZine Forums topic.

Notes :
* [1] <https://addons.mozilla.org/en-US/firefox/addon/2925>
* [2] <http://forums.mozillazine.org/viewtopic.php?t=538458>
Comment 95 Sylvain Pasche 2007-08-02 12:05:14 PDT
*** Bug 376597 has been marked as a duplicate of this bug. ***
Comment 96 Kevin Brosnan 2007-08-21 10:25:31 PDT
*** Bug 393054 has been marked as a duplicate of this bug. ***
Comment 97 Ben Karel [eschew] 2007-08-31 11:42:23 PDT
Created attachment 279129 [details] [diff] [review]
preliminary, incomplete implementation

So I've been working on dbaron's suggestion for a week or two now. I'm leaving for the rest of the weekend in half an hour to run my first half marathon (eep!) so I figured I'd post what I have so far for preliminary review.

I've tried to avoid gratuitous refactoring of nsWebBrowserPersist, opting for small changes/hacks around an architecture not really aligned with the sequence of operations CSS serialization requires.

Right now, the biggest thing left to implement feature-wise is to make CSS and other content-serialized, not-downloaded files get the right file extension. Currently, wbp fixes up persisted files with nsIChannel-derived mime types; since we're not downloading the CSS files we persist from a remote server, that's not going to work. I haven't spent more than a few minutes looking at how much I have to work with, so I'm not sure of the best way to go about doing that. For some things it's easy (nsICSSStyleSheet should be saved as .css) but for things like background-image: url(blah), I'm not sure whether CSS knows what the mime type of blah is. Can't just take the extension from the URI, since that fails for dynamically-generated pages. For images I guess we do content-sniffing anyways, so we could save it as whatever we wanted, but still. 

Anyways, I'll be back Monday and can hopefully post a more polished/complete patch later in the week. This patch deserves at least two r-'s in its current state. At least there's enough code there to make comments on.
Comment 98 Ben Karel [eschew] 2007-08-31 11:48:36 PDT
Oh, one other thing: we technically don't need to persist @import-ed stylesheets linked to from style elements as separate files, their child rules can be recursively serialized along with the main page stylesheet. Right now the code does both, I think, because it's in flux. Since that's the first thing in the diff I just wanted to comment on it.
Comment 99 Boris Zbarsky [:bz] 2007-08-31 11:54:25 PDT
I doubt I'll be able to get to this until after I get back in mid-to-late Sept.

> their child rules can be recursively serialized along with the main page
> stylesheet

Not if they contain @charset, @namespace, etc rules.
Comment 100 Jesse Ruderman 2007-09-13 10:31:37 PDT
*** Bug 396042 has been marked as a duplicate of this bug. ***
Comment 101 Carsten Book [:Tomcat] 2007-09-15 09:05:30 PDT
*** Bug 396276 has been marked as a duplicate of this bug. ***
Comment 102 Boris Zbarsky [:bz] 2007-09-26 00:05:27 PDT
Comment on attachment 279129 [details] [diff] [review]
preliminary, incomplete implementation

>Index: layout/style/nsCSSStyleSheet.cpp
>+    aContent.Append(NS_LITERAL_STRING("/* Rules from linked stylesheet, original URL: */\n"));

As I said, this is no good if the linked stylesheets have @namespace rules (or @charset or @import, but skipping those won't necessarily break things, while skipping @namespace most definitely will).  Please make sure to write tests for this case.

What you probably want to do instead is to serialize the rules, and have @import serialization start the serialization of the imported sheet.  I believe that's what dbaron suggested too.

>+  for (i = 0; i < styleRules; ++i) {
>+    rv = GetStyleRuleAt(i, *getter_AddRefs(rule));

Declare |rule| here, not up at the beginning somewhere?

>Index: layout/style/nsHTMLCSSStyleSheet.cpp
>+  NS_IMETHOD Serialize(nsAString& aContent, nsIDocumentEncoderFixup *aFixup);

Why not just put this method on nsICSSStyleSheet, since those are all you serialize?

>Index: layout/style/nsICSSRule.h

And probably put this on nsICSSStyleRule.

I don't see any rule implementations of Serialize() in this patch.  Some of the rules will require fixup too, of course (@document rules come to mind).

Of course such rules in user sheets won't work right once saving has happened, but getting that to work is food for another bug, I think.

>Index: embedding/components/webbrowserpersist/src/nsWebBrowserPersist.cpp
>+#include "nsIFormControl.h"
>+#include "nsIDOM3Node.h"

This looks like part of another patch, right?

>+        // Leave mCurrentDataPath alone so that CSS fixup knows where to save
>+        //mCurrentDataPath = oldDataPath;

This is pretty questionable.  I'd have to dig to make sure, but I would be it's wrong.

>@@ -2504,17 +2522,37 @@ nsWebBrowserPersist::EnumPersistURIs(nsH
>+        rv = cos->Init(outputStream, nsnull, 0, 0);
>+        NS_ENSURE_SUCCESS(rv, rv);

This is wrong if the sheet has an @charset rule. Unless you plan to strip those, in which case it's wrong if the main document is not UTF-8.  Again, please add tests.

>+        PRBool wroteFullString = PR_FALSE;
>+        rv = cos->WriteString(data->mContents, &wroteFullString);

Why is wroteFullString being ignored?  Perhaps this should be called something else?

>+        rv = cos->Flush();
>+        rv = outputStream->Flush();
>+        rv = cos->Close();

Why bother assigning to rv if you plan to ignore it?

>+                    nodeAsLink->GetType(type);
>+                    if (type.EqualsLiteral("text/css")) {

This is wrong.  |type| could be an empty string for a CSS style sheet.  Further, if type is "text/css" there ould be no DOMStyleSheet attached to the node.  You probably want to just GetSheet() and then null-check it.  Again, add tests.

>+                        nsAutoString content;
>+                        ss->Serialize(content, mFixup);
...
>+                        data->mContents.Assign(content);

Why not get the |data| first and just pass data->mContents to Serialize()?

I haven't reviewed the URI-munging details.

CSS fixup should probably also be applied to "style" attributes.  Might be a separate bug.

To answer your quesions in the bug:

> I've tried to avoid gratuitous refactoring of nsWebBrowserPersist

If doing said refactoring (in a separate patch prior to implementing this) would make things better, please go for it!  Unless you think you want to get this into 1.9 and that would make it harder; in that case please file a followup on the refactoring.

> for things like background-image: url(blah), I'm not sure whether CSS knows
> what the mime type of blah is.

For images in particular, it does if they were used and we've gotten far enough in downloading them to know the type.  More precisely, given an nsCSSValue::Image you can get its imgIRequest and get the type from that.

But really, if we're persisting those images (which we should be, right?), we'll know the type because those we _do_ get via an nsIChannel.
Comment 103 Ben Karel [eschew] 2007-09-26 13:52:27 PDT
Created attachment 282460 [details] [diff] [review]
mostly complete patch (actually not nearly complete)

Updated patch attached. It correctly serializes CSS images. I actually had most of this work done before your review; I should have uploaded what I had earlier, since this resulted in some amount of wasted effort for you, bz. Mea culpa.

I think I removed all the changes related to bug 293834 in this patch.

(In reply to comment #102)
> (From update of attachment 279129 [details] [diff] [review])
> >Index: layout/style/nsCSSStyleSheet.cpp
> >+    aContent.Append(NS_LITERAL_STRING("/* Rules from linked stylesheet, original URL: */\n"));
> 
> As I said, this is no good if the linked stylesheets have @namespace rules (or
> @charset or @import, but skipping those won't necessarily break things, while
> skipping @namespace most definitely will).  Please make sure to write tests for
> this case.
> 
> What you probably want to do instead is to serialize the rules, and have
> @import serialization start the serialization of the imported sheet.  I believe
> that's what dbaron suggested too.
> 

I believe the code should now do this, but haven't written tests yet.


> >+  for (i = 0; i < styleRules; ++i) {
> >+    rv = GetStyleRuleAt(i, *getter_AddRefs(rule));
> 
> Declare |rule| here, not up at the beginning somewhere?
done

> 
> >Index: layout/style/nsHTMLCSSStyleSheet.cpp
> >+  NS_IMETHOD Serialize(nsAString& aContent, nsIDocumentEncoderFixup *aFixup);
> 
> Why not just put this method on nsICSSStyleSheet, since those are all you
> serialize?
> 
> >Index: layout/style/nsICSSRule.h
> 
> And probably put this on nsICSSStyleRule.
> 
Done, though I had Serialize on nsICSSRule and not nsICSSStyleRule because the non-style rules in nsCSSRules.h require Serialize too. But it's not much different either way, I guess.

> I don't see any rule implementations of Serialize() in this patch.
Added.

> Some of the rules will require fixup too, of course
> (@document rules come to mind).
> 
> Of course such rules in user sheets won't work right once saving has happened,
> but getting that to work is food for another bug, I think.
Haven't tested/accounted for this yet.

> >Index: embedding/components/webbrowserpersist/src/nsWebBrowserPersist.cpp
> >+#include "nsIFormControl.h"
> >+#include "nsIDOM3Node.h"
> 
> This looks like part of another patch, right?
> 

Yes, they're from bug 293834, which this bug depends on to correctly persist serialized style elements.

> >+        // Leave mCurrentDataPath alone so that CSS fixup knows where to save
> >+        //mCurrentDataPath = oldDataPath;
> 
> This is pretty questionable.  I'd have to dig to make sure, but I would be it's
> wrong.
Yeah, that was leftover code from experimentation that I neglected to delete. Removed.

> 
> >@@ -2504,17 +2522,37 @@ nsWebBrowserPersist::EnumPersistURIs(nsH
> >+        rv = cos->Init(outputStream, nsnull, 0, 0);
> >+        NS_ENSURE_SUCCESS(rv, rv);
> 
> This is wrong if the sheet has an @charset rule. Unless you plan to strip
> those, in which case it's wrong if the main document is not UTF-8.  Again,
> please add tests.

What would be the correct way of persisting @charset rules? Extract the @charset and write the stream with that encoding?

> 
> >+        PRBool wroteFullString = PR_FALSE;
> >+        rv = cos->WriteString(data->mContents, &wroteFullString);
> 
> Why is wroteFullString being ignored?  Perhaps this should be called something
> else?
Er, because WriteString requires a boolean out param, and I couldn't think of what we might do if the full string wasn't written?

> 
> >+        rv = cos->Flush();
> >+        rv = outputStream->Flush();
> >+        rv = cos->Close();
> 
> Why bother assigning to rv if you plan to ignore it?
rv's removed.

> 
> >+                    nodeAsLink->GetType(type);
> >+                    if (type.EqualsLiteral("text/css")) {
> 
> This is wrong.  |type| could be an empty string for a CSS style sheet. 
> Further, if type is "text/css" there ould be no DOMStyleSheet attached to the
> node.  You probably want to just GetSheet() and then null-check it.  Again, add
> tests.
Code fixed, tests later.

> 
> >+                        nsAutoString content;
> >+                        ss->Serialize(content, mFixup);
> ...
> >+                        data->mContents.Assign(content);
> 
> Why not get the |data| first and just pass data->mContents to Serialize()?

done

> 
> I haven't reviewed the URI-munging details.
> 
> CSS fixup should probably also be applied to "style" attributes.  Might be a
> separate bug.

Yeah, I think leaving that as a separate bug will be better than squeezing it in this one.

> 
> To answer your quesions in the bug:
> 
> > I've tried to avoid gratuitous refactoring of nsWebBrowserPersist
> 
> If doing said refactoring (in a separate patch prior to implementing this)
> would make things better, please go for it!  Unless you think you want to get
> this into 1.9 and that would make it harder; in that case please file a
> followup on the refactoring.
> 


> For images in particular, it does if they were used and we've gotten far enough
> in downloading them to know the type.  More precisely, given an
> nsCSSValue::Image you can get its imgIRequest and get the type from that.
> 
> But really, if we're persisting those images (which we should be, right?),
> we'll know the type because those we _do_ get via an nsIChannel.
> 

Yes, but the problem is that, due to the way webbrowserpersist is structured, we need to know the type of what we're going to download before we download it. The "normal" flow for webbrowserpersist is this:

SaveDocument
  SaveDocumentInternal
    OnWalkDOMNode results in calls to StoreURI
  SaveGatheredURIs
    EnumPersistURIs results in fixed-up filenames based on download channel MIME type
    SaveDocuments results in calls to FixupURI, which maps original URIs to fixed-up filenames.
      
In essence, it boils down to webbrowserpersist expecting to do two passes, and to download files before determining the filename to replace the URI with, whereas dbaron's writeup is a one-pass system, where a the URI is immediately replaced by a filename. It works if you do a little munging with MIME types, it's just not pretty (thus the musing about refactoring).
Comment 104 Boris Zbarsky [:bz] 2007-09-26 20:44:15 PDT
> Done, though I had Serialize on nsICSSRule

Yeah, that's what I meant.

> What would be the correct way of persisting @charset rules? Extract the
> @charset and write the stream with that encoding?

Yes.  For perfect fidelity, we'd have sheets know what charset they were parsed as, and serialize as that charset, modifying or inserting @charset rules as needed.  That's what we do for documents...

> Er, because WriteString requires a boolean out param

How about calling the stack boolean "ignored" then?  Or warning if it's false?  Or something?

> In essence, it boils down to webbrowserpersist expecting to do two passes, and
> to download files before determining the filename to replace the URI with,

Yes, that's the right way to go about it.

I doubt I'll be able to review this any time soon.  I definitely won't spend time reviewing until we have tests that this passes, but even then I really doubt that I'll be able to do it within a reasonable timeframe (measured in months).
Comment 105 Ben Karel [eschew] 2007-09-26 21:03:36 PDT
>> In essence, it boils down to webbrowserpersist expecting to do two passes, and
>> to download files before determining the filename to replace the URI with,
>
> Yes, that's the right way to go about it.

Do you mean to say that you think that when CSS is being fixed up, FixupURI should block and not return until the original URI has been downloaded? Wouldn't forcing all the downloads to be made in serial like that have an adverse impact on performance?

Also, by "tests," do you mean automated tests, or just manually-verifiable testcases? I have no idea how to automate testing of webbrowserpersist...
Comment 106 Nickolay_Ponomarev 2007-09-27 03:38:51 PDT
Re: testing. I had a patch in bug 366072 to make use of reftest for WBP testing, but it bit-rotted since then and I didn't get to update it.
Comment 107 Boris Zbarsky [:bz] 2007-09-30 21:08:35 PDT
> Do you mean to say that you think that when CSS is being fixed up, FixupURI
> should block and not return until the original URI has been downloaded?

How do we handle <img> right now?  We should handle CSS the same way, basically.

> Also, by "tests," do you mean automated tests

Yes, ideally automated tests.  But even manually-verifiable ones would be a start....

Comment 108 Boris Zbarsky [:bz] 2007-10-14 13:01:07 PDT
Comment on attachment 282460 [details] [diff] [review]
mostly complete patch (actually not nearly complete)

Per comment 104, someone else should review this...
Comment 109 Ginn Chen 2007-11-20 05:17:42 PST
*** Bug 358709 has been marked as a duplicate of this bug. ***
Comment 110 Jeremy Baron 2007-12-20 06:42:28 PST
*** Bug 409200 has been marked as a duplicate of this bug. ***
Comment 111 sweb 2008-01-15 22:33:59 PST
just test the wikipedia.org, the master css file that include another css file.

@import url('./anothercss.css');
@import url('./anothercss-two.css');

their's also must be save with the images and another css included.
Comment 112 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2008-01-15 23:05:23 PST
Comment on attachment 282460 [details] [diff] [review]
mostly complete patch (actually not nearly complete)

On the principle that it's better off in somebody's review queue than nobody's, adding this to my review queue, although I also probably won't get to it for a bit.
Comment 113 :Gavin Sharp [email: gavin@gavinsharp.com] 2008-03-09 00:04:00 PST
*** Bug 421711 has been marked as a duplicate of this bug. ***
Comment 114 Stephen Donner [:stephend] 2008-03-09 01:34:19 PST
Just FYI; added this bug # to https://litmus.mozilla.org/show_test.cgi?id=3952 for easier reference.
Comment 115 Viktoriia Bogdanova 2008-03-17 03:29:08 PDT
Defect is still reproducible on version 3.0b5pre. Firefox doesn't save background image in css style when using save page as, web page complete.
Comment 116 Viktoriia Bogdanova 2008-03-17 08:33:30 PDT
Defect is reproducible on version 3.0b5pre (XP, Vista, Mac OS X 10.4). Firefox doesn't save background image in css style when using save page as, web page complete.
Comment 117 Brian Polidoro 2008-05-19 08:27:05 PDT
*** Bug 434480 has been marked as a duplicate of this bug. ***
Comment 118 Danny 2008-07-29 04:56:34 PDT
Bug is still there on the latest nightly (Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.1a2pre) Gecko/2008072803 Minefield/3.1a2pre).

BTW, bug 293834 was recently fixed. Will it help to fix that bug? All the blockers are gone now.
Comment 119 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2008-08-08 17:40:15 PDT
Adding qawanted keyword to reflect the request for automated tests for this.
Comment 120 Phil Ringnalda (:philor, back in August) 2008-08-18 15:44:59 PDT
*** Bug 451109 has been marked as a duplicate of this bug. ***
Comment 121 Brian Polidoro 2008-12-04 07:17:55 PST
*** Bug 467906 has been marked as a duplicate of this bug. ***
Comment 122 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2008-12-26 12:52:54 PST
So I've been thinking about this a little bit, and I'm wondering how tolerable the loss of accuracy in the CSS is.  In particular, we lose properties we don't support, conditional comment hacks, etc.

An alternative approach might involve re-tokenizing the CSS but not parsing it, and then fixing up all the url() functions.  That's not trivial either, though, given that we have some context-sensitive tokenization, especially for URLs.

I guess this approach probably is ok, though.  But in the long run I think "Save Page As, Complete" is broken-by-design, and if we want to save complete Web pages we ought to support saving them into some package format.
Comment 123 Alessandro Curci 2008-12-29 01:21:48 PST
A cross-browser package format standard would be more than welcome, but in the meantime what about integrating scrapbook's (https://addons.mozilla.org/en-US/firefox/addon/427) excellent abilities to save complete web pages into firefox ?
Comment 124 Tracy Walker [:tracy] 2009-01-21 08:11:12 PST
*** Bug 474424 has been marked as a duplicate of this bug. ***
Comment 125 Henrik Skupin (:whimboo) 2009-07-31 15:16:34 PDT
Will we have any progress on this bug in the near future?
Comment 126 Tim Riley [:timr] 2009-07-31 15:37:13 PDT
Created attachment 391979 [details]
Firefox start page before the Save Page As....
Comment 127 Tim Riley [:timr] 2009-07-31 15:40:05 PDT
My understanding is that this affects saving our current home page

http://www.google.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official

I attached the before and after shots for the "Save Page..." as web page
Complete.

This is on Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13)
Gecko/2009073022 Firefox/3.0.13

STR:
1) Install firefox
2) See start page
3) Select File -> Save Page As...
4) Make sure "save as type:" is "Web Page, complete"
5) save it to  some convenient place like your desktop
6) open the file on your desktop, called firefox by default.
Comment 128 Tim Riley [:timr] 2009-07-31 15:42:07 PDT
Created attachment 391980 [details]
Saved firefox page after "Save Page As..." - missing elements
Comment 129 Ben Karel [eschew] 2009-07-31 17:18:38 PDT
I won't have time to work on this in the near future, so reverting to default owner.
Comment 130 Brian Polidoro 2009-08-05 14:03:03 PDT
*** Bug 508591 has been marked as a duplicate of this bug. ***
Comment 131 Robert Kaiser 2009-08-06 18:00:34 PDT
*** Bug 349114 has been marked as a duplicate of this bug. ***
Comment 132 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2009-08-19 16:23:31 PDT
Created attachment 395446 [details] [diff] [review]
mostly complete patch, mostly merged to trunk

This is mostly merged to trunk (I haven't tried compiling yet, though).  The aSerializedCloneKids change appears to have landed already.  I merged the URI serialization code in nsCSSDeclaration.  But I still need to add a replacement for the code that was patching TryBackgroundShorthand in the old patch.
Comment 133 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2009-08-19 17:07:11 PDT
So now that I'm attempting to compile this, I'm having trouble figuring out how *any* of the patches in this bug ever compiled (well, I can see how the first patch could compile... but it wouldn't link).  They both have code in nsCSSStyleSheet.cpp that calls a Serialize method on either nsICSSRule or nsICSSStyleRule.  In the first patch it was declared pure virtual on nsICSSRule but never implemented; in the second patch it's neither declared nor implemented.

And it makes far more sense to me for that method to be on nsICSSRule.  The current serialization code simply omits things like @media rules, @namespace rules, etc.


So I'd thought this patch was really ready for review, but it seems like it has some pretty major gaps in it.

Or were there changes you had in your tree that you just didn't include in your diff?
Comment 134 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2009-08-19 17:10:26 PDT
Also, the patch renames nsEncoderNodeFixup to nsEncoderFixup in its implementation, but doesn't touch its declaration... which also clearly isn't going to compile, unless, again, many entire files are missing from the patch.
Comment 135 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2009-08-19 18:04:32 PDT
Created attachment 395474 [details] [diff] [review]
more merged to trunk

I redid a bunch of the guts of the CSS changes to be much more thorough, and thus hit all CSS properties with URLs in them.  (Though it still doesn't hit downloadable fonts, but could once the issue below is fixed.)

I also filled in most of the missing pieces of code.  However, the CSS rule serialize implementations are just "FIXME: WRITE ME".
Comment 136 Jo Hermans 2009-09-27 03:41:59 PDT
*** Bug 519084 has been marked as a duplicate of this bug. ***
Comment 137 Phil Ringnalda (:philor, back in August) 2009-11-05 12:49:44 PST
*** Bug 526823 has been marked as a duplicate of this bug. ***
Comment 138 Matthias Versen [:Matti] 2009-11-24 08:38:13 PST
*** Bug 530801 has been marked as a duplicate of this bug. ***
Comment 139 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2010-03-02 11:52:04 PST
Comment on attachment 282460 [details] [diff] [review]
mostly complete patch (actually not nearly complete)

Marking review- because this wasn't actually anywhere near complete.  (And see also other comments above.)
Comment 140 Kevin Brosnan 2010-09-01 06:14:33 PDT
*** Bug 592642 has been marked as a duplicate of this bug. ***
Comment 141 Cork 2011-05-23 22:49:20 PDT
*** Bug 659230 has been marked as a duplicate of this bug. ***
Comment 142 Simona B [:simonab] 2011-07-05 06:22:38 PDT
Issue is still reproducible on the latest nightly:
Mozilla/5.0 (Windows NT 5.1; rv:7.0a1) Gecko/20110704 Firefox/7.0a1
Comment 143 Phoenix 2011-10-29 12:30:52 PDT
Confirming on latest Seamonkey build
Build identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0a1) Gecko/20111017 Firefox/10.0a1 SeaMonkey/2.7a1
Comment 144 Lain_13 2011-10-30 11:52:42 PDT
Additional 1.5 months and we will be able to celebrate 10th birthday of this bug. ^___^
Comment 145 Vladimir 2011-11-11 10:19:04 PST
Devs don't save pages at all as I see. 
They don't pay attention to such important bugs as this bug and Bug 653522 - saving pages when offline (or using adblock) inundates the user with error dialogs:

https://bugzilla.mozilla.org/show_bug.cgi?id=653522
Comment 146 Anonymous 2012-07-28 20:00:01 PDT
Why are major bugs in core functionality like this not fixed when instead we get garbage like microsummaries, geolocation, and DNS prefetching?
Comment 147 a_geek 2012-08-03 16:53:08 PDT
Wrt. comment#146: It's likely the problem that you and I don't pay the devs to do so, and the people who do, see more value in a read-only web and user tracking. It's also usually much easier to make something new than fix old bugs.
Comment 148 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2012-08-03 17:13:17 PDT
The problem is really that there's no good approach here with the current "Save As, Complete" model.  If we fix this, we'll break the CSS working in other browsers.

A better solution would be replacing Save As, Complete (which attempts to rewrite the pages to fix up URLs) with a cross-browser package format for saved documents (which could save resources as they were).
Comment 149 Anonymous 2012-08-04 09:52:05 PDT
(In reply to David Baron [:dbaron] from comment #148)
> If we fix this, we'll break the CSS working in other browsers.
Only for hacks and browser-specific properties. This is not a big deal. The same problem exists anyway for things like IE's conditional comments, or any kind of user-agent selectivity on the server, or indeed any other type of hack or browser-specific feature.

But obviously it's SO much better with the current situation that it works in no browsers at all!

Are you seriously more worried about browser-specific CSS features, which are probably expected to degrade gracefully anyway, than, oh I don't know, background-image!?

> A better solution would be replacing Save As, Complete (which attempts to
> rewrite the pages to fix up URLs) with a cross-browser package format for
> saved documents (which could save resources as they were).
And you'd STILL need to rewrite the URLs. Come on, give it half a thimble of thought.
Comment 150 neil@parkwaycc.co.uk 2012-08-04 09:58:41 PDT
(In reply to Anonymous from comment #149)
> (In reply to David Baron from comment #148)
> > A better solution would be replacing Save As, Complete (which attempts to
> > rewrite the pages to fix up URLs) with a cross-browser package format for
> > saved documents (which could save resources as they were).
> And you'd STILL need to rewrite the URLs.
Let's suppose for the sake of argument that we used IE's .mht format. It doesn't rewrite the URLs. Instead, each entry in the file is associated with its original URL. (What I don't know is whether it would be possible to achieve this scheme in Gecko.)
Comment 151 Anonymous 2012-08-04 11:46:20 PDT
(In reply to neil@parkwaycc.co.uk from comment #150)
>Instead, each entry in the file is associated with its original URL.
Oh I see. Fair enough. But unless anyone thinks that the feature to save a web page as individually manipulable files should be completely removed, the bug is still not avoidable, and the URLs still need rewriting.
Comment 152 Daniel Glazman (:glazou) 2012-08-04 12:17:14 PDT
Just for the record, I do have code in BlueGriffon achieving a full rewrite
of all stylesheets attached to a given document, tweaking all URIs.
I needed it to be able to turn an arbitrary document into a reusable
well-packaged template. That code is based on my parser/serializer JSCSSP.
In other terms, a full rewrite is doable but requires considerable complexity
and footprint...
Comment 153 Vladimir 2012-08-04 12:52:04 PDT
(In reply to David Baron [:dbaron] from comment #148)
> A better solution would be replacing Save As, Complete (which attempts to
> rewrite the pages to fix up URLs) with a cross-browser package format for
> saved documents

Modern pages have very big js files - 500kb or 1mb per page. I delete js files from folders of saved pages sometimes. And a lot of reasons to open folders of pages. And I will never use the browser with only archive format for save as.

Please don't add reasons not to use Gecko browsers. (Now I use SeaMonkey 1.1.19 as main browser only because of regression in save as function, which was developed by kind developers)) Bug 653522)
Comment 154 Emin 2012-08-06 02:01:29 PDT
(In reply to David Baron [:dbaron] from comment #148)
> A better solution would be replacing Save As, Complete (which attempts to
> rewrite the pages to fix up URLs) with a cross-browser package format for
> saved documents (which could save resources as they were).

I use "Save As, Complete" to save pages complete to examine a source code. So I need a working local copy of the page, not just a package.
Comment 155 dotnetCarpenter 2012-08-06 06:26:28 PDT
I agree with Emin, save the source as seen in view-source and rewrite urls to resources to a local folder next to the saved page. multiple request are definitely allowed. no reason to get any performance on this task as it is user invoked and the expectation is that the current document is saved as-is for various reasons, for later use. Not as firefox reads the document, nor as quickly as possible or to consume least space as possible - there is save document (not complete) for that.
Comment 156 Lain_13 2012-08-06 08:04:09 PDT
Actually it's not enough. Complete save must save page as it is visible with all elements inserted by ajax (like comments) and if some elements are missing then they should remain missing (like advertisements with active adblock). Even all additional user CSS applied to the elements on the page must be preserved as-is.
Otherwise it is not complete.
I mean it have to crawl DOM and generate page from it instead of saving page's source with rewritten links.
I do agree there is no reason for fast performance but when you open such page it must resemble page which you have saved as much as possible. It could not be considered as complete if it will download anything from the web on open. Also I don't think preserving interactivity is expected behavior in this case. Users usually saves page to be able to read it later or send someone to read. So, it must be preserved as he seen it when decided to save it.

For investigation purposes there is separate applications like extension "DownThemAll!". You can configure how exactly it will save things and how deep will it go. It does even more then Emin requests. It not only makes local copy of the page - it could make local copy of the whole site or decent part of it.
Comment 157 dotnetCarpenter 2012-08-06 16:42:58 PDT
@Lain_13 Do you mean something like PhantomJS for Gecko?
http://stackoverflow.com/questions/5490438/phantomjs-and-getting-modified-dom
Comment 158 Lain_13 2012-08-06 17:13:57 PDT
Not exactly but kind of. In your example that code generates a screenshot of the site while complete copy of the page have to remain as text and linked objects to let you scale it, select, copy and everything else. The only thing which shouldn't work is interactivity. I mean scripts must not work.
BTW, Scrapbook does such a thing already. I'd just like to see ability to make complete stand-alone static copy of the page in the Fx itself.
Comment 159 dotnetCarpenter 2012-08-08 06:44:52 PDT
Well, the example also does that. As does this: http://code.google.com/p/phantomjs/wiki/QuickStart#DOM_Manipulation but it might not be so clear.
This is the interesting bit:
var page = require('webpage').create(),
    url = 'http://lite.yelp.com/search?find_desc=pizza&find_loc=94040&find_submit=Search';

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        var html = page.evaluate(function() {
            return document.innerHTML; // <- get all generated and styled HTML
        });
        console.log(html);
    }
    phantom.exit();
});
There is a similar project for Gecko called offscreen but I can't see how it's possible to get the generated HTML with this though.
Project Page: http://offscreengecko.sourceforge.net/
Source code: http://hg.mozilla.org/incubator/offscreen/
Comment 160 Andrei Boros 2012-08-09 04:12:59 PDT
(In reply to David Baron [:dbaron] from comment #148)
> The problem is really that there's no good approach here with the current
> "Save As, Complete" model.  If we fix this, we'll break the CSS working in
> other browsers.
> 
> A better solution would be replacing Save As, Complete (which attempts to
> rewrite the pages to fix up URLs) with a cross-browser package format for
> saved documents (which could save resources as they were).

Consider how some browsers are closed source and written to the interests of their owners and how they implement either CSS or JS compared to the standards ... duh

On the other hand, I would strongly vote against such a method, one reason being the examination of various resource files related to a page. 

I use an extension called Web Developer Toolbar. This extension has a neat feature called "view generated source". This contains all content generate/modified by JS as displayed. Based on this one could easily reference each resource file (image/css/etc) for saving. 

Also, saving locally a webpage, in principle, does not imply that the entire webpage functionality will be preserved for the simple reason that some features require online interractivity with the source server, and this is impossible to replicate in a local save and offline viewing scenario.
Comment 161 Matthias Versen [:Matti] 2013-03-18 10:40:02 PDT
*** Bug 852007 has been marked as a duplicate of this bug. ***
Comment 162 Dražen Lučanin 2013-04-16 15:01:48 PDT
It would be nice to have that feature back. I also like saving pages for offline inspection later ("coding on the lake" and that sort of thing :).
Comment 163 Dragoș Valentin, Rădulescu 2013-04-16 15:23:46 PDT
This bug report/ticket exists for eleventh years. 
Is there really no one interested in fixing this bug.......?
Comment 164 Michal Čaplygin [:myf] 2013-05-03 07:46:01 PDT
Some remarks regarding creation of 'faithful offline snapshot of displayed page' from browser:
- Aforementioned "Save complete" addon [0] seems no longer maintained.
- There is "Mozilla Archive Format" addon [1][2] which promotes 'Faithful save system' or 'Exact snapshot' feature, seems to work quite well. Maybe worth checking.
- (Parity) Opera browser is able to successfully save page with imported resources as well.

[0] https://addons.mozilla.org/en-US/firefox/addon/save-complete-4723/
[1] http://maf.mozdev.org/ 
[2] https://addons.mozilla.org/en-US/firefox/addon/mozilla-archive-format/
Comment 165 Mike Kaply [:mkaply] 2013-06-28 22:05:19 PDT
I apologize for the "me too" voice here.

dbaron, Are you saying that this bug is simply not fixable?

> If we fix this, we'll break the CSS working in other browsers.

But right now the CSS doesn't work in any browser. What would be the harm in just making it work in Firefox?
Comment 166 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2013-06-28 22:22:12 PDT
Well, it could break things that aren't broken today.  Though now that prefixes are becoming less common that would be less of an issue, so I'm less worried about it than I was a few years ago.

Alternatively, we could do tokenization-level fixup, which actually might not be that bad, except it's a litte tricky for @import which takes strings in addition to url()s.
Comment 167 dotnetCarpenter 2013-06-29 09:22:34 PDT
I feel inclined to develop a Firefox Add-On for this. To me, this seems overly simple.
1. Just take the download tree (e.g. as seen in Firebug ect.) and redownload it all.
2. Save all files to a folder as is (which is, if I remember correct, the current behavior).
3. Parse all css/js files and replace URI's to local path.
4. Have heaps of fun with script loaders (AMD et al.) making NO. 3 difficult.

There you have it. The page, saved in it's entirety.
Comment 168 Mike Kaply [:mkaply] 2013-07-17 14:20:39 PDT
Actually, the MAF add-on (https://addons.mozilla.org/en-us/firefox/addon/mozilla-archive-format/) does appear to do exactly that.

If you have it installed, Firefox Save complete page works much better.
Comment 169 dotnetCarpenter 2013-11-21 12:59:46 PST
@mkaply Yep, "Faithful to the original" is all I need.
Thanks
Comment 170 Ioana (away) 2013-12-13 00:02:27 PST
Removing "qawanted" since the need for an automated test is marked by "in-testsuite?".
Comment 171 Witek 2015-02-02 14:49:19 PST
Hello I'd like to be assigned to this bug please assign me.
Comment 172 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2015-02-02 15:07:20 PST
Roughly what steps are you planning to take to fix this?
Comment 173 Witek 2015-02-02 15:12:23 PST
[Blocking Requested - why for this release]:
Comment 174 Witek 2015-02-02 16:04:06 PST
I read the whole thread, and since I'm new to the project i try implement solution proposed by the dotnetCarpenter. What do you think David?
Comment 175 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2015-02-02 16:26:24 PST
Do you mean comment 167?  That sounds like a proposal to rewrite the entire "Save As, Complete" feature, of which this bug (fixing up URIs in the CSS) would be just a small part.  But it doesn't tell me anything about what you plan to do for the part covered by this bug:  fixing up URIs in the CSS.
Comment 176 WBT 2015-04-29 13:33:08 PDT
Created attachment 8599503 [details]
Bugzilla main page test case- original
Comment 177 WBT 2015-04-29 13:36:03 PDT
Created attachment 8599504 [details]
Bugzilla main page test case- after save

This 13-year-old bug in basic functionality is surprising - why is this still broken?  
I've attached a very local test demonstration from the Bugzilla main page, to help illustrate its status today.
Comment 178 Zéfling 2015-04-29 15:19:45 PDT
UnMHT is on GPL licence ( https://addons.mozilla.org/fr/firefox/addon/unmht/license/7.3.0.5 ), is it not possible to use a part of code ? UnMHT is for me a great tool for save a complete page : HTML / CSS / JS / media
Comment 179 David Bourguignon 2015-10-13 16:20:49 PDT
Hi all, here are my two cents... I hope it helps!

To sum up:
- I tried to use the "save as, complete web page" with https://slate.adobe.com/a/NzR3A/ and as a result background images and other CSS effects were missing.
- I tried to save this web page with https://addons.mozilla.org/en-us/firefox/addon/mozilla-archive-format/ in either MAF or MHT formats, and it failed (only the first background image and text were displayed).
- I tried to save this web page with https://addons.mozilla.org/fr/firefox/addon/unmht/ in MHT format and it worked perfectly!
 
In conclusion: a million thanks to UnMHT developers! :-)
Comment 180 Dan Jacobson 2015-12-12 06:41:44 PST
*** Bug 1232103 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.