Save Page As (complete) mangles ampersand sequences




14 years ago
14 years ago


(Reporter: martin.thomson, Assigned: bugs)



Windows 2000
Bug Flags:
blocking-aviary1.0 +

Firefox Tracking Flags

(Not tracked)




(1 attachment)



14 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040803 Moonpartridge/0.9.3 (Firefox/0.9.3 polymorph)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040803 Moonpartridge/0.9.3 (Firefox/0.9.3 polymorph)

While this may be related to 181456, I don't think that it is exactly the same.

The Save Page As... [Web Page, complete] function turns a series of
    into a series of Â/#194 characters (the actual
character, not the ampersand entity) separated by spaces.  Consequently the page
looks bad when it is loaded back up in Firefox.

I think that this may be a consequence of an encoding scheme mismatch (the local
filesystem comes up with ISO-8859-1, whereas the W3 website uses UTF-8).

Reproducible: Always
Steps to Reproduce:
1. Load a webpage that has a long sequence of     (the URL
provided is a case where I noticed the problem)
2. File -> Save Page As...
3. Choose Web Page, complete as the file type
4. Save the file.
5. Load the file back in the browser and observe the ickyness.

Actual Results:  
The sequence of "          "
looks like "A A A A A" (except the Acirc character instead).

Expected Results:  
I don't mind that the save as function loses the &xxx; character sequences when
it saves the file, as long as it saves the correct character.  Ideally though,
any unusual characters should be saved using the &#xxx; sequence to
opportunitythe chance for such errors to occur.

Now this could be a critical bug for two reasons, but I'll let someone else
decide that.
1. It causes data loss (but it does have a workaround, albeit a painful and
tedious one.
2. It should just work - this is not an advanced feature and the impact on users
who see such a bug is fairly negative.

Comment 1

14 years ago
On your example URL, on which line appears this long "nbsp;..." string in the
source code? I can't find it.
This is basically like bug 119146.  I wonder why that patch doesn't help here..
Well, I have made some testcases. They might be useful (or not):
(I don't have time right now to test and compare them)

Comment 4

14 years ago
Martijn's test cases show that this is only a problem when only the HTTP headers
report the charset.  If the charset is reported in the <meta ...> tags, the page
displays correctly.  The &xxx; sequences are still modified though.

It would appear that the suggestion at the end of bug 119146 should be adopted.
> It would appear that the suggestion at the end of bug 119146 should be adopted.

There's a suggestion there on how to fix the bug?

Again, the fix in that bug should have handled "&nbsp;" (that's what that bug
was about).  It was fixed.

I just realized that reporter is using Firefox, though (and damn extensions that
hide the fact).  And firefox has forked all the download code, and the fix for
bug 119146 was NOT ported to firefox.

Over to firefox; this works fine in Mozilla builds.
Assignee: file-handling → bugs
Component: File Handling → File Handling
Ever confirmed: true
Flags: blocking-aviary1.0?
Product: Browser → Firefox
QA Contact: ian → bmo
Flags: blocking-aviary1.0? → blocking-aviary1.0+
Created attachment 160463 [details] [diff] [review]

add addtl encoding flag for rich text saves.
br & trunk fixed. 
Last Resolved: 14 years ago
Resolution: --- → FIXED

Comment 8

14 years ago
looks like it has been fixed on the branch, so could somebody please add the 
fixed-aviary1.0 keyword?
(i know this is trivial but it would be useful for querying the overall number
of remaining bugs)


14 years ago
Keywords: fixed-aviary1.0
You need to log in before you can comment on or make changes to this bug.