Closed Bug 119146 Opened 24 years ago Closed 23 years ago

Incorrect Save As - Web Page, Complete saves (may be, because of national charset)

Categories

(Core Graveyard :: File Handling, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.4final

People

(Reporter: atmjav, Assigned: rbs)

Attachments

(1 file)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.7+) Gecko/20020109
BuildID: 2002010903

Incorrect Save As - Web Page, Complete saves (may be, because of national charset).

Original:
-----8<------------------------------------------------------
Ì&nbsp;å&nbsp;ä&nbsp;ë&nbsp;à&nbsp;é&nbsp;í&nbsp;.&nbsp;Ð&nbsp;ó&nbsp;&nbsp; -&nbsp;&nbsp; Ð&nbsp;Î&nbsp;Ñ&nbsp;Ñ&nbsp;È&nbsp;É&nbsp;Ñ&nbsp; &nbsp;È&nbsp;É&nbsp;&nbsp; Á&nbsp;È&nbsp;Î&nbsp;Ì&nbsp;Å&nbsp;Ä&nbsp;È&nbsp;Ö&nbsp;È&nbsp;Í&nbsp;Ñ&nbsp; &nbsp;È&nbsp;É&nbsp;&nbsp; Æ&nbsp;Ó&nbsp;Ð&nbsp;Í&nbsp;À&nbsp;Ë
-----8<-------------------------------------------------------

Saved:
-----8<-------------------------------------------------------
Ìšåšäšëšàšéšíš.šÐšóšš -šš КΚњњȚɚњ šÈšÉšš ÁšÈšÎšÌšÅšÄšÈšÖšÈšÍšÑš šÈšÉšš ÆšÓšÐšÍšÀšË
-----8<-------------------------------------------------------

Reproducible: Always
On that page, every "&nbsp;" was saved as "½".
Added to tracking bug.
Blocks: 115634
try http://www2.medline.ru/
On that page...

Original:
<form name=searchForm1 method=get action="http://atlant.ownnet.ru/cgi-bin/s.cgi">
<input type=hidden name=tmpl value="m2">

Saved:
<form name="searchForm1" method="get" action="http://atlant.ownnet.ru/cgi-bin/s.cgi"></form>
No longer blocks: 115634
Blocks: 116757
now really readding to tracking bug
Blocks: 115634
No longer blocks: 116757
Could someone attach an actual HTML page (using http://bugzilla.mozilla.org/attachment.cgi?bugid=119146&action=enter) that demonstrates the bug? Does one need to save "Web page, complete" or does the bug also appear for "HTML Only" saves?
Yes, it happens for "HTML Only" too, but in a different way, and not as badly. Try to Save As http://www2.medline.ru/
OK. With Linux build 2002-01-07-06, I do "Save Page, Complete" on that URL, then open the saved HTML in Mozilla. The saved HTML renders identically to the HTML on the original site.
I am NOT saying it renders differently. I am talking about differences in the page source.
Confirming on Linux. All &nbsp; entities are converted into non-ascii characters of some sort (based on the page encoding, most likely). I'd attach a minimal testcase, but I can't seem to save any small files due to bug 116757. This is probably a back end bug...
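A minimal sketch of why the distinction matters at the byte level. It only illustrates the general point made above (a raw no-break space is a non-ASCII character whose on-disk bytes depend on the page encoding, whereas the entity is plain ASCII); it does not reproduce Mozilla's serializer, and the helper name `describe` is made up for this example.

```python
# Illustrative only: how a single no-break space (U+00A0) ends up on disk.
NBSP = "\u00a0"


def describe(text: str, charset: str) -> bytes:
    """Return the raw bytes that would be written for `text` in `charset`."""
    return text.encode(charset)


raw_cp1251 = describe(NBSP, "cp1251")      # one non-ASCII byte
raw_utf8 = describe(NBSP, "utf-8")         # two non-ASCII bytes
as_entity = describe("&nbsp;", "ascii")    # six ASCII bytes, same in any charset

# Reading the UTF-8 bytes back with a different charset gives another string.
misread = raw_utf8.decode("latin-1")

print(raw_cp1251, raw_utf8, as_entity, repr(misread))
```

Because the entity form is pure ASCII, it survives regardless of which charset a viewer later guesses for the saved file.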
Assignee: trudelle → adamlock
Status: UNCONFIRMED → NEW
Component: XP Apps → File Handling
Ever confirmed: true
OS: Windows 98 → All
Hardware: PC → All
Bug 110135 will allow alternative charsets to be specified during saving
Depends on: 110135
The issue with input elements disappearing etc. appears to be because the HTML in the page is dubious (a form and form fields appear directly underneath a table element, not contained by tr/td elements). The parser is probably throwing away the bustage, so the DOM doesn't contain the faulty elements and they are not there to be saved out again. The issue with &nbsp; being converted might be fixed by passing nsIWebBrowserPersist::ENCODE_FLAGS_ENCODE_ENTITIES, though I will have to confirm this.
Priority: -- → P3
Target Milestone: --- → mozilla0.9.9
Specifying ENCODE_FLAGS_ENCODE_ENTITIES does not appear to fix the problem. Back to the drawing board.
Target Milestone: mozilla0.9.9 → mozilla1.0.1
Blocks: 142490
*** Bug 143751 has been marked as a duplicate of this bug. ***
No longer blocks: 142490
*** Bug 142490 has been marked as a duplicate of this bug. ***
*** Bug 138438 has been marked as a duplicate of this bug. ***
QA Contact: sairuh → petersen
Changing target milestone to 'Future' since 'mozilla1.0.1' came and went already.
Target Milestone: mozilla1.0.1 → Future
*** Bug 203630 has been marked as a duplicate of this bug. ***
*** Bug 202418 has been marked as a duplicate of this bug. ***
*** Bug 200232 has been marked as a duplicate of this bug. ***
*** Bug 206407 has been marked as a duplicate of this bug. ***
Attached patch: proposed fix
[BTW, is nsProgressDlg.js still used?]
-> taking. re: comment #12. It didn't work before because of a limitation of the serializer. I fixed that a while back (as part of some other work in View Selection Source).
Assignee: adamlock → rbs
Comment on attachment 123810 (proposed fix): asking r/sr
Attachment #123810 - Flags: superreview?(heikki)
Attachment #123810 - Flags: review?(adamlock)
nsProgressDialog is alive and well in SeaMonkey if you switch the right pref in preferences. I fervently hope this is also the case in Firebird, given the slew of bugs in the download manager.
Haven't reviewed this yet, but the comment in nsIWebBrowserPersist.idl says that the following entities will be escaped if that flag is set: &nbsp; &amp; &lt; &gt; &quot; However, nbsp is not defined in XML and therefore it should not be serialized like that. (I think we could actually store what XML entities were defined, but that would be a different bug.)
The comment is probably wrong if some encoders do not support this behaviour. These flags on the persist object are just public values that are mapped onto the equivalents on nsIDocumentEncoder. So it would be up to the document encoder to determine what to do if anything with this flag and presumably the XML encoder wouldn't insert nbsp entities. I recall it did a very long time ago, but that it was fixed.
Comment on attachment 123810 (proposed fix): r=adamlock.
Attachment #123810 - Flags: review?(adamlock) → review+
re: comment #25: yeah, the IDL blurb doesn't tell the full story (my fault, BTW). It depends on the encoder (serializer). The flag is understood in the HTML world, but in the XML world the XML serializer restricts itself to what it knows, as comment 26 says. [It seems that serializing XHTML might need a special case at some stage so that it understands these flags too.]
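A rough sketch of the behaviour discussed in the last few comments, written in Python rather than against the real encoder. It assumes the "basic entities" are the five listed in the IDL comment quoted above, and that an XML-mode serializer simply leaves the no-break space as a raw character because nbsp is not defined in XML. The function name `encode_text` and the `xml` parameter are hypothetical and are not part of nsIDocumentEncoder.

```python
# Hypothetical model of "encode basic entities"; not Mozilla's implementation.
BASIC_ENTITIES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "\u00a0": "&nbsp;",
}

# Assumption: in XML mode only the entities XML itself defines are emitted,
# so the no-break space is passed through untouched.
XML_SAFE_ENTITIES = {
    char: entity for char, entity in BASIC_ENTITIES.items() if entity != "&nbsp;"
}


def encode_text(text: str, xml: bool = False) -> str:
    """Escape text one character at a time, so '&' is never escaped twice."""
    table = XML_SAFE_ENTITIES if xml else BASIC_ENTITIES
    return "".join(table.get(char, char) for char in text)


sample = 'a\u00a0<b> & "c"'
print(encode_text(sample))            # HTML mode: nbsp becomes an entity
print(repr(encode_text(sample, xml=True)))  # XML mode: nbsp stays raw
```

Working per character rather than with chained string replacements avoids turning an already produced `&nbsp;` into `&amp;nbsp;`.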
Attachment #123810 - Flags: superreview?(heikki) → superreview+
Comment on attachment 123810 (proposed fix): asking a= for this simple patch to turn on a flag that triggers the output of basic entities such as &nbsp; when doing "Save As - Web Page, Complete".
Attachment #123810 - Flags: approval1.4?
Comment on attachment 123810 (proposed fix): a=asa (on behalf of drivers) for checkin to 1.4
Attachment #123810 - Flags: approval1.4? → approval1.4+
Checked in. [Some people (e.g., bug 138438) may ask for more entities such as accented letters. There are other flags that could be used to get that if needed.]
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Target Milestone: Future → mozilla1.4final
This is not completely fixed. When &uuml; entities are saved, they are converted to raw binary data. Viewing the saved file then switches the encoding to Japanese, because the browser thinks the content is UTF-16 encoded. Switching back to UTF-8 shows the characters correctly. Requesting REOPENING and a complete fix (that is, save all &xxx; entities as is).
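For illustration, one way to get the "save all &xxx; entities" behaviour requested here is to write out every character that has an HTML named entity in its entity form. The sketch below uses Python's standard `html.entities.codepoint2name` table. It is an assumption about what a fuller fix could look like, not what the checked-in patch does, and the function name `encode_named_entities` is invented for this example.

```python
from html.entities import codepoint2name


def encode_named_entities(text: str) -> str:
    """Replace each character that has an HTML named entity with that entity."""
    pieces = []
    for char in text:
        name = codepoint2name.get(ord(char))
        # Characters without a named entity (e.g. Cyrillic letters) pass through.
        pieces.append(f"&{name};" if name else char)
    return "".join(pieces)


saved = encode_named_entities("M\u00fcller\u00a0& Co")
print(saved)  # M&uuml;ller&nbsp;&amp; Co

# The result is pure ASCII, so a wrong charset guess cannot garble it.
saved.encode("ascii")
```

Text in scripts that have no named entities would still be written as raw characters, so the saved file would still need a correct charset declaration for those.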
Product: Core → Core Graveyard