Closed Bug 162239 Opened 22 years ago Closed 22 years ago

POST document could not inherit charset from previous page if the previous charset is from autodetection.

Categories

(Core :: Internationalization, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla1.0.1

People

(Reporter: kazhik, Assigned: shanjian)

References

Details

(Keywords: intl, regression, topembed, Whiteboard: [adt1 RTM] [ETA 08/21])

Attachments

(1 file, 5 obsolete files)

Auto-detection doesn't work for POST document.

Steps to reproduce:
(1) Set Default Character Coding to EUC-JP or ISO-2022-JP
 and turn on Japanese auto-detection.
(2) Access to <http://www.jra.go.jp/> and select "HTML".
(3) Push one of the five button in the left, "Shutsuba-hyo",
 "Kyousou-seiseki"...

Actual result: The next page is displayed as garbage.

Expected result: Auto-detection should work.


Original report in Bugzilla-jp:
http://bugzilla.mozilla.gr.jp/show_bug.cgi?id=2500
Keywords: intl
QA Contact: ruixu → ylong
This is a regression and a serious bug for Non-English users.
It seems that this problem has arisen after 1.1 Beta.

This should be All/All because we confirm this on Windows, MacOS,
Linux, FreeBSD.

When there is no charset in HTTP header nor <meta> charset,
Auto-detect should work even if it is the case of POST.
Confirm it happens on all platforms, N7.0PR1 doesn't has this problem, and base
on the comment above, the regression is after mozilla 1.1 Beta.

Nominate as nsbeta1.
Keywords: nsbeta1, regression
OS: Windows 2000 → All
Hardware: PC → All
In fact, this regression is between 06-12 and 06-13 branchm which is I guess
caused by fixed of bug 143579/bug 144897.

And there is another bug 156824, I think it's the same problem.
I talked to yuying, and here is my understanding. We really need to disable
autodetection for posted page, that is very important and we won't back out that
change. The problem here is that the posted page does not inherit the charset
from its submittee. I'll take a look to see what can we do. 
Assignee: yokoyama → shanjian
Summary: Auto-detection doesn't work for POST document → POST document could not inherit charset from previous page if the previous charset is from autodetection.
This should be evluated as a show stopper for MachV, and major embedors. Marking
it   as [ADT1 RTM].
Blocks: 143047
Keywords: nsbeta1nsbeta1+, topembed
Priority: -- → P1
Whiteboard: [adt1 RTM] [ETA Needed]
Target Milestone: --- → mozilla1.0.1
The current summary was changed to:
POST document could not inherit charset from previous page if the previous
charset is from autodetection.

Actually, it's not only for the previous page charset is from auto-detection but
also previous page with charset meta-tag while the submission result page has no
meta-charset, see bug 156824.
Whiteboard: [adt1 RTM] [ETA Needed] → [adt1 RTM] [ETA 08/20]
Attached patch patch (obsolete) — Splinter Review
Explaination to the problem and fix:
For POST document, charset autodetection is disabled and charset of submitting
document should be used. The only charset we carry over from old doc to new doc
is default charset in the existing code. Clearly it is not a good idea to change
default charset, otherwise it will affect all pages loaded after that point. We
have hintCharset in nsIMarkupDocumentViewer which is not used for anything right
now. My solution is set this hint charset based on submitting document, and
utilized this hint during the new document load. 
Status: NEW → ASSIGNED
Comment on attachment 95909 [details] [diff] [review]
patch

sr=jst
Attachment #95909 - Flags: superreview+
This is a pretty bad regression. We should definitely mark this one as a show
stopper.
In the new patch, I lower the priority of inherited charset to meta charset and
http header charset. We already have some code there to prevent meta charset
from reloading a POST page. This way we can still honor meta charset and http
header charset if it is different from form document. 
Comment on attachment 95960 [details] [diff] [review]
update patch which lower the priority of hint charset.

r=jkeiser, but two comments:

(a) please move nsCOMPtr<nsIDocument> document; into the if() right before you
use it (to reduce its scope)

(b) you don't need to declare hintCharset anymore.

This patch looks good :)
Attachment #95960 - Flags: review+
Attached patch update take john's comment. (obsolete) — Splinter Review
Attachment #95909 - Attachment is obsolete: true
Attachment #95960 - Attachment is obsolete: true
Comment on attachment 95969 [details] [diff] [review]
update take john's comment.

carry over review.
Attachment #95969 - Flags: superreview+
Attachment #95969 - Flags: review+
Attachment #95969 - Flags: superreview+
Attachment #95969 - Flags: superreview+
Comment on attachment 95969 [details] [diff] [review]
update take john's comment.

sr=dveditz
Comment on attachment 95969 [details] [diff] [review]
update take john's comment.

A few nits:

- In nsDocShell.cpp:

+	     if (document) {
+	       nsAutoString docCharset;
+	       document->GetDocumentCharacterSet(docCharset);
...

This file uses 4-space indentation, so use 4 spaces here...

- In nsHTMLDocument.cpp:

@@ -955,7 +970,15 @@
   // TryHintCharset and TryParentCharset. It should be always safe to try more 
   // sources. 
   if (! TryUserForcedCharset(muCV, dcInfo, charsetSource, charset)) {
-    TryHintCharset(muCV, charsetSource, charset);
+    //check if current doc is from POST command

Add a space after the '//' to match other comments here.

sr=jst
A problem was found with the current approach. The use of hintCharset conflicts
with charset autodetection. I need to figure out a new patch.
Attachment #95969 - Attachment is obsolete: true
Attachment #95969 - Flags: superreview+
Attachment #95969 - Flags: review+
Comment on attachment 96042 [details] [diff] [review]
new patch with a new defined charset field in document viewer

One more test, this patch does not seems to work in some situation.
Attachment #96042 - Attachment is obsolete: true
In which situations does the patch not work? Pls elaborate ...
Whiteboard: [adt1 RTM] [ETA 08/20] → [adt1 RTM] [ETA 08/21]
about the patch:
PrevDocCharacterSet is added to document viewer, just as other charset source,
such as forcedCharset, hintCharset and defaultCharset. This field is set when a
document get a new charset. The charset is passed around. For POST document,
this information will be referenced. 

Most of the charset is set in "nsHTMLDocument::StartDocumentLoad". There is one
exception. Parser may set charset if it found meta tag charset. We need to
record the charset in that case. That is done in
"HTMLContentSink::SetDocumentCharset". Parser notified the nsDocument about its
charset change in the same place.
Comment on attachment 96056 [details] [diff] [review]
this patch should work in all situation.

- In nsIMarkupDocumentViewer.idl:

	*/
	attribute PRInt32 hintCharacterSetSource;

+    /*
+    character set from prev document 
+	*/
+	attribute wstring prevDocCharacterSet;
+

Please fix this indentation mess, there are tabs all over this file, so be
careful.

sr=jst
Attachment #96056 - Flags: superreview+
yuying found a problem in the test build.
Still investigating.
Attachment #96056 - Attachment is obsolete: true
Attachment #96056 - Flags: superreview+
Comment on attachment 96093 [details] [diff] [review]
add sink charset updated in charset detector 

>Index: docshell/base/nsIMarkupDocumentViewer.idl

>+    /*
>+    character set from prev document 
>+	*/
>+	attribute wstring prevDocCharacterSet;
>+
> 	//void GetCharacterSetHint(in wstring hintCharset, in PRInt32 charsetSource);

string-fu nit: you should use AString instead of wstring here.	also please fix
the indentation and remove commented out function.

i haven't verified the logic of the patch itself.  someone more familar with
the
parser, docshell, and the content sink should really do that.
Comment on attachment 96093 [details] [diff] [review]
add sink charset updated in charset detector 

r/sr=darin

i spoke to shanjian via AIM, and he told me that he's fixed the indentation
problem, and we decided to hold off on converting the interface from wstring to
AString (since many other methods/attributes already use wstring).  plus, it
keeps things simpler, which is a good thing for the 1.0 branch.
Attachment #96093 - Flags: superreview+
Comment on attachment 96093 [details] [diff] [review]
add sink charset updated in charset detector 

r/sr=jst
Attachment #96093 - Flags: review+
adt1.0.1+ (on ADT's behalf) approval for checkin to the 1.0 branch, pending
Drivers' approval. pls check this in asap, then replace "mozilla1.0.1+" with the
"fixed1.0.1" keyword.
Keywords: adt1.0.1+
Pls Note: Teri Preston in QA tested a private build of the 1.0 brnach with this
fix, and verified there were "No new form control regressions found on this
build." Yuying Long, the assigned QA conatct, also verified on a private build
that patch works as designed, and found no regressions.
I carefully look at the patch and go over with shanjian about the logic. It 
looks like his patch make sense and are right. Cannot think about other issue 
now. We should land it. 
Comment on attachment 96093 [details] [diff] [review]
add sink charset updated in charset detector 

a=chofmann for 1.0.1
Attachment #96093 - Flags: approval+
we need to test the heck out of this one in branch builds as soon as this lands.
fix checked into 1.0 branch. 
Yuying's test report: 

 I have verified this build with bug 162239, the results are good. Here are the
details:

1. Non charset meta-tag in page before submission, auto-detector works proper
with different options (Universal, Japanese...), then POST, get a display
correct page which is carry over from previous page:
e.g. http://www.jra.go.jp

2. With charset meta-tag in page before submission, and no charset meta-tag in
after form submission posted page, will carry over from previous page also:
e.g. http://babel/tests/browser/form/gb2312_simpleform.html

3. Before/After form submission has different charset, the submission result
page will take it's own page charset instead of previous page charset, the
purpose for test this is there might be some western pages would possible has
two different western charsets between before and after submission.
http://babel/tests/browser/form/gb2312_shiftJis.html

4. There is no change with a related method GET in a form,  auto-detect will
keep work with this method.  While POST doesn't enable the auto-detection which
is what we intentionally to do.

5. Some other related scenarios outside a form with auto-detection, charset
carry over through links, different language auto-detect options, different
language encoding settings...etc.  No problem was found so far.

I'm comfortable with current results - the page display, marked charset, reload
condition...etc.., even though I found a separate problem (not critical) today
that has been there for a long time, and I'm going to file it as a new bug.
This patch change the way how charset is resolved for POST document. It should
have no impact on non-POST pages. When verify this bug, except the general
regression, please pay special attent to how the submitting docment get its
charset from. The POST doc's charset shouldn't be affected. 

ylong: pls verify this as fixed on the 1.0 branch tomorrow. thanks!
Verified fixed in 08-21 branch build on all platforms.  The result is same as in
comment #36.
fix checked into trunk.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Fixed verified in 09-12 trunk build / WinXP-SC.
Status: RESOLVED → VERIFIED
Blocks: 154896
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: