Closed Bug 162239 Opened 22 years ago Closed 22 years ago

POST document could not inherit charset from previous page if the previous charset is from autodetection.

Categories

(Core :: Internationalization, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla1.0.1

People

(Reporter: kazhik, Assigned: shanjian)

References

Details

(Keywords: intl, regression, topembed, Whiteboard: [adt1 RTM] [ETA 08/21])

Attachments

(1 file, 5 obsolete files)

Auto-detection doesn't work for POST document. Steps to reproduce: (1) Set Default Character Coding to EUC-JP or ISO-2022-JP and turn on Japanese auto-detection. (2) Access to <http://www.jra.go.jp/> and select "HTML". (3) Push one of the five button in the left, "Shutsuba-hyo", "Kyousou-seiseki"... Actual result: The next page is displayed as garbage. Expected result: Auto-detection should work. Original report in Bugzilla-jp: http://bugzilla.mozilla.gr.jp/show_bug.cgi?id=2500
Keywords: intl
QA Contact: ruixu → ylong
This is a regression and a serious bug for Non-English users. It seems that this problem has arisen after 1.1 Beta. This should be All/All because we confirm this on Windows, MacOS, Linux, FreeBSD. When there is no charset in HTTP header nor <meta> charset, Auto-detect should work even if it is the case of POST.
Confirm it happens on all platforms, N7.0PR1 doesn't has this problem, and base on the comment above, the regression is after mozilla 1.1 Beta. Nominate as nsbeta1.
Keywords: nsbeta1, regression
OS: Windows 2000 → All
Hardware: PC → All
In fact, this regression is between 06-12 and 06-13 branchm which is I guess caused by fixed of bug 143579/bug 144897. And there is another bug 156824, I think it's the same problem.
I talked to yuying, and here is my understanding. We really need to disable autodetection for posted page, that is very important and we won't back out that change. The problem here is that the posted page does not inherit the charset from its submittee. I'll take a look to see what can we do.
Assignee: yokoyama → shanjian
Summary: Auto-detection doesn't work for POST document → POST document could not inherit charset from previous page if the previous charset is from autodetection.
This should be evluated as a show stopper for MachV, and major embedors. Marking it as [ADT1 RTM].
Blocks: 143047
Keywords: nsbeta1nsbeta1+, topembed
Priority: -- → P1
Whiteboard: [adt1 RTM] [ETA Needed]
Target Milestone: --- → mozilla1.0.1
The current summary was changed to: POST document could not inherit charset from previous page if the previous charset is from autodetection. Actually, it's not only for the previous page charset is from auto-detection but also previous page with charset meta-tag while the submission result page has no meta-charset, see bug 156824.
Whiteboard: [adt1 RTM] [ETA Needed] → [adt1 RTM] [ETA 08/20]
Attached patch patch (obsolete) — Splinter Review
Explaination to the problem and fix: For POST document, charset autodetection is disabled and charset of submitting document should be used. The only charset we carry over from old doc to new doc is default charset in the existing code. Clearly it is not a good idea to change default charset, otherwise it will affect all pages loaded after that point. We have hintCharset in nsIMarkupDocumentViewer which is not used for anything right now. My solution is set this hint charset based on submitting document, and utilized this hint during the new document load.
Status: NEW → ASSIGNED
Comment on attachment 95909 [details] [diff] [review] patch sr=jst
Attachment #95909 - Flags: superreview+
This is a pretty bad regression. We should definitely mark this one as a show stopper.
In the new patch, I lower the priority of inherited charset to meta charset and http header charset. We already have some code there to prevent meta charset from reloading a POST page. This way we can still honor meta charset and http header charset if it is different from form document.
Comment on attachment 95960 [details] [diff] [review] update patch which lower the priority of hint charset. r=jkeiser, but two comments: (a) please move nsCOMPtr<nsIDocument> document; into the if() right before you use it (to reduce its scope) (b) you don't need to declare hintCharset anymore. This patch looks good :)
Attachment #95960 - Flags: review+
Attached patch update take john's comment. (obsolete) — Splinter Review
Attachment #95909 - Attachment is obsolete: true
Attachment #95960 - Attachment is obsolete: true
Comment on attachment 95969 [details] [diff] [review] update take john's comment. carry over review.
Attachment #95969 - Flags: superreview+
Attachment #95969 - Flags: review+
Attachment #95969 - Flags: superreview+
Attachment #95969 - Flags: superreview+
Comment on attachment 95969 [details] [diff] [review] update take john's comment. sr=dveditz
Comment on attachment 95969 [details] [diff] [review] update take john's comment. A few nits: - In nsDocShell.cpp: + if (document) { + nsAutoString docCharset; + document->GetDocumentCharacterSet(docCharset); ... This file uses 4-space indentation, so use 4 spaces here... - In nsHTMLDocument.cpp: @@ -955,7 +970,15 @@ // TryHintCharset and TryParentCharset. It should be always safe to try more // sources. if (! TryUserForcedCharset(muCV, dcInfo, charsetSource, charset)) { - TryHintCharset(muCV, charsetSource, charset); + //check if current doc is from POST command Add a space after the '//' to match other comments here. sr=jst
A problem was found with the current approach. The use of hintCharset conflicts with charset autodetection. I need to figure out a new patch.
Attachment #95969 - Attachment is obsolete: true
Attachment #95969 - Flags: superreview+
Attachment #95969 - Flags: review+
Comment on attachment 96042 [details] [diff] [review] new patch with a new defined charset field in document viewer One more test, this patch does not seems to work in some situation.
Attachment #96042 - Attachment is obsolete: true
In which situations does the patch not work? Pls elaborate ...
Whiteboard: [adt1 RTM] [ETA 08/20] → [adt1 RTM] [ETA 08/21]
about the patch: PrevDocCharacterSet is added to document viewer, just as other charset source, such as forcedCharset, hintCharset and defaultCharset. This field is set when a document get a new charset. The charset is passed around. For POST document, this information will be referenced. Most of the charset is set in "nsHTMLDocument::StartDocumentLoad". There is one exception. Parser may set charset if it found meta tag charset. We need to record the charset in that case. That is done in "HTMLContentSink::SetDocumentCharset". Parser notified the nsDocument about its charset change in the same place.
Comment on attachment 96056 [details] [diff] [review] this patch should work in all situation. - In nsIMarkupDocumentViewer.idl: */ attribute PRInt32 hintCharacterSetSource; + /* + character set from prev document + */ + attribute wstring prevDocCharacterSet; + Please fix this indentation mess, there are tabs all over this file, so be careful. sr=jst
Attachment #96056 - Flags: superreview+
yuying found a problem in the test build. Still investigating.
Attachment #96056 - Attachment is obsolete: true
Attachment #96056 - Flags: superreview+
Comment on attachment 96093 [details] [diff] [review] add sink charset updated in charset detector >Index: docshell/base/nsIMarkupDocumentViewer.idl >+ /* >+ character set from prev document >+ */ >+ attribute wstring prevDocCharacterSet; >+ > //void GetCharacterSetHint(in wstring hintCharset, in PRInt32 charsetSource); string-fu nit: you should use AString instead of wstring here. also please fix the indentation and remove commented out function. i haven't verified the logic of the patch itself. someone more familar with the parser, docshell, and the content sink should really do that.
Comment on attachment 96093 [details] [diff] [review] add sink charset updated in charset detector r/sr=darin i spoke to shanjian via AIM, and he told me that he's fixed the indentation problem, and we decided to hold off on converting the interface from wstring to AString (since many other methods/attributes already use wstring). plus, it keeps things simpler, which is a good thing for the 1.0 branch.
Attachment #96093 - Flags: superreview+
Comment on attachment 96093 [details] [diff] [review] add sink charset updated in charset detector r/sr=jst
Attachment #96093 - Flags: review+
adt1.0.1+ (on ADT's behalf) approval for checkin to the 1.0 branch, pending Drivers' approval. pls check this in asap, then replace "mozilla1.0.1+" with the "fixed1.0.1" keyword.
Keywords: adt1.0.1+
Pls Note: Teri Preston in QA tested a private build of the 1.0 brnach with this fix, and verified there were "No new form control regressions found on this build." Yuying Long, the assigned QA conatct, also verified on a private build that patch works as designed, and found no regressions.
I carefully look at the patch and go over with shanjian about the logic. It looks like his patch make sense and are right. Cannot think about other issue now. We should land it.
Comment on attachment 96093 [details] [diff] [review] add sink charset updated in charset detector a=chofmann for 1.0.1
Attachment #96093 - Flags: approval+
we need to test the heck out of this one in branch builds as soon as this lands.
fix checked into 1.0 branch.
Yuying's test report: I have verified this build with bug 162239, the results are good. Here are the details: 1. Non charset meta-tag in page before submission, auto-detector works proper with different options (Universal, Japanese...), then POST, get a display correct page which is carry over from previous page: e.g. http://www.jra.go.jp 2. With charset meta-tag in page before submission, and no charset meta-tag in after form submission posted page, will carry over from previous page also: e.g. http://babel/tests/browser/form/gb2312_simpleform.html 3. Before/After form submission has different charset, the submission result page will take it's own page charset instead of previous page charset, the purpose for test this is there might be some western pages would possible has two different western charsets between before and after submission. http://babel/tests/browser/form/gb2312_shiftJis.html 4. There is no change with a related method GET in a form, auto-detect will keep work with this method. While POST doesn't enable the auto-detection which is what we intentionally to do. 5. Some other related scenarios outside a form with auto-detection, charset carry over through links, different language auto-detect options, different language encoding settings...etc. No problem was found so far. I'm comfortable with current results - the page display, marked charset, reload condition...etc.., even though I found a separate problem (not critical) today that has been there for a long time, and I'm going to file it as a new bug.
This patch change the way how charset is resolved for POST document. It should have no impact on non-POST pages. When verify this bug, except the general regression, please pay special attent to how the submitting docment get its charset from. The POST doc's charset shouldn't be affected.
ylong: pls verify this as fixed on the 1.0 branch tomorrow. thanks!
Verified fixed in 08-21 branch build on all platforms. The result is same as in comment #36.
fix checked into trunk.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Fixed verified in 09-12 trunk build / WinXP-SC.
Status: RESOLVED → VERIFIED
Blocks: 154896
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: