Bug 91696: Meta tag does not work
Status: Closed, VERIFIED FIXED
Opened: 23 years ago · Closed: 23 years ago
Categories: Core :: Internationalization, defect
Target Milestone: mozilla0.9.4
People: Reporter: ftang; Assigned: jbetak
Keywords: intl · Whiteboard: fixed on the trunk
Attachments: 4 patches (806 bytes, 816 bytes, 2.79 KB, 3.16 KB)
Steps to reproduce:
1. Visit http://tw.yahoo.com

Expected result: see Chinese text.
Actual result: see garbage.

The reason is that this page uses an illegal meta tag:

<meta http-equiv=content-type content=text/html; charset=big5>

while it should be:

<meta http-equiv=content-type content="text/html; charset=big5">

However, other browsers work.
Comment 1•23 years ago
Reassign to ftang.
Assignee: nhotta → ftang
OS: Windows NT → All
Hardware: PC → All
Summary: Meta tag does not work → Meta tag does not work
Reporter
Comment 2•23 years ago
jbetak, can you work on this one? The code should be in nsMetaCharsetObserver.cpp
Assignee: ftang → jbetak
Assignee
Comment 4•23 years ago
accepting for 0.9.3
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.3
Assignee
Comment 5•23 years ago
still investigating - pushing out to 0.9.4
Target Milestone: mozilla0.9.3 → mozilla0.9.4
Assignee
Comment 6•23 years ago
found something related - bug 88746; confirmed that this is not a regression, although we have to make sure not to regress the fix for bug 88746. Frank, please correct me if I'm wrong, but I believe we need to attack the parser logic here: http://lxr.mozilla.org/seamonkey/source/htmlparser/src/nsParser.cpp#2524 The metaCharsetObserver is not being notified with the parsed value, only with the charset value from the cache.
Assignee
Comment 7•23 years ago
OK, I have a working fix in my local tree. I'll polish it up tonight and post it for review. The culprit seems to be this (from nsParser):

2519  if ((offset = tokenizer.GetNextWord(PR_TRUE)) != kNotFound) {
2520    const char* contentStart = attrStart+offset;
2521    CWordTokenizer<char> contentTokenizer(contentStart, 0,
2522                                          tokenizer.GetLength());
2523

When calling tokenizer.GetNextWord(PR_TRUE), it gives special handling to words starting with quotes, regardless of whether we pass in PR_TRUE or PR_FALSE. If a word doesn't start with a quote, tokenizer.GetLength() will return the length of that word up to the first whitespace delimiter. If a word starts with a quote, the length runs till the end of the buffer. Replacing tokenizer.GetLength() with tokenizer.GetTotalLength() seems to fix the problem. I'll keep working on this, just wanted to give a quick progress status...
Assignee
Comment 8•23 years ago
Assignee
Comment 9•23 years ago
OK, this seems to be the simplest solution - I'm still evaluating it in terms of correctness, consistency and potential regression risk. The problem is as follows: on line 2519 in nsParser.cpp we implicitly assume that the content value is quoted (content="text/html; charset=big5">). The tokenizer points at the first value after the '=' sign, and when asked to read the next word it will either read until the next whitespace character or take a whole quoted string as one word. In the absence of quotes (content=text/html; charset=big5>), it will only read in 'text/html;', which is then assumed to be the whole content value. That evidently doesn't contain the charset information, which then never gets parsed out. The proposed solution is a small hack around the current code. Rather than assuming that we have a quoted content value, we instantiate a new tokenizer starting with the first character after the '=' sign and ending with the last character before the '>' bracket. This might be inaccurate when there are more values than just content. I'm still looking into it, although the current code is very specific about what it will parse out of the content value and about the expected location of the value elements, so I think this change might turn out to be quite safe. To limit regression risk, we could check whether the content value is quoted and apply this tweak only then. Actually, I will do that first thing tomorrow.
Assignee
Comment 10•23 years ago
Assignee
Comment 11•23 years ago
cc'ing harishd and vidur for review. A short explanation: I'd like to use the first patch to handle META http-equiv content-type tags without a quoted content value. Parsing until the end of the tokenizer seems reasonable given this context and the code currently present in nsParser.cpp... I'll also talk to Frank about it.
Assignee
Comment 12•23 years ago
Assignee
Updated•23 years ago
Whiteboard: have patch - need r/sr
Assignee
Comment 13•23 years ago
roy: could you help with reviewing the nsMetaCharsetObserver.cpp part of the fix, before I forget what exactly I'm doing there ;-) vidur: could you please review the nsParser.cpp change?
Comment 14•23 years ago
I believe shanjian is the best person to review your nsMetaCharsetObserver.cpp changes. shanjian: can you review his patch? --- cc'ing shanjian ---
Comment 15•23 years ago
Juraj, I don't quite understand why you want to modify the parser. Charset was parsed by the parser as an additional attribute, so only your change in nsMetaCharsetObserver.cpp is necessary. Did I miss something? Also, for your change in nsMetaCharsetObserver.cpp, are

((0==nsCRT::strncasecmp(contentValue,texthtml.get(),texthtml.Length())) ||
 (((contentValue[0]=='\'') || (contentValue[0]=='\"')) &&
  (0==nsCRT::strncasecmp(contentValue+1, texthtml.get(), texthtml.Length()))))

and

((0==nsCRT::strncasecmp(contentValue,texthtml.get(),texthtml.Length())) ||
 (((contentValue[0]=='\'') || (contentValue[0]=='\"')) &&
  (0==nsCRT::strncasecmp(contentValue+1, texthtml.get(), texthtml.Length()))) ||
 ((nsnull != charsetValue) &&
  (0==nsCRT::strncasecmp(contentValue, texthtml.get(), texthtml.Length()))))

equivalent? I tried it on my local tree, and removing this part seems to work fine.
Assignee
Comment 16•23 years ago
shanjian, thanks for looking at this - and you are right, the observer changes would be sufficient; I already verified that. You see, I made the changes to nsParser first, since I was hitting it first in my debugger. Frank then pointed out that in poor networking environments the parser might not have the complete meta tag available, so we need to fix it in the meta tag observer as well; I used his cgi script to emulate such environments. I kept the changes in the parser since I believe it might be better for performance: we would set the document charset sooner, before the observer had a chance to kick in. I deemed the change in the parser to be negligible, but if it has a potential to help, why not? But wait - I was just wondering: why do we parse the meta charset in the parser at all? There must be a reason, and I think it's performance. Since we parse the meta charset in both places, why not fix both to include the case of missing quotation marks? I think we should keep the change if the regression risk is low.
Assignee
Comment 17•23 years ago
oh, I added this

|| ((nsnull != charsetValue) && (0==nsCRT::strncasecmp(contentValue, texthtml.get(), texthtml.Length())))

to allow us to bypass the condition that the content value starts with a " or ' and that the immediately following character sequence is text/html. I've expanded the condition by allowing text/html without quoting when the parser was able to parse out a charset value. The crucial point is that quoted values are read by the parser in one shot; it ignores the whitespace. This way the document type and charset come over mangled together in one key value. Otherwise the whitespace separates them into two consecutive key values. BTW, did you test on tw.yahoo.com? Did you apply the parser changes? If you did, the parser is probably catching the charset before the observer kicks in...
Comment 18•23 years ago
Ok, I didn't realize your first change was in the meta charset sniffing. I thought it was for preparing the attributes. (In fact, I didn't check that part at all.) So you need to ask harishd or vidur to review that part. And I agree, this part should be fixed. As for my second comment, the code you added still seems redundant to me. (Yes, I tried without that part, and without the parser fix, and it works fine.) No matter whether charsetValue is nsnull or not, all that needs to be there is a check that contentValue starts with "text/html". Disable the parser part and try it out yourself.
Assignee
Comment 19•23 years ago
Assignee
Comment 20•23 years ago
Shanjian, you were right - I hadn't read that part of the if statement correctly.

((0==nsCRT::strncasecmp(contentValue,texthtml.get(),texthtml.Length())) ||
 (((contentValue[0]=='\'') || (contentValue[0]=='\"')) &&
  (0==nsCRT::strncasecmp(contentValue+1, texthtml.get(), texthtml.Length()))))

explicitly covers both the quoted and the unquoted case. I dropped my extension of the if clause and simplified the patch somewhat. Sigh, I still can't get vidur to respond to my review request.
Comment 21•23 years ago
r=shanjian for patch #3, file nsMetaCharsetObserver.cpp
Comment 22•23 years ago
The change to the parser treats the rest of the tag after the content attribute as the attribute value. This should work for the case shown and probably for cases where there are additional attributes after content (e.g. content appears before http-equiv), though the latter should be tested. sr=vidur.
Assignee
Comment 23•23 years ago
thanks Vidur! I'll look into the scenario you mentioned tonight. Meanwhile I'm clearing the whiteboard status and nominating for the branch. This bug prevents a number of international sites from displaying properly, the most prominent being Taiwanese Yahoo.
Keywords: nsBranch
Whiteboard: have patch - need r/sr → prepared to land, waiting for the tree to open
Assignee
Comment 24•23 years ago
QA: I checked this into the trunk on 08/09/2001. Could you please verify - I'm particularly keen to see whether this causes any regressions.
Whiteboard: prepared to land, waiting for the tree to open → fixed on the trunk, need some verification work before lifting to the branch
Comment 25•23 years ago
Yes, it works on the 08-10 trunk win32 / Win2k-cn build (just checked only one right now).
Assignee
Comment 26•23 years ago
Yuying, thanks for your prompt reply! I'll keep this around for a few days to see whether any regressions arise from smoketests, and do some testing myself. Please let me know should you see any problems/changes in META charset sniffing and detection...
Assignee
Comment 27•23 years ago
resolving per ftang's request
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Whiteboard: fixed on the trunk, need some verification work before lifting to the branch → fixed on the trunk
Comment 28•23 years ago
Marking it as verified - works well on the 09-04 trunk build / Win2k-CN.
Status: RESOLVED → VERIFIED