Closed Bug 451557 Opened 16 years ago Closed 16 years ago

Null byte in page source is shown as a replacement character (Diamond-like, �)

Categories

(Core :: DOM: HTML Parser, defect)

RESOLVED INVALID

People

(Reporter: qfmomen, Unassigned)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1

http://www.worldski.com/ski-specialoffers.aspx
http://cheap4holidays.com/faqs.aspx
http://cheap4holidays.com/terms.aspx
http://www.cheap4carhire.com/terms.aspx

On the pages above, a stray character (�) is added in Firefox, while the pages work fine in IE. The character is sometimes inserted into a class name or image name, and sometimes into a tag, which breaks the page.

Can you please look into this matter?

thanks

Qurban

Reproducible: Always

can not be critical because this is no crash/dataloss.
A screenshot and attached page source etc are needed.
Severity: critical → normal
http://www.worldski.com/ski-specialoffers.aspx - I don't see any diamonds.

http://cheap4holidays.com/faqs.aspx - one diamond, "Cheap4Holidays� is a trading name".  There is a null byte in a UTF-8 stream, and I guess our UTF-8 decoder turns this into the standard replacement character.

http://cheap4holidays.com/terms.aspx - same as faqs.aspx

http://www.cheap4carhire.com/terms.aspx - "Cancellation & Amendments: <�br/>".  Another null byte in a weird place in UTF-8.  Strangely, I only see it if I save the file using Firefox, not using wget/curl.

Tools used: wget, curl, hexdump -C, Firefox trunk (not 3.0.x)

Dunno if this is a bug in Firefox, but if you're the owner of the sites, you can make the problem go away by removing those null bytes from the pages.
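
For anyone without hexdump at hand, the same check can be sketched in Python. This is only an illustration: `find_null_bytes` and the sample bytes are made up for this comment (the sample mimics the terms.aspx fragment quoted above), and in practice the input would be the bytes of the page as saved by wget or curl.

```python
def find_null_bytes(data: bytes, context: int = 8) -> list[tuple[int, bytes]]:
    """Return (offset, surrounding bytes) for every 0x00 byte in data."""
    hits = []
    start = 0
    while True:
        pos = data.find(b"\x00", start)
        if pos == -1:
            break
        lo = max(0, pos - context)
        # Keep a few bytes on each side so the location is recognisable.
        hits.append((pos, data[lo:pos + context + 1]))
        start = pos + 1
    return hits


# Hypothetical sample; replace with open("terms.aspx", "rb").read() for a saved page.
sample = b"Cancellation & Amendments: <\x00br/>"
for offset, snippet in find_null_bytes(sample):
    print(f"offset {offset}: {snippet!r}")
```

An empty result means the saved bytes contain no null bytes at all.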
Assignee: nobody → smontagu
Component: General → Internationalization
Product: Firefox → Core
QA Contact: general → i18n
Wikipedia says 0x00 is valid UTF-8 for a null character.  Seems like a bug in Firefox to me that it's being turned into a replacement character.
Testcase:

data:text/html;charset=UTF-8,a%00b
Summary: Diamond like � Character → Null byte in UTF-8 page is shown as a replacement character (Diamond-like, �)
I don't think the UTF-8 decoder is inserting the replacement character: any other encoding I tried does the same.
data:text/html;charset=iso-8859-1,a
Summary: Null byte in UTF-8 page is shown as a replacement character (Diamond-like, �) → Null byte in page source is shown as a replacement character (Diamond-like, �)
The last comment got cut off, it should have ended
data:text/html;charset=iso-8859-1,a%00b
data:text/html;charset=windows-1250,a%00b
data:text/html;charset=Shift_JIS,a%00b
data:text/html;charset=Big5,a%00b
etc., etc.
If I'm not mistaken the parser is replacing all null bytes by the replacement character, q.v. bug 315473.
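
A quick way to see that the charset decoders are not the source of the replacement character is to decode the same three bytes outside the browser. The sketch below uses Python's codecs, not Gecko's decoders, so it is only an analogy; `CHARSETS` and `payload` are names chosen for this example.

```python
# Python codec names corresponding to the charsets in the data: URLs above.
CHARSETS = ("utf-8", "iso-8859-1", "cp1250", "shift_jis", "big5")

payload = b"a\x00b"  # same bytes as a%00b

for charset in CHARSETS:
    decoded = payload.decode(charset)
    # Show each decoded character as a code point.
    print(charset, [f"U+{ord(ch):04X}" for ch in decoded])
```

In each of these encodings the 0x00 byte decodes to U+0000 rather than U+FFFD, which is consistent with the replacement happening later, in the parser.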
Assignee: smontagu → nobody
Component: Internationalization → HTML: Parser
OS: Windows XP → All
QA Contact: i18n → parser
Hardware: PC → All
(In reply to comment #1)
> can not be critical because this is no crash/dataloss.
> A screenshot and attached page source etc are needed.
> 

It has lost a lot of my data: when we send requests to third-party services, the response is lost and the users can view it. The data usage and bandwidth usage have increased a thousand times.

Can you please tell me how I can provide you with screenshots on this forum?
(In reply to comment #2)
> http://www.worldski.com/ski-specialoffers.aspx - I don't see any diamonds.
> 
> http://cheap4holidays.com/faqs.aspx - one diamond, "Cheap4Holidays� is a
> trading name".  There is a null byte in a UTF-8 stream, and I guess our UTF-8
> decoder turns this into the standard replacement character.
> 
> http://cheap4holidays.com/terms.aspx - same as faqs.aspx
> 
> http://www.cheap4carhire.com/terms.aspx - "Cancellation & Amendments:
> <�br/>".  Another null byte in a weird place in UTF-8.  Strangely, I only see
> it if I save the file using Firefox, not using wget/curl.
> 
> Tools used: wget, curl, hexdump -C, Firefox trunk (not 3.0.x)
> 
> Dunno if this is a bug in Firefox, but if you're the owner of the sites, you
> can make the problem go away by removing those null bytes from the pages.
> 

Thanks for your comments! 

1- http://www.worldski.com/ski-specialoffers.aspx: the character does not necessarily appear at a particular position. It appears in different parts of the page on different accesses. If you view the source code of the page, you will find it somewhere in the data, tags, attribute values, or URLs. I have screenshots, but I am not sure whether I can attach them.
(In reply to comment #2)
> http://www.worldski.com/ski-specialoffers.aspx - I don't see any diamonds.
> 
> http://cheap4holidays.com/faqs.aspx - one diamond, "Cheap4Holidays� is a
> trading name".  There is a null byte in a UTF-8 stream, and I guess our UTF-8
> decoder turns this into the standard replacement character.
> 
> http://cheap4holidays.com/terms.aspx - same as faqs.aspx
> 
> http://www.cheap4carhire.com/terms.aspx - "Cancellation & Amendments:
> <�br/>".  Another null byte in a weird place in UTF-8.  Strangely, I only see
> it if I save the file using Firefox, not using wget/curl.
> 
> Tools used: wget, curl, hexdump -C, Firefox trunk (not 3.0.x)
> 
> Dunno if this is a bug in Firefox, but if you're the owner of the sites, you
> can make the problem go away by removing those null bytes from the pages.
> 

http://cheap4holidays.com/faqs.aspx - the diamond-like character after the name is not part of the trade name. It appears where my aspx statement is written as follows:
<%=session("Sitename")%>... There is nothing wrong with this statement. As an experiment, I have cast it to String, trimmed it, used a cleanHTML function, etc.

Can you please explain a little how Firefox handles null bytes, and what a possible solution is for removing them from the rendered HTML?
If you do a 'hexdump -C' on the source code for your aspx program and don't see null bytes there, they're probably in places like the "Sitename" value.  I can't give you detailed help with fixing that because I don't know aspx.

The problem in http://www.cheap4carhire.com/terms.aspx shows up in Safari and Opera too; those browsers just don't show the replacement character between "<" and "br/>".  If *random* (???) null bytes are getting into ski-specialoffers.aspx, you'll have similar problems there occasionally until you fix the site.
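
If the null bytes do turn out to come from a value such as "Sitename", one workaround is to strip them before the value is written into the page. The sketch below is in Python rather than aspx, and `strip_null_chars` and the sample value are hypothetical; it only shows the idea and is not a drop-in fix for the site.

```python
def strip_null_chars(value: str) -> str:
    """Return value with every U+0000 character removed."""
    return value.replace("\x00", "")


# Hypothetical value with a trailing null, as a session store or database might return it.
site_name = "Cheap4Holidays\x00"

print(repr(site_name.strip()))             # whitespace trimming leaves the null in place
print(repr(strip_null_chars(site_name)))   # explicit removal drops it
```

Whitespace trimming alone may not help: in Python, for instance, `str.strip()` with no arguments does not treat U+0000 as whitespace, so an explicit replacement is needed.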
Blocks: 315473
The handling of null bytes in text nodes is rather inconsistent: if a null byte is at the beginning of the text it just gets omitted, but anywhere else it is replaced by the � character:

data:text/html,<p>%00abc%00def</p>
Any solution?
Even after we fix this bug (which is specific to text nodes), you'll need to stop putting random null bytes in your page (see comment 11), so you might as well do that now...
Thanks dear.

Well, go to this link using Firefox:
http://cheap4hotels.com/Hotel_Availability.aspx?IATA_code=LON&location=London&Rating=2-5&dtStart=2008-8-29&dtEnd=2008-8-30&typeDouble=1&hotelcode=0&refId=203&r=459

Click "Get a quote" for any hotel and you will see this character in the text area of the terms & conditions.

This is the simple code I have written there:

<TEXTAREA class="terms">
<%=Session("SiteName")%> is a trading............ 
</TEXTAREA>

There is nothing wrong in the code; it works fine in IE. I cannot understand what to replace or where that character is.
Qurban Ali, I suggest using wget to download that url.  If the download contains null characters (which it does over here), then you'll have to look at your code and see where you (or your database, possibly) are putting them in.

In any case, HTML5 specifies what should be done with null characters: they are replaced with the replacement character.  Anything else leads to security bugs.
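
As a rough illustration of the behaviour described in this comment (not the parser's actual code), replacing nulls with U+FFFD can be sketched in Python as follows. `replace_nulls` is a name invented for this example, and the sketch does not model the leading-null case from the earlier testcase.

```python
REPLACEMENT_CHARACTER = "\ufffd"  # U+FFFD, the diamond-like character


def replace_nulls(text: str) -> str:
    """Replace every U+0000 in decoded text with U+FFFD."""
    return text.replace("\x00", REPLACEMENT_CHARACTER)


print(replace_nulls("Cheap4Holidays\x00 is a trading name"))
```

The null stays visible as a replacement character instead of being silently dropped, so markup such as the "<", null, "br/>" sequence is not quietly turned into a working tag.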
Status: UNCONFIRMED → RESOLVED
Closed: 16 years ago
Resolution: --- → INVALID