Closed Bug 19337 Opened 25 years ago Closed 25 years ago

[DOGFOOD]Single characters being dropped from pages

Categories

(Core :: Networking, defect, P1)

VERIFIED FIXED

People

(Reporter: hobbit, Assigned: rickg)

Details

(Whiteboard: [PDT+] 12/3)

Attachments

(1 file)

When viewing some pages, random letters are missing. It was suggested that
this is a networking issue rather than a rendering problem.

I am using a CVS build of Mozilla from Nov 18th, which is running on a 128 meg
K7 and displaying remotely on a 64meg Cyrix 586 MediaGX. These are connected
to the net proper by a 33.6k modem. If you need further details, please ask:
I am not sure what you need.

How to reproduce: (this works for me, I hope it works for you)

http://www.linux.org.uk has a section of headings: All the news, Alan's diary,
(few other things) Sponsored Penguin Visit (some others).

Actual result: The link referring to the penguin is rendered variously as a
'Peguin' and a 'enguin' as I revisit the page. (Reloading the page shows the
same error rather than a variant of it) When viewing source, I found that word
spelled a slightly different way again. The word is correctly spelt when viewing
it with Netscape from another machine here and with Lynx on the 586, so it's not
just a typo in the source.

Expected result: the correct spelling :) Or, more precisely, no missing letters.

Builds: I have been using Mozilla (milestones M7 on and CVS builds) on Linux
(2.2 kernels) for some time but have not noticed this before. However, it may
simply have escaped my notice. I shall try with later builds if I can. Sorry I
can't be more specific. (The machine with the program on is not mine.)

Additional info: I know nothing about networking and it's at the suggestion
of someone else that I put this in here. If you need more precise information,
please tell me what I need to supply and I shall do. I have checked the top bugs
list and the Mozilla 11 release notes and found nothing that looks related
there. I have a screenshot of the source and the browser next to each other to
prove I'm not going mad. I doubt that will be much help, but if you want it..?

Telsa (who thinks mozilla is coool).
Someone has suggested that it might be related to EAGAIN handling, which has
apparently caused single-byte data loss in other networking apps before.

Just a thought.
Assignee: gagan → rpotts
Rick, embedded null problem? Sounds unlikely actually, as those can only really
happen at the end of an HTTP body.
More info: rebuilt mozilla from CVS today (20th Nov 99).

Problem still exists on the page I mentioned when I run mozilla remotely,
displaying on the Cyrix machine. It does _not_ exist when running and displaying
on the K7.

It also exists on this url: (warning, it's a gruesome topic, sorry. If you
don't like flying, don't view it)
http://news.bbc.co.uk/low/english/world/americas/newsid_528000/528963.dtm
It's right down at the bottom of the report before the little graphic: the
word 'pitched' is shown on my machine as 'pitced' when looking at the page
or viewing the source. Again, when running mozilla with the display on the
K7, this doesn't show up.

When I run mozilla-viewer (remotely, displaying on the Cyrix) on the bbc news
url, the problem doesn't show up: 'pitched' is spelled correctly. When I run
mozilla-viewer on the linux news site, the problem does show up: 'Penguin' is
shown as 'Pengin'.
Er, sorry. Typo. That's .stm at the end of the url. My apologies.
http://news.bbc.co.uk/low/english/world/americas/newsid_528000/528963.stm
*** Bug 19487 has been marked as a duplicate of this bug. ***
Whiteboard: [PDT+]
Adding PDT+ to this bug (the one that was just marked as a dup of this one was PDT+).
Status: NEW → ASSIGNED
Whiteboard: [PDT+] → [PDT+] 11/29/99
Summary: Single characters being dropped from pages → [DOGFOOD]Single characters being dropped from pages
Putting [DOGFOOD] in summary, since it's already PDT+.

This bug is definitely a recent regression.  I've been using Mozilla as dogfood
(for some things) in its more usable nightly builds on and off for at least a
few weeks now.  Furthermore, it causes random, very weird things to happen on
pages when a character of *markup* is the one dropped.  (Some of these are
described on bug 19487.)

It seems to me the characters dropped could be on the ends of packets.  See the
test case from bug 19487 and my comments there.
I did a little more investigation:

This bug began happening between the builds of 1999-11-15-12-M12 and
1999-11-17-17-M12.

The points where it happens seem to be drawn from the same set of points on each
attempt.  That is, in my testcase in bug 19487, I see the following missing:

(The gaps are in bytes (using 56 chars per line), and I'm calculating them
quickly.  1448 and 2896 do show up quite often:)

Attempt 1:
 * the b in line 023
 * the j in line 072 (gap: 2752)
 * the T in line 124 (gap: 2904)
 * the z in line 143 (gap: 1096)
 * the L in line 290 (gap: 8192)
 * the L in line 365 (gap: 4200)
 * the D in line 391 (gap: 1448)

Attempt 2:
 * the b in line 023
 * the T in line 049 (gap: 1448)
 * the j in line 072 (gap: 1304)
 * the T in line 124 (gap: 2896)
 * the L in line 150 (gap: 1448)
 * the z in line 201 (gap: 2896)
 * the z in line 218 (gap: 952)
 * the r in line 244 (gap: 1448)
 * the b in line 296 (gap: 2896)
 * the T in line 322 (gap: 1448)
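A rough way to check the "ends of packets" hunch against the gaps listed above is to count how many of them are exact multiples of 1448. This is only a sketch: the figure 1448 is assumed here to be a TCP payload size (a 1500-byte Ethernet MTU minus 20 bytes of IPv4 header, 20 bytes of TCP header and 12 bytes of TCP timestamp option), which the thread itself does not confirm, and the names `kAssumedSegmentBytes` and `CountSegmentMultiples` are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// ASSUMPTION: payload bytes per TCP segment on the reporter's link.
// 1500 (Ethernet MTU) - 20 (IPv4 header) - 20 (TCP header) - 12 (timestamps).
constexpr int kAssumedSegmentBytes = 1500 - 20 - 20 - 12;

// Count the gaps that are exact, non-zero multiples of the segment size.
std::size_t CountSegmentMultiples(const std::vector<int>& gaps,
                                  int segmentBytes = kAssumedSegmentBytes) {
    std::size_t count = 0;
    for (int gap : gaps) {
        if (gap > 0 && gap % segmentBytes == 0) {
            ++count;
        }
    }
    return count;
}

// Gap values copied from the two attempts listed in the comment above.
const std::vector<int> kAttempt1Gaps = {2752, 2904, 1096, 8192, 4200, 1448};
const std::vector<int> kAttempt2Gaps = {1448, 1304, 2896, 1448, 2896,
                                        952,  1448, 2896, 1448};
```

With these numbers, `CountSegmentMultiples(kAttempt2Gaps)` gives 7 of 9 and `CountSegmentMultiples(kAttempt1Gaps)` gives 1 of 6. Since the gaps were calculated quickly by hand, near misses such as 2904 are not counted, so this is suggestive rather than conclusive.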
Severity: normal → major
Priority: P3 → P2
OS: Linux → All
Priority: P2 → P1
Hardware: PC → All
OK, this is definitely an HTTP problem.
sites.netscape.com/sford4/dropcharacter2.html
seems to drop random characters, while if placed on your hard drive it displays
correctly.
*** Bug 19916 has been marked as a duplicate of this bug. ***
Assignee: rpotts → rickg
Status: ASSIGNED → NEW
This is actually a bug in the parser. This is easily verified by putting a printf
in nsParser::OnDataAvailable, in the while loop:

	    mParserContext->mTransferBuffer[aLength] = '\0';
	    // Use an explicit format string so '%' in page data is not interpreted.
	    printf("%s", mParserContext->mTransferBuffer);

This shows that no data is missing at this stage.

The character that goes missing is the _first_ character of the second and
subsequent chunks for which OnDataAvailable is called. The bug appears to be in
the nsScanner. nsScanner->mOffset is 1 too large after the first chunk, which
causes us to miss the first character of the second chunk.
nsScanner::ReadUntil() is incrementing mOffset even when it finds the string
termination null, which is wrong.
Here's a patch. This fixes one instance of the bug; it may be the case that the
same fix needs to be applied elsewhere in nsScanner.

Index: nsScanner.cpp
===================================================================
RCS file: /cvsroot/mozilla/htmlparser/src/nsScanner.cpp,v
retrieving revision 3.64
diff -u -2 -r3.64 nsScanner.cpp
--- nsScanner.cpp	1999/11/17 00:25:33	3.64
+++ nsScanner.cpp	1999/11/27 04:29:54
@@ -983,4 +983,5 @@
       result=Peek(theChar);
       theBuf=mBuffer.GetUnicode();
+      mOffset --;	// this nul is synthetic, don't count it.
       theOrigin=mOffset;
     }
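The off-by-one described above can be modelled with a small toy scanner. This is a minimal sketch, not the real nsScanner: `ToyScanner`, `FeedChunks` and the `applyFix` flag are hypothetical names, and the only behaviour borrowed from the thread is that the offset is advanced even when the read hits the synthetic nul at the end of the buffered data.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a chunk-fed scanner (NOT the real nsScanner).
class ToyScanner {
public:
    explicit ToyScanner(bool applyFix) : mApplyFix(applyFix) {}

    // Append a newly arrived chunk to the buffered data.
    void Append(const std::string& chunk) { mBuffer += chunk; }

    // Read up to and including `terminator`, or until buffered data runs out.
    std::string ReadUntil(char terminator) {
        std::string out;
        for (;;) {
            const bool atEnd = mOffset >= mBuffer.size();
            const char c = atEnd ? '\0' : mBuffer[mOffset];  // synthetic nul
            ++mOffset;  // advanced unconditionally, as in the buggy path
            if (atEnd) {
                if (mApplyFix) {
                    --mOffset;  // the nul is synthetic, don't count it
                }
                break;
            }
            if (c == terminator) {
                break;
            }
            out += c;
        }
        return out;
    }

private:
    std::string mBuffer;
    std::size_t mOffset = 0;
    bool mApplyFix;
};

// Feed chunks one at a time, scanning after each, and join what was read.
std::string FeedChunks(const std::vector<std::string>& chunks, bool applyFix) {
    ToyScanner scanner(applyFix);
    std::string seen;
    for (const std::string& chunk : chunks) {
        scanner.Append(chunk);
        seen += scanner.ReadUntil('<');
    }
    return seen;
}
```

In this model, `FeedChunks({"Pen", "guin<"}, false)` yields "Penuin", losing the first character of the second chunk, while passing `true` yields "Penguin". The same data delivered as a single chunk is read correctly either way, which would fit the page displaying fine from a local file.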
It looks like sfraser beat me to the fix :-)

I traced this bug up through Necko and all the data is fine until it gets to the
scanner...

It looks like we are hitting a boundary case in nsScanner::ReadUntil(...) [line
956].  At this point, if we consume the entire buffer, we drop into the else
clause and Peek() returns EOF.  Unfortunately, mOffset has already been
incremented...  This throws us off by one the next time Tokenize(...) is
called.

Hey rick...  I've attached a patch that gets everything working for the test
case: http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2986.

It looks like the other versions of ReadUntil(...) do the right thing... But I
think that ReadWhile(...) and ReadIdentifier(...) have the same problem in the
else clause...
Whiteboard: [PDT+] 11/29/99 → [PDT+] 12/3
Target Milestone: M12
so who's checking in the fix for this?
For those following the bug, rickg says he has a fix for this and related issues
waiting in the wings.  Not for long, I hope! =)
*** Bug 20578 has been marked as a duplicate of this bug. ***
Fixed by update to scanner.
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
Verified with dbaron@fas.harvard.edu's testcase with 1999-12-06-09 build.
Working perfectly. Marking as verified.
Bulk move of all Networking-Core (to be deleted component) bugs to new
Networking component.
Thank you very much! Finally updated Mozilla to current as of 18th Dec 1999
and I now see a correctly-spelled 'Penguin' on the original page that started
all this and characters (bytes?) are no longer getting lost on the way.

Shame it's not possible to fix real typos on the web like this, too. :) Again,
thanks.
Just being pedantic.

Bug 20451 was incorrectly marked as a duplicate of 19937, it should be marked as
a duplicate of this. Adding this comment just for future reference (or is that
historical purposes?).
*** Bug 20451 has been marked as a duplicate of this bug. ***