19337 - [DOGFOOD]Single characters being dropped from pages

Reporter

Description

25 years ago

When viewing some pages, random letters are missing. It has been suggested
that this is a networking issue rather than a rendering problem.

I am using a CVS build of Mozilla from Nov 18th, which is running on a 128 MB
K7 and displaying remotely on a 64 MB Cyrix 586 MediaGX. These are connected
to the net proper by a 33.6k modem. If you need further details, please ask:
I am not sure what you need.

How to reproduce: (this works for me, I hope it works for you)

http://www.linux.org.uk has a section of headings: All the news, Alan's diary,
(a few other things) Sponsored Penguin Visit (some others).

Actual result: The link referring to the penguin is rendered variously as
'Peguin' and 'enguin' as I revisit the page. (Reloading the page shows the
same error rather than a variant of it.) When viewing the source, I found the
word spelled a slightly different way again. The word is spelt correctly when
viewing it with Netscape from another machine here and with Lynx on the 586, so
it's not just a typo in the source.

Expected result: the correct spelling :) Or, more precisely, no missing letters.

Builds: I have been using Mozilla (milestones M7 on and CVS builds) on Linux
(2.2 kernels) for some time but have not noticed this before. However, it may
simply have escaped my notice. I shall try with later builds if I can. Sorry I
can't be more specific. (The machine the program runs on is not mine.)

Additional info: I know nothing about networking and it's at the suggestion
of someone else that I put this in here. If you need more precise information,
please tell me what I need to supply and I shall do so. I have checked the top
bugs list and the Mozilla 11 release notes and found nothing that looks related
there. I have a screenshot of the source and the browser next to each other to
prove I'm not going mad. I doubt that will be much help, but if you want it...?

Telsa (who thinks mozilla is coool).

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 1

25 years ago

Someone has suggested that it might be related to EAGAIN handling, which has
apparently caused single-byte data loss in other networking apps before.

Just a thought.
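For context on the EAGAIN suggestion (which later comments show was not the cause here): on a non-blocking descriptor, read() fails with errno set to EAGAIN (or EWOULDBLOCK) when no data is available yet, and a loop that moves its position before checking the return value can skip or lose bytes. A minimal POSIX sketch of a loop that handles this correctly follows; `drainNonBlocking` is a hypothetical helper, not Mozilla/Necko code.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical helper: appends everything currently readable from the
// non-blocking descriptor `fd` to `out`. Returns true if the descriptor is
// merely drained for now (EAGAIN/EWOULDBLOCK), false on EOF or a real error.
// `out` grows only by the byte count read() actually reports, so a
// "try again later" result never consumes or skips anything.
bool drainNonBlocking(int fd, std::string& out) {
    char buf[512];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            out.append(buf, static_cast<std::size_t>(n));
        } else if (n == 0) {
            return false;                        // EOF: the writer closed
        } else if (errno == EINTR) {
            continue;                            // interrupted by a signal; retry
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return true;                         // no data yet; call again later
        } else {
            return false;                        // genuine I/O error
        }
    }
}
```

Data that arrives in two pieces ("Peng", then "uin") comes out as "Penguin", with the intervening EAGAIN costing nothing.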

Judson Valeski

Updated

25 years ago

Assignee: gagan → rpotts

Judson Valeski

Comment 2

25 years ago

Rick, embedded null problem? Sounds unlikely actually, as those can only really
happen at the end of an HTTP body.

hobbit

Reporter

Comment 3

25 years ago

More info: rebuilt mozilla from CVS today (20th Nov 99).

Problem still exists on the page I mentioned when I run mozilla remotely,
displaying on the Cyrix machine. It does _not_ exist when running and displaying
on the K7.

It also exists on this URL (warning: it's a gruesome topic, sorry; if you
don't like flying, don't view it):
http://news.bbc.co.uk/low/english/world/americas/newsid_528000/528963.dtm
It's right down at the bottom of the report before the little graphic: the
word 'pitched' is shown on my machine as 'pitced' when looking at the page
or viewing the source. Again, when running mozilla with the display on the
K7, this doesn't show up.

When I run mozilla-viewer (remotely, displaying on the Cyrix) on the bbc news
url, the problem doesn't show up: 'pitched' is spelled correctly. When I run
mozilla-viewer on the linux news site, the problem does show up: 'Penguin' is
shown as 'Pengin'.

hobbit

Reporter

Comment 4

25 years ago

Er, sorry. Typo. That's .stm at the end of the url. My apologies.
http://news.bbc.co.uk/low/english/world/americas/newsid_528000/528963.stm

Judson Valeski

Comment 5

25 years ago

*** Bug 19487 has been marked as a duplicate of this bug. ***

Judson Valeski

Updated

25 years ago

Whiteboard: [PDT+]

Judson Valeski

Comment 6

25 years ago

*** Bug 19487 has been marked as a duplicate of this bug. ***

Judson Valeski

Comment 7

25 years ago

Adding PDT+ to this bug (the one that was just marked a dup of this one was PDT+).

rpotts (gone)

Updated

25 years ago

Status: NEW → ASSIGNED

Whiteboard: [PDT+] → [PDT+] 11/29/99

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

25 years ago

Summary: Single characters being dropped from pages → [DOGFOOD]Single characters being dropped from pages

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 8

25 years ago

Putting [DOGFOOD] in summary, since it's already PDT+.

This bug is definitely a recent regression.  I've been using Mozilla as dogfood
(for some things) in its more usable nightly builds on and off for at least a
few weeks now.  Furthermore, it causes very weird things to happen on pages
when the dropped character is part of the *markup*.  (Some of these are
described in bug 19487.)

It seems to me the dropped characters could be at the ends of packets.  See the
testcase from bug 19487 and my comments there.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 9

25 years ago

I did a little more investigation:

This bug began happening between the builds of 1999-11-15-12-M12 and
1999-11-17-17-M12.

The points where it happens seem to be a subset of the same points.  That is, in
my testcase in bug 19487, I see the following missing:

(The gaps are in bytes, assuming 56 characters per line, and I calculated them
quickly.  Gaps of 1448 and 2896 show up quite often.)

Attempt 1:
 * the b in line 023
 * the j in line 072 (gap: 2752)
 * the T in line 124 (gap: 2904)
 * the z in line 143 (gap: 1096)
 * the L in line 290 (gap: 8192)
 * the L in line 365 (gap: 4200)
 * the D in line 391 (gap: 1448)

Attempt 2:
 * the b in line 023
 * the T in line 049 (gap: 1448)
 * the j in line 072 (gap: 1304)
 * the T in line 124 (gap: 2896)
 * the L in line 150 (gap: 1448)
 * the z in line 201 (gap: 2896)
 * the z in line 218 (gap: 952)
 * the r in line 244 (gap: 1448)
 * the b in line 296 (gap: 2896)
 * the T in line 322 (gap: 1448)
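One way to read these numbers: 1448 bytes is a common TCP payload per segment (a 1500-byte Ethernet MTU minus 20 bytes of IPv4 header, 20 of TCP header and 12 of TCP timestamp option), and 2896 is twice that. The sketch below is a hypothetical helper, not something from the thread: it recomputes the gaps from line numbers alone (ignoring column offsets, so each gap is only accurate to about one 56-byte line) and checks which fall near a multiple of 1448. That the drops follow segment boundaries is an assumption; later comments trace the loss to the boundaries between data chunks handed to the parser.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Approximate byte distance between two dropped characters, from their line
// numbers only (fixed-width lines; column offsets ignored).
long ApproxGap(int fromLine, int toLine, int lineWidth) {
    return static_cast<long>(toLine - fromLine) * lineWidth;
}

// Approximate gaps between consecutive lines that lost a character.
std::vector<long> ApproxGaps(const std::vector<int>& lines, int lineWidth) {
    std::vector<long> gaps;
    for (std::size_t i = 1; i < lines.size(); ++i)
        gaps.push_back(ApproxGap(lines[i - 1], lines[i], lineWidth));
    return gaps;
}

// True if a non-negative `gap` lies within `tolerance` bytes of a nonzero
// multiple of `segment` (e.g. 1448, 2896, ...).
bool NearMultiple(long gap, long segment, long tolerance) {
    long k = (gap + segment / 2) / segment;  // nearest whole multiple
    if (k == 0) return false;
    return std::labs(gap - k * segment) <= tolerance;
}
```

For the Attempt 2 lines (23, 49, 72, 124, 150, 201, 218, 244, 296, 322), 7 of the 9 approximate gaps land within one line width of 1448 or 2896; the other two (about 1288 and 952 bytes) do not.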

Michael Lowe

Updated

25 years ago

Severity: normal → major

Priority: P3 → P2

Michael Lowe

Updated

25 years ago

OS: Linux → All

Priority: P2 → P1

Hardware: PC → All

Sammy Ford

Comment 10

25 years ago

OK, this is definitely an HTTP problem.
sites.netscape.com/sford4/dropcharacter2.html
seems to drop random characters, while the same file displays correctly when
loaded from your hard drive.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 11

25 years ago

*** Bug 19916 has been marked as a duplicate of this bug. ***

Simon Fraser [no longer active]

Updated

25 years ago

Assignee: rpotts → rickg

Status: ASSIGNED → NEW

Simon Fraser [no longer active]

Comment 12

25 years ago

This is actually a bug in the parser. This is easily verified by putting a printf
in nsParser::OnDataAvailable, in the while loop:

	    mParserContext->mTransferBuffer[aLength] = '\0';  // NUL-terminate the buffer
	    printf("%s", mParserContext->mTransferBuffer);    // "%s": page text may contain '%'

This shows that no data is missing at this stage.

The character that goes missing is the _first_ character of the second and
subsequent chunks for which OnDataAvailable is called. The bug appears to be in
the nsScanner. nsScanner->mOffset is 1 too large after the first chunk, which
causes us to miss the first character of the second chunk.
nsScanner::ReadUntil() is incrementing mOffset even when it finds the string
termination null, which is wrong.
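A toy model may make the off-by-one concrete. This is a minimal sketch, not Mozilla code: `ToyScanner`, `Append` and `Scan` are hypothetical, and the NUL that `std::string::c_str()` guarantees just past the data stands in for the synthetic terminator described above. The `applyFix` flag mirrors the one-line `mOffset--` patch in the next comment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model of a chunk-fed scanner; not the real nsScanner API.
class ToyScanner {
public:
    explicit ToyScanner(bool applyFix) : mApplyFix(applyFix) {}

    // Called once per chunk, the way OnDataAvailable hands data to the parser.
    void Append(const std::string& chunk) { mBuffer += chunk; }

    // Appends characters to `out` up to (not including) `stop`. Returns true
    // if `stop` was found, false if the buffered data ran out first.
    bool ReadUntil(char stop, std::string& out) {
        for (;;) {
            if (mOffset > mBuffer.size()) return false;  // overshot; wait for more data
            // c_str() yields a NUL just past the real data: the synthetic terminator.
            char ch = mBuffer.c_str()[mOffset];
            ++mOffset;  // advanced before the character is inspected
            if (ch == '\0') {
                if (mApplyFix) --mOffset;  // the NUL is synthetic; don't count it
                return false;
            }
            if (ch == stop) return true;
            out += ch;
        }
    }

private:
    std::string mBuffer;
    std::size_t mOffset = 0;
    bool mApplyFix;
};

// Feeds `chunks` in order and returns the text before the terminating '$'.
std::string Scan(const std::vector<std::string>& chunks, bool applyFix) {
    ToyScanner scanner(applyFix);
    std::string text;
    for (const std::string& chunk : chunks) {
        scanner.Append(chunk);
        if (scanner.ReadUntil('$', text)) break;
    }
    return text;
}
```

Split as `{"Sponsored Peng", "uin Visit$"}`, the unfixed model returns "Sponsored Pengin Visit" (the 'u' opening the second chunk is skipped, much like the 'Pengin' seen above), while the fixed model returns "Sponsored Penguin Visit". A single unsplit chunk is read correctly either way, which fits the drops appearing only at chunk boundaries.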

Simon Fraser [no longer active]

Comment 13

25 years ago

Here's a patch. This fixes one instance of the bug; it may be the case that the
same fix needs to be applied elsewhere in nsScanner.

Index: nsScanner.cpp
===================================================================
RCS file: /cvsroot/mozilla/htmlparser/src/nsScanner.cpp,v
retrieving revision 3.64
diff -u -2 -r3.64 nsScanner.cpp
--- nsScanner.cpp	1999/11/17 00:25:33	3.64
+++ nsScanner.cpp	1999/11/27 04:29:54
@@ -983,4 +983,5 @@
       result=Peek(theChar);
       theBuf=mBuffer.GetUnicode();
+      mOffset --;	// this nul is synthetic, don't count it.
       theOrigin=mOffset;
     }

rpotts (gone)

Comment 14

25 years ago

Attached patch: Patch to nsScanner.cpp

rpotts (gone)

Comment 15

25 years ago

It looks like sfraser beat me to the fix :-)

I traced this bug up through Necko and all the data is fine until it gets to the
scanner...

It looks like we are hitting a boundary case in nsScanner::ReadUntil(...) [line
956].  At this point, if we consume the entire buffer, we drop into the else
clause and Peek() returns EOF.  Unfortunately, mOffset has already been
incremented...  This throws us off by one the next time Tokenize(...) is
called.

Hey rick...  I've attached a patch that gets everything working for the test
case: http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2986.

It looks like the other versions of ReadUntil(...) do the right thing... But I
think that ReadWhile(...) and ReadIdentifier(...) have the same problem in the
else clause...
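rpotts' diagnosis suggests a general rule for these loops: advance the offset only after a character has actually been accepted, so that running out of buffered data never moves it. A minimal sketch of that "peek, then consume" pattern follows. `ChunkReader` and its `ReadWhile` are hypothetical and only loosely echo the nsScanner method names; the thread does not say whether the eventual scanner update used this structure.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical chunk-fed reader: the offset moves only when a character is
// accepted, never when the end of the currently buffered data is reached.
class ChunkReader {
public:
    void Append(const std::string& chunk) { mBuffer += chunk; }

    // Looks at the next character without consuming it; false at end of data.
    bool Peek(char& ch) const {
        if (mOffset >= mBuffer.size()) return false;
        ch = mBuffer[mOffset];
        return true;
    }

    // Consumes characters while `pred` holds. Returns true once a character
    // failing `pred` is seen (left unconsumed), false if the buffered data
    // ran out first; the caller should Append() more and call again.
    template <typename Pred>
    bool ReadWhile(Pred pred, std::string& out) {
        char ch;
        while (Peek(ch)) {
            if (!pred(ch)) return true;
            out += ch;
            ++mOffset;  // consume only what was accepted
        }
        return false;   // ran out of data: offset untouched by the miss
    }

    std::size_t Offset() const { return mOffset; }

private:
    std::string mBuffer;
    std::size_t mOffset = 0;
};
```

Reading a word split across chunks as "pitc" and "hed." yields "pitched" with the offset resting on the '.', rather than the 'pitced' reported above.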

rickg

Assignee

Updated

25 years ago

Whiteboard: [PDT+] 11/29/99 → [PDT+] 12/3

Target Milestone: M12

Judson Valeski

Comment 16

25 years ago

so who's checking in the fix for this?

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 17

25 years ago

For those following the bug, rickg says he has a fix for this and related issues
waiting in the wings.  Not for long, I hope! =)

rickg

Assignee

Comment 18

25 years ago

*** Bug 20578 has been marked as a duplicate of this bug. ***

rickg

Assignee

Comment 19

25 years ago

Fixed by update to scanner.

Status: NEW → RESOLVED

Closed: 25 years ago

Resolution: --- → FIXED

Prashant Desale

Updated

25 years ago

Status: RESOLVED → VERIFIED

Prashant Desale

Comment 20

25 years ago

Verified with dbaron@fas.harvard.edu's testcase with the 1999-12-06-09 build.
Working perfectly. Marking as verified.

leger

Comment 21

25 years ago

Bulk move of all Networking-Core (to be deleted component) bugs to new
Networking component.

hobbit

Reporter

Comment 22

25 years ago

Thank you very much! Finally updated Mozilla to current as of 18th Dec 1999
and I now see a correctly-spelled 'Penguin' on the original page that started
all this and characters (bytes?) are no longer getting lost on the way.

Shame it's not possible to fix real typos on the web like this, too. :) Again,
thanks.

Peter "jag" Annema

Comment 23

24 years ago

Just being pedantic.

Bug 20451 was incorrectly marked as a duplicate of 19937; it should have been
marked as a duplicate of this one. Adding this comment just for future reference
(or is that historical purposes?).

Péter Bajusz

Comment 24

24 years ago

*** Bug 20451 has been marked as a duplicate of this bug. ***

Péter Bajusz

Comment 25

24 years ago

*** Bug 20451 has been marked as a duplicate of this bug. ***