Closed
Bug 108134
Opened 24 years ago
Closed 22 years ago
history database is endian; corrupts the display of titles
Categories
(Core Graveyard :: History: Global, defect)
Core Graveyard
History: Global
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jmorzins, Assigned: rbasch)
References
()
Details
Attachments
(3 files)
|
255.52 KB,
text/plain
|
Details | |
|
1.61 KB,
patch
|
alecf
:
review-
bzbarsky
:
superreview+
|
Details | Diff | Splinter Review |
|
7.29 KB,
patch
|
alecf
:
review+
bzbarsky
:
superreview+
|
Details | Diff | Splinter Review |
This bug is best experienced with screenshots, so I've put some up at
http://web.mit.edu/jmorzins/www/mozilla/history-endian.html
What's wrong:
Mozilla stores titles into the history database in an endian-dependent fashion.
What happens:
If a user switches between little-endian machines (like PCs) and big-endian
machines (like Sun workstations), they will only be able to read half of their
history titles at a time. The little-endian machine is able to read the
little-endian titles, and big-endian ones come out looking like garbled Chinese.
Likewise, the big-endian machine will be able to read the big-endian titles,
with little-endian titles coming out as garbled Chinese.
What should have happened:
Mozilla should either store names in an endian-independent fashion, or should
tag the Unicode strings in a way that allows Mozilla to recognize what
endianness the strings are.
Got any other information?
I don't understand the format of the history.dat file, but if it would be useful
to you I can provide mine.
Comment 1•23 years ago
|
||
No dupe, can we please get this off UNCO and into something else ??
This bug will be trivial for anyone at a large Unix site to duplicate, but
perhaps there aren't that many of us left. I'm attaching my history.dat file
to this bug report, in case examining it will help developers believe this bug
report.
To reproduce the bug:
1) On a big-endian machine, visit a web page. For example,
http://web.mit.edu/jmorzins/www/mozilla/big-endian.html
2) Quit mozilla (to save your history).
3) One a little-endian machine, visit a web page. For example,
http://web.mit.edu/jmorzins/www/mozilla/little-endian.html
4) Quit mozilla (to save your history).
5) Now re-start mozilla. Notice that:
(a) if you type URLs into the location bar, the pop-up autocompletion
menu has garbled (Chinese!) titles for other-endian platforms
(b) if you view your history, the list of titles in the history has
garbled (Chinese!) titles for other-endian platforms.
Updated•23 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Updated•23 years ago
|
Keywords: mozilla1.1
Updated•23 years ago
|
Blocks: profile-corrupt
Comment 3•23 years ago
|
||
Here's that bug I mentioned.... Looks like there's no problem with the URL
itself (I hate having a lossy memory). And titles are fucked up whether or not
they are ascii. So the base-64-decoding hypothesis seems like the most likely
one....
Comment 4•23 years ago
|
||
Looking at the example history.dat at the end of
http://web.mit.edu/jmorzins/www/mozilla/history-endian.html, it is in fact some
kind of encoding of UCS-2, not base64-encoded UTF-8, as I said before. (I was
thinking of the export format, not the internal one). The odd thing is that I
don't see a difference in endian-ness between the lines that show up differently
in the screenshots.
Hm. I don't remember if I copied the 30 excerpted lines of my history.dat file
at the same moment that I generated the screen shots. If I copied them after
some intervening web browsing, the endianness of page titles visible in the
excerpt might not correspond to the endianness of titles visible in the screen
shots.
I've been more careful about only visiting the "big-endian.html" URL when
browsing from a big-endian platform, and the "little-endian.html" URL when
browsing from a little-endian platform. At present my history.dat file contains
the following six entries for those two pages. Notice the difference between
the byte order of the 645E9 field and the 645F9 field.
(645E7
=http://web.mit.edu/jmorzins/www/mozilla/big-endian.html)(645E8
=1027946404095809)(645E9=$00v$00i$00e$00w$00e$00d$00 $00o$00n$00 $00b$00i$00\
g$00-$00e$00n$00d$00i$00a$00n)(645F7
=http://web.mit.edu/jmorzins/www/mozilla/little-endian.html)(645F8
=1027946493690063)(645F9
=v$00i$00e$00w$00e$00d$00 $00o$00n$00 $00l$00i$00t$00t$00l$00e$00-$00e$00n\
$00d$00i$00a$00n$00)
Comment 6•23 years ago
|
||
*** Bug 158127 has been marked as a duplicate of this bug. ***
| Assignee | ||
Comment 7•22 years ago
|
||
This patch to xpfe/components/history/src/nsGlobalHistory.cpp
fixes bug 108134 (titles in history.dat are byte order dependent), by
inserting and detecting a byte order marker in the title string.
(The patch is against Mozilla 1.2.1). If the byte order marker is
byte-reversed, the resulting Unicode string is byte-swapped.
Comment 9•22 years ago
|
||
Comment on attachment 117883 [details] [diff] [review]
Patch to use byte order marker in history.dat titles
+ PRUnichar c = (((p[i] >> 8) & 0xff) | (p[i] << 8));
Would it be better to have |(p[i] & 0xff) << 8| at the end there? I guess
we're assuming that PRUnichar is 16-bit anyway, so this is safe...
alec, would you review, please?
Attachment #117883 -
Flags: superreview+
Attachment #117883 -
Flags: review?(alecf)
| Assignee | ||
Comment 10•22 years ago
|
||
If PRUnichar is not 16 bits, then it should not get to the byte-swap
case. (And byte-swapping a 32-bit value, for example, would have to
be done differently anyway). I suppose it wouldn't hurt to change it
to "(p[i] & 0xff) << 8", or to check the size, but, while I am admittedly
a novice with this code, it looks like there will be other problems if
PRUnichar is not a 16-bit value.
Comment 11•22 years ago
|
||
Yeah, the codebase is littered with assumptions that PRUnichar is 16-bit...
Comment 12•22 years ago
|
||
Comment on attachment 117883 [details] [diff] [review]
Patch to use byte order marker in history.dat titles
oh man this is so wrong! do NOT check this in
first of all, the yarn belongs to mork, and is supposed to be read-only. do not
attempt to tweak the data in the yarn.
Second of all, you're putting the BOM at the front of every title string in the
database? that seems like a total waste. I think you need to set an attribute
in the file indicating the endianness of the file, and then converting as
appropriate whenever you read in the title.
the endianness of the file should be set the first time the DB is created..
then when the file gets passed back and forth between PCs and Macs, then the
machine can make the decision about how to do the conversion.
We really need to get to a world where we're using UTF8... *sigh* (that's a
whole other bug dealing with corrupting of short titles with non-zero
mYarn_Forms when you compact data)
Attachment #117883 -
Flags: review?(alecf) → review-
| Assignee | ||
Comment 13•22 years ago
|
||
> first of all, the yarn belongs to mork, and is supposed to be read-only.
The patch code does not modify the yarn buffer; it converts the characters
read from it as it stores them into the result, analogous to the way PRInt64
data is converted from/to a string.
> the endianness of the file should be set the first time the DB is created..
I thought about doing something along this line, but thought it was too
problematic for dealing with entries in an existing file. I would also
be a bit concerned about the possible "waste" of doing excessive conversions
when the user switches endianness. (In our environment, users commonly
switch back and forth between big and little-endian machines, which share
home directories).
> We really need to get to a world where we're using UTF8... *sigh* (that's a
> whole other bug dealing with corrupting of short titles with non-zero
> mYarn_Forms when you compact data)
I gathered that something like this was the case, from comments in the code.
How feasible would it be to address that bug in order to solve this problem?
Comment 14•22 years ago
|
||
ccing alecf so he'll see the comments and all... ;)
| Reporter | ||
Comment 15•22 years ago
|
||
I think it will be helpful to re-read the initial report, before talking about
passing a file back and forth between Macs and PCs. The file is on a network
filesystem, and never gets passed anywhere. Instead, a client mounts a network
filesystem and starts reading the file from the network.
It would be delightful if the endianness of the file were fixed. The problem is
that the endianness of the *Mozilla client* is not fixed. Intel-Unix Mozilla's
assume that the file is of one endianness, and write data into the file in
little-endian fashion. Sparc-Unix Mozilla's assume that the file is another
endianness, and write data into the file in big-endian fashion. After a few
days of browsing, switching platforms randomly each day, a user will be left
with a throroughly mixed-up database.
Comment 16•22 years ago
|
||
I've read through the comments - it doesn't matter how this file makes it from
one platform to the next, etc..
as for your issue of waste when you switch from big to little endian and back,
what seems wasteful is to optimize for the case where the user does switch back
and forth. when 99.99% of the users never switch platforms (let alone share
history.dat between platforms) I don't think we should be optimizing for those
0.01% of the users.
If we simply flag the file as "this is how you should be writing to the file"
then we don't have to encode 0xFFEF into every title string... it means we MAY
break backwards compatibility with any entries that are already in the history
file, but again that seems like the right case to optimize for. it seems quite
reasonable to add something to the release notes "when upgrading from mozilla
1.3 to mozilla 1.4 and then switching from one platform to another, page titles
created with the original platform may be corrupt until they are visited again"
i.e. we would check "does this history file have an endianness flag?" if not, we
set it to the current machine's endianness and just leave the old
wrong-endianness titles as they are. Then we just let the titles work themselves
out over time.. since history entries do expire over time, and titles are
reupdated as you visit them.
by the way, you can look at the current endianness of the platform with the
IS_BIG_ENDIAN / IS_LITTLE_ENDIAN flags...
Comment 17•22 years ago
|
||
some sample pseudocode:
#if IS_BIG_ENDIAN
if (file_is_big_endian)
SetStringValue(str);
else
SetStringValue(ConvertToLittleEndian(str));
#else
if (file_is_little_endian)
SetStringValue(str);
else
SetStringValue(ConvertToBigEndian(str));
#endif
| Assignee | ||
Comment 18•22 years ago
|
||
I agree that the code should optimize for the common case, all things being
equal, but the overhead of adding one character per title seems trivial to
me. Especially when you consider that the current code essentially doubles
the size of title strings in the file, i.e. by not converting to UTF8,
apparently merely to work around some other bug.
Can you refer me to more information on the nonzero mYarn_Forms bug? Is that
the only thing that prevents enabling the code to convert to UTF8?
Comment 19•22 years ago
|
||
Yeah, that's the only thing preventing us from going utf8. I don't know where
the bug is, but the problem is basically when you start using non-zero forms
like we were trying to do for UTF8, the database gets corrupted if you compact
it. I narrowed it down to short titles (4 characters or less, like say "TREK" on
www.trekbikes.com - though now they've changed it) - mork does not atomize
titles if they take less than 4 bytes.. and the parser would barf when it hit
one of these non-atomized strings.
that would really be a better bug to fix, but it means fixing mork! :(
| Assignee | ||
Comment 20•22 years ago
|
||
I tried enabling the UTF8 conversion code, and found that setting a nonzero
Form doesn't seem to work at all; all titles were corrupted, regardless of
size, apparently because the Form was not properly detected. I started to
look at the relevant code, but I fear that fixing this may require a greater
effort than I am able to make. (It might help if there were documentation
on the database format, and the implementation -- is such documentation
available? Or was that the reason for your ":("?...)
Comment 21•22 years ago
|
||
Two (obvious) workarounds, documented here only for completeness:
* Nuke the history (Edit -> Pref -> Navigator -> History -> Clear History)
each time you switch endian machines.
or
* Have two profiles, one for big-endian machines, one for little-endian
| Assignee | ||
Comment 22•22 years ago
|
||
This patch implements the suggested alternate solution, of setting the byte
order in the history file, and byte swapping title strings when running on
an other-endian machine. Changes include:
- A ByteOrder token is added to the history file meta row.
- When the file byte order is not yet set, it is initialized to the machine
byte order (either "BE" or "LE").
- When the file is opened, its byte order is noted, and a flag is set
indicating
whether Unicode strings need to be byte-swapped.
- In OpenNewFile(), code is added to create the meta row (fixes an existing
small bug which becomes more of a problem with the addition of the new
token.
- By adding byte-swapping methods for the explicit types, we can handle the
case where PRUnichar is either a 16-bit or 32-bit value; anything else
should cause a compile error.
- The file byte order is reset when the history is cleared.
The patch is against the Mozilla 1.2.1 base (as stored in our local CVS
repository).
Comment 23•22 years ago
|
||
Comment on attachment 126046 [details] [diff] [review]
Patch to set the byte order in the history file
For reference, this applies cleanly (modulo a few offsets) to a current trunk
tree.
Alec, could you please take a look at this?
Attachment #126046 -
Flags: review?(alecf)
Comment 24•22 years ago
|
||
great! the approach sounds ideal. I'll take a look at the patch sometime around
monday or tuesday (6/23-6/24)
however, this is something I'd like to see land on the trunk before it hits a
branch like 1.2.. I'm guessing enough has changed since 1.2 such that you're
going to need to make a new patch...
Comment 25•22 years ago
|
||
oops, just saw boris's comments.. amazing that it applies! :)
I'll still review on mon/tues
Comment 26•22 years ago
|
||
Comment on attachment 126046 [details] [diff] [review]
Patch to set the byte order in the history file
wow, love the AFS file path, haven't seen that in ages :)
sr=alecf
Comment 27•22 years ago
|
||
Comment on attachment 126046 [details] [diff] [review]
Patch to set the byte order in the history file
wow, love the AFS file path, haven't seen that in ages :)
sr=alecf
Attachment #126046 -
Flags: review?(alecf) → review+
Updated•22 years ago
|
Attachment #126046 -
Flags: superreview+
Comment 28•22 years ago
|
||
Changed a few idents to follow Mozilla convention (mFoo for members, aFoo for
function arguments, nsnull vs NULL) and checked into the trunk. Robert, thanks
for the patch and your patience.... I'm sorry this has taken so ridiculously
long. :(
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Updated•7 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•