Closed Bug 17889 Opened 25 years ago Closed 9 years ago

Changing character set reloads the page from web.

Categories

(Core :: Internationalization, defect, P3)

defect

Tracking

()

RESOLVED WONTFIX
Future

People

(Reporter: moshev, Assigned: jshin1987)

References

(Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [Hixie-P3] (HTTP))

Attachments

(1 file)

When a site (any site) has finished loading,
pick any character set from the view menu other
than current. The page will start loading from the net again.
I think it is not needed, since you have all the page source
in memory already.
P.S. Hope it belongs to i18n; if not, apologies.
Assignee: ftang → buster
buster, we simply do a reload of that page. Does reload in webshell get the data
from the net? Do we have a cache now?
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → REMIND
Target Milestone: M15
SeaMonkey has no cache yet.  I'll mark this REMIND, so that once a cache is in
place we can verify that it is used correctly.
This is similar to bug 6119 (view-source also reloads from the server).
Reopening since the cache has arrived, and changing the milestone since M15 is out.
Status: RESOLVED → REOPENED
Resolution: REMIND → ---
Target Milestone: M15 → M16
moving out to M18, added PERF keyword
I may try to find another owner for this.
Status: REOPENED → ASSIGNED
Keywords: perf
Target Milestone: M16 → M18
gordon, can you verify that this is no longer an issue?  I don't know how to 
tell if a page is being loaded from net or cache on my connection here at the 
office.  I think the things to verify are:
1) change charset 
2) view source
3) edit page
Assignee: buster → gordon
Status: ASSIGNED → NEW
OS: Linux → All
This is not just perf, it's a data-loss issue. Real-world example ... A Japanese 
customer at the Internet cafe started up IE 5, logged in to a Japanese Web-mail 
site, noticed that the Japanese text was displaying as gibberish, but didn't know 
how to change it ... so she started composing a new message anyway. Came the time 
to attach a file, all the button text was gibberish and she couldn't remember 
which button was the `Attach' button, so she asked for help. When we changed the 
encoding to Japanese, IE reloaded the page. Her entire message (entered into a 
TEXTAREA on the page, as is usual for Web-mail accounts) was lost, with no way to 
retrieve it. Ouch.

Relying on the cache is probably not a good idea. Mozilla should never reload the 
page on a change of character set, even if the page's caching information says it 
should always be reloaded, and even if the disk cache is set to zero. In this 
way, changing character set is similar to save, print, view source, etc.
URL: Any
Keywords: perf
Hardware: PC → All
Depends on: 40867
Sounds like the problem is higher level than the cache.
Assignee: ftang
Status: NEW → ASSIGNED
Keywords: nsbeta3
nsbeta3+ per bug meeting P2
Keywords: perf
Priority: P3 → P2
Whiteboard: nsbeta3+
How can we verify whether it reloads from the net or from the cache?
momoi, do you have a web server somewhere whose log we can check to tell?
Can you teach teruko/blee how to view the server log?
Yes. It's quite easy to tell from the server log whether reloading leads to a new
access to the server. blee already has access to a server and I'll be
happy to assist. If a more detailed form of access record is needed, that can
be arranged, too.
blee, can you try nsbeta2 with a page that has a META charset, and see whether it
accesses the server once or twice?
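A minimal sketch of the server-log check, assuming Python's standard http.server rather than the server blee actually used: every GET that reaches the handler bumps a counter and prints one access-log line, so a reload served from the cache leaves the count unchanged. The handler name, helper, and page body are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: every request that reaches it is a network hit."""

    hits = 0  # class-level counter shared by all handler instances

    def do_GET(self):
        CountingHandler.hits += 1
        # A page whose charset is declared only in a META tag
        body = b'<html><head><meta charset="Shift_JIS"></head><body>test</body></html>'
        self.send_response(200)
        self.send_header("Content-Type", "text/html")  # deliberately no charset=
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # One line per request, like an access log entry
        print(f"hit #{CountingHandler.hits}: {fmt % args}")


def start_server():
    """Serve on an ephemeral localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

To use it, open `http://127.0.0.1:<port>/` in the browser (the port is `server.server_address[1]`), then change the character set and see whether a new "hit" line appears.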
Whiteboard: nsbeta3+ → nsbeta3+, waiting for QA result from blee.
QA Contact: teruko → blee
Changed QA contact to blee@netscape.com.
Using nsbeta2:
1. If I hit a page with a META charset, it reloads, hitting the HTTP server
twice.
2. If I change the encoding, it reloads, hitting the HTTP server again.
3. If I hit the reload button, it reloads, hitting the HTTP server again.
Erm, a gentle reminder: the fix for this bug should have nothing to do with the 
cache. The page should not be reloaded from anywhere -- Net, cache, whatever -- 
otherwise you will have problems with DOM stuff (for example, form elements you 
have filled in, before changing the encoding, being inadvertently cleared). 
That's why this bug is dependent on bug 40867.
This bug is closely related to bug 6119, in which viewing source or saving
reloads from the web. Not marking as a dup, but this and 6119 are likely to have
the same fix.
When the user selects a "character set", it means the user has decided to say:
the character set the browser is currently using is not the right one; change
it to THIS charset.
By doing so, it WILL and it HAS to reload, because we need to reconvert the data
from bytes to Unicode BEFORE we pass it to the parser.
No matter how we fix it, it requires a RELOAD. The question is whether we reload from
the cache or from the net. If some code has been applied to the document
through the DOM, it has to be redone.

 Matthew Thomas :
>A Japanese customer at the Internet cafe started up IE 5, logged in to a
> Japanese Web-mail site, noticed that the Japanese text was displaying as 
> gibberish, but didn't know how to change it ... so she started composing
> a new message anyway. Came the time to attach a file, all the button text 
>was gibberish and she couldn't remember which button was the `Attach' button,
> so she asked for help. When we changed the encoding to Japanese, IE reloaded 
>the page. Her entire message (entered into a TEXTAREA on the page, as is 
>usual for Web-mail accounts) was lost, with no way to retrieve it. Ouch.

The story tells us:
1. If the user had corrected her mistake when she first saw the garbage, she
wouldn't have had this problem.
2. If people delay correcting the mistake, they pay a huge
price.
3. If s/he pays a huge price this time, s/he won't make such a mistake
again, and s/he will remember how to do it for the rest of her/his
life.
4. The current SeaMonkey behavior is no worse than IE's.

Suggest nsbeta3- 
Whiteboard: nsbeta3+, waiting for QA result from blee.
Frank: Matthew's story is fun, but it's not the worst case behavior of this bug.
The worst case is (for example) when, at the end of a big important transaction
(such as money transfer, stock transactions, etc.) a page is displayed as the
result of your cgi POST transaction. You need to change the charset, but in
doing so, you re-execute the POST because the document is reloaded. This doubles
your transaction, which you *really* don't want to do.

As you noted, there are two places we can get documents from: the cache, and
their original source. Necko does this for us. What we need is what's being
discussed in bug 40867 (see Bill Law's comments on 2000-8-24), which is a way
besides the cache for Necko to hang on to the current document.

See also bug 6119 which describes several other symptoms of this same problem.

You might want to reassign this or mark as a DUP of either 40867 or (better) 6119.
I don't think this is a dup. Once 40867 is fixed, it may be with a new function 
or whatever that the fix for this bug needs to call to get the source when it 
reconverts the bytes to Unicode. You may want to note in bug 40867 that the 
source needs to be retained at the byte level, not the Unicode level.

I can understand this getting nsbeta3- because it's `no worse than IE', though 
the use of IE as a measuring stick for whether bugs get +ed or -ed is mildly 
galling.
Marking as nsbeta3+ per I18N Bug Triage.
Whiteboard: [nsbeta3+]
>The worst case is (for example) when, at the end of a big important transaction
>(such as money transfer, stock transactions, etc.) a page is displayed as the
>result of your cgi POST transaction. You need to change the charset, but in
>doing so, you re-execute the POST because the document is reloaded. This 
>doubles your transaction, which you *really* don't want to do.
I don't understand this. If this transaction is important, the server should
send the charset in the HTTP Content-Type instead of leaving it blank and letting
the user switch it. If the server sends it correctly, the user won't even need to
change the character set. The server can easily fix both the display
problem and this issue by sending charset= in the HTTP header.
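A small illustration of the point about the HTTP header: once the server declares the charset, the client can read it directly and no manual switch is needed. This sketch uses Python's email.message.Message header parser as a stand-in for a browser's own Content-Type parser (an assumption for illustration only); the helper name is hypothetical.

```python
from email.message import Message


def declared_charset(content_type):
    """Hypothetical helper: the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None if it is absent
    return msg.get_content_charset()


print(declared_charset("text/html; charset=Shift_JIS"))  # shift_jis
print(declared_charset("text/html"))                     # None
```

When the result is None, the client has to fall back to a META tag or a guess, which is the case where the user ends up switching the character set by hand.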

>You may want to note in bug 40867 that the 
>source needs to be retained at the byte level, not the Unicode level.
For this particular bug, we should cache at the byte level, not at the Unicode
level, because switching the view encoding means we reinterpret the bytes as
Unicode.
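A minimal sketch of what byte-level retention buys, assuming a hypothetical RetainedDocument holder (not a Necko or Gecko API): the response body is kept as raw bytes, so switching the charset only re-decodes them and never triggers another fetch or POST.

```python
class RetainedDocument:
    """Hypothetical holder for a loaded page: keeps the body as raw bytes so a
    charset switch only re-decodes them and never re-fetches."""

    def __init__(self, raw: bytes, charset: str):
        self.raw = raw          # bytes exactly as received from the network
        self.charset = charset  # charset currently used to interpret them

    def text(self) -> str:
        # errors="replace" shows undecodable bytes as U+FFFD, as a browser would
        return self.raw.decode(self.charset, errors="replace")

    def switch_charset(self, new_charset: str) -> str:
        # The user's menu choice: same bytes, new interpretation, no network
        self.charset = new_charset
        return self.text()


raw = "こんにちは".encode("shift_jis")
doc = RetainedDocument(raw, "latin-1")   # wrong initial guess: mojibake
print(doc.text() == "こんにちは")         # False
print(doc.switch_charset("shift_jis"))   # こんにちは
```

This covers only the decoding step. Anything already built from the first decoding (DOM changes, form state) still has to be dealt with separately, as discussed above.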

I think we should nsbeta3- this bug.
PDT agrees this bug could be nsbeta3-
put (consider to cut) into the status whiteboard
Whiteboard: [nsbeta3+] → [nsbeta3+](consider to cut)
[nsbeta3-] per i18n bug meeting
Whiteboard: [nsbeta3+](consider to cut) → [nsbeta3-]
*** Bug 53724 has been marked as a duplicate of this bug. ***
*** Bug 55725 has been marked as a duplicate of this bug. ***
give this bug to nhotta since this is browser related.
Mark this as Moz0.9 and P3. We should decide what we want to do with this bug.
Assignee: ftang → nhotta
Status: ASSIGNED → NEW
Keywords: nsbeta3intl
Priority: P2 → P3
Whiteboard: [nsbeta3-]
Target Milestone: M18 → mozilla0.9
Target Milestone: mozilla0.9 → Future
Changing QA Contact to andreasb.
QA Contact: blee → andreasb
Changing QA contact to ylong@netscape.com.
QA Contact: andreasb → ylong
*** Bug 74043 has been marked as a duplicate of this bug. ***
Whiteboard: [Hixie-P3] (HTTP)

*** This bug has been marked as a duplicate of 82244 ***
Status: NEW → RESOLVED
Closed: 25 years ago → 23 years ago
Resolution: --- → DUPLICATE
I can still reproduce the problem reported by the original reporter when Mozilla
is in "online" mode (i.e. "view"->"character coding"->[select a different
charset] reloads from network). However, when Mozilla is in "offline" mode, it
seems to happily re-use the data from cache. Does the fix for bug 82244
completely fix this bug? Or is there a new bug filed for the "online" reload
behavior?

Win2k, build ID:2001072703 trunk
reopen
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Reassign to ftang.
Assignee: nhotta → ftang
Status: REOPENED → NEW
Depends on: 90722
Status: NEW → ASSIGNED
shanjian, can you help look at this one?
Assignee: ftang → shanjian
Status: ASSIGNED → NEW
Target Milestone: Future → mozilla0.9.6
I don't understand why this bug got reopened. Changing the charset will reload the
page. If the cache is available, we will use it; otherwise, we reload the page from
the web. It might look possible to redo the charset conversion without downloading
the data from the website, but in fact we would need a big architectural change to do
that. That is because the raw data is not kept anywhere except the cache, and
round-trip conversion usually is not possible. Don't try to reopen this bug. File a new
bug if you believe something does not work as expected, but changing the character set
does need to reload from the web (treat the network cache as a fast way to access the
web; the network cache is transparent to other components inside the browser).
Status: NEW → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → WONTFIX
Mark it as verified per Shanjian's comments.
Status: RESOLVED → VERIFIED
> It might looks possible to redo charset conversion without downloading data 
> from website, but in fact we need big arch change to do that.

Incompetence and/or laziness are not valid reasons for WONTFIXing a bug.

REOPENing. If you are unable or don't want to fix it, reassign it to nobody. This
is a valid bug. Just like with View Source, we should *never* reload from the
network unless the user has requested it.
Status: VERIFIED → REOPENED
Resolution: WONTFIX → ---
reassign to nobody. 
Assignee: shanjian → nobody
Status: REOPENED → NEW
0.9.6 is long gone. -> 0.9.7
Target Milestone: mozilla0.9.6 → mozilla0.9.7
Nobody is nobody@mozilla.org. Accept no cheap imitations. Reassigning.
Assignee: nobody → nobody
unfortunately this won't make it for 0.9.8 -> 0.9.9
Target Milestone: mozilla0.9.7 → mozilla0.9.9
reassign to ftang and future
Assignee: nobody → ftang
Target Milestone: mozilla0.9.9 → Future
Status: NEW → ASSIGNED
A neat idea would be to walk the DOM and change all textnodes/attribute-values 
of the parsed document when the encoding is changed. This assumes two things:

1. That the chars that affect parsing, <>="'&; etc, are the same in all
   encodings

2. that it's possible to losslessly decode and then encode a text stream through
   all charsets

I don't know if these assumptions are true for most or all charsets. However, if
it works, it would fully fix the reload problem as well as the problem
described in comment #7.
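A minimal sketch of the DOM-walk idea, using xml.etree.ElementTree as a stand-in for the real DOM (an assumption; Gecko's DOM is not this API). Each text node, tail, and attribute value is turned back into bytes with the old charset and reinterpreted with the new one, without reparsing. It relies on both assumptions above: the markup structure from the first decoding is kept as-is, and the old charset must round-trip losslessly. ISO-8859-1 does, because every byte maps to a character; a charset with undefined bytes would make recode() raise.

```python
import xml.etree.ElementTree as ET


def recode(s: str, old: str, new: str) -> str:
    # Recover the original bytes with the old charset, then reinterpret them
    return s.encode(old).decode(new)


def recode_tree(root: ET.Element, old: str, new: str) -> None:
    """Re-decode every text node, tail, and attribute value in place."""
    for el in root.iter():
        if el.text:
            el.text = recode(el.text, old, new)
        if el.tail:
            el.tail = recode(el.tail, old, new)
        for name, value in list(el.attrib.items()):
            el.attrib[name] = recode(value, old, new)


# A Shift_JIS page that was first displayed as ISO-8859-1
garbled = "ソフト".encode("shift_jis").decode("latin-1")
root = ET.Element("p", title=garbled)
root.text = garbled
recode_tree(root, "latin-1", "shift_jis")
print(root.text, root.get("title"))  # ソフト ソフト
```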
> 1. That the chars that affect parsing, <>="'&; etc, are the same in all
>    encodings

True for most encodings. Not true for UTF-7. Kill UTF-7.

All characters from "@" upward are in danger with SJIS. The ones you list are
below it, but I remember not being able _at all_ to display some pages in SJIS
before changing the encoding, so there must be some cases where it creates problems.
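Two quick checks of assumption 1, using Python's built-in codecs: in Shift_JIS a trail byte can fall in the ASCII range 0x40-0x7E, and in UTF-7 the markup characters themselves can hide inside base64 runs.

```python
# Shift_JIS: the trail byte of "ソ" is 0x5C, the ASCII backslash
print("ソ".encode("shift_jis"))  # b'\x83\\'

# UTF-7: "<p>" written with "<" and ">" inside base64 runs;
# an ASCII-compatible reading sees no "<" or ">" at all
raw = b"+ADw-p+AD4-"
print(raw.decode("ascii"))  # +ADw-p+AD4-
print(raw.decode("utf-7"))  # <p>
```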

> 2. that it's possible to losslessly decode and then encode a textstream though
>   all charsets

To achieve the intended purpose, all conversions from a local charset to Unicode
that encounter illegal characters should encode them as special, reserved
sequences that are guaranteed never to be displayed or later interpreted as
normal Unicode, and that can be reversed to recover the original value.

In fact, only 256 of those are needed. The expansion coefficient would be very
bad, but we don't actually care in this case.
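Python's surrogateescape error handler (PEP 383) is one real-world instance of this idea: on decoding, each undecodable byte 0x80-0xFF becomes a lone surrogate U+DC80-U+DCFF, which is never a valid character, and encoding with the same handler restores the byte. It needs 128 reserved code points rather than 256, because bytes below 0x80 always decode in ASCII-compatible charsets. The helper names are hypothetical.

```python
def decode_reversibly(raw: bytes, charset: str) -> str:
    # Undecodable bytes 0x80-0xFF become lone surrogates U+DC80-U+DCFF
    return raw.decode(charset, errors="surrogateescape")


def original_bytes(text: str, charset: str) -> bytes:
    # The same handler turns those surrogates back into the original bytes
    return text.encode(charset, errors="surrogateescape")


raw = "ソフト".encode("shift_jis")        # the page really is Shift_JIS
shown = decode_reversibly(raw, "utf-8")  # but was first decoded as UTF-8
print(original_bytes(shown, "utf-8") == raw)               # True
print(original_bytes(shown, "utf-8").decode("shift_jis"))  # ソフト
```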

I think this would be useful for some other bugs.
For example, there's that bug with non-7-bit characters in newsgroup names, which
are always interpreted as ISO-8859-1 even though they can be in some other charset.

The Unicode is reconverted to ISO-8859-1 when sending the data to the
news server, but if the name originally contained a character that is illegal in
ISO-8859-1, it will be lost.

I'm sure there are other cases like that.
Bug 40867 is now fixed. Fixing this bug should be possible now. Nominating for
Mozilla 1.0.1.
Keywords: mozilla1.0.1
> 1. That the chars that affect parsing, <>="'&; etc, are the same in all
>    encodings
Not true for ISO-2022-JP either.
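A quick check of this with Python's iso2022_jp codec: after the ESC $ B shift sequence, JIS X 0208 characters are written as pairs of bytes in 0x21-0x7E, so "あ" (JIS 0x2422) produces a literal double-quote byte. Read as ASCII, it could end an attribute value.

```python
raw = "あ".encode("iso2022_jp")
print(raw)                          # b'\x1b$B$"\x1b(B'
print('"' in raw.decode("ascii"))  # True
```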
Is this bug report good for Firefox as well, or is there a separate one for
Firefox (or should I submit one)?
what a hack. I have not touched mozilla code for 2 years. I didn't read these bugs
for 2 years. And they are still there. Just close them as won't fix to clean up.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 19 years ago
Resolution: --- → WONTFIX
Mass re-assigning bugs that Frank Tang closed on March 1st. Spam is his fault.

Mass Re-Open to follow
Assignee: ftang → nobody
Mass re-open of bugs Frank Tang closed with no good reason. Spam is his
fault, not my own.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Reassigning Frank's old bugs to Jungshik Shin for triage. Sorry for the spam.
Assignee: nobody → jshin1987
Status: REOPENED → NEW
Blocks: 288462
The fact that this is a real reload causes the page to not jump back to the
position it was at when the character set change was initiated. This is also a
problem for embedders.
Blocks: 336109
Flags: blocking1.9?
Keywords: dataloss
So...  bug 336109 is not really this bug, as this bug was filed.  It's a bug about us not persisting layout state, which is ON PURPOSE for charset reloads: see the CVS blame for why.
Point is, dupping it here makes sure that issue will never get addressed.

Comment 54 is about bug 336109, not this bug.  We restore scroll position just fine on loads from the network if we want.
I don't follow the last two comments. Apart from the totally uninformative "see the CVS blame for why", if not persisting layout state is on purpose, does that mean that bug 336109 WONTFIX, or does the issue need to get addressed?
This bug is about us sometimes pulling updated data from the server on charset reload.

Bug 336109 is about not preserving scroll position.  The reason we don't is that a single mechanism is used for scroll position restoration and form control state restoration.  And we don't want to restore the text for controls that had text by default, since it'll have been decoded with the wrong charset.  This came up recently in some bug where I did the CVS archeology to dig this stuff up....

So the point is, the scroll/cursor state not being restored is just a bug that needs fixing.  The form control state not being restored needs serious thought, because it's not clear what the right fix would be.  Perhaps we can consider just restoring it and hoping that the "value changed" checks we now do in content will prevent the wrong-charset-used-initially thing from biting.  But no matter what, all of that is not this bug.
Bug 391632 is what I was thinking.  So in fact, bug 336109 might be a duplicate of bug 134911.  Should fix the latter and then retest.  ;)
Attached patch experiment
FWIW, this is how I tried to fix this by doing the same as view-source does.
No longer blocks: 336109
Not a regression.  -'ing.
Flags: blocking1.9? → blocking1.9-
QA Contact: amyy → i18n
The Text Encoding menu is unused in 99.99% of Firefox sessions. It's not worthwhile to tweak cache behavior for such a rarely used feature.
Status: NEW → RESOLVED
Closed: 19 years ago → 9 years ago
Resolution: --- → WONTFIX
(In reply to Henri Sivonen (:hsivonen) from comment #64)
> 99.99%

Not a rhetorical percentage but an actual telemetry reading.
(In reply to Henri Sivonen (:hsivonen) from comment #64)
> The Text Encoding menu is unused in 99.99% Firefox sessions. It's not
> worthwhile to tweak cache behavior for such a rarely used feature.

Cache not being used for various excuses seems to be a common thread in many long-standing firefox issues. Even if some of these only happen in 0.01% of the sessions, taken together, they surely amount to more than that. And a behavior can become annoying even if it happens much less often than 100% of the time. So please fix these cache&history issues once and for all.

It might also be worthwhile comparing the time needed to fix it, with the time taken discussing it... If it takes more time (collectively) coming up with excuses and diversions why it shouldn't be fixed than it would take to just fix it, then that means it's worthwhile fixing.