Closed Bug 530328 Opened 12 years ago Closed 12 years ago

error behavior of ISO-8859-* decoders (other than ISO-8859-1) depends on previous user of decoder


(Core :: Internationalization, defect, P1)




Tracking Status
status1.9.2 --- beta4-fixed


(Reporter: dbaron, Assigned: dbaron)




(1 file)

This is a continuation of the underlying problem that I discovered while continuing to investigate bug 525581.  I'm filing it as a separate bug so that that bug doesn't keep going, and because I think this bug should be a 1.9.2 blocker on its own merits, even with that crash fixed in the hunspell code.

For a long time, nsCharsetConverterManager has made an optimization for ISO-8859-* decoders:  that they are stateless and therefore the same decoder object can be handed out to any caller. changed all decoders to carry state regarding their error-handling behavior.  This doesn't affect our ISO-8859-1 decoder (since it has a table that maps every byte, and therefore has no concept of an error), but it affects all other ISO-8859-* decoders, and means they are no longer truly stateless.  This error behavior defaults to "do it the old way" but the only (non-HTML5-parser) caller in the tree that sets it explicitly (the HTML/XML-parser) sets it to the strict mode.  This means that once you've parsed an XML (or, I think, HTML) file in an ISO-8859-* encoding, the decoder is set into signaling error handling mode.  (For example, in the Polish localization, the ISO-8859-2 decoder gets set into signaling mode during startup, and this is an important encoding for Polish.)

This doesn't affect the loading of HTML, scripts, or stylesheets, since those callers make repeat calls to the decoder on error.  (I think this may still be required for some decoders even in non-signaling mode.)  However, other users of the decoders (e.g., plugin code, font code) do not, and I'm worried we could end up with a bunch of hard-to-diagnose regressions in some locales (particularly users of the locales that use ISO-8859-* encodings other than ISO-8859-1; see ).

I'm going to attach a patch to this bug (already attached to bug 525581) to remove the special optimization for ISO-8859-* that reuses the supposedly-stateless decoders.  This bug *may* cause a measurable performance regression; it's hard to know without landing it.

I'm then going to attach patches to two other bugs that should help get that performance regression back (though one of them may not be a significant help, but I've had it in my tree for ages).
Flags: blocking1.9.2?
Attachment #413833 - Flags: review?(smontagu)
The two performance improvement bugs are bug 530330 and bug 213197.  The latter may not make much difference.
Attachment #413833 - Flags: review?(smontagu) → review+
Closed: 12 years ago
Priority: -- → P1
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a1
Attachment #413833 - Flags: approval1.9.2?
Attachment #413833 - Flags: approval1.9.2?
Flags: blocking1.9.2? → blocking1.9.2+
Haven't seen any performance notifications (not yet, anyway).
You need to log in before you can comment on or make changes to this bug.