Closed Bug 638381 Opened 13 years ago Closed 7 years ago

define nsIUnicodeDecoder::Convert contract more clearly

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- wontfix

People

(Reporter: dbaron, Assigned: hsivonen)

References

Details

(Keywords: sec-audit, Whiteboard: [sg:audit][fixed by encoding_rs])

I think we should define the nsIUnicodeDecoder::Convert contract much more clearly than it's currently defined, and fix both callers and decoders to follow that contract.  Some callers may be making assumptions that may not be guaranteed by all decoders, and some decoders may be violating the rules in various ways.

In particular, I think we should look at the Convert contract as a state machine, in which the return value from Convert represents the current state.  This would imply, for example, that passing an empty source buffer can't change the state (see bug 638236).  I think the contract should require that the manner in which the source buffer is broken up into calls to Convert must not change the result stream.  (A whole bunch of decoders currently violate this.)

The states would essentially be:
 1. waiting for input (normal)
 2. waiting for input with state buffered from previous input that has not yet been output
 3. have output to write
 4. error
in which only some transitions between these states would be allowed.  (In particular, the allowed transitions would be 1->2, 1->3, 1->4, 2->2, 2->3, 2->4, and 3->1.  (But see also bug 638379.)  And the Convert method should return **if and only if** the state machine is (a) in state 4, (b) in state 3 and the output buffer is full, or (c) in state 1 or 2 and the input buffer is empty.  (It's tempting to allow the transition back to state (1) when the output buffer is full just to check if the input buffer is simultaneously 

This implies some things that have, I think, gotten us into trouble in the past:
 * if the output buffer is filled, then the only valid response is NS_OK_UDEC_MOREOUTPUT.  (Or *maybe* we'd want to allow NS_OK, but I tend to think not doing so keeps things simpler.)
 * if the input buffer is not fully consumed, then either an error must be reported or the output buffer must be filled


That said, this seems like a rather large project.  We should perhaps consider importing some other library to do this rather than maintaining the code ourselves.
These correspond to return values:

(In reply to comment #0)
>  1. waiting for input (normal)

NS_OK

>  2. waiting for input with state buffered from previous input that has not yet
> been output

NS_OK_UDEC_MOREINPUT

>  3. have output to write

NS_OK_UDEC_MOREOUTPUT

>  4. error

NS_ERROR_ILLEGAL_INPUT



> (b) in state 3 and the output buffer is full

In this case, the fullness check is done *after* outputing _each_ PRUnichar.  Passing an empty output buffer would be forbidden by the API.

> (c) in state 1 or 2 and the input buffer is empty.

In this case the emptiness check is done *before* reading _each_ char.
To fix a long-standing bug, we should have a way for the caller to signal end of stream in such a way that the converter can produce output (a REPLACEMENT CHARACTER) on that call.
Oh, and we'd also need 3->3 transitions to allow states that require multiple bytes of output.

That said, I think my descriptions of the states aren't very good, perhaps because I really need separate states to represent the "before reading input" vs. "after reading input" within (1) and (2), and likewise "before writing output" vs. "after writing output" within (3).
(In reply to comment #2)
> To fix a long-standing bug, we should have a way for the caller to signal end
> of stream in such a way that the converter can produce output (a REPLACEMENT
> CHARACTER) on that call.

Can we handle that now at the caller by differentiating based on whether the last Convert call returned NS_OK or NS_OK_UDEC_MOREINPUT, i.e., whether we ended in state (1) or (2)?

(Then again, I'd like to move away from handling that at the caller - bug 638379.)
Whiteboard: [sg:audit]
Keywords: sec-audit
Bug 733282 was caused by the lack of clear precedence in the case were situations described by multiple return values hold at the same time.
Group: core-security → layout-core-security
Bug 912470 added documentation for some undocumented-but-relied-upon aspects of the interfaces. (I had to discover what the interfaces *really* are when writing a new decoder and a new encoder...)
Depends on: encoding_rs
The API contract of the rewrite from bug 1261841 is documented. Rust side: https://docs.rs/encoding_rs/ . C++ side: https://hg.mozilla.org/mozilla-central/file/b266a8d8fd59/intl/Encoding.h
Assignee: smontagu → hsivonen
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Whiteboard: [sg:audit] → [sg:audit][fixed by encoding_rs]
Target Milestone: --- → mozilla56
Group: layout-core-security → core-security-release

Can we unhide this, considering that the last ESR that required keeping this hidden was 52?

Flags: needinfo?(dveditz)

Hm, not sure how these missed my "unhiding" queries. possibly because there's no "fixed" status-firefoxXX field? Will have to fix that.

Group: core-security-release
Flags: needinfo?(dveditz)
You need to log in before you can comment on or make changes to this bug.