Closed Bug 723448 Opened 12 years ago Closed 12 years ago

abcdump misreads string literals that begin with the UTF BOM

Categories

(Tamarin Graveyard :: Tools, defect)

Hardware: x86
OS: macOS
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pnkfelix, Assigned: pnkfelix)

Details

Attachments

(2 files)

Spawned off of Bug 707700

abcdump mishandles an abc compiled from the source file containing solely: '\ufeff1234'

I think this is likely an artifact of this code at the start of ByteArrayObject::readUTFBytes:

    String* ByteArrayObject::readUTFBytes(uint32_t length)
    {
        if (m_byteArray.Available() < length)
            toplevel()->throwEOFError(kEOFError);

        const uint8_t* p = (const uint8_t*)m_byteArray.GetReadableBuffer() + m_byteArray.GetPosition();
        // Skip UTF8 BOM (but it is still counted in the length we consume).
        if (length >= 3 && p[0] == 0xEFU && p[1] == 0xBBU && p[2] == 0xBFU)
        {
            p += 3;
            length -= 3;
        }
The attached code (attachment 593783) illustrates that a String starting with a BOM cannot be round-tripped through writeUTFBytes and readUTFBytes.

When I run this code, here is what I get:

  % $AVM testbom.abc 
  s1: 1234 s1.length: 5
  b.length: 7
  read(0) s:  s.length: 0
  read(1) s: ï s.length: 1
  read(2) s: ï» s.length: 2
  read(3) s:  s.length: 0
  read(4) s: 1 s.length: 1
  read(5) s: 12 s.length: 2
  read(6) s: 123 s.length: 3
  read(7) s: 1234 s.length: 4
  s1: 1234 s: 1234 s1.length: 5 s.length: 4 thus equal: false
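
The s.length column above can be reproduced from the BOM check alone. What follows is a minimal C++ sketch, not the Tamarin code: `readUtfBytesSketch` is a hypothetical name, it works on raw bytes held in a `std::string`, and it does not attempt UTF-8 decoding, so it models only how many bytes survive the skip.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the BOM check at the start of readUTFBytes.
// It returns the bytes that remain after the check; no decoding is done.
std::string readUtfBytesSketch(const std::string& buf, std::size_t length)
{
    // The real method throws an EOF error here; the sketch just asserts.
    assert(length <= buf.size());

    const unsigned char* p =
        reinterpret_cast<const unsigned char*>(buf.data());

    std::size_t skip = 0;
    // Same test as the quoted excerpt: a leading EF BB BF is dropped,
    // but only when at least three bytes were requested.
    if (length >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        skip = 3;

    return buf.substr(skip, length - skip);
}
```

For the 7-byte buffer EF BB BF 31 32 33 34, asking for 7 bytes returns the 4 bytes of "1234", and asking for 3 bytes returns an empty string. Requests for 1 or 2 bytes are too short to trigger the check, so the partial BOM bytes come through untouched.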
(In reply to Felix S Klock II from comment #1)
> Created attachment 593783 [details]
> code illustrating issue when using readUTFBytes
> 
> This code is illustrating that we cannot round-trip a String that starts
> with a BOM via writeUTFBytes and readUTFBytes.

(of course it is easy to compensate for this behavior.  investigating now.)
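
One way such compensation could look is sketched below in C++. This is an assumption for illustration only and is not the patch that landed: `startsWithUtf8Bom` and `restoreLeadingBom` are hypothetical names, and the decoded string is modelled as UTF-8 bytes rather than as an AVM String.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true when the raw bytes begin with EF BB BF.
bool startsWithUtf8Bom(const std::string& raw)
{
    return raw.size() >= 3 &&
           static_cast<unsigned char>(raw[0]) == 0xEF &&
           static_cast<unsigned char>(raw[1]) == 0xBB &&
           static_cast<unsigned char>(raw[2]) == 0xBF;
}

// Hypothetical compensation: if the reader dropped a leading BOM that was
// present in the raw bytes, put it back in front of the decoded text.
std::string restoreLeadingBom(const std::string& raw,
                              const std::string& decoded)
{
    // In this byte-level model the decoded text is never longer than its source.
    assert(decoded.size() <= raw.size());

    static const std::string kBom = "\xEF\xBB\xBF";
    const bool decodedHasBom = decoded.compare(0, kBom.size(), kBom) == 0;
    if (startsWithUtf8Bom(raw) && !decodedHasBom)
        return kBom + decoded;
    return decoded;
}
```

The caller inspects the raw bytes before reading, then passes both the raw bytes and the reader's result to `restoreLeadingBom`; input without a leading BOM is returned unchanged.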
(Moved from Bug 707700, comment 8.)

As I said on the original bug:

Ed: redirect review as you like (or rubber stamp if you prefer).

I am pretty sure we would prefer to accurately capture the string constant we read in (that is, I am claiming this is a 'real bug').
Assignee: nobody → fklockii
Status: NEW → ASSIGNED
Attachment #593816 - Flags: review?(edwsmith)
Attachment #593816 - Flags: review?(edwsmith) → review+
changeset: 7193:99987b969155
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 723448: abcdump: undo in readUTFBytes leading UTF BOM compensation (r=edwsmith).

http://hg.mozilla.org/tamarin-redux/rev/99987b969155
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED