Closed Bug 245384 Opened 21 years ago Closed 18 years ago

[ps] Unicode printing improvements

Categories

(Core :: Printing: Output, defect)

x86
Linux
defect
Not set
minor

RESOLVED WONTFIX

People

(Reporter: kherron+mozilla, Assigned: kherron+mozilla)

Details

(Keywords: memory-footprint)

Attachments

(2 files)

The PS printing module prints PRUnichar (16-bit) strings by writing the string to the PostScript output and calling a custom procedure called "unicodeshow". unicodeshow processes the string one byte pair at a time and uses a variety of rules to find a glyph matching the character code; one of the rules is to consult a hash table, embedded into each print job, that maps Unicode values to glyph names.

This process needs to be taken out and shot, but until that day comes, it could be made more efficient. Some issues that can be addressed are:

1) The hash table is hardcoded into Mozilla as a big string defining 1051 entries, which is copied verbatim into every print job. This wastes space both inside Mozilla and in each print job. It also precludes doing anything more intelligent with the contents of the hash.

2) The string is written to the PostScript output using a four-character octal escape for every byte. This is wasteful for latin1 text and makes it harder to read the PostScript source.

3) The unicodeshow procedure performs a lot of processing for each character. Mozilla should generate friendlier PostScript.
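To make the cost in issue 2 concrete, here is a minimal sketch of the escape-everything encoding. It is not the code from nsPostScriptObj.cpp: the names EscapeAllBytes and AppendOctalEscape are hypothetical, std::u16string stands in for a PRUnichar string, and emitting the high byte before the low byte is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: append one byte as a four-character octal
// escape (a backslash followed by three octal digits).
static void AppendOctalEscape(std::string& out, unsigned char byte) {
    char buf[5];
    std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(byte));
    out += buf;
}

// Sketch of the escape-everything approach: both bytes of every 16-bit
// code unit are written as octal escapes. The high-byte-first order is
// an assumption made for this illustration.
std::string EscapeAllBytes(const std::u16string& text) {
    std::string out;
    for (char16_t ch : text) {
        AppendOctalEscape(out, static_cast<unsigned char>(ch >> 8));
        AppendOctalEscape(out, static_cast<unsigned char>(ch & 0xFF));
    }
    // Two bytes per character, four output characters per byte.
    assert(out.size() == 8 * text.size());
    return out;
}
```

With this scheme the two-character string "Hi" becomes sixteen characters of PostScript source, none of which a reader would recognize as the original text.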
This patch introduces three things:

1) An enumerated type for the PostScript glyph names built into Mozilla
2) A compact data buffer storing the text of these glyph names
3) An array mapping Unicode values to glyph names for the unicodeshow fallback behavior

When generating PostScript to print a PRUnichar string, Mozilla will identify runs of latin1 characters and print those using an ordinary PostScript "show" operation. In this case, most characters just represent themselves, which reduces the size of the print job by seven bytes per character.

For characters outside the latin1 character set, Mozilla will generate a "unicodeshow" procedure call as usual. It will also track these characters in a character code map; at the end of the print job, it will use the new data tables to generate a unicode->glyph hash table customized to the current print job. For print jobs containing only latin1 text, Mozilla leaves out the hash table and unicodeshow PostScript logic entirely.

Mozilla also embeds a latin1 encoding vector into every print job. I've reworked that to be defined using the enum list instead of storing the actual character strings.

With my Linux debug builds, this patch reduces code size by about 9.5k of text and 1k of data. Printing the Mozilla start page, which is all latin1, yields a print job about 45k smaller. Documents with international text will see less improvement, and larger documents will see more improvement.
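The run-splitting idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: PsTextOutput, EmitText, and the helper functions are hypothetical names, std::u16string stands in for a PRUnichar string, a std::set stands in for the character code map, and the escaping rules for the latin1 runs (backslash-escape parentheses and backslashes, octal-escape anything outside printable ASCII) are my own guess at what "most characters just represent themselves" implies.

```cpp
#include <cassert>
#include <cstdio>
#include <set>
#include <string>

// Hypothetical result type: the generated PostScript plus the set of
// non-latin1 codes that would need entries in a per-job glyph table.
struct PsTextOutput {
    std::string ps;
    std::set<char16_t> nonLatin1;
};

static bool IsLatin1(char16_t ch) { return ch <= 0xFF; }

static void AppendOctal(std::string& out, unsigned char byte) {
    char buf[5];
    std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(byte));
    out += buf;
}

// Assumed escaping for latin1 runs: printable ASCII is written as-is,
// string delimiters get a backslash, everything else is octal-escaped.
static void AppendLatin1(std::string& out, unsigned char byte) {
    if (byte == '(' || byte == ')' || byte == '\\') {
        out += '\\';
        out += static_cast<char>(byte);
    } else if (byte < 0x20 || byte > 0x7E) {
        AppendOctal(out, byte);
    } else {
        out += static_cast<char>(byte);
    }
}

// Split the text into maximal runs of latin1 / non-latin1 characters.
// latin1 runs use "show"; other runs use "unicodeshow" with both bytes
// escaped (high byte first, an assumption) and are recorded for later.
void EmitText(const std::u16string& text, PsTextOutput& result) {
    std::size_t i = 0;
    while (i < text.size()) {
        const bool latin1 = IsLatin1(text[i]);
        std::size_t end = i;
        while (end < text.size() && IsLatin1(text[end]) == latin1) ++end;
        assert(end > i);  // every run holds at least one character
        result.ps += '(';
        for (; i < end; ++i) {
            if (latin1) {
                AppendLatin1(result.ps, static_cast<unsigned char>(text[i]));
            } else {
                AppendOctal(result.ps, static_cast<unsigned char>(text[i] >> 8));
                AppendOctal(result.ps, static_cast<unsigned char>(text[i] & 0xFF));
                result.nonLatin1.insert(text[i]);
            }
        }
        result.ps += latin1 ? ") show\n" : ") unicodeshow\n";
    }
}
```

If nonLatin1 is still empty when the job ends, the hash table and the unicodeshow definition can be omitted; otherwise only the recorded codes need table entries. A plain ASCII character costs one byte here instead of eight, which matches the seven-byte saving quoted above.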
nsPostScriptObj.cpp contains large blocks of lines which were just shifted right. Here's a diff of that file with -b added to the diff flags.
Attachment #149858 - Flags: review?(tor)
Comment on attachment 149858: Glyph name rework

I'm a bit uneasy about the large tables produced by an unreleased tool, as it somewhat limits our options if the mapping needs to be modified in the future. I'd like to see it polished as needed and checked into the tree somewhere. Other than that, r=tor.
Attachment #149858 - Flags: review?(tor) → review+
Closing this. The code in question is obsolete on the trunk, and only major bug fixes are being taken on the branches.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → WONTFIX