Closed Bug 129387 Opened 20 years ago Closed 5 years ago

speed up Unicode en/decoders for large character sets(CJK)

Categories

(Core :: Internationalization, defect)

x86
Linux
defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla56

People

(Reporter: nicolas, Assigned: hsivonen)

References


Details

(Keywords: intl, perf, Whiteboard: [fixed by encoding_rs])

Attachments

(2 files, 6 obsolete files)

Gtk platform.
Any page with Korean characters will use a lot of CPU.
During page loading, several seconds are spent in nsFontGTKNormal::GetWidth().
Any page with the charset "euc-kr" will spend, on average, 50 to 100 times more
time in this function than a western one.
It is highly related to the performance of the function uMapCode() in
intl/src/uconv/umap.c.

Within the function, a sequential search is used to find the correct mapping in
the *.ut/*.uf tables. So, the larger the table, the poorer the performance. I have
tried using binary search instead. I got a great improvement, but it slightly broke
those mapping tables that are not sorted by the srcBegin field. That can be fixed by
regenerating the tables sorted by srcBegin. However, I couldn't find the raw tables.
Can anyone point me to where to get them?
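The linear-vs-binary trade-off described above can be sketched as follows. This is a hypothetical, simplified model of range-style entries; the real uTable/MapCell layout in umap.c differs (it has several entry formats), so the names and struct here are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t srcBegin;  /* first source code covered by this range */
    uint16_t srcEnd;    /* last source code covered by this range  */
    uint16_t destBegin; /* target for srcBegin; the rest follow linearly */
} MapCell;

/* Linear scan: O(n) per lookup, the behavior this bug complains about
 * for large CJK tables. */
static int mapLinear(const MapCell *cells, size_t n, uint16_t in, uint16_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (in >= cells[i].srcBegin && in <= cells[i].srcEnd) {
            *out = (uint16_t)(cells[i].destBegin + (in - cells[i].srcBegin));
            return 1;
        }
    }
    return 0;
}

/* Binary search: O(log n), assuming entries are sorted by srcBegin and
 * the ranges do not overlap. */
static int mapBsearch(const MapCell *cells, size_t n, uint16_t in, uint16_t *out)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (in < cells[mid].srcBegin) {
            hi = mid;
        } else if (in > cells[mid].srcEnd) {
            lo = mid + 1;
        } else {
            *out = (uint16_t)(cells[mid].destBegin + (in - cells[mid].srcBegin));
            return 1;
        }
    }
    return 0;
}
```

For a KS X 1001-sized table with thousands of entries, the binary variant does a dozen or so comparisons per lookup instead of hundreds on average, which is where the reported speed-up comes from.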
Keywords: intl
QA Contact: ruixu → ylong
assign to smontagu for performance
Assignee: yokoyama → smontagu
Simon: I'd recommend you chat with Frank about this.
Confirmed that the referred URL page does use a lot of CPU on RH7.2.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: perf
The performance issue resembles bug 113549 (thai (?) fonts)
When I loaded http://www.nstda.or.th/newsroom/pr/pr010998.html at the beginning
of December, it took around two minutes to load.
When I loaded it again in the middle of February, it took over 5 minutes.
(Linux, builds from around the time of testing)
Status: NEW → ASSIGNED
Blocks: 157673
Probably adding cache code to the Korean converter could solve this issue.
Reassigning back to ftang.
I won't fix this issue for a while; currently, Korean Linux is not a
priority for my team. Anyone who wants to improve it is welcome to take it (hint
hint jshin, katakai). I think the ways to improve it are to
1. rewrite the Korean converters the same way we did the Simplified Chinese one, or
2. add cache code for the umap (I have some half-finished stuff there) and use a
32-, 64-, or 128-entry array to cache the mapping result based on the lower 5, 6,
or 7 bits of the input code.
Marking it as Future for now; taking out blocking 157673.
People who care about Linux/Unix performance or Korean performance are welcome to
grab this bug and work on it. This is not a priority for my group right now.
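Option 2 (a small array cache keyed on the low bits of the input) might look like the sketch below. All names are illustrative, not code from the tree; also note that the real uMapCode() takes a table argument, so a real cache would have to be per-table (or flushed when the table changes), which this sketch glosses over.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_BITS 6                 /* 64-entry cache, indexed by low 6 bits */
#define CACHE_SIZE (1 << CACHE_BITS)

typedef struct {
    uint16_t in;    /* cached input code     */
    uint16_t out;   /* cached mapping result */
    uint8_t  valid; /* slot has been filled  */
} CacheSlot;

static CacheSlot gCache[CACHE_SIZE];

/* Stand-in for the expensive table walk in uMapCode(); a fake mapping
 * is used here so the sketch is self-contained. */
static uint16_t slowMap(uint16_t in)
{
    return (uint16_t)(in + 0x1000);
}

static uint16_t cachedMap(uint16_t in)
{
    CacheSlot *slot = &gCache[in & (CACHE_SIZE - 1)];
    if (slot->valid && slot->in == in)
        return slot->out;            /* hit: skip the table walk entirely */
    /* miss or collision: recompute and overwrite the slot */
    slot->in = in;
    slot->out = slowMap(in);
    slot->valid = 1;
    return slot->out;
}
```

The direct-mapped design means two codes sharing low bits evict each other, but since text tends to reuse a small working set of characters, even a tiny cache can absorb most lookups.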
Assignee: smontagu → ftang
No longer blocks: 157673
Status: ASSIGNED → NEW
Target Milestone: --- → Future
> Gtk platform.
> Any page with korean characters will use a lot of CPU.
> During the page loading, several seconds are spent in 
> nsFontGTKNormal::GetWidth().
> Any page with a charset "euc-kr" will spend on average 50 to 100 times more
> time in this function than if it is a western one.

Thank you for catching this problem. I hadn't noticed it, partly because
I usually use ksc5601.1992-3 fonts instead of ksc5601.1987-0 fonts.
Using ksc5601.1992-3 fonts reduces the delay significantly,
because only the decoding from
euc-kr to Unicode goes through uMapCode(), while the conversion in
the other direction is algorithmic (for Hangul syllables, which make up
more than 90% of Korean text). Still, I've seen Mozilla take up
~25% of CPU time (on my P III 800 MHz box) while loading Korean pages
in EUC-KR.

> 1. rewrite the korean converters use similar way we did in simp chinese one,
A couple of years ago, when I knew little about uscan.c/umap.c/ugen.c,
I considered doing this because I had already done a similar job
for the glibc iconv module. Now I'm a bit reluctant to take this path,
because I would have to take care of all the nuisances I've already
taken care of in uscan.c for various edge cases (of Korean
pages) all over again.

> or
 
> 2. add a cache code for the umap (I have some half stuff there) and use either
> 32, 64, or 128 array to cache the mapping result based on the lower 5,6,7 bits
> of the input field.

 This is certainly more universal than 1. in that it'll speed up other
converters as well. Could you attach your half-finished stuff here so that
I can build upon it?

In addition to this, we can also implement
binary search as suggested by Jacky in comment #1. Of course,
in that case, umaptable.c has to be modified to sort
items in srcBegin order, and all the *.ut/*.uf
tables have to be regenerated with the new umaptable.c. I've just
taken a look at umaptable.c and found that it's doable.

Status: NEW → ASSIGNED
I've just tried Jacky's idea (binary search) and got about a 4-to-5-fold
speed-up (2.0 s → 0.55 s, 1.6 s → 0.30 s, 0.80 s → 0.21 s,
0.52 s → 0.14 s) loading local documents (made only
of text cut-and-pasted from several real-life Korean documents) in EUC-KR.
Graphic-rich pages at remote locations would not see this much
reduction in loading time (in terms of the reduction factor),
but I guess it's still worth putting in.

I used a ksc5601.1987-0 font to amplify the speed differential.
This way, both encoding and decoding (to/from Unicode) heavily use
uMapCode() in umap.c. With ksc5601.1992-3 fonts under Linux/Unix/X11
and truetype fonts (with a Unicode cmap) under MS-Windows/MacOS, only decoding
depends heavily on uMapCode(). Nonetheless, making uMapCode()
faster benefits not only Korean Linux/Unix but also
any encoder/decoder that uses it, on all platforms.


Simply sorting entries in ut/uf files by srcBegin does not
work, because format 2 and format 0 entries following a format 1 entry
can be completely within the range covered by the preceding format 1
entry. Suppose a ut/uf file contains the following entries,
and the key we're looking up is 0x2145:
0x2121, 0x217e, 0x0100,   /* format 1 : entry 20*/
0x2135, 0x2140, 0x0300,   /* format 0 : entry 21 */
0x2162, 0,      0xf320,   /* format 2 : entry 22 */

If the middle point happens to be entry 21, the beginning of
the next search bracket would be set to entry 22 (i.e.,
moving away from entry 20, where 0x2145 actually belongs).

To work around this problem, I split MapCellArray into two parts:
one composed of format 0 and format 2 entries, followed by
another made of format 1 entries.
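A minimal sketch of that split, under simplifying assumptions: here both sections hold sorted, non-overlapping [srcBegin, srcEnd] ranges, a format 2 single entry is stored as a one-code range, and the format 0/2 section is consulted before the format 1 section. The real umap.c formats carry more structure, so names and layout are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t srcBegin;  /* first source code in the range           */
    uint16_t srcEnd;    /* last source code (== srcBegin if single) */
    uint16_t destBegin; /* target for srcBegin; rest follow linearly */
} MapCell;

/* Binary search over one sorted, non-overlapping section. */
static int findInSection(const MapCell *c, size_t n, uint16_t in, uint16_t *out)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (in < c[mid].srcBegin) {
            hi = mid;
        } else if (in > c[mid].srcEnd) {
            lo = mid + 1;
        } else {
            *out = (uint16_t)(c[mid].destBegin + (in - c[mid].srcBegin));
            return 1;
        }
    }
    return 0;
}

/* Two-section lookup: try the format 0/2 section first, then fall back
 * to the format 1 ranges, so entries nested inside a format 1 range no
 * longer derail the binary search. */
static int mapSplit(const MapCell *f02, size_t n02,
                    const MapCell *f1, size_t n1,
                    uint16_t in, uint16_t *out)
{
    return findInSection(f02, n02, in, out) || findInSection(f1, n1, in, out);
}
```

With entries 20-22 from the example split this way, looking up 0x2145 misses the format 0/2 section cleanly and lands on entry 20 in the format 1 section, instead of being bracketed away from it.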
  
There is also a Korean-specific optimization I haven't tried yet.
Currently, the Korean converters use monolithic mapping tables
for KS X 1001. The problem with this is that Hanja and special
symbol characters take up the bulk of the space in the table,
slowing down the look-up of Hangul syllables, which make up
most Korean web pages. By breaking up u20kscgl.ut/uf into
three separate tables, look-up of Hangul syllables can get
faster. To a much lesser degree, breaking up u20cp949.ut/uf may
help a bit (although there's a little trade-off).
These break-ups, along with binary search in uMapCode(), would,
I think, have roughly the same effect as rewriting the Korean
converters the way the SC converters were rewritten in 2000.

On top of these, caching can be implemented for a further
speed-up.
I need to regenerate the ut/uf files for JA/TW/CN with a new umaptable.c
to make my patch (for umap.c)
work with the JA/TW/CN mapping tables. For some of them, the source mapping
tables are obvious, but for others I can't figure out where the source
mapping tables come from.

Frank,
Could you tell me what source mapping tables you used to generate
hkscs.uf|ut (HKSCS has a few 'variants') and jis0208ext.uf? Did you use
CP950.TXT from the Unicode ftp archive for big5.uf|ut?
I'll take another look at this in a few days. In the meantime,
I just want to note that this bug may not be as critical as before,
because Xft is rapidly being deployed for Linux/*BSD, and X11 core
fonts will be used less as time goes on (hopefully
commercial Unixen will follow soon...). On Xft-enabled
Mozilla, 'encoding' (from Unicode to font encoding) is no longer necessary
in most cases (custom-cmapped TTFs are exceptions),
as on Windows and MacOS (with TTFs with a Unicode cmap).
However, this still needs to be taken care of, not only for
Linux but also for Windows/MacOS, because I observed Mozilla
momentarily take up about 90% of CPU time on Win2k as well while rendering
some Korean pages. This is not to say that the sole reason
for the spike in CPU usage is inefficient converters;
there are other factors, like the excessive use of flash
plugins on many Korean pages. To figure out what is
most to blame, I'll try some 'text only' pages and gather some performance
statistics with profiling enabled.
> I observed Mozilla
> take up about 90% of CPU time momentarily on Win2k as well while rendering
> some Korean pages. This is not to say that the sole reason
> for spiking up of CPU usage is inefficient converters.
> There are other factors like very excessive use of flash 
> plugins in many Korean pages.  

  With the flash plugin removed, several Korean pages that took up 90% CPU
time on my P III 750 MHz box while being loaded never take up more than 50% CPU
time.
This was observed with an optimized Linux build with X11 core fonts (Gtk).

I also ran jprof on it (without the flash plugin)
and found that Mozilla indeed spent the longest time
in |uMapCode|. According to jprof, it spent about 5.5% (the average of
three runs) of the total CPU time it used in |uMapCode|.
With attachment 95359 applied, the percentage of CPU
time spent in |uMapCode| and |uMapBsearch| dropped to 0.6% (again
the average of three runs). Although I didn't do any systematic study,
it seems obvious that binary search alone speeds things up a lot.
Using binary search is roughly equivalent to the first
solution Frank suggested in comment #7.

Now, the question is whether we need caching on top of this.
It depends on the overhead of caching, the frequency of character
occurrence, and the sizes of the maps looked up, among other things. The performance
with caching will also be a function of the time elapsed since Mozilla launched.

Frank, what would you say?
How about just going ahead with the binary search patch first?
To do that, I need to regenerate the mapping tables. As I wrote in
comment #12, can you tell me where to get the source
files for hkscs, big5, and jis0208ext?

I tracked down all the sources of the CJK uf|ut files with LXR and Bugzilla and
regenerated them all with a new version of umaptable.c.
Attachment #95359 - Attachment is obsolete: true
I used this script to download the source tables and regenerate the
uf|ut files for the CJK converters. The script also documents
all the sources and related bugs for the CJK converters.
Somehow, the source mapping tables for hkscs.uf|ut produced by
following the procedure given by Anthony in bug 182089 are different
from those he attached to that bug, so this script downloads
them (the source mapping files) directly and feeds them to umaptable.c.

The new uf|ut files I generated are available at
http://jshin.net/moztest/129387.umaps.tar.gz
I'll upload them here if necessary (it's about 1.9 MB).
Attachment #109630 - Attachment is obsolete: true
I'm adding some people who contributed to the CJK converters in the past
(bug 108136, bug 54135, bug 182089, bug 79275, bug 61422),
because my patch requires regenerating the uf/ut files for the CJK converters
to enable binary search.
I faithfully followed the procedures described in the
bugs listed to regenerate the source mapping
tables and uf/ut files, so there should be no regression.
I tried some test pages given in bugs 182089 and 61422,
as well as random CJK web sites, and have not come across any regression so
far. Nonetheless, it'd be nice if the newly generated uf/ut files
(along with my patch) were tested by those more familiar with
cases that can unveil flaws in new uf|ut files (especially
the cases that led you to report the bugs listed above).

Basically, what my patch does is split uf|ut files into
two sections, one with format 0 and format 2 entries and the other
with format 1 entries, with the entries in each section sorted.
This way, binary search can be used by |uMapCode| in umap.c,
speeding up the look-up by a significant factor for large umap tables
(for CJK). I set the cut-off for linear search at 0x30 entries
(this is not optimal), but I used that value to avoid regenerating
the ut|uf files in ucvlatin and ucvmath (there are some medium-size
uf|ut files with as many as 0x2e entries).
Regenerating them as well would increase the QA requirement (although
I'm pretty sure there'd be no regression) without much
gain.
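The cut-off described above could be sketched as a hybrid lookup: below a size threshold the old linear scan is kept (so small, possibly unsorted tables need not be regenerated), and at or above it the table is assumed sorted and binary-searched. The threshold value mirrors the 0x30 mentioned; the entry layout and names are simplified stand-ins, not the umap.c originals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINEAR_SEARCH_CUTOFF 0x30  /* tables smaller than this stay linear */

typedef struct {
    uint16_t srcBegin, srcEnd, destBegin;
} MapCell;

static int mapLookup(const MapCell *c, size_t n, uint16_t in, uint16_t *out)
{
    if (n < LINEAR_SEARCH_CUTOFF) {
        /* Small table: linear scan works even if entries are unsorted,
         * so these tables need not be regenerated. */
        for (size_t i = 0; i < n; i++) {
            if (in >= c[i].srcBegin && in <= c[i].srcEnd) {
                *out = (uint16_t)(c[i].destBegin + (in - c[i].srcBegin));
                return 1;
            }
        }
        return 0;
    }
    /* Large table: assumed sorted by srcBegin with non-overlapping
     * ranges, so binary search applies. */
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (in < c[mid].srcBegin) {
            hi = mid;
        } else if (in > c[mid].srcEnd) {
            lo = mid + 1;
        } else {
            *out = (uint16_t)(c[mid].destBegin + (in - c[mid].srcBegin));
            return 1;
        }
    }
    return 0;
}
```

For a ~0x2e-entry table the linear scan is already cheap, which is why raising the cut-off to cover those tables costs little while sparing their regeneration.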
 
As I wrote in comment #12, I ran jprof on Mozilla with X11 core fonts with
my patch, and the percentage of CPU time spent in uMapCode()
dropped significantly. This would mostly help
Mozilla with X11 core fonts, which spends quite a lot of time
in the conversion from Unicode to font encoding. Somehow,
Mozilla spends a lot less time in uMapCode() the other way around.

BTW, could someone change the summary line to 'make Unicode
converter (or uMapCode look-up) more efficient'? This is not only
for Korean but also for others with large uf|ut tables.

Thank you in advance for testing.
Summary: Korean font : 90% CPU in nsFontGTKNormal::GetWidth() → speed up Unicode en/decoders for large character sets(CJK)
Now that the ftang@netscape.com address doesn't work any more, I'm taking this.
Assignee: ftang → jshin
Blocks: 116645
Status: ASSIGNED → NEW
Blocks: 226183
Attached patch updated patch (obsolete) — Splinter Review
got rid of #if 0'd blocks
Attached patch updated patch — Splinter Review
got rid of #if 0'd blocks
Here are the instructions for testing the patch (as I promised bz
in bug 227106):

1. apply the patch
2. download this attachment and move it to intl/uconv/tools
3. cd to intl/uconv/tools
4. run the script
   (if you're behind a firewall, you may need to edit the script
    to run 'wget' with the '--passive-ftp' option)
5. recompile intl/uconv
6. take a profile of a Mozilla run during which you visit
   some CJK sites in legacy encodings (http://tw.yahoo.com,
   http://cn.yahoo.com, http://www.yahoo.co.jp, http://www.yahoo.co.kr)
Attachment #109628 - Attachment is obsolete: true
Attachment #109633 - Attachment is obsolete: true
Attachment #114005 - Attachment is obsolete: true
Attachment #136679 - Attachment is obsolete: true
The patch for the umap files is too big to upload here; that's why I uploaded a
shell script to generate them. Anyway, it's available at
http://i18nl10n.com/mozilla/129387.umaps.patch.bz2 (it's > 800 kB in bzip2).
Status: NEW → ASSIGNED
QA Contact: amyy → i18n
Depends on: encoding_rs
Fixed and wontfixed by bug 1261841. Decode got faster for legacy CJK encodings. Encode got faster for EUC-KR. Encode got slower for the ideographic parts of legacy Chinese and Japanese encodings, because we no longer need that functionality for perf-sensitive stuff (it is no longer used for fonts, as reported here), so those were optimized for data size instead.
Assignee: jshin1987 → hsivonen
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Whiteboard: [fixed by encoding_rs]
Target Milestone: Future → mozilla56