Closed Bug 815473 Opened 12 years ago Closed 12 years ago

Eliminate runtime computed constant tables

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla20

Tracking Flags:

Tracking

Status

firefox19

---

fixed

firefox20

---

fixed

b2g18

---

fixed

People

(Reporter: laszio.bugzilla, Assigned: laszio.bugzilla)

References

Details

Attachments

(3 files, 7 obsolete files)

Part 1/3: Replace runtime computed gUnicodeToGBKTable by constants. 12 years ago Ting-Yuan Huang 171.78 KB, patch	smontagu : review+	Details \| Diff \| Splinter Review
Part 2/3: Replace runtime calculated sUnpremultiplyTable/sPremultiplyTable with constants. 12 years ago Ting-Yuan Huang 811.16 KB, patch		Details \| Diff \| Splinter Review
Part 3/3: Replace the runtime computed FFS table with constants. 12 years ago Ting-Yuan Huang 4.09 KB, patch	justin.lebar+bug : feedback+	Details \| Diff \| Splinter Review
Bug 815473 - Replace runtime computed gUnicodeToGBKTable by constants. r=smontagu 12 years ago Ting-Yuan Huang 173.80 KB, patch		Details \| Diff \| Splinter Review
Part 2/3: Replace runtime calculated sUnpremultiplyTable/sPremultiplyTable with constants. 12 years ago Ting-Yuan Huang 11.85 KB, patch	roc : review+	Details \| Diff \| Splinter Review
Part 3/3: Replace the runtime computed jpeg_nbits_table with constants. 12 years ago Ting-Yuan Huang 5.44 KB, patch	justin.lebar+bug : review+	Details \| Diff \| Splinter Review
Part 1/3: Replace runtime computed gUnicodeToGBKTable by constants. r=smontagu 12 years ago Ting-Yuan Huang 173.80 KB, patch	akeybl : approval-mozilla-aurora+ akeybl : approval-mozilla-b2g18+	Details \| Diff \| Splinter Review
Part 2/3: Replace runtime computed sUnpremultiplyTable/sPremultiplyTable with constants. r=roc 12 years ago Ting-Yuan Huang 11.88 KB, patch	akeybl : approval-mozilla-aurora+ akeybl : approval-mozilla-b2g18+	Details \| Diff \| Splinter Review
Part 3/3: Replace runtime computed jpeg_nbits_table by constants. r=jlebar 12 years ago Ting-Yuan Huang 5.44 KB, patch		Details \| Diff \| Splinter Review
Part 3/3: Replace runtime computed jpeg_nbits_table by constants. r=jlebar 12 years ago Ting-Yuan Huang 5.44 KB, patch	akeybl : approval-mozilla-aurora+ akeybl : approval-mozilla-b2g18+	Details \| Diff \| Splinter Review

Ting-Yuan Huang

Assignee

Description

•

12 years ago

There are 4 coefficient tables computed at runtime:

   Num:    Value  Size Type    Bind   Vis      Ndx Name
194068: 012f3170 65536 OBJECT  LOCAL  DEFAULT   24 jpeg_nbits_table
180068: 012e0be4 65536 OBJECT  LOCAL  DEFAULT   24 _ZL17sPremultiplyTable
180066: 012d0be4 65536 OBJECT  LOCAL  DEFAULT   24 _ZL19sUnpremultiplyTable
 14744: 012bb15c 41984 OBJECT  LOCAL  DEFAULT   24 _ZL18gUnicodeToGBKTable

http://mxr.mozilla.org/mozilla-central/source/media/libjpeg/jchuff.c#24
http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxUtils.cpp#24
http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxUtils.cpp#25
http://mxr.mozilla.org/mozilla-central/source/intl/uconv/ucvcn/nsGBKConvUtil.cpp#18

They can be easily converted to constants to save 233KB .bss at the expense of elf size. Is it worth it?

The following two dynamically allocated tables are actually redundant of the above, although they are in quite different source trees.
http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/CanvasRenderingContext2D.cpp#3558
http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/CanvasRenderingContext2D.cpp#3362

Ting-Yuan Huang

Assignee

Updated

•

12 years ago

Blocks: 802446

Ting-Yuan Huang

Assignee

Comment 1

•

12 years ago

> The following two dynamically allocated tables are actually redundant of the
> above, although they are in quite different source trees.
> http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/
> CanvasRenderingContext2D.cpp#3558
> http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/
> CanvasRenderingContext2D.cpp#3362

These two seem to be called rarely. I played for a while without hitting the breakpoints.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 2

•

12 years ago

(In reply to Ting-Yuan Huang from comment #0)
> http://mxr.mozilla.org/mozilla-central/source/media/libjpeg/jchuff.c#24

Seems to me this could be converted to a 256-entry table storing the number of bits per byte, at very low cost (two array lookups and an add instead of one array lookup). Then storing it in .rodata definitely makes sense.

> http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxUtils.cpp#24
> http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxUtils.cpp#25
> http://mxr.mozilla.org/mozilla-central/source/intl/uconv/ucvcn/nsGBKConvUtil.
> cpp#18
> 
> They can be easily converted to constants to save 233KB .bss at the expense
> of elf size. Is it worth it?

Seems like a good idea especially since we can share that data across processes if we put it in .rodata.

> The following two dynamically allocated tables are actually redundant of the
> above, although they are in quite different source trees.
> http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/
> CanvasRenderingContext2D.cpp#3558
> http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/
> CanvasRenderingContext2D.cpp#3362

These are used by canvas getImageData and putImageData. These should definitely be modified to use the gfxUtils tables.

Ting-Yuan Huang

Assignee

Comment 3

•

12 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #2)
> (In reply to Ting-Yuan Huang from comment #0)
> > http://mxr.mozilla.org/mozilla-central/source/media/libjpeg/jchuff.c#24
> 
> Seems to me this could be converted to a 256-entry table storing the number
> of bits per byte, at very low cost (two array lookups and an add instead of
> one array lookup). Then storing it in .rodata definitely makes sense.

Sounds great! and it should have very little impacts on performance. I have to bother try server again...

By the way, this is actually a ceil(lgX) table in spite of its name. Although we may utilize instructions like CLZ in modern CPUs, mixing compiler dependent built-ins or directives would be a mess.

> > The following two dynamically allocated tables are actually redundant of the
> > above, although they are in quite different source trees.
> > http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/
> > CanvasRenderingContext2D.cpp#3558
> > http://mxr.mozilla.org/mozilla-central/source/content/canvas/src/
> > CanvasRenderingContext2D.cpp#3362
> 
> These are used by canvas getImageData and putImageData. These should
> definitely be modified to use the gfxUtils tables.

Maybe we can address this in the second patch?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 4

•

12 years ago

(In reply to Ting-Yuan Huang from comment #3)
> Maybe we can address this in the second patch?

Sure.

Ting-Yuan Huang

Assignee

Comment 5

•

12 years ago

Attached patch Part 1/3: Replace runtime computed gUnicodeToGBKTable by constants. (obsolete) — Details — Splinter Review

Attachment #688692 - Flags: review?(smontagu)

Ting-Yuan Huang

Assignee

Comment 6

•

12 years ago

Attached patch Part 2/3: Replace runtime calculated sUnpremultiplyTable/sPremultiplyTable with constants. (obsolete) — Details — Splinter Review

Attachment #688693 - Flags: review?(roc)

Ting-Yuan Huang

Assignee

Comment 7

•

12 years ago

Attached patch Part 3/3: Replace the runtime computed FFS table with constants. (obsolete) — Details — Splinter Review

Attachment #688694 - Flags: review?(justin.lebar+bug)

Simon Montagu :smontagu

Comment 8

•

12 years ago

Comment on attachment 688692 [details] [diff] [review]
Part 1/3: Replace runtime computed gUnicodeToGBKTable by constants.

Review of attachment 688692 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, but can you just add a comment to cp936invmap.h explaining how it was generated?

Attachment #688692 - Flags: review?(smontagu) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 9

•

12 years ago

Comment on attachment 688693 [details] [diff] [review]
Part 2/3: Replace runtime calculated sUnpremultiplyTable/sPremultiplyTable with constants.

Review of attachment 688693 [details] [diff] [review]:
-----------------------------------------------------------------

Instead of checking in these giant tables, can we use macros or templates to generate them during compilation?

Something like
#define GEN_TABLE_ENTRY_PREMULTIPLY(v) (((v) >> 8)*((v) & 0xFF))/255,
#define GEN_TABLE_ENTRY_PREMULTIPLY_2(v) GEN_TABLE_ENTRY_PREMULTIPLY(v) \
                                         GEN_TABLE_ENTRY_PREMULTIPLY(v + 1)
#define GEN_TABLE_ENTRY_PREMULTIPLY_4(v) GEN_TABLE_ENTRY_PREMULTIPLY_2(v) \
                                         GEN_TABLE_ENTRY_PREMULTIPLY_2(v + 2)
...
#define GEN_TABLE_ENTRY_PREMULTIPLY_65536(v) GEN_TABLE_ENTRY_PREMULTIPLY_32768(v) \
                                             GEN_TABLE_ENTRY_PREMULTIPLY_32768(v + 32768)
static const uint8_t table = {
GEN_TABLE_ENTRY_PREMULTIPLY_65536(0)
};

You'd want to check that this produces the exact values we want :-)

Ting-Yuan Huang

Assignee

Comment 10

•

12 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #9)
> Comment on attachment 688693 [details] [diff] [review]
> Part 2/3: Replace runtime calculated sUnpremultiplyTable/sPremultiplyTable
> with constants.
> 
> Review of attachment 688693 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Instead of checking in these giant tables, can we use macros or templates to
> generate them during compilation?
> 
> Something like
> #define GEN_TABLE_ENTRY_PREMULTIPLY(v) (((v) >> 8)*((v) & 0xFF))/255,
> #define GEN_TABLE_ENTRY_PREMULTIPLY_2(v) GEN_TABLE_ENTRY_PREMULTIPLY(v) \
>                                          GEN_TABLE_ENTRY_PREMULTIPLY(v + 1)
> #define GEN_TABLE_ENTRY_PREMULTIPLY_4(v) GEN_TABLE_ENTRY_PREMULTIPLY_2(v) \
>                                          GEN_TABLE_ENTRY_PREMULTIPLY_2(v + 2)
> ...
> #define GEN_TABLE_ENTRY_PREMULTIPLY_65536(v)
> GEN_TABLE_ENTRY_PREMULTIPLY_32768(v) \
>                                             
> GEN_TABLE_ENTRY_PREMULTIPLY_32768(v + 32768)
> static const uint8_t table = {
> GEN_TABLE_ENTRY_PREMULTIPLY_65536(0)
> };
> 
> You'd want to check that this produces the exact values we want :-)

Cool! but msvc seems to be unable to handle this:

https://tbpl.mozilla.org/php/getParsedLog.php?id=17661781&tree=Try#error0
e:/builds/moz2_slave/try-w32-dbg/build/gfx/thebes/gfxUtils.cpp(61) : fatal error C1060: compiler is out of heap space

Although a simple compiler tweak should fix it, should we adopt the naive but compiler friendly approach?

Ting-Yuan Huang

Assignee

Comment 11

•

12 years ago

Attached patch Bug 815473 - Replace runtime computed gUnicodeToGBKTable by constants. r=smontagu (obsolete) — Details — Splinter Review

Thanks, smontagu. I added the C snippet used to generate it. A more formal way to do this might be a perl script similar to that generates cp936map.h. Since this table should be modified rarely, if ever, that seems to be unnecessary.

Attachment #688692 - Attachment is obsolete: true

Justin Lebar (not reading bugmail)

Comment 12

•

12 years ago

Comment on attachment 688694 [details] [diff] [review]
Part 3/3: Replace the runtime computed FFS table with constants.

Have you done any performance measurements with this change?  It makes what appears to be a hot path in jpeg encoding significantly more complicated, by adding what appears to be an unpredictable branch.

We encode JPEGs when we take screenshots in B2G, so I'd want to know if this is a significant slowdown.

Please update media/libjpeg/MOZCHANGES and media/libjpeg/mozilla.diff.

Also, if the perf aspects don't look too bad, this would probably be worth upstreaming (perhaps with a compile-time pref to enable).

Attachment #688694 - Flags: review?(justin.lebar+bug) → feedback+

Justin Lebar (not reading bugmail)

Comment 13

•

12 years ago

DRC, I'm curious if you have any thoughts on patch 3/3.  (DRC is the maintainer of libjpeg-turbo.)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 14

•

12 years ago

(In reply to Ting-Yuan Huang from comment #10)
> Although a simple compiler tweak should fix it, should we adopt the naive
> but compiler friendly approach?

Another easy option is to generate it with a python script instead.

Ting-Yuan Huang

Assignee

Comment 15

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #12)
> Have you done any performance measurements with this change?  It makes what
> appears to be a hot path in jpeg encoding significantly more complicated, by
> adding what appears to be an unpredictable branch.

Are there any suggested benchmarks?

After twiddling the codes a bit, the compiler now generates conditional execution codes which are available on ARM:

    movs    r2, r0, lsr #8
    andeq   r2, r0, #255
    add     r3, pc, r3
    ldrb    r3, [r3, r2]    @ zero_extendqisi2
    movne   r0, #8
    moveq   r0, #0
    add     r0, r0, r3
    and     r0, r0, #255

Although it needs 7 more instructions, it should not be much slower due to better data locality, especially on CPUs with small caches.

I'm also OK for the simplest approach: a 64KB table in .rodata. Since it can be shared, the overhead is really small :)

DRC

Comment 16

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #13)
> DRC, I'm curious if you have any thoughts on patch 3/3.  (DRC is the
> maintainer of libjpeg-turbo.)

In libjpeg-turbo, I treat performance the same as I treat stability.  That means that every change that might potentially affect performance must undergo rigorous testing on a variety of platforms to ensure that performance has not regressed.  Since I don't get paid to work on libjpeg-turbo unless I have a specific commercial contract to enhance or fix something in it (which hasn't happened in a while-- mostly, it has reached a point at which everyone is happy with it), I'm very reluctant to make any modifications like this, because I would have to eat the cost of doing a complete performance regression test.

I did run a quick & dirty test with this on one platform, using the same benchmark methodology described here:  http://www.libjpeg-turbo.org/About/Performance.  I don't claim these images to be canonical, but they represent the sort of performance that I specifically care about (bearing in mind that the reason why I started maintaining libjpeg-turbo to begin with was as a way of accelerating remote display of 3D applications.)

Patch 3 regresses performance for 32-bit code by 5% (+/- 0.5%) in some cases and 64-bit code by 7% (+/- 0.5%) in some cases, and that's on my fastest machine.  It may very well be more on another machine, but this is enough that I do not trust the performance neutrality of the patch.  I've been doing this sort of thing long enough (17 years next month) that I can look at a patch like this and tell that it will have an impact, and my spidey sense told me 5% before I even ran the tests.  What probably would be performance-neutral is to build the entire 64k-entry table at compile time, but why bother?  This seems to be solving a problem that isn't a problem.  Additionally, I would still have to do a complete regression suite before even a patch I felt was performance-neutral would be accepted upstream.  Unless I could see a significant performance gain, there is no impetus for me to even fool with this, nor is there any impetus for a customer to pay me to fool with it.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 17

•

12 years ago

(In reply to DRC from comment #16)
> What probably would be
> performance-neutral is to build the entire 64k-entry table at compile time,
> but why bother?  This seems to be solving a problem that isn't a problem.

Because if we have N different Gecko processes each instantiating the JPEG decoder, then the current approach means we have N*64K of identical data tables. We can at least convert this to 64K of .rodata which can be shared between the processes.

It sounds like we should just do that.

DRC

Comment 18

•

12 years ago

I'm sure there's a good reason why Mozilla does that, but it seems to me that spawning a lot of sub-processes in that manner is rather unique behavior.  The table is allocated globally in LJT, so even if you used threads, this would be a non-issue.

Since this is an issue that is likely to affect only Mozilla for the moment, I would suggest maintaining a downstream patch that pre-computes the entire 64k table at compile time.  Since LJT only computes the table once per process, I don't think a static table would improve upstream performance, so there isn't any reason for me to look at merging the patch upstream unless other applications were discovered to have the same sorts of multi-process memory footprint issues.

Ting-Yuan Huang

Assignee

Comment 19

•

12 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #14)
> (In reply to Ting-Yuan Huang from comment #10)
> > Although a simple compiler tweak should fix it, should we adopt the naive
> > but compiler friendly approach?
> 
> Another easy option is to generate it with a python script instead.

Did you mean
1) 2 giant table with scripts generating them, or
2) just 2 scripts to generate tables at compile time?

If 2), I would rather generate them by your nice macro, since either way I have to make changes to the build system :)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 20

•

12 years ago

(In reply to Ting-Yuan Huang from comment #19)
> Did you mean
> 1) 2 giant table with scripts generating them, or
> 2) just 2 scripts to generate tables at compile time?

1 script could generate 2 files at build time.

> If 2), I would rather generate them by your nice macro, since either way I
> have to make changes to the build system :)

Those CPP macros aren't as nice as a python script :-)

Ting-Yuan Huang

Assignee

Comment 21

•

12 years ago

Attached patch Part 2/3: Replace runtime calculated sUnpremultiplyTable/sPremultiplyTable with constants. (obsolete) — Details — Splinter Review

Attachment #688693 - Attachment is obsolete: true

Attachment #688693 - Flags: review?(roc)

Attachment #691186 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 22

•

12 years ago

Comment on attachment 691186 [details] [diff] [review]
Part 2/3: Replace runtime calculated sUnpremultiplyTable/sPremultiplyTable with constants.

Review of attachment 691186 [details] [diff] [review]:
-----------------------------------------------------------------

Great!

Attachment #691186 - Flags: review?(roc) → review+

Ting-Yuan Huang

Assignee

Comment 23

•

12 years ago

Attached patch Part 3/3: Replace the runtime computed jpeg_nbits_table with constants. (obsolete) — Details — Splinter Review

Attachment #688694 - Attachment is obsolete: true

Attachment #691206 - Flags: review?(justin.lebar+bug)

Justin Lebar (not reading bugmail)

Comment 24

•

12 years ago

Comment on attachment 691206 [details] [diff] [review]
Part 3/3: Replace the runtime computed jpeg_nbits_table with constants.

r=me with some tweaks per below.

>diff --git a/media/libjpeg/genTables.py b/media/libjpeg/genTables.py
>new file mode 100755
>--- /dev/null
>+++ b/media/libjpeg/genTables.py
>@@ -0,0 +1,11 @@
>+#!/usr/bin/python
>+
>+import math
>+
>+def table_generator(f):
>+    return ",\n".join([", ".join(["0x%2.2x" % h for h in [f(i) for i in range(r,r+16)]]) for r in range(0, 65536, 16)])
>+
>+f = open("jpeg_nbits_table.h", "w")
>+f.write(table_generator(lambda i: math.ceil(math.log(i + 1, 2))) + "\n")
>+f.close()

This is kind of difficult to read (at least for me)  because of the nested list
comprehensions.

A simple loop feels more Pythonic to me:

> for i in range(65536):
>   f.write('%2d' % math.ceil(math.log(i + 1, 2)))
>   if i != 65535:
>     f.write(', ')
>   if (i + 1) % 16 == 0:
>     f.write('\n')

Note I switched to %2d instead of 0x%2.2x, although that's just my preference.

>diff --git a/media/libjpeg/Makefile.in b/media/libjpeg/Makefile.in
>--- a/media/libjpeg/Makefile.in
>+++ b/media/libjpeg/Makefile.in
>@@ -158,8 +158,13 @@ EXPORTS	= \
> 	jpegint.h \
> 	jpeglib.h \
> 	$(NULL)
> 
> # need static lib for some of the libimg componentry to link properly
> FORCE_STATIC_LIB = 1
> 
> include $(topsrcdir)/config/rules.mk
>+
>+CONSTANT_TABLES:
>+	$(PYTHON) $(srcdir)/genTables.py
>+
>+jchuff.$(OBJ_SUFFIX): CONSTANT_TABLES

I think you need to add CONSTANT_TABLES to .DUMMY.  Or otherwise just add the
right dependencies:

  jpeg_nbits_table.h: genTables.py
      $(PYTHON) $(srcdir)/genTables.py

  jchuff.$(OBJ_SUFFIX): jpeg_nbits_table.h

should do it, I think.  You'd probably want to make this change in the other patch, too.


For the record, another alternative which might perform well is detailed at

  http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog.

This uses a small lookup table (so we avoid blowing out the cache), but is still branch-less.  We'd have to tweak it to make it work for 16-bit ints, though.

I don't think we need to investigate this unless the current approach is found to be lacking for some reason.

Attachment #691206 - Flags: review?(justin.lebar+bug) → review+

Ting-Yuan Huang

Assignee

Comment 25

•

12 years ago

Attached patch Part 1/3: Replace runtime computed gUnicodeToGBKTable by constants. r=smontagu — Details — Splinter Review

just renaming the title.

Attachment #689145 - Attachment is obsolete: true

Ting-Yuan Huang

Assignee

Comment 26

•

12 years ago

Attached patch Part 2/3: Replace runtime computed sUnpremultiplyTable/sPremultiplyTable with constants. r=roc — Details — Splinter Review

Add the CONSTANT_TABLES rule to .PHONY.

Attachment #691186 - Attachment is obsolete: true

Ting-Yuan Huang

Assignee

Comment 27

•

12 years ago

Attached patch Part 3/3: Replace runtime computed jpeg_nbits_table by constants. r=jlebar (obsolete) — Details — Splinter Review

1. Add rule CONSTANT_TABLES to .PHONY.
2. Refine the script generating constant tables according to review comments.

Attachment #691206 - Attachment is obsolete: true

Ting-Yuan Huang

Assignee

Comment 28

•

12 years ago

Attached patch Part 3/3: Replace runtime computed jpeg_nbits_table by constants. r=jlebar — Details — Splinter Review

fix a typo in patch title...

Attachment #691684 - Attachment is obsolete: true

Ting-Yuan Huang

Assignee

Comment 29

•

12 years ago

https://tbpl.mozilla.org/?tree=Try&rev=91097f392043

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 30

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/865dc2eb00fb
https://hg.mozilla.org/integration/mozilla-inbound/rev/ef72d3080dc7
https://hg.mozilla.org/integration/mozilla-inbound/rev/a5f94f2c5f94

Assignee: nobody → thuang

Keywords: checkin-needed

Justin Lebar (not reading bugmail)

Comment 31

•

12 years ago

Ting-Yuan, can you ask for approval here if you think these patches are safe to take for B2G v1?

Ting-Yuan Huang

Assignee

Comment 32

•

12 years ago

Comment on attachment 691681 [details] [diff] [review]
Part 1/3: Replace runtime computed gUnicodeToGBKTable by constants. r=smontagu

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: 41KB redundant memory per process
Testing completed: tryserver*1*2.
Risk to taking this patch (and alternatives if risky): Scrambled GBK characters if something wrong, that is very unlikely to happen since this is just a replacement of runtime computed table by a constant table. I also compared the tables by dumping them using GDB.
String or UUID changes made by this patch:

*1 There are two tests covering this change: xpcshell/tests/intl/uconv/tests/unit/test_encode_gbk.js, xpcshell/tests/intl/uconv/tests/unit/test_bug367026.js
*2 https://tbpl.mozilla.org/?tree=Try&rev=91097f392043

Attachment #691681 - Flags: approval-mozilla-b2g18?

Ting-Yuan Huang

Assignee

Comment 33

•

12 years ago

Comment on attachment 691682 [details] [diff] [review]
Part 2/3: Replace runtime computed sUnpremultiplyTable/sPremultiplyTable with constants. r=roc

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: 128KB redundant memory per process
Testing completed: tryserver*1
Risk to taking this patch (and alternatives if risky): Rendering errors when using the corresponding APIs, that is very unlikely to happen since this is just a replacement of runtime computed table by a constant table. I also compared the tables by dumping them using GDB.
String or UUID changes made by this patch:

*1 https://tbpl.mozilla.org/?tree=Try&rev=91097f392043

Attachment #691682 - Flags: approval-mozilla-b2g18?

Ting-Yuan Huang

Assignee

Comment 34

•

12 years ago

Comment on attachment 692126 [details] [diff] [review]
Part 3/3: Replace runtime computed jpeg_nbits_table by constants. r=jlebar

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: 64KB redundant memory per process
Testing completed: tryserver
Risk to taking this patch (and alternatives if risky): The encoded/decoded jpegs would be scrambled, that is very unlikely to happen since this is just a replacement of runtime computed table by a constant table. I also compared the tables by dumping them using GDB.
String or UUID changes made by this patch:

*1 https://tbpl.mozilla.org/?tree=Try&rev=91097f392043

Attachment #692126 - Flags: approval-mozilla-b2g18?

Justin Lebar (not reading bugmail)

Comment 35

•

12 years ago

Comment on attachment 691681 [details] [diff] [review]
Part 1/3: Replace runtime computed gUnicodeToGBKTable by constants. r=smontagu

I think we're triple-landing on m-c, aurora, and b2g18.

Attachment #691681 - Flags: approval-mozilla-aurora?

Justin Lebar (not reading bugmail)

Updated

•

12 years ago

Attachment #691682 - Flags: approval-mozilla-aurora?

Justin Lebar (not reading bugmail)

Updated

•

12 years ago

Attachment #692126 - Flags: approval-mozilla-aurora?

Ed Morley [:emorley]

Comment 36

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/865dc2eb00fb
https://hg.mozilla.org/mozilla-central/rev/ef72d3080dc7
https://hg.mozilla.org/mozilla-central/rev/a5f94f2c5f94

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

:Ms2ger (he/him; ⌚ UTC+1/+2)

Updated

•

12 years ago

Product: Boot2Gecko → Core

:Ms2ger (he/him; ⌚ UTC+1/+2)

Comment 37

•

12 years ago

Comment on attachment 691682 [details] [diff] [review]
Part 2/3: Replace runtime computed sUnpremultiplyTable/sPremultiplyTable with constants. r=roc

Review of attachment 691682 [details] [diff] [review]:
-----------------------------------------------------------------

Could you look over the Makefile change here?

Attachment #691682 - Flags: review?(gps)

Gregory Szorc [:gps]

Comment 38

•

12 years ago

Comment on attachment 691682 [details] [diff] [review]
Part 2/3: Replace runtime computed sUnpremultiplyTable/sPremultiplyTable with constants. r=roc

Review of attachment 691682 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/Makefile.in
@@ +397,5 @@
> +.PHONY: CONSTANT_TABLES
> +CONSTANT_TABLES:
> +	$(PYTHON) $(srcdir)/genTables.py
> +
> +gfxUtils.$(OBJ_SUFFIX): CONSTANT_TABLES

This is less than ideal for a few reasons.

1) genTables.py actually produces files. So, the use of a .PHONY target is inappropriate. You should instead list the files it produces explicitly in the Makefile.

2) The CONSTANT_TABLES rule will be evaluated needlessly. This isn't too bad because it doesn't take long. Still, it's something we like to avoid.

I'd do something like:

gen_tables_output := sPremultiplyTable.h sUnpremultiplyTable.h

$(gen_tables_output): $(srcdir)/genTables.py
    $(PYTHON) $(srcdir)/genTables.py

gfxUtils.$(OBJ_SUFFIX): $(gen_tables_output)

---

What you have will work, it just isn't fully proper. I think a follow-up bug should be sufficient if you don't want to deal with it now.

Attachment #691682 - Flags: review?(gps)

Alex Keybl [:akeybl]

Comment 39

•

12 years ago

Comment on attachment 691681 [details] [diff] [review]
Part 1/3: Replace runtime computed gUnicodeToGBKTable by constants. r=smontagu

Approving since this appears to be a fairly low risk memory win. Before landing, please give QA some pointers on how to identify possible regressions in Aurora desktop/mobile and B2G18.

Attachment #691681 - Flags: approval-mozilla-b2g18?

Attachment #691681 - Flags: approval-mozilla-b2g18+

Attachment #691681 - Flags: approval-mozilla-aurora?

Attachment #691681 - Flags: approval-mozilla-aurora+

Alex Keybl [:akeybl]

Updated

•

12 years ago

Attachment #691682 - Flags: approval-mozilla-b2g18?

Attachment #691682 - Flags: approval-mozilla-b2g18+

Attachment #691682 - Flags: approval-mozilla-aurora?

Attachment #691682 - Flags: approval-mozilla-aurora+

Alex Keybl [:akeybl]

Updated

•

12 years ago

Attachment #692126 - Flags: approval-mozilla-b2g18?

Attachment #692126 - Flags: approval-mozilla-b2g18+

Attachment #692126 - Flags: approval-mozilla-aurora?

Attachment #692126 - Flags: approval-mozilla-aurora+

Alex Keybl [:akeybl]

Updated

•

12 years ago

Keywords: verifyme

Ryan VanderMeulen [:RyanVM]

Comment 40

•

12 years ago

(In reply to Alex Keybl [:akeybl] from comment #39)
> Approving since this appears to be a fairly low risk memory win. Before
> landing, please give QA some pointers on how to identify possible
> regressions in Aurora desktop/mobile and B2G18.

(FYI, I'm holding off on uplifting based on this comment)

Flags: needinfo?(justin.lebar+bug)

Justin Lebar (not reading bugmail)

Comment 41

•

12 years ago

> Approving since this appears to be a fairly low risk memory win. Before landing, please give QA some 
> pointers on how to identify possible regressions in Aurora desktop/mobile and B2G18.

Based on my experience with Mozilla QA and the nature of these patches, it seems extremely unlikely that our QA will be able to test this in a meaningful way beyond everyday use of the devices.

If you wanted to test this, you should look for

 1. artifacts in screenshots (screenshots use the JPEG encoder)
 2. artifacts in code which uses canvas, particularly canvas's getData functions (this tickles the premultiply code)
 3. (I'm not sure how you test the GBK table bit, or whether this is covered by automated tests.)

Given that the artifacts in all three cases might be too small to notice in casual observation, ensuring correctness demands either automated tests or code inspection, neither of which falls under QA's responsibilities.

Flags: needinfo?(justin.lebar+bug)

Ryan VanderMeulen [:RyanVM]

Comment 42

•

12 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/36934087350a
https://hg.mozilla.org/releases/mozilla-aurora/rev/875a5bd397b4
https://hg.mozilla.org/releases/mozilla-aurora/rev/dfa3a86df2e7

This doesn't apply cleanly enough to b2g18 for me to comfortably uplift it (Part 2 is where I gave up).

status-firefox19: --- → fixed

status-firefox20: --- → fixed

Target Milestone: --- → mozilla20

Justin Lebar (not reading bugmail)

Comment 43

•

12 years ago

I think I have this ported to b2g18; I'm compiling now to make sure.

Justin Lebar (not reading bugmail)

Comment 44

•

12 years ago

Follow-up to remove now-undefined methods from the CanvasRenderingContext header:

https://hg.mozilla.org/integration/mozilla-inbound/rev/70ded23485ab
https://hg.mozilla.org/releases/mozilla-aurora/rev/79ad9e9928ed

Justin Lebar (not reading bugmail)

Comment 45

•

12 years ago

https://hg.mozilla.org/releases/mozilla-b2g18/rev/4bce303e2b05
https://hg.mozilla.org/releases/mozilla-b2g18/rev/5a9c41dc647a
https://hg.mozilla.org/releases/mozilla-b2g18/rev/767c77596873

status-firefox18: --- → fixed

Ryan VanderMeulen [:RyanVM]

Updated

•

12 years ago

status-b2g18: --- → fixed

status-firefox18: fixed → ---

Ting-Yuan Huang

Assignee

Updated

•

12 years ago

Blocks: 823351

Alex Keybl [:akeybl]

Comment 46

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #41)
>  1. artifacts in screenshots (screenshots use the JPEG encoder)
>  2. artifacts in code which uses canvas, particularly canvas's getData
> functions (this tickles the premultiply code)

Thanks - this is a good list of things for QA to look out for (unexpected artifacts).

Ed Morley [:emorley]

Comment 47

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/70ded23485ab

Jeff Muizelaar [:jrmuizel]

Comment 48

•

11 years ago

FWIW, the build parts of this patch are fixed in bug 827934

Tracy Walker [:tracy]

Comment 49

•

10 years ago

mass remove verifyme requests greater than 4 months old

Keywords: verifyme

DRC

Comment 50

•

10 years ago

Popping the stack on this, because I'm getting requests upstream for a similar solution for much the same reasons (reduced footprint on Android devices.)  The upstream submitter pointed out that, as long as we're using 8-bit JPEG, it should not be necessary for the table to have more than 2048 entries.  Can anyone comment on that?  It seems solid to me.  The original libjpeg code in fact checks for this:

  if (nbits > MAX_COEF_BITS+1)
    ERREXIT(state->cinfo, JERR_BAD_DCT_COEF);

(where MAX_COEF_BITS is 10 for 8-bit components.)

Ting-Yuan Huang

Assignee

Comment 51

•

10 years ago

Because in LJT the check is removed altogether, I assume the inputs are always valid. Otherwise you may want to check the inputs before the look-up to prevent array-index-out-of-bound.

DRC

Comment 52

•

10 years ago

Hmmm...  Well, this is on the encoder side, so it's not as if a malformed JPEG file could cause the index to be out of bounds.  Basically, if the index is out of bounds, then it's because of a bug in the libjpeg code, and I'm assuming that since no such bug has surfaced in 15 years, it's probably safe.  :/

However, I'm asking the upstream submitter whether he can adopt the same approach that you guys are using, i.e. sharing the table among multiple instances.  Since Mozilla has already thoroughly tested that approach, it makes sense to leverage your testing.

Also, wanted to clarify that the latest version of this patch (where you pre-compute the entire table) is performance-neutral.  My comments above regarding a 5-7% performance deficit were referring to an earlier version.

DRC

Comment 53

•

10 years ago

The Mozilla patch for precomputing the bit table has been integrated upstream in our subversion trunk (libjpeg-turbo 1.4 evolving.)  I also integrated a patch that uses clz/bsr intrinsics for bit counting by default on ARM platforms, and that seems to actually improve performance significantly in some cases (x86 still uses the LUT, because the implementation of bsr seems to have more variation among x86 chips, and on some of them, it causes a significant performance regression.)

Feedback welcomed.

Ting-Yuan Huang

Assignee

Comment 54

•

10 years ago

For newer gcc (>= 4.7?), will __builtin_clrsb() fit better than __builtin_clz()? Specifically,

#define JPEG_NBITS(x) (31 - __builtin_clrsb(x))
#define JPEG_NBITS_NONZERO(x) JPEG_NBITS(x)

Although it should be no faster with arm and thumb2, which have conditional executions, it probably helps in thumb mode. Don't know the behavior on x86/x64.

DRC

Comment 55

•

10 years ago

I'll ask the upstream submitter.  I'd be interested in seeing if clrsb() might benefit x86 as well.  Currently, none of the machines I have has a compiler new enough to test it.  :|

DRC

Comment 56

•

10 years ago

The reply was that, apparently, __builtin_clrsb() is implemented in software on ARM.  There is some question, however, as to whether the branch (x ? JPEG_NBITS_NONZERO(x) : 0) is necessary.  Apparently the ARM documentation says that clz(0) is well-defined, but the GCC documentation says that __builtin_clz(0) is undefined.

If you are interested in following up, please engage in the upstream discussion so I don't have to keep acting as intermediary:
https://sourceforge.net/p/libjpeg-turbo/patches/57

You need to log in before you can comment on or make changes to this bug.