Closed Bug 372631 Opened 18 years ago Closed 17 years ago

Tp regressions from bug 370588 landing

Tracking

Status:

RESOLVED FIXED

People

(Reporter: roc, Assigned: roc)

References

Details

Attachments

(3 files, 1 obsolete file)

change the ATSUGetGlyphBounds parameter only 18 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 1.26 KB, patch	vlad: review+
Rework Mac glyph extraction code, store glyph advances in appunits 18 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 88.87 KB, patch
better patch 18 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 91.20 KB, patch	vlad: review+ pavlov: superreview+
Windows fix 18 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 9.50 KB, patch	pavlov: review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Description

18 years ago

MacOSX Darwin 8.8.4 bm-xserve08 Dep: Tp up about 12ms (7%).
WINNT 5.1 bl-bldxp01 Dep: Tp up about 70ms (12%).

I'll profile Mac. Not sure about Windows; I thought Stuart had checked and not found that kind of regression. To some extent we're bound to see some regression from bug 370588, because we're deliberately recording all glyph advances now. Still, this is surprisingly large.

Stuart Parmenter

Comment 1

18 years ago

I haven't tried with the latest patch but I can reprofile tomorrow.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 2

18 years ago

I profiled an opt build in Shark, reflowing a text-heavy document. Reflow is about 49% of the time in firefox-bin. 30% of time is under nsTextFrame::MeasureText. 20% of time is under gfxAtsuiFontGroup::InitTextRun. 15.4% of time is under ATSUGetGlyphBounds. (The rest is font matching etc.) 2.7% of time is in PostLayoutOperationCallback. 1.1% of time is in gfxTextRun::GetAdvanceWidth. 5.1% of time is in gfxTextRun::Draw, entirely under _moz_cairo_show_glyphs. I guess I need to try profiling a build before the landing as well.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 3

18 years ago

One thing I need to try is that Mac's subpixel glyph positioning means the CompressedGlyph optimization doesn't work very well. We can easily test the impact of this by rounding glyph advances up to the nearest pixel. Probably we should store all the glyph advances in appunits since we need to round to appunits anyway.
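To illustrate why subpixel advances defeat compact glyph storage, here is a minimal sketch of a packed "simple glyph" word. The bit layout, the names (`kSimpleGlyphFlag`, `TryPackSimpleGlyph`) and the 15-bit advance limit are hypothetical assumptions for illustration, not the actual gfxTextRun CompressedGlyph definition. The point is only that a whole-number advance can be packed into one word, while a fractional one forces a fallback to a separately allocated record.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical packing (NOT Gecko's real CompressedGlyph layout):
//   bit 31      = "simple glyph" flag
//   bits 16..30 = advance in whole units (15 bits)
//   bits 0..15  = glyph ID
constexpr uint32_t kSimpleGlyphFlag = 0x80000000u;
constexpr uint32_t kAdvanceShift = 16;
constexpr uint32_t kMaxAdvance = 0x7FFFu;
constexpr uint32_t kMaxGlyphID = 0xFFFFu;

// Returns true and writes the packed word if the glyph fits the compact form.
// Returns false when the caller would need a separately allocated
// "detailed glyph" record, e.g. because the advance is fractional.
bool TryPackSimpleGlyph(uint32_t aGlyphID, double aAdvance, uint32_t* aOut) {
  assert(aOut);
  if (aGlyphID > kMaxGlyphID || aAdvance < 0.0 || aAdvance > kMaxAdvance ||
      aAdvance != std::floor(aAdvance)) {
    return false;
  }
  *aOut = kSimpleGlyphFlag |
          (static_cast<uint32_t>(aAdvance) << kAdvanceShift) | aGlyphID;
  return true;
}
```

With subpixel positioning an advance such as 7.25 fails the whole-number check; rounding it first (`std::round(7.25)` is 7.0) makes it packable, which is the effect the rounding experiment tests.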

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 4

18 years ago

We should fix that, but it doesn't seem to help Tp. The good news is that I can reproduce the Tp regression locally.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 5

18 years ago

Hmm. Comparing profiles of Tp before and after 370588: time spent in ATSUGetGlyphBounds rose from 4.3% of the profile to 7.3%. ApplyMorph (some ATS function under ASTULayoutGlyphs) jumped from 0.0% to 0.7%. GetMetricsForGlyphs (another ATS function) jumped from 0.0% to 0.6%. PositionDeviceGlyphs rose from 0.0% to 1.5%. TTextLineLayout::ConstructGlyphRecordArray went up from 0.5% to 1.0%. OTOH, ApplyKerning did not rise significantly (0.5% to 0.6%), though that is still a bit odd since we've disabled kerning. And SetupGlyphArrays went down from 0.4% to 0.1%. PostLayoutCallback is new code that accounts for 0.9% of the profile. That seems fair. But why did the other ATSU functions slow down?

The only hypothesis that springs to mind immediately is "cache effects imposed by the code in PostLayoutCallback that sets up the textrun". That callback still spends about half its time allocating memory in gfxTextRun::SetDetailedGlyphs, which is a bit disconcerting. I need to check why we're doing that; the detailed-glyph path should be mostly untouched in non-intl Tp (now that I'm rounding glyph advances). These profiles have about 70K samples each, so I think these results are statistically significant.
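As a rough sanity check of that significance claim, one can treat each profiler sample as an independent draw that either lands in a given function or not. This binomial model is a simplifying assumption swapped in for illustration (real sampling profiles are correlated in time, so it is optimistic), and the helper names are hypothetical. Under it, the standard error of a fraction p measured from n samples is sqrt(p(1 - p)/n).

```cpp
#include <cassert>
#include <cmath>

// Standard error of a sampled fraction under a binomial model.
double SampleFractionStdError(double aFraction, double aSamples) {
  assert(aSamples > 0.0 && aFraction >= 0.0 && aFraction <= 1.0);
  return std::sqrt(aFraction * (1.0 - aFraction) / aSamples);
}

// z-score for the change between two independently sampled fractions,
// each measured over aSamples samples.
double FractionChangeZScore(double aBefore, double aAfter, double aSamples) {
  const double seBefore = SampleFractionStdError(aBefore, aSamples);
  const double seAfter = SampleFractionStdError(aAfter, aSamples);
  return (aAfter - aBefore) /
         std::sqrt(seBefore * seBefore + seAfter * seAfter);
}
```

For ATSUGetGlyphBounds going from 4.3% to 7.3% with about 70K samples per profile, `FractionChangeZScore(0.043, 0.073, 70000)` comes out at roughly 24, far beyond the usual threshold of 2 to 3, so the shift is very unlikely to be sampling noise even if the independence assumption overstates the precision several-fold.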

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 6

18 years ago

Okay. The calls to SetDetailedGlyphs are happening because my test build has the experimental patch for bug 372629, which stores "missing glyph" glyph info in the detailed glyph records, *and* we're seeing bogus missing glyphs because it turns out ATSUI inserts a zero-width 0xFFFF glyph for the second and each subsequent character of a ligature. The current code totally misinterprets those glyphs. If I ignore zero-width 0xFFFF glyphs, then most of the Tp slowdown disappears. It's possible, therefore, that fixing bug 372732 (which seems to be something similar on Windows) would help Tp on Windows.
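The fix amounts to treating a zero-width 0xFFFF glyph as "this character was absorbed into the preceding ligature" rather than as a missing glyph. A minimal sketch of that classification follows; the `GlyphRecord` struct and the `FindClusterStarts` helper are hypothetical simplifications, not ATSUI's real layout-record type or Gecko's actual extraction code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified glyph record (not ATSUI's real layout record).
struct GlyphRecord {
  uint16_t glyphID;
  uint32_t charIndex;  // index of the character this glyph was produced for
  double advance;      // advance width in device pixels
};

constexpr uint16_t kLigatureFillerGlyph = 0xFFFF;

// Returns one flag per character: true if the character starts a new glyph
// cluster, false if it was folded into a preceding ligature. Zero-width
// 0xFFFF glyphs are skipped instead of being reported as missing glyphs.
std::vector<bool> FindClusterStarts(const std::vector<GlyphRecord>& aGlyphs,
                                    uint32_t aLength) {
  std::vector<bool> starts(aLength, false);
  for (const GlyphRecord& g : aGlyphs) {
    if (g.glyphID == kLigatureFillerGlyph && g.advance == 0.0) {
      continue;  // filler for the 2nd, 3rd, ... character of a ligature
    }
    if (g.charIndex < aLength) {
      starts[g.charIndex] = true;
    }
  }
  return starts;
}
```

For an "fl" ligature followed by one ordinary character, the glyphs {0x10, 0, 8.0}, {0xFFFF, 1, 0.0}, {0x20, 2, 5.0} yield cluster starts {true, false, true}, and no missing-glyph record is needed.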

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 7

18 years ago

I'll try a patch that converts the glyph advances in gfxTextRun to always be rounded to the nearest appunit. That alone should be enough to fix the Tp regression on Mac. We'll just have to be careful when we update the Mac "missing glyph" code.
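A minimal sketch of the conversion described here: a possibly fractional device-pixel advance becomes an integer number of appunits, rounded to the nearest appunit. The function name is hypothetical, and the ratio of 60 appunits per device pixel used in the example is an assumption for illustration; the real ratio is whatever the text run reports.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Converts a device-pixel advance (fractional with subpixel positioning)
// to a whole number of appunits, rounding to the nearest appunit
// (halfway cases round away from zero, as std::lround does).
int32_t AdvanceToAppUnits(double aDevPixels, uint32_t aAppUnitsPerDevUnit) {
  assert(aAppUnitsPerDevUnit > 0);
  return static_cast<int32_t>(std::lround(aDevPixels * aAppUnitsPerDevUnit));
}
```

At an assumed 60 appunits per device pixel, an advance of 7.25 pixels becomes exactly 435 appunits and 7.0125 pixels becomes 421, so subpixel placement survives at 1/60-pixel granularity while every stored advance is an integer.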

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

18 years ago

Blocks: 370588

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 8

18 years ago

Well, I've fixed ATSUI code up so we never call SetDetailedGlyphs in the Tp page set, but performance still isn't back where it should be. My latest theory is that having the 8-bit code path insert bidi override characters is triggering slow paths in ATSUI. I'll test that next.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 9

18 years ago

That was worth doing (saved at least one internal malloc call in ATSUI's RunBidiAlgorithm), but it seems that the *real* issue was actually just that I changed the parameter to ATSUGetGlyphBounds from kATSUseFractionalOrigins to kATSUseDeviceOrigins. Changing that back seems to fix most of the performance regression! Anyway, I'll attach a complete patch.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 10

18 years ago

Attached patch change the ATSUGetGlyphBounds parameter only

This is a *trivial* patch that provides a significant speedup. By my benchmarks the other changes I've made help too, but this one is trivial, so we can land it first. (Note that the result of the ATSUGetGlyphBounds call is not used; all we want to do here is force the layout to happen.)

Attachment #257642 - Flags: review?(vladimir)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 11

18 years ago

Attached patch Rework Mac glyph extraction code, store glyph advances in appunits (obsolete)

This fixes the rest of the issues I identified that could have regressed performance.
1) Don't use LRO/RLO/PDF characters for LTR 8-bit text; they're not needed and trigger bidi processing in ATSUI.
2) Extract ligatures properly from the ATSUI glyph records. In fact I've reworked the glyph extraction code thoroughly. It should be a lot easier to understand now, and more robust too.
3) On all platforms, store glyph advances in appunits and always round them to the nearest appunit. This lets us use the optimized storage *much* more frequently on Mac (from < 50% of the time to 100% of the time during Tp), which eliminates a very large number of mallocs. This may help Pango a bit too. Windows doesn't seem to do subpixel glyph placement.

This also includes the ATSUGetGlyphBounds flag fix from the previous patch.
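Item 1 can be illustrated with a small sketch of widening 8-bit text for the shaper. The helper name, and the choice to keep wrapping RTL runs in RLO ... PDF, are assumptions for illustration; the code points are the standard Unicode ones (LRO is U+202D, RLO is U+202E, PDF is U+202C).

```cpp
#include <string>

constexpr char16_t kRLO = 0x202E;  // RIGHT-TO-LEFT OVERRIDE
constexpr char16_t kPDF = 0x202C;  // POP DIRECTIONAL FORMATTING

// Hypothetical helper: widens 8-bit (Latin-1) text to UTF-16 for shaping.
// LTR text is passed through unwrapped, so there are no directional
// controls to trigger bidi processing; RTL text is wrapped in RLO ... PDF.
std::u16string WidenForShaping(const std::string& aText, bool aIsRTL) {
  std::u16string out;
  out.reserve(aText.size() + 2);
  if (aIsRTL) out.push_back(kRLO);
  for (unsigned char c : aText) {
    out.push_back(static_cast<char16_t>(c));  // Latin-1 maps 1:1 to U+0000..U+00FF
  }
  if (aIsRTL) out.push_back(kPDF);
  return out;
}
```

For LTR input the output has exactly one UTF-16 code unit per input byte and no control characters at all.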

Attachment #257740 - Flags: review?(pavlov)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 12

18 years ago

Comment on attachment 257740 (Rework Mac glyph extraction code, store glyph advances in appunits):

I want vlad to look at the ATSUI changes. If the glyph extraction algorithm is too complicated, let me know so I can add more comments where required or simplify the code further.

Attachment #257740 - Flags: review?(vladimir)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

18 years ago

Depends on: 373081

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 13

18 years ago

(In reply to comment #11)
> This fixes the rest of the issues I identified that could have regressed
> performance.

On Mac, anyway. I don't have a good way to measure Windows performance.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 14

18 years ago

Comment on attachment 257642 (change the ATSUGetGlyphBounds parameter only):

I wonder if we can force layout even more cheaply, without grabbing the bounds...

Attachment #257642 - Flags: review?(vladimir) → review+

philippe (part-time)

Comment 15

18 years ago

This patch seems to cause problems with ligatures (especially f+l and l+l) when text-align:justify is used.
Sample URL: http://www.l-c-n.com/phiw/
OS X 10.4.8, ppc Minefield, checkout: Wed Mar 14 09:39:47 JST 2007

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 16

18 years ago

Attached patch better patch

Fix the issue in the previous patch affecting justification with old-textframe and ligatures. The problem was that we were zeroing out spacing inside ligatures before converting spacing from absolute (spacing includes character widths) to relative (spacing excludes character widths) by subtracting character widths. That leaves negative relative spacing inside the ligature. The solution, of course, is to zero out intra-ligature spacing after the conversion to relative. Also, instead of just dropping that spacing, I now move it to after the ligature. There is a reftest issue with this patch on Mac that I think is actually a problem with the nquartz cairo backend. I'll file a bug for that separately.
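The ordering fix can be sketched in a simplified model where each character carries only "spacing after" it. The function and the model are hypothetical: `aAbsolute[i]` includes the character's own advance, `aAdvance[i]` is that advance (zero for characters folded into a ligature, whose width sits on the ligature's first character), and `aClusterStart[i]` is false for folded characters. Subtracting widths first, and only then moving intra-ligature spacing, avoids the negative values described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: every character has "spacing after" it.
std::vector<double> ToRelativeSpacing(const std::vector<double>& aAbsolute,
                                      const std::vector<double>& aAdvance,
                                      const std::vector<bool>& aClusterStart) {
  const size_t n = aAbsolute.size();
  assert(aAdvance.size() == n && aClusterStart.size() == n);
  std::vector<double> relative(n);
  // Step 1: absolute -> relative, BEFORE touching ligature spacing.
  for (size_t i = 0; i < n; ++i) {
    relative[i] = aAbsolute[i] - aAdvance[i];
  }
  // Step 2: spacing after char i is intra-ligature when char i+1 continues
  // the same ligature. Carry it forward so it accumulates on the ligature's
  // last character, i.e. it ends up after the ligature instead of inside it.
  for (size_t i = 0; i + 1 < n; ++i) {
    if (!aClusterStart[i + 1]) {
      relative[i + 1] += relative[i];
      relative[i] = 0.0;
    }
  }
  return relative;
}
```

For a two-character ligature followed by one ordinary character, with aAbsolute = {12, 3, 6}, aAdvance = {10, 0, 5} and aClusterStart = {true, false, true}, the result is {0, 5, 1}: the total relative spacing (6) is preserved and nothing is negative. Zeroing aAbsolute[0] before the subtraction would instead have produced 0 - 10 = -10 inside the ligature.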

Attachment #257740 - Attachment is obsolete: true

Attachment #258618 - Flags: review?(pavlov)

Attachment #257740 - Flags: review?(vladimir)

Attachment #257740 - Flags: review?(pavlov)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 17

18 years ago

Filed bug 374006 on that.

Depends on: 374006

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 18

18 years ago

Comment on attachment 258618 (better patch):

vlad, I thought you'd reviewed the new ATSUI code here, but I notice that you didn't mark the previous patch as r+.

Attachment #258618 - Flags: review?(vladimir)

Neil Deakin

Updated

18 years ago

Blocks: 178513

Stuart Parmenter

Comment 19

18 years ago

Comment on attachment 258618 (better patch):

Can you post a followup patch that makes gfxTextRun::GetAppUnitsPerDevUnit() return a PRUint32, and in places where you're doing x = tr->GetAppUnitsPerDevUnit(), can you make those be const? Also make GetAppUnitsPerDevUnit() a const method.

Related, you have this:

+ double appUnitsPerDevUnit = aTextRun->GetAppUnitsPerDevUnit();
+ double devUnitsPerAppUnit = 1/appUnitsPerDevUnit;

which should probably be:

+ const PRUint32 appUnitsPerDevUnit = aTextRun->GetAppUnitsPerDevUnit();
+ double devUnitsPerAppUnit = 1.0 / appUnitsPerDevUnit;

vlad should take a closer look at the Mac changes.
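The `1.0 /` in the suggested line matters once the value becomes an integer type: with an unsigned integer operand, `1 / appUnitsPerDevUnit` is integer division and truncates to 0. A minimal sketch; `PRUint32` is aliased here to `uint32_t` as a stand-in for the NSPR type, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t PRUint32;  // stand-in for NSPR's 32-bit unsigned PRUint32

// Correct: the 1.0 literal forces floating-point division.
double DevUnitsPerAppUnit(const PRUint32 aAppUnitsPerDevUnit) {
  assert(aAppUnitsPerDevUnit > 0);
  return 1.0 / aAppUnitsPerDevUnit;
}

// Pitfall: both operands are integers, so the division truncates to 0
// whenever aAppUnitsPerDevUnit > 1, and only then is 0 converted to double.
double DevUnitsPerAppUnitTruncated(const PRUint32 aAppUnitsPerDevUnit) {
  assert(aAppUnitsPerDevUnit > 0);
  return 1 / aAppUnitsPerDevUnit;
}
```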

Attachment #258618 - Flags: review?(pavlov) → review+

Stuart Parmenter

Updated

18 years ago

Attachment #258618 - Flags: review+ → superreview+

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 20

18 years ago

Comment on attachment 258618 (better patch):

Mac changes look good, and I can actually follow that code now :)

Attachment #258618 - Flags: review?(vladimir) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 21

18 years ago

OK, that patch is checked in. Just need to fix Windows now. I can reproduce on a Windows machine here.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 22

18 years ago

Filed bug 374567 with a patch for the request in comment #19.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 23

18 years ago

Mac Tp seems to have unregressed nicely.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 24

18 years ago

Took a look at some VTune profile data for Windows. Some of it makes no sense, but it appears there are two major issues compared to before the patch, at least on Chris's laptop:
1) MeasureAndDrawReallyFast calls gfxWindowsFont::GetHFONT, which spends most of its time in MakeHFONT. InitTextRunGDI calls gfxWindowsFont::CairoScaledFont, which calls MakeCairoScaledFont, which calls MakeHFONT *and* then does cairo_scaled_font_create. The latter is enormously expensive: about 7 times more expensive than MakeHFONT, and about 28% of the after-profile. We need to avoid messing with cairo fonts on the fast path.
2) GetCharacterPlacementA is about 30% of the after-profile, and its total time is about double the cost of the near-equivalent GetGlyphIndicesA + GetTextExtentPoint32A in the before-profile. We should try avoiding GetCharacterPlacementA, probably by calling GetGlyphIndicesA and GetTextExtentExPointI.

I don't really trust these numbers, and they don't seem to be consistent across different machines, but they give us something to work on.
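One way to read "avoid messing with cairo fonts on the fast path" is to create the expensive object lazily and cache it, so the measuring path only ever builds the cheap handle. A minimal sketch under that assumption: `FontSketch`, `CheapHandle` and `ExpensiveScaledFont` are hypothetical stand-ins for an HFONT and a cairo scaled font, and the counters exist only to make the behaviour observable.

```cpp
#include <memory>

struct CheapHandle { int id; };          // stands in for an HFONT
struct ExpensiveScaledFont { int id; };  // stands in for a cairo scaled font

class FontSketch {
 public:
  // Fast (measuring) path: only the cheap handle is needed.
  const CheapHandle& GetHandle() {
    if (!mHandle) {
      mHandle = std::make_unique<CheapHandle>(CheapHandle{1});
      ++mHandleCreations;
    }
    return *mHandle;
  }

  // Drawing path: build the expensive object on first use, then reuse it.
  const ExpensiveScaledFont& GetScaledFont() {
    if (!mScaledFont) {
      GetHandle();  // the scaled font is built from the (cached) handle
      mScaledFont =
          std::make_unique<ExpensiveScaledFont>(ExpensiveScaledFont{2});
      ++mScaledFontCreations;
    }
    return *mScaledFont;
  }

  int HandleCreations() const { return mHandleCreations; }
  int ScaledFontCreations() const { return mScaledFontCreations; }

 private:
  std::unique_ptr<CheapHandle> mHandle;
  std::unique_ptr<ExpensiveScaledFont> mScaledFont;
  int mHandleCreations = 0;
  int mScaledFontCreations = 0;
};
```

Measuring any number of times creates the handle once and never touches the expensive object; the first draw creates it exactly once.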

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 25

18 years ago

Attached patch Windows fix

This seems to fix things on Windows. We call GetGlyphIndices(A/W) and then call GetTextExtentExPointI to get the partial string widths. As far as we can tell, it gives us performance equal to or better than before the landing of 370588. We don't get ligatures, but I think we do get kerning. Fallback is there. I also restored a TrueType check that is required now that we're using GetGlyphIndices again.
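GetTextExtentExPointI reports partial string extents: each element of its output array is the width from the start of the string through the corresponding glyph. Per-glyph advances are therefore the successive differences. A minimal, platform-independent sketch of that step; the helper name is hypothetical, and the Win32 call itself is omitted so the example runs anywhere.

```cpp
#include <cstddef>
#include <vector>

// Converts cumulative (partial string) extents into per-glyph advances.
std::vector<int> AdvancesFromPartialExtents(
    const std::vector<int>& aPartialExtents) {
  std::vector<int> advances(aPartialExtents.size());
  int previous = 0;
  for (size_t i = 0; i < aPartialExtents.size(); ++i) {
    advances[i] = aPartialExtents[i] - previous;
    previous = aPartialExtents[i];
  }
  return advances;
}
```

For partial extents {7, 14, 18} this yields advances {7, 7, 4}.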

Attachment #259289 - Flags: review?(pavlov)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Comment 26

18 years ago

hey, roc, when you create a patch which includes the code for another bug, please add a comment to the bug...

Blocks: 372732

Stuart Parmenter

Updated

18 years ago

Attachment #259289 - Flags: review?(pavlov) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 27

18 years ago

checked in that Windows patch. We'll see how that goes...

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 28

18 years ago

Tp on bl-bldxp01 has dropped quite a bit. Need to wait for a while to see just how well we're doing.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 29

18 years ago

We seem to be still slower than before the patch by maybe 20ms on Tp2 and 30-40ms, so that patch undid about half of the regression on bl-bldxp01. Not sure how much more we can squeeze out here.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 30

17 years ago

I'm going to have to mark this fixed.

Status: NEW → RESOLVED

Closed: 17 years ago

Resolution: --- → FIXED
