Closed Bug 520942 Opened 16 years ago Closed 7 years ago

Parallelism Opportunities in CSS Selector Matching

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: mohammad.r.haghighat, Unassigned)

Details

Attachments

(9 files, 1 obsolete file)

Parallelized selector matching patch 16 years ago Carmen Badea 34.03 KB, patch		Details \| Diff \| Splinter Review
Performance results for the Mozilla page load tests 16 years ago Carmen Badea 38.87 KB, image/jpeg		Details
Performance results for the Zimbra benchmark suite 16 years ago Carmen Badea 28.49 KB, image/jpeg		Details
Parallelized selector matching patch - updated w/ license information 16 years ago Carmen Badea 37.54 KB, patch		Details \| Diff \| Splinter Review
HotPar'10 Paper 16 years ago Carmen Badea 312.09 KB, application/pdf		Details
Parallelized CSS selector matching - NSPR version 16 years ago Carmen Badea 27.50 KB, patch		Details \| Diff \| Splinter Review
Parallelized CSS selector matching - NSPR version with main thread as worker thread 16 years ago Carmen Badea 30.86 KB, patch		Details \| Diff \| Splinter Review
NSPR versions performance results for the Mozilla page load tests on Nehalem 16 years ago Carmen Badea 29.03 KB, image/jpeg		Details
NSPR version performance results for the Mozilla page load tests on Atom 16 years ago Carmen Badea 32.70 KB, image/jpeg		Details
Merged to tip 16 years ago Boris Zbarsky [:bzbarsky] 26.91 KB, patch		Details \| Diff \| Splinter Review

Moh Haghighat

Reporter

Description

•

16 years ago

The purpose of this bug report is capturing the results of parallelism opportunity investigations in CSS rule-matching and tracking the related implementation progress. This bug is also closely related to the bug #493651, which is about performance opportunities in CSS rule-matching in a more general context: https://bugzilla.mozilla.org/show_bug.cgi?id=493651

Carmen Badea

Comment 1

•

16 years ago

Attached patch Parallelized selector matching patch (obsolete) — Details — Splinter Review

Carmen Badea

Comment 2

•

16 years ago

Attached image Performance results for the Mozilla page load tests — Details

Carmen Badea

Comment 3

•

16 years ago

Attached image Performance results for the Zimbra benchmark suite — Details

Carmen Badea

Comment 4

•

16 years ago

First, in order to assess the speedup potentials of parallelizing the SelectorMatchesTree process, we built a tracing mechanism based on VProf, the value-profiling package under the SpiderMonkey source tree. This mechanism achieves the following: 1. Determines which branch in the SelectorMatchesTree function is most often executed (for the Mozilla page load tests, as well as for the Zimbra suite, we saw that it was very heavily biased towards descendant selectors) 2. For each SelectorMatchesTree call, it yields the execution trace (we used notations to indicate the decisions taken in each iteration, e.g. "p" means that a match against the parent/ancestor was attempted, "n" means that there was no "cached" ancestor available, "b" means that a match against a sibling was attempted, "m" means that a match was found for the current simple selector, "1" means that the function finished because there were no more ancestors available to match against, "4" means that the function exited with a successful match). The results obtained with this tracing mechanism show that in the vast majority of cases, the matching process for a descendant selector results in a non-match. Here is a sample of the most frequent iteration execution patterns encountered when executing SelectorMatchesTree function calls for the Mozilla page load tests: #Occurences Pattern #Iterations %iterations covered(cumulative) 5561595 p3 1 2.93 1997994 ppppppppppn1 10 13.43 1977238 pppppppppppn1 11 24.88 1846126 pn3 1 25.85 1599769 ppppppppn1 8 32.58 1530957 pppppppppn1 9 39.83 1293911 ppppppppppppn1 12 47.99 1187742 pppppppn1 7 52.37 840304 ppppppn1 6 55.02 619361 pppppppppppppn1 13 59.25 550482 ppppppppppppppn1 14 63.31 418940 pppppppppppppppn1 15 66.61 382272 ppppppppppppppppn1 16 69.83 310138 pppppn1 5 70.65 259582 pm4 1 70.78 224274 pnm4 1 70.90 216842 pppppppppppppppppn1 17 72.84 184728 ppppppppppppppppppn1 18 74.59 176367 pmppppppppppn1 11 75.61 173219 4 0 75.61 156926 pmpppppppppn1 10 76.43 152407 pppppppppppppppppppn1 19 77.96 141351 pmppppppn1 7 78.48 138621 ppppppppppppppppppppn1 20 79.94 127323 pppppppppppppppppppppn1 21 81.34 125789 pmppppppppn1 9 81.94 117422 ppppn1 4 82.18 105090 pmpppppppn1 8 82.63 96911 pppppppppppppppppppppppn1 23 83.80 82972 ppppppppppppppppppppppn1 22 84.76 82084 ppnpnpnpnpnpnpnpnpnpn1 11 85.23 80338 ppnpnpnpnpnpnpnpnpn1 10 85.66 Based on the profiling&tracing results obtained, we bank on the majority of rule non-match behavior observed. We re-factored the selector matching algorithm into a parallel equivalent so that for each SelectorMatchesTree call, the iterations that carry out the matching process are distributed in chunks to and executed in parallel by an adjustable number of worker threads. There is some degree of speculation, as not all iterations might get executed; the speculative work that proves to be unnecessary is discarded. Furthermore, the code is also designed in such a way that the speculative work does not cause any unwanted side effects. As observed through profiling, in the majority of cases all the iterations need to be executed and having them execute in parallel would lead to considerable performance improvement. There is also a check by each worker thread to avoid unnecessary computations if another thread has already found a match. The parallel rule matching process we implemented depends on three adjustable parameters: 1. the number of worker threads available 2. the default workload size to be assigned to a worker thread for processing (i.e. no. of matching iterations) 3. a threshold on the workload size that needs to be hit before enabling the parallel selector matching. The implementation is currently Pthreads based. We tested the performance of our parallelized prototype using the Mozilla page load tests on a Nehalem machine and with the best configuration out of the ones we tested (2 worker threads, a default workload size of 4 and minimum workload threshold of 28), we saw performance improvements in ~ 87% of the pages tested, with a maximum speedup of 1.84x. The first JPEG file attached shows the plotted performance results over the 390 web pages tested. The second JPEG file attached shows the speedups achieved for the Zimbra Collaboration suite. We're still working on refining the parallelizing algorithm and on cleaning up the code (we need to clean up the comments and take out the debugging code). The patch attached contains the current code we have. Let us know if there's any more info that we can provide. -- Carmen

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 5

•

16 years ago

+ * Copyright 2009 Intel Corporation. All rights reserved. Code contributed to Mozilla has to be under the MPL/GPL/LGPL tri-license as described in http://www.mozilla.org/MPL/license-policy.html . Is Intel willing to contribute the SelectorThread.h/cpp code under that license? (See the license headers for the other files in the directory for examples of what I'd expect the license header to look like.) (Note that the changes to nsCSSRuleProcessor.cpp are already under that license.)

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 6

•

16 years ago

Another important basic question: did you test this in an --enable-debug build? There are many threadsafety invariants that are checked by assertions in debug builds; if this patch violates some of those, then that's an additional cost we'd need to consider (because in order to avoid crashing due to things like races to set reference count variables, we'd need to fix such bugs). That said, this certainly looks interesting; I think you've discussed it a good bit more with Boris than with me, so his comments would probably be more useful than mine (since he has a better sense of your work).

Carmen Badea

Comment 7

•

16 years ago

Hi, We didn't use --enable-debug, only --enable-debugger-info-modules=yes. This is the .mozconfig file used: ac_add_options --enable-optimize ac_add_options --disable-debug ac_add_options --enable-debugger-info-modules=yes ac_add_options --disable-accessibility ac_add_options --enable-shared ac_add_options --disable-libxul ac_add_options --disable-tests ac_add_options --disable-mochitest ac_add_options --enable-application=browser mk_add_options MOZ_CO_PROJECT=browser -- Carmen (In reply to comment #6) > Another important basic question: did you test this in an --enable-debug > build? There are many threadsafety invariants that are checked by assertions > in debug builds; if this patch violates some of those, then that's an > additional cost we'd need to consider (because in order to avoid crashing due > to things like races to set reference count variables, we'd need to fix such > bugs). > > > That said, this certainly looks interesting; I think you've discussed it a good > bit more with Boris than with me, so his comments would probably be more useful > than mine (since he has a better sense of your work).

Moh Haghighat

Reporter

Comment 8

•

16 years ago

(In reply to comment #5) > Is Intel willing to contribute the SelectorThread.h/cpp code under that license? Sure. We have our legal approval for doing so, and have already done it in the case of VProf. Carmen, please look at the the license block from VProf, copy it into the newly introduced files, and resubmit the patch. David, thanks for the catch. -moh

Boris Zbarsky [:bzbarsky]

Comment 9

•

16 years ago

So from a quick skim this might in fact be ok threadsafety-wise due to the limited range of checks SelectorThread::parallelSelectorMatches performs (namespace, tag, id, class). Would need to check that carefully, of course, but it should all just be atom and integer compares. And you have a check up front to restrict the parallel matching to cases where SelectorMatchesTree is called with a selector with null mNext, mOperator being the descendant combinator, no negations, pseudo-classes, or attr selectors... So the basic approach looks sound (and the operating assumption that selectors of the form "foo bar" rarely match on the "foo" part is correct). Which OSes was this tested on? I assume Linux given the pthreads usage? I wonder whether NSPR threads would give similar results (and make the code more portable)...

Boris Zbarsky [:bzbarsky]

Comment 10

•

16 years ago

Oh, another question, for Carmen. How much time do you have to spend on this, in terms of the NSPR threads question above, cleaning up some of the code duplication introduced in SelectorMatchesTree, etc? That is, are you planning to do that sort of thing yourself, up through checkin, or do you want someone to help get the patch into shape to check in?

Boris Zbarsky [:bzbarsky]

Comment 11

•

16 years ago

Oh, and the other key property that makes this safe is that the main thread blocks on the worker threads' matching work, so there's no danger of the objects involved going away or anything like that.

Carmen Badea

Comment 12

•

16 years ago

(In reply to comment #9) > So from a quick skim this might in fact be ok threadsafety-wise due to the > limited range of checks SelectorThread::parallelSelectorMatches performs > (namespace, tag, id, class). Would need to check that carefully, of course, > but it should all just be atom and integer compares. And you have a check up > front to restrict the parallel matching to cases where SelectorMatchesTree is > called with a selector with null mNext, mOperator being the descendant > combinator, no negations, pseudo-classes, or attr selectors... > Yes, that's what we've done. We did profile these properties too (e.g. how often are there no negations, no attributes, etc.) and the results showed that it was rare to have negations or attributes, for example. > So the basic approach looks sound (and the operating assumption that selectors > of the form "foo bar" rarely match on the "foo" part is correct). > > Which OSes was this tested on? I assume Linux given the pthreads usage? I > wonder whether NSPR threads would give similar results (and make the code more > portable)... So far we've tested it on Windows only. For the implementation we used the Pthreads-w32 API. -- Carmen

Carmen Badea

Comment 13

•

16 years ago

Attached patch Parallelized selector matching patch - updated w/ license information — Details — Splinter Review

Attachment #405209 - Attachment is obsolete: true

Carmen Badea

Comment 14

•

16 years ago

I've just resubmitted the patch with the license information modified for the SelectorThread files. David, Boris, could you please take a look and let me know if you think the licensing info is fine? Thank you. -- Carmen (In reply to comment #8) > (In reply to comment #5) > > Is Intel willing to contribute the SelectorThread.h/cpp code under that license? > > Sure. We have our legal approval for doing so, and have already done it in the > case of VProf. > > Carmen, please look at the the license block from VProf, copy it into the newly > introduced files, and resubmit the patch. > > David, thanks for the catch. > > -moh

Carmen Badea

Comment 15

•

16 years ago

(In reply to comment #10) > Oh, another question, for Carmen. How much time do you have to spend on this, > in terms of the NSPR threads question above, cleaning up some of the code > duplication introduced in SelectorMatchesTree, etc? That is, are you planning > to do that sort of thing yourself, up through checkin, or do you want someone > to help get the patch into shape to check in? Right now I am working on improving the thread workload management and I also need to test it against the currently functional version to see if the performance improves. That and cleaning up the code should take me about a week. I hadn't thought about how to check it in yet :). Let's see how Moh thinks we should proceed. Thanks :). -- Carmen

Moh Haghighat

Reporter

Comment 16

•

16 years ago

Boris, We'd love to work and get all the possible issues and concerns resolved, but we'd need to know what those issues are in the first place ;) I.e., we'd need your help in telling us what needs to be done. I think knowing the process would also help us in other parallelism related work we might do in future. -moh

Boris Zbarsky [:bzbarsky]

Comment 17

•

16 years ago

License looks fine to me. David? As far as threading goes... Benjamin thinks (and I agree) that using NSPR threads here is less pain than trying to make sure pthreads works across the board with shipping Firefox on typical users' machines. You will probably also want to use the NSPR locking primitives, or our C++ wrappers on top of them. See xpcom/glue/CondVar.h and xpcom/glue/Mutex.h for a start. I haven't read the details of the thread code, but for the nsCSSRuleProcessor changes, with this patch there are three copies of the main matching loop. There should be no need for that; one copy should be quite sufficient, right? The loop to determine |n| is unnecessary; just check tempSelector->mNext; if we ever want to test for anything but n == 1 we can modify the check at that point (say by storing the number of following selectors in the chain in each selector at parse time). The "assume 128 is large enough" code needs to be fixed to not assume that. I suggest using an nsAutoTArray instead of a C array. The code in parallelSelectorMatches can be cleaned up (e.g. the commented-out bits removed, the aForStyling argument removed, the unused setNodeFlags local removed). The debug print statements should perhaps be under DEBUG_PARALLEL_MATCHING instead of C_DEBUG? General whitespace issues (use of tabs etc) also need addressing, but that can happen towards the end if desired.

Carmen Badea

Comment 18

•

16 years ago

Boris, thank you so much for the helpful suggestions. I'll go ahead and work on getting the patch code into shape. -- Carmen (In reply to comment #17) > License looks fine to me. David? > > As far as threading goes... Benjamin thinks (and I agree) that using NSPR > threads here is less pain than trying to make sure pthreads works across the > board with shipping Firefox on typical users' machines. > > You will probably also want to use the NSPR locking primitives, or our C++ > wrappers on top of them. See xpcom/glue/CondVar.h and xpcom/glue/Mutex.h for a > start. > > I haven't read the details of the thread code, but for the nsCSSRuleProcessor > changes, with this patch there are three copies of the main matching loop. > There should be no need for that; one copy should be quite sufficient, right? > The loop to determine |n| is unnecessary; just check tempSelector->mNext; if we > ever want to test for anything but n == 1 we can modify the check at that point > (say by storing the number of following selectors in the chain in each selector > at parse time). > > The "assume 128 is large enough" code needs to be fixed to not assume that. I > suggest using an nsAutoTArray instead of a C array. > > The code in parallelSelectorMatches can be cleaned up (e.g. the commented-out > bits removed, the aForStyling argument removed, the unused setNodeFlags local > removed). > > The debug print statements should perhaps be under DEBUG_PARALLEL_MATCHING > instead of C_DEBUG? > > General whitespace issues (use of tabs etc) also need addressing, but that can > happen towards the end if desired.

Boris Zbarsky [:bzbarsky]

Comment 19

•

16 years ago

Great! If you have any questions, feel free to ask here, or mail me directly, or ask on irc.mozilla.org in the #developers channel!

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 20

•

16 years ago

Comment on attachment 405222 [details] [diff] [review] Parallelized selector matching patch - updated w/ license information Some drive-by comments: >diff -crBN mozilla-central/layout/style/SelectorThread.cpp mozilla-central-pSMT/layout/style/SelectorThread.cpp >*** mozilla-central/layout/style/SelectorThread.cpp Wed Dec 31 16:00:00 1969 >--- mozilla-central-pSMT/layout/style/SelectorThread.cpp Wed Oct 7 21:51:00 2009 > >+ SharedData* SelectorMainThread::startOrWakeup(int mode, int _availableAncestorIndex, RuleProcessorData** _ancestors, FILE* _f, nsCSSSelector* _initialSelector) >+ { [snip] >+ for (int j = 0; j < i; j++) >+ { >+ sharedData->sems[j]->post(); Barrier here would be more efficient in the common case (threads already created), since you would only need to make one system call to awaken them rather than NTHREADS calls. [snip] >+ sharedData->mainSem->wait(); >+ Why is the main thread idle-waiting here? It would be better for it to also participate in the selector matching; you'd save a TCB. >+ >+ //------------------------------------------------------------------------------------------------------------ >+ >+ void *SelectorThread::run() >+ { [snip] >+ while (true) >+ { >+ sharedData->sems[index]->wait(); Barrier sync with the main thread and other worker threads here would be better, IMHO. >+ for (i = tData->startAIndex; i <= tData->endAIndex; i++) >+ { >+ >+ //if (i < 0) break; >+ if ((sharedData->found) && (sharedData->foundIndex < i)) break; >+ tData->ancestor = sharedData->ancestors[i]; >+ //tData->currentAIndex = i; >+ >+ result = parallelSelectorMatches(aForStyling); >+ if (result) >+ { >+ sharedData->mainLock->acquire(); >+ if (!sharedData->found) >+ { >+ sharedData->found = true; >+ sharedData->foundIndex = i; >+ } >+ else >+ { >+ if (i < sharedData->foundIndex) >+ { >+ sharedData->foundIndex = i; >+ } >+ //else, there's already an earlier match, so this match is irrelevant >+ sharedData->mainLock->release(); Yeah ... you don't want to lock/unlock |mainLock| in every loop iteration :). I'd recommend making found/foundIndex thread-private variables, and have the main thread compute the min foundIndex of all threads (including itself) after the barrier below. You could bring back this "speculation failed" optimization with s/sharedData->foundIndex/sharedData->approximateFoundIndex/ and do the comparison and update with the thread-local foundIndex outside of the mutex. That being racy is OK: having sharedData->approximateFoundIndex not decrease monotically doesn't affect the correctness of the algorithm. >+ sharedData->mainLock->acquire(); >+ >+ sharedData->WaitForThreads--; >+ if (sharedData->WaitForThreads == 0) >+ { >+ #ifdef C_DEBUG >+ fprintf(f, "thread #%d->run(): WaitForThreads = %d, everyone's gone to sleep...\n", this->getIndex(), sharedData->WaitForThreads); >+ #endif >+ sharedData->mainSem->post(); >+ } >+ sharedData->mainLock->release(); This is sort of a proto-barrier. I'd make it an actual barrier; when the main thread is also doing selector matching, the main thread would also want to hit that barrier like all the other worker threads.

Moh Haghighat

Reporter

Comment 21

•

16 years ago

Adding Leo.

Carmen Badea

Comment 22

•

16 years ago

Attached file HotPar'10 Paper — Details

HotPar'10 Paper

Carmen Badea

Comment 23

•

16 years ago

We recently submitted a paper describing some of our work and results on parallelizing the CSS rule-matching component of Firefox to the USENIX Workshop on Hot Topics in Parallelism (<a href="http://www.usenix.org/event/hotpar10/cfp/">HotPar '10</a>). We were just informed that our paper was one of the 16 papers that were accepted. Special thanks to those of you who provided us the benchmarks we used and for answering our questions. Hope we did a fair job in the acknowledgment section :).

Carmen Badea

Comment 24

•

16 years ago

Attached patch Parallelized CSS selector matching - NSPR version — Details — Splinter Review

Hi everyone, Attached is the patch containing the implementation of the parallel selector matching process using NSPR instead of Pthreads. It takes into account most of the suggestions that Boris made (e.g., using NSPR constructs, cleaning up comments, using an nsAutoTArray to remove the previous size limitations, etc.). I've tested this implementation on two systems running Windows (Nehalem and Atom), and in both cases the performance speedups observed are slightly worse than the ones observed when experimenting with the previous Pthread implementation. Attached JPEGs show the performance results of the NSPR implementation for the Mozilla page load tests on Nehalem and Atom, respectively. With the NSPR implementation, using the Mozilla page load tests on Nehalem, we get an average speedup of 2x (0.49x to 14x, mean = 2x), with ~75% of the benchmarks showing improvement or not being affected performance wise. With our initial Pthreads implementation, we obtained an average speedup of 1.08x (0.51x to 1.84x, mean = 1.08x) with ~ 87% of the benchmarks showing improvement or not being affected performance wise. The obtained speedups depend on the amount of work needed for CSS rule matching. With the NSPR implementation, using the Mozilla page load tests on Atom, we get an average speedup of 1.03x (0.67x to 1.5x, mean=1.03x) with ~69% of the benchmarks not being affected performance wise. Note that while Nehalem is multi-core, the Atom that I used for performance testing is single core with hyperthreading. I also have another patch (see attached NSPR-MTWT patch file), built upon the NSPR one, where the main thread also joins in the parallel rule matching process, carrying out the workload that a worker thread would do, instead of sitting idle waiting for the worker threads to finish as before. However, I still have not tested this implementation in terms of performance, only in terms of functionality using the Mozilla page load tests, but expect that the performance should improve, at least slightly. Please feel free to take the code and test it or modify it as you see fit. It would definitely need to be adapted to the latest trunk as this was built on a older version of the Firefox source code. -- Carmen

Carmen Badea

Comment 25

•

16 years ago

Attached patch Parallelized CSS selector matching - NSPR version with main thread as worker thread — Details — Splinter Review

Carmen Badea

Comment 26

•

16 years ago

Attached image NSPR versions performance results for the Mozilla page load tests on Nehalem — Details

Carmen Badea

Comment 27

•

16 years ago

Attached image NSPR version performance results for the Mozilla page load tests on Atom — Details

Andreas Gal :gal

Comment 28

•

16 years ago

The bugzilla diff view doesn't display the patch for me for some reason. The raw output is fine.

Boris Zbarsky [:bzbarsky]

Comment 29

•

16 years ago

Andreas, I bet the built-in bugzilla thing just fails because the patch doesn't apply to m-c tip. Carmen, thanks for following up on this! The patch will definitely need to be merged to tip and ideally modified so that we don't have the two different copies of SelectorMatchesTree around (needs more ifdefs, possibly, but makes it less likely that they'll get out of sync); do you want to do that or do you want someone else to pick that up? If you want I might do at least a preliminary merge to tip just so I can get some measurements on a testcase or two I have lying around here....

Boris Zbarsky [:bzbarsky]

Comment 30

•

16 years ago

Actually, looks like that patch creates _three_ separate copies of SelectorMatchesTree. That's really not desirable at all, and reading more carefully should be easy enough to avoid.

Boris Zbarsky [:bzbarsky]

Comment 31

•

16 years ago

And does SelectorThread still need windows.h? Seems to compile fine for me (on Mac) without it.

Boris Zbarsky [:bzbarsky]

Comment 32

•

16 years ago

SharedData::SharedData seems to sprintf into a random memory location (str1 is never actually set to point to anywhere). That doesn't look right.

Carmen Badea

Comment 33

•

16 years ago

Hi Boris, I’d have loved to complete the work to the end. However, I am in the process of finishing up my PhD dissertation and preparing for my final defense which I expect to happen sometime in summer. I’m thus worried that my continuing the work on this patch will further delay my graduation. Please feel free to take the code and test/modify it as you see fit or assign it to someone else. I will definitely be interested to follow this project's further evolution. -- Carmen

Boris Zbarsky [:bzbarsky]

Comment 34

•

16 years ago

Carmen, sounds good. I know how that defense thing works. ;) I'll get this merged to trunk (probably starting with the MTWT version, right?) and see how things look. Thanks for working on this!

Carmen Badea

Comment 35

•

16 years ago

Hi Boris, Yes, i think that's the version you want. It's the one where the main thread joins in carrying out the CSS selector matching process instead of sitting idle waiting for the worker threads to complete the work. Thanks again for all your help :). -- Carmen (In reply to comment #34) > Carmen, sounds good. I know how that defense thing works. ;) > > I'll get this merged to trunk (probably starting with the MTWT version, right?) > and see how things look. Thanks for working on this!

Boris Zbarsky [:bzbarsky]

Comment 36

•

16 years ago

Attached patch Merged to tip — Details — Splinter Review

This is the "NSPR version with main thread as worker thread" patch merged to tip and with the redundant SelectorMatchesTree stuff removed (as well as a few minor changes to SelectorMatchingTree callsites to allow it to assume that aSelector is non-null and reindentation of the new code to be 2-space indented, no tabs). This doesn't eliminate the 3x duplication of the SelectorMatches prefix (namespace, tag, id, class match); the best way to eliminate that will depend on how we end up splitting up the compilation units here. Chris, if you want to play with this, it should apply fine to m-c tip. I tried it on the following task: 1) Load http://www.whatwg.org/specs/web-apps/current-work/ 2) Wait for all loading work to be done. 3) Run javascript:%20var%20docEl%20=%20document.documentElement;%20docEl.style.display%20=%20"none";%20document.body.offsetWidth;%20var%20start%20=%20new%20Date();%20docEl.style.display%20=%20"";%20window.getComputedStyle(document.body,%20"").fontSize;%20var%20end%20=%20new%20Date();%20alert(end%20-%20start) and I see a slowdown from abou 1425ms to about 1650ms (both with NUM_WORKERS set to 1 and 2) on a Core 2 Duo Macbook Pro.

Boris Zbarsky [:bzbarsky]

Comment 37

•

15 years ago

I filed bug 631527 on a more ambitious plan to parallelize over entire subtrees, not over matching of a single selector against a single node.

Emilio Cobos Álvarez [:emilio]

Comment 38

•

7 years ago

Stylo does parallel selector-matching. I think we can close this.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.