Closed Bug 608196 Opened 15 years ago Closed 15 years ago

Incremental genxref implementation: unit at a time

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: timeless, Assigned: timeless)

References

Details

timeless

Assignee

Description

•

15 years ago

TOC: Comment 0 has a description of how things work and summary of the feature Comment 1 has instructions for testing the feature Comment 2 has timing numbers for an incremental run Comment 3 has timing numbers for a normal run because otherwise they'd be lost in this bug :) Description: Currently MXR must rebuild its entire index each time update-xref.pl runs, which is typically after a run of update-src.pl. It's possible for very little of a tree to change. The implementation I have isn't ready for use by mozilla-central because the database loses too much knowledge, but this is part one. currently mxr does 2 source mode passes: pass 1...: Collect identifier definitions. pass 2...: Generate reference statistics. plus a dump phase (stage 3). basically you need to have a list of identifiers before you can decide which tokens in source files are references, so you need to do pass 1 completely first. for a simple tree (security) with three directories (dbm, nspr, security), this looks like: pass 1 collect dbm pass 1 collect nspr pass 1 collect security -at this point we have a list of definitions from dbm+nspr+security- pass 2 find refs dbm pass 2 find refs nspr pass 2 find refs security -at this point we have a list of references from dbm+nspr+security- stage 3 dump database -at this point we dump the single database containing identifiers from dbm+nspr+security with consumers found in all of them- With the code I have, the following happens instead: if dbm is unchanged, copy dbm database, else: pass 1 collect dbm -at this point we have a list of definitions from dbm- pass 2 find refs dbm -at this point we have a list of references of dbm defined identifiers in dbm + things the indexer decides are identifiers anyway- stage 3 dump dbm database -at this point we dump a database containing identifiers from dbm with consumers found in dbm- if nspr is unchanged, copy nspr database, else: pass 1 collect nspr -at this point we have a list of definitions from nspr- pass 2 find refs nspr -at this point we have a list of references of nspr defined identifiers in nspr + things the indexer decides are identifiers anyway- -we did not find nspr references to dbm defined identifiers- stage 3 dump nspr database -at this point we dump a database containing identifiers from nspr with consumers found in nspr- if security is unchanged, copy security database, else: pass 1 collect security -at this point we have a list of definitions from security- pass 2 find refs security -at this point we have a list of references of security defined identifiers in security + things the indexer decides are identifiers anyway- -we did not find security references to dbm or nspr defined identifiers- stage 3 dump security database -at this point we dump a database containing identifiers from security with consumers found in security- "pass" 4 merge databases -we merge the dbm, nspr and security databases- The neat part about this is that each database is self contained, and there's logic to check to see if the source code for a database is unchanged from a reference directory. If it is, then instead of generating an index (reading each file twice), we can simply borrow the existing database. We only need to rebuild databases for directories which have had changes. For certain cases (addons) this is actually a very good thing, because we keep much smaller databases in memory and the directories are actually independent. People can test this on konigsberg: http://mxr-test.konigsberg.mozilla.org/security (currently indexed using the non incremental indexer, but it had a pass using incremental which was used by security-copy for testing purposes) http://mxr-test.konigsberg.mozilla.org/security-copy (indexed using the incremental indexer) I *believe* there would be differences in C++ cases. To show this I'd need to run against m-c or a similar tree. Unfortunately konigsberg also has a prototype for a different parsing system too which I believe causes more missed than the incremental code for this bug. This is obviously not ideal for mozilla-central (or even security). Future plans (to be refiled in another bug at a later date): The longer version of this will instead rely on distinct identifier and reference databases. For directories which haven't changed, we would skip the identifier search. if dbm is unchanged, copy dbm-ident database, else: pass 1 collect dbm -at this point we have a list of definitions from dbm- dump dbm-ident database if nspr is unchanged, copy nspr-ident database, else: pass 1 collect nspr -at this point we have a list of definitions from nspr- dump nspr-ident database if security is unchanged, copy security-ident database, else: pass 1 collect security -at this point we have a list of definitions from security- dump security-ident database if all -ident databases are unchanged, copy *-ref databases, else: if dbm is unchanged, copy dbm database, else: "pass 4" merge *-ident databases pass 2 find refs dbm -at this point we have a list of references of dbm defined identifiers in dbm+nspr+security- stage 3 dump dbm-ref database -at this point we dump a database containing identifiers from dbm,nspr,security with consumers found in dbm- pass 2 find refs nspr -at this point we have a list of references of nspr defined identifiers in dbm+nspr+security- stage 3 dump nspr-ref database -at this point we dump a database containing identifiers from dbm,nspr,security with consumers found in nspr- pass 2 find refs security -at this point we have a list of references of security defined identifiers in dbm+nspr+security- stage 3 dump security-ref database -at this point we dump a database containing identifiers from dbm,nspr,security with consumers found in security- stage 4 merge databases -we merge the dbm-ident, dbm-ref, nspr-dent, nspr-ref, security-ident and security-ref databases-

timeless

Assignee

Comment 1

•

15 years ago

Basic comparison testing: For simple C identifiers such as PR_Malloc, I don't see any differences: http://mxr-test.konigsberg.mozilla.org/security/ident?i=PR_Malloc http://mxr-test.konigsberg.mozilla.org/security-copy/ident?i=PR_Malloc

timeless

Assignee

Comment 2

•

15 years ago

This instance is a case where there are no changes to any of the indexed directories between security/ and security-copy/, so this is just measuring phase 4: merging 3 databases (dbm, nspr, security). [timeless@konigsberg mxr-test]$ ./update-xref.pl -incremental security-copy/ real 0m5.937s user 0m4.351s sys 0m1.517s

timeless

Assignee

Comment 3

•

15 years ago

This is a normal complete index rebuild of the entire security root using the normal non incremental system [timeless@konigsberg mxr-test]$ ./update-xref.pl security real 8m46.718s user 6m59.780s sys 1m46.517s

timeless

Assignee