Closed Bug 1047554 Opened 10 years ago Closed 9 years ago

Parallel tree indexing

Categories

(Webtools Graveyard :: DXR, defect)

Platform: x86, macOS
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: erik, Unassigned)

References

Details

Summary: Parallel tree rendering → Parallel tree indexing
The deploy script can delete indexes of old format versions when new code rolls out.
Blocks: 820531
Commits pushed to es at https://github.com/mozilla/dxr

https://github.com/mozilla/dxr/commit/21767c7bd7905f0b305d804312ee60221dd0d0b2
Implement most of parallel tree indexing. Ref bug 1047554. The deploy script is still to come.

For deployers:

* Create a top-level "dxr" command. Move dxr-build.py to the `dxr index` subcommand, and install it using proper entrypoints.
* Wipe out last remnant of generated filesystem artifacts: we now use the original config file at both index and request time. Thus, the `target_folder` setting goes away, the `DXR_FOLDER` env var becomes `DXR_CONFIG` and points to the config file instead, and you can change settings like es_hosts or google_analytics_key without a rebuild or mucking around with generated files (you'll just need a WSGI restart). Settings that make it into the index, like www_root, will still require a rebuild to change.
* Introduce a "catalog index" which keeps track of what indexes are around and what their format versions are. With this comes 2 new settings: es_catalog_index and es_catalog_replicas. The latter should be set to the number of nodes in your ES cluster minus 1.
* Support relative paths in config files, so we no longer need dxr.config.in with sed hacks or the top-level makefile in each test dir. Introduce clean_command and `dxr clean` to fill the role of "make clean".
* Redo how temp and log folders work. We no longer make a top-level temp folder with tree-specific folders inside. That made it hard to clean up tests' temp folders afterward without introducing differences between test and production use: there would be race conditions or failed-rmdir ugliness in trying to delete the global temp folder in a parallel-indexing situation, or there'd simply be empty folders lying around. Now temp and log folders are per-tree, and temp folders are deleted afterward by default (because they're huge). log_folder now exists only in the [DXR] section and supports a {tree} substitution token, as does temp_folder. Both sorts of folders are by default created in the same dir as dxr.config. There's no longer hard-coded magic about TreeConfig.log_folder or .temp_folder defaulting to being within the Config-level ones; that's now consistent with the override semantics of the other options that occur in both TreeConfig and Config. And `dxr clean` wipes out temp and log folders on demand, so you don't have to boilerplate that into a makefile or other script.
* build_command can now be blanked out in config to not invoke anything. No more /bin/true.
* Change the $jobs token to {workers} for consistency with other substitution tokens. $jobs still works for now.
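
To illustrate the substitution tokens and relative-path handling described in the list above, here is a minimal Python sketch. It is not DXR's actual code: the helper names `expand_folder` and `expand_build_command` are hypothetical, and it assumes tokens are filled in by plain string replacement and that relative paths are joined onto the directory containing dxr.config.

```python
import os


def expand_folder(template, tree, config_dir):
    """Fill in the {tree} token, then resolve the result relative to the
    directory that holds the config file (hypothetical helper)."""
    path = template.replace('{tree}', tree)
    # os.path.join leaves absolute paths untouched, so only relative
    # settings end up anchored at config_dir.
    return os.path.normpath(os.path.join(config_dir, path))


def expand_build_command(command, workers):
    """Substitute {workers}, still honoring the legacy $jobs token.

    Returns None for a blank command, meaning "invoke nothing"
    (hypothetical helper)."""
    if not command.strip():
        return None
    count = str(workers)
    return command.replace('{workers}', count).replace('$jobs', count)
```

For example, under these assumptions `expand_build_command('make -j {workers}', 4)` and `expand_build_command('make -j $jobs', 4)` both give `'make -j 4'`, while a blank build_command yields `None` so the caller can skip the build step entirely.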

For DXR developers:

* Refactor build.py so it's geared toward building a single tree. If you want to loop across many, do it in the caller. build_instance() becomes index_tree() and deploy_tree(), with a convenience wrapper index_and_deploy_tree().
* make_app() now takes a Config, and Config takes a dir to interpret paths relative to. This lets us call make_app() with either concrete config files or in-memory config (as in SingleFileTestCase), and it saves us having to remember a chdir() (and undo it, if important) before instantiating a Config.
* Add the beginnings of a `dxr delete` command. Right now, it deletes only the catalog, useful for debugging.
* Stop ignoring errors while cleaning out folders in ensure_folder().
* Move some ES utils out of app.py and core.py to es.py.
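
The single-tree refactoring above means that looping over trees is left to the caller. The following sketch shows one way a caller might do that. The function names match the commit message, but their bodies and signatures here are stand-ins invented for illustration, and the thread pool is an assumption (a real deployment might instead use separate processes or machines).

```python
from concurrent.futures import ThreadPoolExecutor


def index_tree(tree):
    """Stand-in for the real indexer: pretend to build an index for one
    tree and return the index's name."""
    return 'dxr_index_{0}'.format(tree)


def deploy_tree(tree, index_name):
    """Stand-in for deployment: report which index now serves the tree."""
    return (tree, index_name)


def index_and_deploy_tree(tree):
    """Convenience wrapper: index a single tree, then deploy the result."""
    return deploy_tree(tree, index_tree(tree))


def index_all(trees, max_workers=2):
    """Caller-side loop over many trees. Each tree is handled
    independently, so they can run concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input trees.
        return list(pool.map(index_and_deploy_tree, trees))
```

Because each function deals with exactly one tree, the concurrency policy stays entirely in `index_all` and can be swapped out without touching the per-tree code.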

https://github.com/mozilla/dxr/commit/a304b2cab06d39f3324bfdeff10d91e8577cdd17
Merge parallel tree indexing. Fix bug 1047554.

For deployers:
* See the commit message of 21767c7bd7905f0b305d804312ee60221dd0d0b2.
* dxr-serve.py becomes `dxr serve`.
* deploy.py becomes `dxr deploy`.
* The on-disk folder structure of the dxr-prod link and the builds dir doesn't change, but it no longer needs the "instances" dir (and will ignore it).
* The dxr.config file should now live somewhere where both the indexing box and the webheads can get at it, because both the indexing process and the web app read it. There's no more generated config file.
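
As a rough illustration of how a catalog of indexes and format versions could let a deploy step find indexes of old format versions, here is a small Python sketch. The in-memory catalog layout (a list of dicts with `name`, `tree`, and `format` keys) and both function names are assumptions made for this example; they do not reflect the actual Elasticsearch documents or DXR's code.

```python
def catalog_replicas(node_count):
    """Suggested es_catalog_replicas value: the number of nodes in the
    ES cluster minus 1 (never negative)."""
    return max(node_count - 1, 0)


def old_format_indexes(catalog, current_format):
    """Return the names of catalog entries whose format version is older
    than the one the newly rolled-out code understands."""
    return sorted(entry['name'] for entry in catalog
                  if entry['format'] < current_format)


# Hypothetical catalog contents, for illustration only.
example_catalog = [
    {'name': 'dxr_10_code', 'tree': 'code', 'format': 10},
    {'name': 'dxr_11_code', 'tree': 'code', 'format': 11},
    {'name': 'dxr_11_docs', 'tree': 'docs', 'format': 11},
]
```

With this sample data, `old_format_indexes(example_catalog, 11)` selects only `dxr_10_code`, which is the kind of index a deploy step could then delete.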
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Commits pushed to master at https://github.com/mozilla/dxr

https://github.com/mozilla/dxr/commit/21767c7bd7905f0b305d804312ee60221dd0d0b2
Implement most of parallel tree indexing. Ref bug 1047554. The deploy script is still to come.

https://github.com/mozilla/dxr/commit/a304b2cab06d39f3324bfdeff10d91e8577cdd17
Merge parallel tree indexing. Fix bug 1047554.
Product: Webtools → Webtools Graveyard