Closed Bug 2032102 Opened 1 month ago Closed 29 days ago

Parallelize revision creation and diff property calls in submit for faster stack submission

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: Sylvestre, Assigned: Sylvestre)

References

(Blocks 1 open bug)

Details

Attachments

(9 files, 7 obsolete files)

Bug 2032102 - Parallelize set_diff_property calls in submit 1 month ago Sylvestre Ledru [:Sylvestre] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 2032102 - Parallelize revision creation in submit by splitting submit and link 1 month ago Sylvestre Ledru [:Sylvestre] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 2032102 - submit: remove redundant has_revision_reviewers calls 1 month ago Sylvestre Ledru [:Sylvestre] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 2032102 - submit: persist bug_id_changed from validation to avoid refetch 1 month ago Sylvestre Ledru [:Sylvestre] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 2032102 - Restore benchmark framework for submit/patch perf work 1 month ago Sylvestre Ledru [:Sylvestre] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 2032102 - submit: persist existing_status from validation to avoid refetch 1 month ago Sylvestre Ledru [:Sylvestre] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 2032102 - submit: batch reviewer lookups across all commits 1 month ago Sylvestre Ledru [:Sylvestre] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 2032102 - submit: parallelize AI review requests 1 month ago Sylvestre Ledru [:Sylvestre] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 2032102 - patch: parallelize initial environment checks 1 month ago Sylvestre Ledru [:Sylvestre] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 2032102 - conduit: cache diff lookups in get_diffs 1 month ago Sylvestre Ledru [:Sylvestre] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 2032102 - conduit: add socket timeout and retry idempotent calls 1 month ago Sylvestre Ledru [:Sylvestre] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 2032102 - tests: cover new Commit fields in to_dict roundtrip 1 month ago Sylvestre Ledru [:Sylvestre] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 2032102 - conduit: pool HTTP connections via urllib3 1 month ago Sylvestre Ledru [:Sylvestre] 48 bytes, text/x-phabricator-request		Details \| Review
WIP: Bug 2032102 - Document benchmark framework in AGENTS.md 1 month ago Sylvestre Ledru [:Sylvestre] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 2032102 - benchmarks: framework and harness driving real Conduit code 1 month ago Sylvestre Ledru [:Sylvestre] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 2032102 - reorganise: let --force tolerate abandoned-revision ghost links 1 month ago Sylvestre Ledru [:Sylvestre] 48 bytes, text/x-phabricator-request		Details \| Review

Sylvestre Ledru [:Sylvestre]

Assignee

Description

•

1 month ago

moz-phab submit currently makes all Conduit API calls sequentially, which is slow for large stacks. This bug tracks parallelizing two key phases of the submit workflow:

Parallelize set_diff_property calls: Extract build_diff_property_args from set_diff_property so that data building (which may shell out to git-cinnabar via get_public_node) stays in the main thread, while the actual API calls are deferred and executed in parallel via a ThreadPoolExecutor at the end of submit.
Parallelize revision creation: Restructure _submit into 4 distinct phases:
- Phase 1: Create and submit diffs (sequential — touches local VCS)
- Phase 2: Create/update revisions (parallel — no cross-commit dependency after splitting out parent linking)
- Phase 3: AI review, amend commits, build diff metadata (sequential)
- Phase 4: Set diff properties and parent links (parallel)
The key insight is that by splitting parents.add out of create_revision/update_revision and deferring it to Phase 4, every revision creation call becomes independent and can run in parallel.

Thread safety: SimpleCache is made thread-safe with a threading.Lock to support concurrent Conduit calls.

Benchmark (simulated 0.3s API latency):

5-commit stack: 3.30s → 2.41s (−27%)
10-commit stack: 6.61s → 4.81s (−27%)

Sylvestre Ledru [:Sylvestre]

Assignee

Comment 1

•

1 month ago

Attached file Bug 2032102 - Parallelize set_diff_property calls in submit — Details

Extract build_diff_property_args for thread-safe parallel submission:
split set_diff_property into build_diff_property_args (data building,
which may shell out to git-cinnabar via get_public_node) and the API
call, so the main thread builds all payloads before worker threads
make the network requests.

Sylvestre Ledru [:Sylvestre]

Assignee

Comment 2

•

1 month ago

Attached file Bug 2032102 - Parallelize revision creation in submit by splitting submit and link — Details

Restructure _submit into 4 phases:

Create and submit diffs (sequential — touches local VCS)
Create/update revisions (parallel — no cross-commit dependency)
AI review, amend commits, build diff metadata (sequential)
Set diff properties and parent links (parallel)

Phase 2 is now parallel because the previous commit split out
parent linking (parents.add) from create_revision. Without that
dependency on the previous commit's rev_phid, every revision
creation call is independent.

Benchmark (simulated 0.3s API latency):
5-commit stack: 3.30s → 2.41s (−27%)
10-commit stack: 6.61s → 4.81s (−27%)

Sylvestre Ledru [:Sylvestre]

Assignee

Comment 3

•

1 month ago

Attached file Bug 2032102 - submit: remove redundant has_revision_reviewers calls (obsolete) — Details

The reviewer data is already present in the revision dict preloaded at the
top of validate_commit_stack. Check
revision["attachments"]["reviewers"]["reviewers"] directly instead of
making two per-commit conduit.has_revision_reviewers() calls (each of
which does a get_revisions lookup).

Sylvestre Ledru [:Sylvestre]

Assignee

Comment 4

•

1 month ago

Attached file Bug 2032102 - submit: persist bug_id_changed from validation to avoid refetch (obsolete) — Details

validate_commit_stack already computes bug_id_changed from the preloaded
revision data but discards it. Store it on the Commit dataclass and read
it in update_revision() to eliminate the extra get_revisions() call
during Phase 2.

Sylvestre Ledru [:Sylvestre]

Assignee

Comment 5

•

1 month ago

Attached file Bug 2032102 - Restore benchmark framework for submit/patch perf work (obsolete) — Details

Brings back the benchmark framework originally added on a sibling branch
(afe13f4) so the ongoing series of submit/patch performance changes can
be measured per-commit. Includes the framework harness, connection
pooling benchmark, and patch benchmark. The submit benchmark is added
in a follow-up commit so its baseline reflects the already-landed
has_revision_reviewers and bug_id_changed changes.

Run with:
python -m benchmarks
python -m benchmarks submit
python -m benchmarks patch
python -m benchmarks connection_pooling

Sylvestre Ledru [:Sylvestre]

Assignee

Comment 6

•

1 month ago

Attached file Bug 2032102 - submit: persist existing_status from validation to avoid refetch (obsolete) — Details

edit_revision() refetched the revision just to read the current status,
even though validate_commit_stack had already seen it. Store it on the
Commit dataclass and pass it through as an optional kwarg; the fetch
path remains as a fallback for callers that don't set it.

Also adds bench_submit.py, reflecting the cumulative effect of:

Step 1: drop has_revision_reviewers from validation
Step 2: persist bug_id_changed (no refetch in update_revision)
Step 3: persist existing_status (no refetch in edit_revision)

Benchmark (simulated 50ms per API call):
Commits Before (parallel) After (parallel) Speedup
1 504ms 404ms 1.24x
3 1007ms 806ms 1.25x
5 1559ms 1258ms 1.24x
10 3013ms 2411ms 1.25x

End-to-end submit speedup vs fully sequential (5-commit stack):
Before: 1.48x -> After: 1.83x

Sylvestre Ledru [:Sylvestre]

Assignee

Comment 7

•

1 month ago

Attached file Bug 2032102 - submit: batch reviewer lookups across all commits — Details

validate_commit_stack called check_for_invalid_reviewers() once per
commit, which triggered one user.query + one project.search per commit.
Collect all unique reviewer names and groups up front, pre-warm the
get_users / get_groups caches with a single batch call each, then the
per-commit validation becomes pure cache hits.

Uses normalise_reviewer(..., strip_group=False) to match the cache keys
used by check_for_invalid_reviewers.

Benchmark (simulated 50ms per API call):
Commits Before (parallel) After (parallel) Speedup
1 404ms 404ms 1.00x
3 806ms 606ms 1.33x
5 1258ms 857ms 1.47x
10 2411ms 1509ms 1.60x

End-to-end submit speedup vs fully sequential (5-commit stack):
Before: 1.83x -> After: 2.69x

Sylvestre Ledru [:Sylvestre]

Assignee

Comment 8

•

1 month ago

Attached file Bug 2032102 - submit: parallelize AI review requests — Details

AI review is a standalone API call that only depends on commit.rev_id,
so all eligible commits can be requested concurrently. Collect them
during the main loop and fan them out via a ThreadPoolExecutor after
all revisions are created/updated.

Each future's result is handled independently so one failure doesn't
abort the others. Individual ai_review_state is still set per-commit.

Benchmark (simulated 50ms per API call):
Commits Before (parallel) After (parallel) Speedup
1 404ms 405ms 1.00x
3 606ms 507ms 1.20x
5 857ms 659ms 1.30x
10 1509ms 1111ms 1.36x

End-to-end submit speedup vs fully sequential (5-commit stack):
Before: 2.69x -> After: 3.50x

Sylvestre Ledru [:Sylvestre]

Assignee

Comment 9

•

1 month ago

Attached file Bug 2032102 - patch: parallelize initial environment checks — Details

conduit.check(), repo.check_vcs(), and repo.is_worktree_clean() are
independent of each other: one is a network ping, the other two are
local VCS queries. Run them concurrently via a ThreadPoolExecutor so
the local checks overlap with the Phabricator round-trip.

Saves ~50-200ms on every patch invocation. The benchmark simulation
only models network latency, so the wall-clock delta isn't visible
there -- but real invocations over a live network benefit directly.

Sylvestre Ledru [:Sylvestre]

Assignee

Comment 10

•

1 month ago

Attached file Bug 2032102 - conduit: cache diff lookups in get_diffs — Details

Mirrors the existing caching pattern in get_revisions. Diffs are
immutable once submitted, so a per-process cache keyed by PHID (and
a PHID index keyed by diff ID) is safe. Partial lookups only query
Phabricator for the uncached subset.

No visible impact on the simulated submit/patch benchmarks (each
diff is typically fetched once per invocation), but helps any code
path -- or future change -- that repeats a lookup within a session.

Sylvestre Ledru [:Sylvestre]

Assignee

Comment 11

•

1 month ago

Attached file Bug 2032102 - conduit: add socket timeout and retry idempotent calls — Details

Every Conduit API call now enforces a 120-second socket timeout, so a
hung network connection can no longer stall moz-phab indefinitely.

For idempotent read-only methods (conduit.ping, user.whoami,
user.query, project.search, *.search, differential.getrawdiff,
file.querychunks, edge.search), transient URLError / TimeoutError /
ConnectionError failures are retried up to 3 times with a simple
linear backoff.

Mutation methods (differential.revision.edit, differential.creatediff,
etc.) are intentionally NOT retried. A retried POST after a successful
request that lost its response could create duplicate revisions or
diffs.

No benchmark impact on the happy path; this is a reliability change.

Sylvestre Ledru [:Sylvestre]

Assignee

Comment 12

•

1 month ago

Attached file Bug 2032102 - tests: cover new Commit fields in to_dict roundtrip (obsolete) — Details

Extend test_commit_to_dict_json_roundtrip expectations to include
bug_id_changed and existing_status, the two fields added to the
Commit dataclass to avoid refetching validation state.

Sylvestre Ledru [:Sylvestre]

Assignee

Comment 13

•

1 month ago

Attached file Bug 2032102 - conduit: pool HTTP connections via urllib3 — Details

Replace per-call urllib.request.urlopen with a shared
urllib3.PoolManager, so repeated Conduit API calls reuse the TCP+TLS
connection instead of paying a fresh handshake every time.

urllib3's own Retry is disabled (total=0) because we still do
method-aware retries in call(): idempotent read methods are retried,
mutations (differential.revision.edit, differential.creatediff, etc.)
are not, to avoid duplicate POSTs if a successful request lost its
response.

Benchmark (20 sequential calls, 30ms simulated RTT):
urllib.request : 1969ms, 20 TLS handshakes
urllib3 pool : 1628ms, 1 TLS handshake (1.21x speedup)

Sylvestre Ledru [:Sylvestre]

Assignee

Comment 14

•

1 month ago

Attached file WIP: Bug 2032102 - Document benchmark framework in AGENTS.md (obsolete) — Details