Closed Bug 2032102 Opened 1 month ago Closed 29 days ago

Parallelize revision creation and diff property calls in submit for faster stack submission

Categories

(Conduit :: moz-phab, enhancement)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Sylvestre, Assigned: Sylvestre)

References

(Blocks 1 open bug)

Details

Attachments

(9 files, 7 obsolete files)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

moz-phab submit currently makes all Conduit API calls sequentially, which is slow for large stacks. This bug tracks parallelizing two key phases of the submit workflow:

  1. Parallelize set_diff_property calls: Extract build_diff_property_args from set_diff_property so that data building (which may shell out to git-cinnabar via get_public_node) stays in the main thread, while the actual API calls are deferred and executed in parallel via a ThreadPoolExecutor at the end of submit.

  2. Parallelize revision creation: Restructure _submit into 4 distinct phases:

    • Phase 1: Create and submit diffs (sequential — touches local VCS)
    • Phase 2: Create/update revisions (parallel — no cross-commit dependency after splitting out parent linking)
    • Phase 3: AI review, amend commits, build diff metadata (sequential)
    • Phase 4: Set diff properties and parent links (parallel)

    The key insight is that by splitting parents.add out of create_revision/update_revision and deferring it to Phase 4, every revision creation call becomes independent and can run in parallel.

Thread safety: SimpleCache is made thread-safe with a threading.Lock to support concurrent Conduit calls.

Benchmark (simulated 0.3s API latency):

  • 5-commit stack: 3.30s → 2.41s (−27%)
  • 10-commit stack: 6.61s → 4.81s (−27%)

Extract build_diff_property_args for thread-safe parallel submission:
split set_diff_property into build_diff_property_args (data building,
which may shell out to git-cinnabar via get_public_node) and the API
call, so the main thread builds all payloads before worker threads
make the network requests.

Restructure _submit into 4 phases:

  1. Create and submit diffs (sequential — touches local VCS)
  2. Create/update revisions (parallel — no cross-commit dependency)
  3. AI review, amend commits, build diff metadata (sequential)
  4. Set diff properties and parent links (parallel)

Phase 2 is now parallel because the previous commit split out
parent linking (parents.add) from create_revision. Without that
dependency on the previous commit's rev_phid, every revision
creation call is independent.

Benchmark (simulated 0.3s API latency):
5-commit stack: 3.30s → 2.41s (−27%)
10-commit stack: 6.61s → 4.81s (−27%)

The reviewer data is already present in the revision dict preloaded at the
top of validate_commit_stack. Check
revision["attachments"]["reviewers"]["reviewers"] directly instead of
making two per-commit conduit.has_revision_reviewers() calls (each of
which does a get_revisions lookup).

validate_commit_stack already computes bug_id_changed from the preloaded
revision data but discards it. Store it on the Commit dataclass and read
it in update_revision() to eliminate the extra get_revisions() call
during Phase 2.

Brings back the benchmark framework originally added on a sibling branch
(afe13f4) so the ongoing series of submit/patch performance changes can
be measured per-commit. Includes the framework harness, connection
pooling benchmark, and patch benchmark. The submit benchmark is added
in a follow-up commit so its baseline reflects the already-landed
has_revision_reviewers and bug_id_changed changes.

Run with:
python -m benchmarks
python -m benchmarks submit
python -m benchmarks patch
python -m benchmarks connection_pooling

edit_revision() refetched the revision just to read the current status,
even though validate_commit_stack had already seen it. Store it on the
Commit dataclass and pass it through as an optional kwarg; the fetch
path remains as a fallback for callers that don't set it.

Also adds bench_submit.py, reflecting the cumulative effect of:

  • Step 1: drop has_revision_reviewers from validation
  • Step 2: persist bug_id_changed (no refetch in update_revision)
  • Step 3: persist existing_status (no refetch in edit_revision)

Benchmark (simulated 50ms per API call):
Commits Before (parallel) After (parallel) Speedup
1 504ms 404ms 1.24x
3 1007ms 806ms 1.25x
5 1559ms 1258ms 1.24x
10 3013ms 2411ms 1.25x

End-to-end submit speedup vs fully sequential (5-commit stack):
Before: 1.48x -> After: 1.83x

validate_commit_stack called check_for_invalid_reviewers() once per
commit, which triggered one user.query + one project.search per commit.
Collect all unique reviewer names and groups up front, pre-warm the
get_users / get_groups caches with a single batch call each, then the
per-commit validation becomes pure cache hits.

Uses normalise_reviewer(..., strip_group=False) to match the cache keys
used by check_for_invalid_reviewers.

Benchmark (simulated 50ms per API call):
Commits Before (parallel) After (parallel) Speedup
1 404ms 404ms 1.00x
3 806ms 606ms 1.33x
5 1258ms 857ms 1.47x
10 2411ms 1509ms 1.60x

End-to-end submit speedup vs fully sequential (5-commit stack):
Before: 1.83x -> After: 2.69x

AI review is a standalone API call that only depends on commit.rev_id,
so all eligible commits can be requested concurrently. Collect them
during the main loop and fan them out via a ThreadPoolExecutor after
all revisions are created/updated.

Each future's result is handled independently so one failure doesn't
abort the others. Individual ai_review_state is still set per-commit.

Benchmark (simulated 50ms per API call):
Commits Before (parallel) After (parallel) Speedup
1 404ms 405ms 1.00x
3 606ms 507ms 1.20x
5 857ms 659ms 1.30x
10 1509ms 1111ms 1.36x

End-to-end submit speedup vs fully sequential (5-commit stack):
Before: 2.69x -> After: 3.50x

conduit.check(), repo.check_vcs(), and repo.is_worktree_clean() are
independent of each other: one is a network ping, the other two are
local VCS queries. Run them concurrently via a ThreadPoolExecutor so
the local checks overlap with the Phabricator round-trip.

Saves ~50-200ms on every patch invocation. The benchmark simulation
only models network latency, so the wall-clock delta isn't visible
there -- but real invocations over a live network benefit directly.

Mirrors the existing caching pattern in get_revisions. Diffs are
immutable once submitted, so a per-process cache keyed by PHID (and
a PHID index keyed by diff ID) is safe. Partial lookups only query
Phabricator for the uncached subset.

No visible impact on the simulated submit/patch benchmarks (each
diff is typically fetched once per invocation), but helps any code
path -- or future change -- that repeats a lookup within a session.

Every Conduit API call now enforces a 120-second socket timeout, so a
hung network connection can no longer stall moz-phab indefinitely.

For idempotent read-only methods (conduit.ping, user.whoami,
user.query, project.search, *.search, differential.getrawdiff,
file.querychunks, edge.search), transient URLError / TimeoutError /
ConnectionError failures are retried up to 3 times with a simple
linear backoff.

Mutation methods (differential.revision.edit, differential.creatediff,
etc.) are intentionally NOT retried. A retried POST after a successful
request that lost its response could create duplicate revisions or
diffs.

No benchmark impact on the happy path; this is a reliability change.

Extend test_commit_to_dict_json_roundtrip expectations to include
bug_id_changed and existing_status, the two fields added to the
Commit dataclass to avoid refetching validation state.

Replace per-call urllib.request.urlopen with a shared
urllib3.PoolManager, so repeated Conduit API calls reuse the TCP+TLS
connection instead of paying a fresh handshake every time.

urllib3's own Retry is disabled (total=0) because we still do
method-aware retries in call(): idempotent read methods are retried,
mutations (differential.revision.edit, differential.creatediff, etc.)
are not, to avoid duplicate POSTs if a successful request lost its
response.

Benchmark (20 sequential calls, 30ms simulated RTT):
urllib.request : 1969ms, 20 TLS handshakes
urllib3 pool : 1628ms, 1 TLS handshake (1.21x speedup)

Assignee: nobody → sledru
Status: NEW → ASSIGNED
Attachment #9571184 - Attachment is obsolete: true
Attachment #9571174 - Attachment description: WIP: Bug 2032102 - Restore benchmark framework for submit/patch perf work → Bug 2032102 - Restore benchmark framework for submit/patch perf work
Attachment #9571172 - Attachment description: WIP: Bug 2032102 - submit: remove redundant has_revision_reviewers calls → Bug 2032102 - submit: remove redundant has_revision_reviewers calls
Attachment #9571173 - Attachment description: WIP: Bug 2032102 - submit: persist bug_id_changed from validation to avoid refetch → Bug 2032102 - submit: persist bug_id_changed from validation to avoid refetch
Attachment #9571175 - Attachment description: WIP: Bug 2032102 - submit: persist existing_status from validation to avoid refetch → Bug 2032102 - submit: persist existing_status from validation to avoid refetch
Attachment #9571176 - Attachment description: WIP: Bug 2032102 - submit: batch reviewer lookups across all commits → Bug 2032102 - submit: batch reviewer lookups across all commits
Attachment #9571177 - Attachment description: WIP: Bug 2032102 - submit: parallelize AI review requests → Bug 2032102 - submit: parallelize AI review requests
Attachment #9571178 - Attachment description: WIP: Bug 2032102 - patch: parallelize initial environment checks → Bug 2032102 - patch: parallelize initial environment checks
Attachment #9571179 - Attachment description: WIP: Bug 2032102 - conduit: cache diff lookups in get_diffs → Bug 2032102 - conduit: cache diff lookups in get_diffs
Attachment #9571180 - Attachment description: WIP: Bug 2032102 - conduit: add socket timeout and retry idempotent calls → Bug 2032102 - conduit: add socket timeout and retry idempotent calls
Attachment #9571181 - Attachment description: WIP: Bug 2032102 - tests: cover new Commit fields in to_dict roundtrip → Bug 2032102 - tests: cover new Commit fields in to_dict roundtrip
Attachment #9571182 - Attachment description: WIP: Bug 2032102 - conduit: pool HTTP connections via urllib3 → Bug 2032102 - conduit: pool HTTP connections via urllib3
Attachment #9571173 - Attachment is obsolete: true
Attachment #9571175 - Attachment is obsolete: true
Attachment #9571181 - Attachment is obsolete: true

The previous bench_submit/bench_patch replayed a hand-rolled sequence of
time.sleep() calls approximating the Conduit API call pattern. The
approximation ignored the real SimpleCache, so calls that are cache hits
in production (e.g. the get_revisions lookup inside edit_revision after
validate_commit_stack has preloaded the stack) were counted as full
network trips. That made the measured speedups of any change which only
eliminated a cache-hit lookup look real when they were zero on the wire.

Replace the sleep-sequence benchmarks with a shared harness
(benchmarks/conduit_harness.py) that monkey-patches ConduitAPI.call with
a function that sleeps the simulated latency and returns fake response
payloads. The real mozphab.conduit helpers, the real SimpleCache, and
the real threadpool scheduling all run. Cache hits therefore cost
nothing and only genuine network trips count toward wall time.

bench_submit now drives validate_commit_stack + the per-commit loop +
the parallel AI-review fan-out through the real helpers, reporting wall
time and per-method call counts for 1/3/5/10-commit stacks.

bench_patch drives ping + get_revisions + get_diffs + the parallel
differential.getrawdiff download, comparing sequential vs parallel raw
diff fetch -- which is a real saving because each getrawdiff is a
network round trip, not a cache hit.

Attachment #9572683 - Attachment description: WIP: Bug 2032102 - benchmarks: drive real Conduit code through mocked call() → Bug 2032102 - benchmarks: drive real Conduit code through mocked call()

When a stack has abandoned revisions, Phabricator's stackGraph can still
include links through them, leaving a predecessor with multiple
successors. convert_stackgraph_to_linear rejects that shape, and the
surrounding try/except re-raised unconditionally -- so --force couldn't
recover even though force_stack_transactions only needs the set of
remote PHIDs (not their order) to do its job.

Apply the same fallback the walk_llist branch already uses: in force
mode, log a warning and treat list(stack_graph.keys()) as the set of
remote PHIDs. That gives force_stack_transactions enough to unlink the
nonlinear graph and relink to match local.

Attachment #9572685 - Attachment description: WIP: Bug 2032102 - reorganise: let --force tolerate abandoned-revision ghost links → Bug 2032102 - reorganise: let --force tolerate abandoned-revision ghost links

Comment on attachment 9572685 [details]
Bug 2032102 - reorganise: let --force tolerate abandoned-revision ghost links

Revision D295971 was moved to bug 2034269. Setting attachment 9572685 [details] to obsolete.

Attachment #9572685 - Attachment is obsolete: true
Attachment #9571172 - Attachment is obsolete: true
Attachment #9572683 - Attachment description: Bug 2032102 - benchmarks: drive real Conduit code through mocked call() → Bug 2032102 - benchmarks: framework and harness driving real Conduit code
Attachment #9571174 - Attachment is obsolete: true
See Also: → 2035465
Status: ASSIGNED → RESOLVED
Closed: 29 days ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: