fix: bq depsdev deadlock and a couple of fixes and optimizations #4238
themarolt wants to merge 18 commits into
Signed-off-by: Uroš Marolt <uros@marolt.me>
Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.
Please add a Jira issue key to your PR title.
⚠️ Not ready to approve
There are a few concrete issues in the changed code (notably workflow-run detection in scripts/cli and a missing maintenance_work_mem application that the code comments rely on) that should be corrected before merge.
Pull request overview
This PR hardens and speeds up the deps.dev BigQuery → Postgres bootstrap pipeline at very large scale (≈1B package_dependencies rows), addressing deadlock risk, premature “done” marking, missing version_constraint backfills, and improving operational visibility (job-step tracking, monitor UI fixes, deploy script workflow monitoring).
Changes:
- Adds step/meta tracking to ingest jobs (`meta:step`, `meta:ecosystems`) and improves the terminal monitor UX (stuck detection, scroll, column sizing, truncation fixes).
- Improves full-load correctness + performance (parallel index builds, parallel dedup, defers `isFinal: true` until after rebuild/validation; adds `--fill-constraints` backfill mode).
- Adds operational tooling: deploy workflow monitoring in `scripts/cli` and a one-shot `dedup-package-deps` recovery script.
File summaries
| File | Description |
|---|---|
| services/libs/data-access-layer/src/osspckgs/ingestJobs.ts | Expands tableRowCounts typing and adds helper to merge display/meta keys into table_row_counts. |
| services/apps/packages_worker/src/scripts/triggerBootstrap.ts | Adds --fill-constraints flag and passes it into the Temporal bootstrap workflow. |
| services/apps/packages_worker/src/scripts/monitorOsspckgs.ts | Enhances job list display (step labels, stuck highlighting, scrolling, truncation behavior, ecosystem extraction). |
| services/apps/packages_worker/src/scripts/exportToBucket.ts | Adjusts deps export SQL generation to use the new deps SQL API + default ecosystems. |
| services/apps/packages_worker/src/scripts/dedupPackageDeps.ts | New standalone script to dedup cross-chunk duplicates and rebuild the UNIQUE constraint on package_dependencies. |
| services/apps/packages_worker/src/scorecard/workflows/ingestScorecard.ts | Updates merge SQL to use ordered row locking to avoid deadlocks with concurrent repos updates. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestVersions.ts | Defers “done” until after index/constraint rebuild, adds step tracking, and tweaks BQ max-bytes for full loads. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestRepos.ts | Propagates ecosystems context into export activity metadata. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestPackages.ts | Propagates ecosystems context into export activity metadata. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestDependentCounts.ts | Adds step tracking before guard checks. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestDependencies.ts | Adds --fill-constraints mode, extends timeouts, defers finalization for full-load rebuild phases, and adds step tracking. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestAdvisories.ts | Adjusts BQ max-bytes and propagates ecosystems context into export activity metadata. |
| services/apps/packages_worker/src/deps-dev/workflows/bootstrapOsspckgs.ts | Wires fillConstraints option through the top-level bootstrap workflow. |
| services/apps/packages_worker/src/deps-dev/queries/depsSql.ts | Refactors deps SQL generation to support GO/NUGET ecosystem-specific tables + new full/incremental builders. |
| services/apps/packages_worker/src/deps-dev/activities/setJobStep.ts | New activity to write step state into ingest job metadata (meta:step). |
| services/apps/packages_worker/src/deps-dev/activities/manageVersionsIndexes.ts | Builds secondary indexes in parallel (per-connection) and performs dedup + UNIQUE rebuild. |
| services/apps/packages_worker/src/deps-dev/activities/managePackageDepsIndexes.ts | Builds secondary indexes in parallel and runs parallel partitioned dedup to speed up UNIQUE rebuild. |
| services/apps/packages_worker/src/deps-dev/activities/index.ts | Exports the new setJobStep activity. |
| services/apps/packages_worker/src/deps-dev/activities/bqExportToGcs.ts | Adds optional ecosystems metadata to ingest-job table_row_counts for monitor display/diagnostics. |
| services/apps/packages_worker/src/criticality/activities.ts | Marks the ingest job status as merging before ranking merge. |
| services/apps/packages_worker/package.json | Adds dedup-package-deps scripts (and local variant). |
| scripts/cli | Adds deploy workflow-run monitoring via gh run watch and conclusion reporting. |
| docs/adr/README.md | Updates ADR-0003 title/summary to reflect GO + NUGET trigger condition. |
| docs/adr/0003-deps-bq-table-selection.md | Updates ADR-0003 wording to include GO as well as NUGET in the decision/trigger language. |
Copilot's findings
- Files reviewed: 24/24 changed files
- Comments generated: 4
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 4 total unresolved issues (including 3 from previous reviews).
Reviewed by Cursor Bugbot for commit aac2ccd.
⚠️ Not ready to approve
The new dedup-package-deps recovery script can hang or silently no-op with invalid/missing --concurrency, and there are correctness/consistency fixes needed in the new DAL JSONB merge helper and the versions-index rebuild performance changes.
Copilot's findings
- Files reviewed: 24/24 changed files
- Comments generated: 3
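The `--concurrency` concern above can be illustrated with a minimal sketch of strict flag parsing. This is not the script's actual code: `parseConcurrency`, `DEFAULT_CONCURRENCY`, and `MAX_CONCURRENCY` are hypothetical names, and the default of 8 is borrowed from the `DEDUP_CONCURRENCY` value quoted elsewhere in this PR. The point is that a loop stepping by the parsed value never advances when the value is 0 and never runs when the value is `NaN`, so both are rejected up front.

```typescript
// Hypothetical helper; names and limits are assumptions, not taken from the PR.
const DEFAULT_CONCURRENCY = 8
const MAX_CONCURRENCY = 64

function parseConcurrency(argv: string[]): number {
  const i = argv.indexOf('--concurrency')
  if (i === -1) return DEFAULT_CONCURRENCY // flag absent: fall back to the default

  const raw: string | undefined = argv[i + 1]
  // Reject a missing value and anything that is not a plain non-negative integer.
  if (raw === undefined || !/^\d+$/.test(raw)) {
    throw new Error(`--concurrency expects a positive integer, got: ${String(raw)}`)
  }

  const n = Number(raw)
  // 0 would stall a `batch += n` loop; very large values would open too many connections.
  if (n < 1 || n > MAX_CONCURRENCY) {
    throw new Error(`--concurrency must be between 1 and ${MAX_CONCURRENCY}, got: ${n}`)
  }
  return n
}
```

Failing fast with an explicit error is a design choice here; silently clamping to the default would also avoid the hang but hides operator mistakes during a manual recovery run.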
⚠️ Human review recommended
It changes core billion-row ingest/index/constraint rebuild behavior and includes at least one confirmed SQL/DB execution bug (transaction-scoped SET LOCAL) that should be fixed and revalidated before approval.
Copilot's findings
- Files reviewed: 24/24 changed files
- Comments generated: 2
⚠️ Human review recommended
It changes multiple high-risk billion-row ingestion and Postgres maintenance paths and still has open correctness/operability concerns (job metadata consistency, fill-constraints update semantics, and memory-pressure tuning for parallel maintenance).
Copilot's findings
- Files reviewed: 24/24 changed files
- Comments generated: 6
```diff
 const exportResult = await bqExportToGcs({
   jobKind: 'package_dependencies',
   sql,
   runId: opts.runId,
   syncMode: opts.syncMode,
   snapshotAt: opts.today,
-  maxBytesGb: 10000,
+  maxBytesGb: opts.syncMode === 'full' || isFill ? 25000 : 10000,
   reuseExports: opts.reuseExports,
   exportName: opts.exportName,
   ecosystems: opts.ecosystems,
 })
```
```typescript
// H7: mark exporting before we start the BQ job; store ecosystems filter in table_row_counts JSONB.
await markJobStatus(qx, jobId, 'exporting', {
  ...(ecosystems ? { tableRowCounts: { 'meta:ecosystems': ecosystems } } : {}),
})
```
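The snippet above stores a `meta:` key next to the per-table row counts. A minimal sketch of the merge behavior one would expect from such a helper follows; `TableRowCounts` and `mergeRowCounts` are hypothetical names, and the value types are an assumption. It models a shallow merge where the patch wins on key collisions, which is also how the PostgreSQL `jsonb || jsonb` operator behaves for top-level keys.

```typescript
// Assumed shape: numeric row counts plus free-form "meta:" entries.
type TableRowCounts = Record<string, number | string | string[]>

// Shallow merge: keys in `patch` overwrite same-named keys, all other keys survive.
function mergeRowCounts(existing: TableRowCounts | null, patch: TableRowCounts): TableRowCounts {
  return { ...(existing ?? {}), ...patch }
}

// Usage: adding metadata must not drop counts that were already recorded.
const merged = mergeRowCounts(
  { package_dependencies: 1000 },
  { 'meta:ecosystems': ['GO', 'NUGET'] },
)
```

Writing the whole object instead of merging would erase earlier keys, which is one way job metadata can become inconsistent between steps.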
```sql
ON CONFLICT (version_id, depends_on_id, dependency_kind) DO UPDATE
SET version_constraint = EXCLUDED.version_constraint
WHERE package_dependencies.version_constraint IS NULL
```
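To make the semantics of the clause above concrete, here is a small in-memory model, offered as a sketch only. `DepRow`, `keyOf`, and `upsertFillNull` are hypothetical names; a `Map` stands in for the table and its UNIQUE key. A new key is inserted, a conflicting row with a NULL constraint is filled in, and a conflicting row that already has a constraint is left untouched because the `WHERE` on `DO UPDATE` filters it out.

```typescript
interface DepRow {
  versionId: number
  dependsOnId: number
  dependencyKind: string
  versionConstraint: string | null
}

// Mirrors the conflict target (version_id, depends_on_id, dependency_kind).
function keyOf(r: DepRow): string {
  return `${r.versionId}|${r.dependsOnId}|${r.dependencyKind}`
}

function upsertFillNull(table: Map<string, DepRow>, incoming: DepRow): void {
  const existing = table.get(keyOf(incoming))
  if (existing === undefined) {
    table.set(keyOf(incoming), { ...incoming }) // no conflict: plain insert
    return
  }
  // Conflict: update only when the stored constraint is NULL.
  if (existing.versionConstraint === null) {
    existing.versionConstraint = incoming.versionConstraint
  }
}
```

Under this model a backfill can be re-run safely, but it never corrects a constraint that is present and wrong.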
```typescript
// Run in parallel batches of DEDUP_CONCURRENCY to cut wall-clock from ~10h to ~1-2h.
const NUM_PARTITIONS = 64
const DEDUP_CONCURRENCY = 8
let totalDedupDeleted = 0
for (let p = 0; p < NUM_PARTITIONS; p++) {
  const result = await qx.result(`
    DELETE FROM package_dependencies pd
    USING (
      SELECT id, depends_on_id
      FROM (
        SELECT id, depends_on_id,
          ROW_NUMBER() OVER (
            PARTITION BY version_id, depends_on_id, dependency_kind
            ORDER BY id
          ) AS rn
        FROM package_dependencies
        WHERE depends_on_id % ${NUM_PARTITIONS} = ${p}
      ) sub
      WHERE rn > 1
    ) dupes
    WHERE pd.id = dupes.id AND pd.depends_on_id = dupes.depends_on_id
  `)
  totalDedupDeleted += result
for (let batch = 0; batch < NUM_PARTITIONS; batch += DEDUP_CONCURRENCY) {
```
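The batching pattern that the last line of this hunk begins can be sketched independently of the database. This is a minimal illustration, not the PR's implementation: `runInBatches` is a hypothetical name, and `processPartition` is passed in as a callback that returns the number of rows deleted for one partition.

```typescript
async function runInBatches(
  numPartitions: number,
  concurrency: number,
  processPartition: (p: number) => Promise<number>,
): Promise<number> {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new Error('concurrency must be a positive integer')
  }
  let total = 0
  for (let batch = 0; batch < numPartitions; batch += concurrency) {
    // The last batch may be shorter when numPartitions is not a multiple of concurrency.
    const end = Math.min(batch + concurrency, numPartitions)
    const partitions: number[] = []
    for (let p = batch; p < end; p++) partitions.push(p)

    // At most `concurrency` partitions are in flight; the next batch starts
    // only after every partition in this one has settled.
    const deleted = await Promise.all(partitions.map((p) => processPartition(p)))
    total += deleted.reduce((sum, n) => sum + n, 0)
  }
  return total
}
```

Because each duplicate group shares one `depends_on_id`, all of its rows fall in the same `depends_on_id % NUM_PARTITIONS` bucket, so concurrent partitions delete disjoint sets of rows. A fixed-batch loop is simpler than a work-stealing pool, at the cost of waiting for the slowest partition in each batch.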
```typescript
// Build indexes in parallel, each on its own connection so they run concurrently.
// maintenance_work_mem per connection: with 32 partitions and default 64MB, PG spills to
// disk on every partition; 2GB lets the sort fit in RAM and cuts build time dramatically.
await Promise.all(
  toRebuild.map(async (idx) => {
    const conn = await getPackagesDb()
    await conn.tx(async (t) => {
      await t.result(`SET LOCAL maintenance_work_mem = '2GB'`)
      log.info({ columns: idx.columns }, 'Creating index on versions')
      await t.result(idx.createSql)
```
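Since every parallel build gets its own `maintenance_work_mem`, the total can grow with the number of indexes being rebuilt, which is the memory-pressure concern raised in the review. One possible approach, sketched under assumptions, is to derive the per-connection value from a fixed budget. `perBuildMaintenanceWorkMemMb` and `setLocalStatement` are hypothetical helpers, and the budget figure is something an operator would have to choose; only the 64MB floor (the PostgreSQL default) and the 2GB cap come from the comment above.

```typescript
// Split a total memory budget across parallel builds, clamped to [64MB, capMb].
function perBuildMaintenanceWorkMemMb(budgetMb: number, parallelBuilds: number, capMb = 2048): number {
  if (!Number.isInteger(parallelBuilds) || parallelBuilds < 1) {
    throw new Error('parallelBuilds must be a positive integer')
  }
  const share = Math.floor(budgetMb / parallelBuilds)
  return Math.max(64, Math.min(capMb, share))
}

// SET LOCAL only lasts until the end of the current transaction,
// so this statement has to run inside the same transaction as the index build.
function setLocalStatement(mb: number): string {
  return `SET LOCAL maintenance_work_mem = '${mb}MB'`
}
```

For example, an 8192MB budget with 4 builds keeps the 2GB setting, while the same budget with 8 builds drops each connection to 1024MB.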
```typescript
async function processPartition(p: number, dryRun: boolean): Promise<number> {
  log.info({ partition: p }, dryRun ? 'counting partition' : 'deduping partition')
  const conn = await getPackagesDb()
  const count = await conn.tx(async (tx) => {
    await tx.result(`SET LOCAL work_mem = '2GB'`)
    if (dryRun) {
```

Summary
Fixes several issues discovered during the first full bootstrap of package_dependencies at scale (~1B rows): a deadlock risk in index management, premature job completion before indexes finished rebuilding, monitor display bugs, and missing version_constraint data for ecosystems loaded via the cheaper BQ table (option B). Also adds workflow monitoring to the deploy scripts and a one-shot dedup script for manual recovery.
Changes
Type of change
Note
High Risk
Touches billion-row `package_dependencies` full loads, parallel index/dedup DDL, and ingest job lifecycle semantics; mistakes could leave constraints dropped, mark jobs done prematurely, or corrupt dependency data.

Overview
Hardens ~1B-row `package_dependencies` full bootstrap and related OSS packages ingest after first production-scale runs.

Deps ingestion & BQ: `depsSql` now builds ecosystem-array queries with UNION ALL: NPM/MAVEN/PYPI/CARGO from graph/deps tables, GO from `GoRequirements*`, NUGET from `NuGetRequirements*`, including incremental CTEs. New `--fill-constraints` / `fillConstraints` re-exports full deps and upserts `version_constraint` only where NULL (after option B loads). ADR-0003 updated to treat GO like NUGET for table-choice triggers.

Post-load PostgreSQL: Secondary index creation runs in parallel; cross-chunk dedup uses 8-way partition batches with higher `work_mem` / `maintenance_work_mem`. Full-load jobs defer `done` until index/constraint rebuild finishes; rebuild activity timeouts go to 24h. Standalone `dedup-package-deps` script mirrors dedup + UNIQUE rebuild for manual recovery.

Jobs & monitoring: `mergeJobTableRowCounts` and `setJobStep` write `meta:step` / `meta:ecosystems` on ingest jobs; `monitorOsspckgs` shows steps, stuck jobs, scroll, and ecosystem metadata. `scripts/cli` deploy staging/production waits on `gh run watch` after triggering workflows.

Other: Scorecard repos merge uses ordered `FOR UPDATE` to avoid deadlocks with concurrent enrichers; ranking marks `merging` before `rank_packages()`; assorted BQ byte ceilings bumped.

Reviewed by Cursor Bugbot for commit 826b42e.
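The ordered-locking idea mentioned under "Other" can be sketched as follows. This is an illustration of the general technique, not the PR's actual merge SQL: the `repos` table name is taken from the file summary, while the `id` column, the `$1` array parameter, and both helper names are assumptions. When every writer locks its rows in the same sorted order, two transactions cannot each hold a row the other is waiting for.

```typescript
// Lock the target rows in ascending id order before updating them.
// Table and column names are assumed for illustration.
function orderedLockSql(): string {
  return 'SELECT id FROM repos WHERE id = ANY($1) ORDER BY id FOR UPDATE'
}

// Deduplicate and sort the ids so the parameter is deterministic as well.
function lockParams(ids: number[]): number[] {
  return [...new Set(ids)].sort((a, b) => a - b)
}
```

The subsequent update then runs in the same transaction, after the locks are held.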