fix: bq depsdev deadlock and a couple of fixes and optimizations#4238

Open
themarolt wants to merge 18 commits into main from fix/bq-depsdev-deadlock

@themarolt themarolt commented Jun 19, 2026

Summary

Fixes several issues discovered during the first full bootstrap of package_dependencies at scale (~1B rows): a deadlock risk in index management, premature job completion before indexes finished rebuilding, monitor display bugs, and missing version_constraint data for ecosystems loaded via the cheaper BQ table (option B). Also adds workflow monitoring to the deploy scripts and a one-shot dedup script for manual recovery.

Changes

  • Parallel index builds: secondary indexes on package_dependencies and versions now build concurrently instead of sequentially — cuts index rebuild time significantly on partitioned tables
  • Parallel dedup: cross-chunk duplicate removal on package_dependencies runs 8 partitions at a time instead of 1, reducing wall-clock from ~10h to ~1-2h
  • Premature done fix: full-load jobs no longer mark themselves done on the last merge chunk — isFinal: true is deferred until after index rebuild and constraint validation complete
  • 24h activity timeout: rebuildPackageDepsIndexes and rebuildPackageDepsConstraints timeout raised from 12h to 24h to cover worst-case parallel dedup on 1B+ rows
  • --fill-constraints mode: new trigger flag re-exports BQ deps data and upserts version_constraint only where NULL — for ecosystems initially loaded via option B (no version constraint)
  • dedupPackageDeps script: standalone script to manually run dedup + UNIQUE constraint rebuild on package_dependencies without re-running the full workflow
  • Monitor fixes: removed non-existent ecosystems column from query, fixed column truncation causing adjacent columns to jam, added scroll support for long job lists
  • Deploy script monitoring: deploy-staging and deploy-production now wait for the triggered GitHub Actions workflow and report success/failure
  • GO/NUGET BQ table fix: GO and NUGET deps come from ecosystem-specific tables (GoRequirementsLatest, NuGetRequirementsLatest) — they were previously missing or misconfigured
  • setJobStep activity: extracted step tracking into a reusable activity with proper timeout/retry config; added step tracking across more workflow phases
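The parallel dedup item above (8 partitions at a time) comes down to a batching loop. A minimal sketch of that pattern follows; `runInBatches` and the `processPartition` callback are illustrative names, not the helpers in this PR, and the sketch assumes each partition job returns the number of rows it deleted.

```typescript
// Hypothetical helper: runs `total` partition jobs, `concurrency` at a time.
// Each batch must finish before the next one starts.
async function runInBatches(
  total: number,
  concurrency: number,
  processPartition: (p: number) => Promise<number>,
): Promise<number> {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new Error(`concurrency must be a positive integer, got ${concurrency}`)
  }
  let deleted = 0
  for (let batch = 0; batch < total; batch += concurrency) {
    const end = Math.min(batch + concurrency, total)
    const partitions = Array.from({ length: end - batch }, (_, i) => batch + i)
    // Promise.all rejects on the first failure; jobs already in flight are not cancelled.
    const counts = await Promise.all(partitions.map((p) => processPartition(p)))
    deleted += counts.reduce((a, b) => a + b, 0)
  }
  return deleted
}
```

With 64 partitions and a concurrency of 8 this yields eight sequential batches, so at most eight partition jobs are open at any moment.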

Type of change

  • Bug fix
  • New feature
  • Performance improvement

Note

High Risk
Touches billion-row package_dependencies full loads, parallel index/dedup DDL, and ingest job lifecycle semantics; mistakes could leave constraints dropped, mark jobs done prematurely, or corrupt dependency data.

Overview
Hardens ~1B-row package_dependencies full bootstrap and related OSS packages ingest after first production-scale runs.

Deps ingestion & BQ: depsSql now builds ecosystem-array queries with UNION ALL — NPM/MAVEN/PYPI/CARGO from graph/deps tables, GO from GoRequirements*, NUGET from NuGetRequirements*, including incremental CTEs. New --fill-constraints / fillConstraints re-exports full deps and upserts version_constraint only where NULL (after option B loads). ADR-0003 updated to treat GO like NUGET for table-choice triggers.
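To make the UNION ALL shape concrete, here is a rough sketch of an ecosystem-array query builder. Only `GoRequirementsLatest` and `NuGetRequirementsLatest` are taken from the PR description; `DepsTablePlaceholder`, the column list, and the `buildDepsSql` name are assumptions for illustration and do not reflect the real depsSql output.

```typescript
type Ecosystem = 'NPM' | 'MAVEN' | 'PYPI' | 'CARGO' | 'GO' | 'NUGET'

// Source table per ecosystem. The GO and NUGET names come from the PR text;
// 'DepsTablePlaceholder' stands in for the graph/deps table, whose name is not given.
const SOURCE_TABLE: Record<Ecosystem, string> = {
  NPM: 'DepsTablePlaceholder',
  MAVEN: 'DepsTablePlaceholder',
  PYPI: 'DepsTablePlaceholder',
  CARGO: 'DepsTablePlaceholder',
  GO: 'GoRequirementsLatest',
  NUGET: 'NuGetRequirementsLatest',
}

function buildDepsSql(ecosystems: Ecosystem[]): string {
  if (ecosystems.length === 0) throw new Error('at least one ecosystem is required')
  // Group ecosystems by source table so each table appears in one SELECT only.
  const byTable: Record<string, Ecosystem[]> = {}
  for (const eco of ecosystems) {
    const table = SOURCE_TABLE[eco]
    if (!(table in byTable)) byTable[table] = []
    if (byTable[table].indexOf(eco) === -1) byTable[table].push(eco)
  }
  const selects = Object.keys(byTable).map((table) => {
    // Values come from the closed Ecosystem union, so quoting them inline is safe here.
    const list = byTable[table].map((e) => `'${e}'`).join(', ')
    // Column names are hypothetical.
    return `SELECT system, name, version FROM ${table} WHERE system IN (${list})`
  })
  return selects.join('\nUNION ALL\n')
}
```

Requesting `['NPM', 'PYPI', 'GO']` produces two SELECT branches joined by a single UNION ALL, while a single ecosystem produces no UNION at all.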

Post-load PostgreSQL: Secondary index creation runs in parallel; cross-chunk dedup uses 8-way partition batches with higher work_mem / maintenance_work_mem. Full-load jobs defer done until index/constraint rebuild finishes; rebuild activity timeouts go to 24h. Standalone dedup-package-deps script mirrors dedup + UNIQUE rebuild for manual recovery.
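The deferred-done ordering can be pictured as follows. This is a sketch under assumptions: `FullLoadSteps` and `runFullLoad` are hypothetical names, and the real workflow in this PR is a Temporal workflow with more phases than shown here.

```typescript
// Hypothetical step interface; each member stands in for one workflow activity.
interface FullLoadSteps {
  mergeChunk: (chunk: number, isFinal: boolean) => Promise<void>
  rebuildIndexes: () => Promise<void>
  validateConstraints: () => Promise<void>
  markDone: () => Promise<void>
}

async function runFullLoad(chunks: number, steps: FullLoadSteps): Promise<void> {
  for (let i = 0; i < chunks; i++) {
    // On a full load no merge chunk is final, not even the last one.
    await steps.mergeChunk(i, false)
  }
  await steps.rebuildIndexes()
  await steps.validateConstraints()
  // Only after rebuild and validation may the job report done.
  await steps.markDone()
}
```

If either rebuild step throws, `markDone` is never reached, so a failed rebuild cannot leave the job looking complete.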

Jobs & monitoring: mergeJobTableRowCounts and setJobStep write meta:step / meta:ecosystems on ingest jobs; monitorOsspckgs shows steps, stuck jobs, scroll, and ecosystem metadata. scripts/cli deploy staging/production waits on gh run watch after triggering workflows.
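One plausible reading of how `meta:` keys sit next to real row counts is a shallow merge where the patch wins per key, which is also how the PostgreSQL `jsonb || jsonb` operator treats top-level keys. The sketch below is not the PR's `mergeJobTableRowCounts`; the type names and the value union are assumptions.

```typescript
// Hypothetical value shape: numeric row counts plus string or string[] meta entries.
type RowCountValue = number | string | string[]
type TableRowCounts = Record<string, RowCountValue>

// Shallow merge: keys in `patch` overwrite keys in `existing`, everything else is kept.
function mergeTableRowCounts(existing: TableRowCounts | null, patch: TableRowCounts): TableRowCounts {
  return { ...(existing ?? {}), ...patch }
}
```

For example, merging `{ 'meta:step': 'rebuilding_indexes' }` into `{ package_dependencies: 1000 }` keeps the row count and adds the step marker, so writing a step never wipes counts that an earlier phase recorded. The step value here is made up for the example.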

Other: Scorecard repos merge uses ordered FOR UPDATE to avoid deadlocks with concurrent enrichers; ranking marks merging before rank_packages(); assorted BQ byte ceilings bumped.
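The ordered FOR UPDATE idea is that every writer acquires row locks in the same key order, so two transactions cannot each hold a row the other is waiting for. A minimal sketch of building such a lock statement follows; the `id` key column, the `$1` array parameter, and `buildOrderedLockSql` itself are assumptions rather than the SQL used in ingestScorecard.ts.

```typescript
// Only plain lowercase identifiers are accepted because they are interpolated into SQL.
const IDENT = /^[a-z_][a-z0-9_]*$/

function buildOrderedLockSql(table: string, keyColumn: string, keysParam = '$1'): string {
  for (const ident of [table, keyColumn]) {
    if (!IDENT.test(ident)) throw new Error(`unsafe identifier: ${ident}`)
  }
  return [
    `SELECT ${keyColumn}`,
    `FROM ${table}`,
    `WHERE ${keyColumn} = ANY(${keysParam})`,
    // Sorting before locking gives every transaction the same acquisition order.
    `ORDER BY ${keyColumn}`,
    `FOR UPDATE`,
  ].join('\n')
}
```

The ordering only helps if the concurrent enrichers touch rows in the same order; a writer that locks in a different order can still deadlock against this statement.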

Reviewed by Cursor Bugbot for commit 826b42e. Bugbot is set up for automated code reviews on this repo. Configure here.

themarolt added 12 commits June 17, 2026 12:06
Signed-off-by: Uroš Marolt <uros@marolt.me>
Copilot AI review requested due to automatic review settings June 19, 2026 09:57
@github-actions

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

Comment thread services/apps/packages_worker/src/deps-dev/workflows/ingestDependencies.ts Outdated
Comment thread services/apps/packages_worker/src/deps-dev/queries/depsSql.ts

Copilot AI left a comment

⚠️ Not ready to approve

There are a few concrete issues in the changed code (notably workflow-run detection in scripts/cli and a missing maintenance_work_mem application that the code comments rely on) that should be corrected before merge.

Pull request overview

This PR hardens and speeds up the deps.dev BigQuery → Postgres bootstrap pipeline at very large scale (≈1B package_dependencies rows), addressing deadlock risk, premature “done” marking, missing version_constraint backfills, and improving operational visibility (job-step tracking, monitor UI fixes, deploy script workflow monitoring).

Changes:

  • Adds step/meta tracking to ingest jobs (meta:step, meta:ecosystems) and improves the terminal monitor UX (stuck detection, scroll, column sizing, truncation fixes).
  • Improves full-load correctness + performance (parallel index builds, parallel dedup, defers isFinal: true until after rebuild/validation; adds --fill-constraints backfill mode).
  • Adds operational tooling: deploy workflow monitoring in scripts/cli and a one-shot dedup-package-deps recovery script.
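The truncation fix mentioned in the first item is essentially about every cell occupying exactly its column width. A small sketch of that invariant, assuming a hypothetical `fitColumn` helper (the monitor's actual rendering code is not shown in this conversation):

```typescript
// Returns `value` padded or truncated to exactly `width` UTF-16 code units.
// Wide or combining characters would need a display-width library instead.
function fitColumn(value: string, width: number): string {
  if (!Number.isInteger(width) || width < 1) {
    throw new Error(`width must be a positive integer, got ${width}`)
  }
  if (value.length <= width) return value.padEnd(width)
  // Reserve the last cell for an ellipsis so truncation is visible.
  return value.slice(0, width - 1) + '…'
}
```

Because the result always has the requested length, a long value in one column cannot push the next column out of alignment.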
File summaries

| File | Description |
| --- | --- |
| services/libs/data-access-layer/src/osspckgs/ingestJobs.ts | Expands tableRowCounts typing and adds helper to merge display/meta keys into table_row_counts. |
| services/apps/packages_worker/src/scripts/triggerBootstrap.ts | Adds --fill-constraints flag and passes it into the Temporal bootstrap workflow. |
| services/apps/packages_worker/src/scripts/monitorOsspckgs.ts | Enhances job list display (step labels, stuck highlighting, scrolling, truncation behavior, ecosystem extraction). |
| services/apps/packages_worker/src/scripts/exportToBucket.ts | Adjusts deps export SQL generation to use the new deps SQL API + default ecosystems. |
| services/apps/packages_worker/src/scripts/dedupPackageDeps.ts | New standalone script to dedup cross-chunk duplicates and rebuild the UNIQUE constraint on package_dependencies. |
| services/apps/packages_worker/src/scorecard/workflows/ingestScorecard.ts | Updates merge SQL to use ordered row locking to avoid deadlocks with concurrent repos updates. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestVersions.ts | Defers “done” until after index/constraint rebuild, adds step tracking, and tweaks BQ max-bytes for full loads. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestRepos.ts | Propagates ecosystems context into export activity metadata. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestPackages.ts | Propagates ecosystems context into export activity metadata. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestDependentCounts.ts | Adds step tracking before guard checks. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestDependencies.ts | Adds --fill-constraints mode, extends timeouts, defers finalization for full-load rebuild phases, and adds step tracking. |
| services/apps/packages_worker/src/deps-dev/workflows/ingestAdvisories.ts | Adjusts BQ max-bytes and propagates ecosystems context into export activity metadata. |
| services/apps/packages_worker/src/deps-dev/workflows/bootstrapOsspckgs.ts | Wires fillConstraints option through the top-level bootstrap workflow. |
| services/apps/packages_worker/src/deps-dev/queries/depsSql.ts | Refactors deps SQL generation to support GO/NUGET ecosystem-specific tables + new full/incremental builders. |
| services/apps/packages_worker/src/deps-dev/activities/setJobStep.ts | New activity to write step state into ingest job metadata (meta:step). |
| services/apps/packages_worker/src/deps-dev/activities/manageVersionsIndexes.ts | Builds secondary indexes in parallel (per-connection) and performs dedup + UNIQUE rebuild. |
| services/apps/packages_worker/src/deps-dev/activities/managePackageDepsIndexes.ts | Builds secondary indexes in parallel and runs parallel partitioned dedup to speed up UNIQUE rebuild. |
| services/apps/packages_worker/src/deps-dev/activities/index.ts | Exports the new setJobStep activity. |
| services/apps/packages_worker/src/deps-dev/activities/bqExportToGcs.ts | Adds optional ecosystems metadata to ingest-job table_row_counts for monitor display/diagnostics. |
| services/apps/packages_worker/src/criticality/activities.ts | Marks the ingest job status as merging before ranking merge. |
| services/apps/packages_worker/package.json | Adds dedup-package-deps scripts (and local variant). |
| scripts/cli | Adds deploy workflow-run monitoring via gh run watch and conclusion reporting. |
| docs/adr/README.md | Updates ADR-0003 title/summary to reflect GO + NUGET trigger condition. |
| docs/adr/0003-deps-bq-table-selection.md | Updates ADR-0003 wording to include GO as well as NUGET in the decision/trigger language. |

Copilot's findings

  • Files reviewed: 24/24 changed files
  • Comments generated: 4

Comment thread services/libs/data-access-layer/src/osspckgs/ingestJobs.ts
Comment thread scripts/cli Outdated
Comment thread scripts/cli
Copilot AI review requested due to automatic review settings June 19, 2026 11:31

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Reviewed by Cursor Bugbot for commit aac2ccd. Configure here.

Comment thread scripts/cli

Copilot AI left a comment

⚠️ Not ready to approve

The new dedup-package-deps recovery script can hang or silently no-op with invalid/missing --concurrency, and there are correctness/consistency fixes needed in the new DAL JSONB merge helper and the versions-index rebuild performance changes.
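The hang and the silent no-op are both easy to reproduce with a loop of the form `batch += concurrency`: a value of 0 never advances the loop, and `NaN` makes the loop condition false after the first pass. A defensive parser might look like the sketch below; `parseConcurrency` and the default of 8 are assumptions, not the script's current code.

```typescript
// Parses the raw --concurrency argument. Missing flag falls back to the default;
// anything that is not a positive integer is rejected up front.
function parseConcurrency(raw: string | undefined, fallback = 8): number {
  if (raw === undefined) return fallback
  const n = Number(raw)
  if (!Number.isInteger(n) || n < 1) {
    throw new Error(`--concurrency must be a positive integer, got "${raw}"`)
  }
  return n
}
```

Failing fast at argument parsing keeps a typo such as `--concurrency=abc` from turning a multi-hour recovery run into a run that does nothing.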

Copilot's findings
  • Files reviewed: 24/24 changed files
  • Comments generated: 3

Comment thread services/apps/packages_worker/src/scripts/dedupPackageDeps.ts
Comment thread services/apps/packages_worker/src/deps-dev/activities/manageVersionsIndexes.ts Outdated
Comment thread services/libs/data-access-layer/src/osspckgs/ingestJobs.ts
Copilot AI review requested due to automatic review settings June 19, 2026 12:01

Copilot AI left a comment

⚠️ Human review recommended

This PR changes core billion-row ingest/index/constraint rebuild behavior and includes at least one confirmed SQL/DB execution bug (transaction-scoped SET LOCAL) that should be fixed and revalidated before approval.
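For context on the SET LOCAL finding: in PostgreSQL, `SET LOCAL` lasts only until the end of the current transaction and has no effect outside a transaction block, so the setting and the statement that needs it must share one transaction. A sketch of a wrapper that enforces this, using hypothetical `Db` and `SqlRunner` interfaces rather than the project's real database client:

```typescript
// Minimal stand-ins for a database client; the real client API may differ.
interface SqlRunner {
  query(sql: string): Promise<void>
}
interface Db {
  tx<T>(fn: (t: SqlRunner) => Promise<T>): Promise<T>
}

type MemorySetting = 'work_mem' | 'maintenance_work_mem'

// Runs `fn` inside a transaction that first applies the setting with SET LOCAL.
async function withLocalSetting<T>(
  db: Db,
  name: MemorySetting,
  value: string,
  fn: (t: SqlRunner) => Promise<T>,
): Promise<T> {
  // The value is interpolated into SQL, so only simple size literals are accepted.
  if (!/^\d+(kB|MB|GB)$/.test(value)) throw new Error(`unexpected memory value: ${value}`)
  return db.tx(async (t) => {
    await t.query(`SET LOCAL ${name} = '${value}'`)
    return fn(t)
  })
}
```

Statements issued through `t` see the raised limit; once the transaction ends, the connection reverts to its previous value, which matters when connections are pooled.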

Copilot's findings
  • Files reviewed: 24/24 changed files
  • Comments generated: 2

Comment thread services/libs/data-access-layer/src/osspckgs/ingestJobs.ts
Copilot AI review requested due to automatic review settings June 19, 2026 12:15

Copilot AI left a comment

⚠️ Human review recommended

This PR changes multiple high-risk billion-row ingestion and Postgres maintenance paths and still has open correctness/operability concerns (job metadata consistency, fill-constraints update semantics, and memory-pressure tuning for parallel maintenance).
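The memory-pressure concern is mostly arithmetic. As a rough sketch, assuming the 8-way concurrency and the 2GB per-connection setting quoted elsewhere in this review, and a hypothetical `estimateParallelMemoryGb` helper:

```typescript
// Estimates memory the parallel phase may claim, in GB.
// `operationsPerQuery` is an assumption: work_mem applies per sort or hash operation,
// so a single query with several such operations can use a multiple of it.
function estimateParallelMemoryGb(
  concurrency: number,
  perOperationGb: number,
  operationsPerQuery = 1,
): number {
  if (!Number.isInteger(concurrency) || concurrency < 1) throw new Error('invalid concurrency')
  if (!Number.isInteger(operationsPerQuery) || operationsPerQuery < 1) {
    throw new Error('invalid operationsPerQuery')
  }
  if (!(perOperationGb > 0)) throw new Error('invalid perOperationGb')
  return concurrency * perOperationGb * operationsPerQuery
}

// True when the estimate fits inside the memory the operator is willing to give up.
function fitsBudget(estimateGb: number, budgetGb: number): boolean {
  return estimateGb <= budgetGb
}
```

Eight connections at 2GB each come to 16GB if every connection runs one sort that fills its limit, so the concurrency and the per-connection value should be chosen together against the database host's available RAM.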

Copilot's findings
  • Files reviewed: 24/24 changed files
  • Comments generated: 6

Comment on lines 227 to 237
const exportResult = await bqExportToGcs({
jobKind: 'package_dependencies',
sql,
runId: opts.runId,
syncMode: opts.syncMode,
snapshotAt: opts.today,
- maxBytesGb: 10000,
+ maxBytesGb: opts.syncMode === 'full' || isFill ? 25000 : 10000,
reuseExports: opts.reuseExports,
exportName: opts.exportName,
ecosystems: opts.ecosystems,
})
Comment on lines +194 to +197
// H7: mark exporting before we start the BQ job; store ecosystems filter in table_row_counts JSONB.
await markJobStatus(qx, jobId, 'exporting', {
...(ecosystems ? { tableRowCounts: { 'meta:ecosystems': ecosystems } } : {}),
})
Comment on lines +180 to +182
ON CONFLICT (version_id, depends_on_id, dependency_kind) DO UPDATE
SET version_constraint = EXCLUDED.version_constraint
WHERE package_dependencies.version_constraint IS NULL
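The semantics of this clause can be simulated in memory: a new key is inserted, a conflicting key is updated only while its stored constraint is NULL, and a conflicting key with a stored constraint is left alone. The sketch below assumes a simplified `DepRow` shape and a hypothetical `fillConstraints` function; it models the upsert rule, not the actual merge code.

```typescript
interface DepRow {
  versionId: number
  dependsOnId: number
  dependencyKind: string
  versionConstraint: string | null
}

// Mirrors the conflict target (version_id, depends_on_id, dependency_kind).
const keyOf = (r: DepRow): string => `${r.versionId}:${r.dependsOnId}:${r.dependencyKind}`

function fillConstraints(existing: DepRow[], incoming: DepRow[]): DepRow[] {
  const byKey: Record<string, DepRow> = {}
  for (const row of existing) byKey[keyOf(row)] = { ...row }
  for (const row of incoming) {
    const key = keyOf(row)
    if (!(key in byKey)) {
      byKey[key] = { ...row } // no conflict: plain insert
    } else if (byKey[key].versionConstraint === null) {
      byKey[key].versionConstraint = row.versionConstraint // conflict, WHERE matches
    }
    // conflict with a non-NULL stored constraint: row is left untouched
  }
  return Object.keys(byKey).map((k) => byKey[k])
}
```

One consequence worth checking against the intended behavior: a stored constraint is never corrected by a later export, because the WHERE clause only admits rows that are still NULL.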
Comment on lines +123 to +127
// Run in parallel batches of DEDUP_CONCURRENCY to cut wall-clock from ~10h to ~1-2h.
const NUM_PARTITIONS = 64
const DEDUP_CONCURRENCY = 8
let totalDedupDeleted = 0
- for (let p = 0; p < NUM_PARTITIONS; p++) {
- const result = await qx.result(`
- DELETE FROM package_dependencies pd
- USING (
- SELECT id, depends_on_id
- FROM (
- SELECT id, depends_on_id,
- ROW_NUMBER() OVER (
- PARTITION BY version_id, depends_on_id, dependency_kind
- ORDER BY id
- ) AS rn
- FROM package_dependencies
- WHERE depends_on_id % ${NUM_PARTITIONS} = ${p}
- ) sub
- WHERE rn > 1
- ) dupes
- WHERE pd.id = dupes.id AND pd.depends_on_id = dupes.depends_on_id
- `)
- totalDedupDeleted += result
+ for (let batch = 0; batch < NUM_PARTITIONS; batch += DEDUP_CONCURRENCY) {
Comment on lines +94 to +103
// Build indexes in parallel — each on its own connection so they run concurrently.
// maintenance_work_mem per connection: with 32 partitions and default 64MB, PG spills to
// disk on every partition; 2GB lets the sort fit in RAM and cuts build time dramatically.
await Promise.all(
toRebuild.map(async (idx) => {
const conn = await getPackagesDb()
await conn.tx(async (t) => {
await t.result(`SET LOCAL maintenance_work_mem = '2GB'`)
log.info({ columns: idx.columns }, 'Creating index on versions')
await t.result(idx.createSql)
Comment on lines +23 to +28
async function processPartition(p: number, dryRun: boolean): Promise<number> {
log.info({ partition: p }, dryRun ? 'counting partition' : 'deduping partition')
const conn = await getPackagesDb()
const count = await conn.tx(async (tx) => {
await tx.result(`SET LOCAL work_mem = '2GB'`)
if (dryRun) {
2 participants