fix: npm ingest packages deadlock #4234

Merged: epipav merged 10 commits into main from fix/npm-ingest-packages-deadlock on Jun 19, 2026

@epipav epipav commented Jun 18, 2026

Collaborator

Note

Medium Risk
Touches concurrent DB write paths (repos, maintainers) and package upsert semantics; changes are targeted at deadlock/race fixes but affect all parallel npm ingest lanes.

Overview
Targets concurrent npm ingest deadlocks by making shared-repo creation lane-safe (getOrCreateRepoByUrl uses read-then-INSERT … ON CONFLICT DO NOTHING with a re-read on lost races instead of a single upsert CTE) and by sorting maintainers by username before upsert so parallel lanes acquire maintainer rows in a consistent order.
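The read-then-insert flow described above can be sketched roughly as follows. This is a minimal illustration and not the code merged in this PR: the `Db` interface, the `repos` table, its `url` and `id` columns, and the unique constraint on `url` are all assumptions made for the example.

```typescript
// Hypothetical minimal query interface; the real data-access layer is not shown on this page.
interface Db {
  // Resolves to the first row of the result, or null when there is none.
  oneOrNone<T>(sql: string, params: unknown[]): Promise<T | null>
}

interface RepoRow {
  id: string
}

// Read first, insert only on a miss, and re-read if a concurrent lane won the insert.
// Assumes a unique constraint on repos.url (table and column names are illustrative).
async function getOrCreateRepoByUrl(db: Db, url: string): Promise<string> {
  const selectSql = 'SELECT id FROM repos WHERE url = $1'

  // 1. Common case: the repo already exists, so no insert is attempted at all.
  const existing = await db.oneOrNone<RepoRow>(selectSql, [url])
  if (existing) return existing.id

  // 2. Miss: try to insert. With DO NOTHING, no row is returned if another lane got there first.
  const inserted = await db.oneOrNone<RepoRow>(
    'INSERT INTO repos (url) VALUES ($1) ON CONFLICT (url) DO NOTHING RETURNING id',
    [url],
  )
  if (inserted) return inserted.id

  // 3. Lost the race: read the row that the other lane created.
  const winner = await db.oneOrNone<RepoRow>(selectSql, [url])
  if (!winner) throw new Error(`repo row missing after conflict: ${url}`)
  return winner.id
}
```

The point of the design is that the common "already exists" case becomes a plain read, so shared repo rows are only written on a genuine miss.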

Also hardens ingest correctness:

  • Unpublished npm stubs (HTTP 200 without versions/dist-tags) are normalized in fetchPackument so packages get unpublished status.
  • MALFORMED packuments follow the fast 4xx skip path instead of endless Temporal retries.
  • Per-version licenses are flattened to text via versionLicense.
  • versions_count is not zeroed when an unpublished stub reports no versions.
  • Ingest workflow rounds per run drop from 25 to 5.
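To make the unpublished-stub handling concrete, here is a small sketch of how a 200 response without `versions` and `dist-tags` might be turned into an empty packument. The `Packument` shape, the `normalizePackument` name, and the rule that a body with only one of the two fields counts as malformed are assumptions for illustration; the actual fetchPackument code is not shown on this page.

```typescript
// Illustrative packument shape; real registry documents carry many more fields.
interface Packument {
  name: string
  versions: Record<string, unknown>
  'dist-tags': Record<string, string>
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v)
}

// Returns a usable packument, or null when the body should be treated as malformed.
function normalizePackument(body: unknown): Packument | null {
  if (!isRecord(body)) return null
  const name = body['name']
  if (typeof name !== 'string') return null

  const versions = body['versions']
  const distTags = body['dist-tags']

  // Unpublished stub: a named document with neither versions nor dist-tags.
  // Normalize it to an empty packument so downstream code can mark it unpublished.
  if (versions === undefined && distTags === undefined) {
    return { name, versions: {}, 'dist-tags': {} }
  }

  // Regular packument: both fields are present and object-shaped.
  if (isRecord(versions) && isRecord(distTags)) {
    return { name, versions, 'dist-tags': distTags as Record<string, string> }
  }

  // Anything else (assumption): treat as malformed.
  return null
}
```

A caller would then see zero versions for a stub and can decide, for example, not to overwrite a previously stored versions_count.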

Reviewed by Cursor Bugbot for commit a346d8e.

epipav added 5 commits June 9, 2026 18:25
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
Copilot AI review requested due to automatic review settings June 18, 2026 13:29
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@epipav epipav changed the title from "Fix/npm ingest packages deadlock" to "fix: npm ingest packages deadlock" on Jun 18, 2026
@github-actions

Contributor

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

@github-actions github-actions Bot left a comment

Contributor

Conventional Commits FTW!

Copilot AI left a comment

Contributor

Pull request overview

This PR targets reliability issues in the npm packages ingestion pipeline by reducing lock contention (deadlocks), improving handling of edge-case registry responses, and normalizing license data before persistence.

Changes:

  • Reworked repo “get-or-create” to avoid always attempting an insert (reduces lock contention in the common “already exists” case).
  • Enforced deterministic maintainer processing order to avoid cyclic row-lock acquisition across concurrent ingest lanes.
  • Improved npm packument handling for fully-unpublished packages and made “MALFORMED” responses non-poisoning for Temporal lanes; added per-version license normalization.
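The deterministic maintainer ordering mentioned in the list above can be illustrated with a short sketch. The `Maintainer` shape and the `sortMaintainersForUpsert` name are hypothetical; only the idea of sorting by username before the upsert comes from the PR description.

```typescript
// Illustrative maintainer shape; only username matters for ordering.
interface Maintainer {
  username: string
  email?: string
}

// Returns a new array ordered by username using a plain code-unit comparison,
// so every lane computes the same order regardless of locale settings.
function sortMaintainersForUpsert(maintainers: readonly Maintainer[]): Maintainer[] {
  return [...maintainers].sort((a, b) =>
    a.username < b.username ? -1 : a.username > b.username ? 1 : 0,
  )
}

// Two packages that share maintainers but list them in opposite order.
const laneA = sortMaintainersForUpsert([{ username: 'bob' }, { username: 'alice' }])
const laneB = sortMaintainersForUpsert([{ username: 'alice' }, { username: 'bob' }])

// Both lanes now upsert alice first and bob second.
console.log(laneA.map((m) => m.username), laneB.map((m) => m.username))
```

Without the sort, one lane could hold the row for bob while waiting for alice and the other could hold alice while waiting for bob, which is the cyclic wait that a database reports as a deadlock. With a shared order, the second lane simply waits for the first to finish.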

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| services/libs/data-access-layer/src/packages/repos.ts | Avoids insert-first pattern for shared repos; handles race by re-reading after conflict. |
| services/libs/data-access-layer/src/packages/maintainers.ts | Sorts maintainer upserts to prevent deadlocks when multiple packages share maintainers. |
| services/apps/packages_worker/src/npm/workflows.ts | Reduces ingest rounds per workflow run (more frequent continue-as-new). |
| services/apps/packages_worker/src/npm/upsertPackage.ts | Uses new versionLicense() normalization for per-version license persistence. |
| services/apps/packages_worker/src/npm/normalize.ts | Adds versionLicense() and helper for normalizing version license shapes. |
| services/apps/packages_worker/src/npm/fetchPackument.ts | Detects “unpublished stub” responses and converts them into an empty packument. |
| services/apps/packages_worker/src/npm/activities.ts | Treats MALFORMED packuments as permanent (fast-retry then skip) instead of throwing and poisoning a lane. |
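The "fast-retry then skip" behaviour summarized for activities.ts might look roughly like the decision function below. Everything here is an assumption made for illustration: the `FetchResult` kinds, the `Decision` values, the `MAX_FAST_ATTEMPTS` limit, and the choice to rethrow transient failures so that the workflow engine's own retry policy applies. The actual activity code is not shown on this page.

```typescript
// Hypothetical outcome of fetching one packument.
type FetchResult =
  | { kind: 'OK' }
  | { kind: 'CLIENT_ERROR'; status: number } // a 4xx response
  | { kind: 'MALFORMED' } // 200 response whose body is unusable
  | { kind: 'TRANSIENT'; message: string } // network error, 5xx, and similar

type Decision = 'ingest' | 'retry-fast' | 'skip'

// Hypothetical cap on quick in-activity retries for permanent-looking failures.
const MAX_FAST_ATTEMPTS = 3

// attempt is 1-based: the first try is attempt 1.
function decide(result: FetchResult, attempt: number): Decision {
  switch (result.kind) {
    case 'OK':
      return 'ingest'
    case 'CLIENT_ERROR':
    case 'MALFORMED':
      // Permanent-looking failures: retry a few times quickly, then skip the
      // package so one bad document cannot block the rest of the lane.
      return attempt < MAX_FAST_ATTEMPTS ? 'retry-fast' : 'skip'
    case 'TRANSIENT':
      // Let the caller's retry policy handle failures that may heal on their own.
      throw new Error(result.message)
  }
}
```

Grouping MALFORMED with the 4xx case is what keeps a permanently broken packument from being retried indefinitely.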

Comment thread services/libs/data-access-layer/src/packages/maintainers.ts Outdated
Comment on lines +59 to +61
// A version's license is sometimes an object ({ type, url })
// or the legacy array form ([{ type, file }, ...]). Passing those raw
export function versionLicense(raw: unknown): string | null {
Comment on lines +74 to +76
function isLicenseObject(v: unknown): v is { type?: string } {
  return typeof v === 'object' && v !== null
}
Signed-off-by: anilb <epipav@gmail.com>
Comment thread services/apps/packages_worker/src/npm/fetchPackument.ts Outdated
Comment thread services/apps/packages_worker/src/npm/upsertPackage.ts
Signed-off-by: anilb <epipav@gmail.com>
Copilot AI review requested due to automatic review settings June 19, 2026 08:47

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comment on lines +63 to +80
export function versionLicense(raw: unknown): string | null {
  if (raw == null) return null
  if (typeof raw === 'string') return raw || null
  if (Array.isArray(raw)) {
    const types = raw
      .map((l) => (typeof l === 'string' ? l : licenseType(l)))
      .filter((t): t is string => Boolean(t))
    return types.length ? types.join(' OR ') : null
  }
  return licenseType(raw)
}

// Extract a string `type` from a license object, or null if absent/non-string.
function licenseType(v: unknown): string | null {
  if (typeof v !== 'object' || v === null) return null
  const type = (v as { type?: unknown }).type
  return typeof type === 'string' ? type : null
}
Copilot AI review requested due to automatic review settings June 19, 2026 14:36

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit a346d8e.

Comment thread services/libs/data-access-layer/src/packages/packages.ts

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.

@epipav epipav merged commit a9fa630 into main Jun 19, 2026
16 checks passed
@epipav epipav deleted the fix/npm-ingest-packages-deadlock branch June 19, 2026 14:55

3 participants