fix: npm ingest packages deadlock#4234
Conversation
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
|
|
|
Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability. Example:
Projects:
Please add a Jira issue key to your PR title. |
There was a problem hiding this comment.
Pull request overview
This PR targets reliability issues in the npm packages ingestion pipeline by reducing lock contention (deadlocks), improving handling of edge-case registry responses, and normalizing license data before persistence.
Changes:
- Reworked repo “get-or-create” to avoid always attempting an insert (reduces lock contention in the common “already exists” case).
- Enforced deterministic maintainer processing order to avoid cyclic row-lock acquisition across concurrent ingest lanes.
- Improved npm packument handling for fully-unpublished packages and made “MALFORMED” responses non-poisoning for Temporal lanes; added per-version license normalization.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| services/libs/data-access-layer/src/packages/repos.ts | Avoids insert-first pattern for shared repos; handles race by re-reading after conflict. |
| services/libs/data-access-layer/src/packages/maintainers.ts | Sorts maintainer upserts to prevent deadlocks when multiple packages share maintainers. |
| services/apps/packages_worker/src/npm/workflows.ts | Reduces ingest rounds per workflow run (more frequent continue-as-new). |
| services/apps/packages_worker/src/npm/upsertPackage.ts | Uses new versionLicense() normalization for per-version license persistence. |
| services/apps/packages_worker/src/npm/normalize.ts | Adds versionLicense() and helper for normalizing version license shapes. |
| services/apps/packages_worker/src/npm/fetchPackument.ts | Detects “unpublished stub” responses and converts them into an empty packument. |
| services/apps/packages_worker/src/npm/activities.ts | Treats MALFORMED packuments as permanent (fast-retry then skip) instead of throwing and poisoning a lane. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // A version's license is sometimes an object ({ type, url }) | ||
| // or the legacy array form ([{ type, file }, ...]). Passing those raw | ||
| export function versionLicense(raw: unknown): string | null { |
| function isLicenseObject(v: unknown): v is { type?: string } { | ||
| return typeof v === 'object' && v !== null | ||
| } |
Signed-off-by: anilb <epipav@gmail.com>
Signed-off-by: anilb <epipav@gmail.com>
| export function versionLicense(raw: unknown): string | null { | ||
| if (raw == null) return null | ||
| if (typeof raw === 'string') return raw || null | ||
| if (Array.isArray(raw)) { | ||
| const types = raw | ||
| .map((l) => (typeof l === 'string' ? l : licenseType(l))) | ||
| .filter((t): t is string => Boolean(t)) | ||
| return types.length ? types.join(' OR ') : null | ||
| } | ||
| return licenseType(raw) | ||
| } | ||
|
|
||
| // Extract a string `type` from a license object, or null if absent/non-string. | ||
| function licenseType(v: unknown): string | null { | ||
| if (typeof v !== 'object' || v === null) return null | ||
| const type = (v as { type?: unknown }).type | ||
| return typeof type === 'string' ? type : null | ||
| } |
Signed-off-by: anilb <epipav@gmail.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit a346d8e. Configure here.

Note
Medium Risk
Touches concurrent DB write paths (repos, maintainers) and package upsert semantics; changes are targeted at deadlock/race fixes but affect all parallel npm ingest lanes.
Overview
Targets concurrent npm ingest deadlocks by making shared-repo creation lane-safe (
getOrCreateRepoByUrluses read-then-INSERT … ON CONFLICT DO NOTHINGwith a re-read on lost races instead of a single upsert CTE) and by sorting maintainers by username before upsert so parallel lanes acquire maintainer rows in a consistent order.Also hardens ingest correctness: unpublished npm stubs (HTTP 200 without
versions/dist-tags) are normalized infetchPackumentso packages getunpublishedstatus;MALFORMEDpackuments follow the fast 4xx skip path instead of endless Temporal retries; per-version licenses are flattened to text viaversionLicense;versions_countis not zeroed when an unpublished stub reports no versions; and ingest workflow rounds per run drop from 25 to 5.Reviewed by Cursor Bugbot for commit a346d8e. Bugbot is set up for automated code reviews on this repo. Configure here.