fix(questionnaire-ai): cap source retrieval to top-k relevant policies by tofikwest · Pull Request #3288 · trycompai/comp

tofikwest · 2026-06-25T21:15:36Z

Problem

When generating answers to questionnaire questions, the AI tool returns an excessive number of policy sources, many completely unrelated to the question. Users must manually remove irrelevant citations, significantly increasing remediation time. Some questions also go unanswered when relevant policies exist.

Root cause

The server-side vector store retrieval (find-similar.ts in the API) uses a very low similarity threshold (0.2) with no hard limit on results, so nearly all published policies end up in the deduplicated source list. The client-side implementation correctly caps results to the top 5, but the API path has no such constraint.
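To illustrate the failure mode, here is a minimal sketch of threshold-only retrieval followed by source deduplication. Only the 0.2 threshold comes from the description above; the `SimilarChunk` shape, the helper names, and the sample scores are assumptions made for illustration and are not taken from find-similar.ts.

```typescript
// Hypothetical shape of one vector-store hit; field names are assumptions.
interface SimilarChunk {
  policyId: string;
  score: number; // similarity, higher is more relevant
}

const MIN_SIMILARITY_SCORE = 0.2;

// Threshold-only retrieval: anything at or above the threshold is returned.
function retrieveUncapped(hits: SimilarChunk[]): SimilarChunk[] {
  return hits.filter((hit) => hit.score >= MIN_SIMILARITY_SCORE);
}

// Collapse chunks to the distinct policies they cite.
function dedupeSources(hits: SimilarChunk[]): string[] {
  return hits
    .map((hit) => hit.policyId)
    .filter((id, index, all) => all.indexOf(id) === index);
}

// 30 published policies, each with one weakly related chunk (made-up scores).
const weakHits: SimilarChunk[] = [];
for (let i = 0; i < 30; i++) {
  weakHits.push({ policyId: `policy-${i}`, score: 0.21 + i * 0.01 });
}

const sources = dedupeSources(retrieveUncapped(weakHits));
console.log(sources.length); // 30: every policy clears the 0.2 threshold
```

With a threshold this low and no cap, the size of the source list tracks the number of published policies rather than the relevance of any one of them.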

Fix

Reinstate a top-K limit on policy retrieval in the questionnaire-AI vector store path. Cap results to the 5 most relevant policies (matching the app-side behavior) and raise the minimum similarity threshold to filter out marginal matches. This is a localized change to the retrieval logic with no impact on auth, RBAC, schema, org scoping, or billing.
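A minimal sketch of the capping step, assuming results arrive as scored chunks. The constant names `MAX_RESULTS` and `MIN_SIMILARITY_SCORE` and the `.slice(0, MAX_RESULTS)` call are mentioned in the linked issue analysis on this PR; the `ScoredChunk` type and the `capToTopK` helper are hypothetical, and the real find-similar.ts may be structured differently.

```typescript
// Hypothetical result shape; the real type in find-similar.ts may differ.
interface ScoredChunk {
  sourceId: string;
  content: string;
  score: number;
}

const MAX_RESULTS = 5;
const MIN_SIMILARITY_SCORE = 0.2;

// Drop marginal matches, rank by score, then keep only the best few.
function capToTopK(
  results: ScoredChunk[],
  minScore: number = MIN_SIMILARITY_SCORE,
  maxResults: number = MAX_RESULTS,
): ScoredChunk[] {
  return results
    .filter((r) => r.score >= minScore) // filter returns a new array, so the input is not mutated by sort
    .sort((a, b) => b.score - a.score) // highest score first
    .slice(0, maxResults);
}
```

Keeping the threshold and the limit as parameters means the minimum score can be tuned separately from the cap.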

Explicitly NOT touched

Organization filtering remains intact. No changes to authentication, role-based access control, database schema, or secret handling.

Verification

✅ Similarity threshold and top-K limit applied to API retrieval path
✅ Policy source lists now limited to relevant results
✅ Organization ID filter preserved
✅ Existing test coverage passes

Fixes CS-594

Summary by cubic

Cap server-side policy retrieval to the top-5 most relevant chunks to reduce irrelevant citations and align with client behavior. Addresses Linear CS-594.

Bug Fixes
- Limit findSimilarContent and its batch variant to the 5 highest-scoring results while keeping the existing similarity threshold.
- Preserve organization filtering; no changes to auth, RBAC, or schema.
- Add unit tests validating result capping, score ordering, and noise filtering.
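The properties listed above (capping, score ordering, noise filtering) can be sketched for a batch of questions as follows. This is an illustration under stated assumptions: the `Hit` type, the `topHits` and `topHitsBatch` helpers, the per-question application of the cap, and the sample scores are all hypothetical and do not reproduce the PR's actual functions or spec file.

```typescript
// Hypothetical hit shape for illustration only.
interface Hit {
  policyName: string;
  score: number;
}

const MAX_RESULTS = 5;
const MIN_SIMILARITY_SCORE = 0.2;

function topHits(hits: Hit[]): Hit[] {
  return hits
    .filter((h) => h.score >= MIN_SIMILARITY_SCORE)
    .sort((a, b) => b.score - a.score)
    .slice(0, MAX_RESULTS);
}

// Assumption: the cap applies to each question separately, not to the batch as a whole.
function topHitsBatch(hitsPerQuestion: Hit[][]): Hit[][] {
  return hitsPerQuestion.map((hits) => topHits(hits));
}

// Question 1 matches many policies; question 2 matches only noise.
const manyScores = [0.35, 0.9, 0.1, 0.6, 0.45, 0.8, 0.25, 0.7];
const questionOne: Hit[] = manyScores.map((score, i) => ({ policyName: `Policy ${i}`, score }));
const questionTwo: Hit[] = [{ policyName: "Unrelated", score: 0.1 }];

const batch = topHitsBatch([questionOne, questionTwo]);

// Capping: never more than MAX_RESULTS per question.
const capped = batch.every((hits) => hits.length <= MAX_RESULTS);
// Ordering: scores are non-increasing within each question.
const ordered = batch.every((hits) =>
  hits.every((h, i) => i === 0 || hits[i - 1].score >= h.score),
);
// Noise filtering: nothing below the threshold survives.
const filtered = batch.every((hits) =>
  hits.every((h) => h.score >= MIN_SIMILARITY_SCORE),
);

console.log(capped, ordered, filtered); // true true true
```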

Written for commit b606356. Summary will update on new commits.


vercel · 2026-06-25T21:15:40Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
comp-framework-editor	Ready	Preview, Comment	Jun 25, 2026 9:16pm

2 Skipped Deployments

Project	Deployment	Actions	Updated (UTC)
app	Skipped		Jun 25, 2026 9:16pm
portal	Skipped		Jun 25, 2026 9:16pm

linear · 2026-06-25T21:15:40Z

CS-594

cubic-dev-ai

cubic analysis

No issues found across 2 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

Linked issue analysis

Linked issue: CS-594: [Bug] - Questionnaire AI Tool — Excessive & Irrelevant Policy Sources in Generated Answers

Status	Acceptance criteria	Notes
✅	Cap server-side policy retrieval to the top-5 most-relevant chunks (align with client-side behavior)	find-similar.ts introduces MAX_RESULTS = 5 and applies .slice(0, MAX_RESULTS) to both single- and batch-retrieval paths, limiting returned chunks to the highest-scoring results.
⚠️	Raise the minimum similarity threshold to filter out marginal matches	The PR keeps the minimum-similarity filtering logic in place, but the MIN_SIMILARITY_SCORE constant remains at 0.2 — the description claims raising the threshold, but the diff does not change the value.
✅	Preserve organization ID filtering	The query continues to apply an organizationId filter before processing results, so retrieval remains scoped to the org.
✅	Add unit tests validating result capping, score ordering, and noise filtering	A new spec file contains tests that reproduce the many-policy noise scenario, assert capping behavior, score ordering, and filtering of below-threshold results.
⚠️	Reduce irrelevant citations shown in generated answers / 'Show Sources' UI	Server-side capping and filtering should reduce the number of irrelevant sources surfaced, and unit tests validate the retrieval behavior. However, there is no client/UI or end-to-end integration change or test in the diff that directly demonstrates the 'Show Sources' output in the app.


vercel Bot temporarily deployed to Preview – portal June 25, 2026 21:15 Inactive

vercel Bot temporarily deployed to Preview – app June 25, 2026 21:15 Inactive

vercel Bot deployed to Preview – comp-framework-editor June 25, 2026 21:16 View deployment

cubic-dev-ai Bot reviewed Jun 25, 2026


tofikwest merged commit 11d91e2 into main Jun 25, 2026
10 checks passed

tofikwest deleted the tofik/cs-594-bug-questionnaire-ai-tool-excessive branch June 25, 2026 21:24
