fix(fts): enforce required terms for and queries by BubbleCal · Pull Request #7385 · lance-format/lance

BubbleCal · 2026-06-22T05:59:23Z

Bug Fix

What is the bug?

FTS AND queries could return matches from a partition that contained only a subset of the required query terms. For fuzzy AND, expansions were also flattened without preserving their original query-position grouping. As a result, a required position with no match could go undetected, and expansions from the same position could be scored incorrectly.
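To make the first failure mode concrete, here is a minimal Rust sketch contrasting the two behaviors on a single partition. It assumes a hypothetical partition model (a `HashMap` from term to sorted row IDs) and uses plain intersection instead of WAND. The names `Partition`, `and_match_skip_missing` and `and_match_required` are illustrative only, not Lance APIs.

```rust
use std::collections::HashMap;

/// Hypothetical partition model: term -> sorted row IDs (not Lance's posting-list type).
type Partition = HashMap<String, Vec<u32>>;

/// Rows present in every posting list; no lists means no rows.
fn intersect(lists: &[&[u32]]) -> Vec<u32> {
    let Some((first, rest)) = lists.split_first() else {
        return Vec::new();
    };
    first
        .iter()
        .copied()
        .filter(|row| rest.iter().all(|list| list.binary_search(row).is_ok()))
        .collect()
}

/// Buggy shape: terms missing from the partition are dropped before matching,
/// so the AND silently degrades to whichever terms happen to exist.
fn and_match_skip_missing(part: &Partition, query: &[&str]) -> Vec<u32> {
    let lists: Vec<&[u32]> = query
        .iter()
        .filter_map(|term| part.get(*term).map(Vec::as_slice))
        .collect();
    intersect(&lists)
}

/// Fixed shape: a required term with no posting list means the partition cannot match.
fn and_match_required(part: &Partition, query: &[&str]) -> Vec<u32> {
    let mut lists = Vec::with_capacity(query.len());
    for term in query {
        match part.get(*term) {
            Some(list) => lists.push(list.as_slice()),
            None => return Vec::new(),
        }
    }
    intersect(&lists)
}
```

With only `alpha` indexed in the partition, the first function returns rows 1 and 2 for `alpha AND beta`, while the second returns nothing.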

What issues or incorrect behavior does the bug cause?

A query such as alpha AND beta could return rows from a partition that only had alpha because the missing term was skipped before WAND saw the query. Fuzzy AND could also treat expansions from one original position as separate required terms, or score grouped expansions using the wrong token IDF, which could affect top-k ordering.
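The scoring half of the problem can be illustrated with IDF alone. This sketch assumes the common Lucene-style BM25 IDF formula and a hypothetical document-frequency table; it is not Lance's scorer, and term-frequency and length normalization are left out. It shows that when a row actually matched the rarer expansion `alphas`, scoring that position with the IDF of a different same-position token (`alpha`) understates the row's score.

```rust
use std::collections::HashMap;

/// Lucene-style BM25 IDF; assumed for illustration, not taken from Lance.
fn idf(num_docs: u64, doc_freq: u64) -> f64 {
    let (n, df) = (num_docs as f64, doc_freq as f64);
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// Sum the IDF of the token used at each query position
/// (term frequency and length normalization omitted to keep the sketch small).
fn idf_score(tokens: &[&str], doc_freqs: &HashMap<&str, u64>, num_docs: u64) -> f64 {
    tokens.iter().map(|t| idf(num_docs, doc_freqs[t])).sum()
}
```

Because rarer tokens get a larger IDF, substituting the wrong same-position token changes the score, and therefore the top-k order, even though the set of matching rows is the same.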

How does this PR fix the problem?

This PR makes partition posting-list loading aware of the query operator. For AND, a partition now returns empty results when any required original position has no exact term or fuzzy expansion. For fuzzy AND, expansions are grouped by original query position, same-position expansions are unioned for candidate selection, and final scoring uses the actual matched expansion token frequencies.
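The grouping rule can be sketched as follows. This is a simplified model under stated assumptions: `Expansion` and `fuzzy_and_candidates` are hypothetical names, a partition is a `HashMap` from token to row IDs, and candidate selection is plain set intersection rather than WAND. Expansions that share an original query position are unioned, every position is required, and the final candidates must appear in every position's union.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical fuzzy expansion: the original query position it came from and the token it matched.
struct Expansion {
    position: usize,
    token: String,
}

/// Candidate rows for a fuzzy AND over one partition.
/// `num_positions` is the number of original query tokens; positions are assumed to be < num_positions.
fn fuzzy_and_candidates(
    part: &HashMap<String, Vec<u32>>,
    num_positions: usize,
    expansions: &[Expansion],
) -> Vec<u32> {
    // Union the postings of all expansions that share an original query position.
    let mut groups: Vec<BTreeSet<u32>> = vec![BTreeSet::new(); num_positions];
    for exp in expansions {
        if let Some(rows) = part.get(&exp.token) {
            groups[exp.position].extend(rows.iter().copied());
        }
    }
    // Every original position is required: one empty group means nothing can match.
    if groups.iter().any(|group| group.is_empty()) {
        return Vec::new();
    }
    let Some((first, rest)) = groups.split_first() else {
        return Vec::new();
    };
    // A candidate must appear in the union of every position.
    first
        .iter()
        .copied()
        .filter(|row| rest.iter().all(|group| group.contains(row)))
        .collect()
}
```

For `alpha` → rows {1, 3}, `alphas` → {2} (both at position 0) and `beta` → {2, 3}, the grouped version returns rows 2 and 3. Treating the three expansions as three separately required terms would instead intersect {1, 3}, {2} and {2, 3}, which is empty.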

Validation

cargo fmt --all --check
git diff --check
CARGO_TARGET_DIR=... cargo test -p lance-index test_fuzzy_and_scores_grouped_expansions_by_matched_token -- --nocapture
CARGO_TARGET_DIR=... cargo test -p lance-index test_and_query -- --nocapture
CARGO_TARGET_DIR=... cargo test -p lance-index test_fuzzy_and_groups_expansions_by_original_position -- --nocapture
CARGO_TARGET_DIR=... cargo test -p lance-index bm25_search -- --nocapture
CARGO_TARGET_DIR=... cargo test -p lance-index scalar::inverted::wand::tests -- --nocapture

codecov · 2026-06-22T06:39:50Z

Codecov Report

❌ Patch coverage is 90.68966% with 54 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
rust/lance-index/src/scalar/inverted/index.rs	91.16%	43 Missing and 5 partials ⚠️
rust/lance-index/src/scalar/inverted/wand.rs	83.78%	6 Missing ⚠️

📢 Thoughts on this report? Let us know!


Xuanwo

Two issues need changes before merging. Please add regression coverage for multi-token max_expansions and grouped fuzzy AND top-k/resource-bound behavior.

Xuanwo · 2026-06-22T14:27:03Z

                    let part_for_wand = part.clone();
-                    let mut partition_result = spawn_cpu(move || {
+                    let has_grouped_expansions = !grouped_expansions.is_empty();
+                    let wand_params = if has_grouped_expansions {


With grouped fuzzy AND, this clears the requested limit before WAND runs. Since WAND treats None as usize::MAX, a limit=1 query can still materialize every matching candidate in each partition and resolve all deferred row ids before the outer heap trims results, which removes the resource bound users expect from top-k search.
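The resource concern is easiest to see with a bounded top-k heap. In this hypothetical sketch, `top_k` is not a Lance function and integer scores stand in for BM25 scores so the tuples are totally ordered. A `None` limit is mapped to `usize::MAX`, as the comment describes for WAND, so the heap never evicts and ends up holding every candidate.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Keep the `limit` highest-scoring (score, row) pairs using a min-heap.
/// Returns the kept rows (best first) and the peak heap size, a proxy for memory held.
fn top_k(candidates: &[(u32, u64)], limit: Option<usize>) -> (Vec<u64>, usize) {
    // `None` means "no limit", mirroring WAND treating a missing limit as usize::MAX.
    let k = limit.unwrap_or(usize::MAX);
    let mut heap = BinaryHeap::new();
    let mut peak = 0;
    for &(score, row) in candidates {
        heap.push(Reverse((score, row)));
        if heap.len() > k {
            // Evict the current lowest-scoring entry.
            heap.pop();
        }
        peak = peak.max(heap.len());
    }
    let mut kept: Vec<(u32, u64)> = heap.into_iter().map(|Reverse(entry)| entry).collect();
    kept.sort_by(|a, b| b.cmp(a));
    (kept.into_iter().map(|(_, row)| row).collect(), peak)
}
```

With `Some(1)` the heap never grows past one entry; with `None` it grows to the full candidate count, which is the lost resource bound the review points out.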

Addressed in afef607: grouped fuzzy AND no longer clears the WAND limit to None; it now uses bounded oversampling based on grouped expansion terms while keeping final matched-expansion rescoring. Added regression coverage in test_fuzzy_and_grouped_rescore_keeps_wand_limit_bounded.
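One way to read "bounded oversampling" is: ask WAND for somewhat more than k candidates, rescore them with the matched-expansion scores, then trim back to k. The sketch below is an assumption about the general shape of that approach, not the code in afef607. The multiplier (number of grouped expansion terms), the `max_limit` cap, and both function names are hypothetical.

```rust
/// Hypothetical WAND candidate budget: k scaled by the number of grouped
/// expansion terms, capped so the work stays bounded, and never below k.
fn oversampled_limit(k: usize, grouped_terms: usize, max_limit: usize) -> usize {
    k.saturating_mul(grouped_terms.max(1)).min(max_limit).max(k)
}

/// Rescore the oversampled candidates with the final scorer and keep the best k.
fn rescore_and_trim(candidates: &[u64], rescore: impl Fn(u64) -> f32, k: usize) -> Vec<u64> {
    let mut scored: Vec<(u64, f32)> = candidates.iter().map(|&row| (row, rescore(row))).collect();
    // Highest score first; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored.into_iter().map(|(row, _)| row).collect()
}
```

The trade-off is that oversampling gives rescoring room to promote a row WAND ranked lower, while the cap keeps a `limit=1` query from materializing every match in the partition.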

Xuanwo · 2026-06-22T14:27:03Z


            let base_len = tokens.token_type().prefix_len(token) as u32;
            if let TokenMap::Fst(ref map) = self.tokens.tokens {
+                let mut expanded = Vec::new();


This makes max_expansions apply separately to each query token. The previous implementation accumulated expansions in one vector, so the same cap applied to the whole fuzzy query; multi-token fuzzy queries can now expand to tokens.len() * max_expansions terms, changing recall, scoring, and posting IO for existing queries.

Addressed in afef607: fuzzy expansion now applies max_expansions across the whole query again while preserving original token positions for grouped fuzzy AND. Added regression coverage in test_fuzzy_expansion_cap_applies_to_whole_query.
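The difference between the two caps can be sketched with hypothetical inputs: `candidates[pos]` lists the dictionary tokens that fuzzily match query token `pos`, already in preference order. The function names are illustrative, and the whole-query version assumes expansions are filled in query-token order, which may not match Lance's actual ordering. Both versions keep each expansion's original position, which grouped fuzzy AND needs.

```rust
/// Regressed shape: the cap applies per query token,
/// so n tokens can yield up to n * max_expansions terms.
fn expand_per_token(candidates: &[Vec<&str>], max_expansions: usize) -> Vec<(usize, String)> {
    candidates
        .iter()
        .enumerate()
        .flat_map(|(pos, toks)| {
            toks.iter()
                .take(max_expansions)
                .map(move |tok| (pos, tok.to_string()))
        })
        .collect()
}

/// Restored shape: one shared cap across the whole query, while each
/// expansion still records its original query position.
fn expand_whole_query(candidates: &[Vec<&str>], max_expansions: usize) -> Vec<(usize, String)> {
    candidates
        .iter()
        .enumerate()
        .flat_map(|(pos, toks)| toks.iter().map(move |tok| (pos, tok.to_string())))
        .take(max_expansions)
        .collect()
}
```

For two query tokens with three and two fuzzy matches, a cap of 3 yields five terms per-token but a cap of 4 yields exactly four terms across the whole query.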

Xuanwo

LGTM, thank you!

fix(fts): enforce required terms for and queries

3fd2416

github-actions Bot added the A-index (Vector index, linalg, tokenizer) and bug (Something isn't working) labels Jun 22, 2026

fix(ci): address failing checks

96541a5

github-actions Bot added the A-python (Python bindings) label Jun 22, 2026

Merge remote-tracking branch 'origin/main' into yang/fix-fts-and-requ…

79760e7


BubbleCal marked this pull request as ready for review June 22, 2026 13:57

Xuanwo requested changes Jun 22, 2026


fix(fts): preserve fuzzy query limits

afef607

Xuanwo approved these changes Jun 22, 2026


BubbleCal merged commit 2b1b100 into main Jun 22, 2026
41 of 42 checks passed

BubbleCal deleted the yang/fix-fts-and-required-terms branch June 22, 2026 15:58

BubbleCal merged 4 commits into main from yang/fix-fts-and-required-terms
