Skip to content

Discard an un-appliable TRUNCATE instead of looping the apply worker#505

Merged
mason-sharp merged 3 commits into
mainfrom
spoc-590
Jun 26, 2026
Merged

Discard an un-appliable TRUNCATE instead of looping the apply worker#505
mason-sharp merged 3 commits into
mainfrom
spoc-590

Conversation

@danolivo

Copy link
Copy Markdown
Contributor

Problem

In DISCARD exception mode, handle_truncate() had no per-action exception handling. On the retry (use_try_block) pass it re-ran the truncate directly, so any failure — most commonly a relation that does not exist on the subscriber — escaped as an ERROR.

With use_try_block set and xact_had_exception still clear, apply_work()'s outer handler re-throws to exit the worker; the background worker then restarts, replays from the same position, and fails identically. The result is an apply-worker crash/restart loop that stalls every later change from that origin — spock.sync_event() included — so the whole transaction (its row changes too) never lands and the stream wedges.

The row handlers (handle_insert/update/delete) and the queued-SQL handler (handle_sql_or_exception()) already wrap their work in a subtransaction for exactly this reason. TRUNCATE was the one directly-applied operation missing it: ExecuteTruncateGuts() is called directly (not via ProcessUtility), and on a missing relation handle_truncate() simply ereported, with no savepoint to roll back to.

Fix

  1. Extract apply_truncate() — a preparatory refactor with no behaviour change, isolating the relation resolution (including partition expansion) and the ExecuteTruncateGuts() call, i.e. the part of the path that can throw.

  2. Discard the failing TRUNCATE in DISCARD mode — on the retry path, run apply_truncate() inside an internal subtransaction. On failure, roll back the savepoint, set xact_had_exception, log the discarded TRUNCATE to spock.exception_log, and carry on. This mirrors the pattern already used by the row and SQL handlers.

    TRANSDISCARD and SUB_DISABLE keep their unconditional skip: the whole transaction is being abandoned, so the truncate must not be attempted at all.

Behaviour

A TRUNCATE that cannot be applied on the subscriber is now discarded individually — recorded in spock.exception_log with operation = 'TRUNCATE' — while the other changes in the same transaction still apply and the apply worker keeps running. No more crash/restart loop on a missing relation.

@danolivo danolivo self-assigned this Jun 17, 2026
@danolivo danolivo added the bug Something isn't working label Jun 17, 2026
@codacy-production

codacy-production Bot commented Jun 17, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 2 duplication

Metric Results
Duplication 2

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@mason-sharp mason-sharp requested a review from ibrarahmad June 17, 2026 13:51
danolivo added 3 commits June 24, 2026 14:42
Preparatory refactor with no change in behaviour: move the relation
resolution (including partition expansion) and the ExecuteTruncateGuts()
call out of handle_truncate() into a new apply_truncate() helper.

This isolates the part of the TRUNCATE apply path that can throw, so the
next commit can wrap it in a subtransaction for DISCARD-mode exception
handling without obscuring the material change.
In DISCARD mode handle_truncate() had no per-action exception handling.
On the retry pass it re-ran the truncate directly, so a
failure - most commonly a relation that does not exist on the
subscriber - escaped as an ERROR.  With use_try_block set and
xact_had_exception clear, apply_work()'s outer handler re-throws to exit
the worker; the background worker then restarts, replays from the same
position, and fails identically.  The result is an apply-worker
crash/restart loop that stalls every later change from that origin,
sync_event included.

Wrap apply_truncate() in a subtransaction on the DISCARD retry path,
mirroring handle_insert/update/delete and handle_sql_or_exception():
on failure, roll back the savepoint, set xact_had_exception, log the
discarded TRUNCATE, and carry on.  TRANSDISCARD and SUB_DISABLE keep
their unconditional skip - the whole transaction is being abandoned, so
the truncate must not be attempted.
Add TAP test 031, exercising the fix from previous commit.

A single replicated transaction mixes INSERT/UPDATE/DELETE on one table
with a TRUNCATE of another table that has been dropped on the subscriber
- a missing relation, modelled by dropping it manually.  In DISCARD mode
the test asserts that only the TRUNCATE is discarded (and logged to
spock.exception_log), that the row changes in the same transaction still
apply, and that the subscription keeps replicating: i.e. the missing
relation no longer sends the apply worker into a crash/restart loop.
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 81f05394-e34d-4266-8837-e80b903c73fc

📥 Commits

Reviewing files that changed from the base of the PR and between fb3fb44 and 800dbce.

📒 Files selected for processing (3)
  • src/spock_apply.c
  • tests/tap/schedule
  • tests/tap/t/031_discard_truncate_missing_relation.pl

📝 Walkthrough

Walkthrough

handle_truncate in spock_apply.c is refactored: truncation internals are extracted into a new apply_truncate static helper, and handle_truncate becomes a dispatcher that runs TRUNCATE inside a subtransaction under DISCARD mode (logging failures) or skips/logs under TRANSDISCARD/SUB_DISABLE. A new TAP test validates this behavior.

DISCARD TRUNCATE Handling and Test Coverage

Layer / File(s) Summary
apply_truncate helper extraction
src/spock_apply.c
Adds apply_truncate(remote_relids, restart_seqs) static function declaration and moves all truncation internals—relation OID resolution, exclusive locks, partition expansion, logical-decoding tracking, ExecuteTruncateGuts, and relation closing—from the old handle_truncate body into the new helper.
handle_truncate DISCARD subtransaction wrapper
src/spock_apply.c
Rewrites handle_truncate as a thin dispatcher: reads the TRUNCATE payload, then routes to skip-and-log (TRANSDISCARD/SUB_DISABLE with use_try_block), a subtransaction with error logging on failure (DISCARD with use_try_block), or a direct apply_truncate call (no try block).
TAP test: DISCARD TRUNCATE with missing relation
tests/tap/schedule, tests/tap/t/031_discard_truncate_missing_relation.pl
Schedules and implements a 2-node TAP test: sets up DISCARD exception logging, drops t_trunc only on the subscriber, runs a mixed provider transaction (INSERT/UPDATE/DELETE + TRUNCATE), asserts the TRUNCATE appears in spock.exception_log, confirms the DML effects replicate, verifies the subscription remains in replicating state, and checks replication continues after the discard.

🐇 A truncate came bouncing down the wire,
But the table was gone — oh, what a dire!
Into the exception log it hopped with grace,
While INSERT and DELETE kept up their pace.
The worker stayed calm, still replicating true —
One missing relation? No crash ensued! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: DISCARD-mode TRUNCATE failures are discarded instead of looping the apply worker.
Description check ✅ Passed The description directly matches the implemented refactor, DISCARD handling fix, and regression test.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch spoc-590

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@mason-sharp mason-sharp merged commit da7ee30 into main Jun 26, 2026
14 checks passed
@mason-sharp mason-sharp deleted the spoc-590 branch June 26, 2026 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants