
tac: read regular files instead of mmap to avoid SIGBUS on truncation #12971

Open
DarthStrom wants to merge 1 commit into uutils:main from DarthStrom:tac-sigbus-9748

@DarthStrom

Contributor

Fixes #9748

Problem

tac memory-maps regular files. If another process truncates such a file while it is mapped — e.g. a log file being rotated — accessing the now-invalid pages raises SIGBUS and the process is killed ("Bus error (core dumped)"). Mapping a file whose backing store can change underneath us is inherently unsound.

Reproducer on main:

dd if=/dev/zero of=/tmp/tactest bs=1M count=40 2>/dev/null
( sleep 0.002; truncate -s 0 /tmp/tactest ) &
tac /tmp/tactest        # -> Bus error (core dumped)

This reproduces 12/12 runs for me.

Fix

Read regular files into memory up front with read_to_end rather than memory-mapping them, so the bytes are copied into process-owned memory before scanning and a concurrent truncation can no longer fault. The now-unused try_mmap_file() helper is removed.

stdin is unchanged: it still mmaps a process-owned temporary file, where external truncation is not a concern.

After the fix the reproducer above no longer crashes (0/20 runs).
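A minimal sketch of the copy-then-scan idea described under Fix, for illustration only. The helper names `read_regular_file` and `write_reversed_lines` are hypothetical and not taken from the PR, and the scan handles only the default newline separator; the actual tac code is more general.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::path::Path;

/// Copy the whole file into process-owned memory (hypothetical helper).
/// Once this returns, a later truncation of the file cannot invalidate
/// `buf` the way it can invalidate the pages of a memory mapping.
fn read_regular_file(path: &Path) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    File::open(path)?.read_to_end(&mut buf)?;
    Ok(buf)
}

/// Write the records of `data` in reverse order, treating "\n" as a
/// separator attached to the end of each record (hypothetical helper).
fn write_reversed_lines<W: Write>(data: &[u8], out: &mut W) -> io::Result<()> {
    let mut end = data.len();
    // Walk backwards; each '\n' that is not the last byte of the
    // pending record marks the start of a record to emit.
    for i in (0..data.len()).rev() {
        if data[i] == b'\n' && i + 1 != end {
            out.write_all(&data[i + 1..end])?;
            end = i + 1;
        }
    }
    // Whatever remains is the first record of the input.
    out.write_all(&data[..end])
}
```

The trade-off is one extra copy of the file contents in exchange for a buffer whose validity no longer depends on what other processes do to the file.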

Test

Adds test_tac_file_truncated_during_read_does_not_crash, which truncates a file while tac is reading it and asserts the process is not killed by a signal.

There's a useful asymmetry that makes this a stable regression guard: once tac copies the bytes up front, a truncation race can never SIGBUS, so the test is reliably green on the fixed code regardless of timing, while still failing on the old mmap-based code. I verified this by reverting just the source fix and keeping the test: it fails 3/3 with signal Some(7) (SIGBUS).
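The shape of such a test can be sketched with the standard library alone. This is an assumption-laden illustration, not the test added in this PR: `signal_after_truncation` is a hypothetical helper, it is Unix-only, and the real test presumably uses the project's own test harness.

```rust
use std::fs::OpenOptions;
use std::io;
use std::os::unix::process::ExitStatusExt;
use std::path::Path;
use std::process::{Command, Stdio};
use std::thread;
use std::time::Duration;

/// Run `program <path>`, truncate `path` shortly after the child starts,
/// and return the signal that killed the child, if any (hypothetical helper).
fn signal_after_truncation(
    program: &str,
    path: &Path,
    delay: Duration,
) -> io::Result<Option<i32>> {
    let mut child = Command::new(program)
        .arg(path)
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()?;
    thread::sleep(delay);
    // Truncate to zero length, as a log rotation might.
    OpenOptions::new().write(true).open(path)?.set_len(0)?;
    let status = child.wait()?;
    // `signal()` is Some(n) only when the child was terminated by signal n.
    Ok(status.signal())
}
```

A regression test would call this with the tac binary under test and assert that the result is `None`. Because the race may or may not be hit on any given run, only the "not killed by a signal" outcome is asserted, never a particular output.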

Credit

This builds on the earlier read_to_end approach in #11326 and #11464 (both uutils PRs), which stalled on the missing regression test that this PR supplies.

Checklist

  • cargo fmt clean
  • cargo clippy clean
  • cargo test --features tac --no-default-features --test tests test_tac — 32/32 pass

@github-actions

github-actions Bot commented Jun 18, 2026


GNU testsuite comparison:

Skip an intermittent issue tests/date/date-locale-hour (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/date/resolution (fails in this run but passes in the 'main' branch)

@codspeed-hq

codspeed-hq Bot commented Jun 19, 2026


Merging this PR will degrade performance by 3.99%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

❌ 1 regressed benchmark
✅ 322 untouched benchmarks
⏩ 46 skipped benchmarks [1]

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|------|-----------|------|------|------------|
| Simulation | sort_long_line[10000] | 406.9 µs | 423.8 µs | -3.99% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing DarthStrom:tac-sigbus-9748 (783be34) with main (f50adad)

Footnotes

  1. 46 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

@oech3

oech3 commented Jun 19, 2026

Contributor

memmap2 should be removed from the workspace dependencies.

@oech3

oech3 commented Jun 19, 2026

Contributor

Also you should add co-authors if code is taken from other PRs.

@DarthStrom

Contributor Author

@oech3 thanks for the review!

Co-authors (your second comment): added Co-authored-by trailers for the #11326 and #11464 authors and force-pushed.

memmap2 — I dug into this and I'd argue we should keep it, but I've narrowed where it's used. The unsoundness in #9748 is specifically about mmapping a caller-controlled file: another process can truncate it while it's mapped and we take a SIGBUS. While checking this I noticed the original fix was actually incomplete — it only covered the file-argument path. tac < file still went through a direct mmap of the raw stdin fd (i.e. the user's regular file), so it had the exact same crash. I've added a regression test for that case; it SIGBUSes on the previous code and passes after this change.

I removed that direct-stdin mmap and now route all stdin through a tempfile::tempfile() that we copy into and then map. That temp file is unlinked and owned by our process, so no other process can open and truncate it — the mapping stays valid for its whole lifetime and can't trigger SIGBUS. That's the only remaining memmap2 use, and it's sound.

The reason I'd prefer to keep it rather than just read_to_end into a Vec: that temp-file buffering was added deliberately in #10094 to avoid OOMing/timing out on very large stdin (matching GNU spilling to TMPDIR). Dropping memmap2 would mean reading all of stdin into memory again and reverting that behavior.

So removing memmap2 would trade a sound, bounded-memory path for unbounded memory on large stdin. Happy to go the other way if you feel strongly about dropping the dependency — just wanted to surface the tradeoff first. What do you think?
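The stdin path described above can be approximated with the standard library as follows. This is a rough sketch under stated assumptions: `spool_to_unlinked_temp` is a hypothetical name, the file is created and then unlinked in two steps (the PR uses `tempfile::tempfile()` instead), the mapping step done with memmap2 is left out, and keeping an unlinked file readable through its handle is Unix behavior.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom};
use std::path::PathBuf;

/// Copy `input` into a temp file that is unlinked right away, so the data
/// is reachable only through the returned handle (hypothetical helper).
fn spool_to_unlinked_temp<R: Read>(mut input: R) -> io::Result<File> {
    // Illustrative file name; a real implementation would avoid
    // predictable names and the create-then-unlink window.
    let path: PathBuf =
        std::env::temp_dir().join(format!("spool_sketch_{}", std::process::id()));
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create_new(true)
        .open(&path)?;
    // On Unix the inode stays alive until `file` is dropped, even
    // though the directory entry is gone.
    fs::remove_file(&path)?;
    // Stream the input to disk instead of holding it all in memory.
    io::copy(&mut input, &mut file)?;
    file.seek(SeekFrom::Start(0))?;
    Ok(file)
}
```

The returned handle is what would then be memory-mapped. Since the path no longer exists, another process cannot reach the file by name to truncate it, which is the property the comment above relies on.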

@oech3

oech3 commented Jun 21, 2026

Contributor

> That temp file is unlinked and owned by our process, so no other process can open and truncate it

OK, thank you. We can use mmap in that case.

tac memory-mapped regular files. If another process truncated such a
file while it was mapped (e.g. during log rotation), accessing the
now-invalid pages raised SIGBUS and killed the process. Read regular
files into memory up front instead, so a concurrent truncation can no
longer crash tac.

The stdin path had the same hole by a different route: `tac < file`
mapped the raw stdin fd -- the caller's regular file -- directly. Remove
that direct-stdin mmap and always route stdin through a process-owned
temp file (already used to bound memory on large stdin, see uutils#10094).
The temp file is created unlinked, so no other process can truncate it
and mapping it stays sound.

Adds regression tests that truncate a file mid-read, via both an
argument and stdin redirection, and assert tac is not killed by a
signal.

Fixes uutils#9748

Co-authored-by: easonysliu <easonysliu@tencent.com>
Co-authored-by: Charlie Tonneslan <cst0520@gmail.com>

@oech3

oech3 commented Jun 23, 2026

Contributor

Can we use MAP_POPULATE or something similar instead of removing mmap?
It would be better to avoid the second copy that read makes once the page cache has been populated.

I'm also considering adding sealed-memfd + mmap + splice to uucore.

Development

Successfully merging this pull request may close these issues.

tac crashes with SIGBUS when input file is truncated during read

2 participants