Skip to content

Fix Python code comments mistaken for embedded notebook titles#14643

Open
cderv wants to merge 20 commits into
mainfrom
fix/issue-14577-option-b
Open

Fix Python code comments mistaken for embedded notebook titles#14643
cderv wants to merge 20 commits into
mainfrom
fix/issue-14577-option-b

Conversation

@cderv

@cderv cderv commented Jul 1, 2026

Copy link
Copy Markdown
Member

When an embedded Jupyter notebook ({{< embed notebook.ipynb >}}) has no real markdown heading, a Python comment inside a code cell (e.g. # plt.savefig(...)) could get picked up as the notebook's title in the sidebar and "Source" button.

This supersedes #14639, which fixed the same issue by blocking title promotion in findTitle whenever any content preceded the heading. That patches the symptom at one call site and also drops legitimate titles for embeds that have prose before a real heading. If this merges, #14639 should be closed instead of merged — its earlier commits are included in this branch's history, but the later commits here replace its core mechanism.

Root Cause

The shared scanner markdownWithExtractedHeading (src/core/pandoc/pandoc-partition.ts) treated any #-prefixed line as an ATX heading, with no awareness of fenced code blocks. findTitle scans every cell's markdown, not just code cells, so this also reaches markdown cells, whose authored content can contain a fenced block nested in a list item.

Fix

Make the scanner itself fence-aware: track CommonMark ``` / ~~~ fence open/close state and skip heading detection while inside a fence. Fence markers are recognized indented up to 3 spaces, matching CommonMark's own limit. A real heading anywhere in the notebook is still promoted, same as before this bug — only a fenced line that merely looks like a heading is skipped. This fixes every caller of the scanner, not just the embed path.

Fixes #14577

cderv added 16 commits June 30, 2026 18:14
…tle (#14577)

When embedding a notebook with no YAML title and no markdown heading cell,
the "Source" link and sidebar showed a Python comment (e.g. `# plt.savefig(...)`)
instead of the notebook filename.

findTitle() in jupyter-embed.ts scans every cell's markdown, including code
cells whose markdown is their fenced source. partitionMarkdown is not
fence-aware, so a `#` comment line inside the fence matches the ATX-heading
regex and was returned as the title.

The non-embed code path already guards against this: the Dec-2023 "Improve
title snipping in notebooks" work (24672a4, fixes #5363/#6411) made
fixupFrontMatter only promote a heading to a notebook title when no content
precedes it. findTitle predates that change and was never brought in line, so
it still promoted any heading-like line. Apply the same `!contentBeforeHeading`
gate here by exposing contentBeforeHeading (already computed by
markdownWithExtractedHeading) through PartitionedMarkdown. A fenced code
block's opening delimiter always counts as content before the comment, so the
comment is no longer mistaken for a title and the filename is used instead.
The teardowns called raw Deno.removeSync, which throws NotFound when a
preview artifact is already absent (e.g. a prior teardown partially ran, or
a sibling test cleaned a shared path). A throwing teardown aborts the test
and leaves the fixture directory half-cleaned, so the next run trips over
stale state. safeRemoveSync (deno_ral/fs.ts) tolerates already-removed
paths and only rethrows genuine errors, keeping cleanup idempotent.
postRenderCleanup registered entries were removed with a non-recursive
Deno.removeSync, so an entry could only be a single file. Embedded-notebook
tests produce a `*_files` support directory alongside the preview HTML, which
could not be declared for cleanup and was left behind. Use safeRemoveSync with
recursive removal so a registered entry can be a directory as well as a file.
The regression test for the embedded-notebook title fix started as a bespoke
testRender block in render-embed.test.ts with manual teardown of the preview
artifacts. smoke-all is the established home for issue-regression tests and its
framework already handles output cleanup, so move it there: a 14577.qmd that
embeds notebook.ipynb and asserts the Source link uses the filename, not the
Python comment, with the preview artifacts declared for postRenderCleanup.
These asserted pre-existing, unchanged extraction behavior plus a trivial
pass-through (partitionMarkdown now forwards contentBeforeHeading, a value
markdownWithExtractedHeading already computed). They pinned behavior rather
than exercising the fix. The fix lives in findTitle and is covered end-to-end
by the smoke-all regression test, so the unit additions added no real coverage.
Record why findTitle can't mirror fixupFrontMatter's markdown-cells-only
guard: it runs on rendered JupyterCellOutput, which carries no cell_type, so
the !contentBeforeHeading check is the available equivalent for excluding a
code cell's fenced source.
Add a smoke-all test for an embedded notebook whose only heading has prose
before it: the heading must not become the embed title, which falls back to
the filename. This is the same rule fixupFrontMatter applies to a notebook's
own front-matter title since 24672a4 (#5363/#6411); the embed title now
aligns with it. Broaden the changelog entry to describe this general behavior,
not just the code-comment case.
…hared gate

The comment and changelog claimed findTitle now derives the title the same way
fixupFrontMatter does. Only the content-before-heading gate is shared: findTitle
still scans every cell (fixupFrontMatter inspects only the first markdown cell)
and cannot filter on cell_type, so the two are not equivalent. State that the
gate is borrowed and list the divergences, so the comparison is not read as a
guarantee of identical behavior.
…tionMarkdown

Prevents a future reader from removing the seemingly-unused field and
breaking fixupFrontMatter's own gate, flagged in final branch review.
…f the code

Both comments explained the fix's history and cited the issue number
rather than something non-obvious about the code itself; that context
belongs in the commit trail, not the source.
…Heading's other cases

Prior tests exercised the ATX heading path only indirectly through
partitionMarkdown, and the fence work this branch added only covered
fenced-block cases. Setext headings, the no-heading case, and
contentBeforeHeading's false branch had no direct test.
CommonMark allows fences indented up to 3 spaces, but every fence this
scanner sees comes from code cell source, which always starts at
column 0.
@posit-snyk-bot

posit-snyk-bot commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

cderv added 4 commits July 1, 2026 15:29
markdownWithExtractedHeading scanned every notebook cell, not just
code cells. Code cells always emit fences at column 0, but markdown
cells carry arbitrary user-authored content, where a fenced block
nested in a list item can have its fence markers indented while a
line inside stays at column 0. That line was previously invisible to
fence tracking and could still be mistaken for a heading.
Distinct from the indented-open case already covered: opening fence
at column 0, closing marker indented, followed by a real heading.
Confirms the indented fence-close fix also resolves the case where a
heading after such a fence would otherwise never be reached.
@cderv

cderv commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

closes #14639

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python comments wrongly rendered as title in the sidebar of embeded notebooks

2 participants