Fix Python code comments mistaken for embedded notebook titles#14643
Open
cderv wants to merge 20 commits into
Open
Fix Python code comments mistaken for embedded notebook titles#14643cderv wants to merge 20 commits into
cderv wants to merge 20 commits into
Conversation
…tle (#14577) When embedding a notebook with no YAML title and no markdown heading cell, the "Source" link and sidebar showed a Python comment (e.g. `# plt.savefig(...)`) instead of the notebook filename. findTitle() in jupyter-embed.ts scans every cell's markdown, including code cells whose markdown is their fenced source. partitionMarkdown is not fence-aware, so a `#` comment line inside the fence matches the ATX-heading regex and was returned as the title. The non-embed code path already guards against this: the Dec-2023 "Improve title snipping in notebooks" work (24672a4, fixes #5363/#6411) made fixupFrontMatter only promote a heading to a notebook title when no content precedes it. findTitle predates that change and was never brought in line, so it still promoted any heading-like line. Apply the same `!contentBeforeHeading` gate here by exposing contentBeforeHeading (already computed by markdownWithExtractedHeading) through PartitionedMarkdown. A fenced code block's opening delimiter always counts as content before the comment, so the comment is no longer mistaken for a title and the filename is used instead.
The teardowns called raw Deno.removeSync, which throws NotFound when a preview artifact is already absent (e.g. a prior teardown partially ran, or a sibling test cleaned a shared path). A throwing teardown aborts the test and leaves the fixture directory half-cleaned, so the next run trips over stale state. safeRemoveSync (deno_ral/fs.ts) tolerates already-removed paths and only rethrows genuine errors, keeping cleanup idempotent.
postRenderCleanup registered entries were removed with a non-recursive Deno.removeSync, so an entry could only be a single file. Embedded-notebook tests produce a `*_files` support directory alongside the preview HTML, which could not be declared for cleanup and was left behind. Use safeRemoveSync with recursive removal so a registered entry can be a directory as well as a file.
The regression test for the embedded-notebook title fix started as a bespoke testRender block in render-embed.test.ts with manual teardown of the preview artifacts. smoke-all is the established home for issue-regression tests and its framework already handles output cleanup, so move it there: a 14577.qmd that embeds notebook.ipynb and asserts the Source link uses the filename, not the Python comment, with the preview artifacts declared for postRenderCleanup.
These asserted pre-existing, unchanged extraction behavior plus a trivial pass-through (partitionMarkdown now forwards contentBeforeHeading, a value markdownWithExtractedHeading already computed). They pinned behavior rather than exercising the fix. The fix lives in findTitle and is covered end-to-end by the smoke-all regression test, so the unit additions added no real coverage.
Record why findTitle can't mirror fixupFrontMatter's markdown-cells-only guard: it runs on rendered JupyterCellOutput, which carries no cell_type, so the !contentBeforeHeading check is the available equivalent for excluding a code cell's fenced source.
Add a smoke-all test for an embedded notebook whose only heading has prose before it: the heading must not become the embed title, which falls back to the filename. This is the same rule fixupFrontMatter applies to a notebook's own front-matter title since 24672a4 (#5363/#6411); the embed title now aligns with it. Broaden the changelog entry to describe this general behavior, not just the code-comment case.
…hared gate The comment and changelog claimed findTitle now derives the title the same way fixupFrontMatter does. Only the content-before-heading gate is shared: findTitle still scans every cell (fixupFrontMatter inspects only the first markdown cell) and cannot filter on cell_type, so the two are not equivalent. State that the gate is borrowed and list the divergences, so the comparison is not read as a guarantee of identical behavior.
…t heading extraction is fence-aware
…tionMarkdown Prevents a future reader from removing the seemingly-unused field and breaking fixupFrontMatter's own gate, flagged in final branch review.
…f the code Both comments explained the fix's history and cited the issue number rather than something non-obvious about the code itself; that context belongs in the commit trail, not the source.
…Heading's other cases Prior tests exercised the ATX heading path only indirectly through partitionMarkdown, and the fence work this branch added only covered fenced-block cases. Setext headings, the no-heading case, and contentBeforeHeading's false branch had no direct test.
CommonMark allows fences indented up to 3 spaces, but every fence this scanner sees comes from code cell source, which always starts at column 0.
Collaborator
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
markdownWithExtractedHeading scanned every notebook cell, not just code cells. Code cells always emit fences at column 0, but markdown cells carry arbitrary user-authored content, where a fenced block nested in a list item can have its fence markers indented while a line inside stays at column 0. That line was previously invisible to fence tracking and could still be mistaken for a heading.
Distinct from the indented-open case already covered: opening fence at column 0, closing marker indented, followed by a real heading. Confirms the indented fence-close fix also resolves the case where a heading after such a fence would otherwise never be reached.
Member
Author
|
closes #14639 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When an embedded Jupyter notebook (
{{< embed notebook.ipynb >}}) has no real markdown heading, a Python comment inside a code cell (e.g.# plt.savefig(...)) could get picked up as the notebook's title in the sidebar and "Source" button.This supersedes #14639, which fixed the same issue by blocking title promotion in
findTitlewhenever any content preceded the heading. That patches the symptom at one call site and also drops legitimate titles for embeds that have prose before a real heading. If this merges, #14639 should be closed instead of merged — its earlier commits are included in this branch's history, but the later commits here replace its core mechanism.Root Cause
The shared scanner
markdownWithExtractedHeading(src/core/pandoc/pandoc-partition.ts) treated any#-prefixed line as an ATX heading, with no awareness of fenced code blocks.findTitlescans every cell's markdown, not just code cells, so this also reaches markdown cells, whose authored content can contain a fenced block nested in a list item.Fix
Make the scanner itself fence-aware: track CommonMark ``` / ~~~ fence open/close state and skip heading detection while inside a fence. Fence markers are recognized indented up to 3 spaces, matching CommonMark's own limit. A real heading anywhere in the notebook is still promoted, same as before this bug — only a fenced line that merely looks like a heading is skipped. This fixes every caller of the scanner, not just the embed path.
Fixes #14577