chore: CI workflows for linting docs #4076

Open
maxrjones wants to merge 4 commits into zarr-developers:main from maxrjones:chore/test-docs-formatting

@maxrjones maxrjones commented Jun 17, 2026

Member

This PR adds a few scripts/workflows to:

  • Check that all top-level functions are included in the mkdocs API documentation
  • Check that docstrings use mkdocs-style rather than Sphinx-style admonitions and links
  • Check that all links resolve

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions Bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) Jun 17, 2026
@maxrjones maxrjones changed the title from "chore: CI workflows for missing docs and RST docstrings" to "chore: CI workflows for linting docs" Jun 17, 2026
@maxrjones

Member Author

@d-v-b @chuckwondo this adds a fair amount of config/workflow files, so a review on the direction here would be welcome but not necessarily required

Comment on lines +62 to +69
for line in text[directive_start:].splitlines()[1:]:
    if line.strip() == "":
        continue
    if not line.startswith((" ", "\t")):
        break
    if MEMBERS_DISABLED_RE.match(line):
        return True
return False

Contributor

This is perhaps more readable:

Suggested change
- for line in text[directive_start:].splitlines()[1:]:
-     if line.strip() == "":
-         continue
-     if not line.startswith((" ", "\t")):
-         break
-     if MEMBERS_DISABLED_RE.match(line):
-         return True
- return False
+ return any(
+     MEMBERS_DISABLED_RE.match(line)
+     for line in text[directive_start:].splitlines()[1:]
+     if line.startswith((" ", "\t")) and line.strip()
+ )

However, I cannot tell yet (I'll keep reviewing) whether text is the entirety of a Python file. If it is, slicing it from some point to the end of the file for every directive encountered seems highly inefficient, though it may not be a noticeable performance problem, or I may find as I keep reading your code that this is not the case.

Although, I suspect that the suggestion above doesn't address the potentially larger issue of how this script is architected. In the end, however, this might be sufficient since this is simply a CI script, but I'll make more significant suggestions elsewhere.
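
One detail worth noting about the suggested comprehension: unlike the original loop, it does not stop at the first non-indented line, so it would also test indented lines that belong to later directives. A minimal sketch that keeps the early stop, assuming a hypothetical MEMBERS_DISABLED_RE pattern and a hypothetical helper in_options_block (the PR's actual regex is not shown in this thread):

```python
import re
from itertools import takewhile

# Hypothetical pattern: the PR's real MEMBERS_DISABLED_RE is not shown in this thread.
MEMBERS_DISABLED_RE = re.compile(r"^\s+members:\s*(false|no)\s*$", re.IGNORECASE)


def in_options_block(line: str) -> bool:
    """Return True for blank or indented lines, i.e. lines still inside the directive."""
    return not line.strip() or line.startswith((" ", "\t"))


def members_disabled(text: str, directive_start: int) -> bool:
    following = text[directive_start:].splitlines()[1:]
    # takewhile stops at the first non-indented, non-blank line, like the loop's `break`.
    return any(
        MEMBERS_DISABLED_RE.match(line)
        for line in takewhile(in_options_block, following)
    )
```

Blank lines pass through takewhile but never match the pattern, which mirrors the `continue` in the original loop.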

Comment on lines +27 to +28
REPO_ROOT = Path(__file__).parent.parent.resolve()
API_DOCS_ROOT = REPO_ROOT / "docs" / "api"

Contributor

I suggest you make this a CLI argument instead of hard-coding it here. Then modify the GitHub workflow to call this script with "docs/api" as the argument, since it will be invoked from the root of the repo.
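
A rough sketch of what that could look like with argparse. The function name parse_args and the argument name api_docs_root are illustrative assumptions, not names from the PR:

```python
from __future__ import annotations

import argparse
from pathlib import Path


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse the docs root from the command line instead of hard-coding it."""
    parser = argparse.ArgumentParser(
        description="Check that public exports appear in the API reference."
    )
    # Hypothetical positional argument; the workflow would pass "docs/api" here.
    parser.add_argument(
        "api_docs_root",
        type=Path,
        help="directory containing the API reference pages, e.g. docs/api",
    )
    return parser.parse_args(argv)
```

Because the workflow runs from the repository root, a relative path such as docs/api resolves without any reference to `__file__`.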

Comment on lines +79 to +94
for md_file in sorted(API_DOCS_ROOT.rglob("*.md")):
    text = md_file.read_text(encoding="utf-8")
    for match in DIRECTIVE_RE.finditer(text):
        obj = resolve(match.group("target"))
        if obj is None:
            continue
        documented.add(id(obj))
        if isinstance(obj, ModuleType) and not members_disabled(text, match.start()):
            member_names = getattr(obj, "__all__", None) or [
                name for name in dir(obj) if not name.startswith("_")
            ]
            for name in member_names:
                member = getattr(obj, name, None)
                if member is not None:
                    documented.add(id(member))
return documented

Contributor

This is nearly indecipherable, so it may be annoying to maintain.

I suggest refactoring this to take a pipeline approach, which I believe will not only make it more readable, but also allow for greater efficiency/speed (related to my comment on the members_disabled function), which seems to be a motivating factor based on your comments elsewhere.

A pipeline approach should allow you to avoid repeatedly splitting each file's text into individual lines every time a directive is encountered.
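
As a hedged illustration of the first pipeline stage, the sketch below walks a file's lines once and yields each directive together with whether its members are disabled. The regexes and the name iter_directives are assumptions standing in for the PR's DIRECTIVE_RE and MEMBERS_DISABLED_RE, which are not shown in this thread:

```python
import re
from collections.abc import Iterable, Iterator

# Hypothetical patterns standing in for the PR's DIRECTIVE_RE and MEMBERS_DISABLED_RE.
DIRECTIVE_RE = re.compile(r"^:::\s+(?P<target>[\w.]+)\s*$")
MEMBERS_DISABLED_RE = re.compile(r"^\s+members:\s*(false|no)\s*$", re.IGNORECASE)


def iter_directives(lines: Iterable[str]) -> Iterator[tuple[str, bool]]:
    """Yield (target, members_disabled) per directive, reading each line once."""
    target = None
    disabled = False
    for line in lines:
        match = DIRECTIVE_RE.match(line)
        if match:
            if target is not None:
                yield target, disabled
            target, disabled = match.group("target"), False
        elif target is not None:
            if line.strip() and not line.startswith((" ", "\t")):
                # A non-indented line ends the directive's option block.
                yield target, disabled
                target = None
            elif MEMBERS_DISABLED_RE.match(line):
                disabled = True
    if target is not None:
        yield target, disabled
```

Later stages could then resolve each target and expand module members, without re-slicing or re-splitting the file text for every directive.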

Comment on lines +99 to +105
missing = []
for name in zarr.__all__:
    if name in EXEMPT_EXPORTS:
        continue
    if id(getattr(zarr, name)) not in documented:
        missing.append(name)
return sorted(missing)

Contributor

Use the ~~force~~ comprehension, ~~Luke~~ Max!

Suggested change
- missing = []
- for name in zarr.__all__:
-     if name in EXEMPT_EXPORTS:
-         continue
-     if id(getattr(zarr, name)) not in documented:
-         missing.append(name)
- return sorted(missing)
+ return sorted(
+     name
+     for name in zarr.__all__
+     if name not in EXEMPT_EXPORTS
+     and id(getattr(zarr, name)) not in documented
+ )

Comment on lines +109 to +110
if not API_DOCS_ROOT.exists():
    raise FileNotFoundError(f"{API_DOCS_ROOT} does not exist.")

Contributor

As I mentioned in another comment, take the docs root as an argument, rather than hard-coding it.

Comment on lines +117 to +131
    lines = [
        f"Found {len(missing)} public export(s) in zarr.__all__ missing from the API "
        "reference (docs/api/):\n",
    ]
    lines.extend(f" - zarr.{name}" for name in missing)
    lines.append(
        "\nAdd a `::: zarr.<name>` page under docs/api/zarr/ (and register it in "
        "mkdocs.yml and docs/api/zarr/index.md), or -- if the export is intentionally "
        "undocumented -- add it to EXEMPT_EXPORTS in this script with a reason."
    )
    raise ValueError("\n".join(lines))


if __name__ == "__main__":
    main()

Contributor

I suggest following the pattern you used in lint_docs.py, where you print to stderr and use a non-zero exit code, rather than raising.
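
A minimal sketch of that pattern, assuming a hypothetical helper named report_missing (lint_docs.py's actual reporting code is not shown in this thread):

```python
import sys


def report_missing(missing: list[str]) -> int:
    """Print one line per missing export to stderr and return a process exit code."""
    if not missing:
        return 0
    print(
        f"Found {len(missing)} public export(s) missing from the API reference:",
        file=sys.stderr,
    )
    for name in missing:
        print(f"  - zarr.{name}", file=sys.stderr)
    # A non-zero code fails the CI step without printing a traceback.
    return 1
```

The script's entry point could then call `sys.exit(report_missing(missing))`, so a failure shows only the report rather than a ValueError traceback.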

Comment thread ci/lint_docs.py
Comment on lines +116 to +131
findings: list[Finding] = []
doc_nodes = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
for node in ast.walk(tree):
    if not isinstance(node, doc_nodes):
        continue
    docstring = ast.get_docstring(node, clean=False)
    if not docstring:
        continue
    # node.body[0].value is the docstring literal; its lineno is the line the string
    # opens on, so content line i maps to source line (start + i).
    start = node.body[0].value.lineno  # type: ignore[attr-defined]
    for offset, line in enumerate(docstring.splitlines()):
        findings.extend(
            Finding(path, start + offset, category, line) for category in _scan_line(line)
        )
return findings

Contributor

Haven't tested this, but I would argue this is more readable. I encourage you to embrace comprehensions over nested for loops with embedded conditionals. While the logic is essentially the same, the comprehension syntax tends to be more readable (even more natural-language-like).

To keep the comprehensions themselves from becoming incomprehensible, I've used two comprehensions in place of the single nested for loop.

There are several other places in this script where I encourage you to do similar refactorings to aid in readability and maintainability.

Suggested change
- findings: list[Finding] = []
- doc_nodes = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
- for node in ast.walk(tree):
-     if not isinstance(node, doc_nodes):
-         continue
-     docstring = ast.get_docstring(node, clean=False)
-     if not docstring:
-         continue
-     # node.body[0].value is the docstring literal; its lineno is the line the string
-     # opens on, so content line i maps to source line (start + i).
-     start = node.body[0].value.lineno  # type: ignore[attr-defined]
-     for offset, line in enumerate(docstring.splitlines()):
-         findings.extend(
-             Finding(path, start + offset, category, line) for category in _scan_line(line)
-         )
- return findings
+ doc_nodes = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
+ docstring_start_pairs = [
+     # node.body[0].value is the docstring literal; its lineno is the line the string
+     # opens on, so content line i maps to source line (start + i).
+     (docstring, node.body[0].value.lineno)  # type: ignore[attr-defined]
+     for node in ast.walk(tree)
+     if isinstance(node, doc_nodes)
+     # The walrus must be parenthesized inside a comprehension's `if` clause.
+     if (docstring := ast.get_docstring(node, clean=False))
+ ]
+ return [
+     Finding(path, start + offset, category, line)
+     for docstring, start in docstring_start_pairs
+     for offset, line in enumerate(docstring.splitlines())
+     for category in _scan_line(line)
+ ]

Labels

needs release notes (automatically applied to PRs which haven't added release notes), tests

2 participants