chore: CI workflows for linting docs by maxrjones · Pull Request #4076 · zarr-developers/zarr-python

maxrjones · 2026-06-17T18:25:52Z

This PR adds a few scripts/workflows to:

Check that all top-level functions are included in the mkdocs API documentation
Check that docstrings use mkdocs-style rather than Sphinx-style admonitions and links
Check that all links resolve

TODO:

Add unit tests and/or doctests in docstrings
Add docstrings and API docs for any new/modified user-facing classes and functions
New/modified features documented in docs/user-guide/*.md
Changes documented as a new file in changes/
GitHub Actions have all passed
Test coverage is 100% (Codecov passes)

maxrjones · 2026-06-18T00:03:26Z

@d-v-b @chuckwondo this adds a fair number of config/workflow files, so a review of the direction here would be welcome but is not strictly required.

chuckwondo · 2026-06-18T10:41:43Z

+    for line in text[directive_start:].splitlines()[1:]:
+        if line.strip() == "":
+            continue
+        if not line.startswith((" ", "\t")):
+            break
+        if MEMBERS_DISABLED_RE.match(line):
+            return True
+    return False


This is perhaps more readable:

Suggested change

-    for line in text[directive_start:].splitlines()[1:]:
-        if line.strip() == "":
-            continue
-        if not line.startswith((" ", "\t")):
-            break
-        if MEMBERS_DISABLED_RE.match(line):
-            return True
-    return False
+    return any(
+        MEMBERS_DISABLED_RE.match(line)
+        for line in text[directive_start:].splitlines()[1:]
+        if line.startswith((" ", "\t")) and line.strip()
+    )

However, I cannot yet tell (I'll keep reviewing) whether text is the entirety of a Python file. If it is, slicing it from some point to the end of the file for every directive encountered seems highly inefficient, although it may not be a noticeable performance problem, or the rest of your code may show that this is not the case.

That said, I suspect the suggestion above doesn't address the potentially larger issue of how this script is architected. In the end that might be fine, since this is simply a CI script, but I'll make more significant suggestions elsewhere.
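
One caveat on the `any(...)` form suggested above: the original loop stops at the first non-indented, non-blank line, whereas a filtered `any(...)` keeps scanning to the end of the text, so it could pick up an option that belongs to a later directive. If that early exit matters, `itertools.takewhile` preserves it. A minimal, untested-against-the-PR sketch, assuming a made-up pattern in place of the real `MEMBERS_DISABLED_RE`:

```python
import re
from itertools import takewhile

# Hypothetical pattern standing in for the script's MEMBERS_DISABLED_RE.
MEMBERS_DISABLED_RE = re.compile(r"^\s+members:\s*(false|no)\s*$", re.IGNORECASE)


def members_disabled(text: str, directive_start: int) -> bool:
    """Return True if the directive's indented option block disables members."""
    following = text[directive_start:].splitlines()[1:]
    # Keep consuming while lines are blank or indented. The first non-indented,
    # non-blank line ends the option block, mirroring the loop's `break`.
    block = takewhile(
        lambda line: not line.strip() or line.startswith((" ", "\t")),
        following,
    )
    # Blank lines are skipped before matching, mirroring the loop's `continue`.
    return any(MEMBERS_DISABLED_RE.match(line) for line in block if line.strip())
```

This still slices and splits the text once per directive, so it only addresses readability, not the efficiency concern.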

chuckwondo · 2026-06-18T11:03:33Z

+REPO_ROOT = Path(__file__).parent.parent.resolve()
+API_DOCS_ROOT = REPO_ROOT / "docs" / "api"


I suggest you make this a CLI argument instead of hard-coding it here. Then modify the GitHub workflow to call this script with "docs/api" as the argument, since it will be invoked from the root of the repo.
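
Roughly what I have in mind, as a sketch using `argparse` from the standard library. The argument name `api_docs_root` and the `parse_args` helper are placeholders of mine, not names from the PR:

```python
from __future__ import annotations

import argparse
from pathlib import Path


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse the command line; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Check that public exports appear in the API reference."
    )
    parser.add_argument(
        "api_docs_root",
        type=Path,
        help="directory holding the API reference markdown, e.g. docs/api",
    )
    return parser.parse_args(argv)
```

The workflow step would then pass `docs/api` as the only positional argument, and the script no longer needs to derive the repo root from `__file__`.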

chuckwondo · 2026-06-18T11:09:54Z

+    for md_file in sorted(API_DOCS_ROOT.rglob("*.md")):
+        text = md_file.read_text(encoding="utf-8")
+        for match in DIRECTIVE_RE.finditer(text):
+            obj = resolve(match.group("target"))
+            if obj is None:
+                continue
+            documented.add(id(obj))
+            if isinstance(obj, ModuleType) and not members_disabled(text, match.start()):
+                member_names = getattr(obj, "__all__", None) or [
+                    name for name in dir(obj) if not name.startswith("_")
+                ]
+                for name in member_names:
+                    member = getattr(obj, name, None)
+                    if member is not None:
+                        documented.add(id(member))
+    return documented


This is nearly indecipherable, so it may be annoying to maintain.

I suggest refactoring this to take a pipeline approach, which I believe will not only make it more readable, but also allow for greater efficiency/speed (related to my comment on the members_disabled function), which seems to be a motivating factor based on your comments elsewhere.

A pipeline approach should allow you to avoid repeatedly splitting each file's text into individual lines every time a directive is encountered.
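
To make the pipeline idea a bit more concrete, here is a rough sketch of a first stage that splits each file's text exactly once and yields every directive together with its members flag. The two patterns, the `Directive` tuple, and the assumption that a directive starts at the beginning of a line are all mine, not taken from the PR:

```python
from __future__ import annotations

import re
from collections.abc import Iterator
from typing import NamedTuple

# Hypothetical patterns; the script's DIRECTIVE_RE and MEMBERS_DISABLED_RE may differ.
DIRECTIVE_RE = re.compile(r"^::: (?P<target>[\w.]+)\s*$")
MEMBERS_DISABLED_RE = re.compile(r"^\s+members:\s*(false|no)\s*$", re.IGNORECASE)


class Directive(NamedTuple):
    target: str
    members_disabled: bool


def iter_directives(text: str) -> Iterator[Directive]:
    """Yield each directive with its members flag, splitting the text only once."""
    target: str | None = None
    disabled = False
    for line in text.splitlines():
        if match := DIRECTIVE_RE.match(line):
            if target is not None:
                yield Directive(target, disabled)
            target, disabled = match.group("target"), False
        elif target is not None and line.strip():
            if line.startswith((" ", "\t")):
                # Still inside the current directive's indented option block.
                disabled = disabled or bool(MEMBERS_DISABLED_RE.match(line))
            else:
                # A non-indented line closes the current directive's option block.
                yield Directive(target, disabled)
                target = None
    if target is not None:
        yield Directive(target, disabled)
```

Later stages (resolving each target, expanding module members, collecting ids) could then each be a small function or comprehension over this iterator, with no need to re-slice the text per directive.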

chuckwondo · 2026-06-18T11:14:24Z

+    missing = []
+    for name in zarr.__all__:
+        if name in EXEMPT_EXPORTS:
+            continue
+        if id(getattr(zarr, name)) not in documented:
+            missing.append(name)
+    return sorted(missing)


Use the ~~force~~comprehension, ~~Luke~~Max!

Suggested change

-    missing = []
-    for name in zarr.__all__:
-        if name in EXEMPT_EXPORTS:
-            continue
-        if id(getattr(zarr, name)) not in documented:
-            missing.append(name)
-    return sorted(missing)
+    return sorted(
+        name
+        for name in zarr.__all__
+        if name not in EXEMPT_EXPORTS
+        and id(getattr(zarr, name)) not in documented
+    )

chuckwondo · 2026-06-18T11:15:26Z

+    if not API_DOCS_ROOT.exists():
+        raise FileNotFoundError(f"{API_DOCS_ROOT} does not exist.")


As I mentioned in another comment, take the docs root as an argument, rather than hard-coding it.

chuckwondo · 2026-06-18T11:17:19Z

+    lines = [
+        f"Found {len(missing)} public export(s) in zarr.__all__ missing from the API "
+        "reference (docs/api/):\n",
+    ]
+    lines.extend(f"  - zarr.{name}" for name in missing)
+    lines.append(
+        "\nAdd a `::: zarr.<name>` page under docs/api/zarr/ (and register it in "
+        "mkdocs.yml and docs/api/zarr/index.md), or -- if the export is intentionally "
+        "undocumented -- add it to EXEMPT_EXPORTS in this script with a reason."
+    )
+    raise ValueError("\n".join(lines))
+
+
+if __name__ == "__main__":
+    main()


I suggest following the pattern you used in lint_docs.py, where you print to stderr and use a non-zero exit code, rather than raising.
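
Something along these lines, as a sketch only. I have not mirrored the exact structure of lint_docs.py, and the `report` helper name is a placeholder of mine:

```python
import sys


def report(missing: list[str]) -> int:
    """Print any missing exports to stderr and return a process exit code."""
    if not missing:
        return 0
    print(
        f"Found {len(missing)} public export(s) in zarr.__all__ missing from the "
        "API reference (docs/api/):",
        file=sys.stderr,
    )
    for name in missing:
        print(f"  - zarr.{name}", file=sys.stderr)
    return 1
```

`main()` would end with `return report(missing)`, and the `__main__` guard would call `sys.exit(main())`, so CI sees a non-zero status without a traceback in the log.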

chuckwondo · 2026-06-18T11:45:06Z

+    findings: list[Finding] = []
+    doc_nodes = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
+    for node in ast.walk(tree):
+        if not isinstance(node, doc_nodes):
+            continue
+        docstring = ast.get_docstring(node, clean=False)
+        if not docstring:
+            continue
+        # node.body[0].value is the docstring literal; its lineno is the line the string
+        # opens on, so content line i maps to source line (start + i).
+        start = node.body[0].value.lineno  # type: ignore[attr-defined]
+        for offset, line in enumerate(docstring.splitlines()):
+            findings.extend(
+                Finding(path, start + offset, category, line) for category in _scan_line(line)
+            )
+    return findings


Haven't tested this, but I would argue this is more readable. I encourage you to embrace comprehensions over nested for loops with embedded conditionals. While the logic is essentially the same, the comprehension syntax tends to be more readable (even more natural-language-like).

To keep the comprehensions themselves from becoming incomprehensible, I've split the work into two comprehensions in place of the single set of nested for loops.

There are several other places in this script where I encourage you to do similar refactorings to aid in readability and maintainability.

Suggested change

-    findings: list[Finding] = []
-    doc_nodes = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
-    for node in ast.walk(tree):
-        if not isinstance(node, doc_nodes):
-            continue
-        docstring = ast.get_docstring(node, clean=False)
-        if not docstring:
-            continue
-        # node.body[0].value is the docstring literal; its lineno is the line the string
-        # opens on, so content line i maps to source line (start + i).
-        start = node.body[0].value.lineno  # type: ignore[attr-defined]
-        for offset, line in enumerate(docstring.splitlines()):
-            findings.extend(
-                Finding(path, start + offset, category, line) for category in _scan_line(line)
-            )
-    return findings
+    doc_nodes = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
+    docstring_start_pairs = [
+        # node.body[0].value is the docstring literal; its lineno is the line the string
+        # opens on, so content line i maps to source line (start + i).
+        (docstring, node.body[0].value.lineno)  # type: ignore[attr-defined]
+        for node in ast.walk(tree)
+        if isinstance(node, doc_nodes)
+        if (docstring := ast.get_docstring(node, clean=False))
+    ]
+    return [
+        Finding(path, start + offset, category, line)
+        for docstring, start in docstring_start_pairs
+        for offset, line in enumerate(docstring.splitlines())
+        for category in _scan_line(line)
+    ]
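
Since I haven't run the suggestion, here is a self-contained sketch of the same two-comprehension shape that can be executed on its own. `Finding` and `_scan_line` below are simplified stand-ins of mine, not the script's real definitions. Two details matter for it to parse: the pair is a plain tuple (no `start =` inside it), and the walrus needs parentheses when used as a comprehension condition.

```python
from __future__ import annotations

import ast
from typing import NamedTuple


class Finding(NamedTuple):
    # Hypothetical stand-in for the script's Finding type.
    path: str
    lineno: int
    category: str
    line: str


def _scan_line(line: str) -> list[str]:
    # Hypothetical stand-in: flag Sphinx-style directives only.
    return ["rst-admonition"] if line.lstrip().startswith(".. ") else []


DOC_NODES = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def scan_docstrings(path: str, tree: ast.AST) -> list[Finding]:
    """Collect findings for every docstring line in the parsed module."""
    docstring_start_pairs = [
        # The docstring literal's lineno is the line the string opens on.
        (docstring, node.body[0].value.lineno)
        for node in ast.walk(tree)
        if isinstance(node, DOC_NODES)
        if (docstring := ast.get_docstring(node, clean=False))
    ]
    return [
        Finding(path, start + offset, category, line)
        for docstring, start in docstring_start_pairs
        for offset, line in enumerate(docstring.splitlines())
        for category in _scan_line(line)
    ]
```

Passing `clean=False` keeps the docstring's lines aligned with the source, which is what makes `start + offset` a valid source line number.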

chore: CI workflows for missing docs and RST docstrings

52decc2

maxrjones added the tests label Jun 17, 2026

github-actions bot added the needs release notes label Jun 17, 2026

maxrjones changed the title ~~chore: CI workflows for missing docs and RST docstrings~~ chore: CI workflows for linting docs Jun 17, 2026

maxrjones added 3 commits June 17, 2026 19:40

Add linter for docs

b4837a4

Add link checking workflow

e837bb2

fix links

4e2a02e

chuckwondo suggested changes Jun 18, 2026


chore: CI workflows for linting docs #4076
maxrjones wants to merge 4 commits into zarr-developers:main from maxrjones:chore/test-docs-formatting
