Skip to content

refactor: bring in types from zarr-metadata#3961

Open
d-v-b wants to merge 56 commits into
zarr-developers:mainfrom
d-v-b:use-zarr-metadata
Open

refactor: bring in types from zarr-metadata#3961
d-v-b wants to merge 56 commits into
zarr-developers:mainfrom
d-v-b:use-zarr-metadata

Conversation

@d-v-b

@d-v-b d-v-b commented May 10, 2026

Copy link
Copy Markdown
Contributor

replaces some of our types with exports from zarr-metadata. I expect a few related PRs, alternating between ones like this (importing types) and ones that add missing types to zarr-metadata.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions Bot added the needs release notes Automatically applied to PRs which haven't added release notes label May 10, 2026
@d-v-b d-v-b requested a review from ilan-gold May 10, 2026 08:38
@d-v-b

d-v-b commented May 10, 2026

Copy link
Copy Markdown
Contributor Author

cc @chuckwondo

@codecov

codecov Bot commented May 10, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.53%. Comparing base (ff6da8f) to head (467cf27).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3961      +/-   ##
==========================================
- Coverage   93.55%   93.53%   -0.02%     
==========================================
  Files          88       88              
  Lines       11896    11885      -11     
==========================================
- Hits        11129    11117      -12     
- Misses        767      768       +1     
Files with missing lines Coverage Δ
src/zarr/codecs/blosc.py 95.74% <100.00%> (-0.03%) ⬇️
src/zarr/codecs/cast_value.py 98.58% <100.00%> (-0.02%) ⬇️
src/zarr/core/dtype/npy/structured.py 92.61% <100.00%> (ø)
src/zarr/core/group.py 95.21% <100.00%> (+0.01%) ⬆️
src/zarr/core/metadata/v2.py 89.32% <100.00%> (-0.06%) ⬇️
src/zarr/core/metadata/v3.py 93.80% <100.00%> (-0.17%) ⬇️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread pyproject.toml Outdated
@d-v-b d-v-b mentioned this pull request May 11, 2026
d-v-b added 5 commits May 11, 2026 16:39
- set `zarr-metadata` to resolve locally in local development
- add a section to the docs outlining the relationship between zarr and zarr-metadata packages
…e the lower zarr-metadata bound is published
@github-actions github-actions Bot removed the needs release notes Automatically applied to PRs which haven't added release notes label May 13, 2026
@d-v-b

d-v-b commented May 15, 2026

Copy link
Copy Markdown
Contributor Author

we should not merge this until #3972 is sorted out. that PR switches our pre-commit mypy check to run in a locked environment, which can include the package-local copy of zarr-metadata. Without that PR, mypy in pre-commit would run against the version of zarr-metadata on pypi, which would generate false alarms when we pushed pre-release changes to zarr-metadata + zarr-python.

Comment thread packages/zarr-metadata/src/zarr_metadata/v3/codec/bytes.py
d-v-b and others added 10 commits May 19, 2026 08:57
zarr-metadata 0.2.0 is published on PyPI and adds partial metadata
document types (ArrayMetadataV2Partial, ArrayMetadataV3Partial,
GroupMetadataV2Partial, GroupMetadataV3Partial). Raise the dependency
range from `>=0.1.1,<0.2` to `>=0.2.0,<0.3` so zarr-python can consume
them, and update the matching min_deps pin and contributing docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rtial

The three `array_metadata` fixtures in test_consolidated.py are partial v3
array documents: they intentionally omit `shape` and `chunk_grid`, which are
supplied per-array via spread into `ArrayV3Metadata.from_dict(...)`.
zarr-metadata 0.2.0 ships `ArrayMetadataV3Partial` (the `total=False` form)
to type exactly this kind of fragment, replacing the loose `dict[str, JSON]`
/ `dict[str, Any]` annotations.

The partial type widens field value types to `object`, so spreading it into
the `dict[str, JSON]` literal that `from_dict` consumes needs a per-spread
`# type: ignore[dict-item]`. That tradeoff is deliberate: the fragment values
are now precisely typed, and the suppression is localized to the spread sites
rather than mistyping the actual fixture.

Other survey candidates (full GroupMetadata docs in test_group.py,
extension-field dicts) were left unchanged: they are either complete
documents or extra-key dicts, not partial fragments.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `zarr_json` fixture in test_getitem_consolidated_empty_leaf_group is a
complete v3 group metadata document carrying an inline consolidated_metadata
extension field. Retype it from `dict[str, JSON]` to `GroupMetadataV3` so the
standard fields (zarr_format, node_type, attributes) are structurally checked.

mypy does not honor PEP 728 `extra_items=`, so the conforming extension key
still needs a single `# type: ignore[typeddict-unknown-key]`, matching the
existing pattern elsewhere in the metadata tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Annotate the `valid_json_v3` fixtures in the dtype tests with the matching
zarr-metadata 0.2.0 types so the fixture shapes are structurally checked
(and dtype-name typos in the bare-string fixtures are caught):

- test_int.py: Int8/16/32/64 + Uint8/16/32/64 DataTypeName literals
- test_float.py: Float16/32/64 DataTypeName literals
- test_complex.py: Complex64/128 DataTypeName literals
- test_bool.py: BoolDataTypeName literal
- test_string.py: StringDataTypeName literal (variable-length classes)
- test_time.py: NumpyDatetime64 / NumpyTimedelta64 envelope TypedDicts

Left deliberately untyped (documented inline):
- struct: zarr-metadata's `Struct` models `fields` as a tuple, but
  zarr-python's `Struct.to_json` emits a list and the round-trip test asserts
  equality, so typing it would break runtime behavior.
- fixed_length_utf32 / null_terminated_bytes / raw_bytes / variable_length_bytes:
  zarr-metadata 0.2.0 exports no envelope type for these dtypes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`Struct.to_json(zarr_format=3)` built `configuration.fields` with a list
comprehension, producing a Python list. Nothing requires a list: `json.dumps`
serializes list and tuple identically, the `_from_json` parsers iterate the
field array structurally, and no test asserts list-ness for the canonical
`Struct` class. The list was incidental.

Emit a tuple instead, matching zarr-metadata's `StructConfiguration.fields`
type (`tuple[StructField, ...]`) and the project's principle that a JSON array
is a typed fixed-length container. This lets the v3 round-trip test fixture be
typed as zarr-metadata's `Struct` — the one dtype fixture that previously had
to stay loosely typed because the tuple model collided with the list emission.

The on-disk JSON is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Slow Hypothesis job checked out shallowly (no tags), so hatch-vcs could
not find the `zarr_metadata-v*` / `v*` tags and built the in-tree
zarr-metadata as `0.1.dev1`. After bumping zarr-python's floor to
`zarr-metadata>=0.2.0`, that stale version no longer satisfies the
constraint, producing a ResolutionImpossible at dependency-sync time:

    Cannot install zarr-metadata 0.1.dev1 ... and zarr==0.1.dev1 because
    these package versions have conflicting dependencies.

Add `fetch-depth: 0` so the workflow grabs all tags, matching test.yml and
the other package-building workflows. hatch-vcs then derives real versions
(zarr 3.x, zarr-metadata 0.2.x) and the floor resolves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two low-severity findings from the branch review:

- releases.yml: the `verify_pypi_dependency` failure diagnostics wrapped `>=`
  in backticks inside double-quoted echo strings, so bash ran `>=` as command
  substitution — dropping `>=` from the message and creating a stray `=` file.
  Switch to single quotes with the requirement passed as a separate argument.

- core/metadata/v2.py: the "Re-export ... historical name" comment sat above
  an unrelated `parse_separator` import. Move it to the
  `ArrayV2MetadataDict = _ArrayMetadataV2` assignment it actually describes.

The third finding (pyright via uvx possibly not resolving src imports) was
verified as a non-issue: the zarr-metadata pyright CI job passes on the
current branch tip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@d-v-b

d-v-b commented May 22, 2026

Copy link
Copy Markdown
Contributor Author

this is ready for final review!

key notes for reviewers:

  • nothing should change at runtime
  • some tests have changed because they used invalid metadata
  • the struct dtype used lists instead of tuples for its JSON metadata as declared in zarr-python. That type is changed to use tuples, following our broader metadata convention. I do not think this will affect any consumers.
  • There's a fair number of infrastructure / build changes, which try to ensure that we don't ship a version of zarr that depends on unreleased features in zarr-metadata.

d-v-b and others added 3 commits May 28, 2026 21:02
# Conflicts:
#	tests/test_dtype/test_npy/test_string.py
…#176)

Bumps the actions group with 8 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [prefix-dev/setup-pixi](https://github.com/prefix-dev/setup-pixi) | `0.9.5` | `0.9.6` |
| [codecov/codecov-action](https://github.com/codecov/codecov-action) | `6.0.0` | `6.0.1` |
| [github/issue-metrics](https://github.com/github/issue-metrics) | `4.2.2` | `4.2.7` |
| [j178/prek-action](https://github.com/j178/prek-action) | `2.0.3` | `2.0.4` |
| [actions/upload-artifact](https://github.com/actions/upload-artifact) | `7.0.0` | `7.0.1` |
| [actions/download-artifact](https://github.com/actions/download-artifact) | `7.0.0` | `8.0.1` |
| [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) | `1.13.0` | `1.14.0` |
| [zizmorcore/zizmor-action](https://github.com/zizmorcore/zizmor-action) | `0.5.3` | `0.5.6` |



Updates `prefix-dev/setup-pixi` from 0.9.5 to 0.9.6
- [Release notes](https://github.com/prefix-dev/setup-pixi/releases)
- [Commits](prefix-dev/setup-pixi@1b2de7f...5185adf)

Updates `codecov/codecov-action` from 6.0.0 to 6.0.1
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@57e3a13...e79a696)

Updates `github/issue-metrics` from 4.2.2 to 4.2.7
- [Release notes](https://github.com/github/issue-metrics/releases)
- [Commits](github-community-projects/issue-metrics@c9e9838...1e38d5e)

Updates `j178/prek-action` from 2.0.3 to 2.0.4
- [Release notes](https://github.com/j178/prek-action/releases)
- [Commits](j178/prek-action@6ad8027...bdca6f1)

Updates `actions/upload-artifact` from 7.0.0 to 7.0.1
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v7...043fb46)

Updates `actions/download-artifact` from 7.0.0 to 8.0.1
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](actions/download-artifact@v7...3e5f45b)

Updates `pypa/gh-action-pypi-publish` from 1.13.0 to 1.14.0
- [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases)
- [Commits](pypa/gh-action-pypi-publish@v1.13.0...cef2210)

Updates `zizmorcore/zizmor-action` from 0.5.3 to 0.5.6
- [Release notes](https://github.com/zizmorcore/zizmor-action/releases)
- [Commits](zizmorcore/zizmor-action@b1d7e1f...5f14fd0)

---
updated-dependencies:
- dependency-name: prefix-dev/setup-pixi
  dependency-version: 0.9.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: codecov/codecov-action
  dependency-version: 6.0.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: github/issue-metrics
  dependency-version: 4.2.7
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: j178/prek-action
  dependency-version: 2.0.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: actions/upload-artifact
  dependency-version: 7.0.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
- dependency-name: actions/download-artifact
  dependency-version: 8.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions
- dependency-name: pypa/gh-action-pypi-publish
  dependency-version: 1.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions
- dependency-name: zizmorcore/zizmor-action
  dependency-version: 0.5.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@d-v-b d-v-b self-assigned this Jun 3, 2026
ls
ls dist

# ---------------------------------------------------------------------------

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a test similar to https://github.com/zarr-developers/zarr-python/pull/3961/changes#diff-50c86b7ed8ac2cf95bd48334961bf0530cdc77b5a56f852c5c61b89d735fd711R267-R271 that runs the tests against the maximum version allowed from pypi? It feels like that would also catch any bugs present in that version. If there is a test for the minimum zarr_metadata version, how could a user add a feature that hasn't been released yet and also have fully-passing CI?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after this PR, our test suite will run against the in-tree version of zarr-metadata. that's always going to be the latest version

They must have ``must_understand`` set to ``False``, and may contain
arbitrary additional JSON data.
"""
AllowedExtraField = ExtensionFieldV3

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to deprecate these sorts of aliases? A custom __getattr__ that checks if the caller is trying to import one of these?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we need to patch the module object for this. lmk if you think that's high enough value in this case -- iirc these types were never exported at the top level

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we need to patch the module object for this.

Yes, I think so, something like https://github.com/scverse/anndata/blob/b0746ac7db220a3428ab93d3d1d92c1c1b0122c8/src/anndata/_io/__init__.py#L8-L17

d-v-b and others added 13 commits June 3, 2026 16:19
… close, codec order validation

- storage/_utils.py: clamp SuffixByteRequest start to max(0, ...) so
  suffix > len(data) returns all available bytes (B1)
- storage/_logging.py: materialise key_ranges into a list before building
  the log hint string, preventing one-shot generator exhaustion (B2)
- abc/store.py: normalise getsize_prefix argument to end with "/" before
  calling list_prefix, matching delete_dir/is_empty behaviour so sibling
  keys are not over-counted (B3)
- storage/_zip.py: guard ZipStore.close() with an early return when the
  store was never opened, avoiding AttributeError on missing _lock (B4)
- core/codec_pipeline.py: add the missing raise TypeError(msg) in the
  BytesBytesCodec-after-ArrayArrayCodec branch of codecs_from_list (B5)

Add regression tests for all five fixes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Make each of the five bugs fixed in this PR catchable by a property/stateful
or shared-harness test (verified red-green against the pre-fix source), and
remove the per-store special-case tests whose coverage is now subsumed:

- B1 (suffix clamp): broaden the `key_ranges` strategy to emit
  SuffixByteRequest / OffsetByteRequest / None with bounds that can exceed
  the value length, and give the stateful `get_partial_values` rule an
  independent oracle for every ByteRequest variant. The pure-function unit
  test `TestNormalizeByteRangeIndex` is kept as a fast deterministic guard.
- B2 (generator exhaustion): the shared `StoreTests.test_get_partial_values`
  now passes `key_ranges` as a one-shot generator and asserts one result per
  request, so any store/wrapper that exhausts the iterable early is caught
  across all stores. The logging-only regression test is removed as redundant.
- B3 (getsize_prefix sibling over-counting): the shared
  `StoreTests.test_getsize_prefix` now includes a sibling key ("cc/0") that
  must be excluded, covering every store deterministically. Add a matching
  `getsize_prefix` rule to the store state machine. The memory-only
  regression test is removed as redundant.
- B4 (ZipStore.close before open): replace the two example-based regression
  tests with a stateful lifecycle machine asserting `close()` never raises,
  regardless of open/IO history.
- B5 (missing raise in codec order validation): replace the single example
  with a property test over random {AA,AB,BB} codec orderings whose expected
  outcome (TypeError / ValueError / ok) is modelled independently.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… use-zarr-metadata

# Conflicts:
#	docs/contributing.md
#	src/zarr/core/group.py
#	tests/test_codecs/test_cast_value.py
#	tests/test_codecs/test_scale_offset.py
The check-min-deps-floor hook used `language: system` with `entry:
python ci/check_min_deps_floor.py`, which runs against whatever is on
PATH. Hosts without a bare `python` (Debian/Ubuntu, bare uv-managed
environments) fail the hook with `os error 2` even though the check
itself passes. Switch to `language: python` so pre-commit (and prek in
CI via lint.yml) provisions an interpreter; the script is stdlib-only,
so no extra dependencies are needed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants