refactor: bring in types from zarr-metadata#3961
Conversation
|
cc @chuckwondo |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3961 +/- ##
==========================================
- Coverage 93.55% 93.53% -0.02%
==========================================
Files 88 88
Lines 11896 11885 -11
==========================================
- Hits 11129 11117 -12
- Misses 767 768 +1
🚀 New features to boost your workflow:
|
…into use-zarr-metadata
- set `zarr-metadata` to resolve locally in local development - add a section to the docs outlining the relationship between zarr and zarr-metadata packages
…e the lower zarr-metadata bound is published
…d drop workspaces for min deps test
…ck via pre-commit
|
we should not merge this until #3972 is sorted out. that PR switches our pre-commit mypy check to run in a locked environment, which can include the package-local copy of |
…into use-zarr-metadata
…into use-zarr-metadata
zarr-metadata 0.2.0 is published on PyPI and adds partial metadata document types (ArrayMetadataV2Partial, ArrayMetadataV3Partial, GroupMetadataV2Partial, GroupMetadataV3Partial). Raise the dependency range from `>=0.1.1,<0.2` to `>=0.2.0,<0.3` so zarr-python can consume them, and update the matching min_deps pin and contributing docs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rtial The three `array_metadata` fixtures in test_consolidated.py are partial v3 array documents: they intentionally omit `shape` and `chunk_grid`, which are supplied per-array via spread into `ArrayV3Metadata.from_dict(...)`. zarr-metadata 0.2.0 ships `ArrayMetadataV3Partial` (the `total=False` form) to type exactly this kind of fragment, replacing the loose `dict[str, JSON]` / `dict[str, Any]` annotations. The partial type widens field value types to `object`, so spreading it into the `dict[str, JSON]` literal that `from_dict` consumes needs a per-spread `# type: ignore[dict-item]`. That tradeoff is deliberate: the fragment values are now precisely typed, and the suppression is localized to the spread sites rather than mistyping the actual fixture. Other survey candidates (full GroupMetadata docs in test_group.py, extension-field dicts) were left unchanged: they are either complete documents or extra-key dicts, not partial fragments. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `zarr_json` fixture in test_getitem_consolidated_empty_leaf_group is a complete v3 group metadata document carrying an inline consolidated_metadata extension field. Retype it from `dict[str, JSON]` to `GroupMetadataV3` so the standard fields (zarr_format, node_type, attributes) are structurally checked. mypy does not honor PEP 728 `extra_items=`, so the conforming extension key still needs a single `# type: ignore[typeddict-unknown-key]`, matching the existing pattern elsewhere in the metadata tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Annotate the `valid_json_v3` fixtures in the dtype tests with the matching zarr-metadata 0.2.0 types so the fixture shapes are structurally checked (and dtype-name typos in the bare-string fixtures are caught): - test_int.py: Int8/16/32/64 + Uint8/16/32/64 DataTypeName literals - test_float.py: Float16/32/64 DataTypeName literals - test_complex.py: Complex64/128 DataTypeName literals - test_bool.py: BoolDataTypeName literal - test_string.py: StringDataTypeName literal (variable-length classes) - test_time.py: NumpyDatetime64 / NumpyTimedelta64 envelope TypedDicts Left deliberately untyped (documented inline): - struct: zarr-metadata's `Struct` models `fields` as a tuple, but zarr-python's `Struct.to_json` emits a list and the round-trip test asserts equality, so typing it would break runtime behavior. - fixed_length_utf32 / null_terminated_bytes / raw_bytes / variable_length_bytes: zarr-metadata 0.2.0 exports no envelope type for these dtypes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`Struct.to_json(zarr_format=3)` built `configuration.fields` with a list comprehension, producing a Python list. Nothing requires a list: `json.dumps` serializes list and tuple identically, the `_from_json` parsers iterate the field array structurally, and no test asserts list-ness for the canonical `Struct` class. The list was incidental. Emit a tuple instead, matching zarr-metadata's `StructConfiguration.fields` type (`tuple[StructField, ...]`) and the project's principle that a JSON array is a typed fixed-length container. This lets the v3 round-trip test fixture be typed as zarr-metadata's `Struct` — the one dtype fixture that previously had to stay loosely typed because the tuple model collided with the list emission. The on-disk JSON is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Slow Hypothesis job checked out shallowly (no tags), so hatch-vcs could
not find the `zarr_metadata-v*` / `v*` tags and built the in-tree
zarr-metadata as `0.1.dev1`. After bumping zarr-python's floor to
`zarr-metadata>=0.2.0`, that stale version no longer satisfies the
constraint, producing a ResolutionImpossible at dependency-sync time:
Cannot install zarr-metadata 0.1.dev1 ... and zarr==0.1.dev1 because
these package versions have conflicting dependencies.
Add `fetch-depth: 0` so the workflow grabs all tags, matching test.yml and
the other package-building workflows. hatch-vcs then derives real versions
(zarr 3.x, zarr-metadata 0.2.x) and the floor resolves.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two low-severity findings from the branch review: - releases.yml: the `verify_pypi_dependency` failure diagnostics wrapped `>=` in backticks inside double-quoted echo strings, so bash ran `>=` as command substitution — dropping `>=` from the message and creating a stray `=` file. Switch to single quotes with the requirement passed as a separate argument. - core/metadata/v2.py: the "Re-export ... historical name" comment sat above an unrelated `parse_separator` import. Move it to the `ArrayV2MetadataDict = _ArrayMetadataV2` assignment it actually describes. The third finding (pyright via uvx possibly not resolving src imports) was verified as a non-issue: the zarr-metadata pyright CI job passes on the current branch tip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
this is ready for final review! key notes for reviewers:
|
# Conflicts: # tests/test_dtype/test_npy/test_string.py
…#176) Bumps the actions group with 8 updates in the / directory: | Package | From | To | | --- | --- | --- | | [prefix-dev/setup-pixi](https://github.com/prefix-dev/setup-pixi) | `0.9.5` | `0.9.6` | | [codecov/codecov-action](https://github.com/codecov/codecov-action) | `6.0.0` | `6.0.1` | | [github/issue-metrics](https://github.com/github/issue-metrics) | `4.2.2` | `4.2.7` | | [j178/prek-action](https://github.com/j178/prek-action) | `2.0.3` | `2.0.4` | | [actions/upload-artifact](https://github.com/actions/upload-artifact) | `7.0.0` | `7.0.1` | | [actions/download-artifact](https://github.com/actions/download-artifact) | `7.0.0` | `8.0.1` | | [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) | `1.13.0` | `1.14.0` | | [zizmorcore/zizmor-action](https://github.com/zizmorcore/zizmor-action) | `0.5.3` | `0.5.6` | Updates `prefix-dev/setup-pixi` from 0.9.5 to 0.9.6 - [Release notes](https://github.com/prefix-dev/setup-pixi/releases) - [Commits](prefix-dev/setup-pixi@1b2de7f...5185adf) Updates `codecov/codecov-action` from 6.0.0 to 6.0.1 - [Release notes](https://github.com/codecov/codecov-action/releases) - [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md) - [Commits](codecov/codecov-action@57e3a13...e79a696) Updates `github/issue-metrics` from 4.2.2 to 4.2.7 - [Release notes](https://github.com/github/issue-metrics/releases) - [Commits](github-community-projects/issue-metrics@c9e9838...1e38d5e) Updates `j178/prek-action` from 2.0.3 to 2.0.4 - [Release notes](https://github.com/j178/prek-action/releases) - [Commits](j178/prek-action@6ad8027...bdca6f1) Updates `actions/upload-artifact` from 7.0.0 to 7.0.1 - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](actions/upload-artifact@v7...043fb46) Updates `actions/download-artifact` from 7.0.0 to 8.0.1 - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](actions/download-artifact@v7...3e5f45b) Updates `pypa/gh-action-pypi-publish` from 1.13.0 to 1.14.0 - [Release notes](https://github.com/pypa/gh-action-pypi-publish/releases) - [Commits](pypa/gh-action-pypi-publish@v1.13.0...cef2210) Updates `zizmorcore/zizmor-action` from 0.5.3 to 0.5.6 - [Release notes](https://github.com/zizmorcore/zizmor-action/releases) - [Commits](zizmorcore/zizmor-action@b1d7e1f...5f14fd0) --- updated-dependencies: - dependency-name: prefix-dev/setup-pixi dependency-version: 0.9.6 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions - dependency-name: codecov/codecov-action dependency-version: 6.0.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions - dependency-name: github/issue-metrics dependency-version: 4.2.7 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions - dependency-name: j178/prek-action dependency-version: 2.0.4 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions - dependency-name: actions/upload-artifact dependency-version: 7.0.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions - dependency-name: actions/download-artifact dependency-version: 8.0.1 dependency-type: direct:production update-type: version-update:semver-major dependency-group: actions - dependency-name: pypa/gh-action-pypi-publish dependency-version: 1.14.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: actions - dependency-name: zizmorcore/zizmor-action dependency-version: 0.5.6 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
| ls | ||
| ls dist | ||
|
|
||
| # --------------------------------------------------------------------------- |
There was a problem hiding this comment.
Could we add a test similar to https://github.com/zarr-developers/zarr-python/pull/3961/changes#diff-50c86b7ed8ac2cf95bd48334961bf0530cdc77b5a56f852c5c61b89d735fd711R267-R271 that runs the tests against the maximum version allowed from pypi? It feels like that would also catch any bugs present in that version. If there is a test for the minimum zarr_metadata version, how could a user add a feature that hasn't been released yet and also have fully-passing CI?
There was a problem hiding this comment.
after this PR, our test suite will run against the in-tree version of zarr-metadata. that's always going to be the latest version
| They must have ``must_understand`` set to ``False``, and may contain | ||
| arbitrary additional JSON data. | ||
| """ | ||
| AllowedExtraField = ExtensionFieldV3 |
There was a problem hiding this comment.
Does it make sense to deprecate these sorts of aliases? A custom __getattr__ that checks if the caller is trying to import one of these?
There was a problem hiding this comment.
i think we need to patch the module object for this. lmk if you think that's high enough value in this case -- iirc these types were never exported at the top level
There was a problem hiding this comment.
i think we need to patch the module object for this.
Yes, I think so, something like https://github.com/scverse/anndata/blob/b0746ac7db220a3428ab93d3d1d92c1c1b0122c8/src/anndata/_io/__init__.py#L8-L17
… close, codec order validation - storage/_utils.py: clamp SuffixByteRequest start to max(0, ...) so suffix > len(data) returns all available bytes (B1) - storage/_logging.py: materialise key_ranges into a list before building the log hint string, preventing one-shot generator exhaustion (B2) - abc/store.py: normalise getsize_prefix argument to end with "/" before calling list_prefix, matching delete_dir/is_empty behaviour so sibling keys are not over-counted (B3) - storage/_zip.py: guard ZipStore.close() with an early return when the store was never opened, avoiding AttributeError on missing _lock (B4) - core/codec_pipeline.py: add the missing raise TypeError(msg) in the BytesBytesCodec-after-ArrayArrayCodec branch of codecs_from_list (B5) Add regression tests for all five fixes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Make each of the five bugs fixed in this PR catchable by a property/stateful
or shared-harness test (verified red-green against the pre-fix source), and
remove the per-store special-case tests whose coverage is now subsumed:
- B1 (suffix clamp): broaden the `key_ranges` strategy to emit
SuffixByteRequest / OffsetByteRequest / None with bounds that can exceed
the value length, and give the stateful `get_partial_values` rule an
independent oracle for every ByteRequest variant. The pure-function unit
test `TestNormalizeByteRangeIndex` is kept as a fast deterministic guard.
- B2 (generator exhaustion): the shared `StoreTests.test_get_partial_values`
now passes `key_ranges` as a one-shot generator and asserts one result per
request, so any store/wrapper that exhausts the iterable early is caught
across all stores. The logging-only regression test is removed as redundant.
- B3 (getsize_prefix sibling over-counting): the shared
`StoreTests.test_getsize_prefix` now includes a sibling key ("cc/0") that
must be excluded, covering every store deterministically. Add a matching
`getsize_prefix` rule to the store state machine. The memory-only
regression test is removed as redundant.
- B4 (ZipStore.close before open): replace the two example-based regression
tests with a stateful lifecycle machine asserting `close()` never raises,
regardless of open/IO history.
- B5 (missing raise in codec order validation): replace the single example
with a property test over random {AA,AB,BB} codec orderings whose expected
outcome (TypeError / ValueError / ok) is modelled independently.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… use-zarr-metadata # Conflicts: # docs/contributing.md # src/zarr/core/group.py # tests/test_codecs/test_cast_value.py # tests/test_codecs/test_scale_offset.py
The check-min-deps-floor hook used `language: system` with `entry: python ci/check_min_deps_floor.py`, which runs against whatever is on PATH. Hosts without a bare `python` (Debian/Ubuntu, bare uv-managed environments) fail the hook with `os error 2` even though the check itself passes. Switch to `language: python` so pre-commit (and prek in CI via lint.yml) provisions an interpreter; the script is stdlib-only, so no extra dependencies are needed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
replaces some of our types with exports from zarr-metadata. I expect a few related PRs, alternating between ones like this (importing types) and ones that add missing types to zarr-metadata.
TODO:
docs/user-guide/*.mdchanges/