perf: default integer arrays to int32 for ~25% memory reduction #566
FBumann wants to merge 11 commits into
- linopy/constants.py: Added DEFAULT_LABEL_DTYPE = np.int32
- linopy/model.py: Variable and constraint label assignment now uses np.arange(..., dtype=DEFAULT_LABEL_DTYPE) with overflow guards that raise ValueError if labels exceed int32 max.
- linopy/expressions.py: _term coord assignment and all .astype(int) for vars arrays now use DEFAULT_LABEL_DTYPE (int32).
- linopy/common.py: fill_missing_coords uses np.arange(..., dtype=DEFAULT_LABEL_DTYPE). Polars schema inference now checks array.dtype.itemsize instead of the old OS/numpy-version hack.
- test/test_constraints.py: Updated 2 dtype assertions to use np.issubdtype instead of == int.
- test/test_dtypes.py (new): 7 tests covering int32 labels, expression vars, solve correctness, and overflow guards.
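The overflow guard described in the model.py item can be sketched roughly as follows. This is a minimal illustration and not linopy's actual code: the helper name `allocate_labels` and its signature are hypothetical, and only NumPy calls are used.

```python
import numpy as np

DEFAULT_LABEL_DTYPE = np.int32  # mirrors the constant named in the commit message


def allocate_labels(start: int, size: int, dtype=DEFAULT_LABEL_DTYPE) -> np.ndarray:
    """Return `size` consecutive labels beginning at `start` (hypothetical helper)."""
    end = start + size
    # Check the range with Python ints before allocating, so an oversized
    # request fails fast instead of overflowing the narrow dtype.
    if end - 1 > np.iinfo(dtype).max:
        raise ValueError(
            f"Labels up to {end - 1} exceed the maximum of "
            f"{np.dtype(dtype).name} ({np.iinfo(dtype).max})."
        )
    return np.arange(start, end, dtype=dtype)


labels = allocate_labels(0, 5)
print(labels.dtype)  # int32
```

Doing the check on plain Python integers avoids any dependence on how NumPy itself reacts to out-of-range values.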
…k to int64 via astype(int), now use DEFAULT_LABEL_DTYPE. Also Variables.to_dataframe arange for map_labels.
- linopy/constraints.py: Constraints.to_dataframe arange for map_labels.
- linopy/common.py: save_join outer-join fallback was casting to int64.
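To illustrate why the astype(int) calls mattered: Python's int maps to NumPy's default integer, which is 64-bit on most 64-bit platforms (on Windows with NumPy older than 2.0 it is 32-bit), so the cast can silently widen an int32 array. A small sketch using only NumPy; the array contents and the `label_dtype` variable are made up for illustration.

```python
import numpy as np

labels = np.array([0, 1, 2, 3], dtype=np.int32)  # illustrative values

# astype(int) casts to NumPy's default integer, which is platform dependent
# and usually 64-bit, so the int32 saving can be lost after this call.
widened = labels.astype(int)

# Casting to the configured dtype keeps the narrow width.
label_dtype = np.int32  # stand-in for the configured label dtype
kept = labels.astype(label_dtype)

print(labels.nbytes, kept.nbytes)      # 16 16
print(widened.dtype == np.dtype(int))  # True
```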
…ords. Here's what changed:
- test_linear_expression_sum / test_linear_expression_sum_with_const: v.loc[:9].add(v.loc[10:], join="override") → v.loc[:9] + v.loc[10:].assign_coords(dim_2=v.loc[:9].coords["dim_2"])
- test_add_join_override → test_add_positional_assign_coords: uses v + disjoint.assign_coords(...)
- test_add_constant_join_override → test_add_constant_positional: now uses different coords [5,6,7] + assign_coords to make the test meaningful
- test_same_shape_add_join_override → test_same_shape_add_assign_coords: uses + c.to_linexpr().assign_coords(...)
- test_add_constant_override_positional → test_add_constant_positional_different_coords: expr + other.assign_coords(...)
- test_sub_constant_override → test_sub_constant_positional: expr - other.assign_coords(...)
- test_mul_constant_override_positional → test_mul_constant_positional: expr * other.assign_coords(...)
- test_div_constant_override_positional → test_div_constant_positional: expr / other.assign_coords(...)
- test_variable_mul_override → test_variable_mul_positional: a * other.assign_coords(...)
- test_variable_div_override → test_variable_div_positional: a / other.assign_coords(...)
- test_add_same_coords_all_joins: removed "override" from loop, added assign_coords variant
- test_add_scalar_with_explicit_join → test_add_scalar: simplified to expr + 10
FBumann
left a comment
Review
The guards and logical checks all look correct (see discussion in prior review). Two things to add before merge:
1. Release notes
Please add to doc/release_notes.rst under "Upcoming Version", something like:
* Default internal integer arrays (labels, variable indices, ``_term`` coordinates) to ``int32`` instead of ``int64``, reducing memory usage by ~25% and improving model build speed by 10-35%. The dtype is controlled by ``linopy.constants.DEFAULT_LABEL_DTYPE`` and can be changed back to ``np.int64`` before model construction if needed. An overflow guard raises ``ValueError`` if labels exceed the int32 maximum (~2.1 billion).
2. Document how to override DEFAULT_LABEL_DTYPE
Since every module imports DEFAULT_LABEL_DTYPE by name at import time, simply assigning linopy.constants.DEFAULT_LABEL_DTYPE = np.int64 after import won't propagate. The constant should either:
(a) Be documented as a compile-time constant (users edit constants.py or monkey-patch before importing linopy), or
(b) Be read indirectly so runtime changes work. For example, change all usages to read from the module rather than a local binding:
```python
# In constants.py: no change needed
DEFAULT_LABEL_DTYPE = np.int32

# In model.py, expressions.py, etc., instead of:
from linopy.constants import DEFAULT_LABEL_DTYPE

# use:
from linopy import constants
# ... then reference constants.DEFAULT_LABEL_DTYPE everywhere
```
This way linopy.constants.DEFAULT_LABEL_DTYPE = np.int64 at runtime would work. Option (b) is a small change and much more user-friendly. Up to you whether this is worth doing now or in a follow-up.
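The difference between the two import styles can be reproduced without linopy. The sketch below registers a stand-in module (the name `fake_constants` is invented for this example) and shows that a name bound with `from ... import` keeps its old value after the module attribute is reassigned, while attribute access through the module sees the change.

```python
import sys
import types

# Build a stand-in "constants" module so the example is self-contained.
constants = types.ModuleType("fake_constants")
constants.DEFAULT_LABEL_DTYPE = "int32"
sys.modules["fake_constants"] = constants

# Pattern (a): bind the value by name at import time.
from fake_constants import DEFAULT_LABEL_DTYPE

# Pattern (b): keep a reference to the module and look the attribute up lazily.
import fake_constants

# A user changes the setting at runtime.
fake_constants.DEFAULT_LABEL_DTYPE = "int64"

print(DEFAULT_LABEL_DTYPE)                 # int32 (stale local binding)
print(fake_constants.DEFAULT_LABEL_DTYPE)  # int64 (sees the update)
```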
Minor items from prior review (still applicable)
- Add -> None return type annotations to all test functions in test_dtypes.py (CI blocker)
- Guard test_solve_with_int32_labels with pytest.importorskip("highspy")
- Remove trailing spaces in the overflow error message strings in model.py
Note on scipy compatibility: scipy sparse matrices ( |
- Move DEFAULT_LABEL_DTYPE from constants.py into options["label_dtype"]
- Widen OptionSettings types from int to Any
- Add validation: label_dtype only accepts np.int32 or np.int64
- Fix matrices.py empty clabels fallback to use configured dtype
- Fix f-string quoting and trailing spaces in overflow error messages
- Add -> None annotations and importorskip guard in test_dtypes.py
- Add tests for int64 override and invalid dtype rejection
- Add release notes entry
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
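The validation step in this commit could look roughly like the sketch below. The `Options` class here is a hypothetical stand-in written for illustration; it is not linopy's OptionSettings, and the error messages are invented.

```python
import numpy as np

_ALLOWED_LABEL_DTYPES = (np.int32, np.int64)


class Options:
    """Tiny dict-like settings holder (hypothetical, not linopy's OptionSettings)."""

    def __init__(self) -> None:
        self._values = {"label_dtype": np.int32}

    def __getitem__(self, key: str):
        return self._values[key]

    def __setitem__(self, key: str, value) -> None:
        if key not in self._values:
            raise KeyError(f"{key} is not a valid setting.")
        # Reject anything other than the two supported integer widths.
        if key == "label_dtype" and value not in _ALLOWED_LABEL_DTYPES:
            raise ValueError("label_dtype must be np.int32 or np.int64.")
        self._values[key] = value


options = Options()
options["label_dtype"] = np.int64
print(np.arange(3, dtype=options["label_dtype"]).dtype)  # int64
```

Because consumers read `options["label_dtype"]` at call time, a runtime change takes effect for every array allocated afterwards.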
Dimension coordinates (fill_missing_coords, _term coord) are small index arrays, not the large label/vars arrays that benefit from int32. xarray's index creation is slower with int32 than the default int64, causing a 13-38% build regression. Revert these to default int while keeping int32 for labels and vars where the memory savings matter. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benchmark Results:
@FBumann just scanning oldish PRs. Do you think we should update this, or rather drop it?
I would reconsider after we ship the new solver stuff.
Should we push this again?
Yes, I'll do it.
Merging this PR will improve performance by 33.16%
Performance Changes
# Conflicts:
#   doc/release_notes.rst
#   linopy/common.py
#   linopy/config.py
#   linopy/matrices.py
#   linopy/model.py
#   linopy/variables.py
#   test/test_constraints.py
CSRConstraint._to_dataset hardcoded np.int64 for the reconstructed vars and labels arrays, so a frozen constraint round-tripped through .data/.flat came back as int64 regardless of options["label_dtype"], silently undoing the int32 memory win. Use options["label_dtype"] to match the allocation paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
I wanted to ensure we don't regress on speed, so I ran benchmarks locally too. No speed regression.
Note
Independent local benchmark, generated by AI (
Build phase, each spec at its largest size: peak memory is memray-measured (deterministic); time is the pytest-benchmark mean of one local sweep.
Peak: improved in 35/35 cases across the full sweep, median −10%. The
Time: improved in 29/35, median +1.8%. A single local sweep is too noisy for per-case time claims (the few non-improvements are all sub-10 ms cases in the ±noise band), but the direction matches the PR's 30-iteration benchmark. Treat the memory column as the hard result and the time column as corroborating.
Does the benign (severity=0) regime regress at scale?
The two worst-looking time numbers in the full sweep are
So the small-size negatives are measurement noise; at scale the benign regime behaves like the model specs: memory −9–10%, time faster.
Full 35-case sweep (every spec at every swept size)
Method
This is a pretty much free memory improvement. It only hurts models with more than 2.1 billion (2.1e9) variables; there linopy now raises a ValueError. We could decide to handle this more gracefully if you think any user hits this number...
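One possible shape for the more graceful handling mentioned above would be to pick the narrowest dtype that fits instead of raising. This is only a sketch of an idea, not something the PR implements; `pick_label_dtype` is a hypothetical helper.

```python
import numpy as np


def pick_label_dtype(n_labels: int, preferred=np.int32):
    """Return `preferred` if labels 0..n_labels-1 fit in it, else np.int64."""
    if n_labels - 1 <= np.iinfo(preferred).max:
        return preferred
    return np.int64


print(np.dtype(pick_label_dtype(8_000_000)).name)      # int32
print(np.dtype(pick_label_dtype(3_000_000_000)).name)  # int64
print(np.iinfo(np.int32).max)                          # 2147483647
```

A real implementation would also have to upcast labels that were already allocated as int32 once the model grows past the limit, which this sketch ignores.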
Note
The technical breakdown below was generated by AI.
Changes proposed in this Pull Request
Default linopy's internal label and variable-index arrays to int32 instead of int64, cutting dataset memory ~25% and improving build speed ~10-35%. The dtype is runtime-configurable.
What changed
- linopy/config.py: New label_dtype option (default np.int32), validated against {np.int32, np.int64}. Set globally or per-scope via linopy.options["label_dtype"] = np.int64 to restore the old behaviour.
- linopy/model.py: Variable and constraint label allocation uses np.arange(..., dtype=options["label_dtype"]), with an overflow guard that raises ValueError if a model would exceed the int32 maximum (~2.1 billion labels).
- linopy/variables.py: ffill/bfill/sanitize and the Variables.flat key map use options["label_dtype"] instead of astype(int) (which silently widened labels back to int64).
- linopy/expressions.py: the vars arrays in LinearExpression construction/assignment/combine use options["label_dtype"].
- linopy/constraints.py: the Constraints.flat key map and the CSR constraint reconstruction (CSRConstraint._to_dataset) use options["label_dtype"], so a frozen constraint round-tripped through .data/.flat stays int32 instead of re-widening to int64.
- linopy/common.py: save_join's outer-join fallback uses options["label_dtype"]; the polars schema infers Int32/Int64 from the actual array width.
- test/test_dtypes.py (new): int32 defaults for labels / expression vars, solve correctness, the overflow guard, and the runtime option (incl. int64 round-trip and invalid-dtype rejection).
- test/test_constraints.py: dtype assertions relaxed to np.issubdtype(..., np.number) so they hold for both widths.
Dimension coordinates: left at their default (and user-controlled)
Dimension coordinates (fill_missing_coords, the _term helper coord) keep the default int64. Narrowing these small index arrays caused a 13-38% build regression (xarray's index creation is slower with int32), so it was reverted while keeping int32 where the memory actually is (labels/vars).
coords given as int32 already produce int32 coordinates. The int64 default only arises from indices created out of range(N)/pandas. These coordinate arrays are the largest residual integer footprint in a typical model, but they hold user key values (years, node ids), so linopy does not narrow them implicitly; that choice stays with the user.
Benchmark results
Memory (dataset .nbytes)
Consistent 1.25x reduction across all problem sizes (e.g. 640 MB → 512 MB at 8M vars). The labels and vars arrays shrink 50% (int64 → int32) while lower/upper/coeffs/rhs stay float64.
Build speed
Consistently ~1.1-1.35x faster across all sizes (30 iterations with GC, tight error bars). 10-20% for large models (170ms → 153ms at 8M vars), and up to 35% for small/medium models where fixed allocation overhead dominates.
Similar results on a real PyPSA model. No influence on lp-write.
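The memory figures above follow from simple array arithmetic, sketched here with NumPy on a small made-up array size. Integer arrays halve when narrowed from int64 to int32, float64 arrays are unchanged, and the quoted 640 MB → 512 MB totals correspond to a 1.25x ratio.

```python
import numpy as np

n = 1_000  # small illustrative size, not a benchmark

labels64 = np.arange(n, dtype=np.int64)
labels32 = labels64.astype(np.int32)
lower = np.zeros(n, dtype=np.float64)  # float arrays are not affected

print(labels64.nbytes, labels32.nbytes, lower.nbytes)  # 8000 4000 8000

# Ratio and relative saving for the totals quoted in the PR description.
before_mb, after_mb = 640, 512
ratio = before_mb / after_mb
saving = 1 - after_mb / before_mb
print(ratio, round(saving, 2))  # 1.25 0.2
```

In other words, the int64 dataset is 25% larger than the int32 one, which is a 20% saving relative to int64.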
Checklist
doc.
doc/release_notes.rst of the upcoming release is included.