
perf: default integer arrays to int32 for ~25% memory reduction#566

Open
FBumann wants to merge 11 commits into PyPSA:master from fluxopt:perf/int32
Conversation

@FBumann

@FBumann FBumann commented Feb 2, 2026

Collaborator

This is a pretty much free memory improvement. It only hurts models with more than 2.1 billion (2.1e9) variables, where linopy now raises an error. We could handle this more gracefully if you think any user will hit this number...

Note

The technical breakdown below was generated by AI.

Changes proposed in this Pull Request

Default linopy's internal label and variable-index arrays to int32 instead of int64, cutting their memory ~25% and improving build speed ~10-35%. The dtype is runtime-configurable.
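As a minimal sketch of what a runtime-configurable, validated dtype option could look like: the `Options` class and its `scoped` helper below are hypothetical stand-ins written for illustration, not linopy's actual implementation. Only the option name `label_dtype` and the allowed values `np.int32` / `np.int64` come from this PR.

```python
from contextlib import contextmanager

import numpy as np

# Hypothetical stand-in for an options object; NOT linopy's real code.
_ALLOWED_LABEL_DTYPES = (np.int32, np.int64)


class Options:
    def __init__(self):
        self._values = {"label_dtype": np.int32}

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        if key not in self._values:
            raise KeyError(f"unknown option: {key!r}")
        # Reject anything other than the two supported integer widths.
        if key == "label_dtype" and value not in _ALLOWED_LABEL_DTYPES:
            raise ValueError("label_dtype must be np.int32 or np.int64")
        self._values[key] = value

    @contextmanager
    def scoped(self, **overrides):
        # Apply overrides for the duration of a `with` block, then restore.
        previous = dict(self._values)
        try:
            for key, value in overrides.items():
                self[key] = value
            yield self
        finally:
            self._values = previous


options = Options()

# The dtype is looked up at allocation time, so a later change takes effect.
labels = np.arange(5, dtype=options["label_dtype"])

with options.scoped(label_dtype=np.int64):
    wide_labels = np.arange(5, dtype=options["label_dtype"])
```

Reading the option at each allocation site, rather than capturing it once, is what makes both the global and the per-scope override work in this sketch.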

What changed

  • linopy/config.py: New label_dtype option (default np.int32), validated against {np.int32, np.int64}. Set globally or per-scope via linopy.options["label_dtype"] = np.int64 to restore the old behaviour.
  • linopy/model.py: Variable and constraint label allocation uses np.arange(..., dtype=options["label_dtype"]), with an overflow guard that raises ValueError if a model would exceed the int32 maximum (~2.1 billion labels).
  • linopy/variables.py: ffill / bfill / sanitize and the Variables.flat key map use options["label_dtype"] instead of astype(int) (which silently widened labels back to int64).
  • linopy/expressions.py: the vars arrays in LinearExpression construction/assignment/combine use options["label_dtype"].
  • linopy/constraints.py: the Constraints.flat key map and the CSR constraint reconstruction (CSRConstraint._to_dataset) use options["label_dtype"], so a frozen constraint round-tripped through .data/.flat stays int32 instead of re-widening to int64.
  • linopy/common.py: save_join's outer-join fallback uses options["label_dtype"]; the polars schema infers Int32/Int64 from the actual array width.
  • test/test_dtypes.py (new): int32 defaults for labels / expression vars, solve correctness, the overflow guard, and the runtime option (incl. int64 round-trip and invalid-dtype rejection).
  • test/test_constraints.py: dtype assertions relaxed to np.issubdtype(..., np.number) so they hold for both widths.
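The overflow guard and the silent re-widening described in the list above can be sketched with plain NumPy. `allocate_labels` is a hypothetical helper, and its exact bound check is an assumption chosen for illustration; the real guard in linopy/model.py may differ.

```python
import numpy as np


def allocate_labels(start, count, dtype=np.int32):
    """Return `count` consecutive labels starting at `start` (hypothetical helper)."""
    end = start + count
    limit = np.iinfo(dtype).max
    # Check before allocating, so an oversized request fails fast
    # instead of wrapping around or building a huge array first.
    if end > limit:
        raise ValueError(
            f"Labels up to {end} do not fit into {np.dtype(dtype).name} "
            f"(max {limit}); use a wider label dtype."
        )
    return np.arange(start, end, dtype=dtype)


labels = allocate_labels(0, 4)

# astype(int) converts to the platform default integer, which is int64 on
# most 64-bit platforms, so it undoes the narrowing.
widened = labels.astype(int)

# Casting to the configured dtype keeps the narrow width.
label_dtype = np.int32
kept = labels.astype(label_dtype)
```

Passing `dtype=np.int64` to the same helper lifts the limit, which mirrors the "restore the old behaviour" escape hatch described above.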

Dimension coordinates: left at their default (and user-controlled)

  • Auto-materialized coordinates (fill_missing_coords, the _term helper coord) keep the default int64. Narrowing these small index arrays caused a 13-38% build regression (xarray's index creation is slower with int32), so it was reverted while keeping int32 where the memory actually is (labels/vars).
  • User-supplied coordinates keep whatever dtype the user passes — linopy preserves it, so coords given as int32 already produce int32 coordinates. The int64 default only arises from indices created out of range(N)/pandas. These coordinate arrays are the largest residual integer footprint in a typical model, but they hold user key values (years, node ids), so linopy does not narrow them implicitly — that choice stays with the user.

Benchmark results

Measured during development with a standalone script (not included in the branch).

Memory (dataset .nbytes)

Consistent 1.25x reduction across all problem sizes (e.g. 640 MB → 512 MB at 8M vars, i.e. 20% less memory). The labels and vars arrays shrink 50% (int64 → int32) while lower/upper/coeffs/rhs stay float64.
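A rough arithmetic sketch of why halving only the integer arrays gives this overall figure, assuming the integer arrays are the only thing that shrinks. Both function names are hypothetical and exist only for this illustration.

```python
def size_after_narrowing(total_bytes, int_fraction):
    """Size after the int64 share of the data is halved to int32."""
    return total_bytes * (1 - int_fraction / 2)


def implied_int_fraction(before, after):
    """Share of the original bytes that must have been int64 arrays."""
    return 2 * (1 - after / before)


# Using the 640 MB -> 512 MB example quoted above.
fraction = implied_int_fraction(640, 512)
reduced = size_after_narrowing(640, fraction)
```

Under that assumption, the quoted example implies that roughly 40% of the original dataset bytes were integer labels/vars, with the rest being float64 data that int32 cannot shrink.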

[Figure: benchmark_memory_comparison]

Build speed

Consistently ~1.1-1.35x faster across all sizes (30 iterations with GC, tight error bars). The gain is 10-20% for large models (170 ms → 153 ms at 8M vars) and up to 35% for small/medium models, where fixed allocation overhead dominates.

[Figure: benchmark_build_comparison]

Similar results on a real PyPSA model. No influence on lp-write.

Checklist

  • Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
  • Unit tests for new features were added (if applicable).
  • A note for the release notes doc/release_notes.rst of the upcoming release is included.
  • I consent to the release of this PR's code under the MIT license.

  linopy/constants.py — Added DEFAULT_LABEL_DTYPE = np.int32

  linopy/model.py — Variable and constraint label assignment now uses np.arange(..., dtype=DEFAULT_LABEL_DTYPE) with overflow guards that raise ValueError if labels exceed int32 max.

  linopy/expressions.py — _term coord assignment and all .astype(int) for vars arrays now use DEFAULT_LABEL_DTYPE (int32).

  linopy/common.py — fill_missing_coords uses np.arange(..., dtype=DEFAULT_LABEL_DTYPE). Polars schema inference now checks array.dtype.itemsize instead of the old OS/numpy-version hack.

  test/test_constraints.py — Updated 2 dtype assertions to use np.issubdtype instead of == int.

  test/test_dtypes.py (new) — 7 tests covering int32 labels, expression vars, solve correctness, and overflow guards.
…k to int64 via astype(int), now use DEFAULT_LABEL_DTYPE. Also Variables.to_dataframe arange for map_labels.
  - linopy/constraints.py: Constraints.to_dataframe arange for map_labels.
  - linopy/common.py: save_join outer-join fallback was casting to int64.
@FBumann FBumann changed the title Perf/int32 perf: default integer arrays to int32 for ~25% memory reduction Feb 2, 2026
@FBumann FBumann mentioned this pull request Feb 2, 2026
3 tasks
FBumann added 2 commits March 14, 2026 18:45
…ords. Here's what changed:

  - test_linear_expression_sum / test_linear_expression_sum_with_const: v.loc[:9].add(v.loc[10:], join="override") → v.loc[:9] + v.loc[10:].assign_coords(dim_2=v.loc[:9].coords["dim_2"])
  - test_add_join_override → test_add_positional_assign_coords: uses v + disjoint.assign_coords(...)
  - test_add_constant_join_override → test_add_constant_positional: now uses different coords [5,6,7] + assign_coords to make the test meaningful
  - test_same_shape_add_join_override → test_same_shape_add_assign_coords: uses + c.to_linexpr().assign_coords(...)
  - test_add_constant_override_positional → test_add_constant_positional_different_coords: expr + other.assign_coords(...)
  - test_sub_constant_override → test_sub_constant_positional: expr - other.assign_coords(...)
  - test_mul_constant_override_positional → test_mul_constant_positional: expr * other.assign_coords(...)
  - test_div_constant_override_positional → test_div_constant_positional: expr / other.assign_coords(...)
  - test_variable_mul_override → test_variable_mul_positional: a * other.assign_coords(...)
  - test_variable_div_override → test_variable_div_positional: a / other.assign_coords(...)
  - test_add_same_coords_all_joins: removed "override" from loop, added assign_coords variant
  - test_add_scalar_with_explicit_join → test_add_scalar: simplified to expr + 10

@FBumann FBumann left a comment

Collaborator Author

Review

The guards and logical checks all look correct (see discussion in prior review). Two things to add before merge:

1. Release notes

Please add to doc/release_notes.rst under "Upcoming Version", something like:

* Default internal integer arrays (labels, variable indices, ``_term`` coordinates) to ``int32`` instead of ``int64``, reducing memory usage by ~25% and improving model build speed by 10-35%. The dtype is controlled by ``linopy.constants.DEFAULT_LABEL_DTYPE`` and can be changed back to ``np.int64`` before model construction if needed. An overflow guard raises ``ValueError`` if labels exceed the int32 maximum (~2.1 billion).

2. Document how to override DEFAULT_LABEL_DTYPE

Since every module imports DEFAULT_LABEL_DTYPE by name at import time, simply assigning linopy.constants.DEFAULT_LABEL_DTYPE = np.int64 after import won't propagate. The constant should either:

(a) Be documented as a compile-time constant (users edit constants.py or monkey-patch before importing linopy), or

(b) Be read indirectly so runtime changes work. For example, change all usages to read from the module rather than a local binding:

# In constants.py — no change needed
DEFAULT_LABEL_DTYPE = np.int32

# In model.py, expressions.py, etc. — instead of:
from linopy.constants import DEFAULT_LABEL_DTYPE

# Use:
from linopy import constants
# ... then reference constants.DEFAULT_LABEL_DTYPE everywhere

This way linopy.constants.DEFAULT_LABEL_DTYPE = np.int64 at runtime would work. Option (b) is a small change and much more user-friendly. Up to you whether this is worth doing now or in a follow-up.
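The difference between the two import styles can be reproduced with the standard library alone. The module `demo_constants` below is a hypothetical stand-in built in memory for this illustration; it is not linopy's `constants` module, and plain strings are used in place of NumPy dtypes.

```python
import sys
import types

# Build a throwaway module in memory (hypothetical stand-in for constants.py).
_module = types.ModuleType("demo_constants")
_module.DEFAULT_LABEL_DTYPE = "int32"
sys.modules["demo_constants"] = _module

# Style 1: by-name import copies the current value into a local name.
from demo_constants import DEFAULT_LABEL_DTYPE

# Style 2: keep a reference to the module and look the attribute up on use.
import demo_constants


def dtype_via_local_binding():
    return DEFAULT_LABEL_DTYPE


def dtype_via_module_attribute():
    return demo_constants.DEFAULT_LABEL_DTYPE


# A user changes the constant at runtime, after the imports have happened.
demo_constants.DEFAULT_LABEL_DTYPE = "int64"

stale = dtype_via_local_binding()      # still sees the value from import time
fresh = dtype_via_module_attribute()   # sees the runtime change
```

Rebinding the module attribute does not touch names that were already copied out of the module, which is why option (a) would need to be documented as effectively fixed at import time.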

Minor items from prior review (still applicable)

  • Add -> None return type annotations to all test functions in test_dtypes.py (CI blocker)
  • Guard test_solve_with_int32_labels with pytest.importorskip("highspy")
  • Remove trailing spaces in the overflow error message strings in model.py

@FBumann

FBumann commented Mar 14, 2026

Collaborator Author

Note on scipy compatibility: scipy sparse matrices (CSC/CSR) already store their indices and indptr arrays as int32 whenever the index values fit, regardless of the input dtype. So every solver receiving matrices through scipy (HiGHS, MOSEK, Gurobi, cuPDLPx) is already getting int32 indices today on master for such models. This change just aligns linopy's internal arrays with what scipy already produces, so it adds no new risk to solver interfaces.

FBumann and others added 2 commits March 14, 2026 19:57
- Move DEFAULT_LABEL_DTYPE from constants.py into options["label_dtype"]
- Widen OptionSettings types from int to Any
- Add validation: label_dtype only accepts np.int32 or np.int64
- Fix matrices.py empty clabels fallback to use configured dtype
- Fix f-string quoting and trailing spaces in overflow error messages
- Add -> None annotations and importorskip guard in test_dtypes.py
- Add tests for int64 override and invalid dtype rejection
- Add release notes entry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dimension coordinates (fill_missing_coords, _term coord) are small
index arrays, not the large label/vars arrays that benefit from int32.
xarray's index creation is slower with int32 than the default int64,
causing a 13-38% build regression. Revert these to default int while
keeping int32 for labels and vars where the memory savings matter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@FBumann

FBumann commented Mar 14, 2026

Collaborator Author

Benchmark Results: perf/int32 vs master

Summary

Matrix generation is the big win: up to 57% faster at scale (basic n=1000), with 15-20% memory savings on large models.

Build is now neutral — small models unchanged, large basic models even faster (-10 to -19%).

LP write is mixed — big wins at large sizes (basic n=500/1000: -41 to -46%), but some small-size regressions (+15-25%) that look like noise.

Knapsack build still has overhead (+8-38% on small/medium), likely from binary variable handling with int32 conversion.

Bottom line: Clear net positive for any non-trivial model. The bigger the model, the bigger the gains.

Benchmarks from PR #567. Run on macOS, Python 3.11, CPython 64-bit. Full problem sizes (no --quick).

int32_benchmark_results.csv

@FBumann FBumann marked this pull request as ready for review March 14, 2026 19:44
@FBumann FBumann added the performance This improves performance while not (meaningfully) altering behaviour for users label Mar 17, 2026
@FabianHofmann

Collaborator

@FBumann Just scanning oldish PRs. Do you think we should update this, or rather drop it?

@FBumann

FBumann commented May 27, 2026

Collaborator Author

I would reconsider after we ship the new solver stuff.

@coroa

coroa commented Jun 30, 2026

Member

Should we push this again?

@FBumann

FBumann commented Jun 30, 2026

Collaborator Author

Yes, I'll do it.

@FBumann FBumann marked this pull request as draft June 30, 2026 16:52
@codspeed-hq

codspeed-hq Bot commented Jun 30, 2026


Merging this PR will improve performance by 33.16%

⚡ 53 improved benchmarks
✅ 85 untouched benchmarks
⏩ 138 skipped benchmarks [1]

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Memory | test_to_lp[storage-n=250] | 1,319.2 KB | 663 KB | +98.96% |
| Memory | test_to_lp[storage-n=10] | 56.5 KB | 30.3 KB | +86.29% |
| Memory | test_to_lp[basic-n=250] | 2 MB | 1.3 MB | +56.84% |
| Memory | test_to_lp[kvl_cycles-severity=100] | 38.5 MB | 25.6 MB | +49.99% |
| Memory | test_to_lp[sparse_network-n=250] | 34.5 MB | 23 MB | +49.99% |
| Memory | test_to_lp[rolling-severity=100] | 45.8 MB | 30.5 MB | +49.99% |
| Memory | test_to_lp[kvl_cycles-severity=50] | 38.5 MB | 25.6 MB | +49.99% |
| Memory | test_to_lp[nodal_balance-severity=100] | 17.9 MB | 11.9 MB | +49.98% |
| Memory | test_to_lp[cumsum-severity=100] | 29.3 MB | 19.5 MB | +49.98% |
| Memory | test_to_lp[merge_balance-severity=100] | 17.6 MB | 11.7 MB | +49.98% |
| Memory | test_to_lp[nodal_balance-severity=50] | 9.2 MB | 6.1 MB | +49.97% |
| Memory | test_to_lp[merge_balance-severity=50] | 9 MB | 6 MB | +49.97% |
| Memory | test_to_lp[rolling-severity=50] | 45.9 MB | 30.6 MB | +49.93% |
| Memory | test_to_lp[kvl_cycles-severity=0] | 38.5 MB | 25.7 MB | +49.92% |
| Memory | test_to_lp[nodal_balance-severity=0] | 385.3 KB | 258.8 KB | +48.91% |
| Memory | test_to_lp[masked-n=100] | 238.4 KB | 160.3 KB | +48.74% |
| Memory | test_to_lp[milp-n=50] | 63.5 KB | 43.9 KB | +44.45% |
| Memory | test_to_lp[sparse_network-n=10] | 29.8 KB | 21.2 KB | +40.5% |
| Memory | test_to_lp[merge_balance-severity=0] | 367.8 KB | 262.8 KB | +39.95% |
| Memory | test_to_lp[sos-n=1000] | 90.7 KB | 67.5 KB | +34.3% |
| ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing fluxopt:perf/int32 (f7bad31) with master (fe798b1)

Footnotes

  1. 138 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@FBumann FBumann marked this pull request as ready for review June 30, 2026 17:42
# Conflicts:
#	doc/release_notes.rst
#	linopy/common.py
#	linopy/config.py
#	linopy/matrices.py
#	linopy/model.py
#	linopy/variables.py
#	test/test_constraints.py
CSRConstraint._to_dataset hardcoded np.int64 for the reconstructed vars and
labels arrays, so a frozen constraint round-tripped through .data/.flat came
back as int64 regardless of options["label_dtype"], silently undoing the
int32 memory win. Use options["label_dtype"] to match the allocation paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@FBumann FBumann marked this pull request as ready for review June 30, 2026 18:09
@FBumann FBumann requested a review from coroa June 30, 2026 18:09
@FBumann

FBumann commented Jul 1, 2026

Collaborator Author

I wanted to ensure we don't regress on speed, so I ran benchmarks locally too.

No speed regression

Note

Independent local benchmark, generated by AI (benchmem sweep, master vs this branch, fresh isolated uv venvs with identical pinned deps — only linopy differs). Corroborates the PR's own numbers.

Build phase, each spec at its largest size. Peak memory is memray-measured (deterministic); time is the pytest-benchmark mean of one local sweep. In the time Δ column, positive values mean the int32 branch is faster.

| case | peak master | peak int32 | peak Δ | time master | time int32 | time Δ |
|---|---|---|---|---|---|---|
| expression_arithmetic (n=250) | 19.7 MiB | 15.6 MiB | −21% | 32.2 ms | 30.3 ms | +6% |
| qp (n=1000) | 190.7 KiB | 155.6 KiB | −18% | 14.0 ms | 13.7 ms | +2% |
| basic (n=250) | 13.5 MiB | 11.1 MiB | −18% | 27.2 ms | 26.4 ms | +3% |
| storage (n=250) | 11.4 MiB | 9.6 MiB | −15% | 29.0 ms | 28.5 ms | +2% |
| piecewise (n=1000) | 1.1 MiB | 926.3 KiB | −14% | 50.8 ms | 50.3 ms | +1% |
| milp (n=50) | 247.5 KiB | 217.0 KiB | −12% | 20.6 ms | 20.0 ms | +3% |
| sos (n=1000) | 449.2 KiB | 410.2 KiB | −9% | 12.0 ms | 11.7 ms | +2% |
| masked (n=100) | 751.8 KiB | 711.9 KiB | −5% | 14.5 ms | 13.9 ms | +4% |
| knapsack (n=10000) | 802.4 KiB | 763.3 KiB | −5% | 7.3 ms | 7.3 ms | −1% |
| merge_balance (sev=100) | 24.2 MiB | 18.3 MiB | −24% | 117.0 ms | 114.1 ms | +2% |
| nodal_balance (sev=100) | 31.8 MiB | 28.8 MiB | −10% | 11.5 ms | 11.5 ms | +0% |
| sparse_network (n=250) | 38.0 MiB | 38.0 MiB | −0% | 17.5 ms | 17.3 ms | +2% |
| kvl_cycles (sev=100) | 126.4 MiB | 126.2 MiB | −0% | 17.8 ms | 17.5 ms | +2% |
| cumsum (sev=100) | 45.0 MiB | 44.9 MiB | −0% | 16.7 ms | 15.4 ms | +8% |
| rolling (sev=100) | 138.0 MiB | 138.0 MiB | −0% | 30.8 ms | 29.6 ms | +4% |

Peak: improved in 35/35 cases across the full sweep, median −10%. The −0% rows are the float64-dominated patterns (sparse_network, kvl_cycles, cumsum, rolling at high severity), where labels/vars are a small share of the dataset, so narrowing them barely moves the peak — as expected.

Time: improved in 29/35, median +1.8%. A single local sweep is too noisy for per-case time claims (the few non-improvements are all sub-10 ms cases in the ±noise band), but the direction matches the PR's 30-iteration benchmark. Treat the memory column as the hard result and the time column as corroborating.

Does the benign (severity=0) regime regress at scale?

The two worst-looking time numbers in the full sweep are rolling (sev=0, −9%) and nodal_balance (sev=0, −6%) — but those models build in ~1–2 ms, pure noise. Re-running the severity=0 shapes at growing size (25+ rounds each) settles it: once build time clears the noise floor, int32 is faster, never slower.

| shape (sev=0) | n | time master | time int32 | time Δ | peak Δ |
|---|---|---|---|---|---|
| nodal_balance | 8,000 | 11.7 ms | 11.9 ms | −2% (noise ≈5%) | −10% |
| nodal_balance | 32,000 | 14.9 ms | 15.3 ms | −2% (noise ≈6%) | −10% |
| nodal_balance | 128,000 | 31.2 ms | 29.8 ms | +5% | −10% |
| nodal_balance | 512,000 | 111.1 ms | 102.7 ms | +8% | −10% |
| rolling | 8,000 | 11.2 ms | 11.2 ms | +1% (noise ≈4%) | −9% |
| rolling | 32,000 | 14.2 ms | 14.1 ms | +1% (noise ≈7%) | −10% |
| rolling | 128,000 | 28.5 ms | 25.8 ms | +10% | −10% |
| rolling | 512,000 | 89.4 ms | 77.1 ms | +14% | −10% |

So the small-size negatives are measurement noise; at scale the benign regime behaves like the model specs — memory −9–10%, time faster.

Full 35-case sweep (every spec at every swept size)

| case | size | peak master | peak int32 | peak Δ | time master | time int32 | time Δ |
|---|---|---|---|---|---|---|---|
| basic | n=10 | 19.6 KiB | 13.4 KiB | −32% | 24.3 ms | 24.0 ms | +1% |
| basic | n=250 | 13.5 MiB | 11.1 MiB | −18% | 27.2 ms | 26.4 ms | +3% |
| expression_arithmetic | n=10 | 31.2 KiB | 24.4 KiB | −22% | 28.1 ms | 26.5 ms | +6% |
| expression_arithmetic | n=250 | 19.7 MiB | 15.6 MiB | −21% | 32.2 ms | 30.3 ms | +6% |
| knapsack | n=100 | 4.0 KiB | 3.2 KiB | −20% | 6.3 ms | 6.1 ms | +3% |
| knapsack | n=10000 | 802.4 KiB | 763.3 KiB | −5% | 7.3 ms | 7.3 ms | −1% |
| masked | n=10 | 5.5 KiB | 4.7 KiB | −16% | 14.2 ms | 13.6 ms | +5% |
| masked | n=100 | 751.8 KiB | 711.9 KiB | −5% | 14.5 ms | 13.9 ms | +4% |
| milp | n=10 | 4.8 KiB | 3.9 KiB | −19% | 20.4 ms | 20.1 ms | +2% |
| milp | n=50 | 247.5 KiB | 217.0 KiB | −12% | 20.6 ms | 20.0 ms | +3% |
| piecewise | n=10 | 11.1 KiB | 10.8 KiB | −3% | 50.0 ms | 48.7 ms | +3% |
| piecewise | n=1000 | 1.1 MiB | 926.3 KiB | −14% | 50.8 ms | 50.3 ms | +1% |
| qp | n=10 | 2.8 KiB | 2.8 KiB | −0% | 13.8 ms | 13.3 ms | +4% |
| qp | n=1000 | 190.7 KiB | 155.6 KiB | −18% | 14.0 ms | 13.7 ms | +2% |
| sos | n=10 | 3.5 KiB | 2.8 KiB | −20% | 11.7 ms | 11.1 ms | +4% |
| sos | n=1000 | 449.2 KiB | 410.2 KiB | −9% | 12.0 ms | 11.7 ms | +2% |
| sparse_network | n=10 | 34.9 KiB | 29.0 KiB | −17% | 13.9 ms | 13.4 ms | +4% |
| sparse_network | n=250 | 38.0 MiB | 38.0 MiB | −0% | 17.5 ms | 17.3 ms | +2% |
| storage | n=10 | 469.5 KiB | 397.3 KiB | −15% | 27.2 ms | 26.4 ms | +3% |
| storage | n=250 | 11.4 MiB | 9.6 MiB | −15% | 29.0 ms | 28.5 ms | +2% |
| cumsum | sev=0 | 16.2 KiB | 15.2 KiB | −6% | 10.2 ms | 10.1 ms | +1% |
| cumsum | sev=50 | 11.5 MiB | 11.5 MiB | −0% | 11.8 ms | 11.7 ms | +1% |
| cumsum | sev=100 | 45.0 MiB | 44.9 MiB | −0% | 16.7 ms | 15.4 ms | +8% |
| kvl_cycles | sev=0 | 126.4 MiB | 126.2 MiB | −0% | 20.3 ms | 20.1 ms | +1% |
| kvl_cycles | sev=50 | 126.4 MiB | 126.2 MiB | −0% | 17.9 ms | 17.9 ms | −0% |
| kvl_cycles | sev=100 | 126.4 MiB | 126.2 MiB | −0% | 17.8 ms | 17.5 ms | +2% |
| merge_balance | sev=0 | 795.0 KiB | 705.0 KiB | −11% | 109.6 ms | 108.3 ms | +1% |
| merge_balance | sev=50 | 12.6 MiB | 9.5 MiB | −24% | 113.9 ms | 114.4 ms | −0% |
| merge_balance | sev=100 | 24.2 MiB | 18.3 MiB | −24% | 117.0 ms | 114.1 ms | +2% |
| nodal_balance | sev=0 | 1.2 MiB | 1.1 MiB | −10% | 10.2 ms | 10.8 ms | −6% (noise) |
| nodal_balance | sev=50 | 16.6 MiB | 15.0 MiB | −10% | 10.9 ms | 10.9 ms | −1% |
| nodal_balance | sev=100 | 31.8 MiB | 28.8 MiB | −10% | 11.5 ms | 11.5 ms | +0% |
| rolling | sev=0 | 774.9 KiB | 712.4 KiB | −8% | 10.3 ms | 11.3 ms | −9% (noise) |
| rolling | sev=50 | 69.3 MiB | 69.2 MiB | −0% | 19.3 ms | 19.0 ms | +2% |
| rolling | sev=100 | 138.0 MiB | 138.0 MiB | −0% | 30.8 ms | 29.6 ms | +4% |

Method
  • benchmem sweep linopy <master> <this-branch> --suite benchmarks --memory --as-of 2026-06-30 — one fresh uv venv per side, identical pinned measurement deps (numpy/scipy/xarray/pandas/polars/dask + the pytest-benchmark/benchmem/memray stack), only linopy differs.
  • Build phase only; peak = memray high-water mark, time = pytest-benchmark mean.
  • Severity-0 size check is a separate standalone sweep of the two benign shapes at n up to 512k, ≥25 rounds each.
