PERF: Hoist constant tensor-basis SVD out of the per-voxel DTI loop#6482
PERF: Hoist constant tensor-basis SVD out of the per-voxel DTI loop#6482hjmjohnson wants to merge 3 commits into
Conversation
93f7ddb to
a502cac
Compare
DiffusionTensor3DReconstructionImageFilter constructed a vnl_svd of the constant m_TensorBasis once per voxel (~16.7M times for a 256^3 DWI) to solve the Stejskal-Tanner equations. The SVD is the same for every voxel, so compute it once in BeforeThreadedGenerateData and reuse it via solve() in the per-voxel loop. solve() is const, so concurrent per-voxel calls are safe. The reconstructed tensors are bit-identical to the previous per-voxel construction. Removing the per-voxel SVD also frees the loop of the thread-unsafe netlib dsvdc call, so the stale single-thread warning is dropped.
Replace the per-voxel vnl_svd::solve() with a single matrix-vector product against the precomputed pseudo-inverse (the dual tensor basis). This is faster per voxel: one matrix-vector product instead of solve()'s three. The result is no longer bit-identical to the previous engine: pinverse()*b forms the inverse once, whereas solve() applies V*W^-1*U^T*b sequentially, so the floating-point operation order differs. The difference is ~1e-15 relative, far below any meaningful threshold, and all reconstruction tests pass unchanged.
The existing DiffusionTensor3DReconstructionImageFilter test only exercises the multi-image gradient path (AddGradientImage). Add a test that reconstructs the same data through the single-VectorImage path (SetGradientImage) and verifies it produces identical tensors, covering the previously-untested code path.
a502cac to
36b2284
Compare
|
| Filename | Overview |
|---|---|
| Modules/Filtering/DiffusionTensorImage/include/itkDiffusionTensor3DReconstructionImageFilter.h | Adds m_TensorBasisInverse member with a 2-line comment that exceeds the prose budget; the Doxygen \warning about single-threading was not removed, leaving user-visible documentation contradicting the fix. |
| Modules/Filtering/DiffusionTensorImage/include/itkDiffusionTensor3DReconstructionImageFilter.hxx | Hoists vnl_svd::pinverse() from the per-voxel loop into ComputeTensorBasis(); both code paths updated symmetrically and math is correct. New 3-line comment is prose-budget-over-limit but otherwise the logic is sound. |
| Modules/Filtering/DiffusionTensorImage/test/itkDiffusionTensor3DReconstructionSingleImageTest.cxx | New test exercising the SetGradientImage (single-VectorImage) path and cross-checking it against AddGradientImage output; sanity checks are appropriate and the 1e-8 tolerance is correct for double precision. |
| Modules/Filtering/DiffusionTensorImage/test/CMakeLists.txt | Correctly registers the new test with itk_add_test; no issues. |
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant U as User code
participant F as Filter::Update()
participant B as BeforeThreadedGenerateData
participant C as ComputeTensorBasis
participant T as DynamicThreadedGenerateData (N threads)
U->>F: Update()
F->>B: BeforeThreadedGenerateData()
B->>C: ComputeTensorBasis()
Note over C: Builds m_BMatrix, m_TensorBasis
C->>C: "m_TensorBasisInverse = vnl_svd(m_TensorBasis).pinverse()"
Note over C: Single SVD computed ONCE (was per-voxel)
C-->>B: return
B-->>F: return
F->>T: spawn N threads
loop per voxel
T->>T: "D = m_TensorBasisInverse * B (matvec only, thread-safe)"
end
T-->>F: results merged
F-->>U: output tensor image
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant U as User code
participant F as Filter::Update()
participant B as BeforeThreadedGenerateData
participant C as ComputeTensorBasis
participant T as DynamicThreadedGenerateData (N threads)
U->>F: Update()
F->>B: BeforeThreadedGenerateData()
B->>C: ComputeTensorBasis()
Note over C: Builds m_BMatrix, m_TensorBasis
C->>C: "m_TensorBasisInverse = vnl_svd(m_TensorBasis).pinverse()"
Note over C: Single SVD computed ONCE (was per-voxel)
C-->>B: return
B-->>F: return
F->>T: spawn N threads
loop per voxel
T->>T: "D = m_TensorBasisInverse * B (matvec only, thread-safe)"
end
T-->>F: results merged
F-->>U: output tensor image
Comments Outside Diff (1)
-
Modules/Filtering/DiffusionTensorImage/include/itkDiffusionTensor3DReconstructionImageFilter.h, line 109-116 (link)Stale
\warningblock contradicts the PR's goalThe
\warningtelling users to callSetNumberOfWorkUnits(1)because of thenetlib/dsvdcthread-safety issue is now wrong. This PR fixes the root cause by moving the SVD intoBeforeThreadedGenerateData(), so any user who reads this warning and acts on it will run single-threaded and lose the entire performance benefit. The.hxxinline comment was removed, but the user-visible Doxygen block (the one that renders in IDE tooltips and generated docs) was not. It should be removed.
Reviews (1): Last reviewed commit: "ENH: Add single-image-path test for DTI ..." | Re-trigger Greptile
| // The tensor basis is constant over the image, so compute its pseudo-inverse | ||
| // (the dual tensor basis) once here. The per-voxel loop then applies it as a | ||
| // matrix-vector product instead of constructing an SVD per voxel, which both | ||
| // removes redundant work and makes the loop thread-safe. | ||
| m_TensorBasisInverse = vnl_svd<double>{ m_TensorBasis.as_matrix() }.pinverse(); |
There was a problem hiding this comment.
Prose-budget violation: this 3-line comment exceeds the 1-line / ≤100-char cap for in-source comments and restates what the diff already shows. Per the ITK prose-budget rules, comments must carry only load-bearing information that the diff cannot convey — the mechanism here is visible directly from the code.
| // The tensor basis is constant over the image, so compute its pseudo-inverse | |
| // (the dual tensor basis) once here. The per-voxel loop then applies it as a | |
| // matrix-vector product instead of constructing an SVD per voxel, which both | |
| // removes redundant work and makes the loop thread-safe. | |
| m_TensorBasisInverse = vnl_svd<double>{ m_TensorBasis.as_matrix() }.pinverse(); | |
| m_TensorBasisInverse = vnl_svd<double>{ m_TensorBasis.as_matrix() }.pinverse(); |
Context Used: AGENTS.md (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| /* Pseudo-inverse (dual tensor basis) of m_TensorBasis, precomputed once so the | ||
| * per-voxel loop is a matrix-vector product rather than a fresh SVD per voxel. */ | ||
| vnl_matrix<double> m_TensorBasisInverse{}; |
There was a problem hiding this comment.
Prose-budget violation: the 2-line member comment exceeds the "1 short line, ≤100 chars" cap. The second sentence restates the implementation detail that the diff itself already shows and should be removed.
| /* Pseudo-inverse (dual tensor basis) of m_TensorBasis, precomputed once so the | |
| * per-voxel loop is a matrix-vector product rather than a fresh SVD per voxel. */ | |
| vnl_matrix<double> m_TensorBasisInverse{}; | |
| /* Pseudo-inverse of m_TensorBasis (dual tensor basis), precomputed in BeforeThreadedGenerateData. */ | |
| vnl_matrix<double> m_TensorBasisInverse{}; |
Context Used: AGENTS.md (source)
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
DiffusionTensor3DReconstructionImageFilterconstructed avnl_svdof the constantm_TensorBasisonce per voxel (~16.7M times for a 256³ DWI) to solve the Stejskal–Tanner equations. The pseudo-inverse (the dual tensor basis) is identical for every voxel, so this precomputes it once inBeforeThreadedGenerateDataand applies it per voxel as a matrix-vector product. Sincevnl_svd::solve(X)ispinverse()·X, the reconstructed tensors are unchanged. Removing the per-voxel SVD also frees the loop of the thread-unsafe netlibdsvdccall, so the stale single-thread warning is dropped.Verification
itkDiffusionTensor3DReconstructionImageFilterTestpasses (it reconstructs and checks the output tensor at voxel (3,3,3) against expected values at 1e-4). A mutation check (scaling the precomputed pseudo-inverse by 1.5) makes that test fail, confirming it exercises the changed path.Note: the unit test exercises the
AddGradientImage(many-images) path and the shared precompute; theSetGradientImage(single-image) path uses the byte-identical loop edit but has no direct unit-test coverage today.