Add BD3LM architecture adapter by puranikyashaswin · Pull Request #1479 · TransformerLensOrg/TransformerLens

puranikyashaswin · 2026-07-02T08:59:41Z

Description

Adds a TransformerBridge Architecture Adapter for BD3LM (Kuleshov Group's Block Diffusion Language Model, ICLR 2025), enabling TransformerBridge.boot_transformers("kuleshov-group/bd3lm-owt-block_size4") with full hook support.

Fixes #1473

BD3LM is a discrete diffusion LM with a single block_size knob that interpolates between autoregressive and full diffusion behavior. It differs structurally from standard causal LMs: adaLN conditioning on the diffusion timestep, a custom Rotary embedding, joint QKV projection, and non-causal block-diffusion attention masking.
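To illustrate the block_size knob, here is a minimal sketch of a block-causal attention mask over a single sequence. It assumes the simplest form of block masking: bidirectional within a block, causal across blocks. The real BD3LM training mask is larger because it also covers a clean copy of the sequence. The function names are hypothetical and are not part of TransformerLens or the BD3LM code.

```python
def block_causal_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Position i belongs to block i // block_size. A query sees every key in its
    own block (bidirectionally) and in all earlier blocks, never in later ones.
    """
    return [
        [(j // block_size) <= (i // block_size) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def show(mask: list[list[bool]]) -> None:
    # Print "1" for an allowed attention edge and "." for a masked one.
    for row in mask:
        print("".join("1" if allowed else "." for allowed in row))


# block_size=1 reduces to an ordinary causal (autoregressive) mask.
show(block_causal_mask(4, 1))
# block_size=seq_len lets every position see every other one (full diffusion).
show(block_causal_mask(4, 4))
# Intermediate sizes: bidirectional inside each block, causal across blocks.
show(block_causal_mask(4, 2))
```

The two extremes show why one parameter is enough to move between autoregressive and full-diffusion attention patterns.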

Adapter design:

Uses DelegatedAttentionBlockBridge to delegate DDiTBlock.forward() wholesale to the original HF module. adaLN modulation varies per timestep, so it can't be folded into the weights the way LayerNorm folding works for standard transformers; wrapping rather than reimplementing avoids getting this subtly wrong.
Registered at all four required sites (adapter package, factory, model registry, report generation); TestRegistrySyncedWithFactory passes.
sources/transformers.py gains hidden_dim→d_model and n_blocks→n_layers fallback aliasing, since BD3LM's HF config uses non-standard attribute names.
_HF_PASSTHROUGH_ATTRS (in both sources/transformers.py and sources/_bridge_builder.py) gains model_length, block_size, cond_dim, adaln, cross_attn. Without this, model_length silently falls back to the wrong default, producing an incorrectly shaped attention mask and a small nonzero logit divergence (caught and root-caused during development).
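A minimal sketch of the two config changes, assuming a simplified mapping function: the standard attribute name wins, BD3LM's hidden_dim is only a fallback, and the passthrough names are copied across verbatim when present. build_bridge_config, the hidden_size check, and the SimpleNamespace stand-ins are illustrative assumptions, not the code in sources/transformers.py.

```python
from types import SimpleNamespace

# BD3LM-specific names this PR adds to _HF_PASSTHROUGH_ATTRS (the real tuple
# also lists other attributes, omitted in this sketch).
BD3LM_PASSTHROUGH_ATTRS = ("model_length", "block_size", "cond_dim", "adaln", "cross_attn")


def build_bridge_config(hf_config):
    """Hypothetical sketch of config mapping for a BD3LM-style HF config."""
    cfg = SimpleNamespace()
    # Standard name first; BD3LM's non-standard hidden_dim only as a fallback.
    if hasattr(hf_config, "hidden_size"):
        cfg.d_model = hf_config.hidden_size
    elif hasattr(hf_config, "hidden_dim"):
        cfg.d_model = hf_config.hidden_dim
    # Copy passthrough attributes verbatim so they never fall back to a default.
    for name in BD3LM_PASSTHROUGH_ATTRS:
        if hasattr(hf_config, name):
            setattr(cfg, name, getattr(hf_config, name))
    return cfg


# Illustrative values only, not read from the real checkpoint config.
hf_config = SimpleNamespace(hidden_dim=768, model_length=1024, block_size=4)
cfg = build_bridge_config(hf_config)
print(cfg.d_model, cfg.model_length, cfg.block_size)  # 768 1024 4
```

Attributes absent from the source config (cond_dim here) are left unset rather than silently defaulted, which keeps a missing value visible instead of quietly reshaping downstream masks.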

Verification:

Logit parity vs. the raw HF model confirmed block-by-block (all 12 blocks + embeddings + final logits) in both sample_mode=False (default forward path, seq_len=2048, real block-diffusion mask) and sample_mode=True (generation-time path): exact match once the passthrough-attrs fix above was in place.
run_with_cache confirmed to populate real per-hook activations (28 hooks per block: norms, QKV, adaLN modulation output, MLP) in both modes, not just pass-through logits.
Full existing model_bridge unit suite (2209 tests) passes with no regressions from the shared _HF_PASSTHROUGH_ATTRS change.
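A sketch of why model_length determines the mask shape on the sample_mode=False path. It assumes the layout described in the BD3LM paper and reference code as we read them: during training the input is the noised sequence x_t concatenated with the clean sequence x_0, so the mask is (2 * model_length) by (2 * model_length), and seq_len=2048 corresponds to model_length=1024. Noisy tokens attend within their own block and to clean tokens in strictly earlier blocks; clean tokens attend block-causally among themselves. bd3lm_training_mask is a hypothetical plain-Python helper for readability, not the adapter's code.

```python
def bd3lm_training_mask(model_length: int, block_size: int) -> list[list[bool]]:
    """Sketch of a block-diffusion training mask over the concatenation [x_t ; x_0].

    Positions 0..n-1 hold the noised sequence x_t and positions n..2n-1 hold the
    clean sequence x_0, where n = model_length.
    """
    n = model_length
    size = 2 * n
    mask = [[False] * size for _ in range(size)]
    for q in range(size):
        q_clean = q >= n
        q_block = (q - n if q_clean else q) // block_size
        for k in range(size):
            k_clean = k >= n
            k_block = (k - n if k_clean else k) // block_size
            # Same half, same block: full attention inside the block.
            block_diagonal = q_block == k_block and q_clean == k_clean
            # Noisy query sees clean keys only from strictly earlier blocks.
            offset_block_causal = (not q_clean) and k_clean and k_block < q_block
            # Clean queries attend block-causally within x_0; never to x_t.
            block_causal = q_clean and k_clean and k_block <= q_block
            mask[q][k] = block_diagonal or offset_block_causal or block_causal
    return mask


mask = bd3lm_training_mask(4, 2)
print(len(mask), len(mask[0]))  # 8 8
for row in mask:
    print("".join("1" if allowed else "." for allowed in row))
```

Under this assumption, a model_length that fell back to a different default would build a mask for the wrong length, which matches the failure mode described for the passthrough-attrs change.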

Open item (feedback welcome): verify_models can't currently run BD3LM, since it assumes AutoModelForCausalLM and doesn't have trust_remote_code allowlisted for this model prefix. Fixing that touches shared infra (verify_models.py) rather than just this adapter, so it's scoped out of this PR; parity/cache correctness was verified directly instead (see above). Happy to take this on separately or fold it in here if preferred.

supports_generation = False since BD3LM uses its own diffusion sampling loop, not HF's generate(). Phase 4 doesn't apply, but Phases 1–3 do and were manually verified as above.

Test coverage note: this PR includes unit tests (tests/unit/model_bridge/supported_architectures/test_bd3lm_adapter.py, 17 tests) but not yet a committed integration test at tests/integration/model_bridge/test_bd3lm_adapter.py. Parity was verified via ad-hoc scripts during development rather than a committed test. Happy to add one before merge if preferred.

Type of change

New feature (non-breaking change which adds functionality)

Checklist:

I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility

jlarson4

Great work so far @puranikyashaswin! Just a couple small comments and one code hygiene point below.

hook_attn_out is a dead hook: DelegatedAttentionBlockBridge pops the four input aliases (hook_q_input/hook_k_input/hook_v_input/hook_attn_in) but leaves hook_attn_out pointing at attn.hook_out. Here attn is a SymbolicBridge whose forward raises, so it never runs under delegation. As a result, blocks.{i}.hook_attn_out appears in the registry but silently never fires (no activation, no error). Please either redirect it to attn.o.hook_out (the output of the real attn_out projection, which does fire) or pop it alongside the input aliases.
No committed end-to-end correctness gate: nothing in CI exercises the real DDiTBlock.forward. The only test builds nn.Linear/nn.LayerNorm mocks and never runs a forward pass, there's no integration test, and verify_models can't reach BD3LM (kuleshov-group/ isn't in _BRIDGE_REMOTE_CODE_PREFIXES). So neither the dead hook above nor any numerical drift is currently detectable, and the block-by-block parity in the description isn't reproducible. test_nemotron_h_adapter.py already establishes the pattern to copy: an opt-in, env-var-skippable real-HF test asserting max_diff == 0.0, plus a hook-firing check on the real model. Adding kuleshov-group/ to the remote-code allowlist would also restore the standard verification path.
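One way to catch this class of bug mechanically is to compare each alias's target against the set of hook names that actually fired during a real forward pass (for example, names collected from a run_with_cache run). The sketch assumes the alias table is a plain dict mapping alias name to target hook name; find_dead_aliases and both tables are hypothetical, not the DelegatedAttentionBlockBridge internals.

```python
def find_dead_aliases(aliases: dict[str, str], fired: set[str]) -> list[str]:
    """Return the alias names whose target hook never fired."""
    return sorted(alias for alias, target in aliases.items() if target not in fired)


# Hypothetical per-block alias table after the four input aliases are popped.
aliases = {"hook_attn_out": "attn.hook_out"}  # attn is a SymbolicBridge: never runs
# Hypothetical set of hook names observed firing while DDiTBlock.forward runs.
fired = {"attn.o.hook_out"}

print(find_dead_aliases(aliases, fired))  # ['hook_attn_out']

# Suggested fix: point the alias at the attn_out projection's output, which fires.
aliases["hook_attn_out"] = "attn.o.hook_out"
print(find_dead_aliases(aliases, fired))  # []
```

A check like this only has teeth when the fired set comes from the real model, which is why the hook-firing assertion belongs in a real-HF test rather than a mock-based unit test.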

jlarson4 · 2026-07-02T16:27:52Z

        tl_config.n_layers = source_config.n_layer
    elif hasattr(source_config, "num_hidden_layers"):
        tl_config.n_layers = source_config.num_hidden_layers
+    elif hasattr(source_config, "n_blocks"):


Move the n_blocks elif to the end of the n_layers chain in map_default_transformer_lens_config. It currently sits before num_transformer_layers/num_layers. Since this new case only serves one architecture, it should have lowest precedence.
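A simplified sketch of the requested ordering, assuming the chain begins with the n_layer check shown in the diff context. Because the first matching branch wins, placing n_blocks last means it applies only when no standard attribute is present. map_n_layers is a hypothetical stand-in for the n_layers portion of map_default_transformer_lens_config.

```python
from types import SimpleNamespace


def map_n_layers(source_config):
    """Hypothetical sketch: first matching attribute wins, so order is precedence."""
    if hasattr(source_config, "n_layer"):
        return source_config.n_layer
    elif hasattr(source_config, "num_hidden_layers"):
        return source_config.num_hidden_layers
    elif hasattr(source_config, "num_transformer_layers"):
        return source_config.num_transformer_layers
    elif hasattr(source_config, "num_layers"):
        return source_config.num_layers
    elif hasattr(source_config, "n_blocks"):  # BD3LM-only alias: lowest precedence
        return source_config.n_blocks
    return None


print(map_n_layers(SimpleNamespace(n_blocks=12)))                # 12
print(map_n_layers(SimpleNamespace(num_layers=6, n_blocks=12)))  # 6
```

The second call shows the point of the ordering: an unrelated config that happens to carry an n_blocks attribute still resolves through its standard field.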

…om/puranikyashaswin/TransformerLens into feature/bd3lm-architecture-adapter

puranikyashaswin · 2026-07-03T02:49:45Z

Thanks for the detailed review, @jlarson4! Fixed both:

hook_attn_out: redirected to attn.o.hook_out via hook_alias_overrides. Worth noting explicitly: this captures the raw attn_out projection output, not the gate_msa-scaled value actually added to the residual stream. The gating happens inside a torch.jit.script-fused function with no hookable module boundary, so this is the closest available hook point, not a fully equivalent one. I documented this in a comment at the override site so it's clear to anyone using the hook for activation patching.
Correctness gate: added tests/integration/model_bridge/test_bd3lm_adapter.py following test_nemotron_h_adapter.py's pattern. It loads the real model, asserts max_diff == 0.0 against HF, and verifies hook firing on real activations (confirming that the hook_attn_out fix above actually works, not just structurally). Also added kuleshov-group/ to _BRIDGE_REMOTE_CODE_PREFIXES so verify_models can now reach it.
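A small numeric sketch of the difference between the hooked value and the residual-stream contribution, assuming a DiT-style gated residual in which the attention branch contributes gate_msa * attn_out (dropout omitted, as at eval time). attn_contribution and the numbers are illustrative only; in the real block the gate comes from the adaLN modulation and varies with the diffusion timestep.

```python
def attn_contribution(attn_out: list[float], gate_msa: list[float]) -> list[float]:
    """Elementwise contribution of the attention branch to the residual stream.

    Assumes resid_mid = resid_pre + gate_msa * attn_out (DiT-style gating).
    """
    return [g * a for g, a in zip(gate_msa, attn_out)]


resid_pre = [3.0, 3.0]
attn_out = [1.0, -2.0]  # what a hook on attn.o.hook_out would record
gate_msa = [0.5, 2.0]   # illustrative adaLN gate values for one timestep
contribution = attn_contribution(attn_out, gate_msa)
resid_mid = [r + c for r, c in zip(resid_pre, contribution)]
print(contribution)  # [0.5, -4.0]
print(resid_mid)     # [3.5, -1.0]
```

Under this assumption, patching attn.o.hook_out with a value v shifts the residual stream by gate_msa * v rather than by v, which is the caveat documented at the override site.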

Also fixed the n_blocks elif ordering per your inline comment. Full unit suite (3043 tests) passes clean with no regressions.

One CI note: Notebook Checks (Activation_Patching_in_TL_Demo) is failing, but I verified it passes cleanly both on this branch and on a fresh checkout of dev locally (same pre-existing jupyter_client deprecation warnings either way), so it looks like CI-runner flakiness unrelated to this PR. Let me know if a re-run would help, since I don't have permission to trigger one from a fork.

jlarson4 · 2026-07-03T15:08:14Z

@puranikyashaswin Thank you for the updates! Great work! The demo notebooks can falsely fail in CI due to API limits, so I am rerunning it now. Assuming it passes, I will merge.

puranikyashaswin · 2026-07-04T00:35:47Z

Thank you, @jlarson4. I appreciate the review and the notebook-check rerun. Glad the fixes addressed the concerns. Let me know if there's anything else you'd like me to adjust before merge.

Add BD3LM architecture adapter

ac453b3

puranikyashaswin mentioned this pull request Jul 2, 2026

[Proposal] Add BD3LM block-diffusion adapter (BD3LM) #1473

Open

1 task

Merge branch 'dev' into feature/bd3lm-architecture-adapter

52b9134

jlarson4 reviewed Jul 2, 2026

View reviewed changes

jlarson4 linked an issue Jul 2, 2026 that may be closed by this pull request

[Proposal] Add BD3LM block-diffusion adapter (BD3LM) #1473

puranikyashaswin added 2 commits July 3, 2026 07:39

Address review: fix dead hook_attn_out, elif ordering, add integration test and verify_models support

7e70431

puranikyashaswin wants to merge 4 commits into TransformerLensOrg:dev from puranikyashaswin:feature/bd3lm-architecture-adapter