AIR CLI Integration: `air run` Command Pt. 1 - Add GPU accelerator type and compute config model by riddhibhagwat-db · Pull Request #5602 · databricks/cli

riddhibhagwat-db · 2026-06-15T03:43:27Z

Changes

Adds experimental/air/cmd/compute.go, which contains the gpuType model and the compute-block validation that the air run configuration layer depends on.
Specifically:

adds the training service accelerator types (GPU_1xA10, GPU_8xH100, GPU_1xH100)
parseGPUType resolves a YAML accelerator type string to a gpuType
gpusPerNode returns the per-node partition count based on the type name
computeConfig and validate() port the Python ComputeConfig validators
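
As a rough illustration of the model described in the list above, here is a minimal sketch, not the code in compute.go. Only gpuType8xH100, its count of 8, and the gpusPerNode error string appear in the review snippets on this page; the other constant names, the counts of 1 for the 1x types, and the parseGPUType error message are assumptions.

```go
package main

import "fmt"

// gpuType is a training-service accelerator type as written in the YAML config.
type gpuType string

// Only gpuType8xH100 is named in the review snippets; the other two
// constant names are assumed for illustration.
const (
	gpuType1xA10  gpuType = "GPU_1xA10"
	gpuType1xH100 gpuType = "GPU_1xH100"
	gpuType8xH100 gpuType = "GPU_8xH100"
)

// parseGPUType resolves s with an exact, case-sensitive match.
// The error text here is an assumption.
func parseGPUType(s string) (gpuType, error) {
	switch g := gpuType(s); g {
	case gpuType1xA10, gpuType1xH100, gpuType8xH100:
		return g, nil
	}
	return "", fmt.Errorf("invalid GPU type %q", s)
}

// gpusPerNode returns the per-node count implied by the type name.
// The counts of 1 for the 1x types are read off the names (assumed).
func gpusPerNode(g gpuType) (int, error) {
	switch g {
	case gpuType1xA10, gpuType1xH100:
		return 1, nil
	case gpuType8xH100:
		return 8, nil
	}
	// Not reachable for values returned by parseGPUType; kept as a defensive check.
	return 0, fmt.Errorf("invalid GPU type %q", string(g))
}

func main() {
	for _, s := range []string{"GPU_8xH100", "gpu_8xh100", "h100_80gb", ""} {
		g, err := parseGPUType(s)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		n, _ := gpusPerNode(g)
		fmt.Printf("%s -> %d per node\n", g, n)
	}
}
```

In this sketch, wrong casing, legacy names, and the empty string all fall through the switch and are rejected, which is the behaviour the description asks for.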

Why

This is the first, leaf-most piece of the air run port for the AIR CLI and the root of the config validation layer's dependencies. The compute piece does not depend on anything else, so it lands first as a small, fully unit-tested unit.
Parsing is exact and case-sensitive, since a typo in the user's YAML could otherwise misroute the run. Only GPU_* training service types are supported; legacy MAPI types (e.g. h100_80gb) are no longer supported and are intentionally deprecated in this port, so no new runs can use the MAPI path. Rendering historical runs in get is unaffected, since format.go keeps its own display map for them.

Tests

Table-driven unit tests in compute_test.go: parseGPUType for valid types and rejected inputs (wrong casing, legacy types, unknown, empty); gpusPerNode counts plus its invalid-type error; and computeConfig.validate across valid configs and every failure mode (unknown/legacy type, non-positive count, non-multiple count, dual-pool conflict). go build, go test, and golangci-lint are clean.
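
To make the failure modes listed above concrete, the following is a hedged sketch of a table-driven check written as a standalone program rather than as the actual compute_test.go. The perNode map stands in for parseGPUType plus gpusPerNode. The accelerator_type YAML key, the order of the checks, and the "must be positive" message are assumptions; only the multiple-of message is taken from the review snippet. The dual-pool case is omitted because the pool fields were later removed.

```go
package main

import (
	"fmt"
	"strings"
)

// computeConfig is a pared-down compute block. Field names follow the
// review snippet; the accelerator_type YAML key is an assumption.
type computeConfig struct {
	AcceleratorType string `yaml:"accelerator_type"`
	NumAccelerators int    `yaml:"num_accelerators"`
}

// perNode stands in for parseGPUType plus gpusPerNode
// (the counts for the 1x types are assumed).
var perNode = map[string]int{"GPU_1xA10": 1, "GPU_1xH100": 1, "GPU_8xH100": 8}

// validate checks type, then sign, then divisibility; this ordering is assumed.
func (c computeConfig) validate() error {
	n, ok := perNode[c.AcceleratorType]
	if !ok {
		return fmt.Errorf("invalid GPU type %q", c.AcceleratorType)
	}
	if c.NumAccelerators <= 0 {
		return fmt.Errorf("compute.num_accelerators must be positive, got %d", c.NumAccelerators)
	}
	if c.NumAccelerators%n != 0 {
		return fmt.Errorf("compute.num_accelerators for %s must be a multiple of %d, got %d", c.AcceleratorType, n, c.NumAccelerators)
	}
	return nil
}

// testCase pairs a config with a substring expected in its error ("" means valid).
type testCase struct {
	name    string
	cfg     computeConfig
	wantErr string
}

var cases = []testCase{
	{"valid single node", computeConfig{"GPU_8xH100", 8}, ""},
	{"valid multi node", computeConfig{"GPU_8xH100", 16}, ""},
	{"wrong casing", computeConfig{"gpu_8xh100", 8}, "invalid GPU type"},
	{"legacy type", computeConfig{"h100_80gb", 8}, "invalid GPU type"},
	{"non-positive count", computeConfig{"GPU_1xA10", 0}, "must be positive"},
	{"non-multiple count", computeConfig{"GPU_8xH100", 12}, "multiple of 8"},
}

// passes reports whether validate's result matches the expectation in tc.
func passes(tc testCase) bool {
	err := tc.cfg.validate()
	if tc.wantErr == "" {
		return err == nil
	}
	return err != nil && strings.Contains(err.Error(), tc.wantErr)
}

func main() {
	for _, tc := range cases {
		fmt.Printf("%-20s pass=%v\n", tc.name, passes(tc))
	}
}
```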

Implement the read-only run-details command (renamed from `status` to `get`). It fetches a job run via the Jobs API and renders the run's status, start time, duration, retries, experiment, accelerators, dashboard URL, MLflow deep-link, and a foreach/sweep summary. Output is the air-style {v, ts, data} JSON envelope under -o json, or a text view. Renames the command-level identifiers (status -> get) while keeping the run's "status" field/label. Adds format/mlflow/sweep/output helpers with unit tests and an acceptance test, and drops `get` from the not-implemented stub coverage. Co-authored-by: Isaac

eng-dev-ecosystem-bot · 2026-06-15T04:19:20Z

Integration test report

Commit: 9efd3d1

Run: 27724619645

	Env	🔄flaky	💚RECOVERED	🙈SKIP	✅pass	🙈skip	Time
💚	aws linux		7	14	264	998	6:41
💚	aws windows		7	14	266	996	8:00
💚	aws-ucws linux		7	14	360	912	6:22
💚	aws-ucws windows		7	14	362	910	9:00
💚	azure linux		1	16	267	996	7:02
💚	azure windows		1	16	269	994	7:12
💚	azure-ucws linux		1	16	365	908	8:10
🔄	azure-ucws windows	2	1	16	365	906	8:23
💚	gcp linux		1	16	263	999	7:00
💚	gcp windows		1	16	265	997	9:11

23 interesting tests: 14 SKIP, 7 RECOVERED, 2 flaky

	Test Name	aws linux	aws windows	aws-ucws linux	aws-ucws windows	azure linux	azure windows	azure-ucws linux	azure-ucws windows	gcp linux	gcp windows
💚	TestAccept	💚R	💚R	💚R	💚R	💚R	💚R	💚R	💚R	💚R	💚R
🙈	TestAccept/bundle/invariant/no_drift	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/permissions	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions	💚R	💚R	💚R	💚R	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct	💚R	💚R	💚R	💚R
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform	💚R	💚R	💚R	💚R
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions	💚R	💚R	💚R	💚R	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct	💚R	💚R	💚R	💚R
💚	TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform	💚R	💚R	💚R	💚R
🙈	TestAccept/bundle/resources/postgres_branches/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/recreate	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/replace_existing	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/update_protected	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_branches/without_branch_id	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_endpoints/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_endpoints/recreate	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/postgres_projects/update_display_name	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/synced_database_tables/basic	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/bundle/resources/vector_search_indexes/recreate/embedding_dimension	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🙈	TestAccept/ssh/connection	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S	🙈S
🔄	TestFsCpFileToFile	✅p	✅p	✅p	✅p	✅p	✅p	✅p	🔄f	✅p	✅p
🔄	TestFsCpFileToFile/local_to_uc-volumes	🙈s	🙈s	✅p	✅p	🙈s	🙈s	✅p	🔄f	🙈s	🙈s

Top 29 slowest tests (at least 2 minutes):

duration	env	testname
4:10	gcp windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:10	gcp linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:10	gcp windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:00	gcp linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:28	azure-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:26	aws-ucws windows	TestAccept
3:22	aws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:21	azure-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:19	aws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:19	aws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:04	azure linux	TestSecretsPutSecretStringValue
3:04	aws-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:04	azure-ucws linux	TestSecretsPutSecretStringValue
3:02	aws-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:00	azure windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:59	aws-ucws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:48	azure windows	TestAccept
2:47	azure-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:46	aws windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:45	aws-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:44	azure linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:40	azure-ucws windows	TestAccept
2:39	gcp windows	TestAccept
2:38	aws windows	TestAccept
2:34	azure-ucws linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:34	azure windows	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:24	azure linux	TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:13	gcp linux	TestSecretsPutSecretStringValue
2:02	aws linux	TestSecretsPutSecretStringValue

pardis-beikzadeh-db · 2026-06-16T11:59:03Z

+	NodePoolID      string `yaml:"node_pool_id"`
+	PoolName        string `yaml:"pool_name"`


Hey let's leave out any pool related features from Go port. cc @ben-hansen-db @maggiewang-db I'd cc Yu Peng but he doesn't have a -db GH account?

Removed, thanks!

pardis-beikzadeh-db · 2026-06-16T12:00:33Z

+	perNode, err := gpusPerNode(g)
+	if err != nil {
+		return err
+	}
+	if c.NumAccelerators%perNode != 0 {
+		return fmt.Errorf("compute.num_accelerators for %s must be a multiple of %d, got %d", c.AcceleratorType, perNode, c.NumAccelerators)
+	}


I'm of the opinion this kind of check should be done in the backend. @maggiewang-db @ben-hansen-db @vinchenzo-db wdyt? Can we do that easily using Training Service logic?

I think that based on the project milestones and as I discussed with Maggie yesterday, we want to port this in phases. As written in the project doc, we want to first port the run functionality directly as is (including the validation) and then move the validation & add handlers to the backend in milestone 3.

I agree. But my plan is to do that later in Milestone 3.2, after the initial lift and shift.
It needs some design to decide which validations to move to the backend and which to keep in the client.

sounds good

maggiewang-db · 2026-06-17T05:52:16Z

+	case gpuType8xH100:
+		return 8, nil
+	}
+	return 0, fmt.Errorf("invalid GPU type %q", string(g))


Nit: By the time validate() reaches gpusPerNode(), parseGPUType() has already guaranteed g is valid.
It's ok to leave the code as is to be defensive. Just add a comment this shouldn't be reachable.

Added comment, thanks

Add compute.go: the gpuType model and compute-block validation the upcoming `air run` config layer depends on. Defines the canonical GPU_* accelerator types, parseGPUType (exact, case-sensitive), gpusPerNode (partition counts), and computeConfig.validate (positive count, multiple-of-per-node, mutually exclusive node_pool_id/pool_name). Co-authored-by: Isaac

The training compute config no longer supports pool placement, so remove the node_pool_id and pool_name fields and the validation that rejected setting both. Co-authored-by: Isaac

riddhibhagwat-db added 3 commits June 14, 2026 21:38

experimental/air: rename stale TestBuildStatusData to TestBuildGetData

89042d0

Co-authored-by: Isaac

experimental/air: apply testifylint fixes in get/format tests

c99239c

Co-authored-by: Isaac

riddhibhagwat-db temporarily deployed to test-trigger-is June 15, 2026 03:44 — with GitHub Actions Inactive

riddhibhagwat-db requested review from maggiewang-db and simonfaltum June 15, 2026 03:45

riddhibhagwat-db requested a review from ben-hansen-db June 15, 2026 18:11

pardis-beikzadeh-db reviewed Jun 16, 2026

riddhibhagwat-db temporarily deployed to test-trigger-is June 16, 2026 18:42 — with GitHub Actions Inactive

maggiewang-db approved these changes Jun 17, 2026

pardis-beikzadeh-db approved these changes Jun 17, 2026

riddhibhagwat-db force-pushed the air-integration-m1-1 branch from 7af56f3 to a69e0d3 June 17, 2026 21:03

riddhibhagwat-db added 2 commits June 17, 2026 21:24

experimental/air: drop node pool / pool name compute fields

62be1a1

riddhibhagwat-db force-pushed the air-integration-m2-1 branch from f8477fc to 62be1a1 June 17, 2026 21:26

riddhibhagwat-db changed the base branch from air-integration-m1-1 to air-cli June 17, 2026 22:39

riddhibhagwat-db changed the base branch from air-cli to air-integration-m1-1 June 17, 2026 22:40

Merge branch 'air-integration-m1-1' into air-integration-m2-1

9efd3d1

riddhibhagwat-db temporarily deployed to test-trigger-is June 17, 2026 22:46 — with GitHub Actions Inactive

riddhibhagwat-db wants to merge 6 commits into air-integration-m1-1 from air-integration-m2-1


4 participants
