Only show installable projects in 'databricks labs list'#5560

Open
janniklasrose wants to merge 4 commits into
mainfrom
janniklasrose/labs-list-installable

@janniklasrose janniklasrose commented Jun 11, 2026

Contributor

Changes

`databricks labs list` showed every non-archived, non-fork repository in the databrickslabs GitHub org (currently 39), but only repositories that ship a `labs.yml` manifest can actually be installed (currently 8: blueprint, dlt-meta, dqx, lakebridge, lsql, pylint-plugin, sandbox, ucx). Picking anything else from the list made `databricks labs install` fail with `Error: remote: read labs.yml from GitHub: not found` (the error message is improved separately in #5559), so the listing mostly advertised projects that cannot be installed.

With this change, the listing is narrowed to repositories tagged with the `databricks-cli-installable` topic, currently 3.

Output after (3 entries; the listing previously had 39):

Name           Description
blueprint      Baseline for Databricks Labs projects written in P...
lakebridge     Accelerates migrations to Databricks by automating...
ucx            Automated migrations to Unity Catalog

Tests

Unit tests

This pull request and its description were written by Isaac, an AI coding agent.

'databricks labs list' showed every non-archived, non-fork repository in the databrickslabs GitHub org (currently 39), but only repositories that ship a labs.yml manifest at the root of their release tag can actually be installed (currently 8). Everything else failed 'databricks labs install' with a not-found error. Filter the listing to repositories that have a root labs.yml on their default branch, checked concurrently via raw.githubusercontent.com (not subject to the low unauthenticated GitHub API rate limit) and cached for 24 hours like the repository list itself.

Co-authored-by: Isaac
@github-actions

github-actions Bot commented Jun 11, 2026

Contributor

Approval status: pending

/cmd/labs/ - needs approval

Files: cmd/labs/list.go, cmd/labs/list_test.go, cmd/labs/project/testdata/installed-in-home/.databricks/labs/databrickslabs-repositories.json
Suggested: @asnare
Also eligible: @alexott

General files (require maintainer)

Files: NEXT_CHANGELOG.md
Based on git history:

  • @pietern -- recent work in cmd/labs/, ./

Any maintainer (@andrewnester, @anton-107, @denik, @pietern, @shreyas-goenka, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

Co-authored-by: Isaac
@eng-dev-ecosystem-bot

eng-dev-ecosystem-bot commented Jun 11, 2026

Collaborator

Integration test report

Commit: 14ce2cb

Run: 27653046378

| Env | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
| --- | --- | --- | --- | --- | --- |
| 💚 aws linux | 7 | 15 | 264 | 994 | 6:18 |
| 💚 aws windows | 7 | 15 | 266 | 992 | 8:29 |
| 💚 aws-ucws linux | 7 | 15 | 360 | 908 | 6:35 |
| 💚 aws-ucws windows | 7 | 15 | 362 | 906 | 9:05 |
| 💚 azure linux | 1 | 17 | 267 | 992 | 5:54 |
| 💚 azure windows | 1 | 17 | 269 | 990 | 8:13 |
| 💚 azure-ucws linux | 1 | 17 | 365 | 904 | 7:48 |
| 💚 azure-ucws windows | 1 | 17 | 367 | 902 | 8:56 |
| 💚 gcp linux | 1 | 17 | 263 | 995 | 6:35 |
| 💚 gcp windows | 1 | 17 | 265 | 993 | 9:02 |

22 interesting tests: 15 SKIP, 7 RECOVERED

| Test Name | aws linux | aws windows | aws-ucws linux | aws-ucws windows | azure linux | azure windows | azure-ucws linux | azure-ucws windows | gcp linux | gcp windows |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 💚 TestAccept | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R | 💚 R |
| 🙈 TestAccept/bundle/invariant/no_drift | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/permissions | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions | 💚 R | 💚 R | 💚 R | 💚 R | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 💚 R | 💚 R | 💚 R | 💚 R | | | | | | |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 💚 R | 💚 R | 💚 R | 💚 R | | | | | | |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions | 💚 R | 💚 R | 💚 R | 💚 R | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 💚 R | 💚 R | 💚 R | 💚 R | | | | | | |
| 💚 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 💚 R | 💚 R | 💚 R | 💚 R | | | | | | |
| 🙈 TestAccept/bundle/resources/postgres_branches/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/recreate | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/replace_existing | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/update_protected | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_branches/without_branch_id | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/recreate | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/postgres_projects/update_display_name | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/synced_database_tables/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/vector_search_indexes/basic | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/bundle/resources/vector_search_indexes/grants/select | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |
| 🙈 TestAccept/ssh/connection | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S | 🙈 S |

Top 25 slowest tests (at least 2 minutes):

| duration | env | testname |
| --- | --- | --- |
| 4:32 | gcp windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 4:30 | gcp linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 4:19 | gcp windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 4:14 | gcp linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 3:40 | aws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:26 | aws windows | TestAccept |
| 3:25 | aws-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:25 | azure windows | TestAccept |
| 3:23 | azure linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:22 | aws-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:21 | gcp windows | TestAccept |
| 3:20 | aws-ucws windows | TestAccept |
| 3:20 | aws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:08 | azure windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:08 | azure-ucws windows | TestAccept |
| 3:06 | azure-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 3:03 | azure-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:59 | azure windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:58 | aws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:54 | aws-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:42 | azure-ucws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform |
| 2:41 | aws-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:31 | aws linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:30 | azure-ucws windows | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |
| 2:29 | azure linux | TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct |

@simonfaltum simonfaltum left a comment

Member

Reviewed the full diff plus the supporting packages (localcache, cmd/labs/github, clear_cache.go), and ran an independent second-model pass over the same diff; both converged on the same two issues, so requesting changes for those (details inline):

  1. labs clear-cache does not know about the new cache file.
  2. An offline cold start writes an empty installable cache that then sticks for 24h, and (1) means clear-cache cannot fix it.

Both fixes are small. Two smaller notes inline (changelog wording given #5559 is still open, and a test nit).
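To make issue 2 concrete, here is a minimal sketch of a cache that refuses to persist a failed refresh. It is an in-memory stand-in, not the CLI's file-backed `localcache` package; the names `cache`, `newCache`, `load`, `fresh`, and `errOffline` are hypothetical, and only the 24-hour lifetime comes from this thread.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ttl mirrors the 24h cache lifetime mentioned in this thread.
const ttl = 24 * time.Hour

// errOffline is a hypothetical error standing in for a network failure.
var errOffline = errors.New("offline")

// cache is an in-memory stand-in for an on-disk cache file.
type cache struct {
	names     []string
	refreshed time.Time // zero value means "never populated"
}

func newCache() *cache { return &cache{} }

// fresh reports whether the cache holds data younger than ttl.
func (c *cache) fresh() bool {
	return !c.refreshed.IsZero() && time.Since(c.refreshed) < ttl
}

// load returns the cached names while they are fresh and otherwise calls
// refresh. A failed refresh is never stored, so the next call retries
// instead of serving an empty list for the next 24 hours.
func (c *cache) load(refresh func() ([]string, error)) ([]string, error) {
	if c.fresh() {
		return c.names, nil
	}
	names, err := refresh()
	if err != nil {
		return nil, fmt.Errorf("refresh installable projects: %w", err)
	}
	c.names, c.refreshed = names, time.Now()
	return names, nil
}

func main() {
	c := newCache()

	// Offline cold start: the error is reported and nothing is cached.
	_, err := c.load(func() ([]string, error) { return nil, errOffline })
	fmt.Println(err, c.fresh())

	// Back online: the refresh succeeds and the result is cached.
	names, _ := c.load(func() ([]string, error) {
		return []string{"blueprint", "lakebridge", "ucx"}, nil
	})
	fmt.Println(names, c.fresh())
}
```

This only addresses the sticky empty cache; teaching `labs clear-cache` about the new file (issue 1) is a separate change.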

Checked and found sound: the errgroup filter (writes to distinct slice elements, first-error semantics, limit 10, ctx propagation), preserved ordering and archived/fork semantics, graceful offline behavior when caches exist, the raw.githubusercontent choice and its failure mode (failing loudly beats caching a partial list for 24h), no stale-cache hazard on default_branch (it has been in ghRepo since #914, so old on-disk caches have it), and the test design (the blueprint fixture proof in TestListingWorks is a nice touch). Unit tests for cmd/labs, cmd/labs/github, and cmd/labs/localcache pass locally, including a -race run of the new test.
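For readers without the diff open, the shape of the filter checked above can be sketched as follows. This is a standard-library approximation and not the PR's code: it uses a `sync.WaitGroup` plus a semaphore channel instead of errgroup, leaves out context propagation, and keeps starting probes after a failure. The `repo` type, `probeFunc`, `filterInstallable`, and `errOffline` are hypothetical names; the probe is injected so no network access is needed.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// repo is a hypothetical, trimmed-down repository record.
type repo struct {
	Name          string
	DefaultBranch string
}

// probeFunc reports whether a repository has a root labs.yml. Real code
// would issue an HTTP request; here the check is injected.
type probeFunc func(r repo) (bool, error)

// errOffline is a hypothetical error standing in for a network failure.
var errOffline = errors.New("offline")

// filterInstallable probes every repository with at most limit probes in
// flight, preserves input order, and returns the first recorded error
// rather than a partial list.
func filterInstallable(repos []repo, limit int, probe probeFunc) ([]repo, error) {
	if limit < 1 {
		limit = 1
	}
	keep := make([]bool, len(repos)) // each goroutine writes only its own slot
	sem := make(chan struct{}, limit)
	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for i, r := range repos {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit probes are running
		go func(i int, r repo) {
			defer wg.Done()
			defer func() { <-sem }()
			ok, err := probe(r)
			if err != nil {
				once.Do(func() { firstErr = err })
				return
			}
			keep[i] = ok
		}(i, r)
	}
	wg.Wait()
	if firstErr != nil {
		return nil, firstErr
	}
	var out []repo
	for i, r := range repos {
		if keep[i] {
			out = append(out, r)
		}
	}
	return out, nil
}

func main() {
	repos := []repo{{Name: "blueprint"}, {Name: "brickster"}, {Name: "ucx"}}
	hasManifest := map[string]bool{"blueprint": true, "ucx": true}
	got, err := filterInstallable(repos, 10, func(r repo) (bool, error) {
		return hasManifest[r.Name], nil
	})
	fmt.Println(got, err)
}
```

Writing to distinct slice elements is what keeps the result collection race-free without a mutex, and returning the error instead of a partial list matches the "fail loudly" behavior noted above.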

This review was written by Isaac, an AI coding agent, with an independent second pass by another model.

Comment thread cmd/labs/list.go Outdated
Comment thread cmd/labs/list.go Outdated
Comment thread NEXT_CHANGELOG.md Outdated
Comment thread cmd/labs/list_test.go Outdated

@asnare asnare left a comment

Collaborator

I agree that labs list should only list installable projects, and considered the problem back when I addressed the paging issue.

The problem with testing for labs.yml is the additional REST calls that are required:

  • Before this PR: 1 request (results cached for 24h)
  • After this PR: 1 + N requests, where N is currently 39.

Although this implementation avoids the 60/IP/hour quota on the REST API by hitting the CDN directly, I think a lighter-touch approach would be to filter projects based on repository "topics". I've tagged a few with databricks-cli-installable, for testing purposes. Filtering on this has a few benefits:

  • No additional HTTP requests necessary, repository topics are already included in the response we get.
  • Caching remains simple.
  • On the labs/maintainer side, things become opt-in. At 8/39 I think opt-in is preferable to opt-out.
  • On the labs/maintainer side, turning up on labs list becomes an admin operation.
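As a rough illustration of the topic-based proposal, the filter reduces to a pure function over the repository list that is already cached. The `repo` struct and its JSON field names are assumptions for this sketch, not the CLI's actual types, and `example-topic` is a made-up topic; only the `databricks-cli-installable` topic name comes from this thread.

```go
package main

import "fmt"

// installableTopic is the opt-in topic named in this thread.
const installableTopic = "databricks-cli-installable"

// repo is a hypothetical, trimmed-down view of one entry in the cached
// repository listing.
type repo struct {
	Name     string   `json:"name"`
	Archived bool     `json:"archived"`
	Fork     bool     `json:"fork"`
	Topics   []string `json:"topics"`
}

// hasTopic reports whether the repository carries the given topic.
func hasTopic(r repo, topic string) bool {
	for _, t := range r.Topics {
		if t == topic {
			return true
		}
	}
	return false
}

// installable keeps repositories that are neither archived nor forks and
// that carry the opt-in topic. Input order is preserved and no additional
// HTTP requests are made.
func installable(repos []repo) []repo {
	var out []repo
	for _, r := range repos {
		if r.Archived || r.Fork {
			continue
		}
		if hasTopic(r, installableTopic) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	repos := []repo{
		{Name: "blueprint", Topics: []string{installableTopic}},
		{Name: "brickster"},
		{Name: "ucx", Topics: []string{"example-topic", installableTopic}},
	}
	for _, r := range installable(repos) {
		fmt.Println(r.Name)
	}
}
```

Because the decision depends only on data in the single listing response, the existing 24-hour cache of that response covers it and no second cache file is needed.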

Before reviewing the technical implementation I'd like to get consensus on this.

P.S. I also rejected using the GraphQL API as a solution to detect the presence of labs.yml: calls need to be authenticated, and the quota system would still make it costly.

@janniklasrose

Contributor Author

@asnare I like the databricks-cli-installable tag idea, it's simple (the diff now looks much cleaner). Any concern with tagging the other 5 installable repos before we proceed here? Do we need alignment from repo owners, or is the fact that these repos already support labs install enough to warrant the tag?

artchen-db pushed a commit to artchen-db/cli that referenced this pull request Jun 18, 2026
…cks#5559)

## Changes

`databricks labs install <name>` only works for repositories in the
databrickslabs GitHub org that ship a `labs.yml` manifest at the
repository root. Most repositories in the org do not (they are libraries
published to CRAN, PyPI, Maven, etc.), and installing one of them fails
with:

```
Error: remote: read labs.yml from GitHub: not found
```

which gives the user no clue what is wrong.

Detect the not-found case when fetching `labs.yml` and return an
actionable error instead:

```
Error: remote: databrickslabs/brickster@v0.2.13 does not provide labs.yml (not found); this project cannot be installed with the Databricks CLI, see https://github.com/databrickslabs/brickster for instructions
```

This also covers projects without any GitHub release, where the version
resolves to the literal ref `latest` and the fetch 404s the same way.

While PR databricks#5560 limits `databricks labs list` to those that are
installable, users can still provide a lab name not in that list. Thus,
this PR is still relevant for graceful error handling.
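A minimal sketch of the error translation described above, assuming the fetch layer exposes a not-found sentinel. `errNotFound`, `errTimeout`, and `explainMissingManifest` are hypothetical names and not the CLI's API; only the message wording is taken from the example output, and the `Error: remote:` prefix is assumed to be added by callers further up.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel for a 404 from the fetch layer.
var errNotFound = errors.New("not found")

// errTimeout is a hypothetical unrelated failure, used to show that other
// errors pass through unchanged.
var errTimeout = errors.New("timeout")

// explainMissingManifest turns a not-found error for labs.yml into an
// actionable message and returns every other error (including nil) as is.
func explainMissingManifest(org, repo, version string, err error) error {
	if !errors.Is(err, errNotFound) {
		return err
	}
	return fmt.Errorf(
		"%s/%s@%s does not provide labs.yml (%w); this project cannot be installed with the Databricks CLI, see https://github.com/%s/%s for instructions",
		org, repo, version, err, org, repo,
	)
}

func main() {
	err := explainMissingManifest("databrickslabs", "brickster", "v0.2.13", errNotFound)
	fmt.Println(err)
	fmt.Println(explainMissingManifest("databrickslabs", "brickster", "v0.2.13", errTimeout))
}
```

Wrapping with `%w` keeps the original error reachable through `errors.Is`, so callers that already check for the not-found case continue to work.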

## Tests

New unit test simulating a project whose release tag has no `labs.yml`
(`cmd/labs/project/fetcher_test.go`). Verified live: `databricks labs
install brickster` produces the message above.

This pull request and its description were written by Isaac, an AI
coding agent.

4 participants