databricks_connect_cluster_id is ignored — Databricks Connect always forces serverless compute #5842

@siddharth-sc1903

Summary

When SQL models run through the Databricks SQL Connector and DataFrame (Python) models run through Databricks Connect (i.e. databricks-connect is installed and force_databricks_connect is not set), the databricks_connect_cluster_id connection option is silently ignored. Databricks Connect always connects to serverless compute, regardless of configuration, and setting databricks_connect_use_serverless: false has no effect.

Environment

  • sqlmesh: 0.230.0 (also present on current main)
  • databricks-connect: 17.0.0
  • Engine: databricks gateway, http_path pointing to a serverless SQL Warehouse, databricks_connect_cluster_id set to an all-purpose cluster id.

Possible root cause

DatabricksEngineAdapter._set_spark_engine_adapter_if_needed selects serverless based on key presence rather than the option's value:

# sqlmesh/core/engine_adapter/databricks.py
if "databricks_connect_use_serverless" in self._extra_config:   # <-- key-presence check
    connect_kwargs["serverless"] = True
else:
    connect_kwargs["cluster_id"] = self._extra_config["databricks_connect_cluster_id"]

Observed end-to-end: a Python model returning a DataFrame is executed on serverless compute (cluster id like 0616-043812-xxxxxxxx-v2n) even though databricks_connect_cluster_id points at an all-purpose cluster.
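
To make the suspected behaviour concrete, here is a minimal standalone sketch that reproduces the selection logic quoted above next to one possible value-based alternative. It assumes the extra config behaves like a plain dict and that the databricks_connect_use_serverless key is present with the value False (for example, filled in from a default); neither assumption is confirmed here. The function names and the cluster id are hypothetical and not part of sqlmesh.

```python
from typing import Any, Dict


def connect_kwargs_key_presence(extra_config: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror the quoted snippet: serverless is chosen whenever the key exists."""
    kwargs: Dict[str, Any] = {}
    if "databricks_connect_use_serverless" in extra_config:
        kwargs["serverless"] = True
    else:
        kwargs["cluster_id"] = extra_config["databricks_connect_cluster_id"]
    return kwargs


def connect_kwargs_value_check(extra_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical alternative: serverless is chosen only when the value is truthy."""
    kwargs: Dict[str, Any] = {}
    if extra_config.get("databricks_connect_use_serverless"):
        kwargs["serverless"] = True
    else:
        kwargs["cluster_id"] = extra_config["databricks_connect_cluster_id"]
    return kwargs


# Assumed shape of the extra config; the cluster id is a placeholder.
config = {
    "databricks_connect_use_serverless": False,
    "databricks_connect_cluster_id": "hypothetical-cluster-id",
}

print(connect_kwargs_key_presence(config))  # serverless, despite the False value
print(connect_kwargs_value_check(config))   # the configured cluster id
```

If the key is in fact always present in the adapter's extra config, a value-based check along these lines would let databricks_connect_cluster_id take effect whenever serverless is not explicitly enabled.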
