Summary
When using the Databricks SQL Connector for SQL models and Databricks Connect for DataFrame (Python) models (i.e. databricks-connect installed, force_databricks_connect not set), the databricks_connect_cluster_id connection option is silently ignored: Databricks Connect always connects to serverless compute, regardless of configuration. Setting databricks_connect_use_serverless: false has no effect.
Environment
- sqlmesh: 0.230.0 (also present on current main)
- databricks-connect: 17.0.0
- Engine: databricks gateway, http_path pointing to a serverless SQL Warehouse, databricks_connect_cluster_id set to an all-purpose cluster id.
Possible root cause
DatabricksEngineAdapter._set_spark_engine_adapter_if_needed selects serverless based on key presence rather than the option's value:
    # sqlmesh/core/engine_adapter/databricks.py
    if "databricks_connect_use_serverless" in self._extra_config:  # <-- key-presence check
        connect_kwargs["serverless"] = True
    else:
        connect_kwargs["cluster_id"] = self._extra_config["databricks_connect_cluster_id"]
Observed end-to-end: a Python model returning a DataFrame is executed on serverless compute (cluster id like 0616-043812-xxxxxxxx-v2n) even though databricks_connect_cluster_id points at an all-purpose cluster.
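To illustrate the suspected difference, here is a minimal standalone sketch, not the actual sqlmesh code or a proposed patch. The function names and the placeholder cluster id are hypothetical, and it assumes the databricks_connect_use_serverless key ends up in the config dict with a False value (for example as a populated default), which would explain why serverless is always chosen.

```python
from typing import Any, Dict


def connect_kwargs_by_key_presence(extra_config: Dict[str, Any]) -> Dict[str, Any]:
    """Mirrors the check quoted above: any value, even False, selects serverless."""
    kwargs: Dict[str, Any] = {}
    if "databricks_connect_use_serverless" in extra_config:
        kwargs["serverless"] = True
    else:
        kwargs["cluster_id"] = extra_config["databricks_connect_cluster_id"]
    return kwargs


def connect_kwargs_by_value(extra_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical alternative: select serverless only when the option is truthy."""
    kwargs: Dict[str, Any] = {}
    if extra_config.get("databricks_connect_use_serverless"):
        kwargs["serverless"] = True
    else:
        kwargs["cluster_id"] = extra_config["databricks_connect_cluster_id"]
    return kwargs


# Assumed shape of the extra config: the key is present but set to False.
config = {
    "databricks_connect_use_serverless": False,
    "databricks_connect_cluster_id": "my-all-purpose-cluster",  # placeholder id
}

print(connect_kwargs_by_key_presence(config))  # {'serverless': True}
print(connect_kwargs_by_value(config))  # {'cluster_id': 'my-all-purpose-cluster'}
```

If that assumption holds, a value-based check along these lines would let databricks_connect_cluster_id take effect whenever serverless is not explicitly enabled.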