# Databricks

As a best practice, generate a dedicated Databricks Principal Service scoped only for your chosen catalog **(Hive or Unity Catalog)** and use it against your workspace host. This isolates your integration credentials and simplifies permission management, ensuring a seamless no-code connection.

### Prerequisites

* **Host**
* **M2M OAuth credentials or Personal Access Token**
* *(Conditional)* **Hive catalog**:
  * **SQL Warehouse HTTP Path** *or* **Cluster ID**
* *(Conditional)* **Unity Catalog**:
  * **SQL Warehouse HTTP Path**

***

### Host

Your **Host** is the domain portion of your Databricks workspace URL.

{% hint style="info" %}
When you log into Databricks, your browser URL looks like:\
\
`https://<workspace-name>.cloud.databricks.com/?o=1234567890123456`\
\
Use only `<workspace-name>.cloud.databricks.com` (no protocol, no path, no query string).

* ✅ `mycompany.cloud.databricks.com`
* ❌ `https://mycompany.cloud.databricks.com`
* ❌ `mycompany.cloud.databricks.com/?o=1234567890123456`<br>
  {% endhint %}

***

### M2M OAuth credentials

Databricks recommends using [service principals](https://docs.databricks.com/aws/en/admin/users-groups/service-principals) for machine-to-machine access.&#x20;

**How to create new service principal and generate credentials:**

1. Click your user avatar (top-right) -> **Settings** -> **Identity and access** -> **Service Principals.**
2. Click on **Add service principal.**
3. Select an existing service principal and assign it to the workspace, or create a new one.
4. Assign the appropriate permissions to the new service principal, including access to the required catalog(s).
5. On page for newly created service principal, open the **Secrets** tabs and click **Generate secret**.
6. Copy the generated `client_id` and `secret`, and store them securely.

### Permissions

After creating the service principal, grant it access to the SQL warehouse and the required Unity Catalog objects.

#### SQL Warehouse

Grant the service principal permission to use the SQL warehouse that will run queries.

1. In Databricks, please go to **SQL Warehouses**.
2. Select the warehouse to be used by the Connecty.
3. Open **Permissions**.
4. Add created Connecty service principal.
5. Grant `CAN_USE` permission.

#### Unity Catalog

Grant access to the required catalog and schema.

```
GRANT USE CATALOG ON CATALOG <CATALOG> TO `connectyai_service_principal`;
GRANT USE SCHEMA  ON SCHEMA  <CATALOG>.<SCHEMA> TO `connectyai_service_principal`;
GRANT SELECT ON SCHEMA <CATALOG>.<SCHEMA> TO `connectyai_service_principal`;
```

#### Query History

To allow Connecty read SQL query history, please grant the following permissions

```
GRANT USE CATALOG ON CATALOG system TO `connectyai_service_principal`;
GRANT USE SCHEMA  ON SCHEMA  system.query TO `connectyai_service_principal`;
GRANT SELECT ON TABLE system.query.history TO `connectyai_service_principal`;
```

### Personal Access Token

Usage of **Personal Access Tokens** in Databricks is currently in legacy mode. Connecty will be supporting this method of authentication as long as Databricks does. Though for new integrations using Service Principals and M2M OAuth is preferred.

**How to generate a token:**

1. Click your user avatar (top-right) → **User Settings** → **Access Tokens**.
2. Click **Generate New Token**, give it a name, set an expiration.
3. Copy the token value — you won’t see it again.

:warning: **Warning:**

* Treat this token like a password. Don’t check it into source control.
* Store it in environment variables or a secure vault.
* ✅ `dapiXXXXXXXXXXXXXXXXXXXX`  - expected format of Databricks access token.
* ❌ *(leaving blank)* — connection will fail.

***

### Catalog Selection (Conditional Fields)

Databricks supports two catalog types. Fill only the fields required for your chosen catalog.

#### Hive Catalog

Use Hive catalog when connecting to classic clusters **or** SQL Warehouses without Unity Catalog.

* **If you’re using a SQL Warehouse**, supply the **SQL Warehouse HTTP Path**:**Tip:**
  1. In Databricks UI, go to **SQL** → **SQL Warehouses**.
  2. Click your warehouse → **Connection Details** → copy the **JDBC/ODBC HTTP Path**.

{% hint style="info" %}

* ✅ `/sql/1.0/warehouses/abcdef1234567890`  - expected HTTP path format.
* ❌ `(cluster IT path or cluster ID)`&#x20;
  {% endhint %}

- **If you’re using a standard (compute) cluster**, supply the **Cluster ID** instead:**Tip:**
  1. In Databricks UI, go to **Compute** → **Clusters**.
  2. Click your cluster name → copy the **Cluster ID** from the URL or details.

{% hint style="info" %}

* ✅ `1234-567890-abcd123`  - expected cluster ID.
* ❌ `/sql/1.0/warehouses/abcdef1234567890` - do not use HTTP path as cluster ID.
  {% endhint %}

⚠️ **Warning:**

* Provide **either** SQL Warehouse HTTP Path **or** Cluster ID — not both.
* If both are filled, the connection may default to the wrong endpoint.

#### Unity Catalog

Use Unity Catalog when your data lives under the Databricks Unity Catalog model.

* **SQL Warehouse HTTP Path** is required.

{% hint style="info" %}
In **SQL** → **SQL Warehouses**, select a warehouse that’s enabled for Unity Catalog, then copy its **HTTP Path** as above.
{% endhint %}

* ❌ Do **not** enter a Cluster ID when using Unity Catalog.

***

### Putting It All Together

Below is an example configuration for each scenario:

#### Example: Hive Catalog + SQL Warehouse

```
host: mycompany.cloud.databricks.com
token: dapiXXXXXXXXXXXXXXXXXXXX (or client_id and secret for M2M OAuth)
sql_warehouse_http_path: /sql/1.0/warehouses/abcdef1234567890
cluster_id: # leave blank
```

#### Example: Hive Catalog + Standard Cluster

```
host: mycompany.cloud.databricks.com
token: dapiXXXXXXXXXXXXXXXXXXXX (or client_id and secret for M2M OAuth)
sql_warehouse_http_path: # leave blank
cluster_id: 1234-567890-abcd123
```

#### Example: Unity Catalog

```
host: mycompany.cloud.databricks.com
token: dapiXXXXXXXXXXXXXXXXXXXX (or client_id and secret for M2M OAuth)
sql_warehouse_http_path: /sql/1.0/warehouses/uvwxyz9876543210
cluster_id: <left empty>
```

***

### Query History support for [Day Zero Semantic Layer](/documentation/usage-guides/context-engine/day-zero-semantic-layer.md)

Connecty generally uses SQL query history to provide a high-quality semantic layer. For Databricks connections, Connecty synchronises query history based on SQL queries executed via Databricks SQL Warehouses. Fetching and parsing SQL queries executed inside standard Databricks jobs is currently not supported.

***

### Unity Catalog Sync

Connecty support synchronising its bespoke semantic layer back into Databricks Unity Catalog metric views through Context Engine Export process. This chapter focuses on the required configuration and permissions for enabling the export. For more details about the export process itself, refer to [Databricks Unity Catalog Sync](/documentation/getting-started/data-connections-guide/databricks/databricks-unity-catalog-sync.md).

#### Permissions

Context Engine Export process writes Connecty semantic layer entities into Unity Catalog as metric views. To enable this, ensure that the configured service principal has write permissions on the target scope.

#### Configuration

<figure><img src="/files/CtCDI5DIdtLF5j2Xb0ia" alt=""><figcaption></figcaption></figure>

Connecty exports semantic layer entities into a specified Databricks schema (`catalog`.`schema`). Before configuring the export, it is recommended to:

1. Verify that the service principal has write permissions for selected schema.
2. Ensure that the selected schema is empty (optional but recommended).

Write permissions for service principal can be configured using the following commands&#x20;

```
GRANT USE CATALOG ON CATALOG <CATALOG> TO `connectyai_service_principal`;
GRANT USE SCHEMA  ON SCHEMA  <CATALOG>.<SCHEMA> TO `connectyai_service_principal`;

GRANT MODIFY ON SCHEMA <CATALOG>.<SCHEMA> TO `connectyai_service_principal`;
GRANT CREATE TABLE ON SCHEMA <CATALOG>.<SCHEMA> TO `connectyai_service_principal`;
```

**Selecting Data Workspaces**

A single Data Connection in Connecty may be used by multiple Data Workspaces. In the `Export Configuration`, you can select which Data Workspaces should be included in the export. For each Data Workspace, you can choose one of the following export classes:

* `All entities` - all semantic layer entities will be exported.
* `Verified only` - only verified entities from semantic layer will be exported in that Data Workspace.

**Handling Naming Conflicts**

Because multiple Data Workspaces can export into the same target schema, naming conflicts may occur (for example, two Workspaces may have a subject named `Products`). Connecty provides two strategies to avoid these conflicts:

* `Prefix object names` - in this strategy Data Workspace identifier will be put as exported metric view name prefix. It would of the form `catalog.schema.<dw_id>_<subject_name>`.
* `Create a per-workspace schema` - in this strategy Connecty will create a new schema for each Data Workspace selected for the Export process. In this strategy export metric view names would of the form `catalog.schema_<dw_id>.<subject_name>`. This strategy requires that configure service principal has permission for creating new schemas in given catalog.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://connectyai.gitbook.io/documentation/getting-started/data-connections-guide/databricks.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
