Databricks Unity Catalog Sync

"Day Zero Semantic Layer" for Unity Catalog Metric Views

Overview

Connecty is an agentic AI engine that captures business logic for Databricks. It observes how your data is structured, how it is used, and what it is for, then automates the creation of a governed semantic layer.

Connecty turns raw signals—schemas, metadata, and millions of historical queries—into verified Unity Catalog Metric Views and documentation. It accelerates the adoption of AI/BI Genie, Dashboards, and SQL Alerts by populating the semantic layer in minutes, not months.

The Problem

Databricks customers often have well-modeled tables, but lack the semantic definitions required for AI and Self-Service BI.

  • Unity Catalog documentation is often incomplete.

  • Metric View YAML definitions are complex and manually written.

  • Vital business logic is trapped in ad-hoc SQL, limiting reusability.

Manual semantic modeling is slow and error-prone. As a result, powerful features like AI/BI Genie remain underutilized because they lack the "context" to answer questions accurately.

Connecty’s Approach

Connecty acts as an intelligence layer that sits on top of Databricks. It does not replace Unity Catalog; it populates it.

Inputs (Read-Only)

Connecty analyzes a broad spectrum of signals to understand business intent:

  • Databricks Signals: table schemas, column statistics, query history, and usage patterns.

  • Contextual Signals: business goals, North Star KPIs, and public business context.

Outputs (Write-Back)

  • Unity Catalog Documentation: Human-readable descriptions and context (see the write-back sketch after this list).

  • Unity Catalog Metric Views: Fully defined YAML (Measures, Dimensions, Joins) written to a dedicated schema.
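As an illustration of the documentation write-back, Connecty-generated descriptions can be applied as standard Unity Catalog comments. This is a minimal sketch; the table, column, and comment text are hypothetical:

```sql
-- Hypothetical write-back of generated documentation as standard
-- Unity Catalog comments (table and column names are illustrative).
COMMENT ON TABLE sales.core.orders IS
  'One row per customer order. Primary source table for revenue reporting.';

ALTER TABLE sales.core.orders
  ALTER COLUMN trx_dt COMMENT 'Revenue Timestamp: date the transaction was booked.';
```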


How It Works: The Secure Pipeline

1. Secure Connection & PII Shield

Connecty connects directly to Databricks. During the initial handshake, the AI agent performs a PII Detection Scan on schema and column names (a sketch follows the list below).

  • User Verification: The user verifies flagged PII columns.

  • Privacy First: If a column is flagged as PII, Connecty skips data profiling and statistical collection for that column entirely. No sensitive row-level data is read or egressed.
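A name-based scan of this kind can be sketched in plain SQL over Unity Catalog metadata; the regex and scope below are illustrative stand-ins for Connecty's AI-assisted detection:

```sql
-- Illustrative name-based PII scan: reads column *names* only, never row data.
-- Matches are surfaced for user verification and excluded from profiling.
SELECT table_catalog, table_schema, table_name, column_name
FROM system.information_schema.columns
WHERE column_name RLIKE '(?i)(ssn|email|phone|dob|passport|address)'
ORDER BY table_catalog, table_schema, table_name;
```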

2. Noise-Free Logic Extraction

Connecty’s engine is stress-tested on 100M+ queries. It ingests your Query History but applies AI-Assisted Filtering to separate signal from noise.

  • Filters Out: Failed queries, syntax errors, SELECT * explorations, and one-off ad-hoc tests.

  • Identifies: Recurring business logic, complex joins used by power users, and high-value filters.
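A simplified first pass at this filtering can be expressed against the system.query.history system table; the thresholds and heuristics here are illustrative stand-ins for the AI-assisted filter:

```sql
-- Simplified stand-in for AI-assisted filtering: keep successful,
-- recurring, non-trivial statements from the recent query history.
SELECT statement_text, COUNT(*) AS run_count
FROM system.query.history
WHERE execution_status = 'FINISHED'          -- drops failed queries and syntax errors
  AND statement_text NOT ILIKE 'SELECT *%'   -- drops bare SELECT * explorations
  AND start_time >= current_timestamp() - INTERVAL 90 DAYS
GROUP BY statement_text
HAVING COUNT(*) >= 5                         -- keeps recurring patterns only
ORDER BY run_count DESC;
```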

3. Agentic Semantic Reasoning

The AI agent synthesizes these signals into candidate Metric Views (sketched after this list).

  • Structure Compliance: The agent is constrained to generate only valid Unity Catalog Metric View structures. It translates business logic strictly into what Metric Views allow (e.g., separating Measures from Dimensions).

  • Context Injection: It maps technical columns to business meaning (e.g., mapping trx_dt to "Revenue Timestamp").
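A candidate definition might look like the following, assuming the standard Unity Catalog Metric View DDL; the source table, names, and expressions are hypothetical:

```sql
-- Hypothetical candidate Metric View: Measures and Dimensions are kept
-- strictly separate, and trx_dt surfaces as "Revenue Timestamp".
CREATE VIEW sales.connecty_semantic_layer.revenue_metrics
WITH METRICS
LANGUAGE YAML
AS $$
version: 0.1
source: sales.core.orders
dimensions:
  - name: Revenue Timestamp
    expr: trx_dt
  - name: Region
    expr: region_code
measures:
  - name: Total Revenue
    expr: SUM(order_amount)
$$;
```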

4. Mandatory Human-in-the-Loop Verification

Connecty operates on a "Verify, then Deploy" principle.

  • The AI proposes definitions.

  • The Data Owner reviews the logic via an assisted interface.

  • Only verified objects are approved for sync. Unverified drafts remain in staging.

5. Governed Writeback & Continuous Learning

  • Dedicated Schema: Final verified assets are written to a dedicated schema (e.g., connecty_semantic_layer), ensuring existing production tables are never overwritten; see the sketch after this list.

  • Versioning: Full auditability is maintained. You can roll back to previous definitions at any time.

  • Continuous Sync: If a human manually edits a definition in Unity Catalog, Connecty’s system learns from that change, updating its internal model to prevent regression.
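As a sketch of the write-back boundary (catalog and schema names are hypothetical), all Connecty output is confined to its own schema:

```sql
-- Connecty writes only inside its dedicated schema; production schemas
-- are never modified. Rolling back is a CREATE OR REPLACE of a view
-- with a previously versioned YAML definition.
CREATE SCHEMA IF NOT EXISTS sales.connecty_semantic_layer
COMMENT 'Connecty-managed semantic layer: verified Metric Views and documentation';
```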

Key Capabilities

Enterprise Cost Controls

Connecty includes built-in cost limits. The profiling engine is highly optimized to use existing Unity Catalog statistics (ANALYZE TABLE data) wherever possible, minimizing the need for expensive compute on your SQL Warehouses.
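For instance, the profiler can rely on statistics Databricks has already computed rather than rescanning data; a sketch with illustrative names, assuming statistics were collected via ANALYZE TABLE:

```sql
-- Compute (or refresh) column statistics once; profiling reuses them cheaply.
ANALYZE TABLE sales.core.orders COMPUTE STATISTICS FOR ALL COLUMNS;

-- Read existing column-level stats (min/max, null count, distinct count)
-- instead of running full-scan profiling queries.
DESCRIBE EXTENDED sales.core.orders order_amount;
```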

"Day Zero" to "Day 500" Support

  • Day Zero: For new projects, Connecty establishes a pristine semantic layer instantly.

  • Day 500: For existing lakehouses, Connecty mines years of history to retrofit a semantic layer onto legacy data, cleaning up technical debt.

Governed by Design

  • No Data Egress: Data remains in your data plane. Only metadata and aggregated profiles are processed.

  • Standard Compliance: Outputs are standard Unity Catalog YAML. There is no vendor lock-in; if you stop using Connecty, your Metric Views remain yours.

Unlocking Databricks AI/BI

With Connecty populating the semantic layer:

  • AI/BI Genie: "Trusted Assets" are pre-populated, dramatically increasing answer accuracy.

  • Dashboards: Metrics are defined once, eliminating "formula drift" across reports (see the query sketch after this list).

  • SQL Alerts: Monitoring runs on governed definitions, reducing false positives.
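For example, every consumer resolves the same governed definition. Assuming the Metric View sketched earlier and the standard MEASURE() aggregation syntax:

```sql
-- "Total Revenue" is defined once in the Metric View; Genie, dashboards,
-- and SQL Alerts all aggregate it through MEASURE(), so the formula
-- cannot drift between reports.
SELECT
  `Revenue Timestamp`,
  MEASURE(`Total Revenue`) AS total_revenue
FROM sales.connecty_semantic_layer.revenue_metrics
GROUP BY `Revenue Timestamp`;
```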

FAQ

Q: Does Connecty read my sensitive data?

A: No. Connecty relies on metadata and statistical profiles. During setup, PII detection ensures that we skip profiling for any sensitive columns. We do not read or move row-level PII data.

Q: My Query History is messy (bad joins, failed queries). Will this generate "garbage" metrics?

A: No. Our engine is trained on 100M+ queries to identify and discard "garbage" queries. We filter out errors and ad-hoc noise, focusing only on recurring, successful patterns. Furthermore, the mandatory Human Verification step ensures no bad logic reaches production.

Q: Will Connecty overwrite my existing production work?

A: No. Connecty writes to a dedicated schema/catalog managed by the application. It respects your existing production assets. Additionally, full versioning allows you to audit and revert changes.

Q: How does Connecty affect my Databricks compute costs?

A: We prioritize cost efficiency. Connecty allows you to set cost limits. Our agent leverages existing metadata stats first and runs highly optimized profiling queries only when necessary, preventing unexpected DBU spikes.
