Plainsight

DOCUMENTATION

How to Use It

Complete setup guide for installing and configuring the Data Quality Manager on your Databricks workspace.

01

Prerequisites

Before you begin, ensure your Databricks tenant meets these requirements.

Workspace requirements

  • Databricks workspace with Unity Catalog enabled
  • Databricks Runtime 13.3 LTS or later
  • Permission to create Schemas and Databricks Apps
  • Permission to create and run Databricks Jobs
  • Access to at least one SQL Warehouse
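
The runtime requirement can also be checked in code, for example at the top of a notebook. The sketch below is a minimal illustration, assuming the version arrives as a string such as "13.3" or "13.3.x-scala2.12"; meets_minimum_runtime is a hypothetical helper and not part of the app.

```python
import re

MINIMUM_RUNTIME = (13, 3)  # Databricks Runtime 13.3 LTS, per the list above


def meets_minimum_runtime(version: str, minimum: tuple = MINIMUM_RUNTIME) -> bool:
    """Return True if a runtime version string is at least `minimum`.

    Only the leading "major.minor" digits are compared; any suffix such as
    ".x-scala2.12" is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognised runtime version: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum
```

Comparing integer tuples rather than strings avoids the trap where "9.1" sorts after "13.3" lexically.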

Required permissions

  • CREATE SCHEMA on catalog: App auto-creates the dataquality schema and Delta tables on first startup
  • CAN_USE on SQL Warehouse: Execute metadata queries and fallback validation
  • CAN_MANAGE_RUN on Job: App triggers the validation job via SDK
  • USE CATALOG / USE SCHEMA: Read table metadata for the catalog browser
  • SELECT on tables: Run data quality checks against actual table data

02

Schema and Delta Tables (auto-created)

No manual SQL is required for storage setup. On first startup the app automatically creates the dataquality schema and the four Delta tables it needs inside your catalog.

The tables created are: dq_expectation_suites, dq_expectations, dq_validation_runs, and dq_validation_results — all in <catalog>.dataquality. They are created with CREATE TABLE IF NOT EXISTS, so restarting the app is always safe.
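
To confirm the result after the first startup, it helps to have the fully qualified object names at hand. The sketch below only derives those names from the catalog; it assumes the schema is created with an IF NOT EXISTS statement like the tables are, and it deliberately omits column definitions because they are not documented here. Both function names are hypothetical.

```python
SCHEMA_NAME = "dataquality"
TABLE_NAMES = (
    "dq_expectation_suites",
    "dq_expectations",
    "dq_validation_runs",
    "dq_validation_results",
)


def schema_ddl(catalog: str) -> str:
    """Idempotent schema statement (assumed form); safe to run on every startup."""
    return f"CREATE SCHEMA IF NOT EXISTS `{catalog}`.`{SCHEMA_NAME}`"


def expected_tables(catalog: str) -> list:
    """Fully qualified names of the four tables the app creates."""
    return [f"{catalog}.{SCHEMA_NAME}.{table}" for table in TABLE_NAMES]
```

Calling expected_tables with your catalog name gives a checklist to compare against what the catalog browser shows.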

The app service principal must have CREATE SCHEMA permission on the catalog to create the dataquality schema automatically. Grant this in Step 04 after the app is first deployed and the service principal name is known.

03

Create the Validation Job

The validation job runs Notebook #4 on a Databricks cluster. It accepts a single SUITE_NAME parameter and executes the full validation pipeline.

Steps in the Databricks UI

  • Navigate to Workflows → Jobs → Create Job
  • Set the job name (e.g. DQ Validation Runner)
  • Add a Notebook task pointing to: notebooks/4. Run 1 expectation suite.py
  • Add a notebook parameter: key = SUITE_NAME, default = (leave blank)
  • Configure the cluster (use DBR 13.3 LTS or later)
  • Save the job and note down the Job ID from the URL

The app service principal must have CAN_MANAGE_RUN permission on the job. Go to the job's Permissions settings and add the service principal with that role.
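
To test the job independently of the app, you can send the same kind of request yourself. The sketch below builds a request body for the Jobs API run-now endpoint (POST /api/2.1/jobs/run-now), passing SUITE_NAME as a notebook parameter. It is an illustration of the request shape, not the app's actual code, and build_run_now_payload is a hypothetical helper.

```python
import json


def build_run_now_payload(job_id: str, suite_name: str) -> str:
    """JSON body for a Jobs API run-now request for one expectation suite."""
    if not suite_name:
        raise ValueError("SUITE_NAME must not be empty")
    body = {
        "job_id": int(job_id),  # the Jobs API expects a numeric job ID
        "notebook_params": {"SUITE_NAME": suite_name},
    }
    return json.dumps(body)
```

A caller that lacks CAN_MANAGE_RUN on the job will have this request rejected, which makes it a quick way to verify the permission.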

04

Grant Permissions to the App Service Principal

When you deploy a Databricks App, a service principal is automatically created. Grant it the permissions it needs to access your catalog.

sql
-- Allow the app to auto-create the dataquality schema and Delta tables
GRANT CREATE SCHEMA ON CATALOG <catalog> TO `<app-service-principal>`;

-- Grant catalog browser and data access.
-- Privileges granted on a catalog are inherited by all current and future
-- schemas and tables in it, so catalog-level grants are sufficient.
GRANT USE CATALOG ON CATALOG <catalog> TO `<app-service-principal>`;
GRANT USE SCHEMA ON CATALOG <catalog> TO `<app-service-principal>`;
GRANT SELECT ON CATALOG <catalog> TO `<app-service-principal>`;

Find the service principal name under Settings → Compute → Apps after first deployment. It follows the naming convention databricks-app-<app-name>.
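
When the same grants have to be repeated per environment (resources_dev, resources_prod, and so on), rendering them from one template avoids copy-paste mistakes. The sketch below is a minimal illustration that assumes catalog-level grants, which Unity Catalog lets child schemas and tables inherit. render_grants is a hypothetical helper, and the principal is whatever identifier you use in place of <app-service-principal>.

```python
def render_grants(catalog: str, principal: str) -> list:
    """Render one GRANT statement per privilege the app needs on the catalog."""
    privileges = ["CREATE SCHEMA", "USE CATALOG", "USE SCHEMA", "SELECT"]
    return [
        f"GRANT {privilege} ON CATALOG `{catalog}` TO `{principal}`;"
        for privilege in privileges
    ]


if __name__ == "__main__":
    # Example values only; substitute your own catalog and principal.
    for statement in render_grants("resources_dev", "databricks-app-data-governance-app"):
        print(statement)
```

The output can be pasted into a SQL editor and run by a user who is allowed to grant privileges on the catalog.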

05

Configure the App

Create an app.yaml file in the repository root. Set the CATALOG variable and the validation job ID.

yaml
# app.yaml  —  DO NOT COMMIT (gitignored)
name: data-governance-app
command: ["python", "app.py"]

env:
  - name: CATALOG
    value: resources_prod          # Change per environment

  - name: DQ_VALIDATION_JOB_ID
    value: "123456789"             # Job ID from Step 03

  - name: DQ_APP_URL
    value: "https://<workspace>.azuredatabricks.net/apps/<app-name>"

  # Optional — failure notifications
  - name: MICROSOFT_TEAMS_WEBHOOK
    valueFrom:
      secretScope: dq-secrets
      secretKey: teams-webhook-url

resources:
  - name: sql-warehouse
    type: sql_warehouse
    sql_warehouse_id: abc123def456  # Your SQL Warehouse ID

Environment variable reference

  • CATALOG (required): Unity Catalog catalog name (resources_dev, resources_prod, …)
  • DQ_VALIDATION_JOB_ID (recommended): Databricks Job ID for production validation runs
  • DATABRICKS_HOST (auto-injected): Workspace URL, injected by the Databricks Apps runtime
  • SQL_WAREHOUSE_HTTP_PATH (auto-injected): Injected when a SQL Warehouse resource is linked
  • MICROSOFT_TEAMS_WEBHOOK (optional): Teams channel webhook URL for failure alerts
  • SMTP_HOST / SMTP_USER / SMTP_PASSWORD (optional): SMTP credentials for email failure alerts
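
The required/optional split above maps naturally onto a small configuration loader that fails fast with a clear message. The sketch below is one way this could look, not the app's actual startup code; AppConfig and load_config are hypothetical names, and only three of the variables are modelled.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppConfig:
    catalog: str
    validation_job_id: Optional[str]  # None means validation runs inline
    teams_webhook: Optional[str]      # None means no Teams alerts


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Read settings from the environment, rejecting a missing CATALOG."""
    catalog = env.get("CATALOG", "").strip()
    if not catalog:
        raise RuntimeError("CATALOG is required; set it in the env section of app.yaml")
    return AppConfig(
        catalog=catalog,
        # Treat empty strings the same as unset variables.
        validation_job_id=env.get("DQ_VALIDATION_JOB_ID") or None,
        teams_webhook=env.get("MICROSOFT_TEAMS_WEBHOOK") or None,
    )
```

Passing the environment in as a mapping keeps the loader easy to test with a plain dictionary.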

06

Deploy the App

Deploy from your terminal using the Databricks CLI.

bash
databricks apps deploy data-governance-app \
  --source-code-path /Workspace/Repos/<user>/data-governance-platform

After deploying

  • In the Databricks UI, navigate to Compute → Apps → data-governance-app → Resources
  • Click Add Resource, select SQL Warehouse, and choose your warehouse
  • This auto-injects DATABRICKS_HOST and SQL_WAREHOUSE_HTTP_PATH
  • Click Start App and wait for the status to show Running
  • Open the app URL — you should see the Data Quality dashboard

USING THE APP

Key workflows

Create an expectation suite

  1. Open the app and navigate to Define Checks
  2. Click New Suite and enter a name matching your table (e.g. atlas.silver.cli)
  3. Link the suite to a Unity Catalog table using the data asset picker
  4. Set the severity level (high / medium / low) and optional subscriber emails
  5. Click Add Expectation and choose from 27+ built-in expectation types
  6. Fill in the expectation parameters and click Save

Run a validation

  1. Select your suite from the Define Checks list
  2. Click Run Validation
  3. If DQ_VALIDATION_JOB_ID is configured: a Databricks Job is triggered and a link to the run appears
  4. If no Job ID is configured: validation runs inline (suitable for dev/testing)
  5. Navigate to the Results tab to see pass/fail outcomes for each expectation
  6. If any expectation fails, Teams and/or email alerts are sent to subscribers
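
The choice between the job and the inline path in the steps above comes down to whether DQ_VALIDATION_JOB_ID is set. The sketch below illustrates that branching with the two execution paths injected as callables; run_validation, trigger_job and run_inline are hypothetical names, and the app's real implementation may differ.

```python
from typing import Callable, Mapping


def run_validation(
    suite_name: str,
    env: Mapping[str, str],
    trigger_job: Callable[[str, str], str],
    run_inline: Callable[[str], str],
) -> str:
    """Dispatch a validation run based on DQ_VALIDATION_JOB_ID.

    trigger_job(job_id, suite_name) stands in for the job-based path and
    run_inline(suite_name) for the inline path used in dev/testing.
    """
    job_id = env.get("DQ_VALIDATION_JOB_ID", "").strip()
    if job_id:
        return trigger_job(job_id, suite_name)
    return run_inline(suite_name)
```

Treating a blank value the same as a missing one means an empty entry in app.yaml falls back to inline validation rather than failing on an invalid job ID.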

Browse the Unity Catalog

  1. Navigate to the Catalog tab in the app
  2. Browse catalogs, schemas, and tables using the tree view
  3. Click a table to see its columns, types, and existing comments
  4. Edit table or column comments directly from the UI; changes are saved back to Unity Catalog

TROUBLESHOOTING

Common issues

Catalog browser shows no tables

Check that the app service principal has USE CATALOG and USE SCHEMA grants on the target catalog.

App fails on startup with DDL or schema error

The app auto-creates the dataquality schema and Delta tables on first run. Ensure the service principal has been granted the CREATE SCHEMA privilege on the catalog so it can create the schema.

Run Validation button triggers nothing

If DQ_VALIDATION_JOB_ID is not set, no job is triggered and validation runs inline instead. Check the app logs for errors and ensure the SQL Warehouse is running.

Job is triggered but fails immediately

Confirm the service principal has CAN_MANAGE_RUN on the job. Check the job cluster logs for import errors.

Teams notification not sent

Verify MICROSOFT_TEAMS_WEBHOOK is set and the webhook URL is valid. Check app logs for HTTP errors.
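
One way to check the webhook URL independently of the app is to post a test message to it yourself. The sketch below builds such a request with the standard library. It assumes the simple {"text": ...} payload that classic Teams incoming webhooks accept; webhooks created through other mechanisms may expect a different payload, and the format the app itself sends is not documented here. build_teams_request is a hypothetical helper.

```python
import json
import urllib.request


def build_teams_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a plain-text message."""
    payload = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send the message, pass the request to urlopen:
#   urllib.request.urlopen(build_teams_request(url, "DQ test alert"), timeout=10)
```

If the manual post fails with an HTTP error, the problem lies with the URL or the channel configuration rather than with the app.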

App crashes on startup

Usually a missing CATALOG variable. Check the app.yaml env section and confirm the variable is present.

Need help with setup?

The Plainsight team can guide you through installation and configuration in a short session.
