DOCUMENTATION
How to use it
Complete setup guide for installing and configuring the Data Quality Manager on your Databricks workspace.
01 Prerequisites
Before you begin, ensure your Databricks tenant meets these requirements.
Workspace requirements
- Databricks workspace with Unity Catalog enabled
- Databricks Runtime 13.3 LTS or later
- Permission to create Schemas and Databricks Apps
- Permission to create and run Databricks Jobs
- Access to at least one SQL Warehouse
Required permissions
| Permission | Purpose |
|---|---|
| CREATE SCHEMA on catalog | App auto-creates the dataquality schema and Delta tables on first startup |
| CAN_USE on SQL Warehouse | Execute metadata queries and fallback validation |
| CAN_MANAGE_RUN on Job | App triggers the validation job via SDK |
| USE CATALOG / USE SCHEMA | Read table metadata for the catalog browser |
| SELECT on tables | Run data quality checks against actual table data |
02 Schema and Delta Tables (auto-created)
No manual SQL is required for storage setup. On first startup the app automatically creates the dataquality schema and the four Delta tables it needs inside your catalog.
The tables created are: dq_expectation_suites, dq_expectations, dq_validation_runs, and dq_validation_results — all in <catalog>.dataquality. They are created with CREATE TABLE IF NOT EXISTS, so restarting the app is always safe.
The app service principal must have CREATE SCHEMA permission on the catalog to create the dataquality schema automatically. Grant this in Step 04 after the app is first deployed and the service principal name is known.
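The idempotent startup behaviour described above can be sketched roughly as follows. This is an illustration only, not the app's actual bootstrap code: the function name `bootstrap_statements` and the single placeholder column are assumptions, since the real table schemas are not documented here.

```python
from __future__ import annotations

# The four table names listed in this guide.
TABLES = (
    "dq_expectation_suites",
    "dq_expectations",
    "dq_validation_runs",
    "dq_validation_results",
)


def bootstrap_statements(catalog: str, schema: str = "dataquality") -> list[str]:
    """Return DDL that is safe to re-run on every startup."""
    # Hypothetical placeholder: the real column definitions are unknown.
    placeholder_columns = "(id STRING)"
    statements = [f"CREATE SCHEMA IF NOT EXISTS `{catalog}`.`{schema}`"]
    for table in TABLES:
        statements.append(
            f"CREATE TABLE IF NOT EXISTS `{catalog}`.`{schema}`.`{table}` "
            f"{placeholder_columns}"
        )
    return statements


if __name__ == "__main__":
    for statement in bootstrap_statements("resources_dev"):
        print(statement)
```

Because every statement uses IF NOT EXISTS, running the same list twice leaves existing objects untouched, which is why a restart does not need any special handling.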
03 Create the Validation Job
The validation job runs Notebook #4 on a Databricks cluster. It accepts a single SUITE_NAME parameter and executes the full validation pipeline.
Steps in the Databricks UI
- Navigate to Workflows → Jobs → Create Job
- Set the job name (e.g. DQ Validation Runner)
- Add a Notebook task pointing to: notebooks/4. Run 1 expectation suite.py
- Add a notebook parameter: key = SUITE_NAME, default = (leave blank)
- Configure the cluster (use DBR 13.3 LTS or later)
- Save the job and note down the Job ID from the URL
The app service principal must have CAN_MANAGE_RUN permission on the job. Go to the job's Permissions settings and add the service principal with that role.
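To check the job manually, it can help to see what a run request with the SUITE_NAME parameter looks like. The sketch below builds the JSON body for the Jobs REST API `run-now` endpoint; the app itself is described as using the SDK, so this is an equivalent illustration rather than its actual code, and `build_run_now_payload` is a hypothetical helper name.

```python
import json


def build_run_now_payload(job_id: str, suite_name: str) -> dict:
    """Build the request body for POST /api/2.1/jobs/run-now."""
    if not suite_name:
        raise ValueError("SUITE_NAME must not be empty")
    return {
        "job_id": int(job_id),
        # Notebook parameters arrive in the notebook as widget values.
        "notebook_params": {"SUITE_NAME": suite_name},
    }


if __name__ == "__main__":
    payload = build_run_now_payload("123456789", "atlas.silver.cli")
    print(json.dumps(payload, indent=2))
```

The job ID and suite name used here are the example values from this guide; substitute your own.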
04 Grant Permissions to the App Service Principal
When you deploy a Databricks App, a service principal is automatically created. Grant it the permissions it needs to access your catalog.
-- Allow the app to auto-create the dataquality schema and Delta tables
GRANT CREATE SCHEMA ON CATALOG <catalog> TO `<app-service-principal>`;
-- Grant catalog browser access
GRANT USE CATALOG ON CATALOG <catalog> TO `<app-service-principal>`;
-- Privileges granted on a catalog are inherited by every schema and table in it
GRANT USE SCHEMA ON CATALOG <catalog> TO `<app-service-principal>`;
GRANT SELECT ON CATALOG <catalog> TO `<app-service-principal>`;
Find the service principal name under Settings → Compute → Apps after first deployment. It follows the naming convention databricks-app-<app-name>.
05 Configure the App
Create an app.yaml file in the repository root. Set the CATALOG variable and the validation job ID.
# app.yaml - DO NOT COMMIT (gitignored)
name: data-governance-app
command: ["python", "app.py"]
env:
  - name: CATALOG
    value: resources_prod # Change per environment
  - name: DQ_VALIDATION_JOB_ID
    value: "123456789" # Job ID from Step 03
  - name: DQ_APP_URL
    value: "https://<workspace>.azuredatabricks.net/apps/<app-name>"
  # Optional - failure notifications
  - name: MICROSOFT_TEAMS_WEBHOOK
    valueFrom:
      secretScope: dq-secrets
      secretKey: teams-webhook-url
resources:
  - name: sql-warehouse
    type: sql_warehouse
    sql_warehouse_id: abc123def456 # Your SQL Warehouse ID
Environment variable reference
| Variable | Required | Description |
|---|---|---|
| CATALOG | Yes | Unity Catalog catalog name (resources_dev, resources_prod, …) |
| DQ_VALIDATION_JOB_ID | Recommended | Databricks Job ID for production validation runs |
| DATABRICKS_HOST | Auto-injected | Workspace URL, injected by the Databricks Apps runtime |
| SQL_WAREHOUSE_HTTP_PATH | Auto-injected | Injected when a SQL Warehouse resource is linked |
| MICROSOFT_TEAMS_WEBHOOK | Optional | Teams channel webhook URL for failure alerts |
| SMTP_HOST / SMTP_USER / SMTP_PASSWORD | Optional | SMTP credentials for email failure alerts |
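As a rough illustration of how these variables might be read at startup, the sketch below loads them into a small config object and fails fast when CATALOG is missing. The `AppConfig` class and `load_config` function are hypothetical names, not the app's real internals.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppConfig:
    catalog: str
    validation_job_id: Optional[int]
    teams_webhook: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Read the documented variables; only CATALOG is mandatory."""
    catalog = env.get("CATALOG", "").strip()
    if not catalog:
        raise RuntimeError("CATALOG is required but not set")
    job_id = env.get("DQ_VALIDATION_JOB_ID", "").strip()
    return AppConfig(
        catalog=catalog,
        # No job ID means validation falls back to inline execution.
        validation_job_id=int(job_id) if job_id else None,
        teams_webhook=env.get("MICROSOFT_TEAMS_WEBHOOK") or None,
    )


if __name__ == "__main__":
    print(load_config({"CATALOG": "resources_dev"}))
```

Passing a plain dictionary instead of `os.environ` makes it easy to try different combinations of variables without touching the real environment.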
06 Deploy the App
Deploy from your terminal using the Databricks CLI.
databricks apps deploy data-governance-app \
  --source-code-path /Workspace/Repos/<user>/data-governance-platform
After deploying
- In the Databricks UI, navigate to Compute → Apps → data-governance-app → Resources
- Click Add Resource, select SQL Warehouse, and choose your warehouse
- This auto-injects DATABRICKS_HOST and SQL_WAREHOUSE_HTTP_PATH
- Click Start App and wait for the status to show Running
- Open the app URL — you should see the Data Quality dashboard
USING THE APP
Key workflows
Create an expectation suite
1. Open the app and navigate to Define Checks
2. Click New Suite and enter a name matching your table (e.g. atlas.silver.cli)
3. Link the suite to a Unity Catalog table using the data asset picker
4. Set the severity level (high / medium / low) and optional subscriber emails
5. Click Add Expectation and choose from 27+ built-in expectation types
6. Fill in the expectation parameters and click Save
Run a validation
1. Select your suite from the Define Checks list
2. Click Run Validation
3. If DQ_VALIDATION_JOB_ID is configured: a Databricks Job is triggered and a link to the run appears
4. If no Job ID is configured: validation runs inline (suitable for dev/testing)
5. Navigate to the Results tab to see pass/fail outcomes for each expectation
6. If any expectation fails, Teams and/or email alerts are sent to subscribers
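The branching in the steps above (job run versus inline run, then alerting on failure) can be sketched as below. The function names and the shape of each result record (`expectation` and `success` keys) are assumptions made for illustration; the app's real result format may differ.

```python
from __future__ import annotations

from typing import Optional


def choose_run_mode(job_id: Optional[str]) -> str:
    """Return "job" when a job ID is configured, otherwise "inline"."""
    return "job" if job_id and job_id.strip() else "inline"


def failed_expectations(results: list[dict]) -> list[str]:
    """Collect the names of expectations that did not pass."""
    return [r["expectation"] for r in results if not r["success"]]


def should_alert(results: list[dict]) -> bool:
    """Alerts are only needed when at least one expectation failed."""
    return bool(failed_expectations(results))


if __name__ == "__main__":
    sample = [
        {"expectation": "expect_column_values_to_not_be_null", "success": False},
        {"expectation": "expect_column_to_exist", "success": True},
    ]
    print(choose_run_mode("123456789"), should_alert(sample))
```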
Browse the Unity Catalog
1. Navigate to the Catalog tab in the app
2. Browse catalogs, schemas, and tables using the tree view
3. Click a table to see its columns, types, and existing comments
4. Edit table or column comments directly from the UI; changes are saved back to Unity Catalog
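Saving a comment back to Unity Catalog comes down to a SQL statement run against the warehouse. The sketch below shows one way such statements could be generated, with identifiers and string literals escaped; the helper names are hypothetical and this is not necessarily how the app builds its SQL.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote a single identifier part."""
    return "`" + name.replace("`", "``") + "`"


def quote_literal(text: str) -> str:
    """Quote a string literal, escaping backslashes and single quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def qualified(full_name: str) -> str:
    """Quote each part of a catalog.schema.table name."""
    return ".".join(quote_ident(part) for part in full_name.split("."))


def table_comment_sql(full_name: str, comment: str) -> str:
    return f"COMMENT ON TABLE {qualified(full_name)} IS {quote_literal(comment)}"


def column_comment_sql(full_name: str, column: str, comment: str) -> str:
    return (
        f"ALTER TABLE {qualified(full_name)} "
        f"ALTER COLUMN {quote_ident(column)} COMMENT {quote_literal(comment)}"
    )


if __name__ == "__main__":
    print(table_comment_sql("atlas.silver.cli", "Client master data"))
    print(column_comment_sql("atlas.silver.cli", "client_id", "Primary key"))
```

Editing comments this way requires more than SELECT on the table, so the service principal may need additional privileges beyond those listed in Step 04.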
TROUBLESHOOTING
Common issues
Catalog browser shows no tables
Check that the app service principal has USE CATALOG and USE SCHEMA grants on the target catalog.
App fails on startup with DDL or schema error
The app auto-creates the dataquality schema and Delta tables on first run. Ensure the service principal has the CREATE SCHEMA privilege on the catalog (see Step 04) so it can create the schema.
Run Validation button triggers nothing
If DQ_VALIDATION_JOB_ID is not set, validation runs inline instead of triggering a job, so no job run link appears. Check the app logs for errors and ensure the SQL Warehouse is running.
Job is triggered but fails immediately
Confirm the service principal has CAN_MANAGE_RUN on the job. Check the job cluster logs for import errors.
Teams notification not sent
Verify MICROSOFT_TEAMS_WEBHOOK is set and the webhook URL is valid. Check app logs for HTTP errors.
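To test the webhook outside the app, a minimal sketch like the one below can help. It assumes the webhook accepts a simple JSON body with a `text` field, which holds for classic Teams incoming webhooks; newer Workflows-based webhooks may expect a different schema. The function names and message wording are illustrative, not the app's own.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Optional


def build_teams_message(
    suite_name: str, failed: list[str], run_url: Optional[str] = None
) -> dict:
    """Build a plain-text failure message for a Teams incoming webhook."""
    lines = [f"Data quality validation failed for suite {suite_name}."]
    lines += [f"- {name}" for name in failed]
    if run_url:
        lines.append(f"Run details: {run_url}")
    return {"text": "\n".join(lines)}


def post_to_teams(webhook_url: str, message: dict, timeout: float = 10.0) -> int:
    """POST the message and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Only builds the message; call post_to_teams with your own webhook URL.
    print(build_teams_message("atlas.silver.cli", ["expect_column_to_exist"]))
```

An HTTP error raised by `post_to_teams` points to a problem with the URL or the webhook itself, not with the app.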
App crashes on startup
This is usually caused by a missing CATALOG variable. Check the env section of app.yaml and confirm the variable is present.
Need help with setup?
The Plainsight team can guide you through installation and configuration in a short session.
Contact Plainsight