DP-750 Practice Questions — Page 1

Question 1

You are the lead data engineer for Contoso. Your Unity Catalog-enabled Azure Databricks workspace must serve four distinct workloads, and you want to follow Databricks' compute selection recommendations to minimize cost and management overhead while meeting each workload's needs.

The four workloads are:

- **Workload 1** — A nightly, scheduled notebook task in a Lakeflow Job that transforms Delta tables and has no special cluster requirements.

- **Workload 2** — A Power BI semantic model and ad-hoc SQL analysts who issue bursty, highly concurrent SQL queries against Unity Catalog tables and need sub-10-second start times.

- **Workload 3** — Interactive, collaborative notebook development where several data scientists need to share one always-running cluster to iterate on RDD-based code and R.

- **Workload 4** — A production Spark Submit (JAR) task that requires custom cluster settings not available in serverless.

For each workload, select the recommended compute type.

```mermaid

flowchart TD

subgraph Hotspot["Select recommended compute per workload"]

W1["Workload 1: nightly scheduled notebook task"] --> D1{{"Serverless jobs compute / Serverless SQL warehouse / Classic all-purpose compute"}}

W2["Workload 2: bursty concurrent BI + ad-hoc SQL"] --> D2{{"Serverless jobs compute / Serverless SQL warehouse / Classic jobs compute"}}

W3["Workload 3: shared interactive RDD/R dev"] --> D3{{"Serverless SQL warehouse / Classic all-purpose compute / Classic jobs compute"}}

W4["Workload 4: Spark Submit (JAR) with custom settings"] --> D4{{"Serverless jobs compute / Classic jobs compute / Serverless SQL warehouse"}}

end

```

Question 2

You manage Databricks SQL for the analytics team at Litware. A team of business analysts connects Power BI and the Databricks SQL editor to Unity Catalog tables. Their usage is highly variable: long idle gaps between meetings, followed by short bursts where 20+ concurrent queries arrive at once and must return quickly. The region supports serverless SQL warehouses and the workspace meets all serverless requirements.

You must choose a SQL warehouse type that:

- Starts in a few seconds after an idle period so analysts are not blocked.

- Automatically and rapidly scales out to absorb sudden query concurrency, then scales back down to minimize cost.

- Supports Photon, Predictive I/O, and Intelligent Workload Management.

Which SQL warehouse type should you choose?

A. Classic SQL warehouse

B. Pro SQL warehouse

C. Serverless SQL warehouse

D. Classic all-purpose compute with a Photon-enabled Databricks Runtime

E. A serverless jobs cluster running SQL tasks
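For study purposes, the sketch below uses only the standard library to build a request body for the SQL Warehouses API (`POST /api/2.0/sql/warehouses`). The warehouse name, size, cluster bounds, and auto-stop value are assumptions chosen for illustration. Confirm the field names against the current API reference before sending anything.

```python
import json


def serverless_warehouse_body(name: str, max_clusters: int) -> dict:
    """Build a create-warehouse request body (all values are illustrative)."""
    if max_clusters < 1:
        raise ValueError("max_clusters must be at least 1")
    return {
        "name": name,
        "cluster_size": "Small",            # assumed starting size
        "warehouse_type": "PRO",            # serverless is requested on a Pro-type warehouse
        "enable_serverless_compute": True,  # fast start and rapid scale-out
        "min_num_clusters": 1,              # scale back down when concurrency drops
        "max_num_clusters": max_clusters,   # ceiling for concurrency bursts
        "auto_stop_mins": 10,               # stop after 10 idle minutes
    }


body = serverless_warehouse_body("litware-bi", max_clusters=4)
print(json.dumps(body, indent=2))
```

The `min_num_clusters`/`max_num_clusters` pair is what lets the warehouse add clusters during a burst and release them afterward.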

Question 3

Your Unity Catalog-enabled Azure Databricks workspace runs a mix of workloads. The platform team has set a standard that all interactive notebook clusters must:

- Be Unity Catalog-compliant.

- Allow multiple analysts to share a single cluster and concurrently run isolated Python and SQL workloads, with Lakeguard user workload isolation.

- Follow the Databricks-recommended default access mode for general workloads.

A separate data science team needs a cluster running Databricks Runtime 15.4 LTS ML so they can use RDD APIs and an R-based library that is not supported by the standard team cluster.

You are configuring the access mode for the shared analyst cluster (not the data science cluster). Which access mode should you select for the shared analyst cluster?

A. No isolation shared access mode

B. Standard (formerly shared) access mode

C. Dedicated (formerly single user) access mode assigned to one analyst

D. Dedicated (formerly single user) access mode assigned to a group

E. Credential passthrough access mode
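The access modes named in the create-compute UI correspond to values of the `data_security_mode` field in the Clusters API. The minimal sketch below maps the UI labels to the legacy API values and builds a cluster body for a multi-user analyst cluster. The cluster name, node type, worker range, and auto-termination value are assumptions. Newer API versions also define alias values, so check the current reference.

```python
# UI access-mode labels mapped to legacy Clusters API data_security_mode values.
# Newer API versions add alias values; verify against the current reference.
ACCESS_MODE_TO_API = {
    "Standard (formerly shared)": "USER_ISOLATION",
    "Dedicated (formerly single user)": "SINGLE_USER",
    "No isolation shared": "NONE",
}


def shared_analyst_cluster(name: str) -> dict:
    """Cluster body for a multi-user analyst cluster (sizes are assumptions)."""
    return {
        "cluster_name": name,
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "autoscale": {"min_workers": 1, "max_workers": 4},
        "data_security_mode": ACCESS_MODE_TO_API["Standard (formerly shared)"],
        "autotermination_minutes": 60,
    }


print(shared_analyst_cluster("analysts-shared")["data_security_mode"])  # USER_ISOLATION
```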

Question 4

You configure a classic all-purpose compute resource on a Premium-plan Azure Databricks workspace for an interactive data-exploration team. Throughout the day the workload swings widely: some phases run heavy Spark transformations across large Delta tables, while other phases are mostly idle as analysts read results.

You want Azure Databricks to dynamically add worker nodes during compute-intensive phases and remove them when they are no longer needed, so you achieve high utilization without provisioning a large fixed cluster for the peak. You configure the cluster as follows:

```json
{
  "cluster_name": "exploration-autoscale",
  "spark_version": "15.4.x-scala2.12",
  "node_type_id": "Standard_DS4_v2",
  "autoscale": {
    "min_workers": 2,
    "max_workers": 12
  },
  "data_security_mode": "USER_ISOLATION"
}
```

Which statement BEST describes the benefit of enabling autoscaling (Min 2 / Max 12) instead of setting a fixed number of workers for this workload?

A. Autoscaling guarantees the cluster always runs at exactly 12 workers so queries never queue.

B. Autoscaling lets Databricks dynamically reallocate workers between the min and max so the cluster runs faster than an under-provisioned fixed cluster and can reduce overall cost versus a statically sized cluster.

C. Autoscaling eliminates the need for auto-termination because an idle autoscaling cluster scales itself to zero workers and stops billing.

D. Autoscaling is required for Unity Catalog compatibility; a fixed-size cluster cannot access Unity Catalog data.

E. Autoscaling forces the use of spot instances for all worker nodes to lower cost.
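The sketch below shows how this question's cluster specification is shaped: `autoscale` is a nested object holding `min_workers` and `max_workers`, while `data_security_mode` sits at the top level. The `worker_range` helper is hypothetical. It is a local sanity check for reading a spec, not something Databricks provides or enforces.

```python
import json

SPEC = """
{
  "cluster_name": "exploration-autoscale",
  "spark_version": "15.4.x-scala2.12",
  "node_type_id": "Standard_DS4_v2",
  "autoscale": {"min_workers": 2, "max_workers": 12},
  "data_security_mode": "USER_ISOLATION"
}
"""


def worker_range(spec: dict) -> range:
    """Hypothetical helper: worker counts the cluster may run with."""
    auto = spec.get("autoscale")
    if auto is None:
        n = spec["num_workers"]  # fixed-size cluster: exactly n workers
        return range(n, n + 1)
    lo, hi = auto["min_workers"], auto["max_workers"]
    if not 0 <= lo <= hi:
        raise ValueError("need 0 <= min_workers <= max_workers")
    return range(lo, hi + 1)


workers = worker_range(json.loads(SPEC))
print(workers.start, workers[-1])  # 2 12
```

Between 2 and 12 workers the cluster size tracks the workload, whereas a fixed `num_workers` value pins it to one size all day.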

Question 5

Your team runs dozens of short Lakeflow Jobs on classic job clusters throughout the business day. Engineers complain that each job spends several minutes acquiring VMs from the cloud provider before any Spark work begins, so cluster start time dominates the total runtime of these short jobs. Serverless compute is not available for these particular JAR-based tasks, so you must use classic compute.

You want to reduce cluster start and autoscaling times for these classic clusters without paying Databricks Units (DBUs) for capacity that is sitting idle and ready.

Which approach should you implement?

A. Increase each job cluster's `min_workers` so the cluster never has to scale up from a cold start.

B. Attach the job clusters to an Azure Databricks pool that keeps a set of idle, ready-to-use instances on standby.

C. Disable auto-termination on the job clusters so they stay warm between runs.

D. Switch the clusters to a larger driver node type so VM acquisition completes faster.

E. Enable Photon acceleration, which pre-provisions worker VMs before the job starts.
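For study purposes, the sketch below builds a request body for the Instance Pools API (`POST /api/2.0/instance-pools/create`) using only the standard library. The pool name, node type, and sizes are assumptions. Azure Databricks does not charge DBUs for instances while they sit idle in a pool, although the cloud provider still bills for the VMs.

```python
import json


def pool_body(name: str, node_type: str, min_idle: int, max_capacity: int,
              idle_minutes: int = 30) -> dict:
    """Request body for creating an instance pool (values are illustrative)."""
    if not 0 <= min_idle <= max_capacity:
        raise ValueError("need 0 <= min_idle_instances <= max_capacity")
    return {
        "instance_pool_name": name,
        "node_type_id": node_type,
        # Warm VMs kept ready; idle pool instances incur VM cost but no DBUs.
        "min_idle_instances": min_idle,
        # Upper bound on idle plus in-use instances in the pool.
        "max_capacity": max_capacity,
        # Extra idle instances above min_idle_instances are released after this long.
        "idle_instance_autotermination_minutes": idle_minutes,
    }


print(json.dumps(pool_body("jar-jobs-pool", "Standard_DS4_v2", 4, 40), indent=2))
```

A job cluster then draws nodes from the pool by setting `instance_pool_id` in its cluster spec.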

Question 6

A finance review shows that several classic all-purpose (interactive) clusters in your Azure Databricks workspace run overnight and on weekends with no activity, accruing DBU and VM charges. You are asked to control this idle cost while keeping the clusters available during working hours.

You decide to configure automatic termination on each interactive cluster. The relevant configuration is:

```json
{
  "cluster_name": "analytics-interactive",
  "spark_version": "15.4.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "autotermination_minutes": 45
}
```

Which statement correctly describes how this setting behaves and the best practice it follows?

A. The cluster terminates exactly 45 minutes after it starts, regardless of activity, to cap runtime cost.

B. The cluster terminates after 45 minutes of inactivity (no commands, Spark jobs, Structured Streaming, or JDBC activity), and Databricks recommends auto-termination for all interactive compute, typically 30–60 minutes in dev environments.

C. Setting `autotermination_minutes` to 45 disables billing during the 45-minute idle window before shutdown, so idle clusters are free.

D. Auto-termination only applies to job clusters; all-purpose clusters must be terminated manually.

E. Setting `autotermination_minutes` to 0 terminates the cluster after 0 minutes of inactivity, the most aggressive cost-saving option.
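The inactivity rule discussed in this question can be written as a small function. This is a hypothetical, simplified model for study, not Databricks' implementation. It assumes `last_activity` is the last moment any command, Spark job, Structured Streaming query, or JDBC call was running on the cluster, and that a value of 0 turns auto-termination off.

```python
from datetime import datetime, timedelta


def should_terminate(autotermination_minutes: int, last_activity: datetime,
                     now: datetime) -> bool:
    """Simplified model: terminate once the idle time reaches the threshold."""
    if autotermination_minutes == 0:
        return False  # 0 disables auto-termination entirely
    return now - last_activity >= timedelta(minutes=autotermination_minutes)


last = datetime(2025, 1, 6, 18, 0)
print(should_terminate(45, last, last + timedelta(minutes=30)))  # False
print(should_terminate(45, last, last + timedelta(minutes=45)))  # True
print(should_terminate(0, last, last + timedelta(days=2)))       # False
```

Any new activity moves `last_activity` forward, so the 45-minute countdown restarts rather than running from cluster start.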

Question 7

You are creating classic compute resources in a Unity Catalog-enabled Azure Databricks workspace and must apply the correct feature settings.

The platform's requirements are:

- A long-running **production job cluster** that runs operational Delta ETL and must avoid compatibility surprises and be thoroughly testable before upgrades.

- A **data science cluster** that needs the prebuilt ML stack (TensorFlow, PyTorch, XGBoost) integrated with the workspace.

- SQL and DataFrame transformations on the production job cluster should be accelerated by the built-in vectorized engine.

You configure these via the create-compute UI and the Clusters API. Select the THREE statements that correctly describe how to apply these feature settings. **(Choose THREE.)**

A. For the production job cluster running operational workloads, choose a Long Term Support (LTS) Databricks Runtime version so you avoid compatibility issues and can test before upgrading.

B. For the data science cluster, select the **Machine learning** checkbox so the cluster uses Databricks Runtime ML, which preloads TensorFlow, PyTorch, and XGBoost.

C. To accelerate SQL/DataFrame transformations, enable **Use Photon Acceleration**; when creating the cluster via the Clusters API you must explicitly set `runtime_engine` to `PHOTON`.

D. Photon must be disabled on any cluster that uses a Databricks Runtime ML version because Photon and ML runtimes are mutually exclusive on all versions.

E. Selecting the **Machine learning** checkbox automatically sets the cluster's access mode to **Standard (shared)**.

F. Choosing an LTS Databricks Runtime version automatically disables Photon to guarantee long-term API stability.
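As a study aid, the sketch below builds two Clusters API bodies that reflect these feature settings: an LTS runtime with `runtime_engine` set to `PHOTON` for the production ETL cluster, and an ML runtime version for the data science cluster. Cluster names, node types, and worker counts are assumptions. The ML version string follows the `<version>-cpu-ml-scala<version>` pattern, which you can confirm with `GET /api/2.0/clusters/spark-versions`.

```python
import json


def prod_etl_cluster() -> dict:
    """Production ETL cluster on an LTS runtime with Photon (sizes assumed)."""
    return {
        "cluster_name": "prod-etl",
        "spark_version": "15.4.x-scala2.12",   # Databricks Runtime 15.4 LTS
        "node_type_id": "Standard_DS4_v2",
        "num_workers": 4,
        "runtime_engine": "PHOTON",            # set explicitly when using the API
    }


def data_science_cluster() -> dict:
    """Data science cluster on an ML runtime (sizes assumed)."""
    return {
        "cluster_name": "ds-ml",
        "spark_version": "15.4.x-cpu-ml-scala2.12",  # Databricks Runtime 15.4 LTS ML (CPU)
        "node_type_id": "Standard_DS4_v2",
        "num_workers": 2,
    }


print(json.dumps([prod_etl_cluster(), data_science_cluster()], indent=2))
```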

Question 8

Open question ↗

As a workspace admin, you must provision a classic job cluster that is backed by an Azure Databricks pool and uses autoscaling, so that scheduled Lakeflow Jobs start quickly and scale with demand. You will create the pool first, then a job that uses a new job cluster drawn from that pool.

Order the following steps into the correct sequence to provision a pool-backed job cluster with autoscaling.

```mermaid

flowchart LR

subgraph Tiles["Steps to order"]

T1["Set the pool's Min Idle instances and Max Capacity"]

T2["Create a pool from the Compute > Pools tab and choose the node type"]

T3["In the job's task, configure a new job cluster and select the pool for worker (and driver) nodes"]

T4["Enable autoscaling on the job cluster and set Min and Max workers within the pool's capacity"]

T5["Run the job so the job cluster allocates nodes from the pool's idle instances"]

end

S1[Step 1] --> S2[Step 2] --> S3[Step 3] --> S4[Step 4] --> S5[Step 5]

```
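Once the pool exists, the job side of these steps can be sketched as a Jobs API create body whose task runs on a new, pool-backed autoscaling cluster. The job name, task key, notebook path, and the `<pool-id>` placeholder (the ID returned when the pool is created) are hypothetical. The capacity check is a simplification: it assumes the driver also comes from the pool and ignores other clusters that share the same pool.

```python
import json


def pooled_job_body(pool_id: str, pool_max_capacity: int,
                    min_workers: int, max_workers: int) -> dict:
    """Jobs API create body for a task on a pool-backed autoscaling job cluster."""
    # Driver and workers both come from the pool, so the largest cluster
    # needs max_workers + 1 instances.
    if not 0 <= min_workers <= max_workers or max_workers + 1 > pool_max_capacity:
        raise ValueError("autoscale range does not fit within the pool's capacity")
    return {
        "name": "nightly-etl",
        "tasks": [{
            "task_key": "transform",
            "notebook_task": {"notebook_path": "/Workspace/etl/transform"},  # hypothetical path
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",
                # No node_type_id: the node type is inherited from the pool.
                "instance_pool_id": pool_id,
                "driver_instance_pool_id": pool_id,
                "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
            },
        }],
    }


print(json.dumps(pooled_job_body("<pool-id>", 40, 2, 8), indent=2))
```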

Question 9

You own (have CAN MANAGE on) a shared classic all-purpose compute resource in an Azure Databricks workspace. You must grant the minimum compute permission level to three personas so each can do exactly their job and nothing more:

- **Persona 1 — Analyst:** Must be able to attach notebooks to the compute and view the Spark UI / compute metrics, but must NOT be able to start, restart, terminate, edit, or resize the compute.

- **Persona 2 — On-call engineer:** Must be able to start, restart, and terminate the compute (in addition to attaching), but must NOT be able to edit the configuration, resize it, or modify its permissions.

- **Persona 3 — Compute owner:** Must be able to edit compute details, resize it, attach libraries, and modify permissions.

For each persona, select the minimum compute ACL permission level.

```mermaid

flowchart TD

subgraph Hotspot["Assign minimum compute permission per persona"]

P1["Analyst: attach + view Spark UI/metrics only"] --> D1{{"CAN ATTACH TO / CAN RESTART / CAN MANAGE"}}

P2["On-call engineer: start/restart/terminate + attach"] --> D2{{"CAN ATTACH TO / CAN RESTART / CAN MANAGE"}}

P3["Compute owner: edit, resize, attach libraries, modify permissions"] --> D3{{"CAN ATTACH TO / CAN RESTART / CAN MANAGE"}}

end

```
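Compute permission levels are cumulative: each level includes every ability of the levels below it. The minimal sketch below models that ordering to find the lowest level that covers a set of required abilities. The ability labels are hypothetical shorthand for the actions listed in the persona descriptions, and only those actions are modeled. The `acl_entry` helper builds one entry in the shape used by the Permissions API (`PATCH /api/2.0/permissions/clusters/{cluster_id}`), with a hypothetical group name.

```python
# Compute ACL levels from lowest to highest; each includes the ones below it.
LEVELS = ["CAN_ATTACH_TO", "CAN_RESTART", "CAN_MANAGE"]

# Abilities each level adds on top of the previous level (abbreviated).
ADDED = {
    "CAN_ATTACH_TO": {"attach", "view_spark_ui", "view_metrics"},
    "CAN_RESTART": {"start", "restart", "terminate"},
    "CAN_MANAGE": {"edit", "resize", "attach_library", "modify_permissions"},
}


def minimum_level(required: set) -> str:
    """Return the lowest level whose cumulative abilities cover `required`."""
    granted = set()
    for level in LEVELS:
        granted |= ADDED[level]
        if required <= granted:
            return level
    raise ValueError(f"unknown abilities: {sorted(required - granted)}")


def acl_entry(group: str, level: str) -> dict:
    """One access_control_list entry for the Permissions API."""
    return {"group_name": group, "permission_level": level}


print(minimum_level({"attach", "view_spark_ui", "view_metrics"}))  # CAN_ATTACH_TO
print(minimum_level({"attach", "restart", "terminate"}))           # CAN_RESTART
print(acl_entry("oncall-engineers", minimum_level({"start"})))
```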

Question 10

Contoso runs a single Azure Databricks account with one Unity Catalog metastore per region. A data platform team must design a catalog-and-isolation strategy that satisfies four governance requirements at the same time. The team has already decided to follow Databricks' recommended pattern of isolating software development lifecycle (SDLC) environments at the catalog level of the three-level namespace, and to bind catalogs to specific workspaces where work environments and data share the same isolation requirements.

For each requirement, select the option that Databricks recommends.

```mermaid

flowchart TD

M[Unity Catalog metastore - one per region] --> C1[catalog: sales_dev]

M --> C2[catalog: sales_prod]

subgraph R1[Requirement 1: Separate dev from prod data]

D1{Isolate at: metastore / catalog / schema}

end

subgraph R2[Requirement 2: prod data must NOT be queryable from the Dev workspace, even by users with a SELECT grant]

D2{Mechanism: privilege revoke / workspace-catalog binding / external location}

end

subgraph R3[Requirement 3: Let all users discover prod catalog metadata without reading data]

D3{Grant: SELECT / BROWSE / USE CATALOG}

end

subgraph R4[Requirement 4: Share a curated prod table cross-region to a partner]

D4{Method: register external table in 2nd metastore / Delta Sharing / copy files}

end

```
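Two of these requirements map directly to Unity Catalog SQL. The sketch below only assembles the statements as strings; in a notebook, each string could be passed to `spark.sql`. The schema, table, share, and recipient names are hypothetical. The `USING ID` clause assumes the partner uses Databricks with its own Unity Catalog metastore (Databricks-to-Databricks sharing). Workspace-catalog binding is configured in Catalog Explorer or through the workspace bindings API, and is not shown.

```python
def prod_catalog_statements(recipient_sharing_id: str) -> list:
    """SQL for requirements 3 and 4 (object names are hypothetical)."""
    return [
        # Requirement 3: metadata discovery without data access.
        "GRANT BROWSE ON CATALOG sales_prod TO `account users`",
        # Requirement 4: Databricks-to-Databricks Delta Sharing to another region.
        "CREATE SHARE IF NOT EXISTS partner_share",
        "ALTER SHARE partner_share ADD TABLE sales_prod.curated.orders",
        f"CREATE RECIPIENT IF NOT EXISTS partner_recipient USING ID '{recipient_sharing_id}'",
        "GRANT SELECT ON SHARE partner_share TO RECIPIENT partner_recipient",
    ]


for stmt in prod_catalog_statements("<partner-metastore-sharing-id>"):
    print(stmt + ";")
```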