DP-750 Certification Practice Question #88

Question

A nightly ETL job on a classic all-purpose cluster (Databricks Runtime 15.4 LTS, Photon enabled) runs a large `GROUP BY` aggregation over a 2 TB Delta table. The job is slow. In the compute **Metrics** tab you observe:

- CPU utilization on the workers is pinned near 100% for the duration of the stage.
- The Spark UI shows the aggregation stage has **only 8 tasks**, while the cluster has 64 worker cores available.
- Memory utilization is moderate (no spill is reported) and there are no failed tasks.

You must reduce the wall-clock time of the aggregation stage by improving parallelism, without changing the data or adding nodes.

Which action most directly resolves the bottleneck?

Accepted Answer

The symptom (a stage limited to 8 tasks on a 64-core cluster, CPU-bound, with no spill or failures) is classic under-parallelization. Spark runs at most one task per core at a time (with the default `spark.task.cpus=1`), so an 8-task stage keeps only 8 of the 64 cores busy. Increasing the number of shuffle partitions, or enabling auto-optimized shuffle so that AQE sizes them, creates enough tasks to occupy all cores and cuts the stage's wall-clock time. Adding memory (C) addresses spill or out-of-memory errors, not parallelism; disabling Photon (B) and forcing one core per executor (D) both reduce throughput.
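The arithmetic behind this answer can be sketched in a few lines of Python. The helper names and the 2x-cores multiplier are illustrative assumptions, not part of the question or an official sizing rule. `apply_shuffle_partitions` assumes an existing `SparkSession` is passed in, and the value `"auto"` is a Databricks-specific setting for auto-optimized shuffle.

```python
import math


def busy_core_fraction(num_tasks: int, total_cores: int) -> float:
    """Fraction of cores that can be busy when one task occupies one core."""
    return min(num_tasks, total_cores) / total_cores


def task_waves(num_tasks: int, total_cores: int) -> int:
    """Number of scheduling waves needed to run all tasks of a stage."""
    return math.ceil(num_tasks / total_cores)


def suggest_shuffle_partitions(total_cores: int, multiplier: int = 2) -> int:
    """Rule-of-thumb partition count: a small multiple of the core count.

    The multiplier is an assumption; tune it for the actual data volume.
    """
    return total_cores * multiplier


def apply_shuffle_partitions(spark, value) -> None:
    """Set spark.sql.shuffle.partitions on an existing SparkSession.

    `value` may be an integer, or "auto" on Databricks (assumed environment).
    """
    spark.conf.set("spark.sql.shuffle.partitions", str(value))


if __name__ == "__main__":
    # Scenario from the question: 8 tasks on a 64-core cluster.
    print(busy_core_fraction(8, 64))       # share of cores doing work
    print(suggest_shuffle_partitions(64))  # candidate partition count
```

In a Databricks notebook, where `spark` is already defined, this might be used as `apply_shuffle_partitions(spark, suggest_shuffle_partitions(64))` or `apply_shuffle_partitions(spark, "auto")` before running the aggregation.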