FEFreeExamDumps.in

Implementing Data Engineering Solutions Using Azure Databricks

Topic 1

Question 88

A nightly ETL job on a classic all-purpose cluster (Databricks Runtime 15.4 LTS, Photon enabled) runs a large `GROUP BY` aggregation over a 2 TB Delta table. The job is slow. In the compute **Metrics** tab you observe:

- CPU utilization on the workers is pinned near 100% for the duration of the stage.
- The Spark UI shows the aggregation stage has **only 8 tasks**, while the cluster has 64 worker cores available.
- Memory utilization is moderate (no spill is reported) and there are no failed tasks.

You must reduce the wall-clock time of the aggregation stage by improving parallelism, without changing the data or adding nodes. Which action most directly resolves the bottleneck?

  • A. Increase the number of shuffle partitions (for example, set `spark.sql.shuffle.partitions` to a value such as 128–256, or `auto`) so the aggregation runs across more tasks and uses all 64 cores.
  • B. Disable Photon to free CPU cycles for the JVM, because Photon is consuming the CPU headroom needed by the aggregation.
  • C. Increase `spark.executor.memory` to 32 GB per executor to give the aggregation more heap.
  • D. Switch the worker nodes to memory-optimized instance types and reduce the number of cores per executor to one.
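The parallelism argument behind the shuffle-partition option can be made concrete with a back-of-the-envelope model. The sketch below is idealized and rests on stated assumptions: the work splits evenly across tasks, there is no skew, each core runs one task at a time, and scheduling and shuffle overhead are ignored. The function names and the 640 core-second workload are hypothetical illustrations, not Spark APIs or measured data. In a real notebook the setting itself would be applied with `spark.conf.set("spark.sql.shuffle.partitions", ...)`.

```python
import math


def stage_utilization(num_tasks: int, num_cores: int) -> float:
    """Fraction of cores that can be busy at once when a stage has num_tasks tasks."""
    # Spark runs at most one task per core slot (default spark.task.cpus = 1),
    # so any cores beyond num_tasks sit idle for the whole stage.
    return min(num_tasks, num_cores) / num_cores


def task_waves(num_tasks: int, num_cores: int) -> int:
    """Number of scheduling waves needed to run every task, one task per core."""
    return math.ceil(num_tasks / num_cores)


def idealized_stage_time(total_work: float, num_tasks: int, num_cores: int) -> float:
    """Idealized wall-clock time for a stage (hypothetical model).

    Assumes the work (in core-seconds) splits evenly across tasks,
    with no skew, no per-task overhead and no shuffle cost.
    """
    per_task = total_work / num_tasks
    return task_waves(num_tasks, num_cores) * per_task


if __name__ == "__main__":
    cores = 64
    work = 640.0  # hypothetical total work in core-seconds
    for tasks in (8, 64, 128, 256):
        print(
            f"{tasks:>3} tasks: {stage_utilization(tasks, cores):.1%} of cores busy, "
            f"~{idealized_stage_time(work, tasks, cores):.1f} s"
        )
```

In this toy model, 8 tasks keep only 12.5% of the 64 cores busy, and raising the task count to 64 or more cuts the idealized stage time from 80 s to 10 s. Past one task per core the model shows no further gain; in practice a few tasks per core helps absorb skew, while very small partitions add scheduling overhead, which is why the Spark tuning guide suggests roughly 2–3 tasks per CPU core as a starting point.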