Question 90
A daily job joins a 3 TB `fact_orders` Delta table to a small 40 MB `dim_country` lookup table and then aggregates. In the Spark UI you find:

- The join stage reports significant **disk spill** and a very large **Shuffle Read** on both sides of a sort-merge join.
- A few post-shuffle partitions are far larger than the rest (the join key is skewed).
- AQE has been turned off in this workspace by a legacy `spark.databricks.optimizer.adaptive.enabled false` setting.

You must reduce shuffle and spill and balance the skewed partitions. Which **three** actions are appropriate? (Choose THREE.)

- A. Re-enable Adaptive Query Execution (`spark.databricks.optimizer.adaptive.enabled true`) so Spark can coalesce post-shuffle partitions and apply skew-join handling at runtime.
- B. Force a broadcast hash join of the 40 MB `dim_country` table (for example with a `BROADCAST` hint) to eliminate the shuffle of the large fact table against the lookup.
- C. Use `repartition()` on the skewed join key to redistribute the fact data into more, balanced partitions before the join.
- D. Replace the join with `coalesce(1)` on the fact DataFrame to reduce the number of output partitions and therefore the shuffle volume.
- E. Disable Photon so the sort-merge join falls back to row-based execution.
- F. Set `spark.sql.shuffle.partitions` to 1 so the entire shuffle happens in a single task.
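
The scenario contrasts two join mechanics: a shuffle-based join, which regroups rows by join key, and a broadcast hash join, which ships the small table to wherever the large table's rows already sit. The following is a toy model in plain Python, not Spark code. The sample rows, the partition count of 4, the CRC32 hash, and both function names are assumptions made for illustration only.

```python
import zlib
from collections import Counter

# Hypothetical toy data standing in for fact_orders: (order_id, country_code).
# "IN" is deliberately over-represented to mimic a skewed join key.
fact_orders = [(i, "IN") for i in range(90)] + [
    (90 + i, code)
    for i, code in enumerate(["US", "DE", "JP", "BR", "FR"] * 2)
]

# Hypothetical stand-in for the small dim_country lookup table.
dim_country = {
    "IN": "India",
    "US": "United States",
    "DE": "Germany",
    "JP": "Japan",
    "BR": "Brazil",
    "FR": "France",
}


def shuffle_partition_sizes(rows, num_partitions):
    """Count rows per partition when rows are hash-partitioned on the join key.

    This mimics the regrouping step of a shuffle-based join: every row with
    the same key lands in the same partition.
    """
    sizes = Counter()
    for _, key in rows:
        # CRC32 is used only to get a hash that is stable across runs.
        sizes[zlib.crc32(key.encode("utf-8")) % num_partitions] += 1
    return [sizes[p] for p in range(num_partitions)]


def broadcast_hash_join(rows, lookup):
    """Inner-join each fact row against an in-memory copy of the small table.

    No fact row is regrouped by key: each one is resolved with a dictionary
    lookup in whatever order (or partition) it already sits in.
    """
    return [
        (order_id, key, lookup[key])
        for order_id, key in rows
        if key in lookup
    ]


if __name__ == "__main__":
    # One partition receives all 90 "IN" rows, whatever else it holds.
    print(shuffle_partition_sizes(fact_orders, 4))
    # The broadcast-style join produces one output row per matching fact row.
    print(len(broadcast_hash_join(fact_orders, dim_country)))
```

In this model the partition that receives the `IN` key always holds at least 90 of the 100 rows, which is the shape the Spark UI reports as a few oversized post-shuffle partitions. The broadcast-style join never groups fact rows by key at all.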