Question 285
DP-203 voucher + Udemy course (lifetime access) = ₹3,500 for Indian ID card holders.
Details →You have an Azure subscription that contains an Azure Data Lake Storage Gen2 container named Container1 and an Azure Synapse Analytics workspace named Workspace1. Workspace1 contains multiple Apache Spark jobs that reference a large dataset in Container1. You need to optimize the run times of the jobs. What should you do?
- AFor Container1, disable hierarchical namespaces.
- BCache the dataset.
- CIncrease the spark.sql.autoBroadcastJoinThreshold value.
- DUse Resilient Distributed Datasets (RDDs).