FEFreeExamDumps.in

Implementing Data Engineering Solutions Using Azure Databricks

Topic 1

Question 56

DP-750 voucher + Udemy course (lifetime access) = ₹3,500 for Indian ID card holders.

Details →

A bronze ingestion job loads clickstream events into a DataFrame `df` with columns `user_id`, `session_id`, `country`, and `revenue`. Profiling reveals three quality problems you must fix before writing to the silver layer: 1. Some rows are exact duplicates across **all** columns and must be collapsed to a single row. 2. The `country` column has `NULL` values that should be replaced with the literal string `"UNKNOWN"`. 3. The `revenue` column has `NULL` values that should be replaced with `0`. You want the most idiomatic PySpark expression that performs all three fixes in one chained transformation. Which code is correct? ```python # Option to choose result = ( df .<STEP_1> .<STEP_2> ) ```

  • A`df.dropDuplicates().na.fill({"country": "UNKNOWN", "revenue": 0})`
  • B`df.distinct().na.drop(subset=["country", "revenue"])`
  • C`df.dropDuplicates(["user_id"]).na.fill("UNKNOWN")`
  • D`df.na.fill({"country": "UNKNOWN", "revenue": 0}).dropDuplicates(["session_id"])`
  • E`df.dropna().fillna({"country": "UNKNOWN", "revenue": 0})`