FEFreeExamDumps.in

Implementing Data Engineering Solutions Using Azure Databricks

Topic 1

Question 71


You are authoring a Lakeflow Spark Declarative Pipeline in Python. The pipeline defines a bronze streaming table `orders_raw`, a silver streaming table `orders_clean`, and a gold materialized view `daily_revenue`. You did **not** write any explicit orchestration code to specify which dataset runs before another.

```python
import dlt
from pyspark.sql.functions import col, sum as _sum

@dlt.table(name="orders_raw")
def orders_raw():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/Volumes/main/sales/landing/orders"))

@dlt.table(name="orders_clean")
def orders_clean():
    return (dlt.read_stream("orders_raw")
            .filter(col("amount") > 0))

@dlt.table(name="daily_revenue")
def daily_revenue():
    return (dlt.read("orders_clean")
            .groupBy("order_date")
            .agg(_sum("amount").alias("revenue")))
```

How does Lakeflow Spark Declarative Pipelines determine the execution order of these three datasets?

  • A. You must add a `depends_on` parameter to each `@dlt.table` decorator to declare the order explicitly.
  • B. SDP automatically infers the dependency graph from the `dlt.read`/`dlt.read_stream` references between datasets and runs flows in the correct order with maximum parallelism.
  • C. Datasets execute in the top-to-bottom order they appear in the notebook source file.
  • D. You must create a separate Lakeflow Job with notebook tasks to sequence the three datasets.
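
To make the idea of an inferred dependency graph concrete, here is a minimal sketch in plain Python. It is not how the pipeline engine is implemented, and it does not use `dlt` or Spark. The `reads` mapping and the helper names `execution_order` and `parallel_batches` are hypothetical, introduced only for illustration; the mapping records, for each dataset in the question, which datasets it reads from. A topological sort over those references yields a valid run order regardless of the order in which the datasets are declared.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stand-in for what a pipeline engine could collect from the
# dlt.read / dlt.read_stream calls: dataset name -> datasets it reads from.
# Deliberately listed in "wrong" source order to show that order is irrelevant.
reads = {
    "daily_revenue": {"orders_clean"},
    "orders_clean": {"orders_raw"},
    "orders_raw": set(),
}

def execution_order(reads):
    """Return one valid run order: every dataset appears after its inputs."""
    return list(TopologicalSorter(reads).static_order())

def parallel_batches(reads):
    """Group datasets into batches whose members could run concurrently."""
    sorter = TopologicalSorter(reads)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all datasets whose inputs are done
        batches.append(ready)
        sorter.done(*ready)                 # mark the whole batch as finished
    return batches

print(execution_order(reads))
print(parallel_batches(reads))
```

Because the three datasets in the question form a single chain, each batch holds exactly one dataset. If a second gold dataset also read from `orders_clean`, it would land in the same batch as `daily_revenue`, which is the sense in which independent flows can run in parallel.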