Question 71
You are authoring a Lakeflow Spark Declarative Pipeline in Python. The pipeline defines a bronze streaming table `orders_raw`, a silver streaming table `orders_clean`, and a gold materialized view `daily_revenue`. You did **not** write any explicit orchestration code to specify which dataset runs before another.
```python
import dlt
from pyspark.sql.functions import col, sum as _sum

# Bronze: incrementally ingest raw JSON order files with Auto Loader.
@dlt.table(name="orders_raw")
def orders_raw():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/Volumes/main/sales/landing/orders"))

# Silver: streaming read of the bronze table, keeping only positive amounts.
@dlt.table(name="orders_clean")
def orders_clean():
    return (dlt.read_stream("orders_raw")
            .filter(col("amount") > 0))

# Gold: batch read of the silver table, aggregated to revenue per order_date.
@dlt.table(name="daily_revenue")
def daily_revenue():
    return (dlt.read("orders_clean")
            .groupBy("order_date")
            .agg(_sum("amount").alias("revenue")))
```
How does Lakeflow Spark Declarative Pipelines determine the execution order of these three datasets?
- A. You must add a `depends_on` parameter to each `@dlt.table` decorator to declare the order explicitly.
- B. SDP automatically infers the dependency graph from the `dlt.read`/`dlt.read_stream` references between datasets and runs flows in the correct order with maximum parallelism.
- C. Datasets execute in the top-to-bottom order they appear in the notebook source file.
- D. You must create a separate Lakeflow Job with notebook tasks to sequence the three datasets.
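
When weighing option B, it may help to see what deriving a run order from read references looks like in general. The sketch below is a toy illustration using Python's standard `graphlib` module, not SDP's actual implementation. The `reads` mapping and the `run_order` helper are hypothetical names, and the mapping is written by hand from the `dlt.read`/`dlt.read_stream` calls in the question's code.

```python
from graphlib import TopologicalSorter

# Hypothetical, hand-written mapping: dataset -> datasets it reads from,
# copied from the dlt.read / dlt.read_stream calls in the question's code.
reads = {
    "orders_raw": set(),
    "orders_clean": {"orders_raw"},
    "daily_revenue": {"orders_clean"},
}


def run_order(reads):
    """Group datasets into stages; each stage depends only on earlier stages."""
    sorter = TopologicalSorter(reads)
    sorter.prepare()  # raises graphlib.CycleError on circular references
    stages = []
    while sorter.is_active():
        # Everything returned together has no unmet dependencies,
        # so those datasets could run in parallel.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = run_order(reads)
print(stages)  # [['orders_raw'], ['orders_clean'], ['daily_revenue']]
```

In this sketch the result depends only on the references, not on the order in which the entries are written, so reordering the keys of `reads` yields the same stages.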