DP-750 Certification Practice Question #67

Question

A data engineering team must build an incremental ETL workload that ingests CDC events from a Kafka topic into bronze streaming tables, applies SCD Type 2 logic in silver, and refreshes a set of gold materialized views. The team wants the platform to automatically resolve the execution order of the datasets, retry transient failures starting at the most granular unit, and reduce the amount of hand-written Spark and Structured Streaming orchestration code.

They are deciding between two implementation approaches within Lakeflow:

- Author the logic across several Databricks notebooks and orchestrate them as **notebook tasks** in a Lakeflow Job, wiring up the task dependencies and retry logic manually.
- Author the logic as a **Lakeflow Spark Declarative Pipeline (SDP)** using streaming tables and materialized views.

Which approach best meets the requirements with the least custom orchestration code?

Accepted Answer

Use the Lakeflow Spark Declarative Pipeline. SDP is a declarative framework: it infers dataset dependencies from the streaming table and materialized view definitions and orchestrates the flows automatically, so there is no task DAG to hand-wire. It also retries transient failures with an escalating strategy (Spark task first, then the flow, then the whole pipeline), and its `AUTO CDC` API implements SCD Type 1 and Type 2 without hand-written watermark or out-of-order event handling. The notebook-task approach pushes all dependency wiring and retry logic onto the engineer, which is exactly what the requirements ask to avoid.
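To make the two platform behaviors in the answer concrete, here is a toy, standard-library-only sketch. It is not Databricks code and does not reflect how SDP is implemented; it only illustrates the ideas of deriving a run order from declared reads and of widening the retry scope step by step. The dataset names, the `READS_FROM` mapping, and the `execution_order` and `run_with_escalation` helpers are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dataset names. Each dataset declares only what it reads from;
# nobody writes down the run order explicitly.
READS_FROM = {
    "bronze_cdc_events": set(),                      # streaming table fed by Kafka
    "silver_customers_scd2": {"bronze_cdc_events"},  # SCD Type 2 layer
    "gold_customer_counts": {"silver_customers_scd2"},
    "gold_churn_summary": {"silver_customers_scd2"},
}


def execution_order(reads_from):
    """Derive a valid run order from the declared reads (upstream first)."""
    return list(TopologicalSorter(reads_from).static_order())


def run_with_escalation(attempt, scopes=("task", "flow", "pipeline")):
    """Retry at the narrowest scope first, widening only after a failure.

    `attempt(scope)` is a hypothetical callable returning True on success.
    Returns the scope at which the work succeeded, or None if every scope failed.
    """
    for scope in scopes:
        if attempt(scope):
            return scope
    return None


order = execution_order(READS_FROM)
print(order)

# Simulate a transient failure that persists at task level but clears
# once the whole flow is retried.
succeeded_at = run_with_escalation(lambda scope: scope != "task")
print(succeeded_at)
```

With notebook tasks, the equivalent of `READS_FROM` and the retry ladder would have to be configured by hand as task dependencies and retry policies, which is the orchestration work the requirements ask to minimize.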
