DP-750 Certification Practice Question #64

Question

You maintain a Lakeflow Spark Declarative Pipeline that produces the `silver.payments` streaming table for a regulated finance workload. The data steward gives you a strict rule: a payment record with a **null `account_id` is never acceptable** — if even one such record arrives, the pipeline update must **stop immediately and atomically roll back the table update** so no partial, dirty data is committed, forcing manual investigation of the upstream source before reprocessing.

Three candidate SQL expectation clauses are below:

```sql
-- Option 1
CONSTRAINT valid_account EXPECT (account_id IS NOT NULL)

-- Option 2
CONSTRAINT valid_account EXPECT (account_id IS NOT NULL) ON VIOLATION DROP ROW

-- Option 3
CONSTRAINT valid_account EXPECT (account_id IS NOT NULL) ON VIOLATION FAIL UPDATE
```

Which clause meets the steward's requirement?

Accepted Answer

Lakeflow expectations support three violation actions. The default `EXPECT` with no `ON VIOLATION` clause (warn) writes violating rows to the target and only records them in the data quality metrics. `ON VIOLATION DROP ROW` removes violating rows before the write but still lets the update complete successfully. `ON VIOLATION FAIL UPDATE` is the only action that stops execution as soon as a violating record is detected and, for a table update, atomically rolls back the transaction so that nothing is committed; manual intervention is then required before reprocessing. That matches the steward's zero-tolerance, halt-and-investigate requirement, so Option 3 is correct.

Two side effects are worth knowing. Because `FAIL UPDATE` causes the update to fail, expectation metrics are not recorded for that run. The failure is also scoped to the flow that owns the expectation; it does not cause other flows in the pipeline to fail.
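For context, here is a minimal sketch of where the Option 3 clause would sit in a full streaming table definition. The source table `bronze.payments_raw` is a hypothetical name used only for illustration, and the `SELECT *` stands in for whatever transformation the real pipeline applies.

```sql
-- Sketch only: bronze.payments_raw is an assumed upstream table name,
-- not something given in the question.
CREATE OR REFRESH STREAMING TABLE silver.payments (
  -- Any row with a null account_id fails the update instead of being
  -- written (warn) or silently filtered out (drop).
  CONSTRAINT valid_account
    EXPECT (account_id IS NOT NULL)
    ON VIOLATION FAIL UPDATE
)
AS SELECT *
FROM STREAM(bronze.payments_raw);
```

With this definition, an update that encounters a null `account_id` fails rather than committing a partial write to `silver.payments`, which is the behavior the steward asked for.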
