DP-750 Certification Practice Question #61

Question

You are a data engineer building a Lakeflow Spark Declarative Pipeline that lands a `silver.transactions` streaming table in Unity Catalog. The business defines three row-level data quality rules and wants the pipeline to **drop** any record that violates **any** rule, while still emitting per-rule pass/fail metrics to the pipeline event log:

- **Nullability:** `transaction_id` must never be null.
- **Cardinality / domain:** `status` must be exactly one of `'PENDING'`, `'SETTLED'`, or `'REVERSED'`.
- **Range:** `amount` must be greater than 0 and at most 50000.

You author the dataset with grouped expectations so a single decorator applies all three checks with one collective action:

```python
from pyspark import pipelines as dp

rules = {
    "valid_id": "transaction_id IS NOT NULL",
    "valid_status": "status IN ('PENDING','SETTLED','REVERSED')",
    "valid_amount": "amount > 0 AND amount <= 50000"
}

@dp.table(name="silver_transactions")
@dp.expect_all_or_drop(rules)
def silver_transactions():
    return spark.readStream.table("bronze.transactions_raw")
```

In the **Data quality** tab, for each rule you must select the SINGLE check category that the rule's SQL condition primarily implements.

```mermaid
flowchart TD
    R1["Rule valid_id:<br/>transaction_id IS NOT NULL"] --> D1{{"Check category?"}}
    R2["Rule valid_status:<br/>status IN ('PENDING','SETTLED','REVERSED')"] --> D2{{"Check category?"}}
    R3["Rule valid_amount:<br/>amount > 0 AND amount <= 50000"] --> D3{{"Check category?"}}
    D1 -.options.-> O1["Nullability / Cardinality / Range"]
    D2 -.options.-> O2["Nullability / Cardinality / Range"]
    D3 -.options.-> O3["Nullability / Cardinality / Range"]
```

Accepted Answer

Each expectation's SQL condition reveals its validation category:

- **`valid_id`: Nullability.** `transaction_id IS NOT NULL` rejects null keys, the same intent as a Delta `NOT NULL` enforced constraint.
- **`valid_status`: Cardinality / domain.** `status IN ('PENDING','SETTLED','REVERSED')` restricts the column to a finite set of permitted distinct values.
- **`valid_amount`: Range.** `amount > 0 AND amount <= 50000` bounds a numeric value between a floor and a ceiling, the expectation analogue of a `CHECK (amount > 0 AND amount <= 50000)` constraint.

Because the rules are grouped under `@dp.expect_all_or_drop`, a row that violates any rule is dropped before it is written to the target table. Lakeflow still records passed and failed counts per named expectation in the event log, which satisfies the metrics requirement.
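The drop-on-any-violation behaviour with per-rule counts can be illustrated without a Spark cluster. The sketch below is a plain-Python stand-in, not the Lakeflow API: `RULES`, `apply_all_or_drop`, and `sample_rows` are hypothetical names invented for this illustration, and the lambdas only approximate the three SQL conditions (SQL's three-valued NULL logic for `status` and `amount` is not modelled).

```python
from collections import Counter

# Python approximations of the three SQL conditions (illustrative only).
RULES = {
    "valid_id": lambda r: r["transaction_id"] is not None,                      # nullability
    "valid_status": lambda r: r["status"] in {"PENDING", "SETTLED", "REVERSED"},  # cardinality / domain
    "valid_amount": lambda r: 0 < r["amount"] <= 50000,                         # range
}


def apply_all_or_drop(rows, rules):
    """Keep only rows that pass every rule; count passes and failures per rule."""
    passed, failed, kept = Counter(), Counter(), []
    for row in rows:
        # Every rule is evaluated on every row, so each rule gets its own metric
        # even when the row is dropped because of a different rule.
        results = {name: check(row) for name, check in rules.items()}
        for name, ok in results.items():
            (passed if ok else failed)[name] += 1
        if all(results.values()):
            kept.append(row)
    return kept, dict(passed), dict(failed)


# Made-up sample data: one clean row, one null id, one row breaking two rules,
# and one row sitting exactly on the upper amount bound.
sample_rows = [
    {"transaction_id": "t1", "status": "SETTLED", "amount": 120.0},
    {"transaction_id": None, "status": "PENDING", "amount": 10.0},
    {"transaction_id": "t3", "status": "CANCELLED", "amount": 75000.0},
    {"transaction_id": "t4", "status": "REVERSED", "amount": 50000.0},
]

kept, passed, failed = apply_all_or_drop(sample_rows, RULES)
print([r["transaction_id"] for r in kept])
print(passed)
print(failed)
```

In this sample, the row with `amount` equal to 50000 is kept because the upper bound is inclusive, while the third row is dropped once but counted as a failure under both `valid_status` and `valid_amount`.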
