Question 61
You are a data engineer building a Lakeflow Spark Declarative Pipeline that lands a `silver.transactions` streaming table in Unity Catalog. The business defines three row-level data quality rules and wants the pipeline to **drop** any record that violates **any** rule, while still emitting per-rule pass/fail metrics to the pipeline event log:
- **Nullability:** `transaction_id` must never be null.
- **Cardinality / domain:** `status` must be exactly one of `'PENDING'`, `'SETTLED'`, or `'REVERSED'`.
- **Range:** `amount` must be greater than 0 and at most 50000.
You author the dataset with grouped expectations so a single decorator applies all three checks with one collective action:
```python
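from pyspark import pipelines as dp

# Each key names an expectation; each value is the SQL condition a row must satisfy.
rules = {
    "valid_id": "transaction_id IS NOT NULL",
    "valid_status": "status IN ('PENDING','SETTLED','REVERSED')",
    "valid_amount": "amount > 0 AND amount <= 50000",
}

# Rows that violate any of the grouped expectations are dropped from the target table.
@dp.table(name="silver_transactions")
@dp.expect_all_or_drop(rules)
def silver_transactions():
    # `spark` is the session provided by the pipeline runtime.
    return spark.readStream.table("bronze.transactions_raw")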
```
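The intended behaviour of the grouped expectation (keep a row only if it passes every rule, and count failures per rule) can be illustrated without a pipeline runtime. The following is a minimal plain-Python sketch, not the pipeline's actual implementation: `RULES`, `apply_all_or_drop`, and the sample rows are hypothetical names introduced here, and treating a `None` value as a failed check is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical Python stand-ins for the three SQL conditions in `rules`.
# Assumption: a missing (None) value makes the corresponding check fail.
RULES = {
    "valid_id": lambda r: r.get("transaction_id") is not None,
    "valid_status": lambda r: r.get("status") in ("PENDING", "SETTLED", "REVERSED"),
    "valid_amount": lambda r: r.get("amount") is not None and 0 < r["amount"] <= 50000,
}

def apply_all_or_drop(rows, rules=RULES):
    """Keep rows that pass every rule and count failures per rule."""
    failures = Counter({name: 0 for name in rules})
    kept = []
    for row in rows:
        # Evaluate every rule so each one gets its own failure count,
        # even when an earlier rule has already failed for this row.
        failed = [name for name, check in rules.items() if not check(row)]
        failures.update(failed)
        if not failed:
            kept.append(row)
    return kept, dict(failures)

sample = [
    {"transaction_id": "t1", "status": "SETTLED", "amount": 120.0},
    {"transaction_id": None, "status": "SETTLED", "amount": 10.0},    # fails valid_id
    {"transaction_id": "t3", "status": "CANCELLED", "amount": 0},     # fails status and amount
    {"transaction_id": "t4", "status": "PENDING", "amount": 50000},   # upper bound is inclusive
]

kept, failures = apply_all_or_drop(sample)
print([r["transaction_id"] for r in kept])
print(failures)
```

In this sketch a row that breaks two rules is dropped once but counted against both rules, which mirrors the requirement for per-rule metrics alongside a single collective drop action.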
In the **Data quality** tab, for each rule you must select the SINGLE check category that the rule's SQL condition primarily implements.
```mermaid
flowchart TD
R1["Rule valid_id:<br/>transaction_id IS NOT NULL"] --> D1{{"Check category?"}}
R2["Rule valid_status:<br/>status IN ('PENDING','SETTLED','REVERSED')"] --> D2{{"Check category?"}}
R3["Rule valid_amount:<br/>amount > 0 AND amount <= 50000"] --> D3{{"Check category?"}}
D1 -.options.-> O1["Nullability / Cardinality / Range"]
D2 -.options.-> O2["Nullability / Cardinality / Range"]
D3 -.options.-> O3["Nullability / Cardinality / Range"]
```