DP-750 Certification Practice Question #63

Question

An upstream SaaS vendor frequently adds new optional columns to the JSON files it drops into a Unity Catalog volume. You ingest these files into the Delta table `bronze.events` with Auto Loader. Today, when a new column appears, your nightly ingestion job fails with a schema-mismatch error because Delta Lake enforces the schema on write and rejects columns that are not already in the target table.

The requirement is: **new (additive) columns from the source must be automatically appended to the `bronze.events` schema and their data persisted** — without manually altering the table and without losing data — while existing column types remain unchanged.

You have this Auto Loader write:

```python
(spark.readStream
  .format("cloudFiles")
  .option("cloudFiles.format", "json")
  .option("cloudFiles.schemaLocation", "/Volumes/main/bronze/_schemas/events")
  .load("/Volumes/main/bronze/landing/events")
  .writeStream
  # <-- option goes here
  .option("checkpointLocation", "/Volumes/main/bronze/_ckpt/events")
  .toTable("bronze.events")
)
```

Which option should you add to the **write** so additive source columns are merged into the Delta target schema?

Accepted Answer

Add `.option("mergeSchema", "true")` to the write.

Delta Lake enforces schema on write by default, which is exactly why the write fails when the vendor adds a column. Setting `mergeSchema` to `true` on the write permits controlled, additive growth: columns present in the source but absent from `bronze.events` are appended to the end of the table schema, existing rows read as `NULL` for the new columns, and existing column types are left unchanged. This is the documented Auto Loader + Delta pattern for handling schema drift without manual `ALTER TABLE` work.

The other options do not meet the requirement. `overwriteSchema` replaces the table schema rather than additively merging into it. `failOnNewColumns` is a value of `cloudFiles.schemaEvolutionMode` on the read side that deliberately makes the stream fail when new columns appear. `cloudFiles.inferColumnTypes` controls how Auto Loader infers column types, and `ignoreChanges` is a Delta streaming source option; both govern reading from a source, not schema evolution of the target.

You still need `cloudFiles.schemaLocation` on the read side so that Auto Loader's `addNewColumns` evolution mode (the default when no schema is supplied) can surface the new column. In that mode the stream stops once when it first sees a new column and picks up the updated schema on restart. The merge into the target table is governed by `mergeSchema` on the write.
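For reference, the completed write might look like the sketch below. It assumes a Databricks runtime where the `cloudFiles` source is available and an active `spark` session. The function name `start_events_stream` and the two option dictionaries are illustrative names, not part of the question, and `cloudFiles.schemaEvolutionMode` is set explicitly only to make the read-side behavior visible.

```python
# Sketch only: assumes a Databricks runtime where the "cloudFiles" source
# (Auto Loader) is available. The names READ_OPTIONS, WRITE_OPTIONS and
# start_events_stream are illustrative, not taken from the question.

READ_OPTIONS = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": "/Volumes/main/bronze/_schemas/events",
    # Read side: record new source columns in the schema location.
    # Spelled out here for clarity; the question's snippet does not set it.
    "cloudFiles.schemaEvolutionMode": "addNewColumns",
}

WRITE_OPTIONS = {
    # Write side: let Delta append columns missing from bronze.events.
    "mergeSchema": "true",
    "checkpointLocation": "/Volumes/main/bronze/_ckpt/events",
}


def start_events_stream(spark):
    """Start the Auto Loader -> Delta stream and return the query handle."""
    return (
        spark.readStream
        .format("cloudFiles")
        .options(**READ_OPTIONS)
        .load("/Volumes/main/bronze/landing/events")
        .writeStream
        .options(**WRITE_OPTIONS)
        .toTable("bronze.events")
    )
```

In a notebook this would be started with `query = start_events_stream(spark)`. Keeping the read and write options in separate dictionaries mirrors the split described above: Auto Loader options shape what is read, while `mergeSchema` decides what the Delta target accepts.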
