CCA-F Practice Questions — Page 16

Question 151

Scenario: Structured Data Extraction

An extraction service intermittently returns malformed JSON — missing braces and trailing commas — which breaks the downstream parser about 2% of the time. The team currently asks the model in the prompt to "return valid JSON only."

What is the most reliable way to eliminate these syntax errors?

A. Strengthen the prompt wording and add "do not include markdown fences."
B. Post-process the text with a regex that repairs braces and commas.
C. Use `tool_use` with a JSON Schema so output is guaranteed syntactically valid and schema-conformant.
D. Lower the temperature to 0 to avoid formatting mistakes.

Question 152

Scenario: Structured Data Extraction

After switching to `tool_use` with a strict JSON Schema, output is always valid JSON. Yet an invoice extraction reports `total: 150` while the line items sum to 145, and similar arithmetic mismatches slip through to accounting.

Why does this happen and what addresses it?

A. This is a semantic error: schemas guarantee structure, not correctness. Add validation checks plus retry-with-feedback or self-correction.
B. The schema is malformed; tightening the field types will fix the arithmetic.
C. The model needs `tool_choice: "any"` to compute the total correctly.
D. Raising `max_tokens` will give the model room to finish the calculation.
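
The mismatch in this scenario (a stated total of 150 against line items summing to 145) passes any structural check. A minimal sketch of an arithmetic check that would catch it, assuming a hypothetical invoice shape with `line_items` (each carrying an `amount`) and a `total` field; the function and field names are illustrative and not taken from any library:

```python
from decimal import Decimal
from typing import Optional


def check_invoice_total(invoice: dict) -> Optional[str]:
    """Return a feedback message if the stated total disagrees with the line items."""
    # Decimal avoids float rounding noise when summing currency amounts.
    items_sum = sum(
        (Decimal(str(item["amount"])) for item in invoice["line_items"]),
        Decimal("0"),
    )
    stated = Decimal(str(invoice["total"]))
    if items_sum != stated:
        return f"total is {stated} but line items sum to {items_sum}"
    return None


# Hypothetical extraction result mirroring the scenario: valid JSON, wrong arithmetic.
invoice = {"line_items": [{"amount": 100}, {"amount": 45}], "total": 150}
feedback = check_invoice_total(invoice)
```

A non-`None` message could be sent back to the model on a retry or used to route the document to manual review.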

Question 153

Scenario: Structured Data Extraction

An extraction schema marks `middle_name` as required. Many source documents have no middle name, and the model fabricates plausible-looking values to satisfy the required field, polluting the dataset.

How should the schema be designed?

A. Keep the field required but add "do not guess" to the prompt.
B. Make the field nullable (`"type": ["string", "null"]`) so the model can return null when the information is absent.
C. Remove the field from the schema entirely so nothing is reported.
D. Set a required default of "N/A" as a string.

Question 154

Scenario: Structured Data Extraction

A support-ticket classifier uses the enum `["bug", "feature", "docs"]`. Around 12% of real tickets fit none of these, and the model is forced to pick a wrong category, losing information and skewing the metrics.

How should the enum be extended?

A. Add 20 more narrowly defined categories to cover every conceivable ticket.
B. Remove the enum constraint and accept any free-text label.
C. Force the model to always pick the closest of the three existing categories.
D. Add `"other"` (with a detail string) and `"unclear"` so out-of-scope or ambiguous tickets are captured honestly.

Question 155

Scenario: Structured Data Extraction

You need consistent extraction of informal recipe measurements like "two handfuls of rice" and "a pinch of salt" into a normalized `{amount, unit}` structure. A purely descriptive instruction produces inconsistent, ad-hoc conversions.

Which technique works best here?

A. Provide few-shot examples mapping several informal phrases to the exact desired output structure.
B. Write a longer prose description of how to handle every informal unit.
C. Force `tool_choice: "any"` to guarantee JSON output.
D. Raise the temperature so the model is more creative with conversions.

Question 156

Scenario: Structured Data Extraction

One extraction fails validation because a date is formatted `03/05/25` instead of ISO 8601. A separate extraction fails because the requested supplier tax ID simply does not appear anywhere in the provided document. You plan to retry both with feedback.

For which case will retry-with-feedback help?

A. Retry helps both cases equally.
B. Retry helps neither; you must switch to a larger model.
C. Retry helps the date-format case (the model can reformat); it won't help when the information is absent from the source.
D. Retry helps the missing-information case but not the format case.
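
The date-format failure in this scenario is the kind of error a simple validator can detect and describe. A minimal sketch using only the standard library, assuming the pipeline expects the `YYYY-MM-DD` form of ISO 8601; the function name is hypothetical:

```python
from datetime import date
from typing import Optional


def iso_date_error(value: str) -> Optional[str]:
    """Return a feedback message if value is not an ISO 8601 calendar date."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return f"'{value}' is not an ISO 8601 date (expected YYYY-MM-DD)"
    return None


# The value from the scenario fails validation; a well-formed date passes.
scenario_error = iso_date_error("03/05/25")
ok_result = iso_date_error("2025-01-31")
```

The returned message describes the format problem only. It does not decide whether `03/05/25` means March 5 or May 3, which has to come from the source document or its locale.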

Question 157

Scenario: Structured Data Extraction

A team must classify 60,000 archived documents overnight. Results are needed by morning, cost matters, and no interactive turn-by-turn tool calling is required per document.

Which API choice is most appropriate?

A. The synchronous API, to get each result back immediately.
B. The Message Batches API, for ~50% savings within its up-to-24-hour window.
C. The synchronous API with multi-turn tool calling for each document.
D. Split the job into 60,000 separate real-time requests to avoid the batch window.

Question 158

Scenario: Structured Data Extraction

A batch of 100 documents is submitted; 94 succeed and 6 fail because they exceeded the context limit. You want to chunk and reprocess only the failures without redoing the successful work.

How does `custom_id` help?

A. It guarantees ordering so failed items always appear at the end of the results.
B. It automatically retries the failed documents inside the same batch.
C. It enables multi-turn tool calling for the failed documents only.
D. It links each response to its source document, so you can identify and resubmit only the 6 failures.
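
A rough sketch of separating failures from successes in a finished batch. Plain dictionaries stand in for the result entries here; the `custom_id` and `result.type` field names follow the Message Batches result shape as best understood and should be treated as an assumption. The document IDs and helper name are hypothetical:

```python
def failed_ids(results: list) -> list:
    """Collect the custom_id of every entry that did not succeed."""
    return [r["custom_id"] for r in results if r["result"]["type"] != "succeeded"]


# Hypothetical source documents keyed by the custom_id used at submission time.
documents = {f"doc-{i:03d}": f"text of document {i}" for i in range(1, 6)}

# Stand-in results, deliberately out of submission order.
results = [
    {"custom_id": "doc-003", "result": {"type": "succeeded"}},
    {"custom_id": "doc-001", "result": {"type": "errored"}},
    {"custom_id": "doc-005", "result": {"type": "succeeded"}},
    {"custom_id": "doc-004", "result": {"type": "errored"}},
    {"custom_id": "doc-002", "result": {"type": "succeeded"}},
]

# Only the failed documents are selected for chunking and resubmission.
to_resubmit = {cid: documents[cid] for cid in failed_ids(results)}
```

Because the lookup goes through `custom_id`, it does not depend on the order in which results come back.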

Question 159

Scenario: Structured Data Extraction

You want invoice extraction to surface a self-correction signal so a downstream system can automatically flag when a stated total doesn't match the sum of the line items.

Which schema design enables this?

A. Extract both `stated_total` and `calculated_total` plus a `conflict_detected` flag so discrepancies are surfaced.
B. Extract only `total` and trust the model to compute it correctly.
C. Mark `total` as required so the model must always provide it.
D. Use `tool_choice: "any"` to force a numeric total.

Question 160

Scenario: Structured Data Extraction

A document-extraction pipeline reports 96% overall accuracy and the team wants to fully automate it. A spot check reveals that one document type — handwritten intake forms — has roughly 38% errors, hidden inside the aggregate metric.

What practice would have surfaced this risk earlier?

A. Trust the 96% aggregate and automate everything.
B. Increase `max_tokens` for all extractions to reduce errors.
C. Analyze accuracy by document type and field, using stratified random sampling rather than relying on the aggregate.
D. Ask the model to self-rate its confidence on each document and trust it.
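
A small sketch of how an aggregate figure can hide a weak segment. The records below are toy data invented for illustration, not the 96% and 38% figures from the scenario; the type labels and function name are hypothetical:

```python
from collections import defaultdict


def accuracy_by_type(records):
    """Compute accuracy per document type from (doc_type, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for doc_type, is_correct in records:
        totals[doc_type] += 1
        correct[doc_type] += int(is_correct)
    return {t: correct[t] / totals[t] for t in totals}


# Toy sample: 8 typed documents (all correct), 2 handwritten (1 correct).
records = [("typed", True)] * 8 + [("handwritten", True), ("handwritten", False)]

overall = sum(ok for _, ok in records) / len(records)
by_type = accuracy_by_type(records)
```

In this toy sample the overall accuracy is 0.9 while the handwritten segment sits at 0.5, which only becomes visible in the per-type breakdown.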