FEFreeExamDumps.in

CCA-F Practice Questions — Page 16

Question 151

Scenario: Structured Data Extraction

An extraction service intermittently returns malformed JSON — missing braces and trailing commas — which breaks the downstream parser about 2% of the time. The team currently asks the model in the prompt to "return valid JSON only."

What is the most reliable way to eliminate these syntax errors?

  • A. Strengthen the prompt wording and add "do not include markdown fences."
  • B. Post-process the text with a regex that repairs braces and commas.
  • C. Use `tool_use` with a JSON Schema so output is guaranteed syntactically valid and schema-conformant.
  • D. Lower the temperature to 0 to avoid formatting mistakes.
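
For illustration, the `tool_use` approach in option C can be sketched as a tool definition plus a forced tool choice. The tool name `record_invoice` and its fields are hypothetical; the `name`, `description`, `input_schema`, and `tool_choice` keys follow the Anthropic Messages API shape, and the sketch only builds the payload without sending anything.

```python
import json

# Hypothetical tool: the name and fields are illustrative, not from the question.
extract_tool = {
    "name": "record_invoice",
    "description": "Record the fields extracted from one invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
        },
        "required": ["vendor", "total"],
    },
}

# Keyword arguments one might pass to a Messages API call
# (model, max_tokens, and messages are deliberately omitted here).
request_kwargs = {
    "tools": [extract_tool],
    "tool_choice": {"type": "tool", "name": extract_tool["name"]},
}

# The payload is plain JSON-serializable data.
payload = json.dumps(request_kwargs)
```

The structured result would then arrive as the tool call's input object rather than as free text that has to be parsed.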

Question 152

Scenario: Structured Data Extraction

After switching to `tool_use` with a strict JSON Schema, output is always valid JSON. Yet an invoice extraction reports `total: 150` while the line items sum to 145, and similar arithmetic mismatches slip through to accounting.

Why does this happen and what addresses it?

  • A. This is a semantic error: schemas guarantee structure, not correctness. Add validation checks plus retry-with-feedback or self-correction.
  • B. The schema is malformed; tightening the field types will fix the arithmetic.
  • C. The model needs `tool_choice: "any"` to compute the total correctly.
  • D. Raising `max_tokens` will give the model room to finish the calculation.
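
A minimal sketch of the kind of validation check mentioned in option A, using the numbers from the scenario. The `line_items` and `amount` field names and the wording of the feedback string are assumptions for illustration; the returned message is what a retry-with-feedback loop might send back to the model.

```python
def check_invoice_total(extraction, tolerance=0.01):
    """Return None if the total matches the line items, else a feedback string."""
    line_sum = sum(item["amount"] for item in extraction["line_items"])
    if abs(line_sum - extraction["total"]) <= tolerance:
        return None
    return (
        f"Validation failed: total is {extraction['total']} but the line items "
        f"sum to {line_sum}. Re-read the document and correct the extraction."
    )

# Scenario values: stated total 150, line items summing to 145.
extraction = {"total": 150, "line_items": [{"amount": 100}, {"amount": 45}]}
feedback = check_invoice_total(extraction)
```

Here `feedback` is a non-empty string because 150 and 145 disagree; a matching extraction would return `None` and pass through.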

Question 153

Scenario: Structured Data Extraction

An extraction schema marks `middle_name` as required. Many source documents have no middle name, and the model fabricates plausible-looking values to satisfy the required field, polluting the dataset.

How should the schema be designed?

  • A. Keep the field required but add "do not guess" to the prompt.
  • B. Make the field nullable (`"type": ["string", "null"]`) so the model can return null when the information is absent.
  • C. Remove the field from the schema entirely so nothing is reported.
  • D. Set a required default of "N/A" as a string.
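
The nullable design in option B might look like the following sketch. The `first_name` and `last_name` fields are hypothetical additions, and `middle_name_ok` is a small hand-rolled check written for this example rather than a full JSON Schema validator.

```python
person_schema = {
    "type": "object",
    "properties": {
        "first_name": {"type": "string"},
        # Nullable: the key must be present, but its value may be null.
        "middle_name": {"type": ["string", "null"]},
        "last_name": {"type": "string"},
    },
    "required": ["first_name", "middle_name", "last_name"],
}

def middle_name_ok(record):
    """Check only the middle_name field: present, and either a string or None."""
    if "middle_name" not in record:
        return False
    value = record["middle_name"]
    return value is None or isinstance(value, str)
```

The field can stay in `required` because the constraint is on presence of the key, while `null` gives the model an honest way to say the value is absent.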

Question 154

Scenario: Structured Data Extraction

A support-ticket classifier uses the enum `["bug", "feature", "docs"]`. Around 12% of real tickets fit none of these, and the model is forced to pick a wrong category, losing information and skewing the metrics.

How should the enum be extended?

  • A. Add 20 more narrowly defined categories to cover every conceivable ticket.
  • B. Remove the enum constraint and accept any free-text label.
  • C. Force the model to always pick the closest of the three existing categories.
  • D. Add `"other"` (with a detail string) and `"unclear"` so out-of-scope or ambiguous tickets are captured honestly.
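
As a sketch of the extended enum described in option D, assuming a hypothetical `other_detail` field to carry the free-text detail; `category_ok` is an illustrative check, not part of any library.

```python
ticket_schema = {
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "enum": ["bug", "feature", "docs", "other", "unclear"],
        },
        # Hypothetical field name; filled in only when category is "other".
        "other_detail": {"type": ["string", "null"]},
    },
    "required": ["category", "other_detail"],
}

def category_ok(ticket):
    """Accept only known categories, and require a detail string for "other"."""
    allowed = ticket_schema["properties"]["category"]["enum"]
    if ticket.get("category") not in allowed:
        return False
    if ticket["category"] == "other":
        return bool(ticket.get("other_detail"))
    return True
```

Counting how often `other` and `unclear` appear also gives a direct measure of how well the category list fits real tickets.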

Question 155

Scenario: Structured Data Extraction

You need consistent extraction of informal recipe measurements like "two handfuls of rice" and "a pinch of salt" into a normalized `{amount, unit}` structure. A purely descriptive instruction produces inconsistent, ad-hoc conversions.

Which technique works best here?

  • A. Provide few-shot examples mapping several informal phrases to the exact desired output structure.
  • B. Write a longer prose description of how to handle every informal unit.
  • C. Force `tool_choice: "any"` to guarantee JSON output.
  • D. Raise the temperature so the model is more creative with conversions.
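
A few-shot prompt of the kind described in option A could be assembled roughly as follows. The example pairs and their normalized values are assumptions chosen for illustration, not a standard conversion table, and `build_prompt` is a hypothetical helper.

```python
import json

# Illustrative pairs; the normalized values are assumptions, not a standard.
FEW_SHOT = [
    ("two handfuls of rice", {"amount": 2, "unit": "handful"}),
    ("a pinch of salt", {"amount": 1, "unit": "pinch"}),
    ("half a cup of milk", {"amount": 0.5, "unit": "cup"}),
]

def build_prompt(phrase):
    """Build a prompt that shows worked examples before the new input."""
    lines = ["Convert each measurement to JSON with keys amount and unit.", ""]
    for text, output in FEW_SHOT:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {json.dumps(output)}")
        lines.append("")
    lines.append(f"Input: {phrase}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_prompt("a dash of vinegar")
```

The examples pin down decisions that prose tends to leave open, such as whether "handfuls" is singularized and whether "a pinch" becomes an amount of 1.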

Question 156

Scenario: Structured Data Extraction

One extraction fails validation because a date is formatted `03/05/25` instead of ISO 8601. A separate extraction fails because the requested supplier tax ID simply does not appear anywhere in the provided document. You plan to retry both with feedback.

For which case will retry-with-feedback help?

  • A. Retry helps both cases equally.
  • B. Retry helps neither; you must switch to a larger model.
  • C. Retry helps the date-format case (the model can reformat); it won't help when the information is absent from the source.
  • D. Retry helps the missing-information case but not the format case.
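
The two failure modes in the scenario can be told apart with a rough sketch like this one. Both helpers are hypothetical, and the "value appears verbatim in the source text" test is a simplifying assumption; real pipelines would likely need fuzzier matching.

```python
from datetime import date

def is_iso_date(value):
    """True if value parses as an ISO 8601 calendar date such as 2025-03-05."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False

def retry_may_help(field_value, source_text):
    """Retry is only worth attempting when the raw value exists in the source."""
    return field_value is not None and field_value in source_text

source = "Invoice date: 03/05/25. Supplier: Acme Ltd."
date_case = retry_may_help("03/05/25", source)   # value present, wrong format
tax_id_case = retry_may_help(None, source)       # nothing to find in the source
```

Note that `03/05/25` is itself ambiguous between day-first and month-first readings, so the feedback sent on retry would need to state which convention the document uses.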

Question 157

Scenario: Structured Data Extraction

A team must classify 60,000 archived documents overnight. Results are needed by morning, cost matters, and no interactive turn-by-turn tool calling is required per document.

Which API choice is most appropriate?

  • A. The synchronous API, to get each result back immediately.
  • B. The Message Batches API, for ~50% savings within its up-to-24-hour window.
  • C. The synchronous API with multi-turn tool calling for each document.
  • D. Split the job into 60,000 separate real-time requests to avoid the batch window.

Question 158

Scenario: Structured Data Extraction

A batch of 100 documents is submitted; 94 succeed and 6 fail because they exceeded the context limit. You want to chunk and reprocess only the failures without redoing the successful work.

How does `custom_id` help?

  • A. It guarantees ordering so failed items always appear at the end of the results.
  • B. It automatically retries the failed documents inside the same batch.
  • C. It enables multi-turn tool calling for the failed documents only.
  • D. It links each response to its source document, so you can identify and resubmit only the 6 failures.
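
To illustrate how `custom_id` ties results back to inputs, here is a sketch that uses plain dictionaries as a simplified stand-in for batch results. The document paths, the ID format, and the choice of which six IDs failed are all made up for the example; real results carry more fields than `custom_id` and a result type.

```python
# Hypothetical mapping from custom_id to the source document path.
documents = {f"doc-{i:03d}": f"archive/{i:03d}.txt" for i in range(100)}

# Made-up set of six failures, standing in for context-limit errors.
failed = {"doc-007", "doc-021", "doc-034", "doc-058", "doc-076", "doc-099"}

# Simplified stand-in for batch results, one entry per request.
results = [
    {"custom_id": cid, "result": {"type": "errored" if cid in failed else "succeeded"}}
    for cid in documents
]

def ids_to_resubmit(results):
    """Collect the custom_id of every request that did not succeed."""
    return [r["custom_id"] for r in results if r["result"]["type"] != "succeeded"]

# Only these documents need chunking and a second batch.
to_retry = {cid: documents[cid] for cid in ids_to_resubmit(results)}
```

Because the lookup is by ID rather than by position, it does not depend on the order in which results come back.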

Question 159

Scenario: Structured Data Extraction

You want invoice extraction to surface a self-correction signal so a downstream system can automatically flag when a stated total doesn't match the sum of the line items.

Which schema design enables this?

  • A. Extract both `stated_total` and `calculated_total` plus a `conflict_detected` flag so discrepancies are surfaced.
  • B. Extract only `total` and trust the model to compute it correctly.
  • C. Mark `total` as required so the model must always provide it.
  • D. Use `tool_choice: "any"` to force a numeric total.
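
The three-field design in option A might be sketched as below. The field names come from the option itself; `needs_review` and its tolerance are hypothetical, and the function recomputes the comparison rather than relying on the model's flag alone.

```python
invoice_schema = {
    "type": "object",
    "properties": {
        "stated_total": {"type": "number"},       # total printed on the invoice
        "calculated_total": {"type": "number"},   # sum of the extracted line items
        "conflict_detected": {"type": "boolean"},
    },
    "required": ["stated_total", "calculated_total", "conflict_detected"],
}

def needs_review(extraction, tolerance=0.01):
    """Flag if the model reported a conflict or the two totals disagree."""
    mismatch = abs(extraction["stated_total"] - extraction["calculated_total"]) > tolerance
    return extraction["conflict_detected"] or mismatch
```

Checking both the flag and the numbers means a downstream system still catches the discrepancy when the model reports mismatched totals but leaves the flag false.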

Question 160

Scenario: Structured Data Extraction

A document-extraction pipeline reports 96% overall accuracy and the team wants to fully automate it. A spot check reveals that one document type — handwritten intake forms — has roughly 38% errors, hidden inside the aggregate metric.

What practice would have surfaced this risk earlier?

  • A. Trust the 96% aggregate and automate everything.
  • B. Increase `max_tokens` for all extractions to reduce errors.
  • C. Analyze accuracy by document type and field, using stratified random sampling rather than relying on the aggregate.
  • D. Ask the model to self-rate its confidence on each document and trust it.