Question 59
You are creating an agent workflow in a Microsoft Foundry project to support natural voice interactions. The agent must receive continuous audio input, convert the input into text for reasoning, and then return spoken responses to a user. The workflow must meet the following requirements:
- Support turn-taking dynamics, where the agent begins to generate the speech output before the user finishes speaking.
- Operate with low latency to maintain a conversational experience.
You need to enable both speech to text and text to speech in a real-time agent interaction. What should you do?
- A. Use batch transcription to convert the audio input and return text responses from the agent.
- B. Use real-time speech to text for incoming audio and text to speech for agent responses. ✓
- C. Use an embeddings model to encode the audio, and then decode the audio into text and speech.
- D. Use speech translation to convert the audio into another language and return the translated text.
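
Why B fits the turn-taking requirement can be illustrated with a small simulation. This is only a sketch under stated assumptions: `realtime_stt`, `batch_stt`, `agent_reply`, and `tts` are hypothetical stand-ins written for this example, not calls from the Azure Speech SDK or Microsoft Foundry, and audio chunks are modeled as single words.

```python
from typing import Iterator, List, Optional, Tuple


def realtime_stt(audio_chunks: List[str]) -> Iterator[str]:
    """Hypothetical streaming recognizer: yields a growing partial
    transcript after each incoming audio chunk."""
    words: List[str] = []
    for chunk in audio_chunks:
        words.append(chunk)
        yield " ".join(words)


def batch_stt(audio_chunks: List[str]) -> str:
    """Hypothetical batch transcription: returns text only after
    all of the audio has been consumed."""
    return " ".join(audio_chunks)


def agent_reply(transcript: str) -> Optional[str]:
    """Toy reasoning step: reply as soon as the intent is recognizable."""
    if "weather" in transcript:
        return "Checking the weather for you."
    return None


def tts(text: str) -> List[str]:
    """Hypothetical text to speech: one placeholder audio frame per word."""
    return [f"<audio:{word}>" for word in text.split()]


def run_realtime(audio_chunks: List[str]) -> Tuple[int, List[str]]:
    """Return (chunks heard before speech starts, synthesized frames)."""
    for heard, partial in enumerate(realtime_stt(audio_chunks), start=1):
        reply = agent_reply(partial)
        if reply is not None:
            return heard, tts(reply)
    return len(audio_chunks), []


def run_batch(audio_chunks: List[str]) -> Tuple[int, List[str]]:
    """Same pipeline, but the agent sees text only after the full utterance."""
    reply = agent_reply(batch_stt(audio_chunks))
    return len(audio_chunks), tts(reply) if reply else []


chunks = ["what", "is", "the", "weather", "in", "Pune", "today"]
realtime_heard, realtime_frames = run_realtime(chunks)
batch_heard, batch_frames = run_batch(chunks)
print(f"real-time: speech starts after {realtime_heard} of {len(chunks)} chunks")
print(f"batch:     speech starts after {batch_heard} of {len(chunks)} chunks")
```

In this toy model the streaming path can begin its spoken reply partway through the utterance, while the batch path (option A) must wait for the whole recording and returns text rather than speech. Options C and D do not produce the speech to text plus text to speech loop at all.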