DP-203 Practice Questions — Page 11

Question 101

A company plans to use Platform-as-a-Service (PaaS) to create the new data pipeline process. The process must meet the following requirements:

Ingest:

✑ Access multiple data sources.

✑ Provide the ability to orchestrate workflow.

✑ Provide the capability to run SQL Server Integration Services packages.

Store:

✑ Optimize storage for big data workloads.

✑ Provide encryption of data at rest.

✑ Operate with no size limits.

Prepare and Train:

✑ Provide a fully-managed and interactive workspace for exploration and visualization.

✑ Provide the ability to program in R, SQL, Python, Scala, and Java.

✑ Provide seamless user authentication with Azure Active Directory.

Model & Serve:

✑ Implement native columnar storage.

✑ Provide support for the SQL language.

✑ Provide support for structured streaming.

You need to build the data integration pipeline.

Which technologies should you use? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

Question 102

You have the following table named Employees.

You need to calculate the employee_type value based on the hire_date value.

How should you complete the Transact-SQL statement? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.

NOTE: Each correct selection is worth one point.

Select and Place:

Question 103

You have an Azure Synapse Analytics workspace named WS1.

You have an Azure Data Lake Storage Gen2 container that contains JSON-formatted files in the following format.

You need to use the serverless SQL pool in WS1 to read the files.

NOTE: Each correct selection is worth one point.

Select and Place:

Question 104

You have an Apache Spark DataFrame named temperatures. A sample of the data is shown in the following table.

You need to produce the following table by using a Spark SQL query.

How should you complete the query? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all.

You may need to drag the split bar between panes or scroll to view content.

NOTE: Each correct selection is worth one point.

Select and Place:

Question 105

You have an Azure Data Factory that contains 10 pipelines.

You need to label each pipeline with its main purpose of either ingest, transform, or load. The labels must be available for grouping and filtering when using the monitoring experience in Data Factory.

What should you add to each pipeline?

A. a resource tag
B. a correlation ID
C. a run group ID
D. an annotation

Question 106

The following code segment is used to create an Azure Databricks cluster.

For each of the following statements, select Yes if the statement is true. Otherwise, select No.

NOTE: Each correct selection is worth one point.

Hot Area:

Question 107

You need to implement a Type 3 slowly changing dimension (SCD) for product category data in an Azure Synapse Analytics dedicated SQL pool.

You have a table that was created by using the following Transact-SQL statement.

Which two columns should you add to the table? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

A. [EffectiveStartDate] [datetime] NOT NULL,
B. [CurrentProductCategory] [nvarchar] (100) NOT NULL,
C. [EffectiveEndDate] [datetime] NULL,
D. [ProductCategory] [nvarchar] (100) NOT NULL,
E. [OriginalProductCategory] [nvarchar] (100) NOT NULL,
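
As background on the Type 3 pattern, the following is a minimal Python sketch, not Transact-SQL and not the table from the question. A Type 3 dimension keeps limited history in extra columns on the same row instead of adding new rows. The dictionary-based row, the ProductKey field, and the apply_type3_change helper are hypothetical; the two category column names are borrowed from the options above purely for illustration.

```python
def apply_type3_change(row, new_category):
    """Return a copy of a dimension row after a Type 3 change.

    The original value stays in its own column; only the current
    value is overwritten, and no additional row is created.
    """
    updated = dict(row)
    updated["CurrentProductCategory"] = new_category
    return updated


# Hypothetical dimension row, keyed by a made-up ProductKey.
product = {
    "ProductKey": 1,
    "OriginalProductCategory": "Bikes",
    "CurrentProductCategory": "Bikes",
}

changed = apply_type3_change(product, "E-Bikes")
print(changed)
```

Compare this with a Type 2 dimension, which would insert a second row and track validity with effective start and end dates.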

Question 108

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You are designing an Azure Stream Analytics solution that will analyze Twitter data.

You need to count the tweets in each 10-second window. The solution must ensure that each tweet is counted only once.

Solution: You use a hopping window that uses a hop size of 5 seconds and a window size of 10 seconds.

Does this meet the goal?

A. Yes
B. No
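
To reason about how window size and hop size interact, the following plain-Python model may help. It is a sketch, not Stream Analytics code. It assumes integer timestamps in seconds, windows aligned to multiples of the hop size, and a window ending at time `end` covering the interval (end - window_size, end]. The window_counts function and the sample tweet times are hypothetical.

```python
import math
from collections import Counter


def window_counts(event_times, window_size, hop_size):
    """Count events per window, keyed by window end time.

    A window ending at `end` is assumed to cover (end - window_size, end].
    A tumbling window is the special case hop_size == window_size.
    """
    counts = Counter()
    for t in event_times:
        # First window end at or after the event time.
        end = math.ceil(t / hop_size) * hop_size
        # Every later window that still starts before t also contains it.
        while end - window_size < t:
            counts[end] += 1
            end += hop_size
    return dict(counts)


tweets = [1, 3, 7, 12]  # hypothetical arrival times in seconds

tumbling = window_counts(tweets, window_size=10, hop_size=10)
hopping = window_counts(tweets, window_size=10, hop_size=5)

print("tumbling:", tumbling, "total:", sum(tumbling.values()))
print("hopping: ", hopping, "total:", sum(hopping.values()))
```

Comparing each total with the number of input events shows whether a given configuration places an event in more than one window.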

Question 109

You are building an Azure Stream Analytics job to identify how much time a user spends interacting with a feature on a webpage.

The job receives events based on user actions on the webpage. Each row of data represents an event. Each event has a type of either 'start' or 'end'.

You need to calculate the duration between start and end events.

How should you complete the query? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:
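
The underlying logic, pairing each 'end' event with the most recent 'start' event for the same user and feature, can be sketched in plain Python. This is only an illustration of the matching rule, not the Stream Analytics query the question asks for. The field names user, feature, event, and time, and the durations helper, are assumptions, since the question does not show the event schema.

```python
from datetime import datetime


def durations(events):
    """Return (user, feature, seconds) for each matched start/end pair.

    `events` must be ordered by time. Each 'end' event is paired with
    the most recent unmatched 'start' for the same user and feature.
    """
    last_start = {}
    results = []
    for e in events:
        key = (e["user"], e["feature"])
        if e["event"] == "start":
            last_start[key] = e["time"]
        elif e["event"] == "end" and key in last_start:
            elapsed = e["time"] - last_start.pop(key)
            results.append((e["user"], e["feature"], int(elapsed.total_seconds())))
    return results


# Hypothetical sample events.
events = [
    {"user": "u1", "feature": "search", "event": "start", "time": datetime(2024, 1, 1, 12, 0, 0)},
    {"user": "u2", "feature": "search", "event": "start", "time": datetime(2024, 1, 1, 12, 0, 5)},
    {"user": "u1", "feature": "search", "event": "end", "time": datetime(2024, 1, 1, 12, 0, 30)},
    {"user": "u2", "feature": "search", "event": "end", "time": datetime(2024, 1, 1, 12, 1, 5)},
]

result = durations(events)
print(result)
```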

Question 110

You are creating an Azure Data Factory data flow that will ingest data from a CSV file, cast columns to specified types of data, and insert the data into a table in an Azure Synapse Analytics dedicated SQL pool. The CSV file contains three columns named username, comment, and date.

The data flow already contains the following:

✑ A source transformation.

✑ A Derived Column transformation to set the appropriate types of data.

✑ A sink transformation to land the data in the pool.

You need to ensure that the data flow meets the following requirements:

✑ All valid rows must be written to the destination table.

✑ Truncation errors in the comment column must be avoided proactively.

✑ Any rows containing comment values that will cause truncation errors upon insert must be written to a file in blob storage.

Which two actions should you perform? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

A. To the data flow, add a sink transformation to write the rows to a file in blob storage.
B. To the data flow, add a Conditional Split transformation to separate the rows that will cause truncation errors.
C. To the data flow, add a filter transformation to filter out rows that will cause truncation errors.
D. Add a select transformation to select only the rows that will cause truncation errors.
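
The requirements describe routing rows into two streams based on the length of the comment value. A minimal Python sketch of that routing follows. It is not a Data Factory data flow definition. The 100-character limit, the route_rows function, and the sample rows are assumptions, because the question does not state the width of the destination column.

```python
MAX_COMMENT_LENGTH = 100  # hypothetical width of the destination column


def route_rows(rows, max_len=MAX_COMMENT_LENGTH):
    """Split rows into (valid, would_truncate) by comment length.

    Every input row lands in exactly one of the two output lists,
    so no row is dropped.
    """
    valid, would_truncate = [], []
    for row in rows:
        if len(row["comment"]) > max_len:
            would_truncate.append(row)
        else:
            valid.append(row)
    return valid, would_truncate


# Hypothetical sample rows with the three columns named in the question.
rows = [
    {"username": "alice", "comment": "Short comment", "date": "2024-01-01"},
    {"username": "bob", "comment": "x" * 150, "date": "2024-01-02"},
]

valid, would_truncate = route_rows(rows)
print(len(valid), "row(s) for the table;", len(would_truncate), "row(s) for the error file")
```

In this sketch the first list corresponds to the rows bound for the SQL pool table and the second to the rows bound for the file in blob storage.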