Pass the Databricks Databricks-Certified-Data-Engineer-Associate Questions and answers with ValidTests

Questions # 51:

A data engineer wants to create a new table containing the names of customers that live in France.

They have written the following command:

[Image: command for Question # 51, with a blank to fill in]

A senior data engineer mentions that it is organization policy to include a table property indicating that the new table includes personally identifiable information (PII).

Which of the following lines of code fills in the above blank to successfully complete the task?

Options:

There is no way to indicate whether a table contains PII.

"COMMENT PII"

TBLPROPERTIES PII

COMMENT "Contains PII"

PII

Questions # 52:

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

[Image: code block for Question # 52, with a blank to fill in]

If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?

Options:

trigger("5 seconds")

trigger()

trigger(once="5 seconds")

trigger(processingTime="5 seconds")

trigger(continuous="5 seconds")

Questions # 53:

A data engineer needs to create a table in Databricks using data from a CSV file at location /path/to/csv.

They run the following command:

[Image: command for Question # 53, with a blank to fill in]

Which of the following lines of code fills in the above blank to successfully complete the task?

Options:

None of these lines of code are needed to successfully complete the task

USING CSV

FROM CSV

USING DELTA

FROM "path/to/csv"

Questions # 54:

A Delta Live Tables pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The pipeline is configured to run in Production mode using the Continuous Pipeline Mode.

What is the expected outcome after clicking Start to update the pipeline assuming previously unprocessed data exists and all definitions are valid?

Options:

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.

Questions # 55:

Which of the following benefits is provided by the array functions from Spark SQL?

Options:

An ability to work with data in a variety of types at once

An ability to work with data within certain partitions and windows

An ability to work with time-related data in specified intervals

An ability to work with complex, nested data ingested from JSON files

An ability to work with an array of tables for procedural automation
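
For context, Spark SQL array functions such as explode turn an array column into one row per element. The snippet below is only a plain-Python analogy of that idea, not Spark code; the sample JSON, the field names, and the helper name explode_field are hypothetical.

```python
import json

# Hypothetical nested record, as it might arrive from a JSON file.
raw = '{"order_id": 1, "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}]}'


def explode_field(record, field):
    """Return one row per element of the array stored under `field`.

    This mimics the effect of an explode-style array function:
    the other fields are repeated on every output row.
    """
    base = {k: v for k, v in record.items() if k != field}
    return [{**base, field: element} for element in record[field]]


rows = explode_field(json.loads(raw), "items")
for row in rows:
    print(row)
```

Each output row carries the parent order_id together with exactly one element of the original items array.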

Questions # 56:

A data engineer has been given a new record of data:

id STRING = 'a1'

rank INTEGER = 6

rating FLOAT = 9.4

Which of the following SQL commands can be used to append the new record to an existing Delta table my_table?

Options:

INSERT INTO my_table VALUES ('a1', 6, 9.4)

my_table UNION VALUES ('a1', 6, 9.4)

INSERT VALUES ( 'a1' , 6, 9.4) INTO my_table

UPDATE my_table VALUES ('a1', 6, 9.4)

UPDATE VALUES ('a1', 6, 9.4) my_table
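
As a rough illustration of appending a single record with standard SQL, the sketch below uses Python's built-in sqlite3 module. The in-memory SQLite database is only a stand-in for a Delta table (an assumption made so the example runs anywhere), and the pre-existing row is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a Delta table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id TEXT, rank INTEGER, rating REAL)")
conn.execute("INSERT INTO my_table VALUES ('a0', 1, 7.5)")  # hypothetical existing row

# Append the new record; rows already in the table are left untouched.
conn.execute("INSERT INTO my_table VALUES ('a1', 6, 9.4)")

rows = conn.execute("SELECT id, rank, rating FROM my_table ORDER BY id").fetchall()
print(rows)
conn.close()
```

The values are matched to the columns by position, so their order must follow the table's column order.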

Questions # 57:

A data engineer needs to create a table in Databricks using data from their organization's existing SQLite database. They run the following command:

CREATE TABLE jdbc_customer360
USING ____
OPTIONS (
  url "jdbc:sqlite:/customers.db",
  dbtable "customer360"
)

Which line of code fills in the above blank to successfully complete the task?

Options:

autoloader

org.apache.spark.sql.jdbc

sqlite

org.apache.spark.sql.sqlite

Questions # 58:

A data engineer runs a statement every day to copy the previous day’s sales into the table transactions. Each day’s sales are in their own file in the location "/transactions/raw".

Today, the data engineer runs the following command to complete this task:

[Image: COPY INTO command for Question # 58]

After running the command today, the data engineer notices that the number of records in table transactions has not changed.

Which of the following describes why the statement might not have copied any new records into the table?

Options:

The format of the files to be copied was not included with the FORMAT_OPTIONS keyword.

The names of the files to be copied were not included with the FILES keyword.

The previous day’s file has already been copied into the table.

The PARQUET file format does not support COPY INTO.

The COPY INTO statement requires the table to be refreshed to view the copied rows.
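
For background, COPY INTO is documented as an idempotent operation: files in the source location that have already been loaded are skipped on later runs. The sketch below imitates that bookkeeping in plain Python. It is not how Databricks implements it; the function name copy_into, the in-memory "table", and the file names are all hypothetical.

```python
from typing import Dict, List, Set


def copy_into(table: List[dict], loaded_files: Set[str],
              source: Dict[str, List[dict]]) -> int:
    """Append rows from source files that were not loaded before.

    Returns the number of rows added by this run.
    """
    added = 0
    for name in sorted(source):
        if name in loaded_files:
            continue  # already ingested on an earlier run, so skip it
        table.extend(source[name])
        loaded_files.add(name)
        added += len(source[name])
    return added


# Hypothetical source directory with a single daily file.
source = {"2024-01-01.parquet": [{"sale": 10}, {"sale": 20}]}
transactions: List[dict] = []
loaded: Set[str] = set()

first_run = copy_into(transactions, loaded, source)
second_run = copy_into(transactions, loaded, source)  # same file, nothing new
print(first_run, second_run, len(transactions))
```

Running the same load twice leaves the row count unchanged on the second run because the file is already tracked as loaded.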

Questions # 59:

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW

What is the expected behavior when a batch containing records that violate this constraint is processed?

Options:

Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.

Records that violate the expectation cause the job to fail.
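
For reference, the Delta Live Tables documentation describes ON VIOLATION DROP ROW as dropping invalid records before they are written to the target and logging the count of dropped records with the dataset metrics. The plain-Python sketch below imitates that behavior on a small batch. The helper name expect_or_drop, the metric keys, and the sample records are hypothetical and are not the Delta Live Tables API.

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


def expect_or_drop(records: List[Record],
                   predicate: Callable[[Record], bool]
                   ) -> Tuple[List[Record], Dict[str, int]]:
    """Keep records that satisfy the predicate and count the ones dropped."""
    kept = [r for r in records if predicate(r)]
    metrics = {
        "passed_records": len(kept),
        "dropped_records": len(records) - len(kept),
    }
    return kept, metrics


cutoff = datetime(2020, 1, 1)
batch = [
    {"id": 1, "timestamp": datetime(2021, 6, 1)},
    {"id": 2, "timestamp": datetime(2019, 12, 31)},  # violates the constraint
    {"id": 3, "timestamp": datetime(2020, 1, 2)},
]

target, metrics = expect_or_drop(batch, lambda r: r["timestamp"] > cutoff)
print([r["id"] for r in target], metrics)
```

The violating record never reaches the target list, while the metrics dictionary plays the role of the logged counts.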

Questions # 60:

Which of the following is a benefit of the Databricks Lakehouse Platform embracing open source technologies?

Options:

Cloud-specific integrations

Simplified governance

Ability to scale storage

Ability to scale workloads

Avoiding vendor lock-in
