Pre-Summer Special Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: validbest

Pass the Databricks Databricks-Certified-Data-Engineer-Associate Questions and answers with ValidTests

Exam Databricks-Certified-Data-Engineer-Associate All Questions
Exam Databricks-Certified-Data-Engineer-Associate Premium Access

View all detail and faqs for the Databricks-Certified-Data-Engineer-Associate exam

Viewing page 6 out of 6 pages
Viewing questions 51-60 out of questions
Questions # 51:

A data engineer wants to create a new table containing the names of customers that live in France.

They have written the following command:

Question # 51

A senior data engineer mentions that it is organization policy to include a table property indicating that the new table includes personally identifiable information (PII).

Which of the following lines of code fills in the above blank to successfully complete the task?

Options:

A.

There is no way to indicate whether a table contains PII.

B.

"COMMENT PII"

C.

TBLPROPERTIES PII

D.

COMMENT "Contains PII"

E.

PII

Expert Solution
Questions # 52:

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The cade block used by the data engineer is below:

Question # 52

If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?

Options:

A.

trigger("5 seconds")

B.

trigger()

C.

trigger(once="5 seconds")

D.

trigger(processingTime="5 seconds")

E.

trigger(continuous="5 seconds")

Expert Solution
Questions # 53:

A data engineer needs to create a table in Databricks using data from a CSV file at location /path/to/csv.

They run the following command:

Question # 53

Which of the following lines of code fills in the above blank to successfully complete the task?

Options:

A.

None of these lines of code are needed to successfully complete the task

B.

USING CSV

C.

FROM CSV

D.

USING DELTA

E.

FROM "path/to/csv"

Expert Solution
Questions # 54:

A Delta Live Table pipeline includes two datasets defined using streaming live table. Three datasets are defined against Delta Lake table sources using live table.

The table is configured to run in Production mode using the Continuous Pipeline Mode.

What is the expected outcome after clicking Start to update the pipeline assuming previously unprocessed data exists and all definitions are valid?

Options:

A.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

B.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

C.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

D.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.

Expert Solution
Questions # 55:

Which of the following benefits is provided by the array functions from Spark SQL?

Options:

A.

An ability to work with data in a variety of types at once

B.

An ability to work with data within certain partitions and windows

C.

An ability to work with time-related data in specified intervals

D.

An ability to work with complex, nested data ingested from JSON files

E.

An ability to work with an array of tables for procedural automation

Expert Solution
Questions # 56:

A data engineer has been given a new record of data:

id STRING = 'a1'

rank INTEGER = 6

rating FLOAT = 9.4

Which of the following SQL commands can be used to append the new record to an existing Delta table my_table?

Options:

A.

INSERT INTO my_table VALUES ('a1', 6, 9.4)

B.

my_table UNION VALUES ('a1', 6, 9.4)

C.

INSERT VALUES ( 'a1' , 6, 9.4) INTO my_table

D.

UPDATE my_table VALUES ('a1', 6, 9.4)

E.

UPDATE VALUES ('a1', 6, 9.4) my_table

Expert Solution
Questions # 57:

A data engineer needs to create a table in Databricks using data from their organization's existing SQLite database. They run the following command:

CREATE TABLE jdbc_customer360

USING

OPTIONS (

url "jdbc:sqlite:/customers.db", dbtable "customer360"

)

Which line of code fills in the above blank to successfully complete the task?

Options:

A.

autoloader

B.

org.apache.spark.sql.jdbc

C.

sqlite

D.

org.apache.spark.sql.sqlite

Expert Solution
Questions # 58:

A data engineer runs a statement every day to copy the previous day’s sales into the table transactions. Each day’s sales are in their own file in the location "/transactions/raw".

Today, the data engineer runs the following command to complete this task:

Question # 58

After running the command today, the data engineer notices that the number of records in table transactions has not changed.

Which of the following describes why the statement might not have copied any new records into the table?

Options:

A.

The format of the files to be copied were not included with the FORMAT_OPTIONS keyword.

B.

The names of the files to be copied were not included with the FILES keyword.

C.

The previous day’s file has already been copied into the table.

D.

The PARQUET file format does not support COPY INTO.

E.

The COPY INTO statement requires the table to be refreshed to view the copied rows.

Expert Solution
Questions # 59:

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW

What is the expected behavior when a batch of data containing data that violates these constraints is processed?

Options:

A.

Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.

B.

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.

C.

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.

D.

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.

E.

Records that violate the expectation cause the job to fail.

Expert Solution
Questions # 60:

Which of the following is a benefit of the Databricks Lakehouse Platform embracing open source technologies?

Options:

A.

Cloud-specific integrations

B.

Simplified governance

C.

Ability to scale storage

D.

Ability to scale workloads

E.

Avoiding vendor lock-in

Expert Solution
Viewing page 6 out of 6 pages
Viewing questions 51-60 out of questions