Pass the Databricks Databricks-Certified-Data-Engineer-Associate Questions and answers with ValidTests


Questions # 31:

A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to an ELT job. The ELT job has its Databricks SQL query that returns the number of input records containing unexpected NULL values. The data engineer wants their entire team to be notified via a messaging webhook whenever this value reaches 100.

Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of NULL values reaches 100?

Options:

They can set up an Alert with a custom template.

They can set up an Alert with a new email alert destination.

They can set up an Alert with a new webhook alert destination.

They can set up an Alert with one-time notifications.

They can set up an Alert without notifications.

Questions # 32:

In order for Structured Streaming to reliably track the exact progress of the processing so that it can handle any kind of failure by restarting and/or reprocessing, which of the following two approaches is used by Spark to record the offset range of the data being processed in each trigger?

Options:

Checkpointing and Write-ahead Logs

Structured Streaming cannot record the offset range of the data being processed in each trigger.

Replayable Sources and Idempotent Sinks

Write-ahead Logs and Idempotent Sinks

Checkpointing and Idempotent Sinks
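The idea of recording an offset range before a trigger is processed can be illustrated with a toy write-ahead log. This is a minimal sketch in plain Python, not Spark's implementation: the `plan_batch` helper, the file naming, and the JSON layout are all assumptions made up for illustration.

```python
import json
import os
import tempfile


def plan_batch(log_dir, batch_id, proposed_start, proposed_end):
    """Record the offset range for a batch before processing it (hypothetical helper).

    If a range was already logged for this batch (for example, before a crash),
    the logged range is returned so the retry reprocesses exactly the same data.
    """
    path = os.path.join(log_dir, f"offsets-{batch_id}.json")
    if not os.path.exists(path):
        # Write-ahead step: persist the plan before any work happens.
        with open(path, "w") as f:
            json.dump({"start": proposed_start, "end": proposed_end}, f)
    with open(path) as f:
        return json.load(f)


# A temporary directory stands in for a checkpoint location.
log_dir = tempfile.mkdtemp()

first = plan_batch(log_dir, 0, 0, 3)
# Simulated restart: more data has arrived, but batch 0 keeps its logged range.
retry = plan_batch(log_dir, 0, 0, 5)
print(first, retry)
```

Because the range is persisted before processing, a restarted run can tell which data the interrupted trigger was working on.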

Questions # 33:

A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.

Which of the following describes why Auto Loader inferred all of the columns to be of the string type?

Options:

There was a type mismatch between the specific schema and the inferred schema

JSON data is a text-based format

Auto Loader only works with string data

All of the fields had at least one null value

Auto Loader cannot infer the schema of ingested data

Questions # 34:

A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to a data analytics dashboard for a retail use case. The job has a Databricks SQL query that returns the number of store-level records where sales is equal to zero. The data engineer wants their entire team to be notified via a messaging webhook whenever this value is greater than 0.

Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of stores with $0 in sales is greater than zero?

Options:

They can set up an Alert with a custom template.

They can set up an Alert with a new email alert destination.

They can set up an Alert with one-time notifications.

They can set up an Alert with a new webhook alert destination.

They can set up an Alert without notifications.

Questions # 35:

Identify how the count_if function and the count where x is null can be used

Consider a table random_values with the data below.

[Table not preserved: a single column, col1, whose only surviving value is NULL.]

What would be the output of the query below?

select count_if(col1 > 1) as count_a, count(*) as count_b, count(col1) as count_c from random_values

Options:

3 6 5

4 6 5

3 6 6

4 6 6
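The three aggregates treat NULL differently, which a small experiment can show. The sketch below uses Python's built-in sqlite3 module rather than Databricks SQL; SQLite has no count_if, so a SUM over a CASE expression stands in for it. The rows are hypothetical and are not the table from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE random_values (col1 INTEGER)")
# Hypothetical rows chosen for illustration, NOT the question's data.
conn.executemany(
    "INSERT INTO random_values VALUES (?)",
    [(5,), (None,), (1,), (7,)],
)

row = conn.execute(
    """
    SELECT SUM(CASE WHEN col1 > 1 THEN 1 ELSE 0 END) AS count_a,  -- stand-in for count_if
           COUNT(*)    AS count_b,  -- every row, NULL or not
           COUNT(col1) AS count_c   -- only rows where col1 is not NULL
    FROM random_values
    """
).fetchone()
print(row)  # (2, 4, 3)
```

The NULL row fails the `col1 > 1` test and is skipped by `COUNT(col1)`, but it is still counted by `COUNT(*)`.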

Questions # 36:

A new data engineering team has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project.

Which of the following commands can be used to grant full permissions on the table sales to the new data engineering team?

Options:

GRANT ALL PRIVILEGES ON TABLE sales TO team;

GRANT SELECT CREATE MODIFY ON TABLE sales TO team;

GRANT SELECT ON TABLE sales TO team;

GRANT USAGE ON TABLE sales TO team;

GRANT ALL PRIVILEGES ON TABLE team TO sales;

Questions # 37:

A data analyst has a series of queries in a SQL program. The data analyst wants this program to run every day. They only want the final query in the program to run on Sundays. They ask for help from the data engineering team to complete this task.

Which of the following approaches could be used by the data engineering team to complete this task?

Options:

They could submit a feature request with Databricks to add this functionality.

They could wrap the queries using PySpark and use Python’s control flow system to determine when to run the final query.

They could only run the entire program on Sundays.

They could automatically restrict access to the source table in the final query so that it is only accessible on Sundays.

They could redesign the data model to separate the data used in the final query into a new table.
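For the option that wraps the queries in Python, the control flow might look like the sketch below. It assumes a hypothetical `run_query` callable that stands in for whatever executes SQL (for example `spark.sql` in a notebook); the query strings and the `run_program` helper are placeholders, not part of the question.

```python
from datetime import date

SUNDAY = 6  # date.weekday() numbers Monday as 0 and Sunday as 6


def run_program(queries, run_query, today):
    """Run every query daily, but run the last one only on Sundays."""
    *daily_queries, final_query = queries
    executed = []
    for query in daily_queries:
        run_query(query)
        executed.append(query)
    if today.weekday() == SUNDAY:
        run_query(final_query)
        executed.append(final_query)
    return executed


# Placeholder queries; print stands in for the real SQL runner.
queries = ["SELECT 1", "SELECT 2", "SELECT 3"]
sunday_run = run_program(queries, print, date(2024, 1, 7))  # a Sunday
monday_run = run_program(queries, print, date(2024, 1, 8))  # a Monday
```

Passing the date in as an argument, rather than calling `date.today()` inside the function, keeps the Sunday branch easy to test.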

Questions # 38:

Which of the following Structured Streaming queries is performing a hop from a Silver table to a Gold table?

Options:

Questions # 39:

A data engineer needs to create a table in Databricks using data from their organization’s existing SQLite database.

They run the following command:

[Image for Question # 39: the command containing the blank was not preserved.]

Which of the following lines of code fills in the above blank to successfully complete the task?

Options:

org.apache.spark.sql.jdbc

autoloader

DELTA

sqlite

org.apache.spark.sql.sqlite

Questions # 40:

Which of the following describes a scenario in which a data engineer will want to use a single-node cluster?

Options:

When they are working interactively with a small amount of data

When they are running automated reports to be refreshed as quickly as possible

When they are working with SQL within Databricks SQL

When they are concerned about the ability to automatically scale with larger data

When they are manually running reports with a large amount of data

Expert Solution

Answer

Explanation

A data engineer will want to use a single-node cluster when they are working interactively with a small amount of data. A single-node cluster consists of an Apache Spark driver and no Spark workers [1]. It supports Spark jobs and all Spark data sources, including Delta Lake [1]. It is helpful for single-node machine learning workloads that use Spark to load and save data, and for lightweight exploratory data analysis [1]. It runs Spark locally, spawns one executor thread per logical core in the cluster, and saves all log output in the driver log [1]. It can be created by selecting the Single Node button when configuring a cluster [1].

The other options do not suit a single-node cluster. For automated reports that must be refreshed as quickly as possible, a data engineer will want a multi-node cluster that scales up and down automatically with workload demand [2]. For SQL within Databricks SQL, a data engineer will want a SQL Endpoint, which executes SQL queries on a serverless pool or an existing cluster [3]. When the concern is scaling automatically with larger data, a data engineer will want a multi-node cluster that can use the Databricks Lakehouse Platform and the Delta Engine to process large-scale data efficiently and reliably [4]. For manually run reports over a large amount of data, a data engineer will want a multi-node cluster that distributes the computation across multiple workers, with the Spark UI available to monitor performance and troubleshoot issues.
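Besides the Single Node button, a single-node cluster can be described as a cluster specification. The sketch below builds such a payload as a Python dictionary. The field values reflect the Clusters API as I understand it (zero workers, a `singleNode` profile, Spark running in local mode) and should be checked against the current documentation; the cluster name, Spark version, and node type are placeholders.

```python
import json


def single_node_spec(name, spark_version, node_type_id):
    """Build a cluster spec with a driver and no workers (illustrative helper)."""
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": 0,  # driver only, no Spark workers
        "spark_conf": {
            # Assumed settings for single-node mode; verify before use.
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",  # run Spark locally on the driver
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
    }


# Placeholder arguments; substitute values valid for your workspace.
spec = single_node_spec("adhoc-exploration", "<spark-version>", "<node-type-id>")
print(json.dumps(spec, indent=2))
```

The `local[*]` master setting corresponds to the explanation's point that Spark runs locally with one executor thread per logical core.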
