Databricks-Certified-Professional-Data-Engineer Exam Questions and Answers

Viewing page 2 of 6 (questions 11-20)
Question # 11:

Which statement describes integration testing?

Options:

A.

Validates interactions between subsystems of your application

B.

Requires an automated testing framework

C.

Requires manual intervention

D.

Validates an application use case

E.

Validates behavior of individual elements of your application
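
For context, here is a minimal runnable sketch contrasting the two test scopes these options describe; the functions are toy examples, not taken from any exam material:

def parse_record(line: str) -> dict:
    """Toy parser representing a single element of an application."""
    record_id, value = line.split(",")
    return {"id": int(record_id), "value": value}

def load_records(lines: list[str]) -> list[dict]:
    """Toy loader representing a second subsystem that calls the parser."""
    return [parse_record(line) for line in lines]

def test_parse_record_unit():
    # Unit test: validates the behavior of one element in isolation.
    assert parse_record("1,a") == {"id": 1, "value": "a"}

def test_load_records_integration():
    # Integration test: validates the interaction between subsystems
    # (the loader composed with the parser), not a single element.
    assert load_records(["1,a", "2,b"]) == [
        {"id": 1, "value": "a"},
        {"id": 2, "value": "b"},
    ]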

Question # 12:

The data engineering team maintains the following code:

[Code shown as an image in the original question.]
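
As a purely hypothetical sketch (table, column, and function names are assumptions, not taken from the original image), a batch aggregation job fitting this setup might resemble:

from pyspark.sql import functions as F

# Hypothetical sketch only: aggregate a silver table and write the
# result to a gold summary table as a batch job. `spark` is the
# SparkSession provided automatically in Databricks notebooks.
(spark.table("silver_customer_sales")
    .groupBy("customer_id")
    .agg(
        F.countDistinct("order_id").alias("total_orders"),
        F.sum("sale_total").alias("lifetime_value"))
    .write
    .mode("overwrite")
    .saveAsTable("gold_customer_lifetime_sales_summary"))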

Assuming that this code produces logically correct results and the data in the source table has been de-duplicated and validated, which statement describes what will occur when this code is executed?

Options:

A.

The silver_customer_sales table will be overwritten by aggregated values calculated from all records in the gold_customer_lifetime_sales_summary table as a batch job.

B.

A batch job will update the gold_customer_lifetime_sales_summary table, replacing only those rows that have different values than the current version of the table, using customer_id as the primary key.

C.

The gold_customer_lifetime_sales_summary table will be overwritten by aggregated values calculated from all records in the silver_customer_sales table as a batch job.

D.

An incremental job will leverage running information in the state store to update aggregate values in the gold_customer_lifetime_sales_summary table.

E.

An incremental job will detect if new rows have been written to the silver_customer_sales table; if new rows are detected, all aggregates will be recalculated and used to overwrite the gold_customer_lifetime_sales_summary table.

Question # 13:

A CHECK constraint has been successfully added to the Delta table named activity_details using the following logic:

[Constraint logic shown as an image in the original question.]
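
As a hedged illustration (the exact predicate is an assumption, not the exam's constraint), a CHECK constraint of this kind is added to a Delta table with ALTER TABLE:

# Hypothetical sketch: the predicate below is an assumed example of
# valid-coordinate logic.
spark.sql("""
    ALTER TABLE activity_details
    ADD CONSTRAINT valid_coordinates
    CHECK (latitude >= -90 AND latitude <= 90
       AND longitude >= -180 AND longitude <= 180)
""")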

A batch job is attempting to insert new records to the table, including a record where latitude = 45.50 and longitude = 212.67.

Which statement describes the outcome of this batch insert?

Options:

A.

The write will fail when the violating record is reached; any records previously processed will be recorded to the target table.

B.

The write will fail completely because of the constraint violation and no records will be inserted into the target table.

C.

The write will insert all records except those that violate the table constraints; the violating records will be recorded to a quarantine table.

D.

The write will include all records in the target table; any violations will be indicated in the boolean column named valid_coordinates.

E.

The write will insert all records except those that violate the table constraints; the violating records will be reported in a warning log.

Question # 14:

What is a method of installing a Python package scoped at the notebook level to all nodes in the currently active cluster?

Options:

A.

Use %pip install in a notebook cell

B.

Run source env/bin/activate in a notebook setup script

C.

Install libraries from PyPI using the cluster UI

D.

Use %sh install in a notebook cell
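
For reference, notebook-scoped installation in Databricks is done with the %pip magic, which installs the package for the current notebook on every node of the active cluster; the package name below is only an example:

# Notebook cell: %pip scopes the library to this notebook while
# propagating the installation to all nodes in the cluster.
%pip install requests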

Question # 15:

The data science team has created and logged a production model using MLflow. The following code correctly imports and applies the production model to output the predictions as a new DataFrame named preds with the schema "customer_id LONG, predictions DOUBLE, date DATE".

[Model scoring code shown as an image in the original question.]
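
A hedged sketch of how a registered production model is commonly applied with MLflow (the model URI, feature columns, and source table below are assumptions):

import mlflow.pyfunc
from pyspark.sql import functions as F

# Hypothetical sketch: load the production model as a Spark UDF and
# build the preds DataFrame described above. All names are assumed.
predict = mlflow.pyfunc.spark_udf(
    spark, "models:/churn_model/Production", result_type="double")

feature_cols = ["feature_1", "feature_2"]  # assumed feature columns

preds = (spark.table("customer_features")
    .select(
        F.col("customer_id"),
        predict(*[F.col(c) for c in feature_cols]).alias("predictions"),
        F.current_date().alias("date")))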

The data science team would like predictions saved to a Delta Lake table with the ability to compare all predictions across time. Churn predictions will be made at most once per day.

Which code block accomplishes this task while minimizing potential compute costs?

Options:

A.

preds.write.mode("append").saveAsTable("churn_preds")

B.

preds.write.format("delta").save("/preds/churn_preds")

C.

[Option C shown as an image in the original question.]

D.

[Option D shown as an image in the original question.]

E.

[Option E shown as an image in the original question.]

Question # 16:

Which statement characterizes the general programming model used by Spark Structured Streaming?

Options:

A.

Structured Streaming leverages the parallel processing of GPUs to achieve highly parallel data throughput.

B.

Structured Streaming is implemented as a messaging bus and is derived from Apache Kafka.

C.

Structured Streaming uses specialized hardware and I/O streams to achieve sub-second latency for data transfer.

D.

Structured Streaming models new data arriving in a data stream as new rows appended to an unbounded table.

E.

Structured Streaming relies on a distributed network of nodes that hold incremental state values for cached stages.
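
For context, a minimal sketch of this programming model, using the built-in rate test source:

# Structured Streaming treats a stream as an unbounded input table:
# each arriving record is a new row appended to that table, and queries
# over it are updated incrementally as rows arrive.
stream_df = (spark.readStream
    .format("rate")  # test source that emits timestamped rows
    .load())

counts = stream_df.groupBy().count()  # a query over the unbounded table

query = (counts.writeStream
    .outputMode("complete")
    .format("console")
    .start())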

Question # 17:

A team of data engineers is adding tables to a DLT pipeline that contain repetitive expectations for many of the same data quality checks.

One member of the team suggests reusing these data quality rules across all tables defined for this pipeline.

What approach would allow them to do this?

Options:

A.

Maintain data quality rules in a Delta table outside of this pipeline’s target schema, providing the schema name as a pipeline parameter.

B.

Use global Python variables to make expectations visible across DLT notebooks included in the same pipeline.

C.

Add data quality constraints to tables in this pipeline using an external job with access to pipeline configuration files.

D.

Maintain data quality rules in a separate Databricks notebook that each DLT notebook or file can import.
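
For context, a hedged sketch of how a shared rules dictionary can drive expectations in DLT via dlt.expect_all (table and rule names are assumptions; in practice the dictionary could be loaded from a Delta table maintained outside the pipeline):

import dlt

# Hypothetical shared data quality rules, reusable across tables.
rules = {
    "valid_customer_id": "customer_id IS NOT NULL",
    "valid_amount": "amount >= 0",
}

@dlt.table
@dlt.expect_all(rules)  # apply the same expectations to this table
def silver_orders():
    return spark.read.table("bronze_orders")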

Question # 18:

The view named updates represents an incremental batch of all newly ingested data to be inserted into or updated in the customers table.

The following logic is used to process these records.

[Merge logic shown as an image in the original question.]
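
As a hedged illustration (column names and clauses are assumptions), incremental upserts of this kind are usually written with MERGE INTO; how matched rows are handled is what distinguishes the SCD types named in the options:

# Hypothetical sketch of upsert logic against the customers table.
# A Type 1 pattern updates matched rows in place (no history kept);
# a Type 2 pattern instead marks the old row as no longer current
# and inserts the new version alongside it.
spark.sql("""
    MERGE INTO customers c
    USING updates u
    ON c.customer_id = u.customer_id
    WHEN MATCHED THEN
      UPDATE SET *
    WHEN NOT MATCHED THEN
      INSERT *
""")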

Which statement describes this implementation?

Options:

A.

The customers table is implemented as a Type 3 table; old values are maintained as a new column alongside the current value.

B.

The customers table is implemented as a Type 2 table; old values are maintained but marked as no longer current and new values are inserted.

C.

The customers table is implemented as a Type 0 table; all writes are append only with no changes to existing values.

D.

The customers table is implemented as a Type 1 table; old values are overwritten by new values and no history is maintained.

E.

The customers table is implemented as a Type 2 table; old values are overwritten and new customers are appended.

Question # 19:

A developer has successfully configured credentials for Databricks Repos and cloned a remote Git repository. They do not have privileges to make changes to the main branch, which is the only branch currently visible in their workspace.

Which approach allows this user to share their code updates without the risk of overwriting the work of their teammates?

Options:

A.

Use Repos to merge all differences and make a pull request back to the remote repository.

B.

Use Repos to pull changes from the remote Git repository; commit and push changes to a branch that appeared as changes were pulled.

C.

Use Repos to create a new branch, commit all changes, and push changes to the remote Git repository.

D.

Use Repos to create a fork of the remote repository, commit all changes, and make a pull request on the source repository.

Question # 20:

Which is a key benefit of an end-to-end test?

Options:

A.

It closely simulates real-world usage of your application.

B.

It pinpoints errors in the building blocks of your application.

C.

It provides testing coverage for all code paths and branches.

D.

It makes it easier to automate your test suite.
