Pass the Databricks Certification Databricks-Certified-Professional-Data-Engineer Questions and answers with ValidTests

Viewing page 4 out of 7 pages

Viewing questions 31-40 out of questions

Questions # 31:

The data governance team is reviewing code used for deleting records for compliance with GDPR. The following logic has been implemented to propagate delete requests from the user_lookup table to the user_aggregates table.

Question # 31

Assuming that user_id is a unique identifying key and that all users that have requested deletion have been removed from the user_lookup table, which statement describes whether successfully executing the above logic guarantees that the records to be deleted from the user_aggregates table are no longer accessible, and why?

Options:

No: files containing deleted records may still be accessible with time travel until a VACUUM command is used to remove invalidated data files.

Yes: Delta Lake ACID guarantees provide assurance that the DELETE command succeeded fully and permanently purged these records.

No: the change data feed only tracks inserts and updates, not deleted records.

No: the Delta Lake DELETE command only provides ACID guarantees when combined with the MERGE INTO command.
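The relationship between a delete, time travel, and VACUUM that the first option refers to can be illustrated with a toy model. This is only a sketch in plain Python, not Delta Lake code: the `storage` dict, the `log` list, and the functions `delete_user`, `read`, and `vacuum` are all hypothetical stand-ins, and the retention period that a real VACUUM applies is ignored.

```python
# Toy model only: a dict stands in for storage and a list for the transaction log.
storage = {"part-0": [{"user_id": 1}, {"user_id": 2}]}
log = [{"part-0"}]  # version 0 references the file part-0


def delete_user(user_id):
    """Write a new version without the user; older files stay in storage."""
    new_version = set()
    for name in sorted(log[-1]):
        kept = [row for row in storage[name] if row["user_id"] != user_id]
        if len(kept) == len(storage[name]):
            new_version.add(name)  # file untouched, keep referencing it
        else:
            new_name = f"part-{len(storage)}"
            storage[new_name] = kept  # rewritten copy without the user
            new_version.add(new_name)
    log.append(new_version)


def read(version):
    """Read the rows visible at a given version (a stand-in for time travel)."""
    return [row for name in sorted(log[version]) for row in storage[name]]


def vacuum():
    """Drop files the latest version no longer references (retention ignored)."""
    for name in list(storage):
        if name not in log[-1]:
            del storage[name]


delete_user(2)
latest_rows = read(-1)
old_rows_before_vacuum = read(0)
vacuum()
files_after_vacuum = sorted(storage)
```

In this model the deleted row disappears from the latest version immediately, but version 0 can still be read until `vacuum()` removes the file it points to.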

Questions # 32:

A data engineer wants to refactor the following DLT code, which includes multiple table definitions with very similar code:

Question # 32

In an attempt to programmatically create these tables using a parameterized table definition, the data engineer writes the following code.

Question # 32

The pipeline runs an update with this refactored code, but generates a different DAG showing incorrect configuration values for tables.

How can the data engineer fix this?

Options:

Convert the list of configuration values to a dictionary of table settings, using table names as keys.

Convert the list of configuration values to a dictionary of table settings, using different input to the for loop.

Load the configuration values for these tables from a separate file, located at a path provided by a pipeline parameter.

Wrap the loop inside another table definition, using generalized names and properties to replace with those from the inner table.
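The original code for this question is not reproduced on this page, so the following is only a general illustration of one way definitions generated in a Python loop can end up with unexpected values: functions defined in a loop look up loop variables when they are called, not when they are defined. The `table` decorator, `registry`, and the table names below are hypothetical stand-ins and are not the DLT API.

```python
# Hypothetical stand-in for a table-definition decorator; this is NOT the dlt API.
registry = {}


def table(name):
    def register(fn):
        registry[name] = fn
        return fn
    return register


configs = [("orders_us", "US"), ("orders_eu", "EU")]

# Late binding: each function reads `region` when it is called, after the loop ended.
for name, region in configs:
    @table(name)
    def late():
        return f"region = {region}"

late_results = {n: fn() for n, fn in registry.items()}

registry.clear()


# Passing the values through a function call binds them per definition.
def define_table(name, region):
    @table(name)
    def bound():
        return f"region = {region}"


for name, region in configs:
    define_table(name, region)

bound_results = {n: fn() for n, fn in registry.items()}
```

In the first loop both registered functions report the last region, while the helper function gives each definition its own value.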

Questions # 33:

To reduce storage and compute costs, the data engineering team has been tasked with curating a series of aggregate tables leveraged by business intelligence dashboards, customer-facing applications, production machine learning models, and ad hoc analytical queries.

The data engineering team has been made aware of new requirements from a customer-facing application, which is the only downstream workload they manage entirely. As a result, an aggregate table used by numerous teams across the organization will need to have a number of fields renamed, and additional fields will also be added.

Which of the solutions addresses the situation while minimally interrupting other teams in the organization without increasing the number of tables that need to be managed?

Options:

Send all users notice that the schema for the table will be changing; include in the communication the logic necessary to revert the new table schema to match historic queries.

Configure a new table with all the requisite fields and new names and use this as the source for the customer-facing application; create a view that maintains the original data schema and table name by aliasing select fields from the new table.

Create a new table with the required schema and new fields and use Delta Lake's deep clone functionality to sync up changes committed to one table to the corresponding table.

Replace the current table definition with a logical view defined with the query logic currently writing the aggregate table; create a new table to power the customer-facing application.

Add a table comment warning all users that the table schema and field names will be changing on a given date; overwrite the table in place to the specifications of the customer-facing application.
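The view-based option above (a new table plus a view that keeps the original schema by aliasing fields) can be sketched with Python's built-in `sqlite3` module. SQLite is used here only as a stand-in so the example runs anywhere; it is not Delta Lake, and the table name `store_sales_v2`, the view name `store_sales`, and all column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# New table with a renamed field (total_revenue) and an added field (region).
conn.execute(
    "CREATE TABLE store_sales_v2 (store_id INTEGER, total_revenue REAL, region TEXT)"
)
conn.execute("INSERT INTO store_sales_v2 VALUES (1, 120.5, 'west')")

# View exposing the original field names by aliasing columns of the new table.
conn.execute(
    """
    CREATE VIEW store_sales AS
    SELECT store_id, total_revenue AS sales
    FROM store_sales_v2
    """
)

cursor = conn.execute("SELECT * FROM store_sales")
legacy_columns = [column[0] for column in cursor.description]
legacy_rows = cursor.fetchall()
```

Queries written against the old field names keep working through the view, while only one physical table holds the data.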

Questions # 34:

A Delta Lake table was created with the below query:

Question # 34

Consider the following query:

DROP TABLE prod.sales_by_store

If this statement is executed by a workspace admin, which result will occur?

Options:

Nothing will occur until a COMMIT command is executed.

The table will be removed from the catalog but the data will remain in storage.

The table will be removed from the catalog and the data will be deleted.

An error will occur because Delta Lake prevents the deletion of production data.

Data will be marked as deleted but still recoverable with Time Travel.

Questions # 35:

A Spark job is taking longer than expected. Using the Spark UI, a data engineer notes that the Min, Median, and Max Durations for tasks in a particular stage show the minimum and median time to complete a task as roughly the same, but the max duration for a task as roughly 100 times as long as the minimum.

Which situation is causing increased duration of the overall job?

Options:

Task queueing resulting from improper thread pool assignment.

Spill resulting from attached volume storage being too small.

Network latency due to some cluster nodes being in different regions from the source data.

Skew caused by more data being assigned to a subset of Spark partitions.

Credential validation errors while pulling data from an external system.
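The duration pattern described in the question can be made concrete with a small calculation. This is a sketch with made-up numbers: the `task_durations_ms` list is hypothetical and was not taken from a real Spark UI.

```python
from statistics import median

# Hypothetical task durations in milliseconds for one stage.
task_durations_ms = [2000, 2100, 1900, 2000, 2200, 2000, 2100, 200000]

# The same three statistics the question reads off the stage summary.
summary = {
    "min": min(task_durations_ms),
    "median": median(task_durations_ms),
    "max": max(task_durations_ms),
}

# A large ratio means a single task dominates the stage's wall-clock time.
max_to_min = summary["max"] / summary["min"]
```

Because a stage finishes only when its slowest task finishes, one long-running task sets the stage duration even when the typical task is fast.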

Questions # 36:

The business intelligence team has a dashboard configured to track various summary metrics for retail stores. This includes total sales for the previous day alongside totals and averages for a variety of time periods. The fields required to populate this dashboard have the following schema:

Question # 36

For demand forecasting, the Lakehouse contains a validated table of all itemized sales updated incrementally in near real-time. This table, named products_per_order, includes the following fields:

Question # 36

Because reporting on long-term sales trends is less volatile, analysts using the new dashboard only require data to be refreshed once daily. Because the dashboard will be queried interactively by many users throughout a normal business day, it should return results quickly and reduce total compute associated with each materialization.

Which solution meets the expectations of the end users while controlling and limiting possible costs?

Options:

Use the Delta Cache to persist the products_per_order table in memory to quickly update the dashboard with each query.

Populate the dashboard by configuring a nightly batch job to save the required values to quickly update the dashboard with each query.

Use Structured Streaming to configure a live dashboard against the products_per_order table within a Databricks notebook.

Define a view against the products_per_order table and define the dashboard against this view.

Questions # 37:

What is a method of installing a Python package scoped at the notebook level to all nodes in the currently active cluster?

Options:

Use %pip install in a notebook cell

Run source env/bin/activate in a notebook setup script

Install libraries from PyPI using the cluster UI

Use %sh install in a notebook cell

Questions # 38:

The DevOps team has configured a production workload as a collection of notebooks scheduled to run daily using the Jobs UI. A new data engineering hire is onboarding to the team and has requested access to one of these notebooks to review the production logic.

What are the maximum notebook permissions that can be granted to the user without allowing accidental changes to production code or data?

Options:

Can manage

Can edit

Can run

Can read

Questions # 39:

Options:

No: files containing deleted records may still be accessible with time travel until a VACUUM command is used to remove invalidated data files.

Yes: Delta Lake ACID guarantees provide assurance that the DELETE command succeeded fully and permanently purged these records.

No: the change data feed only tracks inserts and updates, not deleted records.

No: the Delta Lake DELETE command only provides ACID guarantees when combined with the MERGE INTO command.

Questions # 40:

Which of the solutions addresses the situation while minimally interrupting other teams in the organization without increasing the number of tables that need to be managed?

Options:

Send all users notice that the schema for the table will be changing; include in the communication the logic necessary to revert the new table schema to match historic queries.

Create a new table with the required schema and new fields and use Delta Lake's deep clone functionality to sync up changes committed to one table to the corresponding table.

Replace the current table definition with a logical view defined with the query logic currently writing the aggregate table; create a new table to power the customer-facing application.

Add a table comment warning all users that the table schema and field names will be changing on a given date; overwrite the table in place to the specifications of the customer-facing application.
