
Databricks Certified Data Engineer Professional (Databricks-Certified-Professional-Data-Engineer) practice questions and answers from ValidTests

Page 1 of 4 (questions 1-10)
Question # 1:

A Delta Lake table was created with the below query:

[Image: the original CREATE TABLE statement, not reproduced here]

Realizing that the original query had a typographical error, the below code was executed:

ALTER TABLE prod.sales_by_stor RENAME TO prod.sales_by_store

Which result will occur after running the second command?

Options:

A.

The table reference in the metastore is updated and no data is changed.

B.

The table name change is recorded in the Delta transaction log.

C.

All related files and metadata are dropped and recreated in a single ACID transaction.

D.

The table reference in the metastore is updated and all data files are moved.

E.

A new Delta transaction log is created for the renamed table.

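As a quick way to explore this rename behavior, the two statements can be reproduced in a Databricks notebook, where spark is the predefined SparkSession. This is a minimal sketch: the column list is a placeholder, since the original CREATE TABLE statement is only shown as an image.

# Stand-in for the original table created under the misspelled name.
# The columns are illustrative; the original DDL is not reproduced here.
spark.sql("CREATE TABLE IF NOT EXISTS prod.sales_by_stor (store_id INT, total DOUBLE) USING DELTA")

# The corrective rename from the question.
spark.sql("ALTER TABLE prod.sales_by_stor RENAME TO prod.sales_by_store")

# Inspect the outcome: table location and metadata, plus what (if anything)
# was recorded in the Delta transaction log.
spark.sql("DESCRIBE EXTENDED prod.sales_by_store").show(truncate=False)
spark.sql("DESCRIBE HISTORY prod.sales_by_store").show(truncate=False)
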
Question # 2:

The data governance team has instituted a requirement that all tables containing Personally Identifiable Information (PII) must be clearly annotated. This includes adding column comments, table comments, and setting the custom table property "contains_pii" = true.

The following SQL DDL statement is executed to create a new table:

[Image: the CREATE TABLE DDL statement, not reproduced here]

Which command allows manual confirmation that these three requirements have been met?

Options:

A.

DESCRIBE EXTENDED dev.pii_test

B.

DESCRIBE DETAIL dev.pii_test

C.

SHOW TBLPROPERTIES dev.pii_test

D.

DESCRIBE HISTORY dev.pii_test

E.

SHOW TABLES dev

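For hands-on practice with these annotations, the sketch below creates a table carrying column comments, a table comment, and the contains_pii property, then inspects it. The table name dev.pii_test and its columns are assumptions for illustration only, not the DDL from the question.

# Create a table that satisfies all three governance requirements.
spark.sql("""
CREATE TABLE IF NOT EXISTS dev.pii_test (
  id BIGINT COMMENT 'Surrogate key',
  name STRING COMMENT 'PII: customer name'
)
USING DELTA
COMMENT 'Contains PII'
TBLPROPERTIES ('contains_pii' = 'true')
""")

# Inspect column comments, the table comment, and table properties.
spark.sql("DESCRIBE EXTENDED dev.pii_test").show(truncate=False)
spark.sql("SHOW TBLPROPERTIES dev.pii_test").show(truncate=False)
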
Question # 3:

The data architect has mandated that all tables in the Lakehouse should be configured as external Delta Lake tables.

Which approach will ensure that this requirement is met?

Options:

A.

Whenever a database is being created, make sure that the location keyword is used.

B.

When configuring an external data warehouse for all table storage, leverage Databricks for all ELT.

C.

Whenever a table is being created, make sure that the location keyword is used.

D.

When tables are created, make sure that the external keyword is used in the create table statement.

E.

When the workspace is being configured, make sure that external cloud object storage has been mounted.

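To see what the LOCATION keyword does in practice, the sketch below creates a table against an explicit storage path and then checks how the metastore classifies it. The table name and path are placeholders.

# Creating a table with an explicit LOCATION makes it external (unmanaged):
# the data lives at the given path rather than in the default managed location.
spark.sql("""
CREATE TABLE IF NOT EXISTS sales_external (
  order_id BIGINT,
  amount DOUBLE
)
USING DELTA
LOCATION '/mnt/example/sales_external'
""")

# DESCRIBE EXTENDED reports the table type (EXTERNAL vs. MANAGED) and location.
spark.sql("DESCRIBE EXTENDED sales_external").show(truncate=False)
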
Question # 4:

Which statement regarding Spark configuration on the Databricks platform is true?

Options:

A.

Spark configuration properties set for an interactive cluster with the Clusters UI will impact all notebooks attached to that cluster.

B.

When the same Spark configuration property is set for an interactive cluster and a notebook attached to the same interactive cluster.

C.

Spark configuration set within a notebook will affect all SparkSessions attached to the same interactive cluster.

D.

The Databricks REST API can be used to modify the Spark configuration properties for an interactive cluster without interrupting jobs.

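One way to explore how these settings interact is to read and set a Spark SQL property from a notebook and compare it with what was configured in the cluster UI. spark.sql.shuffle.partitions is used here purely as an example property.

# Value of the property as seen by this notebook's SparkSession
# (it reflects any cluster-level setting unless overridden in this session).
print(spark.conf.get("spark.sql.shuffle.partitions"))

# Override the property for this SparkSession only.
spark.conf.set("spark.sql.shuffle.partitions", "64")
print(spark.conf.get("spark.sql.shuffle.partitions"))
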
Question # 5:

A Delta Lake table representing metadata about content posts from users has the following schema:

user_id LONG, post_text STRING, post_id STRING, longitude FLOAT, latitude FLOAT, post_time TIMESTAMP, date DATE

This table is partitioned by the date column. A query is run with the following filter:

longitude < 20 & longitude > -20

Which statement describes how data will be filtered?

Options:

A.

Statistics in the Delta Log will be used to identify partitions that might include files in the filtered range.

B.

No file skipping will occur because the optimizer does not know the relationship between the partition column and the longitude.

C.

The Delta Engine will use row-level statistics in the transaction log to identify the files that meet the filter criteria.

D.

Statistics in the Delta Log will be used to identify data files that might include records in the filtered range.

E.

The Delta Engine will scan the parquet file footers to identify each row that meets the filter criteria.

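To experiment with this scenario, the sketch below creates a table with the stated schema and partitioning, then runs the filter and prints the physical plan. The table name is a placeholder.

# Delta table matching the schema in the question, partitioned by date.
spark.sql("""
CREATE TABLE IF NOT EXISTS user_posts (
  user_id BIGINT,    -- LONG in the question's schema; BIGINT is the equivalent SQL type
  post_text STRING,
  post_id STRING,
  longitude FLOAT,
  latitude FLOAT,
  post_time TIMESTAMP,
  date DATE
)
USING DELTA
PARTITIONED BY (date)
""")

# The filter from the question. longitude is not the partition column, so
# partition pruning cannot apply to it; inspect the plan for pushed filters.
df = spark.table("user_posts").filter("longitude < 20 AND longitude > -20")
df.explain()
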
Question # 6:

Each configuration below is identical to the extent that each cluster has 400 GB of RAM in total, 160 cores in total, and only one Executor per VM.

Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?

Options:

A.

• Total VMs: 1

• 400 GB per Executor

• 160 Cores / Executor

B.

• Total VMs: 8

• 50 GB per Executor

• 20 Cores / Executor

C.

• Total VMs: 4

• 100 GB per Executor

• 40 Cores / Executor

D.

• Total VMs: 2

• 200 GB per Executor

• 80 Cores / Executor

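For context on what a wide transformation implies, the short sketch below runs a groupBy aggregation on a synthetic dataset; the shuffle it triggers is what makes the executor count, cores per executor, and memory per executor matter here.

from pyspark.sql import functions as F

# A wide transformation: groupBy/agg requires a shuffle (an Exchange in the plan).
df = spark.range(0, 10_000_000).withColumn("key", F.col("id") % 1000)
agg = df.groupBy("key").agg(F.count("*").alias("cnt"))
agg.explain()  # look for the Exchange (shuffle) stage

# Force execution without writing output (noop sink, Spark 3.0+).
agg.write.format("noop").mode("overwrite").save()
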
Question # 7:

In order to facilitate near real-time workloads, a data engineer is creating a helper function to leverage the schema detection and evolution functionality of Databricks Auto Loader. The desired function will automatically detect the schema of the source directory, incrementally process JSON files as they arrive in a source directory, and automatically evolve the schema of the table when new fields are detected.

The function is displayed below with a blank:

[Image: the helper function definition with a blank, not reproduced here]

Which response correctly fills in the blank to meet the specified requirements?

[Image: candidate code snippets for Options A-E, not reproduced here]

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E

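Because both the function body and the answer options are shown only as images, the sketch below is not any of the lettered options; it is simply one common shape of an Auto Loader ingest with schema inference and evolution, with placeholder paths and table name, assuming a Databricks notebook where spark is available.

# Auto Loader (cloudFiles) reading JSON incrementally, inferring the schema at
# the schema location and evolving the target Delta table as new fields appear.
def ingest_json(source_path, checkpoint_path, table_name):
    return (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", checkpoint_path)
        .load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        .option("mergeSchema", "true")
        .table(table_name))
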
Question # 8:

In order to prevent accidental commits to production data, a senior data engineer has instituted a policy that all development work will reference clones of Delta Lake tables. After testing both deep and shallow clone, development tables are created using shallow clone.

A few weeks after initial table creation, the cloned versions of several tables implemented as Type 1 Slowly Changing Dimension (SCD) stop working. The transaction logs for the source tables show that vacuum was run the day before.

Why are the cloned tables no longer working?

Options:

A.

The data files compacted by vacuum are not tracked by the cloned metadata; running refresh on the cloned table will pull in recent changes.

B.

Because Type 1 changes overwrite existing records, Delta Lake cannot guarantee data consistency for cloned tables.

C.

The metadata created by the clone operation is referencing data files that were purged as invalid by the vacuum command.

D.

Running vacuum automatically invalidates any shallow clones of a table; deep clone should always be used when a cloned table will be repeatedly queried.

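To reproduce this situation end to end, the sketch below builds a shallow clone and a deep clone of a source table and then vacuums the source; all table names are placeholders.

# Shallow clone copies only Delta metadata and keeps referencing the source's
# data files; deep clone also copies the data files themselves.
spark.sql("CREATE TABLE IF NOT EXISTS dev.customers_shallow SHALLOW CLONE prod.customers")
spark.sql("CREATE TABLE IF NOT EXISTS dev.customers_deep DEEP CLONE prod.customers")

# VACUUM removes data files that the source table no longer references,
# subject to the retention window.
spark.sql("VACUUM prod.customers")
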
Question # 9:

A new data engineer notices that a critical field was omitted from an application that writes its Kafka source to Delta Lake. The field was present in the Kafka source, yet it is also missing from the data written to dependent, long-term storage. The retention threshold on the Kafka service is seven days. The pipeline has been in production for three months.

Which describes how Delta Lake can help to avoid data loss of this nature in the future?

Options:

A.

The Delta log and Structured Streaming checkpoints record the full history of the Kafka producer.

B.

Delta Lake schema evolution can retroactively calculate the correct value for newly added fields, as long as the data was in the original source.

C.

Delta Lake automatically checks that all fields present in the source data are included in the ingestion layer.

D.

Data can never be permanently dropped or deleted from Delta Lake, so data loss is not possible under any circumstance.

E.

Ingesting all raw data and metadata from Kafka to a bronze Delta table creates a permanent, replayable history of the data state.

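As an illustration of the bronze-layer approach referenced in the options, the sketch below lands the complete Kafka record (value plus metadata) in a Delta table so it can be replayed later; broker, topic, checkpoint path, and table name are placeholders.

# Persist the full Kafka payload (key, value, topic, partition, offset,
# timestamp) to a bronze Delta table for a permanent, replayable history.
raw = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
    .option("subscribe", "events")                     # placeholder topic
    .load())

(raw.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/bronze_events")  # placeholder
    .table("bronze_events"))
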
Question # 10:

Which of the following is true of Delta Lake and the Lakehouse?

Options:

A.

Because Parquet compresses data row by row, strings will only be compressed when a character is repeated multiple times.

B.

Delta Lake automatically collects statistics on the first 32 columns of each table, which are leveraged in data skipping based on query filters.

C.

Views in the Lakehouse maintain a valid cache of the most recent versions of source tables at all times.

D.

Primary and foreign key constraints can be leveraged to ensure duplicate values are never entered into a dimension table.

E.

Z-order can only be applied to numeric values stored in Delta Lake tables.

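Two of the features these options touch on can be exercised directly: the number of leading columns Delta collects file-level statistics for, and Z-ordering during OPTIMIZE. The sketch below is illustrative only; the table and column names are placeholders.

# delta.dataSkippingNumIndexedCols controls how many leading columns get
# per-file statistics (32 by default); OPTIMIZE ... ZORDER BY co-locates data
# on the chosen columns to improve file skipping for filters on them.
spark.sql("ALTER TABLE sales SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '32')")
spark.sql("OPTIMIZE sales ZORDER BY (customer_id, order_date)")
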