Pass the Databricks Certification Databricks-Certified-Professional-Data-Engineer Questions and answers with ValidTests


Questions # 51:

A platform engineer is creating catalogs and schemas for the development team to use.

The engineer has created an initial catalog, catalog_A, and initial schema, schema_A. The engineer has also granted USE CATALOG, USE SCHEMA, and CREATE TABLE to the development team so that the team can begin populating the schema with new tables.

Despite being owner of the catalog and schema, the engineer noticed that they do not have access to the underlying tables in schema_A.

What explains the engineer's lack of access to the underlying tables?

Options:

The platform engineer needs to execute a REFRESH statement as the table permissions did not automatically update for owners.

Users granted with USE CATALOG can modify the owner's permissions to downstream tables.

The owner of the schema does not automatically have permission to tables within the schema, but can grant them to themselves at any point.

Permissions explicitly given by the table creator are the only way the platform engineer could access the underlying tables in their schema.

Answer

Explanation

In Databricks, catalogs, schemas (or databases), and tables are managed through the Unity Catalog or Hive Metastore, depending on the environment. Permissions and ownership within these structures are governed by access control lists (ACLs).

Catalog and Schema Ownership: When a platform engineer creates a catalog (such as catalog_A) and schema (such as schema_A), they automatically become the owner of those entities. This ownership gives them control over granting permissions for those entities (i.e., granting the USE CATALOG and USE SCHEMA privileges to others). However, ownership of the catalog or schema does not automatically extend to ownership or permission of individual tables within that schema.

Table Permissions: For tables within a schema, the permission model is more granular. The table creator (i.e., whoever creates the table) is automatically assigned as the owner of that table. In this case, the platform engineer owns the schema but does not automatically inherit permissions to any table created within the schema unless explicitly granted by the table's owner or unless they grant permissions to themselves.

Why the Engineer Lacks Access: The platform engineer notices that they do not have access to the underlying tables in schema_A despite being the owner of the schema. This occurs because the schema's ownership does not cascade to the tables. The engineer must either:

Grant permissions to themselves for the tables in schema_A, or

Be granted permissions by whoever created the tables within the schema.

Resolution: As the owner of the schema, the platform engineer can grant themselves the required privileges (such as SELECT or MODIFY) on the tables in the schema. This explains why the owner of a schema may not automatically have access to the tables and must take explicit steps to acquire those permissions.
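The resolution above can be sketched in code. This is a minimal illustration, not a Databricks API: the helper `build_grant_statements`, the table names, and the principal are all hypothetical, and it assumes the tables live under catalog_A.schema_A. It only builds the GRANT statements as strings; running them requires a Databricks session.

```python
from typing import Iterable, List


def build_grant_statements(
    principal: str,
    tables: Iterable[str],
    privileges: Iterable[str] = ("SELECT",),
) -> List[str]:
    """Return one GRANT statement per table for the given principal."""
    privilege_list = ", ".join(privileges)
    # Backticks let the principal contain characters such as '@' or '.'.
    return [
        f"GRANT {privilege_list} ON TABLE {table} TO `{principal}`"
        for table in tables
    ]


# Hypothetical table names and principal, used only for illustration.
statements = build_grant_statements(
    principal="platform.engineer@example.com",
    tables=["catalog_A.schema_A.orders", "catalog_A.schema_A.customers"],
    privileges=["SELECT", "MODIFY"],
)

for statement in statements:
    print(statement)
    # In a Databricks notebook, each statement could be run with:
    # spark.sql(statement)
```

Generating the statements separately from executing them makes it possible to review exactly which privileges will be granted before anything changes.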

References

Databricks Unity Catalog Documentation: Manage Permissions

[Databricks Permissions and Ownership](https://docs.databricks.com/security/access-control/workspace-acl.html#permissions)

Questions # 52:

Incorporating unit tests into a PySpark application requires upfront attention to the design of your jobs, or a potentially significant refactoring of existing code.

Which statement describes a main benefit that offsets this additional effort?

Options:

Improves the quality of your data

Validates a complete use case of your application

Troubleshooting is easier since all steps are isolated and tested individually

Yields faster deployment and execution times

Ensures that all steps interact correctly to achieve the desired end result

Questions # 53:

Which statement describes the default execution mode for Databricks Auto Loader?

Options:

New files are identified by listing the input directory; new files are incrementally and idempotently loaded into the target Delta Lake table.

Cloud vendor-specific queue storage and notification services are configured to track newly arriving files; new files are incrementally and idempotently loaded into the target Delta Lake table.

Webhooks trigger a Databricks job to run anytime new data arrives in a source directory; new data is automatically merged into target tables using rules inferred from the data.

New files are identified by listing the input directory; the target table is materialized by directly querying all valid files in the source directory.

Questions # 54:

The data engineer is using Spark's MEMORY_ONLY storage level.

Which indicators should the data engineer look for in the Spark UI's Storage tab to signal that a cached table is not performing optimally?

Options:

Size on Disk is > 0

The number of Cached Partitions > the number of Spark Partitions

The RDD Block Name includes the '' annotation, signaling a failure to cache

On Heap Memory Usage is within 75% of Off Heap Memory Usage

Questions # 55:

A nightly job ingests data into a Delta Lake table using the following code:

The next step in the pipeline requires a function that returns an object that can be used to manipulate new records that have not yet been processed to the next table in the pipeline.

Which code snippet completes this function definition?

def new_records():

Options:

return spark.readStream.table("bronze")

return spark.readStream.load("bronze")

return spark.read.option("readChangeFeed", "true").table("bronze")

Questions # 56:

Which configuration parameter directly affects the size of a Spark partition upon ingestion of data into Spark?

Options:

spark.sql.files.maxPartitionBytes

spark.sql.autoBroadcastJoinThreshold

spark.sql.files.openCostInBytes

spark.sql.adaptive.coalescePartitions.minPartitionNum

spark.sql.adaptive.advisoryPartitionSizeInBytes

Questions # 57:

A junior member of the data engineering team is exploring the language interoperability of Databricks notebooks. The intended outcome of the below code is to register a view of all sales that occurred in countries on the continent of Africa that appear in the geo_lookup table.

Before executing the code, running SHOW TABLES on the current database indicates the database contains only two tables: geo_lookup and sales.

Which statement correctly describes the outcome of executing these command cells in order in an interactive notebook?

Options:

Both commands will succeed. Executing SHOW TABLES will show that countries_af and sales_af have been registered as views.

Cmd 1 will succeed. Cmd 2 will search all accessible databases for a table or view named countries_af; if this entity exists, Cmd 2 will succeed.

Cmd 1 will succeed and Cmd 2 will fail. countries_af will be a Python variable representing a PySpark DataFrame.

Both commands will fail. No new variables, tables, or views will be created.

Cmd 1 will succeed and Cmd 2 will fail. countries_af will be a Python variable containing a list of strings.

Questions # 58:

A developer has successfully configured credentials for Databricks Repos and cloned a remote Git repository. They do not have privileges to make changes to the main branch, which is the only branch currently visible in their workspace.

Use Repos to pull changes from the remote Git repository; commit and push changes to a branch that appeared as the changes were pulled.

Options:

Use Repos to merge all differences and make a pull request back to the remote repository.

Use Repos to create a new branch, commit all changes, and push changes to the remote Git repository.

Use Repos to create a fork of the remote repository, commit all changes, and make a pull request on the source repository.

Questions # 59:

A user wants to use DLT expectations to validate that a derived table report contains all records from the source, included in the table validation_copy.

The user attempts and fails to accomplish this by adding an expectation to the report table definition.

Which approach would allow using DLT expectations to validate all expected records are present in this table?

Options:

Define a SQL UDF that performs a left outer join on two tables, and check if this returns null values for report key values in a DLT expectation for the report table.

Define a function that performs a left outer join on validation_copy and report, and check against the result in a DLT expectation for the report table.

Define a temporary table that performs a left outer join on validation_copy and report, and define an expectation that no report key values are null.

Define a view that performs a left outer join on validation_copy and report, and reference this view in DLT expectations for the report table.

Questions # 60:

A distributed team of data analysts share computing resources on an interactive cluster with autoscaling configured. In order to better manage costs and query throughput, the workspace administrator is hoping to evaluate whether cluster upscaling is caused by many concurrent users or resource-intensive queries.

In which location can one review the timeline for cluster resizing events?

Options:

Workspace audit logs

Driver's log file

Ganglia

Cluster Event Log

Executor's log file
