Pass the Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Questions and answers with ValidTests

Questions # 41:

22 of 55.

A Spark application needs to read multiple Parquet files from a directory where the files have differing but compatible schemas.

The data engineer wants to create a DataFrame that includes all columns from all files.

Which code should the data engineer use to read the Parquet files and include all columns using Apache Spark?

Options:

spark.read.parquet("/data/parquet/")

spark.read.option("mergeSchema", True).parquet("/data/parquet/")

spark.read.format("parquet").option("inferSchema", "true").load("/data/parquet/")

spark.read.parquet("/data/parquet/").option("mergeAllCols", True)

Questions # 42:

Which UDF implementation calculates the length of strings in a Spark DataFrame?

Options:

df.withColumn("length", spark.udf("len", StringType()))

df.select(length(col("stringColumn")).alias("length"))

spark.udf.register("stringLength", lambda s: len(s))

df.withColumn("length", udf(lambda s: len(s), StringType()))

Questions # 43:

49 of 55.

In the code block below, aggDF contains aggregations on a streaming DataFrame:

aggDF.writeStream \
    .format("console") \
    .outputMode("???") \
    .start()

Which output mode at line 3 ensures that the entire result table is written to the console during each trigger execution?

Options:

AGGREGATE

COMPLETE

REPLACE

APPEND

Questions # 44:

1 of 55.

A data scientist wants to ingest a directory full of plain text files so that each record in the output DataFrame contains the entire contents of a single file and the full path of the file the text was read from.

The first attempt does read the text files, but each record contains a single line. This code is shown below:

txt_path = "/datasets/raw_txt/*"
df = spark.read.text(txt_path)  # one row per line by default
df = df.withColumn("file_path", input_file_name())  # add full path

Which code change meets the data scientist's requirements?

Options:

Add the option wholetext to the text() function.

Add the option lineSep to the text() function.

Add the option wholetext=False to the text() function.

Add the option lineSep=", " to the text() function.

Questions # 45:

A Spark developer is building an app to monitor task performance. They need to track the maximum task processing time per worker node and consolidate it on the driver for analysis.

Which technique should be used?

Options:

Use an RDD action like reduce() to compute the maximum time

Use an accumulator to record the maximum time on the driver

Broadcast a variable to share the maximum time among workers

Configure the Spark UI to automatically collect maximum times

Questions # 46:

A Spark application is experiencing performance issues in client mode because the driver is resource-constrained.

How should this issue be resolved?

Options:

Add more executor instances to the cluster

Increase the driver memory on the client machine

Switch the deployment mode to cluster mode

Switch the deployment mode to local mode

Questions # 47:

Given a DataFrame df that has 10 partitions, after running the code:

result = df.coalesce(20)

How many partitions will the result DataFrame have?

Options:

Same number as the cluster executors

Questions # 48:

17 of 55.

A data engineer has noticed that upgrading the Spark version in their applications from Spark 3.0 to Spark 3.5 has improved the runtime of some scheduled Spark applications.

Looking further, the data engineer realizes that Adaptive Query Execution (AQE) is now enabled.

Which operation does AQE perform to automatically improve the Spark application's performance?

Options:

Dynamically switching join strategies

Collecting persistent table statistics and storing them in the metastore for future use

Improving the performance of single-stage Spark jobs

Optimizing the layout of Delta files on disk

Questions # 49:

37 of 55.

A data scientist is working with a Spark DataFrame called customerDF that contains customer information.

The DataFrame has a column named email with customer email addresses.

The data scientist needs to split this column into username and domain parts.

Which code snippet splits the email column into username and domain columns?

Options:

customerDF = customerDF \
    .withColumn("username", split(col("email"), "@").getItem(0)) \
    .withColumn("domain", split(col("email"), "@").getItem(1))

customerDF = customerDF.withColumn("username", regexp_replace(col("email"), "@", ""))

customerDF = customerDF.select("email").alias("username", "domain")

customerDF = customerDF.withColumn("domain", col("email").split("@")[1])

Questions # 50:

40 of 55.

A developer wants to refactor older Spark code to take advantage of built-in functions introduced in Spark 3.5.

The original code:

from pyspark.sql import functions as F

min_price = 110.50
result_df = prices_df.filter(F.col("price") > min_price).agg(F.count("*"))

Which code block should the developer use to refactor the code?

Options:

result_df = prices_df.filter(F.col("price") > F.lit(min_price)).agg(F.count("*"))

result_df = prices_df.where(F.lit("price") > min_price).groupBy().count()

result_df = prices_df.withColumn("valid_price", when(col("price") > F.lit(min_price), True))

result_df = prices_df.filter(F.lit(min_price) > F.col("price")).count()
