Databricks Certified Associate Developer for Apache Spark 3.5: Practice Questions 41-50 (page 5 of 5)
Question #41:

A Spark application needs to read multiple Parquet files from a directory where the files have differing but compatible schemas.

The data engineer wants to create a DataFrame that includes all columns from all files.

Which code should the data engineer use to read the Parquet files and include all columns?

Options:

A.

spark.read.parquet("/data/parquet/")

B.

spark.read.option("mergeSchema", True).parquet("/data/parquet/")

C.

spark.read.format("parquet").option("inferSchema", "true").load("/data/parquet/")

D.

spark.read.parquet("/data/parquet/").option("mergeAllCols", True)
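For reference, schema merging across Parquet files is controlled by the mergeSchema read option. A minimal runnable sketch, assuming an active SparkSession and two hypothetical Parquet files with compatible but differing schemas written under /tmp/parquet_demo/:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("merge-schema-demo").getOrCreate()

# Write two files with different (but compatible) schemas.
spark.createDataFrame([(1, "a")], ["id", "name"]).write.mode("append").parquet("/tmp/parquet_demo/")
spark.createDataFrame([(2, 9.5)], ["id", "score"]).write.mode("append").parquet("/tmp/parquet_demo/")

# mergeSchema unions the columns of all files into one schema.
df = spark.read.option("mergeSchema", "true").parquet("/tmp/parquet_demo/")
df.printSchema()  # id, name, score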

Question #42:

Which UDF implementation calculates the length of strings in a Spark DataFrame?

Options:

A.

df.withColumn("length", spark.udf("len", StringType()))

B.

df.select(length(col("stringColumn")).alias("length"))

C.

spark.udf.register("stringLength", lambda s: len(s))

D.

df.withColumn("length", udf(lambda s: len(s), StringType()))

Question #43:

In the code block below, aggDF contains aggregations on a streaming DataFrame:

aggDF.writeStream \
    .format("console") \
    .outputMode("???") \
    .start()

Which output mode at line 3 ensures that the entire result table is written to the console during each trigger execution?

Options:

A.

AGGREGATE

B.

COMPLETE

C.

REPLACE

D.

APPEND
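For context, Structured Streaming defines three output modes (append, update, and complete); complete rewrites the entire result table on every trigger, which is what most streaming aggregations require. A minimal sketch, assuming aggDF is the aggregated streaming DataFrame from the question:

# "complete" re-emits the full aggregated result table on each trigger.
query = (aggDF.writeStream
         .format("console")
         .outputMode("complete")
         .start())
query.awaitTermination()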

Question #44:

A data scientist wants to ingest a directory full of plain text files so that each record in the output DataFrame contains the entire contents of a single file and the full path of the file the text was read from.

The first attempt reads the text files, but each record contains only a single line. The code is shown below:

txt_path = "/datasets/raw_txt/*"
df = spark.read.text(txt_path)  # one row per line by default
df = df.withColumn("file_path", input_file_name())  # add full path

Which code change produces a DataFrame that meets the data scientist's requirements?

Options:

A.

Add the option wholetext to the text() function.

B.

Add the option lineSep to the text() function.

C.

Add the option wholetext=False to the text() function.

D.

Add the option lineSep=", " to the text() function.
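For reference, the text() reader accepts a wholetext flag that turns each file into a single row, and input_file_name() records the source path. A minimal sketch using the path from the question:

from pyspark.sql.functions import input_file_name

txt_path = "/datasets/raw_txt/*"

# wholetext=True yields one row per file instead of one row per line.
df = spark.read.text(txt_path, wholetext=True)
df = df.withColumn("file_path", input_file_name())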

Question #45:

A Spark developer is building an app to monitor task performance. They need to track the maximum task processing time per worker node and consolidate it on the driver for analysis.

Which technique should be used?

Options:

A.

Use an RDD action like reduce() to compute the maximum time

B.

Use an accumulator to record the maximum time on the driver

C.

Broadcast a variable to share the maximum time among workers

D.

Configure the Spark UI to automatically collect maximum times
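As background, RDD actions such as reduce() aggregate values across all partitions and return the result to the driver, whereas accumulators are written on executors and only read on the driver, and broadcast variables are read-only. A minimal sketch with hypothetical per-task times in milliseconds:

# Hypothetical per-task processing times, distributed across partitions.
task_times = spark.sparkContext.parallelize([120, 340, 95, 410, 280])

# reduce() combines values across partitions and returns the result
# to the driver in a single action.
max_time = task_times.reduce(lambda a, b: max(a, b))
print(max_time)  # 410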

Question #46:

A Spark application is experiencing performance issues in client mode because the driver is resource-constrained.

How should this issue be resolved?

Options:

A.

Add more executor instances to the cluster

B.

Increase the driver memory on the client machine

C.

Switch the deployment mode to cluster mode

D.

Switch the deployment mode to local mode
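For context, deploy mode is fixed at submission time: in client mode the driver runs on the submitting machine, while cluster mode places it on a cluster node with its own resources. A minimal sketch that reports the active mode (the spark-submit flag in the comment is the usual way to set it):

# Typically chosen at launch, e.g.:
#   spark-submit --deploy-mode cluster --driver-memory 8g app.py
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("deploy-mode-demo").getOrCreate()
# spark.submit.deployMode reports how the application was launched.
print(spark.conf.get("spark.submit.deployMode", "client"))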

Question #47:

Given a DataFrame df that has 10 partitions, after running the code:

result = df.coalesce(20)

How many partitions will the result DataFrame have?

Options:

A.

10

B.

Same number as the cluster executors

C.

1

D.

20
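For reference, coalesce() can only reduce the number of partitions; asking for more than the current count leaves the partitioning unchanged, whereas repartition() can increase it (at the cost of a shuffle). A minimal runnable sketch, assuming an active SparkSession:

df = spark.range(1000).repartition(10)  # start with 10 partitions

print(df.coalesce(20).rdd.getNumPartitions())     # 10: coalesce cannot grow
print(df.coalesce(5).rdd.getNumPartitions())      # 5: coalesce can shrink
print(df.repartition(20).rdd.getNumPartitions())  # 20: repartition can grow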

Question #48:

A data engineer has noticed that upgrading the Spark version in their applications from Spark 3.0 to Spark 3.5 has improved the runtime of some scheduled Spark applications.

Looking further, the data engineer realizes that Adaptive Query Execution (AQE) is now enabled.

Which operation does AQE implement to automatically improve Spark application performance?

Options:

A.

Dynamically switching join strategies

B.

Collecting persistent table statistics and storing them in the metastore for future use

C.

Improving the performance of single-stage Spark jobs

D.

Optimizing the layout of Delta files on disk
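As background, AQE re-optimizes the physical plan at stage boundaries using runtime statistics; one of its headline features is dynamically switching a sort-merge join to a broadcast join when one side turns out to be small. A minimal sketch of the relevant configuration:

# AQE re-plans queries at runtime using actual stage statistics.
spark.conf.set("spark.sql.adaptive.enabled", "true")

# Threshold AQE uses when deciding to switch to a broadcast join
# based on runtime size estimates.
spark.conf.set("spark.sql.adaptive.autoBroadcastJoinThreshold", "10MB")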

Question #49:

A data scientist is working with a Spark DataFrame called customerDF that contains customer information.

The DataFrame has a column named email with customer email addresses.

The data scientist needs to split this column into username and domain parts.

Which code snippet splits the email column into username and domain columns?

Options:

A.

customerDF = customerDF \
    .withColumn("username", split(col("email"), "@").getItem(0)) \
    .withColumn("domain", split(col("email"), "@").getItem(1))

B.

customerDF = customerDF.withColumn("username", regexp_replace(col("email"), "@", ""))

C.

customerDF = customerDF.select("email").alias("username", "domain")

D.

customerDF = customerDF.withColumn("domain", col("email").split("@")[1])

Question #50:

A developer wants to refactor older Spark code to take advantage of built-in functions introduced in Spark 3.5.

The original code:

from pyspark.sql import functions as F

min_price = 110.50
result_df = prices_df.filter(F.col("price") > min_price).agg(F.count("*"))

Which code block should the developer use to refactor the code?

Options:

A.

result_df = prices_df.filter(F.col("price") > F.lit(min_price)).agg(F.count("*"))

B.

result_df = prices_df.where(F.lit("price") > min_price).groupBy().count()

C.

result_df = prices_df.withColumn("valid_price", when(col("price") > F.lit(min_price), True))

D.

result_df = prices_df.filter(F.lit(min_price) > F.col("price")).count()
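As context, F.lit() wraps a Python literal in a Column expression; comparing a Column to a bare Python value also works because the literal is converted implicitly, so both filters are equivalent. A minimal runnable sketch, assuming an active SparkSession:

from pyspark.sql import functions as F

prices_df = spark.createDataFrame([(100.0,), (120.0,), (150.0,)], ["price"])

min_price = 110.50
# F.lit() makes the literal side of the comparison an explicit Column.
result_df = prices_df.filter(F.col("price") > F.lit(min_price)).agg(F.count("*"))
result_df.show()  # 2 rows have price > 110.50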
