
Exam Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 All Questions

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Exam Topic 3 Question 23 Discussion:
Question #: 23
Topic #: 3

A developer is working with a pandas DataFrame containing user behavior data from a web application.

Which approach should be used to execute a groupBy operation in parallel across all workers in Apache Spark 3.5?



A.

Use the applyInPandas API:

df.groupby("user_id").applyInPandas(mean_func, schema="user_id long, value double").show()


B.

Use the mapInPandas API:

df.mapInPandas(mean_func, schema="user_id long, value double").show()


C.

Use a regular Spark UDF:

from pyspark.sql.functions import mean

df.groupBy("user_id").agg(mean("value")).show()


D.

Use a Pandas UDF:

@pandas_udf("double")
def mean_func(value: pd.Series) -> float:
    return value.mean()

df.groupby("user_id").agg(mean_func(df["value"])).show()
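With applyInPandas (option A), Spark splits the DataFrame by the grouping key and calls the user function once per group, passing that group's rows as a pandas DataFrame; the groups are processed in parallel across the workers. A minimal sketch of what a per-group mean_func could look like, emulated locally with plain pandas (the function name and columns are assumptions matching the schema string "user_id long, value double" in option A):

```python
import pandas as pd

def mean_func(pdf: pd.DataFrame) -> pd.DataFrame:
    # Receives all rows for a single user_id as a pandas DataFrame
    # and returns one row matching the declared output schema.
    return pd.DataFrame({
        "user_id": [pdf["user_id"].iloc[0]],
        "value": [pdf["value"].mean()],
    })

# Local emulation of what Spark does with each group in parallel:
data = pd.DataFrame({"user_id": [1, 1, 2], "value": [2.0, 4.0, 6.0]})
result = pd.concat(
    mean_func(group) for _, group in data.groupby("user_id")
).reset_index(drop=True)
print(result)
```

On a real cluster the same mean_func would be passed to df.groupby("user_id").applyInPandas(mean_func, schema="user_id long, value double"), and Spark, not the driver, would distribute the per-group calls.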


