
Pass the Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 exam with ValidTests questions and answers

Viewing page 2 of 5 (questions 11-20)
Question # 11:

Given the schema:

event_ts TIMESTAMP,
sensor_id STRING,
metric_value LONG,
ingest_ts TIMESTAMP,
source_file_path STRING

The goal is to deduplicate based on: event_ts, sensor_id, and metric_value.

Options:

A. dropDuplicates on all columns (wrong criteria)
B. dropDuplicates with no arguments (removes based on all columns)
C. groupBy without aggregation (invalid use)
D. dropDuplicates on the exact matching fields

Expert Solution
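For reference, a minimal sketch of the deduplication pattern this question targets: dropDuplicates with an explicit column subset keeps one row per unique (event_ts, sensor_id, metric_value) combination, ignoring ingest_ts and source_file_path. The sample values below are illustrative, with timestamps simplified to strings.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = spark.createDataFrame(
    [("2024-01-01 00:00:00", "s1", 10, "2024-01-01 00:05:00", "/a.json"),
     ("2024-01-01 00:00:00", "s1", 10, "2024-01-01 00:09:00", "/b.json")],
    ["event_ts", "sensor_id", "metric_value", "ingest_ts", "source_file_path"],
)

# The second row differs only in ingest_ts and source_file_path, so it is dropped.
events.dropDuplicates(["event_ts", "sensor_id", "metric_value"]).show()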
Question # 12:

Given a CSV file with the content:

[The CSV contents were shown as an image in the original and are not reproduced here.]

And the following code:

from pyspark.sql.types import *

schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType())
])

spark.read.schema(schema).csv(path).collect()

What is the resulting output?

Options:

A. [Row(name='bambi'), Row(name='alladin', age=20)]
B. [Row(name='alladin', age=20)]
C. [Row(name='bambi', age=None), Row(name='alladin', age=20)]
D. The code throws an error due to a schema mismatch.

Expert Solution
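Since the CSV contents are not reproduced above, here is a minimal, hypothetical reproduction assuming a file in which one row is missing the second column. With an explicit schema and the default PERMISSIVE read mode, missing fields come back as None rather than raising an error.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.getOrCreate()

path = "/tmp/example_people.csv"  # hypothetical path and contents
with open(path, "w") as f:
    f.write("bambi\nalladin,20\n")

schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType())
])

# With this input, the missing age field is returned as None:
# [Row(name='bambi', age=None), Row(name='alladin', age=20)]
print(spark.read.schema(schema).csv(path).collect())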
Question # 13:

A developer wants to test Spark Connect with an existing Spark application.

What are the two alternative ways the developer can start a local Spark Connect server without changing their existing application code? (Choose 2 answers)

Options:

A. Execute their pyspark shell with the option --remote "https://localhost"
B. Execute their pyspark shell with the option --remote "sc://localhost"
C. Set the environment variable SPARK_REMOTE="sc://localhost" before starting the pyspark shell
D. Add .remote("sc://localhost") to their SparkSession.builder calls in their Spark code
E. Ensure the Spark property spark.connect.grpc.binding.port is set to 15002 in the application code

Expert Solution
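For context, a short sketch of how these approaches look in practice, assuming Spark 3.5 with a Spark Connect server listening on the default port 15002. The shell-level approaches require no changes to application code; the builder call is the in-code alternative.

# Shell-level, no application-code changes:
#   pyspark --remote "sc://localhost"
#   SPARK_REMOTE="sc://localhost" pyspark

# In-code alternative (this one does change the application):
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()
print(spark.range(3).collect())  # executed against the Spark Connect server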
Question # 14:

A data analyst builds a Spark application to analyze finance data and performs the following operations: filter, select, groupBy, and coalesce.

Which operation results in a shuffle?

Options:

A. groupBy
B. filter
C. select
D. coalesce

Expert Solution
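A quick way to verify shuffle behavior is to inspect the physical plan: a shuffle appears as an Exchange node. A minimal sketch (column names are illustrative):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(100).withColumn("key", F.col("id") % 5)

# filter, select, and coalesce (which only merges partitions) add no Exchange.
df.filter("id > 10").select("key").coalesce(2).explain()

# groupBy aggregates across partitions by key, so its plan contains an Exchange.
df.groupBy("key").count().explain()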
Question # 15:

Which command overwrites an existing JSON file when writing a DataFrame?

Options:

A. df.write.mode("overwrite").json("path/to/file")
B. df.write.overwrite.json("path/to/file")
C. df.write.json("path/to/file", overwrite=True)
D. df.write.format("json").save("path/to/file", mode="overwrite")

Expert Solution
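For reference, a minimal sketch of the writer pattern in option A (the output path is illustrative): setting the save mode to "overwrite" replaces any existing data at the target path.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(5)

df.write.mode("overwrite").json("/tmp/example_json_out")  # hypothetical path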
Question # 16:

What is a feature of Spark Connect?

Options:

A. It supports DataStreamReader, DataStreamWriter, StreamingQuery, and Streaming APIs
B. Supports DataFrame, Functions, Column, SparkContext PySpark APIs
C. It supports only PySpark applications
D. It has built-in authentication

Expert Solution
Question # 17:

Given the code fragment:

import pyspark.pandas as ps

psdf = ps.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

Which method is used to convert a Pandas API on Spark DataFrame (pyspark.pandas.DataFrame) into a standard PySpark DataFrame (pyspark.sql.DataFrame)?

Options:

A. psdf.to_spark()
B. psdf.to_pyspark()
C. psdf.to_pandas()
D. psdf.to_dataframe()

Expert Solution
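A minimal sketch of the conversion: to_spark() returns a standard pyspark.sql.DataFrame, whereas to_pandas() would instead collect the data into a local pandas DataFrame.

import pyspark.pandas as ps

psdf = ps.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

sdf = psdf.to_spark()   # now a pyspark.sql.DataFrame
print(type(sdf))
sdf.show()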
Question # 18:

Given:

spark.sparkContext.setLogLevel("")

Which of the following sets contains only valid log levels for the Spark driver?

Options:

A. ALL, DEBUG, FAIL, INFO
B. ERROR, WARN, TRACE, OFF
C. WARN, NONE, ERROR, FATAL
D. FATAL, NONE, INFO, DEBUG

Expert Solution
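For reference, the log levels accepted by setLogLevel are ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, and WARN. A minimal sketch:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Valid values: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
spark.sparkContext.setLogLevel("ERROR")  # silence INFO/WARN noise on the driver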
Question # 19:

Given the code:

from pyspark.sql.functions import col, split, lit

df = spark.read.csv("large_dataset.csv")
filtered_df = df.filter(col("error_column").contains("error"))
mapped_df = filtered_df.select(split(col("timestamp"), " ").getItem(0).alias("date"), lit(1).alias("count"))
reduced_df = mapped_df.groupBy("date").sum("count")
reduced_df.count()
reduced_df.show()

At which point will Spark actually begin processing the data?

Options:

A. When the filter transformation is applied
B. When the count action is applied
C. When the groupBy transformation is applied
D. When the show action is applied

Expert Solution
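The behavior being tested here is lazy evaluation: filter, select, and groupBy only build a logical plan, and no data is processed until the first action runs. A minimal self-contained sketch:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

df = spark.range(10).withColumn("flag", col("id") % 2)
lazy = df.filter("flag = 1").groupBy("flag").count()  # transformations only; no job yet

print(lazy.count())  # first action: Spark begins processing here
lazy.show()          # second action: triggers another job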
Question # 20:

A data scientist is working on a project that requires processing large amounts of structured data, performing SQL queries, and applying machine learning algorithms. The data scientist is considering using Apache Spark for this task.

Which combination of Apache Spark modules should the data scientist use in this scenario?

Options:

A. Spark DataFrames, Structured Streaming, and GraphX
B. Spark SQL, Pandas API on Spark, and Structured Streaming
C. Spark Streaming, GraphX, and Pandas API on Spark
D. Spark DataFrames, Spark SQL, and MLlib

Expert Solution
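For reference, a minimal sketch combining DataFrames for structured data, Spark SQL for queries, and MLlib (the DataFrame-based pyspark.ml API) for machine learning, as in option D. Column names and values are illustrative.

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1.0, 2.0, 0), (2.0, 1.0, 1), (3.0, 4.0, 0), (4.0, 3.0, 1)],
    ["f1", "f2", "label"],
)
df.createOrReplaceTempView("finance")

# Spark SQL over the structured data
train = spark.sql("SELECT f1, f2, label FROM finance WHERE f1 > 0")

# MLlib: assemble a feature vector and fit a simple model
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
model = LogisticRegression(maxIter=10).fit(assembler.transform(train))
print(model.coefficients)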