Pass the Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Questions and answers with ValidTests


Viewing page 2 out of 5 pages

Viewing questions 11-20 out of questions

Question # 11:

Given the schema:


event_ts TIMESTAMP,

sensor_id STRING,

metric_value LONG,

ingest_ts TIMESTAMP,

source_file_path STRING

The goal is to deduplicate the rows based on the columns event_ts, sensor_id, and metric_value. Which approach achieves this?

Options:

dropDuplicates on all columns (wrong criteria)

dropDuplicates with no arguments (removes based on all columns)

groupBy without aggregation (invalid use)

dropDuplicates on the exact matching fields


Question # 12:

Given a CSV file with the content:


And the following code:

from pyspark.sql.types import IntegerType, StringType, StructField, StructType

schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType())
])

spark.read.schema(schema).csv(path).collect()

What is the resulting output?

Options:

[Row(name='bambi'), Row(name='alladin', age=20)]

[Row(name='alladin', age=20)]

[Row(name='bambi', age=None), Row(name='alladin', age=20)]

The code throws an error due to a schema mismatch.


Question # 13:

A developer wants to test Spark Connect with an existing Spark application.

What are the two alternative ways the developer can start a local Spark Connect server without changing their existing application code? (Choose 2 answers)

Options:

Execute their pyspark shell with the option --remote "https://localhost"

Execute their pyspark shell with the option --remote "sc://localhost"

Set the environment variable SPARK_REMOTE="sc://localhost" before starting the pyspark shell

Add .remote("sc://localhost") to their SparkSession.builder calls in their Spark code

Ensure the Spark property spark.connect.grpc.binding.port is set to 15002 in the application code


Question # 14:

A data analyst builds a Spark application to analyze finance data and performs the following operations: filter, select, groupBy, and coalesce.

Which operation results in a shuffle?

Options:

groupBy

filter

select

coalesce


Question # 15:

Which command overwrites an existing JSON file when writing a DataFrame?

Options:

df.write.mode("overwrite").json("path/to/file")

df.write.overwrite.json("path/to/file")

df.write.json("path/to/file", overwrite=True)

df.write.format("json").save("path/to/file", mode="overwrite")


Question # 16:

What is a feature of Spark Connect?

Options:

It supports DataStreamReader, DataStreamWriter, StreamingQuery, and Streaming APIs

It supports the DataFrame, Functions, Column, and SparkContext PySpark APIs

It supports only PySpark applications

It has built-in authentication


Question # 17:

Given the code fragment:


import pyspark.pandas as ps
psdf = ps.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

Which method is used to convert a Pandas API on Spark DataFrame (pyspark.pandas.DataFrame) into a standard PySpark DataFrame (pyspark.sql.DataFrame)?

Options:

psdf.to_spark()

psdf.to_pyspark()

psdf.to_pandas()

psdf.to_dataframe()


Question # 18:

Given:


spark.sparkContext.setLogLevel("<LOG_LEVEL>")

Which set contains the suitable configuration settings for Spark driver LOG_LEVELs?

Options:

ALL, DEBUG, FAIL, INFO

ERROR, WARN, TRACE, OFF

WARN, NONE, ERROR, FATAL

FATAL, NONE, INFO, DEBUG
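The PySpark documentation for SparkContext.setLogLevel lists the accepted levels as ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, and WARN (matched case-insensitively); anything else is rejected. The plain-Python sketch below checks each option set against that list, so it needs no Spark session. The labels A to D are ours and simply follow the order in which the options are listed. In a real session the call itself would look like spark.sparkContext.setLogLevel("WARN").

```python
# Levels accepted by SparkContext.setLogLevel, per the PySpark API docs.
VALID_LOG_LEVELS = {"ALL", "DEBUG", "ERROR", "FATAL", "INFO", "OFF", "TRACE", "WARN"}

# The four option sets, labeled A-D in the order they are listed.
options = {
    "A": ["ALL", "DEBUG", "FAIL", "INFO"],
    "B": ["ERROR", "WARN", "TRACE", "OFF"],
    "C": ["WARN", "NONE", "ERROR", "FATAL"],
    "D": ["FATAL", "NONE", "INFO", "DEBUG"],
}


def invalid_levels(levels):
    """Return the entries setLogLevel would reject (comparison is case-insensitive)."""
    return [lvl for lvl in levels if lvl.upper() not in VALID_LOG_LEVELS]


for label, levels in options.items():
    print(label, invalid_levels(levels) or "all valid")
```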


Question # 19:

Given the code:


from pyspark.sql.functions import col, lit, split

df = spark.read.csv("large_dataset.csv", header=True)
filtered_df = df.filter(col("error_column").contains("error"))
mapped_df = filtered_df.select(split(col("timestamp"), " ").getItem(0).alias("date"), lit(1).alias("count"))
reduced_df = mapped_df.groupBy("date").sum("count")
reduced_df.count()
reduced_df.show()

At which point will Spark actually begin processing the data?

Options:

When the filter transformation is applied

When the count action is applied

When the groupBy transformation is applied

When the show action is applied


Question # 20:

A data scientist is working on a project that requires processing large amounts of structured data, performing SQL queries, and applying machine learning algorithms. The data scientist is considering using Apache Spark for this task.

Which combination of Apache Spark modules should the data scientist use in this scenario?

Options:

Spark DataFrames, Structured Streaming, and GraphX

Spark SQL, Pandas API on Spark, and Structured Streaming

Spark Streaming, GraphX, and Pandas API on Spark

Spark DataFrames, Spark SQL, and MLlib

