DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK Online Practice Questions and Answers

Question 4

Which of the following code blocks stores DataFrame itemsDf in executor memory and, if insufficient memory is available, serializes it and saves it to disk?

A. itemsDf.persist(StorageLevel.MEMORY_ONLY)

B. itemsDf.cache(StorageLevel.MEMORY_AND_DISK)

C. itemsDf.store()

D. itemsDf.cache()

E. itemsDf.write.option('destination', 'memory').save()
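
For reference, a minimal sketch of the two caching APIs these options touch, assuming an active SparkSession and an existing DataFrame itemsDf:

from pyspark import StorageLevel

# persist() accepts an explicit storage level; MEMORY_AND_DISK keeps partitions
# in memory and spills serialized copies to disk when memory runs out.
itemsDf.persist(StorageLevel.MEMORY_AND_DISK)

# cache() takes no arguments; for DataFrames in Spark 3.x it defaults to MEMORY_AND_DISK.
itemsDf.cache()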

Question 5

The code block shown below should add column transactionDateForm to DataFrame transactionsDf. The column should express the unix-format timestamps in column transactionDate as strings like Apr 26 (Sunday). Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__, from_unixtime(__3__, __4__))

A. 1. withColumn, 2. "transactionDateForm", 3. "MMM d (EEEE)", 4. "transactionDate"

B. 1. select, 2. "transactionDate", 3. "transactionDateForm", 4. "MMM d (EEEE)"

C. 1. withColumn, 2. "transactionDateForm", 3. "transactionDate", 4. "MMM d (EEEE)"

D. 1. withColumn, 2. "transactionDateForm", 3. "transactionDate", 4. "MM d (EEE)"

E. 1. withColumnRenamed, 2. "transactionDate", 3. "transactionDateForm", 4. "MM d (EEE)"

Question 6

Which of the following code blocks stores a part of the data in DataFrame itemsDf on executors?

A. itemsDf.cache().count()

B. itemsDf.cache(eager=True)

C. cache(itemsDf)

D. itemsDf.cache().filter()

E. itemsDf.rdd.storeCopy()
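
A brief sketch of why an action matters here (assuming itemsDf exists): caching is lazy, so marking a DataFrame for caching stores nothing by itself.

# cache() only marks the DataFrame for caching; the count() action
# forces evaluation, which materializes partitions on the executors.
itemsDf.cache()
itemsDf.count()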

Question 7

Which of the following code blocks reads in the parquet file /FileStore/imports.parquet as a DataFrame?

A. spark.mode("parquet").read("/FileStore/imports.parquet")

B. spark.read.path("/FileStore/imports.parquet", source="parquet")

C. spark.read().parquet("/FileStore/imports.parquet")

D. spark.read.parquet("/FileStore/imports.parquet")

E. spark.read().format('parquet').open("/FileStore/imports.parquet")
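
For reference, a minimal sketch (assuming an active SparkSession named spark): spark.read is a property returning a DataFrameReader, so it is written without parentheses.

# Read a parquet file into a DataFrame.
importsDf = spark.read.parquet("/FileStore/imports.parquet")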

Question 8

Which of the following code blocks removes all rows in the 6-column DataFrame transactionsDf that have missing data in at least 3 columns?

A. transactionsDf.dropna("any")

B. transactionsDf.dropna(thresh=4)

C. transactionsDf.drop.na("",2)

D. transactionsDf.dropna(thresh=2)

E. transactionsDf.dropna("",4)
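
A sketch of the thresh semantics of dropna: thresh is the minimum number of non-null values a row must contain to survive, so in a 6-column DataFrame, dropping rows with missing data in at least 3 columns means keeping rows with at least 4 non-null values.

# Keep rows with at least 4 non-null values out of 6 columns,
# i.e. drop rows that have 3 or more missing values.
transactionsDf.dropna(thresh=4)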

Question 9

The code block shown below should return the number of columns in the CSV file stored at location filePath. Only lines that do not start with a # character should be read from the CSV file. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

__1__(__2__.__3__.csv(filePath, __4__).__5__)

A. 1. size, 2. spark, 3. read(), 4. escape='#', 5. columns

B. 1. DataFrame, 2. spark, 3. read(), 4. escape='#', 5. shape[0]

C. 1. len, 2. pyspark, 3. DataFrameReader, 4. comment='#', 5. columns

D. 1. size, 2. pyspark, 3. DataFrameReader, 4. comment='#', 5. columns

E. 1. len, 2. spark, 3. read, 4. comment='#', 5. columns
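
A compact sketch (assuming filePath points at a CSV file and spark is an active SparkSession):

# comment='#' makes the CSV reader skip lines starting with '#';
# .columns is a Python list, so len() gives the number of columns.
num_cols = len(spark.read.csv(filePath, comment='#').columns)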

Question 10

Which of the following describes the conversion of a computational query into an execution plan in Spark?

A. Spark uses the catalog to resolve the optimized logical plan.

B. The catalog assigns specific resources to the optimized memory plan.

C. The executed physical plan depends on a cost optimization from a previous stage.

D. Depending on whether DataFrame API or SQL API are used, the physical plan may differ.

E. The catalog assigns specific resources to the physical plan.
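
The planning pipeline (catalog-based resolution of the logical plan, Catalyst optimization, cost-based selection of a physical plan) can be inspected directly; a minimal sketch assuming any DataFrame df:

# Prints the parsed, analyzed, and optimized logical plans plus the physical plan.
df.explain(mode="extended")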

Question 11

Which of the following describes Spark's standalone deployment mode?

A. Standalone mode uses a single JVM to run Spark driver and executor processes.

B. Standalone mode means that the cluster does not contain the driver.

C. Standalone mode is how Spark runs on YARN and Mesos clusters.

D. Standalone mode uses only a single executor per worker per application.

E. Standalone mode is a viable solution for clusters that run multiple frameworks, not only Spark.
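
For context, a hedged sketch of connecting to Spark's built-in standalone cluster manager from PySpark (host, port, and app name are placeholders):

from pyspark.sql import SparkSession

# Standalone clusters are addressed with a spark:// master URL.
spark = (SparkSession.builder
         .master("spark://my-master-host:7077")
         .appName("standalone-example")
         .getOrCreate())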

Question 12

Which of the following code blocks returns a copy of DataFrame itemsDf where the column supplier has been renamed to manufacturer?

A. itemsDf.withColumn(["supplier", "manufacturer"])

B. itemsDf.withColumn("supplier").alias("manufacturer")

C. itemsDf.withColumnRenamed("supplier", "manufacturer")

D. itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))

E. itemsDf.withColumnsRenamed("supplier", "manufacturer")
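
For reference, a one-line sketch of the rename API (existing name first, new name second; both are plain strings):

# Returns a new DataFrame with column supplier renamed to manufacturer.
itemsDf.withColumnRenamed("supplier", "manufacturer")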

Question 13

Which of the following describes Spark's way of managing memory?

A. Spark uses a subset of the reserved system memory.

B. Storage memory is used for caching partitions derived from DataFrames.

C. As a general rule for garbage collection, Spark performs better on many small objects than few big objects.

D. Disabling serialization potentially greatly reduces the memory footprint of a Spark application.

E. Spark's memory usage can be divided into three categories: Execution, transaction, and storage.
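
For context, a hedged sketch of the unified-memory settings that split the heap between execution and storage (the values shown are Spark's documented defaults):

from pyspark.sql import SparkSession

# spark.memory.fraction: share of heap for execution + storage (unified memory);
# spark.memory.storageFraction: portion of that protected for storage (caching).
spark = (SparkSession.builder
         .config("spark.memory.fraction", "0.6")
         .config("spark.memory.storageFraction", "0.5")
         .getOrCreate())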

Question 14

Which of the following code blocks returns only rows from DataFrame transactionsDf in which values in column productId are unique?

A. transactionsDf.distinct("productId")

B. transactionsDf.dropDuplicates(subset=["productId"])

C. transactionsDf.drop_duplicates(subset="productId")

D. transactionsDf.unique("productId")

E. transactionsDf.dropDuplicates(subset="productId")
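
A short sketch of the subset form of dropDuplicates (note that subset expects a list of column names):

# Keeps the first row encountered for each distinct productId value.
transactionsDf.dropDuplicates(subset=["productId"])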

Question 15

Which of the following code blocks reads the parquet file stored at filePath into DataFrame itemsDf, using a valid schema for the sample of itemsDf shown below?

Sample of itemsDf:

+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]      |Sports Company Inc.|
+------+-----------------------------+-------------------+

A. itemsDfSchema = StructType([
       StructField("itemId", IntegerType()),
       StructField("attributes", StringType()),
       StructField("supplier", StringType())])
   itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

B. itemsDfSchema = StructType([
       StructField("itemId", IntegerType),
       StructField("attributes", ArrayType(StringType)),
       StructField("supplier", StringType)])
   itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

C. itemsDf = spark.read.schema('itemId integer, attributes , supplier string').parquet(filePath)

D. itemsDfSchema = StructType([
       StructField("itemId", IntegerType()),
       StructField("attributes", ArrayType(StringType())),
       StructField("supplier", StringType())])
   itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

E. itemsDfSchema = StructType([
       StructField("itemId", IntegerType()),
       StructField("attributes", ArrayType([StringType()])),
       StructField("supplier", StringType())])
   itemsDf = spark.read(schema=itemsDfSchema).parquet(filePath)
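
For reference, a runnable schema sketch that matches the sample (attributes holds arrays of strings, and PySpark type constructors must be called with parentheses); spark and filePath are assumed from the question:

from pyspark.sql.types import StructType, StructField, IntegerType, ArrayType, StringType

itemsDfSchema = StructType([
    StructField("itemId", IntegerType()),
    StructField("attributes", ArrayType(StringType())),
    StructField("supplier", StringType())])
itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)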

Question 16

Which of the following code blocks returns the number of unique values in column storeId of DataFrame transactionsDf?

A. transactionsDf.select("storeId").dropDuplicates().count()

B. transactionsDf.select(count("storeId")).dropDuplicates()

C. transactionsDf.select(distinct("storeId")).count()

D. transactionsDf.dropDuplicates().agg(count("storeId"))

E. transactionsDf.distinct().select("storeId").count()
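
For reference, two equivalent sketches for counting distinct values (assuming transactionsDf exists):

from pyspark.sql.functions import countDistinct

# Deduplicate the single column, then count the remaining rows...
transactionsDf.select("storeId").dropDuplicates().count()
# ...or aggregate with countDistinct directly.
transactionsDf.select(countDistinct("storeId")).show()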

Question 17

The code block shown below should return a column that indicates through boolean variables whether rows in DataFrame transactionsDf have values greater than or equal to 20 and less than or equal to 30 in column storeId and have the value 2 in column productId. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__((__2__.__3__) __4__ (__5__))

A. 1. select, 2. col("storeId"), 3. between(20, 30), 4. and, 5. col("productId")==2

B. 1. where, 2. col("storeId"), 3. geq(20).leq(30), 4. &, 5. col("productId")==2

C. 1. select, 2. "storeId", 3. between(20, 30), 4. &&, 5. col("productId")==2

D. 1. select, 2. col("storeId"), 3. between(20, 30), 4. &&, 5. col("productId")=2

E. 1. select, 2. col("storeId"), 3. between(20, 30), 4. &, 5. col("productId")==2

Question 18

The code block displayed below contains an error. The code block should use the Python method find_most_freq_letter to find the letter that occurs most frequently in column itemName of DataFrame itemsDf and return it in a new column most_frequent_letter. Find the error.

Code block:

find_most_freq_letter_udf = udf(find_most_freq_letter)
itemsDf.withColumn("most_frequent_letter", find_most_freq_letter("itemName"))

A. Spark is not using the UDF method correctly.

B. The UDF method is not registered correctly, since the return type is missing.

C. The "itemName" expression should be wrapped in col().

D. UDFs do not exist in PySpark.

E. Spark is not adding a column.
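
For context, a hedged sketch of correct UDF usage (find_most_freq_letter stands in for the Python function from the question; with no return type given, udf defaults to string):

from pyspark.sql.functions import udf

find_most_freq_letter_udf = udf(find_most_freq_letter)
# The registered UDF object, not the plain Python function,
# is what must be applied inside withColumn.
itemsDf.withColumn("most_frequent_letter", find_most_freq_letter_udf("itemName"))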
