
Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 PDF

$38.5

$109.99

3 Months Free Update

  • Printable Format
  • Value for Money
  • 100% Pass Assurance
  • Verified Answers
  • Researched by Industry Experts
  • Based on Real Exam Scenarios
  • 100% Real Questions

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 PDF + Testing Engine

$61.6

$175.99

3 Months Free Update

  • Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0 Exam
  • Last Update: Mar 28, 2024
  • Questions and Answers: 180
  • Free Real Questions Demo
  • Recommended by Industry Experts
  • Best Economical Package
  • Immediate Access

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Engine

$46.2

$131.99

3 Months Free Update

  • Best Testing Engine
  • One-Click Installation
  • Recommended by Teachers
  • Easy to Use
  • 3 Modes of Learning
  • State-of-the-Art Technology
  • 100% Real Questions Included

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0: Databricks Certified Associate Developer for Apache Spark 3.0 Exam Questions and Answers

Question # 6

Which of the following code blocks stores a part of the data in DataFrame itemsDf on executors?

A.

itemsDf.cache().count()

B.

itemsDf.cache(eager=True)

C.

cache(itemsDf)

D.

itemsDf.cache().filter()

E.

itemsDf.rdd.storeCopy()

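A minimal sketch of the caching behavior this question targets, assuming a local SparkSession; cache() is lazy, so an action is needed before any partitions are actually stored on the executors:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
itemsDf = spark.createDataFrame([(1, "YetiX"), (2, "Sports Company Inc.")], ["itemId", "supplier"])

itemsDf.cache()    # lazy: only marks the DataFrame for caching
itemsDf.count()    # action: materializes the cached partitions on the executors
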
Question # 7

Which of the following statements about Spark's configuration properties is incorrect?

A.

The maximum number of tasks that an executor can process at the same time is controlled by the spark.task.cpus property.

B.

The maximum number of tasks that an executor can process at the same time is controlled by the spark.executor.cores property.

C.

The default value for spark.sql.autoBroadcastJoinThreshold is 10MB.

D.

The default number of partitions to use when shuffling data for joins or aggregations is 300.

E.

The default number of partitions returned from certain transformations can be controlled by the spark.default.parallelism property.

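For reference, the defaults this question revolves around can be inspected on a live session. A minimal sketch, assuming an existing SparkSession named spark; the values in the comments are the stock Spark 3.0 defaults:

print(spark.conf.get("spark.sql.shuffle.partitions"))          # 200 by default
print(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))  # 10485760 bytes (10 MB) by default
print(spark.sparkContext.defaultParallelism)                   # governed by spark.default.parallelism
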
Question # 8

Which of the following code blocks reduces a DataFrame from 12 to 6 partitions and performs a full shuffle?

A.

DataFrame.repartition(12)

B.

DataFrame.coalesce(6).shuffle()

C.

DataFrame.coalesce(6)

D.

DataFrame.coalesce(6, shuffle=True)

E.

DataFrame.repartition(6)

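A short sketch contrasting the two operators, assuming an existing SparkSession named spark; repartition() always performs a full shuffle, while coalesce() reduces the partition count without one:

df = spark.range(100).repartition(12)        # start from 12 partitions
df.repartition(6).rdd.getNumPartitions()     # 6, via a full shuffle
df.coalesce(6).rdd.getNumPartitions()        # 6, by merging partitions without a full shuffle
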
Question # 9

Which of the following code blocks reads in the parquet file /FileStore/imports.parquet as a DataFrame?

A.

spark.mode("parquet").read("/FileStore/imports.parquet")

B.

spark.read.path("/FileStore/imports.parquet", source="parquet")

C.

spark.read().parquet("/FileStore/imports.parquet")

D.

spark.read.parquet("/FileStore/imports.parquet")

E.

spark.read().format('parquet').open("/FileStore/imports.parquet")

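As a reference point, a sketch assuming an existing SparkSession named spark; note that spark.read is a property returning a DataFrameReader, not a method, and both forms below are equivalent:

df = spark.read.parquet("/FileStore/imports.parquet")
df = spark.read.format("parquet").load("/FileStore/imports.parquet")   # equivalent
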
Question # 10

Which of the following code blocks returns a DataFrame that has all columns of DataFrame transactionsDf and an additional column predErrorSquared which is the squared value of column predError in DataFrame transactionsDf?

A.

transactionsDf.withColumn("predError", pow(col("predErrorSquared"), 2))

B.

transactionsDf.withColumnRenamed("predErrorSquared", pow(predError, 2))

C.

transactionsDf.withColumn("predErrorSquared", pow(col("predError"), lit(2)))

D.

transactionsDf.withColumn("predErrorSquared", pow(predError, lit(2)))

E.

transactionsDf.withColumn("predErrorSquared", "predError"**2)

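A minimal sketch of adding a derived column with pow(), assuming transactionsDf exists with a numeric predError column; withColumn takes the new column's name as a string and a Column expression as its second argument:

from pyspark.sql.functions import col, lit, pow

transactionsDf = transactionsDf.withColumn("predErrorSquared", pow(col("predError"), lit(2)))
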
Question # 11

Which of the following is a problem with using accumulators?

A.

Only unnamed accumulators can be inspected in the Spark UI.

B.

Only numeric values can be used in accumulators.

C.

Accumulator values can only be read by the driver, but not by executors.

D.

Accumulators do not obey lazy evaluation.

E.

Accumulators are difficult to use for debugging because they will only be updated once, regardless of whether a task has to be re-run due to hardware failure.

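A small sketch of the driver/executor asymmetry, assuming an existing SparkSession named spark; executor-side code can only add to an accumulator, and only the driver can read its value:

acc = spark.sparkContext.accumulator(0)

def touch(row):
    acc.add(1)          # executors may only write to the accumulator

spark.range(10).foreach(touch)
print(acc.value)        # only the driver can read the accumulated value
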
Question # 12

Which of the following describes Spark actions?

A.

Writing data to disk is the primary purpose of actions.

B.

Actions are Spark's way of exchanging data between executors.

C.

The driver receives data upon request by actions.

D.

Stage boundaries are commonly established by actions.

E.

Actions are Spark's way of modifying RDDs.

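As a brief illustration, assuming transactionsDf exists: transformations merely extend the query plan, while actions trigger execution and hand results back to the driver:

filtered = transactionsDf.filter("value > 3")   # transformation: lazy, nothing runs
rows = filtered.collect()                       # action: the driver receives the rows
n = transactionsDf.count()                      # action: the driver receives a number
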
Question # 13

Which of the following code blocks returns a one-column DataFrame of all values in column supplier of DataFrame itemsDf that do not contain the letter X? In the DataFrame, every value should only be listed once.

Sample of DataFrame itemsDf:

+------+--------------------+--------------------+-------------------+
|itemId|            itemName|          attributes|           supplier|
+------+--------------------+--------------------+-------------------+
|     1|Thick Coat for Wa...|[blue, winter, cozy]|Sports Company Inc.|
|     2|Elegant Outdoors ...|[red, summer, fre...|              YetiX|
|     3|   Outdoors Backpack|[green, summer, t...|Sports Company Inc.|
+------+--------------------+--------------------+-------------------+

A.

itemsDf.filter(col(supplier).not_contains('X')).select(supplier).distinct()

B.

itemsDf.select(~col('supplier').contains('X')).distinct()

C.

itemsDf.filter(not(col('supplier').contains('X'))).select('supplier').unique()

D.

itemsDf.filter(~col('supplier').contains('X')).select('supplier').distinct()

E.

itemsDf.filter(!col('supplier').contains('X')).select(col('supplier')).unique()

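A minimal sketch of the pattern involved, assuming itemsDf exists as shown above; ~ negates a boolean Column condition, contains() tests for a substring, and distinct() deduplicates the result:

from pyspark.sql.functions import col

itemsDf.filter(~col("supplier").contains("X")).select("supplier").distinct()
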
Question # 14

Which of the following code blocks returns a copy of DataFrame transactionsDf in which column productId has been renamed to productNumber?

A.

transactionsDf.withColumnRenamed("productId", "productNumber")

B.

transactionsDf.withColumn("productId", "productNumber")

C.

transactionsDf.withColumnRenamed("productNumber", "productId")

D.

transactionsDf.withColumnRenamed(col(productId), col(productNumber))

E.

transactionsDf.withColumnRenamed(productId, productNumber)

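For reference, withColumnRenamed takes two strings, the existing column name first and the new name second; a sketch assuming transactionsDf exists:

transactionsDf.withColumnRenamed("productId", "productNumber")
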
Question # 15

The code block shown below should return a one-column DataFrame where the column storeId is converted to string type. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__.__3__(__4__))

A.

1. select, 2. col("storeId"), 3. cast, 4. StringType

B.

1. select, 2. col("storeId"), 3. as, 4. StringType

C.

1. cast, 2. "storeId", 3. as, 4. StringType()

D.

1. select, 2. col("storeId"), 3. cast, 4. StringType()

E.

1. select, 2. storeId, 3. cast, 4. StringType()

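A sketch of the casting pattern, assuming transactionsDf exists; note that cast() needs a type instance such as StringType(), or, equivalently, a type name given as a string:

from pyspark.sql.functions import col
from pyspark.sql.types import StringType

transactionsDf.select(col("storeId").cast(StringType()))
transactionsDf.select(col("storeId").cast("string"))    # equivalent
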
Question # 16

Which of the following code blocks returns a DataFrame showing the mean value of column "value" of DataFrame transactionsDf, grouped by its column storeId?

A.

transactionsDf.groupBy(col(storeId).avg())

B.

transactionsDf.groupBy("storeId").avg(col("value"))

C.

transactionsDf.groupBy("storeId").agg(avg("value"))

D.

transactionsDf.groupBy("storeId").agg(average("value"))

E.

transactionsDf.groupBy("value").average()

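A minimal sketch of grouped aggregation, assuming transactionsDf exists; avg() comes from pyspark.sql.functions (imported as F here) and is applied inside agg():

from pyspark.sql import functions as F

transactionsDf.groupBy("storeId").agg(F.avg("value"))
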
Question # 17

Which of the following statements about executors is correct?

A.

Executors are launched by the driver.

B.

Executors stop upon application completion by default.

C.

Each node hosts a single executor.

D.

Executors store data in memory only.

E.

An executor can serve multiple applications.

Question # 18

Which of the following is a characteristic of the cluster manager?

A.

Each cluster manager works on a single partition of data.

B.

The cluster manager receives input from the driver through the SparkContext.

C.

The cluster manager does not exist in standalone mode.

D.

The cluster manager transforms jobs into DAGs.

E.

In client mode, the cluster manager runs on the edge node.

Question # 19

Which of the following statements about lazy evaluation is incorrect?

A.

Predicate pushdown is a feature resulting from lazy evaluation.

B.

Execution is triggered by transformations.

C.

Spark will fail a job only during execution, but not during definition.

D.

Accumulators do not change the lazy evaluation model of Spark.

E.

Lineages allow Spark to coalesce transformations into stages.

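To make the evaluation model concrete, a sketch assuming a DataFrame transactionsDf with a numeric value column; transformations are only recorded in the lineage, and nothing executes until an action runs:

from pyspark.sql.functions import col

doubled = transactionsDf.withColumn("value2", col("value") * 2)  # transformation: lazy
filtered = doubled.filter(col("value2") > 4)                     # still lazy
filtered.count()                                                 # action: triggers the job
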
Question # 20

The code block displayed below contains an error. The code block is intended to return all columns of DataFrame transactionsDf except for columns predError, productId, and value. Find the error.

transactionsDf.select(~col("predError"), ~col("productId"), ~col("value"))

A.

The select operator should be replaced by the drop operator and the arguments to the drop operator should be column names predError, productId and value wrapped in the col operator so they should be expressed like drop(col(predError), col(productId), col(value)).

B.

The select operator should be replaced with the deselect operator.

C.

The column names in the select operator should not be strings and wrapped in the col operator, so they should be expressed like select(~col(predError), ~col(productId), ~col(value)).

D.

The select operator should be replaced by the drop operator.

E.

The select operator should be replaced by the drop operator and the arguments to the drop operator should be column names predError, productId and value as strings.


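For contrast, a sketch assuming transactionsDf exists; ~ only negates a boolean Column expression and cannot "deselect" columns, whereas drop accepts plain column-name strings:

transactionsDf.drop("predError", "productId", "value")
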
Question # 21

Which of the elements that are labeled with a circle and a number contain an error or are misrepresented?

A.

1, 10

B.

1, 8

C.

10

D.

7, 9, 10

E.

1, 4, 6, 9

Question # 22

Which of the following are valid execution modes?

A.

Kubernetes, Local, Client

B.

Client, Cluster, Local

C.

Server, Standalone, Client

D.

Cluster, Server, Local

E.

Standalone, Client, Cluster

Question # 23

Which of the following code blocks creates a new DataFrame with 3 columns, productId, highest, and lowest, that shows the biggest and smallest values of column value per value in column productId from DataFrame transactionsDf?

Sample of DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+

A.

transactionsDf.max('value').min('value')

B.

transactionsDf.agg(max('value').alias('highest'), min('value').alias('lowest'))

C.

transactionsDf.groupby(col(productId)).agg(max(col(value)).alias("highest"), min(col(value)).alias("lowest"))

D.

transactionsDf.groupby('productId').agg(max('value').alias('highest'), min('value').alias('lowest'))

E.

transactionsDf.groupby("productId").agg({"highest": max("value"), "lowest": min("value")})

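A minimal sketch of the aliased aggregation, assuming transactionsDf exists as shown above; max and min come from pyspark.sql.functions, imported here as F to avoid shadowing the Python builtins:

from pyspark.sql import functions as F

transactionsDf.groupby("productId").agg(
    F.max("value").alias("highest"),
    F.min("value").alias("lowest"),
)
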
Question # 24

Which of the following describes Spark's way of managing memory?

A.

Spark uses a subset of the reserved system memory.

B.

Storage memory is used for caching partitions derived from DataFrames.

C.

As a general rule for garbage collection, Spark performs better on many small objects than few big objects.

D.

Disabling serialization potentially greatly reduces the memory footprint of a Spark application.

E.

Spark's memory usage can be divided into three categories: Execution, transaction, and storage.

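To connect the storage-memory point to code, a hedged sketch assuming transactionsDf exists; persisted DataFrame partitions are held in storage memory, and the MEMORY_AND_DISK level shown here lets them spill to disk when memory runs short:

from pyspark import StorageLevel

transactionsDf.persist(StorageLevel.MEMORY_AND_DISK)
transactionsDf.count()   # an action is needed to actually populate the cache
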
Question # 25

Which of the following describes the role of the cluster manager?

A.

The cluster manager schedules tasks on the cluster in client mode.

B.

The cluster manager schedules tasks on the cluster in local mode.

C.

The cluster manager allocates resources to Spark applications and maintains the executor processes in client mode.

D.

The cluster manager allocates resources to Spark applications and maintains the executor processes in remote mode.

E.

The cluster manager allocates resources to the DataFrame manager.

Question # 26

Which of the following describes the role of tasks in the Spark execution hierarchy?

A.

Tasks are the smallest element in the execution hierarchy.

B.

Within one task, the slots are the unit of work done for each partition of the data.

C.

Tasks are the second-smallest element in the execution hierarchy.

D.

Stages with narrow dependencies can be grouped into one task.

E.

Tasks with wide dependencies can be grouped into one stage.

Question # 27

Which of the following code blocks returns a DataFrame where columns predError and productId are removed from DataFrame transactionsDf?

Sample of DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|f   |
+-------------+---------+-----+-------+---------+----+
|1            |3        |4    |25     |1        |null|
|2            |6        |7    |2      |2        |null|
|3            |3        |null |25     |3        |null|
+-------------+---------+-----+-------+---------+----+

A.

transactionsDf.withColumnRemoved("predError", "productId")

B.

transactionsDf.drop(["predError", "productId", "associateId"])

C.

transactionsDf.drop("predError", "productId", "associateId")

D.

transactionsDf.dropColumns("predError", "productId", "associateId")

E.

transactionsDf.drop(col("predError", "productId"))

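A closing sketch, assuming transactionsDf exists as shown above; drop takes each column name as a separate string argument (passing a single list raises a TypeError in Spark 3.0) and silently ignores names that do not exist in the DataFrame:

transactionsDf.drop("predError", "productId")                 # returns a copy without the two columns
transactionsDf.drop("predError", "productId", "associateId")  # nonexistent names are simply ignored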