Weekend Special - 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: spcl70

Databricks-Certified-Data-Engineer-Associate PDF

$33

$109.99

3 Months Free Update

  • Printable Format
  • Value of Money
  • 100% Pass Assurance
  • Verified Answers
  • Researched by Industry Experts
  • Based on Real Exams Scenarios
  • 100% Real Questions

Databricks-Certified-Data-Engineer-Associate PDF + Testing Engine

$52.8

$175.99

3 Months Free Update

  • Exam Name: Databricks Certified Data Engineer Associate Exam
  • Last Update: Dec 8, 2024
  • Questions and Answers: 99
  • Free Real Questions Demo
  • Recommended by Industry Experts
  • Best Economical Package
  • Immediate Access

Databricks-Certified-Data-Engineer-Associate Engine

$39.6

$131.99

3 Months Free Update

  • Best Testing Engine
  • One Click installation
  • Recommended by Teachers
  • Easy to use
  • 3 Modes of Learning
  • State of Art Technology
  • 100% Real Questions included

Databricks-Certified-Data-Engineer-Associate Practice Exam Questions with Answers Databricks Certified Data Engineer Associate Exam Certification

Question # 6

A data architect has determined that a table of the following format is necessary:

Databricks-Certified-Data-Engineer-Associate question answer

Which of the following code blocks uses SQL DDL commands to create an empty Delta table in the above format regardless of whether a table already exists with this name?

Databricks-Certified-Data-Engineer-Associate question answer

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E

Full Access
Question # 7

Which of the following tools is used by Auto Loader process data incrementally?

A.

Checkpointing

B.

Spark Structured Streaming

C.

Data Explorer

D.

Unity Catalog

E.

Databricks SQL

Full Access
Question # 8

Which of the following is a benefit of the Databricks Lakehouse Platform embracing open source technologies?

A.

Cloud-specific integrations

B.

Simplified governance

C.

Ability to scale storage

D.

Ability to scale workloads

E.

Avoiding vendor lock-in

Full Access
Question # 9

Which of the following data workloads will utilize a Gold table as its source?

A.

A job that enriches data by parsing its timestamps into a human-readable format

B.

A job that aggregates uncleaned data to create standard summary statistics

C.

A job that cleans data by removing malformatted records

D.

A job that queries aggregated data designed to feed into a dashboard

E.

A job that ingests raw data from a streaming source into the Lakehouse

Full Access
Question # 10

A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when It is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.

Which approach can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

A.

O They can reduce the cluster size of the SQL endpoint.

B.

Q They can turn on the Auto Stop feature for the SQL endpoint.

C.

O They can set up the dashboard's SQL endpoint to be serverless.

D.

0 They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.

Full Access
Question # 11

Which of the following benefits is provided by the array functions from Spark SQL?

A.

An ability to work with data in a variety of types at once

B.

An ability to work with data within certain partitions and windows

C.

An ability to work with time-related data in specified intervals

D.

An ability to work with complex, nested data ingested from JSON files

E.

An ability to work with an array of tables for procedural automation

Full Access
Question # 12

A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL.

Which of the following commands could the data engineering team use to access sales in PySpark?

A.

SELECT * FROM sales

B.

There is no way to share data between PySpark and SQL.

C.

spark.sql("sales")

D.

spark.delta.table("sales")

E.

spark.table("sales")

Full Access
Question # 13

A data engineer has a Python variable table_name that they would like to use in a SQL query. They want to construct a Python code block that will run the query using table_name.

They have the following incomplete code block:

____(f"SELECT customer_id, spend FROM {table_name}")

Which of the following can be used to fill in the blank to successfully complete the task?

A.

spark.delta.sql

B.

spark.delta.table

C.

spark.table

D.

dbutils.sql

E.

spark.sql

Full Access
Question # 14

A data engineering team has two tables. The first table march_transactions is a collection of all retail transactions in the month of March. The second table april_transactions is a collection of all retail transactions in the month of April. There are no duplicate records between the tables.

Which of the following commands should be run to create a new table all_transactions that contains all records from march_transactions and april_transactions without duplicate records?

A.

CREATE TABLE all_transactions AS

SELECT * FROM march_transactions

INNER JOIN SELECT * FROM april_transactions;

B.

CREATE TABLE all_transactions AS

SELECT * FROM march_transactions

UNION SELECT * FROM april_transactions;

C.

CREATE TABLE all_transactions AS

SELECT * FROM march_transactions

OUTER JOIN SELECT * FROM april_transactions;

D.

CREATE TABLE all_transactions AS

SELECT * FROM march_transactions

INTERSECT SELECT * from april_transactions;

E.

CREATE TABLE all_transactions AS

SELECT * FROM march_transactions

MERGE SELECT * FROM april_transactions;

Full Access
Question # 15

Which of the following describes a scenario in which a data team will want to utilize cluster pools?

A.

An automated report needs to be refreshed as quickly as possible.

B.

An automated report needs to be made reproducible.

C.

An automated report needs to be tested to identify errors.

D.

An automated report needs to be version-controlled across multiple collaborators.

E.

An automated report needs to be runnable by all stakeholders.

Full Access
Question # 16

A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when it is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.

Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

A.

They can turn on the Auto Stop feature for the SQL endpoint.

B.

They can ensure the dashboard's SQL endpoint is not one of the included query's SQL endpoint.

C.

They can reduce the cluster size of the SQL endpoint.

D.

They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.

E.

They can set up the dashboard's SQL endpoint to be serverless.

Full Access
Question # 17

An engineering manager uses a Databricks SQL query to monitor ingestion latency for each data source. The manager checks the results of the query every day, but they are manually rerunning the query each day and waiting for the results.

Which of the following approaches can the manager use to ensure the results of the query are updated each day?

A.

They can schedule the query to refresh every 1 day from the SQL endpoint's page in Databricks SQL.

B.

They can schedule the query to refresh every 12 hours from the SQL endpoint's page in Databricks SQL.

C.

They can schedule the query to refresh every 1 day from the query's page in Databricks SQL.

D.

They can schedule the query to run every 1 day from the Jobs UI.

E.

They can schedule the query to run every 12 hours from the Jobs UI.

Full Access
Question # 18

In which of the following scenarios should a data engineer use the MERGE INTO command instead of the INSERT INTO command?

A.

When the location of the data needs to be changed

B.

When the target table is an external table

C.

When the source table can be deleted

D.

When the target table cannot contain duplicate records

E.

When the source is not a Delta table

Full Access
Question # 19

A data engineer has a Python notebook in Databricks, but they need to use SQL to accomplish a specific task within a cell. They still want all of the other cells to use Python without making any changes to those cells.

Which of the following describes how the data engineer can use SQL within a cell of their Python notebook?

A.

It is not possible to use SQL in a Python notebook

B.

They can attach the cell to a SQL endpoint rather than a Databricks cluster

C.

They can simply write SQL syntax in the cell

D.

They can add %sql to the first line of the cell

E.

They can change the default language of the notebook to SQL

Full Access
Question # 20

A data engineer is working with two tables. Each of these tables is displayed below in its entirety.

Databricks-Certified-Data-Engineer-Associate question answer

The data engineer runs the following query to join these tables together:

Databricks-Certified-Data-Engineer-Associate question answer

Which of the following will be returned by the above query?

Databricks-Certified-Data-Engineer-Associate question answer

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E

Full Access
Question # 21

What is stored in a Databricks customer's cloud account?

A.

Data

B.

Cluster management metadata

C.

Databricks web application

D.

Notebooks

Full Access
Question # 22

Which of the following Git operations must be performed outside of Databricks Repos?

A.

Commit

B.

Pull

C.

Push

D.

Clone

E.

Merge

Full Access
Question # 23

A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to a data analytics dashboard for a retail use case. The job has a Databricks SQL query that returns the number of store-level records where sales is equal to zero. The data engineer wants their entire team to be notified via a messaging webhook whenever this value is greater than 0.

Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of stores with $0 in sales is greater than zero?

A.

They can set up an Alert with a custom template.

B.

They can set up an Alert with a new email alert destination.

C.

They can set up an Alert with one-time notifications.

D.

They can set up an Alert with a new webhook alert destination.

E.

They can set up an Alert without notifications.

Full Access
Question # 24

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION FAIL UPDATE

What is the expected behavior when a batch of data containing data that violates these constraints is processed?

A.

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.

B.

Records that violate the expectation cause the job to fail.

C.

Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.

D.

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.

E.

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.

Full Access
Question # 25

A data engineer wants to schedule their Databricks SQL dashboard to refresh once per day, but they only want the associated SQL endpoint to be running when it is necessary.

Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

A.

They can ensure the dashboard’s SQL endpoint matches each of the queries’ SQL endpoints.

B.

They can set up the dashboard’s SQL endpoint to be serverless.

C.

They can turn on the Auto Stop feature for the SQL endpoint.

D.

They can reduce the cluster size of the SQL endpoint.

E.

They can ensure the dashboard’s SQL endpoint is not one of the included query’s SQL endpoint.

Full Access
Question # 26

A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants this issue to be resolved.

Which of the following approaches can the team use to reduce the time it takes to return results in this scenario?

A.

They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to "Reliability Optimized."

B.

They can turn on the Auto Stop feature for the SQL endpoint.

C.

They can increase the cluster size of the SQL endpoint.

D.

They can turn on the Serverless feature for the SQL endpoint.

E.

They can increase the maximum bound of the SQL endpoint's scaling range

Full Access
Question # 27

Which of the following is hosted completely in the control plane of the classic Databricks architecture?

A.

Worker node

B.

JDBC data source

C.

Databricks web application

D.

Databricks Filesystem

E.

Driver node

Full Access
Question # 28

A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The table is configured to run in Production mode using the Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

A.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

B.

All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.

C.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.

D.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

E.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

Full Access
Question # 29

A data engineer is attempting to drop a Spark SQL table my_table. The data engineer wants to delete all table metadata and data.

They run the following command:

DROP TABLE IF EXISTS my_table

While the object no longer appears when they run SHOW TABLES, the data files still exist.

Which of the following describes why the data files still exist and the metadata files were deleted?

A.

The table’s data was larger than 10 GB

B.

The table’s data was smaller than 10 GB

C.

The table was external

D.

The table did not have a location

E.

The table was managed

Full Access