
Databricks-Certified-Professional-Data-Engineer PDF

$38.5

$109.99

3 Months Free Update

  • Printable Format
  • Value for Money
  • 100% Pass Assurance
  • Verified Answers
  • Researched by Industry Experts
  • Based on Real Exam Scenarios
  • 100% Real Questions

Databricks-Certified-Professional-Data-Engineer PDF + Testing Engine

$61.6

$175.99

3 Months Free Update

  • Exam Name: Databricks Certified Data Engineer Professional Exam
  • Last Update: Mar 19, 2023
  • Questions and Answers: 222
  • Free Real Questions Demo
  • Recommended by Industry Experts
  • Best Economical Package
  • Immediate Access

Databricks-Certified-Professional-Data-Engineer Engine

$46.2

$131.99

3 Months Free Update

  • Best Testing Engine
  • One-Click Installation
  • Recommended by Teachers
  • Easy to Use
  • 3 Modes of Learning
  • State-of-the-Art Technology
  • 100% Real Questions Included

Databricks-Certified-Professional-Data-Engineer Databricks Certified Data Engineer Professional Exam Questions and Answers

Question # 6

Direct queries on external files support only limited options. To create an external table over pipe-delimited CSV files that include a header row, fill in the blanks to complete the CREATE TABLE statement below.

CREATE TABLE sales (id int, unitsSold int, price FLOAT, items STRING)

________

________

LOCATION "dbfs:/mnt/sales/*.csv"

A.

FORMAT CSV

OPTIONS ("true", "|")

B.

USING CSV

TYPE ("true", "|")

C.

USING CSV

OPTIONS (header = "true", delimiter = "|")

(Correct)

D.

FORMAT CSV

FORMAT TYPE (header = "true", delimiter = "|")

E.

FORMAT CSV

TYPE (header = "true", delimiter = "|")

Full Access
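For reference, option C supplies the two missing clauses. Below is a minimal notebook sketch, assuming pipe-delimited CSV files with a header row already exist under the mounted path (LOCATION points at the directory here rather than the stem's file glob):

# Minimal sketch of option C; table name and columns are from the question stem.
spark.sql("""
    CREATE TABLE sales (id INT, unitsSold INT, price FLOAT, items STRING)
    USING CSV
    OPTIONS (header = "true", delimiter = "|")
    LOCATION "dbfs:/mnt/sales/"
""")

# The external table reads the CSV files in place; no data is copied.
spark.sql("SELECT * FROM sales LIMIT 5").show()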
Question # 7

Which of the following sections in the UI can be used to manage permissions and grants to tables?

A.

User Settings

B.

Admin UI

C.

Workspace admin settings

D.

User access control lists

E.

Data Explorer

Full Access
Question # 8

What is the purpose of the bronze layer in a Multi-hop Medallion architecture?

A.

Copy of raw data, easy to query and ingest data for downstream processes.

B.

Powers ML applications

C.

Data quality checks, corrupt data quarantined

D.

Contains aggregated data that is to be consumed into Silver

E.

Reduces data storage by compressing the data

Full Access
Question # 9

You were asked to create a notebook that can take department as a parameter and process the data accordingly. Which of the following statements results in storing the notebook parameter in a Python variable?

A.

SET department = dbutils.widgets.get("department")

B.

ASSIGN department == dbutils.widgets.get("department")

C.

department = dbutils.widgets.get("department")

D.

department = notebook.widget.get("department")

E.

department = notebook.param.get("department")

Full Access
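Note that the notebook widgets API is dbutils.widgets (plural). A minimal sketch that creates a text widget and reads it into a Python variable with a plain assignment, as in option C (the widget name and default value are illustrative):

# Create a text widget named "department" with an empty default value,
# then read its current value into a Python variable.
dbutils.widgets.text("department", "")
department = dbutils.widgets.get("department")
print(f"Processing department: {department}")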
Question # 10

You are trying to create an object by joining two tables that is accessible to the data science team and does not get dropped if the cluster restarts or if the notebook is detached. What type of object are you trying to create?

A.

Temporary view

B.

Global Temporary view

C.

Global Temporary view with cache option

D.

External view

E.

View

Full Access
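For reference, here is how the view types in the options are created and scoped; a minimal sketch, assuming a base table named sales exists (there is no "external view" object type in Spark SQL):

# Temporary view: session-scoped; gone when the notebook detaches.
spark.sql("CREATE TEMPORARY VIEW sales_tmp AS SELECT * FROM sales")

# Global temporary view: lives in the global_temp database, shared across
# notebooks on the same cluster, but dropped when the cluster restarts.
spark.sql("CREATE GLOBAL TEMPORARY VIEW sales_gtmp AS SELECT * FROM sales")
spark.sql("SELECT * FROM global_temp.sales_gtmp").show()

# Permanent view: persisted in the metastore; survives cluster restarts and
# notebook detaches, and is accessible to other users with permissions.
spark.sql("CREATE VIEW sales_v AS SELECT * FROM sales")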
Question # 11

The data engineering team noticed that one of the jobs fails randomly as a result of using spot instances. What feature in Jobs/Tasks can be used to address this issue so the job is more stable when using spot instances?

A.

Use Databricks REST API to monitor and restart the job

B.

Use Jobs runs, active runs UI section to monitor and restart the job

C.

Add second task and add a check condition to rerun the first task if it fails

D.

Restart the job cluster, job automatically restarts

E.

Add a retry policy to the task

Full Access
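For context, retries are configured per task. A hypothetical sketch of the relevant Jobs API 2.1 task fields (the task key, notebook path, and retry values are illustrative, not prescriptive):

# Hypothetical Jobs API 2.1 task definition with a retry policy.
task = {
    "task_key": "ingest_sales",
    "notebook_task": {"notebook_path": "/Jobs/ingest_sales"},
    "max_retries": 3,                    # retry up to 3 times on failure
    "min_retry_interval_millis": 60000,  # wait at least 1 minute between retries
    "retry_on_timeout": False,
}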
Question # 12

What is the main difference between the silver layer and gold layer in medallion architecture?

A.

Silver is optimized to perform ETL, Gold is optimized for query performance

B.

Gold is optimized to perform ETL, Silver is optimized for query performance

C.

Silver is copy of Bronze, Gold is a copy of Silver

D.

Silver is stored in Delta Lake, Gold is stored in memory

E.

Silver may contain aggregated data, Gold may preserve the granularity of the original data

Full Access
Question # 13

Which of the following two options are supported in identifying the arrival of new files and incremental data from cloud object storage using Auto Loader?

A.

Directory listing, File notification

B.

Checkpointing, watermarking

C.

Write-ahead logging, read-ahead logging

D.

File hashing, Dynamic file lookup

E.

Checkpointing and Write ahead logging

Full Access
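For context, Auto Loader uses directory listing mode by default; a single option switches it to file notification mode. A minimal sketch with illustrative paths and file format:

# cloudFiles.useNotifications = "true" switches Auto Loader from the default
# directory listing mode to file notification mode.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.useNotifications", "true")
      .option("cloudFiles.schemaLocation", "/mnt/schemas/events")
      .load("/mnt/landing/events"))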
Question # 14

You were asked to set up a new all-purpose cluster, but the cluster is unable to start. Which of the following steps do you need to take to identify the root cause of the issue and the reason why the cluster was unable to start?

A.

Check the cluster driver logs

B.

Check the cluster event logs

(Correct)

C.

Workspace logs

D.

Storage account

E.

Data plane

Full Access
Question # 15

The data engineering team has provided 10 queries and asked the data analyst team to build a dashboard and refresh the data every day at 8 AM. Identify the best approach to set up the data refresh for this dashboard.

A.

Each query requires a separate task; set up 10 tasks under a single job to run at 8 AM to refresh the dashboard

B.

The entire dashboard with 10 queries can be refreshed at once; a single schedule needs to be set up to refresh at 8 AM.

C.

Set up a job with linear dependencies to load all 10 queries into a table so the dashboard can be refreshed at once.

D.

A dashboard can only refresh one query at a time; set up 10 schedules for the refresh.

E.

Use Incremental refresh to run at 8 AM every day.

Full Access
Question # 16

John Smith is a newly joined member of the Marketing team who currently has read access to the sales table but needs the ability to delete rows from the table. Which of the following commands helps you accomplish this?

A.

GRANT USAGE ON TABLE table_name TO john.smith@marketing.com

B.

GRANT DELETE ON TABLE table_name TO john.smith@marketing.com

C.

GRANT DELETE TO TABLE table_name ON john.smith@marketing.com

D.

GRANT MODIFY TO TABLE table_name ON john.smith@marketing.com

E.

GRANT MODIFY ON TABLE table_name TO john.smith@marketing.com

Full Access
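For reference, MODIFY is the Databricks SQL privilege that covers deleting, inserting, and updating rows; a minimal sketch with an illustrative table name:

# Principals (user emails) are quoted with backticks in Databricks SQL.
spark.sql("GRANT MODIFY ON TABLE sales TO `john.smith@marketing.com`")

# Inspect the grants afterwards (SHOW GRANTS is the Unity Catalog syntax).
spark.sql("SHOW GRANTS ON TABLE sales").show()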
Question # 17

You are currently working on a notebook that will populate a reporting table for downstream process consumption. This process needs to run on a schedule every hour. What type of cluster are you going to use to set up this job?

A.

Since it’s just a single job and we need to run every hour, we can use an all-purpose cluster

B.

The job cluster is best suited for this purpose.

C.

Use Azure VM to read and write delta tables in Python

D.

Use delta live table pipeline to run in continuous mode

Full Access
Question # 18

How do you determine whether a table is a managed table or an external table?

A.

Run IS_MANAGED('table_name') function

B.

All external tables are stored in data lake, managed tables are stored in DELTA lake

C.

All managed tables are stored in unity catalog

D.

Run SQL command DESCRIBE EXTENDED table_name and check type

E.

Run SQL command SHOW TABLES to see the type of the table

Full Access
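A minimal sketch of option D: DESCRIBE EXTENDED returns key/value rows, and the "Type" row reads MANAGED or EXTERNAL (the table name is illustrative):

details = spark.sql("DESCRIBE EXTENDED sales")

# The row with col_name = "Type" identifies the table as MANAGED or EXTERNAL.
details.filter(details.col_name == "Type").show()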
Question # 19

Which of the following SQL statements can be used to query a table while eliminating duplicate rows from the query results?

A.

SELECT DISTINCT * FROM table_name

B.

SELECT DISTINCT * FROM table_name HAVING COUNT(*) > 1

C.

SELECT DISTINCT_ROWS (*) FROM table_name

D.

SELECT * FROM table_name GROUP BY * HAVING COUNT(*) < 1

E.

SELECT * FROM table_name GROUP BY * HAVING COUNT(*) > 1

Full Access
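A tiny runnable illustration of option A, using an in-memory table with a duplicate row:

# Two of the three rows are identical; DISTINCT collapses them into one.
df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "b")], ["id", "val"])
df.createOrReplaceTempView("demo")
spark.sql("SELECT DISTINCT * FROM demo").show()  # returns 2 rows, not 3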
Question # 20

You are currently working on reloading customer_sales tables using the below query

INSERT OVERWRITE customer_sales
SELECT * FROM customers c
INNER JOIN sales_monthly s ON s.customer_id = c.customer_id

After you ran the above command, the Marketing team quickly wanted to review the old data that was in the table. How does INSERT OVERWRITE impact the data in the customer_sales table if you want to see the version of the data from before the above statement ran?

A.

Overwrites the data in the table along with all historical versions of the data; you cannot time travel to previous versions

B.

Overwrites the data in the table but preserves all historical versions of the data, you can time travel to previous versions

C.

Overwrites the current version of the data but clears all historical versions of the data, so you cannot time travel to previous versions.

D.

Appends the data to the current version, you can time travel to previous versions

E.

By default, overwrites the data and schema, you cannot perform time travel

Full Access
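For reference, Delta Lake keeps the previous table versions, so the old data remains reachable through time travel; a minimal sketch (the version number is illustrative):

# List the table's versions and the operations that produced them.
spark.sql("DESCRIBE HISTORY customer_sales").show()

# Query the table as it was before the INSERT OVERWRITE ran.
old = spark.sql("SELECT * FROM customer_sales VERSION AS OF 0")

# Or roll the table back entirely to that version.
spark.sql("RESTORE TABLE customer_sales TO VERSION AS OF 0")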
Question # 21

You are tasked with setting up a notebook as a job for six departments, where each department can run the task in parallel. The notebook takes an input parameter, the dept number, to process the data by department. How do you go about setting this up in a job?

A.

Use a single notebook as a task in the job and use dbutils.notebook.run to run each notebook with a parameter in a different cell

B.

A task in the job cannot take an input parameter; create six notebooks with hardcoded dept numbers and set up six tasks with linear dependencies in the job

C.

A task accepts key-value pair parameters; create six tasks and pass the department number as a parameter for each task, with no dependencies in the job, as they can all run in parallel.

(Correct)

D.

A parameter can only be passed at the job level; create six jobs and pass the department number to each job, with linear job dependencies

E.

A parameter can only be passed at the job level; create six jobs and pass the department number to each job, with no job dependencies

Full Access
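For context, a hypothetical Jobs API 2.1 sketch of option C: six tasks, one per department, each receiving a key-value parameter and declaring no dependencies so they run in parallel (task keys, the notebook path, and values are illustrative):

# Six parallel tasks, each passing the department number as a parameter;
# inside the notebook, dbutils.widgets.get("department") reads the value.
tasks = [
    {
        "task_key": f"process_dept_{dept}",
        "notebook_task": {
            "notebook_path": "/Jobs/process_by_department",
            "base_parameters": {"department": str(dept)},
        },
    }
    for dept in range(1, 7)
]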
Question # 22

What are the different ways you can schedule a job in Databricks workspace?

A.

Continuous, Incremental

B.

On-Demand runs, File notification from Cloud object storage

C.

Cron, On Demand runs

D.

Cron, File notification from Cloud object storage

E.

Once, Continuous

Full Access
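For reference, a scheduled run is expressed as a Quartz cron expression in the job definition, while on-demand runs are triggered via "Run now" or the API; a hypothetical Jobs API 2.1 schedule block (values illustrative):

schedule = {
    "quartz_cron_expression": "0 0 8 * * ?",  # every day at 8:00 AM
    "timezone_id": "UTC",
}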
Question # 23

You are currently working with a second team, and both teams are looking to modify the same notebook. You noticed that a member of the second team is copying the notebook to their personal folder to edit it and then replacing the shared notebook. Which notebook feature do you recommend to make collaboration easier?

A.

Databricks notebooks should be copied to a local machine, with source control set up locally to version the notebooks

B.

Databricks notebooks support automatic change tracking and versioning

C.

Databricks Notebooks support real-time coauthoring on a single notebook

D.

Databricks notebooks can be exported into dbc archive files and stored in data lake

E.

Databricks notebook can be exported as HTML and imported at a later time

Full Access
Question # 24

The data engineering team is required to share data with the data science team, and both teams use different workspaces in the same organization. Which of the following techniques can be used to simplify sharing data across workspaces?

*Please note the question is asking how data is shared within an organization across multiple workspaces.

A.

Data Sharing

B.

Unity Catalog

C.

DELTA lake

D.

Use a single storage location

E.

DELTA LIVE Pipelines

Full Access
Question # 25

You are currently working on a project that requires the use of SQL and Python in a given notebook. What would be your approach?

A.

Create two separate notebooks, one for SQL and the second for Python

B.

A single notebook can support multiple languages, use the magic command to switch between the two.

C.

Use an all-purpose cluster for Python and a SQL endpoint for SQL

D.

Use job cluster to run python and SQL Endpoint for SQL

Full Access
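For reference, a sketch of two cells in a Python notebook, assuming a sales table exists; the %sql magic command switches a single cell to SQL:

# Cell 1: runs in the notebook's default language (Python).
df = spark.table("sales").filter("price > 100")
display(df)

# Cell 2: %sql switches only this cell to SQL.
%sql
SELECT COUNT(*) FROM sales WHERE price > 100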
Question # 26

The data engineering team has a job currently set up to run a task that loads data into a reporting table every day at 8:00 AM and takes about 20 minutes. The operations team is planning to use that data to run a second job, so they need access to the latest complete set of data. What is the best way to orchestrate this job setup?

A.

Add the operations reporting task to the same job and set the data engineering task to depend on the operations reporting task

B.

Setup a second job to run at 8:20 AM in the same workspace

C.

Add the operations reporting task to the same job and set the operations reporting task to depend on the data engineering task

D.

Use Auto Loader to run every 20 minutes to read the initial table, set the trigger to once, and create a second job

E.

Set up a Delta Live Table based on the first table and set the job to run in continuous mode

Full Access
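For context, a hypothetical Jobs API 2.1 sketch of the dependency pattern in option C, where the operations reporting task only starts after the data engineering task succeeds (task keys and notebook paths are illustrative):

tasks = [
    {
        "task_key": "load_reporting_table",
        "notebook_task": {"notebook_path": "/Jobs/load_reporting_table"},
    },
    {
        "task_key": "operations_reporting",
        "depends_on": [{"task_key": "load_reporting_table"}],
        "notebook_task": {"notebook_path": "/Jobs/operations_reporting"},
    },
]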
Question # 27

The team has decided to take advantage of table properties to identify a business owner for each table. Which of the following table DDL syntaxes allows you to populate a table property identifying the business owner of a table?

CREATE TABLE inventory (id INT, units FLOAT)

A.

CREATE TABLE inventory (id INT, units FLOAT)

SET TBLPROPERTIES business_owner = 'supply chain'

B.

TBLPROPERTIES (business_owner = 'supply chain')

C.

CREATE TABLE inventory (id INT, units FLOAT)

SET (business_owner = 'supply chain')

D.

CREATE TABLE inventory (id INT, units FLOAT)

SET PROPERTY (business_owner = 'supply chain')

E.

CREATE TABLE inventory (id INT, units FLOAT)

SET TAG (business_owner = 'supply chain')

Full Access
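For reference, the TBLPROPERTIES clause from option B is the standard DDL syntax; a minimal sketch that also shows setting and inspecting the property after creation:

spark.sql("""
    CREATE TABLE inventory (id INT, units FLOAT)
    TBLPROPERTIES (business_owner = 'supply chain')
""")

# Properties can also be set after creation with ALTER TABLE ...
spark.sql("ALTER TABLE inventory SET TBLPROPERTIES (business_owner = 'supply chain')")

# ... and inspected with SHOW TBLPROPERTIES.
spark.sql("SHOW TBLPROPERTIES inventory").show()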
Question # 28

Which of the following Auto Loader Structured Streaming commands successfully performs a hop from the landing area into Bronze?

A.

spark \
  .readStream \
  .format("csv") \
  .option("cloudFiles.schemaLocation", checkpoint_directory) \
  .load("landing") \
  .writeStream.option("checkpointLocation", checkpoint_directory) \
  .table(raw)

B.

spark \
  .readStream \
  .format("cloudFiles") \
  .option("cloudFiles.format", "csv") \
  .option("cloudFiles.schemaLocation", checkpoint_directory) \
  .load("landing") \
  .writeStream.option("checkpointLocation", checkpoint_directory) \
  .table(raw)

(Correct)

C.

spark \
  .read \
  .format("cloudFiles") \
  .option("cloudFiles.format", "csv") \
  .option("cloudFiles.schemaLocation", checkpoint_directory) \
  .load("landing") \
  .writeStream.option("checkpointLocation", checkpoint_directory) \
  .table(raw)

D.

spark \
  .readStream \
  .load(rawSalesLocation) \
  .writeStream \
  .option("checkpointLocation", checkpointPath).outputMode("append") \
  .table("uncleanedSales")

E.

spark \
  .read \
  .load(rawSalesLocation) \
  .writeStream \
  .option("checkpointLocation", checkpointPath) \
  .outputMode("append") \
  .table("uncleanedSales")

Full Access
Question # 29

Which of the following techniques does Structured Streaming use to create end-to-end fault tolerance?

A.

Checkpointing and watermarking

B.

Write-ahead logging and watermarking

C.

Checkpointing and idempotent sinks

D.

Write-ahead logging and idempotent sinks

E.

Stream will fail over to available nodes in the cluster

Full Access
Question # 30

You are designing an analytical platform to store structured data from your e-commerce platform and unstructured data from website traffic and the app store. How would you approach deciding where to store this data?

A.

Use a traditional data warehouse for structured data and a data lakehouse for unstructured data.

B.

Data lakehouse can only store unstructured data but cannot enforce a schema

C.

Data lakehouse can store structured and unstructured data and can enforce schema

D.

Traditional data warehouses are good for storing structured data and enforcing schema

Full Access
Question # 31

A notebook accepts an input parameter that is assigned to a Python variable called department, and this is an optional parameter to the notebook. You are looking to control the flow of the code using this parameter: if the department variable is present, execute the code; if no department value is passed, skip the code execution. How do you achieve this using Python?

A.

if department is not None:
    # Execute code
else:
    pass

(Correct)

B.

if (department is not None)
    # Execute code
else
    pass

C.

if department is not None:
    # Execute code
end:
    pass

D.

if department is not None:
    # Execute code
then:
    pass

E.

if department is None:
    # Execute code
else:
    pass

Full Access
Question # 32

In order to use Unity Catalog features, which of the following steps needs to be taken on managed/external tables in the Databricks workspace?

A.

Enable unity catalog feature in workspace settings

B.

Migrate/upgrade objects in the workspace (managed/external tables/views) to Unity Catalog

C.

Upgrade to DBR version 15.0

D.

Copy data from workspace to unity catalog

E.

Upgrade workspace to Unity catalog

Full Access
Question # 33

You are currently working on a production job failure, for a job set up on a job cluster, caused by a data issue. What cluster do you need to start to investigate and analyze the data?

A.

A Job cluster can be used to analyze the problem

B.

An all-purpose/interactive cluster is the recommended way to run commands and view the data.

C.

Existing job cluster can be used to investigate the issue

D.

Databricks SQL Endpoint can be used to investigate the issue

Full Access