3 Months Free Update
Querying external files directly offers limited options. To create an external table over pipe-delimited CSV files that include a header row, fill in the blanks to complete the CREATE TABLE statement:
CREATE TABLE sales (id int, unitsSold int, price FLOAT, items STRING)
________
________
LOCATION "dbfs:/mnt/sales/*.csv"
Which of the following sections in the UI can be used to manage permissions and grants on tables?
What is the purpose of the bronze layer in a Multi-hop Medallion architecture?
You were asked to create a notebook that takes a department as a parameter and processes the data accordingly. Which of the following statements stores the notebook parameter in a Python variable?
You are trying to create an object by joining two tables so that it is accessible to the data science team and does not get dropped if the cluster restarts or the notebook is detached. What type of object are you trying to create?
The data engineering team noticed that one of its jobs fails randomly as a result of using spot instances. What feature in Jobs/Tasks can be used to address this issue so that the job is more stable when using spot instances?
What is the main difference between the silver layer and the gold layer in the medallion architecture?
Which two of the following options are supported for identifying the arrival of new files and incremental data from cloud object storage using Auto Loader?
You were asked to set up a new all-purpose cluster, but the cluster is unable to start. Which of the following steps do you need to take to identify the root cause and the reason why the cluster was unable to start?
The data engineering team has provided 10 queries and asked the data analyst team to build a dashboard and refresh the data every day at 8 AM. Identify the best approach to set up the data refresh for this dashboard.
John Smith is a new member of the Marketing team who currently has read access to the sales table but does not have permission to delete rows from it. Which of the following commands lets you grant him that permission?
You are currently working on a notebook that will populate a reporting table for consumption by a downstream process. This process needs to run on a schedule every hour. What type of cluster are you going to use to set up this job?
Which of the following SQL statements can be used to query a table while eliminating duplicate rows from the query results?
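For readers who want to see duplicate elimination in action, the sketch below is one possible illustration. It uses Python's built-in sqlite3 module as a stand-in engine (an assumption; the question itself concerns Databricks SQL), and the helper name distinct_rows and the two-column sales table are hypothetical.

```python
import sqlite3


def distinct_rows(rows):
    """Load (id, items) rows into an in-memory table and return them de-duplicated."""
    # SQLite is only a stand-in here; the table layout is a hypothetical example.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, items TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    # DISTINCT collapses rows whose selected columns are all identical.
    result = conn.execute(
        "SELECT DISTINCT id, items FROM sales ORDER BY id"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(distinct_rows([(1, "a"), (1, "a"), (2, "b")]))
```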
You are currently reloading the customer_sales table using the query below:
INSERT OVERWRITE customer_sales
SELECT * FROM customers c
INNER JOIN sales_monthly s ON s.customer_id = c.customer_id
After you ran the above command, the Marketing team quickly wanted to review the old data that was in the table. How does INSERT OVERWRITE impact the data in the customer_sales table if you want to see the previous version of the data prior to running the above statement?
You are tasked with setting up a notebook as a job for six departments, and each department's task can run in parallel. The notebook takes an input parameter, the department number, to process the data by department. How do you set this up in a job?
What are the different ways you can schedule a job in a Databricks workspace?
You are currently working with a second team, and both teams are looking to modify the same notebook. You noticed that a member of the second team is copying the notebook to a personal folder to edit it and then replacing the shared notebook. Which notebook feature do you recommend to make collaboration easier?
The data engineering team is required to share data with the data science team, and the two teams use different workspaces in the same organization. Which of the following techniques can be used to simplify sharing data across workspaces?
*Please note the question is asking how data is shared within an organization across multiple workspaces.
You are currently working on a project that requires the use of both SQL and Python in a given notebook. What would be your approach?
The data engineering team has a job set up to run a task that loads data into a reporting table every day at 8:00 AM and takes about 20 minutes. The operations team plans to use that data in a second job, so it needs access to the latest complete set of data. What is the best way to orchestrate this job setup?
The team has decided to use table properties to identify a business owner for each table. Which of the following table DDL syntaxes allows you to populate a table property identifying the business owner of a table?
CREATE TABLE inventory (id INT, units FLOAT)
Which of the following Auto Loader Structured Streaming commands successfully performs a hop from the landing area into Bronze?
Which of the following techniques does Structured Streaming use to provide end-to-end fault tolerance?
You are designing an analytical system to store structured data from your e-commerce platform and unstructured data from website traffic and the app store. How would you approach where you store this data?
A notebook accepts an optional input parameter that is assigned to a Python variable called department, and you want to control the flow of the code using this parameter. If a department value is present, the code should execute; if no department value is passed, the code execution should be skipped. How do you achieve this using Python?
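The control flow described in this question can be sketched in plain Python. This is a minimal illustration, assuming the optional parameter arrives as a string that is empty or None when nothing is passed; the function name process_department and its return strings are hypothetical and not part of the question.

```python
def process_department(department=None):
    """Run the department-specific step only when a value was supplied."""
    # Assumption: a missing parameter shows up as None or an empty string,
    # both of which are falsy in Python.
    if department:
        return f"processing {department}"
    # No department was passed, so the processing step is skipped.
    return "skipped"


if __name__ == "__main__":
    print(process_department("sales"))
    print(process_department(""))
```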
In order to use Unity Catalog features, which of the following steps needs to be taken on managed/external tables in the Databricks workspace?
You are currently investigating a production job failure caused by a data issue; the job is set up to run on job clusters. What cluster do you need to start to investigate and analyze the data?