Question 1

The data engineering team has a secret scope named ‘DataOps-Prod’ that contains all secrets needed by DataOps engineers in a production workspace.

Which of the following is the minimum permission required for the DataOps engineers to use the secrets in this scope?

Options :

A : MANAGE permission on the “DataOps-Prod” scope

B : READ permission on the “DataOps-Prod” scope

C : MANAGE permission on each secret in the “DataOps-Prod” scope

D : READ permission on each secret in the “DataOps-Prod” scope

E : Workspace Administrator role

Answer: B
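
For reference, READ on a scope can be granted through the Secrets API ACL endpoint (`POST /api/2.0/secrets/acls/put`). The sketch below only builds the request body. The group name `dataops-engineers` and the helper `build_put_acl_body` are hypothetical, and actually sending the request (workspace host, token) is left out.

```python
import json

# Permission levels available for secret scope ACLs, least to most privileged.
SCOPE_PERMISSIONS = ("READ", "WRITE", "MANAGE")


def build_put_acl_body(scope: str, principal: str, permission: str = "READ") -> str:
    """Return the JSON body for POST /api/2.0/secrets/acls/put."""
    if permission not in SCOPE_PERMISSIONS:
        raise ValueError(f"unknown permission: {permission}")
    return json.dumps(
        {"scope": scope, "principal": principal, "permission": permission}
    )


# "dataops-engineers" is a hypothetical group name used for illustration.
body = build_put_acl_body("DataOps-Prod", "dataops-engineers")
print(body)
```

Secret ACLs are set at the scope level rather than per secret, which is why the per-secret options do not apply.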

Question 2

All records from an Apache Kafka producer are being ingested into a single Delta Lake table with the following schema:

key BINARY, value BINARY, topic STRING, partition LONG, offset LONG, timestamp LONG

There are 5 unique topics being ingested. Only the "registration" topic contains Personally Identifiable Information (PII). The company wishes to restrict access to PII. It also wishes to retain records containing PII in this table for only 14 days after initial ingestion, while retaining non-PII records indefinitely.

Which of the following solutions meets the requirements?

Options :

A :

All data should be deleted biweekly; Delta Lake's time travel functionality should be leveraged to maintain a history of non-PII information.

B :

Data should be partitioned by the registration field, allowing ACLs and delete statements to be set for the PII directory.

C :

Because the value field is stored as binary data, this information is not considered PII and no special precautions should be taken.

D :

Separate object storage containers should be specified based on the partition field, allowing isolation at the storage level.

E :

Data should be partitioned by the topic field, allowing ACLs and delete statements to leverage partition boundaries.

Answer: E
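
A minimal sketch of the retention side of option E, assuming the table is named `kafka_events` (a hypothetical name) and that the `timestamp` column holds epoch milliseconds and is an acceptable stand-in for ingestion time. It only builds the `DELETE` statement. Because the predicate filters on the partition column `topic`, the delete can be confined to the "registration" partition.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 14  # retention window for PII records, from the question


def build_pii_delete(table: str, now: datetime) -> str:
    """Build a DELETE for registration records older than the retention window."""
    cutoff = now - timedelta(days=RETENTION_DAYS)
    # Assumption: the timestamp column stores epoch milliseconds.
    cutoff_ms = int(cutoff.timestamp() * 1000)
    return (
        f"DELETE FROM {table} "
        f"WHERE topic = 'registration' AND timestamp < {cutoff_ms}"
    )


# "kafka_events" is a hypothetical table name used for illustration.
stmt = build_pii_delete("kafka_events", datetime(2024, 1, 15, tzinfo=timezone.utc))
print(stmt)
```

Records from the other four topics never match the predicate, so they are retained indefinitely.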

Question 3

A data engineer, User A, has promoted a pipeline to production by using the REST API to programmatically create several jobs. A DevOps engineer, User B, has configured an external orchestration tool to trigger job runs through the REST API. Both users authorized the REST API calls using their personal access tokens.

A workspace admin, User C, inherits responsibility for managing this pipeline. User C uses the Databricks Jobs UI to take "Owner" privileges of each job. Jobs continue to be triggered using the credentials and tooling configured by User B.

An application has been configured to collect and parse run information returned by the REST API. Which statement describes the value returned in the creator_user_name field?

Options :

A : Once User C takes "Owner" privileges, their email address will appear in this field; prior to this, User A’s email address will appear in this field.

B : User B’s email address will always appear in this field, as their credentials are always used to trigger the run.

C : User A’s email address will always appear in this field, as they still own the underlying notebooks.

D : Once User C takes "Owner" privileges, their email address will appear in this field; prior to this, User B’s email address will appear in this field.

E : User C will only ever appear in this field if they manually trigger the job, otherwise it will indicate User B.

Answer: C
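
For context, a minimal sketch of how such an application might pull `creator_user_name` out of a run object returned by the Jobs API (for example from `GET /api/2.1/jobs/runs/get`). The sample payload, its values, and the helper name `extract_creator` are made up for illustration.

```python
import json
from typing import Optional


def extract_creator(run_json: str) -> Optional[str]:
    """Return creator_user_name from a run payload, or None if it is absent."""
    run = json.loads(run_json)
    # Use .get() so a payload without the field yields None instead of raising.
    return run.get("creator_user_name")


# Hypothetical payload; real responses carry many more fields.
sample = json.dumps(
    {"job_id": 11, "run_id": 455, "creator_user_name": "someone@example.com"}
)
print(extract_creator(sample))
```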

Question 4

A data engineer uses the following SQL query:

GRANT USAGE ON DATABASE sales_db TO finance_team

Which of the following is the benefit of the USAGE privilege?

Options :

A : Gives read access on the database

B : Gives full permissions on the entire database

C : Gives the ability to view database objects and their metadata

D : No effect on its own, but it's required to perform any action on the database

E : USAGE privilege is not part of the Databricks governance model

Answer: D

Question 5

The data engineering team has a Silver table called ‘sales_cleaned’ where new sales data is appended in near real-time.

They want to create a new Gold-layer entity against the ‘sales_cleaned’ table to calculate the year-to-date (YTD) of the sales amount. The new entity will have the following schema:

country_code STRING, category STRING, ytd_total_sales FLOAT, updated TIMESTAMP

It’s enough for these metrics to be recalculated once daily. But since they will be queried very frequently by several business teams, the data engineering team wants to cut down the potential costs and latency associated with materializing the results.

Which of the following solutions meets these requirements?

Options :

A : Define the new entity as a view to avoid persisting the results each time the metrics are recalculated.

B : Define the new entity as a global temporary view, since it can be shared between notebooks or jobs that share computing resources.

C : Configure a nightly batch job to recalculate the metrics and store them as a table that is overwritten with each update.

D : Create multiple tables, one per business team, so the metrics can be queried quickly and efficiently.

E : All of the above solutions meet the requirements, since Databricks uses the Delta Caching feature.

Answer: C
