Pre-Summer Special Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: validbest

Pass the Amazon Web Services AWS Certified Data Engineer Data-Engineer-Associate Questions and answers with ValidTests

Exam Data-Engineer-Associate All Questions
Exam Data-Engineer-Associate Premium Access

View all detail and faqs for the Data-Engineer-Associate exam

Viewing page 6 out of 9 pages
Viewing questions 51-60 out of questions
Questions # 51:

During a security review, a company identified a vulnerability in an AWS Glue job. The company discovered that credentials to access an Amazon Redshift cluster were hard coded in the job script.

A data engineer must remediate the security vulnerability in the AWS Glue job. The solution must securely store the credentials.

Which combination of steps should the data engineer take to meet these requirements? (Choose two.)

Options:

A.

Store the credentials in the AWS Glue job parameters.

B.

Store the credentials in a configuration file that is in an Amazon S3 bucket.

C.

Access the credentials from a configuration file that is in an Amazon S3 bucket by using the AWS Glue job.

D.

Store the credentials in AWS Secrets Manager.

E.

Grant the AWS Glue job 1AM role access to the stored credentials.

Expert Solution
Questions # 52:

A retail company is expanding its operations globally. The company needs to use Amazon QuickSight to accurately calculate currency exchange rates for financial reports.The company has an existing dashboard that includes a visual that is based on an analysis of a dataset that contains global currency values and exchange rates.

A data engineer needs to ensure that exchange rates are calculated with a precision of four decimal places. The calculations must be precomputed. The data engineer must materialize results in QuickSight super-fast, parallel, in-memory calculation engine (SPICE).

Which solution will meet these requirements?

Options:

A.

Define and create the calculated field in the dataset.

B.

Define and create the calculated field in the analysis.

C.

Define and create the calculated field in the visual.

D.

Define and create the calculated field in the dashboard.

Expert Solution
Questions # 53:

An airline company is collecting metrics about flight activities for analytics. The company is conducting a proof of concept (POC) test to show how analytics can provide insights that the company can use to increase on-time departures.

The POC test uses objects in Amazon S3 that contain the metrics in .csv format. The POC test uses Amazon Athena to query the data. The data is partitioned in the S3 bucket by date.

As the amount of data increases, the company wants to optimize the storage solution to improve query performance.

Which combination of solutions will meet these requirements? (Choose two.)

Options:

A.

Add a randomized string to the beginning of the keys in Amazon S3 to get more throughput across partitions.

B.

Use an S3 bucket that is in the same account that uses Athena to query the data.

C.

Use an S3 bucket that is in the same AWS Region where the company runs Athena queries.

D.

Preprocess the .csv data to JSON format by fetching only the document keys that the query requires.

E.

Preprocess the .csv data to Apache Parquet format by fetching only the data blocks that are needed for predicates.

Expert Solution
Questions # 54:

A company uses Amazon S3 to store data and Amazon QuickSight to create visualizations.

The company has an S3 bucket in an AWS account named Hub-Account. The S3 bucket is encrypted by an AWS Key Management Service (AWS KMS) key. The company's QuickSight instance is in a separate account named BI-Account

The company updates the S3 bucket policy to grant access to the QuickSight service role. The company wants to enable cross-account access to allow QuickSight to interact with the S3 bucket.

Which combination of steps will meet this requirement? (Select TWO.)

Options:

A.

Use the existing AWS KMS key to encrypt connections from QuickSight to the S3 bucket.

B.

Add the 53 bucket as a resource that the QuickSight service role can access.

C.

Use AWS Resource Access Manager (AWS RAM) to share the S3 bucket with the Bl-Account account.

D.

Add an IAM policy to the QuickSight service role to give QuickSight access to the KMS key that encrypts the S3 bucket.

E.

Add the KMS key as a resource that the QuickSight service role can access.

Expert Solution
Questions # 55:

A company stores data in a data lake that is in Amazon S3. Some data that the company stores in the data lake contains personally identifiable information (PII). Multiple user groups need to access the raw data. The company must ensure that user groups can access only the PII that they require.

Which solution will meet these requirements with the LEAST effort?

Options:

A.

Use Amazon Athena to query the data. Set up AWS Lake Formation and create data filters to establish levels of access for the company's IAM roles. Assign each user to the IAM role that matches the user's PII access requirements.

B.

Use Amazon QuickSight to access the data. Use column-level security features in QuickSight to limit the PII that users can retrieve from Amazon S3 by using Amazon Athena. Define QuickSight access levels based on the PII access requirements of the users.

C.

Build a custom query builder UI that will run Athena queries in the background to access the data. Create user groups in Amazon Cognito. Assign access levels to the user groups based on the PII access requirements of the users.

D.

Create IAM roles that have different levels of granular access. Assign the IAM roles to IAM user groups. Use an identity-based policy to assign access levels to user groups at the column level.

Expert Solution
Questions # 56:

A technology company currently uses Amazon Kinesis Data Streams to collect log data in real time. The company wants to use Amazon Redshift for downstream real-time queries and to enrich the log data.

Which solution will ingest data into Amazon Redshift with the LEAST operational overhead?

Options:

A.

Set up an Amazon Data Firehose delivery stream to send data to a Redshift provisioned cluster table.

B.

Set up an Amazon Data Firehose delivery stream to send data to Amazon S3. Configure a Redshift provisioned cluster to load data every minute.

C.

Configure Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) to send data directly to a Redshift provisioned cluster table.

D.

Use Amazon Redshift streaming ingestion from Kinesis Data Streams and to present data as a materialized view.

Expert Solution
Questions # 57:

A company needs to set up a data catalog and metadata management for data sources that run in the AWS Cloud. The company will use the data catalog to maintain the metadata of all the objects that are in a set of data stores. The data stores include structured sources such as Amazon RDS and Amazon Redshift. The data stores also include semistructured sources such as JSON files and .xml files that are stored in Amazon S3.

The company needs a solution that will update the data catalog on a regular basis. The solution also must detect changes to the source metadata.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Use Amazon Aurora as the data catalog. Create AWS Lambda functions that will connect to the data catalog. Configure the Lambda functions to gather the metadata information from multiple sources and to update the Aurora data catalog. Schedule the Lambda functions to run periodically.

B.

Use the AWS Glue Data Catalog as the central metadata repository. Use AWS Glue crawlers to connect to multiple data stores and to update the Data Catalog with metadata changes. Schedule the crawlers to run periodically to update the metadata catalog.

C.

Use Amazon DynamoDB as the data catalog. Create AWS Lambda functions that will connect to the data catalog. Configure the Lambda functions to gather the metadata information from multiple sources and to update the DynamoDB data catalog. Schedule the Lambda functions to run periodically.

D.

Use the AWS Glue Data Catalog as the central metadata repository. Extract the schema for Amazon RDS and Amazon Redshift sources, and build the Data Catalog. Use AWS Glue crawlers for data that is in Amazon S3 to infer the schema and to automatically update the Data Catalog.

Expert Solution
Questions # 58:

A company stores customer data that contains personally identifiable information (PII) in an Amazon Redshift cluster. The company's marketing, claims, and analytics teams need to be able to access the customer data.

The marketing team should have access to obfuscated claim information but should have full access to customer contact information.

The claims team should have access to customer information for each claim that the team processes.

The analytics team should have access only to obfuscated PII data.

Which solution will enforce these data access requirements with the LEAST administrative overhead?

Options:

A.

Create a separate Redshift cluster for each team. Load only the required data for each team. Restrict access to clusters based on the teams.

B.

Create views that include required fields for each of the data requirements. Grant the teams access only to the view that each team requires.

C.

Create a separate Amazon Redshift database role for each team. Define masking policies that apply for each team separately. Attach appropriate masking policies to each team role.

D.

Move the customer data to an Amazon S3 bucket. Use AWS Lake Formation to create a data lake. Use fine-grained security capabilities to grant each team appropriate permissions to access the data.

Expert Solution
Questions # 59:

A company receives .csv files that contain physical address data. The data is in columns that have the following names: Door_No, Street_Name, City, and Zip_Code. The company wants to create a single column to store these values in the following format:

Question # 59

Which solution will meet this requirement with the LEAST coding effort?

Options:

A.

Use AWS Glue DataBrew to read the files. Use the NEST TO ARRAY transformation to create the new column.

B.

Use AWS Glue DataBrew to read the files. Use the NEST TO MAP transformation to create the new column.

C.

Use AWS Glue DataBrew to read the files. Use the PIVOT transformation to create the new column.

D.

Write a Lambda function in Python to read the files. Use the Python data dictionary type to create the new column.

Expert Solution
Questions # 60:

A company has a data lake in Amazon S3. The company collects AWS CloudTrail logs for multiple applications. The company stores the logs in the data lake, catalogs the logs in AWS Glue, and partitions the logs based on the year. The company uses Amazon Athena to analyze the logs.

Recently, customers reported that a query on one of the Athena tables did not return any data. A data engineer must resolve the issue.

Which combination of troubleshooting steps should the data engineer take? (Select TWO.)

Options:

A.

Confirm that Athena is pointing to the correct Amazon S3 location.

B.

Increase the query timeout duration.

C.

Use the MSCK REPAIR TABLE command.

D.

Restart Athena.

E.

Delete and recreate the problematic Athena table.

Expert Solution
Viewing page 6 out of 9 pages
Viewing questions 51-60 out of questions