Pre-Summer Special Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: validbest

Pass the Amazon Web Services AWS Certified Data Engineer Data-Engineer-Associate Questions and answers with ValidTests

Exam Data-Engineer-Associate All Questions
Exam Data-Engineer-Associate Premium Access

View all detail and faqs for the Data-Engineer-Associate exam

Viewing page 4 out of 9 pages
Viewing questions 31-40 out of questions
Questions # 31:

A company stores its processed data in an S3 bucket. The company has a strict data access policy. The company uses IAM roles to grant teams within the company different levels of access to the S3 bucket.

The company wants to receive notifications when a user violates the data access policy. Each notification must include the username of the user who violated the policy.

Which solution will meet these requirements?

Options:

A.

Use AWS Config rules to detect violations of the data access policy. Set up compliance alarms.

B.

Use Amazon CloudWatch metrics to gather object-level metrics. Set up CloudWatch alarms.

C.

Use AWS CloudTrail to track object-level events for the S3 bucket. Forward events to Amazon CloudWatch to set up CloudWatch alarms.

D.

Use Amazon S3 server access logs to monitor access to the bucket. Forward the access logs to an Amazon CloudWatch log group. Use metric filters on the log group to set up CloudWatch alarms.

Expert Solution
Questions # 32:

A data engineer is configuring an AWS Glue job to read data from an Amazon S3 bucket. The data engineer has set up the necessary AWS Glue connection details and an associated IAM role. However, when the data engineer attempts to run the AWS Glue job, the data engineer receives an error message that indicates that there are problems with the Amazon S3 VPC gateway endpoint.

The data engineer must resolve the error and connect the AWS Glue job to the S3 bucket.

Which solution will meet this requirement?

Options:

A.

Update the AWS Glue security group to allow inbound traffic from the Amazon S3 VPC gateway endpoint.

B.

Configure an S3 bucket policy to explicitly grant the AWS Glue job permissions to access the S3 bucket.

C.

Review the AWS Glue job code to ensure that the AWS Glue connection details include a fully qualified domain name.

D.

Verify that the VPC's route table includes inbound and outbound routes for the Amazon S3 VPC gateway endpoint.

Expert Solution
Questions # 33:

A company uses Amazon Redshift for its data warehouse. The company must automate refresh schedules for Amazon Redshift materialized views.

Which solution will meet this requirement with the LEAST effort?

Options:

A.

Use Apache Airflow to refresh the materialized views.

B.

Use an AWS Lambda user-defined function (UDF) within Amazon Redshift to refresh the materialized views.

C.

Use the query editor v2 in Amazon Redshift to refresh the materialized views.

D.

Use an AWS Glue workflow to refresh the materialized views.

Expert Solution
Questions # 34:

A marketing company uses Amazon S3 to store marketing data. The company uses versioning in some buckets. The company runs several jobs to read and load data into the buckets.

To help cost-optimize its storage, the company wants to gather information about incomplete multipart uploads and outdated versions that are present in the S3 buckets.

Which solution will meet these requirements with the LEAST operational effort?

Options:

A.

Use AWS CLI to gather the information.

B.

Use Amazon S3 Inventory configurations reports to gather the information.

C.

Use the Amazon S3 Storage Lens dashboard to gather the information.

D.

Use AWS usage reports for Amazon S3 to gather the information.

Expert Solution
Questions # 35:

A data engineer needs Amazon Athena queries to finish faster. The data engineer notices that all the files the Athena queries use are currently stored in uncompressed .csv format. The data engineer also notices that users perform most queries by selecting a specific column.

Which solution will MOST speed up the Athena query performance?

Options:

A.

Change the data format from .csvto JSON format. Apply Snappy compression.

B.

Compress the .csv files by using Snappy compression.

C.

Change the data format from .csvto Apache Parquet. Apply Snappy compression.

D.

Compress the .csv files by using gzjg compression.

Expert Solution
Questions # 36:

A data engineer uses Amazon Kinesis Data Streams to ingest and process records that contain user behavior data from an application every day.

The data engineer notices that the data stream is experiencing throttling because hot shards receive much more data than other shards in the data stream.

How should the data engineer resolve the throttling issue?

Options:

A.

Use a random partition key to distribute the ingested records.

B.

Increase the number of shards in the data stream. Distribute the records across the shards.

C.

Limit the number of records that are sent each second by the producer to match the capacity of the stream.

D.

Decrease the size of the records that the producer sends to match the capacity of the stream.

Expert Solution
Questions # 37:

A data engineer has a one-time task to read data from objects that are in Apache Parquet format in an Amazon S3 bucket. The data engineer needs to query only one column of the data.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Confiqure an AWS Lambda function to load data from the S3 bucket into a pandas dataframe- Write a SQL SELECT statement on the dataframe to query the required column.

B.

Use S3 Select to write a SQL SELECT statement to retrieve the required column from the S3 objects.

C.

Prepare an AWS Glue DataBrew project to consume the S3 objects and to query the required column.

D.

Run an AWS Glue crawler on the S3 objects. Use a SQL SELECT statement in Amazon Athena to query the required column.

Expert Solution
Questions # 38:

A company uses Amazon DataZone as a data governance and business catalog solution. The company stores data in an Amazon S3 data lake. The company uses AWS Glue with an AWS Glue Data Catalog.

A data engineer needs to publish AWS Glue Data Quality scores to the Amazon DataZone portal.

Which solution will meet this requirement?

Options:

A.

Create a data quality ruleset with Data Quality Definition Language (DQDL) rules that apply to a specific AWS Glue table. Schedule the ruleset to run daily. Configure the Amazon DataZone project to have an Amazon Redshift data source. Enable the data quality configuration for the data source.

B.

Configure AWS Glue ETL jobs to use an Evaluate Data Quality transform. Define a data quality ruleset inside the jobs. Configure the Amazon DataZone project to have an AWS Glue data source. Enable the data quality configuration for the data source.

C.

Create a data quality ruleset with Data Quality Definition Language (DQDL) rules that apply to a specific AWS Glue table. Schedule the ruleset to run daily. Configure the Amazon DataZone project to have an AWS Glue data source. Enable the data quality configuration for the data source.

D.

Configure AWS Glue ETL jobs to use an Evaluate Data Quality transform. Define a data quality ruleset inside the jobs. Configure the Amazon DataZone project to have an Amazon Redshift data source. Enable the data quality configuration for the data source.

Expert Solution
Questions # 39:

A retail company is expanding its operations globally. The company needs to use Amazon QuickSight to accurately calculate currency exchange rates for financial reports. The company has an existing dashboard that includes a visual that is based on an analysis of a dataset that contains global currency values and exchange rates.

A data engineer needs to ensure that exchange rates are calculated with a precision of four decimal places. The calculations must be precomputed. The data engineer must materialize results in QuickSight super-fast, parallel, in-memory calculation engine (SPICE).

Which solution will meet these requirements?

Options:

A.

Define and create the calculated field in the dataset.

B.

Define and create the calculated field in the analysis.

C.

Define and create the calculated field in the visual.

D.

Define and create the calculated field in the dashboard.

Expert Solution
Questions # 40:

A gaming company uses Amazon Kinesis Data Streams to collect clickstream data. The company uses Amazon Kinesis Data Firehose delivery streams to store the data in JSON format in Amazon S3. Data scientists at the company use Amazon Athena to query the most recent data to obtain business insights.

The company wants to reduce Athena costs but does not want to recreate the data pipeline.

Which solution will meet these requirements with the LEAST management effort?

Options:

A.

Change the Firehose output format to Apache Parquet. Provide a custom S3 object YYYYMMDD prefix expression and specify a large buffer size. For the existing data, create an AWS Glue extract, transform, and load (ETL) job. Configure the ETL job to combine small JSON files, convert the JSON files to large Parquet files, and add the YYYYMMDD prefix. Use the ALTER TABLE ADD PARTITION statement to reflect the partition on the existing Athena

B.

Create an Apache Spark job that combines JSON files and converts the JSON files to Apache Parquet files. Launch an Amazon EMR ephemeral cluster every day to run the Spark job to create new Parquet files in a different S3 location. Use the ALTER TABLE SET LOCATION statement to reflect the new S3 location on the existing Athena table.

C.

Create a Kinesis data stream as a delivery destination for Firehose. Use Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) to run Apache Flink on the Kinesis data stream. Use Flink to aggregate the data and save the data to Amazon S3 in Apache Parquet format with a custom S3 object YYYYMMDD prefix. Use the ALTER TABLE ADD PARTITION statement to reflect the partition on the existing Athena table.<

D.

Integrate an AWS Lambda function with Firehose to convert source records to Apache Parquet and write them to Amazon S3. In parallel, run an AWS Glue extract, transform, and load (ETL) job to combine the JSON files and convert the JSON files to large Parquet files. Create a custom S3 object YYYYMMDD prefix. Use the ALTER TABLE ADD PARTITION statement to reflect the partition on the existing Athena table.

Expert Solution
Viewing page 4 out of 9 pages
Viewing questions 31-40 out of questions