Pass the Amazon Web Services AWS Certified Specialty MLS-C01 Questions and answers with ValidTests


Viewing page 3 out of 10 pages
Viewing questions 21-30 out of questions
Questions # 21:

A Machine Learning Specialist is attempting to build a linear regression model.

Given the displayed residual plot only, what is the MOST likely problem with the model?

Options:

A.

Linear regression is inappropriate. The residuals do not have constant variance.

B.

Linear regression is inappropriate. The underlying data has outliers.

C.

Linear regression is appropriate. The residuals have a zero mean.

D.

Linear regression is appropriate. The residuals have constant variance.
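
The constant-variance assumption named in the options can be checked numerically as well as by eye. The following is a minimal sketch, not taken from the exam material: the data is synthetic, and `residual_spread_by_half` is a hypothetical helper that compares the spread of residuals for low and high fitted values.

```python
import random
import statistics


def residual_spread_by_half(fitted, residuals):
    """Return (std of residuals at low fitted values, std at high fitted values)."""
    # Sort residuals by their fitted value, then split at the midpoint.
    pairs = sorted(zip(fitted, residuals))
    mid = len(pairs) // 2
    low = [r for _, r in pairs[:mid]]
    high = [r for _, r in pairs[mid:]]
    return statistics.pstdev(low), statistics.pstdev(high)


random.seed(0)
# Synthetic example (assumption): noise grows with the fitted value,
# which produces a funnel-shaped residual plot.
fitted = [i / 10 for i in range(1, 201)]
residuals = [random.gauss(0, 0.1 * f) for f in fitted]

low_sd, high_sd = residual_spread_by_half(fitted, residuals)
print(f"spread at low fitted values:  {low_sd:.3f}")
print(f"spread at high fitted values: {high_sd:.3f}")
```

If the two spreads differ substantially, the residuals probably do not have constant variance. Roughly equal spreads are consistent with the assumption but do not prove it.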

Questions # 22:

A data scientist needs to identify fraudulent user accounts for a company's ecommerce platform. The company wants the ability to determine if a newly created account is associated with a previously known fraudulent user. The data scientist is using AWS Glue to cleanse the company's application logs during ingestion.

Which strategy will allow the data scientist to identify fraudulent accounts?

Options:

A.

Execute the built-in FindDuplicates Amazon Athena query.

B.

Create a FindMatches machine learning transform in AWS Glue.

C.

Create an AWS Glue crawler to infer duplicate accounts in the source data.

D.

Search for duplicate accounts in the AWS Glue Data Catalog.

Questions # 23:

A data science team is working with a tabular dataset that the team stores in Amazon S3. The team wants to experiment with different feature transformations such as categorical feature encoding. Then the team wants to visualize the resulting distribution of the dataset. After the team finds an appropriate set of feature transformations, the team wants to automate the workflow for feature transformations.

Which solution will meet these requirements with the MOST operational efficiency?

Options:

A.

Use Amazon SageMaker Data Wrangler preconfigured transformations to explore feature transformations. Use SageMaker Data Wrangler templates for visualization. Export the feature processing workflow to a SageMaker pipeline for automation.

B.

Use an Amazon SageMaker notebook instance to experiment with different feature transformations. Save the transformations to Amazon S3. Use Amazon QuickSight for visualization. Package the feature processing steps into an AWS Lambda function for automation.

C.

Use AWS Glue Studio with custom code to experiment with different feature transformations. Save the transformations to Amazon S3. Use Amazon QuickSight for visualization. Package the feature processing steps into an AWS Lambda function for automation.

D.

Use Amazon SageMaker Data Wrangler preconfigured transformations to experiment with different feature transformations. Save the transformations to Amazon S3. Use Amazon QuickSight for visualization. Package each feature transformation step into a separate AWS Lambda function. Use AWS Step Functions for workflow automation.

Questions # 24:

A finance company has collected stock return data for 5,000 publicly traded companies. A financial analyst has a dataset that contains 2,000 attributes for each company. The financial analyst wants to use Amazon SageMaker to identify the top 15 attributes that are most valuable to predict future stock returns.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Use the linear learner algorithm in SageMaker to train a linear regression model to predict the stock returns. Identify the most predictive features by ranking absolute coefficient values.

B.

Use random forest regression in SageMaker to train a model to predict the stock returns. Identify the most predictive features based on Gini importance scores.

C.

Use an Amazon SageMaker Data Wrangler quick model visualization to predict the stock returns. Identify the most predictive features based on the quick model's feature importance scores.

D.

Use Amazon SageMaker Autopilot to build a regression model to predict the stock returns. Identify the most predictive features based on an Amazon SageMaker Clarify report.

Questions # 25:

A financial services company wants to adopt Amazon SageMaker as its default data science environment. The company's data scientists run machine learning (ML) models on confidential financial data. The company is worried about data egress and wants an ML engineer to secure the environment.

Which mechanisms can the ML engineer use to control data egress from SageMaker? (Choose three.)

Options:

A.

Connect to SageMaker by using a VPC interface endpoint powered by AWS PrivateLink.

B.

Use SCPs to restrict access to SageMaker.

C.

Disable root access on the SageMaker notebook instances.

D.

Enable network isolation for training jobs and models.

E.

Restrict notebook presigned URLs to specific IPs used by the company.

F.

Protect data with encryption at rest and in transit. Use AWS Key Management Service (AWS KMS) to manage encryption keys.

Questions # 26:

A Machine Learning Specialist kicks off a hyperparameter tuning job for a tree-based ensemble model using Amazon SageMaker with Area Under the ROC Curve (AUC) as the objective metric. This workflow will eventually be deployed in a pipeline that retrains and tunes hyperparameters each night to model click-through on data that goes stale every 24 hours.

With the goal of decreasing the amount of time it takes to train these models, and ultimately to decrease costs, the Specialist wants to reconfigure the input hyperparameter range(s).

Which visualization will accomplish this?

Options:

A.

A histogram showing whether the most important input feature is Gaussian.

B.

A scatter plot with points colored by target variable that uses t-Distributed Stochastic Neighbor Embedding (t-SNE) to visualize the large number of input variables in an easier-to-read dimension.

C.

A scatter plot showing the performance of the objective metric over each training iteration.

D.

A scatter plot showing the correlation between maximum tree depth and the objective metric.
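
The relationship between a hyperparameter and the objective metric can also be summarized numerically. The sketch below is an illustration under stated assumptions: the `trials` list is invented (it is not output from any real tuning job), and the 0.005 AUC tolerance is an arbitrary choice. It computes the Pearson correlation between maximum tree depth and AUC, then finds the shallowest depth that is still close to the best score.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical tuning results: (max_depth, validation AUC).
trials = [(2, 0.71), (4, 0.78), (6, 0.82), (8, 0.83), (10, 0.83), (12, 0.83)]
depths = [d for d, _ in trials]
aucs = [a for _, a in trials]

r = pearson(depths, aucs)

# Shallowest depth whose AUC is within the tolerance of the best AUC.
tolerance = 0.005
best_auc = max(aucs)
shallowest = min(d for d, a in trials if best_auc - a <= tolerance)

print(f"correlation between depth and AUC: {r:.2f}")
print(f"shallowest depth near the best AUC: {shallowest}")
```

In this invented data the metric plateaus, so depths beyond the plateau add training time without improving AUC. That kind of observation is what would justify narrowing a hyperparameter range.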

Questions # 27:

A music streaming company is building a pipeline to extract features. The company wants to store the features for offline model training and online inference. The company wants to track feature history and to give the company's data science teams access to the features.

Which solution will meet these requirements with the MOST operational efficiency?

Options:

A.

Use Amazon SageMaker Feature Store to store features for model training and inference. Create an online store for online inference. Create an offline store for model training. Create an IAM role for data scientists to access and search through feature groups.

B.

Use Amazon SageMaker Feature Store to store features for model training and inference. Create an online store for both online inference and model training. Create an IAM role for data scientists to access and search through feature groups.

C.

Create one Amazon S3 bucket to store online inference features. Create a second S3 bucket to store offline model training features. Turn on versioning for the S3 buckets and use tags to specify which tags are for online inference features and which are for offline model training features. Use Amazon Athena to query the S3 bucket for online inference. Connect the S3 bucket for offline model training to a SageMaker training job. Create

D.

Create two separate Amazon DynamoDB tables to store online inference features and offline model training features. Use time-based versioning on both tables. Query the DynamoDB table for online inference. Move the data from DynamoDB to Amazon S3 when a new SageMaker training job is launched. Create an IAM policy that allows data scientists to access both tables.

Questions # 28:

A company's machine learning (ML) specialist is building a computer vision model to classify 10 different traffic signs. The company has stored 100 images of each class in Amazon S3, and the company has another 10,000 unlabeled images. All the images come from dash cameras and are a size of 224 pixels × 224 pixels. After several training runs, the model is overfitting on the training data.

Which actions should the ML specialist take to address this problem? (Select TWO.)

Options:

A.

Use Amazon SageMaker Ground Truth to label the unlabeled images.

B.

Use image preprocessing to transform the images into grayscale images.

C.

Use data augmentation to rotate and translate the labeled images.

D.

Replace the activation of the last layer with a sigmoid.

E.

Use the Amazon SageMaker k-nearest neighbors (k-NN) algorithm to label the unlabeled images.
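
For readers unfamiliar with the rotation and translation mentioned in the options, here is a minimal sketch on a tiny image stored as a list of rows. The helpers `rotate90` and `translate` are hypothetical names written for this illustration. A 90-degree rotation is used only because it is exact on a pixel grid; practical augmentation typically uses small angles, which require interpolation from an image library.

```python
def rotate90(img):
    """Rotate a 2D image (list of rows) 90 degrees clockwise."""
    # Reverse the row order, then transpose.
    return [list(row) for row in zip(*img[::-1])]


def translate(img, dx, dy, fill=0):
    """Shift an image dx pixels right and dy pixels down, padding with `fill`."""
    height, width = len(img), len(img[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_y, new_x = y + dy, x + dx
            # Pixels shifted outside the frame are dropped.
            if 0 <= new_y < height and 0 <= new_x < width:
                out[new_y][new_x] = img[y][x]
    return out


image = [[1, 2],
         [3, 4]]

# Each transformed copy keeps the label of the original image,
# so one labeled example yields several training examples.
augmented = [image, rotate90(image), translate(image, dx=1, dy=0)]
for variant in augmented:
    print(variant)
```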

Questions # 29:

A web-based company wants to improve its conversion rate on its landing page. Using a large historical dataset of customer visits, the company has repeatedly trained a multi-class deep learning network algorithm on Amazon SageMaker. However, there is an overfitting problem: training data shows 90% accuracy in predictions, while test data shows only 70% accuracy.

The company needs to boost the generalization of its model before deploying it into production to maximize conversions of visits to purchases.

Which action is recommended to provide the HIGHEST accuracy model for the company's test and validation data?

Options:

A.

Increase the randomization of training data in the mini-batches used in training.

B.

Allocate a higher proportion of the overall data to the training dataset.

C.

Apply L1 or L2 regularization and dropouts to the training.

D.

Reduce the number of layers and units (or neurons) from the deep learning network.
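
The regularization terms and dropout named in the options can be written out in a few lines. This is a minimal sketch in plain Python, not the API of any deep learning framework. The function names and the `lam` and `rate` parameters are chosen for illustration, and the dropout shown is the common "inverted" variant that rescales surviving activations.

```python
import random


def l1_penalty(weights, lam):
    """L1 term added to the loss: lam * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 term added to the loss: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)


def dropout(activations, rate, rng):
    """Inverted dropout: zero each activation with probability `rate`,
    and scale the survivors by 1 / (1 - rate) so the expected sum is unchanged."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


weights = [1.0, -2.0]
print("L1 penalty:", l1_penalty(weights, lam=0.1))
print("L2 penalty:", l2_penalty(weights, lam=0.1))

rng = random.Random(0)
print("after dropout:", dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=rng))
```

Both penalties discourage large weights, and dropout is applied during training only. At inference time the activations pass through unchanged.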

Questions # 30:

A data scientist is developing a pipeline to ingest streaming web traffic data. The data scientist needs to implement a process to identify unusual web traffic patterns as part of the pipeline. The patterns will be used downstream for alerting and incident response. The data scientist has access to unlabeled historic data to use, if needed.

The solution needs to do the following:

    Calculate an anomaly score for each web traffic entry.

    Adapt unusual event identification to changing web patterns over time.

Which approach should the data scientist implement to meet these requirements?

Options:

A.

Use historic web traffic data to train an anomaly detection model using the Amazon SageMaker Random Cut Forest (RCF) built-in model. Use an Amazon Kinesis Data Stream to process the incoming web traffic data. Attach a preprocessing AWS Lambda function to perform data enrichment by calling the RCF model to calculate the anomaly score for each record.

B.

Use historic web traffic data to train an anomaly detection model using the Amazon SageMaker built-in XGBoost model. Use an Amazon Kinesis Data Stream to process the incoming web traffic data. Attach a preprocessing AWS Lambda function to perform data enrichment by calling the XGBoost model to calculate the anomaly score for each record.

C.

Collect the streaming data using Amazon Kinesis Data Firehose. Map the delivery stream as an input source for Amazon Kinesis Data Analytics. Write a SQL query to run in real time against the streaming data with the k-Nearest Neighbors (kNN) SQL extension to calculate anomaly scores for each record using a tumbling window.

D.

Collect the streaming data using Amazon Kinesis Data Firehose. Map the delivery stream as an input source for Amazon Kinesis Data Analytics. Write a SQL query to run in real time against the streaming data with the Amazon Random Cut Forest (RCF) SQL extension to calculate anomaly scores for each record using a sliding window.
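
The two requirements above (a score for every record, and adaptation as traffic patterns change) can be illustrated with a much simpler stand-in than Random Cut Forest. The sketch below is not RCF and is not Kinesis SQL: it scores each value by its z-score against a sliding window of recent values, so the baseline moves with the data. The `traffic` values and the window size are invented for illustration.

```python
from collections import deque
import statistics


def sliding_zscores(values, window):
    """Score each value by its distance from the mean of the previous
    `window` values, measured in standard deviations."""
    recent = deque(maxlen=window)  # old values fall out automatically
    scores = []
    for value in values:
        if len(recent) >= 2:
            mean = statistics.fmean(recent)
            spread = statistics.pstdev(recent)
            scores.append(abs(value - mean) / spread if spread > 0 else 0.0)
        else:
            # Not enough history yet to judge this value.
            scores.append(0.0)
        recent.append(value)
    return scores


# Hypothetical requests-per-second readings with one spike.
traffic = [100, 102, 98, 101, 99, 500, 100, 103]
scores = sliding_zscores(traffic, window=5)
for value, score in zip(traffic, scores):
    print(f"{value:>4}  score={score:.2f}")
```

Because only the most recent values are kept, a sustained shift in traffic eventually becomes the new baseline and stops being flagged.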
