Practice Free MLA-C01 AWS Certified Machine Learning Engineer - Associate Exam Questions Answers With Explanation

We at Crack4sure are committed to giving students who are preparing for the Amazon Web Services MLA-C01 Exam the most current and reliable questions . To help people study, we've made some of our AWS Certified Machine Learning Engineer - Associate exam materials available for free to everyone. You can take the Free MLA-C01 Practice Test as many times as you want. The answers to the practice questions are given, and each answer is explained.

Get Full 241 Questions Search Other Amazon Web Services Exam

Question # 6

An ML engineer normalized training data by using min-max normalization in AWS Glue DataBrew. The ML engineer must normalize production inference data in the same way before passing the data to the model.

Which solution will meet this requirement?

Apply statistics from a well-known dataset to normalize the production samples.

Keep the min-max normalization statistics from the training set and use them to normalize the production samples.

Calculate new min-max statistics from a batch of production samples and use them to normalize all production samples.

Calculate new min-max statistics from each production sample and use them to normalize all production samples.

Question # 7

An ML engineer is using an Amazon SageMaker AI shadow test to evaluate a new model that is hosted on a SageMaker AI endpoint. The shadow test requires significant GPU resources for high performance. The production variant currently runs on a less powerful instance type.

The ML engineer needs to configure the shadow test to use a higher performance instance type for a shadow variant. The solution must not affect the instance type of the production variant.

Which solution will meet these requirements?

Modify the existing ProductionVariant configuration in the endpoint to include a ShadowProductionVariants list. Specify the larger instance type for the shadow variant.

Create a new endpoint configuration with two ProductionVariant definitions. Configure one definition for the existing production variant and one definition for the shadow variant with the larger instance type. Use the UpdateEndpoint action to apply the new configuration.

Create a separate SageMaker AI endpoint for the shadow variant that uses the larger instance type. Create an AWS Lambda function that routes a portion of the traffic to the shadow endpoint. Assign the Lambda function to the original endpoint.

Use the CreateEndpointConfig action to define a new configuration. Specify the existing production variant in the configuration and add a separate ShadowProductionVariants list. Specify the larger instance type for the shadow variant. Use the CreateEndpoint action and pass the new configuration to the endpoint.

Question # 8

A company stores historical data in .csv files in Amazon S3. Only some of the rows and columns in the .csv files are populated. The columns are not labeled. An ML

engineer needs to prepare and store the data so that the company can use the data to train ML models.

Select and order the correct steps from the following list to perform this task. Each step should be selected one time or not at all. (Select and order three.)

• Create an Amazon SageMaker batch transform job for data cleaning and feature engineering.

• Store the resulting data back in Amazon S3.

• Use Amazon Athena to infer the schemas and available columns.

• Use AWS Glue crawlers to infer the schemas and available columns.

• Use AWS Glue DataBrew for data cleaning and feature engineering.

MLA-C01 question answer

Answer:

Explanation:

Step 1: Use AWS Glue crawlers to infer the schemas and available columns.

Step 2: Use AWS Glue DataBrew for data cleaning and feature engineering.

Step 3: Store the resulting data back in Amazon S3.

Step 1: Use AWS Glue Crawlers to Infer Schemas and Available Columns

Why? The data is stored in .csv files with unlabeled columns, and Glue Crawlers can scan the raw data in Amazon S3 to automatically infer the schema, including available columns, data types, and any missing or incomplete entries.

How? Configure AWS Glue Crawlers to point to the S3 bucket containing the .csv files, and run the crawler to extract metadata. The crawler creates a schema in the AWS Glue Data Catalog, which can then be used for subsequent transformations.

Step 2: Use AWS Glue DataBrew for Data Cleaning and Feature Engineering

Why? Glue DataBrew is a visual data preparation tool that allows for comprehensive cleaning and transformation of data. It supports imputation of missing values, renaming columns, feature engineering, and more without requiring extensive coding.

How? Use Glue DataBrew to connect to the inferred schema from Step 1 and perform data cleaning and feature engineering tasks like filling in missing rows/columns, renaming unlabeled columns, and creating derived features.

Step 3: Store the Resulting Data Back in Amazon S3

Why? After cleaning and preparing the data, it needs to be saved back to Amazon S3 so that it can be used for training machine learning models.

How? Configure Glue DataBrew to export the cleaned data to a specific S3 bucket location. This ensures the processed data is readily accessible for ML workflows.

Order Summary:

Use AWS Glue crawlers to infer schemas and available columns.

Use AWS Glue DataBrew for data cleaning and feature engineering.

Store the resulting data back in Amazon S3.

This workflow ensures that the data is prepared efficiently for ML model training while leveraging AWS services for automation and scalability.

Question # 9

An ML engineer receives datasets that contain missing values, duplicates, and extreme outliers. The ML engineer must consolidate these datasets into a single data frame and must prepare the data for ML.

Which solution will meet these requirements?

Use Amazon SageMaker Data Wrangler to import the datasets and to consolidate them into a single data frame. Use the cleansing and enrichment functionalities to prepare the data.

Use Amazon SageMaker Ground Truth to import the datasets and to consolidate them into a single data frame. Use the human-in-the-loop capability to prepare the data.

Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon Q Developer to generate code snippets that will prepare the data.

Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon SageMaker data labeling to prepare the data.

Question # 10

An ML engineer is setting up an Amazon SageMaker AI pipeline for an ML model. The pipeline must automatically initiate a re-training job if any data drift is detected.

How should the ML engineer set up the pipeline to meet this requirement?

Use an AWS Glue crawler and an AWS Glue extract, transform, and load (ETL) job to detect data drift. Use AWS Glue triggers to automate the retraining job.

Use Amazon Managed Service for Apache Flink to detect data drift. Use an AWS Lambda function to automate the re-training job.

Use SageMaker Model Monitor to detect data drift. Use an AWS Lambda function to automate the re-training job.

Use Amazon Quick Suite (previously known as Amazon QuickSight) anomaly detection to detect data drift. Use an AWS Step Functions workflow to automate the re-training job.

Question # 11

A company has trained an ML model in Amazon SageMaker. The company needs to host the model to provide inferences in a production environment.

The model must be highly available and must respond with minimum latency. The size of each request will be between 1 KB and 3 MB. The model will receive unpredictable bursts of requests during the day. The inferences must adapt proportionally to the changes in demand.

How should the company deploy the model into production to meet these requirements?

Create a SageMaker real-time inference endpoint. Configure auto scaling. Configure the endpoint to present the existing model.

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster. Use ECS scheduled scaling that is based on the CPU of the ECS cluster.

Install SageMaker Operator on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Deploy the model in Amazon EKS. Set horizontal pod auto scaling to scale replicas based on the memory metric.

Use Spot Instances with a Spot Fleet behind an Application Load Balancer (ALB) for inferences. Use the ALBRequestCountPerTarget metric as the metric for auto scaling.

Question # 12

A company has AWS Glue data processing jobs that are orchestrated by an AWS Glue workflow. The AWS Glue jobs can run on a schedule or can be launched manually.

The company is developing pipelines in Amazon SageMaker Pipelines for ML model development. The pipelines will use the output of the AWS Glue jobs during the data processing phase of model development. An ML engineer needs to implement a solution that integrates the AWS Glue jobs with the pipelines.

Which solution will meet these requirements with the LEAST operational overhead?

Use AWS Step Functions for orchestration of the pipelines and the AWS Glue jobs.

Use processing steps in SageMaker Pipelines. Configure inputs that point to the Amazon Resource Names (ARNs) of the AWS Glue jobs.

Use Callback steps in SageMaker Pipelines to start the AWS Glue workflow and to stop the pipelines until the AWS Glue jobs finish running.

Use Amazon EventBridge to invoke the pipelines and the AWS Glue jobs in the desired order.

Question # 13

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model ' s algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

After the data is aggregated, the ML engineer must implement a solution to automatically detect anomalies in the data and to visualize the result.

Which solution will meet these requirements?

Use Amazon Athena to automatically detect the anomalies and to visualize the result.

Use Amazon Redshift Spectrum to automatically detect the anomalies. Use Amazon QuickSight to visualize the result.

Use Amazon SageMaker Data Wrangler to automatically detect the anomalies and to visualize the result.

Use AWS Batch to automatically detect the anomalies. Use Amazon QuickSight to visualize the result.

Answer:

Explanation:

Amazon SageMaker Data Wrangler is a comprehensive tool that streamlines the process of data preparation and offers built-in capabilities for anomaly detection and visualization.

Key Features of SageMaker Data Wrangler:

Data Importation: Connects seamlessly to various data sources, including Amazon S3 and on-premises databases, facilitating the aggregation of transaction logs, customer profiles, and MySQL tables.

Anomaly Detection: Provides built-in analyses to detect anomalies in time series data, enabling the identification of outliers that may indicate fraudulent activities.

Visualization: Offers a suite of visualization tools, such as histograms and scatter plots, to help understand data distributions and relationships, which are crucial for feature engineering and model development.

Implementation Steps:

Data Aggregation:

Import data from Amazon S3 and on-premises MySQL databases into SageMaker Data Wrangler.

Utilize Data Wrangler ' s data flow interface to combine and preprocess datasets, ensuring a unified dataset for analysis.

Anomaly Detection:

Apply the anomaly detection analysis feature to identify outliers in the dataset.

Configure parameters such as the anomaly threshold to fine-tune the detection sensitivity.

Visualization:

Use built-in visualization tools to create charts and graphs that depict data distributions and highlight anomalies.

Interpret these visualizations to gain insights into potential fraud patterns and feature interdependencies.

Advantages of Using SageMaker Data Wrangler:

Integrated Workflow: Combines data preparation, anomaly detection, and visualization within a single interface, streamlining the ML development process.

Operational Efficiency: Reduces the need for multiple tools and complex integrations, thereby minimizing operational overhead.

Scalability: Handles large datasets efficiently, making it suitable for extensive transaction logs and customer profiles.

By leveraging SageMaker Data Wrangler, the ML engineer can effectively detect anomalies and visualize results, facilitating the development of a robust fraud detection model.

Analyze and Visualize - Amazon SageMaker

Transform Data - Amazon SageMaker

Question # 14

An ML engineer is deploying a generative AI model-based customer support agent that uses Amazon SageMaker AI for inference. The customer support agent must respond to customer questions about topics such as shipping policies, refund processes, and account management. The generative AI model generates one token at a time.

Customers report dissatisfaction with how long the customer support agent takes to generate lengthy responses to questions. The ML engineer must apply an inference optimization technique to improve the performance of the customer support agent.

Which solution will meet this requirement?

Compilation

Speculative decoding

Quantization

Fast model loading

Question # 15

An ML engineer is using Amazon Quick Suite (previously known as Amazon QuickSight) anomaly detection to detect very high or very low machine operating temperatures compared to normal. The ML engineer sets the Severity parameter to Low and above. The ML engineer sets the Direction parameter to All.

What effect will the ML engineer observe in the anomaly detection results if the ML engineer changes the Direction parameter to Lower than expected?

Increased anomaly identification frequency and increased recall

Decreased anomaly identification frequency and decreased recall

Increased anomaly identification frequency and decreased recall

Decreased anomaly identification frequency and increased recall

Question # 16

A company is building a conversational AI assistant on Amazon Bedrock. The company is using Retrieval Augmented Generation (RAG) to reference the company ' s internal knowledge base. The AI assistant uses the Anthropic Claude 4 foundation model (FM).

The company needs a solution that uses a vector embedding model, a vector store, and a vector search algorithm.

Which solution will develop the AI assistant with the LEAST development effort?

Use Amazon Kendra Experience Builder.

Use Amazon Aurora PostgreSQL with the pgvector extension.

Use Amazon RDS for PostgreSQL with the pgvector extension.

Use the AWS Glue Data Catalog metadata repository.

Question # 17

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a

central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company needs to use the central model registry to manage different versions of models in the application.

Which action will meet this requirement with the LEAST operational overhead?

Create a separate Amazon Elastic Container Registry (Amazon ECR) repository for each model.

Use Amazon Elastic Container Registry (Amazon ECR) and unique tags for each model version.

Use the SageMaker Model Registry and model groups to catalog the models.

Use the SageMaker Model Registry and unique tags for each model version.

Question # 18

A company is using an Amazon S3 bucket to collect data that will be used for ML workflows. The company needs to use AWS Glue DataBrew to clean and normalize the data.

Which solution will meet these requirements?

Create a DataBrew dataset by using the S3 path. Clean and normalize the data by using a DataBrew profile job.

Create a DataBrew dataset by using the S3 path. Clean and normalize the data by using a DataBrew recipe job.

Create a DataBrew dataset by using a JDBC driver to connect to the S3 bucket. Use a profile job.

Create a DataBrew dataset by using a JDBC driver to connect to the S3 bucket. Use a recipe job.

Question # 19

A company is developing a customer support AI assistant by using an Amazon Bedrock Retrieval Augmented Generation (RAG) pipeline. The AI assistant retrieves articles from a knowledge base stored in Amazon S3. The company uses Amazon OpenSearch Service to index the knowledge base. The AI assistant uses an Amazon Bedrock Titan Embeddings model for vector search.

The company wants to improve the relevance of the retrieved articles to improve the quality of the AI assistant ' s answers.

Which solution will meet these requirements?

Use auto-summarization on the retrieved articles by using Amazon SageMaker JumpStart.

Use a reranker model before passing the articles to the foundation model (FM).

Use Amazon Athena to pre-filter the articles based on metadata before retrieval.

Use Amazon Bedrock Provisioned Throughput to process queries more efficiently.

Question # 20

An ML engineer is building a logistic regression model to predict customer churn for subscription services. The dataset contains two string variables: location and job_seniority_level.

The location variable has 3 distinct values, and the job_seniority_level variable has over 10 distinct values.

The ML engineer must perform preprocessing on the variables.

Which solution will meet this requirement?

Apply tokenization to location. Apply ordinal encoding to job_seniority_level.

Apply one-hot encoding to location. Apply ordinal encoding to job_seniority_level.

Apply binning to location. Apply standard scaling to job_seniority_level.

Apply one-hot encoding to location. Apply standard scaling to job_seniority_level.

Question # 21

A company is developing an ML model to predict customer satisfaction. The company needs to use survey feedback and the past satisfaction level of customers to predict the future satisfaction level of customers.

The dataset includes a column named Feedback that contains long text responses. The dataset also includes a column named Satisfaction Level that contains three distinct values for past customer satisfaction: High, Medium, and Low. The company must apply encoding methods to transform the data in each column.

Which solution will meet these requirements?

Apply one-hot encoding to the Feedback column and the Satisfaction Level column.

Apply one-hot encoding to the Feedback column. Apply ordinal encoding to the Satisfaction Level column.

Apply label encoding to the Feedback column. Apply binary encoding to the Satisfaction Level column.

Apply tokenization to the Feedback column. Apply ordinal encoding to the Satisfaction Level column.

Question # 22

A company is developing an ML model by using Amazon SageMaker AI. The company must monitor bias in the model and display the results on a dashboard. An ML engineer creates a bias monitoring job.

How should the ML engineer capture bias metrics to display on the dashboard?

Capture AWS CloudTrail metrics from SageMaker Clarify.

Capture Amazon CloudWatch metrics from SageMaker Clarify.

Capture SageMaker Model Monitor metrics from Amazon EventBridge.

Capture SageMaker Model Monitor metrics from Amazon SNS.

Question # 23

A company has trained an ML model that is packaged in a container. The company will integrate the model with an existing Python web application. The company needs to host the model on AWS by using Kubernetes.

The company does not want to manage the control plane and must provision the resources in a repeatable manner. The infrastructure must be provisioned by using Python.

Which solution will meet these requirements?

Use AWS CloudFormation to provision Amazon EC2 instances in multiple Availability Zones. Set up a Kubernetes cluster. Host the model container on the Kubernetes cluster.

Use the AWS CLI to provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Store the image in an Amazon Elastic Container Registry (Amazon ECR) repository. Host the model container on the EKS cluster.

Use the AWS Cloud Development Kit (AWS CDK) to provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Store the image in an Amazon Elastic Container Registry (Amazon ECR) repository. Host the model container on the EKS cluster.

Use AWS CloudFormation to provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Store the image in an Amazon Elastic Container Registry (Amazon ECR) repository. Host the model container on the EKS cluster.

Question # 24

A company collects customer data every day. The company stores the data as compressed files in an Amazon S3 bucket that is partitioned by date. Every month, analysts download the data, process the data to check the data quality, and then upload the data to Amazon QuickSight dashboards.

An ML engineer needs to implement a solution to automatically check the data quality before the data is sent to QuickSight.

Which solution will meet these requirements with the LEAST operational overhead?

Run an AWS Glue crawler every month to update the AWS Glue Data Catalog. Use AWS Glue Data Quality rules to check the data quality.

Use an AWS Glue trigger to run an AWS Glue crawler every month to update the AWS Glue Data Catalog. Create an AWS Glue job that loads the data into a PySpark DataFrame. Configure the job to apply custom functions and to evaluate the data quality.

Run Python scripts on an AWS Lambda function every month to evaluate data quality. Configure the S3 bucket to invoke the Lambda function when objects are added to the S3 bucket.

Configure the S3 bucket to send event notifications to an Amazon Simple Queue Service (Amazon SQS) queue when objects are uploaded. Use Amazon CloudWatch insights every month for the SQS queue to evaluate the data quality.

Question # 25

A company plans to use Amazon SageMaker AI to build image classification models. The company has 6 TB of training data stored on Amazon FSx for NetApp ONTAP. The file system is in the same VPC as SageMaker AI.

An ML engineer must make the training data accessible to SageMaker AI training jobs.

Which solution will meet these requirements?

Mount the FSx for ONTAP file system as a volume to the SageMaker AI instance.

Create an Amazon S3 bucket and use Mountpoint for Amazon S3 to link the bucket to FSx for ONTAP.

Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Question # 26

A company has a Retrieval Augmented Generation (RAG) application that uses a vector database to store embeddings of documents. The company must migrate the application to AWS and must implement a solution that provides semantic search of text files. The company has already migrated the text repository to an Amazon S3 bucket.

Which solution will meet these requirements?

Use an AWS Batch job to process the files and generate embeddings. Use AWS Glue to store the embeddings. Use SQL queries to perform the semantic searches.

Use a custom Amazon SageMaker notebook to run a custom script to generate embeddings. Use SageMaker Feature Store to store the embeddings. Use SQL queries to perform the semantic searches.

Use the Amazon Kendra S3 connector to ingest the documents from the S3 bucket into Amazon Kendra. Query Amazon Kendra to perform the semantic searches.

Use an Amazon Textract asynchronous job to ingest the documents from the S3 bucket. Query Amazon Textract to perform the semantic searches.

Question # 27

An ML engineer is training an ML model to identify medical patients for disease screening. The tabular dataset for training contains 50,000 patient records: 1,000 with the disease and 49,000 without the disease.

The ML engineer splits the dataset into a training dataset, a validation dataset, and a test dataset.

What should the ML engineer do to transform the data and make the data suitable for training?

Apply principal component analysis (PCA) to oversample the minority class in the training dataset.

Apply Synthetic Minority Oversampling Technique (SMOTE) to generate new synthetic samples of the minority class in the training dataset.

Randomly oversample the majority class in the validation dataset.

Apply k-means clustering to undersample the minority class in the test dataset.

Question # 28

A company is exploring generative AI and wants to add a new product feature. An ML engineer is making API calls from existing Amazon EC2 instances to Amazon Bedrock.

The EC2 instances are in a private subnet and must remain private during the implementation. The EC2 instances have a security group that allows access to all IP addresses in the private subnet.

What should the ML engineer do to establish a connection between the EC2 instances and Amazon Bedrock?

Modify the security group to allow inbound and outbound traffic to and from Amazon Bedrock.

Use AWS PrivateLink to access Amazon Bedrock through an interface VPC endpoint.

Configure Amazon Bedrock to use the private subnet where the EC2 instances are deployed.

Use AWS Direct Connect to link the VPC to Amazon Bedrock.

Question # 29

A company runs an ML model on Amazon SageMaker AI. The company uses an automatic process that makes API calls to create training jobs for the model. The company has new compliance rules that prohibit the collection of aggregated metadata from training jobs.

Which solution will prevent SageMaker AI from collecting metadata from the training jobs?

Opt out of metadata tracking for any training job that is submitted.

Ensure that training jobs are running in a private subnet in a custom VPC.

Encrypt the training data with an AWS Key Management Service (AWS KMS) customer managed key.

Reconfigure the training jobs to use only AWS Nitro instances.

Question # 30

A company is planning to create several ML prediction models. The training data is stored in Amazon S3. The entire dataset is more than 5 ?? in size and consists of CSV, JSON, Apache Parquet, and simple text files.

The data must be processed in several consecutive steps. The steps include complex manipulations that can take hours to finish running. Some of the processing involves natural language processing (NLP) transformations. The entire process must be automated.

Which solution will meet these requirements?

Process data at each step by using Amazon SageMaker Data Wrangler. Automate the process by using Data Wrangler jobs.

Use Amazon SageMaker notebooks for each data processing step. Automate the process by using Amazon EventBridge.

Process data at each step by using AWS Lambda functions. Automate the process by using AWS Step Functions and Amazon EventBridge.

Use Amazon SageMaker Pipelines to create a pipeline of data processing steps. Automate the pipeline by using Amazon EventBridge.

Question # 31

A company is building an Amazon SageMaker AI pipeline for an ML model. The pipeline uses distributed processing and distributed training.

An ML engineer needs to encrypt network communication between instances that run distributed jobs. The ML engineer configures the distributed jobs to run in a private VPC.

What should the ML engineer do to meet the encryption requirement?

Enable network isolation.

Configure traffic encryption by using security groups.

Enable inter-container traffic encryption.

Enable VPC flow logs.

Answer:

Explanation:

In Amazon SageMaker, distributed training and distributed processing jobs often involve multiple instances exchanging data over the network. By default, when these jobs run inside a VPC, network traffic remains private but is not automatically encrypted between instances. When compliance or security requirements mandate encryption of in-transit data, additional configuration is required.

The correct solution is to enable inter-container traffic encryption, which ensures that all network communication between containers running on different instances is encrypted using TLS. Amazon SageMaker provides a built-in feature for this purpose. When inter-container traffic encryption is enabled, SageMaker automatically configures secure communication channels between all nodes participating in a distributed job, including training clusters and processing jobs.

Option A (Network isolation) is incorrect because network isolation prevents containers from making outbound network calls and accessing the internet. It does not encrypt traffic between instances.

Option B (Security groups) is incorrect because security groups control network access and traffic flow, not encryption. They can restrict which instances can communicate, but they do not provide data-in-transit encryption.

Option D (VPC flow logs) is incorrect because VPC flow logs are used for monitoring and auditing network traffic, not for encrypting it.

AWS documentation explicitly states that enabling inter-container traffic encryption is the recommended and supported approach for encrypting data exchanged between instances during distributed SageMaker jobs. This feature aligns with enterprise security best practices and regulatory requirements for protecting sensitive ML training data in transit.

Therefore, Option C is the only solution that directly fulfills the encryption requirement for distributed SageMaker workloads.

Question # 32

An ML engineer is designing an AI-powered traffic management system. The system must use near real-time inference to predict congestion and prevent collisions.

The system must also use batch processing to perform historical analysis of predictions over several hours to improve the model. The inference endpoints must scale automatically to meet demand.

Which combination of solutions will meet these requirements? (Select TWO.)

Use Amazon SageMaker real-time inference endpoints with automatic scaling based on ConcurrentInvocationsPerInstance.

Use AWS Lambda with reserved concurrency and SnapStart to connect to SageMaker endpoints.

Use an Amazon SageMaker Processing job for batch historical analysis. Schedule the job with Amazon EventBridge.

Use Amazon EC2 Auto Scaling to host containers for batch analysis.

Use AWS Lambda for historical analysis.

Question # 33

A company needs to run a batch data-processing job on Amazon EC2 instances. The job will run during the weekend and will take 90 minutes to finish running. The processing can handle interruptions. The company will run the job every weekend for the next 6 months.

Which EC2 instance purchasing option will meet these requirements MOST cost-effectively?

Spot Instances

Reserved Instances

On-Demand Instances

Dedicated Instances

Question # 34

A company ' s dataset for prediction analytics contains duplicate records, missing data, and unusually extreme high or low values. The company needs a solution to resolve the data quality issues quickly. The solution must maintain data integrity and have the LEAST operational overhead.

Which solution will meet these requirements?

Use AWS Glue DataBrew to delete duplicate records, fill missing values with medians, and replace extreme values with values in a normal range.

Configure an AWS Glue job to identify records with missing values and extreme measurements and delete them.

Create an Amazon EMR Spark job to replace missing values with zeros and merge duplicate records.

Use Amazon SageMaker Data Wrangler to delete duplicates, apply statistical modeling for missing values, and apply outlier detection algorithms.

Question # 35

A company stores training data as a .csv file in an Amazon S3 bucket. The company must encrypt the data and must control which applications have access to the encryption key.

Which solution will meet these requirements?

Create a new SSH access key and use the AWS Encryption CLI to encrypt the file.

Create a new API key by using Amazon API Gateway and use it to encrypt the file.

Create a new IAM role with permissions for kms:GenerateDataKey and use the role to encrypt the file.

Create a new AWS Key Management Service (AWS KMS) key and use the AWS Encryption CLI with the KMS key to encrypt the file.

Question # 36

A company uses a training job on Amazon SageMaker Al to train a neural network. The job first trains a model and then evaluates the model ' s performance ag

test dataset. The company uses the results from the evaluation phase to decide if the trained model will go to production.

The training phase takes too long. The company needs solutions that can shorten training time without decreasing the model ' s final performance.

Select the correct solutions from the following list to meet the requirements for each description. Select each solution one time or not at all. (Select THREE.)

. Change the epoch count.

. Choose an Amazon EC2 Spot Fleet.

· Change the batch size.

. Use early stopping on the training job.

· Use the SageMaker Al distributed data parallelism (SMDDP) library.

. Stop the training job.

MLA-C01 question answer

Question # 37

A company needs to deploy a custom-trained classification ML model on AWS. The model must make near real-time predictions with low latency and must handle variable request volumes.

Which solution will meet these requirements?

Create an Amazon SageMaker AI batch transform job to process inference requests in batches.

Use Amazon API Gateway to receive prediction requests. Use an Amazon S3 bucket to host and serve the model.

Deploy an Amazon SageMaker AI endpoint. Configure auto scaling for the endpoint.

Launch AWS Deep Learning AMIs (DLAMI) on two Amazon EC2 instances. Run the instances behind an Application Load Balancer.

Question # 38

Which solution will meet these requirements?

Use an AWS Batch job to process the files and generate embeddings. Use AWS Glue to store the embeddings. Use SQL queries to perform the semantic searches.

Use a custom Amazon SageMaker AI notebook to run a custom script to generate embeddings. Use SageMaker Feature Store to store the embeddings. Use SQL queries to perform the semantic searches.

Use the Amazon Kendra S3 connector to ingest the documents from the S3 bucket into Amazon Kendra. Query Amazon Kendra to perform the semantic searches.

Use an Amazon Textract asynchronous job to ingest the documents from the S3 bucket. Query Amazon Textract to perform the semantic searches.

Question # 39

An ML engineer is training a simple neural network model. The model’s performance improves initially and then degrades after a certain number of epochs.

Which solutions will mitigate this problem? (Select TWO.)

Enable early stopping on the model.

Increase dropout in the layers.

Increase the number of layers.

Increase the number of neurons.

Investigate and reduce the sources of model bias.

Question # 40

A company has a conversational AI assistant that sends requests through Amazon Bedrock to an Anthropic Claude large language model (LLM). Users report that when they ask similar questions multiple times, they sometimes receive different answers. An ML engineer needs to improve the responses to be more consistent and less random.

Which solution will meet these requirements?

Increase the temperature parameter and the top_k parameter.

Increase the temperature parameter. Decrease the top_k parameter.

Decrease the temperature parameter. Increase the top_k parameter.

Decrease the temperature parameter and the top_k parameter.

Question # 41

A company has historical data that shows whether customers needed long-term support from company staff. The company needs to develop an ML model to predict whether new customers will require long-term support.

Which modeling approach should the company use to meet this requirement?

Anomaly detection

Linear regression

Logistic regression

Semantic segmentation

Question # 42

A financial company receives a high volume of real-time market data streams from an external provider. The streams consist of thousands of JSON records every second.

The company needs to implement a scalable solution on AWS to identify anomalous data points.

Which solution will meet these requirements with the LEAST operational overhead?

Ingest real-time data into Amazon Kinesis data streams. Use the built-in RANDOM_CUT_FOREST function in Amazon Managed Service for Apache Flink to process the data streams and to detect data anomalies.

Ingest real-time data into Amazon Kinesis data streams. Deploy an Amazon SageMaker endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

Ingest real-time data into Apache Kafka on Amazon EC2 instances. Deploy an Amazon SageMaker endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

Send real-time data to an Amazon Simple Queue Service (Amazon SQS) FIFO queue. Create an AWS Lambda function to consume the queue messages. Program the Lambda function to start an AWS Glue extract, transform, and load (ETL) job for batch processing and anomaly detection.

Question # 43

A company is running ML models on premises by using custom Python scripts and proprietary datasets. The company is using PyTorch. The model building requires unique domain knowledge. The company needs to move the models to AWS.

Which solution will meet these requirements with the LEAST effort?

Use SageMaker built-in algorithms to train the proprietary datasets.

Use SageMaker script mode and premade images for ML frameworks.

Build a container on AWS that includes custom packages and a choice of ML frameworks.

Purchase similar production models through AWS Marketplace.

Question # 44

A company uses Amazon SageMaker for its ML workloads. The company ' s ML engineer receives a 50 MB Apache Parquet data file to build a fraud detection model. The file includes several correlated columns that are not required.

What should the ML engineer do to drop the unnecessary columns in the file with the LEAST effort?

Download the file to a local workstation. Perform one-hot encoding by using a custom Python script.

Create an Apache Spark job that uses a custom processing script on Amazon EMR.

Create a SageMaker processing job by calling the SageMaker Python SDK.

Create a data flow in SageMaker Data Wrangler. Configure a transform step.

Question # 45

An ML engineer uses an Amazon SageMaker AI notebook instance to run a training job that trains a neural network model with an estimator. The training job loads data iteratively from an Amazon S3 path that is configured as an environment variable. The ML engineer viewed a profiling report of the training job. The ML engineer discovered that a substantial amount of the training time is spent during data loading.

How can the ML engineer improve the training speed?

Provision Amazon Elastic Block Store (Amazon EBS) Provisioned IOPS SSD io1 storage during the estimator initialization. Download the training data from the S3 path to Amazon EBS. Point the data loader to the EBS location.

Provision Amazon Elastic File System (Amazon EFS) storage during the estimator initialization. Download the training data to Amazon EFS by using the S3 path. Point the data loader to the EFS location.

Download the training data to the estimator by using fast file mode. Point the data loader to the location specified by the S3 path.

Configure the path to the S3 bucket that contains the training data as a hyperparameter instead of an environment variable.

Question # 46

A company is developing an ML model to forecast future values based on time series data. The dataset includes historical measurements collected at regular intervals and categorical features. The model needs to predict future values based on past patterns and trends.

Which algorithm and hyperparameters should the company use to develop the model?

Use the Amazon SageMaker AI XGBoost algorithm. Set the scale_pos_weight hyperparameter to adjust for class imbalance.

Use k-means clustering with k to specify the number of clusters.

Use the Amazon SageMaker AI DeepAR algorithm with matching context length and prediction length hyperparameters.

Use the Amazon SageMaker AI Random Cut Forest (RCF) algorithm with contamination to set the expected proportion of anomalies.

Question # 47

A healthcare analytics company wants to segment patients into groups that have similar risk factors to develop personalized treatment plans. The company has a dataset that includes patient health records, medication history, and lifestyle changes. The company must identify the appropriate algorithm to determine the number of groups by using hyperparameters.

Which solution will meet these requirements?

Use the Amazon SageMaker AI XGBoost algorithm. Set max_depth to control tree complexity for risk groups.

Use the Amazon SageMaker k-means clustering algorithm. Set k to specify the number of clusters.

Use the Amazon SageMaker AI DeepAR algorithm. Set epochs to determine the number of training iterations for risk groups.

Use the Amazon SageMaker AI Random Cut Forest (RCF) algorithm. Set a contamination hyperparameter for risk anomaly detection.

Question # 48

An ML engineering team has a data processing pipeline that ingests sensor data from IoT devices into an Amazon S3 bucket. The pipeline then processes the data by using AWS Glue extract, transform, and load (ETL) jobs for ML modeling. The team noticed throttling errors in the ETL jobs. The data ingestion process has also been slower than normal.

What is the cause of the problem?

The AWS Glue service quotas have been reached.

The network bandwidth between the IoT devices and the AWS Region is insufficient.

The AWS Glue ETL jobs are not optimized for parallel processing.

The AWS Glue execution role is missing Amazon S3 permissions.

Question # 49

A company uses ML models to predict whether transactions are fraudulent. The company needs to identify as many fraudulent transactions as possible. Which evaluation metric should the company use to evaluate the models to meet this requirement?

F1 score

Area Under the ROC Curve (AUC)

Precision

Recall

Question # 50

An ML engineer needs to deploy a trained model based on a genetic algorithm. Predictions can take several minutes, and requests can include up to 100 MB of data.

Which deployment solution will meet these requirements with the LEAST operational overhead?

Deploy on EC2 Auto Scaling behind an ALB.

Deploy to a SageMaker AI real-time endpoint.

Deploy to a SageMaker AI Asynchronous Inference endpoint.

Deploy to Amazon ECS on EC2.

Question # 51

A company is planning to use Amazon Redshift ML in its primary AWS account. The source data is in an Amazon S3 bucket in a secondary account.

An ML engineer needs to set up an ML pipeline in the primary account to access the S3 bucket in the secondary account. The solution must not require public IPv4 addresses.

Which solution will meet these requirements?

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create a VPC peering connection between the accounts. Update the VPC route tables to remove the route to 0.0.0.0/0.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create an AWS Direct Connect connection and a transit gateway. Associate the VPCs from both accounts with the transit gateway. Update the VPC route tables to remove the route to 0.0.0.0/0.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an AWS Site-to-Site VPN connection with two encrypted IPsec tunnels between the accounts. Set up interface VPC endpoints for Amazon S3.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an S3 gateway endpoint. Update the S3 bucket policy to allow IAM principals from the primary account. Set up interface VPC endpoints for SageMaker and Amazon Redshift.

Question # 52

A company wants to reduce the cost of its containerized ML applications. The applications use ML models that run on Amazon EC2 instances, AWS Lambda functions, and an Amazon Elastic Container Service (Amazon ECS) cluster. The EC2 workloads and ECS workloads use Amazon Elastic Block Store (Amazon EBS) volumes to save predictions and artifacts.

An ML engineer must identify resources that are being used inefficiently. The ML engineer also must generate recommendations to reduce the cost of these resources.

Which solution will meet these requirements with the LEAST development effort?

Create code to evaluate each instance ' s memory and compute usage.

Add cost allocation tags to the resources. Activate the tags in AWS Billing and Cost Management.

Check AWS CloudTrail event history for the creation of the resources.

Run AWS Compute Optimizer.

Question # 53

A company is developing an ML model for a customer. The training data is stored in an Amazon S3 bucket in the customer ' s AWS account (Account A). The company runs Amazon SageMaker AI training jobs in a separate AWS account (Account B).

The company defines an S3 bucket policy and an IAM policy to allow reads to the S3 bucket.

Which additional steps will meet the cross-account access requirement?

Create the S3 bucket policy in Account A. Attach the IAM policy to an IAM role that SageMaker AI uses in Account A.

Create the S3 bucket policy in Account A. Attach the IAM policy to an IAM role that SageMaker AI uses in Account B.

Create the S3 bucket policy in Account B. Attach the IAM policy to an IAM role that SageMaker AI uses in Account A.

Create the S3 bucket policy in Account B. Attach the IAM policy to an IAM role that SageMaker AI uses in Account B.

Question # 54

An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of the data quality of the models. The ML engineer must receive alerts when changes in data quality occur.

Which solution will meet these requirements?

Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and to send alerts.

Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and to send alerts.

Deploy the models by using Amazon Elastic Container Service (Amazon ECS) on AWS Fargate. Use Amazon EventBridge to monitor the data quality and to send alerts.

Deploy the models by using Amazon SageMaker AI batch transform. Use SageMaker Model Monitor to monitor the data quality and to send alerts.

Question # 55

An ML engineer must choose the appropriate Amazon SageMaker algorithm to solve specific AI problems.

Select the correct SageMaker built-in algorithm from the following list for each use case. Each algorithm should be selected one time.

• Random Cut Forest (RCF) algorithm

• Semantic segmentation algorithm

• Sequence-to-Sequence (seq2seq) algorithm

MLA-C01 question answer

Question # 56

A company has built more than 50 models and deployed the models on Amazon SageMaker Al as real-time inference

endpoints. The company needs to reduce the costs of the SageMaker Al inference endpoints. The company used the same

ML framework to build the models. The company ' s customers require low-latency access to the models.

Select and order the correct steps from the following list to reduce the cost of inference and keep latency low. Select each

step one time or not at all. (Select and order FIVE.)

· Create an endpoint configuration that references a multi-model container.

. Create a SageMaker Al model with multi-model endpoints enabled.

. Deploy a real-time inference endpoint by using the endpoint configuration.

. Deploy a serverless inference endpoint configuration by using the endpoint configuration.

· Spread the existing models to multiple different Amazon S3 bucket paths.

. Upload the existing models to the same Amazon S3 bucket path.

. Update the models to use the new endpoint ID. Pass the model IDs to the new endpoint.

MLA-C01 question answer

Question # 57

Which solution will meet these requirements with the LEAST development effort?

Use SageMaker AI built-in algorithms to train the proprietary datasets.

Use SageMaker AI script mode and premade images for ML frameworks.

Build a container on AWS that includes custom packages and a choice of ML frameworks.

Purchase similar production models through AWS Marketplace.

Question # 58

A logistics company has installed in-vehicle cameras for basic monitoring of its drivers. The company wants to improve driver safety by identifying distractions that could lead to accidents.

Which solution will meet this requirement with the LEAST operational effort?

Use Amazon Rekognition eye gaze direction detection to monitor driver behavior and identify distractions.

Use Amazon SageMaker AI to customize an AI model to monitor driver behavior and identify distractions.

Integrate a third-party driver monitoring system with Amazon Rekognition to monitor driver behavior and identify distractions.

Use Amazon Comprehend to analyze text-based driver feedback and identify distractions.

Question # 59

An ML engineer wants to re-train an XGBoost model at the end of each month. A data team prepares the training data. The training dataset is a few hundred megabytes in size. When the data is ready, the data team stores the data as a new file in an Amazon S3 bucket.

The ML engineer needs a solution to automate this pipeline. The solution must register the new model version in Amazon SageMaker Model Registry within 24 hours.

Which solution will meet these requirements?

Create an AWS Lambda function that runs one time each week to poll the S3 bucket for new files. Invoke the Lambda function asynchronously. Configure the Lambda function to start the pipeline if the function detects new data.

Create an Amazon CloudWatch rule that runs on a schedule to start the pipeline every 30 days.

Create an S3 Lifecycle rule to start the pipeline every time a new object is uploaded to the S3 bucket.

Create an Amazon EventBridge rule to start an AWS Step Functions TrainingStep every time a new object is uploaded to the S3 bucket.

Question # 60

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a

central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company needs to run an on-demand workflow to monitor bias drift for models that are deployed to real-time endpoints from the application.

Which action will meet this requirement?

Configure the application to invoke an AWS Lambda function that runs a SageMaker Clarify job.

Invoke an AWS Lambda function to pull the sagemaker-model-monitor-analyzer built-in SageMaker image.

Use AWS Glue Data Quality to monitor bias.

Use SageMaker notebooks to compare the bias.

Answer:

Explanation:

Monitoring bias drift in deployed machine learning models is crucial to ensure fairness and accuracy over time. Amazon SageMaker Clarify provides tools to detect bias in ML models, both during training and after deployment. To monitor bias drift for models deployed to real-time endpoints, an effective approach involves orchestrating SageMaker Clarify jobs using AWS Lambda functions.

Implementation Steps:

Set Up Data Capture:

Enable data capture on the SageMaker endpoint to record input data and model predictions. This captured data serves as the basis for bias analysis.

Develop a Lambda Function:

Create an AWS Lambda function configured to initiate a SageMaker Clarify job. This function will process the captured data to assess bias metrics.

Schedule or Trigger the Lambda Function:

Configure the Lambda function to run on-demand or at scheduled intervals using Amazon CloudWatch Events or EventBridge. This setup allows for regular bias monitoring as per the application ' s requirements.

Analyze and Respond to Results:

After each Clarify job completes, review the generated bias reports. If bias drift is detected, take appropriate actions, such as retraining the model or adjusting data preprocessing steps.

Advantages of This Approach:

Automation: Utilizing AWS Lambda for orchestrating Clarify jobs enables automated and scalable bias monitoring without manual intervention.

Cost-Effectiveness: AWS Lambda ' s serverless nature ensures that you only pay for the compute time consumed during the execution of the function, optimizing resource usage.

Flexibility: The solution can be tailored to specific monitoring needs, allowing for adjustments in monitoring frequency and analysis parameters.

By implementing this solution, the company can effectively monitor bias drift in real-time, ensuring that the AI application maintains fairness and accuracy throughout its lifecycle.

[References:, Bias drift for models in production - Amazon SageMaker, Schedule Bias Drift Monitoring Jobs - Amazon SageMaker, , , , ]

Question # 61

A company has deployed a model to predict the churn rate for its games by using Amazon SageMaker Studio. After the model is deployed, the company must monitor the model performance for data drift and inspect the report. Select and order the correct steps from the following list to model monitor actions. Select each step one time. (Select and order THREE.) .

Check the analysis results on the SageMaker Studio console. .

Create a Shapley Additive Explanations (SHAP) baseline for the model by using Amazon SageMaker Clarify.

Schedule an hourly model explainability monitor.

MLA-C01 question answer

Question # 62

A company regularly receives new training data from a vendor of an ML model. The vendor delivers cleaned and prepared data to the company’s Amazon S3 bucket every 3–4 days.

The company has an Amazon SageMaker AI pipeline to retrain the model. An ML engineer needs to run the pipeline automatically when new data is uploaded to the S3 bucket.

Which solution will meet these requirements with the LEAST operational effort?

Create an S3 lifecycle rule to transfer the data to the SageMaker AI training instance and initiate training.

Create an AWS Lambda function that scans the S3 bucket and initiates the pipeline when new data is uploaded.

Create an Amazon EventBridge rule that matches S3 upload events and configures the SageMaker pipeline as the target.

Use Amazon Managed Workflows for Apache Airflow (MWAA) to orchestrate the pipeline when new data is uploaded.

Question # 63

A company is using Amazon SageMaker AI to build an ML model to predict customer behavior. The company needs to explain the bias in the model to an auditor. The explanation must focus on demographic data of the customers.

Which solution will meet these requirements?

Use SageMaker Clarify to generate a bias report. Send the report to the auditor.

Use AWS Glue DataBrew to create a job to detect drift in the model ' s data quality. Send the job output to the auditor.

Use Amazon QuickSight integration with SageMaker AI to generate a bias report. Send the report to the auditor.

Use Amazon CloudWatch metrics from the SageMaker AI namespace to create a bias dashboard. Share the dashboard with the auditor.

Question # 64

An ML engineering team is spread across multiple locations. When the lead ML engineer opens an Amazon SageMaker AI notebook, the ML engineer does not see the latest merged notebook made by other team members from a Git repository.

The lead ML engineer must see the latest SageMaker AI notebook updates.

Which solution will meet this requirement?

Run the !git pull origin master command.

Run the !git commit command.

Run the !git push origin master command.

Run the !git branch command.

Question # 65

A company is training a deep learning model to detect abnormalities in images. The company has limited GPU resources and a large hyperparameter space to explore. The company needs to test different configurations and avoid wasting computation time on poorly performing models that show weak validation accuracy in early epochs.

Which hyperparameter optimization strategy should the company use?

Grid search across all possible combinations

Bayesian optimization with early stopping

Manual tuning of each parameter individually

Exhaustive search without early stopping

Question # 66

A company has an ML model that is deployed to an Amazon SageMaker AI endpoint for real-time inference. The company needs to deploy a new model. The company must compare the new model’s performance to the currently deployed model ' s performance before shifting all traffic to the new model.

Which solution will meet these requirements with the LEAST operational effort?

Deploy the new model to a separate endpoint. Manually split traffic between the two endpoints.

Deploy the new model to a separate endpoint. Use Amazon CloudFront to distribute traffic between the two endpoints.

Deploy the new model as a shadow variant on the same endpoint as the current model. Route a portion of live traffic to the shadow model for evaluation.

Use AWS Lambda functions with custom logic to route traffic between the current model and the new model.

Question # 67

A company uses a batching solution to process data analytics each day. The company wants to build an analytics platform to provide near real-time updates. The company wants to use open source technology and does not want to manage or scale the infrastructure.

Which solution will meet these requirements?

Create Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless clusters to process the data.

Create Amazon Managed Streaming for Apache Kafka (Amazon MSK) Provisioned clusters. Configure the clusters based on data volume.

Create data streams in Amazon Kinesis Data Streams. Use AWS Application Auto Scaling to scale the infrastructure.

Create self-hosted Apache Flink applications on Amazon EC2. Run the applications as containers.

Question # 68

An ML model is deployed in production. The model has performed well and has met its metric thresholds for months.

An ML engineer who is monitoring the model observes a sudden degradation. The performance metrics of the model are now below the thresholds.

What could be the cause of the performance degradation?

Lack of training data

Drift in production data distribution

Compute resource constraints

Model overfitting

Question # 69

A company wants to evaluate a new ML model architecture to understand its performance before deploying the model to production. The company wants to use Amazon SageMaker AI shadow testing.

The company needs to analyze the performance metrics of the shadow model and the production model without affecting the existing production endpoint. The analysis must use real-time inference requests.

Select and order the correct steps to implement shadow testing and compare the model variants in SageMaker AI. Select each step one time or not at all (Select and order Three)

MLA-C01 question answer

Question # 70

A company has deployed an XGBoost prediction model in production to predict if a customer is likely to cancel a subscription. The company uses Amazon SageMaker Model Monitor to detect deviations in the F1 score.

During a baseline analysis of model quality, the company recorded a threshold for the F1 score. After several months of no change, the model ' s F1 score decreases significantly.

What could be the reason for the reduced F1 score?

Concept drift occurred in the underlying customer data that was used for predictions.

The model was not sufficiently complex to capture all the patterns in the original baseline data.

The original baseline data had a data quality issue of missing values.

Incorrect ground truth labels were provided to Model Monitor during the calculation of the baseline.

Question # 71

A financial company receives a high volume of real-time market data streams from an external provider. The streams consist of thousands of JSON records every second.

The company needs to implement a scalable solution on AWS to identify anomalous data points.

Which solution will meet these requirements with the LEAST operational overhead?

Ingest real-time data into Amazon Kinesis data streams. Deploy an Amazon SageMaker AI endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

Ingest real-time data into Apache Kafka on Amazon EC2 instances. Deploy an Amazon SageMaker AI endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

Answer:

Explanation:

The correct answer is A. Ingest real-time data into Amazon Kinesis data streams. Use the built-in RANDOM_CUT_FOREST function in Amazon Managed Service for Apache Flink to process the data streams and to detect data anomalies.

Amazon Kinesis Data Streams is a fully managed service that can handle high-volume, real-time data streams with low latency and high scalability. By integrating Amazon Managed Service for Apache Flink (MSK for Flink) with the built-in RANDOM_CUT_FOREST (RCF) algorithm, the company can perform real-time anomaly detection directly on streaming data without building or managing custom infrastructure. RCF is designed for unsupervised anomaly detection on streaming datasets, making it ideal for financial market data that arrives continuously and at high velocity.

Option B adds operational complexity because it requires deploying a SageMaker endpoint, setting up Lambda functions, and maintaining the orchestration between Kinesis and Lambda. Option C further increases overhead by requiring self-managed Apache Kafka clusters on EC2, combined with SageMaker and Lambda orchestration. Option D introduces SQS and batch ETL processing via AWS Glue, which is not suitable for real-time anomaly detection and significantly increases latency.

Using Kinesis + Managed Flink + RCF provides a serverless, fully managed, and scalable solution with minimal operational overhead. It handles ingestion, streaming processing, and anomaly detection natively. The architecture eliminates the need for provisioning compute clusters or managing real-time orchestration, reducing operational cost while achieving sub-second detection for thousands of JSON records per second.

This approach aligns with AWS best practices for ML solution monitoring, maintenance, and security, particularly for real-time anomaly detection in high-volume, structured or semi-structured data streams.

Question # 72

An ML engineer needs to implement a solution to host a trained ML model. The rate of requests to the model will be inconsistent throughout the day.

The ML engineer needs a scalable solution that minimizes costs when the model is not in use. The solution also must maintain the model ' s capacity to respond to requests during times of peak usage.

Which solution will meet these requirements?

Create AWS Lambda functions that have fixed concurrency to host the model. Configure the Lambda functions to automatically scale based on the number of requests to the model.

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Set a static number of tasks to handle requests during times of peak usage.

Deploy the model to an Amazon SageMaker endpoint. Deploy multiple copies of the model to the endpoint. Create an Application Load Balancer to route traffic between the different copies of the model at the endpoint.

Deploy the model to an Amazon SageMaker endpoint. Create SageMaker endpoint auto scaling policies that are based on Amazon CloudWatch metrics to adjust the number of instances dynamically.

Pre-Summer Special Sale - 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: spcl70

Crack4sure Logo

Main Navigation

Practice Free MLA-C01 AWS Certified Machine Learning Engineer - Associate Exam Questions Answers With Explanation

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation: