Spring Special Sale - 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: spcl70

Practice Free MLA-C01 AWS Certified Machine Learning Engineer - Associate Exam Questions Answers With Explanation

We at Crack4sure are committed to giving students who are preparing for the Amazon Web Services MLA-C01 Exam the most current and reliable questions . To help people study, we've made some of our AWS Certified Machine Learning Engineer - Associate exam materials available for free to everyone. You can take the Free MLA-C01 Practice Test as many times as you want. The answers to the practice questions are given, and each answer is explained.

Question # 6

A company is creating an ML model to identify defects in a product. The company has gathered a dataset and has stored the dataset in TIFF format in Amazon S3. The dataset contains 200 images in which the most common defects are visible. The dataset also contains 1,800 images in which there is no defect visible.

An ML engineer trains the model and notices poor performance in some classes. The ML engineer identifies a class imbalance problem in the dataset.

What should the ML engineer do to solve this problem?

A.

Use a few hundred images and Amazon Rekognition Custom Labels to train a new model.

B.

Undersample the 200 images in which the most common defects are visible.

C.

Oversample the 200 images in which the most common defects are visible.

D.

Use all 2,000 images and Amazon Rekognition Custom Labels to train a new model.

Question # 7

An ML engineer is building a model to predict house and apartment prices. The model uses three features: Square Meters, Price, and Age of Building. The dataset has 10,000 data rows. The data includes data points for one large mansion and one extremely small apartment.

The ML engineer must perform preprocessing on the dataset to ensure that the model produces accurate predictions for the typical house or apartment.

Which solution will meet these requirements?

A.

Remove the outliers and perform a log transformation on the Square Meters variable.

B.

Keep the outliers and perform normalization on the Square Meters variable.

C.

Remove the outliers and perform one-hot encoding on the Square Meters variable.

D.

Keep the outliers and perform one-hot encoding on the Square Meters variable.

Question # 8

A company's ML engineer has deployed an ML model for sentiment analysis to an Amazon SageMaker AI endpoint. The ML engineer needs to explain to company stakeholders how the model makes predictions.

Which solution will provide an explanation for the model's predictions?

A.

Use SageMaker Model Monitor on the deployed model.

B.

Use SageMaker Clarify on the deployed model.

C.

Show the distribution of inferences from A/B testing in Amazon CloudWatch.

D.

Add a shadow endpoint. Analyze prediction differences on samples.

Question # 9

A company is uploading thousands of PDF policy documents into Amazon S3 and Amazon Bedrock Knowledge Bases. Each document contains structured sections. Users often search for a small section but need the full section context. The company wants accurate section-level search with automatic context retrieval and minimal custom coding.

Which chunking strategy meets these requirements?

A.

Hierarchical

B.

Maximum tokens

C.

Semantic

D.

Fixed-size

Question # 10

An ML engineer needs to create data ingestion pipelines and ML model deployment pipelines on AWS. All the raw data is stored in Amazon S3 buckets.

Which solution will meet these requirements?

A.

Use Amazon Data Firehose to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

B.

Use AWS Glue to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

C.

Use Amazon Redshift ML to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

D.

Use Amazon Athena to create the data ingestion pipelines. Use an Amazon SageMaker notebook to create the model deployment pipelines.

Question # 11

A company uses an Amazon EMR cluster to run a data ingestion process for an ML model. An ML engineer notices that the processing time is increasing.

Which solution will reduce the processing time MOST cost-effectively?

A.

Use Spot Instances to increase the number of primary nodes.

B.

Use Spot Instances to increase the number of core nodes.

C.

Use Spot Instances to increase the number of task nodes.

D.

Use On-Demand Instances to increase the number of core nodes.

Question # 12

An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of the data quality of the models. The ML engineer must receive alerts when changes in data quality occur.

Which solution will meet these requirements?

A.

Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and to send alerts.

B.

Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and to send alerts.

C.

Deploy the models by using Amazon Elastic Container Service (Amazon ECS) on AWS Fargate. Use Amazon EventBridge to monitor the data quality and to send alerts.

D.

Deploy the models by using Amazon SageMaker batch transform. Use SageMaker Model Monitor to monitor the data quality and to send alerts.

Question # 13

A company is developing an ML model to forecast future values based on time series data. The dataset includes historical measurements collected at regular intervals and categorical features. The model needs to predict future values based on past patterns and trends.

Which algorithm and hyperparameters should the company use to develop the model?

A.

Use the Amazon SageMaker AI XGBoost algorithm. Set the scale_pos_weight hyperparameter to adjust for class imbalance.

B.

Use k-means clustering with k to specify the number of clusters.

C.

Use the Amazon SageMaker AI DeepAR algorithm with matching context length and prediction length hyperparameters.

D.

Use the Amazon SageMaker AI Random Cut Forest (RCF) algorithm with contamination to set the expected proportion of anomalies.

Question # 14

An ML engineer is using a training job to fine-tune a deep learning model in Amazon SageMaker Studio. The ML engineer previously used the same pre-trained model with a similar

dataset. The ML engineer expects vanishing gradient, underutilized GPU, and overfitting problems.

The ML engineer needs to implement a solution to detect these issues and to react in predefined ways when the issues occur. The solution also must provide comprehensive real-time metrics during the training.

Which solution will meet these requirements with the LEAST operational overhead?

A.

Use TensorBoard to monitor the training job. Publish the findings to an Amazon Simple Notification Service (Amazon SNS) topic. Create an AWS Lambda function to consume the findings and to initiate the predefined actions.

B.

Use Amazon CloudWatch default metrics to gain insights about the training job. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.

C.

Expand the metrics in Amazon CloudWatch to include the gradients in each training step. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.

D.

Use SageMaker Debugger built-in rules to monitor the training job. Configure the rules to initiate the predefined actions.

Question # 15

A company wants to migrate ML models from an on-premises environment to Amazon SageMaker AI. The models are based on the PyTorch algorithm. The company needs to reuse its existing custom scripts as much as possible.

Which SageMaker AI feature should the company use?

A.

SageMaker AI built-in algorithms

B.

SageMaker Canvas

C.

SageMaker JumpStart

D.

SageMaker AI script mode

Question # 16

A company must install a custom script on any newly created Amazon SageMaker AI notebook instances.

Which solution will meet this requirement with the LEAST operational overhead?

A.

Create a lifecycle configuration script to install the custom script when a new SageMaker AI notebook is created. Attach the lifecycle configuration to every new SageMaker AI notebook as part of the creation steps.

B.

Create a custom Amazon Elastic Container Registry (Amazon ECR) image that contains the custom script. Push the ECR image to a Docker registry. Attach the Docker image to a SageMaker Studio domain. Select the kernel to run as part of the SageMaker AI notebook.

C.

Create a custom package index repository. Use AWS CodeArtifact to manage the installation of the custom script. Set up AWS PrivateLink endpoints to connect CodeArtifact to the SageMaker AI instance. Install the script.

D.

Store the custom script in Amazon S3. Create an AWS Lambda function to install the custom script on new SageMaker AI notebooks. Configure Amazon EventBridge to invoke the Lambda function when a new SageMaker AI notebook is initialized.

Question # 17

An ML engineer is designing an AI-powered traffic management system. The system must use near real-time inference to predict congestion and prevent collisions.

The system must also use batch processing to perform historical analysis of predictions over several hours to improve the model. The inference endpoints must scale automatically to meet demand.

Which combination of solutions will meet these requirements? (Select TWO.)

A.

Use Amazon SageMaker real-time inference endpoints with automatic scaling based on ConcurrentInvocationsPerInstance.

B.

Use AWS Lambda with reserved concurrency and SnapStart to connect to SageMaker endpoints.

C.

Use an Amazon SageMaker Processing job for batch historical analysis. Schedule the job with Amazon EventBridge.

D.

Use Amazon EC2 Auto Scaling to host containers for batch analysis.

E.

Use AWS Lambda for historical analysis.

Question # 18

A company uses an Amazon SageMaker AI model for real-time inference with auto scaling enabled. During peak usage, new instances launch before existing instances are fully ready, causing inefficiencies and delays.

Which solution will optimize the scaling process without affecting response times?

A.

Change to a multi-model endpoint configuration.

B.

Integrate Amazon API Gateway and AWS Lambda to manage invocations.

C.

Decrease the scale-in cooldown period and increase the maximum instance count.

D.

Increase the cooldown period after scale-out activities.

Question # 19

An ML engineer needs to use an ML model to predict the price of apartments in a specific location.

Which metric should the ML engineer use to evaluate the model’s performance?

A.

Accuracy

B.

Area Under the ROC Curve (AUC)

C.

F1 score

D.

Mean absolute error (MAE)

Question # 20

An ML engineer needs to run intensive model training jobs each month that can take 48–72 hours. The jobs can be interrupted and resumed. The engineer has a fixed budget and needs the most cost-effective compute option.

Which solution will meet these requirements?

A.

Purchase Reserved Instances with partial upfront payment.

B.

Purchase On-Demand Instances.

C.

Purchase SageMaker AI Savings Plans.

D.

Purchase Spot Instances that use automated checkpoints.

Question # 21

An ML engineer wants to use Amazon SageMaker Data Wrangler to perform preprocessing on a dataset. The ML engineer wants to use the processed dataset to train a classification model. During preprocessing, the ML engineer notices that a text feature has a range of thousands of values that differ only by spelling errors. The ML engineer needs to apply an encoding method so that after preprocessing is complete, the text feature can be used to train the model.

Which solution will meet these requirements?

A.

Perform ordinal encoding to represent categories of the feature.

B.

Perform similarity encoding to represent categories of the feature.

C.

Perform one-hot encoding to represent categories of the feature.

D.

Perform target encoding to represent categories of the feature.

Question # 22

A company is developing a generative AI conversational interface to assist customers with payments. The company wants to use an ML solution to detect customer intent. The company does not have training data to train a model.

Which solution will meet these requirements?

A.

Fine-tune a sequence-to-sequence (seq2seq) algorithm in Amazon SageMaker JumpStart.

B.

Use an LLM from Amazon Bedrock with zero-shot learning.

C.

Use the Amazon Comprehend DetectEntities API.

D.

Run an LLM from Amazon Bedrock on Amazon EC2 instances.

Question # 23

A company's dataset for prediction analytics contains duplicate records, missing data, and unusually extreme high or low values. The company needs a solution to resolve the data quality issues quickly. The solution must maintain data integrity and have the LEAST operational overhead.

Which solution will meet these requirements?

A.

Use AWS Glue DataBrew to delete duplicate records, fill missing values with medians, and replace extreme values with values in a normal range.

B.

Configure an AWS Glue job to identify records with missing values and extreme measurements and delete them.

C.

Create an Amazon EMR Spark job to replace missing values with zeros and merge duplicate records.

D.

Use Amazon SageMaker Data Wrangler to delete duplicates, apply statistical modeling for missing values, and apply outlier detection algorithms.

Question # 24

An ML engineer wants to run a training job on Amazon SageMaker AI. The training job will train a neural network by using multiple GPUs. The training dataset is stored in Parquet format.

The ML engineer discovered that the Parquet dataset contains files too large to fit into the memory of the SageMaker AI training instances.

Which solution will fix the memory problem?

A.

Attach an Amazon Elastic Block Store (Amazon EBS) Provisioned IOPS SSD volume to the instance. Store the files in the EBS volume.

B.

Repartition the Parquet files by using Apache Spark on Amazon EMR. Use the repartitioned files for the training job.

C.

Change the instance type to Memory Optimized instances with sufficient memory for the training job.

D.

Use the SageMaker AI distributed data parallelism (SMDDP) library with multiple instances to split the memory usage.

Question # 25

A company is developing an ML model for a customer. The training data is stored in an Amazon S3 bucket in the customer's AWS account (Account A). The company runs Amazon SageMaker AI training jobs in a separate AWS account (Account B).

The company defines an S3 bucket policy and an IAM policy to allow reads to the S3 bucket.

Which additional steps will meet the cross-account access requirement?

A.

Create the S3 bucket policy in Account A. Attach the IAM policy to an IAM role that SageMaker AI uses in Account A.

B.

Create the S3 bucket policy in Account A. Attach the IAM policy to an IAM role that SageMaker AI uses in Account B.

C.

Create the S3 bucket policy in Account B. Attach the IAM policy to an IAM role that SageMaker AI uses in Account A.

D.

Create the S3 bucket policy in Account B. Attach the IAM policy to an IAM role that SageMaker AI uses in Account B.

Question # 26

An ML engineer has an Amazon Comprehend custom model in Account A in the us-east-1 Region. The ML engineer needs to copy the model to Account ? in the same Region.

Which solution will meet this requirement with the LEAST development effort?

A.

Use Amazon S3 to make a copy of the model. Transfer the copy to Account B.

B.

Create a resource-based IAM policy. Use the Amazon Comprehend ImportModel API operation to copy the model to Account B.

C.

Use AWS DataSync to replicate the model from Account A to Account B.

D.

Create an AWS Site-to-Site VPN connection between Account A and Account ? to transfer the model.

Question # 27

A company wants to host an ML model on Amazon SageMaker. An ML engineer is configuring a continuous integration and continuous delivery (Cl/CD) pipeline in AWS CodePipeline to deploy the model. The pipeline must run automatically when new training data for the model is uploaded to an Amazon S3 bucket.

Select and order the pipeline's correct steps from the following list. Each step should be selected one time or not at all. (Select and order three.)

• An S3 event notification invokes the pipeline when new data is uploaded.

• S3 Lifecycle rule invokes the pipeline when new data is uploaded.

• SageMaker retrains the model by using the data in the S3 bucket.

• The pipeline deploys the model to a SageMaker endpoint.

• The pipeline deploys the model to SageMaker Model Registry.

MLA-C01 question answer

Question # 28

A company has implemented a data ingestion pipeline for sales transactions from its ecommerce website. The company uses Amazon Data Firehose to ingest data into Amazon OpenSearch Service. The buffer interval of the Firehose stream is set for 60 seconds. An OpenSearch linear model generates real-time sales forecasts based on the data and presents the data in an OpenSearch dashboard.

The company needs to optimize the data ingestion pipeline to support sub-second latency for the real-time dashboard.

Which change to the architecture will meet these requirements?

A.

Use zero buffering in the Firehose stream. Tune the batch size that is used in the PutRecordBatch operation.

B.

Replace the Firehose stream with an AWS DataSync task. Configure the task with enhanced fan-out consumers.

C.

Increase the buffer interval of the Firehose stream from 60 seconds to 120 seconds.

D.

Replace the Firehose stream with an Amazon Simple Queue Service (Amazon SQS) queue.

Question # 29

An ML engineer is using Amazon SageMaker to train a deep learning model that requires distributed training. After some training attempts, the ML engineer observes that the instances are not performing as expected. The ML engineer identifies communication overhead between the training instances.

What should the ML engineer do to MINIMIZE the communication overhead between the instances?

A.

Place the instances in the same VPC subnet. Store the data in a different AWS Region from where the instances are deployed.

B.

Place the instances in the same VPC subnet but in different Availability Zones. Store the data in a different AWS Region from where the instances are deployed.

C.

Place the instances in the same VPC subnet. Store the data in the same AWS Region and Availability Zone where the instances are deployed.

D.

Place the instances in the same VPC subnet. Store the data in the same AWS Region but in a different Availability Zone from where the instances are deployed.

Question # 30

An ML engineer is developing a classification model. The ML engineer needs to use custom libraries in processing jobs, training jobs, and pipelines in Amazon SageMaker AI.

Which solution will provide this functionality with the LEAST implementation effort?

A.

Manually install the libraries in the SageMaker AI containers.

B.

Build a custom Docker container that includes the required libraries. Host the container in Amazon Elastic Container Registry (Amazon ECR). Use the ECR image in the SageMaker AI jobs and pipelines.

C.

Use a SageMaker AI notebook instance and install libraries at startup.

D.

Run code externally on Amazon EC2 and import results into SageMaker AI.

Question # 31

A construction company is using Amazon SageMaker AI to train specialized custom object detection models to identify road damage. The company uses images from multiple cameras. The images are stored as JPEG objects in an Amazon S3 bucket.

The images need to be pre-processed by using computationally intensive computer vision techniques before the images can be used in the training job. The company needs to optimize data loading and pre-processing in the training job. The solution cannot affect model performance or increase compute or storage resources.

Which solution will meet these requirements?

A.

Use SageMaker AI file mode to load and process the images in batches.

B.

Reduce the batch size of the model and increase the number of pre-processing threads.

C.

Reduce the quality of the training images in the S3 bucket.

D.

Convert the images into RecordIO format and use the lazy loading pattern.

Question # 32

A company is planning to create several ML prediction models. The training data is stored in Amazon S3. The entire dataset is more than 5 ?? in size and consists of CSV, JSON, Apache Parquet, and simple text files.

The data must be processed in several consecutive steps. The steps include complex manipulations that can take hours to finish running. Some of the processing involves natural language processing (NLP) transformations. The entire process must be automated.

Which solution will meet these requirements?

A.

Process data at each step by using Amazon SageMaker Data Wrangler. Automate the process by using Data Wrangler jobs.

B.

Use Amazon SageMaker notebooks for each data processing step. Automate the process by using Amazon EventBridge.

C.

Process data at each step by using AWS Lambda functions. Automate the process by using AWS Step Functions and Amazon EventBridge.

D.

Use Amazon SageMaker Pipelines to create a pipeline of data processing steps. Automate the pipeline by using Amazon EventBridge.

Question # 33

A company is training a deep learning model to detect abnormalities in images. The company has limited GPU resources and a large hyperparameter space to explore. The company needs to test different configurations and avoid wasting computation time on poorly performing models that show weak validation accuracy in early epochs.

Which hyperparameter optimization strategy should the company use?

A.

Grid search across all possible combinations

B.

Bayesian optimization with early stopping

C.

Manual tuning of each parameter individually

D.

Exhaustive search without early stopping

Question # 34

An ML engineer is building a logistic regression model to predict customer churn for subscription services. The dataset contains two string variables: location and job_seniority_level.

The location variable has 3 distinct values, and the job_seniority_level variable has over 10 distinct values.

The ML engineer must perform preprocessing on the variables.

Which solution will meet this requirement?

A.

Apply tokenization to location. Apply ordinal encoding to job_seniority_level.

B.

Apply one-hot encoding to location. Apply ordinal encoding to job_seniority_level.

C.

Apply binning to location. Apply standard scaling to job_seniority_level.

D.

Apply one-hot encoding to location. Apply standard scaling to job_seniority_level.

Question # 35

An ML engineer is using Amazon SageMaker AI to train an ML model. The ML engineer needs to use SageMaker AI automatic model tuning (AMT) features to tune the model hyperparameters over a large parameter space.

The model has 20 categorical hyperparameters and 7 continuous hyperparameters that can be tuned. The ML engineer needs to run the tuning job a maximum of 1,000 times. The ML engineer must ensure that each parameter trial is built based on the performance of the previous trial.

Which solution will meet these requirements?

A.

Define the search space as categorical parameters of 1,000 possible combinations. Use grid search.

B.

Define the search space as continuous parameters. Use random search. Set the maximum number of tuning jobs to 1,000.

C.

Define the search space as categorical parameters and continuous parameters. Use Bayesian optimization. Set the maximum number of training jobs to 1,000.

D.

Define the search space as categorical parameters and continuous parameters. Use grid search. Set the maximum number of tuning jobs to 1,000.

Question # 36

An ML engineer is using Amazon Quick Suite (previously known as Amazon QuickSight) anomaly detection to detect very high or very low machine operating temperatures compared to normal. The ML engineer sets the Severity parameter to Low and above. The ML engineer sets the Direction parameter to All.

What effect will the ML engineer observe in the anomaly detection results if the ML engineer changes the Direction parameter to Lower than expected?

A.

Increased anomaly identification frequency and increased recall

B.

Decreased anomaly identification frequency and decreased recall

C.

Increased anomaly identification frequency and decreased recall

D.

Decreased anomaly identification frequency and increased recall

Question # 37

A government agency is conducting a national census to assess program needs by area and city. The census form collects approximately 500 responses from each citizen. The agency needs to analyze the data to extract meaningful insights. The agency wants to reduce the dimensions of the high-dimensional data to uncover hidden patterns.

Which solution will meet these requirements?

A.

Use the principal component analysis (PCA) algorithm in Amazon SageMaker AI.

B.

Use the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm in Amazon SageMaker AI.

C.

Use the k-means algorithm in Amazon SageMaker AI.

D.

Use the Random Cut Forest (RCF) algorithm in Amazon SageMaker AI.

Question # 38

A company needs to ingest data from data sources into Amazon SageMaker Data Wrangler. The data sources are Amazon S3, Amazon Redshift, and Snowflake. The ingested data must always be up to date with the latest changes in the source systems.

Which solution will meet these requirements?

A.

Use direct connections to import data from the data sources into Data Wrangler.

B.

Use cataloged connections to import data from the data sources into Data Wrangler.

C.

Use AWS Glue to extract data from the data sources. Use AWS Glue also to import the data directly into Data Wrangler.

D.

Use AWS Lambda to extract data from the data sources. Use Lambda also to import the data directly into Data Wrangler.

Question # 39

An ML engineer is building a generative AI application on Amazon Bedrock by using large language models (LLMs).

Select the correct generative AI term from the following list for each description. Each term should be selected one time or not at all. (Select three.)

• Embedding

• Retrieval Augmented Generation (RAG)

• Temperature

• Token

MLA-C01 question answer

Question # 40

A company has a team of data scientists who use Amazon SageMaker notebook instances to test ML models. When the data scientists need new permissions, the company attaches the permissions to each individual role that was created during the creation of the SageMaker notebook instance.

The company needs to centralize management of the team's permissions.

Which solution will meet this requirement?

A.

Create a single IAM role that has the necessary permissions. Attach the role to each notebook instance that the team uses.

B.

Create a single IAM group. Add the data scientists to the group. Associate the group with each notebook instance that the team uses.

C.

Create a single IAM user. Attach the AdministratorAccess AWS managed IAM policy to the user. Configure each notebook instance to use the IAM user.

D.

Create a single IAM group. Add the data scientists to the group. Create an IAM role. Attach the AdministratorAccess AWS managed IAM policy to the role. Associate the role with the group. Associate the group with each notebook instance that the team uses.

Question # 41

An ML engineer wants to deploy an Amazon SageMaker AI model for inference. The payload sizes are less than 3 MB. Processing time does not exceed 45 seconds. The traffic patterns will be irregular or unpredictable.

Which inference option will meet these requirements MOST cost-effectively?

A.

Asynchronous inference

B.

Real-time inference

C.

Serverless inference

D.

Batch transform

Question # 42

A company uses an ML model to recommend videos to users. The model is deployed on Amazon SageMaker AI. The model performed well initially after deployment, but the model's performance has degraded over time.

Which solution can the company use to identify model drift in the future?

A.

Create a monitoring job in SageMaker Model Monitor. Then create a baseline from the training dataset.

B.

Create a baseline from the training dataset. Then create a monitoring job in SageMaker Model Monitor.

C.

Create a baseline by using a built-in rule in SageMaker Clarify. Monitor the drift in Amazon CloudWatch.

D.

Retrain the model on new data. Compare the retrained model's performance to the original model's performance.

Question # 43

An ML engineer trained an ML model on Amazon SageMaker to detect automobile accidents from dosed-circuit TV footage. The ML engineer used SageMaker Data Wrangler to create a training dataset of images of accidents and non-accidents.

The model performed well during training and validation. However, the model is underperforming in production because of variations in the quality of the images from various cameras.

Which solution will improve the model's accuracy in the LEAST amount of time?

A.

Collect more images from all the cameras. Use Data Wrangler to prepare a new training dataset.

B.

Recreate the training dataset by using the Data Wrangler corrupt image transform. Specify the impulse noise option.

C.

Recreate the training dataset by using the Data Wrangler enhance image contrast transform. Specify the Gamma contrast option.

D.

Recreate the training dataset by using the Data Wrangler resize image transform. Crop all images to the same size.

Question # 44

An ML engineer needs to use an Amazon EMR cluster to process large volumes of data in batches. Any data loss is unacceptable.

Which instance purchasing option will meet these requirements MOST cost-effectively?

A.

Run the primary node, core nodes, and task nodes on On-Demand Instances.

B.

Run the primary node, core nodes, and task nodes on Spot Instances.

C.

Run the primary node on an On-Demand Instance. Run the core nodes and task nodes on Spot Instances.

D.

Run the primary node and core nodes on On-Demand Instances. Run the task nodes on Spot Instances.

Question # 45

An ML engineer is setting up a CI/CD pipeline for an ML workflow in Amazon SageMaker AI. The pipeline must automatically retrain, test, and deploy a model whenever new data is uploaded to an Amazon S3 bucket. New data files are approximately 10 GB in size. The ML engineer also needs to track model versions for auditing.

Which solution will meet these requirements?

A.

Use AWS CodePipeline, Amazon S3, and AWS CodeBuild to retrain and deploy the model automatically and track model versions.

B.

Use SageMaker Pipelines with the SageMaker Model Registry to orchestrate model training and version tracking.

C.

Use AWS Lambda and Amazon EventBridge to retrain and deploy the model and track versions via logs.

D.

Manually retrain and deploy the model using SageMaker notebook instances and track versions with AWS CloudTrail.

Question # 46

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

The ML engineer needs to use an Amazon SageMaker built-in algorithm to train the model.

Which algorithm should the ML engineer use to meet this requirement?

A.

LightGBM

B.

Linear learner

C.

?-means clustering

D.

Neural Topic Model (NTM)

Question # 47

A company uses a hybrid cloud environment. A model that is deployed on premises uses data in Amazon 53 to provide customers with a live conversational engine.

The model is using sensitive data. An ML engineer needs to implement a solution to identify and remove the sensitive data.

Which solution will meet these requirements with the LEAST operational overhead?

A.

Deploy the model on Amazon SageMaker. Create a set of AWS Lambda functions to identify and remove the sensitive data.

B.

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Create an AWS Batch job to identify and remove the sensitive data.

C.

Use Amazon Macie to identify the sensitive data. Create a set of AWS Lambda functions to remove the sensitive data.

D.

Use Amazon Comprehend to identify the sensitive data. Launch Amazon EC2 instances to remove the sensitive data.

Question # 48

A company is planning to use Amazon Redshift ML in its primary AWS account. The source data is in an Amazon S3 bucket in a secondary account.

An ML engineer needs to set up an ML pipeline in the primary account to access the S3 bucket in the secondary account. The solution must not require public IPv4 addresses.

Which solution will meet these requirements?

A.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create a VPC peering connection between the accounts. Update the VPC route tables to remove the route to 0.0.0.0/0.

B.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create an AWS Direct Connect connection and a transit gateway. Associate the VPCs from both accounts with the transit gateway. Update the VPC route tables to remove the route to 0.0.0.0/0.

C.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an AWS Site-to-Site VPN connection with two encrypted IPsec tunnels between the accounts. Set up interface VPC endpoints for Amazon S3.

D.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an S3 gateway endpoint. Update the S3 bucket policy to allow IAM principals from the primary account. Set up interface VPC endpoints for SageMaker and Amazon Redshift.

Question # 49

A company has deployed a model to predict the churn rate for its games by using Amazon SageMaker Studio. After the model is deployed, the company must monitor the model performance for data drift and inspect the report. Select and order the correct steps from the following list to model monitor actions. Select each step one time. (Select and order THREE.) .

Check the analysis results on the SageMaker Studio console. .

Create a Shapley Additive Explanations (SHAP) baseline for the model by using Amazon SageMaker Clarify.

Schedule an hourly model explainability monitor.

MLA-C01 question answer

Question # 50

A company ingests sales transaction data using Amazon Data Firehose into Amazon OpenSearch Service. The Firehose buffer interval is set to 60 seconds.

The company needs sub-second latency for a real-time OpenSearch dashboard.

Which architectural change will meet this requirement?

A.

Use zero buffering in the Firehose stream and tune the PutRecordBatch batch size.

B.

Replace Firehose with AWS DataSync and enhanced fan-out consumers.

C.

Increase the Firehose buffer interval to 120 seconds.

D.

Replace Firehose with Amazon SQS.

Question # 51

A company wants to predict the success of advertising campaigns by considering the color scheme of each advertisement. An ML engineer is preparing data for a neural network model. The dataset includes color information as categorical data.

Which technique for feature engineering should the ML engineer use for the model?

A.

Apply label encoding to the color categories. Automatically assign each color a unique integer.

B.

Implement padding to ensure that all color feature vectors have the same length.

C.

Perform dimensionality reduction on the color categories.

D.

One-hot encode the color categories to transform the color scheme feature into a binary matrix.

Question # 52

An ML engineer needs to organize a large set of text documents into topics. The ML engineer will not know what the topics are in advance. The ML engineer wants to use built-in algorithms or pre-trained models available through Amazon SageMaker AI to process the documents.

Which solution will meet these requirements?

A.

Use the BlazingText algorithm to identify the relevant text and to create a set of topics based on the documents.

B.

Use the Sequence-to-Sequence algorithm to summarize the text and to create a set of topics based on the documents.

C.

Use the Object2Vec algorithm to create embeddings and to create a set of topics based on the embeddings.

D.

Use the Latent Dirichlet Allocation (LDA) algorithm to process the documents and to create a set of topics based on the documents.

Question # 53

A company is building an Amazon SageMaker AI pipeline for an ML model. The pipeline uses distributed processing and distributed training.

An ML engineer needs to encrypt network communication between instances that run distributed jobs. The ML engineer configures the distributed jobs to run in a private VPC.

What should the ML engineer do to meet the encryption requirement?

A.

Enable network isolation.

B.

Configure traffic encryption by using security groups.

C.

Enable inter-container traffic encryption.

D.

Enable VPC flow logs.

Question # 54

A company is using ML to predict the presence of a specific weed in a farmer's field. The company is using the Amazon SageMaker linear learner built-in algorithm with a value of multiclass_dassifier for the predictorjype hyperparameter.

What should the company do to MINIMIZE false positives?

A.

Set the value of the weight decay hyperparameter to zero.

B.

Increase the number of training epochs.

C.

Increase the value of the target_precision hyperparameter.

D.

Change the value of the predictorjype hyperparameter to regressor.

Question # 55

An ML engineer normalized training data by using min-max normalization in AWS Glue DataBrew. The ML engineer must normalize production inference data in the same way before passing the data to the model.

Which solution will meet this requirement?

A.

Apply statistics from a well-known dataset to normalize the production samples.

B.

Keep the min-max normalization statistics from the training set and use them to normalize the production samples.

C.

Calculate new min-max statistics from a batch of production samples and use them to normalize all production samples.

D.

Calculate new min-max statistics from each production sample and use them to normalize all production samples.

Question # 56

An ML engineer needs to use data with Amazon SageMaker Canvas to train an ML model. The data is stored in Amazon S3 and is complex in structure. The ML engineer must use a file format that minimizes processing time for the data.

Which file format will meet these requirements?

A.

CSV files compressed with Snappy

B.

JSON objects in JSONL format

C.

JSON files compressed with gzip

D.

Apache Parquet files

Question # 57

A company needs to create a central catalog for all the company's ML models. The models are in AWS accounts where the company developed the models initially. The models are hosted in Amazon Elastic Container Registry (Amazon ECR) repositories.

Which solution will meet these requirements?

A.

Configure ECR cross-account replication for each existing ECR repository. Ensure that each model is visible in each AWS account.

B.

Create a new AWS account with a new ECR repository as the central catalog. Configure ECR cross-account replication between the initial ECR repositories and the central catalog.

C.

Use the Amazon SageMaker Model Registry to create a model group for models hosted in Amazon ECR. Create a new AWS account. In the new account, use the SageMaker Model Registry as the central catalog. Attach a cross-account resource policy to each model group in the initial AWS accounts.

D.

Use an AWS Glue Data Catalog to store the models. Run an AWS Glue crawler to migrate the models from the ECR repositories to the Data Catalog. Configure cross-account access to the Data Catalog.

Question # 58

A company needs to combine data from multiple sources. The company must use Amazon Redshift Serverless to query an AWS Glue Data Catalog database and underlying data that is stored in an Amazon S3 bucket.

Select and order the correct steps from the following list to meet these requirements. Select each step one time or not at all. (Select and order three.)

• Attach the IAM role to the Redshift cluster.

• Attach the IAM role to the Redshift namespace.

• Create an external database in Amazon Redshift to point to the Data Catalog schema.

• Create an external schema in Amazon Redshift to point to the Data Catalog database.

• Create an IAM role for Amazon Redshift to use to access only the S3 bucket that contains underlying data.

• Create an IAM role for Amazon Redshift to use to access the Data Catalog and the S3 bucket that contains underlying data.

MLA-C01 question answer

Question # 59

A company is creating an application that will recommend products for customers to purchase. The application will make API calls to Amazon Q Business. The company must ensure that responses from Amazon Q Business do not include the name of the company's main competitor.

Which solution will meet this requirement?

A.

Configure the competitor's name as a blocked phrase in Amazon Q Business.

B.

Configure an Amazon Q Business retriever to exclude the competitor’s name.

C.

Configure an Amazon Kendra retriever for Amazon Q Business to build indexes that exclude the competitor's name.

D.

Configure document attribute boosting in Amazon Q Business to deprioritize the competitor's name.

Question # 60

A company is using Amazon SageMaker AI to develop a credit risk assessment model. During model validation, the company finds that the model achieves 82% accuracy on the validation data. However, the model achieved 99% accuracy on the training data. The company needs to address the model accuracy issue before deployment.

Which solution will meet this requirement?

A.

Add more dense layers to increase model complexity. Implement batch normalization. Use early stopping during training.

B.

Implement dropout layers. Use L1 or L2 regularization. Perform k-fold cross-validation.

C.

Use principal component analysis (PCA) to reduce the feature dimensionality. Decrease model layers. Implement cross-entropy loss functions.

D.

Augment the training dataset. Remove duplicate records from the training dataset. Implement stratified sampling.

Question # 61

An ML engineer needs to use AWS services to identify and extract meaningful unique keywords from documents.

Which solution will meet these requirements with the LEAST operational overhead?

A.

Use the Natural Language Toolkit (NLTK) library on Amazon EC2 instances for text pre-processing. Use the Latent Dirichlet Allocation (LDA) algorithm to identify and extract relevant keywords.

B.

Use Amazon SageMaker and the BlazingText algorithm. Apply custom pre-processing steps for stemming and removal of stop words. Calculate term frequency-inverse document frequency (TF-IDF) scores to identify and extract relevant keywords.

C.

Store the documents in an Amazon S3 bucket. Create AWS Lambda functions to process the documents and to run Python scripts for stemming and removal of stop words. Use bigram and trigram techniques to identify and extract relevant keywords.

D.

Use Amazon Comprehend custom entity recognition and key phrase extraction to identify and extract relevant keywords.

Question # 62

A company has multiple models that are hosted on Amazon SageMaker Al. The models need to be re-trained. The requirements for each model are different, so the company needs to choose different deployment strategies to transfer all requests to a new model.

Select the correct strategy from the following list for each requirement. Select each strategy one time. (Select THREE.)

. Canary traffic shifting

. Linear traffic shifting guardrail

. All at once traffic shifting

MLA-C01 question answer

MLA-C01 PDF

$33

$109.99

3 Months Free Update

  • Printable Format
  • Value of Money
  • 100% Pass Assurance
  • Verified Answers
  • Researched by Industry Experts
  • Based on Real Exams Scenarios
  • 100% Real Questions

MLA-C01 PDF + Testing Engine

$52.8

$175.99

3 Months Free Update

  • Exam Name: AWS Certified Machine Learning Engineer - Associate
  • Last Update: Feb 24, 2026
  • Questions and Answers: 207
  • Free Real Questions Demo
  • Recommended by Industry Experts
  • Best Economical Package
  • Immediate Access

MLA-C01 Engine

$39.6

$131.99

3 Months Free Update

  • Best Testing Engine
  • One Click installation
  • Recommended by Teachers
  • Easy to use
  • 3 Modes of Learning
  • State of Art Technology
  • 100% Real Questions included