3 Months Free Update
3 Months Free Update
3 Months Free Update
A marketing manager attempts to determine those customers most likely to purchase additional products as the result of a nation-wide marketing campaign.
The manager possesses a historical dataset (CAMPAIGN) of a similar campaign from last year.
It has the following characteristics:
Which SAS program performs this analysis?
PROC GLMSELECT was used for building a model predicting the natural log of a baseball player's salary from certain performance and longevity statistics. The model used backward elimination using SBC as its selection criterion. The sequence of steps is summarized in the graphic shown below:
At Step 9 number of at bats (nAtBat) was removed from the model.
Why was it removed?
An analyst knows that the categorical predictor, zip_code, is an important predictor of a binary target. However, zip_code has too many levels to be a feasible predictor in a model. The analyst uses PROC CLUSTER to implement Greenacre's method to reduce the number of categorical levels.
What is the correct application of Greenacre's method in this situation?
In order to perform honest assessment on a predictive model, what is an acceptable division between training, validation, and testing data?
Which of the following describes a concordant pair of observations in the LOGISTIC procedure?
While building a predictive model, median imputations are performed while preparing the training data.
How should the imputations be addressed in the validation data?
Refer to the exhibit:
SAS output from the RSQUARE selection method, within the REG procedure, is shown. The top two models in each subset are given.
Based on the AIC statistic, which model is the champion model?
After performing an ANOVA test, an analyst has determined that a significant effect exists due to income. The analyst wants to compare each Income to all others and wants to control for experimentwise error.
Which GLM procedure statement would provide the most appropriate output?
There are missing values in the input variables for a regression application.
Which SAS procedure provides a viable solution?
A financial services manager wants to assess the probability that certain clients will default on their Home Equity Line of Credit (HELOC). A former employee left the code listed below.
The training data set is named HELOC, while a similar data set of more recent clients is named RECENT_HELOC.
Which SAS data steps will calculate the predicted probability of default on recent clients? (Choose two.)
An analyst compares the mean salaries of men and women working at a company.
The SAS data set SALARY contains variables:
Which SAS programs can be used to find the p-value for comparing men's salaries with women's salaries? (Choose two.)
What is a drawback to performing data cleansing (imputation, transformations, etc.) on raw data prior to partitioning the data for honest assessment as opposed to performing the data cleansing after partitioning the data?
Which SAS program will divide the original data set into 60% training and 40% validation data sets, stratified by county?
Refer to the following odds ratio table:
What is a correct interpretation of the estimate?
A confusion matrix is created for data that were oversampled due to a rare target.
What values are not affected by this oversampling?
A linear model has the following characteristics:
Which SAS program fits this model?
The following LOGISTIC procedure output analyzes the relationship between a binary response and an ordinal predictor variable, wrist_size Using reference cell coding, the analyst selects Large (L) as the reference level.
What is the estimated logit for a person with large wrist size?
Click the calculator button to display a calculator if needed.
A marketing campaign will send brochures describing an expensive product to a set of customers. The cost for mailing and production per customer is $50. The company makes $500 revenue for each sale.
What is the profit matrix for a typical person in the population?
This question will ask you to provide a missing option. Given the following SAS program:
What option must be added to the program to obtain a data set containing Pearson statistics?
A predictive model uses a data set that has several variables with missing values.
What two problems can arise with this model? (Choose two.)
Refer to the exhibit.
Given alpha=0.02, which conclusion is justified regarding percentage of body fat, comparing small (S), medium (M), and large (L) wrist sizes?
A researcher is planning a logistic regression to model the probability of disease occurrence. The researcher determines the rate of disease occurrence in the population is 1%.
For which of the following would this study be a candidate?
The standard form of a linear regression model is:
Which statement best summarizes the assumptions placed on the errors?