CS484_IML_Assignment_5.docx
Illinois Institute of Technology, CS 484 (Computer Science), Dec 6, 2023, 2 pages
CS 484: Introduction to Machine Learning
Fall Semester 2023, Assignment 5

Question 1 (100 points)
The Center for Machine Learning and Intelligent Systems at the University of California, Irvine manages the Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php). We will use two of the datasets in the repository for our analyses, namely, WineQuality_Train.csv for training and WineQuality_Test.csv for testing.

The categorical target variable is quality_grp. It has two categories, namely, 0 and 1. The Event category is 1. The input features are alcohol, citric_acid, free_sulfur_dioxide, residual_sugar, and sulphates. These five input features are considered interval variables.
We will train two models: a classification tree and a binary logistic regression.

The classification tree has the following specifications. The Splitting Criterion is Entropy, the maximum tree depth is five, and the initial random state value is 20230101.

The binary logistic regression has the following specifications. The model must include the Intercept term. Use the All-Possible Subsets method to determine the model with the lowest Akaike Information Criterion (AIC).

After we train these two models, we will compare them using a suite of model performance metrics and charts.
(a) (20 points) What are the Root Average Squared Error values of both models for both the training and testing partitions?
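One common definition of Root Average Squared Error for a binary target (an assumption here, since the assignment does not spell out the formula) squares the gap between the observed 0/1 outcome and the predicted Event probability:

```python
import numpy as np

def root_average_squared_error(y, p):
    """Square the difference between the observed 0/1 outcome and the
    predicted Event probability, average, then take the square root."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.sqrt(np.mean((y - p) ** 2)))

# Perfect probabilities give 0; a constant 0.5 gives 0.5 on balanced data
print(root_average_squared_error([0, 1, 1], [0.0, 1.0, 1.0]))   # 0.0
print(root_average_squared_error([0, 1], [0.5, 0.5]))           # 0.5
```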
(b) (20 points) What are the Area Under Curve values of both models for both the training and testing partitions?
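Area Under Curve here refers to the area under the ROC curve; with scikit-learn (an assumed toolkit) it is a single call on the predicted Event probabilities. The toy vectors below are placeholders, not the wine data:

```python
from sklearn.metrics import roc_auc_score

# Toy labels and predicted Event probabilities
y_true = [0, 0, 1, 1]
p_event = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc_score(y_true, p_event)
print(auc)  # 0.75
```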
(c) (10 points) Generate the Receiver Operating Characteristic curve for both models on the training partition. Please put the two curves in the same chart frame. Don’t forget to add the diagonal reference line.
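A sketch of the two-curve ROC chart with matplotlib (an assumed plotting library); the probability vectors are made-up placeholders standing in for the two models' training-partition scores:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import roc_curve

y = np.array([0, 0, 0, 1, 1, 1])
p_tree = np.array([0.2, 0.3, 0.6, 0.4, 0.7, 0.9])    # placeholder tree probs
p_logit = np.array([0.1, 0.4, 0.3, 0.6, 0.8, 0.7])   # placeholder logit probs

fig, ax = plt.subplots()
for label, p in [("Classification Tree", p_tree),
                 ("Logistic Regression", p_logit)]:
    fpr, tpr, _ = roc_curve(y, p)
    ax.plot(fpr, tpr, marker="o", label=label)
ax.plot([0, 1], [0, 1], linestyle="--", color="grey", label="Reference")
ax.set_xlabel("False Positive Rate")
ax.set_ylabel("True Positive Rate")
ax.legend()
fig.savefig("roc_both_models.png")
```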
(d) (10 points) Generate the Precision and Recall chart for both models on the training partition. Please put the two curves in the same chart frame. Don’t forget to add the No-Skills line to the chart.
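For a Precision and Recall chart, the No-Skills line is the horizontal line at the Event rate of the partition, because a no-skill classifier's precision equals that rate at every recall. A sketch with scikit-learn and toy scores (not the wine data):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y = np.array([0, 0, 1, 1, 1, 0, 1, 0])                    # toy labels
p = np.array([0.2, 0.4, 0.35, 0.8, 0.7, 0.3, 0.9, 0.6])   # toy Event probs

precision, recall, thresholds = precision_recall_curve(y, p)
no_skill = y.mean()  # height of the No-Skills line
print("No-Skills line at precision =", no_skill)
```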
(e) (10 points) What is the threshold for the Event probability based on the F1 Score from the training partition? Please calculate the thresholds of both models.
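One way to find the F1-maximizing threshold (again assuming scikit-learn, with toy scores) is to evaluate F1 at every candidate threshold returned by precision_recall_curve:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y = np.array([0, 0, 1, 1, 1, 0, 1, 0])                    # toy labels
p = np.array([0.2, 0.4, 0.35, 0.8, 0.7, 0.3, 0.9, 0.6])   # toy Event probs

precision, recall, thresholds = precision_recall_curve(y, p)
# precision and recall have one more entry than thresholds; drop the last point
f1 = (2 * precision[:-1] * recall[:-1]
      / np.clip(precision[:-1] + recall[:-1], 1e-12, None))
best_threshold = float(thresholds[np.argmax(f1)])
print("F1-maximizing threshold:", best_threshold)
```

Running this once per model, on the training partition's predicted Event probabilities, yields the two thresholds the question asks for.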
(f) (10 points) Using the F1 Score threshold, what are the Misclassification Rates of both models when evaluated only on the testing partition?
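With a threshold in hand, the Misclassification Rate is the fraction of observations whose thresholded prediction disagrees with the observed class. A minimal sketch (the 0.5 cutoff below is only illustrative; part (f) uses each model's F1 threshold):

```python
import numpy as np

def misclassification_rate(y, p, threshold):
    """Predict Event (1) when the probability meets or exceeds the
    threshold, then count the fraction of disagreements."""
    pred = (np.asarray(p) >= threshold).astype(int)
    return float(np.mean(pred != np.asarray(y)))

rate = misclassification_rate([0, 1, 1, 0], [0.2, 0.9, 0.4, 0.7], 0.5)
print(rate)  # 0.5
```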
(g) (10 points) Generate the Cumulative Gain and Lift table for both models using the predicted Event probabilities from the testing partition. Which model has the highest Lift value in Decile 1?
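A sketch of the Cumulative Gain and Lift table with pandas: sort by predicted Event probability, cut into ten deciles, and compare each decile's Event rate to the overall rate. The scores here are a deterministic toy, not the wine models' output:

```python
import numpy as np
import pandas as pd

# Deterministic toy scores: 20 observations, 10 Events among the top scores
p = np.linspace(0.99, 0.0, 20)       # predicted Event probabilities
y = (p > 0.5).astype(int)            # observed outcomes

df = pd.DataFrame({"y": y, "p": p}).sort_values("p", ascending=False)
df["decile"] = np.repeat(np.arange(1, 11), len(df) // 10)

overall_rate = df["y"].mean()
table = df.groupby("decile").agg(events=("y", "sum"), rate=("y", "mean"))
table["lift"] = table["rate"] / overall_rate
table["cum_gain"] = table["events"].cumsum() / table["events"].sum()
print(table)
```

In this toy every Event lands in the top half, so Decile 1's lift is 2.0 (its Event rate of 1.0 over the overall rate of 0.5) and cumulative gain reaches 1.0 by Decile 5.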
(h) (10 points) Based on all the above model performance metrics and charts, which model will you pick as the Champion model?
Related Questions
16-7. Automated Title: In eq_world_map_3.py, we specified the title manually when defining my_layout, which means we have to remember to update the title every time the source file changes. Instead, you can use the title for the data set in the metadata part of the JSON file. Pull this value, assign it to a variable, and use this for the title of the map when you’re defining my_layout.
arrow_forward
V5. min jee wants to combine the sales data from each of the art fairs. switch to the combined sales worksheet and then update the worksheet as follows. in cell A5 enter a formula without using a function that references cell a5 in the Madison worksheet. copy the formula from cell a5 to the range a6:a8 without copying the formatting. in cell b5, enter a formula using the sum function, 3d references, grouped worksheets that totals the values from cell b:5 in the chicago: madison worksheets.
arrow_forward
nycflights13::flights
Q4<- flights %>%filter(carrier == "JFK") %>%summarise(average_dist = mean(distance)%>%summarise(max_dist = max(average_dist))%>%group_by(Q4, month, day)%>%head(Q4[order(dat$month),(dat$day),(dat$max_dist)],n=5)
*not pictured, I did upload, tidy verse, dplyr, and the nycflights13 libraries.
Error: Incomplete expression: Q4<- flights %>% filter(carrier == "JFK") %>% summarise(average_dist = mean(distance)%>% summarise(max_dist = max(average_dist))%>% group_by(Q4, month, day)%>% head(Q4[order(dat$month),(dat$day),(dat$max_dist)],n=5)
RStudio Question: I can't figure out what is wrong with my code. Could someone take a look at it? See below for what I am trying to do, my code and my error output.
I am trying to find what 5 days of the year had the highest mean distance from JFK airport, using the nycflights13 library. I want to format it in month, day, andvmean distance.
arrow_forward
dataframe = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/Data/regression_sprint/titanic_train_raw.csv')
df = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/Data/regression_sprint/titanic_test_raw.csv')
Write a function that takes in as input a dataframe and a column name, and returns the mean for numerical columns and the mode for non-numerical columns. Function Specifications: The function should take two inputs: (df, column_name), where df is a pandas DataFrame, column_name is a str. If the column_name does not exist in df, raise a ValueError. Should return as output the mean if the specified column is numerical and return a list of the mode(s) otherwise. The mean should be rounded to 2 decimal places. If there is more than one mode for a given non-numerical column, the fuction should return a list of all modes.
def calc_mean_mode(df, column_name): # your code here return
calc_mean_mode(df,'Age')
Expected Outputs:…
arrow_forward
3. Given the XML document NKU_Programs.xml file, a function that prints the contents ofthe Program elements in a table as shown below.Program Name College Name Student Count Year Started Type*************************************************************************************Data Science Informatics 100 2015 UComputer Science Informatics 400 1990 UBiology Science 300 2000 G
HERE IS THE NKU_Programs.xml FILE
<NKUPRograms><Program studCount="100" type="Undergraduate"><Name>Data Science</Name><YearStarted>2015</YearStarted><CollegeName>Informatics</CollegeName></Program><Program studCount="400" type="Undergraduate"><Name>Computer Science</Name><YearStarted>1990</YearStarted><CollegeName>Informatics</CollegeName></Program><Program studCount="300"…
arrow_forward
1)Consider the following DTD. All unspecified elements are #PCDATA.<!DOCTYPE bibliography [<!ELEMENT book (title,author+,year,publisher,place?)><!ELEMENT article (title,author+,journal,year,number,volume,pages?)><!ELEMENT author (last_name, first_name)>...]>Write the following queries in XQuery. Assume Last names of authors are unique.(a) List all books authored by Ullman.(b) List articles published in Vol. 32 No. 2 of ACM Transactions on Database Systems.(c) List journals that published an article on XML in 2001 (that is, “XML” appears in the title of thearticle).(d) List authors that have published a book in 2001 and another book in 2003.(e) List publishers, and for each publisher list the books they have published as subelements. Chooseappropriate tags for your output.(f) List every author, and for each author list the number of articles he/she published in 2001. Chooseappropriate tags for your output.
arrow_forward
1)Consider the following DTD. All unspecified elements are #PCDATA.<!DOCTYPE bibliography [<!ELEMENT book (title,author+,year,publisher,place?)><!ELEMENT article (title,author+,journal,year,number,volume,pages?)><!ELEMENT author (last_name, first_name)>...]>Write the following queries in XQuery. Assume Last names of authors are unique.(a) List all books authored by Ullman.(b) List articles published in Vol. 32 No. 2 of ACM Transactions on Database Systems.(c) List journals that published an article on XML in 2001 (that is, “XML” appears in the title of thearticle).(d) List authors that have published a book in 2001 and another book in 2003.(e) List publishers, and for each publisher list the books they have published as subelements. Chooseappropriate tags for your output.(f) List every author, and for each author list the number of articles he/she published in 2001. Chooseappropriate tags for your output.
answer d,e,f questions
arrow_forward
Exercise 6 (Operations on DataFrame)
Step a: Create two dataframes df1 and df2 as follows:import numpy as npimport pandas as pdrng = np.random.RandomState(100)df1 = pd.DataFrame(rng.randint(0, 100, (4, 3)), columns=['A', 'B', 'C'])df2 = pd.DataFrame(rng.randint(0, 100, (3, 4)), columns=['A', 'B', 'C', 'D'])
Step b: Create a new dataframe df which is the summation of df1 and df2;
Step c: Subtract all columns of df by the half of column 'C' in df1; (Remark: the values in df should be updated)
Step d: Replace the NaN in df by 10; (Remark: the values in df should be updated)
Step e: Use df.apply() to calculate the summation of the numbers in each row of df, and show the result. (Remark: the result should be a vector of four values)
arrow_forward
get_total_cases() takes the a 2D-list (similar to database) and an integer x from this set {0, 1, 2} as input parameters. Here, 0 represents Case_Reported_Date, 1 represents Age_Group and 2 represents Client_Gender (these are the fields on the header row, the integer value represents the index of each of these fields on that row). This function computes the total number of reported cases for each instance of x in the text file, and it stores this information in a dictionary in this form {an_instance_of_x : total_case}. Finally, it returns the dictionary and the total number of all reported cases saved in this dictionary.
arrow_forward
Consider the below given iris data set. This dataset contains 3 classes of 15 instances each and each class refers to a type of iris plant. The dataset has two features: sepal length, sepal width. The third column is for species, which holds the value for these types of plants. A new plant is identified. You have to classify the Species class of new identified plant with the help of KNN algorithm.
Note: Before applying KNN modify the given data by adding last two digits of your registration number. Such as if the last two digits of your registration number is 23 then first row in the given table will be 28.3and 26.3.
arrow_forward
#Question 4 use sku.csv and WarehouseLocations.csv##############################################################def warehouse_stats(sku):"""Question 4- Read sku.csv with CSV and create a dictionary of the New SKU Statistics.- The New Sku should be the key, with the corresponding value being an innerdictionary containing the following statistics:- 350 Loc: True if not 0- Warehouse Qty- Forcasted Qty- Items/Day: can be calculated using CuFt/Day divided by Item Cube.This result should be an float rounded to5 decimals places.- In your warehouse dictionary, add an inner dictionary with key Totals whichcontains:- Total Qty in Warehouse as key "Qty": Do Not add to Totals if '350 Loc' is not a valid location.- Number of Valid 350 Loc as key "Valid"Data Cleaning Steps:- In some variates of New Sku #, the Item Cube & CuFt/Day are faulty.Fix the manufacturers mistake. If either is **less than or equal to 0**,Item Cube can be assumed to be 5.0 and CuFt/Day is 10% of theForcasted Qty of the New…
arrow_forward
In R please provide the code and explanation for the following
One Way ANOVA with the coagulation data seta. Load the coagulation data set (it is from the faraway library)b. Data Summaries & Assumption Check i. Use the names() function to identify the column names ii. How many rows of data are there? iii. Create a single graph with 4 boxplots on the same scale, one for the coagulation for each of the diets. Each boxplot should be a different color. Use the plot function for this task.colNameData is the name of the column in the data set that you want to create a boxplot forcolNameCategory is the name of the column that you want to use to split the data into groups (so you want one boxplot for each category/value in this column)dataset is the name of the full dataset col=2:4 will give you 3 different colors (color indices are 1-8, then they repeat) iv. Create 4 different data frames, one for the data corresponding to each of the 4factors. How many observations are there for…
arrow_forward
Q1-
Suppose a group of 9 sales price records has been stored as follows:
9, 25, 28, 30, 48, 72, 78, 195, 213
Partition them into three bins then smoothing them by the following methods:
1.Partition into equal-width then smoothing by bin means.
2. Partition into equal-depth then smoothing by bin boundaries.
arrow_forward
Python code using pandas to find the users with the highest aggregate scores (over all their posts) for the whole dataset. You should restrict your results to only those whose aggregated score is above 10,000 points, in descending order.
Code should generate a dictionary of the form {author:aggregated_scores ... }.
arrow_forward
Use python machine learning.
A group of data scientists want to analyze some data. They already cleaned up the data, with the result being a Dataframe called X_train. The Dataframe X_train has 1058 rows and 13 columns. It has the below columns (picture attached on how it looks like):
'LotFrontage','LotArea','BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF','TotalBsmtSF','1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea','TotRmsAbvGrd','GarageArea','OpenPorchSF'
Answer the following questions:
1. Transform X_train using PCA. Assign the output to a variable X_train_pca.
2. What is wrong with above approach? Scale the data, then repeat the above.
arrow_forward
Load & check the data:1. Load the data into a pandas dataframe named data_firstname where first name is you name.6. Using Pandas, Matplotlib, seaborn (you can use any or a mix) generate 3-5 plots and add themto your written response explaining what are the key insights and findings from the plots.7. Separate the features from the class.8. Split your data into train 80% train and 20% test, use the last two digits of your student numberfor the seed.Build Classification ModelsSupport vector machine classifier with linear kernel
breast cancer problem : I have already answered 1 to 3. Please provide solution from 4,5,6,7.
Programming language python
arrow_forward
Using Pandas library in python - Calculate student grades project
Pandas is a data analysis library built in Python. Pandas can be used in a Python script, a Jupyter Notebook, or even as part of a web application. In this pandas project, you’re going to create a Python script that loads grade data of 5 to 10 students (a .csv file) and calculates their letter grades in the course. The CSV file contains 5 column students' names, score in-class participation (5% of final grade), score in assignments (20% of final grade), score in discussions (20% of final grade), score in the mid term (20% of final grade), score in final (25% of final grade). Create the .csv file as part of the project submission
Program Output
This will happen when the program runs
Enter the CSV file
Student 1 named Brian Tomas has a letter grade of B+
Student 2 named Tom Blank has a letter grade of C
Student 3 named Margo True has a letter grade of A
Student 4 named David Atkin has a letter grade of B+
Student 5 named…
arrow_forward
You will use the wine for this task (dataset: sklearn.datasets.load_wine()), perform NN and NearestCentroid on the dataset to perform classification. Select different parameter values (i.e., parameter tuning) and discuss the influence. Finally, report result of comparisons. The dataset can be found here https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_wine.html
arrow_forward
google colab [jupyter notebook]
Amazon Musical Instrument Reviews
General Readme on Projects
Web commerce sites get a substantial amount of feedback from reviews users post on various websites. It is not practical to go through all this information by hand to determine whether a user liked a particular product or not.
For our project we are going to use a dataset of Amazon Musical Instrument Reviews. The main reason I selected this dataset is that it is significantly smaller than the Amazon review datasets for movies, music, and books. This dataset has a bit over 221,000 reviews. The columns in the dataset are
name
description
verified
whether the reviewer bought the product from Amazon or not
reviewTime
time of the review
reviewerID
ID of the reviewer, e.g. A2SUAM1J3GNN3B
asin
ID of the product
reviewerName
name of the reviewer
reviewText
the text of the body of the review
summary
the test of the heading of the review
unixReviewTime
time of the review (Unix time)…
arrow_forward
Question 22Explain the procedure you will take to automatically choose a good value for the regularization parameter using different splits on a given dataset.
arrow_forward
Sort results by reverse domain: Create a data category. Domain that depicts domain names, including a suitable compareTo() function where the natural order is in reverse domain name order. For example, cs.princeton.edu's mirror name is edu.princeton.cs. This is helpful for analysing site logs. Use s.split(".") to divide the string s into pieces separated by underscores. Create a client that takes normal input and displays the reverse domains in sorted order.
arrow_forward
Representation (Metadata) Assignment
- Foundations of Data and Information
§ This is a 2-part assignment.
Part 1: Dublin Core Metadata Record
Working the DCMI Metadata Terms http://www.dublincore.org/documents/dcmi-terms/.§ Create one “full” metadata record for an article, a webpage, a photograph, a book, OR data set
representing a topic about our natural environment. For example, the object you select may
represent ecology, botany, a natural disaster, weather data, etc.
§ Please work with the following encoding scheme:
Guidelines for implementing Dublin Core inTM XML https://www.dublincore.org/specifications/dublin-core/dc-xml-guidelines/
For your metadata record (a representation):
§ Use as many of the 15 DC (Dublin Core) elements (properties) as you can from the DCMES,
version 1.1 when creating you’re your representation
§ You may also include additional elements (full level properties) from the DCTERMS namespace.
§ Show that you understand element repeatability.
§…
arrow_forward
Reverse address sort: Develop a data structure. A domain that depicts domain names and has a suitable compareTo() function with the reverse domain name's natural order as the comparison criteria. For instance, cs.princeton.edu's mirror name is edu.princeton.cs. Analysis of online logs can benefit from this. Use s.split(".") to divide the string s into pieces that are separated by spaces. Create a client that displays the reverse domains in ordered order after reading domain names from standard input.
arrow_forward
Computer Science
NODEJS
How do I read specific data from csv file (for example extract data only from Canada and United States) and put that into a variable and then make a txt file and use that variable to put that data into txt file? Here is what I have so far below, I'm struggling with the ////grab data for canada section. Thanks for helping.
const csv = require('csv-parser');
const fs = require('fs');
const inputs = [];
//use csv parser
fs.createReadStream('input_countries.csv')
.pipe(csv())
.on('data', (row) => {
inputs.push(row);
})
.on('end', () => {
console.log('CSV file successfully processed');
});
console.log("Deleting canada.txt file if it exists");
fs.unlink('canada.txt', function (err) {
if (err) {
return console.error(err);
}
console.log("canada.txt deleted sucessfully")
});
console.log("Deleting usa.txt file if it exists");
fs.unlink('usa.txt', function (err) {
if (err) {
return console.error(err);
}
console.log("usa.txt deleted sucessfully")
});
const header…
arrow_forward
Q1
Consider X to be a 100-by-100 matrix. Which of the following commands will extract elements common to every 2nd row (starting from the 2nd row) and every 3rd column (starting from the 1stcolumn)?
Select one:
a.
X(2:2:end, 1:3:end)
b.
X[1:2:end, 1:3:end]
c.
X([2:, 3:])
d.
X([2:2:end, 1:3:end])
e.
X(1:2:99, 1:3:96)
Q2
Which of the following syntax performs the forward elimination process in Gaussian elimination? Here r represents the row index, c is the column index and n represents the number of rows in the augmented matrix.
Select one:
a.
Aug(r,:) = Aug(r,:) - factor*Aug(c,:)
b.
Aug(r,c) = Aug(r,c) - factor*Aug(c,:)
c.
Aug(r,[c n+1]) = Aug(r,[c n+1]) - factor*Aug(r,[c n+1])
d.
Aug(r,n+1) = Aug(r,n+1) - factor*Aug(c,:)
e.
Aug(r,c) = Aug(r,:) - factor*Aug(c,c)
Clear my choice
Q3
Which syntax will solve for the differential equation using the in-built function ode45() with a time interval of 0 to 10 and an initial condition of 0?…
arrow_forward
Exactly what does it mean when someone refers to the Dataset object?
arrow_forward
Design a direct file organization using a hash function, to store an item file with item number as its primary key. The primary keys of a sample set of records of the item file are listed below. Assume that the buckets can hold two records each and the blocks in the primary storage area can accommodate a maximum of four records each. Make use of the hash function h(k) = k mod 8, where k represents the numerical value of the primary key (item number).369 760 692 871 659 975 981 115 620 208 821 111 554 781 181 965
don't copy bartleby old answer its wrong
arrow_forward
Design a direct file organization using a hash function, to store an item file with item number as its primary key. The primary keys of a sample set of records of the item file are listed below. Assume that the buckets can hold two records each and the blocks in the primary storage area can accommodate a maximum of four records each. Make use of the hash function h(k) = k mod 8, where k represents the numerical value of the primary key (item number).369 760 692 871 659 975 981 115 620 208 821 111 554 781 181 965
arrow_forward
The data in flat files has been provided:
INVOICE TABLE
INVOICE_NUM
CUSTOMER_ID
INVOICE_DATE
EMPLOYEE_ID
COIN_ID
DELIVERY_ID
8111
11011
15 May 2021
emp103
7111
511
8112
11013
15 May 2021
emp101
7116
512
8113
11012
17 May 2021
emp101
7112
513
8114
11015
17 May 2021
emp102
7111
514
8115
11011
17 May 2021
emp102
7115
515
8116
11015
18 May 2021
emp103
7115
516
8117
11012
19 May 2021
emp105
7112
517
8118
11013
19 May 2021
emp105
7112
517
COIN_RETURNS TABLE
RETURN_ID
RETURN_DATE
REASON
CUSTOMER_ID
COIN_ID
EMPLOYEE_ID
ret001
25 May 2021
Customer not satisfied with product
11011
7116
emp101
ret002
25 May 2021
Product missing part
11013
7114
emp103
COIN TABLE
COIN_ID
PRODUCT
PRICE
QTY
7111
1oz Gold Kruger Rand
R 5 999
10
7112
1oz Silver Kruger Rand
R 12 999
8
7113
Gold Big 5 Uncirculated
R 15 999
8
7114
Silver Big 5 Pack
R 7 999
5
7115
1oz Gold Palaeontology
R 11 999
15
7116
1oz Silver Palaeontology
R 7 999
12
COIN_DELIVERY TABLE…
arrow_forward
Cource : Data structure (please read the following bolded words !!!!!!)
Write the code in C++ and provide a link for CPP File Please don't send a picture of the code, send the code as a text !!
Write the code in C++ not C !!
Problem: Create an employee Record Management system using linked listthat can perform the following operations:• Insert employee record• Delete employee record• Update employee record• Show employee• Search employee• Update salary
The employee record should contain the following items• Name of Employee• ID of Employee• First day of work• Phone number of the employee• Address of the employee• Work hours• Salary
Approach:With the basic knowledge of operations on Linked Lists like insertion, deletion of elementsin the Linked list, the employee record management system can be created. Below are thefunctionalities explained that are to be implemented:Check Record: It is a utility function of creating a record it checks before insertionthat the Record Already Exist or…
arrow_forward
convert UNF for 1NF to 2NF to 3NF, progressively.
UNF:Order(OrderID, OrderDate, CustID, CustName, CustPhone, CCNum, CCExpDate, CCBank, BnkContName, BnkContPhone, CustEmail, OrderIP, SiteRefFrom, ShipStreet, ShipCity, ShipSt, ShipZip, OrderLineNum, ItemID, ItemName, ItemDesc, ItemQtyOrdered,ItemListPrice, ItemSalePrice, ItemQtyShip, ShipCharge, Tax, TotalDue)
arrow_forward
Display a function that can rank the dataset based on subjectivity on Twitter scraping
arrow_forward
TODO 11
Using the clean_df split our columns into features and labels.
Index/slice our label 'area' and store the output into the variable y.
Index/slice all other features EXCEPT 'area' into the variable X. To do so you can use the Pandas DataFrame drop() method or slicing with iloc, loc or [ ].
# TODO 11.1y = display(y)
todo_check([ (y.shape == (517,), 'y does not have the correct shape of (517,)'), (np.all(np.isclose(y.values[-5:], np.array([2.00687085, 4.01259206, 2.49815188, 0. , 0. ]),rtol=.01)),'y has the incorrect values'),])
# TODO 11.2X = display(X)
todo_check([ (X.shape == (517, 29), 'X does not have the correct shape of (517, 29)! Make sure the `area` column is not included!'), (np.all(np.isclose(X.values[-5:, -4], np.array([27.8, 21.9, 21.2, 25.6, 11.8]),rtol=.01)),'X has the incorrect values'),])
arrow_forward
SEE MORE QUESTIONS
Recommended textbooks for you
Database System Concepts
Computer Science
ISBN:9780078022159
Author:Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher:McGraw-Hill Education
Starting Out with Python (4th Edition)
Computer Science
ISBN:9780134444321
Author:Tony Gaddis
Publisher:PEARSON
Digital Fundamentals (11th Edition)
Computer Science
ISBN:9780132737968
Author:Thomas L. Floyd
Publisher:PEARSON
C How to Program (8th Edition)
Computer Science
ISBN:9780133976892
Author:Paul J. Deitel, Harvey Deitel
Publisher:PEARSON
Database Systems: Design, Implementation, & Manag...
Computer Science
ISBN:9781337627900
Author:Carlos Coronel, Steven Morris
Publisher:Cengage Learning
Programmable Logic Controllers
Computer Science
ISBN:9780073373843
Author:Frank D. Petruzella
Publisher:McGraw-Hill Education
Related Questions
- 16-7. Automated Title: In eq_world_map_3.py, we specified the title manually when defining my_layout, which means we have to remember to update the title every time the source file changes. Instead, you can use the title for the data set in the metadata part of the JSON file. Pull this value, assign it to a variable, and use this for the title of the map when you’re defining my_layout.arrow_forwardV5. min jee wants to combine the sales data from each of the art fairs. switch to the combined sales worksheet and then update the worksheet as follows. in cell A5 enter a formula without using a function that references cell a5 in the Madison worksheet. copy the formula from cell a5 to the range a6:a8 without copying the formatting. in cell b5, enter a formula using the sum function, 3d references, grouped worksheets that totals the values from cell b:5 in the chicago: madison worksheets.arrow_forwardnycflights13::flights Q4<- flights %>%filter(carrier == "JFK") %>%summarise(average_dist = mean(distance)%>%summarise(max_dist = max(average_dist))%>%group_by(Q4, month, day)%>%head(Q4[order(dat$month),(dat$day),(dat$max_dist)],n=5) *not pictured, I did upload, tidy verse, dplyr, and the nycflights13 libraries. Error: Incomplete expression: Q4<- flights %>% filter(carrier == "JFK") %>% summarise(average_dist = mean(distance)%>% summarise(max_dist = max(average_dist))%>% group_by(Q4, month, day)%>% head(Q4[order(dat$month),(dat$day),(dat$max_dist)],n=5) RStudio Question: I can't figure out what is wrong with my code. Could someone take a look at it? See below for what I am trying to do, my code and my error output. I am trying to find what 5 days of the year had the highest mean distance from JFK airport, using the nycflights13 library. I want to format it in month, day, andvmean distance.arrow_forward
- dataframe = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/Data/regression_sprint/titanic_train_raw.csv') df = pd.read_csv('https://raw.githubusercontent.com/Explore-AI/Public-Data/master/Data/regression_sprint/titanic_test_raw.csv') Write a function that takes in as input a dataframe and a column name, and returns the mean for numerical columns and the mode for non-numerical columns. Function Specifications: The function should take two inputs: (df, column_name), where df is a pandas DataFrame, column_name is a str. If the column_name does not exist in df, raise a ValueError. Should return as output the mean if the specified column is numerical and return a list of the mode(s) otherwise. The mean should be rounded to 2 decimal places. If there is more than one mode for a given non-numerical column, the fuction should return a list of all modes. def calc_mean_mode(df, column_name): # your code here return calc_mean_mode(df,'Age') Expected Outputs:…arrow_forward3. Given the XML document NKU_Programs.xml file, a function that prints the contents ofthe Program elements in a table as shown below.Program Name College Name Student Count Year Started Type*************************************************************************************Data Science Informatics 100 2015 UComputer Science Informatics 400 1990 UBiology Science 300 2000 G HERE IS THE NKU_Programs.xml FILE <NKUPRograms><Program studCount="100" type="Undergraduate"><Name>Data Science</Name><YearStarted>2015</YearStarted><CollegeName>Informatics</CollegeName></Program><Program studCount="400" type="Undergraduate"><Name>Computer Science</Name><YearStarted>1990</YearStarted><CollegeName>Informatics</CollegeName></Program><Program studCount="300"…arrow_forward1)Consider the following DTD. 
All unspecified elements are #PCDATA.<!DOCTYPE bibliography [<!ELEMENT book (title,author+,year,publisher,place?)><!ELEMENT article (title,author+,journal,year,number,volume,pages?)><!ELEMENT author (last_name, first_name)>...]>Write the following queries in XQuery. Assume Last names of authors are unique.(a) List all books authored by Ullman.(b) List articles published in Vol. 32 No. 2 of ACM Transactions on Database Systems.(c) List journals that published an article on XML in 2001 (that is, “XML” appears in the title of thearticle).(d) List authors that have published a book in 2001 and another book in 2003.(e) List publishers, and for each publisher list the books they have published as subelements. Chooseappropriate tags for your output.(f) List every author, and for each author list the number of articles he/she published in 2001. Chooseappropriate tags for your output.arrow_forward
1) Consider the following DTD. All unspecified elements are #PCDATA.

<!DOCTYPE bibliography [
<!ELEMENT book (title, author+, year, publisher, place?)>
<!ELEMENT article (title, author+, journal, year, number, volume, pages?)>
<!ELEMENT author (last_name, first_name)>
...
]>

Write the following queries in XQuery. Assume last names of authors are unique.
(a) List all books authored by Ullman.
(b) List articles published in Vol. 32, No. 2 of ACM Transactions on Database Systems.
(c) List journals that published an article on XML in 2001 (that is, "XML" appears in the title of the article).
(d) List authors that have published a book in 2001 and another book in 2003.
(e) List publishers, and for each publisher list the books they have published as subelements. Choose appropriate tags for your output.
(f) List every author, and for each author list the number of articles he/she published in 2001. Choose appropriate tags for your output.
Answer parts (d), (e), and (f).

Exercise 6 (Operations on DataFrame)
Step a: Create two dataframes df1 and df2 as follows:
import numpy as np
import pandas as pd
rng = np.random.RandomState(100)
df1 = pd.DataFrame(rng.randint(0, 100, (4, 3)), columns=['A', 'B', 'C'])
df2 = pd.DataFrame(rng.randint(0, 100, (3, 4)), columns=['A', 'B', 'C', 'D'])
Step b: Create a new dataframe df which is the summation of df1 and df2.
Step c: Subtract half of column 'C' in df1 from all columns of df. (Remark: the values in df should be updated.)
Step d: Replace the NaN values in df with 10. (Remark: the values in df should be updated.)
Step e: Use df.apply() to calculate the summation of the numbers in each row of df, and show the result. (Remark: the result should be a vector of four values.)

get_total_cases() takes a 2D list (similar to a database) and an integer x from the set {0, 1, 2} as input parameters. Here, 0 represents Case_Reported_Date, 1 represents Age_Group, and 2 represents Client_Gender (these are the fields on the header row; the integer value is the index of each field on that row). The function computes the total number of reported cases for each instance of x in the text file and stores this information in a dictionary of the form {an_instance_of_x : total_case}. Finally, it returns the dictionary and the total number of all reported cases saved in this dictionary.
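Steps b through e of the DataFrame exercise above could be sketched as follows; this is one possible solution, not a definitive one (note that df1 + df2 aligns on both labels, so row 3 and column 'D' of the sum start out as NaN).

```python
import numpy as np
import pandas as pd

# Step a: the two dataframes exactly as specified in the exercise
rng = np.random.RandomState(100)
df1 = pd.DataFrame(rng.randint(0, 100, (4, 3)), columns=['A', 'B', 'C'])
df2 = pd.DataFrame(rng.randint(0, 100, (3, 4)), columns=['A', 'B', 'C', 'D'])

# Step b: element-wise sum; unmatched rows/columns become NaN
df = df1 + df2

# Step c: subtract half of df1's column 'C' from every column of df, row-wise
df = df.sub(df1['C'] / 2, axis=0)

# Step d: replace the remaining NaN values with 10
df = df.fillna(10)

# Step e: apply a sum over each row -> a Series of four values
row_sums = df.apply(np.sum, axis=1)
```

Using `axis=0` in `df.sub()` aligns the subtracted Series on the row index, which is what "subtract from all columns" requires here.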
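A minimal sketch of get_total_cases() matching the description above, assuming the 2D list's first row is the header (the field values themselves are illustrative):

```python
def get_total_cases(data, x):
    """Count reported cases per distinct value of column x (0, 1, or 2).

    data is a 2D list whose first row is the header:
    [Case_Reported_Date, Age_Group, Client_Gender].
    Returns (counts, total): counts maps each instance of column x to its
    number of reported cases, total is the sum of all counted cases.
    """
    counts = {}
    for row in data[1:]:                      # skip the header row
        key = row[x]
        counts[key] = counts.get(key, 0) + 1  # one reported case per row
    return counts, sum(counts.values())
```

For example, with x = 1 (Age_Group) each dictionary key is an age group and each value is how many case rows carry that group.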
Consider the given iris data set. This dataset contains 3 classes of 15 instances each, and each class refers to a type of iris plant. The dataset has two features: sepal length and sepal width. The third column, Species, holds the type of plant. A new plant has been identified; you have to classify the Species of the newly identified plant with the help of the KNN algorithm. Note: before applying KNN, modify the given data by adding the last two digits of your registration number. For example, if the last two digits of your registration number are 23, the first row in the given table becomes 28.3 and 26.3.

Question 4 (use sku.csv and WarehouseLocations.csv)
def warehouse_stats(sku):
- Read sku.csv with CSV and create a dictionary of the New SKU statistics.
- The New SKU should be the key, with the corresponding value being an inner dictionary containing the following statistics:
  - 350 Loc: True if not 0
  - Warehouse Qty
  - Forcasted Qty
  - Items/Day: CuFt/Day divided by Item Cube, as a float rounded to 5 decimal places.
- In your warehouse dictionary, add an inner dictionary with key Totals which contains:
  - Total Qty in Warehouse as key "Qty": do not add to Totals if '350 Loc' is not a valid location.
  - Number of valid 350 Loc as key "Valid"
Data cleaning steps:
- In some variants of New Sku #, the Item Cube and CuFt/Day are faulty. Fix the manufacturer's mistake: if either is less than or equal to 0, Item Cube can be assumed to be 5.0 and CuFt/Day is 10% of the Forcasted Qty of the New…

In R, please provide the code and an explanation for the following one-way ANOVA with the coagulation data set.
a. Load the coagulation data set (it is from the faraway library).
b. Data summaries & assumption check:
   i. Use the names() function to identify the column names.
   ii. How many rows of data are there?
   iii. Create a single graph with 4 boxplots on the same scale, one for the coagulation values of each diet. Each boxplot should be a different color. Use the plot function for this task. colNameData is the name of the column in the data set that you want to create a boxplot for; colNameCategory is the name of the column used to split the data into groups (one boxplot for each category/value in that column); dataset is the name of the full dataset; col=2:4 gives 3 different colors (color indices are 1–8, then they repeat).
   iv. Create 4 different data frames, one for the data corresponding to each of the 4 factors. How many observations are there for…
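The KNN classification asked for above can be sketched in plain Python. The iris table itself is not reproduced here, so the training rows below are hypothetical stand-in values; the distance-then-majority-vote logic is the actual KNN technique.

```python
import math

def knn_predict(train, query, k=3):
    """Classify query = (sepal_length, sepal_width) by majority vote
    among the k nearest rows of train = [(sepal_length, sepal_width, species), ...]."""
    # Euclidean distance from the query to every training row
    neighbours = sorted((math.dist(row[:2], query), row[2]) for row in train)
    votes = [species for _, species in neighbours[:k]]
    return max(set(votes), key=votes.count)   # majority vote among the k nearest

# Hypothetical sample rows standing in for the (not shown) iris table:
train = [
    (5.1, 3.5, "setosa"), (4.9, 3.0, "setosa"),
    (7.0, 3.2, "versicolor"), (6.4, 3.2, "versicolor"),
    (6.3, 3.3, "virginica"),
]
label = knn_predict(train, (5.0, 3.4), k=3)   # -> "setosa"
```

Per the question's note, you would first add your registration-number digits to every feature value before computing distances; that shift applies equally to the training rows and the query.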