5 Results

Table 5.1: Model accuracies and their corresponding confidence intervals

Model                        Accuracy   95% CI
Naive Bayes                  0.6        [0.55, 0.65]
Basic Logistic Regression    0.62       [0.57, 0.67]
Lasso Regression             0.7        [0.65, 0.74]
Support Vector Machine       0.69       [0.64, 0.74]
Random Forest                0.71       [0.66, 0.75]
Multilayer Perceptron        0.7        [0.65, 0.75]
Recurrent Neural Network     0.68       [0.63, 0.73]

Table 5.1 reports the accuracy and 95% confidence interval for each model run in this project. The basic approaches, Naive Bayes and basic logistic regression, performed noticeably worse than the other five models. The deep learning models (the multilayer perceptron and the recurrent neural network) did not perform significantly better than any of the other models except Naive Bayes. Confusion matrices for each model can be found in Section 7.
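As a rough illustration of how an interval like those in Table 5.1 can be obtained, the sketch below computes a normal-approximation (Wald) 95% interval for an accuracy. The method actually used for the table is not stated here, so the choice of a Wald interval is an assumption, as is the function name `wald_interval`. The counts are the random forest counts reported with Figure 5.1 (271 correct predictions out of 383).

```python
import math


def wald_interval(correct, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    Assumption: the report does not say which interval method was used;
    the Wald interval is shown only as one common choice.
    """
    p = correct / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Random forest counts taken from Figure 5.1:
# 147 true positives + 124 true negatives = 271 correct, out of 383 cases.
low, high = wald_interval(correct=147 + 124, n=147 + 124 + 51 + 61)
print(f"Random forest 95% CI: [{low:.2f}, {high:.2f}]")
```

Under these assumptions the result rounds to [0.66, 0.75], which agrees with the random forest row of Table 5.1. An exact binomial interval would give slightly different bounds.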

To examine these results further, we visualize the confusion matrix of the random forest model, which achieved the highest accuracy in Table 5.1.

Figure 5.1: Confusion matrix for the random forest model

Figure 5.1 shows 147 true positives, 124 true negatives, 51 false positives, and 61 false negatives. Although the overall accuracy of about 71% is not very high, the sensitivity (147/208 ≈ 0.71) and specificity (124/175 ≈ 0.71) are nearly equal, so the model's errors are reasonably balanced between the two classes.
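The accuracy, sensitivity, and specificity discussed above follow directly from the four counts in Figure 5.1. The sketch below shows the arithmetic; the function name `summarize_confusion` is a hypothetical helper introduced for illustration, not part of the project's code.

```python
def summarize_confusion(tp, tn, fp, fn):
    """Derive standard metrics from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


# Counts reported for the random forest model in Figure 5.1.
metrics = summarize_confusion(tp=147, tn=124, fp=51, fn=61)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

All three values come out close to 0.71, which is what "reasonable balance" refers to: the model is about as likely to miss a positive case as it is to misclassify a negative one.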