Prediction using Orange (.ows) on Loan Status.
Analyzing the factors that can influence loan status, and then classifying whether a borrower fully paid the loan or was charged off.
This project analyzes the given loan parameters to predict whether a borrower has fully paid the loan or has been charged off. The analysis was carried out in Orange, using a variety of its widgets.
First, the .CSV file was loaded and the LOAN STATUS column was set as the target. The Rank widget (from Orange's Data category) was then used, since ranking gives a quick sense of which features matter most for predicting the target. The 11 top-ranked features were then selected.
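To give a feel for what ranking does, here is a minimal sketch in plain Python that scores features by information gain, which is one of the scoring methods the Rank widget offers. The article does not state which scoring method was actually used, and the toy columns below are hypothetical, not taken from the dataset.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(feature, labels):
    """Reduction in target entropy after splitting on a categorical feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data for illustration only
loan_status = ["Fully Paid", "Fully Paid", "Charged Off", "Charged Off"]
features = {
    "home": ["Rent", "Own", "Rent", "Own"],       # tells us nothing about the target
    "term": ["Short", "Short", "Long", "Long"],   # separates the target perfectly
}

# Rank feature names from most to least informative
ranking = sorted(
    features,
    key=lambda name: information_gain(features[name], loan_status),
    reverse=True,
)
print(ranking)
```

Selecting the top 11 features would then amount to slicing the ranked list, for example `ranking[:11]`.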
Next, the data was checked with the Data Table widget, which showed that 8.3% of the values were missing. The missing values were therefore filled in with the Impute widget, using the mean for numeric features and the mode for categorical ones.
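The idea behind mean and mode imputation can be sketched in a few lines of plain Python. This is only an illustration of the technique, not what Orange runs internally, and the two toy columns are hypothetical.

```python
from statistics import mean, mode

def impute_column(values):
    """Fill None entries with the mean (numeric column) or the mode (otherwise)."""
    present = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)   # numeric column: use the average
    else:
        fill = mode(present)   # categorical column: use the most frequent value
    return [fill if v is None else v for v in values]

# Hypothetical toy columns for illustration only
credit_score = [700, None, 720, 680]
term = ["Short Term", "Long Term", None, "Short Term"]

imputed_score = impute_column(credit_score)
imputed_term = impute_column(term)
print(imputed_score)
print(imputed_term)
```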
Different models were then trained and evaluated with the Test & Score widget.
The two model setups that were used are:
- Combination of Naive Bayes & Decision Tree
- Random Forest
- Naive Bayes & Tree were used because:
The Naive Bayes model can deal with both continuous and discrete data. It scales well with the number of predictors and data points. It is fast enough to be used for real-time predictions, and it is relatively insensitive to irrelevant features.
The Decision Tree can be used for both numerical (regression) and categorical (classification) problems. Its drawback is that it tends to overfit the data. Overfitting can be reduced with a pre-pruning approach, for instance by limiting the tree's growth so that it has fewer leaves and branches.
The combination of Naive Bayes & Decision Tree was used because each has strengths that the other lacks. For instance, Naive Bayes is commonly used for text classification and spam filtering, while decision trees are commonly used for pattern and sequence recognition and in financial applications. Used together, they complement each other.
- Random Forest was used because:
Random Forest is a tree-based learning algorithm that makes accurate predictions by combining many decision trees. As its name says, it is a forest of trees. Hence, Random Forest takes more training time than a single decision tree. Each tree in the forest is built using random subsets of the features to predict the output. The algorithm then combines the predictions of the individual decision trees to generate the final prediction. Some implementations can also deal with missing values.
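The combining step can be sketched as a simple majority vote. In the example below, the three "trees" are hand-written stand-in rules with hypothetical feature names and thresholds, not trained models, and real implementations may average class probabilities instead of counting votes.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical "trees", each a hand-written rule on a different feature.
# In a real forest each tree is learned from a random sample of rows and features.
def tree_a(row):
    return "Charged Off" if row["credit_score"] < 650 else "Fully Paid"

def tree_b(row):
    return "Charged Off" if row["term"] == "Long Term" else "Fully Paid"

def tree_c(row):
    return "Charged Off" if row["annual_income"] < 30000 else "Fully Paid"

# Hypothetical applicant for illustration only
applicant = {"credit_score": 700, "term": "Long Term", "annual_income": 55000}

votes = [tree(applicant) for tree in (tree_a, tree_b, tree_c)]
forest_prediction = majority_vote(votes)
print(votes, "->", forest_prediction)
```

Because two of the three trees agree, the single dissenting tree does not change the outcome, which is the intuition behind a forest being more stable than one tree.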
After Test & Score, the Confusion Matrix widget was used to inspect the true positives, false negatives, and so on. Lastly, the Distributions widget was used to check the results visually.
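For readers unfamiliar with the terms, the four cells of a binary confusion matrix can be counted as in the sketch below. It assumes CHARGED OFF is treated as the positive class, which the article does not specify, and the labels are made-up toy values.

```python
def confusion_counts(actual, predicted, positive="Charged Off"):
    """Count TP, FP, FN, TN for a binary problem with the given positive class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

# Hypothetical toy labels for illustration only
actual = ["Charged Off", "Charged Off", "Fully Paid", "Fully Paid", "Fully Paid"]
predicted = ["Charged Off", "Fully Paid", "Fully Paid", "Charged Off", "Fully Paid"]

counts = confusion_counts(actual, predicted)
print(counts)
```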
Conclusion
The two models produced different results, so their results were averaged. On that basis, only 5.52% of the population in LOAN STATUS comes under CHARGED OFF, and the rest, i.e., 94.48% of the population, comes under FULLY PAID. The total population is 79.25k, which is likewise an average across the two results.
It can also be said that Random Forest is a better model than the Naive Bayes and Decision Tree combination because it has a better AUC. AUC is a useful basis for comparison because:
- AUC is scale-invariant, i.e., it measures how well predictions are ranked, irrespective of their absolute values.
- AUC is also classification-threshold-invariant, i.e., it measures the quality of the model's predictions irrespective of which classification threshold is chosen.
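Scale invariance can be checked with a small sketch. Here AUC is computed directly from its ranking definition, the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The labels and scores are hypothetical.

```python
def auc(labels, scores):
    """AUC as the share of positive/negative pairs that are ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical toy labels and model scores for illustration only
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.1]

# Shrinking every score changes the absolute values but not the ranking
rescaled = [s / 10 for s in scores]

original_auc = auc(labels, scores)
rescaled_auc = auc(labels, rescaled)
print(original_auc, rescaled_auc)
```

No threshold appears anywhere in the calculation, which is why the score does not depend on the cutoff chosen for classification.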
This dataset (.CSV file) is taken from Kaggle.
Filename: Credit_train
Contacts
In case you have any questions or any suggestions on what my next article should be about, please leave a comment below or mail me at aryanbajaj104@gmail.com
If you want to keep updated with my latest articles and projects, follow me on Medium.