Exploratory Data Analysis

On this page, we use data visualizations to explore the key insights in this dataset.

Approach description

For all the visualizations, we used Python with the popular visualization packages Matplotlib and Seaborn.





Home Ownership vs Interest Rate

From this plot, we can see that "Mortgage" is the most common home ownership type and has the lowest interest rate. "Rent", on the other hand, is less common but has the highest interest rate.
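A comparison like the one above can be sketched as follows. This is only a minimal illustration, not the code behind the original figure: the column names `homeownership` and `interest_rate` are assumptions, and the rows are toy values made up so the snippet runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy rows for illustration only; the column names are assumed, not taken from the dataset.
loans = pd.DataFrame({
    "homeownership": ["MORTGAGE", "MORTGAGE", "MORTGAGE", "RENT", "RENT", "OWN"],
    "interest_rate": [10.0, 11.0, 12.0, 13.0, 14.0, 12.5],
})

# One row per ownership type: how many loans it has and their mean interest rate.
summary = (
    loans.groupby("homeownership")["interest_rate"]
    .agg(count="count", mean_rate="mean")
    .sort_values("count", ascending=False)
)

# Two panels side by side: frequency on the left, mean rate on the right.
fig, (ax_count, ax_rate) = plt.subplots(1, 2, figsize=(10, 4))
summary["count"].plot.bar(ax=ax_count, title="Loans per home ownership type")
summary["mean_rate"].plot.bar(ax=ax_rate, title="Mean interest rate (%)")
fig.tight_layout()
```

Putting the count and the mean rate in separate panels keeps the two scales from distorting each other; call `fig.savefig(...)` or `plt.show()` to view the result.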

Loan purpose count

From the bar chart, we can see that "debt consolidation" is the most common reason for a loan application. "Credit card" is the second most common, but its count is less than half that of the first.
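One way to draw such a count chart and check the "less than half" observation is sketched below. The column name `loan_purpose` and the toy rows are assumptions made for illustration, so the numbers do not reflect the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy rows for illustration only; `loan_purpose` is an assumed column name.
loans = pd.DataFrame({
    "loan_purpose": ["debt_consolidation"] * 5 + ["credit_card"] * 2 + ["other"],
})

# value_counts() sorts from most to least frequent, which gives the bar order.
purpose_counts = loans["loan_purpose"].value_counts()

fig, ax = plt.subplots(figsize=(8, 4))
sns.countplot(data=loans, y="loan_purpose",
              order=purpose_counts.index.tolist(), ax=ax)
ax.set_xlabel("Number of loans")

# Ratio of the second most common purpose to the most common one.
second_to_first = purpose_counts.iloc[1] / purpose_counts.iloc[0]
```

A `second_to_first` value below 0.5 corresponds to the second bar not reaching half the length of the first.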

Loan amount vs Loan status

From the box plot, we can see that the "Fully paid" status has a low median loan amount overall. The "Late (31 - 120 days)" and "In Grace Period" statuses, by contrast, have relatively higher loan amounts, including a higher minimum. The plot also suggests that the highest loan this company offers is $40,000, since the maximum amount is the same $40,000 in every category.
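The box plot and the check for a shared maximum could be sketched like this. The column names `loan_status` and `loan_amount` are assumptions, and the toy values are chosen only so that every status reaches the same maximum, as described above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy rows for illustration only; column names are assumed.
loans = pd.DataFrame({
    "loan_status": ["Fully Paid"] * 3
                   + ["Late (31-120 days)"] * 3
                   + ["In Grace Period"] * 3,
    "loan_amount": [2000, 10000, 40000,
                    15000, 20000, 40000,
                    12000, 18000, 40000],
})

# Numeric summary behind the boxes: minimum, median, and maximum per status.
stats = loans.groupby("loan_status")["loan_amount"].agg(["min", "median", "max"])

fig, ax = plt.subplots(figsize=(9, 4))
sns.boxplot(data=loans, x="loan_status", y="loan_amount", ax=ax)
ax.set_ylabel("Loan amount ($)")

# True when every status shares one maximum, which hints at a lending cap.
shared_cap = stats["max"].nunique() == 1
```

Reading the medians and extremes from `stats` is more reliable than eyeballing the boxes, especially when several categories look similar.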

Interest Rate vs Grade

From this plot, it is clear that the interest rate is strongly related to the grade. We assume that the grades are ordinal: grade A covers the applicants classified as the most creditworthy, who therefore enjoy the lowest interest rate, while grade F covers the applicants with the worst records or histories, who therefore pay the highest interest rate.
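The ordinal assumption can be made explicit in code by declaring the grade as an ordered category, then checking whether the median rate rises from A to F. This is a sketch under assumptions: the column names `grade` and `interest_rate` and the toy rows are illustrative, not taken from the dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy rows for illustration only; column names are assumed.
loans = pd.DataFrame({
    "grade": ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E", "F", "F"],
    "interest_rate": [6.0, 7.0, 9.0, 10.0, 13.0, 14.0,
                      18.0, 19.0, 23.0, 24.0, 28.0, 29.0],
})

# Encode the assumed ordering A < B < ... < F as an ordered categorical.
grade_order = ["A", "B", "C", "D", "E", "F"]
loans["grade"] = pd.Categorical(loans["grade"], categories=grade_order, ordered=True)

# Median rate per grade, listed in grade order.
median_rate = loans.groupby("grade", observed=True)["interest_rate"].median()

fig, ax = plt.subplots(figsize=(8, 4))
sns.boxplot(data=loans, x="grade", y="interest_rate", order=grade_order, ax=ax)
ax.set_ylabel("Interest rate (%)")

# True when the median never decreases from one grade to the next.
rate_rises_with_grade = median_rate.is_monotonic_increasing
```

Passing `order=grade_order` to the plot keeps the x-axis in grade order regardless of how the rows happen to be sorted.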

Histogram of the Loan Amount

From the histogram, we can see that the most common loan amounts fall between $8,800 and $12,700.
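The fullest bin of a histogram can be read off numerically rather than by eye. The sketch below uses toy amounts, not the real data. It also relies on an assumption about the binning: if the amounts ranged from $1,000 to $40,000 and were split into ten equal-width bins, each bin would be $3,900 wide, and the third bin would run from $8,800 to $12,700, which matches the range quoted above. The toy values are chosen so that this bin is the fullest.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Toy loan amounts for illustration only (assumed range: 1,000 to 40,000).
amounts = np.array([1000, 5000, 9000, 10000, 12000, 12500,
                    15000, 20000, 30000, 40000])

# Ten equal-width bins between the smallest and largest amount.
counts, edges = np.histogram(amounts, bins=10)

# Locate the bin with the most loans and report its lower and upper edge.
top = int(counts.argmax())
modal_bin = (float(edges[top]), float(edges[top + 1]))

# Draw the histogram with the same edges so the plot matches the numbers.
fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(amounts, bins=edges, edgecolor="black")
ax.set_xlabel("Loan amount ($)")
ax.set_ylabel("Number of loans")
```

Because the modal range depends on where the bin edges fall, a different bin count would report a different "most common" interval for the same data.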