Credit Card Fraud Detection
- I came across this mocked-up dataset of customer transactions in the [Capital One Recruitment Challenge](https://github.com/CapitalOneRecruiting/DS).
- The imbalanced dataset consists of artificial customer transactions, with a few outlier cases where fraud was detected: only ~1.6% of the transactions are fraudulent.
- Our primary goal is to predict whether a transaction is fraudulent or not while keeping false positives (genuine transactions flagged as fraud, i.e. Type-I errors when fraud is the positive class) to a minimum, as in most sensitive classification problems: we'll try not to point accusatory fingers at genuine transactions 😂.
- The secondary goal is to identify interesting anomalies in the transactions, such as multi-swipes and reversals of suspicious transactions, through exploratory data analysis.
- Most numerical fields appear to follow power-law (heavy-tailed) distributions rather than Gaussian ones.
- We'll engineer some time-dependent categorical features by parsing the datetime fields, exclude fields that hold just one categorical value (no point keeping those around 😒), and create a new feature indicating whether the credit-card CVV was entered incorrectly.
- The baseline classifiers are Logistic Regression, SVM, Random Forest, and Isolation Forest.
- Performance is kinda poor on these baseline models: accuracy, precision, and recall vary greatly from one model to another.
- Moving on to gradient-boosting models: LightGBM (Light Gradient Boosting Machine) is known to perform well on sparse datasets.
- The final accuracy hovers around 98%, and recall is approximately 99.99%, indicating that false negatives are kept to a bare minimum.
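The feature-engineering step described above (datetime parsing, dropping single-valued fields, a CVV-mismatch flag) could look roughly like the pandas sketch below. It is not the original notebook code: the column names `transactionDateTime`, `cardCVV`, `enteredCVV`, `echoBuffer`, and `isFraud` are assumptions about the challenge's schema, the helper `engineer_features` and the derived `txn_*`/`cvv_mismatch` columns are hypothetical, and the toy rows are made up.

```python
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: time features, CVV-mismatch flag, constant-column removal."""
    out = df.copy()

    # Parse the timestamp and derive time-dependent categorical features.
    ts = pd.to_datetime(out["transactionDateTime"])
    out["txn_hour"] = ts.dt.hour
    out["txn_dayofweek"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out["txn_month"] = ts.dt.month

    # 1 when the CVV typed in differs from the card's CVV, else 0.
    out["cvv_mismatch"] = (out["cardCVV"] != out["enteredCVV"]).astype(int)

    # The raw timestamp is replaced by the derived features above.
    out = out.drop(columns=["transactionDateTime"])

    # Drop columns with a single value (all-missing counts as one value): no signal there.
    constant_cols = [c for c in out.columns if out[c].nunique(dropna=False) <= 1]
    return out.drop(columns=constant_cols)


# Made-up toy rows (assumed column names).
sample = pd.DataFrame({
    "transactionDateTime": ["2016-01-08T19:04:50", "2016-03-15T02:11:07", "2016-12-31T23:59:59"],
    "cardCVV": [414, 486, 506],
    "enteredCVV": [414, 486, 560],
    "echoBuffer": [None, None, None],  # always empty -> dropped as constant
    "isFraud": [False, False, True],
})
features = engineer_features(sample)
print(features.columns.tolist())
```

Counting missing values as a value (`dropna=False`) means a column that is entirely empty is treated as constant and removed.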
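Multi-swipes, one of the anomalies mentioned for the exploratory analysis, can be surfaced with a grouped time difference. This is a minimal sketch under a hypothetical definition: a repeat of the same purchase (same account, merchant, and amount) within three minutes of the previous one. The columns `accountNumber`, `merchantName`, `transactionAmount`, and `transactionDateTime`, the helper `flag_multi_swipes`, and the 3-minute window are all assumptions, not values taken from the original analysis.

```python
import pandas as pd


def flag_multi_swipes(df: pd.DataFrame, window: str = "3min") -> pd.Series:
    """Flag repeats of the same purchase (account, merchant, amount) within `window`.

    The first swipe in a burst is not flagged; only the repeats that follow it are.
    """
    ordered = df.assign(_ts=pd.to_datetime(df["transactionDateTime"])).sort_values("_ts")
    keys = ["accountNumber", "merchantName", "transactionAmount"]
    # Time since the previous transaction with identical keys (NaT for the first one).
    gap = ordered.groupby(keys)["_ts"].diff()
    is_repeat = gap.notna() & (gap <= pd.Timedelta(window))
    # Return the flags in the caller's original row order.
    return is_repeat.reindex(df.index)


# Made-up toy transactions for illustration.
txns = pd.DataFrame({
    "accountNumber": ["A1", "A1", "A1", "A2"],
    "merchantName": ["Uber", "Uber", "Uber", "Uber"],
    "transactionAmount": [98.55, 98.55, 98.55, 98.55],
    "transactionDateTime": [
        "2016-08-13 14:27:32",
        "2016-08-13 14:28:10",  # 38 s after the first swipe -> repeat
        "2016-08-13 15:40:00",  # over an hour later -> treated as a new purchase
        "2016-08-13 14:28:15",  # different account -> not a repeat
    ],
})
txns["isMultiSwipe"] = flag_multi_swipes(txns)
print(txns[["accountNumber", "transactionDateTime", "isMultiSwipe"]])
```

Comparing each transaction only with its immediate predecessor in the same group keeps the check linear in the number of rows; a longer burst of swipes gets every swipe after the first flagged.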
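The baseline comparison can be sketched with scikit-learn. This sketch runs on synthetic data from `make_classification` with about 1.6% positives rather than on the real transactions, uses `LinearSVC` as a stand-in for the SVM, and picks `class_weight="balanced"` and `contamination=0.016` as illustrative settings; none of these choices are taken from the original analysis, and the printed scores say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the transaction features: ~1.6% positive (fraud) class.
X, y = make_classification(
    n_samples=20_000, n_features=20, n_informative=5,
    weights=[0.984], flip_y=0, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)


def score(y_true, y_pred):
    """Accuracy, precision and recall with fraud (label 1) as the positive class."""
    return (
        accuracy_score(y_true, y_pred),
        precision_score(y_true, y_pred, zero_division=0),
        recall_score(y_true, y_pred, zero_division=0),
    )


# Supervised baselines; class_weight="balanced" up-weights the rare fraud class.
supervised = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(class_weight="balanced", max_iter=1000)
    ),
    "linear_svm": make_pipeline(
        StandardScaler(), LinearSVC(class_weight="balanced", dual=False)
    ),
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=42, n_jobs=-1
    ),
}

results = {}
for name, model in supervised.items():
    model.fit(X_train, y_train)
    results[name] = score(y_test, model.predict(X_test))

# Isolation Forest is unsupervised: it never sees y_train and marks outliers as -1.
iso = IsolationForest(contamination=0.016, random_state=42).fit(X_train)
results["isolation_forest"] = score(y_test, (iso.predict(X_test) == -1).astype(int))

for name, (acc, prec, rec) in results.items():
    print(f"{name:>20}: accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

The stratified split keeps the ~1.6% fraud rate identical in the train and test sets, which matters when only a few hundred positive examples exist.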
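With only ~1.6% fraud, accuracy alone says little, and a recall figure only means something once the positive class is fixed. The worked example below uses hypothetical counts (10,000 transactions, 160 of them fraudulent) and a trivial model that labels every transaction genuine; the helper `confusion_metrics` is illustrative.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical test set: 160 fraud + 9,840 genuine transactions.
# A trivial model that predicts "genuine" for everything, scored with fraud as positive:
always_genuine = confusion_metrics(tp=0, fp=0, fn=160, tn=9_840)
# -> accuracy 0.984, precision 0.0, recall 0.0

# The same predictions scored with "genuine" as the positive class:
genuine_view = confusion_metrics(tp=9_840, fp=160, fn=0, tn=0)
# -> accuracy 0.984, precision 0.984, recall 1.0

print(always_genuine)
print(genuine_view)
```

The same do-nothing model reaches 98.4% accuracy and either 0% or 100% recall depending on which class is called positive, so reporting per-class precision and recall alongside accuracy makes results on this dataset much easier to interpret.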