We have a dataset of user performances. The project is to develop a machine learning model that will predict the salaries of baseball players.

Overview

Salary-Prediction-with-Machine-Learning

image

1. Business Problem

Can a machine learning project be implemented to estimate the salaries of baseball players whose salary information and career statistics for 1986 are shared?

2. Dataset Story

  • This dataset was originally taken from the StatLib library at Carnegie Mellon University.
  • The data set is part of the data used in the 1988 ASA Graphics Section Poster Session.
  • Salary data originally taken from Sports Illustrated, April 20, 1987.
  • 1986 and career statistics, Collier Books, Macmillan Publishing Company.

3. Variables

  • AtBat: Number of hits with a baseball bat during the 1986-1987 season
  • Hits: the number of hits in the 1986-1987 season
  • HmRun: Most valuable hits in the 1986-1987 season
  • Runs: The points he earned for his team in the 1986-1987 season
  • RBI: The number of players a batter had jogged when he hit
  • Walks: Number of mistakes made by the opposing player
  • Years: Player's playing time in major league (years)
  • CAtBat: The number of times the player hits the ball during his career
  • CHits: The number of hits the player has made throughout his career
  • CHmRun: The player's most valuable number during his career
  • CRuns: The number of points the player has earned for his team during his career
  • CRBI: The number of players the player has made during his career
  • CWalks: The number of mistakes the player has made to the opposing player during his career
  • League: A factor with levels A and N, showing the league in which the player played until the end of the season
  • Division: a factor with levels E and W, indicating the position played by the player at the end of 1986
  • PutOuts: Helping your teammate in-game
  • Assits: The number of assists the player made in the 1986-1987 season
  • Errors: the number of errors of the player in the 1986-1987 season
  • Salary: The salary of the player in the 1986-1987 season (over thousand)
  • NewLeague: a factor with levels A and N indicating the league of the player at the beginning of the 1987 season

TASK

Salary using data preprocessing and feature engineering techniques develop a forecasting model

Owner
Ayşe Nur Türkaslan
I continue my studies in the field of Data Science and Artificial Intelligence. I want to turn my efforts into a contribution using Github
Ayşe Nur Türkaslan
Software Engineer Salary Prediction

Based on 2021 stack overflow data, this machine learning web application helps one predict the salary based on years of experience, level of education and the country they work in.

Jhanvi Mimani 1 Jan 08, 2022
AI and Machine Learning with Kubeflow, Amazon EKS, and SageMaker

Data Science on AWS - O'Reilly Book Get the book on Amazon.com Book Outline Quick Start Workshop (4-hours) In this quick start hands-on workshop, you

Data Science on AWS 2.8k Jan 03, 2023
A simple and lightweight genetic algorithm for optimization of any machine learning model

geneticml This package contains a simple and lightweight genetic algorithm for optimization of any machine learning model. Installation Use pip to ins

Allan Barcelos 8 Aug 10, 2022
Kalman filter library

The kalman filter framework described here is an incredibly powerful tool for any optimization problem, but particularly for visual odometry, sensor fusion localization or SLAM.

comma.ai 276 Jan 01, 2023
Mixing up the Invariant Information clustering architecture, with self supervised concepts from SimCLR and MoCo approaches

Self Supervised clusterer Combined IIC, and Moco architectures, with some SimCLR notions, to get state of the art unsupervised clustering while retain

Bendidi Ihab 9 Feb 13, 2022
icepickle is to allow a safe way to serialize and deserialize linear scikit-learn models

icepickle It's a cooler way to store simple linear models. The goal of icepickle is to allow a safe way to serialize and deserialize linear scikit-lea

vincent d warmerdam 24 Dec 09, 2022
Model Validation Toolkit is a collection of tools to assist with validating machine learning models prior to deploying them to production and monitoring them after deployment to production.

Model Validation Toolkit is a collection of tools to assist with validating machine learning models prior to deploying them to production and monitoring them after deployment to production.

FINRA 25 Dec 28, 2022
This project used bitcoin, S&P500, and gold to construct an investment portfolio that aimed to minimize risk by minimizing variance.

minvar_invest_portfolio This project used bitcoin, S&P500, and gold to construct an investment portfolio that aimed to minimize risk by minimizing var

1 Jan 06, 2022
BudouX is the successor to Budou, the machine learning powered line break organizer tool.

BudouX Standalone. Small. Language-neutral. BudouX is the successor to Budou, the machine learning powered line break organizer tool. It is standalone

Google 868 Jan 05, 2023
PyPOTS - A Python Toolbox for Data Mining on Partially-Observed Time Series

A python toolbox/library for data mining on partially-observed time series, supporting tasks of forecasting/imputation/classification/clustering on incomplete multivariate time series with missing va

Wenjie Du 179 Dec 31, 2022
Decision Tree Regression algorithm implemented on Python from scratch.

Decision_Tree_Regression I implemented the decision tree regression algorithm on Python. Unlike regular linear regression, this algorithm is used when

1 Dec 22, 2021
GRaNDPapA: Generator of Rad Names from Decent Paper Acronyms

Generator of Rad Names from Decent Paper Acronyms

264 Nov 08, 2022
Estudos e projetos feitos com PySpark.

PySpark (Spark com Python) PySpark é uma biblioteca Spark escrita em Python, e seu objetivo é permitir a análise interativa dos dados em um ambiente d

Karinne Cristina 54 Nov 06, 2022
A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.

pmdarima Pmdarima (originally pyramid-arima, for the anagram of 'py' + 'arima') is a statistical library designed to fill the void in Python's time se

alkaline-ml 1.3k Dec 22, 2022
Machine Learning University: Accelerated Natural Language Processing Class

Machine Learning University: Accelerated Natural Language Processing Class This repository contains slides, notebooks and datasets for the Machine Lea

AWS Samples 2k Jan 01, 2023
Predict the income for each percentile of the population (Python) - FRENCH

05.income-prediction Predict the income for each percentile of the population (Python) - FRENCH Effectuez une prédiction de revenus Prérequis Pour ce

1 Feb 13, 2022
pandas, scikit-learn, xgboost and seaborn integration

pandas, scikit-learn and xgboost integration.

299 Dec 30, 2022
100 Days of Machine and Deep Learning Code

💯 Days of Machine Learning and Deep Learning Code MACHINE LEARNING TOPICS COVERED - FROM SCRATCH Linear Regression Logistic Regression K Means Cluste

Tanishq Gautam 66 Nov 02, 2022
A linear equation solver using gaussian elimination. Implemented for fun and learning/teaching.

A linear equation solver using gaussian elimination. Implemented for fun and learning/teaching. The solver will solve equations of the type: A can be

Sanjeet N. Dasharath 3 Feb 15, 2022
Steganography is the art of hiding the fact that communication is taking place, by hiding information in other information.

Steganography is the art of hiding the fact that communication is taking place, by hiding information in other information.

Priyansh Sharma 7 Nov 09, 2022