demir.ai Dataset Operations

Overview

With this application, you can delete or fill the empty values (nan/null) in your dataset before passing it to machine learning algorithms. You can also view visual and numerical information about your dataset and get more detailed information about its attributes.

The application is written in Python. Flask is used for the backend and HTML for the frontend. Pandas is used to navigate over the dataset. All numerical operations on the dataset were written by me without ready-made functions, and the plots were created from scratch by me using OpenCV.

Before running the application, install the required packages with the following command.

pip3 install -r requirements.txt

Launch the web application with the following command, then open http://localhost:5000/ in your browser to use it.

python3 main.py

With this web application, you can delete rows or columns that contain empty values (nan/null) from your dataset, or fill these empty values in three different ways.

  • Null value (nan) operations you can do on your dataset with demir.ai Dataset Operations:

    • Column-based deletion of null data (nan/null)
    • Row-based deletion of null data (nan/null)
    • Filling in blank data by mean, median and mode

This web application also gives you visual and numerical results about your dataset, so you can examine it in detail.

  • Information you can learn about your dataset with demir.ai Dataset Operations:

    • Mean of columns
    • Median of columns
    • Mode of columns
    • Frequency of columns
    • Interquartile range value (IQR) of columns
    • Outliers of columns
    • Five number summary of columns
    • Box Chart of columns
    • Variance and standard deviation of columns

Null value (nan/null) operations

  • Column-based deletion of null data (nan/null): The number of nulls is counted for each column and converted to a percentage. If this percentage is greater than the percentage the user enters, the column is deleted.

  • Row-based deletion of null data (nan/null): The number of nulls is counted for each row. If this number is greater than the number entered by the user, the row is deleted.

  • Filling in blank data by mean, median and mode:

    • Mean: The non-blank values of the column are summed and divided by the number of non-blank values. The resulting mean is written in place of the empty values.

    • Median: The median is calculated from the non-blank values in the column, and this median value is written in place of the empty values.

    • Mode: The mode is calculated from the non-blank values in the column, and this mode value is written in place of the empty values.
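
The null value operations described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the application's actual code: the function names (`drop_columns_by_null_percent`, `drop_rows_by_null_count`, `fill_missing`) are hypothetical, the data is held in plain Python lists instead of a Pandas DataFrame, missing values are represented by `None` or NaN, and built-ins such as `sum` and `sorted` are used for brevity.

```python
def is_missing(value):
    """Treat None and float NaN as missing (NaN is the only value not equal to itself)."""
    return value is None or (isinstance(value, float) and value != value)


def drop_columns_by_null_percent(table, max_null_percent):
    """Keep only columns whose null percentage (0-100) does not exceed the threshold.

    `table` is assumed to be a dict mapping column name -> list of values.
    """
    kept = {}
    for name, values in table.items():
        null_count = sum(1 for v in values if is_missing(v))
        null_percent = 100.0 * null_count / len(values) if values else 0.0
        if null_percent <= max_null_percent:
            kept[name] = values
    return kept


def drop_rows_by_null_count(rows, max_nulls):
    """Keep only rows (lists of values) with at most `max_nulls` missing values."""
    return [row for row in rows
            if sum(1 for v in row if is_missing(v)) <= max_nulls]


def mean(values):
    return sum(values) / len(values)


def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    best = max(counts.values())
    # Assumption: on ties, return the value that appears first in the column.
    for v in values:
        if counts[v] == best:
            return v


def fill_missing(values, strategy):
    """Replace missing entries with the mean, median or mode of the non-missing ones."""
    present = [v for v in values if not is_missing(v)]
    fill_value = {"mean": mean, "median": median, "mode": mode}[strategy](present)
    return [fill_value if is_missing(v) else v for v in values]
```

For example, `fill_missing([1.0, None, 3.0], "mean")` computes the mean from the two non-blank values only and writes it into the empty position. A column with no non-blank values at all would need separate handling, which this sketch leaves out.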

Information you can learn about your dataset

  • Mean of columns: The mean is calculated for each column separately and the column mean information is presented to the user.

  • Median of columns: The median is calculated for each column separately and the column median information is presented to the user.

  • Mode of columns: The mode is calculated for each column separately and the column mode information is presented to the user.

  • Frequency of columns: The frequency of each value is calculated for each column and presented to the user. In this section, the frequencies are also visualized with a bar plot created from scratch with OpenCV.

  • Interquartile range value (IQR) of columns: The Q1 and Q3 values are found for each column, then the IQR of the column is calculated as Q3 - Q1 and presented to the user.

  • Outliers of columns: If a value in the column is less than (Q1 - 1.5 * IQR) or greater than (Q3 + 1.5 * IQR), it is called an outlier and this information is presented to the user.

  • Five number summary of columns: Minimum, Q1, median, Q3 and Maximum values are calculated and presented to the user.

  • Box Chart of columns: After the minimum, Q1, median, Q3 and maximum values are found for each column, a box chart is created from scratch with OpenCV and presented to the user.

  • Variance and standard deviation of columns: The variance and standard deviation for each column are calculated and presented to the user.
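
The numerical summaries in this list can be sketched as below. This is an illustration, not the application's own implementation, and it rests on assumptions the text does not settle: quartiles are taken as the medians of the lower and upper halves of the sorted column (other quartile conventions give slightly different values), variance is the population variance (divided by n), and the function names are hypothetical. A value counts as an outlier when it falls outside the fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR on either side.

```python
def median_of_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum of a column with at least two values.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    excluding the middle element when the count is odd.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if len(ordered) % 2 == 1 else ordered[half:]
    return {
        "min": ordered[0],
        "q1": median_of_sorted(lower),
        "median": median_of_sorted(ordered),
        "q3": median_of_sorted(upper),
        "max": ordered[-1],
    }


def iqr_and_outliers(values):
    """Return (IQR, outliers) using the 1.5 * IQR fences."""
    summary = five_number_summary(values)
    iqr = summary["q3"] - summary["q1"]
    low_fence = summary["q1"] - 1.5 * iqr
    high_fence = summary["q3"] + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return iqr, outliers


def variance_and_std(values):
    """Population variance and standard deviation (assumption: divide by n)."""
    m = sum(values) / len(values)
    variance = sum((v - m) ** 2 for v in values) / len(values)
    return variance, variance ** 0.5


def frequency(values):
    """Count how often each distinct value occurs in a column."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return counts
```

With the column `[1, 2, 3, 4, 5, 6, 7, 100]`, this convention gives Q1 = 2.5 and Q3 = 6.5, so the IQR is 4.0, the fences are -3.5 and 12.5, and 100 is reported as the only outlier. The same five values are what a box chart needs to draw its box and whiskers.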

Application video

demirai.mp4

Owner

Ahmet Furkan DEMIR

Hi, my name is Ahmet Furkan DEMIR. I study computer engineering at Necmettin Erbakan University.