Identifies the faulty wafer before it can be used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells.

Overview

Retrainable-Faulty-Wafer-Detector

Aim of the project:

In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystalline silicon (c-Si), used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The wafer serves as the substrate for microelectronic devices built in and upon the wafer. The project aims to successfully identify the state of the provided wafer by classifying it between one of the two-class +1 (good, can be used as a substrate) or -1 (bad, the substrate need to be replaced) and then train the model on this data so that it can continuously update itself with the environment and become more generalized with time. In this regard, a training and prediction dataset is provided to build a machine learning classification model, which can predict the wafer quality.

Data Description:

The columns of provided data can be classified into 3 parts: wafer name, sensor values and label. The wafer name contains the batch number of the wafer, whereas the sensor values obtained from the measurement carried out on the wafer. The label column contains two unique values +1 and -1 that identifies if the wafer is good or need to be replaced. Additionally, we also require a schema file, which contains all the relevant information about the training files such as file names, length of date value in the file name, length of time value in the file name, number of columns, name of the columns, and their datatype.

Directory creation:

All the necessary folders were created to effectively separate the files so that the end-user can get easy access to them.

Data Validation:

In this step, we matched our dataset with the provided schema file to match the file names, the number of columns it should contain, their names as well as their datatype. If the files matched with the schema values then they are considered good files on which we can train or predict our model, however, if it didn't match then they are moved to the bad folder. Moreover, we also identify the columns with null values. If all the data in a column is missing then the file is also moved to the bad folder. On the contrary, if only a fraction of data in a column is missing then we initially fill it with NaN and considered it as good data.

Data Insertion in Database:

First, open a connection to the database if it exists otherwise create a new one. A table with the name train_good_raw_dt or pred_good_raw_dt is created in the database, based on the training or prediction process, for inserting the good data files obtained from the data validation step. If the table is already present then new files are inserted in that table as we want training to be done on new as well as old training files. In the end, the data in a stored database is exported as a CSV file, to be used for the model training.

Data Pre-processing and Model Training:

In the training section, first, the data is checked for the NaN values in the columns. If present, impute the NaN values using the KNN imputer. The columns with zero standard deviation were also identified and removed as they don't give any information during model training. A prediction schema was created based on the remained dataset columns. Afterwards, the KMeans algorithm is used to create clusters in the pre-processed data. The optimum number of clusters is selected by plotting the elbow plot, and for the dynamic selection of the number of clusters, we are using the "KneeLocator" function. The idea behind clustering is to implement different algorithms to train data in different clusters. The Kmeans model is trained over pre-processed data and the model is saved for further use in prediction. After clusters are created, we find the best model for each cluster. We are using four algorithms, Random Forest, K-Neighbours, Logistic Regression and XGBoost. For each cluster, both the algorithms are passed with the best parameters derived from GridSearch. We calculate the AUC scores for all the models and select the one with the best score. Similarly, the best model is selected for each cluster. For every cluster, the models are saved so that they can be used in future predictions. In the end, the confusion matrix of the model associated with every cluster is also saved to give the a glance over the performance of the models.

Prediction:

In data prediction, first, the essential directories are created. The data validation, data insertion and data processing steps are similar to the training section. The KMeans model created during training is loaded, and clusters for the pre-processed prediction data is predicted. Based on the cluster number, the respective model is loaded and is used to predict the data for that cluster. Once the prediction is made for all the clusters, the predictions along with the Wafer names are saved in a CSV file at a given location.

Retraining:

After the prediction, the prediction data is merged with the previous training dataset and then the models were retrained on this data using the hyperparameter values obtained from the GridSearch. The cycle repeats with every prediction it does and learns from the newly acquired data, making it more robust.

Deployment:

We will be deploying the model to Heroku Cloud.

Owner
Arun Singh Babal
Engineer | Data Science Enthusiasts | Machine Learning | Deep Learning | Advanced Computer Vision.
Arun Singh Babal
Convert Photoshop curves (acv) to xmp presets for Lightroom

acv2xmp Convert Photoshop curves (acv) to Lightroom preset (xmp) acv2xmp.py Basic command prompt that relies on standard library only and can be used

5 Feb 06, 2022
Implementation of the Angular Spectrum method in Python to simulate Diffraction Patterns

Diffraction Simulations - Angular Spectrum Method Implementation of the Angular Spectrum method in Python to simulate Diffraction Patterns with arbitr

Rafael de la Fuente 276 Dec 30, 2022
OntoSeer is a tool to help users build better quality ontologies

Ontoseer This document provides documentation for the first version of OntoSeer.OntoSeer is a tool that monitors the ontology development process andp

Knowledgeable Computing and Reasoning Lab 9 Aug 15, 2022
Cairo hooks for pre-commit

pre-commit-cairo Cairo hooks for pre-commit. See pre-commit for more details Using pre-commit-cairo with pre-commit Add this to your .pre-commit-confi

Fran Algaba 16 Sep 21, 2022
A tool that bootstraps your dotfiles ⚡️

Dotbot Dotbot makes installing your dotfiles as easy as git clone $url && cd dotfiles && ./install, even on a freshly installed system! Rationale Gett

Anish Athalye 5.9k Jan 07, 2023
This project intends to take the user's CEP (brazilian adress code) and return the local in which the CEP is placed.

This project aims to simply return the CEP's (the brazilian resident adress code) User of the application. The project uses a request and passes on to

Daniel Soares Saldanha 4 Nov 17, 2021
A Github Action for sending messages to a Matrix Room.

matrix-commit A Github Action for sending messages to a Matrix Room. Screenshot: Example Usage: # .github/workflows/matrix-commit.yml on: push:

3 Sep 11, 2022
Toppr Os Auto Class Joiner

Toppr Os Auto Class Joiner Toppr os is a irritating platform to work with especially for students it takes a while and is problematic most of the time

1 Dec 18, 2021
Earth centric orbit propagation tool. Built from scratch in python.

Orbit-Propogator Earth centric orbit propagation tool. Built from scratch in python. Functionality includes: tracking sattelite location over time plo

Adam Klein 1 Mar 13, 2022
API development made easy: a smart Python 3 API framework

appkernel - API development made easy What is Appkernel? A super-easy to use API framework, enabling API creation from zero to production within minut

156 Sep 28, 2022
Code emulator plugin for IDA Pro

emu_ida Code emulator plugin for IDA Pro (v 0.0.6) The plugin is designed for simple data decryption and getting stack strings. Requirements Emulator

Andrey Zhdanov 11 Jul 06, 2022
Static bytecode simulator

SEA Static bytecode simulator for creating dependency/dependant based experimental bytecode format for CPython. Example a = random() if a = 5.0:

Batuhan Taskaya 23 Jun 10, 2022
Python implementation of the ASFLIP advection method

This is a python implementation of the ASFLIP advection method . We would like to hear from you if you appreciate this work.

Raymond Yun Fei 133 Nov 13, 2022
Convert-Decimal-to-Binary-Octal-and-Hexadecimal

Convert-Decimal-to-Binary-Octal-and-Hexadecimal We have a number in a decimal number, and we have to convert it into a binary, octal, and hexadecimal

Maanyu M 2 Oct 08, 2021
a wordle-solver written in python

Wordle Solver Overview This is yet another wordle solver. It is built with the word list of the official wordle website, but it should also work with

Shoubhit Dash 10 Sep 24, 2022
Generate PNG filles from NFO files.

Installation git clone https://github.com/pcroland/nfopng cd nfopng pip install -r requirements.txt Usage ❯ ./nfopng.py usage: nfopng.py [-h] [-v] [-i

4 Jun 26, 2022
Simple Calculator Mobile Apps

Simple Calculator Mobile Apps Screenshoot If you want to try it please click the link below to download, this application is 100% safe no virus. link

0 Sep 24, 2022
Entitlement AND Hardened Runtime Check

Python3 script for macOS to recursively check /Applications and also check /usr/local/bin, /usr/bin, and /usr/sbin for binaries with problematic/interesting entitlements. Also checks for hardened run

Cedric Owens 79 Nov 16, 2022
SQL centered, docker process running game

REQUIREMENTS Linux Docker Python/bash set up image "docker build -t game ." create db container "run my_whatever/game_docker/pdb create" # creating po

1 Jan 11, 2022