zoofs is a Python library for performing feature selection using a variety of nature-inspired wrapper algorithms. The algorithms range from swarm-intelligence to physics-based to evolutionary. It is an easy-to-use, flexible and powerful tool to reduce your feature size.
Define your own objective function for optimization!
from sklearn.metrics import log_loss

# define your own objective function, make sure the function receives five parameters
# (the model plus the four data arguments), fit your model and return the objective value
def objective_function_topass(model, X_train, y_train, X_valid, y_valid):
    model.fit(X_train, y_train)
    P = log_loss(y_valid, model.predict_proba(X_valid))
    return P

# import an algorithm
from zoofs import ParticleSwarmOptimization

# create object of algorithm
algo_object = ParticleSwarmOptimization(objective_function_topass, n_iteration=20,
                                        population_size=20, minimize=True)

import lightgbm as lgb
lgb_model = lgb.LGBMClassifier()

# fit the algorithm
algo_object.fit(lgb_model, X_train, y_train, X_valid, y_valid, verbose=True)

# plot your results
algo_object.plot_history()
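The objective does not have to be a loss. As a minimal sketch, assuming the optimizer only needs a callable with the signature used above, an accuracy-based objective could look like the following. `MajorityClassModel` is a hypothetical stand-in (not part of zoofs) so the snippet runs without extra dependencies; a score like this one, where higher is better, would be paired with `minimize=False`.

```python
from collections import Counter


class MajorityClassModel:
    """Hypothetical stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # one prediction per row
        return [self.label_] * len(X)


def accuracy_objective(model, X_train, y_train, X_valid, y_valid):
    """Fit the model and return validation accuracy (higher is better)."""
    model.fit(X_train, y_train)
    predictions = model.predict(X_valid)
    hits = sum(int(p == t) for p, t in zip(predictions, y_valid))
    return hits / len(y_valid)


# Toy data, just enough to exercise the function.
X_train, y_train = [[0.1], [0.4], [0.9]], [1, 1, 0]
X_valid, y_valid = [[0.2], [0.8]], [1, 0]
score = accuracy_objective(MajorityClassModel(), X_train, y_train, X_valid, y_valid)
print(score)
```

Any model exposing `fit` and `predict` can be dropped in place of the stand-in.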
Suggestions for Usage
As the available algorithms are wrapper algorithms, it is better to use ML models that train quickly, e.g. lightgbm, catboost.
Use a sufficiently large 'population_size', as this determines the extent of exploration and exploitation of the algorithm.
Ensure that your ML model has its hyperparameters optimized before passing it to the zoofs algorithms.
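To see why fast-fitting models matter, a rough back-of-the-envelope estimate helps. The sketch below assumes one model fit per individual per iteration, which is typical for population-based wrapper methods but is an assumption about zoofs rather than something stated here; the helper names are hypothetical.

```python
def estimated_fit_count(n_iteration, population_size):
    """Rough number of model fits, assuming one fit per individual per iteration."""
    return n_iteration * population_size


def estimated_runtime_seconds(n_iteration, population_size, seconds_per_fit):
    """Total wall-clock estimate if every fit takes about seconds_per_fit."""
    return estimated_fit_count(n_iteration, population_size) * seconds_per_fit


# 20 iterations with a population of 20
fits = estimated_fit_count(n_iteration=20, population_size=20)
# 50 iterations with a population of 50, at 30 seconds per fit
hours = estimated_runtime_seconds(50, 50, seconds_per_fit=30) / 3600
print(fits, round(hours, 1))
```

Under that assumption, halving the time of a single fit halves the whole run, which is why quick-to-train models are the practical choice.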
objective score plot
Algorithms
Particle Swarm Algorithm
class zoofs.ParticleSwarmOptimization(objective_function,n_iteration=50,population_size=50,minimize=True,c1=2,c2=2,w=0.9)
Parameters
objective_function : user-defined function with the signature 'func(model,X_train,y_train,X_test,y_test)'.
The function must return the value that is to be minimized or maximized.
n_iteration : int, default=50
Number of times the algorithm will run
population_size : int, default=50
Total size of the population
minimize : bool, default=True
Defines if the objective value is to be maximized or minimized
X_train : pandas.core.frame.DataFrame of shape (n_samples, n_features)
Training input samples to be used for machine learning model
y_train : pandas.core.frame.DataFrame or pandas.core.series.Series of shape (n_samples)
The target values (class labels in classification, real numbers in regression).
X_valid : pandas.core.frame.DataFrame of shape (n_samples, n_features)
Validation input samples
y_valid : pandas.core.frame.DataFrame or pandas.core.series.Series of shape (n_samples)
The validation target values.
verbose : bool, default=True
Print results for iterations
Returns
best_feature_list : array-like
Final best set of features
plot_history()
Plot results across iterations
Example
from sklearn.metrics import log_loss

# define your own objective function, make sure the function receives five parameters
# (the model plus the four data arguments), fit your model and return the objective value
def objective_function_topass(model, X_train, y_train, X_valid, y_valid):
    model.fit(X_train, y_train)
    P = log_loss(y_valid, model.predict_proba(X_valid))
    return P

# import an algorithm
from zoofs import ParticleSwarmOptimization

# create object of algorithm
algo_object = ParticleSwarmOptimization(objective_function_topass, n_iteration=20,
                                        population_size=20, minimize=True,
                                        c1=2, c2=2, w=0.9)

import lightgbm as lgb
lgb_model = lgb.LGBMClassifier()

# fit the algorithm
algo_object.fit(lgb_model, X_train, y_train, X_valid, y_valid, verbose=True)

# plot your results
algo_object.plot_history()
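The constructor parameters `c1`, `c2` and `w` are not described above. In the textbook binary particle swarm formulation they are usually the two acceleration coefficients and the inertia weight. The sketch below shows one such update for a single particle, where a particle is a 0/1 mask over features; it illustrates the general technique and is not claimed to be zoofs's internal implementation, and `binary_pso_step` is a hypothetical name.

```python
import math
import random


def binary_pso_step(position, velocity, personal_best, global_best,
                    c1=2.0, c2=2.0, w=0.9, rng=random):
    """One textbook binary-PSO update for a single particle (a 0/1 feature mask)."""
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        r1, r2 = rng.random(), rng.random()
        # inertia + pull toward the particle's own best + pull toward the swarm's best
        v_next = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        # squash the velocity into a probability of selecting this feature
        prob = 1.0 / (1.0 + math.exp(-v_next))
        new_velocity.append(v_next)
        new_position.append(1 if rng.random() < prob else 0)
    return new_position, new_velocity


rng = random.Random(0)
position = [1, 0, 1, 0]
velocity = [0.0, 0.0, 0.0, 0.0]
new_position, new_velocity = binary_pso_step(
    position, velocity,
    personal_best=[1, 1, 0, 0], global_best=[1, 1, 1, 0], rng=rng)
print(new_position, new_velocity)
```

A larger `w` keeps particles moving in their previous direction (more exploration), while larger `c1` and `c2` pull them harder toward known good masks (more exploitation).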
Grey Wolf Algorithm
class zoofs.GreyWolfOptimization(objective_function,n_iteration=50,population_size=50,minimize=True)
Parameters
objective_function : user-defined function with the signature 'func(model,X_train,y_train,X_test,y_test)'.
The function must return the value that is to be minimized or maximized.
n_iteration : int, default=50
Number of times the algorithm will run
population_size : int, default=50
Total size of the population
minimize : bool, default=True
Defines if the objective value is to be maximized or minimized
X_train : pandas.core.frame.DataFrame of shape (n_samples, n_features)
Training input samples to be used for machine learning model
y_train : pandas.core.frame.DataFrame or pandas.core.series.Series of shape (n_samples)
The target values (class labels in classification, real numbers in regression).
X_valid : pandas.core.frame.DataFrame of shape (n_samples, n_features)
Validation input samples
y_valid : pandas.core.frame.DataFrame or pandas.core.series.Series of shape (n_samples)
The validation target values.
method : {1, 2}, default=1
Choose between the two methods of grey wolf optimization
verbose : bool, default=True
Print results for iterations
Returns
best_feature_list : array-like
Final best set of features
plot_history()
Plot results across iterations
Example
from sklearn.metrics import log_loss

# define your own objective function, make sure the function receives five parameters
# (the model plus the four data arguments), fit your model and return the objective value
def objective_function_topass(model, X_train, y_train, X_valid, y_valid):
    model.fit(X_train, y_train)
    P = log_loss(y_valid, model.predict_proba(X_valid))
    return P

# import an algorithm
from zoofs import GreyWolfOptimization

# create object of algorithm
algo_object = GreyWolfOptimization(objective_function_topass, n_iteration=20,
                                   population_size=20, minimize=True)

import lightgbm as lgb
lgb_model = lgb.LGBMClassifier()

# fit the algorithm
algo_object.fit(lgb_model, X_train, y_train, X_valid, y_valid, method=1, verbose=True)

# plot your results
algo_object.plot_history()
Dragon Fly Algorithm
class zoofs.DragonFlyOptimization(objective_function,n_iteration=50,population_size=50,minimize=True)
Parameters
objective_function : user-defined function with the signature 'func(model,X_train,y_train,X_test,y_test)'.
The function must return the value that is to be minimized or maximized.
n_iteration : int, default=50
Number of times the algorithm will run
population_size : int, default=50
Total size of the population
minimize : bool, default=True
Defines if the objective value is to be maximized or minimized
method : str (the example below passes 'sinusoidal')
Choose between the three methods of Dragon Fly optimization
verbose : bool, default=True
Print results for iterations
Returns
best_feature_list : array-like
Final best set of features
plot_history()
Plot results across iterations
Example
from sklearn.metrics import log_loss

# define your own objective function, make sure the function receives five parameters
# (the model plus the four data arguments), fit your model and return the objective value
def objective_function_topass(model, X_train, y_train, X_valid, y_valid):
    model.fit(X_train, y_train)
    P = log_loss(y_valid, model.predict_proba(X_valid))
    return P

# import an algorithm
from zoofs import DragonFlyOptimization

# create object of algorithm
algo_object = DragonFlyOptimization(objective_function_topass, n_iteration=20,
                                    population_size=20, minimize=True)

import lightgbm as lgb
lgb_model = lgb.LGBMClassifier()

# fit the algorithm
algo_object.fit(lgb_model, X_train, y_train, X_valid, y_valid, method='sinusoidal', verbose=True)

# plot your results
algo_object.plot_history()
Genetic Algorithm
class zoofs.GeneticOptimization(objective_function,n_iteration=20,population_size=20,selective_pressure=2,elitism=2,mutation_rate=0.05,minimize=True)
Parameters
objective_function : user-defined function with the signature 'func(model,X_train,y_train,X_test,y_test)'.
The function must return the value that is to be minimized or maximized.
n_iteration: int, default=50
Number of times the algorithm will run
population_size : int, default=50
Total size of the population
selective_pressure: int, default=2
measure of reproductive opportunities for each organism in the population
elitism: int, default=2
number of top individuals to be considered as elites
mutation_rate: float, default=0.05
rate of mutation in the population's gene
minimize: bool, default=True
Defines if the objective value is to be maximized or minimized
X_train : pandas.core.frame.DataFrame of shape (n_samples, n_features)
Training input samples to be used for machine learning model
y_train : pandas.core.frame.DataFrame or pandas.core.series.Series of shape (n_samples)
The target values (class labels in classification, real numbers in regression).
X_valid : pandas.core.frame.DataFrame of shape (n_samples, n_features)
Validation input samples
y_valid : pandas.core.frame.DataFrame or pandas.core.series.Series of shape (n_samples)
The validation target values.
verbose : bool, default=True
Print results for iterations
Returns
best_feature_list : array-like
Final best set of features
plot_history()
Plot results across iterations
Example
from sklearn.metrics import log_loss

# define your own objective function, make sure the function receives five parameters
# (the model plus the four data arguments), fit your model and return the objective value
def objective_function_topass(model, X_train, y_train, X_valid, y_valid):
    model.fit(X_train, y_train)
    P = log_loss(y_valid, model.predict_proba(X_valid))
    return P

# import an algorithm
from zoofs import GeneticOptimization

# create object of algorithm
algo_object = GeneticOptimization(objective_function_topass, n_iteration=20,
                                  population_size=20, selective_pressure=2, elitism=2,
                                  mutation_rate=0.05, minimize=True)

import lightgbm as lgb
lgb_model = lgb.LGBMClassifier()

# fit the algorithm
algo_object.fit(lgb_model, X_train, y_train, X_valid, y_valid, verbose=True)

# plot your results
algo_object.plot_history()
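The roles of `elitism` and `mutation_rate` can be illustrated with one generation of a generic genetic algorithm over 0/1 feature masks. This is a sketch of the general technique under simplifying assumptions (parents are drawn from the better half of the population as a stand-in for pressure-based selection, and crossover is one-point); it is not claimed to match zoofs's internals, and `next_generation` is a hypothetical name.

```python
import random


def next_generation(population, scores, elitism=2, mutation_rate=0.05,
                    minimize=True, rng=random):
    """One generic GA step: elitism, one-point crossover, bit-flip mutation."""
    ranked = [ind for _, ind in sorted(zip(scores, population),
                                       key=lambda pair: pair[0],
                                       reverse=not minimize)]
    # the top `elitism` individuals are copied over unchanged
    new_population = [list(ind) for ind in ranked[:elitism]]
    while len(new_population) < len(population):
        # draw two distinct parents from the better half of the ranking
        parent_a, parent_b = rng.sample(ranked[:max(2, len(ranked) // 2)], 2)
        cut = rng.randrange(1, len(parent_a))
        child = parent_a[:cut] + parent_b[cut:]
        # flip each bit independently with probability mutation_rate
        child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
        new_population.append(child)
    return new_population


population = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
scores = [0.40, 0.55, 0.35, 0.60]  # e.g. validation log loss, lower is better
offspring = next_generation(population, scores, rng=random.Random(1))
print(offspring)
```

With `elitism=2` the two best masks always survive, so the best score seen so far cannot get worse from one generation to the next.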
Gravitational Algorithm
class zoofs.GravitationalOptimization(objective_function,n_iteration=50,population_size=50,g0=100,eps=0.5,minimize=True)
Parameters
objective_function : user-defined function with the signature 'func(model,X_train,y_train,X_test,y_test)'.
The function must return the value that is to be minimized or maximized.
n_iteration: int, default=50
Number of times the algorithm will run
population_size : int, default=50
Total size of the population
g0: float, default=100
gravitational strength constant
eps: float, default=0.5
distance constant
minimize: bool, default=True
Defines if the objective value is to be maximized or minimized
X_train : pandas.core.frame.DataFrame of shape (n_samples, n_features)
Training input samples to be used for machine learning model
y_train : pandas.core.frame.DataFrame or pandas.core.series.Series of shape (n_samples)
The target values (class labels in classification, real numbers in regression).
X_valid : pandas.core.frame.DataFrame of shape (n_samples, n_features)
Validation input samples
y_valid : pandas.core.frame.DataFrame or pandas.core.series.Series of shape (n_samples)
The validation target values.
verbose : bool, default=True
Print results for iterations
Returns
best_feature_list : array-like
Final best set of features
plot_history()
Plot results across iterations
Example
from sklearn.metrics import log_loss

# define your own objective function, make sure the function receives five parameters
# (the model plus the four data arguments), fit your model and return the objective value
def objective_function_topass(model, X_train, y_train, X_valid, y_valid):
    model.fit(X_train, y_train)
    P = log_loss(y_valid, model.predict_proba(X_valid))
    return P

# import an algorithm
from zoofs import GravitationalOptimization

# create object of algorithm (pass the objective function defined above)
algo_object = GravitationalOptimization(objective_function_topass, n_iteration=50,
                                        population_size=50, g0=100, eps=0.5, minimize=True)

import lightgbm as lgb
lgb_model = lgb.LGBMClassifier()

# fit the algorithm
algo_object.fit(lgb_model, X_train, y_train, X_valid, y_valid, verbose=True)

# plot your results
algo_object.plot_history()
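In the textbook gravitational search formulation, `g0` is the starting value of a gravitational constant that decays over the run, and `eps` is a small constant added to the distance between two agents so the force stays finite. The sketch below illustrates those two roles only; the decay rate `alpha=20.0` and both function names are assumptions for illustration, and nothing here is claimed to be zoofs's implementation.

```python
import math


def gravitational_constant(g0, iteration, n_iteration, alpha=20.0):
    """Textbook schedule: G starts at g0 and decays exponentially over the run."""
    return g0 * math.exp(-alpha * iteration / n_iteration)


def pairwise_force(pos_i, pos_j, mass_i, mass_j, g, eps=0.5):
    """Force on agent i from agent j; eps keeps the division finite when they coincide."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(pos_i, pos_j)))
    scale = g * mass_i * mass_j / (distance + eps)
    return [scale * (b - a) for a, b in zip(pos_i, pos_j)]


g_start = gravitational_constant(g0=100, iteration=0, n_iteration=50)
g_late = gravitational_constant(g0=100, iteration=40, n_iteration=50)
force = pairwise_force([0.0], [1.0], mass_i=1.0, mass_j=1.0, g=1.5, eps=0.5)
print(g_start, g_late, force)
```

A larger `g0` means stronger attraction early on, and the decay shifts the search from exploration toward fine adjustment in later iterations.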
Support zoofs
The development of zoofs relies completely on contributions.
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Additional context
Harris Hawks Optimization (HHO) is a novel meta-heuristic optimization algorithm released in 2019, with an increasing number of applied research papers. It would be great if the team could add HHO to zoofs; it would offer potential for further testing and make zoofs more popular.
Snyk has created this PR to fix one or more vulnerable packages in the `pip` dependencies of this project.
Changes included in this PR
Changes to the following files to upgrade the vulnerable dependencies to a fixed version:
docs/requirement.txt
⚠️ Warning
notebook 5.7.13 requires terminado, which is not installed.
nbformat 4.4.0 requires jsonschema, which is not installed.
nbconvert 5.6.1 has requirement mistune<2,>=0.8.1, but you have mistune 2.0.2.
mkdocs-material 8.0.1 requires mkdocs, which is not installed.
mkdocs-material 8.0.1 requires pymdown-extensions, which is not installed.
mkdocs-material 8.0.1 requires mkdocs-material-extensions, which is not installed.
mkdocs-material 8.0.1 requires markdown, which is not installed.
Vulnerabilities that will be fixed
By pinning:
| Severity | Issue | Upgrade | Breaking Change | Exploit Maturity |
|:-------------------------:|:-------------------------|:-------------------------|:-------------------------|:-------------------------|
| | Cross-site Scripting (XSS) SNYK-PYTHON-MISTUNE-2328096 | mistune: 0.8.4 -> 2.0.1 | No | No Known Exploit |
Some vulnerabilities couldn't be fully fixed and so Snyk will still find them when the project is tested again. This may be because the vulnerability existed within more than one direct dependency, but not all of the affected dependencies could be upgraded.
Check the changes in this PR to ensure they won't cause issues with your project.
Note: You are seeing this because you or someone else with access to this repository has authorized Snyk to open fix PRs.
Snyk has created this PR to fix one or more vulnerable packages in the `pip` dependencies of this project.
Changes included in this PR
Changes to the following files to upgrade the vulnerable dependencies to a fixed version:
docs/requirement.txt
⚠️ Warning
notebook 5.7.13 requires terminado, which is not installed.
nbformat 4.4.0 requires jsonschema, which is not installed.
mkdocs-material 8.0.1 requires mkdocs, which is not installed.
mkdocs-material 8.0.1 requires pymdown-extensions, which is not installed.
mkdocs-material 8.0.1 requires mkdocs-material-extensions, which is not installed.
mkdocs-material 8.0.1 requires markdown, which is not installed.
Vulnerabilities that will be fixed
By pinning:
| Severity | Priority Score (*) | Issue | Upgrade | Breaking Change | Exploit Maturity |
|:-------------------------:|-------------------------|:-------------------------|:-------------------------|:-------------------------|:-------------------------|
| | 624/1000 Why? Has a fix available, CVSS 8.2 | Arbitrary Code Execution SNYK-PYTHON-IPYTHON-2348630 | ipython: 5.10.0 -> 7.16.3 | No | No Known Exploit |
| | 696/1000 Why? Proof of Concept exploit, Has a fix available, CVSS 7.5 | Regular Expression Denial of Service (ReDoS) SNYK-PYTHON-PYGMENTS-1086606 | pygments: 2.5.2 -> 2.7.4 | No | Proof of Concept |
| | 589/1000 Why? Has a fix available, CVSS 7.5 | Denial of Service (DoS) SNYK-PYTHON-PYGMENTS-1088505 | pygments: 2.5.2 -> 2.7.4 | No | No Known Exploit |
(*) Note that the real score may have changed since the PR was raised.
Some vulnerabilities couldn't be fully fixed and so Snyk will still find them when the project is tested again. This may be because the vulnerability existed within more than one direct dependency, but not all of the affected dependencies could be upgraded.
Check the changes in this PR to ensure they won't cause issues with your project.
Note: You are seeing this because you or someone else with access to this repository has authorized Snyk to open fix PRs.
Hi,
Thanks for the great repo. I would like to know whether we can get the ranking of the selected features after using one of your algorithms (e.g. particle swarm optimization).
Would you consider adding a GridSearch-like function to zoofs for hyper-parameter optimization of the algorithms, such as GWO?
The PySwarm library (https://github.com/tisimst/pyswarm), for instance, provides a GridSearch to find the best combination of the parameters c, w1, w2.
For now, I have to use trial and error to test which ranges of the GWO parameters (population, iteration, method) deliver the best result for my dataset.
Setting verbose=False still produces output at every iteration. This is problematic since the JSON file can get very large when the fit function runs for a prolonged period of time.
This PR was automatically created by Snyk using the credentials of a real user.
Snyk has created this PR to fix one or more vulnerable packages in the `pip` dependencies of this project.
Changes included in this PR
Changes to the following files to upgrade the vulnerable dependencies to a fixed version:
docs/requirement.txt
⚠️ Warning
mkdocs-material 8.0.1 requires pygments, which is not installed.
mkdocs-material 8.0.1 requires mkdocs-material-extensions, which is not installed.
mkdocs-material 8.0.1 requires markdown, which is not installed.
mkdocs-material 8.0.1 requires pymdown-extensions, which is not installed.
mkdocs-material 8.0.1 requires mkdocs, which is not installed.
jupyter-nbextensions-configurator 0.6.1 requires notebook, which is not installed.
jupyter-contrib-nbextensions 0.7.0 requires nbconvert, which is not installed.
jupyter-contrib-nbextensions 0.7.0 requires notebook, which is not installed.
jupyter-contrib-core 0.4.2 requires notebook, which is not installed.
Vulnerabilities that will be fixed
By pinning:
| Severity | Priority Score (*) | Issue | Upgrade | Breaking Change | Exploit Maturity |
|:-------------------------:|-------------------------|:-------------------------|:-------------------------|:-------------------------|:-------------------------|
| | 551/1000 Why? Recently disclosed, Has a fix available, CVSS 5.3 | Regular Expression Denial of Service (ReDoS) SNYK-PYTHON-SETUPTOOLS-3180412 | setuptools: 39.0.1 -> 65.5.1 | No | No Known Exploit |
| | 551/1000 Why? Recently disclosed, Has a fix available, CVSS 5.3 | Regular Expression Denial of Service (ReDoS) SNYK-PYTHON-WHEEL-3180413 | wheel: 0.30.0 -> 0.38.0 | No | No Known Exploit |
(*) Note that the real score may have changed since the PR was raised.
Some vulnerabilities couldn't be fully fixed and so Snyk will still find them when the project is tested again. This may be because the vulnerability existed within more than one direct dependency, but not all of the affected dependencies could be upgraded.
Check the changes in this PR to ensure they won't cause issues with your project.
Note: You are seeing this because you or someone else with access to this repository has authorized Snyk to open fix PRs.
This PR was automatically created by Snyk using the credentials of a real user.
Snyk has created this PR to fix one or more vulnerable packages in the `pip` dependencies of this project.
Changes included in this PR
Changes to the following files to upgrade the vulnerable dependencies to a fixed version:
docs/requirement.txt
⚠️ Warning
notebook 5.7.16 requires terminado, which is not installed.
nbformat 4.4.0 requires jsonschema, which is not installed.
nbconvert 5.6.1 has requirement mistune<2,>=0.8.1, but you have mistune 2.0.4.
mkdocs-material 8.0.1 requires mkdocs, which is not installed.
mkdocs-material 8.0.1 requires pymdown-extensions, which is not installed.
mkdocs-material 8.0.1 requires mkdocs-material-extensions, which is not installed.
mkdocs-material 8.0.1 requires markdown, which is not installed.
jupyter-nbextensions-configurator 0.5.0 has requirement notebook>=6.0, but you have notebook 5.7.16.
ipython 5.10.0 requires simplegeneric, which is not installed.
Vulnerabilities that will be fixed
By pinning:
| Severity | Priority Score (*) | Issue | Upgrade | Breaking Change | Exploit Maturity |
|:-------------------------:|-------------------------|:-------------------------|:-------------------------|:-------------------------|:-------------------------|
| | 441/1000 Why? Recently disclosed, Has a fix available, CVSS 3.1 | Regular Expression Denial of Service (ReDoS) SNYK-PYTHON-SETUPTOOLS-3113904 | setuptools: 39.0.1 -> 65.5.1 | No | No Known Exploit |
(*) Note that the real score may have changed since the PR was raised.
Some vulnerabilities couldn't be fully fixed and so Snyk will still find them when the project is tested again. This may be because the vulnerability existed within more than one direct dependency, but not all of the affected dependencies could be upgraded.
Check the changes in this PR to ensure they won't cause issues with your project.
Note: You are seeing this because you or someone else with access to this repository has authorized Snyk to open fix PRs.
Snyk has created this PR to fix one or more vulnerable packages in the `pip` dependencies of this project.
Changes included in this PR
Changes to the following files to upgrade the vulnerable dependencies to a fixed version:
docs/requirement.txt
⚠️ Warning
notebook 5.7.16 requires pyzmq, which is not installed.
notebook 5.7.16 requires terminado, which is not installed.
nbformat 4.4.0 requires jsonschema, which is not installed.
nbconvert 5.6.1 has requirement mistune<2,>=0.8.1, but you have mistune 2.0.4.
mkdocs-material 8.0.1 requires mkdocs, which is not installed.
mkdocs-material 8.0.1 requires markdown, which is not installed.
mkdocs-material 8.0.1 requires mkdocs-material-extensions, which is not installed.
mkdocs-material 8.0.1 requires pymdown-extensions, which is not installed.
jupyter-nbextensions-configurator 0.5.0 has requirement notebook>=6.0, but you have notebook 5.7.16.
jupyter-client 5.3.5 requires pyzmq, which is not installed.
ipython 5.10.0 requires simplegeneric, which is not installed.
Vulnerabilities that will be fixed
By pinning:
| Severity | Priority Score (*) | Issue | Upgrade | Breaking Change | Exploit Maturity |
|:-------------------------:|-------------------------|:-------------------------|:-------------------------|:-------------------------|:-------------------------|
| | 551/1000 Why? Recently disclosed, Has a fix available, CVSS 5.3 | Regular Expression Denial of Service (ReDoS) SNYK-PYTHON-WHEEL-3092128 | wheel: 0.30.0 -> 0.38.0 | No | No Known Exploit |
(*) Note that the real score may have changed since the PR was raised.
Some vulnerabilities couldn't be fully fixed and so Snyk will still find them when the project is tested again. This may be because the vulnerability existed within more than one direct dependency, but not all of the affected dependencies could be upgraded.
Check the changes in this PR to ensure they won't cause issues with your project.
Note: You are seeing this because you or someone else with access to this repository has authorized Snyk to open fix PRs.
It doesn't accept numpy arrays, so numba is out of the question.
Any suggestions to improve speed? With 100+ feature columns it takes at least 2 weeks running 24/7.
First of all, I want to thank you for this amazing library. I just want to ask: can the size of best_feature_list be declared before starting the algorithm?