A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, and C++. Supports computation on CPU and GPU.

Overview

CatBoost is a machine learning method based on gradient boosting over decision trees.

Main advantages of CatBoost:

Get Started and Documentation

All CatBoost documentation is available here.

Install CatBoost by following the guide for the

Next you may want to investigate:

If you cannot open the documentation in your browser, try adding yastatic.net and yastat.net to the list of allowed domains in Privacy Badger.

CatBoost models in production

If you want to evaluate a CatBoost model in your application, read the model API documentation.

Questions and bug reports

Help to Make CatBoost Better

  • Check out open problems and help wanted issues to see what can be improved, or open an issue if you want something.
  • Add your stories and experience to Awesome CatBoost.
  • To contribute to CatBoost, first read the CLA text and state in your pull request that you agree to the terms of the CLA. More information can be found in CONTRIBUTING.md.
  • Instructions for contributors can be found here.

News

The latest news is published on Twitter.

Reference Paper

Anna Veronika Dorogush, Andrey Gulin, Gleb Gusev, Nikita Kazeev, Liudmila Ostroumova Prokhorenkova, Aleksandr Vorobev "Fighting biases with dynamic boosting". arXiv:1706.09516, 2017.

Anna Veronika Dorogush, Vasily Ershov, Andrey Gulin "CatBoost: gradient boosting with categorical features support". Workshop on ML Systems at NIPS 2017.

License

© YANDEX LLC, 2017-2019. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.

Comments
  • UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128)

    Problem: UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128)
    catboost version: 0.25
    Operating System: Windows 10

    When I use setup.py to install CatBoost, this error occurs. On closer inspection it has two parts:

    1. Building _catboost.pyd with CUDA fails with "UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128)".
    2. Building _catboost.pyd without CUDA fails with "subprocess.CalledProcessError: Command '['D:\anaconda3\python.exe', 'D:\learn\catboost-master\ya', 'make', 'D:\learn\catboost-master\catboost\python-package\..\..\catboost\python-package\catboost', '--no-src-links', '--output', 'D:\learn\catboost-master\catboost\python-package\build\temp.win-amd64-3.8\Release', '-DPYTHON_CONFIG=python3-config', '-DUSE_ARCADIA_PYTHON=no', '-DOS_SDK=local', '-r', '-DNO_DEBUGINFO', '-DHAVE_CUDA=no']' returned non-zero exit status 1."

    I also tried converting _catboost.pyx from GitHub to _catboost.pyd directly with 'python setup.py build_ext --inplace', but I got the same error as when installing CatBoost.

    C:\Users\王普聪>pip install -e D:\learn\catboost-master\catboost\python-package
    Obtaining file:///D:/learn/catboost-master/catboost/python-package
    Requirement already satisfied: graphviz in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (0.16)
    Requirement already satisfied: plotly in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (4.14.3)
    Requirement already satisfied: six in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (1.15.0)
    Requirement already satisfied: matplotlib in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (3.2.2)
    Requirement already satisfied: numpy>=1.16.0 in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (1.18.5)
    Requirement already satisfied: pandas>=0.24 in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (1.0.5)
    Requirement already satisfied: scipy in d:\anaconda3\lib\site-packages (from catboost==0.24.4) (1.5.0)
    Requirement already satisfied: retrying>=1.3.3 in d:\anaconda3\lib\site-packages (from plotly->catboost==0.24.4) (1.3.3)
    Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in d:\anaconda3\lib\site-packages (from matplotlib->catboost==0.24.4) (2.4.7)
    Requirement already satisfied: cycler>=0.10 in d:\anaconda3\lib\site-packages (from matplotlib->catboost==0.24.4) (0.10.0)
    Requirement already satisfied: kiwisolver>=1.0.1 in d:\anaconda3\lib\site-packages (from matplotlib->catboost==0.24.4) (1.2.0)
    Requirement already satisfied: python-dateutil>=2.1 in d:\anaconda3\lib\site-packages (from matplotlib->catboost==0.24.4) (2.8.1)
    Requirement already satisfied: pytz>=2017.2 in d:\anaconda3\lib\site-packages (from pandas>=0.24->catboost==0.24.4) (2020.1)
    Installing collected packages: catboost
      Running setup.py develop for catboost
        ERROR: Command errored out with exit status 1:
         command: 'D:\anaconda3\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'D:\\learn\\catboost-master\\catboost\\python-package\\setup.py'"'"'; __file__='"'"'D:\\learn\\catboost-master\\catboost\\python-package\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
             cwd: D:\learn\catboost-master\catboost\python-package\
        Complete output (159 lines):
        running develop
        15:30:22 I Targeting for CUDA support with C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
        running egg_info
        writing catboost.egg-info\PKG-INFO
        writing dependency_links to catboost.egg-info\dependency_links.txt
        writing requirements to catboost.egg-info\requires.txt
        writing top-level names to catboost.egg-info\top_level.txt
        15:30:24 I Targeting for CUDA support with C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
        reading manifest file 'catboost.egg-info\SOURCES.txt'
        writing manifest file 'catboost.egg-info\SOURCES.txt'
        running build_ext
        15:30:24 I Targeting for CUDA support with C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
        15:30:24 I Buildling _catboost.pyd with ymake
        15:30:24 I EXECUTE: D:\anaconda3\python.exe D:\learn\catboost-master\ya make D:\learn\catboost-master\catboost\python-package\..\..\catboost\python-package\catboost --no-src-links --output D:\learn\catboost-master\catboost\python-package\build\temp.win-amd64-3.8\Release -DPYTHON_CONFIG=python3-config -DUSE_ARCADIA_PYTHON=no -DOS_SDK=local -r -DNO_DEBUGINFO -DHAVE_CUDA=yes "-DCUDA_ROOT=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1"
        Output root is subdirectory of Arcadia root, this may cause non-idempotent build
        Traceback (most recent call last):
          File "devtools/ya/app.py", line 422, in configure_exit_interceptor
            yield
          File "devtools/ya/app.py", line 65, in helper
            return action(args)
          File "devtools/ya/entry/entry.py", line 55, in do_main
            res = handler.handle(handler, args, prefix=['ya'])
          File "devtools/ya/core/handler.py", line 159, in handle
            return handler.handle(self, args[1:], prefix + [name])
          File "devtools/ya/core/dispatch.py", line 37, in handle
            return self.command().handle(root_handler, args, prefix)
          File "devtools/ya/core/handler.py", line 341, in handle
            return self._action(params)
          File "devtools/ya/app.py", line 92, in helper
            return action(ctx.params)
          File "devtools/ya/build/build_handler.py", line 85, in do_ya_make
            builder = ya_make.YaMake(params, app_ctx)
          File "devtools/ya/build/ya_make.py", line 895, in __init__
            self.ctx = Context(self.opts, app_ctx=app_ctx, graph=graph, tests=tests, stripped_tests=stripped_tests, configure_errors=configure_errors, make_files=make_files, lite_graph=lite_graph)
          File "devtools/ya/build/ya_make.py", line 574, in __init__
            self.graph, self.tests, self.stripped_tests, self.configure_errors, self.make_files = _build_graph_and_tests(self.opts, app_ctx)
          File "devtools/ya/build/ya_make.py", line 258, in _build_graph_and_tests
            graph, tests, stripped_tests, gh, make_files = lg.build_graph_and_tests(opts, check=True, ev_listener=ev_listener, display=display)
          File "devtools/ya/build/graph.py", line 1688, in build_graph_and_tests
            return _build_graph_and_tests(opts, check, ev_listener, exit_stack, display)
          File "devtools/ya/build/graph.py", line 1992, in _build_graph_and_tests
            real_ymake_bin = tools.tool('ymake')
          File "devtools/ya/yalibrary/tools/__init__.py", line 220, in tool
            return toolchain.find(name, with_params, for_platform, cache=cache)
          File "devtools/ya/yalibrary/tools/__init__.py", line 158, in find
            executable = cur_bottle[executable_name]  # if executable_name is None it's Ok
          File "devtools/ya/yalibrary/tools/__init__.py", line 64, in __getitem__
            path = self.resolve()
          File "devtools/ya/yalibrary/tools/__init__.py", line 46, in resolve
            return self.__fetcher.fetch_if_need(self.__formula["match"], tared, binname, cache=cache).where
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 385, in fetch_if_need
            self.__c[key] = self._fetch_if_need(*args, **kwargs)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 452, in _fetch_if_need
            if self._fetch(name, tared, lambda x: name.lower() in x.lower(), binname):
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 368, in _fetch
            _install(res_path, do_install)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 104, in _install
            fs_handler(install_guard)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 95, in fs_handler
            func(install_guard)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 350, in do_install
            deploy_params=(UNTAR, resource_info if resource_info else {"file_name": "FILE"}, ""))
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 137, in _deploy_tool
            exts.archive.extract_from_tar(archive, extract_to)
          File "devtools/ya/exts/archive.py", line 16, in extract_from_tar
            archive.extract_tar(tar_file_path, output_dir)
          File "library/python/archive/__init__.py", line 62, in extract_tar
            output_dir = encode(output_dir, ENCODING)
          File "library/python/archive/__init__.py", line 58, in encode
            return value.encode(encoding)
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128)
        15:30:37 E Cannot build _catboost.pyd with CUDA support, will build without CUDA
        15:30:37 I EXECUTE: D:\anaconda3\python.exe D:\learn\catboost-master\ya make D:\learn\catboost-master\catboost\python-package\..\..\catboost\python-package\catboost --no-src-links --output D:\learn\catboost-master\catboost\python-package\build\temp.win-amd64-3.8\Release -DPYTHON_CONFIG=python3-config -DUSE_ARCADIA_PYTHON=no -DOS_SDK=local -r -DNO_DEBUGINFO -DHAVE_CUDA=no
        Output root is subdirectory of Arcadia root, this may cause non-idempotent build
        Traceback (most recent call last):
          File "devtools/ya/app.py", line 422, in configure_exit_interceptor
            yield
          File "devtools/ya/app.py", line 65, in helper
            return action(args)
          File "devtools/ya/entry/entry.py", line 55, in do_main
            res = handler.handle(handler, args, prefix=['ya'])
          File "devtools/ya/core/handler.py", line 159, in handle
            return handler.handle(self, args[1:], prefix + [name])
          File "devtools/ya/core/dispatch.py", line 37, in handle
            return self.command().handle(root_handler, args, prefix)
          File "devtools/ya/core/handler.py", line 341, in handle
            return self._action(params)
          File "devtools/ya/app.py", line 92, in helper
            return action(ctx.params)
          File "devtools/ya/build/build_handler.py", line 85, in do_ya_make
            builder = ya_make.YaMake(params, app_ctx)
          File "devtools/ya/build/ya_make.py", line 895, in __init__
            self.ctx = Context(self.opts, app_ctx=app_ctx, graph=graph, tests=tests, stripped_tests=stripped_tests, configure_errors=configure_errors, make_files=make_files, lite_graph=lite_graph)
          File "devtools/ya/build/ya_make.py", line 574, in __init__
            self.graph, self.tests, self.stripped_tests, self.configure_errors, self.make_files = _build_graph_and_tests(self.opts, app_ctx)
          File "devtools/ya/build/ya_make.py", line 258, in _build_graph_and_tests
            graph, tests, stripped_tests, gh, make_files = lg.build_graph_and_tests(opts, check=True, ev_listener=ev_listener, display=display)
          File "devtools/ya/build/graph.py", line 1688, in build_graph_and_tests
            return _build_graph_and_tests(opts, check, ev_listener, exit_stack, display)
          File "devtools/ya/build/graph.py", line 1992, in _build_graph_and_tests
            real_ymake_bin = tools.tool('ymake')
          File "devtools/ya/yalibrary/tools/__init__.py", line 220, in tool
            return toolchain.find(name, with_params, for_platform, cache=cache)
          File "devtools/ya/yalibrary/tools/__init__.py", line 158, in find
            executable = cur_bottle[executable_name]  # if executable_name is None it's Ok
          File "devtools/ya/yalibrary/tools/__init__.py", line 64, in __getitem__
            path = self.resolve()
          File "devtools/ya/yalibrary/tools/__init__.py", line 46, in resolve
            return self.__fetcher.fetch_if_need(self.__formula["match"], tared, binname, cache=cache).where
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 385, in fetch_if_need
            self.__c[key] = self._fetch_if_need(*args, **kwargs)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 452, in _fetch_if_need
            if self._fetch(name, tared, lambda x: name.lower() in x.lower(), binname):
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 368, in _fetch
            _install(res_path, do_install)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 104, in _install
            fs_handler(install_guard)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 95, in fs_handler
            func(install_guard)
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 350, in do_install
            deploy_params=(UNTAR, resource_info if resource_info else {"file_name": "FILE"}, ""))
          File "devtools/ya/yalibrary/fetcher/__init__.py", line 137, in _deploy_tool
            exts.archive.extract_from_tar(archive, extract_to)
          File "devtools/ya/exts/archive.py", line 16, in extract_from_tar
            archive.extract_tar(tar_file_path, output_dir)
          File "library/python/archive/__init__.py", line 62, in extract_tar
            output_dir = encode(output_dir, ENCODING)
          File "library/python/archive/__init__.py", line 58, in encode
            return value.encode(encoding)
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xcd in position 9: ordinal not in range(128)
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "D:\learn\catboost-master\catboost\python-package\setup.py", line 259, in <module>
            setup(
          File "D:\anaconda3\lib\site-packages\setuptools\__init__.py", line 153, in setup
            return distutils.core.setup(**attrs)
          File "D:\anaconda3\lib\distutils\core.py", line 148, in setup
            dist.run_commands()
          File "D:\anaconda3\lib\distutils\dist.py", line 966, in run_commands
            self.run_command(cmd)
          File "D:\anaconda3\lib\distutils\dist.py", line 985, in run_command
            cmd_obj.run()
          File "D:\anaconda3\lib\site-packages\setuptools\command\develop.py", line 34, in run
            self.install_for_development()
          File "D:\anaconda3\lib\site-packages\setuptools\command\develop.py", line 136, in install_for_development
            self.run_command('build_ext')
          File "D:\anaconda3\lib\distutils\cmd.py", line 313, in run_command
            self.distribution.run_command(command)
          File "D:\anaconda3\lib\distutils\dist.py", line 985, in run_command
            cmd_obj.run()
          File "D:\learn\catboost-master\catboost\python-package\setup.py", line 186, in run
            self.build_with_ymake(topsrc_dir, build_dir, catboost_ext, put_dir, verbose, dry_run)
          File "D:\learn\catboost-master\catboost\python-package\setup.py", line 219, in build_with_ymake
            logging_execute(ymake_cmd + ['-DHAVE_CUDA=no'], verbose, dry_run)
          File "D:\learn\catboost-master\catboost\python-package\setup.py", line 62, in logging_execute
            subprocess.check_call(cmd, universal_newlines=True)
          File "D:\anaconda3\lib\subprocess.py", line 364, in check_call
            raise CalledProcessError(retcode, cmd)
        subprocess.CalledProcessError: Command '['D:\\anaconda3\\python.exe', 'D:\\learn\\catboost-master\\ya', 'make', 'D:\\learn\\catboost-master\\catboost\\python-package\\..\\..\\catboost\\python-package\\catboost', '--no-src-links', '--output', 'D:\\learn\\catboost-master\\catboost\\python-package\\build\\temp.win-amd64-3.8\\Release', '-DPYTHON_CONFIG=python3-config', '-DUSE_ARCADIA_PYTHON=no', '-DOS_SDK=local', '-r', '-DNO_DEBUGINFO', '-DHAVE_CUDA=no']' returned non-zero exit status 1.
        ----------------------------------------
    ERROR: Command errored out with exit status 1: 'D:\anaconda3\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'D:\\learn\\catboost-master\\catboost\\python-package\\setup.py'"'"'; __file__='"'"'D:\\learn\\catboost-master\\catboost\\python-package\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.
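
One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the prompt shows the user directory C:\Users\王普聪, and in the GBK encoding often used on Chinese-locale Windows the first byte of 王 is 0xcd, which sits at index 9 of that path. Calling .encode() on a byte string under Python 2 triggers an implicit ASCII decode, which would fail in exactly this way if the tool is unpacked under that directory. The Python 3 sketch below only reproduces the decode step; the GBK code page and the helper name first_non_ascii are assumptions for illustration.

```python
# Assumption: the path reaches the build tool as GBK-encoded bytes.
path_bytes = "C:\\Users\\王普聪".encode("gbk")


def first_non_ascii(data: bytes):
    """Return (index, byte) of the first byte ASCII cannot decode, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


result = first_non_ascii(path_bytes)
if result is not None:
    index, byte = result
    print(f"byte {byte:#04x} at position {index}")
```

If this is indeed the cause, building from an account whose home directory is ASCII-only may avoid the failure; that has not been verified here.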
    
    opened by Wangpc-972 67
  • User description is used by default. Move metric creation to corresponding class factories.

    Each metric now uses user-specified parameters in their descriptions by default.

    Design

    TMetric now stores a TMap<TString, TString> of user parameters, which are used to construct a metric description (e.g. MetricName:key1=value1;key2=value2). This implementation is defined in the base class and is now the default behaviour for building metric descriptions.
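
The description format mentioned above can be illustrated with a short Python sketch. It is not the C++ implementation from the pull request: the function name build_metric_description is hypothetical, keys are sorted only to mimic an ordered map, and returning the bare name when no parameters are given is an assumption.

```python
def build_metric_description(name, user_params):
    """Format a metric name and its user parameters as Name:key1=value1;key2=value2."""
    if not user_params:
        # Assumption: a metric without user parameters is described by its name alone.
        return name
    # Sort keys so the description is deterministic, like iterating an ordered map.
    pairs = ";".join(f"{key}={user_params[key]}" for key in sorted(user_params))
    return f"{name}:{pairs}"


print(build_metric_description("MetricName", {"key2": "value2", "key1": "value1"}))
```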

    Some specific GetDescription method implementations are kept to stay consistent with the existing behaviour.

    Note

    UserQuerywiseMetric now uses the options in its representation as well.

    opened by ivanychev 38
  • Sum of shap values does not equal to the prediction

    Problem: Sum of SHAP values does not equal the prediction
    catboost version: 0.18.1
    Operating System: Ubuntu 19.10
    CPU: i7-8565U

    It only happens sometimes, but we find that the sum of SHAP values does not equal the prediction. Please let us know how we can provide further information.

    in progress bug 
    opened by hopoluicha 27
  • How does CatBoost handle big data?

    Hi! I am trying to use CatBoost in a Kaggle competition: https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection. My training set has about 40M rows and 14 features. When I try to train a model, the kernel always dies without any errors...

    need info 
    opened by Mechanix12 27
  • Unknown class labels

    I'm a beginner with boosting models and I'm trying to use CatBoost. My input data has 6 categorical features and 2 numerical features. My target variable is numerical. I'm running on GPU. I'm facing the problem below; please help me. I cannot share the data due to privacy issues.

    Traceback (most recent call last):
      File "/work/ilt/css8222/cat_boost/cat_boost.py", line 127, in <module>
        save_snapshot = True
      File "/fibus/fs2/15/css8222/.local/lib/python3.6/site-packages/catboost/core.py", line 4718, in fit
        silent, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, init_model, callbacks, log_cout, log_cerr)
      File "/fibus/fs2/15/css8222/.local/lib/python3.6/site-packages/catboost/core.py", line 2042, in _fit
        train_params["init_model"]
      File "/fibus/fs2/15/css8222/.local/lib/python3.6/site-packages/catboost/core.py", line 1464, in _train
        self._object._train(train_pool, test_pool, params, allow_clear_pool, init_model._object if init_model else None)
      File "_catboost.pyx", line 4393, in _catboost._CatBoost._train
      File "_catboost.pyx", line 4442, in _catboost._CatBoost._train
    _catboost.CatBoostError: catboost/private/libs/target/target_converter.cpp:226: Unknown class label: "14289"

    opened by sujay003 25
  • Faster SHAP values for small batches

    For small batches, use direct SHAP value calculation. The direct algorithm (without precalculation) is faster when DocumentsNumber < MeanLeafCount, because the preprocessing step itself computes SHAP values for MeanLeafCount documents.

    (algorithm from https://arxiv.org/abs/1802.03888)

    With preprocessing, the final complexity was O(NT(D+F)) + O(TL^2 D^2), where N is the number of documents (objects), T is the number of trees, D is the average tree depth, F is the average number of features in a tree, and L is the average number of leaves in a tree. For a small batch we can instead use the default algorithm with complexity O(NTLD^2), which is better when N < L.
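
To make the comparison concrete, here is a rough Python sketch that plugs numbers into the two complexity expressions above. It ignores the constants hidden by the O-notation, so it only illustrates the trend. The helper names are hypothetical and not part of CatBoost, and the sample values (64 leaves for a depth-6 tree, 6 features per tree) are assumptions.

```python
def precalc_cost(n_docs, n_trees, leaves, depth, features):
    """Operation-count estimate with preprocessing: N*T*(D+F) + T*L^2*D^2."""
    return n_docs * n_trees * (depth + features) + n_trees * leaves**2 * depth**2


def direct_cost(n_docs, n_trees, leaves, depth):
    """Operation-count estimate without preprocessing: N*T*L*D^2."""
    return n_docs * n_trees * leaves * depth**2


def cheaper_mode(n_docs, n_trees, leaves, depth, features):
    """Name the mode with the lower estimate (illustrative only)."""
    direct = direct_cost(n_docs, n_trees, leaves, depth)
    precalc = precalc_cost(n_docs, n_trees, leaves, depth, features)
    return "NoPreCalc" if direct < precalc else "UsePreCalc"


# Assumed shape: 500 trees of depth 6, 64 leaves each, 6 features per tree.
for n_docs in (1, 1000):
    print(n_docs, cheaper_mode(n_docs, n_trees=500, leaves=64, depth=6, features=6))
```

With these assumed values the direct estimate is lower for a single document and the precalculated estimate is lower for 1000 documents, which matches the N < L rule of thumb.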

    Example: On the gisette dataset (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html) with the first 100 features, train CatBoostRegressor(iterations=500, depth=6, random_seed=42) and then use get_feature_importance to compute SHAP values for the first object in the test set.

    Old:

    • 0.32 s

    New:

    • shap_mode="Auto" or "NoPreCalc"- 0.015 s
    • shap_mode="UsePreCalc" - 0.32 s (this is like it was before)

    I hereby agree to the terms of the CLA available at: link

    opened by Lokutrus 25
  • Tutorial for ranking modes in CatBoost

    Hello.

    Looks like the current version of CatBoost supports learning to rank. There are some clues about it in the documentation, but I couldn't find any minimal working examples. I wonder which methods should be considered as a baseline approach and what are the prerequisites?

    Should we use YetiRank as the training metric and just provide a query id as the Pool group_id parameter? What other CatBoost parameters should be taken into account specifically for a ranking problem?

    Thank you!

    planned documentation 
    opened by hanky 24
  • GPU yields worse metric than CPU

    Problem: various measurements become worse when I switch from CPU to GPU
    catboost version: 0.22
    Operating System: Linux 4.4.0-1100-aws x86_64
    CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz

    GPU: Tesla M60

    I wanted to reduce the training time and so I specified 'task_type' as 'GPU'. I immediately noticed that its metrics got worse. The only change I made was setting task_type as GPU. The rest are the same.

    The training dataset has 1.2M rows and 218 columns. Among these 218 columns, 42 are categorical features. The rest are floats or integers, no text features. The validation dataset has 120K rows.

    The following are the parameters for the CPU version: {'nan_mode': 'Min', 'eval_metric': 'Logloss', 'combinations_ctr': ['Borders:CtrBorderCount=15:CtrBorderType=Uniform:TargetBorderCount=1:TargetBorderType=MinEntropy:Prior=0/1:Prior=0.5/1:Prior=1/1', 'Counter:CtrBorderCount=15:CtrBorderType=Uniform:Prior=0/1'], 'iterations': 1000, 'sampling_frequency': 'PerTree', 'fold_permutation_block': 0, 'leaf_estimation_method': 'Newton', 'od_pval': 0, 'counter_calc_method': 'SkipTest', 'grow_policy': 'SymmetricTree', 'boosting_type': 'Plain', 'model_shrink_mode': 'Constant', 'feature_border_type': 'GreedyLogSum', 'ctr_leaf_count_limit': 18446744073709551615, 'bayesian_matrix_reg': 0.10000000149011612, 'one_hot_max_size': 2, 'l2_leaf_reg': 3, 'random_strength': 1, 'od_type': 'Iter', 'rsm': 1, 'boost_from_average': False, 'max_ctr_complexity': 4, 'model_size_reg': 0.5, 'simple_ctr': ['Borders:CtrBorderCount=15:CtrBorderType=Uniform:TargetBorderCount=1:TargetBorderType=MinEntropy:Prior=0/1:Prior=0.5/1:Prior=1/1', 'Counter:CtrBorderCount=15:CtrBorderType=Uniform:Prior=0/1'], 'subsample': 0.800000011920929, 'use_best_model': True, 'od_wait': 35, 'class_names': [0, 1], 'random_seed': 42, 'depth': 6, 'ctr_target_border_count': 1, 'has_time': False, 'store_all_simple_ctr': False, 'border_count': 254, 'classes_count': 0, 'sparse_features_conflict_fraction': 0, 'leaf_estimation_backtracking': 'AnyImprovement', 'best_model_min_trees': 1, 'model_shrink_rate': 0, 'min_data_in_leaf': 1, 'loss_function': 'Logloss', 'learning_rate': 0.30000001192092896, 'score_function': 'Cosine', 'task_type': 'CPU', 'leaf_estimation_iterations': 10, 'bootstrap_type': 'MVS', 'max_leaves': 64, 'permutation_count': 4}

    The following are the parameters for the GPU version: {'nan_mode': 'Min', 'gpu_ram_part': 0.95, 'eval_metric': 'Logloss', 'combinations_ctr': ['Borders:CtrBorderCount=15:CtrBorderType=Uniform:TargetBorderCount=1:TargetBorderType=MinEntropy:Prior=0/1:Prior=0.5/1:Prior=1/1', 'FeatureFreq:CtrBorderCount=15:CtrBorderType=Median:Prior=0/1'], 'iterations': 1000, 'fold_permutation_block': 64, 'leaf_estimation_method': 'Newton', 'observations_to_bootstrap': 'TestOnly', 'od_pval': 0, 'counter_calc_method': 'SkipTest', 'grow_policy': 'SymmetricTree', 'boosting_type': 'Plain', 'ctr_history_unit': 'Sample', 'feature_border_type': 'GreedyLogSum', 'bayesian_matrix_reg': 0.10000000149011612, 'one_hot_max_size': 2, 'devices': '-1', 'pinned_memory_bytes': '104857600', 'l2_leaf_reg': 3, 'random_strength': 1, 'od_type': 'Iter', 'rsm': 1, 'boost_from_average': False, 'fold_size_loss_normalization': False, 'max_ctr_complexity': 4, 'gpu_cat_features_storage': 'GpuRam', 'simple_ctr': ['Borders:CtrBorderCount=15:CtrBorderType=Uniform:TargetBorderCount=1:TargetBorderType=MinEntropy:Prior=0/1:Prior=0.5/1:Prior=1/1', 'FeatureFreq:CtrBorderCount=15:CtrBorderType=MinEntropy:Prior=0/1'], 'use_best_model': True, 'od_wait': 35, 'class_names': [0, 1], 'random_seed': 42, 'depth': 6, 'ctr_target_border_count': 1, 'has_time': False, 'border_count': 128, 'min_fold_size': 100, 'data_partition': 'FeatureParallel', 'bagging_temperature': 1, 'classes_count': 0, 'leaf_estimation_backtracking': 'AnyImprovement', 'best_model_min_trees': 1, 'min_data_in_leaf': 1, 'add_ridge_penalty_to_loss_function': False, 'loss_function': 'Logloss', 'learning_rate': 0.30000001192092896, 'score_function': 'Cosine', 'task_type': 'GPU', 'leaf_estimation_iterations': 10, 'bootstrap_type': 'Bayesian', 'max_leaves': 64, 'permutation_count': 4}
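
The two dumps above differ in more than task_type, which is easy to miss by eye. A small Python sketch for listing the differences follows. It uses only a short excerpt of the values quoted above, the helper diff_params and the "<absent>" marker are hypothetical, and it makes no claim about which difference, if any, explains the metric gap.

```python
ABSENT = "<absent>"  # marker for a key that appears in only one dictionary


def diff_params(left, right):
    """Return {key: (left_value, right_value)} for keys that differ or are missing."""
    diff = {}
    for key in sorted(set(left) | set(right)):
        a = left.get(key, ABSENT)
        b = right.get(key, ABSENT)
        if a != b:
            diff[key] = (a, b)
    return diff


# Excerpt of the parameters quoted in the report.
cpu_params = {"task_type": "CPU", "border_count": 254, "bootstrap_type": "MVS",
              "subsample": 0.800000011920929, "fold_permutation_block": 0, "depth": 6}
gpu_params = {"task_type": "GPU", "border_count": 128, "bootstrap_type": "Bayesian",
              "bagging_temperature": 1, "fold_permutation_block": 64, "depth": 6}

for key, (cpu_value, gpu_value) in diff_params(cpu_params, gpu_params).items():
    print(f"{key}: CPU={cpu_value!r} GPU={gpu_value!r}")
```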

    opened by kdlin 23
  • Using parameters from saved model for cross-validation leads to 'exclusive parameters' error.

    Problem: "Only one of parameters ['verbose', 'logging_level', 'verbose_eval', 'silent'] should be set" is reported by the cv function after loading a previously saved model from a file.
    catboost version: 0.12.2
    Operating System: CentOS Linux release 7.4.1708
    CPU: Intel(R) Xeon(R) CPU E5-2450 v2 @ 2.50GHz

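    from catboost import CatBoostClassifier, cv

    # train_pool, validation_pool and whole_pool (catboost Pool objects) and
    # model_path (presumably a pathlib.Path) are assumed to be defined beforehand.
    model = CatBoostClassifier(loss_function='MultiClass')
    model.fit(train_pool,
              verbose=False,
              plot=True,
              eval_set=validation_pool)
    model.save_model(str(model_path.absolute()))

    # Reload the model and reuse its parameters for cross-validation.
    model = CatBoostClassifier()
    model.load_model(str(model_path.absolute()))
    cv_data = cv(
        whole_pool,
        params=model.get_params()
    )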
    
    ---------------------------------------------------------------------------
    CatboostError                             Traceback (most recent call last)
    <ipython-input-40-f150897615b8> in <module>
          1 cv_data = cv(
          2     whole_pool,
    ----> 3     params=model.get_params()
          4 )
    
    ~/.conda/envs/catboost/lib/python3.6/site-packages/catboost/core.py in cv(pool, params, dtrain, iterations, num_boost_round, fold_count, nfold, inverted, partition_random_seed, seed, shuffle, logging_level, stratified, as_pandas, metric_period, verbose, verbose_eval, plot, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, max_time_spent_on_fixed_cost_ratio, dev_max_iterations_batch_size)
       2876 
       2877     params = deepcopy(params)
    -> 2878     _process_synonyms(params)
       2879 
       2880     metric_period, verbose, logging_level = _process_verbose(metric_period, verbose, logging_level, verbose_eval)
    
    ~/.conda/envs/catboost/lib/python3.6/site-packages/catboost/core.py in _process_synonyms(params)
        754         del params['silent']
        755 
    --> 756     metric_period, verbose, logging_level = _process_verbose(metric_period, verbose, logging_level, verbose_eval, silent)
        757 
        758     if metric_period is not None:
    
    ~/.conda/envs/catboost/lib/python3.6/site-packages/catboost/core.py in _process_verbose(metric_period, verbose, logging_level, verbose_eval, silent)
        133     at_most_one = sum(params.get(exclusive) is not None for exclusive in exclusive_params)
        134     if at_most_one > 1:
    --> 135         raise CatboostError('Only one of parameters {} should be set'.format(exclusive_params))
        136 
        137     if verbose is None:
    
    CatboostError: Only one of parameters ['verbose', 'logging_level', 'verbose_eval', 'silent'] should be set
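
The check shown in the traceback counts how many of the four mutually exclusive logging keys are not None. A possible workaround, sketched below in plain Python, is to drop all but one of them before passing the dictionary to cv. The helper names and the sample dictionary are hypothetical: what get_params() actually returns for a loaded model is an assumption inferred from the error message, and the workaround has not been verified against this report.

```python
EXCLUSIVE_PARAMS = ("verbose", "logging_level", "verbose_eval", "silent")


def count_set(params):
    """Mirror the check in the traceback: how many exclusive keys are not None."""
    return sum(params.get(key) is not None for key in EXCLUSIVE_PARAMS)


def keep_one_verbosity_param(params, keep="verbose"):
    """Return a copy of params with every exclusive logging key removed except `keep`."""
    cleaned = dict(params)
    for key in EXCLUSIVE_PARAMS:
        if key != keep:
            cleaned.pop(key, None)
    return cleaned


# Hypothetical parameters, standing in for model.get_params() after load_model().
loaded_params = {"loss_function": "MultiClass", "verbose": 0, "logging_level": "Silent"}
cleaned_params = keep_one_verbosity_param(loaded_params)
print(count_set(loaded_params), count_set(cleaned_params))
```

The cleaned dictionary could then be passed as params=cleaned_params to cv.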
    
    bug 
    opened by protsenkovi 23
  • Flag not copied unnecessarily with blank and whitespace

    Before submitting a pull request, please do the following steps:

    1. Read instructions for contributors here.
    2. Run ya make in catboost folder to make sure the code builds.
    3. Add tests that test your change.
    4. Run tests using ya make -t -A command.
    5. If you haven't already, complete the CLA. I hereby agree to the terms of the CLA available at https://yandex.ru/legal/cla/?lang=en.
    opened by sharaalfa 23
  • Issue trying to compile with specified gcc version

    I'm trying to compile the catboost python wheel on my system. The default gcc version I have is 8, but I also have 7 installed so I'm trying to use that by setting the CC and CXX environment variables. However, when running:

    python mk_wheel.py -DCUDA_ROOT="/opt/cuda"
    

    I get the message:

    Info: Attention! Using system user-defined compiler: g++-7 (check CC and CXX env vars).
    Cross compilation with system CXX is not supported
    

    catboost version: git master
    Operating System: Linux
    CPU: i7
    GPU: GTX 1080

    Thanks!

    build issues 
    opened by ctlaltdefeat 23
  • Spark Feature Importance issue

    Problem: ai.catboost.CatBoostError: Unsupported data type for Label at ai.catboost.spark.DatasetLoadingContext$.getLabelCallback(DataHelpers.scala:465)
    catboost version: 1.1.1
    Operating System: Linux, Spark 3.3.1

    The following method call fails with the error described above:

    ((CatBoostClassificationModel) model).getFeatureImportance(EFstrType.LossFunctionChange, evalPool, ECalcTypeShapValues.Regular)
    
    opened by eugene-kamenev 0
  •  Saved model's params are different from current model's params

    Problem: Can't fit models on GPU, Saved model's params are different from current model's params
    catboost version: '1.1.1'
    Operating System: Windows 10
    CPU: 0
    GPU: 1

    from catboost import CatBoostClassifier, Pool, cv

    model_cat_tm_1 = CatBoostClassifier(
        iterations=5000,
        loss_function='Logloss',
        # eval_metric='AUC',
        learning_rate=0.05,
        random_seed=1,
        od_type="Iter",
        od_wait=200,
        depth=5,
        task_type="GPU",
        devices='0:1',
        save_snapshot=False,
    )

    # Reuse the classifier's parameters for cross-validation.
    cv_params_tm_1 = model_cat_tm_1.get_params()
    cv_data_tm_1 = cv(
        Pool(train_tm_treatment_one_features, train_tm_treatment_one_target),
        cv_params_tm_1,
        plot=True,
        verbose=100,
    )

    I get the following error (I tried rebooting the system and running it from another script; neither helps):

    Training on fold [0/3]

    CatBoostError                             Traceback (most recent call last)
    ~\AppData\Local\Temp\ipykernel_3516\715857703.py in
          1 cv_params_tm_1 = model_cat_tm_1.get_params()
    ----> 2 cv_data_tm_1 = cv(
          3     Pool(train_tm_treatment_one_features, train_tm_treatment_one_target),
          4     cv_params_tm_1,
          5     plot=True,

    ~\AppData\Roaming\Python\Python39\site-packages\catboost\core.py in cv(pool, params, dtrain, iterations, num_boost_round, fold_count, nfold, inverted, partition_random_seed, seed, shuffle, logging_level, stratified, as_pandas, metric_period, verbose, verbose_eval, plot, plot_file, early_stopping_rounds, save_snapshot, snapshot_file, snapshot_interval, metric_update_interval, folds, type, return_models, log_cout, log_cerr)
       6648     with log_fixup(log_cout, log_cerr), plot_wrapper(plot, plot_file=plot_file, plot_title='Cross-validation plot', train_dirs=plot_dirs):
       6649         if not return_models:
    -> 6650             return _cv(params, pool, fold_count, inverted, partition_random_seed, shuffle, stratified,
       6651                        metric_update_interval, as_pandas, folds, type, return_models)
       6652         else:

    _catboost.pyx in _catboost._cv()

    _catboost.pyx in _catboost._cv()

    CatBoostError: C:/Program Files (x86)/Go Agent/pipelines/BuildMaster/catboost.git/catboost/cuda/methods/boosting_progress_tracker.cpp:171: Saved model's params are different from current model's params

    opened by MiMakh 0
  • Catboost spark fit error java.lang.ClassCastException

    Problem: net.razorvine.pickle.objects.TimeDelta cannot be cast to java.time.Duration
    catboost version: 1.0.6
    Operating System: 10.4 LTS ML (includes Apache Spark 3.2.1, Scala 2.12)

    Hi, I'm trying to test catboost_spark in a Databricks notebook using the example from the official documentation: https://catboost.ai/en/docs/concepts/spark-quickstart-python#binary-classification

    When I run this command:

    classifier.fit(dataset=trainPool, evalDatasets=[evalPool])
    

    The following error is raised:

    java.lang.ClassCastException: net.razorvine.pickle.objects.TimeDelta cannot be cast to java.time.Duration
    
    ...
    
    Py4JJavaError: An error occurred while calling o18779.w.
    : java.lang.ClassCastException: net.razorvine.pickle.objects.TimeDelta cannot be cast to java.time.Duration
    	at ai.catboost.spark.params.DurationParam.w(Helpers.scala:61)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    	at py4j.Gateway.invoke(Gateway.java:295)
    	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    	at py4j.commands.CallCommand.execute(CallCommand.java:79)
    	at py4j.GatewayConnection.run(GatewayConnection.java:251)
    	at java.lang.Thread.run(Thread.java:748)
    

    I believe there is a similar issue to this but it is now closed. Thank you in advance for the help.

    opened by vitormanita 0
  • parameter missing for non_linear regression

    Problem: non-linear regression "Poly" kernel parameter missing
    catboost version: 0.26.1
    Operating System: Linux
    CPU: True
    GPU: False

    Hi there, I am training a model for a regression problem, but my data is non-linear in nature. I would like to switch to a non-linear kernel such as Poly, as is possible with Support Vector Regressor. I searched the CatBoost parameters for an equivalent but couldn't find one. Do you have plans to add it? Thanks
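
For context, CatBoost is described above as gradient boosting over decision trees, and a sum of trees is already a non-linear (piecewise constant) function of the features, so there is no kernel to select. The toy sketch below illustrates that idea with the standard library only. It is not CatBoost's algorithm: the one-split "stumps", the function names, and the data are all assumptions made for illustration.

```python
# Toy gradient boosting with one-split trees ("stumps") on 1-D data.
# Illustration only: names and data are hypothetical, and this is not
# how CatBoost itself is implemented.

def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error."""
    best = None
    points = sorted(set(xs))
    for low, high in zip(points, points[1:]):
        threshold = (low + high) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def fit_boosted_stumps(xs, ys, iterations=200, learning_rate=0.3):
    """Fit stumps one after another, each on the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(iterations):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        left_step = learning_rate * left_value
        right_step = learning_rate * right_value
        stumps.append((threshold, left_step, right_step))
        predictions = [p + (left_step if x <= threshold else right_step)
                       for x, p in zip(xs, predictions)]
    return base, stumps


def predict(model, x):
    """Sum the base value and the contribution of every stump."""
    base, stumps = model
    return base + sum(left if x <= threshold else right
                      for threshold, left, right in stumps)


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Quadratic target on a symmetric grid: the best straight line here is flat
# (slope 0), so the mean predictor doubles as the linear baseline.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]

model = fit_boosted_stumps(xs, ys)
mean_y = sum(ys) / len(ys)
baseline_mse = mean_squared_error(ys, [mean_y] * len(ys))
boosted_mse = mean_squared_error(ys, [predict(model, x) for x in xs])
print('linear baseline MSE:', baseline_mse)
print('boosted stumps MSE:', boosted_mse)
```

The boosted model follows the curve without any kernel setting. In CatBoost itself, the parameters that control how flexible the fit is include `depth`, `iterations` and `learning_rate`, which already appear in other reports on this page.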

    opened by hamza1424 0
  • Log message "There are invalid params and some of them will be ignored."

    Problem: some strange messages in the logs
    catboost version: 0.26.1
    Operating System: Linux

    I still see these messages in the logs when using CatBoost for Java 0.26.1 for prediction and CatBoost for Python 0.26.1 to train the model (with the feature_weights parameter in CatBoostRegressor).

    There are invalid params and some of them will be ignored.
    Parameter {"feature_weights":[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0.1,0.1,0.1,0.1,0.1,0.1,0.1]} is ignored, because it cannot be parsed.
    

    Probably related tickets: https://github.com/catboost/catboost/issues/873 https://github.com/catboost/catboost/issues/1169

    opened by mazurkin 0
Releases(v1.1.1)
Owner
CatBoost
CatBoost is a fast, scalable, high performance gradient boosting on decision trees library. Used for ranking, classification, regression and other ML tasks.