Snips Python library to extract meaning from text

Overview

Snips NLU

Snips NLU (Natural Language Understanding) is a Python library that allows you to extract structured information from sentences written in natural language.

Summary

What is Snips NLU about?

Behind every chatbot and voice assistant lies a common piece of technology: Natural Language Understanding (NLU). Anytime a user interacts with an AI using natural language, their words need to be translated into a machine-readable description of what they meant.

The NLU engine first detects the user's intention (the intent), then extracts the parameters of the query (called slots). The developer can then use this information to determine the appropriate action or response.

Let’s take an example to illustrate this, and consider the following sentence:

"What will be the weather in paris at 9pm?"

Properly trained, the Snips NLU engine will be able to extract structured data such as:

{
   "intent": {
      "intentName": "searchWeatherForecast",
      "probability": 0.95
   },
   "slots": [
      {
         "value": "paris",
         "entity": "locality",
         "slotName": "forecast_locality"
      },
      {
         "value": {
            "kind": "InstantTime",
            "value": "2018-02-08 20:00:00 +00:00"
         },
         "entity": "snips/datetime",
         "slotName": "forecast_start_datetime"
      }
   ]
}

In this case, the identified intent is searchWeatherForecast and two slots were extracted: a locality and a datetime. As you can see, Snips NLU performs an extra step on top of extracting entities: it resolves them. The extracted datetime value has been converted into a handy ISO format.
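To make that structure concrete, here is a minimal sketch that reads a parsing result shaped like the JSON above and collects the intent name and one value per slot. It works on a plain dictionary copied from the example, so it does not call Snips NLU itself. The helper name summarize_parsing is hypothetical, and it assumes a slot value is either a plain string or a resolved object carrying its own "value" field, as in the two slots shown.

```python
def summarize_parsing(parsing):
    """Return the detected intent name and a {slotName: value} mapping.

    Assumes `parsing` follows the shape of the example above: an "intent"
    object with "intentName", and a list of "slots" whose "value" is either
    a plain string or a resolved object with its own "value" field.
    """
    intent_name = parsing["intent"]["intentName"]
    slots = {}
    for slot in parsing["slots"]:
        value = slot["value"]
        # Resolved entities (e.g. snips/datetime) nest the usable value.
        if isinstance(value, dict):
            value = value["value"]
        slots[slot["slotName"]] = value
    return intent_name, slots


parsing = {
    "intent": {"intentName": "searchWeatherForecast", "probability": 0.95},
    "slots": [
        {"value": "paris", "entity": "locality",
         "slotName": "forecast_locality"},
        {"value": {"kind": "InstantTime",
                   "value": "2018-02-08 20:00:00 +00:00"},
         "entity": "snips/datetime",
         "slotName": "forecast_start_datetime"},
    ],
}

intent, slots = summarize_parsing(parsing)
print(intent)  # searchWeatherForecast
print(slots["forecast_start_datetime"])  # 2018-02-08 20:00:00 +00:00
```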

Check out our blog post to get more details about why we built Snips NLU and how it works under the hood. We also published a paper on arXiv, presenting the machine learning architecture of the Snips Voice Platform.

Getting Started

System requirements

  • Python 2.7 or Python >= 3.5
  • RAM: Snips NLU will typically use between 100MB and 200MB of RAM, depending on the language and the size of the dataset.

Installation

pip install snips-nlu

We currently have pre-built binaries (wheels) for snips-nlu and its dependencies for macOS (10.11 and later), Linux x86_64 and Windows.

For any other architecture/OS, snips-nlu can be installed from the source distribution. To do so, Rust and setuptools_rust must be installed before running the pip install snips-nlu command.

Language resources

Snips NLU relies on external language resources that must be downloaded before the library can be used. You can fetch resources for a specific language by running the following command:

python -m snips_nlu download en

Or simply:

snips-nlu download en

The list of supported languages is available at this address.

API Usage

Command Line Interface

The easiest way to test the abilities of this library is through the command line interface.

First, train the NLU engine on one of the sample datasets:

snips-nlu train path/to/dataset.json path/to/output_trained_engine

Where path/to/dataset.json is the path to the dataset which will be used during training, and path/to/output_trained_engine is the location where the trained engine should be persisted once the training is done.

After that, you can start parsing sentences interactively by running:

snips-nlu parse path/to/trained_engine

Where path/to/trained_engine corresponds to the location where you have stored the trained engine during the previous step.

Sample code

Here is some sample code that you can run on your machine after installing snips-nlu, fetching the English resources and downloading one of the sample datasets:

>>> from __future__ import unicode_literals, print_function
>>> import io
>>> import json
>>> from snips_nlu import SnipsNLUEngine
>>> from snips_nlu.default_configs import CONFIG_EN
>>> with io.open("sample_datasets/lights_dataset.json") as f:
...     sample_dataset = json.load(f)
>>> nlu_engine = SnipsNLUEngine(config=CONFIG_EN)
>>> nlu_engine = nlu_engine.fit(sample_dataset)
>>> text = "Please turn the light on in the kitchen"
>>> parsing = nlu_engine.parse(text)
>>> parsing["intent"]["intentName"]
'turnLightOn'

This trains an NLU engine on the sample lights dataset and parses a lighting query.

Sample datasets

Here is a list of some datasets that can be used to train a Snips NLU engine:

  • Lights dataset: "Turn on the lights in the kitchen", "Set the light to red in the bedroom"
  • Beverage dataset: "Prepare two cups of cappucino", "Make me a cup of tea"
  • Flights dataset: "Book me a flight to go to boston this weekend", "book me some tickets from istanbul to moscow in three days"
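The sample datasets are JSON files in which every training utterance is stored as a list of chunks: plain text, plus annotated slot chunks. The sketch below turns an utterance written with inline annotations into such a chunk list. It assumes the [slot_name:entity](text) annotation syntax and the chunk keys text, entity and slot_name of the Snips dataset format; both, and the helper utterance_to_chunks, are illustrative, so check the package documentation for the exact schema of your version.

```python
import re

# Matches an inline slot annotation such as [room:room](kitchen):
# slot name before the colon, entity after it, slot text in parentheses.
SLOT_PATTERN = re.compile(
    r"\[(?P<slot_name>[^:\]]+):(?P<entity>[^\]]+)\]\((?P<text>[^)]+)\)")


def utterance_to_chunks(annotated):
    """Split an annotated utterance into dataset-style chunks.

    Plain text becomes {"text": ...}; each annotation becomes
    {"text": ..., "entity": ..., "slot_name": ...}.
    """
    chunks = []
    position = 0
    for match in SLOT_PATTERN.finditer(annotated):
        # Keep any plain text that precedes this annotation.
        if match.start() > position:
            chunks.append({"text": annotated[position:match.start()]})
        chunks.append({
            "text": match.group("text"),
            "entity": match.group("entity"),
            "slot_name": match.group("slot_name"),
        })
        position = match.end()
    # Keep any plain text after the last annotation.
    if position < len(annotated):
        chunks.append({"text": annotated[position:]})
    return chunks


print(utterance_to_chunks("Turn on the lights in the [room:room](kitchen)"))
```

Keeping slot chunks separate from the surrounding text is what lets a slot filler learn where each slot starts and ends within the sentence.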

Benchmarks

In January 2018, we reproduced an academic benchmark published in summer 2017. In that article, the authors assessed the performance of API.ai (now Dialogflow, Google), Luis.ai (Microsoft), IBM Watson, and Rasa NLU. For fairness, we used an updated version of Rasa NLU and compared it to the latest version of Snips NLU (both in dark blue).

Figure: benchmark results (.img/benchmarks.png)

In the figure above, F1 scores of both intent classification and slot filling were computed for several NLU providers and averaged across the three datasets used in the academic benchmark mentioned above. All the underlying results can be found here.
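For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it from true-positive, false-positive and false-negative counts and takes an unweighted mean of per-dataset scores. It only illustrates the metric and is not the benchmark's evaluation code; the exact counting rules used there (for instance how partial slot matches are scored) are not described here.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 = 2 * precision * recall / (precision + recall).

    Returns 0.0 when precision or recall is undefined or zero.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0 or true_positives == 0:
        return 0.0
    # float() keeps true division on Python 2.7 as well.
    precision = float(true_positives) / predicted
    recall = float(true_positives) / actual
    return 2 * precision * recall / (precision + recall)


def average(scores):
    """Unweighted mean of per-dataset scores."""
    return sum(scores) / float(len(scores))


print(round(f1_score(1, 0, 1), 3))  # 0.667
```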

Documentation

To find out how to use Snips NLU, please refer to the package documentation. It provides a step-by-step guide on how to set up and use this library.

Citing Snips NLU

Please cite the following paper when using Snips NLU:

@article{coucke2018snips,
  title   = {Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces},
  author  = {Coucke, Alice and Saade, Alaa and Ball, Adrien and Bluche, Th{\'e}odore and Caulier, Alexandre and Leroy, David and Doumouro, Cl{\'e}ment and Gisselbrecht, Thibault and Caltagirone, Francesco and Lavril, Thibaut and others},
  journal = {arXiv preprint arXiv:1805.10190},
  pages   = {12--16},
  year    = {2018}
}

FAQ & Community

Please join the forum to ask your questions and get feedback from the community.

Related content

How do I contribute?

Please see the Contribution Guidelines.

Licence

This library is provided by Snips as Open Source software. See LICENSE for more information.

Geonames Licence

The snips/city, snips/country and snips/region builtin entities rely on software from Geonames, which is made available under a Creative Commons Attribution 4.0 International license. For the license and warranties for Geonames, please refer to: https://creativecommons.org/licenses/by/4.0/legalcode.

Comments
  • problem installing snips on windows

    After installing Visual Studio 2017 and Rust, I finally ran pip install snips-nlu, and the following error was displayed:

    error[E0425]: cannot find function parse_crate in module syn
      --> C:\Users\jhg\.cargo\registry\src\github.com-1ecc6299db9ec823\cbindgen-0.4.3\src\bindgen\parser.rs:167:30

    What can I do?

    opened by hvaneylen 14
  • [Windows] Generating custom dataset fails

    I'm following the tutorial, so I have three intent files, setTemperature.txt, turnLightsOn.txt, and turnLightsOff.txt, and one entity file, rooms.txt. Running

    generate-dataset --language en --intent-files turnLightOn.txt turnLightOff.txt setTemperature.txt --entity-files room.txt > dataset.json

    generates a dataset, but when I do

    with io.open("dataset.json") as f:
        dataset = json.load(f)
    engine.fit(dataset)

    I get the following error

    Traceback (most recent call last):
      File "C:\Users\J90779\.spyder-py3\testSnipIdle.py", line 12, in <module>
        engine.fit(dataset)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\utils.py", line 259, in wrapped
        res = fn(*args, **kwargs)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\nlu_engine\nlu_engine.py", line 95, in fit
        recycled_parser.fit(dataset, force_retrain)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\utils.py", line 259, in wrapped
        res = fn(*args, **kwargs)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\intent_parser\probabilistic_intent_parser.py", line 84, in fit
        self.slot_fillers[intent_name].fit(dataset, intent_name)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\utils.py", line 259, in wrapped
        res = fn(*args, **kwargs)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\slot_filler\crf_slot_filler.py", line 133, in fit
        for sample in crf_samples]
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\slot_filler\crf_slot_filler.py", line 133, in <listcomp>
        for sample in crf_samples]
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\slot_filler\crf_slot_filler.py", line 203, in compute_features
        value = feature.compute(i, cache)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\slot_filler\feature.py", line 59, in compute
        value = self.function(tokens, token_index + self.offset)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\slot_filler\feature_factory.py", line 498, in builtin_entity_match
        text, self.language, scope=[builtin_entity], use_cache=True)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\builtin_entities.py", line 46, in get_builtin_entities
        return parser.parse(text, scope=scope, use_cache=use_cache)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\builtin_entities.py", line 26, in parse
        parser_result = self.parser.parse(text, scope)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu_ontology\builtin_entities.py", line 159, in parse
        self._parser, text.encode("utf8"), scope, byref(ptr))
    OSError: [WinError -529697949] Windows Error 0xe06d7363
    

    Everything is in the same folder, and fitting sample_dataset.json works just fine. I also tried to manually type out a dataset following the formatting in sample_dataset, but I get the same error. Is this an issue on my side or something else?

    I'm also running snips-nlu version 0.14.0.

    bug 
    opened by Brannonj96 13
  • different & low scores in snips-nlu

    Hello team, I installed Snips NLU on Red Hat Linux and I am able to train on my data; I only need intent classification. Say we have trained the model with an utterance like "Unable to edit the bill for my request, which is submitted for approval", and after training I ask in a different way: "submitted for approval but unable to edit the bill". I get a very good confidence score (0.88) if the number of intents is small, say 2-3. When there are more intents, say close to 30, and I ask the same question, "submitted for approval but cannot to edit it's bills", the score is very low, close to 0.45.

    I also tried tweaking the max_iter and random_state=0 values, which improved the score to 0.51, but it is still low compared to the first score. Do I need to tweak anything else in the config file, and if so, are there any rules for calibrating those parameters?

    opened by deepankar27 13
  • cache gazetteer stemmed words to load models that use NgramFactory features faster

    Hi team,

    In a project that I'm working on, we have a dynamic number of models, and we can't cache them all on each python process.

    So we have to load them dynamically and often. When loading a sample model that has 6 NgramFactory features, 290ms of the 300ms total load time is spent stemming the words of the common_words_gazetteer_name gazetteer (using from_dict / to_dict).

    So if we cache the stemmed words, loading models becomes extremely fast. In our case, this pull request removes 60K calls to stem(lang, word), which take 90% of the domain load time for our model.
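A minimal sketch of the caching idea, assuming Python 3: functools.lru_cache memoizes a pure function, so each (language, word) pair is stemmed only once per process. Here slow_stem is a hypothetical stand-in for the library's stem(lang, word) call, with a toy stemming rule, and the counter only exists to show how many real calls are made; the pull request itself may implement the cache differently.

```python
from functools import lru_cache

# Counts how many times the expensive function actually runs.
CALLS = {"count": 0}


def slow_stem(language, word):
    """Hypothetical stand-in for an expensive stem(lang, word) call."""
    CALLS["count"] += 1
    # Toy rule for illustration only: strip a trailing "s".
    return word[:-1] if word.endswith("s") else word


@lru_cache(maxsize=None)
def cached_stem(language, word):
    """Memoized wrapper: each (language, word) pair is stemmed once."""
    return slow_stem(language, word)


def stem_gazetteer(language, words):
    """Stem every gazetteer entry through the cache."""
    return [cached_stem(language, word) for word in words]


words = ["lights", "rooms", "lights", "kitchen"] * 1000
stemmed = stem_gazetteer("en", words)
print(len(stemmed), CALLS["count"])  # 4000 3
```

Because the cache lives in the process, it pays off when many models sharing the same gazetteer words are loaded in one process, which matches the scenario described above.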

    Is this something you'd be interested in having in core?

    ps:

    1. I didn't even create the model and don't know how snips works, just profiled and fixed.
    2. The code sucks, I'll make it better if you are interested.

    Regards, Dorian

    opened by ddorian 12
  • Snips doesn't work with pyinstaller

    My team just spent the past week getting snips to work with pyinstaller. Pyinstaller lets you wrap your Python code up into a convenient single file for distribution. The modifications to Snips were minor:

    snips_nlu_parsers/utils.py: add the import for sys, and change the assignment of PACKAGE_PATH to the following

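    import sys

    try:
        # pyinstaller sets sys._MEIPASS to the folder the bundle is unpacked into
        PACKAGE_PATH = Path(sys._MEIPASS)
    except AttributeError:
        # Not running from a pyinstaller bundle: use the package directory
        PACKAGE_PATH = Path(__file__).absolute().parent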

    _MEIPASS is a variable set by pyinstaller that records where the installation is in the environment it creates. The call to load the library a few lines down will break without this.

    Add the hook-snips_nlu_parsers.py file:

    import os

    from PyInstaller.compat import is_win

    # Modules imported dynamically, which pyinstaller's static analysis can miss
    hiddenimports = ['sklearn', 'sklearn.neighbors.typedefs', 'sklearn.neighbors.quad_tree', 'sklearn.tree._utils', 'snips_nlu', 'multiprocessing.get_context', 'sklearn.utils', 'pycrfsuite._dumpparser', 'pycrfsuite._logparser']

    # Compiled Rust library of snips_nlu_parsers; these paths are specific
    # to the machines this hook was written on
    if is_win:
        binaries = [(os.environ['USERPROFILE'] + '\\AppData\\Local\\Programs\\python\\python36\\lib\\site-packages\\snips_nlu_parsers\\dylib\\libsnips_nlu_parsers_rs.cp36-win_amd64.pyd', 'dylib')]
    else:
        binaries = [('/usr/local/lib/python3.7/site-packages/snips_nlu_parsers/dylib/libsnips_nlu_parsers_rs.cpython-37m-darwin.so', 'dylib')]

    Documentation for pyinstaller hooks: https://pyinstaller.readthedocs.io/en/stable/hooks.html?highlight=collect_submodules
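The hook above hardcodes user- and Python-version-specific paths. A less brittle variant could discover the compiled library inside the installed package instead. The sketch below assumes, as those hardcoded paths suggest, that the library lives in a dylib folder of snips_nlu_parsers; the helper names are hypothetical, and locating the package with importlib.util.find_spec (Python 3.4+) is a swapped-in technique, not something from the original post.

```python
import glob
import importlib.util
import os

# File patterns a compiled extension or shared library may use.
LIBRARY_PATTERNS = ("*.so", "*.pyd", "*.dylib", "*.dll")


def find_package_dir(package_name):
    """Return the directory of an installed package without importing it."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        raise ImportError("package not found: %s" % package_name)
    return list(spec.submodule_search_locations)[0]


def collect_dylib_binaries(package_dir, dest="dylib"):
    """Build pyinstaller-style (source, dest) tuples for package_dir/dylib."""
    dylib_dir = os.path.join(package_dir, "dylib")
    binaries = []
    for pattern in LIBRARY_PATTERNS:
        for path in sorted(glob.glob(os.path.join(dylib_dir, pattern))):
            binaries.append((path, dest))
    return binaries


# In hook-snips_nlu_parsers.py this could replace the hardcoded list:
# binaries = collect_dylib_binaries(find_package_dir("snips_nlu_parsers"))
```

Keeping the destination as 'dylib' mirrors the original hook, so the library still ends up under the sys._MEIPASS-based PACKAGE_PATH computed in utils.py.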

    wontfix 
    opened by Shotgun167 10
  • Incremental Training

    Incremental Training

    Is it possible to do incremental training? I build training sets with between 10K and 20K training examples, and training takes a long time. I would like to add a new training example to a model incrementally without having to wait hours to retrain on the whole set. Are there any thoughts on how to approach this?

    opened by timtutt 10
  • make proc_intent() fast again (on big model)

    Hi,

    I have a big model on which parsing text is 3x slower than on my normal model. With the changes in this PR, it's only 1.5x slower. The normal model should be faster too, but I didn't test it.

    The code isn't as nice as it could be, but if you agree with the changes we can fix it.

    Attached is a cProfile screenshot of a single proc_intent() call, before and after.

    Makes sense?

    opened by ddorian 9
  • error loading a saved engine

    I did the tutorial and saved the engine state. When I try to load it, I get: snips_nlu.resources.MissingResource: Language resource 'en' not found. This may be solved by running 'snips-nlu download en'

    However, the language resource was found when saving just before, using snips.load_resources('snips_nlu_en'). That call works, but load_resources(u"en") does not. During the language installation I got this message: Creating a shortcut link for 'snips_nlu_en' didn't work, but you can still load the resources via its full package name: snips_nlu.load_resources('snips_nlu_en')

    I guess that the engine loading code uses the second instruction to load the resource.

    How can I solve this problem? I need to be able to save and load trained engines.

    Thanks

    opened by hvaneylen 8
  • [Windows] error while fitting engine on custom dataset with multiple entites

    Hi, I'm using snips on Windows 10 with Anaconda. I followed the tutorial but tried to generate my own dataset with multiple entities using the command given, and while fitting the data I got this error:

    with io.open("dataset45.json") as f:
        dataset = json.load(f)
    engine.fit(dataset)

    OSError Traceback (most recent call last)
    in ()
          1 with io.open("dataset45.json") as f:
          2     dataset = json.load(f)
    ----> 3 engine.fit(dataset)

    ~\Anaconda3\lib\site-packages\snips_nlu\utils.py in wrapped(*args, **kwargs)
        254 start = datetime.now()
        255 msg_fmt = dict()
    --> 256 res = fn(*args, **kwargs)
        257 if "elapsed_time" in output_msg:
        258 msg_fmt["elapsed_time"] = datetime.now() - start

    ~\Anaconda3\lib\site-packages\snips_nlu\nlu_engine\nlu_engine.py in fit(self, dataset, force_retrain)
         93 recycled_parser = build_processing_unit(parser_config)
         94 if force_retrain or not recycled_parser.fitted:
    ---> 95 recycled_parser.fit(dataset, force_retrain)
         96 parsers.append(recycled_parser)
         97

    ~\Anaconda3\lib\site-packages\snips_nlu\utils.py in wrapped(*args, **kwargs)
        254 start = datetime.now()
        255 msg_fmt = dict()
    --> 256 res = fn(*args, **kwargs)
        257 if "elapsed_time" in output_msg:
        258 msg_fmt["elapsed_time"] = datetime.now() - start

    ~\Anaconda3\lib\site-packages\snips_nlu\intent_parser\probabilistic_intent_parser.py in fit(self, dataset, force_retrain)
         82 slot_filler_config)
         83 if force_retrain or not self.slot_fillers[intent_name].fitted:
    ---> 84 self.slot_fillers[intent_name].fit(dataset, intent_name)
         85 logger.debug("Fitted slot fillers in %s",
         86 elapsed_since(slot_fillers_start))

    ~\Anaconda3\lib\site-packages\snips_nlu\utils.py in wrapped(*args, **kwargs)
        254 start = datetime.now()
        255 msg_fmt = dict()
    --> 256 res = fn(*args, **kwargs)
        257 if "elapsed_time" in output_msg:
        258 msg_fmt["elapsed_time"] = datetime.now() - start

    ~\Anaconda3\lib\site-packages\snips_nlu\slot_filler\crf_slot_filler.py in fit(self, dataset, intent)
        130 # pylint: disable=C0103
        131 X = [self.compute_features(sample[TOKENS], drop_out=True)
    --> 132 for sample in crf_samples]
        133 # ensure ascii tags
        134 Y = [[_encode_tag(tag) for tag in sample[TAGS]]

    ~\Anaconda3\lib\site-packages\snips_nlu\slot_filler\crf_slot_filler.py in (.0)
        130 # pylint: disable=C0103
        131 X = [self.compute_features(sample[TOKENS], drop_out=True)
    --> 132 for sample in crf_samples]
        133 # ensure ascii tags
        134 Y = [[_encode_tag(tag) for tag in sample[TAGS]]

    ~\Anaconda3\lib\site-packages\snips_nlu\slot_filler\crf_slot_filler.py in compute_features(self, tokens, drop_out)
        200 if drop_out and random_state.rand() < f_drop_out:
        201 continue
    --> 202 value = feature.compute(i, cache)
        203 if value is not None:
        204 token_features[feature.name] = value

    ~\Anaconda3\lib\site-packages\snips_nlu\slot_filler\feature.py in compute(self, token_index, cache)
         57
         58 tokens = [c["token"] for c in cache]
    ---> 59 value = self.function(tokens, token_index + self.offset)
         60 cache[token_index + self.offset][self.base_name] = value
         61 return value

    ~\Anaconda3\lib\site-packages\snips_nlu\slot_filler\feature_factory.py in builtin_entity_match(tokens, token_index)
        494
        495 builtin_entities = get_builtin_entities(
    --> 496 text, self.language, scope=[builtin_entity], use_cache=True)
        497 builtin_entities = [ent for ent in builtin_entities
        498 if entity_filter(ent, start, end)]

    ~\Anaconda3\lib\site-packages\snips_nlu\builtin_entities.py in get_builtin_entities(text, language, scope, use_cache)
         44 def get_builtin_entities(text, language, scope=None, use_cache=True):
         45 parser = get_builtin_entity_parser(language)
    ---> 46 return parser.parse(text, scope=scope, use_cache=use_cache)
         47
         48

    ~\Anaconda3\lib\site-packages\snips_nlu\builtin_entities.py in parse(self, text, scope, use_cache)
         24 cache_key = (text, str(scope))
         25 if cache_key not in self._cache:
    ---> 26 parser_result = self.parser.parse(text, scope)
         27 self._cache[cache_key] = parser_result
         28 return self._cache[cache_key]

    ~\Anaconda3\lib\site-packages\snips_nlu_ontology\builtin_entities.py in parse(self, text, scope)
        157 with string_pointer(c_char_p()) as ptr:
        158 exit_code = lib.snips_nlu_ontology_extract_entities_json(
    --> 159 self._parser, text.encode("utf8"), scope, byref(ptr))
        160 if exit_code:
        161 raise ValueError("Something wrong happened while extracting "

    OSError: [WinError -1073741795] Windows Error 0xc000001d

    snips-nlu generate-dataset en intent_Lan.txt intent_ot.txt intent_reset.txt entity_LanID.txt entity_otp.txt entity_PasswordReset.txt > dataset.json

    My .txt files were: intent_ot.txt intent_reset.txt entity_ResetPassword.txt intent_Lan.txt entity_LanID.txt entity_otp.txt

    opened by rohan-dot 8
  • [INSTALL] Setup.py does not install enum34 module

    For some reason, enum34 is not installed when I run python setup.py install, even though it seems to be specified correctly in setup.py:

    setup(name="snips_nlu",
          version="0.0.1",
          description="",
          author="Clement Doumouro",
          author_email="[email protected]",
          url="",
          download_url="",
          license="MIT",
          install_requires=["enum34"],
          packages=["snips_nlu",
                    "snips_nlu.entity_extractor",
                    "snips_nlu.nlu_engine"],
          cmdclass={"install": SnipsNLUInstall},
          entry_points={},
          include_package_data=False,
          zip_safe=False)
    
    bug 
    opened by adrienball 8
  • Inconsistencies in intent classification

    Hi,

    I have been working with the sample dataset and sample code posted at https://snips-nlu.readthedocs.io/en/latest/quickstart.html.

    I have also added a new intent, "sampleTurnOffLight", to the same sample_dataset.json, as shown below:

    sample_dataset.json.zip

    For the text "turn lights in basement", I get a different classification every time. Note: I retrain (fit) every time before I call parse, and I expect it to behave consistently across retrains. Could you please confirm this behavior?

    Run 1:
    { "input": "turn lights in basement", "slots": [], "intent": { "intentName": "sampleTurnOffLight", "probability": 0.6660875805168223 } }

    Run 2:
    { "input": "turn lights in basement", "slots": [], "intent": { "intentName": "sampleTurnOnLight", "probability": 0.6430405901353275 } }

    question 
    opened by satnam2012 7
  • Is it possible to adapt the stemming and stop words file for a language?

    For my use case, I need to remove some words from the stop words file and add some to the stemming list. Changing them locally in the Python dependency works, but I need to share the change and don't want it to get overwritten.

    So, is there a way to import adapted language resource files?

    question 
    opened by Corasonn 0
  • snips-nlu not getting installed on docker.

    Describe the bug

    I am installing snips in Docker with step 1 given in the To Reproduce section, and I end up with a No module named 'distutils.msvccompiler' error. I am running everything on a Linux-based system. Is there an alternative way to install snips-nlu in Docker?

    To Reproduce

    1. Created the following installSnips.sh file to install Rust and setuptools-rust:
    pip3 install numpy
    pip3 install scipy
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
    pip3 install setuptools-rust
    source ~/.cargo/env
    pip3 install snips-nlu
    
    2. While running installSnips.sh, pip3 install snips-nlu fails with the following error:
     Building wheel for scikit-learn (setup.py) ... error
      error: subprocess-exited-with-error
      
      × python setup.py bdist_wheel did not run successfully.
      │ exit code: 1
      ╰─> [26 lines of output]
          Partial import of sklearn during the build process.
          /tmp/pip-install-h9opcxf9/scikit-learn_fc42ebd0e5804549ad6f611dced79620/setup.py:123: DeprecationWarning:
          
            `numpy.distutils` is deprecated since NumPy 1.23.0, as a result
            of the deprecation of `distutils` itself. It will be removed for
            Python >= 3.12. For older Python versions it will remain present.
            It is recommended to use `setuptools < 60.0` for those Python versions.
            For more details, see:
              https://numpy.org/devdocs/reference/distutils_status_migration.html
          
          
            from numpy.distutils.command.build_ext import build_ext  # noqa
          Traceback (most recent call last):
            File "<string>", line 2, in <module>
            File "<pip-setuptools-caller>", line 34, in <module>
            File "/tmp/pip-install-h9opcxf9/scikit-learn_fc42ebd0e5804549ad6f611dced79620/setup.py", line 303, in <module>
              setup_package()
            File "/tmp/pip-install-h9opcxf9/scikit-learn_fc42ebd0e5804549ad6f611dced79620/setup.py", line 295, in setup_package
              from numpy.distutils.core import setup
            File "/home/drjslab/.local/lib/python3.10/site-packages/numpy/distutils/core.py", line 24, in <module>
              from numpy.distutils.command import config, config_compiler, \
            File "/home/drjslab/.local/lib/python3.10/site-packages/numpy/distutils/command/config.py", line 19, in <module>
              from numpy.distutils.mingw32ccompiler import generate_manifest
            File "/home/drjslab/.local/lib/python3.10/site-packages/numpy/distutils/mingw32ccompiler.py", line 28, in <module>
              from distutils.msvccompiler import get_build_version as get_build_msvc_version
          ModuleNotFoundError: No module named 'distutils.msvccompiler'
          [end of output]
      
      note: This error originates from a subprocess, and is likely not a problem with pip.
      ERROR: Failed building wheel for scikit-learn
      Running setup.py clean for scikit-learn
    Failed to build scikit-learn
    Installing collected packages: requests, pyaml, packaging, scikit-learn, deprecation
      Attempting uninstall: scikit-learn
        Found existing installation: scikit-learn 1.1.2
        Uninstalling scikit-learn-1.1.2:
          Successfully uninstalled scikit-learn-1.1.2
      Running setup.py install for scikit-learn ... error
      error: subprocess-exited-with-error
      
      × Running setup.py install for scikit-learn did not run successfully.
      │ exit code: 1
      ╰─> [26 lines of output]
          Partial import of sklearn during the build process.
          /tmp/pip-install-h9opcxf9/scikit-learn_fc42ebd0e5804549ad6f611dced79620/setup.py:123: DeprecationWarning:
          
            `numpy.distutils` is deprecated since NumPy 1.23.0, as a result
            of the deprecation of `distutils` itself. It will be removed for
            Python >= 3.12. For older Python versions it will remain present.
            It is recommended to use `setuptools < 60.0` for those Python versions.
            For more details, see:
              https://numpy.org/devdocs/reference/distutils_status_migration.html
          
          
            from numpy.distutils.command.build_ext import build_ext  # noqa
          Traceback (most recent call last):
            File "<string>", line 2, in <module>
            File "<pip-setuptools-caller>", line 34, in <module>
            File "/tmp/pip-install-h9opcxf9/scikit-learn_fc42ebd0e5804549ad6f611dced79620/setup.py", line 303, in <module>
              setup_package()
            File "/tmp/pip-install-h9opcxf9/scikit-learn_fc42ebd0e5804549ad6f611dced79620/setup.py", line 295, in setup_package
              from numpy.distutils.core import setup
            File "/home/drjslab/.local/lib/python3.10/site-packages/numpy/distutils/core.py", line 24, in <module>
              from numpy.distutils.command import config, config_compiler, \
            File "/home/drjslab/.local/lib/python3.10/site-packages/numpy/distutils/command/config.py", line 19, in <module>
              from numpy.distutils.mingw32ccompiler import generate_manifest
            File "/home/drjslab/.local/lib/python3.10/site-packages/numpy/distutils/mingw32ccompiler.py", line 28, in <module>
              from distutils.msvccompiler import get_build_version as get_build_msvc_version
          ModuleNotFoundError: No module named 'distutils.msvccompiler'
          [end of output]
      
      note: This error originates from a subprocess, and is likely not a problem with pip.
      Rolling back uninstall of scikit-learn
      Moving to /home/drjslab/.local/lib/python3.10/site-packages/scikit_learn-1.1.2.dist-info/
       from /home/drjslab/.local/lib/python3.10/site-packages/~cikit_learn-1.1.2.dist-info
      Moving to /home/drjslab/.local/lib/python3.10/site-packages/scikit_learn.libs/
       from /home/drjslab/.local/lib/python3.10/site-packages/~cikit_learn.libs
      Moving to /home/drjslab/.local/lib/python3.10/site-packages/sklearn/
       from /home/drjslab/.local/lib/python3.10/site-packages/~klearn
    error: legacy-install-failure
    
    × Encountered error while trying to install package.
    ╰─> scikit-learn
    
    note: This is an issue with the package mentioned above, not pip.
    

    Environment:

    • Base OS: Ubuntu 20.04
    • Base Python version: 3.8
    • snips-nlu version: Latest
    • Docker OS: Ubuntu 20.04
    • Docker Python: 3.10.4
    bug 
    opened by jig4physics 0
  • SSLError

    SSLError "Bad Handshake" error during python -m snips_nlu download-language-entities en

    The bug

    I seem to be getting a bad handshake error while trying to download the built-in entities: SLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)"),))

    To Reproduce

    The error first comes up when trying to parse input: "FileNotFoundError: No data found for the 'snips/city' builtin entity in language 'en'. You must download the corresponding resources by running 'python -m snips_nlu download-entity snips/city en' before you can use this built-in entity."

    Then, when I try to download the entity data, the SSL bad handshake error occurs.

    If successful, the command should download and link the built-in entities, but I can find no documentation about it and very little help online.

    Environment:

    • OS: Mac OSX
    • Python version: 3.6.10 :: Anaconda, Inc.
    • snips-nlu version: 0.20.2
    bug 
    opened by SamChadri 4
  • Problem with Installation on Windows

    Hi, people!

    I have python 3.8.6 and pip 22.0 installed on my Windows 10 machine. When I try to install Snips NLU via pip with command pip install snips-nlu, the following error occurs:

      Running `C:\Users\diego\AppData\Local\Temp\pip-install-krsta91s\snips-nlu-parsers_ccf393e870dc4dd38853696b80385053\ffi\target\release\build\rustling-ontology-514a7dc119c55141\build-script-build`
      error: failed to run custom build command for `rustling-ontology v0.19.3 (https://github.com/snipsco/rustling-ontology?tag=0.19.3#3bb1313d)`
    
      Caused by:
        process didn't exit successfully: `C:\Users\diego\AppData\Local\Temp\pip-install-krsta91s\snips-nlu-parsers_ccf393e870dc4dd38853696b80385053\ffi\target\release\build\rustling-ontology-514a7dc119c55141\build-script-build` (exit code: 101)
        --- stdout
        cargo:rerun-if-changed=grammar/de/src/
    
        --- stderr
        thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ErrorMessage { msg: "example: \"letzten 2 jahren\" matched no rule" }', C:\Users\diego\.cargo\git\checkouts\rustling-ontology-a5f364cfd4d376e4\3bb1313\build.rs:45:86
        note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
      error: cargo failed with code: 101
    
      [end of output]
    

    Can anyone help me with this? Thanks in advance!

    question 
    opened by diegostefano 0
  • Configuring the probabilistic parser to ignore stop words?

    Question: Hi friends, in the deterministic and lookup intent parsers we can configure the parser to ignore stop words. Is it possible to do the same for the probabilistic parser? Regards, Hicham

    question 
    opened by hicham17 0
Releases(0.20.2)
  • 0.20.2(Jan 15, 2020)

  • 0.20.1(Sep 5, 2019)

    Added

    • Allow bypassing the model version check #830
    • Persist CustomEntityParser license when needed #832
    • Document metrics CLI #839
    • Allow fitting SnipsNLUEngine with a Dataset object #840

    Changed

    • Update snips-nlu-parsers dependency upper bound to 0.5 #850

    Fixed

    • Invalidate importlib caches after dynamically installing module #838
    • Automatically generate documentation for supported languages and builtin entities #841
    • Fix issue when cleaning up crfsuite files #843
    • Fix filemode of persisted crfsuite files #844
  • 0.20.0(Jul 16, 2019)

    Added

    • Add new intent parser: LookupIntentParser #759

    Changed

    • Replace DeterministicIntentParser by LookupIntentParser in default configs #829
    • Bumped snips-nlu-parsers to 0.3.x, introducing new builtin entities:
      • snips/time
      • snips/timePeriod
      • snips/date
      • snips/datePeriod
      • snips/city
      • snips/country
      • snips/region
  • 0.19.8(Jul 10, 2019)

    Added

    • Add filter for entity match feature #814
    • Add noise re-weight factor in LogRegIntentClassifier #815
    • Add warning logs and improve errors #821
    • Add random seed parameter in training CLI #819

    Fixed

    • Fix non-deterministic behavior #817
    • Import modules lazily to speed up CLI startup time #819
    • Removed dependency on semantic_version to accept "subpatch" version numbers #825
  • 0.19.7.1(Jul 10, 2019)

  • 0.19.7(Jun 20, 2019)

    Changed

    • Re-score ambiguous DeterministicIntentParser results based on slots #791
    • Accept ambiguous results from DeterministicIntentParser when confidence score is above 0.5 #797
    • Avoid generating number variations when not needed #799
    • Moved the NLU random state from the config to the shared resources #801
    • Reduce custom entity parser footprint in training time #804
    • Bumped scikit-learn to >=0.21,<0.22 for python>=3.5 and >=0.20,<0.21 for python<3.5 #801
    • Update dependencies #811
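The scikit-learn constraint above is easier to read as a per-interpreter range. A minimal sketch that encodes only the two ranges stated in this entry (the helper names `sklearn_pin` and `satisfies_pin` are hypothetical, not part of Snips NLU):

```python
from typing import Sequence, Tuple

Version = Tuple[int, ...]


def sklearn_pin(python_version: Sequence[int]) -> Tuple[Version, Version]:
    """Return (inclusive lower, exclusive upper) scikit-learn bounds
    as described in the 0.19.7 notes for the given Python version."""
    if tuple(python_version[:2]) >= (3, 5):
        return (0, 21), (0, 22)
    return (0, 20), (0, 21)


def satisfies_pin(sklearn_version: Sequence[int], python_version: Sequence[int]) -> bool:
    """True if the scikit-learn (major, minor) falls inside the pinned range."""
    lower, upper = sklearn_pin(python_version)
    major_minor = tuple(sklearn_version[:2])
    return lower <= major_minor < upper


print(sklearn_pin((3, 8)))                # ((0, 21), (0, 22))
print(satisfies_pin((0, 21, 3), (3, 8)))  # True
print(satisfies_pin((0, 22, 0), (3, 8)))  # False
print(satisfies_pin((0, 20, 4), (2, 7)))  # True
```

Comparing `(major, minor)` integer tuples rather than version strings avoids ordering mistakes such as `"0.9" > "0.21"` in plain string comparison. The sketch does not check whether a wheel of that scikit-learn version exists for a given interpreter.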

    Fixed

    • Fixed a couple of bugs in the data augmentation which were making the NLU training non-deterministic #801
    • Remove deprecated code in dataset generation #803
    • Fix possible override of entity values when generating variations #808
  • 0.19.6(Apr 26, 2019)

  • 0.19.5(Apr 10, 2019)

    Added

    • Advanced inference logging in the CRFSlotFiller #776
    • Improved failed linking error message after download of resources #774
    • Improve handling of ambiguous utterances in DeterministicIntentParser #773

    Changed

    • Remove normalization of confidence scores in intent classification #782

    Fixed

    • Fixed a crash due to missing resources when refitting the CRFSlotFiller #771
    • Fixed issue with egg fragments in download cli #769
    • Fixed an issue causing the None intent to be ignored when using the parse API in conjunction with intents and top_n #781
  • 0.19.4(Mar 6, 2019)

    Added

    • Support for Portuguese: "pt_pt" and "pt_br"

    Changed

    • Enhancement: leverage entity scopes of each intent in deterministic intent parser
  • 0.19.3(Mar 5, 2019)

    Fixed

    • Issue with intent classification reducing classification accuracy
    • Issue resulting in a mutation of the CRFSlotFillerConfig
    • Wrong required resources of the DeterministicIntentParser
    • Issue with non-ASCII characters when using the parsing CLI with Python 2
  • 0.19.2(Feb 11, 2019)

  • 0.19.1(Feb 4, 2019)

  • 0.19.0(Feb 4, 2019)

    Added

    • Support for Python 3.7
    • get_intents(text) API in SnipsNLUEngine to get the probabilities of all the intents
    • get_slots(text, intent) API in SnipsNLUEngine to extract slots when the intent is known
    • The DeterministicIntentParser can now ignore stop words through the new ignore_stop_words configuration parameter
    • Co-occurrence features can now be used in the LogRegIntentClassifier
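As an illustration of the new `get_intents` API, here is a sketch that picks the most probable intent from its result. It assumes each entry has the `intentName` and `probability` keys used in the parse output shown at the top of this page. The threshold and the helper `best_intent` are hypothetical application-level choices. The chosen name could then be passed to `get_slots(text, intent)`.

```python
from typing import Any, Dict, List, Optional

# Assumed entry shape: {"intentName": str or None, "probability": float}
IntentResult = Dict[str, Any]


def best_intent(results: List[IntentResult], min_probability: float = 0.5) -> Optional[str]:
    """Return the most probable intent name, or None if nothing clears the bar.

    min_probability is a hypothetical application threshold, not a Snips NLU setting.
    """
    best = max(results, key=lambda r: r["probability"], default=None)
    if best is None or best["probability"] < min_probability:
        return None
    return best["intentName"]


# Hypothetical output of engine.get_intents("What will be the weather in paris at 9pm?")
results = [
    {"intentName": "searchWeatherForecast", "probability": 0.95},
    {"intentName": None, "probability": 0.03},
]
print(best_intent(results))        # searchWeatherForecast
print(best_intent(results, 0.99))  # None
```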

    Changed

    • The None intent is now handled as a regular intent in the parsing output, which means that:
    {
        "input": "foo bar",
        "intent": None,
        "slots": None
    }
    

    is replaced with:

    {
        "input": "foo bar",
        "intent": {
            "intentName": None,
            "probability": 0.552122
        },
        "slots": []
    }
    
    • Patterns of the DeterministicIntentParser are now deduplicated across intents in order to reduce ambiguity
    • Improve the use of custom ProcessingUnit through the use of Registrable pattern
    • Improve the use of default processing unit configurations
    • Improve logging
    • Replace snips-nlu-ontology with snips-nlu-parsers
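Client code that has to handle both output shapes shown above can normalize them. A minimal sketch (the helpers `intent_name` and `slots` are hypothetical, not Snips NLU functions):

```python
from typing import Any, Dict, List, Optional


def intent_name(result: Dict[str, Any]) -> Optional[str]:
    """Return the intent name, or None when the None intent was detected.

    Accepts both the pre-0.19.0 shape ("intent": None) and the 0.19.0
    shape ("intent": {"intentName": None, ...}).
    """
    intent = result.get("intent")
    if intent is None:
        return None
    return intent.get("intentName")


def slots(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the slot list; the older shape used None instead of []."""
    return result.get("slots") or []


old_style = {"input": "foo bar", "intent": None, "slots": None}
new_style = {
    "input": "foo bar",
    "intent": {"intentName": None, "probability": 0.552122},
    "slots": [],
}
for result in (old_style, new_style):
    print(intent_name(result), slots(result))  # None []
```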

    Fixed

    • Issue when persisting resources
    • Issue when resolving custom entities
    • Issue with whitespaces when generating dataset from YAML and text files
    • Issue with unicode when using the CLI (Python 2)
  • 0.18.0(Nov 26, 2018)

  • 0.17.4(Nov 20, 2018)

    Added

    • Add a --config argument in the metrics CLI

    Changed

    • Replace "parser_threshold" by "matching_strictness" in dataset format
    • Optimize loading and inference runtime
    • Disable stemming for intent classification in default configs
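For datasets written before this change, here is a sketch that renames the entity key. It assumes the old `parser_threshold` value carries over unchanged and uses 1.0 when neither key is present. Both are assumptions. The other keys follow the Snips JSON dataset format for custom entities, and the helper `migrate_entity` is hypothetical.

```python
import copy
from typing import Any, Dict


def migrate_entity(entity: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a custom entity that uses "matching_strictness".

    Assumption: the old "parser_threshold" value carries over unchanged,
    and 1.0 is used when neither key is present.
    """
    migrated = copy.deepcopy(entity)  # leave the caller's dataset untouched
    old_value = migrated.pop("parser_threshold", None)
    if "matching_strictness" not in migrated:
        migrated["matching_strictness"] = 1.0 if old_value is None else old_value
    return migrated


legacy = {
    "data": [{"value": "paris", "synonyms": []}],
    "use_synonyms": True,
    "automatically_extensible": True,
    "parser_threshold": 0.8,
}
print(migrate_entity(legacy)["matching_strictness"])  # 0.8
```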
  • 0.17.3(Oct 18, 2018)

  • 0.17.2(Oct 17, 2018)

  • 0.17.1(Oct 9, 2018)

  • 0.17.0(Oct 5, 2018)

    Added

    • Support for 3 new builtin entities in French: snips/musicAlbum, snips/musicArtist and snips/musicTrack
    • Minimal support for Italian

    Changed

    • model version 0.16.0 => 0.17.0

    Fixed

    • Bug with entity feature name in intent classification
  • 0.16.5(Sep 17, 2018)

  • 0.16.4(Sep 17, 2018)

  • 0.16.3(Aug 22, 2018)

  • 0.16.2(Aug 8, 2018)

    Added

    • automatically_extensible flag in dataset generation tool
    • System requirements
    • Reference to chatito tool in documentation

    Changed

    • Bump snips-nlu-ontology to 0.57.3
    • Versions of dependencies are now defined more loosely

    Fixed

    • Issue with synonyms mapping
    • Issue with snips-nlu download-all-languages CLI command
  • 0.16.1(Jul 23, 2018)

  • 0.16.0(Jul 17, 2018)

    Changed

    • The SnipsNLUEngine object is now persisted to (and loaded from) a directory, instead of a single json file.
    • The language resources are now persisted along with the SnipsNLUEngine, removing the need to download and load the resources when loading a trained engine.
    • The format of language resources has been optimized.

    Added

    • Stemmed gazetteers, computed beforehand. It removes the need to stem gazetteers on the fly.
    • API to persist (and load) a SnipsNLUEngine object as a bytearray

    Fixed

    • Issue in the DeterministicIntentParser when the same slot name was used in multiple intents while referring to different entities
  • 0.15.1(Jul 9, 2018)

  • 0.15.0(Jun 21, 2018)

    Changed

    • Language resources are now packaged separately from the Snips NLU core library, and can be fetched using snips-nlu download <language>.
    • The CLI tool now consists of a single entry point, snips-nlu, which exposes several commands.

    Added

    • CLI command to parse a query
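Here is a minimal sketch of building these CLI calls from Python. It covers `download <language>` from this release and, for a gazetteer entity such as snips/city, the `download-entity` command quoted in the SSLError issue above. Running them through `sys.executable -m snips_nlu` targets the same interpreter that will import the library. The helper names are hypothetical.

```python
import subprocess
import sys
from typing import List


def download_language_cmd(language: str) -> List[str]:
    """Build `python -m snips_nlu download <language>` for the current interpreter."""
    return [sys.executable, "-m", "snips_nlu", "download", language]


def download_entity_cmd(entity: str, language: str) -> List[str]:
    """Build `python -m snips_nlu download-entity <entity> <language>`."""
    return [sys.executable, "-m", "snips_nlu", "download-entity", entity, language]


def run(cmd: List[str]) -> None:
    """Execute a command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)


# Example (requires snips-nlu and network access, so it is left commented out):
# run(download_language_cmd("en"))
# run(download_entity_cmd("snips/city", "en"))
print(download_entity_cmd("snips/city", "en")[1:])
# ['-m', 'snips_nlu', 'download-entity', 'snips/city', 'en']
```

Using `check=True` makes a command that exits with a non-zero status raise an exception instead of failing silently.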
  • 0.14.0(Jun 8, 2018)

    Fixed

    • Issue due to caching of builtin entities at inference time

    Changed

    • Improve builtin entities handling during intent classification
    • Improve builtin entities handling in DeterministicIntentParser
    • Reduce size of regex patterns in trained model file
    • Update model version to 0.15.0
  • 0.13.5(May 23, 2018)

    Fixed

    • Fixed synonyms matching by using the normalized version of the tagged value
    • Fixed dataset augmentation by keeping stripped values of entities
    • Fixed the string variation functions so that they no longer generate too many variations
  • 0.13.4(May 18, 2018)

    Added

    • Documentation for the None intent

    Changed

    • Improve calibration of intent classification probabilities
    • Update snips-nlu-ontology version to 0.55.0

    Fixed

    • DeterministicIntentParser: Fix bug when deduplicating regexes
    • DeterministicIntentParser: Fix issue with incorrect ranges when parsing sentences with both builtin and custom slots
    • DeterministicIntentParser: Fix issue with builtin entities placeholders causing mismatches
    • Fix issue with engine-inference CLI script not loading resources correctly
Owner
Snips
We make technology disappear
A library for Multilingual Unsupervised or Supervised word Embeddings

MUSE: Multilingual Unsupervised and Supervised Embeddings MUSE is a Python library for multilingual word embeddings, whose goal is to provide the comm

Facebook Research 3k Jan 06, 2023
Japanese synonym library

chikkarpy chikkarpy is a Python version of chikkar. chikkarpy is a library developed to add synonym expansion to the output of SudachiPy, using the Sudachi synonym dictionary.

Works Applications 48 Dec 14, 2022
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism This repository is the official PyTorch implementation of our AAAI-2022 paper, in

Jinglin Liu 829 Jan 07, 2023
Code for the paper: Sequence-to-Sequence Learning with Latent Neural Grammars

Yoon Kim 43 Dec 23, 2022
Generate custom detailed survey paper with topic clustered sections and proper citations, from just a single query in just under 30 mins !!

Auto-Research A no-code utility to generate a detailed well-cited survey with topic clustered sections (draft paper format) and other interesting arti

Sidharth Pal 20 Dec 14, 2022
Official implementation of MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis

MLP Singer Official implementation of MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis. Audio samples are available on our demo page.

Neosapience 103 Dec 23, 2022
The following links explain a bit the idea of semantic search and how search mechanisms work by doing retrieve and rerank

Main Idea The following links explain a bit the idea of semantic search and how search mechanisms work by doing retrieve and rerank Semantic Search Re

Sergio Arnaud Gomez 2 Jan 28, 2022
Turkish Stop Words Türkçe Dolgu Sözcükleri

trstop Turkish Stop Words Türkçe Dolgu Sözcükleri In this repository I put Turkish stop words that are contained in the first 10 thousand words with th

Ahmet Aksoy 103 Nov 12, 2022
Auto-researching tool generating word documents.

About ResearchTE automates researching by generating document with answers to given questions. Supports getting results from: Google DuckDuckGo (with

1 Feb 14, 2022
State of the Art Natural Language Processing

Spark NLP: State of the Art Natural Language Processing Spark NLP is a Natural Language Processing library built on top of Apache Spark ML. It provide

John Snow Labs 3k Jan 05, 2023
Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further languages

Coreferee Author: Richard Paul Hudson, Explosion AI 1. Introduction 1.1 The basic idea 1.2 Getting started 1.2.1 English 1.2.2 French 1.2.3 German 1.2

Explosion 70 Dec 12, 2022
nlpcommon is a python Open Source Toolkit for text classification.

nlpcommon nlpcommon, Python Text Tool. Guide Feature Install Usage Dataset Contact Cite Reference Feature nlpcommon is a python Open Source

xuming 3 May 29, 2022
Materials (slides, code, assignments) for the NYU class I teach on NLP and ML Systems (Master of Engineering).

FREE_7773 Repo containing material for the NYU class (Master of Engineering) I teach on NLP, ML Sys etc. For context on what the class is trying to ac

Jacopo Tagliabue 90 Dec 19, 2022
Source code for AAAI20 "Generating Persona Consistent Dialogues by Exploiting Natural Language Inference".

Generating Persona Consistent Dialogues by Exploiting Natural Language Inference Source code for RCDG model in AAAI20 Generating Persona Consistent Di

16 Oct 08, 2022
Code and dataset for the EMNLP 2021 Finding paper "Can NLI Models Verify QA Systems’ Predictions?"

Jifan Chen 22 Oct 21, 2022
A fast hierarchical dimensionality reduction algorithm.

h-NNE: Hierarchical Nearest Neighbor Embedding A fast hierarchical dimensionality reduction algorithm. h-NNE is a general purpose dimensionality reduc

Marios Koulakis 35 Dec 12, 2022
Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers.

Linear Transformers Are Secretly Fast Weight Programmers This repository contains the code accompanying the paper Linear Transformers Are Secretly Fas

Imanol Schlag 77 Dec 19, 2022
KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)

KoGPT KoGPT (Korean Generative Pre-trained Transformer) https://github.com/kakaobrain/kogpt https://huggingface.co/kakaobrain/kogpt Model Descriptions

Kakao Brain 797 Dec 26, 2022
⚖️ A Statutory Article Retrieval Dataset in French.

A Statutory Article Retrieval Dataset in French This repository contains the Belgian Statutory Article Retrieval Dataset (BSARD), as well as the code

Maastricht Law & Tech Lab 19 Nov 17, 2022
Python SDK for working with Voicegain Speech-to-Text

Voicegain Speech-to-Text Python SDK Python SDK for the Voicegain Speech-to-Text API. This API allows for large vocabulary speech-to-text transcription

Voicegain 3 Dec 14, 2022