Speech recognition module for Python, supporting several engines and APIs, online and offline.

Overview

SpeechRecognition

Library for performing speech recognition, with support for several engines and APIs, online and offline.

Speech recognition engine/API support:

Quickstart: pip install SpeechRecognition. See the "Installing" section for more details.

To quickly try it out, run python -m speech_recognition after installing.

Project links:

Library Reference

The library reference documents every publicly accessible object in the library. This document is also included under reference/library-reference.rst.

See Notes on using PocketSphinx for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under reference/pocketsphinx.rst.

Examples

See the examples/ directory in the repository root for usage examples.

Installing

First, make sure you have all the requirements listed in the "Requirements" section.

The easiest way to install this is using pip install SpeechRecognition.

Otherwise, download the source distribution from PyPI, and extract the archive.

In the folder, run python setup.py install.

Requirements

To use all of the functionality of the library, you should have:

  • Python 2.6, 2.7, or 3.3+ (required)
  • PyAudio 0.2.11+ (required only if you need to use microphone input, Microphone)
  • PocketSphinx (required only if you need to use the Sphinx recognizer, recognizer_instance.recognize_sphinx)
  • Google API Client Library for Python (required only if you need to use the Google Cloud Speech API, recognizer_instance.recognize_google_cloud)
  • FLAC encoder (required only if the system is not x86-based Windows/Linux/OS X)

The following requirements are optional, but can improve or extend functionality in some situations:

  • On Python 2, and only on Python 2, some functions (like recognizer_instance.recognize_bing) will run slower if you do not have Monotonic for Python 2 installed.
  • If using CMU Sphinx, you may want to install additional language packs to support languages like International French or Mandarin Chinese.

The following sections go over the details of each requirement.

Python

The first software requirement is Python 2.6, 2.7, or Python 3.3+. This is required to use the library.

PyAudio (for microphone users)

PyAudio is required if and only if you want to use microphone input (Microphone). PyAudio version 0.2.11+ is required, as earlier versions have known memory management bugs when recording from microphones in certain situations.

If not installed, everything in the library will still work, except attempting to instantiate a Microphone object will raise an AttributeError.
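
A quick pre-flight check can tell you whether microphone input is likely to work before you construct a Microphone. The following is only a sketch and is not part of SpeechRecognition: version_tuple and pyaudio_is_usable are hypothetical helper names, and the check assumes the installed PyAudio module exposes a __version__ string.

```python
import importlib
import importlib.util

# Minimum PyAudio version named in the requirements above.
MINIMUM_PYAUDIO = (0, 2, 11)


def version_tuple(version):
    """Turn a dotted version such as "0.2.11" into a tuple of ints."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def pyaudio_is_usable(module_name="pyaudio", minimum=MINIMUM_PYAUDIO):
    """Return True if the module can be imported and is new enough."""
    if importlib.util.find_spec(module_name) is None:
        return False  # not installed at all
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False  # installed, but broken (for example a bad C extension)
    # Assumption: the module reports its version as a dotted string.
    return version_tuple(getattr(module, "__version__", "0")) >= minimum


if __name__ == "__main__":
    print("Microphone input available:", pyaudio_is_usable())
```

If this reports False, fix the PyAudio installation first; the rest of the library does not depend on it.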

The installation instructions on the PyAudio website are quite good - for convenience, they are summarized below:

  • On Windows, install PyAudio using Pip: execute pip install pyaudio in a terminal.
  • On Debian-derived Linux distributions (like Ubuntu and Mint), install PyAudio using APT: execute sudo apt-get install python-pyaudio python3-pyaudio in a terminal.
    • If the version in the repositories is too old, install the latest release using Pip: execute sudo apt-get install portaudio19-dev python-all-dev python3-all-dev && sudo pip install pyaudio (replace pip with pip3 if using Python 3).
  • On OS X, install PortAudio using Homebrew: brew install portaudio. Then, install PyAudio using Pip: pip install pyaudio.
  • On other POSIX-based systems, install the portaudio19-dev and python-all-dev (or python3-all-dev if using Python 3) packages (or their closest equivalents) using a package manager of your choice, and then install PyAudio using Pip: pip install pyaudio (replace pip with pip3 if using Python 3).

PyAudio wheel packages for common 64-bit Python versions on Windows and Linux are included for convenience, under the third-party/ directory in the repository root. To install, simply run pip install wheel followed by pip install ./third-party/WHEEL_FILENAME (replace pip with pip3 if using Python 3) in the repository root directory.

PocketSphinx-Python (for Sphinx users)

PocketSphinx-Python is required if and only if you want to use the Sphinx recognizer (recognizer_instance.recognize_sphinx).

PocketSphinx-Python wheel packages for 64-bit Python 2.7, 3.4, and 3.5 on Windows are included for convenience, under the third-party/ directory. To install, simply run pip install wheel followed by pip install ./third-party/WHEEL_FILENAME (replace pip with pip3 if using Python 3) in the SpeechRecognition folder.

On Linux and other POSIX systems (such as OS X), follow the instructions under "Building PocketSphinx-Python from source" in Notes on using PocketSphinx for installation instructions.

Note that the versions available in most package repositories are outdated and will not work with the bundled language data. Using the bundled wheel packages or building from source is recommended.

See Notes on using PocketSphinx for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under reference/pocketsphinx.rst.

Google Cloud Speech Library for Python (for Google Cloud Speech API users)

Google Cloud Speech library for Python is required if and only if you want to use the Google Cloud Speech API (recognizer_instance.recognize_google_cloud).

If not installed, everything in the library will still work, except calling recognizer_instance.recognize_google_cloud will raise a RequestError.

According to the official installation instructions, the recommended way to install this is using Pip: execute pip install google-cloud-speech (replace pip with pip3 if using Python 3).

FLAC (for some systems)

A FLAC encoder is required to encode the audio data to send to the API. If using Windows (x86 or x86-64), OS X (Intel Macs only, OS X 10.6 or higher), or Linux (x86 or x86-64), this is already bundled with this library - you do not need to install anything.

Otherwise, ensure that you have the flac command line tool, which is often available through the system package manager. For example, this would usually be sudo apt-get install flac on Debian-derivatives, or brew install flac on OS X with Homebrew.
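
To check from Python whether a flac executable is on the search path, something like the following can be used. This is a minimal sketch using only the standard library; find_flac is a hypothetical helper name, not a function provided by SpeechRecognition.

```python
import shutil


def find_flac(candidates=("flac",)):
    """Return the full path of the first candidate command found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    location = find_flac()
    if location is None:
        print("No flac command found on the search path")
    else:
        print("Found flac at", location)
```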

Monotonic for Python 2 (for faster operations in some functions on Python 2)

On Python 2, and only on Python 2, if you do not install the Monotonic for Python 2 library, some functions will run slower than they otherwise could (though everything will still work correctly).

On Python 3, that library's functionality is built into the Python standard library, which makes it unnecessary.

This is because monotonic time is necessary to handle cache expiry properly in the face of system time changes and other time-related issues. If monotonic time functionality is not available, then things like access token requests will not be cached.

To install, use Pip: execute pip install monotonic in a terminal.
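
To see why a monotonic clock matters for cache expiry, consider the sketch below. It is an illustration only, not the library's implementation: ExpiringValue is a hypothetical class, and it uses time.monotonic from the Python 3 standard library. Because the monotonic clock never jumps when the system time is changed, a cached value cannot expire early or live too long because of a clock adjustment.

```python
import time


class ExpiringValue:
    """Hold one value that becomes unavailable after a fixed lifetime."""

    def __init__(self, lifetime_seconds, clock=time.monotonic):
        self._lifetime = lifetime_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._value = None
        self._expires_at = None

    def set(self, value):
        """Store a value and start its lifetime from the current clock reading."""
        self._value = value
        self._expires_at = self._clock() + self._lifetime

    def get(self):
        """Return the stored value, or None if nothing is stored or it has expired."""
        if self._expires_at is None or self._clock() >= self._expires_at:
            return None
        return self._value


if __name__ == "__main__":
    token_cache = ExpiringValue(lifetime_seconds=600)
    token_cache.set("example-access-token")
    print(token_cache.get())
```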

Troubleshooting

The recognizer tries to recognize speech even when I'm not speaking, or after I'm done speaking.

Try increasing the recognizer_instance.energy_threshold property. This is basically how sensitive the recognizer is to when recognition should start. Higher values mean that it will be less sensitive, which is useful if you are in a loud room.

This value depends entirely on your microphone or audio data. There is no one-size-fits-all value, but good values typically range from 50 to 4000.

Also, check on your microphone volume settings. If it is too sensitive, the microphone may be picking up a lot of ambient noise. If it is too insensitive, the microphone may be rejecting speech as just noise.
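
The idea behind an energy threshold can be sketched in a few lines. The example below is a simplified illustration and not the library's own code: rms_energy and is_speech are hypothetical names, the audio is assumed to be 16-bit signed PCM in native byte order, and the value 300 is only a placeholder.

```python
import array
import math


def rms_energy(frame_bytes):
    """Root-mean-square amplitude of 16-bit signed PCM audio (native byte order)."""
    samples = array.array("h")
    samples.frombytes(frame_bytes)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(frame_bytes, energy_threshold=300):
    """Treat a frame as speech only if its energy exceeds the threshold."""
    return rms_energy(frame_bytes) > energy_threshold


if __name__ == "__main__":
    quiet = array.array("h", [10, -10] * 100).tobytes()
    loud = array.array("h", [1000, -1000] * 100).tobytes()
    print(is_speech(quiet), is_speech(loud))
```

Raising the threshold makes the check stricter, which is why a higher value helps in a loud room.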

The recognizer can't recognize speech right after it starts listening for the first time.

The recognizer_instance.energy_threshold property is probably set to a value that is too high to start off with, and then being adjusted lower automatically by dynamic energy threshold adjustment. Before it is at a good level, the energy threshold is so high that speech is just considered ambient noise.

The solution is to decrease this threshold, or call recognizer_instance.adjust_for_ambient_noise beforehand, which will set the threshold to a good value automatically.
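
As a sketch, calibration before listening might look like the following. The calibrate helper is hypothetical; it assumes a recognizer object with the adjust_for_ambient_noise method and energy_threshold property named above, and that the method accepts a duration argument in seconds.

```python
def calibrate(recognizer, source, seconds=1.0):
    """Sample ambient noise from an open audio source and return the new threshold."""
    recognizer.adjust_for_ambient_noise(source, duration=seconds)
    return recognizer.energy_threshold


# Typical use with a real microphone (requires SpeechRecognition and PyAudio):
#
#     import speech_recognition as sr
#     r = sr.Recognizer()
#     with sr.Microphone() as source:
#         print("Threshold set to", calibrate(r, source))
#         audio = r.listen(source)
```

Stay silent while the calibration runs, so that only ambient noise is measured.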

The recognizer doesn't understand my particular language/dialect.

Try setting the recognition language to your language/dialect. To do this, see the documentation for recognizer_instance.recognize_sphinx, recognizer_instance.recognize_google, recognizer_instance.recognize_wit, recognizer_instance.recognize_bing, recognizer_instance.recognize_api, recognizer_instance.recognize_houndify, and recognizer_instance.recognize_ibm.

For example, if your language/dialect is British English, it is better to use "en-GB" as the language rather than "en-US".
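
A minimal sketch of passing a language tag, assuming a recognizer whose recognize_google method accepts a language keyword argument (recognize_in_language is a hypothetical wrapper, not a library function):

```python
def recognize_in_language(recognizer, audio, language="en-GB"):
    """Ask the Google recognizer to transcribe audio using the given language tag."""
    return recognizer.recognize_google(audio, language=language)


# Typical use (requires SpeechRecognition; "speech.wav" is a placeholder file name):
#
#     import speech_recognition as sr
#     r = sr.Recognizer()
#     with sr.AudioFile("speech.wav") as source:
#         audio = r.record(source)
#     print(recognize_in_language(r, audio, language="en-GB"))
```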

The recognizer hangs on recognizer_instance.listen; specifically, when it's calling Microphone.MicrophoneStream.read.

This usually happens when you're using a Raspberry Pi board, which doesn't have audio input capabilities by itself. This causes the default microphone used by PyAudio to simply block when we try to read it. If you happen to be using a Raspberry Pi, you'll need a USB sound card (or USB microphone).

Once you do this, change all instances of Microphone() to Microphone(device_index=MICROPHONE_INDEX), where MICROPHONE_INDEX is the hardware-specific index of the microphone.

To figure out what the value of MICROPHONE_INDEX should be, run the following code:

import speech_recognition as sr
for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print("Microphone with name \"{1}\" found for `Microphone(device_index={0})`".format(index, name))

This will print out something like the following:

Microphone with name "HDA Intel HDMI: 0 (hw:0,3)" found for `Microphone(device_index=0)`
Microphone with name "HDA Intel HDMI: 1 (hw:0,7)" found for `Microphone(device_index=1)`
Microphone with name "HDA Intel HDMI: 2 (hw:0,8)" found for `Microphone(device_index=2)`
Microphone with name "Blue Snowball: USB Audio (hw:1,0)" found for `Microphone(device_index=3)`
Microphone with name "hdmi" found for `Microphone(device_index=4)`
Microphone with name "pulse" found for `Microphone(device_index=5)`
Microphone with name "default" found for `Microphone(device_index=6)`

Now, to use the Snowball microphone, you would change Microphone() to Microphone(device_index=3).
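
Because device indexes can change between machines, it may be more robust to look the index up by name. The helper below is a hypothetical sketch, not part of the library; it works on any list of names, such as the one returned by sr.Microphone.list_microphone_names().

```python
def find_microphone_index(names, keyword):
    """Return the index of the first device whose name contains keyword, or None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if keyword in name.lower():
            return index
    return None


if __name__ == "__main__":
    # Device names copied from the sample output above.
    names = [
        "HDA Intel HDMI: 0 (hw:0,3)",
        "HDA Intel HDMI: 1 (hw:0,7)",
        "HDA Intel HDMI: 2 (hw:0,8)",
        "Blue Snowball: USB Audio (hw:1,0)",
        "hdmi",
        "pulse",
        "default",
    ]
    print(find_microphone_index(names, "snowball"))
```

The result can then be passed as Microphone(device_index=...), after checking that it is not None.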

Calling Microphone() gives the error IOError: No Default Input Device Available.

As the error says, the program doesn't know which microphone to use.

To proceed, either use Microphone(device_index=MICROPHONE_INDEX, ...) instead of Microphone(...), or set a default microphone in your OS. You can obtain possible values of MICROPHONE_INDEX using the code in the troubleshooting entry right above this one.

The code examples raise UnicodeEncodeError: 'ascii' codec can't encode character when run.

When you're using Python 2, and your language uses non-ASCII characters, and the terminal or file-like object you're printing to only supports ASCII, an error is raised when trying to write non-ASCII characters.

This is because in Python 2, recognizer_instance.recognize_sphinx, recognizer_instance.recognize_google, recognizer_instance.recognize_wit, recognizer_instance.recognize_bing, recognizer_instance.recognize_api, recognizer_instance.recognize_houndify, and recognizer_instance.recognize_ibm return unicode strings (u"something") rather than byte strings ("something"). In Python 3, all strings are unicode strings.

To make printing of unicode strings work in Python 2 as well, replace all print statements in your code of the following form:

print SOME_UNICODE_STRING

With the following:

print SOME_UNICODE_STRING.encode("utf8")

This change, however, will prevent the code from working in Python 3.
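
If the same script has to run on both Python 2 and Python 3, one option is to branch on the interpreter version. This is only a sketch under that assumption; print_text is a hypothetical helper name.

```python
import sys


def print_text(text):
    """Write a unicode string followed by a newline on Python 2 or Python 3."""
    if sys.version_info[0] < 3:
        # Python 2: encode explicitly so the unicode string is written as UTF-8 bytes.
        sys.stdout.write(text.encode("utf8") + b"\n")
    else:
        # Python 3: strings are already unicode, so write them as-is.
        sys.stdout.write(text + "\n")


if __name__ == "__main__":
    print_text(u"caf\u00e9")
```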

The program doesn't run when compiled with PyInstaller.

As of PyInstaller version 3.0, SpeechRecognition is supported out of the box. If you're getting weird issues when compiling your program using PyInstaller, simply update PyInstaller.

You can easily do this by running pip install --upgrade pyinstaller.

On Ubuntu/Debian, I get annoying output in the terminal saying things like "bt_audio_service_open: [...] Connection refused" and various others.

The "bt_audio_service_open" error means that you have a Bluetooth audio device, but as a physical device is not currently connected, we can't actually use it - if you're not using a Bluetooth microphone, then this can be safely ignored. If you are, and audio isn't working, then double check to make sure your microphone is actually connected. There does not seem to be a simple way to disable these messages.

For errors of the form "ALSA lib [...] Unknown PCM", see this StackOverflow answer. Basically, to get rid of an error of the form "Unknown PCM cards.pcm.rear", simply comment out pcm.rear cards.pcm.rear in /usr/share/alsa/alsa.conf, ~/.asoundrc, and /etc/asound.conf.

For "jack server is not running or cannot be started" or "connect(2) call to /dev/shm/jack-1000/default/jack_0 failed (err=No such file or directory)" or "attempt to connect to server failed", these are caused by ALSA trying to connect to JACK, and can be safely ignored. I'm not aware of any simple way to turn those messages off at this time, besides entirely disabling printing while starting the microphone.
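
One way to disable printing while the microphone starts is to point the process-level stderr file descriptor at the null device for the duration of the call. The context manager below is a sketch of that technique for POSIX-like systems; suppress_stderr is a hypothetical name, and it hides all stderr output inside the block, including genuine error messages.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppress_stderr():
    """Temporarily redirect file descriptor 2 (stderr) to the null device."""
    sys.stderr.flush()  # push out anything Python has buffered so far
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    saved_fd = os.dup(2)  # keep a copy of the real stderr
    try:
        os.dup2(devnull_fd, 2)
        yield
    finally:
        os.dup2(saved_fd, 2)  # restore the real stderr
        os.close(saved_fd)
        os.close(devnull_fd)


# Typical use (requires SpeechRecognition and PyAudio):
#
#     import speech_recognition as sr
#     with suppress_stderr():
#         microphone = sr.Microphone()
```

Redirecting the file descriptor, rather than replacing sys.stderr, matters here because the messages come from C libraries that write to descriptor 2 directly.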

On OS X, I get a ChildProcessError saying that it couldn't find the system FLAC converter, even though it's installed.

Installing FLAC for OS X directly from the source code will not work, since it doesn't correctly add the executables to the search path.

Installing FLAC using Homebrew ensures that the search path is correctly updated. First, ensure you have Homebrew, then run brew install flac to install the necessary files.

Developing

To hack on this library, first make sure you have all the requirements listed in the "Requirements" section.

  • Most of the library code lives in speech_recognition/__init__.py.
  • Examples live under the examples/ directory, and the demo script lives in speech_recognition/__main__.py.
  • The FLAC encoder binaries are in the speech_recognition/ directory.
  • Documentation can be found in the reference/ directory.
  • Third-party libraries, utilities, and reference material are in the third-party/ directory.

To install/reinstall the library locally, run python setup.py install in the project root directory.

Before a release, the version number is bumped in README.rst and speech_recognition/__init__.py. Version tags are then created using git config gpg.program gpg2 && git config user.signingkey DB45F6C431DE7C2DCD99FF7904882258A4063489 && git tag -s VERSION_GOES_HERE -m "Version VERSION_GOES_HERE".

Releases are done by running make-release.sh VERSION_GOES_HERE to build the Python source packages, sign them, and upload them to PyPI.

Testing

To run all the tests:

python -m unittest discover --verbose

Testing is also done automatically by TravisCI, upon every push. To set up the environment for offline/local Travis-like testing on a Debian-like system:

sudo docker run --volume "$(pwd):/speech_recognition" --interactive --tty quay.io/travisci/travis-python:latest /bin/bash
su - travis && cd /speech_recognition
sudo apt-get update && sudo apt-get install swig libpulse-dev
pip install --user pocketsphinx monotonic && pip install --user flake8 rstcheck && pip install --user -e .
python -m unittest discover --verbose # run unit tests
python -m flake8 --ignore=E501,E701 speech_recognition tests examples setup.py # ignore errors for long lines and multi-statement lines
python -m rstcheck README.rst reference/*.rst # ensure RST is well-formed

FLAC Executables

The included flac-win32 executable is the official FLAC 1.3.2 32-bit Windows binary.

The included flac-linux-x86 and flac-linux-x86_64 executables are built from the FLAC 1.3.2 source code with Manylinux to ensure that it's compatible with a wide variety of distributions.

The built FLAC executables should be bit-for-bit reproducible. To rebuild them, run the following inside the project directory on a Debian-like system:

# download and extract the FLAC source code
cd third-party
sudo apt-get install --yes docker.io

# build FLAC inside the Manylinux i686 Docker image
tar xf flac-1.3.2.tar.xz
sudo docker run --tty --interactive --rm --volume "$(pwd):/root" quay.io/pypa/manylinux1_i686:latest bash
    cd /root/flac-1.3.2
    ./configure LDFLAGS=-static # compiler flags to make a static build
    make
exit
cp flac-1.3.2/src/flac/flac ../speech_recognition/flac-linux-x86 && sudo rm -rf flac-1.3.2/

# build FLAC inside the Manylinux x86_64 Docker image
tar xf flac-1.3.2.tar.xz
sudo docker run --tty --interactive --rm --volume "$(pwd):/root" quay.io/pypa/manylinux1_x86_64:latest bash
    cd /root/flac-1.3.2
    ./configure LDFLAGS=-static # compiler flags to make a static build
    make
exit
cp flac-1.3.2/src/flac/flac ../speech_recognition/flac-linux-x86_64 && sudo rm -r flac-1.3.2/
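
To check that a rebuilt executable really is bit-for-bit identical to a previous copy, the two files can be compared by hash. The following is a minimal sketch using the standard library; sha256_of and same_contents are hypothetical helper names, and the file paths in the usage comment are placeholders.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hexadecimal SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """Return True if both files have identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)


# Typical use, comparing a fresh build against a saved copy of the old binary:
#
#     same_contents("speech_recognition/flac-linux-x86_64", "/tmp/flac-linux-x86_64.old")
```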

The included flac-mac executable is extracted from xACT 2.39, which is a frontend for FLAC 1.3.2 that conveniently includes binaries for all of its encoders. Specifically, it is a copy of xACT 2.39/xACT.app/Contents/Resources/flac in xACT2.39.zip.

Authors

Uberi (Anthony Zhang)
bobsayshilol
arvindch (Arvind Chembarpu)
kevinismith (Kevin Smith)
haas85
DelightRun
maverickagm
kamushadenes (Kamus Hadenes)
sbraden (Sarah Braden)
tb0hdan (Bohdan Turkynewych)
Thynix (Steve Dougherty)
beeedy (Broderick Carlin)

Please report bugs and suggestions at the issue tracker!

How to cite this library (APA style):

Zhang, A. (2017). Speech Recognition (Version 3.8) [Software]. Available from https://github.com/Uberi/speech_recognition#readme.

How to cite this library (Chicago style):

Zhang, Anthony. 2017. Speech Recognition (version 3.8).

Also check out the Python Baidu Yuyin API, which is based on an older version of this project, and adds support for Baidu Yuyin. Note that Baidu Yuyin is only available inside China.

License

Copyright 2014-2017 Anthony Zhang (Uberi). The source code for this library is available online at GitHub.

SpeechRecognition is made available under the 3-clause BSD license. See LICENSE.txt in the project's root directory for more information.

For convenience, all the official distributions of SpeechRecognition already include a copy of the necessary copyright notices and licenses. In your project, you can simply say that licensing information for SpeechRecognition can be found within the SpeechRecognition README, and make sure SpeechRecognition is visible to users if they wish to see it.

SpeechRecognition distributes source code, binaries, and language files from CMU Sphinx. These files are BSD-licensed and redistributable as long as copyright notices are correctly retained. See speech_recognition/pocketsphinx-data/*/LICENSE*.txt and third-party/LICENSE-Sphinx.txt for license details for individual parts.

SpeechRecognition distributes source code and binaries from PyAudio. These files are MIT-licensed and redistributable as long as copyright notices are correctly retained. See third-party/LICENSE-PyAudio.txt for license details.

SpeechRecognition distributes binaries from FLAC - speech_recognition/flac-win32.exe, speech_recognition/flac-linux-x86, and speech_recognition/flac-mac. These files are GPLv2-licensed and redistributable, as long as the terms of the GPL are satisfied. The FLAC binaries are an aggregate of separate programs, so these GPL restrictions do not apply to the library or your programs that use the library, only to FLAC itself. See LICENSE-FLAC.txt for license details.

Comments
  • AttributeError: Could not find PyAudio; check installation

    Steps to reproduce

    Hey, I am just getting started with speech recognition, and I was checking some basic examples that I had found. It was nothing too complicated, very simple things.
    The code I am using is below:

    import speech_recognition as sr
    
    # Record Audio
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        audio = r.listen(source)
     
    # Speech recognition using Google Speech Recognition
    try:
        # for testing purposes, we're just using the default API key
        # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
        # instead of `r.recognize_google(audio)`
        print("You said: " + r.recognize_google(audio))
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
    

    Expected behaviour

    I expect it to just use my microphone and let me record something.

    Actual behaviour

    This is the error I get:

    Traceback (most recent call last):
    
      File "<ipython-input-9-2b39d94ceb5b>", line 1, in <module>
        runfile('/home/sanwal092/Desktop/Python/SR/dummy.py', wdir='/home/sanwal092/Desktop/Python/SR')
    
      File "/home/sanwal092/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 866, in runfile
        execfile(filename, namespace)
    
      File "/home/sanwal092/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
        exec(compile(f.read(), filename, 'exec'), namespace)
    
      File "/home/sanwal092/Desktop/Python/SR/dummy.py", line 14, in <module>
        with sr.Microphone() as source:
    
      File "/home/sanwal092/anaconda3/lib/python3.6/site-packages/speech_recognition/__init__.py", line 78, in __init__
        self.pyaudio_module = self.get_pyaudio()
    
      File "/home/sanwal092/anaconda3/lib/python3.6/site-packages/speech_recognition/__init__.py", line 109, in get_pyaudio
        raise AttributeError("Could not find PyAudio; check installation")
    
    AttributeError: Could not find PyAudio; check installation
    

    When I run import pyaudio as p; print(p.__version__) to check the version of PyAudio installed, I get the following error instead:

    Could not import the PyAudio C module '_portaudio'.
    Traceback (most recent call last):
    
      File "<ipython-input-11-b37da17f237b>", line 1, in <module>
        import pyaudio
    
      File "/home/sanwal092/pyaudio/build/lib.linux-x86_64-2.7/pyaudio.py", line 116, in <module>
        import _portaudio as pa
    
      ImportError: /home/sanwal092/pyaudio/build/lib.linux-x86_64-2.7/_portaudio.so: undefined symbol: _Py_ZeroStruct
    

    I am not sure where the error is coming from. It may have something to do with how I compiled the library, and I am not sure what to do from here. Any help would be appreciated.


    System information

    My system is Ubuntu 16.04 LTS x64

    My Python version is Python 3.6.0 :: Anaconda 4.3.1 (64-bit)

    My Pip version is pip 9.0.1

    My SpeechRecognition library version is 3.6.5

    opened by sanster9292 52
  • Alsa lib error while running program and it automatically set particular threshold value always

    Steps to reproduce

    Initially, when I was working with this program on my Raspberry Pi running Jessie, everything went fine. But since this morning it is not picking up any audio from the USB microphone, and it always sets the threshold value to 48.5477227879, in both high-noise and low-noise situations.

    So I tried to reinstall the program, but even that didn't work for me.

    After running the main.py program, it shows:

    "a moment of silence please.. Set minimum threshold to 48.5477227879 say something! "

    That's it; after this, nothing happens.

    After some time I tried to run it through the terminal, and it shows the error "ALSA lib pcm_dmix.c :1022:(snd_pcm_dmix_open) unable to open slave" followed by "say something".

    What should I do now to make this program run?

    Expected behaviour

    Run program without error and recognise speech.

    Actual behaviour

    scenario 1: Running from python

    The program runs with no error and shows "a moment of silence please.. Set minimum threshold to 48.5477227879 say something! "

    It stays like this the whole time.

    scenario 2: Running from terminal

    ALSA lib pcm_dmix.c :1022:(snd_pcm_dmix_open) unable to open slave
    ALSA lib pcm.c:2289:(snd_pcm_open_noupdate) unknown PCM cards.pcm rear
    ALSA lib pcm.c:2289:(snd_pcm_open_noupdate) unknown PCM cards.pcm center_lfe
    ALSA lib pcm.c:2289:(snd_pcm_open_noupdate) unknown PCM cards.pcm side
    ALSA lib pcm.c:2289:(snd_pcm_open_noupdate) unknown PCM cards.pcm hdmi
    ALSA lib pcm.c:2289:(snd_pcm_open_noupdate) unknown PCM cards.pcm hdmi
    ALSA lib pcm.c:2289:(snd_pcm_open_noupdate) unknown PCM cards.pcm modem
    ALSA lib pcm.c:2289:(snd_pcm_open_noupdate) unknown PCM cards.pcm modem
    ALSA lib pcm.c:2289:(snd_pcm_open_noupdate) unknown PCM cards.pcm phoneline
    ALSA lib pcm.c:2289:(snd_pcm_open_noupdate) unknown PCM cards.pcm phoneline
    ALSA lib pcm_dmix.c :1022:(snd_pcm_dmix_open) unable to open slave
    Say something

    It stays like this, with no recognition at all.

    System information

    My SpeechRecognition library version is <3.3.3>.

    My PyAudio library version is <0.2.9>

    I installed PocketSphinx from Debian repositories.

    opened by santhoshrsk 41
  • Getting error while using speech_recognition module in python.Error: TypeError: function takes exactly 2 arguments (3 given)

    I want to convert speech to text in real time using the SpeechRecognition 3.4.6 module. I've installed everything and am now trying a simple example. Here's the code:

    import speech_recognition as sr
    
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        audio = r.listen(source)
    
    
    try:
        print("Sphinx thinks you said " + r.recognize_sphinx(audio))
    except sr.UnknownValueError:
        print("Sphinx could not understand audio")
    except sr.RequestError as e:
        print("Sphinx error; {0}".format(e))
    
    
    
    I am getting an error on the line audio = r.listen(source):

    Traceback (most recent call last):
      File "sr.py", line 4, in <module>
        audio = r.listen(source)  # listen for the first phrase and extract it into audio data
      File "/usr/local/lib/python2.7/dist-packages/speech_recognition/__init__.py", line 493, in listen
        buffer = source.stream.read(source.CHUNK)
      File "/usr/local/lib/python2.7/dist-packages/speech_recognition/__init__.py", line 139, in read
        return self.pyaudio_stream.read(size, exception_on_overflow = False)
      File "/usr/local/lib/python2.7/dist-packages/pyaudio.py", line 608, in read
        return pa.read_stream(self._stream, num_frames, exception_on_overflow)
    TypeError: function takes exactly 2 arguments (3 given)
    

    How can I fix this problem?

    opened by aquibjaved 29
  • OSError: [Errno -9999] Unanticipated host error

    Steps to reproduce

    1. The error does not occur every time; sometimes it appears randomly, out of the blue.
    2. My Code:
    
    #JARVIS mark 10. python 3.5.1 version
    #JUST.A.RATHER.VERY.INTELEGENT.SYSTEM.
    ##import speech_recognition
    ##import datetime
    ##import os
    ##import random
    ##import datetime
    ##import webbrowser
    ##import time
    ##import calendar
    from difflib import SequenceMatcher
    import nltk
    from nltk.tokenize import sent_tokenize, word_tokenize
    from nltk.tokenize import PunktSentenceTokenizer
    import speech_recognition as sr
    import sys
    from time import sleep
    import os
    import random
    r = sr.Recognizer()
    m = sr.Microphone()
    
    
    
    #Brain functions, vocab!
    what_i_should_call_someone = [""]
    Good_Things = ["love","sweet","nice","happy","fun","awesome","great"]
    Bad_Things = ["death","kill","hurt","harm","discomfort","rape","pain","sad","depression","depressed","angry","mad","broken","raging","rage"]
    # Words that you might says in the beginning of your input, for example: "um hey where are we!?!"
    Slang_Words = ["um","uh","hm","eh"]
    # Put all greetings in here
    Static_Greetings = ["Hey","Hi","Hello"]
    # Put your AIs Name and other names just in case.
    Name = ["jarvis"]
    posible_answer_key_words = ["becuase","yes","no"]
    Chance_that_question_was_asked_1 = 0
    Chance_that_question_was_asked_2 = 0
    certainty_question_was_asked = 0
    Me_statment_keywords = ["you","your","yours"]
    You_statment_keywords = ["i","i'm","me"]
    global certainty_person_is_talking_to_me
    what_i_said = ("")
    Just_asked_querstion = False
    the_last_thing_i_said = ("")
    the_last_thing_person_said = ("")
    what_person_said = ("")
    what_person_said_means = [""]
    what_im_about_to_say = [""]
    why_im_about_to_say_it = [""]
    who_im_talking_to = [""]
    how_i_feel = [""]
    why_do_i_feel_the_way_i_do = [""]
    what_i_am_thinking = ("")
    # ways to describe the nouns last said
    it_pronouns = ["it","they","she","he"]
    # last person place or thing described spoken or descussed!
    last_nouns = [""]
    
    # Sample of random questions so Jarvis has somthing to index to know what a question is!
    Sample_Questions = ["what is the weather like","where are we today","why did you do that","where is the dog","when are we going to leave","why do you hate me","what is the Answer to question 8",
                        "what is a dinosour","what do i do in an hour","why do we have to leave at 6.00", "When is the apointment","where did you go","why did you do that","how did he win","why won’t you help me",
                        "when did he find you","how do you get it","who does all the shipping","where do you buy stuff","why don’t you just find it in the target","why don't you buy stuff at target","where did you say it was",
                        "when did he grab the phone","what happened at seven am","did you take my phone","do you like me","do you know what happened yesterday","did it break when it dropped","does it hurt everyday",
                        "does the car break down often","can you drive me home","where did you find me",
                        "can it fly from here to target","could you find it for me"]
    
    Sample_Greetings = ["hey","hello","hi","hey there","hi there","hello there","hey jarvis","hey dude"]
    
    Question_Keyword_Answer = []
    
    Int_Question_Keywords_In_Input = []
    
    Possible_Question_Key_Words = ["whats","what","where","when","why","isn't","whats","who","should","would","could","can","do","does","can","can","did"]
    
    Possible_Greeting_Key_Words = ["hey","hi","hello",Name]
    
    # In this function: analyze the user input, determine its type (question, answer, command, etc.), and work out what is being asked or commanded.
    def Analyze():
    
    
        def Analyze_For_Greeting():
           
            def Greeting_Keyword_Check():
                    global Possible_Greeting_Key_Words
                    Int_Greeting_Keywords_In_Input = []
                    for words in what_person_said_l_wt:
                        if words in Possible_Greeting_Key_Words:
                            Int_Greeting_Keywords_In_Input.append(words)
                    Amount_Greeting_Keywords = (len(Int_Greeting_Keywords_In_Input))
                    if Amount_Greeting_Keywords > 0:
                        return True
            def Greeting_Sentence_Match():
                 
                    for Ran_Greeting in Sample_Greetings:
                        Greeting_Matcher = SequenceMatcher(None, Ran_Greeting, what_person_said_l).ratio()
                        if Greeting_Matcher > 0.5:
                            print (Greeting_Matcher)
                            print ("Similar to Greeting: "+Ran_Greeting)
                            return True
    
    
            
            Greeting_Keyword_Check()
            Greeting_Sentence_Match()
        
        # In this function: determine whether the input is a question.
        def Analyze_For_Question():
                # In this function: if there is at least one question keyword in the user input, return True.
                def Question_Keyword_Check():
                    global Possible_Question_Key_Words
                    Int_Question_Keywords_In_Input = []
                    for words in what_person_said_l_wt:
                        if words in Possible_Question_Key_Words:
                            Int_Question_Keywords_In_Input.append(words)
                    Amount_Question_keywords = (len(Int_Question_Keywords_In_Input))
                    if Amount_Question_keywords > 0:
                        return True
                # In this function: if the user's input is similar to one of the sample questions, return True.
                def Question_Sentence_Match():
                    for Ran_Question in Sample_Questions:
                        Question_Matcher = SequenceMatcher(None, Ran_Question, what_person_said_l).ratio()
                        if Question_Matcher > 0.5:
                            print (Question_Matcher)
                            print ("Similar to Question: "+Ran_Question)
                            return True
                # In this function: if the first word of the user's input is a question keyword and there is a different question keyword in the input, return True.
                def Question_Verb_Noun_Check():
                    # If you say "hey jarvis" before something like a question or command, it will still understand.
                    try:
                        for word in what_person_said_l_wt:
                            if word in Static_Greetings or word in Name:
                                    print (word)
                                    Minus_Begin_Greet1 = what_person_said_l_wt.remove(word)
                                    print (Minus_Begin_Greet1)
                                    return True 
                    except IndexError:
                        pass
    
                # Run each check once; repeating the calls duplicates the prints and re-runs remove().
                if Question_Keyword_Check() and Question_Sentence_Match() and Question_Verb_Noun_Check():
                    return True
                else:
                    return False
    
    
                
        # All the functions in Analyze
        Analyze_For_Greeting()

        # Call the question analysis once and reuse its result.
        if Analyze_For_Question():
            print ("This was a Question")
        
    
    
    
    
    
    
    
    
    
    Conversation=True
    Conversation_Started=False
    
    while Conversation==True:
    
        try:
            if Conversation_Started==False:
                #Greeting()
                Conversation_Started=True
                
            with m as source: r.adjust_for_ambient_noise(source)
            print(format(r.energy_threshold))
     
            print("Say something!") # just here for now, for testing purposes, so we know what's happening
            with m as source: audio = r.listen(source)
            print("Got it! Now to recognize it...")
            try:
                # recognize speech using Google Speech Recognition
                value = r.recognize_google(audio)
    
                # we need some special handling here to correctly print unicode characters to standard output
                if str is bytes:  # this version of Python uses bytes for strings (Python 2)
                    print(u"You said {}".format(value).encode("utf-8"))
                else:  # this version of Python uses unicode for strings (Python 3+)
                    print("You said {}".format(value))
                              
                what_person_said_l = value.lower()
                what_person_said_l_wt = word_tokenize(what_person_said_l)
                Analyze()
                
     
            except sr.UnknownValueError:
                print ("what was that?")
            except sr.RequestError as e:
                print("Uh oh! Sorry sir Couldn't request results from Google Speech Recognition service; {0}".format(e))
        except KeyboardInterrupt:
            pass   
            
    
    
    

    Expected behaviour

    My code should do what it's supposed to do; I don't see what you mean.

    Actual behaviour


    Traceback (most recent call last):
      File "/media/pi/TRAVELDRIVE/Jarvis(10.0).py", line 172, in <module>
        with m as source: r.adjust_for_ambient_noise(source)
      File "/usr/local/lib/python3.4/dist-packages/speech_recognition/__init__.py", line 140, in __enter__
        input=True,  # stream is an input stream
      File "/usr/local/lib/python3.4/dist-packages/pyaudio.py", line 750, in open
        stream = Stream(self, *args, **kwargs)
      File "/usr/local/lib/python3.4/dist-packages/pyaudio.py", line 441, in __init__
        self._stream = pa.open(**arguments)
    OSError: [Errno -9999] Unanticipated host error
    ```_
    System information
    ------------------
    
    
    My **system** is Raspbian Pixel.
    
    My **Python version** is Python 3.
    
    My **Pip version** is 1.5.6 for Python 2.7, but I'm using pip3.
    
    My **SpeechRecognition library version** is 3.6.5.
    
    My **PyAudio library version** is 0.2.11.
    
    opened by techsetonyoutube 27
  • Install/compilation of PyAudio failed


    Kubuntu 17.10.1, Python 3.6.3, Pip 9.0.1

    I created a virtual environment and read through the SpeechRecognition instructions. Here are the steps:

    $ sudo apt-get install python-pyaudio python3-pyaudio

    That went okay; it also installed libportaudio2.

    $ sudo apt-get install python3 python3-all-dev python3-pip build-essential swig git libpulse-dev

    Apt also added the suggestions - "The following NEW packages will be installed: libglib2.0-dev libglib2.0-dev-bin libpcre16-3 libpcre3-dev libpcre32-3 libpcrecpp0v5 libpulse-dev libpython3-all-dev pkg-config python3-all python3-all-dev swig swig3.0"

    Then installed SpeechRecognition

    $ pip install SpeechRecognition

    Then tried SpeechRecognition and got an error

    $ python3 -m speech_recognition
    Traceback (most recent call last):
      File ".../SpeechRecognition/lib/python3.6/site-packages/speech_recognition/__init__.py", line 108, in get_pyaudio
        import pyaudio
    ModuleNotFoundError: No module named 'pyaudio'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File ".../SpeechRecognition/lib/python3.6/site-packages/speech_recognition/__main__.py", line 4, in <module>
        m = sr.Microphone()
      File ".../SpeechRecognition/lib/python3.6/site-packages/speech_recognition/__init__.py", line 79, in __init__
        self.pyaudio_module = self.get_pyaudio()
      File ".../SpeechRecognition/lib/python3.6/site-packages/speech_recognition/__init__.py", line 110, in get_pyaudio
        raise AttributeError("Could not find PyAudio; check installation")
    AttributeError: Could not find PyAudio; check installation

    Tried installing PyAudio with pip, even though it is already installed via Apt:

    $ pip3 install pyaudio
    Collecting pyaudio
      Downloading PyAudio-0.2.11.tar.gz
    Building wheels for collected packages: pyaudio
      Running setup.py bdist_wheel for pyaudio ... error
      Complete output from command .../SpeechRecognition/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-72_au_03/pyaudio/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpu_qg0ulxpip-wheel- --python-tag cp36:
      usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
         or: -c --help [cmd1 cmd2 ...]
         or: -c --help-commands
         or: -c cmd --help

      error: invalid command 'bdist_wheel'

      Failed building wheel for pyaudio
      Running setup.py clean for pyaudio
    Failed to build pyaudio
    Installing collected packages: pyaudio
      Running setup.py install for pyaudio ... error
        Complete output from command .../SpeechRecognition/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-72_au_03/pyaudio/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-feocjwqu-record/install-record.txt --single-version-externally-managed --compile --install-headers .../SpeechRecognition/include/site/python3.6/pyaudio:
        running install
        running build
        running build_py
        creating build
        creating build/lib.linux-x86_64-3.6
        copying src/pyaudio.py -> build/lib.linux-x86_64-3.6
        running build_ext
        building '_portaudio' extension
        creating build/temp.linux-x86_64-3.6
        creating build/temp.linux-x86_64-3.6/src
        x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fdebug-prefix-map=/build/python3.6-sXpGnM/python3.6-3.6.3=. -specs=/usr/share/dpkg/no-pie-compile.specs -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I.../SpeechRecognition/include -I/usr/include/python3.6m -c src/_portaudiomodule.c -o build/temp.linux-x86_64-3.6/src/_portaudiomodule.o
        src/_portaudiomodule.c:29:10: fatal error: portaudio.h: No such file or directory
         #include "portaudio.h"
                  ^~~~~~~~~~~~~
        compilation terminated.
        error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

    Command ".../SpeechRecognition/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-72_au_03/pyaudio/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-feocjwqu-record/install-record.txt --single-version-externally-managed --compile --install-headers .../SpeechRecognition/include/site/python3.6/pyaudio" failed with error code 1 in /tmp/pip-build-72_au_03/pyaudio/

    This appears to be the same problem - https://github.com/SlapBot/stephanie-va/issues/8 . I will try the solution at https://github.com/SlapBot/stephanie-va/issues/8#issuecomment-307617796

    Not sure if the following had an impact or not. The apt commands I used were run within a virtual environment, but I would have thought anything to do with Kubuntu packages would be system wide. I installed SpeechRecognition with pip instead of pip3 (I don't think it matters).

    Although if I run pip3 or pip within the virtual environment, they both say version 9.0.1, yet if I run them both outside the virtual environment, it shows pip3 is installed but not pip.
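Two separate things seem to be going on in this report. Packages installed with apt land in the system interpreter's package directory, which a virtual environment does not see unless it was created with access to system site packages. The `portaudio.h: No such file or directory` error means the PortAudio development headers are missing (on Debian-based systems these are usually provided by the portaudio19-dev package). The sketch below only checks what the current interpreter can see; the helper names `in_virtualenv` and `module_available` are illustrative and not part of SpeechRecognition.

```python
import importlib.util
import sys


def in_virtualenv():
    """Return True when running inside a venv-style virtual environment."""
    # In a venv, sys.prefix points at the environment and sys.base_prefix
    # at the interpreter the environment was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def module_available(name):
    """Return True if the top-level module `name` is importable here."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
    print("pyaudio importable:", module_available("pyaudio"))
```

If this reports that pyaudio is not importable inside the environment but is importable with the system interpreter, the apt-installed copy is simply not visible from the virtual environment.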

    opened by jehoshua7 23
  • ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format


    code

    import speech_recognition as sr
    r = sr.Recognizer()
    with sr.AudioFile("test.wav") as source:
        audio = r.record(source)
    
    try:
        s = r.recognize_google(audio)
        print("Text: "+s)
    except Exception as e:
        print("Exception: "+str(e))
    

    error

    Traceback (most recent call last):
      File "/Users/robiulislam/anaconda3/lib/python3.6/site-packages/speech_recognition/__init__.py", line 203, in __enter__
        self.audio_reader = wave.open(self.filename_or_fileobject, "rb")
      File "/Users/robiulislam/anaconda3/lib/python3.6/wave.py", line 499, in open
        return Wave_read(f)
      File "/Users/robiulislam/anaconda3/lib/python3.6/wave.py", line 163, in __init__
        self.initfp(f)
      File "/Users/robiulislam/anaconda3/lib/python3.6/wave.py", line 143, in initfp
        self._read_fmt_chunk(chunk)
      File "/Users/robiulislam/anaconda3/lib/python3.6/wave.py", line 260, in _read_fmt_chunk
        raise Error('unknown format: %r' % (wFormatTag,))
    wave.Error: unknown format: 65534
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/Users/robiulislam/anaconda3/lib/python3.6/site-packages/speech_recognition/__init__.py", line 208, in __enter__
        self.audio_reader = aifc.open(self.filename_or_fileobject, "rb")
      File "/Users/robiulislam/anaconda3/lib/python3.6/aifc.py", line 912, in open
        return Aifc_read(f)
      File "/Users/robiulislam/anaconda3/lib/python3.6/aifc.py", line 351, in __init__
        self.initfp(file_object)
      File "/Users/robiulislam/anaconda3/lib/python3.6/aifc.py", line 316, in initfp
        raise Error('file does not start with FORM id')
    aifc.Error: file does not start with FORM id
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/Users/robiulislam/anaconda3/lib/python3.6/site-packages/speech_recognition/__init__.py", line 234, in __enter__
        self.audio_reader = aifc.open(aiff_file, "rb")
      File "/Users/robiulislam/anaconda3/lib/python3.6/aifc.py", line 912, in open
        return Aifc_read(f)
      File "/Users/robiulislam/anaconda3/lib/python3.6/aifc.py", line 357, in __init__
        self.initfp(f)
      File "/Users/robiulislam/anaconda3/lib/python3.6/aifc.py", line 314, in initfp
        chunk = Chunk(file)
      File "/Users/robiulislam/anaconda3/lib/python3.6/chunk.py", line 63, in __init__
        raise EOFError
    EOFError
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "test.py", line 21, in <module>
        with sr.AudioFile("test.wav") as source:
      File "/Users/robiulislam/anaconda3/lib/python3.6/site-packages/speech_recognition/__init__.py", line 236, in __enter__
        raise ValueError("Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format")
    ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format
    

    Operating system is macOS; Python version is 3.
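The first traceback gives a useful hint: format tag 65534 is 0xFFFE, the WAVE_FORMAT_EXTENSIBLE tag, which the `wave` module in the Python version shown here rejects because it only accepts plain PCM (tag 1). Re-exporting the file as ordinary 16-bit PCM WAV is the usual way around this. The sketch below reads the format tag straight from the RIFF header so a file can be checked before it is passed to `sr.AudioFile`; the function name `wav_format_tag` is an assumption for illustration.

```python
import struct

WAVE_FORMAT_PCM = 0x0001
WAVE_FORMAT_EXTENSIBLE = 0xFFFE  # 65534, the value in the traceback


def wav_format_tag(data):
    """Return the wFormatTag from RIFF/WAVE bytes, or None if no fmt chunk is found."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            if pos + 10 > len(data):
                return None  # header truncated before the tag
            (tag,) = struct.unpack_from("<H", data, pos + 8)
            return tag
        # Chunk bodies are padded to an even number of bytes.
        pos += 8 + chunk_size + (chunk_size & 1)
    return None
```

Reading the first few kilobytes of the file (for example `open("test.wav", "rb").read(4096)`) is normally enough, since the fmt chunk usually sits near the start.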

    opened by connect2robiul 21
  • Bad gateway errors using Google Speech Recognition service


    Steps to reproduce

    1. I am trying to transcribe Polish voice files, based on code from the examples. I have tried both the default API key and one I generated myself.
    2. Run the following script:

    #!/usr/bin/env python3

    import speech_recognition as sr

    from os import path

    AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "test.flac")

    r = sr.Recognizer()
    with sr.AudioFile(AUDIO_FILE) as source:
        audio = r.record(source)  # read the entire audio file

    GOOGLE_KEY = "mykey"
    lang = "pl"

    try:
        print("Google Speech Recognition thinks you said " + r.recognize_google(audio, key=GOOGLE_KEY, language=lang))
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))

    I use this file as the test file - test.zip

    Expected behaviour

    Recognized Polish text.

    Actual behaviour

    Every time, I get the same error message: "Could not request results from Google Speech Recognition service; recognition request failed: Bad Gateway"

    System information

    My Python version is Python 2.7.11

    My SpeechRecognition library version is 3.4.6

    I don't have PyAudio installed.

    I don't have PocketSphinx installed.

    opened by mtraton 21
  • Houndify not responding any except when request to account credit over limite


    • Houndify does not raise any error (for example, UnknownValueError or RequestError) for an except clause to catch when the account's request credit is over the limit.
    • Is it possible to check the currently available credit via the API key?

    Thanks, best regards

    opened by DarKWinGTM 20
  • IOError: [Errno Input overflowed] -9981


    I got this error after the recognition worked OK once.

    Source code:

    import speech_recognition as sr
    
    r = sr.Recognizer()
    m = sr.Microphone()
    m.RATE = 44100
    m.CHUNK = 512
    
    print("A moment of silence, please...")
    with m as source:
        r.adjust_for_ambient_noise(source)
        print("Set minimum energy threshold to {}".format(r.energy_threshold))
        while True:
            print("Say something!")
            audio = r.listen(source)
            print("Got it! Now to recognize it...")
            try:
                print("You said " + r.recognize(audio))
            except LookupError:
                print("Oops! Didn't catch that")
    
    

    Error message:

    A moment of silence, please...
    ALSA lib pcm_dmix.c:957:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
    Cannot connect to server socket err = No such file or directory
    Cannot connect to server request channel
    jack server is not running or cannot be started
    Set minimum energy threshold to 350.037988116
    Say something!
    Got it! Now to recognize it...
    You said hello testing testing
    Say something!
    Traceback (most recent call last):
      File "listen.py", line 14, in <module>
        audio = r.listen(source)
      File "/usr/local/lib/python2.7/dist-packages/speech_recognition/__init__.py", line 265, in listen
        buffer = source.stream.read(source.CHUNK)
      File "/usr/lib/pymodules/python2.7/pyaudio.py", line 564, in read
        return pa.read_stream(self._stream, num_frames)
    IOError: [Errno Input overflowed] -9981
    
    opened by chaoming 20
  • Error for Pyaudio 0.2.9 version


    Hello!

    I installed all the modules needed to use SpeechRecognition, and I checked that they were among the installed Python modules. When I run the SpeechRecognition examples in the terminal, I get the following:

    Traceback (most recent call last):
      File "background_listening.py", line 19, in <module>
        m = sr.Microphone()
      File "/home/javier/.local/lib/python2.7/site-packages/speech_recognition/__init__.py", line 55, in __init__
        self.pyaudio_module = self.get_pyaudio()
      File "/home/javier/.local/lib/python2.7/site-packages/speech_recognition/__init__.py", line 88, in get_pyaudio
        raise AttributeError("PyAudio 0.2.9 or later is required (found version {0})".format(pyaudio.__version__))
    AttributeError: PyAudio 0.2.9 or later is required (found version 0.2.8)

    Why does this function do this? My version of PyAudio is 0.2.9.

    I would appreciate it if somebody could help me. Thanks.
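One possible explanation, which the report does not confirm, is that two PyAudio installations exist (for instance an older distribution package and a newer per-user install) and the interpreter imports the older one. A minimal sketch for checking which copy is actually imported follows; `describe_module` is a hypothetical helper name, and it is demonstrated on a standard-library module so it runs without PyAudio.

```python
import importlib


def describe_module(name):
    """Import `name` and return (version, file path) as seen by this interpreter."""
    module = importlib.import_module(name)
    version = getattr(module, "__version__", "unknown")
    path = getattr(module, "__file__", "built-in")
    return version, path


if __name__ == "__main__":
    # Replace "json" with "pyaudio" on a machine where PyAudio is installed.
    print(describe_module("json"))
```

If the path printed for pyaudio points somewhere other than the location where version 0.2.9 was installed, the older copy is shadowing the newer one.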

    opened by javiisanchez 17
  • Stream Closed Error


    The library is awesome and, even as green as I am, I nearly have it working. I'm playing around with it in PyCharm but after one or two recordings, it gives me an error. This is my output.

    A moment of silence, please...
    2015-12-04 23:39:04.175 Python[40487:860874] 23:39:04.175 WARNING:  140: This application, or a library it uses, is using the deprecated Carbon Component Manager for hosting Audio Units. Support for this will be removed in a future release. Also, this makes the host incompatible with version 3 audio units. Please transition to the API's in AudioComponent.h.
    Set minimum energy threshold to 74.0823600901
    Say something!
    Got it! Now to recognize it...
    You said please
    Say something!
    Traceback (most recent call last):
      File "/Users/stevenchun/PycharmProjects/VoiceRecognition/speech_recognition/__main__.py", line 29, in <module>
        print("Uh oh! Couldn't request results from Google Speech Recognition service; {0}".format(e))
      File "/Users/stevenchun/PycharmProjects/VoiceRecognition/speech_recognition/__init__.py", line 80, in __exit__
        if not self.stream.is_stopped():
      File "/usr/local/lib/python2.7/site-packages/pyaudio.py", line 543, in is_stopped
        return pa.is_stream_stopped(self._stream)
    IOError: [Errno -9988] Stream closed
    

    I downloaded all of this today, so I have the most recent version of the library as well as the most recent version of PyAudio. I haven't changed __main__.py at all, but here it is anyway:

    import speech_recognition as sr
    
    r = sr.Recognizer()
    m = sr.Microphone()
    
    try:
        print("A moment of silence, please...")
        with m as source:
            r.adjust_for_ambient_noise(source)
            print("Set minimum energy threshold to {}".format(r.energy_threshold))
            while True:
                print("Say something!")
                audio = r.listen(source)
                print("Got it! Now to recognize it...")
                try:
                    # recognize speech using Google Speech Recognition
                    value = r.recognize_google(audio)
                    # we need some special handling here to correctly print unicode characters to standard output
                    if str is bytes: # this version of Python uses bytes for strings (Python 2)
                        print(u"You said {}".format(value).encode("utf-8"))
    
                    else: # this version of Python uses unicode for strings (Python 3+)
                        print("You said {}".format(value))
    
                except sr.UnknownValueError:
                    print("Oops! Didn't catch that")
                except sr.RequestError as e:
                    print("Uh oh! Couldn't request results from Google Speech Recognition service; {0}".format(e))
    
    except KeyboardInterrupt:
        pass
    

    At first I thought the program was trying to access Google Speech too rapidly (more than once per second as is allowed), but putting in time.sleep(1) didn't work.

    Any ideas? Thanks a bunch

    opened by stevenrchun 17
  • Saved audio recorded with SR plays choppy and too fast


    Steps to reproduce

    1. Record audio from a USB audio interface (Focusrite Scarlett) with a Microphone() instance
    2. Save to file with Python's wave module

    Here's some example code that shows what I do (pieced together from the actual source):

    import speech_recognition as sr
    import wave
    
    mic_index = 7  # focusrite scarlett input
    
    recognizer = sr.Recognizer()
    mic = sr.Microphone(device_index=mic_index)
    
    print('Recording...')
    with mic as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.2)
        audio = recognizer.listen(source, timeout=1, phrase_time_limit=5)
        
    wave_file = wave.open('audiotest.wav', 'wb')
    wave_file.setnchannels(1)
    wave_file.setsampwidth(2)
    wave_file.setframerate(16000)
    wave_file.writeframes(audio.get_wav_data(convert_rate=16000))
    wave_file.close()
    

    Expected behaviour

    The written wave file should sound like the original audio source: clean and correct tempo

    Actual behaviour

    The written wave file sounds somewhat choppy and way too fast. audiotest.wav.zip

    Recording audio from the device with arecord -D plughw:1,0 -f cd -d 5 alsatest.wav produces a clean result.
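One detail in the script is worth checking, though it may not be the cause of the choppy playback: as the library reference describes it, `get_wav_data` returns the bytes of a complete WAV file, header included, whereas `wave.writeframes` expects raw frames (which `get_raw_data` would provide). Assuming that description holds, the bytes can be written to disk as they are. The sketch below uses only the standard library; `make_silent_wav` is a stand-in for `audio.get_wav_data(...)`, and both function names are illustrative.

```python
import io
import wave


def save_wav_bytes(path, wav_bytes):
    """Write complete WAV-file bytes to `path`; return (sample rate, frame count)."""
    if wav_bytes[:4] != b"RIFF":
        raise ValueError("expected the bytes of a complete WAV file")
    with open(path, "wb") as f:
        f.write(wav_bytes)
    # Re-open the file to confirm the header describes the audio as intended.
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnframes()


def make_silent_wav(rate=16000, seconds=1):
    """Stand-in for audio.get_wav_data(): mono 16-bit silence as WAV-file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * rate * seconds)
    return buf.getvalue()
```

Comparing the returned sample rate and frame count with the expected recording length is a quick way to tell whether the tempo problem comes from the saved file or from the capture itself.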

    System information


    My system is Linux Mint 20.3 Cinnamon.

    My Python version is 3.8.10.

    My Pip version is 20.0.2.

    My SpeechRecognition library version is 3.9.0.

    My PyAudio library version is 0.2.13

    My microphones are:

    HDA NVidia: HDMI 0 (hw:0,3)
    HDA NVidia: HDMI 1 (hw:0,7)
    HDA NVidia: HDMI 2 (hw:0,8)
    HDA NVidia: HDMI 3 (hw:0,9)
    HDA NVidia: HDMI 4 (hw:0,10)
    HDA NVidia: HDMI 5 (hw:0,11)
    HDA NVidia: HDMI 6 (hw:0,12)
    Scarlett 2i2 USB: Audio (hw:1,0)
    HD-Audio Generic: ALC1220 Analog (hw:2,0)
    HD-Audio Generic: ALC1220 Digital (hw:2,1)
    HD-Audio Generic: ALC1220 Alt Analog (hw:2,2)
    C922 Pro Stream Webcam: USB Audio (hw:3,0)
    hdmi
    pulse
    default
    

    My working microphones are:

    {
      7: 'Scarlett 2i2 USB: Audio (hw:1,0)', 
      11: 'C922 Pro Stream Webcam: USB Audio (hw:3,0)', 
      13: 'pulse', 
      14: 'default'
    }
    
    opened by antimatter84 0
  • Recognize_google method not working as intended, solution provided here


    Since I know the solution, I won't post my system specs. Calling the recognize_google method of speech_recognition prints the entire JSON response. After investigating, the problem is on lines 917 and 918 of speech_recognition/__init__.py: the print statements are unindented, so the whole JSON result is printed to the console on every call.

    I don't think this is how it is supposed to work. I don't want to create a pull request for this because I don't really know how; I'm just hoping someone notices! Thank you!

    opened by alxgarci 0
  • Fix error when using Whisper Speech Recognition on Windows


    Due to Windows' limitations on temporary files, we use a custom temporary file provider to ensure Whisper speech recognition can work on Windows.

    See https://stackoverflow.com/questions/23212435/permission-denied-to-write-to-my-temporary-file

    opened by acenturyandabit 0
  • Speech Recognition from links


    This isn't so much an issue as a question that I was unable to find answers for on the regular forums. Is there any way to use speech_recognition on a link? For example, suppose I give the API this link:

    https://www.google.com/recaptcha/api2/payload/audio.mp3?p=06AD1IbLD6eKKII77YoDkinG3lWV0q-eU3T4-3GBEb2xj9V0ZFBbNNCew_GRjMw5sOEd5RDEVbrub-V3v7i6YD6xiK1DGnXWORPTb76o3vrSlc9OcIhd0ENIhnubSq-9HMm69JnsJ_RKZlesGxJWbm3WLe67vojeFlcmiFOFBk_jP4xKmLvx1gK_3lCyN7ByN-1rdoVNkHEC&k=6LeTnxkTAAAAAN9QEuDZRpn90WwKk_R1TRW_g-JC

    How could I run speech_recognition on it? If that is not possible with this API, could someone point me to an API that can do this? I'd prefer not to download the file myself, as it comes in .mp3 format and I would have to convert it to .wav.
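As far as the library reference and the error messages elsewhere on this page indicate, `AudioFile` accepts either a filename or a file-like object and reads PCM WAV, AIFF/AIFF-C, or native FLAC. An MP3 would therefore still need converting with a separate tool, but audio in a supported format can be fetched into memory instead of being saved to disk. A minimal sketch follows; the function name `fetch_audio` and the example URL in the comment are assumptions.

```python
import io
import urllib.request


def fetch_audio(url, timeout=10):
    """Download `url` and return its bytes wrapped in a seekable file-like object."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return io.BytesIO(response.read())


# Hypothetical usage with a WAV, AIFF, or FLAC URL (requires SpeechRecognition):
#   import speech_recognition as sr
#   r = sr.Recognizer()
#   with sr.AudioFile(fetch_audio("https://example.com/clip.wav")) as source:
#       audio = r.record(source)
```

Keeping the download step separate from recognition makes it easy to insert a conversion step between the two when the source is not in a supported format.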

    System information


    My system is Ubuntu 22.04 LTS x64.

    My Python version is 3.10.6.

    My Pip version is 22.0.2.

    My SpeechRecognition library version is 3.9.0.

    My PyAudio library version is 0.2.13.

    My microphones are (partial output): 'HDA Intel PCH: ALC257 Analog (hw:0,0)', 'HDA Intel PCH: HDMI 0 (hw:0,3)', 'HDA Intel PCH: HDMI 1 (hw:0,7)', 'HDA Intel PCH: HDMI 2 (hw:0,8)', 'HDA Intel PCH: HDMI 3 (hw:0,9)', 'HDA Intel PCH: HDMI 4 (hw:0,10)', 'HDA Intel PCH: HDMI 5 (hw:0,11)', 'HDA Intel PCH: HDMI 6 (hw:0,12)', 'HDA Intel PCH: HDMI 7 (hw:0,13)', 'HDA Intel PCH: HDMI 8 (hw:0,14)', 'HDA Intel PCH: HDMI 9 (hw:0,15)', 'HDA Intel PCH: HDMI 10 (hw:0,16)', 'HDA Intel PCH: HDMI 11 (hw:0,17)', 'sysdefault', 'front', 'surround40', 'surround51', 'surround71', 'hdmi', 'samplerate', 'speexrate', 'pulse', 'upmix', 'vdownmix', 'dmix', 'default']

    My working microphones are: {0: 'HDA Intel PCH: ALC257 Analog (hw:0,0)', 13: 'sysdefault', 19: 'samplerate', 20: 'speexrate', 21: 'pulse', 25: 'default'}

    I installed PocketSphinx from

    opened by frankvp11 0
  • Loud audio

    Loud audio "noise spike" at the end of an audio play in a python script - driving me crazy :)

    I am getting a quite loud audio spike / noise sound at the end of an audio play in python when using the following script: https://gist.github.com/bbence84/04d1935a92a4b4ab3d44bf182ea4bcc1 Wav to be played, but it does it with any other wav or mp3 file: https://easyupload.io/vbat12

    It uses the speech_recognition package and the preferredsoundplayer package, but also the VLC python package shows the same when playing the audio.

    When changing the audio play to be non-blocking for the python script, the issue goes away... Very strange. 

    This is driving me nuts.

    I'm on Armbian 22.11.1 Jammy / 22.04.1 LTS, kernel 5.15.80-sunxi. Orange Pi Zero LTS.

    Thanks for any ideas!

    Bence

    opened by bbence84 0
  • recognize_google_cloud or recognize_google?


    Sorry, maybe it's just me, but even after reading a couple of issues and the docs, I am not sure what the differences, pros, and cons of these recognize methods are. For example, is recognize_google_cloud better / faster / more accurate than the "legacy" recognize_google? Which one should I use? Thanks! :)

    opened by bbence84 0
Releases(3.8.1)
  • 3.8.1(Dec 5, 2017)

    Lots of changes since June! Summary below. Get all of these and more with a quick pip install --upgrade SpeechRecognition.

    • Snowboy hotwords support for highly efficient, performant listening (thanks @beeedy!). This is implemented as the snowboy_configuration parameter of recognizer_instance.listen.
    • Configurable Pocketsphinx models - you can now specify your own acoustic parameters, language model, and phoneme dictionary, using the language parameter of recognizer_instance.recognize_sphinx (thanks @frawau!).
    • audio_data_instance.get_segment(start_ms=None, end_ms=None) is a new method that can be called on any AudioData instance to get a segment of the audio starting at start_ms and ending at end_ms. This is really useful when you want to get, say, only the first five seconds of some audio.
    • The stopper function returned by listen_in_background now accepts one parameter, wait_for_stop (defaulting to True for backwards compatibility), which determines whether the function will wait for the background thread to fully shut down before returning. One advantage is that if wait_for_stop is False, you can call the stopper function from any thread!
    • New example, demonstrating how to simultaneously listen to and recognize speech with the threaded producer/consumer pattern: threaded_workers.py.
    • Various improvements and bugfixes:
      • Python 3 style type annotations in library documentation.
      • recognize_google_cloud now uses the v1 rather than the beta API (thanks @oort7!).
      • recognize_google_cloud now returns timestamp info when the show_all parameter is True.
      • recognize_bing won't time out as often on credential requests, due to a longer default timeout.
      • recognize_google_cloud timeouts respect recognizer_instance.operation_timeout now (thanks @reefactor!).
      • Any recognizers using FLAC audio were broken inside Linux on Docker - this is now fixed (thanks @reefactor!).
      • Various documentation and lint fixes (thanks @josh-hernandez-exe!).
      • Lots of small build system improvements.
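To make the start_ms / end_ms semantics of get_segment concrete, here is a minimal standalone sketch of the millisecond-to-byte arithmetic on raw mono PCM data. The function name slice_pcm and its parameters are hypothetical and chosen for illustration; this is not the library's own code.

```python
def slice_pcm(frame_data, sample_rate, sample_width, start_ms=None, end_ms=None):
    """Return the bytes of mono PCM audio between start_ms and end_ms.

    Hypothetical helper for illustration only; not part of SpeechRecognition.
    """
    def to_byte_offset(ms):
        frames = ms * sample_rate // 1000  # whole frames elapsed at this time
        return frames * sample_width       # stay aligned to sample boundaries

    start = 0 if start_ms is None else to_byte_offset(start_ms)
    end = len(frame_data) if end_ms is None else to_byte_offset(end_ms)
    return frame_data[start:end]


# Ten seconds of 16 kHz, 16-bit silence; keep only the first five seconds.
silence = b"\x00\x00" * 16000 * 10
first_five = slice_pcm(silence, 16000, 2, end_ms=5000)
print(len(first_five))  # 160000 bytes = 5 s * 16000 frames/s * 2 bytes/frame
```

With the library itself, the equivalent call on an AudioData instance would be audio_data_instance.get_segment(end_ms=5000), per the release note above.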
  • 3.7.1(Jun 27, 2017)

    As usual, get it with pip install --upgrade SpeechRecognition

    • New grammar parameter for recognizer_instance.recognize_sphinx - now, you can specify a JSGF or FSG grammar to PocketSphinx (thanks @aleneum!).
    • Update PyAudio to version 0.2.11 - this fixes a couple of memory management issues users have been experiencing.
    • Update FLAC to 1.3.2 on all platforms - this will make it easier to support more audio formats in the near future.
    • Fixes for various APIs on Python 3.6+ - small changes in urllib.request behavior made requests fail in certain situations.
    • Fixes for Bing Speech API timing out due to some backwards incompatible changes to their API.
    • Restore original IBM audio segmentation behaviour - previously, it would stop recognizing after the first pause. Now, it will recognize all speech in the input audio, as it did before IBM's changes.
    • Fix links in PocketSphinx docs and library reference. Add-on language models now available from Google Drive, including the now-officially-supported Italian model.
    • New troubleshooting entries for JACK server in README.
    • Documentation and build process updates.
  • 3.6.5(Apr 13, 2017)

  • 3.6.4(Apr 13, 2017)

    Bugfix release!

    • Fix tempfile.NamedTemporaryFile on Windows, by replacing it with a PortableNamedTemporaryFile class. Previously, it didn't necessarily support the file being re-opened after it was originally opened.
    • Documentation/troubleshooting improvements (thanks @hassanmian!).
    • Add support for 24-bit FLAC audio files (thanks @sudevschiz!).
    • Fix phrase_time_limit being ignored for listen_in_background (thanks @dodysw!)
    • Added lots of new audio regression tests.
    • Code cleanup for tests and examples.
  • 3.6.3(Mar 11, 2017)

    Small bugfix release:

    • Handle the case when Google Speech Recognition (GSR) doesn't return a confidence value (thanks @jcsilva!).
    • Config, style, and release improvements.
    • Fix console window sometimes popping up when on Windows (thanks @Qdrew!)
    • Switch release over to universal Wheels rather than source distribution.
  • 3.6.0(Jan 7, 2017)

    This is more of a maintenance release, but a few features slipped in as well:

    • Support for the Google Cloud Speech API with recognizer_instance.recognize_google_cloud (thanks @Thynix!), plus documentation and examples.
    • Automatic sample rate detection in speech_recognition.Microphone - this should fully resolve all the "Invalid sample rate" issues from PyAudio.
    • Project now has automated tests and continuous integration with TravisCI. It's pretty nifty, and has already caught a few things during development!
    • Keywords example for recognizer_instance.recognize_sphinx.
    • Documentation improvements and updated advice in troubleshooting and library reference.
    • Bugfix - Google Speech Recognition sometimes didn't return the text with the highest confidence (thanks @akabraham!).
    • Bugfix - EOFError upon encountering malformed audio files; a proper exception message is now given.
    • Updated FLAC binaries for OS X.
    • Bugfix - invalid FLAC binary path on OS X (thanks @akabraham!).
    • Code cleanup.
  • 3.5.0(Nov 21, 2016)

    • Support for the Houndify API with recognizer_instance.recognize_houndify (thanks @tb0hdan!).
    • recognize_sphinx now supports keyword-based matching via the keywords=[("cat", 30), ("potato", 45)] parameter.
      • The second number in each pair is the sensitivity, which determines how loosely Sphinx will interpret speech to be those keywords - higher numbers mean more false positives, while lower numbers mean a lower detection rate.
      • A new example for keyword matching is now available.
    • BREAKING CHANGE: API.AI STT API IS BEING SHUT DOWN SOON. (source)
      • For now, the recognize_api function will keep working if you're on a paid API.AI plan, and we will not be removing it until the service is shut down entirely.
      • It is best to transition to another backend as soon as possible. I recommend Microsoft Bing Voice Recognition or Wit.ai for previous API.AI users.
    • phrase_time_limit option for listening functions, to limit phrase lengths to a certain number of seconds.
    • Support for operation timeouts with recognizer_instance.operation_timeout - this can be used to ensure long requests always take finite time.
    • recognize_ibm now opts out of request logging by default, for improved user privacy (thanks @michellemorales!). This is a breaking change if you previously relied on request logging behaviour.
    • Bugfix - listen() sometimes didn't terminate on finite-length streams.
    • Bugfix - Microsoft Bing Voice Recognition changed their authentication API endpoint, so that required some small code updates (thanks @tmator!).
    • Bugfix - 24-bit audio now works correctly on Python 2.
    • Update Wit.ai API version from deprecated version.
    • A bunch of documentation updates, fixes, and improvements.
  • 3.4.6(May 22, 2016)

    Bugfix release.

    Changes:

    • api.ai now requires the sessionId field, so we'll just add that in (thanks @jhoelzl!).
    • Improve documentation a bit.
    • Various other small fixes.
  • 3.4.5(May 11, 2016)

    Changes:

    • Bug fix: non-24-bit audio wasn't converted properly to 16-bit audio on Python 2, due to the new 24-bit audio shim. Thanks to @jhoelzl for reporting!
  • 3.4.4(May 10, 2016)

    Maintenance release:

    • Python versions less than 3.4 don't support 24-bit audio properly. We now have pure-Python shims that will allow 24-bit audio to work on those old Python versions, though they will be somewhat slower. Thanks to @danse for reporting the issue!
    • Added updated Pocketsphinx binaries and Pocketsphinx installation procedures to match improvements on their end.
    • Fix Unicode file paths on Windows.
    • Fix caching in recognizer_instance.recognize_bing.
    • We now use the Manylinux Docker image for building FLAC. Hopefully, this will make building universal Linux binaries easier for packagers.
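As an illustration of what a pure-Python 24-bit shim can look like, the sketch below widens little-endian signed 24-bit PCM to 32-bit PCM so that tools limited to 8/16/32-bit samples can process it. The function name widen_24_to_32 is hypothetical, and this is one possible approach under that assumption, not necessarily the library's own code.

```python
import struct


def widen_24_to_32(raw):
    """Convert little-endian signed 24-bit PCM to 32-bit PCM by padding each sample.

    Hypothetical helper for illustration only; not part of SpeechRecognition.
    """
    if len(raw) % 3 != 0:
        raise ValueError("24-bit PCM data must be a multiple of 3 bytes long")
    # Prepending a zero low-order byte multiplies each sample by 256 and keeps its sign.
    return b"".join(b"\x00" + raw[i:i + 3] for i in range(0, len(raw), 3))


# Two 24-bit samples: +1 and -1.
samples_24 = (1).to_bytes(3, "little", signed=True) + (-1).to_bytes(3, "little", signed=True)
print(struct.unpack("<2i", widen_24_to_32(samples_24)))  # (256, -256)
```

Because the loop runs in Python rather than C, a conversion like this is slower than native 24-bit support, which matches the "somewhat slower" caveat above.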
  • 3.4.3(Apr 9, 2016)

    Bugfix release:

    • Thanks to @jhoelzl, api.ai language support works again for non-English languages.

    We're now GPG signing all our release tags. Under the releases page, you should see the following:

    [Image: signature screenshot]

    This tells you that GitHub thinks the Git tag is the same as the one we intended to release.

    This key can also be found on the SKS keyservers, and you can import it with the following command:

    gpg --keyserver x-hkp://pool.sks-keyservers.net --recv-keys 0x5F56B350
    

    The packages on PyPI are signed as well - the signature can be downloaded under the "pgp" link on the SpeechRecognition PyPI page.

  • 3.4.2(Apr 4, 2016)

    Quick bugfix release on the heels of yesterday's big one:

    • Add support for the monotonic library on Python 2 - if you have monotonic installed in Python 2, recognize_bing will work faster!
      • On Python 3, recognize_bing already does the things that would make it fast, so the library is unnecessary.
    • Fix loading of non-16-bit AIFF files on Python 2.
    • Better document the Pocketsphinx language pack installation.
  • 3.4.1(Apr 3, 2016)

    Changes:

    • BREAKING CHANGE: AT&T STT API IS BEING SHUT DOWN SOON. (source)
      • For now, the recognize_att function will keep working, until the API itself is shut down.
      • It is best to transition over to IBM, Wit.ai, Google, CMU Sphinx, Bing Voice, or api.ai as soon as possible.
      • In most cases, you can simply rename recognize_att to a different service like recognize_ibm, then generate new API keys/tokens for it.
    • DEPRECATED CLASS: WavFile has been renamed to AudioFile.
      • WavFile will continue to work for the foreseeable future. New code should use AudioFile.
      • AudioFile is the same as WavFile, but in addition to WAV, it also supports AIFF and FLAC files!
    • New api.ai support, courtesy of @sbraden! See recognize_api in the library reference.
    • New Microsoft Bing Voice Recognition API support! See recognize_bing in the library reference.
    • Support for 8-bit unsigned WAV audio (thanks to @zhaoqf123 for reporting!).
    • Faster, upgraded FLAC binaries, with Linux binaries using Holy Build Box for maximum distro compatibility.
    • Updated setup process for Wit.ai.
    • Update phrase retrieval for recognize_ibm, courtesy of Bhavik Shah from IBM.
    • Documentation improvements and code cleanup.
    • Clearer licensing information - see the README.

    As always, you can upgrade with pip install --upgrade speechrecognition.

  • 3.3.3(Mar 5, 2016)

  • 3.3.2(Mar 4, 2016)

    Bugfix release!

    • Fix exception_on_overflow shenanigans. This version will eliminate those pesky ValueErrors.
    • The overflow error should well and truly be gone now.

    Special thanks to @michaelpri10 for reporting the exception_on_overflow bug.

  • 3.3.1(Feb 26, 2016)

  • 3.3.0(Feb 20, 2016)

    Major changes since 3.2.1:

    • Possible backward incompatibility: if PyAudio is not installed, Microphone now throws an AttributeError when created rather than not being defined.
      • This only requires changes if you are explicitly testing for the existence of the class, using something like hasattr or getattr.
    • More robust error handling - always clean up PyAudio resources, no matter what error conditions arise.
    • Better error checking - always verify PyAudio version.
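Code that needs to work on both sides of this change can check for both behaviours. The sketch below is a hypothetical helper (make_microphone is not a library function) that takes the imported module as an argument, so it also works with stand-in objects in tests; it assumes only what the note above states: older versions may not define Microphone at all, while 3.3.0 and later raise AttributeError on creation when PyAudio is missing.

```python
def make_microphone(sr_module):
    """Return a Microphone instance, or None if microphone input is unavailable.

    Hypothetical helper for illustration only; sr_module is the imported
    speech_recognition module (or any object with the same attribute).
    """
    # Before 3.3.0: the class might not be defined when PyAudio is absent.
    microphone_class = getattr(sr_module, "Microphone", None)
    if microphone_class is None:
        return None
    try:
        return microphone_class()
    except AttributeError:
        # 3.3.0 and later: raised at creation time when PyAudio is missing.
        return None
```

Typical usage would be to import speech_recognition as sr, call make_microphone(sr), and fall back to file input when the result is None.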
  • 3.2.1(Feb 20, 2016)

  • 3.2.0(Feb 20, 2016)

    Major changes since 3.1.3:

    • Support for recognition using CMU Sphinx - do speech recognition while offline!
      • English supported out of the box; French and Mandarin available for download. See the README for details.
    • Automatic sample rate/sample width conversions; users shouldn't have to worry about audio formats at all.
    • Lots of documentation improvements.
  • 3.1.3(Nov 5, 2015)

  • 3.1.2(Nov 2, 2015)

    Changes since 3.1.0:

    • Update documentation to account for new releases of Python, PyInstaller, and PyAudio.
      • The new PyAudio version fixes an obscure overflow issue and also makes installation much easier on all platforms.
      • New documentation for the updated PyAudio installation process.
    • General documentation improvements.
    • Handle errors better and add additional error checks.
  • 3.1.0(Nov 2, 2015)

  • 3.0.0(Sep 1, 2015)

    • MULTIPLE SERVICE SUPPORT! Now you can use Google Speech Recognition, Wit.ai, or IBM Speech to Text to obtain the recognition results.
    • Filtering for clicks and pops. This drastically reduces the number of false positives in phrase recognition.
    • Better usage examples - ready-to-run and better organized.

    The API has also changed somewhat. Here's a quick upgrade guide:

    • speech_recognition.Recognizer(language = "en-US", key = "AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw") changed to speech_recognition.Recognizer().
      • The key and language are now specified in the speech_recognition.recognize_* functions instead.
    • For recognizer_instance.listen, speech_recognition.WaitTimeoutError exceptions are thrown rather than OSError exceptions upon timeout.
    • The stop function returned by recognizer_instance.listen_in_background now blocks until the background listener actually stops before returning.
    • recognizer_instance.recognize(audio_data, show_all = False) has changed to recognizer_instance.recognize_google(audio_data, key = None, language = "en-US", show_all = False).
      • Additionally, when show_all is set, the return value is the raw result from the API call, rather than a list of predictions and their confidences.
      • Note that we specify the key and language here now rather than in the speech_recognition.Recognizer() constructor.
      • speech_recognition.UnknownValueError is now thrown instead of LookupError when speech is unintelligible, and speech_recognition.RequestError is now thrown instead of IndexError or KeyError when recognition fails.
    • Added recognizer_instance.recognize_wit and recognizer_instance.recognize_ibm for recognizing with Wit.ai or IBM Speech to Text.
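The blocking behaviour of the stop function mentioned in the upgrade guide can be pictured with a small standard-library sketch. The names run_in_background and poll are hypothetical, and this shows the general pattern (signal the thread, then join it), not the library's implementation.

```python
import threading
import time


def run_in_background(poll, interval=0.01):
    """Call poll() repeatedly on a background thread; return a function that stops it.

    Hypothetical helper for illustration only; not part of SpeechRecognition.
    """
    running = threading.Event()
    running.set()

    def loop():
        while running.is_set():
            poll()
            time.sleep(interval)

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()

    def stop():
        running.clear()  # ask the loop to finish
        worker.join()    # block until the thread has actually exited

    return stop


calls = []
stop = run_in_background(lambda: calls.append(1))
time.sleep(0.05)
stop()  # once this returns, poll() will not be called again
```

Because stop() joins the worker, the caller can safely release shared resources as soon as it returns.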

    To download, go to the PyPI page!

Owner
Anthony Zhang
Co-founder at Hypotenuse Labs. Formerly: CS @ uWaterloo