Python library for interactive topic model visualization. Port of the R LDAvis package.

Overview

pyLDAvis

Python library for interactive topic model visualization. This is a port of the fabulous R package by Carson Sievert and Kenny Shirley.

pyLDAvis is designed to help users interpret the topics in a topic model that has been fit to a corpus of text data. The package extracts information from a fitted LDA topic model to inform an interactive web-based visualization.

The visualization is intended to be used within an IPython notebook but can also be saved to a stand-alone HTML file for easy sharing.

Note: LDA stands for latent Dirichlet allocation.
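In practice, the information extracted from a fitted model boils down to the arguments of pyLDAvis.prepare (topic_term_dists, doc_topic_dists, doc_lengths, vocab, term_frequency). These are:

- a topic-term distribution matrix (K topics × V terms);
- a document-topic distribution matrix (D documents × K topics);
- the length of each document;
- the vocabulary;
- the corpus-wide frequency of each term.

The plain-Python sketch below checks that such inputs are mutually consistent, with every distribution row summing to 1, before they are handed over. check_prepare_inputs is a hypothetical helper, not part of pyLDAvis, and the 1e-6 tolerance is an arbitrary choice.

```python
import math
from typing import List, Sequence


def check_prepare_inputs(
    topic_term_dists: Sequence[Sequence[float]],   # K rows of V term probabilities
    doc_topic_dists: Sequence[Sequence[float]],    # D rows of K topic weights
    doc_lengths: Sequence[int],                    # D token counts
    vocab: Sequence[str],                          # V terms
    term_frequency: Sequence[int],                 # V corpus-wide counts
    tol: float = 1e-6,
) -> List[str]:
    """Return the problems found; an empty list means the inputs look consistent."""
    problems = []
    n_topics, n_terms, n_docs = len(topic_term_dists), len(vocab), len(doc_topic_dists)
    if len(term_frequency) != n_terms:
        problems.append("term_frequency length != len(vocab)")
    if len(doc_lengths) != n_docs:
        problems.append("doc_lengths length != number of documents")
    for k, row in enumerate(topic_term_dists):
        if len(row) != n_terms:
            problems.append(f"topic {k}: expected {n_terms} term probabilities")
        elif not math.isclose(sum(row), 1.0, abs_tol=tol):   # a NaN sum also fails here
            problems.append(f"topic {k}: probabilities do not sum to 1 (or contain NaN)")
    for d, row in enumerate(doc_topic_dists):
        if len(row) != n_topics:
            problems.append(f"document {d}: expected {n_topics} topic weights")
        elif not math.isclose(sum(row), 1.0, abs_tol=tol):
            problems.append(f"document {d}: weights do not sum to 1 (or contain NaN)")
    return problems


print(check_prepare_inputs(
    topic_term_dists=[[0.5, 0.5, 0.0], [0.1, 0.2, 0.7]],
    doc_topic_dists=[[0.9, 0.1], [float("nan"), 1.0]],
    doc_lengths=[10, 4],
    vocab=["a", "b", "c"],
    term_frequency=[5, 6, 3],
))  # ['document 1: weights do not sum to 1 (or contain NaN)']
```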

Installation

  • Stable version using pip:

    pip install pyldavis

  • Development version on GitHub: clone the repository and run python setup.py install from its root directory.

Usage

The best way to learn how to use pyLDAvis is to see it in action. Check out this notebook for an overview. Refer to the documentation for details.

For a concise explanation of the visualization see this vignette from the LDAvis R package.

Video demos

Ben Mabey walked through the visualization in this short talk using a Hacker News corpus:

Carson Sievert created a video demoing the R package. The visualization is the same, so it applies equally to pyLDAvis:

More documentation

To read about the methodology behind pyLDAvis, see the original paper, which was presented at the 2014 ACL Workshop on Interactive Language Learning, Visualization, and Interfaces in Baltimore on June 27, 2014.
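The paper ranks the terms shown for a topic by their relevance. Relevance weighs a term's probability within topic k, φ_kw, against its lift φ_kw / p_w, where p_w is the term's marginal probability in the corpus:

r(w, k | λ) = λ log(φ_kw) + (1 - λ) log(φ_kw / p_w), with 0 ≤ λ ≤ 1 set by a slider in the visualization.

The sketch below computes this score on made-up numbers. The function names are illustrative, not pyLDAvis's API.

```python
import math
from typing import Dict, List, Tuple


def relevance(phi_kw: float, p_w: float, lam: float) -> float:
    """r(w, k | lam) = lam * log(phi_kw) + (1 - lam) * log(phi_kw / p_w)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if phi_kw == 0.0:
        # log(0) is undefined (math.log raises ValueError); rank the term last
        return float("-inf")
    return lam * math.log(phi_kw) + (1.0 - lam) * math.log(phi_kw / p_w)


def top_terms(topic: Dict[str, float], marginal: Dict[str, float],
              lam: float, n: int) -> List[Tuple[str, float]]:
    """Rank a topic's terms by relevance and keep the n best."""
    scored = [(w, relevance(phi, marginal[w], lam)) for w, phi in topic.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy numbers: "the" is common everywhere, "goal" is specific to this topic
topic = {"the": 0.30, "goal": 0.20, "team": 0.10}      # phi_kw for one topic
marginal = {"the": 0.30, "goal": 0.02, "team": 0.05}   # p_w over the corpus
print([w for w, _ in top_terms(topic, marginal, lam=1.0, n=3)])  # ['the', 'goal', 'team']
print([w for w, _ in top_terms(topic, marginal, lam=0.0, n=3)])  # ['goal', 'team', 'the']
```

With λ = 1 the ranking uses within-topic probability alone, which favours words that are common everywhere. With λ = 0 it uses lift alone, which favours terms specific to the topic. The slider lets readers move between these two extremes.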

Comments
  • Support of the Hierarchical Dirichlet Process from Gensim.

    Hi,

    I was playing with HDP models and wanted to visualise them with pyLDAvis. Unfortunately, this wasn't natively supported, so I made a few fixes to support it.

    I am simply looking for the attributes lda_alpha and lda_beta, as they are specific to the HDP model. I also changed the __num_dist_rows__ function because, even when the matrix was normalized, doc_topic_dists made the assertions fail (maybe NaN values?). I didn't look into it too much, but this fix works.

    I am not sure whether the lda_beta parameter is exactly the same as the state.get_lambda() parameter, but it needed the topic-term distribution, so I thought it was OK.

    I am using it and it's working. Let me know if the PR looks right to you! :)

    opened by bloody76 29
  • pyLDAvis.display() doesn't show anything

    I run my commands in the IPython notebook. When I use the command pyLDAvis.display(LDAvis_prepared), there's only a red Out[] and nothing is shown!

    Then, when I use the command pyLDAvis.show(LDAvis_prepared), it shows the result, but it advises me to use pyLDAvis.display() instead.

    opened by SiriusHsh 22
  • KeyError in gensim.prepare

    Hi there, I'm using gensim to do LDA on a collection of novels (using just 40 for testing; I have several hundred). Building the corpus and dictionary seems to work fine, as does the modeling process itself. I can also inspect the resulting model (topics in documents and words in topics, for example). However, when attempting to use pyLDAvis, I run into a KeyError.

    I'm on Linux (Ubuntu 14.04) and using Python 3.4 and the following versions of relevant modules: pyLDAvis 1.2.0 numpy 1.9.2 gensim 0.11.1-1

    This is my code (loading corpus, dictionary and model from previous step):

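    from gensim import corpora, models
    import pyLDAvis
    import pyLDAvis.gensim  # renamed pyLDAvis.gensim_models in later pyLDAvis releases

    def gensim_output(modelfile, corpusfile, dictionaryfile):
        """Display a gensim topic model with pyLDAvis."""
        # Load the files written by the earlier "gensim_modeling" step
        corpus = corpora.MmCorpus(corpusfile)
        dictionary = corpora.Dictionary.load(dictionaryfile)  # needed by pyLDAvis
        myldamodel = models.ldamodel.LdaModel.load(modelfile)

        # Interactive visualisation: display() returns an HTML object, so return it
        # and make the call the last expression of a notebook cell to render it
        vis = pyLDAvis.gensim.prepare(myldamodel, corpus, dictionary)
        return pyLDAvis.display(vis)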
    

    This is the output I get:

    Traceback (most recent call last):
    
      File "<ipython-input-79-940daa51d8a9>", line 1, in <module>
        runfile('/home/[PATH]/an5/mygensim.py', wdir='/home/christof/Dropbox/0-Analysen/2015/rp_Sydney/an5')
    
      File "/usr/lib/python3/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 586, in runfile
        execfile(filename, namespace)
    
      File "/usr/lib/python3/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 48, in execfile
        exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
    
      File "/home/[PATH]/an5/mygensim.py", line 84, in <module>
        main("./5_lemmata/*.txt", "gensim_corpus.dict", "gensim_corpus.mm", "gensim_modelfile.gensim")
    
      File "/home/[PATH]/an5/mygensim.py", line 82, in main
        gensim_output(modelfile, corpusfile, dictionaryfile)
    
      File "/home/[PATH]/an5/mygensim.py", line 75, in gensim_output
        vis = pyLDAvis.gensim.prepare(myldamodel, corpus, dictionary)
    
      File "/usr/local/lib/python3.4/dist-packages/pyLDAvis/gensim.py", line 61, in prepare
        return vis_prepare(**_extract_data(topic_model, corpus, dictionary))
    
      File "/usr/local/lib/python3.4/dist-packages/pyLDAvis/gensim.py", line 24, in _extract_data
        term_freqs = [term_freqs_dict[id] for id in xrange(N)]
    
      File "/usr/local/lib/python3.4/dist-packages/pyLDAvis/gensim.py", line 24, in <listcomp>
        term_freqs = [term_freqs_dict[id] for id in xrange(N)]
    
    KeyError: 6
    

    Not sure whether this is a bug or bad usage of the module. Any help would be very much appreciated.

    bug 
    opened by christofs 21
  • Gensim Prepare

    Preparing a gensim LDA model does not work for me (Linux, Python 3.4) because of the following error:

    pyLDAvis.gensim.prepare(lda, corpus, dictionary)

    TypeError                                 Traceback (most recent call last)
    in ()
    ----> 1 pyLDAvis.gensim.prepare(lda, corpus, dictionary)

    /home/methodds/anaconda3/lib/python3.4/site-packages/pyLDAvis/gensim.py in prepare(topic_model, corpus, dictionary, **kargs)
         64     http://nbviewer.ipython.org/github/bmabey/pyLDAvis/blob/master/notebooks/Gensim%20Newsgroup.ipynb
         65     """
    ---> 66     opts = fp.merge(_extract_data(topic_model, corpus, dictionary), kargs)
         67     return vis_prepare(**opts)

    /home/methodds/anaconda3/lib/python3.4/site-packages/pyLDAvis/gensim.py in _extract_data(topic_model, corpus, dictionary)
         30
         31     topics = topic_model.show_topics(formatted=False, num_words=len(vocab), num_topics=topic_model.num_topics)
    ---> 32     topics_df = pd.DataFrame([dict((y,x) for x, y in tuples) for tuples in topics])[vocab]
         33     topic_term_dists = topics_df.values
         34

    /home/methodds/anaconda3/lib/python3.4/site-packages/pyLDAvis/gensim.py in (.0)
         30
         31     topics = topic_model.show_topics(formatted=False, num_words=len(vocab), num_topics=topic_model.num_topics)
    ---> 32     topics_df = pd.DataFrame([dict((y,x) for x, y in tuples) for tuples in topics])[vocab]
         33     topic_term_dists = topics_df.values
         34

    /home/methodds/anaconda3/lib/python3.4/site-packages/pyLDAvis/gensim.py in (.0)
         30
         31     topics = topic_model.show_topics(formatted=False, num_words=len(vocab), num_topics=topic_model.num_topics)
    ---> 32     topics_df = pd.DataFrame([dict((y,x) for x, y in tuples) for tuples in topics])[vocab]
         33     topic_term_dists = topics_df.values
         34

    TypeError: 'int' object is not iterable

    Any idea what is going on here?

    opened by cschwem2er 16
  • Y tick labels not displaying in jupyter notebook for term frequency

    Hello,

    I am running into a visualization issue when running pyLDAvis.display() with any lda visualization from pyLDAvis.gensim.prepare().

    Here is an example output from running this notebook http://nbviewer.ipython.org/github/bmabey/pyLDAvis/blob/master/notebooks/Gensim%20Newsgroup.ipynb

    I have installed pyLDAvis 3.2.0 via pip. My OS is macOS Big Sur 11.1 and I am running this on Python 3.8.5.

    If I can provide any additional details to help please let me know!

    opened by azespinoza 15
  • RuntimeWarning: divide by zero encountered in log

    When passing a GSDMM short text clustering model to pyLDAvis for visualisation, I sometimes get 'divide by zero' warnings even though the visualisation is created successfully. How can these be resolved? Is it because of a small corpus? I usually build these models on around 100 documents containing 10-15 tokens each. Screenshot attached; I would appreciate help on this!

    I am using Python 3.7 on macOS Catalina 10.15.3.

    opened by kruttikanadig 13
  • Remove dependencies on scikit-bio

    I propose to remove the dependencies on scikit-bio.

    scikit-bio has recently undergone incompatible API changes, especially to the pcoa() function. In addition, related to issue 57, it is still incompatible with Windows machines.

    After going through the code, I see that only the pcoa() and DistanceMatrix() functions from the scikit-bio package are used. These can be reimplemented with scikit-learn functions alone. Given the maturity of scikit-learn, this seems like a good idea.

    I can try to implement these portions if necessary.

    opened by yxtay 12
  • support for manual topic tagging

    Enhancement for https://github.com/bmabey/pyLDAvis/issues/89

    Please review.

    Features:

    • Init Button: Initializes the topics with default values, e.g. topic-0
    • Load Button: Loads a JSON file containing an array of K strings
    • Save Button: Downloads the current topic labels (an array of K strings) as a JSON file.
    opened by Mageswaran1989 11
  • `pyLDAvis.gensim` needs to be imported explicitly

    I keep getting AttributeError when attempting to run:

    import pyLDAvis
    
    # ... creating LDA model, corpus, and dictionary in here
    pyLDAvis.gensim.prepare(ldamodel, corpus, dictionary)
    

    Here's the traceback:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-21-950ae09ed90b> in <module>()
    ----> 1 pyLDAvis.gensim.prepare(ldamodel, corpus, dictionary)
    
    AttributeError: module 'pyLDAvis' has no attribute 'gensim'
    

    However, the code runs fine when importing gensim explicitly:

    from pyLDAvis import gensim
    
    # ... same as above
    gensim.prepare(ldamodel, corpus, dictionary)
    

    I'm unsure why that would be the case. Is it possible that gensim needs to be added to __init__.py?

    opened by martin-martin 10
  • IPython is not visualizing the gensim model.

    First and foremost, I wanted to thank everyone for helping me get this far. I am able to generate a gensim model, run it in an IPython notebook, and see some results, but not the beautiful graphic we were all hoping for. I'm running WinPython 3.4 QT5 (the latest, I believe), and I installed both gensim and pyLDAvis today, so everything is fresh. Here is what my output looks like:

    In[9]: pyLDAvis.enable_notebook()
    In[10]: pyLDAvis.gensim.prepare(lda, corpus, dictionary)

    C:\WinPython\python-3.4.3.amd64\lib\site-packages\skbio\stats\ordination\_principal_coordinate_analysis.py:109: RuntimeWarning: The result contains negative eigenvalues. Please compare their magnitude with the magnitude of some of the largest positive eigenvalues. If the negative ones are smaller, it's probably safe to ignore them, but if they are large in magnitude, the results won't be useful. See the Notes section for more details. The smallest eigenvalue is -0.009952346420900118 and the largest is 0.034359155356682575.
      RuntimeWarning

    Out[10]: PreparedData(topic_coordinates=
                Freq  cluster  topics         x         y
    topic
    24     20.249823        1       1  0.055595  0.006318
    3      18.859849        1       2  0.003016  0.038028
    17     16.519686        1       3 -0.117297  0.009020
    18      9.578099        1       4 -0.014738  0.006581
    ....
    13      0.000586        1      24  0.003268  0.007722
    9       0.000586        1      25 -0.008638 -0.009829, topic_info=
          Category         Freq      Term  Total  loglift  logprob
    2300   Default  1280.000000    gladia   1280  30.0000  30.0000
    5920   Default   984.000000   giskard    984  29.0000  29.0000
    1512   Default   676.000000   amadiro    676  28.0000  28.0000
    ...
    2252   Topic25     0.000562  anacreon    117  -0.3992  -6.2565
    9440   Topic25     0.000626     madam    372  -1.5745  -6.2751

    [1929 rows x 6 columns], token_table=
          Topic      Freq       Term
    term
    3268      1  0.181818  abilities
    3268      2  0.318182  abilities
    ...
    1155     10  0.019608       york

    [1984 rows x 3 columns], R=30, lambda_step=0.01, plot_opts={'ylab': 'PC2', 'xlab': 'PC1'}, topic_order=[25, 4, 18, 19, 21, 11, 23, 17, 22, 8, 12, 24, 15, 13, 2, 7, 20, 9, 16, 3, 6, 5, 1, 14, 10])

    question wontfix 
    opened by Grinshpun 10
  • Issue #120: Updated ldavis.js to be compatible with latest d3.v5

    CHANGED scale.linear(...) TO scaleLinear(...)

    CHANGED .axis(...).scale(...).orient(...) TO axisLeft(...) (y), axisTop(...) (x), or axisBottom(...) (scaleX)

    CHANGED d3.scale.ordinal(...).domain(...).rangeRoundBands(...) TO d3.scaleBand().domain(...).rangeBands(...).padding(...)

    CHANGED y.rangeBand(...) TO y.bandwidth(...)

    REPLACED d3.v3.js with d3.v5.js

    ADDED: optional function parameters that let the user specify primary (color1) and secondary (color2) colors. (The defaults are the original colors.)

    This should also be compatible with d3.v4, but I haven't tested it yet.

    opened by dtemkin 9
  • fixes error of get_feature_names removal

    Error when using scikit-learn >= 1.2.0

    pyLDAvis.sklearn.prepare raises an error due to a missing method get_feature_names() for the vectorizer argument.

    AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'

    Taking the documentation of sklearn.feature_extraction.text.CountVectorizer as an example, this method was deprecated in the 1.0 docs and removed in the 1.2 docs. The same is true for TfidfVectorizer, the other vectorizer that can be used.

    The recommendation in those docs is to use get_feature_names_out() as a replacement.

    Instead of a list of feature names, it returns an ndarray of them. Since both are iterable, this makes no difference for this use case, where only an array-like is required.

    This fix would also be backwards compatible with at least scikit-learn 1.0.

    Tested in a fresh conda environment with Python 3.10.8; it gives the expected behaviour.

    opened by David-Moody 0
  • `tsne` won't work with `sklearn.prepare`

    When running the same steps with the same data on Colab, I get pretty good results with tsne, but locally (probably because of the Python version) I'm not able to run pyLDAvis.sklearn.prepare, as I get ValueError: perplexity must be less than n_samples.

    I know that Colab is running Python 3.7 and locally I have 3.10. I also know that both use pyLDAvis 3.3.1, so it's probably broken because of a scikit-learn update.

    I was able to get it to work by manually setting a perplexity value in the TSNE object initialization in pyLDAvis/_prepare.py, but that certainly isn't optimal.

    Error log
    Traceback (most recent call last):
      File "/home/isinyaaa/projects/foss-gpgpu-stack/analyze.py", line 418, in <module>
        main(args)
      File "/home/isinyaaa/projects/foss-gpgpu-stack/analyze.py", line 347, in main
        args.workers).run(vectorizer, processed_data)
      File "/home/isinyaaa/projects/foss-gpgpu-stack/analyze.py", line 128, in run
        self.save_result_as_html(model, data, vectorizer)
      File "/home/isinyaaa/projects/foss-gpgpu-stack/analyze.py", line 148, in save_result_as_html
        super().save_result_as_html(prepare, model, data, vectorizer, mds='tsne')
      File "/home/isinyaaa/projects/foss-gpgpu-stack/analyze.py", line 111, in save_result_as_html
        LDAvis_prepared = prepare(*args, **kwargs)
      File "/home/isinyaaa/.local/lib/python3.10/site-packages/pyLDAvis/sklearn.py", line 95, in prepare
        return pyLDAvis.prepare(**opts)
      File "/home/isinyaaa/.local/lib/python3.10/site-packages/pyLDAvis/_prepare.py", line 443, in prepare
        topic_coordinates = _topic_coordinates(mds, topic_term_dists, topic_proportion, start_index)
      File "/home/isinyaaa/.local/lib/python3.10/site-packages/pyLDAvis/_prepare.py", line 192, in _topic_coordinates
        mds_res = mds(topic_term_dists)
      File "/home/isinyaaa/.local/lib/python3.10/site-packages/pyLDAvis/_prepare.py", line 167, in js_TSNE
        return model.fit_transform(dist_matrix)
      File "/home/isinyaaa/.local/lib/python3.10/site-packages/sklearn/manifold/_t_sne.py", line 1122, in fit_transform
        self._check_params_vs_input(X)
      File "/home/isinyaaa/.local/lib/python3.10/site-packages/sklearn/manifold/_t_sne.py", line 793, in _check_params_vs_input
        raise ValueError("perplexity must be less than n_samples")
    ValueError: perplexity must be less than n_samples
    
    opened by isinyaaa 0
  • Bump joblib from 1.0.1 to 1.2.0

    Bumps joblib from 1.0.1 to 1.2.0.

    Changelog

    Sourced from joblib's changelog.

    Release 1.2.0

    • Fix a security issue where eval(pre_dispatch) could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327

    • Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide joblib/joblib#1256

    • Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263

    • Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with mmap_mode != None as the resulting numpy.memmap object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform specific assembly. joblib/joblib#1254

    • Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+.

    • Vendor loky 3.3.0 which fixes several bugs including:

      • robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);

      • avoiding leaking worker processes in case of nested loky parallel calls;

      • reliably spawning the correct number of reusable workers.

    Release 1.1.0

    • Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181

    • Fix joblib.Memory bug with the ignore parameter when the cached function is a decorated function.

    ... (truncated)

    Commits
    • 5991350 Release 1.2.0
    • 3fa2188 MAINT cleanup numpy warnings related to np.matrix in tests (#1340)
    • cea26ff CI test the future loky-3.3.0 branch (#1338)
    • 8aca6f4 MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)
    • 067ed4f XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)
    • ac4ebd5 MAINT add back pytest warnings plugin (#1337)
    • a23427d Test child raises parent exits cleanly more reliable on macos (#1335)
    • ac09691 [MAINT] various test updates (#1334)
    • 4a314b1 Vendor loky 3.2.0 (#1333)
    • bdf47e9 Make test_parallel_with_interactively_defined_functions_default_backend timeo...
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Video presentation not available anymore

    In pyLDAvis_overview.ipynb, where it says "from the original R project and this presentation (slides, video)", the link to the YouTube video doesn't work anymore :(

    Any backup?

    opened by raffaem 0
  • Fixing for small number of topics.

    When the number of topics is not greater than the default TSNE perplexity, an error is thrown. This change reduces the perplexity so that it is always smaller than the number of topics.

    opened by jdagdelen 1
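Several of the reports above come down to small mismatches between what pyLDAvis expected and what the installed gensim or scikit-learn provided. The stdlib-only helpers below are hypothetical sketches of defensive versions of those steps. They are not pyLDAvis code.

- term_frequencies gives 0 to dictionary ids that never occur in the corpus. Such ids are one plausible cause of the KeyError: 6 traceback.
- topic_word_probs accepts two layouts. The first is the list of (prob, word) pairs that the failing dict((y,x) for x, y in tuples) expression assumes. The second is a (topic_id, [(word, prob), ...]) tuple, which would make that expression raise 'int' object is not iterable.
- feature_names prefers get_feature_names_out() and falls back to get_feature_names() on older scikit-learn.
- safe_perplexity keeps the t-SNE perplexity below the number of topics, as the small-number-of-topics fix describes. Its 30.0 default mirrors scikit-learn's TSNE default.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def term_frequencies(corpus: Iterable[Sequence[Tuple[int, int]]], num_terms: int) -> List[int]:
    """Total count of each term id 0..num_terms-1 over a bag-of-words corpus.

    Ids that never occur in the corpus get 0 instead of raising KeyError.
    """
    counts: Counter = Counter()
    for doc in corpus:
        for term_id, count in doc:
            counts[term_id] += count
    return [counts[term_id] for term_id in range(num_terms)]  # Counter yields 0 for missing ids


def topic_word_probs(topic) -> Dict[str, float]:
    """word -> probability for one entry returned by show_topics(formatted=False).

    Accepts a list of (prob, word) pairs or a (topic_id, [(word, prob), ...]) tuple.
    """
    if isinstance(topic, tuple) and len(topic) == 2 and isinstance(topic[0], int):
        _, pairs = topic
        return {word: prob for word, prob in pairs}
    return {word: prob for prob, word in topic}


def feature_names(vectorizer) -> List[str]:
    """Vocabulary of a scikit-learn vectorizer before and after the 1.2 removal."""
    if hasattr(vectorizer, "get_feature_names_out"):   # scikit-learn >= 1.0
        return list(vectorizer.get_feature_names_out())
    return list(vectorizer.get_feature_names())         # older releases


def safe_perplexity(n_topics: int, default: float = 30.0) -> float:
    """A t-SNE perplexity that stays below the number of topics (samples)."""
    if n_topics < 2:
        raise ValueError("t-SNE needs at least two topics")
    return float(min(default, n_topics - 1))


print(term_frequencies([[(0, 2), (1, 1)], [(1, 3), (3, 1)]], 4))  # [2, 4, 0, 1]
print(safe_perplexity(5), safe_perplexity(100))                    # 4.0 30.0
```

For instance, sklearn.manifold.TSNE(n_components=2, perplexity=safe_perplexity(n_topics)) would avoid the perplexity ValueError reported above. Wiring such a projection into pyLDAvis assumes that the installed version accepts a custom function for the mds argument of prepare.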
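The manual topic tagging pull request above describes two buttons, Load and Save, that exchange a JSON array of K strings. It also describes an Init button that produces names such as topic-0. The buttons themselves live in the JavaScript front end, so the Python sketch below only illustrates that assumed file format and a validation step on load. The function names are hypothetical, and this is not the PR's code.

```python
import json
from typing import List


def default_labels(k: int) -> List[str]:
    """Default names in the style the PR describes: topic-0, topic-1, ..."""
    return [f"topic-{i}" for i in range(k)]


def save_labels(labels: List[str], path: str) -> None:
    """Write the labels as a JSON array of strings."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(labels, fh, ensure_ascii=False)


def load_labels(path: str, k: int) -> List[str]:
    """Read a JSON array and check that it holds exactly k strings."""
    with open(path, encoding="utf-8") as fh:
        labels = json.load(fh)
    if not (isinstance(labels, list) and len(labels) == k
            and all(isinstance(label, str) for label in labels)):
        raise ValueError(f"expected a JSON array of {k} strings")
    return labels


print(default_labels(3))  # ['topic-0', 'topic-1', 'topic-2']
```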
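The AttributeError in the `pyLDAvis.gensim` report above is standard Python import behaviour rather than a gensim problem. Running import pyLDAvis executes only the package's __init__.py. A submodule becomes an attribute of its package once something imports it (for example, import pyLDAvis.gensim), unless __init__.py imports it itself. The self-contained sketch below reproduces this with a throwaway package, demo_pkg (a hypothetical name), created in a temporary directory.

```python
import importlib
import os
import sys
import tempfile


def demo_submodule_import():
    """Return (attribute present after `import pkg`, attribute present after `import pkg.sub`)."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, "demo_pkg")
        os.makedirs(pkg_dir)
        # Empty __init__.py: importing the package does NOT import the submodule
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        with open(os.path.join(pkg_dir, "gensim_like.py"), "w") as fh:
            fh.write("def prepare():\n    return 'prepared'\n")

        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module("demo_pkg")              # like `import pyLDAvis`
            before = hasattr(pkg, "gensim_like")
            importlib.import_module("demo_pkg.gensim_like")        # like `import pyLDAvis.gensim`
            after = hasattr(pkg, "gensim_like")
        finally:
            sys.path.remove(root)
            for name in ("demo_pkg.gensim_like", "demo_pkg"):
                sys.modules.pop(name, None)
    return before, after


print(demo_submodule_import())  # (False, True)
```

The same mechanism explains why from pyLDAvis import gensim works: the from-import loads the submodule explicitly. It also binds the name gensim in the caller's namespace, shadowing the gensim library if that was imported under the same name.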
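The proposal to drop scikit-bio relies on pcoa() being straightforward to reimplement. Principal coordinate analysis (classical multidimensional scaling) double-centres the matrix of squared distances. It then keeps the leading eigenvectors, each scaled by the square root of its eigenvalue. The NumPy sketch below is a minimal, hypothetical version, not the code pyLDAvis adopted. Clipping negative eigenvalues to zero is one simple way to handle the negative-eigenvalue RuntimeWarning shown in the IPython report above; it discards that part of the spectrum rather than explaining it.

```python
import numpy as np


def pcoa(dist, n_components: int = 2) -> np.ndarray:
    """Principal coordinate analysis (classical MDS) of a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    b = -0.5 * centering @ (d ** 2) @ centering           # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)                   # symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]       # keep the largest ones
    kept = np.clip(eigvals[order], 0.0, None)              # negative eigenvalues -> zero-length axes
    return eigvecs[:, order] * np.sqrt(kept)


# Four corners of a 3 x 4 rectangle: PCoA should recover their pairwise distances
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = pcoa(dist)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.allclose(recovered, dist))  # True
```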
Releases (3.3.1)
Owner
Ben Mabey
A repository to run gpt-j-6b on low vram machines (4.2 gb minimum vram for 2000 token context, 3.5 gb for 1000 token context). Model loading takes 12gb free ram.

Basic-UI-for-GPT-J-6B-with-low-vram A repository to run GPT-J-6B on low vram systems by using both ram, vram and pinned memory. There seem to be some

90 Dec 25, 2022
Code examples for my Write Better Python Code series on YouTube.

Write Better Python Code This repository contains the code examples used in my Write Better Python Code series published on YouTube: https:/

858 Dec 29, 2022
customer care chatbot made with Rasa Open Source.

Customer Care Bot Customer care bot for ecomm company which can solve faq and chitchat with users, can contact directly to team. 🛠 Features Basic E-c

Dishant Gandhi 23 Oct 27, 2022
A deep learning-based translation library built on Huggingface transformers

DL Translate A deep learning-based translation library built on Huggingface transformers and Facebook's mBART-Large 💻 GitHub Repository 📚 Documentat

Xing Han Lu 244 Dec 30, 2022
A single model that parses Universal Dependencies across 75 languages.

A single model that parses Universal Dependencies across 75 languages. Given a sentence, jointly predicts part-of-speech tags, morphology tags, lemmas, and dependency trees.

Dan Kondratyuk 189 Nov 29, 2022
Neural network sequence labeling model

Sequence labeler This is a neural network sequence labeling system. Given a sequence of tokens, it will learn to assign labels to each token. Can be u

Marek Rei 250 Nov 03, 2022
The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021)

Language Models are Few-shot Multilingual Learners Paper This is the source code of the paper [Arxiv] [ACL Anthology]: This code has been written usin

Genta Indra Winata 45 Nov 21, 2022
Global Rhythm Style Transfer Without Text Transcriptions

Global Prosody Style Transfer Without Text Transcriptions This repository provides a PyTorch implementation of AutoPST, which enables unsupervised glo

Kaizhi Qian 193 Dec 30, 2022
This library is testing the ethics of language models by using natural adversarial texts.

prompt2slip This library is testing the ethics of language models by using natural adversarial texts. This tool allows for short and simple code and v

9 Dec 28, 2021
This chatbot is a project designed for Konya Bilim Merkezi Yen.

chatbot This chatbot is a project designed in 2021 for the Yeni Ufuklar Sergisi (New Horizons Exhibition) at Konya Bilim Merkezi (Konya Science Center). The chatbot is written in Python. Sö

Emre Özkul 1 Feb 23, 2022
PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

VAENAR-TTS - PyTorch Implementation PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

Keon Lee 67 Nov 14, 2022
lightweight, fast and robust columnar dataframe for data analytics with online update

streamdf Streamdf is a lightweight data frame library built on top of the dictionary of numpy array, developed for Kaggle's time-series code competiti

23 May 19, 2022
HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools

HuggingSound HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools. I have no intention of building a very complex tool here.

Jonatas Grosman 247 Dec 26, 2022
This repository describes our reproducible framework for assessing self-supervised representation learning from speech

LeBenchmark: a reproducible framework for assessing SSL from speech Self-Supervised Learning (SSL) using huge unlabeled data has been successfully exp

49 Aug 24, 2022
Ray-based parallel data preprocessing for NLP and ML.

Wrangl Ray-based parallel data preprocessing for NLP and ML. pip install wrangl # for latest pip install git+https://github.com/vzhong/wrangl See exa

Victor Zhong 33 Dec 27, 2022
Repository for fine-tuning Transformers 🤗 based seq2seq speech models in JAX/Flax.

Seq2Seq Speech in JAX A JAX/Flax repository for combining a pre-trained speech encoder model (e.g. Wav2Vec2, HuBERT, WavLM) with a pre-trained text de

Sanchit Gandhi 21 Dec 14, 2022
nlabel is a library for generating, storing and retrieving tagging information and embedding vectors from various nlp libraries through a unified interface.

nlabel is a library for generating, storing and retrieving tagging information and embedding vectors from various nlp libraries through a unified interface.

Bernhard Liebl 2 Jun 10, 2022
A text file containing 479k English words for all your dictionary/word-based projects e.g: auto-completion / autosuggestion

List Of English Words A text file containing over 466k English words. While searching for a list of english words (for an auto-complete tutorial) I fo

dwyl 8.5k Jan 03, 2023
A simple word search made in python

Word Search Puzzle A simple word search made in python Usage $ python3 main.py -h usage: main.py [-h] [-c] [-f FILE] Generates a word s

Magoninho 16 Mar 10, 2022