An Explainable Leaderboard for NLP

Overview

ExplainaBoard: An Explainable Leaderboard for NLP

Introduction | Website | Download | Backend | Paper | Video | Bib

Introduction

ExplainaBoard is an interpretable, interactive, and reliable leaderboard with several new features (F) compared with a generic leaderboard:

  • F1: Single-system Analysis: What is a system good or bad at?
  • F2: Pairwise Analysis: Where is one system better (worse) than another?
  • F3: Data Bias Analysis: What are the characteristics of different evaluated datasets?
  • F5: Common errors: What are the common mistakes that the top-5 systems make?
  • F6: Fine-grained errors: Where do errors occur?
  • F7: System Combination: Is there potential complementarity between different systems?

Website

We deploy ExplainaBoard as a Web toolkit, which includes 9 NLP tasks, 40 datasets and 300 systems. Detailed information is as follows.

Task

Task                      Sub-task          Dataset  Model  Attribute
Text Classification       Sentiment         8        40     2
                          Topics            4        18     2
                          Intention         1        3      2
Text-Span Classification  Aspect Sentiment  4        20     4
Text Pair Classification  NLI               2        6      7
Sequence Labeling         NER               3        74     9
                          POS               3        14     4
                          Chunking          3        14     9
                          CWS               7        64     7
Structure Prediction      Semantic Parsing  4        12     4
Text Generation           Summarization     2        36     7

Download System Outputs

We haven't released datasets or corresponding system outputs that require licenses. If you have the licenses, please fill in this form and we will send them to you privately. (A description of the output format can be found here.) If these system outputs are useful for you, please cite our work.

Test Your Results

pip install -r requirements.txt

Description of Each Directory

  • task-[task_name]: fine-grained analysis for each task, aiming to generate fine-grained analysis results in JSON format. For example, task-mlqa can calculate fine-grained F1 scores for different systems and output the corresponding JSON files in task-mlqa/output/.

  • meta-eval acts as a controller: it can be used to start the fine-grained analysis of all tasks and to analyze the output JSON files.

    • calculate fine-grained results for all tasks: ./meta-eval/run-allTasks.sh
        cd ./meta-eval/
        ./run-allTasks.sh
    • merge the JSON files of all tasks into one CSV file, which is useful for a later SQL import (a sketch of this merge step follows this list): ./meta-eval/genCSV/json2csv.py
        cd ./meta-eval/genCSV/
        python json2csv.py > explainaboard.csv
  • src stores some auxiliary code.
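
A minimal sketch of the json2csv merge step, assuming each per-task output directory contains flat JSON records (the real json2csv.py may use different paths and field names):

    # Hypothetical sketch: flatten all per-task JSON outputs into a single CSV on stdout.
    import csv
    import glob
    import json
    import sys

    rows = []
    for path in glob.glob("../task-*/output/*.json"):
        with open(path) as f:
            record = json.load(f)      # assumes one flat dict of results per file
        record["source_file"] = path   # keep track of which task produced the row
        rows.append(record)

    fieldnames = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(sys.stdout, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)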

Submit Your Results

You can submit your system's output via this form, following the format description.

Acknowledgement

We thank all authors who shared their system outputs with us: Ikuya Yamada, Stefan Schweter, Colin Raffel, Yang Liu, and Li Dong. We also thank Vijay Viswanathan, Yiran Chen, and Hiroaki Hayashi for useful discussions and feedback about ExplainaBoard.

Comments
  • Is the current applicable condition of t-test correct?

    opened by tetsuok 22
  • Allowed specification of the metric #dimensions

    This PR loosens the restriction that sufficient statistics must be a vector, and allows them to be a tensor whose number of dimensions is given by Metric.stats_ndim().

    It also demonstrates how this works on the NLGMetaEvaluation metric.
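
    A rough numpy illustration of the idea (hypothetical shapes, not the actual Metric API): aggregation over examples works the same whether the per-example sufficient statistics are vectors or higher-dimensional tensors.

    import numpy as np

    # Accuracy-style metric: one scalar statistic per example -> 2-D batch of stats.
    acc_stats = np.array([[1.0], [0.0], [1.0]])               # shape (n_examples, 1)

    # Meta-evaluation-style metric: a matrix of statistics per example -> 3-D batch.
    meta_stats = np.arange(24, dtype=float).reshape(3, 4, 2)  # shape (n_examples, 4, 2)

    def aggregate(stats):
        """Average over the example axis, whatever the per-example ndim is."""
        return stats.mean(axis=0)

    print(aggregate(acc_stats).shape)   # (1,)
    print(aggregate(meta_stats).shape)  # (4, 2)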

    @pfliu-nlp and @odashi : could you please check this PR as a potential solution to the discussion in https://github.com/neulab/ExplainaBoard/pull/527 ?

    (sorry, after sending the review request I made a change of naming from dim->ndim, which I think is more in line with the naming in numpy)

    opened by neubig 12
  • test_generate_system_analysis in integration_tests.summarization_test.SummarizationTest is too slow

    Commit 8c514c3d81a079d967d208f8bc330c2f202620bb (#437) increases the execution time of integration_tests.summarization_test.SummarizationTest. When I measured it on my GCP VM, the time of the test increased by 430 seconds (from 6 seconds to 436 seconds), which is too slow to run as an automated test in pull requests. Slow tests need to be removed or replaced with more focused, faster tests. In general, slow tests drain productivity: updating pull requests takes longer, developers tend to batch large commits into pull requests to work around slow CI times, and pull requests become expensive to review, which makes it harder to identify bugs or design flaws during code review.

    Repro steps

    rm -rf ~/.cache/explainaboard
    time python -m unittest -v integration_tests.summarization_test.SummarizationTest
    

    Output

    test_datalab_loader (integration_tests.summarization_test.SummarizationTest) ... skipped 'time consuming'
    test_default_features_dont_modify_condgen (integration_tests.summarization_test.SummarizationTest) ... ok
    test_generate_system_analysis (integration_tests.summarization_test.SummarizationTest) ... WARNING:datalabs.load:Couldn't find a directory or a dataset named 'cnn_dailymail' in this version. It was picked from the master branch on github instead.
    WARNING:datalabs.builder:No config specified, defaulting to: cnn_dailymail/3.0.0
    WARNING:datalabs.builder:Reusing dataset cnn_dailymail (/home/t/.cache/expressai/datalab/cnn_dailymail/3.0.0/3.0.0/6e2f5d689f0225c4f22eb78d11ba7a21399810c5cb853edafe39b1d006a1ff95)
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 287113/287113 [06:20<00:00, 755.03it/s]
    100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 287113/287113 [00:29<00:00, 9616.19it/s]
    INFO:explainaboard:caching stats for cnn_dailymail None
    calculating example-level features: 3it [00:00, 51.88it/s]
    calculating token-level features: 3it [00:00, 139.83it/s]
    /home/t/explainaboard-fork/explainaboard/metrics/metric.py:336: DeprecationWarning: Use of keyword argument `alpha` for method `interval` is deprecated. Use first positional argument or keyword argument `confidence` instead.
      return stats_t.interval(
    100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 349.50it/s]
    ok
    test_generate_system_human_eval (integration_tests.summarization_test.SummarizationTest) ... skipped 'Not yet fixed in v0.11'
    test_load_tsv (integration_tests.summarization_test.SummarizationTest) ... ok
    
    ----------------------------------------------------------------------
    Ran 5 tests in 438.659s
    
    OK (skipped=2)
    python -m unittest -v integration_tests.summarization_test.SummarizationTest  434.35s user 2.58s system 98% cpu 7:22.46 total
    
    opened by tetsuok 12
  • Use 'confidence' instead of deprecated 'alpha' for scipy.stats.t.interval

    Reducing heavy logging uncovered buried DeprecationWarnings in the tests. We get the following DeprecationWarning in the tests that invoke the scipy.stats.t.interval method:

    test_hits (explainaboard.tests.test_metric.TestMetric) ... /home/runner/work/ExplainaBoard/ExplainaBoard/explainaboard/metrics/metric.py:338: DeprecationWarning: Use of keyword argument `alpha` for method `interval` is deprecated. Use first positional argument or keyword argument `confidence` instead.
    

    This PR fixes the warning as the warning suggests.
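
    For reference, a sketch of the change (assuming SciPy >= 1.9, where the confidence keyword exists); the values here are illustrative, not the real call site:

    from scipy import stats

    # Deprecated spelling (old keyword):
    # low, high = stats.t.interval(alpha=0.95, df=9, loc=0.5, scale=0.1)

    # Preferred spelling:
    low, high = stats.t.interval(confidence=0.95, df=9, loc=0.5, scale=0.1)
    print(low, high)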

    opened by tetsuok 12
  • Cache pip dependencies to speed up CI

    This PR attempts to speed up both the unit-tests and integration-tests CI jobs. Every CI job spends about 2 minutes installing pip packages. That step accounts for about 90% of the total time of unit-tests and about 30% of the total time of integration-tests. The step can be skipped by creating virtual environments and caching the installed packages in those environments using actions/cache. Note that actions/[email protected] doesn't support caching installed packages; it only avoids re-downloading by caching packages downloaded from PyPI under ~/.cache/pip.

    Dependencies listed in setup.py are moved to requirements.txt. This is to generate lock files for every Python version from requirements.txt. The generated lock files are used as cache keys so that caches are properly invalidated when dependencies are updated. Unless dependencies change, every CI job should be reproducible (with respect to installing pip dependencies). Making the CI jobs reproducible and faster comes at the expense of periodically updating these lock files. Maintaining lock files for dependencies is pretty common in other programming languages such as JS and Rust. This update can be done by running cicd/gen_requirements_lock.sh.

    opened by tetsuok 12
  • Refactor/loaders

    1. Commit 1: refactored Loader.__init__()
    • made data a required argument
    • all loaders now call the __init__ method of the base loader
    2. Commit 2: implemented file-specific loaders to simplify the task-specific loaders
    • implements TSVFileLoader, JSONFileLoader, DatalabFileLoader and CoNLLFileLoader, which know how to load a certain type of file given the fields (a rough sketch of the idea follows this list)
    • refactored all the existing loaders to use these file-specific loaders instead
    • QAMultipleChoiceLoader and KgLinkTailPredictionLoader still use custom load() methods because they support user-defined features. The way they load these extra features is different, so I decided to leave them for now. It will be easy to incorporate user-defined features into the file loaders (we just need to update the fields based on self.user_defined_features_configs).
    • hellaswag is removed in https://github.com/neulab/ExplainaBoard/commit/4b93b9542b714754eb91d718cd82b98ab706d11c
    • This refactor makes it easier to do #141 in the future. We just need two sets of file loaders for each task-specific loader: one for the (input, reference_output) file and one for the predictions file.
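
    An illustrative sketch of the file-specific loader idea, with hypothetical, simplified interfaces (the real TSVFileLoader/JSONFileLoader differ):

    import csv
    import json
    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class Field:
        src_name: str     # column index (TSV) or key (JSON) in the raw file
        target_name: str  # canonical field name expected by the task loader

    class TSVFileLoader:
        def __init__(self, fields: List[Field]):
            self.fields = fields

        def load(self, path: str) -> List[Dict[str, str]]:
            with open(path, newline="") as f:
                reader = csv.reader(f, delimiter="\t")
                return [
                    {fld.target_name: row[int(fld.src_name)] for fld in self.fields}
                    for row in reader
                ]

    class JSONFileLoader:
        def __init__(self, fields: List[Field]):
            self.fields = fields

        def load(self, path: str) -> List[Dict[str, str]]:
            with open(path) as f:
                data = json.load(f)
            return [
                {fld.target_name: item[fld.src_name] for fld in self.fields}
                for item in data
            ]

    # A task-specific loader then only declares its fields and picks a file loader:
    text_classification_loader = TSVFileLoader(
        [Field("0", "text"), Field("1", "true_label"), Field("2", "predicted_label")]
    )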

    Please let me know what you think! Thanks!

    opened by lyuyangh 12
  • Potential issue with spearman R bootstrapping

    We observed the following test failure when integrating another PR:

    ======================================================================
    FAIL: test_sample_level_spearmanr_bootstrap (integration_tests.meta_eval_wmt_da_test.MetaEvalNLGCITest)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/runner/work/ExplainaBoard/ExplainaBoard/integration_tests/meta_eval_wmt_da_test.py", line 191, in test_sample_level_spearmanr_bootstrap
        self.assertAlmostEqual(ci[0], 0.6488, 2)
    AssertionError: 0.7325904563487001 != 0.6488 within 2 places (0.08379045634870008 difference)
    
    ----------------------------------------------------------------------
    

    We are not sure whether this is an issue with the test or with the underlying code, but as a temporary measure we reduced the sensitivity of the test. We should go back and check whether this is just due to bootstrapping variance or due to a bug in the test itself.

    opened by neubig 10
  • Implement CalibrationAnalysis

    Calibration measures whether a system's confidence is well correlated with whether the system actually got the answer right. It would be nice if we could do analyses related to calibration, such as calculating the expected calibration error: https://arxiv.org/abs/1706.04599

    I think this should probably be implemented as an additional variety of analysis, which would be simple and self-contained: https://github.com/neulab/ExplainaBoard/blob/main/explainaboard/analysis/analyses.py#L45
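
    A minimal sketch of expected calibration error under the standard equal-width binning definition (illustrative only, not ExplainaBoard's implementation):

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=10):
        """Weighted average gap between mean confidence and accuracy per bin."""
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if mask.any():
                gap = abs(correct[mask].mean() - confidences[mask].mean())
                ece += mask.mean() * gap
        return ece

    # Example: a slightly over-confident system.
    print(expected_calibration_error([0.9, 0.8, 0.95, 0.6], [1, 0, 1, 1]))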

    good first issue new-analysis 
    opened by neubig 10
  • Correct training set feature field names

    Previously, calculation of training set features would fail if the datalab dataset used unconventional column names.

    This PR does the following things:

    1. Adds an option to use Loader to load datasets without system outputs when output_data is set to None
    2. Changes _statistics_func to simply take in the samples and system info, and return the statistics (in contrast to previously using the datalab aggregating() functionality; sketched after this list)
    3. Loads the data used to calculate training features through Loader, so that the appropriate field mapping is performed
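
    A minimal sketch of the new-style statistics function from "2.", assuming samples are dicts with a "text" field (the real signature and field names may differ):

    from collections import Counter

    def statistics_func(samples, sys_info=None):
        """Compute training-set statistics directly from an iterable of samples."""
        vocab = Counter()
        lengths = []
        for sample in samples:
            tokens = sample["text"].split()
            vocab.update(tokens)
            lengths.append(len(tokens))
        return {
            "vocab": dict(vocab),
            "avg_length": sum(lengths) / max(len(lengths), 1),
        }

    print(statistics_func([{"text": "a b c"}, {"text": "a b"}]))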

    Fixes https://github.com/neulab/ExplainaBoard/issues/416

    Notably, @pfliu-nlp, "2." may require some discussion; here are the pros and cons of doing it this new way:

    Pros

    • It makes the statistics code self-contained, with no reliance on an external library. Honestly, even though I'm very familiar with ExplainaBoard, I was always a bit confused about what was actually going on here, because the aggregating() decorator was a bit mysterious to me.
    • statistics_func can now be called on any set of samples, so it could be called on a non-datalab dataset. This may be useful if we want to, for example, calculate training set features for custom datasets.

    Cons

    • The datalab aggregating operator may have implemented parallelism, so this aggregation of statistics might be faster there, but I'm not sure whether that's actually the case in practice.
    • Something else I'm missing?
    opened by neubig 9
  • Unsafe en_core_web_sm downloading in setup.py

    Currently setup.py executes an external command, python -m spacy download en_core_web_sm, to install a spaCy model during setup. This approach has several issues regarding system consistency:

    • spaCy models are intentionally not registered on PyPI, and PyPI does not allow libraries to depend on external requirements.
    • The command is just a system command, which may break the environment or simply not work correctly.

    Since there is no recommended way to add spaCy models to install_requires, we need to take one of the following approaches:

    • Download the model programmatically when spacy.load() fails (see the sketch after this list).
    • Bundle the model file into this repository.
    • Ask users to download the appropriate models separately.
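
    A minimal sketch of the first option, using the standard spacy.cli.download helper (illustrative, not necessarily how ExplainaBoard would implement it):

    import spacy

    def load_spacy_model(name: str = "en_core_web_sm"):
        """Load a spaCy model, downloading it on first use instead of in setup.py."""
        try:
            return spacy.load(name)
        except OSError:
            from spacy.cli import download
            download(name)
            return spacy.load(name)
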
    opened by odashi 9
  • How to name metrics when registering them

    There are two ways to name metrics:

    (1)

    
    @dataclass
    @metric_config_registry.register("AccuracyConfig")
    class AccuracyConfig(MetricConfig):
        def to_metric(self):
            return Accuracy(self)
    
    

    (2)

    @dataclass
    @metric_config_registry.register("Accuracy")
    class AccuracyConfig(MetricConfig):
        def to_metric(self):
            return Accuracy(self)
    
    

    Currently, we are using (1), which, however, is inconsistent with how the Processor names them. For example:

    https://github.com/neulab/ExplainaBoard/blob/cd54c1b61e490295db8c1cfee8460aff4cce1880/explainaboard/processors/text_classification.py#L132

    Which one do you prefer?

    If we go with (2), this code should be modified to avoid a naming bug: https://github.com/neulab/ExplainaBoard/blob/cd54c1b61e490295db8c1cfee8460aff4cce1880/explainaboard/metrics/registry.py#L11

    config_cls = metric_config_registry.get_type(dikt["name"]) # instead of type
    

    I could send a PR of this.

    opened by pfliu-nlp 8
  • add tests for meval to replicate paper results

    Overview

    This PR adds tests to verify whether our implemented meta-evaluation processor is able to replicate reported results from existing published papers.

    Relevant issue: https://github.com/inspired-co/taskboard/issues/180

    Details

    • Collect system outputs for two metrics (rouge1 and bartscore) from this repo
    • Use ExplainaBoard to process these outputs and compare the results with the ones reported in the above repo.

    References

    • Paper: BARTScore: Evaluating Generated Text as Text Generation
    • Code: https://github.com/neulab/BARTScore
    opened by pfliu-nlp 0
  • `TypeError: 'type' object is not subscriptable` when attempting to import or use the CLI

    How did I install it?

    pip install explainaboard
    or
    pip install -U --force-reinstall explainaboard
    

    Both cause the same problem.

    Version: 0.12.3

    When I try to import explainaboard or run explainaboard from the CLI, I get the same error:

    Python 3.8.15 (default, Nov 24 2022, 15:19:38) 
    [GCC 11.2.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import explainaboard
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/__init__.py", line 6, in <module>
        from explainaboard.loaders import DatalabLoaderOption, get_loader_class
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/loaders/__init__.py", line 5, in <module>
        from explainaboard.loaders import file_loader, loader_factory
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/loaders/file_loader.py", line 18, in <module>
        from explainaboard.analysis.analyses import Analysis
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/analysis/analyses.py", line 14, in <module>
        from explainaboard.analysis.bucketing import get_bucketing_method
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/analysis/bucketing.py", line 13, in <module>
        from explainaboard.serialization.types import SerializableData
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/serialization/__init__.py", line 8, in <module>
        from explainaboard.serialization.types import Serializable
      File "/home/cpu12595/miniconda3/envs/nlppytorch/lib/python3.8/site-packages/explainaboard/serialization/types.py", line 21, in <module>
        list["PrimitiveData"],  # type: ignore
    TypeError: 'type' object is not subscriptable
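
    For context, this error typically comes from subscripting the built-in list at runtime, which Python 3.8 does not support; a sketch of a 3.8-compatible spelling, assuming a recursive alias along the lines of the traceback (not the library's actual definition):

    from typing import Dict, List, Tuple, Union

    # list["PrimitiveData"]  # raises TypeError on Python 3.8
    PrimitiveData = Union[
        None, bool, int, float, str,
        List["PrimitiveData"],       # typing.List accepts the same forward reference
        Tuple["PrimitiveData", ...],
        Dict[str, "PrimitiveData"],
    ]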
    
    
    opened by ttpro1995 0
  • Bump mypy version to 0.990

    Since mypy 0.990 was released yesterday (blog post), it would be good to bump the mypy version to 0.990 to take advantage of the new features and bug fixes. Running mypy 0.990 on the explainaboard codebase shows that some effort will be needed to adopt the new version. Below is the output of pre-commit run mypy --color=never --all-files:

    mypy.....................................................................Failed
    - hook id: mypy
    - exit code: 1
    
    explainaboard/utils/spacy_loader.py:5: error: Cannot find implementation or library stub for module named "spacy"  [import]
    explainaboard/utils/spacy_loader.py:6: error: Cannot find implementation or library stub for module named "spacy.language"  [import]
    explainaboard/utils/agreement.py:5: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/sum_attribute.py:8: error: Cannot find implementation or library stub for module named "nltk"  [import]
    explainaboard/analysis/sum_attribute.py:10: error: Cannot find implementation or library stub for module named "nltk.util"  [import]
    explainaboard/utils/async_eaas.py:10: error: Cannot find implementation or library stub for module named "eaas"  [import]
    explainaboard/third_party/text_to_sql_test_suit_eval/parse.py:7: error: Cannot find implementation or library stub for module named "sqlparse"  [import]
    explainaboard/third_party/text_to_sql_test_suit_eval/parse.py:8: error: Cannot find implementation or library stub for module named "sqlparse.sql"  [import]
    explainaboard/third_party/text_to_sql_test_suit_eval/parse.py:9: error: Cannot find implementation or library stub for module named "sqlparse.tokens"  [import]
    setup.py:3: error: Skipping analyzing "setuptools": module is installed, but missing library stubs or py.typed marker  [import]
    explainaboard/metrics/auxiliary/qa_table_text_hybrid_auxiliary.py:16: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/auxiliary/qa_table_text_hybrid_auxiliary.py:17: error: Cannot find implementation or library stub for module named "scipy.optimize"  [import]
    explainaboard/utils/logging.py:9: error: Library stubs not installed for "tqdm"  [import]
    explainaboard/utils/logging.py:9: note: Hint: "python3 -m pip install types-tqdm"
    explainaboard/utils/logging.py:9: note: (or run "mypy --install-types" to install all missing stub packages)
    explainaboard/utils/logging.py:16: error: Incompatible default for argument "desc" (default has type "None", argument has type "str")  [assignment]
    explainaboard/utils/logging.py:16: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
    explainaboard/utils/logging.py:16: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
    explainaboard/visualizers/bar_chart.py:8: error: Cannot find implementation or library stub for module named "matplotlib"  [import]
    explainaboard/visualizers/bar_chart.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/bucketing.py:10: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/feature.py:239: error: Incompatible types in assignment (expression has type "Dict[str, FeatureType]", target has type "SerializableData")  [assignment]
    explainaboard/utils/agreement_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/utils/typing_utils_test.py:10: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/serialization/serializers.py:53: error: Incompatible return value type (got "Union[List[Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]], Tuple[Union[None, bool, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable], ...]]", expected "Union[None, bool, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]")  [return-value]
    explainaboard/serialization/serializers.py:53: error: Generator has incompatible item type "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"; expected "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [misc]
    explainaboard/serialization/serializers.py:89: error: Incompatible return value type (got "Union[List[Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]], Tuple[Union[None, bool, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]], ...]]", expected "Union[None, bool, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]")  [return-value]
    explainaboard/serialization/serializers.py:89: error: Generator has incompatible item type "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [misc]
    explainaboard/utils/tensor_analysis.py:12: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/metric.py:10: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/metric.py:11: error: Cannot find implementation or library stub for module named "scipy.stats"  [import]
    explainaboard/metrics/metric.py:178: error: Dict entry 0 has incompatible type "str": "Dict[str, MetricValue]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/metrics/metric.py:196: error: Argument 1 to "MetricResult" has incompatible type "Dict[str, Union[None, bool, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]]"; expected "Dict[str, MetricValue]"  [arg-type]
    explainaboard/third_party/text_to_sql_test_suit_eval/process_sql.py:30: error: Cannot find implementation or library stub for module named "nltk"  [import]
    explainaboard/utils/tokenizer.py:15: error: Cannot find implementation or library stub for module named "sacrebleu.tokenizers"  [import]
    explainaboard/utils/tokenizer.py:16: error: Cannot find implementation or library stub for module named "sacrebleu.tokenizers.tokenizer_intl"  [import]
    explainaboard/utils/tokenizer.py:17: error: Cannot find implementation or library stub for module named "sacrebleu.tokenizers.tokenizer_ja_mecab"  [import]
    explainaboard/utils/tokenizer.py:18: error: Cannot find implementation or library stub for module named "sacrebleu.tokenizers.tokenizer_zh"  [import]
    explainaboard/metrics/continuous.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/metric_test.py:10: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/external_eval.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/meta_evaluation.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/meta_evaluation.py:9: error: Cannot find implementation or library stub for module named "scipy"  [import]
    explainaboard/analysis/feature_test.py:69: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/feature_test.py:134: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/feature_test.py:205: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:230: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:231: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:232: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:233: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Collection[str]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:234: error: List item 0 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:234: error: List item 1 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:234: error: List item 2 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:235: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Tuple[Dict[str, object], Dict[str, object], Dict[str, object]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:237: error: Dict entry 0 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:237: error: Dict entry 1 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:237: error: Dict entry 2 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:240: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:241: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:242: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:243: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Collection[str]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:244: error: List item 0 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:244: error: List item 1 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:244: error: List item 2 has incompatible type "Dict[str, object]"; expected "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [list-item]
    explainaboard/serialization/serializers_test.py:245: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Tuple[Dict[str, object], Dict[str, object], Dict[str, object]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/serialization/serializers_test.py:247: error: Dict entry 0 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:247: error: Dict entry 1 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/serialization/serializers_test.py:247: error: Dict entry 2 has incompatible type "str": "Dict[str, object]"; expected "str": "Union[None, int, float, str, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]"  [dict-item]
    explainaboard/metrics/eaas.py:9: error: Cannot find implementation or library stub for module named "eaas.async_client"  [import]
    explainaboard/metrics/eaas.py:10: error: Cannot find implementation or library stub for module named "eaas.config"  [import]
    explainaboard/metrics/eaas.py:11: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/eaas.py:12: error: Cannot find implementation or library stub for module named "sacrebleu"  [import]
    explainaboard/metrics/eaas.py:13: error: Cannot find implementation or library stub for module named "sacrebleu.metrics.base"  [import]
    explainaboard/metrics/eaas.py:13: error: Cannot find implementation or library stub for module named "sacrebleu.metrics"  [import]
    explainaboard/metrics/ranking.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/performance.py:51: error: Dict entry 1 has incompatible type "str": "List[int]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/performance.py:52: error: Dict entry 2 has incompatible type "str": "Dict[str, MetricResult]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/performance.py:72: error: Argument 1 to "float" has incompatible type "Union[str, None, int, float, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"; expected "Union[SupportsFloat, SupportsIndex, str, bytes, bytearray, memoryview, array[Any], mmap, _CData, PickleBuffer]"  [arg-type]
    explainaboard/analysis/performance.py:73: error: Argument 1 to "float" has incompatible type "Union[str, None, int, float, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"; expected "Union[SupportsFloat, SupportsIndex, str, bytes, bytearray, memoryview, array[Any], mmap, _CData, PickleBuffer]"  [arg-type]
    explainaboard/metrics/log_prob.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/accuracy.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/external_eval_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/performance_test.py:219: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/performance_test.py:241: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/metrics/qa_table_text_hybrid.py:10: error: Cannot find implementation or library stub for module named "numpy"  [import]
    integration_tests/meta_eval_nlg_test.py:5: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/accuracy_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/analyses.py:12: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/analyses.py:245: error: Dict entry 0 has incompatible type "str": "List[BucketPerformance]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:446: error: Dict entry 0 has incompatible type "str": "List[BucketPerformance]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:563: error: Argument "bucket_setting" to "__call__" of "BucketingFn" has incompatible type "List[Tuple[float, float]]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses.py:563: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses.py:563: note: Consider using "Sequence" instead, which is covariant
    explainaboard/analysis/analyses.py:658: error: Dict entry 2 has incompatible type "str": "List[int]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:722: error: Dict entry 1 has incompatible type "str": "List[ComboOccurence]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:841: error: Dict entry 1 has incompatible type "str": "Dict[str, FeatureType]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/analyses.py:842: error: Dict entry 2 has incompatible type "str": "Dict[str, MetricConfig]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/metrics/extractive_qa.py:11: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/analysis/analyses_test.py:90: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Collection[str]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:237: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[BucketPerformance]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:237: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:237: note: Consider using "Sequence" instead, which is covariant
    explainaboard/analysis/analyses_test.py:266: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:280: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:321: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[ComboOccurence]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:321: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:321: note: Consider using "Sequence" instead, which is covariant
    explainaboard/analysis/analyses_test.py:328: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Union[Sequence[str], None, int, float, List[PrimitiveData], Tuple[PrimitiveData, ...], Dict[str, PrimitiveData]]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:350: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:477: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[BucketPerformance]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:477: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:477: note: Consider using "Sequence" instead, which is covariant
    explainaboard/analysis/analyses_test.py:507: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, object]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/analyses_test.py:518: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "Dict[str, FeatureType]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:518: note: "Dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:518: note: Consider using "Mapping" instead, which is covariant in the value type
    explainaboard/analysis/analyses_test.py:519: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "Dict[str, MetricConfig]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/analyses_test.py:519: note: "Dict" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/analyses_test.py:519: note: Consider using "Mapping" instead, which is covariant in the value type
    explainaboard/analysis/result.py:33: error: Dict entry 0 has incompatible type "str": "Dict[str, Dict[str, MetricResult]]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/analysis/result.py:34: error: Dict entry 1 has incompatible type "str": "List[AnalysisResult]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/loaders/file_loader.py:15: error: Cannot find implementation or library stub for module named "datalabs"  [import]
    explainaboard/loaders/file_loader.py:16: error: Cannot find implementation or library stub for module named "datalabs.features.features"  [import]
    explainaboard/loaders/file_loader.py:212: error: Incompatible default for argument "fields" (default has type "None", argument has type "List[FileLoaderField]")  [assignment]
    explainaboard/loaders/file_loader.py:212: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
    explainaboard/loaders/file_loader.py:212: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
    explainaboard/loaders/file_loader.py:475: error: Incompatible default for argument "fields" (default has type "None", argument has type "List[FileLoaderField]")  [assignment]
    explainaboard/loaders/file_loader.py:475: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
    explainaboard/loaders/file_loader.py:475: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
    explainaboard/loaders/file_loader.py:522: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/analysis/result_test.py:35: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Dict[str, MetricResult]]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/result_test.py:36: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[AnalysisResult]"; expected "SerializableData"  [arg-type]
    explainaboard/analysis/result_test.py:36: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/analysis/result_test.py:36: note: Consider using "Sequence" instead, which is covariant
    explainaboard/third_party/text_to_sql_test_suit_eval/exec_eval.py:11: error: Library stubs not installed for "tqdm"  [import]
    explainaboard/info.py:186: error: Dict entry 11 has incompatible type "str": "List[AnalysisLevel]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/info.py:187: error: Dict entry 12 has incompatible type "str": "List[Analysis]"; expected "str": "Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]"  [dict-item]
    explainaboard/info.py:260: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Union[None, int, float, str, List[SerializableData], Tuple[SerializableData, ...], Dict[str, SerializableData], Serializable]]"; expected "PrimitiveData"  [arg-type]
    explainaboard/analysis/feature_funcs.py:8: error: Cannot find implementation or library stub for module named "lexicalrichness"  [import]
    explainaboard/analysis/feature_funcs.py:8: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
    explainaboard/analysis/feature_funcs.py:9: error: Cannot find implementation or library stub for module named "sacrebleu"  [import]
    explainaboard/meta_analyses/ranking.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/meta_analyses/ranking.py:9: error: Cannot find implementation or library stub for module named "pandas"  [import]
    explainaboard/metrics/f1_score.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/processors/processor.py:9: error: Cannot find implementation or library stub for module named "eaas.async_client"  [import]
    explainaboard/processors/processor.py:10: error: Cannot find implementation or library stub for module named "eaas.config"  [import]
    explainaboard/processors/sequence_labeling.py:43: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/processors/argument_pair_extraction.py:34: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/processors/qa_tat.py:7: error: Cannot find implementation or library stub for module named "datalabs"  [import]
    explainaboard/processors/language_modeling.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/processors/conditional_generation.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/processors/cloze_generative.py:8: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/processors/summarization.py:8: error: Cannot find implementation or library stub for module named "datalabs.operations.featurize.plugins.summarization.sum_attribute"  [import]
    integration_tests/summarization_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    integration_tests/meta_eval_wmt_da_test.py:7: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/text_to_sql.py:11: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/metrics/f1_score_test.py:7: error: Cannot find implementation or library stub for module named "sklearn.metrics"  [import]
    explainaboard/visualizers/draw_charts.py:24: error: Cannot find implementation or library stub for module named "matplotlib"  [import]
    explainaboard/visualizers/draw_charts.py:25: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/info_test.py:116: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[AnalysisLevel]"; expected "SerializableData"  [arg-type]
    explainaboard/info_test.py:116: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/info_test.py:116: note: Consider using "Sequence" instead, which is covariant
    explainaboard/info_test.py:117: error: Argument 1 to "serialize" of "PrimitiveSerializer" has incompatible type "List[Analysis]"; expected "SerializableData"  [arg-type]
    explainaboard/info_test.py:117: note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
    explainaboard/info_test.py:117: note: Consider using "Sequence" instead, which is covariant
    explainaboard/info_test.py:160: error: Argument 1 to "deserialize" of "PrimitiveSerializer" has incompatible type "Dict[str, Union[Collection[str], None, int, float, List[PrimitiveData], Tuple[PrimitiveData, ...]]]"; expected "PrimitiveData"  [arg-type]
    integration_tests/metric_test.py:6: error: Cannot find implementation or library stub for module named "eaas"  [import]
    integration_tests/metric_test.py:7: error: Cannot find implementation or library stub for module named "eaas.async_client"  [import]
    integration_tests/metric_test.py:9: error: Cannot find implementation or library stub for module named "numpy"  [import]
    explainaboard/explainaboard_main.py:10: error: Cannot find implementation or library stub for module named "eaas.endpoint"  [import]
    explainaboard/explainaboard_main.py:10: error: Cannot find implementation or library stub for module named "eaas"  [import]
    explainaboard/explainaboard_main.py:89: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:90: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:91: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:92: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:93: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:94: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:364: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:365: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:367: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:368: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:369: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:370: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:371: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:390: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:401: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:402: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:403: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:404: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:405: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:406: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:407: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:408: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    explainaboard/explainaboard_main.py:499: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
    integration_tests/cli_test.py:10: error: Cannot find implementation or library stub for module named "datalabs"  [import]
    Found 141 errors in 59 files (checked 231 source files)
    
    opened by tetsuok 0
  • add_tasks.md is out of date

    It seems add_tasks.md is out of date. add_tasks.md mentions tasks.py in the three places below:

    • https://github.com/neulab/ExplainaBoard/blame/fcedd5d7aab172b943c6b0025685b09744f149fd/docs/add_new_tasks.md#L6
    • https://github.com/neulab/ExplainaBoard/blame/fcedd5d7aab172b943c6b0025685b09744f149fd/docs/add_new_tasks.md#L12
    • https://github.com/neulab/ExplainaBoard/blame/fcedd5d7aab172b943c6b0025685b09744f149fd/docs/add_new_tasks.md#L133

    but the Python script was removed in #373, so add_tasks.md needs to be updated accordingly.

    opened by tetsuok 0
  • Add system metadata class

    Processor.process() takes metadata, which is used to directly initialize SysOutputInfo. However, these are essentially different data (in particular, "metadata" $\subset$ SysOutputInfo, but not $=$), and the current implementation causes some confusion around this.

    The most significant abuse of this behavior is that FileLoaderMetadata is implicitly converted into SysOutputInfo. This shouldn't work without an explicit conversion: https://github.com/neulab/ExplainaBoard/blob/4cec0a01cbe2617e9a67a440be25ee4252f792b2/integration_tests/ner_test.py#L148-L154

    To this end, we need:

    • A struct defining the system metadata (a rough sketch follows this list).
    • Change the behavior of Processor to take the system metadata, not a dict.
    • Either:
      • A conversion method between system metadata and FileLoaderReturn/SysOutputInfo
      • Include system metadata as a direct member of FileLoaderReturn/SysOutputInfo
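
    A hypothetical sketch of such a struct (field names are illustrative, not an actual ExplainaBoard design):

    from dataclasses import dataclass, field
    from typing import Dict, List, Optional

    @dataclass
    class SystemMetadata:
        """User-supplied description of a system, kept separate from SysOutputInfo."""
        task_name: str
        system_name: Optional[str] = None
        dataset_name: Optional[str] = None
        sub_dataset_name: Optional[str] = None
        metric_names: List[str] = field(default_factory=list)

        def to_sys_output_info_kwargs(self) -> Dict[str, object]:
            # Explicit conversion point instead of implicitly reusing a raw dict.
            return {k: v for k, v in self.__dict__.items() if v is not None}
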
    opened by odashi 3
  • Reconsider default number of buckets

    Currently the default number of buckets is 4: https://github.com/neulab/ExplainaBoard/blob/38db95801cbd15e2e9b2db7b60c40bd7173e1deb/explainaboard/analysis/analyses.py#L117

    But this is probably too few when we're doing discrete bucketing. It would probably be better to keep the default at 4 for continuous bucketing and use more (maybe 10) for discrete bucketing.

    opened by neubig 0
Releases (v0.8.5)
  • v0.8.5 (Apr 2, 2022)

    This release:

    • Refactors the metrics class and the report structure.
    • Adds significance tests to all metrics.
    • Makes major code style improvements and adds type checking.
    • Fixes several bugs.
Owner

NeuLab (Graham Neubig's Lab at LTI/CMU)