Biomedical Language Modeling
Tools for curating biomedical training data for large-scale language modeling.
Setup
Create the environment using conda:
conda env create -f conda.yml
Then activate the environment:
conda activate bigscience-biomedical
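To confirm that activation worked, a quick check like the following may help. This is a minimal sketch, not part of the repository: `check_env` is a hypothetical helper name, and it relies on the `CONDA_DEFAULT_ENV` variable that conda sets when an environment is activated.

```shell
# Hypothetical helper (not part of this repo): reports whether the
# expected conda environment is the currently active one.
check_env() {
  expected="bigscience-biomedical"
  # conda exports CONDA_DEFAULT_ENV with the active environment's name.
  if [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]; then
    echo "active: $expected"
  else
    echo "not active: run 'conda activate $expected'"
    return 1
  fi
}
```

Running `check_env` after activation should print the `active:` line; otherwise it prints a reminder and returns a non-zero status.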
Datasets
Spreadsheet of biomedical training sets (currently ~76 datasets).