Hugging Face problem log, part I

2022-07-28 06:34:00 【SCHLAU_ tono】

Error 1.

The dataset classes in torch.utils.data and the Hugging Face datasets library are different things and are not interchangeable.

Error 2. cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Strictly speaking, this is not a Hugging Face problem. I ran into it while using Torch, but I record it here as well.

The main cause is a CUDA runtime version that does not match the installed build. The fix, taken from a reference post, is to install a matching build:

pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
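Before reinstalling, it can help to check which versions are actually present. The following is a minimal sketch, not part of the original fix: the helper name cuda_report is hypothetical, and it only reads version attributes that torch exposes, returning None when torch is not installed.

```python
def cuda_report():
    """Collect version details that help diagnose a CUDA/cuDNN mismatch."""
    try:
        import torch  # imported lazily so the helper still works without torch
    except ImportError:
        return None  # torch is not installed in this environment
    cuda_ok = torch.cuda.is_available()
    return {
        "torch": torch.__version__,            # e.g. "1.8.0+cu111"
        "built_for_cuda": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": cuda_ok,
        # Only query cuDNN when a CUDA device is usable
        "cudnn": torch.backends.cudnn.version() if cuda_ok else None,
    }


report = cuda_report()
print(report)
```

Comparing built_for_cuda with the CUDA version reported by the driver (for example via nvidia-smi) shows whether the installed wheel matches the machine.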

Error 3. TypeError: vars() argument must have __dict__ attribute

This error usually appears when a custom dataset is used but no matching data_collator is supplied when training the model.

The problem code is as follows:

    import torch
    from torch.utils.data import TensorDataset
    # tokenizer, texts and labels are defined earlier
    encoded_texts = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
    labels = torch.tensor(labels)
    dataset = TensorDataset(encoded_texts['input_ids'], encoded_texts['attention_mask'], labels)

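Here dataset is a torch.utils.data.TensorDataset, whose items are plain tuples. By default, Trainer assumes the dataset passed in comes from the datasets library, whose items are dicts, so it extracts the data with default_data_collator. That collator calls vars() on any item that is not a dict, which fails for tuples.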
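The failure can be reproduced without any library. This is a small sketch under the assumption that the default collator uses dict items as they are and falls back to vars() for anything else; try_vars is a hypothetical helper, and plain lists stand in for tensors.

```python
# Item shaped like what TensorDataset yields: a plain tuple.
tuple_item = ([101, 2023, 102], [1, 1, 1], 0)
# Item shaped like what a Hugging Face dataset yields: a dict keyed by column name.
dict_item = {"input_ids": [101, 2023, 102], "attention_mask": [1, 1, 1], "labels": 0}


def try_vars(item):
    """Mimic the fallback for non-dict items; return the error text, if any."""
    if isinstance(item, dict):
        return None  # dict items are used as they are
    try:
        vars(item)  # tuples have no __dict__, so this raises TypeError
    except TypeError as err:
        return str(err)
    return None


dict_result = try_vars(dict_item)
tuple_result = try_vars(tuple_item)
print(dict_result)
print(tuple_result)
```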

Solution: write a custom data_collator function and pass it to Trainer. Note that data_collator is an argument of Trainer, not of TrainingArguments. The code is as follows:

def customised_data_collector(features):
    # Each feature is an (input_ids, attention_mask, label) tuple from TensorDataset
    batch = {}
    batch['input_ids'] = torch.stack([f[0] for f in features])
    batch['attention_mask'] = torch.stack([f[1] for f in features])
    batch['labels'] = torch.stack([f[2] for f in features])

    return batch
################
Trainer(..., data_collator=customised_data_collector, ...)

See the reference post.
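A different route, which is not the one taken in this post, is to make the dataset itself return dicts so that the default collator can be used unchanged. The class below is a hypothetical sketch: DictDataset is an invented name, plain lists stand in for tensors, and in real use one would probably pass tensors and subclass torch.utils.data.Dataset.

```python
class DictDataset:
    """Map-style dataset that returns one dict per example."""

    def __init__(self, input_ids, attention_mask, labels):
        if not (len(input_ids) == len(attention_mask) == len(labels)):
            raise ValueError("all columns must have the same length")
        self.input_ids = input_ids
        self.attention_mask = attention_mask
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Keys match the argument names a classification model's forward() expects
        return {
            "input_ids": self.input_ids[idx],
            "attention_mask": self.attention_mask[idx],
            "labels": self.labels[idx],
        }


demo = DictDataset([[1, 2], [3, 4]], [[1, 1], [1, 0]], [0, 1])
print(len(demo), demo[1])
```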

Error 4. TypeError: forward() got an unexpected keyword argument 'label'

The cause of this error is complicated, and in the end I did not manage to solve it directly. Here I only record some possible causes and fixes:

Some models name the parameter labels rather than label. Fix: rename the column with datasets.rename_column("label", "labels"), where datasets is the loaded dataset object.
Jupyter Notebook misbehaving. Restart the kernel and run again. See the reference post.
Some models, for example MT5EncoderModel and T5EncoderModel, are only base models and their forward() has no label parameter (see the model source code). For sequence classification you need to write a custom model; the source code of BertForSequenceClassification is a useful reference.
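To find out which of these cases applies, one can inspect the signature of forward() before training. The sketch below uses two toy classes in place of real models; accepted_label_name, ToyClassifier and ToyEncoder are hypothetical names introduced for illustration.

```python
import inspect


def accepted_label_name(forward_fn):
    """Return 'labels' or 'label' if forward_fn accepts it, else None."""
    params = inspect.signature(forward_fn).parameters
    for name in ("labels", "label"):
        if name in params:
            return name
    return None


class ToyClassifier:
    # Stand-in for a model with a classification head
    def forward(self, input_ids, attention_mask=None, labels=None):
        return None


class ToyEncoder:
    # Stand-in for an encoder-only base model without a labels argument
    def forward(self, input_ids, attention_mask=None):
        return None


classifier_name = accepted_label_name(ToyClassifier.forward)
encoder_name = accepted_label_name(ToyEncoder.forward)
print(classifier_name, encoder_name)
```

With a real model the same helper can be called as accepted_label_name(model.forward). A result of None suggests the third case above, where a custom classification model is needed.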

How to load files from Google Drive in Colab

The article describes seven ways of loading data from Google Drive in detail. I use the sixth one, mounting the drive locally. Put the following code in the notebook:

from google.colab import drive
drive.mount('/content/drive')

The path to use when reading a file was shown in a screenshot from the article “7 ways to load external data into Google Colab” by B. Chen (image not available).
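Since the screenshot is missing, here is a sketch of how such a path is typically built. It assumes the mount point /content/drive used above and that the Drive root appears as a folder named MyDrive, which may differ in older Colab sessions; drive_path and the file data/train.csv are hypothetical.

```python
from pathlib import PurePosixPath

MOUNT_POINT = PurePosixPath("/content/drive")  # same path passed to drive.mount
DRIVE_ROOT = MOUNT_POINT / "MyDrive"           # assumed name of the Drive root folder


def drive_path(*parts):
    """Build the absolute path of a file stored on the mounted drive."""
    return str(DRIVE_ROOT.joinpath(*parts))


example_path = drive_path("data", "train.csv")
print(example_path)  # /content/drive/MyDrive/data/train.csv
```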

About Evaluate metrics

Official overview of all metrics: https://huggingface.co/evaluate-metric

There are 28 different metrics. The official listing is pasted below:

>>> from datasets import list_metrics
>>> metrics_list = list_metrics()
>>> len(metrics_list)
28
>>> print(metrics_list)
['accuracy', 'bertscore', 'bleu', 'bleurt', 'cer', 'comet', 'coval', 'cuad', 'f1', 'gleu', 'glue', 'indic_glue', 'matthews_correlation', 'meteor', 'pearsonr', 'precision', 'recall', 'rouge', 'sacrebleu', 'sari', 'seqeval', 'spearmanr', 'squad', 'squad_v2', 'super_glue', 'wer', 'wiki_split', 'xnli']

A commonly used combination:
metric = load_metric("glue", "mrpc")
This reports accuracy and F1 score together.
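To make concrete what those two numbers mean, here is a plain-Python sketch for binary labels. It is not the library's implementation, and accuracy_and_f1 is a hypothetical helper. As far as I know, newer releases move metric loading to the separate evaluate library (evaluate.load), so load_metric may be unavailable there.

```python
def accuracy_and_f1(predictions, references, positive=1):
    """Compute accuracy and binary F1 for two equally long label lists."""
    if len(predictions) != len(references) or not references:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(predictions, references))
    correct = sum(p == r for p, r in pairs)
    tp = sum(p == positive and r == positive for p, r in pairs)  # true positives
    fp = sum(p == positive and r != positive for p, r in pairs)  # false positives
    fn = sum(p != positive and r == positive for p, r in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(references), "f1": f1}


scores = accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1])
print(scores)
```

In this example three of four predictions are correct, precision is 2/3 and recall is 1, so accuracy is 0.75 and F1 is 0.8.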

How to run a Python file saved on Google Drive in Colab

After mounting Google Drive, run !python 'filepath' in a notebook cell.
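The !python form only works in a notebook cell. From ordinary Python code a rough equivalent is to start the script with the current interpreter. The sketch below is an assumption-laden illustration: run_script is a hypothetical helper, and a temporary file stands in for the script on the drive.

```python
import os
import subprocess
import sys
import tempfile


def run_script(path, *args):
    """Run a Python file with the current interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, str(path), *args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits with an error
    )
    return result.stdout


# Demo with a throwaway script; on Colab the path would point into the mounted drive
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "hello.py")
    with open(script, "w") as fh:
        fh.write('print("hello from script")\n')
    output = run_script(script)

print(output)
```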

How to use pip install inside a Python file

Writing pip install packagename directly in a Python file raises a SyntaxError, because it is a shell command and not Python. The workaround I first used was to import pip and call pip.main(), but pip.main() is no longer available in pip 10 and later, so the code below runs pip as a subprocess of the current interpreter instead. See the reference post Why does “pip install” inside Python raise a SyntaxError?

import subprocess
import sys

package_names = ['datasets', 'transformers']  # packages to install
# --upgrade installs the packages or updates existing ones
subprocess.check_call([sys.executable, '-m', 'pip', 'install', '--upgrade'] + package_names)
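To avoid running an install on every execution, one option is to check first which packages cannot be imported. This is a sketch with a hypothetical helper, missing_packages; it assumes the import name equals the pip name, which holds for datasets and transformers but not for every package.

```python
import importlib.util


def missing_packages(import_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


# json ships with Python; the second name is deliberately nonexistent
demo_missing = missing_packages(["json", "no_such_package_xyz_123"])
print(demo_missing)
```

Only the names returned by the helper would then need to be passed to the install command.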

Problem 1. Train loss is decreasing, but accuracy remains the same

The training loss keeps falling but accuracy does not change.

Probable cause:

Overfitting. You can try weight_decay in TrainingArguments, hidden_dropout_prob in model.from_pretrained(...), and data augmentation. See the reference post.
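The two settings could be wired up roughly as follows. This is a sketch, not a tuned recipe: the values 0.01 and 0.2 are illustrative assumptions, the helper names are hypothetical, and hidden_dropout_prob is the config field used by BERT-style models, so other architectures may name it differently.

```python
def regularization_kwargs(weight_decay=0.01, dropout=0.2):
    """Return keyword arguments for TrainingArguments and from_pretrained."""
    training_kwargs = {"weight_decay": weight_decay}
    model_kwargs = {"hidden_dropout_prob": dropout}
    return training_kwargs, model_kwargs


def build_model_and_args(model_name, output_dir, num_labels=2):
    """Create a classification model and training arguments with regularization."""
    # Imported lazily: this part requires the transformers library
    from transformers import AutoModelForSequenceClassification, TrainingArguments

    training_kwargs, model_kwargs = regularization_kwargs()
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels, **model_kwargs
    )
    args = TrainingArguments(output_dir=output_dir, **training_kwargs)
    return model, args


training_kwargs, model_kwargs = regularization_kwargs()
print(training_kwargs, model_kwargs)
```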

Original site

Copyright notice
This article was written by [SCHLAU_ tono]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/209/202207280519154737.html