Li Mu's Dive into Deep Learning V2 - BERT pretraining dataset and code implementation

2022-08-02 23:34:00 cv_lhp

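I. BERT pretraining dataset

1. Introduction

To pretrain the BERT model implemented in the previous post of this series (Li Mu's Dive into Deep Learning V2 - BERT and code implementation), we need a dataset in a form suited to its two pretraining tasks: masked language modeling and next sentence prediction. On the one hand, the original BERT model was pretrained on the combination of two huge corpora, a book corpus and English Wikipedia. On the other hand, off-the-shelf pretrained BERT models may not suit applications in specific domains such as medicine. Pretraining BERT on a custom dataset has therefore become increasingly popular. Here we use a smaller corpus, WikiText-2, to pretrain BERT.
Compared with the PTB dataset used to pretrain word2vec, WikiText-2 (1) keeps the original punctuation, which makes it suitable for next sentence prediction; (2) keeps the original case and numbers; and (3) is more than twice as large.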

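2. Reading the WikiText-2 data

In the WikiText-2 dataset, each line represents a paragraph, and a space is inserted between any punctuation mark and the token preceding it. Only paragraphs with at least two sentences are kept. For simplicity, sentences are split using only the period as the delimiter.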

import os
import random
import torch
import d2l.torch
d2l.torch.DATA_HUB['wikitext-2'] = (
    'https://s3.amazonaws.com/research.metamind.io/wikitext/'
    'wikitext-2-v1.zip', '3c914d17d80b1459be871a5039ac23e752a53cbe')
def _read_wiki(data_dir):
    file_path = os.path.join(data_dir,'wiki.train.tokens')
    with open(file_path,'r',encoding='utf-8') as f:
        lines = f.readlines()
    # Convert uppercase letters to lowercase. Sentences are split on ' . ',
    # i.e. a period with one space on each side.
    paragraphs = [line.strip().lower().split(' . ') for line in lines if len(line.split(' . '))>=2]
    random.shuffle(paragraphs)
    # paragraphs is a list of lists: one list of sentence strings per paragraph
    return paragraphs
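
The splitting rule used by _read_wiki can be tried on a couple of lines without downloading anything. The sketch below is only an illustration: the sample lines are made up in the WikiText-2 style, and split_paragraphs is a hypothetical helper that repeats the list comprehension above without the file reading and shuffling.

```python
# Hypothetical sample lines in the WikiText-2 style (not taken from the dataset).
sample_lines = [
    " The Tower is 324 metres tall . It was completed in 1889 . \n",
    " = Heading line without a sentence break = \n",
]


def split_paragraphs(lines):
    """Lowercase each line and split it into sentences on ' . '."""
    return [line.strip().lower().split(' . ')
            for line in lines if len(line.split(' . ')) >= 2]


paragraphs = split_paragraphs(sample_lines)
print(paragraphs)
# [['the tower is 324 metres tall', 'it was completed in 1889 .']]
```

The heading line is dropped because it contains no ' . ' separator. The last sentence keeps its trailing ' .' because strip() removes the space after it, so the separator no longer matches there.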

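3. Generating data for the next sentence prediction task

The _get_next_sentence() function generates a training example for the binary classification task.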

# Build a sentence pair and record whether the second sentence really follows the first
def _get_next_sentence(sentence,next_sentence,paragraphs):
    if random.random()<0.5:
        is_next = True
    else:
        # paragraphs is a triply nested list (paragraphs > sentences > tokens),
        # so two random.choice calls pick a random sentence from a random paragraph
        next_sentence = random.choice(random.choice(paragraphs))
        is_next = False
    return sentence,next_sentence,is_next

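The following function generates training examples for next sentence prediction from the input paragraph by calling _get_next_sentence. Here paragraph is a list of sentences, each of which is a list of tokens, and max_len is the maximum length of a BERT input sequence during pretraining.

# Build sentence pairs from one paragraph (one line of the text file), together with a flag saying whether each pair is adjacent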
def _get_nsp_data_from_paragraph(paragraph,paragraphs,vocab,max_len):
    nsp_data_from_paragraph = []
    for i in range(len(paragraph)-1):
        tokens_a,tokens_b,is_next = _get_next_sentence(paragraph[i],paragraph[i+1],paragraphs)
        # Account for 1 '<cls>' token and 2 '<sep>' tokens
        if len(tokens_a)+len(tokens_b)+3>max_len:
            continue
        else:
            tokens,segments = d2l.torch.get_tokens_and_segments(tokens_a,tokens_b)
            nsp_data_from_paragraph.append((tokens,segments,is_next))
    return nsp_data_from_paragraph
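
The "+3" in the length check comes from the special tokens added when two sentences are joined into one BERT input. The sketch below uses tokens_and_segments, a stand-in written for this illustration that mirrors what d2l.torch.get_tokens_and_segments is assumed to do; the two example sentences are made up.

```python
def tokens_and_segments(tokens_a, tokens_b=None):
    """Stand-in for d2l.torch.get_tokens_and_segments (assumed behavior)."""
    tokens = ['<cls>'] + tokens_a + ['<sep>']
    # Segment 0 covers '<cls>', sentence A and the first '<sep>'
    segments = [0] * (len(tokens_a) + 2)
    if tokens_b is not None:
        tokens += tokens_b + ['<sep>']
        # Segment 1 covers sentence B and the second '<sep>'
        segments += [1] * (len(tokens_b) + 1)
    return tokens, segments


tokens_a = ['this', 'movie', 'is', 'great']
tokens_b = ['i', 'like', 'it']
tokens, segments = tokens_and_segments(tokens_a, tokens_b)
print(tokens)
# ['<cls>', 'this', 'movie', 'is', 'great', '<sep>', 'i', 'like', 'it', '<sep>']
print(segments)
# [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
```

The combined sequence has len(tokens_a) + len(tokens_b) + 3 tokens, which is exactly the quantity compared against max_len.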

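4. Generating data for the masked language model task

To generate training examples for the masked language model from a BERT input sequence, we define the following _replace_mlm_tokens() function. Among its inputs, tokens is the list of tokens of the BERT input sequence, candidate_pred_positions is the list of token indices of the BERT input sequence excluding special tokens (special tokens are not predicted in the masked language model task), and num_mlm_preds is the number of predictions (15% of the tokens, chosen at random). At each prediction position, the input token may be replaced by the special '<mask>' token, replaced by a random token, or left unchanged. Finally, the function returns the input tokens after any replacement, the token indices where predictions take place, and the labels for those predictions.

# Randomly pick tokens in the input sequence and replace them with '<mask>' or a random token, or leave them unchanged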
def _replace_mlm_tokens(tokens,candidate_pred_positions,num_mlm_preds,vocab):
    # Make a copy of the tokens for the masked language model input,
    # which may contain '<mask>' or random replacement tokens
    mlm_input_tokens = [token for token in tokens]
    pred_positions_and_labels = []
    # Shuffle so that a random 15% of the tokens are picked for prediction
    random.shuffle(candidate_pred_positions)
    for mlm_pred_position in candidate_pred_positions:
        if len(pred_positions_and_labels) >= num_mlm_preds:
            break
        mask_token = None
        # 80% of the time: replace the token with '<mask>'
        if random.random()<0.8:
            mask_token = '<mask>'
        else:
            # 10% of the time: keep the token unchanged
            if random.random()<0.5:
                mask_token = tokens[mlm_pred_position]
            # 10% of the time: replace the token with a random token
            else:
                mask_token = random.choice(vocab.idx_to_token)
        mlm_input_tokens[mlm_pred_position] = mask_token
        pred_positions_and_labels.append((mlm_pred_position,tokens[mlm_pred_position]))
    return mlm_input_tokens,pred_positions_and_labels
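
The nested random draws above are meant to give an 80% / 10% / 10% split. A quick simulation can confirm this. The sketch below is an illustration under stated assumptions: choose_replacement is a hypothetical helper that isolates the branching logic, and vocab_tokens is a made-up vocabulary that deliberately excludes the original token so that the three outcomes can be told apart.

```python
import random
from collections import Counter


def choose_replacement(token, vocab_tokens, rng):
    """Same branching as in _replace_mlm_tokens, isolated for testing."""
    if rng.random() < 0.8:
        return '<mask>'              # 80%: mask the token
    if rng.random() < 0.5:
        return token                 # 10%: keep the token
    return rng.choice(vocab_tokens)  # 10%: random token


rng = random.Random(0)  # fixed seed so the run is repeatable
vocab_tokens = ['apple', 'banana', 'cherry', 'date']  # hypothetical vocabulary
n = 100_000
counts = Counter()
for _ in range(n):
    out = choose_replacement('orig', vocab_tokens, rng)
    if out == '<mask>':
        counts['mask'] += 1
    elif out == 'orig':
        counts['keep'] += 1
    else:
        counts['random'] += 1

fractions = {k: v / n for k, v in counts.items()}
print(fractions)  # close to mask 0.8, keep 0.1, random 0.1
```

In the real function the random token is drawn from the whole vocabulary, so it can occasionally coincide with the original token or be a special token; the simulation avoids that only to keep the counts unambiguous.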

By calling the _replace_mlm_tokens function above, the following function takes a BERT input sequence (tokens) as input and returns the indices of the input tokens (after possible token replacement), the token positions where predictions take place, and the label indices for those predictions.

# Convert the masked input sequence to vocab ids for use as model input,
# and return the prediction positions together with the vocab ids of the original tokens at those positions (the labels)
def _get_mlm_data_from_tokens(tokens,vocab):
    candidate_pred_positions = []
    # tokens is a list of strings
    for i,token in enumerate(tokens):
        # Special tokens are not predicted in the masked language model task
        if token in ('<cls>','<sep>'):
            continue
        else:
            candidate_pred_positions.append(i)
    # Predict 15% of the tokens, chosen at random, with at least one prediction per sequence
    num_mlm_preds = max(1,round(len(tokens)*0.15))
    mlm_input_tokens,pred_positions_and_labels = _replace_mlm_tokens(tokens,candidate_pred_positions,num_mlm_preds,vocab)
    # Sort by prediction position
    pred_positions_and_labels = sorted(pred_positions_and_labels,key=lambda x:x[0])
    pred_positions = [v[0] for v in pred_positions_and_labels] # prediction positions
    mlm_pred_labels = [v[1] for v in pred_positions_and_labels] # original tokens at those positions
    # Return the vocab ids of the (possibly replaced) input tokens, the prediction positions, and the vocab ids of the labels
    return vocab[mlm_input_tokens],pred_positions,vocab[mlm_pred_labels]

5. Converting the text into a pretraining dataset

We now customize a Dataset class for BERT pretraining. First we need a helper function, _pad_bert_inputs, that appends the special '<pad>' token to the inputs. Its argument examples contains the outputs of the helper functions _get_nsp_data_from_paragraph() and _get_mlm_data_from_tokens() for the two pretraining tasks.

# Pad every example to the same length so that they can be stacked into tensors of identical shape for batched computation
def _pad_bert_inputs(examples,max_len,vocab):
    max_num_mlm_preds = round(max_len*0.15) # maximum number of MLM predictions per sequence, rounded to an integer
    all_tokens_id,all_segments,all_valid_lens = [],[],[]
    all_mlm_pred_labels,all_mlm_pred_positions,all_mlm_pred_weights = [],[],[]
    all_nsp_labels = []
    for (tokens_id,mlm_pred_positions,mlm_pred_labels,segments,is_next) in examples:
        all_tokens_id.append(torch.tensor(tokens_id+[vocab['<pad>']]*(max_len-len(tokens_id)),dtype=torch.long))
        all_segments.append(torch.tensor(segments+[0]*(max_len-len(segments)),dtype=torch.long))
        # valid_lens does not count the '<pad>' tokens
        all_valid_lens.append(torch.tensor(len(tokens_id),dtype=torch.float32))
        # Pad the prediction positions with 0, which is the position of the '<cls>' token at the start of every sequence
        all_mlm_pred_positions.append(torch.tensor(mlm_pred_positions+[0]*(max_num_mlm_preds-len(mlm_pred_labels)),dtype=torch.long))
        # Pad the labels with 0 (the id of '<unk>'). Strictly, the label at position 0 is '<cls>', whose id is 3
        # (vocab order: '<unk>', '<pad>', '<mask>', '<cls>', ...), but this makes no difference because padded predictions are excluded from the loss
        all_mlm_pred_labels.append(torch.tensor(mlm_pred_labels+[0]*(max_num_mlm_preds-len(mlm_pred_labels)),dtype=torch.long))
        # Predictions for padded positions are filtered out of the loss by multiplying them by a weight of 0
        all_mlm_pred_weights.append(torch.tensor([1.0]*len(mlm_pred_labels)+[0.0]*(max_num_mlm_preds-len(mlm_pred_labels)),dtype=torch.float32))
        all_nsp_labels.append(torch.tensor(is_next,dtype=torch.long))
    return (all_tokens_id,all_segments,all_valid_lens,all_mlm_pred_positions,all_mlm_pred_weights,all_mlm_pred_labels,all_nsp_labels)
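
To see what the padding does to a single example, here is a small sketch that uses plain Python lists instead of tensors. pad_example is a hypothetical helper written for this illustration, and the token ids are made up (they only follow the convention that '<pad>' has id 1, '<cls>' id 3 and '<sep>' id 4).

```python
def pad_example(token_ids, pred_positions, pred_labels, max_len, pad_id):
    """Pad one example the same way _pad_bert_inputs does, using lists."""
    max_num_mlm_preds = round(max_len * 0.15)
    n_pad_preds = max_num_mlm_preds - len(pred_positions)
    return {
        'token_ids': token_ids + [pad_id] * (max_len - len(token_ids)),
        'valid_len': len(token_ids),  # number of real (non-padding) tokens
        'pred_positions': pred_positions + [0] * n_pad_preds,
        'pred_labels': pred_labels + [0] * n_pad_preds,
        # weight 1.0 for real predictions, 0.0 for padded ones
        'pred_weights': [1.0] * len(pred_positions) + [0.0] * n_pad_preds,
    }


# Made-up example: 7 tokens, one masked position (index 2, label id 11)
padded = pad_example([3, 10, 2, 4, 12, 13, 4], [2], [11], max_len=20, pad_id=1)
print(len(padded['token_ids']))   # 20
print(padded['pred_positions'])   # [2, 0, 0]
print(padded['pred_weights'])     # [1.0, 0.0, 0.0]
```

Because the padded prediction slots carry weight 0.0, the values used to fill pred_positions and pred_labels do not affect a loss that multiplies each prediction by its weight.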

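Putting together the helper functions that generate training examples for the two pretraining tasks and the helper function that pads the inputs, we define the following _WikiTextDataset class as the WikiText-2 dataset for pretraining BERT. By implementing the __getitem__ function, we can access any pretraining example (masked language modeling and next sentence prediction) generated from a pair of sentences in the WikiText-2 corpus. Tokenization is done with the d2l.tokenize function. Infrequent tokens that appear fewer than 5 times are filtered out.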

class _WikiTextDataset(torch.utils.data.Dataset):
    def __init__(self,paragraphs,max_len):
        # On input, paragraphs[i] is a list of sentence strings representing a paragraph;
        # after tokenization, paragraphs[i] is a list of sentences, each of which is a list of tokens.
        # paragraphs is therefore a triply nested list (paragraphs > sentences > tokens), and adjacent
        # elements of paragraphs[i] are adjacent sentences in that paragraph, for example:
        # [[['hello','my','name','is','xxx'],['your','name','is']],[['who','are','you'],['nice','to','meet','you']]]
        paragraphs = [ d2l.torch.tokenize(paragraph,token='word') for paragraph in paragraphs]
        sentences = [sentence for paragraph in paragraphs for sentence in paragraph]
        self.vocab = d2l.torch.Vocab(sentences,min_freq=5,reserved_tokens=['<pad>','<mask>','<cls>','<sep>'])
        # Get the data for the next sentence prediction task
        examples = []
        for paragraph in paragraphs:
            examples.extend(_get_nsp_data_from_paragraph(paragraph,paragraphs,self.vocab,max_len))
        # Get the data for the masked language model task
        examples = [ (_get_mlm_data_from_tokens(tokens,self.vocab)+(segments,is_next)) for (tokens,segments,is_next) in examples]
        # Pad the inputs
        (self.all_tokens_id,self.all_segments,self.all_valid_lens,self.all_mlm_pred_positions,self.all_mlm_pred_weights,self.all_mlm_pred_labels,self.all_nsp_labels) = _pad_bert_inputs(examples,max_len,self.vocab)
    def __getitem__(self, idx):
        return (self.all_tokens_id[idx],self.all_segments[idx],self.all_valid_lens[idx],self.all_mlm_pred_positions[idx],self.all_mlm_pred_weights[idx],self.all_mlm_pred_labels[idx],self.all_nsp_labels[idx])
    def __len__(self):
        return len(self.all_tokens_id)
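
The effect of min_freq and reserved_tokens on the vocabulary can be illustrated without d2l. The sketch below is not the d2l.torch.Vocab implementation; build_vocab is a hypothetical helper that only assumes the behavior described in this post: '<unk>' gets index 0, the reserved tokens follow, and tokens seen fewer than min_freq times are left out and therefore map to '<unk>'.

```python
from collections import Counter


def build_vocab(sentences, min_freq, reserved_tokens):
    """Map tokens to ids; tokens rarer than min_freq are left out."""
    counter = Counter(tok for sentence in sentences for tok in sentence)
    idx_to_token = ['<unk>'] + list(reserved_tokens)
    # Most frequent tokens first
    for tok, freq in sorted(counter.items(), key=lambda x: x[1], reverse=True):
        if freq >= min_freq and tok not in idx_to_token:
            idx_to_token.append(tok)
    return {tok: i for i, tok in enumerate(idx_to_token)}


# Made-up corpus: 'the' appears 3 times, 'cat' twice, 'dog' once
sentences = [['the', 'cat'], ['the', 'dog'], ['the', 'cat']]
token_to_idx = build_vocab(sentences, min_freq=2,
                           reserved_tokens=['<pad>', '<mask>', '<cls>', '<sep>'])
ids = [token_to_idx.get(t, token_to_idx['<unk>']) for t in ['the', 'dog']]
print(token_to_idx['<cls>'])  # 3
print(ids)                    # [5, 0]
```

With min_freq=2 the single occurrence of 'dog' is dropped from the vocabulary, so it is encoded as 0, the id of '<unk>'.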

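Using the _read_wiki function and the _WikiTextDataset class, we define the following load_data_wiki function to download the WikiText-2 dataset and generate pretraining examples from it.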

def load_data_wiki(batch_size,max_len):
    """Load the WikiText-2 dataset."""
    #num_workers = d2l.torch.get_dataloader_workers()
    data_dir = d2l.torch.download_extract('wikitext-2', 'wikitext-2')
    paragraphs = _read_wiki(data_dir)
    data_set = _WikiTextDataset(paragraphs,max_len)
    data_iter = torch.utils.data.DataLoader(data_set,batch_size,shuffle=True)#, num_workers=num_workers
    return data_iter,data_set.vocab

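Set the batch size to 512 and the maximum length of a BERT input sequence to 64, then print the shapes of a minibatch of BERT pretraining examples. Note that in each BERT input sequence, 10 (64 × 0.15, rounded) positions are predicted for the masked language model task.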

batch_size,max_len = 512,64
data_iter,vocab = load_data_wiki(batch_size,max_len)
for (tokens_id_X,segments_X,valid_lens_X,mlm_pred_positions_X,mlm_pred_weights_X,mlm_pred_labels_Y,nsp_labels_Y) in data_iter:
    print(tokens_id_X.shape,segments_X.shape,valid_lens_X.shape,mlm_pred_positions_X.shape,mlm_pred_weights_X.shape,mlm_pred_labels_Y.shape,nsp_labels_Y.shape)
    break
The output is:
torch.Size([512, 64]) torch.Size([512, 64]) torch.Size([512]) torch.Size([512, 10]) torch.Size([512, 10]) torch.Size([512, 10]) torch.Size([512])
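
The last dimension of 10 in the three masked language model tensors follows directly from the rounding used in the code. A minimal arithmetic check, where mlm_preds_for is a hypothetical helper that repeats the expression from _get_mlm_data_from_tokens:

```python
max_len = 64
max_num_mlm_preds = round(max_len * 0.15)  # 64 * 0.15 = 9.6, rounded to 10
print(max_num_mlm_preds)  # 10


def mlm_preds_for(num_tokens):
    """Number of predictions for one sequence: 15%, but at least one."""
    return max(1, round(num_tokens * 0.15))


print(mlm_preds_for(3))   # 1, because round(0.45) is 0 and the minimum is 1
print(mlm_preds_for(64))  # 10
```

Shorter sequences have fewer real predictions, and _pad_bert_inputs fills the remaining slots up to 10 with zero-weight padding.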

Finally, look at the vocabulary size. Even after filtering out infrequent tokens, it is still more than twice as large as that of the PTB dataset.

len(vocab)
The output is:
20256

6. Summary

  • Compared with the PTB dataset, the WikiText-2 dataset keeps the original punctuation, case and numbers, and is more than twice as large.
  • We can access any pretraining example (masked language modeling and next sentence prediction) generated from a pair of sentences in the WikiText-2 corpus.

7. Complete code

import os
import random
import torch
import d2l.torch

d2l.torch.DATA_HUB['wikitext-2'] = (
    'https://s3.amazonaws.com/research.metamind.io/wikitext/'
    'wikitext-2-v1.zip', '3c914d17d80b1459be871a5039ac23e752a53cbe')


def _read_wiki(data_dir):
    file_path = os.path.join(data_dir, 'wiki.train.tokens')
    with open(file_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    # Convert uppercase letters to lowercase. Sentences are split on ' . ',
    # i.e. a period with one space on each side.
    paragraphs = [line.strip().lower().split(' . ') for line in lines if
                  len(line.split(' . ')) >= 2]
    random.shuffle(paragraphs)
    # paragraphs is a list of lists: one list of sentence strings per paragraph
    return paragraphs


# Build a sentence pair and record whether the second sentence really follows the first
def _get_next_sentence(sentence, next_sentence, paragraphs):
    if random.random() < 0.5:
        is_next = True
    else:
        # paragraphs is a triply nested list (paragraphs > sentences > tokens),
        # so two random.choice calls pick a random sentence from a random paragraph
        next_sentence = random.choice(random.choice(paragraphs))
        is_next = False
    return sentence, next_sentence, is_next


# Build sentence pairs from one paragraph (one line of the text file), together with a flag saying whether each pair is adjacent
def _get_nsp_data_from_paragraph(paragraph, paragraphs, vocab, max_len):
    nsp_data_from_paragraph = []
    for i in range(len(paragraph) - 1):
        tokens_a, tokens_b, is_next = _get_next_sentence(paragraph[i], paragraph[i + 1], paragraphs)
        # Account for 1 '<cls>' token and 2 '<sep>' tokens
        if len(tokens_a) + len(tokens_b) + 3 > max_len:
            continue
        else:
            tokens, segments = d2l.torch.get_tokens_and_segments(tokens_a, tokens_b)
            nsp_data_from_paragraph.append((tokens, segments, is_next))
    return nsp_data_from_paragraph


# Randomly pick tokens in the input sequence and replace them with '<mask>' or a random token, or leave them unchanged
def _replace_mlm_tokens(tokens, candidate_pred_positions, num_mlm_preds, vocab):
    # Make a copy of the tokens for the masked language model input,
    # which may contain '<mask>' or random replacement tokens
    mlm_input_tokens = [token for token in tokens]
    pred_positions_and_labels = []
    # Shuffle so that a random 15% of the tokens are picked for prediction
    random.shuffle(candidate_pred_positions)
    for mlm_pred_position in candidate_pred_positions:
        if len(pred_positions_and_labels) >= num_mlm_preds:
            break
        mask_token = None
        # 80% of the time: replace the token with '<mask>'
        if random.random() < 0.8:
            mask_token = '<mask>'
        else:
            # 10% of the time: keep the token unchanged
            if random.random() < 0.5:
                mask_token = tokens[mlm_pred_position]
            # 10% of the time: replace the token with a random token
            else:
                mask_token = random.choice(vocab.idx_to_token)
        mlm_input_tokens[mlm_pred_position] = mask_token
        pred_positions_and_labels.append((mlm_pred_position, tokens[mlm_pred_position]))
    return mlm_input_tokens, pred_positions_and_labels


# Convert the masked input sequence to vocab ids for use as model input,
# and return the prediction positions together with the vocab ids of the original tokens at those positions (the labels)
def _get_mlm_data_from_tokens(tokens, vocab):
    candidate_pred_positions = []
    # tokens is a list of strings
    for i, token in enumerate(tokens):
        # Special tokens are not predicted in the masked language model task
        if token in ('<cls>', '<sep>'):
            continue
        else:
            candidate_pred_positions.append(i)
    # Predict 15% of the tokens, chosen at random, with at least one prediction per sequence
    num_mlm_preds = max(1, round(len(tokens) * 0.15))
    mlm_input_tokens, pred_positions_and_labels = _replace_mlm_tokens(tokens, candidate_pred_positions, num_mlm_preds,
                                                                      vocab)
    # Sort by prediction position
    pred_positions_and_labels = sorted(pred_positions_and_labels, key=lambda x: x[0])
    pred_positions = [v[0] for v in pred_positions_and_labels]  # prediction positions
    mlm_pred_labels = [v[1] for v in pred_positions_and_labels]  # original tokens at those positions
    # Return the vocab ids of the (possibly replaced) input tokens, the prediction positions, and the vocab ids of the labels
    return vocab[mlm_input_tokens], pred_positions, vocab[mlm_pred_labels]


# Pad every example to the same length so that they can be stacked into tensors of identical shape for batched computation
def _pad_bert_inputs(examples, max_len, vocab):
    max_num_mlm_preds = round(max_len * 0.15)  # maximum number of MLM predictions per sequence, rounded to an integer
    all_tokens_id, all_segments, all_valid_lens = [], [], []
    all_mlm_pred_labels, all_mlm_pred_positions, all_mlm_pred_weights = [], [], []
    all_nsp_labels = []
    for (tokens_id, mlm_pred_positions, mlm_pred_labels, segments, is_next) in examples:
        all_tokens_id.append(torch.tensor(tokens_id + [vocab['<pad>']] * (max_len - len(tokens_id)), dtype=torch.long))
        all_segments.append(torch.tensor(segments + [0] * (max_len - len(segments)), dtype=torch.long))
        # valid_lens does not count the '<pad>' tokens
        all_valid_lens.append(torch.tensor(len(tokens_id), dtype=torch.float32))
        # Pad the prediction positions with 0, which is the position of the '<cls>' token at the start of every sequence
        all_mlm_pred_positions.append(
            torch.tensor(mlm_pred_positions + [0] * (max_num_mlm_preds - len(mlm_pred_labels)),
                         dtype=torch.long))
        # Pad the labels with 0 (the id of '<unk>'). Strictly, the label at position 0 is '<cls>', whose id is 3
        # (vocab order: '<unk>', '<pad>', '<mask>', '<cls>', ...), but this makes no difference because padded predictions are excluded from the loss
        all_mlm_pred_labels.append(
            torch.tensor(mlm_pred_labels + [0] * (max_num_mlm_preds - len(mlm_pred_labels)), dtype=torch.long))
        # Predictions for padded positions are filtered out of the loss by multiplying them by a weight of 0
        all_mlm_pred_weights.append(
            torch.tensor([1.0] * len(mlm_pred_labels) + [0.0] * (max_num_mlm_preds - len(mlm_pred_labels)),
                         dtype=torch.float32))
        all_nsp_labels.append(torch.tensor(is_next, dtype=torch.long))
    return (
    all_tokens_id, all_segments, all_valid_lens, all_mlm_pred_positions, all_mlm_pred_weights, all_mlm_pred_labels,
    all_nsp_labels)


class _WikiTextDataset(torch.utils.data.Dataset):
    def __init__(self, paragraphs, max_len):
        # On input, paragraphs[i] is a list of sentence strings representing a paragraph;
        # after tokenization, paragraphs[i] is a list of sentences, each of which is a list of tokens.
        # paragraphs is therefore a triply nested list (paragraphs > sentences > tokens), and adjacent
        # elements of paragraphs[i] are adjacent sentences in that paragraph, for example:
        # [[['hello','my','name','is','xxx'],['your','name','is']],[['who','are','you'],['nice','to','meet','you']]]
        paragraphs = [d2l.torch.tokenize(paragraph, token='word') for paragraph in paragraphs]
        sentences = [sentence for paragraph in paragraphs for sentence in paragraph]
        self.vocab = d2l.torch.Vocab(sentences, min_freq=5, reserved_tokens=['<pad>', '<mask>', '<cls>', '<sep>'])
        # Get the data for the next sentence prediction task
        examples = []
        for paragraph in paragraphs:
            examples.extend(_get_nsp_data_from_paragraph(paragraph, paragraphs, self.vocab, max_len))
        # Get the data for the masked language model task
        examples = [(_get_mlm_data_from_tokens(tokens, self.vocab) + (segments, is_next)) for
                    (tokens, segments, is_next) in examples]
        # Pad the inputs
        (self.all_tokens_id, self.all_segments, self.all_valid_lens, self.all_mlm_pred_positions,
         self.all_mlm_pred_weights, self.all_mlm_pred_labels, self.all_nsp_labels) = _pad_bert_inputs(examples, max_len,
                                                                                                      self.vocab)

    def __getitem__(self, idx):
        return (
        self.all_tokens_id[idx], self.all_segments[idx], self.all_valid_lens[idx], self.all_mlm_pred_positions[idx],
        self.all_mlm_pred_weights[idx], self.all_mlm_pred_labels[idx], self.all_nsp_labels[idx])

    def __len__(self):
        return len(self.all_tokens_id)


def load_data_wiki(batch_size, max_len):
    """Load the WikiText-2 dataset."""
    #num_workers = d2l.torch.get_dataloader_workers()
    data_dir = d2l.torch.download_extract('wikitext-2', 'wikitext-2')
    paragraphs = _read_wiki(data_dir)
    data_set = _WikiTextDataset(paragraphs, max_len)
    data_iter = torch.utils.data.DataLoader(data_set, batch_size, shuffle=True)  #, num_workers=num_workers
    return data_iter, data_set.vocab


batch_size, max_len = 512, 64
data_iter, vocab = load_data_wiki(batch_size, max_len)
for (tokens_id_X, segments_X, valid_lens_X, mlm_pred_positions_X, mlm_pred_weights_X, mlm_pred_labels_Y,
     nsp_labels_Y) in data_iter:
    print(tokens_id_X.shape, segments_X.shape, valid_lens_X.shape, mlm_pred_positions_X.shape, mlm_pred_weights_X.shape,
          mlm_pred_labels_Y.shape, nsp_labels_Y.shape)
    break
len(vocab)

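8. Related links

BERT pretraining, part 1: Li Mu's Dive into Deep Learning V2 - BERT and code implementation
BERT pretraining, part 2: Li Mu's Dive into Deep Learning V2 - BERT pretraining dataset and code implementation
BERT pretraining, part 3: Li Mu's Dive into Deep Learning V2 - BERT pretraining and code implementation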

Copyright notice
This article was written by [cv_lhp]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/214/202208022001345156.html