Keras implementation of transformers for humans

Overview

bert4keras

About

This is the author's re-implementation of transformer models in Keras, aiming to combine transformers and Keras with code that is as clean as possible.

The project was started to make modification and customization easy, so it may be updated frequently.

Stars are therefore welcome, but forking is not recommended: your fork may quickly become outdated.

Features

Currently implemented:

  • Loading pre-trained bert/roberta/albert weights for fine-tuning;
  • The attention masks needed for language models and seq2seq;
  • A rich set of examples;
  • Pre-training code from scratch (supports TPU and multi-GPU; see pretraining);
  • Compatibility with both keras and tf.keras

Usage

Install the stable version:

pip install bert4keras

Install the latest version:

pip install git+https://www.github.com/bojone/bert4keras.git

For usage examples, see the examples directory.

Examples previously written against keras-bert still apply to this project; just replace the way bert_model is loaded with this project's loader, as in the sketch below.
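
A minimal loading sketch (the paths are placeholders for a downloaded Chinese BERT checkpoint; the calls mirror the example code quoted later on this page, so treat it as an illustration rather than the canonical usage):

    import numpy as np
    from bert4keras.models import build_transformer_model
    from bert4keras.tokenizers import Tokenizer

    config_path = 'chinese_L-12_H-768_A-12/bert_config.json'      # placeholder path
    checkpoint_path = 'chinese_L-12_H-768_A-12/bert_model.ckpt'   # placeholder path
    dict_path = 'chinese_L-12_H-768_A-12/vocab.txt'               # placeholder path

    tokenizer = Tokenizer(dict_path, do_lower_case=True)           # build the tokenizer
    model = build_transformer_model(config_path, checkpoint_path)  # build the model and load the weights

    token_ids, segment_ids = tokenizer.encode(u'语言模型')
    print(model.predict([np.array([token_ids]), np.array([segment_ids])]))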

In theory the library is compatible with Python 2 and Python 3, and with TensorFlow 1.14+ and TensorFlow 2.x. The development environment is Python 2.7, TensorFlow 1.14+ and Keras 2.3.1 (tested under Keras 2.2.4, 2.3.0, 2.3.1 and tf.keras).

For the best experience, the combination TensorFlow 1.14 + Keras 2.3.1 is recommended.

About environment combinations:

  • Both tf + keras and tf + tf.keras are supported; the latter requires setting the environment variable TF_KERAS=1 beforehand (see the sketch after this list).

  • When using tf + keras, 2.2.4 <= keras <= 2.3.1 and 1.14 <= tf <= 2.2 are recommended; tf 2.3+ cannot be used.

  • keras 2.4+ can be used, but keras 2.4.x is essentially equivalent to tf.keras, so if you want keras 2.4+ you might as well use tf.keras directly.
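
For the tf + tf.keras route, the environment variable has to be set before bert4keras is imported; a minimal sketch, mirroring the snippets quoted later on this page:

    import os
    os.environ['TF_KERAS'] = '1'  # must be set before importing bert4keras

    from bert4keras.backend import keras, K
    from bert4keras.models import build_transformer_model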

Of course, contributors who find a bug are welcome to point it out, fix it, or even open a Pull Request~

Weights

Pre-trained weights currently supported for loading:

Notes

  • Note 1: brightmart's albert was open-sourced earlier than Google's albert, so the weights of early brightmart releases are not fully consistent with Google's; in other words, the two cannot simply be swapped for each other. To reduce code redundancy, bert4keras 0.2.4 and later only supports loading the Google-version weights and the brightmart weights whose names contain "Google". To load the earlier weights, use version 0.2.3, or consider the albert_zh weights converted by the author.
  • Note 2: If the downloaded ELECTRA weights come without a JSON config file, adapt one yourself following the reference here (the type_vocab_size field needs to be added).

Updates

  • 2021.04.23: Added GlobalPointer.
  • 2021.03.23: Added RoFormer.
  • 2021.01.30: Released version 0.9.9 with improved multi-GPU support and a new multi-GPU example: task_seq2seq_autotitle_multigpu.py.
  • 2020.12.29: Added the residual_attention_scores argument to implement RealFormer; enable it by passing residual_attention_scores=True to build_transformer_model (see the sketch after this changelog).
  • 2020.12.04: PositionEmbedding now supports hierarchical decomposition, letting BERT handle very long texts directly; enable it by passing hierarchical_position=True to build_transformer_model.
  • 2020.11.19: Added support for GPT2; see the CPM_LM_bert4keras project.
  • 2020.11.14: Added parameter-wise learning rates via extend_with_parameter_wise_lr, which can be used to give each layer a different learning rate.
  • 2020.10.27: Added support for T5.1.1 and Multilingual T5.
  • 2020.08.28: Added support for GPT_OpenAI.
  • 2020.08.22: Added the WebServing class, which makes it easy to expose a model as a web API; see the class docstring for details.
  • 2020.07.14: Added the prefix argument to the Transformer class; added the to_array function to snippets.py; fixed a hidden bug in AutoRegressiveDecoder when rtype='logits'.
  • 2020.06.06: Renamed the Tokenizer argument max_length to maxlen; the old name is kept for backward compatibility, but the new one is recommended.
  • 2020.04.29: Added recomputation (see keras_recompute), trading time for memory; enable it by setting the environment variable RECOMPUTE=1.
  • 2020.04.25: Improved behavior under tf2.
  • 2020.04.16: All examples now work with tensorflow 2.0.
  • 2020.04.06: Added a UniLM pre-training mode (in testing).
  • 2020.04.06: Improved the rematch method.
  • 2020.04.01: Added the rematch method to Tokenizer, mapping tokenization results back to positions in the original sequence.
  • 2020.03.30: Unified the coding style across the py files.
  • 2020.03.25: Added support for ELECTRA.
  • 2020.03.24: Further strengthened DataGenerator, allowing local shuffling when an iterator is passed in.
  • 2020.03.23: Added an option to adjust the key_size of Attention.
  • 2020.03.17: Enhanced DataGenerator; refined the model code.
  • 2020.03.15: Added support for GPT2_ML.
  • 2020.03.10: Added support for Google's T5 model.
  • 2020.03.05: Renamed tokenizer.py to tokenizers.py.
  • 2020.03.05: Renamed application='seq2seq' to application='unilm'.
  • 2020.03.05: Renamed build_bert_model to build_transformer_model.
  • 2020.03.05: Restructured models.py.
  • 2020.03.04: Renamed bert.py to models.py.
  • 2020.03.02: Refactored the mask mechanism (back to Keras's built-in masking) to make it easier to build more complex applications.
  • 2020.02.22: Added the AutoRegressiveDecoder class to handle Seq2Seq decoding uniformly.
  • 2020.02.19: Changed the transformer block prefix from Encoder to Transformer, making the name less restrictive.
  • 2020.02.13: Optimized the load_vocab function; renamed the keep_words argument of build_bert_model to keep_tokens (this change may affect some scripts).
  • 2020.01.18: Adjusted text handling and removed the use of codecs.
  • 2020.01.17: With the APIs becoming stable, packaged the library on PyPI for convenience; the first packaged version is 0.4.6.
  • 2020.01.10: Rewrote the model masking scheme, making the code somewhat cleaner; backend optimizations.
  • 2019.12.27: Refactored the pre-training code to reduce redundancy; RoBERTa and GPT pre-training are currently supported, see pretraining.
  • 2019.12.17: Added support for Huawei's NEZHA weights: pass model='nezha' to build_bert_model; also changed the old ALBERT loading flag albert=True to model='albert'.
  • 2019.12.16: Restored support for keras versions below 2.3.0 by porting the nested-layer mechanism used in keras 2.3+.
  • 2019.12.14: Added Conditional Layer Normalization and a related demo.
  • 2019.12.09: Standardized the data_generator of each example; fixed a bug when application='lm'.
  • 2019.12.05: Improved the tokenizer's do_lower_case and fine-tuned the examples.
  • 2019.11.23: Renamed train.py to optimizers.py and updated many optimizer implementations, fully compatible with both keras and tf.keras.
  • 2019.11.19: Renamed utils.py to tokenizer.py.
  • 2019.11.19: After much deliberation, moved the snippets to bert4keras.snippets.
  • 2019.11.18: Improved the pre-trained weight loading logic and added a method to save model weights in BERT's checkpoint format.
  • 2019.11.17: Moved some common code snippets not directly related to BERT into python_snippets for reuse by other projects.
  • 2019.11.11: Added the NSP part.
  • 2019.11.05: Adapted to Google's ALBERT; the non-Google albert_zh is no longer supported.
  • 2019.11.05: Finished the pre-training code using RoBERTa as the example, with TPU/multi-GPU support, see roberta. Building more pre-training code on top of it is welcome.
  • 2019.11.01: Gradually adding pre-training related code, see pretraining.
  • 2019.10.28: Added support for sentencepiece-based tokenizers.
  • 2019.10.25: Introduced a native tokenizer.
  • 2019.10.22: Introduced a gradient accumulation optimizer.
  • 2019.10.21: To simplify the code structure, dropped support for keras versions before 2.3.0; only keras 2.3.0+ and tf.keras are supported.
  • 2019.10.20: By popular request, models can now be saved directly with model.save and loaded whole with load_model (just run from bert4keras.layers import * before load_model; no extra custom_objects needed).
  • 2019.10.09: Now compatible with tf.keras, tested under tf 1.13 and tf 2.0; switch to tf.keras by setting the environment variable TF_KERAS=1.
  • 2019.10.09: Now compatible with Keras 2.3.x, though only as a temporary measure; support for versions before 2.3 may be removed later.
  • 2019.10.02: Adapted to ALBERT: the albert_zh weights can be loaded by passing albert=True to load_pretrained_model.
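
A hedged sketch of where some of the optional arguments from the changelog above are passed (the flag names come from the entries; config_path and checkpoint_path are placeholders, and whether these options should be combined depends on the checkpoint, so treat this as an illustration of the call site rather than a recommended configuration):

    from bert4keras.models import build_transformer_model

    model = build_transformer_model(
        config_path,
        checkpoint_path,
        hierarchical_position=True,       # 2020.12.04 entry: hierarchical position decomposition for long texts
        residual_attention_scores=True,   # 2020.12.29 entry: RealFormer-style residual attention
    )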

Background

I had long been using CyberZHG's keras-bert. If all you want is to call BERT and fine-tune it under Keras, keras-bert is already perfectly satisfactory.

However, if you want to modify BERT's internals while still loading the official pre-trained weights, keras-bert becomes hard to work with: for the sake of reusability it wraps almost every small module as a separate package (keras-bert depends on keras-transformer, keras-transformer depends on keras-multi-head, keras-multi-head depends on keras-self-attention, and so on), and with dependencies nested this deeply, making changes is quite a headache.

So I decided to write a new Keras version of BERT, aiming to implement it completely within a few files, cutting out those dependencies while keeping the ability to load the official pre-trained weights.

Acknowledgements

Thanks to CyberZHG for implementing keras-bert; this implementation draws on the keras-bert source in quite a few places. My sincere thanks for that generous contribution.

Citation

@misc{bert4keras,
  title={bert4keras},
  author={Jianlin Su},
  year={2020},
  howpublished={\url{https://bert4keras.spaces.ac.cn}},
}

Contact

QQ group: 808623966; for the WeChat group, add the bot account spaces_ac_cn.

Comments
  • Using fine tuned Model

    I post this to check whether I'm doing the right thing.

    I just added a Dense + softmax layer to fine-tune the model:

    albert_model = build_bert_model(config_path, checkpoint_path, albert=True)
    out = Lambda(lambda x: x[:, 0])(albert_model.output)
    output = Dense(units=class_num, activation='softmax')(out)
    

    After training the model, I try to load it with

    model = load_model(model.dir)
    

    and I get an error like "missing custom layer 'TokenEmbedding'". After that, I tried

     custom_objects = {'MaskedGlobalPool1D': MaskedGlobalPool1D}
     custom_objects.update(get_bert_custom_objects())
    

    get_bert_custom_objects() comes from keras_bert and basically just defines some custom layers, while MaskedGlobalPool1D, also from keras_bert, strips the mask from the model output.

    I don't know if I'm doing this right, since the predictions are not good enough. Can someone explain what the TokenEmbedding layer is, and whether the dense layer I defined is correct?

    opened by erichen510 23
  • Pre-training RoBERTa fails after updating bert4keras to the latest version

    Environment: Python 3.6, Keras 2.3.1, TensorFlow 1.13.2. Running data_utils.py raises the following error:

    File "/root/anaconda3/envs/tf2/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
        self.run()
    File "/root/anaconda3/envs/tf2/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
    File "/root/anaconda3/envs/tf2/lib/python3.6/multiprocessing/pool.py", line 103, in worker
        initializer(*initargs)
    File "/data/sfang/Pretrain/bert4keras/snippets.py", line 166, in worker_step
        r = func(d)
    File "/data/sfang/Pretrain/pretraining/data_utils.py", line 117, in paragraph_process
        instances = self.paragraph_process(texts)
    File "/data/sfang/Pretrain/pretraining/data_utils.py", line 209, in paragraph_process
        return super(TrainingDatasetRoBERTa, self).paragraph_process(texts, starts, ends, paddings)
    File "/data/sfang/Pretrain/pretraining/data_utils.py", line 53, in paragraph_process
        sub_instance = self.sentence_process(text)
    File "/data/sfang/Pretrain/pretraining/data_utils.py", line 188, in sentence_process
        add_sep=False)
    TypeError: tokenize() got an unexpected keyword argument 'add_cls'

    opened by Fan9 17
  • Problem with model.save

    When asking a question, please provide as much of the following information as possible:

    Basic information

    • Operating system:
    • Python version: 3
    • Tensorflow version: 1.14.0-gpu
    • Keras version: 2.3.1
    • bert4keras version: 0.9.1
    • Pure keras or tf.keras: import keras
    • Pre-trained model loaded: roberta pre-training

    Core code

    # Paste your core code here.
    # Keep only the key parts; don't paste all of your code.
    ----> 1 model.save('./best_.weight')
    

    Output

    TypeError: a bytes-like object is required, not 'str'

    Self-attempt

    Whatever the problem, please try to solve it yourself first; ask only after every effort has failed. Paste your attempts here.
    
    opened by ZeKunZhang1998 16
  • Error when loading Xu Liang's (brightmart) ALBERT

    Using example/task_sentiment_albert.py to load Xu Liang's tiny ALBERT gives the following error: ValueError: Layer weight shape (21128, 312) not compatible with provided weight shape (21128, 128). Loading it with keras_bert gives the same error. What is going on?

    opened by yuhao1982 16
  • Attaching Conv1D reports "does not support masking"

    When asking a question, please provide as much of the following information as possible:

    Basic information

    • Operating system: Linux
    • Python version: 3.6
    • Tensorflow version: 2.1
    • Keras version: 2.3.1
    • bert4keras version: 0.7.0
    • Pure keras or tf.keras:
    • Pre-trained model loaded: Albert

    Core code

    albert = build_transformer_model(
                config_path = config['model_config_path'],
                checkpoint_path = config['checkpoint_path'],
                model = 'albert',
                return_keras_model = False
                )
    output = Conv1D(
                filters = 2 * config['hidden_size'],
                kernel_size = 3
                )(output)
    

    Output

    TypeError: Layer conv1d_1 does not support masking, but was passed an input_mask: Tensor("Transformer-FeedForward-Add_5/All:0", shape=(None, None), dtype=bool)

    As a downstream task, how can I attach a 1D convolution layer?

    opened by jerrysjin 14
  • AttributeError: module 'tensorflow.python.framework.ops' has no attribute '_TensorLike'

    When asking a question, please provide as much of the following information as possible:

    Basic information

    • Operating system: colab
    • Python version: 3.6
    • Tensorflow version: 2.3.0
    • Keras version: 2.3.1
    • bert4keras version: 0.8.8
    • Pure keras or tf.keras:
    • Pre-trained model loaded: bert

    Core code

    # Paste your core code here.
    # Keep only the key parts; don't paste all of your code.

    from bert4keras.models import build_transformer_model
    from bert4keras.tokenizers import Tokenizer
    import numpy as np

    config_path = '/content/drive/My Drive/roberta_zh_L-6-H-768_A-12.zip (Unzipped Files)/bert_config.json'
    checkpoint_path = '/content/drive/My Drive/roberta_zh_L-6-H-768_A-12.zip (Unzipped Files)/bert_model.ckpt.data-00000-of-00001'
    dict_path = '/content/drive/My Drive/roberta_zh_L-6-H-768_A-12.zip (Unzipped Files)/vocab.txt'

    tokenizer = Tokenizer(dict_path, do_lower_case=True)  # build the tokenizer
    model = build_transformer_model(config_path, checkpoint_path)  # build the model and load the weights

    Output

    # Paste your debug output here.

    AttributeError                            Traceback (most recent call last)
    in ()
          8
          9 tokenizer = Tokenizer(dict_path, do_lower_case=True)  # build the tokenizer
    ---> 10 model = build_transformer_model(config_path, checkpoint_path)  # build the model and load the weights
         11

    9 frames
    /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py in is_tensor(x)

    AttributeError: module 'tensorflow.python.framework.ops' has no attribute '_TensorLike'

    Self-attempt

    Whatever the problem, please try to solve it yourself first; ask only after every effort has failed. Paste your attempts here. Online, two fixes are suggested:

    1. Change "from keras.xxx import xxx" to "from tensorflow.keras.xxx import xxx" ..... (did not work)
    2. A keras / tensorflow version conflict: install versions that do not conflict .... (did not work; even the keras 2.3.1 / tf 1.14 combination recommended by the maintainer failed)
    opened by fox88-tw 11
  • Running the NER example fails: No such layer: Transformer-12-FeedForward-Norm

    Running the example task_sequence_labeling_ner_crf.py (with the data downloaded and unpacked as described in its comments) fails with:

    , line 114, in
        output = model.get_layer(output_layer).output
    File "××××/site-packages/keras/engine/network.py", line 365, in get_layer
        raise ValueError('No such layer: ' + name)
    ValueError: No such layer: Transformer-12-FeedForward-Norm

    Both a regular RoBERTa model and ZhuiyiTechnology's own small model hit this problem.

    opened by yaleimeng 11
  • Serving bert4keras models with TensorFlow Serving

    I played with gpt2-ml via bert4keras and found reloading the model for every generation too cumbersome, so I tried serving the bert4keras model with TF Serving: https://gist.github.com/liprais/78ef34ac0779fb6b604eae044d789f90

    0. Based on tf 2.0

    1. Export the existing model in pb (SavedModel) format

    The INFO:tensorflow:Unsupported signature for serialization message during saving can apparently be ignored.

    import os
    os.environ['TF_KERAS'] = '1'
    import numpy as np
    from bert4keras.backend import K as K
    from bert4keras.models import build_transformer_model
    from bert4keras.tokenizers import Tokenizer
    from bert4keras.snippets import AutoRegressiveDecoder
    from keras.models import load_model
    import tensorflow as tf
    from tensorflow.python.framework.ops import disable_eager_execution
    disable_eager_execution()
    model = 'gptml.h5'
    base = '/Volumes/Untitled/pb'
    keras_model = load_model(model,compile=False)
    keras_model.save(base + '/150k/1', save_format='tf')  # <==== note: the trailing 1 in the model path is the version number; without it TF Serving reports that no servable model can be found
    

    2. Start the server with Docker

    docker run -p 8501:8501 --mount type=bind,source=/Volumes/Untitled/pb/150k/,target=/models/my_model -e MODEL_NAME=my_model -t tensorflow/serving
    

    Check the model's metadata at http://localhost:8501/v1/models/my_model/metadata; the inputs field describes the parameters the API expects:

    "inputs": { "Input-Token": { "dtype": "DT_FLOAT","tensor_shape": {"dim": [{"size": "-1","name": ""},{"size": "-1","name": ""}],"unknown_rank": false},"name": "serving_default_Input-Token:0"}}
    

    Alternatively, without Docker, on Ubuntu you can install tensorflow_model_server and then run:

    tensorflow_model_server --model_base_path="/Volumes/Untitled/pb/150k" --rest_api_port=8501 --model_name="my_model"
    

    3. Call it with requests

    import requests
    import json
    payload = [[1921,7471,5682,5023,4170,7433]]  # <=== the tokenizer-encoded Chinese text: 天青色等烟雨
    d = {"signature_name": "serving_default",
         "inputs": {"Input-Token": [[1921,7471,5682,5023,4170,7433]]}}  # <=== payload
    r = requests.post('http://127.0.0.1:8501/v1/models/my_model:predict',json=d)
    print(r.json())
    

    4. Modify the class to add a remote-call method

    Using https://github.com/bojone/bert4keras/blob/master/examples/basic_language_model_gpt2_ml.py as the example; requests and numpy are needed first.

    import requests 
    import numpy
    class ArticleCompletion(AutoRegressiveDecoder):
        """基于随机采样的文章续写
        """
        @AutoRegressiveDecoder.wraps(default_rtype='probas')
        def predict(self, inputs, output_ids, states):
            token_ids = np.concatenate([inputs[0], output_ids], 1)
            model_output = self.remote_call(token_ids)
            return model_output[:, -1]
    
    
        def generate(self, text, n=1, topk=5):
            token_ids, _ = tokenizer.encode(text)
            results = self.random_sample([token_ids], n, topk)  # 基于随机采样
            return [text + tokenizer.decode(ids) for ids in results]
           
        def remote_call(self,token_ids):
            payload = token_ids.tolist()
            d = {"signature_name": "serving_default","inputs": {"Input-Token":payload}}
            r = requests.post('http://127.0.0.1:8501/v1/models/my_model:predict',json=d)
            return numpy.array(r.json()['outputs'])
    

    Then it runs. Just a humble attempt; hope it helps.

    opened by liprais 10
  • Pre-training questions

    When asking a question, please provide as much of the following information as possible:

    Basic information

    • Operating system: linux
    • Python version: 3.6.4
    • Tensorflow version: 1.14.0
    • Keras version: 2.2.4
    • bert4keras version: 0.6.1
    • Pure keras or tf.keras: tf.keras
    • Pre-trained model loaded: google/bert

    Core code

    # This is my modified some_texts() in data_utils.py
        def some_texts():
            filenames = glob.glob('XXX.txt')
            np.random.shuffle(filenames)
            count, texts = 0, []
            for filename in filenames:
                with open(filename) as f:
                    for l in f:
                        l = l.strip()
                        texts.append(l)
                        yield texts
                        texts = []
    

    Output

    # Paste your debug output here.
    

    Self-attempt

    Whatever the problem, please try to solve it yourself first; ask only after every effort has failed. Paste your attempts here.

    Hello Su, I have recently been using bert4keras to further pre-train on my own task corpus. My problem in one sentence: I'm not sure how to adjust the configuration and parameters in data_utils.py and pretraining.py to fit my own corpus. Below are my specific questions and experiments; please advise.

    1. Task background: (a) Binary classification of comments; the comments are short texts, mostly 10-30 characters long. With BERT I get f1 = 66% and believe there is still considerable room for improvement, so I want to pre-train BERT further on a larger, related comment corpus. (b) The corpus is 15 million comments longer than 10 characters; following the data_utils.py logic they are repeated 10 times, i.e. 150 million instances. (c) Pre-training follows the RoBERTa recipe.

    2. Changes to the data generation logic in data_utils.py: the original code extracts text from 10 documents and assembles each complete_instance to be as close as possible to sequence_length=512. Since my corpus consists of short texts without the sentence-to-sentence coherence of articles, I changed sequence_length to 40 and let each complete_instance consist of a single short text. Does this have any negative side effects?

    3. Parameters in pretraining.py: I looked up the meaning of every configuration parameter but still don't know how to set them for my corpus; could you explain? (a) num_train_steps: google/bert's run_pretraining.py simply hard-codes it without explanation. (b) num_warmup_steps: google/bert sets it to 10% of num_train_steps; why does our code use num_warmup_steps = 3125? (c) lr_schedule: in google/bert the learning rate rises from 0 to lr and then stays constant, whereas ours decays to 0. This is probably related to our relatively large lr = 0.00176, but why this choice?

    4. Experiments: (a) The model is initialized from google/bert's weights. (b) With batch_size = 2048 and the other pretraining.py defaults, training.log shows the accuracy peaking just under 37%. You mentioned in issue 84 that acc is usually 50%-60%. Where do you think my problem lies?

    epoch,loss,mlm_acc_loss,mlm_loss_loss
    0,4.1459182634592056,0.34518507,3.8007286
    1,3.9495830785989763,0.3676725,3.5819125
    2,3.9327449764490128,0.3699239,3.562816
    3,3.951032319355011,0.3666434,3.584382
    4,3.9811926936149598,0.36270177,3.6184883
    5,3.997273787522316,0.36036325,3.6369083
    

    (c) With batch_size = 2048, steps_per_epoch = 100000 and the other pretraining.py defaults, the results are roughly unchanged.

    epoch,loss,mlm_acc_loss,mlm_loss_loss
    0,3.986297978975773,0.36288327,3.623436
    1,3.9343528621816635,0.37045658,3.5639102
    

    (d) After pre-training and then fine-tuning on the downstream task, the metrics barely change: f1 improves by about 1 point. Given the baseline f1 = 66%, I'd expect a larger gain.

    Looking forward to your reply, thank you.

    opened by SuMeng123 9
  • Checkpoints saved by the pretraining code cannot be loaded directly

    Environment: tensorflow-gpu 1.14.0, keras 2.2.4

    No multi-GPU; pre-training was done on a single GPU:

    # single machine, multiple GPUs (disabled)
    # strategy = tf.distribute.MirroredStrategy()
    # with strategy.scope():
    train_model = build_transformer_model_for_pretraining()
    #train_model.summary()
    

    The generated checkpoint files:

    bert_model.data-00000-of-00002
    bert_model.data-00001-of-00002
    bert_model.index
    checkpoint

    The model is loaded with:

    bert_model = build_transformer_model(config_path=Config.bert_config_path, checkpoint_path=Config.bert_ckpt_path, model='roberta')

    Loading fails with:

    tensorflow.python.framework.errors_impl.NotFoundError: Key bert/embeddings/word_embeddings not found in checkpoint

    After some prodding and guidance from the maintainer, here is the solution:

    train_model, bert = build_transformer_model_for_pretraining()
    # 模型训练
    train_model.fit(
        dataset,
        steps_per_epoch=steps_per_epoch,
        epochs=epochs,
        callbacks=[checkpoint, csv_logger],
    )
    
    train_model.load_weights(model_saved_path)
    bert.save_weights_as_checkpoint(filename='bert_model/bert_model.ckpt')
    
    opened by fengxin619 9
  • task_sequence_labeling_ner_crf.py performs far worse under v0.7.4 than under v0.5.0; I suspect a bug

    When asking a question, please provide as much of the following information as possible:

    Basic information

    • Operating system: linux, redhat
    • Python version: 3.7
    • Tensorflow version: 2.1
    • Keras version:
    • bert4keras version: v0.5.0 and v0.7.4
    • Pure keras or tf.keras: tf.keras
    • Pre-trained model loaded: chinese_L-12_H-768_A-12

    Core code

    Code under v0.7.4

    import numpy as np
    from bert4keras.backend import keras, K
    from bert4keras.models import build_transformer_model
    from bert4keras.tokenizers import Tokenizer
    from bert4keras.optimizers import Adam
    from bert4keras.snippets import sequence_padding, DataGenerator
    from bert4keras.snippets import open, ViterbiDecoder
    from bert4keras.layers import ConditionalRandomField
    from keras.layers import Dense
    from keras.models import Model
    from tqdm import tqdm
    from tensorflow.compat.v1 import ConfigProto
    from tensorflow.compat.v1 import InteractiveSession
    
    config = ConfigProto()
    # config.gpu_options.per_process_gpu_memory_fraction = 0.2
    config.gpu_options.allow_growth = True
    session = InteractiveSession(config=config)
    
    
    
    
    maxlen = 256
    epochs = 10
    batch_size = 32
    bert_layers = 12
    learing_rate = 1e-5  # bert_layers越小,学习率应该要越大
    crf_lr_multiplier = 1000  # 必要时扩大CRF层的学习率
    
    # bert配置
    config_path = '/home/dengyong/chinese_L-12_H-768_A-12/bert_config.json'
    checkpoint_path = '/home/dengyong/chinese_L-12_H-768_A-12/bert_model.ckpt'
    dict_path = '/home/dengyong/chinese_L-12_H-768_A-12/vocab.txt'
    
    
    def load_data(filename):
        D = []
        with open(filename, encoding='utf-8') as f:
            f = f.read()
            for l in f.split('\n\n'):
                if not l:
                    continue
                d, last_flag = [], ''
                for c in l.split('\n'):
                    char, this_flag = c.split(' ')
                    if this_flag == 'O' and last_flag == 'O':
                        d[-1][0] += char
                    elif this_flag == 'O' and last_flag != 'O':
                        d.append([char, 'O'])
                    elif this_flag[:1] == 'B':
                        d.append([char, this_flag[2:]])
                    else:
                        d[-1][0] += char
                    last_flag = this_flag
                D.append(d)
        return D
    
    
    # 标注数据
    train_data = load_data('/home/dengyong/NER/data/china-people-daily-ner-corpus/example.train')
    valid_data = load_data('/home/dengyong/NER/data/china-people-daily-ner-corpus/example.dev')
    test_data = load_data('/home/dengyong/NER/data/china-people-daily-ner-corpus/example.test')
    
    # 建立分词器
    tokenizer = Tokenizer(dict_path, do_lower_case=True)
    
    # 类别映射
    labels = ['PER', 'LOC', 'ORG']
    id2label = dict(enumerate(labels))
    label2id = {j: i for i, j in id2label.items()}
    num_labels = len(labels) * 2 + 1
    
    
    class data_generator(DataGenerator):
        """数据生成器
        """
        def __iter__(self, random=False):
            batch_token_ids, batch_segment_ids, batch_labels = [], [], []
            for is_end, item in self.sample(random):
                token_ids, labels = [tokenizer._token_start_id], [0]
                for w, l in item:
                    w_token_ids = tokenizer.encode(w)[0][1:-1]
                    if len(token_ids) + len(w_token_ids) < maxlen:
                        token_ids += w_token_ids
                        if l == 'O':
                            labels += [0] * len(w_token_ids)
                        else:
                            B = label2id[l] * 2 + 1
                            I = label2id[l] * 2 + 2
                            labels += ([B] + [I] * (len(w_token_ids) - 1))
                    else:
                        break
                token_ids += [tokenizer._token_end_id]
                labels += [0]
                segment_ids = [0] * len(token_ids)
                batch_token_ids.append(token_ids)
                batch_segment_ids.append(segment_ids)
                batch_labels.append(labels)
                if len(batch_token_ids) == self.batch_size or is_end:
                    batch_token_ids = sequence_padding(batch_token_ids)
                    batch_segment_ids = sequence_padding(batch_segment_ids)
                    batch_labels = sequence_padding(batch_labels)
                    yield [batch_token_ids, batch_segment_ids], batch_labels
                    batch_token_ids, batch_segment_ids, batch_labels = [], [], []
    
    
    """
    后面的代码使用的是bert类型的模型,如果你用的是albert,那么前几行请改为:
    model = build_transformer_model(
        config_path,
        checkpoint_path,
        model='albert',
    )
    output_layer = 'Transformer-FeedForward-Norm'
    output = model.get_layer(output_layer).get_output_at(bert_layers - 1)
    """
    
    model = build_transformer_model(
        config_path,
        checkpoint_path,
    )
    
    output_layer = 'Transformer-%s-FeedForward-Norm' % (bert_layers - 1)
    output = model.get_layer(output_layer).output
    output = Dense(num_labels)(output)
    CRF = ConditionalRandomField(lr_multiplier=crf_lr_multiplier)
    output = CRF(output)
    
    model = Model(model.input, output)
    model.summary()
    
    model.compile(
        loss=CRF.sparse_loss,
        optimizer=Adam(learing_rate),
        metrics=[CRF.sparse_accuracy]
    )
    
    
    class NamedEntityRecognizer(ViterbiDecoder):
        """命名实体识别器
        """
        def recognize(self, text):
            tokens = tokenizer.tokenize(text)
            while len(tokens) > 512:
                tokens.pop(-2)
            mapping = tokenizer.rematch(text, tokens)
            token_ids = tokenizer.tokens_to_ids(tokens)
            segment_ids = [0] * len(token_ids)
            nodes = model.predict([[token_ids], [segment_ids]])[0]
            labels = self.decode(nodes)
            entities, starting = [], False
            for i, label in enumerate(labels):
                if label > 0:
                    if label % 2 == 1:
                        starting = True
                        entities.append([[i], id2label[(label - 1) // 2]])
                    elif starting:
                        entities[-1][0].append(i)
                    else:
                        starting = False
                else:
                    starting = False
    
            return [(text[mapping[w[0]][0]:mapping[w[-1]][-1] + 1], l)
                    for w, l in entities]
    
    
    NER = NamedEntityRecognizer(trans=K.eval(CRF.trans), starts=[0], ends=[0])
    
    
    def evaluate(data):
        """评测函数
        """
        X, Y, Z = 1e-10, 1e-10, 1e-10
        for d in tqdm(data):
            text = ''.join([i[0] for i in d])
            R = set(NER.recognize(text))
            T = set([tuple(i) for i in d if i[1] != 'O'])
            X += len(R & T)
            Y += len(R)
            Z += len(T)
        f1, precision, recall = 2 * X / (Y + Z), X / Y, X / Z
        return f1, precision, recall
    
    
    class Evaluate(keras.callbacks.Callback):
        def __init__(self):
            self.best_val_f1 = 0
    
        def on_epoch_end(self, epoch, logs=None):
            trans = K.eval(CRF.trans)
            NER.trans = trans
            print(NER.trans)
            f1, precision, recall = evaluate(valid_data)
            # 保存最优
            if f1 >= self.best_val_f1:
                self.best_val_f1 = f1
                model.save_weights('best_model.weights')
            print(
                'valid:  f1: %.5f, precision: %.5f, recall: %.5f, best f1: %.5f\n' %
                (f1, precision, recall, self.best_val_f1)
            )
            f1, precision, recall = evaluate(test_data)
            print(
                'test:  f1: %.5f, precision: %.5f, recall: %.5f\n' %
                (f1, precision, recall)
            )
    
    
    if __name__ == '__main__':
    
        evaluator = Evaluate()
        train_generator = data_generator(train_data, batch_size)
    
        model.fit_generator(
            train_generator.forfit(),
            steps_per_epoch=len(train_generator),
            epochs=epochs,
            callbacks=[evaluator]
        )
    
    else:
    
        model.load_weights('best_model.weights')
    

    Code under v0.5.0

    # 下载人民日报数据集
    ! wget http://s3.bmio.net/kashgari/china-people-daily-ner-corpus.tar.gz
    # 解压
    ! unzip china-people-daily-ner-corpus.tar.gz
    # 下载bert
    ! wget https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip  
    # 解压
    ! unzip chinese_L-12_H-768_A-12.zip
    # pip下载bert4keras包
    !pip install bert4keras==0.5.0
    !tar -zxvf  china-people-daily-ner-corpus.tar.gz
    !ls china-people-daily-ner-corpus
    
    # 代码参考:
    import re, os, json
    import numpy as np
    from bert4keras.backend import keras, K
    from bert4keras.bert import build_bert_model
    from bert4keras.tokenizer import Tokenizer
    from bert4keras.optimizers import Adam
    from bert4keras.snippets import sequence_padding, DataGenerator
    from bert4keras.snippets import open
    from bert4keras.layers import ConditionalRandomField
    from keras.layers import Dense
    from keras.models import Model
    from tqdm import tqdm
    
    # 标注数据
    def load_data(filename):
        D = []
        with open(filename, encoding='utf-8') as f:
            f = f.read()
            # f = f.replace('  ','。 ') # 存疑
            f = f.replace('  O','可 O') # 存疑
            f = f.replace('E-','I-') # 存疑 #因为没有E-只有I-
            for l in f.split('\n\n'):
                if not l:
                    continue
                d, last_flag = [], ''
                for c in l.split('\n'):
                    char, this_flag = c.split(' ')
                    if this_flag == 'O' and last_flag == 'O':
                        d[-1][0] += char
                    elif this_flag == 'O' and last_flag != 'O':
                        d.append([char, 'O'])
                    elif this_flag[:1] == 'B':
                        d.append([char, this_flag[2:]])
                    else:
                        d[-1][0] += char
                    last_flag = this_flag
                D.append(d)
        return D
        
    # 载入数据,注意已经把下载好的人民日报数据放到了./sample_data文件夹内
    train_data = load_data('./china-people-daily-ner-corpus/example.train')
    valid_data = load_data('./china-people-daily-ner-corpus/example.dev')
    test_data = load_data('./china-people-daily-ner-corpus/example.test')
    
    # bert配置
    config_path = './chinese_L-12_H-768_A-12/bert_config.json'
    checkpoint_path = './chinese_L-12_H-768_A-12/bert_model.ckpt'
    dict_path = './chinese_L-12_H-768_A-12/vocab.txt'
    
    # 建立分词器
    tokenizer = Tokenizer(dict_path, do_lower_case=True)
    
    # 类别映射,数据集改变时可能需要注意调整,否则会在下一个代码块报错提示出现未出现的标签
    classes = set(['PER', 'LOC', 'ORG'])
    
    id2class = dict(enumerate(classes))
    class2id = {j: i for i, j in id2class.items()}
    num_labels = len(classes) * 2 + 1
    
    maxlen = 256
    epochs = 10
    batch_size = 32
    bert_layers = 12
    learing_rate = 1e-5  # bert_layers越小,学习率应该要越大
    crf_lr_multiplier = 1000  # 必要时扩大CRF层的学习率
    
    class data_generator(DataGenerator):
        """数据生成器
        """
        def __iter__(self, random=False):
            idxs = list(range(len(self.data)))
            if random:
                np.random.shuffle(idxs)
            batch_token_ids, batch_segment_ids, batch_labels = [], [], []
            for i in idxs:
                token_ids, labels = [tokenizer._token_cls_id], [0]
                for w, l in self.data[i]:
                    w_token_ids = tokenizer.encode(w)[0][1:-1]
                    if len(token_ids) + len(w_token_ids) < maxlen:
                        token_ids += w_token_ids
                        if l == 'O':
                            labels += [0] * len(w_token_ids)
                        else:
                            B = class2id[l] * 2 + 1
                            I = class2id[l] * 2 + 2
                            labels += ([B] + [I] * (len(w_token_ids) - 1))
                    else:
                        break
                token_ids += [tokenizer._token_sep_id]
                labels += [0]
                segment_ids = [0] * len(token_ids)
                batch_token_ids.append(token_ids)
                batch_segment_ids.append(segment_ids)
                batch_labels.append(labels)
                if len(batch_token_ids) == self.batch_size or i == idxs[-1]:
                    batch_token_ids = sequence_padding(batch_token_ids)
                    batch_segment_ids = sequence_padding(batch_segment_ids)
                    batch_labels = sequence_padding(batch_labels)
                    yield [batch_token_ids, batch_segment_ids], batch_labels
                    batch_token_ids, batch_segment_ids, batch_labels = [], [], []
    
    model = build_bert_model(
        config_path,
        checkpoint_path,
    )
    
    output_layer = 'Encoder-%s-FeedForward-Norm' % bert_layers
    output = model.get_layer(output_layer).output
    output = Dense(num_labels)(output)
    CRF = ConditionalRandomField(lr_multiplier=crf_lr_multiplier)
    output = CRF(output, mask='Sequence-Mask')
    
    model = Model(model.input, output)
    model.summary()
    
    model.compile(loss=CRF.sparse_loss,
                  optimizer=Adam(learing_rate),
                  metrics=[CRF.sparse_accuracy])
    def viterbi_decode(nodes, trans):
        """Viterbi算法求最优路径
        其中nodes.shape=[seq_len, num_labels],
            trans.shape=[num_labels, num_labels].
        """
        labels = np.arange(num_labels).reshape((1, -1))
        scores = nodes[0].reshape((-1, 1))
        scores[1:] -= np.inf  # 第一个标签必然是0
        paths = labels
        for l in range(1, len(nodes)):
            M = scores + trans + nodes[l].reshape((1, -1))
            idxs = M.argmax(0)
            scores = M.max(0).reshape((-1, 1))
            paths = np.concatenate([paths[:, idxs], labels], 0)
        return paths[:, scores[0].argmax()]
    
    
    def named_entity_recognize(text):
        """命名实体识别函数
        """
        tokens = tokenizer.tokenize(text)
        while len(tokens) > 512:
            tokens.pop(-2)
        token_ids = tokenizer.tokens_to_ids(tokens)
        segment_ids = [0] * len(token_ids)
        nodes = model.predict([[token_ids], [segment_ids]])[0]
        trans = K.eval(CRF.trans)
        labels = viterbi_decode(nodes, trans)[1:-1]
        entities, starting = [], False
        for token, label in zip(tokens[1:-1], labels):
            if label > 0:
                if label % 2 == 1:
                    starting = True
                    entities.append([[token], id2class[(label - 1) // 2]])
                elif starting:
                    entities[-1][0].append(token)
                else:
                    starting = False
            else:
                starting = False
        return [(tokenizer.decode(w, w).replace(' ', ''), l) for w, l in entities]
    
    
    def evaluate(data):
        """评测函数
        """
        X, Y, Z = 1e-10, 1e-10, 1e-10
        for d in tqdm(data):
            text = ''.join([i[0] for i in d])
            R = set(named_entity_recognize(text))
            T = set([tuple(i) for i in d if i[1] != 'O'])
            X += len(R & T)
            Y += len(R)
            Z += len(T)
        f1, precision, recall = 2 * X / (Y + Z), X / Y, X / Z
        return f1, precision, recall
    
    
    class Evaluate(keras.callbacks.Callback):
        def __init__(self):
            self.best_val_f1 = 0
    
        def on_epoch_end(self, epoch, logs=None):
            trans = K.eval(CRF.trans)
            print(trans)
            f1, precision, recall = evaluate(valid_data)
            # 保存最优
            if f1 >= self.best_val_f1:
                self.best_val_f1 = f1
                model.save_weights('./best_model.weights')
            print('valid:  f1: %.5f, precision: %.5f, recall: %.5f, best f1: %.5f\n' %
                  (f1, precision, recall, self.best_val_f1))
            f1, precision, recall = evaluate(test_data)
            print('test:  f1: %.5f, precision: %.5f, recall: %.5f\n' %
                  (f1, precision, recall))
    evaluator = Evaluate()
    batch_size = 10
    epochs = 10
    train_generator = data_generator(train_data, batch_size)
    # 进行训练
    history = model.fit_generator(train_generator.forfit(),
                    steps_per_epoch=len(train_generator),
                    epochs=epochs,
                    callbacks=[evaluator])
    
    

    Output under v0.7.4

    
    646/652 [============================>.] - ETA: 3s - loss: 2650.9130 - sparse_accuracy: 0.3308
    647/652 [============================>.] - ETA: 2s - loss: 2650.2720 - sparse_accuracy: 0.3308
    648/652 [============================>.] - ETA: 2s - loss: 2648.9824 - sparse_accuracy: 0.3307
    649/652 [============================>.] - ETA: 1s - loss: 2647.8732 - sparse_accuracy: 0.3307
    650/652 [============================>.] - ETA: 1s - loss: 2648.1222 - sparse_accuracy: 0.3307
    651/652 [============================>.] - ETA: 0s - loss: 2649.2296 - sparse_accuracy: 0.3307
    652/652 [==============================] - 328s 503ms/step - loss: 2648.2236 - sparse_accuracy: 0.3306
    

    Output under v0.5.0

    /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/indexed_slices.py:434: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
      "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
    Epoch 1/10
     996/2087 [=============>................] - ETA: 8:20 - loss: 5521.2185 - sparse_accuracy: 0.9103
    

    Self-attempt

    Seeing the very low accuracy under 0.7.4, far from the author's reported results, I suspected a version problem. After switching to 0.5.0 the accuracy rose to the level the author reports, so I suspect something is wrong in this version of the code...

    opened by bestpredicts 9
  •  from bert4keras.models import build_transformer_model

    from bert4keras.models import build_transformer_model
    File "/usr/local/lib/python3.8/dist-packages/bert4keras/models.py", line 5, in
        from bert4keras.backend import get_available_gpus
    File "/usr/local/lib/python3.8/dist-packages/bert4keras/backend.py", line 513, in
        keras.utils.get_custom_objects().update(custom_objects)
    AttributeError: module 'keras.utils' has no attribute 'get_custom_objects'

    opened by Ganga-s 1
  • where is chinese_nezha_gpt_L-12_H-768_A-12 ?

    When asking a question, please provide as much of the following information as possible:

    Basic information

    • Operating system:
    • Python version:
    • Tensorflow version:
    • Keras version:
    • bert4keras version:
    • Pure keras or tf.keras:
    • Pre-trained model loaded:

    Core code

    # Paste your core code here.
    # Keep only the key parts; don't paste all of your code.

    Output

    # Paste your debug output here.

    Self-attempt

    Whatever the problem, please try to solve it yourself first; ask only after every effort has failed. Paste your attempts here.

    opened by KangChou 1
  • Running example/basic_language_model_gpt2_ml.py fails during generation with ValueError: Error when checking model input

    When asking a question, please provide as much of the following information as possible:

    Basic information

    • Operating system: Win10
    • Python version: 3.7.9
    • Tensorflow version: 1.14.0
    • Keras version: 2.3.1
    • bert4keras version: 0.11.4
    • Pure keras or tf.keras: pure keras
    • Pre-trained model loaded: roberta_zh_L-6-H-768_A-12, from https://github.com/brightmart/roberta_zh

    Core code

    # The original basic_language_model_gpt2_ml.py, with only the model argument changed to 'roberta'
    class ArticleCompletion(AutoRegressiveDecoder):
        """基于随机采样的文章续写
        """
        @AutoRegressiveDecoder.wraps(default_rtype='probas')
        def predict(self, inputs, output_ids, states):
            token_ids = np.concatenate([inputs[0], output_ids], 1)
            return self.last_token(model).predict(token_ids)
    
        def generate(self, text, n=1, topp=0.95):
            token_ids, _ = tokenizer.encode(text)
            results = self.random_sample([token_ids], n, topp=topp)  # 基于随机采样
            return [text + tokenizer.decode(ids) for ids in results]
    
    article_completion = ArticleCompletion(
        start_id=None,
        end_id=511,  # 511是中文句号
        maxlen=256,
        minlen=128
    )
    print(article_completion.generate(u'今天天气不错'))
    

    Output

    ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 2 array(s), but instead got the following list of 1 arrays: [array([[ 791, 1921, 1921, 3698,  679, 7231]])]
    

    Self-attempt

    Issue #446 says there are problems with tf 2.0, but I downgraded to 1.15 and 1.14 and still get the error, so I'm asking for help. Thanks.

    opened by nameless0704 1
  • Error when importing build_transformer_model

    Hi, when I try to import build_transformer_model I get 'ModuleNotFoundError: No module named 'keras.api''. I tried pip install keras==2.9.0 but the error remains. How should I solve this? Basic system information below:

    Basic information

    • Operating system: colab
    • Python version: 3.7
    • Tensorflow version: 2.9.0
    • Keras version: 2.3.1
    • bert4keras version: 0.11.4

    Core code

    from bert4keras.models import build_transformer_model

    Output

    ModuleNotFoundError: No module named 'keras.api'
    
    opened by whats0mai0name 1
  • bert4keras 0.11.4 fails to load the GAU model

    Loading the GAU model with bert4keras 0.11.4 raises an error; downgrading to 0.11.3 works fine.

    Error message:
    ValueError: Shapes (128,) and (1536, 768) are incompatible

    Loading code:

    pre_model = build_transformer_model(
        config_path=config_path,
        checkpoint_path=checkpoint_path,
        model=GAU_alpha,
        return_keras_model=False
    )
    

    The GAU_alpha model:
    https://github.com/ZhuiyiTechnology/GAU-alpha

    opened by tianyunzqs 0
  • k-fold cross-validation

    When asking a question, please provide as much of the following information as possible:

    Basic information

    • Operating system: ubuntu
    • Python version: 3.6.13
    • Tensorflow version: nvidia-tensorflow 1.15.4+nv20.10
    • Keras version: Keras 2.3.1
    • bert4keras version: 0.11.3
    • Pure keras or tf.keras: keras
    • Pre-trained model loaded: wobert

    Core code

    # Paste your core code here.
    # Keep only the key parts; don't paste all of your code.
        num_splits = 2
        kf = KFold(n_splits=num_splits, shuffle=True, random_state=2022)
        fold = 0
        for fold,(train_index, val_index) in enumerate( kf.split(data)):
            fold += 1
            print("="*80)
            print(f"正在训练第 {fold} 折的数据")
    
            
            # 划分训练集和验证集
            train_data = [data[i] for i in train_index]
            valid_data = [data[i] for i in val_index]
    
            model_savepath = f'../best_model/best_model_fold{fold}.weights'
            model = build_model()    # 构建模型
            train_generator = data_generator(train_data, batch_size)
            evaluator = Evaluator(valid_data,model_savepath,model)
    
            model.fit(
                train_generator.forfit(),
                steps_per_epoch=len(train_generator),
                epochs=epochs,
                callbacks=[evaluator]
                )
    
            do_predict(model_savepath,fold,model)
    
            del model, train_data, valid_data
            K.clear_session()
            gc.collect()
    

    Output

    # Paste your debug output here.
    ValueError: Tensor("Cast:0", shape=(), dtype=float32) must be from the same graph as Tensor("loss/efficient_global_pointer_loss/strided_slice:0", shape=(?, ?, ?), dtype=float32).
    

    Self-attempt

    Whatever the problem, please try to solve it yourself first; ask only after every effort has failed. Paste your attempts here. I want to run cross-validation, and to prevent data leakage from the previous fold's cached model I need to clear it with clear_session(), but I keep getting the error above. I searched for related issues online and still cannot solve it.

    opened by Superxiaoxin-522 1