TACo: A Data Augmentation Technique for Text Recognition
2022-06-28 03:55:00 【陈壮实的编程生活】
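1. Introduction
TACo is a data augmentation technique that corrupts vertical or horizontal strips (tiles) of the input image in order to improve the model's ability to generalize. There are four corruption types, [random, black, white, mean], and two corruption directions, [vertical, horizontal].
Source code: https://github.com/kartikgill/taco-box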
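To make the four corruption types concrete, the sketch below fills a small tile with each of them. `fill_tile` is a hypothetical helper written only for this illustration and is not part of taco-box; it assumes 8-bit grayscale values in the range [0, 255].

```python
import numpy as np


def fill_tile(tile, kind, rng=None):
    """Return a corrupted copy of `tile` (hypothetical helper, not the taco-box API)."""
    rng = rng if rng is not None else np.random.default_rng()
    if kind == "random":
        return rng.random(tile.shape) * 255      # uniform noise in [0, 255)
    if kind == "black":
        return np.zeros(tile.shape)              # every pixel set to 0
    if kind == "white":
        return np.full(tile.shape, 255.0)        # every pixel set to 255
    if kind == "mean":
        return np.full(tile.shape, tile.mean())  # every pixel set to the tile's mean
    raise ValueError(f"unsupported corruption type: {kind}")


tile = np.array([[0, 100, 200],
                 [50, 150, 250]], dtype=np.uint8)
for kind in ("random", "black", "white", "mean"):
    corrupted = fill_tile(tile, kind, np.random.default_rng(0))
    print(kind, corrupted.round(1).tolist())
```

Only the random type depends on the random generator; the other three are fully determined by the tile's shape and, for mean, its contents.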
2. Illustration
(1) Original image:
(2) Image after corruption:
3. Corruption steps (using vertical and random as the example)
Step 1: Check that the input is a 2-D grayscale image, because only 2-D grayscale images are corrupted;
if len(image.shape) < 2 or len(image.shape) > 3:  # make sure the input is a 2-D grayscale image
    raise Exception("Input image with Invalid Shape!")
if len(image.shape) == 3:
    raise Exception("Only Gray Scale Images are supported!")
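Because a 3-D array is rejected, a color image has to be reduced to a single channel before it is augmented. The following is a minimal sketch, assuming an H x W x 3 RGB array and the common ITU-R BT.601 luma weights (0.299, 0.587, 0.114). `to_grayscale` is a hypothetical helper and is not part of taco-box.

```python
import numpy as np


def to_grayscale(image):
    """Return a 2-D array; hypothetical helper, assumes RGB channel order."""
    image = np.asarray(image)
    if image.ndim == 2:
        return image                               # already single-channel
    if image.ndim == 3 and image.shape[2] == 3:
        weights = np.array([0.299, 0.587, 0.114])  # BT.601 luma weights
        gray = image.astype(np.float64) @ weights  # (H, W, 3) @ (3,) -> (H, W)
        return np.clip(np.rint(gray), 0, 255).astype(np.uint8)
    raise ValueError(f"expected shape (H, W) or (H, W, 3), got {image.shape}")


rgb = np.full((4, 6, 3), 255, dtype=np.uint8)      # a white 4x6 color image
gray = to_grayscale(rgb)
print(gray.shape, gray.dtype)
```

Image libraries such as OpenCV or Pillow offer their own conversions; the NumPy version is used here only to keep the sketch dependency-free.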
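Step 2: Pick a random integer between the preset minimum and maximum tile width and use it as the tile width. random.randint includes both endpoints, and the width is drawn once per call, so all tiles cut from one image share the same width;
if orientation == 'vertical':
    tiles = []
    start = 0
    tile_width = random.randint(min_tw, max_tw)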
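Step 3: Slice the original image into tiles of that width and, for each tile, decide according to the preset corruption probability whether to corrupt it;
    while start < (img_w - 1):
        tile = image[:, start:start+min(img_w-start-1, tile_width)]
        if random.random() <= self.corruption_probability_vertical:  # corrupt the tile when the random draw is within the preset probability
            tile = self._corrupted_tile(tile, corruption_type)
        tiles.append(tile)
        start = start + tile_width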
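The slicing arithmetic in this loop is easier to follow with the tile boundaries written out. The sketch below reproduces only the index computation of the loop above; `tile_bounds` is a hypothetical name used for illustration. Note that the `- 1` in both the loop condition and the `min(...)` call means the last column is never included, so the stitched result is one pixel narrower than the input.

```python
def tile_bounds(img_w, tile_width):
    """Return the (start, end) column range of each tile, mirroring the loop above."""
    bounds = []
    start = 0
    while start < (img_w - 1):
        end = start + min(img_w - start - 1, tile_width)
        bounds.append((start, end))
        start += tile_width
    return bounds


print(tile_bounds(10, 4))  # [(0, 4), (4, 8), (8, 9)]
```

For a width of 10 and a tile width of 4, the tiles cover columns 0 to 8, which is 9 columns in total; column 9 is dropped.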
Step 4: Concatenate the tiles and return the composed image (i.e. the augmented image).
    augmented_image = np.hstack(tiles)
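One detail worth checking after stitching is the dtype. The corrupted tiles are built with floating-point NumPy calls, so when they are stacked next to untouched tiles from a uint8 image, NumPy promotes the whole result to float64. The sketch below shows this behavior and one possible way to cast back, assuming the input is an 8-bit image; the variable names are illustrative only.

```python
import numpy as np

clean_tile = np.full((2, 3), 120, dtype=np.uint8)  # untouched tile from a uint8 image
corrupted_tile = np.ones((2, 2)) * 255              # 'white' corruption, float64

stitched = np.hstack([clean_tile, corrupted_tile])
print(stitched.dtype)                               # float64

# Cast back to 8-bit if the rest of the pipeline expects uint8 images.
as_uint8 = np.clip(stitched, 0, 255).astype(np.uint8)
print(as_uint8.dtype, as_uint8.shape)               # uint8 (2, 5)
```

If no tile happens to be corrupted, the result keeps the input dtype, so the output dtype can differ from call to call unless it is cast explicitly.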
4. Source code
import matplotlib.pyplot as plt
import random
import numpy as np

class Taco:
    def __init__(self,
                 cp_vertical=0.25,
                 cp_horizontal=0.25,
                 max_tw_vertical=100,
                 min_tw_vertical=20,
                 max_tw_horizontal=50,
                 min_tw_horizontal=10
                 ):
        """
        -: Creating Taco object and setting up parameters:-
        -------Arguments--------
        :cp_vertical: corruption probability of vertical tiles
        :cp_horizontal: corruption probability for horizontal tiles
        :max_tw_vertical: maximum possible tile width for vertical tiles in pixels
        :min_tw_vertical: minimum tile width for vertical tiles in pixels
        :max_tw_horizontal: maximum possible tile width for horizontal tiles in pixels
        :min_tw_horizontal: minimum tile width for horizontal tiles in pixels
        """
        self.corruption_probability_vertical = cp_vertical
        self.corruption_probability_horizontal = cp_horizontal
        self.max_tile_width_vertical = max_tw_vertical
        self.min_tile_width_vertical = min_tw_vertical
        self.max_tile_width_horizontal = max_tw_horizontal
        self.min_tile_width_horizontal = min_tw_horizontal
    def apply_vertical_taco(self, image, corruption_type='random'):
        """
        Only applies taco augmentations in vertical direction.
        Default corruption type is 'random', other supported types are [black, white, mean].
        -------Arguments-------
        :image: A gray scaled input image that needs to be augmented.
        :corruption_type: Type of corruption needs to be applied [one of- black, white, random or mean]
        -------Returns--------
        A TACO augmented image.
        """
        if len(image.shape) < 2 or len(image.shape) > 3:  # make sure the input is a 2-D grayscale image
            raise Exception("Input image with Invalid Shape!")
        if len(image.shape) == 3:
            raise Exception("Only Gray Scale Images are supported!")
        img_h, img_w = image.shape[0], image.shape[1]
        image = self._do_taco(image, img_h, img_w,
                              self.min_tile_width_vertical,
                              self.max_tile_width_vertical,
                              orientation='vertical',
                              corruption_type=corruption_type)
        return image
    def apply_horizontal_taco(self, image, corruption_type='random'):
        """
        Only applies taco augmentations in horizontal direction.
        Default corruption type is 'random', other supported types are [black, white, mean].
        -------Arguments-------
        :image: A gray scaled input image that needs to be augmented.
        :corruption_type: Type of corruption needs to be applied [one of- black, white, random or mean]
        -------Returns--------
        A TACO augmented image.
        """
        if len(image.shape) < 2 or len(image.shape) > 3:
            raise Exception("Input image with Invalid Shape!")
        if len(image.shape) == 3:
            raise Exception("Only Gray Scale Images are supported!")
        img_h, img_w = image.shape[0], image.shape[1]
        image = self._do_taco(image, img_h, img_w,
                              self.min_tile_width_horizontal,
                              self.max_tile_width_horizontal,
                              orientation='horizontal',
                              corruption_type=corruption_type)
        return image
    def apply_taco(self, image, corruption_type='random'):
        """
        Applies taco augmentations in both directions (vertical and horizontal).
        Default corruption type is 'random', other supported types are [black, white, mean].
        -------Arguments-------
        :image: A gray scaled input image that needs to be augmented.
        :corruption_type: Type of corruption needs to be applied [one of- black, white, random or mean]
        -------Returns--------
        A TACO augmented image.
        """
        image = self.apply_vertical_taco(image, corruption_type)
        image = self.apply_horizontal_taco(image, corruption_type)
        return image
    def visualize(self, image, title='example_image'):
        """
        A function to display images with given title.
        """
        plt.figure(figsize=(5, 2))
        plt.imshow(image, cmap='gray')
        plt.title(title)
        plt.tight_layout()
        plt.show()
    def _do_taco(self, image, img_h, img_w, min_tw, max_tw, orientation, corruption_type):
        """
        apply taco algorithm on image and return augmented image.
        """
        if orientation == 'vertical':
            tiles = []
            start = 0
            tile_width = random.randint(min_tw, max_tw)  # one width per call, endpoints inclusive
            while start < (img_w - 1):
                tile = image[:, start:start+min(img_w-start-1, tile_width)]
                if random.random() <= self.corruption_probability_vertical:  # corrupt the tile when the random draw is within the preset probability
                    tile = self._corrupted_tile(tile, corruption_type)
                tiles.append(tile)
                start = start + tile_width
            augmented_image = np.hstack(tiles)
        else:
            tiles = []
            start = 0
            tile_width = random.randint(min_tw, max_tw)
            while start < (img_h - 1):
                tile = image[start:start+min(img_h-start-1, tile_width), :]
                # horizontal tiles use cp_horizontal, so that parameter takes effect
                if random.random() <= self.corruption_probability_horizontal:
                    tile = self._corrupted_tile(tile, corruption_type)
                tiles.append(tile)
                start = start + tile_width
            augmented_image = np.vstack(tiles)
        return augmented_image
    def _corrupted_tile(self, tile, corruption_type):
        """
        Return a corrupted tile with given shape and corruption type.
        """
        tile_shape = tile.shape
        if corruption_type == 'random':
            corrupted_tile = np.random.random(tile_shape)*255
        elif corruption_type == 'white':
            corrupted_tile = np.ones(tile_shape)*255
        elif corruption_type == 'black':
            corrupted_tile = np.zeros(tile_shape)
        elif corruption_type == 'mean':
            corrupted_tile = np.ones(tile_shape)*np.mean(tile)
        else:
            # an unknown type would otherwise leave corrupted_tile unassigned
            raise ValueError("Unsupported corruption type: " + str(corruption_type))
        return corrupted_tile
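The class draws the tile width and the per-tile corruption decision from Python's `random` module, while the 'random' fill comes from `np.random`. Reproducing a particular augmentation therefore requires seeding both generators. A minimal sketch follows; `seed_everything` is a hypothetical helper name, and the draws below only imitate the kinds of calls made in the source above.

```python
import random

import numpy as np


def seed_everything(seed):
    """Seed both generators used by the class above (hypothetical helper)."""
    random.seed(seed)     # controls random.randint and random.random
    np.random.seed(seed)  # controls np.random.random


def sample_draws():
    """Imitate the three kinds of random draws made during one augmentation."""
    width = random.randint(20, 100)   # tile width
    decision = random.random()        # per-tile corruption draw
    noise = np.random.random((2, 2))  # 'random' fill values
    return width, decision, noise


seed_everything(42)
first = sample_draws()
seed_everything(42)
second = sample_draws()
print(first[0] == second[0], first[1] == second[1], np.array_equal(first[2], second[2]))
```

Seeding only one of the two generators would fix the tile layout but not the noise values, or the other way around.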