GVT is a generic translation tool for parts of text on the PC screen with Text to Speech functionality.

Overview

🎮 🎧 🚀 Generic Visual Translator 🚀 🎧 🎮

GVT is a generic translation tool for parts of text on the PC screen with Text to Speech functionality. I created it because the existing tools I tried did not satisfy me in ease of use and configuration. Personally, I used it with Lost Ark (the included example was generated on a 2K monitor) to translate simple quest dialogues into Italian.

📝 Requirements

Tested operating systems: Windows 10/11
Python version: 3.9.6

  • Easynmt
  • OpenCV2
  • Easyocr
  • Numpy
  • Deepl (Unofficial API)
  • Pyttsx3
  • Pywin32
  • WXWidgets
  • Pygame
  • Keyboard

The requirements.txt file was generated from the versions currently installed on my PC, but GVT may also work with newer or older versions of the same libraries.

Requirements installation command: pip install -r requirements.txt

💪 How it works

GVT translates the text in a user-defined region of the screen, reads the translation aloud using Windows 10/11 TTS (not tested on Windows 7), and can show the translated text in place of the original on screen.

Before using it, you need to configure the config.yaml file in the same folder.

Then you can run GVT using run.bat or with the command python main.py.
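
As a rough illustration of the flow described above, here is a minimal sketch of a capture loop. It is not GVT's actual code: the function `run_loop` and its callable parameters (`activator_visible`, `read_text`, `translate`, `speak`) are hypothetical stand-ins for the activator check, OCR, translation and TTS steps, and `max_iterations` and `sleep` exist only to make the sketch easy to try out.

```python
import time
from typing import Callable, List, Optional


def run_loop(
    activator_visible: Callable[[], bool],   # hypothetical: is the activator image on screen?
    read_text: Callable[[], str],            # hypothetical: OCR of the main region
    translate: Callable[[str], str],         # hypothetical: translation engine
    speak: Callable[[str], None],            # hypothetical: TTS and/or overlay
    time_between_captures: float = 1.0,
    max_iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll the screen and handle each new piece of text once."""
    spoken: List[str] = []
    last_text: Optional[str] = None
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        if activator_visible():
            text = read_text().strip()
            # Only translate when the text on screen has changed.
            if text and text != last_text:
                translated = translate(text)
                speak(translated)
                spoken.append(translated)
                last_text = text
        else:
            # Activator gone: go back to idle and forget the last text.
            last_text = None
        sleep(time_between_captures)
    return spoken
```

Passing the four steps in as functions keeps the loop independent of any specific OCR, translation or TTS library, so each step can be replaced or tested on its own.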

👀 config.yaml file description

| Variable name | Type of variable | Description | Recommended |
|---|---|---|---|
| game_name | string in double quotes | Application name. | |
| source_language | language acronym (e.g. en, de, ch, jp) | Language of the application. | |
| target_language | language acronym (e.g. en, de, ch, jp) | Language to translate into. | |
| translation_method | deepl or opus | Translation engine. deepl uses the unofficial API. | deepl |
| translation_internal_method | offline or online | Used only when translation_method is set to the internal engine (opus). offline uses the model downloaded into the models\opus-mt folder; the models can be downloaded from https://huggingface.co/Helsinki-NLP. online downloads the model you need automatically. | |
| gpu_enabled | True or False | With True and a supported GPU, reading the text is much faster. | True |
| time_between_captures | integer | Time that passes before GVT checks for a new element on the screen. | 1 |
| skip_key | string in double quotes or "None" | If the application text can be advanced with a key once it has been read, GVT can advance it automatically when told which key to press. If set to None, GVT does nothing. | |
| show_text | True or False | If set to True, an overlay containing the translated text is shown over the application text. | |
| time_to_wait_for_word | float | If tts_enabled is False and show_text is True, GVT uses this parameter to work out how long to show the overlay text. If tts_enabled is True, this parameter is ignored and the overlay lasts as long as it takes to play the audio of the text. | 0.3 |
| tts_enabled | True or False | If enabled, GVT uses Windows Text to Speech to read the translated phrase aloud. | |
| tts_voice_number | integer | Use voice_list.py to list all the voices on your system and see which number corresponds to the one you want. | |
| main_region | | Contains the coordinates of the screen region where the text to be translated will appear. Use GetCoords.py to make your job easier. | |
| main_region > X | integer | Starting point of the region on the X axis. | |
| main_region > Y | integer | Starting point of the region on the Y axis. | |
| main_region > extensionOfX | integer | Number of pixels required to reach the end point of the frame on the X axis. | |
| main_region > extensionOfY | integer | Number of pixels required to reach the end point of the frame on the Y axis. | |
| activator_region | | Contains the coordinates where GVT looks for the activation image that signals text to be translated. Once the image is found, GVT proceeds with the translation. Once it disappears, GVT returns to the idle state. | |
| activator_region > name | string or "None" | Name of the image that you cut from a screenshot of your screen and that identifies the appearance of text to be translated in the application. It must be placed in the activators folder. | |
| activator_region > X | integer | Starting point of the region on the X axis. | |
| activator_region > Y | integer | Starting point of the region on the Y axis. | |
| activator_region > extensionOfX | integer | Number of pixels required to reach the end point of the frame on the X axis. | |
| activator_region > extensionOfY | integer | Number of pixels required to reach the end point of the frame on the Y axis. | |
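
To catch typos in the configuration before launching, a small check along these lines may help. It is only a sketch, not part of GVT: `check_config` is a hypothetical helper, and it assumes the file has already been parsed into a plain dictionary (for example with a YAML parser such as PyYAML's `yaml.safe_load`, which may or may not be how GVT loads it).

```python
from typing import List

# Allowed values taken from the description of config.yaml.
ALLOWED = {
    "translation_method": {"deepl", "opus"},
    "translation_internal_method": {"offline", "online"},
}
BOOL_KEYS = ("gpu_enabled", "show_text", "tts_enabled")
REGION_KEYS = ("X", "Y", "extensionOfX", "extensionOfY")


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_config(cfg: dict) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for key, allowed in ALLOWED.items():
        if cfg.get(key) not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")
    for key in BOOL_KEYS:
        if not isinstance(cfg.get(key), bool):
            problems.append(f"{key} must be True or False")
    for region in ("main_region", "activator_region"):
        values = cfg.get(region)
        if not isinstance(values, dict):
            values = {}
        for key in REGION_KEYS:
            if not _is_int(values.get(key)):
                problems.append(f"{region} > {key} must be an integer")
    return problems


# Hypothetical parsed configuration using the documented keys.
example = {
    "translation_method": "deepl",
    "translation_internal_method": "offline",
    "gpu_enabled": True,
    "show_text": False,
    "tts_enabled": True,
    "main_region": {"X": 567, "Y": 1304, "extensionOfX": 2068, "extensionOfY": 1439},
    "activator_region": {"X": 2, "Y": 1308, "extensionOfX": 2559, "extensionOfY": 1439},
}
print(check_config(example))
```

The check only covers the fields whose allowed values are stated in the description; the remaining fields are left to GVT itself.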

🚀 Getting started

This is an example based on the Lost Ark video game.

  • Clone this repository to your PC, or download the folder, and enter it
  • Launch Lost Ark and reach a dialogue scene
  • Run runCoordHelper.bat or the command python GetCoords.py
  • Press Z on the upper-left corner of the text box
  • Press Z on the lower-right corner of the text box
  • Copy the coordinates from the console into the empty fields under main_region in the config.yaml file, then close the console
  • Find the dot or icon that appears whenever the text to be translated appears; in the case of Lost Ark it is the Leave button at the bottom right
  • Press Shift + Win + S on Windows 10 or 11, select this image, and save it in the activators folder with a recognizable name
  • Run runCoordHelper.bat or the command python GetCoords.py again
  • Use the same method as above to get the coordinates of a box, not too narrow, that surrounds the activator image in the game
  • Copy the coordinates from the console into the empty fields under activator_region in the config.yaml file, then close the console
  • Set source_language to the acronym of the language you want to translate from, and target_language to the language you want to translate the game into (see https://github.com/ptrstn/deepl-translate for the reference table and the languages supported by deepl, or https://huggingface.co/Helsinki-NLP for the opus models)
  • Set the dialogue progress key (skip_key) if desired, otherwise leave it at None. Note: leave it at None if your game has a strict anti-cheat system that does not allow anything other than you to press keys on your keyboard
  • Set show_text and tts_enabled according to what you want enabled or disabled
  • If you have set tts_enabled to True, run runVoiceList.bat or python voice_list.py to find the number assigned to each voice installed on your Windows system (the number in square brackets), and set tts_voice_number to the desired number.

Here is an example of the complete file 📋

game_name:  Lost_Ark
source_language: en
target_language: it
translation_method: deepl
translation_internal_method: offline
gpu_enabled: True
time_between_captures: 1
skip_key: "g"
show_text: False
time_to_wait_for_word: 0.3
tts_enabled: True
tts_voice_number: 0

main_region: 
  X: 567
  Y: 1304
  extensionOfX: 2068
  extensionOfY: 1439
activator_region:
  name: "lost_ark.png"
  X: 2
  Y: 1308
  extensionOfX: 2559
  extensionOfY: 1439

  • Execute run.bat
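
A note on the region values in the example file: they look like the coordinates of the lower-right corner rather than offsets from X and Y (read as an offset, 567 + 2068 would go past the 2559 used as the right edge of the activator region). Under that assumption, which is inferred from the example and not confirmed by the code, a region can be converted to a left/top/width/height box as sketched below. `Box` and `region_to_box` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    width: int
    height: int


def region_to_box(region: dict) -> Box:
    """Convert a config.yaml region to a box.

    Assumes extensionOfX / extensionOfY are the absolute coordinates of the
    lower-right corner, as the example values suggest.
    """
    left, top = region["X"], region["Y"]
    right, bottom = region["extensionOfX"], region["extensionOfY"]
    if right <= left or bottom <= top:
        raise ValueError("the end point must be below and to the right of the start point")
    return Box(left, top, right - left, bottom - top)


main_region = {"X": 567, "Y": 1304, "extensionOfX": 2068, "extensionOfY": 1439}
print(region_to_box(main_region))  # Box(left=567, top=1304, width=1501, height=135)
```

If the two corners were clicked in the wrong order in GetCoords.py, the check above raises an error instead of silently producing a negative width or height.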

💭 To Do

  • Add the ability to define multiple regions and activators at once
  • Add support for multiple games, selectable from a menu

Owner

Nuked