A library for extracting information from Thai personal identity cards, implemented with EasyOCR and Tesseract.

New Features in v1.3.2 🎁

  • Improved performance.
  • Support for Thai Government Lottery tickets: extracts data from lottery tickets and works best with images from a scanner (16 Aug. 2021).
  • Refactored output structure.
  • Support for Thai driving licenses (Beta): extracts data from photos of some driving license formats. Because the Department of Land Transport issues licenses in many different layouts, each placing the data in different positions, accuracy is low.


(Figure: example image files.)

(Figure: card image cropped with warpPerspective.)

(Figure: detected image keypoints.)

(Figure: regions of interest extracted by the library, e.g. Identification Number.)

Recommendations ⚠

  • The image resolution should be at least 600x350 pixels.
  • Use images with as few reflections as possible for good results.
  • The identity card should fill about 75% of the image; if it does not, crop the image so that only the card area remains.
  • For faster processing, resize the image and use a CUDA-capable GPU.
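The recommendations above can be checked before an image is passed to extractInfo. The sketch below uses hypothetical helpers that are not part of ThaiPersonalCardExtract: they work on plain numbers, so you can feed them dimensions from whatever image library you use. The card bounding box, the 15% tolerance, and the 1200-pixel width cap are assumptions chosen for illustration.

```python
# Hypothetical helpers (not part of ThaiPersonalCardExtract) that apply the
# recommendations above to plain image dimensions.
MIN_WIDTH, MIN_HEIGHT = 600, 350  # recommended minimum resolution
TARGET_CARD_RATIO = 0.75          # card should cover about 75% of the image


def check_image(width, height, card_box=None, tolerance=0.15):
    """Return a list of warnings for a width x height image.

    card_box is an optional (x, y, w, h) box around the card, e.g. from your
    own detector or a manual crop; the library does not provide it.
    """
    warnings = []
    if width < MIN_WIDTH or height < MIN_HEIGHT:
        warnings.append(f"image is {width}x{height}, below {MIN_WIDTH}x{MIN_HEIGHT}")
    if card_box is not None:
        _, _, box_w, box_h = card_box
        ratio = (box_w * box_h) / (width * height)
        if abs(ratio - TARGET_CARD_RATIO) > tolerance:
            warnings.append(f"card covers {ratio:.0%} of the image, aim for about 75%")
    return warnings


def downscale_size(width, height, max_width=1200):
    """Target size for a speed-oriented resize: keeps the aspect ratio, caps
    the width at max_width (an assumed value), never upscales, and avoids
    shrinking below the recommended minimum resolution."""
    if width <= max_width:
        return width, height
    scale = min(1.0, max(max_width / width, MIN_WIDTH / width, MIN_HEIGHT / height))
    return round(width * scale), round(height * scale)


print(check_image(500, 300))       # ['image is 500x300, below 600x350']
print(downscale_size(2400, 1400))  # (1200, 700)
```

The resize only computes a target size; the actual resampling would be done by your image library before calling the reader.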


Install the stable release using pip:

pip install thai-personal-card-extract

For the latest development release:

pip install git+https://github.com/ggafiled/ThaiPersonalCardExtract.git

Note 1: On Windows, install Tesseract first by following the guide on Medium at https://medium.com/@navapat.tpb/734dae2fb4d3 and make sure it is fully set up before continuing.

Note 2: On Linux, install Tesseract by following the official instructions at https://github.com/tesseract-ocr/tesseract
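After installing Tesseract, the Windows examples below need its location for the tesseract_cmd parameter. A minimal sketch, assuming Tesseract is either on PATH or at the path used in this README's examples; find_tesseract is a hypothetical helper, not a library function.

```python
import os
import shutil

# Hypothetical helper (not part of the library) for finding a value to pass
# as tesseract_cmd. The fallback mirrors the path used in this README's
# examples, with the .exe extension of the Windows executable.
WINDOWS_FALLBACK = "D:/Program Files/Tesseract-OCR/tesseract.exe"


def find_tesseract(windows_fallback=WINDOWS_FALLBACK):
    """Return a path to the tesseract executable, or None if none is found."""
    on_path = shutil.which("tesseract")  # searches PATH on any OS
    if on_path:
        return on_path
    if os.name == "nt" and os.path.isfile(windows_fallback):
        return windows_fallback
    return None


print(find_tesseract() or "tesseract not found")
```

Pass the returned path as tesseract_cmd when creating a reader; None means Tesseract is not installed anywhere the sketch looks.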


# With built-in config options

import ThaiPersonalCardExtract as card
reader = card.PersonalCard(
    lang="mix",
    provider="default",
    tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract",  # Windows only
)
result = reader.extractInfo('examples/card.jpg')

# Free style: using the PersonalCard class to extract data from a Thai national ID card

from ThaiPersonalCardExtract import PersonalCard
reader = PersonalCard(lang="mix", tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract") # On Windows, pass tesseract_cmd with the path to your tesseract executable.
result = reader.extractInfo('examples/card.jpg')

# Free style: using the DrivingLicense class to extract data from a driving license

from ThaiPersonalCardExtract import DrivingLicense
reader = DrivingLicense(lang="mix", tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract") # On Windows, pass tesseract_cmd with the path to your tesseract executable.
result = reader.extractInfo('examples/card.jpg')

# Free style: using the ThaiGovernmentLottery class to extract data from a lottery ticket

from ThaiPersonalCardExtract import ThaiGovernmentLottery
reader = ThaiGovernmentLottery(save_extract_result=True, path_to_save="D:/dev/ThaiPersonalCardExtract/examples/extract/thai_government_lottery") # Saves the extracted images to path_to_save. On Windows, you may also need to pass tesseract_cmd (see Config Options).
result = reader.extractInfo("../examples/card7.jpg")

The result is returned as a namedtuple whose fields hold the values the library was able to extract. See the Python documentation for collections.namedtuple for details on working with it.

#Output of PersonalCard
    Card(Identification_Number='9999999999999', FullNameTH='นาย อายุมบมุราเสะ', PrefixTH='นาย', NameTH='อายุมบมุราเสะ', LastNameTH='อายุมบมุราเสะ', PrefixEN='.Mr.Shoyo', NameEN='', LastNameEN='Hinatao', BirthdayTH='21 มี.ย. 2539', BirthdayEN='21 Jun..1996', Religion='พุทธ', Address='ท8ปบ` 99/1 มิซีโฮะ เขตฮานามิกาวา อำเภอชิบ', DateOfIssueTH='11 ส.ค. 2554', DateOfIssueEN='11 Ang. 2021', DateOfExpiryTH='11 ส.ค. 2574', DateOfExpiryEN='11 Aug. 2031,')
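The Thai date fields use the Buddhist Era, which is 543 years ahead of the Gregorian calendar (2539 in BirthdayTH corresponds to 1996 in BirthdayEN above). If you need a date object from a Thai field, a sketch like the following can convert it. parse_thai_date is a hypothetical helper, not part of the library, and it assumes the "day month-abbreviation year" form shown above; OCR text that does not match yields None.

```python
from datetime import date

# Standard Thai month abbreviations mapped to month numbers.
THAI_MONTHS = {
    "ม.ค.": 1, "ก.พ.": 2, "มี.ค.": 3, "เม.ย.": 4, "พ.ค.": 5, "มิ.ย.": 6,
    "ก.ค.": 7, "ส.ค.": 8, "ก.ย.": 9, "ต.ค.": 10, "พ.ย.": 11, "ธ.ค.": 12,
}
BE_OFFSET = 543  # Buddhist Era year = Gregorian year + 543


def parse_thai_date(text):
    """Parse a date such as '11 ส.ค. 2554' into a datetime.date, or None."""
    parts = text.strip().split()
    if len(parts) != 3:
        return None
    day, month, year = parts
    if month not in THAI_MONTHS or not (day.isdigit() and year.isdigit()):
        return None
    try:
        return date(int(year) - BE_OFFSET, THAI_MONTHS[month], int(day))
    except ValueError:  # e.g. day 31 in a 30-day month
        return None


print(parse_thai_date("11 ส.ค. 2554"))  # 2011-08-11
```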

#Output of DrivingLicense
    Card(License_Number='98765432', IssueDateTH='ผังทาทม', ExpiryDateTH='', IssueDateEN='14 August 2664', ExpiryDateEN='14 August 2574', NameTH='า? โนบกะ โนบี', NameEN='MRONOREAUMANE', BirthDayTH='', BirthDayEN='wa hs OKRA', Identity_Number='', Province='นคาราชศีมา')

#Output of ThaiGovernmentLottery
    Lottery(LotteryNumber='424603', LessonNumber='08', SetNumber='23', Year='2564') #type namedtuple 
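Because the results are namedtuples, fields can be read by name, listed, and converted to a dict or JSON. The sketch below builds a stand-in namedtuple with the same fields as the Lottery output above, since the real object only comes from extractInfo; with the library installed you would use the returned result directly.

```python
import json
from collections import namedtuple

# Stand-in with the same fields as the Lottery output shown above.
Lottery = namedtuple("Lottery", ["LotteryNumber", "LessonNumber", "SetNumber", "Year"])
result = Lottery(LotteryNumber="424603", LessonNumber="08", SetNumber="23", Year="2564")

# Fields are attributes, so no index bookkeeping is needed.
print(result.LotteryNumber)  # 424603

# Fields the OCR could not read come back as empty strings (see the outputs above).
missing = [name for name, value in result._asdict().items() if not value]

# _asdict() gives a plain dict that json can serialize; ensure_ascii=False
# keeps Thai text readable instead of \uXXXX escapes.
as_json = json.dumps(result._asdict(), ensure_ascii=False)
print(as_json)
```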

Setting the lang attribute to tha:

from ThaiPersonalCardExtract import PersonalCard
reader = PersonalCard(lang="tha", tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract") # On Windows, pass tesseract_cmd with the path to your tesseract executable.
result = reader.extractInfo('examples/card.jpg')

With lang="tha", the English-language fields are omitted from the output:

{
   "Identification_Number": "9999999999999",
   "FullNameTH": "นาย อายุมบมุราเสะ",
   "PrefixTH": "นาย",
   "NameTH": "อายุมบมุราเสะ",
   "LastNameTH": "อายุมบมุราเสะ",
   "BirthdayTH": "21 มี.ย. 2539",
   "Religion": "พุทธ",
   "Address": "ท๒ 99/1 มิชีโฮะ เขตฮานามิกาวา อำเภอชิบ;",
   "DateOfIssueTH": "11 ส.ค. 2554",
   "DateOfExpiryTH": "11 ส.ค. 2574"
}

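OCR can misread digits, so it may be worth validating Identification_Number before relying on it. Thai national ID numbers carry a check digit in the 13th position; the sketch below applies the standard weighted checksum. It is not part of this library, and the sample value 9999999999999 above is a placeholder that fails the check.

```python
def is_valid_thai_id(number):
    """Check the 13th (check) digit of a Thai national ID number.

    The first 12 digits are multiplied by weights 13 down to 2 and summed;
    the check digit is (11 - sum % 11) % 10.
    """
    if len(number) != 13 or not (number.isascii() and number.isdigit()):
        return False
    total = sum(int(d) * w for d, w in zip(number[:12], range(13, 1, -1)))
    return (11 - total % 11) % 10 == int(number[12])


print(is_valid_thai_id("9999999999999"))  # False
```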
You can also choose the OCR provider: "default" uses both EasyOCR and Tesseract (recommended); alternatively, use "easyocr" or "tesseract" on its own.

from ThaiPersonalCardExtract import PersonalCard
reader = PersonalCard(lang="tha", provider="default", tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract") # On Windows, pass tesseract_cmd with the path to your tesseract executable.
result = reader.extractInfo('examples/card.jpg')

Config Options

You can set the following keyword options when creating an instance:

  • lang (String): language of the expected results. "mix" extracts all areas in both Thai and English, "tha" extracts Thai only, "eng" extracts English only. Default: "mix". Applies to DrivingLicense and PersonalCard.
  • provider (String): OCR provider. "default" uses both EasyOCR and Tesseract (recommended); "easyocr" or "tesseract" uses only that engine. Default: "default". Applies to DrivingLicense and PersonalCard.
  • template_threshold (Double): similarity threshold used when matching the card template. Default: 0.7.
  • sift_rate (Int): feature keypoint rate. Default: 25000.
  • tesseract_cmd (String): path to your tesseract executable. Windows only.
  • save_extract_result (Boolean): set to True to save the extracted images. Default: False.
  • path_to_save (String): directory in which to save the extracted images; used together with save_extract_result=True.
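A sketch of collecting these options in one place and checking them before a reader is constructed. build_options is hypothetical, not part of the library; the allowed values and defaults come from the list above, while the (0, 1] range for template_threshold is an assumption.

```python
# Hypothetical helper (not part of the library): validates the keyword
# options listed above and returns them as a dict.
LANGS = {"mix", "tha", "eng"}
PROVIDERS = {"default", "easyocr", "tesseract"}


def build_options(lang="mix", provider="default", template_threshold=0.7,
                  sift_rate=25000, tesseract_cmd=None,
                  save_extract_result=False, path_to_save=None):
    if lang not in LANGS:
        raise ValueError(f"lang must be one of {sorted(LANGS)}, got {lang!r}")
    if provider not in PROVIDERS:
        raise ValueError(f"provider must be one of {sorted(PROVIDERS)}, got {provider!r}")
    if not 0 < template_threshold <= 1:  # assumed valid range
        raise ValueError("template_threshold is expected to be in (0, 1]")
    if not isinstance(sift_rate, int) or sift_rate <= 0:
        raise ValueError("sift_rate must be a positive integer")
    if save_extract_result and not path_to_save:
        raise ValueError("path_to_save is required when save_extract_result=True")
    options = {"lang": lang, "provider": provider,
               "template_threshold": template_threshold,
               "sift_rate": sift_rate,
               "save_extract_result": save_extract_result}
    if tesseract_cmd:  # only needed on Windows
        options["tesseract_cmd"] = tesseract_cmd
    if path_to_save:
        options["path_to_save"] = path_to_save
    return options


print(build_options(lang="tha"))
```

Assuming the constructors accept these keywords as the list above describes, the result could be unpacked into a reader, for example PersonalCard(**build_options(lang="tha")). Keep in mind that lang and provider apply only to DrivingLicense and PersonalCard.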

