Effective Data Augmentation for OCR

My recipe to reach those last percents of (ac)cu(re)teness

Background

I faced the challenge of recognizing handwritten amounts as precisely as possible. The difficulty lies in keeping the false positives below 0.01%. The number of samples in the dataset was fixed, so data augmentation was the logical go-to. A quick search revealed no off-the-shelf method for Optical Character Recognition (OCR). So I rolled up my sleeves and created a data augmentation routine myself. It was used during training and helped my model reach the objective. Read on to find out how.

By introducing small changes each time an image is shown to the model during training, the model is less likely to overfit and generalizes better. I used it in conjunction with TrOCR, but any other model should benefit as well.

Test setup

Since I can’t share images from my proprietary dataset, I wanted to use samples from the IAM Handwriting Database, but I didn’t get a reply to my request for permission to use it in this article. So I created some examples of my own for demonstration.

I will make use of OpenCV and the albumentations library, for three kinds of alterations: morphological, noise and transformations.

OpenCV is a well-known computer vision library. Albumentations is a relatively new Python library for easy yet powerful image augmentations.

There is also a nice demo website where you can try what albumentations can do. It is limited, however, because you can’t test it on your own image. So I created a Jupyter notebook that I used to render all augmented images in this article. Feel free to open it in Colab and experiment.

I will first show each alteration on its own with some explanation, and then I will discuss my technique to combine all of them. I will assume that all images are grayscale and have already undergone contrast enhancement (e.g. CLAHE).

1st augmentation technique: morphological alterations

These relate to the form or structure of the strokes. To put it in simpler terms: they can be used to make the text lines appear to be written with a finer or thicker pen. They are called erosion and dilation. Unfortunately these are not (yet?) part of the albumentations library, so I have to resort to OpenCV for this.

To create the effect that somebody used a pen with a fatter line width, we can dilate the original:

Erosion on the other hand (pun intended) simulates that the text has been written with a finer pen:

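Be careful that the last parameter, which is the number of iterations, is not set too high (here it was set to at most 3), otherwise you end up with the handwriting completely removed. Note that because the image is not inverted (dark text on a light background), the OpenCV call that thins the strokes is cv2.dilate:

cv2.dilate(img, kernel, iterations=random.randint(1, 3))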

For my dataset I could only set it to 1, so this really depends on your data.

2nd augmentation technique: noise introduction

We can either add black pixels or add white pixels to the image. There are several methods to do that. I have experimented with many of them, but here is my shortlist:

RandomRain with black drop color is very damaging. Even for me it’s hard to still read the text. That’s why I opt to set the chance of this happening very low:

RandomShadow will smudge the text with lines of varying intensity:

PixelDropout gently turns random pixels into black:

black pixels with PixelDropout (*Image by author*)

Unlike the black drops, RandomRain with white drop color disintegrates the writing, which makes training harder. It looks much like the poor quality you get from a photocopy of a xerox of a fax. The probability of this transform happening can be set much higher.

RandomRain — white version (*Image by author*)

To a lesser extent, PixelDropout to white does the same, but it results in a more generally faded image:

PixelDropout with white pixels (*Image by author*)
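
The difference between the black and the white variant can be illustrated with a plain NumPy sketch of the idea behind pixel dropout. This is not the albumentations implementation; `pixel_dropout` and its `seed` argument are hypothetical, and the image is a synthetic stand-in for a scan.

```python
import numpy as np

def pixel_dropout(img, dropout_prob=0.01, drop_value=0, seed=None):
    """Replace each pixel with drop_value, independently with probability dropout_prob."""
    rng = np.random.default_rng(seed)
    mask = rng.random(img.shape) < dropout_prob
    out = img.copy()
    out[mask] = drop_value
    return out

# Stand-in for a scan: white page with one dark horizontal stroke.
page = np.full((64, 256), 255, dtype=np.uint8)
page[28:36, 20:236] = 0

black_specks = pixel_dropout(page, dropout_prob=0.01, drop_value=0, seed=0)
faded = pixel_dropout(page, dropout_prob=0.5, drop_value=255, seed=0)

ink_before = int((page == 0).sum())
print("ink pixels before:  ", ink_before)
print("after black dropout:", int((black_specks == 0).sum()))
print("after white dropout:", int((faded == 0).sum()))
```

Dropping to black sprinkles dark specks over the background and leaves the stroke intact, while dropping to white with a high probability eats away part of the stroke itself, which is the faded look described above.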

3rd augmentation technique: transformations

ShiftScaleRotate: be careful with the parameters here. Try to avoid text being cut off and falling outside the original dimensions. Both a zoom and a rotation are going on, so don’t overdo it with parameters that are too large. Otherwise there is a higher chance of ending up like the first sample, where the text is actually moved outside of the image. This can be prevented by choosing a larger bounding box, effectively adding more whitespace around the text.

Blur. The old (but gold) reliable. It will be applied at different intensities.

blurred handwritten text (*Image by author*)

The big finale: combining them all together

This is where the power lies. We can randomly combine these effects to create unique images to include in each training epoch. Take care not to apply too many methods of the same type. We can do this with the albumentations function OneOf. OneOf contains a list of possible transformations and, as the name implies, will only execute one of these, with probability p. So it makes sense to group transformations that do more or less the same, to avoid overdoing it. Here is the function:

import random
import cv2
import numpy as np
import albumentations as A
from PIL import Image

# takes a PIL image and returns an augmented PIL image
def augment_img(img):
  # only augment 3/4 of the images; the rest are returned unchanged
  if random.randint(1, 4) > 3:
    return img

  img = np.asarray(img)  # convert to numpy for OpenCV

  # morphological alterations
  kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
  if random.randint(1, 5) == 1:
    # thicker strokes: the image is not inverted (dark text on a light
    # background), so cv2.erode is what grows the dark text
    img = cv2.erode(img, kernel, iterations=random.randint(1, 2))
  if random.randint(1, 6) == 1:
    # thinner strokes: for the same reason cv2.dilate shrinks the dark text
    # (upper bound kept at 1 for my dataset)
    img = cv2.dilate(img, kernel, iterations=random.randint(1, 1))

  transform = A.Compose([

    A.OneOf([
      # add black pixel noise
      A.OneOf([
        A.RandomRain(brightness_coefficient=1.0, drop_length=2, drop_width=2, drop_color=(0, 0, 0), blur_value=1, rain_type='drizzle', p=0.05),
        A.RandomShadow(p=1),
        A.PixelDropout(p=1),
      ], p=0.9),

      # add white pixel noise
      A.OneOf([
        A.PixelDropout(dropout_prob=0.5, drop_value=255, p=1),
        A.RandomRain(brightness_coefficient=1.0, drop_length=2, drop_width=2, drop_color=(255, 255, 255), blur_value=1, rain_type=None, p=1),
      ], p=0.9),
    ], p=1),

    # transformations
    A.OneOf([
      A.ShiftScaleRotate(shift_limit=0, scale_limit=0.25, rotate_limit=2, border_mode=cv2.BORDER_CONSTANT, value=(255, 255, 255), p=1),
      A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0, rotate_limit=8, border_mode=cv2.BORDER_CONSTANT, value=(255, 255, 255), p=1),
      A.ShiftScaleRotate(shift_limit=0.02, scale_limit=0.15, rotate_limit=11, border_mode=cv2.BORDER_CONSTANT, value=(255, 255, 255), p=1),
      A.Affine(shear=random.randint(-5, 5), mode=cv2.BORDER_CONSTANT, cval=(255, 255, 255), p=1),
    ], p=0.5),
    A.Blur(blur_limit=5, p=0.25),
  ])
  img = transform(image=img)['image']
  image = Image.fromarray(img)  # convert back to PIL
  return image

P stands for the chance of something happening. It’s a value between 0 and 1, where 1 means it always happens and 0 never.
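
One detail is worth spelling out. As far as the albumentations documentation describes it, the p values of the transforms inside a OneOf are normalized and act as selection weights, while the p of the OneOf itself decides whether anything from the group runs at all. Under that assumption, the arithmetic below works out how often each member of the black-noise group is picked. `selection_weights` is a hypothetical helper, not an albumentations function.

```python
def selection_weights(child_ps):
    """Normalize the children's p values so they sum to 1 (assumed OneOf behaviour)."""
    total = sum(child_ps.values())
    return {name: p / total for name, p in child_ps.items()}

# p values of the black-noise group in augment_img
black_noise = {"RandomRain": 0.05, "RandomShadow": 1.0, "PixelDropout": 1.0}
within_group = selection_weights(black_noise)

# The outer OneOf holds two groups with equal p (0.9 each),
# so under the same assumption each group is picked half the time.
group_weights = selection_weights({"black": 0.9, "white": 0.9})

# augment_img itself only augments 3 out of 4 images.
p_augmented = 3 / 4

for name, weight in within_group.items():
    overall = p_augmented * group_weights["black"] * weight
    print(f"{name}: {weight:.1%} within the group, {overall:.2%} of all images")
```

With these numbers the damaging black RandomRain is chosen in only about 2.4% of the cases where the black-noise group runs, which is what setting its p to 0.05 is meant to achieve.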

So, let’s see it in action:

Looks pretty neat, no?

Alternative approach: 🌮

In the EASTER 2.0 paper, the authors came up with the TACo technique. It stands for Tiling and Corruption. (🌮 haha)
It is capable of this:

figure by Kartik Chaudhary / Raghav Bali

I have not tried this out because my intuition tells me too much of the original is destroyed. In my opinion, if I can’t read it, a computer can’t either. I might be wrong, however: as a human, you could guess it is ‘TACO’ if you see ‘TA█O’, because we would look at the surrounding letters and taco is a common word. But a computer with a dictionary behind it might make it ‘TAMO’, which happens to be an English word for ‘Japanese ash’.

Conclusion

We’ve discussed many image manipulations and how they can help with the task of OCR. I hope this proves useful to you, or at least gives you some inspiration to try it out yourself. You can use my recipe as a baseline, but you’ll probably need to fine-tune a few parameters for it to be perfect for your dataset. Let me know how much your models have increased in accuracy!

I made the technique publicly available in this Jupyter notebook.
