OCR on bounding boxes of an image #1564

cometta · 2024-05-24T13:32:09Z

I want to know if it's possible to input multiple bounding boxes and have TrOCR perform OCR only on those specified areas of my image. Could you please advise on this?

maifeeulasad · 2024-05-30T03:16:58Z

I don't think TrOCR needs to handle that itself. All you need is a utility function, written with OpenCV or any other library you prefer, that crops each region so you can pass the crops to TrOCR, like this:

import cv2

def crop_images_from_bounding_boxes(image, bounding_boxes):
    """
    Crops images from the original image based on the provided bounding boxes.
    
    Parameters:
    image (numpy.ndarray): The original image.
    bounding_boxes (list of tuples): A list of bounding boxes, where each bounding box is represented by a tuple
                                     (x, y, width, height).
    
    Returns:
    list of numpy.ndarray: A list of cropped images.
    """
    cropped_images = []
    
    for (x, y, w, h) in bounding_boxes:
        cropped_image = image[y:y+h, x:x+w]
        cropped_images.append(cropped_image)
    
    return cropped_images
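One thing the function above does not guard against is a box that extends past the image edge: with NumPy slicing, a negative start index counts from the end of the array, so such a crop can come out empty or wrong. A minimal sketch of a pre-check, assuming boxes are integer (x, y, width, height) tuples as above; `clamp_box` and `clamp_boxes` are hypothetical helper names, not part of OpenCV or TrOCR:

```python
def clamp_box(box, image_width, image_height):
    """Clip an (x, y, w, h) box to the image bounds.

    Returns the clipped (x, y, w, h) tuple, or None if the box lies
    entirely outside the image.
    """
    x, y, w, h = box
    x0 = max(0, x)
    y0 = max(0, y)
    x1 = min(image_width, x + w)
    y1 = min(image_height, y + h)
    if x1 <= x0 or y1 <= y0:
        return None  # nothing left to crop
    return (x0, y0, x1 - x0, y1 - y0)


def clamp_boxes(boxes, image_width, image_height):
    """Clip every box and drop the ones that fall outside the image."""
    clamped = (clamp_box(b, image_width, image_height) for b in boxes)
    return [b for b in clamped if b is not None]
```

With an image loaded as a NumPy array, the size can be read with `height, width = image.shape[:2]`, and the result of `clamp_boxes(bounding_boxes, width, height)` passed to `crop_images_from_bounding_boxes` in place of the raw list.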

If you are worried about performance: because the model only sees the cropped regions instead of the whole image, it should be at least a bit faster. If you don't have a GPU, you can use multi-threading to process the crops. If you do have a GPU, this will be very fast.
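For the multi-threading suggestion, here is a minimal sketch using the standard library's `ThreadPoolExecutor`. The `recognize` argument is a hypothetical callable that takes one cropped image and returns its text (for example, a thin wrapper around your TrOCR call); it is an assumption for illustration, not an existing API:

```python
from concurrent.futures import ThreadPoolExecutor


def ocr_crops_in_parallel(crops, recognize, max_workers=4):
    """Run `recognize` on every crop using a pool of worker threads.

    `recognize` is a placeholder for your own OCR function.
    Results are returned in the same order as `crops`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even if tasks finish out of order
        return list(pool.map(recognize, crops))
```

Threads only help if the OCR call spends most of its time in native code that releases Python's global interpreter lock, so it is worth measuring against a plain loop before relying on it.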

I hope this answers your question.
