Redline OCR

Overview

Redline Planroom has an extensive system for accurately scanning the numbers, names, and callouts of sheets (PDF pages).

The results of automatically scanning the sheet numbers of a PDF uploaded into Redline Planroom.

Technologies

  • For vector PDFs, PDF.js is used to scan all of the text elements in the PDF for patterns relating to the sheet number. We run a number of preprocessing steps on PDFs before scanning.

    • There are a large number of cases to handle because of the many ways PDFs are built and the different software that produces them (AutoCAD, Chief Architect, etc.). One example of a step we run on a PDF is combining text blocks: some PDFs upload with each character as its own PDF text object.

  • For raster PDFs, Tesseract OCR is used to scan portions of the rasterized sheet image.

    • This process can be hit or miss because, again, of the wide variety of PDF styles that are uploaded.

    • However, if clients follow our guidelines for PDF styling (black text, no overlapping, sans-serif font, etc.), then success is nearly guaranteed.
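
As an illustration of the "combining text blocks" step described for vector PDFs, here is a minimal sketch in Python. It is not the Redline Planroom implementation: the TextItem shape, the tolerance values, and the sheet number pattern are all assumptions made for this example. It merges text objects that sit on the same baseline with almost no gap between them, then checks the merged strings against a hypothetical sheet number pattern such as "A-101" or "S2.01".

```python
import re
from dataclasses import dataclass


@dataclass
class TextItem:
    """One positioned text object (hypothetical shape, not the PDF.js type)."""
    text: str
    x: float      # left edge
    y: float      # baseline
    width: float  # horizontal extent of the text


# Hypothetical pattern: 1-3 letters, optional dash, digits, optional ".NN".
SHEET_NUMBER = re.compile(r"^[A-Z]{1,3}-?\d{1,3}(?:\.\d{1,2})?$")


def combine_text_items(items, gap_tolerance=1.5, line_tolerance=1.0):
    """Merge items that share a baseline and sit (nearly) edge to edge.

    Assumes items on one line have close to identical y values, so that
    sorting by (y, x) places neighbours next to each other.
    """
    merged = []
    for item in sorted(items, key=lambda i: (i.y, i.x)):
        last = merged[-1] if merged else None
        if (
            last is not None
            and abs(item.y - last.y) <= line_tolerance
            and item.x - (last.x + last.width) <= gap_tolerance
        ):
            # Extend the previous block so it covers this item as well.
            merged[-1] = TextItem(
                text=last.text + item.text,
                x=last.x,
                y=last.y,
                width=item.x + item.width - last.x,
            )
        else:
            merged.append(TextItem(item.text, item.x, item.y, item.width))
    return merged


def find_sheet_numbers(items):
    """Return merged strings that look like sheet numbers."""
    return [
        block.text
        for block in combine_text_items(items)
        if SHEET_NUMBER.match(block.text.strip())
    ]


# Each character of "A-101" arrives as its own text object.
chars = [
    TextItem("A", 100, 20, 6),
    TextItem("-", 106, 20, 4),
    TextItem("1", 110, 20, 6),
    TextItem("0", 116, 20, 6),
    TextItem("1", 122, 20, 6),
    TextItem("SCALE", 300, 20, 30),  # same line, but far away
    TextItem("NOTES", 100, 50, 30),  # different line
]
print(find_sheet_numbers(chars))  # ['A-101']
```

Without the merge step, none of the single-character objects would match the pattern on its own, which is why this kind of preprocessing has to run before the pattern scan.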