Here is a list of terms, expressions and abbreviations used in this documentation.
OCR
OCR is an abbreviation of optical character recognition: the process of extracting plain text (and associated information) from an image, such as a photo or a picture.
Example: John uses his mobile phone to take a photo of a paper-based bank statement. Say the statement contains an IBAN. From the resulting photo, a file named bank-statement.jpeg, John cannot copy the IBAN and send it to his wife over WhatsApp.
On the other hand, if the same photo is processed with optical character recognition (OCR), the text is extracted from it (for example, into a file named bank-statement.txt). John can then open bank-statement.txt, select the IBAN and copy/paste it into his WhatsApp chat with his wife.
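The photo-to-text step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not necessarily how this project performs OCR: it assumes the open-source Tesseract engine accessed through the third-party pytesseract and Pillow packages, and the names `tesseract_ocr` and `ocr_to_text_file` are hypothetical.

```python
from pathlib import Path
from typing import Callable

# Any function that takes an image path and returns the recognised text can act as the OCR engine.
OcrEngine = Callable[[Path], str]


def tesseract_ocr(image_path: Path) -> str:
    """Extract text from an image with Tesseract through the pytesseract wrapper.

    Assumption: the Tesseract engine plus the pytesseract and Pillow packages are
    installed. They are imported here, not at module level, so the rest of this
    sketch runs without them.
    """
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)


def ocr_to_text_file(image_path: Path, ocr: OcrEngine = tesseract_ocr) -> Path:
    """OCR image_path and save the text next to it, e.g. bank-statement.jpeg -> bank-statement.txt."""
    text = ocr(image_path)
    text_path = image_path.with_suffix(".txt")
    text_path.write_text(text, encoding="utf-8")
    return text_path
```

With those tools installed, `ocr_to_text_file(Path("bank-statement.jpeg"))` would write bank-statement.txt, which John can open to copy the IBAN. Passing the engine in as a parameter keeps the file handling independent of the recognition step, so another OCR engine, or a stub in tests, can be swapped in.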
OCR technology is widely used across many areas. It enables computers to read the text contained in pictures, and once a computer knows which text an image contains, users can search for specific terms across their photos.
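The search described above can be sketched as a small inverted index over OCR output: each word maps to the set of files whose extracted text contains it. This is a minimal Python sketch, not this project's search implementation; it assumes the OCR text is already available as a plain dictionary (the sample file names and texts are made up), and `build_index` and `search` are hypothetical names. Matching is case-insensitive, and a multi-word query returns only the files that contain every word.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lower-cased words (runs of letters, digits and underscores)."""
    return re.findall(r"\w+", text.lower())


def build_index(ocr_texts: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of file names whose OCR text contains it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for file_name, text in ocr_texts.items():
        for word in tokenize(text):
            index[word].add(file_name)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return, sorted, the files whose OCR text contains every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    # Intersect the per-word file sets; a word missing from the index yields an empty set.
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Hypothetical OCR output, as if each photo had already been OCRed.
ocr_texts = {
    "bank-statement.jpeg": "Monthly bank statement. IBAN and closing balance.",
    "receipt.jpeg": "Grocery receipt. Total paid by card.",
    "letter.jpeg": "Letter from the bank about your new card.",
}
index = build_index(ocr_texts)
print(search(index, "IBAN"))       # ['bank-statement.jpeg']
print(search(index, "bank card"))  # ['letter.jpeg']
```

Because the words are indexed once, answering a query only touches the sets for the query's words instead of rescanning the text of every photo.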
Scanned document
A scanned document is just a photo of the document, usually of higher quality than, for example, a photo taken with a mobile phone. Informally, scanners are specialized devices for taking photos of documents.
OCRs or OCRed
OCRs is a jargon term: a verb derived from the noun OCR. The expression “File X was OCRed” means that the optical character recognition process was performed on file X. Similarly, the expression “It OCRs the documents” reads as “it applies optical character recognition to the documents”, which means the same as “it extracts text from the scanned documents”.
Funnily enough, here is how to conjugate the verb OCR in the present tense:
| Singular | Plural |
|---|---|
| I OCR | We OCR |
| You OCR | You OCR |
| He/she/it OCRs | They OCR |
And in the past tense (preterite):
| Singular | Plural |
|---|---|
| I OCRed | We OCRed |
| You OCRed | You OCRed |
| He/she/it OCRed | They OCRed |