This topic came up in a recent discussion on the VA Professional Magazine Facebook Page. Initially, I thought this would be a quick fix, with lots of apps out there that could help, but as I looked a little deeper, I realised that the solution isn’t so straightforward and that a walkthrough might be useful for many virtual assistants out there.
There could be many uses for this feature. Imagine you have a printed letter from a few years ago. The original Word document has long since been lost or deleted, and now you’d like to reuse the copy but make a few edits first. You could spend an hour copy typing the contents of the letter, or you could scan it and use Microsoft’s OCR (optical character recognition) features to convert the contents to an editable format, saving yourself tonnes of time along the way.
So how do you do it?
There are a few ways of achieving this, depending upon the version of Microsoft Office you are using. Older versions of Office included a document scanner and OCR software as standard; however, it was removed in the 2010 version of Microsoft Office. You can install it as a feature if you need it in more recent versions. The feature was later re-added, to a lesser degree, in Office 365, where you can open PDF files in Word and extract some text in order to edit it.
Open PDF files as Word documents in Office 365
Simply open up a blank Word document and go to File > Open, then browse to the location of your PDF or scanned file, select it and click OK.
You’ll see a message on your screen saying that your file will be converted to an editable file. Click OK and voilà!
Edit scanned documents as editable files in older versions of Word
This requires a slightly longer process: installing the OCR function and then using Microsoft OneNote.
Go to Start > Control Panel > Uninstall a Program.
From the list, find your version of Microsoft Office, then right click it and choose “Change”.
In the resulting window choose “Add or Remove Features” > “Office Tools” > expand this section to show “Optical Character Recognition”, then select “Run from My Computer” > “Continue”.
After Microsoft finishes configuration, you can then use OneNote to extract the text from the PDF file.
First off, open OneNote and open a blank page or new Notebook.
Go to Insert > File Printout > then browse to your scanned document or PDF file.
You’ll then see a copy of your file appear in your notebook. Right click the image and choose “Make Text in Image Searchable”, then select the language your document is in.
Next, right click the file printout and choose “Copy Text from All Pages of Printout”.
This will copy the entire text from your file to the clipboard. From here, all you need to do is open up Microsoft Word and paste in your content. The text will now be fully editable and you can amend it in whichever way you need to.
Note: the OCR imaging will work on any images and PDF files you print to OneNote; however, results will depend upon the quality of the original file. The clearer the text, the better the result you’ll get.
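If you’re comfortable running a small script, text copied out of a scan often arrives with a hard line break at the end of every printed line, and with words split by a hyphen where the original wrapped. The sketch below is one possible way to tidy that up before pasting into Word. It is not part of Office or OneNote; the function name tidy_ocr_text is made up for illustration, it uses only Python’s standard library, and it assumes paragraphs in the copied text are separated by blank lines.

```python
import re


def tidy_ocr_text(raw: str) -> str:
    """Rejoin hard-wrapped OCR lines into paragraphs (illustrative helper)."""
    # Normalise Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words split by a hyphen at the end of a line: "let-\nter" -> "letter".
    # Assumption: every end-of-line hyphen is a wrap, which is not always true.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Treat blank lines as paragraph breaks.
    paragraphs = re.split(r"\n\s*\n", text)

    cleaned = []
    for paragraph in paragraphs:
        # Collapse remaining line breaks and repeated spaces into single spaces.
        joined = re.sub(r"\s+", " ", paragraph).strip()
        if joined:
            cleaned.append(joined)

    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = (
        "Dear Mr Smith,\n\n"
        "Thank you for your let-\n"
        "ter of 3 March.  We are\n"
        "pleased to confirm.\n"
    )
    print(tidy_ocr_text(sample))
```

One thing to watch: genuinely hyphenated words that happen to fall at the end of a line (for example “well-” followed by “known”) will be joined without their hyphen, so a quick proofread is still worthwhile.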
If you’re looking to extract data from an image file, you might also find this previously written article helpful.