I would expect a very time-consuming, expensive process because of the need to carefully check every word. I don't share your faith that OCR will be 100% correct. Sometimes such tasks are offshored because they are so labour-intensive.
But that aside, I would simply scan and OCR. I would find out which process is adding renderable text (which makes a nonsense of the idea of OCR) and stop it from doing that.
Sorry if this comes across as overly negative. You don't have an easy problem, and I hope you are able to bill for it suitably.