“Difficult” PDF files for translation
We very often receive PDF documents for technical translation and, unfortunately, most of the time they are non-editable PDF files: just scanned documents, especially in the case of legal or government documents. Most of the time the scan quality is poor, and it is difficult to read the handwritten text, names and stamps! Editable PDF files are slightly better than their non-editable counterparts, but if any editing or proofing work has to be done on them, they still have to be converted into a Word file. What follows the conversion is a very painful process, because a number of errors can occur:
- incorrect pagination, paragraph breaks and sentence breaks
- images/drawings are not converted or, worse, are converted incorrectly and incompletely, with garbled text in the labels
- some letters and numbers are converted incorrectly, e.g. 0 instead of O/D, B instead of 8, C instead of 6; other characters and symbols change completely too, e.g. 0 instead of Ø, @ instead of ø, 5 or s instead of ≤, ? instead of 7, and the list goes on
- where a page has multiple columns, the text from different columns can get mixed up
- missing/hidden letters, words, lines, sentences or entire paragraphs
- parts of the text are not converted and remain non-editable in the form of images
- table columns and rows come out distorted or, sometimes, the text in them gets mixed up
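Part of the character checking described above can be supported by a small script. What follows is only a rough sketch in Python, not a tool we claim to use: the function name `flag_suspicious_tokens` and the `CONFUSABLE` set are made up for illustration, and the rule (flag any token that mixes letters and digits and contains one of the characters listed above) is an assumption that will also flag perfectly correct codes. It only narrows down where to look; it does not replace the manual check.

```python
import re

# Characters named in the list above as commonly confused during conversion.
# This set is illustrative, not exhaustive.
CONFUSABLE = set("0ODB8C6Øø@5s≤?7")

def flag_suspicious_tokens(text):
    """Return (line_number, token) pairs that deserve a manual check.

    A token is flagged when it mixes letters and digits and contains
    at least one character from CONFUSABLE.
    """
    flagged = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in re.findall(r"\S+", line):
            has_letter = any(ch.isalpha() for ch in token)
            has_digit = any(ch.isdigit() for ch in token)
            has_confusable = any(ch in CONFUSABLE for ch in token)
            if has_letter and has_digit and has_confusable:
                flagged.append((line_no, token))
    return flagged

sample = "Bolt M8 x 30\nTotal length 120 mm\nPart no. A0B-6C"
for line_no, token in flag_suspicious_tokens(sample):
    print(f"line {line_no}: check '{token}' against the source PDF")
```

Each flagged token still has to be compared with the original scan, since only the source shows whether "M8" was really "M8" or something else.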
It is therefore essential to recheck every word, character, numeral, code and diagram thoroughly before proceeding to translation. We use CAT tools for translation, and the tool treats every paragraph break as the start of a new string to translate, so a sentence broken by a stray paragraph break ends up split into separate fragments. A PDF converted into Word therefore cannot be translated directly in a CAT tool, and the time and effort spent on creating and checking such converted documents has to be factored into the scope of work. Most of the time the client can and does provide a better-formatted document, but sometimes we just have to make do with such painfully incorrect converted documents and spend more (and more) time correcting and checking them, so that we finally translate a correct document and, hopefully, avoid later rework on the translations!
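To illustrate why stray paragraph breaks matter for a CAT tool, here is a minimal sketch in Python of one possible clean-up step. It assumes the converted text is already available as a list of paragraph strings; the function name `merge_broken_paragraphs` and the rule it applies (join a paragraph to the previous one when the previous one does not end in closing punctuation and the new one starts with a lowercase letter) are our own illustrative assumptions, not a feature of any particular CAT tool. The result still needs to be read through, because headings and list items can fool such a rule.

```python
def merge_broken_paragraphs(paragraphs):
    """Join paragraphs that look like one sentence split by a stray break."""
    sentence_endings = (".", "!", "?", ":", ";")
    merged = []
    for para in paragraphs:
        para = para.strip()
        if not para:
            continue  # skip empty paragraphs left over from conversion
        previous_is_open = bool(merged) and not merged[-1].endswith(sentence_endings)
        if previous_is_open and para[0].islower():
            # Likely a continuation of the previous sentence: rejoin it.
            merged[-1] = merged[-1] + " " + para
        else:
            merged.append(para)
    return merged

converted = [
    "The valve must be closed",
    "before the pump is started.",
    "Check the pressure gauge.",
]
for segment in merge_broken_paragraphs(converted):
    print(segment)
```

With the sample input, the first two fragments are rejoined into a single sentence, so the CAT tool would see two strings instead of three.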
