TeXtracta is a free tool that extracts the text embedded in documents that are not purely textual, such as PDF, JPG, or XLS files, using iFilter technology. The extracted text can be saved automatically to a TXT file, and the batch mode lets you convert to plain text every file in a given directory (including, if needed, its subdirectories).
iFilter technology was developed by Microsoft so that its search engines could identify and index the textual information contained in documents that are not plain text, including those that are essentially graphical in nature. TeXtracta also supports other types of documents that contain text, such as XML, DOC, and RTF, extracting the plain text and leaving out the graphical information associated with it. Each document type requires its own iFilter plug-in, which needs to be installed on your system for the program to work properly.
If needed, the text extracted from a single file can be shown in the program’s main window, so that you can copy and paste whatever fragments you need to re-use. This preview is not available when the extraction runs on multiple files within a directory; in that case the program automatically saves each TXT file in the same directory and with the same name as its source file.
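The batch naming rule described above can be sketched in a few lines of Python. This is only an illustration, not TeXtracta's own code: the function name `plan_batch`, the `SUPPORTED` set, and the reading of "same name" as "same base name with a .txt extension" are all assumptions made for the example, and the sketch performs no text extraction.

```python
from pathlib import Path
from typing import Iterator, Tuple

# Assumption: an illustrative list only. The real tool handles whatever
# document types have an iFilter plug-in installed on the system.
SUPPORTED = {".pdf", ".doc", ".rtf", ".xml", ".xls"}


def plan_batch(root: Path, recursive: bool = False) -> Iterator[Tuple[Path, Path]]:
    """Yield (source, target) pairs for a batch run (hypothetical helper).

    Each target sits in the same directory as its source and keeps the
    same base name, with the extension replaced by .txt.
    """
    # rglob descends into subdirectories; glob stays in the top directory.
    candidates = root.rglob("*") if recursive else root.glob("*")
    for source in sorted(candidates):
        if source.is_file() and source.suffix.lower() in SUPPORTED:
            yield source, source.with_suffix(".txt")
```

Calling `list(plan_batch(Path("docs"), recursive=True))` on a hypothetical `docs` folder would list each supported document next to the TXT file a batch run would be expected to create beside it.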
The application’s interface is extremely simple and includes just the buttons needed to control the extraction process. Its main assets are its speed and its accuracy in recognizing textual information. Accented characters and other non-ASCII elements are usually rendered correctly, and some interactive elements (such as links to URLs) are shown in the text window when working with a single document. These links, however, are not included in the TXT file.