Apache Tika 0.7

Detecting and Extracting Data

Jessica Thornsby

The Apache Lucene project has released Apache Tika 0.7.

Apache Tika is a toolkit for detecting and extracting both metadata and structured text content using existing parser libraries.

With this release, MP3 file parsing has been updated to include Channel and SampleRate extraction and ID3v2 support. MIME type detection for the MIDI audio format has also been updated. Apache Tika no longer relies on X11 for its RTF parsing functionality, and has been upgraded to the recently released PDFBox 1.0.0 Java library.
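To give a feel for what audio type detection and ID3v2 support involve, here is a minimal Python sketch of signature-based ("magic byte") sniffing. It is not Tika's implementation, and the function names `sniff_audio_type` and `id3v2_tag_size` are hypothetical. It relies only on well-known format facts: a standard MIDI file starts with `MThd`, an ID3v2 tag starts with `ID3`, and the tag's size is stored as four "synchsafe" bytes carrying 7 bits each. Treating any `ID3`-prefixed file as MP3 is a simplifying assumption.

```python
def sniff_audio_type(header: bytes) -> str:
    """Guess a MIME type from the first bytes of a file (illustrative only)."""
    if header[:4] == b"MThd":
        # Standard MIDI files begin with the "MThd" header chunk.
        return "audio/midi"
    if header[:3] == b"ID3":
        # An ID3v2 tag; assumed here to precede MP3 audio.
        return "audio/mpeg"
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        # MPEG audio frame sync: eleven consecutive set bits.
        return "audio/mpeg"
    return "application/octet-stream"


def id3v2_tag_size(header: bytes) -> int:
    """Decode the tag size from a 10-byte ID3v2 header.

    Layout: "ID3", major version, revision, flags, then 4 synchsafe
    size bytes (only the low 7 bits of each byte are used).
    """
    if len(header) < 10 or header[:3] != b"ID3":
        raise ValueError("not an ID3v2 header")
    size = 0
    for byte in header[6:10]:
        size = (size << 7) | (byte & 0x7F)
    return size


if __name__ == "__main__":
    print(sniff_audio_type(b"MThd\x00\x00\x00\x06"))
    print(id3v2_tag_size(b"ID3\x03\x00\x00\x00\x00\x02\x01"))
```

A real detector such as Tika's combines signatures like these with file name hints and a much larger type registry; the sketch only shows the basic idea.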

Please see the Release Notes for more information.
