
How can I index PowerPoint documents?

To create a Lucene-based index of PowerPoint documents, you first need to parse them to extract the text that you want to index. You can use Apache POI (formerly a Jakarta project) to parse the PowerPoint document and extract the relevant text from it.
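
As a rough illustration of the parsing step, the sketch below pulls the text out of a binary .ppt file. It assumes Apache POI 4.0 or later with the poi-scratchpad module on the classpath; the class name PptTextExtractor and the method name extractText are made up for this example. For .pptx files you would use XMLSlideShow from the poi-ooxml module in place of HSLFSlideShow.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.poi.hslf.usermodel.HSLFShape;
import org.apache.poi.hslf.usermodel.HSLFSlideShow;
import org.apache.poi.hslf.usermodel.HSLFTextParagraph;
import org.apache.poi.sl.extractor.SlideShowExtractor;

// Hypothetical helper class, not part of POI or Lucene.
class PptTextExtractor {

    // Reads a binary .ppt file and returns the text found on its slides.
    static String extractText(Path pptFile) throws IOException {
        try (InputStream in = Files.newInputStream(pptFile);
             HSLFSlideShow slideShow = new HSLFSlideShow(in)) {
            SlideShowExtractor<HSLFShape, HSLFTextParagraph> extractor =
                    new SlideShowExtractor<>(slideShow);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PptTextExtractor path/to/presentation.ppt
        System.out.println(extractText(Paths.get(args[0])));
    }
}
```

The returned string is what you would hand to Lucene as the document text.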


You should also store the location of the PowerPoint document on the file system in one of the index fields, so that when a search finds the document it can be shown to the user directly as a link. Storing all of the text in a content field is also a good idea, since you may want to show a text representation of the document. It also lets people run different wildcard searches over the text.
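
One possible way to lay out the fields is sketched below. It assumes a reasonably recent Lucene release (the calls used exist in Lucene 5 and later); the field names "path" and "contents" and the class PptIndexer are choices made for this example, not anything Lucene requires. The path is stored as a single untokenized value so it can be turned into a link, and the text is both analyzed for searching and stored so it can be displayed.

```java
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Hypothetical helper class, not part of Lucene.
class PptIndexer {
    static final String PATH_FIELD = "path";         // assumed field name
    static final String CONTENTS_FIELD = "contents"; // assumed field name

    // Builds one Lucene document for one presentation.
    static Document toDocument(String path, String text) {
        Document doc = new Document();
        // Indexed as a single token and stored, so a hit can link to the file.
        doc.add(new StringField(PATH_FIELD, path, Field.Store.YES));
        // Analyzed for full-text search and stored for display.
        doc.add(new TextField(CONTENTS_FIELD, text, Field.Store.YES));
        return doc;
    }

    // Adds one presentation to the index held in the given directory.
    static void index(Directory dir, String path, String text) throws IOException {
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            writer.addDocument(toDocument(path, text));
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PptIndexer <index-dir> <ppt-path> <extracted-text>
        try (Directory dir = FSDirectory.open(Paths.get(args[0]))) {
            index(dir, args[1], args[2]);
        }
    }
}
```

Storing the full text makes the index larger, so for very big presentations you might index the contents without storing them and re-extract the text when it has to be shown.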