andyt, 03-12-2008, 05:13 PM
Stripping HTML out of Zend Search Lucene indexes

I'm using Zend Search Lucene (versions 0.92-beta and 1.04) to index the content of a web site. Basically it all works, but the embedded HTML markup is causing a problem: a search for, say, the word 'family' returns almost every page on the site, because the CSS property font-family appears in the markup of nearly every page.

Is there a built-in Zend_Search_Lucene function that will strip all of the HTML markup out of a page before it is indexed? Or should I filter the content through PHP's strip_tags() function before indexing it?

Andy
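If the strip_tags() route is taken, one catch is that strip_tags() removes tags and their attributes (so an inline style="font-family: ..." goes away) but leaves the text inside <style> and <script> elements in place, so a stylesheet in the page head would still get indexed. A minimal sketch of a pre-indexing cleanup, assuming UTF-8 input; html_to_index_text() is a hypothetical helper, not part of Zend_Search_Lucene:

```php
<?php
// Hypothetical helper: convert an HTML page to plain text before indexing.
// Assumes the input is valid UTF-8.
function html_to_index_text(string $html): string
{
    // strip_tags() keeps the contents of <script> and <style> elements,
    // so remove those blocks (tags and contents) first.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html);

    // Remove HTML comments, which can also hide CSS or script.
    $html = preg_replace('#<!--.*?-->#s', ' ', $html);

    // Put a space before every remaining tag so text from adjacent
    // elements (e.g. two <p> blocks) does not run together.
    $html = str_replace('<', ' <', $html);

    // Drop all remaining tags, including their attributes (style="...").
    $text = strip_tags($html);

    // Turn entities such as &amp; back into characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/u', ' ', $text));
}
```

The cleaned string could then be added to the Lucene document as an indexed but unstored field, for example via Zend_Search_Lucene_Field::UnStored('contents', $text); the field name 'contents' is only an illustration. Inserting a space before each tag keeps '<p>one</p><p>two</p>' from collapsing into 'onetwo', at the cost of splitting words that are broken up by inline tags, such as '<b>fam</b>ily' becoming 'fam ily'.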