03-24-2008, 05:07 PM
demi
Junior Member
 
Join Date: Mar 2008
Posts: 7
Lucene does not search Cyrillic text

Here is the program in which I create an index with some example data:

PHP Code:
    $itemId = 3245;
    $title = 'деякий екземпловий текст'; // here is some Cyrillic text

    setlocale(LC_CTYPE, 'uk_UA.UTF-8');

    Zend_Search_Lucene_Analysis_Analyzer::setDefault(
        new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8());

    Zend_Loader::loadClass('Zend_Search_Lucene');

    $index = Zend_Search_Lucene::create('tmp/index');
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::UnIndexed('itemId', $itemId));
    $doc->addField(Zend_Search_Lucene_Field::Text('url', $url));
    $doc->addField(Zend_Search_Lucene_Field::Keyword('title', $title));
    // $contentText is a variable that holds Cyrillic text
    $doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $contentText));
    $index->addDocument($doc);
After executing this code, I try to find the indexed text:

PHP Code:
    $index = Zend_Search_Lucene::open('tmp/index');
    // the query contains a Cyrillic word that also appears in the indexed text
    $hits = $index->find($this->getRequest()->query);

    foreach ($this->items as $item)
    {
        echo $item->title;
        echo '<br />';
        echo $item->url;
    }
But $index->find() returns an empty result! Please help me understand what is wrong. The program works fine when the text contains Latin characters. Why does it matter whether I use Latin or Cyrillic characters?

Thank you in advance!