  #1 (permalink)  
Old 03-24-2008, 05:07 PM
Junior Member
 
Join Date: Mar 2008
Posts: 6
Lucene does not search Cyrillic text

Here is the program in which I create an index with some example data:

PHP Code:
$itemId = 3245;
$title = 'деякий екземпловий текст'; // here is some Cyrillic text

setlocale(LC_CTYPE, 'uk_UA.UTF-8');

Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8());

Zend_Loader::loadClass('Zend_Search_Lucene');

$index = Zend_Search_Lucene::create('tmp/index');

$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('itemId', $itemId));
$doc->addField(Zend_Search_Lucene_Field::Text('url', $url));
$doc->addField(Zend_Search_Lucene_Field::Keyword('title', $title));
// $contentText is a variable that holds Cyrillic text
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $contentText));

$index->addDocument($doc);
After executing this code, I try to find the indexed text:

PHP Code:
$index = Zend_Search_Lucene::open('tmp/index');

// The query contains a Cyrillic word that also occurs in the indexed text
$hits = $index->find($this->getRequest()->query);

foreach ($this->items as $item) {
    echo $item->title;
    echo '<br />';
    echo $item->url;
}
But $index->find() returns an empty result! Please help me understand what is wrong. The program works fine when the text contains Latin characters. Why does it matter whether I use Latin or Cyrillic characters?

Thank you in advance!
  #2 (permalink)  
Old 04-16-2008, 02:02 PM
Junior Member
 
Join Date: Mar 2008
Posts: 2

Set the Lucene fields to UTF-8 encoding like this:
PHP Code:
$doc->addField(Zend_Search_Lucene_Field::Keyword('title', $title, 'utf-8'));
Add these lines before searching the index:
PHP Code:
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8());
And finally do this:
PHP Code:
$query = Zend_Search_Lucene_Search_QueryParser::parse($this->getRequest()->query, 'utf-8');
$hits = $index->find($query);
This works fine for me with Bulgarian characters.
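To see why the analyzer has to be set before searching as well as before indexing, here is a small plain-PHP sketch. It is not Zend_Search_Lucene code: asciiTokens() and utf8Tokens() are hypothetical stand-ins. It assumes (as far as I know) that the default analyzer treats only ASCII letters as word characters, while the Utf8 analyzer accepts any Unicode letter. It also needs PCRE built with UTF-8 support.

```php
<?php
// Hypothetical stand-in for an ASCII-only analyzer:
// keeps runs of the letters a-z / A-Z and ignores every other byte.
function asciiTokens(string $text): array
{
    preg_match_all('/[a-zA-Z]+/', $text, $matches);
    return $matches[0];
}

// Hypothetical stand-in for a UTF-8 aware analyzer:
// keeps runs of any Unicode letter (the /u flag treats input as UTF-8).
function utf8Tokens(string $text): array
{
    preg_match_all('/\p{L}+/u', $text, $matches);
    return $matches[0];
}

// Terms written at index time with the UTF-8 aware tokenizer.
$indexedTerms = utf8Tokens('деякий екземпловий текст');

// The same query tokenized in two different ways at search time.
$asciiQuery = asciiTokens('текст'); // no ASCII letters, so no terms at all
$utf8Query  = utf8Tokens('текст');  // one term: 'текст'

var_dump(count(array_intersect($indexedTerms, $asciiQuery)) > 0); // bool(false)
var_dump(count(array_intersect($indexedTerms, $utf8Query)) > 0);  // bool(true)
```

If the query is tokenized by an ASCII-only analyzer, no terms are left to look up, so the search comes back empty even though the index itself is fine.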

Cheers!

Last edited by hedonism : 04-16-2008 at 02:10 PM.