03-31-2008, 06:34 PM
Junior Member
 
Join Date: Jan 2008
Posts: 1
IBM Omnifind Yahoo! edition

I would like to use the Lucene collections created by the IBM Omnifind Yahoo! edition search engine. Using a simple program, I have successfully opened the index, and I can execute all of the basic functions and get results that match what I see when I inspect the index with Luke. The problem is that I get no results when I perform a find() on the field named _plain. (This is the primary field, where all of the Omnifind index terms are stored.) I suspect that this is due to the character set Omnifind uses to store terms in the _plain field. Oddly enough, I do get results from all of the other fields...

** update **
After some testing I have discovered that the query does match documents, but the score() method filters out every hit with a score of 0. The scores are 0 because the norm() method returns 0...

Here is what happens:

The Lucene index created by Omnifind seems to have several empty elements in the field/normalization factor array.
$this->_norms
array (
[0] => yyy|||||||
[4] => ||||||||||
[8] => ||||||||||
[11] =>
[12] =>
[1] => |€€€€||||„
[2] => xv|vx|xxvy
[3] => ††††††††††
[5] => uuuuuuuuuu
[6] => ||||||||||
[7] => eppqmjjeeq
[9] =>
[10] =>
[13] =>
)

The segInfo->norm() method calls
PHP Code:
Zend_Search_Lucene_Search_Similarity::decodeNorm(ord($this->_norms[$fieldNum]{$id})); 
For the search field "_plain", $fieldNum is 10.

Since $this->_norms[$fieldNum] is empty, the returned value is 0! This bubbles all the way back up to score() and keeps the record out of the results.
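
To illustrate why an empty norms string ends up as a norm of 0, here is a minimal sketch. The helper names normByteAt() and decodeNormByte() are my own and are not part of Zend_Search_Lucene. The decoding follows my understanding of Lucene's one-byte norm format (3 mantissa bits, 5 exponent bits); the real decodeNorm() may be implemented differently, for example with a lookup table.

```php
<?php
// Hypothetical helper: read the norm byte for document $id from a norms string.
// substr() yields an empty value when the offset does not exist, and ord() of
// an empty string is 0, which is the case hit for the empty _norms entries.
function normByteAt(string $norms, int $id): int
{
    return ord((string) substr($norms, $id, 1));
}

// Hypothetical decoder, assuming Lucene's 3-bit mantissa / 5-bit exponent
// byte encoding. Byte 0 is defined as 0.0.
function decodeNormByte(int $b): float
{
    if ($b === 0) {
        return 0.0;
    }
    // Shift the byte into the float's exponent/mantissa bits and re-bias.
    $bits = (($b & 0xFF) << 21) + (48 << 24);
    $unpacked = unpack('f', pack('L', $bits));
    return $unpacked[1];
}

$loaded  = decodeNormByte(normByteAt('||||||||||', 3)); // '|' is byte 124, decodes to 1.0
$missing = decodeNormByte(normByteAt('', 3));           // empty norms string, decodes to 0.0
```

Under these assumptions, the '|' characters in the dump above correspond to a norm of 1.0, while any field whose norms string is empty always yields 0.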

** another update **
It appears that there is a bug in the _loadNorm($fieldNum) method of Zend_Search_Lucene_Index_SegmentInfo. Rather than using the value of the passed $fieldNum, the function loops over the $this->_fields array and loads all of the fields. This corrupts the $_norms array, which is why there were empty elements in it. Queries on the Omnifind index work fine if I comment out the foreach loop and use just a single line to load the norm file for the appropriate field.
PHP Code:
            $this->_norms[$fieldNum] = $normfFile->readBytes($this->_docCount);
//             foreach ($this->_fields as $fieldNum => $fieldInfo) {
//                 if ($fieldInfo->isIndexed) {
//                     $this->_norms[$fieldNum] = $normfFile->readBytes($this->_docCount);
//                 }
//             } 
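
The underlying PHP behaviour can be shown in isolation. In this sketch (the function names and the field list are made up for illustration and are not taken from the Zend source), a foreach that reuses $fieldNum as its key overwrites the argument that was passed in:

```php
<?php
// Hypothetical stand-ins, not the Zend_Search_Lucene source.

// Reuses $fieldNum as the loop key, as the commented-out loop does.
function fieldNumAfterSharedLoop(int $fieldNum, array $fields): int
{
    foreach ($fields as $fieldNum => $fieldInfo) {
        // per-field work would go here
    }
    return $fieldNum; // now the last key of $fields, not the caller's argument
}

// Uses a separate loop key, so the argument survives the loop.
function fieldNumAfterSeparateLoop(int $fieldNum, array $fields): int
{
    foreach ($fields as $num => $fieldInfo) {
        // per-field work would go here
    }
    return $fieldNum;
}

$fields = array_fill(0, 14, true); // keys 0..13, like the 14 fields in the dump

$shared   = fieldNumAfterSharedLoop(10, $fields);   // 13
$separate = fieldNumAfterSeparateLoop(10, $fields); // 10
```

Whether this shadowing alone explains the empty entries in my index is an assumption on my part; it only shows that the passed $fieldNum is not preserved once the loop has run.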

Last edited by jsloan : 04-01-2008 at 08:19 PM.