I would like to use the Lucene collections created by the IBM Omnifind Yahoo! Edition search engine. Using a simple PHP program built on Zend_Search_Lucene, I have successfully opened the index. I can execute all of the basic functions and get results that match what I see when I inspect the index with Luke. The problem is that I get no results when I perform a find() on the field named _plain. (This is the primary field, where all the Omnifind index terms are stored.) I suspect this is due to the character set Omnifind uses to store terms in the _plain field. Oddly enough, queries on all of the other fields do return results.
** update **
After some testing, I discovered that the query does match documents, but score() filters out every hit whose score is 0, and every hit here scores 0 because the norm() method returns 0.
Here is what happens:
The field normalization factor (norms) array loaded from the Omnifind index seems to contain several empty elements:
$this->_norms
array (
[0] => yyy|||||||
[4] => ||||||||||
[8] => ||||||||||
[11] =>
[12] =>
[1] => |€€€€||||„
[2] => xv|vx|xxvy
[3] => ††††††††††
[5] => uuuuuuuuuu
[6] => ||||||||||
[7] => eppqmjjeeq
[9] =>
[10] =>
[13] =>
)
The segInfo->norm() method calls [PHP]Zend_Search_Lucene_Search_Similarity::decodeNorm(ord($this->_norms[$fieldNum]{$id}));[/PHP]
For the search field _plain, $fieldNum is 10.
Since $this->_norms[$fieldNum] is empty, there is no byte at offset $id, ord() returns 0, and the decoded norm is 0! That 0 bubbles all the way back to score() and keeps the record out of the results.
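The failure chain above can be reproduced without the library. This is a minimal, self-contained sketch; normByte() and decodeNormSketch() are hypothetical stand-ins, not Zend_Search_Lucene API. decodeNormSketch() assumes the norm lookup table holds Lucene's one-byte float encoding (3 mantissa bits, zero exponent 15), in which byte 0 decodes to exactly 0.0 and byte 124 ('|', which appears throughout the dump) decodes to 1.0. The base score of 2.5 is made up for illustration.

```php
<?php
// Hypothetical sketch: how an empty norms string turns into a zero score.
// None of these functions are part of Zend_Search_Lucene.

/**
 * Decode a one-byte norm into a float.
 * Assumption: uses Lucene's SmallFloat encoding (3 mantissa bits, zero exponent 15).
 */
function decodeNormSketch(int $byte): float
{
    $byte &= 0xFF;
    if ($byte === 0) {
        return 0.0; // byte 0 is reserved for a norm of exactly zero
    }
    // Shift the byte into the exponent/mantissa bits of an IEEE 754 single.
    $bits = ($byte << 21) + ((63 - 15) << 24);
    // 'V' = 32-bit little-endian unsigned int, 'g' = little-endian float (PHP >= 7.0.15)
    return unpack('g', pack('V', $bits))[1];
}

/**
 * Read one document's norm byte from a field's norms string.
 * Mirrors ord($this->_norms[$fieldNum]{$id}): for an empty string the
 * offset is missing, and ord('') is 0.
 */
function normByte(string $fieldNorms, int $docId): int
{
    return $docId < strlen($fieldNorms) ? ord($fieldNorms[$docId]) : 0;
}

// Field 0 copied from the dump; field 10 (_plain) came back empty.
$norms = [0 => 'yyy|||||||', 10 => ''];

$baseScore = 2.5; // hypothetical tf/idf part of the score, same for both fields
foreach ([0, 10] as $fieldNum) {
    $score = $baseScore * decodeNormSketch(normByte($norms[$fieldNum], 3));
    // Hits scoring 0 are dropped, as observed with score().
    $status = $score > 0 ? 'kept' : 'filtered out';
    printf("field %d: score %.4f (%s)\n", $fieldNum, $score, $status);
}
```

Running it prints `field 0: score 2.5000 (kept)` followed by `field 10: score 0.0000 (filtered out)`: the only difference between the two hits is the empty norms string.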
** another update **
It appears that there is a bug in the _loadNorm($fieldNum) method of Zend_Search_Lucene_Index_SegmentInfo. Rather than using the passed $fieldNum, the method loops over the $this->_fields array and loads norms for every indexed field. This corrupts the $_norms array, which is why it contained empty elements. Queries on the Omnifind index work fine if I comment out the foreach loop and use a single line that loads the norms for the requested field:
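[php]
// Workaround: load the norms (one byte per document) for the requested field only.
$this->_norms[$fieldNum] = $normfFile->readBytes($this->_docCount);

// Original loop, commented out. Note that it reuses $fieldNum as its key,
// overwriting the method's $fieldNum parameter.
// foreach ($this->_fields as $fieldNum => $fieldInfo) {
//     if ($fieldInfo->isIndexed) {
//         $this->_norms[$fieldNum] = $normfFile->readBytes($this->_docCount);
//     }
// }
[/php]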
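For comparison, here is a hedged sketch of a loader that keeps the loop but reads safely. It assumes (not confirmed in this thread) that $normfFile is a single shared norms file holding one $docCount-byte block per field that has norms, in field-number order, which is the layout the original foreach loop expects. Under that assumption, the one-line workaround reads the first block in the file whatever $fieldNum is, and a loop that expects a block for a field that has none (for example, a field indexed without norms) misaligns every later block and eventually reads past the end of the data, which would leave the highest-numbered entries empty, as in the dump. FieldInfoSketch, its hasNorms flag, and loadAllNorms() are hypothetical names, and the sketch reads from an in-memory string rather than Zend's file object.

```php
<?php
// Hypothetical sketch, not Zend_Search_Lucene code.
// Assumption: the norms data holds one $docCount-byte block per field
// that has norms, in field-number order.

final class FieldInfoSketch
{
    public $isIndexed;
    public $hasNorms;

    public function __construct(bool $isIndexed, bool $hasNorms = true)
    {
        $this->isIndexed = $isIndexed;
        $this->hasNorms  = $hasNorms; // assumption: some fields may omit norms
    }
}

/**
 * @param FieldInfoSketch[] $fields  field number => field info
 * @return array<int, string>        field number => norms bytes
 */
function loadAllNorms(string $normsData, array $fields, int $docCount): array
{
    $norms  = [];
    $offset = 0;
    // Use $fNum rather than $fieldNum so a caller's $fieldNum is never overwritten.
    foreach ($fields as $fNum => $info) {
        if (!$info->isIndexed || !$info->hasNorms) {
            continue; // no block is stored for this field
        }
        $block = substr($normsData, $offset, $docCount);
        if (strlen($block) !== $docCount) {
            // Fail loudly instead of storing a short or empty string.
            throw new RuntimeException("Norms data too short for field $fNum");
        }
        $norms[$fNum] = $block;
        $offset += $docCount;
    }
    return $norms;
}

// Usage: field 1 omits norms and field 2 is not indexed,
// so only fields 0 and 3 own a block in the data.
$fields = [
    0 => new FieldInfoSketch(true),
    1 => new FieldInfoSketch(true, false),
    2 => new FieldInfoSketch(false),
    3 => new FieldInfoSketch(true),
];
$loaded = loadAllNorms('yyy|||||||' . 'eppqmjjeeq', $fields, 10);
var_dump($loaded); // keys 0 and 3 only
```

Skipping fields that have no stored block keeps every later block aligned, and throwing on a short read surfaces a layout mismatch immediately instead of storing an empty string that later decodes to a norm of 0.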
Last edited by jsloan; 04-01-2008 at 08:19 PM.