
Thread: IBM Omnifind Yahoo! edition

  1. #1
    jsloan, Junior Member
    Join Date: Jan 2008

    IBM Omnifind Yahoo! edition

    I would like to use the Lucene collections created by the IBM Omnifind Yahoo! edition search engine. Using a simple program, I have successfully opened the index, and I can execute all of the basic functions and get results that match what I see when I inspect the index with Luke. The problem is that a find() on the field named _plain returns no results. (This is the primary field, where Omnifind stores all of its index terms.) I suspect that it is due to the character set Omnifind uses to store terms in the _plain field. Oddly enough, queries on all of the other fields do return results.

    ** update **
    After some testing, I discovered that the query does match documents, but every match gets a score of 0, and hits with a score of 0 are filtered out of the results. The zero score comes from the norm() method returning 0.
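    A minimal PHP sketch of why a zero norm byte wipes out the score. It assumes that decodeNorm() follows Lucene's standard one-byte norm encoding (3-bit mantissa, 5-bit exponent, as in Lucene's SmallFloat.byte315ToFloat); Zend's own implementation uses a precomputed lookup table instead, and the function names here are hypothetical. Under that encoding, byte 0 decodes to 0.0 and byte 124 ('|') decodes to 1.0, which fits the many '|' characters in the dump below.

```php
<?php
// Sketch of Lucene's one-byte norm decoding (3-bit mantissa, 5-bit exponent,
// exponent bias 15), modelled on SmallFloat.byte315ToFloat.
// Assumption: Zend's decodeNorm() table holds the same values.
function decodeNormByte(int $b): float
{
    $b &= 0xFF;
    if ($b === 0) {
        return 0.0; // a zero byte always decodes to a zero norm
    }
    // Shift the byte into the float's exponent/mantissa bits, then rebias the exponent.
    $bits = ($b << 21) + ((63 - 15) << 24);
    // Reinterpret the 32-bit pattern as an IEEE-754 single-precision float.
    return unpack('f', pack('L', $bits))[1];
}

// Hypothetical scorer: the norm is one factor of the document score,
// so a zero norm forces the whole score to zero.
function scoreWithNorm(float $rawScore, int $normByte): float
{
    return $rawScore * decodeNormByte($normByte);
}

printf("%.3f\n", decodeNormByte(0));      // 0.000
printf("%.3f\n", decodeNormByte(124));    // 1.000 ('|')
printf("%.3f\n", scoreWithNorm(2.5, 0));  // 0.000
```

    Because the norm multiplies the rest of the score, no tf or idf value can rescue a document whose norm byte is 0.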

    Here is what happens:

    The Lucene index created by Omnifind seems to have several empty elements in its array of per-field normalization factors (norms):
    array (
    [0] => yyy|||||||
    [4] => ||||||||||
    [8] => ||||||||||
    [11] =>
    [12] =>
    [1] => |€€€€||||„
    [2] => xv|vx|xxvy
    [3] => ††††††††††
    [5] => uuuuuuuuuu
    [6] => ||||||||||
    [7] => eppqmjjeeq
    [9] =>
    [10] =>
    [13] =>

    The segInfo->norm() method calls:
    Zend_Search_Lucene_Search_Similarity::decodeNorm(ord($this->_norms[$fieldNum]{$id}));

    For the search field "_plain", $fieldNum is 10.

    Since $this->_norms[$fieldNum] is an empty string, ord() returns 0 and the decoded norm is 0! That zero bubbles all the way back to score() and keeps the document out of the results.
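    The lookup path above can be reproduced with a small diagnostic. This is a sketch with hypothetical helper names, assuming each element of the norms array is a string holding one byte per document and that the segment has 10 documents (matching the 10-byte strings in the dump). It returns 0 for a missing byte, which is what ord() yields for an empty string, but without the notice or warning PHP emits for an out-of-range string offset, and it lists the fields whose norm strings are too short.

```php
<?php
// Returns the same value as ord($norms[$fieldNum][$id]) when the byte exists,
// and 0 (what ord('') gives) when the field or byte is missing.
function rawNormByte(array $norms, int $fieldNum, int $id): int
{
    $field = $norms[$fieldNum] ?? '';
    return $id < strlen($field) ? ord($field[$id]) : 0;
}

// Lists the field numbers whose norm string is shorter than the document
// count, i.e. the "empty elements" seen in the dump.
function truncatedNormFields(array $norms, int $docCount): array
{
    $bad = [];
    foreach ($norms as $num => $bytes) {
        if (strlen($bytes) < $docCount) {
            $bad[] = $num;
        }
    }
    sort($bad);
    return $bad;
}

// Assumed sample: field 0 from the dump, fields 10 and 11 empty.
$norms = [0 => 'yyy|||||||', 10 => '', 11 => ''];
echo rawNormByte($norms, 0, 3), "\n";   // 124 ('|')
echo rawNormByte($norms, 10, 3), "\n";  // 0, which decodes to a zero norm
echo implode(',', truncatedNormFields($norms, 10)), "\n"; // 10,11
```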

    ** another update **
    It appears that there is a bug in the _loadNorm($fieldNum) method of Zend_Search_Lucene_Index_SegmentInfo. Rather than loading only the field passed in $fieldNum, the method loops over the $this->_fields array and loads norms for every indexed field. This corrupts the $_norms array, which is why it had empty elements. Queries on the Omnifind index work fine if I comment out the foreach loop and use a single line to load the norms for the requested field:
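    // Load the norms for the requested field only.
    $this->_norms[$fieldNum] = $normfFile->readBytes($this->_docCount);
    // Original loop, commented out (it reads a block for every indexed field):
    // foreach ($this->_fields as $fieldNum => $fieldInfo) {
    //     if ($fieldInfo->isIndexed) {
    //         $this->_norms[$fieldNum] = $normfFile->readBytes($this->_docCount);
    //     }
    // }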
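    If the segment uses Lucene's single-file norms layout (.nrm: the bytes "NRM" plus a version byte 0xFF, then one block of $docCount bytes per field with norms, in field-number order), reading just the next $docCount bytes returns the first indexed field's block whatever $fieldNum is. A position-aware read avoids that. The sketch below is not Zend's code; the function name is hypothetical, it works on the file contents as a string, and it treats every indexed field as having norms (real Lucene also skips fields that omit norms). Its loop variable is $num, so the $fieldNum argument is never overwritten.

```php
<?php
// Returns the $docCount norm bytes for one field from .nrm file contents.
// Assumed layout: "NRM" . chr(0xFF), then one block per indexed field.
function loadFieldNorms(string $nrm, array $isIndexed, int $fieldNum, int $docCount): string
{
    if (strlen($nrm) < 4 || substr($nrm, 0, 3) !== 'NRM' || ord($nrm[3]) !== 0xFF) {
        throw new RuntimeException('Wrong norms file format.');
    }
    if (empty($isIndexed[$fieldNum])) {
        throw new InvalidArgumentException("Field $fieldNum has no norms.");
    }
    // Count the indexed fields stored before $fieldNum to find its block.
    $block = 0;
    foreach ($isIndexed as $num => $indexed) {
        if ($num < $fieldNum && $indexed) {
            $block++;
        }
    }
    $bytes = substr($nrm, 4 + $block * $docCount, $docCount);
    if (strlen($bytes) !== $docCount) {
        // A short read is what left empty strings in $_norms.
        throw new RuntimeException("Norms for field $fieldNum are truncated.");
    }
    return $bytes;
}

// Two documents; fields 0 and 2 are indexed, field 1 is not.
$nrm = 'NRM' . chr(0xFF) . 'ab' . 'cd';
echo loadFieldNorms($nrm, [0 => true, 1 => false, 2 => true], 2, 2), "\n"; // cd
```

    Failing loudly on a short read surfaces a layout mismatch instead of silently producing zero scores.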
    Last edited by jsloan; 04-01-2008 at 08:19 PM.

Similar Threads

  1. Feed and Yahoo weather
    By eric.pommereau in forum Web & Web Services
    Replies: 2
    Last Post: 08-05-2008, 07:48 AM
