Thread: Search_lucene and MoreLikeThis

  1. #1
    jbond, Junior Member (Join Date: Oct 2008, Posts: 9)

    There's currently no MoreLikeThis functionality in Zend_Search_Lucene. I had a look at the source of the Java version and it's really not that complicated, but porting it to the Zend version hits one roadblock: I would need a list of all the terms in an existing indexed document. At the moment I can't find any functions in the Zend version that help with this.

    Any possibility of there being an official MoreLikeThis function?

    Any ideas on how to get a list of terms for an indexed document?

  2. #2
    Alexander Veremyev, Junior Member (Join Date: Oct 2008, Posts: 2)

    Quote (originally posted by jbond):
        There's currently no MoreLikeThis functionality in Zend_Search_Lucene. I had a look at the source of the Java version and it's really not that complicated, but porting it to the Zend version hits one roadblock: I would need a list of all the terms in an existing indexed document. At the moment I can't find any functions in the Zend version that help with this.

        Any possibility of there being an official MoreLikeThis function?

        Any ideas on how to get a list of terms for an indexed document?

    Yes, Zend_Search_Lucene currently neither generates nor uses term vectors (they are not needed for the supported query types).
    Stored term vectors are required in order to request the terms that belong to a specific document.

    What is your time frame for MoreLikeThis functionality support? Term vector operations could be added.
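
To make the point about term vectors concrete, here is a minimal Python sketch. It is a toy in-memory index, not Zend_Search_Lucene's data structures or API, and every name in it (`ToyIndex`, `terms_for_doc`, the whitespace tokenizer) is hypothetical. An inverted index maps each term to the documents that contain it, so listing the terms of one document means scanning every term in the index; a stored term vector answers the same question with a single lookup.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Toy analyzer: lowercase and split on whitespace (an assumption for this sketch)."""
    return text.lower().split()


class ToyIndex:
    """Illustrative index with optional per-document term vectors."""

    def __init__(self, store_term_vectors=False):
        self.postings = defaultdict(dict)  # term -> {doc_id: frequency}
        self.term_vectors = {}             # doc_id -> {term: frequency}
        self.store_term_vectors = store_term_vectors

    def add(self, doc_id, text):
        freqs = Counter(tokenize(text))
        for term, freq in freqs.items():
            self.postings[term][doc_id] = freq
        if self.store_term_vectors:
            self.term_vectors[doc_id] = dict(freqs)

    def terms_for_doc(self, doc_id):
        """Return {term: frequency} for one document."""
        if doc_id in self.term_vectors:
            # Fast path: one lookup in the stored term vector.
            return dict(self.term_vectors[doc_id])
        # Slow path: walk every term in the index and keep those that hit doc_id.
        return {
            term: docs[doc_id]
            for term, docs in self.postings.items()
            if doc_id in docs
        }


index = ToyIndex(store_term_vectors=True)
index.add(1, "lucene search lucene")
index.add(2, "zend search")
print(index.terms_for_doc(1))
```

Both paths return the same mapping; the difference is that the slow path's cost grows with the number of distinct terms in the whole index rather than with the size of the one document.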

  3. #3
    jbond, Junior Member (Join Date: Oct 2008, Posts: 9)

    Quote (originally posted by Alexander Veremyev):
        What is your time frame for MoreLikeThis functionality support? Term vector operations could be added.

    This year?

  4. #4
    jbond, Junior Member (Join Date: Oct 2008, Posts: 9)

    I've come back to this project. One possibility would be to run a document through the indexing steps without actually adding it to the index, extract the terms, and then use them to construct a query. So at what point in the document-to-index process are the terms generated, and could I get them out?

  5. #5
    jbond, Junior Member (Join Date: Oct 2008, Posts: 9)

    I've implemented a simple but workable solution to this.
    - Extend document.php with a function that returns an array of term => frequency.
    - Build a document from the record, as if you were about to add it to the index.
    - Get the term => frequency array.
    - Calculate frequency * length(term_text) as each term's score.
    - Sort by score, descending.
    - Take the first 7 entries.
    - Use these terms for a Lucene search.
    So long words that appear often are taken as descriptive of the document.
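
The steps above can be sketched in Python as follows. This is a minimal illustration of the heuristic, not the poster's PHP code: the regex tokenizer, the function names, the alphabetical tie-break, and the space-joined query string are all assumptions made for the example.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Return a term -> frequency mapping using a toy tokenizer
    (lowercased runs of letters and digits; an assumption, not Zend's analyzer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def descriptive_terms(text, limit=7):
    """Score each term as frequency * len(term) and return the top `limit` terms.

    Ties are broken alphabetically so the result is deterministic.
    """
    freqs = term_frequencies(text)
    ranked = sorted(
        freqs.items(),
        key=lambda item: (-item[1] * len(item[0]), item[0]),
    )
    return [term for term, _ in ranked[:limit]]


def more_like_this_query(text, limit=7):
    """Join the top terms into a plain query string (hypothetical query format)."""
    return " ".join(descriptive_terms(text, limit))


record = "Lucene indexing and Lucene searching with an inverted index"
print(more_like_this_query(record))
```

Because the score rewards length as well as frequency, short function words tend to drop out on their own, though on very short records (like the one above) a few can still make the top 7; a stop-word filter would be a natural addition.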

  6. #6
    arekanderu, Junior Member (Join Date: Mar 2009, Posts: 1)

    Is there any progress on term vectors? I've created my own MoreLikeThis class, but unfortunately it's a bit slow when I manually build the term vector for a document, because I have no way to retrieve the terms for ONLY a specific document id rather than for the whole index.

    Is there a way to
    1. get the terms() for a specific document?
    2. get the termFreq() for a specific document?

    Thank you for your time

  7. #7
    Jeebs24, Junior Member (Join Date: Apr 2009, Posts: 1)

    I too would like to know this. If there is no way to do this in Zend Search Lucene, can anyone recommend how I could create some kind of recommendation engine?

  8. #8
    cpliakas, Junior Member (Join Date: Jun 2009, Posts: 1)

    Please check out the Search Lucene MoreLikeThis module for Drupal. It is an extension of the Search Lucene API project and works fairly well. Although the module is still in the development stage, you may be able to fork the code for your purposes. Viewing the source code of the Search Lucene API project will also answer questions on how to get terms() and termFreq().
