Zend Framework Forum

  #1 (permalink)  
Old 10-10-2008, 10:10 AM
Junior Member
 
Join Date: Oct 2008
Posts: 9
Search_lucene and MoreLikeThis

There's currently no MoreLikeThis functionality in Zend_Search_Lucene. I had a look at the source in the Java version and it's really not that complicated. But porting it to the Zend version is hitting one roadblock: I would need to get a list of all the terms in an existing indexed document. At the moment I can't find any functions in the Zend version to help do this.

Any possibility of there being an official MoreLikeThis function?

Any ideas on how to get a list of terms for an indexed document?
  #2 (permalink)  
Old 10-11-2008, 10:53 AM
Junior Member
 
Join Date: Oct 2008
Posts: 2

Quote:
Originally Posted by jbond
There's currently no MoreLikeThis functionality in Zend_Search_Lucene. I had a look at the source in the Java version and it's really not that complicated. But porting it to the Zend version is hitting one roadblock: I would need to get a list of all the terms in an existing indexed document. At the moment I can't find any functions in the Zend version to help do this.

Any possibility of there being an official MoreLikeThis function?

Any ideas on how to get a list of terms for an indexed document?
Yes, Zend_Search_Lucene currently neither generates nor uses term vectors (they are not needed for the supported query types).
Stored term vectors are required in order to request the terms that belong to a specified document.
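
To illustrate the difference, here is a toy sketch in Python. It is not Zend_Search_Lucene code: the TinyIndex class and its method names are made up for illustration, and tokenizing by whitespace is an assumption. An inverted index maps each term to the documents containing it, so listing the terms of one document means scanning every term. A stored term vector keeps the per-document term counts and answers the same question directly.

```python
from collections import Counter, defaultdict


class TinyIndex:
    """Hypothetical toy index, for illustration only."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.term_vectors = {}            # doc id -> Counter(term -> frequency)

    def add(self, doc_id, text):
        terms = text.lower().split()      # assumption: whitespace tokenization
        for term in terms:
            self.postings[term].add(doc_id)
        self.term_vectors[doc_id] = Counter(terms)

    def terms_via_postings(self, doc_id):
        # Without term vectors: walk the whole term dictionary.
        return sorted(t for t, docs in self.postings.items() if doc_id in docs)

    def terms_via_vector(self, doc_id):
        # With a stored term vector: a single lookup, frequencies included.
        return dict(self.term_vectors[doc_id])
```

The first lookup costs time proportional to the number of distinct terms in the whole index, which is why a MoreLikeThis built on it tends to be slow.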

What is your time frame for MoreLikeThis functionality support? Term vector operations could be added.
  #3 (permalink)  
Old 10-11-2008, 02:20 PM
Junior Member
 
Join Date: Oct 2008
Posts: 9

Quote:
Originally Posted by Alexander Veremyev
What is your time frame for MoreLikeThis functionality support? Term vector operations could be added.
This year?
  #4 (permalink)  
Old 12-19-2008, 01:12 PM
Junior Member
 
Join Date: Oct 2008
Posts: 9

I've come back to this project. One possibility would be to start re-indexing a document without actually adding it to the index, extract the terms, and then use these to construct a query. So at what point in the document-to-index process are the terms generated, and could I get them out?
  #5 (permalink)  
Old 01-30-2009, 12:11 PM
Junior Member
 
Join Date: Oct 2008
Posts: 9

I've implemented a simple but workable solution to this.
- Extend document.php to include a function that returns an array of term => frequency
- Build a document from the record, as if you were about to add it to the index
- Get the term => frequency array
- Calculate frequency * length(term_text) as the score of each term
- Sort by score, descending
- Take the first 7 entries
- Use these terms for a Lucene search
So long words that appear often are taken as descriptive of the document.
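
The steps above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the poster's code and not Zend_Search_Lucene code: the regex tokenizer is a stand-in for whatever analyzer the index uses, and the function names term_frequencies and more_like_this_terms are hypothetical. Ties are broken alphabetically, which the post does not specify.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Count each term; a simple stand-in for the index's analyzer."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def more_like_this_terms(text, max_terms=7):
    """Return the highest-scoring terms, where score = frequency * term length."""
    freqs = term_frequencies(text)
    ranked = sorted(
        freqs.items(),
        # Highest score first; alphabetical order among equal scores.
        key=lambda item: (-item[1] * len(item[0]), item[0]),
    )
    return [term for term, _ in ranked[:max_terms]]
```

The returned terms would then be combined into a search query against the index. Note that nothing here filters stop words: a very frequent short word can still outscore a rare long one, so a stop-word list may be worth adding.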
  #6 (permalink)  
Old 03-30-2009, 12:07 AM
Junior Member
 
Join Date: Mar 2009
Posts: 1

Is there any progress with the term vectors? I've created my own MoreLikeThis class, but unfortunately it's a bit slow because I build the term vector for the document manually: I have no way to retrieve the terms for a specific document id only, rather than for the whole index.

Is there a way to
1. get the terms() for a specific document?
2. get the termFreq() for a specific document?

Thank you for your time
  #7 (permalink)  
Old 04-23-2009, 02:55 PM
Junior Member
 
Join Date: Apr 2009
Posts: 1

I too would like to know this. If there is no way to do this in Zend_Search_Lucene, can anyone recommend how I can create some kind of recommendation engine?
  #8 (permalink)  
Old 06-30-2009, 01:36 PM
Junior Member
 
Join Date: Jun 2009
Posts: 1

Please check out the Search Lucene MoreLikeThis module for Drupal. It is an extension of the Search Lucene API project and works fairly well. Although the module is still in development, you may be able to fork the code for your purposes. Viewing the source code of the Search Lucene API project will also answer questions on how to get terms() and termFreq().
All times are GMT.

