There's currently no MoreLikeThis functionality in lucene_search. I had a look at the source in the Java version and it's really not that complicated, but porting it to the Zend version is hitting one roadblock: I would need a list of all the terms in an existing indexed document. At the moment I can't find any functions in the Zend version that do this.
Is there any possibility of an official MoreLikeThis function? Any ideas on how to get a list of terms for an indexed document?
|
|||
|
Quote:
Stored term vectors are required in order to request the terms that belong to a specified document. What is your time frame for MoreLikeThis support? Term vector operations could be added
|
|
|||
|
I've come back to this project. One possibility would be to start re-indexing a document without actually posting it to the index: extract the terms, and then use them to construct a query. So at what point in the document->index process are the terms generated, and could I get them out?
|
|
|||
|
I've implemented a simple but workable solution to this.
- Extend document.php to include a function that returns an array of term => frequency.
- Build a document from the record, as if you were about to add it to the index.
- Get the term => frequency array.
- Calculate frequency * length(term_text) to create a score.
- Sort this in descending order.
- Take the first 7 entries.
- Use these for a Lucene search.

So long words that appear often are taken as descriptive of this document.
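The steps above can be sketched roughly as follows. This is only an illustration in Python, not the poster's PHP code: the function names, the regex tokenizer, and the `OR`-joined query string are all assumptions made for the example, and a real analyzer would tokenize differently.

```python
import re
from collections import Counter


def extract_term_frequencies(text):
    """Return a term -> frequency mapping for a piece of text.

    Hypothetical stand-in for the function added to document.php;
    a simple lowercase alphanumeric tokenizer is assumed here.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


def more_like_this_terms(text, max_terms=7):
    """Pick the terms with the highest frequency * length score."""
    freqs = extract_term_frequencies(text)
    # Score each term: long words that appear often rank highest.
    scored = {term: freq * len(term) for term, freq in freqs.items()}
    # Sort by score descending; break ties alphabetically for stable output.
    ranked = sorted(scored.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:max_terms]]


def build_query(terms):
    """Join the selected terms into an OR query string (assumed syntax)."""
    return " OR ".join(terms)
```

Calling `build_query(more_like_this_terms(record_text))` would give a query string to hand to the search. Stop words are not filtered in this sketch, although the length factor already pushes very short words down the ranking.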
|
|||
|
Is there any progress with the term vectors? I've created my own MoreLikeThis class, but unfortunately it's a bit slow when I manually build the term vector for a document, because I have no way to retrieve the terms ONLY for a specific document id rather than for the whole index.
Is there a way to 1. get the terms() for a specific document? 2. get the termFreq() for a specific document? Thank you for your time
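The slowness described here is what one would expect from an inverted index, which maps each term to the documents containing it, so finding the terms of one document means walking every term. One possible workaround, sketched below in Python under stated assumptions, is to invert the postings once and cache the resulting document => term => frequency map. The `postings` dictionary is a hypothetical stand-in for whatever the index exposes and is not part of any real API.

```python
from collections import defaultdict


def build_forward_index(postings):
    """Invert term -> {doc_id: freq} postings into doc_id -> {term: freq}.

    `postings` is an assumed in-memory structure for illustration only.
    The full scan over all terms happens once; afterwards a per-document
    lookup is a single dictionary access.
    """
    forward = defaultdict(dict)
    for term, docs in postings.items():
        for doc_id, freq in docs.items():
            forward[doc_id][term] = freq
    return dict(forward)
```

The trade-off is memory and staleness: the cached map has to be rebuilt or updated whenever the index changes.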
|
|||
|
Please check out the Search Lucene MoreLikeThis module for Drupal. It is an extension of the Search Lucene API project and works fairly well. Although the module is still in the development stage, you may be able to fork the code for your purposes. Viewing the source code of the Search Lucene API project will also answer questions on how to get terms() and termFreq().