Implementing search in Java: a question about Lucene
crc2121

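I can't find the term I'm searching for (鼻涕, "snot"). Am I missing something? Why does the search return 0 results?
Could someone point me in the right direction?

Any advice would be much appreciated.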

Lucene 3.0.0 API
http://www.jarvana.com/jarvana/view/org/apache/lucene/lucene-core/3.0.0/lucene-core-3.0.0-javadoc.jar!/index.html?org/apache/lucene/util/Version.html



[Index code]

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

public class Indexer
{
    public static void main(String[] args) throws IOException
    {
        String Idx = "C:\\test\\Idx";      // where the index is written
        String dataDir = "C:\\test\\Data"; // text files to be indexed

        // create = true: build a new index, overwriting any existing one
        Directory dir = new SimpleFSDirectory(new File(Idx));
        IndexWriter indexWriter = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), true, IndexWriter.MaxFieldLength.UNLIMITED);

        File[] files = new File(dataDir).listFiles();

        for (int i = 0; i < files.length; i++)
        {
            Document doc = new Document();
            // Tokenized but not stored. FileReader decodes the file with the
            // platform default charset, so the file must be saved in that encoding.
            doc.add(new Field("contents", new FileReader(files[i])));
            // Stored verbatim so the searcher can print the file name
            doc.add(new Field("filename", files[i].getName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("indexDate", DateTools.dateToString(new Date(), DateTools.Resolution.DAY), Field.Store.YES, Field.Index.NOT_ANALYZED));
            indexWriter.addDocument(doc);
        }
        System.out.println("numDocs: " + indexWriter.numDocs());
        indexWriter.close();
    }
}
[END]

[Searcher code]

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

public class Searcher
{
    public static void main(String[] args) throws IOException, ParseException
    {
        String Idx = "C:\\test\\Idx";
        Directory dir = new SimpleFSDirectory(new File(Idx));
        IndexSearcher indexSearch = new IndexSearcher(dir);

        // Parse the query with the same analyzer that was used at index time
        QueryParser queryParser = new QueryParser(Version.LUCENE_30, "contents", new StandardAnalyzer(Version.LUCENE_30));
        Query query = queryParser.parse("鼻涕"); // query term
        TopDocs hits = indexSearch.search(query, 500); // at most 500 top hits
        System.out.println("Found " + hits.totalHits + " hit(s)");
        for (int i = 0; i < hits.scoreDocs.length; i++)
        {
            ScoreDoc sdoc = hits.scoreDocs[i];
            Document doc = indexSearch.doc(sdoc.doc);
            System.out.println(doc.get("filename"));
        }
        indexSearch.close();
    }
}
[END]


The output I get is shown in this screenshot:
http://img263.imageshack.us/f/bug03.jpg/

andowson

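I've tested this program and it should work.
1. First, download Lucene 3.0.3, unzip it, and add the following JAR files to the CLASSPATH:
lucene-3.0.3\lucene-core-3.0.3.jar
lucene-3.0.3\contrib\analyzers\common\lucene-analyzers-3.0.3.jar

2. Next, create the directory C:\test, and under it create two subdirectories, Idx and Data.

3. Then search Google for 鼻涕, open any article that comes up, save its content as a text file (e.g. 1.txt), and put it in C:\test\Data.

4. Run Indexer.

5. Then run Searcher.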
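Steps 2 and 3 can also be scripted. Below is a minimal sketch using only the JDK (assuming Java 7 or later for java.nio.file); the class SetupTestDirs, its setUp method, and the placeholder sentence are hypothetical and not part of the thread. It writes the sample file in the platform default charset, which is the charset FileReader in Indexer decodes with, so the two stay consistent.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SetupTestDirs {
    // Hypothetical helper: creates <base>/Idx and <base>/Data (step 2) and
    // writes sampleText to <base>/Data/1.txt (step 3). Returns the sample path.
    static Path setUp(Path base, String sampleText) throws IOException {
        Files.createDirectories(base.resolve("Idx"));
        Path data = Files.createDirectories(base.resolve("Data"));
        Path sample = data.resolve("1.txt");
        // Encode with the platform default charset: FileReader in Indexer
        // decodes with the same charset, so the indexed text is not garbled.
        Files.write(sample, sampleText.getBytes(Charset.defaultCharset()));
        return sample;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to C:\test as in the thread; pass another base directory as args[0]
        Path base = Paths.get(args.length > 0 ? args[0] : "C:\\test");
        // Placeholder text; replace it with the article saved from Google
        Path sample = setUp(base, "感冒時會流鼻涕");
        System.out.println("Wrote " + sample);
    }
}
```

Writing the bytes explicitly with Charset.defaultCharset() makes the encoding choice visible instead of depending on whatever the text editor's default happens to be.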





Attachment: 1.txt (4 KB), a test article containing 鼻涕


Share experience, accumulate wisdom
crc2121

I figured out where I went wrong, orz...
The text file was saved with UTF-8 encoding; I should have saved it as ANSI instead, orz...
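Why the encoding matters: new FileReader(File), as used in Indexer, decodes with the platform default charset, which on Traditional Chinese Windows is code page 950, the encoding Notepad labels "ANSI". A UTF-8 file read that way is decoded into different characters, so the index never contains 鼻涕. Re-saving the file as ANSI is one fix; another, not used in this thread, is to name the charset explicitly, e.g. new InputStreamReader(new FileInputStream(files[i]), "UTF-8") in place of new FileReader(files[i]). The sketch below shows the effect with the JDK alone (Java 7+ assumed); the class EncodingDemo and its readAll helper are hypothetical, and ISO-8859-1 stands in for "a mismatched charset" because every JVM supports it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class EncodingDemo {
    // Hypothetical helper: reads a whole file with an explicitly chosen
    // charset instead of FileReader's platform default.
    static String readAll(Path file, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new BufferedReader(new InputStreamReader(Files.newInputStream(file), cs))) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Simulate the asker's data file: text saved as UTF-8
        Path file = Files.createTempFile("lucene-data", ".txt");
        Files.write(file, "感冒時會流鼻涕".getBytes(StandardCharsets.UTF_8));

        // Decoding with the charset the file was saved in recovers the term
        System.out.println(readAll(file, StandardCharsets.UTF_8).contains("鼻涕"));      // prints true
        // A mismatched charset yields other characters, so an index built
        // from them cannot match the query 鼻涕
        System.out.println(readAll(file, StandardCharsets.ISO_8859_1).contains("鼻涕")); // prints false

        Files.delete(file);
    }
}
```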
andowson

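Regarding your question, "How do I extend the search so that it also displays the text of the matching documents?", I think highlighting is a good way to handle it:
Searcher.java: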

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.Scorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

public class Searcher {
    public static void main(String[] args) throws IOException, ParseException, InvalidTokenOffsetsException {
        String Idx = "C:\\test\\Idx";
        String dataDir = "C:\\test\\Data";
        Directory dir = new SimpleFSDirectory(new File(Idx));
        IndexSearcher indexSearch = new IndexSearcher(dir);

        QueryParser queryParser = new QueryParser(Version.LUCENE_30,
                "contents", new StandardAnalyzer(Version.LUCENE_30));
        Query query = queryParser.parse("鼻涕"); // query term
        TopDocs hits = indexSearch.search(query, 500);
        System.out.println("Found " + hits.totalHits + " hit(s)");
        for (int i = 0; i < hits.scoreDocs.length; i++) {
            ScoreDoc sdoc = hits.scoreDocs[i];
            Document doc = indexSearch.doc(sdoc.doc);
            System.out.println(doc.get("filename"));

            Scorer scorer = new QueryScorer(query);
            // Empty pre/post tags print the fragment without markup;
            // the no-arg SimpleHTMLFormatter wraps matches in <B>...</B>
            SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("", "");
            Highlighter highlighter = new Highlighter(simpleHTMLFormatter, scorer);
            // "contents" is not stored in the index, so re-read the original file
            String content = readFileAsString(dataDir + File.separator + doc.get("filename"));
            TokenStream tokenStream = queryParser.getAnalyzer().tokenStream(
                    "contents", new StringReader(content));

            // Best-scoring fragment; fall back to the whole text if none matched
            String fragment = highlighter.getBestFragment(tokenStream, content);
            System.out.println(fragment != null ? fragment : content);
        }
        indexSearch.close();
    }

    // Reads the whole file and decodes it with the platform default charset,
    // the same charset FileReader used in Indexer
    private static String readFileAsString(String filePath) throws IOException {
        byte[] buffer = new byte[(int) new File(filePath).length()];
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(filePath)));
        try {
            in.readFully(buffer); // unlike read(), keeps reading until the buffer is full
        } finally {
            in.close();
        }
        return new String(buffer);
    }
}

You also need to add the following JAR files to the CLASSPATH:
lucene-3.0.3\contrib\highlighter\lucene-highlighter-3.0.3.jar
lucene-3.0.3\contrib\memory\lucene-memory-3.0.3.jar
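To make the idea behind getBestFragment concrete without Lucene on the classpath, here is a deliberately simplified plain-Java sketch: cut the text into fragments, score each one, and return the best fragment with the matches wrapped in pre/post tags (null if nothing matched, like the call above). It is not Lucene's algorithm: Lucene's default SimpleFragmenter splits on token boundaries into roughly 100-character fragments and QueryScorer weights the query's terms, whereas this sketch cuts at fixed character offsets and counts raw occurrences of a single term. The class NaiveHighlighter is hypothetical.

```java
public class NaiveHighlighter {
    // Simplified illustration only: fixed-size character fragments, each
    // scored by how often the single term occurs. A match that straddles
    // a fragment boundary is not counted.
    static String bestFragment(String text, String term, int fragmentSize, String pre, String post) {
        if (term.isEmpty() || fragmentSize <= 0) {
            return null;
        }
        String best = null;
        int bestScore = 0;
        for (int start = 0; start < text.length(); start += fragmentSize) {
            String frag = text.substring(start, Math.min(text.length(), start + fragmentSize));
            int score = countOccurrences(frag, term);
            if (score > bestScore) { // ties keep the earlier fragment
                bestScore = score;
                best = frag;
            }
        }
        // Return null when no fragment matched, so callers can fall back
        return best == null ? null : best.replace(term, pre + term + post);
    }

    // Counts non-overlapping occurrences of term in s
    static int countOccurrences(String s, String term) {
        int count = 0;
        for (int i = s.indexOf(term); i >= 0; i = s.indexOf(term, i + term.length())) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String content = "感冒的時候常常會流鼻涕,也會打噴嚏。";
        String fragment = bestFragment(content, "鼻涕", 20, "<B>", "</B>");
        // Same fallback as the Searcher above: print the whole text if nothing matched
        System.out.println(fragment != null ? fragment : content);
    }
}
```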
Attachment: Searcher.java (3 KB), with Highlighter added

