Magpy is a Python wrapper for the mg search engine.
It features fast full-text search, indexing and boolean queries.
Both mg and magpy are released under the GPL (General Public License).
| Source Code (Unix & Windows): | magpy-0.3.4.tar.gz |
| Windows DLLs (Python 2.3.5): | magpy-0.3.1.win32.python23.zip |
| Windows DLLs (Python 2.4.2): | magpy-0.3.1.win32.python24.zip |
To install magpy from source, extract the archive magpy-*.tar.gz, and run the following commands:
    ./configure
    python setup.py build
    python setup.py install
For installation on Windows, just extract the .zip file into your Python directory (e.g. to c:\python24\lib\).
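    #!/usr/bin/python
    import mgindexer
    import mgquery
    import sys

    # open the collection "alice" stored under /tmp/data
    store = mgquery.MGSearchStore("/tmp/data", "alice")

    while 1:
        # read one query per line; an empty string means end of input
        query = sys.stdin.readline()
        if not query:
            break
        query = query.strip()
        q = store.newQuery(query)
        print "Searching for", query, "(words", q.words(), ")"
        for docnum, ranking in q.execute():
            print "Document", docnum, "matches (Ranking", ranking, ")"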
Before this works, you first have to create a search store, here at the location "/tmp/data" with the name "alice".
The following script creates such a store from a raw text file, treating each paragraph of the file as an individual document.
    #!/usr/bin/python
    import mgindexer

    fi = open("alice13a.txt", "rb")
    fo = open("alice13a.splitted.txt", "wb")
    for line in fi.readlines():
        # split the file on the paragraph boundaries
        if line.strip() == "":
            fo.write(mgindexer.SEPARATOR)
        else:
            fo.write(line)
    fo.close()
    fi.close()

    mgindexer.makeindex("alice13a.splitted.txt", "/tmp/data/", "alice")
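The script above writes one separator for every blank line, so a run of several blank lines puts several separators back to back, which may produce empty documents. The following is a minimal sketch of a stricter splitter, not part of magpy. It is written for a current Python 3 interpreter and does not import magpy, so it can be tried on its own. The names split_paragraphs, join_documents and PLACEHOLDER_SEPARATOR are made up for this illustration; in a real run the separator would presumably be mgindexer.SEPARATOR.

```python
def split_paragraphs(text):
    """Return the non-empty paragraphs of text, split on blank lines."""
    paragraphs = []
    current = []
    for line in text.splitlines(keepends=True):
        if line.strip() == "":
            # A blank line closes the current paragraph, if there is one.
            # Further blank lines in the same run are ignored.
            if current:
                paragraphs.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        paragraphs.append("".join(current))
    return paragraphs


def join_documents(paragraphs, separator):
    """Concatenate paragraphs, writing separator after each one."""
    return "".join(p + separator for p in paragraphs)


# ASSUMPTION: placeholder value for illustration only. With magpy
# installed, mgindexer.SEPARATOR would be used instead.
PLACEHOLDER_SEPARATOR = "<SEP>"

sample = "First paragraph,\nsecond line.\n\n\n\nSecond paragraph.\n"
paragraphs = split_paragraphs(sample)
print(len(paragraphs))  # 2, despite the three blank lines in a row
print(join_documents(paragraphs, PLACEHOLDER_SEPARATOR))
```

The output of join_documents could then be written to a file and passed to mgindexer.makeindex in the same way as "alice13a.splitted.txt" above.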
If you have many individual documents you would like to search, the following script is probably closer to what you need. Note that it creates a new collection named "files", so you have to replace "alice" with "files" in the example query script above:
    #!/usr/bin/python
    import mgindexer
    import os

    PATH = "files/"

    fo = open("searchdata.txt", "wb")
    for name in os.listdir(PATH):
        path = os.path.join(PATH, name)
        if os.path.isfile(path):
            # copy file
            fi = open(path, "rb")
            for line in fi.readlines():
                fo.write(line)
            fi.close()
            # write document boundary
            fo.write(mgindexer.SEPARATOR)
    fo.close()

    mgindexer.makeindex("searchdata.txt", "/tmp/data", "files")
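The query script reports document numbers, not file names, and os.listdir returns entries in arbitrary order, so it can help to record which file went into which position. The following is a minimal sketch of that bookkeeping, not part of magpy. It is written for a current Python 3 interpreter and does not import magpy. The names build_collection and filename_for are made up for this illustration, and the separator is a placeholder for mgindexer.SEPARATOR. It also assumes that document numbers start at 1 and follow the order in which documents were written; that assumption should be checked against a real index before relying on it.

```python
import os
import tempfile


def build_collection(source_dir, output_path, separator):
    """Concatenate every regular file in source_dir into output_path.

    Returns the file names in the order they were written.
    output_path should lie outside source_dir.
    """
    names = []
    with open(output_path, "wb") as out:
        # Sort so that the document order is reproducible between runs.
        for name in sorted(os.listdir(source_dir)):
            path = os.path.join(source_dir, name)
            if not os.path.isfile(path):
                continue
            with open(path, "rb") as src:
                out.write(src.read())
            # Write the document boundary after each file.
            out.write(separator)
            names.append(name)
    return names


def filename_for(names, docnum):
    # ASSUMPTION: docnum is 1-based and follows the input order.
    return names[docnum - 1]


# Small demonstration with two throwaway files.
with tempfile.TemporaryDirectory() as work:
    src_dir = os.path.join(work, "files")
    os.mkdir(src_dir)
    for name, body in [("b.txt", b"beta\n"), ("a.txt", b"alpha\n")]:
        with open(os.path.join(src_dir, name), "wb") as f:
            f.write(body)
    # b"<SEP>" is a placeholder for mgindexer.SEPARATOR.
    demo_names = build_collection(
        src_dir, os.path.join(work, "searchdata.txt"), b"<SEP>")
    print(demo_names)  # ['a.txt', 'b.txt']
```

The returned list could be saved next to the index so that a later query run can translate each docnum back into a file name.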