Magpy is a Python wrapper for the mg search engine.
It supports fast full-text search, indexing, and boolean queries.
Both mg and magpy are released under the GPL (General Public License).
Source code (Unix & Windows): magpy-0.3.4.tar.gz
Windows DLLs (Python 2.3.5): magpy-0.3.1.win32.python23.zip
Windows DLLs (Python 2.4.2): magpy-0.3.1.win32.python24.zip
To install magpy from source, extract the archive magpy-*.tar.gz and run the following commands in the extracted directory:

./configure
python setup.py build
python setup.py install
On Windows, extract the .zip file that matches your Python version into your Python library directory (e.g. c:\python24\lib\).
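The following script reads queries from standard input, one per line, and prints the documents that match each one:

#!/usr/bin/python
import sys
import mgquery

# open the search store named "alice" located in /tmp/data
store = mgquery.MGSearchStore("/tmp/data", "alice")

while 1:
    query = sys.stdin.readline()
    if not query:
        break  # end of input
    query = query.strip()
    if not query:
        continue  # ignore empty lines
    q = store.newQuery(query)
    print "Searching for", query, "(words", q.words(), ")"
    # each result is a (document number, ranking) pair
    for docnum, ranking in q.execute():
        print "Document", docnum, "matches (Ranking", ranking, ")"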
Before this script can run, you first have to create a search store, here located at "/tmp/data" with the name "alice".
The following script builds such a store from a raw text file, splitting the text into individual documents at paragraph boundaries.
#!/usr/bin/python
import mgindexer

fi = open("alice13a.txt", "rb")
fo = open("alice13a.splitted.txt", "wb")
for line in fi:
    # split the file on paragraph boundaries: a blank line ends a document
    if line.strip() == "":
        fo.write(mgindexer.SEPARATOR)
    else:
        fo.write(line)
fo.close()
fi.close()

# build the index "alice" in /tmp/data from the prepared file
mgindexer.makeindex("alice13a.splitted.txt", "/tmp/data/", "alice")
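The paragraph-splitting script writes one separator for every blank line, so several consecutive blank lines may produce empty documents. The sketch below is one way to collapse such runs. It is plain Python that should run both on the Python 2 versions magpy targets and on Python 3; `split_paragraphs` is a hypothetical helper, and its `separator` argument stands in for `mgindexer.SEPARATOR`, whose actual value is not documented here.

```python
def split_paragraphs(lines, separator):
    """Yield the input lines, emitting `separator` once between paragraphs.

    Hypothetical helper: `separator` stands in for mgindexer.SEPARATOR.
    A run of blank (or whitespace-only) lines becomes a single separator,
    and no separator is emitted before the first or after the last paragraph.
    """
    seen_text = False  # has any non-blank line been yielded yet?
    pending = False    # blank line(s) seen since the last paragraph line
    for line in lines:
        if not line.strip():
            # only a blank line that follows some text can end a paragraph
            pending = seen_text
            continue
        if pending:
            yield separator
            pending = False
        seen_text = True
        yield line
```

With magpy installed, the `for` loop in the indexing script could then, for example, be replaced by `fo.writelines(split_paragraphs(fi, mgindexer.SEPARATOR))`.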
If you have many individual documents to search, the following script is probably closer to what you need. Note that it creates a new collection named "files", so you have to replace "alice" with "files" in the query script above.
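#!/usr/bin/python
import os
import mgindexer

PATH = "files"

fo = open("searchdata.txt", "wb")
# sort the names so that documents are numbered in a reproducible order
for name in sorted(os.listdir(PATH)):
    filename = os.path.join(PATH, name)
    if os.path.isfile(filename):
        # copy the file's contents into the combined input file
        fi = open(filename, "rb")
        fo.write(fi.read())
        fi.close()
        # write document boundary
        fo.write(mgindexer.SEPARATOR)
fo.close()

# build the index "files" in /tmp/data from the combined file
mgindexer.makeindex("searchdata.txt", "/tmp/data", "files")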
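The query script reports only document numbers, so to tell which file matched you need to remember the order in which the files were concatenated (`os.listdir` returns names in arbitrary order). The sketch below shows one way to keep that order. It is plain Python compatible with Python 2 and 3; `concatenate_documents` and `document_name` are hypothetical helpers, `separator` stands in for `mgindexer.SEPARATOR`, and the assumption that documents are numbered consecutively from 1 in input order should be checked against your own index.

```python
import os


def concatenate_documents(path, out, separator):
    """Copy every regular file in `path` into the open binary file `out`,
    writing `separator` after each one, and return the file names in the
    order they were written.

    Hypothetical helper: `separator` stands in for mgindexer.SEPARATOR.
    Names are sorted so that the document order is reproducible.
    """
    names = []
    for name in sorted(os.listdir(path)):
        filename = os.path.join(path, name)
        if not os.path.isfile(filename):
            continue  # skip subdirectories and other non-regular files
        fi = open(filename, "rb")
        try:
            out.write(fi.read())
        finally:
            fi.close()
        out.write(separator)
        names.append(name)
    return names


def document_name(names, docnum, first=1):
    """Map a document number from a query result back to a file name.

    Assumption: documents are numbered consecutively in input order,
    starting at `first` (1 by default).
    """
    return names[docnum - first]
```

In the indexing script one might call `names = concatenate_documents(PATH, fo, mgindexer.SEPARATOR)` before `fo.close()`, store `names` next to the index, and look up `document_name(names, docnum)` in the query loop.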