Magpy is a Python wrapper for the mg search engine.
It features fast full-text search, indexing, and boolean queries.
Both mg and magpy are released under the GPL (General Public License).
| Source Code (Unix & Windows): | magpy-0.3.4.tar.gz |
| Windows DLLs (Python 2.3.5): | magpy-0.3.1.win32.python23.zip |
| Windows DLLs (Python 2.4.2): | magpy-0.3.1.win32.python24.zip |
To install magpy from source, extract the archive magpy-*.tar.gz, and run the following commands:
./configure
python setup.py build
python setup.py install
For installation on Windows, just extract the .zip file into your Python directory (e.g. to c:\python24\lib\).
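#!/usr/bin/python
import mgindexer
import mgquery
import sys
# open the search store named "alice" located in /tmp/data
store = mgquery.MGSearchStore("/tmp/data","alice")
while 1:
    query = sys.stdin.readline()
    if not query:
        # end of input: stop instead of looping forever
        break
    # drop the trailing newline read from stdin
    query = query.strip()
    q = store.newQuery(query)
    print "Searching for",query,"(words",q.words(),")"
    for docnum,ranking in q.execute():
        print "Document",docnum,"matches (Ranking",ranking,")"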
Before this works, you first have to create a search store, in this example at the location "/tmp/data" with the name "alice".
The following script creates such a store from a raw text file, which it splits into individual documents at paragraph boundaries (blank lines).
#!/usr/bin/python
import mgindexer
fi = open("alice13a.txt", "rb")
fo = open("alice13a.splitted.txt", "wb")
for line in fi.readlines():
    # split the file on the paragraph boundaries
    if line.strip() == "":
        fo.write(mgindexer.SEPARATOR)
    else:
        fo.write(line)
fo.close()
fi.close()
mgindexer.makeindex("alice13a.splitted.txt", "/tmp/data/", "alice")
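The paragraph-splitting step can also be tried on its own, without magpy installed. The following is only a sketch under stated assumptions: FAKE_SEPARATOR is a hypothetical stand-in for mgindexer.SEPARATOR (whose real value may differ), and split_paragraphs and join_documents are illustrative helpers, not part of magpy. Unlike the script above, this version collapses runs of blank lines, so no empty documents are produced.

```python
# Hypothetical stand-in for mgindexer.SEPARATOR; the real value may differ.
FAKE_SEPARATOR = "\x02"

def split_paragraphs(text):
    """Return the paragraphs of text, treating blank lines as boundaries."""
    paragraphs = []
    current = []
    for line in text.splitlines():
        if line.strip() == "":
            # A blank line ends the current paragraph, if there is one.
            if current:
                paragraphs.append("\n".join(current))
                current = []
        else:
            current.append(line)
    # Flush the last paragraph when the text does not end with a blank line.
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs

def join_documents(paragraphs, separator=FAKE_SEPARATOR):
    """Join paragraphs into one string, writing a separator after each one."""
    return "".join(p + "\n" + separator for p in paragraphs)

sample = "Alice was beginning to get very tired.\n\n\nDown the rabbit hole.\nShe fell.\n"
docs = split_paragraphs(sample)
combined = join_documents(docs)
print(len(docs))
```

Checking the number of paragraphs this way before running makeindex can help confirm that the input file is split as expected.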
If you have many individual documents you would like to search, the following script is probably closer to what you need. Note that it creates a new collection named "files", so you have to replace "alice" with "files" in the example query script above:
#!/usr/bin/python
import mgindexer
import os
PATH = "files/"
fo = open("searchdata.txt", "wb")
for file in os.listdir(PATH):
    if os.path.isfile(PATH + file):
        # copy file
        fi = open(PATH + file, "rb")
        for line in fi.readlines():
            fo.write(line)
        fi.close()
        # write document boundary
        fo.write(mgindexer.SEPARATOR)
fo.close()
mgindexer.makeindex("searchdata.txt", "/tmp/data", "files")
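The script above only looks at files directly inside "files/". If the documents are spread over subdirectories, the same idea can be sketched with os.walk. This is an illustration under assumptions, not part of magpy: build_search_data is a hypothetical helper, and FAKE_SEPARATOR again stands in for mgindexer.SEPARATOR. The helper also returns the order in which files were written, which may help relate search results to file names; how magpy numbers its documents is not covered here, so the list records the write order only.

```python
import os
import tempfile

# Hypothetical stand-in for mgindexer.SEPARATOR; the real value may differ.
FAKE_SEPARATOR = b"\x02"

def build_search_data(root, out_path, separator=FAKE_SEPARATOR):
    """Concatenate all files below root into out_path.

    Returns the list of file paths in the order they were written.
    out_path should lie outside root so it is not picked up as input.
    """
    order = []
    with open(out_path, "wb") as fo:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # visit subdirectories in a stable order
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as fi:
                    fo.write(fi.read())
                # write document boundary after each file
                fo.write(separator)
                order.append(path)
    return order

# Small demonstration on temporary directories.
root = tempfile.mkdtemp()
with open(os.path.join(root, "a.txt"), "wb") as f:
    f.write(b"first document\n")
os.mkdir(os.path.join(root, "sub"))
with open(os.path.join(root, "sub", "b.txt"), "wb") as f:
    f.write(b"second document\n")
out_path = os.path.join(tempfile.mkdtemp(), "searchdata.txt")
order = build_search_data(root, out_path)
print(len(order))
```

In a real setup, the separator would be replaced by mgindexer.SEPARATOR and the resulting file passed to mgindexer.makeindex as in the script above.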