Information Retrieval


Sebastian writes: "After ages and ages of "real-soon-now" comments, we finally have our ZAP! Apache module ready for public release. ZAP! allows you to build WWW-based Z39.50 clients by filling in "templates" for each page in your interface. It is simple to do basic things, and yet it is possible to build quite advanced gateways. In the Apache mode, it can be extremely efficient, but it also runs as a conventional CGI script. It is freeware, but we have certain commercial options available." Nice... tried it on Yale's ORBIS using the demo page and it seems to work pretty clean.


Sebastian writes: "Index Data has just made its first release of a Z39.50 Server module for Perl. It provides a pretty simple API which hides most of the complexities of Z39.50 and network programming in general, so all you have to do is provide a bit of code to interface to your resource. You can draw on all the usual Perl tools to talk to back-end databases, create response records in XML, MARC, etc." Built using YAZ by the people who brought us YAZ to begin with... definitely worth a close look.


from freshmeat: "Significant API changes, involving provision of a unified settings system for passing parameters, and several small tweaks. Applications will need simple modifications and recompilation. Also features improved weighting, a forking network server, and a few bugfixes. Note that the Java, Perl and Python bindings do not function in this version - do not upgrade to this version if you require these." For more see


from freshmeat: "The purpose of mifluz is to provide a C++ library to build and query a full text inverted index. It is dynamically updatable, scalable (up to 1Tb indexes), uses a controlled amount of memory, shares index files and memory cache among processes or threads and compresses index files to 50% of the raw data. The structure of the index is configurable at runtime and allows inclusion of relevance ranking information. The query functions do not require to load all the occurrences of a searched term. They consume very few resources and many searches can be run in parallel.

[changes include] Integration into the GNU project, complete re-architecture of the inverted index structure, major performance enhancements, and more." All this apparently from the principal author of Catalog...


Rob S. writes: "Cheshire is an OSS (Berkeley style licence) z39.50 search engine/server in active development. Also being developed is an extension to Mozilla for the z39.50 protocol."


from freshmeat: "Distributed searching across several machines, improved writable databases, the ability to automatically select elite terms for performing queries, small API changes, and many bugfixes." see for more.

SLRI: web to Z39.50

the Simon Fraser University Library Research Instrument (SLRI) is "a web to Z39.50 client interface" brought to you by the good folks at SFU. it's an adaptation of the web to Z39.50 gateway developed by Harold Finkbeiner at Stanford, licensed under GPL and recently spied at as well.

CDS/ISIS: tell us more

I've now seen CDS/ISIS and its variants mentioned in several places and am still confused about what it is, but here's a brief description nonetheless. from the UNESCO ISIS page: "Micro CDS/ISIS is an advanced non-numerical information storage and retrieval software developed by UNESCO since 1985 to satisfy the need expressed by many institutions, especially in developing countries, to be able to streamline their information processing activities by using modern (and relatively inexpensive) technologies. The software was originally based on the Mainframe version of CDS/ISIS, started in the late '60s, thus taking advantage of several years of experience acquired in database management software development." take 2, from the CDS-ISIS user forum site: "Mini/Micro CDS/ISIS is a text retrieval program, designed and distributed free of charge by UNESCO. It is widely used for bibliographic (and other) databases throughout the world, and especially in developing countries." If I understand all this properly, it is basically a non-relational database environment commonly used by libraries and other, largely nonprofit, organizations (20,000+ of 'em) throughout the world. I pulled down the unix version but can't quite make heads or tails of it. Somebody please explain more... update: collected comments from all who offered are available here.

muscat-0.1.0: Dialog Corp IR library

as seen at freshmeat: "Open Muscat is a high performance open source search engine library. It implements the probabilistic model of information retrieval, and is designed for use in applications ranging from full scale Web search engines to searching through email archives." what this doesn't say: muscat comes from the Dialog Corp. and what it also doesn't say: the muscat 'version' of the GPL is missing a significant section of the Real GPL, including the final paragraph which states "This General Public License does not permit incorporating your program into proprietary programs." which, apparently, Dialog doesn't understand, because they explicitly solicit requests for commercial licenses as well. somebody please tell them about the LGPL...

[Update, years later: IIRC, the post author was an idiot. This was a legit use of the GPL.]


as seen at freshmeat: "A minor API change for document access, a fix for a bug causing DA file reading to fail, various other bugfixes, extended test suite, and internal code reorganisation." for more see
