HitKeeper [1] is a collection of software tools, mainly written in Perl/SQL, intended to help bioinformatic researchers with "friendly" tools. It allows fully automatic handling of multiple databases on a large scale, calculation of "hits", and querying.
HitKeeper is developed at the Swiss Institute of Bioinformatics. It forms the "back end" of the MyHits website [2], which is in turn an extension to the database Hits [3]. - The metamotif query language is based on experience with mmsearch [4].
MyHits is in full production status since 2003 and currently handles more than 7 mio sequences (with weekly updates!), and 20 mio hits on these. With regard to these data, HitKeeper can be considered as robust and scalable.
Further publications related to HitKeeper are given below.
The software is originally designed for the investigation of the relationships between protein sequences and motifs defined (or predicted) on them. These "motifs" are defined by a (heterogeneous) collection of "predictors", which include RegEx, generalized profiles, and hidden Markov models. In contrast to other projects with similar scope, HitKeeper attempts to provide a generic solution to the handling of redundancy and incremental update of biological databases. It also promotes the development of original query tools.
Some particular features are:
In the long run, the target of HitKeeper is to arrive at a system that is capable of storing biological sequences in the widest sense, including protein, DNA, and 3D-structures. Consequently, large and partially redundant collections of sequences as well as heterogeneous formats are expected -- so the system was designed to be modular and highly flexible.
The main authors are Marco Pagni and Jörg Hau, with contributions - be it with suggestions, testing, discussions or code - from Dmitry Kuznetsov, Vassilios Ioannidis, Laurent Falquet, Lorenzo "Luli" Cerutti, Monique Zahn-Zabal, Michael Muller, and Victor Jongeneel. Thank you!
HitKeeper and all related software are Free Software and are published under Version 2 of the GNU General Public License (GPL). You can redistribute it and/or modify it under the same term, which ensures that its source code is free and that any derivatization, or implementation of it in other software, will also remain free.
The difference between "free software" and "freeware" is of legal importance. If you do not understand any portion of this license, please seek appropriate professional legal advice. If you do not or if - for any reason - you can not accept all of the conditions of the GPL, then you must not use nor distribute this software.
HitKeeper is a collection of scripts - mainly written in Perl - that interact with each other and with an SQL database engine. The system is primarily developed using MySQL, and in parallel it is ported to Oracle. HitKeeper runs on most "unixoid" Platforms (Linux and OSX tested with success).
Detailed instructions for installation, operational qualification testing, and maintenance are provided inside the distribution.
HitKeeper is hosted at sourceforge.net. To download, you have several possibilities:
[1] J. Hau, M. Muller, M. Pagni. HitKeeper, a generic software package for hit list management. Source Code for Biology and Medicine 2 (2007), 2.
[2] M. Pagni, V. Ioannidis, L. Cerutti, M. Zahn-Zabal, C.V. Jongeneel, L. Falquet. MyHits: a new interactive resource for protein annotation and domain identification. Nucleic Acids Res. 32 (2004), W332-335 (Web Server issue).
[3] M. Pagni, C. Iseli, T. Junier, L. Falquet, V. Jongeneel, P. Bucher. TrEST, trGEN and Hits: Access to databases of predicted protein sequences. Nucleic Acids Research 29 (2001), 148-151.
[4] T. Junier, M. Pagni, P. Bucher. mmsearch: a motif arrangement language and search program. Bioinformatics 17 (2001), 1234-1235.
[5] M. Pagni, J. Hau, H. Stockinger. A Multi-protocol Bioinformatics Web Service: Use SOAP, Take a REST or Go with HTML. Eighth IEEE International Symposium on Cluster Computing and the Grid (ccgrid) (2008), 728-734.
[6] M. Pagni, V. Ioannidis, L. Cerutti, M. Zahn-Zabal, C. V. Jongeneel, J. Hau, O. Martin, D. Kuznetsov, L. Falquet. MyHits: improvements to an interactive resource for analyzing protein sequences. Nucleic Acids Res. 35 (2007), W433-W437.