This website is a user-browsable database for motifs discovered by a fairly simple comparative genomics method described in:

You may browse all motifs. A number of little widgets on the ensuing pages might not be immediately intuitive. A help page is forthcoming, as are a number of new data features that should make it easier for biologists to identify potential function.



A note about the software

A few people have requested the software that accompanies the paper. In the coming weeks I will make the core components of the method available for download for the purposes of review, but you should be aware that while the actual computation is relatively simple, it's the handling of data that requires 90% of the time. For example, to analyze motifs you need to scan all genomes for all instances of all motifs, which took on the order of 20,000 CPU hours (Boyer-Moore and suffix-tree based algorithms are particularly hard to apply here, since we are looking for matches within a bounded Hamming distance rather than exact matches).

I am considering ways around these technical problems; please contact me at ncjones at cs dot ucsd dot edu if you have any input or wish to use the software. In particular, if you do not have access to a compute cluster but would still like to use the software, it may be possible to make a service available here at UCSD that does all the heavy lifting, which you would control through a workflow application like Taverna. However, this can't happen unless I have some way of estimating demand and impact.
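
To give a feel for what a bounded Hamming distance scan involves, here is a minimal Python sketch. It is not the software described above, only an illustration: the function name `hamming_matches` and its parameters are made up for this example, and it uses the plainest possible sliding-window comparison rather than anything optimized.

```python
def hamming_matches(genome, motif, max_mismatches):
    """Return the start positions where `motif` occurs in `genome`
    with at most `max_mismatches` substitutions (no gaps).

    Illustrative only: a naive sliding-window scan, not the method
    used to build this database.
    """
    m = len(motif)
    if m == 0:
        raise ValueError("motif must not be empty")

    hits = []
    for start in range(len(genome) - m + 1):
        mismatches = 0
        for offset in range(m):
            if genome[start + offset] != motif[offset]:
                mismatches += 1
                # Stop comparing as soon as the window exceeds the bound.
                if mismatches > max_mismatches:
                    break
        else:
            # The inner loop finished without breaking, so the window
            # is within the allowed Hamming distance.
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Toy sequence; the window at position 4 ("ACGA") differs from
    # the motif by one substitution.
    print(hamming_matches("ACGTACGAACGT", "ACGT", 1))  # [0, 4, 8]
    print(hamming_matches("ACGTACGAACGT", "ACGT", 0))  # [0, 8]
```

A scan like this does work proportional to the genome length times the motif length in the worst case, and it has to be repeated for every motif against every genome, which gives some sense of how the total can climb into thousands of CPU hours.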


This website is maintained by Neil Jones. Last modified 8/02/2006.