Documentation read from 01/25/2012 11:58:32 version of /scratch/FIGdisk.server/FIG/bin/svr_assign_to_dna_using_figfams.

svr_assign_to_dna_using_figfams

Introduction

    svr_assign_to_dna_using_figfams <feature_list.fasta >functions.tbl

Assign Using the FIGfams Server

This script reads a FASTA file of DNA contigs from the standard input and writes the predicted functions found in them (where they can be estimated) to the standard output. FIGfams are used to determine the function when possible. When a function cannot be determined, a message is written to the standard error output.

This script differs substantially from svr_assign_using_figfams.pl: each incoming sequence is treated as a contig in which results can be found, rather than as a single sequence whose function is desired. As a result, output records do not correspond one-to-one with input sequences. Some sequences will get many hits, some only one, and some none at all.

Command-Line Options

--kmer

Size of the kmers used to detect similarity. Defaults to 8.

--reliability

A number, generally from 1 to 100, indicating how careful we should be about making the assignments. A higher number indicates greater care. Defaults to 5.

--maxGap

When looking for a match, if two sequence elements match and are closer than this distance, then they will be considered part of a single match. Otherwise, the match will be split.
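The gap rule can be illustrated with a minimal sketch. It assumes that hits are represented by their positions on one strand and that "closer than" means a distance strictly smaller than the gap value; the function name merge_hits and this representation are hypothetical and are not taken from the script itself.

```python
def merge_hits(positions, max_gap):
    """Group hit positions into matches (illustrative only).

    Assumption: a hit joins the current match when its distance from
    the previous hit is strictly less than max_gap; otherwise it
    starts a new match.
    """
    groups = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][-1] < max_gap:
            groups[-1].append(pos)   # close enough: extend current match
        else:
            groups.append([pos])     # too far: split into a new match
    return groups


# Made-up positions: the jump from 70 to 500 exceeds the gap, so the match splits.
print(merge_hits([10, 40, 70, 500, 530], max_gap=100))
# prints [[10, 40, 70], [500, 530]]
```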

--minSize

When looking for a match, we group together a set of hits that "covers" some region. The set is not necessarily in a single frame (i.e., we treat the sequence as low quality and only consider the number of hits in a region on the same strand). This parameter forces the size of the region to be above a specified value. The default is 6 times the kmer size (the --kmer option).
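As a worked example of the default, the following sketch computes the minimum region size from the kmer size. The function name default_min_size is hypothetical; the documentation does not state the units of the region size, so the result is simply a number.

```python
DEFAULT_KMER = 8  # documented default for --kmer


def default_min_size(kmer=DEFAULT_KMER):
    """Return the documented default for --minSize: 6 times the kmer size."""
    return 6 * kmer


print(default_min_size())    # prints 48
print(default_min_size(10))  # prints 60
```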

--by_location

Display the hits ordered by location.

--help

Display this command's parameters and options.

Output Format

The standard output is a tab-delimited file. Each output record consists of five fields: the ID of the query sequence, the number of matching kmers, a location in the sequence, the predicted function, and an organism name that represents an OTU (operational taxonomic unit) category.
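A minimal sketch of reading these records follows. It assumes the fields appear in the order listed above and that every record has exactly five fields; the names Hit and parse_hits are hypothetical, and the sample record is invented for illustration. The location is kept as an opaque string because its format is not specified here.

```python
from collections import namedtuple

# Field order is assumed to follow the description of the output record.
Hit = namedtuple("Hit", ["query_id", "kmer_count", "location", "function", "otu"])


def parse_hits(lines):
    """Parse tab-delimited output records into Hit tuples."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError("expected 5 tab-separated fields, got %d" % len(fields))
        query_id, kmer_count, location, function, otu = fields
        hits.append(Hit(query_id, int(kmer_count), location, function, otu))
    return hits


# Invented record; real values come from the script's standard output.
sample = ["contig1\t12\tLOCATION\texample function\texample organism\n"]
hits = parse_hits(sample)
print(hits[0].query_id, hits[0].kmer_count)  # prints contig1 12
```

With real output, the same function could be applied to an open file handle, for example the functions.tbl file produced by the command shown in the Introduction.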