SLiMMaker regular expression generator

Enter sequences and click "Make SLiM". Sequences must be of same length and can be raw sequences or fasta format.

(Example sequences are LIG_PCNA ELM occurrences.)

SLiMMaker options:
Min. no. of sequences for an aa to be in.
Min. combined freq of accepted aa to avoid wildcard
Max. no. different amino acids for one position
"Amino acid(s)" to ignore. (If nucleotide, would be N-)
Whether "peptides" are actually DNA fragments
Perform iterative SLiMMaker.

Citation: SLiMMaker is part of the ongoing benchmarking of QSLiMFinder, which should be submitted for publication soon. In the meantime, please cite this URL.

Availability: SLiMMaker is available on request and will shortly be part of the SLiMSuite package.

Function

    SLiMMaker has a fairly simple function of reading in a set of sequences and generating a regular expression motif
    from them. It is designed with protein sequences in mind but should work for DNA sequences too. Input sequences can
    be in fasta format or just plain text (with no sequence headers) and should be aligned already. Gapped positions will
    be ignored (treated as Xs) and variable length wildcards are not returned.

    SLiMMaker considers each column of the input in turn and compresses it into a regular expression element according to
    some simple rules, screening out rare amino acids and converting particularly degenerate positions into wildcards.
    Each amino acid in the column that occurs at least X times (as defined by minseq=X) is considered for the regular
    expression definition for that position. The full set of amino acids meeting this criterion is then assessed for
    whether to keep it as a defined position, or convert into a wildcard. First, if the number of different amino acids
    meeting this criterion is zero or above a second threshold (maxaa=X), the position is defined as a wildcard. Second,
    the proportion of input sequences matching the amino acid set is compared to a minimum frequency criterion
    (minfreq=X). Failing to meet this minimum frequency will again result in a wildcard. Otherwise, the amino acid set is
    added to the SLiM definition as either a fixed position (if only one amino acid met the minseq criterion) or as a
    degenerate position. Finally, leading and trailing wildcards are removed.

    By default, each defined position in a motif will contain amino acids that (a) occur in at least three sequences
    each, (b) have a combined frequency of >=75%, and (c) have 5 or fewer different amino acids (that occur in 3+
    sequences).

    Note. Unless the "iterate" function is used, the final motif only contains defined positions that match a given 
    frequency of the input (75% by default). Because positions are considered independently, however, the final motif
    might occur in fewer than 75% of the input sequences. SLiMSearch can be used to check the occurrence stats.

© RJ Edwards 2012. Last modified 13 June 2012.