jebl.evolution.sequences
Class Utils

java.lang.Object
  extended by jebl.evolution.sequences.Utils

public class Utils
extends java.lang.Object

Version:
$Id: Utils.java 918 2008-06-04 01:28:08Z twobeers $
Author:
Andrew Rambaut, Alexei Drummond

Method Summary
static State[] cleanSequence(java.lang.CharSequence seq, SequenceType type)
          Produce a clean sequence filtered of spaces and digits.
static NucleotideState[] complement(NucleotideState[] sequence)
           
static int getGaplessLocation(Sequence sequence, int gappedLocation)
          Gets the site location index for this sequence excluding any gaps.
static int getGappedLocation(Sequence sequence, int gaplessLocation)
          Gets the site location index (including gaps) for this sequence that corresponds to the given location, which excludes all gaps.
static byte[] getStateIndices(State[] sequence)
           
static SequenceType guessSequenceType(java.lang.CharSequence seq)
          Guess type of sequence from contents.
static boolean isPredominantlyRNA(java.lang.CharSequence sequenceString, int maximumNonGapsToLookAt)
          Is the given nucleotide sequence predominantly RNA (i.e. does it contain more occurrences of "U" than "T")?
static State[] reverse(State[] sequence)
           
static NucleotideState[] reverseComplement(NucleotideState[] sequence)
           
static java.lang.String reverseComplement(java.lang.String nucleotideSequence)
           
static java.lang.String reverseComplementWithGaps(java.lang.String nucleotideSequence)
           
static State[] stripGaps(State[] sequence)
           
static java.lang.String toString(State[] states)
           
static Sequence translate(Sequence sequence, GeneticCode geneticCode)
          Translates a given Sequence to a corresponding Sequence under the given genetic code.
static AminoAcidState[] translate(State[] states, GeneticCode geneticCode)
          Translates each of a given sequence of NucleotideStates or CodonStates to the AminoAcidState corresponding to it under the given genetic code.
static java.lang.String translate(java.lang.String nucleotideSequence, GeneticCode geneticCode)
          A wrapper for translateCharSequence(CharSequence,GeneticCode) that takes a nucleotide sequence as a String only rather than a CharSequence.
static java.lang.String translateCharSequence(java.lang.CharSequence nucleotideSequence, GeneticCode geneticCode)
          Translates the given nucleotideSequence into an amino acid sequence string, using the given geneticCode.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

translate

public static Sequence translate(Sequence sequence,
                                 GeneticCode geneticCode)
Translates a given Sequence to a corresponding Sequence under the given genetic code. This is simply a utility function that calls AminoAcidState[] translate(final State[] states, GeneticCode geneticCode).

Parameters:
sequence - the Sequence.
geneticCode - the genetic code to use for the translation
Returns:
the translated Sequence

translate

public static AminoAcidState[] translate(State[] states,
                                         GeneticCode geneticCode)
Translates each of a given sequence of NucleotideStates or CodonStates to the AminoAcidState corresponding to it under the given genetic code. Translation doesn't stop at stop codons; these are translated to AminoAcids.STOP_STATE. If translating from NucleotideState and the number of states is not a multiple of 3, then the excess states at the end are silently dropped.

Parameters:
states - States to translate; must all be of the same type, either NucleotideState or CodonState.
geneticCode - the genetic code to use for the translation
Returns:
the resulting array of AminoAcidStates

isPredominantlyRNA

public static boolean isPredominantlyRNA(java.lang.CharSequence sequenceString,
                                         int maximumNonGapsToLookAt)
Is the given nucleotide sequence predominantly RNA (i.e. does it contain more occurrences of "U" than "T")?

Parameters:
sequenceString - the sequence string to inspect to determine if it's RNA
maximumNonGapsToLookAt - for performance reasons, only look at a maximum of this many non-gap residues in deciding if the sequence is predominantly RNA. Can be -1 or Integer.MAX_VALUE to look at the entire sequence.
Returns:
true if the given nucleotide sequence is predominantly RNA
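
The check described above can be sketched in plain Java as follows. This is only an illustration, not the JEBL source: the class name RnaCheckSketch is hypothetical, and the sketch assumes that '-' is the only gap character, that the comparison is case-insensitive, and that any negative limit means "look at the whole sequence".

```java
// Illustrative sketch only; this is not the JEBL implementation.
final class RnaCheckSketch {

    // Returns true if 'U' occurs more often than 'T' among the first
    // maxNonGaps non-gap characters (assumption: '-' is the gap character).
    static boolean isPredominantlyRna(CharSequence seq, int maxNonGaps) {
        // Assumption: a negative limit means there is no limit.
        int limit = maxNonGaps < 0 ? Integer.MAX_VALUE : maxNonGaps;
        int uCount = 0;
        int tCount = 0;
        int seen = 0; // number of non-gap characters inspected so far
        for (int i = 0; i < seq.length() && seen < limit; i++) {
            char c = Character.toUpperCase(seq.charAt(i));
            if (c == '-') {
                continue; // gaps do not count towards the limit
            }
            seen++;
            if (c == 'U') {
                uCount++;
            } else if (c == 'T') {
                tCount++;
            }
        }
        return uCount > tCount;
    }

    public static void main(String[] args) {
        System.out.println(isPredominantlyRna("AUGCUU", -1)); // true
        System.out.println(isPredominantlyRna("TT-UUU", 2));  // false: only "TT" is inspected
    }
}
```

Capping the number of inspected residues trades accuracy for speed: a sequence whose first few residues are unrepresentative can be misclassified when the limit is small.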

reverseComplement

public static java.lang.String reverseComplement(java.lang.String nucleotideSequence)

reverseComplementWithGaps

public static java.lang.String reverseComplementWithGaps(java.lang.String nucleotideSequence)

translateCharSequence

public static java.lang.String translateCharSequence(java.lang.CharSequence nucleotideSequence,
                                                     GeneticCode geneticCode)
Translates the given nucleotideSequence into an amino acid sequence string, using the given geneticCode. The translation is done triplet by triplet, starting with the triplet at indices 0..2 in nucleotideSequence, then the one at indices 3..5, and so on, until fewer than 3 nucleotides are left.

This method uses translate(State[],GeneticCode) to do the translation, hence it shares some properties with that method: 1.) Any excess nucleotides at the end will be silently discarded, 2.) Translation doesn't stop at stop codons; instead, they are translated to "*", which is AminoAcids.STOP_STATE's code.

Parameters:
nucleotideSequence - nucleotide sequence to translate
geneticCode - genetic code to use for the translation
Returns:
A string with length nucleotideSequence.length() / 3 (rounded down), the translation of nucleotideSequence with the given genetic code
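
The triplet-by-triplet behaviour described above can be illustrated without JEBL. The following is a minimal sketch, not the library's code: the class TranslateSketch is hypothetical, the table is hard-coded to the standard genetic code (NCBI translation table 1) instead of taking a GeneticCode argument, and mapping codons with unrecognised characters to 'X' is an assumption of this sketch.

```java
// Illustrative sketch only; this is not the JEBL implementation.
final class TranslateSketch {

    // Standard genetic code with codons enumerated in T, C, A, G order
    // for each of the three positions (TTT, TTC, TTA, TTG, TCT, ...).
    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    static String translate(CharSequence nucleotides) {
        // Integer division silently drops one or two trailing nucleotides.
        int codonCount = nucleotides.length() / 3;
        StringBuilder protein = new StringBuilder(codonCount);
        for (int i = 0; i < codonCount; i++) {
            int index = 0;
            boolean known = true;
            for (int j = 0; j < 3; j++) {
                char base = Character.toUpperCase(nucleotides.charAt(3 * i + j));
                if (base == 'U') {
                    base = 'T'; // treat RNA input like DNA
                }
                int b = BASES.indexOf(base);
                if (b < 0) {
                    known = false; // ambiguity code, gap, or other character
                }
                index = index * 4 + b;
            }
            // Stop codons are translated to '*' and translation continues.
            protein.append(known ? AMINO_ACIDS.charAt(index) : 'X');
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        // ATG -> M, TAA -> * (stop), GGT -> G, trailing "AC" is discarded
        System.out.println(translate("ATGTAAGGTAC")); // prints M*G
    }
}
```

The output length is always the input length divided by 3, rounded down, which matches the return value documented above.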

translate

public static java.lang.String translate(java.lang.String nucleotideSequence,
                                         GeneticCode geneticCode)
A wrapper for translateCharSequence(CharSequence,GeneticCode) that takes a nucleotide sequence as a String only rather than a CharSequence. This is to preserve backwards compatibility with existing compiled code.

Parameters:
nucleotideSequence - nucleotide sequence string to translate
geneticCode - genetic code to use for the translation
Returns:
A string with length nucleotideSequence.length() / 3 (rounded down), the translation of nucleotideSequence with the given genetic code

stripGaps

public static State[] stripGaps(State[] sequence)

reverse

public static State[] reverse(State[] sequence)

complement

public static NucleotideState[] complement(NucleotideState[] sequence)

reverseComplement

public static NucleotideState[] reverseComplement(NucleotideState[] sequence)

getStateIndices

public static byte[] getStateIndices(State[] sequence)

getGaplessLocation

public static int getGaplessLocation(Sequence sequence,
                                     int gappedLocation)
Gets the site location index for this sequence excluding any gaps. The location is indexed from 0.

Parameters:
sequence - the sequence
gappedLocation - the location including gaps
Returns:
the location without gaps.
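
As a rough illustration of this mapping, the sketch below works on a plain string instead of a Sequence. The class name GaplessLocationSketch is hypothetical, '-' is assumed to be the only gap character, and the result for a location that is itself a gap (here: the number of non-gap sites before it) is an assumption, since the documentation above does not specify that case.

```java
// Illustrative sketch only; this is not the JEBL implementation.
final class GaplessLocationSketch {

    // Counts the non-gap characters strictly before gappedLocation, which is
    // the 0-based index the site has once all gaps are removed.
    static int getGaplessLocation(CharSequence gapped, int gappedLocation) {
        int gapless = 0;
        for (int i = 0; i < gappedLocation; i++) {
            if (gapped.charAt(i) != '-') {
                gapless++;
            }
        }
        return gapless;
    }

    public static void main(String[] args) {
        // In "A--CG", the 'C' is at gapped index 3 and gapless index 1.
        System.out.println(getGaplessLocation("A--CG", 3)); // prints 1
    }
}
```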

getGappedLocation

public static int getGappedLocation(Sequence sequence,
                                    int gaplessLocation)
Gets the site location index (including gaps) for this sequence that corresponds to the given location, which excludes all gaps. The first non-gapped site in the sequence has a gaplessLocation of 0.

Parameters:
sequence - the sequence
gaplessLocation - the location excluding gaps
Returns:
the site location including gaps
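
The reverse mapping can be sketched in the same spirit. Again this is an illustration on a plain string, not the JEBL code: the class name GappedLocationSketch is hypothetical, '-' is assumed to be the only gap character, and returning -1 when the sequence has too few non-gap sites is an assumption of this sketch.

```java
// Illustrative sketch only; this is not the JEBL implementation.
final class GappedLocationSketch {

    // Walks the gapped sequence and returns the index of the non-gap site
    // whose 0-based rank among non-gap sites equals gaplessLocation.
    static int getGappedLocation(CharSequence gapped, int gaplessLocation) {
        int seen = 0; // non-gap sites passed so far
        for (int i = 0; i < gapped.length(); i++) {
            if (gapped.charAt(i) == '-') {
                continue;
            }
            if (seen == gaplessLocation) {
                return i;
            }
            seen++;
        }
        return -1; // assumption: signal "no such site" with -1
    }

    public static void main(String[] args) {
        // The first non-gap site of "--ACG" has gapless location 0 and gapped location 2.
        System.out.println(getGappedLocation("--ACG", 0)); // prints 2
    }
}
```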

guessSequenceType

public static SequenceType guessSequenceType(java.lang.CharSequence seq)
Guess type of sequence from contents.

Parameters:
seq - the sequence
Returns:
SequenceType.NUCLEOTIDE or SequenceType.AMINO_ACID, if sequence is believed to be of that type. If the sequence contains characters that are valid for neither of these two sequence types, then this method returns null.
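
The heuristic JEBL actually uses is not described here and may well differ, but the three possible outcomes can be illustrated with a simple alphabet check. In the sketch below, the class GuessTypeSketch, the enum Kind, and both character sets (IUPAC-style codes plus '-' and '?') are assumptions made for illustration.

```java
// Illustrative sketch only; JEBL's actual heuristic may differ.
final class GuessTypeSketch {

    enum Kind { NUCLEOTIDE, AMINO_ACID }

    // Assumed alphabets: IUPAC-style codes plus gap and unknown characters.
    private static final String NUCLEOTIDE_CHARS = "ACGTURYKMSWBDHVN-?";
    private static final String AMINO_ACID_CHARS = "ACDEFGHIKLMNPQRSTVWYBZX*-?";

    // Returns NUCLEOTIDE if every character is a nucleotide code, otherwise
    // AMINO_ACID if every character is an amino acid code, otherwise null.
    static Kind guess(CharSequence seq) {
        boolean allNucleotide = true;
        boolean allAminoAcid = true;
        for (int i = 0; i < seq.length(); i++) {
            char c = Character.toUpperCase(seq.charAt(i));
            if (NUCLEOTIDE_CHARS.indexOf(c) < 0) {
                allNucleotide = false;
            }
            if (AMINO_ACID_CHARS.indexOf(c) < 0) {
                allAminoAcid = false;
            }
        }
        if (allNucleotide) {
            return Kind.NUCLEOTIDE;
        }
        if (allAminoAcid) {
            return Kind.AMINO_ACID;
        }
        return null; // characters valid for neither type
    }

    public static void main(String[] args) {
        System.out.println(guess("ACGTTGCA")); // NUCLEOTIDE
        System.out.println(guess("MKVLHPQ"));  // AMINO_ACID
        System.out.println(guess("ACG#"));     // null
    }
}
```

Because the method can return null, callers should check the result before using it.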

cleanSequence

public static State[] cleanSequence(java.lang.CharSequence seq,
                                    SequenceType type)
Produce a clean sequence filtered of spaces and digits.

Parameters:
seq - the sequence
type - the sequence type
Returns:
An array of valid states of SequenceType (may be shorter than the original sequence)
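
The filtering step can be pictured with a string-based sketch. It is not the JEBL code: the class CleanSequenceSketch is hypothetical, it returns a String instead of a State[], it removes all whitespace (not only spaces), and it does not check the remaining characters against the sequence type.

```java
// Illustrative sketch only; this is not the JEBL implementation.
final class CleanSequenceSketch {

    // Drops whitespace and digits, as found in sequence listings with
    // position numbers, and keeps every other character unchanged.
    static String clean(CharSequence raw) {
        StringBuilder cleaned = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (!Character.isWhitespace(c) && !Character.isDigit(c)) {
                cleaned.append(c);
            }
        }
        return cleaned.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("  1 ACGTAC GTT 11\n 12 GGA")); // prints ACGTACGTTGGA
    }
}
```

As the return description above notes, the cleaned result may be shorter than the input.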

toString

public static java.lang.String toString(State[] states)


http://code.google.com/p/jebl2/