MIfold help file
MIfold is a matlab program that uses mutual information and related
measures to infer RNA secondary structures (including pseudoknots) from
multiple sequence alignments.
INSTALLATION
Decompress and unpack the downloaded file MIfold.tar.gz with the following commands:
bash$ gunzip MIfold.tar.gz
bash$ tar -xf MIfold.tar
This creates a directory named MIfold containing the .m-files needed
to run MIfold. All .m-files have help texts; type 'help filename' in
the matlab command window for help on a particular .m-file.
Start matlab by
bash$ matlab&
(or the equivalent command for your operating system). Then add the
directory containing the MIfold .m-files to the matlab search path with
the addpath command, e.g.,
>> addpath /home/user/MIfold/mfiles
If you intend to use MIfold more than once, add this line to your
startup.m file so that the path is set automatically.
MIfold is now ready to run. To launch the MIfold GUI, type MIfold in the
matlab command window:
>> MIfold
PROGRAM USAGE
MIfold can be used through the GUI or by calling its functions directly.
Type help functionname in the matlab command window for more information
on how to use a particular function.
When you first open MIfold, only a few options are available:
Load file - Loads a new alignment. The alignment must be in CLUSTAL format.
When a new file is loaded, some information about the alignment
is printed in the matlab command window, such as the size of the
alignment, the average pairwise identity and the prior
probabilities used (for how to set the prior probabilities,
see ADVANCED OPTIONS below).
Mutual information - Selects which mutual information measure is used:
H - The classical definition of mutual information.
M - A 'simpler' version of mutual information, computed
as the information content of a variable indicating
whether or not a base pair can be formed between the
two bases.
C - The covariation measure as used in RNAalifold
(Hofacker, 2002).
Inconsistent sequences penalty, q - The inconsistency penalty penalizes
base pair positions that contain
inconsistent base pairs (bases that
cannot form a base pair). Base pair
positions with more inconsistent bases
get a higher penalty. q must be a value
between 0 and 1.
Close - Closes MIfold
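The classical measure H for two alignment columns i and j is
H_ij = sum over x,y of f_ij(x,y) * log2( f_ij(x,y) / (f_i(x) * f_j(y)) ),
where f_i, f_j and f_ij are the observed single and joint nucleotide
frequencies. The Python sketch below illustrates this formula; it is not
the code in mutual_H.m. Skipping sequences that have a gap in either
column and using base-2 logarithms are assumptions, and MIfold may treat
gaps differently.

```python
import math
from collections import Counter


def mutual_information_H(col_i, col_j, alphabet="AUGC"):
    """Classical mutual information (in bits) between two alignment columns.

    Assumption: sequences with a gap (or any non-AUGC symbol) in either
    column are skipped.
    """
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a in alphabet and b in alphabet]
    n = len(pairs)
    if n == 0:
        return 0.0
    f_ij = Counter(pairs)                  # joint counts of (x, y)
    f_i = Counter(a for a, _ in pairs)     # marginal counts in column i
    f_j = Counter(b for _, b in pairs)     # marginal counts in column j
    h = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        h += p_ab * math.log2(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return h
```

Two columns that covary perfectly, such as "GGCC" and "CCGG", give 1 bit.
Two fully conserved columns, such as "GGGG" and "CCCC", give 0 bits even
though every sequence can pair. This is why the M and C measures are
offered as alternatives.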
When an alignment is loaded, more options become available:
Threshold values - Three different threshold values can be set:
sec.str. - The threshold value for prediction of the
secondary structure. All mutual information
values below this threshold are set to zero,
so no base pair can be formed between two
bases whose mutual information is below
this threshold.
p.knot - The threshold for prediction of pseudoknots.
Mutual information values below this value are
set to zero when the pseudoknot base pairs are
predicted. This value can only be set if the
checkbox for predicting pseudoknots has been
checked.
info. seq. - The threshold used for computing the most
informative sequence. A nucleotide with an
information value below this threshold will
not be displayed in the most informative
sequence.
Plot MI - Plots the mutual information distribution. A new window opens
with a surface plot of all mutual information values above a
threshold value. The threshold can be changed interactively in
the new figure, which is useful for choosing the secondary
structure (and pseudoknot) thresholds for structure prediction.
The radiobuttons 'Surface plot' and 'Contour plot' switch between
displaying the mutual information distribution as a surface or
a contour plot.
Mountainplot - Displays the mutual information content in a mountain plot.
For each base pair in the structure, the mountain plot is
incremented by the mutual information measure of that base
pair. Every base pair is also displayed as a line connecting
the positions of the two paired nucleotides; this is how
pseudoknots are displayed in the pseudoknot view. The mountain
plot has two radiobuttons to the left, 'secondary str.' and
'pseudoknot', for switching between incrementing the plot by
the secondary structure or by the pseudoknot structure.
The check box 'Display known' can be used to compare the
predicted structure with a known secondary structure, which
is then displayed in red in the mountain plot. The mountain
plot of the known structure is scaled to have the same area
as the mountain plot of the predicted structure.
Below the mountain plot, the sequence information content
is displayed as a bar plot.
info.seq. - Prints the most informative sequence in the command window.
The most informative sequence is computed using the matlab
function most_inf_seq.m with the defined threshold value.
Predict structure - Prints the predicted structure in dot-bracket notation
in the command window. If the pseudoknot checkbox is
checked, a structure including pseudoknots is predicted
(if possible); otherwise only a secondary structure is
predicted. Pseudoknots are shown with [] or {}, and the
secondary structure with ().
Predict p.knot - Check this checkbox to predict pseudoknot structures
(if any exist) in addition to the nested secondary
structure.
Define structure - If a structure is defined a priori, the mountain plot
is only incremented for base pairs that are in the
known structure. This feature can be used to display
how much covariation supports a certain structure.
If only part of the secondary structure is known, you
can define just that part; mark all nucleotides with
unknown structure with -.
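The way the sec.str. threshold and Predict structure work together can
be illustrated with a Nussinov-style dynamic program. It first sets all
scores below the threshold to zero, then selects the nested set of base
pairs with the largest total score and prints it in dot-bracket
notation. This Python sketch is an assumption about how such a step
could look; it is not the recursion in DPpredstr.m. The min_loop
parameter (minimum number of unpaired bases enclosed by a pair) is
hypothetical, and pseudoknots are not handled.

```python
def predict_structure(mi, threshold, min_loop=3):
    """Return a dot-bracket string for the nested pairs with maximal total score.

    mi is a symmetric n x n list of scores (e.g. mutual information values).
    min_loop is a hypothetical minimum number of unpaired bases inside a pair.
    """
    n = len(mi)
    # Scores below the threshold are set to zero, so those positions never pair.
    w = [[v if v >= threshold else 0.0 for v in row] for row in mi]
    # best[i][j] = best total score on interval i..j; entries with j <= i stay 0.
    best = [[0.0] * n for _ in range(n)]

    def pair_score(i, k, j):
        # Pair i with k, plus the best inside (i+1..k-1) and after (k+1..j).
        inner = best[i + 1][k - 1]           # 0 for an empty interval
        outer = best[k + 1][j] if k < j else 0.0
        return w[i][k] + inner + outer

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]           # case 1: i stays unpaired
            for k in range(i + min_loop + 1, j + 1):
                if w[i][k] > 0:              # case 2: i pairs with k
                    score = max(score, pair_score(i, k, j))
            best[i][j] = score

    # Traceback: recover one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if w[i][k] > 0 and best[i][j] == pair_score(i, k, j):
                structure[i], structure[k] = "(", ")"
                stack.append((i + 1, k - 1))
                stack.append((k + 1, j))
                break
    return "".join(structure)
```

Consider a 10-position matrix with a score of 1.0 for the pairs (1,10),
(2,9) and (3,8) and 0 elsewhere. With a threshold of 0.5 the sketch
returns "(((....)))". Raising the threshold to 1.5 removes every pair,
which mirrors the statement above that no base pairs are formed below
the threshold.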
ADVANCED OPTIONS
By default the allowed base pairs are GC, CG, AU, UA, GU and UG. This
can be changed by redefining the base pairing matrix, which specifies
whether each of the following base pairs is allowed (rows give the
first base, columns the second; - denotes a gap):
AA AU AG AC A-
UA UU UG UC U-
GA GU GG GC G-
CA CU CG CC C-
-A -U -G -C --
By default the base pairing matrix is defined as follows (1 = allowed,
0 = not allowed):
0 1 0 0 0
1 0 1 0 0
0 1 0 1 0
0 0 1 0 0
0 0 0 0 0
To use other base pairs, create a file with the same name as the
alignment file plus the suffix .pairingmatrix (e.g., for the alignment
file PRP.aln, the file would be named PRP.aln.pairingmatrix) and enter
the desired base pairing matrix in it. If MIfold cannot find such a
file, the default matrix is used.
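The pairing matrix works as a lookup table indexed by the first base
(row) and the second base (column) in the order A, U, G, C, -. The
Python sketch below parses a matrix in that layout. It assumes a
.pairingmatrix file holds five whitespace-separated rows of five 0/1
values, as in the default shown above; MIfold's own parser may accept
other layouts. The sketch then uses the matrix to compute, for two
alignment columns, the fraction of sequences whose bases cannot pair.
That fraction is one plausible input to the inconsistency penalty, but
this help file does not say how MIfold combines it with q. The names
parse_pairing_matrix and inconsistent_fraction are hypothetical.

```python
ORDER = "AUGC-"  # row/column order of the pairing matrix

DEFAULT_MATRIX = """\
0 1 0 0 0
1 0 1 0 0
0 1 0 1 0
0 0 1 0 0
0 0 0 0 0
"""


def parse_pairing_matrix(text):
    """Map (first base, second base) -> True if the pair is allowed."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) != 5 or any(len(r) != 5 for r in rows):
        raise ValueError("expected a 5 x 5 matrix (rows/columns A, U, G, C, -)")
    return {(ORDER[r], ORDER[c]): int(rows[r][c]) == 1
            for r in range(5) for c in range(5)}


def inconsistent_fraction(col_i, col_j, allowed):
    """Fraction of sequences whose bases in columns i and j cannot pair.

    Symbols outside A, U, G, C, - are treated as unable to pair (assumption).
    """
    pairs = list(zip(col_i, col_j))
    bad = sum(1 for pair in pairs if not allowed.get(pair, False))
    return bad / len(pairs) if pairs else 0.0
```

With the default matrix, exactly six pairs are allowed. The columns
"GGAC" and "CUUA" contain one inconsistent pair, (C, A), so the fraction
is 0.25.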
The sequence information content displayed below the mountain plot as a
bar plot is computed relative to a prior distribution of nucleotides. By
default the frequencies of A, U, G and C in the alignment are computed
and used as prior probabilities. In some cases these frequencies are a
poor estimate, and the prior probabilities should then be defined
manually. To do this, create a file with the same name as the alignment
file plus the suffix .prior (e.g., for PRP.aln the file would be named
PRP.aln.prior) that contains the prior probabilities for A, U, G and C,
in that order. For example, if the prior probabilities are A: 0.4,
U: 0.3, G: 0.1 and C: 0.2, the file should contain the following line:
0.4 0.3 0.1 0.2
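The Python sketch below reads such a line. When no file is given, it
falls back to the nucleotide frequencies of the alignment, as described
above. It then computes a per-column information content as the relative
entropy, in bits, of the observed column frequencies with respect to the
prior. The choice of relative entropy and the decision to ignore gaps
are assumptions; the measure behind MIfold's bar plot may differ, and
all function names here are hypothetical.

```python
import math
from collections import Counter

BASES = "AUGC"  # order of the values in a .prior file


def parse_prior(line):
    """Parse a line such as '0.4 0.3 0.1 0.2' into {'A': 0.4, ...}."""
    values = [float(x) for x in line.split()]
    if len(values) != 4 or any(v <= 0 for v in values):
        raise ValueError("expected four positive probabilities for A, U, G, C")
    if abs(sum(values) - 1.0) > 1e-6:
        raise ValueError("prior probabilities must sum to 1")
    return dict(zip(BASES, values))


def prior_from_alignment(sequences):
    """Default prior: nucleotide frequencies over the whole alignment (gaps ignored)."""
    counts = Counter(ch for seq in sequences for ch in seq if ch in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("alignment contains no A, U, G or C")
    return {b: counts[b] / total for b in BASES}


def column_information(column, prior):
    """Relative entropy (bits) of one alignment column with respect to the prior."""
    counts = Counter(ch for ch in column if ch in BASES)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2((c / total) / prior[b])
               for b, c in counts.items())
```

A fully conserved column "AAAA" scores log2(4) = 2 bits under a uniform
prior. It scores only log2(1/0.4), about 1.32 bits, under the prior
0.4 0.3 0.1 0.2, because A is already expected to be common.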
FUNCTIONS
bracket2num.m
This function converts a structure in dot-bracket notation into a
numerical vector describing how the nucleotides are base paired.
DPpredstr.m
Predicts a secondary structure that maximises the total score given a
matrix of scores for every possible base combination. In MIfold the
matrix is the mutual information matrix and the total mutual information
is maximised.
inconsistency.m
Computes the inconsistency for every position in an alignment.
MIfold.m
The main function. Call this function to launch the GUI.
MIplot.m
Displays the mutual information distribution.
most_inf_seq.m
Computes the most informative sequence. This function cannot easily
be called from outside MIfold.
mountainplot.m
Plots the mountainplot and the sequence bars. This function cannot
easily be called from outside MIfold.
mutual_H.m, mutual_C.m, mutual_M.m
Three functions for computing the three different kinds of mutual
information. mutual_H computes the classical mutual information,
mutual_C computes the covariation measure and mutual_M computes
a modified mutual information measure similar to the measure used
in RNAlogo.
pairwiseidentity.m
Calculates the mean pairwise identity of sequences in a multiple alignment.
readclustalw.m
Reads an alignment in CLUSTAL W format.
seq_bars.m
Displays the sequence information content in sequence bars.
test_threshold.m
Use this function to test several different secondary structure and
pseudoknot thresholds.
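The kind of conversion bracket2num.m performs can be sketched in Python
with one stack per bracket type, so that (), [] and {} pairs may cross
as pseudoknots do. The output convention is an assumption: a 1-based
partner index for each position, and 0 for unpaired positions, including
. and -. The vector returned by bracket2num.m may use a different
convention, and the name bracket_to_pairs is hypothetical.

```python
def bracket_to_pairs(structure):
    """Return a list where entry k-1 is the 1-based partner of position k (0 = unpaired)."""
    closing = {")": "(", "]": "[", "}": "{"}
    stacks = {"(": [], "[": [], "{": []}   # one stack per bracket type
    partner = [0] * len(structure)
    for pos, ch in enumerate(structure, start=1):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closing:
            if not stacks[closing[ch]]:
                raise ValueError(f"unmatched '{ch}' at position {pos}")
            opener = stacks[closing[ch]].pop()
            partner[opener - 1] = pos
            partner[pos - 1] = opener
        # any other character ('.', '-', ...) is left unpaired
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched '{opener}' at position {stack[-1]}")
    return partner
```

For example, "((..))" becomes [6, 5, 0, 0, 2, 1]. The pseudoknotted
"(([[))]]" becomes [6, 5, 8, 7, 2, 1, 4, 3]; a single shared stack would
mismatch those brackets.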
For more information about the functions, type help functionname in
the matlab command window.
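The average pairwise identity printed when an alignment is loaded, which
pairwiseidentity.m computes, can be sketched as follows. Tools differ in
how they treat gaps. This Python sketch assumes that columns where both
sequences have a gap are ignored, and that every other column counts as
identical only when both characters match. It illustrates the idea
rather than reproducing MIfold's exact numbers, and the function names
are hypothetical.

```python
from itertools import combinations


def pair_identity(a, b):
    """Fraction of identical columns between two aligned sequences (gap-gap columns ignored)."""
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    return sum(1 for x, y in columns if x == y) / len(columns)


def mean_pairwise_identity(sequences):
    """Mean of pair_identity over all unordered pairs of sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pair_identity(a, b) for a, b in pairs) / len(pairs)
```

For ["AUGC", "AUGG"] the mean identity is 0.75. For three sequences,
all three pairs contribute equally to the average.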