bioYALP analyses FASTA files containing protein sequences. You can download bacterial proteomes in FASTA format from the NCBI website.
This is what a FASTA file looks like:
>protein 1
MLAGALCKQ
>protein 2
MMLSEGITARR
>protein 3
MTLPREE
Before each sequence there is a comment line (a line beginning with a >) which provides information on the sequence below. When a sequence is longer than 70 amino acids, it is split into lines of 70 amino acids. Here is another example:
>gi|16127995|ref|NP_414542.1| thr operon leader peptide [Escherichia coli K12]
MKRISTTITTTITITTGNGAG
>gi|16127997|ref|NP_414544.1| homoserine kinase [Escherichia coli K12]
MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFSLNNLGRFADKLPSEPRENIVYQCWE
RFCQELGKQIPVAMTLEKNMPIGSGLGSSACSVVAALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHY
DNVAPCFLGGMQLMIEENDIISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCIAHGRHLAGF
IHACYSRQPELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVA
DWLGKNYLQNQEGFVHICRLDTAGARVLEN
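To make the layout described above concrete, here is a minimal Python sketch that reads FASTA text into (comment, sequence) pairs and joins wrapped sequence lines back together. It is only an illustration of the file format, not bioYALP's own parser (bioYALP is a Perl script); the function name parse_fasta is hypothetical.

```python
def parse_fasta(text):
    """Return a list of (comment, sequence) pairs from FASTA-formatted text."""
    records = []
    comment = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new comment line closes the previous record, if there is one.
            if comment is not None:
                records.append((comment, "".join(chunks)))
            comment = line[1:].strip()
            chunks = []
        elif comment is not None:
            # Sequence lines are concatenated, which undoes the line wrapping.
            chunks.append(line)
    if comment is not None:
        records.append((comment, "".join(chunks)))
    return records


example = """>protein 1
MLAGALCKQ
>protein 2
MMLSEGITARR
>protein 3
MTLPREE
"""

for comment, sequence in parse_fasta(example):
    print(comment, len(sequence))
```

Because sequence lines are simply concatenated, the same function handles both short sequences and sequences wrapped over several lines.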
In some cases, it may be helpful to analyse files containing one sequence per line. This feature is still in development and is not available yet.
Once you have run an analysis (see the command-line section below), bioYALP writes its results to the output folder. More precisely, bioYALP writes the FASTA comments of the proteins that seem to be lipoproteins.
The syntax of the output filenames is: [name of input file]_predicted_lipoproteins.list
The content of an output file looks like this:
gi|90111299|ref|NP_416102.1|
gi|90111309|ref|NP_416156.1|
gi|49176129|ref|YP_025304.1|
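If you want to post-process a result file, the following Python sketch splits each line into its database tags. It assumes every line holds one pipe-delimited identifier laid out like the example above (tag, value, tag, value); the function names are hypothetical and not part of bioYALP.

```python
def parse_identifier(line):
    """Split an identifier such as 'gi|90111299|ref|NP_416102.1|'
    into a dict mapping each tag ('gi', 'ref') to its value."""
    fields = [field for field in line.strip().split("|") if field]
    # Fields alternate between a tag and the value that belongs to it.
    return dict(zip(fields[0::2], fields[1::2]))


def parse_result_list(text):
    """Parse the whole content of a result file, skipping blank lines."""
    return [parse_identifier(line) for line in text.splitlines() if line.strip()]


results = parse_result_list(
    "gi|90111299|ref|NP_416102.1|\n"
    "gi|90111309|ref|NP_416156.1|\n"
    "gi|49176129|ref|YP_025304.1|\n"
)
for entry in results:
    print(entry["ref"])
```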
Open a terminal and go to the bioYALP folder (cd ~/bioYALP for instance). Then call the bioYALP script by typing:
./bioYALP.pl -f [FASTA file you want to analyse] [options]
Replace [FASTA file you want to analyse] with the path to your FASTA file. For instance, type input/E_coli_K12.fasta to analyse the file E_coli_K12.fasta located in the input folder. You can also use absolute paths: -f /var/proteomes/E_coli_K12.fasta
If you do not want to use any options, leave [options] out.
The analysis is finished when the script displays the number of predicted lipoproteins. You can then open the result files located in the output folder. They contain only text, so any text editor will do.
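If you prefer to launch bioYALP from another program rather than from the terminal, the call can be wrapped as in the Python sketch below. It uses only the -f flag documented above and passes any further options through unchanged. The helper names, the default folder ~/bioYALP and the use of the exit status are assumptions made for this illustration.

```python
import os
import subprocess


def build_command(fasta_path, options=(), script="./bioYALP.pl"):
    """Build the argument list for one bioYALP run.
    `options` holds any extra command-line options, passed through as-is."""
    return [script, "-f", fasta_path] + list(options)


def run_bioyalp(fasta_path, options=(), folder="~/bioYALP"):
    """Run bioYALP from its own folder and return the process exit status.
    The folder default is hypothetical; adjust it to your installation."""
    command = build_command(fasta_path, options)
    completed = subprocess.run(command, cwd=os.path.expanduser(folder))
    return completed.returncode


print(build_command("input/E_coli_K12.fasta"))
```

Passing the arguments as a list means no shell interprets them, so special characters in an option value reach the script untouched.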
bioYALP looks for lipoproteins using the following criteria:
[LVI][ASTVI][ASG]C
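The pattern above is an ordinary regular expression: three character classes followed by a cysteine. The Python sketch below shows what it matches, using a made-up sequence. It only tests the pattern itself; where bioYALP expects the match to sit within the protein is not described here, so searching the whole sequence is an assumption of this example.

```python
import re

# Pattern quoted in the text above; Python's re module accepts it unchanged.
LIPOBOX = re.compile(r"[LVI][ASTVI][ASG]C")


def find_lipobox(sequence, pattern=LIPOBOX):
    """Return the 0-based start of the first match in the sequence, or None."""
    match = pattern.search(sequence)
    return match.start() if match else None


# Hypothetical sequence: 'LAGC' (L, then A, then G, then C) matches the pattern.
print(find_lipobox("MKLTTALAGCSS"))
# No stretch of this sequence fits the pattern, so the result is None.
print(find_lipobox("MTLPREE"))
```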
You can change these with the -c option. Its syntax is:
-c [n-region minimal length],[n-region maximal length],[n-region minimum charges],[lipobox]
If you only want to change some of them, leave the other criteria blank but keep the commas (see Example 2).
Example 1
./bioYALP.pl -f file.fasta -c 3,10,1,[LV][ASTVI][ASG]C
Looks for sequences with an n-region between 3 and 10 amino acids long, having at least one charge, and a lipobox matching the regular expression [LV][ASTVI][ASG]C
Example 2
./bioYALP.pl -f file.fasta -c ,,,[LV][ASTVI][ASG]C
Looks for sequences with the default criteria for the n-region (see above) and a lipobox matching the regular expression [LV][ASTVI][ASG]C
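The two examples can be read as four comma-separated fields, where a blank field means "keep the default". The Python sketch below splits a -c value in that way. It is an illustration of the syntax only, not bioYALP's own option parser; the field names are hypothetical, and None is used for blank fields because the default n-region values are not restated here.

```python
# Hypothetical names for the four fields, in the order given by the -c syntax.
FIELD_NAMES = ("n_min_length", "n_max_length", "n_min_charges", "lipobox")


def parse_criteria(value):
    """Split a -c value into a dict; blank fields become None (keep the default)."""
    # Split on the first three commas only, so a lipobox expression that
    # itself contains a comma stays in one piece.
    parts = value.split(",", 3)
    if len(parts) != len(FIELD_NAMES):
        raise ValueError("expected 4 comma-separated fields, got %d" % len(parts))
    criteria = {}
    for name, part in zip(FIELD_NAMES, parts):
        part = part.strip()
        if not part:
            criteria[name] = None
        elif name == "lipobox":
            criteria[name] = part  # kept as text, it is a regular expression
        else:
            criteria[name] = int(part)
    return criteria


print(parse_criteria("3,10,1,[LV][ASTVI][ASG]C"))
print(parse_criteria(",,,[LV][ASTVI][ASG]C"))
```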
For more information about tuning bioYALP, please read lipoprotein prediction.
Copyright © 2007-2008 Pierre-Yves Beaudouin