PaSiMap maps protein sequences as coordinates based on their pairwise similarities. The similarity between each pair of sequences is determined with their global (i.e. whole-length) alignment. Therefore, the protein sequences should be trimmed to the region of interest and should have similar lengths.
FASTA-format
The protein sequences have to be in FASTA-format.
This means that each protein sequence consists of:
a header: the line starting with '>'.
a body: the following lines up to the next '>'.
Example 1:
>Z__Z1
APTFTQPLQSVVVLEGSTATFEAHISGFPVPEVSWFRDGQVISTSTLPGVQISFSDGRAKLTIPAVTKANSGRYSLKATNGSGQATSTAELLV
>Z__Z2
APPNFVQRLQSMTVRQGSQVRLQVRVTGIPTPVVKFYRDGAEIQSSLDFQISQEGDLYSLLIAEAYPEDSGTYSVNATNSVGRATSTAELLVQ
>Z__Z3
PPTLVSGLKNVTVIEGESVTLECHISGYPSPTVTWYREDYQIESSIDFQITFQSGIARLMIREAFAEDSGRFTCSAVNEAGTVSTSCYLA
>Z__Z4
APYFITKPVVQKLVEGGSVVFGCQVGGNPKPHVYWKKSGVPLTTGYRYKVSYNKQTGECKLVISMTFADDAGEYTIVVRNKHGETSASASLL
>Z__Z5
SGFDSRIKNYRILEGMGVTFHCKMSGYPLPKIAWYKDGKRIKHGERYQMDFLQDGRASLRIPVVLPEDEGIYTAFASNIKGNAICSGKLYVE
>Z__Z6
PVFVLKPVSFKCLEGQTARFDLKVVGRPMPETFWFHDGQQIVNDYTHKVVIKEDGTQSLIIVPATPSDSGEWTVVAQNRAGRSSISVILT
>Z__Z7
PMFVEKLKNVNIKEGSRLEMKVRATGNPNPDIVWLKNSDIIVPHKYPKIRIEGTKGEAALKIDSTVSQDSAWYTATAINKAGRDTTRCKVN
Example 2:
>Z__Z1
APTFTQPLQSVVVLEGSTATFEAHISGFPVPEVSWFRDGQVISTSTLPGV
QISFSDGRAKLTIPAVTKANSGRYSLKATNGSGQATSTAELLV
>Z__Z2
APPNFVQRLQSMTVRQGSQVRLQVRVTGIPTPVVKFYRDGAEIQSSLDFQ
ISQEGDLYSLLIAEAYPEDSGTYSVNATNSVGRATSTAELLVQ
>Z__Z3
PPTLVSGLKNVTVIEGESVTLECHISGYPSPTVTWYREDYQIESSIDFQI
TFQSGIARLMIREAFAEDSGRFTCSAVNEAGTVSTSCYLA
>Z__Z4
APYFITKPVVQKLVEGGSVVFGCQVGGNPKPHVYWKKSGVPLTTGYRYKV
SYNKQTGECKLVISMTFADDAGEYTIVVRNKHGETSASASLL
>Z__Z5
SGFDSRIKNYRILEGMGVTFHCKMSGYPLPKIAWYKDGKRIKHGERYQMD
FLQDGRASLRIPVVLPEDEGIYTAFASNIKGNAICSGKLYVE
>Z__Z6
PVFVLKPVSFKCLEGQTARFDLKVVGRPMPETFWFHDGQQIVNDYTHKVV
IKEDGTQSLIIVPATPSDSGEWTVVAQNRAGRSSISVILT
>Z__Z7
PMFVEKLKNVNIKEGSRLEMKVRATGNPNPDIVWLKNSDIIVPHKYPKIR
IEGTKGEAALKIDSTVSQDSAWYTATAINKAGRDTTRCKVN
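Example 1 and Example 2 describe the same sequences; only the line wrapping of the body differs. The following Python sketch illustrates this by reading both layouts into identical records. The function name `read_fasta` is hypothetical and not part of PaSiMap, and the sequences are shortened for illustration.

```python
def read_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Body lines are joined, so wrapped and unwrapped bodies are equal.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Shortened, illustrative sequences in both layouts.
single_line = ">Z__Z1\nAPTFTQPLQS\n>Z__Z2\nAPPNFVQRLQ\n"
wrapped = ">Z__Z1\nAPTFT\nQPLQS\n>Z__Z2\nAPPNF\nVQRLQ\n"

print(read_fasta(single_line) == read_fasta(wrapped))  # True
```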
FASTA-header: ASCII, must be unique
PaSiMap only allows ASCII-characters in the header.
The special characters ':|,/\' and space characters will be internally replaced with the character '_'.
All FASTA-headers must be unique, even after this replacement.
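To check headers before submitting, the replacement rule can be imitated locally. This is a minimal Python sketch under assumptions: the helper names are hypothetical, only the plain space is treated as a space character, and PaSiMap's own implementation may differ in details.

```python
from collections import Counter

# Characters that are replaced with '_' according to the rule above
# (assumption: "space" means the plain space character only).
REPLACED = set(":|,/\\ ")


def sanitize_header(header):
    """Imitate the replacement of special characters in a FASTA header."""
    if not header.isascii():
        raise ValueError(f"non-ASCII character in header: {header!r}")
    return "".join("_" if ch in REPLACED else ch for ch in header)


def duplicate_headers(headers):
    """Return the sanitized headers that occur more than once."""
    counts = Counter(sanitize_header(h) for h in headers)
    return sorted(h for h, n in counts.items() if n > 1)


print(sanitize_header("sp|P12345|TITIN HUMAN"))  # sp_P12345_TITIN_HUMAN
print(duplicate_headers(["Z:1", "Z|1", "Z2"]))   # ['Z_1']
```

The second call shows why uniqueness has to hold after the replacement: 'Z:1' and 'Z|1' are different as typed but become the same header.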
FASTA-body: amino acids, should be unique
PaSiMap was developed for sequences of the 20 amino acids. The sequence will be internally converted to uppercase letters.
All FASTA-bodies should be unique in order to avoid a bias towards the more frequent sequences.
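A quick local check for duplicate bodies could look like the following Python sketch. The function name `check_bodies` is hypothetical, and how PaSiMap itself handles letters outside the 20 amino acids is not stated here, so the sketch only reports them.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # one-letter codes of the 20 amino acids


def check_bodies(records):
    """Report unexpected letters and duplicate bodies in (header, body) pairs."""
    seen = {}       # uppercase sequence -> first header that used it
    problems = []
    for header, body in records:
        seq = body.upper()  # compare in uppercase, as PaSiMap converts internally
        unexpected = sorted(set(seq) - AMINO_ACIDS)
        if unexpected:
            problems.append(f"{header}: unexpected characters {''.join(unexpected)}")
        if seq in seen:
            problems.append(f"{header}: same sequence as {seen[seq]}")
        else:
            seen[seq] = header
    return problems


# Illustrative records: 'b' duplicates 'a' after uppercasing, 'c' contains 'X'.
for problem in check_bodies([("a", "ACD"), ("b", "acd"), ("c", "ACX")]):
    print(problem)
```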
No, an advantage of PaSiMap is that the input sequences do not have to be aligned to each other.
However, if you have an MSA (multiple sequence alignment) of your sequences, you can input the MSA instead. If you want to use the alignment information of your MSA (instead of letting PaSiMap compute the pairwise alignments), please follow these steps:
You can use any kind of data, as long as the pairwise similarities can be described as correlation coefficients or correlation-coefficient-like values. This means that the values must lie within the range of -1 to 1: 0 for no correlation, 1 for the strongest positive correlation and -1 for the strongest negative correlation.
PaSiMap can only determine the pairwise similarities for protein sequences. These values range from 0 to 1: 0 for no similarity and 1 for the strongest similarity. Negative values do not occur. For more details, please refer to the PaSiMap paper.
However, you may skip the computation of the pairwise similarities done by PaSiMap and directly input your own pairwise relations. The calculation of pairwise similarities for anything but protein sequences is not included in PaSiMap, so you will need to do that yourself. The pairwise similarities can then be used as input with the expert-option 'Pairwise relations'.
If you want to use pairwise relations instead of protein sequences as input, please follow these steps:
The pairwise relations between the objects (e.g. protein sequences) must be in space-separated format.
The following 3 space-separated values are needed:
The numbering of the objects must start with 1 and be continuous. Duplicate object-pairs with different pairwise similarity values are not allowed. Self-pairs (i.e. object A = object B) are not allowed.
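The rules above can be checked before uploading with a short Python sketch. Several points are assumptions and not confirmed by PaSiMap: the column order (object A, object B, similarity value), the treatment of 'A B' and 'B A' as the same pair, and the function name `check_relations`.

```python
def check_relations(text):
    """Return a list of problems found in space-separated pairwise relations."""
    problems = []
    values = {}  # unordered pair (assumption) -> first similarity value seen
    ids = set()
    for n, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3:
            problems.append(f"line {n}: expected 3 values, found {len(fields)}")
            continue
        try:
            # Assumed column order: object A, object B, similarity value.
            a, b, value = int(fields[0]), int(fields[1]), float(fields[2])
        except ValueError:
            problems.append(f"line {n}: could not parse numbers")
            continue
        if a == b:
            problems.append(f"line {n}: self-pair {a} {b}")
            continue
        if not -1.0 <= value <= 1.0:
            problems.append(f"line {n}: value {value} outside -1 to 1")
        pair = (min(a, b), max(a, b))
        if pair in values and values[pair] != value:
            problems.append(f"line {n}: pair {a} {b} repeated with a different value")
        values.setdefault(pair, value)
        ids.update(pair)
    # Object numbers must be exactly 1, 2, ..., max without gaps.
    if ids and ids != set(range(1, max(ids) + 1)):
        problems.append("object numbers must start with 1 and be continuous")
    return problems


good = "1 2 0.8\n1 3 0.1\n2 3 0.5\n"
bad = "1 1 1.0\n1 2 0.8\n2 1 0.7\n1 4 1.5\n"
print(check_relations(good))  # []
for problem in check_relations(bad):
    print(problem)
```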
Please use a text file for the option 'Upload from file'.
Microsoft Word, LibreOffice Writer and PDF documents are not text files.
Yes, only people who know your 12-character job-ID can access the data.
After submitting your job, a flash message displays the link for accessing your results:
pasimap.biologie.uni-konstanz.de/query/job-ID
(The 12-character job-ID is specific to your submitted job.)
Simply copy the link from the flash message for easy access.
Your results will be stored for 14 days. After that your data will be deleted, so please save your data before then.
Within PaSiMap, the mapping of the pairwise similarities between the objects (e.g. protein sequences) is done with the multidimensional scaling method cc_analysis. Therefore, the results of PaSiMap are interpreted in the following way:
You can download the 'coordinates.csv'-file and use it to create a scatter plot with your favourite plotting program (e.g. Microsoft Excel, LibreOffice Calc, gnuplot, Matplotlib).
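For Matplotlib, a scatter plot could be sketched as follows. The layout of 'coordinates.csv' is an assumption here (comma-separated, a label in the first column, coordinates in the following columns, possibly a header row); please check your downloaded file and adjust `x_col` and `y_col`. The function names, the output file name and the sample values are hypothetical.

```python
import csv
import io


def load_points(csv_text, x_col=1, y_col=2):
    """Read (label, x, y) tuples; rows without numeric coordinates are skipped."""
    points = []
    for row in csv.reader(io.StringIO(csv_text)):
        try:
            points.append((row[0], float(row[x_col]), float(row[y_col])))
        except (ValueError, IndexError):
            continue  # e.g. a header row or an incomplete row
    return points


def plot_points(points, out_file="pasimap_scatter.png"):
    """Draw a labelled scatter plot of the points (requires Matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter([x for _, x, _ in points], [y for _, _, y in points])
    for label, x, y in points:
        ax.annotate(label, (x, y))
    ax.set_xlabel("dimension 1")
    ax.set_ylabel("dimension 2")
    ax.set_aspect("equal")  # same scale on both axes
    fig.savefig(out_file)


# Made-up sample values in the assumed layout, for illustration only.
sample = "label,x,y\nZ__Z1,0.1,0.2\nZ__Z2,-0.3,0.4\n"
print(load_points(sample))

# With a real download:
# with open("coordinates.csv") as handle:
#     plot_points(load_points(handle.read()))
```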
If your job finished successfully, you do not need the file 'connectivity.csv'.
You only need this file if your job failed because of missing connections. In that case, the flash message will report the connectivity problem and tell you how to proceed.