PaSiMap (Pairwise Similarity Map)

Input of query

Protein sequences

PaSiMap maps protein sequences as coordinates based on their pairwise similarities. The similarity between each pair of sequences is determined with their global (i.e. whole-length) alignment. Therefore, the protein sequences should be trimmed to the region of interest and should have similar lengths.

FASTA-format

The protein sequences have to be in FASTA-format.

This means that each protein sequence consists of:

  • header:
    1 line that describes/names the sequence. This line must start with the character '>'.
  • body:
    1 or more lines containing the sequence of amino acids. The line(s) must not start with the character '>'.

Example 1:

>Z__Z1
APTFTQPLQSVVVLEGSTATFEAHISGFPVPEVSWFRDGQVISTSTLPGVQISFSDGRAKLTIPAVTKANSGRYSLKATNGSGQATSTAELLV
>Z__Z2
APPNFVQRLQSMTVRQGSQVRLQVRVTGIPTPVVKFYRDGAEIQSSLDFQISQEGDLYSLLIAEAYPEDSGTYSVNATNSVGRATSTAELLVQ
>Z__Z3
PPTLVSGLKNVTVIEGESVTLECHISGYPSPTVTWYREDYQIESSIDFQITFQSGIARLMIREAFAEDSGRFTCSAVNEAGTVSTSCYLA
>Z__Z4
APYFITKPVVQKLVEGGSVVFGCQVGGNPKPHVYWKKSGVPLTTGYRYKVSYNKQTGECKLVISMTFADDAGEYTIVVRNKHGETSASASLL
>Z__Z5
SGFDSRIKNYRILEGMGVTFHCKMSGYPLPKIAWYKDGKRIKHGERYQMDFLQDGRASLRIPVVLPEDEGIYTAFASNIKGNAICSGKLYVE
>Z__Z6
PVFVLKPVSFKCLEGQTARFDLKVVGRPMPETFWFHDGQQIVNDYTHKVVIKEDGTQSLIIVPATPSDSGEWTVVAQNRAGRSSISVILT
>Z__Z7
PMFVEKLKNVNIKEGSRLEMKVRATGNPNPDIVWLKNSDIIVPHKYPKIRIEGTKGEAALKIDSTVSQDSAWYTATAINKAGRDTTRCKVN

Example 2:

>Z__Z1
APTFTQPLQSVVVLEGSTATFEAHISGFPVPEVSWFRDGQVISTSTLPGV
QISFSDGRAKLTIPAVTKANSGRYSLKATNGSGQATSTAELLV
>Z__Z2
APPNFVQRLQSMTVRQGSQVRLQVRVTGIPTPVVKFYRDGAEIQSSLDFQ
ISQEGDLYSLLIAEAYPEDSGTYSVNATNSVGRATSTAELLVQ
>Z__Z3
PPTLVSGLKNVTVIEGESVTLECHISGYPSPTVTWYREDYQIESSIDFQI
TFQSGIARLMIREAFAEDSGRFTCSAVNEAGTVSTSCYLA
>Z__Z4
APYFITKPVVQKLVEGGSVVFGCQVGGNPKPHVYWKKSGVPLTTGYRYKV
SYNKQTGECKLVISMTFADDAGEYTIVVRNKHGETSASASLL
>Z__Z5
SGFDSRIKNYRILEGMGVTFHCKMSGYPLPKIAWYKDGKRIKHGERYQMD
FLQDGRASLRIPVVLPEDEGIYTAFASNIKGNAICSGKLYVE
>Z__Z6
PVFVLKPVSFKCLEGQTARFDLKVVGRPMPETFWFHDGQQIVNDYTHKVV
IKEDGTQSLIIVPATPSDSGEWTVVAQNRAGRSSISVILT
>Z__Z7
PMFVEKLKNVNIKEGSRLEMKVRATGNPNPDIVWLKNSDIIVPHKYPKIR
IEGTKGEAALKIDSTVSQDSAWYTATAINKAGRDTTRCKVN
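
Examples 1 and 2 contain the same sequences; only the line wrapping of the body differs. The following Python sketch shows one way to read both layouts into identical records. It is not PaSiMap's own parser: the function name parse_fasta is hypothetical, and skipping blank lines is an assumption.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence line found before the first '>' header")
            chunks.append(line)  # body lines are joined without separators
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# First record of Example 1 (body on one line) ...
single_line = (
    ">Z__Z1\n"
    "APTFTQPLQSVVVLEGSTATFEAHISGFPVPEVSWFRDGQVISTSTLPGV"
    "QISFSDGRAKLTIPAVTKANSGRYSLKATNGSGQATSTAELLV\n"
)
# ... and of Example 2 (body wrapped over two lines).
wrapped = (
    ">Z__Z1\n"
    "APTFTQPLQSVVVLEGSTATFEAHISGFPVPEVSWFRDGQVISTSTLPGV\n"
    "QISFSDGRAKLTIPAVTKANSGRYSLKATNGSGQATSTAELLV\n"
)
same_record = parse_fasta(single_line) == parse_fasta(wrapped)
```

Comparing the parsed records rather than the raw text is what makes the two layouts interchangeable.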

FASTA-header: ASCII, must be unique

PaSiMap only allows ASCII-characters in the header. The special characters ':|,/\' and space-characters will be internally replaced with the character '_'.

All FASTA-headers must be unique, even after this replacement.
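
To check headers before submitting, the replacement rule can be imitated locally. The Python sketch below is only an approximation of what the server does: the function names are hypothetical, and treating every whitespace character as a "space-character" is an assumption.

```python
import re

# Characters replaced with '_' according to the text above: ':', '|', ',',
# '/', '\' and space characters (assumed here to mean any whitespace).
_REPLACED = re.compile(r"[:|,/\\\s]")


def normalize_header(header):
    """Return the header as it would look after the replacement."""
    if not header.isascii():
        raise ValueError(f"non-ASCII character in header: {header!r}")
    return _REPLACED.sub("_", header)


def find_header_collisions(headers):
    """Return (first_header, later_header, normalized) for every clash."""
    seen = {}
    collisions = []
    for header in headers:
        normalized = normalize_header(header)
        if normalized in seen:
            collisions.append((seen[normalized], header, normalized))
        else:
            seen[normalized] = header
    return collisions


# Hypothetical headers: the first two become identical after replacement.
clashes = find_header_collisions(["seq 1", "seq_1", "seq2"])
```

Headers that differ only in one of the replaced characters are reported as collisions, which is the case the uniqueness rule above warns about.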

FASTA-body: amino acids, should be unique

PaSiMap was developed for sequences of the 20 amino acids. The sequence will be internally converted to uppercase letters.

All FASTA-bodies should be unique in order to avoid a bias towards the more frequent sequences.
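
Duplicate bodies can be found before submission by comparing the uppercase sequences. The Python sketch below uses hypothetical function names; it also lists letters outside the 20 standard amino acids, although how PaSiMap treats such letters is not stated here.

```python
from collections import defaultdict

# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def group_identical_bodies(records):
    """Map each repeated (uppercased) sequence to the headers that share it."""
    groups = defaultdict(list)
    for header, sequence in records:
        groups[sequence.upper()].append(header)
    return {seq: headers for seq, headers in groups.items() if len(headers) > 1}


def nonstandard_letters(sequence):
    """Return the sorted letters that are not among the 20 standard amino acids."""
    return sorted(set(sequence.upper()) - STANDARD_AMINO_ACIDS)


# Hypothetical records: 'a' and 'b' differ only in letter case.
duplicates = group_identical_bodies([("a", "acd"), ("b", "ACD"), ("c", "ACE")])
```

Keeping one representative per group removes the bias towards frequent sequences described above.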

Alignment: not required

An advantage of PaSiMap is that the input sequences do not have to be aligned to each other.

However, if you have a MSA of your sequences, you can also input the MSA instead. If you want to use the alignment information of your MSA (instead of letting PaSiMap compute the pairwise alignments), please do the following steps:

  • display the expert-options by clicking 'More options (experts-only)'
  • tick the checkbox 'Protein sequences: aligned as a MSA (multiple sequence alignment).'

If you do not tick this checkbox, the alignment information of the MSA will be ignored.

Pairwise relations (experts-only)

You can use any kind of data, as long as the pairwise similarities can be described as correlation coefficients or correlation-coefficient-like values. This means that the values must lie within the range of -1 to 1: with 0 for no correlation, 1 for the strongest positive correlation and -1 for the strongest negative correlation.

PaSiMap itself can only determine the pairwise similarities for protein sequences. These values range from 0 to 1: with 0 for no similarity and 1 for the strongest similarity. Negative values do not occur. For more details, please refer to the PaSiMap paper.

However, you may skip the computation of the pairwise similarities done by PaSiMap and directly input your own pairwise relations. The calculation of pairwise similarities for anything but protein sequences is not included in PaSiMap, so you will need to do that yourself. The pairwise similarities can then be used as input with the expert-option 'Pairwise relations'.

If you want to use pairwise relations instead of protein sequences as input, please do the following steps:

  • display the expert-options by clicking 'More options (experts-only)'
  • tick the checkbox 'Pairwise relations.'

The pairwise relations between the objects (e.g. protein sequences) must be in space-separated-format.

The following 3 space-separated values are needed:

  • numbering of object A
  • numbering of object B
  • pairwise similarity value between objects A and B

The numbering of the objects must start with 1 and be continuous. Duplicate object-pairs with different pairwise similarity values are not allowed. Self-pairs (i.e. object A = object B) are not allowed.
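
The rules above can be checked locally before uploading. The Python sketch below is a hypothetical validator, not PaSiMap's own code. It assumes one object-pair per line, treats 'A B' and 'B A' as the same pair, and tolerates a repeated pair with an identical value; the similarity values in the example are made up.

```python
def parse_pairwise_relations(text):
    """Validate space-separated 'A B value' lines and return {(A, B): value}."""
    pairs = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 values, got {len(fields)}")
        a, b = int(fields[0]), int(fields[1])
        value = float(fields[2])
        if a == b:
            raise ValueError(f"line {lineno}: self-pairs are not allowed")
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"line {lineno}: value must be within -1 to 1")
        key = (min(a, b), max(a, b))  # assumption: the order of A and B is irrelevant
        if key in pairs and pairs[key] != value:
            raise ValueError(f"line {lineno}: pair {key} has conflicting values")
        pairs[key] = value
    numbers = {n for key in pairs for n in key}
    if numbers and numbers != set(range(1, max(numbers) + 1)):
        raise ValueError("numbering must start with 1 and be continuous")
    return pairs


# Made-up similarity values for three objects.
example = "1 2 0.83\n1 3 0.12\n2 3 0.40\n"
relations = parse_pairwise_relations(example)
```

A file that passes this check satisfies the stated rules on numbering, value range, duplicates and self-pairs; whether the server applies further checks is not covered here.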

Upload file

Please use a text file for the option 'Upload from file'.

Microsoft Word, LibreOffice Writer and PDF documents are not text files.
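
A quick local check can catch the most common mistakes before uploading. The Python sketch below is a heuristic with a hypothetical function name: it rejects PDF files, ZIP-based documents (.docx, .odt) and legacy .doc files by their leading bytes and then requires pure ASCII content. The ASCII requirement is an assumption derived from the header rule; the server's actual check may differ.

```python
def looks_like_plain_text(data):
    """Heuristically decide whether the bytes of a file are plain ASCII text."""
    if data.startswith(b"%PDF"):
        return False  # PDF document
    if data.startswith(b"PK\x03\x04"):
        return False  # ZIP container, used by .docx and .odt
    if data.startswith(b"\xd0\xcf\x11\xe0"):
        return False  # legacy Microsoft Office (.doc) container
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return b"\x00" not in data  # NUL bytes indicate binary content


fasta_ok = looks_like_plain_text(b">Z__Z1\nAPTFTQPLQSVVVLEG\n")
```

To test a file on disk, read it in binary mode, for example with open(path, "rb"), and pass the bytes to the function.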

Output of query

Access

Only people who know your 12-character job-ID can access the data.

After submitting your job, a flash message displays the link for accessing your results:
pasimap.biologie.uni-konstanz.de/query/job-ID
(The 12 character job-ID is specific for your submitted job.)

Simply copy the link from the flash message for easy access.

Your results will be stored for 14 days. After that your data will be deleted, so please save your data before then.

Results

Within PaSiMap, the mapping of the pairwise similarities between the objects (e.g. protein sequences) is done with the multidimensional scaling method cc_analysis. Therefore, the results of PaSiMap are interpreted in the following way:

  • The resulting coordinates can be seen as vectors in a unit sphere (sphere of radius 1).
  • Only the relative positions of the coordinates to each other matter:
    You are therefore free to flip, invert or rotate the coordinates around the origin without changing the interpretation of the result.
  • Angle between the vectors A and B:
    Systematic difference between the objects A and B.
  • Length difference between the vectors B and C pointing in the same direction:
    Random difference between the objects B and C. Short vectors represent objects with more random noise, while long vectors with a length close to 1 have less noise and can be considered more representative.

What exactly defines the systematic differences cannot be discerned directly from the mapped coordinates alone. Instead, identifying these features requires downstream analysis based on the systematic groups (vectors with similar angles), which can be detected with this mapping. For more details, please refer to the PaSiMap paper.
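
The two quantities used in this interpretation, the length of a vector and the angle between two vectors, can be computed from the coordinates with a few lines of Python. The sketch below uses hypothetical function names and made-up 2-dimensional coordinates; it is not part of PaSiMap and does not read 'coordinates.csv'.

```python
import math


def vector_length(v):
    """Euclidean length of a coordinate vector (distance from the origin)."""
    return math.sqrt(sum(x * x for x in v))


def angle_degrees(a, b):
    """Angle between two coordinate vectors, in degrees."""
    length_a, length_b = vector_length(a), vector_length(b)
    if length_a == 0 or length_b == 0:
        raise ValueError("the angle is undefined for a zero-length vector")
    cosine = sum(x * y for x, y in zip(a, b)) / (length_a * length_b)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding errors
    return math.degrees(math.acos(cosine))


# Made-up coordinates for three objects A, B and C.
A = (0.8, 0.0)
B = (0.0, 0.6)
C = (0.0, 0.3)

angle_ab = angle_degrees(A, B)  # large angle: systematic difference
angle_bc = angle_degrees(B, C)  # same direction: no systematic difference
length_gap_bc = vector_length(B) - vector_length(C)  # random difference
```

Here A and B point in different directions, while B and C share a direction and differ only in length, so C would be read as the noisier of the two.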

You can download the 'coordinates.csv'-file and use it to create a scatter plot with your favourite plotting program (e.g. Microsoft Excel, LibreOffice Calc, gnuplot, Matplotlib).

If your job finished successfully, you do not need the file 'connectivity.csv'.

You only need this file if your job failed because of missing connections. In that case, the flash message will report the connectivity problem and tell you how to proceed.