Protein Sequence Coverage Map

Protein sequence coverage maps are visualizations used in proteomics and peptidomics to show how peptides are distributed across their parent protein. This software was developed in collaboration with the Department of Food Science at Aarhus University. Further details on the software and its use are given below. If you use the software, please cite the accompanying review article on bioactive milk peptides.

Authors

Harith Rathish, Søren Drud-Heydary Nielsen, Hans-Jörg Schulz

Cover image of the Protein Sequence Coverage Map. The numbered labels (1) to (9) are explained in the sections below.

Overview

The protein sequence coverage map is widely used (1, 2, 3) to give an overview of the peptides associated with a protein. This implementation targets use cases where horizontal space is limited, e.g., figures in a paper.

Description

The visualization draws a horizontal line (1) for each peptide, colored by the peptide's bioactive function (2). The purpose of the plot is to give an overview of all peptides associated with the protein. Lines that wrap around the edge of the image are marked with an arrow (3). For each amino acid "X", the longest peptide that starts at "X" is shown on top, and peptide length decreases from top to bottom.
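The stacking rule above can be approximated with greedy interval partitioning. The sketch below is an illustration under stated assumptions, not the tool's own code: `stack_peptides` is a hypothetical helper, each peptide is located at its first occurrence in the sequence, and peptides are placed in the first row where they do not overlap. Sorting by start position and then by descending length guarantees that, for a given start amino acid, the longest peptide ends up highest and shorter ones land in lower rows.

```python
def stack_peptides(sequence, peptides):
    """Assign peptides to non-overlapping rows below the sequence.

    Hypothetical helper: returns a list of rows, each a list of
    (0-based start, peptide) tuples. Only the first occurrence of a
    peptide in the sequence is used; peptides not found are skipped.
    """
    # Locate each peptide as a half-open interval [start, end).
    placed = []
    for pep in peptides:
        start = sequence.find(pep)
        if start != -1:
            placed.append((start, start + len(pep), pep))

    # Sort by start position; for equal starts, longest peptide first.
    placed.sort(key=lambda p: (p[0], -(p[1] - p[0])))

    rows = []      # rows[i]: peptides drawn on row i (row 0 is the top)
    row_ends = []  # row_ends[i]: end of the last interval placed in row i
    for start, end, pep in placed:
        for i, row_end in enumerate(row_ends):
            if row_end <= start:  # fits after the last peptide in this row
                rows[i].append((start, pep))
                row_ends[i] = end
                break
        else:
            # No existing row has room: open a new row below the others.
            rows.append([(start, pep)])
            row_ends.append(end)
    return rows
```

With a made-up toy sequence, `stack_peptides("MKLLILTCLVAV", ["KLLIL", "KLL", "LILTC", "TCLV"])` puts `KLLIL` above `KLL` (both start at position 1), and `TCLV` reuses the top row because it starts where `KLLIL` ends.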

The plot can currently be exported as an SVG image (4). The export contains only the plot, without the legend or the controls.

Interaction

Choose a protein (5): Select a protein from the unique values of "Entry" in the protein sequences dataset. The sequence is built and the peptides are stacked based on this choice.

Length of signal peptide (6): The number of amino acids to hide from the beginning (N-terminus) of the sequence, e.g., to omit the signal peptide.

Max axis length (7): The maximum number of amino acids shown per axis. If set to 0, the number is calculated from the available screen width. This lets users fit the plot to their horizontal space limits.
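The two numeric controls can be illustrated with a short sketch of how a sequence might be trimmed and split into axes. This is an assumption-based illustration, not the tool's layout code: `wrap_sequence` and `crosses_axis` are hypothetical helpers, and `available_width_px` and `px_per_residue` are made-up stand-ins for however the real implementation measures screen width.

```python
def wrap_sequence(sequence, signal_peptide_len=0, max_axis_len=0,
                  available_width_px=800, px_per_residue=10):
    """Split the displayed part of a sequence into axes (rows).

    signal_peptide_len: residues hidden from the start, as with control (6).
    max_axis_len: residues per axis, as with control (7); 0 means derive it
    from the available width. The width parameters are hypothetical.
    """
    shown = sequence[signal_peptide_len:]
    if max_axis_len == 0:
        # Assumed rule: fit as many fixed-width residues as the width allows.
        max_axis_len = max(1, available_width_px // px_per_residue)
    return [shown[i:i + max_axis_len] for i in range(0, len(shown), max_axis_len)]


def crosses_axis(start, length, axis_len):
    """True if a peptide (0-based start, length >= 1) wraps onto the next axis,
    i.e. the case that would be drawn with an arrow mark."""
    return start // axis_len != (start + length - 1) // axis_len
```

For example, `wrap_sequence("ABCDEFGHIJ", signal_peptide_len=2, max_axis_len=3)` hides `AB` and returns the axes `CDE`, `FGH` and `IJ`.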

Design Choices

Color Scheme

The plot uses the categorical color scheme given in Figure 4 of the "Qualitative Color Schemes" section of Paul Tol's notes. We chose this scheme because it was designed using mathematical descriptions of color differences and of the two main types of color-blind vision. Our only change is darkening the grey from #DDDDDD to #BBBBBB to improve printability.

Currently, each bioactive function maps to the same color as shown in the figure, regardless of its order in the input dataset. The color code for each bioactive function is listed in the table below.

	Bioactive Function	Color Hexcode
	ACE-inhibitory	#CC6677
	Antimicrobial	#DDCC77
	Antioxidant	#117733
	DPP-IV Inhibitor	#88CCEE
	Opioid	#999933
	immunomodulatory	#44AA99
	Anticancer	#AA4499
	Others	#BBBBBB
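A fixed lookup table is what makes colors independent of the order of functions in the input. The sketch below encodes the table above as a dictionary; `FUNCTION_COLORS` and `color_for` are hypothetical names, and the fallback of unlisted labels to the "Others" grey is an assumption rather than documented behavior. Labels are matched exactly, including case.

```python
# Color codes copied from the table above (Paul Tol's qualitative scheme,
# with the grey darkened to #BBBBBB).
FUNCTION_COLORS = {
    "ACE-inhibitory": "#CC6677",
    "Antimicrobial": "#DDCC77",
    "Antioxidant": "#117733",
    "DPP-IV Inhibitor": "#88CCEE",
    "Opioid": "#999933",
    "immunomodulatory": "#44AA99",
    "Anticancer": "#AA4499",
    "Others": "#BBBBBB",
}


def color_for(function):
    """Return the hex color for a bioactive function label.

    Assumption: labels missing from the table fall back to "Others".
    """
    return FUNCTION_COLORS.get(function, FUNCTION_COLORS["Others"])
```

Because the lookup depends only on the label, `color_for("Antioxidant")` returns `#117733` no matter where antioxidant peptides appear in the uploaded file.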

Structure of input data

Two datasets must be uploaded in .csv format, in fields (8) and (9). Their structure is as follows:

Protein Sequences
Should contain two columns: the protein ID ("Entry") and the protein sequence ("Sequence"). Each protein ID should be unique and should match an entry in the UniProt database. The sequence string should contain no spaces or line breaks. An example is shown below:

Example for protein sequences
	Entry	Sequence
	P02663	MKFFIFTCL...
	P02662	MKLLILTCL...
	...	...
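A quick pre-upload check can catch the most common problems with this file. The sketch below is a hypothetical helper, not part of the software: `load_protein_sequences` parses the CSV text with Python's standard `csv` module and enforces the two stated rules, unique "Entry" values and sequences without whitespace. It does not query UniProt. The keys of the returned dictionary correspond to the choices offered by control (5).

```python
import csv
import io


def load_protein_sequences(csv_text):
    """Parse the protein sequences CSV into {Entry: Sequence}.

    Raises ValueError on a duplicate Entry or on whitespace
    (spaces, line breaks) inside a sequence.
    """
    proteins = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry, seq = row["Entry"], row["Sequence"]
        if entry in proteins:
            raise ValueError(f"duplicate Entry: {entry}")
        if any(ch.isspace() for ch in seq):
            raise ValueError(f"whitespace in sequence for {entry}")
        proteins[entry] = seq
    return proteins
```

To check a file on disk instead, read it first, e.g. `load_protein_sequences(open(path, newline="").read())`.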

Peptides
Should contain three columns: the protein ID ("proteinID"), the peptide sequence ("peptide"), and the bioactive function of the peptide ("function"). Each protein ID should match exactly one entry in the protein sequences dataset.

Example for peptides
proteinID	peptide	function
P02663	TKVIPYVRYL	Antimicrobial
P02662	FFVAP	ACE-inhibitory
...	...	...
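The peptides file can be checked against the protein sequences in the same spirit. In the sketch below, `load_peptides` is a hypothetical helper that assumes the column headers of the example table (`proteinID`, `peptide`, `function`). Because "Entry" values are unique, requiring each `proteinID` to be present in the set of entries is equivalent to requiring that it match exactly one entry.

```python
import csv
import io
from collections import defaultdict


def load_peptides(csv_text, protein_entries):
    """Group peptides by protein as lists of (peptide, function) pairs.

    protein_entries: set of "Entry" values from the protein sequences
    dataset. Raises ValueError if a proteinID has no matching Entry.
    """
    by_protein = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        pid = row["proteinID"]
        if pid not in protein_entries:
            raise ValueError(f"proteinID {pid} has no matching Entry")
        by_protein[pid].append((row["peptide"], row["function"]))
    return dict(by_protein)
```

Using the two example rows above with the entries `{"P02663", "P02662"}` yields one peptide per protein, each paired with its function label.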

Citation in BibTeX

To cite this article, please use the following BibTeX entry in your citation manager:

@article{doi:10.1080/10408398.2023.2240396,
  author    = {Søren Drud-Heydary Nielsen and Ningjian Liang and Harith Rathish and Bum Jin Kim and Jiraporn Lueangsakulthai and Jeewon Koh and Yunyao Qu and Hans-Jörg Schulz and David C. Dallas},
  title     = {Bioactive milk peptides: an updated comprehensive overview and database},
  journal   = {Critical Reviews in Food Science and Nutrition},
  volume    = {0},
  number    = {0},
  pages     = {1-20},
  year      = {2023},
  publisher = {Taylor & Francis},
  doi       = {10.1080/10408398.2023.2240396},
  note      = {PMID: 37504497},
  URL       = {https://doi.org/10.1080/10408398.2023.2240396},
  eprint    = {https://doi.org/10.1080/10408398.2023.2240396}
}