Protein domains are fundamental units of organization and the building blocks of larger proteins. Accurate delineation of domain boundaries is not only of theoretical interest but also of great practical importance.
Motivated by the excellent data representation of deep learning, we proposed an approach, called DNN-Dom, that combines Convolutional Neural Network (CNN) and Gated Recurrent Unit (GRU, a concrete implementation of the Recurrent Neural Network (RNN)) models for domain boundary prediction.
There are 3 steps in DNN-Dom:
(a) Hybrid Deep Learning. Several feature types, including position-specific scoring matrix (PSSM), 3-state secondary structure (SS), solvent accessibility (SA) and amino acid (AA) features, are used as input to train the hybrid deep learning model, which consists of multiscale CNN layers with different kernel sizes and stacked bidirectional gated recurrent units (BGRUs). The outputs of two fully connected layers are used as deep features.
(b) Parallel Balanced Random Forests (p-BRF) for imbalanced big data classification. First, balanced Random Forests are trained by feeding each tree of the RF with balanced training samples, in which boundary samples are retained and non-boundary samples are randomly under-sampled. Second, all trees in the RF are trained simultaneously; this parallel training is achieved through CPU multithreading. Lastly, the probability score from p-BRF is adopted for decision-making, and a threshold is selected by maximizing the Matthews Correlation Coefficient (MCC).
(c) Classifying new samples. For a new sequence, the shallow features of its residues are transformed into deep features by the hybrid deep learning model. The resulting deep features are fed into the trained p-BRF. A residue whose probability score is greater than the threshold is assigned to a domain boundary; otherwise it is classified as non-boundary.
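The multiscale convolution in step (a) can be illustrated with a minimal pure-Python sketch. It is not the DNN-Dom implementation: the kernel sizes (3, 5, 7), the averaging kernels and the function names are assumptions made for illustration, and a real model would learn its kernel weights and operate on multi-channel input rather than a single scalar track. What it shows is that kernels of different widths, combined with zero padding, each yield one value per residue, so their outputs can be concatenated position by position before being passed to the recurrent layers.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel length so the window is centred on each residue.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def multiscale_features(signal, kernel_sizes=(3, 5, 7)):
    """Return one feature vector per residue, with one entry per kernel size.

    Averaging kernels stand in for learned weights (an assumption).
    """
    tracks = [conv1d_same(signal, [1.0 / k] * k) for k in kernel_sizes]
    # Transpose: from one track per kernel to one vector per residue.
    return [list(values) for values in zip(*tracks)]


# Hypothetical scalar feature for a 6-residue sequence.
profile = [0.0, 0.0, 3.0, 3.0, 0.0, 0.0]
features = multiscale_features(profile)
```

Each entry of `features` corresponds to one residue and holds the responses of the narrow, medium and wide kernels, which is the sense in which the representation is multiscale.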
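Steps (b) and (c) can be sketched with the Python standard library. The code below is an illustration under stated assumptions, not the released DNN-Dom code: the function names, the 1:1 class ratio, the candidate threshold grid and the toy scores are all hypothetical, and tree fitting is omitted. It shows how a balanced sample might be drawn for each tree (boundary samples kept, non-boundary samples randomly under-sampled), how per-tree work can be dispatched to a thread pool, how a threshold could be chosen by maximizing MCC, and how that threshold labels residues.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def balanced_sample(labels, seed):
    """Indices for one tree: every boundary sample (label 1) plus an equally
    sized random subset of non-boundary samples (label 0)."""
    rng = random.Random(seed)  # per-tree generator, so threads share no state
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return pos + rng.sample(neg, min(len(pos), len(neg)))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def classify(scores, threshold):
    """Label a residue as boundary (1) when its score exceeds the threshold."""
    return [1 if s > threshold else 0 for s in scores]


def best_threshold(y_true, scores, candidates):
    """Candidate threshold with the highest MCC on labelled data."""
    return max(candidates, key=lambda t: mcc(y_true, classify(scores, t)))


# Toy data: 2 boundary residues among 10 (hypothetical labels and scores).
labels = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
scores = [0.1, 0.2, 0.4, 0.8, 0.3, 0.1, 0.6, 0.9, 0.2, 0.1]

# One balanced index set per tree, drawn concurrently. In a full pipeline the
# worker would also fit the tree; that part is omitted here.
with ThreadPoolExecutor(max_workers=4) as pool:
    per_tree = list(pool.map(lambda s: balanced_sample(labels, s), range(5)))

threshold = best_threshold(labels, scores, [i / 10 for i in range(1, 10)])
predicted = classify(scores, threshold)
```

Note that in standard CPython, threads do not execute pure-Python code in parallel, so a real speed-up would need a process pool or a tree learner that releases the interpreter lock; the thread pool here only mirrors the structure described in step (b).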
The server allows users to upload a sequence and obtain the predicted domain boundaries.
We have released the DNN-Dom code on GitHub at https://github.com/shiqiang-sq/DNN-Dom.
1. Qiang Shi, Weiya Chen, Siqi Huang, Fanglin Jin, Yinghao Dong, Yan Wang, Zhidong Xue. DNN-Dom: predicting protein domain boundary from sequence alone by deep neural network. Bioinformatics, Volume 35, Issue 24, 15 December 2019, Pages 5128–5136, https://doi.org/10.1093/bioinformatics/btz464.
2. Source code: https://github.com/shiqiang-sq/DNN-Dom.