Sequence Data - Polya Signal

Publication:

Huiqing Liu, et al. "An In-silico Method for Prediction of Polyadenylation Signals in Human Sequences". Accepted by the 14th International Conference on Genome Informatics (GIW 2003), Pacifico Yokohama, Japan, December 14-17, 2003

Raw Data:

Polya-SourceData.zip

Description:

This data set is converted from sequence data and is intended for predicting polyadenylation signals (PAS) in human sequences. The original data was first used in "Sequence Determinants in Human Polyadenylation Site Selection", BMC Genomics, 4(1):7, 2003. The data set contains one group of training data (2327 true PAS) and 5 groups of testing data, each consisting of 982 samples. Of these 5 test sets, one contains true PAS and the other four contain only false PAS. Using a feature generation technique similar to the one described in the TIS data section, we construct the feature space from 1-gram, 2-gram and 3-gram nucleotide patterns. There are 168 features in total.
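As a rough illustration of k-gram feature generation, the following Python sketch counts every 1-gram, 2-gram and 3-gram over the alphabet A, C, G, T in a sequence window, which gives 4 + 16 + 64 = 84 patterns per window. It assumes, without confirmation from this page, that the 168 features come from applying the same 84 patterns to two windows (here called upstream and downstream of the candidate signal) and that feature values are raw occurrence counts. The function names are hypothetical and are not part of the released data or tools.

```python
from itertools import product

ALPHABET = "ACGT"  # nucleotide alphabet


def kgram_vocabulary(max_k=3):
    """List all k-grams for k = 1..max_k, in a fixed order (84 for max_k=3)."""
    return ["".join(p)
            for k in range(1, max_k + 1)
            for p in product(ALPHABET, repeat=k)]


def kgram_counts(sequence, max_k=3):
    """Count overlapping occurrences of each k-gram in one sequence window."""
    seq = sequence.upper()
    counts = {gram: 0 for gram in kgram_vocabulary(max_k)}
    for k in range(1, max_k + 1):
        for i in range(len(seq) - k + 1):
            gram = seq[i:i + k]
            if gram in counts:  # skip grams containing N or other symbols
                counts[gram] += 1
    return counts


def window_features(upstream, downstream):
    """Concatenate counts from two windows (assumed layout: 84 + 84 = 168)."""
    vocab = kgram_vocabulary()
    up = kgram_counts(upstream)
    down = kgram_counts(downstream)
    return [up[g] for g in vocab] + [down[g] for g in vocab]
```

Calling `window_features(upstream, downstream)` on two nucleotide strings returns a list of 168 integers in a fixed pattern order. The actual feature definitions and window sizes used for the released files may differ from this sketch.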

Download:

Our transformed .data and .names format files are available here.
