Leukemia (St. Jude data)

Publication:

Eng-Juh Yeoh, et al. "Classification, Subtype Discovery, and Prediction of Outcome in Pediatric Acute Lymphoblastic Leukemia by Gene Expression Profiling". Cancer Cell, 1:133-143, March 2002.

Raw Data:

http://www.stjuderesearch.org/data/ALL1/

Description:

This study classifies subtypes of pediatric acute lymphoblastic leukemia (ALL). The samples are divided into six diagnostic groups (BCR-ABL, E2A-PBX1, Hyperdiploid>50, MLL, T-ALL and TEL-AML1), plus a seventh group, labelled "Others", that contains the diagnostic samples that did not fit into any of the six. Each sample is described by 12558 genes. According to the above publication, the samples in each group were randomly split into training and testing parts. The table below lists the number of training and testing samples in each group.

Group (Class)      Number of Training Samples    Number of Testing Samples
BCR-ABL            9                             6
E2A-PBX1           18                            9
Hyperdiploid>50    42                            22
MLL                14                            6
T-ALL              28                            15
TEL-AML1           52                            27
Others             52                            27
Total              215                           112
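
As a quick sanity check on the table above, the per-class counts can be summed and compared with the stated totals. The sketch below is illustrative only: the dictionary simply restates the numbers from the table, and the variable names (`counts`, `train_share`) are our own choice, not part of the dataset.

```python
# Per-class sample counts copied from the table above: (training, testing).
counts = {
    "BCR-ABL": (9, 6),
    "E2A-PBX1": (18, 9),
    "Hyperdiploid>50": (42, 22),
    "MLL": (14, 6),
    "T-ALL": (28, 15),
    "TEL-AML1": (52, 27),
    "Others": (52, 27),
}

train_total = sum(train for train, _ in counts.values())
test_total = sum(test for _, test in counts.values())

# Share of each class in the training set, to make the class imbalance visible.
train_share = {name: train / train_total for name, (train, _) in counts.items()}

if __name__ == "__main__":
    print(f"training total: {train_total}, testing total: {test_total}")
    for name, share in sorted(train_share.items(), key=lambda kv: kv[1]):
        print(f"{name:<16} {share:6.1%} of training samples")
```

The totals agree with the last row of the table (215 training and 112 testing samples). The shares also show that the classes are unbalanced: BCR-ABL contributes only 9 of the 215 training samples, which is worth keeping in mind when evaluating a classifier on this data.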


Note that the .names file we provide contains all seven class labels.

Download:

Our transformed .data and .names format files are available here.
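
A minimal sketch of how a .data file might be read is given below. It assumes the common C4.5-style layout: one sample per line, comma-separated values, the class label in the last field, `?` for a missing value, and `|` starting a comment line. These layout details are assumptions and should be checked against the downloaded files. The function name `parse_c45_data` and the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def parse_c45_data(lines):
    """Parse C4.5-style rows (assumed layout): comma-separated values, class label last."""
    features, labels = [], []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and lines that are C4.5 comments.
        if not line or line.startswith("|"):
            continue
        *values, label = [field.strip() for field in line.split(",")]
        # Treat "?" as a missing value; drop an optional trailing period on the label.
        features.append([float("nan") if v == "?" else float(v) for v in values])
        labels.append(label.rstrip("."))
    return features, labels


if __name__ == "__main__":
    # Synthetic rows for illustration only; real rows would have 12558 values each.
    sample_rows = ["1.0, 2.5, T-ALL", "| a comment line", "3.0, ?, MLL"]
    features, labels = parse_c45_data(sample_rows)
    print(Counter(labels))
```

With a real file, the same function could be called on an open file handle, for example `parse_c45_data(open(path))`, where `path` points to the downloaded training or testing .data file. The resulting label counts can then be compared with the table above.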
