The CAMDA’03 data sets focus on lung cancer. Four microarray data sets were released as the CAMDA’03 data challenge package. All four (Boston, Michigan, Ontario, and Stanford) were acquired independently to address the same questions in lung cancer biology. This year's challenge is to integrate information from different data sets, so you are encouraged to submit an analysis based on at least two of the four. There may be obstacles to integrating the data, however. For example, the different sets may not have enough clones in common to support interpretation of the results. Discussion of cross-platform problems and their solutions is encouraged. We emphasize that the final goal of the analysis is to make an impact on cancer biology and, eventually, patient care, so the biological relevance of your methodology is critical. We would like to see papers that analyze the biology of lung cancer in depth. In particular, we welcome methodology development for survival analysis using microarrays for cancer prognostics (Bioinformatics 18: S120, 2002).
***Please note that some of the raw image files are not available in the initial release. We will notify everyone who downloaded this partial data set as soon as we acquire the raw data.
The four CAMDA’03 data sets are Boston, Michigan, Ontario, and Stanford.
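One obstacle to integration mentioned above is that the data sets may share too few clones. Before attempting a combined analysis, it can help to measure that overlap directly. The following is a minimal sketch, assuming each data set's clones have already been mapped to a common identifier; the function names and the toy identifiers are hypothetical and do not come from the CAMDA files.

```python
from itertools import combinations


def shared_identifiers(datasets):
    """Return the identifiers that appear in every data set."""
    id_sets = [set(ids) for ids in datasets.values()]
    return set.intersection(*id_sets) if id_sets else set()


def pairwise_overlap(datasets):
    """Count the identifiers shared by each pair of data sets."""
    return {
        (a, b): len(set(datasets[a]) & set(datasets[b]))
        for a, b in combinations(sorted(datasets), 2)
    }


# Toy identifiers for illustration only; real ones would come from
# each platform's own annotation after mapping to a common key.
example = {
    "Boston": ["GENE_A", "GENE_B", "GENE_C"],
    "Michigan": ["GENE_B", "GENE_C", "GENE_D"],
    "Stanford": ["GENE_C", "GENE_E"],
}

common = shared_identifiers(example)
overlaps = pairwise_overlap(example)
```

A small pairwise count relative to the size of either platform would suggest that results from the combined set need cautious interpretation.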
Datasets: (pdf format)
- Harvard Lung Cancer Dataset
- Bhattacharjee, A, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. PNAS. 98 (24), 13790-13795, November 2001.
- Michigan Lung Cancer Dataset
- Beer, D, et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Medicine 8 (8), 816-824, 2002.
- Stanford Lung Cancer Dataset
- Garber, ME, et al. Diversity of gene expression in adenocarcinoma of the lung. PNAS. 98 (24), 13784-13789, November 2001.
- Ontario Lung Cancer Dataset
- Wigle, D, et al. Molecular profiling of non-small cell lung cancer and correlation with disease-free survival. Cancer Research, 62 (11): 3005-3008, June 2002.
Downloadable Data: (zip format) (ftp.camda.duke.edu/CAMDA03_DATASETS/)
Note: only 9 concurrent downloads are allowed. Web browsers tend to open multiple connections when using FTP, so we strongly encourage you to download one file at a time with a command-line ftp client during off-peak hours (between 6 PM and 8 AM, or any time on weekends).
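For those who prefer a script to an interactive ftp session, the one-file-at-a-time advice above can be sketched with Python's standard ftplib module. This is only an illustration: the host and directory are taken from the download link above, while anonymous login, the local directory name "camda03", and the helper names are assumptions. Note also that the directory listing may include subdirectories, which this sketch does not handle.

```python
import os
from ftplib import FTP

HOST = "ftp.camda.duke.edu"        # from the download link above
REMOTE_DIR = "/CAMDA03_DATASETS/"  # from the download link above


def download_sequentially(ftp, remote_dir, local_dir, names=None):
    """Fetch files one after another over a single logged-in connection."""
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    if names is None:
        # Take whatever the server lists; no file names are assumed.
        names = ftp.nlst()
    saved = []
    for name in names:
        target = os.path.join(local_dir, os.path.basename(name))
        with open(target, "wb") as handle:
            # Each transfer finishes before the next begins, so only
            # one download slot on the server is in use at a time.
            ftp.retrbinary("RETR " + name, handle.write)
        saved.append(target)
    return saved


def main():
    # Assumption: the server accepts anonymous login.
    with FTP(HOST) as ftp:
        ftp.login()
        download_sequentially(ftp, REMOTE_DIR, "camda03")
```

Calling main() during off-peak hours would attempt the transfer; passing an explicit list as names restricts it to selected files.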
CAMDA 2003 Data Set Policies:
Because CAMDA is a competition, we are unable to answer questions about the content or format of the data files. Part of the technical challenge of the competition lies in decoding the raw data, in addition to processing it. To facilitate this process, we have set up a public mailing list for discussion of the data sets, accessible at http://groups.yahoo.com/group/camdadata/. To be fair to all contestants, we will not reply to individuals. However, contestants are encouraged to share ideas and questions on the public mailing list. We hope the list will become a forum for discussing the technical aspects of the data sets and will move the analysis forward. We will answer questions posted to the mailing list at our own discretion.