To examine the potential of proteotyping for breast cancer classification, we next constructed a decision tree to classify the 96 tumors into the five conventional subtypes based on their proteotypes. We first selected the most differentially abundant proteins (log2FC > 1.5; FDR-adjusted p < 0.05) from the following comparisons: ER+ versus ER− (8 proteins); grade 3 versus grade 1 (2 proteins); HER2+ versus HER2− (2 proteins); luminal B versus luminal A (3 proteins); luminal B HER2+ versus luminal A (3 proteins); HER2-enriched versus luminal A (7 proteins); triple-negative versus luminal A (5 proteins); and HER2-enriched versus luminal B (2 proteins). This procedure resulted in a list of 22 key proteins (partially overlapping among the comparisons). Next, we applied a recursive partitioning algorithm for continuous data in a conditional inference framework (Hothorn et al., 2006). The algorithm automatically selected discriminant proteins from the protein list and provided their quantitative thresholds as well as the structure of the decision tree. The resulting decision tree has three key nodes (Figure 3A), representing three key proteins: type II inositol 3,4-bisphosphate 4-phosphatase (INPP4B); cyclin-dependent kinase 1 (CDK1); and receptor tyrosine-protein kinase erbB-2 (ERBB2). Importantly, the differential abundance of the selected proteins reflects the key clinical parameters that define breast cancer subtypes: ER status (INPP4B; Figure 3B); tumor grade (CDK1; Figure 3C); and HER2 status (ERBB2; Figure 3D). Furthermore, the proteotype-based decision tree assigned 84% of the tumors to their diagnosed conventional subtypes (Figure 3A).
Figure 2. Enrichment of Pathways and Functional Classes among Differentially Abundant Proteins in Different Tumor Phenotypes
Pathway enrichment was performed by gene set enrichment analysis (GSEA) on lists of proteins ranked by fold change in four comparisons: ER status; HER2 status; tumor grade; and lymph node status. Only pathways enriched in the positive phenotype are shown, i.e., ER positivity, high tumor grade, HER2 positivity, and lymph node positivity. Pathways significant at α = 0.1 are displayed and ordered by p value.
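The two-step procedure described above (preselection of differentially abundant proteins, then recursive partitioning in a conditional inference framework) can be illustrated with a minimal, hypothetical Python sketch. Several points are assumptions, not details from the study: the fold-change cut is applied to |log2FC| (the text does not state the direction), FDR control uses Benjamini-Hochberg, the association statistic is a between-class sum of squares tested by label permutation, and the cut point is chosen by a Pearson chi-square criterion. The conditional inference trees of Hothorn et al. (2006) are implemented in the R packages party and partykit (`ctree`); this sketch is not that implementation. It only reproduces the core idea of separating variable selection (a multiplicity-adjusted test, which also acts as the stopping rule) from split-point selection, and it builds only the root node; applying `root_split` recursively to each child would grow a full tree. All function and protein names are illustrative.

```python
import random
from statistics import mean


def bh_adjust(pvals):
    """Benjamini-Hochberg FDR adjustment; returns adjusted p-values in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def preselect(log2fc, pvals, fc_cut=1.5, fdr_cut=0.05):
    """Keep proteins with |log2FC| > fc_cut and BH-adjusted p < fdr_cut (assumed two-sided cut)."""
    names = list(pvals)
    adjusted = bh_adjust([pvals[n] for n in names])
    return [n for n, q in zip(names, adjusted) if abs(log2fc[n]) > fc_cut and q < fdr_cut]


def between_class_ss(x, y):
    """Association statistic: between-class sum of squares of protein level x given labels y."""
    grand = mean(x)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    return sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())


def perm_pvalue(x, y, n_perm, rng):
    """Permutation p-value for the association between x and the class labels y."""
    observed = between_class_ss(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_class_ss(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def chi2_split(x, y, t):
    """Pearson chi-square of the (x <= t, x > t) by class contingency table."""
    n = len(y)
    stat = 0.0
    for side in ([yi for xi, yi in zip(x, y) if xi <= t],
                 [yi for xi, yi in zip(x, y) if xi > t]):
        for c in sorted(set(y)):
            expected = len(side) * y.count(c) / n
            if expected > 0:
                stat += (side.count(c) - expected) ** 2 / expected
    return stat


def root_split(data, y, alpha=0.05, n_perm=999, seed=0):
    """Step 1: pick the protein most associated with y (Bonferroni-adjusted permutation test).
    Step 2: pick its cut point by the chi-square criterion. Returns None if nothing is significant."""
    rng = random.Random(seed)
    pvals = {name: perm_pvalue(x, y, n_perm, rng) for name, x in data.items()}
    best = min(pvals, key=pvals.get)
    if min(1.0, pvals[best] * len(pvals)) > alpha:
        return None  # stopping rule: no significant association, so the node becomes a leaf
    x = data[best]
    cuts = sorted(set(x))[:-1]  # a cut at the maximum would leave the right side empty
    threshold = max(cuts, key=lambda t: chi2_split(x, y, t))
    return best, threshold


# Toy data (hypothetical): PROT_A separates the two classes, PROT_B is noise.
y = ["ER+"] * 6 + ["ER-"] * 6
data = {
    "PROT_A": [5.1, 5.3, 4.9, 5.2, 5.0, 5.4, 1.0, 1.2, 0.9, 1.1, 1.3, 0.8],
    "PROT_B": [2.0, 3.1, 2.5, 2.9, 2.2, 2.7, 2.6, 2.1, 3.0, 2.4, 2.8, 2.3],
}
print(root_split(data, y))  # ('PROT_A', 1.3)
```

Separating the test that chooses the variable from the search over cut points is what distinguishes conditional inference trees from exhaustive-search methods such as CART, which tend to favor variables offering many candidate splits.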
Validation of the Three Key Proteins Selected by the Decision Tree
We next asked whether the changes in the levels of the three key proteins from the decision tree, INPP4B, CDK1, and ERBB2, have general discriminative potential and biological validity beyond our 96-patient dataset. Analysis of a published proteomic dataset of 60 human tumor cell lines (http://proteomics.wzw.tum.de/nci60) confirmed high levels of INPP4B protein in ER+ breast cancer cell lines (MCF-7 and T47D), whereas no INPP4B protein was found in ER− breast cancer cell lines (MDA-MB-231, MDA-MB-468, BT549, and HS 578T), supporting the link between INPP4B and ER status. CDK1 and ERBB2 proteins were not covered in this reference dataset. We further compared our protein-level data with gene expression data from five published microarray datasets (883 patients; Figure S3; Haibe-Kains et al., 2012) and a published RNA sequencing dataset (1,078 patients) from The Cancer Genome Atlas (TCGA) (https://portal.gdc.cancer.gov). This analysis confirmed the association of INPP4B with ER status, CDK1 with tumor grade, and ERBB2 with HER2 status (Figures 4 and S4). Furthermore, gene expression of INPP4B, CDK1, and ERBB2 was significantly associated with patient survival, in the same manner as the commonly used reference genes ESR1 (for ER status) and MKI67 (for tumor grade or proliferation; Figure S5).
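The text does not state how the association between gene expression and survival was tested (the results are shown in Figure S5). A common approach, assumed here, is to split patients at the median expression of a gene and compare the survival of the high- and low-expression groups with a two-group log-rank test. The following is a minimal, hypothetical Python sketch of that assumed analysis; the median cut-off and the function names are illustrative, not taken from the study.

```python
import math
from statistics import median


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times: follow-up times; events: 1 = event observed (e.g. death), 0 = censored.
    Returns (chi-square statistic, p-value from a chi-square distribution with 1 df).
    """
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)]
            + [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_minus_expected = 0.0  # for group A, summed over distinct event times
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e == 1}):
        at_risk = [(tt, e, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in at_risk if tt == t and e == 1)
        d_a = sum(1 for tt, e, g in at_risk if tt == t and e == 1 and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of d_a at this event time
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def survival_by_expression(expression, times, events):
    """Split patients at the median expression (assumed cut-off) and compare high vs. low."""
    cut = median(expression)
    high = [i for i, x in enumerate(expression) if x > cut]
    low = [i for i, x in enumerate(expression) if x <= cut]
    return logrank_test([times[i] for i in high], [events[i] for i in high],
                        [times[i] for i in low], [events[i] for i in low])


# Toy example (hypothetical data): high expressers all have events earlier than low expressers.
chi2, p = survival_by_expression([1, 2, 3, 4, 5, 6], [4, 5, 6, 1, 2, 3], [1] * 6)
print(chi2, p)  # chi-square is about 5.05 for this toy data
```

A Cox proportional hazards model on continuous expression is a frequently used alternative that avoids choosing a cut-off.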
Higher Level of ERBB2 in ER−/HER2+ versus ER+/HER2+ Tumors
Figure 3. Classification of Breast Cancer Patients Based on Protein Levels in Tumor Tissue
(A) Decision tree classification. The top panel shows the decision tree generated from 22 proteins selected from the proteotypes of 96 patients (see Data S1A and S1B for details). The bar plots (bottom part) show the number of patients whose classification by the protein-based decision tree coincides with the conventional subtype classification.
An interesting feature of our decision tree is that the algorithm distinguished between two HER2+ subtypes based on ERBB2 protein levels: whereas lower levels of ERBB2 protein appear to be associated with ER+/HER2+ grade 3 tumors, higher levels were found in ER−/HER2+ grade 3 tumors (Figure 3A). To test whether this observation holds more generally, we manually validated the SWATH-MS-based protein quantification and performed independent analyses at both the protein and transcript levels. Transcript-level analysis of the same 96 tumor samples described in this study (Bouchal et al., 2015), transcript-level analysis in four additional datasets of