Citation
Periyakoil, Preethi Kasthuri (2018) Utilizing Machine Learning Techniques to Rapidly Identify MUC2 Expression in Colon Cancer Tissues. Senior thesis (Major), California Institute of Technology. doi:10.7907/sapn-r691. https://resolver.caltech.edu/CaltechTHESIS:08232018-131754063
Abstract
Colorectal cancer is the third most common form of cancer among American men and women. Like most tumors, colon cancer is sustained by a subpopulation of “stem cells” that can self-renew and differentiate into more specialized cell types. Detecting stem cells in images of colon cancer tissue would be useful, but doing so first requires knowing which genes the stem cells express and how to detect their expression patterns in tissue images. Machine learning (ML) is widely used in biological research to facilitate rapid diagnosis of cancer. The current study demonstrates the feasibility and effectiveness of using ML techniques to rapidly detect expression of the gene MUC2 (mucin 2) in colon cancer tissue images. We analyzed histological images of colon cancer and segmented the nuclei to look for features (area, perimeter, eccentricity, compactness, etc.) that correlate with high or low levels of MUC2. Grid search was then run on this data set to tune the hyperparameters, and the following models were tested as potential classifiers: random forest, gradient boosting, decision trees with AdaBoost, and support vector machines. Of the tested models, the random forest classifier (F1 score of 0.71) and the gradient boosting classifier (F1 score of 0.72) predicted the output label most accurately. Under certain conditions, we identified four features that have predictive capabilities. Predicting individual gene expression with machine learning is the first step in detecting genes that are specific to cancer stem cells in the early stages of cancer, while there is still hope for a cure.
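The abstract compares classifiers by their F1 scores. As a minimal sketch of how such a score is computed for a binary high/low MUC2 label, the example below uses made-up labels. The function name `f1_score`, the 1 = high / 0 = low encoding, and the data are assumptions for illustration only. They are not taken from the thesis, which may have used a different averaging scheme.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score for the class labelled `positive`.

    F1 is the harmonic mean of precision and recall.
    """
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: F1 is conventionally 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (assumption): 1 = high MUC2, 0 = low MUC2.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

Here three of the four high-MUC2 samples are recovered and one low-MUC2 sample is mislabelled as high, so precision and recall are both 3/4. Because F1 balances the two, a classifier cannot score well simply by predicting the more common label.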
Item Type: | Thesis (Senior thesis (Major)) |
---|---|
Subject Keywords: | machine learning, biology, medicine, cancer, colon, AI, ML, computer science |
Degree Grantor: | California Institute of Technology |
Division: | Engineering and Applied Science |
Major Option: | Computer Science |
Awards: | Library Friends Senior Thesis Prize Finalist, 2018. |
Thesis Availability: | Public (worldwide access) |
Research Advisor(s): | |
Group: | Senior Undergraduate Thesis Prize |
Thesis Committee: | |
Defense Date: | 19 March 2018 |
Record Number: | CaltechTHESIS:08232018-131754063 |
Persistent URL: | https://resolver.caltech.edu/CaltechTHESIS:08232018-131754063 |
DOI: | 10.7907/sapn-r691 |
Default Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
ID Code: | 11159 |
Collection: | CaltechTHESIS |
Deposited By: | Preethi Periyakoil |
Deposited On: | 24 Aug 2018 17:48 |
Last Modified: | 02 Aug 2022 21:39 |
Thesis Files
PDF (Final Version), 427kB. See Usage Policy.