CaltechTHESIS
  A Caltech Library Service

Utilizing Machine Learning Techniques to Rapidly Identify MUC2 Expression in Colon Cancer Tissues

Citation

Periyakoil, Preethi Kasthuri (2018) Utilizing Machine Learning Techniques to Rapidly Identify MUC2 Expression in Colon Cancer Tissues. Senior thesis (Major), California Institute of Technology. doi:10.7907/sapn-r691. https://resolver.caltech.edu/CaltechTHESIS:08232018-131754063

Abstract

Colorectal cancer is the third most common form of cancer among American men and women. Like most tumors, colon cancer is sustained by a subpopulation of “stem cells” that can self-renew and differentiate into more specialized cell types. Detecting these stem cells in images of colon cancer tissue would be useful, but doing so first requires knowing which genes the stem cells express and how to detect their expression patterns from tissue images. Machine learning (ML) is widely used in biological research, including to facilitate rapid diagnosis of cancer. This study demonstrates the feasibility and effectiveness of using ML techniques to rapidly detect expression of the gene MUC2 (mucin 2) in colon cancer tissue images. We analyzed histological images of colon cancer and segmented the nuclei to extract features (area, perimeter, eccentricity, compactness, etc.) that correlate with high or low levels of MUC2. We then ran a grid search on this data set to tune the hyperparameters of four candidate classifiers: random forest, gradient boosting, decision trees with AdaBoost, and support vector machines. Of the models tested, the random forest classifier (F1 score of 0.71) and the gradient boosting classifier (F1 score of 0.72) predicted the output label most accurately. Under certain conditions, we identified four features with predictive capability. Predicting the expression of individual genes with machine learning is a first step toward detecting genes that are specific to cancer stem cells in the early stages of cancer, while there is still hope for a cure.
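
The abstract names several nuclear shape features and reports classifier quality as an F1 score. The sketch below illustrates how such quantities are commonly computed; it is not taken from the thesis. The formulas are assumptions: compactness is taken as 4πA/P² (definitions vary across tools), eccentricity is that of an ellipse fitted to the nucleus, and F1 is the harmonic mean of precision and recall for a binary high/low label. All function names and the toy inputs are hypothetical.

```python
import math


def compactness(area, perimeter):
    """Assumed definition: 4*pi*A / P**2 (1.0 for a circle, lower for irregular shapes)."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4.0 * math.pi * area / perimeter ** 2


def eccentricity(major_axis, minor_axis):
    """Eccentricity of a fitted ellipse: 0.0 for a circle, approaching 1.0 as it elongates."""
    if major_axis <= 0 or not 0 <= minor_axis <= major_axis:
        raise ValueError("require 0 <= minor_axis <= major_axis and major_axis > 0")
    return math.sqrt(1.0 - (minor_axis / major_axis) ** 2)


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and/or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy values only (not data from the thesis): a circular "nucleus" of radius 2.
    radius = 2.0
    print(compactness(math.pi * radius ** 2, 2.0 * math.pi * radius))
    print(eccentricity(5.0, 3.0))
    # Hypothetical labels: 1 = high MUC2, 0 = low MUC2.
    print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In practice, features like these would be computed per segmented nucleus and passed to the classifiers, with F1 evaluated on held-out images.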

Item Type: Thesis (Senior thesis (Major))
Subject Keywords: machine learning, biology, medicine, cancer, colon, AI, ML, computer science
Degree Grantor: California Institute of Technology
Division: Engineering and Applied Science
Major Option: Computer Science
Awards: Library Friends Senior Thesis Prize Finalist, 2018.
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Yue, Yisong
Group: Library Friends' Senior Thesis Prize
Defense Date: 19 March 2018
Record Number: CaltechTHESIS:08232018-131754063
Persistent URL: https://resolver.caltech.edu/CaltechTHESIS:08232018-131754063
DOI: 10.7907/sapn-r691
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 11159
Collection: CaltechTHESIS
Deposited By: Preethi Periyakoil
Deposited On: 24 Aug 2018 17:48
Last Modified: 01 Feb 2021 22:45
