Citation
Lee, Euiwoong (2012) Clustering affine subspaces : algorithms and hardness. Master's thesis, California Institute of Technology. http://resolver.caltech.edu/CaltechTHESIS:07052012191337554
Abstract
We study a generalization of the famous $k$center problem where each object is an affine subspace of dimension $\Delta$, and give either the first or significantly improved algorithms and hardness results for many combinations of parameters. This generalization from points ($\Delta=0$) is motivated by the analysis of incomplete data, a pervasive challenge in statistics: incomplete data objects in $\mathbb{R}^d$ can be modeled as affine subspaces. We give three algorithmic results for different values of $k$, under the assumption that all subspaces are axisparallel, the main case of interest because of the correspondence to missing entries in data tables.
1) $k=1$: Two polynomial time approximation schemes which runs in $poly(\Delta, 1/\epsilon) nd$.
2) $k=2$: $O(\Delta^{1/4})$approximation algorithm which runs in $poly(n, d, \Delta)$
3) General $k$: Polynomial time approximation scheme which runs in $2^{O(\Delta k \log k (1 + 1/\epsilon^2))}nd$
We also prove nearly matching hardness results; in both the general (not necessarily axisparallel) case (for $k \geq 2$) and in the axisparallel case (for $k \geq 3$), the running time of an approximation algorithm with any approximation ratio cannot be polynomial in even one of $k$ and $\Delta$, unless P = NP. Furthermore, assuming that the 3SAT problem cannot be solved subexponentially, the dependence on both $k$ and $\Delta$ must be exponential in the general case (in the axisparallel case, only the dependence on $k$ drops to $2^{\Omega(\sqrt{k})}$). The simplicity of the first and the third algorithm suggests that they might be actually used in statistical applications. The second algorithm, which demonstrates a theoretical gap between the axisparallel and general case for $k=2$, displays a strong connection between geometric clustering and classical coloring problems on graphs and hypergraphs, via a new Hellytype theorem.
Item Type:  Thesis (Master's thesis)  

Subject Keywords:  kcenter; Clustering; Helly theorem; High dimension; Flats; Minimum Enclosing Ball; Incomplete data  
Degree Grantor:  California Institute of Technology  
Division:  Engineering and Applied Science  
Major Option:  Computer Science  
Thesis Availability:  Public (worldwide access)  
Research Advisor(s): 
 
Thesis Committee: 
 
Defense Date:  2012  
Funders: 
 
Record Number:  CaltechTHESIS:07052012191337554  
Persistent URL:  http://resolver.caltech.edu/CaltechTHESIS:07052012191337554  
Default Usage Policy:  No commercial reproduction, distribution, display or performance rights in this work are provided.  
ID Code:  7171  
Collection:  CaltechTHESIS  
Deposited By:  Euiwoong Lee  
Deposited On:  19 Jul 2012 23:46  
Last Modified:  12 Apr 2017 20:35 
Thesis Files

PDF (Complete Thesis)
 Final Version
See Usage Policy. 448Kb 
Repository Staff Only: item control page