Citation
Lee, Euiwoong (2012) Clustering Affine Subspaces: Algorithms and Hardness. Master's thesis, California Institute of Technology. doi:10.7907/VF38NT60. https://resolver.caltech.edu/CaltechTHESIS:07052012191337554
Abstract
We study a generalization of the famous kcenter problem where each object is an affine subspace of dimension Δ, and give either the first or significantly improved algorithms and hardness results for many combinations of parameters. This generalization from points (Δ=0) is motivated by the analysis of incomplete data, a pervasive challenge in statistics: incomplete data objects in R^{d} can be modeled as affine subspaces. We give three algorithmic results for different values of k, under the assumption that all subspaces are axisparallel, the main case of interest because of the correspondence to missing entries in data tables.
1) k=1: Two polynomial time approximation schemes which runs in poly(Δ, 1/ε)nd.
2) k=2: O(Δ^{1/4})approximation algorithm which runs in poly(n,d,Δ)
3) General k: Polynomial time approximation scheme which runs in 2^{O(Δk log k(1+1/ε2))}nd
We also prove nearly matching hardness results; in both the general (not necessarily axisparallel) case (for k ≥ 2) and in the axisparallel case (for k ≥ 3), the running time of an approximation algorithm with any approximation ratio cannot be polynomial in even one of k and Δ, unless P = NP. Furthermore, assuming that the 3SAT problem cannot be solved subexponentially, the dependence on both k and Δ must be exponential in the general case (in the axisparallel case, only the dependence on k drops to 2^{Ω√k)}). The simplicity of the first and the third algorithm suggests that they might be actually used in statistical applications. The second algorithm, which demonstrates a theoretical gap between the axisparallel and general case for k=2, displays a strong connection between geometric clustering and classical coloring problems on graphs and hypergraphs, via a new Hellytype theorem.
Item Type:  Thesis (Master's thesis)  

Subject Keywords:  kcenter; Clustering; Helly theorem; High dimension; Flats; Minimum Enclosing Ball; Incomplete data  
Degree Grantor:  California Institute of Technology  
Division:  Engineering and Applied Science  
Major Option:  Computer Science  
Thesis Availability:  Public (worldwide access)  
Research Advisor(s): 
 
Thesis Committee: 
 
Defense Date:  2012  
Funders: 
 
Record Number:  CaltechTHESIS:07052012191337554  
Persistent URL:  https://resolver.caltech.edu/CaltechTHESIS:07052012191337554  
DOI:  10.7907/VF38NT60  
Default Usage Policy:  No commercial reproduction, distribution, display or performance rights in this work are provided.  
ID Code:  7171  
Collection:  CaltechTHESIS  
Deposited By:  Euiwoong Lee  
Deposited On:  19 Jul 2012 23:46  
Last Modified:  03 Oct 2019 23:56 
Thesis Files

PDF (Complete Thesis)
 Final Version
See Usage Policy. 458kB 
Repository Staff Only: item control page