CaltechTHESIS
A Caltech Library Service

# Clustering affine subspaces : algorithms and hardness

## Citation

Lee, Euiwoong (2012) Clustering affine subspaces : algorithms and hardness. Master's thesis, California Institute of Technology. http://resolver.caltech.edu/CaltechTHESIS:07052012-191337554

## Abstract

We study a generalization of the famous $k$-center problem where each object is an affine subspace of dimension $\Delta$, and give either the first or significantly improved algorithms and hardness results for many combinations of parameters. This generalization from points ($\Delta=0$) is motivated by the analysis of incomplete data, a pervasive challenge in statistics: incomplete data objects in $\mathbb{R}^d$ can be modeled as affine subspaces. We give three algorithmic results for different values of $k$, under the assumption that all subspaces are axis-parallel, the main case of interest because of the correspondence to missing entries in data tables.

1) $k=1$: Two polynomial time approximation schemes which runs in $poly(\Delta, 1/\epsilon) nd$.

2) $k=2$: $O(\Delta^{1/4})$-approximation algorithm which runs in $poly(n, d, \Delta)$

3) General $k$: Polynomial time approximation scheme which runs in $2^{O(\Delta k \log k (1 + 1/\epsilon^2))}nd$

We also prove nearly matching hardness results; in both the general (not necessarily axis-parallel) case (for $k \geq 2$) and in the axis-parallel case (for $k \geq 3$), the running time of an approximation algorithm with any approximation ratio cannot be polynomial in even one of $k$ and $\Delta$, unless P = NP. Furthermore, assuming that the 3-SAT problem cannot be solved subexponentially, the dependence on both $k$ and $\Delta$ must be exponential in the general case (in the axis-parallel case, only the dependence on $k$ drops to $2^{\Omega(\sqrt{k})}$). The simplicity of the first and the third algorithm suggests that they might be actually used in statistical applications. The second algorithm, which demonstrates a theoretical gap between the axis-parallel and general case for $k=2$, displays a strong connection between geometric clustering and classical coloring problems on graphs and hypergraphs, via a new Helly-type theorem.

Item Type:Thesis (Master's thesis)
Subject Keywords:k-center; Clustering; Helly theorem; High dimension; Flats; Minimum Enclosing Ball; Incomplete data
Degree Grantor:California Institute of Technology
Division:Engineering and Applied Science
Major Option:Computer Science
Thesis Availability:Public (worldwide access)
• Schulman, Leonard J.
Thesis Committee:
• Unknown, Unknown
Defense Date:2012
Funders:
Funding AgencyGrant Number
Samsung ScholarshipUNSPECIFIED
Record Number:CaltechTHESIS:07052012-191337554
Persistent URL:http://resolver.caltech.edu/CaltechTHESIS:07052012-191337554
Default Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:7171
Collection:CaltechTHESIS
Deposited By: Euiwoong Lee
Deposited On:19 Jul 2012 23:46