Citation
Song, Yang (2003) A Probabilistic Approach to Human Motion Detection and Labeling. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/945J-QX86. https://resolver.caltech.edu/CaltechETD:etd-12102002-113833
Abstract
Human motion analysis is a very important task for computer vision with many potential applications. There are several problems in human motion analysis: detection, tracking, and activity interpretation. Detection is the most fundamental problem of the three, but remains untackled due to its inherent difficulty. This thesis develops a solution to the problem. It is based on a learned probabilistic model of the joint positions and velocities of the body parts, where detection and labeling are performed by hypothesis testing on the maximum a posteriori estimate of the pose and motion of the body. To achieve efficiency in learning and testing, a graphical model is used to approximate the conditional independence of human motion. This model is also shown to provide a natural way to deal with clutter and occlusion.
One key factor in the proposed method is the probabilistic model of human motion. In this thesis, an unsupervised learning algorithm that can obtain the probabilistic model automatically from unlabeled training data is presented. The training data include useful foreground features as well as features that arise from irrelevant background clutter. The correspondence between parts and detected features is also unknown in the training data. To learn the best model structure as well as model parameters, a variant of the EM algorithm is developed where the labeling of the data (part assignments) is treated as hidden variables. We explore two classes of graphical models, trees and decomposable triangulated graphs, and find that the latter are superior for our application. To better model human motion, we also consider the case when the model consists of mixtures of decomposable triangulated graphs.
The efficiency and effectiveness of the algorithm have been demonstrated by applying it to generate models of human motion automatically from unlabeled image sequences, and testing the learned models on a variety of sequences. We find detection rates of over 95% on pairs of frames. This is very promising for building a real-life system, for example, a pedestrian detector.
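The abstract and keywords point to dynamic programming over a decomposable triangulated graph as the source of efficiency in labeling. The following is a minimal sketch of that general idea, not the thesis's implementation. It assumes each part can take one of a small number of candidate features, and that the joint log-probability splits into one term per triangle. The names `map_score`, `brute_force_score`, `local_score`, and the random score table are hypothetical, and the sketch ignores clutter, occlusion, velocities, and the requirement that distinct parts take distinct features.

```python
import itertools
import random


def _get(msg, p, xp, q, xq):
    """Read the accumulated message on edge (p, q); 0.0 if none yet."""
    if p > q:
        p, xp, q, xq = q, xq, p, xp
    return msg.get((p, q), {}).get((xp, xq), 0.0)


def _add(msg, p, xp, q, xq, value):
    """Add a value to the message stored on edge (p, q)."""
    if p > q:
        p, xp, q, xq = q, xq, p, xp
    table = msg.setdefault((p, q), {})
    table[(xp, xq)] = table.get((xp, xq), 0.0) + value


def map_score(triangles, n_candidates, local_score):
    """Max-sum dynamic programming over a decomposable triangulated graph.

    triangles: (a, b, c) part triples in elimination order. Part `a` is
        eliminated at each step; the last triple is the root triangle.
    local_score(t, xa, xb, xc): assumed log-probability term of triangle t.
    Returns the best total score over all joint assignments.
    """
    cands = range(n_candidates)
    msg = {}
    *leaves, root = triangles
    for t, (a, b, c) in enumerate(leaves):
        for xb, xc in itertools.product(cands, repeat=2):
            # Maximize over the eliminated part, then pass the result
            # down to the edge (b, c) that it was attached to.
            best = max(
                local_score(t, xa, xb, xc)
                + _get(msg, a, xa, b, xb)
                + _get(msg, a, xa, c, xc)
                for xa in cands
            )
            _add(msg, b, xb, c, xc, best)
    t = len(leaves)
    a, b, c = root
    return max(
        local_score(t, xa, xb, xc)
        + _get(msg, a, xa, b, xb)
        + _get(msg, a, xa, c, xc)
        + _get(msg, b, xb, c, xc)
        for xa, xb, xc in itertools.product(cands, repeat=3)
    )


def brute_force_score(triangles, parts, n_candidates, local_score):
    """Exhaustive search over all assignments, used only as a check."""
    best = float("-inf")
    for assign in itertools.product(range(n_candidates), repeat=len(parts)):
        x = dict(zip(parts, assign))
        total = sum(
            local_score(t, x[a], x[b], x[c])
            for t, (a, b, c) in enumerate(triangles)
        )
        best = max(best, total)
    return best


# Toy model (assumed): 5 parts, 3 candidate features per part.
N = 3
TRIANGLES = [(4, 2, 3), (3, 1, 2), (0, 1, 2)]
rng = random.Random(0)
TABLE = [
    [[[rng.uniform(-5.0, 0.0) for _ in range(N)] for _ in range(N)]
     for _ in range(N)]
    for _ in TRIANGLES
]


def local_score(t, xa, xb, xc):
    return TABLE[t][xa][xb][xc]


dp_best = map_score(TRIANGLES, N, local_score)
bf_best = brute_force_score(TRIANGLES, range(5), N, local_score)
print(dp_best, bf_best)
```

With T triangles and N candidates per part, the dynamic program evaluates on the order of T·N³ local terms, whereas exhaustive search over M parts visits N^M assignments. The two functions should agree up to floating-point rounding on any model with this structure.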
Item Type: | Thesis (Dissertation (Ph.D.)) |
---|---|
Subject Keywords: | decomposable triangulated graphs; dynamic programming; graphical models; human motion detection and labeling; Johansson displays; unsupervised learning |
Degree Grantor: | California Institute of Technology |
Division: | Engineering and Applied Science |
Major Option: | Electrical Engineering |
Thesis Availability: | Public (worldwide access) |
Research Advisor(s): |
|
Thesis Committee: |
|
Defense Date: | 13 November 2002 |
Record Number: | CaltechETD:etd-12102002-113833 |
Persistent URL: | https://resolver.caltech.edu/CaltechETD:etd-12102002-113833 |
DOI: | 10.7907/945J-QX86 |
Default Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
ID Code: | 4917 |
Collection: | CaltechTHESIS |
Deposited By: | Imported from ETD-db |
Deposited On: | 16 Dec 2002 |
Last Modified: | 08 Nov 2023 00:44 |
Thesis Files
PDF - Final Version (1MB). See Usage Policy.