A Caltech Library Service

Automatic observation and synthesis of human motion


Goncalves, Luis (2000) Automatic observation and synthesis of human motion. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/n3fn-jd79.


Over the past few decades Computer Vision and Computer Graphics have experienced a rapid evolution, thanks in part to the continual improvement in computer hardware, which enables the investigation of increasingly complex problems. In Computer Graphics this evolution is visible on a nearly day-by-day basis. For instance, computer-generated special effects in feature films have evolved to such a level of sophistication that it is often impossible to distinguish what is real from what is not. However, one challenging problem that still stands, considered by many experts in the field to be a Holy Grail of Computer Graphics, is the automatic synthesis of life-like human character animation. Although rendering and modeling techniques have reached a stage where a computer-generated image of a person is nearly indistinguishable from the real thing, as soon as that model begins to move the illusion is broken. The problem is difficult because no one yet knows how to model human motion in all its intricacy and subtlety, and also because humans are so well tuned to perceive these subtleties that they can only be fooled if the modeling is done with complete perfection. In this thesis, we explore a novel method of automatic synthesis of human motion that brings us one step closer to the ultimate goal. The method is based on decomposing human motion into elemental, nameable actions such as walking, running, and throwing, and using observations of people performing these actions to create mathematical models of the actions. Various samples of an action are acquired, and each sample is labeled according to state (initial body configuration), goal (desired outcome of the motion, such as direction of a throw or placement of a foot for a step), and mood and style parameters. Then, established and novel techniques of machine learning are applied to derive a function that can synthesize a motion given some desired parameters.
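As a rough illustration of learning a synthesis function from labeled samples, the sketch below fits a per-frame linear model that maps a single scalar goal parameter to a toy "motion" (one joint angle per frame). Everything in it is an assumption made for illustration: the function names `make_sample`, `fit_linear`, and `synthesize`, the one-dimensional goal, and the synthetic data are hypothetical and are not taken from the thesis, whose labels also include state, mood, and style parameters.

```python
import math

def make_sample(goal, frames=20):
    """Toy stand-in for a recorded motion: one joint angle per frame.
    The data is linear in `goal` by construction (an assumption of this sketch)."""
    return [0.2 + goal * math.sin(math.pi * t / (frames - 1)) for t in range(frames)]

def fit_linear(goals, motions):
    """Per-frame least squares: angle[t] is modeled as a[t] + b[t] * goal."""
    n = len(goals)
    g_mean = sum(goals) / n
    g_var = sum((g - g_mean) ** 2 for g in goals)
    coeffs = []
    for t in range(len(motions[0])):
        y = [m[t] for m in motions]
        y_mean = sum(y) / n
        slope = sum((g - g_mean) * (yi - y_mean) for g, yi in zip(goals, y)) / g_var
        coeffs.append((y_mean - slope * g_mean, slope))
    return coeffs

def synthesize(coeffs, goal):
    """Evaluate the fitted model at a new goal value."""
    return [a + b * goal for a, b in coeffs]

# Three labeled samples of the same toy action.
goals = [0.5, 1.0, 1.5]
motions = [make_sample(g) for g in goals]
model = fit_linear(goals, motions)

# Synthesize a motion for a goal outside the range of the samples.
new_motion = synthesize(model, 2.0)
```

Because the toy data is exactly linear in the goal, the fit reproduces it trivially; the sketch only shows the mechanics of going from labeled samples to a parameterized synthesis function, not the quality one would see on real motion data.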
We explore the use of polynomial interpolants, radial basis function networks (RBFs), feed-forward neural networks (FFNNs) with sigmoidal activation functions, as well as a new method with local linear models. We find that a linear model more often than not works quite well, whereas higher-order polynomial interpolants, RBFs and FFNNs are unable to extrapolate robustly when the motion parameters lie outside of the convex hull of the parameters of the available sample motions. The method with local linear models successfully improves the fidelity of the synthetic motions compared to the linear model, and also provides robust extrapolation. We also investigate the use of a recursive, probabilistic model where motions are specified by defining the initial and final body poses of the motion, and synthesis is done by computing the most likely motion to satisfy the boundary constraints. Although the results with this method are not yet completely satisfactory, it holds promise, and under certain types of conditions can re-synthesize the sample motions more accurately than any of the other methods. With the additional development of methods to smoothly concatenate actions together and to interactively map synthesized motions to a 3-D polygonal character model, a real-time interactive demo was created that successfully demonstrates the level of realism and interactivity achievable by our method of human motion synthesis. Our interest in the problem of realistic human motion synthesis arose from an initial study of the (in some sense) inverse problem in Computer Vision of the automatic observation (rather than synthesis) of human motion. Although progress in Computer Vision has not yet reached a level enabling its widespread use in daily life, this state will most likely be achieved within the next decade. One large class of problems for which this is the case is the endowment of computers with visual perceptual skills similar to those of humans.
Among the vast set of visual tasks imaginable, the automatic detection, recognition, and estimation of humans and human motion is a particularly interesting set of problems since there are many possible applications of such a technology in modern life, ranging from security and monitoring systems, to systems for biometric analysis, to novel human-machine interfaces. In this thesis we describe a method of robustly estimating the motion of a human body from a monocular view. The method is based on the use of a 3-D model of the body, and on comparing the actual image to an expected image based on the 3-D model to update the estimate of the body pose at each time step. The method was implemented in real time as a human-machine interface. This system demonstrated that the method can be used to robustly track a human arm with a hand-tip positioning resolution of 2 cm under close viewing conditions (where perspective projection causes significant changes in the appearance of the arm in the camera view).
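The predict-compare-update loop described above can be sketched in a much reduced form. The example assumes a planar two-link arm, a one-dimensional perspective camera, and observations consisting of the image coordinates of the elbow and hand tip rather than a full image; the pose is corrected with Gauss-Newton steps. The limb lengths, shoulder position, focal length, and the functions `predict` and `update` are all hypothetical choices for this sketch and do not reproduce the thesis's actual 3-D model or estimator.

```python
import math

UPPER, FORE = 0.3, 0.3    # hypothetical limb lengths in meters
SHOULDER = (0.0, 2.0)     # hypothetical shoulder position (x, depth z) in the camera frame
FOCAL = 1.0               # hypothetical focal length

def predict(pose):
    """Expected image coordinates of elbow and hand tip under perspective projection."""
    t1, t2 = pose
    ex = SHOULDER[0] + UPPER * math.cos(t1)
    ez = SHOULDER[1] + UPPER * math.sin(t1)
    hx = ex + FORE * math.cos(t1 + t2)
    hz = ez + FORE * math.sin(t1 + t2)
    return [FOCAL * ex / ez, FOCAL * hx / hz]

def update(pose, observed, iterations=8, eps=1e-6):
    """Refine the pose so that the predicted measurements match the observed ones."""
    pose = list(pose)
    for _ in range(iterations):
        pred = predict(pose)
        r = [observed[0] - pred[0], observed[1] - pred[1]]
        # Forward-difference Jacobian: J[i][j] = d pred[i] / d pose[j].
        J = [[0.0, 0.0], [0.0, 0.0]]
        for j in range(2):
            bumped = list(pose)
            bumped[j] += eps
            pb = predict(bumped)
            for i in range(2):
                J[i][j] = (pb[i] - pred[i]) / eps
        # Solve the 2x2 system J * delta = r by Cramer's rule.
        det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
        pose[0] += (J[1][1] * r[0] - J[0][1] * r[1]) / det
        pose[1] += (J[0][0] * r[1] - J[1][0] * r[0]) / det
    return pose

# Track a slowly moving arm: each time step starts from the previous estimate.
true_poses = [(0.4 + 0.05 * k, 0.6 + 0.05 * k) for k in range(5)]
estimate = [0.3, 0.5]
for pose in true_poses:
    observed = predict(pose)   # noise-free synthetic measurement
    estimate = update(estimate, observed)
```

Starting each update from the previous estimate reflects the assumption that the pose changes little between time steps. A real system would also have to handle image noise, occlusion, and poses where the Jacobian becomes singular, none of which this sketch addresses.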

Item Type: Thesis (Dissertation (Ph.D.))
Degree Grantor: California Institute of Technology
Division: Engineering and Applied Science
Major Option: Computation and Neural Systems
Thesis Availability: Public (worldwide access)
Research Advisor(s):
  • Perona, Pietro
Thesis Committee:
  • Perona, Pietro (chair)
  • Barr, Alan H.
  • Bruck, Jehoshua
  • Burdick, Joel Wakeman
  • Andersen, Richard A.
Defense Date: 8 March 2000
Record Number: CaltechETD:etd-11022006-104150
Persistent URL:
Default Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 4371
Deposited By: Imported from ETD-db
Deposited On: 16 Apr 2007
Last Modified: 08 Nov 2023 00:44

Thesis Files

PDF (Goncalves_l_2000.pdf) - Final Version
See Usage Policy.

