UTILITY PATENT APPLICATION METHODS AND APPARATUS FOR MOTION CAPTURE INVENTORS: Prem Kuchi, Raghu Ram Hiremagalur, and Sethuraman Panchanathan
CROSS-REFERENCES TO RELATED APPLICATIONS This application claims the benefit of U.S. Provisional Patent Application Serial No. 60/579,962, filed June 14, 2004, entitled Markerless Motion Capture System using Conventional Visual Range Video Cameras, and incorporates the disclosure of the application by reference.
BACKGROUND OF THE INVENTION

A great deal may be learned through the careful observation and study of bodies in motion. When applied to human motion, motion analysis enables an evaluation of how humans interact with the environment and how human bodies respond under certain circumstances or as a result of specific activity. Detailed study of human motion also facilitates better emulation and modeling of human motion. Motion capture is the process by which the movement information of various objects is quantified so that it may be stored and/or processed. The advancement of motion capture technologies has enabled applications in a wide range of fields, including medical rehabilitation sciences, sports sciences, gaming and animation, entertainment, animal research, and industrial usability and product development.

BRIEF DESCRIPTION OF THE DRAWING FIGURES

A more complete understanding of the present invention may be derived by referring to the detailed description when considered in connection with the following illustrative figures. In the following figures, like reference numbers refer to similar elements and steps.
Figure 1 is a block diagram of a motion capture system according to various aspects of the present invention.

Figure 2 is a flow diagram of an exemplary motion capture process.

Figure 3 is an illustration of a subject in an exemplary initial pose.

Figure 4 is a flow diagram of an image data analysis process.

Figure 5 is a diagram of a motion capture implementation.

Elements and steps in the figures are illustrated for simplicity and clarity and have not necessarily been rendered according to any particular sequence. For example, steps that may be performed concurrently or in a different order are illustrated in the figures to help improve understanding of embodiments of the present invention.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

The present invention is described partly in terms of functional components and various processing steps. Such functional components may be realized by any number of components configured to perform the specified functions and achieve the various results. For example, the present invention may employ various elements, materials, signal sources, signal types, integrated components, recording devices, image data sources, processing components, filters, and the like, which may carry out a variety of functions. In addition, although the invention is described in the motion capture environment, the present invention may be practiced in conjunction with any number of applications, environments, data processing, image and motion analysis, therapeutic and diagnostic systems, and entertainment systems, and the systems described are merely exemplary applications for the invention. Further, the present invention may employ any number of techniques for manufacturing, assembling, testing, and the like.
Referring now to Figure 1, a motion capture system 100 according to various aspects of the present invention comprises one or more cameras 110 and a computer system 112. The cameras 110 record images and transfer corresponding image information to the computer system 112. The computer system 112 analyzes the image information and generates motion capture data. The motion capture process may be performed for any suitable purpose, for example, rehabilitative medicine, performance enhancement, motion research, animation, security, or industrial processes. In one embodiment, the motion capture system uses standard, commercially available, relatively low-cost equipment, such as standard consumer video cameras and a conventional personal computer. The cameras 110 may comprise any suitable systems for generating data corresponding to images. For example, the camera 110 may comprise a conventional video camera using a charge-coupled device (CCD) that generates image information, such as analog or digital signals, corresponding to the viewed environment. The camera 110 suitably responds to visible light, though the camera 110 may also or alternatively generate signals in response to other spectral elements, such as infrared, ultraviolet, x-ray, or polarized light. In one embodiment, the motion capture system 100 comprises one or more conventional color video cameras. Alternatively, the cameras may comprise high-speed cameras for capturing quick motion. The cameras 110 provide the generated information to the computer system 112. Alternatively, the computer system 112 may receive the image information from another source, such as a storage medium or other source of image data. The computer system 112 analyzes the data to generate motion capture data, such as data for use in the creation of a two- or three-dimensional representation of a live performance. The computer system 112 suitably generates, in real-time, following a delay, or using pre-recorded information, motion
capture data comprising, for example, a recording of body movement (or other movement) for immediate or delayed analysis and playback. The motion capture data may be used for any suitable purpose, such as to map human motion onto a computer-generated character, for example to replicate human arm motion for a character's arm motion, or to map human hand and finger patterns to control a character's skin color or emotional state. The computer system 112 may generate the motion capture data according to any appropriate process and/or algorithm. For example, the computer system 112 may be configured to establish selected points associated with one or more targets contained in the image data. As the image data is collected, the computer system 112 tracks the movement of the selected points. In addition, the computer system 112 correlates the movement of the selected points as tracked by different cameras 110 to establish two- or three-dimensional tracks for the selected points. The computer system 112 may also perform post-processing on the image data to prepare the data for playback. In the present embodiment, the computer system executes a motion capture program. For example, referring to Figure 5, the motion capture program suitably comprises a plurality of threads operating on the computer system 112 relating to different tasks, such as a capture thread 512, a process thread 514, and a display thread 516. Generally, the capture thread 512 captures image information as it is received from the source, the process thread 514 processes the captured data to generate the motion capture data, and the display thread 516 provides the information for display. Referring to Figure 2, a motion capture system 100 according to various aspects of the present invention initially prepares a subject 114, such as a person, animal, or other moving entity or item, for acquiring image data (210). The subject 114 may be prepared in any suitable manner (212).
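By way of illustration, the capture, process, and display threads sharing buffers, as described above, might be sketched as follows. This is a simplified, hypothetical sketch: the function names and the use of Python queues for the global buffer 518 and processed buffer 520 are illustrative assumptions, not part of the application.

```python
import queue
import threading

# Illustrative sketch of the capture/process/display thread pipeline.
# The queue variables stand in for the global buffer 518 and processed
# buffer 520 of Figure 5; all names here are hypothetical.

_SENTINEL = object()  # marks the end of the frame stream


def capture_thread(frames, global_buffer):
    """Capture thread: push each incoming frame into the global buffer."""
    for frame in frames:
        global_buffer.put(frame)
    global_buffer.put(_SENTINEL)


def process_thread(global_buffer, processed_buffer, track):
    """Process thread: turn raw frames into motion capture data."""
    while True:
        frame = global_buffer.get()
        if frame is _SENTINEL:
            processed_buffer.put(_SENTINEL)
            break
        processed_buffer.put(track(frame))


def run_pipeline(frames, track):
    """Run capture and process threads; collect results for display."""
    global_buffer = queue.Queue()
    processed_buffer = queue.Queue()
    threads = [
        threading.Thread(target=capture_thread, args=(frames, global_buffer)),
        threading.Thread(target=process_thread,
                         args=(global_buffer, processed_buffer, track)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        item = processed_buffer.get()
        if item is _SENTINEL:
            break
        results.append(item)  # the display thread would render these
    for t in threads:
        t.join()
    return results
```

In such a sketch, `track` would be whatever per-frame marker-tracking routine the process thread applies; the FIFO queues preserve frame order between threads.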
For example, if the motion capture system 100 uses physical markers,
the markers may be attached to relevant places on the subject 114. In addition, the environment of the subject 114 may be initially situated to facilitate generation of the motion capture data. For example, the lighting for the environment may be adjusted to provide proper contrast for generating the image data, and/or the subject 114 may be placed before a suitable background. In the present embodiment, the subject 114 is placed in an initial pose that may be used for initially identifying the selected points on the subject 114. For example, referring to Figure 3, the subject 114 may be positioned so that at least one leg and one arm are bent to more clearly define the relative locations of the shoulder 1, hip 2, knee 3, ankle 4, and toe 5. In the present embodiment, the motion capture system 100 is initialized by activating at least one camera 110 and observing a target. The computer system 112 also loads the motion capture program from a medium, such as a hard drive or other storage medium. The computer system 112 may perform any appropriate steps to prepare the computer system 112 and/or the remaining elements of the motion capture system 100. For example, the computer system 112 may check the contrast level in the signals received from the cameras 110. If the contrast is inadequate, the computer system 112 may notify the operator to correct the condition, such as by adding light, reducing light, using differently colored markers, or the like. When the motion capture system 100 is ready, the capture thread 512 captures the image information from the cameras, data files, or other source of image information. The computer system 112 may use any suitable technology to capture the image information, such as conventional DirectX functionalities. As each frame of data is collected, the capture thread 512 may transfer the data to a memory, such as a global buffer 518 (Figure 5). The computer system 112 may identify the markers or other selected points to be tracked in the image data.
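One simple way to identify a colored physical marker of the kind discussed below is color filtering: keep the pixels whose values fall inside a per-channel range and take their centroid. The following sketch is a hypothetical illustration; the image representation (rows of RGB tuples), the threshold values, and the function name are assumptions, not taken from the application.

```python
# Hypothetical sketch of color-filter marker detection: compute the
# centroid of pixels whose RGB values fall inside [lo, hi].  The default
# range roughly selects a bright red marker; all values are illustrative.

def find_marker(image, lo=(200, 0, 0), hi=(255, 80, 80)):
    """Return the (row, col) centroid of in-range pixels, or None."""
    hits = [
        (r, c)
        for r, row in enumerate(image)
        for c, (red, green, blue) in enumerate(row)
        if lo[0] <= red <= hi[0]
        and lo[1] <= green <= hi[1]
        and lo[2] <= blue <= hi[2]
    ]
    if not hits:
        return None  # marker occluded or out of frame
    mean_row = sum(r for r, _ in hits) / len(hits)
    mean_col = sum(c for _, c in hits) / len(hits)
    return (mean_row, mean_col)
```

A production system would typically filter in a more illumination-tolerant color space and combine the result with the shape models and anthropomorphic constraints described below; this sketch shows only the basic filtering step.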
The computer system 112 may acquire the selected points in any suitable manner, and may operate with any suitable selected points. For example, the motion
capture system 100 may track unilateral markers, bilateral markers, or other configurations. The motion capture system 100 may use physical markers, virtual markers, anatomical landmarks, or other appropriate points. In one embodiment, the process thread 514 of the motion capture system 100 tracks the movement of physical markers, such as visible or otherwise optically responsive markers attached to selected points on the subject 114. For example, the markers may comprise colored markers attached to the subject 114, such as colored paper or plastic discs or rectangles attached via an adhesive, like conventional Post-it® notes. The computer system 112 may detect the marker in any suitable manner, such as using color filtering. The various body segments of the subject may then be identified, for example using anthropomorphic information. In another embodiment, the markers comprise virtual markers, such as markers that are designated by the user or automatically selected by the computer system 112. For example, the computer system 112 may display a frame of an image to the user. The user may then select one or more points in the image for tracking, such as by selecting the points using a pointing device. Thus, the user may "point-and-click" to designate the selected virtual markers at desired locations, such as the hip, knee, ankle, toe, or the like. Color and shape models may then be built around the designated virtual markers, and the body segments may be identified, such as by using anthropomorphic information. Alternatively, the computer system 112 may automatically identify and select virtual markers for tracking. For example, the computer system 112 may use trained and pre-stored shape models, anthropomorphic approximations, and/or a pattern recognition process to identify appropriate points for the hip, knee, ankle, toe, or the like. Color models and shape models may then be generated around
the identified virtual markers. The locations of the virtual markers may also be refined based on the image data as the motion information is received. Upon identification of the markers, the computer system 112 may also associate designations with each marker. The designations may comprise any suitable designations, such as descriptions of the anatomical area associated with the marker. For example, the various markers may be designated, either automatically or manually, as LEFT HIP, RIGHT KNEE, HEAD, or the like. The designations may also be associated with other identifying information, such as the name of the subject 114, the time of the motion capture session, and other relevant information. In the present embodiment, the markers are designated using preselected designations. The preselected designations are associated with various rules and/or characteristics. For example, if a marker is designated as RIGHT ELBOW, the marker may be spatially associated with a RIGHT SHOULDER marker and a RIGHT WRIST marker, such as to assist in reconstruction of the motion, generation of the motion capture data, prevention of inappropriate reconstructions (such as connecting the RIGHT KNEE marker to the LEFT ANKLE marker), and the like. In the present exemplary embodiment, the process thread 514 manages various marker data. The process thread 514 receives the captured image information and returns the tracked point. The process thread 514 suitably iteratively processes the marker data and computes the relevant markers' positions in the current frame. The marker data may comprise any suitable information associated with one or more markers. For example, marker data may be modeled as a dynamic graph with each marker as a vertex. Marker data may comprise: i) Markers: the individual positions of the markers; ii) Connectors: joining two markers to generate a line; and/or iii) Joints: joining two lines to generate a joint. In real-time instances, each of the markers, lines, and joints is suitably selected and connected before the subject begins moving. In non-real-time applications, the markers, lines, and joints may be added or deleted at any time during the duration of the motion capture data. The marker data may be modeled as a structure that is associated with each frame. The marker data for each frame is suitably written to secondary storage as a file of the current session. Any future reference to this session may display the frames and the marker data available in the corresponding file. The data present in the secondary storage can be exported to or otherwise utilized by other applications. The computer system 112 may also analyze the marker data and associate the data with known models to facilitate generation and analysis of the motion capture data. For example, the computer system 112 may use the marker data to estimate the height of the subject 114. The marker data may also assist in the selection of other model information that may be applied to the subject 114, such as a predicted gait cycle, approximate body structure, and the like. When the motion capture system 100 has been initialized and the subject 114 prepared, the process thread 514 may proceed with receiving and analyzing successive frames of image data from the capture thread 512 (214). The successive frames are suitably synchronized so that the data from the various cameras 110 or other sources relate to substantially identical times. As the image data is received, the computer system 112 identifies the marker locations in the image data and accordingly generates data for the marker locations in two- or
three-dimensional space (216). The computer system 112 may employ any appropriate algorithms or processes to identify the markers and determine their locations. For example, the process thread 514 may predict the location of the marker in a particular frame, search the frame for the marker, identify the marker location, and refine the resulting motion capture data. The process thread 514 may also perform any other desired computations relating to the image data and/or motion capture, such as calculating joint angle trajectories, stresses on joints, accelerations and velocities of body parts, or other relevant data. For example, referring to Figure 4, the process thread 514 may receive successive frames of image data (410). The frames may be received
" at any appropriate rate, such as at the cameras' 110 frame rate for real-time processing. The computer system 112 may perform any suitable analysis, including filtering, correlation, assimilation, adaptation, and the like. For example, the computer system 112 may apply particle filters, Kalman filters, non-Gaussian and Gaussian filters, adaptive forecasting algorithms, or other statistical signal processing techniques to the image data. The present embodiment applies multiple analyses to the image data, including particle filtering and mean shift tracking, to track the movement of the markers. To determine the location of a particular marker in a frame of data, the computer system 112 may initially apply a prediction model using image data or motion capture data from one or more preceding frames to select a likely area for searching for the marker in the current frame (412). For example, the computer system 112 may apply a conventional prediction model using positions, velocities, accelerations, and/or angles from preceding frames to predict the current location of each marker. The prediction model may generate any suitable prediction data, such as an area or set of pixels most likely to currently include
the marker. In the present embodiment, the prediction model establishes a selected number of locations for further analysis, such as 10 to 1000 locations, for example 75 to 150 potential locations. A greater number of locations may increase the likelihood of accurately finding the marker, but may also add complexity and processing time. The image data, such as the areas identified by the prediction model, may be analyzed to identify the current location of the marker. The present computer system 112 may perform particle filtering (variously known as condensation filtering, sequential Monte Carlo methods, or stochastic filtering methods) on the areas identified by the prediction model (414). In particular, particle filtering calculates probabilities for the marker being located in each candidate area. Particle filtering tends to minimize error. The particle filtering may process
" any suitable characteristics in the image data, such as color, texture, or shape of the target. In the present embodiment, the particle filter analyzes the relevant areas based on color to determine the various probabilities for the location of the marker within the analyzed areas. The particle filtering is suitably constrained or bounded by various rules, such as anatomic distances between markers. The particle filtering suitably generates a set of candidate particles most likely to include a particular marker. The computer system 112 also performs mean shift tracking on the image data (416). For example, the computer system 112 of the present embodiment performs a standard mean shift analysis on the candidate particles identified by the particle filter. Mean shift tracking performs global optimization for one frame to estimate the location of the marker in a subsequent frame. Mean shift tracking tends to provide smoother tracking of the marker movement than particle filtering alone, and more effectively tolerate variations in position and features, such as light variation or shadows. The mean shift tracking generates an expected area that is designated as the marker location. The combination of particle filtering
and mean shift tracking tends to statistically minimize error and maintain stability in the marker tracking. The resulting data may be further processed to correct, clean, or smooth the data. Any suitable processes or techniques may be applied to the data for any desired purpose, for example to more accurately identify the location of the marker, smooth the data, or improve the marker tracking process. The present motion capture system 100 performs one or more different processes to correct and clean the motion capture data (418). In one embodiment, the computer system 112 allows occlusion management and correction of data loss, for example due to occlusion of markers or other failure to detect the marker. The data may be corrected in any suitable manner. For example, the operator may review any frame in which data is missing or incorrect and manually adjust the data to insert the correct marker location. Alternatively, the computer system 112 may automatically correct the data, such as by interpolating between prior and subsequent data. In addition, the computer system 112 may perform additional searching for missing data, for example using different search techniques and/or supplemental data. For example, if a left wrist marker location is missing but the location of the left elbow marker is known, then the computer system 112 may search a substantially spherical area having a radius corresponding to the length of the subject's forearm. The computer system 112 is also suitably configured to perform region growing to reduce error. In the present embodiment, if the full area of the marker is not detected, the computer system 112 may apply region growing to the detected marker to encompass the entire marker. Region growing may be performed using any suitable characteristics of the marker, such as the shape, texture, and/or color of the marker, and may use any suitable region growing techniques, such as conventional region growing processes. The computer
system 112 may then calculate the centroid of the grown region, and the centroid is then designated as the marker location. In addition, the computer system 112 may refine the marker locations according to known information. For example, the computer system 112 may refine the shape of the markers according to shape-based parameters, motion information, and/or anthropomorphic information. The computer system 112 may also perform data smoothing on the motion capture data, for example to minimize errors and correct locations of markers. The data smoothing may be applied to any data at any suitable point in the process. For example, the computer system 112 may perform Markov Chain Monte Carlo (MCMC) smoothing on the data generated by the particle filter to smooth transitions between frames and minimize the number of particles. MCMC smoothing tends to reduce erratic motion that may be produced by the particle filter. If the motion capture system 100 is generating two-dimensional data, the computer system 112 may also apply a calibration algorithm to the data to compensate for three-dimensional movement of the subject 114. If the subject 114 moves other than in a plane perpendicular to the camera 110, the calibration algorithm may adjust the data to compensate accordingly. The calibration algorithm may comprise any suitable calibration algorithm, such as a conventional calibration algorithm. The computer system 112 may also perform any other desired calculations on the motion capture data, such as calculation of different joint angles for each frame, joint angle trajectories, and timing differences between different parts of the body (420). As the motion capture data is generated, the computer system 112 may store any appropriate data for later use, such as the original video record, the motion capture data, and the calculated information, at any appropriate location, such as in a
processed buffer 520 (Figure 5). All or parts of the process may be repeated until the last frame of the original image data is processed and the motion capture data is complete. The computer system 112 may also perform additional output processing to enhance the usefulness of the motion capture data or for other suitable purposes. The output processing may comprise any suitable processing of the image capture data for later use. For example, the display thread 516 may organize the records, including any appropriate data such as the original image data, the motion capture data, and other information, according to each subject's name and a session identifier. The display thread 516 may also generate additional information for analysis, such as graphs showing the time and amplitude of various motions, comparisons to previous motion capture sessions, overlays showing the motion capture information overlaid on the original image data, and the like. The display thread 516 may also facilitate entry of a report, such as by using a word processing program, to store a report that may be associated with the subject 114, session, or other information and data. The display thread 516 may also include export functions for transmitting the records or storing them on a medium, such as a DVD. The particular implementations shown and described are illustrative of the invention and its best mode and are not intended to otherwise limit the scope of the present invention in any way. Indeed, for the sake of brevity, conventional manufacturing, connection, preparation, and other functional aspects of the system may not be described in detail. Furthermore, the connecting lines shown in the various figures are intended to represent exemplary functional relationships and/or physical couplings between the various elements. Many alternative or additional functional relationships or physical connections may be present in a practical system.
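By way of further illustration, the per-frame marker tracking described above, in which candidate locations are predicted from preceding frames, weighted by appearance likelihood in the manner of a particle filter, and refined toward a weighted-mean estimate in the spirit of mean shift, might be sketched for a single marker as follows. This is a simplified, hypothetical sketch: the Gaussian motion model, the likelihood function, and all parameter values are illustrative assumptions, not the claimed method.

```python
import random

# Simplified single-marker tracking step: prediction, particle weighting,
# and a weighted-mean refinement.  All models and parameters here are
# illustrative assumptions.

def predict(prev, velocity, n=100, spread=2.0, rng=None):
    """Scatter n candidate locations around the motion-model prediction."""
    rng = rng or random.Random(0)
    cx, cy = prev[0] + velocity[0], prev[1] + velocity[1]
    return [(cx + rng.gauss(0, spread), cy + rng.gauss(0, spread))
            for _ in range(n)]


def track_step(prev, velocity, likelihood, n=100, rng=None):
    """One frame of tracking: weight candidates, return their weighted mean.

    `likelihood` scores how well a candidate location matches the marker's
    appearance (e.g., a color model); the weighted mean of the candidates
    plays the role of the mean-shift-style refinement.
    """
    particles = predict(prev, velocity, n, rng=rng)
    weights = [likelihood(p) for p in particles]
    total = sum(weights) or 1.0
    x = sum(w * p[0] for w, p in zip(weights, particles)) / total
    y = sum(w * p[1] for w, p in zip(weights, particles)) / total
    return (x, y)
```

In a full system, the estimate from each frame would feed the prediction for the next frame, and the anatomic-distance constraints and MCMC smoothing described above would be applied on top of this basic step.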
The present invention has been described above with reference to a preferred embodiment. However, changes and modifications may be made to the preferred embodiment without departing from the scope of the present invention. These and other changes or modifications are intended to be included within the scope of the present invention.