COMPUTER INPUT DEVICE HAVING SIX DEGREES OF FREEDOM FOR CONTROLLING MOVEMENT OF A THREE-DIMENSIONAL OBJECT
FIELD OF THE INVENTION
The present invention relates generally to input devices for use with computers and other display-based processing systems, and more particularly to an input device which is capable of providing information in six degrees of freedom for controlling movement of a real or virtual object in three-dimensional space.
BACKGROUND OF THE INVENTION
Many conventional computer input devices, such as a mouse, a tracking ball or a joystick, provide control of a two-dimensional computer-generated object, e.g., a cursor on a computer screen. These devices provide two degrees of freedom in the form of two positional parameters, i.e., left-right and forward-backward movement, also known as X and Y translational position. Other conventional computer input devices provide three or more degrees of freedom. For example, there are a number of known input devices that provide control over three positional parameters, i.e., left-right, forward-backward and up-down movement, also known as X, Y, and Z translational position, as well as control over three angular or orientation parameters, i.e., roll, pitch and yaw, for a total of six degrees of freedom. Such devices provide realistic control of three-dimensional computer-generated objects in applications such as virtual reality, video games and graphical editing.
An example of an input device providing six degrees of freedom is described in U.S. Pat. No. 5,889,505, entitled "Vision-Based Six-Degree-of-Freedom Computer Input Device" and issued to Toyama et al. The position and orientation of this input device are determined by tracking a physical object suspended by cables as it is moved by a user. The tracking mechanism requires either an initialization in which the tracked object is first imaged in a "home position," or a comparison of current data to previously stored data. The Z coordinate is measured by computing how far apart the pixels of the tracked object are from its centroid. Thus, this method includes all the pixels of the tracked object in its Z computation. Another problem with this approach is that it computes orientation by tracking two reference points that have different distinguishing characteristics. In other words, these reference points must be visually distinguishable. Yet another drawback of this approach is that it does not provide absolute values for the rotation and translation parameters, but only values that are proportional to the actual quantities. These values must then be scaled before being used to control applications.
U.S. Pat. No. 5,856,844, issued to Batterman et al. and entitled "Method and Apparatus for Determining Position and Orientation," describes a method for determining the six degrees of freedom of a head mounted display and a handle to which an optically-modulated target is attached. The target is marked with squares on its surface, and by tracking the perspective views of these squares, six degrees of freedom are computed. A problem with this approach is that it requires a special orientation mark in the optically-modulated target, in order to identify the ordering of the squares. Another problem is that this approach determines rotation angles directly, and is therefore unduly prone to noise-related distortions.
Techniques described in U.S. Pat. No. 5,227,985, issued to DeMenthon and entitled "Computer Vision System for Position Monitoring in Three Dimensions Using Non-Coplanar Light Sources Attached to a Monitored Object," and U.S. Pat. No. 5,297,061, issued to DeMenthon et al. and entitled "Three Dimensional Pointing Device Monitored by Computer Vision," determine position and orientation of an object by utilizing a set of non-coplanar light sources mounted on the object. A problem with this approach is that the use of non-coplanar light sources makes the device more difficult to manufacture and therefore more costly. Another problem is that the light sources used in this approach are of different sizes, in order to correctly identify the ordering of the light sources in the corresponding image, which adds additional complexity to the device.
U.S. Pat. No. 4,672,562, issued to Egli et al. and entitled "Method and Apparatus for Determining Location and Orientation of Objects," describes an input device comprising an orthogonally-related target array. The target points are arranged in a very specific configuration, such that the fourth target point forms the common intersection of first, second and third line projections passing separately through the first three target points. In addition, these line projections must form three right angles at the fourth target point. Such constraints are generally undesirable in that they can render the device difficult to manufacture and use.
In view of the above, it is apparent that a need exists for an improved input device capable of providing six degrees of freedom, while also avoiding the problems associated with the conventional approaches.
SUMMARY OF THE INVENTION
The invention provides an improved input device and related method for providing position and orientation information with six degrees of freedom for use in manipulating a real or virtual three-dimensional object. An illustrative embodiment includes a handle with a plate attached to an upper portion thereof. Associated with an upper planar portion of the plate is a set of light sources arranged in a designated pattern. A camera or other detection device detects light from the light sources in order to determine the position, e.g., X, Y and Z positional parameters, and orientation, e.g., roll, pitch and yaw rotational parameters, of the input device. Control software running on a computer system may be used to detect and localize the light sources that are visible in a given image generated by the camera, to associate each of the light sources with a corresponding label, to determine the position and orientation information from the positions of the light sources in the image, and to communicate the position and orientation information to at least one application running on the computer system.
In accordance with the invention, the control software determines the corresponding label for each of the light sources at least in part using a measure which is a function of a set of points associated with the planar surface, each corresponding to one of the light sources, with a given one of the points selected as a principal point. The measure is invariant under perspective projection, such that if the set of points associated with the planar surface is projected into an image, the measure generated for the set of points associated with the planar surface is the same as that generated for the points as projected into the image. The control software first computes the three-dimensional positions of all the light sources, and subsequently determines the orientation information by directly calculating roll, pitch and yaw rotational parameters from computed translational vectors.
The techniques of the present invention provide a number of significant advantages over the previously-described conventional approaches. For example, an input device in accordance with the illustrative embodiment of the invention utilizes a planar arrangement of light sources, each of which may have the same characteristics, and is therefore easier to manufacture and less costly than conventional devices. No orientation marks are required and there are no constraints on the pattern of light sources as long as the sources are distinguishable using the above-mentioned measure. In addition, an input device in accordance with the invention can generate accurate positional and orientation values without any scale ambiguity. Furthermore, because the invention first computes the three-dimensional positions of all light sources and uses this information to determine the orientation, it is more robust and less prone to noise-related distortions than methods that determine rotation angles directly.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows an illustrative embodiment of an input device in accordance with the invention.
FIG. 2 shows an example of the manner in which the FIG. 1 input device is utilized in conjunction with a computer system.
FIG. 3 is a block diagram of the system architecture of the FIG. 1 input device.
FIG. 4 shows an exemplary labeling of the LEDs in the FIG. 1 input device.
FIG. 5 shows an exemplary fork-shaped switch that may be utilized in the FIG. 1 input device.
FIGS. 6(a) and 6(b) show a first example of the manner in which correct labels may be determined for each LED of an input device in accordance with the invention.
FIGS. 7(a) and 7(b) show a second example of the manner in which correct labels may be determined for each LED of an input device in accordance with the invention.
FIGS. 8(a), 8(b) and 8(c) illustrate the measure of points using a designated principal point in accordance with the invention.
FIG. 9 shows an exemplary imaging geometry of an input device in accordance with the invention.
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 shows an illustrative embodiment of an input device 10 in accordance with the invention. The input device 10 provides a minimum of three positional parameters, i.e., left-right or X, forward-backward or Y, and up-down or Z, and three angular parameters, i.e., roll, pitch and yaw, and thus six degrees of freedom for controlling, e.g., the movement of three-dimensional objects in numerous applications. The input device 10 includes a handle 12 and a plate 14 attached to the top of the handle 12.
Several light emitting devices (LEDs) 16 are mounted on an upper planar surface of the plate 14. These LEDs can be turned on and off using switches 15-1, 15-2 and 15-3 attached to the device. One of the LEDs 16 is designated as a "trigger LED" and is turned on and off using a "trigger switch" 15-1 attached to the front of the handle. When the trigger switch 15-1 is pressed, the trigger LED turns on, i.e., emits light, and when the trigger switch 15-1 is released, the trigger LED turns off, i.e., stops emitting light. The other switches 15-2 and 15-3 can control multiple LEDs simultaneously. The LEDs 16 can be of the same color or of different colors and can be arranged in a variety of different patterns. The LEDs 16 may be powered, e.g., by battery cells that are encased in the handle 12. The LEDs 16 may be, for example, light emitting diodes or other similar light emitting devices.
It should be noted that the use of LEDs 16 in the illustrative embodiment is by way of example, and alternative embodiments of the invention may use other types of light sources. It is also possible to replace one or more of the LEDs with reflective markers. The term "light source" as used herein is intended to include such reflective markers, as well as any other type of device capable of generating or directing light.
FIG. 2 illustrates an exemplary configuration in which the input device of FIG. 1 operates in conjunction with a computer system 20. The computer system 20 includes a computer 21, monitor 22 and keyboard 23. A camera 24 connected to the computer 21 is positioned so as to detect light emitted from the LEDs of the input device 10. The input device 10 and computer system 20 are arranged on a flat surface 25 such as a table top or desktop. A lower portion of the handle 12 may rest directly on the surface 25, or alternatively the handle can be configured to rest on a cradle, a ball or other support device so that the device can be rotated easily. In operation, a user holds the input device 10 by its handle 12 under the camera 24. When the user moves the input device 10 under the camera 24, the camera 24 sends video signals to the computer 21 and associated control software running on the computer 21 computes the three-dimensional (3D) position and orientation of the input device 10.
The 3D position is represented in terms of three numbers (X, Y, Z) and 3D orientation is represented in terms of three angles (roll, pitch, yaw). These six parameters, which are also referred to as control parameters, are used to control applications that require multi-dimensional input. An example of one such application is a 3D graphical editor which allows users to move and rotate objects. In this case, a selected object displayed on the monitor 22 can be moved and rotated in 3D by simply moving and rotating the input device 10 under the camera 24. Another example application is a computer video game in which a user controls a virtual 3D flight by moving and rotating the input device 10. The input device 10 can also be used in conjunction with applications involving the control of real-world objects, such as, e.g., robot arms.
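By way of illustration only, the six control parameters might be packaged as a simple record before being handed off to an application; the type and field names below are hypothetical and do not appear in the patent.

    from dataclasses import dataclass

    @dataclass
    class ControlParameters:
        """Hypothetical container for the six control parameters (illustrative only)."""
        x: float      # left-right translation
        y: float      # forward-backward translation
        z: float      # up-down translation
        roll: float   # rotation about the depth axis
        pitch: float  # rotation about the left-right axis
        yaw: float    # rotation about the up-down axis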
Although FIG. 2 illustrates the operation of input device 10 in conjunction with a conventional desktop computer system 20, the input device 10 can of course be utilized with other types of information processing devices, such as portable or palmtop computers, workstations, personal digital assistants (PDAs), televisions, set-top boxes, etc. The term "computer" as used herein is intended to include these and other processor-based devices.
FIG. 3 is a block diagram showing the architecture of the FIG. 2 configuration. The camera 24 detects light emitted by the LEDs of device 10, and sends a corresponding video signal to control software 30. The control software 30 runs on the computer 21; e.g., it may be stored in an electronic or disk-based memory of computer 21 and executed by a microprocessor or other digital data processor of the computer 21. As will be described in greater detail below, the control software 30 analyzes the images from the camera 24 in order to identify the particular LEDs that are turned on; estimates the position and orientation of the input device 10 and interprets control signals sent from device 10; and sends this position, orientation and control signal information to applications, e.g., applications 32-1, 32-2 and 32-3, that expect control from the input device 10.
In the illustrative embodiment as described in conjunction with FIGS. 1-3, the input device 10 is wireless, and the only means of communication between the device 10 and the computer 21 is through the camera 24. Alternative embodiments may include other arrangements. For example, the input device 10 may be configured to include, e.g., a direct link to the computer 21 via a cable or wireless connection. Such a direct link may be used to transmit and receive the above-noted additional control signals. As another example, the input device 10 may be configured to include a force feedback mechanism providing tactile feedback to the user.
FIG. 4 shows the input device 10 with an exemplary labeling of the LEDs 16. It should be noted that the particular pattern of LEDs on the upper surface of plate 14 is exemplary only, and numerous other LED patterns may be used in other embodiments. The LEDs 16 are labeled LED-T, LED-U, LED-D, LED-1, LED-2, LED-3, LED-4 and LED-5 as shown. When the user presses the trigger switch 15-1, the trigger LED (LED-T) turns on and this is detected by the control software 30 as a "click event." This click event is then sent to all applications communicating with the control software. Although this embodiment of input device 10 includes only one trigger switch and one trigger LED, other embodiments can include as many such combinations as required.
Switch 15-2 of input device 10 controls the LEDs labeled LED-U and LED-D. These LEDs are normally off. LED-U comes on when the user presses an upper part of switch 15-2 and turns off when the upper part is released. Similarly, LED-D comes on when a lower part of switch 15-2 is pressed and turns off when the lower part is released. The switch 15-2 is configured such that LED-U comes on only when the upper part of the switch is pressed and LED-D comes on only when the lower part of the switch is pressed. Both parts cannot be pressed at the same time, and hence only one of LED-U and LED-D can be on at any given time. Switch 15-2 can be implemented in a number of different ways. For example, it can be a small stick-like protrusion that, when pushed up, turns on LED-U and, when pushed down, turns on LED-D.
FIG. 5 shows another possible implementation of the switch 15-2. In this implementation, the switch 15-2 comprises a fork-shaped protrusion into which the user can slide a finger, e.g., a thumb, such that the user controls the LEDs by moving the finger up and down. Other suitable arrangements, such as a ring into which the user inserts a finger, could also be used.
Switch 15-3 of input device 10 controls the LEDs labeled LED-1, LED-2, LED-3, LED-4 and LED-5. When the user desires to utilize the input device 10, switch 15-3 is flipped to an on position, and the above-noted five LEDs turn on and remain on until the switch is flipped back to an off position. These five LEDs are collectively referred to herein as "principal LEDs" and remain on whenever the switch 15-3 is in the on position. The switch 15-3 thus effectively serves as an on/off switch for the device 10.
The control software 30 will now be described in greater detail. As previously noted, the control software 30 analyzes images of the input device 10 as obtained via the camera 24, and computes the position and orientation of the device as well as additional control information. An exemplary processing algorithm implemented by the control software 30 includes the following five main steps:
1. Detection and localization of the LEDs that are visible in the image.
2. Association of each LED with its correct label.
3. Determination of the position and orientation of the device from the LED positions in the image.
4. Generation of additional control signals.
5. Communications with applications.
Each of these steps is described in detail below.
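Purely as a sketch of how these five steps might be organized in software, the following Python outlines one frame of processing; all function and parameter names are hypothetical, since the patent does not prescribe any particular implementation.

    from typing import Callable, Dict, Sequence, Tuple

    Point = Tuple[float, float]

    def process_frame(frame,
                      detect_leds: Callable[[object], Sequence[Point]],
                      label_leds: Callable[[Sequence[Point]], Dict[str, Point]],
                      estimate_pose: Callable[[Dict[str, Point]], Tuple[tuple, tuple]],
                      applications: Sequence) -> None:
        """One iteration of the control loop: Steps 1-5 for a single video frame."""
        centers = detect_leds(frame)                    # Step 1: localize lit LEDs
        labeled = label_leds(centers)                   # Step 2: attach correct labels
        position, orientation = estimate_pose(labeled)  # Step 3: X, Y, Z and roll, pitch, yaw
        click = "LED-T" in labeled                      # Step 4: e.g., trigger click event
        for app in applications:                        # Step 5: notify applications
            app.update(position, orientation, click)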
Step 1. Step 1 involves analyzing each image, e.g., frame, of the video signal generated by camera 24 to determine the pixel locations, i.e., coordinates, of the LEDs that are on. Any of a number of well-known conventional feature extraction or region extraction techniques may be used to implement this function. In the illustrative embodiment, a conventional region extraction technique is used to determine the regions in the image that correspond to the LEDs that are on. For each such LED, the algorithm computes the pixel coordinates (x, y) of the center of the corresponding region. Thus, the output of Step 1 is a set of pixel locations (x, y) that correspond to the LED centers in the image.
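A minimal sketch of Step 1 follows, assuming lit LEDs appear as small saturated blobs in a grayscale frame; the threshold value, the minimum-area filter and the use of OpenCV's connected-component analysis are implementation choices, not details taken from the patent.

    import cv2
    import numpy as np

    def detect_led_centers(gray_frame: np.ndarray, threshold: int = 200) -> list:
        """Return the (x, y) pixel coordinates of the centers of bright regions."""
        # Keep only bright pixels; lit LEDs are assumed to nearly saturate the sensor.
        _, binary = cv2.threshold(gray_frame, threshold, 255, cv2.THRESH_BINARY)
        # Extract connected bright regions; their centroids approximate LED centers.
        count, _, stats, centroids = cv2.connectedComponentsWithStats(binary)
        centers = []
        for i in range(1, count):  # label 0 is the background
            if stats[i, cv2.CC_STAT_AREA] >= 2:  # discard single-pixel noise
                centers.append((float(centroids[i][0]), float(centroids[i][1])))
        return centers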
Step 2. Each LED 16 of the input device 10 has a unique label or name associated with it as shown in FIG. 4. As the device 10 is moved and rotated under the camera 24, each LED gets projected onto a different location in the image and the goal of Step 2 is to identify each of the LEDs in the image, i.e., to associate each LED in the image with its correct label.
As an example, FIG. 6(a) shows how the image might look when only the principal LEDs (LED-1, . . . , LED-5) are on. The above-described Step 1 of the algorithm will determine the pixel locations (x_i, y_i) of the LED centers in the image. Since there are five LEDs that are visible, there will be five such (x_i, y_i) pairs. Step 2 takes these pixel locations as input and determines the correct label for each LED, as illustrated in FIG. 6(b). Another example image and labeling is shown in FIGS. 7(a) and 7(b).
Step 2 can be implemented in a number of different ways. One way is to use LEDs of different colors or other physical characteristics and determine their labels based on these characteristics. Another possibility is to track the LEDs from one frame to the next using motion tracking algorithms. The latter generally requires an initialization phase in which the device 10 is shown under the camera 24 at a specific orientation so that the labels are easily determined. After this initialization phase, LEDs in the current video frame are tracked using information available from the previous frames.
A preferred implementation of Step 2 in accordance with the invention does not require LEDs of different physical characteristics and does not perform motion tracking and hence requires no initialization. This implementation is described below.
Consider five points P1, P2, P3, P4 and P5 that lie on a plane. Define a quantity called the "basic measure" (BM), which is a function of the five points with one of them chosen as a "principal point." For example, the basic measure of the above five points with P5 chosen as the principal point is denoted BM(P5; P1, P2, P3, P4) and is computed in the following manner.
FIG. 8(a) shows the five points. To compute BM(P5; P1, P2, P3, P4), first draw line segments from P5 to P1, P2, P3 and P4, as shown in FIG. 8(b). Next, as shown in FIG. 8(c), choose any line L that intersects the line segments P5P1, P5P2, P5P3 and P5P4. Let the points of intersection be A, B, C and D. These points must be labeled sequentially from left
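The excerpt breaks off before the basic measure is actually defined from A, B, C and D. The classical quantity that is invariant for four collinear points under perspective projection is their cross ratio, so the following sketch assumes the basic measure is built from it; the formula, like all names here, is an assumption and is not taken verbatim from the patent. Computing the cross ratio directly from the pencil of lines through the principal point gives the same value as intersecting any transversal line L, which is why the choice of L in FIG. 8(c) does not matter.

    import numpy as np

    def _cross(u, v):
        """2D cross product; proportional to the sine of the angle between u and v."""
        return u[0] * v[1] - u[1] * v[0]

    def basic_measure(principal, p1, p2, p3, p4):
        """Assumed form of BM(principal; p1, p2, p3, p4): the cross ratio of the
        pencil of lines from the principal point through the other four points.
        It equals the cross ratio of A, B, C, D for any transversal line L."""
        d = [np.asarray(p, float) - np.asarray(principal, float) for p in (p1, p2, p3, p4)]
        return (_cross(d[0], d[2]) * _cross(d[1], d[3])) / (_cross(d[0], d[3]) * _cross(d[1], d[2]))

    def project(H, p):
        """Apply a 3x3 homography (a perspective projection of the plane) to a point."""
        v = H @ np.array([p[0], p[1], 1.0])
        return v[:2] / v[2]

    # Quick check of the invariance property: the measure is unchanged when all
    # five points undergo the same perspective projection.
    pts = [(0.0, 0.0), (4.0, 1.0), (3.0, 5.0), (-2.0, 4.0)]  # P1..P4
    p5 = (1.0, 2.0)                                          # principal point P5
    H = np.array([[1.1, 0.2, 3.0], [-0.1, 0.9, 1.0], [0.001, 0.002, 1.0]])
    before = basic_measure(p5, *pts)
    after = basic_measure(project(H, p5), *(project(H, p) for p in pts))
    assert abs(before - after) < 1e-9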