WO2013091016A1

WO2013091016A1 - Structured light system for robust geometry acquisition

Info

Publication number: WO2013091016A1
Application number: PCT/AU2012/001587
Authority: WO
Inventors: David John Battle; David John Maunder; Donald James Bone
Original assignee: Canon Kabushiki Kaisha
Priority date: 2011-12-23
Filing date: 2012-12-21
Publication date: 2013-06-27
Also published as: AU2011265572A1

Abstract

Disclosed is a method of determining coordinates (790) of a reference point (780) on an object in three-dimensional space in a scene captured by an image sensor. The object is irradiated by light sources (710,720) which are modulated at a different spatio-temporal frequency. The method generates a composite phase signal (630,640) on the object by a predetermined geometric arrangement of the light sources, and captures (910) the composite phase signal at the reference point with the image sensor. A processing arrangement determines, from the captured composite phase signal, a set of measured positioning parameters (measured carrier phase ψ_lm) independent of a position of the image sensor. The measured positioning parameters from the light sources are used for determining the coordinates of the reference point.

Description

.

- 1-

STRUCTURED LIGHT SYSTEM FOR ROBUST GEOMETRY ACQUISITION

REFERENCE TO RELATED PATENT APPLICATIONS

[0001] This application c.laims the benefit under 35 U.S.C. §119 of the filing date of Australian Patent Application No. 201 1265572, filed December 23, 2011 , hereby incorporated by reference in its entirety as if fully set forth herein.

TECHNICAL FIELD

[0002] The present invention relates generally to the photographic acquisition of detailed geometric information regarding a scene and, in particular, to the use of modulated light sources to make this information robust and independent of imaging system calibrations. Applications of the invention include metrology, robot part picking, reverse engineering and geometry-based post processing of digital images.

BACKGROUND

[0003] Digital images represent projections of the three-dimensional world into two dimensions from the particular viewpoint of a camera. There are many situations, however, where it is desirable for a human, computer or mobile robot to possess additional information regarding a captured scene, such as the relative distances, or depths, of objects contained within the scene. The ability to record object distances in an image allows a photographer, for example, to selectively blur features in the background so as to enhance the salience of foreground objects. Computer security systems employing image analysis algorithms are greatly assisted in segmenting objects of interest where appropriate geometric information is available. Accurate knowledge of scene geometry is also important in the case of mobile robots, which may be required to negotiate and handle complex objects in the real world.

[0004] Several methods are known for acquiring depth information from scenes. Most of these belong to one of the three general categories of time of flight (TOF), depth from defocus (DFD), and triangulation methods. In TOF methods, light propagation is timed PC I

- 2 - through projection, reflection and reception to directly measure object distances. In DFD methods, variations of blur size throughout the depth of field of a camera are used to gauge approximate ranges. Whereas these two methods involve considerable expense or complexity to achieve moderate accuracy and speed, triangulation methods, which relate lateral object displacements to depth through straightforward triangulation, are known to be fast, accurate and inexpensive.

[0005] Within the triangulation category of depth capture methods, there are both passive and active branches. Passive triangulation methods essentially amount to stereo vision, wherein cameras record disparities in feature locations between multiple viewpoints. Problems arise, however, when the scene itself lacks sufficient feature points to permit unambiguous triangulation, in which case the depth . map becomes sparse and loses robustness.

[0006] In active stereo methods, one camera is replaced by alight source projecting a specially designed— or structured - pattern of illumination. This approach has important advantages over passive stereo methods, because the projected patterns make up for any lack of features in the scene and also improve robustness against variations in ambient lighting.

[0007] Notwithstanding the considerable advances made in structured light technologies to date, there are still significant problems that limit the effectiveness of even the most advanced systems. Chief amongst these are a lack of robustness due to shadows and occlusion, which can be expected in any real-world scene. Loss of 3D information through shadows and occlusion comes as a direct, result of a loss of dimensionality in the captured image. In conventional structured light systems, scenes lose dimensionality even before their images have been captured on account of using a single direction of illumination. For surface orientations oblique to the illumination and/or the . camera vectors, the reliability of reconstructed depths is also seriously compromised by shadowing and intensity spreading. Loss of information in captured geometries poses particular problems in robotics, where the analysis of object shapes determines how they are negotiated or manipulated.

[0008] While multi-projector and multi-camera systems have been put forward as a means of improving the performance of structured light systems, it has generally proved difficult _^

- 3 - to de-multiplex and fuse the resulting information to form coherent geometry estimates. Sequential pattern projection is common, but this makes systems slow, and still requires a data fusion step. Aside from these problems, projectors of the type used for precision structured lighting are usually cumbersome and expensive. Multiple projector systems, therefore, tend to be bulky, with little chance of mobile deployment.

[0009] Another problem with existing structured light technology is the distortion inherent in camera and projector optics. On account of X and Y scene coordinates being inferred from sensor pixel positions, rather than actually being measured, the accuracy of triangulated Z coordinates is only as good as the distortion calibrations. This poses difficulties, for example, when strongly distorting fisheye lenses are used to obtain a wide field of view.

[0010] In summary, a system is desired to be capable of utilising multiple sources of illumination without being cumbersome or expensive. Such a system would be expected to scale well with the number of sources, implying that the sources themselves should be simple and inexpensive with minimal communication and control requirements.

[0011] Such an improved depth ranging system would also be independent of the kind of optical distortion that currently needs to be calibrated out of depth calculations, implying that the use of optical elements should be minimised and that depth calculations should not rely on implicit correspondences between pixel positions and scene coordinates.

[0012] Lastly, an improved depth ranging system should be capable of acquiring information from multiple sources simultaneously, and efficiently fusing the information into a coherent geometric description of the scene. Such a system would then constitute a geometry acquisition system, rather than simply a depth mapping system.

[0013] Simplifications in projector design, in which virtually no optics are employed, have recently been proposed in "Development of a 3D vision range sensor using equiphase light section method", Kumagai, M, Journal of Robotics and Mechatronics, vol. 17, no. 2, pp. 110-115, 2005. These simplifications involve projecting light through a rotating mask of sinusoidally varying transparency such that the resulting illumination is spatio- temporally modulated. When successive video frames are processed, the phases of sinusoidal components in the intensity, calculated with respect to some angular datum, can PC I

- 4 - be used to estimate the angular displacements. By associating these measured angular displacements with calibrated angular displacements of pixels within the field of view of the camera, the depths of scene points can be triangulated for each camera-source pair. Unfortunately, this scheme is not ideal, in that the estimated scene depths remain strongly dependent on both the location of the camera and its inherent calibration parameters.

SUMMARY

[001 ] According to one aspect of the present disclosure, there is provided a method of determining coordinates of a reference point on an object in three-dimensional space in a scene captured by an image sensor. The object is irradiated by light sources which are modulated at a different spatio-temporal frequency. The method generates a composite intensity (light) signal (which has multiple components each providing a phase angle - called herein a composite phase signal) on the object by a predetermined geometric arrangement of the light sources, and captures the composite phase signal at the reference point using the image sensor. A processing arrangement determines, from the captured composite phase signal, a set of measured positioning parameters independent of a position of the image sensor. The measured positioning parameters from the light sources are used for determining the coordinates of the reference point.

[0015] According to another aspect of the present disclosure, there is provided a method of determining at least two coordinates of a reference point on an object in three-dimensional space in a scene captured by an image sensor, the object being irradiated simultaneously by a plurality of spatio-temporally modulated light sources, at least one of the plurality of spatio-temporally modulated light sources being modulated at a different spatio-temporal frequency to another one of said plurality of spatio-temporally modulated light sources, the method comprising the steps of:

generating a composite phase signal on the object in the scene by a predetermined geometric arrangement of the plurality of spatio-temporally modulated light sources, each one of said plurality of light sources having a known position in the three-dimensional space;

capturing the composite phase signal at the reference point with the image sensor; determining from the captured composite phase signal, a set of measured positioning parameters, used in the determination of at least two coordinates of the PUJ

- 5 - reference point, from the plurality of light sources, wherein the set of measured positioning parameters is determined independent of a position of the image sensor; and

determining at least two coordinates at the reference point using the set of measured positioning parameters from the plurality of light sources.

[0016] Preferably one of the at least two coordinates is a depth coordinate.

[0017] Desirably each of the plurality of light sources is characterised by at least one known positioning parameter with respect to a reference line through said reference point.

[0018] Advantageously the difference in spatio-temporal frequency of the at least one light source results from at least one of:

(i) a different spatial frequency of a pattern on the at least one light source,

(ii) a different rotation velocity of the at least one light source; and

(iii) a different orientation of the at least one light source.

[0019] Desirably each light source comprises multiple intersecting patterns to create a two-dimensional signal. Preferably the patterns are orthogonal.

[0020] In a specific implementation each said light source may comprise a rotating pattern surrounding the light source.

[0021] Desirably the composite phase signal forms a wavefront that is radial to the corresponding light source.

[0022] Typically the patterns are sinusoidal.

[0023] Generally the measured positioning parameters comprise an angular displacement from the light source.

[0024] Most typically the object is in a three-dimensional space and the method determines the three-dimensional coordinates of the reference point in the three- dimensional space.

[0025] Desirably the positioning parameters are measured with respect to a reference line through each of the plurality of spatio-temporally modulated light sources, thereby being independent of a position of the image sensor. PCI

- 6 -

[0026] According to another aspect of the present disclosure, there is provided a robotic system comprising:

a robotic manipulator arranged for operation in association with an object in three- dimensional space;

an image sensor arranged for imaging a scene formed at least by the object;

a plurality of spatio-temporally modulated light sources configured to

simultaneously illuminate the scene, at least one of said plurality of spatio-temporally modulated light sources being modulated at a different spatio-temporal frequency to another one of said plurality of spatio-temporally modulated light sources, each one of said plurality of light sources having a known position in the three-dimensional space;

a computing device connected to the robotic manipulator, the image sensor and the each of the light sources and configured to:

. generate a composite phase signal^' on the object in the scene by a predetermined geometric arrangement of the plurality of spatio-temporally modulated light sources;

capture the composite phase signal at a reference point on the object with the image sensor;

determine from the captured composite phase signal, a set of measured positioning parameters, used in the determination of at least two coordinates of the reference point, from the plurality of light sources, wherein the set of measured positioning parameters is determined independent of a position of the image sensor;

determining the at least two coordinates of the reference point using the set of measured positioning parameters from the plurality of light sources; and

controlling a position of the robotic manipulator based on the determined coordinates of the reference point.

[0027] Preferably the image sensor is mounted upon the robotic manipulator.

Alternatively the image sensor may be located at the reference point, where desirably the image sensor can be a photodiode.

[0028] Other aspects are also disclosed. P

- 7 -

BRIEF DESCRIPTION OF THE DRAWINGS

[0029] At least one embodiment of the present invention will now be described with reference to the following drawings, in which:

[0030] Fig. 1 illustrates the intersections of iso-phase planes generated by a pair of cylindrical spatio-temporally modulated light sources;

[0031] Fig. 2 is a plan view of a pair of spatio-temporally modulated sources in relation to two objects in the scene and two possible camera viewpoints;

[0032] Figs. 3A and 3B illustrate a typical intensity signal projected by a spatio-temporally modulated light source comprising two superimposed sinusoidal carriers with distinct periods, and the appearance of the signal in the Fourier domain;

[0033] Figs. 4A and 4B illustrate a skew sinusoidal pattern possessing modulation in both horizontal (X) and vertical (Y) orientations along with mappings of such a pattern into cylindrical and spherical geometries in one implementation according to the present disclosure;

[0034] Fig. 5 is a visualisation of an iso-phase surface of a spherically mapped skew sinusoidal mask and how light rays emanating from the centre of the sphere are mapped to spatial coordinates with varying azimuth and elevation angles;

[0035] Figs. 6A and 6B illustrate the construction of a pattern possessing skew-orthogonal sinusoidal components with two distinct periods along with mappings of such a pattern into cylindrical and spherical geometries in an implementation according to the present disclosure;

[0036] Fig. 7 illustrates the deployment of two spatio-temporally modulated light sources, each projecting multiple skewed sinusoidal patterns into a 3-D scene according to a preferred implementation;

[0037] Fig. 8 illustrates the typical convergence of the reconstruction algorithm when fusing data from multiple spatio-temporal light sources in another implementation; PC

- 8 -

[0038] Fig. 9 is a schematic block diagram illustrating the sequence of steps in processing captured frames from a video camera into estimates of scene geometry according to a preferred implementation;

[0039] Fig. 10 is a schematic illustration of a typical robotic part picking application involving multiple spatio-temporally modulated light sources;

[0040] Figs. 1 1A and 1 IB form a schematic block diagram of a general purpose computer system upon which arrangements described can be practiced, and

[0041] Fig. 12 illustrates a system with three spatio-temporally modulated light sources oriented so as to allow a determination of the three dimensional location of any point in the scene.

DETAILED DESCRIPTION INCLUDING BEST MODE

[0042] Fig. 10 illustrates a robotic system 1000 in which a manipulator 1060 controlled by a computer 1070 is tasked with handling various objects 1030. The system 1000 therefore, needs to know or otherwise estimate or determine precise spatial locations of the objects 1030, and particularly in association with manipulator 1060, whose location in the 3D space will be known. Also shown in Fig. 10 is a video camera 1050 operating as an image sensor for capturing images of the scene in which the objects 1030 are located at a high frame rate. Typically in such a robotic system 1000, the image sensor (camera) 1050 may be conveniently mounted to a peripheral limb of the manipulator 1060. The system 1000 also includes multiple light sources 1010, 1020 and 1040 configured at known locations around the periphery of the scene for substantially simultaneous irradiation of the scene. These light sources 1010, 1020 and 1040 illuminate the scene coincidently, but are spatio- temporally modulated on account of radiating intensities that are functions of both position and time. At least two such light sources are required according to the present disclosure.

[0043] The multiple light sources are preferably modulated at different carrier frequencies, as the resulting diversity of illumination can improve robustness to shadows and occlusions, as discussed above. The difference in frequency between any two light sources may be from any one or combination of:

(i) a different spatial frequency (or corresponding wavelength) of a pattern of PC J

- 9 - one of the light sources,

(ii) a different rotation (or angular) velocity of one of the light sources; and

(iii) a different orientation of one of the light sources (for example one light source having a different axis of rotation to another).

[0044] The arrangements presently disclosed utilise L such spatio-temporal ly modulated sources (where L is generally greater than 1 , and equal to 3 in the example of Fig. 10) to achieve robustness against occlusions and shadows, as well as partial or complete independence (depending on the specific implementation) of the calibration parameters of the camera 1050. Importantly, the scene geometry afforded by the objects 1030 is estimated in the coordinate frame of the sources 1010, 1020 and 1040, which is stationary.

[0045] Figs. 1 1 A and 1 IB depict the computer system 1070, which may be implemented using a general-purpose computer, and upon which the various arrangements described can be practiced.

[0046] As seen in Fig. 1 1A, the computer system 1070 includes: a computer module 1 101 ; input devices such as a keyboard 1102, a mouse pointer device 1103, a scanner 1 126, the camera 1050, and a microphone 1180; and output devices including the light sources 1010, 1020, 1040, a display device 11 14 and loudspeakers 1117. An external Modulator- Demodulator (Modem) transceiver device 1116 may be used by the computer

module 1 101 for communicating to and from a communications network 1120 via a connection 1121. The communications network 1120 may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 1 121 is a telephone line, the modem 1 116 may be a traditional "dial- up" modem. Alternatively, where the connection 1 121 is a high capacity (e.g., cable) connection, the modem 1 116 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 1120.

[0047] The computer module 1 101 typically includes at least one processor unit 1 105, and a memory unit 1106. For example, the memory unit 1106 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer

i

module 1101 also includes an number of input/output (I/O) interfaces including: an audio- video interface 1107 that couples to the video display 1114, loudspeakers 11 17 and microphone 1180; an I/O interface 1 113 that couples to the keyboard 1102, mouse 1103, _|(

- 10 - scanner 1126, camera 1050 and optionally a joystick or other human interface device (not illustrated); and an interface 1108 for the external modem 1 116 and light sources 1010, 1020 and 1040. In some implementations, the modem 1116 may be incorporated within the computer module 1 101, for example within the interface 1 108. The computer module 1 101 also has a local interface 1 111, which permits coupling of the computer system 1070 via a connection 1 123 to the manipulator 1060.

[0048] The I/O interfaces 1 108, 1 1 1 1 and 1 1 13 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 1109 are provided and typically include a hard disk drive (HDD) 1 1 10. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 1112 is typically provided to act as a non-volatile source of data. Portable memory devices, such optical disks (e.g., CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable, external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the system 1070.

[0049] The components 1 105 to 1113 of the computer module 1 101 typically

communicate via an interconnected bus 1104 and in a manner that results in a conventional mode of operation of the computer system 1070 known to those in the relevant art. For example, the processor 1 105 is coupled to the system bus 1 104 using a connection 1118. Likewise, the memory 1106 and optical disk drive 11 12 are coupled to the system bus 1 104 by connections 11 19. Examples of computers on which the described

arrangements can be practised include IBM-PC's and compatibles, Sun Sparcstations™, Apple Mac™ or a like computer systems.

[0050] The methods of coordinate determination may be implemented using the computer system 1070 wherein the processes of Figs. 1 to 10, to be described, may be implemented as one or more software application programs 1133 executable within the computer system 1070. In particular, the steps of the methods of depth mapping and coordinate determination are effected by instructions 1 131 (see Fig. 1 IB) in the software 1 133 that are carried out within the computer system 1070. The software instructions 1 131 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the PCI

• - 1 1 - corresponding code modules performs the depth and coordinate determination methods and a second part and the corresponding code modules manage a user interface between the first part and the user.

[0051] The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 1070 from the computer readable medium, and then executed by the computer system 1070. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 1070 preferably effects an

advantageous apparatus for coordinate determination, and associated depth determination.

[0052] The software 1133 is typically stored in the HDD 1110 or the memory 1 106. The software is loaded into the computer system 1070 from a computer readable medium, and executed by the computer system 1070. Thus, for example, the software 1 133 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 1 125 that is read by the optical disk drive 11 12. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer system 1070 preferably effects an apparatus for determining 3D coordinates and/or depth.

[0053] In some instances, the application programs 1 133 may be supplied to the user encoded on one or more CD-ROMs 1125 and read via the corresponding drive 1 112, or alternatively may be read by the user from the network 1 120. Still further, the software can also be loaded into the computer system 1070 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 1070 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray™ Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 1 101. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computer module 1 101 include radio or infra-red transmission channels as well as a PC l

network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.

[0054] The second part of the application programs 1133 and the corresponding code modules mentioned above may be executed to implement one or more graphical user, interfaces (GUIs) to be rendered or otherwise represented upon the display 1 114. Through manipulation of typically the keyboard 1 102 and the mouse 1103, a user of the computer system 1070 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be

implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 1117 and user voice commands input via the microphone 1180.

[0055] Fig. 1 IB is a detailed schematic block diagram of the processor 1 105 and a "memory" 1134. The memory 1 134 represents a logical aggregation of all the memory modules (including the HDD 1 109 and semiconductor memory 1 106) that can be accessed by the computer module 1 101 in Fig. 11 A.

[0056] When the computer module 1 101 is initially powered up, a power-on self-test (POST) program 1 150 executes. The POST program 1150 is typically stored in a

ROM 1 149 of the semiconductor memory 1106 of Fig. 11A. A hardware device such as the ROM 1149 storing software is sometimes referred to as firmware. The POST program 1 150 examines hardware within the computer module 1101 to ensure proper functioning and typically checks the processor 1105, the memory 1 134 (1109, 1 106), and a basic input-output systems software (BIOS) module 1151, also typically stored in the ROM 1 149, for correct operation. Once the POST program 1150 has run successfully, the BIOS 1151 activates the hard disk drive 1110 of Fig. 11A. Activation of the hard disk drive 1 1 10 causes a bootstrap loader program 1152 that is resident on the hard disk drive 1 110 to execute via the processor 1105. This loads an operating system 1 153 into the RAM memory 1 106, upon which the operating system 1 153 commences operation. The operating system 1 153 is a system level application, executable by the processor 1 105, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface. j

- 13 -

[0057] The operating system 1153 manages the memory 1134 (1109, 1 106) to ensure that each process or application running on the computer module 1 101 has sufficient memory in which to execute without colliding with memory allocated to another process.

Furthermore, the different types of memory available in the system 1070 of Fig. 1 1 A must be used properly so that each process can run effectively. Accordingly, the aggregated memory 1 134 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 1070 and how such is used.

[0058] As shown in Fig. 1 IB, the processor 1 105 includes a number of functional modules including a control unit 1139, an arithmetic logic unit (ALU) 1140, and a local or internal memory 1 148, sometimes called a cache memory. The cache memory 1148 typically includes a number of storage registers 1144 - 1 146 in a register section. One or more internal busses 1141 functionally interconnect these functional modules. The ^" processor 105 typically also has one or more interfaces 1142 for communicating with external devices via the system bus 1 104, using a connection 1 1 18. The memory 1 134 is coupled to the bus 1 104 using a connection 1 119.

[0059] The application program 1 133 includes a sequence of instructions 1 131 that may include conditional branch and loop instructions. The program 1133 may also include data 1132 which is used in execution of the program 1133. The instructions 1 131 and the data 1 132 are stored in memory locations 1 128, 1 129, 1130 and 1135, 1 136, 1137, respectively. Depending upon the relative size of the instructions 1 131 and the memory locations 1 128-1 130, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 1130. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 1128 and 1 129.

[0060] In general, the processor 1 105 is given a set of instructions which are executed therein. The processor 1 105 waits for a subsequent input, to which the processor 1105 reacts to by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 1 102, 1103, data received from an external source across one of the networks 1 120, 1 102, data retrieved from one of the storage devices 1106, 1109 or data retrieved from a storage medium 1 125 inserted into the corresponding reader 1 1 12, all depicted in Fig. 1 1 A. The execution of a set of the instructions may in spme cases result in output of data. Execution may also involve storing data or variables to the memory 1134.

[0061] The disclosed depth and coordinate measurement arrangements use input variables 1 154, which are stored in the memory 1134 in corresponding memory locations 1 155, 1 156, 1 157. The arrangements produce output variables 1 161, which are stored in the memory 1134 in corresponding memory locations 1162, 1 163, 1164.

Intermediate variables 1 158 may be stored in memory locations 1159, 1160, 1 166 and 1167.

[0062] Referring to the processor 1 105 of Fig. 1 IB, the registers 1144, 1145, 1146, the arithmetic logic unit (ALU) 1 140, and the control unit 1 139 work together to perform sequences of micro-operations needed to perform "fetch, decode, and execute" cycles for every instruction in the instruction set making up the program 1 133. Each fetch, decode, and execute cycle comprises:

(a) a fetch operation, which fetches or reads an instruction 1131 from a memory location 1 128, 1 129, 1 130;

(b) a decode operation in which the control unit 1139 determines which instruction has been fetched; and

(c) an execute operation in which the control unit 1139 and/or the ALU 1 140 execute the instruction.

[0063] Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 1 139 stores or writes a value to a memory location 1132.

[0064] Each step or sub-process in the processes of Figs. 1 to 10 is associated with one or more segments of the program 1133 and is performed by the register

section 1 144, 1 145, 1 147, the ALU 1 140, and the control unit 1139 in the processor 1105 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 1133. [0065] The methods or parts thereof may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub functions of depth and coordinate mapping. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories.

[0066] Fig. 1 illustrates one arrangement in which L=2 and two spatio-temporally modulated light sources 1 10 and 120 achieve camera independence in the X and Z coordinates. While still relying on camera calibration for coordinate estimates in the Y-axis, such a.system is useful in 2-D applications.

[0067] Generally, spatial modulation of the I^th ofL sources is achieved by virtue of a grey- scale mask whose transmittance is a linear sum of M superposed sinusoidal functions of the circumferential angle 0_f 170 in the X-Z plane as illustrated. In the arrangement depicted in Fig. 1 , (L = 2, M = 1) the masks are mapped onto cylinders with horizontal circumferential frequencies of K_tm cycles around the circumference. Hence, the intensity radiated by the I^th source is given by

M

/i « T [l +∞s(Ki_m0.)]

m=i which is a linear sum of sinusoidal terms, each offset (in this instance by a value of one) to maintain overall positivity.

[0068] In its static condition, the pattern of light intensity radiated through each mask from a line or point source on an axis of the mask is proportional to the mask transmittance. It should be noted that, although the light source described would ordinarily radiate in all directions, with an attendant rapid loss of intensity with range, it is straightforward to constrain the angle of illumination to any desired value by using suitable internal reflectors. The central aspect of importance here is that there are no refracting optics in the light path of the source. Unlike conventional projectors, therefore, which use lenses and are thus limited to finite depths of field, the spatio-temporal light source described radiates an unfocussed, diverging field. PC

- 16 -

[0069] Generally, for a light source network comprising L spatio-temporal sources, the temporal component of modulation for the I^th source is achieved by rotating its cylindrical mask at a velocity of /V_t revolutions per second, where the specific value of Ni is characteristic of the I^th source.

[0070] The far-field intensity of the I^th source then comprises a rotating sum of sinusoidal carriers whose phases can be directly related to the instantaneous mechanical angle 6_t 170 through which the cylinder has rotated with respect to the datum θ₀ 180, which is common to all sources. The oscillation frequency of each carrier intensity is therefore determined by the horizontal circumferential frequency K_lm and the number of revolution per second N as follows: flm = KlmNl (HZ).

[0071] Although, for simplicity, this and later implementations to be described involve physically rotating the mask to achieve the desired spatio-temporal modulation, for example, similar results can be achieved using solid-state spatial light modulators,, provided the far-field intensities remain sinusoidal with a linear phase proportional to the mechanical (azimuth) angle of respective light source 0_{. Henceforth, phases of sinusoidal carrier intensities will be referred to in terms of the electrical angles ψ_1τη, which are related to the mechanical (azimuth) angles d_t by im = ^κΐτη(θι + 2n-fyt_n)(radians) where t_n is an instant in time. By virtue of sweeping sinusoidal light patterns at predictable frequencies and phases, the spatio-temporal sources 1 10 and 120 encode points in the scene such that their azimuth angles with respect to each source can be readily estimated. This is accomplished by demodulating the pixel time histories across successive frames captured by the camera 1050 and determining the phase(s) of each carrier. De-multiplexing multiple carriers is relatively straight forward on account of different sources using different values of either K_lm or N which enables a form of frequency-division multiplexing, as illustrated in Figs. 3 A and 3B.

[0072] Fig. 2 is a plan view of the pair of spatio-temporal sources 210 and 220, such as those depicted in Fig. 1 , in relation to objects forming a scene 230, which in this case is the PC

- 17 - same as that shown in Fig. 10. Fig. 2 shows coordinate axes 290 by which positioning parameters of the source locations 210 and 220 are known or determinable. With knowledge of the source locations 210 and 220, and the angles B 270 and θ₂ 240 relative to the angular datum θ₀ 280, this arrangement permits direct triangulation of scene points 250 in X and Y, being at least two coordinates in the 3D system, without regard to either the calibration parameters of a camera 295, nor the location 260, 265, which is free to vary.

[0073] Fig. 3 A illustrates what the temporal history, or time signal 310, of a single camera pixel might resemble when two carriers are present, according to the arrangement depicted in Figs. 1 and 2. Fig. 3B is the temporal spectrum of the time signal in Fig. 3A computed using a fast Fourier transform (FFT). Fig. 3B firstly shows the presence of the two sinusoidal signals 330 and 340 having different frequencies. In view of the spread of peaks shown at DC (0 Hz) and at 21 Hz and 27 Hz, Fig. 3B secondly shows the limitations of FFT techniques in estimating signal parameters from short time records. Particularly, the FFT approach displays poor resolution of closely spaced frequencies in comparison to algebraic techniques, especially when the spectrum becomes more crowded with carriers. In view of this poorer resolution, together with interference from the background (DC) illumination 320, and mutual interference between sinusoidal components 3.30 and 340, the preferred implementation uses an algebraic (matrix) approach to estimating carrier amplitudes and phases. Demodulating the camera frame data using an algebraic approach gives superior phase estimation performance on account of the frequencies being precisely known beforehand.

[0074] Though moving parts are generally undesirable in practical devices, the necessary frequency and phase stability for the above implementation may be readily achieved using a combination of low-drift oscillators and synchronous motors. Similarly, there are numerous means of achieving the necessary spatio-temporal synchronisation between the light sources 210,220 and the camera 295, ranging from sophisticated solutions employing digital compasses and wireless communication, to simpler wired solutions applicable in industrial situations where the illumination arrangements rarely change.

[0075] Referring again to Fig. 1, iso-phase planes 130 and 140 are shown radiating from the spatio-temporal sources 1 10 and 120. On such planes, the phases of sinusoidal carriers P

- 18 - radiated by a given source are constant, and thus the corresponding wavefront is radial. Due to the integer K_lm cycles around the circumference of each mask, there will also be Ki_m such planes corresponding to any measured carrier phase ψ_1τη. As a consequence of this phase wrapping phenomenon, there will be ambiguities in associating phase measurements with the correct iso-phase planes, which are needed to triangulate scene coordinates. As illustrated in Fig. 1, a line of intersection 150 between the iso-planes 130 and 140 may be no more valid than the line 60.

[0076] Solutions to this problem involve selecting both fine and coarse periods for the M superposed sinusoids projected by each source such that the azimuth angles 0_t become unambiguous. When this is achieved, the correct iso-phase planes 150 and 160 are unambiguously identifiable and the scene coordinates are localised to lie on intersections between the planes. As evident from Fig. 1 , the elevations of points above and below the X-Z plane remain ambiguous. This means that the desirable goal of camera independence is not completely achieved by this first implementation with respect to the third dimension. This implementation still offers significant advantages for applications requiring independence in a two-dimensional plane. For example, for a robotic arm applicator confined to a two-dimensional plane.

[0077] In another implementation, camera independence maybe achieved by adding additional sources projecting rotating patterns surrounding the light sources in the vertical plane, and hence intersecting the line 150 to provide unambiguous localisation. This is illustrated in Figure 12, which shows three light sources 1210, 1212, 1214, each rotating about their respective axes. A point 1218 in the scene in some part of a surface 1240 in the scene when seen by a camera 1230, will be associated with phases from each of the sources 1210-1214 which, after analysis, will reveal the angles, θι, θ₂, θ₂ of the planes 1220, 1222, 1224 containing the rays emitted from the three light sources. If the locations of the point light sources aredcnown then then these angles can be used with those locations to triangulate the location of the point in the scene in x,y,z coordinates. The coordinates may be defined, for example, by the location of the first light source, 1250 (taken to be the origin), a line joining the light sources, 1251 , (taken as the x axis), the axis of rotation of the first light source, 1252, (taken to be the y axis) and a line perpendicular to the x and y axes, 1253, (taken to be the z-axis). This process is independent of the location of the camera 1230 and reveals the 3D location of points in the scene rather than PC I

- 19 - just their depth. Such an arrangement will not be discussed further on account of it being inferior to the next (and preferred) implementation, which utilises additional degrees of freedom available in the original pattern, thereby reducing the number of sources required to achieve true camera independence.

[0078] Fig. 4A illustrates a sinusoidal pattern 430 similar to that discussed in Fig. 1 , however, this pattern is skewed, or tilted, with respect to the X and Y axes. By virtue of this skew, the pattern 430 can be considered to have spatial frequencies in both horizontal and vertical directions, the aim of which is to resolve the ambiguous elevations in the first implementation.

[0079] Fig. 4B shows a skew-sinusoidal pattern 430 mapped to both cylindrical 410 and spherical 420 geometries, where the condition of integral circumferential cycles is observed in each case. Further discussion will focus on the spherical geometry 420 on account of its advantages in providing a direct mapping between its vertical phase component and the elevation angle ι, as well as the sphere having surfaces normal to the light path, and thus less likely to introduce unwanted refraction into the projected intensities.

[0080] In Fig. 5, a single spherical mask 510 is illustrated with respect to the global coordinate frame 550. Whereas the iso-phase surfaces for the first arrangement took the form of vertical planes, the iso-phase surfaces for this spherical skew pattern take the form of helical coils 520. Rays projected from the centre of the sphere 510 through the helix 520 intersect scene points p 560 possessing identical phase. As the spherical skew pattern rotates around its axis, in this case the Y-axis of the coordinate frame 550, the iso-phase surfaces 520 rotate with the pattern, encoding the angular displacements represented by the azimuth angle θ_{ 540 and the elevation angle φχ 530 into the projected intensities I_lm according to l_lm oc 1 + cos(K_lme_s + V_lm<t>_s) where V_tm is the vertical equivalent of the horizontal circumferential frequency K_lm.

[0081 ] Given the two angles to be resolved and the single phase measurement available, 0_t and φι cannot be resolved using a single skew-sinusoidal pattern. For this reason, the PC I

- 20 - preferred implementation comprises at least two sinusoidal patterns 610 and 620 skewed in opposite directions as illustrated in Fig. 6A. These patterns, corresponding to positive and negative values of V_Im, are summed to construct or create a composite pattern 630, being a two-dimensional signal. The pattern 630 is an example of a composite signal, being a composite of the signal patterns 610 and 620 as impinging upon the object. As seen in Fig. 6A the patterns are orientated orthogonal to each other, thus creating a cross-hatched composite pattern. Since each of the patterns 610 and 620 contains phase changes, the pattern 630 is an example of a composite phase signal representing a composite signal of light intensities from the spatio-temporal light sources from which the patterns are generated.

[0082] By arranging for the skew-sinusoidal components to have the distinct

circumferential frequencies K_tl and K_l2, their respective contributions to the intensity I_t can be easily separated in temporal frequency, as previously discussed in relation to demodulation. Once the sinusoidal carriers are separated and their unwrapped phases and have been determined throughout the scene, the associated azimuth and elevation angles 6_t and 0jis determined for all points with respect to the I^th source 640 in Fig. 6B by solving the system of equations set out below.

[0083] As a consequence of using oppositely skewed sinusoidal components, Fig. 7 shows an example implementation 700 in a three-dimensional system 790 where respective iso- phase surfaces 740 and 750 wind around a known positioned spatio-temporal source 710 in opposite directions. The positioning of the light sources 710 and 720 provide for a predetermined geometric arrangement of the sources relative to the object or reference point 780 within a three-dimensional space. The intersections of these contra-rotating surfaces define a ray 760 which identifies the azimuth angle fyand elevation angle^' 0_{ of points, such as a point 780 in the scene with respect to the /^{t l}source 710. The addition of a second source (/ = 2) 720 at a known position in the space 790 with either different values of K_lm, or simply different rotational speeds N introduces additional rays 770, which intersect the first ray 760 to uniquely localise 3D points in the scene, particularly the point 780 as illustrated. [0084] Complete camera independence, with respect to the third dimension, can thus be achieved by using two or more spatio-temporal light sources of the type described. The question of uniqueness for the calculated angles still arises, as the measured phases for the various carriers will generally be wrapped. This issue can be handled in the manner previously discussed, whereby sinusoidal patterns are augmented with additional components of differing periods to resolve the ambiguity. Wrapped phase means that a phase angle and corresponding phase angle rotated by 2π radians is not disambiguated. As such, following from the patterns discussed with reference to Figs. 6A and 6B, as seen in Fig. 7, the point 780 is irradiated with a composite wrapped phase signal produced by the light sources 710, 720, with each light source being characterised by at least one known positioning parameter with respect to a reference line, such as 760, 770 through the reference point 780.

[0085] The remaining aspect of the present disclosure to be discussed is the reconstruction algorithm, which is responsible for forming the overall geometry estimate based on information from L distributed spatio-temporal sources. As discussed above, the angles and φι can, in principle, be determined directly for each source. This, however, ignores the influence of noise in the intensity measurements, and also the non-uniformity in data quality to be expected in real- world measurements. For reasons mentioned in the

Background discussion, much of the raw data captured will be unreliable. Some sources will be shadowed in certain parts of the scene, while specular surfaces may reflect the light from other sources away from the camera. To reconstruct reliable estimates of the scene geometry, the complete data ensemble from L distributed sources must be weighted and combined according to its reliability.

[0086] The remaining component of the present disclosure is a minimisation algorithm (e.g. Newton's algorithm) designed to reconstruct scene geometries such that modelled carrier phases match the measured phases in an overall least squares sense. For a single sinusoidal intensity component /_lm and a known geometry comprising the coordinate triplets(x, y, z), the forward data model, mapping scene coordinates to estimated phases i i_m, takes the form

„

PCl

- 22 - where (xi, yi, z ) is the locationof the I^th source.

[0087] The cost function for the least squares minimisation is calculated oyer the M sinusoidal carriers of each of the L spatio-temporal sources. The spectral intensities estimated in the demodulation step, exemplified in Fig. 3B by I 350 and /₂ 360, are used to weight the respective phase estimates such that those associated with stronger signals take precedence over noisier ones. This is the underlying principle whereby geometric diversity in the positioning of multiple sources is able to improve the robustness of geometry estimates. The overall weighted cost function to be minimised is given by

[0088] The cost function derivatives necessary for calculating the coordinate increments at each iteration of the minimisation algorithm (e.g. Newton's algorithm) include the vectors of first derivatives Vx², and the matrices of second derivatives V ^² for each scene point p, given respectively by

I-3 d²x² a²x²- dx² dxdy dxdz

2 _ d²x² d²x² d²x²

and

dxdy dy^z dydz

d²x² d²X² d²x²

■dxdz dydz z^z - _^

PCl

- 23 -

[0089] All components of and V x² have straight-forward analytic forms, which makes them simple to recalculate during each iteration. Given the non-linearity of the cost function associated with the arctangent functions, the step-length in the error minimisation is scaled by a fraction γ so that piece-wise quadratic approximations to the cost-function remain reasonably accurate. The refinement step for the scene coordinate estimation then takes the form

where the initial estimate p^can be constructed using direct triangulation, as practiced in the prior art, without regard to camera calibration. Generally, the better the initial estimate of p, the fewer steps are required to reduce the squared error below the desired tolerance.

[0090] Fig. 8 is a typical plot of the convergence of the above reconstruction algorithm in which it can be observed that the squared error 820 is monotonically reduced on each iteration 810. The minimum attainable residual error 830 is a function of the signal to noise ratio of the input image frames, as well as the accuracy of the source locations etc.

[0091] Fig. 9 summarises a data processing architecture 900 representative of a process of the preferred implementation of the geometry acquisition system described herein. At the commencement of the process 900, before geometry estimation can begin, a certain minimum number of frames are acquired at step 910 from the camera system, of Fig. 10 for example, to permit the phases of sinusoidal illumination components to be estimated. The scene geometry is then initialised, as indicated by the dashed arrow connection 912, to a starting estimate at step 920. This starting estimate can take the form of a regular Cartesian grid of pixels having some uniform (or user specified) default depth.

Alternatively, on the assumption that the scene has not changed substantially from the previous estimation cycle (i.e. assuming small movements or a sufficiently high frame rate), the processing system can use the result of the previous calculation to initialise the current estimate. In yet another implementation, the carrier phase outputs 935 arising from the sinusoidal fit 930 can be used in conjunction with the camera data to triangulate the approximate coordinates of the camera pixels as an initial geometry estimate.

[0092] Following demodulation of the carrier phases 935 from the sinusoidal fitting as performed at step 930, the measured carrier phases 935 are compared in step 990 with the P<J J

- 24 - modelled phases generated from the current geometry estimate determined in step 980, or step 920 in the first iteration. The comparison of step 990 calculates errors of the carrier phases with respect to the current geometry estimate. If the sum of squared errors is less than a prescribed threshold, being a convergence test performed at step 995, the reconstruction of the scene geometry halts, and the process 900 proceeds to acquire the next frame at step 999 for processing in the next cycle, as indicated by the dashed line 998.

[0093] Where the error has not converged at step 995, the next stage of the processing, being the most numerically intensive, involves the calculation of derivatives of the cost function at step 950. In step 960, the derivatives are weighted according to the carrier amplitudes 940 found during the sinusoidal fitting step 930 and used to construct the Newton increment in step 970. The Newton increment is then scaled and added to the current geometry estimate in step 980 to provide the updated geometry estimate to step 990.

[0094] The iterative process 900 then continues, with the subsequently calculated error being less than that in the preceding iteration. On convergence, the variance of the geometry estimates is greatly improved over straightforward triangulation, on account of the data being fused from multiple geometrically diverse sources of illumination, independently of any camera or projector calibration.

INDUSTRIAL APPLICABILITY

[0095] The arrangements described are applicable to the computer and data processing industries and particularly for the measurement of depth in 3D environments using a single imaging device.

[0096] The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.

[0097] For example, whilst the implementation of Fig. 10 illustrates the image sensor as a camera 1050 preferably mounted to the robotic arm 1060, the sensor may be implemented as a simple light detector, such as a photodiode, positioned at the reference point in the 3D scene. In such an implementation, the sensor does not detect light from the sources p

- 25 - reflected from the 3D scene, but rather the light from the spatio-temporal source as the light directly impinges the scene. This arrangement can be useful of detecting motion in the scene, being a situation where the depth may be undergoing variation.

Claims

CLAIMS:

1. A method of determining at least one coordinate of a reference point on an object in three-dimensional space obtained by an image sensor, said object being irradiated by a plurality of light sources, at least one of said plurality of light sources irradiating a different pattern of light compared to a pattern of light irradiated by another one of said plurality of light sources, said method comprising:

generating a composite pattern of light on the object by irradiating each pattern of light by said plurality of light sources which have a known geometric relation in the three- dimensional space;

obtaining the composite pattern of light at the reference point with the image sensor;

determining from the obtained composite pattern of light, at least one parameter, used in the determination of at least one coordinate of the reference point, from the plurality of light sources, wherein the parameter is determined independent of a position of the image sensor; and

determining at least one coordinate at the reference point using the parameter.

2. A method according to claim 1 , wherein the plurality of light sources comprise spatio-temporally modulated light sources, and at least one of the plurality of spatio- temporally modulated light sources is modulated at a different temporal frequency to another one of said plurality of spatio-temporally modulated light sources.

3. A method according to claim 1 , wherein the composite pattern of light forms a composite phase signal on the object by a predetermined geometric arrangement of the plurality of spatio-temporally modulated light sources, each one of said plurality of light sources having a known position in the three-dimensional space.

4. A method according to claim 1 , wherein:

the obtaining comprises the image sensor capturing the composite pattern of light at the reference point; and

the determining comprises:

determining from the captured composite pattern of light, a set of measured positioning parameters, used in the determination of at least two coordinates of the reference point, wherein the set of measured positioning parameters is determined independent of a position of the image sensor; and

5. A method according to claim 4, wherein one of the at least two coordinates is a depth coordinate.

6. A method according to claim 1 , wherein each of said plurality of light sources is characterised by at least one known positioning parameter with respect to a reference line through said reference point.

7. A method according to claim 2, wherein the difference in temporal frequency of the at least one light source results from at least one of:

(i) a different spatial frequency of a pattern on the at least one light source;

(ii) a different rotation velocity of the at least one light source; and

(iii) a different orientation of the at least one light source.

8. A method according to claim 1, wherein each said light source comprises multiple intersecting patterns to create a two-dimensional signal.

9. A method according to claim 5, wherein the patterns are orthogonal.

10. A method according to claim 1 , wherein each said light source comprises a rotating pattern surrounding the light source.

1 1. A method according to claim 3, wherein the composite phase signal forms a wavefront that is radial to the corresponding light source.

12. A method according to claim 1 , wherein the patterns are sinusoidal.

13. A method according to claim 4, wherein the measured positioning parameters comprise an angular displacement from the light source.

14. A method according to claim 1, wherein the object is in a three-dimensional space and the method determines the three-dimensional coordinates of the reference point in the three-dimensional space.

15. A method according to claim 1 , wherein the parameter is measured with respect to a reference line through each of the plurality of light sources, independent of a position of the image sensor.

16. A method according to claim 3, further comprising algebraic demodulation of the obtained composite pattern of light to form light carrier amplitudes and phases from which the at least one parameter is determined.

17. A robotic system comprising:

an image sensor arranged for imaging a scene formed at least by the object;

a plurality of spatio-temporally modulated light sources configured to illuminate the scene, at least one of said plurality of spatio-temporally modulated light sources being modulated at a different temporal frequency to another one of said plurality of spatio- temporally modulated light sources, each one of said plurality of light sources having a known position in the three-dimensional space;

generate a composite light pattern on the object in the scene by a predetermined geometric arrangement of the plurality of spatio-temporally modulated light sources, the composite light pattern including a composite phase signal;

capture with the image sensor at least the composite phase signal at a reference point on the object;

determining the at least two coordinates of the reference point using the set of measured positioning parameters from the plurality of light sources; and controlling a position of the robotic manipulator based on the determined coordinates of the reference point.

18. A robotic system according to claim 17 wherein the image sensor is mounted upon^x the robotic manipulator.

19. A robotic system according to claim 17 wherein the image sensor is located at the reference point.

20. A robotic system according to claim 19 wherein the image sensor is a photodiode.

21. A computer readable storage medium having a program recorded thereon, the program being executable by computer apparatus to determine at least one coordinate of a reference point on an object in three-dimensional space obtained by an image sensor, said object being irradiated by a plurality of light sources, at least one of said plurality of light sources irradiating the object with a different pattern of light compared to a pattern of light irradiated by another one of said plurality of light sources, said program comprising:

code for generating a composite pattern of light on the object by irradiating each pattern of light by said plurality of light sources which have a known geometric relation in the three-dimensional space;

code for obtaining the composite pattern of light at the reference point with the image sensor;

code for determining from the obtained composite pattern of light, at least one parameter, used in the determination of the at least one coordinate of the reference point, from the plurality of light sources, wherein parameter is determined independent of a position of the image sensor; and

code for determining at least one coordinates at the reference point using the parameter from the plurality of light sources.

22. Computer apparatus for determining at least two coordinates of a reference point on an object in three-dimensional space in a scene captured by an image sensor, said object being irradiated simultaneously by a plurality of spatio-temporally modulated light sources, at least one of said plurality of spatio-temporally modulated light sources being modulated at a different temporal frequency to another one of said plurality of spatio- temporally modulated light sources, said apparatus comprising:

means for generating a composite intensity signal on the object in the scene according to a predetermined geometric arrangement of the plurality of spatio-temporally modulated light sources, each one of said plurality of light sources having a known position in the three-dimensional space;

means for capturing the composite intensity signal at the reference point with the image sensor;

means for determining from the captured composite intensity signal, a set of measured positioning parameters, used in the determination of at least two coordinates of the reference point, from the plurality of light sources, wherein the set of measured positioning parameters is determined independent of a position of the image sensor; and means for determining at least two coordinates at the reference point using the set of measured positioning parameters from the plurality of light sources.