Video real-time face replacement method based on local affine transformation and color transfer technology, and system thereof
Technical field
The present invention relates to the technical field of image processing, and specifically to a video real-time face replacement method based on local affine transformation and color transfer technology, and a system thereof.
Background technology
Face replacement technology is an important research direction in the field of computer vision. Because automatic face replacement avoids the drawbacks of editing and fusing pictures by hand in software such as Photoshop, it has had a great influence in business, entertainment and certain special industries. For example, dangerous scenes requiring an acrobat are first shot with a stunt double, and in post-production automatic face replacement is used to replace the double's face with the required actor's face, achieving the final film. At the present stage many researchers have proposed different replacement strategies and achieved a certain degree of success, but these cannot be realized in video. The main reason is the particularity of video face replacement: it must be real-time, fast and highly efficient, whereas traditional face replacement technology suffers from a large amount of computation, color difference and high time consumption, and the effect after replacement is poor, so it cannot be used in real-life video applications.
Several face replacement methods in the prior art cannot be applied to real-time video processing. The face replacement method based on Poisson editing, for example, automatically chooses a semantically similar target face from a face image database according to the semantic information of the original face and builds its three-dimensional model; it then adjusts the target face according to the pose and illumination estimates of the original face, and fuses the target face seamlessly with the original image or video by Poisson image fusion, so that the replacement looks natural and lifelike, thus realizing automatic face replacement. Its advantage is that the seam of the fused face is not obvious; its shortcomings are that the fused image shows color difference and the amount of computation is large, so real-time performance is poor and it cannot be used for video.
As another example, the face replacement method based on a 3D face database first performs face registration and offline learning on a downloaded face database, obtaining parameters such as the pose, illumination and expression of the face in each image. When an image is input, the system performs face detection and retrieves faces in the image database that are similar to the input face in pose, resolution and illumination as replacement candidates; the candidates are then color-adjusted and illumination-corrected to match the input face, and image fusion is completed. Finally, a combined edge evaluation function is designed to rank the fusion results, and the best fusion is taken as the final replacement result. Its advantage is a better fusion effect; its shortcomings are that a good result cannot be guaranteed when the target face must be replaced with one single specified face, and that real-time performance is poor.
As a further example, the face replacement method based on real-time video uses the AAM method to detect the face and its feature points, then extracts the target face and a source-image face template according to the feature points. Following the principle of triangle affine transformation, it warps the face shape of the source image into the shape of the target face, and then adjusts color weights to make the color of the source face as consistent as possible with the color of the target face. Its advantage is good real-time performance without delay; its shortcomings are a low face detection rate, a remaining gap between the warped face shape and the target face shape, and a color adjustment that is not automatic but merely an empirical value.
The above three methods are comparatively popular at present, and other methods are improvements made on their basis. Although the fusion effect of the Poisson editing algorithm is better than that of the real-time video algorithm, its time efficiency cannot meet the real-time demand of video; the method based on the 3D face database has the best fusion effect, but its time consumption is also the highest, taking up to 9 seconds to process one frame.
In view of the limitations of the various face replacement technologies, how to design, under existing hardware conditions, a real-time face replacement method that can be used effectively and rapidly in video applications has become a technical problem urgently needing to be solved.
Summary of the invention
The object of the present invention is to solve the defect in the prior art that real-time face replacement cannot be accomplished, by providing a video real-time face replacement method based on local affine transformation and color transfer technology, and a system thereof, to solve the above problems.
To achieve these goals, the technical scheme of the present invention is as follows:
A video real-time face replacement method based on local affine transformation and color transfer technology comprises the following steps:
Video acquisition: obtain the target face video through a camera and intercept the current frame image;
Face and feature point detection: for the acquired target face image, use a strong classifier to detect the face; on the basis of a shape model built by training on the samples in a face database, match the feature points of the target face with the sample feature points in the face database, make the corresponding transformation, and carry out feature point search and position marking on the detected target face;
Face transformation: using the affine transformation parameters, apply an affine transformation to map a sample face in the face database onto the target face in the video, carrying out the replacement of the face;
Face fusion processing: transfer the color of the target face before transformation onto the target face after transformation, adopt a Laplacian-Gaussian pyramid to carry out fusion processing, smooth the edge of the fused target face image with a Gaussian filter, and display the result in the image captured by the camera;
Check whether the camera is closed: if closed, end the face replacement; if not closed, continue video acquisition to obtain the current frame image, and continue the face detection, face transformation and face fusion steps.
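The five steps above form a per-frame loop that runs until the camera closes. A minimal sketch of that control flow is given below; the stage functions are hypothetical placeholders standing in for the detector, affine warp and pyramid fusion described later, not the invention's actual implementation.

```python
# Sketch of the per-frame control flow. All three stage functions are
# hypothetical placeholders for the real detection/warp/fusion stages.

def detect_face_and_landmarks(frame):
    # Placeholder: return landmarks, or None when no face is found.
    return frame.get("landmarks")

def affine_transform_sample(sample_face, landmarks):
    # Placeholder for warping the database sample onto the landmarks.
    return {"warped": sample_face, "onto": landmarks}

def color_transfer_and_fuse(frame, warped):
    # Placeholder for colour transfer + pyramid fusion + edge smoothing.
    return {"frame": frame, "fused": warped}

def run_replacement_loop(camera_frames, sample_face):
    """Process frames until the camera closes (iterator is exhausted)."""
    results = []
    for frame in camera_frames:            # video acquisition: current frame
        landmarks = detect_face_and_landmarks(frame)
        if landmarks is None:              # keep searching until detected
            continue
        warped = affine_transform_sample(sample_face, landmarks)
        results.append(color_transfer_and_fuse(frame, warped))
    return results                         # loop ends when the camera closes
```

Frames on which no face is detected are simply skipped, matching the behaviour of re-detecting until feature points are found.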
The face and feature point detection comprises the following steps:
reading the sample data in the face database, constructing several classifiers, and cascading the several classifiers into one strong classifier;
reading the current video frame image and using the strong classifier to perform face detection on it;
building a shape model and training it;
constructing the face feature points and obtaining the affine transformation parameters.
The face transformation comprises the following steps:
calculating the coordinates of each feature point of the target face and of the samples in the face database, and triangulating the feature points by taking any three adjacent points;
using the affine transformation parameters, mapping the triangulated feature points of the face database onto the target face and transforming them into the shape of the target face; the concrete steps of constructing the affine transformation are as follows:
for the triangulated target face and the samples in the face database, finding the position of each triangle in the face database and of the corresponding triangle of the target face;
keeping the positional relationship of the feature points in the face database, and mapping each feature point triangle in the face database to the position of the corresponding feature point triangle of the target face.
The face fusion processing comprises the following steps:
in the lαβ space, transferring the color of the target face before transformation onto the target face after transformation, wherein l represents the achromatic luminance channel, α represents the yellow-blue chromatic channel and β represents the red-green chromatic channel;
adopting the Laplacian-Gaussian pyramid method to carry out fusion processing on the images of the transferred target face at different scales and different decomposition layers;
finding the edge region of the fused target face image and smoothing it with a Gaussian filter.
The construction of the several classifiers comprises the following steps:
distributing the samples in the face database uniformly, and obtaining a preliminary classifier H_0 by training;
classification judgment: for correctly classified samples, reducing their distribution probability; for misclassified samples, increasing their distribution probability; thereby obtaining a new training set S_1;
obtaining a classifier: training with the training set S_1 to obtain a classifier H_1;
iterating the classification judgment and classifier steps T times in total to obtain T classifiers {H_1, H_2, ..., H_T}.
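The boosting procedure above (reweight misclassified samples, retrain, cascade the T weak classifiers into one strong classifier) can be sketched as follows. As an assumption for brevity, the weak classifiers here are simple threshold stumps on a scalar feature; a real face detector would use Haar-like features instead.

```python
import math

def stump(x, t, pol):
    """Hypothetical weak classifier: sign set by threshold t and polarity."""
    return pol if x > t else -pol

def train_strong_classifier(xs, ys, thresholds, rounds):
    """ys in {-1, +1}. Returns a list of (alpha, threshold, polarity)."""
    n = len(xs)
    w = [1.0 / n] * n                       # H_0: uniform sample distribution
    ensemble = []
    for _ in range(rounds):                 # iterate T times
        best = None
        for t in thresholds:                # pick the lowest-error stump
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                          if stump(xi, t, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        alpha = 0.5 * math.log((1.0 - err) / max(err, 1e-10))
        ensemble.append((alpha, t, pol))
        # Correct samples: probability reduced; errors: probability raised,
        # giving the new training distribution (the S_1 step above).
        w = [wi * math.exp(-alpha * yi * stump(xi, t, pol))
             for xi, yi, wi in zip(xs, ys, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def strong_classify(ensemble, x):
    """Weighted vote of the T weak classifiers {H_1, ..., H_T}."""
    return 1 if sum(a * stump(x, t, p) for a, t, p in ensemble) >= 0 else -1
```

This is the standard AdaBoost reweighting scheme; the patent text describes the same up/down adjustment of sample distribution probabilities.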
The building and training of the shape model comprises the following steps:
collecting 400 face training samples and marking each face feature point in the samples;
concatenating the coordinates of the feature points in the training set into feature vectors;
normalizing and aligning the shape features;
applying principal component analysis to the aligned shape features; the concrete steps of the principal component analysis are as follows:
input x = [x_1 x_2 ... x_m]^T, a vector of m variables, and compute the sample matrix X of x:
X = [x_i(j)], i = 1, 2, ..., m, j = 1, 2, ..., n,
where the x_i(j), j = 1, 2, ..., n, are the n discrete samples of the variable x_i, i = 1, 2, ..., m;
compute the mean μ_i of the i-th row of the sample matrix X:
μ_i = (1/n) Σ_{j=1}^{n} x_i(j);
compute the centered i-th row x̄_i of the sample matrix X:
x̄_i(j) = x_i(j) − μ_i, j = 1, 2, ..., n;
compute the centered sample matrix X̄ of X:
X̄ = [x̄_1 x̄_2 ... x̄_m]^T;
compute the covariance matrix Ω of the centered data:
Ω = (1/n) X̄ X̄^T = φ Λ φ^T,
where φ = [φ_1 φ_2 ... φ_m] is the m × m orthogonal eigenvector matrix and Λ = diag{λ_1, λ_2, ..., λ_m}, with λ_1 ≥ λ_2 ≥ ... ≥ λ_m, is the diagonal eigenvalue matrix;
compute the orthogonal transformation matrix P:
P = φ^T;
apply the orthogonal transformation matrix to the centered data to obtain the principal components y = P x̄;
build a local feature for each face feature point of each sample.
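The principal component analysis steps above can be sketched compactly in numpy, applied to aligned landmark shape vectors. The variable names follow the notation above; the input shapes are assumed to be pre-aligned, and the helper names are illustrative, not from the patent.

```python
import numpy as np

def shape_pca(shapes, k):
    """PCA of aligned shape vectors. Rows of `shapes` are samples
    [x_1 ... x_m]; returns the mean, the top-k eigenvectors phi_k
    (as columns) and the eigenvalues lambda_1 >= ... >= lambda_k."""
    X = np.asarray(shapes, dtype=float)
    mu = X.mean(axis=0)                    # mean mu_i of each variable
    Xc = X - mu                            # centred sample matrix X_bar
    omega = Xc.T @ Xc / len(X)             # covariance matrix Omega
    lam, phi = np.linalg.eigh(omega)       # Omega = phi Lambda phi^T
    order = np.argsort(lam)[::-1][:k]      # eigh is ascending; sort descending
    return mu, phi[:, order], lam[order]

def project(shape, mu, phi_k):
    """Principal-component coordinates: y = P x_bar = phi_k^T (x - mu)."""
    return phi_k.T @ (np.asarray(shape, dtype=float) - mu)
```

Note that `numpy.linalg.eigh` returns eigenvalues in ascending order, so they must be re-sorted to obtain λ_1 ≥ λ_2 ≥ ... as in the formula above.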
The construction of the face feature points comprises the following steps:
feature point position calculation: calculating the positions of the target face feature points and applying scale and rotation changes;
matching the feature points of the target face with each local feature point of the samples in the face database, and calculating the new position on the target face corresponding to each local feature point;
iterating the above steps to obtain the affine transformation parameters.
The fusion processing of the images of the transferred target face at different scales and different decomposition layers by the Laplacian-Gaussian pyramid method comprises the following steps:
adopting a Gaussian pyramid to obtain down-sampled images of the color-transferred target face on several different spatial layers and at multiple scales, constructing an image pyramid; the concrete steps of constructing the Gaussian pyramid are as follows:
inputting the color-transferred target face image G_0 and taking G_0 as layer 0 of the Gaussian pyramid;
applying Gaussian low-pass filtering and down-sampling by every other row and column to the original input image G_0 to obtain the first-layer image G_1 of the Gaussian pyramid;
applying Gaussian low-pass filtering and down-sampling by every other row and column to the first-layer image G_1 to obtain the second-layer image G_2 of the Gaussian pyramid;
applying Gaussian low-pass filtering and down-sampling by every other row and column to the (l−1)-th layer image G_{l−1} to obtain the l-th (1 ≤ l ≤ N) layer image G_l of the Gaussian pyramid:
G_l(i, j) = Σ_{m=−2}^{2} Σ_{n=−2}^{2} w(m, n) G_{l−1}(2i + m, 2j + n), 0 ≤ i < R_l, 0 ≤ j < C_l,
where N is the maximum number of layers of the Gaussian pyramid, R_l and C_l are respectively the numbers of rows and columns of the l-th layer of the Gaussian pyramid, and w(m, n) is a two-dimensional separable 5 × 5 window function;
repeating the above process to form the final Gaussian pyramid;
adopting a Laplacian pyramid on the pyramid images of the different frequency bands of the different decomposition layers, reconstructing the upper-layer images by up-sampling from the pyramid top-layer image; the concrete steps of constructing the Laplacian pyramid are as follows:
enlarging the top-layer image G_N of the obtained Gaussian pyramid by interpolation to obtain the image G_N^*, where N is the maximum number of layers of the Gaussian pyramid;
enlarging the (N−1)-th layer image G_{N−1} by interpolation to obtain the image G_{N−1}^*;
enlarging the l-th layer image G_l of the Gaussian pyramid by interpolation to obtain the image G_l^*:
G_l^*(i, j) = 4 Σ_{m=−2}^{2} Σ_{n=−2}^{2} w(m, n) G_l((i + m)/2, (j + n)/2), 0 ≤ i < R_{l−1}, 0 ≤ j < C_{l−1},
where only the terms with integer coordinates contribute to the sum, N is the maximum number of layers of the Gaussian pyramid, R_l and C_l are respectively the numbers of rows and columns of the l-th layer of the Gaussian pyramid, and w(m, n) is a two-dimensional separable 5 × 5 window function;
computing LP_l = G_l − G_{l+1}^*, 0 ≤ l < N;
computing LP_N = G_N, where N is the maximum number of layers of the Laplacian pyramid and LP_l is the l-th layer image of the Laplacian pyramid;
repeating the calculation for each layer of the Gaussian pyramid to obtain the Laplacian pyramid LP_0, LP_1, ..., LP_l, ..., LP_N;
merging the reconstructed images and carrying out the fusion processing.
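The REDUCE formula for G_l, the Laplacian difference LP_l = G_l − G_{l+1}^*, and the final merge can be sketched together in numpy for single-channel float images. The 5-tap window w = [1, 4, 6, 4, 1]/16 is the usual Burt–Adelson choice and is an assumption here, since the patent does not fix w(m, n).

```python
import numpy as np

W = np.array([1., 4., 6., 4., 1.]) / 16.0      # assumed 5-tap window w(m, n)

def _lowpass(img):
    """Separable 5x5 Gaussian low-pass filter with reflected borders."""
    h, w = img.shape
    pad = np.pad(img, 2, mode="reflect")
    tmp = sum(W[i] * pad[i:i + h, :] for i in range(5))   # vertical pass
    return sum(W[i] * tmp[:, i:i + w] for i in range(5))  # horizontal pass

def pyr_reduce(img):
    """G_l from G_{l-1}: filter, then keep every other row and column."""
    return _lowpass(img)[::2, ::2]

def pyr_expand(img, shape):
    """G_l^*: zero-interleave up-sampling then filtering (x4 keeps gain)."""
    up = np.zeros(shape)
    up[::2, ::2] = img
    return 4.0 * _lowpass(up)

def gaussian_pyramid(img, n):
    g = [img]                                  # G_0 is the input image
    for _ in range(n):
        g.append(pyr_reduce(g[-1]))
    return g                                   # [G_0, ..., G_N]

def laplacian_pyramid(img, n):
    g = gaussian_pyramid(img, n)
    lp = [g[l] - pyr_expand(g[l + 1], g[l].shape) for l in range(n)]
    lp.append(g[n])                            # LP_N = G_N
    return lp                                  # [LP_0, ..., LP_N]

def pyramid_blend(a, b, mask, n):
    """Fuse two images level by level with a Gaussian pyramid of the mask,
    then collapse from the top layer down (the reconstruction step)."""
    la, lb = laplacian_pyramid(a, n), laplacian_pyramid(b, n)
    gm = gaussian_pyramid(mask, n)
    fused = [m * x + (1.0 - m) * y for x, y, m in zip(la, lb, gm)]
    out = fused[n]
    for l in range(n - 1, -1, -1):
        out = fused[l] + pyr_expand(out, fused[l].shape)
    return out
```

Because LP_l + G_{l+1}^* recovers G_l exactly, collapsing an unmodified pyramid reproduces the input image, which is what makes the per-level blend seamless.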
A video real-time face replacement system based on local affine transformation and color transfer technology comprises:
a video acquisition module, for acquiring the face image of each frame of the video obtained from the camera;
a classifier construction module, for performing face detection on the images of the acquired video;
a shape model training module, for building a local feature for each face feature point and establishing the position constraints of the feature points;
a principal component analysis module, for performing feature extraction on the shape features built by the shape model construction module;
a face feature point search module, for searching the face feature points and calculating the positions where the feature points are located;
a face affine transformation module, for mapping the samples in the face database to the corresponding positions of the target face;
a Laplacian-Gaussian pyramid image fusion module, for carrying out the corresponding fusion processing on the color-transferred target face.
The video acquisition module is connected with the classifier construction module; the output of the classifier construction module is connected with the shape model training module and the principal component analysis module respectively; the shape model training module and the principal component analysis module are connected with the inputs of the face feature point search module; the output of the face feature point search module is connected with the face affine transformation module; and the output of the face affine transformation module is connected with the Laplacian-Gaussian pyramid image fusion module.
Beneficial effect
The video real-time face replacement method and system based on local affine transformation and color transfer technology of the present invention improve the quality, speed and efficiency of face replacement compared with the prior art, and can be used for real-time replacement in video. The strong classifier detects faces accurately; the built shape model, the application of principal component analysis and the face feature point search detect the feature points of the face quickly; the local affine transformation maps the samples in the face database accurately onto the target face; and the Laplacian-Gaussian pyramid image fusion technology decomposes, reconstructs and then accurately fuses the color-transferred target face. The whole face replacement process is carried out under the camera in real time, accurately and quickly, overcoming the defects of the prior art that face replacement is time-consuming and cannot meet the real-time demand of video, and that the replaced image shows color difference and requires a large amount of computation.
Description of the drawings
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 is a connection diagram of the system of the present invention.
Embodiment
To provide a better understanding of the structural features and achieved effects of the present invention, a preferred embodiment is described in detail below with reference to the accompanying drawings:
As shown in Fig. 1, the video real-time face replacement method based on local affine transformation and color transfer technology of the present invention comprises the following steps:
First step: video acquisition. Read the video from the current camera, obtain the target face video, and intercept the current frame image in preparation for subsequent processing.
Second step: face and feature point detection. For the acquired target face image, use a strong classifier to detect the face; on the basis of the shape model built by training on the samples in the face database, match the feature points of the target face with the sample feature points in the face database, make the corresponding transformation, and carry out feature point search and position marking on the detected target face. Feature point detection here is preparation for the later face replacement: if the face feature points can be detected precisely, the later real-time processing will be more efficient; otherwise the face and its feature points must be detected repeatedly until the feature points are found. Feature point detection is therefore crucial, as it determines whether real-time face replacement can be carried out. Its concrete steps are as follows:
(1) Read the sample data in the face database, construct several classifiers, and cascade them into one strong classifier. Constructing the several classifiers comprises the following steps:
A. Distribute the samples in the face database uniformly (the face database is a database storing a large amount of face information) and obtain a preliminary classifier H_0 by training.
B. Classification judgment: for correctly classified samples, reduce their distribution probability; for misclassified samples, increase their distribution probability; thereby obtain a new training set S_1.
C. Obtain a new classifier: train with the training set S_1 to obtain a new classifier H_1.
D. Iterate the classification judgment and classifier steps T times in total to obtain T classifiers {H_1, H_2, ..., H_T}. Cascade the T classifiers into one strong classifier, and use this strong classifier to perform face detection on the acquired current frame image.
(2) Read the current video frame image and use the strong classifier to perform face detection on it.
(3) Build the shape model and train it, which specifically comprises the following steps:
A. Collect 400 face training samples; the number of training samples can be adjusted accordingly: the larger the sample library the better, but the training time will also be longer. Mark each face feature point in the samples by hand. If the feature points of the celebrity faces in the library were all marked by software, mislabeling could occur, and the later detection and replacement of faces would then show errors. The sample library is therefore collected manually and marked manually; the marking is manual during training, and when the target face is identified later, the software marks the target face automatically. The sample library here is a celebrity face library; in practical applications, the library is established first and the templates are established afterwards.
B. Concatenate the coordinates of the feature points in the training set into feature vectors.
C. Normalize and align the shape features; a shape feature refers to a hand-marked face feature point in a sample, such as an eye or a cheekbone.
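Step C's normalization and alignment is commonly done with a two-dimensional Procrustes fit. The closed-form complex-number solution below is one standard way to do it and is offered as a sketch, not necessarily the exact alignment the invention uses.

```python
import numpy as np

def align_shape(src, ref):
    """Similarity-align landmark set src (n, 2) to ref (n, 2): remove
    translation, then solve the best scale-and-rotation in closed form
    by treating 2-D points as complex numbers."""
    src = np.asarray(src, dtype=float)
    ref = np.asarray(ref, dtype=float)
    sc, rc = src - src.mean(axis=0), ref - ref.mean(axis=0)
    s = sc[:, 0] + 1j * sc[:, 1]
    r = rc[:, 0] + 1j * rc[:, 1]
    a = np.vdot(s, r) / np.vdot(s, s)      # least-squares scale * rotation
    aligned = a * s                         # apply scale and rotation
    return np.stack([aligned.real, aligned.imag], axis=1) + ref.mean(axis=0)
```

Multiplying by one complex coefficient applies a uniform scale and a rotation simultaneously, which is exactly the class of similarity transforms the alignment needs.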
D. Apply principal component analysis to the aligned shape features; the concrete steps of the principal component analysis are as follows:
a. Input x = [x_1 x_2 ... x_m]^T, a vector of m variables, and compute the sample matrix X of x:
X = [x_i(j)], i = 1, 2, ..., m, j = 1, 2, ..., n,
where the x_i(j), j = 1, 2, ..., n, are the n discrete samples of the variable x_i, i = 1, 2, ..., m.
b. Compute the mean μ_i of the i-th row of the sample matrix X:
μ_i = (1/n) Σ_{j=1}^{n} x_i(j).
c. Compute the centered i-th row x̄_i of the sample matrix X:
x̄_i(j) = x_i(j) − μ_i, j = 1, 2, ..., n.
d. Compute the centered sample matrix X̄ of X:
X̄ = [x̄_1 x̄_2 ... x̄_m]^T.
e. Compute the covariance matrix Ω of the centered data:
Ω = (1/n) X̄ X̄^T = φ Λ φ^T,
where φ = [φ_1 φ_2 ... φ_m] is the m × m orthogonal eigenvector matrix and Λ = diag{λ_1, λ_2, ..., λ_m}, with λ_1 ≥ λ_2 ≥ ... ≥ λ_m, is the diagonal eigenvalue matrix.
f. Compute the orthogonal transformation matrix P:
P = φ^T.
g. Apply the orthogonal transformation matrix to the centered data to obtain the principal components y = P x̄.
h. Build a local feature for the face feature points of each sample.
(4) Construct the face feature points and obtain the affine transformation parameters. The concrete steps are as follows:
A. Feature point position calculation: calculate the positions of the target face feature points, and apply simple scale and rotation changes. Because the face sizes in the celebrity face library differ from the size of the target face, the positions of the feature points naturally differ; for example, if a celebrity face in the library is large and the target face is small, scale and rotation changes are needed.
B. Match the feature points of the target face with each local feature point of the samples in the face database, and calculate the new position on the target face corresponding to each local feature point.
C. Iterate the above steps to obtain the affine transformation parameters. The affine transformation parameters are the transformation matrix produced in the process of matching the feature points of the target face with the local feature points of the samples in the face database. For example, to map a point of a sample onto a point of the target face, a point-to-point transformation such as addition, subtraction or division is needed; when all the feature points of the target face are paired with all the feature points of the sample, the result is a matrix, which we call the transformation matrix.
Third step: face transformation. Using the affine transformation parameters, apply an affine transformation to map the sample face in the face database onto the target face in the video, carrying out the replacement of the face. The purpose of the local affine transformation is to make the shape of the celebrity face consistent with the shape of the target face, so that the face replacement looks more realistic. Simply put, an affine transformation maps a figure to another position; in the mapping process the size and shape of the figure may change, but the internal positional relationships between its points are maintained. The basis of the affine transformation is the triangulation method, so carrying out the affine transformation requires calculating the coordinates of three points, i.e. triangulating the feature points. The whole image to be transformed is divided into subregions and an affine transformation is applied to each subregion; an image based on local affine transformations is closer to the target image than one based on a global transformation. The concrete steps are as follows:
(1) Calculate the coordinates of each feature point of the target face and of the samples in the face database, and triangulate the feature points by taking any three adjacent points.
(2) Using the affine transformation parameters, map the triangulated feature points of the face database onto the target face and transform them into the shape of the target face. The concrete steps of constructing the affine transformation are as follows:
A. For the triangulated target face and the samples in the face database, find the position of each triangle in the face database and of the corresponding triangle of the target face.
B. Keep the positional relationship of the feature points in the face database, and map each feature point triangle in the face database to the position of the corresponding feature point triangle of the target face.
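Each triangle-to-triangle mapping in steps A and B above is a 2×3 affine matrix that is determined exactly by the three vertex correspondences. A numpy sketch of solving for that matrix and applying it to points:

```python
import numpy as np

def triangle_affine(src_tri, dst_tri):
    """Return the 2x3 matrix M such that M @ [x, y, 1]^T maps each
    vertex of src_tri (3, 2) onto the matching vertex of dst_tri."""
    S = np.hstack([np.asarray(src_tri, dtype=float), np.ones((3, 1))])
    D = np.asarray(dst_tri, dtype=float)
    return np.linalg.solve(S, D).T            # solve S @ M^T = D

def apply_affine(M, pts):
    """Map an (n, 2) array of points through the affine matrix M."""
    P = np.hstack([np.asarray(pts, dtype=float), np.ones((len(pts), 1))])
    return P @ M.T
```

Because three non-collinear point pairs fix the six degrees of freedom of a 2-D affine map, the linear system has a unique solution for every valid triangle; every pixel inside the triangle then follows the same matrix, which preserves the internal positional relationships mentioned above.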
Fourth step: face fusion processing. Transfer the color of the target face before transformation onto the target face after transformation, adopt the Laplacian-Gaussian pyramid to carry out fusion processing, smooth the edge of the fused target face image with a Gaussian filter, and display the result in the image captured by the camera. Because the skin color of the celebrity face differs from the skin color of the actual target face, the color transfer method is introduced to avoid a visible gap after fusion more effectively. The local affine transformation method and the color transfer method above both consume little time, so using the two methods together in video yields a method more efficient than the prior art. To further improve the effect after face fusion, the Laplacian image fusion method is used here to fuse the color-transferred image into the target face, with an effect much better than that of the prior-art methods. The concrete steps are as follows:
(1) In the lαβ space, transfer the color of the target face before transformation onto the target face after transformation, where l represents the achromatic luminance channel, α represents the yellow-blue chromatic channel, and β represents the red-green chromatic channel. The lαβ space is established on the basis of the LMS space; because the three LMS channels are strongly correlated, which brings certain difficulties to image processing, the lαβ color space was proposed for this situation. Unlike other color systems, the lαβ space is better suited to the human visual perception system: for natural scenes the three lαβ channels are nearly orthogonal, so the correlation between channels is reduced to a minimum.
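A sketch of per-channel statistics transfer in lαβ space follows, using the RGB-to-LMS and log-LMS-to-lαβ coefficients from Reinhard et al.'s colour-transfer work. The inverse matrices are computed numerically for brevity, and the images are assumed to be float RGB arrays in a positive range; these choices are illustrative assumptions, not fixed by the patent.

```python
import numpy as np

# RGB -> LMS and log-LMS -> l-alpha-beta coefficients from Reinhard et
# al.'s colour-transfer paper; inverses are computed numerically here.
RGB2LMS = np.array([[0.3811, 0.5783, 0.0402],
                    [0.1967, 0.7244, 0.0782],
                    [0.0241, 0.1288, 0.8444]])
LMS2LAB = np.diag([1 / np.sqrt(3), 1 / np.sqrt(6), 1 / np.sqrt(2)]) @ \
          np.array([[1., 1., 1.], [1., 1., -2.], [1., -1., 0.]])

def to_lab(rgb):
    lms = np.clip(rgb.reshape(-1, 3) @ RGB2LMS.T, 1e-6, None)
    return (np.log10(lms) @ LMS2LAB.T).reshape(rgb.shape)

def to_rgb(lab):
    log_lms = lab.reshape(-1, 3) @ np.linalg.inv(LMS2LAB).T
    return (10.0 ** log_lms @ np.linalg.inv(RGB2LMS).T).reshape(lab.shape)

def color_transfer(ref_rgb, img_rgb):
    """Recolour img_rgb so each l, alpha, beta channel takes on the
    mean and standard deviation of the reference image's channels."""
    ref, img = to_lab(ref_rgb), to_lab(img_rgb)
    out = (img - img.mean(axis=(0, 1))) \
        * (ref.std(axis=(0, 1)) / (img.std(axis=(0, 1)) + 1e-10)) \
        + ref.mean(axis=(0, 1))
    return to_rgb(out)
```

In this pipeline the reference image would be the target face before transformation and the recoloured image the warped sample face, matching the transfer direction described above.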
(2) Adopt the Laplacian-Gaussian pyramid method to carry out fusion processing on the images of the transferred target face at different scales and different decomposition layers. The Laplacian pyramid is a multi-scale, multi-resolution method. The fusion process of pyramid-decomposition image fusion is carried out separately on different scales, different spatial resolutions and different decomposition layers, which obtains a better fusion effect than a simple image fusion algorithm and can be used in a wide range of situations. It comprises the following steps:
A. For the color-transferred target face, adopt a Gaussian pyramid to obtain down-sampled images on several different spatial layers and at multiple scales, constructing an image pyramid. The concrete steps of constructing the Gaussian pyramid are as follows:
a. Input the color-transferred target face image G_0 and take G_0 as layer 0 of the Gaussian pyramid.
b. Apply Gaussian low-pass filtering and down-sampling by every other row and column to the original input image G_0 to obtain the first-layer image G_1 of the Gaussian pyramid.
c. Apply Gaussian low-pass filtering and down-sampling by every other row and column to the first-layer image G_1 to obtain the second-layer image G_2 of the Gaussian pyramid.
d. Apply Gaussian low-pass filtering and down-sampling by every other row and column to the (l−1)-th layer image G_{l−1} to obtain the l-th (1 ≤ l ≤ N) layer image G_l of the Gaussian pyramid:
G_l(i, j) = Σ_{m=−2}^{2} Σ_{n=−2}^{2} w(m, n) G_{l−1}(2i + m, 2j + n), 0 ≤ i < R_l, 0 ≤ j < C_l,
where N is the maximum number of layers of the Gaussian pyramid, R_l and C_l are respectively the numbers of rows and columns of the l-th layer of the Gaussian pyramid, and w(m, n) is a two-dimensional separable 5 × 5 window function.
e. Repeat the above process to form the final Gaussian pyramid.
B. Adopt a Laplacian pyramid on the pyramid images of the different frequency bands of the different decomposition layers, reconstructing the upper-layer images by up-sampling from the pyramid top-layer image. The concrete steps of constructing the Laplacian pyramid are as follows:
a. Enlarge the top-layer image G_N of the obtained Gaussian pyramid by interpolation to obtain the image G_N^*, where N is the maximum number of layers of the Gaussian pyramid.
b. Enlarge the (N−1)-th layer image G_{N−1} by interpolation to obtain the image G_{N−1}^*.
c. Enlarge the l-th layer image G_l of the Gaussian pyramid by interpolation to obtain the image G_l^*:
G_l^*(i, j) = 4 Σ_{m=−2}^{2} Σ_{n=−2}^{2} w(m, n) G_l((i + m)/2, (j + n)/2), 0 ≤ i < R_{l−1}, 0 ≤ j < C_{l−1},
where only the terms with integer coordinates contribute to the sum, N is the maximum number of layers of the Gaussian pyramid, R_l and C_l are respectively the numbers of rows and columns of the l-th layer of the Gaussian pyramid, and w(m, n) is a two-dimensional separable 5 × 5 window function.
d. Compute LP_l = G_l − G_{l+1}^*, 0 ≤ l < N.
e. Compute LP_N = G_N, where N is the maximum number of layers of the Laplacian pyramid and LP_l is the l-th layer image of the Laplacian pyramid.
f. Repeat the calculation for each layer of the Gaussian pyramid to obtain the Laplacian pyramid LP_0, LP_1, ..., LP_l, ..., LP_N.
C. Merge the reconstructed images and carry out the fusion processing.
(3) Find the edge region of the fused target face image and smooth it with a Gaussian filter. The Gaussian edge filtering method makes the edge transition smooth when the fused face is placed into the camera image frame.
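A minimal sketch of this edge smoothing: blur the hard paste mask with a separable Gaussian filter so the composite fades across the seam instead of switching abruptly. Single-channel images, kernel radius and sigma are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalised 1-D Gaussian kernel of length 2*radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x * x / (2.0 * sigma * sigma))
    return k / k.sum()

def smooth_mask(mask, sigma=2.0, radius=4):
    """Low-pass filter a hard 0/1 paste mask with a separable Gaussian."""
    k = gaussian_kernel(sigma, radius)
    h, w = mask.shape
    pad = np.pad(mask.astype(float), radius, mode="edge")
    tmp = sum(k[i] * pad[i:i + h, :] for i in range(len(k)))   # vertical
    return sum(k[i] * tmp[:, i:i + w] for i in range(len(k)))  # horizontal

def paste_with_soft_edge(frame, face, mask, sigma=2.0):
    """Alpha-blend the fused face into the camera frame; the blurred
    mask makes the seam fade gradually across the edge region."""
    alpha = smooth_mask(mask, sigma)
    return alpha * face + (1.0 - alpha) * frame
```

Deep inside the mask the blurred alpha stays at 1, so only the edge region is actually mixed with the camera frame, which is the behaviour the step describes.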
Fifth step: check whether the camera is closed. If closed, end the face replacement; if not closed, continue video acquisition to obtain the current frame image, and continue the face detection, face transformation and face fusion steps.
As shown in Fig. 2, the video real-time face replacement system based on local affine transformation and color transfer technology comprises:
a video acquisition module, for acquiring the face image of each frame of the video obtained from the camera;
a classifier construction module, for performing face detection on the images of the acquired video;
a shape model training module, for building a local feature for each face feature point and establishing the position constraints of the feature points;
a principal component analysis module, for performing feature extraction on the shape features built by the shape model construction module;
a face feature point search module, for searching the face feature points and calculating the positions where the feature points are located;
a face affine transformation module, for mapping the samples in the face database to the corresponding positions of the target face;
a Laplacian-Gaussian pyramid image fusion module, for carrying out the corresponding fusion processing on the color-transferred target face.
The video acquisition module is connected with the classifier construction module; the output of the classifier construction module is connected with the shape model training module and the principal component analysis module respectively; the shape model training module and the principal component analysis module are connected with the inputs of the face feature point search module; the output of the face feature point search module is connected with the face affine transformation module; and the output of the face affine transformation module is connected with the Laplacian-Gaussian pyramid image fusion module.
The above shows and describes the basic principle, principal features and advantages of the present invention. Those skilled in the art should understand that the present invention is not restricted to the above embodiments; the above embodiments and the description merely illustrate the principle of the present invention. Various changes and improvements may be made to the present invention without departing from its spirit and scope, and all such changes and improvements fall within the claimed scope of the present invention. The scope of protection claimed by the present application is defined by the appended claims and their equivalents.