WO2010083018A1 - Segmenting grass regions and playfield in sports videos - Google Patents

Segmenting grass regions and playfield in sports videos Download PDF

Info

Publication number
WO2010083018A1
Authority
WO
WIPO (PCT)
Prior art keywords
grass
mask
pixels
generating
image
Prior art date
Application number
PCT/US2010/000004
Other languages
French (fr)
Inventor
Jesus Barcons-Palau
Sitaram Bhagavathy
Joan Llach
Mithun George Jacob
Original Assignee
Thomson Licensing
Priority date
Filing date
Publication date
Application filed by Thomson Licensing filed Critical Thomson Licensing
Publication of WO2010083018A1 publication Critical patent/WO2010083018A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/155: Segmentation; Edge detection involving morphological operators
    • G06T 7/11: Region-based segmentation
    • G06T 7/143: Segmentation; Edge detection involving probabilistic approaches, e.g. Markov random field [MRF] modelling
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20036: Morphological image processing
    • G06T 2207/30: Subject of image; Context of image processing
    • G06T 2207/30221: Sports video; Sports image
    • G06T 2207/30228: Playing field

Abstract

A method for segmenting grass and playfield regions in sports videos entails: generating likelihood grass and non-grass masks; generating a grass mask using the likelihood masks; performing a first clean-up of the grass mask to generate an intermediate grass mask; generating a playfield mask using the intermediate grass mask; refining the intermediate grass mask; performing a second clean-up of the refined intermediate grass mask; constraining the grass pixels of the grass mask to be within the playfield mask; and performing a third clean-up to generate a final grass mask. The final grass mask provides a robust and accurate representation of grass regions in the video frame. A soccer application is described.

Description

SEGMENTING GRASS REGIONS AND PLAYFIELD IN SPORTS VIDEOS
Related Patent Applications
[0001] This application claims the benefit under 35 U.S.C. § 119(e) of United States Provisional Application No. 61/205,333, filed January 16, 2009, the entire contents of which are hereby incorporated by reference for all purposes into this application.
Field of Invention
[0002] The present invention generally relates to digital image analysis, and more particularly to the detection of features in video images.
Background
[0003] In sports video analysis, such as the analysis of video of a soccer match, the detection of certain features, such as grass in the playfield, is one of the first steps for a wide variety of applications that deal with such content. Applications that can benefit from accurate grass detection include, for example, object detection/highlighting technologies. In the particular case of detecting and highlighting objects of interest inside the playfield (such as the ball and the players), a mask of the grass provides information about where such objects can be found. [0004] Another application for grass detection includes the classification of scenes into different camera views (e.g. far-view, close-up, etc.) by evaluating the characteristics of the playfield and the objects inside it.
[0005] A further application includes video compression, in which the grass mask can be passed to the encoder as metadata. Such information may be used, for example, to control the bit budget allocated to encode such regions.
[0006] And yet another application includes video reframing. Since the action of a soccer match is supposed to happen inside the playfield, the grass mask can be passed as metadata to a reframing application to preserve zones of interest.
[0007] Most previous approaches to the detection of grass in a playfield detect the grass pixels in the playfield using color segmentation and post-processing with morphological operations, such as connected component analysis, in order to limit the search area. (See, e.g., S. Choi et al., "Where are the ball and players? Soccer game analysis with color-based tracking and image mosaic," Intl. Conf. on Image Analysis and Processing, Sep 1997; Y. Liu et al., "Playfield Detection Using Adaptive GMM and Its Application," IEEE ICASSP '05, pp. 421-424, March 2005; X. Tong et al., "An Effective and Fast Soccer Ball Detection and Tracking Method," ICPR'04, pp. 795-798, 2004; O. Utsumi et al., "An object detection method for describing soccer games from video," IEEE ICME '02, 2002; Y. Huang et al., "Players and Ball Detection in Soccer Videos Based on Color Segmentation and Shape Analysis," International Workshop on Multimedia Content Analysis and Mining (MCAM'07), in conjunction with ICME'07, pp. 416-425, June 2007.)
[0008] A simple way to represent the color of grass is by using a constant mean color value that is obtained through prior statistics over a large data set. The color distance between a pixel and the mean value of the field could be used to determine whether the pixel is to be classified as grass or not. (See X. Tong et al. cited above.) Since the soccer field is roughly green colored, the hue component defined in Smith's hexagonal cone model can be used to detect the green colored pixels given a certain range. (See O. Utsumi et al., cited above.)
[0009] Some statistical detection methods involve learning a Gaussian or mixture-of-Gaussian (MoG) color model for the grass inside the playfield, which can be incrementally adapted using the Expectation-Maximization (EM) algorithm. (See Y. Liu et al. cited above.) A non-parametric color model for the grass is the histogram.
[0010] Another approach to grass detection is to find the dominant color, assuming that the field has a uniform color of green and occupies the largest area in each frame. (See S. Choi et al. cited above.)
[0011] Yet another approach uses ground truth data of grass and non-grass pixels to create two 3D histograms. (See Y. Huang et al. cited above.) Then, the probability of a given pixel being grass or non-grass is evaluated using the data from the learned histograms. The limitation of this method is that the success of the grass detection is very dependent on the ground truth used for learning the histograms.
Summary
[0012] In an exemplary embodiment in accordance with the principles of the invention, a method is provided of segmenting grass regions in an image, such as a frame of a sports video. The exemplary method includes: generating likelihood grass and non-grass masks; generating a grass mask using the likelihood masks; performing a first clean-up of the grass mask to generate an intermediate grass mask; generating a playfield mask using the intermediate grass mask; refining the intermediate grass mask; performing a second clean-up of the refined intermediate grass mask; constraining the grass pixels of the grass mask to be within the playfield mask; and performing a third clean-up to generate a final grass mask. The final grass mask provides a robust and accurate representation of grass regions in the video frame.
[0013] In view of the above, and as will be apparent from the detailed description, other embodiments and features are also possible and fall within the principles of the invention.
Brief Description of the Figures
[0014] Some embodiments of apparatus and/or methods in accordance with embodiments of the present invention are now described, by way of example only, and with reference to the accompanying figures in which: [0015] FIG. 1A is an illustrative image frame of a playfield; and FIGs. 1B and 1C are a grass mask and a field mask, respectively, derived from the image of FIG. 1A;
[0016] FIGs. 2A and 2B show a flowchart of an exemplary grass and playfield detection method;
[0017] FIG. 3 is a flowchart of an exemplary texture-based grass detection method; [0018] FIG. 4 is a flowchart of an exemplary RGB histogram-based grass detection method;
[0019] FIG. 5 is a flowchart of an exemplary HSV non-grass detection method;
[0020] FIG. 6 is a flowchart of an exemplary local maxima non-grass detection method; and [0021] FIG. 7 is a block diagram of an exemplary system embodiment of the present invention.
Description of Embodiments
[0022] Other than the inventive concept, the elements shown in the figures are well known and will not be described in detail. For example, other than the inventive concept, familiarity with digital image processing is assumed and not described herein. It should also be noted that embodiments of the invention may be implemented using various combinations of hardware and software. Finally, like-numbers in the figures represent similar elements.
[0023] FIG. 1A is an illustrative image frame of a playfield, such as may be grabbed from a video of a sporting event such as a soccer match. The image of the playfield contains the grass regions of the field, as well as objects in the field or in the foreground, such as field lines, players, referees, balls and the like. For each frame, an exemplary method described below outputs a grass mask (GM) specifying pixels belonging to grass regions inside the field, and a playfield mask (PM) specifying all pixels lying inside the playfield. FIG. 1B shows an illustrative grass mask and FIG. 1C shows an illustrative playfield mask.
[0024] FIGs. 2A and 2B show a flowchart of an exemplary method 200 for segmenting grass pixels and playfield pixels from a given soccer video frame. At a high level, the method entails generating likelihood grass and non-grass masks, and then using the masks to perform grass and playfield detection. More specifically, the exemplary method includes: at 210 generating likelihood grass and non-grass masks; at 220 generating a grass mask using the likelihood masks computed in 210; at 230 cleaning up the grass mask; at 240 refining the detection of grass pixels in the grass mask; at 250 cleaning-up the refined grass mask; at 260 detecting the playfield and generating a playfield mask using the cleaned-up grass mask from 230; at 270 constraining the grass pixels of the grass mask to be inside the playfield; and at 280 further cleaning up the grass mask by deleting isolated pixels. Each of these steps or procedures will now be described in greater detail.
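At a glance, the flow of method 200 can be sketched as below. This is a minimal outline only: the helper names and signatures are illustrative placeholders, not the patent's API, and each step is elaborated (with its own sketch) in the sections that follow.

```python
def segment_grass_and_playfield(frame):
    # 210: likelihood grass and non-grass masks (functions 211-214)
    gm_text = texture_grass_mask(frame)               # 211
    gm_hrgb = rgb_histogram_grass_mask(frame)         # 212
    ngm_hsv = hsv_non_grass_mask(frame)               # 213
    ngm_locmax = local_maxima_non_grass_mask(frame)   # 214

    gm_init = initial_grass_mask(gm_text, gm_hrgb, frame)                 # 220
    gm_inter = clean_up(gm_init)                                          # 230
    gm_refined = refine_grass_mask(gm_inter, ngm_hsv, ngm_locmax, frame)  # 240
    gm_refined = clean_up(gm_refined)                                     # 250
    pm = playfield_mask(gm_inter)                                         # 260
    gm = constrain_to_playfield(gm_refined, pm)                           # 270
    gm_final = remove_small_components(gm)                                # 280
    return gm_final, pm
```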
[0025] In the exemplary method 200, the generation of likelihood grass and non-grass masks at 210 includes: texture-based grass detection 211, which outputs a binary mask GM_text; RGB histogram-based grass detection 212, which outputs a binary mask GM_hrgb; HSV non-grass detection 213, which outputs a binary mask NGM_hsv; and local maxima (bright spot) detection 214, which outputs a binary mask NGM_locmax.
The input to each of functions 211-214 is a frame grabbed from a video stream depicting a soccer match. The illustrative frame is in a YUV colorspace format but may be in any suitable colorspace format, such as HSV or RGB. Functions 211-214 may be carried out in parallel, sequentially or any combination thereof. Moreover, even though four masks are generated in this particular embodiment, a different number of masks can be used. In addition, these masks are not limited to being binary.
[0026] Exemplary implementations of functions 211-214 will now be described.
Texture-based grass detection
[0027] FIG. 3 shows a flowchart of an exemplary method 300 for implementation of a texture-based grass detection procedure such as used in the method 200 of FIGs. 2A and 2B described above. The purpose of method 300 is to prevent foreground objects such as players, goalmouths, lines, and balls from being detected as grass.
[0028] Texture-based grass detection method 300 isolates possible grass areas based on the assumption that grass areas have a smooth appearance. The standard deviation of a neighborhood of pixels can be used as a local texture descriptor. The standard deviation provides a measure of spatial activity, with a small standard deviation indicating low spatial activity and therefore a smooth surface.
[0029] As shown in FIG. 3, at 310, the luminance Y of the frame is normalized by its maximum value in the frame, so that the range of Y in the frame becomes [0, 1]. This is followed by a processing loop 320-370 beginning with step 330 in which a local standard deviation image I_std is generated using a neighborhood of N_std x N_std pixels around each pixel in the normalized Y_norm.
[0030] At step 340, the value of I_std computed for each pixel is compared to a threshold T_std. If I_std(x, y) ≤ T_std, the pixel at (x, y) in the binary mask GM_text is set to 1 at step 350; otherwise, it is set to 0 at step 360. In an exemplary embodiment for QVGA or 320x240-pixel video of a soccer match, the following parameter values were used: N_std = 5, T_std = 0.2. In various embodiments, values for N_std have a range of 3 to 7, and values for T_std have a range of 0 to 2.5 for pixel values of 0-255.
[0031] Although the generation of binary mask GM_text can be sensitive to local variations in the color or intensity of the grass surface and the resultant mask can be noisy, it provides valuable information about the grass texture. Moreover, by using this mask in combination with other masks and applying one or more clean-up operations, as described below, the effects of any noise can be reduced or eliminated.
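To make the texture test concrete, the sketch below computes the local standard deviation with box filters (using the identity var = E[x^2] - (E[x])^2) instead of an explicit per-pixel loop. The function name, the use of NumPy/SciPy, and the default parameters (set to the exemplary N_std = 5 and T_std = 0.2) are assumptions for illustration, not the patent's literal implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def texture_grass_mask(y, n_std=5, t_std=0.2):
    # Step 310: normalize luminance so its range within the frame is [0, 1].
    y_norm = y.astype(float) / y.max()
    # Step 330: local standard deviation over an N_std x N_std neighborhood.
    mean = uniform_filter(y_norm, size=n_std)
    mean_sq = uniform_filter(y_norm ** 2, size=n_std)
    local_std = np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))
    # Steps 340-360: smooth (low spatial activity) pixels -> likely grass (1).
    return (local_std <= t_std).astype(np.uint8)
```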
RGB histogram-based grass detection method
[0032] FIG. 4 shows a flowchart of an exemplary method 400 for implementation of an RGB histogram-based grass detection procedure such as used in the method 200 of FIGs. 2A and 2B described above. Method 400 relies on the assumptions that: 1) the playfield is the biggest area containing the most dominant color in the frame, hence, a pixel quite close to the dominant color is likely to be grass; and 2) a very green pixel is likely to be grass.
[0033] As shown in FIG. 4, the YUV frame is converted at 410 to the RGB color space. To detect the most dominant color in a frame, a histogram with N_hist bins is computed at 412 for each color component, R (red), G (green) and B (blue), of the frame.
[0034] Then, at 413, the bin with the largest value (peak) in each of the three histograms is determined. Additionally, the values of the bin centers corresponding to the peaks of the R, G and B histograms are determined. These values are designated R_peak, G_peak and B_peak, respectively.
[0035] Operation then proceeds to processing loop 420-470 in which each pixel (x, y) of the frame is classified as grass at 440 (i.e., GM_hrgb(x, y) = 1 in the output binary mask) or non-grass at 460 (i.e., GM_hrgb(x, y) = 0 in the output binary mask) depending on the results of comparison tests performed at steps 430 and 450. More specifically, a pixel (x, y) is classified as grass at 440 if it is determined at 430 that:

R(x, y) < G(x, y), and (1)
B(x, y) < G(x, y), and (2)
|R(x, y) - R_peak| < T_hist, and (3)
|G(x, y) - G_peak| < T_hist, and (4)
|B(x, y) - B_peak| < T_hist, (5)

or if it is determined at 450 that:

S_fact * R(x, y) < G(x, y) and S_fact * B(x, y) < G(x, y), (6)

where T_hist is a threshold to bound the difference between the peak and a value of a given component, and S_fact is a scale factor. Otherwise, if a pixel (x, y) fails both tests at 430 and 450, it is classified as non-grass at 460.
[0036] In an exemplary embodiment, the following values are used: N_hist = 100, T_hist = 12 and S_fact = 1.1. In various embodiments, for a pixel intensity range of 1-256, values for N_hist have a range of 75 to 150, values for T_hist have a range of 10 to 20 and values for S_fact have a range of 1.1 to 1.3.
[0037] The comparisons of R, G, and B values in steps 430 and 450 are based on the assumption that a grass pixel is "more green" than other pixels. This implies that the G component should be higher than the R and B components in a grass pixel. This assumption is tested in the first two comparisons (1) and (2) of step 430 and in comparison (6) of step 450, in which the R and B components are scaled and compared to the G component.
[0038] Step 430 also incorporates the assumption that a majority of the pixels in the field image are grass pixels. Therefore, pixels within an interval T_hist about the peaks of the R, G and B histograms should belong to grass. This is tested in the last three comparisons (3)-(5) of step 430.
[0039] Occasionally, the dominant color (histogram peaks) may correspond to a non-grass area leading to an incorrect detection result. In order to reduce this possibility and to increase the likelihood of isolating the color of grass, the aforementioned histograms are preferably computed using pixels only from the bottom half of the frame. In most soccer videos (particularly in the case of far-view scenes) the audience is usually at the top of the frame and the playfield lies between the audience and the bottom of the frame. Therefore the color of grass is likely to be dominant in the bottom half of the frame even if it does not dominate the entire frame. This knowledge helps improve the robustness of detection and reduce the complexity of histogram computation as there are fewer pixels to consider. As such, at 411, the bottom half of the image frame is selected for computation of the histograms at 412.
[0040] In some embodiments, several histograms could be computed for different portions of the frame; for instance: left, right, top, bottom and center. Such portions of the frame may overlap each other. Then, the histogram with the highest, greenest peak would define the dominant color.
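A compact sketch of the histogram-peak test follows; it mirrors comparisons (1)-(6) and the bottom-half restriction described above. The function name, the NumPy binning, and the defaults (the exemplary N_hist = 100, T_hist = 12, S_fact = 1.1) are illustrative assumptions.

```python
import numpy as np

def rgb_histogram_grass_mask(rgb, n_hist=100, t_hist=12, s_fact=1.1):
    # rgb is an H x W x 3 array with components in [0, 255].
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)

    # Steps 411-413: dominant color from histograms of the bottom half only.
    bottom = rgb[rgb.shape[0] // 2:]
    peaks = []
    for c in range(3):
        hist, edges = np.histogram(bottom[..., c], bins=n_hist, range=(0, 256))
        k = int(np.argmax(hist))
        peaks.append(0.5 * (edges[k] + edges[k + 1]))  # bin center of the peak
    r_peak, g_peak, b_peak = peaks

    # Step 430: comparisons (1)-(5) -- greener than R and B, and near the peaks.
    test_430 = ((r < g) & (b < g) &
                (np.abs(r - r_peak) < t_hist) &
                (np.abs(g - g_peak) < t_hist) &
                (np.abs(b - b_peak) < t_hist))
    # Step 450: comparison (6) -- "very green" pixels.
    test_450 = (s_fact * r < g) & (s_fact * b < g)
    return (test_430 | test_450).astype(np.uint8)
```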
HSV non-grass detection
[0041] The aim of texture-based grass detection 211 and RGB histogram-based grass detection 212 is to detect regions likely to be grass. HSV non-grass detection 213 detects pixels with a very low likelihood of being grass. This information is used, as described below, to ensure that non-grass pixels are not labeled as grass, thereby reducing the occurrence of false alarms and increasing the accuracy of grass detection.
[0042] FIG. 5 is a flowchart of an exemplary method 500 for implementing HSV non-grass detection 213. At step 510, the input frame, if not already in the HSV colorspace, is converted to HSV colorspace format. In the HSV colorspace representation, the hue component H takes on values in the range [0, 360). Although hues in the range [0, 240] contain some amount of green, grass pixels are more likely to have a hue in the range [60, 180]. Using the knowledge that grass is green, method 500 generates the non-grass mask NGM_hsv as follows: if H_min < H(x, y) < H_max, then NGM_hsv(x, y) = 0, else 1.
As shown in FIG. 5, this operation is carried out over all pixels of the image in processing loop 520-560. In an exemplary embodiment the following values are used: H_min = 30 and H_max = 150. In an alternative exemplary embodiment, H_max = 210.
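The hue test reduces to a single range check per pixel. In the sketch below the hue plane is assumed to already be in degrees in [0, 360), and the function name and defaults (the exemplary H_min = 30, H_max = 150) are illustrative.

```python
import numpy as np

def hsv_non_grass_mask(h, h_min=30, h_max=150):
    # Pixels whose hue lies outside the green band are marked 1 (non-grass);
    # greenish pixels are marked 0 (possibly grass).
    in_green_band = (h > h_min) & (h < h_max)
    return (~in_green_band).astype(np.uint8)
```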
Local maxima non-grass detection
[0043] Like HSV non-grass detection 213, the purpose of local maxima non-grass detection 214 is to detect non-grass pixels. This operation ensures that small bright spots in the playfield (e.g. ball, field lines, goalmouth) are not incorrectly detected as grass. Bright spots are isolated by locating pixels corresponding to local maxima in the luminance component Y.
[0044] FIG. 6 is a flowchart of an exemplary method 600 for implementing local maxima non-grass detection 214. In order to generate the non-grass mask NGM_locmax based on the local maxima, the luminance of the frame is convolved at step 620 with a normalized Gaussian kernel G_nk, generating the result C_luma.
The non-grass mask NGM_locmax is generated as follows: if Y(x, y) > C_luma(x, y) + T_cl, then NGM_locmax(x, y) = 1, else 0.
As shown in FIG. 6, this operation is carried out over all pixels of the image in processing loop 610-660. In an exemplary embodiment, G_nk is a 9x9 kernel and T_cl = 0.1. In various embodiments, the size of the kernel G_nk has values of 5, 7, 9 and 11.
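A sketch of the bright-spot test is given below. The patent convolves the luminance with a normalized 9x9 Gaussian kernel G_nk; here SciPy's gaussian_filter stands in for that convolution, so the sigma value is an assumption, as are the function name and the expectation that Y is normalized to [0, 1] so that T_cl = 0.1 is meaningful.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_maxima_non_grass_mask(y, sigma=1.5, t_cl=0.1):
    # Step 620: smoothed luminance C_luma (Gaussian blur in place of G_nk).
    c_luma = gaussian_filter(y.astype(float), sigma)
    # Steps 610-660: pixels notably brighter than their smoothed value are
    # bright spots (ball, field lines, goalmouth) and marked 1 = non-grass.
    return (y > c_luma + t_cl).astype(np.uint8)
```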
Grass and playfield detection
[0045] Referring again to FIGs. 2A and 2B, using the likelihood grass and non-grass masks generated at 210, grass detection is performed at 220 to generate an initial grass mask GM_init indicative of pixels representing grass in the playfield.
[0046] As shown in FIG. 2A, in a processing loop 221-225, for each pixel (x, y) in the video frame being processed, the corresponding values of the grass masks GM_text and GM_hrgb and the Y component of the frame are tested at 222. At 223, the corresponding initial grass mask value GM_init(x, y) is set to 1 if it is determined at 222 that: GM_text(x, y) = 1, and GM_hrgb(x, y) = 1, and Y(x, y) > Y_thr1. If not, at 224, GM_init(x, y) is set to 0.
[0047] Once all pixels have been processed in 220 to generate GM_init, at 230 an opening morphological operation (using, for example, a 3x3 kernel) is applied to GM_init in order to remove small objects. The resultant intermediate grass mask is labeled GM_inter.
[0048] Operation then proceeds to 240 in which the intermediate grass mask is refined to generate refined grass mask GM_refined. As shown in FIG. 2B, in a processing loop 241-246, for each pixel (x, y) in the video frame, the corresponding values of non-grass masks NGM_hsv and NGM_locmax are tested at 242. At 243, the corresponding value of the refined grass mask GM_refined(x, y) is set to 0 if it is determined at 242 that NGM_hsv(x, y) = 1 or NGM_locmax(x, y) = 1. If not, operation proceeds to 244. If it is determined at 244 that S_fact x R(x, y) < G(x, y), and S_fact x B(x, y) < G(x, y), and Y(x, y) > Y_thr2, then at 245, GM_refined(x, y) is set to 1.
[0049] Once GM_refined has been generated upon completion of refinement 240, a further opening morphological operation (using, for example, a 3x3 kernel) is applied at 250 to GM_refined in order to remove small objects.
[0050] Using intermediate grass mask GM_inter generated at 230, playfield mask PM is generated at 260. Playfield mask PM indicates those pixels that are in the playfield. In an exemplary embodiment, playfield mask PM is generated by performing a morphological closing operation on intermediate grass mask GM_inter. Any holes in the resulting binary image are closed and the largest connected component is found and output as the playfield mask PM. Playfield mask generation 260 can be carried out before, in parallel with, or after 240 and 250.
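Steps 220 through 260 can be expressed with whole-array operations, as in the sketch below. The helper name, the SciPy morphology calls, and the default thresholds (the exemplary Y_thr1 = 50, Y_thr2 = 80, S_fact = 1.1, with R, G, B and Y planes in [0, 255]) are assumptions for illustration rather than the patent's literal implementation.

```python
import numpy as np
from scipy.ndimage import binary_opening, binary_closing, binary_fill_holes, label

def detect_grass_and_playfield(gm_text, gm_hrgb, ngm_hsv, ngm_locmax,
                               y, r, g, b,
                               y_thr1=50, y_thr2=80, s_fact=1.1):
    k3 = np.ones((3, 3), dtype=bool)  # 3x3 structuring element

    # 220: initial grass mask GM_init from the two grass likelihood masks.
    gm_init = (gm_text == 1) & (gm_hrgb == 1) & (y > y_thr1)
    # 230: first clean-up (morphological opening) -> intermediate mask GM_inter.
    gm_inter = binary_opening(gm_init, structure=k3)

    # 240: refinement using the non-grass masks and the color/luminance test.
    non_grass = (ngm_hsv == 1) | (ngm_locmax == 1)
    very_green = (s_fact * r < g) & (s_fact * b < g) & (y > y_thr2)
    gm_refined = gm_inter.copy()
    gm_refined[non_grass] = False
    gm_refined[~non_grass & very_green] = True
    # 250: second clean-up (morphological opening).
    gm_refined = binary_opening(gm_refined, structure=k3)

    # 260: playfield mask PM = closing + hole filling + largest component.
    closed = binary_fill_holes(binary_closing(gm_inter, structure=k3))
    labels, n = label(closed)
    if n == 0:
        pm = np.zeros_like(closed, dtype=bool)
    else:
        sizes = np.bincount(labels.ravel())
        sizes[0] = 0  # ignore the background label
        pm = labels == np.argmax(sizes)
    return gm_refined, pm
```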
[0051] At 270, using the playfield mask PM, the grass pixels as indicated by the refined and further cleaned-up grass mask are constrained to lie inside the playfield. This operation is based on the assumption that the only grass areas of interest lie inside the playfield. This operation can be omitted for applications in which this constraint is not required. At 270, all of the pixels of the grass mask where the playfield mask PM is 0 are set to 0 as well. In other words, all pixels outside the detected playfield are set to non-grass.
[0052] At 280, components smaller than a given threshold size are removed. In an exemplary embodiment this threshold size is two pixels, whereby isolated, individual pixels are removed. For larger threshold sizes, noisy clusters of connected pixels would be removed as well. The result of this operation is the final grass mask.
[0053] In an exemplary embodiment the following parameter values are used: S_fact = 1.1, Y_thr1 = 50, and Y_thr2 = 80. In various embodiments, values for S_fact have a range of 1.0 to 1.3, values for Y_thr1 have a range of 40 to 70, and values for Y_thr2 have a range of 60 to 90.
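The last two steps (270 and 280) come down to a mask intersection followed by small-component removal, for example as below; the function name is illustrative and the two-pixel threshold matches the exemplary embodiment.

```python
import numpy as np
from scipy.ndimage import label

def finalize_grass_mask(gm_refined, pm, min_size=2):
    # 270: keep grass pixels only where the playfield mask PM is set.
    gm = np.logical_and(gm_refined, pm)
    # 280: drop connected components smaller than min_size pixels
    # (min_size = 2 removes isolated single pixels).
    labels, n = label(gm)
    sizes = np.bincount(labels.ravel())
    keep = sizes >= min_size
    keep[0] = False  # never keep the background label
    return keep[labels].astype(np.uint8)
```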
[0054] Note that for applications that do not require limiting the grass mask to those grass pixels in the playfield, this operation can be omitted. If the playfield mask PM is not required, the processing loop in 220 to create the initial grass mask of pixels and the processing loop in 240 to refine the grass mask can be merged into one loop.
[0055] Furthermore, in the exemplary embodiment shown, playfield mask generation 260 uses intermediate grass mask GM_inter instead of, for example, the final grass mask GM, because intermediate grass mask GM_inter is more likely to have the field as one connected component. The operations applied to the grass mask at 240 and 250 are focused on the rejection of pixels classified as grass and may cause the field to break up into multiple pieces.
[0056] FIG. 7 is a block diagram of an exemplary system 700 in accordance with the principles of the invention. The system 700 can be used to generate grass and playfield masks, GM and PM, respectively, from a video stream of a sports event such as a soccer match. The system 700 comprises a frame grabber 710 and a digital video editor 720. Frame grabber 710 captures one or more frames of the video stream for processing by digital video editor 720 in accordance with the principles of the invention. Digital video editor 720 comprises processor 721, memory 722 and I/O 723. In an exemplary embodiment, digital video editor 720 may be implemented as a general purpose computer executing software loaded in memory 722 for carrying out grass and playfield segmenting as described above.
[0057] Exemplary embodiments in accordance with the principles of the invention can be readily applied to surfaces other than grass-covered playfields. For example, surfaces such as clay for tennis or ice for ice hockey can be detected with minor modifications adapting the criteria to the features of the surface. Moreover, as can be appreciated, the method can be applied to still images as well as frames from a video stream.
[0058] Additionally, in an exemplary embodiment, the likelihood masks GM_text, GM_hrgb, NGM_hsv and NGM_locmax may be non-binary and real-valued. In such an embodiment, the real values from these masks could be combined in determining whether pixels should be classified as grass pixels, as opposed to simply checking if the mask values are 0 or 1, as in steps 222 and 242 of the exemplary method depicted in FIGs. 2A and 2B.
[0059] In view of the above, the foregoing merely illustrates the principles of the invention and it will thus be appreciated that those skilled in the art will be able to devise numerous alternative arrangements which, although not explicitly described herein, embody the principles of the invention and are within its spirit and scope. For example, although illustrated in the context of separate functional elements, these functional elements may be embodied in one, or more, integrated circuits (ICs). Similarly, although shown as separate elements, some or all of the elements may be implemented in a stored-program-controlled processor, e.g., a digital signal processor or a general purpose processor, which executes associated software, e.g., corresponding to one, or more, steps, which software may be embodied in any of a variety of suitable computer-readable media. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention.

Claims

1. A computer implemented method for segmenting grass regions in an image, comprising: generating at least one likelihood grass mask; generating at least one likelihood non-grass mask; generating a grass mask using the likelihood masks; performing a first clean-up operation of the grass mask to generate an intermediate grass mask; generating a playfield mask using the intermediate grass mask; refining the intermediate grass mask; performing a second clean-up operation of the refined intermediate grass mask; and constraining the refined intermediate grass mask to be within the playfield mask.
2. The method of claim 1, comprising: performing a third clean-up operation to generate a final grass mask.
3. The method of claim 1, wherein the image is a frame of a sports video.
4. The method of claim 1, wherein generating at least one likelihood grass mask includes: comparing red, green and blue color components of pixels of the image; and designating pixels of the image as grass pixels if the pixels have a green color component that is greater than the red and blue color components.
5. The method of claim 1, wherein generating at least one likelihood grass mask includes: generating a histogram for each of a red, green and blue color component of the image; determining a peak of each of the histograms; and designating pixels of the image as grass pixels if the pixels have red, green and blue color components proximate to the peaks of the histograms.
6. The method of claim 5, wherein the histograms are generated using a lower portion of the image.
7. The method of claim 1, wherein generating at least one likelihood grass mask includes: determining texture information for the image.
8. The method of claim 1, wherein generating at least one likelihood non-grass mask includes: performing local maxima detection; and designating local maxima as non-grass.
9. The method of claim 1, wherein generating at least one likelihood non-grass mask includes: designating pixels of the image as non-grass pixels if the pixels have a hue that is not green.
10. The method of claim 1, wherein performing each of the first and second clean-up operations includes performing a morphological opening operation.
11. A computer program recorded on a computer-readable medium, said program causing a computer to segment grass regions in an image by executing the steps of: generating at least one likelihood grass mask; generating at least one likelihood non-grass mask; generating a grass mask using the likelihood masks; performing a first clean-up operation of the grass mask to generate an intermediate grass mask; generating a playfield mask using the intermediate grass mask; refining the intermediate grass mask; performing a second clean-up operation of the refined intermediate grass mask; and constraining the refined intermediate grass mask to be within the playfield mask.
12. The computer program of claim 11 causing the computer to segment grass regions in an image by executing the further step of: performing a third clean-up operation to generate a final grass mask.
13. The computer program of claim 11, wherein the image is a frame of a sports video.
14. The computer program of claim 11, wherein generating at least one likelihood grass mask includes: comparing red, green and blue color components of pixels of the image; and designating pixels of the image as grass pixels if the pixels have a green color component that is greater than the red and blue color components.
15. The computer program of claim 11, wherein generating at least one likelihood grass mask includes: generating a histogram for each of a red, green and blue color component of the image; determining a peak of each of the histograms; and designating pixels of the image as grass pixels if the pixels have red, green and blue color components proximate to the peaks of the histograms.
16. The computer program of claim 15, wherein the histograms are generated using a lower portion of the image.
17. The computer program of claim 11, wherein generating at least one likelihood grass mask includes: determining texture information for the image.
18. The computer program of claim 11, wherein generating at least one likelihood non-grass mask includes: performing local maxima detection; and designating local maxima as non-grass.
19. The computer program of claim 11, wherein generating at least one likelihood non-grass mask includes: designating pixels of the image as non-grass pixels if the pixels have a hue that is not green.
20. The computer program of claim 11, wherein performing each of the first and second clean-up operations includes performing a morphological opening operation.
PCT/US2010/000004 2009-01-16 2010-01-04 Segmenting grass regions and playfield in sports videos WO2010083018A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US20533309P 2009-01-16 2009-01-16
US61/205,333 2009-01-16

Publications (1)

Publication Number Publication Date
WO2010083018A1 true WO2010083018A1 (en) 2010-07-22

Family

ID=42340045

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2010/000004 WO2010083018A1 (en) 2009-01-16 2010-01-04 Segmenting grass regions and playfield in sports videos

Country Status (1)

Country Link
WO (1) WO2010083018A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9020259B2 (en) 2009-07-20 2015-04-28 Thomson Licensing Method for detecting and adapting video processing for far-view scenes in sports video

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5329379A (en) * 1992-10-22 1994-07-12 International Business Machines Corporation System and method of measuring fidelity of decompressed video signals and images
US5841899A (en) * 1995-08-21 1998-11-24 Kabushiki Kaisha Toshiba Specific color field recognition apparatus and method
US20040017389A1 (en) * 2002-07-25 2004-01-29 Hao Pan Summarization of soccer video content
US6738100B2 (en) * 1996-06-07 2004-05-18 Virage, Inc. Method for detecting scene changes in a digital video stream
US20040130567A1 (en) * 2002-08-02 2004-07-08 Ahmet Ekin Automatic soccer video analysis and summarization
US20050114092A1 (en) * 2003-03-31 2005-05-26 Baoxin Li Processing of video content
US20050271269A1 (en) * 2002-03-19 2005-12-08 Sharp Laboratories Of America, Inc. Synchronization of video and data
US7143354B2 (en) * 2001-06-04 2006-11-28 Sharp Laboratories Of America, Inc. Summarization of baseball video content


Similar Documents

Publication Publication Date Title
JP5465657B2 (en) Method and apparatus for detecting interest in soccer video by color segmentation and shape analysis
JP5686800B2 (en) Method and apparatus for processing video
CN110264493B (en) Method and device for tracking multiple target objects in motion state
Huang et al. Players and ball detection in soccer videos based on color segmentation and shape analysis
WO2010080687A1 (en) Method and apparatus for detecting and separating objects of interest in soccer video by color segmentation and shape analysis
Cioppa et al. A bottom-up approach based on semantics for the interpretation of the main camera stream in soccer games
Xu et al. Insignificant shadow detection for video segmentation
Wang et al. Automatic extraction of semantic colors in sports video
Heydari et al. An MLP-based player detection and tracking in broadcast soccer video
Izadi et al. Robust region-based background subtraction and shadow removing using color and gradient information
Hung et al. Generalized playfield segmentation of sport videos using color features
WO2010083018A1 (en) Segmenting grass regions and playfield in sports videos
Siles Temporal segmentation of association football from tv broadcasting
Tran et al. Long-view player detection framework algorithm in broadcast soccer videos
Wu et al. Robust lip localization on multi-view faces in video
Weeratunga et al. Application of computer vision to automate notation for tactical analysis of badminton
Rai et al. A novel method for detection and extraction of human face for video surveillance applications
Kim et al. Extracting semantic information from basketball video based on audio-visual features
Wibawa et al. Soccer Players Detection Using GDLS Optimization and Spatial Bitwise Operation Filter
Thomas et al. An energy minimization approach for automatic video shot and scene boundary detection
Bai et al. Playfield detection using color ratio and local entropy
Chen et al. Dynamic visual saliency modeling based on spatiotemporal analysis
Farhat et al. New approach for automatic view detection system in tennis video
RastegarSani et al. Playfield extraction in soccer video based on Lab color space classification
Singh et al. An interactive framework for abandoned and removed object detection in video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10731918

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10731918

Country of ref document: EP

Kind code of ref document: A1