WO2015059472A1 - A method and system of generating video data with captions - Google Patents


Info

Publication number
WO2015059472A1
Authority
WO
WIPO (PCT)
Prior art keywords
video data
captions
service
product
caption
Prior art date
Application number
PCT/GB2014/053152
Other languages
French (fr)
Inventor
Adam Price
Original Assignee
Life On Show Limited
Priority date
Filing date
Publication date
Application filed by Life On Show Limited
Publication of WO2015059472A1


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0277Online advertisement
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/254Management at additional data server, e.g. shopping server, rights management server
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/812Monomedia components thereof involving advertisement data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8126Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts
    • H04N21/8133Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts specifically related to the content, e.g. biography of the actors in a movie, detailed information about an article seen in a video program
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/858Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot


Abstract

A method and system of generating video data of a product or service with captions is disclosed. The method comprises generating a set of tags representative of the time within the video that one or more aspects of the product or service are shown and generating a set of captions based on information about the aspects of the product or service, each caption corresponding to a particular aspect. The method further comprises linking one or more of the tags with one or more of the captions and subsequently storing and/or playing the video data with captions, each caption being displayed with the video data at the time indicated by the tag linked therewith.

Description

A Method and System of Generating Video Data with Captions
The present invention relates to a method and system for generating video data with captions.
Video is becoming increasingly important as internet-based services move toward a more media-led audience. As a result, video is fast becoming a popular way of delivering content to end users over the internet. When delivering content in this manner, it is desirable to present so-called "take-away messages" from the video so that the viewer can recall its important parts; this is typically achieved by placing striking captions in the video as it is displayed. However, captions are not always available to accompany a video of a particular product or service, and as such the providers of those products or services can be put at a disadvantage when it comes to exploiting the internet-based market. It can be a time-consuming exercise to manually create captions and then place them in the video. Moreover, these captions typically obscure the video being displayed and/or otherwise distract the viewer from the displayed content, resulting in the viewer missing aspects of the video.
Embodiments of the present invention are intended to address at least some of the problems discussed above. Embodiments of the present invention can automate the video generation process and generate the video with appropriate captions in seconds.
In accordance with the present invention as seen from a first aspect, there is provided a method of generating video data of a product or service with captions, the method comprising:
- generating a set of tags representative of the time within the video data that one or more aspects of the product or service are shown;
- generating a set of captions based on information about the aspects of the product or service, each caption corresponding to a particular aspect;
- linking one or more of the tags with one or more of the captions; and,
- storing and/or playing the video data with captions, each caption being displayed with the video data at the time indicated by the tag linked therewith.
In an embodiment, the method comprises generating a plurality of selectable sets of captions. For example, sets of captions may be generated in different languages so that a user may select the preferred language for the captions. It is envisaged however, that the language of the set of captions may be selected automatically based on the geographic location of the user. The method further comprises selecting a set of captions from the plurality of sets for display with the video data.
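The selection between caption sets described above can be sketched as follows. This is a hypothetical Python illustration only: the country-to-language mapping, the set names and the fallback behaviour are assumptions for the sketch, not part of the disclosure.

```python
# Hypothetical mapping from a viewer's country code to a caption language;
# a real deployment would derive the country from geolocation of the user.
COUNTRY_TO_LANGUAGE = {"GB": "en", "FR": "fr", "DE": "de"}

def select_caption_set(caption_sets, country_code, preferred_language=None):
    """Return the caption set for the user's preferred language, falling
    back to a language inferred from geographic location, then English."""
    language = preferred_language or COUNTRY_TO_LANGUAGE.get(country_code, "en")
    return caption_sets.get(language, caption_sets["en"])

# Two selectable sets for the same video data (illustrative content).
sets = {
    "en": ["4.2 litre V8 engine"],
    "fr": ["Moteur V8 de 4,2 litres"],
}
```

A manual language preference, where supplied, overrides the location-based choice, matching the text's statement that the user may select the preferred language.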
In an embodiment, the tags are assigned a category from a set of categories relating to aspects of the product or service. For example, in the case that the product comprises a vehicle, the set of categories may comprise one relating to the vehicle engine, wheels, upholstery, bodywork, boot space, dashboard and the like. The captions may further be categorised according to the same set of categories, so that the captions may be linked to the respective tag using the categories.
In a further embodiment, the tags comprise, or further comprise, information controlling at least one of the positioning of the caption on a screen displaying the video data, the duration that the caption is to be displayed and animation relating to the caption, such as animation relating to the manner in which the caption enters and exits the display screen.
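One possible shape for such a tag record, carrying the temporal marker, its category and the caption-control information (position, duration, animation, pause) described in the two embodiments above, is sketched below. The field names and defaults are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Tag:
    """Hypothetical tag record: a temporal marker plus caption controls."""
    time_s: float                  # time within the video that the aspect is shown
    category: str                  # e.g. "engine", "wheels", "upholstery"
    position: tuple = (0.1, 0.85)  # normalised (x, y) caption position on screen
    duration_s: float = 3.0        # how long the linked caption is displayed
    animation: str = "fade"        # manner in which the caption enters/exits
    pause_video: bool = False      # pause playback so the caption can be read

# A tag marking the engine bay, with automated pausing enabled.
engine_tag = Tag(time_s=6.0, category="engine", pause_video=True)
```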
The method may further comprise pausing the video to allow the viewer of the video data to read the caption without missing any of the video content. The pausing may be performed manually by the user or may be automated according to the associated tag.
In a further embodiment, the method may further comprise the steps of:
- receiving video data relating to the product or service;
- interrogating the video data to acquire an identifier of the product or service; and,
- searching a database comprising information relating to the product or service using the identifier to acquire the information about the aspects of the product or service.
The step of interrogating the video data may comprise performing an image analysis on at least one frame of the video data. For example, in situations where the product is a vehicle, the method may include applying an image analysis technique to identify the number of the licence plate of the vehicle. The step of searching a database may subsequently comprise searching the database of the Driver and Vehicle Licensing Agency (DVLA®), for example.
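The identifier-acquisition and database-search steps can be sketched as below. `recognise_plate` is a placeholder standing in for real number-plate recognition software, and the local dictionary stands in for a registry such as the DVLA's; both, along with the example plate and field names, are assumptions for illustration.

```python
# Stand-in vehicle registry keyed by registration number (illustrative data).
VEHICLE_DB = {
    "AB12 CDE": {"make": "Example Motors", "engine": "4.2 litre V8"},
}

def recognise_plate(frame) -> str:
    # Placeholder: a real implementation would run optical character
    # recognition over the video frame to read the number plate.
    return frame["plate"]

def lookup_vehicle(identifier: str) -> dict:
    # Search the registry using the acquired identifier; empty dict if unknown.
    return VEHICLE_DB.get(identifier, {})

info = lookup_vehicle(recognise_plate({"plate": "AB12 CDE"}))
```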
The method may further comprise generating one or more bespoke captions and tagging the video data with a bespoke tag indicative of the time within the video that the bespoke caption is to be displayed.

In accordance with the present invention, as seen from a second aspect, there is provided a system configured to generate video data of a product or service with captions, the system comprising:
- a device configured to generate a set of tags representative of the time within the video data that one or more aspects of the product or service are shown;
- a device configured to generate a set of captions based on information about the aspects of the product or service, and link each caption with a corresponding tag; and
- a device configured to store and/or play the video data with captions, each caption being displayed with the video data at the time indicated by the tag linked therewith.
In an embodiment, the system may further comprise:
- a receiver for receiving video data relating to a product or service; and
- a processor for interrogating the video data to acquire an identifier of the product or service and for subsequently searching a database comprising information relating to the product or service using the identifier to acquire information about the aspects of the product or service.
Embodiments of the present invention will now be described by way of example only and with reference to the accompanying drawings, in which:
Figure 1 is a schematic illustration of a system according to an embodiment of the present invention; and,
Figure 2 is a flow chart illustrating the steps associated with a method according to an embodiment of the present invention.

Referring to figure 1 of the drawings, there is illustrated a system 10 according to an embodiment of the present invention, for generating video data with captions. The system 10 comprises a computer device 11 having a processor 12 and a memory 13. The computer 11 also comprises other standard components such as a user interface 14, a display screen 15 and a communications module 16 that enables the computer 11 to communicate over a network 20, including the internet, with at least one remote computing device 30. The memory 13 of the computer 11 comprises an application that is arranged to process video data according to an embodiment of a method 100 of the present invention using the processor 12, as illustrated in figure 2 of the drawings, in order to produce video data that may be displayed with one or more captions.
The system 10 further comprises a receiver 17, such as a video camera, which is arranged to acquire, at step 101, video data of a product (not shown), such as a vehicle or a property, or a service (not shown), such as a service offered by a plumber or technician. For the purposes of describing an embodiment of the present invention, the following relates to the acquisition of video data relating to a vehicle. However, the skilled person will recognise that the system may be used to acquire details about alternative products and services.
In an embodiment, the acquisition of the video data may comprise an automated process, whereby a video of the vehicle is taken by following a predefined viewing path around the vehicle. The path may comprise viewing stations (not shown), at which the camera 17 is arranged to dwell, so that a magnified view of a particular aspect of the vehicle may be taken, such as the engine or wheels (not shown). Alternatively, the dwell time may be arranged to provide the viewer with the ability to take a more considered view of a particular aspect of the vehicle, for example. The automated acquisition of the video may thus comprise an automated, repeatable viewing profile, where known aspects of a vehicle are viewed at a particular known time within the video. For example, it may be known for all videos obtained according to the automated viewing profile that at 6 seconds from the start of the video the engine bay (not shown) will be viewed, and at 42 seconds the vehicle interior (not shown) will be viewed.
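Because the automated viewing profile places known aspects at known times, tag generation (step 105) for such video can be a direct transcription of the profile, with no analysis of the footage. The sketch below uses the 6-second and 42-second timings from the example above; the data structure itself is an assumption.

```python
# Repeatable viewing profile: (time within video in seconds, aspect viewed).
VIEWING_PROFILE = [
    (6.0, "engine"),     # engine bay viewed 6 s from the start
    (42.0, "interior"),  # vehicle interior viewed at 42 s
]

def tags_from_profile(profile):
    """Generate temporal tags directly from a known viewing profile."""
    return [{"time_s": t, "category": c} for t, c in profile]

tags = tags_from_profile(VIEWING_PROFILE)
```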
In an alternative embodiment, the video data may be acquired according to a bespoke viewing profile, such as that involving a user walking around the vehicle with the video camera 17 in a random manner. In these circumstances, the aspect of the vehicle being viewed at a particular time within the resultant video may not be known prior to the video being taken.
The application is arranged to process at least one frame of the video data using the processor 12 to acquire an identifier of the product at step 102, which in the case of a vehicle may comprise the vehicle registration or number plate (not shown). This may involve the use of conventional number recognition equipment, such as that available at www.ndi-rs.com or alternatively, this may comprise an analysis of at least one image of the vehicle in an attempt to determine the manufacturer/model through recognition of notable insignia and/or shapes.
With the identifier determined, the system 10 is arranged to interrogate a database 31 at step 103, such as that associated with the remote computing device 30, which may involve use of the communications module 16 so that the system 10 can communicate over the internet. In an embodiment, the database 31 may comprise that of the DVLA®. The processor 12 interrogates the database 31 to acquire information on various aspects of the vehicle, such as the make, model, any custom modifications, engine type and the like, and stores this information in a database held on the memory 13. The system 10 further comprises a processing unit 18, which is used to generate a set of captions relating to the particular aspects of the vehicle, based on the respective information, at step 104. The processing unit 18 subsequently stores the set of captions in a database also held on the memory 13. In a further embodiment, however, the set of captions may be generated from user-acquired information, or from vehicle manufacturer feeds or customer feeds, and the processing unit 18 may be configured to generate pre-defined captions which incorporate the information concerning the aspects of the vehicle. The processing unit 18 may be arranged to generate a plurality of selectable sets of captions, each set comprising a different language for the captions, for example. Alternatively, the different sets of captions may be generated and tailored for different seasonal events or occasions, such as a sale event or a Christmas promotional event, for example. In a further alternative, the different sets of captions may comprise unique branding for different vehicle manufacturers, for example.
The system 10 further comprises a device 19 configured to generate a set of tags corresponding to the video data, at step 105. The set of tags (not shown) represents a plurality of markers which indicate the temporal position, or the time within the video data, that a particular aspect of the vehicle is displayed, and the set of tags is stored in a database held on the memory 13. The set or sets of captions generated by the processor 12 or processing unit 18 for a particular video are subsequently linked to a corresponding tag at step 106, so that the captions can be displayed at the same time that the respective aspect of the video data is displayed on the display screen 15. In this respect, each video will comprise an associated set of tags and one or more corresponding selectable sets of captions. Accordingly, once the tags have been generated for the video data, the generated captions of the or each set of captions can be linked to one or more of the tags. When it is desired to view the video data on a display screen 15 with the captions, the video data may be streamed from the respective database at step 107, along with any other relevant content, to the display screen 15 or media player for access by potential customers and/or internet-based search engines, and the desired set of captions may be selected at step 108. As the video data is displayed on the display screen 15, the application is arranged to display the captions from the desired set at step 109 as an overlay of the video data, at the time indicated by the respective tag. If required, the video data with captions may be subsequently stored in the memory 13 at step 110. In an alternative embodiment, it is also envisaged that the caption may be delivered to the viewer as an audible message, in addition to or instead of a textual message.
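The linking of captions to tags (step 106) and the time-indexed overlay at playback (step 109) can be sketched as below. Field names, the fixed display duration and the category-keyed caption set are assumptions for the sketch.

```python
def link_captions(tags, captions):
    """Link each tag to the caption sharing its category.
    tags: list of {"time_s": float, "category": str}
    captions: {category: caption text}"""
    return [dict(tag, caption=captions.get(tag["category"])) for tag in tags]

def caption_at(linked_tags, t, duration=3.0):
    """Return the caption to overlay at playback time t (seconds), if any."""
    for tag in linked_tags:
        if tag["time_s"] <= t < tag["time_s"] + duration and tag["caption"]:
            return tag["caption"]
    return None

# A single engine tag linked to the English caption set.
linked = link_captions(
    [{"time_s": 6.0, "category": "engine"}],
    {"engine": "4.2 litre V8"},
)
```

Because the captions are held separately and overlaid at display time, swapping in a different caption set (another language, a sale event) requires no change to the video data itself.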
The synchronous, separate display of the video data and captions on the same display screen enables the same video data to be used with different sets of captions. For example, the video data may show a video of a vehicle and the captions applied or overlaid on the video image may be selected according to the viewer's preferred language. Similarly, the captions presented on the display screen 15 may be changed to represent a particular sale event, for example, without having to generate new video data.
In an embodiment, the captions may be linked to one or more tags by appropriately categorising aspects relating to the particular product or service using the processing unit 18. For example, in situations where the product comprises a vehicle, the categories may be selected from a set of pre-defined categories stored in the memory 13, which may comprise one or more of the engine, wheels, boot space, upholstery (not shown) and the like. Accordingly, in situations where the information retrieved by the processor 12 from the database comprises a reference to "V8" or "4.2 litre", the processing unit 18 is arranged to process these items of information and categorise the information as relating to the aspect of the vehicle engine (not shown). However, in a further embodiment, the captions may be linked to one or more tags using a label, such as a numeric label.
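The categorisation of retrieved items of information against the pre-defined categories might be sketched as a simple keyword match. The keyword lists below are illustrative assumptions, not taken from the specification.

```python
# Hypothetical mapping from pre-defined categories (aspects of the
# vehicle) to keywords that may appear in the retrieved information.
CATEGORY_KEYWORDS = {
    "engine": ["v8", "litre", "diesel", "turbo"],
    "wheels": ["alloy", "tyre", "rim"],
    "upholstery": ["leather", "seat"],
}

def categorise(item):
    """Assign an item of retrieved information to an aspect category.

    For example, "V8" or "4.2 litre" both categorise as "engine".
    Returns None when no pre-defined category matches.
    """
    item_lower = item.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in item_lower for keyword in keywords):
            return category
    return None
```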
In an embodiment, the tags may comprise control functionality for controlling the captions linked thereto. For example, the tags may be arranged to control any animated entry and/or exit of the linked caption or captions on the display screen 15 and may further control the positioning of the caption or captions on the display screen 15. In a further embodiment, the tags may be arranged to pause the underlying video data once the corresponding caption has been displayed to enable the viewer (not shown) to read the caption without missing any of the video content.
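The control functionality a tag may carry (entry/exit animation, on-screen positioning, pausing of the underlying video) might be represented as fields on the tag, as in the sketch below. The field names and the instruction format are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CaptionControl:
    position: tuple = (0.5, 0.9)    # normalised (x, y) on the display screen
    entry_animation: str = "fade"   # e.g. "fade", "slide-in"
    exit_animation: str = "fade"
    pause_video: bool = False       # pause playback while the caption is read

def render_instruction(caption, control):
    """Translate a tag's control fields into a display instruction dict
    that a media player could act on when the tag's time is reached."""
    instruction = {
        "text": caption,
        "position": control.position,
        "entry": control.entry_animation,
        "exit": control.exit_animation,
    }
    if control.pause_video:
        # Corresponds to the embodiment in which the video pauses so the
        # viewer can read the caption without missing video content.
        instruction["action"] = "pause"
    return instruction
```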
In situations where the video data is created automatically according to a predefined viewing profile, then the tags may be generated automatically using the device 19, since the aspects of the product or service will be displayed at known temporal positions within the video. However, in situations where the video data is generated according to a bespoke viewing profile, then the tags may be generated following an interrogation of the video data using the device 19. Moreover, tags may also be generated for video data acquired automatically, so that bespoke captions may be applied at bespoke temporal positions within the video. It is envisaged that this may enable captions to be further tailored, as required.
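For the automatic case, where each segment of the predefined viewing profile shows a known aspect for a known duration, the temporal positions of the tags follow by simple accumulation. The sketch below is illustrative; the profile format is an assumption.

```python
def tags_from_profile(profile):
    """Derive tags from a predefined viewing profile.

    'profile' is a hypothetical ordered list of (aspect, duration_seconds)
    segments; each tag's start time is the sum of the preceding durations.
    """
    tags, t = [], 0.0
    for aspect, duration in profile:
        tags.append({"aspect": aspect, "start": t, "duration": duration})
        t += duration
    return tags
```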

Claims
1. A method of generating video data of a product or service with captions, the method comprising:
- generating a set of tags representative of the time within the video data that one or more aspects of the product or service are shown;
- generating a set of captions based on information about the aspects of the product or service, each caption corresponding to a particular aspect;
- linking one or more of the tags with one or more of the captions; and,
- storing and/or playing the video data with captions, each caption being displayed with the video data at the time indicated by the tag linked therewith.
2. A method according to claim 1, further comprising generating a plurality of selectable sets of captions.
3. A method according to claim 2, further comprising selecting a set of captions from the plurality of sets for display with the video data.
4. A method according to any preceding claim, wherein the tags are assigned a category from a set of predefined categories relating to an aspect of the product or service.
5. A method according to any preceding claim, wherein the tags comprise, or further comprise, information controlling one or more of the positioning of the caption on a screen displaying the video data, the duration that the caption is to be displayed, and animation relating to the caption.
6. A method according to any preceding claim, further comprising pausing the video data to allow the viewer of the video data to read the caption without missing any of the video content.
7. A method according to any preceding claim, wherein the tags are arranged to control, or further control, the video data being displayed to pause.
8. A method according to any preceding claim, further comprising:
- receiving video data relating to a product or service;
- interrogating the video data to acquire an identifier of the product or service; and,
- searching a database comprising information relating to the product or service using the identifier to acquire information about the aspects of the product or service.
9. A method according to any preceding claim, wherein the step of interrogating the video data comprises performing an image analysis on at least one frame of the video data.
10. A method according to any preceding claim, further comprising generating one or more bespoke captions and tagging the video data with a tag indicative of the time within the video that the bespoke caption is to be displayed.
11. A system configured to generate video data of a product or service with captions, the system comprising:
- a device configured to generate a set of tags representative of the time within the video data that one or more aspects of the product or service are shown;
- a device configured to generate a set of captions based on information obtained about aspects of the product or service, and link each caption with a corresponding tag; and
- a device configured to store and/or play the video data with captions, each caption being displayed with the video data at the time indicated by the tag linked therewith.
12. A system according to claim 11, further comprising:
- a receiver for receiving video data relating to a product or service;
- a processor for interrogating the video data to acquire an identifier of the product or service and for subsequently searching a database comprising information relating to the product or service using the identifier to acquire information about the aspects of the product or service.
PCT/GB2014/053152 2013-10-23 2014-10-23 A method and system of generating video data with captions WO2015059472A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1318725.7 2013-10-23
GB1318725.7A GB2519537A (en) 2013-10-23 2013-10-23 A method and system of generating video data with captions

Publications (1)

Publication Number Publication Date
WO2015059472A1 true WO2015059472A1 (en) 2015-04-30

Family

ID=49727214

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2014/053152 WO2015059472A1 (en) 2013-10-23 2014-10-23 A method and system of generating video data with captions

Country Status (2)

Country Link
GB (1) GB2519537A (en)
WO (1) WO2015059472A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5572260A (en) * 1995-03-20 1996-11-05 Mitsubishi Electric Semiconductor Software Co. Ltd. Closed caption decoder having pause function suitable for learning language
US20030028873A1 (en) * 2001-08-02 2003-02-06 Thomas Lemmons Post production visual alterations
GB2419975A (en) * 2004-11-09 2006-05-10 Nokia Corp Auxiliary content handling
US20080046920A1 (en) * 2006-08-04 2008-02-21 Aol Llc Mechanism for rendering advertising objects into featured content
US20080253739A1 (en) * 2007-04-14 2008-10-16 Carl Livesey Product information display and purchasing
US8037496B1 (en) * 2002-12-27 2011-10-11 At&T Intellectual Property Ii, L.P. System and method for automatically authoring interactive television content
US20120210218A1 (en) * 2011-02-16 2012-08-16 Colleen Pendergast Keyword list view

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080195938A1 (en) * 2006-12-14 2008-08-14 Steven Tischer Media Content Alteration
US8196162B2 (en) * 2008-06-30 2012-06-05 Nokia Corporation Video media advertising metadata
GB201104168D0 (en) * 2011-03-11 2011-04-27 Life On Show Ltd Information capture system
US9826270B2 (en) * 2011-04-27 2017-11-21 Echostar Ukraine Llc Content receiver system and method for providing supplemental content in translated and/or audio form
GB2501275A (en) * 2012-04-18 2013-10-23 Life On Show Ltd Method of generating video data with associated soundtrack

Also Published As

Publication number Publication date
GB201318725D0 (en) 2013-12-04
GB2519537A (en) 2015-04-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 14805316
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 14805316
Country of ref document: EP
Kind code of ref document: A1