WO2011145036A1 - System and method for detecting network contents, computer program product therefor - Google Patents

System and method for detecting network contents, computer program product therefor

Info

Publication number
WO2011145036A1
Authority
WO
WIPO (PCT)
Prior art keywords
contents
browser
analysis
pages
detect
Prior art date
Application number
PCT/IB2011/052125
Other languages
French (fr)
Inventor
Giuseppe Provera
Original Assignee
Convey S.R.L.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Convey S.R.L.
Publication of WO2011145036A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services; Handling legal documents
    • G06Q50/184 Intellectual property management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce

Definitions

  • an RDBMS may be installed on the database server 22 to manage the main database 24 containing the analysis results originating from the remote systems.
  • the database 24 may be comprised of appropriate tables, stored procedures, views, triggers, etc.
  • a processing server 28 may then be present, consisting of one or more servers that process, in the background, the data saved in the main repository 24.
  • one or more servers 30 may then be present with the role of report servers, in order to take care of supplying to a caller the views of interest on data present in the repository 24.
  • a possible implementation includes a web application that submits a graphical report to the end-user.
  • the toolbar TB (indicated below by the specific reference 100) may be integrated in the browser 102 of the terminals 10 as a plug-in (or add-on).
  • the integration mechanism may differ as a function of the type of browser adopted, since each browser supplies different APIs (Application Programming Interfaces) and interaction tools.
  • the toolbar 100 communicates with analysis libraries 104 installed on the user terminal 10.
  • the toolbar 100 may be integrated in a specific interface 106 that represents a common entry-point. In such a way, by keeping the interface 106 unaltered, it is possible to replace, as required, the underlying libraries 104, for example when these have been developed following a specific protocol.
  • the analysis libraries 104 may be used each for a different purpose, and the values resulting from their respective processing may be used in real time for a composite and weighted calculation of the final result returned to the toolbar.
  • the libraries 104 may be identified based on the type of analysis they perform, that is:
  • classifications, e.g. the predominant language; the type of contents, traced back to specific categories differentiated as a function of the application sector, etc.
  • this locates and analyses in the page specific textual combinations in which a searched text and significant and/or specific elements for each application field are simultaneously present (e.g. it searches for the brand together with textual elements that indicate improper, abusive and/or parasitical uses of the same).
  • This component provides for different configurations of the "semantic analysis" process in relation to the application field/sector (e.g. fashion, luxury, agri-food, pharmaceutical, software, services, etc.);
  • this locates and analyses the images present in the page and determines the degree of similarity with respect to one or more "sample" images of interest for the user, loaded in the application (or residing in the remote server) at the time of its initial configuration and/or at subsequent times, on request of the user;
  • Figure 3 schematically represents an example of a user authentication stage within the framework of the system previously described.
  • in step 1000 the user opens the browser and authenticates on the toolbar 100 by entering an access username and password.
  • the toolbar 100 requests a verification of the credentials from the remote server (authentication server 26).
  • in step 1006 the authentication server 26 returns the configuration to the toolbar 100.
  • the toolbar 100 is active and may start listening to what is occurring in the browser 102 and providing analysis of the pages opened by the user from time to time.
  • in step 1004 the authentication server 26 returns a value representative of verification failure, whereby the user will have to authenticate again, re-starting from step 1000.
  • Figure 4 schematically represents an example of page within the framework of the system previously described.
  • in the presence of a negative outcome of the check performed in step 2008, the system returns to the monitoring phase 2002.
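The "composite and weighted calculation" of the values returned by the analysis libraries 104, mentioned in the list above, can be illustrated with a minimal Python sketch. The source does not give the formula; a normalized weighted average is assumed here, and the names LibraryResult and composite_score, the score range and the weights are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryResult:
    """Output of one analysis library (hypothetical structure)."""
    name: str      # label of the library, e.g. "semantic" or "image"
    score: float   # assumed to lie in [0, 1]; higher means more relevant
    weight: float  # assumed non-negative importance of this library


def composite_score(results):
    """Combine per-library scores into one value with a weighted average.

    Assumption: the final result is the weight-normalized mean of the
    individual scores; the source only says the calculation is
    "composite and weighted".
    """
    total_weight = sum(r.weight for r in results)
    if total_weight == 0:
        return 0.0  # nothing enabled, or all weights zero
    return sum(r.score * r.weight for r in results) / total_weight
```

Giving a library a weight of zero is one simple way to model an analysis option that the user has disabled.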

Abstract

A system for detecting the contents of web pages, for example for detecting improper contents related to counterfeiting, IPR piracy and similar illegal behaviour, comprising at least a user terminal (10) equipped with a browser to surf web pages of a network (N). Coupled to the browser is a contents detection module (TB) configured to detect the contents of pages opened via said browser with a contents detection action and real-time feed-back driven via said browser.

Description

"System and method for detecting network contents, computer program product therefor"
* * * *
Technical field
The present disclosure relates to techniques for detecting network contents, for example on networks such as the Internet.
The present disclosure has been devised by paying attention to the possible use in detecting improper contents, for instance for protecting owners of industrial property rights (IPRs) from counterfeiting, piracy and similar illegal behaviour.
Technological background
Over recent years, various solutions have been devised in order to monitor the Internet and its contents, for example in order to protect owners of industrial property rights from counterfeiting, piracy acts, abuses, improper usage, etc. This is both to respond to the increasing need to protect rights of use/economic exploitation and to protect the image of a subject being cited.
These techniques are identified with various designations such as "Internet Intelligence", "Web Monitoring", "Web Mining", "Competitive Intelligence", "Social Network Analysis", "Web 2.0 Analytics", or, with more relevance to intellectual property aspects, with designations such as "Brand Monitoring", "Brand Protection", "TradeMarks Intelligence", "Copyright Protection", "Internet IP Protection & Management" and so forth.
The relevant literature is quite extensive, as witnessed, by mere example, by documents such as US-B-6 401 118, US-A-6 983 320, WO-A-0108382 and WO-A-2007/047871.
Various detection toolbars provide for collecting primarily browsing data from the browser of a "remote" user (see, for example, http://www.alexa.com/toolbar), carrying out checks (e.g. antiphishing) based on pre-existing databases on a central server and providing the results to the user (see, for example, http://toolbar.netcraft.com/), or supporting information sharing between user communities (see, for example, http://www.google.com/intl/it/toolbar/ie/index.html). In any case, these solutions are based on the assumption that an established centralized computer system exists, characterized by significant web page acquisition capabilities.
Object and summary
The inventor has observed that the existing technological solutions have tackled the monitoring problem by establishing centralized computer systems, characterized by significant web page acquisition capabilities, by resorting to website contents crawling/spidering technologies and high-level contents analysis capabilities, for example via complex text analysis algorithms based on "NLP - Natural Language Processing" methodologies.
The inventor has observed that such a centralized approach presents numerous drawbacks.
For instance, as regards web page acquisition capability, the available acquisition power, no matter how large, is inevitably constrained by the existing technical limitations (available internet download bandwidth or concentrated download of high volumes of pages and stocking/storage thereof) and by the limitations stemming from the browsing/acquisition logics of the network crawler (limited in fact by a finite number of search strategies, which are essentially repetitive in time as they are conceived and developed by few "thinking subjects", based on their own experience, their own field knowledge and/or the specific goals of the monitoring activity).
Also, as regards content analysis ability, irrespective of the degree of sophistication of the textual analysis algorithms and "NLP - Natural Language Processing" methodologies applied, certain linguistic ambiguity situations may arise in different contexts and/or with high interaction or mix between textual contents and graphic contents, which frequently elude any correct analysis and correct appreciation of the level of "danger" in the page as regards protecting IPRs from the point of view of the owner (of text, images, brand, multimedia contents, etc.).
Moreover, the functional architecture and its characteristic impact on the users are always tied to a very high cost of technological systems, whose management is (almost) never entrusted to those subjects that have a true knowledge of the intangible good (for example, the IP asset) subject to monitoring/protection. A typical production process is almost always outsourced to expert computer engineers who, for the final report, interface with the Legal/IP departments of the owner of the intellectual property rights (IPRs). The owner's organization hardly develops a deep knowledge/awareness of the nature of the process and cannot provide the subjects who manage the intelligence activity with any experience and feedback. In other words, the daily, intense and differentiated use of the Internet which characterizes the actual work of any person or organization brings no appreciable improvement in terms of protecting IPRs and/or image in the network.
The object of various embodiments is to provide a solution for detecting contents in a network capable of overcoming the drawbacks outlined above.
In various embodiments, that object is achieved thanks to a system having the characteristics specifically recited in the claims that follow. The invention also relates to a corresponding method, as well as a computer program product, loadable in the memory of at least one computer and including software code portions for implementing the steps of the method when the product is run on at least one computer. As used herein, reference to such a computer program product is intended to be equivalent to reference to a computer readable medium containing instructions for controlling a processing system to coordinate the implementation of the method according to the invention. Reference to "at least one computer" is evidently intended to highlight the possibility that the present invention may be implemented in a modular and/or distributed form.
The claims form an integral part of the technical disclosure of the invention as provided herein.
Various embodiments comprise at least one user terminal equipped with a browser to surf web pages in a network such as the Internet; the browser is coupled with a contents detection module configured for detecting the contents of the pages opened from time to time by the user via the browser, thus performing a contents detection action driven via the browser.
In various embodiments, the above-mentioned contents detection module is configured for supplying the user terminal with feed-back information on the contents of said pages.
In various embodiments, the above-mentioned contents detection module may be configured to interact with a centralized server subsystem in order to send to such a centralized server subsystem contents information on the above-mentioned pages with a view to possibly storing such data in a repository in the centralized server subsystem.
Various embodiments are able to use, for example in relation to IPRs and related areas, the successful and widespread model of antivirus systems, which are normally established in an organized context, with a server-side application of high technical profile and client applications distributed over all the workstations included in the Intranet/Extranet to complete an active protection framework, with high "center/remote" synergies.
Various embodiments introduce in an IPR protection model the concept of a remote application possibly adapted to interact with a central application and which may not just be activated within the framework of an organization (professional, company, body), but is also applicable to subjects grouped at different levels (e.g. inter-company, consortium, category, territory, community, etc.).
Various embodiments are based on an innovative application capable of executing, in real time, a specialized analysis of a situation of interest (e.g. use/abuse of some intangible IPR/asset in a web page) at the very moment when a person is surfing a page during daily work on a workstation. That person is immediately enabled to understand/construe the signals (at different levels as appropriate) that the application produces in real time; it is then left to the person, and to his/her knowledge/understanding of the "IP asset" involved, to decide whether the signal/output received should be subjected to further and more in-depth centralized analysis, as mentioned above.
Various embodiments exploit the fact that the technical/computer science intelligence for the analysis of critical situations present in the contents of web pages (e.g. in terms of integrity of IPRs, or associations which may be dangerous for the image of an individual/company), as provided by a remote station, presents various advantages over (and is also complementary to) a centralized solution, in particular:
- it is included in the browser, that is in the main instrument for Internet surfing, namely a software feature present in any computer capable of connecting to the network;
- it acts in real time, providing feed-back information immediately;
- it is activated by several subjects (theoretically all those subjects capable of using the network), without distinctions in terms of role/activity and without time constraints and/or predefined modes that might affect usage behaviour;
- it is oriented and characterized by network exploration logics as numerous as the subjects that use it, the activities that these carry out, and the problems that these must solve from time to time on the Internet when working every day in their respective tasks and responsibilities (technical, commercial, financial, organizational, relational, etc.);
- it may benefit from immediate human supervision, as it is made available to subjects that have intelligence, creativity, memory and relationship ability at a very high degree in comparison with the management capabilities of software algorithms of centralized systems, which are predefined and limited;
- it may be integrated with centralized technical and methodological components, which are capable of cross-checking and verifying situations emerging from the action of plural persons at the periphery, who may have detected similar/identical results following even quite different paths/logics, thus making it possible to investigate to a deeper extent (not in real time) particularly complex analysis aspects thanks to the possible presence of more powerful computing capabilities;
- it has a very small unit cost, hardly comparable to the very high cost of centralized intelligence systems, thus making extended usage possible;
- it facilitates spreading an effective culture of "IP asset" and protection of a company/product in any organized context and adopting a widespread monitoring/protection practice of intangible assets (e.g. IPRs and/or Brand reputation) also in small enterprises.
Various embodiments support the user while currently surfing the Internet, for whatever purpose, by supplying real-time information on the presence in each page visited of one or more elements of interest precisely selected in an initial configuration stage, irrespective of whether these are of a textual type, or distinctive signs, or images.
Various embodiments may be integrated in any Internet surfing browser (e.g. Internet Explorer) and may be implemented, for instance, as a toolbar capable of calling adequate function libraries to execute differentiated and complementary operations on text (and/or images) present on the web page viewed by the user.
Various embodiments lend themselves to an implementation consisting of the search for, and analysis of, a brand or a logo within the HTML text making up a page, in order to determine specific use/abuse situations, not only in terms of IPRs but also, for example, for the purpose of detecting aspects/elements connected to brand image and reputation (the so-called "word-of-mouth" in Social Media/Social Networks).
In various embodiments, analysis is performed in real-time while a web page is being surfed and the user is notified that it has been completed by an acoustic and/or visual signal.
In various embodiments, the analysis results may be reviewed immediately and, if deemed interesting, the user may authorize that these are sent to a central server for data collection.
In various embodiments, the data present in the central server may possibly be subjected to further processing and/or integrated with further information acquired from the Internet (e.g. the locations of the servers hosting the page).
In various embodiments, the central server data may be made available to the user that originated them and, possibly, also to other authorized subjects (e.g. Consortiums, Associations, public Authorities, etc.) in the form of statistical reports and/or summary charts, thanks to different views being processed.
In various embodiments, a toolbar includes a user authentication mechanism that prevents inappropriate or unauthorized use of the application (e.g. the monitoring/analysis of a third party brand; extension of use of the application beyond the end of a trial period or the expiry of a service contract/subscription, etc.).
In various embodiments, plural users within the same organization (enterprise, consortium, association, etc.) may use the toolbar by sharing the same basic configuration (e.g. for the "Fashion" field) but with different specific targets (e.g. different brands).
In various embodiments, the configuration is, indeed, server-side, or rather is held within a special database and is communicated to the toolbar solely after (positive) access credential verification.
In various embodiments, the toolbar configuration may be modified at a single central point connected to the Internet, by making it immediately available for a subsequent start-up.
In various embodiments, the toolbar is equipped with a control on the most recent release available and is capable of self-updating the analysis components without the need for complete re-installation.
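The release check that lets the toolbar update individual analysis components can be sketched as follows, purely by way of illustration. The dotted numeric version format, the component names and the helper functions are assumptions; the source does not describe how releases are identified or compared.

```python
def parse_version(text):
    """Turn a dotted version such as "1.10.0" into a comparable tuple.

    Assumption: versions are purely numeric and dot-separated.
    """
    return tuple(int(part) for part in text.split("."))


def components_to_update(installed, latest):
    """Return the names of analysis components that need downloading.

    `installed` and `latest` map hypothetical component names to version
    strings. A component is selected when it is missing locally or when
    the latest release is newer than the installed one.
    """
    return sorted(
        name
        for name, version in latest.items()
        if name not in installed
        or parse_version(installed[name]) < parse_version(version)
    )
```

Comparing tuples of integers rather than raw strings avoids ranking "1.10.0" below "1.2.0", and returning only the outdated components reflects the idea of updating them without a complete re-installation.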
In various embodiments, the user may decide what analysis options are to be enabled/disabled within the framework of service menus offered by the toolbar.
In various embodiments, the toolbar may also include a configuration including customized stoplists, i.e. lists of terms that, if present in a web page, must not be taken into account during the analysis, and also black/white lists containing URLs that must/must not be taken into account in the analysis.
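A minimal Python sketch of how stoplists and black/white lists might gate the analysis is given below. The exact semantics are not spelled out in the source: the sketch assumes that white-listed hosts are skipped, that black-listed hosts are always analysed (taking precedence if a host appears in both lists), and that matching is done on the host name. The function names are hypothetical.

```python
import re
from urllib.parse import urlparse


def should_analyse(url, whitelist, blacklist):
    """Decide whether a page is to be analysed, based on its host name.

    Assumptions: black-listed hosts are always analysed, white-listed
    hosts are skipped, and any other host is analysed by default.
    """
    host = (urlparse(url).hostname or "").lower()
    if host in blacklist:
        return True
    if host in whitelist:
        return False
    return True


def terms_for_analysis(page_text, stoplist):
    """Split the page text into lowercase words and drop stoplist terms."""
    stop = {term.lower() for term in stoplist}
    words = re.findall(r"\w+", page_text.lower())
    return [word for word in words if word not in stop]
```

For example, an organization could white-list its own official site so that legitimate uses of its brand there are not reported.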
Brief description of the attached figures
The invention will be now described, purely by non-limiting example, with reference to the attached figures, wherein:
- figure 1 represents a general system architecture,
- figure 2 is a more detailed block diagram of an embodiment, and
- figures 3, 4 and 5 are flow diagrams representative of operation of embodiments.
Detailed description of embodiments
In the following description, various specific details aimed at an in-depth understanding of the embodiments are illustrated. The embodiments may be realized without one or more of the specific details, or with other methods, components, materials, etc. In other cases, known structures, materials or operations are not shown or described in detail, to avoid obscuring the various aspects of the embodiments.
The reference to "an embodiment" within the scope of this description denotes that a particular configuration, structure or feature described in connection with the embodiment is included in at least one embodiment. Therefore, phrases such as "in an embodiment", possibly present in different places of this description, do not necessarily refer to the same embodiment. In addition, particular shapes, structures or features may be combined in a suitable manner in one or more embodiments.
The references used herein are only for convenience and therefore do not define the scope of protection or range of the embodiments.
Figure 1 is a block diagram of a system as described herein, whose components may be divided from the logical viewpoint into client components CC and server components CS, that will be assumed to be connected over a network N such as the Internet.
In the exemplary embodiments considered herein, the client components CC include in general Personal Computers or PCs 10 with a browser equipped with a toolbar TB as better described in the following.
As schematically illustrated, the PCs 10 may be both individual PCs with a "stand-alone" configuration (like the single PC depicted below in the left portion of figure 1), and PCs included in a corporate network (like the group of PCs connected to a firewall F depicted on top in the left portion of figure 1).
These are PCs that generally represent the user terminal or end user, operating at a location remote with respect to the centralized authentication/processing system CS.
As already indicated, the browser in the PCs considered herein will be assumed to be equipped with a toolbar TB properly installed and configured so as to operate according to modes better described in the following.
As is well known, a toolbar is a component (widget) used in many user interfaces. It is typically a box or a horizontal or vertical bar containing icons that represent links to various system functions.
In various embodiments, at least some of the user terminals 10 may include, instead of PCs, computer devices such as PDAs or advanced portable terminals (such as iPhone®, iPad® terminals etc.) capable of supporting a browser equipped with the toolbar TB.
In the exemplary embodiments considered herein, the components on the server side may be organized in a server network placed behind the firewall FW.
In a possible configuration, the server side may include a data collect server 20 that is entrusted with receiving incoming data from the applications (toolbar) active remotely (that is, on the client side). The function of the server 20 is to verify the syntactic accuracy of the input data and to send them to a database server 22 for subsequent storage within a memory or main repository 24.
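By way of non-limiting example, the syntactic check entrusted to the data collect server 20 might be sketched as follows. The description does not specify the record format, so the field names (`user_id`, `page_url`, `risk_index`) and the function `validate_report` are purely hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical record layout; the description does not define the fields.
REQUIRED_FIELDS = {"user_id": str, "page_url": str, "risk_index": float}


def validate_report(report: dict) -> list[str]:
    """Return a list of syntax errors; an empty list means the record is well formed."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in report:
            errors.append(f"missing field: {name}")
        elif not isinstance(report[name], expected_type):
            errors.append(f"wrong type for field: {name}")
    url = report.get("page_url")
    if isinstance(url, str):
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("page_url is not an absolute http(s) URL")
    return errors
```

In such a sketch, only records for which the returned list is empty would be forwarded to the database server 22.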
In various embodiments, this component may be implemented as a web service or web application that exposes a series of public methods (or interfaces) to be invoked by the toolbar TB (and possibly by other client applications).
In various embodiments, the data collect server 20 may likewise communicate with an authentication server 26 in order to prevent undesired access to the remote system. The authentication server 26 verifies the user identity associated with a request originating from a remote client, with the dual purpose of validating credentials at toolbar start-up and supplying toolbar configuration services.
In various embodiments, an RDBMS may be installed on the database server 22 to manage the main database 24 containing the analysis results originating from the remote systems. In various embodiments, the database 24 may comprise appropriate tables, stored procedures, views, triggers, etc.
In various embodiments, a processing server 28 may also be present, consisting of one or more servers that process, in the background, the data saved in the main repository 24.
In various embodiments, one or more servers 30 may then be present with the role of report servers, in order to take care of supplying to a caller the views of interest on data present in the repository 24. A possible implementation includes a web application that submits a graphical report to the end-user.
As schematized in figure 2, in various embodiments, the toolbar TB (indicated below by the specific reference 100) may be integrated in the browser 102 of the terminals 10 as a plug-in (or add-on). The integration mechanism may differ as a function of the type of browser adopted, since each browser supplies different APIs (Application Programming Interfaces) and interaction tools.
To carry out its task, the toolbar 100 communicates with analysis libraries 104 installed on the user terminal 10. In various embodiments, in order to mask the implementation of the analysis libraries, the toolbar 100 may be integrated in a specific interface 106 that represents a common entry-point. In such a way, by keeping the interface 106 unaltered, it is possible to replace, as required, the underlying libraries 104, for example when these have been developed following a specific protocol.
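A minimal sketch of how an interface such as the interface 106 could hide the underlying libraries 104 is given below, assuming that every library returns a score between 0 and 1. The names `AnalysisLibrary`, `AnalysisInterface` and `KeywordLibrary` are illustrative assumptions and do not appear in the description.

```python
from typing import Protocol


class AnalysisLibrary(Protocol):
    """Hypothetical contract that every library behind the interface must satisfy."""

    name: str

    def analyse(self, html: str) -> float:
        """Return a score in [0, 1] for the given page source."""
        ...


class AnalysisInterface:
    """Single entry point used by the toolbar; libraries can be swapped freely."""

    def __init__(self) -> None:
        self._libraries: dict[str, AnalysisLibrary] = {}

    def register(self, library: AnalysisLibrary) -> None:
        # Registering under an existing name replaces the previous library.
        self._libraries[library.name] = library

    def analyse_page(self, html: str) -> dict[str, float]:
        return {name: lib.analyse(html) for name, lib in self._libraries.items()}


class KeywordLibrary:
    """Toy library: scores 1.0 when a keyword occurs in the page, else 0.0."""

    def __init__(self, name: str, keyword: str) -> None:
        self.name = name
        self.keyword = keyword.lower()

    def analyse(self, html: str) -> float:
        return 1.0 if self.keyword in html.lower() else 0.0
```

Because the caller only sees `analyse_page`, registering a different library under the same name changes the analysis without any change on the toolbar side.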
In various embodiments, the analysis libraries 104 may be used each for a different purpose, and the values resulting from their respective processing may be used in real time for a composite and weighted calculation of the final result returned to the toolbar.
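The description does not state how the composite and weighted calculation is performed. Purely as an assumption, it may be pictured as a weighted average of the per-library values, as in the following sketch, where `composite_score` and the default weight of 1.0 are hypothetical.

```python
def composite_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-library scores; libraries without a weight count as 1.0."""
    total_weight = sum(weights.get(name, 1.0) for name in scores)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(score * weights.get(name, 1.0) for name, score in scores.items())
    return weighted_sum / total_weight
```

For instance, with scores of 0.8 (weight 3) and 0.4 (weight 1), the result is (0.8 × 3 + 0.4 × 1) / 4 = 0.7.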
In various embodiments, the libraries 104 may be identified based on the type of analysis they perform, that is:
- statistical analysis: this analyses the HTML page code, yielding as a result a set of quantitative values related to the presence of a searched content (for instance, a brand) within the HTML code in specific technical positions (for example, in tags);
- classification: this analyses the actual page content, returning one or more respective classifications (e.g. the predominant language; the type of contents traced back to specific categories, differentiated as a function of the application sector, etc.) which may be used to properly "weigh" the indicators/parameters/values detected by other libraries;
- semantic analysis: this locates and analyses in the page specific textual combinations in which a searched text and elements that are significant and/or specific for each application field are simultaneously present (e.g., it searches for the brand together with textual elements that indicate improper, abusive and/or parasitical uses of the same). This component provides for different configurations of the "semantic analysis" process in relation to the application field/sector (e.g. fashion, luxury, agri-food, pharmaceutical, software, services, etc.);
- image analysis: this locates and analyses the images present in the page and determines the degree of similarity with respect to one or more "sample" images of interest for the user, loaded in the application (or residing in the remote server) at the time of its initial configuration and/or at subsequent times, on request of the user;
- hidden text analysis: this analyses the HTML text of the page, searching for parts/elements of interest that are not visible to the end user within the browser.
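Two of the analysis types listed above, statistical analysis and hidden text analysis, can be illustrated with the following rough sketch based on the Python standard library. It counts occurrences of a searched term per enclosing tag and keeps a separate tally for text that is not rendered. The class name `TermCounter` and the criteria used to regard text as hidden (a `hidden` attribute, an inline `display:none` style, or a `meta` content attribute) are assumptions made for illustration; the description does not specify them.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TermCounter(HTMLParser):
    """Counts occurrences of a term per enclosing tag, separating hidden text."""

    def __init__(self, term: str) -> None:
        super().__init__()
        self.term = term.lower()
        self.stack: list[tuple[str, bool]] = []  # (tag, is_hidden)
        self.visible = Counter()
        self.hidden = Counter()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta":
            # Metadata is never rendered, so a match here counts as hidden.
            hits = (attr_map.get("content") or "").lower().count(self.term)
            if hits:
                self.hidden["meta"] += hits
        if tag in VOID_TAGS:
            return
        style = (attr_map.get("style") or "").replace(" ", "").lower()
        parent_hidden = bool(self.stack) and self.stack[-1][1]
        is_hidden = parent_hidden or "hidden" in attr_map or "display:none" in style
        self.stack.append((tag, is_hidden))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerate unbalanced markup.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        hits = data.lower().count(self.term)
        if hits and self.stack:
            tag, is_hidden = self.stack[-1]
            (self.hidden if is_hidden else self.visible)[tag] += hits
```

After calling `feed()` with the page source, `visible` and `hidden` hold the per-tag counts, which could serve as the quantitative values mentioned for the statistical analysis.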
All the above cited types of analysis (with the possible exception of image analysis) may be configured to search for and analyze also "distorted textual forms" and forms similar to the content of interest (e.g. a brand) present in the page under inspection, generated by resemblance of sound (for example the so-called "Italian sounding" for a fashion brand, etc.) and/or due to typing errors (so-called "misspelling" or "typosquatting").
Figure 3 schematically represents an example of a user authentication stage within the framework of the system previously described.
Specifically, the flow diagram of figure 3 assumes that in step 1000 the user will open the browser, performing the authentication on the toolbar 100 by entering access username and password. The toolbar 100 requests a verification of the credentials from the remote server (authentication server 26) .
In the affirmative (positive outcome of step 1004), in a step 1006 the authentication server 26 returns the configuration to the toolbar 100. At this point, the toolbar 100 is active and may start listening to what is occurring in the browser 102 and providing analysis of the pages opened by the user from time to time.
In case of a negative outcome of step 1004, the authentication server 26 will return a value representative of verification failure, whereby the user will have to authenticate again, re-starting from step 1000.
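The authentication stage of figure 3 can be summarized by the following sketch. The callables `prompt` and `verify` stand in for the toolbar dialog and for the remote call to the authentication server 26, and the limit `max_attempts` is an assumption added so that the loop terminates; none of these names come from the description.

```python
from typing import Callable, Optional


def authenticate(prompt: Callable[[], tuple[str, str]],
                 verify: Callable[[str, str], Optional[dict]],
                 max_attempts: int = 3) -> Optional[dict]:
    """Repeat the authentication stage until verification succeeds or attempts run out."""
    for _ in range(max_attempts):
        username, password = prompt()          # step 1000: user enters credentials
        config = verify(username, password)    # remote check by the authentication server
        if config is not None:                 # step 1004, positive outcome
            return config                      # step 1006: configuration returned
    return None                                # hypothetical limit reached
```

Here `verify` is assumed to return the toolbar configuration on success and `None` on failure.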
Figure 4 schematically represents an example of page analysis within the framework of the system previously described.
Step 2000 is generally representative of the toolbar waiting for a new page to analyze, in particular within the framework of a monitoring phase 2002. This phase involves the periodic execution of a step 2004 to check whether the user has opened a new page in the browser. In the case of a negative outcome of step 2004, the toolbar simply returns to the phase 2002 and, after a standby interval, repeats the step 2004.
In the presence of a positive outcome of the step 2004, which will indicate that the user has opened a new page, in a step 2006 the toolbar calls the analysis libraries and requests real time analysis of the page, based on the configuration current at that moment.
At the end of the analysis of each page, concise results are returned to the toolbar 100. These may be represented by a numeric indicator and/or by its graphical representation (e.g. a colour scale) that, in real time, may highlight, for example in the case of a brand, a given danger/risk index for abuse of the brand or for the presence of concepts/terminology that are negative or immoral with respect to the image of the subject in question.
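As a purely illustrative example of a colour scale, a numeric indicator normalized to the range from 0 to 1 might be mapped to three colour bands. The function name `risk_colour` and the cut-off values are assumptions, not values taken from the description.

```python
def risk_colour(index: float) -> str:
    """Map a risk index in [0, 1] to a colour band; the cut-offs are illustrative."""
    if not 0.0 <= index <= 1.0:
        raise ValueError("index must lie in [0, 1]")
    if index < 0.33:
        return "green"
    if index < 0.66:
        return "yellow"
    return "red"
```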
If a check performed in a step 2008 has a positive outcome, that is, results are available (in absolute terms and/or beyond a threshold value of the numeric/graphical indicator mentioned previously), in a step 2010 the toolbar advises the user via a visual and/or acoustic alert message.
In the presence of a negative outcome of the check performed in the step 2008, the system returns to the monitoring phase 2002.
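The monitoring cycle of figure 4 may be sketched as a simple polling loop. The callables `poll_new_page`, `analyse` and `alert`, as well as the bound `max_polls`, are hypothetical stand-ins for the browser hook, the analysis libraries and the user notification; a real toolbar would more likely rely on browser events than on polling.

```python
import time
from typing import Callable, Optional


def monitor(poll_new_page: Callable[[], Optional[str]],
            analyse: Callable[[str], float],
            alert: Callable[[str, float], None],
            threshold: float,
            max_polls: int,
            interval_s: float = 0.0) -> None:
    """Poll for new pages, analyse each one and alert when the threshold is exceeded."""
    for _ in range(max_polls):                 # monitoring phase 2002
        page = poll_new_page()                 # step 2004: has a new page been opened?
        if page is None:
            time.sleep(interval_s)             # standby interval before the next check
            continue
        score = analyse(page)                  # step 2006: real-time analysis
        if score > threshold:                  # step 2008: results beyond the threshold
            alert(page, score)                 # step 2010: advise the user
```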
If the existence of results is detected in the step 2008, the user may immediately obtain an "x-ray image" of the page visited, with the specific indication of the elements of interest positively detected, and save this file in a repository prearranged by the toolbar 100 in the workstation.
At this point, the toolbar has notified the user in real time that the analysis results related to the page viewed are ready, with values that exceed the predefined threshold.
For example, in the presence of a page with evidence of particular seriousness and as schematized in figure 5, starting from a step 3000, in a step 3002 the user may authorize immediate transmission of the data to the remote server (data collect server 20). The toolbar 100 may provide appropriate configurations to make it easier for the user to add concise information of particular interest for subsequent analysis in the central repository 24.
Sending of data to the server in question is represented in figure 5 by the step 3004, while the negative outcome of the step 3002 leads back to step 3000.
If the verification of data consistency on the server side (step 3006) yields a positive outcome (with a positive outcome of a verification step 3008), in a step 3010 the data are stored in the central repository (database) 24. Otherwise (negative outcome of the step 3008), in a step 3012 the system notifies the toolbar 100 of a data inconsistency error.
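The flow of figure 5 can be condensed into the following sketch, in which the network transfer is modelled as a direct function call and the repository as an in-memory list. The function `submit_report`, the `is_consistent` callable and the returned status strings are assumptions introduced for illustration only.

```python
from typing import Callable


def submit_report(report: dict,
                  user_authorises: bool,
                  is_consistent: Callable[[dict], bool],
                  repository: list) -> str:
    """Walk through steps 3002 to 3012 and return a status for the toolbar."""
    if not user_authorises:            # step 3002, negative outcome: back to step 3000
        return "not sent"
    # Step 3004: data sent to the data collect server (modelled as a direct call).
    if is_consistent(report):          # steps 3006/3008: server-side consistency check
        repository.append(report)      # step 3010: store in the central repository
        return "stored"
    return "inconsistency error"       # step 3012: error notified to the toolbar
```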
Of course, without prejudice to the principle of the invention, the implementation details and the embodiments may vary, even significantly, with respect to what is set forth herein by way of non-limiting example only, without departing from the scope of the invention as defined by the annexed claims.

Claims

1. A system for detecting the contents of web pages, the system including at least one user terminal (10) equipped with a browser to surf web pages in a net (N) , wherein to said browser a contents detection module (TB) is coupled configured to detect (2006) the contents of pages opened (2004) via said browser with a contents detection action driven via said browser.
2. The system of claim 1, wherein said contents detection module (TB) is configured to provide (2010) to the user of said at least one user terminal (10) feedback information on the contents of said pages.
3. The system of claim 1 or claim 2, wherein said contents detection module (TB) is configured to interact with a centralized server subsystem (CS) to send (3004) to said centralized server subsystem (CS) information on the contents of said pages in view of possible storing (3010) of said information in said centralized server subsystem (CS) .
4. The system of any of previous claims, wherein said contents detection module (TB) is organized in analysis libraries (104) having functions selected out of:
- statistical analysis, to provide quantitative values indicative of the presence of a given content;
- classification, to provide indicators indicative of said contents belonging to specific classification categories ;
- semantic analysis, to detect the presence of a given text and meaningful and/or specific elements for each application sector;
- image analysis, to detect the presence of images identical or similar to sample images;
- hidden text analysis, to analyze the HTML text of a page by searching parts/elements not visible in the browser.
5. A method of detecting the contents of web pages via at least one user terminal (10) equipped with a browser to surf web pages in a net (N) , the method including coupling to said browser a contents detection module (TB) configured to detect (2006) the contents of pages opened (2004) via said browser with a contents detection action driven via said browser.
6. The method of claim 5, including providing (2010) to the user of said at least one user terminal (10), via said contents detection module (TB), feedback information on the contents of said pages.
7. The method of claim 5 or claim 6, including sending to a centralized server subsystem (CS) from said at least one user terminal (10) information on the contents of said pages in view of possible storing (3010) of said information in said centralized server subsystem (CS) .
8. The method of any of the preceding claims 5 to 7, wherein said detecting includes functions selected out of:
- statistical analysis, to provide quantitative values indicative of the presence of a given content;
- classification, to provide indicators indicative of said contents belonging to specific classification categories ;
- semantic analysis, to detect the presence of a given text and meaningful and/or specific elements for each application sector;
- image analysis, to detect the presence of images identical or similar to sample images;
- hidden text analysis, to analyze the HTML text of a page by searching parts/elements not visible in the browser.
9. A computer program product, loadable in the memory of at least one computer and including software code portions to perform the method of any of claims 5 to 8 when the product is run on at least one computer.
PCT/IB2011/052125 2010-05-18 2011-05-16 System and method for detecting network contents, computer program product therefor WO2011145036A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
ITTO2010A000413A IT1399704B1 (en) 2010-05-18 2010-05-18 SYSTEM AND PROCEDURE FOR THE COLLECTION OF NETWORK CONTENT, CORRESPONDENT IT PRODUCT
ITTO2010A000413 2010-05-18

Publications (1)

Publication Number Publication Date
WO2011145036A1 (en) 2011-11-24

Family

ID=43301751

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2011/052125 WO2011145036A1 (en) 2010-05-18 2011-05-16 System and method for detecting network contents, computer program product therefor

Country Status (2)

Country Link
IT (1) IT1399704B1 (en)
WO (1) WO2011145036A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001008382A1 (en) 1999-07-22 2001-02-01 Emarkmonitor Inc. Process for searching and monitoring for internet trademark usage
US6401118B1 (en) 1998-06-30 2002-06-04 Online Monitoring Services Method and computer program product for an online monitoring search engine
US20040103122A1 (en) * 2002-07-13 2004-05-27 John Irving Method and system for filtered web browsing in a multi-level monitored and filtered system
US6983320B1 (en) 2000-05-23 2006-01-03 Cyveillance, Inc. System, method and computer program product for analyzing e-commerce competition of an entity by utilizing predetermined entity-specific metrics and analyzed statistics from web pages
US20060021031A1 (en) * 2004-06-30 2006-01-26 Scott Leahy Method and system for preventing fraudulent activities
WO2007047871A2 (en) 2005-10-17 2007-04-26 Markmonitor Inc. Client side brand protection
US20090119143A1 (en) * 2005-10-17 2009-05-07 Markmonitor Inc. Brand notification systems and methods

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105828189A (en) * 2015-01-05 2016-08-03 任子行网络技术股份有限公司 Method of detecting illegal audio and video programs from multiple dimensions
CN105828189B (en) * 2015-01-05 2018-10-23 任子行网络技术股份有限公司 A kind of method of various dimensions detection violation audio/video program

Also Published As

Publication number Publication date
IT1399704B1 (en) 2013-04-26
ITTO20100413A1 (en) 2011-11-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11727776

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11727776

Country of ref document: EP

Kind code of ref document: A1