US20150074121A1 - Semantics graphs for enterprise communication networks - Google Patents

Semantics graphs for enterprise communication networks

Info

Publication number
US20150074121A1
Authority
US
United States
Prior art keywords
signifier
distance metric
signifiers
enterprise
communication network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/542,098
Inventor
Mehmet Kivanc Ozonat
Claudio Bartolini
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Micro Focus LLC
Original Assignee
Hewlett Packard Development Co LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Development Co LP
Priority to US14/542,098
Publication of US20150074121A1
Assigned to HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
Assigned to ENTIT SOFTWARE LLC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Assigned to JPMORGAN CHASE BANK, N.A. SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ARCSIGHT, LLC, ATTACHMATE CORPORATION, BORLAND SOFTWARE CORPORATION, ENTIT SOFTWARE LLC, MICRO FOCUS (US), INC., MICRO FOCUS SOFTWARE, INC., NETIQ CORPORATION, SERENA SOFTWARE, INC.
Assigned to JPMORGAN CHASE BANK, N.A. SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: ARCSIGHT, LLC, ENTIT SOFTWARE LLC
Assigned to MICRO FOCUS LLC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: ENTIT SOFTWARE LLC
Assigned to MICRO FOCUS LLC (F/K/A ENTIT SOFTWARE LLC). RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0577. Assignors: JPMORGAN CHASE BANK, N.A.
Assigned to NETIQ CORPORATION, BORLAND SOFTWARE CORPORATION, MICRO FOCUS SOFTWARE INC. (F/K/A NOVELL, INC.), MICRO FOCUS (US), INC., MICRO FOCUS LLC (F/K/A ENTIT SOFTWARE LLC), SERENA SOFTWARE, INC, ATTACHMATE CORPORATION. RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718. Assignors: JPMORGAN CHASE BANK, N.A.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F17/30861
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F17/30867

Definitions

  • Crawling and retrieval of web content can include browsing the World Wide Web in a methodical and/or orderly fashion to create a copy of visited pages for later processing by a search engine.
  • search engines cannot index the entire Web.
  • Prior approaches to crawling and retrieving web content include the use of focused web crawlers.
  • a focused web crawler estimates a probability of a visited page being relevant to a focus topic and retrieves a link corresponding to the page only if a target probability is reached; however, a focused web crawler may not retrieve a sufficient number of links or sufficiently relevant links. For example, a focused web crawler can download only a fraction of Web pages visited.
  • FIG. 1 is a block diagram illustrating an example of a method for building a semantics graph for an enterprise communication network according to the present disclosure.
  • FIG. 2 is a flow chart illustrating an example of a process for building a semantics graph for an enterprise communication network according to the present disclosure.
  • FIG. 3 illustrates an example of a system according to the present disclosure.
  • An enterprise may use an enterprise network, such as a cloud system and/or Internet network, to distribute workloads.
  • An enterprise network can include a network system to offer services to users of the enterprise (e.g., employees and/or customers).
  • a service, as used herein, can include an intangible commodity offered to users of a network.
  • Such services can include computing resources (e.g., storage, memory, processing resources) and/or computer-readable instructions (e.g., programs).
  • a user may benefit from another user's experience with a particular service.
  • users may have difficulty in sharing knowledge, such as services experiences.
  • an enterprise may use an enterprise communication network to assist users of an enterprise network in sharing knowledge, learning from other users' services experiences, and searching for content relevant to the enterprise and/or the enterprise network.
  • the enterprise communication network can include an electronic communication network to connect users of the network to relevant content. Users of the enterprise communication network can contribute to the enterprise communication network through a range of activities such as posting service-related entries, linking entries to content available on internal and external domains, reading comments, commenting on comments, and/or voting on users' entries.
  • the enterprise communication network can act as a social network associated with the enterprise, services offered by the enterprise, and/or documents associated with the enterprise, among other topics.
  • the range of activities that users can contribute to an enterprise communication network can result in the enterprise communication network containing unstructured content. Due to the unstructured nature of the content, a general purpose search engine may not properly function to allow users to search for content in the enterprise communication network. General purpose search engines may utilize measures such as back-links and/or clicks to define a quality and reputation of searched content. In an enterprise communication network, the quality and reputations of content may not be proportional to the number of back-links and/or clicks.
  • a relatedness of content within the enterprise communication network can be identified by automatically learning semantics of signifiers within the enterprise communication network and/or the enterprise network.
  • the signifiers can be identified by gathering content using a search tool and extracting signifiers from the gathered content.
  • a relatedness of the identified signifiers can be defined by calculating a distance metric between pairs of signifiers. Using the defined distance metric, a semantics graph can be built that identifies the proximity of relations between the signifiers.
  • a semantics graph can assist in tagging and searching for content within the enterprise communication network.
  • An example method for building a semantics graph for an enterprise communication network can include calculating a distance metric between a first signifier and a second signifier associated with an enterprise communication network, wherein the distance metric includes a plurality of relationships defined based on a frequency of co-occurrences of the first signifier and the second signifier, and building the semantics graph for the enterprise communication network using the calculated distance metric.
  • FIG. 1 is a block diagram illustrating an example of a method 100 for building a semantics graph for an enterprise communication network according to the present disclosure.
  • the method 100 can be used to build a semantics graph for an enterprise communication network containing a plurality of signifiers.
  • An enterprise communication network can include a network connecting a plurality of users and content through a range of activities.
  • the activities can be related to a services network of the enterprise (e.g., enterprise network).
  • the activities can include posting service-related entries, linking entries to internal enterprise domains and/or external domains, and/or reading, commenting, and/or voting on other users' entries.
  • the enterprise communication network can be a sub-portion and/or contained within the enterprise network.
  • a semantics graph can allow users of the enterprise communication network to search for content within the enterprise communication network.
  • a general purpose search engine may not be able to search for content in the enterprise communication network given the unstructured nature of the content.
  • Such a search engine may function by defining a quality and reputation of content (e.g., domains) based on a number of back-links (e.g., links from other content) and/or clicks by a user.
  • content in the enterprise communication network may not have proportional back-links and/or clicks to the quality and/or reputation of the content.
  • content in the enterprise communication network may not have measurable back-links and/or clicks (e.g., email).
  • semantics of signifiers within the enterprise network may be automatically learned. For instance, automatically learning the semantics of signifiers can include building a semantics graph to identify proximity of signifiers using the method 100 .
  • the method 100 for building a semantics graph for an enterprise communication network can include calculating a distance metric between a first signifier and a second signifier associated with the enterprise communication network, wherein the distance metric includes a plurality of relationships defined based on a frequency of co-occurrences of the first signifier and the second signifier.
  • the plurality of relationships can be based on a frequency of co-related services, a frequency of co-related phrases, and an average location of the first signifier and the second signifier (as discussed further herein).
  • a signifier can include a word, phrase, and/or acronym within the content of the enterprise network and/or the enterprise communication network.
  • the signifiers can be gathered, in various examples, using search tools (e.g., web crawlers) and extraction tools (e.g., extractors) (as discussed further herein).
  • a signifier associated with the enterprise communication network can include a signifier gathered from the enterprise network and/or the enterprise communication network.
  • a distance metric can include a calculated numerical score.
  • the numerical score can represent the proximity of relation between a first signifier and a second signifier.
  • calculating the distance metric can include calculating a weighted Euclidean distance including constructing an n-dimensional feature vector.
  • a Euclidean distance can include an ordinary distance (e.g., numerical description of a distance) between two points.
  • the distance metric can be based on a plurality of criteria to construct the n-dimensional feature vector. Such criteria can be based on a frequency of co-occurrences of the first signifier and the second signifier in the enterprise network and/or the enterprise communication network (e.g., a plurality of relationships). Examples of co-occurrences can include the first signifier and the second signifier in the same list, table, paragraph, and/or linked content (e.g., domains), among other co-occurrences.
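  • As a minimal sketch of the weighted Euclidean distance mentioned above, assume each signifier is already described by an n-dimensional feature vector (for instance, counts of co-occurrences per criterion) and that one weight is supplied per dimension. The function name, the vector contents, and the weights are illustrative assumptions and are not specified by the disclosure.

```python
import math
from typing import Sequence


def weighted_euclidean(x: Sequence[float],
                       y: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y, and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# Hypothetical feature vectors for two signifiers (values are made up).
print(weighted_euclidean([0.0, 0.0], [3.0, 4.0], [1.0, 1.0]))
```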
  • the method 100 for building a semantics graph for an enterprise communication network can include building the semantics graph for the enterprise communication network using the calculated distance metric.
  • a semantics graph can include a data structure representing concepts that are related to one another.
  • the concepts can include language (e.g., words, phrases, acronyms), for instance.
  • the semantics graph can include a plurality of nodes connected by a plurality of edges.
  • a node can include a vertex representing a signifier.
  • the edges can connect related signifiers.
  • Each edge can be weighted with the score defined by the calculated distance metric between pairs of related signifiers (e.g., the first signifier and the second signifier).
  • Weighting an edge with a score can include associating the score with the edge connecting a pair of related signifiers.
  • the method 100 can include adding the first signifier and the second signifier as nodes on the semantics graph with an edge connecting the first signifier and the second signifier.
  • the edge connecting the first signifier and the second signifier can be weighted with a score defined by the calculated distance metric, in various examples.
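  • The node and weighted-edge structure described above can be sketched as an adjacency map. This is one possible in-memory representation under stated assumptions (undirected edges, one score per pair); the class and method names are hypothetical and do not come from the disclosure.

```python
from collections import defaultdict


class SemanticsGraph:
    """Undirected graph: nodes are signifiers, edge weights are distance scores."""

    def __init__(self):
        # signifier -> {neighbouring signifier -> distance score}
        self._adj = defaultdict(dict)

    def add_edge(self, first, second, score):
        """Add both signifiers as nodes and weight the connecting edge with the score."""
        self._adj[first][second] = score
        self._adj[second][first] = score

    def nodes(self):
        return set(self._adj)

    def neighbors(self, signifier):
        """Related signifiers, closest (smallest score) first."""
        return sorted(self._adj.get(signifier, {}).items(), key=lambda kv: kv[1])


graph = SemanticsGraph()
graph.add_edge("cloud", "storage", 0.4)   # scores here are made-up examples
graph.add_edge("cloud", "backup", 0.9)
print(graph.neighbors("cloud"))
```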
  • FIG. 2 is a flow chart illustrating an example of a process 210 for building a semantics graph for an enterprise communication network according to the present disclosure.
  • the process 210 can include gathering content.
  • a search tool can gather content from the enterprise network and/or the enterprise communication network.
  • a search tool can include hardware components and/or computer-readable instruction components designated and/or designed to scan the enterprise network and/or the enterprise communication network to collect data.
  • the search tool can search the enterprise network for the plurality of signifiers (e.g., words, phrases, and/or acronyms).
  • the data can include documents and/or data associated with the enterprise communication network and/or the enterprise network.
  • Such data can include Hypertext Markup Language (HTML) content, email communications, and/or other documents (e.g., SharePoint documents).
  • a repository builder can gather the content and build a repository with the gathered content.
  • a repository builder can include hardware components and/or computer-readable instruction components designated and/or designed to build a repository.
  • a repository can include a source storage system.
  • a repository can include a file folder and/or shared directory. The repository may store the gathered content, for instance.
  • the process 210 can include extracting signifiers.
  • Signifiers can be extracted from the content gathered (e.g., at 212 ).
  • an extraction tool may extract the signifiers.
  • An extraction tool can include hardware components and/or computer-readable instruction components that extract information from an unstructured and/or semi-structured structure (e.g., the content gathered).
  • the extracted signifiers can include a plurality of words, phrases, and/or acronyms extracted through pattern recognition techniques. For instance, with HTML content, signifiers can be located in the title, lists, links, tables, paragraphs, and/or linked content (e.g., domains).
  • the pattern recognition technique used by an extraction tool can identify the location and/or format of the title, lists, links, and tables on the HTML document and extract their members as signifiers.
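  • A minimal sketch of such an extraction step, using Python's standard `html.parser` module, might look as follows. The set of tags treated as signifier locations and the class and function names are assumptions for illustration; the disclosure does not prescribe a particular parser or pattern recognition technique.

```python
from html.parser import HTMLParser


class SignifierExtractor(HTMLParser):
    """Collects text found inside titles, list items, links, and table cells."""

    TARGET_TAGS = {"title", "li", "a", "td", "th"}  # assumed signifier locations

    def __init__(self):
        super().__init__()
        self._stack = []        # currently open target tags
        self.signifiers = []    # (location tag, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in self.TARGET_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if text and self._stack:
            self.signifiers.append((self._stack[-1], text))


def extract_signifiers(html):
    parser = SignifierExtractor()
    parser.feed(html)
    parser.close()
    return parser.signifiers
```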
  • the process 210 can include calculating (e.g., determining) distance metrics for related signifiers.
  • a distance metric for related signifiers can include a calculated distance metric between a first signifier and a second signifier.
  • the process 210 can be used to define a set of proximities (e.g., distance metrics) of the plurality of signifiers as extracted (e.g., at 214 ).
  • calculating a distance metric between a first signifier and a second signifier can include calculating a ratio of co-related services associated with the first signifier and the second signifier (e.g., related signifiers) associated with an enterprise communication network, calculating a ratio of co-related phrases associated with the first signifier and the second signifier, and averaging (e.g., median) a location of the first signifier and the second signifier on the enterprise network (e.g., a plurality of relationships defined based on a frequency of co-occurrences).
  • the sum of the services ratio, the phrases ratio, and the average location can include the distance metric between the first signifier and the second signifier.
  • Calculating a ratio of co-related services associated with related signifiers can include:
  • $$d_1(u, v) = \frac{\sum(\text{services related to both } u \text{ and } v)}{\sum(\text{services related to } u + \text{services related to } v)}$$
  • the calculated ratio d 1 (u, v) can include a sum of services related to both the first signifier u and the second signifier v divided by a sum of services related to the first signifier u plus services related to the second signifier v.
  • Related services can include a service that references a signifier (e.g., u or v).
  • Services related to both signifiers u, v can include domains and/or documents associated with a service that contains both signifiers u and v.
  • Services related to u can include services related to the first signifier but not related to the second signifier (e.g., services that reference u but do not reference v).
  • Services related to v can include services related to the second signifier but not related to the first signifier (e.g., services that reference v but do not reference u).
  • the denominator in the ratio of d 1 (u, v) can include a sum of independent services (e.g., related to u independent of v and related to v independent of u).
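  • The services ratio described above can be sketched in Python, assuming the services related to each signifier are already available as sets of service identifiers. The function name and the handling of an empty denominator are illustrative assumptions; the disclosure does not say what happens when no service is related to only one of the two signifiers.

```python
import math


def services_ratio(services_u: set, services_v: set) -> float:
    """Services related to both u and v, divided by services related to only u plus only v."""
    both = services_u & services_v
    only_u = services_u - services_v   # reference u but not v
    only_v = services_v - services_u   # reference v but not u
    denominator = len(only_u) + len(only_v)
    if denominator == 0:
        # Assumption: undefined in the text; return infinity if anything is shared, else 0.
        return math.inf if both else 0.0
    return len(both) / denominator


# Hypothetical service identifiers.
print(services_ratio({"email", "storage", "backup"}, {"storage", "backup", "vpn"}))
```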
  • determining services related to a first signifier u and a second signifier v can include determining services each signifier (e.g., u and v) is related to. Determining services related to a signifier can include calculating a distance from a service domain to a domain retrieved by the search tool (e.g., web crawler) that contains the signifier (e.g., u or v).
  • the service domain e.g., web page
  • the domain retrieved can include an Internet page that the signifier is located on.
  • the distance from the service domain to the retrieved domain can include a number of links from the service domain to the retrieved domain.
  • each signifier can have a vector of distances between the retrieved domain and each service line.
  • Related services to a signifier can be based on retrieved domains the signifier appears on and a vector of distances from the retrieved domains.
  • a related service can include a service with a distance between a retrieved domain and the service domain that is below a threshold distance.
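  • The link-distance test for related services can be sketched as a breadth-first search over a link graph. The sketch assumes the crawl result is available as a mapping from each page to the pages it links to, takes the shortest link path as the distance, and treats a service as related when its nearest retrieved domain is below the threshold; these choices and all names are assumptions rather than details given in the disclosure.

```python
from collections import deque


def link_distance(links, start, target):
    """Smallest number of links to follow from start to reach target, or None."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def related_services(links, service_domains, retrieved_domains, threshold):
    """Services whose domain is fewer than `threshold` links from a retrieved domain."""
    related = set()
    for service, home in service_domains.items():
        distances = [link_distance(links, home, page) for page in retrieved_domains]
        reachable = [d for d in distances if d is not None]
        if reachable and min(reachable) < threshold:
            related.add(service)
    return related
```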
  • the denominator in the ratio of d 1 (u, v) can include a normalization factor.
  • the numerator can include a monotonically decreasing function and the denominator can include a monotonically increasing function.
  • a monotonic function can include a function between ordered sets that preserves the order.
  • a monotonically decreasing function can include a function wherein the Y-axis decreases (e.g., the distance metric) as the X-axis increases (e.g., sum of services related to both u and v).
  • a monotonically increasing function can include a function wherein the Y-axis increases (e.g., distance metric) as the X-axis decreases (e.g., sum of services related to u plus the services related to v).
  • a distance metric for a first signifier and a second signifier can be smaller than a distance metric between a third signifier and a fourth signifier in response to identifying the first signifier and the second signifier relate to a service (e.g., and the third signifier and fourth signifier do not).
  • Calculating a ratio of co-related phrases associated with related signifiers can include:
  • Alpha can denote a numerical value that remains constant. For instance, in some examples, alpha can be limited to a constant numerical value that is greater than the max of s(u, v).
  • s(u, v) can denote common phrases between a first signifier u and a second signifier v.
  • s(u, v) can include a ratio of a sum of words common to both u and v divided by a sum of the number of words in u plus the number of words in v (e.g., the total phrases of u and v).
  • s(u, v) can be defined as:
  • $$s(u, v) = \frac{\sum(\text{words common to } u \text{ and } v)}{\sum(\text{number of words in } u + \text{number of words in } v)}.$$
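  • A minimal sketch of s(u, v), assuming signifiers are plain strings split on whitespace and compared case-insensitively, and that each common word is counted once. The function name and these tokenization choices are assumptions. Under this counting, s(u, v) cannot exceed 0.5, which may be useful when choosing a constant alpha greater than the maximum of s(u, v).

```python
def common_phrase_ratio(u: str, v: str) -> float:
    """Words common to u and v, divided by the number of words in u plus those in v."""
    words_u = u.lower().split()
    words_v = v.lower().split()
    total = len(words_u) + len(words_v)
    if total == 0:
        return 0.0  # assumption: two empty signifiers share nothing
    common = set(words_u) & set(words_v)  # each common word counted once
    return len(common) / total


print(common_phrase_ratio("cloud storage service", "cloud backup service"))
```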
  • a distance metric for a first signifier and a second signifier can be smaller than a distance metric between a third signifier and a fourth signifier in response to identifying the first signifier and the second signifier have co-related words (e.g., and the third signifier and fourth signifier do not).
  • Calculating an average of the location (e.g., distance) between related signifiers can include:
  • the average location between u and v can include a median of the location distances between u and v on an HTML domain.
  • d3 (u, v) can be defined by a plurality of criteria.
  • the criteria can include rules.
  • An example of the plurality of rules can include:
  • u and v may be smaller than if they do not and smaller than (a).
  • if u and v appear in the same (e.g., identical) sub-portion (e.g., table, list, paragraph) of the same domain, then the distance between u and v may be smaller than if they do not and smaller than (a) and (b).
  • a mathematical representation of the rules can include distance a>distance b>distance c.
  • a smaller distance can indicate signifiers are more related than a larger distance, for instance.
  • Although the present example illustrates the average location as a median of the location, examples in accordance with the present disclosure are not so limited.
  • An average location can include a mean, a geometric mean, an average percentage, and/or a mode, among other averaging techniques.
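  • The averaging alternatives listed above can be sketched with Python's standard `statistics` module. The input is assumed to be a list of location distances, one per co-occurrence of the two signifiers; how those individual distances are scored is not shown here, and the function and dictionary names are hypothetical.

```python
import statistics

# Averaging techniques named in the text, mapped to standard-library functions.
AVERAGERS = {
    "median": statistics.median,
    "mean": statistics.mean,
    "geometric_mean": statistics.geometric_mean,  # requires Python 3.8+
    "mode": statistics.mode,
}


def average_location(distances, method="median"):
    """Average a list of per-occurrence location distances between two signifiers."""
    if not distances:
        raise ValueError("at least one location distance is required")
    return AVERAGERS[method](distances)


# Hypothetical location distances for one pair of signifiers.
print(average_location([1, 2, 2, 7]))
```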
  • four signifiers may be identified on an enterprise network and/or an enterprise communication network.
  • the first signifier u may be related to the second signifier v and may be located on HTML domains linked together.
  • the second signifier v may be related to the third signifier w but not located on linked HTML domains.
  • the first signifier u and the third signifier w may be unrelated.
  • the third signifier w may be related to the fourth signifier y and may be found on the same HTML domain.
  • the first signifier u may be related to the fourth signifier y and may be found on the same table and/or list on the same HTML domain.
  • the second signifier v and the fourth signifier y may be found to be unrelated.
  • the distance metrics associated with the four signifiers (e.g., u, v, w, y) can be summarized as:
  • the distance metric can be denoted by, for example:
  • $$d(u, v) = d_1(u, v) + d_2(u, v) + d_3(u, v).$$
  • a plurality of distance metrics calculated for a plurality of related signifiers can include a set of proximities between the plurality of signifiers.
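  • The summation of the three terms, and the resulting set of proximities, can be sketched as follows. The three terms are taken as already-computed inputs; the type and function names are hypothetical, and the numeric values in the example are made up.

```python
from typing import Dict, NamedTuple, Tuple


class DistanceTerms(NamedTuple):
    services: float   # d1: ratio of co-related services
    phrases: float    # d2: ratio of co-related phrases
    location: float   # d3: average location


def distance_metric(terms: DistanceTerms) -> float:
    """d(u, v) = d1(u, v) + d2(u, v) + d3(u, v)."""
    return terms.services + terms.phrases + terms.location


def proximities(pair_terms: Dict[Tuple[str, str], DistanceTerms]) -> Dict[Tuple[str, str], float]:
    """Map each pair of related signifiers to its distance metric."""
    return {pair: distance_metric(terms) for pair, terms in pair_terms.items()}


print(proximities({("u", "v"): DistanceTerms(0.5, 0.25, 2.0)}))
```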
  • the process 210 can include building a semantics graph for the enterprise communication network.
  • the semantics graph can be built using the calculated distance metric between the first signifier and the second signifier.
  • the semantics graph can include the defined distance metrics of the plurality of pairs of related signifiers.
  • the set of proximities can be represented (e.g., added to the semantics graph) as edges between the nodes as defined by the distance metrics between related signifiers.
  • the set of edges includes a set of proximities of the plurality of signifiers as defined by distance metrics between pairs of related signifiers.
  • the process 210 can utilize a semantics builder 220 for calculating distance metrics for related signifiers (e.g., 216 ) and/or building the semantics graph (e.g., 218 ).
  • the semantics builder 220 can include hardware components and/or computer-readable instruction components designated and/or designed to build a semantics graph associated with the enterprise communication network.
  • the semantics graph can include the set of signifiers as nodes with a set of proximities between the set of signifiers.
  • the set of proximities can be represented (e.g., added to the semantics graph) as edges between the nodes as defined by the distance metrics between related signifiers.
  • FIG. 3 illustrates a block diagram of an example of a system 322 according to the present disclosure.
  • the system 322 can utilize software, hardware, firmware, and/or logic to perform a number of functions.
  • the system 322 can be any combination of hardware (e.g., one or more processing resources 324 , computer-readable medium (CRM), etc.) and program instructions (e.g., computer-readable instructions (CRI)) configured to build a semantics graph for an enterprise communication network.
  • a processing resource 324 can include any number of processors capable of executing instructions stored by a memory resource 328 .
  • Processing resource 324 can be integrated in a single device or distributed across devices.
  • the memory resource 328 can be in communication with a processing resource 324 (e.g., one or more processing devices).
  • the processing resource 324 can be in communication with a tangible non-transitory CRM (e.g., memory resource 328 ) storing a set of CRI executable by the processing resource 324 , as described herein.
  • the CRI can also be stored in remote memory managed by a server and represent an installation package that can be downloaded, installed, and executed.
  • the system 322 can include memory resource 328 , and the processing resource 324 can be coupled to the memory resource 328 .
  • memory resource 328 may be fully or partially integrated in the same device as processing resource 324 or it may be separate but accessible to that device and processing resource 324 .
  • the system 322 may be implemented on a user and/or a client device, on a server device and/or a collection of server devices, and/or on a combination of the user device and the server device and/or devices.
  • Processing resource 324 can execute CRI that can be stored on an internal or external memory resource 328 .
  • the processing resource 324 can execute CRI to perform various functions, including the functions described with respect to FIG. 1 and FIG. 2 .
  • the CRI can include a number of modules 330 , 332 , 334 .
  • the number of modules 330 , 332 , 334 can include CRI that when executed by the processing resource 324 can perform a number of functions.
  • the modules 330 , 332 , 334 can be sub-modules of other modules.
  • the distance metric module 332 and the build semantics graph module 334 can be sub-modules and/or contained within the same computing device.
  • the modules 330 , 332 , 334 can comprise individual modules at separate and distinct locations (e.g., CRM, etc.).
  • An extract module 330 can include CRI that when executed by the processing resource 324 can provide a number of extraction functions.
  • the extract module 330 can extract a plurality of signifiers from an enterprise network and/or an enterprise communication network using an extraction tool.
  • the system 322 can include a search module (not illustrated in the example of FIG. 3 ).
  • the search module can include CRI that when executed by the processing resource 324 can provide a number of search functions.
  • the search module can search the enterprise network and/or the enterprise communication network for content (e.g., documents, signifiers, and/or other relevant data).
  • the content searched for by the search module can be used by the extract module 330 to extract the plurality of signifiers, for instance.
  • a distance metric module 332 can include CRI that when executed by the processing resource 324 can perform a number of calculation functions.
  • the distance metric module 332 can define a distance metric between pairs of related signifiers among the plurality of signifiers.
  • Related signifiers can include signifiers that have a co-occurrence on the enterprise network and/or the enterprise communication network.
  • the distance metric module 332 can include instructions to define a distance metric between pairs of related signifiers that includes instructions to calculate a ratio of co-related services associated with both a first signifier and a second signifier and services related independently to the first signifier and services related independently to the second signifier; calculate a ratio of co-related phrases associated with both the first signifier and the second signifier and phrases related independently to the first signifier and to the second signifier; average a location of the first signifier and the second signifier on the enterprise network; and define the distance metric as a sum of the ratio of co-related services, the ratio of co-related phrases, and the average location.
  • a build semantics graph module 334 can include CRI that when executed by the processing resource 324 can perform a number of building graph functions.
  • the build semantics graph module 334 can build a semantics graph using the defined distance metrics between pairs of related signifiers, including the defined distance metric of the first signifier and the second signifier.
  • a memory resource 328 can include volatile and/or non-volatile memory, and can be integral, or communicatively coupled, to a computing device, in a wired and/or a wireless manner.
  • the memory resource 328 can be in communication with the processing resource 324 via a communication path 326 local or remote to a machine (e.g., a computing device) associated with the processing resource 324 .
  • the communication path 326 can be such that the memory resource 328 is remote from the processing resource (e.g., 324 ), such as in a network connection between the memory resource 328 and the processing resource (e.g., 324 ).
  • the processing resource 324 coupled to the memory resource 328 can execute CRI to extract a plurality of signifiers from an enterprise network using an extraction tool.
  • the processing resource 324 coupled to the memory resource 328 can also execute CRI to define a distance metric between related signifiers among the plurality of signifiers, wherein defining a distance metric between each pair of related signifiers includes: calculate a ratio of co-related services associated with both a first signifier and a second signifier and services related independently to the first signifier independent and services related independently to the second signifier; calculate a ratio of co-related phrases associated with both the first signifier and the second signifier and phrases related independently to the first signifier and phrases related independently to the second signifier; average a location of the first signifier and the second signifier on the enterprise network; and define the distance metric as a sum of the ratio of co-related services, the ratio of co-related phrases, and the average location.
  • the processing resource 324 coupled to the memory resource 328 can also execute CRI to build a semantics graph
  • logic is an alternative or additional processing resource to execute the actions and/or functions, etc., described herein, which includes hardware (e.g., various forms of transistor logic, application specific integrated circuits (ASICs), etc.), as opposed to computer executable instructions (e.g., software, firmware, etc.) stored in memory and executable by a processor.
  • hardware e.g., various forms of transistor logic, application specific integrated circuits (ASICs), etc.
  • computer executable instructions e.g., software, firmware, etc.

Abstract

Building a semantics graph for an enterprise communication network can include calculating a distance metric between a first signifier and a second signifier associated with an enterprise communication network, wherein the distance metric includes a plurality of relationships defined based on a frequency of co-occurrences of the first signifier and the second signifier, and building a semantics graph for the enterprise communication network using the calculated distance metric.

Description

    PRIORITY INFORMATION
  • This application is a Continuation of U.S. application Ser. No. 13/755,556, filed Jan. 31, 2013, the entire contents of which are incorporated herein by reference.
  • BACKGROUND
  • Crawling and retrieval of web content can include browsing the World Wide Web in a methodical and/or orderly fashion to create a copy of visited pages for later processing by a search engine. However, due to the current size of the Web, search engines cannot index the entire Web.
  • Prior approaches to crawling and retrieving web content include the use of focused web crawlers. A focused web crawler estimates a probability of a visited page being relevant to a focus topic and retrieves a link corresponding to the page only if a target probability is reached; however, a focused web crawler may not retrieve a sufficient number of links or sufficiently relevant links. For example, a focused web crawler can download only a fraction of Web pages visited.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram illustrating an example of a method for building a semantics graph for an enterprise communication network according to the present disclosure.
  • FIG. 2 is a flow chart illustrating an example of a process for building a semantics graph for an enterprise communication network according to the present disclosure.
  • FIG. 3 illustrates an example of a system according to the present disclosure.
  • DETAILED DESCRIPTION
  • An enterprise may use an enterprise network, such as a cloud system and/or Internet network, to distribute workloads. An enterprise network, as used herein, can include a network system to offer services to users of the enterprise (e.g., employees and/or customers). A service, as used herein, can include an intangible commodity offered to users of a network. Such services can include computing resources (e.g., storage, memory, processing resources) and/or computer-readable instructions (e.g., programs). A user may benefit from another user's experience with a particular service. However, due to the distributed nature of an enterprise network, users may have difficulty in sharing knowledge, such as services experiences.
  • In some situations, an enterprise may use an enterprise communication network to assist users of an enterprise network in sharing knowledge, learning from other users' services experiences, and searching for content relevant to the enterprise and/or the enterprise network. The enterprise communication network, as used herein, can include an electronic communication network to connect users of the network to relevant content. Users of the enterprise communication network can contribute to the enterprise communication network through a range of activities such as posting service-related entries, linking entries to content available on internal and external domains, reading comments, commenting on comments, and/or voting on users' entries. Thereby, the enterprise communication network can act as a social network associated with the enterprise, services offered by the enterprise, and/or documents associated with the enterprise, among other topics.
  • However, the range of activities that users can contribute to an enterprise communication network can result in the enterprise communication network containing unstructured content. Due to the unstructured nature of the content, a general purpose search engine may not properly function to allow users to search for content in the enterprise communication network. General purpose search engines may utilize measures such as back-links and/or clicks to define a quality and reputation of searched content. In an enterprise communication network, the quality and reputations of content may not be proportional to the number of back-links and/or clicks.
  • In contrast, in examples of the present disclosure a relatedness of content within the enterprise communication network can be identified by automatically learning semantics of signifiers within the enterprise communication network and/or the enterprise network. The signifiers can be identified by gathering content using a search tool and extracting signifiers from the gathered content. A relatedness of the identified signifiers can be defined by calculating a distance metric between pairs of signifiers. Using the defined distance metric, a semantics graph can be built that identifies the proximity of relations between the signifiers. A semantics graph can assist in tagging and searching for content within the enterprise communication network.
  • Examples of the present disclosure may include methods, systems, and computer-readable and executable instructions and/or logic. An example method for building a semantics graph for an enterprise communication network can include calculating a distance metric between a first signifier and a second signifier associated with an enterprise communication network, wherein the distance metric includes a plurality of relationships defined based on a frequency of co-occurrences of the first signifier and the second signifier, and building the semantics graph for the enterprise communication network using the calculated distance metric.
  • In the following detailed description of the present disclosure, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration how examples of the disclosure may be practiced. These examples are described in sufficient detail to enable those of ordinary skill in the art to practice the examples of this disclosure, and it is to be understood that other examples may be utilized and that process, electrical, and/or structural changes may be made without departing from the scope of the present disclosure.
  • The figures herein follow a numbering convention in which the first digit or digits correspond to the drawing figure number and the remaining digits identify an element or component in the drawing. Similar elements or components between different figures may be identified by the use of similar digits. Elements shown in the various examples herein can be added, exchanged, and/or eliminated so as to provide a number of additional examples of the present disclosure.
  • In addition, the proportion and the relative scale of the elements provided in the figures are intended to illustrate the examples of the present disclosure, and should not be taken in a limiting sense. As used herein, “a number of” an element and/or feature can refer to one or more of such elements and/or features.
  • FIG. 1 is a block diagram illustrating an example of a method 100 for building a semantics graph for an enterprise communication network according to the present disclosure. The method 100 can be used to build a semantics graph for an enterprise communication network containing a plurality of signifiers.
  • An enterprise communication network, as used herein, can include a network connecting a plurality of users and content through a range of activities. The activities can be related to a services network of the enterprise (e.g., enterprise network). For example, the activities can include posting service-related entries, linking entries to internal enterprise domains and/or external domains, and/or reading, commenting, and/or voting on other users' entries. In various examples of the present disclosure, the enterprise communication network can be a sub-portion of and/or contained within the enterprise network.
  • A semantics graph, as may be built using the method 100, can allow users of the enterprise communication network to search for content within the enterprise communication network. A general purpose search engine may not be able to search for content in the enterprise communication network given the unstructured nature of the content. Such a search engine may function by defining a quality and reputation of content (e.g., domains) based on a number of back-links (e.g., links from other content) and/or clicks by a user. However, content in the enterprise communication network may not have back-links and/or clicks proportional to the quality and/or reputation of the content. In some instances, content in the enterprise communication network may not have measurable back-links and/or clicks (e.g., email). In order to search content within the enterprise communication network, semantics of signifiers within the enterprise network may be automatically learned. For instance, automatically learning the semantics of signifiers can include building a semantics graph to identify proximity of signifiers using the method 100.
  • At 102, the method 100 for building a semantics graph for an enterprise communication network can include calculating a distance metric between a first signifier and a second signifier associated with the enterprise communication network, wherein the distance metric includes a plurality of relationships defined based on a frequency of co-occurrences of the first signifier and the second signifier. For instance, the plurality of relationships can be based on a frequency of co-related services, a frequency of co-related phrases, and an average location of the first signifier and the second signifier (as discussed further herein).
  • A signifier, as used herein, can include a word, phrase, and/or acronym within the content of the enterprise network and/or the enterprise communication network. The signifiers can be gathered, in various examples, using search tools (e.g., web crawlers) and extraction tools (e.g., extractors) (as discussed further herein). A signifier associated with the enterprise communication network can include a signifier gathered from the enterprise network and/or the enterprise communication network.
  • A distance metric, as used herein, can include a calculated numerical score. The numerical score can represent the proximity of relation between a first signifier and a second signifier. For instance, calculating the distance metric can include calculating a weighted Euclidean distance including constructing an n-dimensional feature vector. A Euclidean distance can include an ordinary distance (e.g., numerical description of a distance) between two points. The distance metric can be based on a plurality of criteria to construct the n-dimensional feature vector. Such criteria can be based on a frequency of co-occurrences of the first signifier and the second signifier in the enterprise network and/or the enterprise communication network (e.g., a plurality of relationships). Examples of co-occurrences can include the first signifier and the second signifier in the same list, table, paragraph, and/or linked content (e.g., domains), among other co-occurrences.
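  • As a minimal sketch of the weighted Euclidean distance mentioned above: the source does not specify how the n-dimensional feature vectors or the weights are constructed, so the function name, the vectors, and the weights below are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance between two equal-length feature vectors."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    # Each squared coordinate difference is scaled by its weight before summing.
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# Hypothetical 3-dimensional feature vectors for two signifiers.
features_u = [1.0, 0.0, 2.0]
features_v = [0.0, 0.0, 0.0]
distance = weighted_euclidean(features_u, features_v, [1.0, 1.0, 0.5])
```

  • With all weights equal to one, the function reduces to the ordinary Euclidean distance between two points.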
  • At 104, the method 100 for building a semantics graph for an enterprise communication network can include building the semantics graph for the enterprise communication network using the calculated distance metric. A semantics graph, as used herein, can include a data structure representing concepts that are related to one another. The concepts can include language (e.g., words, phrases, acronyms), for instance. The semantics graph can include a plurality of nodes connected by a plurality of edges. A node can include a vertex representing a signifier. The edges can connect related signifiers. Each edge can be weighted with the score defined by the calculated distance metric between pairs of related signifiers (e.g., the first signifier and the second signifier). Weighting an edge with a score, as used herein, can include associating the score with the edge connecting a pair of related signifiers.
  • For instance, the method 100 can include adding the first signifier and the second signifier as nodes on the semantics graph with an edge connecting the first signifier and the second signifier. The edge connecting the first signifier and the second signifier can be weighted with a score defined by the calculated distance metric, in various examples.
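  • The node-and-weighted-edge structure described above might be represented as an undirected adjacency map. The following is a sketch only; the class name, method names, and example signifiers are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


class SemanticsGraph:
    """Undirected graph: nodes are signifiers, edge weights are distance scores."""

    def __init__(self):
        self._adj = defaultdict(dict)

    def add_edge(self, u, v, distance):
        # Adding an edge implicitly adds both signifiers as nodes.
        self._adj[u][v] = distance
        self._adj[v][u] = distance

    def nodes(self):
        return set(self._adj)

    def neighbors(self, u):
        """Return (signifier, distance) pairs sorted from closest to farthest."""
        return sorted(self._adj.get(u, {}).items(), key=lambda kv: kv[1])


graph = SemanticsGraph()
graph.add_edge("cloud", "storage", 0.4)  # hypothetical scores
graph.add_edge("cloud", "vm", 0.2)
```

  • Sorting neighbors by weight reflects the convention that a smaller distance indicates a closer relation between two signifiers.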
  • FIG. 2 is a flow chart illustrating an example of a process 210 for building a semantics graph for an enterprise communication network according to the present disclosure.
  • At 212, the process 210 can include gathering content. For instance, a search tool can gather content from the enterprise network and/or the enterprise communication network. A search tool, as used herein, can include hardware components and/or computer-readable instruction components designated and/or designed to scan the enterprise network and/or the enterprise communication network to collect data. For instance, the search tool can search the enterprise network for the plurality of signifiers (e.g., words, phrases, and/or acronyms). The data can include documents and/or data associated with the enterprise communication network and/or the enterprise network. Such data can include Hypertext Markup Language (HTML) content, email communications, and/or other documents (e.g., SharePoint documents).
  • In various examples of the present disclosure, a repository builder can gather the content and build a repository with the gathered content. A repository builder can include hardware components and/or computer-readable instruction components designated and/or designed to build a repository. A repository can include a source storage system. For example, a repository can include a file folder and/or shared directory. The repository may store the gathered content, for instance.
  • At 214, the process 210 can include extracting signifiers. Signifiers can be extracted from the content gathered (e.g., at 212). For instance, an extraction tool may extract the signifiers. An extraction tool can include hardware components and/or computer-readable instruction components that extract information from an unstructured and/or semi-structured source (e.g., the content gathered).
  • The extracted signifiers can include a plurality of words, phrases, and/or acronyms extracted through pattern recognition techniques. For instance, with HTML content, signifiers can be located in the title, lists, links, tables, paragraphs, and/or linked content (e.g., domains). The pattern recognition technique used by an extraction tool can identify the location and/or format of the title, lists, links, and tables on the HTML document and extract their members as signifiers.
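  • One way such an extraction tool could be sketched for HTML content is with the Python standard-library HTML parser. The choice of tags, the class name, and the sample markup are assumptions made for illustration; the disclosure does not prescribe a particular pattern recognition technique.

```python
from html.parser import HTMLParser


class SignifierExtractor(HTMLParser):
    """Collects text found inside title, list, link, and table elements."""

    # Hypothetical tag set standing in for "title, lists, links, and tables".
    TARGET_TAGS = {"title", "li", "a", "td", "th"}

    def __init__(self):
        super().__init__()
        self._stack = []
        self.signifiers = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TARGET_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Keep text only while inside at least one target element.
        if text and self._stack:
            self.signifiers.append((self._stack[-1], text))


def extract_signifiers(html):
    parser = SignifierExtractor()
    parser.feed(html)
    parser.close()
    return parser.signifiers


sample = (
    "<html><head><title>Cloud Storage</title></head><body>"
    "<ul><li>Backup</li><li><a href='x'>Archive</a></li></ul>"
    "<p>ignored</p></body></html>"
)
found = extract_signifiers(sample)
```

  • Each extracted item is paired with the element it came from, which preserves the location information that later location-based distance rules could use.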
  • At 216, the process 210 can include calculating (e.g., determining) distance metrics for related signifiers. A distance metric for related signifiers can include a calculated distance metric between a first signifier and a second signifier. The process 210 can be used to define a set of proximities (e.g., distance metrics) of the plurality of signifiers as extracted (e.g., at 214).
  • For instance, as illustrated in the example of FIG. 2, calculating a distance metric between a first signifier and a second signifier can include calculating a ratio of co-related services associated with the first signifier and the second signifier (e.g., related signifiers) associated with an enterprise communication network, calculating a ratio of co-related phrases associated with the first signifier and the second signifier, and averaging (e.g., median) a location of the first signifier and the second signifier on the enterprise network (e.g., a plurality of relationships defined based on a frequency of co-occurrences). The sum of the services ratio, the phrases ratio, and the average location can include the distance metric between the first signifier and the second signifier.
  • Calculating a ratio of co-related services associated with related signifiers can include:
  • $$d_1(u, v) = \frac{\sum(\text{services related to both } u \text{ and } v)}{\sum(\text{services related to } u + \text{services related to } v)},$$
  • wherein the calculated ratio d1 (u, v) can include a sum of services related to both the first signifier u and the second signifier v divided by a sum of services related to the first signifier u plus services related to the second signifier v. Related services, as used herein, can include a service that references a signifier (e.g., u or v). Services related to both signifiers u, v can include domains and/or documents associated with a service that contains both signifiers u and v. Services related to u can include services related to the first signifier but not related to the second signifier (e.g., services that reference u but do not reference v). Services related to v can include services related to the second signifier but not related to the first signifier (e.g., services that reference v but do not reference u). In other words, the denominator in the ratio of d1 (u, v) can include a sum of independent services (e.g., related to u independent of v and related to v independent of u).
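  • The ratio defined above can be sketched with set operations, assuming each signifier has already been mapped to the set of services that reference it. The function name `d1_ratio`, the service names, and the handling of an empty denominator are assumptions; the source does not say what happens when no independent services exist.

```python
def d1_ratio(services_u, services_v):
    """Services related to both u and v, divided by services related to only one."""
    both = services_u & services_v
    only_u = services_u - services_v  # reference u but not v
    only_v = services_v - services_u  # reference v but not u
    denominator = len(only_u) + len(only_v)
    if denominator == 0:
        # Assumption: the ratio is treated as undefined when there are
        # no independent services; the source does not cover this case.
        return None
    return len(both) / denominator


# Hypothetical service sets for two signifiers.
ratio = d1_ratio({"billing", "backup", "mail"}, {"backup", "mail", "search"})
```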
  • In various examples of the present disclosure, determining services related to a first signifier u and a second signifier v can include determining services each signifier (e.g., u and v) is related to. Determining services related to a signifier can include calculating a distance from a service domain to a domain retrieved by the search tool (e.g., web crawler) that contains the signifier (e.g., u or v). The service domain (e.g., web page) can include an Internet page that is the main location of the service. The domain retrieved can include an Internet page that the signifier is located on. The distance from the service domain to the retrieved domain can include a number of links from the service domain to the retrieved domain. In some instances, there may be multiple paths (e.g., sequence of links) for a user to go from the retrieved domain to the service domain and/or vice versa. The distance, in such an instance, can include the path with the lowest number of links among the multiple paths. Thereby, each signifier can have a vector of distances between the retrieved domain and each service line. Related services to a signifier (e.g., first signifier u) can be based on retrieved domains the signifier appears on and a vector of distances from the retrieved domains. For instance, a related service can include a service with a distance between a retrieved domain and the service domain that is below a threshold distance.
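  • The link-count distance and threshold test described above can be sketched as a breadth-first search over a link graph, which returns the path with the lowest number of links when multiple paths exist. The function names, the dictionary representation of links, and the example domains are hypothetical.

```python
from collections import deque


def link_distance(links, start, goal):
    """Fewest links from start to goal in a directed link graph, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # goal is not reachable from start


def related_services(links, service_domains, retrieved_domain, threshold):
    """Services whose domain is fewer than `threshold` links from the retrieved domain."""
    related = set()
    for service, domain in service_domains.items():
        dist = link_distance(links, domain, retrieved_domain)
        if dist is not None and dist < threshold:
            related.add(service)
    return related


# Hypothetical link graph: each key links to the pages in its list.
links = {"svcA": ["p1"], "p1": ["p2"], "svcB": ["p3"], "p3": ["p4"], "p4": ["p2"]}
services = {"A": "svcA", "B": "svcB"}
```

  • This sketch follows links in one direction only (from the service domain outward); a variant that also follows links in the reverse direction would need the reversed graph as well.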
  • The denominator in the ratio of d1 (u, v) can include a normalization factor. In addition, the numerator can include a monotonically decreasing function and the denominator can include a monotonically increasing function. A monotonic function can include a function between ordered sets that preserves the order. A monotonically decreasing function can include a function wherein the Y-axis decreases (e.g., the distance metric) as the X-axis increases (e.g., sum of services related to both u and v). A monotonically increasing function can include a function wherein the Y-axis increases (e.g., distance metric) as the X-axis decreases (e.g., sum of services related to u plus the services related to v). Thereby, a distance metric for a first signifier and a second signifier can be smaller than a distance metric between a third signifier and a fourth signifier in response to identifying the first signifier and the second signifier relate to a service (e.g., and the third signifier and fourth signifier do not).
  • Calculating a ratio of co-related phrases associated with related signifiers can include:

  • $$d_2(u, v) = \alpha - s(u, v).$$
  • Alpha can denote a numerical value that remains constant. For instance, in some examples, alpha can be limited to a constant numerical value that is greater than the max of s(u, v). As used herein, s(u, v) can denote common phrases between a first signifier u and a second signifier v. For instance, s(u, v) can include a ratio of a sum of words common to both u and v divided by a sum of the number of words in u plus the number of words in v (e.g., the total phrases of u and v). For example, s(u, v) can be defined as:
  • $$s(u, v) = \frac{\sum(\text{words common to } u \text{ and } v)}{\sum(\text{number of words in } u + \text{number of words in } v)}.$$
  • Thereby, a distance metric for a first signifier and a second signifier can be smaller than a distance metric between a third signifier and a fourth signifier in response to identifying the first signifier and the second signifier have co-related words (e.g., and the third signifier and fourth signifier do not).
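  • The phrase-based term can be sketched as follows, assuming signifiers are whitespace-separated phrases compared case-insensitively. The tokenization, the function names, and the default constant of 1.0 are assumptions; the source requires only that the constant exceed the maximum of s(u, v).

```python
def common_word_ratio(u, v):
    """s(u, v): distinct words common to both phrases over the total word count."""
    words_u = u.lower().split()
    words_v = v.lower().split()
    total = len(words_u) + len(words_v)
    if total == 0:
        return 0.0
    common = set(words_u) & set(words_v)
    return len(common) / total


def d2_distance(u, v, alpha=1.0):
    """d2(u, v) = alpha - s(u, v); alpha is a constant above the maximum of s."""
    return alpha - common_word_ratio(u, v)


# Hypothetical signifiers sharing the single word "storage".
score = d2_distance("cloud storage", "storage service")
```

  • Under this tokenization s(u, v) cannot exceed 0.5, so any constant above 0.5 keeps the resulting distance positive.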
  • Calculating an average of the location (e.g., distance) between related signifiers can include:

  • $$d_3(u, v) = \operatorname{median}(\text{location between } u \text{ and } v).$$
  • The average location between u and v can include a median of the location distances between u and v on an HTML domain. For example, d3 (u, v) can be defined by a plurality of criteria. The criteria can include rules. An example of the plurality of rules can include:
  • a. if u and v appear on linked domains, then the distance between u and v may be smaller than if they do not.
  • b. if u and v appear on the same (e.g., identical) domain, then the distance between u and v may be smaller than if they do not and smaller than (a).
  • c. if u and v appear in the same (e.g., identical) sub-portion (e.g., table, list, paragraph) of the same domain, then the distance between u and v may be smaller than if they do not and smaller than (a) and (b).
  • Thereby a mathematical representation of the rules can include distance a>distance b>distance c. A smaller distance can indicate signifiers are more related than a larger distance, for instance. Although the present example illustrates the average location as a median of the location, examples in accordance with the present disclosure are not so limited. An average location can include a mean, a geometric mean, an average percentage, and/or a mode, among other averaging techniques.
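  • The rules above fix only an ordering, not numerical values. As a sketch, each observed co-occurrence of a pair can be assigned a hypothetical location score that respects distance a > distance b > distance c, with the median taken over all observations. The score values, the category labels, and the value used when no co-occurrence exists are all assumptions.

```python
from statistics import median

# Hypothetical scores chosen only to satisfy: linked > same domain > same sub-portion.
LOCATION_SCORES = {
    "linked_domains": 3.0,   # rule (a)
    "same_domain": 2.0,      # rule (b)
    "same_subportion": 1.0,  # rule (c)
}
NO_COOCCURRENCE = 4.0  # assumed value when the pair never co-occurs


def d3_distance(cooccurrences):
    """Median location score over the observed co-occurrences of a pair."""
    if not cooccurrences:
        return NO_COOCCURRENCE
    return median(LOCATION_SCORES[kind] for kind in cooccurrences)


# A pair seen once in the same table and twice elsewhere on the same domain.
example = d3_distance(["same_subportion", "same_domain", "same_domain"])
```

  • Replacing `median` with another averaging function (e.g., a mean) would correspond to the alternative averaging techniques noted above.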
  • As an example of calculating d3, four signifiers may be identified on an enterprise network and/or an enterprise communication network. The first signifier u may be related to the second signifier v and may be located on HTML domains linked together. The second signifier v may be related to the third signifier w but not located on linked HTML domains. The first signifier u and the third signifier w may be unrelated. The third signifier w may be related to the fourth signifier y and may be found on the same HTML domain. The first signifier u may be related to the fourth signifier y and may be found on the same table and/or list on the same HTML domain. The second signifier v and the fourth signifier y may be found to be unrelated. The distance metrics associated with the four signifiers (e.g., u, v, w, y) can be summarized as:

  • $$d_3(u, y) < d_3(w, y) < d_3(u, v) < d_3(v, w) < d_3(v, y),\ d_3(u, w).$$
  • The distance metric can be denoted by, for example:

  • $$d(u, v) = d_1(u, v) + d_2(u, v) + d_3(u, v),$$
  • and can be calculated for each subset of related signifiers (e.g., each pair of related signifiers). Thereby, a plurality of distance metrics calculated for a plurality of related signifiers can include a set of proximities between the plurality of signifiers.
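  • The summation over each pair of related signifiers can be sketched as below. The three component functions and the relatedness test are passed in as parameters because their exact form is left open here; all names and the toy values are hypothetical.

```python
from itertools import combinations


def pairwise_distances(signifiers, d1, d2, d3, is_related):
    """Map each related pair (u, v) to d(u, v) = d1(u, v) + d2(u, v) + d3(u, v)."""
    distances = {}
    for u, v in combinations(signifiers, 2):
        if is_related(u, v):
            distances[(u, v)] = d1(u, v) + d2(u, v) + d3(u, v)
    return distances


# Toy components: constant terms and a relatedness test on a fixed set of pairs.
related_pairs = {("cloud", "storage"), ("storage", "backup")}
proximities = pairwise_distances(
    ["cloud", "storage", "backup"],
    d1=lambda u, v: 0.5,
    d2=lambda u, v: 0.75,
    d3=lambda u, v: 2.0,
    is_related=lambda u, v: (u, v) in related_pairs,
)
```

  • The resulting mapping is one possible representation of the set of proximities; each entry could then be added to a semantics graph as a weighted edge.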
  • At 218, the process 210 can include building a semantics graph for the enterprise communication network. The semantics graph can be built using the calculated distance metric between the first signifier and the second signifier. In various examples of the present disclosure, the semantics graph can include the defined distance metrics of the plurality of pairs of related signifiers. The set of proximities can be represented (e.g., added to the semantics graph) as edges between the nodes as defined by the distance metrics between related signifiers. The set of edges includes a set of proximities of the plurality of signifiers as defined by distance metrics between pairs of related signifiers.
  • The process 210 can utilize a semantics builder 220 for calculating distance metrics for related signifiers (e.g., 216) and/or building the semantics graph (e.g., 218). The semantics builder 220 can include hardware components and/or computer-readable instruction components designated and/or designed to build a semantics graph associated with the enterprise communication network. For instance, the semantics graph can include the set of signifiers as nodes with a set of proximities between the set of signifiers. The set of proximities can be represented (e.g., added to the semantics graph) as edges between the nodes as defined by the distance metrics between related signifiers.
  • FIG. 3 illustrates a block diagram of an example of a system 322 according to the present disclosure. The system 322 can utilize software, hardware, firmware, and/or logic to perform a number of functions.
  • In a number of examples, the system 322 can be any combination of hardware (e.g., one or more processing resources 324, computer-readable medium (CRM), etc.) and program instructions (e.g., computer-readable instructions (CRI)) configured to build a semantics graph for an enterprise communication network. A processing resource 324, as used herein, can include any number of processors capable of executing instructions stored by a memory resource 328. Processing resource 324 can be integrated in a single device or distributed across devices.
  • The memory resource 328 can be in communication with a processing resource 324 (e.g., one or more processing devices). For instance, the processing resource 324 can be in communication with a tangible non-transitory CRM (e.g., memory resource 328) storing a set of CRI executable by the processing resource 324, as described herein. The CRI can also be stored in remote memory managed by a server and represent an installation package that can be downloaded, installed, and executed. The system 322 can include memory resource 328, and the processing resource 324 can be coupled to the memory resource 328. Further, memory resource 328 may be fully or partially integrated in the same device as processing resource 324 or it may be separate but accessible to that device and processing resource 324. Thus, it is noted that the system 322 may be implemented on a user and/or a client device, on a server device and/or a collection of server devices, and/or on a combination of the user device and the server device and/or devices.
  • Processing resource 324 can execute CRI that can be stored on an internal or external memory resource 328. The processing resource 324 can execute CRI to perform various functions, including the functions described with respect to FIG. 1 and FIG. 2.
  • The CRI can include a number of modules 330, 332, 334. The number of modules 330, 332, 334 can include CRI that when executed by the processing resource 324 can perform a number of functions.
  • The modules 330, 332, 334 can be sub-modules of other modules. For example, the distance metric module 332 and the build semantics graph module 334 can be sub-modules and/or contained within the same computing device. In another example, the modules 330, 332, 334 can comprise individual modules at separate and distinct locations (e.g., CRM, etc.).
  • An extract module 330 can include CRI that when executed by the processing resource 324 can provide a number of extraction functions. The extract module 330 can extract a plurality of signifiers from an enterprise network and/or an enterprise communication network using an extraction tool.
  • In various examples of the present disclosure, the system 322 can include a search module (not illustrated in the example of FIG. 3). The search module can include CRI that when executed by the processing resource 324 can provide a number of search functions. The search module can search the enterprise network and/or the enterprise communication network for content (e.g., documents, signifiers, and/or other relevant data). The content searched for by the search module can be used by the extract module 330 to extract the plurality of signifiers, for instance.
  • A distance metric module 332 can include CRI that when executed by the processing resource 324 can perform a number of calculation functions. The distance metric module 332 can define a distance metric between pairs of related signifiers among the plurality of signifiers. Related signifiers can include signifiers that have a co-occurrence on the enterprise network and/or the enterprise communication network.
  • The distance metric module 332 can include instructions to define a distance metric between pairs of related signifiers that includes instructions to calculate a ratio of co-related services associated with both a first signifier and a second signifier and services related independently to the first signifier and services related independently to the second signifier; calculate a ratio of co-related phrases associated with both the first signifier and the second signifier and phrases related independently to the first signifier and to the second signifier; average a location of the first signifier and the second signifier on the enterprise network; and define the distance metric as a sum of the ratio of co-related services, the ratio of co-related phrases, and the average location.
  • A build semantics graph module 334 can include CRI that when executed by the processing resource 324 can perform a number of building graph functions. The build semantics graph module 334 can build a semantics graph using the defined distance metrics between pairs of related signifiers, including the defined distance metric of the first signifier and the second signifier.
  • A memory resource 328, as used herein, can include volatile and/or non-volatile memory, and can be integral, or communicatively coupled, to a computing device, in a wired and/or a wireless manner. The memory resource 328 can be in communication with the processing resource 324 via a communication path 326 local or remote to a machine (e.g., a computing device) associated with the processing resource 324. The communication path 326 can be such that the memory resource 328 is remote from the processing resource (e.g., 324), such as in a network connection between the memory resource 328 and the processing resource (e.g., 324).
  • The processing resource 324 coupled to the memory resource 328 can execute CRI to extract a plurality of signifiers from an enterprise network using an extraction tool. The processing resource 324 coupled to the memory resource 328 can also execute CRI to define a distance metric between related signifiers among the plurality of signifiers, wherein defining a distance metric between each pair of related signifiers includes: calculating a ratio of co-related services associated with both a first signifier and a second signifier and services related independently to the first signifier and services related independently to the second signifier; calculating a ratio of co-related phrases associated with both the first signifier and the second signifier and phrases related independently to the first signifier and phrases related independently to the second signifier; averaging a location of the first signifier and the second signifier on the enterprise network; and defining the distance metric as a sum of the ratio of co-related services, the ratio of co-related phrases, and the average location. The processing resource 324 coupled to the memory resource 328 can also execute CRI to build a semantics graph for the enterprise communication network using the defined distance metrics between pairs of related signifiers, including the defined distance metric of the first signifier and the second signifier.
  • As used herein, “logic” is an alternative or additional processing resource to execute the actions and/or functions, etc., described herein, which includes hardware (e.g., various forms of transistor logic, application specific integrated circuits (ASICs), etc.), as opposed to computer executable instructions (e.g., software, firmware, etc.) stored in memory and executable by a processor.
  • The specification examples provide a description of the applications and use of the system and method of the present disclosure. Since many examples can be made without departing from the spirit and scope of the system and method of the present disclosure, this specification sets forth some of the many possible example configurations and implementations.
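
The functions attributed above to the distance metric module 332 and the build semantics graph module 334 can be illustrated with a minimal Python sketch. It is not the disclosed implementation. The `Signifier` record, the function names, and the sample data are hypothetical. The sketch also assumes that each ratio divides the number of shared items by the sum of the items related to each signifier on its own, that a location is a single number, and that two signifiers are "related" when they share at least one service or phrase; the text above leaves all three points open.

```python
from dataclasses import dataclass, field


@dataclass
class Signifier:
    """Hypothetical record for one signifier extracted from the enterprise network."""
    name: str
    services: set = field(default_factory=set)  # services the signifier is related to
    phrases: set = field(default_factory=set)   # phrases the signifier is related to
    location: float = 0.0                       # assumed numeric position on the network


def co_relation_ratio(first: set, second: set) -> float:
    """Shared items divided by the items related to each signifier independently.

    The denominator (sum of the two independent counts) is an assumption.
    """
    denominator = len(first) + len(second)
    return len(first & second) / denominator if denominator else 0.0


def distance_metric(s1: Signifier, s2: Signifier) -> float:
    """Sum of the service ratio, the phrase ratio, and the average location."""
    service_ratio = co_relation_ratio(s1.services, s2.services)
    phrase_ratio = co_relation_ratio(s1.phrases, s2.phrases)
    average_location = (s1.location + s2.location) / 2
    return service_ratio + phrase_ratio + average_location


def related(s1: Signifier, s2: Signifier) -> bool:
    """Assumed stand-in for co-occurrence: at least one shared service or phrase."""
    return bool(s1.services & s2.services) or bool(s1.phrases & s2.phrases)


def build_semantics_graph(signifiers: list) -> dict:
    """Return an adjacency map {name: {neighbour: distance}} over related pairs."""
    graph = {s.name: {} for s in signifiers}
    for i, s1 in enumerate(signifiers):
        for s2 in signifiers[i + 1:]:
            if related(s1, s2):
                d = distance_metric(s1, s2)
                graph[s1.name][s2.name] = d  # undirected edge, stored both ways
                graph[s2.name][s1.name] = d
    return graph


if __name__ == "__main__":
    vpn = Signifier("VPN", {"remote-access", "helpdesk"}, {"reset vpn token"}, 1.0)
    token = Signifier("token", {"helpdesk"}, {"reset vpn token", "token expired"}, 3.0)
    printer = Signifier("printer", {"printing"}, {"paper jam"}, 5.0)
    print(build_semantics_graph([vpn, token, printer]))
```

In this sketch each signifier becomes a node, each related pair becomes an edge weighted by its distance metric, and an unrelated signifier such as the hypothetical "printer" stays as an isolated node.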

Claims (15)

What is claimed:
1. A method for building a semantics graph for an enterprise communication network, comprising:
calculating a distance metric between a first signifier and a second signifier associated with the enterprise communication network, wherein the distance metric includes a plurality of relationships defined based on a frequency of co-occurrences of the first signifier and the second signifier using a computing device; and
building a semantics graph for the enterprise communication network using the calculated distance metric.
2. The method of claim 1, wherein calculating the distance metric includes calculating a weighted Euclidean distance including constructing an n-dimensional feature vector.
3. The method of claim 1, wherein building the semantics graph includes adding the first signifier and the second signifier as nodes on the semantics graph with an edge connecting the first signifier and the second signifier.
4. The method of claim 3, including weighting the edge with a score defined by the calculated distance metric.
5. The method of claim 1, including searching an enterprise network for a plurality of signifiers, including words, phrases, and/or acronyms using a search tool.
6. The method of claim 1, including calculating the distance metric based on a plurality of criteria.
7. A non-transitory computer-readable medium storing a set of instructions executable by a processing resource, wherein the set of instructions can be executed by the processing resource to:
calculate a distance metric between a first signifier and a second signifier associated with an enterprise communication network, wherein the distance metric includes a plurality of relationships defined based on:
a frequency of co-related services associated with the first signifier and the second signifier;
a frequency of co-related phrases associated with the first signifier and the second signifier; and
an average location of the first signifier and the second signifier on the enterprise network; and
build a semantics graph for the enterprise communication network using the calculated distance metric.
8. The medium of claim 7, wherein the instructions executable by the processing resource include instructions executable to extract a plurality of signifiers from the enterprise network using an extraction tool, wherein the plurality of signifiers include the first signifier and the second signifier.
9. The medium of claim 8, wherein the instructions executable by the processing resource include instructions executable to calculate a distance metric for related signifiers among the plurality of signifiers, including the distance metric for the first signifier and the second signifier.
10. The medium of claim 7, wherein the instructions executable by the processing resource include instructions executable to:
divide the frequency of co-related services associated with the first signifier and the second signifier by a frequency of services related independently to the first signifier and services related independently to the second signifier to determine a ratio of co-related services of the first signifier and the second signifier; and
divide the frequency of co-related phrases associated with the first signifier and the second signifier by a frequency of phrases related independently to the first signifier and phrases related independently to the second signifier to determine a ratio of co-related phrases of the first signifier and the second signifier.
11. The medium of claim 10, wherein the instructions executable by the processing resource to calculate the distance metric between a first signifier and a second signifier include instructions executable to define the distance metric as a sum of the ratio of co-related services, the ratio of co-related phrases, and the average location.
12. The medium of claim 7, wherein the instructions executable by the processing resource to build the semantics graph include instructions executable to add a first node, a second node, and an edge to the semantics graph, wherein the first node represents the first signifier, the second node represents the second signifier, and the edge connects the first node and the second node.
13. The medium of claim 12, wherein the edge is weighted by the distance metric.
14. A system for building a semantics graph for an enterprise communication network comprising:
a processing resource; and
a memory resource communicatively coupled to the processing resource containing instructions executable by the processing resource to:
extract a plurality of signifiers from an enterprise network using an extraction tool;
define a distance metric between pairs of related signifiers among the plurality of signifiers, wherein defining a distance metric between a first signifier and a second signifier includes:
calculate a frequency of co-related services associated with the first signifier and the second signifier;
calculate a frequency of co-related phrases associated with the first signifier and the second signifier;
average a location of the first signifier and the second signifier on the enterprise network; and
define the distance metric as a sum of the frequency of co-related services, the frequency of co-related phrases, and the average location; and
build a semantics graph for the enterprise communication network using the defined distance metrics between pairs of related signifiers, including the defined distance metric of the first signifier and the second signifier.
15. The system of claim 14, wherein related signifiers include signifiers that have a co-occurrence on the enterprise network.
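
The weighted Euclidean distance over an n-dimensional feature vector recited in claim 2 can be illustrated with a short Python sketch. The claims do not specify which features or weights are used, so the function name `weighted_euclidean` and the sample vectors are hypothetical; only the standard formula, the square root of the weighted sum of squared per-dimension differences, is taken as given.

```python
import math


def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance between two n-dimensional feature vectors.

    x, y    -- feature vectors of the two signifiers (same length n)
    weights -- one non-negative weight per dimension (hypothetical values)
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must all have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# With unit weights this reduces to the ordinary Euclidean distance.
print(weighted_euclidean([0.0, 0.0], [3.0, 4.0], [1.0, 1.0]))  # prints 5.0
```

Setting a weight to zero removes that dimension from the distance, which is one way a criterion could be ignored without rebuilding the feature vectors.
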
US14/542,098 2013-01-31 2014-11-14 Semantics graphs for enterprise communication networks Abandoned US20150074121A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/542,098 US20150074121A1 (en) 2013-01-31 2014-11-14 Semantics graphs for enterprise communication networks

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/755,556 US8914416B2 (en) 2013-01-31 2013-01-31 Semantics graphs for enterprise communication networks
US14/542,098 US20150074121A1 (en) 2013-01-31 2014-11-14 Semantics graphs for enterprise communication networks

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/755,556 Continuation US8914416B2 (en) 2013-01-31 2013-01-31 Semantics graphs for enterprise communication networks

Publications (1)

Publication Number Publication Date
US20150074121A1 (en) 2015-03-12

Family

ID=51224159

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/755,556 Active 2033-06-23 US8914416B2 (en) 2013-01-31 2013-01-31 Semantics graphs for enterprise communication networks
US14/542,098 Abandoned US20150074121A1 (en) 2013-01-31 2014-11-14 Semantics graphs for enterprise communication networks

Country Status (1)

Country Link
US (2) US8914416B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160179965A1 (en) * 2014-12-19 2016-06-23 International Business Machines Corporation Information propagation via weighted semantic and social graphs
CN108027825A (en) * 2015-09-04 2018-05-11 Microsoft Technology Licensing, LLC Exposing external content in an enterprise

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9886950B2 (en) * 2013-09-08 2018-02-06 Intel Corporation Automatic generation of domain models for virtual personal assistants
US9436760B1 (en) * 2016-02-05 2016-09-06 Quid, Inc. Measuring accuracy of semantic graphs with exogenous datasets

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030225749A1 (en) * 2002-05-31 2003-12-04 Cox James A. Computer-implemented system and method for text-based document processing
US20080250039A1 (en) * 2007-04-04 2008-10-09 Seeqpod, Inc. Discovering and scoring relationships extracted from human generated lists
US20090234832A1 (en) * 2008-03-12 2009-09-17 Microsoft Corporation Graph-based keyword expansion
US20110179084A1 (en) * 2008-09-19 2011-07-21 Motorola, Inc. Selection of associated content for content items
US7996379B1 (en) * 2008-02-01 2011-08-09 Google Inc. Document ranking using word relationships

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6539374B2 (en) 1999-06-03 2003-03-25 Microsoft Corporation Methods, apparatus and data structures for providing a uniform representation of various types of information
US6711585B1 (en) * 1999-06-15 2004-03-23 Kanisa Inc. System and method for implementing a knowledge management system
US7296001B1 (en) 1999-07-12 2007-11-13 Ariba, Inc. Electronic multilateral negotiation system
US7103580B1 (en) 2000-03-30 2006-09-05 Voxage, Ltd. Negotiation using intelligent agents
US7624337B2 (en) 2000-07-24 2009-11-24 Vmark, Inc. System and method for indexing, searching, identifying, and editing portions of electronic multimedia files
KR20020045343A (en) 2000-12-08 2002-06-19 오길록 Method of information generation and retrieval system based on a standardized Representation format of sentences structures and meanings
GB0107290D0 (en) 2001-03-23 2001-05-16 Hewlett Packard Co Method and data structure for participation in multiple negotiations
US7072883B2 (en) 2001-12-21 2006-07-04 Ut-Battelle Llc System for gathering and summarizing internet information
GB2385954A (en) 2002-02-04 2003-09-03 Magenta Corp Ltd Managing a Virtual Environment
US20060074980A1 (en) 2004-09-29 2006-04-06 Sarkar Pte. Ltd. System for semantically disambiguating text information
WO2006087854A1 (en) 2004-11-25 2006-08-24 Sharp Kabushiki Kaisha Information classifying device, information classifying method, information classifying program, information classifying system
US8112324B2 (en) 2006-03-03 2012-02-07 Amazon Technologies, Inc. Collaborative structured tagging for item encyclopedias
US8676802B2 (en) 2006-11-30 2014-03-18 Oracle Otc Subsidiary Llc Method and system for information retrieval with clustering
JP4451435B2 (en) * 2006-12-06 2010-04-14 本田技研工業株式会社 Language understanding device, language understanding method, and computer program
US8171029B2 (en) * 2007-10-05 2012-05-01 Fujitsu Limited Automatic generation of ontologies using word affinities
US20090313173A1 (en) 2008-06-11 2009-12-17 Inderpal Singh Dynamic Negotiation System
EP2386089A4 (en) 2009-01-12 2013-01-16 Namesforlife Llc Systems and methods for automatically identifying and linking names in digital resources
US8041729B2 (en) * 2009-02-20 2011-10-18 Yahoo! Inc. Categorizing queries and expanding keywords with a coreference graph
US9069754B2 (en) 2010-09-29 2015-06-30 Rhonda Enterprises, Llc Method, system, and computer readable medium for detecting related subgroups of text in an electronic document

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160179965A1 (en) * 2014-12-19 2016-06-23 International Business Machines Corporation Information propagation via weighted semantic and social graphs
US10146875B2 (en) * 2014-12-19 2018-12-04 International Business Machines Corporation Information propagation via weighted semantic and social graphs
US10949481B2 (en) 2014-12-19 2021-03-16 International Business Machines Corporation Information propagation via weighted semantic and social graphs
CN108027825A (en) * 2015-09-04 2018-05-11 Microsoft Technology Licensing, LLC Exposing external content in an enterprise

Also Published As

Publication number Publication date
US8914416B2 (en) 2014-12-16
US20140214858A1 (en) 2014-07-31

Similar Documents

Publication Publication Date Title
US9264505B2 (en) Building a semantics graph for an enterprise communication network
US9442928B2 (en) System, method and computer program product for automatic topic identification using a hypertext corpus
US9442930B2 (en) System, method and computer program product for automatic topic identification using a hypertext corpus
US8527451B2 (en) Business semantic network build
US8903800B2 (en) System and method for indexing food providers and use of the index in search engines
Cossu et al. A review of features for the discrimination of twitter users: application to the prediction of offline influence
JP2013534334A (en) Method and apparatus for sorting query results
US9355166B2 (en) Clustering signifiers in a semantics graph
US9886711B2 (en) Product recommendations over multiple stores
US11249993B2 (en) Answer facts from structured content
CN103605848A (en) Method and device for analyzing paths
US20150074121A1 (en) Semantics graphs for enterprise communication networks
Franzoni et al. Heuristics for semantic path search in wikipedia
US20180189380A1 (en) Job search engine
Moya et al. Integrating web feed opinions into a corporate data warehouse
Liang et al. Searching for people to follow in social networks
Choudhary et al. Role of ranking algorithms for information retrieval
Nakanishi et al. Interconnection of heterogeneous knowledge bases and its application on Knowledge Grid
US20230244724A1 (en) Method and system for automated public information discovery
Gu et al. Utilizing semantic information from linked open data in web service clustering
Moghaddam et al. A novel temporal trust-based recommender system
Caicedo-Castro S3niffer: A text description-based service search system
Abidi et al. Web service matchmaking using a hybrid of signature and specification matching methods
US9704136B2 (en) Identifying subsets of signifiers to analyze
Yang et al. Product information extraction & analysis

Legal Events

Date Code Title Description
AS Assignment

Owner name: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;REEL/FRAME:037079/0001

Effective date: 20151027

AS Assignment

Owner name: ENTIT SOFTWARE LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP;REEL/FRAME:042746/0130

Effective date: 20170405

AS Assignment

Owner name: JPMORGAN CHASE BANK, N.A., DELAWARE

Free format text: SECURITY INTEREST;ASSIGNORS:ATTACHMATE CORPORATION;BORLAND SOFTWARE CORPORATION;NETIQ CORPORATION;AND OTHERS;REEL/FRAME:044183/0718

Effective date: 20170901

Owner name: JPMORGAN CHASE BANK, N.A., DELAWARE

Free format text: SECURITY INTEREST;ASSIGNORS:ENTIT SOFTWARE LLC;ARCSIGHT, LLC;REEL/FRAME:044183/0577

Effective date: 20170901

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: MICRO FOCUS LLC, CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:ENTIT SOFTWARE LLC;REEL/FRAME:052010/0029

Effective date: 20190528

AS Assignment

Owner name: MICRO FOCUS LLC (F/K/A ENTIT SOFTWARE LLC), CALIFORNIA

Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0577;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:063560/0001

Effective date: 20230131

Owner name: NETIQ CORPORATION, WASHINGTON

Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399

Effective date: 20230131

Owner name: MICRO FOCUS SOFTWARE INC. (F/K/A NOVELL, INC.), WASHINGTON

Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399

Effective date: 20230131

Owner name: ATTACHMATE CORPORATION, WASHINGTON

Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399

Effective date: 20230131

Owner name: SERENA SOFTWARE, INC, CALIFORNIA

Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399

Effective date: 20230131

Owner name: MICRO FOCUS (US), INC., MARYLAND

Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399

Effective date: 20230131

Owner name: BORLAND SOFTWARE CORPORATION, MARYLAND

Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399

Effective date: 20230131

Owner name: MICRO FOCUS LLC (F/K/A ENTIT SOFTWARE LLC), CALIFORNIA

Free format text: RELEASE OF SECURITY INTEREST REEL/FRAME 044183/0718;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:062746/0399

Effective date: 20230131