US20160203417A1 - System and method for using graph transduction techniques to make relational classifications on a single connected network

Info

Publication number
US20160203417A1
Authority
US
United States
Prior art keywords
graph
edges
nodes
weight
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/078,408
Inventor
Amit Dhurandhar
Jun Wang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US15/078,408
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: DHURANDHAR, AMIT; WANG, JUN
Publication of US20160203417A1
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N99/005
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G06F17/30958
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models

Abstract

A system and method for extending partially labeled data graphs to unlabeled nodes in a single network classification by weighting the data with a weight matrix that uses a modified graph Laplacian based regularization framework and applying graph transduction methods to the weighted data. The technique may be applied to data graphs that are directed or undirected, that may or may not have attributes and that may be homogeneous or heterogeneous.

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention generally relates to techniques for statistical relational learning, and more particularly to techniques for making relational classifications on a single connected network.
  • 2. Background Description
  • Given the prevalence of large connected relational graphs across diverse domains, single-network (or within-network) classification has been one of the most popular problems in statistical relational learning (SRL) research. From social networking websites to movie databases to citation networks, large connected relational graphs are commonplace. In single network classification, we have a partially labeled data graph and the goal is to extend this labeling, as accurately as possible, to the unlabeled nodes. The nodes themselves may or may not have associated attributes. An example where within network classification could be useful is in forming common interest groups on social networking websites. For instance, a group of people in the same geography may be interested in playing soccer and they would be interested in finding more people who are likely to have the same interest. In a different domain such as entertainment, one might be interested in estimating which of the new movies is likely to make a splash at the box office. Based on the success of other movies that had some of the same actors and/or the same director, one could provide a reasonable estimate of which movies are most likely to be successful.
  • Many methods that learn and infer over a data graph have been developed in SRL literature. Some of the more effective methods perform collective classification, that is, besides using the attributes of the unlabeled node to infer its label, they also use attributes and labels of related nodes/entities. These are thus a generalization of methods that assume that the data is independently and identically distributed (i.i.d.). Examples of such methods are relational Markov networks (RMNs), relational dependency networks (RDNs), Markov logic networks (MLNs), and probabilistic relational models (PRMs). These all fall under the umbrella of Markov networks. Simpler models have also been suggested as baselines, ranging from relational neighbor classifiers (RN), which simply choose the most numerous class label amongst their neighbors, to more involved variants such as those using relaxation labeling. Interestingly, these simple models perform quite well when the auto-correlation is high, even though the graph may be sparsely labeled. Recently, a pseudo-likelihood expectation maximization (PL-EM) method was introduced, which seems to compare favorably with other methods when the graph has a moderate number (around 20-30%) of labeled nodes.
  • A different class of methods that could potentially address the problem at hand is graph transduction methods, which are a part of semi-supervised learning methods and in some sense are the i.i.d. counterpart of relational methods. These methods typically perform well when we are given a weighted graph and the linked nodes have mostly the same labels (unless a priori dissimilar nodes are explicitly specified), even if only a small fraction of the labels are known. If a weighted graph is not readily available, it is constructed from the (explanatory) attributes of the nodes. If an unweighted graph with no attributes is given, then the adjacency matrix is passed as input.
  • In relational learning, the graphs are typically unweighted and sometimes may not have attributes. In many cases, the attributes may not accurately predict the labels, in which case weighting the edges solely on them may not provide acceptable results. The links could be viewed as an additional source of information to determine labels amongst connected nodes. Thus, the weights should also be functions of the known labeling. Some of these intuitions are captured in the relational Gaussian process model, but it is limited to undirected graphs and the suggested kernel function is not easy to adapt to relational settings where we may have heterogeneous data.
  • SUMMARY OF THE INVENTION
  • The present invention provides a lucid way to effectively leverage a rich class of graph transduction methods, namely those based on the graph Laplacian regularization framework, to make within network relational classifications. Among the existing graph transduction methods, this class of methods is considered to be one of the most efficient and accurate in real applications. In particular, the invention provides a procedure to learn a weight matrix for a graph that may be directed or undirected, that may exhibit positive or negative auto-correlation and where the edges in the graph may be between labeled nodes, between unlabeled nodes or between a labeled and an unlabeled node.
  • The inventive methodology first provides a solution for a graph where nodes have no attributes, only class labels. We then extend the solution to include attributes (and heterogeneous data) by incorporating a conical weighting scheme that weighs the importance of the links relative to the attributes. The construction of the weight matrix assumes binary labeling. However, recursive application of the chosen graph transduction method with reconstruction of the weight matrix will accomplish multi-class classification as is shown in the experiments on real data in connection with FIGS. 8A and 8B.
  • When we have a connected unweighted homogeneous/heterogeneous graph that is partially labeled, the goal is to propagate the labels to the unlabeled nodes. In this disclosure, we provide a different perspective on this problem by enabling the effective use of graph transduction techniques. We accomplish this by providing a novel procedure for constructing a weight matrix that serves as input to a rich class of graph transduction techniques. Our procedure has multiple desirable properties. For example, the weights it assigns to edges between unlabeled nodes naturally relate to a measure of association commonly used in statistics, namely the Gamma test statistic. We further portray the efficacy of our approach on synthetic as well as real data, by comparing it with state-of-the-art relational learning algorithms, and graph transduction techniques using a binary adjacency matrix or a real valued weight matrix computed using available attributes as input. In these experiments we see that our approach consistently outperforms other approaches when the graph is sparsely labeled, and remains competitive with the best when the proportion of known labels increases.
  • The invention provides a method and system for extending a partially labeled data graph to unlabeled nodes in a single network classification. The invention operates by constructing a weight matrix for data in a single network classification, applying the weight matrix to the data, and then applying a graph transduction method to the weighted data to generate labels for the unlabeled nodes. In one implementation the weight matrix uses a modified graph Laplacian based regularization framework. In one aspect of the method and system, the edges of the data graph are partitioned into categories, weights are assigned to each category, and each edge is assigned the weight of its respective category. In another implementation the categories are edges between nodes with the same label, edges between nodes with opposite labels, edges between unlabeled nodes, edges between an unlabeled node and a node with a label 1, and edges between an unlabeled node and a node with a label −1.
  • It is also an aspect of the invention to assign weights to edges between unlabeled nodes, where the assigned weight denotes an expectation based on a distribution of edges that have labels. In a variation on this implementation, edges between an unlabeled node and a labeled node are assigned a weight denoting an expectation based on a distribution of edges that have labels, where the distribution is limited to those edges having one node equal to the labeled node. A further variation on this implementation is to assign to each edge a weight that is a conical combination of a weight based on the respective category and a weight based on affinity of attribute values of nodes connected by the edge. In yet another implementation, applying a graph transduction method is accomplished by imposing a tradeoff between a fitting accuracy of a prediction function on labeled data and a smoothness of the prediction function over the graph. It is a further aspect of the invention to estimate the smoothness of the prediction function for the graph Laplacian based regularization framework, and modifying the prediction function to ensure compatibility between the graph transduction method and the graph Laplacian based regularization framework.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The foregoing and other objects, aspects and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:
  • FIG. 1 is an example input graph (T) to the invention's construction method.
  • FIG. 2 is a weighted version Tw of graph T shown in FIG. 1.
  • FIG. 3 shows instantiation of graph Tw when the labeled edges have only nodes with the same labels.
  • FIG. 4 shows instantiation of graph Tw when the labeled edges have only nodes with different labels.
  • FIG. 5A represents a relational schema with node types Paper and Author, where the relationship between them is many-to-many; FIG. 5B is the corresponding data graph which shows authors linked to the papers that they authored or co-authored.
  • FIG. 6A is a set of graphs generated by applying the inventive method to preferential attachment synthetic data where the auto-correlation is high; FIG. 6B is a set of graphs generated by applying the inventive method to preferential attachment synthetic data where the auto-correlation is low.
  • FIG. 7A is a set of graphs generated by applying the inventive method to forest fire synthetic data where the auto-correlation is high; FIG. 7B is a set of graphs generated by applying the inventive method to forest fire synthetic data where the auto-correlation is low.
  • FIG. 8A is a set of graphs generated by applying the inventive method to a collection of web pages known as the WEBKB dataset; FIG. 8B is a set of graphs generated by applying the inventive method to a collection of sales information about bread products known as the BREAD dataset.
  • DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION
  • The notation used in this disclosure is described in the following table, where graph type “D” is directed and graph type “U” is undirected:
  • TABLE 1
    Symbol | Graph Type | Semantics
    Nq | D and U | Number of nodes with label q
    Nqr | D | Number of edges from a node with label q into a node with label r
    Nqr | U | When q = r: number of edges between a node with label q and a node with label r. When q ≠ r: half of the number of edges between nodes labeled q and nodes labeled r
    Np | D and U | Total number of labeled edges, i.e. edges where both nodes are labeled
    Psame | D and U | Ratio of the number of edges between nodes with the same label to the total number of labeled edges
    Popp | D and U | Ratio of the number of edges between nodes with different labels to the total number of labeled edges
    D | D and U | Distribution over labeled edges
  • Weight Matrix Construction
  • In this section we elucidate a way of constructing the weight matrix for a partially labeled graph G(V, E) where V is the set of nodes and E the set of edges. We assume that the labeling is binary, i.e. any labeled node i has a label Yi ∈ {1, −1}. As mentioned before, the procedure of constructing the weight matrix W, which serves as input to a graph transduction technique, could be applied recursively or iteratively to each (binary) classified portion, to attain multi-class classification. Hence, the input in any run to our weight matrix construction method is a partially (binary) labeled graph as shown in FIG. 1.
  • Given our setup, a partially labeled graph G has 3 types of nodes and consequently 9 types of edges for a directed graph and 6 types of edges for an undirected one. A node could be labeled 1 or −1 or may be unlabeled. An edge could be between two nodes with the same label (i.e. (1→1) or (−1→−1)) or between two oppositely labeled nodes (i.e. (1→−1) or (−1→1)) or between a labeled and unlabeled node (i.e. (1→?) or (−1→?) or (?→1) or (?→−1)) or between two unlabeled nodes (i.e. (?→?)). An undirected example graph T is shown in FIG. 1. Our task then is to assign weights to each of these types of edges.
  • Notation
  • Before we describe the weights we assign to the different types of edges, we introduce some notation. Given a graph G, let Nq denote the number of nodes with label q. Let Nqr denote the number of edges from a node with label q into a node with label r. In an undirected graph, this would be the number of edges between nodes labeled q and r, if q=r. If q≠r, then Nqr would be half of the number of edges between q and r. Notice that Nqr for q≠r could thus be a non-integer value, but we do this to make the formulae in this disclosure consistent irrespective of whether we have a directed or an undirected graph. Let Np denote the total number of labeled edges, i.e. the total number of edges where both nodes are labeled. In other words, $N_p = N_{1,1} + N_{-1,1} + N_{1,-1} + N_{-1,-1}$. With this let,
  • $$P_{same} = \frac{N_{1,1} + N_{-1,-1}}{N_p}, \qquad P_{opp} = \frac{N_{1,-1} + N_{-1,1}}{N_p} \tag{1}$$
  • Hence, Psame+Popp=1. We denote this empirical distribution derived from labeled edges by D. A summary of this notation for directed and undirected graphs is shown in Table 1.
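  • The counting rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `labeled_edge_stats`, the edge-list representation and the label dictionary are assumptions made for the example and are not part of the disclosure.

```python
from collections import Counter

def labeled_edge_stats(edges, labels, directed=False):
    """Return (N, N_p, P_same, P_opp) for a partially labeled graph.

    edges  : iterable of (i, j) node pairs (hypothetical representation)
    labels : dict mapping labeled nodes to +1 or -1; unlabeled nodes are absent
    """
    N = Counter()  # N[(q, r)] plays the role of N_qr
    for i, j in edges:
        if i not in labels or j not in labels:
            continue  # only edges with both endpoints labeled are counted
        q, r = labels[i], labels[j]
        if directed or q == r:
            N[(q, r)] += 1
        else:
            # Undirected edge between different labels: split it evenly so
            # that N_qr and N_rq each hold half of such edges.
            N[(q, r)] += 0.5
            N[(r, q)] += 0.5
    n_p = sum(N.values())
    if n_p == 0:
        raise ValueError("no labeled edges; P_same and P_opp are undefined")
    p_same = (N[(1, 1)] + N[(-1, -1)]) / n_p
    p_opp = (N[(1, -1)] + N[(-1, 1)]) / n_p
    return N, n_p, p_same, p_opp

# Small undirected example; node "e" is unlabeled.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "c"), ("a", "f")]
labels = {"a": 1, "b": 1, "c": -1, "d": -1, "f": 1}
N, n_p, p_same, p_opp = labeled_edge_stats(edges, labels)
print(n_p, p_same, p_opp)
```

  • With the halving convention for undirected graphs, N_p still equals the number of edges whose two endpoints are both labeled, so Psame and Popp sum to one in both the directed and the undirected case.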
  • Assignment of Weights
  • We now describe our weight matrix construction which applies to both directed and undirected graphs. We partition the types of edges into five categories and suggest a way of assigning weights to edges in each of these categories.
      • Edges between nodes with the same label: If an edge is between nodes having the same label, that is if node i and node j have the same label, we assign a weight Wij=Psame to that edge. This makes intuitive sense since we want to weigh the edge based on how likely it is to have nodes with the same label being connected.
      • Edges between nodes with opposite/different labels: If an edge is between nodes with opposite labels, that is if node i and node j have different labels, we assign a weight Wij=−Popp to that edge. This is also intuitive since, we want to weigh the edge based on how likely it is to have nodes with opposite labels connected. We assign a negative sign since simply assigning the magnitude will not create a distinction between nodes labeled alike and those with different labels.
      • Edges between unlabeled nodes: If an edge is between unlabeled nodes, that is if node i and node j do not have labels, we assign a weight Wij=ED [Yi, Yj] to that edge. ED [Yi, Yj] denotes the expectation, over the distribution D of labeled edges, of the product of the labels at the two ends of an edge. Here Yi, Yj ∈ {1, −1} and hence,
  • $$E_D[Y_i, Y_j] = \sum_{q,r \in \{1,-1\}} q\,r\,P[Y_i = q, Y_j = r] = P[Y_i = 1, Y_j = 1] - P[Y_i = 1, Y_j = -1] + P[Y_i = -1, Y_j = -1] - P[Y_i = -1, Y_j = 1] = \frac{N_{1,1}}{N_p} - \frac{N_{1,-1}}{N_p} + \frac{N_{-1,-1}}{N_p} - \frac{N_{-1,1}}{N_p} \tag{2}$$
      • Since we do not know the labels of any of the nodes for edges in this category, we assign our most unbiased estimate which is the indicated expected value.
      • Edges between an unlabeled node and a node with label 1: If an edge is between an unlabeled node and a node with label 1, we assign a weight Wij=ED [Yi|Yj=1] to that edge. Here Yi ∈ {1, −1}. In this case,
  • $$E_D[Y_i \mid Y_j = 1] = \frac{N_{1,1}}{N_1} - \frac{N_{-1,1} + N_{1,-1}}{N_1} \tag{3}$$
      • is our unbiased estimate given that one of the nodes has a label of 1.
      • Edges between an unlabeled node and a node with label −1: If an edge is between an unlabeled node and a node with label −1, we assign a weight Wij=ED [Yi|Yj=−1] to that edge. Here Yi ∈ {1, −1}. In this case,
  • $$E_D[Y_i \mid Y_j = -1] = \frac{N_{-1,-1}}{N_{-1}} - \frac{N_{-1,1} + N_{1,-1}}{N_{-1}} \tag{4}$$
      • is our unbiased estimate given that one of the nodes has a label of −1.
  • A weighted version of our example graph T in FIG. 1, is shown by graph Tw in FIG. 2.
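  • The five edge categories can be illustrated with a short Python sketch for an undirected graph. The function names `edge_weight` and `weight_matrix` and the list-of-lists matrix are assumptions made for this example. The conditional weights `rho_pos` and `rho_neg` are passed in as plain numbers, assumed to have been computed beforehand from equations (3) and (4); the numeric values used below are hypothetical.

```python
def edge_weight(y_i, y_j, p_same, p_opp, rho_pos, rho_neg):
    """Weight for one edge; a label is +1, -1, or None for an unlabeled node."""
    if y_i is not None and y_j is not None:
        # Both endpoints labeled: P_same for equal labels, -P_opp otherwise.
        return p_same if y_i == y_j else -p_opp
    if y_i is None and y_j is None:
        # Both endpoints unlabeled: E_D[Y_i, Y_j] = P_same - P_opp.
        return p_same - p_opp
    # Exactly one endpoint labeled: use the weight conditioned on that label.
    known = y_i if y_i is not None else y_j
    return rho_pos if known == 1 else rho_neg

def weight_matrix(nodes, edges, labels, p_same, p_opp, rho_pos, rho_neg):
    """Symmetric weight matrix (list of lists) for an undirected graph."""
    index = {v: k for k, v in enumerate(nodes)}
    W = [[0.0] * len(nodes) for _ in nodes]
    for i, j in edges:
        w = edge_weight(labels.get(i), labels.get(j),
                        p_same, p_opp, rho_pos, rho_neg)
        W[index[i]][index[j]] = w
        W[index[j]][index[i]] = w
    return W

nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "d")]
labels = {"a": 1, "b": 1, "c": -1}          # "d" and "e" are unlabeled
# Hypothetical statistics, assumed to come from equations (1), (3) and (4).
W = weight_matrix(nodes, edges, labels,
                  p_same=0.75, p_opp=0.25, rho_pos=0.4, rho_neg=0.2)
for row in W:
    print(row)
```

  • For a directed graph the same dispatch would apply per directed edge, without mirroring the weight into the symmetric position.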
  • Characteristics of Matrix Construction
  • In the previous section, we elucidated a way of constructing a weight matrix for a partially labeled graph. In this section, we discuss certain characteristics of this construction. We discuss aspects such as relationships of the suggested weights to standard statistical measures and the tendencies of the weight matrix as a function of the connectivity and labeling in the graph. As we will see, our construction seems to have desirable properties.
  • Relation to Standard Measures of Association
  • In the previous section, we described and provided a brief justification of the procedure to assign weights. It turns out that the weights we assign to edges that have at least one unlabeled node, besides being unbiased, have more (statistical) semantics.
  • Proposition 1. The weights assigned to edges between unlabeled nodes i.e. ED [Yi, Yj], equate to the gamma test statistic (ρ) in the relational setting.
  • Proof. From equation 2 we have,
  • $$E_D[Y_i, Y_j] = \frac{N_{1,1}}{N_p} - \frac{N_{1,-1}}{N_p} + \frac{N_{-1,-1}}{N_p} - \frac{N_{-1,1}}{N_p} = \frac{1}{N_p}\left(N_{1,1} + N_{-1,-1}\right) - \frac{1}{N_p}\left(N_{-1,1} + N_{1,-1}\right) = P_{same} - P_{opp} = \rho$$
  • The gamma test statistic ρ is a standard measure of association used in statistics. The value of this statistic lies in [−1, 1], where positive values indicate agreement, negative values indicate disagreement/inversion and zero indicates absence of association. The statistic was historically used to compare the sorted order of observations based on values of two attributes. Recently, however, it has been suggested as a metric to measure auto-correlation in relational data graphs. Hence, our assignment of weight to edges between unlabeled nodes is the auto-correlation in the graph, which makes intuitive sense.
  • The weights assigned to edges with one labeled and one unlabeled node i.e. ED [Yi|Yj=1] or ED [Yi|Yj=−1], based on equations 3 and 4 can be written as: (Psame|1)−(Popp|1)=ρ_1 and (Psame|−1)−(Popp|−1)=ρ_{−1}. These could be considered as gamma test statistics conditioned on one particular type of label and could be referred to as conditional gamma test statistics.
  • Behavior of Weight Matrix
  • We now analyze the behavior of the weight matrix as the labeled edges in our input graph tend towards only connecting nodes with the same labels or analogously only connecting nodes with different labels.
  • As our input graph tends to have only nodes with the same labels being connected, it has the following effect on our weight matrix. The weight of edges between nodes with the same label tends to one, i.e. Psame→1. The weight of edges between nodes with different labels tends to zero, i.e. −Popp→0. The weight of edges between unlabeled nodes tends to 1, i.e. ρ→1. The weight of the remaining set of edges also tends to one, i.e. ρ_1, ρ_{−1} → 1. Hence, in this situation the weight matrix becomes an adjacency matrix in the extreme case, with different labeled edges vanishing (i.e. being weighted 0) and all other edges getting a weight of one. Consequently, our example weighted graph Tw in FIG. 2 becomes graph Ts in FIG. 3.
  • As our input graph tends to have only nodes with different labels being connected, it has the following effect on our weight matrix. The weight of edges between nodes with the same label tends to zero, i.e. Psame→0. The weight of edges between nodes with different labels tends to −1, i.e. −Popp→−1. The weight of edges between unlabeled nodes tends to −1, i.e. ρ→−1. The weight of the remaining set of edges also tends to −1, i.e. ρ_1, ρ_{−1} → −1. Since the graph in the extreme case has no positive weights, the negative sign in the weights is superfluous and can be eliminated. Hence, in this situation too the weight matrix becomes an adjacency matrix in the extreme case, with same labeled edges vanishing (i.e. being weighted 0) and all other edges getting a weight of one. Consequently, our example weighted graph Tw in FIG. 2 becomes graph To in FIG. 4.
  • We thus have Ts∪To=T, and the labeled edges in Ts and To complement each other on the labeled portion with respect to the base graph T. We intuitively expect the labeled edges between differently labeled nodes to slowly disappear while the other edges remain present, as edges connecting nodes with the same label become predominant. We also expect analogous behavior for the diametric case. As we have seen, these intuitions are captured implicitly, in our modeling of the weight matrix, thus making the construction procedure more acceptable.
  • Extensions
  • In the previous sections, we described a procedure for constructing the weight matrix for a partially labeled graph with no attributes. In this section, we extend the weighting scheme to include attribute information. Moreover, we also present a solution to handle data heterogeneity using ideas from relational learning.
  • Modeling with Attributes
  • For data graphs that have attributes, we want to be able to leverage this information in addition to the information learned from the connectivity of the graph, so as to possibly further improve the performance of our procedure. In particular, we need to extend our weight assignment procedure to be able to encapsulate attribute information. A simple way of combining the already modeled connectivity information with the attributes is to assign a weight to an edge that is a conical combination of the weight based on connectivity and a weight based on the affinity of attribute values of the connected nodes. Hence, if wc is the weight assigned based on the connectivity for the particular edge type and wa is the weight assigned based on attributes, then λwc+μwa is the new weight of that edge, where μ, λ ≥ 0. wc is essentially a weight assignment described above (in the Assignment of Weights subsection), viz. Psame or ρ etc. wa is a function of the attributes of the nodes connected by the corresponding edge, which we will soon define. μ and λ are parameters which can be determined through standard model selection techniques such as cross-validation. A reasonable indicator for the value of λ could be the absolute value of the auto-correlation in the graph, while a reasonable estimate of the value of μ could be the absolute value of the cross-correlation between wa and the labeling of the corresponding nodes, i.e. whether the labels are the same or different.
  • In the absence of attributes, our weight assignment wc for any type of edge has a value in the interval [−1, 1]. To effectively combine the aforementioned two sources of information, wa needs to be of the same scale as wc. One obvious choice could be cosine similarity, which is commonly used in text analytics. Cosine similarity lies in [−1, 1], where values close to 1 imply that the nodes are similar while values close to −1 imply that the nodes are dissimilar. Other choices could be kernel functions (K) such as the Gaussian kernel, which normalize popular distance metrics such as Euclidean distance and other ℓp norms to values in [0, 1]. Here, values close to 1 imply similarity and values close to 0 imply dissimilarity. This range can be easily transformed to our usual range of [−1, 1], with the same semantics as before, by a simple linear transformation of the form 2K−1.
  • Modeling with Heterogeneous Data
  • If the data graph has multiple types of entities, resulting in different types of nodes, the procedure previously described cannot be directly applied to construct the weight matrix. In such cases, standard relational learning strategies such as collapsing portions of the graph and using aggregation can be applied to reduce to a graph with a single type of node with attributes. To this new graph the above extended procedure can be applied.
  • For instance, in a citation graph we may have authors linked to papers, with papers having multiple authors and vice-versa. An example of this is shown in FIGS. 5A and 5B. In FIG. 5A, we see that the node type Paper 510 has two attributes, Title 515 and Area 516, which denote the title of the paper and the research area it belongs to, respectively. Let the attribute Area 516 be the class label, i.e. we want to classify papers based on their research area. The node type Author 520 has attributes Paper Title 526 and Age 525, which relate a particular paper to the ages of the authors that wrote it. The Title 515 attribute (a primary key) in Paper 510 is the same as the Paper Title 526 attribute (a foreign key) in Author 520. Hence, each Paper 510 node has three attributes, namely Title 515, Area 516 and Age 525. The attributes Title 515 and Area 516 are called intrinsic attributes as they belong to node type Paper 510, and the attribute Age 525 is called a relational attribute since it belongs to a different linked node type, Author 520. Each paper can have a variable number of authors and thus each paper would be associated with multiple values of Age 525. A popular solution to this problem is to aggregate the values of the attribute Age 525 of Author 520 into a single value such that each paper is associated with only a single Age 525 value. An aggregation function such as the average over the ages of the related authors for each paper can be used. Now instead of the Age 525 attribute we can introduce a new attribute AvgAge which denotes average age. With this, the attributes of the Paper node are Title, Area and AvgAge. Linking authors that co-authored a paper, we now have a data graph that links only the Paper node type, with each node having two attributes and a class label.
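  • The aggregation step in this example can be sketched as follows. The rows, names and ages are invented for illustration, and linking two papers whenever they share an author is one plausible reading of how the collapsed Paper-only graph is formed; it is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical Author rows: (author, paper_title, age)
author_rows = [
    ("Ann", "Paper A", 30),
    ("Bob", "Paper A", 50),
    ("Ann", "Paper B", 30),
    ("Cy",  "Paper C", 44),
    ("Bob", "Paper C", 50),
]

def average_age_per_paper(rows):
    """Aggregate the relational attribute Age into one AvgAge per paper."""
    ages = defaultdict(list)
    for _author, title, age in rows:
        ages[title].append(age)
    return {title: sum(v) / len(v) for title, v in ages.items()}

def papers_sharing_an_author(rows):
    """Undirected Paper-Paper edges for papers with at least one common author."""
    by_author = defaultdict(set)
    for author, title, _age in rows:
        by_author[author].add(title)
    edges = set()
    for titles in by_author.values():
        ordered = sorted(titles)
        for a in range(len(ordered)):
            for b in range(a + 1, len(ordered)):
                edges.add((ordered[a], ordered[b]))
    return edges

avg_age = average_age_per_paper(author_rows)
paper_edges = papers_sharing_an_author(author_rows)
print(avg_age)
print(sorted(paper_edges))
```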
  • If we have heterogeneous link types, then the described procedures can be applied independently to graphs formed from each link type and the final result could be obtained by aggregating the individual decisions through standard ensemble label consolidation techniques such as taking a majority vote or a weighted majority based on the corresponding auto-correlations.
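  • A minimal sketch of the label consolidation step for one node is given below. The function name `weighted_majority`, the use of the absolute auto-correlation as the vote weight and the tie-break towards +1 are assumptions of this example, not details stated in the disclosure.

```python
def weighted_majority(predictions, autocorrelations=None):
    """Combine per-link-type predictions (+1/-1) for a single node.

    predictions      : list of +1/-1 labels, one per link-type graph
    autocorrelations : optional list of rho values, one per link-type graph;
                       |rho| is used as the vote weight (an assumption).
                       If omitted, a plain majority vote is taken.
    """
    if autocorrelations is None:
        autocorrelations = [1.0] * len(predictions)
    score = sum(abs(rho) * y for y, rho in zip(predictions, autocorrelations))
    return 1 if score >= 0 else -1  # ties are broken towards +1 (assumption)

# Three hypothetical link types disagree about one node.
plain = weighted_majority([1, -1, -1])
weighted = weighted_majority([1, -1, -1], [0.9, 0.2, 0.3])
print(plain, weighted)
```

  • In this example the plain vote follows the two graphs that predict −1, while the weighted vote follows the single graph with strong auto-correlation.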
  • Compatibility with Graph Transduction Techniques
  • Graph based transductive learning approaches impose a trade off between the fitting accuracy of the prediction function on labeled data and the smoothness of the function over the graph. Typically, the smoothness measure of a prediction function f over the graph G is calculated as:
  • $$\|f\|_G^2 = \sum_i \sum_j W_{ij} \left\| f(x_i) - f(x_j) \right\|^2 = \frac{1}{2} f(X)^T L f(X) \tag{5}$$
  • where Wij is the weight of the edge between nodes xi and xj, X is the input matrix denoting the nodes, f(xi) is the label of node xi, $f(X) = [f(x_1), \ldots, f(x_n)]^T$ if there are n nodes and L is the graph Laplacian of G.
  • Given the above measure of function smoothness, a graph Laplacian based regularization framework estimates the unknown function ƒ as follows:

  • $$f_{opt} = \arg\min_f \; Q(X_l, Y_l, f) + \eta \|f\|_G^2 \tag{6}$$
  • where Q(Xl, Yl, f) is a loss function measuring the accuracy over the labeled set (Xl, Yl). For example, $Q(X_l, Y_l, f) = \|f(X_l) - Y_l\|^2$, i.e. squared loss, is a popular choice.
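  • To make the trade-off in equation (6) concrete, the following Python sketch minimizes the squared loss on the labeled nodes plus η times the double-sum smoothness term, for a symmetric graph with non-negative weights. The Gauss-Seidel style coordinate updates, the function name `propagate_labels` and the parameter values are choices made for this illustration and are not a solver prescribed by the disclosure.

```python
def propagate_labels(W, y, eta=1.0, sweeps=500):
    """Minimise sum_{i labeled}(f_i - y_i)^2 + eta * sum_ij W_ij (f_i - f_j)^2.

    W : symmetric list-of-lists with non-negative weights and zero diagonal
    y : list with +1/-1 for labeled nodes and None for unlabeled nodes
    Setting the derivative with respect to f_i to zero gives the update
    f_i = (l_i * y_i + 2*eta*sum_j W_ij f_j) / (l_i + 2*eta*d_i),
    where l_i is 1 for labeled nodes, 0 otherwise, and d_i is the degree.
    """
    n = len(W)
    f = [0.0 if t is None else float(t) for t in y]
    degree = [sum(row) for row in W]
    for _ in range(sweeps):
        for i in range(n):
            labeled = 0.0 if y[i] is None else 1.0
            target = 0.0 if y[i] is None else float(y[i])
            denom = labeled + 2.0 * eta * degree[i]
            if denom == 0.0:
                continue  # isolated unlabeled node: nothing to propagate
            neighbours = sum(W[i][j] * f[j] for j in range(n))
            f[i] = (labeled * target + 2.0 * eta * neighbours) / denom
    return f

# Path graph 0 - 1 - 2 - 3 with the two end nodes labeled +1 and -1.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
y = [1, None, None, -1]
f = propagate_labels(W, y, eta=1.0)
predicted = [1 if v >= 0 else -1 for v in f]
print(f, predicted)
```

  • Each unlabeled node ends up with the sign of its nearer labeled neighbor, and a larger η pulls the fitted values of the labeled nodes further away from their given labels in favor of smoothness.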
  • A weight matrix constructed using our method cannot directly be passed as input to this graph regularization framework. This is because the smoothness measure using the graph Laplacian is based on the assumption that connected nodes tend to have the same class labels, and hence the weights have to be non-negative (i.e. Wij ≥ 0 ∀ i, j). However, it is well-known that edges in relational networks could connect nodes with different labels, which would lead to our construction method assigning negative weights to such edges. An example is the WEBKB dataset, described in Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence, by M. Craven et al., AAAI, pages 509-516 (American Association for Artificial Intelligence, 1998), where student nodes are typically connected to faculty nodes more than other student nodes. To ensure compatibility with the graph Laplacian based regularization framework, we make the following modification:
  • $$\|f\|_G^2 = \sum_i \sum_j \tilde{W}_{ij}\,\|f(x_i) - \operatorname{sgn}(W_{ij})\, f(x_j)\|^2 = \frac{1}{2} f(X)^T M f(X) \qquad (7)$$
  • similar to the one described in the article “Dissimilarity in graph-based semi-supervised classification” by A. B. Goldberg et al. in Artificial Intelligence and Statistics (AISTATS), 2007, where $\tilde{W}_{ij} = |W_{ij}|$, the degree matrix $\tilde{D} = \{\tilde{D}_{ij}\}$ is computed as $\tilde{D}_{ii} = \sum_j \tilde{W}_{ij}$, $M = (\tilde{D} - \tilde{W}) + (1 - \operatorname{sgn}(W)) \circ \tilde{W}$ and the symbol ∘ is the Hadamard product. With this new smoothness measure, we can now pass our constructed weight matrix as input to this rich class of graph transduction methods.
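  • The effect of the sign-aware modification can be checked numerically. The sketch below builds M from a small signed weight matrix and compares the quadratic form with the pairwise sum in which each neighbor's value is multiplied by the sign of the edge. The example weights and function names are assumptions, and the Hadamard term is taken with the absolute weights |W|, which is the form for which the identity checked here holds. As with the unsigned case, the ordered double sum equals twice the quadratic form under this construction; the constant depends on the normalization convention.

```python
import numpy as np

def signed_smoothness_matrix(W):
    """M = (D_abs - W_abs) + (1 - sgn(W)) o W_abs, where 'o' is the Hadamard product."""
    W_abs = np.abs(W)
    D_abs = np.diag(W_abs.sum(axis=1))
    S = np.sign(W)                      # +1, -1, or 0 where there is no edge
    return (D_abs - W_abs) + (1.0 - S) * W_abs

def signed_pairwise_smoothness(W, f):
    """Ordered double sum of |W_ij| * (f_i - sgn(W_ij) * f_j)^2."""
    S = np.sign(W)
    diff = f[:, None] - S * f[None, :]
    return float(np.sum(np.abs(W) * diff ** 2))

# Hypothetical signed graph: nodes 0 and 2 are joined by a negative (dissimilarity) edge.
W = np.array([[ 0.0, 1.0, -2.0],
              [ 1.0, 0.0,  0.5],
              [-2.0, 0.5,  0.0]])
f = np.array([1.0, 1.0, -1.0])

M = signed_smoothness_matrix(W)
quadratic_form = float(f @ M @ f)
pairwise = signed_pairwise_smoothness(W, f)
min_eigenvalue = float(np.linalg.eigvalsh(M).min())

print(pairwise, quadratic_form, min_eigenvalue)
```

  • In this example the negative edge between nodes 0 and 2 incurs no penalty because its endpoints carry opposite labels, and M remains positive semidefinite, so it can stand in for the Laplacian in the regularization framework.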
  • Experiments
  • In the previous sections, we described a method to construct a weight matrix for relational data that serves as input to a rich class of graph based transductive learning algorithms. In this section, we assess the efficacy of our approach through empirical studies on synthetic and real data. In these studies, we compare methods across three broad categories, namely: a) sophisticated relational learning (RL) methods, b) sophisticated graph transduction methods with the weight matrix computed using the available attributes, or the adjacency matrix if there are no attributes, as input (GTA) and c) relational transductive methods where our learned weight matrix is passed as input to (enhanced/modified) graph transduction techniques. The situations where methods in category c) compare favorably with methods in the other two categories are the conditions under which use of our procedure is justified. The relational learning methods we consider are: MLNs, RDNs, PL-EM and RN. The graph transduction methods we consider are: the local global consistency (LGC) method (as described in the article “Pseudolikelihood em for within-network relational learning” by R. Xiang et al. in Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, pages 1103-1108, published by IEEE Computer Society, Washington, D.C., USA) and the harmonic functions Gaussian fields (HFGF) method (as described in the article “Semi-supervised learning using Gaussian fields and harmonic functions” by X. Zhu et al. in Proceedings of ICML, pages 912-919, 2003).
  • In all of our experiments, we vary the percentage of known labels used for training over 5%, 10%, 30% and 70%. The error for each method is obtained by randomly selecting the labeled nodes 100 times for each specified proportion and averaging the corresponding errors. To avoid clutter in the figures reporting the results, we plot only the following four curves (rather than eight):
      • the best performance at each labeled percentage of methods in category a) (BEST RL),
      • the best performance at each labeled percentage of methods in category b) (BEST GTA),
      • the LGC method with our constructed weight matrix as input (LGCW) and
      • the HFGF method with our constructed weight matrix as input (HFGFW) i.e. methods in category c).
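  • The repeated random-split protocol described above can be sketched as follows. This is only an outline under stated assumptions: the function names are hypothetical, the error is assumed to be the misclassification rate on the nodes whose labels were hidden, and the majority-label predictor is a placeholder standing in for any of the compared methods.

```python
import numpy as np

def average_error(predict, labels, fraction, n_trials=100, seed=0):
    """Average misclassification rate on hidden nodes over random labeled subsets.

    `predict(n, labeled_idx, y_labeled)` is any classifier returning n predicted labels.
    `fraction` is the proportion of nodes whose labels are revealed (must be < 1).
    """
    rng = np.random.default_rng(seed)
    n = len(labels)
    n_labeled = max(1, int(round(fraction * n)))
    errors = []
    for _ in range(n_trials):
        labeled_idx = rng.choice(n, size=n_labeled, replace=False)
        hidden = np.ones(n, dtype=bool)
        hidden[labeled_idx] = False
        pred = predict(n, labeled_idx, labels[labeled_idx])
        errors.append(np.mean(pred[hidden] != labels[hidden]))
    return float(np.mean(errors))

def majority_predictor(n, labeled_idx, y_labeled):
    """Placeholder classifier: predict the majority revealed label for every node."""
    majority = 1 if np.sum(y_labeled) >= 0 else -1
    return np.full(n, majority)

# Toy check on a graph-free example where every node has label +1.
toy_labels = np.ones(50, dtype=int)
toy_errors = {frac: average_error(majority_predictor, toy_labels, frac)
              for frac in (0.05, 0.10, 0.30, 0.70)}
print(toy_errors)
```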
    Synthetic Experiments
  • We generate graphs using well accepted random graph generation procedures that produce realistic graphs, namely: forest fire (as described in the article “Graph evolution: Densification and shrinking diameters” by J. Leskovec et al. in ACM Trans. Knowl. Discov. Data, 1(1):2, 2007), and preferential attachment (as described in the article “Emergence of scaling in random networks” by A. Barabasi et al. in Science, 286:509-512, 1999). These procedures add one node at a time, and as each node is added we assign it a label based on an intuitive label generation procedure, which is described below.
  • Setup
  • We generate graphs consisting of 1000 nodes for the two generation techniques mentioned above. The parameter settings for forest fire (forward probability=0.37, backward probability=0.32) and preferential attachment (exponent β=1.6) are derived from the above cited articles which indicate that these settings lead to the most realistic graphs.
  • On the labeling front, we generate a binary labeling ∈ {1, −1} by a simple procedure for each of these graphs. Whenever a new node is added, with probability p we assign the majority class amongst its labeled neighbors and with probability 1−p we assign one of the two labels uniformly at random. Hence, the labels generated are dependent on the particular graph generation procedure and consequently on the connectivity of the graph, as is desired. It is easy to see that as p→1 the auto-correlation in the graph increases, leading to more homogeneity or less entropy amongst connected nodes. For each of the two graph generation procedures, we create graphs where p is low (i.e. 0.3) and where p is high (i.e. 0.8). The low p leads to an auto-correlation of about 0.2 (i.e. ρ≈0.2) while the high p leads to an auto-correlation of about 0.7 (i.e. ρ≈0.7), which are calculated from the generated graphs.
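  • A minimal sketch of this label generation procedure is given below. It takes as input, for each node in arrival order, the list of earlier nodes it attaches to, so it is independent of the graph generator used. The function name and the toy chain graph are assumptions; the handling of a node with no labeled neighbors, or with a tied vote, is also an assumption (a uniformly random label is drawn), since the text does not specify these cases.

```python
import numpy as np

def generate_labels(earlier_neighbors, p, seed=0):
    """Assign labels in {1, -1} to nodes in arrival order.

    earlier_neighbors[i] lists the already-added nodes that node i connects to.
    With probability p a node copies the majority label of those neighbors;
    otherwise (or if there is no clear majority) it gets a uniformly random label.
    """
    rng = np.random.default_rng(seed)
    labels = np.zeros(len(earlier_neighbors), dtype=int)
    for node, nbrs in enumerate(earlier_neighbors):
        vote = int(sum(labels[v] for v in nbrs))   # > 0 means majority +1, < 0 majority -1
        if vote != 0 and rng.random() < p:
            labels[node] = 1 if vote > 0 else -1
        else:
            labels[node] = int(rng.choice([1, -1]))
    return labels

# Hypothetical growth order: a chain in which each new node attaches to the previous one.
chain = [[]] + [[i - 1] for i in range(1, 20)]
labels_high_p = generate_labels(chain, p=1.0)   # every node copies its predecessor
labels_low_p = generate_labels(chain, p=0.3)
print(labels_high_p, labels_low_p)
```

  • With p = 1 every node on the chain inherits the label of the first node, the extreme case of the homogeneity that the text associates with p→1.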
  • Observations
  • From FIGS. 6A, 6B, 7A and 7B we see that, given a particular graph generation procedure and irrespective of the level of auto-correlation, the relative performance of the three different classes of methods is qualitatively similar. GTAs are known to perform particularly well when only a few nodes are labeled, and this is confirmed in our experiments. As the percentage of known labels increases, however, the relational learning methods start performing better than standard graph transduction techniques. This is probably because most sophisticated relational learning methods have low bias and relatively high variance, and this variance drops rapidly as the number of labeled nodes increases.
  • The interesting result, however, is that our weight matrix construction technique seems to capture enough of the complexity of the labeling and the network structure that besides performing exceedingly well when the graph is sparsely labeled, it remains competitive with relational learning methods when the percentage of known labels is moderate to high.
  • Real Data Experiments
  • For experiments on real data we choose two datasets, namely: WEBKB and a real industrial dataset, BREAD, obtained from a large consumer retail company.
  • Setup
  • The WEBKB dataset is a collection of webpages obtained from the computer science departments of four US universities. Each webpage belongs to one of seven categories, namely: course, faculty, student, staff, project, department or other. The “other” category webpages were not used as input in the classification task, but were used to link webpages in the remaining six classes as described in the article “Classification in networked data: A toolkit and a univariate case study” by S. Macskassy et al. in J. Mach. Learn. Res., 8:935-983, 2007. We performed experiments on the four graphs formed (one for each university) and computed the average error over the four universities for each of the learning methods.
  • The BREAD dataset has sales information about bread products sold in different stores in the northeastern United States. The dataset has information from 2347 stores. For each store we know its location, whether the store met or underachieved its target quarterly sales, the amounts it had on promotion during that period, the quantity ordered during that period and the amount reclaimed during that period. Based on location, we can form a graph linking the closest stores together. With this, we have a dataset of size 2347 in which each node in the graph has four attributes. Setting the attribute indicating whether the sales met or underachieved the expected amount as our class label, we obtain a graph where each node has three explanatory attributes.
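  • The text does not state how the closest stores are linked. One common way to build such a graph is a symmetrized k-nearest-neighbor construction, sketched below purely as an illustration. The function name, the value of k, the toy coordinates and the use of Euclidean distance on planar coordinates are all assumptions and are not taken from the BREAD experiments.

```python
import numpy as np

def knn_graph(coords, k=3):
    """Symmetric 0/1 adjacency matrix linking each point to its k nearest points."""
    n = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)               # a store is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]    # indices of the k closest stores per row
    A = np.zeros((n, n), dtype=int)
    rows = np.repeat(np.arange(n), k)
    A[rows, nearest.ravel()] = 1
    return np.maximum(A, A.T)                    # keep an edge if either end selected it

# Hypothetical store locations on a line, linked to their single nearest neighbor.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.5, 0.0], [10.0, 0.0]])
A = knn_graph(coords, k=1)
print(A)
```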
  • Observations
  • On the WEBKB dataset we see in FIG. 8A that the best GTA is better than the relational methods when a small percentage (<20%) of labels is known, but the relational methods quickly close this gap and start outperforming the GTAs with more label information. Our weight matrix construction method, however, performs better than the other two classes of methods at low label proportions and, unlike the GTAs, remains competitive with the relational methods as this proportion increases. This favorable behavior can most likely be attributed to our method being able to effectively model the strength (i.e. the numerical value) and direction (i.e. + or −) of dependencies between linked entities, something GTAs seemingly fail to capture.
  • On the BREAD dataset we see in FIG. 8B that the GTAs are much worse than the other classes of methods. A possible reason for this is that stores near one another typically compete with each other for the same type of products, and hence our input graph exhibits strong negative auto-correlation. Since GTAs predominantly model similarity between linked entities, their performance is practically unchanged even when the percentage of known labels is increased. The relational methods perform much better than GTAs in this setting. In contrast to GTAs, they effectively capture the dissimilarity between linked nodes as the number of known labels increases. However, our weight matrix construction method seems to capture this relationship much earlier, with only a small percentage of labels known.
  • Discussion
  • In this disclosure, we have provided a simple yet novel way of constructing a weight matrix for partially labeled relational graphs that may be directed or undirected, that may or may not have attributes and that may be homogeneous or heterogeneous. We have described the manner in which such a weight matrix can serve as input to a rich class of graph transduction methods through a modified graph Laplacian based regularization framework. We have portrayed the desirable properties of this construction method and showcased its effectiveness in capturing complex dependencies through experiments on synthetic and real data.
  • In the future, it would be interesting to extend this procedure to perform multi-class classification in a single shot, rather than having to perform multiple binary classification tasks. This would most likely improve the actual running time, though not necessarily the time complexity in terms of O(·). On the theory side, it might be of some interest to analyze the synthetic label generation procedure introduced in this disclosure for different types of graphs. One could use ideas from the theory of random walks to determine tendencies of the label generation procedure. From a learning theory perspective, one could potentially derive error bounds as functions of p (amongst other parameters), and if one were to express p in terms of the auto-correlation ρ, one would have error bounds as functions of ρ. This would be of some interest since ρ can be computed from static graphs or from a snapshot of an evolving graph, where one does not have to know the order in which the nodes were attached, thus making the error bound applicable to graphs in a larger set of applications.
  • While the invention has been described in terms of a single preferred embodiment, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.

Claims (20)

1. A method for extending a partially labeled data graph to unlabeled nodes in a single network classification, comprising:
constructing a weight matrix for data in a single network classification, the weight matrix incorporating a conical weighting scheme that weighs importance of links relative to attributes;
applying the weight matrix to the data; and
applying a graph transduction method to the weighted data to generate labels for the unlabeled nodes.
2. A method as in claim 1, wherein the weight matrix uses a modified graph Laplacian based regularization framework.
3. A method as in claim 2, further comprising:
partitioning edges of the data graph into categories;
assigning a weight to each category; and
assigning to each edge the weight of its respective category.
4. A method as in claim 3, wherein the categories are
edges between nodes with the same label;
edges between nodes with opposite labels;
edges between unlabeled nodes;
edges between an unlabeled node and a node with a label 1; and
edges between an unlabeled node and a node with a label −1.
5. A method as in claim 4, wherein edges between unlabeled nodes are assigned a weight denoting an expectation based on a distribution of edges that have labels.
6. A method as in claim 4, wherein edges between an unlabeled node and a labeled node are assigned a weight denoting an expectation based on a distribution of edges that have labels, said distribution being limited to those edges having one node equal to the labeled node.
7. A method as in claim 3, further comprising assigning to each edge a weight that is a conical combination of a weight based on the respective category and a weight based on affinity of attribute values of nodes connected by said edge.
8. A method as in claim 1, wherein applying a graph transduction method further comprises imposing a tradeoff between a fitting accuracy of a prediction function on labeled data and a smoothness of the prediction function over the graph.
9. A method as in claim 8, further comprising
estimating the smoothness of the prediction function for the graph Laplacian based regularization framework; and
modifying the prediction function to ensure compatibility between the graph transduction method and the graph Laplacian based regularization framework.
10. A system for extending a partially labeled data graph to unlabeled nodes in a single network classification, comprising:
a weight matrix for data in a single network classification, the weight matrix incorporating a conical weighting scheme that weighs importance of links relative to attributes;
means for applying the weight matrix to the data; and
a graph transduction method applied to the weighted data to generate labels for the unlabeled nodes.
11. A system as in claim 10, wherein the weight matrix uses a modified graph Laplacian based regularization framework.
12. A system as in claim 11, further comprising:
means for partitioning edges of the data graph into categories;
means for assigning a weight to each category; and
means for assigning to each edge the weight of its respective category.
13. A system as in claim 12, wherein the categories are
edges between nodes with the same label;
edges between nodes with opposite labels;
edges between unlabeled nodes;
edges between an unlabeled node and a node with a label 1; and
edges between an unlabeled node and a node with a label −1.
14. A system as in claim 13, wherein edges between unlabeled nodes are assigned a weight denoting an expectation based on a distribution of edges that have labels.
15. A system as in claim 13, wherein edges between an unlabeled node and a labeled node are assigned a weight denoting an expectation based on a distribution of edges that have labels, said distribution being limited to those edges having one node equal to the labeled node.
16. A system as in claim 12, further comprising assigning to each edge a weight that is a conical combination of a weight based on the respective category and a weight based on affinity of attribute values of nodes connected by said edge.
17. A system as in claim 10, wherein a graph transduction method is applied by imposing a tradeoff between a fitting accuracy of a prediction function on labeled data and a smoothness of the prediction function over the graph.
18. A system as in claim 17, further comprising
means for estimating the smoothness of the prediction function for the graph Laplacian based regularization framework; and
means for modifying the prediction function to ensure compatibility between the graph transduction method and the graph Laplacian based regularization framework.
19. A computer implemented system for extending a partially labeled data graph to unlabeled nodes in a single network classification, comprising:
a computer processor for executing computer code;
first computer code for constructing a weight matrix for data in a single network classification, the weight matrix incorporating a conical weighting scheme that weighs importance of links relative to attributes;
second computer code for applying the weight matrix to the data; and
third computer code for applying a graph transduction method to the weighted data to generate labels for the unlabeled nodes.
20. A computer implemented system as in claim 19, wherein the weight matrix uses a modified graph Laplacian based regularization framework.
US15/078,408 2013-03-07 2016-03-23 System and method for using graph transduction techniques to make relational classifications on a single connected network Abandoned US20160203417A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/078,408 US20160203417A1 (en) 2013-03-07 2016-03-23 System and method for using graph transduction techniques to make relational classifications on a single connected network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/787,807 US9355367B2 (en) 2013-03-07 2013-03-07 System and method for using graph transduction techniques to make relational classifications on a single connected network
US15/078,408 US20160203417A1 (en) 2013-03-07 2016-03-23 System and method for using graph transduction techniques to make relational classifications on a single connected network

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/787,807 Continuation US9355367B2 (en) 2013-03-07 2013-03-07 System and method for using graph transduction techniques to make relational classifications on a single connected network

Publications (1)

Publication Number Publication Date
US20160203417A1 true US20160203417A1 (en) 2016-07-14

Family

ID=51489137

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/787,807 Expired - Fee Related US9355367B2 (en) 2013-03-07 2013-03-07 System and method for using graph transduction techniques to make relational classifications on a single connected network
US15/078,408 Abandoned US20160203417A1 (en) 2013-03-07 2016-03-23 System and method for using graph transduction techniques to make relational classifications on a single connected network

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US13/787,807 Expired - Fee Related US9355367B2 (en) 2013-03-07 2013-03-07 System and method for using graph transduction techniques to make relational classifications on a single connected network

Country Status (1)

Country Link
US (2) US9355367B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451617A (en) * 2017-08-08 2017-12-08 西北大学 One kind figure transduction semisupervised classification method
CN111863281A (en) * 2020-07-29 2020-10-30 山东大学 Personalized adverse drug reaction prediction method, system, equipment and medium

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10120956B2 (en) * 2014-08-29 2018-11-06 GraphSQL, Inc. Methods and systems for distributed computation of graph data
CN107451596B (en) * 2016-05-30 2020-04-14 清华大学 Network node classification method and device
US11868916B1 (en) * 2016-08-12 2024-01-09 Snap Inc. Social graph refinement
US11205103B2 (en) 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis
US10178131B2 (en) * 2017-01-23 2019-01-08 Cisco Technology, Inc. Entity identification for enclave segmentation in a network
US20180211177A1 (en) * 2017-01-25 2018-07-26 Pearson Education, Inc. System and method of bayes net content graph content recommendation
US11500936B2 (en) 2018-08-07 2022-11-15 Walmart Apollo, Llc System and method for structure and attribute based graph partitioning
US10936657B2 (en) * 2018-08-31 2021-03-02 Netiq Corporation Affinity determination using graphs
CN111626311B (en) * 2019-02-27 2023-12-08 京东科技控股股份有限公司 Heterogeneous graph data processing method and device
US11606393B2 (en) * 2019-08-29 2023-03-14 Nec Corporation Node classification in dynamic networks using graph factorization
CN111966823B (en) * 2020-07-02 2022-04-22 华南理工大学 Graph node classification method facing label noise
CN113111134A (en) * 2021-04-21 2021-07-13 山东省人工智能研究院 Self-coding and attention mechanism-based heterogeneous graph node feature embedding method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060053099A1 (en) * 2004-09-03 2006-03-09 Biowisdom Limited System and method for capturing knowledge for integration into one or more multi-relational ontologies
US20080275902A1 (en) * 2007-05-04 2008-11-06 Microsoft Corporation Web page analysis using multiple graphs
US20090132561A1 (en) * 2007-11-21 2009-05-21 At&T Labs, Inc. Link-based classification of graph nodes
US20120089552A1 (en) * 2008-12-22 2012-04-12 Shih-Fu Chang Rapid image annotation via brain state decoding and visual pattern mining

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002071243A1 (en) 2001-03-01 2002-09-12 Biowulf Technologies, Llc Spectral kernels for learning machines
US20030024975A1 (en) 2001-07-18 2003-02-06 Rajasekharan Ajit V. System and method for authoring and providing information relevant to the physical world
US7958067B2 (en) 2006-07-12 2011-06-07 Kofax, Inc. Data classification methods using machine learning techniques
WO2010075408A1 (en) * 2008-12-22 2010-07-01 The Trustees Of Columbia University In The City Of New York System and method for annotating and searching media
JP5246073B2 (en) 2009-07-07 2013-07-24 富士通モバイルコミュニケーションズ株式会社 Terminal device and content recommendation method
KR101306667B1 (en) 2009-12-09 2013-09-10 한국전자통신연구원 Apparatus and method for knowledge graph stabilization

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060053099A1 (en) * 2004-09-03 2006-03-09 Biowisdom Limited System and method for capturing knowledge for integration into one or more multi-relational ontologies
US20080275902A1 (en) * 2007-05-04 2008-11-06 Microsoft Corporation Web page analysis using multiple graphs
US20090132561A1 (en) * 2007-11-21 2009-05-21 At&T Labs, Inc. Link-based classification of graph nodes
US20120089552A1 (en) * 2008-12-22 2012-04-12 Shih-Fu Chang Rapid image annotation via brain state decoding and visual pattern mining

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples"M. Belkin et al; Journal of Machine Learning Research 7 (2006) 2399-2434 Submitted 4/05; Revised 5/06; Published 11/062006 Mikhail Belkin, Partha Niyogi and Vikas Sindhwani *
A. B. Goldberg, J. Zhu, and S. Wright. Dissimilarity in graph-based semi-supervised classification. In AISTATS, 2007 *

Also Published As

Publication number Publication date
US20140258196A1 (en) 2014-09-11
US9355367B2 (en) 2016-05-31

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DHURANDHAR, AMIT;WANG, JUN;REEL/FRAME:038292/0700

Effective date: 20130301

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION