US20150142521A1 - Customer clustering using integer programming - Google Patents

Customer clustering using integer programming

Publication number
US20150142521A1
Authority
US
United States
Prior art keywords
cluster
clusters
customer
customers
splitting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/084,903
Inventor
Burcu Aydin
Michael Tamir
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Transform Sr Brands LLC
Original Assignee
Sears Brands LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US14/084,903 priority Critical patent/US20150142521A1/en
Application filed by Sears Brands LLC filed Critical Sears Brands LLC
Publication of US20150142521A1 publication Critical patent/US20150142521A1/en
Assigned to JPP, LLC reassignment JPP, LLC SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SEARS BRANDS, L.L.C.
Assigned to SEARS BRANDS, L.L.C. reassignment SEARS BRANDS, L.L.C. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AYDIN, BURCU
Assigned to CANTOR FITZGERALD SECURITIES, AS AGENT reassignment CANTOR FITZGERALD SECURITIES, AS AGENT SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: TRANSFORM SR BRANDS LLC
Assigned to SEARS BRANDS, L.L.C. reassignment SEARS BRANDS, L.L.C. RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: JPP, LLC
Assigned to BANK OF AMERICA, N.A. reassignment BANK OF AMERICA, N.A. SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: TRANSFORM SR BRANDS LLC
Assigned to CITIBANK, N.A. reassignment CITIBANK, N.A. SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: TRANSFORM SR BRANDS LLC
Priority to US16/366,542 priority patent/US11288688B2/en
Assigned to TRANSFORM SR BRANDS LLC reassignment TRANSFORM SR BRANDS LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SEARS BRANDS, L.L.C.
Assigned to TRANSFORM SR BRANDS LLC reassignment TRANSFORM SR BRANDS LLC RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: CANTOR FITZGERALD SECURITIES, AS AGENT
Assigned to TRANSFORM SR BRANDS LLC reassignment TRANSFORM SR BRANDS LLC RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: BANK OF AMERICA, N.A.
Assigned to TRANSFORM SR BRANDS LLC reassignment TRANSFORM SR BRANDS LLC RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: CITIBANK, N.A., AS AGENT
Priority to US17/705,483 priority patent/US11823218B2/en
Priority to US18/499,613 priority patent/US20240070694A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0204Market segmentation
    • G06Q30/0241Advertisements
    • G06Q30/0251Targeted advertisements

Definitions

  • Various embodiments relate to electronic commerce (e-commerce), and more particularly, to classifying customers in an e-commerce environment.
  • Electronic commerce (e-commerce) websites are an increasingly popular venue for consumers to research and purchase products without physically visiting a conventional brick-and-mortar retail store.
  • An e-commerce website may provide products and/or services to a vast number of customers.
  • the e-commerce website may obtain extensive amounts of data about their customer base. Such customer data may aid the e-commerce website to provide products and/or services that are relevant and/or otherwise desirable to a particular customer.
  • an e-commerce website may attempt to identify groups of customers with similar interests or similar lifestyles.
  • the e-commerce website may analyze these identified groups to derive generalizations regarding members of the group.
  • the e-commerce website may then tailor its services to members of each group based upon the derived generalizations.
  • FIG. 1 shows an e-commerce environment comprising a computing device and an e-commerce system in accordance with an embodiment of the present invention.
  • FIG. 2 shows an embodiment of a computing device for use in the e-commerce environment of FIG. 1 .
  • FIG. 3 shows user profiles and product catalogs maintained by an e-commerce system of FIG. 1 .
  • FIG. 4 shows an embodiment of a product listing provided by the e-commerce system of FIG. 1 .
  • FIG. 5 shows a flowchart for an embodiment of a process that may be used by the e-commerce system of FIG. 1 to obtain a transaction space and a feature space from purchase history data and demographic data.
  • FIG. 6 shows an example entry of the purchase history data for the e-commerce system of FIG. 1 .
  • FIG. 7 shows an example purchase history table for the e-commerce system of FIG. 1 after evaluating and retaining data of the purchase history data for a time window of interest.
  • FIG. 8 shows an example purchase history table for the e-commerce system of FIG. 1 after combining rows that correspond to the same customer and product category.
  • FIG. 9 shows an entry from an example customer-item (CI) matrix for the e-commerce system of FIG. 1 .
  • FIG. 10 shows an example quantile table for the e-commerce system of FIG. 1 .
  • FIG. 11 shows a standardized entry from the example quantile table of FIG. 10 .
  • FIG. 12 shows a flowchart of a process that may be used by the e-commerce system of FIG. 1 to cluster customers based on the transaction space and feature space.
  • FIGS. 13-16 depict an example partitioning of a customer base.
  • aspects of the present invention are related to classifying and/or grouping customers together that exhibit similar interests, lifestyles, and/or purchase behavior. More specifically, certain embodiments of the present invention relate to apparatus, hardware and/or software systems, and associated methods that cluster customers based on solving an Integer Program that accounts for purchase history data and demographic data of the customers.
  • the e-commerce environment 10 may include a computing device 20 connected to an e-commerce system 30 via a network 40 .
  • the network 40 may include a number of private and/or public networks such as, for example, wireless and/or wired LAN networks, cellular networks, and the Internet that collectively provide a communication path and/or paths between the computing device 20 and the e-commerce system 30 .
  • the computing device 20 may include a desktop, a laptop, a tablet, a smart phone, and/or some other type of computing device which enables a user to communicate with the e-commerce system 30 via the network 40 .
  • the e-commerce system 30 may include one or more web servers, database servers, routers, load balancers, and/or other computing and/or networking devices that operate to provide an e-commerce experience for users that connect to the e-commerce system 30 via the computing device 20 and the network 40 .
  • the e-commerce system 30 may further include a customer classifier 33 , one or more tailored services 35 , and one or more electronic databases 37 upon which are stored purchase history data 38 and demographic data 39 for customers of the e-commerce system 30 .
  • the classifier 33 may include one or more firmware and/or software instructions, routines, modules, etc. that the e-commerce system 30 may execute in order to classify, group, or cluster customers of the e-commerce system 30 into classes, groups, or clusters of customers that exhibit similar purchasing habits.
  • the classifier 33 may analyze purchase history data and demographic data for the customers to identify clusters of customers with similar purchasing preferences.
  • the tailored services 35 may comprise one or more firmware and/or software instructions, routines, modules, etc. that the e-commerce system 30 may execute in order to tailor one or more aspects of the e-commerce system 30 for a particular customer.
  • the tailored services 35 may include advertisements, promotions, product recommendations, email campaigns, etc. that are tailored based upon the cluster to which the customer has been placed.
  • the classifier 33 and tailored services 35 may be executed concurrently by a single computing device of the e-commerce system 30 .
  • a computing device may execute the classifier 33 offline in order to obtain appropriate clusters and other input data for the tailored services 35 .
  • the classifier 33 may periodically (e.g., once an hour, once a day, once a week, etc.) provide one or more of the tailored services 35 with updated cluster and other input data.
  • the e-commerce system 30 may continue to provide tailored services 35 without the constant overhead of the classifier 33 and/or without the overhead of constant updates.
  • the e-commerce system 30 may execute the classifier 33 only during generally idle periods (e.g., after normal business hours). Further details regarding the classifier 33 and the tailored services 35 are presented below in regard to FIGS. 5-11 .
  • FIG. 1 depicts a simplified embodiment of the e-commerce environment 10 which may be implemented in numerous different manners using a wide range of different computing devices, platforms, networks, etc. Moreover, while aspects of the e-commerce environment 10 may be implemented using a client/server architecture, aspects of the e-commerce environment 10 may also be implemented using a peer-to-peer architecture or another networking architecture.
  • the e-commerce system 30 may include one or more computing devices.
  • FIG. 2 depicts an embodiment of a computing device 50 suitable for the computing device 20 and/or the e-commerce system 30 .
  • the computing device 50 may include a processor 51 , a memory 53 , a mass storage device 55 , a network interface 57 , and various input/output (I/O) devices 59 .
  • the processor 51 may be configured to execute instructions, manipulate data and generally control operation of other components of the computing device 50 as a result of its execution.
  • the processor 51 may include a general purpose processor such as an x86 processor or an ARM processor which are available from various vendors. However, the processor 51 may also be implemented using an application specific processor and/or other logic circuitry.
  • the memory 53 may store instructions and/or data to be executed and/or otherwise accessed by the processor 51 .
  • the memory 53 may be completely and/or partially integrated with the processor 51 .
  • the mass storage device 55 may store software and/or firmware instructions which may be loaded in memory 53 and executed by processor 51 .
  • the mass storage device 55 may further store various types of data which the processor 51 may access, modify, and/or otherwise manipulate in response to executing instructions from memory 53.
  • the mass storage device 55 may comprise one or more redundant array of independent disks (RAID) devices, traditional hard disk drives (HDD), solid-state device (SSD) drives, flash memory devices, read only memory (ROM) devices, etc.
  • the network interface 57 may enable the computing device 50 to communicate with other computing devices directly and/or via network 40 .
  • the networking interface 57 may include a wired networking interface such as an Ethernet (IEEE 802.3) interface, a wireless networking interface such as a WiFi (IEEE 802.11) interface, a radio or mobile interface such as a cellular interface (GSM, CDMA, LTE, etc), and/or some other type of networking interface capable of providing a communications link between the computing device 50 and network 40 and/or another computing device.
  • the I/O devices 59 may generally provide devices which enable a user to interact with the computing device 50 by either receiving information from the computing device 50 and/or providing information to the computing device 50 .
  • the I/O devices 59 may include display screens, keyboards, mice, touch screens, microphones, audio speakers, etc.
  • While the above provides general aspects of a computing device 50, those skilled in the art readily appreciate that there may be significant variation in actual implementations of a computing device. For example, a smart phone implementation of a computing device may use vastly different components and may have a vastly different architecture than a database server implementation of a computing device. However, despite such differences, computing devices generally include processors that execute software and/or firmware instructions in order to implement various functionality. As such, aspects of the present application may find utility across a vast array of different computing devices, and the intention is not to limit the scope of the present application to a specific computing device and/or computing platform beyond any such limits that may be found in the appended claims.
  • the e-commerce system 30 may enable customers, which may be guests or members of the e-commerce system 30 , to browse and/or otherwise locate products.
  • the e-commerce system 30 may further enable such customers to purchase products offered for sale.
  • the e-commerce system 30 may maintain an electronic product database or product catalog 300 which may be stored on an associated mass storage device 55 .
  • the product catalog 300 includes product listings 310 for each product available for purchase.
  • Each product listing 310 may include various information or attributes regarding the respective product, such as a unique product identifier (e.g., stock-keeping unit “SKU”), a product description, product image(s), manufacture information, available quantity, price, product features, etc.
  • While the e-commerce system 30 may enable guests to purchase products without registering and/or otherwise signing up for a membership, the e-commerce system 30 may provide additional and/or enhanced functionality to those users that become members.
  • a customer profile 330 may include personal information 331 , purchase history data 335 , and other customer activity data 337 .
  • the personal information 331 may include such items as name, mailing address, email address, phone number, billing information, clothing sizes, birthdates of friends and family, etc.
  • the purchase history data 335 may include information regarding products previously purchased by the customer from the e-commerce system 30 .
  • the purchase history data 335 may further include products previously purchased from affiliated online and brick-and-mortar vendors.
  • the other customer activity data 337 may include information regarding prior customer activities such as products for which the customer has previously searched, products which the customer has previously viewed, products on which the customer has provided comments, products which the customer has rated, products for which the customer has written reviews, etc., and/or products which the customer has purchased from the e-commerce system 30.
  • the other customer activity data 337 may further include similar activities associated with affiliated online and brick-and-mortar vendors.
  • the e-commerce system 30 may cause a computing device 20 to display a product listing 310 as shown in FIG. 4.
  • the e-commerce system 30 may provide such a product listing 310 in response to a member browsing products by type, price, kind, etc., viewing a list of products obtained from a product search, and/or other techniques supported by the e-commerce system 30 for locating products of interest.
  • the product listing 310 may include one or more representative images 350 of the product as well as a product description 360 .
  • the product listing 310 may further include one or more products 370 recommended by a recommendation engine of the tailored services 35 .
  • the recommendation engine may provide product recommendations based on the personal information 331 , purchase history data 335 and/or activity data 337 .
  • the classifier 33 in accordance with the method 500 respectively transforms the purchase history data and demographic data into a transaction space and feature space which the classifier 33 may use to partition or cluster the customer base as shown and discussed below in regard to FIG. 6 .
  • the classifier 33 at 510 may preprocess purchase history data 335 to obtain a Customer-Item (CI) matrix.
  • the e-commerce system 30 may collect and maintain purchase history data 335 for the customer over a period of time.
  • the purchase history data in its raw form, may include information recorded for each purchase.
  • An example entry is shown in FIG. 6 .
  • the e-commerce system 30 may maintain the purchase history data 335 in one or more relational database tables.
  • The purchase history table may include a row for each transaction, and each row may include a customer identifier (ID) that uniquely identifies the customer associated with the corresponding transaction.
  • the classifier 33 may preprocess the raw purchase history information found in the purchase history table into a Customer-Item space. To this end, the classifier 33 may select a time window (e.g., the most recent 24 months). The classifier 33 may extract entries from the purchase history table that have a transaction date that falls within the selected time window. The classifier 33 may then discard all fields other than the Customer ID, Item ID and Quantity of that particular item purchased in that transaction.
  • the classifier 33 may be configured to coalesce purchased items of multiple Item IDs under a single Category ID that lies at a high level in the product hierarchy.
  • FIG. 7 shows an example table after evaluating the time window as described above.
  • the resulting table may still include multiple entries or rows for each Customer ID and Category ID pair.
  • the classifier 33 may apply a pivoting step to the resulting table in order to combine rows having the same Customer ID and Category ID pair into a single row.
  • the resulting table includes a single row for each Customer ID and Category ID pair and includes Quantity data that contains the sum of all purchased quantities for this ID pair.
  • the classifier 33 may create a Customer-Item (CI) matrix.
  • each row i corresponds to a unique Customer ID
  • each column j corresponds to a unique Category ID
  • the entry CI_ij corresponds to the quantity of this Customer ID and Category ID pair from the combined table shown in FIG. 8. If a particular customer did not purchase any product in a category of the CI matrix, then the corresponding entry is zero.
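  • The preprocessing steps above (time window, field reduction, category coalescing, and pivoting) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the record layout, the `item_to_category` mapping, and the function name `build_ci_matrix` are assumptions introduced here, and customers with no purchases inside the window are simply omitted.

```python
from collections import defaultdict
from datetime import date


def build_ci_matrix(transactions, item_to_category, window_start, window_end):
    """Return (customer_ids, category_ids, matrix) for purchases inside the window."""
    totals = defaultdict(int)  # (customer_id, category_id) -> summed quantity
    for t in transactions:
        if not (window_start <= t["date"] <= window_end):
            continue  # transaction falls outside the time window of interest
        # Keep only Customer ID, Item ID (coalesced to its category) and Quantity.
        category = item_to_category[t["item_id"]]
        totals[(t["customer_id"], category)] += t["quantity"]  # pivot: sum per pair
    customers = sorted({c for c, _ in totals})
    categories = sorted({k for _, k in totals})
    # Entry CI[i][j] is the summed quantity, or zero if the pair never occurred.
    matrix = [[totals.get((c, k), 0) for k in categories] for c in customers]
    return customers, categories, matrix


# Hypothetical raw purchase history rows (extra fields such as price are discarded).
transactions = [
    {"customer_id": "c1", "item_id": "sku-1", "quantity": 2, "date": date(2013, 5, 1), "price": 9.99},
    {"customer_id": "c1", "item_id": "sku-2", "quantity": 1, "date": date(2013, 6, 1), "price": 4.50},
    {"customer_id": "c2", "item_id": "sku-3", "quantity": 4, "date": date(2013, 7, 1), "price": 120.0},
    {"customer_id": "c2", "item_id": "sku-1", "quantity": 5, "date": date(2010, 1, 1), "price": 9.99},
]
item_to_category = {"sku-1": "grocery", "sku-2": "grocery", "sku-3": "electronics"}

customers, categories, ci = build_ci_matrix(
    transactions, item_to_category, date(2012, 1, 1), date(2013, 12, 31)
)
print(customers, categories, ci)
```

  • In this sketch the 2010 transaction is dropped by the window filter, and the two grocery purchases of customer c1 are summed into a single entry.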
  • the classifier 33 may further preprocess the demographic data of its customers to obtain a feature space.
  • the e-commerce system 30 may collect demographic data from customers such as personal information 331 provided in the customer's profile 330.
  • the e-commerce system 30 may further obtain demographic data for customers from various providers of demographic data.
  • the classifier 33 may maintain and/or create a demographic table.
  • the demographic table may include a row for each Customer ID.
  • each column of the table may represent a different feature such as, for example, age, gender, occupation, number of children, etc.
  • the classifier 33 may turn each demographic entry into a numerical value.
  • the “Gender” column may contain only two kinds of entries, male and female.
  • the classifier 33 may preprocess the demographic table such that the Gender column includes a 1 for each female customer and a 0 otherwise.
  • the preprocessed demographic table may form the feature space for later classification.
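  • A minimal sketch of turning demographic rows into numeric feature vectors is shown below. Only the Gender encoding (1 for female, 0 otherwise) comes from the description; the field names, the function name `encode_demographics`, and the one-hot treatment of occupation are assumptions made for illustration.

```python
def encode_demographics(rows, occupations):
    """Convert demographic rows into numeric feature vectors (one per customer)."""
    features = []
    for row in rows:
        vec = [
            float(row["age"]),
            1.0 if row["gender"] == "female" else 0.0,  # encoding described in the text
            float(row["children"]),
        ]
        # Assumption: categorical occupation becomes one indicator column per value.
        vec += [1.0 if row["occupation"] == o else 0.0 for o in occupations]
        features.append(vec)
    return features


# Hypothetical demographic table, one row per Customer ID.
rows = [
    {"customer_id": "c1", "age": 34, "gender": "female", "children": 2, "occupation": "teacher"},
    {"customer_id": "c2", "age": 51, "gender": "male", "children": 0, "occupation": "engineer"},
]
feature_space = encode_demographics(rows, occupations=["engineer", "teacher"])
print(feature_space)
```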
  • the classifier 33 at 520 may standardize the CI matrix to obtain a standardized CI matrix which is referred to as transaction space. Standardizing the CI Matrix may ensure that the columns of the standardized CI matrix are scale-wise comparable with each other. In one embodiment, the classifier 33 applies standardization to each column separately using a bin quantiles standardization (BQS) technique. However, other standardization techniques may be utilized.
  • the classifier 33 in accordance with the BQS technique may traverse the column, record every unique quantity except zero that appears along with how many times each unique quantity appears in the column.
  • the classifier 33 may sort the results based on occurrence of each unique quantity. See, e.g., the Occurrences column of FIG. 10 .
  • the classifier 33 may traverse the occurrences to obtain a cumulative sum of the number of occurrences.
  • the classifier 33 for each row may divide the respective cumulative occurrence value by the last number in the cumulative occurrence column (i.e., the total number of occurrences) to obtain the quantile value for that row. See, e.g., Quantile column of FIG. 10 .
  • the BQS result shown in FIG. 10 suggests that the customers who bought 1 item associated with the category ID constitute the first 50% quantile, customers who bought 2 or fewer such items are the 75% quantile, and customers who bought 8 or fewer such items are the 100% quantile.
  • the classifier 33 may then update the quantity values of the original column with their corresponding quantile values as shown in FIG. 11 to obtain the standardized column.
  • the BQS technique may provide two advantages. First, all the numbers in the columns of the CI matrix are guaranteed to be between 0 and 1; therefore, the purchase patterns of high-frequency items such as grocery items and low-frequency items such as expensive electronics items are comparable. Second, because the quantile values are expressed in terms of how often each number appears and its relative order rather than its nominal value, the occasional very large number observed in a column does not skew the analysis.
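  • The BQS steps described above can be sketched for a single column as follows. This is an illustrative reading, not a definitive implementation: the function name `bqs_standardize` is hypothetical, and the sketch assumes the unique quantities are ordered by ascending quantity, which is what the "bought 2 or fewer" interpretation of the quantiles implies. Zero entries are left at zero.

```python
from collections import Counter


def bqs_standardize(column):
    """Replace each non-zero quantity in a CI column with its cumulative quantile."""
    counts = Counter(q for q in column if q != 0)  # occurrences of each unique quantity
    total = sum(counts.values())                   # total number of non-zero entries
    quantile = {}
    running = 0
    for q in sorted(counts):                       # assumption: ascending quantity order
        running += counts[q]                       # cumulative occurrences
        quantile[q] = running / total
    return [quantile.get(q, 0.0) for q in column]  # zeros stay at 0.0


# Hypothetical column: two customers bought 1 item, one bought 2, one bought 8.
column = [1, 0, 2, 1, 8, 0]
standardized = bqs_standardize(column)
print(standardized)
```

  • With this input the quantities 1, 2, and 8 map to 0.5, 0.75, and 1.0, matching the quantile interpretation given for FIG. 10, and the single large value 8 cannot exceed 1.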
  • the classifier 33 may classify or cluster the customers.
  • the classifier 33 may attempt to find linear partitions in the feature space that divide the data points (customers) into groups or clusters with the smallest sum of distances within themselves. The distances are defined using the standardized transaction space.
  • the distance between customer A and customer B is a measure of the dissimilarity between their purchase history data 335 . While many distance functions may be used, the classifier 33 in one embodiment uses the Minkowski distance for Euclidean space.
  • the Minkowski distance for an integer p may be represented by the following expression:

    $$d_p\left(CI^A, CI^B\right) = \left( \sum_i \left| CI^A_i - CI^B_i \right|^p \right)^{1/p}$$

  • where $CI^A$ represents the row in the standardized CI matrix for the customer A, $CI^B$ represents the row in the standardized CI matrix for the customer B, $CI^A_i$ represents the i-th element of row $CI^A$, and $CI^B_i$ represents the i-th element of row $CI^B$.
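  • As a small illustration, the Minkowski distance between two rows of the standardized CI matrix might be computed as below. The function name `minkowski_distance` and the default of p = 2 (the Euclidean case) are choices made for this sketch.

```python
def minkowski_distance(row_a, row_b, p=2):
    """Minkowski distance of order p between two standardized CI rows."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same number of categories")
    # Sum the p-th powers of the element-wise absolute differences, then take the p-th root.
    return sum(abs(a - b) ** p for a, b in zip(row_a, row_b)) ** (1.0 / p)


# Hypothetical standardized rows for customers A and B.
ci_a = [0.5, 0.0, 1.0]
ci_b = [0.5, 0.75, 0.0]
print(minkowski_distance(ci_a, ci_b, p=1))  # Manhattan case
print(minkowski_distance(ci_a, ci_b, p=2))  # Euclidean case
```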
  • the classifier 33 may alternatively utilize a distance function that provides a metric of the similarity between customers. In such an embodiment, the classifier 33 may attempt to maximize the sum of inner-similarities per cluster. For example, the classifier 33 may use Jaccard similarity functions, correlation functions, and/or some other similarity function in such an embodiment.
  • the classifier 33 may proceed to analyze the feature space and transaction space in order to identify clusters of customers with similar purchasing behaviors. To this end, the classifier 33 may iteratively divide customer sets into two partitions until a suitable number of partitions for the customer base is obtained. In particular, the classifier 33 may divide the feature space into two partitions that minimizes the inner-distance between members of the cluster in the transaction space by solving an Integer Program that takes into account both the feature space and transaction space of the customer base.
  • the following parameters, data, variables, and formulation define an Integer Program which may be solved to obtain a hyperplane that suitably divides the customer base into two clusters.
  • the above Integer Program, when solved by an Integer Programming solver of the classifier 33, returns the clustering of customers in the feature space together with hyperplane variables ⁇ and ⁇ 0 that define the division rule for the clusters.
  • the classifier 33 may use the division rule to place new customers into one of the defined clusters based on known demographic features. By doing so, the classifier 33 may obtain some insight into the likely purchasing behavior for a new customer despite not having much or any purchase history data for the new customer.
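  • Placing a new customer with a hyperplane division rule can be sketched as follows. The names `weights` and `offset` are hypothetical stand-ins for the hyperplane variables returned by the solver, and the convention that a non-negative score maps to cluster 1 is an assumption of this sketch.

```python
def assign_cluster(features, weights, offset):
    """Return 1 if the customer lies on the non-negative side of the hyperplane, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + offset
    return 1 if score >= 0 else 0


# Hypothetical rule over (age, gender) features: split customers at age 40.
weights = [1.0, 0.0]
offset = -40.0

new_customer_young = [34.0, 1.0]
new_customer_older = [51.0, 0.0]
print(assign_cluster(new_customer_young, weights, offset))
print(assign_cluster(new_customer_older, weights, offset))
```

  • Because the rule depends only on demographic features, it can be evaluated for a customer who has no purchase history at all.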
  • the above Integer Program divides the customer base into only two clusters or partitions, which is most likely too few clusters to provide meaningful insight into the purchasing behaviors of the customer base. Accordingly, the classifier 33 may iteratively apply the above Integer Program in order to further divide the clusters until a suitable number of clusters is obtained.
  • Such an iterative clustering method 600 is shown in FIG. 12 .
  • the classifier 33 at 610 may solve the above Integer Program to obtain a hyperplane that divides or partitions the customer base or data set into two partitions or clusters. After dividing the data set into two clusters, the classifier 33 at 620 may determine whether further partitioning of the data set is warranted. To this end, the classifier 33 may make such a determination based upon a stopping rule.
  • a stopping rule may define conditions for stopping further partitioning of the data set and for identifying which cluster or clusters to further divide.
  • a first example stopping rule may be to pre-define the desired number of clusters, and iteratively keep dividing the cluster with the largest population until the desired number of clusters is reached.
  • a second example stopping rule may be to define the largest population to be allowed in a single cluster, and keep dividing the clusters that are more populated than this limit until no cluster exceeds this limit. It should be appreciated that the above two stopping rules are merely examples and that other stopping rules and/or a combination of rules may be used by the classifier 33 to ascertain whether to cease partitioning and/or selecting which clusters to further partition.
  • If the stopping rule indicates that no further partitioning is warranted, the classifier 33 may cease further partitioning of the data set. However, if the classifier 33 determines that the stopping rule indicates further partitioning is warranted, then the classifier 33 at 630 may select a cluster for further partitioning based on the stopping rule. For example, the classifier 33 per the first example stopping rule may select the cluster having the largest population for further partitioning. If the second example stopping rule is being used, then the classifier 33 may select a cluster having a population greater than the predefined limit.
  • the classifier 33 may return to 610 in order to solve the Integer Program and obtain a hyperplane that partitions the selected cluster into two smaller clusters. In this manner, the classifier 33 may continue to obtain further partitions until a suitable number of partitions is achieved per the stopping rule in effect.
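  • The iterative loop of method 600 can be sketched with the second example stopping rule (a maximum cluster population). The splitter used here, `axis_split`, is only a stand-in: it brute-forces axis-aligned cuts in the feature space and keeps the cut with the smallest sum of within-cluster distances in the transaction space. It is not the Integer Program of the description, whose formulation is not reproduced in this text; any function with the same signature, such as a wrapper around an Integer Programming solver, could be passed as `split`. All names below are hypothetical.

```python
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two transaction-space rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def inner_distance(cluster, transactions, dist):
    """Sum of pairwise transaction-space distances among members of a cluster."""
    return sum(dist(transactions[a], transactions[b]) for a, b in combinations(cluster, 2))


def axis_split(cluster, features, transactions, dist):
    """Stand-in splitter (NOT the Integer Program): best axis-aligned cut in feature space."""
    best = None
    for d in range(len(features[cluster[0]])):
        values = sorted({features[i][d] for i in cluster})
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2  # candidate threshold between two adjacent feature values
            left = [i for i in cluster if features[i][d] <= cut]
            right = [i for i in cluster if features[i][d] > cut]
            cost = inner_distance(left, transactions, dist) + inner_distance(right, transactions, dist)
            if best is None or cost < best[0]:
                best = (cost, left, right)
    if best is None:
        raise ValueError("cluster cannot be split: all feature vectors are identical")
    return best[1], best[2]


def iterative_clustering(features, transactions, dist, max_population, split=axis_split):
    """Keep splitting the largest oversized cluster until none exceeds max_population."""
    clusters = [list(range(len(features)))]
    while True:
        oversized = [c for c in clusters if len(c) > max_population]
        if not oversized:
            return clusters  # stopping rule satisfied
        target = max(oversized, key=len)
        clusters.remove(target)
        clusters.extend(split(target, features, transactions, dist))


# Hypothetical data: one demographic feature and one standardized purchase column.
features = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
transactions = [[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]]
clusters = iterative_clustering(features, transactions, euclidean, max_population=3)
print(clusters)
```

  • Each cluster is a list of customer indices. In this toy data the cut between feature values 2 and 10 separates the two purchase patterns exactly, so a single split satisfies the stopping rule.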
  • Referring now to FIGS. 13-16, an example of partitioning a data set of customers per the method 600 is shown.
  • the example illustrates partitioning based on a stopping rule of the largest allowable cluster having a population of 3.
  • In FIG. 13, an unclustered data set of 9 customers in a two-dimensional feature space is shown.
  • FIG. 14 shows a hyperplane H 1 obtained by the classifier 33 as a result of solving the Integer Program in order to partition the 9 customers of FIG. 13 .
  • the lower partition has a data set of 3 customers and is thus not divided further per the stopping rule.
  • the upper partition defines a data set of 6 customers and thus exceeds the population limit of 3 for the stopping rule.
  • the classifier solves the Integer Program for the upper data set to obtain the hyperplane H 2 shown in FIG. 15 .
  • the upper left partition has a data set of 2 customers and is thus not divided further per the stopping rule.
  • the upper right partition defines a data set of 4 customers and thus still exceeds the population limit of 3 for the stopping rule.
  • the classifier solves the Integer Program for the upper right data set to obtain the hyperplane H 3 shown in FIG. 16 .
  • all partitions now have a population at or below the limit of 3.
  • the classifier 33 ceases further partitioning of the customer base per the stopping rule.
  • certain embodiments may be implemented as a plurality of instructions on a non-transitory, computer readable storage medium such as, for example, flash memory devices, hard disk devices, compact disc media, DVD media, EEPROMs, etc.
  • Such instructions, when executed by one or more computing devices, may result in the one or more computing devices identifying customer clusters based on purchase history data and demographic data for the customers.

Abstract

Methods and apparatus are disclosed regarding an e-commerce system that clusters customers based on demographic data and purchase history data for the customers. In some embodiments, the e-commerce system solves an Integer Program that accounts for the demographic data and purchase history data in order to identify a hyperplane that splits a selected cluster of customers.

Description

    FIELD OF THE INVENTION
  • Various embodiments relate to electronic commerce (e-commerce), and more particularly, to classifying customers in an e-commerce environment.
  • BACKGROUND OF THE INVENTION
  • Electronic commerce (e-commerce) websites are an increasingly popular venue for consumers to research and purchase products without physically visiting a conventional brick-and-mortar retail store. An e-commerce website may provide products and/or services to a vast number of customers. As a result of providing such products and/or services, the e-commerce website may obtain extensive amounts of data about their customer base. Such customer data may aid the e-commerce website to provide products and/or services that are relevant and/or otherwise desirable to a particular customer.
  • In particular, an e-commerce website may attempt to identify groups of customers with similar interests or similar lifestyles. The e-commerce website may analyze these identified groups to derive generalizations regarding members of the group. The e-commerce website may then tailor its services to members of each group based upon the derived generalizations.
  • Limitations and disadvantages of conventional and traditional approaches should become apparent to one of skill in the art, through comparison of such systems with aspects of the present invention as set forth in the remainder of the present application.
  • BRIEF SUMMARY OF THE INVENTION
  • Apparatus and methods of classifying or grouping customers are substantially shown in and/or described in connection with at least one of the figures, and are set forth more completely in the claims.
  • These and other advantages, aspects and novel features of the present invention, as well as details of an illustrated embodiment thereof, will be more fully understood from the following description and drawings.
  • BRIEF DESCRIPTION OF SEVERAL VIEWS OF THE DRAWINGS
  • FIG. 1 shows an e-commerce environment comprising a computing device and an e-commerce system in accordance with an embodiment of the present invention.
  • FIG. 2 shows an embodiment of a computing device for use in the e-commerce environment of FIG. 1.
  • FIG. 3 shows user profiles and product catalogs maintained by an e-commerce system of FIG. 1.
  • FIG. 4 shows an embodiment of a product listing provided by the e-commerce system of FIG. 1.
  • FIG. 5 shows a flowchart for an embodiment of a process that may be used by the e-commerce system of FIG. 1 to obtain a transaction space and a feature space from purchase history data and demographic data.
  • FIG. 6 shows an example entry of the purchase history data for the e-commerce system of FIG. 1.
  • FIG. 7 shows an example purchase history table for the e-commerce system of FIG. 1 after evaluating and retaining data of the purchase history data for a time window of interest.
  • FIG. 8 shows an example purchase history table for the e-commerce system of FIG. 1 after combining rows that correspond to the same customer and product category.
  • FIG. 9 shows an entry from an example customer-item (CI) matrix for the e-commerce system of FIG. 1.
  • FIG. 10 shows an example quantile table for the e-commerce system of FIG. 1.
  • FIG. 11 shows a standardized entry from the example quantile table of FIG. 10.
  • FIG. 12 shows a flowchart of a process that may be used by the e-commerce system of FIG. 1 to cluster customers based on the transaction space and feature space.
  • FIGS. 13-16 depict an example partitioning of a customer base.
  • DETAILED DESCRIPTION OF THE INVENTION
  • Aspects of the present invention are related to classifying and/or grouping customers together that exhibit similar interests, lifestyles, and/or purchase behavior. More specifically, certain embodiments of the present invention relate to apparatus, hardware and/or software systems, and associated methods that cluster customers based on solving an Integer Program that accounts for purchase history data and demographic data of the customers.
  • Referring now to FIG. 1, an e-commerce environment 10 is depicted. As shown, the e-commerce environment 10 may include a computing device 20 connected to an e-commerce system 30 via a network 40. The network 40 may include a number of private and/or public networks such as, for example, wireless and/or wired LAN networks, cellular networks, and the Internet that collectively provide a communication path and/or paths between the computing device 20 and the e-commerce system 30. The computing device 20 may include a desktop, a laptop, a tablet, a smart phone, and/or some other type of computing device which enables a user to communicate with the e-commerce system 30 via the network 40. The e-commerce system 30 may include one or more web servers, database servers, routers, load balancers, and/or other computing and/or networking devices that operate to provide an e-commerce experience for users that connect to the e-commerce system 30 via the computing device 20 and the network 40.
  • The e-commerce system 30 may further include a customer classifier 33, one or more tailored services 35, and one or more electronic databases 37 upon which are stored purchase history data 38 and demographic data 39 for customers of the e-commerce system 30. The classifier 33 may include one or more firmware and/or software instructions, routines, modules, etc. that the e-commerce system 30 may execute in order to classify, group, or cluster customers of the e-commerce system 30 into classes, groups, or clusters of customers that exhibit similar purchasing habits. The classifier 33 may analyze purchase history data and demographic data for the customers to identify clusters of customers with similar purchasing preferences.
  • The tailored services 35 may comprise one or more firmware and/or software instructions, routines, modules, etc. that the e-commerce system 30 may execute in order to tailor one or more aspects of the e-commerce system 30 for a particular customer. The tailored services 35 may include advertisements, promotions, product recommendations, email campaigns, etc. that are tailored based upon the cluster to which the customer has been placed.
  • The classifier 33 and tailored services 35 may be executed concurrently by a single computing device of the e-commerce system 30. However, in some embodiments, a computing device may execute the classifier 33 offline in order to obtain appropriate clusters and other input data for the tailored services 35. Moreover, the classifier 33 may periodically (e.g., once an hour, once a day, once a week, etc.) provide one or more of the tailored services 35 with updated cluster and other input data. In this manner, the e-commerce system 30 may continue to provide tailored services 35 without the constant overhead of the classifier 33 and/or without the overhead of constant updates. For example, the e-commerce system 30 may execute the classifier 33 only during generally idle periods (e.g., after normal business hours). Further details regarding the classifier 33 and the tailored services 35 are presented below in regard to FIGS. 5-11.
  • FIG. 1 depicts a simplified embodiment of the e-commerce environment 10 which may be implemented in numerous different manners using a wide range of different computing devices, platforms, networks, etc. Moreover, while aspects of the e-commerce environment 10 may be implemented using a client/server architecture, aspects of the e-commerce environment 10 may also be implemented using a peer-to-peer architecture or another networking architecture.
  • As noted above, the e-commerce system 30 may include one or more computing devices. FIG. 2 depicts an embodiment of a computing device 50 suitable for the computing device 20 and/or the e-commerce system 30. As shown, the computing device 50 may include a processor 51, a memory 53, a mass storage device 55, a network interface 57, and various input/output (I/O) devices 59. The processor 51 may be configured to execute instructions, manipulate data and generally control operation of other components of the computing device 50 as a result of its execution. To this end, the processor 51 may include a general purpose processor such as an x86 processor or an ARM processor which are available from various vendors. However, the processor 51 may also be implemented using an application specific processor and/or other logic circuitry.
  • The memory 53 may store instructions and/or data to be executed and/or otherwise accessed by the processor 51. In some embodiments, the memory 53 may be completely and/or partially integrated with the processor 51.
  • In general, the mass storage device 55 may store software and/or firmware instructions which may be loaded in memory 53 and executed by processor 51. The mass storage device 55 may further store various types of data which the processor 51 may access, modify, and/or otherwise manipulate in response to executing instructions from memory 53. To this end, the mass storage device 55 may comprise one or more redundant array of independent disks (RAID) devices, traditional hard disk drives (HDD), solid-state drives (SSD), flash memory devices, read only memory (ROM) devices, etc.
  • The network interface 57 may enable the computing device 50 to communicate with other computing devices directly and/or via network 40. To this end, the network interface 57 may include a wired networking interface such as an Ethernet (IEEE 802.3) interface, a wireless networking interface such as a WiFi (IEEE 802.11) interface, a radio or mobile interface such as a cellular interface (GSM, CDMA, LTE, etc.), and/or some other type of networking interface capable of providing a communications link between the computing device 50 and network 40 and/or another computing device.
  • Finally, the I/O devices 59 may generally provide devices which enable a user to interact with the computing device 50 by either receiving information from the computing device 50 and/or providing information to the computing device 50. For example, the I/O devices 59 may include display screens, keyboards, mice, touch screens, microphones, audio speakers, etc.
  • While the above provides general aspects of a computing device 50, those skilled in the art readily appreciate that there may be significant variation in actual implementations of a computing device. For example, a smart phone implementation of a computing device may use vastly different components and may have a vastly different architecture than a database server implementation of a computing device. However, despite such differences, computing devices generally include processors that execute software and/or firmware instructions in order to implement various functionality. As such, aspects of the present application may find utility across a vast array of different computing devices and the intention is not to limit the scope of the present application to a specific computing device and/or computing platform beyond any such limits that may be found in the appended claims.
  • As part of the provided e-commerce experience, the e-commerce system 30 may enable customers, which may be guests or members of the e-commerce system 30, to browse and/or otherwise locate products. The e-commerce system 30 may further enable such customers to purchase products offered for sale. To this end, the e-commerce system 30 may maintain an electronic product database or product catalog 300 which may be stored on an associated mass storage device 55. As shown in FIG. 3, the product catalog 300 includes product listings 310 for each product available for purchase. Each product listing 310 may include various information or attributes regarding the respective product, such as a unique product identifier (e.g., stock-keeping unit “SKU”), a product description, product image(s), manufacturer information, available quantity, price, product features, etc. Moreover, while the e-commerce system 30 may enable guests to purchase products without registering and/or otherwise signing up for a membership, the e-commerce system 30 may provide additional and/or enhanced functionality to users who become members.
  • To this end, the e-commerce system 30 may enable members to create a customer profile 330. As shown, a customer profile 330 may include personal information 331, purchase history data 335, and other customer activity data 337. The personal information 331 may include such items as name, mailing address, email address, phone number, billing information, clothing sizes, birthdates of friends and family, etc. The purchase history data 335 may include information regarding products previously purchased by the customer from the e-commerce system 30. The purchase history data 335 may further include products previously purchased from affiliated online and brick-and-mortar vendors.
  • The other customer activity data 337 may include information regarding prior customer activities, such as products for which the customer has previously searched, products the customer has previously viewed, products on which the customer has provided comments, products the customer has rated, products for which the customer has written reviews, and/or products the customer has purchased from the e-commerce system 30. The other customer activity data 337 may further include similar activities associated with affiliated online and brick-and-mortar vendors.
  • As part of the e-commerce experience, the e-commerce system 30 may cause a computing device 20 to display a product listing 310 as shown in FIG. 4. In particular, the e-commerce system 30 may provide such a product listing 310 in response to a member browsing products by type, price, kind, etc., viewing a list of products obtained from a product search, and/or other techniques supported by the e-commerce system 30 for locating products of interest. As shown, the product listing 310 may include one or more representative images 350 of the product as well as a product description 360. The product listing 310 may further include one or more products 370 recommended by a recommendation engine of the tailored services 35. In particular, the recommendation engine may provide product recommendations based on the personal information 331, purchase history data 335 and/or activity data 337.
  • Referring now to FIG. 5, an example method 500 that may be implemented by the classifier 33 of the e-commerce system 30 is shown. In general, the classifier 33 in accordance with the method 500 respectively transforms the purchase history data and demographic data into a transaction space and feature space which the classifier 33 may use to partition or cluster the customer base as shown and discussed below in regard to FIG. 12. To this end, the classifier 33 at 510 may preprocess purchase history data 335 to obtain a Customer-Item (CI) matrix. The e-commerce system 30 may collect and maintain purchase history data 335 for each customer over a period of time. The purchase history data, in its raw form, may include information recorded for each purchase. An example entry is shown in FIG. 6. As shown, the e-commerce system 30 may maintain the purchase history data 335 in one or more relational database tables. The purchase history table may include a row for each transaction, and each row may include a customer identifier (ID) that uniquely identifies the customer associated with the corresponding transaction.
  • At 510, the classifier 33 may preprocess the raw purchase history information found in the purchase history table into a Customer-Item space. To this end, the classifier 33 may select a time window (e.g., the most recent 24 months). The classifier 33 may extract entries from the purchase history table that have a transaction date that falls within the selected time window. The classifier 33 may then discard all fields other than the Customer ID, Item ID and Quantity of that particular item purchased in that transaction.
  • Many e-commerce sites maintain a product hierarchy of product identifiers where the Item ID corresponds to the lowest level of such hierarchy and various Category IDs lie higher up in the product hierarchy. Moreover, in many environments, the Item IDs are at such a fine granularity that correlations between purchases may be lost. In such situations, the classifier 33 may be configured to coalesce purchased items of multiple Item IDs under a single Category ID that lies at a higher level in the product hierarchy.
  • FIG. 7 shows an example table after evaluating the time window as described above. As may be seen from FIG. 7, the resulting table may still include multiple entries or rows for each Customer ID and Category ID pair. The classifier 33 may apply a pivoting step to the resulting table in order to combine rows having the same Customer ID and Category ID pair into a single row. As shown in FIG. 8, the resulting table includes a single row for each Customer ID and Category ID pair and includes Quantity data that contains the sum of all purchased quantities for this ID pair.
  • From the table shown in FIG. 8, the classifier 33 may create a Customer-Item (CI) matrix. In the CI matrix, each row i corresponds to a unique Customer ID, each column j corresponds to a unique Category ID, and the entry $CI_{ij}$ corresponds to the quantity for this Customer ID and Category ID pair from the table shown in FIG. 8. If a particular customer did not purchase any product in a category of the CI matrix, then the corresponding entry is zero.
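  • The window filter, pivot, and matrix layout described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the row layout, the sample purchases, the window dates, and the name `build_ci_matrix` are assumptions made for the example and do not come from the figures.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw purchase rows: (customer_id, category_id, quantity, transaction_date).
purchases = [
    ("c1", "tools", 1, date(2013, 5, 1)),
    ("c1", "tools", 2, date(2013, 6, 9)),
    ("c2", "apparel", 4, date(2012, 12, 24)),
    ("c2", "tools", 1, date(2010, 1, 15)),  # falls outside the window below
]


def build_ci_matrix(rows, window_start, window_end):
    """Keep rows inside the time window, sum quantities per
    (customer, category) pair, and lay the sums out as a dense matrix."""
    totals = defaultdict(int)
    for customer, category, quantity, when in rows:
        if window_start <= when <= window_end:
            totals[(customer, category)] += quantity  # pivoting step
    customers = sorted({c for c, _ in totals})
    categories = sorted({k for _, k in totals})
    # Pairs with no purchases in the window get a zero entry.
    matrix = [[totals.get((c, k), 0) for k in categories] for c in customers]
    return customers, categories, matrix


customers, categories, ci = build_ci_matrix(
    purchases, date(2011, 11, 20), date(2013, 11, 20)
)
```

  • In this sketch, rows index customers and columns index categories, so `ci[i][j]` plays the role of the entry for customer i and category j.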
  • At 515, the classifier 33 may further preprocess the demographic data of its customers to obtain a feature space. The e-commerce system 30 may collect demographic data from customers such as personal information 331 provided in the customer profile 330. The e-commerce system 30 may further obtain demographic data for customers from various providers of demographic data. Based on such collected demographic data, the classifier 33 may maintain and/or create a demographic table. The demographic table may include a row for each Customer ID. Moreover, each column of the table may represent a different feature such as, for example, age, gender, occupation, number of children, etc. During preprocessing, the classifier 33 may turn each demographic entry into a numerical value. For example, the “Gender” column may contain only two kinds of entries, male and female. The classifier 33 may preprocess the demographic table such that the Gender column includes a 1 for each female customer and a 0 otherwise. The preprocessed demographic table may form the feature space for later classification.
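  • As a rough illustration of turning demographic entries into numerical values, the sketch below encodes a small, made-up demographic table. The feature names, the sample rows, and the helper `encode_row` are hypothetical; only the Gender encoding (1 for female, 0 otherwise) is taken from the text.

```python
# Hypothetical demographic rows keyed by Customer ID.
demographics = {
    "c1": {"age": 34, "gender": "female", "children": 2},
    "c2": {"age": 51, "gender": "male", "children": 0},
}

# Column order of the resulting feature space (an assumption for this sketch).
FEATURES = ("age", "gender", "children")


def encode_row(row):
    """Turn one demographic row into a numeric coordinate vector."""
    vector = []
    for name in FEATURES:
        value = row[name]
        if name == "gender":
            # 1 for each female customer and 0 otherwise, as described above.
            value = 1 if value == "female" else 0
        vector.append(float(value))
    return vector


feature_space = {cid: encode_row(row) for cid, row in demographics.items()}
```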
  • After preprocessing the purchase history and demographic data, the classifier 33 at 520 may standardize the CI matrix to obtain a standardized CI matrix which is referred to as transaction space. Standardizing the CI Matrix may ensure that the columns of the standardized CI matrix are scale-wise comparable with each other. In one embodiment, the classifier 33 applies standardization to each column separately using a bin quantiles standardization (BQS) technique. However, other standardization techniques may be utilized.
  • To illustrate the BQS technique, one example column of the CI matrix is shown in FIG. 9. If the depicted column corresponds to a category ID CID in the CI matrix, then the information in the column suggests that customer 1 bought 1 unit of an item corresponding to the category ID, customer 4 bought 2 items, customer 6 bought 1 item, and customer 7 bought 8 items. The classifier 33 in accordance with the BQS technique may traverse the column and record every unique quantity except zero that appears, along with how many times each unique quantity appears in the column. The classifier 33 may sort the results based on occurrence of each unique quantity. See, e.g., the Occurrences column of FIG. 10. The classifier 33 may traverse the occurrences to obtain a cumulative sum of the number of occurrences. See, e.g., the Cumulative Occurrences column of FIG. 10. Furthermore, the classifier 33 for each row may divide the respective cumulative occurrence value by the last number in the cumulative occurrence column (i.e., the total number of occurrences) to obtain the quantile value for that row. See, e.g., the Quantile column of FIG. 10.
  • The BQS result shown in FIG. 10 suggests that the customers who bought 1 item associated with the category ID constitute the first 50% quantile, customers who bought 2 or fewer such items are the 75% quantile, and customers who bought 8 or fewer such items are the 100% quantile. The classifier 33 may then update the quantity values of the original column with their corresponding quantile values as shown in FIG. 11 to obtain the standardized column.
  • The BQS technique may provide two advantages. First, all the numbers in the columns of the CI matrix are guaranteed to be between 0 and 1, so the purchase patterns of high-frequency items such as grocery items and low-frequency items such as expensive electronics items are comparable. Second, because the quantile values depend on how frequently each number appears and on the relative order of the numbers rather than on their nominal values, an occasional very large number observed in a column does not skew the analysis.
  • After obtaining the feature space and the standardized transaction space, the classifier 33 may classify or cluster the customers. In particular, the classifier 33 may attempt to find linear partitions in the feature space that divide the data points (customers) into groups or clusters with the smallest sum of distances within themselves. The distances are defined using the standardized transaction space.
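  • The BQS steps for a single column can be sketched as below. The seven-entry column is an assumption that places the quantities from the example (customers 1, 4, 6, and 7) among zeros, and the sketch visits the unique quantities in ascending order, which is one reading of the sorting step that reproduces the 50%, 75%, and 100% quantiles described above. The function name `bqs_standardize` is hypothetical.

```python
from collections import Counter


def bqs_standardize(column):
    """Replace each nonzero quantity with its quantile; zeros stay zero."""
    # Count how often each unique nonzero quantity appears.
    counts = Counter(q for q in column if q != 0)
    total = sum(counts.values())
    quantile = {}
    running = 0
    # Assumption: unique quantities are visited in ascending order.
    for quantity in sorted(counts):
        running += counts[quantity]            # cumulative occurrences
        quantile[quantity] = running / total   # divide by total occurrences
    return [quantile.get(q, 0.0) for q in column]


# Customers 1..7; customers 1, 4, 6, and 7 bought 1, 2, 1, and 8 items.
column = [1, 0, 0, 2, 0, 1, 8]
standardized = bqs_standardize(column)
# standardized == [0.5, 0.0, 0.0, 0.75, 0.0, 0.5, 1.0]
```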
  • The distance between customer A and customer B is a measure of the dissimilarity between their purchase history data 335. While many distance functions may be used, the classifier 33 in one embodiment uses the Minkowski distance for Euclidean space. The Minkowski distance for an integer p may be represented by the following expression:

  • $$\left(\sum_{i=1}^{n}\left|CI_{A,i}-CI_{B,i}\right|^{p}\right)^{1/p}$$
  • where $CI_A$ represents the row in the standardized CI matrix for the customer A; $CI_B$ represents the row in the standardized CI matrix for the customer B; $CI_{A,i}$ represents the ith element of row $CI_A$; and $CI_{B,i}$ represents the ith element of row $CI_B$. The cases where p=1 and p=2 correspond to the Manhattan distance and Euclidean distance, respectively.
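  • A minimal sketch of this distance between two standardized CI rows is given below. The function name `minkowski` and the two sample rows are illustrative assumptions.

```python
def minkowski(row_a, row_b, p=2):
    """Minkowski distance of order p between two equally long rows."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same number of columns")
    return sum(abs(a - b) ** p for a, b in zip(row_a, row_b)) ** (1.0 / p)


# Hypothetical standardized rows for customers A and B.
row_a = [0.5, 0.0, 1.0]
row_b = [0.5, 0.75, 0.0]

manhattan = minkowski(row_a, row_b, p=1)  # 0 + 0.75 + 1.0
euclidean = minkowski(row_a, row_b, p=2)  # sqrt(0.5625 + 1.0)
```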
  • The classifier 33 may alternatively utilize a distance function that provides a metric of the similarity between customers. In such an embodiment, the classifier 33 may attempt to maximize the sum of inner-similarities per cluster. For example, the classifier 33 may use Jaccard similarity functions, correlation functions, and/or some other similarity function in such an embodiment.
  • After obtaining the feature space and transaction space, the classifier 33 may proceed to analyze the feature space and transaction space in order to identify clusters of customers with similar purchasing behaviors. To this end, the classifier 33 may iteratively divide customer sets into two partitions until a suitable number of partitions for the customer base is obtained. In particular, the classifier 33 may divide the feature space into two partitions that minimize the inner-distance between members of each cluster in the transaction space by solving an Integer Program that takes into account both the feature space and transaction space of the customer base.
  • In one embodiment, the following parameters, data, variables, and formulation define an Integer Program which may be solved to obtain a hyperplane that suitably divides the customer base into two clusters.
  • Parameters and Data:
      • $n$ = number of customers;
      • $m$ = number of dimensions in feature space;
      • $x_i$ = length-$m$ coordinate vector of customer $i$ in feature space for $i = 1 \ldots n$;
      • $d_{ij}$ = distance between customers $i$ and $j$ in transaction space according to a pre-selected distance metric;
      • $C$ = a large constant; and
      • $\varepsilon$ = a small constant (epsilon).
  • Variables:
      • $I_i$ = indicator variable of customer $i$, which is one if customer $i$ is in cluster 1 (one side of the optimum hyperplane), and zero if the customer is in cluster 2 (the other side of the hyperplane) in the feature space.
      • $J_{ij}$ = indicator variable for customer pair $(i,j)$, which is equal to one if $i$ and $j$ are in the same cluster, and zero if they are in different clusters.
      • $\beta$ = the length-$m$ direction vector in feature space that defines the direction of the dividing hyperplane.
      • $\beta_0$ = scalar intercept of the dividing hyperplane.
  • Formulation:
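      • Minimize:
  • $$\sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij}\,J_{ij}$$
      • Subject to:

  • $$\beta x_i + \beta_0 \le (1 - I_i)\cdot C \quad \forall i$$

  • $$-\beta x_i - \beta_0 \le I_i\cdot C - \varepsilon \quad \forall i$$

  • $$I_i - I_j \le 1 - J_{ij} \quad \forall i,j$$

  • $$I_i + I_j \le 1 - J_{ij} \quad \forall i,j$$

  • $$I_i \in \{0,1\} \quad \forall i$$

  • $$0 \le J_{ij} \le 1 \quad \forall i,j$$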
  • The above Integer Program, when solved by an Integer Programming solver of the classifier 33, returns the clustering of customers in the feature space together with hyperplane variables β and β0 that define the division rule for the clusters. The classifier 33 may use the division rule to place new customers into one of the defined clusters based on known demographic features. By doing so, the classifier 33 may obtain some insight into the likely purchasing behavior for a new customer despite not having much or any purchase history data for the new customer.
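  • To make the roles of the hyperplane variables concrete, the sketch below scores a candidate hyperplane directly from the variable definitions rather than calling an Integer Programming solver: it derives the side indicator for each customer from the sign of β·x + β0, treats a pair as sharing a cluster when the indicators match, and sums the transaction-space distances over such pairs. The function names, the sample coordinates, the distance matrix, and the choice to assign points with β·x + β0 ≤ 0 to cluster 1 are assumptions for illustration. The same `side` function doubles as the division rule for a new customer.

```python
def side(beta, beta0, x):
    """Indicator for one customer: 1 if beta.x + beta0 <= 0, else 0.

    Assumption: cluster 1 is the non-positive side of the hyperplane.
    """
    value = sum(b * xi for b, xi in zip(beta, x)) + beta0
    return 1 if value <= 0 else 0


def split_cost(beta, beta0, xs, d):
    """Return the indicators and the summed distances of same-cluster pairs.

    The double sum runs over all ordered pairs (i, j), so each unordered
    pair is counted twice, mirroring the double summation in the objective.
    """
    indicators = [side(beta, beta0, x) for x in xs]
    n = len(xs)
    cost = sum(
        d[i][j]
        for i in range(n)
        for j in range(n)
        if indicators[i] == indicators[j]
    )
    return indicators, cost


# Hypothetical feature-space coordinates and transaction-space distances.
xs = [[1, 1], [2, 1], [8, 9], [9, 8]]
d = [
    [0, 1, 5, 6],
    [1, 0, 4, 5],
    [5, 4, 0, 2],
    [6, 5, 2, 0],
]

good = split_cost([1, 1], -10, xs, d)    # separates {0, 1} from {2, 3}
poor = split_cost([1, 0], -1.5, xs, d)   # isolates customer 0 only

# Division rule applied to a new customer known only by demographics.
new_customer_cluster = side([1, 1], -10, [3, 2])
```

  • A solver would search over β and β0 for the split with the lowest cost; this sketch only compares two hand-picked candidates.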
  • The above Integer Program, however, divides the customer base into only two clusters or partitions, which is most likely too few clusters to provide meaningful insight into the purchasing behaviors of the customer base. Accordingly, the classifier 33 may iteratively apply the above Integer Program in order to further divide the clusters until a suitable number of clusters is obtained. Such an iterative clustering method 600 is shown in FIG. 12.
  • At 610, the classifier 33 may solve the above Integer Program to obtain a hyperplane that divides or partitions the customer base or data set into two partitions or clusters. After dividing the data set into two clusters, the classifier 33 at 620 may determine whether further partitioning of the data set is warranted. To this end, the classifier 33 may make such a determination based upon a stopping rule. A stopping rule may define conditions for stopping further partitioning of the data set and for identifying which cluster or clusters to further divide. A first example stopping rule may be to pre-define the desired number of clusters, and iteratively keep dividing the cluster with the largest population until the desired number of clusters is reached. A second example stopping rule may be to define the largest population to be allowed in a single cluster, and keep dividing the clusters that are more populated than this limit until no cluster exceeds this limit. It should be appreciated that the above two stopping rules are merely examples and that other stopping rules and/or a combination of rules may be used by the classifier 33 to ascertain whether to cease partitioning and/or to select which clusters to further partition.
  • If the classifier 33 determines that no further partitioning is warranted, then the classifier 33 may cease further partitioning of the data set. However, if the classifier 33 determines that the stopping rule indicates further partitioning is warranted, then the classifier 33 at 630 may select a cluster for further partitioning based on the stopping rule. For example, the classifier 33 per the first example stopping rule may select the cluster having the largest population for further partitioning. If the second example stopping rule is being used, then the classifier 33 may select a cluster having a population greater than the predefined limit.
  • After selecting an appropriate cluster for further partitioning, the classifier 33 may return to 610 in order to solve the Integer Program and obtain a hyperplane that partitions the selected cluster into two smaller clusters. In this manner, the classifier 33 may continue to obtain further partitions until a suitable number of partitions is achieved per the stopping rule in effect.
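  • The loop of method 600 with the two example stopping rules can be sketched as follows. The splitter is passed in as a function; the `halve` splitter used here is only a stand-in that cuts a sorted list in two and is not the Integer Program. The names `iterative_split`, `halve`, `max_population`, and `target_clusters` are assumptions, and always picking the largest cluster is one valid selection under either rule.

```python
def iterative_split(items, split, max_population=None, target_clusters=None):
    """Split clusters repeatedly until the stopping rule in effect is met.

    `split` takes a list and returns two non-empty lists. Exactly one of
    `max_population` (second example rule) or `target_clusters` (first
    example rule) must be given.
    """
    if (max_population is None) == (target_clusters is None):
        raise ValueError("specify exactly one stopping rule")
    clusters = [list(items)]
    while True:
        largest = max(clusters, key=len)
        if target_clusters is not None:
            done = len(clusters) >= target_clusters
        else:
            done = len(largest) <= max_population
        # A single-member cluster cannot be divided any further.
        if done or len(largest) < 2:
            return clusters
        clusters.remove(largest)
        clusters.extend(split(largest))


def halve(cluster):
    """Stand-in splitter: cut the sorted cluster into two halves."""
    ordered = sorted(cluster)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


by_limit = iterative_split(range(9), halve, max_population=3)
by_count = iterative_split(range(9), halve, target_clusters=3)
```

  • Swapping `halve` for a function that solves the Integer Program on the selected cluster would yield the hyperplane-based partitions described above.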
  • Referring now to FIGS. 13-16, an example of partitioning a data set of customers per the method 600 is shown. In particular, the example illustrates partitioning based on a stopping rule of the largest allowable cluster having a population of 3. Starting with FIG. 13, an unclustered data set of 9 customers in a two dimensional feature space is shown. FIG. 14 shows a hyperplane H1 obtained by the classifier 33 as a result of solving the Integer Program in order to partition the 9 customers of FIG. 13. After such partitioning of FIG. 14, the lower partition has a data set of 3 customers and is thus not divided further per the stopping rule. The upper partition, however, defines a data set of 6 customers and thus exceeds the population limit of 3 for the stopping rule. As such, the classifier solves the Integer Program for the upper data set to obtain the hyperplane H2 shown in FIG. 15.
  • After such partitioning of FIG. 15, the upper left partition has a data set of 2 customers and is thus not divided further per the stopping rule. The upper right partition, however, defines a data set of 4 customers and thus still exceeds the population limit of 3 for the stopping rule. As such, the classifier solves the Integer Program for the upper right data set to obtain the hyperplane H3 shown in FIG. 16. After such partitioning of FIG. 16, no partition exceeds the population limit of 3. As such, the classifier 33 ceases further partitioning of the customer base per the stopping rule.
  • Various embodiments of the invention have been described herein by way of example and not by way of limitation in the accompanying figures. For clarity of illustration, exemplary elements illustrated in the figures may not necessarily be drawn to scale. In this regard, for example, the dimensions of some of the elements may be exaggerated relative to other elements to provide clarity. Furthermore, where considered appropriate, reference labels have been repeated among the figures to indicate corresponding or analogous elements.
  • Moreover, certain embodiments may be implemented as a plurality of instructions on a non-transitory, computer readable storage medium such as, for example, flash memory devices, hard disk devices, compact disc media, DVD media, EEPROMs, etc. Such instructions, when executed by one or more computing devices, may result in the one or more computing devices identifying customer clusters based on purchase history data and demographic data for the customer.
  • While the present invention has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the present invention. For example, the above embodiments were described primarily from the standpoint of an e-commerce environment. However, it should be appreciated that clustering of customers may be useful in other environments as well. For example, a brick-and-mortar store may cluster customers in order to provide targeted mailing, coupons, and/or other types of promotions to its customers. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present invention without departing from its scope. Therefore, it is intended that the present invention not be limited to the particular embodiment or embodiments disclosed, but that the present invention encompasses all embodiments falling within the scope of the appended claims.

Claims (19)

What is claimed is:
1. A computer-implemented method, comprising:
iteratively splitting a plurality of customers into a plurality of clusters based on purchase history data and demographic data for the plurality of customers, wherein each iteration comprises selecting a cluster of customers and splitting the selected cluster until a stopping rule is satisfied; and
tailoring services provided to a customer based on a cluster from the plurality of clusters in which the customer resides.
2. The computer-implemented method of claim 1, wherein the tailoring comprises providing product recommendations based on the cluster in which the customer resides.
3. The computer-implemented method of claim 1, wherein the tailoring comprises providing product promotions based on the cluster in which the customer resides.
4. The computer-implemented method of claim 1, wherein the tailoring comprises providing coupons based on the cluster in which the customer resides.
5. The computer-implemented method of claim 1, wherein the tailoring comprises providing coupons based on the cluster in which the customer resides.
6. The computer-implemented method of claim 1, wherein the splitting comprises solving an Integer Program that accounts for the purchase history data and demographic data for the selected cluster.
7. The computer-implemented method of claim 1, wherein the selecting comprises selecting a cluster that has a population greater than a limit specified by the stopping rule.
8. The computer-implemented method of claim 1, wherein the selecting comprises selecting a cluster that has the largest population of the plurality of clusters.
9. The computer-implemented method of claim 1, further comprising ceasing the iteratively splitting in response to determining the plurality of clusters comprises a quantity of clusters desired by the stopping rule.
10. The computer-implemented method of claim 1, further comprising ceasing the iteratively splitting in response to determining that no cluster of the plurality of clusters exceeds a population limit specified by the stopping rule.
11. The computer-implemented method of claim 1, wherein the splitting comprises solving the following Integer Program:
Minimize:

$$\sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij} J_{ij}$$

Subject to:

$$\beta x_i + \beta_0 \le (1 - I_i) \cdot C \quad \forall i$$

$$-\beta x_i - \beta_0 \le I_i \cdot C - \varepsilon \quad \forall i$$

$$I_i - I_j \le 1 - J_{ij} \quad \forall i, j$$

$$I_i + I_j \le 1 - J_{ij} \quad \forall i, j$$

$$I_i \in \{0, 1\} \quad \forall i$$

$$0 \le J_{ij} \le 1 \quad \forall i, j$$

where $n$ is a number of customers; $m$ is a number of dimensions in a feature space defined by demographic data for the number of customers; $x_i$ is a length-$m$ coordinate vector of customer $i$ in the feature space for $i = 1 \ldots n$; $d_{ij}$ is a distance between customers $i$ and $j$ in transaction space according to a pre-selected distance metric; $C$ is a large constant; $\varepsilon$ is a small constant; $I_i$ is an indicator variable of customer $i$, which is one if customer $i$ is in cluster 1 (one side of the optimum hyperplane), and zero if the customer is in cluster 2 (the other side of the hyperplane) in the feature space; $J_{ij}$ is an indicator variable for customer pair $(i, j)$, which is equal to one if $i$ and $j$ are in the same cluster, and zero if they are in different clusters; $\beta$ is a length-$m$ direction vector in the feature space that defines the direction of the dividing hyperplane; and $\beta_0$ is a scalar intercept of the dividing hyperplane.
12. A non-transitory computer-readable medium, comprising a plurality of instructions, that in response to being executed, result in a computing device:
iteratively splitting a plurality of customers into a plurality of clusters based on purchase history data and demographic data for the plurality of customers, wherein each iteration comprises selecting a cluster of customers and splitting the selected cluster until a stopping rule is satisfied; and
tailoring services provided to a customer based on a cluster from the plurality of clusters in which the customer resides.
13. The non-transitory computer-readable medium of claim 12, further comprising instructions that result in the computing device splitting the selected cluster by solving an Integer Program that accounts for the purchase history data and demographic data for the selected cluster.
14. The non-transitory computer-readable medium of claim 12, further comprising instructions that result in the computing device:
selecting a cluster that has a population greater than a limit specified by the stopping rule; and
ceasing the iteratively splitting in response to determining that no cluster of the plurality of clusters exceeds the limit specified by the stopping rule.
15. The non-transitory computer-readable medium of claim 12, further comprising instructions that result in the computing device:
selecting a cluster that has the largest population of the plurality of clusters; and
ceasing the iteratively splitting in response to determining the plurality of clusters comprises a quantity of clusters desired by the stopping rule.
16. A computing device, comprising:
an electronic database comprising demographic data and purchase history data for a plurality of customers; and
a processor configured to:
iteratively split a plurality of customers into a plurality of clusters based on purchase history data and demographic data for the plurality of customers, wherein each iteration comprises selecting a cluster of customers and splitting the selected cluster until a stopping rule is satisfied; and
tailor services provided to a customer based on a cluster from the plurality of clusters in which the customer resides.
17. The computing device of claim 16, wherein the processor is further configured to split the selected cluster by solving an Integer Program that accounts for the purchase history data and demographic data for the selected cluster.
18. The computing device of claim 16, wherein the processor is further configured to:
select a cluster that has a population greater than a limit specified by the stopping rule; and
cease further splitting in response to determining that no cluster of the plurality of clusters exceeds the limit specified by the stopping rule.
19. The computing device of claim 16, wherein the processor is further configured to:
select a cluster that has the largest population of the plurality of clusters; and
cease further iteratively splitting in response to determining the plurality of clusters comprises a quantity of clusters desired by the stopping rule.
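The iterative splitting and stopping rules recited in the claims above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the function names (`within_cost`, `split_cluster`, `iterative_split`) and parameters (`max_clusters`, `max_population`) are hypothetical, and the split step does not solve the Integer Program of claim 11. Instead it substitutes an exhaustive search over axis-aligned thresholds in the demographic feature space, which is a restricted form of dividing hyperplane, and keeps the split with the lowest total within-cluster distance in transaction space.

```python
from itertools import combinations


def within_cost(members, dist):
    """Sum of pairwise transaction-space distances inside one cluster."""
    return sum(dist[i][j] for i, j in combinations(members, 2))


def split_cluster(members, features, dist):
    """Split one cluster with an axis-aligned threshold (assumed stand-in
    for the Integer Program). Returns (left, right), or None when the
    cluster cannot be separated on any feature dimension."""
    if len(members) < 2:
        return None
    best = None
    for k in range(len(features[members[0]])):
        values = sorted({features[i][k] for i in members})
        # Candidate thresholds lie midway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [i for i in members if features[i][k] <= threshold]
            right = [i for i in members if features[i][k] > threshold]
            cost = within_cost(left, dist) + within_cost(right, dist)
            if best is None or cost < best[0]:
                best = (cost, left, right)
    return None if best is None else (best[1], best[2])


def iterative_split(features, dist, max_clusters=None, max_population=None):
    """Repeatedly select the largest cluster and split it until a stopping
    rule is met: a desired cluster count, or no cluster over a size limit."""
    if max_clusters is None and max_population is None:
        raise ValueError("specify at least one stopping rule")
    open_clusters = [list(range(len(features)))]
    closed = []  # clusters that could not be split any further
    while open_clusters:
        if max_clusters is not None and len(open_clusters) + len(closed) >= max_clusters:
            break
        open_clusters.sort(key=len)
        selected = open_clusters.pop()  # largest remaining cluster
        if max_population is not None and len(selected) <= max_population:
            # The largest cluster is within the limit, so every cluster is.
            open_clusters.append(selected)
            break
        halves = split_cluster(selected, features, dist)
        if halves is None:
            closed.append(selected)
        else:
            open_clusters.extend(halves)
    return closed + open_clusters
```

Here `features` is a list of per-customer demographic vectors and `dist` is a precomputed matrix of pairwise distances in transaction space, so `iterative_split(features, dist, max_clusters=2)` returns two lists of customer indices. Replacing `split_cluster` with a call to an integer-programming solver would bring the sketch closer to claim 11 without changing the outer loop.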
US14/084,903 2013-11-20 2013-11-20 Customer clustering using integer programming Abandoned US20150142521A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US14/084,903 US20150142521A1 (en) 2013-11-20 2013-11-20 Customer clustering using integer programming
US16/366,542 US11288688B2 (en) 2013-11-20 2019-03-27 Customer clustering using integer programming
US17/705,483 US11823218B2 (en) 2013-11-20 2022-03-28 Customer clustering using integer programming
US18/499,613 US20240070694A1 (en) 2013-11-20 2023-11-01 Customer clustering using integer programming

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/084,903 US20150142521A1 (en) 2013-11-20 2013-11-20 Customer clustering using integer programming

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/366,542 Continuation US11288688B2 (en) 2013-11-20 2019-03-27 Customer clustering using integer programming

Publications (1)

Publication Number Publication Date
US20150142521A1 (en) 2015-05-21

Family

ID=53174223

Family Applications (4)

Application Number Title Priority Date Filing Date
US14/084,903 Abandoned US20150142521A1 (en) 2013-11-20 2013-11-20 Customer clustering using integer programming
US16/366,542 Active 2033-12-05 US11288688B2 (en) 2013-11-20 2019-03-27 Customer clustering using integer programming
US17/705,483 Active US11823218B2 (en) 2013-11-20 2022-03-28 Customer clustering using integer programming
US18/499,613 Pending US20240070694A1 (en) 2013-11-20 2023-11-01 Customer clustering using integer programming

Country Status (1)

Country Link
US (4) US20150142521A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160117702A1 (en) * 2014-10-24 2016-04-28 Vedavyas Chigurupati Trend-based clusters of time-dependent data
US20170068977A1 (en) * 2015-09-03 2017-03-09 Tata Consultancy Services Limited Estimating prospect lifetime values
US20170169447A1 (en) * 2015-12-09 2017-06-15 Oracle International Corporation System and method for segmenting customers with mixed attribute types using a targeted clustering approach
CN108830528A (en) * 2018-06-15 2018-11-16 重庆城市管理职业学院 Express mail Distribution path planing method based on time-space attribute
WO2018237051A1 (en) * 2017-06-20 2018-12-27 Catalina Marketing Corporation Machine learning for marketing of branded consumer products
CN109377213A (en) * 2018-08-27 2019-02-22 平安科技(深圳)有限公司 Self-service card Activiation method, device, computer equipment and storage medium
CN110111158A (en) * 2019-05-16 2019-08-09 创络(上海)数据科技有限公司 The Marketing Design method, apparatus and storage medium of life cycle or Development phase
US20190279232A1 (en) * 2018-03-09 2019-09-12 International Business Machines Corporation Job role identification
CN110503446A (en) * 2018-05-16 2019-11-26 江苏天智互联科技股份有限公司 The client segmentation method and decision-making technique of electric business platform based on clustering algorithm
US10902442B2 (en) 2016-08-30 2021-01-26 International Business Machines Corporation Managing adoption and compliance of series purchases
US11222047B2 (en) 2018-10-08 2022-01-11 Adobe Inc. Generating digital visualizations of clustered distribution contacts for segmentation in adaptive digital content campaigns
US11803868B2 (en) 2015-12-09 2023-10-31 Oracle International Corporation System and method for segmenting customers with mixed attribute types using a targeted clustering approach

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144970B (en) * 2019-11-18 2021-03-12 珠海必要工业科技股份有限公司 Order splitting method and device, electronic equipment and readable medium
CA3104730A1 (en) * 2019-12-31 2021-06-30 Affinio Inc. Apparatus for fast clustering of massive data based on variate-specific population strata

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7672865B2 (en) * 2005-10-21 2010-03-02 Fair Isaac Corporation Method and apparatus for retail data mining using pair-wise co-occurrence consistency
US7835940B2 (en) * 2000-02-24 2010-11-16 Twenty-Ten, Inc. Systems and methods for targeting consumers attitudinally aligned with determined attitudinal segment definitions
US8452652B2 (en) * 2001-01-29 2013-05-28 International Business Machines Corporation Electronic coupons decision support and recommendation system

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5317319A (en) * 1992-07-17 1994-05-31 Hughes Aircraft Company Automatic global radar/IR/ESM track association based on ranked candidate pairings and measures of their proximity
US8140402B1 (en) * 2001-08-06 2012-03-20 Ewinwin, Inc. Social pricing
US7424439B1 (en) * 1999-09-22 2008-09-09 Microsoft Corporation Data mining for managing marketing resources
US8271336B2 (en) * 1999-11-22 2012-09-18 Accenture Global Services Gmbh Increased visibility during order management in a network-based supply chain environment
US6981040B1 (en) * 1999-12-28 2005-12-27 Utopy, Inc. Automatic, personalized online information and product services
US20030023513A1 (en) * 2001-04-06 2003-01-30 Phil Festa E-business systems and methods for diversfied businesses
US7054847B2 (en) * 2001-09-05 2006-05-30 Pavilion Technologies, Inc. System and method for on-line training of a support vector machine
US8301482B2 (en) * 2003-08-25 2012-10-30 Tom Reynolds Determining strategies for increasing loyalty of a population to an entity
US7308418B2 (en) * 2004-05-24 2007-12-11 Affinova, Inc. Determining design preferences of a group
US8571951B2 (en) * 2004-08-19 2013-10-29 Leadpoint, Inc. Automated attachment of segmentation data to hot contact leads for facilitating matching of leads to interested lead buyers
US7610255B2 (en) * 2006-03-31 2009-10-27 Imagini Holdings Limited Method and system for computerized searching and matching multimedia objects using emotional preference
US8626618B2 (en) * 2007-11-14 2014-01-07 Panjiva, Inc. Using non-public shipper records to facilitate rating an entity based on public records of supply transactions
US8296182B2 (en) * 2008-08-20 2012-10-23 Sas Institute Inc. Computer-implemented marketing optimization systems and methods
US20130325681A1 (en) * 2009-01-21 2013-12-05 Truaxis, Inc. System and method of classifying financial transactions by usage patterns of a user
US10504126B2 (en) * 2009-01-21 2019-12-10 Truaxis, Llc System and method of obtaining merchant sales information for marketing or sales teams
US20100262464A1 (en) * 2009-04-09 2010-10-14 Access Mobility, Inc. Active learning and advanced relationship marketing
US20120116875A1 (en) * 2010-11-05 2012-05-10 Microsoft Corporation Providing advertisements based on user grouping
US20130132238A1 (en) * 2011-11-17 2013-05-23 Resource Ventures, Ltd. E-commerce loyalty system and method
JP6215095B2 (en) * 2014-03-14 2017-10-18 株式会社日立製作所 Information system
US10198738B2 (en) * 2014-08-25 2019-02-05 Accenture Global Services Limited System architecture for customer genome construction and analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tae Hyup Roh (Collaborative filtering recommendation based on SOM cluster-indexing CBR), 12/2003 Expert Systems with Applications 25, Pages 413–423. *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160117702A1 (en) * 2014-10-24 2016-04-28 Vedavyas Chigurupati Trend-based clusters of time-dependent data
US20170068977A1 (en) * 2015-09-03 2017-03-09 Tata Consultancy Services Limited Estimating prospect lifetime values
US10769651B2 (en) * 2015-09-03 2020-09-08 Tata Consultancy Services Limited Estimating prospect lifetime values
US20170169447A1 (en) * 2015-12-09 2017-06-15 Oracle International Corporation System and method for segmenting customers with mixed attribute types using a targeted clustering approach
US11803868B2 (en) 2015-12-09 2023-10-31 Oracle International Corporation System and method for segmenting customers with mixed attribute types using a targeted clustering approach
US10902442B2 (en) 2016-08-30 2021-01-26 International Business Machines Corporation Managing adoption and compliance of series purchases
US11222347B2 (en) 2017-06-20 2022-01-11 Catalina Marketing Corporation Machine learning for marketing of branded consumer products
WO2018237051A1 (en) * 2017-06-20 2018-12-27 Catalina Marketing Corporation Machine learning for marketing of branded consumer products
US11651381B2 (en) 2017-06-20 2023-05-16 Catalina Marketing Corporation Machine learning for marketing of branded consumer products
US20190279232A1 (en) * 2018-03-09 2019-09-12 International Business Machines Corporation Job role identification
CN110503446A (en) * 2018-05-16 2019-11-26 江苏天智互联科技股份有限公司 The client segmentation method and decision-making technique of electric business platform based on clustering algorithm
CN108830528A (en) * 2018-06-15 2018-11-16 重庆城市管理职业学院 Express mail Distribution path planing method based on time-space attribute
CN109377213A (en) * 2018-08-27 2019-02-22 平安科技(深圳)有限公司 Self-service card Activiation method, device, computer equipment and storage medium
US11222047B2 (en) 2018-10-08 2022-01-11 Adobe Inc. Generating digital visualizations of clustered distribution contacts for segmentation in adaptive digital content campaigns
CN110111158A (en) * 2019-05-16 2019-08-09 创络(上海)数据科技有限公司 The Marketing Design method, apparatus and storage medium of life cycle or Development phase

Also Published As

Publication number Publication date
US20190220879A1 (en) 2019-07-18
US11288688B2 (en) 2022-03-29
US20240070694A1 (en) 2024-02-29
US20220284457A1 (en) 2022-09-08
US11823218B2 (en) 2023-11-21

Similar Documents

Publication Publication Date Title
US11823218B2 (en) Customer clustering using integer programming
US11605111B2 (en) Heuristic clustering
US10095771B1 (en) Clustering and recommending items based upon keyword analysis
Wu et al. Contextual bandits in a collaborative environment
Shinde et al. Hybrid personalized recommender system using centering-bunching based clustering algorithm
US20180150914A1 (en) Identity mapping between commerce customers and social media users
US10810616B2 (en) Personalization of digital content recommendations
US11127063B2 (en) Product and content association
US20190012719A1 (en) Scoring candidates for set recommendation problems
US10373197B2 (en) Tunable algorithmic segments
Liu et al. Mobile commerce product recommendations based on hybrid multiple channels
US11048764B2 (en) Managing under—and over-represented content topics in content pools
JP2015526795A (en) Method and apparatus for estimating user demographic data
US11762819B2 (en) Clustering model analysis for big data environments
US9697275B2 (en) System and method for identifying groups of entities
Bhade et al. A systematic approach to customer segmentation and buyer targeting for profit maximization
Singh et al. Optimizing approach of recommendation system using web usage mining and social media for E-commerce
Mohammadnezhad et al. Providing a model for predicting tour sale in mobile e-tourism recommender systems
Liou et al. Hybrid Multiple Channels-based Recommendations for Mobile Commerce
Jadhav et al. Customer Segmentation and Buyer Targeting Approach
CN114756758B (en) Hybrid recommendation method and system
CN113837846B (en) Commodity recommendation method, commodity recommendation device, computer equipment and storage medium
Kaul et al. Evaluating Techniques for Mining Customer Purchase Behavior and Product Recommendation-A Survey

Legal Events

Date Code Title Description
AS Assignment

Owner name: JPP, LLC, FLORIDA

Free format text: SECURITY INTEREST;ASSIGNOR:SEARS BRANDS, L.L.C.;REEL/FRAME:045013/0355

Effective date: 20180104

AS Assignment

Owner name: SEARS BRANDS, L.L.C., ILLINOIS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AYDIN, BURCU;REEL/FRAME:047739/0393

Effective date: 20131120

STCV Information on status: appeal procedure

Free format text: BOARD OF APPEALS DECISION RENDERED

AS Assignment

Owner name: CANTOR FITZGERALD SECURITIES, AS AGENT, FLORIDA

Free format text: SECURITY INTEREST;ASSIGNOR:TRANSFORM SR BRANDS LLC;REEL/FRAME:048308/0275

Effective date: 20190211

AS Assignment

Owner name: SEARS BRANDS, L.L.C., ILLINOIS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:JPP, LLC;REEL/FRAME:048352/0708

Effective date: 20190211

AS Assignment

Owner name: CITIBANK, N.A., NEW YORK

Free format text: SECURITY INTEREST;ASSIGNOR:TRANSFORM SR BRANDS LLC;REEL/FRAME:048424/0291

Effective date: 20190211

Owner name: BANK OF AMERICA, N.A., MASSACHUSETTS

Free format text: SECURITY INTEREST;ASSIGNOR:TRANSFORM SR BRANDS LLC;REEL/FRAME:048433/0001

Effective date: 20190211

AS Assignment

Owner name: TRANSFORM SR BRANDS LLC, ILLINOIS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SEARS BRANDS, L.L.C.;REEL/FRAME:048710/0182

Effective date: 20190211

STCB Information on status: application discontinuation

Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION

AS Assignment

Owner name: TRANSFORM SR BRANDS LLC, ILLINOIS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CANTOR FITZGERALD SECURITIES, AS AGENT;REEL/FRAME:049284/0149

Effective date: 20190417

AS Assignment

Owner name: TRANSFORM SR BRANDS LLC, ILLINOIS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:BANK OF AMERICA, N.A.;REEL/FRAME:052183/0879

Effective date: 20200316

AS Assignment

Owner name: TRANSFORM SR BRANDS LLC, ILLINOIS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CITIBANK, N.A., AS AGENT;REEL/FRAME:052188/0176

Effective date: 20200317