US20150073944A1 - Method and system for classification of venue by analyzing data from venue website - Google Patents

Method and system for classification of venue by analyzing data from venue website

Info

Publication number
US20150073944A1
Authority
US
United States
Prior art keywords
venue
data
attribute
website
attributes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/543,586
Inventor
Thomas V. Sanguinetti
Dave T. Sanguinetti
Jeffrey S. Ploetner
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CRAZE Inc
Original Assignee
CRAZE Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CRAZE Inc
Priority to US14/543,586
Assigned to CRAZE, INC. Assignment of assignors' interest (see document for details). Assignors: SANGUINETTI, DAVE T; SANGUINETTI, THOMAS V; PLOETNER, JEFFREY S
Publication of US20150073944A1
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0623 Item investigation
    • G06Q30/0625 Directed, with specific intent or strategy
    • G06Q30/0627 Directed, with specific intent or strategy using item specifications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/12 Hotels or restaurants

Definitions

  • the invention relates to data analysis, and more particularly the invention relates to a method and system for classification of a venue by analyzing data from a venue website.
  • existing methods require parsing a large amount of data in order to rank parsed results according to page popularity. Also, existing methods typically count occurrences of individual words found on websites, without considering synonyms of individual words, and typically do not consider similarities of parsed words, as encountered across parsed websites. Moreover, existing methods typically perform searches strictly by character matching, so a search for “beef” would not necessarily yield results with “steak.” Accordingly, there is a need for a more intelligent method and system for analysis of venue website information.
  • a method and system classifies a venue by analyzing venue data from a venue website.
  • the method includes receiving preliminary venue-related data.
  • the preliminary venue-related data includes a venue URL.
  • the method includes scanning the venue website to retrieve venue data, wherein scanning the venue website includes retrieving the venue data from HTML pages, text documents, PDF documents and images.
  • the method includes retrieving verifiable venue data from the venue data.
  • the verifiable venue data is a subset of the venue data.
  • the method includes analyzing the verifiable venue data by comparing the verifiable venue data to the preliminary venue-related data and determining a probability level for the venue URL from the comparison. If the probability level for the venue URL is equal to or greater than a first probability level, the venue website data is further analyzed to extract attributes and attribute counts in a robust and context-sensitive way. The method includes determining the percentage of the attribute representation from the total number of preselected attributes in the venue data and classifying the venue based on the percentage of the attribute representation.
  • the method includes determining attribute distance association by identifying correlation of attributes and quantifying attribute similarities from the attribute distance association.
  • the method includes comparing the classified venue to other venues based on quantified attribute similarities.
  • the method includes comparing selected verifiable venue data to corresponding preliminary venue-related data and determining if the selected verifiable venue data is different from the corresponding preliminary venue-related data.
  • the method includes assigning a probability level lower than the first probability level if the selected verifiable venue data is different from the corresponding preliminary venue-related data.
  • FIG. 1 is a flow diagram of a method for classifying a venue in accordance with one embodiment.
  • FIG. 2 is a flow diagram of a method for classifying a venue in accordance with another embodiment.
  • FIG. 3 is a system for implementing an example embodiment including at least portions of the disclosed embodiments.
  • a method and system classifies a venue by analyzing venue data from a venue website.
  • the venue may be a restaurant, a club, a bar, a hotel or any other establishment.
  • the venue is classified by analyzing attributes present in the venue data.
  • a venue may be classified based on certain attributes found in the website. Also, based on certain attributes in the venue website, other attributes that are typically relevant to the attributes found in the venue website may be inferred. In one embodiment, if certain words are synonyms of each other, and the fancy or elegant version of the word is used on the venue website, the venue may be classified appropriately, for example as an upscale restaurant. Thus, based on the words found on the venue website, a classification of the venue may be made.
  • the word “appetizer” is a synonym of the word “hors d'oeuvres.” If the word “hors d'oeuvres” is used rather than its synonym “appetizer”, it may be reasonable to classify the venue as being an upscale restaurant. On the other hand, if the word “appetizer” is used, it may be reasonable to classify the venue as being casual.
  • the presence of certain words on a menu or on the website may provide information about the atmosphere of the venue. For example, if a menu lists “filet mignon” or a fancy dessert as an item, it may be reasonable to classify the venue as a high-end restaurant, and thus auto-add other relevant attributes generally associated with high-end restaurants.
  • certain selected words found on the website are assigned with a classifier
  • the words and attributes are analyzed based on the assigned classifier factors, including the proximity of certain attributes to each other.
  • FIG. 1 is a flow diagram 100 of classifying a venue in accordance with one embodiment.
  • venue data is received from a venue website.
  • the venue data can be retrieved by scanning data from a venue's website, including HTML pages, text documents, PDF documents, images, or any other media files.
  • the venue data may be derived from Flash, Silverlight, videos and other multimedia formats.
  • optical character recognition techniques, which recognize text contained within images, may be used to read words from images to collect venue data. Other image classification techniques may be used to collect the venue data as well.
  • pages of a venue website are visited by a parser.
  • Content (venue data) found on pages of the venue website is analyzed.
  • the number of selected attributes in the venue data is determined.
  • the selected attributes may be identified by the words or their synonyms or related words in the venue data.
  • the count of attribute occurrences (or their synonyms/determined related words) is tallied for the venue data. Attributes may be deduced from the venue data collected from a set of venue websites based on or specified a priori by any other method of compilation.
  • the resulting tallies might be:
  • in step 112, the percentage of each selected attribute in the venue data is determined.
  • the system may consider factors other than strict tallies, such as a predetermined “Importance” factor for each attribute. Suppose the following Importance factors exist (0-1 scale):
  • the resulting attribute prominence weights (0-1 scale) may be:
  • the selected attributes in the venue data are each assigned a classifier factor.
  • attribute prestige factors may be determined based on frequency of expensive venues that have a particular subset/synonym of an attribute, versus the frequency of less expensive venues that feature a particular attribute. E.g. Filet mignon; $: 0.05, $$: 0.15, $$$: 0.50, $$$$: 0.70.
  • Other classifier factors may be assigned.
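  • The prestige factor described above can be read as a per-price-tier frequency: the share of venues in each tier whose data features a given attribute. The following is a minimal sketch under that reading; the function name `prestige_factors`, the `(price_tier, attributes)` input shape, and the sample venues are assumptions made for illustration and do not come from the source.

```python
from collections import defaultdict

def prestige_factors(venues, attribute):
    """Return, for each price tier, the fraction of venues featuring `attribute`.

    `venues` is an iterable of (price_tier, set_of_attributes) pairs.
    This input shape is a hypothetical choice for the sketch.
    """
    totals = defaultdict(int)   # venues seen per tier
    having = defaultdict(int)   # venues per tier that feature the attribute
    for tier, attrs in venues:
        totals[tier] += 1
        if attribute in attrs:
            having[tier] += 1
    return {tier: having[tier] / totals[tier] for tier in totals}

# Made-up sample data, for illustration only.
sample = [
    ("$", {"burger"}),
    ("$", {"burger", "filet mignon"}),
    ("$$$$", {"filet mignon"}),
    ("$$$$", {"filet mignon", "lobster"}),
]
print(prestige_factors(sample, "filet mignon"))
```

  • Because each tier is normalized separately, the factors need not sum to 1, which is consistent with the filet mignon example values given above.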
  • in step 120, the venue is classified based on the percentage of each attribute representation and the classifier factor.
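  • The flow of FIG. 1 (tally attribute occurrences with synonyms merged, convert the tallies to percentages, then combine the percentages with a classifier factor) can be sketched as follows. This is an illustrative sketch, not the claimed implementation: the synonym table, the function names, and the way the percentage and the classifier factor are multiplied together are all assumptions.

```python
import re
from collections import Counter

# Hypothetical synonym table: surface form -> canonical attribute.
SYNONYMS = {
    "steak": "beef",
    "beef": "beef",
    "filet mignon": "beef",
    "chicken": "chicken",
    "poultry": "chicken",
    "pasta": "pasta",
    "spaghetti": "pasta",
}

def tally_attributes(text, synonyms=SYNONYMS):
    """Count canonical attributes in `text`, merging synonyms."""
    lowered = text.lower()
    counts = Counter()
    # Longer phrases first, so multi-word terms are matched as a unit.
    for phrase in sorted(synonyms, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            counts[synonyms[phrase]] += hits
            lowered = re.sub(pattern, " ", lowered)  # avoid double counting
    return counts

def attribute_percentages(counts):
    """Share of each attribute among all tallied attribute occurrences."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {attr: n / total for attr, n in counts.items()}

def weighted_score(percentages, factors):
    """Combine attribute shares with a per-attribute classifier factor."""
    return sum(share * factors.get(attr, 0.0)
               for attr, share in percentages.items())

menu = "Steak and beef tips. Grilled chicken. Filet mignon with pasta."
counts = tally_attributes(menu)
shares = attribute_percentages(counts)
print(counts, shares)
```

  • A classifier factor such as a "fanciness" factor could then be supplied as the `factors` mapping, and the resulting score compared against thresholds chosen by the implementer.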
  • FIG. 2 is a flow diagram 200 of a method for classifying a venue by analyzing venue data from a venue website, in accordance with another embodiment.
  • preliminary venue-related data is received.
  • the preliminary venue-related data may be a venue URL, a venue address, a venue phone number, etc.
  • the venue website is scanned to retrieve venue data.
  • the venue data can be retrieved by scanning data from HTML pages, text documents, PDF documents and images found in the venue website.
  • the venue data is analyzed to identify verifiable venue data.
  • the verifiable venue data is a subset of the venue data that may be helpful in verifying the venue. For example, the venue address, the venue phone number and other important data may be considered verifiable venue data helpful in the verification of the venue.
  • the verifiable venue data is compared to the corresponding data from the preliminary venue-related data. In other words, certain data scanned from the venue is compared to the preliminary data received in order to verify that the correct venue is being analyzed.
  • a probability level or certainty level for the venue URL is determined.
  • a venue is assigned a numerical score representing the probability factor. For example, consider a scenario where the venue address and venue name derived from scanning match the corresponding data from the preliminary venue-related data, but the venue phone number does not match. Consequently, the particular venue may be assigned a probability factor of 0.67.
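  • One reading consistent with the 0.67 figure is that the probability factor is the fraction of verifiable fields that match (two of three). The sketch below implements that reading only; the source does not state the formula, and the field names, the `normalize` helper, and the sample values are hypothetical.

```python
def normalize(value):
    """Lowercase and keep only letters and digits, so formatting differences
    (spaces, dashes, punctuation) do not count as mismatches."""
    return "".join(ch for ch in value.lower() if ch.isalnum())

def url_probability(preliminary, scanned, fields=("name", "address", "phone")):
    """Fraction of preliminary fields that match the scanned website data."""
    compared = [f for f in fields if f in preliminary]
    if not compared:
        return 0.0
    matches = sum(
        1 for f in compared
        if normalize(preliminary[f]) == normalize(scanned.get(f, ""))
    )
    return matches / len(compared)

# Made-up example values.
preliminary = {"name": "Luigi's", "address": "12 Main St.", "phone": "555-0100"}
scanned = {"name": "LUIGI'S", "address": "12 Main St", "phone": "555-0199"}
print(round(url_probability(preliminary, scanned), 2))  # 0.67
```

  • The result can then be compared with the first probability level to decide whether the attribute analysis should proceed.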
  • the calculated probability level is compared to a predetermined probability level (e.g., first probability level). If the calculated probability level is equal to or greater than the first probability level, the number of selected attributes in the venue data is determined. For example, the venue data may be analyzed to determine the number of occurrences of the attributes “appetizers”, “filet mignon”, “creme brulee”, etc. As discussed before, the venue data is also analyzed for synonyms or words related to the attributes.
  • the percentage of the attribute representation from the total number of selected attributes in the venue data is determined. For example, the analysis may yield the following results: 30% steak, 25% chicken, 15% American, etc.
  • the venue is classified based on the percentage of the attribute representation. For example, if the percentage representation of the attribute “bar-be-que” is 55%, it may be reasonable to classify the venue as a bar-be-que restaurant. Also, if the analysis of the venue data reveals lack of the attributes “beef”, “steak”, “chicken”, “pork”, “fish” or their synonyms, it may be reasonable to classify the restaurant as a vegetarian restaurant.
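  • The two classification rules in this example (a dominant attribute share, and the absence of meat-related attributes) can be sketched as follows. The 0.5 threshold, the function name, and the contents of `MEAT_ATTRIBUTES` beyond the five words listed above are assumptions for illustration.

```python
# Attributes whose absence suggests a vegetarian venue (taken from the text).
MEAT_ATTRIBUTES = {"beef", "steak", "chicken", "pork", "fish"}

def classify_by_share(percentages, dominant_threshold=0.5):
    """Return a list of labels derived from attribute shares.

    `percentages` maps attribute -> share (0-1). The threshold is a
    hypothetical tuning parameter, not a value given in the source.
    """
    labels = []
    if not percentages:
        return labels
    top, share = max(percentages.items(), key=lambda kv: kv[1])
    if share >= dominant_threshold:
        labels.append(top)
    if not MEAT_ATTRIBUTES & set(percentages):
        labels.append("vegetarian")
    return labels

print(classify_by_share({"bar-be-que": 0.55, "chicken": 0.25, "salad": 0.20}))
print(classify_by_share({"salad": 0.40, "tofu": 0.35, "pasta": 0.25}))
```

  • In practice the percentages would be computed after synonyms have been merged, so that words related to the meat attributes are also caught by the second rule.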
  • an attribute distance association is determined by identifying correlation of attributes.
  • an attribute distance (correlation) matrix may be computed based on frequency of co-occurrence of attribute pairs within the venue data set. Suppose 25% of venues that have Chicken or Beef have both. Suppose 50% of venues that have Pasta or Italian have both. Suppose 67% of venues that have Taco or Burrito have both.
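  • The statistic in these examples ("the share of venues having A or B that have both") is the Jaccard index of the two sets of venues. A minimal sketch of computing it for every attribute pair follows; representing each venue as a set of attributes and the function name are assumptions.

```python
from itertools import combinations

def cooccurrence_similarity(venues):
    """Return {(a, b): share of venues with a or b that have both}.

    `venues` is a list of attribute sets; keys are alphabetically ordered pairs.
    """
    attrs = sorted(set().union(*venues)) if venues else []
    result = {}
    for a, b in combinations(attrs, 2):
        either = sum(1 for v in venues if a in v or b in v)
        both = sum(1 for v in venues if a in v and b in v)
        if either:
            result[(a, b)] = both / either
    return result

# Made-up venue pool: two venues list both items, one lists only tacos.
pool = [{"taco", "burrito"}, {"taco", "burrito"}, {"taco"}]
print(round(cooccurrence_similarity(pool)[("burrito", "taco")], 2))  # 0.67
```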
  • in step 240, attribute similarities are quantified from the attribute distance association. Given a function X that converts correlation statistics to attribute similarities (0-1), the following may be computed:
  • the venue is compared to other venues based on the quantified attribute similarities.
  • an attribute-to-attribute comparison is performed, as well as an attribute-to-(related by similarity) attribute comparison, with overlap tallied.
  • an analysis may yield that the attributes “Moroccan” and “Greek” are 45% similar based on co-occurrences within the venue pool.
  • it may be determined that “Burrito” and “Taco” are 60% similar, and that they highly correlate with “Mexican.” These distance associations may then be used to improve searches and general venue comparisons. For example, even with limited data, if there is only one Moroccan restaurant in a city, a search for similar venues may yield a nearby Greek restaurant, since the attributes “Moroccan” and “Greek” are somewhat similar, based on their correlation.
  • Venue 1 {Italian: 0.4, Chicken: 0.5, Salad: 0.25}
  • a general weighted venue-to-venue attribute comparison would yield a score of zero in this case.
  • the comparison may yield a significantly improved score.
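  • The contrast between the two comparisons can be sketched as below. Only the Venue 1 weights come from the text; the second venue, the similarity values (borrowed from the co-occurrence percentages mentioned earlier, with the conversion function X taken as the identity), and the scoring formula itself are assumptions made purely for illustration.

```python
def direct_score(v1, v2):
    """Weighted comparison over attributes the two venues share exactly."""
    return sum(w * v2[a] for a, w in v1.items() if a in v2)

def similarity_score(v1, v2, sim):
    """Weighted comparison that also credits related attribute pairs.

    `sim` maps an attribute pair to a similarity in 0-1; the pair may be
    stored in either order. Identical attributes have similarity 1.
    """
    score = 0.0
    for a, wa in v1.items():
        for b, wb in v2.items():
            if a == b:
                s = 1.0
            else:
                s = sim.get((a, b), sim.get((b, a), 0.0))
            score += wa * wb * s
    return score

venue_1 = {"Italian": 0.4, "Chicken": 0.5, "Salad": 0.25}  # from the text
venue_2 = {"Pasta": 0.6, "Beef": 0.3}                      # hypothetical
similarities = {("Italian", "Pasta"): 0.5, ("Beef", "Chicken"): 0.25}

print(direct_score(venue_1, venue_2))
print(similarity_score(venue_1, venue_2, similarities))
```

  • With no attribute in common, the direct comparison is zero, while the similarity-aware comparison gives partial credit for the Italian/Pasta and Chicken/Beef pairs.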
  • the venue data may be analyzed for general information, such as operating hours, events, happy hours, specials, etc.
  • the venue data may be analyzed to identify menus based on known characteristics, including currency indicators to collect pricing data. Additional information may be deduced such as the venue atmosphere based on factors such as the Flesch-Kincaid Grade Level and readability scores, descriptions and synonyms used (e.g., filet mignon, free range chicken, wild Alaskan Salmon, Australian rack of lamb, etc.).
  • the venue data containing a Flesch-Kincaid Grade Level of 18 (the 18th grade) may be indicative of a Fine Dining atmosphere.
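  • For reference, the Flesch-Kincaid Grade Level is computed as 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sketch below applies that formula with a deliberately rough syllable counter (groups of consecutive vowels), so its scores only approximate those of a dictionary-based tool; the function names and sample sentences are illustrative.

```python
import re

def count_syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Approximate Flesch-Kincaid Grade Level of `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

casual = "We sell hot dogs. Eat here."
formal = ("Our establishment presents an exquisite selection of "
          "seasonally inspired culinary compositions.")
print(flesch_kincaid_grade(casual), flesch_kincaid_grade(formal))
```

  • Longer sentences and longer words both raise the grade level, which is why elaborate menu descriptions tend to score higher than short, plain ones.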
  • FIG. 3 is a system 300 that may be used to implement an example embodiment of the invention including at least portions of the disclosed embodiments.
  • the system 300 includes a server 304 including a central processing unit (CPU) 308 .
  • the server 304 is connected to a database 312 or any other data storage system.
  • a venue classification application 316 may reside in the server 304 .
  • the venue classification application 316 may be a software application or a routine configured to classify a venue by analyzing venue data from a venue website.
  • the CPU 308 executes the application 316 to process data.
  • the server 304 is connected to the Internet 328 .
  • the server 304 may access venue websites 332 x, scan venue data from the venue websites 332 x, and classify venues based on the processes discussed before.
  • the venue data may be stored in the database 312 .
  • a software application embodying a computer program code may be configured to classify a venue by analyzing venue data from a venue website.
  • the steps of the methods described above may be executed by one or more computer readable codes embodied in a computer readable medium such as a computer program product.
  • the computer program product may be a CD, a floppy disk, an optical disk, a hard drive or any other storage system.
  • the venue classification method and system in accordance with the embodiments described above provide various advantages.
  • the venue classification may be used to target only websites of interest to be parsed and adds uniformity to venue representations.
  • the venue classification provides varying levels of detail for comparisons, and results are effectively indexed by “meaning”, so a search for “beef” may also return results with “steak.”
  • the venue classification also provides context checking (e.g., out of context possibilities, context-safe attributes).
  • the venue classification enables parsing a set of target websites to deduce attributes, based on the merging of synonyms, and ultimately a frequency analysis.
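  • Indexing by "meaning" can be illustrated with a small inverted index keyed on canonical attributes, so that a query and its synonyms resolve to the same entry. This is a sketch under assumed names: the synonym table, venue names, and both functions are hypothetical.

```python
def build_index(venue_attributes, synonyms):
    """Map each canonical attribute to the set of venues that mention it.

    `venue_attributes` maps venue name -> iterable of words found on its site;
    `synonyms` maps a word to its canonical attribute.
    """
    index = {}
    for venue, words in venue_attributes.items():
        for word in words:
            canonical = synonyms.get(word.lower(), word.lower())
            index.setdefault(canonical, set()).add(venue)
    return index

def search(index, query, synonyms):
    """Look up venues by the canonical form of the query word."""
    canonical = synonyms.get(query.lower(), query.lower())
    return index.get(canonical, set())

synonyms = {"steak": "beef"}
venues = {"Grill House": ["Steak", "fries"], "Salad Bar": ["lettuce"]}
index = build_index(venues, synonyms)
print(search(index, "beef", synonyms))  # {'Grill House'}
```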
  • the system, method, and computer program product described in this application may, of course, be embodied in hardware; e.g., within or coupled to a Central Processing Unit (“CPU”), microprocessor, microcontroller, System on Chip (“SOC”), or any other programmable device.
  • the system, method, computer program product, and propagated signal may be embodied in software (e.g., computer readable code, program code, instructions and/or data disposed in any form, such as source, object or machine language) disposed, for example, in a computer usable (e.g., readable) medium configured to store the software.
  • Such software enables the function, fabrication, modeling, simulation, description and/or testing of the apparatus and processes described herein.
  • this can be accomplished through the use of general programming languages (e.g., C, C++), GDSII databases, hardware description languages (HDL) including Verilog HDL, VHDL, AHDL (Altera HDL) and so on, or other available programs, databases, nanoprocessing, and/or circuit (i.e., schematic) capture tools.
  • Such software can be disposed in any known computer usable medium including semiconductor, magnetic disk, optical disc (e.g., CD-ROM, DVD-ROM, etc.) and as a computer data signal embodied in a computer usable (e.g., readable) transmission medium (e.g., carrier wave or any other medium including digital, optical, or analog-based medium).
  • the software can be transmitted over communication networks including the Internet and intranets.
  • a system, method, computer program product, and propagated signal embodied in software may be included in a semiconductor intellectual property core (e.g., embodied in HDL) and transformed to hardware in the production of integrated circuits.
  • a system, method, computer program product, and propagated signal as described herein may be embodied as a combination of hardware and software.
  • routines can execute on a single processing device or multiple processors. Although the steps, operations or computations may be presented in a specific order, this order may be changed in different embodiments. In some embodiments, multiple steps shown as sequential in this specification can be performed at the same time.
  • the sequence of operations described herein can be interrupted, suspended, or otherwise controlled by another process, such as an operating system, kernel, and the like.
  • the routines can operate in an operating system environment or as stand-alone routines occupying all, or a substantial part, of the system processing.
  • Embodiments of the invention may be implemented by using a general purpose digital computer, software applications, routines and software modules, or hardware including application specific integrated circuits, programmable logic devices, and field programmable gate arrays; optical and other mechanisms may also be used.
  • the functions of the present invention can be achieved by any means as is known in the art.
  • Distributed, or networked systems, components and circuits can be used.
  • Communication, or transfer, of data may be wired, wireless, or by any other means.
  • any signal arrows in the drawings/Figures should be considered only as exemplary, and not limiting, unless otherwise specifically noted.
  • the term “or” as used herein is generally intended to mean “and/or” unless otherwise indicated. Combinations of components or steps will also be considered as being noted, where terminology is foreseen as rendering the ability to separate or combine unclear.
  • a “computer-readable medium” for purposes of embodiments of the present invention may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, system or device.
  • the computer readable medium can be, by way of example only but not by limitation, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, system, device, propagation medium, or computer memory.
  • One embodiment includes a method for classifying a venue by analyzing venue data from a venue website, comprising: receiving preliminary venue-related data including a venue URL; scanning the venue website to retrieve venue data; retrieving verifiable venue data from the venue data, the verifiable venue data being a subset of the venue data; analyzing the verifiable venue data by comparing the verifiable venue data to the preliminary venue-related data; determining a probability level for the venue URL from the comparison; if the probability level for the venue URL is equal to or greater than a first probability level, determining the number of selected attributes in the venue data; determining the percentage of the attribute representation from the total number of preselected attributes in the venue data; and classifying the venue based on the percentage of the attribute representation.
  • the method may further comprise: determining attribute distance association by identifying correlation of attributes; quantifying attribute similarities from the attribute distance association; and comparing the classified venue to other venues based on quantified attribute similarities.
  • the method may further comprise: classifying the atmosphere of the venue based on the attributes.
  • the preliminary venue-related data comprises venue address, venue phone number, and venue category.
  • scanning the venue website comprises retrieving the venue data from HTML pages, text documents, PDF documents and images.
  • the verifiable venue data includes venue name, venue address, and venue phone number.
  • the method may further comprise: comparing selected verifiable venue data to corresponding preliminary venue-related data; determining if the selected verifiable venue data is different from the corresponding preliminary venue-related data; assigning a probability level lower than the first probability level if the selected verifiable venue data is different from the corresponding preliminary venue-related data.
  • the selected verifiable venue data is the venue address. In one implementation of the method, the selected verifiable venue data is the venue phone number. In one implementation of the method, the selected attributes include attribute synonyms.
  • the classification of the venue further comprises: assigning a classifier factor for each selected attribute in the venue data; classifying the venue from the assigned classifier factors.
  • the classifier factor is a fanciness factor. In one implementation of the method, the classifier factor is an outdoorsy factor. In one implementation of the method, the classifier factor is an activity factor.
  • One embodiment includes a method for classifying a venue by analyzing venue data from a venue website, comprising: receiving venue data from the venue website; determining the number of preselected attributes in the venue data; determining a percentage of each preselected attribute representation from the total number of preselected attributes in the venue data; assigning a classifier factor for each preselected attribute in the venue data; classifying the venue based on the percentage of each attribute representation and the classifier factor.
  • the receiving venue data comprises scanning and retrieving information from the venue website's HTML pages, text documents, PDF documents and images.
  • the method may further comprise: receiving preliminary venue-related data; retrieving verifiable venue data from the venue data; analyzing verifiable venue data by comparing the verifiable venue data to preliminary venue-related data; determining a probability level for the venue's URL from the comparison; verifying the venue from the probability level for the URL.
  • the method may further comprise: determining attribute distance association for the preselected attributes in the venue data; quantifying attribute similarities from the attribute distance association; comparing the venue to other venues based on quantified attribute similarities.
  • One embodiment includes a method for classifying a venue by analyzing selected attributes in a venue website, comprising: entering the venue website using a URL corresponding to the website; receiving venue data from the venue website; determining the number of selected attributes in the venue website; determining a percentage of each selected attribute representation from the total number of preselected attributes in the URL data; assigning a classifier factor for each selected attribute in the URL data; classifying the venue based on the percentage of each attribute representation and the classifier factor.
  • the receiving venue data comprises scanning and retrieving information from the venue website's HTML pages, text documents, PDF documents and images.
  • the method may further comprise: receiving preliminary venue-related data; retrieving verifiable venue data from the venue data; analyzing verifiable venue data by comparing the verifiable venue data to preliminary venue-related data; determining a probability level for the venue's URL from the comparison; verifying the venue from the probability level for the URL.
  • the method may further comprise: determining attribute distance association for the
  • One embodiment includes a system for classifying a venue by analyzing venue data from a venue website, comprising: a server having a central processing unit, the server receiving the venue data from the venue website; a venue classification application having program code for executing a plurality of steps to analyze the venue data and classify the venue data; a communication network enabling the server to access the venue website to receive the venue data; the central processing unit executing the steps of: determining the number of selected attributes in the venue data; determining a percentage of each selected attribute representation from the total number of selected attributes in the venue data; assigning a classifier factor for each selected attribute in the venue data; classifying the venue based on the percentage of each attribute representation and the classifier factor.
  • the communication network is the Internet.

Abstract

A method and system classifies a venue by analyzing venue data from a venue website. The method includes receiving preliminary venue-related data. The method includes scanning the venue website to retrieve venue data, wherein scanning the venue website includes retrieving the venue data from HTML pages, text documents, PDF documents, and images. The method includes retrieving verifiable venue data from the venue data. The verifiable venue data is a subset of the venue data. The method includes analyzing the verifiable venue data by comparing the verifiable venue data to the preliminary venue-related data and determining a probability level for the venue URL from the comparison. If the probability level for the venue URL is equal to or greater than a first probability level, the venue website data is further analyzed to extract attributes and attribute counts in a robust and context-sensitive way. The method includes determining the percentage of the attribute representation from the total number of preselected attributes in the venue data and classifying the venue based on the percentage of the attribute representation.

Description

    FIELD OF THE INVENTION
  • The invention relates to data analysis, and more particularly the invention relates to a method and system for classification of a venue by analyzing data from a venue website.
  • BACKGROUND OF THE INVENTION
  • Due to the increasing amount of information available on the Internet, there is a need for accurate extraction and analysis of information from websites. In particular, when extracting information relating to venues from venue websites, some existing methods perform full text searches on the actual words on the website. Other existing methods simply classify the venues in broad terms and do not intelligently extract details and context of the information available on the website.
  • Typically, existing methods require parsing a large amount of data in order to rank parsed results according to page popularity. Also, existing methods typically count occurrences of individual words found on websites, without considering synonyms of individual words, and typically do not consider similarities of parsed words, as encountered across parsed websites. Moreover, existing methods typically perform searches strictly by character matching, so a search for “beef” would not necessarily yield results with “steak.” Accordingly, there is a need for a more intelligent method and system for analysis of venue website information.
  • SUMMARY
  • A method and system classifies a venue by analyzing venue data from a venue website. The method includes receiving preliminary venue-related data. The preliminary venue-related data includes a venue URL. The method includes scanning the venue website to retrieve venue data, wherein scanning the venue website includes retrieving the venue data from HTML pages, text documents, PDF documents and images. The method includes retrieving verifiable venue data from the venue data. The verifiable venue data is a subset of the venue data.
  • The method includes analyzing the verifiable venue data by comparing the verifiable venue data to the preliminary venue-related data and determining a probability level for the venue URL from the comparison. If the probability level for the venue URL is equal to or greater than a first probability level, the venue website data is further analyzed to extract attributes and attribute counts in a robust and context-sensitive way. The method includes determining the percentage of the attribute representation from the total number of preselected attributes in the venue data and classifying the venue based on the percentage of the attribute representation.
  • The method includes determining attribute distance association by identifying correlation of attributes and quantifying attribute similarities from the attribute distance association. The method includes comparing the classified venue to other venues based on quantified attribute similarities. The method includes comparing selected verifiable venue data to corresponding preliminary venue-related data and determining if the selected verifiable venue data is different from the corresponding preliminary venue-related data. The method includes assigning a probability level lower than the first probability level if the selected verifiable venue data is different from the corresponding preliminary venue-related data.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a flow diagram of a method for classifying a venue in accordance with one embodiment.
  • FIG. 2 is a flow diagram of a method for classifying a venue in accordance with another embodiment.
  • FIG. 3 is a system for implementing an example embodiment including at least portions of the disclosed embodiments.
  • DETAILED DESCRIPTION OF THE INVENTION
  • In one example implementation, a method and system classifies a venue by analyzing venue data from a venue website. The venue may be a restaurant, a club, a bar, a hotel or any other establishment. In one implementation, the venue is classified by analyzing attributes present in the venue data.
  • In accordance with one embodiment, a venue may be classified based on certain attributes found in the website. Also, based on certain attributes in the venue website, other attributes that are typically relevant to the attributes found in the venue website may be inferred. In one embodiment, if certain words are synonyms of each other and the fancier or more elegant version of the word is used on the venue website, the venue may be classified accordingly, for example as an upscale restaurant. Thus, based on the words found on the venue website, a classification of the venue may be made. For example, the word “appetizer” is a synonym of the phrase “hors d'oeuvres.” If “hors d'oeuvres” is used rather than its synonym “appetizer”, it may be reasonable to classify the venue as being an upscale restaurant. On the other hand, if the word “appetizer” is used, it may be reasonable to classify the venue as being casual.
  • In many instances, the presence of certain words on a menu or on the website may provide information about the atmosphere of the venue. For example, if a menu lists “filet mignon” or a fancy dessert as an item, it may be reasonable to classify the venue as a high-end restaurant, and thus auto-add other relevant attributes generally associated with high-end restaurants.
  • In one embodiment, certain selected words found on the website are assigned a classifier factor. For example, items on a menu may be assigned a “fanciness factor”, words on the website may be assigned a “theme factor”, and activities may be assigned a “formal factor”, etc. In one embodiment, the words and attributes are analyzed based on the assigned classifier factors, including the proximity of certain attributes to each other.
  • FIG. 1 is a flow diagram 100 of classifying a venue in accordance with one embodiment. In step 104, venue data is received from a venue website. The venue data can be retrieved by scanning data from a venue's website, including HTML pages, text documents, PDF documents, images, or any other media files. In one embodiment, the venue data may be derived from Flash, Silverlight, videos and other multimedia formats. In one embodiment, optical character recognition techniques, used to recognize text contained within images, may be used to read words from images to collect venue data. Other image classification techniques may be used to collect the venue data as well.
  • In one implementation, pages of a venue website are visited by a parser. Content (venue data) found on pages of the venue website is analyzed.
  • In step 108, the number of selected attributes in the venue data is determined. The selected attributes may be identified by the words or their synonyms or related words in the venue data. The count of attribute occurrences (or occurrences of their synonyms or determined related words) is tallied for the venue data. Attributes may be deduced from the venue data collected from a set of venue websites, or specified a priori by any other method of compilation.
  • For example, consider the following venue data: “Join us for some of America's favorites, including filet mignon, New York strip steak, Australian rack of lamb, veal chops, and Australian lobster tail.”
  • Consider the following attribute set, with synonyms in parentheses: [American (America), Beef (filet mignon, steak, veal), Chicken (buffalo wings, hot wings), Lamb (mutton), Seafood (fish, lobster, clams, scallops), Pork (ham, bacon, spam)]
  • The resulting tallies might be:
    • American: 1
    • Beef: 3
    • Chicken: 0
    • Lamb: 1
    • Seafood: 1
    • Pork: 0
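  • The tallying in step 108 can be sketched as follows. This is a minimal illustration only, assuming lowercase whole-word matching; the names `ATTRIBUTES` and `tally_attributes` are hypothetical and do not appear in the specification.

```python
import re
from collections import Counter

# Attribute set from the example above: attribute -> the attribute word
# itself plus its listed synonyms, all lowercased for matching.
ATTRIBUTES = {
    "American": ["american", "america"],
    "Beef": ["beef", "filet mignon", "steak", "veal"],
    "Chicken": ["chicken", "buffalo wings", "hot wings"],
    "Lamb": ["lamb", "mutton"],
    "Seafood": ["seafood", "fish", "lobster", "clams", "scallops"],
    "Pork": ["pork", "ham", "bacon", "spam"],
}


def tally_attributes(text, attributes=ATTRIBUTES):
    """Count occurrences of each attribute word or synonym in the text."""
    lowered = text.lower()
    tallies = Counter()
    for attribute, terms in attributes.items():
        for term in terms:
            # Word boundaries keep "ham" from matching inside "hamburger".
            pattern = r"\b" + re.escape(term) + r"\b"
            tallies[attribute] += len(re.findall(pattern, lowered))
    return dict(tallies)
```

  • Applied to the example venue data above, this sketch yields the same tallies as listed. A production parser would likely also need stemming and context checks, which are omitted here.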
  • In step 112, the percentage of each selected attribute in the venue data is determined. Consider the following resulting attribute tallies:
    • Beef: 4
    • Chicken: 3
    • Handicap Accessible: 2
    • Pork: 1
    • Total sum: 10
  • When calculating percentages, the system may consider factors other than strict tallies, such as a predetermined “Importance” factor for each attribute. Suppose the following Importance factors exist (0-1 scale):
    • Beef: 0.9
    • Chicken: 0.9
    • Handicap Accessible: 0.1
    • Pork: 0.9
  • Weighted Total sum: 4*0.9+3*0.9+2*0.1+1*0.9=7.4
  • The resulting percentages would be:
    • Beef: 4*0.9/7.4=49%
    • Chicken: 3*0.9/7.4=36%
    • Handicap Accessible: 2*0.1/7.4=3%
    • Pork: 1*0.9/7.4=12%
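  • The weighted percentage calculation above can be expressed compactly. The following is a sketch under the assumption that each tally is simply multiplied by its Importance factor and normalized by the weighted total; the function name `weighted_percentages` is hypothetical.

```python
def weighted_percentages(tallies, importance):
    """Return each attribute's share (in percent) of the importance-weighted total."""
    weighted = {name: count * importance[name] for name, count in tallies.items()}
    total = sum(weighted.values())
    if total == 0:
        # No weighted occurrences at all: every share is zero.
        return {name: 0.0 for name in tallies}
    return {name: 100.0 * value / total for name, value in weighted.items()}


tallies = {"Beef": 4, "Chicken": 3, "Handicap Accessible": 2, "Pork": 1}
importance = {"Beef": 0.9, "Chicken": 0.9, "Handicap Accessible": 0.1, "Pork": 0.9}
percentages = weighted_percentages(tallies, importance)
```

  • Rounding the values in `percentages` to whole numbers gives the figures listed above.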
  • These percentages are compared to statistics across the venue data set to determine a relative attribute prominence. For example, suppose the following statistics are calculated based on percentages of attribute occurrence (among venues for which the attribute is present) across the venue data set:
  • Attribute              AVG    STD DEV
    Beef:                  21%    7%
    Chicken:               17%    10%
    Handicap Accessible:   2%     2%
    Pork:                  15%    4%
  • The resulting attribute prominence weights (0-1 scale) may be:
    • Beef: 0.95
    • Chicken: 0.65
    • Handicap Accessible: 0.55
    • Pork: 0.40
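  • The specification does not state how the prominence weights are derived from the statistics. One possible approach, shown here purely as an assumption, is to compute a z-score for each attribute against the data-set average and standard deviation and squash it onto a 0-1 scale with a logistic function. This sketch does not reproduce the example weights above; it only preserves their ordering. The name `prominence_weight` is hypothetical.

```python
import math


def prominence_weight(percentage, mean, std_dev):
    """Map a venue's attribute percentage to a 0-1 weight relative to the data set.

    Assumed mapping (not from the specification): logistic function of the z-score.
    """
    if std_dev <= 0:
        # Without spread information, treat the attribute as neutrally prominent.
        return 0.5
    z = (percentage - mean) / std_dev
    return 1.0 / (1.0 + math.exp(-z))


# (venue percentage, data-set average, data-set standard deviation)
stats = {
    "Beef": (49, 21, 7),
    "Chicken": (36, 17, 10),
    "Handicap Accessible": (3, 2, 2),
    "Pork": (12, 15, 4),
}
weights = {name: prominence_weight(*values) for name, values in stats.items()}
```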
  • In step 116, the selected attributes in the venue data are each assigned a classifier factor. For example, attribute prestige factors may be determined based on the frequency of expensive venues that have a particular subset/synonym of an attribute, versus the frequency of less expensive venues that feature a particular attribute. For example, filet mignon may be assigned the following prestige factors by price tier: $: 0.05, $$: 0.15, $$$: 0.50, $$$$: 0.70. Other classifier factors may be assigned.
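  • A prestige factor of this kind could be estimated from a pool of venues with known price tiers. The sketch below assumes each venue is represented as a (price tier, set of terms) pair and that the factor for a tier is the fraction of venues in that tier whose data contains the term. Both the representation and the name `prestige_by_price_tier` are assumptions for illustration.

```python
from collections import defaultdict


def prestige_by_price_tier(venues, term):
    """Return, per price tier, the fraction of venues whose terms include `term`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for price_tier, terms in venues:
        totals[price_tier] += 1
        if term in terms:
            hits[price_tier] += 1
    return {tier: hits[tier] / totals[tier] for tier in totals}


# Hypothetical venue pool, for illustration only.
venue_pool = [
    ("$", {"burger", "fries"}),
    ("$", {"taco"}),
    ("$$$$", {"filet mignon", "lobster"}),
    ("$$$$", {"filet mignon"}),
    ("$$$$", {"sushi"}),
]
factors = prestige_by_price_tier(venue_pool, "filet mignon")
```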
  • In step 120, the venue is classified based on the percentage of each attribute representation and the classifier factor.
  • FIG. 2 is a flow diagram 200 of a method for classifying a venue by analyzing venue data from a venue website, in accordance with another embodiment. In step 204, preliminary venue-related data is received. The preliminary venue-related data may be a venue URL, a venue address, a venue phone number, etc. In step 208, the venue website is scanned to retrieve venue data. As discussed before, the venue data can be retrieved by scanning data from HTML pages, text documents, PDF documents and images found in the venue website.
  • In step 212, the venue data is analyzed to identify verifiable venue data. The verifiable venue data is a subset of the venue data that may be helpful in verifying the venue. For example, the venue address, the venue phone number and other important data may be considered verifiable venue data helpful in the verification of the venue. In step 216, the verifiable venue data is compared to the corresponding data from the preliminary venue-related data. In other words, certain data scanned from the venue website is compared to the preliminary data received in order to verify that the correct venue is being analyzed. In step 220, a probability level or certainty level for the venue URL is determined. In one embodiment, based on the results of the comparison in step 216, a venue is assigned a numerical score representing the probability factor. For example, consider a scenario where the venue address and venue name derived from scanning match the corresponding data from the preliminary venue-related data, but the venue phone number does not match. Consequently, the particular venue may be assigned a probability factor of 0.67, reflecting two matches out of three compared fields.
  • In step 224, the calculated probability level is compared to a predetermined probability level (e.g., first probability level). If the calculated probability level is equal to or greater than the first probability level, the number of selected attributes in the venue data is determined. For example, the venue data may be analyzed to determine the number of occurrences of the attributes “appetizers”, “filet mignon”, “creme brulee”, etc. As discussed before, the venue data is also analyzed for synonyms or words related to the attributes.
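  • The comparison in steps 216 and 220 can be sketched as a simple match ratio. The example score of 0.67 is consistent with two of three fields matching, but the specification does not fix a formula, so the equal weighting, the normalization, and the name `url_probability` below are assumptions.

```python
def url_probability(preliminary, scanned, fields=("name", "address", "phone")):
    """Fraction of comparable fields whose normalized values agree."""

    def normalize(value):
        # Ignore case, spacing and punctuation, e.g. "(555) 010-0000" vs "5550100000".
        return "".join(ch for ch in str(value).lower() if ch.isalnum())

    compared = [f for f in fields if f in preliminary and f in scanned]
    if not compared:
        return 0.0
    matches = sum(normalize(preliminary[f]) == normalize(scanned[f]) for f in compared)
    return matches / len(compared)


# Hypothetical records, for illustration only.
preliminary = {"name": "Example Grill", "address": "1 Main St.", "phone": "555-010-0000"}
scanned = {"name": "Example Grill", "address": "1 Main St", "phone": "555-010-9999"}
score = url_probability(preliminary, scanned)
```

  • The resulting score can then be compared against the first probability level described in step 224.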
  • In step 228, the percentage of the attribute representation from the total number of selected attributes in the venue data is determined. For example, the analysis may yield the following results: 30% steak, 25% chicken, 15% American, etc. In step 232, the venue is classified based on the percentage of the attribute representation. For example, if the percentage representation of the attribute “bar-be-que” is 55%, it may be reasonable to classify the venue as a bar-be-que restaurant. Also, if the analysis of the venue data reveals lack of the attributes “beef”, “steak”, “chicken”, “pork”, “fish” or their synonyms, it may be reasonable to classify the restaurant as a vegetarian restaurant.
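  • The classification rules in step 232 can be sketched as follows. The 50% threshold, the lowercase attribute keys, and the names `MEAT_ATTRIBUTES` and `classify_by_percentage` are assumptions chosen for illustration; the specification gives only the 55% bar-be-que example and the vegetarian inference.

```python
# Attributes whose absence suggests a vegetarian venue, per the example above.
MEAT_ATTRIBUTES = {"beef", "steak", "chicken", "pork", "fish"}


def classify_by_percentage(percentages, threshold=50.0):
    """Return classification labels derived from attribute percentages."""
    labels = [attr for attr, pct in percentages.items() if pct >= threshold]
    # Infer "vegetarian" when none of the meat attributes is represented.
    if not any(percentages.get(meat, 0) > 0 for meat in MEAT_ATTRIBUTES):
        labels.append("vegetarian")
    return labels
```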
  • In step 236, an attribute distance association is determined by identifying correlation of attributes. For example, an attribute distance (correlation) matrix may be computed based on frequency of co-occurrence of attribute pairs within the venue data set. Suppose 25% of venues that have Chicken or Beef have both. Suppose 50% of venues that have Pasta or Italian have both. Suppose 67% of venues that have Taco or Burrito have both.
  • In step 240, attribute similarities are quantified from the attribute distance association. Given a function X that converts correlation statistics to attribute similarities (0-1), the following may be computed:
  • Attribute Pair     Similarity
    Chicken, Beef      0.30
    Pasta, Italian     0.65
    Taco, Burrito      0.80
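  • The co-occurrence statistic described in step 236 (the share of venues having either attribute that have both) corresponds to the Jaccard index of the two attributes over the venue data set. The sketch below computes that statistic only; the function X that converts it to a similarity is not defined in the specification and is not modeled here. The name `cooccurrence_rate` and the venue pool are hypothetical.

```python
def cooccurrence_rate(venues, attr_a, attr_b):
    """Of the venues that have attr_a or attr_b, return the fraction that have both."""
    either = sum(1 for attrs in venues if attr_a in attrs or attr_b in attrs)
    both = sum(1 for attrs in venues if attr_a in attrs and attr_b in attrs)
    return both / either if either else 0.0


# Hypothetical venue pool: each venue is the set of attributes found for it.
venue_pool = [
    {"Chicken", "Beef"},
    {"Chicken"},
    {"Beef"},
    {"Beef", "Pasta"},
    {"Taco", "Burrito"},
]
rate = cooccurrence_rate(venue_pool, "Chicken", "Beef")
```

  • Computing this rate for every attribute pair fills the attribute distance (correlation) matrix.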
  • In step 244, the venue is compared to other venues based on the quantified attribute similarities. As explained before, an attribute-to-attribute comparison is performed, as well as an attribute-to-(related by similarity) attribute comparison, with overlap tallied. For example, an analysis may yield that the attributes “Moroccan” and “Greek” are 45% similar based on co-occurrences within the venue pool. Likewise, it may be determined that “Burrito” and “Taco” are 60% similar, and that they highly correlate with “Mexican.” These distance associations may then be used to improve searches and general venue comparisons. For example, even with limited data, if there is only one Moroccan restaurant in a city, a search for similar venues may yield a nearby Greek restaurant, since the attributes “Moroccan” and “Greek” are somewhat similar, based on their correlation.
  • Consider, for example, two venues being compared, each described by a set of attributes and “classifier factors”.
  • Venue 1: {Italian: 0.4, Chicken: 0.5, Salad: 0.25}
  • Venue 2: {Greek: 0.7, Kabobs: 0.4, Appetizers: 0.2}
  • A general weighted venue-to-venue attribute comparison would yield a score of zero in this case, because the two venues share no attributes. However, if there is a set of Attribute Distance Associations defined as {Italian->Greek: 0.3, Chicken->Kabobs: 0.5, Salad->Appetizers: 0.4}, the comparison may yield a significantly improved score. As an example, Italian->Greek overlap *0.3+Chicken->Kabobs overlap *0.5+Salad->Appetizers overlap *0.4=0.18+0.22+0.09=0.49 (raw score, non-normalized.)
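  • The comparison can be sketched as below. The specification does not define how the “overlap” of two associated attributes is computed, so this sketch assumes the mean of the two classifier factors; its result is therefore close to, but not identical to, the raw score given above. The names `direct_score` and `associated_score` are hypothetical.

```python
def direct_score(venue_a, venue_b):
    """Attribute-to-attribute comparison: only identical attribute names contribute."""
    shared = venue_a.keys() & venue_b.keys()
    return sum(min(venue_a[name], venue_b[name]) for name in shared)


def associated_score(venue_a, venue_b, associations):
    """Comparison through attribute distance associations (raw, non-normalized)."""
    score = 0.0
    for (attr_a, attr_b), similarity in associations.items():
        if attr_a in venue_a and attr_b in venue_b:
            # Assumed overlap: mean of the two classifier factors.
            overlap = (venue_a[attr_a] + venue_b[attr_b]) / 2
            score += overlap * similarity
    return score


venue_1 = {"Italian": 0.4, "Chicken": 0.5, "Salad": 0.25}
venue_2 = {"Greek": 0.7, "Kabobs": 0.4, "Appetizers": 0.2}
associations = {
    ("Italian", "Greek"): 0.3,
    ("Chicken", "Kabobs"): 0.5,
    ("Salad", "Appetizers"): 0.4,
}
direct = direct_score(venue_1, venue_2)
related = associated_score(venue_1, venue_2, associations)
```

  • Here `direct` is zero because no attribute name is shared, while `related` is positive, which illustrates why the distance associations improve venue-to-venue comparisons.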
  • In one embodiment, if the probability level discussed before is equal to or greater than the first probability level, the venue data may be analyzed for general information, such as operating hours, events, happy hours, specials, etc. The venue data may be analyzed to identify menus based on known characteristics, including currency indicators to collect pricing data. Additional information may be deduced such as the venue atmosphere based on factors such as the Flesch-Kincaid Grade Level and readability scores, descriptions and synonyms used (e.g., filet mignon, free range chicken, wild Alaskan Salmon, Australian rack of lamb, etc.). For example, venue data containing a Flesch-Kincaid Grade Level of 18 (the 18th grade), coupled with attribute synonyms with high prestige factors may be indicative of a Fine Dining atmosphere.
  • FIG. 3 is a system 300 that may be used to implement an example embodiment of the invention including at least portions of the disclosed embodiments. The system 300 includes a server 304 including a central processing unit (CPU) 308. The server 304 is connected to a database 312 or any other data storage system. A venue classification application 316 may reside in the server 304. The venue classification application 316 may be a software application or a routine configured to classify a venue by analyzing venue data from a venue website. The CPU 308 executes the application 316 to process data. The server 304 is connected to the Internet 328. The server 304 may access venue websites 332 x, scan venue data from the venue websites 332 x, and classify venues based on the processes discussed before. The venue data may be stored in the database 312.
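  • For reference, the Flesch-Kincaid Grade Level is computed from word, sentence, and syllable counts using the standard published formula. The sketch below takes the counts as inputs; how words, sentences, and syllables are counted in the venue data is left open, and the name `flesch_kincaid_grade` is hypothetical.

```python
def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Standard Flesch-Kincaid Grade Level from raw counts."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Illustrative counts: 100 words, 5 sentences, 150 syllables.
grade = flesch_kincaid_grade(100, 5, 150)
```

  • Longer sentences and more syllables per word both raise the grade level, which is why elaborate menu descriptions tend to score higher than terse ones.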
  • In one example implementation, a software application embodying a computer program code may be configured to classify a venue by analyzing venue data from a venue website. In one implementation, the steps of the methods described above may be executed by one or more computer readable codes embodied in a computer readable medium such as a computer program product. The computer program product may be a CD, a floppy disk, an optical disk, a hard drive or any other storage system.
  • The venue classification method and system in accordance with embodiments described before provides various advantages. The venue classification may be used to target only websites of interest to be parsed and adds uniformity to venue representations. The venue classification provides varying levels of detail for comparisons, and results are effectively indexed by “meaning”, so a search for “beef” may also return results with “steak.” The venue classification also provides context checking (e.g., out of context possibilities, context-safe attributes). The venue classification enables parsing a set of target websites to deduce attributes, based on the merging of synonyms, and ultimately a frequency analysis.
  • The system, method, and computer program product described in this application may, of course, be embodied in hardware; e.g., within or coupled to a Central Processing Unit (“CPU”), microprocessor, microcontroller, System on Chip (“SOC”), or any other programmable device. Additionally, the system, method, computer program product, and propagated signal may be embodied in software (e.g., computer readable code, program code, instructions and/or data disposed in any form, such as source, object or machine language) disposed, for example, in a computer usable (e.g., readable) medium configured to store the software. Such software enables the function, fabrication, modeling, simulation, description and/or testing of the apparatus and processes described herein. For example, this can be accomplished through the use of general programming languages (e.g., C, C++), GDSII databases, hardware description languages (HDL) including Verilog HDL, VHDL, AHDL (Altera HDL) and so on, or other available programs, databases, nanoprocessing, and/or circuit (i.e., schematic) capture tools. Such software can be disposed in any known computer usable medium including semiconductor, magnetic disk, optical disc (e.g., CD-ROM, DVD-ROM, etc.) and as a computer data signal embodied in a computer usable (e.g., readable) transmission medium (e.g., carrier wave or any other medium including digital, optical, or analog-based medium). As such, the software can be transmitted over communication networks including the Internet and intranets. A system, method, computer program product, and propagated signal embodied in software may be included in a semiconductor intellectual property core (e.g., embodied in HDL) and transformed to hardware in the production of integrated circuits. Additionally, a system, method, computer program product, and propagated signal as described herein may be embodied as a combination of hardware and software.
  • Any suitable programming language can be used to implement the routines of the present invention including C, C++, Java, assembly language, etc. Different programming techniques can be employed such as procedural or object oriented. The routines can execute on a single processing device or multiple processors. Although the steps, operations or computations may be presented in a specific order, this order may be changed in different embodiments. In some embodiments, multiple steps shown as sequential in this specification can be performed at the same time. The sequence of operations described herein can be interrupted, suspended, or otherwise controlled by another process, such as an operating system, kernel, and the like. The routines can operate in an operating system environment or as stand-alone routines occupying all, or a substantial part, of the system processing.
  • Embodiments of the invention may be implemented by using a general purpose digital computer, software applications, routines and software modules, or hardware including application specific integrated circuits, programmable logic devices, and field programmable gate arrays; optical and other mechanisms may also be used. In general, the functions of the present invention can be achieved by any means as is known in the art. Distributed, or networked systems, components and circuits can be used. Communication, or transfer, of data may be wired, wireless, or by any other means.
  • It will also be appreciated that one or more of the elements depicted in the drawings/figures can also be implemented in a more separated or integrated manner, or even removed or rendered as inoperable in certain cases, as is useful in accordance with a particular application. It is also within the spirit and scope of the present invention to implement a program or code that can be stored in a machine-readable medium to permit a computer to perform any of the methods described above.
  • Additionally, any signal arrows in the drawings/Figures should be considered only as exemplary, and not limiting, unless otherwise specifically noted. Furthermore, the term “or” as used herein is generally intended to mean “and/or” unless otherwise indicated. Combinations of components or steps will also be considered as being noted, where terminology is foreseen as rendering the ability to separate or combine unclear.
  • As used in the description herein and throughout the claims that follow, “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
  • In the description herein, numerous specific details are provided, such as examples of components and/or methods, to provide a thorough understanding of embodiments of the present invention. One skilled in the relevant art will recognize, however, that an embodiment of the invention can be practiced without one or more of the specific details, or with other apparatus, systems, assemblies, methods, components, materials, parts, and/or the like. In other instances, well-known structures, materials, or operations are not specifically shown or described in detail to avoid obscuring aspects of embodiments of the present invention.
  • A “computer-readable medium” for purposes of embodiments of the present invention may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer readable medium can be, by way of example only but not by limitation, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, propagation medium, or computer memory.
  • Reference throughout this specification to “one implementation”, “one embodiment”, “an embodiment”, or “a specific embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention and not necessarily in all embodiments. Thus, respective appearances of the phrases “in one embodiment”, “in an embodiment”, or “in a specific embodiment” in various places throughout this specification are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics of any specific embodiment of the present invention may be combined in any suitable manner with one or more other embodiments. It is to be understood that other variations and modifications of the implementations or embodiments of the present invention described and illustrated herein are possible in light of the teachings herein and are to be considered as part of the spirit and scope of the present invention.
  • The foregoing description of illustrated embodiments of the present invention, including what is described in the Abstract, is not intended to be exhaustive or to limit the invention to the precise forms disclosed herein. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes only, various equivalent modifications are possible within the spirit and scope of the present invention, as those skilled in the relevant art will recognize and appreciate. As indicated, these modifications may be made to the present invention in light of the foregoing description of illustrated embodiments of the present invention and are to be included within the spirit and scope of the present invention.
  • Thus, while the present invention has been described herein with reference to particular embodiments thereof, a latitude of modification, various changes and substitutions are intended in the foregoing disclosures, and it will be appreciated that in some instances some features of embodiments of the invention will be employed without a corresponding use of other features without departing from the scope and spirit of the invention as set forth. Therefore, many modifications may be made to adapt a particular situation or material to the essential scope and spirit of the present invention. It is intended that the invention not be limited to the particular terms used in the following claims and/or to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but that the invention will include any and all embodiments and equivalents falling within the scope of the appended claims. Thus, the scope of the invention is to be determined solely by the appended claims.
  • One Embodiment of a Method
  • One embodiment includes a method for classifying a venue by analyzing venue data from a venue website, comprising: receiving preliminary venue-related data including a venue URL; scanning the venue website to retrieve venue data; retrieving verifiable venue data from the venue data, the verifiable venue data being a subset of the venue data; analyzing the verifiable venue data by comparing the verifiable venue data to the preliminary venue-related data; determining a probability level for the venue URL from the comparison; if the probability level for the venue URL is equal to or greater than a first probability level, determining the number of selected attributes in the venue data; determining the percentage of the attribute representation from the total number of preselected attributes in the venue data; and classifying the venue based on the percentage of the attribute representation.
  • The method may further comprise: determining attribute distance association by identifying correlation of attributes; quantifying attribute similarities from the attribute distance association; and comparing the classified venue to other venues based on quantified attribute similarities.
  • The method may further comprise: classifying the atmosphere of the venue based on the attributes.
  • In one implementation of the method, the preliminary venue-related data comprises venue address, venue phone number, and venue category. In one implementation of the method, scanning the venue website comprises retrieving the venue data from HTML pages, text documents, PDF documents and images. In one implementation of the method, the verifiable venue data includes venue name, venue address, and venue phone number.
  • The method may further comprise: comparing selected verifiable venue data to corresponding preliminary venue-related data; determining if the selected verifiable venue data is different from the corresponding preliminary venue-related data; assigning a probability level lower than the first probability level if the selected verifiable venue data is different from the corresponding preliminary venue-related data.
  • In one implementation of the method, the selected verifiable venue data is the venue address. In one implementation of the method, the selected verifiable venue data is the venue phone number. In one implementation of the method, the selected attributes include attribute synonyms.
  • In one implementation of the method, the classification of the venue further comprises: assigning a classifier factor for each selected attribute in the venue data; classifying the venue from the assigned classifier factors.
  • In one implementation of the method, the classifier factor is a fanciness factor. In one implementation of the method, the classifier factor is an outdoorsy factor. In one implementation of the method, the classifier factor is an activity factor.
  • One Embodiment of a Method
  • One embodiment includes a method for classifying a venue by analyzing venue data from a venue website, comprising: receiving venue data from the venue website; determining the number of preselected attributes in the venue data; determining a percentage of each preselected attribute representation from the total number of preselected attributes in the venue data; assigning a classifier factor for each preselected attribute in the venue data; classifying the venue based on the percentage of each attribute representation and the classifier factor.
  • In one implementation of the method, the receiving venue data comprises scanning and retrieving information from the venue website's HTML pages, text documents, PDF documents and images.
  • The method may further comprise: receiving preliminary venue-related data; retrieving verifiable venue data from the venue data; analyzing verifiable venue data by comparing the verifiable venue data to preliminary venue-related data; determining a probability level for the venue's URL from the comparison; verifying the venue from the probability level for the URL.
  • The method may further comprise: determining attribute distance association for the preselected attributes in the venue data; quantifying attribute similarities from the attribute distance association; comparing the venue to other venues based on quantified attribute similarities.
  • One Embodiment of a Method
  • One embodiment includes a method for classifying a venue by analyzing selected attributes in a venue website, comprising: entering the venue website using a URL corresponding to the website; receiving venue data from the venue website; determining the number of selected attributes in the venue website; determining a percentage of each selected attribute representation from the total number of preselected attributes in the URL data; assigning a classifier factor for each selected attribute in the URL data; classifying the venue based on the percentage of each attribute representation and the classifier factor.
  • In one implementation of the method, the receiving venue data comprises scanning and retrieving information from the venue website's HTML pages, text documents, PDF documents and images.
  • The method may further comprise: receiving preliminary venue-related data; retrieving verifiable venue data from the venue data; analyzing verifiable venue data by comparing the verifiable venue data to preliminary venue-related data; determining a probability level for the venue's URL from the comparison; verifying the venue from the probability level for the URL.
  • The method may further comprise: determining attribute distance association for the selected attributes in the venue data; quantifying attribute similarities from the attribute distance association; comparing the venue to other venues based on quantified attribute similarities.
  • One Embodiment of a System
  • One embodiment includes a system for classifying a venue by analyzing venue data from a venue website, comprising: a server having a central processing unit, the server receiving the venue data from the venue website; a venue classification application having program code for executing a plurality of steps to analyze the venue data and classify the venue data; a communication network enabling the server to access the venue website to receive the venue data; the central processing unit executing the steps of: determining the number of selected attributes in the venue data; determining a percentage of each selected attribute representation from the total number of selected attributes in the venue data; assigning a classifier factor for each selected attribute in the venue data; classifying the venue based on the percentage of each attribute representation and the classifier factor.
  • In one implementation of the system, the communication network is the Internet.
  • The content of the following application is incorporated by reference herein in its entirety for all purposes: Ser. No. 12/134,126, filed Jun. 5, 2008, entitled METHOD AND SYSTEM FOR CLASSIFICATION OF VENUE BY ANALYZING DATA FROM VENUE WEBSITE.

Claims (9)

1-20. (canceled)
21. A computer-implemented method for determining attributes of a venue, comprising:
analyze, using a processor, first data associated with a first venue to identify a first set of venue attributes associated with the first venue;
analyze, using the processor, second data associated with one or more other venues to identify a second set of venue attributes associated with the one or more other venues;
compare, using the processor, the first set of venue attributes with the second set of venue attributes; and
determine, based on comparing the first set and the second set, a similarity between the first venue and the one or more other venues.
22. The method of claim 21, wherein the first venue and the one or more other venues are in the same city.
23. The method of claim 21, further comprising:
classifying the first venue based on the first set of venue attributes.
24. The method of claim 21, wherein the first venue is classified based on its atmosphere, fanciness, or prestige.
25. The method of claim 21, further comprising:
assigning each of the first set of venue attributes with a classifier factor.
26. The method of claim 25, further comprising:
determining a proximity between a first venue attribute and a second venue attribute based on a first assigned classifier factor of the first venue attribute and a second assigned classifier factor of the second venue attribute.
27. The method of claim 21, further comprising:
determining an importance factor for each venue attribute in the first set of venue attributes.
28. The method of claim 21, wherein the first set of venue attributes may include words found in the first data.
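As a rough illustration of claims 21, 26, and 28, the following Python sketch extracts attribute sets from two venues' text, compares them, and derives a similarity. The claims do not name a similarity measure, so the Jaccard index used here is an assumption, as are the function names, the vocabulary, and the use of an absolute difference between classifier factors as the proximity of claim 26.

```python
def extract_attributes(text, vocabulary):
    """Identify venue attributes as vocabulary words found in the venue's data."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return words & vocabulary


def similarity(first, second):
    """Jaccard index of two attribute sets (an assumed measure, from 0.0 to 1.0)."""
    union = first | second
    if not union:
        return 0.0
    return len(first & second) / len(union)


def attribute_proximity(factor_a, factor_b):
    """Assumed proximity: distance between two classifier factors (smaller is closer)."""
    return abs(factor_a - factor_b)


if __name__ == "__main__":
    vocabulary = {"wine", "jazz", "patio", "karaoke"}  # hypothetical attribute words
    first = extract_attributes("Wine, jazz and a patio.", vocabulary)
    second = extract_attributes("Karaoke and wine!", vocabulary)
    print(similarity(first, second))
```

Restricting both venues to the same city, as in claim 22, would amount to filtering the candidate venues before calling `similarity`.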
US14/543,586 2008-06-05 2014-11-17 Method and system for classification of venue by analyzing data from venue website Abandoned US20150073944A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/543,586 US20150073944A1 (en) 2008-06-05 2014-11-17 Method and system for classification of venue by analyzing data from venue website

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/134,126 US8918369B2 (en) 2008-06-05 2008-06-05 Method and system for classification of venue by analyzing data from venue website
US14/543,586 US20150073944A1 (en) 2008-06-05 2014-11-17 Method and system for classification of venue by analyzing data from venue website

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US12/134,126 Continuation US8918369B2 (en) 2008-06-05 2008-06-05 Method and system for classification of venue by analyzing data from venue website

Publications (1)

Publication Number Publication Date
US20150073944A1 (en) 2015-03-12

Family

ID=41401237

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/134,126 Active 2029-11-02 US8918369B2 (en) 2008-06-05 2008-06-05 Method and system for classification of venue by analyzing data from venue website
US14/543,586 Abandoned US20150073944A1 (en) 2008-06-05 2014-11-17 Method and system for classification of venue by analyzing data from venue website

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US12/134,126 Active 2029-11-02 US8918369B2 (en) 2008-06-05 2008-06-05 Method and system for classification of venue by analyzing data from venue website

Country Status (1)

Country Link
US (2) US8918369B2 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8918369B2 (en) * 2008-06-05 2014-12-23 Craze, Inc. Method and system for classification of venue by analyzing data from venue website
US8554854B2 (en) * 2009-12-11 2013-10-08 Citizennet Inc. Systems and methods for identifying terms relevant to web pages using social network messages
US10789526B2 (en) 2012-03-09 2020-09-29 Nara Logics, Inc. Method, system, and non-transitory computer-readable medium for constructing and applying synaptic networks
US10467677B2 (en) 2011-09-28 2019-11-05 Nara Logics, Inc. Systems and methods for providing recommendations based on collaborative and/or content-based nodal interrelationships
US8170971B1 (en) * 2011-09-28 2012-05-01 Ava, Inc. Systems and methods for providing recommendations based on collaborative and/or content-based nodal interrelationships
US8732101B1 (en) 2013-03-15 2014-05-20 Nara Logics, Inc. Apparatus and method for providing harmonized recommendations based on an integrated user profile
US11727249B2 (en) 2011-09-28 2023-08-15 Nara Logics, Inc. Methods for constructing and applying synaptic networks
US11151617B2 (en) 2012-03-09 2021-10-19 Nara Logics, Inc. Systems and methods for providing recommendations based on collaborative and/or content-based nodal interrelationships

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5926812A (en) * 1996-06-20 1999-07-20 Mantra Technologies, Inc. Document extraction and comparison method with applications to automatic personalized database searching
US5956726A (en) * 1995-06-05 1999-09-21 Hitachi, Ltd. Method and apparatus for structured document difference string extraction
US6519557B1 (en) * 2000-06-06 2003-02-11 International Business Machines Corporation Software and method for recognizing similarity of documents written in different languages based on a quantitative measure of similarity
US6782423B1 (en) * 1999-12-06 2004-08-24 Fuji Xerox Co., Ltd. Hypertext analyzing system and method
US20060167757A1 (en) * 2005-01-21 2006-07-27 Holden Jeffrey A Method and system for automated comparison of items
US20070162414A1 (en) * 2005-12-30 2007-07-12 Yoram Horowitz System and method for using external references to validate a data object's classification / consolidation
US20080162449A1 (en) * 2006-12-28 2008-07-03 Chen Chao-Yu Dynamic page similarity measurement
US8918369B2 (en) * 2008-06-05 2014-12-23 Craze, Inc. Method and system for classification of venue by analyzing data from venue website

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020095298A1 (en) * 1999-04-19 2002-07-18 Frogmagic, Inc. Blind Gift Method and System
US6487555B1 (en) * 1999-05-07 2002-11-26 Alta Vista Company Method and apparatus for finding mirrored hosts by analyzing connectivity and IP addresses
JP2000348041A (en) * 1999-06-03 2000-12-15 Nec Corp Document retrieval method, device therefor and mechanically readable recording medium
US6434745B1 (en) * 1999-09-15 2002-08-13 Direct Business Technologies, Inc. Customized web browsing and marketing software with local events statistics database
US20020116276A1 (en) * 2001-02-20 2002-08-22 Ottley Steven R. Intuitive graphical user interface for dynamically addressing electronic shopping cart deliverables
US7403938B2 (en) * 2001-09-24 2008-07-22 Iac Search & Media, Inc. Natural language query processing
US20050149507A1 (en) * 2003-02-05 2005-07-07 Nye Timothy G. Systems and methods for identifying an internet resource address
US7260568B2 (en) * 2004-04-15 2007-08-21 Microsoft Corporation Verifying relevance between keywords and web site contents
JP4720213B2 (en) * 2005-02-28 2011-07-13 富士通株式会社 Analysis support program, apparatus and method
US8615800B2 (en) * 2006-07-10 2013-12-24 Websense, Inc. System and method for analyzing web content
US7657507B2 (en) * 2007-03-02 2010-02-02 Microsoft Corporation Pseudo-anchor text extraction for vertical search

Also Published As

Publication number Publication date
US8918369B2 (en) 2014-12-23
US20090307238A1 (en) 2009-12-10

Similar Documents

Publication Publication Date Title
US20150073944A1 (en) Method and system for classification of venue by analyzing data from venue website
US8260731B2 (en) Information classification system, information processing apparatus, information classification method and program
US7751592B1 (en) Scoring items
Larkey et al. Acrophile: an automated acronym extractor and server
CN107862022B (en) Culture resource recommendation system
US8176067B1 (en) Fixed phrase detection for search
US20110112995A1 (en) Systems and methods for organizing collective social intelligence information using an organic object data model
Nguyen et al. Learning to extract form labels
US20170185680A1 (en) Chinese website classification method and system based on characteristic analysis of website homepage
KR100485321B1 (en) A method of managing web sites registered in search engine and a system thereof
JP2014502753A (en) Web page information detection method and system
WO2017113592A1 (en) Model generation method, word weighting method, apparatus, device and computer storage medium
CN111444304A (en) Search ranking method and device
CN111553137B (en) Report generation method and device, storage medium and computer equipment
US20220207483A1 (en) Automatic document classification
CN112989824A (en) Information pushing method and device, electronic equipment and storage medium
CN114692593B (en) Network information safety monitoring and early warning method
CN107908649B (en) Text classification control method
JP2003173352A (en) Retrieval log analysis method and device, document information retrieval method and device, retrieval log analysis program, document information retrieval program and storage medium
Ceroni et al. Improving event detection by automatically assessing validity of event occurrence in text
CN113420198A (en) Patent infringement clue web crawler method for web commodities
CN106919649B (en) Entry weight calculation method and device
CN106934007B (en) Associated information pushing method and device
CN108628875B (en) Text label extraction method and device and server
CN109934740B (en) patent monitoring method and device

Legal Events

Date Code Title Description
AS Assignment

Owner name: CRAZE, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SANGUINETTI, THOMAS V;SANGUINETTI, DAVE T;PLOETNER, JEFFREY S;SIGNING DATES FROM 20141114 TO 20141117;REEL/FRAME:034282/0280

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION