US7698317B2 - Techniques for detecting duplicate web pages - Google Patents
Techniques for detecting duplicate web pages
- Publication number
- US7698317B2 (application US 11/788,505)
- Authority
- US
- United States
- Prior art keywords
- pages
- shingles
- page
- group
- processors
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
Definitions
- the present invention relates to search engines and, more specifically, to techniques for detecting duplicate web pages.
- a “page” refers to an online document.
- An online document may be any set of data including, but not limited to, an image, a Portable Document Format (PDF) document, a set of binary data, and a markup language document.
- markup languages include, but are not limited to, HyperText Markup Language (HTML), eXtensible Markup Language (XML), as well as a wide variety of markup languages that are derivatives of the Standard Generalized Markup Language (SGML).
- a content search engine uses a shingle-based mechanism for detecting duplicate web pages.
- shingle refers to a compact data value that represents a fragment of a page.
- the search engine computes a fingerprint of a given page by computing a collection of shingles, where each shingle in the collection is computed based on a particular fragment that is defined by a sliding window over the content of the given page. The search engine determines that two pages have duplicate content when the two pages have the same or substantially the same fingerprints.
- the disadvantage of this duplicate detection mechanism is that in the presence of site-level page templates it produces false positives (e.g. classifying pages as having duplicate content when in fact the pages have different content) and false negatives (e.g. classifying pages as non-duplicates when in fact the pages have the same content).
- the shingles used to detect the pages with duplicate content may have been computed over page fragments that originate from the same template part of a site-level template that is shared by the pages on a given site.
- two different web pages on the same site or host usually share the same site-level template, where the site-level template may be a set of HTML or other markup code that is common to, and determines the layout of, all pages on the particular site or host.
- if the shingles that a duplicate detection mechanism uses to determine whether two pages have duplicate content originate from the same page portions defined by a site-level template, then the mechanism would classify the two pages as having duplicate content even though the two pages may in fact have different content.
- conversely, when the shingles are computed over template portions that differ between sites, the duplicate detection mechanism would classify two pages at the different sites as non-duplicates even though the two pages may in fact have the same content.
- FIG. 1 is a block diagram that illustrates an example layout of a web page.
- Different sites or hosts may store web pages that have layouts that are different from the page layout illustrated in FIG. 1 ; for example, different layouts may include a wide variety of different portions in different page positions. It is noted that the techniques described herein are not limited to detecting duplicate pages having any particular layout defined by any particular site-level template, and for this reason the page layout depicted in FIG. 1 is to be regarded in an illustrative rather than a restrictive sense.
- a site-level template may be used to define the common portions of a typical page 100 stored on the site.
- the common page portions may comprise one or more advertisement portions 102 A-B, a navigation portion 104 , and a contact/copyright portion 106 .
- Each of the one or more advertisement portions 102A-B may be used on each page of the site to display certain ads.
- the navigation portion 104 is also common for each page on the site and is used to display buttons and links which a user may use to navigate through the site.
- the contact/copyright portion 106 is also common for each page on the site and is used to display the same copyright and/or contact information.
- Content portion 108 is used to display the content of each page on the site; hence, the content portion 108 would likely be different for the different pages on the site.
- a shingle-based duplicate detection mechanism computes fingerprints for two pages on the site, where the shingles in each fingerprint are computed over fragments from the advertisement portions 102A-B, the navigation portion 104, and the contact/copyright portion 106 (which are common for both pages).
- the duplicate detection mechanism would compare the fingerprints for each page, would find the shingles therein to be the same, and would classify the two pages as having duplicate content even though the content portions 108 of the two pages may be different.
- FIG. 2 is a flow diagram that illustrates an example of an improved method for detecting duplicate web pages according to some embodiments
- FIG. 3 is a block diagram that illustrates an example content search system according to one embodiment.
- FIG. 4 is a block diagram of a computer system on which embodiments of the invention may be implemented.
- a “duplicate page” refers to a page that includes the same or substantially the same content portion as another one or more pages. Duplicate pages may be present on the same site or on different sites.
- a “site” generally refers to a group of pages that are related according to some criteria or in some way, for example, based on a shared page layout, shared site-level templates, or common ownership.
- “site” can refer to all pages under a particular domain, sub-domain, host, sub-host, and/or any URL-path expression such as “www.nytimes.com/sports/tennis/*”. (“yahoo.com” and “news.yahoo.com” are examples of a domain and sub-domain; “in.news.yahoo.com” and “www.nytimes.com/sports” are examples of a host and a sub-host.)
- “site” can refer to a group of pages that are “owned” by the same entity.
- the pages on multiple domains and/or multiple hosts owned by the same entity may also be considered a “site”.
- “site” can refer to the pages on a mirror host (e.g., “www.cnn.com” may be a mirror host of “www2.cnn.com”).
- As an example of duplicate pages, consider a particular news site that includes a news article as an HTML page, a print page (which is a page specifically formatted for user-friendly printing, e.g., without any advertisements or navigation panes) of the same article, and/or an RSS feed page (which is a page formatted specifically for electronic downloading of content) for the article.
- FIG. 2 is a flow diagram that illustrates an example of an improved method for detecting duplicate web pages according to the techniques described herein.
- a set of shingles is computed for each page of a group of pages.
- “shingle” refers to a compact data value that represents a fragment of a page.
- a shingle may be of any data type including, but not limited to, a numeric data type, a string datatype, and any binary data type.
- each shingle in the set of shingles for a particular page may be computed over a randomly selected fragment of the page.
- the set of shingles for a particular page may be selected from a plurality of all shingles that can be computed for the page.
- an aggregate set of shingles is determined based on the sets of shingles computed for the pages in the group of pages.
- the aggregate set may include all shingles from all sets of shingles computed for the pages of the group of pages.
- the aggregate set may include a subset of the shingles included in sets of shingles computed for the pages of the group of pages.
- a subset of frequently occurring shingles in the aggregate set is determined.
- the subset may be determined by selecting from the aggregate set those shingles whose frequency of occurrence in the set exceeds a specified threshold.
- the frequency of occurrence of a particular shingle in the aggregate set may be a ratio between the number of pages in which that shingle occurs and the total number of pages in the group of pages.
- the specified threshold may be a percentage of the number of pages in the group of pages.
- the frequency of occurrence of a particular shingle in the aggregate set may be a counter indicating the number of pages in which that shingle occurs.
- the specified threshold may be a specific number of pages.
- a modified set of shingles is generated for each page in the group of pages.
- the modified set for each particular page may be generated by removing, from the computed set of shingles for that page, any shingle or shingles that are included in the subset of frequently occurring shingles.
- one or more duplicate pages are determined in the group of pages based at least in part on the modified sets of shingles that are generated for the pages in the group of pages.
- a shingle-based duplicate detection logic may be executed and applied to the modified sets of shingles in order to determine whether each page in the group of pages is a duplicate of any other page in the group.
- the techniques described herein provide for improving the detection of duplicate pages. Improving the detection of duplicate pages in turn provides other benefits including, but not limited to, more efficient crawling of pages by ceasing to crawl and follow links out of duplicate pages, more efficient storage of content indexes by not storing index entries for duplicate pages, and more accurate responses to user queries by not including duplicate pages in search results.
- the techniques described herein provide a computationally efficient way of improving the accuracy of shingle-based duplicate detection mechanisms. By finding and removing from consideration any shingles that may have been computed over fragments from shared common portions of related pages, in a computationally efficient way the techniques described herein increase the likelihood that the remaining shingles relied on by a duplicate detection mechanism are unique enough to cause a more accurate duplicate detection.
- As a crawler is traversing the web, the crawler identifies a group of pages that are related or associated with each other according to some criteria. After the crawler accesses and retrieves the group of pages, a set of shingles is computed for each page in the group of related pages. The set of shingles computed for each page may be stored in computer data storage.
- “computer data storage” refers to one or more data structures, which are operable to store data and which may be accessed by a computing device such as a computer system.
- a computer data storage may be physically stored in any volatile and/or non-volatile computer-readable medium.
- a computer data storage may be implemented in a storage device that includes volatile media, such as RAM or other dynamic memory, or non-volatile media, such as an electromagnetic or optical disk.
- One example embodiment may use shingles that are hash values computed over fragments of a page.
- the markup language code of each page in the group of pages is first unfolded.
- hash values are computed based on a sliding window that may include a certain number of words, for example, 10 words.
- the window slides one word at a time starting at the beginning of the unfolded page, and at each word a shingle is computed by evaluating a hash function over the words currently included in the window. In this manner, as many shingles are computed for each page as there are text words in that page.
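- By way of illustration only, the sliding-window computation described above might be sketched as follows. The function name `sliding_window_shingles`, the use of SHA-1 truncated to 64 bits as the hash function, and the choice to let the last few windows hold fewer than 10 words (so that there are as many shingles as words) are assumptions made for this sketch, not details taken from the description.

```python
import hashlib


def sliding_window_shingles(text, window=10):
    """Compute one integer shingle per word of `text`.

    The window starts at each word in turn and covers up to `window` words.
    Windows near the end of the text are shorter (an assumption of this
    sketch), which yields exactly as many shingles as there are words.
    """
    words = text.split()
    shingles = []
    for start in range(len(words)):
        fragment = " ".join(words[start:start + window])
        digest = hashlib.sha1(fragment.encode("utf-8")).digest()
        # Keep the first 8 bytes so each shingle is a compact 64-bit value.
        shingles.append(int.from_bytes(digest[:8], "big"))
    return shingles
```

- Two pages that share a run of 10 consecutive words produce the same shingle for that run, which is what lets the later steps compare pages by comparing sets of integers.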
- a set of shingles is selected from all computed shingles.
- the selection of the shingles may be random or may be based on particular criteria.
- the selected set may include a certain number of shingles that have the smallest values among all shingles computed for the particular page.
- the size of a set of shingles may also be fixed; for example, the same number of shingles may be selected into the set of shingles for each page in the group.
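- As a hedged sketch of the selection step, the fixed-size set of smallest-valued shingles might be chosen as shown below. The helper name `select_smallest` is hypothetical, and the default of 24 is borrowed from the worked example given later in this description.

```python
import heapq


def select_smallest(shingles, k=24):
    """Return the k smallest distinct shingle values as a set.

    If fewer than k distinct shingles exist, all of them are returned.
    """
    return set(heapq.nsmallest(k, set(shingles)))
```

- Selecting by smallest value is deterministic, so two pages with the same content select the same shingles regardless of the order in which the pages are processed.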
- all shingles computed for a particular page may be included in the set of selected shingles for that page.
- the set of shingles for a particular page may be computed based on the text of the entire page, including tags and other markup that may be present in the page.
- the tags and other markup may be stripped from the page prior to computing the set of shingles for that page. In some implementations this may improve the accuracy of any duplicate detection mechanism because the common text of tags and other markup is excluded when the shingles for each particular page are computed.
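- One possible way to strip tags before computing shingles is sketched below using the `html.parser` module from the Python standard library. The names `_TextExtractor` and `strip_markup` are hypothetical. This sketch keeps every text node, including the contents of script and style elements, which a production system would likely want to exclude.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of a document and ignore the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(html):
    """Return the text content of `html` with whitespace normalized."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the text nodes and collapse runs of whitespace to single spaces.
    return " ".join(" ".join(parser.parts).split())
```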
- a shingle may be a Cyclical Redundancy Check (CRC) value that is computed over a sliding window of text from a particular page.
- page fragments may be directly used as the shingles for that page.
- shingles may be computed based on information included in a Document Object Model (DOM) tree that is generated for the HTML code of a page.
- an aggregate set of shingles is generated and stored in computer data storage.
- the aggregate set of shingles may be generated and stored as a table that includes one row for each page in the group of pages, where the columns of a particular row would store the shingles in the set of shingles computed for the particular page associated with that row.
- the table storing the aggregate set may be implemented in volatile memory and/or on persistent storage.
- the aggregate set of shingles is based on the sets of shingles computed for the pages in the group of related pages.
- each shingle may be associated with the particular page from which that shingle was generated.
- the associations in the aggregate set between shingles and their corresponding pages may be kept in a variety of logical data structures including, but not limited to, tables, arrays, and linked lists.
- the techniques described herein are not limited to any particular structure for storing the aggregate set of shingles and the associations to pages thereof.
- the examples of storage structures operable to store an aggregate set of shingles provided herein are to be regarded in an illustrative rather than a restrictive sense.
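- Purely as an illustration of one such storage structure, the aggregate set might be held in memory as a table with one row per page, together with a reverse mapping from each shingle to the pages it came from. The function name `build_aggregate` and the use of Python dictionaries are assumptions of this sketch.

```python
from collections import defaultdict


def build_aggregate(page_shingles):
    """Build an aggregate table and a shingle-to-pages association.

    `page_shingles` maps a page identifier to an iterable of shingles.
    Returns (table, pages_by_shingle), where `table` has one row per page
    holding that page's distinct shingles in sorted order.
    """
    table = {page: sorted(set(shingles)) for page, shingles in page_shingles.items()}
    pages_by_shingle = defaultdict(set)
    for page, row in table.items():
        for shingle in row:
            pages_by_shingle[shingle].add(page)
    return table, dict(pages_by_shingle)
```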
- a subset of frequently occurring shingles in the aggregate set is determined.
- the subset of frequently occurring shingles is also referred to hereinafter as a “subset of frequent shingles”.
- the subset of frequent shingles may be determined by selecting, from the aggregate set, only those shingles whose frequency of occurrence in the set exceeds a specified threshold.
- the specified threshold may be defined as a percentage of the number of pages in the group of pages. This threshold percentage may be a configurable parameter that is specified for all groups of pages being processed, or it may be a dynamic parameter that is set based on some particular characteristics of the group of pages, such as, for example, the number and/or size of the pages in the group of pages.
- the frequency of occurrence of a particular shingle in the aggregate set may be defined as the ratio between the number of pages in which that shingle occurs and the total number of pages in the group of pages.
- the frequency ratio for a particular shingle may be determined as follows. First, shingles in the aggregate set are traversed and a counter is incremented for each page in which the particular shingle is found to be present. Next, the counter for the particular shingle is divided by the number of pages in the group of pages. After the frequency ratio for a particular shingle is determined, the frequency ratio is compared to the specified threshold. If the frequency ratio is greater than the specified threshold, then the particular shingle is included in the subset of frequent shingles; otherwise, the particular shingle is not included in the subset of frequent shingles.
- the specified threshold may be defined as a specific number of pages. This threshold may be a dynamic or a static parameter that is configurable for all processed groups of pages or on a per-group basis.
- the frequency of occurrence of a particular shingle in the aggregate set may be defined as a counter indicating the number of pages in which that shingle occurs.
- the frequency counter for a particular shingle may be determined by traversing the shingles in the aggregate set and counting the pages in which the particular shingle is found to be present. The frequency counter is then compared to the specified threshold. If the frequency counter is greater than the specified threshold, then the particular shingle is included in the subset of frequent shingles; otherwise, the particular shingle is not included in the subset of frequent shingles.
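- The two threshold variants described above (a ratio of pages, or an absolute page count) might be sketched as a single function. The name `frequent_shingles` and the `as_ratio` flag are hypothetical. In both variants a shingle is counted at most once per page and is kept only when its frequency is strictly greater than the threshold, as the description states.

```python
from collections import Counter


def frequent_shingles(page_shingles, threshold, as_ratio=True):
    """Return the shingles whose page frequency exceeds `threshold`.

    `page_shingles` maps a page identifier to an iterable of shingles.
    With as_ratio=True the threshold is a fraction of the pages in the
    group (for example 0.5); otherwise it is an absolute number of pages.
    """
    counts = Counter()
    for shingles in page_shingles.values():
        counts.update(set(shingles))  # count each page once per shingle
    total = len(page_shingles)
    if as_ratio:
        return {s for s, c in counts.items() if total and c / total > threshold}
    return {s for s, c in counts.items() if c > threshold}
```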
- Various embodiments may use various types of thresholds and various mechanisms and algorithms to determine the shingles that are frequently occurring in the aggregate set.
- the techniques described herein are not limited to any particular type of threshold or mechanism for determining a subset of frequent shingles from an aggregate set.
- the examples of thresholds and mechanisms for determining a subset of frequent shingles provided herein are to be regarded in an illustrative rather than a restrictive sense.
- a modified set of shingles is generated for each page in the group of related pages.
- the modified sets of shingles generated for the pages in the group of pages may be stored in computer data storage.
- the modified set for a particular page does not include any shingle that is determined to be frequently occurring in the aggregate set generated for the group of pages.
- the modified set for a particular page may be generated as follows. Each shingle in the set of shingles computed for the particular page is compared to all shingles included in the subset of frequent shingles. If the shingle being compared matches any shingle in the subset of frequent shingles, then the shingle being compared is removed from the set of shingles computed for the particular page. After all shingles in the set of shingles for the particular page are processed in this manner, the shingles remaining in the set comprise the modified set of shingles for that particular page.
- any shingle determined to be frequently present in the aggregate set for the group of pages is filtered out from the sets of shingles computed for each individual page of the group of pages.
- This allows the techniques described herein to provide a computationally efficient way to remove from consideration for duplicate detection any shingle that is likely to have been computed over fragments from common portions of the pages in the group of pages.
- the techniques described herein achieve this without actually identifying the common portions in the pages of the group of pages.
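- The filtering step can be illustrated with a set difference, as in the minimal sketch below. The name `modified_sets` is hypothetical. Note that the code never inspects page layout; it only removes shingle values that were found to be frequent across the group.

```python
def modified_sets(page_shingles, frequent):
    """Remove every frequent shingle from each page's set of shingles.

    `page_shingles` maps a page identifier to an iterable of shingles and
    `frequent` is the subset of frequent shingles for the group.
    """
    frequent = set(frequent)
    return {page: set(shingles) - frequent for page, shingles in page_shingles.items()}
```

- Using a hash-based set for the frequent shingles makes each membership check constant time on average, so the cost of this step grows with the total number of shingles rather than with the product of the two set sizes.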
- the modified sets of shingles are fed into a shingle-based duplicate detection mechanism to determine the one or more duplicate pages (if any) that are present in the group of pages.
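- for two documents A and B, a duplicate detection mechanism tries to measure a metric that is the ratio of the size of the intersection of A and B to the size of the union of A and B, that is, |A ∩ B| / |A ∪ B|.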
- a shingle-based duplicate detection mechanism approximates this metric by computing shingles for documents A and B and counting the number of common shingles.
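- A minimal sketch of this approximation, computed over two sets of shingles, is given below. The function name `resemblance` is hypothetical, and returning 0.0 when both sets are empty (for example, when every shingle of both pages was filtered out as frequent) is an assumption of this sketch.

```python
def resemblance(a, b):
    """Return |A intersect B| / |A union B| for two collections of shingles."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets are not treated as duplicates
    return len(a & b) / len(union)
```

- For example, the sets {1, 2, 3} and {2, 3, 4} share two shingles out of four distinct shingles, giving a resemblance of 0.5.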
- a shingle-based duplicate detection mechanism that may be used in conjunction with the techniques described herein is provided in Broder et al., U.S. Pat. No. 5,909,677 (the '677 patent), issued on Jun. 1, 1999 and entitled “METHOD FOR DETERMINING THE RESEMBLANCE OF DOCUMENTS,” the entire contents of which are hereby incorporated by reference for all purposes as if fully set forth herein.
- the techniques described herein may be used in conjunction with any duplicate detection mechanism that is operable to detect duplicate pages based on shingles computed from the pages.
- the techniques described herein are not limited to being used with any particular duplicate detection mechanism, and the duplicate detection mechanism described in the '677 patent is to be regarded in an illustrative rather than a restrictive sense.
- duplicate detection logic may be applied to the modified sets of shingles generated for the pages in the group of pages. Based on the modified sets, the duplicate detection logic when executed may classify or rank each page in the group of pages in at least three categories: a unique category, a parent category, and a duplicate category. Pages ranked in the unique category are considered to have unique content portions. Pages ranked in the parent category are considered to be representative pages for one or more pages in the duplicate category with the same or substantially the same content. Pages ranked in the duplicate category are considered to have the same or substantially the same content as one page in the parent category. Every page in the duplicate category has a corresponding representative page in the parent category. Pages may be classified as parents or duplicates based on their relative importance; for example, pages with more incoming links may be preferred as parents instead of as duplicates.
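- One way such a classification could be produced is sketched below. This is not the mechanism of the '677 patent; it is a simple greedy grouping chosen for illustration. The function name `classify`, the resemblance threshold of 0.9, and the `inlinks` mapping of page to incoming-link count are all assumptions of this sketch.

```python
def classify(modified, inlinks, threshold=0.9):
    """Label each page as "unique", "parent", or "duplicate".

    `modified` maps a page identifier to its modified set of shingles.
    `inlinks` maps a page identifier to its number of incoming links.
    Returns (labels, parent_of), where parent_of maps each duplicate page
    to its representative parent page.
    """
    def resemblance(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    # Visit better-linked pages first so that they become the parents.
    order = sorted(modified, key=lambda p: (-inlinks.get(p, 0), p))
    representatives = []
    parent_of = {}
    for page in order:
        for rep in representatives:
            if resemblance(set(modified[page]), set(modified[rep])) >= threshold:
                parent_of[page] = rep
                break
        else:
            representatives.append(page)

    parents = set(parent_of.values())
    labels = {}
    for page in modified:
        if page in parent_of:
            labels[page] = "duplicate"
        elif page in parents:
            labels[page] = "parent"
        else:
            labels[page] = "unique"
    return labels, parent_of
```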
- the ranking or categorization generated based on the modified sets of shingles for the pages in the group of pages may be used in a content search system in various ways. For example, based on the ranking, a crawler in the search system may stop following links out of duplicate pages. In another example, based on the ranking, an index engine in the search system would not index duplicate pages and would not store any entries for the duplicate pages in a searchable database. As a result, a search engine in the system would not display indications or links to the duplicate pages in search results that are generated in response to queries issued by a user.
- the techniques described herein provide for generating a more accurate page ranking or categorization in a computationally efficient way.
- the page ranking or categorization generated according to the techniques described herein may provide a computational improvement for any system component that is operable to process pages based on the ranking or categorization. For this reason, the usage of the page ranking and categorization described herein is to be regarded in an illustrative rather than a restrictive sense.
- FIG. 3 is a block diagram that illustrates an example content search system according to one embodiment.
- Content search system 300 is generally operable to facilitate searching of web content. According to the techniques described herein, content search system 300 comprises crawler 302 , aggregation logic 308 , filtering logic 310 , and duplicate detection logic 312 . Content search system 300 may further comprise one or more components which are not shown in FIG. 3 and which may include, without limitation, an index engine operable to index content retrieved from the web and to store content indexes in a searchable database, a search engine operable to search the database in response to user queries, and one or more other components such as web servers operable to provide interfaces for receiving user input and for displaying the search results to the users.
- Crawler 302 may be a process comprising a combination of integrated software components and an allocation of computational resources, such as memory, CPU time, and/or disk storage space. By utilizing its allocated computational resources, crawler 302 is operable to automatically browse the web and to access and retrieve any pages therefrom.
- crawler 302 comprises shingle generation logic 304 and shingle removal logic 306 .
- Shingle generation logic 304 may comprise one or more integrated software components, which when executed by one or more processors, are operable to compute shingles for pages retrieved by crawler 302 .
- Shingle removal logic 306 may comprise one or more integrated software components, which when executed by one or more processors, are operable to remove frequently occurring shingles from the sets of shingles generated by shingle generation logic 304 for the pages retrieved by crawler 302 .
- Aggregation logic 308 may comprise one or more integrated software components which, when executed by one or more processors, are operable to generate an aggregate set of shingles according to the techniques described herein.
- aggregation logic 308 may be implemented as a library of functions, which may be invoked via calls made by components of search system 300 .
- aggregation logic 308 may be implemented as a process or thread, which is allocated computational resources such as CPU time and memory and which is operable to respond to requests from other components of search system 300 .
- Filtering logic 310 may comprise one or more integrated software components which, when executed by one or more processors, are operable to generate a subset of frequent shingles according to the techniques described herein.
- filtering logic 310 may be implemented as a library of functions, which may be invoked via calls made by components of search system 300 .
- filtering logic 310 may be implemented as a process or thread, which is allocated computational resources such as CPU time and memory and which is operable to respond to requests from other components of search system 300 .
- Duplicate detection logic 312 may comprise one or more integrated software components which, when executed by one or more processors, are operable to detect duplicate pages based on a shingle-based duplicate detection mechanism that is operable in conjunction with the techniques described herein.
- duplicate detection logic 312 may be implemented as a library of functions, which may be invoked via calls made by components of search system 300 .
- duplicate detection logic 312 may be implemented as a process or thread, which is allocated computational resources such as CPU time and memory and which is operable to respond to requests from other components of search system 300 .
- crawler 302 browses the web and determines that group 301 includes one hundred pages that are related in some way, for example, by being provided on the same host computer system.
- shingle generation logic 304 computes and stores in computer data storage a set of shingles for each page in group 301 . For example, for each particular page, shingle generation logic 304 first strips that page of any tags (e.g. HTML tags) or other markup. Then, shingle generation logic 304 computes as many shingles as there are words in that particular page by evaluating a hash function over a window of 10 words that slides over the stripped page one word at a time. From all shingles computed for that page, shingle generation logic 304 selects the 24 shingles that have the smallest values. In this manner, shingle generation logic 304 generates a set of 24 shingles for each of the one hundred pages in group 301 . Thereafter, shingle generation logic 304 invokes aggregation logic 308 .
- aggregation logic 308 generates and stores in computer data storage an aggregate set that includes the generated 2400 shingles (24 shingles for each of the one hundred pages in group 301). For example, aggregation logic 308 may generate a table with 100 rows, where each row is associated with one of the pages in group 301, and where the columns of any particular row store the 24 shingles computed for the particular page associated with that row. Thereafter, aggregation logic 308 invokes filtering logic 310.
- Filtering logic 310 generates and stores in computer data storage the 16 shingles that occur most frequently in the aggregate set. For example, according to the techniques described herein, filtering logic 310 may determine for each particular shingle in the aggregate set a frequency with which that particular shingle occurs in the aggregate set. Filtering logic 310 then compares the frequency of each particular shingle to a specified threshold, and based on the result of the comparison determines whether to include that particular shingle in the 16 most frequently occurring shingles. After filtering logic 310 processes all shingles in the aggregate set and determines the 16 most frequent shingles, filtering logic 310 invokes shingle removal logic 306 in crawler 302.
- shingle removal logic 306 traverses the sets of 24 shingles computed for the pages in group 301 , and removes from each set any shingle in the 16 most frequent shingles. For example, for each particular set of 24 shingles that is associated with a particular page of group 301 , shingle removal logic 306 compares each shingle in that particular set to each shingle in the 16 most frequent shingles, and removes from that particular set any matching shingle. After shingle removal logic 306 completes the removal of the frequent shingles in this manner, each set of 24 shingles for each page in group 301 would include at least 8 shingles that comprise the modified set of shingles for that page. (In some embodiments, crawler 302 or a component thereof may further process any modified set that includes more than 8 shingles to adjust that set to include exactly 8 shingles.)
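- The arithmetic of this example (24 shingles per page, up to 16 removed, at least 8 remaining) can be checked with the short sketch below. The helper names `top_frequent` and `remove_frequent` are hypothetical, and selecting the 16 most frequent shingles with `Counter.most_common` is an assumption standing in for whatever selection filtering logic 310 actually performs.

```python
from collections import Counter


def top_frequent(page_shingles, n=16):
    """Return the n shingles that occur in the largest number of pages."""
    counts = Counter()
    for shingles in page_shingles.values():
        counts.update(set(shingles))  # count each page once per shingle
    return {shingle for shingle, _ in counts.most_common(n)}


def remove_frequent(page_shingles, frequent):
    """Drop the frequent shingles from every page's set."""
    return {page: set(s) - frequent for page, s in page_shingles.items()}
```

- Because at most 16 distinct values are removed from a set of 24 distinct shingles, every modified set keeps at least 8 shingles, which matches the lower bound stated above.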
- duplicate detection logic 312 is invoked to determine the duplicate pages (if any) in group 301 based on the modified sets of at least 8 shingles generated for the pages in the group.
- When invoked, duplicate detection logic 312 generates and stores output 314 in computer data storage.
- Output 314 is a set of data that comprises a classification of the pages in group 301 into unique, parent, and duplicate categories. Since according to the techniques described herein any common and/or frequently occurring shingles have been found and removed from the modified sets, the shingles remaining in the modified sets are likely to be unique enough in order to cause duplicate detection logic 312 to determine any duplicate pages more accurately.
- While in FIG. 3 the functionalities provided by the techniques described herein are explained as being performed by particular components of content search system 300, it is noted that the techniques described herein are not limited to systems that include the particular components illustrated in FIG. 3.
- the functionalities of crawler 302 , shingle generation logic 304 , aggregation logic 308 , filtering logic 310 , shingle removal logic 306 , and duplicate detection logic 312 may be performed by one integrated component or by any combination of two or more components.
- the components of content search system 300 illustrated in FIG. 3 are to be regarded in an illustrative rather than a restrictive sense.
- modified sets of shingles generated according to the techniques described herein may be used by a search engine to rank search results that are returned in response to a user query.
- a search engine may generate a ranking for each page in the group based on the modified set of shingles for the pages in that group.
- the ranking for each page may be stored in a searchable database in association with that page.
- a search engine may sort and display an indication of that page in search results in accordance with the ranking associated with that page.
- FIG. 4 is a block diagram that illustrates a computer system 400 upon which an embodiment of the invention may be implemented.
- Computer system 400 includes a bus 402 or other communication mechanism for communicating information, and a processor 404 coupled with bus 402 for processing information.
- Computer system 400 also includes a main memory 406 , such as a random access memory (RAM) or other dynamic storage device, coupled to bus 402 for storing information and instructions to be executed by processor 404 .
- Main memory 406 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 404 .
- Computer system 400 further includes a read only memory (ROM) 408 or other static storage device coupled to bus 402 for storing static information and instructions for processor 404 .
- A storage device 410, such as a magnetic disk or optical disk, is provided and coupled to bus 402 for storing information and instructions.
- Computer system 400 may be coupled via bus 402 to a display 412 , such as a cathode ray tube (CRT), for displaying information to a computer user.
- An input device 414 is coupled to bus 402 for communicating information and command selections to processor 404 .
- Another type of user input device is cursor control 416, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 404 and for controlling cursor movement on display 412.
- This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
- the invention is related to the use of computer system 400 for implementing the techniques described herein. According to one embodiment of the invention, those techniques are performed by computer system 400 in response to processor 404 executing one or more sequences of one or more instructions contained in main memory 406 . Such instructions may be read into main memory 406 from another computer-readable medium, such as storage device 410 . Execution of the sequences of instructions contained in main memory 406 causes processor 404 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
- Non-volatile media includes, for example, optical or magnetic disks, such as storage device 410 .
- Volatile media includes dynamic memory, such as main memory 406 .
- Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 402 . Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
- Computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
- Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to processor 404 for execution.
- the instructions may initially be carried on a magnetic disk of a remote computer.
- the remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
- a modem local to computer system 400 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal.
- An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 402 .
- Bus 402 carries the data to main memory 406 , from which processor 404 retrieves and executes the instructions.
- the instructions received by main memory 406 may optionally be stored on storage device 410 either before or after execution by processor 404 .
- Computer system 400 also includes a communication interface 418 coupled to bus 402 .
- Communication interface 418 provides a two-way data communication coupling to a network link 420 that is connected to a local network 422 .
- communication interface 418 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line.
- communication interface 418 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN.
- Wireless links may also be implemented.
- communication interface 418 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
- Network link 420 typically provides data communication through one or more networks to other data devices.
- network link 420 may provide a connection through local network 422 to a host computer 424 or to data equipment operated by an Internet Service Provider (ISP) 426 .
- ISP 426 in turn provides data communication services through the worldwide packet data communication network now commonly referred to as the “Internet” 428 .
- Internet 428 uses electrical, electromagnetic or optical signals that carry digital data streams.
- the signals through the various networks and the signals on network link 420 and through communication interface 418 which carry the digital data to and from computer system 400 , are exemplary forms of carrier waves transporting the information.
- Computer system 400 can send messages and receive data, including program code, through the network(s), network link 420 and communication interface 418 .
- a server 430 might transmit a requested code for an application program through Internet 428 , ISP 426 , local network 422 and communication interface 418 .
- the received code may be executed by processor 404 as it is received, and/or stored in storage device 410 , or other non-volatile storage for later execution. In this manner, computer system 400 may obtain application code in the form of a carrier wave.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/788,505 US7698317B2 (en) | 2007-04-20 | 2007-04-20 | Techniques for detecting duplicate web pages |
Publications (2)
Publication Number | Publication Date |
---|---|
US20080263026A1 US20080263026A1 (en) | 2008-10-23 |
US7698317B2 true US7698317B2 (en) | 2010-04-13 |
Family
ID=39873261
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/788,505 Expired - Fee Related US7698317B2 (en) | 2007-04-20 | 2007-04-20 | Techniques for detecting duplicate web pages |
Country Status (1)
Country | Link |
---|---|
US (1) | US7698317B2 (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090259649A1 (en) * | 2008-04-11 | 2009-10-15 | Krishna Leela Poola | System and method for detecting templates of a website using hyperlink analysis |
US8037073B1 (en) * | 2007-12-31 | 2011-10-11 | Google Inc. | Detection of bounce pad sites |
WO2012000778A1 (en) | 2010-06-30 | 2012-01-05 | International Business Machines Corporation | Page uniqueness detection |
US20150302093A1 (en) * | 2014-04-17 | 2015-10-22 | OnPage.org GmbH | Method and system for filtering of a website |
WO2015163850A1 (en) * | 2014-04-22 | 2015-10-29 | Hewlett-Packard Development Company, L.P. | Identifying a functional fragment of a document object model tree |
WO2016066043A1 (en) * | 2014-10-30 | 2016-05-06 | 阿里巴巴集团控股有限公司 | Web page deduplication method and apparatus |
US20160342500A1 (en) * | 2015-05-22 | 2016-11-24 | Microsoft Technology Licensing, Llc | Template Identification for Control of Testing |
US9959254B2 (en) | 2013-01-18 | 2018-05-01 | International Business Machines Corporation | Representation of an element in a page via an identifier |
US10108525B2 (en) | 2013-06-14 | 2018-10-23 | International Business Machines Corporation | Optimizing automated interactions with web applications |
US11170014B2 (en) * | 2016-12-29 | 2021-11-09 | Google Llc | Method and system for preview of search engine processing |
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8205255B2 (en) * | 2007-05-14 | 2012-06-19 | Cisco Technology, Inc. | Anti-content spoofing (ACS) |
US8768932B1 (en) * | 2007-05-14 | 2014-07-01 | Google Inc. | Method and apparatus for ranking search results |
US20090089326A1 (en) * | 2007-09-28 | 2009-04-02 | Yahoo!, Inc. | Method and apparatus for providing multimedia content optimization |
US7930306B2 (en) * | 2008-04-30 | 2011-04-19 | Msc Intellectual Properties B.V. | System and method for near and exact de-duplication of documents |
US9128945B1 (en) | 2008-05-16 | 2015-09-08 | Google Inc. | Query augmentation |
CN102033881A (en) * | 2009-09-30 | 2011-04-27 | 国际商业机器公司 | Method and system for recognizing advertisement in web page |
US8346792B1 (en) | 2010-11-09 | 2013-01-01 | Google Inc. | Query generation using structural similarity between documents |
KR20120124581A (en) | 2011-05-04 | 2012-11-14 | 엔에이치엔(주) | Method, device and computer readable recording medium for improvded detection of similar documents |
US20150169584A1 (en) | 2012-05-17 | 2015-06-18 | Google Inc. | Systems and methods for re-ranking ranked search results |
WO2013173806A2 (en) * | 2012-05-17 | 2013-11-21 | Google Inc. | Systems and methods for determining a likelihood that an entity is an author of original content |
CN102779172B (en) * | 2012-06-25 | 2016-06-01 | 北京奇虎科技有限公司 | The recognition system of non-body text and method in a kind of webpage |
US10007731B2 (en) * | 2012-09-12 | 2018-06-26 | Google Llc | Deduplication in search results |
CA2789936C (en) * | 2012-09-14 | 2020-02-18 | Ibm Canada Limited - Ibm Canada Limitee | Identification of sequential browsing operations |
CN102945244A (en) * | 2012-09-24 | 2013-02-27 | 南京大学 | Chinese web page repeated document detection and filtration method based on full stop characteristic word string |
US20140222760A1 (en) * | 2013-02-04 | 2014-08-07 | Trustees Of Boston University | Method and system for reconciling remote data |
US11017426B1 (en) * | 2013-12-20 | 2021-05-25 | BloomReach Inc. | Content performance analytics |
US9563620B2 (en) * | 2014-06-02 | 2017-02-07 | Empire Technology Development Llc | Automatic aggregation for infrastructure string matching |
CN104063506B (en) * | 2014-07-08 | 2017-04-12 | 百度在线网络技术(北京)有限公司 | Method and device for identifying repeated web pages |
US10200381B2 (en) * | 2015-08-05 | 2019-02-05 | Mcafee, Llc | Systems and methods for phishing and brand protection |
US10346291B2 (en) | 2017-02-21 | 2019-07-09 | International Business Machines Corporation | Testing web applications using clusters |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5745900A (en) * | 1996-08-09 | 1998-04-28 | Digital Equipment Corporation | Method for indexing duplicate database records using a full-record fingerprint |
US5909677A (en) | 1996-06-18 | 1999-06-01 | Digital Equipment Corporation | Method for determining the resemblance of documents |
US5982956A (en) * | 1995-03-29 | 1999-11-09 | Rank Zerox | Secure method for duplicating sensitive documents |
US20020049753A1 (en) * | 2000-08-07 | 2002-04-25 | Altavista Company | Technique for deleting duplicate records referenced in an index of a database |
US20030106017A1 (en) * | 2001-12-05 | 2003-06-05 | I2 Technologies Us, Inc. | Computer-implemented PDF document management |
US6886129B1 (en) * | 1999-11-24 | 2005-04-26 | International Business Machines Corporation | Method and system for trawling the World-wide Web to identify implicitly-defined communities of web pages |
US6968331B2 (en) | 2002-01-22 | 2005-11-22 | International Business Machines Corporation | Method and system for improving data quality in large hyperlinked text databases using pagelets and templates |
US20060041597A1 (en) * | 2004-08-23 | 2006-02-23 | West Services, Inc. | Information retrieval systems with duplicate document detection and presentation functions |
US20070005589A1 (en) * | 2005-07-01 | 2007-01-04 | Sreenivas Gollapudi | Method and apparatus for document clustering and document sketching |
US20080044016A1 (en) * | 2006-08-04 | 2008-02-21 | Henzinger Monika H | Detecting duplicate and near-duplicate files |
US20080162478A1 (en) * | 2001-01-24 | 2008-07-03 | William Pugh | Detecting duplicate and near-duplicate files |
US20080306943A1 (en) * | 2004-07-26 | 2008-12-11 | Anna Lynn Patterson | Phrase-based detection of duplicate documents in an information retrieval system |
US20080319971A1 (en) * | 2004-07-26 | 2008-12-25 | Anna Lynn Patterson | Phrase-based personalization of searches in an information retrieval system |
US7627613B1 (en) * | 2003-07-03 | 2009-12-01 | Google Inc. | Duplicate document detection in a web crawler system |
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5982956A (en) * | 1995-03-29 | 1999-11-09 | Rank Zerox | Secure method for duplicating sensitive documents |
US5909677A (en) | 1996-06-18 | 1999-06-01 | Digital Equipment Corporation | Method for determining the resemblance of documents |
US20040243569A1 (en) * | 1996-08-09 | 2004-12-02 | Overture Services, Inc. | Technique for ranking records of a database |
US5745900A (en) * | 1996-08-09 | 1998-04-28 | Digital Equipment Corporation | Method for indexing duplicate database records using a full-record fingerprint |
US6886129B1 (en) * | 1999-11-24 | 2005-04-26 | International Business Machines Corporation | Method and system for trawling the World-wide Web to identify implicitly-defined communities of web pages |
US20020049753A1 (en) * | 2000-08-07 | 2002-04-25 | Altavista Company | Technique for deleting duplicate records referenced in an index of a database |
US20080162478A1 (en) * | 2001-01-24 | 2008-07-03 | William Pugh | Detecting duplicate and near-duplicate files |
US20030106017A1 (en) * | 2001-12-05 | 2003-06-05 | I2 Technologies Us, Inc. | Computer-implemented PDF document management |
US6968331B2 (en) | 2002-01-22 | 2005-11-22 | International Business Machines Corporation | Method and system for improving data quality in large hyperlinked text databases using pagelets and templates |
US7627613B1 (en) * | 2003-07-03 | 2009-12-01 | Google Inc. | Duplicate document detection in a web crawler system |
US20080306943A1 (en) * | 2004-07-26 | 2008-12-11 | Anna Lynn Patterson | Phrase-based detection of duplicate documents in an information retrieval system |
US20080319971A1 (en) * | 2004-07-26 | 2008-12-25 | Anna Lynn Patterson | Phrase-based personalization of searches in an information retrieval system |
US20060041597A1 (en) * | 2004-08-23 | 2006-02-23 | West Services, Inc. | Information retrieval systems with duplicate document detection and presentation functions |
US20070005589A1 (en) * | 2005-07-01 | 2007-01-04 | Sreenivas Gollapudi | Method and apparatus for document clustering and document sketching |
US20080044016A1 (en) * | 2006-08-04 | 2008-02-21 | Henzinger Monika H | Detecting duplicate and near-duplicate files |
Non-Patent Citations (1)
Title |
---|
David Gibson et al., The Volume and Evolution of Web Page Templates, ACM Press, 14th International Conference on World Wide Web, May 10, 2005 (10 pgs). |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8037073B1 (en) * | 2007-12-31 | 2011-10-11 | Google Inc. | Detection of bounce pad sites |
US8521746B1 (en) | 2007-12-31 | 2013-08-27 | Google Inc. | Detection of bounce pad sites |
US20090259649A1 (en) * | 2008-04-11 | 2009-10-15 | Krishna Leela Poola | System and method for detecting templates of a website using hyperlink analysis |
US7962523B2 (en) * | 2008-04-11 | 2011-06-14 | Yahoo! Inc. | System and method for detecting templates of a website using hyperlink analysis |
WO2012000778A1 (en) | 2010-06-30 | 2012-01-05 | International Business Machines Corporation | Page uniqueness detection |
US10114804B2 (en) | 2013-01-18 | 2018-10-30 | International Business Machines Corporation | Representation of an element in a page via an identifier |
US9959254B2 (en) | 2013-01-18 | 2018-05-01 | International Business Machines Corporation | Representation of an element in a page via an identifier |
US10127132B2 (en) | 2013-06-14 | 2018-11-13 | International Business Machines Corporation | Optimizing automated interactions with web applications |
US10929265B2 (en) | 2013-06-14 | 2021-02-23 | International Business Machines Corporation | Optimizing automated interactions with web applications |
US10108525B2 (en) | 2013-06-14 | 2018-10-23 | International Business Machines Corporation | Optimizing automated interactions with web applications |
US20150302093A1 (en) * | 2014-04-17 | 2015-10-22 | OnPage.org GmbH | Method and system for filtering of a website |
WO2015163850A1 (en) * | 2014-04-22 | 2015-10-29 | Hewlett-Packard Development Company, L.P. | Identifying a functional fragment of a document object model tree |
US10339207B2 (en) | 2014-04-22 | 2019-07-02 | Entit Software Llc | Identifying a functional fragment of a document object model tree |
US10691769B2 (en) | 2014-10-30 | 2020-06-23 | Alibaba Group Holding Limited | Methods and apparatus for removing a duplicated web page |
WO2016066043A1 (en) * | 2014-10-30 | 2016-05-06 | 阿里巴巴集团控股有限公司 | Web page deduplication method and apparatus |
US9720814B2 (en) * | 2015-05-22 | 2017-08-01 | Microsoft Technology Licensing, Llc | Template identification for control of testing |
US20160342500A1 (en) * | 2015-05-22 | 2016-11-24 | Microsoft Technology Licensing, Llc | Template Identification for Control of Testing |
US11170014B2 (en) * | 2016-12-29 | 2021-11-09 | Google Llc | Method and system for preview of search engine processing |
US20220043809A1 (en) * | 2016-12-29 | 2022-02-10 | Google Llc | Method And System For Preview Of Search Engine Processing |
Also Published As
Publication number | Publication date |
---|---|
US20080263026A1 (en) | 2008-10-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: YAHOO! INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SASTURKAR, AMIT;AHUJA, RAJAT;RAVIKUMAR, SHANMUGASUNDARAM;AND OTHERS;REEL/FRAME:019279/0001 Effective date: 20070420 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
AS | Assignment |
Owner name: EXCALIBUR IP, LLC, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:YAHOO! INC.;REEL/FRAME:038383/0466 Effective date: 20160418 |
|
AS | Assignment |
Owner name: YAHOO! INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:EXCALIBUR IP, LLC;REEL/FRAME:038951/0295 Effective date: 20160531 |
|
AS | Assignment |
Owner name: EXCALIBUR IP, LLC, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:YAHOO! INC.;REEL/FRAME:038950/0592 Effective date: 20160531 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552) Year of fee payment: 8 |
|
AS | Assignment |
Owner name: STARBOARD VALUE INTERMEDIATE FUND LP, AS COLLATERAL AGENT, NEW YORK Free format text: PATENT SECURITY AGREEMENT;ASSIGNORS:ACACIA RESEARCH GROUP LLC;AMERICAN VEHICULAR SCIENCES LLC;BONUTTI SKELETAL INNOVATIONS LLC;AND OTHERS;REEL/FRAME:052853/0153 Effective date: 20200604 |
|
AS | Assignment |
Owner name: R2 SOLUTIONS LLC, TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:EXCALIBUR IP, LLC;REEL/FRAME:053459/0059 Effective date: 20200428 |
|
AS | Assignment |
Owner name: NEXUS DISPLAY TECHNOLOGIES LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective date: 20200630 Owner name: MONARCH NETWORKING SOLUTIONS LLC, CALIFORNIA Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective date: 20200630 Owner name: UNIFICATION TECHNOLOGIES LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective date: 20200630 Owner name: CELLULAR COMMUNICATIONS EQUIPMENT LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective date: 20200630 Owner name: ACACIA RESEARCH GROUP LLC, NEW YORK Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective date: 20200630 Owner name: SUPER INTERCONNECT TECHNOLOGIES LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective date: 20200630 Owner name: R2 SOLUTIONS LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective date: 20200630 Owner name: SAINT LAWRENCE COMMUNICATIONS LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective date: 20200630 Owner name: LIFEPORT SCIENCES LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective date: 20200630 Owner name: TELECONFERENCE SYSTEMS LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective 
date: 20200630 Owner name: MOBILE ENHANCEMENT SOLUTIONS LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective date: 20200630 Owner name: STINGRAY IP SOLUTIONS LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective date: 20200630 Owner name: LIMESTONE MEMORY SYSTEMS LLC, CALIFORNIA Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective date: 20200630 Owner name: INNOVATIVE DISPLAY TECHNOLOGIES LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective date: 20200630 Owner name: PARTHENON UNIFIED MEMORY ARCHITECTURE LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective date: 20200630 Owner name: BONUTTI SKELETAL INNOVATIONS LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective date: 20200630 Owner name: AMERICAN VEHICULAR SCIENCES LLC, TEXAS Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:053654/0254 Effective date: 20200630 |
|
AS | Assignment |
Owner name: R2 SOLUTIONS LLC, TEXAS Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE ASSIGNEE NAME PREVIOUSLY RECORDED ON REEL 053654 FRAME 0254. ASSIGNOR(S) HEREBY CONFIRMS THE RELEASE OF SECURITY INTEREST GRANTED PURSUANT TO THE PATENT SECURITY AGREEMENT PREVIOUSLY RECORDED;ASSIGNOR:STARBOARD VALUE INTERMEDIATE FUND LP;REEL/FRAME:054981/0377 Effective date: 20200630 |
|
AS | Assignment |
Owner name: STARBOARD VALUE INTERMEDIATE FUND LP, AS COLLATERAL AGENT, NEW YORK Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE THE ASSIGNOR NAME PREVIOUSLY RECORDED AT REEL: 052853 FRAME: 0153. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNOR:R2 SOLUTIONS LLC;REEL/FRAME:056832/0001 Effective date: 20200604 |
|
FEPP | Fee payment procedure |
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
LAPS | Lapse for failure to pay maintenance fees |
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20220413 |