US20130124552A1 - Locating relevant content items across multiple disparate content sources - Google Patents

Locating relevant content items across multiple disparate content sources

Info

Publication number
US20130124552A1
Authority
US
United States
Prior art keywords
query
content
computer
statistics
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US13/295,108
Other versions
US9817898B2 (en)
Inventor
Bradley Stevenson
Adam David Harmetz
Quentin Gary Christensen
Julian Zbogar Smith
Anupama Janardhan
Carlos David Argott Hernandez
Ramanathan Somasundaram
Benjamin Joseph Rinaca
Fan Mao
Graham Lee McMynn
Jessica Anne Alspaugh
Michal Piaseczny
Tudor Baraboi
Ashish Shrikrishna Malgi
Thottam R. Sriram
Zainal Arifin
John D. Fan
Kameshwar Jayaraman
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Assigned to MICROSOFT CORPORATION reassignment MICROSOFT CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MAO, Fan, ARIFIN, ZAINAL, BARABOI, TUDOR, JANARDHAN, Anupama, JAYARAMAN, Kameshwar, MALGI, ASHISH SHRIKRISHNA, SMITH, JULIAN ZBOGAR, SRIRAM, THOTTAM R., ALSPAUGH, JESSICA ANNE, CHRISTENSEN, QUENTIN GARY, FAN, JOHN D., HARMETZ, ADAM DAVID, HERNANDEZ, CARLOS DAVID ARGOTT, MCMYNN, GRAHAM LEE, PIASECZNY, MICHAL, RINACA, BENJAMIN JOSEPH, SOMASUNDARAM, RAMANATHAN, STEVENSON, BRADLEY
Priority to US13/295,108 priority Critical patent/US9817898B2/en
Application filed by Microsoft Corp filed Critical Microsoft Corp
Priority to PCT/US2012/064252 priority patent/WO2013074378A2/en
Priority to EP12848846.7A priority patent/EP2780838B1/en
Priority to CN2012104523061A priority patent/CN102999574A/en
Publication of US20130124552A1 publication Critical patent/US20130124552A1/en
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC reassignment MICROSOFT TECHNOLOGY LICENSING, LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MICROSOFT CORPORATION
Priority to US15/201,124 priority patent/US9996618B2/en
Publication of US9817898B2 publication Critical patent/US9817898B2/en
Application granted granted Critical
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques

Definitions

  • a company involved in litigation may be obligated to locate and disclose all relevant “evidence” to opposing counsel.
  • Such evidence may include a variety of electronic content, including email messages, documents and other files, list and other contents maintained on websites, and the like.
  • This electronic content may be located on a number of different types of content servers in the enterprise, each having a different process of indexing and/or searching information. Identifying, preserving, and processing this electronic content across the multiple servers may be difficult, time consuming, and expensive.
  • the amount of data that the company is required to sort through and produce may be vast.
  • the lack of tools to effectively limit the amount of relevant electronic content disclosed may increase litigation costs due to the manual review needed of all content before it is disclosed
  • a user may leverage search technologies to locate relevant content items from multiple, different content sources, such as email servers, content sites, fileshares, databases and the like, in order to identify, preserve, and process for export the relevant items.
  • a user involved in an e-discovery investigation may utilize the systems, methods, and user interfaces described herein to create targeted search queries against an identified “virtual archive” of items that produce relevant content items for export and disclosure, thereby decreasing the material requiring manual review and reducing cost and risks involved in the corresponding litigation.
  • query parameters are received from a user interface for defining a query for searching a number of content sources located on multiple, disparate content servers.
  • a native search is executed on each of the content servers based on the received query parameters, and query statistics and other data regarding content items in the content sources matching the query parameters are received.
  • the query statistics are aggregated across the content servers and presented in the user interface. The presentation of the query statistics may be broken out by each content source, by each query phrase segmented from the query, and the like.
  • a preview of a number of content items matching the query parameters is presented based on the data received.
  • FIG. 1 is a block diagram showing aspects of an illustrative operating environment and software components provided by the embodiments presented herein;
  • FIGS. 2 and 3 are screen diagrams showing an illustrative user interface for defining a query for locating content items across multiple content sources and providing query statistics regarding the results of the query, according to embodiments described herein;
  • FIG. 4 is a block diagram showing multiple examples of the segmentation of queries for generation of query statistics, according to embodiments described herein;
  • FIGS. 5 and 6 are screen diagrams showing an illustrative user interface for previewing results of the query, according to embodiments described herein;
  • FIG. 7 is a screen diagram showing an illustrative user interface for accepting refinements to the query results, according to embodiments described herein;
  • FIG. 8 is a screen diagram showing an illustrative user interface for managing multiple saved queries, according to embodiments described herein;
  • FIG. 9 is a flow diagram showing one method for locating relevant content items across multiple disparate content sources, according to embodiments described herein.
  • FIG. 10 is a block diagram showing an illustrative computer hardware and software architecture for a computing system capable of implementing aspects of the embodiments presented herein.
  • FIG. 1 shows an illustrative operating environment 100 including software components for locating relevant content items across multiple disparate content sources, according to embodiments provided herein.
  • the environment 100 includes a computer system 102 .
  • the computer system 102 represents one or more Web and/or application servers executing web-based application programs and accessed over a network 108 by a user 104 using a Web browser or other client application executing on a user computing device 106 .
  • the network 108 may be a local-area network (“LAN”), a wide-area network (“WAN”), the Internet, or any other networking topology known in the art that connects the user computing device 106 to the computer system 102 .
  • the user computing device may comprise a personal computer (“PC”), a desktop workstation, a laptop, a notebook, a tablet, a mobile device, a personal digital assistant (“PDA”), a game console, a set-top box, a consumer electronics device, and the like.
  • the computer system 102 may represent a user computing device executing application programs locally, or any combination of server computers and user computing devices.
  • An e-discovery client application 110 may execute on the computer system 102 .
  • the user 104 may utilize the e-discovery client application 110 to identify, preserve, and export a set of content items relevant to a business issue or event, such as litigation or other legal matters, for example.
  • the e-discovery client application 110 may allow the user 104 to produce targeted search queries to locate relevant content items from a “virtual archive” comprising content items 112 stored in multiple content sources 114 .
  • the e-discovery client application 110 may further provide the user 104 with the ability to preview the content items 112 returned by a search, refine the query, and to dispatch a list of the relevant content items 112 for export.
  • Examples of a content source 114 may include an email mailbox; a document library, list item archive, e.g. a discussion thread or Web log (“blog”), or other content site; a fileshare or fileshare folder; a website; and the like.
  • Examples of content items 112 may include email messages; documents or files; webpages; list items, e.g. entries in a discussion thread, blog posts, or wiki page entries; and the like.
  • the content items 112 may be stored on and/or accessed through multiple, disparate content servers 116 A- 116 N (also referred to herein generally as content servers 116 or content server 116 ).
  • the content servers 116 include one or more email servers, such as MICROSOFT® EXCHANGE SERVER email servers from Microsoft Corporation of Redmond, Wash.
  • the content servers 116 may also include one or more content site servers, such as MICROSOFT® SHAREPOINT® servers, also from Microsoft Corporation.
  • the content servers 116 may also include one or more file servers, NAS storage devices, or other file and document storage systems.
  • the content servers 116 may include document management servers, database servers, Web servers, and other data and content servers known in the art.
  • each content server 116 A- 116 N may provide a corresponding search interface 118 A- 118 N (also referred to herein as search interfaces 118 or search interface 118 ) for searching the content items 112 stored on and/or accessed through the content server.
  • a content server 116 A comprising an email server may provide a search interface 118 that allows content items 112 comprising email messages contained in content sources 114 comprising email mailboxes to be searched by external applications, such as the e-discovery client application 110 executing on the computer system 102 .
  • the content server 116 maintains one or more indexes supporting the searching of associated content items 112 through the search interface 118 .
  • the search interface 118 may comprise an application programming interface (“API”) that defines SOAP-based Web services, Java RMI calls, WINDOWS® Communication Foundation (“WCF”) services, RPC calls, and the like.
  • the e-discovery client application 110 may access a case dataset 120 that defines the various content sources 114 containing the content items 112 comprising the virtual archive of items to be searched.
  • the case dataset 120 may represent an XML file, one or more database tables in a database, or any other structured storage mechanism known in the art stored on or accessible to the computer system 102 .
  • the case dataset 120 may be built by the user 104 utilizing the e-discovery client application 110 or another application based on content sources deemed potentially relevant to the litigation or other business issue/event at hand.
  • the case dataset 120 may be built by the user 104 using methods and user interfaces similar to those described herein for locating relevant content items 112 in the virtual archive.
  • the case dataset 120 may contain one or more content collections 122 , each content collection 122 comprising one or more source specifications 124 A- 124 N (also referred to herein as source specifications 124 or source specification 124 ).
  • Each source specification 124 may identify a specific content source 114 containing content items 112 that collectively make up the virtual archive.
  • one source specification 124 A may identify a specific personal mailbox stored on or accessed through an email content server 116 A.
  • Another source specification 124 B may identify a document library accessed through a content server 116 B hosting a content site.
  • Organizing the source specifications 124 into content collection(s) 122 allows configuration options for the virtual archive to be applied at the content collection level, such as whether content items 112 should be preserved in place or copied to an archive, and the like.
  • filters may be applied at the content collection level to further limit the content items 112 from the specified content sources 114 to be included in the virtual archive. Filters may include date-ranges for email messages sent or documents created or modified, author/sender of documents or email messages, keyword filters, and the like. In other embodiments, filters may further be specified at a content sources level, i.e. per source specification 124 , or for the entire virtual archive defined in the case dataset 120 .
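  • As an illustration only, the relationship between the case dataset, its content collections, and their source specifications could be modeled as below. This is a minimal sketch: the class names, field names, and server identifiers are hypothetical and are not taken from the described embodiments.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SourceSpecification:
    server: str       # hypothetical identifier of the content server
    source_type: str  # e.g. "mailbox", "fileshare", "content_site"
    location: str     # e.g. a mailbox name or a folder name


@dataclass
class ContentCollection:
    name: str
    sources: List[SourceSpecification] = field(default_factory=list)
    preserve_in_place: bool = True        # collection-level configuration option
    keyword_filter: Optional[str] = None  # collection-level filter (assumed shape)


@dataclass
class CaseDataset:
    collections: List[ContentCollection] = field(default_factory=list)

    def virtual_archive(self) -> List[SourceSpecification]:
        """Flatten every collection into the list of sources to search."""
        return [s for c in self.collections for s in c.sources]


case = CaseDataset([
    ContentCollection("First collection", [
        SourceSpecification("mail-01", "mailbox", "Adam Barr"),
        SourceSpecification("mail-01", "mailbox", "Regina Wilcox"),
        SourceSpecification("files-01", "fileshare", "shared-folder"),
    ]),
])
```

  • Keeping options such as `preserve_in_place` on the collection rather than on each source mirrors the idea that configuration is applied once per content collection.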
  • the case dataset 120 may further contain one or more query specifications 126 .
  • Each query specification 126 defines a query that is used to search the content sources 114 comprising the virtual archive as defined by the source specifications 124 to locate relevant content items 112 .
  • the user 104 may utilize the e-discovery client application 110 to build the query specifications 126 and save them to the case dataset 120 .
  • the e-discovery client application 110 may further parse the query specification 126 and utilize the search interface 118 of each content server 116 identified by the source specifications 124 to execute the query against each content source 114 .
  • Statistics regarding the query as executed against each content source 114 may then be aggregated by the e-discovery client application 110 and presented to the user 104 , as will be described in more detail below.
  • the e-discovery client application 110 may combine data regarding the content items 112 located by each content server 116 in order to present a preview of the results to the user 104 to allow for further refinement of the query.
  • the e-discovery client application 110 may generate a manifest of all the relevant content items 112 located by the query(s) from the various content sources 114 .
  • the manifest may then be dispatched to an export application that may utilize additional interfaces of each content server 116 to retrieve the content items 112 specified in the manifest and save them to a case export file, as is described in co-pending U.S. patent application Ser. No. 13/293,146 filed Nov. 10, 2011, having Attorney Docket No. 334054.01 and entitled “Export of Content Items from Multiple, Disparate Content Sources,” which is incorporated herein by this reference in its entirety.
  • FIG. 2 shows an illustrative user interface (“UI”) 200 for defining a query to search the content sources 114 of the virtual archive as defined by the source specifications 124 contained in the case dataset 120 .
  • the UI 200 may be presented by the e-discovery client application 110 to the user 104 in a browser window 202 rendered by a Web browser application executing on the user computing device 106 , for example.
  • the UI 200 includes a query specification section 206 where the parameters defining the query may be specified by the user 104 .
  • the query specification section 206 may contain a field allowing the user to specify a free-text query 208 in any suitable syntax, such as a keyword query language (“KQL”) query, which may include keywords for the query along with junction words, grouping parentheses, and the like.
  • the free-text query 208 may further include advanced query syntax/specifications, such as property restrictions using the “property:value” syntax, for example.
  • the syntax of the free-text query 208 may be independent of the form or syntax of the query required by search interface 118 of each content server 116 to search the content sources 114 .
  • the e-discovery client application 110 will parse the free-text query 208 and translate the query to the proper form and/or syntax for the content servers 116 when the query is executed.
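  • The parse-then-translate step can be sketched as follows. This is a simplified illustration, not the described implementation: it handles only space-separated keywords and “property:value” tokens (no junction words or parentheses), and both target forms are invented for the example rather than being the native syntax of any real server.

```python
import re
from typing import Dict, List, Tuple


def parse_free_text(query: str) -> Tuple[List[str], Dict[str, str]]:
    """Split a query into plain keywords and property:value restrictions."""
    keywords: List[str] = []
    props: Dict[str, str] = {}
    for token in query.split():
        match = re.fullmatch(r"(\w+):(\S+)", token)
        if match:
            props[match.group(1)] = match.group(2)
        else:
            keywords.append(token)
    return keywords, props


def to_mail_syntax(keywords: List[str], props: Dict[str, str]) -> str:
    # Hypothetical native form for an email server: terms joined by AND.
    parts = [f'"{k}"' for k in keywords] + [f"{p}:{v}" for p, v in props.items()]
    return " AND ".join(parts)


def to_site_syntax(keywords: List[str], props: Dict[str, str]) -> dict:
    # Hypothetical native form for a content site: a structured request body.
    return {"text": " ".join(keywords), "refiners": dict(props)}


keywords, props = parse_free_text("merger contract author:barr")
mail_query = to_mail_syntax(keywords, props)
site_query = to_site_syntax(keywords, props)
```

  • Parsing once into a neutral form and translating per server keeps the user-facing syntax independent of each search interface.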
  • the query specification section 206 may also contain fields that allow the user 104 to specify a from-date value 210 and to-date value 212 defining a date-range parameter for the query.
  • the date-range parameter may be applied to specific properties of content items 112 depending on their type, such as the sent date of email messages, the creation or modification date of documents or files, the posting date for discussion entries, and the like.
  • the query specification section 206 may also contain a field that allows the user 104 to specify an author/sender parameter 214 . Similar to the date-range parameter, the author/sender parameter 214 may be applied to specific properties of content items 112 depending on their type, such as the sender of email messages, the creator of documents, the poster of discussion entries, and the like.
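  • One way to picture how a single date-range or author/sender parameter maps onto type-specific properties is a lookup table keyed by content item type. The type names and property names below are assumptions chosen for the sketch.

```python
from datetime import date

# Hypothetical property names for each content item type.
DATE_PROPERTY = {"email": "sent", "document": "modified", "discussion": "posted"}
AUTHOR_PROPERTY = {"email": "sender", "document": "creator", "discussion": "poster"}


def typed_restrictions(item_type: str, from_date: date, to_date: date, person: str) -> dict:
    """Map the generic date-range and author/sender parameters to the
    properties that apply to one content item type."""
    return {
        DATE_PROPERTY[item_type]: (from_date, to_date),
        AUTHOR_PROPERTY[item_type]: person,
    }


email_restrictions = typed_restrictions(
    "email", date(2011, 1, 1), date(2011, 6, 30), "Adam Barr"
)
```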
  • the UI 200 may further include a mechanism for specifying a scope of the query being defined, i.e. those content sources 114 of the virtual archive to which the query is to be applied.
  • the UI 200 may include a scope UI control 216 that, when selected by the user 104 , causes a query scope specification panel 302 to be displayed in the window 202 , as shown in FIG. 3 .
  • the query scope specification panel 302 may include a list of content item groupings, such as content item groupings 304 A- 304 D, corresponding to the content collections 122 and/or source specifications 124 contained in the case dataset 120 .
  • the content item groupings 304 A- 304 D may be presented in a hierarchical fashion.
  • content item grouping 304 A may correspond to a first content collection 122 defined in the case dataset 120
  • content item groupings 304 B- 304 D may correspond to source specifications 124 for three content sources 114 , one for a personal mailbox for “Adam Barr,” one for a personal mailbox for “Regina Wilcox,” and one for a fileshare located at “\\PUBLIC\ADAM BARR,” each of which is included in the first content collection 122 .
  • Each content item grouping 304 A- 304 D may further include an inclusion UI control 308 that allows the user 104 to specify whether content source(s) 114 identified by the corresponding source specification 124 or content collection 122 are to be included in the scope of the query being defined.
  • the query scope specification panel 302 may also include a select all UI control 310 that allows the user 104 to specify that all content sources 114 identified in the case dataset 120 are to be included in the search.
  • the UI 200 may further include a source query statistic section 220 that provides the user 104 with query statistics 222 regarding the execution of the defined query against the content sources 114 identified in the query scope.
  • the user may utilize an execute query UI control 218 to cause the e-discovery client application 110 to parse the query parameters and utilize the search interface 118 of each content server 116 identified by the source specifications 124 to execute a native query against the specified content sources 114 .
  • Query statistics 222 regarding the query as executed against each content source 114 may then be aggregated by the e-discovery client application 110 and presented in the source query statistic section 220 of the UI 200 .
  • the query statistics 222 may include a list of content source entries, such as content source entry 224 , corresponding to each content source 114 included in the scope of the query.
  • the content source entry 224 may include an identifier of the corresponding content source 114 , as shown at 226 , a count of the number of content items 112 located in the content source that match the query parameters, as shown at 228 , a total size of the content items located, as shown at 230 , and the like.
  • the content source entries 224 in the query statistics may be further grouped under grouping headers 232 A- 232 C. For example, the content source entries 224 may be grouped by a type of the corresponding content source 114 , as shown in FIG. 2 .
  • the content source entries 224 may also be grouped by content collection 122 , by content server 116 , or by other groups. In one embodiment, the grouping of the content source entries 224 corresponding to the content sources 114 in the query statistics may be selected by the user 104 through the UI 200 .
  • the query statistics 222 may further include sub-totals and totals of the count and/or size of the located content items 112 , a percentage of items located versus total content items in the content sources 114 , and the like.
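  • The aggregation of per-source statistics into sub-totals, totals, and a percentage can be sketched as below. The record layout and field names are assumptions for illustration; the numbers are made-up sample values, not results from any real system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class SourceStat:
    source: str       # identifier of the content source
    source_type: str  # grouping key, e.g. "mailbox" or "fileshare"
    matched: int      # items matching the query
    size_bytes: int   # total size of the matching items
    total_items: int  # all items in the source


def aggregate(stats: List[SourceStat]) -> dict:
    """Compute sub-totals per source type plus overall totals."""
    groups = defaultdict(lambda: {"matched": 0, "size_bytes": 0})
    for s in stats:
        groups[s.source_type]["matched"] += s.matched
        groups[s.source_type]["size_bytes"] += s.size_bytes
    matched = sum(s.matched for s in stats)
    total = sum(s.total_items for s in stats)
    return {
        "by_type": dict(groups),
        "matched": matched,
        "size_bytes": sum(s.size_bytes for s in stats),
        "percent_matched": round(100 * matched / total, 1) if total else 0.0,
    }


sample = [
    SourceStat("Adam Barr", "mailbox", 120, 3_000_000, 1000),
    SourceStat("Regina Wilcox", "mailbox", 30, 1_000_000, 500),
    SourceStat("shared-folder", "fileshare", 50, 9_000_000, 500),
]
summary = aggregate(sample)
```

  • Grouping by `source_type` is one choice; the same loop could key on content collection or content server instead.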
  • the UI 200 may also include a query segmentation statistic section 240 that provides the user 104 with additional query statistics 222 regarding the execution of the defined query.
  • the additional query statistics 222 may further include a count of content items 112 located by the query, as shown at 242 , broken down by various phrases, such as query phrase 244 , of the free-text query 208 specified in the query parameters.
  • the segmentation of the query may be performed by the e-discovery client application 110 in a variety of ways. As shown at 404 A in FIG. 4 , in one embodiment a query may be divided into query phrases 244 A- 244 N at each explicit or implied OR, such as query phrases 244 A and 244 B segmented from the main body of the query 402 .
  • the segmentation process may be performed iteratively based on explicit groupings in the query 402 by parentheses or implied groupings in the query based on operator precedence, syntax, and the like.
  • the query phrase 244 B may be further segmented into query phrases 244 C- 244 H using the same process.
  • the resulting query phrases 244 A- 244 N may be presented in a hierarchical fashion representing the groupings in the query 402 .
  • the query 402 may be divided into query phrases 244 at each explicit or implied AND, as shown at 404 B in FIG. 4 .
  • the count of content items 112 matching each query phrase 244 is further shown in the query segmentation statistic section 240 .
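  • A minimal sketch of segmenting a query at each top-level operator follows. It assumes explicit, upper-case `OR`/`AND` tokens and balanced parentheses; implied operators, operator precedence, and quoted phrases are not handled, and the function name is hypothetical.

```python
from typing import List


def split_top_level(query: str, operator: str = "OR") -> List[str]:
    """Split on `operator` tokens that are not nested inside parentheses."""
    phrases: List[str] = []
    current: List[str] = []
    depth = 0
    # Pad parentheses with spaces so they become separate tokens.
    for token in query.replace("(", " ( ").replace(")", " ) ").split():
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
        if depth == 0 and token == operator:
            phrases.append(" ".join(current))
            current = []
        else:
            current.append(token)
    phrases.append(" ".join(current))
    return phrases


or_phrases = split_top_level("merger OR (acquisition AND (contract OR agreement))")
and_phrases = split_top_level("merger AND contract", operator="AND")
```

  • Applying the same function again to a parenthesized phrase, after removing its outer parentheses, would give the iterative, hierarchical segmentation described above.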
  • the counts may be generated for each query phrase 244 by the content server(s) 116 and/or the e-discovery client application 110 may perform a search operation with each individual query phrase 244 in order to aggregate the query statistics 222 for presentation in the query segmentation statistic section 240 . It will be appreciated that the query statistics 222 presented in the source query statistic section 220 and the query segmentation statistic section 240 may be updated each time the query parameters defined for the query are modified by the user, or the query parameters
  • FIG. 5 shows another illustrative UI 500 for providing the user a preview of the content items 112 located by the query defined in the query specification section 206 .
  • the e-discovery client application 110 may further provide the user 104 with the ability to preview the content items 112 returned by a search and further refine the query in order to locate only relevant content items for export.
  • the UI 500 may be presented by the e-discovery client application 110 to the user 104 in a browser window 202 rendered by a Web browser application executing on the user computing device 106 , for example.
  • the e-discovery client application 110 may render the UI 500 in addition to or as an alternative to the UI 200 described above in regard to FIGS. 2-4 .
  • the UI 500 includes the query specification section 206 detailing the parameters defined for the query as well as a result list 502 comprising content item entries, such as content item entry 506 , containing data regarding each previewed content item 112 matching the search parameters.
  • Different result lists 502 may be provided for different types of content sources 114 , selectable by the user through a tab metaphor, as shown in FIG. 5 , or another UI mechanism known in the art.
  • each type category of content source 114 may have a corresponding tab 504 A- 504 C showing a result list 502 containing data specific to content items 112 of that type.
  • each content item entry 506 in the result list 502 on a “MAILBOXES” tab 504 A may include header information for the corresponding email message, such as a subject, the recipients, the sender, the date sent, and the like.
  • each content item entry 506 in the result list 502 on a “FILESHARES” tab 504 C may include digest information for the corresponding document or file, such as a document title or filename, a file type, an author, the creation date, the last modification date, and the like.
  • the content item entries 506 in the result list 502 may contain additional data from the previewed content item 112 , such as the first few lines of the body of an email message, a thumbnail image of a document or file, and the like.
  • each tab 504 A- 504 C corresponds to the search mechanism or index that results in surfacing content items 112 from the content source 114 . Therefore a result list 502 containing email messages from email mailboxes and files from fileshares may be listed under an “ENTERPRISE” tab corresponding to an enterprise-wide search index, while documents from document libraries and list items from blogs or discussion groups may appear in a result list under the “CONTENT SITES” tab 504 B corresponding to search indexes from one or more content sites.
  • each tab 504 A- 504 C may correspond to a type of content server 116 , a specific content server, or other categories or grouping of content items 112 , content sources 114 , and/or content servers.
  • the e-discovery client application 110 may retrieve header or digest information for the top-N matching content items 112 of the type corresponding to the selected tab 504 A- 504 C from the identified content servers 116 based on a default or user-selectable sort order, for example, for display in the results list 502 .
  • the header or digest information may be retrieved from the content servers 116 through the corresponding search interfaces 118 or through another API specific to the content server type.
  • the data may be retrieved by the e-discovery client application 110 asynchronously as the query is modified by the user 104 and/or as the query statistics 222 are updated in the UI 200 .
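  • Selecting the top-N items for a preview from results returned by several servers can be sketched with the standard library. The item layout (dictionaries with `subject` and `sent` keys) and the choice of “most recent first” as the sort order are assumptions for the example.

```python
import heapq
from itertools import chain
from typing import Dict, Iterable, List


def top_n(per_server_results: Iterable[List[Dict[str, str]]], n: int,
          key: str = "sent") -> List[Dict[str, str]]:
    """Return the n items with the largest `key` across all servers."""
    all_items = chain.from_iterable(per_server_results)
    return heapq.nlargest(n, all_items, key=lambda item: item[key])


results = [
    [{"subject": "A", "sent": "2011-03-01"}, {"subject": "B", "sent": "2011-01-15"}],
    [{"subject": "C", "sent": "2011-02-10"}],
]
preview = top_n(results, 2)
```

  • ISO-formatted date strings sort correctly as plain text, which is why no date parsing is needed here.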
  • the e-discovery client application 110 may retrieve the entire contents of a content item 112 and display it in a preview pane 602 when the corresponding content item entry 506 in the result list 502 is selected by the user, by hovering a mouse pointer 604 over the entry, for example.
  • the UI 500 may further contain a query refinement section 508 that allows further refinements to the query to be made by the user 104 .
  • the query refinement section 508 may contain a list of properties or “filter categories” 704 A- 704 D (referred to herein generally as filter categories 704 ) for which values for refinement of the query may be selected.
  • the filter categories 704 presented to the user 104 may be specific to the type of content sources 114 for which the previewed content items 112 are being presented. For example, as shown in FIG.
  • the filter categories 704 A- 704 D may comprise properties of email messages, such as recipient, domain, mail type, attachment type, and the like. Additional and/or alternative filter categories 704 may be shown with result lists 502 on other tabs 504 B, 504 C containing content items 112 of different types.
  • each value entry 706 listed may further include query statistics showing a count of content items 112 from the current query having the property matching the corresponding value, as further shown in FIG. 7 .
  • the user 104 may select one or more of the listed value entries 706 for the selected filter category 704 , and then select a UI control, such as the apply pushbutton UI control 710 , to apply the selected filter category/value pairs to the query.
  • Applying the selected filter category/value pairs to the query may both update the query statistics 222 presented in the UI 200 as well as updating the previewed content items 112 shown in the results list 502 on the currently selected tab 504 A.
  • the selected filter category/value pairs may be added to the free-text query 208 in the query parameters, using the “property:value” syntax, for example. The user 104 may then re-arrange, group, and change junction operators for the filter category/value pairs in the free-text query 208 to further refine the query.
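  • Appending selected filter category/value pairs to the free-text query might look like the sketch below. The default junction operators (OR between values of one category, AND between categories) are an assumption made for the example; the text only states that the user may change them afterwards.

```python
from typing import Dict, List


def apply_filters(free_text: str, selected: Dict[str, List[str]]) -> str:
    """Append property:value pairs to a free-text query.

    `selected` maps a filter category to the values chosen for it.
    """
    clauses = []
    for prop, values in selected.items():
        # Quote values containing spaces so each pair stays one token.
        pairs = [f'{prop}:"{v}"' if " " in v else f"{prop}:{v}" for v in values]
        clauses.append(pairs[0] if len(pairs) == 1 else "(" + " OR ".join(pairs) + ")")
    if not clauses:
        return free_text
    return " AND ".join([f"({free_text})"] + clauses)


refined = apply_filters(
    "merger OR acquisition",
    {"recipient": ["barr", "wilcox"], "mailtype": ["external"]},
)
```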
  • only one filter category 704 A- 704 D may be open and modified at a given time. If the user 104 is modifying one filter category 704 A and then switches to another before selecting the apply pushbutton UI control 710 , the e-discovery client application 110 may warn the user that any changes to the filter category will not be saved unless they select the apply pushbutton.
  • the user 104 is provided with a custom filter UI control 708 that allows the user to specify an unlisted value for one of the filter categories 704 A- 704 D and/or to specify value(s) for another property or filter category for the content source type beyond the filter categories shown. Selecting the custom filter UI control 708 may turn the UI control into a text box, where the user can enter the additional filter category/value pair in the “property:value” syntax, for example.
  • the UI 500 may further include a query save section 712 that allows the query to be saved as a corresponding query specification 126 in the case dataset 120 , as described above in regard to FIG. 1 .
  • the user may be presented with a UI control to provide a name or other identifier to associate with the query specification 126 .
  • all query parameters for the query are saved to the corresponding query specification 126 , including the free-text query 208 , the date-range parameter, the author/sender parameter 214 , the source specifications 124 and/or content collections 122 comprising the query scope, any filter category/value pairs selected in the query refinement section 508 , and the like.
  • the query statistics 222 last generated by the content servers 116 may be stored with the corresponding query specification 126 for later retrieval.
  • the user 104 may be provided the ability to copy the query parameters from an existing query specification 126 to create a new query, which may then be modified while the existing query specification 126 remains intact.
  • FIG. 8 shows another illustrative UI 800 for the management of saved queries, according to further embodiments.
  • the UI 800 may be presented by the e-discovery client application 110 to the user 104 in a browser window 202 rendered by a Web browser application executing on the user computing device 106 , for example.
  • The UI 800 may include a query list 802 including query entries, such as query entry 804, for the query specifications 126 saved in the case dataset 120.
  • Each query entry 804 may include the free text query 806 from the query specification 126 , along with the name 808 or other identifier associated with the query when saved by the user 104 .
  • the query entry 804 may include query statistics 222 , such as a total count 810 and total size 812 of content items 112 matching the query.
  • the query statistics 222 from the last execution of the query may have been stored with the corresponding query specification 126 when the user 104 saved the query, as described above in regard to FIG. 7 .
  • each query entry 804 may further include a query selection control 814 that allows the user 104 to select one or more queries in the query list 802 .
  • The user 104 may then select an export UI control 816 that will cause the e-discovery client application 110 to generate a manifest of all the relevant content items 112 from all content sources 114 across all content servers 116 that match one or more of the selected queries and dispatch the manifest to an export application that retrieves the content items 112 specified and saves them to a case export file, as described above in regard to FIG. 1.
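  • Because the manifest covers items matching one or more of the selected queries, it behaves like a set union over the per-query results. The sketch below shows that idea under stated assumptions: each match is represented by a hypothetical `(server, source, item_id)` tuple, and the function name `build_manifest` is illustrative rather than part of the described system.

```python
from __future__ import annotations


def build_manifest(
    results_by_query: dict[str, list[tuple[str, str, str]]],
    selected: list[str],
) -> list[tuple[str, str, str]]:
    """Return each (server, source, item_id) matched by at least one selected query."""
    manifest: set[tuple[str, str, str]] = set()
    for query_name in selected:
        # Items matched by several queries are added only once.
        manifest.update(results_by_query.get(query_name, []))
    return sorted(manifest)  # stable ordering for the export step
```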
  • Referring now to FIG. 9, additional details will be provided regarding the embodiments presented herein. It should be appreciated that the logical operations described with respect to FIG. 9 are implemented (1) as a sequence of computer implemented acts or program modules running on a computing system and/or (2) as interconnected machine logic circuits or circuit modules within the computing system. The implementation is a matter of choice dependent on the performance and other requirements of the computing system. Accordingly, the logical operations described herein are referred to variously as operations, structural devices, acts, or modules. These operations, structural devices, acts, and modules may be implemented in software, in firmware, in special purpose digital logic, and any combination thereof. It should also be appreciated that more or fewer operations may be performed than shown in the figures and described herein. The operations may also be performed in a different order than described.
  • FIG. 9 illustrates one routine 900 for locating relevant content items across multiple disparate content sources, according to one embodiment.
  • the routine 900 may be performed by the e-discovery client application 110 executing on the computer system 102 , for example. It will be appreciated that the routine 900 may also be performed by other modules or components executing on the computer system 102 , or by any combination of modules, components, and computing devices.
  • the routine 900 begins at operation 902 , where the e-discovery client application 110 presents a UI to a user 104 for defining a query to search the content sources 114 of the virtual archive as defined by the source specifications 124 contained in the case dataset 120 .
  • the source specifications 124 may identify content sources 114 on multiple, disparate content servers 116 , such as email mailboxes on an email server, a document library on a content site server, and/or a fileshare on a file server.
  • the e-discovery client application 110 may present the UI 200 described above in regard to FIGS. 2 and 3 to the user 104 for defining the query.
  • the UI 200 may be presented by the e-discovery client application 110 to the user 104 in a browser window 202 rendered by a Web browser application executing on the user computing device 106 , for example.
  • the UI 200 may include a query specification section 206 that allows the user to specify parameters defining the query, such as a free-text query 208 , a date-range parameter, an author/sender parameter 214 , and the like.
  • the UI 200 may further include a query scope specification panel 302 that allows the user to specify the content collections 122 and/or source specifications 124 contained in the case dataset 120 to which the query is to be applied.
  • the routine 900 proceeds from operation 902 to operation 904 , where the e-discovery client application 110 receives the query parameters and/or query scope from the user 104 through the UI 200 , as described above.
  • the user 104 may load the query parameters and query scope from a query specification 126 previously saved to the case dataset 120 .
  • the routine 900 proceeds to operation 906 , where the e-discovery client application 110 executes a native search of each content server 116 specified in the source specifications 124 comprising the query scope.
  • the e-discovery client application 110 may parse the query parameters and utilize the search interface 118 of each content server 116 identified by the source specifications 124 to execute a native query against the specified content sources 114 . According to one embodiment, the e-discovery client application 110 may provide the user 104 with a user interface to view and/or modify the native queries generated for the various content servers 116 .
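  • One way to picture the generation of native queries is a table of per-server-type translators applied to the same query parameters. The following sketch is an assumption-laden illustration: the parameter keys, the translator functions, and both "native" forms are invented for the example and do not represent the query syntax of any actual email or content site server.

```python
from __future__ import annotations


def to_email_query(params: dict) -> dict:
    """Hypothetical email-server form: a structured search request."""
    return {"body_contains": params["free_text"], "sender": params.get("author")}


def to_site_query(params: dict) -> str:
    """Hypothetical content-site form: a single query string."""
    parts = [params["free_text"]]
    if params.get("author"):
        parts.append(f'author:"{params["author"]}"')
    return " ".join(parts)


# Server type -> translator; the type names are illustrative only.
TRANSLATORS = {"email": to_email_query, "site": to_site_query}


def native_queries(params: dict, server_types: list[str]) -> dict:
    """Translate one set of query parameters for each server type in scope."""
    return {stype: TRANSLATORS[stype](params) for stype in server_types}
```

  • Keeping the translators in a lookup table means a further content server type could be supported by registering one more function, and the generated native queries remain available for the user to view or modify before execution.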
  • the routine 900 proceeds from operation 906 to operation 908 where the e-discovery client application 110 receives query statistics 222 regarding the query as executed against each content source 114 from the content servers 116 .
  • the e-discovery client application 110 may receive raw statistics broken out by one or more of the content source 114 , query phrases 244 segmented from the free-text query 208 , and the like.
  • It will be appreciated that the query statistics 222 received from the content servers 116 may include a variety of information at different levels, and that different types of content servers 116 may return different levels of query statistics from the query.
  • the e-discovery client application 110 receives header or digest information regarding the content items 112 in the content sources 114 that match the query, and the e-discovery client application generates the query statistics 222 from this information.
  • At operation 910, the e-discovery client application 110 aggregates the query statistics 222 regarding the various content sources 114 received from the content servers 116 and presents the aggregated statistics to the user 104.
  • the e-discovery client application 110 may present query statistics 222 broken out by each content source 114 included in the scope of the query, as shown in the source query statistic section 220 of the UI 200 described above in regard to FIG. 2 .
  • the query statistics 222 may further include sub-totals and totals of the count and/or size of the located content items 112 , a percentage of items located versus total content items in the content sources 114 , and the like.
  • the e-discovery client application 110 may further present query statistics 222 broken out by various phrases of the query, as further shown in the query segmentation statistic section 240 of the UI 200 described above in regard to FIGS. 2 and 4 .
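  • The aggregation of per-source statistics into totals and a located-versus-total percentage can be sketched as shown below. The `SourceStats` record and its field names are assumptions for illustration; actual content servers may return statistics in different shapes and at different levels of detail.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SourceStats:
    """Hypothetical per-source statistics returned for one query."""
    source: str
    matched_count: int   # items in the source matching the query
    matched_size: int    # combined size of matching items, in bytes
    total_count: int     # all items in the source


def aggregate_statistics(stats: list[SourceStats]) -> dict:
    """Combine per-source statistics into totals across all sources."""
    count = sum(s.matched_count for s in stats)
    size = sum(s.matched_size for s in stats)
    total_items = sum(s.total_count for s in stats)
    percent = 100.0 * count / total_items if total_items else 0.0
    return {
        "count": count,
        "size": size,
        "percent_located": percent,
        "by_source": {s.source: s.matched_count for s in stats},
    }
```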
  • the routine 900 proceeds to operation 912 , where the e-discovery client application 110 retrieves data regarding the content items 112 in the various content sources 114 matching the query parameters.
  • the e-discovery client application 110 may retrieve header or digest information for a number of matching content items 112 from the identified content servers 116 based on a default or user-selectable sort order, for example.
  • the header or digest information may be retrieved from the content servers 116 through the corresponding search interfaces 118 or through another API specific to the content server type.
  • the routine 900 proceeds from operation 912 to operation 914 , where the e-discovery client application 110 presents the retrieved header or digest information to the user 104 as a preview of matching content items 112 .
  • the e-discovery client application 110 may present the UI 500 described above in regard to FIGS. 5 and 6 that allows the user to preview matching content items 112 by content source type.
  • the previewed content items 112 may be de-duplicated at each content server 116 for content sources 114 served by that content server or similar content servers.
  • the e-discovery client application 110 may perform additional or alternative de-duplication of matching content items 112 across content sources 114 and content servers 116 before presenting the query statistics 222 and/or previewed content items 112 to the user 104 .
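  • De-duplication across content sources and content servers could, for example, key each matching item on a content digest. The sketch below assumes that the header or digest information for every item includes a `digest` field; that field name and the keep-first policy are assumptions, since the description does not state how duplicates are identified.

```python
from __future__ import annotations


def deduplicate(items: list[dict]) -> list[dict]:
    """Keep the first item seen for each digest, preserving input order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for item in items:
        key = item["digest"]  # assumed to identify identical content
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```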
  • the routine 900 proceeds to operation 916 , where the e-discovery client application 110 may receive a change or refinement to the query.
  • the user 104 may change one or more of the query parameters in the query specification section 206 of the UI 200 or the query scope in the query scope specification panel 302 as described above in regard to FIGS. 2 and 3 .
  • the user 104 may additionally or alternatively select or specify one or more filter category/value pairs from the query refinement section 508 of the UI 500 described above in regard to FIG. 7 .
  • If a change or refinement to the query is received, the routine 900 returns to operation 906, where the e-discovery client application 110 re-executes the modified query against each content server 116 and collects and presents the query statistics 222 and previewed content items 112 to the user 104, as described above. If no changes or refinements to the query are received by the e-discovery client application 110 at operation 916, then the routine 900 ends.
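  • The overall control flow of the routine 900 is an execute, present, and refine loop that ends when no further refinement is received. A minimal sketch follows, in which the callables `execute`, `present`, and `next_refinement` are hypothetical stand-ins for the operations described above rather than components named in the description.

```python
def run_query_loop(initial_query, execute, present, next_refinement):
    """Re-run the query until the user supplies no further refinement.

    execute(query)         -> results (native searches and statistics)
    present(results)       -> shows statistics and the preview to the user
    next_refinement(query) -> a modified query, or None to stop
    Returns the number of times the query was executed.
    """
    query = initial_query
    executions = 0
    while query is not None:
        results = execute(query)         # search the content servers, gather statistics
        present(results)                 # aggregate, present, and preview
        executions += 1
        query = next_refinement(query)   # None ends the routine
    return executions
```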
  • FIG. 10 shows an example computer architecture for a computer 1000 capable of executing the software components described herein for locating relevant content items across multiple disparate content sources, in the manner presented above.
  • the computer architecture shown in FIG. 10 illustrates a server computer, a conventional desktop computer, laptop, notebook, tablet, PDA, wireless phone, or other computing device, and may be utilized to execute any aspects of the software components presented herein described as executing on the computer system 102 , the user computing device 106 , and/or other computing device.
  • the computer architecture shown in FIG. 10 includes one or more central processing units (“CPUs”) 1002 .
  • the CPUs 1002 may be standard processors that perform the arithmetic and logical operations necessary for the operation of the computer 1000 .
  • the CPUs 1002 perform the necessary operations by transitioning from one discrete, physical state to the next through the manipulation of switching elements that differentiate between and change these states.
  • Switching elements may generally include electronic circuits that maintain one of two binary states, such as flip-flops, and electronic circuits that provide an output state based on the logical combination of the states of one or more other switching elements, such as logic gates. These basic switching elements may be combined to create more complex logic circuits, including registers, adders-subtractors, arithmetic logic units, floating-point units, and other logic elements.
  • the computer architecture further includes a system memory 1008 , including a random access memory (“RAM”) 1014 and a read-only memory 1016 (“ROM”), and a system bus 1004 that couples the memory to the CPUs 1002 .
  • the computer 1000 also includes a mass storage device 1010 for storing an operating system 1018 , application programs, and other program modules, which are described in greater detail herein.
  • the mass storage device 1010 is connected to the CPUs 1002 through a mass storage controller (not shown) connected to the bus 1004 .
  • the mass storage device 1010 provides non-volatile storage for the computer 1000 .
  • the computer 1000 may store information on the mass storage device 1010 by transforming the physical state of the device to reflect the information being stored. The specific transformation of physical state may depend on various factors, in different implementations of this description. Examples of such factors may include, but are not limited to, the technology used to implement the mass storage device, whether the mass storage device is characterized as primary or secondary storage, and the like.
  • the computer 1000 may store information to the mass storage device 1010 by issuing instructions to the mass storage controller to alter the magnetic characteristics of a particular location within a magnetic disk drive, the reflective or refractive characteristics of a particular location in an optical storage device, or the electrical characteristics of a particular capacitor, transistor, or other discrete component in a solid-state storage device. Other transformations of physical media are possible without departing from the scope and spirit of the present description.
  • the computer 1000 may further read information from the mass storage device 1010 by detecting the physical states or characteristics of one or more particular locations within the mass storage device.
  • a number of program modules and data files may be stored in the mass storage device 1010 and RAM 1014 of the computer 1000 , including an operating system 1018 suitable for controlling the operation of a computer.
  • the mass storage device 1010 and RAM 1014 may also store one or more program modules.
  • the mass storage device 1010 and the RAM 1014 may store the e-discovery client application 110 , which was described in detail above in regard to FIG. 1 .
  • the mass storage device 1010 and the RAM 1014 may also store other types of program modules or data.
  • the computer 1000 may have access to other computer-readable media to store and retrieve information, such as program modules, data structures, or other data.
  • computer-readable media may be any available media that can be accessed by the computer 1000 , including computer-readable storage media and communications media.
  • Communications media includes transitory signals.
  • Computer-readable storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable instructions, data structures, program modules, or other data.
  • computer-readable storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, digital versatile disks (DVD), HD-DVD, BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the computer 1000 .
  • the computer-readable storage medium may be encoded with computer-executable instructions that, when loaded into the computer 1000 , may transform the computer system from a general-purpose computing system into a special-purpose computer capable of implementing the embodiments described herein.
  • the computer-executable instructions may be encoded on the computer-readable storage medium by altering the electrical, optical, magnetic, or other physical characteristics of particular locations within the media. These computer-executable instructions transform the computer 1000 by specifying how the CPUs 1002 transition between states, as described above.
  • the computer 1000 may have access to computer-readable storage media storing computer-executable instructions that, when executed by the computer, perform the routine 900 for locating relevant content items across multiple disparate content sources described above in regard to FIG. 9 .
  • the computer 1000 may operate in a networked environment using logical connections to remote computing devices and computer systems through one or more networks 108 , such as a LAN, a WAN, the Internet, or a network of any topology known in the art.
  • the computer 1000 may connect to the network 1020 through a network interface unit 1006 connected to the bus 1004 . It should be appreciated that the network interface unit 1006 may also be utilized to connect to other types of networks and remote computer systems.
  • the computer 1000 may also include an input/output controller 1012 for receiving and processing input from one or more input devices, including a keyboard, a mouse, a touchpad, a touch-sensitive display, an electronic stylus, or other type of input device. Similarly, the input/output controller 1012 may provide output to a display device, such as a computer monitor, a flat-panel display, a digital projector, a printer, a plotter, or other type of output device. It will be appreciated that the computer 1000 may not include all of the components shown in FIG. 10 , may include other components that are not explicitly shown in FIG. 10 , or may utilize an architecture completely different than that shown in FIG. 10 .

Abstract

Technologies are described herein for locating relevant content items across multiple disparate content sources. Query parameters are received from a user interface for defining a query for searching a number of content sources located on multiple, disparate content servers. A native search is executed on each of the content servers based on the received query parameters, and query statistics and other data regarding content items in the content sources matching the query parameters are received. The query statistics are aggregated across the content servers and presented in the user interface. The presentation of the query statistics may be broken out by each content source, by each query phrase segmented from the query, and the like. In addition, a preview of a number of content items matching the query parameters is presented based on the data received.

Description

    BACKGROUND
  • A company involved in litigation may be obligated to locate and disclose all relevant “evidence” to opposing counsel. Such evidence may include a variety of electronic content, including email messages, documents and other files, list and other contents maintained on websites, and the like. This electronic content may be located on a number of different types of content servers in the enterprise, each having a different process of indexing and/or searching information. Identifying, preserving, and processing this electronic content across the multiple servers may be difficult, time consuming, and expensive. The amount of data that the company is required to sort through and produce may be vast. In addition, the lack of tools to effectively limit the amount of relevant electronic content disclosed may increase litigation costs due to the manual review needed of all content before it is disclosed.
  • It is with respect to these considerations and others that the disclosure made herein is presented.
  • SUMMARY
  • Technologies are described herein for locating relevant content items across multiple disparate content sources. Utilizing the technologies described herein, a user may leverage search technologies to locate relevant content items from multiple, different content sources, such as email servers, content sites, fileshares, databases and the like, in order to identify, preserve, and process for export the relevant items. For example, a user involved in an e-discovery investigation may utilize the systems, methods, and user interfaces described herein to create targeted search queries against an identified “virtual archive” of items that produce relevant content items for export and disclosure, thereby decreasing the material requiring manual review and reducing cost and risks involved in the corresponding litigation.
  • According to embodiments, query parameters are received from a user interface for defining a query for searching a number of content sources located on multiple, disparate content servers. A native search is executed on each of the content servers based on the received query parameters, and query statistics and other data regarding content items in the content sources matching the query parameters are received. The query statistics are aggregated across the content servers and presented in the user interface. The presentation of the query statistics may be broken out by each content source, by each query phrase segmented from the query, and the like. In addition, a preview of a number of content items matching the query parameters is presented based on the data received.
  • It should be appreciated that the above-described subject matter may be implemented as a computer-controlled apparatus, a computer process, a computing system, or as an article of manufacture such as a computer-readable medium. These and various other features will be apparent from a reading of the following Detailed Description and a review of the associated drawings.
  • This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended that this Summary be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram showing aspects of an illustrative operating environment and software components provided by the embodiments presented herein;
  • FIGS. 2 and 3 are screen diagrams showing an illustrative user interface for defining a query for locating content items across multiple content sources and providing query statistics regarding the results of the query, according to embodiments described herein;
  • FIG. 4 is a block diagram showing multiple examples of the segmentation of queries for generation of query statistics, according to embodiments described herein;
  • FIGS. 5 and 6 are screen diagrams showing an illustrative user interface for previewing results of the query, according to embodiments described herein;
  • FIG. 7 is a screen diagram showing an illustrative user interface for accepting refinements to the query results, according to embodiments described herein;
  • FIG. 8 is a screen diagram showing an illustrative user interface for managing multiple saved queries, according to embodiments described herein;
  • FIG. 9 is a flow diagram showing one method for locating relevant content items across multiple disparate content sources, according to embodiments described herein; and
  • FIG. 10 is a block diagram showing an illustrative computer hardware and software architecture for a computing system capable of implementing aspects of the embodiments presented herein.
  • DETAILED DESCRIPTION
  • The following detailed description is directed to technologies for locating relevant content items across multiple disparate content sources. While the subject matter described herein is presented in the general context of program modules that execute in conjunction with the execution of an operating system and application programs on a computer system, those skilled in the art will recognize that other implementations may be performed in combination with other types of program modules. Generally, program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the subject matter described herein may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.
  • In the following detailed description, references are made to the accompanying drawings that form a part hereof and that show, by way of illustration, specific embodiments or examples. In the accompanying drawings, like numerals represent like elements through the several figures.
  • FIG. 1 shows an illustrative operating environment 100 including software components for locating relevant content items across multiple disparate content sources, according to embodiments provided herein. The environment 100 includes a computer system 102. In one embodiment, the computer system 102 represents one or more Web and/or application servers executing web-based application programs and accessed over a network 108 by a user 104 using a Web browser or other client application executing on a user computing device 106. The network 108 may be a local-area network (“LAN”), a wide-area network (“WAN”), the Internet, or any other networking topology known in the art that connects the user computing device 106 to the computer system 102. The user computing device may comprise a personal computer (“PC”), a desktop workstation, a laptop, a notebook, a tablet, a mobile device, a personal digital assistant (“PDA”), a game console, a set-top box, a consumer electronics device, and the like. Alternatively, the computer system 102 may represent a user computing device executing application programs locally, or any combination of server computers and user computing devices.
  • An e-discovery client application 110 may execute on the computer system 102. The user 104 may utilize the e-discovery client application 110 to identify, preserve, and export a set of content items relevant to a business issue or event, such as litigation or other legal matters, for example. In particular, the e-discovery client application 110 may allow the user 104 to produce targeted search queries to locate relevant content items from a “virtual archive” comprising content items 112 stored in multiple content sources 114. The e-discovery client application 110 may further provide the user 104 with the ability to preview the content items 112 returned by a search, refine the query, and to dispatch a list of the relevant content items 112 for export. Examples of a content source 114 may include an email mailbox; a document library, list item archive, e.g. a discussion thread or Web log (“blog”), or other content site; a fileshare or fileshare folder; a website; and the like. Examples of content items 112 may include email messages; documents or files; webpages; list items, e.g. entries in a discussion thread, blog posts, or wiki page entries; and the like. According to embodiments, the content items 112 may be stored on and/or accessed through multiple, disparate content servers 116A-116N (also referred to herein generally as content servers 116 or content server 116).
  • In one embodiment, the content servers 116 include one or more email servers, such as MICROSOFT® EXCHANGE SERVER email servers from Microsoft Corporation of Redmond, Wash. The content servers 116 may also include one or more content site servers, such as MICROSOFT® SHAREPOINT® servers, also from Microsoft Corporation. The content servers 116 may also include one or more file servers, NAS storage devices, or other file and document storage systems. In other embodiments, the content servers 116 may include document management servers, database servers, Web servers, and other data and content servers known in the art.
  • According to other embodiments, each content server 116A-116N may provide a corresponding search interface 118A-118N (also referred to herein as search interfaces 118 or search interface 118) for searching the content items 112 stored on and/or accessed through the content server. For example, a content server 116A comprising an email server may provide a search interface 118 that allows content items 112 comprising email messages contained in content sources 114 comprising email mailboxes to be searched by external applications, such as the e-discovery client application 110 executing on the computer system 102. In one embodiment, the content server 116 maintains one or more indexes supporting the searching of associated content items 112 through the search interface 118. The search interface 118 may comprise an application programming interface (“API”) that defines SOAP-based Web services, Java RMI calls, WINDOWS® Communication Foundation (“WCF”) services, RPC calls, and the like.
  • The e-discovery client application 110 may access a case dataset 120 that defines the various content sources 114 containing the content items 112 comprising the virtual archive of items to be searched. The case dataset 120 may represent an XML file, one or more database tables in a database, or any other structured storage mechanism known in the art stored on or accessible to the computer system 102. The case dataset 120 may be built by the user 104 utilizing the e-discovery client application 110 or another application based on content sources deemed potentially relevant to the litigation or other business issue/event at hand. In one embodiment, the case dataset 120 may be built by the user 104 using methods and user interfaces similar to those described herein for locating relevant content items 112 in the virtual archive.
  • The case dataset 120 may contain one or more content collections 122, each content collection 122 comprising one or more source specifications 124A-124N (also referred to herein as source specifications 124 or source specification 124). Each source specification 124 may identify a specific content source 114 containing content items 112 that collectively make up the virtual archive. For example, one source specification 124A may identify a specific personal mailbox stored on or accessed through an email content server 116A. Another source specification 124B may identify a document library accessed through a content server 116B hosting a content site. Organizing the source specifications 124 into content collection(s) 122 allows configuration options for the virtual archive to be applied at a content collection level, such as whether content items 112 should be preserved in place or copied to an archive and the like. In addition, filters may be applied at the content collection level to further limit the content items 112 from the specified content sources 114 to be included in the virtual archive. Filters may include date-ranges for email messages sent or documents created or modified, author/sender of documents or email messages, keyword filters, and the like. In other embodiments, filters may further be specified at a content source level, i.e. per source specification 124, or for the entire virtual archive defined in the case dataset 120.
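  • The relationship between content collections, source specifications, and filters at different levels might be represented along the following lines. All class and field names are hypothetical, and the rule that a source-level filter overrides a collection-level filter of the same category is an assumption; the description does not say how filters at different levels combine.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SourceSpecification:
    """Identifies one content source, e.g. a mailbox or document library."""
    server: str
    source: str
    filters: dict[str, str] = field(default_factory=dict)  # source-level filters


@dataclass
class ContentCollection:
    """Groups source specifications that share configuration options."""
    name: str
    sources: list[SourceSpecification] = field(default_factory=list)
    preserve_in_place: bool = True  # versus copying items to an archive
    filters: dict[str, str] = field(default_factory=dict)  # collection-level filters


def effective_filters(collection: ContentCollection,
                      spec: SourceSpecification) -> dict[str, str]:
    """Collection-level filters, overridden by source-level ones (assumed rule)."""
    return {**collection.filters, **spec.filters}
```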
  • The case dataset 120 may further contain one or more query specifications 126. Each query specification 126 defines a query that is used to search the content sources 114 comprising the virtual archive as defined by the source specifications 124 to locate relevant content items 112. According to embodiments, the user 104 may utilize the e-discovery client application 110 to build the query specifications 126 and save them to the case dataset 120. The e-discovery client application 110 may further parse the query specification 126 and utilize the search interface 118 of each content server 116 identified by the source specifications 124 to execute the query against each content source 114. Statistics regarding the query as executed against each content source 114 may then be aggregated by the e-discovery client application 110 and presented to the user 104, as will be described in more detail below. In addition, the e-discovery client application 110 may combine data regarding the content items 112 located by each content server 116 in order to present a preview of the results to the user 104 to allow for further refinement of the query.
  • Finally, the e-discovery client application 110 may generate a manifest of all the relevant content items 112 located by the query(s) from the various content sources 114. The manifest may then be dispatched to an export application that may utilize additional interfaces of each content server 116 to retrieve the content items 112 specified in the manifest and save them to a case export file, as is described in co-pending U.S. patent application Ser. No. 13/293,146 filed Nov. 10, 2011, having Attorney Docket No. 334054.01 and entitled “Export of Content Items from Multiple, Disparate Content Sources,” which is incorporated herein by this reference in its entirety.
  • FIG. 2 shows an illustrative user interface (“UI”) 200 for defining a query to search the content sources 114 of the virtual archive as defined by the source specifications 124 contained in the case dataset 120. The UI 200 may be presented by the e-discovery client application 110 to the user 104 in a browser window 202 rendered by a Web browser application executing on the user computing device 106, for example.
  • The UI 200 includes a query specification section 206 where the parameters defining the query may be specified by the user 104. For example, the query specification section 206 may contain a field allowing the user to specify a free-text query 208 in any suitable syntax, such as a keyword query language (“KQL”) query, which may include keywords for the query along with junction words, grouping parentheses, and the like. In one embodiment, the free-text query 208 may further include advanced query syntax/specifications, such as property restrictions using the “property:value” syntax, for example. According to embodiments, the syntax of the free-text query 208 may be independent of the form or syntax of the query required by search interface 118 of each content server 116 to search the content sources 114. The e-discovery client application 110 will parse the free-text query 208 and translate the query to the proper form and/or syntax for the content servers 116 when the query is executed.
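  • The parse-and-translate step can be sketched as follows. This is only an illustration under stated assumptions: the two target "dialects" are invented for the example and do not correspond to the search interface of any real content server, and the parser handles only keywords and “property:value” restrictions (junction words and grouping are ignored, and terms are implicitly joined with AND).

```python
import re
from typing import List, Optional, Tuple

# A token is (property, value); property is None for a plain keyword.
Token = Tuple[Optional[str], str]

# Optional "property:" prefix followed by a quoted phrase or a bare word.
TOKEN_RE = re.compile(r'(?:(\w+):)?("[^"]*"|\S+)')


def parse_free_text(query: str) -> List[Token]:
    """Split a free-text query into keywords and property restrictions."""
    return [(prop or None, value.strip('"'))
            for prop, value in TOKEN_RE.findall(query)]


def to_mail_dialect(tokens: List[Token]) -> str:
    """Render for a hypothetical mail server syntax: prop:"value" AND "term"."""
    return " AND ".join(f'{p}:"{v}"' if p else f'"{v}"' for p, v in tokens)


def to_site_dialect(tokens: List[Token]) -> str:
    """Render for a hypothetical content site syntax: prop = 'value'."""
    return " AND ".join(f"{p} = '{v}'" if p else f"CONTAINS('{v}')"
                        for p, v in tokens)


tokens = parse_free_text('budget author:"Adam Barr"')
print(to_mail_dialect(tokens))
print(to_site_dialect(tokens))
```

  • Parsing once into a neutral token list and rendering per server keeps the user-facing syntax independent of any one server's native syntax, which is the property the paragraph above relies on.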
  • The query specification section 206 may also contain fields that allow the user 104 to specify a from-date value 210 and to-date value 212 defining a date-range parameter for the query. The date-range parameter may be applied to specific properties of content items 112 depending on their type, such as the sent date of email messages, the creation or modification date of documents or files, the posting date for discussion entries, and the like. The query specification section 206 may also contain a field that allows the user 104 to specify an author/sender parameter 214. Similar to the date-range parameter, the author/sender parameter 214 may be applied to specific properties of content items 112 depending on their type, such as the sender of email messages, the creator of documents, the poster of discussion entries, and the like.
  • The UI 200 may further include a mechanism for specifying a scope of the query being defined, i.e. those content sources 114 of the virtual archive to which the query is to be applied. For example, the UI 200 may include a scope UI control 216 that, when selected by the user 104, causes a query scope specification panel 302 to be displayed in the window 202, as shown in FIG. 3. The query scope specification panel 302 may include a list of content item groupings, such as content item groupings 304A-304D, corresponding to the content collections 122 and/or source specifications 124 contained in the case dataset 120. In addition, the content item groupings 304A-304D may be presented in a hierarchical fashion. For example, content item grouping 304A may correspond to a first content collection 122 defined in the case dataset 120, while content item groupings 304B-304D may correspond to source specifications 124 for three content sources 114, one for a personal mailbox for “Adam Barr,” one for a personal mailbox for “Regina Wilcox,” and one for a fileshare located at “\\PUBLIC\ADAM BARR,” each of which is included in the first content collection 122.
  • Each content item grouping 304A-304D may further include an inclusion UI control 308 that allows the user 104 to specify whether content source(s) 114 identified by the corresponding source specification 124 or content collection 122 are to be included in the scope of the query being defined. The query scope specification panel 302 may also include a select all UI control 310 that allows the user 104 to specify that all content sources 114 identified in the case dataset 120 are to be included in the search.
  • Returning to FIG. 2, the UI 200 may further include a source query statistic section 220 that provides the user 104 with query statistics 222 regarding the execution of the defined query against the content sources 114 identified in the query scope. For example, the user may utilize an execute query UI control 218 to cause the e-discovery client application 110 to parse the query parameters and utilize the search interface 118 of each content server 116 identified by the source specifications 124 to execute a native query against the specified content sources 114. Query statistics 222 regarding the query as executed against each content source 114 may then be aggregated by the e-discovery client application 110 and presented in the source query statistic section 220 of the UI 200.
  • According to one embodiment, the query statistics 222 may include a list of content source entries, such as content source entry 224, corresponding to each content source 114 included in the scope of the query. The content source entry 224 may include an identifier of the corresponding content source 114, as shown at 226, a count of the number of content items 112 located in the content source that match the query parameters, as shown at 228, a total size of the content items located, as shown at 230, and the like. The content source entries 224 in the query statistics may be further grouped under grouping headers 232A-232C. For example, the content source entries 224 may be grouped by a type of the corresponding content source 114, as shown in FIG. 2. The content source entries 224 may also be grouped by content collection 122, by content server 116, or by other groups. In one embodiment, the grouping of the content source entries 224 corresponding to the content sources 114 in the query statistics may be selected by the user 104 through the UI 200. The query statistics 222 may further include sub-totals and totals of the count and/or size of the located content items 112, a percentage of items located versus total content items in the content sources 114, and the like.
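  • A minimal sketch of grouping the per-source entries under headers and computing sub-totals and totals is shown below. The record type, the grouping key, and all counts and sizes are hypothetical illustration values, not figures from the embodiments.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SourceStats:
    """One content source entry (field names are hypothetical)."""
    source_id: str
    source_type: str
    item_count: int
    total_bytes: int


def group_stats(entries: List[SourceStats], key: str = "source_type") -> Dict[str, dict]:
    """Group entries under a header and accumulate a sub-total per group."""
    groups: Dict[str, dict] = defaultdict(
        lambda: {"entries": [], "item_count": 0, "total_bytes": 0})
    for entry in entries:
        group = groups[getattr(entry, key)]
        group["entries"].append(entry)
        group["item_count"] += entry.item_count
        group["total_bytes"] += entry.total_bytes
    return dict(groups)


def grand_total(entries: List[SourceStats]) -> Tuple[int, int]:
    """Total count and total size across every source in the query scope."""
    return (sum(e.item_count for e in entries),
            sum(e.total_bytes for e in entries))


# Illustrative numbers only.
entries = [
    SourceStats("Adam Barr", "Mailboxes", 120, 4_000_000),
    SourceStats("Regina Wilcox", "Mailboxes", 80, 2_500_000),
    SourceStats(r"\\PUBLIC\ADAM BARR", "Fileshares", 15, 9_000_000),
]
groups = group_stats(entries)
totals = grand_total(entries)
```

  • Because the grouping key is a parameter, the same routine could group by content collection or by content server if those attributes were added to the record, matching the user-selectable grouping described above.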
  • The UI 200 may also include a query segmentation statistic section 240 that provides the user 104 with additional query statistics 222 regarding the execution of the defined query. The additional query statistics 222 may further include a count of content items 112 located by the query, as shown at 242, broken down by various phrases, such as query phrase 244, of the free-text query 208 specified in the query parameters. The segmentation of the query may be performed by the e-discovery client application 110 in a variety of ways. As shown at 404A in FIG. 4, in one embodiment a query may be divided into query phrases 244A-244N at each explicit or implied OR, such as query phrases 244A and 244B segmented from the main body of the query 402.
  • The segmentation process may be performed iteratively based on explicit groupings in the query 402 by parentheses or implied groupings in the query based on operator precedence, syntax, and the like. For example, the query phrase 244B may be further segmented into query phrases 244C-244H using the same process. The resulting query phrases 244A-244N may be presented in a hierarchical fashion representing the groupings in the query 402. In another embodiment, the query 402 may be divided into query phrases 244 at each explicit or implied AND, as shown at 404B in FIG. 4. As shown in FIG. 2, the count of content items 112 matching each query phrase 244 is further shown in the query segmentation statistic section 240. The counts may be generated for each query phrase 244 by the content server(s) 116 and/or the e-discovery client application 110 may perform a search operation with each individual query phrase 244 in order to aggregate the query statistics 222 for presentation in the query segmentation statistic section 240. It will be appreciated that the query statistics 222 presented in the source query statistic section 220 and the query segmentation statistic section 240 may be updated each time the query parameters defined for the query are modified by the user, or the query parameters
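  • The iterative segmentation of a query into a hierarchy of query phrases can be sketched as below. This is a simplified illustration, not the segmentation used by any particular embodiment: it splits only at explicit operators written as separate words, respects explicit parentheses, and does not handle implied operators or operator precedence. All function names are hypothetical.

```python
from typing import List, Tuple

# A node is (phrase text, list of child nodes).
Node = Tuple[str, list]


def tokenize(query: str) -> List[str]:
    """Pad parentheses with spaces so they become separate tokens."""
    return query.replace("(", " ( ").replace(")", " ) ").split()


def split_top_level(tokens: List[str], operator: str) -> List[List[str]]:
    """Split the token list at each operator that is outside all parentheses."""
    phrases, current, depth = [], [], 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        if tok == operator and depth == 0:
            phrases.append(current)
            current = []
        else:
            current.append(tok)
    phrases.append(current)
    return phrases


def strip_outer_parens(tokens: List[str]) -> List[str]:
    """Remove enclosing parentheses only when they wrap the whole phrase."""
    while len(tokens) >= 2 and tokens[0] == "(" and tokens[-1] == ")":
        depth = 0
        for i, tok in enumerate(tokens):
            if tok == "(":
                depth += 1
            elif tok == ")":
                depth -= 1
            if depth == 0 and i < len(tokens) - 1:
                return tokens  # the first "(" closes before the end
        tokens = tokens[1:-1]
    return tokens


def render(tokens: List[str]) -> str:
    return " ".join(tokens).replace("( ", "(").replace(" )", ")")


def segment(query: str, operator: str = "OR") -> List[Node]:
    """Return the hierarchy of query phrases obtained by repeated splitting."""
    def walk(tokens: List[str]) -> List[Node]:
        parts = split_top_level(strip_outer_parens(tokens), operator)
        if len(parts) == 1:
            return []  # nothing left to split at this level
        return [(render(p), walk(p)) for p in parts]
    return walk(tokenize(query))


tree = segment("contract OR (merger OR acquisition OR buyout)")
```

  • Passing `operator="AND"` yields the alternative segmentation at each explicit AND. A per-phrase count could then be obtained by executing each phrase in the returned hierarchy as its own search.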
  • FIG. 5 shows another illustrative UI 500 for providing the user a preview of the content items 112 located by the query defined in the query specification section 206. As described above, the e-discovery client application 110 may further provide the user 104 with the ability to preview the content items 112 returned by a search and further refine the query in order to locate only relevant content items for export. The UI 500 may be presented by the e-discovery client application 110 to the user 104 in a browser window 202 rendered by a Web browser application executing on the user computing device 106, for example. The e-discovery client application 110 may render the UI 500 in addition to or as an alternative to the UI 200 described above in regard to FIGS. 2-4.
  • The UI 500 includes the query specification section 206 detailing the parameters defined for the query as well as a result list 502 comprising content item entries, such as content item entry 506, containing data regarding each previewed content item 112 matching the search parameters. Different result lists 502 may be provided for different types of content sources 114, selectable by the user through a tab metaphor, as shown in FIG. 5, or another UI mechanism known in the art. For example, each type category of content source 114 may have a corresponding tab 504A-504C showing a result list 502 containing data specific to content items 112 of that type. For example, for content sources 114 comprising email mailboxes, each content item entry 506 in the result list 502 on a “MAILBOXES” tab 504A may include header information for the corresponding email message, such as a subject, the recipients, the sender, the date sent, and the like. For content sources 114 comprising document libraries or fileshares, each content item entry 506 in the result list 502 on a “FILESHARES” tab 504C may include digest information for the corresponding document or file, such as a document title or filename, a file type, an author, the creation date, the last modification date, and the like.
  • In another embodiment, the content item entries 506 in the result list 502 may contain additional data from the previewed content item 112, such as the first few lines of the body of an email message, a thumbnail image of a document or file, and the like. In another embodiment, each tab 504A-504C corresponds to the search mechanism or index that results in surfacing content items 112 from the content source 114. Therefore a result list 502 containing email messages from email mailboxes and files from fileshares may be listed under an “ENTERPRISE” tab corresponding to an enterprise-wide search index, while documents from document libraries and list items from blogs or discussion groups may appear in a result list under the “CONTENT SITES” tab 504B corresponding to search indexes from one or more content sites. In further embodiments, each tab 504A-504C may correspond to a type of content server 116, a specific content server, or other categories or grouping of content items 112, content sources 114, and/or content servers.
  • The e-discovery client application 110 may retrieve header or digest information for the top-N matching content items 112 of the type corresponding to the selected tab 504A-504C from the identified content servers 116 based on a default or user-selectable sort order, for example, for display in the result list 502. The header or digest information may be retrieved from the content servers 116 through the corresponding search interfaces 118 or through another API specific to the content server type. In addition, the data may be retrieved by the e-discovery client application 110 asynchronously as the query is modified by the user 104 and/or as the query statistics 222 are updated in the UI 200. In addition, as shown in FIG. 6, the e-discovery client application 110 may retrieve the entire contents of a content item 112 and display it in a preview pane 602 when the corresponding content item entry 506 in the result list 502 is selected by the user, by hovering a mouse pointer 604 over the entry, for example.
  • Referring now to FIG. 7, the UI 500 may further contain a query refinement section 508 that allows further refinements to the query to be made by the user 104. The query refinement section 508 may contain a list of properties or “filter categories” 704A-704D (referred to herein generally as filter categories 704) for which values for refinement of the query may be selected. The filter categories 704 presented to the user 104 may be specific to the type of content sources 114 for which the previewed content items 112 are being presented. For example, as shown in FIG. 7, if the email messages are being previewed in the result list 502 on the “MAILBOXES” tab 504A, the filter categories 704A-704D may comprise properties of email messages, such as recipient, domain, mail type, attachment type, and the like. Additional and/or alternative filter categories 704 may be shown with result lists 502 on other tabs 504B, 504C containing content items 112 of different types.
  • If the user 104 selects a particular filter category, such as filter category 704A, the user may be further presented with a list of value entries, such as value entry 706, for the selected filter category generated from the previewed content items 112. In one embodiment, each value entry 706 listed may further include query statistics showing a count of content items 112 from the current query having the property matching the corresponding value, as further shown in FIG. 7. The user 104 may select one or more of the listed value entries 706 for the selected filter category 704, and then select a UI control, such as the apply pushbutton UI control 710, to apply the selected filter category/value pairs to the query. Applying the selected filter category/value pairs to the query may update both the query statistics 222 presented in the UI 200 and the previewed content items 112 shown in the result list 502 on the currently selected tab 504A. In another embodiment, the selected filter category/value pairs may be added to the free-text query 208 in the query parameters, using the “property:value” syntax, for example. The user 104 may then re-arrange, group, and change junction operators for the filter category/value pairs in the free-text query 208 to further refine the query.
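  • Adding selected filter category/value pairs to the free-text query in the “property:value” syntax might look like the sketch below. The choice of junction operators is an assumption made for the example (values within one category are joined with OR, and categories are joined with AND); the description above leaves the operators to the user, who may change them afterwards. The function name and the sample property names and addresses are hypothetical.

```python
from typing import Dict, List


def append_filters(free_text: str, selected: Dict[str, List[str]]) -> str:
    """Append filter category/value pairs to a free-text query.

    Assumption for this sketch: OR within a category, AND across categories.
    """
    clauses = []
    for prop, values in selected.items():
        terms = [f'{prop}:"{value}"' for value in values]
        if not terms:
            continue  # a category with no selected values adds nothing
        clauses.append(terms[0] if len(terms) == 1
                       else "(" + " OR ".join(terms) + ")")
    if not clauses:
        return free_text
    return " AND ".join([f"({free_text})"] + clauses)


refined = append_filters(
    "merger OR buyout",
    {"recipient": ["a@example.com", "b@example.com"], "mailtype": ["meeting"]},
)
print(refined)
```

  • Wrapping the original free-text query in parentheses keeps its own operators from binding to the appended restrictions.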
  • According to one embodiment, only one filter category 704A-704D may be open and modified at a given time. If the user 104 is modifying one filter category 704A and then switches to another before selecting the apply pushbutton UI control 710, the e-discovery client application 110 may warn the user that any changes to the filter category will not be saved unless they select the apply pushbutton. In another embodiment, the user 104 is provided with a custom filter UI control 708 that allows the user to specify an unlisted value for one of the filter categories 704A-704D and/or to specify value(s) for another property or filter category for the content source type beyond the filter categories shown. Selecting the custom filter UI control 708 may turn the UI control into a text box, where the user can enter the additional filter category/value pair in the “property:value” syntax, for example.
  • The UI 500 may further include a query save section 712 that allows the query to be saved as a corresponding query specification 126 in the case dataset 120, as described above in regard to FIG. 1. The user may be presented with a UI control to provide a name or other identifier to associate with the query specification 126. According to embodiments, all query parameters for the query are saved to the corresponding query specification 126, including the free-text query 208, the date-range parameter, the author/sender parameter 214, the source specifications 124 and/or content collections 122 comprising the query scope, any filter category/value pairs selected in the query refinement section 508, and the like. In addition, the query statistics 222 last generated by the content servers 116 may be stored with the corresponding query specification 126 for later retrieval. In one embodiment, the user 104 may be provided the ability to copy the query parameters from an existing query specification 126 to create a new query, which may then be modified while the existing query specification 126 remains intact.
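  • One possible shape for a saved query specification, including the copy-to-new-query behavior, is sketched below. The field names are hypothetical, and clearing the stored statistics on the copy is an assumption of this sketch (a copied query has not yet been executed), not something the description above requires.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QuerySpecification:
    """All parameters saved with a query (field names are hypothetical)."""
    name: str
    free_text: str
    from_date: Optional[str] = None
    to_date: Optional[str] = None
    author: Optional[str] = None
    scope: List[str] = field(default_factory=list)
    filters: Dict[str, List[str]] = field(default_factory=dict)
    last_statistics: Dict[str, int] = field(default_factory=dict)


def copy_as_new(existing: QuerySpecification, new_name: str) -> QuerySpecification:
    """Create an independent copy so the existing specification stays intact."""
    clone = copy.deepcopy(existing)
    clone.name = new_name
    clone.last_statistics = {}  # assumption: statistics are not carried over
    return clone


saved = QuerySpecification(
    name="Merger mail",
    free_text="merger OR buyout",
    scope=["mailbox:adam.barr"],
    filters={"mailtype": ["meeting"]},
    last_statistics={"count": 42},
)
draft = copy_as_new(saved, "Merger mail (narrowed)")
draft.filters["mailtype"].append("email")  # does not touch the saved query
```

  • A deep copy is used because the scope and filter values are nested mutable containers; a shallow copy would let edits to the new query leak into the saved one.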
  • FIG. 8 shows another illustrative UI 800 for the management of saved queries, according to further embodiments. The UI 800 may be presented by the e-discovery client application 110 to the user 104 in a browser window 202 rendered by a Web browser application executing on the user computing device 106, for example. The UI 800 may include a query list 802 including query entries, such as query entry 804, for the query specifications 126 saved in the case dataset 120. Each query entry 804 may include the free text query 806 from the query specification 126, along with the name 808 or other identifier associated with the query when saved by the user 104. In addition, the query entry 804 may include query statistics 222, such as a total count 810 and total size 812 of content items 112 matching the query. The query statistics 222 from the last execution of the query may have been stored with the corresponding query specification 126 when the user 104 saved the query, as described above in regard to FIG. 7.
  • According to embodiments, each query entry 804 may further include a query selection control 814 that allows the user 104 to select one or more queries in the query list 802. The user 104 may then select an export UI control 816 that will cause the e-discovery client application 110 to generate a manifest of all the relevant content items 112 from all content sources 114 across all content servers 116 that match one or more of the selected query(s) and dispatch the manifest to an export application that retrieves the content items 112 specified and saves them to a case export file, as described above in regard to FIG. 1.
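  • Building the manifest as the union of items matched by one or more selected queries can be sketched as follows. The item reference type, the grouping of the manifest by server, and all identifiers are assumptions made for illustration; the actual manifest format is not specified here.

```python
from typing import Dict, Iterable, List, NamedTuple


class ItemRef(NamedTuple):
    """Hypothetical reference to one content item."""
    server: str
    source: str
    item_id: str


def build_manifest(results_by_query: Dict[str, Iterable[ItemRef]],
                   selected: List[str]) -> Dict[str, List[ItemRef]]:
    """Union the items of the selected queries, listing each item once."""
    unique = set()
    for query_name in selected:
        unique.update(results_by_query[query_name])
    manifest: Dict[str, List[ItemRef]] = {}
    for ref in sorted(unique):  # sorted for a stable, reproducible manifest
        manifest.setdefault(ref.server, []).append(ref)
    return manifest


# Illustrative data only.
results = {
    "q1": [ItemRef("mail", "mbx1", "m1"), ItemRef("files", "share1", "f1")],
    "q2": [ItemRef("mail", "mbx1", "m1"), ItemRef("mail", "mbx2", "m7")],
    "q3": [ItemRef("files", "share1", "f9")],
}
manifest = build_manifest(results, ["q1", "q2"])
```

  • Grouping by server is a convenience for an export step that retrieves items one server at a time; an item matched by several selected queries appears in the manifest only once.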
  • Referring now to FIG. 9, additional details will be provided regarding the embodiments presented herein. It should be appreciated that the logical operations described with respect to FIG. 9 are implemented (1) as a sequence of computer implemented acts or program modules running on a computing system and/or (2) as interconnected machine logic circuits or circuit modules within the computing system. The implementation is a matter of choice dependent on the performance and other requirements of the computing system. Accordingly, the logical operations described herein are referred to variously as operations, structural devices, acts, or modules. These operations, structural devices, acts, and modules may be implemented in software, in firmware, in special purpose digital logic, and any combination thereof. It should also be appreciated that more or fewer operations may be performed than shown in the figures and described herein. The operations may also be performed in a different order than described.
  • FIG. 9 illustrates one routine 900 for locating relevant content items across multiple disparate content sources, according to one embodiment. The routine 900 may be performed by the e-discovery client application 110 executing on the computer system 102, for example. It will be appreciated that the routine 900 may also be performed by other modules or components executing on the computer system 102, or by any combination of modules, components, and computing devices. The routine 900 begins at operation 902, where the e-discovery client application 110 presents a UI to a user 104 for defining a query to search the content sources 114 of the virtual archive as defined by the source specifications 124 contained in the case dataset 120. According to embodiments, the source specifications 124 may identify content sources 114 on multiple, disparate content servers 116, such as email mailboxes on an email server, a document library on a content site server, and/or a fileshare on a file server.
  • The e-discovery client application 110 may present the UI 200 described above in regard to FIGS. 2 and 3 to the user 104 for defining the query. The UI 200 may be presented by the e-discovery client application 110 to the user 104 in a browser window 202 rendered by a Web browser application executing on the user computing device 106, for example. The UI 200 may include a query specification section 206 that allows the user to specify parameters defining the query, such as a free-text query 208, a date-range parameter, an author/sender parameter 214, and the like. The UI 200 may further include a query scope specification panel 302 that allows the user to specify the content collections 122 and/or source specifications 124 contained in the case dataset 120 to which the query is to be applied.
  • The routine 900 proceeds from operation 902 to operation 904, where the e-discovery client application 110 receives the query parameters and/or query scope from the user 104 through the UI 200, as described above. In another embodiment, the user 104 may load the query parameters and query scope from a query specification 126 previously saved to the case dataset 120. From operation 904, the routine 900 proceeds to operation 906, where the e-discovery client application 110 executes a native search of each content server 116 specified in the source specifications 124 comprising the query scope. As described above, the e-discovery client application 110 may parse the query parameters and utilize the search interface 118 of each content server 116 identified by the source specifications 124 to execute a native query against the specified content sources 114. According to one embodiment, the e-discovery client application 110 may provide the user 104 with a user interface to view and/or modify the native queries generated for the various content servers 116.
  • The routine 900 proceeds from operation 906 to operation 908 where the e-discovery client application 110 receives query statistics 222 regarding the query as executed against each content source 114 from the content servers 116. The e-discovery client application 110 may receive raw statistics broken out by one or more of the content source 114, query phrases 244 segmented from the free-text query 208, and the like. It will be appreciated that the query statistics 222 received from the content servers 116 may include a variety of information at different levels, and that different types of content servers 116 may return different levels of query statistics from the query. In one embodiment, the e-discovery client application 110 receives header or digest information regarding the content items 112 in the content sources 114 that match the query, and the e-discovery client application generates the query statistics 222 from this information.
  • At operation 910, the e-discovery client application 110 aggregates the query statistics 222 regarding the various content sources 114 received from the content servers 116 and presents the aggregated statistics to the user 104. The e-discovery client application 110 may present query statistics 222 broken out by each content source 114 included in the scope of the query, as shown in the source query statistic section 220 of the UI 200 described above in regard to FIG. 2. The query statistics 222 may further include sub-totals and totals of the count and/or size of the located content items 112, a percentage of items located versus total content items in the content sources 114, and the like. The e-discovery client application 110 may further present query statistics 222 broken out by various phrases of the query, as further shown in the query segmentation statistic section 240 of the UI 200 described above in regard to FIGS. 2 and 4.
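  • Operations 906 through 910 can be sketched as a fan-out to each content server followed by aggregation of the returned statistics. The `SearchInterface` protocol and the in-memory `FakeServer` below are stand-ins invented for this example; they do not represent the search interface of any real content server, and matching is reduced to a case-insensitive substring test.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple


@dataclass
class Stats:
    source_id: str
    count: int
    size: int


class SearchInterface(Protocol):
    """Hypothetical common shape for a per-server search call."""
    def search(self, query: str, sources: List[str]) -> List[Stats]: ...


class FakeServer:
    """In-memory stand-in: an item matches if it contains the query text."""
    def __init__(self, items: Dict[str, List[str]]):
        self.items = items

    def search(self, query: str, sources: List[str]) -> List[Stats]:
        out = []
        for source in sources:
            hits = [text for text in self.items.get(source, [])
                    if query.lower() in text.lower()]
            out.append(Stats(source, len(hits),
                             sum(len(h.encode("utf-8")) for h in hits)))
        return out


def run_query(query: str, scope: Dict[str, List[str]],
              servers: Dict[str, SearchInterface]) -> Tuple[List[Stats], Dict[str, int]]:
    """Execute the query on every server in scope, then aggregate."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(servers[name].search, query, sources)
                   for name, sources in scope.items()]
        per_source = [stats for f in futures for stats in f.result()]
    totals = {"count": sum(s.count for s in per_source),
              "size": sum(s.size for s in per_source)}
    return per_source, totals


servers = {
    "mail": FakeServer({"mbx1": ["Merger plan", "Lunch"], "mbx2": ["merger memo"]}),
    "files": FakeServer({"share1": ["budget.xlsx merger tab"]}),
}
scope = {"mail": ["mbx1", "mbx2"], "files": ["share1"]}
per_source, totals = run_query("merger", scope, servers)
```

  • Results are collected in the order the searches were submitted, so the per-source list is stable even though the servers are queried concurrently.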
  • From operation 910, the routine 900 proceeds to operation 912, where the e-discovery client application 110 retrieves data regarding the content items 112 in the various content sources 114 matching the query parameters. As described above in regard to FIG. 5, the e-discovery client application 110 may retrieve header or digest information for a number of matching content items 112 from the identified content servers 116 based on a default or user-selectable sort order, for example. The header or digest information may be retrieved from the content servers 116 through the corresponding search interfaces 118 or through another API specific to the content server type.
  • The routine 900 proceeds from operation 912 to operation 914, where the e-discovery client application 110 presents the retrieved header or digest information to the user 104 as a preview of matching content items 112. For example, the e-discovery client application 110 may present the UI 500 described above in regard to FIGS. 5 and 6 that allows the user to preview matching content items 112 by content source type. In one embodiment, the previewed content items 112 may be de-duplicated at each content server 116 for content sources 114 served by that content server or similar content servers. In another embodiment, the e-discovery client application 110 may perform additional or alternative de-duplication of matching content items 112 across content sources 114 and content servers 116 before presenting the query statistics 222 and/or previewed content items 112 to the user 104.
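  • De-duplication across content sources and content servers could be approached as in the sketch below. The description above does not state how duplicates are recognized; comparing a SHA-256 digest of the item body is an assumption of this example, and the dictionary keys are hypothetical.

```python
import hashlib
from typing import Dict, List


def dedupe(items: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first occurrence of each distinct body, in original order.

    Assumption: two items are duplicates when their bodies are byte-identical.
    """
    seen = set()
    unique = []
    for item in items:
        digest = hashlib.sha256(item["body"].encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(item)
    return unique


# Illustrative data only: the same message found in two mailboxes.
items = [
    {"source": "mbx1", "body": "Q3 merger memo"},
    {"source": "mbx2", "body": "Q3 merger memo"},
    {"source": "share1", "body": "Budget draft"},
]
unique_items = dedupe(items)
```

  • A real deployment would likely need a looser notion of identity, for example a message identifier for email, since the same logical item can differ in headers or metadata between sources.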
  • From operation 914, the routine 900 proceeds to operation 916, where the e-discovery client application 110 may receive a change or refinement to the query. For example, the user 104 may change one or more of the query parameters in the query specification section 206 of the UI 200 or the query scope in the query scope specification panel 302 as described above in regard to FIGS. 2 and 3. The user 104 may additionally or alternatively select or specify one or more filter category/value pairs from the query refinement section 508 of the UI 500 described above in regard to FIG. 7. If a change or refinement to the query is received, the routine 900 returns to operation 906, where the e-discovery client application 110 re-executes the modified query against each content server 116 and collects and presents the query statistics 222 and previewed content items 112 to the user 104, as described above. If no change or refinement to the query is received by the e-discovery client application 110 at operation 916, then the routine 900 ends.
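  • The control flow of the routine 900, executing the query and repeating while refinements are received, reduces to a simple loop. In this sketch the execution and refinement steps are supplied as callables; the function names and the toy corpus are hypothetical, and a refinement of `None` stands for "no change received".

```python
from typing import Callable, List, Optional, TypeVar

R = TypeVar("R")


def run_refinement_loop(query: str,
                        execute: Callable[[str], R],
                        next_refinement: Callable[[str, R], Optional[str]]) -> List[R]:
    """Execute, present, and re-execute until no refinement is received."""
    results: List[R] = []
    current: Optional[str] = query
    while current is not None:
        result = execute(current)          # execute and collect statistics
        results.append(result)             # stands in for presenting them
        current = next_refinement(current, result)  # None ends the routine
    return results


# Toy stand-ins: count documents containing every term of the query.
corpus = ["merger plan", "merger budget", "lunch menu"]


def count_matches(query: str) -> int:
    terms = query.split()
    return sum(all(term in doc for term in terms) for doc in corpus)


scripted = iter(["merger budget", None])  # one refinement, then none
history = run_refinement_loop("merger", count_matches,
                              lambda query, result: next(scripted))
```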
  • FIG. 10 shows an example computer architecture for a computer 1000 capable of executing the software components described herein for locating relevant content items across multiple disparate content sources, in the manner presented above. The computer architecture shown in FIG. 10 illustrates a server computer, a conventional desktop computer, laptop, notebook, tablet, PDA, wireless phone, or other computing device, and may be utilized to execute any aspects of the software components presented herein described as executing on the computer system 102, the user computing device 106, and/or other computing device.
  • The computer architecture shown in FIG. 10 includes one or more central processing units (“CPUs”) 1002. The CPUs 1002 may be standard processors that perform the arithmetic and logical operations necessary for the operation of the computer 1000. The CPUs 1002 perform the necessary operations by transitioning from one discrete, physical state to the next through the manipulation of switching elements that differentiate between and change these states. Switching elements may generally include electronic circuits that maintain one of two binary states, such as flip-flops, and electronic circuits that provide an output state based on the logical combination of the states of one or more other switching elements, such as logic gates. These basic switching elements may be combined to create more complex logic circuits, including registers, adders-subtractors, arithmetic logic units, floating-point units, and other logic elements.
  • The computer architecture further includes a system memory 1008, including a random access memory (“RAM”) 1014 and a read-only memory 1016 (“ROM”), and a system bus 1004 that couples the memory to the CPUs 1002. A basic input/output system containing the basic routines that help to transfer information between elements within the computer 1000, such as during startup, is stored in the ROM 1016. The computer 1000 also includes a mass storage device 1010 for storing an operating system 1018, application programs, and other program modules, which are described in greater detail herein.
  • The mass storage device 1010 is connected to the CPUs 1002 through a mass storage controller (not shown) connected to the bus 1004. The mass storage device 1010 provides non-volatile storage for the computer 1000. The computer 1000 may store information on the mass storage device 1010 by transforming the physical state of the device to reflect the information being stored. The specific transformation of physical state may depend on various factors, in different implementations of this description. Examples of such factors may include, but are not limited to, the technology used to implement the mass storage device, whether the mass storage device is characterized as primary or secondary storage, and the like.
  • For example, the computer 1000 may store information to the mass storage device 1010 by issuing instructions to the mass storage controller to alter the magnetic characteristics of a particular location within a magnetic disk drive, the reflective or refractive characteristics of a particular location in an optical storage device, or the electrical characteristics of a particular capacitor, transistor, or other discrete component in a solid-state storage device. Other transformations of physical media are possible without departing from the scope and spirit of the present description. The computer 1000 may further read information from the mass storage device 1010 by detecting the physical states or characteristics of one or more particular locations within the mass storage device.
  • As mentioned briefly above, a number of program modules and data files may be stored in the mass storage device 1010 and RAM 1014 of the computer 1000, including an operating system 1018 suitable for controlling the operation of a computer. The mass storage device 1010 and RAM 1014 may also store one or more program modules. In particular, the mass storage device 1010 and the RAM 1014 may store the e-discovery client application 110, which was described in detail above in regard to FIG. 1. The mass storage device 1010 and the RAM 1014 may also store other types of program modules or data.
  • In addition to the mass storage device 1010 described above, the computer 1000 may have access to other computer-readable media to store and retrieve information, such as program modules, data structures, or other data. It should be appreciated by those skilled in the art that computer-readable media may be any available media that can be accessed by the computer 1000, including computer-readable storage media and communications media. Communications media includes transitory signals. Computer-readable storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable instructions, data structures, program modules, or other data. For example, computer-readable storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, digital versatile disks (DVD), HD-DVD, BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the computer 1000.
  • The computer-readable storage medium may be encoded with computer-executable instructions that, when loaded into the computer 1000, may transform the computer system from a general-purpose computing system into a special-purpose computer capable of implementing the embodiments described herein. The computer-executable instructions may be encoded on the computer-readable storage medium by altering the electrical, optical, magnetic, or other physical characteristics of particular locations within the media. These computer-executable instructions transform the computer 1000 by specifying how the CPUs 1002 transition between states, as described above. According to one embodiment, the computer 1000 may have access to computer-readable storage media storing computer-executable instructions that, when executed by the computer, perform the routine 900 for locating relevant content items across multiple disparate content sources described above in regard to FIG. 9.
  • According to various embodiments, the computer 1000 may operate in a networked environment using logical connections to remote computing devices and computer systems through one or more networks 108, such as a LAN, a WAN, the Internet, or a network of any topology known in the art. The computer 1000 may connect to the network 1020 through a network interface unit 1006 connected to the bus 1004. It should be appreciated that the network interface unit 1006 may also be utilized to connect to other types of networks and remote computer systems.
  • The computer 1000 may also include an input/output controller 1012 for receiving and processing input from one or more input devices, including a keyboard, a mouse, a touchpad, a touch-sensitive display, an electronic stylus, or other type of input device. Similarly, the input/output controller 1012 may provide output to a display device, such as a computer monitor, a flat-panel display, a digital projector, a printer, a plotter, or other type of output device. It will be appreciated that the computer 1000 may not include all of the components shown in FIG. 10, may include other components that are not explicitly shown in FIG. 10, or may utilize an architecture completely different than that shown in FIG. 10.
  • Based on the foregoing, it should be appreciated that technologies for locating relevant content items across multiple disparate content sources are provided herein. Although the subject matter presented herein has been described in language specific to computer structural features, methodological acts, and computer-readable storage media, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features, acts, or media described herein. Rather, the specific features, acts, and media are disclosed as example forms of implementing the claims.
  • The subject matter described above is provided by way of illustration only and should not be construed as limiting. Various modifications and changes may be made to the subject matter described herein without following the example embodiments and applications illustrated and described, and without departing from the true spirit and scope of the present invention, which is set forth in the following claims.

Claims (20)

What is claimed is:
1. A system for locating content items in a plurality of content sources across different content servers, the system comprising:
one or more processors;
a memory coupled to the one or more processors; and
an e-discovery client application residing in the memory and comprising computer-executable instructions that, when executed by the one or more processors, cause the system to:
present a user interface for defining a query for searching the plurality of content sources,
receive query parameters and a query scope regarding the query, the query scope comprising content sources located on at least two content servers of different types,
receive query statistics from searches executed by each of the at least two content servers based on the query parameters,
aggregate the query statistics from the at least two content servers and present the aggregated query statistics in the user interface, wherein the query statistics are shown regarding each of the plurality of content sources,
retrieve data regarding content items matching the query from the at least two content servers, and
present a preview of the content items matching the query in the user interface from the retrieved data.
2. The system of claim 1, wherein the query statistics are presented regarding each of a plurality of query phrases segmented from a free-text query comprising the query parameters.
3. The system of claim 1, wherein the e-discovery client application comprises further computer-executable instructions that cause the system to:
present a filter category along with one or more values for the filter category in the user interface based on the retrieved data;
receive a selection of one of the one or more values for the filter category;
modify the query parameters for a corresponding content server to include a filter category/value pair based on the selection; and
re-execute the search of the corresponding content server based on the modified query parameters.
4. The system of claim 3, wherein the filter category is specific to a type of content items being previewed in the user interface.
5. The system of claim 1, wherein a first of the at least two content servers comprises an email server and a second of the at least two content servers comprises a content site server.
6. A computer-implemented method for locating content items in a plurality of content sources across different content servers, the method comprising:
receiving from a user query parameters and a query scope regarding a query, the query scope comprising content sources located on at least two content servers of different types;
receiving data regarding content items located through native searches executed on each of the at least two content servers based on the query parameters;
aggregating query statistics across the at least two content servers from the received data; and
presenting the query statistics to the user.
7. The computer-implemented method of claim 6, wherein query statistics are presented regarding each of the plurality of content sources comprising the query scope.
8. The computer-implemented method of claim 7, wherein query statistics regarding each of the plurality of content sources are grouped together by content source type.
9. The computer-implemented method of claim 6, wherein query statistics are presented regarding each of a plurality of query phrases segmented from a free-text query comprising the query parameters.
10. The computer-implemented method of claim 9, wherein the plurality of query phrases are segmented from the free-text query at each explicit or implied OR junction.
11. The computer-implemented method of claim 6, further comprising:
presenting a preview of one or more content items matching the query to the user from the received data.
12. The computer-implemented method of claim 11, wherein the preview of the one or more content items is presented for content items from content sources of a same type.
13. The computer-implemented method of claim 6, further comprising:
receiving from the user a modification of the query parameters;
receiving data regarding the content items located through the native searches re-executed on each of the at least two content servers based on the modified query parameters; and
upon receiving the data, updating the query statistics presented to the user.
14. The computer-implemented method of claim 6, further comprising:
presenting a filter category along with one or more values for the filter category to the user based on the received data;
receiving a selection of one of the one or more values for the filter category from the user;
modifying the query parameters for a corresponding content server to include a filter category/value pair based on the selection;
receiving data regarding the content items located through the native search of the corresponding content server based on the modified query parameters; and
upon receiving the data, updating the query statistics presented to the user.
15. The computer-implemented method of claim 6, wherein a first of the at least two content servers comprises an email server and a second of the at least two content servers comprises a content site server.
16. A computer-readable storage medium encoded with computer-executable instructions that, when executed by a computer, cause the computer to:
present a user interface for defining a query for searching a plurality of content sources located on at least two content servers of different types;
receive query parameters defining the query, the query parameters comprising a free-text query;
execute a native search of each of the at least two content servers based on the query parameters;
receive data regarding content items matching the query parameters from the at least two content servers; and
present a preview of one or more content items matching the query from the received data.
17. The computer-readable storage medium of claim 16, wherein the preview of the one or more content items is presented for content items from content sources of a same type.
18. The computer-readable storage medium of claim 16, encoded with further computer-executable instructions that cause the computer to:
aggregate query statistics across the at least two content servers from the received data;
present the query statistics regarding each of the plurality of content sources; and
present the query statistics regarding each of a plurality of query phrases segmented from the free-text query.
19. The computer-readable storage medium of claim 18, wherein the plurality of query phrases are segmented from the free-text query at each explicit or implied OR junction between query phrases.
20. The computer-readable storage medium of claim 16, wherein a first of the at least two content servers comprises an email server and a second of the at least two content servers comprises a file server.
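
By way of illustration only, and not as a definition or limitation of the claims, the query-phrase segmentation recited in claims 10 and 19 and the aggregation of query statistics recited in claims 1, 6, and 18 might be sketched in Python as follows. Every name used here (`SourceStats`, `segment_query`, `aggregate_statistics`, the `item_count` field, and the sample server-type labels) is hypothetical and does not come from the specification. The sketch splits only at explicit `OR` keywords that fall outside double quotes; what constitutes an implied OR junction depends on the query syntax in use and is not modeled.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SourceStats:
    """Hypothetical per-source result of a native search on one content server."""
    server_type: str  # assumed label, e.g. "email" or "content_site"
    source: str       # assumed label, e.g. a mailbox or site name
    item_count: int   # number of content items matching the query


def segment_query(free_text: str) -> List[str]:
    """Split a free-text query into phrases at explicit OR junctions.

    Double-quoted strings are kept as single tokens, so an OR inside
    quotes does not split the phrase.
    """
    tokens = re.findall(r'"[^"]*"|\S+', free_text)
    phrases: List[str] = []
    current: List[str] = []
    for token in tokens:
        if token == "OR":
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        phrases.append(" ".join(current))
    return phrases


def aggregate_statistics(stats: List[SourceStats]) -> Dict[str, object]:
    """Sum item counts overall and grouped by content server type."""
    by_type: Dict[str, int] = defaultdict(int)
    total = 0
    for entry in stats:
        by_type[entry.server_type] += entry.item_count
        total += entry.item_count
    return {"total": total, "by_type": dict(by_type)}
```

In such a sketch, each phrase returned by `segment_query` could be submitted separately to each content server so that statistics can be reported per phrase. Counts for different phrases should not simply be summed, because a single content item may match more than one phrase.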

Priority Applications (5)

Application Number Priority Date Filing Date Title
US13/295,108 US9817898B2 (en) 2011-11-14 2011-11-14 Locating relevant content items across multiple disparate content sources
PCT/US2012/064252 WO2013074378A2 (en) 2011-11-14 2012-11-09 Locating relevant content items across multiple disparate content sources
EP12848846.7A EP2780838B1 (en) 2011-11-14 2012-11-09 Locating relevant content items across multiple disparate content sources
CN2012104523061A CN102999574A (en) 2011-11-14 2012-11-13 Positioning of relative content item via crossing plural different content sources
US15/201,124 US9996618B2 (en) 2011-11-14 2016-07-01 Locating relevant content items across multiple disparate content sources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/295,108 US9817898B2 (en) 2011-11-14 2011-11-14 Locating relevant content items across multiple disparate content sources

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/201,124 Continuation US9996618B2 (en) 2011-11-14 2016-07-01 Locating relevant content items across multiple disparate content sources

Publications (2)

Publication Number Publication Date
US20130124552A1 US20130124552A1 (en) 2013-05-16
US9817898B2 US9817898B2 (en) 2017-11-14

Family

ID=47928142

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/295,108 Active 2034-10-08 US9817898B2 (en) 2011-11-14 2011-11-14 Locating relevant content items across multiple disparate content sources
US15/201,124 Active 2031-11-24 US9996618B2 (en) 2011-11-14 2016-07-01 Locating relevant content items across multiple disparate content sources

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/201,124 Active 2031-11-24 US9996618B2 (en) 2011-11-14 2016-07-01 Locating relevant content items across multiple disparate content sources

Country Status (4)

Country Link
US (2) US9817898B2 (en)
EP (1) EP2780838B1 (en)
CN (1) CN102999574A (en)
WO (1) WO2013074378A2 (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140258377A1 (en) * 2013-03-05 2014-09-11 Fuji Xerox Co., Ltd. Relay apparatus, system, and computer-readable medium
US20140258468A1 (en) * 2013-03-05 2014-09-11 Fuji Xerox Co., Ltd. Relay apparatus, client apparatus, and computer-readable medium
US20160026718A1 (en) * 2014-07-28 2016-01-28 Facebook, Inc. Optimization of Query Execution
US9336332B2 (en) * 2013-08-28 2016-05-10 Clipcard Inc. Programmatic data discovery platforms for computing applications
US20170032039A1 (en) * 2011-11-14 2017-02-02 Microsoft Technology Licensing, Llc Locating relevant content items across multiple disparate content sources
US9846718B1 (en) * 2014-03-31 2017-12-19 EMC IP Holding Company LLC Deduplicating sets of data blocks
US20180196807A1 (en) * 2013-06-13 2018-07-12 John F. Groom Alternative search methodology
US20190332601A1 (en) * 2016-08-09 2019-10-31 Ripcord Inc. Systems and methods for contextual retrieval and contextual display of records
US10902066B2 (en) 2018-07-23 2021-01-26 Open Text Holdings, Inc. Electronic discovery using predictive filtering
US11023828B2 (en) 2010-05-25 2021-06-01 Open Text Holdings, Inc. Systems and methods for predictive coding
CN113779374A (en) * 2021-02-24 2021-12-10 北京京东振世信息技术有限公司 Page query management method and device
US11232068B2 (en) 2017-03-27 2022-01-25 Microsoft Technology Licensing, Llc Unified document retention management system
US11250137B2 (en) * 2017-04-04 2022-02-15 Kenna Security Llc Vulnerability assessment based on machine inference
US11294925B2 (en) * 2018-09-24 2022-04-05 Jpmorgan Chase Bank, N.A. Methods for implementing and using a database actuator
US11416345B2 (en) * 2015-06-15 2022-08-16 Open Text Sa Ulc Systems and methods for content server make disk image operation

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9219776B2 (en) * 2013-06-24 2015-12-22 Microsoft Technology Licensing, Llc Aggregating content from different content sources at a cloud service
US9485543B2 (en) 2013-11-12 2016-11-01 Google Inc. Methods, systems, and media for presenting suggestions of media content
US9552395B2 (en) * 2013-11-13 2017-01-24 Google Inc. Methods, systems, and media for presenting recommended media content items
US10430454B2 (en) * 2014-12-23 2019-10-01 Veritas Technologies Llc Systems and methods for culling search results in electronic discovery
US10096074B2 (en) 2014-12-23 2018-10-09 Veritas Technologies Llc Systems and methods for expanding relevant search results in electronic discovery
US10482096B2 (en) * 2017-02-13 2019-11-19 Microsoft Technology Licensing, Llc Distributed index searching in computing systems
US10409634B2 (en) * 2017-04-19 2019-09-10 Microsoft Technology Licensing, Llc Surfacing task-related applications in a heterogeneous tab environment
CN107196919B (en) * 2017-04-27 2021-01-01 北京小米移动软件有限公司 Data matching method and device
US11366814B2 (en) 2019-06-12 2022-06-21 Elsevier, Inc. Systems and methods for federated search with dynamic selection and distributed relevance

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090150168A1 (en) * 2007-12-07 2009-06-11 Sap Ag Litigation document management
US20100017366A1 (en) * 2008-07-18 2010-01-21 Robertson Steven L System and Method for Performing Contextual Searches Across Content Sources
US20100293178A1 (en) * 2009-05-14 2010-11-18 Microsoft Corporation Providing tools for navigational search query results
US20110082848A1 (en) * 2009-10-05 2011-04-07 Lev Goldentouch Systems, methods and computer program products for search results management

Family Cites Families (71)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5826260A (en) 1995-12-11 1998-10-20 International Business Machines Corporation Information retrieval system and method for displaying and ordering information based on query element contribution
US6122666A (en) 1998-02-23 2000-09-19 International Business Machines Corporation Method for collaborative transformation and caching of web objects in a proxy network
US6643694B1 (en) 2000-02-09 2003-11-04 Michael A. Chernin System and method for integrating a proxy server, an e-mail server, and a DHCP server, with a graphic interface
US6738760B1 (en) 2000-03-23 2004-05-18 Albert Krachman Method and system for providing electronic discovery on computer databases and archives using artificial intelligence to recover legally relevant data
AU6108901A (en) 2000-04-27 2001-11-07 Webfeat Inc Method and system for retrieving search results from multiple disparate databases
US7451136B2 (en) 2000-10-11 2008-11-11 Microsoft Corporation System and method for searching multiple disparate search engines
US7043489B1 (en) 2001-02-23 2006-05-09 Kelley Hubert C Litigation-related document repository
TW586069B (en) 2001-03-01 2004-05-01 Ibm A method and a bridge for coupling a server and a client of different object types
US20020129145A1 (en) 2001-03-06 2002-09-12 Accelerate Software Inc. Method and system for real-time querying, retrieval and integration of data from database over a computer network
US6745197B2 (en) 2001-03-19 2004-06-01 Preston Gates Ellis Llp System and method for efficiently processing messages stored in multiple message stores
US6636857B2 (en) 2001-12-18 2003-10-21 Bluecurrent, Inc. Method and system for web-based asset management
US20030131241A1 (en) 2002-01-04 2003-07-10 Gladney Henry M. Trustworthy digital document interchange and preservation
US20030130953A1 (en) 2002-01-09 2003-07-10 Innerpresence Networks, Inc. Systems and methods for monitoring the presence of assets within a system and enforcing policies governing assets
ITMO20020006A1 (en) 2002-01-10 2003-07-10 Dream Team Srl METHOD AND SYSTEM FOR USER IDENTIFICATION AND AUTHENTICATION OF DIGITAL DOCUMENTS ON TELEMATIC NETWORKS
WO2003079191A1 (en) 2002-03-11 2003-09-25 Visionshare, Inc. Method and system for peer-to-peer secure communication
US20040167979A1 (en) 2003-02-20 2004-08-26 International Business Machines Corporation Automatic configuration of metric components in a service level management system
US7162473B2 (en) 2003-06-26 2007-01-09 Microsoft Corporation Method and system for usage analyzer that determines user accessed sources, indexes data subsets, and associated metadata, processing implicit queries based on potential interest to users
EP1494394A1 (en) 2003-06-30 2005-01-05 Sony International (Europe) GmbH Distance-aware service mechanism for determining the availability of remote services in wireless personal area networks
US7523220B2 (en) 2003-09-17 2009-04-21 Microsoft Corporation Metaspace: communication middleware for partially connected mobile ad hoc networks
US20050149496A1 (en) 2003-12-22 2005-07-07 Verity, Inc. System and method for dynamic context-sensitive federated search of multiple information repositories
US7437353B2 (en) 2003-12-31 2008-10-14 Google Inc. Systems and methods for unification of search results
US7376644B2 (en) 2004-02-02 2008-05-20 Ram Consulting Inc. Knowledge portal for accessing, analyzing and standardizing data
US20060048216A1 (en) 2004-07-21 2006-03-02 International Business Machines Corporation Method and system for enabling federated user lifecycle management
US7734606B2 (en) 2004-09-15 2010-06-08 Graematter, Inc. System and method for regulatory intelligence
US20080077570A1 (en) 2004-10-25 2008-03-27 Infovell, Inc. Full Text Query and Search Systems and Method of Use
US7984057B2 (en) 2005-05-10 2011-07-19 Microsoft Corporation Query composition incorporating by reference a query definition
US7984039B2 (en) 2005-07-14 2011-07-19 International Business Machines Corporation Merging of results in distributed information retrieval
US20070050431A1 (en) 2005-08-26 2007-03-01 Microsoft Corporation Deploying content between networks
US20070073638A1 (en) 2005-09-26 2007-03-29 Bea Systems, Inc. System and method for using soft links to managed content
EP1934840A4 (en) 2005-10-06 2010-12-15 Guidance Software Inc Electronic discovery system and method
US7752204B2 (en) 2005-11-18 2010-07-06 The Boeing Company Query-based text summarization
US20070118529A1 (en) 2005-11-18 2007-05-24 Howell James A Jr Content download experience
US8386469B2 (en) 2006-02-16 2013-02-26 Mobile Content Networks, Inc. Method and system for determining relevant sources, querying and merging results from multiple content sources
US8214394B2 (en) 2006-03-01 2012-07-03 Oracle International Corporation Propagating user identities in a secure federated search system
WO2007148289A2 (en) 2006-06-23 2007-12-27 Koninklijke Philips Electronics N.V. Representing digital content metadata
DE602006014831D1 (en) 2006-09-13 2010-07-22 Alcatel Lucent Concatenation of Web Services
US7792789B2 (en) 2006-10-17 2010-09-07 Commvault Systems, Inc. Method and system for collaborative searching
JP4940898B2 (en) 2006-11-02 2012-05-30 富士通株式会社 Digital content search program, digital content search device, and digital content search method
US7866543B2 (en) 2006-11-21 2011-01-11 International Business Machines Corporation Security and privacy enforcement for discovery services in a network of electronic product code information repositories
NZ578672A (en) 2006-12-29 2012-08-31 Thomson Reuters Glo Resources Information-retrieval systems, methods, and software with concept-based searching and ranking
US7962610B2 (en) 2007-03-07 2011-06-14 International Business Machines Corporation Statistical data inspector
US20080228698A1 (en) 2007-03-16 2008-09-18 Expanse Networks, Inc. Creation of Attribute Combination Databases
US20080288509A1 (en) 2007-05-16 2008-11-20 Google Inc. Duplicate content search
US7904470B2 (en) 2007-06-13 2011-03-08 Sap Ag Discovery service for electronic data environment
US20110047189A1 (en) 2007-10-01 2011-02-24 Microsoft Corporation Integrated Genomic System
US8396838B2 (en) 2007-10-17 2013-03-12 Commvault Systems, Inc. Legal compliance, electronic discovery and electronic document handling of online and offline copies of data
US8145684B2 (en) 2007-11-28 2012-03-27 International Business Machines Corporation System and computer program product for assembly of personalized enterprise information integrators over conjunctive queries
US8276152B2 (en) 2007-12-05 2012-09-25 Microsoft Corporation Validation of the change orders to an I T environment
US20090150906A1 (en) 2007-12-07 2009-06-11 Sap Ag Automatic electronic discovery of heterogeneous objects for litigation
CN101187888A (en) 2007-12-11 2008-05-28 浪潮电子信息产业股份有限公司 Method for coping database data in heterogeneous environment
US8572043B2 (en) 2007-12-20 2013-10-29 International Business Machines Corporation Method and system for storage of unstructured data for electronic discovery in external data stores
US9411861B2 (en) 2007-12-21 2016-08-09 International Business Machines Corporation Multiple result sets generated from single pass through a dataspace
US8140494B2 (en) 2008-01-21 2012-03-20 International Business Machines Corporation Providing collection transparency information to an end user to achieve a guaranteed quality document search and production in electronic data discovery
US8055665B2 (en) 2008-03-13 2011-11-08 International Business Machines Corporation Sorted search in a distributed directory environment using a proxy server
TWI476610B (en) 2008-04-29 2015-03-11 Maxiscale Inc Peer-to-peer redundant file server system and methods
US7930306B2 (en) 2008-04-30 2011-04-19 Msc Intellectual Properties B.V. System and method for near and exact de-duplication of documents
US20100235354A1 (en) 2009-03-12 2010-09-16 International Business Machines Corporation Collaborative search engine system
CN101576977A (en) 2009-06-01 2009-11-11 中国政法大学 Evidence management system
US8200642B2 (en) 2009-06-23 2012-06-12 Maze Gary R System and method for managing electronic documents in a litigation context
RU2420800C2 (en) 2009-06-30 2011-06-10 Государственное образовательное учреждение высшего профессионального образования Академия Федеральной службы охраны Российской Федерации (Академия ФСО России) Method of searching for electronic documents similar on semantic content, stored on data storage devices
US20100333116A1 (en) 2009-06-30 2010-12-30 Anand Prahlad Cloud gateway system for managing data storage to cloud storage sites
US20110047166A1 (en) 2009-08-20 2011-02-24 Innography, Inc. System and methods of relating trademarks and patent documents
CN101789021A (en) 2010-02-24 2010-07-28 浪潮通信信息系统有限公司 Universal configurable database data migration method
WO2011109558A1 (en) 2010-03-02 2011-09-09 Renew Data Corp. System and method for creating a de-duplicated data set and preserving its metadata
US9361350B2 (en) 2010-03-26 2016-06-07 Salesforce.Com, Inc. Data transfer between first and second databases
US8346780B2 (en) 2010-04-16 2013-01-01 Hitachi, Ltd. Integrated search server and integrated search method
CN101819592A (en) 2010-04-19 2010-09-01 山东高效能服务器和存储研究院 Universal mass historical data processing method for crossing operating system
US20110320494A1 (en) 2010-06-28 2011-12-29 Martin Fisher Litigation document management linking unstructured documents with business objects
KR101064981B1 (en) 2010-10-07 2011-09-15 한국과학기술정보연구원 Apparatus and method for providing resource search information marked the relationship between research subject using of knowledge base combined multiple resource
US8515962B2 (en) 2011-03-30 2013-08-20 Sap Ag Phased importing of objects
US9817898B2 (en) * 2011-11-14 2017-11-14 Microsoft Technology Licensing, Llc Locating relevant content items across multiple disparate content sources

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11023828B2 (en) 2010-05-25 2021-06-01 Open Text Holdings, Inc. Systems and methods for predictive coding
US11282000B2 (en) 2010-05-25 2022-03-22 Open Text Holdings, Inc. Systems and methods for predictive coding
US20170032039A1 (en) * 2011-11-14 2017-02-02 Microsoft Technology Licensing, Llc Locating relevant content items across multiple disparate content sources
US9996618B2 (en) * 2011-11-14 2018-06-12 Microsoft Technology Licensing, Llc Locating relevant content items across multiple disparate content sources
US20180219939A1 (en) * 2013-03-05 2018-08-02 Fuji Xerox Co., Ltd. Relay apparatus, client apparatus, and computer-readable medium
US9647870B2 (en) * 2013-03-05 2017-05-09 Fuji Xerox Co., Ltd. Relay apparatus, system, and computer-readable medium
US20140258377A1 (en) * 2013-03-05 2014-09-11 Fuji Xerox Co., Ltd. Relay apparatus, system, and computer-readable medium
US20140258468A1 (en) * 2013-03-05 2014-09-11 Fuji Xerox Co., Ltd. Relay apparatus, client apparatus, and computer-readable medium
US10958715B2 (en) * 2013-03-05 2021-03-23 Fuji Xerox Co., Ltd. Relay apparatus, client apparatus, and computer-readable medium
US10574738B2 (en) * 2013-03-05 2020-02-25 Fuji Xerox Co., Ltd. Relay apparatus, client apparatus, and computer-readable medium
US20180196807A1 (en) * 2013-06-13 2018-07-12 John F. Groom Alternative search methodology
US10949459B2 (en) * 2013-06-13 2021-03-16 John F. Groom Alternative search methodology
US9336332B2 (en) * 2013-08-28 2016-05-10 Clipcard Inc. Programmatic data discovery platforms for computing applications
US9846718B1 (en) * 2014-03-31 2017-12-19 EMC IP Holding Company LLC Deduplicating sets of data blocks
US10229208B2 (en) * 2014-07-28 2019-03-12 Facebook, Inc. Optimization of query execution
US20160026718A1 (en) * 2014-07-28 2016-01-28 Facebook, Inc. Optimization of Query Execution
US11416345B2 (en) * 2015-06-15 2022-08-16 Open Text Sa Ulc Systems and methods for content server make disk image operation
US20190332601A1 (en) * 2016-08-09 2019-10-31 Ripcord Inc. Systems and methods for contextual retrieval and contextual display of records
US11030199B2 (en) * 2016-08-09 2021-06-08 Ripcord Inc. Systems and methods for contextual retrieval and contextual display of records
US11232068B2 (en) 2017-03-27 2022-01-25 Microsoft Technology Licensing, Llc Unified document retention management system
US11250137B2 (en) * 2017-04-04 2022-02-15 Kenna Security Llc Vulnerability assessment based on machine inference
US10902066B2 (en) 2018-07-23 2021-01-26 Open Text Holdings, Inc. Electronic discovery using predictive filtering
US11294925B2 (en) * 2018-09-24 2022-04-05 Jpmorgan Chase Bank, N.A. Methods for implementing and using a database actuator
CN113779374A (en) * 2021-02-24 2021-12-10 北京京东振世信息技术有限公司 Page query management method and device

Also Published As

Publication number Publication date
WO2013074378A2 (en) 2013-05-23
EP2780838A4 (en) 2015-10-14
US20170032039A1 (en) 2017-02-02
US9996618B2 (en) 2018-06-12
EP2780838A2 (en) 2014-09-24
US9817898B2 (en) 2017-11-14
WO2013074378A3 (en) 2013-07-18
EP2780838B1 (en) 2019-08-07
CN102999574A (en) 2013-03-27

Similar Documents

Publication Publication Date Title
US9996618B2 (en) Locating relevant content items across multiple disparate content sources
US20130124562A1 (en) Export of content items from multiple, disparate content sources
US8645349B2 (en) Indexing structures using synthetic document summaries
US10311062B2 (en) Filtering structured data using inexact, culture-dependent terms
US9690875B2 (en) Providing search results for mobile computing devices
EP2215568B1 (en) Presenting and navigating content having varying properties
US20160283607A1 (en) Search engine
US10430448B2 (en) Computer-implemented method of and system for searching an inverted index having a plurality of posting lists
US20110072036A1 (en) Page-based content storage system
US8478756B2 (en) Contextual document attribute values
AU2014318151B2 (en) Smart search refinement
US7774345B2 (en) Lightweight list collection
WO2012054309A1 (en) Framework for custom actions on an information feed
US9552378B2 (en) Method and apparatus for saving search query as metadata with an image
US20170371978A1 (en) Method and apparatus for managing a document index
US8156144B2 (en) Metadata search interface
US20050246387A1 (en) Method and apparatus for managing and manipulating digital files at the file component level
US20210209105A1 (en) Query adaptation for a search service in a content management system
US10565250B2 (en) Identifying and displaying related content
US20130297576A1 (en) Efficient in-place preservation of content across content sources
Berhe et al. Digital libraries with J‐ISIS: a preliminary account of possibilities and performance

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:STEVENSON, BRADLEY;HARMETZ, ADAM DAVID;CHRISTENSEN, QUENTIN GARY;AND OTHERS;SIGNING DATES FROM 20111104 TO 20111110;REEL/FRAME:027218/0575

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034544/0001

Effective date: 20141014

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4