WO2012011940A1 - Generating predictive models on supplemental workbook data - Google Patents

Generating predictive models on supplemental workbook data

Info

Publication number
WO2012011940A1
Authority
WO
WIPO (PCT)
Prior art keywords
computer
processor
data
executable instructions
predictive model
Application number
PCT/US2011/001250
Other languages
French (fr)
Inventor
James C. Maclennan
Ioan Bogdan Crivat
Original Assignee
Predixion Software, Inc.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/177 Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18 Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets

Definitions

  • the present disclosure generally relates to generating predictive models and related data mining operations.
  • a method and system that fill a usability and collaboration void that exists in current product offerings from Microsoft, SAS, SPSS and other vendors are disclosed.
  • the disclosed embodiments will allow a greater number of users to create successful predictive analysis projects by providing various tools to create simple and accurate predictive models without requiring extensive training or specific knowledge of the methodologies that are currently required.
  • One embodiment is directed to a computer-implemented method of generating a predictive model.
  • a computer is used to provide a spreadsheet environment comprising data. Supplemental data is defined and stored in a non-worksheet format in the spreadsheet environment. A predictive analytic is performed on the supplemental data. A scalable predictive model is generated in the non-worksheet format.
  • This method may be implemented in a computer-readable storage medium or in a computer system.
  • Figure 1 is a block diagram illustrating a computer system that can be programmed to implement various embodiments.
  • Figure 2 is a block diagram illustrating an example system for generating a predictive model on supplemental workbook data according to one embodiment.
  • Figure 3 is a flow diagram illustrating an example process for generating a predictive model on supplemental workbook data according to another embodiment.
  • the disclosed subject matter contains multiple components that work together to provide a complete predictive analytics and other product offerings that excel in usability, deployability, collaboration and applicability.
  • the disclosed subject matter is designed to address a previously ignored market by allowing a greater number of less technical business users to apply predictive technologies in their business analysis and decision making processes. Tools are provided that can create simple and accurate predictive models without requiring extensive training or specific knowledge of the methodologies currently required to create successful predictive projects.
  • the disclosed subject matter which may be implemented as an add-on to a spreadsheet environment, such as Microsoft's EXCEL® spreadsheet environment, provides scalable user experiences such that business analysts without specific training can create and consume predictive models while at the same time allowing power users the ability to exercise fine-grained control on all modeling aspects.
  • the methods and systems are schedulable and repeatable so that results update over time indicating changes in the trends underlying the data.
  • Results are delivered through web-friendly technologies integrated with Microsoft's SHAREPOINT® business collaboration platform, thereby allowing higher level collaboration and integration.
  • the disclosed subject matter handles data differently and is more cost effective and efficient than typical predictive analytics and data mining software since the method incorporates an understanding of the business meaning behind the data, allowing users to provide their data in business terms and produce results back in those same terms.
  • the disclosed subject matter eliminates the requirement that every user understand the requirements of predictive algorithms. Instead, it translates the predictive process to their needs.
  • Figure 1 is a block diagram illustrating a computer system 100 that can be programmed to implement various embodiments described herein.
  • the computer system 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the subject matter described herein.
  • the computer system 100 should not be construed as having any dependency or requirement relating to any one component or combination of components shown in Figure 1.
  • the computer system 100 includes a general computing device, such as a computer 102.
  • Components of the computer 102 may include, without limitation, a processing unit 104, a system memory 106, and a system bus 108 that communicates data between the system memory 106, the processing unit 104, and other components of the computer 102.
  • the system bus 108 may incorporate any of a variety of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
  • These architectures include, without limitation, Industry Standard Architecture (ISA) bus, Enhanced ISA (EISA) bus, Micro Channel Architecture (MCA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus, also known as Mezzanine bus.
  • the computer 102 also is typically configured to operate with one or more types of processor readable media or computer readable media, collectively referred to herein as "processor readable media."
  • Processor readable media includes any available media that can be accessed by the computer 102 and includes both volatile and non-volatile media, and removable and non-removable media.
  • processor readable media may include storage media and communication media.
  • Storage media includes both volatile and non-volatile, and removable and non-removable media implemented in any method or technology for storage of information such as processor-readable instructions, data structures, program modules, or other data.
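  • Storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVDs) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the computer 102.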
  • Communication media typically embodies processor-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
  • modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above are also intended to be included within the scope of processor readable media.
  • the system memory 106 includes computer storage media in the form of volatile memory, non-volatile memory, or both, such as read only memory (ROM) 110 and random access memory (RAM) 112.
  • a basic input/output system (BIOS) 114 contains the basic routines that facilitate the transfer of information between components of the computer 102, for example, during start-up.
  • the BIOS 114 is typically stored in ROM 110.
  • RAM 112 typically includes data, such as program modules, that are immediately accessible to or presently operated on by the processing unit 104.
  • Figure 1 depicts an operating system 116, application programs 118, other program modules 120, and program data 122 as being stored in RAM 112.
  • the computer 102 may also include other removable or non-removable, volatile or non-volatile computer storage media.
  • Figure 1 illustrates a hard disk drive 124 that communicates with the system bus 108 via a non-removable memory interface 126 and that reads from or writes to a non-removable, non-volatile magnetic medium, a magnetic disk drive 128 that communicates with the system bus 108 via a removable memory interface 130 and that reads from or writes to a removable, non-volatile magnetic disk 132, and an optical disk drive 134 that communicates with the system bus 108 via the interface 130 and that reads from or writes to a removable, non-volatile optical disk 136, such as a CD-RW, a DVD-RW, or another optical medium.
  • Other computer storage media that can be used in connection with the computer system 100 include, but are not limited to, flash memory, solid state RAM, solid state ROM, magnetic tape cassettes, digital video tape, etc.
  • the devices and their associated computer storage media disclosed above and illustrated in Figure 1 provide storage of computer readable instructions, data structures, program modules, and other data that are used by the computer 102.
  • the hard disk drive 124 is illustrated as storing an operating system 138, application programs 140, other program modules 142, and program data 144. These components can be the same as or different from the operating system 116, the application programs 118, the other program modules 120, and the program data 122 that are stored in the RAM 112. In any event, the components stored by the hard disk drive 124 are different copies from the components stored by the RAM 112.
  • a user may enter commands and information into the computer 102 using input devices, such as a keyboard 146 and a pointing device 148, such as a mouse, trackball, or touch pad.
  • input devices may be connected to the processing unit 104 via a user input interface 150 that is connected to the system bus 108.
  • input devices can be connected to the processing unit 104 via other interface and bus structures, such as a parallel port, a game port, or a universal serial bus (USB).
  • a graphics interface 152 can also be connected to the system bus 108.
  • One or more graphics processing units (GPUs) 154 may communicate with the graphics interface 152.
  • a monitor 156 or other type of display device is also connected to the system bus 108 via an interface, such as a video interface 158, which may in turn communicate with video memory 160.
  • the computer system 100 may also include other peripheral output devices, such as speakers 162 and a printer 164, which may be connected to the computer 102 through an output peripheral interface 166.
  • the computer 102 may operate in a networked or distributed computing environment using logical connections to one or more remote computers, such as a remote computer 168.
  • the remote computer 168 may be a personal computer, a server, a router, a network PC, a peer device, or another common network node, and may include many or all of the components disclosed above relative to the computer 102.
  • the logical connections depicted in Figure 1 include a local area network (LAN) 170 and a wide area network (WAN) 172, but may also include other networks and buses.
  • Such networking environments are common in homes, offices, enterprise-wide computer networks, intranets, and the Internet.
  • When the computer 102 is used in a LAN networking environment, it may be connected to the LAN 170 through a wired or wireless network interface or adapter 174. When used in a WAN networking environment, the computer 102 may include a modem 176 or other means for establishing communications over the WAN 172, such as the Internet.
  • the modem 176 may be internal or external to the computer 102 and may be connected to the system bus 108 via the user input interface 150 or another appropriate component.
  • the modem 176 may be a cable or other broadband modem, a dial-up modem, a wireless modem, or any other suitable communication device.
  • program modules depicted as being stored in the computer 102 may be stored in a remote memory storage device associated with the remote computer 168.
  • remote application programs may be stored in such a remote memory storage device. It will be appreciated that the network connections shown in Figure 1 are exemplary and that other means of establishing a communication link between the computer 102 and the remote computer 168 may be used.
  • a method, system and apparatus are provided for performing predictive analytics on data and metadata that are not currently stored in worksheets, such as worksheets in the Microsoft EXCEL® spreadsheet environment.
  • worksheets typically store data as a series of objects made up of cells indexed by rows and columns that can be manipulated through a graphical user interface.
  • some data and metadata, known as supplemental data, are stored in arbitrary non-worksheet objects contained within a workbook file or object.
  • the disclosed embodiments also incorporate a method and system for performing predictive analytics on data and metadata stored in the workbook or extracted from the workbook into intermediate forms for the purpose of predictive analytics and data mining.
  • Predictive analytics includes methods used to examine datasets for patterns that can be applied to new data.
  • predictive analytic techniques include, but are not limited to, decision tree, neural networks, regression, association rules, clustering and segmentation, time series analysis, text mining, support vector machines, K nearest neighbors and others.
  • Predictive analytic methods include processing data to learn patterns as well as applying patterns to new data.
  • the disclosed subject matter relates to both applications when they are applied to non-worksheet data stored in a workbook file or objects.
  • Furthermore, the disclosed subject matter relates to a method of storing predictive models and predictive results as supplemental data inside workbook objects and files. This method includes storing the learned patterns as well as predictions made by applying the patterns to new data, independent of where the original data was stored.
  • FIG. 2 is a block diagram illustrating an example system 200 for generating predictive models on supplemental workbook data according to an embodiment of the disclosed subject matter.
  • the system 200 includes a client component 202 that is hosted as an add-in to a spreadsheet environment 204, such as, for example, Microsoft's EXCEL® spreadsheet environment or the OpenOffice.org Calc spreadsheet environment.
  • the client component 202 functions as an easy to use client for a runtime component 206 that generates predictive models.
  • the client component 202 also provides integration with data from a variety of software packages, such as, for example, the Microsoft PowerPivot add-in 208 for the Microsoft EXCEL® spreadsheet environment.
  • the Microsoft PowerPivot add-in 208 serves as an interface to the Microsoft PowerPivot engine 210.
  • the client component 202 serves as a launching point for a third-party analytics client application 212, such as the R statistical computing environment, available from Revolution Analytics of Palo Alto, California.
  • the runtime 206 leverages data access from the Microsoft PowerPivot add-in 208 and/or other software packages and column-based storage structures and embeds custom and intermediate results directly into a file format used by the spreadsheet environment 204.
  • when the spreadsheet environment 204 is embodied as Microsoft's EXCEL® spreadsheet environment, the runtime component 206 may embed custom and intermediate results into an .XLS or .XLSX format. In this way, the runtime component 206 may facilitate low-level collaboration between users by sharing files in the format used by the spreadsheet environment 204. Users who have the client component 202 and the runtime component 206 can perform analyses and interact with the embedded predictive models generated by the runtime component 206. Users who do not have the client component 202 or the runtime component 206 may not be able to perform analyses and interact with the embedded predictive models, but they can still open the workbook to view completed analyses.
  • FIG. 3 is a flow diagram illustrating an example method 300 for generating a predictive model.
  • a computer is used to provide a spreadsheet environment comprising data.
  • the data is stored in a worksheet format in the spreadsheet environment.
  • supplemental data is defined that is, unlike the data stored in the worksheet format, stored in a non-worksheet format.
  • a predictive analytic is performed on the supplemental data at a step 306.
  • the predictive analytic may involve processing the worksheet data to learn or determine a pattern and applying the learned pattern to new data.
  • the predictive analytic may include one or more of a decision tree, a neural network, a regression, an association rule, clustering and segmentation, a time series analysis, text mining, a support vector machine, and determining a nearest neighbor.
  • the learned or determined pattern, along with the results of applying that pattern to new data, can be stored in the non-worksheet format.
  • a scalable predictive model is generated and is stored in the non-worksheet format.
  • the predictive model may be shared with other users at an optional step 310.
  • An analytics client application, such as the R statistical computing environment, may be invoked at an optional step 312.
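
The flow of method 300 described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation disclosed in the application: the `Workbook` class, the part names `training_data` and `model`, the JSON serialization, and the choice of a least-squares line as the predictive analytic are all hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Workbook:
    """Hypothetical stand-in for a workbook: worksheet cells plus supplemental parts."""
    worksheets: dict = field(default_factory=dict)    # sheet name -> list of rows
    supplemental: dict = field(default_factory=dict)  # part name -> serialized text


def fit_line(xs, ys):
    """Learn a pattern: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"type": "linear_regression",
            "slope": slope,
            "intercept": mean_y - slope * mean_x}


def predict(model, x):
    """Apply the learned pattern to a new data point."""
    return model["slope"] * x + model["intercept"]


wb = Workbook()

# Define supplemental data: it lives in a non-worksheet part, not in cells.
wb.supplemental["training_data"] = json.dumps({"x": [1, 2, 3, 4], "y": [3, 5, 7, 9]})

# Perform the predictive analytic on the supplemental data (step 306).
data = json.loads(wb.supplemental["training_data"])
model = fit_line(data["x"], data["y"])
predictions = [predict(model, x) for x in [5, 6]]

# Store the learned pattern and its predictions in the non-worksheet format.
wb.supplemental["model"] = json.dumps({"model": model, "predictions": predictions})
```

Keeping the model and its predictions in `supplemental` rather than in `worksheets` mirrors the idea that results travel with the workbook without occupying any worksheet cells.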

Abstract

A method and system that generate a predictive model on supplemental workbook data are disclosed. A computer is used to provide a spreadsheet environment comprising data. Supplemental data is defined and stored in a non-worksheet format in the spreadsheet environment. A predictive analytic is performed on the supplemental data. A scalable predictive model is generated in the non-worksheet format. In this way, a greater number of users can create successful predictive analysis projects by providing various tools to create simple and accurate predictive models without requiring extensive training or specific knowledge of the methodologies that are currently required.

Description

GENERATING PREDICTIVE MODELS ON SUPPLEMENTAL WORKBOOK DATA
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims the benefit of United States Provisional Patent Application Ser. No. 61/399,866, filed July 19, 2010.
TECHNICAL FIELD
[0002] The present disclosure generally relates to generating predictive models and related data mining operations.
BACKGROUND
[0003] The leading companies in the predictive analytics arena include, but are not limited to, SAS and SPSS (now IBM), both of which have developed proprietary solutions based on older technologies. These solutions are expensive and are sold as enterprise or service licenses. Other larger sized companies in the analytics arena include SAP (BusinessObjects), Microsoft and Oracle (OBIEE/Hyperion).
SUMMARY
[0004] A method and system that fill a usability and collaboration void that exists in current product offerings from Microsoft, SAS, SPSS and other vendors are disclosed. The disclosed embodiments will allow a greater number of users to create successful predictive analysis projects by providing various tools to create simple and accurate predictive models without requiring extensive training or specific knowledge of the methodologies that are currently required.
[0005] One embodiment is directed to a computer-implemented method of generating a predictive model. A computer is used to provide a spreadsheet environment comprising data. Supplemental data is defined and stored in a non-worksheet format in the spreadsheet environment. A predictive analytic is performed on the supplemental data. A scalable predictive model is generated in the non-worksheet format. This method may be implemented in a computer-readable storage medium or in a computer system.
[0006] These and other features, aspects, and advantages of the disclosed subject matter will be apparent to those skilled in the art from the following detailed description of preferred non-limiting exemplary embodiments, taken together with the drawings and the claims that follow.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] It is to be understood that the drawings are to be used for the purposes of exemplary illustration only and not as a definition of the limits of the disclosed subject matter. Throughout the disclosure, the word "exemplary" is used exclusively to mean "serving as an example, instance or illustration." Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
[0008] Figure 1 is a block diagram illustrating a computer system that can be programmed to implement various embodiments.
[0009] Figure 2 is a block diagram illustrating an example system for generating a predictive model on supplemental workbook data according to one embodiment.
[0010] Figure 3 is a flow diagram illustrating an example process for generating a predictive model on supplemental workbook data according to another embodiment.
DETAILED DESCRIPTION OF EMBODIMENTS
[0011] The detailed description set forth below in connection with the appended drawings is intended as a description of presently non-limiting, exemplary, preferred embodiments of the invention and is not intended to represent the only forms in which the present invention may be construed, constructed and/or utilized.
[0012] The disclosed subject matter contains multiple components that work together to provide a complete predictive analytics and other product offerings that excel in usability, deployability, collaboration and applicability. The disclosed subject matter is designed to address a previously ignored market by allowing a greater number of less technical business users to apply predictive technologies in their business analysis and decision making processes. Tools are provided that can create simple and accurate predictive models without requiring extensive training or specific knowledge of the methodologies currently required to create successful predictive projects.
[0013] The disclosed subject matter, which may be implemented as an add-on to a spreadsheet environment, such as Microsoft's EXCEL® spreadsheet environment, provides scalable user experiences such that business analysts without specific training can create and consume predictive models while at the same time allowing power users the ability to exercise fine-grained control on all modeling aspects. The methods and systems are schedulable and repeatable so that results update over time indicating changes in the trends underlying the data. Results are delivered through web-friendly technologies integrated with Microsoft's SHAREPOINT® business collaboration platform, thereby allowing higher level collaboration and integration.
[0014] The disclosed subject matter handles data differently and is more cost effective and efficient than typical predictive analytics and data mining software since the method incorporates an understanding of the business meaning behind the data, allowing users to provide their data in business terms and produce results back in those same terms. The disclosed subject matter eliminates the requirement that every user understand the requirements of predictive algorithms. Instead, it translates the predictive process to their needs.
EXAMPLE OPERATING ENVIRONMENT
[0015] Figure 1 is a block diagram illustrating a computer system 100 that can be programmed to implement various embodiments described herein. The computer system 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the subject matter described herein. The computer system 100 should not be construed as having any dependency or requirement relating to any one component or combination of components shown in Figure 1.
[0016] The computer system 100 includes a general computing device, such as a computer 102. Components of the computer 102 may include, without limitation, a processing unit 104, a system memory 106, and a system bus 108 that communicates data between the system memory 106, the processing unit 104, and other components of the computer 102. The system bus 108 may incorporate any of a variety of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. These architectures include, without limitation, Industry Standard Architecture (ISA) bus, Enhanced ISA (EISA) bus, Micro Channel Architecture (MCA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus, also known as Mezzanine bus.
[0017] The computer 102 also is typically configured to operate with one or more types of processor readable media or computer readable media, collectively referred to herein as "processor readable media." Processor readable media includes any available media that can be accessed by the computer 102 and includes both volatile and non-volatile media, and removable and non-removable media. By way of example, and not limitation, processor readable media may include storage media and communication media. Storage media includes both volatile and non-volatile, and removable and non-removable media implemented in any method or technology for storage of information such as processor-readable instructions, data structures, program modules, or other data. Storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVDs) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the computer 102. Communication media typically embodies processor-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above are also intended to be included within the scope of processor readable media.
[0018] The system memory 106 includes computer storage media in the form of volatile memory, non-volatile memory, or both, such as read only memory (ROM) 110 and random access memory (RAM) 112. A basic input/output system (BIOS) 114 contains the basic routines that facilitate the transfer of information between components of the computer 102, for example, during start-up. The BIOS 114 is typically stored in ROM 110. RAM 112 typically includes data, such as program modules, that are immediately accessible to or presently operated on by the processing unit 104. By way of example, and not limitation, Figure 1 depicts an operating system 116, application programs 118, other program modules 120, and program data 122 as being stored in RAM 112.
[0019] The computer 102 may also include other removable or non-removable, volatile or non-volatile computer storage media. By way of example, and not limitation, Figure 1 illustrates a hard disk drive 124 that communicates with the system bus 108 via a non-removable memory interface 126 and that reads from or writes to a non-removable, non-volatile magnetic medium, a magnetic disk drive 128 that communicates with the system bus 108 via a removable memory interface 130 and that reads from or writes to a removable, non-volatile magnetic disk 132, and an optical disk drive 134 that communicates with the system bus 108 via the interface 130 and that reads from or writes to a removable, non-volatile optical disk 136, such as a CD-RW, a DVD-RW, or another optical medium. Other computer storage media that can be used in connection with the computer system 100 include, but are not limited to, flash memory, solid state RAM, solid state ROM, magnetic tape cassettes, digital video tape, etc.
[0020] The devices and their associated computer storage media disclosed above and illustrated in Figure 1 provide storage of computer readable instructions, data structures, program modules, and other data that are used by the computer 102. In Figure 1, for example, the hard disk drive 124 is illustrated as storing an operating system 138, application programs 140, other program modules 142, and program data 144. These components can be the same as or different from the operating system 116, the application programs 118, the other program modules 120, and the program data 122 that are stored in the RAM 112. In any event, the components stored by the hard disk drive 124 are different copies from the components stored by the RAM 112.
[0021] A user may enter commands and information into the computer 102 using input devices, such as a keyboard 146 and a pointing device 148, such as a mouse, trackball, or touch pad. These and other input devices may be connected to the processing unit 104 via a user input interface 150 that is connected to the system bus 108. Alternatively, input devices can be connected to the processing unit 104 via other interface and bus structures, such as a parallel port, a game port, or a universal serial bus (USB).
[0022] A graphics interface 152 can also be connected to the system bus 108. One or more graphics processing units (GPUs) 154 may communicate with the graphics interface 152. A monitor 156 or other type of display device is also connected to the system bus 108 via an interface, such as a video interface 158, which may in turn communicate with video memory 160. In addition to the monitor 156, the computer system 100 may also include other peripheral output devices, such as speakers 162 and a printer 164, which may be connected to the computer 102 through an output peripheral interface 166.
[0023] The computer 102 may operate in a networked or distributed computing environment using logical connections to one or more remote computers, such as a remote computer 168. The remote computer 168 may be a personal computer, a server, a router, a network PC, a peer device, or another common network node, and may include many or all of the components disclosed above relative to the computer 102. The logical connections depicted in Figure 1 include a local area network (LAN) 170 and a wide area network (WAN) 172, but may also include other networks and buses. Such networking environments are common in homes, offices, enterprise-wide computer networks, intranets, and the Internet.
[0024] When the computer 102 is used in a LAN networking environment, it may be connected to the LAN 170 through a wired or wireless network interface or adapter 174. When used in a WAN networking environment, the computer 102 may include a modem 176 or other means for establishing communications over the WAN 172, such as the Internet. The modem 176 may be internal or external to the computer 102 and may be connected to the system bus 108 via the user input interface 150 or another appropriate component. The modem 176 may be a cable or other broadband modem, a dial-up modem, a wireless modem, or any other suitable communication device. In a networked or distributed computing environment, program modules depicted as being stored in the computer 102 may be stored in a remote memory storage device associated with the remote computer 168. For example, remote application programs may be stored in such a remote memory storage device. It will be appreciated that the network connections shown in Figure 1 are exemplary and that other means of establishing a communication link between the computer 102 and the remote computer 168 may be used.
GENERATING PREDICTIVE MODELS ON SUPPLEMENTAL WORKBOOK DATA
[0025] A method, system and apparatus are provided for performing predictive analytics on data and metadata that are not currently stored in worksheets, such as worksheets in the Microsoft EXCEL® spreadsheet environment. Such worksheets typically store data as a series of objects made up of cells indexed by rows and columns that can be manipulated through a graphical user interface. As contrasted with data that is stored in such typical objects, some data and metadata, known as supplemental data, are stored in arbitrary non-worksheet objects contained within a workbook file or object. The disclosed embodiments also incorporate a method and system for performing predictive analytics on data and metadata stored in the workbook or extracted from the workbook into intermediate forms for the purpose of predictive analytics and data mining.
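The distinction between worksheet data and supplemental data can be illustrated with a minimal sketch. It relies on the fact that an .XLSX file is a ZIP-based package of named parts; everything else is an assumption made for illustration. The part name `customData/supplemental1.json`, the JSON encoding, and the helper functions are hypothetical and are not taken from the disclosure, and the package built here omits the content-type and relationship entries that a real workbook requires, so it is not a file a spreadsheet application would open.

```python
import io
import json
import zipfile

# Hypothetical part names; the disclosure does not specify any layout.
WORKSHEET_PART = "xl/worksheets/sheet1.xml"
SUPPLEMENTAL_PART = "customData/supplemental1.json"


def build_package(sheet_xml, supplemental):
    """Write one worksheet part and one non-worksheet part into a ZIP container."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr(WORKSHEET_PART, sheet_xml)
        package.writestr(SUPPLEMENTAL_PART, json.dumps(supplemental))
    return buffer.getvalue()


def read_supplemental(package_bytes):
    """Read the supplemental object back without touching any worksheet part."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return json.loads(package.read(SUPPLEMENTAL_PART))


def list_non_worksheet_parts(package_bytes):
    """Return the names of all parts that are not stored as worksheets."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return [name for name in package.namelist()
                if not name.startswith("xl/worksheets/")]
```

In this sketch the supplemental object travels inside the same container as the worksheet, which is the property the paragraph above describes: the data is part of the workbook file but is not addressed by rows and columns.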
[0026] Predictive analytics includes methods used to examine datasets for patterns that can be applied to new data. Examples of predictive analytic techniques include, but are not limited to, decision tree, neural networks, regression, association rules, clustering and segmentation, time series analysis, text mining, support vector machines, K nearest neighbors and others.
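As one concrete instance of the techniques listed above, the following is a minimal sketch of a K nearest neighbors classifier. The function name `knn_predict`, the tuple-based data layout, and the default `k=3` are illustrative assumptions; the disclosure does not prescribe an implementation of any technique.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs, where `features` is a
    sequence of numbers with the same length as `query`.
    """
    # Sort the training points by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels among the k nearest points and return the most common.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

The "pattern" here is simply the stored training set, and applying the pattern to new data means looking up the closest stored examples.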
[0027] Predictive analytic methods include processing data to learn patterns as well as applying patterns to new data. The disclosed subject matter relates to both applications when they are applied to non-worksheet data stored in a workbook file or objects. Furthermore, the disclosed subject matter relates to a method of storing predictive models and predictive results as supplemental data inside workbook objects and files. This method includes storing the learned patterns as well as predictions made by applying the patterns to new data, independent of where the original data was stored.
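The two phases described above, learning a pattern and applying it to new data, and the storing of both the pattern and its predictions, can be sketched as follows. The example uses ordinary least-squares regression on a single input as a stand-in analytic. The function names, the dictionary layout, and the JSON serialization are assumptions for illustration only and are not specified by the disclosure.

```python
import json


def learn_pattern(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"kind": "linear_regression",
            "slope": slope,
            "intercept": mean_y - slope * mean_x}


def apply_pattern(model, new_xs):
    """Apply the learned pattern to inputs that were not used for fitting."""
    return [model["slope"] * x + model["intercept"] for x in new_xs]


def to_supplemental_record(model, new_xs):
    """Serialize the learned pattern together with its predictions."""
    return json.dumps({"model": model,
                       "predictions": apply_pattern(model, new_xs)})
```

Because the serialized record carries both the model parameters and the predictions, it can be kept with the workbook regardless of where the training data originally resided.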
[0028] Figure 2 is a block diagram illustrating an example system 200 for generating predictive models on supplemental workbook data according to an embodiment of the disclosed subject matter. The system 200 includes a client component 202 that is hosted as an add-in to a spreadsheet environment 204, such as, for example, Microsoft's EXCEL® spreadsheet environment or the OpenOffice.org Calc spreadsheet environment.
[0029] The client component 202 functions as an easy-to-use client for a runtime component 206 that generates predictive models. The client component 202 also provides integration with data from a variety of software packages, such as, for example, the Microsoft PowerPivot add-in 208 for the Microsoft EXCEL® spreadsheet environment. The Microsoft PowerPivot add-in 208, in turn, serves as an interface to the Microsoft PowerPivot engine 210. In addition, the client component 202 serves as a launching point for a third-party analytics client application 212, such as the R statistical computing environment, available from Revolution Analytics of Palo Alto, California.
[0030] Within the spreadsheet environment 204, the runtime component 206 leverages data access from the Microsoft PowerPivot add-in 208 and/or other software packages and column-based storage structures and embeds custom and intermediate results directly into a file format used by the spreadsheet environment 204. For example, if the spreadsheet environment 204 is embodied as Microsoft's EXCEL® spreadsheet environment, the runtime component 206 may embed custom and intermediate results into an .XLS or .XLSX format. In this way, the runtime component 206 may facilitate low-level collaboration between users by sharing files in the format used by the spreadsheet environment 204. Users who have the client component 202 and the runtime component 206 can perform analyses and interact with the embedded predictive models generated by the runtime component 206. Users who do not have the client component 202 or the runtime component 206 may not be able to perform analyses and interact with the embedded predictive models, but they can still open the workbook to view completed analyses.
[0031] Figure 3 is a flow diagram illustrating an example method 300 for generating a predictive model. At a step 302, a computer is used to provide a spreadsheet environment comprising data. The data is stored in a worksheet format in the spreadsheet environment. At a step 304, supplemental data is defined that is, unlike the data stored in the worksheet format, stored in a non-worksheet format. A predictive analytic is performed on the supplemental data at a step 306. The predictive analytic may involve processing the worksheet data to learn or determine a pattern and applying the learned pattern to new data. The predictive analytic may include one or more of a decision tree, a neural network, a regression, an association rule, clustering and segmentation, a time series analysis, text mining, a support vector machine, and determining a nearest neighbor.
[0032] The learned or determined pattern, along with the results of applying that pattern to new data, can be stored in the non-worksheet format. At a step 308, a scalable predictive model is generated and is stored in the non-worksheet format. The predictive model may be shared with other users at an optional step 310. An analytics client application, such as the R statistical computing environment, may be invoked at an optional step 312.
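The flow of method 300 can be sketched end to end under stated assumptions. The `Workbook` class, the `run_method_300` function, the dictionary keys, and the `share` callback are all hypothetical names introduced for illustration. The analytic is a simple one-feature threshold rule standing in for any of the techniques named above, and optional step 312 (invoking an analytics client application) is left out.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Workbook:
    """Toy stand-in for a workbook: worksheet data plus non-worksheet objects."""
    worksheets: dict = field(default_factory=dict)
    supplemental: dict = field(default_factory=dict)


def run_method_300(workbook, rows, new_values, share=None):
    """Walk through steps 304 to 310 on a workbook supplied by the caller.

    Step 302 (providing the spreadsheet environment) is assumed to be done
    by the caller. `rows` is a list of (value, label) pairs with boolean
    labels, and both label classes must be present.
    """
    # Step 304: define supplemental data outside of any worksheet.
    workbook.supplemental["training_rows"] = rows

    # Step 306: perform a predictive analytic on the supplemental data.
    # The stand-in rule places a threshold midway between the class means.
    positives = [value for value, label in rows if label]
    negatives = [value for value, label in rows if not label]
    threshold = (mean(positives) + mean(negatives)) / 2
    positive_above = mean(positives) > mean(negatives)

    # Step 308: store the model and its predictions in non-worksheet form.
    workbook.supplemental["model"] = {"kind": "threshold",
                                      "threshold": threshold,
                                      "positive_above": positive_above}
    workbook.supplemental["predictions"] = [
        (value > threshold) == positive_above for value in new_values
    ]

    # Step 310 (optional): share the workbook, model included.
    if share is not None:
        share(workbook)
    return workbook
```

Note that the worksheet data is never modified in this sketch: the training rows, the model, and the predictions all live in the supplemental store.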
[0033] It will be understood by those who practice the embodiments described herein and those skilled in the art that various modifications and improvements may be made without departing from the spirit and scope of the disclosed embodiments. The scope of protection afforded is to be determined solely by the claims and by the breadth of interpretation allowed by law.

Claims

WHAT IS CLAIMED IS:
1. A computer system comprising a processor (104) configured to receive and to execute processor-executable instructions and a memory device (106) in communication with the processor (104), the computer system characterized in that the memory device (106) stores processor-executable instructions that, when executed by the processor (104), cause the processor (104) to:
provide a spreadsheet environment comprising data;
define supplemental data stored in a non-worksheet format in the spreadsheet environment;
perform a predictive analytic on the supplemental data; and
generate a scalable predictive model in the non-worksheet format.
2. The computer system of claim 1, wherein the memory device (106) stores further processor-executable instructions that, when executed by the processor (104), cause the processor (104) to process the data in the spreadsheet environment to determine a pattern and to store the determined pattern.
3. The computer system of claim 1, wherein the memory device (106) stores further processor-executable instructions that, when executed by the processor (104), cause the processor (104) to store a result produced by the scalable predictive model.
4. The computer system of claim 1, wherein the memory device (106) stores further processor-executable instructions that, when executed by the processor (104), cause the processor (104) to store the scalable predictive model in the non-worksheet format.
5. The computer system of claim 4, wherein the memory device (106) stores further processor-executable instructions that, when executed by the processor (104), cause the processor (104) to share the scalable predictive model.
6. The computer system of claim 1, wherein the memory device (106) stores further processor-executable instructions that, when executed by the processor (104), cause the processor (104) to invoke an analytics client application.
7. The computer system of claim 1, wherein the predictive analytic comprises at least one of a decision tree, a neural network, a regression, an association rule, clustering and segmentation, a time series analysis, text mining, a support vector machine, and determining a nearest neighbor.
8. A computer-implemented method of generating a predictive model, the method comprising:
using a computer to provide a spreadsheet environment comprising data;
defining supplemental data stored in a non-worksheet format in the spreadsheet environment;
performing a predictive analytic on the supplemental data; and
generating a scalable predictive model in the non-worksheet format.
9. The computer-implemented method of claim 8, further comprising processing the data in the spreadsheet environment to determine a pattern and to store the determined pattern.
10. The computer-implemented method of claim 8, further comprising storing a result produced by the scalable predictive model.
11. The computer-implemented method of claim 8, further comprising storing the scalable predictive model in the non-worksheet format.
12. The computer-implemented method of claim 11, further comprising sharing the scalable predictive model.
13. The computer-implemented method of claim 8, further comprising invoking an analytics client application.
14. The computer-implemented method of claim 8, wherein the predictive analytic comprises at least one of a decision tree, a neural network, a regression, an association rule, clustering and segmentation, a time series analysis, text mining, a support vector machine, and determining a nearest neighbor.
15. A computer readable storage medium storing computer-executable instructions that, when executed by a computer, cause the computer to perform a method comprising:
providing a spreadsheet environment comprising data;
defining supplemental data stored in a non-worksheet format in the spreadsheet environment;
performing a predictive analytic on the supplemental data; and
generating a scalable predictive model in the non-worksheet format.
16. The computer readable storage medium of claim 15, storing further computer-executable instructions for processing the data in the spreadsheet environment to determine a pattern and to store the determined pattern.
17. The computer readable storage medium of claim 15, storing further computer-executable instructions for storing a result produced by the scalable predictive model.
18. The computer readable storage medium of claim 15, storing further computer-executable instructions for storing the scalable predictive model in the non-worksheet format.
19. The computer readable storage medium of claim 18, storing further computer-executable instructions for sharing the scalable predictive model.
20. The computer readable storage medium of claim 15, storing further computer-executable instructions for invoking an analytics client application.
21. The computer readable storage medium of claim 15, wherein the predictive analytic comprises at least one of a decision tree, a neural network, a regression, an association rule, clustering and segmentation, a time series analysis, text mining, a support vector machine, and determining a nearest neighbor.
PCT/US2011/001250 2010-07-19 2011-07-16 Generating predictive models on supplemental workbook data WO2012011940A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US39986610P 2010-07-19 2010-07-19
US61/399,866 2010-07-19

Publications (1)

Publication Number Publication Date
WO2012011940A1 true WO2012011940A1 (en) 2012-01-26

Family

ID=45497114

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2011/001250 WO2012011940A1 (en) 2010-07-19 2011-07-16 Generating predictive models on supplemental workbook data

Country Status (1)

Country Link
WO (1) WO2012011940A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5692107A (en) * 1994-03-15 1997-11-25 Lockheed Missiles & Space Company, Inc. Method for generating predictive models in a computer system
US7418431B1 (en) * 1999-09-30 2008-08-26 Fair Isaac Corporation Webstation: configurable web-based workstation for reason driven data analysis
US20100114554A1 (en) * 2008-11-05 2010-05-06 Accenture Global Services Gmbh Predictive modeling

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11809967

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11809967

Country of ref document: EP

Kind code of ref document: A1