US20100205477A1 - Memory Handling Techniques To Facilitate Debugging

Publication number
US20100205477A1
Authority
US
United States
Prior art keywords
memory
computer system
error
interrupt
response
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/726,129
Inventor
Brian Watson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Interactive Entertainment Inc
Sony Network Entertainment Platform Inc
Original Assignee
Sony Computer Entertainment Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Computer Entertainment Inc
Priority to US12/726,129
Publication of US20100205477A1
Assigned to SONY NETWORK ENTERTAINMENT PLATFORM INC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: SONY COMPUTER ENTERTAINMENT INC.
Assigned to SONY COMPUTER ENTERTAINMENT INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SONY NETWORK ENTERTAINMENT PLATFORM INC.
Assigned to SONY INTERACTIVE ENTERTAINMENT INC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: SONY COMPUTER ENTERTAINMENT INC.

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/36: Preventing errors by testing or debugging software
    • G06F11/362: Software debugging
    • G06F11/3636: Software debugging by tracing the execution of the program

Definitions

  • a memory management unit 18 is configured to mark every page in memory 14 as read-only.
  • When a program under test 40b attempts to write to a read-only memory location, a fault occurs and an exception handler is invoked.
  • the exception handler saves certain debugging information, marks the requested page as read/write, and then allows execution to resume. Other memory locations within the page may be accessed and modified without raising an exception unless and until the page is again marked read-only.
  • each page includes 4 kilobytes of addressable memory.
  • each page is marked as read-only. With the first write to virtual address 0x10000, an exception is raised because the page is marked read-only. The exception handler saves debugging information, marks the page as read-write, and execution continues. The program then attempts to write to virtual address 0x10480. Because this address resides in the same page as 0x10000, the page is already marked as read-write and execution continues without exception handler invocation. Because each of the next three accesses resides in a different page marked read-only, each invokes the exception handler and debugging information is saved. Then, execution continues with the last two memory accesses. Because the page is already marked read-write, access to 0x13010 occurs normally; however, the access of address 0x13040 causes a crash.
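The access sequence in this example can be replayed with a small simulation. The sketch below is illustrative only and is not taken from the patent: the function name `faulting_writes` is hypothetical, the first write is taken to be 0x10000 (the address the passage compares 0x10480 against), and the three middle addresses (0x11000, 0x12000, 0x13000) are assumed values chosen so that each falls in a different 4-kilobyte page. The final crashing access to 0x13040 is left out because the passage does not state its cause.

```python
PAGE_SIZE = 4 * 1024  # 4-kilobyte pages, as in the example


def faulting_writes(addresses, protected_pages, page_size=PAGE_SIZE):
    """Return the write addresses that would invoke the exception handler.

    Every page in `protected_pages` starts out read-only. The first write
    to such a page faults; the (simulated) handler then marks the page
    read-write, so later writes to the same page proceed without a fault.
    """
    read_only = set(protected_pages)
    faults = []
    for address in addresses:
        page = address // page_size
        if page in read_only:
            faults.append(address)   # handler runs and saves debugging information
            read_only.discard(page)  # handler marks the page read-write
    return faults


# Assumed sequence: the middle three addresses are hypothetical.
writes = [0x10000, 0x10480, 0x11000, 0x12000, 0x13000, 0x13010]
saved_at = faulting_writes(writes, protected_pages=range(0x10, 0x14))
```

Under these assumptions, debugging information is saved at four of the six writes: the first touch of each of the four pages.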
  • a debugger may be used to restore the system to any of the previous states and a programmer may analyze the code and its execution to attempt to identify the cause of the crash. Crashes may occur for many reasons, including, for example, insufficient access privileges, an illegal address, and the like.
  • pages are periodically re-marked as read-only. This can be performed using an exception handler. For example, a watchdog timer may be used to raise an exception periodically. When the exception is raised, an exception handler then marks one or more pages as read-only. For example, an exception may be raised every 10 ms, every 50 ms, every 100 ms, every second, or at any other time interval. Each time the exception handler is invoked, some or all of the accessed pages may be marked read-only. Implementations may mark all pages read-only (including the ones already marked read-only), may mark changed pages read-only, may mark a subset of pages read-only, or the like. Some implementations perform maintenance functions, such as those described above, during system calls.
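The effect of the re-marking interval on granularity can be sketched with a simulated clock. This is a simplified model under stated assumptions, not the patent's implementation: `count_snapshots` is a hypothetical name, time is given in milliseconds, and the timer handler is assumed to re-mark every page read-only on each tick (one of the options listed above).

```python
def count_snapshots(writes, interval_ms, page_size=4096):
    """Count how many times debugging information would be saved.

    `writes` is a list of (time_ms, address) pairs sorted by time.
    All pages start read-only. A simulated timer fires every
    `interval_ms` and re-marks every page read-only, so the next write
    to each page faults again and produces a new snapshot.
    """
    writable = set()          # pages currently marked read-write
    next_tick = interval_ms
    snapshots = 0
    for time_ms, address in writes:
        while time_ms >= next_tick:
            writable.clear()  # timer handler: all pages back to read-only
            next_tick += interval_ms
        page = address // page_size
        if page not in writable:
            snapshots += 1    # write fault: save state, then unprotect
            writable.add(page)
    return snapshots


# Four writes to the same page over 25 ms (assumed timings).
trace = [(0, 0x10000), (5, 0x10004), (12, 0x10008), (25, 0x10010)]
fine = count_snapshots(trace, interval_ms=10)
coarse = count_snapshots(trace, interval_ms=100)
```

A shorter interval yields more snapshots for the same trace, which is the granularity-versus-overhead trade-off the passage describes.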
  • a program may be restored to a previous operable state using stored debugging information.
  • crashes may be caused by hardware errors that are unrelated (or at least non-deterministically related) to the program execution. In this case, simply restoring to a previous state may be sufficient to allow program execution to continue normally. In other instances, crashes are caused by software bugs. In these cases, a program may be restored to a previous state. If the program is interactive, it may be possible to avoid the interaction that caused the crash; however, there is a possibility that the crash will occur again.
  • Some implementations may provide for a multi-stage restore process whereby the most recently saved state is restored first. If a crash occurs again, then an earlier saved state is restored. This process may continue until the user intervenes, or may automatically cease after a predetermined event or occurrence (such as a lapse of time, a predetermined number of repeated failures, and the like).
  • the system simply saves program context every time a page fault occurs. This provides less granularity than the technique discussed above, but may be desirable for some debugging tasks.
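The multi-stage restore process described above can be sketched as a loop over saved states, newest first. The names `restore_newest_first`, `resume`, and `max_attempts` are hypothetical, and the stopping rule shown (a fixed number of repeated failures) is only one of the termination conditions the text mentions.

```python
def restore_newest_first(snapshots, resume, max_attempts=3):
    """Try saved states from most recent to oldest.

    `snapshots` is ordered oldest to newest. `resume(state)` is a
    caller-supplied callable that restores `state`, resumes execution,
    and returns True if the program ran without crashing again.
    Returns the state that worked, or None if every attempt failed or
    the attempt limit was reached.
    """
    attempts = 0
    for state in reversed(snapshots):
        if attempts >= max_attempts:
            break  # predetermined number of repeated failures reached
        attempts += 1
        if resume(state):
            return state
    return None


# Assumed scenario: only the oldest of three saved states is still healthy.
tried = []


def resume(state):
    tried.append(state)
    return state == "state-1"


recovered = restore_newest_first(["state-1", "state-2", "state-3"], resume)
```

In this scenario the two most recent states crash again, so the loop falls back to the oldest one.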

Abstract

A method for debugging includes interacting with a memory management component to force an interrupt upon access to one or more memory locations during software execution, and in response to the forced interrupt, saving information regarding the execution of the software, and interacting with the memory management component to disable the interrupt upon access to the one or more memory locations during software execution.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • This application is a continuation of U.S. application Ser. No. 11/767,399, filed Jun. 22, 2007, now U.S. Pat. No. 7,689,868, issued Mar. 30, 2010, the contents of which are incorporated herein by reference in their entirety.
  • TECHNICAL FIELD
  • Briefly, and in general terms, this disclosure relates to memory handling techniques to facilitate software debugging.
  • BACKGROUND
  • General computing platforms and entertainment platforms (including computers, video game consoles, portable video game devices, cell phones, personal digital assistants (“PDAs”), and the like), hereinafter referred to generally as “processing platforms,” typically use a virtual memory address space that exceeds the platform's addressable physical random access memory (RAM). Using virtual memory provides many well-known benefits.
  • A virtual memory addressing scheme enables a processor to perform operations (e.g., reads, writes, bitwise operations, arithmetic operations, and the like) using virtual memory addresses drawn from an address space that typically exceeds the processor's available random access memory (“RAM”). Such virtual memory addresses are translated to physical addresses using a memory management system. For example, a processing platform may include a memory management unit (“MMU”) to manage a virtual address space, providing features such as virtual to physical address translation. A conventional MMU splits the virtual address space and physical address space into segments of memory called pages. Any page size may be chosen; however, typically a page size of 4 kilobytes to 32 kilobytes is used.
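As a rough illustration of page-based translation, the sketch below splits a virtual address into a page number and an offset and maps it through a dictionary standing in for the MMU's tables. The function names and the mapping of virtual page 0x10 to physical page 0x2A are assumptions made for the example, not details from this disclosure.

```python
PAGE_SIZE = 4 * 1024  # 4 kilobytes, within the typical range given above


def split_address(virtual_address, page_size=PAGE_SIZE):
    """Return (page_number, offset_within_page) for a virtual address."""
    return divmod(virtual_address, page_size)


def translate(virtual_address, page_table, page_size=PAGE_SIZE):
    """Translate a virtual address to a physical address.

    `page_table` maps virtual page numbers to physical page numbers.
    A missing entry raises KeyError, standing in for a fault.
    """
    page, offset = split_address(virtual_address, page_size)
    physical_page = page_table[page]
    return physical_page * page_size + offset


# Hypothetical mapping: virtual page 0x10 lives in physical page 0x2A.
page_table = {0x10: 0x2A}
physical = translate(0x10480, page_table)
```

Only the page number is translated; the offset within the page carries over unchanged.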
  • When a program running on a processing platform attempts to access a virtual memory address, the MMU attempts to facilitate the requested access, for example, by performing one or more of the following operations: (i) determining if the memory corresponding to the virtual memory address resides in the cache; (ii) translating the virtual memory address to a physical memory address; (iii) determining if the requesting process has sufficient privileges to access the referenced address; and (iv) determining whether the referenced memory is paged out to secondary storage. If the MMU is unable to provide access to the referenced memory address through translation to physical memory or through the cache, then a fault may be raised. If, for example, a page fault is raised because a requested virtual memory address is paged out to secondary storage, an error handler may simply load the page to physical memory and continue. However, if the fault is caused by insufficient privileges or illegal requests, then the system may not be able to recover and a crash may occur. When a crash occurs, various information may be stored for later debugging and analysis.
  • One problem with debugging a crash is a lack of information regarding the events that led to the crash. A programmer may only have information regarding the state of the platform at the time of the crash; however, the event that ultimately caused the crash may have occurred earlier. One solution to this problem is to configure a program to save certain state information every time a procedure is executed. While this technique may be effective, such debugging granularity creates significant overhead. For programs where timing may be important, the added overhead may disrupt such timing, hindering the debugging process. Accordingly, there is a need for an improved debugging technique that provides sufficient granularity to identify and correct problems without excessively increasing system overhead.
  • SUMMARY
  • Generally, there is disclosed a method for debugging that includes interacting with a memory management component to force an interrupt upon access to one or more memory locations during software execution. In response to the forced interrupt, information regarding the execution of the software is saved, and a further interaction with the memory management component disables the interrupt upon access to the one or more memory locations during software execution.
  • In some implementations, an interrupt handler initially and/or subsequently interacts with the memory management component to force interrupts. This interrupt handler may be executed upon the occurrence of an event, after a predetermined period of time, and the like.
  • When an interrupt occurs, debugging information is stored. This debugging information may include any information, including the time and/or date, the list of application(s) being executed, process information, the contents of registers and/or memory locations, stack contents, and the like. Such information may be used to automatically recover from a crash by restoring saved information and resuming execution.
  • Additionally, there is disclosed a system that includes a memory, a memory management component coupled to the memory, and a processor coupled to the memory management component. The processor is configured to facilitate debugging by interacting with the memory management component to force an interrupt upon access to one or more memory locations during software execution. In response to the forced interrupt, information regarding the execution of the software is saved to allow interaction with the memory management component to disable the interrupt upon access to the one or more memory locations during software execution.
  • The memory may be implemented using any technology, including random access memory, dynamic random access memory, synchronous dynamic random access memory, and the like. The memory management component provides an interface between the processor and the memory and may be implemented using a memory management unit, which may include a translation lookaside buffer to assist in mapping virtual address space to physical address space. The processor may be implemented using any technology, including a general-purpose microprocessor, an application-specific integrated circuit, a digital signal processor, and the like.
  • Furthermore, there is disclosed a method for capturing debugging information. The method includes enabling the capture of debugging information upon access to a portion of a memory, such as, for example, a memory address, a block of memory, a page of memory, and the like. Upon access to the portion of the memory, debugging information is saved and the capture of debugging information is disabled upon access to the portion of the memory. This method may be performed by an emulator to capture debugging information in a software and/or hardware emulator environment.
  • In some implementations, the capture of debugging information is periodically re-enabled (e.g., after an event occurs, after a predetermined amount of time elapses, and the like). If a predetermined time period is used, the time period may be adjusted to change the granularity of captured information. For example, a predetermined time period of less than one second may be used.
  • In another general implementation, a computer-readable medium includes software operable to facilitate debugging in a processing system. The software includes an interrupt enable component operable to force an interrupt upon access to one or more memory locations during software execution and a debugging information storing component operable to save information regarding the execution of the software in response to the forced interrupt so as to disable the interrupt. In some implementations, the interrupt enable component and the debugging information storing component are each implemented using one or more interrupt handlers.
  • This software also may be used to facilitate automated crash recovery using an automated recovery component operable to restore saved information to prevent a fatal error. For example, when a fault occurs that would otherwise result in a fatal error, the system can restore to a previous state using saved debugging information.
  • The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features and advantages will be apparent from the description and drawings, and from the claims.
  • DESCRIPTION OF DRAWINGS
  • FIG. 1 is a block diagram of a debugging device, having a processor, a memory management unit, and a memory.
  • FIG. 2 is a block diagram of a memory management unit.
  • FIG. 3 is a diagram of virtual address space to physical address space mapping.
  • FIG. 4 is a sequence diagram of a debugging technique.
  • DETAILED DESCRIPTION
  • One way to provide debugging information in a processor-based system is to periodically save debugging information based on memory access. For example, memory-handling information may be modified such that the processor-based system periodically invokes an error handling code. This error handling code saves debugging information and then performs whatever steps are necessary such that program execution can be resumed.
  • Referring to FIG. 1, a device under test includes a processor 12 operable to access a memory 14 through one or more buses 16 using a memory management unit 18. Memory-handling information is modified to periodically force error handler invocation, upon which invocation, state information is stored to facilitate debugging. The granularity of error handling information may be modified by changing the frequency of error handler invocation and/or by changing the frequency that state information is saved upon such invocation.
  • The processor 12 may be implemented using any processing device, such as, for example, those found in general computing platforms and entertainment platforms (e.g., computers, video game consoles, portable video game devices, cell phones, personal digital assistants (“PDAs”), and the like). The memory 14 may be implemented using any data storage technology including, for example, random access memory (“RAM”), dynamic RAM (“DRAM”), synchronous DRAM (“SDRAM”), and the like. In some implementations, the memory 14 is formed of multiple memory modules, which may or may not each employ the same technology (e.g., the memory 14 may include multiple RAM modules and/or a DRAM or SDRAM cache). Furthermore, a cache may be implemented as a component of a memory management unit 18, as discussed below.
  • The processor 12, the memory 14, and the memory management unit 18 are coupled using one or more buses 16. In some implementations, a single bus 16 is used such that the processor 12, the memory 14, and the memory management unit 18 each communicate using the same bus 16. In additional implementations, separate buses are used between the memory management unit 18 and the processor 12, and between the memory management unit 18 and the memory 14. The debugging techniques described herein are independent of the bus 16 architecture. One skilled in the art will appreciate that any arrangement of buses 16, processor(s) 12, and memories 14 may be employed.
  • Furthermore, one skilled in the art will appreciate that the techniques disclosed herein may be used by a software and/or hardware emulator, which provides a virtual platform for executing programs. In an emulator, some or all portions of the system may be implemented in software to mimic the functionality of an emulated computing system. Thus, the entire environment shown in FIG. 1 may be provided by an emulator.
  • Referring to FIG. 2, some implementations of the memory management unit 18 include a cache 22 and a translation lookaside buffer 24. The cache 22 is implemented using any data storage technology and, typically, provides improved performance over the memory 14. The memory management unit 18 translates virtual addresses to physical addresses using a translation lookaside buffer (“TLB”) 24, as discussed below in more detail with reference to FIG. 3.
  • Referring to FIG. 3, the translation lookaside buffer (“TLB”) 24 translates virtual addresses 32 to physical addresses 34 using TLB table 36. The TLB table 36 maintains entries for one or more virtual addresses 32, which include flags 37 and a physical mapping 38. To reduce the overhead of memory management, the memory management unit 18 typically operates on pages of data within virtual and physical address spaces (32 and 34), instead of individual memory addresses. Any page size may be employed, such as, for example, 1 kilobyte, 4 kilobytes, 16 kilobytes, or any other page size. The TLB table 36 includes one or more flags 37 used to maintain various information, such as, for example, whether an entry is valid, whether an entry is protected, whether an entry has changed, and the like.
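A table of this shape can be modeled in a few lines. The following is a minimal sketch, not the structure of any particular MMU: the names `EntryFlags`, `TlbEntry`, `ProtectionFault`, and `lookup` are hypothetical, and the three flags simply mirror the examples given above (valid, protected, changed).

```python
from dataclasses import dataclass
from enum import Flag, auto


class EntryFlags(Flag):
    NONE = 0
    VALID = auto()      # the entry holds a usable translation
    PROTECTED = auto()  # writes to the page are not permitted
    CHANGED = auto()    # the page has been written since it was mapped


@dataclass
class TlbEntry:
    physical_page: int  # the physical mapping for this virtual page
    flags: EntryFlags


class ProtectionFault(Exception):
    """Raised when a write targets a protected page."""


def lookup(table, virtual_page, is_write):
    """Return the physical page for `virtual_page`, honoring the flags."""
    entry = table.get(virtual_page)
    if entry is None or not (entry.flags & EntryFlags.VALID):
        raise LookupError("no valid translation for page %#x" % virtual_page)
    if is_write and (entry.flags & EntryFlags.PROTECTED):
        raise ProtectionFault(virtual_page)
    if is_write:
        entry.flags |= EntryFlags.CHANGED
    return entry.physical_page


# Hypothetical table with one protected entry.
table = {0x10: TlbEntry(0x2A, EntryFlags.VALID | EntryFlags.PROTECTED)}
```

Setting or clearing the protected flag on an entry is the hook that the debugging technique relies on: a write to a protected page raises a fault instead of completing.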
  • Referring to FIG. 4, one technique for providing debugging information in a processing system, such as that discussed above with respect to FIG. 1, is to include a debugging component 40 a to capture information about a program under test 40 b when such software attempts to read and/or write to memory using a memory access component 40 c. The debugging component 40 a flags memory to force interrupts (41). Typically the debugging component 40 a is implemented in software and includes one or more interrupt handlers. A portion of the debugging component 40 a may be implemented as a software application to allow a user, programmer, and/or system administrator to selectively enable or disable debugging functionality. Memory may be flagged by the debugging component 40 a either automatically, in response to a user request, or in response to an interrupt. For example, the TLB flags 37 may be set so as to force an interrupt by marking a TLB table 36 entry as read-only or otherwise protected.
  • The program under test 40 b accesses a virtual memory address (42). If the virtual memory address that is requested is not flagged to force an interrupt, then the memory access component 40 c permits access to the requested address and/or returns the contents of such address (43). If, however, a requested memory access (44) generates an interrupt, the debugging component 40 a is activated. In some implementations, debugging software component 40 a and interrupt handlers are not executed except in response to an interrupt forced by such component 40 a. However, some implementations may execute such interrupt handlers more frequently. In this situation, it may be beneficial for the debugging software component 40 a to determine the cause of the interrupt before taking further action.
  • When an interrupt occurs that is forced by the debugging software component 40 a, the appropriate interrupt handler saves the current state (45) and enables access to the requested data (46). The memory access component 40 c then permits access (47) to the program under test 40 b. In some implementations, the debugging component 40 a forces interrupts by interacting with the memory access component 40 c to protect data, to mark the data as read-only, or to otherwise flag the data in such a way that an interrupt or other exception will be raised when the program under test 40 b attempts to access (read and/or write) a portion of the memory 14.
  • In one such implementation, a memory management unit 18 is configured to mark every page in memory 14 as read-only. When a program under test 40 b attempts to write to a read-only memory location, a fault occurs and an exception handler is invoked. The exception handler saves certain debugging information, marks the requested page as read/write, and then allows execution to resume. Other memory locations within the page may be accessed and modified without raising an exception unless and until the page is again marked read-only.
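  • The write-fault sequence described above (fault, save debugging information, mark the page read/write, resume) may be modeled in software as in the following sketch. It is a toy simulation under stated assumptions: the class `CheckpointingMemory`, its fields, and the choice to save a copy of all written data as the "debugging information" are hypothetical, and a real implementation would rely on page protection and an exception handler rather than a Python method call.

```python
import copy

PAGE_SIZE = 4096  # assumed page size


class CheckpointingMemory:
    """Toy model: every page starts read-only; the first write to a page faults."""

    def __init__(self):
        self.data = {}         # address -> value written so far
        self.writable = set()  # page numbers currently marked read/write
        self.checkpoints = []  # debugging information saved on each fault

    def _on_write_fault(self, address):
        # Stand-in for the exception handler: save state, then unprotect the page.
        self.checkpoints.append(
            {"fault_address": address, "data": copy.deepcopy(self.data)}
        )
        self.writable.add(address // PAGE_SIZE)

    def write(self, address, value):
        if address // PAGE_SIZE not in self.writable:
            self._on_write_fault(address)
        self.data[address] = value  # execution resumes and the write completes

    def protect_all(self):
        """Mark every page read-only again."""
        self.writable.clear()


mem = CheckpointingMemory()
mem.write(0x10000, 1)  # page is read-only: fault, state saved before the write
mem.write(0x10480, 2)  # same page, now read/write: no fault
mem.write(0x11800, 3)  # different page: fault, state saved again
```

  • Each saved entry reflects memory as it stood immediately before the faulting write, so restoring an entry returns the model to the moment a page was first modified.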
  • Consider, for example, a program that writes to addresses, as shown in the following table:
  • Address   Result
    0x10000   Exception Raised
    0x10480   (no exception)
    0x11800   Exception Raised
    0x12000   Exception Raised
    0x13000   Exception Raised
    0x13010   (no exception)
    0x13040   Crash
  • In this example, each page includes 4 kilobytes of addressable memory. To begin, each page is marked as read-only. With the first write to virtual address 0x10000, an exception is raised because the page is marked read-only. The exception handler saves debugging information, marks the page as read-write, and execution continues. The program then attempts to write to virtual address 0x10480. Because this address resides in the same page as 0x10000, the page is already marked as read-write and execution continues without exception handler invocation. Because each of the next three accesses (0x11800, 0x12000, and 0x13000) resides in a different page marked read-only, each invokes the exception handler and debugging information is saved. Execution then continues with the last two memory accesses. Because the page is already marked read-write, the access to 0x13010 occurs normally; however, the access to address 0x13040 causes a crash.
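  • The page arithmetic behind this example can be checked with a short sketch. The function name `faulting_writes` is hypothetical, and the sketch models only which writes raise an exception; the crash at 0x13040 has an unrelated cause and is not modeled.

```python
PAGE_SIZE = 4 * 1024  # 4 kilobytes per page, as in the example


def faulting_writes(addresses, page_size=PAGE_SIZE):
    """Return the addresses whose write hits a page still marked read-only."""
    writable_pages = set()  # every page starts out read-only
    faults = []
    for address in addresses:
        page = address // page_size
        if page not in writable_pages:
            faults.append(address)    # exception raised, debugging info saved
            writable_pages.add(page)  # handler marks the page read-write
    return faults


trace = [0x10000, 0x10480, 0x11800, 0x12000, 0x13000, 0x13010, 0x13040]
faults = faulting_writes(trace)
print([hex(a) for a in faults])  # ['0x10000', '0x11800', '0x12000', '0x13000']
```

  • Four of the seven writes fall in a page not yet written, which matches the four exceptions listed in the table.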
  • Upon occurrence of a crash, the system dumps certain debugging information and program execution terminates. At this point, the system has saved debugging information four times. Accordingly, a debugger may be used to restore the system to any of the previous states and a programmer may analyze the code and its execution to attempt to identify the cause of the crash. Crashes may occur for many reasons, including, for example, insufficient access privileges, an illegal address, and the like.
  • In some embodiments, pages are periodically re-marked as read-only. This can be performed using an exception handler. For example, a watchdog timer may be used to raise an exception periodically. When the exception is raised, an exception handler then marks one or more pages as read-only. For example, an exception may be raised every 10 ms, every 50 ms, every 100 ms, every second, or at any other time interval. Each time the exception handler is invoked, some or all of the accessed pages may be marked read-only. Implementations may mark all pages read-only (including the ones already marked read-only), may mark changed pages read-only, may mark a subset of pages read-only, or the like. Some implementations perform maintenance functions, such as those described above, during system calls.
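  • Periodic re-marking may be illustrated with a simulated clock, as in the following sketch. The class `PeriodicProtector`, the explicit `now_ms` timestamps, and the policy of re-marking all pages on every tick are assumptions made for the example; an actual system would use a watchdog timer exception rather than timestamps passed in by the caller.

```python
PAGE_SIZE = 4096  # assumed page size


class PeriodicProtector:
    """Toy model of a watchdog handler that re-marks pages read-only."""

    def __init__(self, interval_ms):
        self.interval_ms = interval_ms
        self.next_tick_ms = interval_ms
        self.writable = set()  # pages currently marked read/write
        self.saves = 0         # number of times debugging information was saved

    def advance_to(self, now_ms):
        """Run the simulated watchdog handler once per elapsed interval."""
        while now_ms >= self.next_tick_ms:
            self.writable.clear()  # this policy re-marks all pages read-only
            self.next_tick_ms += self.interval_ms

    def write(self, now_ms, address):
        self.advance_to(now_ms)
        page = address // PAGE_SIZE
        if page not in self.writable:
            self.saves += 1        # write fault: debugging information saved
            self.writable.add(page)


protector = PeriodicProtector(interval_ms=10)
# (time in ms, address): all four writes land in the same 4-kilobyte page.
for now_ms, address in [(0, 0x10000), (3, 0x10480), (12, 0x10010), (15, 0x10020)]:
    protector.write(now_ms, address)
```

  • In this run the page faults at 0 ms and again at 12 ms, after the 10 ms tick re-marks it read-only. A shorter interval yields more saved states per page at the cost of more handler invocations.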
  • The techniques described herein also may be used to facilitate automated error recovery. Upon a crash or other unrecoverable exception, a program may be restored to a previous operable state using stored debugging information. Sometimes crashes may be caused by hardware errors that are unrelated (or at least non-deterministically related) to the program execution. In this case, simply restoring to a previous state may be sufficient to allow program execution to continue normally. In other instances, crashes are caused by software bugs. In these cases, a program may be restored to a previous state. If the program is interactive, it may be possible to avoid the interaction that caused the crash; however, there is a possibility that the crash will occur again.
  • Some implementations may provide for a multi-stage restore process whereby the most recently saved state is restored first. If a crash occurs again, then the next most recently saved state is restored. This process may continue until the user intervenes, or may automatically cease after a predetermined event or occurrence (such as a lapse of time, a predetermined number of repeated failures, and the like).
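  • A multi-stage restore of this kind may be sketched as a loop over saved states, newest first. The function `run_with_recovery`, the `resume` callback, the use of `RuntimeError` to represent a crash, and the `max_attempts` cutoff are all hypothetical names and choices introduced for illustration.

```python
def run_with_recovery(checkpoints, resume, max_attempts=3):
    """Restore saved states newest-first until one resumes without crashing.

    checkpoints: saved states, oldest first.
    resume: callable that continues execution from a state and may raise.
    Returns (state, attempts) on success, or (None, attempts) on giving up.
    """
    attempts = 0
    for state in reversed(checkpoints):
        if attempts >= max_attempts:
            break  # predetermined number of repeated failures reached
        attempts += 1
        try:
            resume(state)
            return state, attempts
        except RuntimeError:
            continue  # crashed again: fall back to an earlier saved state
    return None, attempts


def resume(state):
    # Hypothetical program: states saved after step 2 already contain the fault.
    if state["step"] > 2:
        raise RuntimeError("crash")


saved = [{"step": n} for n in (1, 2, 3, 4)]
state, attempts = run_with_recovery(saved, resume)
```

  • Here the states saved at steps 4 and 3 crash again on resume, and the state saved at step 2 succeeds on the third attempt. With `max_attempts=2` the loop would instead give up and leave the decision to the user.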
  • In another implementation, the system simply saves program context every time a page fault occurs. This provides less granularity than the technique discussed above, but may be desirable for some debugging tasks.
  • The above implementations are described for purposes of example only; many additional variations are within the scope of this disclosure. For example, one skilled in the art will appreciate that exception handling may be invoked through mechanisms other than marking pages read-only.

Claims (18)

1. A method for automatic recovery from a fatal error in a computer system, the method comprising:
periodically forcing interrupts in a computer system;
in response to a forced interrupt, capturing information regarding the operating state of the computer system;
detecting an error during operation of the computer system; and
in response to a detected error, invoking an automatic recovery process that places the computer system in a recovered operating state using the captured information.
2. The method of claim 1, wherein a memory management component of the computer system periodically forces interrupts.
3. The method of claim 1, wherein capturing information regarding the operating state of the computer system includes storing contents of one or more registers.
4. The method of claim 1, wherein capturing information regarding the operating state of the computer system includes storing contents of one or more memory locations.
5. The method of claim 1, wherein capturing information regarding the operating state of the computer system includes storing user-selected state information.
6. The method of claim 1, wherein the detected error is a segmentation violation.
7. The method of claim 1, wherein the detected error is an illegal instruction.
8. The method of claim 1, wherein the detected error is a privilege error.
9. A non-transient computer-readable medium comprising software operable to enable automatic recovery from a fatal error in a computer system, the software comprising:
instructions to periodically force interrupts in a computer system;
instructions to capture information regarding the operating state of the computer system in response to a forced interrupt; and
instructions to invoke an automatic recovery process in response to an error detected during operation of the computer system, the automatic recovery process using the captured information to place the computer system in a recovered operating state.
10. A system, comprising:
a memory; and
a processor coupled to the memory, the processor operable to execute a series of instructions,
wherein a processor interrupt is periodically invoked and system state information is captured in response to an invoked processor interrupt, and wherein the system is operable to recover from a fatal error by restoring the captured system state information in response to the fatal error.
11. The system of claim 10, wherein the processor interrupt is invoked in response to an attempted memory access.
12. The system of claim 10, wherein the processor interrupt is invoked after an elapsed period of time.
13. The system of claim 10, wherein the captured system state information includes contents of one or more registers.
14. The system of claim 10, wherein the captured system state information includes contents of one or more memory locations.
15. The system of claim 10, wherein the captured system state information includes user-selected state information.
16. The system of claim 10, wherein the fatal error is a segmentation violation.
17. The method of claim 10, wherein the fatal error is an illegal instruction.
18. The method of claim 10, wherein the fatal error is a privilege error.
US12/726,129 2007-06-22 2010-03-17 Memory Handling Techniques To Facilitate Debugging Abandoned US20100205477A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/726,129 US20100205477A1 (en) 2007-06-22 2010-03-17 Memory Handling Techniques To Facilitate Debugging

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/767,399 US7689868B2 (en) 2007-06-22 2007-06-22 Memory handling techniques to facilitate debugging
US12/726,129 US20100205477A1 (en) 2007-06-22 2010-03-17 Memory Handling Techniques To Facilitate Debugging

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/767,399 Continuation US7689868B2 (en) 2007-06-22 2007-06-22 Memory handling techniques to facilitate debugging

Publications (1)

Publication Number Publication Date
US20100205477A1 true US20100205477A1 (en) 2010-08-12

Family

ID=40137773

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/767,399 Active 2028-08-25 US7689868B2 (en) 2007-06-22 2007-06-22 Memory handling techniques to facilitate debugging
US12/726,129 Abandoned US20100205477A1 (en) 2007-06-22 2010-03-17 Memory Handling Techniques To Facilitate Debugging

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US11/767,399 Active 2028-08-25 US7689868B2 (en) 2007-06-22 2007-06-22 Memory handling techniques to facilitate debugging

Country Status (1)

Country Link
US (2) US7689868B2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2864655B1 (en) * 2003-12-31 2006-03-24 Trusted Logic METHOD OF CONTROLLING INTEGRITY OF PROGRAMS BY VERIFYING IMPRESSIONS OF EXECUTION TRACES
FR2996321B1 (en) * 2012-10-03 2014-10-24 Emulation And Verification Engineering METHOD AND DEVICE FOR SAVING A STATE OF A TEST-EMULATOR EMULATOR-TESTED CIRCUIT AND ITS SOFTWARE TEST ENVIRONMENT, AND CORRESPONDING RESTORATION METHOD AND DEVICE
TWI514145B (en) * 2013-10-21 2015-12-21 Univ Nat Sun Yat Sen Processor and cache, control method thereof for data trace storage
CN112541166A (en) * 2019-09-20 2021-03-23 杭州中天微系统有限公司 Method, system and computer readable storage medium

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4740969A (en) * 1986-06-27 1988-04-26 Hewlett-Packard Company Method and apparatus for recovering from hardware faults
US4905196A (en) * 1984-04-26 1990-02-27 Bbc Brown, Boveri & Company Ltd. Method and storage device for saving the computer status during interrupt
US5611043A (en) * 1994-03-18 1997-03-11 Borland International, Inc. Debugger system and method for controlling child processes
US5809229A (en) * 1995-09-20 1998-09-15 Sharp Kabushiki Kaisha Runaway detection/restoration device
US6044475A (en) * 1995-06-16 2000-03-28 Lucent Technologies, Inc. Checkpoint and restoration systems for execution control
US6622263B1 (en) * 1999-06-30 2003-09-16 Jack Justin Stiffler Method and apparatus for achieving system-directed checkpointing without specialized hardware assistance
US6629271B1 (en) * 1999-12-28 2003-09-30 Intel Corporation Technique for synchronizing faults in a processor having a replay system
US6779087B2 (en) * 2001-04-06 2004-08-17 Sun Microsystems, Inc. Method and apparatus for checkpointing to facilitate reliable execution
US6934886B2 (en) * 2001-04-13 2005-08-23 Lg Electronics Inc. Debugging apparatus and method
US6981243B1 (en) * 2000-07-20 2005-12-27 International Business Machines Corporation Method and apparatus to debug a program from a predetermined starting point
US7047520B2 (en) * 2001-10-25 2006-05-16 International Business Machines Corporation Computer system with watchpoint support
US20070033577A1 (en) * 2005-08-08 2007-02-08 Arackal Paulose K Method and apparatus for debugging program code
US7293200B2 (en) * 2004-08-26 2007-11-06 Availigent, Inc. Method and system for providing transparent incremental and multiprocess checkpointing to computer applications
US7447942B2 (en) * 2005-07-19 2008-11-04 Microsoft Corporation Fast data breakpoint emulation
US7467325B2 (en) * 2005-02-10 2008-12-16 International Business Machines Corporation Processor instruction retry recovery

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080209268A1 (en) * 2007-02-22 2008-08-28 Arm Limited Selective disabling of diagnostic functions within a data processing system
US7913120B2 (en) * 2007-02-22 2011-03-22 Arm Limited Selective disabling of diagnostic functions within a data processing system
WO2017172058A1 (en) * 2016-03-30 2017-10-05 Intel Corporation Method and apparatus for using target or unit under test (uut) as debugger
CN114546823A (en) * 2021-12-27 2022-05-27 芯华章科技股份有限公司 Method for reproducing debugging scene of logic system design and related equipment

Also Published As

Publication number Publication date
US20080320333A1 (en) 2008-12-25
US7689868B2 (en) 2010-03-30

Similar Documents

Publication Publication Date Title
US20100205477A1 (en) Memory Handling Techniques To Facilitate Debugging
CN107357666B (en) Multi-core parallel system processing method based on hardware protection
US7035963B2 (en) Method for resolving address space conflicts between a virtual machine monitor and a guest operating system
US8656222B2 (en) Method and system for recording a selected computer process for subsequent replay
US8713371B2 (en) Controlling generation of debug exceptions
US8321842B2 (en) Replay time only functionalities in a virtual machine
US7711914B2 (en) Debugging using virtual watchpoints
CN108351826B (en) Monitoring operation of a processor
US20090037710A1 (en) Recovery from nested kernel mode exceptions
WO2005098616A2 (en) Providing support for single stepping a virtual machine in a virtual machine environment
US9189620B2 (en) Protecting a software component using a transition point wrapper
US9921925B2 (en) Method and apparatus for recovering abnormal data in internal memory
CN114077379B (en) Computer equipment, exception handling method and interrupt handling method
US10795997B2 (en) Hardened safe stack for return oriented programming attack mitigation
US10061918B2 (en) System, apparatus and method for filtering memory access logging in a processor
US20120072638A1 (en) Single step processing of memory mapped accesses in a hypervisor
US20060294433A1 (en) Debugging using watchpoints
David et al. Exploring Recovery from Operating System Lockups.
JP4155052B2 (en) Emulator, emulation method and program
CN101650688A (en) Method for accessing VM_IO address space and user mode debugger
CN117573419B (en) Page exception handling method and device
CN112380529B (en) Embedded bare computer system safety isolation system based on operation
CN111767119B (en) Kernel hooking method without triggering system protection
Jiang et al. Architectural framework for supporting operating system survivability
Hwang et al. AppWatch: detecting kernel bug for protecting consumer electronics applications

Legal Events

Date Code Title Description
AS Assignment

Owner name: SONY NETWORK ENTERTAINMENT PLATFORM INC., JAPAN

Free format text: CHANGE OF NAME;ASSIGNOR:SONY COMPUTER ENTERTAINMENT INC.;REEL/FRAME:027446/0001

Effective date: 20100401

AS Assignment

Owner name: SONY COMPUTER ENTERTAINMENT INC., JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SONY NETWORK ENTERTAINMENT PLATFORM INC.;REEL/FRAME:027557/0001

Effective date: 20100401

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: SONY INTERACTIVE ENTERTAINMENT INC., JAPAN

Free format text: CHANGE OF NAME;ASSIGNOR:SONY COMPUTER ENTERTAINMENT INC.;REEL/FRAME:039239/0343

Effective date: 20160401