GSFC/NGIMS-FSW-55

To: CONTOUR/NGIMS
From: Michael Paulkovich
Subj: FC Reboot Investigation
Date: April 17, 2002

INTRODUCTION

Occasional quirks have been reported during integration and testing of the NGIMS instrument with the spacecraft at APL; they are clearly the result of a "random" reboot of the FC. The cause of these reboots is not known. It could be any of the following:

- Instrument Flight Software bug
- Instrument Flight Hardware bug
- Spacecraft bug (hardware or software)
- Operations error

These random reboots occur rarely and are therefore difficult to reproduce, characterize, and troubleshoot. This report describes our investigation to date and will be updated from time to time.

1. Bootstrap Reboot and Alt Boot

This section characterizes the circumstances and behavior of the random reboot, and the reboot operation of the Flight Software.

1.1 Intermittent reboot

a) Circumstances. The reboot happened during recent S/C testing while running the 9-hour sequenced mission sim test. A pressure check with the BA filaments on in low emission was being performed; this test has been run many times previously with no problems. Since there were basically no "deltas" from previous runs, this points to a hardware glitch, an operations error, or a random FSW anomaly.

b) Behavioral Overview. The characteristics of the occasional reboot are such that it appears to be a "real" reboot:

- IMon in NPTM goes to the level of INITMODE
- TLM "disappears" until the next command in the Encounter Script is executed
- The # scans counter (aka "secret counter") resets
- TLM Sequence Count resets
- Memory dump shows the Alt Boot Counter has been decremented

Note that the Encounter Script can issue a TGO Boot command, as can the Spacecraft's autonomy rules; these possibilities should also be considered.
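The reboot signature listed in 1.1 b) could be checked on the ground along the lines of the following minimal sketch. The field names (imon_level, seq_count, scans_count, alt_boot_counter) and the INITMODE_LEVEL value are hypothetical placeholders chosen for illustration; they are not taken from the NPTM definition. The "TLM disappears" indicator is left out because it is a gap in the stream rather than a value carried in a sample.

```python
from dataclasses import dataclass

# Hypothetical placeholder for the IMon level seen in INITMODE; the real
# value is not given in this memo.
INITMODE_LEVEL = 0


@dataclass
class Snapshot:
    """One telemetry sample plus the Alt Boot Counter from a memory dump."""
    imon_level: int
    seq_count: int
    scans_count: int
    alt_boot_counter: int


def reboot_indicators(before: Snapshot, after: Snapshot) -> dict:
    """Report which of the 1.1 b) indicators are present between two samples.

    A counter is treated as "reset" when it is lower than in the previous
    sample; a legitimate counter wrap would also look like this, so the
    result is a hint for the analyst, not proof.
    """
    return {
        "imon_at_initmode": after.imon_level == INITMODE_LEVEL,
        "seq_count_reset": after.seq_count < before.seq_count,
        "scans_count_reset": after.scans_count < before.scans_count,
        "alt_boot_decremented": after.alt_boot_counter < before.alt_boot_counter,
    }


def looks_like_real_reboot(before: Snapshot, after: Snapshot) -> bool:
    """True only when every indicator is present."""
    return all(reboot_indicators(before, after).values())
```

Requiring all indicators at once is a deliberately conservative choice; reviewing the individual flags from reboot_indicators may be more useful when only part of the signature is observed.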
1.2 Bootstrap and Alt Boot

The FSW has some status bits indicating the results of error checks and reboot operation:

* EEPROM Boot ("Alt Boot")
* Tables Copied
* TGO Detect
* IC Error ("Wayward IC")

Operation and Quirks

The EEPROM Boot and Tables Copied flags work fine.

The TGO Detect boot has the following quirk: during "initial bootup" a status word called "Tgo Reboot Occurred" (at address 044A) is overridden if there is an EEPROM ("Alt") Boot. This should nevertheless be patchable: rather than reading the flag from bootstrap processing, the software can read the 1750 Configuration Register to determine the state of the TGO Boot bit.

The situation is similar for the IC Error ("Wayward IC") bit, but it is probably not patchable, since it is an "internally detected" condition, whereas TGO Boot is a status that is sent to the 1750 microprocessor externally (from a register in the FC) and is always readable by the 1750.

Documentation will be updated appropriately, and a patch to correctly report TGO Reboot will be written.

2. MET Jump

As reported in WR-136, MET bit 23 can become randomly set (extremely rarely). We will attempt to analyze and characterize this behavior and, if it is fixable:

- analyze by hand to see whether the problem seems severe enough to cause a reboot;
- fix it and see whether the intermittent reboot recurs.

3. NPTM Words

In the course of performing these analyses, some re-testing of FSW NPTM was performed; the results are:

- All Amux channels are reported correctly in the right NPTM word locations.
- A Subsystem fail value of 0x1000 (MET late?) showed up in non-latched but not in latched?

4. Actions

The following actions are to be taken:

* Paulkovich to enter Work Requests on the FSW DR list for any non-conformances.
* Paulkovich to generate a patch for TGO Boot reporting.
* Paulkovich to update documentation to reflect operation.
* Paulkovich to investigate WR-136 and, if possible, characterize operation and generate a fix.
* Paulkovich, Huff, Tan to continue to investigate and report any anomalies.
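As an aid to the hand analysis of the MET jump (Section 2, WR-136), a recorded series of MET values could be screened for a spuriously set bit 23 with a sketch such as the one below. It rests on assumptions not stated in this memo: that bit 23 is counted from the least significant bit (1750 documentation often numbers bits from the most significant end), that MET only counts upward, and that consecutive samples are closely spaced. The function name and sample layout are hypothetical.

```python
MET_BIT = 23  # assumed to be counted from the least significant bit (LSB = 0)


def find_bit_jumps(met_samples, bit=MET_BIT):
    """Return indices where `bit` goes from clear to set without the
    lower-order bits rolling over.

    A normal carry into the bit happens only when the lower bits wrap, so
    the lower field drops toward zero. If the bit becomes set while the
    lower field keeps increasing, the set is treated as suspicious.
    """
    mask = 1 << bit
    lower_mask = mask - 1
    jumps = []
    for i in range(1, len(met_samples)):
        prev, cur = met_samples[i - 1], met_samples[i]
        newly_set = bool(cur & mask) and not bool(prev & mask)
        if newly_set and (cur & lower_mask) >= (prev & lower_mask):
            jumps.append(i)
    return jumps
```

For example, calling find_bit_jumps on the series 1000, 1001, 1002 with bit 23 set, 1003 flags the third sample, while a series that counts through 0x7FFFFF into 0x800000 is not flagged.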