# `Nerves.Runtime.StartupGuard`
[🔗](https://github.com/nerves-project/nerves_runtime/blob/v0.13.12/lib/nerves_runtime/startup_guard.ex#L4)

Monitor system startup and validate firmware

This module provides a default way of handling devices that fail to complete
initialization: they either revert to an earlier firmware or reboot to try
again. Enough time is given so that a device doesn't get into an undebuggable
boot loop, but it also doesn't wait forever in a state that may be just as
impossible to debug.

This is a generic default that is intended to be suitable for all use cases.
However, you will eventually find that you can do better, and you are
encouraged to replace it when ready. For example, you may want to confirm
connectivity to a firmware update server before validating a new image just
in case a change broke networking. Please investigate using alarms (via
`:alarm_handler` or `alarmist`) for aggregating these checks.
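
As a rough illustration of the alarm-based approach, the sketch below validates firmware only when no alarms are set. `AlarmGateSketch` and `validate_if_clear/2` are hypothetical names, not part of `nerves_runtime`. It assumes SASL's `:alarm_handler` is running and that your own code sets an alarm (for example, when the update server is unreachable).

```elixir
defmodule AlarmGateSketch do
  @moduledoc "Hypothetical helper: validate firmware only when no alarms are set."

  # Both arguments have defaults so the function can be tried off-device:
  # pass a list of alarms and a stub function instead of the real calls.
  def validate_if_clear(
        alarms \\ :alarm_handler.get_alarms(),
        validate \\ &Nerves.Runtime.validate_firmware/0
      ) do
    case alarms do
      # No alarms are set, so treat the system as healthy and validate.
      [] -> validate.()
      # Leave the firmware unvalidated and report what is still failing.
      active -> {:error, {:alarms_active, active}}
    end
  end
end
```

Each check then only needs to set or clear its own alarm, and the decision to validate stays in one place.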

If your Nerves system requires that new firmware images are validated, you
will need this. In other words, if you have to run
`Nerves.Runtime.validate_firmware/0` every time you upload new firmware, then
your Nerves system requires validation.

## Setup

Add the following to your project's `target.exs` or `config.exs`:

```elixir
config :nerves_runtime, startup_guard_enabled: true
```

To handle the case where Erlang starts fine but hangs before `StartupGuard` can
register itself with Erlang's heart feature, an initialization handshake is
required. The handshake must be enabled in Nerves Heart (which integrates with
Erlang heart). To do this, add the following to your project's
`rel/vm.args.eex`:

```text
## Require an initialization handshake within 10 minutes
-env HEART_INIT_TIMEOUT 600
```

## Further discussion

Here's a high-level summary of how this works:

1. On init, OTP starts up all applications. When it starts up
   `:nerves_runtime`, `StartupGuard` gets run.
2. `StartupGuard` registers a `:heart` callback. The callback is a time bomb
   that starts failing after 15 minutes.
3. `StartupGuard` gets the list of OTP applications that should be started.
   Applications marked in the Mix release to only `:load` aren't counted.
4. `StartupGuard` waits for all expected applications to start.
5. Once everything starts, `StartupGuard` validates the firmware and removes
   the `:heart` callback.
6. If anything goes wrong, `StartupGuard` logs the errors. Since the `:heart`
   callback is still registered, the system will be available for debugging,
   but it will eventually reboot.
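
The waiting part of this sequence (steps 3 and 4) can be sketched roughly as follows. This is not the module's actual implementation: `StartupWaitSketch` is a hypothetical name, the retry count and interval are arbitrary, and the list of expected applications is passed in by the caller rather than derived from the release.

```elixir
defmodule StartupWaitSketch do
  @moduledoc "Hypothetical sketch of waiting for applications to start."

  # Polls until every app in `expected` is running. Gives up after `retries`
  # further attempts, sleeping `interval_ms` milliseconds between attempts.
  def wait_for_apps(expected, retries \\ 60, interval_ms \\ 1_000) do
    case expected -- started_apps() do
      [] ->
        :ok

      missing when retries <= 0 ->
        {:error, {:not_started, missing}}

      _missing ->
        Process.sleep(interval_ms)
        wait_for_apps(expected, retries - 1, interval_ms)
    end
  end

  # Names of the applications that are currently running.
  defp started_apps do
    for {app, _description, _version} <- Application.started_applications(), do: app
  end
end
```

On success, the caller would validate the firmware and clear the `:heart` callback. On `{:error, _}`, it would log the missing applications and leave the callback in place.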

One nice alteration to this is to leave the `:heart` callback in place, but
have it check some kind of "system ok" flag. If you do this, keep in mind
that the callback is unforgiving of errors and of function calls that take
too long. Making it too complicated can backfire and cause inadvertent
reboots. Rebooting too quickly on errors can impact your ability to debug
partial failures. If using this code as a template, try to keep your code in
a `Task`, or change it to a `GenServer` or anything else that can be
supervised. Decoupling the checks into alarms is another nice pattern.
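
A minimal sketch of the "system ok" flag idea is shown below. `SystemOkSketch` and its functions are hypothetical. It assumes the flag changes rarely, which is what makes `:persistent_term` a reasonable store, and that a missing flag should count as healthy. The callback does a single lookup so that it stays fast and cannot raise.

```elixir
defmodule SystemOkSketch do
  @moduledoc "Hypothetical system-ok flag checked from a :heart callback."

  @key {__MODULE__, :system_ok}

  # Called by supervised code elsewhere when health checks pass or fail.
  def mark_ok, do: :persistent_term.put(@key, true)
  def mark_failed, do: :persistent_term.put(@key, false)

  # The :heart callback. Anything other than :ok makes heart stop sending
  # heartbeats, which eventually reboots the device. A missing flag is
  # treated as healthy (an assumption of this sketch).
  def heart_check do
    if :persistent_term.get(@key, true), do: :ok, else: :error
  end

  # Registers the callback. This only works when the VM was started with
  # heart enabled, since the call goes to the running heart process.
  def install, do: :heart.set_callback(__MODULE__, :heart_check)
end
```

Keeping the health checks in separate supervised processes that call `mark_ok/0` or `mark_failed/0` means a slow or crashing check never runs inside the callback itself.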

## Troubleshooting

1. If you see the log message about exceeding the number of retries for
   getting the firmware validation status, then
   `Nerves.Runtime.firmware_validation_status/0` is returning `:unknown`. This
   is probably because the Nerves system's `fwup.conf` does not
   initialize `<slot>.nerves_fw_validated` to `0` (or `1` if always valid).
2. If the device falls back to the previous firmware without leaving logs, try
   installing `ramoops_logger` to capture log messages that don't make it to
   disk.
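
To look into the first case, you can inspect the values involved from the device's IEx prompt. This is only a diagnostic sketch: it must run on a device with `nerves_runtime` loaded, and it assumes the validation flag is stored under the key named above for the active slot.

```elixir
# Run at the IEx prompt on the device.

# The status that StartupGuard retries on; :unknown points at fwup.conf.
Nerves.Runtime.firmware_validation_status()

# The raw flag for the active firmware slot, or nil if it was never set.
Nerves.Runtime.KV.get_active("nerves_fw_validated")

# Which slot is currently active.
Nerves.Runtime.KV.get("nerves_fw_active")
```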

# `child_spec`

Returns a specification to start this module under a supervisor.

`arg` is passed as the argument to `Task.start_link/1` in the `:start` field
of the spec.

For more information, see the `Supervisor` module,
the `Supervisor.child_spec/2` function and the `t:Supervisor.child_spec/0` type.

---

*Consult [api-reference.md](api-reference.md) for complete listing*
