Skip to content
This repository was archived by the owner on Sep 1, 2024. It is now read-only.

Latest commit

 

History

History
340 lines (265 loc) · 18.9 KB

File metadata and controls

340 lines (265 loc) · 18.9 KB

System Manual

Applicable Version

This manual described the design, behaviour and configuration of ReatMetric version 1.0.x.

System Overview

ReatMetric is a Java-based software infrastructure for the implementation of Monitoring & Control (M&C) systems, with a strong focus on the space domain. ReatMetric components provide a simple but efficient implementation of the typical functions used in an M&C system.

Introduction

A Monitoring and Control (abbreviated as M&C) system is an application used to:

  • retrieve and process information of the status of hardware and software devices in a given context,

  • allow control and commanding of such devices,

  • present such information to human users, operators and other software systems,

  • detect non-nominal situation and raise alarms accordingly,

  • schedule operations and activities to be executed on the devices,

  • record the most important data for subsequent analysis.

ReatMetric is designed as a modular M&C framework, whereas each module follows the Java Platform Module System specification (https://en.wikipedia.org/wiki/Java_Platform_Module_System). Each module provide a specific set of functionalities, data structures and interfaces that can be used by other modules. While some modules provide data structures and interfaces that are at the foundation of the framework, other modules can be replaced in isolation.

The modules that form the foundation of the ReatMetric system are shown in the following figure.

ReatMetric Foundation Modules and dependencies

Monitoring and Control Concepts

In the ReatMetric world, the environment to be monitored and controlled is seen as a hierarchical decomposition, starting from a top level object, further decomposed into groups, systems, sub-systems, devices and so on. For instance, the monitoring of a domotic system for a home automation system could be hierarchically arranged as follows:

  • My house

    • Blinders

      • Bedroom 1

        • Window Roller 1

          • Engine (device)

          • Control Panel (device)

        • Window Roller 2

          • …​

      • Bedroom 2

        • Window Roller

          • …​

      • Kitchen

        • Window Roller

          • …​

      • Bathroom

        • Window Roller

          • …​

    • Appliances

      • Kitchen

        • Dishwasher (device)

        • Oven (device)

        • Microwave (device)

        • Fridge (device)

      • Bathroom

        • Washing Machine (device)

        • Dryer (device)

      • …​

    • Heating System

      • …​

    • Light System

      • …​

    • Network

      • …​

The arrangement of the hierarchy is sometimes obvious to design, while in other cases can be more a matter of taste. In the example above, for instance, the hierarchy is a functional hierarchy, based on the functional elements present in the domotic system. However, a hierarchical breakdown based on the rooms of the house could have been an alternative. What it is not a matter of taste are the physical objects (labelled as 'device'), which are ultimately monitored and controlled. Such objects are characterised by:

  • A readable state - i.e. a set of parameters - with related values, which you can actively monitor and, in some cases, also change. For instance, a television has the following state parameters: if it is on or off, selected input source, selected program, current volume, if is muted or not, version of the installed firmware…​ Some parameters can be also set (e.g. changing the program or muting/unmuting the device), while some others are read-only (e.g. the version of the installed firmware).

  • A commanding interface - i.e. a set of defined activities - that specify what you can do with the device. For instance, an oven can have a way to request the start of the self-cleaning. Depending on the protocol exposed by the device, the lifecycle of activity executions can be monitored (e.g. the device informs that the operation has been accepted and, after a while, that the operation has been completed - with success or with error).

  • A way to signal when something happened - i.e. an event - which is relevant for the functioning purpose of the device. For instance, a television might signal when a new version of the firmware is detected.

ReatMetric concepts are derived from this general device characterisation, as described above. As it is a M&C framework oriented to the space domain, such concepts are actually a re-elaboration of the M&C concepts reported in the ECSS standard ECSS-E-ST-70C (https://esastar-publication.sso.esa.int/api/filemanagement/download?url=emits.sso.esa.int/emits-doc/ESOC/1-6223/ECSS-E-ST-70-31C(31July2008).pdf). ReatMetric manages a hierarchical tree composed of so-called System Entities. A System Entity (at any level of the hierarchy) is characterised by parameters, activities and events. Some system entities are mapped to actual devices, while others are introduced only as 'containers', to partition large systems in subsystems.

Parameters

A parameter is a property containing a value. A parameter defines a so-called raw type and an engineering type:

  • The raw type is the type of the value that will be reported for that parameter, for further processing. For instance, the status of the mute/unmute parameter of a television might be reported by the television interface as 0 if the TV is muted, and 1 if it is unmuted. In such case, the raw type of the 'muted status' parameter is an unsigned integer. The value as reported by the TV takes the name of source value.

  • The engineering type is the type of the value that will be reported after the processing. For instance, it might be desirable to have a mapping between the value 0 to the string 'MUTED' and the value 1 to the string 'UNMUTED'. The function mapping the source value to the engineering value takes the name of calibration function.

The value of a parameter might be considered valid depending on certain conditions. For instance, the current value of the TV volume level should be irrelevant, if the TV is muted. The parameter mapping the TV volume level will have therefore a validity condition, linked to the parameter value of the mute/unmute parameter. Parameters that are not valid are not calibrated.

Each parameter may define a set of monitoring checks, i.e. conditions that are evaluated again the engineering value of the parameter every time a new engineering value of that parameter is available. If the value is not satisfying the defined check, the parameter alarm state will change to a non-nominal state. Each check may define an applicability condition, which is evaluated to understand if the related check must be verified or not. Parameters that are not valid are never checked.

There is a type of parameters, for which the source value is not retrieved by devices, but it is computed internally by ReatMetric, based on an arithmetic/algorithmic expression. Such parameters are called synthetic parameters: the source value of synthetic parameters is recomputed every time one of the dependant parameters is updated.

Activities

An activity is the definition of an operation that can be performed by a given system entity. It is an abstract representation of an action.

An activity is characterized by a set of named arguments, each having a raw type and an engineering type. As per parameters, a de-calibration function may be present for each argument, to convert the argument value provided as engineering value into a source value, which is then provided to the underlying layer that implements the execution of such activity. Properties can also be defined.

Once invoked, an activity occurrence is created and dispatched to the specific ReatMetric driver for implementation. Upon its creation, an activity occurrence starts its lifecycle. Such lifecycle in ReatMetric is defined by the following states:

  • CREATION: this state is assigned upon creation of the activity occurrence.

  • RELEASE: this state is reached when the activity occurrence is prepared for release.

  • TRANSMISSION: this state is reached as soon as the activity occurrence leaves the ReatMetric system.

  • SCHEDULING: this state is reached when the activity occurrence will be executed by the final destination at a given point in time.

  • EXECUTION: this state is reached when the activity occurrence is under execution by the final destination

  • VERIFICATION: this state is reached when the effects of the activity occurrence can be verified at parameter level in the ReatMetric processing model.

  • COMPLETED: this state is reached when no further state updates are expected. The activity occurrence lifecycle is considered complete.

In order to transition from one state to the other, verification stages about the progress of the activity occurrence are announced to the ReatMetric processing model, internally by ReatMetric itself and by the specific driver that manages the implementation of the activity occurrence. A verification stage can be considered as a concrete step in the progress of an activity occurrence. Each verification stage is announced and further updated by means of reports. Typically, such reports depend on the way the transmission protocol is defined and in the specific lifecycle of the activity implementation. Each report specifies: the name of the verification stage, the activity state it belongs to; the estimated/exact execution time of the activity occurrence; the activity state the activity occurrence shall transition to, upon processing the report (if the report is successful); an optional result.

The report of a verification stage can have one of the following states:

  • UNKNOWN: The stage state is unknown and no better prediction can be done

  • EXPECTED: The stage should be concluded, but no confirmation of the stage was received yet

  • PENDING: The stage is reported as currently open and the system is waiting for its confirmation

  • TIMEOUT: The timeout linked to the stage expired

  • OK: The stage is reported as successfully executed

  • FAIL: The stage is reported as failed, but this failure is not fatal for the execution of the activity occurrence

  • FATAL: The stage is reported as failed, the activity occurrence shall be considered completed

  • ERROR: This specific state is reported in relation to a verification expression, linked to the activity occurrence state VERIFICATION when the expression cannot be evaluated due to expression’s errors

An important state of an activity occurrence lifecycle is represented by the VERIFICATION state: in such state, ReatMetric will evaluate a specific expression. Such expression, if defined, is based on parameters and activity occurrence arguments, and it is used to verify that the effects of the activity occurrence execution are actually visible in the monitored system, regardless of any execution report possibly generated by the end system.

Events

An event is used to indicate the occurrence of a situation. Differently from activity occurrences and parameters, events do not have a state as such, because their lifetime is, theoretically speaking, istantaneous. For instance, an event could be raised by a television to indicate the presence of a new firmware version. It could be that the same information is delivered by a parameter as well (e.g. a parameter named "Latest available firmware version"): the change in value of this specific parameter triggers the event. In more correct terms, the object generated by ReatMetric should be called event report rather than just event: the terminology has been used to align it to the ECSS standard.

An event has a qualifier and a source, and it is defined by a severity and a type. Events in ReatMetric can include a generic report object, which contains additional information about the raised event.

In ReatMetric events can be reported by the drivers (so-called reported events), or can be detected autonomously by the ReatMetric processing model, by evaluating conditions upon change of parameters (so-called TM-based events).

Connectors, activity handlers and routes

In order to receive data from monitored devices and to send commands to such devices, ReatMetric needs to establish data connections to them. Typically, a device allows a connection using a specific network/transport protocol, which is used to exchange data using an application protocol. For instance:

  • you can connect using a TCP/IP connection, which it is used to exchange binary or ASCII-based Protocol Data Units (PDUs);

  • you can connect using a TCP/IP connection, which it is used to exchange HTTP(s) requests/responses;

  • you can query the device using SNMP GET requests and send commands via SNMP SET requests;

  • you can connect via a serial port, which it is used to exchange binary or ASCII-based PDUs.

The way of connecting and exchanging messages to the device is device-dependent. Even the number of connections is device-dependent. Some devices might deliver monitoring data using a TCP/IP port and receive control requests and commands using a separate TCP/IP port.

In ReatMetric, a connector is used to represent and manage a logical connection between the ReatMetric system and the target device. For every device, you may have one or more connectors. Via the connector interface, you can start, stop, abort, initialise and monitor the logical connection to the target device. A logical connection can be mapped to a single physical connection to the device, or to multiple physical connections, even to different devices. This depends on the way the support to the device is implemented (see section [_drivers]).

The state of a logical connection determines the availability of a so-called route. A route is a uniquely identified path from the ReatMetric system to the target device. Every parameter or event received from a given route reports also the incoming route as part of its state. In the same way, when an activity occurrence is dispatched to an activity handler, the outgoing route must be indicated. If the route, whose state is reported by the corresponding activity handler, is not available, then the dispatch of the activity occurrence fails.

Even if there is typically a logical link between a connector and a specific route, this link is not enforced by ReatMetric directly. In fact:

  • A connector can manage one or more logical connections;

  • A logical connection is mapped to zero or more physical connections;

  • A route is declared as managed by one activity handler;

  • An activity handler manages one or more routes;

  • A route can be logically mapped to one or more logical connections by the activity handler that manages the route.

For the ReatMetric architecture, all this management happens inside the so-called drivers (see section [_drivers]).

Raw Data

In ReatMetric, a raw data is an semi-opaque data object, with the meaning that ReatMetric does not know the internal structure or data of the object, but only some meta-data attributes, such as the originating source, the route, the generation and reception time, and so on. Any module in the ReatMetric architecture can store and retrieve raw data.

For instance, raw data can be used to distribute and optioanlly store low level PDUs that are exchanged at connection level between ReatMetric and the target devices. In the same way, it can be used to store changes of state or internal states of modules and drivers, as required.

Drivers

A driver is the object that connect ReatMetric with an external device or system. A driver is responsible for:

  • Managing and providing a set of activity handlers, which are the entry point for activity dispatching by the ReatMetric system;

  • Managing and providing a set of connectors, which are used to manage the connections between ReatMetric and the external device or system;

  • Managing the specific protocols and formats to retrieve parameters and events from the external device or system, and ingest such data into the ReatMetric processing model;

  • Managing and providing the raw data interpreters, i.e. renderers that can be used to display raw data internals, as made visible by the driver;

  • Providing internal state information, for debug and monitoring purposes.

The lifecycle of a driver is pretty simple:

  • A driver is instantiated and initialised by the ReatMetric Core module, by invoking the initialise(…​) method. This method contains several arguments, among others the name and configuration of the drivers, and the context of the ReatMetric Core instance.

  • The driver is enquired to know the available activity handlers, raw data renderers and transport connectors. Each of this elements is then used accordingly by the relevant ReatMetric modules, i.e. the activity handlers are used by the processing model when dispatching activity occurrences; the renderers are used by the MMI; the transport connectors are used by the ReatMetric Core and by the MMI to activate and manage the external connections.

  • At ReatMetric Core instance shutdown, the dispose() method is invoked, to perform clean-up activities and to release the resources.

Once a driver is disposed, it is never re-used. Therefore, the initialise(…​) and dispose() methods are called only once during a single ReatMetric Core instance lifecycle.

In the initialise(…​) method, a reference to a context object is provided. Such object allows the driver to reach the different functions made available by the ReatMetric system:

  • The archive, if available;

  • The scheduler, if available;

  • The processing model;

  • The raw data broker;

  • The operational message broker;

  • The ReatMetric system main interface.