A Running Data Stream


Here we consider another real-life example: taking a stream of monitoring data from an operational experiment, and using that past data to set limits on future behavior.


The data are from some low-voltage DC power supplies in use at the MINOS experiment. These are pretty sophisticated beasts, and can be controlled and read out over a network (via the CANbus protocol, if you're curious). These files are the results of a test run of our in-development monitoring system. Every so often, a program reads all the voltages, and the current drawn at each voltage, from each of a number of different supplies. A line is written to a text file for each voltage/current pair read on each unit. The first part of the line is a timestamp (complete with an offset in hours from GMT), followed by the crate ID, voltage channel ID, voltage, and current.
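To get started, here is a minimal Python sketch of a line parser. The field order follows the description above, but the whitespace separation, the position of the GMT offset, and the strptime pattern are assumptions; check them against the real files and adjust.

    import datetime

    def parse_line(line):
        """Parse one monitoring record into usable types.

        Assumes whitespace-separated fields in the order described
        above: timestamp, GMT offset in hours, crate ID, channel ID,
        voltage, current.  The timestamp layout here is a guess and
        will likely need adjusting to match the actual files.
        """
        fields = line.split()
        when = datetime.datetime.strptime(fields[0] + " " + fields[1],
                                          "%Y-%m-%d %H:%M:%S")
        offset = int(fields[2])                      # hours from GMT
        when -= datetime.timedelta(hours=offset)     # normalize to GMT
        crate, channel = fields[3], fields[4]
        voltage, current = float(fields[5]), float(fields[6])
        return when, crate, channel, voltage, current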


Your mission: parse this data into something usable and find a set of criteria which will tell us when something interesting has happened. Examine the history for each channel and crate, and identify any weirdnesses. There are a number of them; you'll want some automated way of finding them to save a lot of eyeballing, but of course graphing the data and staring at it is the first step towards building such a process. Figure out a set of monitoring criteria, either global or tailored to each individual channel. Things we might want to be warned about:


Test these criteria by pretending that the existing data is new and running it through a program which flags the interesting events. Show me some graphs of the readings vs. time to illustrate when things happened and what was going on.
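One simple way to build such a flagger, as a sketch: compute each channel's own baseline (the mean and standard deviation of its history) and flag any reading more than some number of sigma away. The record tuples match the parser sketch above, and the 5-sigma default is purely illustrative; tune it once you've looked at the distributions.

    import collections
    import statistics

    def find_outliers(records, n_sigma=5.0):
        """Return the readings far from their channel's own baseline.

        records: iterable of (when, crate, channel, voltage, current)
        tuples, e.g. from parse_line() above.
        """
        by_channel = collections.defaultdict(list)
        for rec in records:
            by_channel[(rec[1], rec[2])].append(rec)

        flagged = []
        for recs in by_channel.values():
            volts = [r[3] for r in recs]
            mean = statistics.mean(volts)
            sigma = statistics.pstdev(volts)
            for rec in recs:
                if sigma > 0 and abs(rec[3] - mean) > n_sigma * sigma:
                    flagged.append(rec)
        return flagged

Plotting each channel vs. time with the flagged points marked (matplotlib is handy here) is a quick check on whether the cut catches real events or just noise.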


Are there any trends with time we should take into account when doing this?
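A slow drift, for example, would inflate a channel's standard deviation and mask real steps, so one option is to fit and subtract a linear trend per channel before applying a fixed cut. A sketch, assuming numpy is available and the same record tuples as above:

    import numpy as np

    def detrend(records):
        """Least-squares linear fit of voltage vs. time for one channel.

        records: time-sorted (when, crate, channel, voltage, current)
        tuples for a single channel.  Returns the slope in volts/day
        and the detrended residuals, which can then be fed to a cut
        like find_outliers() above.
        """
        t0 = records[0][0]
        days = np.array([(r[0] - t0).total_seconds() / 86400.0
                         for r in records])
        volts = np.array([r[3] for r in records])
        slope, intercept = np.polyfit(days, volts, 1)
        residuals = volts - (slope * days + intercept)
        return slope, residuals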


A practical consideration: how easy would your set of criteria be to maintain over time, as hardware is swapped around?
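One way to keep the criteria maintainable is to move all the per-channel numbers out of the code and into a small data file keyed by crate and channel ID, so a hardware swap means editing one line rather than the program. A sketch; the file name and column layout here are hypothetical:

    import csv

    def load_limits(path):
        """Read per-channel limits keyed by (crate, channel).

        Expects a CSV file such as (made-up numbers):
            crate,channel,vmin,vmax,imax
            2,0,4.9,5.1,3.0
        """
        limits = {}
        with open(path) as f:
            for row in csv.DictReader(f):
                limits[(row["crate"], row["channel"])] = (
                    float(row["vmin"]), float(row["vmax"]),
                    float(row["imax"]))
        return limits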


Data files:


(That's ½ the total number of power supplies in use at Soudan; at Fermilab, there are several times as many units.)