What are the applications of CRISP-DM?

The Cross Industry Standard Process for Data Mining (CRISP-DM) has been identified as an approach that further standardises the measurement and verification (M&V) methodology and allows more efficient estimation of energy savings. The application of CRISP-DM to M&V can be illustrated phase by phase through a case study, as follows −

Business Understanding − A biomedical manufacturing facility was selected as a case study to establish the feasibility of applying data mining (DM) to support M&V. A sound understanding of the business under analysis was important for interpreting the results at the modelling and evaluation phases of the process. This understanding was gained by carrying out a process walk-through and by studying process flow diagrams and piping and instrumentation diagrams.

Knowledge of the systems within the boundary of analysis was needed from this phase, and further issues were clarified with the facility’s engineering team. The boundary of the analysis was the electrical energy consumption across the whole manufacturing facility.

Data Understanding − The data understanding phase of the CRISP-DM reference model was completed by investigating the data technology infrastructure at the facility. This gave an understanding of how energy consumption data flows through the facility and of the databases in which it is stored.

Data Preparation − Energy consumption data is difficult to work with because of the nature of the metering. Cumulative meters are used for electrical energy, so the output data has to be pre-processed. In the case under investigation, this pre-processing was carried out before the data was output to the user.

Despite this pre-cleansing, outliers remained in the data set because the pre-cleansing procedure did not eliminate all anomalies. The data preparation phase was therefore used to remove the remaining outliers from the data set delivered to the user.
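The removal of remaining outliers can be sketched as follows. The source does not say which outlier rule was used, so the interquartile range (IQR) rule below is an assumption chosen for illustration, and the function name and sample loads are made up.

```python
from statistics import quantiles


def remove_outliers_iqr(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the IQR rule stands in for whatever anomaly
    filter the case study actually applied.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Illustrative (invented) average loads in kW, with two obvious anomalies.
loads_kw = [410.0, 405.0, 398.0, 412.0, 9999.0, 407.0, 401.0, 0.0, 409.0]
cleaned = remove_outliers_iqr(loads_kw)
print(cleaned)
```

In this invented sample the readings 9999.0 and 0.0 fall outside the fences and are dropped, while the order of the remaining values is preserved.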

Two data sources were used to gather the data needed for a complete analysis of the electrical energy consumers on-site − an energy management application and a wind turbine management application.

The electrical energy consumed on-site is measured by cumulative kilowatt-hour (kWh) meters. Pre-processing of this data involved detecting outliers generated by meter errors and transforming the data from kWh to average electrical loads in kilowatts (kW). The second step was needed so that all data could be analysed in the same format and units.
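A minimal sketch of these two pre-processing steps is given below. It assumes equally spaced readings, and the 15-minute interval, the `max_kw` plausibility limit, the function name and the sample readings are all hypothetical, not taken from the case study.

```python
def cumulative_kwh_to_avg_kw(readings_kwh, interval_hours=0.25, max_kw=None):
    """Convert cumulative kWh readings to average kW per interval.

    Energy used in an interval is the difference between consecutive
    readings; dividing by the interval length in hours gives average kW.
    Intervals that look like meter errors are returned as None:
      - a negative difference (a cumulative meter should never decrease)
      - an average load above max_kw, if a limit is supplied
    """
    loads = []
    for prev, curr in zip(readings_kwh, readings_kwh[1:]):
        delta_kwh = curr - prev
        avg_kw = delta_kwh / interval_hours
        if delta_kwh < 0 or (max_kw is not None and avg_kw > max_kw):
            loads.append(None)  # flagged as a suspected meter error
        else:
            loads.append(avg_kw)
    return loads


# Invented readings: the fourth value (1150.0) simulates a meter glitch.
readings = [1000.0, 1100.0, 1200.0, 1150.0, 1400.0]
loads_kw = cumulative_kwh_to_avg_kw(readings, interval_hours=0.25, max_kw=800.0)
print(loads_kw)
```

Note that the interval following a faulty reading is flagged as well, because its difference is computed against the faulty value.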

Modelling − The dataset output from the data preparation phase was in a clean and usable format as a result of the data cleansing carried out. For the goals of this case study, the compressed air load was selected as the quantity to be modelled, as it was the most suitable variable for demonstrating the power of the available energy data.

When the load was considered at a high level, there was no clear and apparent correlation with the other essential energy users on-site. The other essential energy users were more predictable because of the scheduling of supplies and the presence of standard operating procedures.
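One simple way to check for such a correlation is the Pearson coefficient between two load series. The sketch below is illustrative only: the source does not state how correlation was assessed, and the variable names and load values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Invented average loads in kW for the same time intervals.
compressed_air_kw = [120.0, 95.0, 140.0, 110.0, 130.0, 90.0]
other_user_kw = [300.0, 310.0, 305.0, 295.0, 300.0, 308.0]

r = pearson(compressed_air_kw, other_user_kw)
print(f"Pearson r = {r:.2f}")
```

A value of r close to 0 suggests no linear relationship, while values near +1 or -1 suggest a strong one. A low r does not rule out a non-linear or time-lagged dependence.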