AI leverages big data: it promises new insights from applying machine learning to datasets with more variables, longer timescales, and finer granularity than ever before.
NB: This is an extract from a McKinsey & Co. article on heavy industry, but some of the principles outlined apply across sectors.
Using months or even years’ worth of information, analytics models can tease out efficient operating regimes based on controllable variables. These insights can be embedded into existing systems, bundled into a separate advisory tool, or used for performance management.
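As a minimal sketch of what this can look like in practice, the Python snippet below bins two controllable variables from a historian extract and ranks the resulting operating regimes by energy efficiency. The file name, tag names (feed_rate, kiln_temp, energy_per_ton), sampling assumptions, and thresholds are illustrative placeholders, not details from the article.

```python
import pandas as pd

# Hypothetical historian extract: one row per hour of plant operation.
df = pd.read_csv("historian_extract.csv", parse_dates=["timestamp"])

# Bin the controllable variables to define candidate operating regimes.
df["feed_bin"] = pd.cut(df["feed_rate"], bins=10)
df["temp_bin"] = pd.cut(df["kiln_temp"], bins=10)

# Rank regimes by specific energy consumption, ignoring regimes that
# were observed too rarely to trust.
regimes = (
    df.groupby(["feed_bin", "temp_bin"], observed=True)["energy_per_ton"]
      .agg(["mean", "count"])
      .query("count >= 50")
      .sort_values("mean")
)
print(regimes.head())  # most efficient well-supported regimes first
```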
To succeed with AI, companies first need an automation environment with reliable historian data. They must then transform their big data into a form amenable to AI, often with far fewer variables and with intelligent, first principles–based feature engineering. We term the latter format “smart data” to emphasize its expert-driven approach, which improves predictive accuracy and aids in root-cause analysis.
Creating smart data
A common failure mode for companies looking to leverage AI is poor integration of operational expertise into the data-science process. Indeed, we advocate applying machine learning only after process data have been analyzed, enriched, and transformed with expert-driven data engineering. In practice, we suggest the following steps (Exhibit 1):
1. Define the process
Outline the steps of the process with experts and plant engineers, sketching out physical changes (such as grinding and heating) and chemical changes (such as oxidation and polymerization). Identify critical sensors and instruments, along with their maintenance dates, limits, units of measure, and whether they can be controlled. Finally, note the deterministic equations that govern the process (such as thermodynamic relationships or reaction stoichiometry), as well as the variables involved. The latter step should be accompanied by a literature search to expand the realm of thinking beyond the knowledge of the organization. If process expertise is limited, the use of external experts can be essential.
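A brief, hypothetical illustration of the final point: once the governing equations are known, they can be encoded directly as features. The sensor tags and the activation-energy value below are placeholders that a process expert or literature search would supply.

```python
import numpy as np
import pandas as pd

R = 8.314        # J/(mol*K), universal gas constant
E_A = 75_000.0   # J/mol, assumed activation energy taken from the literature

def add_first_principles_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Arrhenius factor: linearizes the exponential effect of reactor
    # temperature (in Kelvin) on reaction rate.
    out["arrhenius_factor"] = np.exp(-E_A / (R * out["reactor_temp_K"]))
    # Stoichiometric ratio: deviation from the ideal reactant ratio is
    # often more predictive than either flow measurement on its own.
    out["reactant_ratio"] = out["flow_reactant_a"] / out["flow_reactant_b"]
    return out
```

Features like these hand the model the known physics up front, leaving its capacity free to learn the residual behavior the equations do not capture.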
2. Enrich the data
Raw process data nearly always contain deficiencies, so the focus should be on creating a high-quality dataset rather than on maximizing the number of observations available for training. Teams should be aggressive in removing nonsteady-state information, such as the ramping up and down of equipment, along with data from unrelated plant configurations or operating regimes. Avoid generic methods for treating missing or anomalous data, such as imputing with averages, “clipping” to a maximum, or fitting to an assumed normal distribution: such transformations obscure exactly the process behavior the model needs to learn.
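The sketch below illustrates this expert-driven alternative: filter to steady-state windows and drop deficient records outright instead of imputing them. The design rate, stability cutoff, sampling interval, and tag names are assumptions a plant engineer would replace with real values.

```python
import pandas as pd

DESIGN_RATE = 120.0      # t/h, assumed nameplate throughput
STABILITY_CUTOFF = 2.0   # t/h, assumed rolling-std limit for steady state

def keep_steady_state(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Remove ramp-up/ramp-down: require throughput near the design rate
    # and low variability over the trailing hour (12 x 5-minute samples).
    at_rate = out["throughput"] > 0.9 * DESIGN_RATE
    stable = out["throughput"].rolling(12).std() < STABILITY_CUTOFF
    out = out[at_rate & stable]
    # Drop records with missing critical sensors rather than imputing:
    # averaged or clipped values hide the physics the model should learn.
    return out.dropna(subset=["throughput", "kiln_temp", "energy_per_ton"])
```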