Call for datasets: complex mixtures and artificial intelligence


Biomass is the most sustainable and abundant renewable resource available to replace fossil-oil-based raw materials. Biomass refining is, however, still very challenging: it often yields complex mixtures whose composition is poorly reproducible because the biomass source itself varies. High-value application domains such as materials science nevertheless require reproducible starting products to ensure reliable material properties. Moreover, the better defined the building blocks obtained from biomass, the easier the material development. It is therefore important to find the biomass refinery procedures that are most efficient, that yield mixtures enriched in the desired building blocks, and/or that can be run reproducibly. This is difficult because many parameters play a role and their influence is often not fully understood: variability of the biomass source, the place where the biomass was cultivated, biomass preprocessing (e.g. washing, milling, extraction, …), biomass conversion strategies (e.g. catalytic hydrogenation of lignin, selective solubilization processes, …), biomass separation and purification, …

The advancement of knowledge in artificial intelligence is unlocking new potential in the prediction and modelling of several (bio)chemical processes and of structure-property relationships. Artificial intelligence has already been used in relation to biomass, for example:

  • to predict the thermal properties of biomass used for energy generation[1]
  • to characterize biomass[2]
  • to find the reaction conditions that tune the composition of the complex mixture[3],[4]
  • to find suitable processing parameters for biomass preprocessing[5]

To develop an artificial intelligence model, the model has to be trained on a dataset of existing experimental data. This is where artificial intelligence often hits its limits, since small datasets (even a few hundred datapoints count as small) do not give reliable predictions. Although huge datasets with millions of datapoints exist in biology and the medical sciences, they are scarce in materials science. In practice, generating huge datasets in a lab environment is not straightforward (it is laborious and constrained by manpower, raw material costs, …) unless the experiments are performed in a high-throughput manner (e.g. Flamac). To make artificial intelligence models reliably applicable in materials science, Maastricht University has developed an algorithm that can deal with small datasets.[6] In that article, a dataset of adhesive foam tape formulations containing fewer than 50 datapoints was used. In the experimental datasets used to train the model, three experimental parameters (the ratio of three building blocks) were varied while the processing parameters were kept constant, in order to study the effect on the properties of the adhesives (90° peel strength on steel and elongation at break). This example shows what is possible with the algorithm at its limits. If, however, the same 50 datapoints had involved 20 input variables instead of three, a much larger dataset would have been needed to draw reliable conclusions.
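Training on a very small dataset can be illustrated with a short sketch. The Python code below assumes, based only on the title of [6], that predictions are stabilised by averaging many simple models, each fitted on a random subset of the data. It is not the authors' implementation: the linear model, the function names and the synthetic data are hypothetical stand-ins chosen for illustration.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit of y ~ X @ w + b; returns [w..., b]."""
    A = np.column_stack([X, np.ones(len(X))])  # append intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def average_model(X, y, n_models=200, train_fraction=0.8, seed=0):
    """Average the coefficients of many linear fits on random subsets.

    Assumption: this ensemble-averaging scheme is only a stand-in for
    the small-data algorithm mentioned in the text.
    """
    rng = np.random.default_rng(seed)
    # Each subset needs at least as many points as model parameters.
    n_train = max(X.shape[1] + 1, int(train_fraction * len(X)))
    coefs = []
    for _ in range(n_models):
        idx = rng.choice(len(X), size=n_train, replace=False)
        coefs.append(fit_linear(X[idx], y[idx]))
    return np.mean(coefs, axis=0)


def predict(coef, X):
    """Evaluate the (averaged) linear model on new inputs."""
    return np.column_stack([X, np.ones(len(X))]) @ coef


# Synthetic stand-in data (NOT from the article): 40 datapoints,
# three hypothetical input variables, one noisy target property.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(40, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 1.0 + rng.normal(0.0, 0.1, size=40)

coef = average_model(X, y)
print("averaged coefficients (w1, w2, w3, intercept):", coef)
print("prediction for a new point:", predict(coef, np.array([[0.2, 0.3, 0.5]])))
```

With three inputs, this linear model has only four parameters to estimate from the data. With 20 inputs it would have 21, which is one way to see why the same number of datapoints would support far less reliable conclusions.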

The Green-Chem network is looking for experimental datasets related to biomass to which artificial intelligence techniques can be applied. The final aim is to accelerate biomass refinery so that biomass can be implemented as a raw material for different applications, e.g. in materials science, pharmaceutical sciences and energy.[7]

How to participate?

If you have an experimental dataset available, you can contact the Green-Chem network to discuss the suitability of the dataset. We are also open to starting projects that generate useful datasets. Preferably, a discussion takes place before the experiments start, to ensure the best fit for developing an artificial intelligence model.

In the first instance, a feasibility project will be carried out by a master's student at Maastricht University specializing in artificial intelligence, supervised by both a chemist with knowledge of artificial intelligence and an expert in artificial intelligence. After this preliminary project, further funding opportunities will be evaluated depending on the results.



[1] O. Olatunji, S. Akinlabi, N. Madushele, Application of artificial intelligence in the prediction of thermal properties of biomass, in Valorization of biomass to value-added commodities, p. 59-91

[2] U. Ahmed, P. Andersson, T. Andersson, E.T. Aparicio, H. Baaz, S. Barua, A. Berström, D. Bengtsson, D. Orisio, J. Skvaril, J. Zambrano, A Machine Learning Approach for Biomass Characterization, Energy Procedia, 158, 1279-1287 (2019)

[3] A.Y. Mutlu, O. Yucel, An artificial intelligence based approach to predicting syngas composition for downdraft biomass gasification, Energy, 165, 895-901 (2018)

[4] A.G. Adeniyi, J.O. Ighalo, G. Marques, Utilisation of machine learning algorithms for the prediction of syngas composition from biomass bio-oil steam reforming, International Journal of Sustainable Energy, 9(1) (2020)

[5] Artificial intelligence helps improve biomass preprocessing,

[6] D.E.P. Vanpoucke, O.S.J. van Knippenberg, K. Hermans, K.V. Bernaerts, S. Mehrkanoon, Small data materials design with machine learning: When the average model knows best, Journal of Applied Physics, 128(5), 054901 (2020)