Read experimental data as a GEMSEO Dataset

from __future__ import annotations

from pathlib import Path

from pandas import read_csv

from vimseo.api import activate_logger
from vimseo.tools.validation.validation_point import NominalValuesOutputType
from vimseo.tools.validation.validation_point import read_nominal_values
from vimseo.utilities.datasets import SEP

activate_logger()

Loading experimental data is necessary for model validation or calibration. In general, raw experimental data are not directly compatible with VIMSEO tools. Moreover, at least two types of experimental data can be considered:

- a collection of validation points, each defined by its nominal values;
- a collection of validation points with repeats: the experiment is repeated several times for the same nominal values. Each repeat captures some variability in the material properties and some uncertainty in the experimental set-up. Some input and output variables are measured for each repeat.
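To make the second type concrete, here is a minimal plain-pandas sketch (it does not use VIMSEO) on a small hypothetical table: the columns specimen_id, layup and max_force and all values are invented for illustration. Rows sharing the same layup string are assumed to be repeats of one validation point, so counting rows per layup tells whether the table holds repeats or a single row per validation point.

```python
import pandas as pd

# Hypothetical raw data: each row is one test (a repeat); rows sharing the
# same layup string belong to the same validation point.
raw = pd.DataFrame(
    {
        "specimen_id": [0, 1, 2, 3, 4],
        "layup": ["0_90_0", "0_90_0", "0_90_0", "45_0_135", "45_0_135"],
        "max_force": [75000.0, 73000.0, 72000.0, 85000.0, 91000.0],
    }
)

# Number of repeats per validation point.
repeats = raw.groupby("layup").size()
print(repeats.to_dict())  # {'0_90_0': 3, '45_0_135': 2}

# A single row per validation point would mean the table already holds
# nominal values only.
has_repeats = bool((repeats > 1).any())
print(has_repeats)  # True
```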

The conversion from raw to VIMSEO-compatible data should address the following requirements:

- conversion of data with repeats to nominal values, by taking the mean of the measured data for each validation point;
- presence of vectors among the measured or nominal data. For instance, experiments on composite materials generally define a stacking sequence (layup), which is an array of ply angles. We assume that vectors are specified as strings through an encoding procedure. Typically, [0.0, 5.0, 8.9] is encoded as '0.0_5.0_8.9'. The NumPy string representation of an array can also be used.
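The string encoding of vectors can be sketched with two small helpers. encode_vector and decode_vector are hypothetical names chosen for this illustration, not VIMSEO functions. The assumption that a NumPy-style string looks like '[0.  5.  8.9]' (whitespace-separated values between brackets, as produced by str() on a short array) is ours.

```python
import numpy as np


def encode_vector(values) -> str:
    """Encode a 1D sequence as an underscore-separated string (hypothetical helper)."""
    return "_".join(str(float(v)) for v in values)


def decode_vector(text: str) -> np.ndarray:
    """Decode '0.0_5.0_8.9' or a NumPy-style '[0.  5.  8.9]' string (hypothetical helper)."""
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        # NumPy string representation: values separated by whitespace.
        tokens = text[1:-1].split()
    else:
        # Underscore encoding, e.g. a layup such as '45_0_0_135'.
        tokens = text.split("_")
    return np.array([float(token) for token in tokens])


encoded = encode_vector([0.0, 5.0, 8.9])
print(encoded)  # 0.0_5.0_8.9
print(decode_vector(encoded))
print(decode_vector(str(np.array([0.0, 5.0, 8.9]))))
print(decode_vector("45_0_0_135"))
```

Integer ply angles such as '45_0_0_135' are decoded as floats, which keeps all vector components numerical.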

Let's consider a non-trivial experimental dataset containing several validation points, with several repeats each, and a vector variable, layup.

raw_file = Path("dummy_experimental_data.csv")
read_csv(raw_file, delimiter=SEP)

	Unnamed: 0	layup	nominal_width	nominal_length	nominal_thickness	nominal_diameter	width	thickness	max_force	nominal_radius
0	0	45_0_0_135_90_45_0_0_135_0_0_135_0_0_45_90_135...	32.1	180	3.74	6.35	32.09	3.72	75864.0	3.175
1	1	45_0_0_135_90_45_0_0_135_0_0_135_0_0_45_90_135...	32.1	180	3.74	6.35	31.89	3.72	73360.0	3.175
2	2	45_0_0_135_90_45_0_0_135_0_0_135_0_0_45_90_135...	32.1	180	3.74	6.35	32.02	3.75	72816.0	3.175
3	3	45_0_0_135_90_45_0_0_135_0_0_135_0_0_45_90_135...	32.1	180	3.74	6.35	31.94	3.76	75555.0	3.175
4	4	45_0_0_135_90_45_0_0_135_0_0_135_0_0_45_90_135...	32.1	180	3.74	6.35	32.05	3.72	75670.0	3.175
...	...	...	...	...	...	...	...	...	...	...
61	61	45_135_0_0_45_0_135_0_0_45_135_135_45_0_0_135_...	32.0	180	4.01	6.35	32.00	4.05	85800.0	3.175
62	62	45_135_0_0_45_0_135_0_0_45_135_135_45_0_0_135_...	32.0	180	4.01	6.35	31.98	4.05	91648.0	3.175
63	63	45_135_0_0_45_0_135_0_0_45_135_135_45_0_0_135_...	32.0	180	4.01	6.35	31.99	4.03	84606.0	3.175
64	64	45_135_0_0_45_0_135_0_0_45_135_135_45_0_0_135_...	32.0	180	4.01	6.35	32.01	4.03	82570.0	3.175
65	65	45_135_0_0_45_0_135_0_0_45_135_135_45_0_0_135_...	32.0	180	4.01	6.35	31.99	4.03	84123.0	3.175

66 rows × 10 columns

The read_nominal_values function loads the raw data and returns the nominal values. The master_name is the name of the variable that identifies the validation points: all repeats of a validation point share the same master_name value, and this value differs from one validation point to another. The additional_names are the variables added to the nominal values; their values are computed as the mean over all repeats of each validation point. If the raw data names do not match the model variable names, they can be renamed with name_remapping.

Finally, most VIMSEO tools take a GEMSEO Dataset as input. A GEMSEO Dataset is a pandas DataFrame with multi-index columns, which makes it possible to handle data by group and by component (useful for vectors). Here, the nominal values are returned as a GEMSEO Dataset. In this case, the following additional arguments are required:

- variable_names_to_group_names, a mapping from variable names to group names, which specifies the group in which each variable is placed;
- vector_names_to_group_names, a mapping from vector names to group names. Vectors are thus identified and decoded as numerical arrays in the Dataset. For example, '0.0_5.0_8.9' is stored as a variable with three components.
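The processing described above can be approximated with plain pandas. The sketch below is not the implementation of read_nominal_values; it only illustrates the idea on a hypothetical table: repeats are averaged per layup, a column is renamed, and the result is laid out with three column levels (group, variable, component). The level names GROUP, VARIABLE and COMPONENT are an assumption about how a GEMSEO Dataset labels its column levels.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: two validation points (identified by layup),
# each tested twice.
raw = pd.DataFrame(
    {
        "layup": ["0_90_0", "0_90_0", "45_0_135", "45_0_135"],
        "nominal_width": [32.1, 32.1, 32.0, 32.0],
        "max_force": [75000.0, 73000.0, 85000.0, 91000.0],
    }
)

# 1. Nominal values: mean of each additional variable over the repeats
#    of each validation point (order of first appearance is kept).
nominal = (
    raw.groupby("layup", sort=False)[["nominal_width", "max_force"]]
    .mean()
    .reset_index()
)

# 2. Rename a raw name to a model name.
nominal = nominal.rename(columns={"nominal_width": "width"})

# 3. Three column levels (group, variable, component); the encoded layup
#    is decoded into one column per ply angle.
groups = {"layup": "inputs", "width": "inputs", "max_force": "outputs"}
layups = np.array([[float(a) for a in s.split("_")] for s in nominal["layup"]])

table = {}
for component in range(layups.shape[1]):
    table[(groups["layup"], "layup", component)] = layups[:, component]
for name in ("width", "max_force"):
    table[(groups[name], name, 0)] = nominal[name].to_numpy()

# Tuple keys produce MultiIndex columns.
dataset_like = pd.DataFrame(table)
dataset_like.columns.names = ["GROUP", "VARIABLE", "COMPONENT"]
print(dataset_like)
```

With this layout, dataset_like["outputs"] selects a whole group, and the layup vector of a validation point is recovered as one row of its three component columns.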

ds = read_nominal_values(
    master_name="layup",
    csv_path=raw_file,
    additional_names=[
        "nominal_width",
        "nominal_length",
        "nominal_thickness",
        "nominal_diameter",
        "nominal_radius",
        "max_force",
    ],
    name_remapping={
        "nominal_width": "width",
        "nominal_length": "length",
        "nominal_thickness": "thickness",
        "nominal_diameter": "diameter",
        "nominal_radius": "radius",
    },
    output_type=NominalValuesOutputType.GEMSEO_DATASET,
    variable_names_to_group_names={
        "layup": "inputs",
        "nominal_width": "inputs",
        "nominal_length": "inputs",
        "nominal_thickness": "inputs",
        "nominal_diameter": "inputs",
        "nominal_radius": "inputs",
        "max_force": "outputs",
    },
    vector_names_to_group_names={"layup": "inputs"},
)

Out:

/home/sebastien.bocquet/PycharmProjects/vimseo/src/vimseo/utilities/datasets.py:376: FutureWarning:

Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`

The nominal values can be exported to a CSV file:

ds.to_csv(f"{raw_file.stem}_nominal_values.csv", sep=SEP)
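Since a GEMSEO Dataset is a DataFrame with three column levels, we assume that its CSV export writes one header row per level, as pandas does for multi-index columns. The plain-pandas sketch below, on a hypothetical table, shows that reading such a file back then requires header=[0, 1, 2] and index_col=0. It writes to an in-memory buffer and uses ';' as an arbitrary separator instead of VIMSEO's SEP.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical dataset-like table with three column levels.
columns = pd.MultiIndex.from_tuples(
    [("inputs", "layup", 0), ("inputs", "layup", 1), ("outputs", "max_force", 0)],
    names=["GROUP", "VARIABLE", "COMPONENT"],
)
table = pd.DataFrame([[0.0, 90.0, 74000.0], [45.0, 135.0, 88000.0]], columns=columns)

# Multi-index columns are written as three header rows.
buffer = io.StringIO()
table.to_csv(buffer, sep=";")
buffer.seek(0)

# Reading back requires declaring the three header rows and the index column.
restored = pd.read_csv(buffer, sep=";", header=[0, 1, 2], index_col=0)
print(restored)
```

The values survive the round trip, but the component labels are read back as strings, so a column is then addressed as ("inputs", "layup", "1") rather than ("inputs", "layup", 1).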

Total running time of the script: ( 0 minutes 0.041 seconds)

Gallery generated by mkdocs-gallery