Generating synthetic model output data for examples, tests or mock studies.

Problem

As a developer, you want to generate reference data for examples or tests while controlling the bias of the data relative to the model output. As a user, you want to generate synthetic experimental data that can be used to validate or verify a model.

Solution

A utility function is provided to generate synthetic reference data from a model and either:

  • an input dataset or
  • a parameter space to sample from.

Example

In the example below, synthetic reference data is generated for an analytical beam model of a bending test, and a bias is added to the data so that the error metrics in the validation case are non-zero.

from gemseo.datasets.io_dataset import IODataset
from numpy import atleast_1d

from vimseo import EXAMPLE_RUNS_DIR
from vimseo.api import create_model
from vimseo.core.model_settings import IntegratedModelSettings
from vimseo.tools.space.space_tool import SpaceTool
from vimseo.utilities.datasets import SEP
from vimseo.utilities.generate_validation_reference import Bias
from vimseo.utilities.generate_validation_reference import (
    generate_reference_from_dataset,
)

Load a parameter space to sample from.

space_tool_result = SpaceTool.load_results("bending_test_validation_input_space.json")
print(space_tool_result)

Out:

/home/sebastien.bocquet/PycharmProjects/vimseo/.tox/doc/lib/python3.11/site-packages/pydantic/main.py:209: DeprecationWarning:

Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)

SpaceToolResult(metadata=ToolResultMetadata(generic={'datetime': '24-01-2025_21-20-26', 'version': '1.0.1.dev19+gef64ff85.d20250121'}, misc={}, settings={'distribution_name': 'OTTriangularDistribution', 'space_builder_name': 'FromModelCenterAndCov', 'minimum_values': None, 'maximum_values': None, 'center_value_expr': '0.5*(mini+maxi)', 'use_default_values_as_center': True, 'variable_names': ['length', 'width', 'height', 'imposed_dplt', 'young_modulus', 'nu_p'], 'center_values': None, 'cov': 0.05, 'truncate_to_model_bounds': True, 'lower_bounds': None, 'upper_bounds': None}, report={}, model=None), parameter_space=Parameter space:
+---------------+-------------+--------------------+-------------+-------+--------------------------------------------------------------------------+--------------------+
| Name          | Lower bound |       Value        | Upper bound | Type  |                           Initial distribution                           | Transformation(x)= |
+---------------+-------------+--------------------+-------------+-------+--------------------------------------------------------------------------+--------------------+
| nu_p          |    0.285    | 0.3002399751580314 |    0.315    | float |   Triangular(lower=-1000000000000.0, mode=0.3, upper=1000000000000.0)    |      Trunc(x)      |
| imposed_dplt  |    -5.25    | -4.99955553660073  |    -4.75    | float |   Triangular(lower=-1000000000000.0, mode=-5.0, upper=1000000000000.0)   |      Trunc(x)      |
| young_modulus |    199500   | 209999.9992855908  |    220500   | float | Triangular(lower=-1000000000000.0, mode=210000.0, upper=1000000000000.0) |      Trunc(x)      |
| length        |     570     |  599.999950346782  |     630     | float |  Triangular(lower=-1000000000000.0, mode=600.0, upper=1000000000000.0)   |      Trunc(x)      |
| height        |      38     | 39.99977464700989  |      42     | float |   Triangular(lower=-1000000000000.0, mode=40.0, upper=1000000000000.0)   |      Trunc(x)      |
| width         |     28.5    | 29.99955343520399  |     31.5    | float |   Triangular(lower=-1000000000000.0, mode=30.0, upper=1000000000000.0)   |      Trunc(x)      |
+---------------+-------------+--------------------+-------------+-------+--------------------------------------------------------------------------+--------------------+)
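The bounds printed in the table are consistent with a band of plus or minus 5 % around each mode, which matches the `cov` value of 0.05 in the printed settings. The sketch below checks this arithmetic on values copied from the table. The rule `center -/+ cov * |center|` and the helper name `bounds_from_center` are assumptions made for illustration, not a description of how `SpaceTool` computes the bounds.

```python
import math

# (lower bound, mode, upper bound) copied from the parameter-space table above.
table = {
    "nu_p": (0.285, 0.3, 0.315),
    "imposed_dplt": (-5.25, -5.0, -4.75),
    "young_modulus": (199500.0, 210000.0, 220500.0),
    "length": (570.0, 600.0, 630.0),
    "height": (38.0, 40.0, 42.0),
    "width": (28.5, 30.0, 31.5),
}
cov = 0.05  # value of the "cov" setting in the printed metadata


def bounds_from_center(center, cov):
    """Return (lower, upper) as center -/+ cov * |center| (assumed rule)."""
    half_width = cov * abs(center)
    return center - half_width, center + half_width


for name, (lower, mode, upper) in table.items():
    computed_lower, computed_upper = bounds_from_center(mode, cov)
    # Compare with a tolerance to absorb floating-point rounding.
    matches = math.isclose(computed_lower, lower) and math.isclose(
        computed_upper, upper
    )
    print(f"{name}: computed=({computed_lower}, {computed_upper}) matches={matches}")
```

Using the absolute value of the center keeps the lower bound below the upper bound for negative variables such as `imposed_dplt`.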

Generate 3 samples of input data from the parameter space, and create a dataset with the input data.

input_data = space_tool_result.parameter_space.compute_samples(
    n_samples=3, as_dict=False
)
reference_data = IODataset()
reference_data.add_group(
    IODataset.INPUT_GROUP,
    input_data,
    space_tool_result.parameter_space.uncertain_variables,
)

Prepare the model and the bias to apply to the output data.

model_name = "BendingTestAnalytical"
load_case = "Cantilever"
model = create_model(
    model_name,
    load_case,
    model_options=IntegratedModelSettings(
        directory_archive_root=EXAMPLE_RUNS_DIR / "archive/generate_reference_data",
        directory_scratch_root=EXAMPLE_RUNS_DIR / "scratch/generate_reference_data",
        cache_file_path=EXAMPLE_RUNS_DIR
        / f"caches/generate_reference_data/{model_name}_{load_case}.hdf",
    ),
)
outputs_to_bias = {"reaction_forces": Bias(mult_factor=1.05)}
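To see why a multiplicative bias yields non-zero error metrics, the following sketch applies a factor of 1.05 to some model outputs and computes the relative error of the model with respect to the biased reference. It assumes that `Bias(mult_factor=1.05)` multiplies the model output by 1.05, which this page does not state explicitly. The helper names and the numerical values are hypothetical.

```python
import math


def apply_mult_bias(values, mult_factor):
    """Scale each value by mult_factor (assumed meaning of a multiplicative bias)."""
    return [mult_factor * value for value in values]


def relative_errors(reference, simulated):
    """Return |simulated - reference| / |reference| for each pair of values."""
    return [abs(s - r) / abs(r) for r, s in zip(reference, simulated)]


# Hypothetical unbiased model outputs, for illustration only.
model_output = [-2000.0, -2500.0]

# The synthetic reference is the model output scaled by the bias factor.
reference = apply_mult_bias(model_output, 1.05)

# Comparing the model back to this reference gives a constant relative error
# of 0.05 / 1.05, whatever the magnitude of the output.
errors = relative_errors(reference, model_output)
print(errors)
```

Because the bias is multiplicative, the relative error is the same for every sample, which makes the expected value of a relative error metric easy to predict in a test.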

Generate the synthetic reference data from the model, the input dataset and the bias. Specific values can be prescribed for input variables that are absent from the input dataset, or whose values should differ from those in the input dataset. The generated data can be returned as a dataset:

specific_inputs = {"length": atleast_1d(100.0)}
df = generate_reference_from_dataset(
    model,
    reference_data,
    specific_inputs=specific_inputs,
    outputs_to_bias=outputs_to_bias,
    as_dataset=True,
)
print(df)
df.to_csv("dataset_validation_beam_cantilever.csv", sep=SEP)

# Or a dataframe:
df = generate_reference_from_dataset(
    model,
    reference_data,
    specific_inputs=specific_inputs,
    outputs_to_bias=outputs_to_bias,
    as_dataset=False,
)
print(df)
df.to_csv("dataframe_validation_beam_cantilever.csv", sep=SEP)

Out:

GROUP         inputs                           ...         outputs                                                             
VARIABLE      height imposed_dplt      length  ... reaction_forces               user                          vims_git_version
COMPONENT          0            0           0  ...               0                  0                                         0
0          40.378662    -5.136475  610.926758  ...    -2323.603871  sebastien.bocquet  bd04719587923889b30edb372ef71eeaeb4c168d
1          38.020020    -4.822388  599.229004  ...    -2113.937195  sebastien.bocquet  bd04719587923889b30edb372ef71eeaeb4c168d
2          39.801025    -5.150757  597.712769  ...    -2642.579197  sebastien.bocquet  bd04719587923889b30edb372ef71eeaeb4c168d

[3 rows x 230 columns]
      height  imposed_dplt      length  ...  reaction_forces               user                          vims_git_version
0  40.378662     -5.136475  610.926758  ...     -2323.603871  sebastien.bocquet  bd04719587923889b30edb372ef71eeaeb4c168d
1  38.020020     -4.822388  599.229004  ...     -2113.937195  sebastien.bocquet  bd04719587923889b30edb372ef71eeaeb4c168d
2  39.801025     -5.150757  597.712769  ...     -2642.579197  sebastien.bocquet  bd04719587923889b30edb372ef71eeaeb4c168d

[3 rows x 230 columns]

Note

Data generation can append data to an existing dataframe, but not to an existing dataset.
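As a point of comparison, rows can be appended to a plain pandas dataframe with `pandas.concat`. The sketch below is independent of the utility function and does not show its own appending mechanism, which this page does not describe. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical existing reference data, for illustration only.
existing = pd.DataFrame(
    {"length": [600.0, 610.0], "reaction_forces": [-2000.0, -2100.0]}
)

# Hypothetical newly generated rows with the same columns.
new_rows = pd.DataFrame({"length": [590.0], "reaction_forces": [-1900.0]})

# Stack the rows and rebuild a contiguous integer index.
combined = pd.concat([existing, new_rows], ignore_index=True)
print(combined)
```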

Total running time of the script: ( 0 minutes 1.789 seconds)
