Adding an attribute_setup.yml metadata file for global metadata attributes

A new metadata file containing the attribute setups has been added to "climix/climix/etc". This file can be replaced with the "-f" flag if the user wants to add their own settings.

Attribute file description:

# This file specifies how the 'global attributes' metadata from the input files are
# transferred to the 'global attributes' in the output file. This includes actions for
# creating, removing, renaming, replacing, appending values to, keeping, and joining
# global attributes. The 'order' determines the order in which the actions are performed.
# E.g., if the 'tracking_id' attribute is dropped before it is joined, the value will not
# exist and cannot be joined. The settings can be specified for both the 'input
# attributes' and 'output attributes'. The settings for the 'input attributes' are applied
# after loading the input datafiles and the settings for the 'output attributes' are
# applied in the post_processing step of the cube. It is important that the settings
# are applied at the correct step in the process. E.g., attributes that may be removed
# in the concatenation process need to be transferred in the 'input attributes' to be
# stored in the output file. However, replacing attributes in the 'input attributes'
# can change the look of the history from the input files, so replacing values should
# be done in the 'output attributes' to ensure that the history is intact.
#
# order:
#    - action_name_1
#    - action_name_2
# Description: The order in which the actions are performed. The action at the top
#              will be performed first, etc.
# join:
#   - output_attribute_name_1 : attribute_name_1
#   - output_attribute_name_1 : attribute_name_2
#   - output_attribute_name_2 : attribute_name_3
# Description: Joins together the global attributes, given by 'attribute_name' for
#              each input file, that have the same 'output_attribute_name'. Each
#              'output_attribute_name' is stored in the output file's global attributes.
#              The input attribute is removed when it is joined.
# append:
#    - output_attribute_name : attribute_value
# Description: Appends a value to an already existing global attribute with the same 
#              output_attribute_name. Creates a new global attribute if the name does not
#              exist and stores it as 'output_attribute_name' in the output global
#              attributes.
# rename:
#    - output_attribute_name : attribute_name
# Description: Takes a global attribute given by 'attribute_name' and stores it as
#              'output_attribute_name' in the output file's global attributes. The
#              input attribute is removed when it is renamed.
# replace:
#    - attribute_name: new_value
# Description: Replaces the global attribute value for the attribute given by 'attribute_name'
#              with 'new_value'. 
# keep:
#    - attribute_name
# Description: Keeps the global attribute value given by 'attribute_name'. This is useful when
#              using 'unspecified' for the drop action. 
# drop: 
#    - attribute_name
# Description: Removes the global attribute given by 'attribute_name'. 'unspecified' can be used
#              to remove all of the unspecified attributes, i.e. only the attributes that have
#              been appended, renamed, replaced, kept, or joined are stored. 'unspecified' can be
#              used for either the 'output_attributes', 'input_attributes' or both. If it is given
#              for the 'input_attributes', all unspecified values that are not used will be removed
#              before concatenation. This makes it easier to see if some attributes that we want to
#              keep are removed by iris. However, it will not remove values that are created in the
#              post processing of the cube. If 'unspecified' is used for the 'output_attributes',
#              some attributes that are created in the post processing of the cube might be removed,
#              e.g. 'proposed_standard_name' and 'frequency'; these can still be kept using the keep
#              action.
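
To make the role of 'order' concrete, here is a minimal sketch of how the actions could be applied to a plain dictionary of global attributes. It is not the climix implementation: the function name `apply_actions` is hypothetical, 'join' is left out because it works across several input files, and the behaviour of 'append' (values joined with ", ") and 'replace' (skipped when the attribute is missing) is an assumption inferred from the example output and debug messages further down.

```python
from typing import Any


def apply_actions(attrs: dict[str, Any], setup: dict[str, Any]) -> dict[str, Any]:
    """Apply the actions in setup['order'] to a copy of attrs (hypothetical helper)."""
    result = dict(attrs)
    specified: set[str] = set()  # names produced or protected by an action

    for action in setup.get("order", []):
        for entry in setup.get(action, []):
            if action == "drop":
                if entry == "unspecified":
                    # Keep only attributes that an earlier action has touched.
                    result = {k: v for k, v in result.items() if k in specified}
                else:
                    result.pop(entry, None)
            elif action == "keep":
                if entry in result:
                    specified.add(entry)
            else:
                # Remaining actions use single-item mappings, e.g. {"title": "x"}.
                (name, value), = entry.items()
                if action == "append":
                    # Assumption: appended values are separated by ", ".
                    result[name] = f"{result[name]}, {value}" if name in result else value
                    specified.add(name)
                elif action == "replace" and name in result:
                    result[name] = value
                    specified.add(name)
                elif action == "rename" and value in result:
                    # 'value' is the existing name, 'name' the new output name.
                    result[name] = result.pop(value)
                    specified.add(name)
    return result
```

Because the actions run strictly in the listed order, putting 'drop' before 'rename' for the same attribute leaves nothing to rename, which is the situation the description above warns about.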

Default attribute setups:

input_attributes:
  order:
   - replace
   - append
   - keep
   - join
   - rename
   - drop
  drop:
     - unspecified
  join:
     - senario : driving_experiment_name
     - rcm : model_id
     - rcm : rcm_version_id
     - gcm : driving_model_id
     - gcm_ensemble_member : driving_model_ensemble_member
     - tracking-id_creation-date : tracking_id
     - tracking-id_creation-date : creation_date
     - history-attributes : history
     - history-attributes : history_of_appended_files
     - comments : rossby_comment
     - comments : comment
     - comments : driving_experiment_comment
  rename:
     - input_frequency : frequency
     - input_institution : institution
     - input_institute_id : institute_id
     - input_references : references
     - input_product : product
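
As an illustration of what the 'join' entries above express, the following sketch joins attributes from several input files. The function `join_attributes` and its separators are assumptions: "_" between attributes of the same file, ", " between files, and identical per-file values collapsed to one are read off the example output below (e.g. 'rcm' and 'tracking-id_creation-date'), not taken from the climix source.

```python
def join_attributes(file_attrs, join_spec, inner_sep="_", outer_sep=", "):
    """Join attributes per output name across input files (hypothetical helper).

    file_attrs: list of dicts, one dict of global attributes per input file.
    join_spec:  list of single-item dicts {output_name: input_attribute_name}.
    """
    # Group the input attribute names by output name, preserving their order.
    groups = {}
    for entry in join_spec:
        for out_name, src_name in entry.items():
            groups.setdefault(out_name, []).append(src_name)

    joined = {}
    for out_name, src_names in groups.items():
        per_file = []
        for attrs in file_attrs:
            # Missing attributes are skipped rather than treated as errors.
            parts = [str(attrs[name]) for name in src_names if name in attrs]
            if not parts:
                continue
            value = inner_sep.join(parts)
            if value not in per_file:  # assumption: identical values appear once
                per_file.append(value)
        if per_file:
            joined[out_name] = outer_sep.join(per_file)
    return joined
```

With two files that share 'model_id' and 'rcm_version_id' but differ in 'tracking_id' and 'creation_date', this yields a single 'rcm' value and a comma-separated 'tracking-id_creation-date' value with one entry per file.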

output_attributes:
  order:
   - append
   - replace
   - join
   - keep
   - drop
  append:
     - product: "climate indicator"
     - institution: "Swedish Meteorological and Hydrological Institute, Rossby Centre"
     - institute_id: SMHI
     - references: "https://www.smhi.se/en/research/research-departments/climate-research-at-the-rossby-centre"
     - creation_date: "$NOW"
     - tracking_id: "$TRACKING_ID"
     - software: "$CLIMIX_VERSION"
  replace:
     - title: "--- TODO ---"
  keep:
     - frequency
     - CORDEX_domain
     - proposed_standard_name
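
The "$NOW", "$TRACKING_ID" and "$CLIMIX_VERSION" values above are placeholders that are filled in at run time. A minimal sketch of such a substitution is shown here. The `providers` mapping and `default_providers` are hypothetical names. The timestamp and version formats follow the log output below, and "$TRACKING_ID" is deliberately omitted because the log shows it is still filled with "--- TODO ---".

```python
from datetime import datetime, timezone


def fill_in_value(value, providers):
    """Return the provider's result if value is a known '$NAME' placeholder."""
    if isinstance(value, str) and value in providers:
        return providers[value]()
    return value  # ordinary values and unknown placeholders pass through


def default_providers(version="0.0.0"):
    """Build a placeholder-to-callable mapping (hypothetical defaults)."""
    return {
        # Format chosen to match the logged example '2023-02-03T13:19:57 UTC'.
        "$NOW": lambda: datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S UTC"),
        "$CLIMIX_VERSION": lambda: f"CLIMIX version {version}",
    }
```

Keeping the providers in a mapping means a new placeholder only needs one more entry, and plain strings such as "SMHI" are returned untouched.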

Output when running: climix -e -l debug -v -p annual -x fd /nobackup/rossby27/users/sm_carni/data/tmp/data_files/tasmin_EUR-11_MPI-M-MPI-ESM-LR_rcp85_r2i1p1_MPI-CSC-REMO2009_v1_day_20060101-20101231.nc /nobackup/rossby27/users/sm_carni/data/tmp/data_files/tasmin_EUR-11_MPI-M-MPI-ESM-LR_rcp85_r2i1p1_MPI-CSC-REMO2009_v1_day_20110101-20151231.nc

Information text and warnings with debug:

6510ms:metadata.py:prepare_joined_attributes() DEBUG:root:Failed to join attribute <['history_of_appended_files', 'rossby_comment', 'driving_experiment_comment']> it does not exist in the cube data.
    6511ms:metadata.py:prepare_joined_attributes() DEBUG:root:Failed to join attribute <['history_of_appended_files', 'rossby_comment', 'driving_experiment_comment']> it does not exist in the cube data.
    6511ms:metadata.py:perform_actions() INFO:root:Performs action <join> on attributes <[{'senario': 'driving_experiment_name'}, {'rcm': 'model_id'}, {'rcm': 'rcm_version_id'}, {'gcm': 'driving_model_id'}, {'gcm_ensemble_member': 'driving_model_ensemble_member'}, {'tracking-id_creation-date': 'tracking_id'}, {'tracking-id_creation-date': 'creation_date'}, {'history-attributes': 'history'}, {'history-attributes': 'history_of_appended_files'}, {'comments': 'rossby_comment'}, {'comments': 'comment'}, {'comments': 'driving_experiment_comment'}]>
    6511ms:metadata.py:perform_actions() INFO:root:Performs action <rename> on attributes <[{'input_frequency': 'frequency'}, {'input_institution': 'institution'}, {'input_institute_id': 'institute_id'}, {'input_references': 'references'}, {'input_product': 'product'}]>
    6511ms:metadata.py:perform_actions() INFO:root:Performs action <drop> on attributes <['unspecified']>
    6512ms:metadata.py:perform() DEBUG:root:Failed to store attributes <['institution', 'driving_model_ensemble_member', 'comment', 'software', 'rossby_comment', 'rcm_version_id', 'creation_date', 'driving_experiment_comment', 'driving_experiment_name', 'driving_model_id', 'institute_id', 'history_of_appended_files', 'product', 'tracking_id', 'frequency', 'model_id', 'references', 'history']> they does not exist in the cube data. This can occur when the values did not exist in the input file or they where renamed or joined.
    6512ms:metadata.py:perform() DEBUG:root:Failed to store attributes <['institution', 'driving_model_ensemble_member', 'comment', 'software', 'rossby_comment', 'rcm_version_id', 'creation_date', 'driving_experiment_comment', 'driving_experiment_name', 'driving_model_id', 'institute_id', 'history_of_appended_files', 'product', 'tracking_id', 'frequency', 'model_id', 'references', 'history']> they does not exist in the cube data. This can occur when the values did not exist in the input file or they where renamed or joined.
    6512ms:metadata.py:check_removed_attributes() INFO:root:Removed 'unspecified' attributes: ['associated_files', 'physics_version', 'table_id', 'initialization_method', 'experiment', 'experiment_id', 'Conventions', 'project_id', 'driving_experiment', 'realization', 'cmor_version', 'contact', 'modeling_realm', 'source']
    8290ms:main.py:do_main() DEBUG:root:Calculating index
    8291ms:index.py:__call__() DEBUG:root:Starting preprocess
    8291ms:index.py:__call__() DEBUG:root:Finished preprocess
    8292ms:index.py:__call__() DEBUG:root:Data found for input <data>
    8293ms:index.py:__call__() DEBUG:root:Adding coord categorisation.
    9133ms:index.py:__call__() DEBUG:root:Preparing cubes
    9134ms:index.py:__call__() DEBUG:root:Setting up aggregation
    9283ms:aggregators.py:compute_pre_result() DEBUG:root:Setting up pre-result in aggregate mode
    9304ms:aggregators.py:compute_pre_result() DEBUG:root:Setup completed in    0
    9307ms:metadata.py:perform_actions() INFO:root:Performs action <append> on attributes <[{'product': 'climate indicator'}, {'institution': 'Swedish Meteorological and Hydrological Institute, Rossby Centre'}, {'institute_id': 'SMHI'}, {'references': 'https://www.smhi.se/en/research/research-departments/climate-research-at-the-rossby-centre'}, {'creation_date': '$NOW'}, {'tracking_id': '$TRACKING_ID'}, {'software': '$CLIMIX_VERSION'}]>
    9307ms:metadata.py:fill_in_value() INFO:root:Filled value <$NOW> with <2023-02-03T13:19:57 UTC>
    9307ms:metadata.py:fill_in_value() INFO:root:Filled value <$TRACKING_ID> with <--- TODO --->
    9307ms:metadata.py:fill_in_value() INFO:root:Filled value <$CLIMIX_VERSION> with <CLIMIX version 0.15.0+19.gae27e31>
    9307ms:metadata.py:perform_actions() INFO:root:Performs action <replace> on attributes <[{'title': '--- TODO ---'}]>
    9307ms:metadata.py:perform_actions() INFO:root:Performs action <keep> on attributes <['frequency', 'CORDEX_domain', 'proposed_standard_name']>

Output global attributes:

// global attributes:
		:CORDEX_domain = "EUR-11" ;
		:comments = "daily-minimum near-surface (usually, 2 meter) air temperature." ;
		:creation_date = "2023-02-03T13:19:57 UTC" ;
		:frequency = "yr" ;
		:gcm = "MPI-M-MPI-ESM-LR" ;
		:gcm_ensemble_member = "r2i1p1" ;
		:history-attributes = "2016-02-04T17:37:42Z altered by CMOR: Treated scalar dimension: height., 2016-05-10T19:55:23Z altered by CMOR: Treated scalar dimension: height." ;
		:input_frequency = "day" ;
		:input_institute_id = "MPI-CSC" ;
		:input_institution = "Helmholtz-Zentrum Geesthacht, Climate Service Center, Max Planck Institute for Meteorology" ;
		:input_product = "output" ;
		:input_references = "http://www.remo-rcm.de/" ;
		:institute_id = "SMHI" ;
		:institution = "Swedish Meteorological and Hydrological Institute, Rossby Centre" ;
		:product = "climate indicator" ;
		:rcm = "MPI-CSC-REMO2009_v1" ;
		:references = "https://www.smhi.se/en/research/research-departments/climate-research-at-the-rossby-centre" ;
		:senario = "rcp85" ;
		:software = "CLIMIX version 0.15.0+19.gae27e31" ;
		:title = "--- TODO ---" ;
		:tracking-id_creation-date = "436ab648-e2de-4696-9b11-fe1ac68bb87b_2016-05-10T19:55:24Z, 13096055-aa41-4c04-bab3-a86e45600ab3_2016-02-04T17:37:42Z" ;
		:tracking_id = "--- TODO ---" ;
		:Conventions = "CF-1.7" ;
}

Error handling has been added so that improper use gives a warning or an understandable error message.

Test case: Importing an attribute setup file containing several 'errors' that trigger warning messages:

run:

climix -e -l debug -p annual -x fd /nobackup/rossby27/users/sm_carni/data/tmp/data_files/tasmin_EUR-11_MPI-M-MPI-ESM-LR_rcp85_r2i1p1_MPI-CSC-REMO2009_v1_day_20060101-20101231.nc /nobackup/rossby27/users/sm_carni/data/tmp/data_files/tasmin_EUR-11_MPI-M-MPI-ESM-LR_rcp85_r2i1p1_MPI-CSC-REMO2009_v1_day_20110101-20151231.nc -f /home/sm_carni/Project/YML/attribute_setups_1.yml

For attribute setups:

input_attributes:
  drop:
     - tracking_ID
     - creation_date 
  join:
     - input_joined_attributes: tracking_ID
     - input_joined_attributes: creation_date
  append:
     - input_unit: Degree_Celsius
  replace:
     - product: climate service
  rename:
     - input_unit: Fahrenheit
     - unit : input_unit
     
output_attributes:
  order:
   - append
   - replace
   - join
   - drop
  append:
     - {frequency : dayss}
     - input_unit : C
  join:
     - output_joined_attributes : input_unit
     - output_joined_attributes : output_unit
  replace:
     - output_unit : Kelvin

Information and warnings:

WARNING:root:Replacing input attributes can change the 'look' of the history of the input datafiles (only use this if you want to replace or rename historical values). To replace values in the output file use replace for the output attributes.
DEBUG:root:Failed to join attribute <['tracking_ID']> it does not exist in the cube data.
DEBUG:root:Failed to join attribute <['tracking_ID']> it does not exist in the cube data.
INFO:root:Performs action <drop> on attributes <['tracking_ID', 'creation_date']>
DEBUG:root:Failed to drop attribute <['tracking_ID', 'creation_date']> it does not exist in the cube data.
DEBUG:root:Failed to drop attribute <['tracking_ID', 'creation_date']> it does not exist in the cube data.
INFO:root:Performs action <join> on attributes <[{'input_joined_attributes': 'tracking_ID'}, {'input_joined_attributes': 'creation_date'}]>
INFO:root:Performs action <append> on attributes <[{'input_unit': 'Degree_Celsius'}]>
INFO:root:Performs action <replace> on attributes <[{'product': 'climate service'}]>
INFO:root:Performs action <rename> on attributes <[{'input_unit': 'Fahrenheit'}, {'unit': 'input_unit'}]>
DEBUG:root:Failed to rename attribute <['Fahrenheit']> it does not exist in the cube data.
DEBUG:root:Failed to rename attribute <['Fahrenheit']> it does not exist in the cube data.
WARNING:root:Attributes removed by iris: <[{'history': "2016-05-10T19:55:23Z altered by CMOR: Treated scalar dimension: 'height'.", 'tracking_id': '436ab648-e2de-4696-9b11-fe1ac68bb87b'}, {'history': "2016-02-04T17:37:42Z altered by CMOR: Treated scalar dimension: 'height'.", 'tracking_id': '13096055-aa41-4c04-bab3-a86e45600ab3'}]>. Note: To transfer values that are incompatible between cubes use the join action in the attribute setups.
DEBUG:root:Calculating index
DEBUG:root:Starting preprocess
DEBUG:root:Finished preprocess
DEBUG:root:Data found for input <data>
DEBUG:root:Adding coord categorisation.
DEBUG:root:Preparing cubes
DEBUG:root:Setting up aggregation
DEBUG:root:Setting up pre-result in aggregate mode
DEBUG:root:Setup completed in    0
DEBUG:root:Failed to join attribute <['input_unit', 'output_unit']> it does not exist in the cube data.
INFO:root:Performs action <append> on attributes <[{'frequency': 'dayss'}, {'input_unit': 'C'}]>
INFO:root:Performs action <replace> on attributes <[{'output_unit': 'Kelvin'}]>
DEBUG:root:Failed to replace attribute <['output_unit']> it does not exist in the cube data
INFO:root:Performs action <join> on attributes <[{'output_joined_attributes': 'input_unit'}, {'output_joined_attributes': 'output_unit'}]>

Output:

// global attributes:
		:CORDEX_domain = "EUR-11" ;
		:associated_files = "gridspecFile: gridspec_atmos_fx_MPI-CSC-REMO2009_rcp85_r0i0p0.nc" ;
		:cmor_version = "2.9.1" ;
		:comment = "daily-minimum near-surface (usually, 2 meter) air temperature." ;
		:contact = "gerics-cordex@hzg.de" ;
		:driving_experiment = "MPI-M-MPI-ESM-LR, rcp85, r2i1p1" ;
		:driving_experiment_name = "rcp85" ;
		:driving_model_ensemble_member = "r2i1p1" ;
		:driving_model_id = "MPI-M-MPI-ESM-LR" ;
		:experiment = "RCP8.5" ;
		:experiment_id = "rcp85" ;
		:frequency = "yr, dayss" ;
		:initialization_method = 1 ;
		:input_joined_attributes = "2016-02-04T17:37:42Z, 2016-05-10T19:55:24Z, 2016-02-04T17:37:42Z, 2016-05-10T19:55:24Z" ;
		:input_unit = "C" ;
		:institute_id = "MPI-CSC" ;
		:institution = "Helmholtz-Zentrum Geesthacht, Climate Service Center, Max Planck Institute for Meteorology" ;
		:model_id = "MPI-CSC-REMO2009" ;
		:modeling_realm = "atmos" ;
		:physics_version = 1 ;
		:product = "climate service" ;
		:project_id = "CORDEX" ;
		:rcm_version_id = "v1" ;
		:realization = 2 ;
		:references = "http://www.remo-rcm.de/" ;
		:source = "MPI-CSC-REMO2009" ;
		:table_id = "Table day (March 2015) 6f55fe4ad23cded422652f83a747ce32" ;
		:title = "MPI-CSC-REMO2009 model output prepared for CORDEX RCP8.5" ;
		:unit = "Degree_Celsius" ;
		:Conventions = "CF-1.7" ;
}

Errors in the attribute setups, such as missing global attributes, only log a warning and do not end the run. If some attributes were removed by iris, the user is also warned. For more problematic errors the run is interrupted and an error message is given.

If attributes that the user wants to keep differ between datafiles, they are removed by iris. One option is to join these attributes instead of using rename/append/etc.; they will then be kept. However, we need to decide which attributes these are and put them under join.
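
One way to find candidates for 'join' is to list the global attributes whose values are not identical in every input file. The sketch below does this on plain dictionaries. The function `differing_attributes` is a hypothetical helper, not part of climix or iris, and treating an attribute that is missing from some files as differing is an assumption.

```python
_MISSING = object()  # sentinel for attributes absent from a file


def differing_attributes(file_attrs):
    """Return sorted names of attributes that are not identical in all files."""
    names = set()
    for attrs in file_attrs:
        names.update(attrs)

    differing = []
    for name in sorted(names):
        values = [attrs.get(name, _MISSING) for attrs in file_attrs]
        # An attribute differs if any file disagrees with the first one.
        if any(value != values[0] for value in values[1:]):
            differing.append(name)
    return differing
```

For the two example datafiles, such a check would be expected to report at least 'history' and 'tracking_id', the two attributes named in the "Attributes removed by iris" warning above.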

TODO:

  • Update the default attribute setup file.
  • Test that the output stores all the necessary attributes for different files.
  • Test that the output stores all the necessary attributes for different files using index with two datafiles.
Edited by Carolina Nilsson
