Package 'hubValidations' reference manual

Title:	Testing framework for hubverse hub validations
Description:	This package aims at providing a simple interface to run validations on data and metadata submitted to a hubverse modeling hub. Validation tests can be run at different levels (single file, single folder, whole repository) and locally as well as part of a continuous integration workflow.
Authors:	Anna Krystalli [aut, cre], Evan Ray [aut], Hugo Gruson [aut], Zhian N. Kamvar [ctb], Consortium of Infectious Disease Modeling Hubs [cph]
Maintainer:	Anna Krystalli <[email protected]>
License:	MIT + file LICENSE
Version:	0.11.0
Built:	2025-03-12 18:24:22 UTC
Source:	https://github.com/hubverse-org/hubValidations

Capture a condition from the result of a validation check.

Description

Capture a condition from the result of a validation check.

Usage

capture_check_cnd(
  check,
  file_path,
  msg_subject,
  msg_attribute,
  msg_verbs = c("is", "must be"),
  error = FALSE,
  details = NULL,
  ...
)

Arguments

`check`	logical, the result of a validation check. If `check` is `FALSE`, validation has failed. If `check` is `TRUE`, validation has succeeded.
`file_path`	character string. Path to the file being validated. Must be the relative path to the hub's `model-output` (or equivalent) directory.
`msg_subject`	character string. The subject of the validation.
`msg_attribute`	character string. The attribute of subject being validated.
`msg_verbs`	character vector of length 2. The verbs describing the state of the attribute in relation to the validation subject. The first element describes the state when validation succeeds, the second element, when validation fails.
`error`	logical. In the case of validation failure, whether the function should return an object of class `⁠<error/check_error>⁠` (`TRUE`) or `⁠<error/check_failure>⁠` (`FALSE`, default).
`details`	further details to be appended to the output message.
`...`	<dynamic> Named data fields stored inside the condition object.

Details

Arguments msg_subject, msg_attribute, msg_verbs and details accept text that can be interpreted and formatted by cli::format_inline().

Value

Depending on whether validation has succeeded and the value of the error argument, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.
⁠<error/check_error>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Examples

capture_check_cnd(
  check = TRUE, file_path = "test/file.csv",
  msg_subject = "{.var round_id}", msg_attribute = "valid.", error = FALSE
)
capture_check_cnd(
  check = FALSE, file_path = "test/file.csv",
  msg_subject = "{.var round_id}", msg_attribute = "valid.", error = FALSE,
  details = "Must be one of 'A' or 'B', not 'C'"
)
capture_check_cnd(
  check = FALSE, file_path = "test/file.csv",
  msg_subject = "{.var round_id}", msg_attribute = "valid.", error = TRUE,
  details = "Must be one of {.val {c('A', 'B')}}, not {.val C}"
)

Capture a simple info message condition

Description

Capture a simple info message condition. Useful for communicating when a check is ignored or skipped.

Usage

capture_check_info(file_path, msg, call = rlang::caller_call())

Arguments

`file_path`	character string. Path to the file being validated. Must be the relative path to the hub's `model-output` (or equivalent) directory.
`msg`	Character string. Accepts text that can be interpreted and formatted by `cli::format_inline()`.
`call`	The defused call of the function that generated the message. Use to override the default, which is the caller's call. See rlang::stack for more details.

Value

A ⁠<message/check_info>⁠ condition class object. Returned object also inherits from subclass ⁠<hub_check>⁠.

Capture an execution error condition

Description

Capture an execution error condition. Useful for communicating when a check execution has failed. Usually used in conjunction with try.

Usage

capture_exec_error(file_path, msg, call = NULL)

Arguments

`file_path`	character string. Path to the file being validated. Must be the relative path to the hub's `model-output` (or equivalent) directory.
`msg`	Character string.
`call`	Character string. Name of the parent call that failed to execute. If `NULL` (default), the caller's call name is captured.

Value

A ⁠<error/check_exec_error>⁠ condition class object. Returned object also inherits from subclass ⁠<hub_check>⁠.
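
The pairing with try mentioned in the Description can be sketched as follows. This is only an illustration: `risky_check()` is a hypothetical failing step, the file path is a placeholder, and the call to `capture_exec_error()` is skipped when hubValidations is not installed. The `call` argument is passed explicitly so the sketch does not rely on how the caller is detected.

```r
# Hypothetical check step that fails while executing.
risky_check <- function() stop("could not read file")

# try() returns an object of class "try-error" instead of stopping the script.
res <- try(risky_check(), silent = TRUE)
failed <- inherits(res, "try-error")

if (failed) {
  # Recover the original error message from the captured condition.
  msg <- conditionMessage(attr(res, "condition"))
  # Turn it into a check result; only runs if hubValidations is installed.
  if (requireNamespace("hubValidations", quietly = TRUE)) {
    res <- hubValidations::capture_exec_error(
      file_path = "test/file.csv", msg = msg, call = "risky_check"
    )
  }
}
```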

Capture an execution warning condition

Description

Capture an execution warning condition. Useful for communicating when a check execution has failed. Usually used in conjunction with try.

Usage

capture_exec_warning(file_path, msg, call = NULL)

Arguments

`file_path`	character string. Path to the file being validated. Must be the relative path to the hub's `model-output` (or equivalent) directory.
`msg`	Character string.
`call`	Character string. Name of the parent call that failed to execute. If `NULL` (default), the caller's call name is captured.

Value

A ⁠<warning/check_exec_warn>⁠ condition class object. Returned object also inherits from subclass ⁠<hub_check>⁠.

Check hub correctly configured

Description

Checks that admin and tasks configuration files in directory hub-config are valid.

Usage

check_config_hub_valid(hub_path)

Arguments

hub_path

Either a character string path to a local Modeling Hub directory or an object of class ⁠<SubTreeFileSystem>⁠ created using functions s3_bucket() or gs_bucket() by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) in the arrow package. The hub must be fully configured with valid admin.json and tasks.json files within the hub-config directory.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_error>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check file exists at the file path specified

Description

Check file exists at the file path specified

Usage

check_file_exists(
  file_path,
  hub_path = ".",
  subdir = c("model-output", "model-metadata", "hub-config")
)

Arguments

`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `⁠<SubTreeFileSystem>⁠` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) in the `arrow` package. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
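`subdir`	character string. Subdirectory within the hub in which to look for the file. One of `"model-output"` (default), `"model-metadata"` or `"hub-config"`.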

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_error>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.
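
How `hub_path`, `subdir` and `file_path` combine into one location can be sketched in base R. This is an illustration under assumptions, not the package's implementation: the hub layout is created in a temporary directory and the team and file names are placeholders.

```r
# Build a throwaway hub layout in a temporary directory (hypothetical paths).
hub_path <- tempfile("hub")
dir.create(file.path(hub_path, "model-output", "team1-goodmodel"),
           recursive = TRUE)
file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
invisible(file.create(file.path(hub_path, "model-output", file_path)))

# The full location is hub_path / subdir / file_path.
in_model_output <- file.exists(file.path(hub_path, "model-output", file_path))
in_model_metadata <- file.exists(file.path(hub_path, "model-metadata", file_path))
```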

Check file format is accepted by hub.

Description

Check file format is accepted by hub.

Usage

check_file_format(file_path, hub_path, round_id)

Arguments

`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `⁠<SubTreeFileSystem>⁠` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) in the `arrow` package. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`round_id`	character string. The round identifier.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_error>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check file is being submitted to the correct folder

Description

Checks that the model_id metadata in the file name matches the directory name the file is being submitted to.

Usage

check_file_location(file_path)

Arguments

file_path

character string. Path to the file being validated relative to the hub's model-output directory.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.
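
The comparison described above can be sketched in base R. The helper `model_dir_matches()` is hypothetical and assumes file names of the form `<round_id>-<model_id>.<ext>` with an ISO date (YYYY-MM-DD) round ID; the package's own file name parsing may differ.

```r
# Hypothetical helper: TRUE when the model_id in the file name matches the
# name of the directory the file sits in.
model_dir_matches <- function(file_path) {
  file_name <- tools::file_path_sans_ext(basename(file_path))
  # Strip the leading ISO date round ID to leave the model_id.
  model_id <- sub("^[0-9]{4}-[0-9]{2}-[0-9]{2}-", "", file_name)
  identical(model_id, basename(dirname(file_path)))
}

ok <- model_dir_matches("team1-goodmodel/2022-10-08-team1-goodmodel.csv")
wrong_dir <- model_dir_matches("team1-goodmodel/2022-10-08-hub-baseline.csv")
```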

Check number of files submitted per round does not exceed the allowed number of submissions per team.

Description

Check number of files submitted per round does not exceed the allowed number of submissions per team.

Usage

check_file_n(file_path, hub_path, allowed_n = 1L)

Arguments

`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `⁠<SubTreeFileSystem>⁠` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) in the `arrow` package. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`allowed_n`	integer(1). The maximum number of files allowed per round.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.
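
The idea behind `allowed_n` can be sketched in base R as a count of files per round in one model directory. The helper `n_files_ok()` is hypothetical and assumes that files for a round share the round ID as a file name prefix; it is not the package's implementation.

```r
# Hypothetical helper: TRUE when the number of files in `model_dir` whose
# names start with `round_id` does not exceed `allowed_n`.
n_files_ok <- function(model_dir, round_id, allowed_n = 1L) {
  files <- list.files(model_dir)
  sum(startsWith(files, round_id)) <= allowed_n
}

# Throwaway model directory holding two files for the same round.
model_dir <- tempfile("team1-goodmodel")
dir.create(model_dir)
invisible(file.create(file.path(model_dir, "2022-10-08-team1-goodmodel.csv")))
invisible(file.create(file.path(model_dir, "2022-10-08-team1-goodmodel.parquet")))

one_allowed <- n_files_ok(model_dir, "2022-10-08")
two_allowed <- n_files_ok(model_dir, "2022-10-08", allowed_n = 2L)
```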

Check a model output file name can be correctly parsed.

Description

Check a model output file name can be correctly parsed.

Usage

check_file_name(file_path)

Arguments

file_path

character string. Path to the file being validated relative to the hub's model-output directory.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_error>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check file can be read successfully

Description

Check file can be read successfully

Usage

check_file_read(file_path, hub_path = ".")

Arguments

file_path

character string. Path to the file being validated relative to the hub's model-output directory.

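hub_path

Either a character string path to a local Modeling Hub directory or an object of class <SubTreeFileSystem> created using functions s3_bucket() or gs_bucket() by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) in the arrow package. The hub must be fully configured with valid admin.json and tasks.json files within the hub-config directory.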

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_error>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Raise conditions stored in a `hub_validations` S3 object

Description

This is meant to be used in CI workflows to raise conditions from hub_validations objects but can also be useful locally to summarise the results of checks contained in a hub_validations S3 object.

Usage

check_for_errors(x, verbose = FALSE)

Arguments

`x`	A `hub_validations` object
`verbose`	Logical. If `TRUE`, print the results of all checks prior to raising condition and summarising `hub_validations` S3 object check results.

Value

An error if one of the elements of x is of class check_failure, check_error, check_exec_error or check_exec_warn. TRUE invisibly otherwise.
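
The decision rule described in the Value section can be sketched in base R. The objects below are hypothetical stand-ins built with `structure()`, not real `hub_validations` contents, and the sketch only reports the failing checks with `message()` where a CI step would raise an error.

```r
# Hypothetical stand-in for a check result: a bare condition object carrying
# one of the class names used in this manual.
make_check <- function(class) {
  structure(
    list(message = "", call = NULL),
    class = c(class, "hub_check", "condition")
  )
}

checks <- list(
  file_exists = make_check("check_success"),
  file_name = make_check("check_failure"),
  skipped = make_check("check_info")
)

# Classes that count as "not passed".
flagged <- c("check_failure", "check_error", "check_exec_error",
             "check_exec_warn")
has_problem <- vapply(checks, inherits, logical(1), what = flagged)
failing_checks <- names(checks)[has_problem]

if (length(failing_checks) > 0) {
  message("Checks not passed: ", paste(failing_checks, collapse = ", "))
}
```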

Check whether a metadata file exists

Description

Check whether a metadata file exists

Usage

check_metadata_file_exists(hub_path = ".", file_path)

Arguments

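hub_path

Either a character string path to a local Modeling Hub directory or an object of class <SubTreeFileSystem> created using functions s3_bucket() or gs_bucket() by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) in the arrow package. The hub must be fully configured with valid admin.json and tasks.json files within the hub-config directory.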

file_path

character string. Path to the file being validated relative to the hub's model-metadata directory.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_error>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check a metadata file has a valid extension

Description

Checks that the metadata file has a valid file extension.

Usage

check_metadata_file_ext(file_path)

Arguments

file_path

character string. Path to the file being validated relative to the hub's model-metadata directory.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check that the metadata file is being submitted to the correct folder

Description

Check that the metadata file is being submitted to the correct folder

Usage

check_metadata_file_location(file_path)

Arguments

file_path

character string. Path to the file being validated relative to the hub's model-metadata directory.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check whether the file name of a metadata file matches the model_id or combination of team_abbr and model_abbr specified within the metadata file

Description

Check whether the file name of a metadata file matches the model_id or combination of team_abbr and model_abbr specified within the metadata file

Usage

check_metadata_file_name(file_path, hub_path = ".")

Arguments

file_path

character string. Path to the file being validated relative to the hub's model-metadata directory.

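hub_path

Either a character string path to a local Modeling Hub directory or an object of class <SubTreeFileSystem> created using functions s3_bucket() or gs_bucket() by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) in the arrow package. The hub must be fully configured with valid admin.json and tasks.json files within the hub-config directory.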

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check whether a metadata file matches the schema provided by the hub

Description

Check whether a metadata file matches the schema provided by the hub

Usage

check_metadata_matches_schema(file_path, hub_path = ".")

Arguments

file_path

character string. Path to the file being validated relative to the hub's model-metadata directory.

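hub_path

Either a character string path to a local Modeling Hub directory or an object of class <SubTreeFileSystem> created using functions s3_bucket() or gs_bucket() by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) in the arrow package. The hub must be fully configured with valid admin.json and tasks.json files within the hub-config directory.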

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check whether a metadata schema file exists

Description

Check whether a metadata schema file exists

Usage

check_metadata_schema_exists(hub_path = ".")

Arguments

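hub_path

Either a character string path to a local Modeling Hub directory or an object of class <SubTreeFileSystem> created using functions s3_bucket() or gs_bucket() by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) in the arrow package. The hub must be fully configured with valid admin.json and tasks.json files within the hub-config directory.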

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check whether a metadata file for the given model exists

Description

Check whether a metadata file for the given model exists

Usage

check_submission_metadata_file_exists(file_path, hub_path = ".")

Arguments

file_path

character string. Path to the file being validated relative to the hub's model-output directory.

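hub_path

Either a character string path to a local Modeling Hub directory or an object of class <SubTreeFileSystem> created using functions s3_bucket() or gs_bucket() by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) in the arrow package. The hub must be fully configured with valid admin.json and tasks.json files within the hub-config directory.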

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_error>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Checks submission is within the valid submission window for a given round.

Description

Checks submission is within the valid submission window for a given round.

Usage

check_submission_time(
  hub_path,
  file_path,
  ref_date_from = c("file", "file_path")
)

Arguments

`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `⁠<SubTreeFileSystem>⁠` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) in the `arrow` package. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`ref_date_from`	whether to get the reference date around which relative submission windows will be determined from the file's `file_path` round ID or the `file` contents themselves. `file` requires that the file can be read. Only applicable when a round is configured to determine the submission windows relative to the value in a date column in model output files. Not applicable when explicit submission window start and end dates are provided in the hub's config.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.
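
A submission window defined relative to a reference date, as described for `ref_date_from`, can be sketched in base R. The helper `in_window()` and its offsets are hypothetical, and the sketch works on whole dates only, whereas a real hub config may also involve times and time zones.

```r
# Hypothetical window: opens 6 days before and closes 1 day after the
# reference date.
in_window <- function(submitted, ref_date, start = -6L, end = 1L) {
  ref_date <- as.Date(ref_date)
  submitted <- as.Date(submitted)
  submitted >= ref_date + start && submitted <= ref_date + end
}

on_time <- in_window("2022-10-05", ref_date = "2022-10-08")
too_late <- in_window("2022-10-12", ref_date = "2022-10-08")
```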

Check model data column data types

Description

Check that model output data column data types conform to those defined in the hub config.

Usage

check_tbl_col_types(
  tbl,
  file_path,
  hub_path,
  output_type_id_datatype = c("from_config", "auto", "character", "double", "integer",
    "logical", "Date")
)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `⁠<SubTreeFileSystem>⁠` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) in the `arrow` package. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`output_type_id_datatype`	character string. One of `"from_config"`, `"auto"`, `"character"`, `"double"`, `"integer"`, `"logical"`, `"Date"`. Defaults to `"from_config"` which uses the setting in the `output_type_id_datatype` property in the `tasks.json` config file if available. If the property is not set in the config, the argument falls back to `"auto"` which determines the `output_type_id` data type automatically from the `tasks.json` config file as the simplest data type required to represent all output type ID values across all output types in the hub. When only point estimate output types (where `output_type_id`s are `NA`) are being collected by a hub, the `output_type_id` column is assigned a `character` data type when auto-determined. Other data type values can be used to override automatic determination. Note that attempting to coerce `output_type_id` to a data type that is not valid for the data (e.g. trying to coerce `"character"` values to `"double"`) will likely result in an error or potentially unexpected behaviour, so use with care.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.
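
Comparing column data types against an expected set can be sketched in base R. The `expected_types` vector is a hypothetical stand-in for what the hub config defines, and `typeof()` is a simplification (for example, `Date` columns report `"double"`), so this is not the package's implementation.

```r
# Hypothetical expected types, standing in for the hub config.
expected_types <- c(horizon = "integer", location = "character",
                    value = "double")

tbl <- data.frame(
  horizon = c(1L, 2L),
  location = c("US", "US"),
  value = c("10", "12"),   # read in as character instead of double
  stringsAsFactors = FALSE
)

actual_types <- vapply(tbl, typeof, character(1))
mismatch <- names(expected_types)[
  actual_types[names(expected_types)] != expected_types
]
```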

Check column names of model output data

Description

Checks that a tibble/data.frame of data read in from the file being validated contains the expected task ID and standard column names according to the round configuration being validated against.

Usage

check_tbl_colnames(tbl, round_id, file_path, hub_path = ".")

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated.
`round_id`	character string. The round identifier.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `⁠<SubTreeFileSystem>⁠` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) in the `arrow` package. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_error>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.
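
The column name comparison can be sketched in base R with `setdiff()` in both directions. The `expected` vector is a hypothetical set of task ID and standard columns; in practice it would come from the round configuration.

```r
# Hypothetical expected columns for a round.
expected <- c("origin_date", "horizon", "location",
              "output_type", "output_type_id", "value")

tbl <- data.frame(
  origin_date = "2022-10-08", horizon = "1", loc = "US",
  output_type = "mean", output_type_id = NA, value = "10",
  stringsAsFactors = FALSE
)

missing_cols <- setdiff(expected, names(tbl))  # expected but absent
extra_cols <- setdiff(names(tbl), expected)    # present but not expected
```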

Check derived task ID columns contain valid values

Description

This check validates that values in any derived task ID columns match the accepted values for each derived task ID in the config. Because derived task ID values depend on the values of other task IDs, the check ignores the combinations of derived task ID values with those of other task IDs and focuses only on identifying values that do not match the accepted values.

Usage

check_tbl_derived_task_id_vals(
  tbl,
  round_id,
  file_path,
  hub_path,
  derived_task_ids = get_hub_derived_task_ids(hub_path, round_id)
)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated. Column types must all be character.
`round_id`	character string. The round identifier.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `⁠<SubTreeFileSystem>⁠` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) in the `arrow` package. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`derived_task_ids`	Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task ids will contain `NA`s. Defaults to extracting derived task IDs from hub `tasks.json`. See `get_hub_derived_task_ids()` for more details.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

If no derived_task_ids are specified, the check is skipped and a <message/check_info> condition class object is returned.

Returned object also inherits from subclass ⁠<hub_check>⁠.
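
Checking values column by column, without looking at combinations, can be sketched in base R. The `accepted` list is a hypothetical stand-in for the accepted values in the config, and the column name is a placeholder.

```r
# Hypothetical accepted values for one derived task ID.
accepted <- list(target_end_date = c("2022-10-15", "2022-10-22"))

tbl <- data.frame(
  target_end_date = c("2022-10-15", "2022-10-29"),
  value = c("10", "12"),
  stringsAsFactors = FALSE
)

# For each derived task ID, collect values that are not accepted.
invalid <- lapply(names(accepted), function(id) {
  setdiff(unique(tbl[[id]]), accepted[[id]])
})
names(invalid) <- names(accepted)
invalid <- Filter(length, invalid)  # keep only task IDs with invalid values
```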

Check model output data tbl round ID matches submission round ID.

Description

Check model output data tbl round ID matches submission round ID.

Usage

check_tbl_match_round_id(tbl, file_path, hub_path, round_id_col = NULL)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `⁠<SubTreeFileSystem>⁠` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) in the `arrow` package. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`round_id_col`	Character string. The name of the column containing `round_id`s. Usually, the value of round property `round_id` in hub `tasks.json` config file. Defaults to `NULL`, in which case it is determined from the config if applicable.

Details

This check only applies to files being submitted to rounds where round_id_from_variable: true or where a round_id_col name is explicitly provided. Skipped otherwise.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_error>⁠ condition class object.

If round_id_from_variable: false and no round_id_col name is provided, the check is skipped and a <message/check_info> condition class object is returned. If no valid round_id_col name is provided or can be extracted from config (checked through check_valid_round_id_col), an <error/check_error> condition class object is returned and the rest of the check is skipped.
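
The comparison itself can be sketched in base R. The column name `origin_date` and the round ID are hypothetical; in practice the round ID would come from the submission file name and the column name from the config or `round_id_col`.

```r
tbl <- data.frame(
  origin_date = c("2022-10-08", "2022-10-15"),
  value = c("1", "2"),
  stringsAsFactors = FALSE
)
round_id <- "2022-10-08"       # hypothetical submission round ID
round_id_col <- "origin_date"  # hypothetical round ID column

# Values in the round ID column that differ from the submission round ID.
mismatched <- setdiff(unique(tbl[[round_id_col]]), round_id)
```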

Check model data rows are all unique

Description

Checks that combinations of task ID, output type and output type ID values are unique, by checking that there are no duplicate rows across all tbl columns excluding the value column.

Usage

check_tbl_rows_unique(tbl, file_path, hub_path)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated. Column types must all be character.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `⁠<SubTreeFileSystem>⁠` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) in the `arrow` package. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.
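
The duplicate detection described above can be sketched in base R with `duplicated()`. The column names and values are made up for illustration, and this is not the package's implementation.

```r
tbl <- data.frame(
  location = c("US", "US", "US"),
  output_type = c("quantile", "quantile", "quantile"),
  output_type_id = c("0.5", "0.5", "0.9"),
  value = c("10", "12", "20"),
  stringsAsFactors = FALSE
)

# Compare rows on every column except `value`.
key_cols <- setdiff(names(tbl), "value")
dup <- duplicated(tbl[key_cols]) | duplicated(tbl[key_cols], fromLast = TRUE)
dup_rows <- which(dup)  # rows 1 and 2 share the same key columns
```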

Check model output data tbl sample compound task id sets for each modeling task match or are coarser than the expected set defined in the config.

Description

This check detects the compound task ID sets of samples, implied by the output_type_id and task ID values, and checks them for internal consistency and compliance with the compound_taskid_set defined for each round modeling task in the tasks.json config.

Usage

check_tbl_spl_compound_taskid_set(
  tbl,
  round_id,
  file_path,
  hub_path,
  derived_task_ids = get_hub_derived_task_ids(hub_path)
)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated. Column types must all be character.
`round_id`	character string. The round identifier.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `⁠<SubTreeFileSystem>⁠` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details consult the Using cloud storage (S3, GCS) in the `arrow` package. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`derived_task_ids`	Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task ids will contain `NA`s. Defaults to extracting derived task IDs from hub `tasks.json`. See `get_hub_derived_task_ids()` for more details.

Details

If the check fails, the output of the check includes an errors element, a list of items, one for each modeling task failing validation. The structure depends on the reason the check failed.

If the check failed because more than a single unique compound_taskid_set was found for a given model task, the errors object will be a list with one element for each compound_taskid_set detected and will have the following structure:

tbl_comp_tids: a compound task ID set detected in the tbl.
output_type_ids: The output type ID of the sample that does not contain a single, unique value for each compound task ID.

If the check failed because task IDs that are not allowed in the config were identified as compound task IDs (i.e. samples describe "finer" compound modeling tasks) for a given model task, the errors object will be a list with the structure described above as well as the following additional elements:

config_comp_tids: the allowed compound_taskid_set defined in the modeling task config.
invalid_tbl_comp_tids: the names of invalid compound task IDs.

The name of each element is the index identifying the config modeling task the sample is associated with (mt_id). See hubverse documentation on samples for more details.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_error>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check model output data tbl samples contain single unique values for each compound task ID within individual samples

Description

Check model output data tbl samples contain single unique values for each compound task ID within individual samples

Usage

check_tbl_spl_compound_tid(
  tbl,
  round_id,
  file_path,
  hub_path,
  compound_taskid_set = NULL,
  derived_task_ids = get_hub_derived_task_ids(hub_path, round_id)
)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated. Column types must all be character.
`round_id`	character string. The round identifier.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`compound_taskid_set`	a list of `compound_taskid_set`s (character vectors of compound task IDs), one for each modeling task. Used to override the compound task ID set in the config file, for example, when validating coarser samples.
`derived_task_ids`	Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task IDs will contain `NA`s. Defaults to extracting derived task IDs from the hub `tasks.json` config file. See `get_hub_derived_task_ids()` for more details.

Details

Output of the check includes an errors element, a list of items, one for each sample failing validation, with the following structure:

mt_id: Index identifying the config modeling task the sample is associated with.
output_type_id: The output type ID of the sample that does not contain a single, unique value for each compound task ID.
values: The unique values of each compound task ID.

See hubverse documentation on samples for more details.
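
The logic of this check can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy `tbl`, the assumed compound task ID set (`location`) and the omission of `mt_id` (which would require the hub config) are all illustrative assumptions.

```r
# Toy model output table; all columns are character, as the check expects.
tbl <- data.frame(
  location = c("US", "US", "US", "CA"),
  horizon = c("1", "2", "1", "2"),
  output_type = "sample",
  output_type_id = c("s1", "s1", "s2", "s2"),
  value = c("10", "12", "11", "9"),
  stringsAsFactors = FALSE
)
compound_taskids <- "location" # assumed compound task ID set

# For each sample, collect the unique values of each compound task ID and
# record an error item when any compound task ID has more than one value.
samples <- split(tbl, tbl$output_type_id)
errors <- list()
for (id in names(samples)) {
  vals <- lapply(samples[[id]][compound_taskids], unique)
  if (any(lengths(vals) > 1L)) {
    errors[[id]] <- list(output_type_id = id, values = vals)
  }
}
names(errors)
```

In this toy table, sample `s2` spans two locations, so it is the only sample recorded in `errors`.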

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_error>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check model output data tbl samples contain the appropriate number of samples for a given compound idx.

Description

Check model output data tbl samples contain the appropriate number of samples for a given compound idx.

Usage

check_tbl_spl_n(
  tbl,
  round_id,
  file_path,
  hub_path,
  compound_taskid_set = NULL,
  derived_task_ids = get_hub_derived_task_ids(hub_path, round_id)
)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated. Column types must all be character.
`round_id`	character string. The round identifier.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`compound_taskid_set`	a list of `compound_taskid_set`s (character vectors of compound task IDs), one for each modeling task. Used to override the compound task ID set in the config file, for example, when validating coarser samples.
`derived_task_ids`	Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task IDs will contain `NA`s. Defaults to extracting derived task IDs from the hub `tasks.json` config file. See `get_hub_derived_task_ids()` for more details.

Details

Output of the check includes an errors element, a list of items, one for each compound_idx failing validation, with the following structure:

compound_idx: the compound idx that failed validation of number of samples.
n: the number of samples counted for the compound idx.
min_samples_per_task: the minimum number of samples required for the compound idx.
max_samples_per_task: the maximum number of samples allowed for the compound idx.
compound_idx_tbl: a tibble of the expected structure for samples belonging to the compound idx.

See hubverse documentation on samples for more details.
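
As a rough illustration of the counting involved, the base R sketch below flags compound idx values whose number of unique samples falls outside the configured range. The toy `tbl`, its precomputed `compound_idx` column and the limits used are assumptions for illustration only; the package derives these from the hub config.

```r
# Toy table: each row is labelled with the compound idx it belongs to.
tbl <- data.frame(
  compound_idx = c("1", "1", "1", "2", "2", "2", "2"),
  output_type_id = c("s1", "s2", "s3", "s4", "s4", "s5", "s5"),
  stringsAsFactors = FALSE
)
min_samples_per_task <- 3L # assumed config value
max_samples_per_task <- 5L # assumed config value

# Count unique sample IDs per compound idx.
n_samples <- vapply(
  split(tbl$output_type_id, tbl$compound_idx),
  function(x) length(unique(x)),
  integer(1)
)

# Keep only compound idx values whose count is out of range.
failing <- n_samples[
  n_samples < min_samples_per_task | n_samples > max_samples_per_task
]
failing
```

Here compound idx `"2"` contains only two unique samples, fewer than the assumed minimum of three, so it is the only failing entry.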

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check model output data tbl samples contain single unique combination of non-compound task ID values across all samples

Description

Check model output data tbl samples contain single unique combination of non-compound task ID values across all samples

Usage

check_tbl_spl_non_compound_tid(
  tbl,
  round_id,
  file_path,
  hub_path,
  compound_taskid_set = NULL,
  derived_task_ids = get_hub_derived_task_ids(hub_path, round_id)
)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated. Column types must all be character.
`round_id`	character string. The round identifier.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`compound_taskid_set`	a list of `compound_taskid_set`s (character vectors of compound task IDs), one for each modeling task. Used to override the compound task ID set in the config file, for example, when validating coarser samples.
`derived_task_ids`	Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task IDs will contain `NA`s. Defaults to extracting derived task IDs from the hub `tasks.json` config file. See `get_hub_derived_task_ids()` for more details.

Details

Output of the check includes an errors element, a list of items, one for each modeling task containing samples failing validation, with the following structure:

mt_id: Index identifying the config modeling task the samples are associated with.
output_type_ids: The output type IDs of samples that do not match the most frequent non-compound task ID value combination across all samples in the modeling task.
frequent: The most frequent non-compound task ID value combination across all samples in the modeling task to which all samples were compared.

See hubverse documentation on samples for more details.
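
The comparison against the most frequent combination can be sketched in base R as follows. This is an illustrative approximation rather than the package's code: the toy `tbl`, the assumed non-compound task ID (`horizon`) and the string "signature" used to compare samples are all assumptions.

```r
# Toy table with three samples; horizon is assumed to be the only
# non-compound task ID.
tbl <- data.frame(
  location = "US",
  horizon = c("1", "2", "1", "2", "1", "3"),
  output_type_id = c("s1", "s1", "s2", "s2", "s3", "s3"),
  stringsAsFactors = FALSE
)
non_compound_taskids <- "horizon"

# Summarise the set of non-compound task ID value combinations covered by
# each sample as a single string.
signature <- vapply(
  split(tbl[non_compound_taskids], tbl$output_type_id),
  function(x) paste(sort(do.call(paste, c(x, sep = "|"))), collapse = ";"),
  character(1)
)

# The most frequent signature is the reference all samples are compared to.
frequent <- names(which.max(table(signature)))
output_type_ids <- names(signature)[signature != frequent]
output_type_ids
```

Samples `s1` and `s2` both cover horizons 1 and 2, so that combination is the reference; `s3` covers horizons 1 and 3 and is flagged.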

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_error>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check model output data tbl contains a single unique round ID.

Description

Check model output data tbl contains a single unique round ID.

Usage

check_tbl_unique_round_id(tbl, file_path, hub_path, round_id_col = NULL)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`round_id_col`	Character string. The name of the column containing `round_id`s. Usually, the value of round property `round_id` in hub `tasks.json` config file. Defaults to `NULL`, in which case it is determined from the config if applicable.

Details

This check only applies to files being submitted to rounds where round_id_from_variable: true or where a round_id_col name is explicitly provided. Skipped otherwise.
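
Conceptually, the check reduces to counting the unique values in the round ID column. The base R sketch below assumes a toy `tbl` and a hypothetical `round_id_col` of `"origin_date"`; it does not reproduce the package's condition objects.

```r
# Toy table in which two different round IDs have been submitted together.
tbl <- data.frame(
  origin_date = c("2022-10-08", "2022-10-08", "2022-10-15"),
  value = c("1", "2", "3"),
  stringsAsFactors = FALSE
)
round_id_col <- "origin_date" # assumed round ID column name

# The check passes only when exactly one unique round ID is present.
unique_round_ids <- unique(tbl[[round_id_col]])
check <- length(unique_round_ids) == 1L
check
```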

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_error>⁠ condition class object.

Check output type values of model output data against config

Description

Checks that values in the value column of a tibble/data.frame of data read in from the file being validated conform to the configuration for each output type of the appropriate model task.

Usage

check_tbl_value_col(
  tbl,
  round_id,
  file_path,
  hub_path,
  derived_task_ids = get_hub_derived_task_ids(hub_path, round_id)
)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated.
`round_id`	character string. The round identifier.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`derived_task_ids`	Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task IDs will contain `NA`s. Defaults to extracting derived task IDs from the hub `tasks.json` config file. See `get_hub_derived_task_ids()` for more details.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check that `quantile` and `cdf` output type values of model output data are non-descending

Description

Checks that values in the value column for quantile and cdf output type data for each unique task ID/output type combination are non-descending when arranged by increasing output_type_id order. Check only performed if tbl contains quantile or cdf output type data. If not, the check is skipped and a ⁠<message/check_info>⁠ condition class object is returned.
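
The idea behind the check can be sketched in base R as below. This is a simplified illustration, not the package's implementation: it assumes a toy `tbl` with a single task ID (`location`), only `quantile` output, and numeric `output_type_id` values.

```r
# Toy quantile output for two locations; values for "CA" dip at the median.
tbl <- data.frame(
  location = c(rep("US", 3), rep("CA", 3)),
  output_type = "quantile",
  output_type_id = rep(c("0.25", "0.5", "0.75"), 2),
  value = c("1", "2", "3", "5", "4", "6"),
  stringsAsFactors = FALSE
)

# Within each task ID combination, order rows by output_type_id and test
# that successive values never decrease.
is_non_descending <- vapply(
  split(tbl, tbl$location),
  function(x) {
    ord <- order(as.numeric(x$output_type_id))
    all(diff(as.numeric(x$value[ord])) >= 0)
  },
  logical(1)
)
is_non_descending
```

In this example the `"CA"` values (5, 4, 6) decrease between the 0.25 and 0.5 quantiles, so only `"US"` passes.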

Usage

check_tbl_value_col_ascending(
  tbl,
  file_path,
  hub_path,
  round_id,
  derived_task_ids = get_hub_derived_task_ids(hub_path)
)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated. Column types must all be character.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`round_id`	character string. The round identifier.
`derived_task_ids`	Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task IDs will contain `NA`s. Defaults to extracting derived task IDs from the hub `tasks.json` config file. See `get_hub_derived_task_ids()` for more details.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check that `pmf` output type values of model output data sum to 1.

Description

Checks that values in the value column of pmf output type data for each unique task ID combination sum to 1. Check only performed if tbl contains pmf output type data. If not, the check is skipped and a ⁠<message/check_info>⁠ condition class object is returned.
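
A minimal base R sketch of the underlying computation is shown below. The toy `tbl`, the single task ID (`location`) and the numeric tolerance are assumptions for illustration; the tolerance actually used by the package is not stated here.

```r
# Toy pmf output for two locations; probabilities for "CA" sum to 0.9.
tbl <- data.frame(
  location = c("US", "US", "US", "CA", "CA", "CA"),
  output_type = "pmf",
  output_type_id = rep(c("low", "mid", "high"), 2),
  value = c(0.2, 0.3, 0.5, 0.2, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# Sum the probabilities within each task ID combination and compare the
# result to 1 using a small tolerance (assumed value).
sums <- vapply(split(tbl$value, tbl$location), sum, numeric(1))
tolerance <- 1e-8
sums_to_1 <- abs(sums - 1) < tolerance
sums_to_1
```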

Usage

check_tbl_value_col_sum1(tbl, file_path)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check model output data tbl contains valid value combinations

Description

Check model output data tbl contains valid value combinations

Usage

check_tbl_values(
  tbl,
  round_id,
  file_path,
  hub_path,
  derived_task_ids = get_hub_derived_task_ids(hub_path, round_id)
)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated. Column types must all be character.
`round_id`	character string. The round identifier.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`derived_task_ids`	Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task IDs will contain `NA`s. Defaults to extracting derived task IDs from the hub `tasks.json` config file. See `get_hub_derived_task_ids()` for more details.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_error>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check all required task ID/output type/output type ID value combinations present in model data.

Description

Check all required task ID/output type/output type ID value combinations present in model data.

Usage

check_tbl_values_required(
  tbl,
  round_id,
  file_path,
  hub_path,
  derived_task_ids = get_hub_derived_task_ids(hub_path)
)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated. Column types must all be character.
`round_id`	character string. The round identifier.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`derived_task_ids`	Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task IDs will contain `NA`s. Defaults to extracting derived task IDs from the hub `tasks.json` config file. See `get_hub_derived_task_ids()` for more details.

Details

Note that it is necessary for derived_task_ids to be specified if any of the task IDs with required values have dependent derived task IDs. If this is the case and derived task IDs are not specified, the dependent nature of derived task ID values will result in false validation errors when validating required values.
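
The following base R sketch illustrates why unspecified derived task IDs can produce false errors. It assumes a hypothetical derived task ID `target_end_date` computed as `origin_date + 7 * horizon` days; the names and the relationship are illustrative and not taken from any particular hub.

```r
origin_date <- as.Date("2022-10-08")
horizon <- 1:2
# Hypothetical derived task ID: fully determined by origin_date and horizon.
target_end_date <- origin_date + 7 * horizon

# Treating the derived task ID as independent expands to every combination
# of horizon and target_end_date, including impossible ones.
naive <- expand.grid(horizon = horizon, target_end_date = target_end_date)

# Only rows respecting the dependency can ever appear in a submission.
feasible <- naive[naive$target_end_date == origin_date + 7 * naive$horizon, ]

nrow(naive)
nrow(feasible)
```

Half of the naively expanded rows can never be submitted, so requiring all of them would report missing values that no valid file could contain.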

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check whether the `round_id` determined for the submission is valid

Description

Check whether the round_id determined for the submission is valid

Usage

check_valid_round_id(round_id, file_path, hub_path = ".")

Arguments

`round_id`	character string. The round identifier.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_error>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Check that any round_id_col name provided or extracted from the hub config is valid.

Description

Check that any round_id_col name provided or extracted from the hub config is valid.

Usage

check_valid_round_id_col(tbl, file_path, hub_path, round_id_col = NULL)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`round_id_col`	Character string. The name of the column containing `round_id`s. Usually, the value of round property `round_id` in hub `tasks.json` config file. Defaults to `NULL`, in which case it is determined from the config if applicable.

Details

This check only applies to files being submitted to rounds where round_id_from_variable: true or where a round_id_col name is explicitly provided. Skipped otherwise.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

If round_id_from_variable: false and no round_id_col name is provided, check is skipped and a ⁠<message/check_info>⁠ condition class object is returned. Returned object also inherits from subclass ⁠<hub_check>⁠.

Concatenate `hub_validations` S3 class objects

Description

Concatenate hub_validations S3 class objects

Usage

combine(...)

Arguments

`...`	`hub_validations` S3 class objects to be concatenated.

Value

a hub_validations S3 class object.

Create a custom validation check function template file.

Description

Create a custom validation check function template file.

Usage

create_custom_check(
  name,
  hub_path = ".",
  r_dir = "src/validations/R",
  error = FALSE,
  conditional = FALSE,
  error_object = FALSE,
  config = FALSE,
  extra_args = FALSE,
  overwrite = FALSE
)

Arguments

`name`	Character string. Name of the custom check function. We recommend following the hubValidations package naming convention. For more details, consult the article on writing custom check functions.
`hub_path`	Character string. Path to the hub directory. Default is the current working directory.
`r_dir`	Character string. Path (relative to `hub_path`) to the directory the custom check function file will be written to. Default is `src/validations/R` which is the recommended directory for storing custom check functions.
`error`	Logical. Defaults to `FALSE`, which will return a `<error/check_failure>` class object in the case of a failed check. Set this to `TRUE` if your custom check function is required to pass for other custom checks to be performed; in the case of a failed check, the custom check function will then return an `<error/check_error>` class object and cause custom validations to return early. Note that in the case of custom validations, execution errors in custom functions will also result in custom validations returning early.
`conditional`	Logical. If `TRUE`, the custom check function template will include a block of code to check a condition before running the check. This is useful when a check may need to be skipped based on a condition.
`error_object`	Logical. If `TRUE`, the custom check function template will include an error object that can be used to store additional information about the properties of the object being checked that caused check failure. For example, it could store the index of rows in a `tbl` that caused a check failure.
`config`	Logical. If `TRUE`, the custom check function template will include `hub_path` as a function argument and a block of code for reading in the hub `tasks.json` config file.
`extra_args`	Logical. If `TRUE`, the custom check function template will include an `extra_arg` template function argument and template block of code to check the input arguments of the custom check function.
`overwrite`	Logical. If `TRUE`, the function will overwrite an existing custom check function file of the same name.

Details

See the article on writing custom check functions for more.
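
For orientation, a hand-written custom check might look roughly like the sketch below. It is an assumption-laden illustration, not the template that `create_custom_check()` generates: the function name `check_no_negative_values` and its logic are hypothetical, and only the `capture_check_cnd()` arguments documented in this manual are used. Calling the function requires the hubValidations package to be installed.

```r
# Hypothetical custom check: fails when the value column contains negative
# numbers. The name and logic are illustrative only.
check_no_negative_values <- function(tbl, file_path) {
  # TRUE when validation succeeds, FALSE when it fails.
  check <- !any(as.numeric(tbl$value) < 0, na.rm = TRUE)

  # Wrap the logical result in a hub_check condition object. With the default
  # msg_verbs, the message reads "... is ..." on success and
  # "... must be ..." on failure.
  hubValidations::capture_check_cnd(
    check = check,
    file_path = file_path,
    msg_subject = "Value column",
    msg_attribute = "non-negative.",
    error = FALSE
  )
}
```

With `error = FALSE`, a failed check is reported as a `<error/check_failure>` object, so subsequent custom checks still run.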

Value

Invisibly returns TRUE if the custom check function file is created successfully.

Examples

withr::with_tempdir({
  # Create the custom check file with default settings.
  create_custom_check("check_default")
  cat(readLines("src/validations/R/check_default.R"), sep = "\n")

  # Create fully featured custom check file.
  create_custom_check("check_full",
    error = TRUE, conditional = TRUE,
    error_object = TRUE, config = TRUE,
    extra_args = TRUE
  )
  cat(readLines("src/validations/R/check_full.R"), sep = "\n")
})

Create expanded grid of valid task ID and output type value combinations

Description

Create expanded grid of valid task ID and output type value combinations

Usage

expand_model_out_grid(
  config_tasks,
  round_id,
  required_vals_only = FALSE,
  force_output_types = FALSE,
  all_character = FALSE,
  output_type_id_datatype = c("from_config", "auto", "character", "double", "integer",
    "logical", "Date"),
  as_arrow_table = FALSE,
  bind_model_tasks = TRUE,
  include_sample_ids = FALSE,
  compound_taskid_set = NULL,
  output_types = NULL,
  derived_task_ids = get_config_derived_task_ids(config_tasks, round_id)
)

Arguments

`config_tasks`	a list version of the contents of a hub's `tasks.json` config file, accessed through the `"config_tasks"` attribute of a `<hub_connection>` object or function `hubUtils::read_config()`.
`round_id`	Character string. Round identifier. If the round is set to `round_id_from_variable: true`, IDs are values of the task ID defined in the round's `round_id` property of `config_tasks`. Otherwise should match round's `round_id` value in config. Ignored if hub contains only a single round.
`required_vals_only`	Logical. Whether to return only combinations of required task ID and related output type ID values.
`force_output_types`	Logical. Whether to force all output types to be required. If `TRUE`, all output type ID values are treated as required regardless of the value of the `is_required` property. Useful for creating grids of required values for optional output types.
`all_character`	Logical. Whether to return all columns as character.
`output_type_id_datatype`	character string. One of `"from_config"`, `"auto"`, `"character"`, `"double"`, `"integer"`, `"logical"`, `"Date"`. Defaults to `"from_config"` which uses the setting in the `output_type_id_datatype` property in the `tasks.json` config file if available. If the property is not set in the config, the argument falls back to `"auto"` which determines the `output_type_id` data type automatically from the `tasks.json` config file as the simplest data type required to represent all output type ID values across all output types in the hub. When only point estimate output types (where `output_type_id`s are `NA`) are being collected by a hub, the `output_type_id` column is assigned a `character` data type when auto-determined. Other data type values can be used to override automatic determination. Note that attempting to coerce `output_type_id` to a data type that is not valid for the data (e.g. trying to coerce `"character"` values to `"double"`) will likely result in an error or potentially unexpected behaviour, so use with care.
`as_arrow_table`	Logical. Whether to return an arrow table. Defaults to `FALSE`.
`bind_model_tasks`	Logical. Whether to bind expanded grids of values from multiple modeling tasks into a single tibble/arrow table or return a list.
`include_sample_ids`	Logical. Whether to include sample identifiers in the `output_type_id` column.
`compound_taskid_set`	List of character vectors, one for each modeling task in the round. Can be used to override the compound task ID set defined in the config. If `NULL` is provided for a given modeling task, a compound task ID set of all task IDs is used.
`output_types`	Character vector of output type names to include. Use to subset for grids for specific output types.
`derived_task_ids`	Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task ids will contain `NA`s. Defaults to extracting derived task IDs from `config_tasks`. See `get_config_derived_task_ids()` for more details.

Details

When a round is set to round_id_from_variable: true, the value of the task ID from which round IDs are derived (i.e. the task ID specified in round_id property of config_tasks) is set to the value of the round_id argument in the returned output.

When sample output types are included in the output and include_sample_ids = TRUE, the output_type_id column contains example sample indexes which are useful for identifying the compound task ID structure of multivariate sampling distributions in particular, i.e. which combinations of task ID values represent individual samples.

Value

If bind_model_tasks = TRUE (default) a tibble or arrow table containing all possible task ID and related output type ID value combinations. If bind_model_tasks = FALSE, a list containing a tibble or arrow table for each round modeling task.

Columns are coerced to data types according to the hub schema, unless all_character = TRUE. If all_character = TRUE, all columns are returned as character which can be faster when large expanded grids are expected. If required_vals_only = TRUE, values are limited to the combinations of required values only.

Note that if required_vals_only = TRUE and an optional output type is requested through output_types, a zero row grid will be returned. If all output types are requested however (i.e. when output_types = NULL) and they are all optional, a grid of required task ID values only will be returned. However, whenever force_output_types = TRUE, all output types are treated as required.
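
To make the notion of an expanded grid concrete, the base R sketch below builds one by hand for a made-up configuration. The task IDs, output types and values are assumptions for illustration, and the sketch ignores the required/optional distinction, data type coercion from the hub schema and sample handling that `expand_model_out_grid()` performs.

```r
# Made-up task ID values and output type ID values.
task_ids <- list(location = c("US", "CA"), horizon = c(1L, 2L))
output_type_ids <- list(
  mean = NA_real_, # point estimates have no output type ID
  quantile = c(0.25, 0.5, 0.75)
)

# Expand each output type against all task ID combinations, then bind the
# per-output-type grids into a single data frame.
grid <- do.call(rbind, lapply(names(output_type_ids), function(ot) {
  expand.grid(
    c(task_ids, list(output_type = ot, output_type_id = output_type_ids[[ot]])),
    stringsAsFactors = FALSE
  )
}))
nrow(grid)
```

The two locations and two horizons give four task ID combinations, which yield four `mean` rows and twelve `quantile` rows.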

Examples

hub_con <- hubData::connect_hub(
  system.file("testhubs/flusight", package = "hubUtils")
)
config_tasks <- attr(hub_con, "config_tasks")
expand_model_out_grid(config_tasks, round_id = "2023-01-02")
expand_model_out_grid(
  config_tasks,
  round_id = "2023-01-02",
  required_vals_only = TRUE
)
# Specifying a round in a hub with multiple round configurations.
hub_con <- hubData::connect_hub(
  system.file("testhubs/simple", package = "hubUtils")
)
config_tasks <- attr(hub_con, "config_tasks")
expand_model_out_grid(config_tasks, round_id = "2022-10-01")
# Later round_id maps to round config that includes additional task ID 'age_group'.
expand_model_out_grid(config_tasks, round_id = "2022-10-29")
# Coerce all columns to character
expand_model_out_grid(config_tasks,
  round_id = "2022-10-29",
  all_character = TRUE
)
# Return arrow table
expand_model_out_grid(config_tasks,
  round_id = "2022-10-29",
  all_character = TRUE,
  as_arrow_table = TRUE
)
# Hub with sample output type
config_tasks <- read_config_file(system.file("config", "tasks.json",
  package = "hubValidations"
))
expand_model_out_grid(config_tasks,
  round_id = "2022-12-26"
)
# Include sample IDs
expand_model_out_grid(config_tasks,
  round_id = "2022-12-26",
  include_sample_ids = TRUE
)
# Hub with sample output type and compound task ID structure
config_tasks <- read_config_file(
  system.file("config", "tasks-comp-tid.json", package = "hubValidations")
)
expand_model_out_grid(config_tasks,
  round_id = "2022-12-26",
  include_sample_ids = TRUE
)
# Override config compound task ID set
# Create coarser compound task ID set for the first modeling task which contains
# samples
expand_model_out_grid(config_tasks,
  round_id = "2022-12-26",
  include_sample_ids = TRUE,
  compound_taskid_set = list(
    c("forecast_date", "target"),
    NULL
  )
)
expand_model_out_grid(config_tasks,
  round_id = "2022-12-26",
  include_sample_ids = TRUE,
  compound_taskid_set = list(
    NULL,
    NULL
  )
)
# Subset output types
config_tasks <- read_config(
  system.file("testhubs", "samples", package = "hubValidations")
)
expand_model_out_grid(config_tasks,
  round_id = "2022-10-29",
  include_sample_ids = TRUE,
  bind_model_tasks = FALSE,
  output_types = c("sample", "pmf")
)
expand_model_out_grid(config_tasks,
  round_id = "2022-10-29",
  include_sample_ids = TRUE,
  bind_model_tasks = TRUE,
  output_types = "sample"
)
# Ignore derived task IDs
expand_model_out_grid(config_tasks,
  round_id = "2022-10-29",
  include_sample_ids = TRUE,
  bind_model_tasks = FALSE,
  output_types = "sample",
  derived_task_ids = "target_end_date"
)
# Return only required values
hub_path <- system.file("testhubs", "v4", "simple", package = "hubUtils")
config_tasks <- read_config(hub_path)
# Return required output types and output_types_ids only
expand_model_out_grid(
  config_tasks = config_tasks,
  round_id = "2022-10-22",
  required_vals_only = TRUE
)
# Force all output types to be required
expand_model_out_grid(
  config_tasks = config_tasks,
  round_id = "2022-10-22",
  required_vals_only = TRUE,
  force_output_types = TRUE
)
# Sub-setting for an optional output type returns an empty data frame
expand_model_out_grid(
  config_tasks = config_tasks,
  round_id = "2022-10-22",
  output_types = "mean",
  required_vals_only = TRUE
)
# force_output_types on an optional output type forces all output_type_id values
# to be required
expand_model_out_grid(
  config_tasks = config_tasks,
  round_id = "2022-10-22",
  output_types = "mean",
  required_vals_only = TRUE,
  force_output_types = TRUE
)
# Ignore derived task IDs
hub_path <- system.file("testhubs", "v4", "flusight", package = "hubUtils")
config_tasks <- read_config(hub_path)
# Defaults to using derived_task_ids from config
expand_model_out_grid(config_tasks, round_id = "2023-05-08")
# Can be overridden by argument derived_task_ids
expand_model_out_grid(config_tasks,
  round_id = "2023-05-08",
  derived_task_ids = NULL
)

Get hub configuration fields from a `⁠<config>⁠` class object

Description

Get hub configuration fields from a ⁠<config>⁠ class object

Usage

get_config_derived_task_ids(config_tasks, round_id = NULL)

Arguments

`config_tasks`	a list version of the contents of a hub's `tasks.json` config file, accessed through the `"config_tasks"` attribute of a `<hub_connection>` object or function `hubUtils::read_config()`.
`round_id`	Character string. Round identifier. If the round is set to `round_id_from_variable: true`, IDs are values of the task ID defined in the round's `round_id` property of `config_tasks`. Otherwise should match round's `round_id` value in config. Ignored if hub contains only a single round.

Value

get_config_derived_task_ids: character vector of hub or round level derived task ID names. If round_id is NULL or the round does not have a round level derived_task_ids setting, returns the hub level derived_task_ids setting.

Functions

get_config_derived_task_ids(): Get the hub or round level derived_task_ids

Examples

hub_path <- system.file("testhubs/v4/flusight", package = "hubUtils")
config_tasks <- read_config(hub_path)
get_config_derived_task_ids(config_tasks)
get_config_derived_task_ids(config_tasks, round_id = "2023-05-08")
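The fallback from round level to hub level described under Value can be illustrated without a hub on disk. The following is a minimal sketch in base R, not the package's implementation: `toy_config` is a hypothetical, heavily simplified stand-in for a parsed `tasks.json`, and `resolve_derived_task_ids()` is an illustrative helper that is not part of hubValidations.

```r
# Hypothetical, simplified config: one hub level setting and two rounds,
# only one of which defines its own round level setting.
toy_config <- list(
  derived_task_ids = "target_end_date",
  rounds = list(
    list(round_id = "2023-05-08", derived_task_ids = c("target_end_date", "horizon")),
    list(round_id = "2023-05-15")
  )
)

# Illustrative helper (not a hubValidations function).
resolve_derived_task_ids <- function(config, round_id = NULL) {
  hub_level <- config[["derived_task_ids"]]
  if (is.null(round_id)) {
    return(hub_level)
  }
  for (round in config[["rounds"]]) {
    is_target_round <- identical(round[["round_id"]], round_id)
    if (is_target_round && !is.null(round[["derived_task_ids"]])) {
      # The round level setting takes precedence when it exists.
      return(round[["derived_task_ids"]])
    }
  }
  # No round level setting found: fall back to the hub level setting.
  hub_level
}

resolve_derived_task_ids(toy_config)
resolve_derived_task_ids(toy_config, round_id = "2023-05-08")
resolve_derived_task_ids(toy_config, round_id = "2023-05-15")
```

Real configs can also derive round IDs from a task ID variable (`round_id_from_variable: true`), which this sketch deliberately ignores.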

Detect the compound_taskid_set for a tbl for each modeling task in a given round.

Description

Detect the compound_taskid_set for a tbl for each modeling task in a given round.

Usage

get_tbl_compound_taskid_set(
  tbl,
  config_tasks,
  round_id,
  compact = TRUE,
  error = TRUE,
  derived_task_ids = get_config_derived_task_ids(config_tasks, round_id)
)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated. Column types must all be character.
`config_tasks`	a list representation of the `tasks.json` config file.
`round_id`	Character string. The round ID.
`compact`	Logical. If TRUE, the output will be compacted to remove NULL elements.
`error`	Logical. If TRUE, an error will be thrown if the compound task ID set is not valid. If FALSE and an error is detected, the detected compound task ID set will be returned with error attributes attached.
`derived_task_ids`	Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task ids will contain `NA`s. Defaults to extracting derived task IDs from `config_tasks`. See `get_config_derived_task_ids()` for more details.

Value

A list of vectors of compound task IDs detected in the tbl, one for each modeling task in the round. If compact is TRUE, modeling tasks returning NULL elements will be removed.

Examples

hub_path <- system.file("testhubs/samples", package = "hubValidations")
file_path <- "flu-base/2022-10-22-flu-base.csv"
round_id <- "2022-10-22"
tbl <- read_model_out_file(
  file_path = file_path,
  hub_path = hub_path,
  coerce_types = "chr"
)
config_tasks <- read_config(hub_path, "tasks")
get_tbl_compound_taskid_set(tbl, config_tasks, round_id)
get_tbl_compound_taskid_set(tbl, config_tasks, round_id,
  compact = FALSE
)

Get status of a hub check

Description

Get status of a hub check

Usage

is_success(x)

is_failure(x)

is_error(x)

is_info(x)

not_pass(x)

is_exec_error(x)

is_exec_warn(x)

is_any_error(x)

Arguments

`x`	an object that inherits from class `⁠<hub_check>⁠` to test.

Value

Logical. TRUE if the check has the given status, FALSE otherwise.

Functions

is_success(): Is check success?
is_failure(): Is check failure?
is_error(): Is check error?
is_info(): Is check info?
not_pass(): Did check not pass?
is_exec_error(): Is exec error?
is_exec_warn(): Is exec warning?
is_any_error(): Is error or exec error?
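
Because check results are condition objects that also inherit from `<hub_check>`, status predicates of this kind can be thought of as class tests. The sketch below is an illustration in base R under stated assumptions: `mock_check()` and `has_status()` are hypothetical helpers, and the exact class vectors that hubValidations attaches to its check objects may differ from the ones mocked here.

```r
# Build a mock condition object carrying a status class (assumed layout).
mock_check <- function(status_class, condition_class) {
  structure(
    list(message = "mock check result", call = NULL),
    class = c(status_class, "hub_check", condition_class, "condition")
  )
}

# Hypothetical predicate: TRUE only for hub checks with the given status class.
has_status <- function(x, status_class) {
  inherits(x, "hub_check") && inherits(x, status_class)
}

checks <- list(
  file_exists = mock_check("check_success", "message"),
  file_name = mock_check("check_failure", "error")
)

# Keep only the checks that did not succeed.
not_passed <- Filter(function(x) !has_status(x, "check_success"), checks)
names(not_passed)
```

With real validation output, the exported predicates listed above should be used rather than hand-written class tests.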

Match model output `tbl` data to their model tasks in `config_tasks`.

Description

Split and match model output tbl data to their corresponding model tasks in config_tasks. Useful for performing model task specific checks on model output. For v3 samples, the output_type_id column is set to NA for sample outputs.

Usage

match_tbl_to_model_task(
  tbl,
  config_tasks,
  round_id,
  output_types = NULL,
  derived_task_ids = get_config_derived_task_ids(config_tasks, round_id),
  all_character = TRUE
)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated.
`config_tasks`	a list version of the contents of a hub's `tasks.json` config file, accessed through the `"config_tasks"` attribute of a `<hub_connection>` object or function `hubUtils::read_config()`.
`round_id`	Character string. Round identifier. If the round is set to `round_id_from_variable: true`, IDs are values of the task ID defined in the round's `round_id` property of `config_tasks`. Otherwise should match round's `round_id` value in config. Ignored if hub contains only a single round.
`output_types`	Character vector of output type names to include. Use to subset to specific output types.
`derived_task_ids`	Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task IDs will contain `NA`s. Defaults to extracting derived task IDs from `config_tasks`. See `get_config_derived_task_ids()` for more details.
`all_character`	Logical. Whether to return all columns as character.

Value

A list of tbl_df objects containing model output data matched to model tasks, with one element per model task in the round.

Examples

hub_path <- system.file("testhubs/samples", package = "hubValidations")
tbl <- read_model_out_file(
  file_path = "flu-base/2022-10-22-flu-base.csv",
  hub_path, coerce_types = "chr"
)
config_tasks <- read_config(hub_path, "tasks")
match_tbl_to_model_task(tbl, config_tasks, round_id = "2022-10-22")
match_tbl_to_model_task(tbl, config_tasks,
  round_id = "2022-10-22",
  output_types = "sample"
)

Create new or convert list to `hub_validations` S3 class object

Description

Create new or convert list to hub_validations S3 class object

Usage

new_hub_validations(...)

as_hub_validations(x)

Arguments

`...`	named elements to be included. Each element must be an object which inherits from class `⁠<hub_check>⁠`.
`x`	a list of named elements. Each element must be an object which inherits from class `⁠<hub_check>⁠`.

Value

an S3 object of class ⁠<hub_validations>⁠.

Functions

new_hub_validations(): Create new ⁠<hub_validations>⁠ S3 class object
as_hub_validations(): Convert list to ⁠<hub_validations>⁠ S3 class object

Examples

new_hub_validations()

hub_path <- system.file("testhubs/simple", package = "hubValidations")
file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
new_hub_validations(
  file_exists = check_file_exists(file_path, hub_path),
  file_name = check_file_name(file_path)
)
x <- list(
  file_exists = check_file_exists(file_path, hub_path),
  file_name = check_file_name(file_path)
)
as_hub_validations(x)

Check that submitting team does not exceed maximum number of allowed models per team

Description

Check that submitting team does not exceed maximum number of allowed models per team

Usage

opt_check_metadata_team_max_model_n(file_path, hub_path, n_max = 2L)

Arguments

`file_path`	character string. Path to the file being validated relative to the hub's model-metadata directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`n_max`	Integer. Maximum number of models allowed per team.

Details

Should be deployed as part of validate_model_metadata optional checks.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.
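
The check compares the number of models a team has registered against `n_max`. A rough base R sketch of that idea follows. It is not the package's implementation: `metadata_files` is made-up data, `team_of()` and `team_exceeds_max()` are hypothetical helpers, and the sketch assumes metadata files are named `<team_abbr>-<model_abbr>` with no hyphen inside `team_abbr`.

```r
# Made-up listing of a hub's model metadata files.
metadata_files <- c(
  "teamA-model1.yml",
  "teamA-model2.yml",
  "teamA-model3.yaml",
  "teamB-model1.yml"
)

# Hypothetical helper: everything before the first hyphen is the team.
team_of <- function(file_name) {
  sub("-.*$", "", basename(file_name))
}

# Hypothetical helper: does the submitting team have more than n_max models?
team_exceeds_max <- function(file_path, metadata_files, n_max = 2L) {
  n_models <- sum(team_of(metadata_files) == team_of(file_path))
  n_models > n_max
}

team_exceeds_max("teamA-model3.yaml", metadata_files)
team_exceeds_max("teamB-model1.yml", metadata_files)
```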

Check that the time difference between values in two date columns equals a defined period.

Description

Check that the time difference between values in two date columns equals a defined period.

Usage

opt_check_tbl_col_timediff(
  tbl,
  file_path,
  hub_path,
  t0_colname,
  t1_colname,
  timediff = lubridate::weeks(2),
  output_type_id_datatype = c("from_config", "auto", "character", "double", "integer",
    "logical", "Date")
)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`t0_colname`	Character string. The name of the time zero date column.
`t1_colname`	Character string. The name of the time zero + 1 time step date column.
`timediff`	an object of class `lubridate` `Period` and length 1.
`output_type_id_datatype`	character string. One of `"from_config"`, `"auto"`, `"character"`, `"double"`, `"integer"`, `"logical"`, `"Date"`. Defaults to `"from_config"`, which uses the setting in the `output_type_id_datatype` property in the `tasks.json` config file if available. If the property is not set in the config, the argument falls back to `"auto"`, which determines the `output_type_id` data type automatically from the `tasks.json` config file as the simplest data type required to represent all output type ID values across all output types in the hub. When a hub collects only point estimate output types (where `output_type_id`s are `NA`), the `output_type_id` column is assigned a `character` data type when auto-determined. Other data type values can be used to override automatic determination. Note that attempting to coerce `output_type_id` to a data type that is not valid for the data (e.g. trying to coerce `"character"` values to `"double"`) will likely result in an error or potentially unexpected behaviour, so use with care.

Details

Should be deployed as part of validate_model_data optional checks.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.
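
The comparison behind this check can be sketched in base R. This is an illustration under assumptions rather than the package's code: the column names and data are made up, and the period is expressed as a plain number of days instead of a `lubridate` `Period` object.

```r
# Made-up model output rows with a time zero and a time one date column.
tbl <- data.frame(
  forecast_date = as.Date(c("2023-05-08", "2023-05-08", "2023-05-15")),
  target_end_date = as.Date(c("2023-05-22", "2023-05-23", "2023-05-29"))
)

# Two weeks, expressed in days for this sketch.
timediff_days <- 14

# Observed difference between the two date columns, in days.
observed_days <- as.numeric(
  difftime(tbl$target_end_date, tbl$forecast_date, units = "days")
)

# Rows where the difference does not equal the defined period.
invalid_rows <- which(observed_days != timediff_days)
invalid_rows
```

Any row index returned here would correspond to a value that fails the check.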

Check that predicted values per location are less than total location population.

Description

Check that predicted values per location are less than total location population.

Usage

opt_check_tbl_counts_lt_popn(
  tbl,
  file_path,
  hub_path,
  targets = NULL,
  popn_file_path = "auxiliary-data/locations.csv",
  popn_col = "population",
  location_col = "location"
)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`targets`	Either a single target key list or a list of multiple target key lists.
`popn_file_path`	Character string. Path to population data relative to the hub root. Defaults to `auxiliary-data/locations.csv`.
`popn_col`	Character string. The name of the population size column in the population data set.
`location_col`	Character string. The name of the location column. Used to join population data to submission file data. Must be shared by both files.

Details

Should only be applied to rows containing count predictions. Use argument targets to filter tbl data to appropriate count target rows.

Should be deployed as part of validate_model_data optional checks.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.

Examples

hub_path <- system.file("testhubs/flusight", package = "hubValidations")
file_path <- "hub-ensemble/2023-05-08-hub-ensemble.parquet"
tbl <- hubValidations::read_model_out_file(file_path, hub_path)
# Single target key list
targets <- list("target" = "wk ahead inc flu hosp")
opt_check_tbl_counts_lt_popn(tbl, file_path, hub_path, targets = targets)
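
To show what the check compares, here is a small base R sketch of joining population data to predictions on the location column. It is an illustration only, with made-up data. The `value` column name and the helper variable names are assumptions, and the real function additionally reads the population file from the hub and filters rows by `targets`.

```r
# Made-up count predictions and population data sharing a location column.
tbl <- data.frame(
  location = c("01", "02", "02"),
  value = c(120, 5e6, 300)
)
popn <- data.frame(
  location = c("01", "02"),
  population = c(5e6, 7e5)
)

# Join population sizes onto the predictions via the shared location column.
joined <- merge(tbl, popn, by = "location", all.x = TRUE)

# Predictions fail when they are not strictly less than the population.
invalid <- joined[which(joined$value >= joined$population), ]
invalid
```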

Check that the time difference between values in two date columns equals a time period defined by values in a horizon column

Description

Check that the time difference between values in two date columns equals a time period defined by values in a horizon column

Usage

opt_check_tbl_horizon_timediff(
  tbl,
  file_path,
  hub_path,
  t0_colname,
  t1_colname,
  horizon_colname = "horizon",
  timediff = lubridate::weeks(),
  output_type_id_datatype = c("from_config", "auto", "character", "double", "integer",
    "logical", "Date")
)

Arguments

`tbl`	a tibble/data.frame of the contents of the file being validated.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`t0_colname`	Character string. The name of the time zero date column.
`t1_colname`	Character string. The name of the time zero + 1 time step date column.
`horizon_colname`	Character string. The name of the horizon column. Defaults to `"horizon"`.
`timediff`	an object of class `lubridate` `Period` and length 1. The period of a single horizon. Defaults to 1 week.
`output_type_id_datatype`	character string. One of `"from_config"`, `"auto"`, `"character"`, `"double"`, `"integer"`, `"logical"`, `"Date"`. Defaults to `"from_config"`, which uses the setting in the `output_type_id_datatype` property in the `tasks.json` config file if available. If the property is not set in the config, the argument falls back to `"auto"`, which determines the `output_type_id` data type automatically from the `tasks.json` config file as the simplest data type required to represent all output type ID values across all output types in the hub. When a hub collects only point estimate output types (where `output_type_id`s are `NA`), the `output_type_id` column is assigned a `character` data type when auto-determined. Other data type values can be used to override automatic determination. Note that attempting to coerce `output_type_id` to a data type that is not valid for the data (e.g. trying to coerce `"character"` values to `"double"`) will likely result in an error or potentially unexpected behaviour, so use with care.

Details

Should be deployed as part of validate_model_data optional checks.

Value

Depending on whether validation has succeeded, one of:

⁠<message/check_success>⁠ condition class object.
⁠<error/check_failure>⁠ condition class object.

Returned object also inherits from subclass ⁠<hub_check>⁠.
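
The relationship being checked is that the time one date equals the time zero date plus the horizon multiplied by the period of a single horizon. A minimal base R sketch of that arithmetic follows. The data and column names are made up, and the period is given as a number of days rather than a `lubridate` `Period`, so this is an approximation of the idea and not the package's implementation.

```r
# Made-up rows: time zero date, horizon and time one date.
tbl <- data.frame(
  forecast_date = as.Date(rep("2023-05-08", 3)),
  horizon = c(1L, 2L, 3L),
  target_end_date = as.Date(c("2023-05-15", "2023-05-22", "2023-05-30"))
)

# Period of a single horizon: one week, expressed in days for this sketch.
period_days <- 7

# Expected time one date for each row.
expected_t1 <- tbl$forecast_date + tbl$horizon * period_days

# Rows where the reported date differs from the expected date.
invalid_rows <- which(tbl$target_end_date != expected_t1)
invalid_rows
```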

Parse model output file metadata from file name

Description

Parse model output file metadata from file name

Usage

parse_file_name(file_path, file_type = c("model_output", "model_metadata"))

Arguments

`file_path`	Character string. A model output file name. Can include parent directories which are ignored.
`file_type`	Character string. Type of file name being parsed. One of `"model_output"` or `"model_metadata"`.

Details

File names are allowed to contain the following compression extension prefixes: .snappy, .gzip, .gz, .brotli, .zstd, .lz4, .lzo, .bz2. These extension prefixes are now extracted when parsing the file name and returned as compression_ext element if present.

Value

A list with the following elements:

round_id: The round ID the model output is associated with (NA for model metadata files).
team_abbr: The team responsible for the model.
model_abbr: The name of the model.
model_id: The unique model ID derived from the concatenation of ⁠<team_abbr>-<model_abbr>⁠.
ext: The file extension.
compression_ext: optional. The compression extension if present.

Examples

parse_file_name("hub-baseline/2022-10-15-hub-baseline.csv")
parse_file_name("hub-baseline/2022-10-15-hub-baseline.gzip.parquet")
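
For readers who want to see how such a file name decomposes, the following base R sketch splits a model output file name into the elements listed under Value. It is a simplified illustration and not the package's parser: `sketch_parse_file_name()` is a hypothetical helper, and it assumes a `YYYY-MM-DD` round ID and no hyphens or dots inside the team and model abbreviations.

```r
# Compression extension prefixes listed in the Details section.
compression_exts <- c("snappy", "gzip", "gz", "brotli", "zstd", "lz4", "lzo", "bz2")

# Hypothetical helper, not part of hubValidations.
sketch_parse_file_name <- function(file_path) {
  # Parent directories are ignored.
  parts <- strsplit(basename(file_path), ".", fixed = TRUE)[[1]]
  if (length(parts) < 2) stop("File name has no extension.")
  stem <- parts[1]
  ext <- parts[length(parts)]
  middle <- parts[-c(1, length(parts))]

  # Assumed pattern: <round_id>-<team_abbr>-<model_abbr>
  pattern <- "^([0-9]{4}-[0-9]{2}-[0-9]{2})-([^-]+)-([^-]+)$"
  m <- regmatches(stem, regexec(pattern, stem))[[1]]
  if (length(m) != 4) stop("File name does not match the assumed pattern.")

  out <- list(
    round_id = m[2],
    team_abbr = m[3],
    model_abbr = m[4],
    model_id = paste(m[3], m[4], sep = "-"),
    ext = ext
  )
  # Only add compression_ext when a recognised prefix is present.
  if (length(middle) == 1 && middle %in% compression_exts) {
    out$compression_ext <- middle
  }
  out
}

sketch_parse_file_name("hub-baseline/2022-10-15-hub-baseline.gzip.parquet")
```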

Print results of `validate_...()` function as a bullet list

Description

Print results of validate_...() function as a bullet list

Usage

## S3 method for class 'hub_validations'
print(x, ...)

Arguments

`x`	An object of class `hub_validations`
`...`	Unused argument present for class consistency

Print results of `validate_pr()` function as a bullet list

Description

Print results of validate_pr() function as a bullet list

Usage

## S3 method for class 'pr_hub_validations'
print(x, ...)

Arguments

`x`	An object of class `pr_hub_validations`
`...`	Unused argument present for class consistency

Read a model output file

Description

Read a model output file

Usage

read_model_out_file(
  file_path,
  hub_path = ".",
  coerce_types = c("hub", "chr", "none"),
  output_type_id_datatype = c("from_config", "auto", "character", "double", "integer",
    "logical", "Date")
)

Arguments

`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`coerce_types`	character. What to coerce column types to on read. `hub`: (default) read in (`csv`) or coerce (`parquet`, `arrow`) to hub schema. When coercing data types using the `hub` schema, the `output_type_id_datatype` can also be used to set the `output_type_id` column data type manually. `chr`: read in (`csv`) or coerce (`parquet`, `arrow`) all columns to character. `none`: No coercion. Use `arrow` `⁠read_*⁠` function defaults.
`output_type_id_datatype`	character string. One of `"from_config"`, `"auto"`, `"character"`, `"double"`, `"integer"`, `"logical"`, `"Date"`. Defaults to `"from_config"`, which uses the setting in the `output_type_id_datatype` property in the `tasks.json` config file if available. If the property is not set in the config, the argument falls back to `"auto"`, which determines the `output_type_id` data type automatically from the `tasks.json` config file as the simplest data type required to represent all output type ID values across all output types in the hub. When a hub collects only point estimate output types (where `output_type_id`s are `NA`), the `output_type_id` column is assigned a `character` data type when auto-determined. Other data type values can be used to override automatic determination. Note that attempting to coerce `output_type_id` to a data type that is not valid for the data (e.g. trying to coerce `"character"` values to `"double"`) will likely result in an error or potentially unexpected behaviour, so use with care.

Value

a tibble of the contents of the model output file.
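
The phrase "simplest data type required to represent all output type ID values" in the `output_type_id_datatype` argument can be made concrete with base R's `type.convert()`. The sketch below is only an analogy, not the algorithm hubValidations uses: `simplest_type()` is a hypothetical helper, and it does not reproduce the special case where point-estimate-only hubs are assigned `character`.

```r
# Hypothetical helper: the simplest base R type able to hold all values.
simplest_type <- function(values) {
  typeof(type.convert(as.character(values), as.is = TRUE))
}

# Whole numbers only.
simplest_type(c("1", "2", "3"))

# Quantile levels need a double.
simplest_type(c("0.25", "0.5", "0.75"))

# Mixing quantile levels with category labels forces character.
simplest_type(c("0.25", "0.5", "large_increase"))
```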

Create a model output submission file template

Description

Create a model output submission file template

Usage

submission_tmpl(
  path,
  round_id,
  required_vals_only = FALSE,
  force_output_types = FALSE,
  complete_cases_only = TRUE,
  compound_taskid_set = NULL,
  output_types = NULL,
  derived_task_ids = NULL,
  hub_con = deprecated(),
  config_tasks = deprecated()
)

Arguments

`path`	Character string. Can be one of: a path to a local fully configured hub directory; a path to a local `tasks.json` file; a URL to the repository of a fully configured hub on GitHub; a URL to the raw contents of a `tasks.json` file on GitHub; a `<SubTreeFileSystem>` class object pointing to the root of an S3 cloud hub; or a `<SubTreeFileSystem>` class object pointing to a `tasks.json` config file in an S3 cloud hub, relative to the hub's root directory. See examples for more details.
`round_id`	Character string. Round identifier. If the round is set to `round_id_from_variable: true`, IDs are values of the task ID defined in the round's `round_id` property of `config_tasks`. Otherwise should match round's `round_id` value in config. Ignored if hub contains only a single round.
`required_vals_only`	Logical. Whether to return only combinations of Task ID and related output type ID required values.
`force_output_types`	Logical. Whether to force all output types to be required. If `TRUE`, all output type ID values are treated as required regardless of the value of the `is_required` property. Useful for creating grids of required values for optional output types.
`complete_cases_only`	Logical. If `TRUE` (default) and `required_vals_only = TRUE`, only rows with complete cases of combinations of required values are returned. If `FALSE`, rows with incomplete cases of combinations of required values are included in the output.
`compound_taskid_set`	List of character vectors, one for each modeling task in the round. Can be used to override the compound task ID set defined in the config. If `NULL` is provided for a given modeling task, a compound task ID set of all task IDs is used.
`output_types`	Character vector of output type names to include. Use to subset for grids for specific output types.
`derived_task_ids`	Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task ids will contain `NA`s. If `NULL`, defaults to extracting derived task IDs from `config_tasks` or the `config_tasks` attribute of `hub_con`. See `get_config_derived_task_ids()` for more details.
`hub_con`	Deprecated. Use `path` instead. A `<hub_connection>` class object.
`config_tasks`	Deprecated. Use `path` instead. A list version of the contents of a hub's `tasks.json` config file, accessed through the `"config_tasks"` attribute of a `<hub_connection>` object or function `read_config()`.

Details

For task IDs where all values are optional, by default, columns are created as columns of NAs when required_vals_only = TRUE. When such columns exist, the function returns a tibble with zero rows, as no complete cases of required value combinations exist. (Note that determination of complete cases excludes valid NA output_type_id values in "mean" and "median" output types.) To return a template of incomplete required cases, which includes NA columns, use complete_cases_only = FALSE.

To include output types that are optional in the submission template when required_vals_only = TRUE and complete_cases_only = FALSE, use force_output_types = TRUE. Use this in combination with subsetting for the output types you plan to submit via argument output_types to create a submission template customised to your submission plans. Tip: to ensure you create a template with all required output types, it's a good idea to first run the function without subsetting or forcing output types and examine the unique values in output_type to check which output types are required.

When sample output types are included in the output, the output_type_id column contains example sample indexes which are useful for identifying the compound task ID structure of multivariate sampling distributions in particular, i.e. which combinations of task ID values represent individual samples.
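
The effect of complete_cases_only can be illustrated with a minimal base R sketch. It does not call submission_tmpl(); the task IDs (`location`, `horizon`) and their values are hypothetical, and the all-NA column only mimics a task ID whose values are all optional.

```r
# Hypothetical grid of required values: `location` has required values,
# while every value of `horizon` is optional, so its column is all NA.
required_grid <- expand.grid(
  location = c("US", "01"),
  horizon = NA_integer_,
  stringsAsFactors = FALSE
)

# Analogous to complete_cases_only = TRUE: keep only rows without any NA.
complete_only <- required_grid[stats::complete.cases(required_grid), ]
nrow(complete_only)

# Analogous to complete_cases_only = FALSE: incomplete rows are kept.
nrow(required_grid)
```

Because every row contains an NA in the optional column, the complete-case filter leaves no rows, which mirrors the zero-row tibble described above.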

Value

A tibble template containing an expanded grid of valid task ID and output type ID value combinations for a given submission round and output type. If required_vals_only = TRUE, values are limited to combinations of required values only.

Examples

hub_path <- system.file("testhubs/flusight", package = "hubUtils")
submission_tmpl(hub_path, round_id = "2023-01-02")
# Return required values only
submission_tmpl(
  hub_path,
  round_id = "2023-01-02",
  required_vals_only = TRUE
)
submission_tmpl(
  hub_path,
  round_id = "2023-01-02",
  required_vals_only = TRUE,
  complete_cases_only = FALSE
)
# Specify a round in a hub with multiple rounds
hub_path <- system.file("testhubs/simple", package = "hubUtils")
submission_tmpl(hub_path, round_id = "2022-10-01")
submission_tmpl(hub_path, round_id = "2022-10-29")
# Subset for a specific output type
hub_path <- system.file("testhubs", "samples", package = "hubValidations")
submission_tmpl(
  hub_path,
  round_id = "2022-12-17",
  output_types = "sample"
)
# Create a template from the path to a tasks config file
config_path <- system.file("config", "tasks.json",
  package = "hubValidations"
)
submission_tmpl(
  config_path,
  round_id = "2022-12-26"
)
# Hub with sample output type and compound task ID structure
config_path <- system.file("config", "tasks-comp-tid.json",
  package = "hubValidations"
)
submission_tmpl(
  config_path,
  round_id = "2022-12-26",
  output_types = "sample"
)
# Override config compound task ID set
# Create coarser compound task ID set for the first modeling task which contains
# samples
submission_tmpl(
  config_path,
  round_id = "2022-12-26",
  output_types = "sample",
  compound_taskid_set = list(
    c("forecast_date", "target"),
    NULL
  )
)
# Derive a template with ignored derived task ID. Useful to avoid creating
# a template with invalid derived task ID value combinations.
hub_path <- system.file("testhubs", "flusight", package = "hubValidations")
submission_tmpl(
  hub_path,
  round_id = "2022-12-12",
  output_types = "pmf",
  derived_task_ids = "target_end_date",
  complete_cases_only = FALSE
)
# Force optional output type, in this case "mean".
submission_tmpl(
  hub_path,
  round_id = "2022-12-12",
  required_vals_only = TRUE,
  output_types = c("pmf", "quantile", "mean"),
  force_output_types = TRUE,
  derived_task_ids = "target_end_date",
  complete_cases_only = FALSE
)
# Create a template from a URL to a fully configured hub repository on GitHub
submission_tmpl(
  path = "https://github.com/hubverse-org/example-simple-forecast-hub",
  round_id = "2022-11-28",
  output_types = "quantile"
)
# Create a template from a URL to the raw contents of a tasks.json file on
# GitHub
config_raw_url <- paste0(
  "https://raw.githubusercontent.com/hubverse-org/",
  "example-simple-forecast-hub/refs/heads/main/hub-config/tasks.json"
)
submission_tmpl(
  path = config_raw_url,
  round_id = "2022-11-28",
  output_types = "quantile"
)

# Create submission file using config file from AWS S3 bucket hub
# Use `s3_bucket()` to create a path to the hub's root directory
s3_hub_path <- arrow::s3_bucket("hubverse/hubutils/testhubs/simple/")
submission_tmpl(
  path = s3_hub_path,
  round_id = "2022-10-01",
  output_types = "quantile"
)
# Use `path()` method to create a path to the tasks.json file relative to
# the S3 cloud hub's root directory
s3_config_path <- s3_hub_path$path("hub-config/tasks.json")
submission_tmpl(
  path = s3_config_path,
  round_id = "2022-10-01",
  output_types = "quantile"
)

Wrap check expression in try to capture check execution errors

Description

Wrap check expression in try to capture check execution errors

Usage

try_check(expr, file_path)

Arguments

`expr`	check function expression to run.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.

Value

If expr executes correctly, the output of expr is returned. If execution fails, an object of class <error/check_exec_error> is returned. The execution error message is attached as attribute msg.
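
The behaviour described above can be sketched in base R as follows. This is not the package implementation: `try_check_sketch()` is a hypothetical stand-in, and the structure of the returned object (beyond the `check_exec_error` class and the `msg` attribute named above) is an assumption made for illustration.

```r
# Hypothetical stand-in for try_check(); illustrative only.
try_check_sketch <- function(expr, file_path) {
  tryCatch(
    # `expr` is evaluated lazily here, so errors it raises are caught.
    expr,
    error = function(e) {
      structure(
        list(
          message = paste0("Check could not be executed for ", file_path, "."),
          call = NULL
        ),
        class = c("check_exec_error", "error", "condition"),
        # Attach the original error message as attribute `msg`.
        msg = conditionMessage(e)
      )
    }
  )
}

file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
ok <- try_check_sketch(1 + 1, file_path)
failed <- try_check_sketch(stop("column `value` not found"), file_path)

inherits(failed, "check_exec_error")
attr(failed, "msg")
```

Returning a condition object instead of re-throwing lets a calling validation routine record the execution failure alongside the results of other checks.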

Validate the contents of a submitted model data file

Description

Validate the contents of a submitted model data file

Usage

validate_model_data(
  hub_path,
  file_path,
  round_id_col = NULL,
  output_type_id_datatype = c("from_config", "auto", "character", "double", "integer",
    "logical", "Date"),
  validations_cfg_path = NULL,
  derived_task_ids = NULL
)

Arguments

`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`round_id_col`	Character string. The name of the column containing `round_id`s. Usually, the value of round property `round_id` in hub `tasks.json` config file. Defaults to `NULL` and determined from the config if applicable.
`output_type_id_datatype`	character string. One of `"from_config"`, `"auto"`, `"character"`, `"double"`, `"integer"`, `"logical"`, `"Date"`. Defaults to `"from_config"` which uses the setting in the `output_type_id_datatype` property in the `tasks.json` config file if available. If the property is not set in the config, the argument falls back to `"auto"` which determines the `output_type_id` data type automatically from the `tasks.json` config file as the simplest data type required to represent all output type ID values across all output types in the hub. When only point estimate output types (where `output_type_id`s are `NA`) are being collected by a hub, the `output_type_id` column is assigned a `character` data type when auto-determined. Other data type values can be used to override automatic determination. Note that attempting to coerce `output_type_id` to a data type that is not valid for the data (e.g. trying to coerce `"character"` values to `"double"`) will likely result in an error or potentially unexpected behaviour, so use with care.
`validations_cfg_path`	Path to `validations.yml` file. If `NULL` defaults to `hub-config/validations.yml`.
`derived_task_ids`	Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task IDs will contain `NA`s. If `NULL`, defaults to extracting derived task IDs from hub `tasks.json`. See `get_hub_derived_task_ids()` for more details.

Details

Note that it is necessary for derived_task_ids to be specified if any task IDs with required values have dependent derived task IDs. If this is the case and derived task IDs are not specified, the dependent nature of derived task ID values will result in false validation errors when validating required values.
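
A minimal base R sketch of why dependent derived task IDs can cause false errors. It does not use the package: the task ID names, the dates and the assumption that `horizon` is measured in weeks are all hypothetical.

```r
# Hypothetical task ID values; `target_end_date` is derived from
# `reference_date` and `horizon` (assumed here to be in weeks).
reference_date <- as.Date("2022-12-12")
horizon <- c(1L, 2L)
target_end_date <- reference_date + 7L * horizon

# Expanding all three task IDs independently creates every combination...
grid <- expand.grid(
  reference_date = reference_date,
  horizon = horizon,
  target_end_date = target_end_date
)

# ...but only rows where the derived value agrees with its inputs can
# ever appear in a real submission.
grid$valid <- grid$target_end_date == grid$reference_date + 7L * grid$horizon
grid
```

A required-values check that expected every row of such a grid would report the inconsistent rows as missing, even though no submission could validly contain them. Ignoring the derived task ID removes it from the expansion.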

Details of checks performed by validate_model_data()

Name	Check	Early return	Fail output	Extra info
file_read	File can be read without errors	TRUE	check_error
valid_round_id_col	Round ID var from config exists in data column names. Skipped if `round_id_from_variable` is FALSE in config.	FALSE	check_failure
unique_round_id	Round ID column contains a single unique round ID. Skipped if `round_id_from_variable` is FALSE in config.	TRUE	check_error
match_round_id	Round ID from file contents matches round ID from file name. Skipped if `round_id_from_variable` is FALSE in config.	TRUE	check_error
colnames	File column names match expected column names for round (i.e. task ID names + hub standard column names)	TRUE	check_error
col_types	File column types match expected column types from config. Mainly applicable to parquet & arrow files.	FALSE	check_failure
valid_vals	Columns (excluding the `value` and any derived task ID columns) contain valid combinations of task ID / output type / output type ID values	TRUE	check_error	error_tbl: table of invalid task ID/output type/output type ID value combinations
derived_task_id_vals	Derived task ID columns contain valid values.	FALSE	check_failure	errors: named list of derived task ID values. Each element contains the invalid values for each derived task ID that failed the check.
rows_unique	Columns (excluding the `value` and any derived task ID columns) contain unique combinations of task ID / output type / output type ID values	FALSE	check_failure
req_vals	Columns (excluding the `value` and any derived task ID columns) contain all required combinations of task ID / output type / output type ID values	FALSE	check_failure	missing_df: table of missing task ID/output type/output type ID value combinations
value_col_valid	Values in `value` column are coercible to data type configured for each output type	FALSE	check_failure
value_col_non_desc	Values in `value` column are non-decreasing as output_type_ids increase for all unique task ID /output type value combinations. Applies to `quantile` or `cdf` output types only	FALSE	check_failure	error_tbl: table of rows affected
value_col_sum1	Values in the `value` column of `pmf` output type data for each unique task ID combination sum to 1.	FALSE	check_failure	error_tbl: table of rows affected
spl_compound_taskid_set	Sample compound task id sets for each modeling task match or are coarser than the expected set defined in tasks.json config.	TRUE	check_error	errors: list containing item for each failing modeling task. Exact structure dependent on type of validation failure. See check function documentation for more details.
spl_compound_tid	Samples contain single unique values for each compound task ID within individual samples (v3 and above schema only).	TRUE	check_error	errors: list containing item for each sample failing validation with breakdown of unique values for each compound task ID.
spl_non_compound_tid	Samples contain single unique combination of non-compound task ID values across all samples (v3 and above schema only).	TRUE	check_error	errors: list containing item for each modeling task with vectors of output type ids of samples failing validation and example table of most frequent non-compound task ID value combination across all samples in the modeling task.
spl_n	Number of samples for a given compound idx falls within accepted compound task range (v3 and above schema only).	FALSE	check_failure	errors: list containing item for each compound_idx failing validation with sample count, metadata on expected samples and example table of expected structure for samples belonging to the compound idx in question.
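
The intent of the `value_col_non_desc` and `value_col_sum1` checks in the table above can be sketched in base R. The data are hypothetical, and comparing the pmf sum with a numerical tolerance is an assumption of this sketch rather than a statement about how the package implements the check.

```r
# Hypothetical quantile rows for a single task ID combination.
quantile_rows <- data.frame(
  output_type_id = c(0.25, 0.5, 0.75),
  value = c(10, 12, 11)
)
quantile_rows <- quantile_rows[order(quantile_rows$output_type_id), ]

# Analogous to value_col_non_desc: no negative step between successive
# values once rows are ordered by output type ID.
non_decreasing <- all(diff(quantile_rows$value) >= 0)

# Hypothetical pmf values for a single task ID combination.
pmf_values <- c(0.2, 0.5, 0.3)

# Analogous to value_col_sum1: compare with a tolerance rather than `==`
# to avoid spurious failures from floating point error.
sums_to_one <- isTRUE(all.equal(sum(pmf_values), 1))

non_decreasing
sums_to_one
```

Here the quantile values drop from 12 to 11 at the highest quantile level, so the first condition is not met, while the pmf values satisfy the second.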

Value

An object of class hub_validations. Each named element contains a hub_check class object reflecting the result of a given check. Function will return early if a check returns an error.

For more details on the structure of <hub_validations> objects, including how to access more information on individual checks, see the article on <hub_validations> S3 class objects.
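
The early return behaviour can be sketched in base R as below. `run_checks_sketch()` and `new_result()` are hypothetical helpers written for this illustration, and the result objects are simplified placeholders rather than real hub_check objects.

```r
# Hypothetical constructor for a simplified check result.
new_result <- function(cls) {
  structure(list(passed = identical(cls, "check_success")), class = cls)
}

# Hypothetical runner: executes checks in order and stops after the first
# result of class check_error. A check_failure does not stop execution.
run_checks_sketch <- function(checks) {
  results <- list()
  for (name in names(checks)) {
    results[[name]] <- checks[[name]]()
    if (inherits(results[[name]], "check_error")) break
  }
  results
}

results <- run_checks_sketch(list(
  file_read = function() new_result("check_success"),
  colnames = function() new_result("check_error"),
  col_types = function() new_result("check_failure")
))
names(results)
```

In this sketch the third check never runs, so the returned list holds results for the first two checks only.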

Examples

hub_path <- system.file("testhubs/simple", package = "hubValidations")
file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
validate_model_data(hub_path, file_path)

Validate file level properties of a submitted model output file.

Description

Validate file level properties of a submitted model output file.

Usage

validate_model_file(hub_path, file_path, validations_cfg_path = NULL)

Arguments

`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`validations_cfg_path`	Path to `validations.yml` file. If `NULL` defaults to `hub-config/validations.yml`.

Details

Details of checks performed by validate_model_file()

Name	Check	Early return	Fail output
file_exists	File exists at `file_path` provided	TRUE	check_error
file_name	File name valid	TRUE	check_error
file_location	File located in correct team directory	FALSE	check_failure
round_id_valid	File round ID is a valid hub round ID	TRUE	check_error
file_format	File format is accepted hub/round format	TRUE	check_error
file_n	Number of submission files per round per team does not exceed allowed number	FALSE	check_failure
metadata_exists	Model metadata file exists in expected location	FALSE	check_failure
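
As a rough illustration of the kind of information the `file_name` and `file_location` checks work with, the base R sketch below splits a model output file path into its parts. The regular expression is an assumption of this sketch: it presumes an ISO date round ID followed by a team abbreviation and a model abbreviation, and it is not the pattern used by the package.

```r
file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
file_name <- basename(file_path)

# Assumed layout: <YYYY-MM-DD>-<team>-<model>.<ext>
pattern <- "^([0-9]{4}-[0-9]{2}-[0-9]{2})-([^-]+)-([^.]+)\\.([A-Za-z]+)$"
parts <- regmatches(file_name, regexec(pattern, file_name))[[1]]

meta <- list(
  round_id = parts[2],
  team_abbr = parts[3],
  model_abbr = parts[4],
  ext = parts[5]
)

# Analogous to file_location: the parent directory should match the
# model ID formed from the team and model abbreviations.
in_correct_dir <- identical(
  dirname(file_path),
  paste(meta$team_abbr, meta$model_abbr, sep = "-")
)
in_correct_dir
```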

Value

An object of class hub_validations. Each named element contains a hub_check class object reflecting the result of a given check. Function will return early if a check returns an error.

For more details on the structure of <hub_validations> objects, including how to access more information on individual checks, see the article on <hub_validations> S3 class objects.

Examples

hub_path <- system.file("testhubs/simple", package = "hubValidations")
validate_model_file(hub_path,
  file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
)
validate_model_file(hub_path,
  file_path = "team1-goodmodel/2022-10-15-team1-goodmodel.csv"
)

Validate properties of a metadata file.

Description

Validate properties of a metadata file.

Usage

validate_model_metadata(
  hub_path,
  file_path,
  round_id = "default",
  validations_cfg_path = NULL
)

Arguments

`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`file_path`	character string. Path to the metadata file being validated relative to the hub's model-metadata directory.
`round_id`	character string. The round identifier. Used primarily to indicate whether the "default" or a round specific configuration should be used for custom validations.
`validations_cfg_path`	Path to `validations.yml` file. If `NULL` defaults to `hub-config/validations.yml`.

Details

Details of checks performed by validate_model_metadata()

Name	Check	Early return	Fail output
metadata_schema_exists	A model metadata schema file exists in `hub-config` directory.	TRUE	check_error
metadata_file_exists	A file with name provided to argument `file_path` exists at the expected location (the `model-metadata` directory).	TRUE	check_error
metadata_file_ext	The metadata file has correct extension (yaml or yml).	TRUE	check_error
metadata_file_location	The metadata file has been saved to correct location.	TRUE	check_failure
metadata_matches_schema	The contents of the metadata file match the hub's model metadata schema	TRUE	check_error
metadata_file_name	The metadata filename matches the model ID specified in the contents of the file.	TRUE	check_error

Value

An object of class hub_validations. Each named element contains a hub_check class object reflecting the result of a given check. Function will return early if a check returns an error.

Examples

hub_path <- system.file("testhubs/simple", package = "hubValidations")
validate_model_metadata(hub_path,
  file_path = "hub-baseline.yml"
)
validate_model_metadata(hub_path,
  file_path = "team1-goodmodel.yaml"
)

Validate Pull Request

Description

Validates model output and model metadata files in a Pull Request.

Usage

validate_pr(
  hub_path = ".",
  gh_repo,
  pr_number,
  round_id_col = NULL,
  output_type_id_datatype = c("from_config", "auto", "character", "double", "integer",
    "logical", "Date"),
  validations_cfg_path = NULL,
  skip_submit_window_check = FALSE,
  file_modification_check = c("error", "failure", "warn", "message", "none"),
  allow_submit_window_mods = TRUE,
  submit_window_ref_date_from = c("file", "file_path"),
  derived_task_ids = NULL
)

Arguments

`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`gh_repo`	GitHub repository address in the format `username/repo`
`pr_number`	Number of the pull request to validate
`round_id_col`	Character string. The name of the column containing `round_id`s. Only required if files contain a column that contains `round_id` details but has not been configured via `round_id_from_variable: true` and `round_id:` in the hub `tasks.json` config file.
`output_type_id_datatype`	character string. One of `"from_config"`, `"auto"`, `"character"`, `"double"`, `"integer"`, `"logical"`, `"Date"`. Defaults to `"from_config"` which uses the setting in the `output_type_id_datatype` property in the `tasks.json` config file if available. If the property is not set in the config, the argument falls back to `"auto"` which determines the `output_type_id` data type automatically from the `tasks.json` config file as the simplest data type required to represent all output type ID values across all output types in the hub. When only point estimate output types (where `output_type_id`s are `NA`) are being collected by a hub, the `output_type_id` column is assigned a `character` data type when auto-determined. Other data type values can be used to override automatic determination. Note that attempting to coerce `output_type_id` to a data type that is not valid for the data (e.g. trying to coerce `"character"` values to `"double"`) will likely result in an error or potentially unexpected behaviour, so use with care.
`validations_cfg_path`	Path to `validations.yml` file. If `NULL` defaults to `hub-config/validations.yml`.
`skip_submit_window_check`	Logical. Whether to skip the submission window check.
`file_modification_check`	Character string. Whether to perform check and what to return when modification/deletion of a previously submitted model output file or deletion of a previously submitted model metadata file is detected in PR: `"error"`: Appends a `<error/check_error>` condition class object for each applicable modified/deleted file. `"failure"`: Appends a `<error/check_failure>` condition class object for each applicable modified/deleted file. `"message"`: Appends a `<message/check_info>` condition class object for each applicable modified/deleted file. `"none"`: No modification/deletion checks performed.
`allow_submit_window_mods`	Logical. Whether to allow modifications/deletions of model output files within their submission windows. Defaults to `TRUE`.
`submit_window_ref_date_from`	Whether to get the reference date, around which relative submission windows are determined, from the file's `file_path` round ID or from the `file` contents themselves. `file` requires that the file can be read. Only applicable when a round is configured to determine the submission windows relative to the value in a date column in model output files. Not applicable when explicit submission window start and end dates are provided in the hub's config.
`derived_task_ids`	Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task IDs will contain `NA`s. If `NULL`, defaults to extracting derived task IDs from hub `tasks.json`. See `get_hub_derived_task_ids()` for more details.

Details

Only model output and model metadata files are individually validated, using validate_submission() and validate_model_metadata() respectively, although hub config files are also validated as part of the checks. Any other files included in the PR are ignored but flagged in a message.

By default, modifications (which include renaming) and deletions of previously submitted model output files and deletions or renaming of previously submitted model metadata files are not allowed and return an <error/check_error> condition class object for each applicable modified/deleted file. This behaviour can be modified through arguments file_modification_check, which controls whether modification/deletion checks are performed and what is returned if modifications/deletions are detected, and allow_submit_window_mods, which controls whether modifications/deletions of model output files are allowed within their submission windows.

Note that to establish relative submission windows when performing modification/deletion checks and allow_submit_window_mods is TRUE, the reference date is taken as the round_id extracted from the file path (i.e. submit_window_ref_date_from is always set to "file_path"). This is because we cannot extract dates from columns of deleted files. If hub submission window reference dates do not match round IDs in file paths, currently allow_submit_window_mods will not work correctly and is best set to FALSE. This only relates to hubs/rounds where submission windows are determined relative to a reference date and not when explicit submission window start and end dates are provided in the config.
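
A minimal base R sketch of establishing a relative submission window from the round ID in a file path. The window offsets and the modification date are hypothetical, and the sketch assumes the file name starts with an ISO date round ID. It does not reproduce the package's own logic.

```r
file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"

# Reference date taken from the round ID at the start of the file name,
# which remains available even if the file itself has been deleted.
ref_date <- as.Date(substr(basename(file_path), 1, 10))

# Hypothetical relative window: opens 6 days before the reference date
# and closes 1 day after it.
window_start <- ref_date - 6
window_end <- ref_date + 1

# Hypothetical date on which a modification was detected.
modified_on <- as.Date("2022-10-05")
in_window <- modified_on >= window_start && modified_on <= window_end
in_window
```

If the round ID in the file name differed from the date the hub actually uses as the reference, the window computed this way would be misplaced, which is the situation the paragraph above warns about.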

Finally, note that it is necessary for derived_task_ids to be specified if any task IDs with required values have dependent derived task IDs. If this is the case and derived task IDs are not specified, the dependent nature of derived task ID values will result in false validation errors when validating required values.

Checks on model output files

Details of checks performed by validate_submission()

Name	Check	Early return	Fail output	Extra info
valid_config	Hub config valid	TRUE	check_error
submission_time	Current time within file submission window	FALSE	check_failure
file_exists	File exists at `file_path` provided	TRUE	check_error
file_name	File name valid	TRUE	check_error
file_location	File located in correct team directory	FALSE	check_failure
round_id_valid	File round ID is a valid hub round ID	TRUE	check_error
file_format	File format is accepted hub/round format	TRUE	check_error
file_n	Number of submission files per round per team does not exceed allowed number	FALSE	check_failure
metadata_exists	Model metadata file exists in expected location	FALSE	check_failure
file_read	File can be read without errors	TRUE	check_error
valid_round_id_col	Round ID var from config exists in data column names. Skipped if `round_id_from_variable` is FALSE in config.	FALSE	check_failure
unique_round_id	Round ID column contains a single unique round ID. Skipped if `round_id_from_variable` is FALSE in config.	TRUE	check_error
match_round_id	Round ID from file contents matches round ID from file name. Skipped if `round_id_from_variable` is FALSE in config.	TRUE	check_error
colnames	File column names match expected column names for round (i.e. task ID names + hub standard column names)	TRUE	check_error
col_types	File column types match expected column types from config. Mainly applicable to parquet & arrow files.	FALSE	check_failure
valid_vals	Columns (excluding the `value` and any derived task ID columns) contain valid combinations of task ID / output type / output type ID values	TRUE	check_error	error_tbl: table of invalid task ID/output type/output type ID value combinations
derived_task_id_vals	Derived task ID columns contain valid values.	FALSE	check_failure	errors: named list of derived task ID values. Each element contains the invalid values for each derived task ID that failed the check.
rows_unique	Columns (excluding the `value` and any derived task ID columns) contain unique combinations of task ID / output type / output type ID values	FALSE	check_failure
req_vals	Columns (excluding the `value` and any derived task ID columns) contain all required combinations of task ID / output type / output type ID values	FALSE	check_failure	missing_df: table of missing task ID/output type/output type ID value combinations
value_col_valid	Values in `value` column are coercible to data type configured for each output type	FALSE	check_failure
value_col_non_desc	Values in `value` column are non-decreasing as output_type_ids increase for all unique task ID /output type value combinations. Applies to `quantile` or `cdf` output types only	FALSE	check_failure	error_tbl: table of rows affected
value_col_sum1	Values in the `value` column of `pmf` output type data for each unique task ID combination sum to 1.	FALSE	check_failure	error_tbl: table of rows affected
spl_compound_taskid_set	Sample compound task id sets for each modeling task match or are coarser than the expected set defined in tasks.json config.	TRUE	check_error	errors: list containing item for each failing modeling task. Exact structure dependent on type of validation failure. See check function documentation for more details.
spl_compound_tid	Samples contain single unique values for each compound task ID within individual samples (v3 and above schema only).	TRUE	check_error	errors: list containing item for each sample failing validation with breakdown of unique values for each compound task ID.
spl_non_compound_tid	Samples contain single unique combination of non-compound task ID values across all samples (v3 and above schema only).	TRUE	check_error	errors: list containing item for each modeling task with vectors of output type ids of samples failing validation and example table of most frequent non-compound task ID value combination across all samples in the modeling task.
spl_n	Number of samples for a given compound idx falls within accepted compound task range (v3 and above schema only).	FALSE	check_failure	errors: list containing item for each compound_idx failing validation with sample count, metadata on expected samples and example table of expected structure for samples belonging to the compound idx in question.

Checks on model metadata files

Details of checks performed by validate_model_metadata()

Name	Check	Early return	Fail output	optional
metadata_schema_exists	A model metadata schema file exists in `hub-config` directory.	TRUE	check_error	FALSE
metadata_file_exists	A file with name provided to argument `file_path` exists at the expected location (the `model-metadata` directory).	TRUE	check_error	FALSE
metadata_file_ext	The metadata file has the correct extension (yaml or yml).	TRUE	check_error	FALSE
metadata_file_location	The metadata file has been saved to the correct location.	TRUE	check_failure	FALSE
metadata_matches_schema	The contents of the metadata file match the hub's model metadata schema.	TRUE	check_error	FALSE
metadata_file_name	The metadata filename matches the model ID specified in the contents of the file.	TRUE	check_error	FALSE
NA	The number of metadata files submitted by a single team does not exceed the maximum number allowed.	FALSE	check_failure	TRUE
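The `metadata_file_name` check above can be illustrated with a minimal base R sketch. This is not the package's implementation: `metadata_name_matches()` is a hypothetical helper, and `metadata` is assumed to be a plain list standing in for the parsed YAML contents, holding either a `model_id` field or `team_abbr` and `model_abbr` fields.

```r
# Hypothetical helper (not part of hubValidations): compares the metadata
# file name, without its extension, to the model ID found in the contents.
metadata_name_matches <- function(file_path, metadata) {
  file_id <- tools::file_path_sans_ext(basename(file_path))
  model_id <- if (!is.null(metadata$model_id)) {
    metadata$model_id
  } else {
    # Assumption: the model ID is the team and model abbreviations joined by "-".
    paste(metadata$team_abbr, metadata$model_abbr, sep = "-")
  }
  identical(file_id, model_id)
}

# File name and contents agree
metadata_name_matches(
  "team1-goodmodel.yaml",
  list(team_abbr = "team1", model_abbr = "goodmodel")
)

# File name and contents disagree
metadata_name_matches(
  "team1-goodmodel.yml",
  list(model_id = "team1-othermodel")
)
```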

Value

An object of class hub_validations.

Examples

## Not run: 
validate_pr(
  hub_path = ".",
  gh_repo = "hubverse-org/ci-testhub-simple",
  pr_number = 3
)

## End(Not run)

Validate a submitted model data file.

Description

Checks both file-level properties (such as file name, extension and location) and the model output data, i.e. the contents of the file.

Usage

validate_submission(
  hub_path,
  file_path,
  round_id_col = NULL,
  validations_cfg_path = NULL,
  output_type_id_datatype = c("from_config", "auto", "character", "double", "integer",
    "logical", "Date"),
  skip_submit_window_check = FALSE,
  skip_check_config = FALSE,
  submit_window_ref_date_from = c("file", "file_path"),
  derived_task_ids = NULL
)

Arguments

`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`round_id_col`	Character string. The name of the column containing `round_id`s. Usually, the value of round property `round_id` in hub `tasks.json` config file. Defaults to `NULL` and determined from the config if applicable.
`validations_cfg_path`	Path to `validations.yml` file. If `NULL` defaults to `hub-config/validations.yml`.
`output_type_id_datatype`	character string. One of `"from_config"`, `"auto"`, `"character"`, `"double"`, `"integer"`, `"logical"`, `"Date"`. Defaults to `"from_config"`, which uses the setting in the `output_type_id_datatype` property in the `tasks.json` config file if available. If the property is not set in the config, the argument falls back to `"auto"`, which determines the `output_type_id` data type automatically from the `tasks.json` config file as the simplest data type required to represent all output type ID values across all output types in the hub. When only point estimate output types (where `output_type_id`s are `NA`) are being collected by a hub, the `output_type_id` column is assigned a `character` data type when auto-determined. Other data type values can be used to override automatic determination. Note that attempting to coerce `output_type_id` to a data type that is not valid for the data (e.g. trying to coerce `"character"` values to `"double"`) will likely result in an error or potentially unexpected behaviour, so use with care.
`skip_submit_window_check`	Logical. Whether to skip the submission window check.
`skip_check_config`	Logical. Whether to skip the hub config validation check.
`submit_window_ref_date_from`	Whether to get the reference date around which relative submission windows will be determined from the file's `file_path` round ID or the `file` contents themselves. `file` requires that the file can be read. Only applicable when a round is configured to determine the submission windows relative to the value in a date column in model output files. Not applicable when explicit submission window start and end dates are provided in the hub's config.
`derived_task_ids`	Character vector of derived task ID names (task IDs whose values depend on other task IDs) to ignore. Columns for such task IDs will contain `NA`s. If `NULL`, defaults to extracting derived task IDs from the hub's `tasks.json`. See `get_hub_derived_task_ids()` for more details.

Details

Details of checks performed by validate_submission()

Name	Check	Early return	Fail output	Extra info
valid_config	Hub config valid	TRUE	check_error
submission_time	Current time within file submission window	FALSE	check_failure
file_exists	File exists at `file_path` provided	TRUE	check_error
file_name	File name valid	TRUE	check_error
file_location	File located in correct team directory	FALSE	check_failure
round_id_valid	File round ID is a valid hub round ID	TRUE	check_error
file_format	File format is accepted hub/round format	TRUE	check_error
file_n	Number of submission files per round per team does not exceed allowed number	FALSE	check_failure
metadata_exists	Model metadata file exists in expected location	FALSE	check_failure
file_read	File can be read without errors	TRUE	check_error
valid_round_id_col	Round ID var from config exists in data column names. Skipped if `round_id_from_var` is FALSE in config.	FALSE	check_failure
unique_round_id	Round ID column contains a single unique round ID. Skipped if `round_id_from_var` is FALSE in config.	TRUE	check_error
match_round_id	Round ID from file contents matches round ID from file name. Skipped if `round_id_from_var` is FALSE in config.	TRUE	check_error
colnames	File column names match expected column names for round (i.e. task ID names + hub standard column names)	TRUE	check_error
col_types	File column types match expected column types from config. Mainly applicable to parquet & arrow files.	FALSE	check_failure
valid_vals	Columns (excluding the `value` and any derived task ID columns) contain valid combinations of task ID / output type / output type ID values	TRUE	check_error	error_tbl: table of invalid task ID/output type/output type ID value combinations
derived_task_id_vals	Derived task ID columns contain valid values.	FALSE	check_failure	errors: named list of derived task ID values. Each element contains the invalid values for each derived task ID that failed the check.
rows_unique	Columns (excluding the `value` and any derived task ID columns) contain unique combinations of task ID / output type / output type ID values	FALSE	check_failure
req_vals	Columns (excluding the `value` and any derived task ID columns) contain all required combinations of task ID / output type / output type ID values	FALSE	check_failure	missing_df: table of missing task ID/output type/output type ID value combinations
value_col_valid	Values in `value` column are coercible to data type configured for each output type	FALSE	check_failure
value_col_non_desc	Values in `value` column are non-decreasing as output_type_ids increase for all unique task ID/output type value combinations. Applies to `quantile` or `cdf` output types only.	FALSE	check_failure	error_tbl: table of rows affected
value_col_sum1	Values in the `value` column of `pmf` output type data for each unique task ID combination sum to 1.	FALSE	check_failure	error_tbl: table of rows affected
spl_compound_taskid_set	Sample compound task id sets for each modeling task match or are coarser than the expected set defined in tasks.json config.	TRUE	check_error	errors: list containing item for each failing modeling task. Exact structure dependent on type of validation failure. See check function documentation for more details.
spl_compound_tid	Samples contain single unique values for each compound task ID within individual samples (v3 and above schema only).	TRUE	check_error	errors: list containing item for each sample failing validation with breakdown of unique values for each compound task ID.
spl_non_compound_tid	Samples contain single unique combination of non-compound task ID values across all samples (v3 and above schema only).	TRUE	check_error	errors: list containing item for each modeling task with vectors of output type ids of samples failing validation and example table of most frequent non-compound task ID value combination across all samples in the modeling task.
spl_n	Number of samples for a given compound idx falls within accepted compound task range (v3 and above schema only).	FALSE	check_failure	errors: list containing item for each compound_idx failing validation with sample count, metadata on expected samples and example table of expected structure for samples belonging to the compound idx in question.
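To make the `value_col_sum1` and `value_col_non_desc` rows above more concrete, here is a minimal base R sketch of the properties they describe. It is an illustration under stated assumptions, not the package's implementation: the toy data frames, the single `location` task ID column and the numeric tolerance are all made up for the example.

```r
# Toy pmf data: one task ID column (`location`), three categories each.
pmf <- data.frame(
  location = c("US", "US", "US", "01", "01", "01"),
  output_type_id = c("low", "mid", "high", "low", "mid", "high"),
  value = c(0.2, 0.5, 0.3, 0.4, 0.4, 0.1)
)

# Sum `value` within each task ID combination and compare to 1.
# The tolerance of 1e-8 is an assumption made for this sketch.
sums <- tapply(pmf$value, pmf$location, sum)
sums_ok <- abs(sums - 1) < 1e-8
names(sums_ok)[!sums_ok] # task ID combinations whose values do not sum to 1

# Toy quantile data: values should not decrease as output_type_id increases.
quantiles <- data.frame(
  location = c("US", "US", "US"),
  output_type_id = c(0.25, 0.5, 0.75),
  value = c(10, 12, 11)
)
quantiles <- quantiles[order(quantiles$location, quantiles$output_type_id), ]
non_desc <- tapply(
  quantiles$value,
  quantiles$location,
  function(v) all(diff(v) >= 0)
)
non_desc # FALSE marks a task ID combination with a decreasing value
```

In the toy data, location `"01"` sums to 0.9 and the `"US"` quantile values drop from 12 to 11, so each group would be flagged by the corresponding property.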

Value

An object of class hub_validations. Each named element contains a hub_check class object reflecting the result of a given check. The function returns early if a check returns an error.

For more details on the structure of <hub_validations> objects, including how to access more information on individual checks, see the article on <hub_validations> S3 class objects.
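Because the returned object holds one named element per check, it can be explored with ordinary list tools. The sketch below assumes the package and its bundled test hub are installed, and that failing checks carry the `check_error` or `check_failure` class named in the "Fail output" column of the table above. The variable names are illustrative only.

```r
# Guarded so the sketch is skipped when hubValidations is not installed.
if (requireNamespace("hubValidations", quietly = TRUE)) {
  hub_path <- system.file("testhubs/simple", package = "hubValidations")
  file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
  v <- hubValidations::validate_submission(hub_path, file_path)

  # Names of the checks that were run before any early return
  print(names(v))

  # Flag elements whose class marks them as an error or a failure
  is_error <- vapply(v, inherits, logical(1), what = "check_error")
  is_failure <- vapply(v, inherits, logical(1), what = "check_failure")
  print(names(v)[is_error | is_failure])
}
```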

Examples

hub_path <- system.file("testhubs/simple", package = "hubValidations")
file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
validate_submission(hub_path, file_path)
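
A further usage sketch, assuming the same bundled test hub: the submission window check can be switched off with the documented `skip_submit_window_check` argument, which may be helpful when re-validating a file locally after its window has closed. The object name `v_no_window` is illustrative.

```r
# Guarded so the sketch is skipped when hubValidations is not installed.
if (requireNamespace("hubValidations", quietly = TRUE)) {
  hub_path <- system.file("testhubs/simple", package = "hubValidations")
  file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"

  # Run all checks except the submission window check
  v_no_window <- hubValidations::validate_submission(
    hub_path,
    file_path,
    skip_submit_window_check = TRUE
  )
  print(v_no_window)
}
```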

Validate a submitted model data file submission time.

Description

Checks that the current time falls within the submission window of the model data file being submitted.

Usage

validate_submission_time(
  hub_path,
  file_path,
  ref_date_from = c("file_path", "file")
)

Arguments

`hub_path`	Either a character string path to a local Modeling Hub directory or an object of class `<SubTreeFileSystem>` created using functions `s3_bucket()` or `gs_bucket()` by providing a string S3 or GCS bucket name or path to a Modeling Hub directory stored in the cloud. For more details, consult the "Using cloud storage (S3, GCS)" article in the `arrow` package documentation. The hub must be fully configured with valid `admin.json` and `tasks.json` files within the `hub-config` directory.
`file_path`	character string. Path to the file being validated relative to the hub's model-output directory.
`ref_date_from`	whether to get the reference date around which relative submission windows will be determined from the file's `file_path` round ID or the `file` contents themselves. `file` requires that the file can be read. Only applicable when a round is configured to determine the submission windows relative to the value in a date column in model output files. Not applicable when explicit submission window start and end dates are provided in the hub's config.
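
The idea of a submission window defined relative to a reference date can be sketched in base R. This is an illustration only, not the package's implementation: `in_submission_window()` is a hypothetical helper, the `start` and `end` day offsets are assumed values, the round ID is assumed to be the leading `YYYY-MM-DD` part of the file name, and time zones are ignored.

```r
# Hypothetical helper (not part of hubValidations): is `now` inside the
# window that runs from `ref_date + start` to `ref_date + end` (in days)?
in_submission_window <- function(now, ref_date, start, end) {
  window_start <- as.Date(ref_date) + start
  window_end <- as.Date(ref_date) + end
  now >= window_start && now <= window_end
}

# Reference date taken from the file path's round ID (assumed to be the
# first 10 characters of the file name).
file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
ref_date <- substr(basename(file_path), 1, 10)

# Assumed offsets: window opens 6 days before and closes 1 day after.
in_submission_window(as.Date("2022-10-07"), ref_date, start = -6, end = 1)
in_submission_window(as.Date("2022-10-20"), ref_date, start = -6, end = 1)
```

With `ref_date_from = "file"`, the reference date would instead come from a date column in the file contents, which is why that option requires the file to be readable.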

Value

An object of class hub_validations. Each named element contains a hub_check class object reflecting the result of a given check. The function returns early if a check returns an error.

For more details on the structure of <hub_validations> objects, including how to access more information on individual checks, see the article on <hub_validations> S3 class objects.

Examples

hub_path <- system.file("testhubs/simple", package = "hubValidations")
file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
validate_submission_time(hub_path, file_path)

Package 'hubValidations'

Help Index

Capture a condition of the result of validation check.
Capture a simple info message condition
Capture an execution error condition
Capture an execution warning condition
Check hub correctly configured
Check file exists at the file path specified
Check file format is accepted by hub.
Check file is being submitted to the correct folder
Check number of files submitted per round does not exceed the allowed number of submissions per team.
Check a model output file name can be correctly parsed.
Check file can be read successfully
Raise conditions stored in a hub_validations S3 object
Check whether a metadata schema file exists
Check file is being submitted to the correct folder
Check that the metadata file is being submitted to the correct folder
Check whether the file name of a metadata file matches the model_id or combination of team_abbr and model_abbr specified within the metadata file

Raise conditions stored in a `hub_validations` S3 object