Title: | Testing framework for hubverse hub validations |
---|---|
Description: | This package aims at providing a simple interface to run validations on data and metadata submitted to a hubverse modeling hub. Validation tests can be run at different levels (single file, single folder, whole repository) and locally as well as part of a continuous integration workflow. |
Authors: | Anna Krystalli [aut, cre], Evan Ray [aut], Hugo Gruson [aut], Zhian N. Kamvar [ctb], Consortium of Infectious Disease Modeling Hubs [cph] |
Maintainer: | Anna Krystalli <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.10.0 |
Built: | 2024-12-12 20:29:45 UTC |
Source: | https://github.com/hubverse-org/hubValidations |
Capture a condition describing the result of a validation check.
capture_check_cnd( check, file_path, msg_subject, msg_attribute, msg_verbs = c("is", "must be"), error = FALSE, details = NULL, ... )
check |
logical, the result of a validation check. If |
file_path |
character string. Path to the file being validated. Must be
the relative path to the hub's |
msg_subject |
character string. The subject of the validation. |
msg_attribute |
character string. The attribute of subject being validated. |
msg_verbs |
character vector of length 2. The verbs describing the state of the attribute in relation to the validation subject. The first element describes the state when validation succeeds, the second element, when validation fails. |
error |
logical. In the case of validation failure, whether the function
should return an object of class |
details |
further details to be appended to the output message. |
... |
<dynamic> Named data fields stored inside the condition object. |
Arguments msg_subject, msg_attribute, msg_verbs and details accept text that can be interpreted and formatted by cli::format_inline().
Depending on whether validation has succeeded and the value of the error argument, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
<error/check_error> condition class object.
Returned object also inherits from subclass <hub_check>.
capture_check_cnd( check = TRUE, file_path = "test/file.csv", msg_subject = "{.var round_id}", msg_attribute = "valid.", error = FALSE )
capture_check_cnd( check = FALSE, file_path = "test/file.csv", msg_subject = "{.var round_id}", msg_attribute = "valid.", error = FALSE, details = "Must be one of 'A' or 'B', not 'C'" )
capture_check_cnd( check = FALSE, file_path = "test/file.csv", msg_subject = "{.var round_id}", msg_attribute = "valid.", error = TRUE, details = "Must be one of {.val {c('A', 'B')}}, not {.val C}" )
Capture a simple info message condition. Useful for communicating when a check is ignored or skipped.
capture_check_info(file_path, msg, call = rlang::caller_call())
file_path |
character string. Path to the file being validated. Must be
the relative path to the hub's |
msg |
Character string. Accepts text that can be interpreted and
formatted by |
call |
The defused call of the function that generated the message. Use to override default which uses the caller call. See rlang::stack for more details. |
A <message/check_info> condition class object. Returned object also inherits from subclass <hub_check>.
Capture an execution error condition. Useful for communicating when a check execution has failed. Usually used in conjunction with try.
capture_exec_error(file_path, msg, call = NULL)
file_path |
character string. Path to the file being validated. Must be
the relative path to the hub's |
msg |
Character string. |
call |
Character string. Name of the parent call that failed to execute.
If |
A <error/check_exec_error> condition class object. Returned object also inherits from subclass <hub_check>.
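The try pattern mentioned above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: risky_check() and run_safely() are hypothetical names, and the list returned on failure merely stands in for the condition object that capture_exec_error() would create.

```r
# Hypothetical stand-in for a check that can fail while executing.
risky_check <- function(x) {
  if (!is.numeric(x)) stop("input must be numeric")
  all(x >= 0)
}

# Run the check inside try() and inspect the result. In a real hub
# validation, the failure branch is where capture_exec_error() would be
# called with the file path and an informative message.
run_safely <- function(x) {
  res <- try(risky_check(x), silent = TRUE)
  if (inherits(res, "try-error")) {
    # conditionMessage() recovers the original error text.
    msg <- conditionMessage(attr(res, "condition"))
    return(list(status = "exec_error", msg = msg))
  }
  list(status = "ok", result = res)
}

ok_run <- run_safely(c(1, 2, 3))
bad_run <- run_safely("a")
```

Here ok_run reports a completed check, while bad_run records that the check itself could not be executed, which is a different outcome from a check that ran and found invalid data.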
Capture an execution warning condition. Useful for communicating when a check execution has failed. Usually used in conjunction with try.
capture_exec_warning(file_path, msg, call = NULL)
file_path |
character string. Path to the file being validated. Must be
the relative path to the hub's |
msg |
Character string. |
call |
Character string. Name of the parent call that failed to execute.
If |
A <warning/check_exec_warn> condition class object. Returned object also inherits from subclass <hub_check>.
Checks that admin and tasks configuration files in directory hub-config are valid.
check_config_hub_valid(hub_path)
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_error> condition class object.
Returned object also inherits from subclass <hub_check>.
Check file exists at the file path specified
check_file_exists( file_path, hub_path = ".", subdir = c("model-output", "model-metadata", "hub-config") )
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
subdir |
subdirectory within the hub |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_error> condition class object.
Returned object also inherits from subclass <hub_check>.
Check file format is accepted by hub.
check_file_format(file_path, hub_path, round_id)
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
round_id |
character string. The round identifier. |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_error> condition class object.
Returned object also inherits from subclass <hub_check>.
Checks that the model_id metadata in the file name matches the directory name the file is being submitted to.
check_file_location(file_path)
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
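The idea behind this check can be illustrated with base R path helpers. This is only a sketch: model_id_matches_dir() is a hypothetical helper, not a package function, and it assumes file names of the form "<round_id>-<model_id>.<ext>" placed in a directory named after the model_id.

```r
# Hypothetical helper, not part of hubValidations: compares the model_id
# embedded in a file name with the name of the directory holding the file.
# Assumes file names follow "<round_id>-<model_id>.<ext>".
model_id_matches_dir <- function(file_path) {
  dir_name <- basename(dirname(file_path))
  file_stem <- tools::file_path_sans_ext(basename(file_path))
  # The model_id is expected to be the trailing part of the file stem.
  endsWith(file_stem, paste0("-", dir_name))
}

match_ok <- model_id_matches_dir(
  "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
)
match_bad <- model_id_matches_dir(
  "team1-goodmodel/2022-10-08-team2-othermodel.csv"
)
```

The second path fails the comparison because the file claims to come from a different model than the directory it sits in.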
Check number of files submitted per round does not exceed the allowed number of submissions per team.
check_file_n(file_path, hub_path, allowed_n = 1L)
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
allowed_n |
integer(1). The maximum number of files allowed per round. |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
Check a model output file name can be correctly parsed.
check_file_name(file_path)
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_error> condition class object.
Returned object also inherits from subclass <hub_check>.
Check file can be read successfully
check_file_read(file_path, hub_path = ".")
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_error> condition class object.
Returned object also inherits from subclass <hub_check>.
hub_validations S3 object
This is meant to be used in CI workflows to raise conditions from hub_validations objects but can also be useful locally to summarise the results of checks contained in a hub_validations S3 object.
check_for_errors(x, verbose = FALSE)
x |
A |
verbose |
Logical. If |
An error if one of the elements of x is of class check_failure, check_error, check_exec_error or check_exec_warning. TRUE invisibly otherwise.
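The class test described above can be sketched in base R with inherits(). The objects below are hypothetical stand-ins built with structure(), not real hub_validations elements, and make_check() is an illustrative helper; the snippet only shows how elements with a failing class could be picked out of a list.

```r
# Hypothetical stand-ins for check results: plain lists tagged with the
# class names listed above. Real hub_validations elements carry more fields.
make_check <- function(class) structure(list(), class = c(class, "hub_check"))

checks <- list(
  file_exists = make_check("check_success"),
  file_name = make_check("check_info"),
  col_types = make_check("check_failure")
)

# Classes treated as failures, as listed in the return value description.
fail_classes <- c(
  "check_failure", "check_error", "check_exec_error", "check_exec_warning"
)

# inherits() returns TRUE if the object has any of the given classes.
failed <- vapply(checks, inherits, logical(1), what = fail_classes)
failed_names <- names(failed)[failed]
```

In this example only the col_types element is flagged, so a wrapper built on this test would raise an error for it and return TRUE invisibly if the list contained no such element.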
Check whether a metadata file exists
check_metadata_file_exists(hub_path = ".", file_path)
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
file_path |
character string. Path to the file being validated relative to the hub's model-metadata directory. |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_error> condition class object.
Returned object also inherits from subclass <hub_check>.
Checks that the model_id metadata in the file name matches the directory name the file is being submitted to.
check_metadata_file_ext(file_path)
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
Check that the metadata file is being submitted to the correct folder
check_metadata_file_location(file_path)
file_path |
character string. Path to the file being validated relative to the hub's model-metadata directory. |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
Check whether the file name of a metadata file matches the model_id or combination of team_abbr and model_abbr specified within the metadata file
check_metadata_file_name(file_path, hub_path = ".")
file_path |
character string. Path to the file being validated relative to the hub's model-metadata directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
Check whether a metadata file matches the schema provided by the hub
check_metadata_matches_schema(file_path, hub_path = ".")
file_path |
character string. Path to the file being validated relative to the hub's model-metadata directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
Check whether a metadata schema file exists
check_metadata_schema_exists(hub_path = ".")
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
Check whether a metadata file for the given model exists
check_submission_metadata_file_exists(file_path, hub_path = ".")
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_error> condition class object.
Returned object also inherits from subclass <hub_check>.
Checks submission is within the valid submission window for a given round.
check_submission_time( hub_path, file_path, ref_date_from = c("file", "file_path") )
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
ref_date_from |
whether to get the reference date around
which relative submission windows will be determined from the file's
|
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
Check that model output data column datatypes conform to those defined in the hub config.
check_tbl_col_types( tbl, file_path, hub_path, output_type_id_datatype = c("from_config", "auto", "character", "double", "integer", "logical", "Date") )
tbl |
a tibble/data.frame of the contents of the file being validated. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
output_type_id_datatype |
character string. One of |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
Checks that a tibble/data.frame of data read in from the file being validated contains the expected task ID and standard column names according to the round configuration being validated against.
check_tbl_colnames(tbl, round_id, file_path, hub_path = ".")
tbl |
a tibble/data.frame of the contents of the file being validated. |
round_id |
character string. The round identifier. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_error> condition class object.
Returned object also inherits from subclass <hub_check>.
This check is used to validate that values in any derived task ID columns match accepted values for each derived task ID in the config. Given the dependence of derived task IDs on the values of other task IDs, it ignores the combinations of derived task ID values with those of other task IDs and focuses only on identifying values that do not match the accepted values.
check_tbl_derived_task_id_vals( tbl, round_id, file_path, hub_path, derived_task_ids = get_hub_derived_task_ids(hub_path, round_id) )
tbl |
a tibble/data.frame of the contents of the file being validated. Column types must all be character. |
round_id |
character string. The round identifier. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
derived_task_ids |
Character vector of derived task ID names (task IDs whose
values depend on other task IDs) to ignore. Columns for such task ids will
contain |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
If no derived_task_ids are specified, the check is skipped and a <message/check_info> condition class object is returned.
Returned object also inherits from subclass <hub_check>.
Check model output data tbl round ID matches submission round ID.
check_tbl_match_round_id(tbl, file_path, hub_path, round_id_col = NULL)
tbl |
a tibble/data.frame of the contents of the file being validated. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
round_id_col |
Character string. The name of the column containing
|
This check only applies to files being submitted to rounds where round_id_from_variable: true or where a round_id_col name is explicitly provided. Skipped otherwise.
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_error> condition class object.
If round_id_from_variable: false and no round_id_col name is provided, the check is skipped and a <message/check_info> condition class object is returned. If no valid round_id_col name is provided or can be extracted from the config (checked through check_valid_round_id_col), an <error/check_error> condition class object is returned and the rest of the check is skipped.
Checks that combinations of task ID, output type and output type ID values are unique, by checking that there are no duplicate rows across all tbl columns excluding the value column.
check_tbl_rows_unique(tbl, file_path, hub_path)
tbl |
a tibble/data.frame of the contents of the file being validated. Column types must all be character. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
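The duplicate row logic described for this check can be sketched with base R. The data frame below is invented for illustration, with all columns as character to mirror the stated requirement, and the snippet is not the package's implementation.

```r
# Illustrative model output table; rows 1 and 2 share the same task ID,
# output type and output type ID values and differ only in value.
tbl <- data.frame(
  location = c("US", "US", "US"),
  output_type = c("quantile", "quantile", "quantile"),
  output_type_id = c("0.5", "0.5", "0.9"),
  value = c("10", "12", "20")
)

# Drop the value column, then flag rows that repeat an earlier row.
key_cols <- tbl[, setdiff(names(tbl), "value"), drop = FALSE]
dup_rows <- which(duplicated(key_cols))
```

Because duplicated() marks only the later occurrence, dup_rows points at row 2 here, which is the row a submitter would need to remove or correct.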
This check detects the compound task ID sets of samples, implied by the output_type_id and task ID values, and checks them for internal consistency and compliance with the compound_taskid_set defined for each round modeling task in the tasks.json config.
check_tbl_spl_compound_taskid_set( tbl, round_id, file_path, hub_path, derived_task_ids = get_hub_derived_task_ids(hub_path) )
tbl |
a tibble/data.frame of the contents of the file being validated. Column types must all be character. |
round_id |
character string. The round identifier. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
derived_task_ids |
Character vector of derived task ID names (task IDs whose
values depend on other task IDs) to ignore. Columns for such task ids will
contain |
If the check fails, the output of the check includes an errors element, a list of items, one for each modeling task failing validation. The structure depends on the reason the check failed.
If the check failed because more than a single unique compound_taskid_set was found for a given model task, the errors object will be a list with one element for each compound_taskid_set detected and will have the following structure:
tbl_comp_tids: a compound task ID set detected in the tbl.
output_type_ids: The output type ID of the sample that does not contain a single, unique value for each compound task ID.
If the check failed because task IDs that are not allowed in the config were identified as compound task IDs (i.e. samples describe "finer" compound modeling tasks) for a given model task, the errors object will be a list with the structure described above as well as the following additional elements:
config_comp_tids: the allowed compound_taskid_set defined in the modeling task config.
invalid_tbl_comp_tids: the names of invalid compound task IDs.
The name of each element is the index identifying the config modeling task the sample is associated with (mt_id).
See hubverse documentation on samples for more details.
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_error> condition class object.
Returned object also inherits from subclass <hub_check>.
Check model output data tbl samples contain single unique values for each compound task ID within individual samples
check_tbl_spl_compound_tid( tbl, round_id, file_path, hub_path, compound_taskid_set = NULL, derived_task_ids = get_hub_derived_task_ids(hub_path, round_id) )
tbl |
a tibble/data.frame of the contents of the file being validated. Column types must all be character. |
round_id |
character string. The round identifier. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
compound_taskid_set |
a list of |
derived_task_ids |
Character vector of derived task ID names (task IDs whose
values depend on other task IDs) to ignore. Columns for such task ids will
contain |
Output of the check includes an errors element, a list of items, one for each sample failing validation, with the following structure:
mt_id: Index identifying the config modeling task the sample is associated with.
output_type_id: The output type ID of the sample that does not contain a single, unique value for each compound task ID.
values: The unique values of each compound task ID.
See hubverse documentation on samples for more details.
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_error> condition class object.
Returned object also inherits from subclass <hub_check>.
Check model output data tbl samples contain the appropriate number of samples for a given compound idx.
check_tbl_spl_n( tbl, round_id, file_path, hub_path, compound_taskid_set = NULL, derived_task_ids = get_hub_derived_task_ids(hub_path, round_id) )
tbl |
a tibble/data.frame of the contents of the file being validated. Column types must all be character. |
round_id |
character string. The round identifier. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
compound_taskid_set |
a list of |
derived_task_ids |
Character vector of derived task ID names (task IDs whose
values depend on other task IDs) to ignore. Columns for such task ids will
contain |
Output of the check includes an errors element, a list of items, one for each compound_idx failing validation, with the following structure:
compound_idx: the compound idx that failed validation of number of samples.
n: the number of samples counted for the compound idx.
min_samples_per_task: the minimum number of samples required for the compound idx.
max_samples_per_task: the maximum number of samples allowed for the compound idx.
compound_idx_tbl: a tibble of the expected structure for samples belonging to the compound idx.
See hubverse documentation on samples for more details.
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
Check model output data tbl samples contain single unique combination of non-compound task ID values across all samples
check_tbl_spl_non_compound_tid( tbl, round_id, file_path, hub_path, compound_taskid_set = NULL, derived_task_ids = get_hub_derived_task_ids(hub_path, round_id) )
tbl |
a tibble/data.frame of the contents of the file being validated. Column types must all be character. |
round_id |
character string. The round identifier. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
compound_taskid_set |
a list of |
derived_task_ids |
Character vector of derived task ID names (task IDs whose
values depend on other task IDs) to ignore. Columns for such task ids will
contain |
Output of the check includes an errors element, a list of items, one for each modeling task containing samples failing validation, with the following structure:
mt_id: Index identifying the config modeling task the samples are associated with.
output_type_ids: The output type IDs of samples that do not match the most frequent non-compound task ID value combination across all samples in the modeling task.
frequent: The most frequent non-compound task ID value combination across all samples in the modeling task to which all samples were compared.
See hubverse documentation on samples for more details.
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_error> condition class object.
Returned object also inherits from subclass <hub_check>.
Check model output data tbl contains a single unique round ID.
check_tbl_unique_round_id(tbl, file_path, hub_path, round_id_col = NULL)
tbl |
a tibble/data.frame of the contents of the file being validated. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
round_id_col |
Character string. The name of the column containing
|
This check only applies to files being submitted to rounds where round_id_from_variable: true or where a round_id_col name is explicitly provided. Skipped otherwise.
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_error> condition class object.
If round_id_from_variable: false and no round_id_col name is provided, the check is skipped and a <message/check_info> condition class object is returned. If no valid round_id_col name is provided or can be extracted from the config (checked through check_valid_round_id_col), an <error/check_error> condition class object is returned and the rest of the check is skipped.
Checks that values in the value column of a tibble/data.frame of data read in from the file being validated conform to the configuration for each output type of the appropriate model task.
check_tbl_value_col( tbl, round_id, file_path, hub_path, derived_task_ids = get_hub_derived_task_ids(hub_path, round_id) )
tbl |
a tibble/data.frame of the contents of the file being validated. |
round_id |
character string. The round identifier. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
derived_task_ids |
Character vector of derived task ID names (task IDs whose
values depend on other task IDs) to ignore. Columns for such task ids will
contain |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
quantile and cdf output type values of model output data are non-descending
Checks that values in the value column for quantile and cdf output type data for each unique task ID/output type combination are non-descending when arranged by increasing output_type_id order.
Check only performed if tbl contains quantile or cdf output type data. If not, the check is skipped and a <message/check_info> condition class object is returned.
check_tbl_value_col_ascending(tbl, file_path)
tbl |
a tibble/data.frame of the contents of the file being validated. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
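The non-descending rule can be illustrated with a small base R sketch. The table and the is_non_descending() helper are invented for this example and use a single task ID column (location); the package's own implementation may differ.

```r
# Illustrative quantile data for two locations. The CA values dip from
# 4 to 3 between the 0.1 and 0.5 quantiles, which violates the rule.
tbl <- data.frame(
  location = c("US", "US", "US", "CA", "CA", "CA"),
  output_type = "quantile",
  output_type_id = c(0.1, 0.5, 0.9, 0.1, 0.5, 0.9),
  value = c(1, 5, 9, 4, 3, 8)
)

# Order by output_type_id within each task ID group, then test that
# successive differences in value are never negative.
is_non_descending <- function(df) {
  df <- df[order(df$output_type_id), ]
  all(diff(df$value) >= 0)
}

groups <- split(tbl, tbl$location)
res <- vapply(groups, is_non_descending, logical(1))
```

The result is a named logical vector with one entry per group, so the groups that break the rule can be reported back to the submitter by name.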
pmf output type values of model output data sum to 1.
Checks that values in the value column of pmf output type data for each unique task ID combination sum to 1.
Check only performed if tbl contains pmf output type data. If not, the check is skipped and a <message/check_info> condition class object is returned.
check_tbl_value_col_sum1(tbl, file_path)
tbl |
a tibble/data.frame of the contents of the file being validated. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
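A base R sketch of the sum-to-1 rule follows. The data and the tolerance of 1e-8 are assumptions made for illustration; the tolerance the package actually applies is not stated here.

```r
# Illustrative pmf data for two locations. The CA probabilities add up
# to 0.9, so that group should be flagged.
tbl <- data.frame(
  location = c("US", "US", "US", "CA", "CA", "CA"),
  output_type = "pmf",
  output_type_id = c("low", "mid", "high", "low", "mid", "high"),
  value = c(0.2, 0.3, 0.5, 0.1, 0.3, 0.5)
)

# Sum values per task ID group and compare with 1 using a tolerance,
# since exact equality is unreliable for floating point sums.
sums <- tapply(tbl$value, tbl$location, sum)
sums_to_one <- abs(sums - 1) < 1e-8
```

Comparing against a tolerance rather than using == avoids flagging valid submissions whose probabilities differ from 1 only by rounding error.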
Check model output data tbl contains valid value combinations
check_tbl_values( tbl, round_id, file_path, hub_path, derived_task_ids = get_hub_derived_task_ids(hub_path, round_id) )
tbl |
a tibble/data.frame of the contents of the file being validated. Column types must all be character. |
round_id |
character string. The round identifier. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
derived_task_ids |
Character vector of derived task ID names (task IDs whose
values depend on other task IDs) to ignore. Columns for such task ids will
contain |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_error> condition class object.
Returned object also inherits from subclass <hub_check>.
Check all required task ID/output type/output type ID value combinations present in model data.
check_tbl_values_required( tbl, round_id, file_path, hub_path, derived_task_ids = get_hub_derived_task_ids(hub_path) )
tbl |
a tibble/data.frame of the contents of the file being validated. Column types must all be character. |
round_id |
character string. The round identifier. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
derived_task_ids |
Character vector of derived task ID names (task IDs whose
values depend on other task IDs) to ignore. Columns for such task ids will
contain |
Note that derived_task_ids must be specified if any of the task IDs a derived task ID depends on have required values. If derived task IDs are not specified in that case, the dependent nature of derived task ID values will result in false validation errors when validating required values.
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
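The reasoning behind the note on derived_task_ids can be illustrated with a small base R sketch. It is an assumption-laden toy, not what the package does internally: the task IDs horizon and target_end_date and the weekly spacing are hypothetical.

```r
origin_date <- as.Date("2023-05-08")
horizons <- c(1, 2)
# target_end_date is fully determined by origin_date and horizon (a derived task ID).
target_end_dates <- as.character(origin_date + 7 * horizons)

# Treating target_end_date as an independent task ID yields the full cross product
# of required values, including combinations that can never occur.
naive_grid <- expand.grid(
  horizon = horizons,
  target_end_date = target_end_dates,
  stringsAsFactors = FALSE
)

# A valid submission only contains the consistent combinations.
submitted <- data.frame(horizon = horizons, target_end_date = target_end_dates)

# Combinations that would be falsely reported as missing required values.
n_false_missing <- nrow(naive_grid) - nrow(submitted)
```

Ignoring the derived column when building the grid of required values avoids these spurious combinations.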
Check whether the round_id determined for the submission is valid
check_valid_round_id(round_id, file_path, hub_path = ".")
round_id |
character string. The round identifier. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_error> condition class object.
Returned object also inherits from subclass <hub_check>.
Check that any round_id_col name provided or extracted from the hub config is valid.
check_valid_round_id_col(tbl, file_path, hub_path, round_id_col = NULL)
tbl |
a tibble/data.frame of the contents of the file being validated. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
round_id_col |
Character string. The name of the column containing
|
This check only applies to files being submitted to rounds where round_id_from_variable: true or where a round_id_col name is explicitly provided. Skipped otherwise.
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
If round_id_from_variable: false and no round_id_col name is provided, the check is skipped and a <message/check_info> condition class object is returned.
Returned object also inherits from subclass <hub_check>.
Concatenate hub_validations S3 class objects
combine(...)
... |
|
a hub_validations S3 class object.
Create a custom validation check function template file.
create_custom_check( name, hub_path = ".", r_dir = "src/validations/R", error = FALSE, conditional = FALSE, error_object = FALSE, config = FALSE, extra_args = FALSE, overwrite = FALSE )
name |
Character string. Name of the custom check function. We recommend following the hubValidations package naming convention. For more details, consult the article on writing custom check functions. |
hub_path |
Character string. Path to the hub directory. Default is the current working directory. |
r_dir |
Character string. Path (relative to |
error |
Logical. Defaults to |
conditional |
Logical. If |
error_object |
Logical. If |
config |
Logical. If |
extra_args |
Logical. If |
overwrite |
Logical. If |
See the article on writing custom check functions for more.
Invisible TRUE if the custom check function file is created successfully.
withr::with_tempdir({
  # Create the custom check file with default settings.
  create_custom_check("check_default")
  cat(readLines("src/validations/R/check_default.R"), sep = "\n")

  # Create fully featured custom check file.
  create_custom_check("check_full",
    error = TRUE, conditional = TRUE,
    error_object = TRUE, config = TRUE,
    extra_args = TRUE
  )
  cat(readLines("src/validations/R/check_full.R"), sep = "\n")
})
Create expanded grid of valid task ID and output type value combinations
expand_model_out_grid( config_tasks, round_id, required_vals_only = FALSE, force_output_types = FALSE, all_character = FALSE, output_type_id_datatype = c("from_config", "auto", "character", "double", "integer", "logical", "Date"), as_arrow_table = FALSE, bind_model_tasks = TRUE, include_sample_ids = FALSE, compound_taskid_set = NULL, output_types = NULL, derived_task_ids = get_config_derived_task_ids(config_tasks, round_id) )
config_tasks |
a list version of the contents of a hub's |
round_id |
Character string. Round identifier. If the round is set to
|
required_vals_only |
Logical. Whether to return only combinations of Task ID and related output type ID required values. |
force_output_types |
Logical. Whether to force all output types to be required.
If |
all_character |
Logical. Whether to return all columns as character. |
output_type_id_datatype |
character string. One of |
as_arrow_table |
Logical. Whether to return an arrow table. Defaults to |
bind_model_tasks |
Logical. Whether to bind expanded grids of values from multiple modeling tasks into a single tibble/arrow table or return a list. |
include_sample_ids |
Logical. Whether to include sample identifiers in
the |
compound_taskid_set |
List of character vectors, one for each modeling task
in the round. Can be used to override the compound task ID set defined in the
config. If |
output_types |
Character vector of output type names to include. Use to subset the grid to specific output types. |
derived_task_ids |
Character vector of derived task ID names (task IDs whose
values depend on other task IDs) to ignore. Columns for such task ids will
contain |
When a round is set to round_id_from_variable: true, the value of the task ID from which round IDs are derived (i.e. the task ID specified in the round_id property of config_tasks) is set to the value of the round_id argument in the returned output.
When sample output types are included in the output and include_sample_ids = TRUE, the output_type_id column contains example sample indexes. These are useful for identifying the compound task ID structure of multivariate sampling distributions in particular, i.e. which combinations of task ID values represent individual samples.
If bind_model_tasks = TRUE (default), a tibble or arrow table containing all possible task ID and related output type ID value combinations. If bind_model_tasks = FALSE, a list containing a tibble or arrow table for each round modeling task.
Columns are coerced to data types according to the hub schema, unless all_character = TRUE. If all_character = TRUE, all columns are returned as character, which can be faster when large expanded grids are expected. If required_vals_only = TRUE, values are limited to the combinations of required values only.
Note that if required_vals_only = TRUE and an optional output type is requested through output_types, a zero row grid will be returned. If all output types are requested however (i.e. when output_types = NULL) and they are all optional, a grid of required task ID values only will be returned. However, whenever force_output_types = TRUE, all output types are treated as required.
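The idea of expanding required and optional values into a grid can be sketched with base R's expand.grid(). This is a simplified illustration, not the package's implementation: the task_ids list and the helper expand_vals_sketch are hypothetical, and output types, data type coercion and derived task IDs are ignored.

```r
# Hypothetical, simplified task ID config: each task ID has required and optional values.
task_ids <- list(
  location = list(required = "US", optional = c("01", "02")),
  horizon = list(required = c("1", "2"), optional = NULL)
)

# Hypothetical helper: cross all values, or only the required ones.
expand_vals_sketch <- function(task_ids, required_vals_only = FALSE) {
  vals <- lapply(task_ids, function(x) {
    if (required_vals_only) x$required else c(x$required, x$optional)
  })
  expand.grid(vals, stringsAsFactors = FALSE)
}

full_grid <- expand_vals_sketch(task_ids)
req_grid <- expand_vals_sketch(task_ids, required_vals_only = TRUE)
```

With this toy config, full_grid has 3 x 2 = 6 rows while req_grid has 1 x 2 = 2 rows, and all columns are character.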
hub_con <- hubData::connect_hub(
  system.file("testhubs/flusight", package = "hubUtils")
)
config_tasks <- attr(hub_con, "config_tasks")
expand_model_out_grid(config_tasks, round_id = "2023-01-02")
expand_model_out_grid(
  config_tasks,
  round_id = "2023-01-02",
  required_vals_only = TRUE
)
# Specifying a round in a hub with multiple round configurations.
hub_con <- hubData::connect_hub(
  system.file("testhubs/simple", package = "hubUtils")
)
config_tasks <- attr(hub_con, "config_tasks")
expand_model_out_grid(config_tasks, round_id = "2022-10-01")
# Later round_id maps to round config that includes additional task ID 'age_group'.
expand_model_out_grid(config_tasks, round_id = "2022-10-29")
# Coerce all columns to character
expand_model_out_grid(config_tasks,
  round_id = "2022-10-29",
  all_character = TRUE
)
# Return arrow table
expand_model_out_grid(config_tasks,
  round_id = "2022-10-29",
  all_character = TRUE,
  as_arrow_table = TRUE
)
# Hub with sample output type
config_tasks <- read_config_file(system.file("config", "tasks.json",
  package = "hubValidations"
))
expand_model_out_grid(config_tasks,
  round_id = "2022-12-26"
)
# Include sample IDs
expand_model_out_grid(config_tasks,
  round_id = "2022-12-26",
  include_sample_ids = TRUE
)
# Hub with sample output type and compound task ID structure
config_tasks <- read_config_file(
  system.file("config", "tasks-comp-tid.json", package = "hubValidations")
)
expand_model_out_grid(config_tasks,
  round_id = "2022-12-26",
  include_sample_ids = TRUE
)
# Override config compound task ID set
# Create coarser compound task ID set for the first modeling task which contains
# samples
expand_model_out_grid(config_tasks,
  round_id = "2022-12-26",
  include_sample_ids = TRUE,
  compound_taskid_set = list(
    c("forecast_date", "target"),
    NULL
  )
)
expand_model_out_grid(config_tasks,
  round_id = "2022-12-26",
  include_sample_ids = TRUE,
  compound_taskid_set = list(
    NULL,
    NULL
  )
)
# Subset output types
config_tasks <- read_config(
  system.file("testhubs", "samples", package = "hubValidations")
)
expand_model_out_grid(config_tasks,
  round_id = "2022-10-29",
  include_sample_ids = TRUE,
  bind_model_tasks = FALSE,
  output_types = c("sample", "pmf"),
)
expand_model_out_grid(config_tasks,
  round_id = "2022-10-29",
  include_sample_ids = TRUE,
  bind_model_tasks = TRUE,
  output_types = "sample",
)
# Ignore derived task IDs
expand_model_out_grid(config_tasks,
  round_id = "2022-10-29",
  include_sample_ids = TRUE,
  bind_model_tasks = FALSE,
  output_types = "sample",
  derived_task_ids = "target_end_date"
)
# Return only required values
hub_path <- system.file("testhubs", "v4", "simple", package = "hubUtils")
config_tasks <- read_config(hub_path)
# Return required output types and output_types_ids only
expand_model_out_grid(
  config_tasks = config_tasks,
  round_id = "2022-10-22",
  required_vals_only = TRUE
)
# Force all output types to be required
expand_model_out_grid(
  config_tasks = config_tasks,
  round_id = "2022-10-22",
  required_vals_only = TRUE,
  force_output_types = TRUE
)
# Sub-setting for an optional output type returns an empty data frame
expand_model_out_grid(
  config_tasks = config_tasks,
  round_id = "2022-10-22",
  output_types = "mean",
  required_vals_only = TRUE
)
# force_output_types on an optional output type forces all output_type_id values
# to be required
expand_model_out_grid(
  config_tasks = config_tasks,
  round_id = "2022-10-22",
  output_types = "mean",
  required_vals_only = TRUE,
  force_output_types = TRUE
)
# Ignore derived task IDs
hub_path <- system.file("testhubs", "v4", "flusight", package = "hubUtils")
config_tasks <- read_config(hub_path)
# Defaults to using derived_task_ids from config
expand_model_out_grid(config_tasks, round_id = "2023-05-08")
# Can be overridden by argument derived_task_ids
expand_model_out_grid(config_tasks,
  round_id = "2023-05-08",
  derived_task_ids = NULL
)
Get hub configuration fields from a <config> class object
get_config_derived_task_ids(config_tasks, round_id = NULL)
config_tasks |
a list version of the contents of a hub's |
round_id |
Character string. Round identifier. If the round is set to
|
get_config_derived_task_ids: character vector of hub or round level derived task ID names. If round_id is NULL or the round does not have a round level derived_task_ids setting, returns the hub level derived_task_ids setting.
get_config_derived_task_ids(): Get the hub or round level derived_task_ids
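The round-then-hub fallback described above can be sketched as follows. This is a minimal illustration assuming a heavily simplified config list; the structure of config and the helper get_derived_sketch are hypothetical and do not reflect the real <config> object.

```r
# Hypothetical, simplified config: a hub level setting plus optional round level overrides.
config <- list(
  derived_task_ids = "target_end_date",
  rounds = list(
    list(round_id = "r1", derived_task_ids = c("target_end_date", "season")),
    list(round_id = "r2")
  )
)

# Hypothetical helper: prefer the round level setting, fall back to the hub level one.
get_derived_sketch <- function(config, round_id = NULL) {
  if (!is.null(round_id)) {
    for (round in config$rounds) {
      has_setting <- !is.null(round$derived_task_ids)
      if (identical(round$round_id, round_id) && has_setting) {
        return(round$derived_task_ids)
      }
    }
  }
  config$derived_task_ids
}

hub_level <- get_derived_sketch(config)
round_level <- get_derived_sketch(config, round_id = "r1")
fallback <- get_derived_sketch(config, round_id = "r2")
```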
hub_path <- system.file("testhubs/v4/flusight", package = "hubUtils")
config_tasks <- read_config(hub_path)
get_config_derived_task_ids(config_tasks)
get_config_derived_task_ids(config_tasks, round_id = "2023-05-08")
Detect the compound_taskid_set for a tbl for each modeling task in a given round.
get_tbl_compound_taskid_set( tbl, config_tasks, round_id, compact = TRUE, error = TRUE, derived_task_ids = get_config_derived_task_ids(config_tasks, round_id) )
tbl |
a tibble/data.frame of the contents of the file being validated. Column types must all be character. |
config_tasks |
a list representation of the |
round_id |
Character string. The round ID. |
compact |
Logical. If TRUE, the output will be compacted to remove NULL elements. |
error |
Logical. If TRUE, an error will be thrown if the compound task ID set is not valid. If FALSE and an error is detected, the detected compound task ID set will be returned with error attributes attached. |
derived_task_ids |
Character vector of derived task ID names (task IDs whose
values depend on other task IDs) to ignore. Columns for such task ids will
contain |
A list of vectors of compound task IDs detected in the tbl, one for each modeling task in the round. If compact is TRUE, modeling tasks returning NULL elements will be removed.
hub_path <- system.file("testhubs/samples", package = "hubValidations")
file_path <- "flu-base/2022-10-22-flu-base.csv"
round_id <- "2022-10-22"
tbl <- read_model_out_file(
  file_path = file_path,
  hub_path = hub_path,
  coerce_types = "chr"
)
config_tasks <- read_config(hub_path, "tasks")
get_tbl_compound_taskid_set(tbl, config_tasks, round_id)
get_tbl_compound_taskid_set(tbl, config_tasks, round_id,
  compact = FALSE
)
Get status of a hub check
is_success(x)
is_failure(x)
is_error(x)
is_info(x)
not_pass(x)
is_exec_error(x)
is_exec_warn(x)
is_any_error(x)
x |
an object that inherits from class |
Logical. Is given status of check TRUE?
is_success(): Is check success?
is_failure(): Is check failure?
is_error(): Is check error?
is_info(): Is check info?
not_pass(): Did check not pass?
is_exec_error(): Is exec error?
is_exec_warn(): Is exec warning?
is_any_error(): Is error or exec error?
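Status helpers of this kind can be thought of as class-membership tests on condition objects. The sketch below illustrates that idea in base R only; the constructor toy_check, the class layout and the *_sketch predicates are assumptions for illustration and are not the package's own functions.

```r
# Hypothetical constructor for a toy check result. The class layout is an
# assumption made for illustration only.
toy_check <- function(subclass, message) {
  structure(
    list(message = message, call = NULL),
    class = c(subclass, "hub_check", "condition")
  )
}

checks <- list(
  file_exists = toy_check("check_success", "File exists."),
  file_name = toy_check("check_failure", "File name is invalid.")
)

# Status predicates reduce to class-membership tests.
is_success_sketch <- function(x) inherits(x, "check_success")
is_failure_sketch <- function(x) inherits(x, "check_failure")

# Names of the checks that failed.
failed <- names(Filter(is_failure_sketch, checks))
```

Applied to a list of check results, such predicates make it easy to pull out the checks that need attention.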
Split and match model output tbl data to their corresponding model tasks in config_tasks. Useful for performing model task specific checks on model output. For v3 samples, the output_type_id column is set to NA for sample outputs.
match_tbl_to_model_task( tbl, config_tasks, round_id, output_types = NULL, derived_task_ids = get_config_derived_task_ids(config_tasks, round_id), all_character = TRUE )
tbl |
a tibble/data.frame of the contents of the file being validated. |
config_tasks |
a list version of the contents of a hub's |
round_id |
Character string. Round identifier. If the round is set to
|
output_types |
Character vector of output type names to include. Use to subset the grid to specific output types. |
derived_task_ids |
Character vector of derived task ID names (task IDs whose
values depend on other task IDs) to ignore. Columns for such task ids will
contain |
all_character |
Logical. Whether to return all columns as character. |
A list containing a tbl_df of model output data matched to a model task with one element per round model task.
hub_path <- system.file("testhubs/samples", package = "hubValidations")
tbl <- read_model_out_file(
  file_path = "flu-base/2022-10-22-flu-base.csv",
  hub_path, coerce_types = "chr"
)
config_tasks <- read_config(hub_path, "tasks")
match_tbl_to_model_task(tbl, config_tasks, round_id = "2022-10-22")
match_tbl_to_model_task(tbl, config_tasks,
  round_id = "2022-10-22",
  output_types = "sample"
)
Create new or convert list to hub_validations S3 class object
new_hub_validations(...)
as_hub_validations(x)
... |
named elements to be included. Each element must be an object which
inherits from class |
x |
a list of named elements. Each element must be an object which
inherits from class |
an S3 object of class <hub_validations>.
new_hub_validations(): Create new <hub_validations> S3 class object
as_hub_validations(): Convert list to <hub_validations> S3 class object
new_hub_validations()
hub_path <- system.file("testhubs/simple", package = "hubValidations")
file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
new_hub_validations(
  file_exists = check_file_exists(file_path, hub_path),
  file_name = check_file_name(file_path)
)
x <- list(
  file_exists = check_file_exists(file_path, hub_path),
  file_name = check_file_name(file_path)
)
as_hub_validations(x)
Check that submitting team does not exceed maximum number of allowed models per team
opt_check_metadata_team_max_model_n(file_path, hub_path, n_max = 2L)
file_path |
character string. Path to the file being validated relative to the hub's model-metadata directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
n_max |
Integer. Maximum number of models allowed per team. |
Should be deployed as part of validate_model_metadata optional checks.
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
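The counting logic behind such a check can be sketched in base R. The file names below are made up, and the assumption that the team abbreviation is everything before the first hyphen of the metadata file name is for illustration only.

```r
# Hypothetical metadata file names of the form <team_abbr>-<model_abbr>.<ext>.
metadata_files <- c(
  "team1-goodmodel.yml",
  "team1-othermodel.yml",
  "team1-third.yaml",
  "hub-baseline.yml"
)

# Assumed convention: the team abbreviation precedes the first hyphen.
team_abbr <- sub("-.*$", "", basename(metadata_files))
models_per_team <- table(team_abbr)

n_max <- 2L
over_limit <- names(models_per_team)[as.vector(models_per_team) > n_max]
```

In this toy example team1 has three models and exceeds n_max = 2.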
Check that the time difference between values in two date columns equals a defined period.
opt_check_tbl_col_timediff( tbl, file_path, hub_path, t0_colname, t1_colname, timediff = lubridate::weeks(2), output_type_id_datatype = c("from_config", "auto", "character", "double", "integer", "logical", "Date") )
tbl |
a tibble/data.frame of the contents of the file being validated. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
t0_colname |
Character string. The name of the time zero date column. |
t1_colname |
Character string. The name of the time zero + 1 time step date column. |
timediff |
an object of class |
output_type_id_datatype |
character string. One of |
Should be deployed as part of validate_model_data optional checks.
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
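A fixed time difference check of this kind can be sketched with base R dates. This is an illustration rather than the package's code: the column names forecast_date and target_end_date are hypothetical stand-ins for t0_colname and t1_colname, and 14 days mirrors the two week default of timediff.

```r
tbl <- data.frame(
  forecast_date = c("2023-05-08", "2023-05-08"),
  target_end_date = c("2023-05-22", "2023-05-15")
)

expected_days <- 14
diff_days <- as.numeric(
  difftime(as.Date(tbl$target_end_date), as.Date(tbl$forecast_date), units = "days")
)

# Rows whose difference does not equal the expected period.
invalid_rows <- which(diff_days != expected_days)
```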
Check that predicted values per location are less than total location population.
opt_check_tbl_counts_lt_popn( tbl, file_path, hub_path, targets = NULL, popn_file_path = "auxiliary-data/locations.csv", popn_col = "population", location_col = "location" )
tbl |
a tibble/data.frame of the contents of the file being validated. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
targets |
Either a single target key list or a list of multiple target key lists. |
popn_file_path |
Character string.
Path to population data relative to the hub root.
Defaults to |
popn_col |
Character string. The name of the population size column in the population data set. |
location_col |
Character string. The name of the location column. Used to join population data to submission file data. Must be shared by both files. |
Should only be applied to rows containing count predictions. Use argument targets to filter tbl data to appropriate count target rows.
Should be deployed as part of validate_model_data optional checks.
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
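The comparison at the heart of this check, joining predictions to population sizes on a shared location column, can be sketched in base R. The data below are made up and the sketch skips target filtering; it is not the package's implementation.

```r
# Toy predictions and population data sharing a location column.
tbl <- data.frame(location = c("US", "01"), value = c(1000, 6e6))
popn <- data.frame(location = c("US", "01"), population = c(331e6, 5e6))

# Join on the shared location column and flag predictions that are not
# strictly less than the location population.
joined <- merge(tbl, popn, by = "location")
exceeds <- joined[joined$value >= joined$population, , drop = FALSE]
```

Here the prediction for location "01" is larger than its population and would be flagged.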
hub_path <- system.file("testhubs/flusight", package = "hubValidations")
file_path <- "hub-ensemble/2023-05-08-hub-ensemble.parquet"
tbl <- hubValidations::read_model_out_file(file_path, hub_path)
# Single target key list
targets <- list("target" = "wk ahead inc flu hosp")
opt_check_tbl_counts_lt_popn(tbl, file_path, hub_path, targets = targets)
Check that the time difference between values in two date columns equals a time period defined by values in a horizon column
opt_check_tbl_horizon_timediff( tbl, file_path, hub_path, t0_colname, t1_colname, horizon_colname = "horizon", timediff = lubridate::weeks(), output_type_id_datatype = c("from_config", "auto", "character", "double", "integer", "logical", "Date") )
tbl |
a tibble/data.frame of the contents of the file being validated. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
t0_colname |
Character string. The name of the time zero date column. |
t1_colname |
Character string. The name of the time zero + 1 time step date column. |
horizon_colname |
Character string. The name of the horizon column.
Defaults to |
timediff |
an object of class |
output_type_id_datatype |
character string. One of |
Should be deployed as part of validate_model_data optional checks.
Depending on whether validation has succeeded, one of:
<message/check_success> condition class object.
<error/check_failure> condition class object.
Returned object also inherits from subclass <hub_check>.
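When the expected difference depends on a horizon column, the expected t1 date becomes t0 plus horizon times the period. A base R sketch under assumptions: the column names are hypothetical, and a period of 7 days per horizon step stands in for a one week timediff.

```r
tbl <- data.frame(
  forecast_date = c("2023-05-08", "2023-05-08", "2023-05-08"),
  target_end_date = c("2023-05-15", "2023-05-22", "2023-05-30"),
  horizon = c(1, 2, 3)
)

days_per_horizon <- 7
# Expected t1 date for each row: t0 + horizon * period.
expected <- as.Date(tbl$forecast_date) + tbl$horizon * days_per_horizon

# Rows where the observed t1 date differs from the expected one.
invalid_rows <- which(as.Date(tbl$target_end_date) != expected)
```

The third row is flagged because three weeks after 2023-05-08 is 2023-05-29, not 2023-05-30.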
Parse model output file metadata from file name
parse_file_name(file_path, file_type = c("model_output", "model_metadata"))
file_path |
Character string. A model output file name. Can include parent directories which are ignored. |
file_type |
Character string. Type of file name being parsed. One of |
File names are allowed to contain the following compression extension prefixes: .snappy, .gzip, .gz, .brotli, .zstd, .lz4, .lzo, .bz2. These extension prefixes are extracted when parsing the file name and returned as the compression_ext element if present.
A list with the following elements:
round_id: The round ID the model output is associated with (NA for model metadata files.)
team_abbr: The team responsible for the model.
model_abbr: The name of the model.
model_id: The unique model ID derived from the concatenation of <team_abbr>-<model_abbr>.
ext: The file extension.
compression_ext: optional. The compression extension if present.
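How a model output file name decomposes into these elements can be sketched in base R. The helper parse_name_sketch is hypothetical and deliberately narrower than the real parser: it assumes the round ID is an ISO date and that team and model abbreviations contain no hyphens.

```r
# Hypothetical helper, not the package's parser. Assumes round IDs are ISO
# dates and that team and model abbreviations contain no hyphens.
parse_name_sketch <- function(file_path) {
  compression <- c("snappy", "gzip", "gz", "brotli", "zstd", "lz4", "lzo", "bz2")
  parts <- strsplit(basename(file_path), ".", fixed = TRUE)[[1]]
  ext <- parts[length(parts)]
  parts <- parts[-length(parts)]
  # Peel off a compression extension prefix if one precedes the file extension.
  compression_ext <- NULL
  if (length(parts) > 1 && parts[length(parts)] %in% compression) {
    compression_ext <- parts[length(parts)]
    parts <- parts[-length(parts)]
  }
  stem <- paste(parts, collapse = ".")
  pattern <- "^([0-9]{4}-[0-9]{2}-[0-9]{2})-([^-]+)-([^-]+)$"
  m <- regmatches(stem, regexec(pattern, stem))[[1]]
  if (length(m) == 0) stop("File name does not match the assumed pattern.")
  out <- list(
    round_id = m[2], team_abbr = m[3], model_abbr = m[4],
    model_id = paste(m[3], m[4], sep = "-"), ext = ext
  )
  if (!is.null(compression_ext)) out$compression_ext <- compression_ext
  out
}

plain <- parse_name_sketch("hub-baseline/2022-10-15-hub-baseline.csv")
compressed <- parse_name_sketch("hub-baseline/2022-10-15-hub-baseline.gzip.parquet")
```

Parent directories are dropped via basename(), and compression_ext is only present in the result when a compression prefix is found.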
parse_file_name("hub-baseline/2022-10-15-hub-baseline.csv")
parse_file_name("hub-baseline/2022-10-15-hub-baseline.gzip.parquet")
Print results of validate_...() function as a bullet list
## S3 method for class 'hub_validations'
print(x, ...)
x |
An object of class |
... |
Unused argument present for class consistency |
Print results of validate_pr() function as a bullet list
## S3 method for class 'pr_hub_validations'
print(x, ...)
x |
An object of class |
... |
Unused argument present for class consistency |
Read a model output file
read_model_out_file( file_path, hub_path = ".", coerce_types = c("hub", "chr", "none"), output_type_id_datatype = c("from_config", "auto", "character", "double", "integer", "logical", "Date") )
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
coerce_types |
character. What to coerce column types to on read.
|
output_type_id_datatype |
character string. One of |
a tibble of contents of the model output file.
Create a model output submission file template
submission_tmpl( hub_con, config_tasks, round_id, required_vals_only = FALSE, force_output_types = FALSE, complete_cases_only = TRUE, compound_taskid_set = NULL, output_types = NULL, derived_task_ids = NULL )
hub_con |
A |
config_tasks |
a list version of the contents of a hub's |
round_id |
Character string. Round identifier. If the round is set to
|
required_vals_only |
Logical. Whether to return only combinations of Task ID and related output type ID required values. |
force_output_types |
Logical. Whether to force all output types to be required.
If |
complete_cases_only |
Logical. If |
compound_taskid_set |
List of character vectors, one for each modeling task
in the round. Can be used to override the compound task ID set defined in the
config. If |
output_types |
Character vector of output type names to include. Use to subset the grid to specific output types. |
derived_task_ids |
Character vector of derived task ID names (task IDs whose
values depend on other task IDs) to ignore. Columns for such task ids will
contain |
For task IDs where all values are optional, by default, columns are created as columns of NAs when required_vals_only = TRUE.
When such columns exist, the function returns a tibble with zero rows, as no complete cases of required value combinations exist.
(Note that determination of complete cases excludes valid NA output_type_id values in "mean" and "median" output types.)
To return a template of incomplete required cases, which includes NA columns, use complete_cases_only = FALSE.
To include output types that are optional in the submission template when required_vals_only = TRUE and complete_cases_only = FALSE, use force_output_types = TRUE. Use this in combination with subsetting for output types you plan to submit via argument output_types to create a submission template customised to your submission plans.
Tip: to ensure you create a template with all required output types, it's a good idea to first run the function without subsetting or forcing output types and examine the unique values in output_type to check which output types are required.
When sample output types are included in the output, the output_type_id column contains example sample indexes which are useful for identifying the compound task ID structure of multivariate sampling distributions in particular, i.e. which combinations of task ID values represent individual samples.
When a round is set to round_id_from_variable: true, the value of the task ID from which round IDs are derived (i.e. the task ID specified in the round_id property of config_tasks) is set to the value of the round_id argument in the returned output.
A tibble template containing an expanded grid of valid task ID and output type ID value combinations for a given submission round and output type.
If required_vals_only = TRUE, values are limited to the combination of required values only.
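To make the interaction between required_vals_only and complete_cases_only more concrete, here is a minimal base R sketch. It does not use the package's internals: the `task_ids` list and its `required`/`optional` layout are simplified assumptions made for illustration, and `expand.grid()` stands in for whatever grid expansion the package actually performs.

```r
# Hypothetical, simplified task ID config: `horizon` has only optional values.
task_ids <- list(
  location = list(required = "US", optional = c("MA", "NY")),
  horizon = list(required = NULL, optional = c(1L, 2L))
)

# Full grid: every valid combination of required and optional values
full_grid <- expand.grid(
  lapply(task_ids, function(x) c(x$required, x$optional)),
  stringsAsFactors = FALSE
)

# Required values only: task IDs with no required values become NA columns
req_vals <- lapply(
  task_ids,
  function(x) if (is.null(x$required)) NA else x$required
)
req_grid <- expand.grid(req_vals, stringsAsFactors = FALSE)

# Keeping complete cases only drops every row that contains an NA column
complete_only <- req_grid[complete.cases(req_grid), , drop = FALSE]

nrow(full_grid)
nrow(req_grid)
nrow(complete_only)
```

In this toy setup the required-values grid has a single row with an NA `horizon`, so filtering to complete cases leaves zero rows, which mirrors the behaviour described for complete_cases_only = TRUE.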
hub_con <- hubData::connect_hub(
  system.file("testhubs/flusight", package = "hubUtils")
)
submission_tmpl(hub_con, round_id = "2023-01-02")
submission_tmpl(
  hub_con,
  round_id = "2023-01-02",
  required_vals_only = TRUE
)
submission_tmpl(
  hub_con,
  round_id = "2023-01-02",
  required_vals_only = TRUE,
  complete_cases_only = FALSE
)
# Specifying a round in a hub with multiple rounds
hub_con <- hubData::connect_hub(
  system.file("testhubs/simple", package = "hubUtils")
)
submission_tmpl(hub_con, round_id = "2022-10-01")
submission_tmpl(hub_con, round_id = "2022-10-29")
submission_tmpl(hub_con,
  round_id = "2022-10-29",
  required_vals_only = TRUE
)
submission_tmpl(hub_con,
  round_id = "2022-10-29",
  required_vals_only = TRUE,
  complete_cases_only = FALSE
)
# Hub with sample output type
config_tasks <- read_config_file(system.file("config", "tasks.json",
  package = "hubValidations"
))
submission_tmpl(
  config_tasks = config_tasks,
  round_id = "2022-12-26"
)
# Hub with sample output type and compound task ID structure
config_tasks <- read_config_file(system.file("config", "tasks-comp-tid.json",
  package = "hubValidations"
))
submission_tmpl(
  config_tasks = config_tasks,
  round_id = "2022-12-26"
)
# Override config compound task ID set
# Create coarser compound task ID set for the first modeling task which contains
# samples
submission_tmpl(
  config_tasks = config_tasks,
  round_id = "2022-12-26",
  compound_taskid_set = list(
    c("forecast_date", "target"),
    NULL
  )
)
# Subsetting for a single output type
submission_tmpl(
  config_tasks = config_tasks,
  round_id = "2022-12-26",
  output_types = "sample"
)
# Derive a template with ignored derived task ID. Useful to avoid creating
# a template with invalid derived task ID value combinations.
config_tasks <- read_config(
  system.file("testhubs", "flusight", package = "hubValidations")
)
submission_tmpl(
  config_tasks = config_tasks,
  round_id = "2022-12-12",
  output_types = "pmf",
  derived_task_ids = "target_end_date",
  complete_cases_only = FALSE
)
# Force optional output type, in this case "mean".
submission_tmpl(
  config_tasks = config_tasks,
  round_id = "2022-12-12",
  required_vals_only = TRUE,
  output_types = c("pmf", "quantile", "mean"),
  force_output_types = TRUE,
  derived_task_ids = "target_end_date",
  complete_cases_only = FALSE
)
Wrap check expression in try to capture check execution errors
try_check(expr, file_path)
expr |
check function expression to run. |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
If expr executes correctly, the output of expr is returned. If execution fails, an object of class <error/check_exec_error> is returned.
The execution error message is attached as attribute msg.
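The general pattern described here can be sketched in base R with `tryCatch()`. This is an illustrative approximation, not the package's code: the helper name `sketch_try_check()`, the wording of the condition message and the exact class vector are assumptions.

```r
# Hypothetical stand-in for try_check(), for illustration only.
sketch_try_check <- function(expr, file_path) {
  tryCatch(
    # `expr` is a lazily evaluated argument, so it is only run here,
    # inside the tryCatch() call
    expr,
    error = function(e) {
      structure(
        list(
          message = paste0("Check could not be run for ", file_path, "."),
          call = NULL,
          where = file_path
        ),
        # Assumed class vector mimicking <error/check_exec_error>
        class = c("check_exec_error", "error", "condition"),
        # The original execution error message is kept as attribute `msg`
        msg = conditionMessage(e)
      )
    }
  )
}

ok <- sketch_try_check(1 + 1, "team1-goodmodel/2022-10-08-team1-goodmodel.csv")
failed <- sketch_try_check(
  stop("boom"),
  "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
)
attr(failed, "msg")
```

Returning a condition object instead of re-throwing lets a sequence of checks continue to be collected and reported even when one check cannot be executed.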
Validate the contents of a submitted model data file
validate_model_data( hub_path, file_path, round_id_col = NULL, output_type_id_datatype = c("from_config", "auto", "character", "double", "integer", "logical", "Date"), validations_cfg_path = NULL, derived_task_ids = NULL )
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
round_id_col |
Character string. The name of the column containing
|
output_type_id_datatype |
character string. One of |
validations_cfg_path |
Path to |
derived_task_ids |
Character vector of derived task ID names (task IDs whose
values depend on other task IDs) to ignore. Columns for such task ids will
contain |
Note that it is necessary for derived_task_ids to be specified if any of the task IDs a derived task ID depends on have required values. If this is the case and derived task IDs are not specified, the dependent nature of derived task ID values will result in false validation errors when validating required values.
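A small base R illustration of why derived task IDs cause trouble: if a derived column is expanded independently of the task IDs it depends on, the grid contains combinations that can never occur. The column names and the "7 days per horizon step" rule below are assumptions chosen for the example, not taken from any particular hub config.

```r
# Hypothetical round: target_end_date is derived from origin_date + 7 * horizon
origin_date <- as.Date("2022-12-12")
horizon <- 1:2
target_end_date <- origin_date + 7 * horizon

# Naive expansion treats target_end_date as if it were independent of horizon
naive_grid <- expand.grid(
  horizon = horizon,
  target_end_date = target_end_date
)

# Only rows where the derived value agrees with its inputs are really valid
is_consistent <- naive_grid$target_end_date ==
  origin_date + 7 * naive_grid$horizon

nrow(naive_grid)
sum(is_consistent)
```

Half of the naively expanded rows are inconsistent here. If such rows were treated as required, a correct submission would be reported as missing values, which is the kind of false validation error the note above warns about.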
Details of checks performed by validate_model_data()
Name | Check | Early return | Fail output | Extra info |
---|---|---|---|---|
file_read | File can be read without errors | TRUE | check_error | |
valid_round_id_col | Round ID var from config exists in data column names. Skipped if `round_id_from_var` is FALSE in config. | FALSE | check_failure | |
unique_round_id | Round ID column contains a single unique round ID. Skipped if `round_id_from_var` is FALSE in config. | TRUE | check_error | |
match_round_id | Round ID from file contents matches round ID from file name. Skipped if `round_id_from_var` is FALSE in config. | TRUE | check_error | |
colnames | File column names match expected column names for round (i.e. task ID names + hub standard column names) | TRUE | check_error | |
col_types | File column types match expected column types from config. Mainly applicable to parquet & arrow files. | FALSE | check_failure | |
valid_vals | Columns (excluding the `value` and any derived task ID columns) contain valid combinations of task ID / output type / output type ID values | TRUE | check_error | error_tbl: table of invalid task ID/output type/output type ID value combinations |
derived_task_id_vals | Derived task ID columns contain valid values. | FALSE | check_failure | errors: named list of derived task ID values. Each element contains the invalid values for each derived task ID that failed the check. |
rows_unique | Columns (excluding the `value` and any derived task ID columns) contain unique combinations of task ID / output type / output type ID values | FALSE | check_failure | |
req_vals | Columns (excluding the `value` and any derived task ID columns) contain all required combinations of task ID / output type / output type ID values | FALSE | check_failure | missing_df: table of missing task ID/output type/output type ID value combinations |
value_col_valid | Values in `value` column are coercible to data type configured for each output type | FALSE | check_failure | |
value_col_non_desc | Values in `value` column are non-decreasing as output_type_ids increase for all unique task ID /output type value combinations. Applies to `quantile` or `cdf` output types only | FALSE | check_failure | error_tbl: table of rows affected |
value_col_sum1 | Values in the `value` column of `pmf` output type data for each unique task ID combination sum to 1. | FALSE | check_failure | error_tbl: table of rows affected |
spl_compound_taskid_set | Sample compound task id sets for each modeling task match or are coarser than the expected set defined in tasks.json config. | TRUE | check_error | errors: list containing item for each failing modeling task. Exact structure dependent on type of validation failure. See check function documentation for more details. |
spl_compound_tid | Samples contain single unique values for each compound task ID within individual samples (v3 and above schema only). | TRUE | check_error | errors: list containing item for each sample failing validation with breakdown of unique values for each compound task ID. |
spl_non_compound_tid | Samples contain single unique combination of non-compound task ID values across all samples (v3 and above schema only). | TRUE | check_error | errors: list containing item for each modeling task with vectors of output type ids of samples failing validation and example table of most frequent non-compound task ID value combination across all samples in the modeling task. |
spl_n | Number of samples for a given compound idx falls within accepted compound task range (v3 and above schema only). | FALSE | check_failure | errors: list containing item for each compound_idx failing validation with sample count, metadata on expected samples and example table of expected structure for samples belonging to the compound idx in question. |
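Two of the checks in the table, value_col_sum1 and value_col_non_desc, are simple enough to sketch in base R. The data frames below are made up for illustration and the code is not the package's implementation; it only shows the idea behind each check.

```r
# Toy pmf data: probabilities for "MA" deliberately sum to 0.9
pmf_data <- data.frame(
  location = rep(c("US", "MA"), each = 3),
  output_type = "pmf",
  output_type_id = rep(c("low", "moderate", "high"), times = 2),
  value = c(0.2, 0.3, 0.5, 0.2, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# value_col_sum1 idea: pmf values must sum to 1 per task ID combination
pmf_sums <- aggregate(value ~ location, data = pmf_data, FUN = sum)
sums_to_1 <- vapply(
  pmf_sums$value,
  function(v) isTRUE(all.equal(v, 1)), # tolerant floating point comparison
  logical(1)
)
pmf_error_tbl <- pmf_sums[!sums_to_1, ]

# Toy quantile data: the 0.5 quantile is lower than the 0.25 quantile
quantile_data <- data.frame(
  location = "US",
  output_type = "quantile",
  output_type_id = c(0.25, 0.5, 0.75),
  value = c(10, 8, 12)
)

# value_col_non_desc idea: values must not decrease as output_type_id increases
quantile_data <- quantile_data[
  order(quantile_data$location, quantile_data$output_type_id),
]
non_desc <- tapply(
  quantile_data$value,
  quantile_data$location,
  function(v) all(diff(v) >= 0)
)
```

Here `pmf_error_tbl` plays the role of the error_tbl extra info listed in the table, holding the task ID combinations that failed.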
An object of class hub_validations. Each named element contains a hub_check class object reflecting the result of a given check. Function will return early if a check returns an error.
For more details on the structure of <hub_validations> objects, including how to access more information on individual checks, see article on <hub_validations> S3 class objects.
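The early return behaviour can be pictured with a short base R sketch. The helpers `mock_check()` and `sketch_run_checks()` and the class vectors they use are hypothetical simplifications, not the real structure of <hub_validations> or hub_check objects.

```r
# Hypothetical mock of a check result object
mock_check <- function(class, msg) {
  structure(
    list(message = msg, call = NULL),
    class = c(class, "hub_check", "condition")
  )
}

# Run named checks in order and stop collecting after the first check_error
sketch_run_checks <- function(checks) {
  results <- list()
  for (name in names(checks)) {
    results[[name]] <- checks[[name]]()
    if (inherits(results[[name]], "check_error")) {
      break # early return: later checks are not run
    }
  }
  structure(results, class = c("hub_validations", "list"))
}

checks <- list(
  file_read = function() mock_check("check_success", "File could be read."),
  col_types = function() mock_check("check_failure", "Column types differ."),
  valid_vals = function() mock_check("check_error", "Invalid values found."),
  rows_unique = function() mock_check("check_success", "Never reached.")
)

res <- sketch_run_checks(checks)
names(res)
```

In this sketch a check_failure is recorded and execution continues, while a check_error ends the run, so `rows_unique` never appears in the result.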
hub_path <- system.file("testhubs/simple", package = "hubValidations")
file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
validate_model_data(hub_path, file_path)
Validate file level properties of a submitted model output file.
validate_model_file(hub_path, file_path, validations_cfg_path = NULL)
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
validations_cfg_path |
Path to |
Details of checks performed by validate_model_file()
Name | Check | Early return | Fail output | Extra info |
---|---|---|---|---|
file_exists | File exists at `file_path` provided | TRUE | check_error | |
file_name | File name valid | TRUE | check_error | |
file_location | File located in correct team directory | FALSE | check_failure | |
round_id_valid | File round ID is a valid hub round ID | TRUE | check_error | |
file_format | File format is accepted hub/round format | TRUE | check_error | |
file_n | Number of submission files per round per team does not exceed allowed number | FALSE | check_failure | |
metadata_exists | Model metadata file exists in expected location | FALSE | check_failure | |
An object of class hub_validations. Each named element contains a hub_check class object reflecting the result of a given check. Function will return early if a check returns an error.
For more details on the structure of <hub_validations> objects, including how to access more information on individual checks, see article on <hub_validations> S3 class objects.
hub_path <- system.file("testhubs/simple", package = "hubValidations")
validate_model_file(hub_path,
  file_path = "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
)
validate_model_file(hub_path,
  file_path = "team1-goodmodel/2022-10-15-team1-goodmodel.csv"
)
Validate properties of a metadata file.
validate_model_metadata( hub_path, file_path, round_id = "default", validations_cfg_path = NULL )
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
round_id |
character string. The round identifier. Used primarily to indicate whether the "default" or a round specific configuration should be used for custom validations. |
validations_cfg_path |
Path to |
Details of checks performed by validate_model_metadata()
Name | Check | Early return | Fail output | Extra info |
---|---|---|---|---|
metadata_schema_exists | A model metadata schema file exists in `hub-config` directory. | TRUE | check_error | |
metadata_file_exists | A file with name provided to argument `file_path` exists at the expected location (the `model-metadata` directory). | TRUE | check_error | |
metadata_file_ext | The metadata file has correct extension (yaml or yml). | TRUE | check_error | |
metadata_file_location | The metadata file has been saved to correct location. | TRUE | check_failure | |
metadata_matches_schema | The contents of the metadata file match the hub's model metadata schema | TRUE | check_error | |
metadata_file_name | The metadata filename matches the model ID specified in the contents of the file. | TRUE | check_error | |
An object of class hub_validations. Each named element contains a hub_check class object reflecting the result of a given check. Function will return early if a check returns an error.
hub_path <- system.file("testhubs/simple", package = "hubValidations")
validate_model_metadata(hub_path,
  file_path = "hub-baseline.yml"
)
validate_model_metadata(hub_path,
  file_path = "team1-goodmodel.yaml"
)
Validates model output and model metadata files in a Pull Request.
validate_pr( hub_path = ".", gh_repo, pr_number, round_id_col = NULL, output_type_id_datatype = c("from_config", "auto", "character", "double", "integer", "logical", "Date"), validations_cfg_path = NULL, skip_submit_window_check = FALSE, file_modification_check = c("error", "failure", "warn", "message", "none"), allow_submit_window_mods = TRUE, submit_window_ref_date_from = c("file", "file_path"), derived_task_ids = NULL )
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
gh_repo |
GitHub repository address in the format |
pr_number |
Number of the pull request to validate |
round_id_col |
Character string. The name of the column containing
|
output_type_id_datatype |
character string. One of |
validations_cfg_path |
Path to |
skip_submit_window_check |
Logical. Whether to skip the submission window check. |
file_modification_check |
Character string. Whether to perform check and what to return when modification/deletion of a previously submitted model output file or deletion of a previously submitted model metadata file is detected in PR:
|
allow_submit_window_mods |
Logical. Whether to allow modifications/deletions
of model output files within their submission windows. Defaults to |
submit_window_ref_date_from |
whether to get the reference date around
which relative submission windows will be determined from the file's
|
derived_task_ids |
Character vector of derived task ID names (task IDs whose
values depend on other task IDs) to ignore. Columns for such task ids will
contain |
Only model output and model metadata files are individually validated, using validate_submission() or validate_model_metadata() respectively, although hub config files are also validated as part of the checks.
Any other files included in the PR are ignored but flagged in a message.
By default, modifications (which include renaming) and deletions of previously submitted model output files and deletions or renaming of previously submitted model metadata files are not allowed and return a <error/check_error> condition class object for each applicable modified/deleted file. This behaviour can be modified through arguments file_modification_check, which controls whether modification/deletion checks are performed and what is returned if modifications/deletions are detected, and allow_submit_window_mods, which controls whether modifications/deletions of model output files are allowed within their submission windows.
Note that when performing modification/deletion checks with allow_submit_window_mods set to TRUE, the reference date used to establish relative submission windows is taken as the round_id extracted from the file path (i.e. submit_window_ref_date_from is always set to "file_path").
This is because we cannot extract dates from columns of deleted files. If hub submission window reference dates do not match round IDs in file paths, allow_submit_window_mods will currently not work correctly and is best set to FALSE. This only relates to hubs/rounds where submission windows are determined relative to a reference date and not when explicit submission window start and end dates are provided in the config.
Finally, note that it is necessary for derived_task_ids to be specified if any of the task IDs a derived task ID depends on have required values. If this is the case and derived task IDs are not specified, the dependent nature of derived task ID values will result in false validation errors when validating required values.
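As a rough illustration of how a relative submission window might be evaluated from a file path, consider the base R sketch below. The helper name `sketch_mods_allowed()` and the offsets of 6 days before and 1 day after the reference date are hypothetical; real windows come from the hub config, and this is not how validate_pr() is implemented.

```r
# Hypothetical helper: is `today` inside the submission window of the round
# whose ID (assumed to be an ISO date) leads the file name?
sketch_mods_allowed <- function(file_path, today,
                                start_offset = -6L, end_offset = 1L) {
  # The reference date is taken from the file path, never from file contents,
  # because a deleted file has no columns to read dates from
  round_id <- substr(basename(file_path), 1L, 10L)
  ref_date <- as.Date(round_id, format = "%Y-%m-%d")
  if (is.na(ref_date)) {
    stop("Round ID in file path is not a date: ", round_id)
  }
  window <- ref_date + c(start_offset, end_offset)
  today >= window[1] && today <= window[2]
}

file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
inside <- sketch_mods_allowed(file_path, today = as.Date("2022-10-05"))
outside <- sketch_mods_allowed(file_path, today = as.Date("2022-10-20"))
```

If a hub's reference dates differ from the round IDs in its file paths, a calculation like this would use the wrong date, which is why allow_submit_window_mods is best set to FALSE in that situation.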
Details of checks performed by validate_submission()
Name | Check | Early return | Fail output | Extra info |
---|---|---|---|---|
valid_config | Hub config valid | TRUE | check_error | |
submission_time | Current time within file submission window | FALSE | check_failure | |
file_exists | File exists at `file_path` provided | TRUE | check_error | |
file_name | File name valid | TRUE | check_error | |
file_location | File located in correct team directory | FALSE | check_failure | |
round_id_valid | File round ID is a valid hub round ID | TRUE | check_error | |
file_format | File format is accepted hub/round format | TRUE | check_error | |
file_n | Number of submission files per round per team does not exceed allowed number | FALSE | check_failure | |
metadata_exists | Model metadata file exists in expected location | FALSE | check_failure | |
file_read | File can be read without errors | TRUE | check_error | |
valid_round_id_col | Round ID var from config exists in data column names. Skipped if `round_id_from_var` is FALSE in config. | FALSE | check_failure | |
unique_round_id | Round ID column contains a single unique round ID. Skipped if `round_id_from_var` is FALSE in config. | TRUE | check_error | |
match_round_id | Round ID from file contents matches round ID from file name. Skipped if `round_id_from_var` is FALSE in config. | TRUE | check_error | |
colnames | File column names match expected column names for round (i.e. task ID names + hub standard column names) | TRUE | check_error | |
col_types | File column types match expected column types from config. Mainly applicable to parquet & arrow files. | FALSE | check_failure | |
valid_vals | Columns (excluding the `value` and any derived task ID columns) contain valid combinations of task ID / output type / output type ID values | TRUE | check_error | error_tbl: table of invalid task ID/output type/output type ID value combinations |
derived_task_id_vals | Derived task ID columns contain valid values. | FALSE | check_failure | errors: named list of derived task ID values. Each element contains the invalid values for each derived task ID that failed the check. |
rows_unique | Columns (excluding the `value` and any derived task ID columns) contain unique combinations of task ID / output type / output type ID values | FALSE | check_failure | |
req_vals | Columns (excluding the `value` and any derived task ID columns) contain all required combinations of task ID / output type / output type ID values | FALSE | check_failure | missing_df: table of missing task ID/output type/output type ID value combinations |
value_col_valid | Values in `value` column are coercible to data type configured for each output type | FALSE | check_failure | |
value_col_non_desc | Values in `value` column are non-decreasing as output_type_ids increase for all unique task ID /output type value combinations. Applies to `quantile` or `cdf` output types only | FALSE | check_failure | error_tbl: table of rows affected |
value_col_sum1 | Values in the `value` column of `pmf` output type data for each unique task ID combination sum to 1. | FALSE | check_failure | error_tbl: table of rows affected |
spl_compound_taskid_set | Sample compound task id sets for each modeling task match or are coarser than the expected set defined in tasks.json config. | TRUE | check_error | errors: list containing item for each failing modeling task. Exact structure dependent on type of validation failure. See check function documentation for more details. |
spl_compound_tid | Samples contain single unique values for each compound task ID within individual samples (v3 and above schema only). | TRUE | check_error | errors: list containing item for each sample failing validation with breakdown of unique values for each compound task ID. |
spl_non_compound_tid | Samples contain single unique combination of non-compound task ID values across all samples (v3 and above schema only). | TRUE | check_error | errors: list containing item for each modeling task with vectors of output type ids of samples failing validation and example table of most frequent non-compound task ID value combination across all samples in the modeling task. |
spl_n | Number of samples for a given compound idx falls within accepted compound task range (v3 and above schema only). | FALSE | check_failure | errors: list containing item for each compound_idx failing validation with sample count, metadata on expected samples and example table of expected structure for samples belonging to the compound idx in question. |
Details of checks performed by validate_model_metadata()
Name | Check | Early return | Fail output | Extra info | optional |
---|---|---|---|---|---|
metadata_schema_exists | A model metadata schema file exists in `hub-config` directory. | TRUE | check_error | | FALSE |
metadata_file_exists | A file with name provided to argument `file_path` exists at the expected location (the `model-metadata` directory). | TRUE | check_error | | FALSE |
metadata_file_ext | The metadata file has correct extension (yaml or yml). | TRUE | check_error | | FALSE |
metadata_file_location | The metadata file has been saved to correct location. | TRUE | check_failure | | FALSE |
metadata_matches_schema | The contents of the metadata file match the hub's model metadata schema | TRUE | check_error | | FALSE |
metadata_file_name | The metadata filename matches the model ID specified in the contents of the file. | TRUE | check_error | | FALSE |
NA | The number of metadata files submitted by a single team does not exceed the maximum number allowed. | FALSE | check_failure | | TRUE |
An object of class hub_validations.
## Not run:
validate_pr(
  hub_path = ".",
  gh_repo = "hubverse-org/ci-testhub-simple",
  pr_number = 3
)
## End(Not run)
Checks both file level properties (file name, extension, location, etc.) and model output data, i.e. the contents of the file.
validate_submission( hub_path, file_path, round_id_col = NULL, validations_cfg_path = NULL, output_type_id_datatype = c("from_config", "auto", "character", "double", "integer", "logical", "Date"), skip_submit_window_check = FALSE, skip_check_config = FALSE, submit_window_ref_date_from = c("file", "file_path"), derived_task_ids = NULL )
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
round_id_col |
Character string. The name of the column containing
|
validations_cfg_path |
Path to |
output_type_id_datatype |
character string. One of |
skip_submit_window_check |
Logical. Whether to skip the submission window check. |
skip_check_config |
Logical. Whether to skip the hub config validation check. |
submit_window_ref_date_from |
whether to get the reference date around
which relative submission windows will be determined from the file's
|
derived_task_ids |
Character vector of derived task ID names (task IDs whose
values depend on other task IDs) to ignore. Columns for such task ids will
contain |
Note that it is necessary for derived_task_ids to be specified if any of the task IDs a derived task ID depends on have required values. If this is the case and derived task IDs are not specified, the dependent nature of derived task ID values will result in false validation errors when validating required values.
Details of checks performed by validate_submission()
Name | Check | Early return | Fail output | Extra info |
---|---|---|---|---|
valid_config | Hub config valid | TRUE | check_error | |
submission_time | Current time within file submission window | FALSE | check_failure | |
file_exists | File exists at `file_path` provided | TRUE | check_error | |
file_name | File name valid | TRUE | check_error | |
file_location | File located in correct team directory | FALSE | check_failure | |
round_id_valid | File round ID is a valid hub round ID | TRUE | check_error | |
file_format | File format is accepted hub/round format | TRUE | check_error | |
file_n | Number of submission files per round per team does not exceed allowed number | FALSE | check_failure | |
metadata_exists | Model metadata file exists in expected location | FALSE | check_failure | |
file_read | File can be read without errors | TRUE | check_error | |
valid_round_id_col | Round ID var from config exists in data column names. Skipped if `round_id_from_var` is FALSE in config. | FALSE | check_failure | |
unique_round_id | Round ID column contains a single unique round ID. Skipped if `round_id_from_var` is FALSE in config. | TRUE | check_error | |
match_round_id | Round ID from file contents matches round ID from file name. Skipped if `round_id_from_var` is FALSE in config. | TRUE | check_error | |
colnames | File column names match expected column names for round (i.e. task ID names + hub standard column names) | TRUE | check_error | |
col_types | File column types match expected column types from config. Mainly applicable to parquet & arrow files. | FALSE | check_failure | |
valid_vals | Columns (excluding the `value` and any derived task ID columns) contain valid combinations of task ID / output type / output type ID values | TRUE | check_error | error_tbl: table of invalid task ID/output type/output type ID value combinations |
derived_task_id_vals | Derived task ID columns contain valid values. | FALSE | check_failure | errors: named list of derived task ID values. Each element contains the invalid values for each derived task ID that failed the check. |
rows_unique | Columns (excluding the `value` and any derived task ID columns) contain unique combinations of task ID / output type / output type ID values | FALSE | check_failure | |
req_vals | Columns (excluding the `value` and any derived task ID columns) contain all required combinations of task ID / output type / output type ID values | FALSE | check_failure | missing_df: table of missing task ID/output type/output type ID value combinations |
value_col_valid | Values in `value` column are coercible to data type configured for each output type | FALSE | check_failure | |
value_col_non_desc | Values in `value` column are non-decreasing as output_type_ids increase for all unique task ID / output type value combinations. Applies to `quantile` or `cdf` output types only. | FALSE | check_failure | error_tbl: table of rows affected |
value_col_sum1 | Values in the `value` column of `pmf` output type data for each unique task ID combination sum to 1. | FALSE | check_failure | error_tbl: table of rows affected |
spl_compound_taskid_set | Sample compound task id sets for each modeling task match or are coarser than the expected set defined in tasks.json config. | TRUE | check_error | errors: list containing item for each failing modeling task. Exact structure dependent on type of validation failure. See check function documentation for more details. |
spl_compound_tid | Samples contain single unique values for each compound task ID within individual samples (v3 and above schema only). | TRUE | check_error | errors: list containing item for each sample failing validation with breakdown of unique values for each compound task ID. |
spl_non_compound_tid | Samples contain single unique combination of non-compound task ID values across all samples (v3 and above schema only). | TRUE | check_error | errors: list containing item for each modeling task with vectors of output type ids of samples failing validation and example table of most frequent non-compound task ID value combination across all samples in the modeling task. |
spl_n | Number of samples for a given compound idx falls within accepted compound task range (v3 and above schema only). | FALSE | check_failure | errors: list containing item for each compound_idx failing validation with sample count, metadata on expected samples and example table of expected structure for samples belonging to the compound idx in question. |
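
As a rough illustration of what the `value_col_sum1` and `value_col_non_desc` rows describe, the base R sketch below applies the same two rules to toy data. It is not how hubValidations implements these checks; the data frames, the single `location` task ID and the use of `all.equal()` for tolerance are assumptions made for the example.

```r
# Toy pmf data: values per task ID combination (here just `location`).
pmf <- data.frame(
  location = rep(c("US", "01"), each = 3),
  output_type_id = rep(c("low", "moderate", "high"), times = 2),
  value = c(0.2, 0.5, 0.3, 0.5, 0.4, 0.2)
)
# Sum per task ID combination, compared to 1 with numeric tolerance.
pmf_sums <- tapply(pmf$value, pmf$location, sum)
sums_to_one <- vapply(
  pmf_sums,
  function(s) isTRUE(all.equal(s, 1)),
  logical(1)
)

# Toy quantile data: values must not decrease as output_type_id increases.
quantiles <- data.frame(
  location = rep(c("US", "01"), each = 3),
  output_type_id = rep(c(0.25, 0.5, 0.75), times = 2),
  value = c(10, 20, 30, 12, 9, 15)
)
# Sort by task ID combination and quantile level before differencing.
ordered <- quantiles[order(quantiles$location, quantiles$output_type_id), ]
non_desc <- tapply(
  ordered$value,
  ordered$location,
  function(v) all(diff(v) >= 0)
)
```

Here the `"US"` rows satisfy both rules, whereas the `"01"` rows sum to 1.1 and dip from 12 to 9, so both would be flagged.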
An object of class hub_validations. Each named element contains a hub_check
class object reflecting the result of a given check. The function will return
early if a check returns an error.
For more details on the structure of <hub_validations> objects, including how
to access more information on individual checks, see the article on
<hub_validations> S3 class objects.
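The early-return behaviour can be pictured with a small base R sketch. Everything in it is hypothetical: `run_checks()` and `new_result()` are stand-ins written for this example, not functions from hubValidations, and the results only mimic the class names listed in the table of checks.

```r
# Hypothetical constructor for a stand-in check result.
new_result <- function(class) {
  structure(list(message = class), class = c(class, "hub_check"))
}

# Hypothetical runner: execute checks in order and stop at the first
# result that inherits from "check_error".
run_checks <- function(checks) {
  results <- list()
  for (name in names(checks)) {
    results[[name]] <- checks[[name]]()
    if (inherits(results[[name]], "check_error")) break
  }
  results
}

checks <- list(
  submission_time = function() new_result("check_failure"),
  file_exists = function() new_result("check_error"),
  file_name = function() new_result("check_success")
)

results <- run_checks(checks)
names(results)
```

In this sketch the `check_failure` result is recorded and execution continues, while the `check_error` result ends the run, so `file_name` never appears in the output. Individual results are then available by name, for example `results$file_exists`.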
hub_path <- system.file("testhubs/simple", package = "hubValidations")
file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
validate_submission(hub_path, file_path)
Validate the submission time of a submitted model data file.
validate_submission_time(hub_path, file_path, ref_date_from = c("file_path", "file"))
hub_path |
Either a character string path to a local Modeling Hub directory
or an object of class |
file_path |
character string. Path to the file being validated relative to the hub's model-output directory. |
ref_date_from |
whether to get the reference date around
which relative submission windows will be determined from the file's
|
An object of class hub_validations. Each named element contains a hub_check
class object reflecting the result of a given check. The function will return
early if a check returns an error.
For more details on the structure of <hub_validations> objects, including how
to access more information on individual checks, see the article on
<hub_validations> S3 class objects.
hub_path <- system.file("testhubs/simple", package = "hubValidations")
file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"
validate_submission_time(hub_path, file_path)
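To make the idea of a relative submission window concrete, here is a minimal base R sketch. It is not the package's logic: `in_submission_window()` is a hypothetical helper, the `start` and `end` offsets are made-up values, and it assumes the window is expressed in whole days relative to a reference date taken from the leading ISO date in the file name.

```r
# Hypothetical helper: is `today` within [ref_date + start, ref_date + end]?
# Offsets are assumed to be whole days, with both ends inclusive.
in_submission_window <- function(ref_date, start, end, today = Sys.Date()) {
  window <- as.Date(ref_date) + c(start, end)
  today >= window[1] && today <= window[2]
}

file_path <- "team1-goodmodel/2022-10-08-team1-goodmodel.csv"

# Assumption: the file name starts with the reference date in YYYY-MM-DD form.
ref_date <- as.Date(substr(basename(file_path), 1, 10))

# Made-up window: opens 6 days before and closes 1 day after the reference date.
inside <- in_submission_window(
  ref_date, start = -6L, end = 1L, today = as.Date("2022-10-05")
)
outside <- in_submission_window(
  ref_date, start = -6L, end = 1L, today = as.Date("2022-10-10")
)
```

Passing `today` explicitly keeps the sketch reproducible; a real check would compare against the current time and the window configured for the hub's round.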