Transparency Questions

The following questions are intended to guide the TRACE specification.

Questions about inputs

Are all inputs included in a TRO?

Access to input data is important for research transparency and reproducibility. However, there are many reasons why input files may not be included in the generated TRO. Inputs may be excluded because of privacy concerns, terms of use, or size. Inputs may also be missing because they are retrieved dynamically from external systems (e.g., through API calls, database queries, or HTTP downloads).

To trust the results in a TRO that does not include all of its inputs, a consumer may need to know the policies of the system that generated it. For example:

If inputs are excluded, are they retained by the system, identifiable, and accessible by an authorized third party?
How can one verify that an excluded input is the one actually used by the workflow that produced the TRO?

If the TRO uses inputs that were dynamically retrieved or queried from external sources:

Does the system detect and record this?
Are the inputs retained, identifiable, and accessible by an authorized third party?
Is information about dynamically retrieved inputs included in the TRO?

Answering these questions requires that the TRO also contain:

A complete list of inputs, including those that are excluded or dynamically loaded.
Metadata about each input to support future identification and use (such as path, filename, hash, or persistent identifier).
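As an illustration only, a per-input manifest entry carrying the metadata listed above might look like the following sketch. The class and field names (`InputRecord`, `included`, `source`, `pid`) and the helper functions are hypothetical, not part of any TRACE or TRO schema. Recording a SHA-256 digest for every input, whether or not the file itself ships with the TRO, is one way a third party could later check that an excluded file they are given has the same contents as the one that was recorded.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class InputRecord:
    """Hypothetical manifest entry for one workflow input (not a TRO schema)."""
    path: str                  # location relative to the workflow root
    sha256: str                # hex digest of the file contents as recorded
    size: int                  # size in bytes
    included: bool             # False when the file itself is left out of the TRO
    source: str = "local"      # e.g. "local", or "http" for dynamically retrieved data
    pid: Optional[str] = None  # persistent identifier, if the input has one


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large inputs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_input(path: Path, root: Path, included: bool,
                 source: str = "local", pid: Optional[str] = None) -> InputRecord:
    """Build a manifest entry for an input, whether or not it is included."""
    return InputRecord(
        path=path.relative_to(root).as_posix(),
        sha256=sha256_of(path),
        size=path.stat().st_size,
        included=included,
        source=source,
        pid=pid,
    )


def matches(record: InputRecord, candidate: Path) -> bool:
    """True if a file supplied later has the same contents as the recorded input."""
    return sha256_of(candidate) == record.sha256
```

A manifest would hold one such record per input, with `included=False` for restricted files; `dataclasses.asdict(record)` turns a record into a dictionary that can be serialized to JSON alongside the TRO. A matching digest shows only that the contents are identical; it does not establish that the file was the one actually read during execution, so the generating system's own policies still matter.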

Have the inputs been modified?

An input may also be modified or removed by a workflow during execution in a TRACE system. Disclosure avoidance processes are one example.

Can inputs be modified or removed by a workflow during execution?
If so, does the system detect and record this?
Is information about modified or removed inputs included in a TRO?
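One way a system could detect such changes, assuming the workflow runs inside a single directory tree, is to hash every file before and after execution and compare the two snapshots. The sketch below is hypothetical (`snapshot` and `diff` are illustrative names), and reading each file whole is a simplification that suits small trees.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(root: Path) -> Dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its SHA-256 digest."""
    digests: Dict[str, str] = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            rel = p.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(p.read_bytes()).hexdigest()
    return digests


def diff(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify paths as added, removed, or modified between two snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(
            k for k in before.keys() & after.keys() if before[k] != after[k]
        ),
    }
```

In use, a system would call `snapshot(workdir)` before the run, execute the workflow, take a second snapshot, and store `diff(before, after)` in the TRO. Because this compares end states only, a file that is altered and then restored during the run would not appear; catching transient changes would require tracing file access while the workflow runs. The `added` list also records which files the run created, which bears on the questions about outputs.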

Questions about code

Research transparency and reproducibility require access to the exact version of all code or scripts used to execute the steps of a computational workflow. TRACE systems, particularly manual ones, may modify code and scripts so that they run in their environments. Questions about code may include:

Is the exact code used to execute steps of the computational workflow included in the TRO?
If not, what is included?
Can the code be modified or deleted during TRACE workflow execution?
If so, does the system detect and record this?
Is information about modified or deleted code included in a TRO?

Questions about outputs

Research transparency and reproducibility require access to the outputs associated with reported results.

Are all outputs included in a TRO?

If outputs can be excluded, are they retained, identifiable, and accessible by an authorized third party?
How can one determine that excluded outputs were generated by a particular workflow execution?

Have the outputs been modified or deleted?

Can outputs be modified or deleted during TRACE workflow execution?
If so, does the system detect and record this?
Is information about modified or deleted outputs included in a TRO?

Questions about the computational environment

Does the system provide a complete description of the software environment used to execute a workflow? If so, in what format? Is the environment retained (persisted or shared), identifiable, and accessible by an authorized third party?
Can the environment be modified (e.g., packages installed) during workflow execution? If so, does the system detect and record this?
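For the Python layer of an environment, a minimal description can be assembled from the standard library alone, as in the sketch below. The function name and output keys are hypothetical, and the sketch deliberately covers only the interpreter and installed Python distributions; operating-system packages, system libraries, and container images would need to be recorded by other means.

```python
import json
import platform
from importlib import metadata
from typing import Dict


def python_environment() -> Dict[str, object]:
    """Describe the running interpreter and its installed Python distributions.

    Covers only the Python layer; OS packages, system libraries, and container
    images would need to be recorded separately.
    """
    packages: Dict[str, str] = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip damaged installs whose metadata lacks a name
            packages[name.lower()] = dist.version
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),  # sorted for stable output
    }


if __name__ == "__main__":
    print(json.dumps(python_environment(), indent=2))
```

Taking one such description before execution and another afterward, then comparing the `packages` dictionaries, would reveal Python packages installed, removed, or changed during the run. Like any before/after comparison, it reports end states rather than every change made along the way.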

Questions about TRACE workflow execution

The consumer may also have questions about the conditions under which the computational workflow was executed within a TRACE system. For example:

How are author-provided artifacts submitted to the system?
Does the system prevent network access during execution?
Does the system prevent interaction with the author during runtime?
Does the system track intermediate steps? If so, how are steps tracked and at what level?
Does the system provide information about resources used?
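On Unix-like hosts, one simple way to answer the last question is to run each workflow step as a child process and read the kernel's resource accounting afterward. The helper below, `run_and_measure`, is hypothetical; it uses only the standard `subprocess` and `time` modules plus the Unix-only `resource` module, and it captures wall-clock time, CPU time, and peak memory, not I/O, GPU, or network usage.

```python
import resource
import subprocess
import time
from typing import Dict, List


def run_and_measure(cmd: List[str]) -> Dict[str, float]:
    """Run one workflow step as a child process and report resources it used.

    Unix-only: relies on the standard-library resource module.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)  # waits for the child to exit
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "returncode": completed.returncode,
        "wall_seconds": wall,
        # CPU times accumulate across all waited-for children, so take deltas
        "user_cpu_seconds": after.ru_utime - before.ru_utime,
        "system_cpu_seconds": after.ru_stime - before.ru_stime,
        # Peak RSS of the largest child reaped so far (kilobytes on Linux,
        # bytes on macOS); not a per-step delta, so only an upper bound here
        "max_rss": after.ru_maxrss,
    }
```

For example, `run_and_measure(["python", "analysis.py"])` would run a hypothetical analysis script and return its timing and memory figures for inclusion in the TRO.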