This document illustrates the “full package”. It provides implementable examples of how to define a TRACE server in the formal specification of a TRS, then how to run two different types of TRACE-compliant workflows (automated and manual), and two separate ways to publish the final results (all-in-one server, and separately with a trusted repository).
Pre-requisites¶
This guide will help you set up a TRACE server and infrastructure. This involves:
Having a way to sign the TROs (Transparent Research Objects) that are generated by the TRACE server. TRACE allows for GPG and X.509. See Signing for more information.
Having a way to display, via a web server, the TRS (Transparent Research System) capabilities and the TROs that are generated by the TRACE server.
Reference tools are implemented using Python, but could be implemented in other languages.
Throughout, we will provide a running example for a functional sample server.
Initial setup¶
TRACE requires a digital signature mechanism. This is used to
sign the system descriptions (TRS)
sign the TROs (Transparent Research Objects)
The signing keys should be permanently associated with a system, and can be used for multiple TRS. The private keys, as well as any passphrases used, should be kept secure.
Example: Generating a GPG key pair
To generate a GPG key pair, run the following command:
gpg --full-generate-key
gpg (GnuPG) 2.4.4; Copyright (C) 2024 g10 Code GmbH
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Please select what kind of key you want:
(1) RSA and RSA
(2) DSA and Elgamal
(3) DSA (sign only)
(4) RSA (sign only)
(9) ECC (sign and encrypt) *default*
(10) ECC (sign only)
(14) Existing key from card
Your selection? 9
Please select which elliptic curve you want:
(1) Curve 25519 *default*
(4) NIST P-384
(6) Brainpool P-256
Your selection? 1
Please specify how long the key should be valid.
0 = key does not expire
<n> = key expires in n days
<n>w = key expires in n weeks
<n>m = key expires in n months
<n>y = key expires in n years
Key is valid for? (0) 2y
Key expires at Fri Jan 15 15:05:04 2027 UTC
Is this correct? (y/N) y
GnuPG needs to construct a user ID to identify your key.
Real name: Example Trace System
Email address: valid.email@my.organization.com
Comment:
You selected this USER-ID:
"Example Trace System <valid.email@my.organization.com>"
Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
At this stage you will be asked to enter a passphrase. This passphrase will be used to unlock your private key when you need to use it. Make sure you remember this passphrase, as you will need it later on. For this example we are going to use s3cr3tkey.
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
gpg: /home/ubuntu/.gnupg/trustdb.gpg: trustdb created
gpg: directory '/home/ubuntu/.gnupg/openpgp-revocs.d' created
gpg: revocation certificate stored as '/home/ubuntu/.gnupg/openpgp-revocs.d/9A5209A464B9E0CFCCD108C96FB96730E419869C.rev'
public and secret key created and signed.
pub ed25519 2025-01-15 [SC] [expires: 2027-01-15]
9A5209A464B9E0CFCCD108C96FB96730E419869C <-------- THIS IS YOUR FINGERPRINT
uid Example Trace System <valid.email@my.organization.com>
sub cv25519 2025-01-15 [E] [expires: 2027-01-15]
Note that the key we just generated has the fingerprint 9A5209A464B9E0CFCCD108C96FB96730E419869C. You will need this fingerprint for signing TROs. To ensure that the proper fingerprint and passphrase are used during the signing process, add the following to your ~/.bashrc file:
export GPG_FINGERPRINT="9A5209A464B9E0CFCCD108C96FB96730E419869C"
export GPG_PASSPHRASE="s3cr3tkey"
Setting up the TRACE Server Environment¶
The TRACE “server” environment is where the workflow is executed. This will depend greatly on the existing infrastructure, and is meant to be general. Your environment will need the tools to collect and sign the information for TROs, but can otherwise be quite variable. The section about TRS Description will capture this environment in a formal manner.
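As a minimal sketch of what "having the tools to sign" could mean in practice, the check below looks for the gpg executable and for the two environment variables set earlier in this guide. The function name preflight and the idea of returning a list of problems are illustrative assumptions, not part of any TRACE tooling.

```python
import os
import re
import shutil

# A full OpenPGP v4 fingerprint, as printed by gpg above, is 40 hex characters.
FINGERPRINT_RE = re.compile(r"[0-9A-Fa-f]{40}")


def preflight(env=None, which=shutil.which):
    """Return a list of problems that would prevent signing; empty means OK.

    `env` and `which` are injectable so the check can be exercised without
    touching the real environment (hypothetical helper, not a TRACE API).
    """
    env = os.environ if env is None else env
    problems = []
    if which("gpg") is None:
        problems.append("gpg executable not found on PATH")
    if not FINGERPRINT_RE.fullmatch(env.get("GPG_FINGERPRINT", "")):
        problems.append("GPG_FINGERPRINT is not a 40-character hex fingerprint")
    if not env.get("GPG_PASSPHRASE"):
        problems.append("GPG_PASSPHRASE is not set")
    return problems
```

Calling preflight() with no arguments inspects the current process environment; a non-empty result lists what still needs to be configured before any TRO can be signed.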
Installing TRO-UTILS¶
We have prepared a reference implementation in Python to simplify the generation of TROs. We suggest using it in your TRACE server if possible. The utilities are available at tro-utils.
Define the TRACE Server capabilities¶
By default, TROs contain basic information about the TRS that was used to generate them. We therefore need to specify the TRACE System Certificate, which describes how the system supports transparency, together with a signing key associated with the certificate. The current implementation relies on a JSON-LD representation of the capabilities of the system. By convention, this is stored in a file named trs.jsonld or similar. Multiple such specifications can be in use at the same time. Those in use should be separately published (see web server) and preserved.
The TRACE System Certificate is expressed in a structured language that describes assertions about supported transparency levels and features (see transparency questions).
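Because the TRS description is both embedded in TROs and published separately, it can help to confirm that the file parses and to record a digest that lets the published copy be compared with the one in use. The sketch below does only that; it makes no assumption about the vocabulary inside trs.jsonld, and the function name load_trs_profile is hypothetical.

```python
import hashlib
import json
from pathlib import Path


def load_trs_profile(path):
    """Parse a TRS description and return (document, sha256 of the file bytes).

    Only checks that the file is a JSON object; it does not validate the
    JSON-LD terms, which are defined by the TRACE specification.
    """
    raw = Path(path).read_bytes()
    document = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(document, dict):
        raise ValueError("expected a JSON object at the top level")
    return document, hashlib.sha256(raw).hexdigest()
```

Storing the returned digest next to the published file gives a simple way to tell which of several TRS descriptions a given system was using.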
Preparing pipeline¶
Preparing for a TRACE-compliant workflow recording¶
The TRO toolkit should be able to access the user-provided code. Note that by most definitions of trusted workflows, this part of the recording happens in a hands-off manner, without user interaction. The first step necessarily instantiates a project-specific TRO with the unmodified user code, before it is run.
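To illustrate what capturing the unmodified user code amounts to, the sketch below fingerprints every file in a project directory before anything is run. This is not the tro-utils implementation; the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map each file path (relative to root, POSIX style) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

Taking such a snapshot before execution, without user interaction, yields a record against which later states of the project can be compared.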
Executing the defined workflow¶
At this point, the initial TRO has been created. The various tasks that a typical workflow requires are now executed. At the core, this means executing the user-provided research code, but it might also entail discrete additional (manual) steps. The described workflow should be explicit about these steps, and care should be taken to ensure that TRO snapshots are made at each step of more complex workflows.
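One way to make each step explicit is to compare the snapshot taken before a step with the one taken after it. The sketch below assumes each snapshot is a plain mapping from relative file path to content digest; that representation and the function name compare_snapshots are illustrative assumptions, not a TRACE data structure.

```python
def compare_snapshots(before, after):
    """Report which paths were added, removed, or modified between two snapshots.

    Both arguments map a relative file path to a content digest.
    """
    before_paths, after_paths = set(before), set(after)
    return {
        "added": sorted(after_paths - before_paths),
        "removed": sorted(before_paths - after_paths),
        "modified": sorted(
            path for path in before_paths & after_paths
            if before[path] != after[path]
        ),
    }
```

For a workflow with manual steps, recording this comparison after every step documents exactly which files each step touched.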
Storing the Composition¶
Per the TRACE Conceptual Model, the TRO composition comprises all of the digital artifacts described in the TRO declaration. By design, elements of the composition may be stored in different locations (or possibly unpersisted) due to various restrictions. We can consider the following examples:
TRO with two arrangements: As in the current example, the initial arrangement (pre-execution) and the final arrangement (post-execution).
TRO with confidential elements: The initial arrangement contains confidential information that cannot be redistributed but is retained on a secure system. After execution, the second arrangement still contains confidential information and is similarly retained. After the removal of confidential information (e.g., via disclosure avoidance activities), the final arrangement is captured and disseminated.
There are a variety of ways to capture the compositions for both TROs. As in the current example, we can create a ZIP or BagIt archive of the project directory at each stage, reflecting the different arrangements (possibly with confidential elements removed). For storage efficiency, the composition could also be managed using a version control system (e.g., Git) where each arrangement is a tag.
For this example, we have been capturing the compositions using ZIP archives, each of which can be published.
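Capturing an arrangement as a ZIP archive while leaving out confidential elements could be sketched as follows. The function name, the exclude patterns, and the directory layout in the usage note are hypothetical; a real deployment would decide what is confidential through its own disclosure review.

```python
import zipfile
from fnmatch import fnmatch
from pathlib import Path


def archive_arrangement(project_dir, archive_path, exclude=()):
    """Write the files under project_dir to a ZIP archive and return their names.

    Paths matching any glob in `exclude` (relative, POSIX style) are skipped,
    e.g. to keep confidential inputs out of a publishable archive.
    """
    project_dir = Path(project_dir)
    archive_path = Path(archive_path)
    stored = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(project_dir.rglob("*")):
            # Skip directories and the archive itself if it lives in the project.
            if not path.is_file() or path.resolve() == archive_path.resolve():
                continue
            relative = path.relative_to(project_dir).as_posix()
            if any(fnmatch(relative, pattern) for pattern in exclude):
                continue
            archive.write(path, arcname=relative)
            stored.append(relative)
    return stored
```

For instance, archive_arrangement("project", "final.zip", exclude=["confidential/*"]) would archive everything except the contents of a confidential directory, where all three names are placeholders.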
Finalizing TRO¶
Add details about the workflow execution¶
Once the workflow is completed, the host institution might want to augment the workflow information in the TRO with additional information. These details might be obtained from system logs or task tracking systems. These assertions should be added to the TRO before it is signed.
Timestamp and sign the TRO¶
To wrap up, the TRO is signed. This ensures that any later modification of the TRO can be detected. The signature is created using the private key and stored in a separate file. A time-stamping authority (TSA) is also used, so that the time of signing can be independently attested.
You should now have a TRO along with its signature and a time-stamp file (TSR).
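The signing step can be sketched by calling the gpg command line directly with the fingerprint and passphrase exported earlier. This is an illustration only: it is not necessarily how tro-utils performs signing, the function names and the .sig file name are assumptions, and it does not cover the time-stamp request to the TSA.

```python
import os
import subprocess
from pathlib import Path


def gpg_sign_command(tro_path, fingerprint):
    """Build the gpg argument list for an ASCII-armored detached signature."""
    tro_path = Path(tro_path)
    return [
        "gpg", "--batch", "--yes",
        "--pinentry-mode", "loopback",   # allow the passphrase to come from stdin
        "--passphrase-fd", "0",
        "--local-user", fingerprint,     # select the signing key
        "--armor", "--detach-sign",
        "--output", str(tro_path.with_suffix(".sig")),  # assumed naming
        str(tro_path),
    ]


def sign_tro(tro_path):
    """Sign a TRO file using GPG_FINGERPRINT and GPG_PASSPHRASE from the environment."""
    command = gpg_sign_command(tro_path, os.environ["GPG_FINGERPRINT"])
    passphrase = os.environ["GPG_PASSPHRASE"].encode() + b"\n"
    subprocess.run(command, input=passphrase, check=True)
    return Path(tro_path).with_suffix(".sig")
```

A recipient who has imported the public key can then check the result with gpg --verify, passing the signature file followed by the TRO file.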
Publishing the TRO and TRS¶
Now we can proceed to publish the TRO. The organization must provide a landing page where TROs can be indexed, possibly accessed, and TRS capabilities can be viewed.
TROs themselves can reside on the organization’s web server, or a trusted repository (e.g., Zenodo, Dataverse instances, etc.).
TRS capabilities should be published on the organization’s web server. This is not integrated into this example, but see TROV demos, in particular
Optionally, when not self-hosting TROs, the organization can provide a landing page that links to the TROs hosted on other repositories.
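A landing page needs, at minimum, a list of the TROs and where to find them. The sketch below builds such a list from a local directory; the file naming (a .jsonld TRO with .sig and .tsr files alongside it), the base URL, and the function name build_index are assumptions for illustration, and the result would still need to be rendered as HTML or served as JSON.

```python
from pathlib import Path


def build_index(tro_dir, base_url):
    """List the TROs in a directory with links and signature/time-stamp status."""
    base_url = base_url.rstrip("/")
    entries = []
    for tro in sorted(Path(tro_dir).glob("*.jsonld")):
        entries.append({
            "name": tro.stem,
            "url": f"{base_url}/{tro.name}",
            # Assumed convention: same base name with .sig and .tsr extensions.
            "has_signature": tro.with_suffix(".sig").exists(),
            "has_timestamp": tro.with_suffix(".tsr").exists(),
        })
    return entries
```

When TROs are hosted in an external repository instead, base_url would point there, and the status fields could be dropped in favor of links to the repository records.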
Publicly displaying system information¶
While the TRS information is embedded into the TRO, the entire system should be documented on the organization’s own website, for instance via the TRS Report. When multiple methods exist to create TROs (for instance, some fully automated, others with some manual intervention), multiple TRS descriptions should be used.