This repository contains reusable GitHub Actions for Datalayer workflows.
The datalayer-evals action runs Datalayer eval reports in CI and produces report artifacts.
It uses the datalayer-core Python API directly (the DatalayerClient and the
core eval-report helpers) rather than shelling out to the CLI, so the generated
reports contain the full structured failure diagnostics (per-run failure causes,
stages, types and detail excerpts). Failures are also aggregated into the GitHub
step summary and exposed as action outputs.
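As a rough illustration of the aggregation described above, the sketch below counts failures by stage and type and appends a markdown table to the file named by the `GITHUB_STEP_SUMMARY` environment variable, which is the standard GitHub Actions mechanism for step summaries. The record fields (`stage`, `type`, `detail`) and both function names are assumptions made for this example, not the action's actual schema or code.

```python
import os
from collections import Counter
from pathlib import Path


def summarize_failures(failures):
    """Render a markdown table counting failures by (stage, type)."""
    counts = Counter((f["stage"], f["type"]) for f in failures)
    lines = ["| Stage | Type | Count |", "|---|---|---|"]
    for (stage, failure_type), count in sorted(counts.items()):
        lines.append(f"| {stage} | {failure_type} | {count} |")
    return "\n".join(lines) + "\n"


def append_to_step_summary(markdown):
    """Append markdown to the GitHub step summary file, if one is configured."""
    target = os.environ.get("GITHUB_STEP_SUMMARY")
    if not target:
        return False  # not running inside GitHub Actions
    with Path(target).open("a", encoding="utf-8") as fh:
        fh.write(markdown)
    return True


# Hypothetical failure records; real reports may use different field names.
example_failures = [
    {"stage": "execution", "type": "timeout", "detail": "run exceeded limit"},
    {"stage": "execution", "type": "timeout", "detail": "run exceeded limit"},
    {"stage": "grading", "type": "assertion", "detail": "expected value missing"},
]
summary = summarize_failures(example_failures)
append_to_step_summary(summary)
```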
It supports two execution modes:
- Primary report mode (single evalset).
- Comparison mode (primary + secondary evalsets) with a generated summary markdown.
It also supports optional multi-agentspec runtime bootstrap before reporting via:
- agentspec-id (single)
- agentspec-ids (comma-separated list)
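A comma-separated input such as agentspec-ids has to be split into individual spec ids before any runtime can be created. The helper below is a minimal sketch of that parsing; the function name is hypothetical, and trimming whitespace, skipping empty entries and dropping duplicates are assumptions rather than documented behavior of the action.

```python
def parse_spec_ids(raw):
    """Split a comma-separated input into trimmed, non-empty, unique ids."""
    spec_ids = []
    for part in (raw or "").split(","):
        spec_id = part.strip()
        # Skip empty segments (e.g. trailing commas) and repeated ids.
        if spec_id and spec_id not in spec_ids:
            spec_ids.append(spec_id)
    return spec_ids


ids = parse_spec_ids("example-evals, example-evals-nocodemode,")
```

With the value used in the multi-agentspec example, this yields the two ids in their original order.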
Evalsets can be provided as IDs, or created on the fly from spec files.
Primary report mode produces, for each report:
- <output-markdown> and a matching .csv (when export-csv is true)
- a <output-markdown>.log artifact containing the full structured report JSON (including every run-level failure cause)
- timestamped files report-<timestamp>.md and report-<timestamp>.csv
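The file naming above can be sketched as a small path-derivation helper. This is an illustration only: the function name, the timestamp format, the placement of timestamped files next to the markdown report, and the choice to skip the timestamped CSV when export-csv is false are all assumptions.

```python
from datetime import datetime, timezone
from pathlib import Path


def report_paths(output_markdown, export_csv=True, now=None):
    """Derive the artifact paths that accompany one markdown report."""
    md = Path(output_markdown)
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")  # assumed timestamp format
    paths = {
        "report": md,
        # The log keeps the .md suffix: evals-report.md -> evals-report.md.log
        "log": Path(str(md) + ".log"),
        "timestamped_report": md.parent / f"report-{stamp}.md",
    }
    if export_csv:
        paths["csv"] = md.with_suffix(".csv")
        paths["timestamped_csv"] = md.parent / f"report-{stamp}.csv"
    return paths


paths = report_paths(
    "artifacts/evals-report.md",
    now=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
```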
The action is implemented in Python and can be consumed from other repositories.
Authentication and billing context are supplied through repository secrets so they never appear in workflow files or logs:
| Secret | Maps to input | Required | Purpose |
|---|---|---|---|
| DATALAYER_API_KEY | api-key | ✅ Required | Authenticates every call the action makes to Datalayer. |
| DATALAYER_BILLABLE_ACCOUNT_UID | billable-account-uid | Optional | Billable account context used for eval operations and any optional runtime creation. |
Reference them in the consumer workflow:
with:
  api-key: ${{ secrets.DATALAYER_API_KEY }}
  billable-account-uid: ${{ secrets.DATALAYER_BILLABLE_ACCOUNT_UID }}

To make the billable account optional at dispatch time while still defaulting to the secret, use the || fallback:

  billable-account-uid: ${{ inputs.billable_account_uid || secrets.DATALAYER_BILLABLE_ACCOUNT_UID }}

When upload-report-artifacts is true (the default), the action uploads the generated markdown and CSV reports (plus the structured .log JSON and any timestamped/secondary/comparison files) as a build artifact in a final step, so no extra upload step is required in the consumer workflow.
Inputs:
- evalset-id: target evalset UID; required unless evalset-spec-file is provided
- evalset-spec-file: optional, path to primary evalset spec JSON; action creates evalset and reports it
- secondary-evalset-id: optional, secondary evalset UID
- secondary-evalset-spec-file: optional, path to secondary evalset spec JSON
- api-key: required, Datalayer API key
- ai-agents-url: optional, override API URL
- billable-account-uid: optional, billable account UID context for eval operations and optional runtime creation
When billable-account-uid is omitted (or empty), the action does not force a
billing override and calls run in the default account context for the API key.
- run-limit: optional, default 50
- output-markdown: optional, default evals-report.md
- secondary-output-markdown: optional, output file for secondary report
- comparison-summary-output: optional, output file for secondary-vs-primary summary
- export-csv: optional, default true
- upload-report-artifacts: optional, default true; uploads generated markdown/csv/log artifacts in a final step
- report-artifact-name: optional, default datalayer-evals-reports
- iam-url: optional, IAM URL override used when creating the optional agent runtime
- runtimes-url: optional, Runtimes URL override used when creating the optional agent runtime
- agentspec-id: optional, create an agent runtime before reporting using this spec id (default example-simple)
- agentspec-ids: optional, comma-separated list of spec ids for multi-runtime bootstrap before reporting
- agentspec: optional, URL or local file path to YAML/JSON agent spec; mutually exclusive with agentspec-id
- agent-environment-name: optional, default ai-agents-env
- agent-given-name: optional runtime name for the created agent runtime
- agent-time-reservation: optional runtime reservation in minutes, default 10
When the action creates a runtime via agentspec-id or agentspec, it automatically tears the runtime down after report generation (including early-exit paths).
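Teardown that also covers early-exit paths is commonly implemented with a `try`/`finally` block or a context manager. The sketch below shows that pattern with stand-in functions; `temporary_runtime`, `fake_create`, `fake_teardown` and `generate_report` are hypothetical names for this example and do not represent the datalayer-core API.

```python
from contextlib import contextmanager


@contextmanager
def temporary_runtime(create, teardown):
    """Create a runtime and guarantee teardown, even if the body raises."""
    runtime = create()
    try:
        yield runtime
    finally:
        teardown(runtime)


events = []


def fake_create():
    # Stand-in for runtime creation.
    events.append("create")
    return "runtime-1"


def fake_teardown(runtime):
    # Stand-in for runtime termination.
    events.append(f"teardown:{runtime}")


def generate_report(runtime, fail=False):
    # Stand-in for report generation; can simulate an early exit.
    if fail:
        raise RuntimeError("report generation failed")
    events.append(f"report:{runtime}")


with temporary_runtime(fake_create, fake_teardown) as rt:
    generate_report(rt)
```

Because the teardown call sits in `finally`, it runs whether report generation succeeds or raises.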
At least one of evalset-id or evalset-spec-file must be provided.
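The input constraints stated above (a required api-key, at least one evalset source, and agentspec being mutually exclusive with agentspec-id) can be checked up front. The following is a minimal sketch, assuming inputs arrive as a plain dict of strings and that only non-empty values count as provided; the function name and error messages are illustrative, not the action's own.

```python
def validate_inputs(inputs):
    """Return a list of error messages for a dict of raw action inputs."""

    def given(name):
        # Treat missing, None, empty and whitespace-only values as not provided.
        return bool((inputs.get(name) or "").strip())

    errors = []
    if not given("api-key"):
        errors.append("api-key is required")
    if not (given("evalset-id") or given("evalset-spec-file")):
        errors.append("provide evalset-id or evalset-spec-file")
    if given("agentspec") and given("agentspec-id"):
        errors.append("agentspec and agentspec-id are mutually exclusive")
    return errors


ok = validate_inputs({"api-key": "k", "evalset-spec-file": "evals/a.evalset.json"})
missing = validate_inputs({"api-key": "k", "evalset-id": "  "})
```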
Outputs:
- report-file: markdown report file path
- csv-file: CSV report file path (empty when export-csv=false)
- log-file: full structured report JSON log file path (captures all failure causes)
- timestamped_report_file: timestamped markdown path
- timestamped_csv_file: timestamped CSV path
- secondary-report-file: secondary markdown report path
- secondary-csv-file: secondary CSV report path
- secondary-log-file: secondary structured report JSON log
- secondary-timestamped-report-file: secondary timestamped markdown
- secondary-timestamped-csv-file: secondary timestamped CSV
- comparison-summary-file: generated comparison summary markdown
- agent-runtime-pod-name: pod name of runtime optionally created through the core client
- agent-runtime-ingress: ingress URL of that optional runtime
- agent-runtime-pod-names: JSON array of pod names created through agentspec-ids
- agent-runtime-ingresses: JSON array of ingress URLs created through agentspec-ids
- failed-run-count: total number of failed runs across primary and secondary reports
- primary-failed-run-count: number of failed runs in the primary report
- secondary-failed-run-count: number of failed runs in the secondary report
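Step outputs in GitHub Actions are written as `name=value` lines appended to the file named by the `GITHUB_OUTPUT` environment variable. The sketch below shows how a few of the outputs listed above could be rendered that way, assuming the total is the sum of the primary and secondary counts and that array outputs are JSON-encoded; the function names are hypothetical and this is not the action's actual code.

```python
import json
import os


def format_outputs(primary_failed, secondary_failed=0, pod_names=()):
    """Render selected outputs as name=value lines for GITHUB_OUTPUT."""
    outputs = {
        "primary-failed-run-count": str(primary_failed),
        "secondary-failed-run-count": str(secondary_failed),
        "failed-run-count": str(primary_failed + secondary_failed),
        # Array outputs are serialized as a single-line JSON string.
        "agent-runtime-pod-names": json.dumps(list(pod_names)),
    }
    return "".join(f"{name}={value}\n" for name, value in outputs.items())


def write_outputs(text):
    """Append rendered outputs to the GITHUB_OUTPUT file, if one is configured."""
    target = os.environ.get("GITHUB_OUTPUT")
    if not target:
        return False  # not running inside GitHub Actions
    with open(target, "a", encoding="utf-8") as fh:
        fh.write(text)
    return True


rendered = format_outputs(2, 1, ["pod-a", "pod-b"])
write_outputs(rendered)
```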
Example workflow step (single evalset):
uses: datalayer/github-actions@v1
with:
  evalset-id: 01KXXXXXXXXXXXX
  api-key: ${{ secrets.DATALAYER_API_KEY }}
  run-limit: "50"
  output-markdown: artifacts/evals-report.md
  export-csv: "true"
Example workflow step with runtime bootstrap from spec id before report:
uses: datalayer/github-actions@v1
with:
evalset-id: 01KXXXXXXXXXXXX
api-key:
Example workflow step with multi-agentspec bootstrap:
uses: datalayer/github-actions@v1
with:
  evalset-id: 01KXXXXXXXXXXXX
  api-key: ${{ secrets.DATALAYER_API_KEY }}
  agentspec-ids: example-evals,example-evals-nocodemode
  agent-environment-name: ai-agents-env
  agent-time-reservation: "10"
  output-markdown: artifacts/evals-report.md
  export-csv: "true"
The action now includes a final upload step by default (upload-report-artifacts=true) that publishes markdown/csv/log artifacts.
To disable built-in upload and manage upload yourself:
uses: datalayer/github-actions@v1
with:
  evalset-id: 01KXXXXXXXXXXXX
  api-key: ${{ secrets.DATALAYER_API_KEY }}
  upload-report-artifacts: "false"
Example workflow step (two spec files, one comparison run):
uses: datalayer/github-actions@v1
with:
  evalset-spec-file: .github/evals/no-codemode.evalset.json
  secondary-evalset-spec-file: .github/evals/codemode.evalset.json
  api-key: ${{ secrets.DATALAYER_API_KEY }}
  output-markdown: artifacts/no-codemode-report.md
  secondary-output-markdown: artifacts/codemode-report.md
  comparison-summary-output: artifacts/comparison-summary.md
  export-csv: "true"
When the built-in upload is disabled, upload the artifacts in the consumer workflow:
uses: actions/upload-artifact@v4
with:
  name: evals-report
  path: |
    artifacts/evals-report.md
    artifacts/evals-report.csv
    artifacts/evals-report.md.log
For two-spec comparison mode, also upload:
artifacts/no-codemode-report.md
artifacts/no-codemode-report.csv
artifacts/no-codemode-report.md.log
artifacts/codemode-report.md
artifacts/codemode-report.csv
artifacts/codemode-report.md.log
artifacts/comparison-summary.md
- Commit and push changes to main.
- Tag a version.
- Push the tag.
Commands:
git tag -a v1.0.0 -m "datalayer-evals v1.0.0"
git push origin v1.0.0
Recommended tag strategy:
- Maintain a moving major tag for stable consumers.
- Example:
  - v1.0.0: immutable release tag
  - v1: moving major tag
Move major tag:
git tag -f v1 v1.0.0
git push -f origin v1
Consumers should reference v1 for stable updates, or pin an immutable tag for strict reproducibility.