### Telemetry Guide for Agents

This document explains what telemetry is in the context of the ETHYS system, why agents should send it, and what to send based on agent type. It avoids any proprietary scoring math and focuses on operational guidance.

---

## ⚠️ CRITICAL: Wallet Authentication Guide

**Before implementing telemetry, read the [Telemetry Wallet Authentication Guide](./TELEMETRY_WALLET_AUTH_GUIDE.md)!**

Most telemetry failures come from:
- ❌ Sending authentication fields in headers instead of the request body
- ❌ Using an incorrect signature message format
- ❌ Stringifying the events JSON incorrectly

**Quick Links:**
- **[Complete Wallet Auth Guide](./TELEMETRY_WALLET_AUTH_GUIDE.md)** - Step-by-step with examples
- **[Debug Script](./TELEMETRY_DEBUG_SCRIPT.md)** - Verify that your implementation matches the server exactly
- **[Signature Guide](./TELEMETRY_SIGNATURE_GUIDE.md)** - Detailed signature methods

---

### What is telemetry?

- **Definition**: Telemetry is runtime metadata that your agent emits about actions it takes and the outcomes it observes. It is not your prompts, weights, or proprietary logic.
- **Purpose**: Enable reliability, trust, billing accuracy, fraud prevention, and faster support. Telemetry makes your agent observable without exposing its secret sauce.

---

### Why send telemetry?

- **Reliability**: Detect regressions, timeouts, and version incompatibilities early.
- **Trust and verification**: Correlate on-chain attestations, payments, and endorsements with observed behavior.
- **Billing accuracy**: Attribute resource usage and outputs to the correct agent, tenant, and time window.
- **Support and debugging**: Reproduce issues without sharing proprietary data.
- **Safety and abuse prevention**: Rate-limit abuse patterns while keeping legitimate traffic unblocked.

Note: Telemetry should exclude sensitive content and secrets. Prefer structured fields over raw payloads.

Auth model: Wallet-signed requests are the primary authentication method and require no API key. API keys remain supported only for legacy clients.

**Signature Types**:
- **EOA agents** (standard wallets): Use EIP-191 signatures (`personal_sign`)
- **ERC-6551 TBA agents** (NFT-based): Use EIP-1271 signatures (`isValidSignature` from TBA contract)

The API automatically detects signature type based on the address (contract vs EOA) and verifies accordingly.
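To make the contract-vs-EOA distinction concrete, here is a minimal sketch of how a client could predict which scheme applies to its own address. It assumes you have already fetched the address's bytecode with the standard `eth_getCode` JSON-RPC method; the function name and return labels are hypothetical, and nothing here describes how the ETHYS server performs its own detection.

```python
def signature_scheme(code_hex: str) -> str:
    """Pick a signature scheme from the result of an `eth_getCode` call.

    Assumption: an address with empty code ("0x") is treated as an EOA,
    and any address with deployed code is treated as a contract account
    such as an ERC-6551 token-bound account.
    """
    code = code_hex.lower().removeprefix("0x")
    if code == "":
        return "eip191"   # EOA: sign with personal_sign
    return "eip1271"      # contract: verified through isValidSignature
```

For example, `signature_scheme("0x")` selects the EIP-191 path, while any non-empty bytecode selects EIP-1271.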

---

### General principles

- **Minimize content**: Send identifiers and summaries, not full payloads.
- **Structure first**: Use typed fields; avoid free-form blobs.
- **Event-driven**: Emit on lifecycle boundaries (start, success, failure) and material state changes.
- **Idempotent**: Include stable IDs so retries don’t duplicate state.
- **Timely**: Prefer near-real-time delivery; batch if offline.
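
The idempotency and batching principles can be sketched as a small in-memory buffer. This is an illustration under stated assumptions, not an ETHYS client: `TelemetryBuffer` and `send_batch` are hypothetical names, and `send_batch` stands in for whatever transport your integration uses.

```python
import uuid
from typing import Any, Callable, Dict, List, Optional


class TelemetryBuffer:
    """Queue events while offline and flush them in batches.

    Hypothetical helper: `send_batch` should return True when the batch
    was delivered and False when delivery failed.
    """

    def __init__(self, send_batch: Callable[[List[Dict[str, Any]]], bool],
                 max_batch: int = 50) -> None:
        self._send_batch = send_batch
        self._max_batch = max_batch
        self._pending: Dict[str, Dict[str, Any]] = {}  # keyed by eventId

    def add(self, metric: str, value: Dict[str, Any],
            event_id: Optional[str] = None) -> str:
        # A stable eventId means a retry overwrites the queued event
        # instead of adding a duplicate.
        event_id = event_id or str(uuid.uuid4())
        self._pending[event_id] = {
            "metric": metric,
            "value": {**value, "eventId": event_id},
        }
        return event_id

    def flush(self) -> int:
        """Send pending events in batches; keep anything that fails."""
        sent = 0
        while self._pending:
            ids = list(self._pending)[: self._max_batch]
            batch = [self._pending[i] for i in ids]
            if not self._send_batch(batch):
                break  # still offline; retry later with the same eventIds
            for i in ids:
                del self._pending[i]
            sent += len(batch)
        return sent
```

Keying the queue by `eventId` keeps retries from duplicating state on the client side; the receiving side can apply the same key for deduplication.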

---

### Common fields for all agent types

Note for the autonomous agent build: Fully autonomous on-chain agents using the provided relayer/contracts do not need to send these fields directly. The ETHYS infrastructure emits telemetry on their behalf. Fields like `tenantId`, `traceId`, and `spanId` often do not apply on-chain and can be omitted by the agent; infrastructure will correlate events.

Include these in every telemetry event where applicable:

- **agentId**: Your registered agent identifier.
- **tenantId**: If multi-tenant; otherwise omit or set to your agentId.
- **eventId**: UUID v4 for the event.
- **eventType**: start | success | error | heartbeat | usage | attestation | payment | custom.
- **occurredAt**: ISO-8601 timestamp.
- **traceId** and **spanId**: For correlating multi-hop workflows.
- **version**: Semantic agent version (code, model, or policy version).
- **capabilities**: Optional capability tags (e.g., "llm", "onchain", "retrieval").
- **resourceHints**: Optional estimates (tokens, ms, bytes, gasUsed, cpuMs).
- **redactedReason**: If you intentionally omit fields.
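
As a sketch of how the common fields might be assembled, the helper below builds them once and wraps them in a `metric`/`value` event. The function names are hypothetical, and placing the common fields inside `value` is an assumption based on the JSON examples in this guide.

```python
import time
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def common_fields(agent_id: str, version: str,
                  tenant_id: Optional[str] = None,
                  trace_id: Optional[str] = None,
                  span_id: Optional[str] = None) -> Dict[str, Any]:
    """Build the fields shared by every event, omitting ones that do not apply."""
    fields: Dict[str, Any] = {
        "agentId": agent_id,
        "eventId": str(uuid.uuid4()),                          # UUID v4
        "occurredAt": datetime.now(timezone.utc).isoformat(),  # ISO-8601, UTC
        "version": version,
    }
    # Optional fields are left out entirely rather than sent as null.
    if tenant_id:
        fields["tenantId"] = tenant_id
    if trace_id:
        fields["traceId"] = trace_id
    if span_id:
        fields["spanId"] = span_id
    return fields


def build_event(metric: str, fields: Dict[str, Any],
                extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Wrap common and type-specific fields in the metric/value structure."""
    return {
        "metric": metric,
        "timestamp": int(time.time() * 1000),  # Unix milliseconds
        "value": {**fields, **(extra or {})},
    }
```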

---

### What to send by agent type

- **LLM/Tool-using agents (server-side or worker)**
  - start/success/error for each task invocation
  - modelId (e.g., provider/model@revision)
  - inputSummary (high-level description or hash; no raw prompts)
  - toolCalls: list of tools invoked with names and status only
  - outputsSummary (lengths, types; no raw content)
  - tokenUsage: promptTokens, completionTokens (numbers only)
  - latencyMs and retries

- **Autonomous on-chain agents**
  - In the ETHYS Agentic build, if you use the standard relayer/contracts, you typically do not emit telemetry yourself; these events are generated server-side. The following fields are emitted by infrastructure:
    - chainId and network
    - txIntentId and action type (simulate | submit)
    - simulationSummary (gas estimate, revert reason summary)
    - txHash (on success) and receipt status
    - attestationIds or endorsementIds if emitted
  - If you run a custom on-chain stack without the relayer, include the fields above and omit `tenantId`, `traceId`, and any PII.

- **Backend service agents (API workers, cron)**
  - jobId / scheduleId
  - dependencyStatus (cache/db/api reachable?)
  - throughput and queue depth snapshots
  - errorClass and sanitized error message

- **Client-facing SDK agents (browser/mobile)**
  - sessionId and anonymized userId (hashed)
  - consent flags (true/false) and sampling rate
  - device/os/appVersion; network conditions (rttMs)
  - only high-level action names and success/error; no PII

- **Data retrieval / RAG agents**
  - retrievalSourceIds (IDs only)
  - hitCounts and latencyMs per source
  - retrievalFilters applied
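
For the LLM/tool-using case, the guidance above (summaries and hashes instead of raw content, tool names and status only) could look like the sketch below. The helper names and the `sha256` key inside `inputSummary` are assumptions made for illustration; the other field names follow the list above.

```python
import hashlib
from typing import Any, Dict, List


def summarize_text(text: str) -> Dict[str, Any]:
    """Describe text by length and digest so the raw content never leaves the agent."""
    return {
        "chars": len(text),
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }


def llm_task_value(model_id: str, prompt: str, completion: str,
                   tool_calls: List[Dict[str, Any]],
                   prompt_tokens: int, completion_tokens: int,
                   latency_ms: int, retries: int = 0) -> Dict[str, Any]:
    """Build the `value` payload for an LLM task event without raw content."""
    return {
        "modelId": model_id,
        "inputSummary": summarize_text(prompt),
        # Keep only the tool name and status; arguments and results are dropped.
        "toolCalls": [{"name": c["name"], "status": c["status"]} for c in tool_calls],
        "outputsSummary": {"chars": len(completion)},
        "tokenUsage": {
            "promptTokens": prompt_tokens,
            "completionTokens": completion_tokens,
        },
        "latencyMs": latency_ms,
        "retries": retries,
    }
```

A digest of a short or guessable prompt can still be matched by brute force, so a length or intent label alone may be the safer summary for sensitive inputs.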

---

### Minimal JSON examples

**⚠️ Important**: All telemetry events must use the `metric`/`value` structure. See [Telemetry Event Format](./TELEMETRY_EVENT_FORMAT.md) for details.

LLM agent task success (no content, structured only):

```json
{
  "metric": "task_success",
  "timestamp": 1727699696789,
  "value": {
    "tenantId": "tenant_abc",
    "eventId": "0b8a9a56-5c5d-4a07-8f6c-1e3f8b1b50d4",
    "traceId": "tr-9f2f...",
    "spanId": "sp-a1b2...",
    "version": "2.3.1",
    "capabilities": ["llm", "retrieval"],
    "modelId": "provider-x/model-y@2025-10",
    "inputSummary": { "chars": 542, "intent": "answer_billing_question" },
    "toolCalls": [
      { "name": "get_user_invoice", "status": "success", "latencyMs": 83 }
    ],
    "outputsSummary": { "chars": 712 },
    "tokenUsage": { "promptTokens": 650, "completionTokens": 420 },
    "latencyMs": 1340
  }
}
```

On-chain agent tx submission:

```json
{
  "metric": "tx_submission",
  "timestamp": 1727699770123,
  "value": {
    "chainId": 8453,
    "network": "base-mainnet",
    "txIntentId": "intent-7c3d",
    "txHash": "0xabc...def",
    "gasUsed": 154321,
    "receiptStatus": "success",
    "attestationIds": ["att-01h...9k"]
  }
}
```

**Note**: In every example in this section, event data sits inside the `value` field. All custom fields go inside `value`; the `metric` field identifies the event type.

Client SDK heartbeat (privacy-preserving):

```json
{
  "metric": "heartbeat",
  "timestamp": 1727701020000,
  "value": {
    "sessionId": "sess-19f...",
    "consent": true,
    "sampleRate": 0.25,
    "env": { "appVersion": "1.8.0", "os": "iOS", "rttMs": 120 }
  }
}
```

---

### Telemetry Format Reference

**Required Event Structure**: All telemetry events must use the `metric`/`value` structure:
- `metric`: Event identifier (string, 1-128 chars)
- `value`: Event data (object, number, string, boolean, or null)
- `timestamp`: Optional Unix timestamp in milliseconds

For detailed format information and examples, see:
- **[Telemetry Event Format Guide](./TELEMETRY_EVENT_FORMAT.md)** - Complete format specification
- **[Telemetry Signature Guide](./TELEMETRY_SIGNATURE_GUIDE.md)** - Wallet signature authentication
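
Before sending, a client can check events against the structure above. The following is a minimal local pre-flight check, not the server's validator: `validate_event` is a hypothetical name, and treating a missing `value` key as an error is an assumption (the rules above only state which types `value` may hold).

```python
from typing import Any, List


def validate_event(event: Any) -> List[str]:
    """Return a list of problems; an empty list means the event looks well-formed."""
    if not isinstance(event, dict):
        return ["event must be an object"]

    problems: List[str] = []

    metric = event.get("metric")
    if not isinstance(metric, str) or not 1 <= len(metric) <= 128:
        problems.append("metric must be a string of 1-128 chars")

    # Assumption: the key must be present, even when its value is null.
    if "value" not in event:
        problems.append("value is required")
    elif not isinstance(event["value"], (dict, int, float, str, bool, type(None))):
        problems.append("value must be an object, number, string, boolean, or null")

    ts = event.get("timestamp")
    # bool is a subclass of int in Python, so reject it explicitly.
    if ts is not None and (isinstance(ts, bool) or not isinstance(ts, int) or ts < 0):
        problems.append("timestamp must be a Unix timestamp in milliseconds")

    return problems
```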

---


### Privacy, security, and retention

- **No secrets**: Never send API keys, private keys, or raw prompts/completions.
- **PII minimization**: Hash any user identifiers client-side. Do not send free text.
- **Encryption in transit**: HTTPS/TLS only.
- **Integrity**: Sign requests with your agent wallet wherever your integration supports it; wallet-signed telemetry is the primary auth model.
- **Retention**: Operational telemetry is retained for a limited period sufficient for reliability and billing. Long-term storage is aggregated and anonymized where applicable.
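
The "no secrets" and "PII minimization" rules can be enforced in code before an event is queued. The sketch below is one possible approach under stated assumptions: `SENSITIVE_KEYS` is a hypothetical deny-list you would adapt to your own field names, and using a keyed HMAC rather than a plain hash for user identifiers is a design choice of this example, not an ETHYS requirement.

```python
import hashlib
import hmac
from typing import Any, Dict

# Hypothetical deny-list (lowercase); adjust to the field names your agent uses.
SENSITIVE_KEYS = {"apikey", "privatekey", "prompt", "completion", "password"}


def hash_user_id(user_id: str, salt: bytes) -> str:
    """Hash a user identifier client-side with a secret salt (HMAC-SHA256)."""
    return hmac.new(salt, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub(value: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `value` with deny-listed keys removed at every nesting level."""
    clean: Dict[str, Any] = {}
    for key, item in value.items():
        if key.lower() in SENSITIVE_KEYS:
            continue  # drop the field entirely rather than masking it
        clean[key] = scrub(item) if isinstance(item, dict) else item
    return clean
```

A keyed hash keeps identifiers stable for correlation while making it harder to recover the original ID by hashing guesses, provided the salt itself is never sent.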

---

### Error taxonomy (recommended)

- **errorClass**: transient | quota | validation | upstream | policy | unknown
- **errorCode**: short machine code (e.g., E_QUOTA_EXCEEDED)
- **message**: sanitized human-readable summary
- **context**: structured fields (e.g., provider: "x", tool: "get_user_invoice")
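
One way to apply this taxonomy is to map exceptions to an `errorClass` at the point where they are caught. The mapping below is an illustrative assumption based on Python's built-in exception types; `classify_error` and `error_value` are hypothetical names, and a real agent would extend the mapping for quota and upstream failures from its own providers.

```python
from typing import Any, Dict, Optional


def classify_error(exc: BaseException) -> str:
    """Map a Python exception to one of the recommended errorClass values."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return "transient"
    if isinstance(exc, PermissionError):
        return "policy"
    if isinstance(exc, (ValueError, TypeError, KeyError)):
        return "validation"
    return "unknown"


def error_value(exc: BaseException, error_code: str,
                context: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build the error fields for an event's `value`."""
    return {
        "errorClass": classify_error(exc),
        "errorCode": error_code,
        # Sanitized: only the exception type is reported, because str(exc)
        # may contain user data or secrets.
        "message": type(exc).__name__,
        "context": context or {},
    }
```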

---
