Archive Integrity Verification (PLAN-1610)

Introduced: v1.2 (PLAN-1610)

Long-term integrity verification continuously checks that archived content has not been silently corrupted. A background walker re-hashes stored files and compares them against the SHA-256 fingerprints recorded at import time. With the default QuarantineThreshold of 1, any mismatch moves the item into Quarantine, which raises an alert and blocks read access until the item is remediated or overridden.


How it works

  1. Record ingestion — The archive-import tool calls POST /_api/archive/integrity/records immediately after writing each content item. Each record stores the item key, archive site ID, SHA-256 hash of the content, and the hash of the item metadata.

  2. Background walker — IntegrityVerificationService (a hosted background service) wakes up on a configurable cadence (default: every 24 h), enumerates all pending or previously verified records, re-reads the stored content, and re-hashes it. Each item is classified as Verified, Quarantined, or Pending.

  3. On-access gate — ArchiveReadIntegrityGate intercepts every content-download request. Items in Quarantined status are blocked with HTTP 423 (Locked). The blocked response body includes the quarantine reason and the most recent mismatch count.

  4. Live progress — While a walk pass is running the walker broadcasts WalkProgress events over SignalR (/signalr/integrity) so the ControlCenter dashboard updates in real time.
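
The re-hash-and-compare step at the heart of the walker can be sketched as follows. This is a minimal illustration, not the IntegrityVerificationService code: the function names and the chunk size are assumptions, and only the use of SHA-256 comes from the description above.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large items never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # read() returns b"" at end of file, which stops the iterator.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def content_matches(path: Path, recorded_sha256: str) -> bool:
    """Return True when the stored bytes still match the import-time fingerprint."""
    # Hex digests are case-insensitive, so normalise the recorded value first.
    return sha256_of_file(path) == recorded_sha256.strip().lower()
```

A False result corresponds to a mismatch in the terms used above; what happens next depends on the configured quarantine threshold.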


Configuration knobs

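All settings live in the Cesivi:ArchiveIntegrity section of appsettings.json. They can also be set as environment variables, using the double-underscore key form listed in the table (for example, Cesivi__ArchiveIntegrity__WalkIntervalHours).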

| Key | Default | Description |
| --- | --- | --- |
| Cesivi__ArchiveIntegrity__WalkIntervalHours | 24 | Hours between automatic walk passes. Set to 0 to disable automatic walks. |
| Cesivi__ArchiveIntegrity__WalkConcurrency | 4 | Number of records to hash concurrently per walk pass. Higher values increase throughput but also CPU and I/O load. |
| Cesivi__ArchiveIntegrity__BlockOnQuarantine | true | Whether to return HTTP 423 for quarantined items. Set to false for log-only mode (no blocking). |
| Cesivi__ArchiveIntegrity__QuarantineThreshold | 1 | Number of consecutive mismatches before an item is quarantined. Useful for tolerating transient storage hiccups. |
| Cesivi__ArchiveIntegrity__JsonlCompactionThreshold | 50000 | FileSystem store: number of records per JSONL segment before a new segment is started. |
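
To make the QuarantineThreshold semantics concrete, here is a small sketch of a consecutive-mismatch counter. The ItemState type and apply_check function are hypothetical. Two behaviours are assumptions rather than documented facts: a successful check resets the counter, and an item below the threshold goes back to Pending.

```python
from dataclasses import dataclass


@dataclass
class ItemState:
    status: str = "Pending"
    consecutive_mismatches: int = 0


def apply_check(state: ItemState, hashes_match: bool, quarantine_threshold: int = 1) -> ItemState:
    """Return the item's next state after one verification attempt."""
    if hashes_match:
        # Assumption: a clean check resets the run of consecutive mismatches.
        return ItemState(status="Verified", consecutive_mismatches=0)
    misses = state.consecutive_mismatches + 1
    # Assumption: below the threshold the item returns to Pending for re-checking.
    status = "Quarantined" if misses >= quarantine_threshold else "Pending"
    return ItemState(status=status, consecutive_mismatches=misses)
```

With the default threshold of 1, the first mismatch quarantines the item. With a threshold of 3, two bad reads followed by a good one would leave the item Verified under these assumptions.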

Example: disable automatic walks

{
  "Cesivi": {
    "ArchiveIntegrity": {
      "WalkIntervalHours": 0
    }
  }
}

Example: log-only mode (no blocking)

{
  "Cesivi": {
    "ArchiveIntegrity": {
      "BlockOnQuarantine": false
    }
  }
}

Walk cadence

The timer-based background service fires every WalkIntervalHours hours. Each invocation:

  1. Queries IIntegrityStore.GetStatsAsync for all registered archive sites.
  2. For each site, enumerates records using a resumable cursor (persisted in IIntegrityStore via UpdateWalkProgressAsync).
  3. Checkpoints progress every 1,000 records so a crash or restart resumes rather than restarts.
  4. Marks the pass Completed or Failed in IntegrityWalkProgress.
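
The checkpoint-and-resume behaviour in the steps above can be sketched with an in-memory stand-in for the persisted cursor. InMemoryProgressStore, walk_site, and the integer cursor are illustrative assumptions; the real cursor lives in IIntegrityStore and its shape is not described here.

```python
from typing import Callable, Dict, Sequence

CHECKPOINT_EVERY = 1000  # checkpoint interval stated in the steps above


class InMemoryProgressStore:
    """Hypothetical stand-in for the cursor that IIntegrityStore persists."""

    def __init__(self) -> None:
        self._cursors: Dict[str, int] = {}

    def load_cursor(self, site_id: str) -> int:
        return self._cursors.get(site_id, 0)

    def save_cursor(self, site_id: str, position: int) -> None:
        self._cursors[site_id] = position


def walk_site(
    site_id: str,
    records: Sequence[object],
    store: InMemoryProgressStore,
    verify: Callable[[object], None],
    checkpoint_every: int = CHECKPOINT_EVERY,
) -> int:
    """Verify records from the saved cursor onward; return how many were processed."""
    start = store.load_cursor(site_id)
    processed = 0
    for position in range(start, len(records)):
        verify(records[position])
        processed += 1
        if (position + 1) % checkpoint_every == 0:
            # Persist progress so a crash loses at most one checkpoint interval.
            store.save_cursor(site_id, position + 1)
    # Pass finished: record the end position. How the real store resets the
    # cursor for the next pass is not covered by this sketch.
    store.save_cursor(site_id, len(records))
    return processed
```

In this sketch a crash at record 2,500 resumes from record 2,000, so up to 999 records may be verified twice. That is harmless here because re-hashing is idempotent.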

Manual trigger

An operator or script can trigger an immediate walk pass for a specific site:

curl -X POST http://localhost:5010/_api/archive/integrity/sites/{archiveSiteId}/walks/run \
  -H "Authorization: Basic <base64(SHAREPOINT\administrator:password)>" \
  -H "Content-Type: application/json" \
  -d '{}'

The response is 202 Accepted with a walkPassId that can be polled via GET .../current-walk.
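
For scripted use, the same trigger-then-poll flow might look like the sketch below. The helper names, the polling interval, and the set of terminal status strings ("Completed" and "Failed", taken from the walk-cadence description) are assumptions. The fetch function is passed in so the polling logic stays independent of any HTTP client.

```python
import base64
import time
import urllib.request
from typing import Callable


def build_trigger_request(
    base_url: str, archive_site_id: str, username: str, password: str
) -> urllib.request.Request:
    """Build the POST .../walks/run request with HTTP Basic credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    url = f"{base_url}/_api/archive/integrity/sites/{archive_site_id}/walks/run"
    return urllib.request.Request(
        url,
        data=b"{}",
        method="POST",
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
    )


def wait_for_walk(
    fetch_current_walk: Callable[[], dict],
    interval_seconds: float = 5.0,
    max_polls: int = 120,
) -> dict:
    """Poll the current-walk payload until it reports a terminal status."""
    for _ in range(max_polls):
        walk = fetch_current_walk()
        # Assumption: the status field ends up as "Completed" or "Failed".
        if walk.get("status") in ("Completed", "Failed"):
            return walk
        time.sleep(interval_seconds)
    raise TimeoutError("walk pass did not finish within the polling budget")
```

The request can be sent with urllib.request.urlopen(request), and fetch_current_walk would wrap a GET on the site's current-walk endpoint that returns the decoded JSON body.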


REST API reference

All endpoints require site-administrator credentials.

POST /_api/archive/integrity/records

Ingest integrity records from the migration tool (up to 500 per request).

Request body:

[
  {
    "itemKey": "/sites/archive/Shared Documents/report.pdf",
    "archiveSiteId": "00000000-0000-0000-0000-000000000001",
    "listId": "00000000-0000-0000-0000-000000000002",
    "itemId": 42,
    "contentSha256": "aabbccddeeff...",
    "metadataSha256": "11223344...",
    "hashAlgorithm": "sha256",
    "capturedAt": "2026-01-01T00:00:00Z"
  }
]

Response: 202 Accepted

{ "accepted": 1, "submitted": 1 }
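
Because each request accepts at most 500 records, an import tool has to batch its submissions. The sketch below builds records in the shape shown above and splits them into compliant batches. The make_record and batches helpers are hypothetical. Two points are assumptions: that listId and itemId may be omitted, and that the metadata has already been serialized to bytes (how it is serialized for hashing is not specified here).

```python
import hashlib
from typing import Iterator, List

MAX_RECORDS_PER_REQUEST = 500  # limit stated for POST /_api/archive/integrity/records


def make_record(
    item_key: str, archive_site_id: str, content: bytes, metadata: bytes, captured_at: str
) -> dict:
    """Build one integrity record; field names follow the request body above."""
    return {
        "itemKey": item_key,
        "archiveSiteId": archive_site_id,
        "contentSha256": hashlib.sha256(content).hexdigest(),
        # Assumption: the caller has already serialized the metadata to bytes.
        "metadataSha256": hashlib.sha256(metadata).hexdigest(),
        "hashAlgorithm": "sha256",
        "capturedAt": captured_at,
    }


def batches(records: List[dict], size: int = MAX_RECORDS_PER_REQUEST) -> Iterator[List[dict]]:
    """Yield consecutive slices no larger than the per-request limit."""
    for start in range(0, len(records), size):
        yield records[start:start + size]
```

Each batch can then be serialized with json.dumps and posted as one request body.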


GET /_api/archive/integrity/sites/{archiveSiteId}/status

Returns aggregate statistics for a site.

Response:

{
  "totalItems": 12450,
  "verifiedItems": 12100,
  "quarantinedItems": 3,
  "pendingItems": 347
}
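
A monitoring script might reduce this payload to a few health indicators. The sketch below is illustrative only: summarize_status is a hypothetical helper, and it assumes (as in the example above) that the verified, quarantined, and pending counts add up to totalItems.

```python
def summarize_status(status: dict) -> dict:
    """Derive simple health indicators from the site status payload."""
    total = status["totalItems"]
    accounted = status["verifiedItems"] + status["quarantinedItems"] + status["pendingItems"]
    verified_percent = round(100.0 * status["verifiedItems"] / total, 2) if total else 0.0
    return {
        # Assumption: the three per-status counts partition the total.
        "consistent": accounted == total,
        "verifiedPercent": verified_percent,
        "needsAttention": status["quarantinedItems"] > 0,
    }
```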


GET /_api/archive/integrity/sites/{archiveSiteId}/mismatches

Lists quarantined items. Supports $skip, $top, and since query parameters.

Response:

{
  "value": [ /* IntegrityRecord objects */ ],
  "count": 3,
  "skip": 0,
  "top": 100
}
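
Paging through a long mismatch list with $skip and $top could be sketched as below. The helper names are hypothetical, the since value is assumed to be an ISO-8601 timestamp, and the loop stops on a short page instead of relying on count, because this section does not say whether count is the page size or the overall total.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode


def mismatches_url(
    base_url: str, archive_site_id: str, skip: int = 0, top: int = 100, since: Optional[str] = None
) -> str:
    """Build the mismatches URL, keeping the literal $ in $skip and $top."""
    params = {"$skip": skip, "$top": top}
    if since is not None:
        params["since"] = since  # assumed format: ISO-8601 timestamp
    query = urlencode(params, safe="$")
    return f"{base_url}/_api/archive/integrity/sites/{archive_site_id}/mismatches?{query}"


def iter_mismatches(fetch_page: Callable[[int, int], dict], top: int = 100) -> Iterator[dict]:
    """Yield every quarantined record, requesting one page at a time."""
    skip = 0
    while True:
        page = fetch_page(skip, top)
        items = page.get("value", [])
        yield from items
        if len(items) < top:
            return  # a short (or empty) page means there is nothing left
        skip += top
```

Here fetch_page(skip, top) is expected to perform the GET against mismatches_url(...) and return the decoded JSON body.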


GET /_api/archive/integrity/sites/{archiveSiteId}/current-walk

Returns the current or most recent walk pass status.

Response (walk in progress):

{
  "walkPassId": "walk-2026-05-27-001",
  "archiveSiteId": "...",
  "completedItems": 8200,
  "mismatchCount": 1,
  "status": "InProgress"
}

Response (no walk yet):

{ "walkInProgress": false }


POST /_api/archive/integrity/sites/{archiveSiteId}/walks/run

Triggers an immediate background walk pass. Returns immediately.

Response: 202 Accepted

{ "walkPassId": "...", "accepted": true }


GET /_api/archive/integrity/items/{itemKey}

Returns the integrity record for a specific item. URL-encode the item key.

Response: 200 OK with the IntegrityRecord as JSON.
Error: 404 Not Found if no record exists for that key.
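
Item keys such as the one in the ingestion example contain slashes and spaces, so they need percent-encoding before being placed in the path. A minimal sketch, assuming the server expects every reserved character (including "/") to be encoded; item_url is a hypothetical helper.

```python
from urllib.parse import quote


def item_url(base_url: str, item_key: str) -> str:
    """Build the lookup URL with the whole item key percent-encoded."""
    # safe="" also encodes "/", so the key stays a single path segment.
    encoded = quote(item_key, safe="")
    return f"{base_url}/_api/archive/integrity/items/{encoded}"
```

For the key /sites/archive/Shared Documents/report.pdf this yields a path ending in %2Fsites%2Farchive%2FShared%20Documents%2Freport.pdf.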


Mismatch runbook

Step 1 — Identify the scope

# Get the count of quarantined items
curl "http://localhost:5010/_api/archive/integrity/sites/{archiveSiteId}/status" \
  -H "Authorization: Basic ..."

# List the quarantined items
curl "http://localhost:5010/_api/archive/integrity/sites/{archiveSiteId}/mismatches" \
  -H "Authorization: Basic ..."

Step 2 — Investigate each quarantined item

Common root causes:

| Symptom | Likely cause | Action |
| --- | --- | --- |
| mismatchCount = 1 | Transient storage blip | Retrigger the walk; if it clears, no action needed |
| mismatchCount >= 3, consistent hash | Storage-level corruption | Restore from backup; remove the quarantine flag by re-importing the item |
| All items in a library quarantined at once | Bulk re-encode or migration issue | Cross-check with the audit log (ARCHIVE_AUDIT.md) for batch events |
| New items only | Import tool bug | Audit /_api/archive/audit/query for ItemImported events near the timestamp |

Step 3 — Clear a quarantine (after remediation)

Re-POST the corrected integrity record with the updated hash. The upsert will overwrite the quarantined record, resetting its status to Pending, and the next walk pass will re-verify it.

curl -X POST http://localhost:5010/_api/archive/integrity/records \
  -H "Authorization: Basic ..." \
  -H "Content-Type: application/json" \
  -d '[{"itemKey":"...","archiveSiteId":"...","contentSha256":"<new hash>"}]'

ControlCenter dashboard

Navigate to Archive → Integrity in the ControlCenter (/Archive/Integrity). The dashboard shows a card per registered archive site, with:

  • Total / Verified / Quarantined / Pending counts
  • Walk-pass status and progress bar (live-updated via SignalR)
  • Trigger Walk button to start an immediate pass
  • View Mismatches link to the mismatch detail page

The SignalR connection status is shown as a dot in the page header (green = connected, grey = polling).


Storage layout

Integrity records are stored in a separate store (IIntegrityStore) that is independent of the main Cesivi storage engine. Two implementations ship:

| Implementation | Used when | Storage location |
| --- | --- | --- |
| InMemoryIntegrityStore | In-memory storage provider | Transient; lost on restart |
| FileSystemIntegrityStore | FileSystem storage provider | <data-root>/integrity/<archiveSiteId>/records-NNN.jsonl |

Future v1.3 adapters (S3, Azure Blob) can implement IIntegrityStore using the contract tests in Cesivi.Server.Tests/Integrity/Contract/IntegrityStoreContractTests.cs as the verification harness.
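
To illustrate how the FileSystem layout and JsonlCompactionThreshold might interact, the sketch below maps a record's ordinal position to a segment file. The numbering scheme is an assumption: NNN is read here as a zero-based, three-digit, zero-padded segment index, which this document does not confirm. segment_path is a hypothetical helper.

```python
from pathlib import PurePosixPath


def segment_path(
    data_root: str, archive_site_id: str, record_index: int, records_per_segment: int = 50000
) -> PurePosixPath:
    """Return the JSONL segment that would hold the record at record_index."""
    # Assumption: segments are numbered from 000 and roll over every
    # records_per_segment records (the JsonlCompactionThreshold default is 50000).
    segment = record_index // records_per_segment
    return PurePosixPath(data_root) / "integrity" / archive_site_id / f"records-{segment:03d}.jsonl"
```

Under these assumptions the first 50,000 records of a site land in records-000.jsonl and record 50,000 (zero-based) starts records-001.jsonl.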


See also

  • Archive Admin Bundle — ControlCenter Quick Tour
  • Archive Tools Operator Guide
  • Tutorial G — SharePoint On-Premises Retirement Archive
  • Cesivi Archive Variant A — Whitepaper
  • Compliance Cookbook — HIPAA/GDPR/SOX/FRCP
  • Archive Cluster Deployment Guide