Archive Integrity Verification (PLAN-1610)¶
Introduced: v1.2 (PLAN-1610)
Long-term integrity verification lets you continuously check that archived content has not been silently corrupted. A background walker re-hashes stored files and compares them against the SHA-256 fingerprints recorded at import time. An item whose hash no longer matches is moved into Quarantine (after QuarantineThreshold consecutive mismatches, 1 by default), which triggers an alert and blocks read access until the item is remediated or overridden.
How it works¶
- Record ingestion: The archive-import tool calls POST /_api/archive/integrity/records immediately after writing each content item. Each record stores the item key, archive site ID, SHA-256 hash of the content, and the hash of the item metadata.
- Background walker: IntegrityVerificationService (a hosted background service) wakes up on a configurable cadence (default: every 24 h), enumerates all pending or previously-verified records, re-reads the stored content, and re-hashes it. Each item is classified as Verified, Quarantined, or Pending.
- On-access gate: ArchiveReadIntegrityGate intercepts every content-download request. Items in Quarantined status are blocked with HTTP 423 (Locked). The blocked response body includes the quarantine reason and the most recent mismatch count.
- Live progress: While a walk pass is running, the walker broadcasts WalkProgress events over SignalR (/signalr/integrity) so the ControlCenter dashboard updates in real time.
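The re-hash-and-compare step at the heart of the walker can be illustrated with a minimal Python sketch. It is not the service's actual code: the function names and the 1 MiB chunk size are assumptions chosen for illustration, and only the use of SHA-256 comes from the description above.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # read in 1 MiB chunks; an arbitrary choice for this sketch


def sha256_of_stream(stream: BinaryIO) -> str:
    """Return the lowercase hex SHA-256 digest of everything readable from stream."""
    digest = hashlib.sha256()
    while chunk := stream.read(CHUNK_SIZE):
        digest.update(chunk)
    return digest.hexdigest()


def content_matches(stream: BinaryIO, recorded_sha256: str) -> bool:
    """Re-hash the stored content and compare it with the recorded fingerprint."""
    return sha256_of_stream(stream) == recorded_sha256.lower()


# Usage: fingerprint at "import time", then verify an intact and a corrupted copy.
recorded = sha256_of_stream(io.BytesIO(b"quarterly report"))
intact = content_matches(io.BytesIO(b"quarterly report"), recorded)
corrupted = content_matches(io.BytesIO(b"quarterly rep0rt"), recorded)
```

Streaming the content in chunks keeps memory use flat for large archived files, which is why the sketch avoids reading the whole file at once.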
Configuration knobs¶
All settings live under the Cesivi:ArchiveIntegrity section of appsettings.json, or can be supplied as environment variables using the double-underscore form shown in the table below.
| Key | Default | Description |
|---|---|---|
| Cesivi__ArchiveIntegrity__WalkIntervalHours | 24 | Hours between automatic walk passes. Set to 0 to disable automatic walks. |
| Cesivi__ArchiveIntegrity__WalkConcurrency | 4 | Number of records to hash concurrently per walk pass. Higher values increase throughput but also CPU and I/O load. |
| Cesivi__ArchiveIntegrity__BlockOnQuarantine | true | Whether to return HTTP 423 for quarantined items. Set to false for log-only mode (no blocking). |
| Cesivi__ArchiveIntegrity__QuarantineThreshold | 1 | Number of consecutive mismatches before an item is quarantined. Useful for tolerating transient storage hiccups. |
| Cesivi__ArchiveIntegrity__JsonlCompactionThreshold | 50000 | FileSystem store: number of records per JSONL segment before a new segment is started. |
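To make the effect of QuarantineThreshold concrete, here is a small Python sketch of how consecutive mismatches could be counted. The RecordState type, its field names, and the rule that a clean pass resets the counter and marks the item Verified are assumptions for illustration; the service's real state machine may differ.

```python
from dataclasses import dataclass


@dataclass
class RecordState:
    status: str = "Pending"          # "Pending", "Verified", or "Quarantined"
    consecutive_mismatches: int = 0  # hypothetical counter, reset by a clean pass


def apply_walk_result(state: RecordState, hash_matches: bool,
                      quarantine_threshold: int = 1) -> RecordState:
    """Update a record after one walk pass (assumed semantics, see lead-in)."""
    if hash_matches:
        state.consecutive_mismatches = 0
        state.status = "Verified"
        return state
    state.consecutive_mismatches += 1
    if state.consecutive_mismatches >= quarantine_threshold:
        state.status = "Quarantined"
    # Below the threshold the previous status is left unchanged in this sketch.
    return state
```

With the default threshold of 1, the first mismatch quarantines the item; with a threshold of 3, two transient failures followed by a clean pass never quarantine it.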
Example: disable automatic walks¶
{
  "Cesivi": {
    "ArchiveIntegrity": {
      "WalkIntervalHours": 0
    }
  }
}
Example: log-only mode (no blocking)¶
{
  "Cesivi": {
    "ArchiveIntegrity": {
      "BlockOnQuarantine": false
    }
  }
}
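The interaction between the on-access gate and BlockOnQuarantine can be sketched as a single decision function. This is an illustrative Python model only: the function name and the body field names (reason, mismatchCount) are assumptions, while the 423 status and the log-only behaviour come from the settings described above.

```python
from typing import Optional, Tuple


def gate_decision(status: str, block_on_quarantine: bool,
                  reason: str = "", mismatch_count: int = 0) -> Tuple[int, Optional[dict]]:
    """Return (HTTP status, response body) for a content-download request."""
    if status == "Quarantined" and block_on_quarantine:
        # Blocked: the body carries the quarantine reason and mismatch count.
        return 423, {"reason": reason, "mismatchCount": mismatch_count}
    # Verified or Pending items, and quarantined items in log-only mode, pass through.
    return 200, None
```

In log-only mode the download proceeds normally, so operators would rely on alerts and the dashboard rather than on blocked requests to notice quarantined items.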
Walk cadence¶
The background service is timer-based and fires every WalkIntervalHours hours. Each invocation:
- Queries IIntegrityStore.GetStatsAsync for all registered archive sites.
- For each site, enumerates records using a resumable cursor (persisted in IIntegrityStore via UpdateWalkProgressAsync).
- Checkpoints progress every 1,000 records so that, after a crash or restart, the pass resumes from the last checkpoint instead of starting over.
- Marks the pass Completed or Failed in IntegrityWalkProgress.
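The checkpoint-and-resume behaviour described in the steps above can be sketched in Python as follows. The progress dictionary stands in for the persisted walk progress, and its keys (cursor, mismatchCount, status) as well as the walk_site function are assumptions for illustration, not the service's actual types.

```python
from typing import Callable, Dict, Sequence

CHECKPOINT_EVERY = 1000  # matches the checkpoint interval described above


def walk_site(record_keys: Sequence[str],
              verify: Callable[[str], bool],
              progress: Dict[str, object]) -> Dict[str, object]:
    """Verify records starting at the last checkpoint stored in progress."""
    cursor = int(progress.get("cursor", 0))
    mismatches = int(progress.get("mismatchCount", 0))
    try:
        for index in range(cursor, len(record_keys)):
            if not verify(record_keys[index]):
                mismatches += 1
            done = index + 1
            if done % CHECKPOINT_EVERY == 0:
                # Persist the cursor so a later call can resume from here.
                progress.update(cursor=done, mismatchCount=mismatches,
                                status="InProgress")
        progress.update(cursor=len(record_keys), mismatchCount=mismatches,
                        status="Completed")
    except Exception:
        # Keep the last checkpointed cursor; only the status changes.
        progress["status"] = "Failed"
    return progress
```

Calling walk_site again with the same progress dictionary after a failure re-verifies at most the records since the last checkpoint, which is the trade-off a fixed checkpoint interval makes between write volume and repeated work.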
Manual trigger¶
An operator or script can trigger an immediate walk pass for a specific site:
curl -X POST http://localhost:5010/_api/archive/integrity/sites/{archiveSiteId}/walks/run \
-H "Authorization: Basic <base64(SHAREPOINT\administrator:password)>" \
-H "Content-Type: application/json" \
-d '{}'
The response is 202 Accepted with a walkPassId that can be polled via GET .../current-walk.
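A script that triggers a walk usually wants to wait for it to finish. The following Python sketch shows one way to poll, under stated assumptions: fetch_current_walk is a hypothetical callable that performs the GET .../current-walk request and returns the parsed JSON, and the terminal status strings Completed and Failed are assumed to appear in the status field.

```python
import time
from typing import Callable, Dict


def wait_for_walk(fetch_current_walk: Callable[[], Dict],
                  poll_seconds: float = 5.0,
                  max_polls: int = 120,
                  sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll until the walk reports a terminal status or the budget runs out."""
    for _ in range(max_polls):
        walk = fetch_current_walk()
        if walk.get("status") in ("Completed", "Failed"):
            return walk
        sleep(poll_seconds)
    raise TimeoutError("walk pass did not finish within the polling budget")
```

Passing the HTTP call and the sleep function in as parameters keeps the loop easy to test and lets the caller decide how authentication and retries are handled.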
REST API reference¶
All endpoints require site-administrator credentials.
POST /_api/archive/integrity/records¶
Ingest integrity records from the migration tool (up to 500 per request).
Request body:
[
  {
    "itemKey": "/sites/archive/Shared Documents/report.pdf",
    "archiveSiteId": "00000000-0000-0000-0000-000000000001",
    "listId": "00000000-0000-0000-0000-000000000002",
    "itemId": 42,
    "contentSha256": "aabbccddeeff...",
    "metadataSha256": "11223344...",
    "hashAlgorithm": "sha256",
    "capturedAt": "2026-01-01T00:00:00Z"
  }
]
Response: 202 Accepted — { "accepted": 1, "submitted": 1 }
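Because each request accepts at most 500 records, a client importing a large library needs to split its records into batches. A minimal Python sketch of that splitting follows; the helper name batches is an assumption, and the actual HTTP submission is left out.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

MAX_RECORDS_PER_REQUEST = 500  # per-request limit stated above


def batches(records: Sequence[T],
            size: int = MAX_RECORDS_PER_REQUEST) -> Iterator[List[T]]:
    """Yield consecutive slices of records, each at most size items long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(records), size):
        yield list(records[start:start + size])
```

Each yielded list can be serialized as the JSON array shown in the request body and sent in its own POST.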
GET /_api/archive/integrity/sites/{archiveSiteId}/status¶
Returns aggregate statistics for a site.
Response:
{
  "totalItems": 12450,
  "verifiedItems": 12100,
  "quarantinedItems": 3,
  "pendingItems": 347
}
GET /_api/archive/integrity/sites/{archiveSiteId}/mismatches¶
Lists quarantined items. Supports $skip, $top, and since query parameters.
Response:
{
  "value": [ /* IntegrityRecord objects */ ],
  "count": 3,
  "skip": 0,
  "top": 100
}
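Paging through a long mismatch list means advancing $skip by $top until count is reached. The Python sketch below builds the request URLs; the helper names are hypothetical, and the ISO 8601 format used for since in the usage note is an assumption modelled on the capturedAt field.

```python
from typing import List, Optional
from urllib.parse import urlencode


def mismatches_url(base_url: str, archive_site_id: str, skip: int = 0,
                   top: int = 100, since: Optional[str] = None) -> str:
    """Build the mismatches URL with $skip, $top, and optional since."""
    query = {"$skip": skip, "$top": top}
    if since is not None:
        query["since"] = since
    # Keep "$" and ":" literal so the parameters read as documented.
    return (f"{base_url}/_api/archive/integrity/sites/{archive_site_id}"
            f"/mismatches?{urlencode(query, safe='$:')}")


def page_offsets(count: int, top: int = 100) -> List[int]:
    """Return the $skip values needed to read count items, top at a time."""
    return list(range(0, count, top))
```

For example, a count of 250 with the default page size needs the offsets 0, 100, and 200, and a since value such as 2026-01-01T00:00:00Z is appended unchanged to the query string.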
GET /_api/archive/integrity/sites/{archiveSiteId}/current-walk¶
Returns the current or most recent walk pass status.
Response (walk in progress):
{
  "walkPassId": "walk-2026-05-27-001",
  "archiveSiteId": "...",
  "completedItems": 8200,
  "mismatchCount": 1,
  "status": "InProgress"
}
Response (no walk yet):
{ "walkInProgress": false }
POST /_api/archive/integrity/sites/{archiveSiteId}/walks/run¶
Triggers an immediate background walk pass. Returns immediately.
Response: 202 Accepted — { "walkPassId": "...", "accepted": true }
GET /_api/archive/integrity/items/{itemKey}¶
Returns the integrity record for a specific item. URL-encode the item key.
Response: 200 OK — IntegrityRecord JSON
Error: 404 Not Found if no record exists for that key.
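Item keys contain slashes and spaces, so they must be percent-encoded before being placed in the path. A short Python sketch follows; the item_url helper is hypothetical, and encoding the slashes as well (safe='') is an assumption about what the server expects.

```python
from urllib.parse import quote


def item_url(base_url: str, item_key: str) -> str:
    """Build the item lookup URL with a fully percent-encoded item key."""
    # safe="" also encodes "/" so the key stays a single path segment.
    return f"{base_url}/_api/archive/integrity/items/{quote(item_key, safe='')}"


url = item_url("http://localhost:5010",
               "/sites/archive/Shared Documents/report.pdf")
# url ends with: %2Fsites%2Farchive%2FShared%20Documents%2Freport.pdf
```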
Mismatch runbook¶
Step 1 — Identify the scope¶
# Get the count of quarantined items
curl http://localhost:5010/_api/archive/integrity/sites/{archiveSiteId}/status \
  -H "Authorization: Basic ..."
# List the quarantined items
curl "http://localhost:5010/_api/archive/integrity/sites/{archiveSiteId}/mismatches" \
  -H "Authorization: Basic ..."
Step 2 — Investigate each quarantined item¶
Common root causes:
| Symptom | Likely cause | Action |
|---|---|---|
| mismatchCount = 1 | Transient storage blip | Retrigger the walk; if it clears, no action needed |
| mismatchCount >= 3, consistent hash | Storage-level corruption | Restore from backup; remove the quarantine flag by re-importing the item |
| All items in a library quarantined at once | Bulk re-encode or migration issue | Cross-check with the audit log (ARCHIVE_AUDIT.md) for batch events |
| New items only | Import tool bug | Audit /_api/archive/audit/query for ItemImported events near the timestamp |
Step 3 — Clear a quarantine (after remediation)¶
Re-POST the corrected integrity record with the updated hash. The upsert will overwrite the quarantined record, resetting its status to Pending, and the next walk pass will re-verify it.
curl -X POST http://localhost:5010/_api/archive/integrity/records \
-H "Authorization: Basic ..." \
-H "Content-Type: application/json" \
-d '[{"itemKey":"...","archiveSiteId":"...","contentSha256":"<new hash>"}]'
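The new hash in that request body has to be computed from the restored content. A minimal Python sketch of building the payload is shown below; the remediation_payload helper is hypothetical, and it includes only the three fields used in the curl example above.

```python
import hashlib
import json


def remediation_payload(item_key: str, archive_site_id: str,
                        restored_content: bytes) -> str:
    """Return the JSON array body for re-posting one corrected record."""
    record = {
        "itemKey": item_key,
        "archiveSiteId": archive_site_id,
        # Lowercase hex SHA-256 of the restored bytes.
        "contentSha256": hashlib.sha256(restored_content).hexdigest(),
    }
    return json.dumps([record])
```

The returned string can be passed as the -d argument of the curl command, or sent with any HTTP client that sets Content-Type: application/json.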
ControlCenter dashboard¶
Navigate to Archive → Integrity in the ControlCenter (/Archive/Integrity). The dashboard shows a card per registered archive site, with:
- Total / Verified / Quarantined / Pending counts
- Walk-pass status and progress bar (live-updated via SignalR)
- Trigger Walk button to start an immediate pass
- View Mismatches link to the mismatch detail page
The SignalR connection status is shown as a dot in the page header (green = connected, grey = polling).
Storage layout¶
Integrity records are stored in a separate store (IIntegrityStore) that is independent of the main Cesivi storage engine. Two implementations ship:
| Implementation | Used when | Storage location |
|---|---|---|
| InMemoryIntegrityStore | In-memory storage provider | Transient; lost on restart |
| FileSystemIntegrityStore | FileSystem storage provider | <data-root>/integrity/<archiveSiteId>/records-NNN.jsonl |
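The records-NNN.jsonl naming together with JsonlCompactionThreshold implies that records are spread over numbered segment files. The Python sketch below shows one plausible mapping from a record's ordinal position to its segment file. Zero-based numbering and three-digit zero padding are assumptions read off the NNN placeholder, not confirmed behaviour of FileSystemIntegrityStore.

```python
def segment_file_name(record_ordinal: int,
                      records_per_segment: int = 50000) -> str:
    """Map a zero-based record ordinal to its assumed JSONL segment file name."""
    if record_ordinal < 0:
        raise ValueError("record_ordinal must not be negative")
    if records_per_segment <= 0:
        raise ValueError("records_per_segment must be positive")
    # A new segment starts once the current one holds records_per_segment records.
    segment_index = record_ordinal // records_per_segment
    return f"records-{segment_index:03d}.jsonl"
```

Under these assumptions, the first 50,000 records of a site land in records-000.jsonl and the next record opens records-001.jsonl.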
Future v1.3 adapters (S3, Azure Blob) can implement IIntegrityStore using the contract tests in Cesivi.Server.Tests/Integrity/Contract/IntegrityStoreContractTests.cs as the verification harness.
See also¶
- Archive Mode — Enabling read-only archive mode per site/list
- Archive Audit Log — WORM audit trail for archive events
- Archive Identity Resolution — Historical user/group identity mapping
- Archive Importer — MigrationTool integration guide
- Archive Admin Bundle — ControlCenter Quick Tour
- Archive Tools Operator Guide
- Tutorial G — SharePoint On-Premises Retirement Archive
- Cesivi Archive Variant A — Whitepaper
- Compliance Cookbook — HIPAA/GDPR/SOX/FRCP
- Archive Cluster Deployment Guide