Archive Importer — Operator Guide¶
Plan: PLAN-1608 v1.2 G2
Status: Shipped (v1.2)
Overview¶
Cesivi.MigrationTool archive-import imports a live SharePoint farm into a Cesivi
instance with full archive semantics: every list, item, version, attachment, content
type, field, permission, user, group, and term-store entry — frozen at import time.
After the import completes:
- archive_mode = true is set on all imported webs and lists.
- Write paths (POST, PUT, DELETE) reject with 423 Locked unless the X-Cesivi-Archive-Override: true header is present.
- Every user encountered has an identity snapshot in IIdentitySnapshotStore.
- Every role assignment is frozen into IArchivedAclStore.
- The Cesivi ControlCenter shows import progress at /Archive/ImportProgress.
- Every imported item has an ItemImported event in the durable WORM audit log. See ARCHIVE_AUDIT.md for querying and verifying the log.
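The write-gate rule in the list above can be pictured as a small decision function. This is a minimal sketch, not the server's implementation: the name gate_status, the use of 200 to mean "request passes through", and the case-insensitive comparison of the header value are assumptions made for illustration. Only the three methods, the 423 status, and the header name come from this guide.

```python
WRITE_METHODS = {"POST", "PUT", "DELETE"}   # write paths named in this guide
OVERRIDE_HEADER = "X-Cesivi-Archive-Override"


def gate_status(method, headers, archive_mode=True):
    """Return the HTTP status a gated request would see (hypothetical helper).

    200 stands in for "the request is handed to the normal handler".
    """
    if not archive_mode or method.upper() not in WRITE_METHODS:
        return 200
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    # Assumption: the header value is matched case-insensitively against "true".
    if normalised.get(OVERRIDE_HEADER.lower(), "").strip().lower() == "true":
        return 200
    return 423  # Locked


print(gate_status("GET", {}))                           # 200
print(gate_status("POST", {}))                          # 423
print(gate_status("POST", {OVERRIDE_HEADER: "true"}))   # 200
```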
Audit sink modes: The default sink (DurableAuditEventSink) writes all events
to a sealed-segment JSONL journal under <data-root>/audit/. For testing, set
Cesivi:Audit:UseInMemorySink=true in appsettings.json to revert to the
volatile in-memory ring buffer (events are lost on restart).
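In .NET configuration, a colon-separated key such as Cesivi:Audit:UseInMemorySink corresponds to nested sections in appsettings.json. The sketch below shows that mapping on an in-memory dictionary. The helper name set_config_key and the Logging content are hypothetical, and a real appsettings.json will contain other sections that should be left untouched.

```python
import json


def set_config_key(settings, key, value):
    """Set a colon-separated .NET-style configuration key in a nested dict."""
    parts = key.split(":")
    node = settings
    for part in parts[:-1]:
        node = node.setdefault(part, {})   # create intermediate sections as needed
    node[parts[-1]] = value
    return settings


settings = {"Logging": {"LogLevel": {"Default": "Information"}}}  # example content only
set_config_key(settings, "Cesivi:Audit:UseInMemorySink", True)
print(json.dumps(settings, indent=2))
```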
Prerequisites¶
| Requirement | Notes |
|---|---|
| Cesivi target instance running | Must be reachable from the machine running the tool |
| Target Cesivi admin credentials | admin:admin on a default local dev instance; for production, pass real credentials via --target-user / --target-password (when omitted, the tool reuses the source credentials) |
| Source SharePoint URL + credentials | Site Collection Administrator permissions required |
| .NET 10.0 SDK | dotnet --version must show 10.x |
cd Cesivi.MigrationTool
dotnet build
Quick-Start¶
1. Dry-run (validate, no writes)¶
Connects to the source farm and walks all lists/items/files without writing anything to the target. Use this to estimate size and catch connection errors.
dotnet run -- archive-import \
--source "https://intranet.contoso.com" \
--username "CONTOSO\administrator" \
--password "Passw0rd" \
--dry-run
2. Full import¶
dotnet run -- archive-import \
--source "https://intranet.contoso.com" \
--username "CONTOSO\administrator" \
--password "Passw0rd" \
--target "http://localhost:11000" \
--target-user "admin" \
--target-password "admin"
--target is the Cesivi server URL. --target-user / --target-password default
to the source credentials when omitted.
3. Resume an interrupted import¶
If the tool is killed (power loss, OOM, manual Ctrl+C), restart with --resume:
dotnet run -- archive-import \
--source "https://intranet.contoso.com" \
--username "CONTOSO\administrator" \
--password "Passw0rd" \
--target "http://localhost:11000" \
--resume
The tool reads the checkpoint store, skips every item that was already imported, and continues with the first item after the last completed one.
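The resume behaviour can be sketched as a filter over the item sequence. This is an illustration under two assumptions: that the checkpoint holds the ID of the last completed item, and that the source enumerates items in the same order on every run. The function name items_to_import is hypothetical.

```python
def items_to_import(ordered_item_ids, last_completed_id=None):
    """Yield the items that still need importing after a resume.

    Assumes a stable enumeration order; if the order changes between runs,
    skipping up to the checkpoint can miss or repeat items.
    """
    if last_completed_id is None:
        yield from ordered_item_ids      # fresh run, nothing to skip
        return
    skipping = True
    for item_id in ordered_item_ids:
        if skipping:
            if item_id == last_completed_id:
                skipping = False         # boundary reached; the next item is new work
            continue
        yield item_id


print(list(items_to_import([101, 102, 103, 104], last_completed_id=102)))  # [103, 104]
```

If the checkpointed ID no longer exists in the source, this sketch yields nothing, which is one reason a changed source between runs calls for --restart rather than --resume.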
To restart from scratch (wipe the checkpoint):
dotnet run -- archive-import … --restart
4. Import without enabling archive mode¶
By default the tool sets archive_mode = true on every imported web and list.
Pass --no-set-archive-mode to skip that step (useful for testing):
dotnet run -- archive-import \
--source "https://intranet.contoso.com" \
--username "CONTOSO\administrator" \
--password "Passw0rd" \
--target "http://localhost:11000" \
--no-set-archive-mode
Command-Line Reference¶
| Option | Description | Required |
|---|---|---|
| --source <url> | Source SharePoint URL | Yes |
| --username <user> | Source farm username | Yes |
| --password <pwd> | Source farm password | Yes |
| --target <url> | Target Cesivi URL | Yes (unless --dry-run) |
| --target-user <user> | Target Cesivi username | No (defaults to --username) |
| --target-password <pwd> | Target Cesivi password | No (defaults to --password) |
| --no-set-archive-mode | Skip enabling archive_mode on imported webs/lists | No |
| --resume | Resume from the last checkpoint | No |
| --restart | Wipe checkpoint and import from scratch | No |
| --dry-run | Validate without writing to target | No |
| --checkpoint-path <path> | Override default checkpoint directory | No |
| --export-path <path> | Override local directory for exported JSON files | No |
| --kill-at-item <n> | Test hook: stop cleanly after N items (for resume tests) | No |
| --log-json | Write Serilog JSON log to logs/ | No |
| --log-text | Write Serilog text log to logs/ | No |
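When the importer is driven from a script, the options above can be assembled into an argument list before launching the process. The wrapper below is a hypothetical convenience, not part of the tool. It uses only flags from the table, enforces the documented rule that --target is required unless --dry-run is set, and additionally assumes that --resume and --restart should not be combined.

```python
def build_archive_import_args(source, username, password, target=None,
                              dry_run=False, resume=False, restart=False,
                              checkpoint_path=None):
    """Build the argument list for `dotnet run -- archive-import` (hypothetical wrapper)."""
    if target is None and not dry_run:
        raise ValueError("--target is required unless --dry-run is set")
    if resume and restart:
        # Assumption: the two flags contradict each other, so reject the pair.
        raise ValueError("--resume and --restart should not be combined")
    args = ["dotnet", "run", "--", "archive-import",
            "--source", source, "--username", username, "--password", password]
    if target is not None:
        args += ["--target", target]
    if checkpoint_path is not None:
        args += ["--checkpoint-path", checkpoint_path]
    for flag, enabled in (("--dry-run", dry_run), ("--resume", resume),
                          ("--restart", restart)):
        if enabled:
            args.append(flag)
    return args


print(build_archive_import_args("https://intranet.contoso.com",
                                "CONTOSO\\administrator", "Passw0rd",
                                dry_run=True))
```

The resulting list can be handed to subprocess.run(args, cwd="Cesivi.MigrationTool", check=True). Passing a list rather than a single command string avoids shell quoting problems with values such as CONTOSO\administrator.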
What Gets Imported¶
The importer exports the source farm to a local JSON package using the same
DataExporter as the regular export command, then replays that package into
the target Cesivi instance via REST API calls.
Data imported:
- Webs (sub-sites)
- Lists and document libraries
- List items (all fields)
- Document versions
- Files and attachments
- Content types
- Fields (list-level and web-level)
- Role definitions and role assignments (frozen ACL)
- Users and groups
- Term store entries
- Navigation settings, property bag, regional settings
Archive-specific actions taken for each user encountered:
- POST /_api/web/ensureuser: register the user in the target
- POST /_api/archive/identity-snapshots: capture an identity snapshot with displayName, upn, email, and primaryGroups as they exist on the source
Archive-specific actions taken for each list/item with unique permissions:
- POST /_api/archive/acls: freeze role assignments at import time
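To make the identity snapshot concrete, here is a sketch of how a source user record might be mapped to a snapshot body. The four field names displayName, upn, email, and primaryGroups come from this guide. Everything else is an assumption: the source keys (Title, UserPrincipalName, Email, Groups), the presence of farmId in the body, and the function name. The actual request schema may differ.

```python
import json


def build_identity_snapshot(source_user, farm_id):
    """Map a source user record to a snapshot body (field layout is assumed)."""
    return {
        "farmId": farm_id,  # assumption: this guide only shows farmId as a query parameter
        "displayName": source_user.get("Title", ""),
        "upn": source_user.get("UserPrincipalName", ""),
        "email": source_user.get("Email", ""),
        # Copy the list so later changes to the source record do not leak in.
        "primaryGroups": list(source_user.get("Groups", [])),
    }


user = {"Title": "Ada Example", "UserPrincipalName": "ada@contoso.com",
        "Email": "ada@contoso.com", "Groups": ["Intranet Members"]}
print(json.dumps(build_identity_snapshot(user, "farm-01"), indent=2))
```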
Verifying the Import¶
ControlCenter dashboard¶
Open http://localhost:11000/Archive/ImportProgress (or the equivalent path on
your Cesivi instance) to see:
- Webs, lists, items, files processed
- Identity snapshots captured
- ACL records frozen
- Errors (if any)
REST verification¶
# Check identity snapshots for a farm
curl -u admin:admin \
"http://localhost:11000/_api/archive/identity-snapshots?farmId=<farmId>"
# Check frozen ACL count
curl -u admin:admin \
"http://localhost:11000/_api/archive/acls?farmId=<farmId>"
# Check archive_mode on the root web
curl -u admin:admin \
"http://localhost:11000/_api/web" \
-H "Accept: application/json"
# → "ArchiveMode": true in the response
Write-gate test¶
After a successful import, write operations must be rejected:
# Should return 423 Locked
curl -u admin:admin -X POST \
"http://localhost:11000/_api/web/lists/getbytitle('Documents')/items" \
-H "Content-Type: application/json" \
-d '{"Title":"test"}'
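The same check can be scripted. The sketch below mirrors the curl call with Python's standard library and reports whether the server answered 423 Locked. The function name write_is_locked and the injectable opener parameter (useful for testing without a server) are illustrative choices. Note that on an instance where archive mode is off, the request creates a real item.

```python
import base64
import json
import urllib.error
import urllib.request


def write_is_locked(base_url, user, password, list_title="Documents",
                    opener=urllib.request.urlopen):
    """POST a throwaway item and report whether the server answers 423 Locked."""
    url = f"{base_url}/_api/web/lists/getbytitle('{list_title}')/items"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        data=json.dumps({"Title": "test"}).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},   # same as curl -u
        method="POST",
    )
    try:
        with opener(request, timeout=10):
            return False          # 2xx: the write went through, the gate is not active
    except urllib.error.HTTPError as err:
        return err.code == 423    # any other error status is not the write gate
```

A typical call would be write_is_locked("http://localhost:11000", "admin", "admin"). A False result deserves a closer look, since it covers both "write succeeded" and "rejected for another reason", such as 401 or 403.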
Common Errors and Remediation¶
"Access denied" on source¶
Cause: The username does not have Site Collection Administrator permissions.
Fix: Add the account to Site Collection Administrators on the source farm, or use a higher-privileged account.
"Connection refused" on target¶
Cause: Cesivi is not running, or the --target URL is wrong.
Fix: Start Cesivi with cd Cesivi.Server && dotnet run, then verify the URL with curl http://localhost:11000/_api/web.
"archive_mode could not be set" in summary¶
Cause: The target Cesivi user does not have Site Admin privileges.
Fix: Use admin credentials via --target-user / --target-password.
Resume produces duplicate items¶
Cause: --resume was used but the checkpoint was corrupted or the source data changed between runs.
Fix: Use --restart to wipe the checkpoint and start fresh.
Identity snapshots count is 0 after import¶
Cause: The source farm returned users with empty SIDs (common with claims-based auth).
The importer falls back to the LoginName field as the snapshot key.
Fix: Verify by calling GET /_api/archive/identity-snapshots?farmId=<farmId>. If still empty, check that the source farm exposes user information via CSOM (/_vti_bin/client.svc).
"Unknown user (id=...)" shown for all archived users¶
Cause: NullFederatedIdentityLookup is the default (see _docs/ARCHIVE_IDENTITY.md → "Production federation"). Live IDP lookups are disabled; only snapshots are used.
Fix: Identity snapshots must have been captured at import time. Verify the snapshot count. If snapshots are present but users still show as "Unknown", check that the Sid / LoginName in the snapshot matches the value stored in the archived items.
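The key-mismatch case is easier to see with a sketch of three-tier resolution (Live, then Snapshot, then Unknown). This is an illustration only: the function name, the dictionary-based snapshot store, and the example login names are hypothetical. Passing no live lookup plays the role of NullFederatedIdentityLookup.

```python
def resolve_display_name(user_key, snapshots, live_lookup=None):
    """Resolve an archived user through Live, then Snapshot, then Unknown."""
    if live_lookup is not None:
        live = live_lookup(user_key)     # tier 1: live identity provider
        if live:
            return live
    snapshot = snapshots.get(user_key)   # tier 2: snapshot captured at import
    if snapshot:
        return snapshot["displayName"]
    return f"Unknown user (id={user_key})"  # tier 3: nothing matched


snapshots = {"i:0#.w|contoso\\ada": {"displayName": "Ada Example"}}
print(resolve_display_name("i:0#.w|contoso\\ada", snapshots))   # Ada Example
print(resolve_display_name("CONTOSO\\ada", snapshots))          # Unknown user (id=CONTOSO\ada)
```

The second call fails even though a snapshot for the same person exists, because the key stored on the item is not the key the snapshot was saved under.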
Large Imports¶
Checkpoint strategy¶
The checkpoint store (default: local filesystem) records the last completed item
ID. If the import is killed, --resume restarts from that boundary. Choose a
checkpoint path on fast local storage — network shares may cause atomic-write
failures.
dotnet run -- archive-import … \
--checkpoint-path "C:\Temp\CesiviImport\checkpoint"
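For background on why local storage matters, the usual way to make a checkpoint write atomic is to write a temporary file next to the target and rename it over the old file. The sketch below shows that general technique. It is not the tool's checkpoint format: the file name checkpoint.json and the lastCompletedItemId key are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(directory, last_completed_id):
    """Persist the checkpoint so readers never see a half-written file."""
    os.makedirs(directory, exist_ok=True)
    final_path = os.path.join(directory, "checkpoint.json")  # file name is hypothetical
    # Write to a temp file in the same directory so the rename stays on one volume.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump({"lastCompletedItemId": last_completed_id}, handle)
            handle.flush()
            os.fsync(handle.fileno())      # push bytes to disk before the rename
        os.replace(temp_path, final_path)  # rename over the old file (atomic on POSIX)
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)           # do not leave stray temp files behind
        raise
    return final_path


def load_checkpoint(directory):
    """Return the last completed item ID, or None when no checkpoint exists."""
    try:
        with open(os.path.join(directory, "checkpoint.json"), encoding="utf-8") as handle:
            return json.load(handle)["lastCompletedItemId"]
    except FileNotFoundError:
        return None
```

Some network file systems do not guarantee that the rename is atomic or that fsync reaches stable storage, which is consistent with the advice to keep the checkpoint on a local disk.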
Memory limits¶
The importer streams items using StreamListItemsAsync (bounded memory per
PLAN-1314). File downloads are streamed; no full-file buffering. For extremely large
document libraries (>10 GB), run on a machine with at least 4 GB free RAM.
Parallel imports¶
Import one site collection per run. Parallel imports of different site collections
into different Cesivi webs are supported — run multiple processes pointing to the
same --target URL; each uses its own checkpoint path.
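If checkpoint paths are set explicitly with --checkpoint-path, each parallel process needs a directory that no other process shares. One way to derive such directories is to hash the source URL, as sketched below. The function name, the site-<hash> naming scheme, and the example site URLs are hypothetical. Lower-casing the URL assumes the source treats paths case-insensitively.

```python
import hashlib
import os


def checkpoint_dir_for(source_url, base_dir):
    """Derive a stable, distinct checkpoint directory for one site collection."""
    # Assumption: URLs differing only in case or a trailing slash are the same site.
    normalised = source_url.strip().rstrip("/").lower()
    digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:12]
    return os.path.join(base_dir, f"site-{digest}")


sites = ["https://intranet.contoso.com/sites/hr",
         "https://intranet.contoso.com/sites/finance"]
for site in sites:
    print(site, "->", checkpoint_dir_for(site, "checkpoints"))
```

Because the directory depends only on the URL, a later --resume run for the same site collection finds the same checkpoint.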
Related Documentation¶
- Archive Identity Resolution: three-tier user resolution (Live / Snapshot / Unknown)
- Archive Mode: archive_mode flag semantics, write-gate, admin override
- ControlCenter Identity Dashboard
- Archive Retention Enforcement: retention capture at import, hard-gate enforcement, extension workflow (PLAN-1611)
> Retention note: Retention metadata is captured automatically during import using the site's configured RetentionPolicy. Each imported item receives a RetentionUntilUtc date computed from the anchor mode (ImportDate, ItemCreated, ItemModified, or CustomField) plus DefaultWindowDays. After import, the item cannot be deleted or modified via any API surface until its retention window expires. See ARCHIVE_RETENTION.md for policy configuration and the REST API reference.
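The retention date described in the note is the anchor date plus DefaultWindowDays. The sketch below works through that arithmetic. The function name, the item keys Created and Modified, and the custom_field parameter are assumptions for illustration. The four anchor modes are the ones listed in the note.

```python
from datetime import datetime, timedelta, timezone


def retention_until_utc(item, anchor_mode, default_window_days,
                        import_date, custom_field=None):
    """Compute RetentionUntilUtc as anchor date plus DefaultWindowDays."""
    if anchor_mode == "ImportDate":
        anchor = import_date
    elif anchor_mode == "ItemCreated":
        anchor = item["Created"]
    elif anchor_mode == "ItemModified":
        anchor = item["Modified"]
    elif anchor_mode == "CustomField":
        anchor = item[custom_field]   # the field name is site-specific
    else:
        raise ValueError(f"unknown anchor mode: {anchor_mode}")
    return anchor.astimezone(timezone.utc) + timedelta(days=default_window_days)


item = {"Created": datetime(2020, 1, 15, tzinfo=timezone.utc),
        "Modified": datetime(2023, 6, 1, tzinfo=timezone.utc)}
imported = datetime(2025, 3, 1, tzinfo=timezone.utc)
print(retention_until_utc(item, "ItemCreated", 365, imported))  # 2021-01-14 00:00:00+00:00
```

The example lands on 14 January rather than 15 January because 2020 is a leap year and the window is counted in days, not calendar years.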
See also: Archive Admin Bundle — ControlCenter Quick Tour
See also: Archive Tools Operator Guide
See also: Tutorial G — SharePoint On-Premises Retirement Archive
See also: Cesivi Archive Variant A — Whitepaper
See also: Compliance Cookbook — HIPAA/GDPR/SOX/FRCP
See also: Archive Cluster Deployment Guide