
Archive Importer — Operator Guide

Plan: PLAN-1608 v1.2 G2
Status: Shipped (v1.2)


Overview

Cesivi.MigrationTool archive-import imports a live SharePoint farm into a Cesivi instance with full archive semantics: every list, item, version, attachment, content type, field, permission, user, group, and term-store entry — frozen at import time.

After the import completes:

  • archive_mode = true is set on all imported webs and lists.
  • Write paths (POST, PUT, DELETE) reject with 423 Locked unless the X-Cesivi-Archive-Override: true header is present.
  • Every user encountered has an identity snapshot in IIdentitySnapshotStore.
  • Every role assignment is frozen into IArchivedAclStore.
  • The Cesivi ControlCenter shows import progress at /Archive/ImportProgress.
  • Every imported item has an ItemImported event in the durable WORM audit log. See ARCHIVE_AUDIT.md for querying and verifying the log.

Audit sink modes: The default sink (DurableAuditEventSink) writes all events to a sealed-segment JSONL journal under <data-root>/audit/. For testing, set Cesivi:Audit:UseInMemorySink=true in appsettings.json to revert to the volatile in-memory ring buffer (events are lost on restart).
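In .NET configuration, a colon-separated key such as Cesivi:Audit:UseInMemorySink corresponds to nested JSON sections. The sketch below only prints the fragment to merge into appsettings.json by hand; the helper name audit_inmemory_fragment is hypothetical and not part of Cesivi.

```shell
# Hypothetical helper: prints the nested JSON form of
# Cesivi:Audit:UseInMemorySink=true for merging into appsettings.json.
audit_inmemory_fragment() {
  cat <<'EOF'
{
  "Cesivi": {
    "Audit": {
      "UseInMemorySink": true
    }
  }
}
EOF
}

# Usage: audit_inmemory_fragment
```

Remove the setting again (or set it to false) before any production import, since the in-memory sink loses events on restart.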


Prerequisites

| Requirement | Notes |
| --- | --- |
| Cesivi target instance running | Must be reachable from the machine running the tool |
| Target Cesivi admin credentials | admin:admin on a local dev instance; set --target-user / --target-password for production |
| Source SharePoint URL + credentials | Site Collection Administrator permissions required |
| .NET 10.0 SDK | dotnet --version must show 10.x |

Build the tool before the first run:

cd Cesivi.MigrationTool
dotnet build

Quick-Start

1. Dry-run (validate, no writes)

Connects to the source farm and walks all lists/items/files without writing anything to the target. Use this to estimate size and catch connection errors.

dotnet run -- archive-import \
  --source "https://intranet.contoso.com" \
  --username "CONTOSO\administrator" \
  --password "Passw0rd" \
  --dry-run

2. Full import

dotnet run -- archive-import \
  --source "https://intranet.contoso.com" \
  --username "CONTOSO\administrator" \
  --password "Passw0rd" \
  --target  "http://localhost:11000" \
  --target-user "admin" \
  --target-password "admin"

--target is the Cesivi server URL. --target-user / --target-password default to the source credentials when omitted.
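The examples in this guide pass the source password as a literal. A minimal sketch of a wrapper that takes it from an environment variable instead, so it stays out of scripts and shell history; the function name archive_import and the variables SP_PASSWORD and DOTNET are assumptions made for illustration, not part of the tool. The password is still visible in the process list while the tool runs.

```shell
# Hypothetical wrapper around "dotnet run -- archive-import".
# SP_PASSWORD : source farm password (assumed variable name)
# DOTNET      : override for the dotnet executable (assumed, handy for testing)
archive_import() {
  if [ -z "${SP_PASSWORD:-}" ]; then
    echo "SP_PASSWORD is not set" >&2
    return 2
  fi
  # All remaining options (--source, --username, --target, ...) pass through.
  ${DOTNET:-dotnet} run -- archive-import --password "$SP_PASSWORD" "$@"
}

# Usage:
#   export SP_PASSWORD='...'
#   archive_import --source "https://intranet.contoso.com" \
#     --username "CONTOSO\administrator" --dry-run
```

Because --target-password defaults to --password, add an explicit --target-password when the target account differs from the source account.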

3. Resume an interrupted import

If the tool is killed (power loss, OOM, manual Ctrl+C), run the same command again with --resume added:

dotnet run -- archive-import \
  --source "https://intranet.contoso.com" \
  --username "CONTOSO\administrator" \
  --password "Passw0rd" \
  --target  "http://localhost:11000" \
  --resume

The tool reads the checkpoint store, skips items that were already imported, and continues from the last completed item.
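For unattended runs, the resume step can be automated. The sketch below runs a command once and, on failure, reruns it with --resume appended up to a fixed number of attempts. It assumes the tool exits with a non-zero status when interrupted, which this guide does not state; the function name run_with_resume is hypothetical.

```shell
# run_with_resume MAX_ATTEMPTS CMD [ARGS...]
# First attempt runs CMD as given; later attempts append --resume.
# Assumption: a failed or killed import exits with a non-zero status.
run_with_resume() {
  max="$1"
  shift
  attempt=1
  "$@" && return 0
  while [ "$attempt" -lt "$max" ]; do
    attempt=$((attempt + 1))
    echo "attempt $attempt: retrying with --resume" >&2
    "$@" --resume && return 0
  done
  return 1
}

# Usage:
#   run_with_resume 3 dotnet run -- archive-import \
#     --source "https://intranet.contoso.com" \
#     --username "CONTOSO\administrator" --password "Passw0rd" \
#     --target "http://localhost:11000"
```

If every attempt fails, inspect the logs before trying again rather than raising the attempt count.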

To discard the checkpoint and import from scratch, pass --restart instead of --resume (source and target options omitted here for brevity):

dotnet run -- archive-import --restart

4. Import without enabling archive mode

By default the tool sets archive_mode = true on every imported web and list. Pass --no-set-archive-mode to skip that step (useful for testing):

dotnet run -- archive-import \
  --source "https://intranet.contoso.com" \
  --username "CONTOSO\administrator" \
  --password "Passw0rd" \
  --target  "http://localhost:11000" \
  --no-set-archive-mode

Command-Line Reference

| Option | Description | Required |
| --- | --- | --- |
| --source <url> | Source SharePoint URL | Yes |
| --username <user> | Source farm username | Yes |
| --password <pwd> | Source farm password | Yes |
| --target <url> | Target Cesivi URL | Yes (unless --dry-run) |
| --target-user <user> | Target Cesivi username | No (defaults to --username) |
| --target-password <pwd> | Target Cesivi password | No (defaults to --password) |
| --no-set-archive-mode | Skip enabling archive_mode on imported webs/lists | No |
| --resume | Resume from the last checkpoint | No |
| --restart | Wipe checkpoint and import from scratch | No |
| --dry-run | Validate without writing to target | No |
| --checkpoint-path <path> | Override default checkpoint directory | No |
| --export-path <path> | Override local directory for exported JSON files | No |
| --kill-at-item <n> | Test hook: stop cleanly after N items (for resume tests) | No |
| --log-json | Write Serilog JSON log to logs/ | No |
| --log-text | Write Serilog text log to logs/ | No |

What Gets Imported

The importer exports the source farm to a local JSON package using the same DataExporter as the regular export command, then replays that package into the target Cesivi instance via REST API calls.

Data imported:

  • Webs (sub-sites)
  • Lists and document libraries
  • List items (all fields)
  • Document versions
  • Files and attachments
  • Content types
  • Fields (list-level and web-level)
  • Role definitions and role assignments (frozen ACL)
  • Users and groups
  • Term store entries
  • Navigation settings, property bag, regional settings

Archive-specific actions taken for each user encountered:

  1. POST /_api/web/ensureuser — register user in target
  2. POST /_api/archive/identity-snapshots — capture identity snapshot with displayName, upn, email, and primaryGroups as they exist on the source

Archive-specific actions taken for each list/item with unique permissions:

  1. POST /_api/archive/acls — freeze role assignments at import time

Verifying the Import

ControlCenter dashboard

Open http://localhost:11000/Archive/ImportProgress (or the equivalent path on your Cesivi instance) to see:

  • Webs, lists, items, files processed
  • Identity snapshots captured
  • ACL records frozen
  • Errors (if any)

REST verification

# Check identity snapshots for a farm
curl -u admin:admin \
  "http://localhost:11000/_api/archive/identity-snapshots?farmId=<farmId>"

# Check frozen ACL count
curl -u admin:admin \
  "http://localhost:11000/_api/archive/acls?farmId=<farmId>"

# Check archive_mode on the root web
curl -u admin:admin \
  "http://localhost:11000/_api/web" \
  -H "Accept: application/json"
# → "ArchiveMode": true in the response
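The ArchiveMode check can be scripted instead of read by eye. A minimal sketch, assuming the /_api/web response is JSON containing "ArchiveMode": true as noted above; the function name archive_mode_enabled is hypothetical, and the match is a plain text search rather than a JSON parse.

```shell
# Reads a /_api/web JSON response on stdin.
# Succeeds (exit 0) only if it contains "ArchiveMode": true.
archive_mode_enabled() {
  grep -Eq '"ArchiveMode"[[:space:]]*:[[:space:]]*true'
}

# Usage:
#   curl -s -u admin:admin -H "Accept: application/json" \
#     "http://localhost:11000/_api/web" | archive_mode_enabled \
#     && echo "archive mode is on" || echo "archive mode is OFF"
```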

Write-gate test

After a successful import, write operations must be rejected:

# Should return 423 Locked
curl -u admin:admin -X POST \
  "http://localhost:11000/_api/web/lists/getbytitle('Documents')/items" \
  -H "Content-Type: application/json" \
  -d '{"Title":"test"}'
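The same probe can report just the HTTP status, with and without the X-Cesivi-Archive-Override header described in the Overview. This is a sketch: the function names probe_write_gate and gate_verdict are hypothetical, and the status returned for an overridden write is assumed to be in the 2xx range.

```shell
# probe_write_gate BASE_URL [override]
# Prints only the HTTP status code of a test POST to the Documents list.
probe_write_gate() {
  url="$1/_api/web/lists/getbytitle('Documents')/items"
  if [ "${2:-}" = "override" ]; then
    curl -s -o /dev/null -w '%{http_code}' -u admin:admin -X POST \
      -H 'Content-Type: application/json' \
      -H 'X-Cesivi-Archive-Override: true' \
      -d '{"Title":"test"}' "$url"
  else
    curl -s -o /dev/null -w '%{http_code}' -u admin:admin -X POST \
      -H 'Content-Type: application/json' \
      -d '{"Title":"test"}' "$url"
  fi
}

# gate_verdict STATUS
# Maps a status code to: locked (423), writable (2xx), or unexpected.
gate_verdict() {
  case "$1" in
    423) echo "locked" ;;
    2??) echo "writable" ;;
    *)   echo "unexpected" ;;
  esac
}

# Usage:
#   gate_verdict "$(probe_write_gate http://localhost:11000)"
#   gate_verdict "$(probe_write_gate http://localhost:11000 override)"
```

Note that the override probe creates a real item if it succeeds, so run it only against a test instance.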

Common Errors and Remediation

"Access denied" on source

Cause: The username does not have Site Collection Administrator permissions.
Fix: Add the account to Site Collection Administrators on the source farm, or use a higher-privileged account.

"Connection refused" on target

Cause: Cesivi is not running, or the --target URL is wrong.
Fix: Start Cesivi (cd Cesivi.Server && dotnet run), then verify the URL with curl http://localhost:11000/_api/web.

"archive_mode could not be set" in summary

Cause: The target Cesivi user does not have Site Admin privileges.
Fix: Use admin credentials via --target-user / --target-password.

Resume produces duplicate items

Cause: --resume was used but the checkpoint was corrupted or the source data changed between runs.
Fix: Use --restart to wipe the checkpoint and start fresh.

Identity snapshots count is 0 after import

Cause: The source farm returned users with empty SIDs (common with claims-based auth).
The importer falls back to the LoginName field as the snapshot key.
Fix: Verify by calling GET /_api/archive/identity-snapshots?farmId=<farmId>. If still empty, check that the source farm exposes user information via CSOM (/_vti_bin/client.svc).

"Unknown user (id=...)" shown for all archived users

Cause: NullFederatedIdentityLookup is the default (see _docs/ARCHIVE_IDENTITY.md → "Production federation"). Live IDP lookups are disabled; only snapshots are used.
Fix: Identity snapshots must have been captured at import time. Verify the snapshot count. If snapshots are present but users still show as "Unknown", check that the Sid / LoginName in the snapshot matches the value stored in the archived items.


Large Imports

Checkpoint strategy

The checkpoint store (default: local filesystem) records the last completed item ID. If the import is killed, --resume continues from that boundary. Keep the checkpoint path on fast local storage; network shares may cause atomic-write failures. For example (source and target options omitted for brevity):

dotnet run -- archive-import \
  --checkpoint-path "C:\Temp\CesiviImport\checkpoint"

Memory limits

The importer streams items using StreamListItemsAsync (bounded memory per PLAN-1314). File downloads are streamed; no full-file buffering. For extremely large document libraries (>10 GB), run on a machine with at least 4 GB free RAM.

Parallel imports

Import one site collection per run. To import several site collections in parallel into different Cesivi webs, run one process per site collection against the same --target URL, each with its own checkpoint path.
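One way to keep checkpoint paths distinct is to derive each directory from the source URL. A minimal sketch; the helper name checkpoint_dir_for, the /var/tmp/cesivi root, and the /sites/hr and /sites/finance URLs are hypothetical examples.

```shell
# checkpoint_dir_for ROOT SOURCE_URL
# Prints ROOT/<slug>, where <slug> is the URL with every character
# other than letters and digits replaced by an underscore.
checkpoint_dir_for() {
  slug=$(printf '%s' "$2" | tr -c 'A-Za-z0-9' '_')
  printf '%s/%s\n' "$1" "$slug"
}

# Usage (one background process per site collection):
#   for site in "https://intranet.contoso.com/sites/hr" \
#               "https://intranet.contoso.com/sites/finance"; do
#     dotnet run -- archive-import \
#       --source "$site" \
#       --username "CONTOSO\administrator" --password "Passw0rd" \
#       --target "http://localhost:11000" \
#       --checkpoint-path "$(checkpoint_dir_for /var/tmp/cesivi "$site")" &
#   done
#   wait
```

Deriving the path from the URL also means a later --resume run for the same site collection finds the same checkpoint directory.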


> Retention note: Retention metadata is captured automatically during import using the site's configured RetentionPolicy. Each imported item receives a RetentionUntilUtc date computed from the anchor mode (ImportDate, ItemCreated, ItemModified, or CustomField) plus DefaultWindowDays. After import, the item cannot be deleted or modified via any API surface until its retention window expires. See ARCHIVE_RETENTION.md for policy configuration and the REST API reference.

See also:

  • Archive Admin Bundle — ControlCenter Quick Tour
  • Archive Tools Operator Guide
  • Tutorial G — SharePoint On-Premises Retirement Archive
  • Cesivi Archive Variant A — Whitepaper
  • Compliance Cookbook — HIPAA/GDPR/SOX/FRCP
  • Archive Cluster Deployment Guide