Search Engine¶


Cesivi Server includes a full-text search engine that is reachable through the SharePoint Search REST, SOAP, and CSOM APIs.

Table of Contents¶

Overview
Configuration
Search Engines
TF-IDF (Default)
Lucene.NET (Advanced)
API Support
Query Syntax
Performance Tuning
Troubleshooting

Overview¶

Cesivi Server provides two search engine implementations:

Engine	Description	Use Case
TF-IDF	Lightweight term-frequency search	Development, testing, simple scenarios
Lucene.NET	Full-featured search engine	Advanced queries, production-like behavior

Both engines support:

- Full-text search across documents and list items
- Field-specific boosting (title weighted 3x)
- Pagination and result limiting
- Score-based ranking
- Search statistics

Configuration¶

Quick Start¶

Add to appsettings.json:

{
  "Cesivi": {
    "SearchEngine": "TfIdf"
  }
}

Or via environment variable:

# Windows PowerShell
$env:Cesivi__SearchEngine = "Lucene"

# Linux/macOS
export Cesivi__SearchEngine=Lucene

Or via the command line:

dotnet run --Cesivi:SearchEngine=Lucene

Available Values¶

Value	Engine
`TfIdf` (default)	Term-frequency search (lightweight)
`Lucene`	Lucene.NET 4.8 (advanced features)
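
The three routes above all set the same configuration key. As an illustration only, the Python sketch below models how the value might be resolved. It assumes Cesivi Server uses the default ASP.NET Core provider order (appsettings.json, then environment variables, then command-line arguments, with later sources winning) and relies on the convention that a double underscore in an environment variable name stands for the `:` key separator. `resolve_search_engine` is a hypothetical helper, not part of Cesivi Server.

```python
import json

DEFAULT_ENGINE = "TfIdf"  # the documented default


def resolve_search_engine(appsettings_text="{}", environ=None, argv=None):
    """Return the effective Cesivi:SearchEngine value.

    Assumed precedence (default ASP.NET Core host order):
    appsettings.json < environment variables < command-line arguments.
    """
    value = DEFAULT_ENGINE

    # 1. appsettings.json: {"Cesivi": {"SearchEngine": "..."}}
    settings = json.loads(appsettings_text)
    value = settings.get("Cesivi", {}).get("SearchEngine", value)

    # 2. Environment variable: "__" stands for the ":" key separator.
    value = (environ or {}).get("Cesivi__SearchEngine", value)

    # 3. Command line: --Cesivi:SearchEngine=Lucene
    prefix = "--Cesivi:SearchEngine="
    for arg in argv or []:
        if arg.startswith(prefix):
            value = arg[len(prefix):]

    return value


print(resolve_search_engine())  # TfIdf
print(resolve_search_engine(
    '{"Cesivi": {"SearchEngine": "TfIdf"}}',
    environ={"Cesivi__SearchEngine": "Lucene"},
))  # Lucene
```

If the server is hosted with a custom configuration pipeline, the actual order may differ, so treat this as a mental model rather than a specification.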

Search Engines¶

TF-IDF (Default)¶

The default search engine uses Term Frequency-Inverse Document Frequency (TF-IDF) scoring.

Features:

- No external dependencies
- In-memory index (rebuilt on startup)
- Fast for small to medium document sets
- Simple query syntax

Limitations:

- No phrase queries ("exact phrase")
- No wildcard queries (test*)
- No fuzzy matching (test~)
- No boolean operators in query syntax

Best For:

- Development and testing
- Document sets < 10,000 items
- Simple keyword search

Configuration:

{
  "Cesivi": {
    "SearchEngine": "TfIdf"
  }
}
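
To make the scoring idea concrete, here is a minimal TF-IDF sketch in Python. It is not Cesivi Server's implementation: the tokenizer, the smoothed IDF formula `log(1 + N/df)`, and the `search` function are assumptions chosen for illustration. It does reflect two behaviors described in this document, namely OR logic across query terms and a 3x weight for title matches.

```python
import math
import re
from collections import Counter

TITLE_BOOST = 3.0  # mirrors the documented 3x title weighting


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(docs, query, limit=10):
    """Rank docs (dicts with 'title' and 'content') against a keyword query."""
    n_docs = len(docs)

    # Weighted term frequencies per document: each title hit counts 3x.
    weighted = []
    for doc in docs:
        tf = Counter(tokenize(doc["content"]))
        for term in tokenize(doc["title"]):
            tf[term] += TITLE_BOOST
        weighted.append(tf)

    # Document frequency: how many documents contain each term.
    df = Counter(term for tf in weighted for term in tf)

    scored = []
    for doc, tf in zip(docs, weighted):
        # OR logic: every matching query term adds to the score.
        score = sum(
            tf[term] * math.log(1 + n_docs / df[term])
            for term in set(tokenize(query))
            if term in tf
        )
        if score > 0:
            scored.append((score, doc["title"]))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


docs = [
    {"title": "SharePoint Guide", "content": "Getting started with lists"},
    {"title": "Meeting notes", "content": "We discussed the sharepoint migration"},
    {"title": "Budget", "content": "Quarterly numbers"},
]
for score, title in search(docs, "sharepoint guide"):
    print(f"{score:.2f}  {title}")
```

With this sample data, "SharePoint Guide" ranks above "Meeting notes" because both query terms hit its boosted title, and "Budget" is omitted because it matches no term.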

Lucene.NET (Advanced)¶

The Lucene.NET engine provides full-featured search capabilities.

Features:

- Phrase queries: "exact phrase match"
- Wildcard queries: test*, te?t
- Fuzzy matching: test~ (typo tolerance)
- Boolean operators: term1 AND term2, term1 OR term2, NOT term
- Field-specific search: title:report
- Range queries: size:[1000 TO 5000]
- Field boosting: title:important^2
- Persistent index (survives restarts)
- Incremental indexing

Limitations:

- Adds ~5MB to application size (Lucene.NET packages)
- Index directory requires write access

Best For:

- Production-like behavior testing
- Large document sets (10,000+ items)
- Advanced query syntax requirements
- Testing SharePoint Search-dependent code

Configuration:

{
  "Cesivi": {
    "SearchEngine": "Lucene"
  }
}

Index Location:

The Lucene index is stored in:

{DataRootPath}/SearchIndex/

For example: R:/MockData/SearchIndex/ or /var/cesivi/data/SearchIndex/

API Support¶

REST API¶

GET /_api/search/query?querytext='sharepoint'

Response:

{
  "d": {
    "query": {
      "PrimaryQueryResult": {
        "RelevantResults": {
          "RowCount": 10,
          "TotalRows": 42,
          "Table": {
            "Rows": [
              {
                "Cells": [
                  { "Key": "Title", "Value": "SharePoint Guide" },
                  { "Key": "Path", "Value": "/sites/docs/guide.docx" },
                  { "Key": "HitHighlightedSummary", "Value": "<c0>SharePoint</c0> development..." }
                ]
              }
            ]
          }
        }
      }
    }
  }
}
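
The request and response above can be handled from a script roughly as follows. This is a sketch under stated assumptions: `build_query_url` and `rows_to_dicts` are hypothetical helpers, the response is assumed to have exactly the shape shown above, and `startrow`/`rowlimit` are SharePoint's standard paging parameters, which Cesivi Server is assumed (not confirmed here) to honor.

```python
from urllib.parse import quote


def build_query_url(base_url, query_text, start_row=None, row_limit=None):
    """Build a search REST URL; the query text is wrapped in single quotes."""
    # Inside the quoted literal, a single quote is escaped by doubling it.
    literal = "'" + query_text.replace("'", "''") + "'"
    url = f"{base_url.rstrip('/')}/_api/search/query?querytext={quote(literal)}"
    # Assumption: the server accepts SharePoint's standard paging parameters.
    if start_row is not None:
        url += f"&startrow={start_row}"
    if row_limit is not None:
        url += f"&rowlimit={row_limit}"
    return url


def rows_to_dicts(response):
    """Flatten RelevantResults rows into one dict per search hit."""
    relevant = response["d"]["query"]["PrimaryQueryResult"]["RelevantResults"]
    return [
        {cell["Key"]: cell["Value"] for cell in row["Cells"]}
        for row in relevant["Table"]["Rows"]
    ]


# A trimmed copy of the sample response shown above.
sample = {
    "d": {"query": {"PrimaryQueryResult": {"RelevantResults": {
        "RowCount": 1,
        "TotalRows": 1,
        "Table": {"Rows": [{"Cells": [
            {"Key": "Title", "Value": "SharePoint Guide"},
            {"Key": "Path", "Value": "/sites/docs/guide.docx"},
        ]}]},
    }}}}
}

print(build_query_url("http://localhost:5000", "sharepoint"))
print(rows_to_dicts(sample)[0]["Title"])  # SharePoint Guide
```

Turning each row's Key/Value cells into a dictionary makes results easier to consume than indexing into the cell list by position.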

SOAP API¶

POST /_vti_bin/search.asmx
SOAPAction: "urn:Microsoft.Search/Query"

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Query xmlns="urn:Microsoft.Search">
      <queryXml>
        <QueryPacket>
          <Query>
            <QueryText>sharepoint</QueryText>
          </Query>
        </QueryPacket>
      </queryXml>
    </Query>
  </soap:Body>
</soap:Envelope>
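
When the SOAP request is generated from code, the query text should be XML-escaped before it is placed in the envelope. The sketch below builds the headers and body for the request shown above; `build_soap_request` is a hypothetical helper, and the `Content-Type` value is the usual SOAP 1.1 choice rather than something this document specifies.

```python
from xml.sax.saxutils import escape

SOAP_ACTION = "urn:Microsoft.Search/Query"

ENVELOPE_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Query xmlns="urn:Microsoft.Search">
      <queryXml>
        <QueryPacket>
          <Query>
            <QueryText>{query_text}</QueryText>
          </Query>
        </QueryPacket>
      </queryXml>
    </Query>
  </soap:Body>
</soap:Envelope>"""


def build_soap_request(query_text):
    """Return (headers, body_bytes) for a POST to /_vti_bin/search.asmx."""
    # escape() replaces &, < and > so the query cannot break the XML.
    body = ENVELOPE_TEMPLATE.format(query_text=escape(query_text))
    headers = {
        "Content-Type": "text/xml; charset=utf-8",  # assumed SOAP 1.1 default
        "SOAPAction": f'"{SOAP_ACTION}"',
    }
    return headers, body.encode("utf-8")


headers, body = build_soap_request("R&D <reports>")
print(headers["SOAPAction"])
print(body.decode("utf-8"))
```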

CSOM¶

using Microsoft.SharePoint.Client;
using Microsoft.SharePoint.Client.Search.Query;

var ctx = new ClientContext("http://mocksharepoint.local:5000");
var query = new KeywordQuery(ctx)
{
    QueryText = "sharepoint"
};

var executor = new SearchExecutor(ctx);
var results = executor.ExecuteQuery(query);
ctx.ExecuteQuery();

foreach (var row in results.Value[0].ResultRows)
{
    Console.WriteLine(row["Title"]);
}

PnP PowerShell¶

$cred = Get-Credential
Connect-PnPOnline -Url "http://mocksharepoint.local:5000" -Credentials $cred

# Simple search
$results = Submit-PnPSearchQuery -Query "sharepoint"
$results.ResultRows | ForEach-Object { $_.Title }

# With pagination
$results = Submit-PnPSearchQuery -Query "sharepoint" -StartRow 0 -MaxResults 50

Query Syntax¶

TF-IDF Engine¶

Simple keyword matching:

sharepoint                    # Single term
sharepoint documents          # Multiple terms (OR logic)

Lucene.NET Engine¶

Full Lucene query syntax:

# Basic terms
sharepoint                    # Single term
sharepoint documents          # Multiple terms (AND by default)

# Phrase queries
"quick brown fox"             # Exact phrase match

# Boolean operators
sharepoint AND documents      # Both terms required
sharepoint OR files           # Either term
sharepoint NOT draft          # Exclude term
+sharepoint -draft            # Required/excluded shorthand

# Wildcards
share*                        # Prefix wildcard
te?t                          # Single character wildcard

# Fuzzy matching
roam~                         # Fuzzy (finds "roam", "foam", "roams")
roam~0.8                      # Fuzzy with similarity threshold

# Field-specific search
title:sharepoint              # Search in title field
author:john                   # Search by author
content:report                # Search in content

# Boosting
title:important^4             # Boost title matches 4x

# Range queries
size:[1000 TO 5000]           # Numeric range
modified:[2024-01-01 TO *]    # Date range

# Grouping
(sharepoint OR office) AND documents

Available Fields:

- title: Document/item title
- content: Full text content
- author: Author/creator
- webUrl: Web URL
- serverRelativeUrl: Server-relative path
- documentType: Item type (Document, ListItem, etc.)
- created: Creation date
- modified: Last modified date
- size: File size in bytes
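
Queries in the Lucene syntax above can also be assembled programmatically. The helpers in this sketch (`field_query`, `range_query`, `all_of`, `any_of`) are hypothetical and only concatenate strings. They do not escape special characters, so they assume the terms passed in are already safe.

```python
def field_query(field, term, boost=None):
    """Return a field-scoped term such as title:report^4."""
    clause = f"{field}:{term}"
    return f"{clause}^{boost}" if boost is not None else clause


def range_query(field, low="*", high="*"):
    """Return an inclusive range such as size:[1000 TO 5000]."""
    return f"{field}:[{low} TO {high}]"


def all_of(*clauses):
    """Join clauses with AND, parenthesized so the group nests safely."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses):
    """Join clauses with OR, parenthesized so the group nests safely."""
    return "(" + " OR ".join(clauses) + ")"


query = all_of(
    any_of(
        field_query("title", "sharepoint", boost=4),
        field_query("content", "sharepoint"),
    ),
    range_query("modified", low="2024-01-01"),
    range_query("size", 1000, 5000),
)
print(query)
# ((title:sharepoint^4 OR content:sharepoint) AND modified:[2024-01-01 TO *] AND size:[1000 TO 5000])
```

Wrapping every group in parentheses keeps operator precedence explicit, at the cost of a few redundant brackets.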

Performance Tuning¶

Index Rebuild¶

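To force a full rebuild of the on-disk Lucene.NET index, follow the steps below. The TF-IDF index lives in memory and is rebuilt on every startup, so a restart alone is enough for that engine.

1. Stop the server.

2. Delete the search index directory (adjust the example paths to your {DataRootPath}):

# Windows
Remove-Item -Recurse -Force "R:\MockData\SearchIndex"

# Linux
rm -rf /var/cesivi/data/SearchIndex

3. Restart the server (the index rebuilds automatically).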

Memory Considerations¶

TF-IDF Engine:

- Index is in-memory
- Memory usage scales with document count
- ~1KB per document on average

Lucene.NET Engine:

- Index is on disk
- Uses memory-mapped files
- Memory tuning settings are not exposed through Cesivi configuration
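
As a rough worked example of the TF-IDF figure, the sketch below multiplies the document count by the ~1KB average. `estimate_index_mib` is a hypothetical helper, and the result is only as good as that average, since real documents vary widely in size.

```python
BYTES_PER_DOC = 1024  # the documented ~1KB average; real sizes vary


def estimate_index_mib(document_count, bytes_per_doc=BYTES_PER_DOC):
    """Estimate the in-memory TF-IDF index size in MiB."""
    return document_count * bytes_per_doc / (1024 * 1024)


for count in (1_000, 10_000, 50_000):
    print(f"{count:>6} documents ~ {estimate_index_mib(count):.1f} MiB")
```

By this estimate, 10,000 documents need roughly 10 MiB and 50,000 documents roughly 49 MiB.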

Large Document Sets¶

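For large document sets (50,000+ items):

1. Use the Lucene.NET engine.
2. Place the index on SSD storage.
3. Consider increasing the Kestrel limits. The values below allow request bodies of up to 100 MiB (104,857,600 bytes) and 1,000 concurrent connections: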

{
  "Kestrel": {
    "Limits": {
      "MaxRequestBodySize": 104857600,
      "MaxConcurrentConnections": 1000
    }
  }
}

Troubleshooting¶

Search Returns No Results¶

Check: Is the search engine initialized?

GET /_vti_bin/health

Look for search statistics in the response.

Check: Are documents indexed?

# Use admin endpoint
Invoke-RestMethod "http://localhost:5000/_admin/search/stats"

Search is Slow¶

- Check index location: use an SSD, not a network drive.
- Check document count: consider Lucene.NET for large sets.
- Check query complexity: wildcard and fuzzy queries are slower.

Index Corruption (Lucene.NET)¶

If Lucene reports index corruption:

1. Stop the server.
2. Delete the index directory.
3. Restart the server (the index rebuilds automatically).

Stop-Process -Name "Cesivi.Server" -ErrorAction SilentlyContinue
Remove-Item -Recurse -Force "R:\MockData\SearchIndex"
dotnet run --project Cesivi.Server

Query Parse Errors¶

If queries fail to parse (Lucene.NET):

Escape special characters with a backslash: + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
Example: search\:term instead of search:term

Alternatively, rely on the built-in fallback: on a parse error, the engine automatically falls back to a simple term query.
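
The escaping rule above can be applied to user input before it is sent as a query. The following Python sketch prefixes each special character with a backslash; `escape_lucene` is a hypothetical client-side helper, similar in spirit to the escape function that Lucene's query parser provides, and is not part of Cesivi Server.

```python
import re

# Single characters the Lucene query parser treats as syntax. Escaping every
# "&" and "|" individually also covers the two-character operators && and ||.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_lucene(text):
    """Prefix every Lucene special character in text with a backslash."""
    return _SPECIAL.sub(r"\\\1", text)


print(escape_lucene("search:term"))  # search\:term
print(escape_lucene("(1+1):2"))      # \(1\+1\)\:2
```

Escape only the user-supplied terms, not the whole query, or intentional operators such as title: and AND groupings will be neutralized as well.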

Comparison: TF-IDF vs Lucene.NET¶

Feature	TF-IDF	Lucene.NET
Phrase queries	No	Yes
Wildcards	No	Yes
Fuzzy matching	No	Yes
Boolean operators	Implicit OR	Full support
Field-specific search	No	Yes
Index persistence	In-memory	On disk
Startup time	Rebuilds	Loads existing
Dependencies	None	~5MB packages
Memory usage	Higher	Lower
Query performance	Fast (simple)	Fast (complex)
Best for	Dev/Test	Production-like

Next Steps¶

Configure server -> Configuration Guide
REST API details -> REST API Guide
CSOM reference -> CSOM Guide
PnP PowerShell -> PnP Guide
