Search Engine

Cesivi Server includes a full-text search engine that supports the SharePoint Search REST, SOAP, and CSOM APIs.

Overview

Cesivi Server provides two search engine implementations:

| Engine | Description | Use Case |
|--------|-------------|----------|
| TF-IDF | Lightweight term-frequency search | Development, testing, simple scenarios |
| Lucene.NET | Full-featured search engine | Advanced queries, production-like behavior |

Both engines support:

  • Full-text search across documents and list items
  • Field-specific boosting (title weighted 3x)
  • Pagination and result limiting
  • Score-based ranking
  • Search statistics

Configuration

Quick Start

Add to appsettings.json:

{
  "Cesivi": {
    "SearchEngine": "TfIdf"
  }
}

Or via environment variable:

# Windows PowerShell
$env:Cesivi__SearchEngine = "Lucene"

# Linux/macOS
export Cesivi__SearchEngine=Lucene

Or via command-line:

dotnet run --Cesivi:SearchEngine=Lucene
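
When the setting appears in more than one place, the effective value depends on the order in which the configuration sources are applied. The Python sketch below models that lookup under the assumption that Cesivi Server keeps the default ASP.NET Core source order (appsettings.json, then environment variables, then command-line arguments, with later sources winning). This document does not confirm that order, and `resolve_search_engine` is a hypothetical helper, not part of Cesivi.

```python
def resolve_search_engine(appsettings, environ, argv):
    """Return the effective engine name.

    Assumption: sources are applied in the default ASP.NET Core order,
    so later sources override earlier ones.
    """
    # 1. appsettings.json; "TfIdf" is the documented default.
    value = appsettings.get("Cesivi", {}).get("SearchEngine", "TfIdf")

    # 2. Environment variable: "__" stands in for ":" in the key name.
    value = environ.get("Cesivi__SearchEngine", value)

    # 3. Command-line argument in the form shown above.
    prefix = "--Cesivi:SearchEngine="
    for arg in argv:
        if arg.startswith(prefix):
            value = arg[len(prefix):]

    return value
```

Under that assumption, a command-line value would override both the environment variable and the JSON file.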

Available Values

| Value | Engine |
|-------|--------|
| TfIdf (default) | Term-frequency search (lightweight) |
| Lucene | Lucene.NET 4.8 (advanced features) |

Search Engines

TF-IDF (Default)

The default search engine uses Term Frequency-Inverse Document Frequency (TF-IDF) scoring.

Features:

  • No external dependencies
  • In-memory index (rebuilt on startup)
  • Fast for small to medium document sets
  • Simple query syntax

Limitations:

  • No phrase queries ("exact phrase")
  • No wildcard queries (test*)
  • No fuzzy matching (test~)
  • No boolean operators in query syntax

Best For:

  • Development and testing
  • Document sets < 10,000 items
  • Simple keyword search

Configuration:

{
  "Cesivi": {
    "SearchEngine": "TfIdf"
  }
}
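
To make the scoring idea concrete, here is a minimal Python sketch of TF-IDF ranking with a 3x title weight and OR semantics across query terms. It is an illustration only, not Cesivi's implementation: the tokenizer, the smoothed IDF formula, and the function names are assumptions made for this example.

```python
import math
import re
from collections import Counter

TITLE_BOOST = 3.0  # the document states that titles are weighted 3x


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query, docs):
    """Rank docs (dicts with 'title' and 'content') by a TF-IDF score."""
    weighted = []  # per-document weighted term frequencies
    df = Counter()  # number of documents containing each term
    for doc in docs:
        counts = Counter()
        for term in tokenize(doc["title"]):
            counts[term] += TITLE_BOOST
        for term in tokenize(doc["content"]):
            counts[term] += 1.0
        weighted.append(counts)
        df.update(counts.keys())

    n = len(docs)
    scored = []
    for doc, counts in zip(docs, weighted):
        score = 0.0
        for term in tokenize(query):
            if term in counts:
                # Smoothed IDF (an assumption; many variants exist).
                idf = math.log((1 + n) / (1 + df[term])) + 1.0
                score += counts[term] * idf
        if score > 0:  # OR logic: any matching term is enough
            scored.append((score, doc["title"]))
    return [title for score, title in sorted(scored, key=lambda s: -s[0])]
```

With this weighting, a document whose title contains the query term outranks one that only mentions it once in the body.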

Lucene.NET (Advanced)

The Lucene.NET engine provides full-featured search capabilities.

Features:

  • Phrase queries: "exact phrase match"
  • Wildcard queries: test*, te?t
  • Fuzzy matching: test~ (typo tolerance)
  • Boolean operators: term1 AND term2, term1 OR term2, NOT term
  • Field-specific search: title:report
  • Range queries: size:[1000 TO 5000]
  • Field boosting: title:important^2
  • Persistent index (survives restarts)
  • Incremental indexing

Limitations:

  • Adds ~5MB to application size (Lucene.NET packages)
  • Index directory requires write access

Best For:

  • Production-like behavior testing
  • Large document sets (10,000+ items)
  • Advanced query syntax requirements
  • Testing SharePoint Search-dependent code

Configuration:

{
  "Cesivi": {
    "SearchEngine": "Lucene"
  }
}

Index Location:

The Lucene index is stored in:

{DataRootPath}/SearchIndex/

For example: R:/MockData/SearchIndex/ or /var/cesivi/data/SearchIndex/

API Support

REST API

GET /_api/search/query?querytext='sharepoint'

Response:

{
  "d": {
    "query": {
      "PrimaryQueryResult": {
        "RelevantResults": {
          "RowCount": 10,
          "TotalRows": 42,
          "Table": {
            "Rows": [
              {
                "Cells": [
                  { "Key": "Title", "Value": "SharePoint Guide" },
                  { "Key": "Path", "Value": "/sites/docs/guide.docx" },
                  { "Key": "HitHighlightedSummary", "Value": "<c0>SharePoint</c0> development..." }
                ]
              }
            ]
          }
        }
      }
    }
  }
}
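
Each row arrives as a list of Key/Value cells, which is awkward to consume directly. The Python sketch below builds the query URL and flattens a response of the shape shown above into one dictionary per row. The helper names are hypothetical, and doubling embedded single quotes follows the usual OData string-literal convention; whether Cesivi Server requires it is an assumption.

```python
from urllib.parse import quote


def build_query_url(base_url, text):
    # querytext is wrapped in single quotes; embedded quotes are doubled
    # (OData convention, assumed to apply here).
    escaped = text.replace("'", "''")
    return f"{base_url}/_api/search/query?querytext='{quote(escaped)}'"


def rows_to_dicts(payload):
    """Flatten the response shape shown above into a list of dicts."""
    results = payload["d"]["query"]["PrimaryQueryResult"]["RelevantResults"]
    return [
        {cell["Key"]: cell["Value"] for cell in row["Cells"]}
        for row in results["Table"]["Rows"]
    ]
```

After decoding the JSON body, `rows_to_dicts(payload)[0]["Title"]` would give the title of the first hit.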

SOAP API

POST /_vti_bin/search.asmx
SOAPAction: "urn:Microsoft.Search/Query"

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Query xmlns="urn:Microsoft.Search">
      <queryXml>
        <QueryPacket>
          <Query>
            <QueryText>sharepoint</QueryText>
          </Query>
        </QueryPacket>
      </queryXml>
    </Query>
  </soap:Body>
</soap:Envelope>
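
Query text placed inside the envelope has to be XML-escaped, otherwise characters such as `<` or `&` produce a malformed request. The Python sketch below fills the envelope shown above and returns UTF-8 bytes ready to POST. `build_soap_query` and `SOAP_HEADERS` are hypothetical names, and the `Content-Type` value is the usual one for SOAP 1.1 rather than something this document specifies.

```python
from xml.sax.saxutils import escape

# Headers for the POST; SOAPAction is taken from the request above.
SOAP_HEADERS = {
    "Content-Type": "text/xml; charset=utf-8",  # assumed SOAP 1.1 default
    "SOAPAction": '"urn:Microsoft.Search/Query"',
}

SOAP_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Query xmlns="urn:Microsoft.Search">
      <queryXml>
        <QueryPacket>
          <Query>
            <QueryText>{query_text}</QueryText>
          </Query>
        </QueryPacket>
      </queryXml>
    </Query>
  </soap:Body>
</soap:Envelope>"""


def build_soap_query(query_text):
    """Return the request body as bytes, with the query text XML-escaped."""
    return SOAP_TEMPLATE.format(query_text=escape(query_text)).encode("utf-8")
```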

CSOM

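using System;
using Microsoft.SharePoint.Client;
using Microsoft.SharePoint.Client.Search.Query;

// ClientContext is IDisposable, so wrap it in a using block.
using (var ctx = new ClientContext("http://mocksharepoint.local:5000"))
{
    var query = new KeywordQuery(ctx)
    {
        QueryText = "sharepoint"
    };

    // executor.ExecuteQuery only queues the request; ctx.ExecuteQuery() sends it.
    var executor = new SearchExecutor(ctx);
    var results = executor.ExecuteQuery(query);
    ctx.ExecuteQuery();

    // Print the title of every row in the first result table.
    foreach (var row in results.Value[0].ResultRows)
    {
        Console.WriteLine(row["Title"]);
    }
}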

PnP PowerShell

Connect-PnPOnline -Url "http://mocksharepoint.local:5000" -Credentials $cred

# Simple search
$results = Submit-PnPSearchQuery -Query "sharepoint"
$results.ResultRows | ForEach-Object { $_.Title }

# With pagination
$results = Submit-PnPSearchQuery -Query "sharepoint" -StartRow 0 -MaxResults 50
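
The same paging pattern can be driven over REST. The Python sketch below generates one URL per page from a total row count such as the `TotalRows` value in the REST response. `startrow` and `rowlimit` are the SharePoint Search REST parameter names; that Cesivi Server accepts them is an assumption, and `page_urls` is a hypothetical helper.

```python
from urllib.parse import urlencode


def page_urls(base_url, text, total_rows, page_size):
    """Build one search URL per page of results."""
    urls = []
    for start in range(0, total_rows, page_size):
        params = urlencode({
            "querytext": f"'{text}'",  # value is wrapped in single quotes
            "startrow": start,         # zero-based offset of the first row
            "rowlimit": page_size,     # maximum rows on this page
        })
        urls.append(f"{base_url}/_api/search/query?{params}")
    return urls
```

For 42 total rows and a page size of 10 this yields five URLs, with the last one starting at row 40.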

Query Syntax

TF-IDF Engine

Simple keyword matching:

sharepoint                    # Single term
sharepoint documents          # Multiple terms (OR logic)

Lucene.NET Engine

Full Lucene query syntax:

# Basic terms
sharepoint                    # Single term
sharepoint documents          # Multiple terms (AND by default)

# Phrase queries
"quick brown fox"             # Exact phrase match

# Boolean operators
sharepoint AND documents      # Both terms required
sharepoint OR files           # Either term
sharepoint NOT draft          # Exclude term
+sharepoint -draft            # Required/excluded shorthand

# Wildcards
share*                        # Prefix wildcard
te?t                          # Single character wildcard

# Fuzzy matching
roam~                         # Fuzzy (finds "roam", "foam", "roams")
roam~0.8                      # Fuzzy with similarity threshold

# Field-specific search
title:sharepoint              # Search in title field
author:john                   # Search by author
content:report                # Search in content

# Boosting
title:important^4             # Boost title matches 4x

# Range queries
size:[1000 TO 5000]           # Numeric range
modified:[2024-01-01 TO *]    # Date range

# Grouping
(sharepoint OR office) AND documents

Available Fields:

  • title - Document/item title
  • content - Full text content
  • author - Author/creator
  • webUrl - Web URL
  • serverRelativeUrl - Server-relative path
  • documentType - Item type (Document, ListItem, etc.)
  • created - Creation date
  • modified - Last modified date
  • size - File size in bytes
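
The fuzzy example in the Lucene.NET syntax listing (roam~ finding "foam" and "roams") rests on edit distance: the number of single-character changes needed to turn one term into another. The Python sketch below computes plain Levenshtein distance to show why those terms count as near matches. It is not Lucene's implementation, which has its own automaton-based matching and limits.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]
```

Both "foam" and "roams" are one edit away from "roam", which is why a fuzzy query can match them.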

Performance Tuning

Index Rebuild

To force a full index rebuild:

  1. Stop the server
  2. Delete the search index directory:
    # Windows
    Remove-Item -Recurse -Force "R:\MockData\SearchIndex"
    
    # Linux
    rm -rf /var/cesivi/data/SearchIndex
    
  3. Restart the server (index rebuilds automatically)
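
For a cross-platform alternative to the shell commands in step 2, the deletion can be scripted. The Python sketch below removes the `SearchIndex` directory under a given data root; `reset_search_index` is a hypothetical helper, and it assumes the server has already been stopped.

```python
import shutil
from pathlib import Path


def reset_search_index(data_root):
    """Delete <data_root>/SearchIndex. Returns True if something was removed."""
    index_dir = Path(data_root) / "SearchIndex"
    if index_dir.is_dir():
        shutil.rmtree(index_dir)
        return True
    return False  # nothing to delete; the index is built on next start
```

Calling `reset_search_index("/var/cesivi/data")` would match the Linux command above.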

Memory Considerations

TF-IDF Engine:

  • Index is in-memory
  • Memory usage scales with document count
  • ~1KB per document on average

Lucene.NET Engine:

  • Index is on disk
  • Uses memory-mapped files
  • Memory-related settings are not exposed (Lucene.NET runs on .NET, so JVM options do not apply)
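
As a rough worked example of the TF-IDF figure, the Python sketch below multiplies a document count by the ~1KB average quoted above. The result is only an order-of-magnitude estimate, since actual usage depends on document size and vocabulary, and `estimate_tfidf_index_mib` is a hypothetical helper.

```python
BYTES_PER_DOCUMENT = 1024  # "~1KB per document on average", from the text above


def estimate_tfidf_index_mib(document_count):
    """Approximate in-memory index size in MiB for the TF-IDF engine."""
    return document_count * BYTES_PER_DOCUMENT / (1024 * 1024)
```

By this estimate, 10,000 documents need a little under 10 MiB and 50,000 documents a little under 50 MiB.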

Large Document Sets

For large document sets (50,000+ items):

  1. Use Lucene.NET engine
  2. Place index on SSD storage
  3. Consider increasing Kestrel limits:
{
  "Kestrel": {
    "Limits": {
      "MaxRequestBodySize": 104857600,
      "MaxConcurrentConnections": 1000
    }
  }
}

Troubleshooting

Search Returns No Results

Check: Is the search engine initialized?

GET /_vti_bin/health

Look for search statistics in the response.

Check: Are documents indexed?

# Use admin endpoint
Invoke-RestMethod "http://localhost:5000/_admin/search/stats"

Search is Slow

  1. Check index location - Use SSD, not network drive
  2. Check document count - Consider Lucene.NET for large sets
  3. Check query complexity - Wildcards and fuzzy queries are slower

Index Corruption (Lucene.NET)

If Lucene reports index corruption:

  1. Stop the server
  2. Delete the index directory
  3. Restart (automatic rebuild)

For example, in PowerShell:

Stop-Process -Name "Cesivi.Server" -ErrorAction SilentlyContinue
Remove-Item -Recurse -Force "R:\MockData\SearchIndex"
dotnet run --project Cesivi.Server

Query Parse Errors

If queries fail to parse (Lucene.NET):

  • Escape special characters: + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
  • Example: search\:term instead of search:term

If a query still fails to parse, the engine automatically falls back to a simple term query.
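
Escaping by hand is error-prone when query text comes from user input. The Python sketch below prefixes every character from the special-character list above with a backslash, so `&&` and `||` are covered one character at a time. `escape_lucene` is a hypothetical client-side helper, not part of Cesivi or Lucene.NET.

```python
# Characters that make up the special tokens listed above.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_lucene(text):
    """Backslash-escape Lucene query syntax characters in user input."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)
```

Apply it only to the literal part of a query: escaping a whole query string would also neutralize operators you intend to use, such as `title:` or `*`.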

Comparison: TF-IDF vs Lucene.NET

| Feature | TF-IDF | Lucene.NET |
|---------|--------|------------|
| Phrase queries | No | Yes |
| Wildcards | No | Yes |
| Fuzzy matching | No | Yes |
| Boolean operators | Implicit OR | Full support |
| Field-specific search | No | Yes |
| Index persistence | In-memory | On disk |
| Startup time | Rebuilds | Loads existing |
| Dependencies | None | ~5MB packages |
| Memory usage | Higher | Lower |
| Query performance | Fast (simple) | Fast (complex) |
| Best for | Dev/Test | Production-like |
