Search Engine¶
Cesivi Server includes a full-text search engine that supports the SharePoint Search REST API, the SOAP Search service, and the CSOM Search API.
Table of Contents¶
- Overview
- Configuration
- Search Engines
- TF-IDF (Default)
- Lucene.NET (Advanced)
- API Support
- Query Syntax
- Performance Tuning
- Troubleshooting
Overview¶
Cesivi Server provides two search engine implementations:
| Engine | Description | Use Case |
|---|---|---|
| TF-IDF | Lightweight term-frequency search | Development, testing, simple scenarios |
| Lucene.NET | Full-featured search engine | Advanced queries, production-like behavior |
Both engines support:
- Full-text search across documents and list items
- Field-specific boosting (title weighted 3x)
- Pagination and result limiting
- Score-based ranking
- Search statistics
Configuration¶
Quick Start¶
Add to appsettings.json:
{
"Cesivi": {
"SearchEngine": "TfIdf"
}
}
Or via environment variable:
# Windows PowerShell
$env:Cesivi__SearchEngine = "Lucene"
# Linux/macOS
export Cesivi__SearchEngine=Lucene
Or via command-line:
dotnet run --Cesivi:SearchEngine=Lucene
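If more than one of these sources sets the value, the default ASP.NET Core configuration order lets command-line arguments override environment variables, which in turn override appsettings.json. The Python sketch below only mimics that lookup order for illustration. It assumes Cesivi Server uses the default host configuration, and `resolve_search_engine` is a hypothetical helper, not part of the product.

```python
import json

DEFAULT_ENGINE = "TfIdf"  # documented default


def resolve_search_engine(appsettings_text="{}", environ=None, argv=None):
    """Return the effective Cesivi:SearchEngine value.

    Sources are applied in order (appsettings.json, environment variables,
    command line) so that later sources win.
    """
    environ = environ or {}
    argv = argv or []
    value = DEFAULT_ENGINE

    # 1. appsettings.json: {"Cesivi": {"SearchEngine": "..."}}
    settings = json.loads(appsettings_text)
    value = settings.get("Cesivi", {}).get("SearchEngine", value)

    # 2. Environment variable: "__" stands in for the ":" section separator.
    for key, val in environ.items():
        if key.lower() == "cesivi__searchengine":
            value = val

    # 3. Command line: --Cesivi:SearchEngine=Lucene
    prefix = "--cesivi:searchengine="
    for arg in argv:
        if arg.lower().startswith(prefix):
            value = arg[len(prefix):]

    return value
```

For example, `resolve_search_engine('{}', {"Cesivi__SearchEngine": "Lucene"})` selects the Lucene engine even though the JSON file does not mention it.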
Available Values¶
| Value | Engine |
|---|---|
| TfIdf (default) | Term-frequency search (lightweight) |
| Lucene | Lucene.NET 4.8 (advanced features) |
Search Engines¶
TF-IDF (Default)¶
The default search engine uses Term Frequency-Inverse Document Frequency (TF-IDF) scoring.
Features:
- No external dependencies
- In-memory index (rebuilt on startup)
- Fast for small to medium document sets
- Simple query syntax
Limitations:
- No phrase queries ("exact phrase")
- No wildcard queries (test*)
- No fuzzy matching (test~)
- No boolean operators in query syntax
Best For:
- Development and testing
- Document sets < 10,000 items
- Simple keyword search
Configuration:
{
"Cesivi": {
"SearchEngine": "TfIdf"
}
}
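To make the scoring idea concrete, here is a minimal Python sketch of TF-IDF ranking with a 3x title weight and OR logic across query terms. It is not the server's implementation: the tokenizer, the smoothed IDF formula, and the function names `tokenize`, `build_index`, and `search` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

TITLE_BOOST = 3  # the documentation states that titles are weighted 3x


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return per-document term counts; title terms count TITLE_BOOST times."""
    index = {}
    for doc_id, doc in docs.items():
        counts = Counter(tokenize(doc["content"]))
        for term in tokenize(doc["title"]):
            counts[term] += TITLE_BOOST
        index[doc_id] = counts
    return index


def search(index, query, limit=10):
    """Rank documents by summed TF-IDF over the query terms (OR logic)."""
    n_docs = len(index)
    scores = Counter()
    for term in set(tokenize(query)):
        df = sum(1 for counts in index.values() if term in counts)
        if df == 0:
            continue  # term appears nowhere, contributes nothing
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF
        for doc_id, counts in index.items():
            if term in counts:
                tf = counts[term] / sum(counts.values())
                scores[doc_id] += tf * idf
    return scores.most_common(limit)


docs = {
    "a": {"title": "SharePoint Guide", "content": "How to build lists"},
    "b": {"title": "Release notes", "content": "Minor sharepoint fixes listed here"},
    "c": {"title": "Lunch menu", "content": "Soup and salad"},
}
results = search(build_index(docs), "sharepoint")
```

In this sample, document `a` ranks above `b` because the term appears in its title, and `c` is not returned at all.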
Lucene.NET (Advanced)¶
The Lucene.NET engine provides full-featured search capabilities.
Features:
- Phrase queries: "exact phrase match"
- Wildcard queries: test*, te?t
- Fuzzy matching: test~ (typo tolerance)
- Boolean operators: term1 AND term2, term1 OR term2, NOT term
- Field-specific search: title:report
- Range queries: size:[1000 TO 5000]
- Field boosting: title:important^2
- Persistent index (survives restarts)
- Incremental indexing
Limitations:
- Adds ~5MB to application size (Lucene.NET packages)
- Index directory requires write access
Best For:
- Production-like behavior testing
- Large document sets (10,000+ items)
- Advanced query syntax requirements
- Testing SharePoint Search-dependent code
Configuration:
{
"Cesivi": {
"SearchEngine": "Lucene"
}
}
Index Location:
The Lucene index is stored in:
{DataRootPath}/SearchIndex/
For example: R:/MockData/SearchIndex/ or /var/cesivi/data/SearchIndex/
API Support¶
REST API¶
GET /_api/search/query?querytext='sharepoint'
Response:
{
"d": {
"query": {
"PrimaryQueryResult": {
"RelevantResults": {
"RowCount": 10,
"TotalRows": 42,
"Table": {
"Rows": [
{
"Cells": [
{ "Key": "Title", "Value": "SharePoint Guide" },
{ "Key": "Path", "Value": "/sites/docs/guide.docx" },
{ "Key": "HitHighlightedSummary", "Value": "<c0>SharePoint</c0> development..." }
]
}
]
}
}
}
}
}
}
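Client code usually wants each row as a simple mapping rather than a list of key/value cells. The Python sketch below flattens a response of the shape shown above; `rows_to_dicts` is a hypothetical helper, and it assumes `Rows` and `Cells` are plain arrays exactly as in the sample.

```python
def rows_to_dicts(response):
    """Flatten RelevantResults rows into a list of {Key: Value} dicts."""
    relevant = response["d"]["query"]["PrimaryQueryResult"]["RelevantResults"]
    return [
        {cell["Key"]: cell["Value"] for cell in row["Cells"]}
        for row in relevant["Table"]["Rows"]
    ]


# Sample payload copied from the response shown above (one row only).
sample = {
    "d": {"query": {"PrimaryQueryResult": {"RelevantResults": {
        "RowCount": 10,
        "TotalRows": 42,
        "Table": {"Rows": [{"Cells": [
            {"Key": "Title", "Value": "SharePoint Guide"},
            {"Key": "Path", "Value": "/sites/docs/guide.docx"},
        ]}]},
    }}}}
}
rows = rows_to_dicts(sample)
```

Each entry in `rows` can then be read by property name, for example `rows[0]["Title"]`.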
SOAP API¶
POST /_vti_bin/search.asmx
SOAPAction: "urn:Microsoft.Search/Query"
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<Query xmlns="urn:Microsoft.Search">
<queryXml>
<QueryPacket>
<Query>
<QueryText>sharepoint</QueryText>
</Query>
</QueryPacket>
</queryXml>
</Query>
</soap:Body>
</soap:Envelope>
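When the query text comes from user input, it should be XML-escaped before it is placed in the envelope. The following Python sketch builds the headers and body for the request shown above; `build_soap_request` is a hypothetical helper, and the `Content-Type` value is the usual one for SOAP 1.1 rather than something this page specifies.

```python
from xml.sax.saxutils import escape

# Same envelope as shown above, with a placeholder for the query text.
ENVELOPE = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Query xmlns="urn:Microsoft.Search">
      <queryXml>
        <QueryPacket>
          <Query>
            <QueryText>{query_text}</QueryText>
          </Query>
        </QueryPacket>
      </queryXml>
    </Query>
  </soap:Body>
</soap:Envelope>"""


def build_soap_request(query_text):
    """Return (headers, body) for a POST to /_vti_bin/search.asmx."""
    headers = {
        "Content-Type": "text/xml; charset=utf-8",  # assumed, SOAP 1.1 default
        "SOAPAction": '"urn:Microsoft.Search/Query"',
    }
    # escape() replaces &, < and > so the text cannot break the XML.
    body = ENVELOPE.format(query_text=escape(query_text))
    return headers, body
```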
CSOM¶
using System;
using Microsoft.SharePoint.Client;
using Microsoft.SharePoint.Client.Search.Query;
var ctx = new ClientContext("http://mocksharepoint.local:5000");
var query = new KeywordQuery(ctx)
{
QueryText = "sharepoint"
};
var executor = new SearchExecutor(ctx);
var results = executor.ExecuteQuery(query);
ctx.ExecuteQuery();
foreach (var row in results.Value[0].ResultRows)
{
Console.WriteLine(row["Title"]);
}
PnP PowerShell¶
# Prompt for credentials (any PSCredential object works here)
$cred = Get-Credential
Connect-PnPOnline -Url "http://mocksharepoint.local:5000" -Credentials $cred
# Simple search
$results = Submit-PnPSearchQuery -Query "sharepoint"
$results.ResultRows | ForEach-Object { $_.Title }
# With pagination
$results = Submit-PnPSearchQuery -Query "sharepoint" -StartRow 0 -MaxResults 50
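Paging can also be driven directly against the REST endpoint. The Python sketch below generates one URL per page using the `startrow` and `rowlimit` parameters from the SharePoint Search REST API; whether Cesivi Server honors both parameters is an assumption here, and `page_urls` is a hypothetical helper.

```python
from urllib.parse import quote


def page_urls(base_url, query_text, total_rows, page_size=50):
    """Yield one search URL per page of results."""
    # Single quotes inside the query text are doubled, as in OData literals.
    literal = quote("'" + query_text.replace("'", "''") + "'", safe="'")
    for start in range(0, total_rows, page_size):
        yield (
            f"{base_url}/_api/search/query"
            f"?querytext={literal}&startrow={start}&rowlimit={page_size}"
        )


# TotalRows (42 in the REST sample response) tells the client when to stop.
urls = list(page_urls("http://localhost:5000", "sharepoint",
                      total_rows=42, page_size=20))
```

With 42 total rows and a page size of 20, this yields three URLs starting at rows 0, 20, and 40.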
Query Syntax¶
TF-IDF Engine¶
Simple keyword matching:
sharepoint # Single term
sharepoint documents # Multiple terms (OR logic)
Lucene.NET Engine¶
Full Lucene query syntax:
# Basic terms
sharepoint # Single term
sharepoint documents # Multiple terms (AND by default)
# Phrase queries
"quick brown fox" # Exact phrase match
# Boolean operators
sharepoint AND documents # Both terms required
sharepoint OR files # Either term
sharepoint NOT draft # Exclude term
+sharepoint -draft # Required/excluded shorthand
# Wildcards
share* # Prefix wildcard
te?t # Single character wildcard
# Fuzzy matching
roam~ # Fuzzy (finds "roam", "foam", "roams")
roam~0.8 # Fuzzy with similarity threshold
# Field-specific search
title:sharepoint # Search in title field
author:john # Search by author
content:report # Search in content
# Boosting
title:important^4 # Boost title matches 4x
# Range queries
size:[1000 TO 5000] # Numeric range
modified:[2024-01-01 TO *] # Date range
# Grouping
(sharepoint OR office) AND documents
Available Fields:
- title - Document/item title
- content - Full text content
- author - Author/creator
- webUrl - Web URL
- serverRelativeUrl - Server-relative path
- documentType - Item type (Document, ListItem, etc.)
- created - Creation date
- modified - Last modified date
- size - File size in bytes
Performance Tuning¶
Index Rebuild¶
To force a full index rebuild:
- Stop the server
- Delete the search index directory:
# Windows
Remove-Item -Recurse -Force "R:\MockData\SearchIndex"
# Linux
rm -rf /var/cesivi/data/SearchIndex
- Restart the server (index rebuilds automatically)
Memory Considerations¶
TF-IDF Engine:
- Index is in-memory
- Memory usage scales with document count
- ~1KB per document on average
Lucene.NET Engine:
- Index is on disk
- Uses memory-mapped files
- Memory-related tuning settings exist internally but are not exposed in configuration
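As a back-of-the-envelope check on the TF-IDF figure of ~1KB per document, the Python sketch below converts a document count into an approximate index size. The per-document average is only the rough number quoted above, so treat the result as an order-of-magnitude estimate; `estimate_tfidf_index_mb` is a hypothetical helper.

```python
BYTES_PER_DOC = 1024  # ~1KB per document, the average quoted above


def estimate_tfidf_index_mb(doc_count, bytes_per_doc=BYTES_PER_DOC):
    """Rough in-memory index size in MiB for the TF-IDF engine."""
    return doc_count * bytes_per_doc / (1024 * 1024)
```

By this estimate, 10,000 documents need a little under 10 MiB and 50,000 documents a little under 50 MiB.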
Large Document Sets¶
For large document sets (50,000+ items):
- Use Lucene.NET engine
- Place index on SSD storage
- Consider increasing Kestrel limits:
{
"Kestrel": {
"Limits": {
"MaxRequestBodySize": 104857600,
"MaxConcurrentConnections": 1000
}
}
}
Troubleshooting¶
Search Returns No Results¶
Check: Is the search engine initialized?
GET /_vti_bin/health
Look for search statistics in the response.
Check: Are documents indexed?
# Use admin endpoint
Invoke-RestMethod "http://localhost:5000/_admin/search/stats"
Search is Slow¶
- Check index location: use an SSD, not a network drive
- Check document count: consider Lucene.NET for large sets
- Check query complexity: wildcard and fuzzy queries are slower
Index Corruption (Lucene.NET)¶
If Lucene reports index corruption:
- Stop the server
- Delete the index directory
- Restart (automatic rebuild)
Stop-Process -Name "Cesivi.Server" -ErrorAction SilentlyContinue
Remove-Item -Recurse -Force "R:\MockData\SearchIndex"
dotnet run --project Cesivi.Server
Query Parse Errors¶
If queries fail to parse (Lucene.NET):
- Escape special characters:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
- Example: search\:term instead of search:term
Or use simple query fallback:
// The engine automatically falls back to simple term query on parse error
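When the query text comes from user input, escaping can be done on the client before the query is sent. The Python sketch below prefixes each of the special characters listed above with a backslash; `escape_lucene` is a hypothetical helper written for this page. Lucene.NET ships a comparable `QueryParserBase.Escape` method for callers working in .NET.

```python
import re

# Characters listed above as special in Lucene query syntax.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_lucene(text):
    """Prefix every Lucene special character with a backslash."""
    return _SPECIAL.sub(r"\\\1", text)
```

Only escape literal values, not whole queries: escaping `title:report` would turn the field separator into a literal colon and defeat the field-specific search.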
Comparison: TF-IDF vs Lucene.NET¶
| Feature | TF-IDF | Lucene.NET |
|---|---|---|
| Phrase queries | No | Yes |
| Wildcards | No | Yes |
| Fuzzy matching | No | Yes |
| Boolean operators | Implicit OR | Full support |
| Field-specific search | No | Yes |
| Index persistence | In-memory | On disk |
| Startup time | Rebuilds | Loads existing |
| Dependencies | None | ~5MB packages |
| Memory usage | Higher | Lower |
| Query performance | Fast (simple) | Fast (complex) |
| Best for | Dev/Test | Production-like |
Next Steps¶
- Configure server -> Configuration Guide
- REST API details -> REST API Guide
- CSOM reference -> CSOM Guide
- PnP PowerShell -> PnP Guide