Speed matters in data analytics. When you are processing billions of rows, even small performance improvements compound into massive time and cost savings. For ClickHouse 25.10, I contributed three patches that push the boundaries of query performance and reliability—particularly for text search and data ingestion operations that are ubiquitous in modern data workloads.
Merged contributions
Three pull requests merged into the ClickHouse 25.10 changelog:
- New SIMD-optimized text search functions (PR#87374)
  - Added startsWithCaseInsensitive and endsWithCaseInsensitive functions with hand-tuned SIMD optimizations
- Query optimizer enhancement for pattern matching (Issue#71421, PR#85920)
  - Automatically rewrites LIKE expressions with prefix/suffix patterns into faster SIMD-optimized function calls
  - ~5x performance improvement on affix pattern queries
- Critical bug fix for parallel INSERT SELECT with CTEs (Issue#85368, PR#87789)
  - Resolved metadata catalog resolution failure in parallel insertion to replicated tables
Overview
Making text search blazing fast
ClickHouse has earned its reputation as the fastest analytics database through relentless optimization. My contributions in 25.10 focus on a critical operation that touches nearly every data workload: text search.
Text data is everywhere.
Whether you are:
- Searching through application logs and distributed traces in observability platforms
- Retrieving information from document stores and knowledge bases
- Transforming textual data in data warehouses
- Building retrieval-augmented generation (RAG) pipelines for AI applications,
you need efficient text search operations. When these operations run on billions of rows, even small improvements multiply into dramatic performance gains.
Bridging pattern complexity and performance
ClickHouse’s query optimizer is decades of database research distilled into production code. And like research, it is constantly evolving for the better. One area of improvement identified by the community is pattern matching with regular expressions. Regular expressions can be computationally expensive to evaluate. But many common patterns, like checking if a log message starts with “ERROR” or ends with “.json”, don’t need the full power of regex engines.
The idea: For simple prefix and suffix patterns (what we call “affix” matching), SIMD-optimized substring comparison can be 5x faster than regex evaluation.
The solution: Rather than requiring users to manually optimize their queries, we built logic into the query optimizer itself. A new optimization pass automatically detects LIKE expressions with affix patterns and rewrites them into SIMD-optimized function calls—transparent to the user, massive performance gain.
Now with PR#85920 (the query rewriting optimization) and PR#87374 (the additional SIMD-optimized functions), you can speed up your queries with:
- New database setting optimize_rewrite_like_perfect_affix to enable this optimization
- New SIMD-optimized functions startsWithCaseInsensitive, startsWithCaseInsensitiveUTF8, endsWithCaseInsensitive, endsWithCaseInsensitiveUTF8.
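Conceptually, the rewrite pass boils down to a small classifier over the pattern string. Here is a simplified sketch in C++ (the names and structure are my own illustration, not ClickHouse's actual optimizer code): a pattern is a "perfect affix" when its only metacharacter is a single % at one end.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Illustrative sketch of the affix-detection step behind
// optimize_rewrite_like_perfect_affix. Returns the replacement function
// name and the needle, or nullopt to keep the general LIKE path.
std::optional<std::pair<std::string, std::string>>
rewriteLikeAffix(const std::string & pattern)
{
    // Any remaining LIKE metacharacter disqualifies the needle.
    auto hasMeta = [](const std::string & s)
    {
        return s.find_first_of("%_\\") != std::string::npos;
    };

    // 'needle%' -> startsWith(column, 'needle')
    if (pattern.size() >= 2 && pattern.back() == '%')
    {
        std::string needle = pattern.substr(0, pattern.size() - 1);
        if (!hasMeta(needle))
            return std::make_pair(std::string("startsWith"), needle);
    }

    // '%needle' -> endsWith(column, 'needle')
    if (pattern.size() >= 2 && pattern.front() == '%')
    {
        std::string needle = pattern.substr(1);
        if (!hasMeta(needle))
            return std::make_pair(std::string("endsWith"), needle);
    }

    return std::nullopt; // fall back to the regex-based LIKE evaluation
}
```

Patterns with interior wildcards (e.g. '%Error%' or 'log_%') fall through to the general path untouched, which is why the rewrite is safe to enable by default.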
Fixing data ingestion at scale
ClickHouse excels at ingesting massive data volumes from sources like Apache Kafka and Apache Iceberg. In replicated, shared-nothing deployments, ClickHouse parallelizes data insertion across replicas for maximum throughput.
However, a subtle bug (Issue#85368) caused failures when INSERT SELECT queries used common table expressions (CTEs). The query interpreter on each replica failed to resolve the CTE in its metadata catalog, breaking parallel ingestion workflows.
PR#87789 fixes this catalog resolution issue, ensuring reliable data ingestion in production environments where replication is critical for both performance and fault tolerance.
The community reported the bug back in August, and there have been multiple follow-ups since. Given the bug’s impact, the fix was also backported to 25.8.
Now with PR#87789, whether you are self-hosting ClickHouse, or using a managed service such as Instaclustr for ClickHouse, you will be able to use the faster, parallel data ingestion reliably by:
- Enabling parallel_distributed_insert_select for constant CTEs
Deep dive
Let’s explore how these optimizations work under the hood—from query tree transformations to SIMD intrinsics to distributed system debugging.
PR#85920: Query optimizer rewrite for LIKE expressions
The problem: Regex overhead for simple patterns
Consider a common query pattern in data warehouses (try it yourself):
```sql
SELECT count(*) FROM products WHERE description LIKE 'ClickHouse%';
```
This matches “ClickHouse Server”, “ClickHouse Local”, “ClickHouse MCP”, etc., but not “chDB”.
In standard ClickHouse execution, LIKE patterns are compiled into regular expressions and evaluated using finite automata. This general-purpose approach handles complex patterns beautifully—but it’s overkill for simple prefix and suffix matching. For these “affix” patterns, direct substring comparison is ~5x faster.
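To make the gap concrete, here is what the fast path amounts to once the pattern is known to be a pure prefix; a minimal sketch (the function name is illustrative, not ClickHouse source):

```cpp
#include <cassert>
#include <cstring>
#include <string_view>

// A prefix LIKE such as 'ClickHouse%' degenerates into a fixed-length
// byte comparison: no automaton, no backtracking, just memcmp over the
// first prefix.size() bytes.
bool matchesPrefixPattern(std::string_view value, std::string_view prefix)
{
    return value.size() >= prefix.size()
        && std::memcmp(value.data(), prefix.data(), prefix.size()) == 0;
}
```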
The solution: Automatic query rewriting
Rather than asking users to manually rewrite their queries, we teach the optimizer to do it automatically. This PR adds a new optimization pass to ClickHouse’s query analyzer.
Understanding query trees
Like most modern databases (and programming language compilers), ClickHouse parses SQL into abstract syntax trees (ASTs) and performs multiple optimization passes to transform these trees into more efficient execution plans.
Here’s what the query tree looks like for our example:
```text
QUERY id: 0
  PROJECTION
    ...
  JOIN TREE
    ...
  WHERE
    FUNCTION id: 4, function_name: like, function_type: ordinary
      ARGUMENTS
        LIST id: 5, nodes: 2
          IDENTIFIER id: 6, identifier: name
          CONSTANT id: 7, constant_value: 'ClickHouse%', constant_value_type: String
```

The WHERE clause contains a like function with two arguments: the column name and the pattern 'ClickHouse%'.
Choosing the right rewrite strategy
When we detect a LIKE expression with an affix pattern (e.g., description LIKE ‘ClickHouse%’), we have two main rewrite options:
Option 1: Range Comparison
```sql
'ClickHouse' <= description AND description < 'ClickHousf'
```
Option 2: SIMD-Optimized Function
```sql
startsWith(description, 'ClickHouse')
```
In a typical programming language, these might seem equivalent. But in an analytical database processing millions of rows, Option 2 is ~5x faster. Why?
Why SIMD functions win: Less computation, simpler pipeline
The performance difference comes down to how ClickHouse processes data in batches (called “granules”) through execution pipelines.
Option 1 challenges:
- Requires two string comparisons per row (lower bound and upper bound)
- Creates intermediate results after the first comparison
- Must materialize and pass these intermediate results to the second comparison
- Doubles the number of filter transforms in the execution pipeline
- Increases memory pressure from storing intermediate data
Option 2 advantages:
- Requires only one comparison per row
- The startsWith function is hand-optimized with SIMD intrinsics to compare prefixes against multiple strings in parallel
- No intermediate results to store or pass
- Simpler, more efficient execution pipeline
Visualizing the pipeline difference
We can see this directly in the execution pipelines. Option 2 reduces filter transforms by 50% in our example:
Option 1: Range comparison (8 filter transforms)
```sql
EXPLAIN pipeline
SELECT * FROM products
WHERE name >= 'ClickHouse' AND name < 'ClickHousf';

(Expression)
ExpressionTransform × 4
  (Filter)
  FilterTransform × 8    ← More transforms
    (ReadFromMemoryStorage)
```
Option 2: SIMD function (4 filter transforms)
```sql
EXPLAIN pipeline
SELECT * FROM products
WHERE startsWith(name, 'ClickHouse');

(Expression)
ExpressionTransform × 4
  (Filter)
  FilterTransform × 4    ← Fewer transforms
    (ReadFromMemoryStorage)
```
The optimized query tree
Based on this analysis, our optimization pass automatically transforms the query tree. The original like function node is replaced with a startsWith function:
Before optimization:
```text
WHERE
  FUNCTION function_name: like
    ARGUMENTS
      IDENTIFIER: name
      CONSTANT: 'ClickHouse%'
```
After optimization:
```text
WHERE
  FUNCTION function_name: startsWith, result_type: UInt8
    ARGUMENTS
      COLUMN: name
      CONSTANT: 'ClickHouse'
```
This transformation is completely transparent to users. They write idiomatic SQL with LIKE patterns, and ClickHouse automatically optimizes it to take the fastest execution path. The performance results speak for themselves (see the Performance section below).
PR#87374: SIMD-optimized case-insensitive text search
The performance-critical inner loop
Text search functions in ClickHouse execute in what we call the “inner loop”—they’re evaluated against millions of row granules in rapid succession. Even tiny CPU cycle savings per comparison multiply dramatically across massive datasets.
This PR introduces new case-insensitive text search functions (startsWithCaseInsensitive and endsWithCaseInsensitive) with hand-tuned SIMD optimizations. These functions are the foundation that makes the query rewriting in PR#85920 possible for case-insensitive searches.
Understanding SIMD: Data parallelism at the instruction level
SIMD (Single Instruction, Multiple Data) allows a single CPU instruction to operate on multiple data elements simultaneously. Instead of comparing one character at a time, SIMD instructions can compare 16 bytes in a single operation on modern x86 CPUs with SSE2.
For example,
- Scalar processing: Compare “C” with “c”, then “L” with “l”, then “I” with “i”… (one at a time)
- SIMD processing: Compare “ClickHouse123456” (16 bytes at once) against the target pattern in just a few instructions
Optimization techniques
Implementing high-performance SIMD functions requires careful engineering. We employ several techniques:
1. Separate fast paths for ASCII and UTF-8
ASCII characters are always 1 byte, while UTF-8 uses 1-4 bytes per character. By separating these code paths, we avoid unnecessary width checks in the ASCII fast path:
```cpp
using CaseInsensitiveComparator = std::variant<
    std::unique_ptr<ASCIICaseInsensitiveStringSearcher>,
    std::unique_ptr<UTF8CaseInsensitiveStringSearcher>>;
```
2. Hoist constant comparator construction
When comparing against a constant pattern like ‘ClickHouse’, we construct the comparator once before the loop, not millions of times inside it:
```cpp
// Build the comparator ONCE before the loop
const CaseInsensitiveComparator const_comparator
    = constCaseInsensitiveComparatorOf<NeedleSource>(needle_source);

size_t row_num = 0;
while (!haystack_source.isEnd())
{
    // Compare each row with the pre-built comparator
    res_data[row_num] = const_comparator->compare(...);
    ++row_num;
}
```
3. Substring-only comparison
For affix patterns, we only compare substrings of matching length. No need to scan entire strings when checking prefixes or suffixes:
```cpp
// For prefix matching, only compare the first N characters
res_data[row_num] = comparator->compare(
    haystack.data,
    haystack.data + pattern_length,  // Only compare the prefix
    haystack.data);
```
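The suffix case mirrors this idea: align the needle with the end of the haystack and compare only those bytes. A minimal sketch with an illustrative name:

```cpp
#include <cassert>
#include <string_view>

// Suffix matching touches only the last suffix.size() bytes of the
// haystack, however long the haystack itself is.
bool endsWithBytes(std::string_view haystack, std::string_view suffix)
{
    return haystack.size() >= suffix.size()
        && haystack.substr(haystack.size() - suffix.size()) == suffix;
}
```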
4. SIMD intrinsics for parallel comparison
The core case-insensitive comparison uses SSE2 intrinsics to compare 16 bytes at once:
```cpp
// Load 16 bytes from the input string
const auto v_haystack = _mm_loadu_si128(reinterpret_cast<const __m128i *>(pos));

// Compare against the lowercase pattern (16 bytes in parallel)
const auto v_against_l = _mm_cmpeq_epi8(v_haystack, cachel);

// Compare against the uppercase pattern (16 bytes in parallel)
const auto v_against_u = _mm_cmpeq_epi8(v_haystack, cacheu);

// Combine results: match if either lowercase OR uppercase matched
const auto v_against_l_or_u = _mm_or_si128(v_against_l, v_against_u);

// Extract the comparison results as a bitmask
const auto mask = _mm_movemask_epi8(v_against_l_or_u);
```
This processes 16 character comparisons in just a few CPU cycles instead of iterating through them one by one.
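The dual-pattern trick translates directly into scalar code, which can help in reading the intrinsics above. The following is a portable, ASCII-only sketch of the same logic, one byte at a time instead of 16 (the function name is illustrative, not the ClickHouse implementation):

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Scalar rendition of the SIMD idea: for each needle byte, precompute its
// lowercase and uppercase forms and accept the haystack byte if it matches
// either one (the OR of the two comparison masks in the SSE2 version).
bool startsWithCaseInsensitiveAscii(std::string_view haystack, std::string_view needle)
{
    if (needle.size() > haystack.size())
        return false;

    for (size_t i = 0; i < needle.size(); ++i)
    {
        unsigned char c = needle[i];
        unsigned char lower = (c >= 'A' && c <= 'Z') ? c + 32 : c;
        unsigned char upper = (c >= 'a' && c <= 'z') ? c - 32 : c;
        unsigned char h = haystack[i];
        if (h != lower && h != upper)
            return false;
    }
    return true;
}
```

Non-letter bytes map to themselves in both forms, so they must match exactly, which is also how the SIMD version behaves.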
Reference: Intel Intrinsic Reference
The result
These optimizations deliver ~38% speedup on a TPC-H derived workload for case-insensitive pattern matching queries (see Performance section below).
PR#87789: Fixing INSERT SELECT with constant common table expressions
The bug: Metadata catalog resolution in replicated environments
Common table expressions (CTEs) are a powerful SQL feature that lets you define temporary named result sets within a query. They’re especially useful for complex data transformations:
```sql
WITH transformed_data AS
(
    -- Some constant expressions here
)
INSERT INTO user_totals
SELECT * FROM transformed_data;
```
ClickHouse’s replicated architecture provides both high availability and parallel ingestion performance. When you insert data into a replicated table, ClickHouse distributes the work across replicas to maximize throughput.
However, Issue#85368 revealed a subtle bug: when INSERT SELECT queries used CTEs, the query interpreter on each replica failed to resolve the CTE definition in its local metadata catalog. The CTE was defined in the coordinator’s context but wasn’t properly propagated to replica query interpreters.
The root cause: Missing context propagation
In ClickHouse’s distributed query execution:
- The coordinator parses the query and builds the execution plan
- For replicated inserts, sub-queries are distributed to replicas
- Each replica interprets and executes its portion of the query
The bug occurred at step 3. CTEs are stored in the query context’s metadata catalog. When the coordinator distributed the INSERT SELECT to replicas, it sent the query text but didn’t ensure the CTE definitions were available in each replica’s query context.
Result: Each replica’s query interpreter tried to resolve transformed_data (from the example above) and failed—because it wasn’t in their local catalog.
The fix: Ensuring context consistency
PR#87789 fixes this by ensuring CTE definitions are properly propagated to replica query contexts before execution. The coordinator now:
- Identifies all CTEs in the query
- Serializes their definitions from its metadata catalog
- Includes them in the execution context sent to each replica
- Each replica registers these CTEs in its local context before interpretation
This ensures replicas can resolve CTE references during query interpretation, restoring parallel ingestion functionality for queries with CTEs.
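The shape of the fix can be modeled with a toy example. The sketch below is purely illustrative (these types do not exist in ClickHouse): the coordinator ships the CTE definitions alongside the query text, and each replica registers them in its local catalog before name resolution runs.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Toy model: what the coordinator sends to each replica.
struct DistributedInsertTask
{
    std::string query;
    std::map<std::string, std::string> cte_definitions; // name -> SELECT body
};

// Toy model of a replica-local metadata catalog.
struct ReplicaContext
{
    std::map<std::string, std::string> catalog;

    // The fix, in miniature: register shipped CTEs before interpretation.
    void registerCTEs(const DistributedInsertTask & task)
    {
        for (const auto & [name, body] : task.cte_definitions)
            catalog.emplace(name, body);
    }

    // Without registration, resolving a CTE name fails, which is the
    // failure mode Issue#85368 describes.
    const std::string & resolve(const std::string & name) const
    {
        auto it = catalog.find(name);
        if (it == catalog.end())
            throw std::runtime_error("Unknown table expression: " + name);
        return it->second;
    }
};
```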
Impact
This fix is critical for production deployments where:
- Data ingestion pipelines use CTEs for complex transformations
- Replicated tables require high-throughput parallel inserts
- Reliability and consistency across replicas are non-negotiable
Performance: Optimization verification
Now let’s look at the numbers. Here’s how these optimizations perform in validation tests.
PR#85920: LIKE rewrite optimization (~5x Speedup)
As part of PR#85920, we added affix pattern queries to ClickHouse’s continuous performance testing suite. This ensures the CI/CD pipeline monitors performance for any improvements or regressions.
Results summary
The optimization delivers a 4.2x speedup for prefix patterns and a 5.1x speedup for suffix patterns:
| Pattern type | Without optimization | With optimization | Speedup |
|---|---|---|---|
| Prefix ('prefix%') | 0.572s | 0.135s | 4.2x faster |
| Suffix ('%suffix') | 0.682s | 0.135s | 5.1x faster |
Note: Relative time variance < 0.004 for all measurements, indicating stable, reliable results
Test query details
```sql
-- Prefix matching: 572ms → 135ms (4.2x improvement)
SELECT count()
FROM tab
WHERE str LIKE 'prefix%'
SETTINGS optimize_rewrite_like_perfect_affix = 1;

-- Suffix matching: 682ms → 135ms (5.1x improvement)
SELECT count()
FROM tab
WHERE str LIKE '%suffix'
SETTINGS optimize_rewrite_like_perfect_affix = 1;
```
The optimization is enabled by default in ClickHouse 25.10. Users can disable it with SETTINGS optimize_rewrite_like_perfect_affix=0 if needed.
Full report: PR#85920 Performance Comparison
PR#87374: SIMD case-insensitive search (~38% speedup)
We verified the performance of PR#87374 using queries derived from TPC-H at scale factor 30, which generates a ~30GB lineitem table with 240 million rows. This represents a realistic analytical workload.
Result summary
We compared three approaches for finding comments starting with “te” (case-insensitive):
| Approach | Execution time | Throughput | Speedup vs. baseline |
|---|---|---|---|
| Baseline: startsWith(lower(…), 'te') | 0.957s | 251M rows/s | – |
| Alternative: lower(left(…, 2)) = 'te' | 0.821s | 292M rows/s | 1.17x |
| SIMD optimized: startsWithCaseInsensitive | 0.605s | 397M rows/s | 1.58x (38% faster) |
Test query details
Baseline: Combine startsWith with lower()
```sql
SELECT sum(startsWith(lower(l_comment), 'te'))
FROM lineitem;

-- Result: 0.957 sec, 240M rows, 8.22 GB
-- Throughput: 250.86M rows/s, 8.59 GB/s
```
Alternative: Substring extraction with case normalization

```sql
SELECT sum(lower(left(l_comment, 2)) = 'te')
FROM lineitem;

-- Result: 0.821 sec, 240M rows, 8.22 GB
-- Throughput: 292.40M rows/s, 10.01 GB/s
```
SIMD optimized: Direct case-insensitive comparison

```sql
SELECT sum(startsWithCaseInsensitive(l_comment, 'te'))
FROM lineitem;

-- Result: 0.605 sec, 240M rows, 8.22 GB
-- Throughput: 396.56M rows/s, 13.58 GB/s  ← Faster
```
Real-world impact
These improvements matter because:
- Text search is everywhere: Logs, traces, documents, user-generated content—text data pervades modern analytics
- Scale multiplies gains: A 5x speedup on queries processing billions of rows translates to massive time and cost savings
- Transparency: Users get automatic optimizations without changing their SQL
- Production-ready: Fixes ensure reliable operation at scale
Looking forward
Contributing to ClickHouse has been an incredible learning experience for me in:
- Database query optimization and compiler techniques
- Low-level performance engineering with SIMD intrinsics
- Distributed systems and metadata consistency
- Collaborative open-source development with a world-class engineering team
I’m excited to see these optimizations help ClickHouse users extract insights faster from their data. The journey of making the fastest database even faster continues!
Sign up for a free trial of Instaclustr for ClickHouse and try out the improved performance and reliability now!
For questions or discussions about these contributions, feel free to reach out by email at [email protected] or on GitHub @zheguang, and join the conversation in the ClickHouse open source community.