Deduplication Sandbox
Select a pre-loaded test case to demonstrate the engine's capabilities.
The Paraphrased Repost
Checks an admissions query that has been reworded and reordered. Demonstrates set-based Jaccard matching.
Monospace Code Copy
Checks an AVX2 compilation benchmark post where technical details remain identical. Demonstrates precise systems matching.
Clean Submission
Checks a completely unrelated post about Nashville restaurants. Demonstrates a scan that produces zero LSH bucket collisions.
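The set-based Jaccard matching named in the first test case can be illustrated with a small sketch. The function names here (word_trigrams, jaccard) are hypothetical and not taken from the engine; the sketch only assumes that similarity is computed over sets of 3-word shingles, so shingles shared between two posts count no matter where they appear.

```python
def word_trigrams(text: str) -> set[tuple[str, ...]]:
    """Return the set of overlapping 3-word shingles in a lowercased text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)
```

Because the comparison works on sets, a repost that moves whole sentences around still shares most of its shingles with the original and scores high.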
Deduplication Engine Pipeline Flow
Scan Report
No near-duplicates found.
Post signature did not trigger any bucket matches.
Trigram Match Visualization
Hover over any highlighted shingle to highlight its exact matching pair in the opposing window.
Scanned Post
Matched Database Original
Deduplication Engine Internals
Normalized tokens are combined into overlapping 3-word shingles and hashed using the fast djb2 algorithm. The input generated 0 unique shingles.
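The shingling step can be sketched roughly as follows. This is an illustration, not the engine's C++ code: the normalization rule (lowercase, keep alphanumeric runs), the 32-bit mask, and the function names are assumptions. Only the 3-word window and the djb2 hash come from the description above.

```python
import re


def djb2(text: str) -> int:
    """Classic djb2 string hash (h = h * 33 + byte), truncated to 32 bits (assumed width)."""
    h = 5381
    for byte in text.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def normalize(text: str) -> list[str]:
    """Assumed normalization: lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingle_hashes(text: str, k: int = 3) -> set[int]:
    """Hash every overlapping k-word shingle and return the set of unique hashes."""
    tokens = normalize(text)
    return {djb2(" ".join(tokens[i:i + k])) for i in range(len(tokens) - k + 1)}
```

An input with fewer than three tokens yields an empty set, which corresponds to a report of 0 unique shingles.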
128 universal hash functions compress the shingle set into a compact signature vector of 128 values. Signature cells whose values match between the two posts are highlighted below.
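The signature step reads like MinHash, although the page does not name it. A minimal sketch under that assumption: each of the 128 hash functions has the universal form (a * x + b) mod p, and each signature cell stores the minimum value over the shingle set. The prime, the seed, and all names are illustrative choices, not the engine's.

```python
import random

PRIME = (1 << 61) - 1   # Mersenne prime used as the modulus (assumed choice)
NUM_HASHES = 128        # signature length stated in the description


def make_hash_params(n: int = NUM_HASHES, seed: int = 42) -> list[tuple[int, int]]:
    """Draw n (a, b) pairs defining the hash functions h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(shingles: set[int], params: list[tuple[int, int]]) -> list[int]:
    """One cell per hash function: the minimum hashed value over all shingles."""
    if not shingles:
        # Empty input: fill with a sentinel larger than any possible hash value.
        return [PRIME] * len(params)
    return [min((a * x + b) % PRIME for x in shingles) for a, b in params]


def estimate_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of matching cells, which estimates the Jaccard similarity of the sets."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

Under this reading, the highlighted cells are the positions where two signatures hold the same minimum, and their share of the 128 cells approximates the Jaccard similarity of the underlying shingle sets.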
The signature is split into blocks, and each block is hashed into a bucket. Posts that share at least one bucket are compared directly as candidates, which avoids scanning the full database.
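The bucketing step can be sketched as standard LSH banding. The sketch below is hypothetical: it keeps buckets in an in-memory dictionary rather than Redis, and the split of 128 cells into 32 bands of 4 rows is an assumed parameter choice, not one stated on this page.

```python
from collections import defaultdict


class LSHIndex:
    """Toy banded LSH index; the class name and defaults are illustrative only."""

    def __init__(self, bands: int = 32, rows: int = 4):
        self.bands = bands
        self.rows = rows
        self.buckets = defaultdict(set)  # (band index, band values) -> post ids

    def _keys(self, signature):
        """Yield one bucket key per band of the signature."""
        if len(signature) != self.bands * self.rows:
            raise ValueError("signature length must equal bands * rows")
        for band in range(self.bands):
            start = band * self.rows
            yield (band, tuple(signature[start:start + self.rows]))

    def add(self, post_id, signature):
        """Register a post in the bucket of each of its bands."""
        for key in self._keys(signature):
            self.buckets[key].add(post_id)

    def candidates(self, signature):
        """Return ids of posts sharing at least one bucket with this signature."""
        found = set()
        for key in self._keys(signature):
            found |= self.buckets.get(key, set())
        return found
```

A scanned post is compared only against the ids returned by candidates(). An empty result corresponds to the "did not trigger any bucket matches" report shown for the clean submission.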
How to Test the Engine (Walkthrough)
Recruiter Guide
Index Custom Post
This interactive console demonstrates the high-performance duplicate detection engine, which runs on our C++ backend with Redis storage.