Bioinformatics Pattern Matching Engine

$4,500.00

A deterministic, sanitizer-clean sequence-alignment engine with no third-party dependencies that an in-house team can drop into a pipeline tomorrow morning and run in front of regulators by tomorrow afternoon. BPME ships as source code under a perpetual, named-licensee proprietary licence. No runtime to license per-seat. No cloud component. No telemetry. No surprises in the audit.

The same engine is exposed through three coordinated interfaces: a C++20 native API, a stable C ABI, and a pure-ctypes Python binding. It fits stacks ranging from a Rust CLI to a Jupyter notebook to a regulated clinical pipeline.

Why teams choose BPME over what they already have

Reproducibility
  What teams hit: Different hosts produce subtly different output; hard to audit.
  What BPME delivers: Bit-identical index files across machines and runs. Every index file embeds a SHA-256 manifest of its input. A single bpme verify confirms an index was built from the FASTA you think it was.

Scale on similar genomes
  What teams hit: Aligning against many near-identical references blows up RAM linearly.
  What BPME delivers: Dual-mode index. The pangenome-aware storage layer scales with the similarity of the input, not the number of genomes.

Integration friction
  What teams hit: Python bindings break across versions; FFI is fragile.
  What BPME delivers: One C ABI, opaque handles, status codes, thread-local error strings. Bindings work from any language with a C FFI. Bundled Python wrapper has zero third-party dependencies.

Licensing exposure
  What teams hit: The open-source incumbent is GPL or restricted-use.
  What BPME delivers: Proprietary source licence with a clean grant. Ship the engine inside your product without infecting your stack.

Trust
  What teams hit: Hard to defend an open-source binary in a regulated audit.
  What BPME delivers: Source-available. Sanitizer-clean. Fuzz-harnessed loader. Deterministic builds. Versioned, magic-numbered on-disk format.
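
The audit check described under Reproducibility can be approximated outside the engine. The sketch below is a minimal illustration, not BPME's implementation: it streams a FASTA file through SHA-256 and compares the result against a recorded digest. The function names and the choice to hold the recorded digest as a hex string are assumptions made for the example; the actual manifest layout and the behaviour of bpme verify are not described on this page.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so a large FASTA never sits fully in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(fasta_path, recorded_hex):
    """Hypothetical check: does the FASTA on disk hash to the recorded digest?"""
    return hmac.compare_digest(sha256_of_file(fasta_path), recorded_hex.lower())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        fasta = os.path.join(workdir, "ref.fa")
        with open(fasta, "wb") as handle:
            handle.write(b">chr_demo\nACGTACGTNN\n")
        recorded = sha256_of_file(fasta)  # stand-in for the hash embedded in an index
        print(matches_recorded_digest(fasta, recorded))
        with open(fasta, "ab") as handle:
            handle.write(b"A\n")  # any change to the input changes the digest
        print(matches_recorded_digest(fasta, recorded))
```

Run as a script, the first check passes and the second fails, because appending a single base to the FASTA produces a different digest.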

Performance, measured

All numbers come from the bundled benchmark suite, single-threaded on commodity hardware. Every number is reproducible by the buyer's own engineers on day one.

  • ~2,000× faster exact pattern search than the C++ Standard Library's substring search on the same input. Sub-microsecond per 30-mer query, independent of reference size.
  • Sub-microsecond locate at standard sampling settings — the per-hit cost a downstream variant caller or coverage tool actually feels.
  • Up to 2.5× less RAM for pangenome-style references (multiple highly similar genomes) compared with the classical mode, with a build that is roughly 2.7× faster on the same workload.
  • Lockstep batched search for high-throughput pipelines: process thousands of queries in interleaved fashion against a single read-only index. Scales near-linearly across the bundled thread pool.
  • Memory-mapped indexes: queries run directly out of the OS page cache. There is no RAM ceiling on the reference. Multi-gigabyte indexes are first-class.
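
To illustrate the memory-mapped bullet above, here is a minimal sketch that assumes a plain byte file rather than BPME's index format. It shows how a read-only mapping lets a search run against pages the operating system loads on demand. It uses Python's standard mmap module with a naive byte search as a stand-in; it is not BPME's query path and says nothing about its performance.

```python
import mmap
import os
import tempfile


def find_all_in_mapped_file(path, pattern):
    """Return every start offset of `pattern`, reading through a read-only mmap."""
    hits = []
    with open(path, "rb") as handle:
        # Length 0 maps the whole file; the OS faults pages in lazily.
        # Note: mapping an empty file raises ValueError.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            pos = view.find(pattern)
            while pos != -1:
                hits.append(pos)
                pos = view.find(pattern, pos + 1)  # +1 keeps overlapping hits
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "reference.bin")
        with open(path, "wb") as handle:
            handle.write(b"ACGTACGTTACGT")
        print(find_all_in_mapped_file(path, b"ACGT"))  # [0, 4, 9]
```

Because the mapping is read-only, several processes or threads can share the same cached pages without copying the file into private memory.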

Technical specifications

Language: C++20
Build system: CMake 3.20+
Runtime dependencies: libpthread only
Platforms: Linux x86-64 (reference), macOS, Windows (POSIX paths via CMake)
Alphabet: DNA (A, C, G, T, N) with IUPAC ambiguity codes and the standard sentinel
Index size: Indexes are memory-mapped; tested at multi-gigabyte scale
Determinism: Byte-identical artefacts across hosts; no PRNG in the build path
File format: Versioned, magic-numbered, endian-explicit, content-hashed
Threading: Bundled thread pool; lock-free upgrades on the roadmap
Audit features: SHA-256 input hash embedded in every index; bpme verify CLI for round-trip integrity check
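
The file format entry names four properties: versioned, magic-numbered, endian-explicit, and content-hashed. As a rough illustration of what those properties mean in practice, the sketch below packs and validates a header using Python's struct module. The magic bytes, field order, and field widths are invented for the example and should not be read as BPME's actual on-disk layout.

```python
import hashlib
import struct

# Hypothetical layout: 4-byte magic, u16 version, u16 flags, u64 payload length,
# 32-byte SHA-256 of the payload. "<" forces little-endian with no padding,
# so the bytes are the same on every host.
HEADER = struct.Struct("<4sHHQ32s")
MAGIC = b"DEMO"  # placeholder, not BPME's real magic number
SUPPORTED_VERSION = 1


def write_blob(payload):
    """Prefix the payload with a header carrying its length and content hash."""
    header = HEADER.pack(MAGIC, SUPPORTED_VERSION, 0, len(payload),
                         hashlib.sha256(payload).digest())
    return header + payload


def read_blob(blob):
    """Validate magic, version, length, and content hash, then return the payload."""
    if len(blob) < HEADER.size:
        raise ValueError("truncated header")
    magic, version, _flags, length, digest = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    if version != SUPPORTED_VERSION:
        raise ValueError(f"unsupported version {version}")
    payload = blob[HEADER.size:HEADER.size + length]
    if len(payload) != length or hashlib.sha256(payload).digest() != digest:
        raise ValueError("payload length or content hash mismatch")
    return payload


if __name__ == "__main__":
    blob = write_blob(b"ACGTN$")
    print(read_blob(blob))
```

Checking the magic number and version before touching the payload lets a loader reject foreign or outdated files early, and the content hash catches corruption that the length field alone would miss.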

