NEC Consult: Technical detail

For the evaluators.

The full capability breakdown, per-GPU benchmarks, and the private Blackwell cluster spec — for the technical buyer who wants to see under the hood.

What we build — in detail.

Healthcare automation

HIPAA-grade agentic systems

Closed-loop automation over electronic visit verification (EVV) systems, scheduling, payroll exceptions, and onboarding. Local-LLM-only, encrypted state, full audit trail.

In production: weekly EVV-gap digest over HHAeXchange SOAP — 90-day rolling analysis across 40+ caregivers / 3,400+ visits; DocuSign-production timesheet automation with an EVV reconciliation layer that separates genuine exceptions from unlinked clocks.
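The reconciliation idea above can be sketched as follows. This is a minimal illustration, not the production code. It assumes an "unlinked clock" means an EVV clock event exists near the scheduled start but was never attached to the visit, and a "genuine exception" means no matching clock exists at all. The `Visit` and `Clock` types, the `classify_visits` function, and the 30-minute tolerance are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Visit:
    visit_id: str
    caregiver_id: str
    start: datetime


@dataclass(frozen=True)
class Clock:
    caregiver_id: str
    at: datetime
    visit_id: Optional[str]  # None: the clock was never linked to a visit


def classify_visits(visits, clocks, tolerance=timedelta(minutes=30)):
    """Label each visit as verified, unlinked_clock, or genuine_exception."""
    linked_ids = {c.visit_id for c in clocks if c.visit_id is not None}
    orphans = [c for c in clocks if c.visit_id is None]
    labels = {}
    for v in visits:
        if v.visit_id in linked_ids:
            labels[v.visit_id] = "verified"
        elif any(
            c.caregiver_id == v.caregiver_id and abs(c.at - v.start) <= tolerance
            for c in orphans
        ):
            # A clock exists for this caregiver near the start time: a linking
            # problem, not a missed visit (hypothetical rule).
            labels[v.visit_id] = "unlinked_clock"
        else:
            labels[v.visit_id] = "genuine_exception"
    return labels
```

Only the `genuine_exception` bucket would need human follow-up under these assumptions; unlinked clocks could be queued for automatic re-linking.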

Domain-specialized LLMs

Fine-tuning + RAG on your corpus

Continued pretraining (CPT) over your private documents, then domain supervised fine-tuning (SFT), then retrieval grounding. We train on our cluster, so your data never touches a third-party API.

Validated: 14B model lifted from 20% to 65% on the official NCBE MBE benchmark (bar-passing band) — 3.25× over the untuned base, foundation-first CPT pipeline.

Agent orchestration

Master + specialist agent platforms

A purpose-built agent framework — declarative agent contracts, unified tool registry, per-agent budget + full audit log, PHI-aware model routing, and a single operator review inbox. 235B-parameter MoE primary with fallback paths.
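To make the terms above concrete, here is a small sketch of how a declarative contract, a shared tool registry, a per-agent budget, an audit log, and PHI-aware routing could fit together. It is an illustration under assumptions, not the framework itself: every class name, the token-cost model, and the route labels `"local"` and `"any-approved"` are hypothetical.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    pass


@dataclass(frozen=True)
class AgentContract:
    """Declarative description of what an agent may do (hypothetical shape)."""
    name: str
    allowed_tools: frozenset
    budget_tokens: int
    handles_phi: bool


class ToolRegistry:
    """Single place where tools are registered and looked up."""

    def __init__(self):
        self._tools = {}

    def register(self, name, fn):
        self._tools[name] = fn

    def get(self, name):
        return self._tools[name]


@dataclass
class AgentRuntime:
    contract: AgentContract
    registry: ToolRegistry
    spent: int = 0
    audit_log: list = field(default_factory=list)

    def route_model(self):
        # PHI-aware routing: agents that touch PHI stay on local models.
        return "local" if self.contract.handles_phi else "any-approved"

    def call(self, tool, cost, *args):
        # Every attempt is written to the audit log, allowed or not.
        if tool not in self.contract.allowed_tools:
            self.audit_log.append((self.contract.name, tool, "denied:not_in_contract"))
            raise PermissionError(tool)
        if self.spent + cost > self.contract.budget_tokens:
            self.audit_log.append((self.contract.name, tool, "denied:over_budget"))
            raise BudgetExceeded(tool)
        self.spent += cost
        result = self.registry.get(tool)(*args)
        self.audit_log.append((self.contract.name, tool, "ok"))
        return result
```

Keeping the contract frozen and separate from the runtime means permissions and budgets can be reviewed as data, without reading agent code.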

Live: cluster-health agent on 5-min / 30-min / daily probes across 4 nodes with structured state output.
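The three probe cadences and the structured state output can be sketched like this. It is a minimal example under assumptions: the interval table mirrors the 5-min / 30-min / daily cadence above, while the function names and the JSON field names are hypothetical.

```python
import json

# Probe cadences in seconds, mirroring the 5-min / 30-min / daily tiers.
PROBE_INTERVALS_S = {"5min": 300, "30min": 1800, "daily": 86400}


def due_probes(now_s, last_run_s):
    """Return the probe tiers whose interval has elapsed since their last run."""
    due = []
    for name, interval in PROBE_INTERVALS_S.items():
        last = last_run_s.get(name)
        if last is None or now_s - last >= interval:
            due.append(name)
    return due


def state_record(node, probe, ok, now_s, detail=""):
    """Serialize one probe result as a JSON line (field names are illustrative)."""
    return json.dumps(
        {"node": node, "probe": probe, "ok": ok, "ts": now_s, "detail": detail},
        sort_keys=True,
    )
```

Emitting one JSON record per node and probe keeps the output machine-readable, so a downstream agent can diff states instead of parsing free text.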

Onboarding & lifecycle

Automated paperwork chase-down

Owns the envelope-to-signature loop: DocuSign orchestration, personalized chase, escalation triggers, human-escalation paths.

Live in production: DocuSign production integration authenticated end-to-end; automation-grade branded templates generated and field-filled programmatically; 7 locked escalation triggers; encrypted per-caregiver state.
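As an illustration of what "locked" escalation triggers could look like in code, here is a sketch that models them as an immutable, ordered tuple evaluated against envelope state. The three triggers shown, their thresholds, and all names are invented for the example; they are not the seven production triggers, which this page does not list.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class EnvelopeState:
    days_since_sent: int
    reminders_sent: int
    bounced: bool


@dataclass(frozen=True)
class Trigger:
    name: str
    fires: Callable[[EnvelopeState], bool]


# A tuple of frozen dataclasses: the trigger set cannot be mutated at runtime.
# These three rules are hypothetical examples.
TRIGGERS = (
    Trigger("delivery_bounced", lambda s: s.bounced),
    Trigger("max_reminders_reached", lambda s: s.reminders_sent >= 3),
    Trigger("stale_envelope", lambda s: s.days_since_sent >= 7),
)


def first_escalation(state: EnvelopeState) -> Optional[str]:
    """Return the first trigger that fires, in priority order, or None."""
    for trigger in TRIGGERS:
        if trigger.fires(state):
            return trigger.name
    return None
```

Returning `None` means the automated chase continues; any trigger name would hand the envelope to a human.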

Voice & synthetic media

Voice cloning, talking avatars, real-time

Voice cloning from minutes of reference audio, real-time lip-sync rendering, and sub-second latency targets for interactive use cases.

Stack: GPT-SoVITS voice, streaming STT, neural-video avatars, real-time render pipeline.

Custom data & RAG

Retrieval over your private documents

Pipeline-built corpus ingestion (PDFs, regulatory text, treatises), vector retrieval, and a citation-verifier that hard-flags fabricated cites before they reach your user.

Built: multi-corpus ingestion + stdlib-only citation verifier covering 40+ fabrication patterns.
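A stdlib-only check of this kind can be sketched in a few lines. The example below covers just one pattern (a cite that does not exist in the ingested corpus) and one citation shape (volume, reporter, page for three US reporters). It is an assumption-laden illustration, not the production verifier; the regex, the function names, and the `known` index are hypothetical.

```python
import re

# Matches cites like "5 U.S. 137" or "410 F.3d 12" (illustrative subset only).
CITE_RE = re.compile(r"\b(\d{1,3})\s+(U\.S\.|F\.2d|F\.3d)\s+(\d{1,4})\b")


def extract_citations(text):
    """Return normalized 'volume reporter page' strings found in the text."""
    return [f"{vol} {rep} {page}" for vol, rep, page in CITE_RE.findall(text)]


def fabricated_citations(text, known):
    """Return cites in the text that are absent from the known-corpus index."""
    return [c for c in extract_citations(text) if c not in known]


if __name__ == "__main__":
    known = {"5 U.S. 137"}  # Marbury v. Madison
    draft = (
        "See Marbury v. Madison, 5 U.S. 137 (1803), "
        "and Smith v. Jones, 999 U.S. 123 (2020)."  # second cite is deliberately fake
    )
    print(fabricated_citations(draft, known))
```

A hard flag here would mean blocking or annotating the answer whenever the returned list is non-empty, rather than trusting the model's own confidence.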

Measured performance

Numbers we can defend.

Benchmarked end-to-end on the actual cluster, not extrapolated from vendor decks. PyTorch microbench at 8K square matmul, BF16 + FP8 tensor-core paths, median of 200 iterations.
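For readers who want to reproduce the method, here is a sketch of a median-of-N matmul microbenchmark. It is not the exact script used for the figures on this page. It assumes "8K" means 8192 and counts 2·n³ FLOPs per square matmul. The timing helpers are stdlib-only; `run_bf16_matmul` assumes PyTorch with a CUDA device and covers the BF16 path only (FP8 is left out of the sketch). The warmup count and all function names are hypothetical.

```python
import statistics
import time


def matmul_flops(n):
    """FLOPs for one dense n x n by n x n matmul: n^3 multiply-adds, 2 FLOPs each."""
    return 2 * n ** 3


def tflops(times_s, n):
    """Realized TFLOPS from per-iteration wall times, using the median."""
    return matmul_flops(n) / statistics.median(times_s) / 1e12


def time_calls(fn, iters=200, warmup=10, sync=lambda: None):
    """Time fn() iters times; sync() must block until queued device work is done."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iters):
        sync()
        start = time.perf_counter()
        fn()
        sync()
        times.append(time.perf_counter() - start)
    return times


def run_bf16_matmul(n=8192, iters=200):
    """BF16 path on one GPU. Requires PyTorch with CUDA; not run on import."""
    import torch

    a = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)
    b = torch.randn(n, n, device="cuda", dtype=torch.bfloat16)
    times = time_calls(lambda: a @ b, iters=iters, sync=torch.cuda.synchronize)
    return tflops(times, n)
```

Taking the median rather than the mean keeps a few slow iterations (clock ramp, background work) from skewing the result. Dividing the output of `run_bf16_matmul()` by the vendor's dense peak for that GPU gives a "% peak" figure.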

FP8 realized: 2.5 PFLOPS. Cluster aggregate, 6 GPUs.
BF16 realized: 1.3 PFLOPS. Cluster aggregate, 6 GPUs.
% of vendor peak: ~62%. Realized vs NVIDIA published dense peak.
NVIDIA-rated (FP4 sparse): ~18 PFLOPS. 4 PF × 4 RTX PRO 6000 + 1 PF × 2 Spark.

Node            GPU               BF16 TFLOPS   % peak   FP8 TFLOPS   % peak
Spark 1         GB10                       92      74%          184      73%
Spark 2         GB10                       91      73%          182      73%
Node 3          PRO 6000 Full             356      71%          668      67%
Node 3          PRO 6000 Max-Q            246      56%          474      54%
Node 4          PRO 6000 Max-Q            245      56%          524      60%
Node 4          PRO 6000 Max-Q            261      59%          486      55%
Cluster total                           1,291      63%        2,518      61%

Honest framing: 60–70% of NVIDIA's published peak is normal realized throughput for general PyTorch matmul on bleeding-edge Blackwell silicon without hand-tuned CUTLASS kernels. NVIDIA's headline "AI TOPS" are FP4 sparse (~18 PFLOPS for this cluster); the realized figures above are measured FP8/BF16 dense. We quote both; the dense numbers are what your workload actually runs at.
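The headline aggregates can be checked directly against the per-GPU rows. The short script below uses only the numbers published in the table and the rated-figure breakdown above; the variable names are ours.

```python
# Per-GPU realized throughput from the table: (node, GPU, BF16 TFLOPS, FP8 TFLOPS)
ROWS = [
    ("Spark 1", "GB10", 92, 184),
    ("Spark 2", "GB10", 91, 182),
    ("Node 3", "PRO 6000 Full", 356, 668),
    ("Node 3", "PRO 6000 Max-Q", 246, 474),
    ("Node 4", "PRO 6000 Max-Q", 245, 524),
    ("Node 4", "PRO 6000 Max-Q", 261, 486),
]

bf16_total_tflops = sum(row[2] for row in ROWS)
fp8_total_tflops = sum(row[3] for row in ROWS)

# Headline tiles are the totals rounded to one decimal in PFLOPS.
bf16_pflops = round(bf16_total_tflops / 1000, 1)
fp8_pflops = round(fp8_total_tflops / 1000, 1)

# Vendor-rated FP4 sparse figure: 4 PF x 4 RTX PRO 6000 + 1 PF x 2 Spark.
rated_fp4_sparse_pflops = 4 * 4 + 1 * 2

print(bf16_total_tflops, fp8_total_tflops)   # 1291 2518
print(bf16_pflops, fp8_pflops)               # 1.3 2.5
print(rated_fp4_sparse_pflops)               # 18
```

The row sums match the "Cluster total" line, and the rounded values match the 1.3 and 2.5 PFLOPS tiles.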

The cluster

Hardware spec.

Nodes: 4. 2× NVIDIA DGX Spark (ARM, GB10) · 2× Threadripper PRO x86.
Blackwell GPUs: 6 total. 1× RTX PRO 6000 Full + 3× Max-Q + 2× GB10.
VRAM aggregated: 576 GB. 96 GB per PRO 6000 × 4 + GB10 unified memory.
Inter-node fabric: 100 Gb/s. MikroTik CRS504 · Mellanox ConnectX-6 Dx · 98.6 Gb/s verified.
System RAM: 384 GB. DDR5 ECC across x86 nodes; expandable to 2 TB.
NVMe storage: ~12 TB. Gen 4 + Gen 5, distributed, NFSv4 from canonical source.