NEC Consult — Technical detail

What we build — in detail.

Healthcare automation

Agentic systems for regulated healthcare operations

Closed-loop automation over EVV systems, scheduling, payroll exceptions, and onboarding. Designed for sensitive and regulated workloads: private model execution, encrypted state, audit logging, human-review paths.

In production: a weekly EVV-gap digest over the HHAeXchange SOAP API, running a 90-day rolling analysis across 40+ caregivers and 3,400+ visits; timesheet automation on DocuSign production, with an EVV reconciliation layer that separates genuine exceptions from clock events that were never linked to a visit.
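The reconciliation step can be pictured as a three-way classification over the rolling window. This is a minimal sketch under stated assumptions: Visit, ClockEvent, the 2-hour matching tolerance, and the category names are hypothetical. It is not HHAeXchange's schema and it does not call the HHAeXchange SOAP API. A visit with a linked clock is fine. A visit with no link but a same-caregiver clock event near its scheduled start is an unlinked clock that only needs linking. Anything else is a genuine exception for human review.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Visit:  # hypothetical shape, not the HHAeXchange schema
    visit_id: str
    caregiver_id: str
    scheduled_start: datetime


@dataclass
class ClockEvent:  # hypothetical shape
    caregiver_id: str
    timestamp: datetime
    visit_id: Optional[str]  # None = clock event never linked to a visit


def reconcile(visits, clocks, as_of, window_days=90, tolerance=timedelta(hours=2)):
    """Classify visits in the rolling window as ok, unlinked clock, or genuine exception."""
    window_start = as_of - timedelta(days=window_days)
    linked_visits = {c.visit_id for c in clocks if c.visit_id is not None}
    unlinked = [c for c in clocks if c.visit_id is None]
    result = {"ok": [], "unlinked_clock": [], "exception": []}
    for v in visits:
        if not (window_start <= v.scheduled_start <= as_of):
            continue  # outside the rolling window
        if v.visit_id in linked_visits:
            result["ok"].append(v.visit_id)
        elif any(
            c.caregiver_id == v.caregiver_id
            and abs(c.timestamp - v.scheduled_start) <= tolerance
            for c in unlinked
        ):
            # The caregiver did clock near this visit; the record just needs linking.
            # Simplification: one unlinked clock may match more than one visit here.
            result["unlinked_clock"].append(v.visit_id)
        else:
            # No clock evidence at all: a real gap for the digest.
            result["exception"].append(v.visit_id)
    return result
```

Only the last bucket would need a human; the middle bucket is a linking fix rather than a compliance problem.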

Domain-specialized LLMs

Fine-tuning + RAG on your corpus

Continued pretraining over your private documents, then domain SFT, then retrieval grounding. We train on our cluster — your data never sees a third-party API.
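For the continued-pretraining stage, one common way to turn a private corpus into training data is sequence packing: concatenate tokenized documents with an end-of-sequence separator, then cut the stream into fixed-length blocks. This is a generic sketch, assuming documents are already tokenized into integer IDs. The function name, block size, and EOS ID are hypothetical, and the source does not describe how its CPT data is prepared.

```python
from typing import Iterable, List


def pack_for_cpt(token_docs: Iterable[List[int]], block_size: int, eos_id: int) -> List[List[int]]:
    """Pack tokenized documents into fixed-length training blocks (generic sketch)."""
    buffer: List[int] = []
    blocks: List[List[int]] = []
    for doc in token_docs:
        buffer.extend(doc)
        buffer.append(eos_id)  # marks the document boundary inside a block
        while len(buffer) >= block_size:
            blocks.append(buffer[:block_size])
            buffer = buffer[block_size:]
    return blocks  # a trailing partial block is dropped


if __name__ == "__main__":
    print(pack_for_cpt([[1, 2, 3], [4, 5], [6, 7, 8, 9]], block_size=4, eos_id=0))
```

Packing keeps every block full, so no compute is spent on padding; the SFT and retrieval stages that follow use differently shaped data.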

Validated: a 14B model lifted from 20% to 65% on the official NCBE MBE benchmark (the bar-passing band), 3.25× the untuned base, using a foundation-first CPT pipeline.

Agent orchestration

Master + specialist agent platforms

A purpose-built agent framework: declarative agent contracts, a unified tool registry, per-agent budgets with a full audit log, PHI-aware model routing, and a single operator review inbox. The primary model is a 235B-parameter MoE, with fallback paths.
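The contract, registry, budget, audit, and routing pieces fit together roughly as follows. This is a minimal sketch, not NEC Consult's framework: AgentContract, ModelRoute, Orchestrator, the cost units, and the route names are all hypothetical. "PHI-aware routing" is modeled as a simple rule: PHI only goes to routes flagged phi_allowed, trying the primary first and then fallbacks. Denied actions land in one inbox list that stands in for the operator review queue.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AgentContract:
    name: str
    allowed_tools: frozenset
    budget_units: int          # hypothetical abstract cost units per run
    may_see_phi: bool = False


@dataclass
class ModelRoute:
    name: str
    phi_allowed: bool          # only privately hosted routes should be True
    healthy: bool = True


class Orchestrator:
    def __init__(self, routes):
        self.routes = routes   # preference order: primary first, then fallbacks
        self.tools = {}        # unified registry: name -> (callable, cost)
        self.spent = {}        # agent name -> units consumed
        self.audit = []        # every decision, allowed or denied
        self.inbox = []        # denied actions queued for an operator

    def register_tool(self, name: str, fn: Callable, cost: int) -> None:
        self.tools[name] = (fn, cost)

    def _record(self, agent, action, outcome):
        entry = {"agent": agent, "action": action, "outcome": outcome}
        self.audit.append(entry)
        if outcome != "ok":
            self.inbox.append(entry)

    def pick_model(self, contract: AgentContract, has_phi: bool) -> str:
        if has_phi and not contract.may_see_phi:
            self._record(contract.name, "route", "denied: not cleared for PHI")
            raise PermissionError(contract.name)
        for route in self.routes:
            if route.healthy and (route.phi_allowed or not has_phi):
                self._record(contract.name, f"route:{route.name}", "ok")
                return route.name
        self._record(contract.name, "route", "denied: no eligible model")
        raise RuntimeError("no eligible model route")

    def call_tool(self, contract: AgentContract, tool: str, *args):
        if tool not in contract.allowed_tools:
            self._record(contract.name, tool, "denied: not in contract")
            raise PermissionError(tool)
        fn, cost = self.tools[tool]
        spent = self.spent.get(contract.name, 0)
        if spent + cost > contract.budget_units:
            self._record(contract.name, tool, "denied: over budget")
            raise RuntimeError("budget exhausted")
        self.spent[contract.name] = spent + cost
        result = fn(*args)
        self._record(contract.name, tool, "ok")
        return result
```

Keeping the contract declarative (a frozen record) means the orchestrator, not the agent, decides what is allowed, and every decision passes through one audit path.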

Live: cluster-health agent on 5-min / 30-min / daily probes across 4 nodes with structured state output.
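A tiered probe schedule like this can be sketched as a due-check over (node, tier) pairs that emits one JSON state document per cycle. The node IDs, tier names, probe callback, and JSON shape are assumptions for illustration; the source does not say what each probe checks.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical node IDs and probe tiers matching the 5-min / 30-min / daily cadence.
NODES = ["spark1", "spark2", "node3", "node4"]
TIERS = {"5min": timedelta(minutes=5), "30min": timedelta(minutes=30), "daily": timedelta(days=1)}


def due_probes(last_run, now):
    """Return (node, tier) pairs whose interval has elapsed or that never ran."""
    due = []
    for node in NODES:
        for tier, interval in TIERS.items():
            prev = last_run.get((node, tier))
            if prev is None or now - prev >= interval:
                due.append((node, tier))
    return due


def run_cycle(last_run, now, probe):
    """Run every due probe and emit one structured JSON state document."""
    results = []
    for node, tier in due_probes(last_run, now):
        ok, detail = probe(node, tier)   # probe is a caller-supplied check
        last_run[(node, tier)] = now
        results.append({"node": node, "tier": tier, "ok": ok, "detail": detail})
    state = {
        "as_of": now.isoformat(),
        "checked": len(results),
        "failing": [r for r in results if not r["ok"]],
        "results": results,
    }
    return json.dumps(state, sort_keys=True)


if __name__ == "__main__":
    last = {}
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(run_cycle(last, t0, lambda node, tier: (True, "stub")))                  # all 12 run
    print(run_cycle(last, t0 + timedelta(minutes=5), lambda n, t: (True, "stub")))  # 5min tier only
```

Emitting a fixed JSON shape rather than free text is what lets a downstream agent or dashboard consume the state without parsing prose.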

Onboarding & lifecycle

Automated paperwork chase-down

Owns the envelope-to-signature loop: DocuSign orchestration, personalized chase, escalation triggers, human-escalation paths.

Live in production: DocuSign production integration, authenticated end-to-end; automation-grade branded templates, generated and field-filled programmatically; 7 locked escalation triggers; encrypted per-caregiver state.
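The envelope-to-signature loop reduces to a per-caregiver state record plus a decision function that returns done, escalate, remind, or wait. This is a minimal sketch: the 24-hour cadence, the 3-reminder cap, and the three triggers are hypothetical stand-ins, not the 7 production triggers. The status strings follow DocuSign envelope status names ("sent", "delivered", "completed", "declined"), but no DocuSign API is called. Encryption of the per-caregiver record is left out because the Python standard library has no symmetric cipher.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

REMINDER_EVERY = timedelta(hours=24)   # hypothetical chase cadence
MAX_REMINDERS = 3                      # hypothetical cap before escalation


@dataclass
class EnvelopeState:  # one record per caregiver (encrypted at rest in production; not shown)
    caregiver_id: str
    status: str                        # e.g. "sent", "delivered", "completed", "declined"
    sent_at: datetime
    reminders_sent: int = 0
    escalated: list = field(default_factory=list)


# Hypothetical triggers for illustration only; not the production set.
TRIGGERS = {
    "declined": lambda s, now: s.status == "declined",
    "unopened_72h": lambda s, now: s.status == "sent" and now - s.sent_at >= timedelta(hours=72),
    "reminders_exhausted": lambda s, now: s.reminders_sent >= MAX_REMINDERS,
}


def next_action(s: EnvelopeState, now: datetime) -> str:
    """Decide the next step for one caregiver's envelope: done, escalate, remind, or wait."""
    if s.status == "completed":
        return "done"
    fired = [name for name, check in TRIGGERS.items() if name not in s.escalated and check(s, now)]
    if fired:
        s.escalated.extend(fired)      # each trigger hands off to a human once
        return "escalate:" + ",".join(fired)
    next_due = s.sent_at + REMINDER_EVERY * (s.reminders_sent + 1)
    if s.reminders_sent < MAX_REMINDERS and now >= next_due:
        s.reminders_sent += 1          # a personalized chase would be sent here
        return "remind"
    return "wait"
```

Escalation is checked before chasing so a declined or stalled envelope goes to a person instead of receiving another automated reminder.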

Voice & synthetic media

Voice cloning, talking avatars, real-time

Voice cloning from minutes of reference audio, real-time lip-sync rendering, and sub-second latency targets for interactive use cases.

Stack: GPT-SoVITS voice, streaming STT, neural-video avatars, real-time render pipeline.
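One way to track a sub-second target is a per-turn latency budget split across pipeline stages. This is a minimal sketch, assuming a 1-second budget and hypothetical stage names (stt, voice, avatar, render) that loosely follow the stack above. The sleeps in the demo are stand-ins for real stage calls, not measurements of that stack.

```python
import time
from contextlib import contextmanager


class LatencyBudget:
    """Accumulates per-stage wall-clock time for one interactive turn."""

    def __init__(self, budget_s: float = 1.0):
        self.budget_s = budget_s
        self.stages = {}  # stage name -> seconds spent

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.stages[name] = self.stages.get(name, 0.0) + time.perf_counter() - start

    @property
    def total_s(self) -> float:
        return sum(self.stages.values())

    def within_budget(self) -> bool:
        return self.total_s <= self.budget_s

    def report(self) -> str:
        parts = ", ".join(f"{k}={v * 1000:.0f}ms" for k, v in self.stages.items())
        status = "OK" if self.within_budget() else "OVER"
        return f"{status} total={self.total_s * 1000:.0f}ms ({parts})"


if __name__ == "__main__":
    budget = LatencyBudget(budget_s=1.0)
    for name in ("stt", "voice", "avatar", "render"):
        with budget.stage(name):
            time.sleep(0.01)  # stand-in for the real stage call
    print(budget.report())
```

Per-stage attribution matters more than the total: when a turn goes over budget, the report shows which stage to optimize.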

Custom data & RAG

Retrieval over your private documents

Corpus ingestion pipelines (PDFs, regulatory text, treatises), vector retrieval, and a citation verifier that hard-flags fabricated citations before they reach your users.
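The vector-retrieval step ranks corpus chunks by similarity to the query embedding. This is a minimal pure-Python sketch. It assumes embeddings were already produced by some embedding model (the source does not specify which). Each chunk keeps its source document so later steps know what was actually retrieved. A production system would typically use a vector index rather than this linear scan.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec, index, k=3):
    """index: list of (chunk_id, source_doc, vector); returns the k most similar chunks."""
    return heapq.nlargest(
        k,
        ((cosine(query_vec, vec), chunk_id, source) for chunk_id, source, vec in index),
        key=lambda hit: hit[0],
    )
```

Returning the source alongside each score gives a downstream citation check a concrete set of documents to verify against.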

Built: multi-corpus ingestion + stdlib-only citation verifier covering 40+ fabrication patterns.
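One of the simplest fabrication checks is existence: every reporter citation in a generated answer must also appear in the ingested corpus. Below is a stdlib-only sketch of that single check. It assumes a small hypothetical regex covering U.S., S. Ct., F., F.2d, F.3d, F.4th, and F. Supp. citations. It is not the 40+-pattern verifier described above, and a fuller check would also validate pin cites, parallel cites, and case names.

```python
import re

# Hypothetical pattern: volume, reporter abbreviation, first page.
# "F. Supp." is listed before "F." so the longer reporter name is tried first.
CITE_RE = re.compile(
    r"\b(\d{1,4})\s+(U\.S\.|S\. Ct\.|F\. Supp\.(?: 2d| 3d)?|F\.(?:2d|3d|4th)?)\s+(\d{1,5})\b"
)


def extract_cites(text):
    """Return normalized 'volume reporter page' strings found in text."""
    return [f"{vol} {rep} {page}" for vol, rep, page in CITE_RE.findall(text)]


def verify(answer, known_cites):
    """Flag citations in the answer that never appear in the ingested corpus."""
    return [c for c in extract_cites(answer) if c not in known_cites]


if __name__ == "__main__":
    corpus = "See Miranda v. Arizona, 384 U.S. 436 (1966)."
    known = set(extract_cites(corpus))
    print(verify("Under 384 U.S. 436 and 1234 F.4th 5678, the claim fails.", known))
```

Because the check compares against what was ingested rather than against a model's own output, a citation the model invents cannot pass it.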

Measured performance

Numbers we can defend.

Benchmarked end-to-end on the actual cluster, not extrapolated from vendor decks. PyTorch microbenchmark: 8K × 8K square matmul on the BF16 and FP8 tensor-core paths, median of 200 iterations.
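The per-GPU figures follow from standard matmul arithmetic. An n × n by n × n product computes n² outputs, each a length-n dot product of n multiply-adds, which is 2n³ floating-point operations. Throughput is therefore 2n³ divided by the per-iteration time. This is a minimal sketch, assuming "8K" means n = 8192 and that per-iteration times have already been collected. The timing harness itself is not shown and is not described in the source.

```python
import statistics


def matmul_flops(n: int) -> int:
    """FLOPs for one n x n @ n x n matmul: n^2 outputs x n multiply-adds x 2."""
    return 2 * n ** 3


def realized_tflops(n: int, iter_seconds: list) -> float:
    """Throughput from the median per-iteration time, as in a median-of-N microbenchmark."""
    return matmul_flops(n) / statistics.median(iter_seconds) / 1e12


def pct_of_peak(realized_tflops_value: float, peak_tflops: float) -> float:
    """Realized throughput as a percentage of a vendor-published dense peak."""
    return 100.0 * realized_tflops_value / peak_tflops
```

Taking the median rather than the mean keeps a few slow warm-up or throttled iterations from dragging the reported figure down.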

FP8 realized

2.5 PFLOPS

Cluster aggregate, 6 GPUs

BF16 realized

1.3 PFLOPS

Cluster aggregate, 6 GPUs

% of vendor peak

~62%

Realized vs NVIDIA published dense peak

NVIDIA-rated (FP4 sparse)

~18 PFLOPS

4 PF × 4 RTX PRO 6000 + 1 PF × 2 Spark

Node	GPU	BF16 TFLOPS	BF16 % of peak	FP8 TFLOPS	FP8 % of peak
Spark 1	GB10	92	74%	184	73%
Spark 2	GB10	91	73%	182	73%
Node 3	PRO 6000 Full	356	71%	668	67%
Node 3	PRO 6000 Max-Q	246	56%	474	54%
Node 4	PRO 6000 Max-Q	245	56%	524	60%
Node 4	PRO 6000 Max-Q	261	59%	486	55%
Cluster total		1,291	63%	2,518	61%
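The cluster-total row and the headline 1.3 and 2.5 PFLOPS cards are plain sums of the per-GPU rows. Recomputing them from the table values (no new data here):

```python
# (node, GPU, BF16 TFLOPS, FP8 TFLOPS), copied from the table above
ROWS = [
    ("Spark 1", "GB10", 92, 184),
    ("Spark 2", "GB10", 91, 182),
    ("Node 3", "PRO 6000 Full", 356, 668),
    ("Node 3", "PRO 6000 Max-Q", 246, 474),
    ("Node 4", "PRO 6000 Max-Q", 245, 524),
    ("Node 4", "PRO 6000 Max-Q", 261, 486),
]

bf16_total = sum(row[2] for row in ROWS)
fp8_total = sum(row[3] for row in ROWS)

# 1 PFLOPS = 1,000 TFLOPS; the headline cards round to one decimal place.
print(f"BF16: {bf16_total:,} TFLOPS = {bf16_total / 1000:.1f} PFLOPS")  # BF16: 1,291 TFLOPS = 1.3 PFLOPS
print(f"FP8: {fp8_total:,} TFLOPS = {fp8_total / 1000:.1f} PFLOPS")     # FP8: 2,518 TFLOPS = 2.5 PFLOPS
```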

Honest framing: vendor-rated peak and measured dense throughput rest on different precision and sparsity assumptions, so they are not directly comparable. NVIDIA's headline "AI TOPS" figures are FP4 with sparsity (up to 18 PFLOPS vendor-rated peak for this cluster); the realized figures above are measured FP8 and BF16 dense throughput, and each % of peak compares a measurement against NVIDIA's published dense peak at the same precision. We quote both, but the dense numbers are what your workload actually runs at.

For the evaluators.