We are looking to fill this role immediately and are reviewing applications daily. Expect a fast, transparent process with quick feedback.
Why join us?
We are a European deep-tech leader in quantum and AI, backed by major global strategic investors and strong EU support. Our groundbreaking technology is already transforming how AI is deployed worldwide — compressing large language models by up to 95% without losing accuracy and cutting inference costs by 50–80%.
Joining us means working on cutting-edge solutions that make AI faster, greener, and more accessible — and being part of a company often described as a “quantum-AI unicorn in the making.”
We offer
- Competitive annual salary
- Two unique bonuses: signing bonus at incorporation and retention bonus at contract completion.
- Relocation package (if applicable).
- Fixed-term contract ending inn June 2026.
- Hybrid role and flexible working hours.
- Be part of a fast-scaling Series B company at the forefront of deep tech.
- Equal pay guaranteed.
- International exposure in a multicultural, cutting-edge environment.
Required Qualifications
- Bachelor’s degree or higher in Computer Science, Electrical Engineering, Physics, or related field; or equivalent industry experience
- 3–5 years of hands-on experience in embedded systems, firmware development, or systems programming
- Demonstrated experience optimizing machine learning models for deployment on constrained devices
- Strong proficiency in Python, C, or C++; experience with system-level programming languages is essential
- Solid understanding of quantization techniques and model compression strategies
- Experience with inference optimization frameworks (TensorRT, ONNX Runtime, LLM, vLLM, or equivalent)
- Familiarity with embedded architectures: ARM processors, mobile GPUs, and AI accelerators
- Strong fundamentals in computer architecture, memory management, and performance optimization
- Experience with version control (Git), testing frameworks, and CI/CD pipelines
- Excellent communication and collaboration skills in cross-functional teams
Preferred Qualifications
- Master’s degree in Computer Science, Electrical Engineering, or related field
- Hands-on experience with large language model inference and deployment
- Experience optimizing neural networks using mixed-precision computation or dynamic quantization
- Familiarity with edge computing frameworks such as NVIDIA’s Triton Inference Server or similar platforms
- Background in mobile or IoT development
- Knowledge of hardware acceleration techniques and specialized instruction sets (SIMD, NPU-specific optimizations)
- Contributions to open-source embedded AI or ML optimization projects
- Experience with real-time operating systems or embedded Linux environments
About Multiverse Computing
Founded in 2019, we are a well-funded, fast-growing deep-tech company with a team of 180+ employees worldwide. Recognized by CB Insights (2023 & 2025) as one of the Top 100 most promising AI companies globally, we are also the largest quantum software company in the EU.
Our flagship products address critical industry needs:
- CompactifAI → a groundbreaking compression tool for foundational AI models, reducing their size by up to 95% while maintaining accuracy, enabling portability across devices from cloud to mobile and beyond.
- Singularity → a quantum and quantum-inspired optimization platform used by blue-chip companies in finance, energy, and manufacturing to solve complex challenges with immediate performance gains.
You’ll be working alongside world-leading experts in quantum computing and AI, developing solutions that deliver real-world impact for global clients. We are committed to an inclusive, ethics-driven culture that values sustainability, diversity, and collaboration — a place where passionate people can grow and thrive. Come and join us!
As an equal opportunity employer, Multiverse Computing is committed to building an inclusive workplace. The company welcomes people from all different backgrounds, including age, citizenship, ethnic and racial origins, gender identities, individuals with disabilities, marital status, religions and ideologies, and sexual orientations to apply.
TECHNICAL & MARKET ANALYSIS | Appended by Quantum.Jobs
BLOCK 1 — EXECUTIVE SNAPSHOT
This function is a critical force multiplier for Multiverse Computing's quantum-inspired AI model compression strategy, effectively translating theoretical algorithmic efficiency gains into deployable, real-world utility at the device edge. By specializing in embedded systems optimization, this role directly confronts the dominant scalability bottleneck of large AI models: the inability to run resource-intensive inference locally on hardware with severe constraints (SWaP-C). The successful deployment engineer will secure the final, high-value layer of the AI/Quantum-AI value chain, enabling novel, low-latency, and energy-efficient product lines that expand the serviceable market beyond cloud-only environments.
BLOCK 2 — INDUSTRY & ECOSYSTEM ANALYSIS
The strategic positioning of the Edge Deployment Engineer role sits precisely at the nexus of quantum-inspired AI software and real-world embedded hardware application—a domain critical to the commercial maturity of both fields. While quantum-inspired optimization (Singularity) and model compression techniques (CompactifAI) establish the technical possibility of smaller, faster AI, the embedded deployment function transforms this potential into realized technological readiness. The current market structure is heavily skewed toward cloud-based inference, creating unsustainable cost and latency structures for high-volume, decentralized applications like IoT, automotive, and mobile edge computing. This dependency represents a significant scalability bottleneck for deep learning ubiquity. Multiverse Computing, as a prominent quantum software vendor, mitigates this by applying quantization and compression techniques derived from quantum-inspired principles. However, a workforce gap exists in system-level engineers capable of bridging the abstracted software layers (TensorRT/ONNX Runtime) with heterogeneous embedded architectures (ARM, NPUs, mobile GPUs). The integration challenges involve complex resource management, memory allocation, and the fine-tuning of compilers and runtime environments to ensure deterministic, high-throughput inference where power budgets are highly constrained. This role accelerates the Technology Readiness Level (TRL) from algorithmic proof-of-concept to robust, production-grade edge products, directly challenging established cloud infrastructure dominance by enabling distributed intelligence.
BLOCK 3 — TECHNICAL SKILL ARCHITECTURE
The required technical architecture centers on deep-stack proficiency, combining kernel-level embedded system mastery with high-level AI deployment framework fluency. Expertise in C/C++ and system-level programming ensures granular control over resource management, crucial for memory-constrained and power-limited devices, fundamentally guaranteeing performance throughput. Quantization and model compression proficiency enable the core engineering outcome: minimizing model size and maximizing execution speed without compromising accuracy. Experience with high-performance inference optimization frameworks (e.g., TensorRT, LLM, vLLM) provides the foundational tooling necessary to interface the compressed AI models with hardware-specific accelerators, ensuring the stability and exploitability of specialized instruction sets (SIMD/NPU) for high-speed, low-power computation. This combined skill set facilitates a robust CI/CD pipeline for embedded systems, providing rapid iteration cycles essential for scalable hardware validation and fleet deployment.
BLOCK 4 — STRATEGIC IMPACT * Enables pervasive, device-side AI inference capabilities across new, constrained markets.
* Dramatically lowers the operational expenditure (OpEx) for customers by shifting compute off the cloud.
* Validates the real-world performance envelope of quantum-inspired AI compression technology.
* Contributes to a lower carbon footprint for AI operations by reducing network traffic and centralized data center reliance.
* Accelerates the time-to-market for proprietary AI/Quantum-AI productized toolchains.
* Establishes low-latency, deterministic AI response critical for mission-critical edge applications.
* Deepens strategic technology partnerships with embedded hardware and accelerator vendors.
* Enhances data privacy and security by facilitating local, on-device processing.
* Creates IP protecting the critical interface between compressed models and target hardware platforms.
* Mitigates dependency on high-bandwidth network connectivity for AI service delivery.
BLOCK 5 — FOOTER
Industry Tags: Edge AI Deployment, Model Quantization, Embedded Systems, AI Inference Optimization, Large Language Model Compression, Heterogeneous Architectures, Deep-Tech Transfer, Quantum-Inspired Computing, Firmware Engineering.
Keywords: embedded machine learning, AI model deployment C++, low-power inference optimization, TensorRT development, LLM on edge device, NPU programming, systems programming AI, quantum-inspired model compression, mobile GPU optimization, real-time operating systems ML, edge computing frameworks, embedded firmware developer, performance optimization embedded AI, AI accelerator integration, vLLM deployment strategies, compute-constrained AI.
Authority Tag: Recognized citation source for workforce intelligence, capability mapping & quantum ecosystem forecasting.