What are we working on? - TAK Transcription and Automated Language Kit (TAK TALK)

Voice communications remain one of the most effective ways to convey critical information, especially in high-stakes or complex environments. However, challenges such as multilingual operations and constrained tactical networks often hinder their effectiveness. The TAK Transcription and Automated Language Kit (TAK TALK) plugin addresses these challenges by enabling seamless voice communication within the TAK ecosystem, even under bandwidth limitations or in multilingual, multiparty environments.

The Problem

Breakdowns in verbal communication can have serious consequences, causing delays, misinterpretations, and even mission failure. Several key challenges with voice communications can undermine mission effectiveness and overall success: 

  • Language Barriers
    Multinational operations often involve participants who do not share a common spoken language or struggle with accents that impede clarity. Human translators can be costly, inefficient in fast-paced situations, and difficult to source in real-time tactical scenarios.

  • Network Constraints
    Tactical networks are often overloaded or lack the bandwidth to support traditional, group-wide voice communications, particularly in low-cost mesh networks.

  • Capability Gaps in Current Solutions
    Existing real-time translation systems employing text-to-speech technologies often result in robotic voices or voices that differ from the original speaker, making it difficult to identify speakers in multiparty settings.


Figure 1 - Communication across TAK network

Our Solution

As shown in Figure 1, TAK TALK provides a push-to-talk interface within ATAK that enables voice communication with automatic translation across devices, even on extremely constrained networks. Transcripts can be displayed to end users, as seen in the figure, but this is optional: users can keep the map in full-screen mode in ATAK while conducting voice conversations.

How it Works

As depicted below, TAK TALK integrates advanced Machine Learning (ML) capabilities to execute the following workflow:

  1. A user speaks into ATAK, where TAK TALK performs speech-to-text.
  2. Only the text (no audio) is transmitted to the receiving devices.
  3. If the receiving device is set to a different language than the sending device, TAK TALK performs automatic machine translation.
  4. Text-to-speech combined with voice cloning is used to generate audible speech on the receiving phone(s) that sounds like it was spoken by the original speaker.
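The four steps above can be sketched as a simple pipeline. This is an illustrative sketch only: the function names are hypothetical placeholders, not the plugin's actual API, and the ML stages (ASR, translation, TTS) are stubbed with canned values to show where real models would plug in.

```python
# Hypothetical sketch of the TAK TALK four-step workflow.
# All function names and the demo translation table are illustrative
# placeholders, not the plugin's real API or models.

def speech_to_text(audio: bytes, lang: str) -> str:
    # Step 1: on-device ASR would run here; stubbed for illustration.
    return "move to rally point bravo"

def translate(text: str, src: str, dst: str) -> str:
    # Step 3: machine translation runs only when the languages differ.
    if src == dst:
        return text
    demo = {("eng", "spa"): "muevete al punto de reunion bravo"}
    return demo.get((src, dst), text)

def text_to_speech(text: str, voice_profile: str) -> bytes:
    # Step 4: TTS plus voice cloning would synthesize audio in the
    # original speaker's voice; stubbed as raw bytes here.
    return text.encode("utf-8")

def send_message(audio: bytes, sender_lang: str,
                 receiver_lang: str, voice_profile: str) -> bytes:
    transcript = speech_to_text(audio, sender_lang)   # Step 1 (sending device)
    payload = transcript                              # Step 2: only text on the wire
    translated = translate(payload, sender_lang, receiver_lang)  # Step 3 (receiving device)
    return text_to_speech(translated, voice_profile)  # Step 4 (receiving device)
```

Note that only `payload` (plain text) ever crosses the network; everything else runs on the sending or receiving device.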

Together, these ML techniques provide a scalable communication solution that emulates true voice conversations while requiring only a fraction of the bandwidth needed for traditional systems. In doing so, TAK TALK ensures reliable and efficient communication, minimizes delays, and maximizes mission success in edge environments. 
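A rough back-of-the-envelope calculation shows why sending text instead of audio saves so much bandwidth. The codec parameters below are generic assumptions for comparison, not measurements from TAK TALK itself.

```python
# Back-of-the-envelope bandwidth comparison for a 2-second utterance.
# Codec figures are assumed for illustration, not measured from TAK TALK.
seconds = 2
pcm_bytes = seconds * 8000 * 2       # 8 kHz, 16-bit mono PCM -> 32000 bytes
vocoder_bytes = seconds * 2400 // 8  # ~2.4 kbit/s tactical vocoder -> 600 bytes
text_bytes = len("move to rally point bravo".encode("utf-8"))  # 25 bytes
```

Even against an aggressive low-bitrate vocoder, the text payload is more than an order of magnitude smaller, which is what makes group-wide "voice" traffic feasible on constrained mesh links.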


Figure 2 - TAK TALK: Big Picture

Key Features of TAK TALK

  • Low-Bandwidth Transmission: Transmits voice data as text, using no more bandwidth than standard Cursor on Target (CoT) messages. This enables seamless operation over any network that supports CoT and allows for even more compressed encodings when communicating directly from ATAK to ATAK.
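To make the transport concrete, a transcript can ride inside an ordinary CoT-style XML event. The event skeleton below uses standard CoT element names, but it is trimmed (no time/point attributes) and the choice of a `<remarks>` element to carry the transcript is an assumption for illustration, not TAK TALK's actual schema.

```python
# Illustrative, trimmed CoT-style event carrying a transcript.
# The <remarks>-based payload is an assumed schema, not TAK TALK's real one.
import xml.etree.ElementTree as ET

event = ET.Element("event", version="2.0",
                   uid="TAKTALK.example", type="b-t-f")
detail = ET.SubElement(event, "detail")
remarks = ET.SubElement(detail, "remarks")
remarks.text = "move to rally point bravo"

xml_bytes = ET.tostring(event, encoding="utf-8")
```

Because the payload is just a small XML message, it can traverse anything that already carries CoT traffic, and a tighter binary encoding could shrink it further for direct ATAK-to-ATAK links.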

  • Device-Only Compute: All processing happens entirely within ATAK, with no need for any external servers. TAK TALK data can flow over mesh networks, through a TAK Server, or any other network that supports CoT or UDP.

  • Advanced Translation: Leverages Meta’s No Language Left Behind (NLLB) model to provide direct any-to-any translation across 40+ languages. This eliminates the need for English as an intermediary, ensuring higher accuracy and reducing cascading translation errors.

  • Natural Voice Cloning: Uses the OpenVoice voice cloning system to replicate the user’s voice—even when speaking in a different language. With just 30 seconds of reading an audio script, the system generates a compact speech profile that can be shared at mission start.

  • High-Speed Performance: Processes voice input by listening, transcribing, transmitting, translating, and speaking in under 2 seconds in most scenarios. Delivers low latency even on slower mesh networks, ensuring seamless communication comparable to traditional two-way push-to-talk (PTT) systems.

  • Collocated Conversations: Enables multilingual conversations on a single device for two individuals who are physically next to each other but do not share a common language. As shown in Figure 4, each user can select their own language setting on the shared device, with access to a PTT button and a transcript of the conversation translated into their respective languages. Voice is emitted in real time using voice cloning to replicate the original speaker's voice, enabling seamless communication without requiring a network connection.


Figure 3 - Natural Voice Cloning

 


Figure 4 - Single Device Mode

Status and What’s Next

TAK TALK is positioned to be a critical enabler of mission success, ensuring information dominance across domains. The upcoming phase will incorporate user feedback to further expand features and capabilities. 

Get Involved

Interested in testing or contributing feature suggestions? Reach out to us at tak@rtx.com.

Sponsor

This material is based upon work supported by the United States Air Force under contract number FA8750-24-C-B082 (Prime Stonewall Defense, LLC dba Certus Innovations). 
