In the realm of digital content, podcasts have emerged as a popular medium for storytelling, education, and entertainment. However, language barriers often limit the reach of these valuable resources. Recognizing this challenge, the team at ArtSciLab has embarked on an ambitious project to make their Creative Disturbance Publishers (CDP) podcasts accessible to a global audience, regardless of language. The solution? Tafsiri.
Tafsiri is AI-powered software designed to transcribe and translate CDP podcasts, which are primarily recorded in English, into any of the 100 most spoken languages in the world. The web-based application uses Azure Cognitive Services and Microsoft's Neural Machine Translation (NMT) models. This lets it hold conversations with users, search for podcasts, transcribe and translate their content, and return the requested podcasts with subtitles in the user’s preferred language.
The impact of Tafsiri is far-reaching. By breaking down language barriers, Tafsiri democratizes access to the CDP’s rich content, making it accessible to non-English speakers across the globe. This aligns with the vision of the Harry Bass Jr. School of Arts, Humanities, and Technology at The University of Texas at Dallas, which initiated the CDP project.
The development of Tafsiri is spearheaded by a team with a wealth of knowledge in AI implementation and secure coding practices.
Lead Developer: Collins Mwange (Sir Mbwika) is a Cybersecurity graduate student (class of ’25) at UT Dallas. He has over five years of experience in software development, systems support, and cybersecurity, and enjoys experimenting with new technologies.
Developer: Vinayak Mooliyil is a Business Analytics and AI graduate student at the Naveen Jindal School of Management (JSOM). He has more than three years of experience in the IT industry, where he worked on IoT and web development projects, and is now transitioning into machine learning and data science.
Tafsiri represents a significant stride in making digital content more accessible. By harnessing the power of AI, it transcends language barriers, bringing diverse audiences closer to the wealth of knowledge shared through the Creative Disturbance Publishers. The dedicated team behind Tafsiri continues to innovate, driven by the vision of a world where language is no longer a barrier to information.
We used the Microsoft Translator API (https://api.cognitive.microsofttranslator.com/), which is powered by Microsoft’s proprietary Neural Machine Translation (NMT) models, part of Azure AI services.
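To illustrate how the Translator API is called, the sketch below builds a request against the v3 `/translate` route using only Python's standard library. The function names, key, and region are placeholders of ours, not Tafsiri's actual code. The region header is needed for regional and multi-service Azure resources.

```python
import json
import urllib.parse
import urllib.request

# Global Translator v3 endpoint, as named in the text above.
ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def build_translate_request(
    texts: list[str], target_langs: list[str], key: str, region: str, source_lang: str = "en"
) -> urllib.request.Request:
    """Build, but do not send, a POST request to the v3 /translate route."""
    params = [("api-version", "3.0"), ("from", source_lang)]
    # Several target languages are requested by repeating the `to` parameter.
    params += [("to", lang) for lang in target_langs]
    url = f"{ENDPOINT}/translate?{urllib.parse.urlencode(params)}"
    body = json.dumps([{"Text": text} for text in texts]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,         # placeholder: your resource key
        "Ocp-Apim-Subscription-Region": region,   # placeholder: your resource's Azure region
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_translations(payload: "str | bytes") -> list[dict[str, str]]:
    """Turn the JSON response into one {language: translation} dict per input text."""
    return [
        {item["to"]: item["text"] for item in entry["translations"]}
        for entry in json.loads(payload)
    ]


def translate(texts: list[str], target_langs: list[str], key: str, region: str) -> list[dict[str, str]]:
    """Send the request; needs network access and a valid key."""
    request = build_translate_request(texts, target_langs, key, region)
    with urllib.request.urlopen(request) as response:
        return parse_translations(response.read())
```

A call such as `translate(["Welcome to Creative Disturbance."], ["fr", "sw"], key=..., region=...)` returns one dictionary per input text, keyed by target language. Splitting request building from sending keeps the payload format testable without network access.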
Microsoft developed these translation models in-house, independently of OpenAI’s GPT models. They are trained with deep learning techniques and optimized for multilingual translation across more than 100 languages.
We then moved away from the managed Azure Neural Machine Translation (NMT) service toward open-source, locally hosted options, and researched solutions that were free to use and supported translation from English into other languages.
The ideal solution would be free to use and would support speech-to-speech translation from English into other languages. Failing that, we would settle for a free solution that supported speech-to-text transcription (English to English) followed by text-to-speech translation (English to other languages). This could be a single model handling both steps, or two separate models, one per step.
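The fallback amounts to composing two stages behind the same interface as a one-step speech-to-speech model. Below is a minimal sketch of that idea; the protocol and class names are hypothetical, and real stages would wrap whichever models were chosen.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToSpeechTranslator(Protocol):
    """Ideal option: English audio in, translated audio out, in one step."""
    def __call__(self, audio: bytes, target_lang: str) -> bytes: ...


class Transcriber(Protocol):
    """Fallback step 1: English audio -> English text."""
    def __call__(self, audio: bytes) -> str: ...


class TextToSpeechTranslator(Protocol):
    """Fallback step 2: English text -> speech in target_lang."""
    def __call__(self, text: str, target_lang: str) -> bytes: ...


@dataclass
class TwoStagePipeline:
    """Chains the two fallback steps; callable exactly like a SpeechToSpeechTranslator."""
    transcribe: Transcriber
    speak_translated: TextToSpeechTranslator

    def __call__(self, audio: bytes, target_lang: str) -> bytes:
        english_text = self.transcribe(audio)
        return self.speak_translated(english_text, target_lang)


def translate_episode(translator: SpeechToSpeechTranslator, audio: bytes, target_lang: str) -> bytes:
    # Callers do not need to know whether one model or two sit behind `translator`.
    return translator(audio, target_lang)
```

With this shape, moving between the single-model and two-model options only changes what is passed to `translate_episode`, not the code that calls it.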
⚠️ DID NOT ADDRESS OUR PROBLEM
Why Consider SeamlessM4T Large (v1) for Our Use Case?
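SeamlessM4T supports speech-to-speech, speech-to-text, text-to-speech, and text-to-text translation, as well as automatic speech recognition (ASR), in a single model.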
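Because one SeamlessM4T checkpoint covers several of these tasks, choosing between text and speech output comes down to a generation flag. The sketch below follows the Hugging Face `transformers` port (`SeamlessM4TModel`, checkpoint `facebook/hf-seamless-m4t-large`). The helper names are ours, and codes such as `fra` follow SeamlessM4T's three-letter language-code convention.

```python
from functools import lru_cache

MODEL_ID = "facebook/hf-seamless-m4t-large"  # SeamlessM4T Large (v1) on the Hugging Face Hub


def generate_kwargs(output: str, tgt_lang: str) -> dict:
    """Arguments for model.generate(): text output (T2TT) or speech output (T2ST)."""
    if output not in ("text", "speech"):
        raise ValueError(f"output must be 'text' or 'speech', got {output!r}")
    return {"tgt_lang": tgt_lang, "generate_speech": output == "speech"}


@lru_cache(maxsize=1)
def load_model(model_id: str = MODEL_ID):
    # Imported lazily: transformers/torch are large and the checkpoint is several GB.
    from transformers import AutoProcessor, SeamlessM4TModel
    return AutoProcessor.from_pretrained(model_id), SeamlessM4TModel.from_pretrained(model_id)


def translate_english_text(text: str, tgt_lang: str, output: str = "text"):
    """Translate English text into tgt_lang, returned as a string or as a waveform array."""
    processor, model = load_model()
    inputs = processor(text=text, src_lang="eng", return_tensors="pt")
    result = model.generate(**inputs, **generate_kwargs(output, tgt_lang))
    if output == "speech":
        # result[0] holds the waveform batch; its sample rate is model.config.sampling_rate.
        return result[0].cpu().numpy().squeeze()
    return processor.decode(result[0].tolist()[0], skip_special_tokens=True)
```

For example, `translate_english_text("Welcome to Creative Disturbance.", "fra")` would return French text, and passing `output="speech"` would return French audio from the same loaded model.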
🔹 Key Features
⚠️ Limitations
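1️⃣ Using edge-tts (Microsoft Edge's online TTS)
For text-to-speech, we installed and used edge-tts, a Python package that uses Microsoft Edge's online text-to-speech service:
✅ Pros: Free to use, no API key needed
❌ Cons: Requires an internet connection; fewer voice options than the paid Azure Speech service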
❌ edge-tts has a character limit per request (~300 characters).
✅ Handles long text automatically (splits text into smaller chunks before sending it to edge-tts).
✅ Combines multiple audio segments into one final MP3 file.
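The chunk-and-combine workaround can be sketched as follows. The sketch assumes the ~300-character limit reported above. It splits on sentence boundaries first and word boundaries second, then writes each chunk's audio into one file using edge-tts's `Communicate.stream()`. Appending raw MP3 data this way is a simplification that most players accept; re-encoding with a tool such as ffmpeg would be more robust. The function names are illustrative.

```python
import re

MAX_CHARS = 300  # approximate per-request limit observed by the team


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedily pack sentences (then words) into chunks of at most max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Fall back to word-level pieces for sentences longer than the limit.
        pieces = [sentence] if len(sentence) <= max_chars else sentence.split()
        for piece in pieces:
            # Hard-split any single word that is still longer than the limit.
            while len(piece) > max_chars:
                if current:
                    chunks.append(current)
                    current = ""
                chunks.append(piece[:max_chars])
                piece = piece[max_chars:]
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


async def synthesize_long_text(text: str, voice: str, out_path: str) -> None:
    """Synthesize each chunk with edge-tts and append its MP3 bytes to one file."""
    import edge_tts  # pip install edge-tts; requires internet access
    with open(out_path, "wb") as out:
        for chunk in chunk_text(text):
            communicate = edge_tts.Communicate(chunk, voice)
            async for message in communicate.stream():
                if message["type"] == "audio":
                    out.write(message["data"])
```

Usage: `asyncio.run(synthesize_long_text(transcript, "sw-KE-ZuriNeural", "episode_sw.mp3"))`. The voice name is only an example; confirm the available voices with `edge-tts --list-voices`.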
We had to choose between containerizing early and containerizing later, and settled on containerizing early. Here is our analysis of both options:
✅ Best for:
🔹 Pros:
🔸 Cons:
💡 Best Practice:
✅ Best for:
🔹 Pros:
🔸 Cons:
💡 Best Practice:
🚀 For long-term projects: Containerize early to maintain consistency.
⚡ For quick prototypes: Containerize later but plan ahead.
To build the GUI (Graphical User Interface), we used PyQt5, a Python binding for the Qt application framework.
React
Tafsiri full-stack technologies:
✅ FastAPI (Backend) – Manages file uploads, translation, and text-to-speech processing.
✅ Vite (Build tool) – Serves the frontend during development with hot module replacement (HMR) and bundles it for production.
✅ React + TailwindCSS (Frontend) – Provides an intuitive UI for users.
✅ SeamlessM4T v2 (Translation model) – Handles multilingual speech-to-text translation.
torchaudio + SeamlessM4T v2 + edge_tts
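Here is a sketch of how torchaudio and SeamlessM4T v2 might fit together on the backend. It loads an uploaded episode, down-mixes it to mono, resamples it to the 16 kHz the model expects, and requests translated text only (`generate_speech=False`). It follows the Hugging Face `transformers` API (`SeamlessM4Tv2Model`, checkpoint `facebook/seamless-m4t-v2-large`). The language table and function names are our assumptions, and the codes should be checked against the model card.

```python
from functools import lru_cache

TARGET_SR = 16_000  # SeamlessM4T's feature extractor expects 16 kHz audio

# Hypothetical subset mapping UI language names to SeamlessM4T codes.
SEAMLESS_CODES = {"French": "fra", "Spanish": "spa", "German": "deu", "Hindi": "hin", "Swahili": "swh"}


def resolve_lang(name: str) -> str:
    """Map a UI language name to a SeamlessM4T target-language code."""
    try:
        return SEAMLESS_CODES[name]
    except KeyError:
        raise ValueError(f"unsupported language: {name}") from None


def load_mono_16k(path: str):
    import torchaudio  # imported lazily so the helpers above stay dependency-free
    waveform, sr = torchaudio.load(path)  # shape: (channels, frames)
    waveform = waveform.mean(dim=0)       # down-mix to mono
    if sr != TARGET_SR:
        waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=TARGET_SR)
    return waveform


@lru_cache(maxsize=1)
def load_model(model_id: str = "facebook/seamless-m4t-v2-large"):
    from transformers import AutoProcessor, SeamlessM4Tv2Model
    return AutoProcessor.from_pretrained(model_id), SeamlessM4Tv2Model.from_pretrained(model_id)


def speech_to_translated_text(path: str, language: str) -> str:
    """English speech file -> text in the requested language (S2TT)."""
    processor, model = load_model()
    # `audios=` as in the model card; newer transformers releases may call it `audio=`.
    inputs = processor(audios=load_mono_16k(path).numpy(), sampling_rate=TARGET_SR, return_tensors="pt")
    tokens = model.generate(**inputs, tgt_lang=resolve_lang(language), generate_speech=False)
    return processor.decode(tokens[0].tolist()[0], skip_special_tokens=True)
```

The resulting text could then be passed to edge_tts to produce the translated audio track. edge-tts voices use locale names such as `fr-FR`, not SeamlessM4T's three-letter codes, so a second mapping would be needed.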