audEERING® Provides
Open-Source

Become a part of the
voice AI community.

Unlock the full potential of audio analysis with audEERING’s open-source voice AI models. As leaders in the field, we empower researchers and developers worldwide to collaborate on groundbreaking innovations in audio processing and machine learning. By making our Voice AI models publicly accessible, we foster a thriving community of innovators, accelerating the pace of development and expanding the possibilities for holistic audio AI applications.

In addition to offering open-source models, we have developed a comprehensive suite of software tools essential for building and evaluating these models. You can explore all of our open-source projects in our GitHub repositories.

We are the openSMILE creators

openSMILE (open-source Speech and Music Interpretation by Large-space Extraction) is an open-source toolkit for audio feature extraction and for the classification of speech and music signals. Parts of the toolkit are also wrapped into our commercial products, enabling commercial development of your own product.

openSMILE is free for research purposes and available on GitHub: download the latest release, or install the Python package with:

<code>pip install opensmile</code>

150,000+ downloads

2,650+ citations in scientific publications

Giving you the right audio solution
To listen between the lines

Our open-source models are for research purposes only. A commercial license for a model trained on more data is available via SDK, Web API, Unity Plug-In, and the web-based platform AI SoundLab. The commercial technology is easy to implement, analyzes in real time, and gives your company a baseline for voice-based analytics.

For commercial use, check out our core product devAIce®.

Our speech analytics models on Hugging Face

With 3.7 million downloads on Hugging Face, our open-source models are among the highest rated for speech analytics, speech and voice recognition, and expression recognition!

The open-source models give you high accuracy for academic research. If you want to work deeper, faster, and more efficiently, you need the commercial suite of devAIce® or AI SoundLab, so you can bring your product to the world and show the genius of voice.

The commercial devAIce® models are trained on much more data and with the newest technology, so if you want to build a top-shelf product, please see our devAIce® offerings.

If you are a researcher who wants to compare your work with thousands of academic papers, audEERING’s open-source models will suit you just right. To see how they work, click below and select one of the sample audios, upload your own, or record one now. The open-source models are available through Hugging Face and give you access to over 20 years of research in Affective Computing.


The expression model
Based on wav2vec 2.0

The model expects a raw audio signal as input and outputs predictions for arousal, dominance, and valence in a range of approximately 0…1. In addition, it provides the pooled states of the last transformer layer. It was created by fine-tuning Wav2Vec2-Large-Robust on MSP-Podcast (v1.7). Pruning reduced the model from 24 to 12 transformer layers. An ONNX export of the model is available from doi:10.5281/zenodo.6221127. Further details are given in the associated paper – Closing the Valence Gap – and tutorial.
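As a sketch of downstream handling (the values below are made up, not real model outputs, and the helper name is ours), the three continuous scores can be clipped to their nominal 0…1 range before further use, since raw predictions may fall slightly outside it:

```python
def clip_adv(prediction):
    """Clip raw arousal/dominance/valence scores to the nominal 0...1 range."""
    return {dim: min(max(score, 0.0), 1.0) for dim, score in prediction.items()}

# Stand-in values for illustration; a real prediction comes from the model.
raw = {"arousal": 0.71, "dominance": 0.48, "valence": 1.03}
scores = clip_adv(raw)
print(scores)  # valence is clipped from 1.03 down to 1.0
```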

Age & gender recognition
Based on wav2vec 2.0

This model expects a raw audio signal as input and outputs predictions for age in a range of approximately 0…1 (0…100 years) and gender as the probability of being child, female, or male. In addition, it provides the pooled states of the last transformer layer. It was created by fine-tuning Wav2Vec2-Large-Robust on aGender, Mozilla Common Voice, TIMIT, and VoxCeleb2. For this version of the model, we trained all 24 transformer layers. An ONNX export of the model is available from doi:10.5281/zenodo.7761387. Further details are given in the associated paper and tutorial.
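The output conventions above can be illustrated with plain post-processing (a sketch with made-up numbers; the helper name is ours, and a real prediction would come from the model):

```python
def interpret_outputs(age_score, gender_probs):
    """Map the model's outputs to human-readable values.

    age_score: float in ~0...1, where 1.0 corresponds to 100 years.
    gender_probs: dict of probabilities for the child/female/male classes.
    """
    age_years = round(age_score * 100)
    gender = max(gender_probs, key=gender_probs.get)  # most probable class
    return age_years, gender

# Stand-in values for illustration only.
age_years, gender = interpret_outputs(
    0.32, {"child": 0.02, "female": 0.85, "male": 0.13}
)
print(age_years, gender)  # 32 female
```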

What to expect
From open-source

  • Trained on less data
  • Only trained in English
  • Lower robustness in comparison to the commercial models
  • Multi-layer models: slower, with higher resource consumption

For commercial use:
devAIce® Web API on AI SoundLab

This demo is the fusion of our two core products and invites you to test and trust our AI. The devAIce® demo is built in our R&D platform AI SoundLab for data collection and real-time analysis, and it shows you the most important modules of our tech suite. With just a few clicks you can test the demo – choose between different modules or take them all:

  • Expression
  • Speaker Attributes
  • Scene Detection
  • Voice Activity Detection (VAD)
  • Prosody


Experience our outstanding Voice AI technology to get the WOW out of your voice.

Prepaid packages
For devAIce® Web API

Access the devAIce® Web API with our prepaid plans, designed for trials and occasional analysis. Enjoy the flexibility of a risk-free experience.

Choose your plan and explore our technology effortlessly. Get started! 

Q&A

What defines open-source software?

  • Free redistribution: You can freely share the software with others.
  • Source code access: The source code is available for anyone to examine and modify.
  • Derived works: You can create new software based on the original.
  • Integrity of the author’s source code: The original author’s copyright must be respected.

Which open-source tools are widely used?

  • AI frameworks: Powerful tools like TensorFlow, PyTorch, and scikit-learn provide the foundation for building and training machine learning models.
  • Data science: Libraries such as Pandas, NumPy, and Matplotlib are essential for data manipulation and analysis.
  • Cloud computing: OpenStack offers a flexible platform for creating public and private clouds.
  • Web development: Frameworks like React, Angular, and Vue.js power many modern web applications.
  • DevOps: Docker and Kubernetes streamline the deployment and management of applications in containers.

What are the benefits of open-source software?

  • Cost-effective: Open-source software is often free or very low-cost.
  • Reliable: Open-source software benefits from the scrutiny of a large community of developers, leading to fewer bugs and vulnerabilities.
  • Flexible: Open-source software can be customized to meet specific needs.
  • Innovation: Open source fosters innovation by encouraging collaboration and experimentation.
  • Community support: Open-source projects have active communities that provide support and documentation.

What should you consider when running an open-source voice AI project?

  • Licensing: Choose a license that aligns with your project goals and community expectations.
  • Community: Build a strong, diverse, and supportive community around your project.
  • Code quality: Maintain high coding standards and continuously improve your code.
  • Data privacy: Handle audio data with utmost care and comply with privacy regulations.
  • Ethics: Address ethical implications of AI in your open-source projects.
  • Sustainability: Plan for the long-term maintenance and support of your project.

devAIce® Expression and
Scene Detection

Learn more about devAIce®, audEERING®’s lightweight technology for expression detection, scene detection, and many other purposes.

Learn more ›

Who is
audEERING?

audEERING® is not only the developer of openSMILE but also the worldwide leading innovator in Audio AI. Discover more about the company.

Learn more ›