In sound labeling, annotators are given a recording and must isolate each relevant sound and label it. For example, the target sounds might be specific keywords or the sound of a particular musical instrument.
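One way to picture the output of sound labeling is as a list of time-stamped regions. The sketch below is a minimal illustration. It assumes each label is a region with an onset and offset in seconds plus a text label, and it assumes a tab-separated "onset, offset, label" export. The names `SoundLabel` and `to_tsv` are hypothetical and do not come from any specific annotation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundLabel:
    """One labeled region of a recording (hypothetical format, times in seconds)."""
    onset: float
    offset: float
    label: str

    def __post_init__(self) -> None:
        # A labeled region must have a positive duration.
        if self.offset <= self.onset:
            raise ValueError("offset must be greater than onset")


def to_tsv(labels: list[SoundLabel]) -> str:
    """Serialize labels as 'onset<TAB>offset<TAB>label' rows, sorted by onset."""
    rows = [
        f"{item.onset:.3f}\t{item.offset:.3f}\t{item.label}"
        for item in sorted(labels, key=lambda item: item.onset)
    ]
    return "\n".join(rows)


# Example: a guitar playing throughout, with a spoken keyword in the middle.
labels = [SoundLabel(2.5, 3.1, "keyword:hello"), SoundLabel(0.0, 4.0, "guitar")]
print(to_tsv(labels))
```

Keeping onset and offset on every label, rather than labeling the whole file, is what lets one recording carry several distinct sounds.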
Event tracking evaluates how well sound event detection systems perform in multisource conditions similar to everyday life, where sound sources are rarely heard in isolation. In this task, there is no control over how many sound events overlap at any given time, in either the training or the testing audio data.
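To make "the number of overlapping sound events" concrete, here is a minimal sketch. It assumes each annotated event is an `(onset, offset, label)` tuple in seconds, treated as a half-open interval. Both that format and the function name `max_polyphony` are illustrative and not taken from a particular toolkit. The function sweeps over event boundaries to find the largest number of events active at the same moment.

```python
def max_polyphony(events: list[tuple[float, float, str]]) -> int:
    """Return the largest number of events active at the same instant.

    Events are (onset, offset, label) with half-open intervals [onset, offset),
    so an event that ends exactly when another starts does not overlap it.
    """
    boundaries: list[tuple[float, int]] = []
    for onset, offset, _label in events:
        boundaries.append((onset, 1))    # an event becomes active
        boundaries.append((offset, -1))  # an event stops being active
    # Sort by time; at equal times, -1 sorts before +1, so ends are handled first.
    boundaries.sort()
    active = best = 0
    for _time, delta in boundaries:
        active += delta
        best = max(best, active)
    return best


# Speech with a dog bark and a passing car layered on top, then a door closing.
events = [
    (0.0, 4.0, "speech"),
    (1.0, 2.0, "dog_bark"),
    (1.5, 3.0, "car"),
    (4.0, 5.0, "door"),
]
print(max_polyphony(events))
```

Between 1.5 s and 2.0 s three sources sound at once, which is the kind of uncontrolled overlap a detection system has to cope with.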
Speech-to-text transcription is an important part of building NLP technology. It involves transcribing recorded speech into text while carefully labeling both the words and the other sounds the speaker produces. Using correct punctuation is also important.
Audio classification involves listening to and analyzing audio recordings. Using this labeled data, machines can learn to distinguish between sounds and voice commands. This type of audio annotation is important for developing virtual assistants, automatic speech recognition, and text-to-speech systems. There are many different types of audio classification: