All-In-One Audio Analysis Toolkit with Python

Aryan Bajaj
6 min read · Dec 2, 2022

Support Me (Contribute): https://ko-fi.com/aryanbajaj

Analyze your audio file in seconds.

Language is the foundation of all human conversations. As a result, the discipline of Natural Language Processing (NLP) has undeniably enormous promise in supporting humans in their daily lives.


In short, the domain of NLP consists of a set of strategies aimed at comprehending human-language data and performing downstream tasks.

NLP approaches include a wide range of topics, including Named Entity Recognition (NER), Text Summarization, Natural Language Generation (NLG), and many more.


While most previous NLP research and development has focused on applying various techniques to textual data, the community has recently seen tremendous adoption of speech-based interaction, prompting machine learning engineers to experiment and innovate in the speech space as well.

Text analysis is an important and powerful tool for understanding how language is used in a variety of contexts. By analyzing the structure, composition, and language of texts, we can gain insights into how people communicate and how language shapes our understanding of the world.

In this blog, we’ll explore several types and methods of text analysis, examining how they can be used to produce transcriptions, summaries, named entities, and sentiment scores.

Let’s begin with calling important libraries:

import speech_recognition as sr  # transcription

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer  # sentiment analysis

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

import re

The audio file I will be using is:

Click here to listen to the audio

After importing the libraries, the first step is Transcription:

Benefits:

  • Increased accessibility
  • Improved searchability
  • Easier translation
  • Increased engagement
  • Time-saving

r = sr.Recognizer()

hellow = sr.AudioFile('mix_1m50s-_audio-joiner.com_.wav')
with hellow as source:
    audio = r.record(source)

try:
    s = r.recognize_google(audio)
    print(s)
except Exception as e:
    print("Exception: " + str(e))
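One caveat worth knowing: the free Google Web Speech endpoint used by recognize_google can reject very long recordings. A common workaround (a sketch of my own, not part of the original article) is to transcribe the file in fixed-size chunks using the duration parameter of Recognizer.record, which reads sequentially from where the previous read stopped. The offset arithmetic below is plain Python; the speech_recognition wiring is shown in comments because it needs the audio file and network access.

```python
def chunk_spans(total_seconds, chunk_seconds=30):
    """Split a recording of total_seconds into (offset, duration) pairs
    of at most chunk_seconds each, covering the whole file."""
    spans = []
    start = 0
    while start < total_seconds:
        spans.append((start, min(chunk_seconds, total_seconds - start)))
        start += chunk_seconds
    return spans

# Hypothetical usage (requires the audio file and network access):
# r = sr.Recognizer()
# with sr.AudioFile('mix_1m50s-_audio-joiner.com_.wav') as source:
#     pieces = []
#     for offset, duration in chunk_spans(int(source.DURATION) + 1):
#         audio = r.record(source, duration=duration)  # reads sequentially
#         pieces.append(r.recognize_google(audio))
# s = " ".join(pieces)
```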

Output:

if I told you there is an AI model capable of cloning your voice to say 
anything you might be surprised to hear that it doesn't need lots of data
in fact it only needs 15 to 20 seconds of recorded speech this is a
significant finding because it means that voice cloning can be done without
having to collect large amounts of data which can be expensive and
time-consuming where is able to coronavirus by learning the patterns of
vibration in the vocal cords that produce certain sounds these patterns are
then used to generate new similar sound that can make the original voice
there are three main types of neural networks artificial neural networks and
recurrent neural networks RN and convolutional neural networks CNN and the
simplest type of neural network consists of an input layer hidden layer and
an output layer the input layer receives the inputs the hidden layer performs
the computations and the output layer produces the results are Ananda similar
to and they have additional layers that allowed them to process sequences of
data this makes them well suited for tasks such as speech recognition or
machine translation CNN are designed to process images they have an input
layer a series of convolutional layer and an output layer the convolutional
layers extract features from the images and the output layer produces the
results and we will be using rnn what is voice cloning voice cloning is a
technology that uses artificial intelligence to make a person's voice this
technology can be used to create a copy of someone's voice or to create a
new voice that sound similar to the original voice cloning can be used for
many different applications including creating synthetic voices for Digital
assistant generating voice overs for movies and games and creating new voices
for communication devices. ISRO is also uses various tech like these and
it is the Indian Space Research Organisation of India headquartered in
Bengaluru. It operates under Department of Space which is directly overseen
by the Prime Minister of India while chairman of ISRO acts as executive of
DOS as well.

Yay! We are done with the first part.

Now, the second step is the Text Summarizer.

For this, we will be using the facebook/bart-large-cnn model, developed by Facebook AI Research (FAIR). BART is a sequence-to-sequence (encoder-decoder) transformer pre-trained as a denoising autoencoder on large text corpora; this particular checkpoint is fine-tuned on the CNN/DailyMail news dataset specifically for summarization. The underlying architecture also supports other natural language processing tasks such as question answering and text-to-text generation.

The best part is that the length of the summary can also be adjusted, via the max_length and min_length parameters (measured in tokens).

Benefits:

  • Save time
  • Improved comprehension
  • Increased efficiency

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = s  # the transcript produced in the transcription step
summary = summarizer(article, max_length=200, min_length=100)

After running the above-mentioned code, the model will start downloading a few files.


My system took 7 minutes; this may vary for you, as it depends on system specs and internet speed.

print(summary)

Output:

[{'summary_text': 'Voice cloning can be done without having to collect large 
amounts of data which can be expensive and time-consuming. The technology
can be used for many different applications including creating synthetic
voices for Digital assistant generating voice overs for movies and games and
creating new voices for communication devices. We will be using rnn to create
a voice clone of you to say anything. We hope you will find this article of
interest and help us understand more about voice cloning and how artificial
intelligence can help us create synthetic voices.'}]
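One practical limitation: BART accepts roughly 1024 input tokens, so longer transcripts need to be split before summarization. Below is a minimal sketch of my own (not from the original post) that greedily packs whole sentences into character-budgeted chunks; each chunk would then be summarized separately with the summarizer pipeline defined above, shown in comments since it requires the model download.

```python
def split_for_summarizer(text, max_chars=3000):
    """Greedily pack whole sentences into chunks of at most max_chars,
    so no chunk exceeds the model's input window."""
    sentences = text.replace("? ", "?|").replace(". ", ".|").split("|")
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks

# Hypothetical usage with the pipeline above:
# partial = [summarizer(c, max_length=100, min_length=30)[0]['summary_text']
#            for c in split_for_summarizer(s)]
# summary = " ".join(partial)
```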

Yay! The second step is also finished.

The next step is Named Entity Recognition (NER):

Benefits:

  • Improved data accuracy and consistency
  • Facilitates easier data integration
  • Automates manual data entry
  • Increased searchability and information retrieval
  • Helps build knowledge graphs and ontologies

import spacy
from spacy import displacy

NER = spacy.load("en_core_web_sm")

text1 = NER(s)  # run NER over the transcript

for word in text1.ents:
    print(word.text, word.label_)

Output: each recognized entity is printed alongside its label (for example, ORG for organisations and GPE for geopolitical entities such as cities and countries).

Yay! The third part is also finished.
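The raw loop above prints one entity per line; for meeting minutes it can be handier to group entities by label. Here is a small helper of my own (a sketch, not from the original article) that works on any spaCy Doc; the usage line assumes the NER pipeline and transcript variable defined above:

```python
from collections import defaultdict

def group_entities(doc):
    """Group a spaCy doc's entities by label, deduplicated and sorted."""
    grouped = defaultdict(set)
    for ent in doc.ents:
        grouped[ent.label_].add(ent.text)
    return {label: sorted(texts) for label, texts in grouped.items()}

# Hypothetical usage with the pipeline above:
# print(group_entities(NER(s)))
# In a notebook, displacy (imported earlier but unused) can render
# the same entities with inline highlighting:
# displacy.render(NER(s), style="ent")
```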

The last part is Sentiment Analysis:

We will be using VADER (Valence Aware Dictionary and sEntiment Reasoner) for sentiment analysis.

Benefits:

  • Improved understanding of participants’ feelings
  • Better engagement
  • Increased productivity
  • Improved decision-making

def sentiment_scores(sentence):

    # Create a SentimentIntensityAnalyzer object.
    sid_obj = SentimentIntensityAnalyzer()

    # The polarity_scores method returns a sentiment dictionary
    # containing pos, neg, neu, and compound scores.
    sentiment_dict = sid_obj.polarity_scores(sentence)

    print("Overall sentiment dictionary is : ", sentiment_dict)
    print(sentiment_dict['neg']*100, "% 😡")
    print(sentiment_dict['neu']*100, "% 😑")
    print(sentiment_dict['pos']*100, "% 😇")

    print("Minutes of the meetings Overall Rated as", end=" ")

    # Decide sentiment as positive, negative, or neutral.
    if sentiment_dict['compound'] >= 0.05:
        print("Positive")
    elif sentiment_dict['compound'] <= -0.05:
        print("Negative")
    else:
        print("Neutral")


# Driver code
if __name__ == "__main__":
    sentiment_scores(s)  # the transcript from the transcription step

Output:

Overall sentiment dictionary is :  {'neg': 0.0, 'neu': 0.917, 'pos': 0.083, 
'compound': 0.9674}

0.0 % 😡
91.7 % 😑
8.3 % 😇

Minutes of the meetings Overall Rated as Positive
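If you want to store the verdict alongside the minutes rather than just print it, the ±0.05 cut-offs used in sentiment_scores can be factored into a small reusable helper. A minimal sketch (my addition, mirroring the thresholds above):

```python
def sentiment_label(compound, threshold=0.05):
    """Map a VADER compound score to a verdict using ±threshold cut-offs."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

# e.g. sentiment_label(0.9674) returns "Positive", matching the output above.
```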

Yay! We have built our own Audio Analyzer.

This toolkit lets you analyze virtual-meeting audio in seconds and store valuable information such as the Minutes of the Meeting (M.O.M.).

In case of questions, leave a Comment or Email me at aryanbajaj104@gmail.com

ABOUT THE AUTHOR

Passionate about studying how to improve performance and automate tasks. Seeking to leverage data analytics, machine learning and artificial intelligence skills to improve corporate performance by optimum utilization of available resources.

Website — acumenfinalysis.com (CHECK THIS OUT)


CONTACTS:

If you have any questions or suggestions on what my next article should be about, please write to me at aryanbajaj104@gmail.com.

If you want to keep updated with my latest articles and projects, follow me on Medium.

Subscribe to my Medium Account

Click Here to Subscribe

CONNECT WITH ME VIA:

LinkedIn
