All-In-One Audio Analysis Toolkit with Python

Aryan Bajaj
6 min read · Dec 2, 2022

Support Me (Contribute): https://ko-fi.com/aryanbajaj

Analyze your audio file in seconds.

Language is the foundation of all human conversations. As a result, the discipline of Natural Language Processing (NLP) has undeniably enormous promise in supporting humans in their daily lives.


In short, the domain of NLP consists of a set of strategies aimed at comprehending human-language data and performing downstream tasks.

NLP approaches include a wide range of topics, including Named Entity Recognition (NER), Text Summarization, Natural Language Generation (NLG), and many more.


While most previous NLP research and development has focused on applying various techniques to textual data, the community has recently seen tremendous adoption of speech-based interaction, prompting machine learning engineers to experiment and innovate in the speech space as well.

Text analysis is an important and powerful tool for understanding how language is used in a variety of contexts. By analyzing the structure, composition, and language of texts, we can gain insights into how people communicate and how language shapes our understanding of the world.

In this blog, we’ll explore several types and methods of text analysis, examining how they can be used to produce transcriptions, summaries, named entities, and sentiment scores.

Let’s begin with calling important libraries:

import speech_recognition as sr  # transcription

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer  # sentiment analysis

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

import re

The audio file I will be using is:

Click here to listen to the audio

After importing the libraries, the first step is Transcription:

Benefits:

  • Increased accessibility
  • Improved searchability
  • Easier translation
  • Increased engagement
  • Time-saving

r = sr.Recognizer()

hellow = sr.AudioFile('mix_1m50s-_audio-joiner.com_.wav')
with hellow as source:
    audio = r.record(source)

try:
    s = r.recognize_google(audio)
    print(s)
except Exception as e:
    print("Exception: " + str(e))
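One caveat worth knowing: the free Google Web Speech endpoint used by recognize_google can reject very long recordings. A common workaround (a sketch of my own, not part of the original article) is to transcribe the file in fixed-size chunks using the duration parameter of Recognizer.record, which reads sequentially from where the previous read stopped. The offset arithmetic below is plain Python; the speech_recognition wiring is shown in comments because it needs the audio file and network access.

```python
def chunk_spans(total_seconds, chunk_seconds=30):
    """Split a recording of total_seconds into (offset, duration) pairs
    of at most chunk_seconds each, covering the whole file."""
    spans = []
    start = 0
    while start < total_seconds:
        spans.append((start, min(chunk_seconds, total_seconds - start)))
        start += chunk_seconds
    return spans

# Hypothetical usage (requires the audio file and network access):
# r = sr.Recognizer()
# with sr.AudioFile('mix_1m50s-_audio-joiner.com_.wav') as source:
#     pieces = []
#     for offset, duration in chunk_spans(int(source.DURATION) + 1):
#         audio = r.record(source, duration=duration)  # reads sequentially
#         pieces.append(r.recognize_google(audio))
# s = " ".join(pieces)
```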

Output:

if I told you there is an AI model capable of cloning your voice to say 
anything you might be surprised to hear that it doesn't need lots of data
in fact it only needs 15 to 20 seconds of recorded speech this is a
significant finding because it means that voice cloning can be done without
having to collect large amounts of data which can be expensive and
time-consuming where is able to coronavirus by learning the patterns of
vibration in the vocal cords that produce certain sounds these patterns are
then used to generate new similar sound that can make the original voice
there are three main types of neural networks artificial neural networks and
recurrent neural networks RN and convolutional neural networks CNN and the
simplest type of neural network consists of an input layer hidden layer and
an output layer the input layer receives the inputs the hidden layer performs
the computations and the output layer produces the results are Ananda similar
to and they have additional layers that allowed them to process sequences of
data this makes them well suited for tasks such as speech recognition or
machine translation CNN are designed to process images they have an input
layer a series of convolutional layer and an output layer the convolutional
layers extract features from the images and the output layer produces the
results and we will be using rnn what is voice cloning voice cloning is a
technology that uses artificial intelligence to make a person's voice this
technology can be used to create a copy of someone's voice or to create a
new voice that sound similar to the original voice cloning can be used for
many different applications including creating synthetic voices for Digital
assistant generating voice overs for movies and games and creating new voices
for communication devices. ISRO is also uses various tech like these and
it is the Indian Space Research Organisation of India headquartered in
Bengaluru. It operates under Department of Space which is directly overseen
by the Prime Minister of India while chairman of ISRO acts as executive of
DOS as well.

Yay! We are done with the first part.

Now, the second step is the Text Summarizer.

For this, we will be using the facebook/bart-large-cnn model, developed by Facebook AI Research (FAIR). BART is a sequence-to-sequence (encoder-decoder) transformer pre-trained as a denoising autoencoder on large text corpora; this particular checkpoint is fine-tuned on the CNN/DailyMail news dataset specifically for summarization. The underlying architecture also supports other natural language processing tasks such as question answering and text-to-text generation.

The best part is that the length of the summary can also be adjusted, via the max_length and min_length parameters (measured in tokens).

Benefits:

  • Save time
  • Improved comprehension
  • Increased efficiency

from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = s  # the transcript produced in the transcription step
summary = summarizer(article, max_length=200, min_length=100)

After running the above-mentioned code, the model will start downloading a few files.


My system took 7 minutes; this may vary for you, as it depends on system specs and internet speed.

print(summary)

Output:

[{'summary_text': 'Voice cloning can be done without having to collect large 
amounts of data which can be expensive and time-consuming. The technology
can be used for many different applications including creating synthetic
voices for Digital assistant generating voice overs for movies and games and
creating new voices for communication devices. We will be using rnn to create
a voice clone of you to say anything. We hope you will find this article of
interest and help us understand more about voice cloning and how artificial
intelligence can help us create synthetic voices.'}]
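One practical limitation: BART accepts roughly 1024 input tokens, so longer transcripts need to be split before summarization. Below is a minimal sketch of my own (not from the original post) that greedily packs whole sentences into character-budgeted chunks; each chunk would then be summarized separately with the summarizer pipeline defined above, shown in comments since it requires the model download.

```python
def split_for_summarizer(text, max_chars=3000):
    """Greedily pack whole sentences into chunks of at most max_chars,
    so no chunk exceeds the model's input window."""
    sentences = text.replace("? ", "?|").replace(". ", ".|").split("|")
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = (current + " " + sentence).strip()
    if current:
        chunks.append(current)
    return chunks

# Hypothetical usage with the pipeline above:
# partial = [summarizer(c, max_length=100, min_length=30)[0]['summary_text']
#            for c in split_for_summarizer(s)]
# summary = " ".join(partial)
```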

Yay! The second step is also finished.

The next step is Named Entity Recognition (NER):

Benefits:

  • Improved data accuracy and consistency
  • Facilitates easier data integration
  • Automates manual data entry
  • Increased searchability and information retrieval
  • Helps build knowledge graphs and ontologies

import spacy
from spacy import displacy

NER = spacy.load("en_core_web_sm")

text1 = NER(s)  # run NER over the transcript

for word in text1.ents:
    print(word.text, word.label_)

Output: each recognized entity is printed alongside its label (for example, ORG for organisations and GPE for geopolitical entities such as cities and countries).

Yay! The third part is also finished.
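The raw loop above prints one entity per line; for meeting minutes it can be handier to group entities by label. Here is a small helper of my own (a sketch, not from the original article) that works on any spaCy Doc; the usage line assumes the NER pipeline and transcript variable defined above:

```python
from collections import defaultdict

def group_entities(doc):
    """Group a spaCy doc's entities by label, deduplicated and sorted."""
    grouped = defaultdict(set)
    for ent in doc.ents:
        grouped[ent.label_].add(ent.text)
    return {label: sorted(texts) for label, texts in grouped.items()}

# Hypothetical usage with the pipeline above:
# print(group_entities(NER(s)))
# In a notebook, displacy (imported earlier but unused) can render
# the same entities with inline highlighting:
# displacy.render(NER(s), style="ent")
```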

The last part is Sentiment Analysis:

We will be using VADER (Valence Aware Dictionary and sEntiment Reasoner) for sentiment analysis.

Benefits:

  • Improved understanding of participants’ feelings
  • Better engagement
  • Increased productivity
  • Improved decision-making

def sentiment_scores(sentence):

    # Create a SentimentIntensityAnalyzer object.
    sid_obj = SentimentIntensityAnalyzer()

    # The polarity_scores method returns a sentiment dictionary
    # containing pos, neg, neu, and compound scores.
    sentiment_dict = sid_obj.polarity_scores(sentence)

    print("Overall sentiment dictionary is : ", sentiment_dict)
    print(sentiment_dict['neg']*100, "% 😡")
    print(sentiment_dict['neu']*100, "% 😑")
    print(sentiment_dict['pos']*100, "% 😇")

    print("Minutes of the meetings Overall Rated as", end=" ")

    # Decide sentiment as positive, negative, or neutral.
    if sentiment_dict['compound'] >= 0.05:
        print("Positive")
    elif sentiment_dict['compound'] <= -0.05:
        print("Negative")
    else:
        print("Neutral")


# Driver code
if __name__ == "__main__":
    sentiment_scores(s)  # the transcript from the transcription step

Output:

Overall sentiment dictionary is :  {'neg': 0.0, 'neu': 0.917, 'pos': 0.083, 
'compound': 0.9674}

0.0 % 😡
91.7 % 😑
8.3 % 😇

Minutes of the meetings Overall Rated as Positive
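If you want to store the verdict alongside the minutes rather than just print it, the ±0.05 cut-offs used in sentiment_scores can be factored into a small reusable helper. A minimal sketch (my addition, mirroring the thresholds above):

```python
def sentiment_label(compound, threshold=0.05):
    """Map a VADER compound score to a verdict using ±threshold cut-offs."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

# e.g. sentiment_label(0.9674) returns "Positive", matching the output above.
```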

Yay! We have built our own Audio Analyzer.

This toolkit lets you analyze virtual-meeting audio in seconds and store valuable information such as the Minutes of the Meeting (M.O.M.).

In case of questions, leave a Comment or Email me at aryanbajaj104@gmail.com

ABOUT THE AUTHOR

Passionate about studying how to improve performance and automate tasks. Seeking to leverage data analytics, machine learning and artificial intelligence skills to improve corporate performance by optimum utilization of available resources.

Website — acumenfinalysis.com (CHECK THIS OUT)


CONTACTS:

If you have any questions or suggestions on what my next article should be about, please write to me at aryanbajaj104@gmail.com.

If you want to keep updated with my latest articles and projects, follow me on Medium.

Subscribe to my Medium Account

Click Here to Subscribe

CONNECT WITH ME VIA:

LinkedIn
