How to generate text from a video file using Python

Published in

Becoming Human: Artificial Intelligence Magazine

3 min read · May 7, 2019

Everyone is talking about natural language processing, speech recognition, text generation and so on. In this article, we will discuss how to get text from video or audio files.

Pre-requisites:
>> Python 3.7
>> ffmpeg
>> Libraries: os (part of the Python standard library) and speech_recognition (installed from PyPI under the name SpeechRecognition)
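
Before going further, it can help to confirm that these prerequisites are actually available. The snippet below is only a minimal sketch: the helper name check_prerequisites is hypothetical, and it assumes that ffmpeg is expected on the system PATH.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Report which prerequisites were found (hypothetical helper)."""
    return {
        # Version of the interpreter running this script, e.g. "3.7"
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        # True if an ffmpeg executable can be found on the PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # True if the speech_recognition package can be imported
        "speech_recognition": importlib.util.find_spec("speech_recognition") is not None,
    }


if __name__ == "__main__":
    for name, status in check_prerequisites().items():
        print(name, "->", status)
```

If ffmpeg or speech_recognition shows False, install it before continuing with the steps below.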

Step 1: Prepare directory
Create a new folder and add some video files to it. For instance, I have created a folder ‘SpeechConversion’ that contains one video song (in .mp4 format).
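
To check which video files the folder contains, a short sketch like the following could be used. The helper name list_videos and the VIDEO_EXTENSIONS tuple are hypothetical, and the sketch assumes only .mp4 files are of interest, as in this example.

```python
import os

# Assumption: only .mp4 files are treated as videos, as in this article
VIDEO_EXTENSIONS = (".mp4",)


def list_videos(folder):
    """Return the sorted names of video files directly inside `folder`."""
    return sorted(
        name
        for name in os.listdir(folder)
        if name.lower().endswith(VIDEO_EXTENSIONS)
        and os.path.isfile(os.path.join(folder, name))
    )


if __name__ == "__main__":
    # Create the folder if it does not exist yet, then list its videos
    os.makedirs("SpeechConversion", exist_ok=True)
    print(list_videos("SpeechConversion"))
```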

Step 2: Import libraries
Import the required libraries as shown below:
import os
import speech_recognition as sr

Step 3: Command for video conversion
I am using ffmpeg to convert the video file to audio. First, I convert it to mp3 format and then transform the mp3 to wav format. The wav format is uncompressed, and it is one of the file formats speech_recognition can read directly (it reads WAV, AIFF and FLAC files, but not mp3).
Here, my video file is named Bolna.mp4; I convert it to Bolna.mp3 and then to Bolna.wav.
Below are the commands for the conversion process.
Let’s save them in variables as below.
command2mp3 = "ffmpeg -i Bolna.mp4 Bolna.mp3"
command2wav = "ffmpeg -i Bolna.mp3 Bolna.wav"
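
The command strings above still have to be executed. One possible way, sketched here, is to pass them to os.system, since os is already imported; this is an assumption about how the commands are meant to be run, not something the text above confirms. The helper names build_commands and convert are hypothetical, and the sketch assumes file names without spaces.

```python
import os
import shutil


def build_commands(video_name):
    """Build the two ffmpeg command strings for a video such as 'Bolna.mp4'.

    Assumption: the file name contains no spaces, so no quoting is needed.
    """
    base, _ext = os.path.splitext(video_name)
    command2mp3 = "ffmpeg -i {} {}.mp3".format(video_name, base)
    command2wav = "ffmpeg -i {}.mp3 {}.wav".format(base, base)
    return command2mp3, command2wav


def convert(video_name):
    """Run both conversions in order; return True only if both succeed."""
    if shutil.which("ffmpeg") is None:
        # ffmpeg is not installed or not on the PATH
        return False
    for command in build_commands(video_name):
        # os.system returns 0 when the command exits successfully
        if os.system(command) != 0:
            return False
    return True


if __name__ == "__main__":
    print(convert("Bolna.mp4"))
```

If an output file already exists, ffmpeg asks before overwriting it; adding its -y flag to the commands skips that prompt.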


Written by Akash Bhiwgade