Rohit Anand edited this page Feb 19, 2019 · 2 revisions

Welcome to the AudFeature_extraction wiki!

feature_extraction.py

The features that it extracts on the basis of the pitch values are:

  1. min_pitch

  2. max_pitch

  3. mean_pitch

  4. num_voice_breaks

  5. percentage_breaks

  6. speak_rate

  7. num_pause

  8. Total_dur_pause

  9. num_rise

  10. num_fall

  11. duration_file (total duration of the audio file)

  12. play_time

The logic for finding the different features is:

  1. min_pitch = apply the min function over all values of the numpy pitch array.

  2. max_pitch = apply the max function over all values of the numpy pitch array.

  3. mean_pitch = apply the mean function over all values of the numpy pitch array.

  4. num_voice_breaks = whenever the pitch changes from zero to some value, or from some value to zero, there is a voice break; I count all such occurrences.

  5. percentage_breaks = the total number of voice breaks divided by the length of the numpy pitch array.

  6. speak_rate = the speaking rate in words per unit of time. I convert the spoken words into text, compute the play time by subtracting the pause time from the duration, and divide the number of words by the play time. With the play time in seconds this gives words spoken per second; multiply by 60 for words per minute.

  7. num_pause = the pitch being zero is treated as pause time.

  8. Total_dur_pause = I find the times at which the pitch is zero and add up the corresponding values.

  9. duration_file = the total number of frames divided by the frame rate.

  10. play_time = the pause time subtracted from the duration of the audio file.

  11. num_rise = a rise is when the pitch is increasing; pitch depends on the frequency as well as the amplitude.

  12. num_fall = a fall is when the pitch is decreasing.
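The logic above can be sketched roughly as follows. This is not the code from feature_extraction.py, only an illustration under stated assumptions: `frame_step` (seconds between pitch frames) is a hypothetical parameter, a pause is taken to be one unbroken run of zero-pitch frames, and rises and falls are counted only between two voiced frames.

```python
import wave

import numpy as np


def duration_file(path):
    """Total number of frames divided by the frame rate, in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def speak_rate(transcript, play_time):
    """Words per second: word count of the transcript over the play time."""
    return len(transcript.split()) / play_time


def pitch_features(pitch, frame_step, duration):
    """Pitch-based features from a per-frame pitch array.

    pitch      -- 1-D array of pitch values, 0 where there is no voice
    frame_step -- seconds between frames (hypothetical parameter)
    duration   -- duration of the audio file in seconds
    """
    pitch = np.asarray(pitch, dtype=float)
    if pitch.size == 0:
        raise ValueError("pitch array is empty")
    voiced = pitch > 0

    # A voice break is any switch between zero and non-zero pitch.
    num_voice_breaks = int(np.count_nonzero(voiced[1:] != voiced[:-1]))

    # Assumption: one pause = one unbroken run of zero-pitch frames.
    run_starts = (~voiced[1:]) & voiced[:-1]
    num_pause = int(np.count_nonzero(run_starts)) + int(not voiced[0])

    # Assumption: every zero-pitch frame adds frame_step seconds of pause.
    total_dur_pause = float(np.count_nonzero(~voiced) * frame_step)

    # Assumption: compare only neighbouring frames that are both voiced.
    both_voiced = voiced[1:] & voiced[:-1]
    delta = np.diff(pitch)

    return {
        "min_pitch": float(pitch.min()),
        "max_pitch": float(pitch.max()),
        "mean_pitch": float(pitch.mean()),
        "num_voice_breaks": num_voice_breaks,
        "percentage_breaks": num_voice_breaks / pitch.size,
        "num_pause": num_pause,
        "Total_dur_pause": total_dur_pause,
        "play_time": duration - total_dur_pause,
        "num_rise": int(np.count_nonzero((delta > 0) & both_voiced)),
        "num_fall": int(np.count_nonzero((delta < 0) & both_voiced)),
    }
```

Note that min_pitch and mean_pitch are taken over all values, as described above, so zero-pitch frames are included in both.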

audio_graph.py

This file is used to represent the audio in different formats such as the spectrogram, spectral roll-off, spectral centroid, MFCC, etc. It uses the Python library librosa. See the result by running this command:

foo@bar python audio_graph.py /audio/human.wav

The above command will display some of the important features as graphs, namely:

  1. Spectrogram
  2. Zero crossing rate
  3. Zoomed in views
  4. Spectral centroid
  5. Spectral roll off
  6. MFCC

The importance of the spectrogram graph is that it can easily be used as an input feature to a neural network, which can then extract important features from it.
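To make concrete what kind of array a network would receive, here is a minimal numpy-only sketch of a magnitude spectrogram. It is not what audio_graph.py does (that file uses librosa); the function name, the frame length, the hop size and the test tone are all assumptions chosen for illustration.

```python
import numpy as np


def magnitude_spectrogram(signal, frame_len=512, hop=256):
    """Return a (frequency_bins, frames) array of STFT magnitudes."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps the non-negative frequencies: frame_len // 2 + 1 bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T


sr = 8000                               # hypothetical sample rate in Hz
t = np.arange(sr) / sr                  # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)     # 1 kHz test tone
spec = magnitude_spectrogram(tone)
log_spec = 20 * np.log10(spec + 1e-10)  # dB scale, often used as model input
```

The result is a 2-D array with frequency along one axis and time along the other, so it can be treated like a single-channel image.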

extra_feature_extract.py

This file contains the code that extracts a further set of features from the audio file. These features matter more than the others, since together they cover most of what is needed when we train the model.

The features that it extracts are:

  1. ZCR
  2. Energy
  3. Entropy of energy
  4. Spectral centroid
  5. Spectral spread
  6. Spectral entropy
  7. Spectral flux
  8. Spectral roll-off
  9. MFCC
  10. Chroma vector
  11. Chroma deviation

In the output these appear under the names 'zcr', 'energy', 'energy_entropy', 'spectral_centroid', 'spectral_spread', 'spectral_entropy', 'spectral_flux', 'spectral_rolloff', 'mfcc_1', 'mfcc_2', 'mfcc_3', 'mfcc_4', 'mfcc_5', 'mfcc_6', 'mfcc_7', 'mfcc_8', 'mfcc_9', 'mfcc_10', 'mfcc_11', 'mfcc_12', 'mfcc_13', 'chroma_1', 'chroma_2', 'chroma_3', 'chroma_4', 'chroma_5', 'chroma_6', 'chroma_7', 'chroma_8', 'chroma_9', 'chroma_10', 'chroma_11', 'chroma_12', 'chroma_std'. The values are displayed as an array.
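For orientation, a few of these features can be written out for a single frame of samples. This is a sketch using common textbook definitions, not the code in extra_feature_extract.py; the function names, the number of sub-blocks and the 90% roll-off threshold are assumptions.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def zcr(frame):
    """Fraction of neighbouring sample pairs whose sign differs."""
    negative = np.signbit(frame)
    return float(np.mean(negative[1:] != negative[:-1]))


def energy(frame):
    """Mean of the squared samples."""
    return float(np.mean(np.asarray(frame, dtype=float) ** 2))


def energy_entropy(frame, n_blocks=10):
    """Entropy (in bits) of the energy spread over n_blocks sub-blocks.

    Needs len(frame) >= n_blocks; trailing samples that do not fill
    a whole block are dropped.
    """
    frame = np.asarray(frame, dtype=float)
    usable = len(frame) - len(frame) % n_blocks
    blocks = frame[:usable].reshape(n_blocks, -1)
    sub_energy = np.sum(blocks ** 2, axis=1)
    p = sub_energy / (np.sum(sub_energy) + EPS)
    return float(-np.sum(p * np.log2(p + EPS)))


def spectral_centroid_rolloff(frame, sr, threshold=0.90):
    """Return (centroid, roll-off) of the frame's spectrum, both in Hz."""
    frame = np.asarray(frame, dtype=float)
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    # Centroid: magnitude-weighted mean frequency.
    centroid = float(np.sum(freqs * mag) / (np.sum(mag) + EPS))
    # Roll-off: lowest frequency below which `threshold` of the
    # spectral energy is concentrated.
    cumulative = np.cumsum(mag ** 2)
    idx = int(np.searchsorted(cumulative, threshold * cumulative[-1]))
    return centroid, float(freqs[idx])
```

For a pure tone that falls exactly on one frequency bin, both the centroid and the roll-off come out at the frequency of the tone, which is a quick way to sanity-check the functions.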