Is the time required to conduct emotion processing on a video roughly equal to that of the video?


I am using the Affectiva SDK for Linux to conduct emotion processing on videos. Even though I disabled the visual display that shows the tracking of facial movements, my processing time still seems to be roughly equal to the duration of the video itself. I have been processing videos from 30 seconds to 5 minutes, but I plan to move on to longer videos in the future.

Is the processing time required to detect emotion in a video equivalent to the length of the video itself?

I guess this would make sense if the software is still playing back the entire video to analyze it. Is there a way to speed things up? Perhaps I am missing something?

Thank you for your time, consideration and help! :slight_smile:


Yes, the processing time for a video is proportional to its length, since the video is decoded into its constituent frames for analysis. However, it is proportional to the number of frames, not tied directly to the video's playback duration.

In other words, a 20 second video would typically take roughly twice as long to process as a 10 second video because the former has twice as many frames as the latter.

The video frames are decoded and processed as quickly as possible; they are not processed at the frame rate of the source video. So a 20 second video would not necessarily take 20 seconds to process, except by coincidence.
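The relationship above can be sketched with a toy model. To be clear, the function and per-frame cost below are illustrative assumptions for the arithmetic, not part of the Affectiva SDK:

```python
# Back-of-the-envelope model of video emotion-processing time.
# Processing time scales with the number of frames analyzed,
# not with the video's playback duration.

def estimated_processing_seconds(video_seconds, process_fps, seconds_per_frame):
    """Estimate total processing time as (frames analyzed) x (cost per frame)."""
    frames_analyzed = video_seconds * process_fps
    return frames_analyzed * seconds_per_frame

# Hypothetical machine that analyzes one frame in 31.25 ms
# (0.03125 s = 1/32, chosen so the arithmetic is exact):
per_frame = 0.03125

# A 5-minute (300 s) video sampled at 30 FPS vs. 10 FPS:
t_30 = estimated_processing_seconds(300, 30, per_frame)  # 9000 frames
t_10 = estimated_processing_seconds(300, 10, per_frame)  # 3000 frames

print(t_30)  # 281.25 seconds
print(t_10)  # 93.75 seconds -- 3x faster just by sampling fewer frames
```

In this model, halving the sampling rate halves the frames analyzed and therefore (roughly) halves the processing time, which is why lowering the frame-rate parameter is one of the speedups suggested below.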

You can potentially speed things up a few ways:

  • only activate the classifiers you need (each activated classifier incurs additional processing time)
  • decrease the value of the processFPS parameter you pass to the VideoDetector constructor (fewer frames per second of video will be sampled and analyzed)
  • look at your callback methods to see if there is code there that can be optimized
  • and of course, you could run on a faster machine :slight_smile: