Abstract
Automatically determining what constitutes a scene in a video is a challenging task, particularly since there is no precise definition of the term “scene”. It is left to the individual to decide which attributes, shared by consecutive shots, group those shots into scenes. Certain basic attributes, such as dialogs, settings and continuing sounds, are nevertheless consistent indicators of a scene. We have therefore developed a scheme for identifying scenes that clusters shots according to detected dialogs, settings and similar audio. Experimental results show that these types of scenes can be identified automatically and reliably.
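The abstract does not specify how shots are clustered. The following is only a minimal sketch under stated assumptions, not the scheme described in this paper. It assumes that upstream detectors have already assigned each shot a setting label, an audio label and an optional dialog speaker label (the `Shot` fields, the `similar` rule and the `window` parameter are all hypothetical). It then applies a simple windowed linking rule: if shot i is similar to some shot j at most `window` shots later, every shot from i to j belongs to the same scene. With speaker labels, an A-B-A-B dialog links naturally because each speaker reappears two shots later.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Shot:
    # Hypothetical per-shot labels assumed to come from upstream detectors.
    setting: int                   # e.g. id of a background/colour cluster
    audio: int                     # e.g. id of an audio-similarity cluster
    speaker: Optional[str] = None  # face/speaker label if the shot is in a dialog


def similar(a: Shot, b: Shot) -> bool:
    """Hypothetical link rule: same setting, same audio, or same dialog speaker."""
    if a.setting == b.setting or a.audio == b.audio:
        return True
    return a.speaker is not None and a.speaker == b.speaker


def group_scenes(shots: List[Shot], window: int = 3,
                 linked: Callable[[Shot, Shot], bool] = similar) -> List[List[int]]:
    """Group shot indices into scenes using overlapping links within `window` shots."""
    n = len(shots)
    if n == 0:
        return []
    # reach[i] = furthest shot index that shot i links to (at least i itself).
    reach = list(range(n))
    for i in range(n):
        for j in range(i + 1, min(n, i + window + 1)):
            if linked(shots[i], shots[j]):
                reach[i] = j  # j increases, so this keeps the furthest link
    scenes: List[List[int]] = []
    start, end = 0, reach[0]
    for i in range(1, n):
        if i > end:  # no link spans this boundary: close the current scene
            scenes.append(list(range(start, end + 1)))
            start = i
        end = max(end, reach[i])
    scenes.append(list(range(start, end + 1)))
    return scenes


if __name__ == "__main__":
    shots = [
        Shot(1, 10), Shot(1, 11),                    # same setting
        Shot(2, 12, "A"), Shot(3, 13, "B"),          # A-B-A-B dialog
        Shot(4, 14, "A"), Shot(5, 15, "B"),
        Shot(9, 19),                                 # unrelated shot
    ]
    print(group_scenes(shots))  # [[0, 1], [2, 3, 4, 5], [6]]
```

The linking window limits how far apart two similar shots may be and still pull the shots between them into one scene. A real system would replace the exact-label comparisons with thresholded similarity measures.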