Abstract
Determining the scenes of a video is a challenging task. When humans are asked to do it, their results are inconsistent because the term scene is not precisely defined. Each person is left to decide which shared attributes integrate shots into scenes. However, consistent results can be found for certain basic attributes such as dialogs, same settings and continuing sounds. We have therefore developed a scene determination scheme which clusters shots based on detected dialogs, same settings and similar audio. Our experimental results show that automatic determination of these types of scenes can be performed reliably.