Throughout the project, the FU Berlin project team created a high-quality dataset of manual film-analytical annotations for a set of feature films, documentaries, and television news. These valuable annotations are published here as Linked Open Data under the CC BY-SA 3.0 license to make the data available to film scholars as well as researchers from other domains.
The annotations were created based on a film scholar’s analytical framework (eMAEX method) to study the aesthetics of audio-visual images. The annotation work followed a strict annotation routine to precisely describe the films of the corpus under different levels of description (see ontology).
The annotations were created with Advene, a free software toolkit for annotating audio-visual documents. We worked closely with Olivier Aubert, one of the authors of Advene, on the one hand to improve the user interface for faster annotation work, and on the other hand to enable the import of the AdA filmontology and the export of W3C-compliant video annotations as RDF data.
To create annotations that conform to the AdA filmontology, you can use the Advene template package that we provide in our GitHub repository.
The datasets created in the AdA project consist of thousands of annotations that use timecode-based references to the original video material. Each annotation refers to a fragment of a movie via the start and end timecodes to which the annotation applies. The content of the annotation captures the film-analytical concept described by the annotator, expressed through the annotation types and values defined in the AdA filmontology. Each annotation also contains metadata about its author and creation date.
For example, the following information is available to characterize camera movement in minute 41 of the feature film “The Company Men”:
| | |
|---|---|
| Annotation Type | Camera Movement Type |
| Annotation Value | tracking shot |
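The structure described above can be sketched as a plain data record. The field names below are chosen for readability only; the published data uses W3C Web Annotation vocabulary and AdA filmontology URIs instead.

```python
# Illustrative sketch of the information one annotation carries.
# Field names and the metadata values are placeholders, not the actual schema.
annotation = {
    "film": "The Company Men",
    "fragment": {"start": "00:41:29", "end": "00:41:50"},  # timecode-based reference
    "type": "Camera Movement Type",   # annotation type from the AdA filmontology
    "value": "tracking shot",         # predefined value from the ontology
    "creator": "annotator name",      # authorship metadata
    "created": "2019-01-01",          # creation-date metadata
}

print(annotation["type"], "->", annotation["value"])
```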
Our annotations are encoded using the W3C Web Annotation Data Model. An annotation in this model is a relationship between resources, normally consisting of a body (the description) and a target (an external resource, e.g., a movie, an MP3 file, a PDF document). Since we have to refer to parts of external resources (video fragments), we use the W3C Media Fragments URI standard to encode temporal references in URIs. In the example above, the annotation type and value are encoded in the annotation body, and the video segment is referenced with the timecode interval t=2489.900,2510.620.
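Building such a temporal reference is a matter of appending a `t=start,end` fragment to the video's URI. A minimal sketch (the video URI below is a placeholder, not the project's actual file location):

```python
def media_fragment_uri(video_uri: str, start: float, end: float) -> str:
    """Append a W3C Media Fragments temporal selector (t=start,end) to a video URI."""
    return f"{video_uri}#t={start:.3f},{end:.3f}"

# Placeholder video URI for illustration:
target = media_fragment_uri("http://example.org/company-men.mp4", 2489.900, 2510.620)
print(target)  # http://example.org/company-men.mp4#t=2489.900,2510.620
```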
All annotations are published online in our triplestore. The annotations can be accessed through their URIs. Here are some examples:
| Film | Example | Timecode | Annotation |
|---|---|---|---|
| The Company Men | Annotation with one predefined value | 00:41:29-00:41:50 | Camera Movement Type: tracking shot |
| The Company Men | Annotation with evolving values | 00:41:29-00:41:50 | Camera Angle Canted: level [TO] tilt right |
| Inside Job | Annotation with a text value | 00:01:05-00:01:13 | Dialogue transcript |
We also developed the Annotation Explorer, a web-based application for querying, analyzing, and visualizing semantic video annotations, which provides consistent access to over 90,000 annotations. It is also possible to query the raw RDF data using our public SPARQL endpoint.
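A query against the SPARQL endpoint can be assembled with nothing but the standard library. The endpoint URL below is a placeholder; consult the project documentation for the actual endpoint address. The `oa:` prefix is the W3C Web Annotation namespace mentioned above.

```python
import urllib.parse

# Placeholder endpoint address — replace with the project's actual SPARQL endpoint.
ENDPOINT = "https://example.org/sparql"

# List a few annotations with their bodies and targets,
# using the W3C Web Annotation (oa:) vocabulary.
query = """
PREFIX oa: <http://www.w3.org/ns/oa#>
SELECT ?annotation ?body ?target
WHERE {
  ?annotation a oa:Annotation ;
              oa:hasBody ?body ;
              oa:hasTarget ?target .
}
LIMIT 10
"""

# SPARQL endpoints typically accept the query as the `query` parameter of a GET request:
request_url = f"{ENDPOINT}?{urllib.parse.urlencode({'query': query})}"
print(request_url[:60], "...")
```

The same query string can equally be sent as a POST body, which is preferable for long queries.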
All annotation datasets are available for download in our GitHub repository as RDF exports in Turtle and JSON-LD format. Currently, we provide annotations for the following movies:
| Film | Number of annotations |
|---|---|
| Capitalism: A Love Story | 19917 |
| Occupy Wall Street | 581 |
| The Big Short | 22892 |
| The Company Men | 24285 |