Visual narrative is often a combination of explicit information and judicious omissions, relying on the viewer to supply missing details. In comics, most movements in time and space are hidden in the “gutters” between panels. To follow the story, readers logically connect panels together by inferring unseen actions through a process called “closure”. While computers can now describe the content of natural images, in this paper we examine whether they can understand the closure-driven narratives conveyed by stylized artwork and dialogue in comic book panels. We collect a dataset, COMICS, that consists of over 1.2 million panels (120 GB) paired with automatic textbox transcriptions. An in-depth analysis of COMICS demonstrates that neither text nor image alone can tell a comic book story, so a computer must understand both modalities to keep up with the plot. We introduce three cloze-style tasks that ask models to predict narrative and character-centric aspects of a panel given n preceding panels as context. Various deep neural architectures underperform human baselines on these tasks, suggesting that COMICS contains fundamental challenges for both vision and language.
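To make the cloze-style setup in the abstract concrete, here is a minimal sketch of how one such instance might be assembled: take the n panels preceding a target panel as context, then hide the target among distractor candidates. Everything here is an assumption for illustration, not the authors' data format or code: the `ClozeInstance` type, the `build_cloze_instance` function, the use of plain strings to stand in for panels, and the way distractors are supplied are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClozeInstance:
    context: List[str]     # the n panels that precede the target
    candidates: List[str]  # shuffled answer options (truth + distractors)
    answer: int            # index of the true target within candidates


def build_cloze_instance(
    panels: Sequence[str],
    target_idx: int,
    n: int,
    distractors: Sequence[str],
    seed: int = 0,
) -> ClozeInstance:
    """Hypothetical helper: build one cloze instance from a panel sequence.

    Strings stand in for panels; a real system would carry image and
    textbox data instead.
    """
    if n < 1 or target_idx < n or target_idx >= len(panels):
        raise ValueError("need n >= 1 and n preceding panels before the target")

    context = list(panels[target_idx - n:target_idx])
    truth = panels[target_idx]
    if truth in distractors:
        raise ValueError("distractors must differ from the true target")

    # Shuffle with a seeded generator so the instance is reproducible.
    candidates = [truth] + list(distractors)
    random.Random(seed).shuffle(candidates)
    return ClozeInstance(context, candidates, candidates.index(truth))


panels = ["p0", "p1", "p2", "p3", "p4"]
instance = build_cloze_instance(panels, target_idx=3, n=3, distractors=["x", "y"])
print(instance.context)
print(instance.candidates[instance.answer])
```

A model would be scored on whether it picks `instance.answer` given only `instance.context` and `instance.candidates`, which is where the inference across gutters becomes necessary.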
From the introduction:
Comics are fragmented scenes forged into full-fledged stories by the imagination of their readers. A comics creator can condense anything from a centuries-long intergalactic war to an ordinary family dinner into a single panel. But it is what the creator hides from their pages that makes comics truly interesting, the unspoken conversations and unseen actions that lurk in the spaces (or gutters) between adjacent panels. For example, the dialogue in Figure 1 suggests that between the second and third panels, Gilda commands her snakes to chase after a frightened Michael in some sort of strange cult initiation. Through a process called closure, which involves (1) understanding individual panels and (2) making connective inferences across panels, readers form coherent storylines from seemingly disparate panels such as these. In this paper, we study whether computers can do the same by collecting a dataset of comic books (COMICS) and designing several tasks that require closure to solve.
(emphasis in original)
Comic book security: A method for defeating worldwide data slurping and automated analysis.
The authors find that humans easily outperform the automated models on these tasks, which raises the question of whether mixing text and images could serve as a means to evade widespread data sweeps.
Security based on a lack of human eyes to review content is chancy, but depending upon your security needs, it may be sufficient.
For example, a cartoon in a local newspaper that designates a mission target and time only needs to stay secure from the time of its publication until the mission has finished. Its discovery days, weeks, or even months later doesn’t affect the operational security of the mission.
The COMICS dataset is available at: http://github.com/miyyer/comics.
Guaranteed algorithmic security is great, but hiding in the gaps of computational ability may be just as effective.