Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Building a Manga Dataset Manga109 with Annotations for Multimedia Applications

90 0 0.0 ( 0 )

Download Cite

Added by Kiyoharu Aizawa Dr. Prof.

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Kiyoharu Aizawa - Azuma Fujimoto - Atsushi Otsubo

Multimedia Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Manga, or comics, which are a type of multimodal artwork, have been left behind in the recent trend of deep learning applications because of the lack of a proper dataset. Hence, we built Manga109, a dataset consisting of a variety of 109 Japanese comic books (94 authors and 21,142 pages) and made it publicly available by obtaining author permissions for academic use. We carefully annotated the frames, speech texts, character faces, and character bodies; the total number of annotations exceeds 500k. This dataset provides numerous manga images and annotations, which will be beneficial for use in machine learning algorithms and their evaluation. In addition to academic use, we obtained further permission for a subset of the dataset for industrial use. In this article, we describe the details of the dataset and present a few examples of multimedia processing applications (detection, retrieval, and generation) that apply existing deep learning methods and are made possible by the dataset.

rate research

A Dataset and Benchmarks for Multimedia Social Analysis

193 - Bofan Xue , David Chan , John Canny 2020

We present a new publicly available dataset with the goal of advancing multi-modality learning by offering vision and language data within the same context. This is achieved by obtaining data from a social media website with posts containing multiple paired images/videos and text, along with comment trees containing images/videos and/or text. With a total of 677k posts, 2.9 million post images, 488k post videos, 1.4 million comment images, 4.6 million comment videos, and 96.9 million comments, data from different modalities can be jointly used to improve performances for a variety of tasks such as image captioning, image classification, next frame prediction, sentiment analysis, and language modeling. We present a wide range of statistics for our dataset. Finally, we provide baseline performance analysis for one of the regression tasks using pre-trained models and several fully connected networks.

Computation and Language Computer Vision and Pattern Recognition Multimedia

Heterogeneous component interactions: Sensors integration into multimedia applications

276 - Christine Louberry 2008

Resource-constrained embedded and mobile devices are becoming increasingly common. Since few years, some mobile and ubiquitous devices such as wireless sensor, able to be aware of their physical environment, appeared. Such devices enable proposing applications which adapt to users need according the context evolution. It implies the collaboration of sensors and software components which differ on their nature and their communication mechanisms. This paper proposes a unified component model in order to easily design applications based on software components and sensors without taking care of their nature. Then it presents a state of the art of communication problems linked to heterogeneous components and proposes an interaction mechanism which ensures information exchanges between wireless sensors and software components.

Multimedia

Building Movie Map -- A Tool for Exploring Areas in a City -- and its Evaluation

69 - Naoki Sugimoto , Yoshihito Ebine , Kiyoharu Aizawa 2020

We propose a new Movie Map system, with an interface for exploring cities. The system consists of four stages; acquisition, analysis, management, and interaction. In the acquisition stage, omnidirectional videos are taken along streets in target areas. Frames of the video are localized on the map, intersections are detected, and videos are segmented. Turning views at intersections are subsequently generated. By connecting the video segments following the specified movement in an area, we can view the streets better. The interface allows for easy exploration of a target area, and it can show virtual billboards of stores in the view. We conducted user studies to compare our system to the GSV in a scenario where users could freely move and explore to find a landmark. The experiment showed that our system had a better user experience than GSV.

Multimedia Computer Vision and Pattern Recognition

An Export Architecture for a Multimedia Authoring Environment

326 - Jan Mikac 2008

In this paper, we propose an export architecture that provides a clear separation of authoring services from publication services. We illustrate this architecture with the LimSee3 authoring tool and several standard publication formats: Timesheets, SMIL, and XHTML.

Multimedia

Kalinahia: Considering Quality of Service to Design and Execute Distributed Multimedia Applications

525 - Sophie Laplace 2008

One of the current challenges of Information Systems is to ensure semi-structured data transmission, such as multimedia data, in a distributed and pervasive environment. Information Sytems must then guarantee users a quality of service ensuring data accessibility whatever the hardware and network conditions may be. They must also guarantee information coherence and particularly intelligibility that imposes a personalization of the service. Within this framework, we propose a design method based on original models of multimedia applications and quality of service. We also define a supervision platform Kalinahia using a user centered heuristic allowing us to define at any moment which configuration of software components constitutes the best answers to users wishes in terms of service.

Multimedia

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Building a Manga Dataset Manga109 with Annotations for Multimedia Applications

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions