Building a Manga Dataset Manga109 with Annotations for Multimedia Applications

Published by Kiyoharu Aizawa Dr. Prof.

Publication date: 2020

Research field: Informatics Engineering

Paper language: English

Authors: Kiyoharu Aizawa - Azuma Fujimoto - Atsushi Otsubo

Multimedia, Computer Vision and Pattern Recognition

Abstract

Manga, or comics, which are a type of multimodal artwork, have been left behind in the recent trend of deep learning applications because of the lack of a proper dataset. Hence, we built Manga109, a dataset consisting of a variety of 109 Japanese comic books (94 authors and 21,142 pages) and made it publicly available by obtaining author permissions for academic use. We carefully annotated the frames, speech texts, character faces, and character bodies; the total number of annotations exceeds 500k. This dataset provides numerous manga images and annotations, which will be beneficial for use in machine learning algorithms and their evaluation. In addition to academic use, we obtained further permission for a subset of the dataset for industrial use. In this article, we describe the details of the dataset and present a few examples of multimedia processing applications (detection, retrieval, and generation) that apply existing deep learning methods and are made possible by the dataset.

193 - Bofan Xue, David Chan, John Canny 2020

We present a new publicly available dataset with the goal of advancing multi-modality learning by offering vision and language data within the same context. This is achieved by obtaining data from a social media website with posts containing multiple paired images/videos and text, along with comment trees containing images/videos and/or text. With a total of 677k posts, 2.9 million post images, 488k post videos, 1.4 million comment images, 4.6 million comment videos, and 96.9 million comments, data from different modalities can be jointly used to improve performances for a variety of tasks such as image captioning, image classification, next frame prediction, sentiment analysis, and language modeling. We present a wide range of statistics for our dataset. Finally, we provide baseline performance analysis for one of the regression tasks using pre-trained models and several fully connected networks.

Computation and Language, Computer Vision and Pattern Recognition, Multimedia

Heterogeneous component interactions: Sensors integration into multimedia applications

267 - Christine Louberry 2008

Resource-constrained embedded and mobile devices are becoming increasingly common. In recent years, mobile and ubiquitous devices such as wireless sensors, which are aware of their physical environment, have appeared. Such devices make it possible to build applications that adapt to users' needs as the context evolves. This requires collaboration between sensors and software components, which differ in their nature and their communication mechanisms. This paper proposes a unified component model for easily designing applications based on software components and sensors, regardless of their nature. It then presents the state of the art on communication problems linked to heterogeneous components and proposes an interaction mechanism that ensures information exchange between wireless sensors and software components.

Multimedia

Building Movie Map -- A Tool for Exploring Areas in a City -- and its Evaluation

69 - Naoki Sugimoto, Yoshihito Ebine, Kiyoharu Aizawa 2020

We propose a new Movie Map system, with an interface for exploring cities. The system consists of four stages: acquisition, analysis, management, and interaction. In the acquisition stage, omnidirectional videos are taken along streets in target areas. Frames of the video are localized on the map, intersections are detected, and videos are segmented. Turning views at intersections are subsequently generated. By connecting the video segments following the specified movement in an area, we can view the streets better. The interface allows for easy exploration of a target area, and it can show virtual billboards of stores in the view. We conducted user studies to compare our system to the GSV in a scenario where users could freely move and explore to find a landmark. The experiment showed that our system had a better user experience than GSV.

Multimedia, Computer Vision and Pattern Recognition

An Export Architecture for a Multimedia Authoring Environment

320 - Jan Mikac 2008

In this paper, we propose an export architecture that provides a clear separation of authoring services from publication services. We illustrate this architecture with the LimSee3 authoring tool and several standard publication formats: Timesheets, SMIL, and XHTML.

Multimedia

Kalinahia: Considering Quality of Service to Design and Execute Distributed Multimedia Applications

520 - Sophie Laplace 2008

One of the current challenges of Information Systems is to ensure the transmission of semi-structured data, such as multimedia data, in a distributed and pervasive environment. Information Systems must then guarantee users a quality of service that ensures data accessibility whatever the hardware and network conditions may be. They must also guarantee information coherence, and particularly intelligibility, which requires personalization of the service. Within this framework, we propose a design method based on original models of multimedia applications and quality of service. We also define a supervision platform, Kalinahia, which uses a user-centered heuristic to determine at any moment which configuration of software components best answers users' wishes in terms of service.

Multimedia