Special Session Proposals

MULTISAT: Multimodal Analysis and Retrieval of Satellite Images

Motivation

Deep learning and neural networks have recently introduced a paradigm shift in Earth Observation (EO), supporting understanding, interpretability and explainability in Artificial Intelligence. Copernicus data, Very-High Resolution (VHR) commercial satellite images, and other georeferenced data sources are often highly heterogeneous, distributed and semantically fragmented. Handling satellite images with diverse attributes and resolutions often demands downscaling techniques for more accurate estimations at local scales and effective visualisations on GIS platforms. Addressing these challenges increases the value of the original data and encourages the development of higher-value deep learning applications.
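As a minimal sketch of such a downscaling step, the following resamples a coarse raster onto a finer grid with the rasterio library; the file name coarse_product.tif and the factor of 4 are illustrative assumptions, and plain bilinear resampling stands in for the learning-based downscaling methods this session targets.

    import rasterio
    from rasterio.enums import Resampling

    # Illustrative input: a hypothetical coarse-resolution Copernicus product.
    with rasterio.open("coarse_product.tif") as src:
        scale = 4  # assumed upscale factor for local-scale estimation
        data = src.read(
            out_shape=(src.count, src.height * scale, src.width * scale),
            resampling=Resampling.bilinear,
        )
        # Rescale the affine transform so the output stays georeferenced.
        transform = src.transform * src.transform.scale(
            src.width / data.shape[-1], src.height / data.shape[-2]
        )

A real pipeline would substitute a statistical or deep-learning downscaling model for the bilinear kernel and write the result back to a GeoTIFF for display on a GIS platform.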

Image analysis with novel supervised, semi-supervised or unsupervised learning is already part of our lives and is rapidly entering the space sector to offer value-added Earth Observation products and services. Large volumes of satellite data arrive continuously from the Sentinel constellation, offering a basis for creating value-added products that go beyond the space sector. The visual analysis and fusion of all these data streams need to take advantage of existing Data and Information Access Services (DIAS) and High Performance Computing (HPC) infrastructures whenever the involved end users require fully automated processes in decision support systems. Most importantly, interpretable machine learning techniques should be deployed to unlock the knowledge hidden in big Copernicus data.

Organizers

Ilias Gialampoukidis (Centre for Research and Technology Hellas, Information Technologies Institute, Greece)
Stefanos Vrochidis (Centre for Research and Technology Hellas, Information Technologies Institute, Greece)
Ioannis Papoutsis (National Observatory of Athens, Greece)
Guido Vingione (Serco Italy SpA, Italy)
Ioannis Kompatsiaris (Centre for Research and Technology Hellas, Information Technologies Institute, Greece)
Mihai Datcu (DLR, Germany)

MULTIMED: Multimedia and Multimodal Analytics in the Medical Domain and Pervasive Environments

Motivation

This special session aims to present the most recent work and applications in multimedia analysis and digital health solutions for the medical domain and pervasive environments.

More specifically, multimedia research is becoming more and more important for the medical domain, where an increasing number of videos and images are integrated into the daily routine of surgical and diagnostic work. This includes management and inspection of the data, visual analytics, as well as learning relevant semantics and using recognition results to optimise surgical and diagnostic processes. In the field of medical endoscopy, for instance, more and more surgeons are starting to record and store videos of their endoscopic procedures, such as surgeries and examinations, in long-term video archives. The recorded endoscopic videos are used later (i) as a valuable source of information for follow-up procedures, (ii) to inform patients about the procedure, and (iii) to train young surgeons and teach new operation techniques. Sometimes these videos are also used for manual inspection and assessment of the technical skills of surgeons, with the ultimate goal of improving surgery quality over time. However, although some surgeons record the entire procedure as video, for example in the Netherlands where this is required by law, many record only the most important video segments. One way to support surgeons in accessing endoscopic video archives in a content-based way, i.e. in searching for a specific frame in an endoscopic video, is to automatically segment the video, remove irrelevant content, extract diverse keyframes, and provide an interactive browsing tool, e.g. with hierarchical refinement.
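As an illustration of the keyframe-extraction step in such a pipeline, here is a minimal sketch using OpenCV (an assumed toolkit; the session prescribes none) that keeps a frame whenever its colour histogram differs markedly from the last keyframe kept:

    import cv2

    def extract_keyframes(video_path, sample_rate=25, threshold=0.4):
        """Select diverse keyframes via colour-histogram differences."""
        cap = cv2.VideoCapture(video_path)
        keyframes, prev_hist, idx = [], None, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % sample_rate == 0:  # inspect roughly one frame per second
                hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
                hist = cv2.calcHist([hsv], [0, 1], None, [32, 32],
                                    [0, 180, 0, 256])
                cv2.normalize(hist, hist)
                # Bhattacharyya distance: 0 = identical, 1 = very different.
                if prev_hist is None or cv2.compareHist(
                        prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
                    keyframes.append((idx, frame))
                    prev_hist = hist
            idx += 1
        cap.release()
        return keyframes

A production system would add domain-specific filters (e.g. discarding blurry or out-of-body frames) before handing the keyframes to an interactive browsing tool.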

At the same time, the average lifespan is increasing, and the care of diseases related to lifestyle and age is becoming costlier and less accessible. Pervasive eHealth systems seem to offer a promising solution for accessible and affordable self-management of health problems. To fulfil this vision, two important dimensions are the intelligent aggregation, fusion and interpretation of input from diverse IoT devices, and personalised feedback delivered to users via intuitive interfaces and modalities. Pervasive and mobile technologies constitute one of the leading computing paradigms of the future: transitioning from the world of personal computing, devices are distributed across the user's environment, enabling the enrichment of business processes with the ability to sense, collect, integrate and combine multimodal data and services.

A key requirement in multimodal domains is the ability to integrate the different pieces of information so as to derive high-level interpretations. In this context, information is typically collected from multiple sources and complementary modalities, such as multimedia streams (e.g. using video analysis and speech recognition), lifestyle sensors and environmental sensors. Though each modality is informative on specific aspects of interest, the individual pieces of information by themselves cannot delineate complex situations; combined, they can plausibly describe the semantics of situations, facilitating intelligent situation awareness. The integration of devices and services to deliver novel solutions in the so-called Internet of Things (IoT) may have been partially addressed by open platforms, but it still poses challenges related not only to heterogeneity, but also to diverse context-aware information exchange and processing capabilities. On one hand, knowledge-driven approaches, such as rule- and ontology-based approaches, capitalise on knowledge representation formalisms to let domain experts model activities explicitly, combining multimodal information using predefined patterns rather than learning them from data. On the other hand, data-driven approaches rely on probabilistic and statistical models to represent activities and learn patterns from multimodal datasets. Hybrid solutions have been shown to increase context understanding: data-driven pre-processing (e.g. learning activity models) can improve the performance of ontology-based activity recognition, and vice versa.

Furthermore, beyond the challenges of sensing, reasoning, interpreting, learning, predicting and adapting, natural human-computer interaction via device agents, robots and avatars can deliver intuitive, personalised and context-aware spoken feedback. For example, wearable devices, smart home equipment and multimedia information can be enriched with face-to-face interactions, motivating people to participate actively in self-care activities and prescribed changes, as well as promoting the management of chronic conditions and supporting older adults' autonomy. Recently, human-computer interaction and conversational agents have also been used in the migration domain, acting as personalised assistants that support migrants and refugees in accessing health facilities, such as Public Health Services, and providing them with relevant information about emergency services.
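To make the hybrid knowledge-/data-driven idea above concrete, here is a toy sketch in which a stand-in, data-driven classifier scores atomic events and hand-written rules (standing in for an ontology and reasoner) combine them into situations; all event names, scores and rules are invented for illustration:

    def classify_events(window):
        """Stand-in for a learned model scoring atomic events in a sensor window."""
        return {"sitting": 0.8, "tv_on": 0.9, "kettle_on": 0.1}

    # Knowledge-driven layer: predefined patterns over atomic events.
    RULES = [
        ({"sitting", "tv_on"}, "watching_tv"),
        ({"standing", "kettle_on"}, "preparing_drink"),
    ]

    def interpret(window, threshold=0.5):
        scores = classify_events(window)
        detected = {event for event, s in scores.items() if s >= threshold}
        # Fire the first rule whose required events were all detected.
        for required, situation in RULES:
            if required <= detected:
                return situation
        return "unknown"

    print(interpret(window=None))  # -> watching_tv

In a real system the rule layer would typically be an OWL ontology evaluated by a reasoner, and the classifier any probabilistic or neural model trained on multimodal data.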

Organizers

Thanassis Mavropoulos (Centre for Research and Technology Hellas, Information Technologies Institute, Greece)
Georgios Meditskos (Aristotle University of Thessaloniki, Greece)
Klaus Schoeffmann (Klagenfurt University, Austria)
Leo Wanner (ICREA - Universitat Pompeu Fabra, Spain)
Stefanos Vrochidis (Centre for Research and Technology Hellas, Information Technologies Institute, Greece)
Athanasios Tzioufas (Medical School of the National and Kapodistrian University of Athens, Greece)

MDRE: Multimedia Datasets for Repeatable Experimentation

Motivation

Information retrieval and multimedia content access have a long history of comparative evaluation, and many of the advances in the area over the past decade can be attributed to the availability of open datasets that support comparative and repeatable experimentation. Sharing data and code so that other researchers can replicate results is needed in the multimedia modeling field; it helps improve both the performance of systems and the reproducibility of published papers. In terms of the existing state-of-the-art, there is one other related dataset track (at MMSys), and it has been heavily oversubscribed in recent years. Following discussions among members of the MMM Steering Committee at MMM2018, it was agreed that there is a clear need for a single index of (and venue for) datasets related to multimedia modeling.

The original MDRE special session proposal was a direct result of these discussions, and our goal is for it to continue as a permanent track in the MMM conference series. Consequently, this proposal is for a special session; in association with it, we will continue to update a permanent archive of links to MMM datasets (mmdatasets.org) related to this annual special session. This multimedia dataset track will be an opportunity for researchers and practitioners to make their work permanently available and citable in a single forum, as well as to increase public awareness of their considerable efforts.

Organizers

Cathal Gurrin (Dublin City University, Ireland)
Duc-Tien Dang-Nguyen (University of Bergen, Norway)
Björn Þór Jónsson (IT University of Copenhagen, Denmark)
Adam Jatowt (University of Innsbruck, Austria)
Liting Zhou (Dublin City University, Ireland)
Graham Healy (Dublin City University, Ireland)

MAPTA: Multimedia Analytics: Perspectives, Tools and Applications

Motivation

Multimedia analytics is a new and exciting research area that combines techniques from multimedia analysis, visual analytics, and data management, with a focus on creating interactive (human-in-the-loop) systems for analysing large-scale multimedia collections. The size and complexity of media collections are ever increasing, as is the desire to harvest useful information from these collections, with expected impacts ranging from the advancement of science to increased company profits. Indeed, multimedia analytics sees potential applications in diverse fields, including data journalism, urban computing, lifelogging, digital heritage, healthcare, digital forensics, marketing, people analytics, natural sciences, and social media. We therefore consider multimedia analytics to be one of the core research challenges of the multimedia research community.

Research papers focusing on various aspects of multimedia analytics have recently been published at MMM and other multimedia conferences, such as ACM Multimedia, ACM ICMR, ACM MMSys and IEEE ICME. However, we feel that the community still needs a dedicated and interactive venue for discussion, where definitions and directions can be proposed and debated, and this special session will provide such a venue.

Organizers

Björn Þór Jónsson (IT University of Copenhagen, Denmark)
Stevan Rudinac (University of Amsterdam, Netherlands)
Xirong Li (Renmin University of China, China)
Cathal Gurrin (Dublin City University, Ireland)
Laurent Amsaleg (CNRS-IRISA, France)

MACHU: Multimedia Analytics for Contextual Human Understanding

Motivation

Contextual analysis of human activities is a key underlying challenge for many recommender systems and personalised information retrieval systems. In recent years, the variety and volume of data available for such analysis have increased significantly. In addition, many new application domains have emerged, such as the quantified self, lifelogging and large-scale epidemiological studies. What brings all these domains together is the use of a wide range of multimodal multimedia data sources to model the activities of the individual, and the application of various state-of-the-art AI techniques to build semantically rich user models. Such data sources include wearable biometrics and sensors, human activity detectors and location logs, along with various forms of contextual information and knowledge sources.

This proposal is for a special session focused on the many challenges of modelling users from such contextual human data. It will provide a forum for researchers and practitioners to publish and discuss various aspects of human contextual modelling, with a focus not only on data sources, analytics and applications, but also on supporting authors who wish to propose position papers on related topics, such as the ethics of multimedia analytics for contextual user modelling or the myriad of privacy-related topics.

Organizers

Duc-Tien Dang-Nguyen (University of Bergen, Norway)
Minh-Son Dao (NICT, Japan)
Cathal Gurrin (Dublin City University, Ireland)
Ye Zheng (South Central University for Nationalities, China)
Thittaporn Ganokratanaa (KMUTT, Thailand)
Hung Tran-The (Deakin University, Australia)
Zhihan Lv (Qingdao University, China)