We summarize work on distributed data management based on Webdamlog, a Datalog extension.
The Case for Small Data Management
by Jens Dittrich
Room: A102
Chair: Tadeusz Morzy
Abstract
Exabytes of data; several hundred thousand TPC-C transactions per second on a single computing core; scale-up to hundreds of cores and a dozen Terabytes of main memory; scale-out to thousands of nodes with close to Petabyte-sized main memories; and massively parallel query processing are a reality in data management. But, hold on a second: for how many users exactly? How many users do you know that really have to handle these kinds of massive datasets and extreme query workloads? On the other hand: how many users do you know that are fighting to handle relatively small datasets, say in the range of a few thousand to a few million rows per table? How come some of the most popular open source DBMS have hopelessly outdated optimizers producing inefficient query plans? How come people don't care and love it anyway? Could it be that most of the world's data management problems are actually quite small? How can we increase the impact of database research in areas where datasets are small? What are the typical problems? What does this mean for database research? We discuss research challenges, directions, and a concrete technical solution coined PDbF: Portable Database Files (open source at https://github.com/uds-datalab/PDBF). See also our VLDB 2015 demo.
Tutorials
Towards an Era of Trust in Personal Data Management
by Nicolas Anciaux, Benjamin Nguyen and Iulian Sandu Popa
Room: A101
Abstract
Managing personal data with strong privacy guarantees has become an important topic in an age where your glasses record and share everything you see, your wallet records and shares your financial transactions, and your set-top box records and shares your energy consumption, while several recent affairs have unveiled the severe consequences of the loss of privacy. In this context, more and more alternatives are proposed based on user-centric and decentralized solutions, capitalizing on the use of trusted personal devices controlling the data at the edges of the Internet. Decentralized solutions are promising because they do not exhibit the intrinsic limitations of classical centralized solutions, e.g., sudden changes in the privacy policies of companies holding the data, data exposure through negligence or because the data is governed by overly weak policies, and exposure to sophisticated attacks whose benefit/cost ratio is high for centralized databases. Hence, such solutions appear as a sea change for personal data management, where the control over personal data is pushed to the edges of the Internet, within sensors acquiring the data and in a variety of user devices endowed with a form of trust, e.g., tamper-resistant secure hardware-based devices.
This tutorial reviews several existing solutions going in this direction, presents a functional architecture encompassing these alternatives, and exposes the underlying techniques and open issues dealing with user-centric and decentralized data management platforms. In a first part, we review the recent initiatives pursuing the objective of reestablishing user control over their data by decentralizing this control in personal secure or trusted devices. We discuss an abstract distributed architecture focusing on secure storage, management and sharing of personal data, i.e., the asymmetric architecture, and indicate the main challenges inherent to decentralized data management. In a second part, we explore data management techniques exercised within a trusted device at the client side. We review the main attempts proposed in the literature and concentrate on those addressing the specific context of microcontrollers equipping sensors and mobile phones (SIM cards). In a third part, we investigate the problem of performing global processing without any compromise on data privacy. We present the difficulties to overcome to execute privacy-preserving computations on populations of personal devices, and illustrate them by focusing on Group By SQL queries and Privacy Preserving Data Publishing. In a fourth part, we conclude the tutorial by presenting existing and future instances of decentralized privacy-preserving data management architectures. We mainly focus on attempts and proposals targeting social-medical, smart-house, and rural-area contexts.
Query Processing: Beyond SQL and Relations
by Boris Novikov
Room: A101
Abstract
Query processing and optimization have been essential for any data processing system since the introduction of high-level declarative query languages in the early 1980s. During the last decade several new techniques were introduced in order to address requirements of new classes of applications, data models, storage and indexing, and querying paradigms.
Modern query processing and optimization extends far beyond relational queries. Several techniques were revised and a number of new techniques have been introduced to make query processing efficient. Several systems that were originally designed as low-level storage facilities implementing a persistence layer were augmented with high-level declarative features. The declarative scripting languages provide a technique for easy-to-understand specification of complex analytical scenarios that look sequential but are executed on massively parallel systems.
The main focus of this tutorial is on the query optimization and processing in new environments and for new classes of applications.
Although many declarative languages are designed as extensions to SQL, the internals of their implementations usually differ significantly from the well-known optimization and processing techniques developed for relational systems using row-based storage structures.
Column stores are considered to be the most efficient for analytical processing on modern hardware. The physical algebraic operations for column stores differ from those used in row-based ones, and optimization strategies and heuristics are different.
Distributed data processing systems such as Hadoop weren't originally intended for declarative query processing. However, several query languages are implemented on top, bringing back the need for optimization. Examples of these languages and systems include ASTERIX, SCOPE, and Apache Hive.
Processing of semi-structured and unstructured data ultimately requires fuzzy (e.g. similarity) queries, which pose several obstacles for relational optimizers that are mostly oriented toward reordering join operations. Although some recently introduced techniques, such as efficient top-down enumeration algorithms, might be helpful, many issues are still open.
Parametric and dynamic optimization techniques seem to be especially useful for distributed heterogeneous environments where availability of data statistics is often severely limited and cost estimations are unreliable.
Finally, holistic optimization is an emerging technology that optimizes database queries and the application together, with the goal of improving overall application performance.
Research Sessions
Database Theory & Access Methods
Room: A101
Chair: Yannis Manolopoulos
Conditional Differential Dependencies (CDDs) by Selasi Kwashie, Jixue Liu, Jiuyong Li and Feiyue Ye (long paper)
Revisiting the Definition of the Relational Tuple Calculus by Bader Albdaiwi and Bernhard Thalheim (short paper)
Improving the Pruning Ability of Dynamic Metric Access Methods with Local Additional Pivots and Anticipation of Information by Paulo H. Oliveira, Caetano Traina Jr. and Daniel S. Kaster (long paper)
User Requirements & Database Evolution
Room: A102
Chair: Marite Kirikova
Two Phase User Driven Schema Matching by Nick Bozovic and Vasilis Vasalos (long paper)
CoDEL - A Relationally Complete Language for Database Evolution by Kai Herrmann, Hannes Voigt, Andreas Behrend and Wolfgang Lehner (long paper)
A Requirements Specification Framework for Big Data Collection and Capture by Noufa Al-Najran and Ajantha Dahanayake (short paper)
Multidimensional Modeling & OLAP
Room: A202
Chair: Orlando Belo
Implementation of multidimensional databases in column-oriented NoSQL systems by Max Chevalier, Mohammed El Malki, Arlind Kopliku, Olivier Teste and Ronan Tournier (long paper)
A Framework for Building OLAP Cubes on Graphs by Amine Ghrab, Oscar Romero, Sabri Skhiri, Alejandro Vaisman and Esteban Zimányi (long paper)
A Generic Data Warehouse Architecture for Analyzing Workflow Logs by Christian Koncilia, Horst Pichler and Robert Wrembel (long paper)
ETL
Room: A101
Chair: Helena Galhardas
HBelt: Integrating an Incremental ETL Pipeline with a Big Data Store for Real-time Analytics by Weiping Qu, Sahana Shankar, Sandy Ganza and Stefan Dessloch (long paper)
Two-ETL phases for Data Warehouse creation: Design and Implementation by Ahlem Nabli, Senda Bouaziz, Rania Yangui and Faiez Gargouri (long paper)
AutoScale: Automatic ETL scale process by Pedro Martins, Maryam Abbasi and Pedro Furtado (short paper)
Using a Domain-Specific Language to Enrich ETL Schemas by Orlando Belo, Claudia Gomes, Bruno Oliveira, Ricardo Marques and Vasco Santos (short paper)
Time Series Processing
Room: A102
Chair: Christian Koncilia
ForCE: Is Estimation of Data Completeness Through Time Series Forecasts Feasible? by Gregor Endler, Philipp Baumgärtel, Andreas M. Wahl and Richard Lenz (long paper)
Best-match Time Series Subsequence Search on the Intel Many Integrated Core Architecture by Mikhail Zymbler (long paper)
Feedback Based Continuous Skyline Queries over a Distributed Framework by Ahmed Khan Leghari, Jianneng Cao and Yongluan Zhou (long paper)
Continuous Query Processing over Data, Streams and Services: Application to Robotics by Vasile-Marian Scuturici, Yann Gripay, Jean-Marc Petit, Yutaka Deguchi and Einoshin Suzuki (short paper)
Preferences & Recommender Systems
Room: A202
Chair: Alsayed Algergawy
The Structure of Preference Orders by Markus Endres (long paper)
Database Querying in the Presence of Suspect Values by Olivier Pivert and Henri Prade (short paper)
Context-Awareness and Viewer Behavior Prediction in Social-TV Recommender Systems: Survey and Challenges by Mariem Bambia, Rim Faiz and Mohand Boughanem (short paper)
Generalized Bichromatic Homogeneous Vicinity Query Algorithm in Road Network Distance by Yutaka Ohsawa, Htoo Htoo, Naw Jacklin Nyunt and Myint Myint Sein (short paper)
Transformation & Extraction
Room: A102
Chair: Robert Wrembel
Direct Transformation Techniques for Compressed Data: General Approach and Application Scenarios by Patrick Damme, Dirk Habich and Wolfgang Lehner (long paper)
Analysis of the Blocking Behaviour of Schema Transformations in Relational Database Systems by Lesley Wevers, Matthijs Hofstra, Menno Tammens, Marieke Huisman and Maurice van Keulen (long paper)
A Benchmark for Relation Extraction Kernels by João L. M. Pereira, Helena Galhardas and Bruno Martins (long paper)
Ontologies
Room: A102
Chair: Maria Keet
Ontological Commitments, DL-Lite Logics and Reasoning Tractability by Mauricio Minuto Espil, María Gabriela Ojea and Maria Alejandra Ojea (long paper)
SeeCOnt: A New Seeding-based Clustering Approach For Ontology Matching by Alsayed Algergawy, Samira Babalou, Mohammad J. Kargar and S. Hashem Davarpanah (long paper)
SLA Ontology-Based Elasticity in Cloud Computing by Taher Labidi, Achraf Mtibaa and Faiez Gargouri (short paper)
Advanced Query Processing
Room: A102
Chair: Jaroslav Pokorný
A Self-Tuning Framework for Cloud Storage Clusters by Siba Mohammad, Eike Schallehn and Gunter Saake (long paper)
Incrementally Maintaining Materialized Temporal Views in Column-oriented NoSQL Databases with Partial Deltas by Yong Hu and Stefan Dessloch (short paper)
Towards self-management in a distributed column-store system by George Chernishev (short paper)
Optimizing Sort in Hadoop using Replacement Selection by Pedro Martins Dusso, Caetano Sauer and Theo Härder (long paper)
New Trends in Data
Room: A202
Chair: Einoshin Suzuki
Distributed Sequence Pattern Detection over Multiple Data Streams by Ahmed Khan Leghari, Jianneng Cao and Yongluan Zhou (long paper)
Relational-Based Sensor Data Cleansing by Nadeem Iftikhar, Xiufeng Liu and Finn Ebertsen Nordbjerg (short paper)
Avoiding Ontology Confusion in ETL Processes by Selma Khouri, Sabrina Abdellaoui and Fahima Nader (short paper)
Towards A Generic Approach for the Management and the Assessment of Cooperative Work by Amina Cherouana, Amina Aouine, Abdelaziz Khadraoui and Latifa Mahdaoui (short paper)
Web Content
Room: A203
Chair: Johann Gamper
Web Content Management Systems Archivability by Vangelis Banos and Yannis Manolopoulos (long paper)
MLES: Multilayer Exploration Structure for Multimedia Exploration by Juraj Moško, Jakub Lokoč, Tomáš Grošup, Přemysl Čech, Tomáš Skopal and Jan Lánský (short paper)
Advanced Design Modeling
Room: A101
Chair: Bernhard Thalheim
Evidence-based Languages for Conceptual Data Modelling Profiles by Pablo R. Fillottrani and C. Maria Keet (long paper)
OLAP4Tweets: Multidimensional Modeling of tweets by Maha Ben Kraiem, Jamel Feki, Kaïs Khrouf, Franck Ravat and Olivier Teste (short paper)
Data Warehouse Design Methods Review: Trends, Challenges and Future Directions for the Healthcare Domain by Christina Khnaisser, Luc Lavoie, Hassan Diab and Jean-François Ethier (short paper)
Performance & Tuning
Room: A202
Chair: Boris Novikov
Partitioning Templates for RDF by Rebeca Schroeder and Carmem S. Hara (long paper)
Efficient Computation of Parsimonious Temporal Aggregation by Giovanni Mahlknecht, Anton Dignös and Johann Gamper (long paper)
TDQMed: Managing Collections of Complex Test Data by Johannes Held and Richard Lenz (long paper)
Approximation & Skyline
Room: A101
Chair: Yannis Manolopoulos
Space-bounded query approximation by Boris Cule, Floris Geerts and Reuben Ndindi (long paper)
Bi-objective Optimization for Approximate Query Evaluation by Anna Yarygina and Boris Novikov (short paper)
Hybrid Web Service Discovery Based on Fuzzy Condorcet Aggregation by Fethallah Hadjila, Amel Halfaoui and Amine Belabed (long paper)
Confidentiality & Trust
Room: A202
Chair: Ladjel Bellatreche
Confidentiality Preserving Evaluation of Open Relational Queries by Joachim Biskup, Martin Bring and Michael Bulinski (long paper)
A General Trust Management Framework for Provider Selection in Cloud Environment by Fatima Zohra Filali and Belabbas Yagoubi (long paper)
Sybil Tolerance and Probabilistic Databases to Compute Web Services Trust by Zohra Saoud, Noura Faci, Zakaria Maamar and Djamal Benslimane (long paper)
Workshops
Workshop on Ontologies in Advanced Information Systems (OAIS)
Mobile Co-Authoring of Linked Data in the Cloud by Moulay Driss Mechaoui, Nadir Guetmi and Abdessamad Imine
Ontology based Linkage between Enterprise Architecture, Processes, and Time by Marite Kirikova, Ludmila Penicina and Andrejs Gaidukovs
Fuzzy Inference-based Ontology Matching Using Upper Ontology by S. Hashem Davarpanah, Alsayed Algergawy and Samira Babalou
An ontology-based approach for handling explicit and implicit knowledge over trajectories by Rouaa Wannous, Cécile Vincent, Jamal Malki and Alain Bouju
Interpretation of DD-LOTOS specification by C-DATA* by Toufik Messaoud Maarouk, Djamel Eddine Saïdouni, Rafik Mahdaoui and Hichem Houassi
Workshop on Semantic Web for Cultural Heritage (SW4CH)
Integrating cultural data using an ontology-based framework
by Martin Rezk
Abstract
In this talk we will introduce an ontology-based data access (OBDA) framework that makes it possible to virtually integrate different databases by means of a conceptual layer (an ontology). The ontology provides a convenient query vocabulary to the user, and a unified view of the underlying data. The ontology is connected to the data sources through a declarative specification given in terms of mappings. I will illustrate how to integrate cultural data by relying on an OBDA framework. In particular, I will concentrate on the following crucial questions:
How this paradigm can contribute to ease the access of scholars to cultural heritage data: integrating temporal and spatial data, cross-linking datasets.
What is the theory behind it.
How to map available data sources to an ontology.
How to query the underlying data sources using the terms in the ontology.
How to check consistency of the data sources w.r.t. the ontology.
Session 1: CIDOC CRM Real-life Use
Knowledge Representation in EPNet by Alessandro Mosca, José Remesal, Martin Rezk and Guillem Rull
A Pattern-based Framework for Best Practice Implementation of CRM/FRBRoo by Trond Aalberg, Audun Vennesland and Maliheh Farrokhnia
Application of CIDOC-CRM for the Russian Heritage Cloud platform by Eugene Cherny, Peter Haase, Dmitry Mouromtsev, Alexey Andreev and Dmitry Pavlov
Session 2: Cultural Heritage Preservation and Enhancement
Designing for Inconsistency – The Dependency-based PERICLES Approach by Jean-Yves Vion-Dury, Nikolaos Lagos, Efstratios Kontopoulos, Marina Riga, Panagiotis Mitzias, Georgios Meditskos, Simon Waddington, Pip Laurenson and Ioannis Kompatsiaris
A Semantic exploration method based on an ontology of 17th century texts on theatre: la Haine du theatre by Chiara Mainardi, Zied Sellami and Vincent Jolivet
Combining semantic and collaborative recommendations to generate personalized museum tours by Idir Benouaret and Dominique Lenne
Session 3: Entity linking for Cultural Heritage
Improving Retrieval of Historical Content with Entity Linking by Max De Wilde
A Novel Vision for Navigation and Enrichment in Cultural Heritage Collections by Joffrey Decourselle, Audun Vennesland, Trond Aalberg, Fabien Duchateau and Nicolas Lumineau
Disambiguation of Named Entities in cultural heritage texts using Linked Data sets by Carmen Brando, Francesca Frontini and Jean-Gabriel Ganascia
Workshop on Big Data Applications and Principles (BIGDAP)
Cross-Checking Data Sources in MapReduce by Foto Afrati, Zaid Momani and Nikos Stasinopoulos
CLUS: Parallel subspace clustering algorithm on SPARK by Bo Zhu, Alexandru Mara and Alberto Mozo
Massively Parallel Unsupervised Feature Selection on Spark by Bruno Ordozgoiti, Sandra Gómez Canaval and Alberto Mozo
Unsupervised Network Anomaly Detection in Real-time on Big Data by Juliette Dromard, Gilles Roudière and Philippe Owezarski
NPEPE: Massive Natural Computing Engine for Optimally Solving NP-complete Problems in Big Data Scenarios by Sandra Gómez Canaval, Bruno Ordozgoiti Rubio and Alberto Mozo
Andromeda: A System for Processing Queries and Updates on Big XML Documents by Nicole Bidoit, Dario Colazzo, Carlo Sartiani, Alessandro Solimando and Federico Ulliana
Fast and effective decision support for crisis management by the analysis of people's reactions collected from Twitter by Antonio Attanasio, Louis Jallet, Antonio Lotito, Michele Osella and Francesco Ruà
Adaptive Quality of Experience: a novel approach to real-time big data analysis in core networks by Alejandro Bascuñana, Manuel Lorenzo, Miguel-Ángel Monjas and Patricia Sánchez
A review of scalable approaches for Frequent Itemset Mining by Daniele Apiletti, Paolo Garza and Fabio Pulvirenti
Workshop on Information Systems for AlaRm Diffusion (WISARD)
ADMAN: an Alarm-based mobile Diabetes MANagement system for mobile geriatric teams by Dana Al Kukhun, Bouchra Soukkarieh and Florence Sèdes
Abduction for Analysing Data Exchange Policies by Laurence Cholvy
An Architectural Roadmap Towards Building an Alarm Diffusion System by Sumit Kalra, T. V. Prabhakar and Saurabh Srivastava
A case study on the influence of the user profile enrichment on Buzz propagation in social media: Experiments on Delicious by Manel Mezghani, Sirinya On-At, André Peninou, Marie-Françoise Canut, Corinne Amel Zayani, Ikram Amous and Florence Sèdes
Session 2
Critical Information Diffusion Systems by Rémi Delmas and Thomas Polacsek
Information exchange policies at an organisational level: formal expression and analysis by Claire Saurel
Workshop on Data Centered Smart Applications (DCSA)
A Mutual Resource Exchanging Model and its Applications to Data Analysis in Mobile Environment by Naofumi Yoshida
Detection of trends and opinions in geo-tagged social text streams by Jevgenij Jakunschin, Andreas Heuer and Antje Raab-Düsterhöft
Software Architecture for Collaborative Crowd-storming Applications by Nouf Jaafar and Ajantha Dahanayake
Gamification in Saudi Society: A Framework to Develop Human Values for Early Generations by Alia AlBalawi, Bariah AlSaawi, Ghada AlTassan and Zaynab Fakeerah
Joint sessions of the Workshop on Managing Evolving Business Intelligence Systems (MEBIS) and Workshop on GPUs in Databases (GID)