HADAD: A Lightweight Approach for Optimizing Hybrid Complex Analytics Queries (Extended Version)

03/23/2021
by Rana Alotaibi, et al.

Hybrid complex analytics workloads typically include (i) data management tasks (joins, selections, etc.), easily expressed using relational algebra (RA)-based languages, and (ii) complex analytics tasks (regressions, matrix decompositions, etc.), mostly expressed as linear algebra (LA) expressions. Such workloads are common in many application areas, including scientific computing, web analytics, and business recommendation. Existing solutions for evaluating hybrid analytical tasks, which range from LA-oriented systems, to relational systems (extended to handle LA operations), to hybrid systems, fall short in one of three ways: they optimize data management and complex analytics tasks separately; they exploit only RA properties, leaving LA-specific optimization opportunities unexploited; or they focus heavily on physical optimization, leaving semantic query optimization opportunities unexplored. Additionally, they cannot exploit precomputed (materialized) results to avoid recomputing all or part of a given mixed (RA and/or LA) computation. In this paper, we take a major step towards filling this gap by proposing HADAD, an extensible lightweight approach for optimizing hybrid complex analytics queries, based on a common abstraction that facilitates unified reasoning: a relational model endowed with integrity constraints. Our solution can be naturally and portably applied on top of pure LA and hybrid RA-LA platforms without modifying their internals. An extensive empirical evaluation shows that HADAD yields significant performance gains on diverse workloads, ranging from LA-centered to hybrid.
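To make the notion of an LA-specific optimization opportunity concrete, here is a minimal sketch of one well-known algebraic rewrite: by associativity of matrix multiplication, (AB)v equals A(Bv), but the second form needs far fewer scalar multiplications when v is a vector. This only illustrates the kind of property a semantic optimizer could exploit; it is not HADAD's algorithm or API, and the helper names (`matmul`, `left_to_right`, `right_to_left`) and the test matrices are hypothetical.

```python
# Illustrative sketch only: NOT HADAD's method. All names here are hypothetical.

def matmul(a, b):
    """Multiply matrices stored as lists of rows.

    Returns the product and the number of scalar multiplications
    performed by this naive algorithm (n * k * m).
    """
    n, k, m = len(a), len(b), len(b[0])
    assert len(a[0]) == k, "inner dimensions must match"
    out = [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
           for i in range(n)]
    return out, n * k * m

def left_to_right(a, b, v):
    """Evaluate (A B) v: a matrix-matrix product, then a matrix-vector product."""
    ab, c1 = matmul(a, b)
    res, c2 = matmul(ab, v)
    return res, c1 + c2

def right_to_left(a, b, v):
    """Evaluate A (B v): two matrix-vector products."""
    bv, c1 = matmul(b, v)
    res, c2 = matmul(a, bv)
    return res, c1 + c2

n = 50
A = [[(i + j) % 7 for j in range(n)] for i in range(n)]   # n x n
B = [[(i * j) % 5 for j in range(n)] for i in range(n)]   # n x n
v = [[i % 3] for i in range(n)]                           # n x 1 column vector

r1, cost1 = left_to_right(A, B, v)
r2, cost2 = right_to_left(A, B, v)

print("same result:", r1 == r2)
print("scalar multiplications:", cost1, "vs", cost2)
```

With integer inputs both orders produce identical results, so the rewrite is purely a cost decision; an optimizer that reasons only about RA properties would have no basis for choosing between the two forms.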


research
04/12/2020

A Relational Matrix Algebra and its Implementation in a Column Store

Analytical queries often require a mixture of relational and linear alge...
research
05/03/2020

An Algebraic Approach for High-level Text Analytics

Text analytical tasks like word embedding, phrase mining, and topic mode...
research
06/12/2019

Kaskade: Graph Views for Efficient Graph Analytics

Graphs are an increasingly popular way to model real-world entities and ...
research
04/04/2023

High-Throughput Vector Similarity Search in Knowledge Graphs

There is an increasing adoption of machine learning for encoding data in...
research
03/10/2021

Functional Collection Programming with Semi-Ring Dictionaries

This paper introduces semi-ring dictionaries, a powerful class of compos...
research
03/23/2017

Flare: Native Compilation for Heterogeneous Workloads in Apache Spark

The need for modern data analytics to combine relational, procedural, an...
research
08/14/2023

3D Analytics: Opportunities and Guidelines for Information Systems Research

Progress in sensor technologies has made three-dimensional (3D) represen...
