CIS Seminar: “Fast and Effective Analytics for Big Multi-Dimensional Data”
March 21, 2022 at 3:30 PM - 4:30 PM
Today, automated processes, Internet‑of‑Things deployments, and Web and mobile applications generate an overwhelming amount of high‑dimensional data. Meanwhile, computational resources remain limited, and advances in machine learning (ML) create a pressing need to support increasingly expensive and complex analytical tasks. Unfortunately, traditional data management techniques offer limited support for high‑dimensional data, ML tasks, and adaptation to data properties, often resulting in reduced performance. Similarly, due to the difficulty of providing invariances to specific data distortions, applications often resort to inadequate ML methods, reducing their effectiveness.
In my work, I ask how we can address the lack of task‑aware and data‑driven adaptations in data management and ML methods. Specifically, I will discuss three solutions spanning (i) data representations and (ii) computational methods, each of which exploits similarities, shapes, densities, and distributions in data. Motivated by the ubiquity of high-dimensional time series, I will first present a similarity-preserving representation that minimizes storage footprint and accelerates specific ML analytics for time-series data. Then, I will discuss a variance-aware quantization method for indexing high-dimensional data. Finally, I will present a method for anomaly detection in streaming data that accounts for distribution drift. In all three examples, the proposed methods substantially improve performance and accuracy, demonstrating the benefit of designing task-aware and data-driven solutions for large-scale data science applications.
John Paparrizos, Postdoctoral Researcher, University of Chicago.
John Paparrizos is a postdoctoral researcher at the University of Chicago. He works in the area of advanced database systems with a focus on enabling complex analytics for high-dimensional data, supporting the next generation of data-intensive and machine learning applications. John completed his Ph.D. at Columbia University and earned his M.S. from EPFL. His research has received multiple distinctions, including a “Best of SIGMOD” selection, an ACM SIGMOD Research Highlight Award, a recognition of his Ph.D. thesis at the ACM SIGKDD Dissertation Award competition, and a NetApp Faculty Award. His ideas have been adopted in domains such as energy, medicine, biology, and neuroscience, and by organizations including Fortune 100 companies and the European Space Agency. Several media outlets have covered his research, including The New York Times, The Washington Post, The Guardian, and MIT Technology Review.