Week 2 Join Table Search

May 24, 2026

This Week

Subject: An overview of joinable table discovery. I will introduce the problem and some classic approaches based on value overlap. Specifically, I will cover LSH Ensemble from PVLDB16 and JOSIE from SIGMOD19, both of which handle single-attribute equi-joins. I will then cover MATE from VLDB22, which handles multi-attribute equi-joins.

Notice that the reading list below covers semantic joins, applications and benchmarking. Any paper on the list that I have not covered is a possible paper you can present in class.

Reviews: For this class, please prepare and upload to Dropbox two PDF files of about one page each, one reviewing JOSIE and one reviewing MATE. Please include at least 3 strong points and 3 weak points for each paper. For each point, include specific justification or evidence from the paper itself. I would like you to write these yourself and not use AI. Please upload the reviews by Friday, May 22nd, 7pm EDT.

The dropbox link will be posted on Piazza.

Please use the file naming format 2-JOSIE-.pdf or 2-MATE-.pdf, respectively.

Recommended

PVLDB2016
LSH Ensemble: Internet-Scale Domain Search for Tables with Containment Joins
E. Zhu, F. Nargesian, K. Q. Pu, R. J. Miller
Tackles the variable-cardinality problem: when column sizes range from hundreds to millions, single LSH indexes degrade badly. The ensemble partitions columns by size and tunes each partition's LSH parameters to approximate the containment measure efficiently.
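
To make the containment measure concrete, here is a minimal sketch (not code from the paper) that compares containment with Jaccard similarity on toy columns. The function names and the data are illustrative assumptions. It suggests why a size-sensitive measure such as Jaccard is a poor fit when candidate columns vary widely in cardinality: a large column that fully contains the query still gets a tiny Jaccard score.

```python
def jaccard(q, x):
    """Jaccard similarity: |Q ∩ X| / |Q ∪ X|."""
    q, x = set(q), set(x)
    union = q | x
    return len(q & x) / len(union) if union else 0.0


def containment(q, x):
    """Containment of the query in x: |Q ∩ X| / |Q|."""
    q, x = set(q), set(x)
    return len(q & x) / len(q) if q else 0.0


# Toy data (assumed for illustration): the query is fully contained in both columns.
query = {"ON", "QC", "BC"}
small = {"ON", "QC", "BC", "AB"}                # 4 distinct values
large = small | {f"v{i}" for i in range(996)}   # 1000 distinct values

print(containment(query, small), jaccard(query, small))
print(containment(query, large), jaccard(query, large))
```

Containment is 1.0 for both candidates, while Jaccard drops from 0.75 for the small column to 0.003 for the large one.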

Required (submit reviews of these)

SIGMOD2019
JOSIE: Overlap Set Similarity Search for Finding Joinable Tables in Data Lakes
E. Zhu, D. Deng, F. Nargesian, R. J. Miller
Defines the joinable table search problem as a top-k overlap set similarity query. Introduces a cost-aware algorithm that mixes prefix-filtering with position-list scans to scale to lakes with millions of columns. The reference exact baseline for nearly all follow-on work.
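
The problem definition above can be sketched as a naive top-k overlap search over an inverted index. This is only an illustration of the query being answered, not JOSIE's cost-aware algorithm: it reads every posting list of the query in full. The column names, data, and function names are assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_inverted_index(columns):
    """Map each distinct value to the set of column ids that contain it."""
    index = defaultdict(set)
    for col_id, values in columns.items():
        for v in set(values):
            index[v].add(col_id)
    return index


def top_k_overlap(query, index, k):
    """Return the k columns with the largest overlap |Q ∩ X| with the query."""
    counts = Counter()
    for v in set(query):
        for col_id in index.get(v, ()):
            counts[col_id] += 1
    # Sort by overlap (descending), then by column id so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Toy data lake (assumed for illustration).
columns = {
    "cities.name": ["Toronto", "Boston", "Montreal", "Chicago"],
    "teams.city": ["Toronto", "Boston", "Chicago", "Denver"],
    "parks.city": ["Toronto", "Vancouver"],
    "codes.iata": ["YYZ", "BOS"],
}
index = build_inverted_index(columns)
result = top_k_overlap(["Toronto", "Boston", "Chicago"], index, k=2)
print(result)
```

Scanning every posting list becomes expensive when frequent values have very long lists, which is the kind of cost a smarter search strategy tries to avoid.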

PVLDB2022
MATE: Multi-Attribute Table Extraction
M. Esmailoghli, J.-A. Quiané-Ruiz, Z. Abedjan
Introduces a hash-based index supporting n-ary join discovery via space-efficient "super keys" that combine multiple columns. Demonstrates significant gains over applying unary methods combinatorially.
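
As a rough illustration of the super-key idea, the sketch below builds a Bloom-filter-style bit mask per row and uses it to discard rows that cannot contain a multi-attribute join key before verifying the survivors exactly. It rests on several assumptions: a generic SHA-256 based mask stands in for the paper's own hash function, the 64-bit width and the helper names are hypothetical, and column alignment is ignored for brevity.

```python
import hashlib

BITS = 64  # assumed super-key width for this sketch


def value_mask(value, bits=BITS):
    """Hash one cell value to a mask with up to two bits set."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    mask = 0
    for i in range(2):
        pos = int.from_bytes(digest[4 * i:4 * i + 4], "big") % bits
        mask |= 1 << pos
    return mask


def super_key(row):
    """OR the masks of all cell values in a row into a single integer."""
    key = 0
    for v in row:
        key |= value_mask(v)
    return key


def build_index(table):
    """Precompute one super key per row."""
    return [(super_key(row), row) for row in table]


def joinable_rows(query_key, index):
    """Rows that contain every value of the composite query key."""
    q_mask = super_key(query_key)
    hits = []
    for row_key, row in index:
        if (row_key & q_mask) != q_mask:
            continue  # a required bit is missing, so some query value is absent
        if set(query_key) <= set(row):  # exact check removes false positives
            hits.append(row)
    return hits


# Toy table (assumed for illustration).
table = [
    ("Ada", "Lovelace", "1815"),
    ("Alan", "Turing", "1912"),
    ("Ada", "Turing", "1900"),
]
index = build_index(table)
matches = joinable_rows(("Ada", "Lovelace"), index)
print(matches)
```

The mask test can let through rows that do not match (false positives) but never rejects a row that does, so the exact check only has to run on the surviving rows.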

The complete Joinable Table Search Reading List contains additional papers that you may choose to present.

Lecture Materials

To be added