Six reading lists are currently available; more will be added as the term progresses. Each list is organized into thematic sections with technical notes alongside paper links.

Surveys & Tutorials

Position papers, surveys, and tutorials that orient newcomers to data lake and model lake research. Start here before diving into any of the topic-specific lists: these readings establish problem definitions, major research threads, and open questions.

Data Lake Foundations & Position Papers
Table Discovery Surveys & Tutorials
Data Lake Systems & Concepts
Model Lakes

Open list →

01 — Discovery

Joinable Table Search

Given a query column, find tables in a data lake whose columns can be meaningfully joined with it. Covers set-overlap baselines, embedding-based methods, n-ary joins, transformation-based joins, context-aware approaches, and benchmarks.

Foundations & Set-Overlap Search
Semantic & Embedding-Based Search
N-ary Joins & Composite Keys
Transformation-Based Joins
Context-Aware Joinability
Benchmarks & Applications
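
As a rough illustration of the set-overlap baseline mentioned above, the sketch below ranks candidate columns by containment: the fraction of distinct query values that also appear in the candidate. The `lake` layout, the function names, and the sample data are assumptions made for this example and are not taken from any paper on the list. Systems covered in the list typically rely on indexes or sketches rather than a full scan.

```python
def containment(query_values, candidate_values):
    """Fraction of distinct query values that also appear in the candidate column."""
    q = set(query_values)
    if not q:
        return 0.0
    return len(q & set(candidate_values)) / len(q)


def top_k_joinable(query_values, lake, k=3):
    """Rank (score, table, column) triples by containment of the query column.

    `lake` is a hypothetical in-memory layout: {table_name: {column_name: [values]}}.
    This is a brute-force scan, meant only to show the scoring idea.
    """
    scored = [
        (containment(query_values, values), table, column)
        for table, columns in lake.items()
        for column, values in columns.items()
    ]
    # Highest score first; ties broken by table and column name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored[:k]


# Toy data lake (illustrative values only).
lake = {
    "cities": {"name": ["Boston", "Chicago", "Denver"], "state": ["MA", "IL", "CO"]},
    "airports": {"city": ["Boston", "Denver", "Seattle"], "code": ["BOS", "DEN", "SEA"]},
}
query = ["Boston", "Chicago", "Denver", "Austin"]
results = top_k_joinable(query, lake, k=2)
for score, table, column in results:
    print(f"{table}.{column}: {score:.2f}")
```

Containment is asymmetric on purpose: a small query column can be fully covered by a much larger candidate column, which a symmetric measure such as Jaccard similarity would penalize.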

Open list →

02 — Discovery

Table Union Search

Given a query table, find tables in a data lake whose rows can be appended to extend it. Covers the foundational formulation, column-semantics methods, deep representation learning, relationship-aware approaches, table-centric and LLM-based methods, and benchmarks.

Foundations
Column-Semantics Methods
Representation Learning
Relationship-Aware Search
Table-Centric & LLM-Based
Novelty, Diversity & Benchmarks
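
To make the task concrete, here is a toy unionability score, assuming tables are plain dictionaries from column name to values. It averages, over the query columns, the best Jaccard value overlap found among the candidate's columns. This is a deliberately naive stand-in: the function names and data are hypothetical, it ignores column semantics and one-to-one column alignment, and it is not the formulation used by any specific paper on the list.

```python
def jaccard(a, b):
    """Jaccard similarity between the distinct values of two columns."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def unionability(query_table, candidate_table):
    """Average, over query columns, of the best Jaccard match among candidate columns.

    Tables are hypothetical {column_name: [values]} dictionaries. Column names
    are ignored on purpose, since headers in a data lake are often unreliable.
    """
    if not query_table or not candidate_table:
        return 0.0
    best_matches = [
        max(jaccard(q_values, c_values) for c_values in candidate_table.values())
        for q_values in query_table.values()
    ]
    return sum(best_matches) / len(best_matches)


query = {
    "country": ["France", "Spain", "Italy"],
    "capital": ["Paris", "Madrid", "Rome"],
}
related = {
    "nation": ["France", "Italy", "Germany"],
    "city": ["Paris", "Rome", "Berlin"],
}
unrelated = {"product": ["apple", "pear"], "price": ["1", "2"]}

print(unionability(query, related))
print(unionability(query, unrelated))
```

Pure value overlap misses tables that are unionable but share no values, for example two tables listing different countries. That gap is one reason the list goes on to column-semantics and representation-learning methods.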

Open list →

03 — Querying

Multi-Table QA & Text-to-SQL

Question answering and SQL generation when the relevant tables must first be found in a data lake. Covers classical single-database text-to-SQL, schema linking at scale, agentic methods, open-domain table QA, multi-table retrieval, and recent data-lake benchmarks.

Foundations & Single-DB Text-to-SQL
Schema Linking at Scale
Agentic & Multi-Step Methods
Open-Domain Table QA
Multi-Table Retrieval for QA
Text-to-SQL over Data Lakes

Open list →

04 — Versioning

Table Version Management

Storing, exploring, and explaining changes across versions of tabular datasets. Covers the foundational storage/recreation tradeoff, version management systems, semantic change explanation, change exploration in the wild, and the theoretical underpinnings.

Foundations & Storage Tradeoffs
Version Management Systems
Semantic Versioning & Change Explanation
Change Exploration & Search
Theoretical Foundations
Background & Antecedents

Open list →

05 — Model Lakes

Model Lake Management

Storing, discovering, versioning, and reasoning about large collections of trained ML models, much as data lakes do for tabular data. Covers the model-lake vision, model management infrastructure, provenance, documentation, model search, and empirical studies of public hubs.

Foundations & Model Lake Vision
Model Management Infrastructure
Provenance & Versioning
Model Documentation
Model Search & Discovery
Empirical Studies of Model Hubs

Open list →

How these lists relate

Join search asks "what adds columns to this table?"; union search asks "what adds rows to this table?"; multi-table QA asks "which tables, joined or unioned how, will answer this question?"; version management asks "how did this table get here?"; and model lakes generalize all of this from tabular data to trained ML models. All five topic lists share techniques (embeddings, search, provenance, governance), so reading them side by side is a quick way to get oriented in the field.