Understanding Query Intents

In a paper I co-authored (Mai et al., 2024), we explored how to improve LLM performance on the Text-to-SQL task via in-context learning¹. For those unfamiliar with the task, the basic idea is to get an intelligent system to automatically generate a SQL query that answers your question. It could be as simple as: What is the current time? → `SELECT SYSDATE();`. But it can quickly become much more complex, especially for large databases.
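To make the setup concrete, here is a minimal sketch of the task, with `complete` standing in for whatever LLM completion API you use; the schema and question are made up for illustration:

```python
def build_prompt(schema: str, question: str) -> str:
    """Assemble a zero-shot Text-to-SQL prompt."""
    return (
        "Given the following database schema, write a SQL query that "
        "answers the question. Return only the SQL.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question: {question}\nSQL:"
    )

schema = "CREATE TABLE orders (id INT, customer_id INT, total DECIMAL, created_at TIMESTAMP);"
question = "What was the total order value last month?"

prompt = build_prompt(schema, question)
# sql = complete(prompt)  # hypothetical LLM call, e.g. -> "SELECT SUM(total) FROM orders WHERE ..."
```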

The not-so-obvious intuition here is that LLMs are actually quite proficient at generating SQL. Once the user's intent is clearly understood, writing the actual query is not too difficult².

We trained an embedding model in a novel way to select in-context examples that share a similar intent with the user's question. Off-the-shelf approaches do a poor job at this: they tend to perform entity matching rather than selecting demonstrations that are actually informative.
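The retrieval machinery itself is standard; a minimal sketch is below, assuming a generic sentence encoder (the `all-MiniLM-L6-v2` checkpoint and the demonstration pool are placeholders, not the model trained in the paper):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

# A pool of (question, SQL) demonstration pairs; illustrative only.
pool = [
    ("How many users signed up last week?", "SELECT COUNT(*) FROM users WHERE ..."),
    ("List the top 5 products by revenue.", "SELECT product_id FROM sales ..."),
]

def select_examples(question: str, k: int = 3):
    """Rank demonstrations by cosine similarity to the new question."""
    q = model.encode([question], normalize_embeddings=True)[0]
    demos = model.encode([d[0] for d in pool], normalize_embeddings=True)
    scores = demos @ q  # cosine similarity, since embeddings are unit-normalized
    return [pool[i] for i in np.argsort(-scores)[:k]]
```

The hard part is what the embedding space captures: a generic encoder ranks by surface overlap, which is exactly the entity-matching failure mode described above.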

One major finding from the paper is that greater linguistic diversity across the selected in-context examples (which share similar intents) helps the LLM generalize and understand new questions better. In other words, when a new question is asked, the model can draw on similar questions phrased in different ways to interpret it and, as a result, generate the correct SQL.
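One standard way to trade relevance against diversity at selection time is maximal marginal relevance (MMR). To be clear, this is a stand-in, not necessarily the paper's procedure; it assumes unit-normalized embeddings like those above:

```python
import numpy as np

def mmr_select(q_emb: np.ndarray, demo_embs: np.ndarray, k: int = 3, lam: float = 0.7):
    """Pick k demonstrations, balancing intent similarity (relevance)
    against similarity to already-chosen demonstrations (redundancy).

    q_emb: (d,) unit vector; demo_embs: (n, d) unit vectors.
    """
    relevance = demo_embs @ q_emb
    selected = []
    candidates = list(range(len(demo_embs)))
    while candidates and len(selected) < k:
        if selected:
            # Highest similarity to any already-selected demonstration.
            redundancy = np.max(demo_embs[candidates] @ demo_embs[selected].T, axis=1)
        else:
            redundancy = np.zeros(len(candidates))
        scores = lam * relevance[candidates] - (1 - lam) * redundancy
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected  # indices into the demonstration pool
```

With `lam` below 1, two near-duplicate phrasings of the same question will not both be selected, which is one way to obtain the linguistic diversity described above.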

Another awesome outcome of the paper is that I got to present it at the Table Representation Learning workshop at NeurIPS 2024, and to visit Vancouver, for free, two years after I had left (thanks Jeff).

... 484 words in 40 minutes is 12 words per minute

References

Dai, D., Sun, Y., Dong, L., Hao, Y., Ma, S., Sui, Z., & Wei, F. (2023). Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers. https://arxiv.org/abs/2212.10559
Mai, C., Tal, R., & Mohamed, T. (2024). Learning Metadata-Agnostic Representations for Text-to-SQL In-Context Example Selection. NeurIPS 2024 Third Table Representation Learning Workshop. https://openreview.net/forum?id=HW91Uvfuoh

Footnotes

  1. LLMs can simultaneously learn from examples and solve a similar task in a generalized way (although not too generalized), without updating any model parameters. This ability to learn from in-context examples, paid for with extra inference-time compute rather than parameter updates, can be thought of as implicit fine-tuning (Dai et al., 2023). That view treats transformers as meta-optimizers, and the authors present an interesting dual form between gradient descent and transformer attention; see the first sketch below.

  2. Take the grammar for Presto SQL, a flexible open-source dialect: it is only 978 short lines, which is not much for a billion-parameter model to consume and understand. Digression... at one point I attempted to constrain the LLM to generate only tokens that keep the output valid under the grammar. The challenge, however, is that it is not trivial to constrain LLM token generation while parsing SQL at the same time. A single LLM token can contain multiple characters that span the boundary of two adjacent tokens in the grammar. When this happens, validating the partial generation against the SQL grammar is no longer easy, because the SQL token itself is incomplete; see the second sketch below.
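On footnote 1: the dual form is easy to check numerically for linearized (softmax-free) attention, which is the setting Dai et al. analyze; softmax attention only approximately follows it. The shapes and names below are mine, not the paper's notation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_demo, n_ctx = 8, 5, 3

W_K = rng.normal(size=(d, d))
W_V = rng.normal(size=(d, d))
X_demo = rng.normal(size=(d, n_demo))  # in-context demonstration tokens
X_ctx = rng.normal(size=(d, n_ctx))    # the query's own context tokens
q = rng.normal(size=d)                 # attention query vector

# Linear attention over [context; demonstrations] in one shot:
X_all = np.concatenate([X_ctx, X_demo], axis=1)
out_icl = (W_V @ X_all) @ (W_K @ X_all).T @ q

# Dual form: a zero-shot term plus an implicit weight update from the demos.
W_zsl = (W_V @ X_ctx) @ (W_K @ X_ctx).T   # what the model computes with no demos
dW = (W_V @ X_demo) @ (W_K @ X_demo).T    # "gradient-like" update from the demos
out_dual = (W_zsl + dW) @ q

assert np.allclose(out_icl, out_dual)  # identical, by linearity of attention
```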
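On footnote 2: a toy illustration of the boundary problem. The keyword set and the whitespace "tokenization" are deliberately simplistic (a real grammar needs a real parser), but they show why per-token validation needs prefix logic:

```python
SQL_KEYWORDS = {"SELECT", "SYSDATE", "FROM", "WHERE"}

def words_are_complete(text: str) -> bool:
    """Naive check: every whitespace-separated word must be a full keyword."""
    return all(w.strip("();") in SQL_KEYWORDS for w in text.split())

def could_extend(text: str) -> bool:
    """Prefix-aware check: the last word may be an incomplete keyword."""
    *done, last = text.split()
    return (words_are_complete(" ".join(done))
            and any(kw.startswith(last) for kw in SQL_KEYWORDS))

partial = "SELECT SYS"  # the model has emitted the LLM token "SELECT SYS"
print(words_are_complete(partial))  # False: "SYS" is not a keyword...
print(could_extend(partial))        # True:  ...but it is a prefix of SYSDATE
```

A real constrained decoder has to carry this "could still extend to something valid" state through an actual SQL parser, for every candidate token in the vocabulary, at every decoding step, which is where it stops being trivial.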