Column descriptions

The single most effective way to improve QueryMuse's results is to add more detailed information to your column descriptions.

Add context for IDs and Enums

Many databases use integer values to represent categories or states. Instead of describing a column as simply “Integer” or “Primary key”, write a description that gives examples or lists the key values.

Don’t be shy: put as much information in there as you can. In this example, we show the AI that category_id can have 3 meanings:

[Image: column description listing the 3 meanings of category_id]

This lets you use terms like “Kitchen” and “Bathroom” in future queries.
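
As a rough sketch, here is one way you might build such a description from an ID-to-label mapping. Only “Kitchen” and “Bathroom” come from the example above; the integer IDs, the “Bedroom” label, the summary sentence, and the `describe_enum_column` helper are all assumptions made for illustration and are not part of QueryMuse.

```python
from typing import Dict

# Hypothetical mapping: the integer IDs and "Bedroom" are illustrative guesses.
CATEGORY_LABELS: Dict[int, str] = {1: "Kitchen", 2: "Bathroom", 3: "Bedroom"}


def describe_enum_column(summary: str, labels: Dict[int, str]) -> str:
    """Build a column description that spells out what each integer means."""
    values = ", ".join(
        f"{key} = {label}" for key, label in sorted(labels.items())
    )
    return f"{summary} Possible values: {values}."


# The summary text is an assumption; write whatever fits your own table.
description = describe_enum_column("ID of the item's category.", CATEGORY_LABELS)
print(description)
```

You would then paste the resulting text into the description field for category_id.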

Remove red herrings

If the query generator keeps going down bad paths, pruning the tables and columns you expose to it can improve its overall behavior.

When it comes to large databases, we’ve found there’s a sweet spot between providing full table context and providing just enough context to complete a query. You may not need to include every deprecated or metadata column.

(Side note: this behavior should improve soon as we change the way we generate and use embeddings.)
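
If you keep a list of your tables and columns somewhere, the pruning described above could be sketched as a simple filter. This is only an illustration: the prefixes, the column names, and the `prune_columns` helper are hypothetical, and which columns count as red herrings depends entirely on your own schema.

```python
from typing import Dict, List

# Hypothetical naming conventions; adjust these to match your own schema.
SKIP_PREFIXES = ("deprecated_", "legacy_")
SKIP_NAMES = {"created_at", "updated_at", "etl_batch_id"}


def prune_columns(schema: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a copy of the schema without deprecated or metadata columns.

    Tables that end up with no columns are dropped entirely.
    """
    pruned: Dict[str, List[str]] = {}
    for table, columns in schema.items():
        kept = [
            column
            for column in columns
            if column not in SKIP_NAMES and not column.startswith(SKIP_PREFIXES)
        ]
        if kept:
            pruned[table] = kept
    return pruned


schema = {
    "products": ["id", "name", "category_id", "deprecated_sku", "updated_at"],
    "etl_log": ["etl_batch_id", "created_at"],
}
pruned_schema = prune_columns(schema)
print(pruned_schema)
```

Here only the id, name, and category_id columns of products survive, which is the kind of trimmed-down context that tends to be enough to complete a query.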

Minimizing hallucinations

Large language models such as the one used by QueryMuse are prone to "hallucination": the tendency to invent solutions that seem plausible at a glance but are not actually viable.

We try to identify cases such as these and label them accordingly, but you should always inspect the generated query for mistakes and unusual references.

The generator will usually admit when it doesn’t know something.

You should also keep an eye out for references to non-existent tables and columns. Column references are checked by AST parsing for PostgreSQL, MySQL, SQLite, and BigQuery databases.

When possible, we’ll flag unusual tables or columns.
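
If you want to double-check a generated query yourself, one rough approach for SQLite is to ask the database to compile the query against an empty copy of your schema. Note that this is not the AST check described above and is not part of QueryMuse; it is a simpler stand-in that relies on SQLite's own error messages. The `find_reference_error` helper and the example schema are hypothetical.

```python
import sqlite3
from typing import Optional


def find_reference_error(schema_sql: str, query: str) -> Optional[str]:
    """Return SQLite's error message if the query does not compile, else None.

    The schema is created in a throwaway in-memory database, so no real
    data is touched. EXPLAIN compiles the statement without running it.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        conn.execute(f"EXPLAIN {query}")
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()


# Hypothetical schema for illustration only.
SCHEMA = "CREATE TABLE products (id INTEGER PRIMARY KEY, category_id INTEGER);"

print(find_reference_error(SCHEMA, "SELECT category_id FROM products"))
print(find_reference_error(SCHEMA, "SELECT colour FROM products"))
print(find_reference_error(SCHEMA, "SELECT id FROM orders"))
```

The first call returns None because the query only uses names that exist; the other two return an error message naming the missing column or table.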

If you’re encountering many hallucinations, your best bet is to add more column descriptions (described above) or make the language in your query more precise.