29.06.2023 | Nikolay Valiotti
How about having an assistant that generates SQL queries from business questions? Sounds nice, right? With large language models (LLMs) booming all over the internet, it’s not that surprising. What might surprise you, though, is that the assistant takes a database schema in JSON format as input.
And the big news is that you can start using a tool like that right after reading this blog post, since it’s a step-by-step tutorial on how to install and run one. Developed by the team behind Valiotti Analytics, this SQL writer runs in a Jupyter Notebook, uses the ChatGPT API, and is published in a public GitHub repository. What’s more, it’s available to everyone!
You can also ask the assistant to compose code in any programming language to visualize the results of a generated SQL query.
Before running the code, make sure you have an OpenAI API key. Also, please note that using the API is not free; pricing information is available on the OpenAI website.
1. Clone the repository:
git clone https://github.com/valiotti/chatgpt-sql-data-analyst.git
2. Install the required dependencies:
pip install openai panel
What’s inside this repository?
1. To interact with the ChatGPT assistant, provide your OpenAI API key:
openai.api_key = 'MY_OPENAI_API_KEY'
2. Set up the path and name of the JSON file with the database schema and the database type (for ChatGPT to generate queries for a specific SQL dialect):
JSON_STORE_PATH = './jsons'
JSON_NAME = 'my_schema.json'
DB = 'MY_DB'
3. Run all cells.
4. After running the final cell, the UI for interacting with the tool appears. Enter a business question in the input field and click the Chat! button; the assistant returns an SQL query corresponding to your prompt. Once a query has been generated, you can also ask the tool to write code in R, Python, or another language to visualize the results.
5. While you chat with the model, the whole chat history is sent as input with each request. This gradually increases response time, and sooner or later you will hit the model’s context-length limit. To reset the chat history, re-run the cell where the context is defined and the next one, which launches the UI.
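The history mechanics described in the steps above can be sketched as follows. The names (`context`, `chat`, `make_context`) are illustrative and may differ from the notebook’s own code; the actual API call is shown only as a comment:

```python
# Minimal sketch of how the chat context grows with each exchange.
# Names here are illustrative; the notebook's own variables may differ.
def make_context():
    # The system message carries the database schema and SQL dialect.
    return [{"role": "system", "content": "You write SQL for the given schema."}]

def chat(user_message, history):
    history.append({"role": "user", "content": user_message})
    # In the real notebook, this is where the ChatGPT API is called, e.g.:
    # response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=history)
    reply = "SELECT ...;"  # placeholder for the model's answer
    history.append({"role": "assistant", "content": reply})
    return reply

context = make_context()
chat("What is the total revenue per month?", context)
chat("Now write Python code to plot it.", context)
print(len(context))  # 1 system message + 2 question/answer pairs = 5

# Resetting the history is just re-creating the context list, which is
# exactly what re-running the context cell does.
context = make_context()
print(len(context))  # 1
```

Because every message stays in `context`, each new request grows the payload; re-creating the list is the whole reset.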
One point to keep in mind while working with the tool, and with LLMs in general: provide clear and detailed instructions in your prompts. The more specific and well-structured a prompt is, the more accurate the output will be.
Since the tool takes a database schema in JSON format as its context, an easy way to convert SQL scripts to JSON will save you a lot of trouble.
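For reference, here is one plausible shape for such a schema file: table names mapped to columns and their types. The exact layout the notebook expects is an assumption here, so treat this as a sketch and adapt it to what the converter actually produces:

```python
import json

# Hypothetical schema layout (an assumption, not the notebook's documented
# format): table names mapped to their columns and column types.
schema = {
    "orders": {
        "id": "INTEGER",
        "customer_id": "INTEGER",
        "amount": "NUMERIC",
        "created_at": "TIMESTAMP",
    },
    "customers": {
        "id": "INTEGER",
        "name": "TEXT",
    },
}

# Serialized, this is what would live in ./jsons/my_schema.json and be
# embedded into the prompt so ChatGPT knows which tables it can query.
serialized = json.dumps(schema, indent=2)
restored = json.loads(serialized)
print(sorted(restored))  # ['customers', 'orders']
```

Whatever the exact layout, the point is the same: the JSON file is the model’s only knowledge of your database, so it must name every table and column you want queried.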
1. To convert SQL scripts to the JSON format, provide your OpenAI API key:
openai.api_key = 'MY_OPENAI_API_KEY'
2. Set up the input SQL scripts path and the path and name for a JSON file to store results:
SQL_SCRIPTS_PATH = './sql_scripts'
JSON_STORE_PATH = './jsons'
JSON_NAME = 'my_schema.json'
Please note that each SQL script should correspond to a single database table.
3. Run all cells.
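The one-script-per-table rule means the converter can map each file to exactly one schema entry. A naive illustration of that mapping (the real converter sends the DDL to ChatGPT, which handles arbitrary dialects, rather than parsing it with a regex):

```python
import re

# One file per table: each DDL script should yield exactly one schema entry.
# File names and DDL below are made up for illustration.
scripts = {
    "orders.sql": "CREATE TABLE orders (id INTEGER, amount NUMERIC);",
    "customers.sql": "CREATE TABLE customers (id INTEGER, name TEXT);",
}

def table_name(ddl):
    # Naive extraction of the table name; the real converter delegates
    # this to ChatGPT instead of regex-parsing the SQL.
    match = re.search(r"CREATE TABLE\s+(\w+)", ddl, re.IGNORECASE)
    return match.group(1) if match else None

names = [table_name(ddl) for ddl in scripts.values()]
print(names)  # ['orders', 'customers']
```

If a script defined two tables, this one-to-one mapping would break, which is why the tool asks for one table per file.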
Do you want to make data analysis even easier and more efficient? At the time of writing, two corresponding AIPRM templates can be used with the tool. Be sure to check them out!
We hope this tool streamlines your workflow and makes data analysis effortless, becoming your personal go-to assistant. We also encourage you to share this blog post on social media to help the solution reach more users and use cases! Your support will inspire us to create more ChatGPT-based tools like this.