
How to use ChatGPT to generate content for your dataset

Duplicate your text content and create variants to train your AI Block on

It is recommended to train your AI model on real data that matches the data you will be making live predictions on, but that is not always possible or realistic. If you only have a few examples of the text you'd like to classify, you can use AI tools such as ChatGPT to generate more examples.

What is ChatGPT?

ChatGPT is a chatbot AI model that interacts in a conversational way. It is built on the GPT-3 family of language models by OpenAI, and can be used to generate content and answer questions in a lifelike way. For example, you can ask it questions such as "How would someone respond to a cold outbound marketing email?" and get an answer that reads much like a typical human response.

How do you use ChatGPT?

First, head over to the ChatGPT website and log in. Then, make sure you're in the New chat area, and type your prompt, question, or instruction.


It will take a few seconds, and then start generating your data. From here, you can copy and paste the responses into a spreadsheet ready to adjust and label as needed.

Note: ChatGPT is a contextual AI service, meaning that a chat can be continued with additional prompts, and the AI will consider the previous content when formulating its response. If your request is denied (for example, if you ask for impolite responses or bad grammar), close the chat, open a new one, and re-word your request.
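If you would rather not tidy the pasted responses by hand, the copy-and-paste step can be sketched in a few lines of Python. This is only an illustration: the function names, the two-column layout (text, label), and the sample responses are assumptions, not part of any product feature.

```python
import csv
import io
import re

# Matches list markers such as "1. ", "2) ", "- " or "* " at the start of a line.
LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+")


def responses_to_rows(raw_text):
    """Split pasted chat output into one cleaned response per non-empty line."""
    rows = []
    for line in raw_text.splitlines():
        cleaned = LIST_MARKER.sub("", line).strip()
        if cleaned:
            rows.append(cleaned)
    return rows


def write_dataset_csv(rows, out_file):
    """Write a header plus one row per response, leaving the label blank."""
    writer = csv.writer(out_file)
    writer.writerow(["text", "label"])
    for text in rows:
        writer.writerow([text, ""])


# Hypothetical responses, pasted exactly as they were copied from the chat.
pasted = """1. Thanks, but we're not interested right now.
2. Could you send over pricing details?

3. Please remove me from your mailing list."""

rows = responses_to_rows(pasted)
buffer = io.StringIO()
write_dataset_csv(rows, buffer)
print(buffer.getvalue())
```

To save to disk instead of printing, pass a file opened with `open("dataset.csv", "w", newline="")` in place of the in-memory buffer, then open the file in your spreadsheet tool and fill in the label column.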

Tips for getting the most varied data from ChatGPT

Give it an example, and ask for variations of that example.

Ask it to roleplay a scenario to get it into character.

Add more requests in the same chat to adjust the responses.

Ask for label ideas.
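To make these tips concrete, here is a small sketch that builds one prompt for each of them. The wording of every prompt and all of the function names are illustrative assumptions, so adjust them to suit your own data. The output is plain text to paste into the chat box.

```python
def variation_prompt(example, count=10):
    """Tip: give it an example and ask for variations of that example."""
    return (
        f'Here is an example message:\n"{example}"\n'
        f"Write {count} variations of this message, "
        "each with different wording, length and tone."
    )


def roleplay_prompt(role, scenario, count=10):
    """Tip: ask it to roleplay a scenario to get it into character."""
    return (
        f"Pretend you are {role}. {scenario} "
        f"Write {count} different replies you might send."
    )


def follow_up_prompt(adjustment):
    """Tip: add more requests in the same chat to adjust the responses."""
    return f"Rewrite the previous responses so that they are {adjustment}."


def label_ideas_prompt(description, count=5):
    """Tip: ask for label ideas."""
    return f"Suggest {count} category labels for classifying {description}."


# Hypothetical usage: print each prompt, then paste it into the chat.
print(variation_prompt("Not interested, thanks.", count=5))
print(roleplay_prompt(
    "a busy office manager",
    "You have received a cold outbound marketing email.",
    count=5,
))
print(follow_up_prompt("shorter and more informal"))
print(label_ideas_prompt("replies to cold outbound marketing emails"))
```

Because a chat keeps its context, the follow-up prompt only makes sense when it is sent in the same chat as the request it adjusts.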

Some limitations and cautions on using AI for dataset content generation

The service provided by ChatGPT is limited by the data it is trained on. For example, it may decline requests it treats as inappropriate, such as text with deliberately bad grammar or convincingly poor spelling. The responses can also be very similar in style, which can leave your dataset unbalanced or unrepresentative of real data.
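One rough way to spot these problems before training is to look for near-identical responses and to check how the labels are distributed. The sketch below uses only Python's standard library. The 0.9 similarity threshold, the function names, and the sample data are assumptions chosen for illustration, not recommended values.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(texts, threshold=0.9):
    """Return (index_a, index_b, similarity) for pairs above the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(texts), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((i, j, round(ratio, 2)))
    return pairs


def label_shares(labels):
    """Return each label's share of the dataset as a fraction of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Hypothetical generated examples and their labels.
texts = [
    "Please remove me from your list.",
    "Please remove me from your list!",
    "Can you send pricing?",
]
labels = ["unsubscribe", "unsubscribe", "interested"]

print(near_duplicates(texts))
print(label_shares(labels))
```

This compares every pair of texts, so it is only practical for small datasets of a few thousand rows at most. Flagged pairs are candidates for rewording or removal, not automatic deletions.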

There is also a limit on the number of requests you can make in a given hour. If you see a red banner telling you that you have reached this limit, you will have to wait an hour before using it again. In addition, the service can be intermittent at times of high volume, and is prone to returning errors.

We always recommend caution when training an AI model on templated or dummy content, as the model will likely not perform as well as one trained on real data.
