In this article, we will use Python and your WhatsApp chats to build a bot that speaks like you.
Have you ever seen those Artificial Intelligence bots on websites that answer your questions, and wondered how they are able to understand the questions and answer them correctly? These are called chatbots, and they make use of Natural Language Processing to converse in a human manner.
Natural Language Processing is the branch of Machine Learning that uses various ML algorithms to let humans converse with machines in their own language.
We will build our chatbot from WhatsApp chats. The chatbot will generate messages that sound like you and like the person whose chat you export. The article is divided into the following sections:
- Cleaning Chats
- Preprocess Data
- Using the model to speak like you
- Building the API
- Building the UI
- Containerize the application
If you want to jump to the code, here’s a link to the code repository: me_bot
First of all, you need to download the data from WhatsApp.
From WhatsApp on your phone, open any chat and export it from the settings. Move the .txt file that you receive into the main folder of the project.
Let’s clean these chats before pre-processing:
Let’s dissect this file:
- We start by importing the necessary modules and then defining the YOUR_NAME and OTHER_NAME variables (replace these with the names from your WhatsApp chat)
- We then make use of the read_file function, which reads the file whose name we pass through the command line
- Our main function, extract_text, generates arrays for all the texts, the texts sent by you, and the texts sent by the other person. It also keeps track of whether the last text was sent by you or the other person, so that the extra lines of a multi-line message can be attributed to the correct person
- make_directory creates a folder for saving these arrays
- Finally, write_to_files is called on all these arrays to write them to disk using pickle
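To make the steps above concrete, here is a minimal sketch of the attribution logic. It is not the code from the repository: it assumes export lines shaped like `12/31/20, 10:15 PM - Name: message` (the real format varies by phone and locale), the names Alice and Bob are placeholders, and the function signatures are guesses.

```python
import pickle
import re
from pathlib import Path

YOUR_NAME = "Alice"   # placeholder: your name as it appears in the export
OTHER_NAME = "Bob"    # placeholder: the other person's name

# Assumed shape of a new entry: "<m/d/y>, <time> - <rest of line>".
# Adjust this pattern to match your own export.
ENTRY_START = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, [^-]+ - (.*)$")


def extract_text(lines):
    """Return (all texts, your texts, the other person's texts)."""
    all_texts, your_texts, other_texts = [], [], []
    last_bucket = None  # list that received the most recent message
    for line in lines:
        line = line.rstrip("\n")
        match = ENTRY_START.match(line)
        if match:
            sender, sep, text = match.group(1).partition(": ")
            if not sep or sender not in (YOUR_NAME, OTHER_NAME):
                # System notice or unknown sender: skip it and its continuations.
                last_bucket = None
                continue
            last_bucket = your_texts if sender == YOUR_NAME else other_texts
            last_bucket.append(text)
            all_texts.append(text)
        elif last_bucket is not None and line:
            # No timestamp: treat the line as a continuation of the last message.
            last_bucket[-1] += "\n" + line
            all_texts[-1] += "\n" + line
    return all_texts, your_texts, other_texts


def write_to_files(texts, path):
    """Pickle one list of texts, creating the parent folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(texts, handle)
```

Remembering which list received the last message is what lets a line without a timestamp be appended to the right person’s text.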
We can run the script with the following command, replacing “<name_of_text_file.txt>” with the name of your chat file:
python clean_whatsapp_chats.py <name_of_text_file.txt>
Let’s prepare the data before building our WhatsApp chatbot. We will start by importing the necessary libraries and loading the sentence model:
Let’s explain what’s happening in this block of code:
- We start by importing the necessary libraries.
- Then, we define some variables, mainly our media app, i.e., WhatsApp, and the URL of the model.
- In the next few lines, we load a pre-trained sentence model. This model is a Universal Sentence Encoder that encodes text into high-dimensional vectors that can be used for text classification, semantic similarity, clustering, and other natural language tasks.
- We use tensorflow_hub to load this model, and we use the sentencepiece library to tokenize our chats, i.e., to turn the text into the IDs that the model then encodes into high-dimensional vectors
Now, let’s define some functions that will read the chats we cleaned in the last section along with generating embeddings for them:
Let’s explain these functions one by one:
- process_to_IDs_in_sparse_format : This function takes in the sentences from the chats and returns three things: values (the IDs that the sentencepiece model generates for every token), indices (a list of lists, each giving the index of a sentence and a position within it) and dense_shape (the shape of the matrix, which is the number of sentences by the length of the longest sentence)
- embed_sentence_lite : This function takes in the sentences and calls the above function. It then passes those values to the model to generate the embeddings and returns them
- write_embeddings_to_file : Takes in whose texts to read (yours or the other person’s), generates embeddings for them using the above function, and writes them to a file.
- write_dialogues_to_file : Generates embeddings using the above functions but uses the dialogues dictionary created in the previous section
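The sparse format is easier to see on a toy input. The sketch below skips tokenization and starts from made-up lists of token IDs; a real run would get them from the sentencepiece model. The helper name to_sparse_format is hypothetical, and treating each entry of indices as a [sentence, position] pair is an assumption about the layout.

```python
def to_sparse_format(ids_per_sentence):
    """Flatten ragged lists of token IDs into (values, indices, dense_shape)."""
    # All IDs in reading order, sentence after sentence.
    values = [token_id for ids in ids_per_sentence for token_id in ids]
    # One [sentence, position] pair per ID, in the same order as values.
    indices = [
        [row, col]
        for row, ids in enumerate(ids_per_sentence)
        for col in range(len(ids))
    ]
    # Number of sentences by the length of the longest sentence.
    max_len = max((len(ids) for ids in ids_per_sentence), default=0)
    dense_shape = (len(ids_per_sentence), max_len)
    return values, indices, dense_shape


# Made-up token IDs for two sentences of length 3 and 1.
values, indices, dense_shape = to_sparse_format([[5, 9, 2], [7]])
print(values)       # [5, 9, 2, 7]
print(indices)      # [[0, 0], [0, 1], [0, 2], [1, 0]]
print(dense_shape)  # (2, 3)
```

Only the positions that hold a real token are listed, so short sentences do not need padding.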
Let’s call these functions to process our chats:
You can analyze the embeddings by printing them. They are just arrays containing the vectors representing the texts:
Using the model to speak like you
Let’s now get to the fun part: getting the program to speak and respond like you.
Let’s start by importing the required libraries and the functions defined above, and loading the chats and embeddings we created:
your_embeddings contains the embeddings of quotes where you have spoken, and key_embeddings contains the embeddings of quotes where you have responded.
Let’s now build the functions that we will use to speak and respond like you:
Let’s explain these functions here:
- find_closest : Finds the embeddings closest to the query we pass. It calculates the distance between the query embedding and each of the embeddings, and returns the indices of the top K embeddings that are closest to the query embedding
- speak_like_me : Returns the top K sentences in response to the query sentence. Calls find_closest on your_embeddings and the query embedding and uses the indices to return the sentences
- response_like_me : Same as above but uses key_embeddings for finding the closest sentences
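Here is a minimal sketch of the nearest-neighbour lookup these functions rely on. It uses cosine similarity as the closeness measure, which is an assumption (the actual code may use a different distance). The signatures and the helper closest_sentences are hypothetical, and the two-dimensional vectors are toy values standing in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_closest(embeddings, query_embedding, k=3):
    """Return the indices of the k embeddings most similar to the query."""
    scores = [cosine_similarity(e, query_embedding) for e in embeddings]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def closest_sentences(sentences, embeddings, query_embedding, k=3):
    """Map the closest indices back to the sentences they came from."""
    return [sentences[i] for i in find_closest(embeddings, query_embedding, k)]


# Toy data: three sentences with made-up 2-D "embeddings".
sentences = ["see you soon", "pizza tonight?", "running late"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(closest_sentences(sentences, embeddings, [0.9, 0.1], k=2))
# ['see you soon', 'running late']
```

Speaking and responding would then differ only in which set of embeddings is searched.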
Let’s call these functions and see the output:
Building the API
There will be two endpoints, one for speaking and one for responding. Let’s first create separate files for the above modules. Here are the names of the files that we will be using:
- index.py (for UI)
- Dockerfile (Containerize the application)
- templates/index.html (Contains the UI)
Here’s the directory structure:
We will be using Flask, as it helps in building simple APIs really fast. I will first show you the whole code of the API and then explain it.
Let’s dissect this file:
- We start by importing the required libraries. These include Flask and functions from the modules above.
- We define the app variable, which basically represents our web app
- We then use the @app.route syntax, which is a decorator. All you need to know is that whenever we hit this route, the function below it is called.
- We have only included the method GET as we are not passing any data from a form.
- We define the routes for speaking and responding like you.
- Notice jsonify in the last line of each endpoint. JSON is a standard data format for web applications, so we convert our output to JSON before sending it.
- Lastly, we run the app on port 80.
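The points above can be sketched as a single endpoint. This is an illustration under assumptions, not the api.py from the repository: the /speak path is made up, and speak_like_me is replaced by a stand-in that returns canned sentences so the example runs without the model. The endpoint for responding would follow the same pattern.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def speak_like_me(query, k=3):
    """Stand-in for the real function; returns canned sentences."""
    return [f"echo {i}: {query}" for i in range(k)]


@app.route("/speak", methods=["GET"])  # hypothetical path
def speak():
    # The query arrives as a URL parameter, e.g. /speak?query=hello
    query = request.args.get("query", "")
    if not query:
        return jsonify({"error": "missing 'query' parameter"}), 400
    return jsonify({"sentences": speak_like_me(query)})


def main():
    # Binding to port 80 may require elevated privileges on some systems.
    app.run(host="0.0.0.0", port=80)
```

Calling main() starts the server. Keeping the call out of module level makes the app importable, for example from tests.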
Let’s run this file using the following commands:
Currently, there is no UI, so we can’t view the app in our browser. Instead, we will use curl to access the API endpoints. Keep the above terminal running and open a new tab for this. We need to pass the query in a parameter named query, because that is the name api.py reads; the API will not work correctly otherwise.
Building the UI
As before, let’s first see the code:
Let’s dissect the code:
- We import and define the app as we did in the api.py file but here we also define the Upload folder for saving the files
- Then we define the route as before but here we are going to define two methods GET and POST as we are passing the query through a form.
- In case of GET , we just render the form.
- For the POST method, we read the query and call the functions speak_like_me and respond_like_me
- Finally, we send the sentences to the template.
Let’s now look at the index.html file. Note that index.html must be placed in the templates folder; otherwise Flask won’t be able to find it.
Here, we check whether we have the sentences dictionary: if we do, we show the sentences; otherwise, we show the form.
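To see the route and the template logic side by side, here is a compact sketch. It is a simplification under assumptions: the template is inlined with render_template_string instead of living in templates/index.html, sentences is a plain list rather than a dictionary, and speak_like_me is a stand-in that returns canned sentences.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline stand-in for templates/index.html: show the sentences if we have
# them, otherwise show the form.
PAGE = """
{% if sentences %}
  <ul>
  {% for sentence in sentences %}<li>{{ sentence }}</li>{% endfor %}
  </ul>
{% else %}
  <form method="post">
    <input type="text" name="query">
    <input type="submit" value="Ask">
  </form>
{% endif %}
"""


def speak_like_me(query, k=2):
    """Stand-in for the real function; returns canned sentences."""
    return [f"echo {i}: {query}" for i in range(k)]


@app.route("/", methods=["GET", "POST"])
def index():
    sentences = None  # GET: nothing to show yet, so the form is rendered
    if request.method == "POST":
        # The form field is named "query", matching the input in PAGE.
        sentences = speak_like_me(request.form.get("query", ""))
    return render_template_string(PAGE, sentences=sentences)
```

With a real template file, render_template("index.html", sentences=sentences) would take the place of render_template_string.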
We will then run the app using the following command:
Below are the screenshots from the app:
Containerize the Application
To make our app more useful and easily accessible to people, we will containerize it using Docker. We will create a Dockerfile in the same folder as index.py. Let’s examine the Dockerfile:
The commands are executed one by one here.
- First, we install Python, create a new working directory, and copy all the contents of the current directory into it.
- Then, we run the commands to upgrade pip and install all the necessary libraries for this.
- Finally, we run the index.py file which runs the server of our app.
To make an image, first, go inside the folder where your Dockerfile exists, then run the following command:
This will take some time to execute. After it’s done you can run your app as follows:
The -p flag maps a port on your machine to the port the application listens on inside the container. You can see the running container in Docker Desktop and open the application in your browser.
I have also pushed this Docker image to Docker Hub, so you can download it and play around with the application here: me_bot
In this article, we looked into using Python to build a bot that speaks and responds like you using your WhatsApp chats:
- We started by cleaning the chats and making separate lists for each user’s texts
- We then loaded the sentence model and generated embeddings for the chats
- Then, we used that model to speak and respond like you
- After that, we created an API and a UI for our application using Flask
- Finally, we built a Docker image so that anyone can use this application.