Using Machine Learning to Classify Movie Reviews
Last weekend I was excited to hang out with my friends and watch a movie, but it was just too hard to find a good one. We sifted through hundreds of reviews and ended up watching the WORST MOVIE WE HAD EVER SEEN!
Then, out of rage, I decided to use my knowledge of AI and machine learning to build an algorithm so that no one would ever end up in that situation again. 👍
How Can We Use Machine Learning to Pick a Better Movie?
For this specific problem, we can use supervised machine learning to classify each review as positive or negative. A supervised learning algorithm takes a known set of input data together with the known responses to that data (the outputs, or labels). It then trains a model to generate reasonable predictions for the response to new data.
To build this supervised machine learning model, we'll need to do a few things:
1. Gathering the Data
Since our model will be based on supervised learning, we'll need a labelled dataset of movie reviews to get started: each review must come with its answer (positive or negative). Datasets can be downloaded from many different sites and libraries. For this model, we will use the IMDB movie review dataset that ships with Keras. We will also be using a few other modules, such as TensorFlow and NumPy.
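In Keras, the dataset can be loaded with `keras.datasets.imdb.load_data(num_words=10000)`, which returns `(train_data, train_labels), (test_data, test_labels)`. Each review arrives as a list of integers rather than as text, and each label is 0 (negative) or 1 (positive).

The plain-Python sketch below shows how such an encoded review can be turned back into words, and it runs without TensorFlow. It rests on two assumptions:

- The tiny `word_index` is hypothetical. The real one comes from `keras.datasets.imdb.get_word_index()`.
- The offset of 3 assumes Keras's default `index_from=3`. That default keeps indices 0 to 3 reserved for padding, the start marker, unknown words and an unused slot.

```python
# Hypothetical miniature word index: word -> frequency rank (1 = most common).
# The real mapping comes from keras.datasets.imdb.get_word_index().
word_index = {"the": 1, "movie": 2, "was": 3, "great": 4, "bad": 5}

# Keras shifts every rank by index_from=3 so that indices 0-3 stay reserved.
INDEX_FROM = 3
reserved = {0: "<PAD>", 1: "<START>", 2: "<UNK>", 3: "<UNUSED>"}

# Reverse mapping: encoded integer -> word.
index_to_word = {rank + INDEX_FROM: word for word, rank in word_index.items()}
index_to_word.update(reserved)


def decode_review(encoded):
    """Turn a list of encoded integers back into readable text."""
    return " ".join(index_to_word.get(i, "<UNK>") for i in encoded)


# 1 marks the start of the review; 4, 5, 6, 7 encode "the movie was great".
print(decode_review([1, 4, 5, 6, 7]))  # prints: <START> the movie was great
```

Decoding a few reviews like this is a quick way to check that the data and labels were loaded the way you expect.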

2. Cleaning the Data
Getting the data isn't enough; we also need to clean it and reduce its size. The Keras dataset contains 50,000 reviews, and together they use a far larger vocabulary than we need. We can keep only the 10,000 most common words and treat the rarer ones as unknown, which simplifies the data. We also need to split the data into training data and testing data (Keras already provides the reviews as 25,000 for training and 25,000 for testing).
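Keras performs this reduction when `num_words=10000` is passed to `load_data`. Only the most frequent words keep their own index, and rarer ones are replaced by an out-of-vocabulary marker.

The plain-Python sketch below illustrates the same idea on a toy corpus, together with a simple shuffled train/test split. Several pieces are hypothetical and are not the article's code:

- the function names `build_vocab`, `encode` and `train_test_split`;
- the toy reviews;
- the 80/20 split ratio.

```python
from collections import Counter
import random

INDEX_FROM = 3  # indices 0-3 stay reserved (padding, start, unknown, unused), as in Keras
UNK = 2         # index for words outside the kept vocabulary (Keras's default oov_char)


def build_vocab(reviews, num_words):
    """Map only the num_words most common words to their own index."""
    counts = Counter(word for review in reviews for word in review.split())
    # Rank 1 is the most common word; shift every rank by INDEX_FROM.
    return {word: rank + INDEX_FROM
            for rank, (word, _) in enumerate(counts.most_common(num_words), start=1)}


def encode(review, vocab):
    """Encode a review, replacing rare (out-of-vocabulary) words with UNK."""
    return [vocab.get(word, UNK) for word in review.split()]


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and split it into training and testing lists."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


reviews = ["great movie", "bad movie", "great acting", "boring plot", "great fun"]
labels = [1, 0, 1, 0, 1]

vocab = build_vocab(reviews, num_words=2)   # keep only the 2 most common words
encoded = [encode(r, vocab) for r in reviews]
train, test = train_test_split(list(zip(encoded, labels)))
print(vocab)  # prints: {'great': 4, 'movie': 5}
```

With 10,000 instead of 2, the same logic keeps the common words that carry most of the signal and folds everything else into a single "unknown" index.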

Moreover, every review has to be the same length, because the neural network expects inputs of a fixed size. This is easily solved by padding the shorter reviews with a special padding token (think of it as blank space) until they are as long as the others. For example, one review might be "This was a great movie" and another might be "bad". For the neural network to process both, we need to change the second review to "bad . . . ." (the dots represent padding).
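In Keras, this step is usually handled by `keras.preprocessing.sequence.pad_sequences` (in recent Keras versions, `keras.utils.pad_sequences`). Its `maxlen` argument sets the final length. Its default is to pad at the front, so `padding="post"` is needed to match the "bad . . . ." example.

The function below is a hypothetical plain-Python stand-in. It assumes index 0 is the padding token, pads at the end, and cuts off reviews longer than `maxlen`. The word indices in the example are also made up.

```python
PAD = 0  # index reserved for padding (the Keras IMDB data also reserves 0)


def pad_post(sequences, maxlen, value=PAD):
    """Make every sequence exactly maxlen long.

    Shorter sequences get `value` appended at the end; longer ones are
    cut off after maxlen tokens.
    """
    padded = []
    for seq in sequences:
        seq = list(seq[:maxlen])              # truncate long reviews
        seq += [value] * (maxlen - len(seq))  # pad short reviews
        padded.append(seq)
    return padded


# Hypothetical encodings of "this was a great movie" (5 words) and "bad" (1 word).
reviews = [[14, 20, 6, 87, 22], [79]]
print(pad_post(reviews, maxlen=5))
# prints: [[14, 20, 6, 87, 22], [79, 0, 0, 0, 0]]
```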
3. Making the Neural Network
The neural network is the part of our code that makes the decision: it predicts whether a review is good or bad.

Neural networks work by taking in inputs, applying weights and biases to them, processing the information in one or more hidden layers, and finally outputting a prediction. Our neural network will look similar to the one above.
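To make that description concrete, here is a hypothetical forward pass for a very small review classifier, written in plain Python. It assumes a layout that is common for this task in Keras:

- an embedding layer that turns each word index into a vector;
- averaging over all the words in the review;
- a hidden ReLU layer;
- a single sigmoid output.

The layer sizes and all weights are made up and untrained. The output is a number between 0 and 1 that can be read as the probability that the review is positive.

```python
import math
import random

random.seed(0)
VOCAB_SIZE, EMBED_DIM, HIDDEN = 10, 4, 3  # tiny, hypothetical sizes


def rand():
    return random.uniform(-0.5, 0.5)


# Randomly initialised (untrained) parameters.
embedding = [[rand() for _ in range(EMBED_DIM)] for _ in range(VOCAB_SIZE)]
w_hidden = [[rand() for _ in range(EMBED_DIM)] for _ in range(HIDDEN)]
b_hidden = [0.0] * HIDDEN
w_out = [rand() for _ in range(HIDDEN)]
b_out = 0.0


def predict(review):
    """Return P(positive) for an encoded review (a list of word indices)."""
    # 1. Embedding: look up a vector for every word index.
    vectors = [embedding[i] for i in review]
    # 2. Average pooling: one fixed-size vector for the whole review.
    avg = [sum(col) / len(vectors) for col in zip(*vectors)]
    # 3. Hidden layer: apply weights and biases, then ReLU.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, avg)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # 4. Output layer: weighted sum plus bias, squashed by a sigmoid.
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


print(predict([1, 4, 7, 0, 0]))  # some value between 0 and 1
```

The averaging step is what lets reviews of different lengths feed the same fixed-size hidden layer. Until the network is trained, its predictions are essentially random.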

4. Calculating the Loss Function and Improving the Network
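Once we have a working neural network, we need to make it better. We do this by calculating a loss function, here cross-entropy, which measures how far the network's predictions are from the true labels. Training then adjusts the weights and biases to make this loss as small as possible.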
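For a yes/no problem like this one, the usual choice is binary cross-entropy; Keras accepts `loss="binary_crossentropy"` when compiling a model. It compares each predicted probability p with the true label y (1 for a positive review, 0 for a negative one) and penalises confident mistakes heavily.

Here is a minimal plain-Python sketch of the formula. The clipping value `eps` is an assumption added to avoid taking log(0).

```python
import math


def binary_cross_entropy(labels, predictions, eps=1e-7):
    """Average binary cross-entropy.

    L = -(1/N) * sum(y * log(p) + (1 - y) * log(1 - p))
    Predictions are clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    total = 0.0
    for y, p in zip(labels, predictions):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)


# A confident, correct prediction costs little...
print(binary_cross_entropy([1], [0.9]))  # about 0.105
# ...while a confident, wrong one costs a lot.
print(binary_cross_entropy([1], [0.1]))  # about 2.303
```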

Using the history variable above, we can train our model on the training data and track how its accuracy improves from epoch to epoch.
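In Keras, `model.fit(...)` returns a `History` object, and its `history` attribute is a dictionary of per-epoch metrics such as `"loss"` and `"accuracy"`.

The sketch below imitates that idea without TensorFlow. It trains a logistic-regression model by gradient descent on cross-entropy and records the loss and accuracy after every epoch in a `history` dictionary. Please treat it as an illustration only:

- The model is a single sigmoid neuron, swapped in for the full network to keep the example short.
- The bag-of-words features and labels are hypothetical.
- The learning rate and epoch count are made up.

```python
import math

# Hypothetical bag-of-words features: [count("great"), count("bad"), count("boring")]
X = [[2, 0, 0], [1, 0, 0], [0, 1, 1], [0, 2, 0], [1, 0, 1], [0, 1, 0]]
y = [1, 1, 0, 0, 1, 0]

weights = [0.0, 0.0, 0.0]
bias = 0.0
learning_rate = 0.5
history = {"loss": [], "accuracy": []}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


for epoch in range(50):
    # Forward pass: predicted probability of "positive" for every review.
    preds = [sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) for row in X]

    # Record cross-entropy and accuracy, in the spirit of Keras's History object.
    loss = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y, preds)) / len(y)
    acc = sum((p >= 0.5) == (t == 1) for t, p in zip(y, preds)) / len(y)
    history["loss"].append(loss)
    history["accuracy"].append(acc)

    # Gradient descent step on the mean cross-entropy.
    errors = [p - t for p, t in zip(preds, y)]
    for j in range(len(weights)):
        grad = sum(e * row[j] for e, row in zip(errors, X)) / len(y)
        weights[j] -= learning_rate * grad
    bias -= learning_rate * sum(errors) / len(y)

print(f"first loss {history['loss'][0]:.3f}, last loss {history['loss'][-1]:.3f}")
print(f"final accuracy {history['accuracy'][-1]:.2f}")
```

Plotting `history["loss"]` and `history["accuracy"]` against the epoch number is a simple way to see whether training is still improving or has levelled off.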
And we’re done! You can now scan through thousands of reviews and see if a movie’s worth watching or not.😊
Key Takeaways
- Supervised learning uses labelled data
- Gathering and filtering the data carefully is important for accurate results
- Neural networks are trained by minimizing a loss (cost) function
- Here’s the full code: https://colab.research.google.com/drive/1canrZ3TPkc8oe7UAPUj-mEAngBHCMIu