Using Machine Learning to Classify Movie Reviews

4 min read · Nov 15, 2019

Last weekend I was excited to hang out with my friends and watch a movie, but it was just too hard to find a good one. We sifted through hundreds of reviews and still ended up watching the WORST MOVIE WE HAD EVER SEEN!

Then out of rage, I decided to use my knowledge of AI and machine learning to create an algorithm that would never put anyone in such a situation again.👍

How Can We Use Machine Learning to Pick a Better Movie?

For this specific problem, we can use supervised machine learning to classify the reviews. A supervised learning algorithm takes a known set of input data and the known responses to that data (the output), and trains a model to generate reasonable predictions for the response to new data.
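As a toy illustration of that idea (not the neural network we build later), the sketch below learns a score for each word from a handful of made-up labelled reviews and uses those scores to label a new review. The reviews, the `train` and `predict` helpers, and the scoring rule are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical labelled data: (review text, label), where 1 = positive, 0 = negative.
labelled_reviews = [
    ("a great and fun movie", 1),
    ("great acting and a great story", 1),
    ("a boring and bad movie", 0),
    ("bad acting and a boring story", 0),
]


def train(examples):
    """Learn a score per word: +1 each time it appears in a positive review, -1 in a negative one."""
    scores = Counter()
    for text, label in examples:
        for word in text.split():
            scores[word] += 1 if label == 1 else -1
    return scores


def predict(scores, text):
    """Label a new review 1 (positive) if its words score above zero overall, else 0."""
    total = sum(scores[word] for word in text.split())
    return 1 if total > 0 else 0


model = train(labelled_reviews)
print(predict(model, "a fun story"))     # known inputs and outputs let us label new text
print(predict(model, "boring and bad"))
```

The known inputs are the review texts, the known responses are the labels, and the "model" is just the table of word scores.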

To make this supervised machine learning model we’ll need to do a few things:

1. Gathering the Data

Since our model will be based on supervised learning, we’ll need a dataset of movie reviews to get started. Datasets can be downloaded from different sites and libraries. For this model, we will be using the datasets that come with Keras. We will also be using a few other modules, such as TensorFlow and NumPy.
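Here is a minimal sketch of what loading the data could look like. It assumes the dataset in question is the IMDB review set bundled with Keras, that TensorFlow is installed, and that the data can be downloaded on first use. `load_reviews` is a hypothetical helper name, and `num_words` corresponds to the vocabulary limit discussed in the next step.

```python
def load_reviews(num_words=10000):
    """Load the IMDB reviews bundled with Keras, keeping only the most common words.

    Assumption: TensorFlow is installed and the dataset can be downloaded on first use.
    """
    # Imported inside the function so the sketch can be read and imported without TensorFlow.
    from tensorflow import keras

    (train_data, train_labels), (test_data, test_labels) = keras.datasets.imdb.load_data(
        num_words=num_words
    )
    return (train_data, train_labels), (test_data, test_labels)
```

Calling `load_reviews()` returns the training and testing pairs. Each review comes back as a list of integer word ids rather than text, and each label is 0 or 1.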

2. Cleaning the Data

Getting the data isn’t enough on its own: we also need to clean and reduce it. The movie review dataset in Keras (IMDB) contains 50,000 reviews with tens of thousands of different words. We can keep only the 10,000 most common words, which gets rid of the rare words and simplifies the data. We also need to split the data into training data and testing data.
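To make the "keep only the most common words" step concrete, here is a small pure-Python sketch on made-up reviews. The `build_vocab` and `encode` helpers, the tiny vocabulary size of 3, and the id scheme (0 reserved for padding, 1 for unknown words) are assumptions for illustration, not the exact scheme Keras uses.

```python
from collections import Counter

# Hypothetical tokenized reviews with labels (1 = positive, 0 = negative).
reviews = [
    (["great", "movie", "great", "cast"], 1),
    (["bad", "movie"], 0),
    (["great", "soundtrack"], 1),
    (["bad", "plot", "bad", "movie"], 0),
]


def build_vocab(token_lists, max_words):
    """Map the max_words most frequent words to ids starting at 2 (0 = padding, 1 = unknown)."""
    counts = Counter(word for tokens in token_lists for word in tokens)
    most_common = [word for word, _ in counts.most_common(max_words)]
    return {word: index for index, word in enumerate(most_common, start=2)}


def encode(tokens, vocab):
    """Replace each word with its id; rare words outside the vocabulary become 1."""
    return [vocab.get(word, 1) for word in tokens]


vocab = build_vocab([tokens for tokens, _ in reviews], max_words=3)
encoded = [(encode(tokens, vocab), label) for tokens, label in reviews]

# Hold out the last quarter of the examples for testing.
split = int(len(encoded) * 0.75)
train_set, test_set = encoded[:split], encoded[split:]
print(len(train_set), len(test_set))
```

With a real dataset the same idea applies, just with `max_words=10000` and far more reviews.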

Moreover, every review has to be the same length, since a neural network expects inputs of a fixed size. This issue can be solved easily by padding the shorter reviews with a filler value. For example, one review might be “This was a great movie” and another might be “bad”. For the neural network to process both, we pad the second review until it is as long as the first: “bad…………………” (the dots represent the padding).
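A minimal sketch of that padding step, assuming each review has already been turned into a list of integer word ids and that 0 is free to use as the filler value. The `pad_review` helper and the ids below are made up for illustration.

```python
def pad_review(word_ids, length, pad_value=0):
    """Return a copy of word_ids cut or extended with pad_value to exactly `length` entries."""
    trimmed = word_ids[:length]
    return trimmed + [pad_value] * (length - len(trimmed))


long_review = [12, 7, 4, 31, 9]  # stands in for "This was a great movie"
short_review = [55]              # stands in for "bad"

print(pad_review(long_review, 5))   # [12, 7, 4, 31, 9]
print(pad_review(short_review, 5))  # [55, 0, 0, 0, 0]
```

Keras ships a `pad_sequences` utility that does the same job on whole batches of reviews.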

3. Making the Neural Network

The neural network is the part of our code that makes the decision: is this a good review or a bad one?

Neural networks work by taking in inputs, applying weights and biases, processing the information in the hidden layers, and finally outputting the prediction. Our neural network will look similar to the one above.
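Those steps can be sketched in a few lines of NumPy. This is not the network from the full code: the layer sizes, the ReLU and sigmoid activations, and the random weights are all assumptions chosen to keep the example small.

```python
import numpy as np


def sigmoid(z):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer with ReLU, then a sigmoid output between 0 and 1."""
    hidden = np.maximum(0.0, x @ w_hidden + b_hidden)  # weights and biases, then activation
    return sigmoid(hidden @ w_out + b_out)             # final prediction


rng = np.random.default_rng(0)
x = rng.random((2, 4))              # 2 made-up inputs with 4 features each
w_hidden = rng.normal(size=(4, 3))  # 4 inputs feeding 3 hidden units
b_hidden = np.zeros(3)
w_out = rng.normal(size=(3, 1))     # 3 hidden units feeding 1 output
b_out = np.zeros(1)

prediction = forward(x, w_hidden, b_hidden, w_out, b_out)
print(prediction.shape)  # (2, 1): one score per input
```

A score close to 1 would be read as a good review and a score close to 0 as a bad one. With random weights the scores are meaningless, which is why the network has to be trained.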

4. Calculating the Loss Function and Improving the Network

After we have a working neural network, we need to make it better. To do that, we measure how wrong its predictions are with a loss function (here, cross-entropy) and adjust the weights to bring that loss down.
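Since there are only two classes (good or bad), the usual choice is binary cross-entropy; that is an assumption here, as the article only says "cross-entropy". A small sketch with made-up labels and predictions:

```python
import math


def binary_cross_entropy(labels, predictions, eps=1e-7):
    """Average cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, predictions):
        p = min(max(p, eps), 1 - eps)  # keep p away from 0 and 1 so log() stays finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


labels = [1, 0, 1]
mostly_right = [0.9, 0.1, 0.8]  # confident and correct
mostly_wrong = [0.1, 0.9, 0.2]  # confident and incorrect

print(binary_cross_entropy(labels, mostly_right))  # small loss
print(binary_cross_entropy(labels, mostly_wrong))  # much larger loss
```

The loss grows quickly when the network is confidently wrong, which is what pushes training toward better predictions.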

Using the history variable above, we can train our model on the data, track its progress, and improve its accuracy significantly.
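In Keras, `model.fit` returns a History object that records the loss and metrics for each epoch. The sketch below mimics that idea with a plain dictionary and a single-layer model trained by gradient descent in NumPy. The data, learning rate, and number of epochs are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: 200 examples with 5 features; the label is 1 when the features sum above zero.
x = rng.normal(size=(200, 5))
y = (x.sum(axis=1) > 0).astype(float)

weights = np.zeros(5)
bias = 0.0
learning_rate = 0.5
history = {"loss": [], "accuracy": []}

for epoch in range(50):
    # Forward pass: predicted probability that each example is positive.
    p = 1.0 / (1.0 + np.exp(-(x @ weights + bias)))

    # Record cross-entropy loss and accuracy for this epoch.
    p_safe = np.clip(p, 1e-7, 1 - 1e-7)
    loss = -np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))
    history["loss"].append(loss)
    history["accuracy"].append(np.mean((p > 0.5) == (y == 1)))

    # Gradient of the cross-entropy loss for a sigmoid output, then one update step.
    error = p - y
    weights -= learning_rate * (x.T @ error) / len(y)
    bias -= learning_rate * error.mean()

print(history["loss"][0], history["loss"][-1])
print(history["accuracy"][-1])
```

Plotting `history["loss"]` and `history["accuracy"]` against the epoch number is a quick way to see whether training is still helping.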

And we’re done! You can now scan through thousands of reviews and see if a movie’s worth watching or not.😊

Key Takeaways

Supervised learning uses labelled data
Gathering and filtering data is important for accurate results
Neural networks are trained by minimizing a cost (loss) function
Here’s the full code: https://colab.research.google.com/drive/1canrZ3TPkc8oe7UAPUj-mEAngBHCMIu

If you enjoyed reading this article, please press the👏 button, and follow me to stay updated on my future articles. Also, feel free to share this article with others!

Follow me on Medium and LinkedIn to stay updated with my progress in AI.

Written by Sumeet Pathania
