How long does it take an average traveler to pick out a hotel? As far as we know, no scientific research is being done to answer this question.
NB: This is an article from Altexsoft
Yet, real-life experience clearly shows that people spend hours and even days sifting through dozens if not hundreds of options.
The number of things to consider and the variety of reviews from previous guests are mind-blowing. This article describes our experience with using sentiment analysis to produce instant snapshots of feedback, so that travelers can compare different options at a glance and make the best choice in no time.
Solutions of this kind can benefit hoteliers, online travel agencies, booking sites, metasearch engines, and travel review platforms seeking ways to put their customers in a more relaxed mood.
What is sentiment analysis?
Sentiment analysis is the technique of capturing the emotional coloring behind a text. It applies natural language processing (NLP) and machine learning to detect, extract, and study customers’ perceptions of a product or service. That’s why this type of examination is often called opinion mining or emotion AI.
The goal of opinion mining is to identify the polarity of a text, that is, to classify it as positive, negative, or neutral. For example, we can say that a comment like
“We stayed at this hotel for five days” is neutral,
“I liked staying here” is positive, and
“I disliked the hotel” is negative.
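To make the idea of polarity concrete, here is a minimal sketch that labels the three comments above by counting words from two tiny, hand-picked word lists. This is a deliberately naive lexicon lookup, not the machine learning approach discussed in the rest of the article, and the word lists and the function name `classify_polarity` are our own assumptions for illustration.

```python
import re

# Hypothetical, hand-picked word lists for illustration only.
# A real sentiment lexicon would be far larger and carefully curated.
POSITIVE_WORDS = {"liked", "loved", "great", "clean", "friendly"}
NEGATIVE_WORDS = {"disliked", "hated", "dirty", "rude", "noisy"}


def classify_polarity(text: str) -> str:
    """Label a text as positive, negative, or neutral by counting lexicon hits."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(token in POSITIVE_WORDS for token in tokens)
    negative_hits = sum(token in NEGATIVE_WORDS for token in tokens)
    score = positive_hits - negative_hits
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for comment in (
    "We stayed at this hotel for five days",
    "I liked staying here",
    "I disliked the hotel",
):
    print(f"{comment!r} -> {classify_polarity(comment)}")
```

A word-counting approach like this breaks down quickly on negation ("not clean"), sarcasm, and domain-specific vocabulary, which is one reason to train a model on annotated reviews instead.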
You can learn much more about the types, tools, and use cases of sentiment analysis in our dedicated blog post. This time, we’ll focus on exactly how we taught machines to recognize emotions across reviews and what lessons we learned from creating an NLP-based tool called Choicy. So, let’s start!
Applying machine learning to sentiment analysis
Sentiment analysis dataset: Tons of good samples are half the battle
The first step in sentiment analysis is obtaining a training dataset with annotations that tell your algorithm what is positive and what is negative. Here, you have two options: create it yourself or make use of publicly available lexicons.
Do it yourself: when accuracy is a top priority
You don’t need the power of machine learning to predict that a dataset tailor-made for a particular purpose will bring the best results. Yet, the improved efficiency and accuracy come at a price, as preparing data for sentiment analysis is a time- and labor-intensive process that includes three important steps.
Step 1: data collection. First, you have to gather real reviews left by hotel guests. The best way to do it is to use feedback from your own website. If this option is unavailable, you may try to partner with platforms that own such data. Scraping, a common method of collecting datasets, is not recommended as it may entail legal issues. Privacy regulations such as the GDPR and CCPA restrict collecting personal data this way. You may also unwittingly violate the property rights of website owners.
Step 2: sentiment annotation. To make opinions hidden in a review visible to machines, you need to manually assign sentiment labels (positive, neutral, or negative) to words and phrases. Data labeling for sentiments is considered reliable when more than one human judge has annotated the dataset. The rule of thumb is to engage three annotators.
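One practical upside of having three annotators is that disagreements can often be settled by a simple majority vote. The sketch below shows one way this could look. It is an assumption on our part rather than a description of the exact procedure we followed, and the function name `majority_label` is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(labels: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators.

    Returns None when there are no labels or when no label is chosen
    by more than half of the annotators.
    """
    if not labels:
        return None
    # most_common(1) yields the single most frequent label and its count.
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


# Three annotators label the same phrase.
print(majority_label(["positive", "positive", "neutral"]))
# All three annotators disagree, so there is no majority.
print(majority_label(["positive", "neutral", "negative"]))
```

Phrases for which the function returns `None` can be set aside for discussion or an extra round of labeling instead of being forced into a category.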
Step 3: text cleansing. Raw hotel reviews contain a lot of irrelevant or simply meaningless data that can badly affect model accuracy. So, we need to clean them up, which includes: