# Definitive Guide To Prepare For An Analytics Interview

You are reading the article Definitive Guide To Prepare For An Analytics Interview, updated in March 2024.

Looking to land your first data science role but struggling to clear interviews? We have curated the most comprehensive course to ace your next data science interview. Hundreds of questions, plenty of videos, and multiple resources – the ultimate course!

Let’s face it – an analytics interview can be daunting at times!

I have met a lot of analysts who come across as sharp when you interact with them informally. But something happens to them as soon as they enter an interview!

Have you seen one of these analysts and wondered what happens to them in the room? Or have you faced this situation yourself? This guide is meant to help you / your friend to ace the next analytics interview!

The first thing to keep in mind before appearing for an analytics interview is:

As long as you know your subject, are a logical person and can stay calm – you can ace these interviews easily!

What is the employer trying to judge you on?

The exact skills an employer judges you on will vary from employer to employer, but they are likely a mix of the following:

Technical skills – comfort and knowledge about various analytical tools

Knowledge of statistics – whether you apply algorithms blindly or actually understand what they do?

Structured thinking – Can you take ambiguous problems and put a framework around them?

Business understanding – How well can you put on your business thinking hat?

Problem solving – Can you provide (out of the box) solutions to problems?

Communication skills – Can you communicate your thoughts clearly and crisply? Can you influence people?

Comfort with numbers – How good are you at crunching them?

Attention to detail – Do you pay attention to small details and add them up to see the bigger picture?

This article can help you understand an employer’s perspective in more detail.

Types of analytics interviews:

Analytics interviews can be divided broadly into three categories:

The preparation for technical analytics interviews happens over time. These interviews test how much time and effort you have put into learning your subject and tools.

If you are really good at what you do, these rounds should be a cakewalk. If you are not, the best strategy is to be honest with your potential employer about what you know and what you don’t. Here are a few articles to help you with technical interviews:

There is a lot of material available on the internet to prepare for behavioural interviews, so I will skip those details.

Skill assessment interviews:

These are the deciding factor in most analytics hiring, and for good reason: if a person has sound logical skills and can demonstrate good business thinking, they can pick up technical skills easily! Since these interviews are aimed at assessing various skills, what matters most is that you demonstrate those skills; the actual answer or solution is irrelevant in most cases. Any hiring manager would prefer a wrong answer with a good approach over an accurate answer with a bad approach.

Skill interviews can again be divided into two categories:

Guess estimates

Case studies and role plays


Guess estimates are puzzle-like questions where you are expected to estimate a figure by putting a framework around the question, creating segments, making assumptions, and adding up the numbers to arrive at a final estimate.

You can read details on how to ace a guess-estimate along with a few examples here. Here are a few tips I would recommend:

It is the approach that matters – not the exact numbers. However, you should cross-validate numbers once you have them.

Always go top-down when solving a problem. Draw neat segmentations and diagrams to illustrate your approach.

Keep a few common starting points / proxies at your fingertips. The population of your country, the world population, and your country’s GDP are good starting points you should definitely remember.

Analyze all possible uses of the subject. For example, consider both B2B and B2C markets if you are asked to estimate the market for tablets or smartphones.

Call out assumptions and possible blind spots.
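The top-down flow described above can be sketched as a quick back-of-the-envelope calculation. All figures below (population, segment shares, adoption rates, replacement cycle) are hypothetical placeholders for illustration, not real market data:

```python
# Hypothetical guess-estimate: annual smartphone sales in a country, top-down
population = 1_000_000_000  # assumed population (a proxy you have memorised)

# Step 1: segment the population; Step 2: assume adoption per segment
segments = {
    "urban adults": {"share": 0.25, "adoption": 0.80},
    "rural adults": {"share": 0.40, "adoption": 0.40},
    "children":     {"share": 0.35, "adoption": 0.05},
}

# Step 3: add up the segments to get total owners
owners = sum(population * s["share"] * s["adoption"] for s in segments.values())

# Step 4: convert stock (owners) to flow (annual sales) via a replacement cycle
replacement_years = 4  # assumed average replacement cycle
annual_sales = owners / replacement_years

print(f"Estimated owners: {owners:,.0f}")        # 377,500,000
print(f"Estimated annual sales: {annual_sales:,.0f}")  # 94,375,000
```

The point, as in the interview, is the explicit segmentation and assumptions, not the final number.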

Case studies / Role plays:

Here is what Capital One says about case studies on its website:

Case interviews are broad, two-way discussions rather than one-way tests. You will be assessed more on how you go about dealing with the problem rather than on the specific answers you come up with.

A case typically starts with a broad question providing a business scenario and then narrows down in a particular direction. Cases might also evolve and grow in complexity as the interview progresses. Here is how a typical interview evolves over time:

Here is an example of a typical case study interview. Here is another one.

Following are some best practices to follow in a case study round:

Case study is all about illustrating 3 things – Structure, structure and structure! Focus on putting framework to the problem provided, and you will be safe. Try deviating from it and you’ll find yourself in trouble.

For example, when asked how can you increase Profits for a product company, you should not jump to conclusions like “I’ll improve marketing or I’ll cut costs”. You should say Profits = Revenues – Costs. In order to increase profits, we can either increase Revenues or reduce costs. Revenues can be increased by increasing Sales or increasing the price. Costs can be reduced by doing ….

Keeping a structure will not only help the interviewer understand you better, it will also help you make sure that you have not missed out any thing.
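The profit-tree logic described above can be made concrete with a toy calculation. All numbers here are made up purely to show how each branch of the tree can be quantified before discussing levers:

```python
# Hypothetical profit tree: Profits = Revenues - Costs
units_sold = 10_000
price_per_unit = 50.0
revenues = units_sold * price_per_unit  # revenue branch: volume x price

fixed_costs = 120_000.0
variable_cost_per_unit = 30.0
costs = fixed_costs + units_sold * variable_cost_per_unit  # cost branch

profits = revenues - costs
print(profits)  # 80000.0

# One lever: raise price from 50.0 to 52.5 (a 5% increase),
# assuming volume is unaffected
profits_price_up = units_sold * 52.5 - costs
print(profits_price_up - profits)  # 25000.0 incremental profit
```

Laying out the tree this way also makes it obvious which other levers (volume, variable cost, fixed cost) remain to be discussed.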

Call out assumptions, whenever you are making them. These could be assumptions about business or the sector in discussion.

Lay out things neatly on paper so that they can be reused later. Most of the time, case studies evolve: you will be asked similar questions multiple times under different scenarios. Keeping your earlier work handy can reduce calculation time!

Think out loud – it is the thinking process that matters. If you are not sure, ask the interviewer rather than staying quiet!

Communicate crisply and clearly – if you are not clear about your thoughts, ask the interviewer for two minutes to organize them, and then communicate them well.

Finally, here is a list of activities and behaviours you should avoid during the interview. These, along with the best practices mentioned above, should give you enough ammunition to handle any analytics interview.


image credit: Oregon State University



How To Prepare For A Virtual Job Interview

Grabbing that dream job in the current times is not an easy ride, and you need to excel in many areas. An excellent resume certainly creates an impact; however, verbal skills still hold the utmost importance. This is true for face-to-face as well as virtual job interviews. The current scenario indicates that virtual interviews and video introductions will be the most prominent way of hiring in the coming years.

More and more organizations are now using video interviews to shortlist candidates for various positions. This is not only cost-effective but also time-saving and comfortable for both the interviewer and the candidate.

Virtual Job Interview Tips

First, get to know the various video conferencing platforms available these days, any of which may be used by the organization to interview candidates. These tools are generally sophisticated and highly user-friendly. However, the challenge lies in becoming accustomed to using them with finesse and creating a lasting impact on your interviewer, much like a face-to-face interaction.

Hence, before knowing the ways to prepare for a virtual job interview, here is a glimpse of specific challenges that arise during the video interviews.

Challenges you may face during a virtual job interview & how to avoid them

2] Bad lighting and an inability to see your gestures clearly can annoy the interviewer. Hence, make sure you are sitting in a bright room. The surroundings should look clean and tidy; you must not appear clumsy and disorganized to the interviewer. It is better to have a bare wall or clear background, with no clutter in your location. You may also use a professional virtual background or blur the background, a feature available on some platforms: Teams, Zoom, etc. all allow you to set a virtual background, and Teams and Skype also let you blur it.

3] External disturbances can make a bad impression on the interviewer. Hence, select a quiet, secluded spot with minimal distractions and background noise, and stay away from the kitchen or living room.

4] Shabby dressing and an inattentive appearance can create a negative impact. A virtual job interview is as important as a face-to-face one, so dress smartly, without overdoing it.

5] Do not look unprepared – be technology-ready. If you plan to pursue a career online and appear for virtual interviews, it is essential to invest in a quality microphone and web camera. If the inbuilt camera on your laptop or computer is not high quality, upgrade it or arrange a better one. Interviewers may not want to look at a grainy image of their prospective employee.

Check your video quality on your computer camera

Check the microphone to ensure your voice is not breaking or echoing

Check speakers/headphones to make sure you can hear what the interviewers are saying

Set up your workspace near the Wi-Fi router or use a LAN cable to make sure the connectivity is good

6] Professional tip: Avoid attending an interview on a mobile phone. It may not have all the features, and holding a smartphone in your hands during an interview or video conference may look unprofessional.

How to prepare for a virtual job interview

1] Make sure you have all the paraphernalia sorted and placed right on your desk before the interview starts. This is to ensure you do not look disoriented to the interviewer. Make sure you have your answers, your objectives, and your expectations clearly defined. Do not fumble during the interview and never look for answers on the internet while appearing in a virtual interview.

2] Professional Tip: Always have a printout of essential documents like your resume and experience letters, in front of you.

3] Dress well and look professional. People often forget that even though the interview is happening from their home, they are virtually connected to an official group. It is essential to look professional and dress in formal wear, just as you would for a face-to-face interview. A blazer or a formal shirt is preferable for virtual interviews. Do not appear shabby or unprepared.

4] Get to the point fast. Virtual meetings and interviews are timed, so you do not have time for extended small talk to warm up the group. It is essential to make a connection fast: share relevant personal stories and real-life examples with the interviewer to make a lasting impact.

5] Genuineness matters. Pretentiousness is often not appreciated, especially when employers are looking for the right candidate for their jobs. Thus, it is essential to be yourself and speak naturally while appearing in a virtual interview. Make the conversations real and engaging instead of looking unreal and giving a prepared speech.

7] Avoid typing or answering text for the entire length of the interview. It appears as a significant distraction to the interviewer. Though you are not in a boardroom, you are virtually inside a meeting room, so you should look attentive and confident.

8] Sit upright with your back straight to look more formal and prepared. People who lean back appear sloppy and disinterested in the conversation. Select a comfortable, well-cushioned chair so you do not struggle with your position while seated.

9] Prudently make use of hand gestures to appear confident and informed.

Follow prudent video conferencing etiquette.

Final words

You can successfully make a lasting impression on the interviewer during a virtual job interview if you follow these simple tips and stay prepared beforehand.

Useful link: Best free Job Search Sites for searching for jobs online.

An ESG Guide For Tech Companies

● ESG is essential for tech companies when it comes to winning the war for talent and building trust with investors

Read more: A simple guide to ESG

Why should tech companies care about ESG?

Technology companies also significantly impact society through their products and services.

By encouraging greater social responsibility, ESG helps those in the technology sector provide further consideration to the social impact on cybersecurity, data privacy, and accessibility.

Regarding governance, ESG can encourage technology companies to prioritise issues such as accountability, transparency, and ethical decision-making.

Board diversity, executive compensation, and shareholder engagement can all be considered within this space.

Overall, ESG is vital within the technology sector because it promotes more responsible practices and can lead to better resilience in the long term.


How is ESG measured in technology firms?

ESG is often measured by rating agencies who evaluate companies by looking at ESG performance, among other factors.

Ratings for technology companies can be measured and based on criteria such as energy efficiency, carbon footprint, data privacy, and labour practices.

Technology companies also publish reports which disclose their ESG practices and provide transparency around initiatives, allowing stakeholders to evaluate progress.

The Global Reporting Initiative and the Sustainable Accounting Standards Board provide frameworks for such reports.

Another factor vital in measuring ESG is stakeholder engagement, which focuses on how companies engage with different stakeholder groups and incorporate their feedback into their ESG strategy. This can help to identify ESG-related risks.

Finally, companies can measure their ESG performance against peers by benchmarking to better understand their relative strengths and weaknesses.

How are ESG factors regulated within the technology sector?

ESG is not explicitly regulated within the technology sector, but some frameworks in place encourage companies in this space to consider ESG issues.

These include data privacy regulations such as the GDPR, which aim to protect consumers’ personal data.

Some Governments have also set carbon reduction targets, meaning companies must disclose their carbon emissions.

In addition, the EU has introduced a classification system for sustainable investments, including environmental and social factors.

This system aims to direct investments towards more sustainable businesses.

Stock exchanges such as the Nasdaq and New York Stock Exchange have also introduced listing requirements related to ESG.

For example, the Nasdaq requires companies to make disclosures regarding board diversity.

So, while ESG factors are not explicitly regulated within the technology sector, there are several frameworks in place which encourage more sustainable practices.

How can executives play a greater role in shaping this area?

Chief Information Officers are crucial in helping companies meet their ESG goals.

One of the biggest challenges when it comes to implementing ESG policies is the need for good-quality data.

CIOs have many opportunities to drive more sustainable digital transformation and must consider the environmental impact of new technology and infrastructure.

Since data centres must be considered regarding energy consumption, ESG now concerns the IT department more than ever.

Sustainability-related questions have been increasing in proposal requests, and data centres and cloud services are now considered top contributors to the carbon footprint.

These factors can be optimised by working with different suppliers. IT leaders can also adopt tools that help report complex metrics around carbon emissions.

According to the experts, various technologies can help feed into ESG metrics, including blockchain, AI, the Internet of Things, and Virtual Reality.

Why ESG will grow in importance for tech firms

It has been forecasted that ESG assets are set to make up more than a third of assets under management by 2025, so ESG is here to stay.

The S&P 500 technology sector currently makes up a big portion of many ESG funds.

This means there will also be greater scrutiny from shareholders.

Companies in tech, such as Salesforce, Amazon, and Alphabet, have already signed up to support consistent climate reporting.

Companies are increasingly looking toward greater diversity and inclusion practices to win the talent war (see video below), which could also help boost innovation.

By following ESG standards, technology companies can avail of better financing, such as Sustainability-Linked Loans.

These loans can carry more favourable terms depending on whether businesses meet ESG-related targets.

However, while tech CEOs understand the importance of ESG, it is still difficult to track the data and metrics to measure it.

Research has suggested that there may be a need for more in-house talent in the tech sector to measure and achieve ESG-related goals better.

What should tech companies focus on when it comes to ESG? 

Some helpful areas of focus for companies include power consumption and greenhouse gas emissions, and utilising cloud and serverless architectures.

Supporting corporate wellness and improving issues with the supply chain can also help.

Both digital transformation and ESG concern how businesses can increase efficiency and achieve better business outcomes.

Both topics look beyond the organisation’s immediate success, and it is essential to consider how valuable technologies could be scaled to achieve ESG-related goals better.

Innovations within the IT world can make ESG practices much more efficient.

Unfortunately, a high level of computer power can come with an equally high level of energy consumption.

So the environmental impact of this must be considered when it comes to the planning of tech companies.

Both big data and machine learning have allowed for the optimisation of production schedules, and companies have increased opportunities to make a difference in this space.

Customers have also been found to increasingly reference ESG criteria when choosing ICT providers to work with, which is a growing issue due to technological change.

With many IT vendors competing for space in the market, it is now easier than ever for customers to filter out certain vendors based on ESG-related criteria.

Technology companies in the spotlight

Given the global dominance of technology companies, it is no surprise that they have found themselves very much in the spotlight regarding ESG practices.

They have led the way on climate change, diversity, and inclusion.

By following such criteria, technology companies can help to better align themselves with customer preferences and showcase their commitment to societal issues.

Tech giants in the US have all now made net zero or carbon neutral pledges, which can have a further knock-on effect of encouraging their vendors and suppliers to reduce their emissions.

In the future, there will be more reporting requirements around ESG placed on technology companies, so companies must be able to identify appropriate data sources.

Businesses should take note of a younger demographic of investors now getting involved.

There has never been a better time to focus on ESG-related priorities.

While ESG reporting is currently voluntary, it may not continue to be so. In this case, those who position themselves well to address these issues now will be better equipped to manage risk in the future.

A Guide To Building An End-to-End Multi-Class Text Classification Model

This article was published as a part of the Data Science Blogathon.

Knock! Knock!

Who’s there?

It’s Natural Language Processing!

Today we will implement a multi-class text classification model on an open-source dataset and explore more about the steps and procedure. Let’s begin.

Table of Contents


Loading the data

Feature Engineering

Text processing

Exploring Multi-classification Models

Compare Model performance



Dataset for Text Classification

The dataset consists of real-world complaints received from customers regarding financial products and services, with each complaint labeled with a specific product. Hence, this is a supervised learning problem, where we have both the input and the target output. We will experiment with different machine learning algorithms and check which works best.

Our aim is to classify the complaints of the consumer into predefined categories using a suitable classification algorithm. For now, we will be using the following classification algorithms.

Linear Support Vector Machine (LinearSVM)

Random Forest

Multinomial Naive Bayes

Logistic Regression.

Loading the Data

Download the dataset from the link given in the above section. Since I am using Google Colab, you can use the Google Drive link given here and import the dataset from your own Drive. The code below mounts the drive and unzips the data into the current working directory in Colab.

from google.colab import drive
drive.mount('/content/drive')
!unzip /content/drive/MyDrive/

First, we will install the required modules.

pip install numpy
pip install pandas
pip install seaborn
pip install scikit-learn
pip install scipy

Once everything is successfully installed, we will import the required libraries.

import os
import pandas as pd
import numpy as np
from scipy.stats import randint
import seaborn as sns  # used for plotting interactive graphs
import matplotlib.pyplot as plt
from io import StringIO
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2
from IPython.display import display
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn import metrics

Now after this let us load the dataset and see the shape of the loaded dataset.

# loading data
df = pd.read_csv('/content/rows.csv')
print(df.shape)

From the output of the above code, we can see that the dataset is huge and has 18 columns. Let us see what the data looks like by executing the code below.


Now, for our multi-class text classification task, we will use only two of these 18 columns: ‘Product’ and ‘Consumer complaint narrative’. Let us create a new DataFrame to store only these two columns, and since we have enough rows, we will remove all the missing (NaN) values. To keep things simple, we will rename the second column of the new DataFrame to ‘Consumer_complaint’.

# Create a new dataframe with two columns
df1 = df[['Product', 'Consumer complaint narrative']].copy()
# Remove missing values (NaN)
df1 = df1[pd.notnull(df1['Consumer complaint narrative'])]
# Renaming second column for a simpler name
df1.columns = ['Product', 'Consumer_complaint']
print(df1.shape)
df1.head(3).T

We can see that after discarding all the missing values, we have around 383k rows and 2 columns; this will be our training data. Now let us check how many unique products there are.


There are 18 categories of products. To make the training process easier, we will make some changes to the category names.

# Because the computation is time consuming (in terms of CPU), the data was sampled
df2 = df1.sample(10000, random_state=1).copy()

# Renaming categories
df2.replace({'Product':
    {'Credit reporting, credit repair services, or other personal consumer reports':
        'Credit reporting, repair, or other',
     'Credit reporting': 'Credit reporting, repair, or other',
     'Credit card': 'Credit card or prepaid card',
     'Prepaid card': 'Credit card or prepaid card',
     'Payday loan': 'Payday loan, title loan, or personal loan',
     'Money transfer': 'Money transfer, virtual currency, or money service',
     'Virtual currency': 'Money transfer, virtual currency, or money service'}},
    inplace=True)
pd.DataFrame(df2.Product.unique())

The 18 categories are now reduced to 13; for example, we have combined ‘Credit card’ and ‘Prepaid card’ into a single class, and so on.

Now, we will map each of these categories to a number so that our model can understand them better, and we will save this in a new column named ‘category_id’, where each of the 13 categories is represented by a number.

# Create a new column 'category_id' with encoded categories
df2['category_id'] = df2['Product'].factorize()[0]
category_id_df = df2[['Product', 'category_id']].drop_duplicates()

# Dictionaries for future use
category_to_id = dict(category_id_df.values)
id_to_category = dict(category_id_df[['category_id', 'Product']].values)

# New dataframe
df2.head()

Let us visualize the data and see how many complaints there are per category. We will use a bar chart here.

fig = plt.figure(figsize=(8,6))
colors = ['grey','grey','grey','grey','grey','grey','grey','grey','grey',
          'grey','darkblue','darkblue','darkblue']
df2.groupby('Product').Consumer_complaint.count().sort_values().plot.barh(
    ylim=0, color=colors, title='NUMBER OF COMPLAINTS IN EACH PRODUCT CATEGORY\n')
plt.xlabel('Number of occurrences', fontsize=10);

The above graph shows that most customers complained about:

Credit reporting, repair, or other

Debt collection


Text processing

The text needs to be preprocessed so that we can feed it to the classification algorithm. Here we will transform the texts into vectors using Term Frequency-Inverse Document Frequency (TF-IDF), which evaluates how important a particular word is within the collection of documents. For this, we remove punctuation and lowercase the text; word importance is then determined in terms of frequency.
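Before reaching for scikit-learn, it may help to see the TF-IDF idea on a toy corpus. This is a minimal sketch using the textbook formula (scikit-learn’s implementation adds smoothing and normalization, so its scores differ); the documents and words below are made up:

```python
import math

# Toy corpus: three tiny "complaints", already lowercased and tokenized
docs = [
    ["credit", "report", "error"],
    ["debt", "collection", "call"],
    ["credit", "card", "fee"],
]

def tf_idf(term, doc, corpus):
    # Term frequency: how often the term appears in this document
    tf = doc.count(term) / len(doc)
    # Inverse document frequency: terms that are rare across the corpus score higher
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df)
    return tf * idf

# "credit" appears in 2 of 3 docs, so it is down-weighted relative to "error"
print(round(tf_idf("credit", docs[0], docs), 4))  # 0.1352
print(round(tf_idf("error", docs[0], docs), 4))   # 0.3662
```

The same intuition carries over to the real vectorizer below: words shared by many complaints contribute less to distinguishing categories.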

We will be using TfidfVectorizer function with the below parameters:

min_df: ignore words that occur in fewer than ‘min_df’ documents.

sublinear_tf: if True, scale term frequency logarithmically.

stop_words: remove the stop words predefined for ‘english’.

tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5,
                        ngram_range=(1, 2),
                        stop_words='english')

# We transform each complaint into a vector
features = tfidf.fit_transform(df2.Consumer_complaint).toarray()
labels = df2.category_id
print("Each of the %d complaints is represented by %d features (TF-IDF score of unigrams and bigrams)" % (features.shape))

Now, we will find the most correlated terms for each of the defined product categories. Here we find only the three most correlated unigrams and bigrams per category.

# Finding the three most correlated terms with each of the product categories
N = 3
for Product, category_id in sorted(category_to_id.items()):
    features_chi2 = chi2(features, labels == category_id)
    indices = np.argsort(features_chi2[0])
    feature_names = np.array(tfidf.get_feature_names())[indices]
    unigrams = [v for v in feature_names if len(v.split(' ')) == 1]
    bigrams = [v for v in feature_names if len(v.split(' ')) == 2]
    print(" * Most Correlated Unigrams are: %s" % (', '.join(unigrams[-N:])))
    print(" * Most Correlated Bigrams are: %s" % (', '.join(bigrams[-N:])))

* Most Correlated Unigrams are: overdraft, bank, scottrade
* Most Correlated Bigrams are: citigold checking, debit card, checking account
* Most Correlated Unigrams are: checking, branch, overdraft
* Most Correlated Bigrams are: 00 bonus, overdraft fees, checking account
* Most Correlated Unigrams are: dealership, vehicle, car
* Most Correlated Bigrams are: car loan, vehicle loan, regional acceptance
* Most Correlated Unigrams are: express, citi, card
* Most Correlated Bigrams are: balance transfer, american express, credit card
* Most Correlated Unigrams are: report, experian, equifax
* Most Correlated Bigrams are: credit file, equifax xxxx, credit report
* Most Correlated Unigrams are: collect, collection, debt
* Most Correlated Bigrams are: debt collector, collect debt, collection agency
* Most Correlated Unigrams are: ethereum, bitcoin, coinbase
* Most Correlated Bigrams are: account coinbase, coinbase xxxx, coinbase account
* Most Correlated Unigrams are: paypal, moneygram, gram
* Most Correlated Bigrams are: sending money, western union, money gram
* Most Correlated Unigrams are: escrow, modification, mortgage
* Most Correlated Bigrams are: short sale, mortgage company, loan modification
* Most Correlated Unigrams are: meetings, productive, vast
* Most Correlated Bigrams are: insurance check, check payable, face face
* Most Correlated Unigrams are: astra, ace, payday
* Most Correlated Bigrams are: 00 loan, applied payday, payday loan
* Most Correlated Unigrams are: student, loans, navient
* Most Correlated Bigrams are: income based, student loan, student loans
* Most Correlated Unigrams are: honda, car, vehicle
* Most Correlated Bigrams are: used vehicle, total loss, honda financial

Exploring Multi-classification Models

The classification models which we are using:

Random Forest

Linear Support Vector Machine

Multinomial Naive Bayes

Logistic Regression.

For more information regarding each model, you can refer to their official guide.

Now, we will split the data into train and test sets. We will use 75% of the data for training and the rest for testing. The column ‘Consumer_complaint’ will be our X (the input), and ‘Product’ will be our y (the output).

X = df2['Consumer_complaint']  # Collection of documents
y = df2['Product']  # Target or the labels we want to predict (i.e., the 13 product categories)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

We will keep all the models in a list and loop through it, computing the mean accuracy and standard deviation for each so that we can compare their performance. Then we can decide which model to move forward with.

models = [
    RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0),
    LinearSVC(),
    MultinomialNB(),
    LogisticRegression(random_state=0),
]

# 5-fold cross-validation
CV = 5
cv_df = pd.DataFrame(index=range(CV * len(models)))
entries = []
for model in models:
    model_name = model.__class__.__name__
    accuracies = cross_val_score(model, features, labels, scoring='accuracy', cv=CV)
    for fold_idx, accuracy in enumerate(accuracies):
        entries.append((model_name, fold_idx, accuracy))
cv_df = pd.DataFrame(entries, columns=['model_name', 'fold_idx', 'accuracy'])

The above code will take some time to finish executing.

Compare Text Classification Model performance

Here, we will compare the ‘Mean Accuracy’ and ‘Standard Deviation’ for each of the four classification algorithms.

mean_accuracy = cv_df.groupby('model_name').accuracy.mean()
std_accuracy = cv_df.groupby('model_name').accuracy.std()
acc = pd.concat([mean_accuracy, std_accuracy], axis=1, ignore_index=True)
acc.columns = ['Mean Accuracy', 'Standard deviation']
acc

From the above table, we can clearly see that ‘Linear Support Vector Machine’ outperforms all the other classification algorithms. So, we will use LinearSVC to train the model for our multi-class text classification task.

plt.figure(figsize=(8,5))
sns.boxplot(x='model_name', y='accuracy', data=cv_df, color='lightblue', showmeans=True)
plt.title("MEAN ACCURACY (cv = 5)\n", size=14);

Evaluation of Text Classification Model

Now, let us train our model using ‘Linear Support Vector Machine’, so that we can evaluate and check its performance on unseen data.

X_train, X_test, y_train, y_test, indices_train, indices_test = train_test_split(features, labels, df2.index, test_size=0.25, random_state=1)
model = LinearSVC(), y_train)
y_pred = model.predict(X_test)

We will generate a classification report to get more insight into the model’s performance.

# Classification report
print('\t\t\t\tCLASSIFICATION METRICS\n')
print(metrics.classification_report(y_test, y_pred, target_names=df2['Product'].unique()))

From the above classification report, we can observe that the classes with a greater number of occurrences tend to have a better F1-score than the others. The categories that yield the best classification results are ‘Student loan’, ‘Mortgage’ and ‘Credit reporting, repair, or other’. Classes like ‘Debt collection’ and ‘Credit card or prepaid card’ also give good results. Now let us plot the confusion matrix to check the misclassified predictions.
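The F1-score in that report is the harmonic mean of precision and recall. A quick sketch with hypothetical counts for a single class (not numbers from the article's run):

```python
# Hypothetical counts for one class (e.g. 'Student loan'):
# true positives, false positives, false negatives
tp, fp, fn = 80, 10, 20

precision = tp / (tp + fp)  # of everything predicted as this class, how much was right
recall = tp / (tp + fn)     # of all true instances of this class, how many were found
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.842
```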

conf_mat = confusion_matrix(y_test, y_pred)
fig, ax = plt.subplots(figsize=(8,8))
sns.heatmap(conf_mat, annot=True, cmap="Blues", fmt='d',
            xticklabels=category_id_df.Product.values,
            yticklabels=category_id_df.Product.values)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.title("CONFUSION MATRIX - LinearSVC\n", size=16);

From the above confusion matrix, we can say that the model is doing a pretty decent job. It has classified most of the categories accurately.
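A confusion matrix is just a tally of (actual, predicted) pairs, with correct predictions on the diagonal. A minimal sketch using made-up labels:

```python
from collections import Counter

y_true = ["Mortgage", "Debt collection", "Mortgage", "Student loan", "Mortgage"]
y_pred = ["Mortgage", "Mortgage", "Mortgage", "Student loan", "Debt collection"]

# Tally (actual, predicted) pairs; entries with actual == predicted
# are the diagonal of the confusion matrix
conf = Counter(zip(y_true, y_pred))
correct = sum(n for (actual, predicted), n in conf.items() if actual == predicted)
print(correct / len(y_true))  # 0.6
```

Off-diagonal cells, like the ("Debt collection", "Mortgage") pair above, are exactly the misclassifications the heatmap makes visible.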


Let us make some predictions on unseen data and check the model’s performance.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5, ngram_range=(1, 2), stop_words='english')
fitted_vectorizer =
tfidf_vectorizer_vectors = fitted_vectorizer.transform(X_train)
model = LinearSVC().fit(tfidf_vectorizer_vectors, y_train)
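For intuition on what the TF-IDF vectorizer computes: a term's weight grows with its frequency in a document and shrinks the more documents it appears in. A bare-bones sketch on a toy corpus, using plain natural-log IDF (scikit-learn's actual formula adds smoothing and normalization):

```python
import math

corpus = [
    "overdraft fees on my checking account",
    "student loan payment increased again",
    "checking account overdraft charge",
]

def tfidf(term, doc, corpus):
    words = doc.split()
    tf = words.count(term) / len(words)               # term frequency in this doc
    df = sum(1 for d in corpus if term in d.split())  # documents containing the term
    return tf * math.log(len(corpus) / df)            # tf * idf

# 'student' is rare (1 of 3 docs), so it gets a higher weight than
# 'checking', which appears in 2 of 3 docs
print(tfidf("student", corpus[1], corpus) > tfidf("checking", corpus[0], corpus))  # True
```

This is why distinctive terms like "navient" or "coinbase" end up as strong signals for their categories, while ubiquitous terms carry little weight.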

Now run the prediction.

complaint = """I have received over 27 emails from XXXX XXXX who is a representative from Midland Funding LLC. From XX/XX/XXXX I received approximately 6 emails. From XX/XX/XXXX I received approximately 6 emails. From XX/XX/XXXX I received approximately 9 emails. From XX/XX/XXXX I received approximately 6 emails. All emails came from the same individual, XXXX XXXX. It is becoming a nonstop issue of harassment."""
print(model.predict(fitted_vectorizer.transform([complaint])))

complaint = """Respected Sir/ Madam, I am exploring the possibilities for financing my daughter 's XXXX education with private loan from bank. I am in the XXXX on XXXX visa. My daughter is on XXXX dependent visa. As a result, she is considered as international student. I am waiting in the Green Card ( Permanent Residency ) line for last several years. I checked with Discover, XXXX XXXX websites. While they allow international students to apply for loan, they need cosigners who are either US citizens or Permanent Residents. I feel that this is unfair. I had been given mortgage and car loans in the past which I closed successfully. I have good financial history."""
print(model.predict(fitted_vectorizer.transform([complaint])))

complaint = """They make me look like if I was behind on my Mortgage on the month of XX/XX/2024 & XX/XX/XXXX when I was not and never was, when I was even giving extra money to the Principal. The Money Source Web site and the managers started a problem, when my wife was trying to increase the payment, so more money went to the Principal and two payments came out that month and because I reverse one of them thru my Bank as Fraud they took revenge and committed slander against me by reporting me late at the Credit Bureaus, for 45 and 60 days, when it was not thru. Told them to correct that and the accounting department or the company revert that letter from going to the Credit Bureaus to correct their injustice.
The manager by the name XXXX requested this for the second time and nothing yet. I am a Senior of XXXX years old and a Retired XXXX Veteran and is a disgraced that Americans treat us that way and do not want to admit their injustice and lies to the Credit Bureau."""
print(model.predict(fitted_vectorizer.transform([complaint])))

The model is not perfect, yet it is performing very well.

The notebook is available here.


We have implemented a basic multi-class text classification model. You can play with other models like XGBoost, or compare the performance of multiple models on this dataset using a machine learning framework such as AutoML. This is not the end, though: there are still complex problems associated with multi-class text classification tasks, and you can always explore further and acquire new concepts and ideas about this topic. That’s it!

Thank you!

All images are created by the author.

My LinkedIn

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.


How To Set Up Google Analytics For Shopify

If you are interested in scaling your Shopify business, you will need to make adjustments based on data analytics and not assumptions.

Implementing Google Analytics will give you the information you need to make store optimizations that increase your search visibility, conversions, and revenue.

Implementing Google Analytics for your Shopify store may seem daunting but in this guide, you’ll find answers to these common questions and challenges:

What is Google Analytics?

How to Implement the Google Analytics Code on Your Shopify Site.

What to Track with Google Analytics.

Key Metrics to Know in Google Analytics.

What Is Google Analytics?

Google Analytics is a free web analytics platform that you can implement throughout your website to track a variety of metrics.

With Google Analytics properly implemented, you will be able to determine whether your campaigns were effective. You can see how many users interacted with your campaign, how many converted, and the total revenue generated through your efforts.

This essential data analytics platform will also enable you to see all of your marketing funnels in one place.

Why Do I Need Google Analytics If I Already Have Shopify Analytics?

You may be thinking your Shopify Analytics platform offers all of the information you need to make an educated business decision.

But while this provides a general summary of how your store is performing, it does not give you all the information you need to make the best-informed business decisions.

Your Shopify Analytics dashboard can show you total sales, average order value, and conversion rate. It also has pre-built reports that require very little setup on your part.

This analytics source is convenient, but it does have some pretty impactful drawbacks.

For example, if you are looking to compare different traffic sources or what types of devices are commonly used when making a purchase, you will not be able to see this breakdown unless you use Google Analytics.

You can also use Google Analytics to identify how users engage with your site. If you discover your site is primarily visited through a mobile device, your team can begin to optimize your store’s UX experience for those end users.

Google Analytics is a little more difficult to use, but it is the best tool to give a robust overview of how customers interact with your online store.

You can gather information needed to segment your customers into more effective marketing funnels, and create marketing campaigns that scale with your business.

All of this information is free with Google Analytics. If you want to access more reports with Shopify, you will need to upgrade to a more costly plan.

How To Implement The Google Analytics Code On Your Shopify Site

To help make this as simple as possible, we will break down the step-by-step process of implementing the Google Analytics code on your Shopify site.

1. Create A Google Account For Your Business

You may already have a Gmail account associated with your Shopify site; no worries, you can use it to access Google Analytics. If you do not have a pre-existing account, you will need to set up a free account first.

It is not recommended to use your personal Gmail for your online store. Keeping them separate will help you manage access to your business information.

2. Create An Analytics Account

Google has two different types of Analytics platforms, Universal Analytics and Google Analytics 4.

Universal Analytics is commonly referred to as the “old” Analytics.

Google Analytics 4 is the latest version, and it gives stores more detailed data analytics and cross-device measurement capabilities. Currently, Shopify does not support Google Analytics 4.

Until a change has been made, you will need to create a Universal Analytics account.

You can follow this guide to help you create your Universal Google Analytics account.

3. Enable Google Analytics

Your Shopify site might already be enabled. Before you try to enable your code, double-check that it is not already enabled.

If you see a number code that begins with UA, your Google Analytics account is activated.

If you find one of these tags, it means your Analytics code has been enabled on your website.

If you have not enabled Google Analytics on your Shopify site, follow the steps below.

How To Enable Google Analytics

Within Google Analytics

Login to your Google Analytics account using the Gmail account you created from previous steps.

Create your account and select your data sharing preferences.

Choose a Property name, time zone, and currency.

Select Advanced Options.

Create a Universal Analytics Property – Google Analytics 4 is selected by default but is not able to be used on Shopify. Make sure you select Universal Analytics to proceed.

Once you proceed, in the “About Your Business” section, you will need to select the appropriate settings for your store.

Select “Create.”

Accept Terms and Service Agreement.

Copy your Tracking ID – the number will start with UA.

Within Shopify

Open your Shopify admin account.

Go to Online Store – Preferences

Next to Google Analytics, enter the Tracking ID you copied from our previous step.

Remember to remove password protection from your Shopify store. This will make sure Google Analytics will show data.

You did it! Google Analytics is now enabled on your Shopify site.

How To Enable Ecommerce Tracking Codes

Enabling ecommerce tracking codes gives you more insight into your customer’s user experience.

You can use two different options of ecommerce tracking codes.

Basic Ecommerce Tracking

Track transaction and revenue data using a confirmation landing page.

How to enable basic ecommerce tracking in GA:

Open Google Analytics

Turn on the Enable Ecommerce button

Enhanced Ecommerce Tracking

Track every page a user visits before they make a purchase. This includes tracking a user as they peruse different product pages, tracking what is added to their cart, and whether or not they make a purchase or a return.

How to enable enhanced ecommerce tracking:

Open Shopify admin page.

Online Store → Preferences.

Open Google Analytics.

Turn on the Enhanced Ecommerce Tracking Button.

What To Track With Google Analytics

Focusing on traffic is helpful, but it won’t give you enough info to make long-term meaningful optimizations for your business.

Making Google Analytics a beneficial tool requires that you ask specific questions and compare analytics over different periods of time.

Here are some examples of the questions you should focus on answering using Google Analytics.

Who Are Your Best Customers?

Use Google Analytics as a tool to help you build buyer profiles that can influence how you manage your marketing campaigns to help maximize conversions.

Analyze Demographics, Interest, And Geographic Data

Access demographics by navigating to Audience → Demographics → Overview. This dashboard will help you identify what age group and gender of people most commonly convert on your website.

Likewise, if you notice a geographic region that does not convert, you can limit your spend in those areas.

How Do People Interact With Your Shopify Site?

You may have an ideal image in your head about how your users navigate and interact with your site. However, it’s entirely possible that your end users are not using your site the same way you imagined.

You can find a general overview of behavioral information by going to Behavior → Overview.

From here, you can drill down into site content and landing pages and see whether you have outstanding product pages or some that may require a bit of work.

Key Metrics To Know In Google Analytics

Sessions: Interactions with your Shopify store during a specific period of time.

Page Views: The total number of page views on a certain page. Repeated page views are counted. So if you look at one product, navigate to another product for comparison and decide to come back to the first product page, it would count as three different page views.

Unique Page Views: The number of sessions during which the specific page was viewed at least once. Several visits to your product page A during the same session would count as one unique page view.

Average Time on Page: How long an end-user will stay on a webpage.

Revenue: Total revenue generated through the website.

Transactions: Number of purchases made.

Conversion Rate: The percentage of sessions that ended with a transaction.
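These metrics relate to one another with simple arithmetic. A quick sketch using made-up numbers (not from any real store):

```python
sessions = 1200     # visits to the store in the reporting period
transactions = 42   # sessions that ended in a purchase
revenue = 3150.00   # total revenue for the period

conversion_rate = transactions / sessions * 100   # as a percentage
average_order_value = revenue / transactions
print(f"Conversion rate: {conversion_rate:.2f}%")
print(f"Average order value: ${average_order_value:.2f}")
```

Comparing these derived numbers across time periods or traffic segments is where the questions above start producing actionable answers.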

Now that you’ve got everything set up and have a general understanding of the key metrics in GA, here’s a great guide on everything you need to know about Google Analytics for Shopify to help you get the most out of it.

This resource will help you truly understand the data in your account and use it to make more informed decisions to help you scale.

More resources:

Featured image: Kaspars Grinvalds/Shutterstock

5 Presentations CEOs Must Prepare

Being a CEO means wearing many hats ― particularly when communicating your venture’s vision and future plans. While your core ideas and values will remain consistent, you must be prepared to communicate with various audiences, including investors, customers, employees and peers. Tailoring your messages to your audiences’ diverse interests and needs is crucial.

We’ll highlight five presentations every CEO should craft, practice and have ready to go when the moment calls. While your industry and the unique aspects of your organization will inform your presentations, all leaders can benefit from having these pitches and presentations at the ready.

Did You Know?

According to Decktopus, professionals prefer presentations to be 10 to 15 minutes long, have more visuals than text and be interactive. However, ensuring accurate data visualizations is critical.

5 presentations CEOs must have ready

While industry trends and unique situations will dictate a CEO’s specific presentation needs, all leaders need the following five presentations. Be sure to practice and hone your presentations so they’re polished and professional ― you don’t want your business to go viral for the wrong reasons.

1. Investor pitch

Presenting your idea to investors is a delicate balance of sales, partnership, enthusiasm and vision. It’s essential to get straight to the point and convince investors to put their money and trust in your hands. If a CEO can’t make people care about their business in less than five minutes, they must return to the drawing board.

Angel investors and venture capitalists look for signs that a company is worth their time and money. Revenue numbers, customer data, a solid marketing plan and positive business trends are critical. However, don’t rely on the numbers to do all the talking: Investors want entrepreneurs they can trust.

Your investor pitch should weave together data with an engaging story that establishes that you’re an excellent, trustworthy leader with a capable team.


Finding and attracting investors ― and keeping their attention ― can be challenging. According to DocSend, in 2023, investors spent about three and a half minutes reviewing pitch decks. In 2023, it was less than three minutes.


Look to your company’s mission statement to inspire your all-hands vision pitch. Avoid abstract platitudes like “We make the world better.” Instead, share stories about how your business lives its mission and impacts the world.

4. Press-worthy announcement speech

Apple has perfected the show-stopping press event. When many people think of Steve Jobs, they often recall his dynamic and unique presentations at Apple’s major product launch events.

Jobs elicited emotional responses from his audiences. For example, when announcing the iPod in 2001, he didn’t describe it as just a beautiful MP3 player. He equated it with the freedom to listen to any kind of music, anywhere. 

Investor pitches and all-hands vision pitches must help an audience see beyond basic facts and figures to something more meaningful. Similarly, press-worthy announcements should drive at something much deeper than technical specs. If you want people to care about what you’re announcing, help them understand the implications of your news in emotional terms.

5. Thought leadership keynote

Industry events are opportunities for CEOs to increase brand awareness and reach a broad audience of potential customers. But when a CEO is asked to give a keynote at an event, organizers don’t want a company pitch. 

Instead, CEOs should cultivate a platform of ideas beyond their products or services to call upon for keynote presentations. They can use these thought leadership tenets to prepare a presentation that is meaningful to the audience while effortlessly raising their organization’s esteem and profile. 


Incorporate your brand’s story into your keynote. Focus on your experiences, challenges and obstacles to engage your audience.

Did You Know?

Google Slides is part of Google Drive, which is one of the most effective internal collaboration tools a business can use.


Visme is an all-in-one presentation-creation platform with easy customization features. Visme is more robust than PowerPoint but just as easy to use. It’s great for teams that want a professional look. You can use its free version with limited features or upgrade to its paid tiers for more functionality. 

Here’s some information on Visme:

Templates: Visme provides thousands of templates and millions of graphics, audio and video options for a wide range of industries.

Price: Visme’s free plan has limited features. Paid tiers cost $15 and $29 and provide additional options and functionality.

Compatibility: With Visme, you show and edit presentations in a web browser.


Prezi’s presentation software allows you to create nonlinear stories. It can manage slides in groups so you can quickly bounce from one topic to the next while keeping your audience engaged. Prezi also allows you to share presentation links to collaborate with co-workers.

Here’s what you should know about Prezi:

Templates: Prezi offers over 200 templates catering to various industries.

Price: Prezi has a free plan and paid tiers. The Plus plan is $15 per month and the Premium tier is $19 per month.

Compatibility: You can use Prezi on Windows PCs and Macs (offline access). You can also access it on iOS and Android devices.

The chúng tôi software uses artificial intelligence to upgrade the standard human presentation. The platform is excellent if you are short on time. Import all your information and chúng tôi will organize your slides to be clear, concise and aesthetically appealing.

Here’s some basic info on

Templates: chúng tôi provides 62 templates that focus on use cases instead of industry.

Price: chúng tôi costs $12 per month for unlimited slides and $40 per month for sharing capabilities and custom templates.

Compatibility: With chúng tôi you show and edit presentations on a web browser.

Peter Arvai contributed to this article.
