A Paranoid Buyer's Guide To Shopping Online
The Internet can be a very intimidating place, with many people using the anonymity it provides to do nefarious things. Since its inception, millions of people have fallen victim to scams and hackers that have stolen their identities and made purchases in their name.
What Makes Safety So Challenging
Banks and retailers deploy encryption, fraud detection, and other safeguards to protect shoppers. However, hackers are always trying to stay one step ahead of these methods and sometimes even succeed in stealing customer information from companies, making it difficult to make the Internet a truly safe place for online purchases.
It’s also probably worth mentioning the fact that hundreds of millions of people around the world have their credit and debit card data somewhere on the Web. In America alone, this total reaches 94 million, which is a bit under a third of the entire country’s population.
Looking for HTTPS Isn't Enough
While it's imperative to look for "https://" at the start of the URL in your address bar to ensure that your data is encrypted while you make a transaction, it's not enough to tell whether you are being scammed. To get the certificate necessary to use HTTPS on a website, you only need to prove that you own the domain, not that you're a legitimate business.
While it may be safe to make a purchase online from a retailer you know with absolute certainty is legitimate, unknown retailers can still scam you and use an encryption (HTTPS) certificate on their site. The authority that gave them the certification will often try to combat this, but you may still fall victim to scams regardless.
Diversify Your Credentials
The problem with credit and debit cards on the Internet is that each is just one number, and that number is the sole thing standing between any entity and your bank account. Once it is revealed, every penny you have at the bank is vulnerable and fair game to anyone.
PayPal is similar in that you have one account tied to all your money. But there’s one crucial difference here: changing your PayPal password is easy, but doing the same to your debit card number is a process that requires interacting with your bank. It could get complicated rather quickly.
Instead of giving out your card details to every online retailer, it is better to use a "throwaway" number that you can invalidate at your whim. Startups like Privacy offer services like this, and Visa has also rolled out a token service that does something similar.
Retailers Don't Need a Lot of Info About You
There are only two things an online store needs to complete a purchase: a way to send you the product and a way to receive your payment. That means your name, your address, your phone number (in case they need to contact you about the delivery), and your card credentials. Any other information they ask for is superfluous, and you should never give it away.
So things like your passport number, your ID number, your SSN, and any other identifying information should never be in the hands of a simple retailer. This is reserved only for government institutions, banks, and other entities that actually require this data to ensure that you’re not an identity thief. Assume the worst if some Amazon wannabe asks you for this information.
Other Things You Should Avoid
When parting with your money, always make sure that the transaction is as private as possible. Avoid making purchases in public, on a public computer, or over any sort of unencrypted Wi-Fi. Yes, that means that if you make a purchase from home over an unencrypted Wi-Fi connection, you might as well be doing it at an airport. The idea here is to lock everything down as much as you possibly can.
Miguel Leiva-Gomez
Miguel has been a business growth and technology expert for more than a decade and has written software for even longer. From his little castle in Romania, he presents cold and analytical perspectives to things that affect the tech world.
Graphics Card Buyer's Guide 2023: What To Look For When Buying A GPU
If you're looking to build or buy a powerful gaming rig, you will need to pay close attention to the graphics card. Buying a GPU can be challenging, as there is much to consider, from the type of monitor to the size of your chassis. It doesn't need to be that tough, however: if you know your budget, PC requirements, and performance goals, you're close to finding the perfect graphics card for your needs. Our graphics card buying guide will help you discover what to look for when buying a GPU.
Good to know: after selecting the correct graphics card, read through these other considerations for building a gaming PC.
Where's the Value: NVIDIA vs. AMD
Today, the GPU market is saturated with dozens of viable graphics card options from various AIB (Add-in-Board) companies, but only three companies make the GPU chips that power these cards: NVIDIA, AMD, and Intel. While Intel has managed to release some offerings in the low- and mid-range segments, it still has a long way to go before becoming a serious contender to Team Green and Team Red. The choice comes down to NVIDIA vs. AMD.
Despite some improvements, AMD’s ray tracing and upscaling tech is still lagging behind NVIDIA’s. For gamers who are okay letting go of ray tracing-enhanced visuals in exchange for retaining more performance, AMD is the better choice, as it offers more value for the money at current retail prices, strictly in terms of rasterized performance.
AMD's cards also offer significantly more VRAM than NVIDIA's. Modern games are increasingly VRAM-dependent, and AMD's cards are likely to fare better in the long run compared to Team Green's.
Performance at Various Budgets
Close on the heels of the cryptocurrency boom and the global silicon shortage, graphics card manufacturers introduced steep price hikes as consumer demand broke all records. The traditional definitions of "budget" and "mid-range" no longer apply in the current market and are unlikely to ever return.
Yet, you can still get yourself decent gaming performance if you temper expectations and are willing to stretch your budget a little. This GPU performance guide classifies current-gen and various previous-gen graphics cards according to price and performance.
Tip: if you are having trouble with your graphics card, you can update the AMD graphics drivers in Windows.
What to Look for When Buying a Graphics Card
Choosing the best graphics card based on your budget is only half the story. You also need to consider other factors, like compatibility with your PC, GPU VRAM, potential bottlenecks, and TDP. These factors can help you make a better buying decision and help you make the most of your investment. They're also essential to potential upgrade decisions that you may make to fully utilize your new graphics card's performance.
1. Compatibility
Nothing can be more frustrating than unboxing your shiny new graphics card only to realize it's an inch too big for your PC case. Before you drop the big bucks, do your homework and discover how much physical space your case can offer. Compare your case's GPU clearance value against the dimensions of the graphics card you're planning to buy. If your case isn't compatible with the graphics card, consider buying a new PC case.
Take note of your power supply as well. How many amps can it supply on the 12v rails? How many watts is it rated for, and how many six- and eight-pin PCIe connectors does it have? Cross-reference this information with the graphics card you want to buy. If your computer isn’t equipped to handle your new graphics card, you’ll want to look for a graphics card requiring less power or consider a PSU upgrade.
Lastly, check the ports. Some monitors use DisplayPort, others have HDMI, and some older units only use DVI. Ensure the card you want to buy has at least one connector matching your monitor. Ending up with a card whose ports don't match your monitor's is rare, but if it happens, you may have to buy an adapter at extra cost.
Also helpful: if you want to get more performance out of your graphics card, learn how to safely overclock your GPU.
2. Bottlenecks
Your system dictates the kind of graphics card you should buy. Knowing your system's limitations can save you money and headaches. For example, if you're running an older four-core or dual-core CPU, it is likely to hold back your high-end graphics card, forcing you to leave performance on the table. In this situation, you can opt for a mid-range card to prolong the life of your CPU or consider a CPU upgrade if your budget permits.
Your display is also an essential factor to consider. With 1440p (2560 x 1440) monitors increasingly becoming mainstream, your graphics card should be able to drive a higher pixel count for smoother gaming performance. Likewise, if you intend to run multiple monitors or an ultrawide monitor, you need to factor in the extra pixels and opt for a graphics card that will support that.
3. Memory and Bandwidth
Although more VRAM on your graphics card doesn't guarantee more performance, it's fast becoming a crucial factor as games become more demanding, especially at higher resolutions like 4K. Many recent high-end graphics cards ship with questionable amounts of VRAM, such as the RTX 3070 Ti, which shipped with only 8GB. Even though the card is still powerful, its longevity is hampered by the relatively small VRAM buffer.
Memory bandwidth is just as important as the amount of memory on your graphics card. Data waiting to be processed by the GPU is stored in the card's dedicated memory, which can be GDDR3, GDDR5, or (more recently) GDDR6 or GDDR6X. Note that GDDR6 memory provides twice the bandwidth of GDDR5 clocked at the same rate.
Most recent-gen graphics cards ship with GDDR6 memory, with some getting even faster GDDR6X memory. Since memory bandwidth is vital for performance, prefer the faster memory where your budget allows.
Good to know: we’ve rounded up the best AMD motherboards for gaming.
4. CUDA Cores (NVIDIA) or Stream Processors (AMD)
CUDA cores or Stream processors can be a rough guide for comparing gaming performance across GPUs of the same family. CUDA (Compute Unified Device Architecture) is NVIDIA's proprietary parallel computing platform, which lets software use the GPU to run many computations in parallel. A CUDA core is NVIDIA's rough equivalent of an AMD stream processor.
Be careful when comparing CUDA cores or Stream processors across different GPU generations, as architectural improvements generally overcome any CUDA core deficiency at comparable performance tiers. For instance, the latest RTX 4080 has 9728 CUDA cores versus 10496 CUDA cores on the previous-gen RTX 3090. But the RTX 4080 is around 20 percent faster than NVIDIA’s previous-gen premium offering.
5. TDP Value
The power consumption of the latest RTX 4000 series saw a considerable jump from the previous generation cards, with the flagship RTX 4090 requiring at least an 850W power supply. AMD’s RX 7900 XTX requires at least an 800W power supply to support the flagship GPU’s performance adequately.
Tip: did you know that you can also overclock your RAM? This can also help with gaming performance.
6. G-SYNC or FreeSync?
G-SYNC and FreeSync are adaptive synchronization technologies developed by NVIDIA and AMD, respectively. Buying a monitor that supports one of these features helps match your monitor's refresh rate to the FPS generated by your graphics card, reducing issues like screen tearing and input latency.
FreeSync is an open standard available on far more monitors than NVIDIA's G-SYNC. You don't always need to spend more on a display with dedicated G-SYNC hardware; you can instead opt for displays certified as "G-SYNC Compatible" or get a FreeSync display. Buying a FreeSync or FreeSync Premium monitor helps cut costs without sacrificing performance, as both NVIDIA and AMD cards can work with FreeSync, much like FSR is available for both NVIDIA and AMD GPUs.
Good to know: if you’re using an NVIDIA card, you’ll want to learn how NVIDIA’s GeForce software suite works.
Frequently Asked Questions
What kind of GPU do I need for video editing?
While video editing is a demanding workload that benefits from more powerful GPUs, like the 80- or 90-series cards from NVIDIA or comparable AMD offerings, a mid-range card will be sufficient for most users. Even when editing 4K video, a card like the RTX 3060 Ti can power your workload. The performance you'd gain by moving to a card like the RTX 3090 or RTX 4080 will not justify the associated price increase. It's worth noting that NVIDIA GPUs have traditionally offered better support for video editing, 3D modeling, and rendering software than AMD's cards.
Which is better for streaming: NVIDIA or AMD?
NVIDIA cards have consistently performed better at streaming than AMD cards, thanks to their superior NVENC encoder and a suite of features targeted at streamers. But AMD has now closed the gap to a large extent with its updated AMF encoder and AMD Noise Suppression, rivaling NVIDIA's NVENC and RTX Voice. AMD's graphics cards can now stream video at the same quality and bitrate as NVIDIA's. Once AMD's tech sees wider industry adoption, the gap between the two GPU manufacturers for streaming will become negligible.
Are NVIDIA drivers more stable than AMD drivers?
This used to be the case up until AMD's RX 5000 series, but comparing the RX 6000 and RX 7000 series drivers with NVIDIA's, both brands have faced driver issues, and it's impossible to declare one better than the other. AMD and NVIDIA both do a decent job of addressing driver issues, and while not ideal, both face occasional problems that sometimes lead to severe game-breaking and system-crashing errors.
Tanveer Singh
Tanveer hunts far and wide for PC Hardware, Windows, and Gaming ideas to write about. An MBA in Marketing and the owner of a PC building business, he has written extensively on Technology, Gaming, and Marketing. When not scouring the web, he can be found binging on The Office, running for his life in GTFO, or wrecking karts in Smash Karts.
A Guide To Monte Carlo Simulation!
This article was published as a part of the Data Science Blogathon
Introduction
Monte Carlo Simulation uses repeated random sampling to estimate the likely range of an unknown quantity. Sounds difficult? Don't worry, we will explore this in depth in this article.
A Brief History
The Monte Carlo method was invented by John von Neumann and Stanislaw Ulam to improve decision-making under uncertain conditions. It was named after Monte Carlo, a well-known casino town in Monaco, since the element of chance is core to the modelling approach, much as it is in a game of roulette.
In easy words, Monte Carlo Simulation is a method of estimating the value of an unknown quantity with the help of inferential statistics. You need not dive deep into inferential statistics to get a strong grasp of how Monte Carlo simulation works; this article will cover only the points of inferential statistics relevant to Monte Carlo simulation.
Inferential statistics deals with a population, which is our full set of examples, and a sample, which is a proper subset of the population. The key point to notice is that a random sample tends to exhibit the same characteristics as the population from which it is drawn.
What is Monte Carlo Simulation in Python?
Monte Carlo simulation is a computational technique used to model and analyze complex systems or processes through random sampling. It is named after the famous Monte Carlo casino in Monaco, as the simulation relies on generating random numbers.
In Python, Monte Carlo simulation can be implemented using various libraries such as NumPy and random. The basic steps involved in performing a Monte Carlo simulation are as follows:
Define the problem: Clearly state the problem you want to model or analyze using Monte Carlo simulation. This could involve anything from estimating probabilities to evaluating financial risks.
Set up the model: Create a mathematical or computational model that represents the system or process under consideration. This model should include all relevant variables, inputs, and assumptions.
Generate random inputs: Identify the input variables in your model that exhibit uncertainty or randomness. Randomly sample values for these variables according to their probability distributions. This is often done using Python’s random or NumPy’s random functions.
Run simulations: Execute the model multiple times using the randomly generated inputs. Each run of the model is called an iteration. Record the output or results of interest for each iteration.
Analyze the results: With the recorded outputs from the simulations, analyze and summarize the data. This may involve calculating summary statistics, estimating probabilities, or constructing confidence intervals.
Draw conclusions: Based on the analysis of the simulation results, draw conclusions about the behavior, performance, or characteristics of the system or process being modeled. These conclusions can help make informed decisions or gain insights into the problem.
Monte Carlo simulation is a powerful tool that can handle complex problems where analytical or deterministic solutions are difficult or impossible to obtain. It allows for the exploration of a wide range of scenarios and provides a probabilistic understanding of the system under study. Python provides a convenient environment to implement Monte Carlo simulations due to its versatility and the availability of libraries that facilitate random number generation and numerical computations.
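To make these steps concrete, here is a minimal sketch that follows them to estimate the value of π by random sampling; the sample count and seed are illustrative choices, not from the original article:

import numpy as np

rng = np.random.default_rng(seed=42)   # Step 3: a reproducible source of randomness
n = 1_000_000                          # number of iterations (arbitrary choice)

# Steps 2-3: model a unit square and sample random (x, y) points uniformly
x = rng.random(n)
y = rng.random(n)

# Step 4: each iteration checks whether the point lands inside the quarter circle
inside = (x ** 2 + y ** 2) <= 1.0

# Steps 5-6: the fraction of hits estimates the area ratio pi/4
print("pi is approximately", 4 * inside.mean())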
We will go through an example to understand the working of Monte Carlo simulation. We aim to estimate how likely it is to get a head if we flip a coin an infinite number of times.
1. Let's say we flip it once and get a head. Would we be confident saying that the probability of heads is 1?
2. Now we flip the coin again, and it again comes up heads. Are we sure that the next flip will also be a head?
3. We flip it over and over again, say 100 times, and strangely a head appears every time. Do we now have to accept that the next flip will result in another head?
4. Let us change the scenario and assume that out of 100 flips, 52 resulted in heads while the remaining 48 came up tails. Is the probability of the next flip resulting in a head 52/100? Given the observations, it's our best estimate, but our confidence in it will still be low.
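A quick sketch of this coin-flip experiment (the sample sizes are arbitrary illustrative choices) shows how the estimate behaves as the number of flips grows:

import random

random.seed(1)

def estimate_head_probability(n_flips):
    # Simulate n_flips fair coin flips and return the observed fraction of heads
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

for n in (2, 100, 10_000, 1_000_000):
    print(n, estimate_head_probability(n))
# Tiny samples can wander far from 0.5; larger samples settle close to it.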
Why Is There a Difference in Confidence Level?
It is important to know that our estimate depends upon two things:
1. Size: the size of the sample (e.g., 100 flips vs. 2 flips in cases 4 and 2, respectively).
2. Variance: as the variance of the observations grows (cases 3 and 4), larger samples are needed to reach the same degree of confidence.
Law of Large Numbers
In repeated independent trials with the same probability p of a particular outcome in each trial, the fraction of trials in which that outcome occurs converges to p as the number of trials goes to infinity.
Note that this does not mean that when deviations from the expected behaviour (probability p) occur, they will be "evened out" by opposite deviations in the future; believing so is known as the gambler's fallacy.
This brings us to an interesting incident that took place on August 18, 1913, at a casino in Monte Carlo. In roulette, black came up a record twenty-six times in succession, and there arose a panic to bet on red (so as to "even out" the deviation from expected behaviour).
Let's analyze this situation mathematically:
1. Probability of 26 consecutive blacks = (1/2)^26 = 1/67,108,864
2. Probability of black on the 26th spin, given that the previous 25 spins were black = 1/2
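Both figures are easy to verify: each fair even-money spin has probability 1/2, so 26 in a row has probability (1/2)^26, while the next spin on its own is still 1/2 regardless of history. A one-line check:

# 26 consecutive same-color results on independent fair spins
print((1 / 2) ** 26, 2 ** 26)   # ~1.49e-08, i.e. 1 in 67,108,864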
Regression to the Mean
1. Following an extreme random event, the next random event is likely to be less extreme, so that the mean is maintained.
2. E.g., if the roulette wheel is spun 10 times and red comes up every time, that is an extreme event (probability 1/1024), and it is likely that in the next 10 spins we will get fewer than 10 reds; the expected number is only 5.
So, if we look at the mean of all 20 spins, it will be closer to the expected mean of 50% reds than to the 100% observed in the first 10 spins.
Now time to face some reality.
Sampling Space of Possible Outcomes
It is not possible to guarantee perfect accuracy through sampling, but that does not mean an estimate cannot be very close to correct.
This raises a question: how many samples do we need to look at before we can have significant confidence in our answer?
It depends upon the variability of the underlying distribution.
Confidence Levels and Confidence Intervals
Since in real-life situations we cannot be sure of any unknown parameter estimated from a sample, we make use of confidence levels and confidence intervals.
A confidence interval provides a range in which the unknown value is likely to be contained, together with the confidence that the value actually lies within that range.
For example, the return for betting on a slot 1,000 times in roulette is -3% with a margin of error of +/- 4% at a 95% level of confidence.
This can be decoded as follows: if we conducted an infinite number of trials of 1,000 bets each,
the expected average return would be -3%, and
the return would roughly vary between +1% and -7% about 95% of the time.
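As a sketch of how such figures can be produced by simulation, the code below assumes a European wheel (37 slots) and a one-unit bet on a single slot per spin; these are illustrative assumptions, and the exact mean and margin printed depend on them, so they need not match the quoted example precisely:

import numpy as np

rng = np.random.default_rng(0)
sessions = 100_000   # simulated sessions of 1,000 bets each (illustrative)
spins = 1_000

# A single-slot bet wins with probability 1/37 and pays 35:1; a loss costs the stake
wins = rng.binomial(spins, 1 / 37, size=sessions)
session_returns = (35 * wins - (spins - wins)) / spins

mean = session_returns.mean()
margin = 1.96 * session_returns.std()   # ~95% of sessions fall within mean +/- margin
print(f"mean return {mean:+.2%}, 95% of sessions within +/- {margin:.2%}")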
Probability Density Function (PDF)
A distribution is usually described by its probability density function (PDF), which captures how likely the random variable is to lie within a given interval.
The area under the PDF curve between two points is the probability of the random variable falling within that range.
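For example, for a standard normal distribution, the probability of the variable falling between -1 and 1 is the area under the PDF over that interval, roughly 0.68. A quick check (assuming SciPy is available):

from scipy.stats import norm

# P(-1 <= X <= 1) = area under the standard normal PDF between -1 and 1
print(norm.cdf(1) - norm.cdf(-1))   # ~0.6827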
Let's Conclude Our Learning with an Example
Let's say there is a shuffled deck of cards, and we need to find the probability of getting 2 consecutive kings when the cards are laid down in the order they are placed.
Analytical method:
P (at least 2 consecutive kings) = 1-P (no consecutive kings)
= 1 - (49! × 48!) / (45! × 52!) ≈ 0.217376
By Monte Carlo Simulation:
Steps
1. Repeatedly select random data points: here we assume each shuffle of the deck is random.
2. Perform a deterministic computation: for each of a number of such shuffles, check the result.
3. Combine the results: aggregate the outcomes and draw our conclusion.
With the Monte Carlo method, we achieve a near-exact match to the analytical solution.
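A minimal sketch of that comparison, where the deck representation and trial count are illustrative choices:

import random
from math import comb

# Analytical: P(no two adjacent kings) = C(49, 4) / C(52, 4)
analytic = 1 - comb(49, 4) / comb(52, 4)

# Monte Carlo: shuffle repeatedly and count decks that show two adjacent kings
random.seed(7)
deck = ['K'] * 4 + ['x'] * 48          # 4 kings among 52 cards
trials = 200_000
hits = 0
for _ in range(trials):
    random.shuffle(deck)
    if any(deck[i] == 'K' and deck[i + 1] == 'K' for i in range(51)):
        hits += 1

print(f"analytic : {analytic:.6f}")        # 0.217376
print(f"simulated: {hits / trials:.6f}")   # close to the analytic value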
Advantages of Monte Carlo Simulation
It is easy to implement and provides statistical sampling for numerical experiments using the computer.
It provides satisfactory approximate solutions to computationally expensive mathematical problems.
It can be used for deterministic as well as stochastic problems.
Disadvantages of Monte Carlo Simulation
It is sometimes time-consuming, as a large number of samples must be generated to get a satisfactory output.
The results obtained from this method are only an approximation of the true solution, not the exact solution.
Dinesh Junjariya
I am Dinesh Junjariya, a B.Tech student from IIT Jodhpur.
A Complete Guide To Cubase Shortcuts
Introduction to Cubase Shortcuts
Shortcuts for Tools of the Cubase Tool Panel
Given below are the shortcuts for the tools of the Cubase tool panel:
Drumstick Tool (0): Press the 0 numeric key to activate the Drumstick tool.
Select Tool (1): Press the 1 numeric key to activate the Select tool.
Range Tool (2): Press the 2 numeric key to quickly access the Range tool.
Split Tool (3): Press the 3 numeric key to switch to the Split tool.
Glue Tool (4): Press the 4 numeric key to access the Glue tool.
Erase Tool (5): Press the 5 numeric key as the shortcut for the Erase tool.
Zoom Tool (6): Press the 6 numeric key as the shortcut for the Zoom tool.
Mute Tool (7): Press the 7 numeric key to activate the Mute tool.
Draw Tool (8): Press the 8 numeric key as the shortcut for the Draw tool.
Play Tool (9): Press the 9 numeric key as the shortcut for the Play tool.
Previous Tool/Next Tool (10): The 10 numeric key can be used to switch to the previous or next tool.
Shortcut Keys for Audio Settings
Adjust Fades to Range (A): Press the A key to adjust fades to the selected range.
Crossfade/Fade (X): Press the X key to apply a crossfade for editing purposes.
Direct Offline Processing (F7): Press the F7 function key to process your edits directly in offline mode.
Shortcut Keys for Automation
Given below are the shortcut keys for automation:
Toggle Read Automation for All Tracks On/Off (Alt + R): Press Alt + R to toggle read automation on or off for all tracks.
Toggle Write Automation for All Tracks On/Off (Alt + W): Press Alt + W to toggle write automation on or off for all tracks.
Automation Panel (F6): Press the F6 function key to open the automation panel.
Shortcut Keys for Device
Given below are the shortcut keys for device:
MixConsole Lower Zone (Alt + F3): Press Alt + F3 to open the lower zone of the MixConsole.
Mixer (F3): Press the F3 function key to activate the Mixer.
Virtual Keyboard (Alt + K): Press Alt + K to open the Virtual Keyboard.
VST Connections (F4): Press the F4 function key to open VST Connections.
VST Instruments (F11): As with VST Connections, there is a shortcut for VST Instruments: the F11 function key.
VST Performance (F12): The F12 function key activates the VST Performance panel.
Shortcut Keys for Cut, Copy, Paste, Undo and Redo
Cut (Ctrl + X): Press Ctrl + X to cut the selected element during the audio editing process.
Copy (Ctrl + C): As with the cut command, press Ctrl + C to copy the desired element.
Paste (Ctrl + V): Press Ctrl + V to paste a copied or cut element.
Shortcut Keys for Edit Commands
Given below are the shortcut keys for edit commands:
Activate or Deactivate Focused Object (Alt + A): Press Alt + A to toggle the focused object during editing; pressing it once activates the command, and pressing it again deactivates it.
Auto-Scroll On/Off (F): Press the F key to enable or disable the auto-scroll feature of this software.
Delete (Delete): To delete a selected element, simply press the Delete key.
Duplicate (Ctrl + D): To duplicate a desired element during editing, select it and press Ctrl + D.
Expand/Reduce (Alt + E): Press Alt + E to expand or reduce the layer length.
Insert Silence (Ctrl + Shift + E): Press Ctrl + Shift + E to insert silence at the selected area.
Invert (Alt + F): Press Alt + F for the Invert command.
Left Selection Side to Cursor (E): Press the E key to move the left side of the selection to the cursor.
Right Selection Side to Cursor (D): Press the D key to move the right side of the selection to the cursor.
Mute (M): Press the M key to mute the audio while working with it.
Mute/Unmute Objects (Alt + M): As with Mute, there is a command for muting or unmuting objects: press Alt + M.
Conclusion – Cubase Shortcuts
These were some of the important shortcut keys for the tools and commands of this software, and you can start using them while working on any project. These shortcuts will help you enhance your working skills and let you work more efficiently.
Recommended Articles
This is a guide to Cubase Shortcuts. Here we discussed the shortcuts for the tools of the Cubase tool panel, as well as shortcut keys for audio settings, devices, and edit commands.
A Guide To Building An End-to-End Multi-Class Text Classification Model
This article was published as a part of the Data Science Blogathon.
Knock! Knock!
Who’s there?
It’s Natural Language Processing!
Today we will implement a multi-class text classification model on an open-source dataset and explore more about the steps and procedure. Let’s begin.
Table of Contents
Dataset
Loading the data
Feature Engineering
Text processing
Exploring Multi-classification Models
Compare Model performance
Evaluation
Prediction
Dataset for Text Classification
The dataset consists of real-world complaints received from customers regarding financial products and services. Each complaint is labeled with a specific product. Hence, we can conclude that this is a supervised problem statement, where we have both the input and the target output. We will play with different machine learning algorithms and check which algorithm works best.
Our aim is to classify the complaints of the consumer into predefined categories using a suitable classification algorithm. For now, we will be using the following classification algorithms.
Linear Support Vector Machine (LinearSVM)
Random Forest
Multinomial Naive Bayes
Logistic Regression.
Loading the Data
Download the dataset from the link given in the above section. Since I am using Google Colab, if you want to use the same, you can use the Google Drive link given here and import the dataset from your Google Drive. The below code will mount the drive and unzip the data to the current working directory in Colab.
from google.colab import drive
drive.mount('/content/drive')
!unzip /content/drive/MyDrive/rows.csv.zip

First, we will install the required modules.
pip install numpy
pip install pandas
pip install seaborn
pip install scikit-learn
pip install scipy
Once everything is successfully installed, we will import the required libraries.
import os
import pandas as pd
import numpy as np
from scipy.stats import randint
import seaborn as sns  # used for plotting interactive graphs
import matplotlib.pyplot as plt
from io import StringIO
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2
from IPython.display import display
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn import metrics

Now let us load the dataset and check the shape of the loaded dataset.
# loading data
df = pd.read_csv('/content/rows.csv')
print(df.shape)

From the output of the above code, we can see that the dataset is very large and has 18 columns. Let us see what the data looks like. Execute the below code.
df.head(3).T

Now, for our multi-class text classification task, we will be using only two of these 18 columns: 'Product' and 'Consumer complaint narrative'. Let us create a new DataFrame to store only these two columns, and since we have enough rows, we will remove all the missing (NaN) values. To make it easier to work with, we will rename the second column of the new DataFrame to 'Consumer_complaint'.
# Create a new dataframe with two columns
df1 = df[['Product', 'Consumer complaint narrative']].copy()

# Remove missing values (NaN)
df1 = df1[pd.notnull(df1['Consumer complaint narrative'])]

# Renaming second column for a simpler name
df1.columns = ['Product', 'Consumer_complaint']

print(df1.shape)
df1.head(3).T

We can see that after discarding all the missing values, we have around 383k rows and 2 columns; this will be our data for training. Now let us check how many unique products there are.
pd.DataFrame(df1.Product.unique()).values

There are 18 categories of products. To make the training process easier, we will make some changes to the category names.
# Because the computation is time consuming (in terms of CPU), the data was sampled
df2 = df1.sample(10000, random_state=1).copy()

# Renaming categories
df2.replace({'Product':
    {'Credit reporting, credit repair services, or other personal consumer reports':
        'Credit reporting, repair, or other',
     'Credit reporting': 'Credit reporting, repair, or other',
     'Credit card': 'Credit card or prepaid card',
     'Prepaid card': 'Credit card or prepaid card',
     'Payday loan': 'Payday loan, title loan, or personal loan',
     'Money transfer': 'Money transfer, virtual currency, or money service',
     'Virtual currency': 'Money transfer, virtual currency, or money service'}},
    inplace=True)

pd.DataFrame(df2.Product.unique())

The 18 categories are now reduced to 13; we have combined 'Credit card' and 'Prepaid card' into a single class, and so on.
Now, we will map each of these categories to a number so that our model can understand them better, and we will save this in a new column named 'category_id', where each of the 13 categories is represented numerically.
# Create a new column 'category_id' with encoded categories
df2['category_id'] = df2['Product'].factorize()[0]
category_id_df = df2[['Product', 'category_id']].drop_duplicates()

# Dictionaries for future use
category_to_id = dict(category_id_df.values)
id_to_category = dict(category_id_df[['category_id', 'Product']].values)

# New dataframe
df2.head()

Let us visualize the data and see how many complaints there are per category. We will use a bar chart here.
fig = plt.figure(figsize=(8,6))
colors = ['grey','grey','grey','grey','grey','grey','grey','grey','grey',
          'grey','darkblue','darkblue','darkblue']
df2.groupby('Product').Consumer_complaint.count().sort_values().plot.barh(
    ylim=0, color=colors, title='NUMBER OF COMPLAINTS IN EACH PRODUCT CATEGORY\n')
plt.xlabel('Number of occurrences', fontsize=10);

The above graph shows that most of the customers complained regarding:
Credit reporting, repair, or other
Debt collection
Mortgage
Text Processing
The text needs to be preprocessed so that we can feed it to the classification algorithm. Here we will transform the texts into vectors using Term Frequency-Inverse Document Frequency (TF-IDF), which evaluates how important a particular word is in the collection of documents. For this we need to remove punctuation and lowercase the text; word importance is then determined in terms of frequency.
We will be using the TfidfVectorizer function with the below parameters:
min_df: ignore words that occur in fewer than 'min_df' documents.
sublinear_tf: if True, scale the term frequency on a logarithmic scale.
stop_words: remove the stop words predefined for 'english'.
tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5, ngram_range=(1, 2), stop_words='english')

# We transform each complaint into a vector
features = tfidf.fit_transform(df2.Consumer_complaint).toarray()
labels = df2.category_id

print("Each of the %d complaints is represented by %d features (TF-IDF score of unigrams and bigrams)" % (features.shape))

Now, we will find the most correlated terms with each of the defined product categories. Here we are finding only the three most correlated terms.
# Finding the three most correlated terms with each of the product categories
N = 3
for Product, category_id in sorted(category_to_id.items()):
    features_chi2 = chi2(features, labels == category_id)
    indices = np.argsort(features_chi2[0])
    feature_names = np.array(tfidf.get_feature_names())[indices]
    unigrams = [v for v in feature_names if len(v.split(' ')) == 1]
    bigrams = [v for v in feature_names if len(v.split(' ')) == 2]
    print(" * Most Correlated Unigrams are: %s" % (', '.join(unigrams[-N:])))
    print(" * Most Correlated Bigrams are: %s" % (', '.join(bigrams[-N:])))

Output:
* Most Correlated Unigrams are: overdraft, bank, scottrade
* Most Correlated Bigrams are: citigold checking, debit card, checking account
* Most Correlated Unigrams are: checking, branch, overdraft
* Most Correlated Bigrams are: 00 bonus, overdraft fees, checking account
* Most Correlated Unigrams are: dealership, vehicle, car
* Most Correlated Bigrams are: car loan, vehicle loan, regional acceptance
* Most Correlated Unigrams are: express, citi, card
* Most Correlated Bigrams are: balance transfer, american express, credit card
* Most Correlated Unigrams are: report, experian, equifax
* Most Correlated Bigrams are: credit file, equifax xxxx, credit report
* Most Correlated Unigrams are: collect, collection, debt
* Most Correlated Bigrams are: debt collector, collect debt, collection agency
* Most Correlated Unigrams are: ethereum, bitcoin, coinbase
* Most Correlated Bigrams are: account coinbase, coinbase xxxx, coinbase account
* Most Correlated Unigrams are: paypal, moneygram, gram
* Most Correlated Bigrams are: sending money, western union, money gram
* Most Correlated Unigrams are: escrow, modification, mortgage
* Most Correlated Bigrams are: short sale, mortgage company, loan modification
* Most Correlated Unigrams are: meetings, productive, vast
* Most Correlated Bigrams are: insurance check, check payable, face face
* Most Correlated Unigrams are: astra, ace, payday
* Most Correlated Bigrams are: 00 loan, applied payday, payday loan
* Most Correlated Unigrams are: student, loans, navient
* Most Correlated Bigrams are: income based, student loan, student loans
* Most Correlated Unigrams are: honda, car, vehicle
* Most Correlated Bigrams are: used vehicle, total loss, honda financial
Exploring Multi-Classification Models
The classification models which we are using:
Random Forest
Linear Support Vector Machine
Multinomial Naive Bayes
Logistic Regression.
For more information regarding each model, you can refer to their official guide.
Now, we will split the data into train and test sets. We will use 75% of the data for training and the rest for testing. The column 'Consumer_complaint' will be our X, or the input, and 'Product' is our Y, or the output.
X = df2['Consumer_complaint']  # Collection of documents
y = df2['Product']  # Target, or the labels we want to predict (i.e., the 13 different complaint products)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

We will keep all the models in a list and loop through the list, obtaining a mean accuracy and standard deviation for each so that we can calculate and compare their performance. Then we can decide which model to move forward with.
models = [
    RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0),
    LinearSVC(),
    MultinomialNB(),
    LogisticRegression(random_state=0),
]

# 5-fold cross-validation
CV = 5
cv_df = pd.DataFrame(index=range(CV * len(models)))
entries = []
for model in models:
    model_name = model.__class__.__name__
    accuracies = cross_val_score(model, features, labels, scoring='accuracy', cv=CV)
    for fold_idx, accuracy in enumerate(accuracies):
        entries.append((model_name, fold_idx, accuracy))
cv_df = pd.DataFrame(entries, columns=['model_name', 'fold_idx', 'accuracy'])

The above code will take some time to complete its execution.
Compare Text Classification Model Performance
Here, we will compare the 'Mean Accuracy' and 'Standard Deviation' for each of the four classification algorithms.
mean_accuracy = cv_df.groupby('model_name').accuracy.mean()
std_accuracy = cv_df.groupby('model_name').accuracy.std()

acc = pd.concat([mean_accuracy, std_accuracy], axis=1, ignore_index=True)
acc.columns = ['Mean Accuracy', 'Standard deviation']
acc

From the above table, we can clearly see that 'Linear Support Vector Machine' outperforms all the other classification algorithms. So, we will use LinearSVC to train the multi-class text classification model.
plt.figure(figsize=(8,5))
sns.boxplot(x='model_name', y='accuracy', data=cv_df, color='lightblue', showmeans=True)
plt.title("MEAN ACCURACY (cv = 5)\n", size=14);

Evaluation of Text Classification Model
Now, let us train our model using 'Linear Support Vector Machine' so that we can evaluate its performance on unseen data.
X_train, X_test, y_train, y_test, indices_train, indices_test = train_test_split(features, labels, df2.index, test_size=0.25, random_state=1)
model = LinearSVC()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

We will generate a classification report to get more insight into model performance.
# Classification report
print('\t\t\t\tCLASSIFICATION METRICS\n')
print(metrics.classification_report(y_test, y_pred, target_names=df2['Product'].unique()))

From the above classification report, we can observe that the classes with a greater number of occurrences tend to have a better f1-score than the others. The categories that yield the best classification results are 'Student loan', 'Mortgage', and 'Credit reporting, repair, or other'. Classes like 'Debt collection' and 'Credit card or prepaid card' also give good results. Now let us plot the confusion matrix to check the misclassified predictions.
conf_mat = confusion_matrix(y_test, y_pred)
fig, ax = plt.subplots(figsize=(8,8))
sns.heatmap(conf_mat, annot=True, cmap="Blues", fmt='d',
            xticklabels=category_id_df.Product.values,
            yticklabels=category_id_df.Product.values)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.title("CONFUSION MATRIX - LinearSVC\n", size=16);

From the above confusion matrix, we can say that the model is doing a pretty decent job; it has classified most of the categories accurately.
Prediction
Let us make some predictions on unseen data and check the model performance.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5, ngram_range=(1, 2), stop_words='english')
fitted_vectorizer = tfidf.fit(X_train)
tfidf_vectorizer_vectors = fitted_vectorizer.transform(X_train)

model = LinearSVC().fit(tfidf_vectorizer_vectors, y_train)

Now run the prediction.
complaint = """I have received over 27 emails from XXXX XXXX who is a representative from Midland Funding LLC. From XX/XX/XXXX I received approximately 6 emails. From XX/XX/XXXX I received approximately 6 emails. From XX/XX/XXXX I received approximately 9 emails. From XX/XX/XXXX I received approximately 6 emails. All emails came from the same individual, XXXX XXXX. It is becoming a nonstop issue of harassment."""
print(model.predict(fitted_vectorizer.transform([complaint])))

complaint = """Respected Sir/ Madam, I am exploring the possibilities for financing my daughter 's XXXX education with private loan from bank. I am in the XXXX on XXXX visa. My daughter is on XXXX dependent visa. As a result, she is considered as international student. I am waiting in the Green Card ( Permanent Residency ) line for last several years. I checked with Discover, XXXX XXXX websites. While they allow international students to apply for loan, they need cosigners who are either US citizens or Permanent Residents. I feel that this is unfair. I had been given mortgage and car loans in the past which I closed successfully. I have good financial history."""
print(model.predict(fitted_vectorizer.transform([complaint])))

complaint = """They make me look like if I was behind on my Mortgage on the month of XX/XX/2023 & XX/XX/XXXX when I was not and never was, when I was even giving extra money to the Principal. The Money Source Web site and the managers started a problem, when my wife was trying to increase the payment, so more money went to the Principal and two payments came out that month and because I reverse one of them thru my Bank as Fraud they took revenge and committed slander against me by reporting me late at the Credit Bureaus, for 45 and 60 days, when it was not thru. Told them to correct that and the accounting department or the company revert that letter from going to the Credit Bureaus to correct their injustice. The manager by the name XXXX requested this for the second time and nothing yet. I am a Senior of XXXX years old and a Retired XXXX Veteran and is a disgraced that Americans treat us that way and do not want to admit their injustice and lies to the Credit Bureau."""
print(model.predict(fitted_vectorizer.transform([complaint])))

The model is not perfect, yet it is performing very well.
The notebook is available here.
Conclusion
We have implemented a basic multi-class text classification model. You can play with other models like XGBoost, or you can try to compare the performance of multiple models on this dataset using a machine learning framework called AutoML. This is not all: there are still complex problems within multi-class text classification, and you can always explore more and acquire new concepts and ideas about this topic. That's it!
Thank you!
All images are created by the author.
My LinkedIn
Best Chrome Extensions For Shopping
If you often shop online, you might be interested in these Chrome extensions for shopping. Here are some of the best extensions that help you get a coupon, compare prices on other websites, see related purchases, etc. Whether you shop from one e-commerce website or many, these extensions will up your game.
Best Chrome extensions for shopping
Some of the most reliable Chrome extensions for shopping on your PC are:
Honey
Rakuten
Keepa
FlipShope
Fakespot
The Camelizer
To learn more about these extensions, continue reading.
1] Honey
Honey is one of the most popular Chrome extensions for finding coupon codes and discounts while shopping online. Whether you want to purchase clothes, furniture, electronic gadgets, or anything else, you can find a coupon code with it. The best thing is that it automatically applies coupons on the respective web page so that you do not miss out on one unknowingly.
2] Rakuten
Rakuten is similar to Honey but with more attractive features and options. The best thing about Rakuten is that you can actually receive cashback to your PayPal account once it meets the threshold. Like Honey, you need to create an account so that you can manage all your coupons, discounts, cashback, etc.
3] Keepa
Amazon is one of the biggest e-commerce websites, and many of us are customers of it. At times, you might find a product listed at $99, and some other day you might find the same product with a $129 price tag. If you use the Keepa Chrome extension, you can see how Amazon has changed the price of a product over the past few weeks.
4] FlipShope
FlipShope is a Flipkart-exclusive Chrome extension that you can use to get several things done. Flipkart often runs flash sales, during which it is very difficult to purchase a mobile phone or anything else. However, if you use FlipShope, there is a much higher chance of getting the product even during a flash sale.
5] Fakespot
We often become victims of fake reviews while purchasing products from popular online stores like Amazon, Walmart, eBay, etc. Nowadays, it is quite difficult to identify whether a review is genuine or fake. That is why Fakespot uses AI to get the job done for you.
6] The Camelizer
The Camelizer is a price tracking extension for Chrome that you can install before your next purchase. It displays a graph of past pricing so that you can see how Amazon changes prices around events. It is possible to view pricing for the last one month, three months, six months, one year, or all time. Unlike other price tracking extensions, you do not need to create an account, which is one of its best parts.
What is a shopping browser extension?
A shopping browser extension can help you in different ways. For example, it can show you the history of a product's pricing, find coupon codes, help during a flash sale, etc. Depending upon the extension, you can get various facilities like these.
What is the best Chrome extension for coupons?
There are many Chrome extensions for finding coupons while shopping online. For example, you can use Honey, which is one of the most popular extensions out there. You can also use Rakuten, FlipShope, etc. If you want to save money while shopping online, it is worth installing all of the aforementioned extensions.