How To Start Machine Learning With Tensorflow.js


With artificial intelligence (AI) taking centre stage in the world of information technology, machine learning (ML), which is a significant application of AI, gains importance. ML is significantly aided by TensorFlow.js, an open source JavaScript library that can be used to define, train and run ML models.

Machine learning is an application of artificial intelligence (AI) that provides systems with the ability to automatically learn and improve from experience, without being explicitly programmed. Machine learning (ML) focuses on the development of computer programs that can access data and use it to learn for themselves.

Fundamentally, ML techniques process the input data and the output to produce rules that map the input to the output.

TensorFlow

TensorFlow is an open source software library for numerical computation using data flow graphs. The graph nodes represent mathematical operations, while the graph edges represent the multi-dimensional data arrays (tensors) that flow between them.

TensorFlow was originally developed by engineers and researchers working at Google. The system is general enough to be applicable to problems in a wide variety of domains.

TensorFlow.js

TensorFlow.js is an open source library that uses JavaScript and a high-level layers API to define, train and run machine learning models entirely in the browser. The library is powered by WebGL, and it provides a high-level layers API for defining models as well as a low-level API for linear algebra and automatic differentiation.

What could be accomplished with TensorFlow.js

Listed below are a few of the capabilities of TensorFlow.js.

The ability to import an existing, pre-trained model for inference: If the user has an existing TensorFlow or Keras model that has previously been trained offline, it can be converted into the TensorFlow.js format and then loaded into the browser for inference.

The ability to author models directly in the browser: You can also use TensorFlow.js with JavaScript and the high-level layers API to define, train and run models entirely in the browser.

Advantages of TensorFlow.js

Listed below are a few of the benefits of TensorFlow.js:

It supports GPU acceleration and so, behind the scenes, will accelerate your code when a GPU is available.

All data stays on the client, making TensorFlow.js useful not just for low-latency inference but also for privacy-preserving applications.

You can even open your Web page from a mobile device, in which case your model can take advantage of sensor data.

Core theories of TensorFlow.js

TensorFlow.js provides low-level building blocks for machine learning as well as a high-level, Keras-inspired API for building neural networks. Let us take a look at a few of the core components of the library.

Tensors: The central unit of data in TensorFlow.js is the tensor, a set of numerical values shaped into an array of one or more dimensions. A tensor also has a shape attribute, which defines the array's shape, i.e., how many elements there are in each dimension.

The most frequent way to create a tensor is with the tf.tensor function, as shown in the code snippet below:

const shape = [2, 3];

const a = tf.tensor([2.0, 3.0, 4.0, 20.0, 30.0, 40.0], shape);

a.print();

The following code example creates a tensor identical to the one in the previous snippet, using tf.tensor2d:
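The snippet below is a sketch of that call, based on the values and the [2, 3] shape used above:

// The same values and shape, created with tf.tensor2d
const b = tf.tensor2d([[2.0, 3.0, 4.0], [20.0, 30.0, 40.0]]);
b.print();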

TensorFlow.js also provides functions for creating tensors with all values set to 0 (tf.zeros) or all values set to 1 (tf.ones).

// 3×5 Tensor with all values set to 0

const zeros = tf.zeros([3, 5]);
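Similarly, tf.ones creates a tensor of ones; the shape below is just an illustrative choice:

// 2×3 Tensor with all values set to 1
const ones = tf.ones([2, 3]);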

Variables: Variables are initialised with a tensor of values. Unlike tensors, however, variables are mutable. You can assign a new tensor to an existing variable using the assign method, as shown below:
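The snippet assumes that biases has already been created as a variable; a minimal sketch of that step could be:

// Create a mutable variable initialised with a tensor of six zeros
const biases = tf.variable(tf.zeros([6]));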

biases.print();

const updatedValues = tf.tensor1d([1, 0, 1, 0, 0, 1]);

biases.assign(updatedValues);

biases.print();

Operations: While tensors allow you to store data, operations (ops) allow you to manipulate that data.

TensorFlow.js provides a wide variety of operations suitable for general computations as well as ML-specific operations on tensors. Since tensors are immutable, operations return new tensors after the computations are performed; for instance, a unary operation like square.

const d = tf.tensor2d([[2.0, 3.0], [4.0, 5.0]]);
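For example, squaring the tensor above returns a new tensor; a small sketch:

const d_squared = d.square();
d_squared.print(); // Output: [[4, 9], [16, 25]]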

The same is true of binary operations like add, sub and mul, as shown below:

const e = tf.tensor2d([[5.0, 6.0], [7.0, 8.0]]);

d.add(e).print(); // Output: [[7, 9], [11, 13]]

TensorFlow.js has a chainable API; you can call operations on the result of other operations.

All operations are also exposed as functions in the main namespace, so you can do the following:
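A small sketch of both styles, reusing the tensor d from above:

// Chainable style: call operations on the result of other operations
const sq_sum = d.square().sum();
sq_sum.print(); // 4 + 9 + 16 + 25 = 54

// Functional style: the same ops exposed in the main namespace
const sq_sum2 = tf.sum(tf.square(d));
sq_sum2.print(); // 54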

Models and layers: Conceptually, a model is a function that takes some input and produces some output. In TensorFlow.js there are two ways to create models. You can use operations directly to represent the work of the model.

For instance:

function predict(input) {
  const x = tf.scalar(input);
  // y = ax^2 + bx + c
  return a.mul(x.square()).add(b.mul(x)).add(c);
}

// Define constants: y = 2x^2 + 4x + 8
const a = tf.scalar(2);
const b = tf.scalar(4);
const c = tf.scalar(8);

// Predict the output for an input of 2
const result = predict(2);
result.print(); // 24

Or you can use APIs such as tf.model and tf.sequential to build a model. This code constructs a tf.sequential model:

const model = tf.sequential();

model.add(

tf.layers.simpleRNN({/* layer configuration */})

);

const optimizer = tf.train.sgd(LEARNING_RATE);

model.fit(/* training data and options */);
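As a fuller sketch of the same layers API, here is a minimal sequential model; the layer sizes, learning rate, loss and training data below are illustrative assumptions, not values from the snippet above:

const net = tf.sequential();
net.add(tf.layers.dense({units: 4, inputShape: [2], activation: 'relu'}));
net.add(tf.layers.dense({units: 1}));

// Compile the model with an optimizer and a loss function
net.compile({optimizer: tf.train.sgd(0.01), loss: 'meanSquaredError'});

// Illustrative training data: four examples with two features each
const xs = tf.tensor2d([[0, 0], [0, 1], [1, 0], [1, 1]]);
const ys = tf.tensor2d([[0], [1], [1], [0]]);

// fit returns a promise that resolves when training finishes
net.fit(xs, ys, {epochs: 10}).then(() => {
  net.predict(tf.tensor2d([[0, 1]])).print();
});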

There are many kinds of layers available in TensorFlow.js. A few examples include tf.layers.simpleRNN, tf.layers.gru and tf.layers.lstm.

Memory Management

TensorFlow.js uses the GPU to accelerate maths operations, so it is necessary to manage GPU memory when working with tensors and variables.

TensorFlow.js provides two functions for this.

dispose: You can call dispose on a tensor or variable to purge it and free up its GPU memory:

const x = tf.tensor2d([[0.0, 2.0], [4.0, 6.0]]);

const x_squared = x.square();

x.dispose();

x_squared.dispose();
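If you want to check how many tensors are currently held in memory, tf.memory() reports this; a small sketch:

// Logs the number of tensors and bytes currently allocated
console.log(tf.memory().numTensors, tf.memory().numBytes);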

tf.tidy: Using dispose can be cumbersome when performing a large number of tensor operations. TensorFlow.js provides another function, tf.tidy, which plays a role similar to regular scopes in JavaScript, but for GPU-backed tensors.

tf.tidy executes a function and purges any intermediate tensors created, freeing their GPU memory. It does not purge the return value of the inner function.

tf.tidy takes a function to clean up after it runs.
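A minimal sketch of how this might look (the tensor values below are illustrative assumptions):

const average = tf.tidy(() => {
  // Intermediate tensors created here are disposed automatically;
  // only the returned tensor survives
  const y = tf.tensor1d([1.0, 2.0, 3.0, 4.0]);
  const z = tf.ones([4]);
  return y.sub(z).square().mean();
});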

average.print()

Using tf.tidy can help prevent memory leaks in your application. It can also be used to carefully control when memory is reclaimed. The function passed to tf.tidy should be synchronous and should not return a promise.

tf.tidy will not clean up variables. Since variables typically last through the entire life cycle of a machine learning model, tf.tidy does not clean them up even if they are created inside a tidy; however, you can call dispose on them manually.


How To Improve Supply Chains With Machine Learning?

Supply chain management (SCM) is one of the important activities carried out in industries to keep track of the flow of goods and services. Machine learning has been used in various industries to enhance business processes. 

Likewise, supply chain management is leveraged with machine learning to streamline and optimize the operations involved in it. This is because machine learning enables manufacturers to improve production planning, reduce forecast error rates, reduce costs and minimize the latency for components used in customized products.

When machine learning is combined with the Internet of Things it acts as a powerful system for supply chain forecasting. This paired technique has a greater ability to improve supply chain management in multiple ways. Do you wish to enhance your supply chain management with machine learning? Go through this article to know the perfect ways to do it!

Machine Learning Based Algorithms 

The future of supply chain and logistics management is machine learning-based algorithms. Industries can benefit from machine learning by reducing complex constraints, operating costs and delivery problems. In addition, supply chain owners gain insights that can be used to enhance supply chain performance and reduce complexity.

Integration Of Machine Learning And IoT

Get Better Pattern Recognition 

Machine learning and Artificial Intelligence not only look for the patterns that are set but also go through complex data sets to find out the potential correlation and give the best solution for future environments. 

Conventional demand forecasting is based on correlations that are apparent to the human eye. Machine learning, however, performs in-depth pattern recognition and enhances the accuracy of forecasting models. An AI-based taxi dispatch system is a suitable example of the pairing of machine learning and artificial intelligence in supply chain and logistics management.

Identify Inconsistent Supplier Quality 

Reducing Error Rates With Machine Learning 

You should implement machine learning-based techniques in such a way as to create the best plan and optimization for the supply chain. Machine learning methodologies reduce the chance of losing sales due to the unavailability of products. Besides, industries can achieve a certain amount of inventory reduction when machine learning-based supply chains are used. These things result in reduced error rates for supply chain management.

Eliminating Potential Risks And Fraudulent Activities

Manufacturers should make use of the insights obtained from machine learning to improve the product and its quality while eliminating risks and the potential for fraudulent activities. This means that the supply chain management team should automate the process using smart devices and set them up to upload the results in real time, especially on a cloud-based platform, to ensure security. With the outcomes of the machine learning insights, you can reduce fraud.

Predict And Reduce Operation Cost 

The machine learning techniques employed in the supply chain are capable of predicting logistics failures with the help of Internet of Things (IoT) data and maintenance logs. So, supply chain owners can increase productivity by reducing operating costs as well as maintenance costs.

Avail End-to-End Visibility 

Machine learning and IoT are the two factors that provide real-time monitoring throughout the supply chain. The sensors connected based on IoT can be used to keep track of the supply chain in an organization. With this end-to-end visibility, the industry can resolve the problem found in it and optimize the supply chain. 

The end-to-end visibility feature improves accountability and transparency while making the availability of items within the supply chain and reduces the chance of damage to the deliverables. Obviously, the real-time monitoring system enhances various supply chain management processes from logistics to customer support. 

Final Thoughts 

Thus, these are the right ways to improve supply chain management with machine learning. The proper integration of machine learning and supply chain forecasting enables organizations to understand supply operations and logistics in a clear-cut manner. On the other hand, the large volume of data collected from IoT and Artificial Intelligence supports industries in streamlining and optimizing the supply chain to yield better outcomes.

Machine Learning With Limited Data

This article was published as a part of the Data Science Blogathon.

Introduction

In machine learning, the amount and quality of data are critical to model training and performance. The amount of data affects machine learning and deep learning algorithms a lot. Most algorithms' behavior changes if the amount of data is increased or decreased. But in the case of limited data, it is necessary to handle machine learning algorithms effectively to get better results and accurate models. Deep learning algorithms are also data-hungry, requiring a large amount of data for better accuracy.

In this article, we will discuss the relationship between the amount and the quality of the data with machine learning and deep learning algorithms, the problem with limited data, and the accuracy of dealing with it. Knowledge about these key concepts will help one understand the algorithm vs. data scenario and will shape one so that one can deal with limited data efficiently.

The “Amount of Data vs. Performance” Graph

In machine learning, a question may come to mind: how much data is required to train a good machine learning or deep learning model? Well, there is no fixed threshold or answer to this, as every dataset is different and has different features and patterns. Still, there are threshold levels after which the performance of machine learning or deep learning algorithms tends to become constant.

Most of the time, machine learning and deep learning models tend to perform well as the amount of data fed is increased, but after some point or some amount of data, the behavior of the models becomes constant, and it stops learning from data.

The graph shows the performance of some famous machine learning and deep learning architectures against the amount of data fed to the algorithms. Here we can see that traditional machine learning algorithms learn a lot from the data in the preliminary period, as the amount of data fed is increased, but after some time, when a threshold level is reached, the performance becomes constant. Now, if you provide more data to the algorithm, it will not learn anything more, and the performance will not increase or decrease.

In the case of deep learning algorithms, there are a total of three types of deep learning architectures in the diagram. The shallow type of deep learning structure is a smaller architecture in terms of depth, meaning that it has few hidden layers and neurons. In the case of deep neural networks, the number of hidden layers and neurons is very high, and the network is designed to be very deep.

From the diagram, we can see a total of three deep learning architectures, and all three perform differently as the amount of data fed is increased. Shallow neural networks tend to behave like traditional machine learning algorithms, where the performance becomes constant after some threshold amount of data. Deep neural networks, in contrast, keep learning from the data as new data is fed.

From the diagram, we can conclude that,

” THE DEEP NEURAL NETWORKS ARE DATA HUNGRY “

What Problems Arise with Limited Data?

Several problems occur with limited data, and a model trained with limited data may not perform well. The common issues that arise with limited data are listed below:

1. Classification: 

In classification, if a low amount of data is fed, the model will classify observations wrongly, meaning that it will not give the correct output class for a given observation.

2. Regression:

In a regression problem, if the model's accuracy is low, its predictions will be far off; since a regression problem expects a numerical output, a model trained on limited data may return values that are far from the actual output.

3. Clustering:

In clustering problems, the model can assign data points to the wrong clusters if trained with limited data.

4. Time Series:

In time series analysis, we forecast some data for the future. A low-accuracy time series model can give inferior forecast results, with large errors over time.

5. Object Detection:

If an object detection model is trained on limited data, it might not detect the object correctly, or it may classify the object incorrectly.

How to Deal With Problems of Limited Data?

There is no exact or fixed method for dealing with limited data. Every machine learning problem is different, and the way of solving each particular problem differs. But some standard techniques are helpful in many cases.

1. Data Augmentation

Data augmentation is a technique in which existing data is used to generate new data. The newly generated data will look like the old data, but some of the values and parameters will be different.

This approach can increase the amount of data, and there is a high likelihood of improving the model’s performance.

Data augmentation is preferred in most deep learning problems where there is limited image data.
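As a minimal sketch of the idea in TensorFlow.js (the library this page focuses on), a horizontal flip creates an augmented copy of an image; the tensor shape and values below are only illustrative:

// A tiny fake 2×2 RGB "image" with shape [height, width, channels]
const img = tf.tensor3d([[[1, 2, 3], [4, 5, 6]],
                         [[7, 8, 9], [10, 11, 12]]]);

// Horizontal flip: reverse the width axis (axis 1) to create a new, augmented example
const flipped = tf.reverse(img, [1]);
flipped.print();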

2. Don’t Drop and Impute:

In some datasets, there is a high fraction of invalid or empty values. Because of that, some data is often dropped to keep the process simple; but by doing this, the amount of data is decreased, and several problems can occur. Instead of dropping, missing values can be imputed, for example with the mean, median or mode of the column.

3. Custom Approach:

If there is a case of limited data, one could search for the data on the internet and find similar data. Once this type of data is obtained, it can be used to generate more data or be merged with the existing data.

Conclusion

In this article, we discussed limited data, the performance of several machine learning and deep learning algorithms as the amount of data increases or decreases, the types of problems that can occur due to limited data, and the common ways to deal with it. This article will help one understand limited data, its effects on performance, and how to handle it.

Some Key Takeaways from this article are:

1. Machine learning algorithms and shallow neural networks are not much affected by the amount of data after some threshold level.

2. Deep neural networks are data-hungry algorithms that never stop learning from data.

3. Limited data can cause problems in every field of machine learning applications, e.g., classification, regression, time series, image processing, etc.

4. We can apply Data augmentation, imputation, and some other custom approaches based on domain knowledge to handle the limited data.

Want to Contact the Author?

Follow Parth Shukla @AnalyticsVidhya, LinkedIn, Twitter, and Medium for more content.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.


How To Land An Internship In Machine Learning?

Strengthen Your Skills

Start by learning some programming languages like Python, Java, and R. Python is the most preferred language for machine learning.

Then learn the data structures and algorithms. Programming skills are a must in any field that involves computer science.

You must be knowledgeable about computer architecture too.

Learn libraries like scikit-learn, numpy, pandas, matplotlib, seaborn, etc. in Python.

Strengthen and deepen your concepts in probability and statistics.

Then learn the concepts of machine learning, like the algorithms used in machine learning and how to apply them, data preprocessing, feature engineering, feature selection, deep learning, etc.

Learn how to deploy models using Flask or Django, and also on AWS, Azure or other cloud services.

Build A Strong Portfolio

Personal Projects − Show that machine learning is your passion by building your own website showcasing projects you have pursued independently. Personal projects are very helpful, as companies need proof that you will be able to bring positive contributions to the table. Write some code in Kaggle kernels, commit your code to GitHub, or contribute to open source projects.

Participate in Competitions/Hackathons − Kaggle competitions are a great way to test yourself as well as deepen your understanding of algorithms and learn new ML concepts. Other sites like MachineHack and Topcoder can also be followed, as they maintain lists of live machine learning competitions.

Machine learning portfolios are not only about code, projects and their results; people must also know why and how to use the projects. So documentation plays a key role in helping users understand how your code works, and it also helps interviewers follow your thought process. It is therefore quite obvious that you should always document your experience while creating those projects and the phases each project went through. Always provide readme files when uploading your projects to GitHub, and use images, graphs, videos, links and anything else necessary to help users understand the purpose and findings of each of your projects.

Optimize Your Resume

It is very important to optimize and modify your resume before applying for any internship. If you have many personal projects and little to no experience in the domain, put your projects section above work experience. Also, if you don't have any educational background related to the machine learning domain, keep the 'Education' section at the bottom. Highlight some of your top projects in the 'Projects' section and include any MOOC courses done or webinars attended related to the field.

Connect with Established People in the Industry

Another thing that one can do is to form a society or a study group at their universities that is focused on Artificial Intelligence and Machine Learning. This will help you enhance your social and leadership skills. If you are able to organize some events or conduct workshops, you will get an opportunity to engage with some of the local companies.

Improve your online presence − An online presence really helps in networking with other people and getting recognized. Your expertise will be of no value if it is unseen by others. So, for an online presence, you can start publishing articles related to machine learning on blogging platforms or even create your own blog.

Join some communities − The Kaggle community is a great starting point. You can also find such communities on LinkedIn, Quora, Github, Facebook, Discord, etc.

Start Applying to Companies

Avoid applying to the MAANG companies at the beginning. These companies get thousands of applications per day, so the competition is far too high for a beginner to get in. Instead, try some of the smaller, lesser-known companies. If you land an internship at one of these smaller companies, it won't do any harm; in fact, you get to apply your skills to real-world projects, which is definitely a good starting point.

Research internships are more recommended than Start-Up/Corporate Internships.

Still, if you are interested in corporate internships, such companies can be found through associations at your university that organize professional networking events. Or you may search for local events on sites like Eventbrite and Meetup. LinkedIn connections can be of great help here; you can also search for local companies. If none of these work, a Google search is always there for you.

Internshala, LetsIntern, Glassdoor, etc. are also great sites to hunt for openings and opportunities.

After you manage to draw the attention of some companies and get shortlisted for an interview, try to find out answers to these questions before you go for the interview −

Why do they need an intern?

What is the core problem they are working on?

What are their current operations?

When you find answers to the above questions, think about how much you can fit in or contribute to their cause; this will showcase your genuine curiosity and problem-solving skills during the interview process.

The last option is to ask professors at your university who are working on machine learning projects whether they need help with some of their research work.

Data Preprocessing In Machine Learning

Introduction to Data Preprocessing in Machine Learning

The following article provides an outline of data preprocessing in machine learning. Data pre-processing, also known as data wrangling, is the technique of transforming raw data (incomplete, inconsistent data with lots of errors, or data that lacks certain behaviour) into an understandable format, carefully using the different steps (importing libraries and data, checking for missing values and categorical data, followed by validation and feature scaling) so that proper interpretations can be made from it and negative results can be avoided, as the quality of a machine learning model highly depends upon the quality of the data it is trained on.


Data collected for training the model comes from various sources. This data is generally in raw format, i.e. it can contain noise such as missing values, irrelevant information and numbers in string format, or it can be unstructured. Data pre-processing increases the efficiency and accuracy of machine learning models, as it helps in removing this noise from the dataset and giving meaning to the data.

Six Different Steps Involved in Machine Learning

Following are the six steps involved in data pre-processing for machine learning:

Step 1: Import libraries

Step 2: Import data

Step 3: Checking for missing values

Step 4: Checking for categorical data

Step 5: Feature scaling

Step 6: Splitting data into training, validation and evaluation sets

1. Import Libraries

The very first step is to import a few of the important libraries required in data pre-processing. A library is a collection of modules that can be called and used. In python, we have a lot of libraries that are helpful in data pre-processing.

A few of the following important libraries in python are:

NumPy: The most commonly used library for implementing the complicated mathematical computations of machine learning. It is useful for performing operations on multidimensional arrays.

Pandas: An open-source library that provides high-performance, easy-to-use data structures and data analysis tools in Python. It is designed to make working with relational and labeled data easy and intuitive.

Matplotlib: A visualization library provided by Python for 2D plots of arrays. It is built on NumPy arrays and designed to work with the broader SciPy stack. Visualization of datasets is helpful when a large amount of data is available. Plots available in matplotlib include line, bar, scatter, histogram, etc.

Seaborn: Another visualization library for Python. It provides a high-level interface for drawing attractive and informative statistical graphics.

2. Import Dataset

Once the libraries are imported, our next step is to load the collected data. The pandas library is used to import these datasets. Mostly, datasets are available in CSV format, as CSV files are small in size, which makes them fast to process. A CSV file can be loaded using the read_csv function of the pandas library. Datasets in various other formats can also be encountered.

Once the dataset is loaded, we have to inspect it and look for any noise. To do so, we have to create a feature matrix X and an observation vector Y with respect to X.

3. Checking for Missing Values

Once you create the feature matrix, you might find that there are some missing values. If we don't handle them, they may cause a problem at training time. Missing values are commonly handled in the two ways described below.

Removing the entire row that contains the missing value; however, there is a possibility that you may end up losing some vital information. This can be a good approach if the size of the dataset is large.

If a numerical column has a missing value, you can estimate the value by taking the mean, median, mode, etc. of the column.

4. Checking for Categorical Data

Data in the dataset has to be in numerical form in order to perform computations on it. Since machine learning models involve complex mathematical computation, we can't feed them non-numerical values. So, it is important to convert all text values into numerical values. The LabelEncoder() class of scikit-learn is used to convert these categorical values into numerical values.

5. Feature Scaling

The values in raw data can vary widely, which may result in biased training of the model or may end up increasing the computational cost. So it is important to normalize them. Feature scaling is a technique used to bring the data values into a smaller range.

Methods used for feature scaling are:

Rescaling (min-max normalization): x' = (x - min) / (max - min)

Mean normalization: x' = (x - mean) / (max - min)

Standardization (Z-score normalization): x' = (x - mean) / standard deviation

Scaling to unit length: each feature vector is divided by its Euclidean norm

6. Splitting Data into Training, Validation and Evaluation Sets

Finally, we need to split our data into three different sets: a training set to train the model, a validation set to validate the accuracy of our model, and finally a test set to test the performance of our model on generic data. Before splitting the dataset, it is important to shuffle it to avoid any biases. An ideal proportion for dividing the dataset is 60:20:20, i.e. 60% as the training set and 20% each as the test and validation sets. To split the dataset, use train_test_split of sklearn.model_selection twice: once to split the dataset into train and validation sets, and then to split the remaining train dataset into train and test sets.

Conclusion – Data Preprocessing in Machine Learning

Data preprocessing is something that requires practice. It is not like a simple data structure that you learn and apply directly to solve a problem. To get good knowledge of how to clean a dataset or how to visualize it, you need to work with different datasets. The more you use these techniques, the better understanding you will get of them. This was a general idea of how data processing plays an important role in machine learning. Along with that, we have also seen the steps needed for data pre-processing. So next time, before going on to train a model using the collected data, be sure to apply data pre-processing.

Recommended Articles

This is a guide to data preprocessing in machine learning. Here we discussed the introduction and the six different steps involved in data pre-processing. You can also go through our other suggested articles to learn more.

5 Challenges Of Machine Learning!

This article was published as part of the Data science Blogathon.

Introduction :

In this post, we will come through some of the major challenges that you might face while developing your machine learning model. Assuming that you know what machine learning is really about, why do people use it, what are the different categories of machine learning, and how the overall workflow of development takes place.


What can possibly go wrong during the development and prevent you from getting accurate predictions?

So let's get started. During the development phase, our focus is to select a learning algorithm and train it on some data; the two things that might be a problem are a bad algorithm or bad data, or perhaps both of them.

Table of Contents:

Not enough training data.

Poor Quality of data.

Irrelevant features.

Nonrepresentative training data.

Overfitting and Underfitting.

1. Not enough training data :

Let's say that for a child to learn what an apple is, all it takes is for you to point to an apple and say the word 'apple' repeatedly; soon the child can recognize all sorts of apples. Machine learning is not quite there yet: most algorithms need a large amount of data to work properly, and even simple problems typically require thousands of examples.

2. Poor Quality of data:

Obviously, if your training data has lots of errors, outliers, and noise, it will make it impossible for your machine learning model to detect a proper underlying pattern. Hence, it will not perform well.

So put every ounce of effort into cleaning up your training data. No matter how good you are at selecting and hyperparameter-tuning the model, this part plays a major role in helping us build an accurate machine learning model.

“Most Data Scientists spend a significant part of their time in cleaning data”.

There are a couple of examples when you’d want to clean up the data :

If you see some of the instances are clear outliers just discard them or fix them manually.

If some of the instances are missing a feature (e.g., 2% of users did not specify their age), you can either ignore these instances, fill in the missing values with the median age, or train one model with the feature and one without it and compare the results.

3. Irrelevant Features:

“Garbage in, garbage out (GIGO).”


Even if our model is excellent, if we feed it garbage data the result will also be garbage output. Our training data must always contain mostly relevant features and as few irrelevant ones as possible.

The credit for a successful machine learning project goes to coming up with a good set of features to train on (often referred to as feature engineering), which includes feature selection, feature extraction, and creating new features; these are other interesting topics to be covered in upcoming blogs.

4. Nonrepresentative training data:

To make sure that our model generalizes well, we have to make sure that our training data should be representative of the new cases that we want to generalize to.

If we train our model using a nonrepresentative training set, its predictions won't be accurate; it will be biased towards one class or group.

For example, let us say you are trying to build a model that recognizes the genre of a piece of music. One way to build your training set is to search YouTube and use the resulting data. Here we assume that YouTube's search engine is providing representative data, but in reality the search will be biased towards popular artists, and maybe even the artists that are popular in your location (if you live in India, you will mostly get the music of Arijit Singh, Sonu Nigam and so on).

So use representative data during training, so your model won’t be biased among one or two classes when it works on testing data.

5. Overfitting and Underfitting :

What is overfitting?


Let's start with an example. Say one day you are walking down a street to buy something, and a dog comes out of nowhere; you offer it something to eat, but instead of eating it starts barking and chasing you, though somehow you stay safe. After this particular incident, you might think that no dog is worth treating nicely.

So this overgeneralization is what we humans do most of the time, and unfortunately a machine learning model does the same if we do not pay attention. In machine learning, we call this overfitting, i.e. the model performs well on training data but fails to generalize well.

Overfitting happens when our model is too complex.

Things which we can do to overcome this problem:

Simplify the model by selecting one with fewer parameters.

By reducing the number of attributes in training data.

Constraining the model, for example through regularization (see the sketch after this list).

Gather more training data.

Reduce the noise.
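As a minimal sketch of what constraining the model might look like in TensorFlow.js, the library this page focuses on (the layer size, input shape and L2 strength below are illustrative assumptions):

// A dense layer whose weights are penalised with L2 regularization,
// which constrains the model and helps reduce overfitting
const layer = tf.layers.dense({
  units: 10,
  inputShape: [4],
  kernelRegularizer: tf.regularizers.l2({l2: 0.01})
});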

What is underfitting?


Yes, you guessed it right: underfitting is the opposite of overfitting. It happens when our model is too simple to learn anything from the data. For example, if you use a linear model on a dataset with a highly non-linear relationship, it will surely underfit, and the predictions are bound to be inaccurate even on the training set.

Things which we can do to overcome this problem:

Train on better and relevant features.

Reduce the constraints.

Conclusion :

Machine learning is all about making machines better by using data so that we don't need to code them explicitly. The model will not perform well if the training data is small, or noisy with errors and outliers, or if the data is not representative (which results in bias) or consists of irrelevant features (garbage in, garbage out); nor will it perform well if the model is either too simple (which results in underfitting) or too complex (which results in overfitting). After you have trained a model keeping the above parameters in mind, don't expect it to simply generalize well to new cases; you may need to evaluate and fine-tune it. How to do that? Stay tuned, as this topic will be covered in upcoming blogs.

Thank you,

Karan Amal Pradhan.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

