Articles by Mithilesh Pradhan
44 articles
Exploring Data Distribution
Introduction The distribution of data gives us useful insights while working on any data science or machine learning use case. A data distribution describes how the values in the data are spread: information about specific parts of the data, any outliers, and the central tendencies of the data. To explore the data distribution there are popular graphical methods that prove beneficial while working with the data. In this article let us explore these methods. Know more about your data: The Graphical Way Histograms & KDE Density Plots Histograms are the most ...
Improving model accuracy with cross validation technique
Introduction Cross Validation (CV) is a way of training machine learning models in which multiple models are trained on parts of the data and their performance is then assessed on an independent, unseen set of data. In the cross-validation technique, we generally split the original training data into different parts iteratively so that the algorithm trains and validates on each portion of the data and none of them is left out in the process. In this article let us build a good understanding of the Cross-Validation technique and its significance in improving model accuracy. Cross Validation ...
Checking the normality of a data set or a feature
Introduction In statistical terms, normality is the property of following a normal (Gaussian) distribution. Checking the normality of a dataset means testing whether the dataset or a variable follows a normal distribution. Many tests can be performed to check the normality of a dataset, among which the most popular are the Histogram method, the QQ plot, and the KS Test. Normality testing – Checking for Normality There are both statistical and graphical approaches to determining the normality of a dataset or a feature. Let us look through some of these methods. Graphical Methods Histogram ...
What is OOB error?
Introduction OOB (Out of Bag) error and OOB score are terms related to Random Forests. Random Forest is an ensemble of decision trees that improves on the prediction of a single decision tree. OOB error is used to measure the prediction error of models that use the bagging method, such as random forests and other bagged decision trees. The number of wrong classifications on an OOB sample gives the OOB error. In this article let's explore OOB error/score. Before moving ahead let us have a short overview of Random Forest and Decision Trees. Random Forest Algorithm Random ...
The Hathaway Effect: Does The Anne Hathaway Effect Really True?
Introduction Today Machine Learning plays a crucial role in predicting stock prices and the growth of popular organizations and investment banks. While working on such problems we consider many relations and correlations between different kinds of factors. The Anne Hathaway Effect is one such peculiar correlation, involving the actress Anne Hathaway and Berkshire Hathaway (BRK), the company of the well-known businessman and investor Warren Buffett. In this article let us know more about the effects and observations around this phenomenon. The Anne Hathaway Effect The Hathaway effect news was first picked up by CNBC. According to this effect, whenever Anne ...
Techniques to find similarities in recommendation system
Introduction Similarity metrics are crucial in Recommendation Systems to find users with similar behavior, patterns, or tastes. Nowadays Recommendation Systems are found in many useful applications, such as movie recommendations on Netflix and product recommendations on e-commerce sites such as Amazon. Organizations use preference matrices to capture user behavior and feedback data on specific attributes of products. They also capture the sequence and trend of users purchasing products, and users with similar behavior are identified in the process. In this article, let's understand in brief the idea behind a recommendation system and explore the similarity techniques and measures involved in ...
Limitations of fixed basis function
Introduction Fixed basis functions are functions that help us extend linear models in Machine Learning by taking linear combinations of nonlinear functions of the inputs. Because a plain linear model is restricted to a linear combination of its terms, it suffers a significant limitation: it cannot capture non-linear relationships in the data. Basis functions help such models capture non-linearity in the data while keeping the model linear in its parameters. Linear regression then uses different linear combinations of the fixed basis functions to build more complex functions. In this article let us look into the fixed basis function and its limitations. Fixed Basis function A linear regression model ...
Python | Measure similarity between two sentences using cosine similarity
Introduction Using Natural Language Processing to find the semantic similarity between sentences, words, or texts is very common in modern use cases. There are numerous ways to calculate the similarity between texts. One popular method is cosine similarity. It measures the similarity between two non-zero vectors as the cosine of the angle between them, computed from their dot product. Through this article let us briefly explore cosine similarity and see its implementation using Python. Cosine similarity – Finding similarity between two texts Cosine Similarity is defined as the cosine of ...
Handling sparsity issues in recommendation system
Introduction In Recommendation Systems, collaborative filtering is one of the approaches to building a model and finding similarities between users. This concept is widely used on e-commerce sites and on OTT and video-sharing platforms. One of the most discussed issues that such systems face in the initial modeling phase is data sparsity, which occurs when only a few users give ratings or reviews on the platform or otherwise interact with it. In this article let us understand the problem of data sparsity in Recommendation Systems and learn about ways to handle it. ...
Difference Between Training and Testing Data
Introduction In Machine Learning, a good model is generated if we have data that is both representative and sufficient in amount. Data may be divided into different sets that serve different purposes while training a model. Two very useful and common sets of data are the training and testing sets. The training set is the part of the original dataset used to train the model and find a good fit. Testing data is the part of the original data used to validate the trained model and analyze the calculated metrics. In this article let us explore training and testing data sets in ...