Found 162 Articles for Data Science

Defensive R Programming

Bhuwanesh Nainwal
Updated on 17-Jan-2023 14:54:01

282 Views

Defensive programming is a software development practice that involves designing and implementing code in a way that anticipates and prevents errors and vulnerabilities. In R programming, defensive programming involves using techniques and strategies to ensure that your R code is robust, reliable, and secure. The word “defensive” might suggest writing code that never fails at all. In fact, defensive programming means writing code that fails properly. By “failing properly”, we mean: if the code fails, then it should be ...
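The idea of failing properly is not specific to R. Below is a minimal sketch in Python (not R, and not the article's own code) of a hypothetical `safe_mean` helper. It checks its input up front and stops with an informative error, so that a bad value does not quietly produce a misleading result.

```python
import math
from typing import Sequence


def safe_mean(values: Sequence[float]) -> float:
    """Return the arithmetic mean of `values`, failing loudly on bad input.

    Hypothetical helper for illustration: each check stops execution early
    with a message that says what went wrong, instead of letting a bad
    value propagate silently into later results.
    """
    if len(values) == 0:
        raise ValueError("safe_mean() needs at least one value, got an empty sequence")
    for i, v in enumerate(values):
        # bool is a subclass of int, so reject it explicitly
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise TypeError(f"element {i} is {type(v).__name__}, expected a number")
        if math.isnan(v):
            raise ValueError(f"element {i} is NaN; handle missing values before averaging")
    return sum(values) / len(values)


if __name__ == "__main__":
    print(safe_mean([2.0, 4.0, 6.0]))  # 4.0
    try:
        safe_mean([1.0, "2"])
    except TypeError as err:
        print(f"failed properly: {err}")
```

The error messages name the offending element and the expected type, so the caller can see where the failure came from.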

Dealing with Missing Data in R

Bhuwanesh Nainwal
Updated on 17-Jan-2023 16:12:22

24K+ Views

In data science, one of the common tasks is dealing with missing data. If your dataset has missing data, there are several ways to handle it in R. One way is to simply remove any rows or columns that contain missing data. Another way is to impute the missing values using a statistical method, that is, to replace them with estimates based on the other values in the dataset. For example, we can replace missing values with the mean or median of the variable in which they occur. ...
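The two approaches described above can be illustrated with a small Python sketch (not the article's R code). It assumes that missing values are represented as `None`, while R uses `NA` and pandas uses `NaN`. The function names `drop_missing_rows` and `impute` are hypothetical.

```python
from statistics import mean, median
from typing import List, Optional

# Missing values are represented as None in this sketch (an assumption).
Row = List[Optional[float]]


def drop_missing_rows(rows: List[Row]) -> List[Row]:
    """Remove rows: keep only rows that have no missing entries."""
    return [row for row in rows if all(v is not None for v in row)]


def impute(column: List[Optional[float]], strategy: str = "mean") -> List[float]:
    """Replace each None with the mean or median of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in column]


if __name__ == "__main__":
    rows = [[1.0, 10.0], [2.0, None], [None, 30.0], [4.0, 40.0]]
    print(drop_missing_rows(rows))  # [[1.0, 10.0], [4.0, 40.0]]
    ages = [20.0, None, 30.0, 100.0]
    print(impute(ages, "mean"))    # [20.0, 50.0, 30.0, 100.0]
    print(impute(ages, "median"))  # [20.0, 30.0, 30.0, 100.0]
```

The example column shows why the choice matters. A single large value (100.0) pulls the mean up to 50.0, while the median fill of 30.0 is less sensitive to it.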

Data Manipulation in R with data.table

Bhuwanesh Nainwal
Updated on 17-Jan-2023 14:17:38

1K+ Views

Data manipulation is a crucial step in the data analysis process, as it allows us to prepare and organize our data in a way that suits the specific analysis or visualization. There are many different tools and techniques for data manipulation, depending on the type and structure of the data and on the goals of the manipulation. The data.table package is an R package that provides an enhanced version of R's data.frame class. Its syntax and features make it easier and faster to manipulate and work with large datasets. The data.table is one ...

Introduction to Data Science in Python

Prabhdeep Singh
Updated on 11-Jan-2023 11:31:06

511 Views

As the world entered the era of big data in recent decades, the demand for more effective and efficient data storage expanded greatly. Businesses that use big data invest a lot of time and energy in creating frameworks that can hold large amounts of information, and frameworks like Hadoop made it possible to store vast amounts of data. Once such frameworks had solved the storage problem, the next issue was processing the data that had already been stored. The solution to processing the data and getting the useful information ...

Introduction to Git for Data Science

Prabhdeep Singh
Updated on 11-Jan-2023 11:20:43

888 Views

The data science and engineering fields interact more and more, because data scientists are working on production systems and joining R&D teams. We want to make it simpler for data scientists without prior engineering experience to understand core engineering best practices. We are building a guide to the engineering subjects that data science practitioners often think about, such as Git, Docker, cloud infrastructure, and model serving. Introduction to Git: Git is a version control system designed to track changes to source code over time. Without a version control system, collaboration among multiple people ...

Python Data Science using List and Iterators

Prabhdeep Singh
Updated on 11-Jan-2023 11:23:00

185 Views

Data science is the process of organizing, processing, and analyzing vast amounts of data in order to extract knowledge and insights from it. It draws on a number of fields, including statistical and mathematical modelling, extracting data from its sources, and data visualization. It commonly requires working with big data technologies to gather both structured and unstructured data. In the parts that follow, we'll examine several applications of data science and how Python can be useful there. Python is a widely used high-level, general-purpose, object-oriented, and interpreted language. To use Python for a task, one only needs to ...
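The excerpt does not show the article's own examples. The following minimal sketch, which is not taken from the article, shows one way a Python list and iterators can work together on data. A list holds the raw values, and two hypothetical generator functions, `parse_readings` and `running_mean`, process them lazily, one item at a time.

```python
from typing import Iterable, Iterator, List


def parse_readings(lines: Iterable[str]) -> Iterator[float]:
    """Lazily convert raw text entries to floats, skipping blank entries."""
    for line in lines:
        line = line.strip()
        if line:  # blank entries are skipped; anything else must parse as a float
            yield float(line)


def running_mean(values: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all values seen so far, one value at a time."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
        yield total / count


raw: List[str] = ["3", "5", "", "7"]  # hypothetical raw data held in a list
# Nothing is computed until list() pulls items through the two generators.
means = list(running_mean(parse_readings(raw)))
print(means)  # [3.0, 4.0, 5.0]
```

Because the generators hold only one item at a time, the same pipeline could consume an iterator over a large file instead of an in-memory list.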

Introduction to Python for Data Science

Prabhdeep Singh
Updated on 11-Jan-2023 11:15:18

193 Views

Python is a general-purpose, object-oriented, interpreted, high-level language and is very popular in the market. Python has a rich set of libraries with ready-made code for almost every purpose, so using Python for a task mostly requires supplying the logic, while the libraries handle much of the coding. Python also has a large community of developers, which benefits newcomers and experienced Python users alike, since help with bugs and other problems is readily available. Before introducing Python for data science, let’s look at some basics of data science. What is Data Science? ...

Software Engineering for Data Scientists in Python

Prerna Tiwari
Updated on 09-Jan-2023 16:41:06

203 Views

Data science integrates math and statistics, specialized programming, advanced analytics, machine learning, and artificial intelligence (AI) with specific subject matter expertise to reveal actionable insights hidden in an organization’s data. Data science is one of the fastest-growing fields across all industries, driven by the increasing number of data sources and the volume of data they produce. Ever since it began to gain recognition, data science has generated controversy among other disciplines. In this article, we will learn about the fundamentals of software engineering, why it ...

Parallel Computing with Dask

Prerna Tiwari
Updated on 09-Jan-2023 16:08:30

426 Views

Dask is a flexible open-source Python library for parallel computing. In this article, we will learn about parallel computing and why we should choose Dask for it. We will compare it with other libraries such as Spark, Ray, and Modin, and we will also discuss use cases of Dask. Parallel Computing: Parallel computing is a type of computation that carries out several calculations or processes simultaneously. Large problems are typically divided into smaller pieces that can be solved separately. The four categories of parallel computing are bit-level, instruction-level, data-level, and job parallelism. ...
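The idea of dividing a large problem into pieces that are solved separately (data-level parallelism) can be sketched without Dask. This minimal example uses Python's standard `concurrent.futures` module instead of Dask's API. It splits the data into chunks, runs the same hypothetical task (`sum_of_squares`) on each chunk independently, and combines the partial results. The function names and the default of four chunks are illustrative assumptions.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_into_chunks(data: Sequence[int], n_chunks: int) -> List[Sequence[int]]:
    """Split `data` into at most `n_chunks` contiguous, roughly equal pieces."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    if not data:
        return []
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk: Sequence[int]) -> int:
    """The per-chunk task: it does not depend on any other chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(
    data: Sequence[int],
    n_chunks: int = 4,
    executor_cls: Callable[..., Executor] = ThreadPoolExecutor,
) -> int:
    """Run the same task on every chunk concurrently, then combine the results."""
    chunks = split_into_chunks(data, n_chunks)
    with executor_cls(max_workers=n_chunks) as pool:
        partial_results = list(pool.map(sum_of_squares, chunks))
    return sum(partial_results)


if __name__ == "__main__":
    numbers = list(range(1, 101))
    print(parallel_sum_of_squares(numbers))  # 338350
```

Threads are the default here only for portability. Because of CPython's global interpreter lock, CPU-bound pure-Python work like this does not actually run on several cores. Passing `concurrent.futures.ProcessPoolExecutor` as `executor_cls`, with the call inside an `if __name__ == "__main__":` guard, is the usual standard-library way to get that. Libraries such as Dask build on the same split, compute, and combine pattern.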

Data Analysis with Spreadsheets

Prerna Tiwari
Updated on 09-Jan-2023 16:30:14

400 Views

Cleansing, transforming, and analyzing raw data are the first steps in obtaining useful, pertinent information that can help businesses reach informed conclusions. By providing relevant information and facts, usually presented as charts, pictures, tables, and graphs, this approach helps lower the risks associated with decision-making. Data analysis is concerned with converting unprocessed data into pertinent statistics, knowledge, and explanations, and it is a crucial skill that can support better decision-making. Spreadsheets are the most commonly used tools for data analysis, and built-in pivot tables are their most popular analytical feature. ...
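A pivot table summarizes rows by category. The following minimal Python sketch, which is not a spreadsheet feature or the article's own example, mimics a pivot with one field as rows, another as columns, and the SUM of a numeric field as values. The record layout (region, quarter, sales) and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str, float]  # (region, quarter, sales): hypothetical fields


def pivot_sum(records: List[Record]) -> Dict[str, Dict[str, float]]:
    """Sum sales for each (region, quarter) pair: regions as rows,
    quarters as columns, SUM of sales as values."""
    table: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for region, quarter, sales in records:
        table[region][quarter] += sales
    # Convert to plain dicts so the result prints cleanly
    return {region: dict(cols) for region, cols in table.items()}


records = [
    ("North", "Q1", 100.0),
    ("North", "Q1", 50.0),
    ("North", "Q2", 80.0),
    ("South", "Q1", 70.0),
    ("South", "Q2", 30.0),
]
for region, cols in pivot_sum(records).items():
    print(region, cols)
# North {'Q1': 150.0, 'Q2': 80.0}
# South {'Q1': 70.0, 'Q2': 30.0}
```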
