
I am an aspiring data scientist who enjoys connecting the dots, be it ideas from different disciplines or people from different teams.


Students need to focus on six areas during their education to ensure that they are trained well to become professionals

The industry-academia gap is probably the most clichéd term heard during campus placements. Because campus recruits cannot be deployed directly on projects, rigorous training for them is a prime concern for many organizations. This training entails a heavy investment of time and money and often becomes a reason to avoid campus hires.

Therefore, it is important for students to be job-ready from day one. They should look upon college education as a medium for overall personality growth…

Image by Bpodataentryhelp

Data cleansing, or data cleaning, is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.

The steps and techniques for data cleaning will vary from dataset to dataset. As a result, it’s impossible for a single guide to cover everything you might run into.
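As one small, dataset-specific illustration of the detect, correct, or remove loop described above, here is a minimal sketch in plain Python. The records, field names, and validation rules are all hypothetical assumptions chosen for the example; a real dataset would need its own rules.

```python
# Hypothetical records; the field names and the rules below are assumptions.
raw_records = [
    {"id": 1, "name": " Alice ", "age": "34"},
    {"id": 2, "name": "Bob", "age": "-5"},      # inaccurate: negative age
    {"id": 3, "name": "", "age": "41"},         # incomplete: missing name
    {"id": 1, "name": " Alice ", "age": "34"},  # duplicate of id 1
    {"id": 4, "name": "Dana", "age": "n/a"},    # incorrect: non-numeric age
    {"id": 5, "name": "Eve", "age": " 29 "},    # fixable: stray whitespace
]


def clean(records):
    """Return (cleaned, rejected) after simple detect/correct/remove rules."""
    cleaned, rejected, seen_ids = [], [], set()
    for rec in records:
        name = rec["name"].strip()        # correct: trim stray whitespace
        age_text = rec["age"].strip()
        if rec["id"] in seen_ids:         # remove: duplicate record
            rejected.append((rec, "duplicate id"))
            continue
        if not name:                      # remove: incomplete record
            rejected.append((rec, "missing name"))
            continue
        if not age_text.isdigit():        # remove: not a non-negative integer
            rejected.append((rec, "invalid age"))
            continue
        seen_ids.add(rec["id"])
        cleaned.append({"id": rec["id"], "name": name, "age": int(age_text)})
    return cleaned, rejected


cleaned, rejected = clean(raw_records)
for rec in cleaned:
    print(rec)
for rec, reason in rejected:
    print("rejected:", rec["id"], reason)
```

Keeping the rejected records together with a reason, rather than silently dropping them, makes it easier to review whether a rule is too strict for a given dataset.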

Better Data > Fancier Algorithms

Data cleaning is one of those things that everyone does but no one really talks about. Sure, it’s…

With large amounts of data comes a greater need to process that data accurately, and analysis over very large datasets can be a difficult task. Reduction techniques are therefore employed during data preprocessing in data mining to get the required results, and they can be applied in the following ways.

Image by @ibm
  1. Data Cube Aggregation:
    A data cube is constructed using the operation of data aggregation.
  2. Attribute Subset Selection:
    Using only the attributes that are highly relevant is usually the right approach. Unnecessary data can always be discarded. …
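The two techniques listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the sales facts, the column names, the use of Pearson correlation as the relevance measure, and the 0.5 threshold are all hypothetical choices, not something the original text prescribes.

```python
from collections import defaultdict

# Hypothetical quarterly sales facts: (year, region, quarter, amount).
sales = [
    (2022, "North", "Q1", 100), (2022, "North", "Q2", 150),
    (2022, "South", "Q1", 80), (2022, "South", "Q2", 70),
    (2023, "North", "Q1", 120), (2023, "South", "Q1", 90),
]


def aggregate(rows, key_indexes):
    """Sum the amount (last field) for each combination of chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[-1]
    return dict(totals)


by_year_region = aggregate(sales, (0, 1))  # roll quarters up to (year, region)
by_year = aggregate(sales, (0,))           # roll up further to year only


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def select_attributes(columns, target, threshold=0.5):
    """Keep attributes whose absolute correlation with the target is high."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]


# Hypothetical attributes and target variable.
columns = {
    "hours_studied": [1, 2, 3, 4, 5],
    "shoe_size": [7, 9, 7, 9, 8],
    "constant_flag": [1, 1, 1, 1, 1],
}
exam_score = [52, 58, 65, 71, 78]

selected = select_attributes(columns, exam_score)
print(by_year_region, by_year, selected)
```

Each roll-up produces fewer rows than the level below it, which is where the reduction comes from; the attribute filter reduces the number of columns instead.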

Data transformation is often used in processes such as data migration, data integration or data management tasks such as data wrangling and data warehousing.

Image by University of Sussex

In projects involving data analytics, the data can be transformed at different stages. Companies with on-premises data warehouses typically use the ETL (extract, transform, and load) process, in which data transformation is the middle step. Many organizations now use cloud-based data warehouses, where computing and storage resources can be scaled with very low latency, measured in seconds. This scalability allows organizations to bypass the preload transformations and load raw data…
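To make the difference in ordering concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for a warehouse. The table names, sample rows, and the load-then-transform variant (commonly called ELT) are assumptions made for illustration; the paragraph above is truncated, so this is one reading of where it was heading rather than the author's own pipeline.

```python
import sqlite3

# Hypothetical source rows, as they might arrive from an extract step.
source_rows = [("alice", "12.50"), ("BOB", "7.25"), ("alice", "3.00")]


def etl(rows):
    """Extract, transform in application code, then load the cleaned rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    transformed = [(name.lower(), float(amount)) for name, amount in rows]
    db.executemany("INSERT INTO orders VALUES (?, ?)", transformed)
    return db


def elt(rows):
    """Extract, load the raw rows as-is, then transform inside the database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE raw_orders (customer TEXT, amount TEXT)")
    db.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)
    db.execute(
        "CREATE TABLE orders AS "
        "SELECT lower(customer) AS customer, CAST(amount AS REAL) AS amount "
        "FROM raw_orders"
    )
    return db


def totals(db):
    """Total amount per customer from the transformed orders table."""
    query = ("SELECT customer, SUM(amount) FROM orders "
             "GROUP BY customer ORDER BY customer")
    return db.execute(query).fetchall()


etl_totals = totals(etl(source_rows))
elt_totals = totals(elt(source_rows))
print(etl_totals, elt_totals)
```

Both paths end with the same `orders` table. The difference is where the transformation runs: before the load in application code, or after the load using the warehouse's own compute.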

The four components of Data Preprocessing are:

  1. Data Integration
  2. Data Transformation
  3. Data Reduction
  4. Data Cleaning

Data Integration

Image by @aimultiple

Data integration is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information. A complete data integration solution delivers trusted data from various sources to support a business-ready data pipeline for DataOps.

Data integration involves combining data from several disparate sources, which are stored using various technologies, and providing a unified view of the data. Data integration becomes increasingly important when merging the systems of two companies or consolidating applications within one company…
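As a rough sketch of what a unified view means in practice, the following plain-Python example merges two hypothetical sources that describe the same customers with different schemas. The source names, field names, and the choice to keep customers that appear in only one source are assumptions for illustration.

```python
# Two hypothetical sources describing customers with different schemas.
crm_rows = [
    {"customer_id": 1, "full_name": "Alice Rao", "email": "ALICE@example.com"},
    {"customer_id": 2, "full_name": "Bob Iyer", "email": "bob@example.com"},
]
billing_rows = [
    {"cust": "1", "total_spent": "120.00"},
    {"cust": "3", "total_spent": "45.50"},
]


def empty_record(cid):
    """A unified record with every field present but not yet filled in."""
    return {"id": cid, "name": None, "email": None, "total_spent": None}


def integrate(crm, billing):
    """Merge both sources into one record per customer id (full outer join)."""
    unified = {}
    for row in crm:
        cid = int(row["customer_id"])
        record = unified.setdefault(cid, empty_record(cid))
        record["name"] = row["full_name"]
        record["email"] = row["email"].lower()  # normalize email casing
    for row in billing:
        cid = int(row["cust"])                  # billing stores ids as text
        record = unified.setdefault(cid, empty_record(cid))
        record["total_spent"] = float(row["total_spent"])
    return [unified[cid] for cid in sorted(unified)]


unified_view = integrate(crm_rows, billing_rows)
for record in unified_view:
    print(record)
```

Most of the work is reconciling keys and types (`customer_id` as an integer versus `cust` as text) so that records from different technologies line up under one schema.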

When we talk about data, we usually think of large datasets with a huge number of rows and columns. While that is a likely scenario, it is not always the case: data can take many different forms, such as structured tables, images, audio files, and videos.

Machines don’t understand free text, image, or video data as it is; they understand 1s and 0s. So it probably won’t be good enough to put on a slideshow of all our images and expect our machine learning model to get trained just by that!
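To illustrate the point about turning raw inputs into numbers, here is a minimal sketch in plain Python. It uses a simple bag-of-words count for text and pixel scaling for a tiny image; both are just one possible encoding, and the sample sentences and the 2x2 "image" are hypothetical.

```python
def build_vocabulary(sentences):
    """Map each distinct lowercase word to an index, in first-seen order."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def bag_of_words(sentence, vocab):
    """Count how often each vocabulary word occurs in the sentence."""
    vector = [0] * len(vocab)
    for word in sentence.lower().split():
        if word in vocab:
            vector[vocab[word]] += 1
    return vector


sentences = ["the cat sat", "the dog sat on the cat"]
vocab = build_vocabulary(sentences)
vectors = [bag_of_words(s, vocab) for s in sentences]

# A tiny hypothetical 2x2 grayscale image: one 0-255 intensity per pixel.
image = [[0, 255], [128, 64]]
# Flatten row by row and scale each pixel to the 0..1 range.
flat_pixels = [pixel / 255 for row in image for pixel in row]

print(vocab)
print(vectors)
print(flat_pixels)
```

After this step, both the sentences and the image are fixed-length lists of numbers, which is the form a learning algorithm can actually consume.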

In any Machine Learning process, Data Preprocessing…

Two years into a Computer Science bachelor's degree at SRM University was all it took for me to realize that I wanted to shift towards a more data-centric career. Seeing the skill set of a data scientist (coding, math and statistics, machine learning, …) was the moment I decided where my long-term passions lay.

To my knowledge, the skills required to be a data scientist are:

  • Statistics
  • At least one programming language, such as R or Python
  • Data Extraction, Transformation, and Loading
  • Data Wrangling and Data Exploration
  • Machine Learning Algorithms
  • Advanced Machine Learning (Deep Learning)
  • Big Data Processing Frameworks
  • Data Visualization


Gurram Bhaskar
