Data Science for Beginners: 2023 - 2024 Road Map



Data science is the art and science of finding hidden insights in data using math, statistics, programming, and AI (source: Google). As a software developer, I see it as an alternative approach to problem-solving: instead of writing manual instructions for a "dumb box" (the computer), you derive those instructions from examples, the same way human beings learn new concepts. This makes it possible to solve complex problems that would be daunting with traditional programming.

That said, if you want to get into Data Science, where do you begin? This article seeks to give you a way to go about it from noob to expert.

Let's start by addressing key tools that you would need to get started:

  1. Programming languages: Python or R, and SQL

  2. Machine learning libraries: TensorFlow, Keras, Scikit-learn, PyTorch, or JAX

  3. Data visualization tools: Tableau, Power BI, and Matplotlib

  4. Data storage and management systems: Databases like MySQL, MongoDB, and PostgreSQL

  5. Cloud computing platforms: AWS, Azure, and Google Cloud Platform

  6. Version control: Git, hosted on a platform like Bitbucket

  7. Data Science Community: Kaggle platform

You don't need to know every single one of them. Pick one programming language (Python or R), one database tool, and one visualization tool; get familiar with one cloud computing platform; use version control to store your data-driven apps; and join Kaggle to collaborate and contribute your knowledge and expertise.

Next, dive into data collection and data cleaning

Data is at the heart of data science. Immerse yourself in projects (you can find them on Kaggle); this will force you to look for data on your own to use in model creation. Kaggle itself is a good source of data, but you can also generate synthetic or test data with tools like:

  • DTM Data Generator.

  • Mockaroo.

  • Redgate SQL Data Generator.

  • MOSTLY AI.

  • DATPROF.

Another way of getting data is web scraping with browser-automation libraries like Puppeteer and Selenium.
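As a rough illustration, here is a minimal Selenium sketch in Python (one of the two libraries mentioned above). The URL and the CSS selector are hypothetical placeholders, and you should always check a site's terms of service and robots.txt before scraping it.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Hypothetical target page; adjust to whatever site you are allowed to scrape.
URL = "https://example.com/products"

driver = webdriver.Chrome()  # requires a local Chrome installation
try:
    driver.get(URL)
    # Hypothetical CSS selector; inspect the page to find the right one.
    titles = [el.text for el in driver.find_elements(By.CSS_SELECTOR, ".product-title")]
    print(titles)
finally:
    driver.quit()  # always release the browser, even if scraping fails
```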

The data you obtain is usually raw, so it may be laden with mistakes. It's up to you to identify and correct them (invalid formats, missing values, etc.).

You will use tools like Pandas and NumPy to explore and modify the noisy data.
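For instance, here is a small Pandas sketch on made-up data showing how you might coerce invalid formats and handle missing values; the column names, the median fill, and the decision to drop rows without a valid date are just assumptions for the example.

```python
import pandas as pd

# Made-up raw dataset with typical problems: wrong types, missing values, invalid entries.
raw = pd.DataFrame({
    "age": ["34", "27", None, "forty", "45"],
    "signup_date": ["2023-01-15", "2023-02-15", "not a date", None, "2023-04-20"],
})

# Coerce invalid entries to NaN/NaT instead of raising errors.
raw["age"] = pd.to_numeric(raw["age"], errors="coerce")
raw["signup_date"] = pd.to_datetime(raw["signup_date"], errors="coerce")

# Handle missing values: fill numeric gaps with the median, drop rows with no valid date.
raw["age"] = raw["age"].fillna(raw["age"].median())
clean = raw.dropna(subset=["signup_date"])
print(clean)
```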

One project idea for learning data collection is pulling data from the Twitter API and then analyzing it, or doing sentiment analysis for a more advanced project.
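If you go the Twitter route, a sketch with the tweepy library might look like the following. It assumes you have developer credentials with access to the recent-search endpoint (access tiers change over time, so check the current API terms), and the bearer token and query string are placeholders.

```python
import tweepy

# Placeholder credential; requires a Twitter/X developer account with search access.
client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")

# Example query: recent English tweets about data science, excluding retweets.
response = client.search_recent_tweets(
    query="data science -is:retweet lang:en", max_results=10
)

# response.data is None when nothing matches, so guard against that.
for tweet in response.data or []:
    print(tweet.text)
```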

Next, get into data analysis and practice storytelling

This is simple: write articles as you learn, like I am :)

Dive into data analysis by plotting data and creating visualizations as a way of drawing insights from data. Explore financial, health, and demographic databases and devise formulas to help these industries be efficient or make a profit - just a thought :)

Practice asking questions that target business metrics; plot data using Plotly or Seaborn; and practice formatting, filtering, handling missing values and outliers, and running univariate and multivariate analysis.
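Here is a small Seaborn sketch of the kind of univariate, outlier, and multivariate checks mentioned above. It uses Seaborn's built-in "tips" demo dataset (fetched over the internet on first use), and the IQR rule for outliers is just one common heuristic, not the only option.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Small demo dataset of restaurant bills bundled with Seaborn.
df = sns.load_dataset("tips")

# Univariate analysis: distribution of a single numeric column.
sns.histplot(df["total_bill"], bins=20)
plt.show()

# Outlier check using the interquartile range (IQR) rule.
q1, q3 = df["total_bill"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["total_bill"] < q1 - 1.5 * iqr) | (df["total_bill"] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} potential outliers")

# Multivariate analysis: relationship between two variables, split by a category.
sns.scatterplot(data=df, x="total_bill", y="tip", hue="time")
plt.show()
```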

Level up

At this stage, you can handle data collection and cleaning as well as exploratory data analysis.

Now it's time to dive deep and learn how machine learning algorithms work. You do exactly that by learning applied statistics and mathematics. At this stage, you focus on descriptive statistics, inferential statistics, linear algebra, and multivariate statistics.
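To make this concrete, here is a tiny NumPy/SciPy sketch that pairs descriptive statistics with a simple inferential test; the two samples and the scenario behind them are purely made up for illustration.

```python
import numpy as np
from scipy import stats

# Made-up samples, e.g. user ratings of two product variants.
group_a = np.array([4.1, 3.8, 4.5, 4.0, 3.9, 4.2])
group_b = np.array([4.6, 4.4, 4.8, 4.3, 4.7, 4.5])

# Descriptive statistics: summarize each sample.
print("mean A:", group_a.mean(), "std A:", group_a.std(ddof=1))
print("mean B:", group_b.mean(), "std B:", group_b.std(ddof=1))

# Inferential statistics: two-sample t-test for a difference in means.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```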

At this point, you are ready for Machine Learning

You can now dive into machine learning to build models that leverage data to make predictions. Learn the various stages of building a machine learning model and the various types of models. Choose a framework/tool (TensorFlow, PyTorch, Scikit-learn, Keras, JAX) and start building.
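As a minimal sketch of that workflow with Scikit-learn, the example below trains and evaluates a simple classifier on a built-in dataset; the choice of logistic regression and the 80/20 split are just illustrative defaults, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in dataset so the example runs without external files.
X, y = load_iris(return_X_y=True)

# Split into training and test sets to estimate generalization.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a simple classifier and evaluate it on unseen data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, predictions))
```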

This is the final and most exciting stage - at least for me!

I hope you enjoyed this article, like or comment if you found it insightful. Keep learning!