Hello all! I have been writing on Medium for some time, but I just realized that I never created a complete About Me page. Well, here it is.

Author image in his Environment

My name is Cornellius Yudha Wijaya, and I originally come from Indonesia. Currently, I work full-time as a Data Scientist at Allianz Life Indonesia and do some content creation here and there about Data Science. Mostly, my writing is my learning material or something that was sparked by my conversations with other people. That is why the more people discuss with me, the more ideas I get.

A…


Which pandas data frame EDA packages suit you?

GIF created by Author

As data scientists, our work always involves exploring data, often called Exploratory Data Analysis (EDA). The purpose of exploring data is to know our data better and grasp what we are dealing with.

Previously, exploring data with a pandas data frame was a big hassle because we needed to code every single analysis from scratch. Not only does it take a lot of time, but it also takes our focus.

Take the mpg dataset below as an example.

import pandas as pd
import seaborn as sns
mpg = sns.load_dataset('mpg')
mpg.head()
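To give a feel for what coding every analysis from scratch looks like, here is a minimal sketch of a hand-written EDA helper. It is only an illustration: the `manual_eda` function and the small data frame are my own assumptions, and the rows are made-up stand-ins rather than real values from the mpg dataset.

```python
import pandas as pd

# Toy stand-in for the mpg data. These values are made up for illustration
# and are NOT rows from the real dataset.
df = pd.DataFrame(
    {
        "mpg": [18.0, 15.0, 24.0, None, 31.0],
        "cylinders": [8, 8, 4, 4, 4],
        "origin": ["usa", "usa", "japan", "europe", "japan"],
    }
)


def manual_eda(frame: pd.DataFrame) -> dict:
    """Collect a few basic EDA summaries by hand (hypothetical helper)."""
    numeric = frame.select_dtypes(include="number")
    categorical = frame.select_dtypes(exclude="number")
    return {
        # Number of rows and columns
        "shape": frame.shape,
        # Missing values per column
        "missing": frame.isna().sum().to_dict(),
        # Count, mean, std, min, quartiles, max for numeric columns
        "numeric_summary": numeric.describe(),
        # Pairwise Pearson correlation between numeric columns
        "correlation": numeric.corr(),
        # Frequency of each category in non-numeric columns
        "category_counts": {
            col: categorical[col].value_counts().to_dict()
            for col in categorical.columns
        },
    }


report = manual_eda(df)
print(report["shape"])
print(report["missing"])
print(report["numeric_summary"])
print(report["category_counts"])
```

Even this short helper covers only a handful of checks, which is the kind of repetitive work that makes manual exploration time-consuming.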


Alternatives you might want to use for many reasons.

Image by Author

As modern data scientists, programming is the main tool that we must master. However, unlike software developers or engineers, we use programming languages for data analysis and not for software development. That is why we data scientists are taught to use a data analysis IDE for our work.

Jupyter Notebook is the main environment where many data scientists start because it is the easiest to understand and is integrated with the Anaconda environment (which many people use to set up their Data Science platform). …


Elevate your time series analysis with these Python packages

Photo by Agê Barros on Unsplash

As a Data Scientist, you are employed because of your skill in data analysis and machine learning. One of the analyses the business often requests is a business forecast, especially a time-related forecast. For example, how would our sales do with the current strategy, or how would an investment perform in the future? This kind of problem is what we call a time-series analysis problem.

Many businesses want data scientists to solve these problems, so it is good for data scientists to learn about time-series analysis, especially with Python. For that reason, I want to outline my top three Python Package…


Basic knowledge to understand machine learning math (with Python)

Image by Author

When we learn machine learning for the first time, it might be challenging because many strange notations and mathematical concepts are hard to understand. Especially for people who are not formally taught, it is similar to learning a new language.

One of the basic concepts that I feel people need to understand when learning math for machine learning is the concept of numbers and sets. Many equations, from simple to advanced, include these number concepts, and different kinds of numbers lead to different conclusions.

In this article, I will explain the concept of Numbers and Sets in math with coding examples to help…
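As a small taste of how numbers and sets map to code, here is a minimal sketch using only the Python standard library. The variable names and values are my own choices for illustration and are not taken from the full article.

```python
from fractions import Fraction

# Number families, from narrowest to widest.
natural = 7                # natural number (a counting number)
integer = -7               # integer: naturals, zero, and negatives
rational = Fraction(3, 4)  # rational: a ratio of two integers
real = 2 ** 0.5            # real number; a float only approximates sqrt(2)
complex_number = 3 + 4j    # complex: real part plus imaginary part

# Sets: unordered collections of distinct elements.
A = {1, 2, 3, 4}
B = {3, 4, 5}

union = A | B           # elements in A or B
intersection = A & B    # elements in both A and B
difference = A - B      # elements in A but not in B
is_member = 2 in A      # membership test: is 2 an element of A?
is_subset = {1, 2} <= A  # subset test: is {1, 2} contained in A?

print(union, intersection, difference, is_member, is_subset)
print(rational + Fraction(1, 4), abs(complex_number))
```

Using `Fraction` keeps rational arithmetic exact, whereas the float used for the real number is only an approximation.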


Notes from Industry

Basic analysis that people often overlook

Image by Author

As Data Scientists, developing machine learning models is a part of our daily job and the reason we are employed in the first place. However, the machine learning model we develop is not just for show but an actual tool to solve a business problem. This is why we need to evaluate our machine learning model: to measure how it impacts the business.

Many data scientists measure their models with technical metrics such as Accuracy, Precision, F1 Score, ROC-AUC, and many more. These metrics are necessary, but sometimes they do not reflect how the model would do…
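For reference, here is a minimal sketch of how some of these technical metrics can be computed by hand for a binary classifier. The `classification_metrics` function and the label lists are hypothetical and made up for illustration. Recall is included because F1 depends on it, and ROC-AUC is left out because it needs predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels, purely for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```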


You need to interpret your Machine Learning model

Image by Author

Developing a machine learning model is something that is expected from any data scientist. I have come across many data science studies that focus only on the modelling and evaluation aspects, without the interpretation.

However, many haven’t realized the importance of machine learning interpretability in the business process. In my experience, business people would want to know how the model works rather than the metric evaluation itself.

That is why, in this post, I want to introduce you to some of my top Python packages for machine learning interpretability. Let’s get into it!

1. Yellowbrick

Yellowbrick is an open-source Python package that…


Boost your Machine Learning knowledge with these Python packages

Photo by Markus Winkler on Unsplash

As data scientists, one of the reasons we are employed is our machine learning skills. On paper, it sounds exciting to learn about artificial intelligence and machine learning. Still, as we go deeper into the matter, we realize that machine learning is not as easy as it looks.

You might produce a supervised machine learning model with a single line of code, like what all the experts do in the industry. Many experts have packaged the complex math and statistics behind the model into one-liner code that helps with our everyday job. …


How the classroom and industrial employment experiences differ

Photo by Scott Graham on Unsplash

The Data field is a vast world with many branches. Therefore, you could hold many titles in this field, such as Data Scientist, Data Analyst, Data Engineer, Business Intelligence, Machine Learning Engineer, and many more. However, whatever data title you currently hold, it always starts somewhere, and it starts from education.

I hold the title of Data Scientist, but I previously worked as a Data Educator. I have worked in both industrial and academic environments, and I even have experience educating industrial people. This means that I know what you could expect…


Alternative workflows comparison for your Data Analysis with Python

Photo by Campaign Creators on Unsplash

For many modern data scientists, Python is the programming language used for their everyday work. As a consequence, data analysis is done using one of the most popular data packages, Pandas. Many online courses and lectures introduce Pandas as the basis for every data analysis with Python.

In my opinion, Pandas is still the most useful and viable package to do your data analysis in Python. However, for comparison purposes, I want to introduce you to several Pandas package alternatives.

Cornellius Yudha Wijaya

Data Scientist @ Allianz | LinkedIn: Cornellius Yudha Wijaya | Twitter: @CornelliusYW
