Hello all! I have been writing on Medium for some time, but I just realized that I never created a complete About Me page. Well, here it is.
My name is Cornellius Yudha Wijaya, and I am originally from Indonesia. Currently, I work full-time as a Data Scientist at Allianz Life Indonesia and create content about Data Science here and there. Mostly, my writing is my own learning material or something sparked by a conversation with other people, which is why the more people discuss things with me, the more ideas I get.
As data scientists, our work always involves exploring data, often called Exploratory Data Analysis (EDA). The purpose of exploring data is to know our data better and grasp what we are dealing with.
Previously, exploring data with a pandas DataFrame was a big hassle because we needed to code every single analysis from scratch. Not only does this take a lot of time, but it also takes our mental focus.
Take the mpg dataset below as an example.
import pandas as pd
import seaborn as sns
mpg = sns.load_dataset('mpg')
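To give a feel for what coding every analysis from scratch can look like, here is a minimal sketch. It assumes a tiny hand-made DataFrame (`toy_mpg`) whose column names mimic the mpg dataset; the rows are invented for illustration and are not the real data, so the snippet runs without downloading anything.

```python
import pandas as pd

# Toy stand-in for the mpg dataset: the column names follow seaborn's mpg data,
# but these four rows are made up for illustration only.
toy_mpg = pd.DataFrame({
    "mpg": [18.0, 15.0, 36.0, 27.0],
    "cylinders": [8, 8, 4, 4],
    "horsepower": [130.0, 165.0, None, 88.0],
    "origin": ["usa", "usa", "japan", "europe"],
})

# Every question about the data needs its own line of code.
summary = toy_mpg.describe()                             # basic statistics for numeric columns
missing = toy_mpg.isna().sum()                           # number of missing values per column
origin_counts = toy_mpg["origin"].value_counts()         # frequency of each category
mpg_by_cyl = toy_mpg.groupby("cylinders")["mpg"].mean()  # average mpg per cylinder count

print(summary, missing, origin_counts, mpg_by_cyl, sep="\n\n")
```

Even for four columns, each check is a separate call, which is the kind of repetition described above.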
As modern data scientists, programming is the main tool that we must master. Unlike software developers or engineers, though, we use programming languages for data analysis rather than for software development. That is why we data scientists are taught to use a data analysis IDE for our work.
Jupyter Notebook is the environment where many data scientists start because it is the easiest to understand and is integrated with the Anaconda environment (which many people use to set up their data science platform). …
As a Data Scientist, you are employed because of your skill in data analysis and machine learning. One of the analyses the business often requests is a forecast, especially a time-related one. For example, how will our sales do under the current strategy, or how will an investment perform in the future? This is what we call a time-series analysis problem.
Many businesses want data scientists to solve their problems, so it is good for data scientists to learn about time-series analysis, especially in Python. For that reason, I want to outline my top three Python Package…
When we learn machine learning for the first time, it can be challenging because many strange notations and mathematical concepts are hard to understand. Especially for people who were not formally taught, it is similar to learning a new language.
One of the basic concepts that I feel people need to understand when learning math for machine learning is the concept of numbers and sets. Many equations, from simple to advanced, involve these number concepts, and different kinds of numbers lead to different conclusions.
In this article, I will explain the concept of Numbers and Sets in math with coding examples to help…
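As a rough sketch of what such coding examples can look like, the snippet below uses only Python built-ins. The variable names and values are illustrative assumptions, not material from the article itself.

```python
from fractions import Fraction

# Number types, from the narrowest to the widest family.
natural = 3                # a counting number
integer = -3               # integers add zero and negatives
rational = Fraction(1, 3)  # a ratio of two integers, stored exactly
real = 2 ** 0.5            # float approximation of the irrational sqrt(2)

# Sets are unordered collections of distinct elements.
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b            # elements in a or b
intersection = a & b     # elements in both a and b
difference = a - b       # elements in a but not in b
is_member = 2 in a       # membership test
is_subset = {1, 2} <= a  # subset test

print(union, intersection, difference, is_member, is_subset)
```

Using `Fraction` keeps one third exact, whereas a float would only approximate it, which is one way different kinds of numbers can lead to different results.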
As data scientists, developing machine learning models is part of our daily job and the reason we are employed in the first place. However, the machine learning model we develop is not just for show but an actual tool to solve a business problem. This is why we need to evaluate our machine learning model: to measure how it impacts the business.
Many data scientists measure their models with technical metrics such as Accuracy, Precision, F1 Score, ROC-AUC, and many more. These metrics are necessary, but sometimes they do not reflect how the model would do…
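For readers unfamiliar with these technical metrics, here is a minimal sketch that computes a few of them by hand for a binary classifier. The labels and predictions are hypothetical values chosen for illustration, and plain Python is used instead of a metrics library so that each formula stays visible.

```python
# Hypothetical ground truth and model predictions (1 = positive class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))

# Confusion-matrix counts.
tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives

accuracy = (tp + tn) / len(pairs)  # share of all predictions that are correct
precision = tp / (tp + fp)         # share of predicted positives that are correct
recall = tp / (tp + fn)            # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(accuracy, precision, recall, f1)
```

None of these numbers says anything about cost or revenue on its own, which is the gap between technical and business evaluation raised above.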
Developing a machine learning model is something expected of any data scientist. I have come across many data science studies that focus only on modelling and evaluation, without interpretation.
However, many haven’t realized the importance of machine learning interpretability in the business process. In my experience, business people want to know how the model works more than they want the evaluation metrics themselves.
That is why, in this post, I want to introduce you to some of my top Python packages for machine learning interpretability. Let’s get into it!
Yellowbrick is an open-source Python package that…
As data scientists, one of the reasons we are employed is our machine learning skills. On paper, it sounds exciting to learn about artificial intelligence and machine learning. Still, as we go deeper into the matter, we realize that machine learning is not as easy as it looks.
You might produce a supervised machine learning model with a single line of code, just as the experts in the industry do. Many experts have distilled the complex math and statistics behind the model into one-liner code that helps with our everyday job. …
The data field is a vast world with many branches. Therefore, you could hold many titles in this field, such as Data Scientist, Data Analyst, Data Engineer, Business Intelligence, Machine Learning Engineer, and many more. However, whatever data title you currently hold, it always starts somewhere, and it starts from education.
I hold the title of Data Scientist, but I previously worked as a Data Educator. I have worked in both industrial and academic environments, and I even have experience educating people in industry. This means that I know what you could expect…
For many modern data scientists, Python is the programming language used for their everyday work. As a consequence, data analysis is done using one of the most widely used data packages, Pandas. Many online courses and lectures introduce Pandas as the basis for data analysis with Python.
In my opinion, Pandas is still the most useful and viable package for data analysis in Python. However, for comparison purposes, I want to introduce you to several alternatives to Pandas. …
Data Scientist @ Allianz | LinkedIn: Cornellius Yudha Wijaya | Twitter: @CornelliusYW