Markus_Aditya_Surya

PROFILE

Nationality: Indonesian (Singapore PR) ✉️: markus.widjaja@yahoo.com LinkedIn

ABOUT

I am Markus Aditya Surya Widjaja, a data analyst and civil engineer who advocates the application of data analytics and automation in the construction industry. I am passionate about exploring machine learning use cases for the construction business. I am experienced in handling multiple tasks, able to think critically and analytically, and comfortable communicating technical content to laymen. I grasp new concepts and skills quickly.

Education
Projects
Experience
Skills

EDUCATION

RISE by BCG (Business and Data Analytics Specialisation)

Mar 2021 – Sep 2021: 6 months (Graduated with Merit). Credentials here

Data manipulation/wrangling, analytics and visualisation
SQL, Python (Pandas, Scikit), PowerBI
Machine Learning, Capstone project completed with distinction

M.Sc. in Civil Engineering at National University of Singapore (Transportation Engineering Specialisation)

Aug 2018 – Jul 2019: 1 year

Specialisation in Transportation Engineering
Transportation demand forecasting methods
Basic understanding of Vehicular Routing Problem algorithms (VRP)

B.Eng (Merit) in Civil Engineering at National University of Singapore

Aug 2012 – Jul 2016: 4 years

Undergraduate ASEAN Scholarship Recipient

PROJECTS

I’m doing some data-related projects to slowly build my portfolio.

Multivariate Time Series Prediction

Market Price Prediction

A simple project on predicting the market value of the Japanese Yen (JPY) using VARMA and a Long Short-Term Memory (LSTM) deep-learning model.

Brief: Predicting the adjusted close values of “JPY=X” for the next 5 days, using the market data of several funds and other currencies from the last 10 years (2011 to September 2021).

Tools used: Python (Pandas, scipy, keras, etc.)

The market data are checked for correlation with the JPY data and then converted to their principal components with PCA, as shown below.
PCA
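The correlation check and PCA step can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project code: the column names (`fund_a`, `fund_b`, `fx_c`), the standardisation step and the choice of two components are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the market data (hypothetical column names).
rng = np.random.default_rng(0)
n = 500
base = rng.normal(size=n).cumsum()
market = pd.DataFrame({
    "JPY=X": base + rng.normal(scale=0.5, size=n),
    "fund_a": 0.8 * base + rng.normal(scale=0.5, size=n),
    "fund_b": -0.6 * base + rng.normal(scale=0.5, size=n),
    "fx_c": rng.normal(size=n).cumsum(),
})

# Correlation of every other series with the JPY series.
corr_with_jpy = market.corr()["JPY=X"].drop("JPY=X")

# Standardise the non-JPY series, then project them onto principal components.
features = market.drop(columns="JPY=X")
scaled = StandardScaler().fit_transform(features)
pca = PCA(n_components=2)  # number of components is an assumption
components = pca.fit_transform(scaled)
explained = pca.explained_variance_ratio_
```

`explained` gives the share of variance captured by each component, which is one way to decide how many components to keep.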

The data from 2011 to 2018 are used to train the model, and the remaining data from 2019 to 2021 are used as the test set. The VARMAX model result is shown below, with an RMSE of 2.336.
VARMAX VARMAX2
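The chronological train/test split and the RMSE metric can be sketched as below. The series is synthetic, and the "predict yesterday's value" forecast is only a hypothetical baseline used to exercise the metric; it is not the VARMAX model from the project.

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for the adjusted close (not real data).
idx = pd.date_range("2011-01-03", "2021-09-30", freq="B")
rng = np.random.default_rng(1)
prices = pd.Series(100 + rng.normal(scale=0.5, size=len(idx)).cumsum(), index=idx)

# Chronological split: no shuffling, so the test set is strictly later in time.
train = prices.loc[:"2018-12-31"]
test = prices.loc["2019-01-01":]

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Naive baseline (hypothetical, not the VARMAX model): predict the previous value.
history = pd.concat([train.iloc[[-1]], test])
naive_forecast = history.shift(1).iloc[1:]
baseline_rmse = rmse(test, naive_forecast)
```

Scoring any candidate model with the same `rmse` function on the same test window keeps the comparison between models like-for-like.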

Using the same train and test sets, an LSTM model is trained with the SGD optimiser. This model improves the RMSE to 1.06.
LSTM
The model is then used to predict the values for the next 5 days.
LSTM Prediction
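An LSTM expects its input as fixed-length windows of past observations. The sketch below shows one common way to build such windows with NumPy; the helper name `make_windows`, the 10-step lookback and the use of column 0 as the target are assumptions for illustration, with only the 5-day horizon taken from the brief.

```python
import numpy as np

def make_windows(data, lookback, horizon):
    """Slice a (time, features) array into LSTM-style samples.

    Returns X with shape (samples, lookback, features) and y with shape
    (samples, horizon), where y holds the next `horizon` values of column 0.
    """
    data = np.asarray(data, dtype=float)
    X, y = [], []
    for start in range(len(data) - lookback - horizon + 1):
        end = start + lookback
        X.append(data[start:end])
        y.append(data[end:end + horizon, 0])
    return np.array(X), np.array(y)

# Toy input: 20 time steps, 2 features.
series = np.arange(40, dtype=float).reshape(20, 2)
X, y = make_windows(series, lookback=10, horizon=5)
```

Arrays shaped like `X` and `y` are what a Keras LSTM layer followed by a dense output layer would typically be trained on.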

Click here to view codebase, charts and csv datasets

HR Analytics (BCG RISE Capstone Project)

Improving visibility and talent management at a multinational mining company (completed with distinction; credentials here)

A final capstone project done during the RISE program to propose solutions for better managing the organisation and to provide efficient recommendations to managers on managing employees.

Brief: HR leadership has limited visibility into the current workforce management process, which causes a high dependency on line managers and individual teams. The client is looking for:

Visualisation of talent database to enable efficient talent conversation
Employee profiling to facilitate talent development and management

Tools used: Python (Pandas, scipy, etc.), Microsoft Power BI

As the project involves real client and employee data, native files and datasets are not shared.

A/B testing mini project

A mini project done during the RISE program to test the best level for setting the gate of the game. The dataset contains user data with the number of rounds played and retention status.

Brief: Cookie Cats is a popular mobile puzzle game where players complete a task and level up. While levelling up, players encounter gates which force them to wait before continuing to play or to make in-game purchases.
Business Problem: Revenue from in-game purchases has been declining over time and the total number of active players is also declining, with players uninstalling the game after playing for a few days.
Hypothesis: Players are churning because the first gate encounter at level 30 is too early. An A/B test is performed comparing two groups of players: one encounters the gate at level 30 and the other at level 40.

Tools used: Python (Pandas, matplotlib, numpy) Dataset: Kaggle.com

The two groups are defined as Group A (gate at level 30) and Group B (gate at level 40). The sum of game rounds played is first checked for normality by plotting the histogram and Q-Q plot shown below and by running the Shapiro-Wilk test. All plots and tests indicate that the data are non-normal.
Normality Checks
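A normality check of this kind can be sketched with SciPy as follows. The samples are synthetic right-skewed data, not the Kaggle dataset, and the 0.05 significance level is an assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic, right-skewed stand-in for "sum of game rounds" (not the Kaggle data).
rounds_a = rng.exponential(scale=50, size=2000)
rounds_b = rng.exponential(scale=50, size=2000)

alpha = 0.05  # assumed significance level
for name, sample in [("A", rounds_a), ("B", rounds_b)]:
    stat, p_value = stats.shapiro(sample)
    verdict = "non-normal" if p_value < alpha else "no evidence against normality"
    print(f"Group {name}: W={stat:.3f}, p={p_value:.3g} -> {verdict}")

# Q-Q plot coordinates against a normal distribution (plot them with matplotlib).
(theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(
    rounds_a, dist="norm"
)
```

A small Shapiro-Wilk p-value, or Q-Q points that bend away from a straight line, both point towards using a non-parametric test afterwards.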

Both groups are then tested for equality of variances with Levene's test, which shows that the variances are equal. The two groups are then compared with a two-tailed Mann-Whitney U test, which yields the result below.

Twotailedtest
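The two steps above (Levene's test followed by a two-tailed Mann-Whitney U test) can be sketched with SciPy as below. The two samples are synthetic and their scales are made up, so the printed p-values say nothing about the real experiment.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Synthetic stand-ins for the sum of game rounds per player in each group.
gate_30 = rng.exponential(scale=52, size=3000)
gate_40 = rng.exponential(scale=50, size=3000)

# H0 for Levene's test: both groups have equal variances.
levene_stat, levene_p = stats.levene(gate_30, gate_40)

# H0 for the two-tailed Mann-Whitney U test: neither group tends to have
# larger values than the other.
u_stat, p_two_sided = stats.mannwhitneyu(gate_30, gate_40, alternative="two-sided")

print(f"Levene p={levene_p:.3f}, Mann-Whitney two-sided p={p_two_sided:.3f}")
```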

With the two-tailed test, the null hypothesis cannot be rejected (H1 is not supported), which means that changing the gate to level 40 would not yield any meaningful improvement. The two groups are then also compared with a one-tailed test, as the mean of the sum of game rounds played by Group A (gate 30) is noticeably higher than that of Group B (gate 40). For the one-tailed test, H1 is accepted: putting the gate at level 30 gives a higher sum of game rounds played than putting it at level 40.

Onetailedtest
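The difference between the two-tailed and one-tailed versions of the test comes down to the `alternative` argument. The sketch below uses synthetic data with a deliberately built-in shift in favour of the gate-30 group, purely to show how the direction of the alternative changes the p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
# Synthetic data with a built-in shift: gate-30 players play more rounds.
gate_30 = rng.exponential(scale=60, size=5000)
gate_40 = rng.exponential(scale=50, size=5000)

# Two-tailed: is there any difference between the groups?
_, p_two_sided = stats.mannwhitneyu(gate_30, gate_40, alternative="two-sided")
# One-tailed, H1: gate_30 tends to be greater than gate_40.
_, p_greater = stats.mannwhitneyu(gate_30, gate_40, alternative="greater")
# One-tailed in the opposite direction, H1: gate_30 tends to be smaller.
_, p_less = stats.mannwhitneyu(gate_30, gate_40, alternative="less")

print(f"two-sided p={p_two_sided:.3g}, greater p={p_greater:.3g}, less p={p_less:.3g}")
```

Because a one-tailed test concentrates the rejection region on one side, it can reach significance where the two-tailed test does not, so the direction should be chosen before looking at the data.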

Bootstrap resampling is also done with a size of 40,000 and 1,000 samples for each group to reduce the influence of outliers and verify the significance test result above. The distributions below show that the mean retention rates at both 1 day and 7 days are higher in Group A than in Group B.

Retention Rate Distribution
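Bootstrap resampling of the retention rate can be sketched as follows. The retention flags are synthetic with made-up rates, the helper `bootstrap_means` is hypothetical, and reading "size of 40000 and 1000 samples" as 1,000 resamples of 40,000 observations each is an assumption.

```python
import numpy as np

def bootstrap_means(values, n_resamples, sample_size, rng):
    """Mean of each bootstrap resample (drawn with replacement)."""
    values = np.asarray(values, dtype=float)
    means = np.empty(n_resamples)
    for i in range(n_resamples):
        resample = rng.choice(values, size=sample_size, replace=True)
        means[i] = resample.mean()
    return means

rng = np.random.default_rng(123)
# Synthetic 1-day retention flags (1 = retained); the rates are made up.
retained_a = rng.binomial(1, 0.45, size=40_000)
retained_b = rng.binomial(1, 0.44, size=40_000)

boot_a = bootstrap_means(retained_a, n_resamples=1000, sample_size=40_000, rng=rng)
boot_b = bootstrap_means(retained_b, n_resamples=1000, sample_size=40_000, rng=rng)

# Share of resamples in which group A's retention exceeds group B's.
prob_a_higher = float((boot_a > boot_b).mean())
# 95% percentile interval for the difference in retention rates.
ci_low, ci_high = np.percentile(boot_a - boot_b, [2.5, 97.5])
```

Plotting histograms of `boot_a` and `boot_b` gives a distribution chart of the retention rate for each group.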

Click here to view codebase, charts and csv datasets

Visualisation and data wrangling project

Formula One: Track & Thrill

A simple passion project on visualising Formula One data to find out where the most exciting race locations are, for fans, teams and drivers alike. It is a simple dashboard that can serve as a starting point to answer several business questions such as:

Where should the next Formula One race be held to attract the most viewership and revenue?
Which circuits need modification to improve the excitement factor of the race?
How should we plan the race schedule for the season?
How do we plan our coverage/filler content? Which teams or drivers should we focus on in each event?

For the fans, it may give them some insights on:

On which circuits can I see more action? Which races should I watch?
Which teams have the most competitive cars? On which circuits?
Who are the drivers to look out for?

Tools used: Python (Pandas), Power BI
Dataset: Kaggle.com

The charts below visualise the data of F1 races from the 2017 season to the 2021 season (up to the Austrian GP), using a slicer for the race year. The dataset itself contains race results from 1950 to 2021.

F1 Most Exciting Circuit V2
Team wins per circuit V2
Driver wins per circuit V2
Shares of points won per circuit

Some guiding questions can be asked to set a direction for a continuation of this project:

How do pit stops and racing incidents affect the ranking?
How do the race weather and the time of the season affect the ranking?
Deep dive on each circuit's characteristics. How do we profile the circuits? Length/number of straights? Slow/medium/fast corners? Length and location of pit lanes? Elevation of tracks?
How well does the number of overtakes correlate with viewership and ticket sales?

Learning points:
Wrangling the data frame too much makes the visualisation in Power BI inflexible, as the relation between the aggregated values and the different IDs is lost.
On the other hand, Power BI is unable to sort values within a partition/groupby column, as the ‘Top N’ filter is applied before the data aggregation in the chart. That is, instead of showing the constructor with the highest number of wins (Top N=1) for each circuit, it shows the constructor with the highest number of wins (Top N=1) across ALL circuits and then its number of wins at each circuit.
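For comparison, a per-circuit "Top 1" ranking is straightforward on the pandas side before the data reaches Power BI. The sketch below uses a made-up table in which each row is one race win (a hypothetical schema, not the Kaggle dataset), and it comes with the trade-off described above: pre-aggregating this way loses the link back to the individual race IDs.

```python
import pandas as pd

# Made-up data: each row is one race win (hypothetical schema).
results = pd.DataFrame({
    "circuit": ["Monza", "Monza", "Monza", "Spa", "Spa", "Spa"],
    "constructor": ["Mercedes", "Ferrari", "Ferrari", "Mercedes", "Mercedes", "Red Bull"],
})

# Count wins per (circuit, constructor) pair.
wins = results.groupby(["circuit", "constructor"]).size().reset_index(name="wins")

# Within each circuit, keep the row with the highest win count (Top N = 1).
top_per_circuit = wins.loc[wins.groupby("circuit")["wins"].idxmax()].reset_index(drop=True)
print(top_per_circuit)
```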

Click here to view codebase, pbix file and csv datasets

EXPERIENCE

Obayashi Corporation Asia-Pacific Regional Headquarters (APRHQ)

Data Management & Analytics Assistant Manager

Jan 2022 – Present

• Spearheaded the data pipeline setup for monthly project reports for more than 30 ongoing APAC projects under 5 different overseas subsidiaries as the first step towards reporting automation and analytics.
• Led the ongoing study to improve the data collection workflow for multiple pipelines.
• Led the development of Power BI reports and dashboards to transform and digitalise the company’s way of monitoring regional projects.
• Initiated trials to improve the data collection workflow for multiple pipelines.
• Collaborated with the project management software provider on customising multiple Power BI reports to ensure effective data presentation on site.
• Created multiple custom visualisations in Power BI using Python to suit business needs.

Web Structures Pte Ltd

Structural Engineer
Aug-2019 to Mar-2021: 1 year 7 Months Project highlights: HDB Bukit Panjang N6C15, SIT New Punggol Campus

• Obtained approval from 5 authority agencies (LTA, PUB, HDB, BCA and NParks) for the design, coordination of external drainage and road works.
• Facilitated 5 concurrent projects in carrying out ad-hoc design items and report consolidation.
• Clawed back a 2-month delay in RFI and RFA submission responses by generating key metrics, providing visibility and maintaining the submission data.
• Started the detailed design process for the Addition and Alteration (A&A) works of a conservation building which includes the foundation strengthening in preparation for authority submission.
• Accommodated Qualified Person (QP) for the inspection, verification design, drawing production and report submission of retention works to BCA.
• Maintained coordination with client, architect, contractor, M&E consultants and project site staff to generate solutions for site issues.

KTC Civil Engineering & Construction Pte Ltd

Project Engineer
Aug-2016 to Jun-2018: 1 year 11 Months Project highlights: LTA Contract T312 Construction of Sungei Bedok Station & Tunnel for Thomson-East Coast Line – Contract Value $418 million

• Transformed a 2-month submission delay into a 1-month excess supply of approved drawings by initiating and maintaining drawing submission tracking lists.
• Reduced utility gaps from 26 to 11 and improved excavation safety by spearheading the planning and development of a diversion proposal to 4 utility agencies (Singtel, PUB water, PUB sewer and SP).
• Removed 3 months of procurement lead time from the potential project critical path and reduced wastage by identifying opportunities for material reallocation.
• Designed visualisation for construction sequence proposal and presentation.
• Managed the production and development of fabrication drawings with more than 10 subcontractors via weekly alignment meetings to ensure efficient site coordination and timely approval.
• Administered the development and coordination of design matters, consisting of deep foundation elements, deep excavation, drain diversion and traffic diversion, for site feasibility.

SKILLS

Data Analytics Skills

SQL, Python (Pandas, Scikit), Power BI, Power Query, Power Automate, Statistics, Data Wrangling, Data Reporting, Storytelling, Dashboard Design, Machine Learning, A/B Testing, Hypothesis Testing

Engineering Software

AutoCAD, Etabs, SAFE, Tedds, SAP2000, PLAXIS 2D, SIDRA, VISUM

Media Skills

Photography, audio recording & editing

Language

Fluent in English and Bahasa Indonesia