Experience

I have developed a strong background in data engineering, data analysis, and data assimilation. Over my career I have become proficient in a range of tools and technologies, including dbt (data build tool), Python (with the Pandas library), the Jinja templating language, Google BigQuery, and Looker Studio.

In data engineering, I build robust and scalable data pipelines using dbt, Python, and Jinja templating, and I take pride in delivering consistent, clean data for analysis through effective transformation and querying.

In data analysis, I rely on Python for data manipulation, aggregation, calculations, and filtering. Whether conducting daily or weekly analyses, I derive meaningful insights by examining specific metrics and uncovering valuable patterns in datasets.

An essential aspect of my work is presenting analysis results effectively through visualization and reporting tools. With Looker Studio, I have created interactive dashboards and visualizations that give stakeholders a user-friendly, informative view of the data and convey complex analysis to a diverse audience.

In short, I am skilled at handling complex datasets, extracting valuable insights, and presenting analysis results in a clear and compelling manner.


Research Experience 🌍

Director of the Applied Math and Computer Science Lab.

  • Spring 2017 to Fall 2022.
  • Department of Computer Science, Universidad del Norte, BAQ 080001, Colombia.
  • Description: Scientific Computing methods for the solution of real-life problems.

Research Assistant

  • June 2014 to July 2014.
  • Mathematics and Computer Science Division, Lawrence Livermore National Laboratory, CA 94550, USA.
  • Supervisor: Greg Bronevetsky, Ph.D.
  • Description: Analysis of sequential data assimilation methods using the SIGHT toolbox.

Givens Associate

  • June 2013 to August 2013
  • Mathematics and Computer Science Division, Argonne National Laboratory, IL 60439, USA.
  • Supervisor: Cosmin Petra, Ph.D.
  • Description: Time-dependent background-error covariance matrix estimation.

Research Assistant

  • August 2011 to December 2015
  • Computational Science Laboratory, Computer Science Department, Virginia Polytechnic Institute and State University, VA 24060, USA.
  • Supervisor: Adrian Sandu, Ph.D.
  • Description: Sequential and variational data assimilation for weather forecasting.

Faculty Experience 🛸

Associate Professor

  • Spring 2023 - Current.
  • Department of Systems Engineering, Universidad del Norte, BAQ 080001, Colombia

Associate Professor & Department Head

  • Spring 2018 - Spring 2023.
  • Department of Systems Engineering, Universidad del Norte, BAQ 080001, Colombia

Instructor of Data Assimilation

  • Fall 2020, Fall 2021.
  • Ph.D. Program in Mathematical Engineering, Universidad EAFIT, MDE 050001, Colombia.

Teaching Assistant of Data Science for All

  • Fall 2020 - Current
  • Program of Data Science for All - Colombia, Correlation-One (DS4A - Colombia), USA

Assistant Professor

  • Spring 2016 - Fall 2017.
  • Department of Systems Engineering, Universidad del Norte, BAQ 080001, Colombia.

Instructor of Numerical Methods

  • Fall 2015.
  • Department of Computer Science, Virginia Polytechnic Institute and State University, VA 24060, USA.

Industry Experience 🚀

Estimating Prices in Freight Markets

The project focused on developing machine learning models to estimate RPM (rate per mile) and rates in the freight transportation industry. Two versions of the model were provided, each built using a different approach. Model 1 utilized a dbt pipeline for data extraction, transformation, and loading, while Model 2 employed Python for these tasks. The models incorporated various variables to accurately estimate prices, including mode, distance, angles, and more. Main tasks executed in the project:

  • Data extraction, transformation, and loading using a dbt pipeline (Model 1).
  • Data extraction, transformation, and loading using Python (Model 2).
  • Feature engineering to incorporate variables such as mode, distance, angles, etc.
  • Training machine learning models using a Random Forest algorithm.
  • Evaluating model accuracy using TRAC/Contract datasets.
  • Creating training and validation sets with representative lane characteristics.
  • Testing the models for different scenarios and resolutions.

Major contributions:

  • Leveraged dbt (data build tool) to develop a robust data pipeline for Model 1, including data extraction, transformation, and loading processes.
  • Utilized Python for data extraction, transformation, and loading in Model 2, providing an alternative approach for the project.
  • Incorporated various technologies and techniques to estimate prices accurately, including feature engineering and the Random Forest algorithm.
  • Implemented testing methodologies using TRAC/Contract datasets to evaluate and validate the accuracy of the developed models.
  • Developed flexible and customizable solutions, allowing for different scenarios and resolutions in estimating RPM and rates in the freight transportation industry.
  • Gave the client flexibility to choose the preferred extraction, transformation, and loading approach (dbt or Python) based on their existing infrastructure and technology stack.

Technologies:

dbt (data build tool) for data extraction, transformation, and loading (Model 1). Python for data extraction, transformation, and loading (Model 2). Random Forest algorithm for machine learning modeling (BigQuery ML and Python scikit-learn).
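
For illustration, a minimal sketch of the Model 2 approach: training a scikit-learn Random Forest on a hypothetical extract of historical loads. The file name, feature columns (mode, distance_miles, origin_bearing), and target column (rate_per_mile) are placeholders, not the project's actual schema.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error

# Hypothetical extract of historical loads; in the project the data came from
# the dbt (Model 1) or Python (Model 2) pipelines described above.
loads = pd.read_csv("historical_loads.csv")  # placeholder file name

# Placeholder feature names: transport mode, lane distance, and lane bearing ("angle").
X = pd.get_dummies(loads[["mode", "distance_miles", "origin_bearing"]], columns=["mode"])
y = loads["rate_per_mile"]

# Hold out a validation set intended to keep representative lane characteristics.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(n_estimators=300, random_state=42)
model.fit(X_train, y_train)

# Accuracy check on held-out lanes (the project used TRAC/Contract datasets for this).
print("validation MAPE:", mean_absolute_percentage_error(y_val, model.predict(X_val)))
```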

Accelerated Dashboard Development Framework

This project focuses on creating a Python-based dashboard framework that streamlines dashboard development. It provides developers with an integrated solution covering the entire development process, enabling them to create robust and visually appealing Dash applications with ease. The main tasks executed in this project include:

  • Data integration and querying: implemented the Connector class, which executes SQL queries and returns results as Pandas DataFrames, leveraging the Google Cloud Platform (GCP) BigQuery connector for efficient data retrieval and processing. Developed the Logic class to handle data integration and manipulation tasks, including loading data from various sources, applying filters, and transforming data as required.
  • HTML and CSS template management and element generation: created the CSS_Template class to manage the dashboard's CSS styles, with methods for retrieving predefined styles and generating custom ones. Developed the HTML_Factory class to generate HTML elements using Dash's components, so developers can create dropdowns, headings, images, buttons, tables, and graphs with ease. Added HTML container management through the HTML_Container class, allowing developers to organize and add elements to the container for proper layout within the dashboard.

These tasks simplify and accelerate dashboard creation. By providing pre-built components, efficient data integration and querying, and streamlined HTML and CSS management, the framework lets developers build visually appealing, customizable Dash applications that support data-driven decision-making for businesses and organizations.

Major contribution

The major contribution of this framework lies in its ability to enhance developers’ productivity by providing pre-built components and functionalities. It simplifies the development process, reduces development time, and improves code maintainability. By leveraging the framework, developers can focus on creating valuable data visualization solutions rather than dealing with intricate technical details.

Technologies

Python as the programming language, Pandas for data manipulation, Google Cloud Platform (GCP) BigQuery for efficient data retrieval, Dash for creating interactive dashboards, and Plotly for generating visually appealing plots and graphs.
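
As an illustration of the framework's pattern, here is a minimal sketch of a Connector-style query helper and a factory-style Dash dropdown. The class and method shapes follow the description above, but the GCP project, dataset, table, and column names are placeholders.

```python
import pandas as pd
from google.cloud import bigquery
from dash import dcc, html


class Connector:
    """Runs SQL against BigQuery and returns the results as Pandas DataFrames."""

    def __init__(self, project: str):
        self.client = bigquery.Client(project=project)

    def query(self, sql: str) -> pd.DataFrame:
        return self.client.query(sql).to_dataframe()


class HTML_Factory:
    """Generates Dash components from plain Python data (name mirrors the framework)."""

    @staticmethod
    def heading(text):
        return html.H2(text)

    @staticmethod
    def dropdown(component_id, options, default):
        return dcc.Dropdown(
            id=component_id,
            options=[{"label": o, "value": o} for o in options],
            value=default,
        )


# Placeholder GCP project, dataset, and table names.
connector = Connector(project="my-gcp-project")
regions = connector.query("SELECT DISTINCT region FROM `my_dataset.loads`")
selector = HTML_Factory.dropdown("region", regions["region"].tolist(), regions["region"].iloc[0])
```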

Exploratory Analysis of Air Cargo Potential

The objective of this project was to evaluate the relationship between two datasets in the air cargo logistics domain. The project focused on an exploratory analysis of the factors behind the similarities and differences between these datasets. By analyzing the data and identifying relevant insights, the project aimed to build a better understanding of how the datasets complement each other and together provide a comprehensive view of air cargo logistics.

Main Tasks:

  • Data Preparation with dbt:
    • Developed a dbt pipeline to extract and transform the data from the datasets.
    • Utilized dbt’s data transformation capabilities to clean, structure, and query the data.
    • Leveraged the Jinja templating language within dbt to create dynamic SQL queries and data transformation logic.
  • Analysis on a Daily Basis:
    • Examined various aspects of the dataset, such as the total number of loads.
    • Utilized Python for data manipulation and analysis tasks, including aggregations, calculations, and data filtering.
    • Analyzed the availability of specific cargo information using relevant tables.
  • Analysis on a Weekly Basis:
    • Performed weekly aggregations and calculations on metrics like total volume, total tonnage, and loads.
    • Utilized Python and dbt for efficient data manipulation and analysis tasks, leveraging libraries such as Pandas.
  • Visualization and Reporting:
    • Utilized Looker Studio to create interactive dashboards and visualizations.
    • Integrated the dbt project with Looker Studio to access the transformed data for visualization purposes.

Major Contributions:

  • Data Engineering:
    • Developed a robust and scalable data pipeline using dbt, Python, and the Jinja templating language.
    • Ensured consistent and clean data for analysis through dbt’s transformation capabilities and Jinja templating.
    • Integrated Python scripts within the dbt project for seamless data processing and integration.
  • Daily Analysis and Insights:
    • Identified valuable insights through daily analysis, including the examination of loads and specific cargo information.
    • Leveraged Python for efficient data manipulation, filtering, and analysis tasks.
  • Weekly Aggregations and Calculations:
    • Conducted weekly aggregations and calculations on key metrics using Python and dbt.
    • Ensured accurate and efficient calculations for metrics like total volume, total tonnage, and loads.
  • Visualization and Reporting Enhancement:
    • Created interactive dashboards and visualizations in Looker Studio to present the analysis results.
    • Integrated the dbt project with Looker Studio for seamless access to transformed data for visualization purposes.

Technologies

dbt (data build tool), Python (including Pandas library), Jinja templating language, and Looker Studio.
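
For example, a minimal Pandas sketch of the daily and weekly aggregation steps; the file, the column names (load_date, volume, tonnage), and the weekly frequency choice are hypothetical, not the project's actual schema.

```python
import pandas as pd

# Hypothetical load-level extract produced by the dbt models described above.
loads = pd.read_csv("air_cargo_loads.csv", parse_dates=["load_date"])  # placeholder file

# Daily view: total number of loads per day.
daily_loads = loads.groupby(loads["load_date"].dt.date).size().rename("total_loads")

# Weekly view: aggregate volume, tonnage, and load counts per week.
weekly = loads.groupby(pd.Grouper(key="load_date", freq="W")).agg(
    total_volume=("volume", "sum"),
    total_tonnage=("tonnage", "sum"),
    total_loads=("volume", "size"),
)
print(weekly.head())
```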

Pipeline Step Comparison

The project focuses on developing a data engineering solution for comparing pipeline steps across different datasets. The solution uses the Google Cloud Platform (GCP), specifically Google BigQuery, for data processing and storage. The objective is to compare the number of loads at each step of two pipelines, an Airflow-based one and a newer dbt-based one, on a quarterly and monthly basis.

Main tasks executed in the project:

  • Developed class definitions and implemented components using Python to build the data engineering solution, including:
    • Step: Defined pipeline steps and their variables for comparison.
    • Dataset: Identified the datasets from which the pipeline steps were read.
    • Contraster: Compared steps from different datasets.
    • Connector: Handled the connection to Google BigQuery for executing queries.
  • Utilized Google BigQuery and SQL to execute queries and retrieve data efficiently.
  • Implemented parallel processing techniques using Python’s multiprocessing library to optimize performance and reduce processing time.
  • Employed object-oriented programming in Python to create scalable and maintainable class structures for the Contraster, Dataset, and Step components.
  • Leveraged GCP’s authentication and authorization mechanisms to securely establish connections and access Google BigQuery resources.
  • Integrated with Looker Studio to provide an interactive and visually appealing interface for exploring the comparison results.

Major contribution

My major contribution in this project was the development of a scalable and efficient data engineering solution using a combination of Google Cloud Platform (GCP) technologies and Python. By leveraging Google BigQuery and SQL, I ensured fast and accurate data processing and retrieval, enabling seamless comparison of pipeline steps. The implementation of parallel processing techniques optimized performance, allowing for rapid analysis and reduced processing time. The utilization of object-oriented programming in Python facilitated code maintenance, extensibility, and scalability. Additionally, the integration with Looker Studio enhanced the user experience, providing an intuitive platform for exploring and visualizing the comparison results.

Technologies

Google Cloud Platform (GCP), Google BigQuery, SQL, Python (including libraries for multiprocessing), and Looker Studio.
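
A minimal sketch of the comparison pattern: counting loads per pipeline step in BigQuery and fanning the per-step queries out with Python's multiprocessing. The dataset, table, column, and step names are placeholders, not the project's actual schema.

```python
from multiprocessing import Pool

from google.cloud import bigquery

STEPS = ["ingested", "validated", "rated", "delivered"]  # hypothetical step names


def count_loads(task):
    """Count this month's loads at one step of one pipeline; names are placeholders."""
    dataset, step = task
    client = bigquery.Client()  # each worker process creates its own client
    sql = f"""
        SELECT COUNT(*) AS n
        FROM `{dataset}.pipeline_steps`
        WHERE step = @step
          AND DATE_TRUNC(load_date, MONTH) = DATE_TRUNC(CURRENT_DATE(), MONTH)
    """
    config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("step", "STRING", step)]
    )
    rows = list(client.query(sql, job_config=config).result())
    return dataset, step, rows[0]["n"]


if __name__ == "__main__":
    tasks = [(ds, s) for ds in ("airflow_pipeline", "dbt_pipeline") for s in STEPS]
    with Pool(processes=4) as pool:
        for dataset, step, n in pool.map(count_loads, tasks):
            print(f"{dataset:>16} | {step:<10} | {n} loads")
```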

Analyzing Forecast Quality Assurance Results

Evaluated the forecasting performance of the DeepAR Forecasting Algorithm on transportation data. Constructed a data pipeline using dbt, trained forecasting models, and performed statistical analysis and visualizations.

Main tasks executed in the project:

  • Constructed a data pipeline using dbt for data extraction, transformation, and loading.
  • Trained forecasting models using the DeepAR Forecasting Algorithm.
  • Applied statistical analysis techniques to evaluate forecast quality.
  • Created visualizations to communicate forecast quality and insights.

Major contribution

My major contribution in this project was providing a comprehensive analysis of forecast quality and insights. By constructing a data pipeline using dbt, I facilitated efficient data preparation and trained forecasting models. The statistical analysis and visualizations provided valuable insights into forecast quality and highlighted areas for improvement. The client benefited from improved forecasting accuracy, optimized operational planning, and resource allocation, leading to enhanced performance and decision-making.

Technologies

dbt (data build tool), Python, DeepAR Forecasting Algorithm, statistical analysis techniques, and visualization libraries (e.g., Matplotlib, Seaborn, Plotly).
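
A small sketch of the evaluation step, comparing point forecasts against actuals and plotting the error distribution; the file and column names (actual, forecast_p50) are hypothetical, and the metrics shown (MAPE, WAPE) stand in for the project's fuller statistical analysis.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file holding actuals and DeepAR median (p50) forecasts per lane and week.
results = pd.read_csv("forecast_vs_actuals.csv")  # placeholder file name

# Standard point-forecast quality metrics.
abs_err = (results["forecast_p50"] - results["actual"]).abs()
mape = (abs_err / results["actual"].replace(0, np.nan)).mean() * 100
wape = abs_err.sum() / results["actual"].sum() * 100
print(f"MAPE: {mape:.1f}%   WAPE: {wape:.1f}%")

# Visual check of how the errors are distributed.
plt.hist(results["forecast_p50"] - results["actual"], bins=50)
plt.xlabel("forecast error")
plt.ylabel("count")
plt.title("DeepAR forecast error distribution")
plt.savefig("forecast_error_distribution.png")
```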

Vehicle Migration Pipeline

This project involved the development of a data engineering solution utilizing Microsoft Azure services. The solution utilized Azure Blob Storage, Azure Data Factory, Azure Functions, and SQL Server to manage and process data effectively in a cloud environment. The main objective was to build a robust data pipeline that could handle data stored in Azure Blob Storage, perform data transformations using Azure Functions, and store the processed data in SQL Server.

Main tasks executed in the project:

  • Data Ingestion and Exchange with GraphQL:
    • Utilized GraphQL to send and receive data from external sources before storing it in Azure Blob Storage.
    • Implemented GraphQL queries and mutations to fetch and store data, enhancing the data ingestion process (a minimal sketch follows this list).
  • Azure Blob Storage:
    • Utilized Azure Blob Storage as a data lake to store the source JSON files.
    • Managed data ingestion and storage in Azure Blob Storage.
  • Azure Data Factory:
    • Configured Azure Data Factory to handle data movement and orchestration tasks.
    • Implemented data copying from one location to another, including from Azure Blob Storage to SQL Server.
  • Azure Functions:
    • Developed an Azure Function named “TimerTriggerProcessEvent” to format the JSON data and store it as CSV files in Azure Blob Storage.
    • Leveraged Azure Functions to process the data efficiently and trigger the formatting on a defined schedule.
  • SQL Server:
    • Utilized SQL Server as a data warehouse to store the processed data in a relational format.
    • Implemented the data storage and retrieval mechanism using SQL Server.

Major contribution

My major contribution in this project was the successful implementation of a comprehensive data engineering solution using Microsoft Azure services. By utilizing Azure Blob Storage, Azure Data Factory, Azure Functions, and SQL Server, I demonstrated expertise in handling and processing data in the cloud environment. The solution provided several benefits:

  • Scalable and efficient data management leveraging the power of Microsoft Azure.
  • Modular and customizable architecture, enabling easy modification and extension of functionality.
  • Utilization of Pandas DataFrames and Azure Blob Storage for efficient data manipulation and storage.
  • Structured event handling for easy adaptation to handle new event types.

Technologies

Microsoft Azure (Azure Blob Storage, Azure Data Factory, Azure Functions), SQL Server, and Python (including Pandas library).
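
For illustration, a minimal sketch of a timer-triggered Azure Function in the spirit of “TimerTriggerProcessEvent”: it reads a raw JSON blob, flattens it with Pandas, and writes a CSV back to Blob Storage. The connection string, container, and blob names are placeholders.

```python
import json

import pandas as pd
import azure.functions as func
from azure.storage.blob import BlobServiceClient


def main(mytimer: func.TimerRequest) -> None:
    # Placeholder connection string and container/blob names.
    blob_service = BlobServiceClient.from_connection_string("<connection-string>")

    # Read the raw JSON event file from the data lake.
    raw = blob_service.get_blob_client(container="raw-events", blob="vehicle_events.json")
    events = json.loads(raw.download_blob().readall())

    # Flatten the nested JSON into a tabular frame and serialize it as CSV.
    frame = pd.json_normalize(events)
    csv_bytes = frame.to_csv(index=False).encode("utf-8")

    # Write the formatted CSV back to Blob Storage, where Data Factory copies it to SQL Server.
    out = blob_service.get_blob_client(container="processed-events", blob="vehicle_events.csv")
    out.upload_blob(csv_bytes, overwrite=True)
```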
