Hello World!

I’m Eric, let’s get to know each other!

Eric Gonzalez

Data Engineer and MLOps @PepsiCo

Biography

Hi! I'm Eric Gonzalez, an engineer (BSc & MSc) with experience in the industrial and digital sectors and strong problem-solving, analytical and programming skills. I am passionate about software development and data & analytics technologies, and eager to be part of the digital transformation that is shaping the future.

· Currently working full-time as a Data Engineer at PepsiCo.
· Interested in areas related to big data, data science, machine learning, HPC, Industry 4.0 and the Internet of Things.
· Now looking to get certified in Microsoft Azure. Always learning.

Interests

  • Science and Technology
  • AI, Data & Analytics
  • HPC and Computer Hardware
  • IoT, Blockchain and Industry 4.0
  • Space, Energy, Transportation

Experience

Data Engineer & MLOps

PepsiCo

May 2021 – Present Barcelona, Spain
· Developed recurring ELT/ETL pipelines from a wide range of sources (Oracle and MySQL databases, Teradata, FTP servers, manual drop-offs, ...) into the European Data Lake and Data Warehouse, following the Data Quality and Data Governance guidelines.

· Supported the Commercial Advanced Analytics team by helping Data Scientists optimize the performance of their machine learning code in scalable distributed environments and apply MLOps best practices, using Databricks, Git and MLflow, to projects across European markets.

· Contributed to the Supply Chain digitalization agenda by designing and implementing scalable, critical ingestion pipelines triggered via handshake, and provided daily monitoring and migration support in the production environment to ensure correct, continuous execution.

· Helped with Data Modelling tasks to ensure a robust model for standardizing ingested data sources across European markets.

· Helped design and implement the DE team's CI/CD approach for both Databricks and Data Factory, using Azure DevOps services such as Git Repos, Artifacts, Pipelines and Releases.

· Extensive use of Python/PySpark and the Microsoft Azure suite (Databricks, Data Factory, Storage, Key Vault...).

· Working within the European Data & Digital team, an international environment that follows Agile methodologies, with English as the main working language.

Software Engineer (Internship)

Endesa

Feb 2021 – May 2021 Madrid, Spain
· Processing and visualization of historical and real-time data coming from industrial SCADA systems using OSIsoft PI System (PI Vision, PI DataLink) connected with Python and Excel to improve operational intelligence tasks.

· Software development for processing data coming from solar & wind renewable energy production facilities.

· Based in the operations and maintenance department of Enel Green Power Spain.

Education

Master’s in Big Data Analytics (English) - 9.0/10

Universidad Carlos III de Madrid

Sep 2020 – Jul 2021 Madrid, Spain
· Theoretical and highly practical foundations of data engineering, data science and data analytics. Focused on processing and analyzing big data with statistical and machine learning techniques to obtain valuable real-world insights.

· Emphasis on the use of analytical languages such as R or Python, processing frameworks such as Apache Spark (PySpark/Scala), database management systems (SQL/NoSQL) and analytical techniques (machine learning (PyTorch), time series, predictive modelling, statistical learning...)

· GPA 9.0/10

· Completely taught in English

· Thesis (10/10) - Development of classification tools, based on statistics and artificial intelligence (ML/DL), to forecast the short-term sign of the system imbalance in the Spanish Electricity Market.

Bachelor’s in Aerospace Engineering (English) - 8.0/10

Universidad Politécnica de Valencia

Sep 2016 – Jul 2020 Valencia, Spain
· All you can expect from an engineering degree (Mathematics, Physics, Computer Science, Statistics...)

· GPA 8.0/10

· Completely taught in English

· Thesis (9.9/10) - Optimized a state-of-the-art Computational Fluid Dynamics (CFD) code which performs a Direct Numerical Simulation (DNS) of a developed turbulent channel. The computational optimization was carried out by means of High Performance Computing (HPC) techniques, mainly involving communications among processors. The code was written in Fortran and used the MPI and OpenMP parallel computing standards. The final code was partially run on over 2000 CPUs for benchmarking on the SuperMUC supercomputer in Germany and MareNostrum in Barcelona, achieving a maximum reduction of 10% in run time, which translates into savings of approximately 10,000 € in the final execution of the code. The complete full-duration simulation is (as of today) the biggest DNS worldwide, performed by my advisor Mr. Sergio Hoyas.

Technological Skills

Python

NumPy, Pandas, Matplotlib…

R

Machine & Deep Learning

Algorithms, NN architectures, Scikit-learn, PyTorch, MLflow

Big Data

Spark, Hadoop, Scala

Back-end

SQL, MongoDB, Cassandra

HPC

Fortran, MPI, OpenMP, Linux

Cloud

Microsoft Azure, Google Cloud

Git

Statistics

Contact