Independent consulting practice of Marco Anteghini, PhD

Bioinformatics & ML, delivered.

Reproducible pipelines. Decision-ready outputs.

Mosaic Bioinformatics turns noisy biological and health data into reliable, reproducible results: custom pipelines, machine-learning models, and clear analysis for life-science research and industry.

About

A focused, senior bioinformatics partner

Mosaic Bioinformatics is the independent consulting practice of Marco Anteghini, PhD. It delivers end-to-end computational work, from raw sequencing reads and administrative records to validated models and reporting, built to be reproducible, documented, and ready to hand over.

Engagements typically span three domains:

  • Biotech & life sciences: sequence/variant analysis, microbiome and proteomics pipelines.
  • Healthcare & administrative data: SQL/ETL pipelines, cohort building, indicators, and QC.
  • EU-funded R&I projects: technical contributions, structured documentation, and delivery support.

Services

What I deliver

Practical, validated computational work, scoped to your data and built to last beyond the engagement.

Custom Bioinformatics Pipelines

End-to-end NGS, metagenomics/microbiome, and proteomics workflows: automated, scalable, and reproducible, with QC gates and versioned outputs.

AI & Machine Learning

Models for protein property, variant impact, and risk prediction, built on embeddings and protein language models, with explicit validation and documented failure modes.

Data Engineering & Analysis

SQL/ETL pipelines, cohort and indicator definitions, and reproducible extracts, plus dashboards and publication-ready figures.

Project & Delivery Support

Technical contributions and structured documentation for EU-funded consortia and R&D teams, with handover-ready artefacts and traceable decision logs.

Approach

From noisy inputs to decisions you can act on.

Every engagement is grounded in your data and your constraints, then built to be reproducible, validated, and ready to hand over.

  • Reproducible by default: containerised, versioned, and documented.
  • QC gates and explicit validation at every step.
  • Interpretable outputs, with stated assumptions and limits.
  • Clean handover: code, documentation, and decision logs.

1. Ingest & structure

Raw sequencing, variant, microbiome, or administrative data, standardised and quality-checked.

2. Pipelines & QC

Automated, reproducible workflows with QC gates and versioned, traceable outputs.

3. Model & validate

Machine-learning and embedding-based models, explicitly validated, with failure modes documented.

4. Decide & hand over

Interpretable results, reporting, and documentation ready for operational decisions.

Projects

Selected work

Representative engagements across research and industry. Client identities are kept confidential.

Pathogenicity Risk Assessment

Computational tools and ML workflows for predicting the impact of genetic variants on protein stability, structure, function, and localisation.

Data Management & ETL

Engineering and standardisation of complex administrative and biological datasets, with QC checks and reproducible extracts.

Gut Microbiota Analysis

Reproducible pipelines for taxonomic and functional profiling and dysbiosis detection in human gut microbiome data.

Sustainable Bio-Manufacturing

Computational biology and decision-support modelling for sustainable bio-manufacturing within EU-funded research consortia.

Credentials

Background & expertise

Academic foundation and applied experience behind the practice.

PhD in Bioinformatics

Deep-learning approaches for protein function, interaction, and localisation.

MSc Bioinformatics · BSc Biological Sciences

Strong grounding in molecular biology and computational methods.

Marie Skłodowska-Curie Fellow

H2020 MSCA-ITN fellowship (2018 cohort).

Co-founder, Bioinform (NGO)

Training schools and materials in bioinformatics and applied AI across Europe.

Host, BioIntelligent podcast

Conversations at the intersection of biology, data, and industry.

Reproducibility-first

Containerised, versioned, and documented work designed for clean handover.

Core toolkit

Python · SQL · R · Bash · PyTorch · TensorFlow · PostgreSQL · Docker · Git · Protein embeddings / pLMs · Variant & mutation analysis · NGS & microbiome pipelines

Publications

Peer-reviewed research

A selection of published work in bioinformatics and machine learning.

AI applications to biological networks
Computers in Biology and Medicine · 2025

How did we get there? AI applications to biological networks and sequences

How AI drives biomedical science: processing large datasets and predicting complex phenomena.

In-pero
Int. J. of Molecular Sciences · 2021

In-pero: deep learning for peroxisomal protein localization

Embedding-based deep learning to predict peroxisomal protein localization with high accuracy.

OrganelX
Computational & Structural Biotech. J. · 2022

OrganelX web server for sub-organelle protein localization

Precise sub-organelle protein localization using machine-learning approaches.

PortPred
J. of Cellular Biochemistry · 2023

PortPred: transporter protein identification

Deep learning for identifying transporter proteins and predicting their substrates.

PhD thesis
Doctoral Thesis · 2022

Deep learning for peroxisomal proteins

Revealing function, interactions, and localization using deep-learning approaches.

Blog

Latest articles

Notes on bioinformatics, machine learning, and reproducible data work, from me and occasional guest contributors.

June 4, 2026

Welcome to the Mosaic Bioinformatics blog

This is a space for practical writing on bioinformatics, machine learning, and reproducible data work.

Contact

Let's talk about your data

Whether it's a pipeline to build, a model to validate, or a project to support, get in touch to discuss scope, timeline, and how Mosaic can help.