About me
Hi, I'm Alessio
I am an AI Researcher at Google XR Zurich in Federico Tombari's team. I lead my own applied research team focused on AI technologies for the current and next generation of XR devices. We work on the whole stack, ranging from the integration of AI technologies to improving Gemini for egocentric assistant use cases.
I earned my Ph.D. in Computer Science and Engineering from the University of Bologna in 2019 at the Computer Vision Lab, advised by Professor Luigi Di Stefano.
My research journey began with deep learning for retail environments and depth estimation for autonomous driving. Since then, it has moved through several topics: implementing parts of the visual translation stack powering Google Lens and Google Translate, a range of image understanding tasks, and my early work on 3D reconstruction and generation. My current focus is on multimodal video/image understanding and generation using LLMs and diffusion models.
I'm always excited to discuss new ideas or explore emerging topics. If you find any of our work interesting, please don't hesitate to reach out!
Latest News
Change everything at once: Fabio realized doing one image edit at a time is for people with way too much patience, so he built MICE to handle eight-plus concurrent tweaks without the attributes leaking like a cheap faucet.
Cleaning data is overrated compared to mixing it like a mad scientist, so Matteo and the team built DCVLM, a 6T-token benchmark that makes our 8B models slightly less embarrassing to evaluate.
Sick of visual tokens eating your VRAM? Selim built PARCEL to squeeze images into smaller budgets without the usual blurry existential crisis, making efficiency look almost intentional.
Generating pictures was too easy, so Eric made flow models talk back; FullFlow turns flow generators into bidirectional overachievers without the usual VRAM-induced existential dread.
No more AI ghosts: Jiahao built R-CoV to force models to actually check image regions, because apparently basic observation is still a bit too much to ask of them.

