[XetHub — Custom Diffs]

Thriving in ambiguity

Dec 2023 - Aug 2024

It is exhilarating to be on the edge of the unknown and discover a path forward.

Most good designs start with a problem to solve, not a technology in search of a home. However, no matter how you're brought in, a simple reframe can help define your mission.

Team

Myself and the Head of Product

Setup

I was the first designer at XetHub, an early-stage startup. Our technology could scale version control to petabyte-sized files and folders storing thousands of files. Our approach to deduplication and optimization was more efficient because it broke data down at a finer grain, using size-defined chunks instead of file boundaries. This meant we could bring version control to new domains, from creative fields such as video production and game design to the critical complexity of biotechnology and the ever-expanding field of ML and AI.
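To make chunk-level deduplication concrete, here is a minimal sketch. It is not XetHub's actual algorithm: the fixed 4-byte chunk size, the in-memory `store`, and the function names are all assumptions chosen to keep the example small. It shows only the general idea that storing hashes of chunks, rather than whole files, lets a new version reuse everything it shares with the old one.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return each chunk's SHA-256 digest."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def new_chunks(store: set[str], data: bytes) -> int:
    """Add data's chunks to the store; return how many were not already stored."""
    added = 0
    for digest in chunk_hashes(data):
        if digest not in store:
            store.add(digest)
            added += 1
    return added


store: set[str] = set()
v1 = b"aaaabbbbcccc"
v2 = b"aaaabbbbccccdddd"  # v1 with one extra chunk appended

first = new_chunks(store, v1)   # every chunk of v1 is new
second = new_chunks(store, v2)  # only the appended chunk is new
```

With file-level deduplication, `v2` would be stored again in full because the file as a whole changed. Here only the one new chunk is added.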

While developers rely on version control, bringing it to these other domains may initially seem like a small benefit. However, locked within the complex workflows and norms of Git is the expansive power of remote, asynchronous, decentralized collaboration. This collaboration has helped open-source software thrive. Git gives anyone the flexibility and security to jump in on a project without worrying about losing hard work. Simply put, version control is truly about collaboration.

Our company included three founders, a dozen engineers, myself, and Ann, the Head of Product. As the two-person product team, Ann and I proactively sought out potential customers, conducted user research, synthesized patterns from that data, and presented findings to the founders. We used that information to explore possible paths to product-market fit and worked with the founders to iterate and refine. We would then scope those ideas into bite-sized experiments for the engineering team to build and deploy. Finally, we all anxiously watched the data to learn, iterate, and refine.

When I joined, we had the technology, various developer experience tools (a command line interface and Python library), and a rudimentary UI frontend to our service, forked from a popular open-source GitHub clone. But we had very few users. Our job was to fix that.

Finding the problem to solve

I realized from the customer conversations that Git supports collaboration for developers by helping them truly see and understand all of the changes. It isn't enough to have the versions and be able to travel through them in time. Developers care deeply about preserving a clean commit history and understanding line-by-line diffs because these are key to maintaining and debugging their work.

Our experience allowed users to store and access every version of their data. However, to see what changed, you'd have to download petabyte-sized files and repos, wait for large files to open in specialized software, or run scripts to work out the differences. This meant approving PRs was excruciating, and collaborating with others was slow and laborious. Users needed a quicker way to understand how their data was changing to see the real value of version control.

As a company, we had focused on the domain of ML and on building a platform that allowed users to collaboratively build models. But with a world as vast as AI, we still needed to support customizable, high-level summaries of diffs.

Creating the vision

I pitched the concept of Custom Diff views to leadership. This included a short-term concept: a simple inline markdown view that could reference different versions of the data and use compute to run functions across them. My long-term vision included summaries at multiple levels of detail as you navigated your repo. The contextual summaries would adapt based on file type. We would build out views for standard file types and allow users to tweak them or write their own to home in on what they cared about. Eventually, I imagined a potential marketplace that could grow and allow users to build on each other's work.

Initial sketches exploring the potential vision of custom diffs.

Experimentation

To scope the project, we started with custom views. While the north star remained directly computing deltas of key metrics across diffs, summarization and visualization at the version level still added great value. Conceptually, we created a system that allowed users to host the equivalent of GitHub Pages within the context of their repo. Each page had secure access to the specific version of the data being viewed. Our proof of concept included viewers (Parquet files, data frames, 3D files, and other custom file formats), visualizers (Netron for neural networks, feature-importance plots for tree models, etc.), and compute (calculating feature importance and other in-browser Python examples). I created a page on our marketing site to showcase these examples, and we added the ability to import these views into your repo with a single line of code. We started building out a robust actions framework to support full compute workflows. Users seemed intrigued by these concepts, and momentum was starting to build.
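As a rough illustration of the north star, computing deltas of key metrics across two versions could look something like the sketch below. The metric names, the values, and the `metric_deltas` helper are all hypothetical and stand in for whatever summary a custom view would compute; this is not code from the product.

```python
def metric_deltas(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Return (after - before) for every metric present in both versions."""
    shared = before.keys() & after.keys()
    return {name: after[name] - before[name] for name in shared}


# Hypothetical summaries of the same dataset at two commits.
old_version = {"row_count": 1000, "null_count": 50}
new_version = {"row_count": 1200, "null_count": 30, "column_count": 12}

deltas = metric_deltas(old_version, new_version)
```

A reviewer could read a summary like this directly in a pull request instead of downloading both versions of the data. Metrics that exist in only one version, such as `column_count` here, are skipped rather than guessed at.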

Sketches of potential shorter-term experiments to implement with the team

Working with the team to diagram and understand how to handle a diverse set of file types and scenarios

The gallery I created to collect and share the various custom views we had

An example of a custom view showing protein structures

A flow we created that allowed users to write Python in the browser to create their own views

Partnership

Our company was also searching for potential design partners. When we heard Tableau was looking for version control partners, I used my knowledge of Tableau users to propose a diff approval experience for workbooks. The diff view was the highlight of many customer calls, with some customers saying it would save them hours of work when collaborating on workbooks. The demo also sparked promising conversations with other companies in the BI space.

I worked with the engineers to build out the diff viewer, and we began lining up customers to approach as our initial testers.

Illustrating the layers of collaboration we could build out for Tableau workbooks

A design prototype to demo a custom diff experience we built for Tableau workbooks

Impact

XetHub was acquired by Hugging Face in August of 2024. While we didn't get to see where the momentum we were building around custom views and diffs would go, that work was called out as a differentiator during Hugging Face's evaluation, and Hugging Face could incorporate elements of these concepts in the future.