From Pandas to ClickHouse: The Evolution of Our Data Analytics Journey

Chetan Patel

Oct 28, 2024 · 4 min read

At Incentius, data has always been at the heart of what we do. We’ve built our business around providing insightful, data-driven solutions to our clients. Over the years, as we scaled our operations, our reliance on tools like Pandas helped us manage and analyze data effectively—until it didn’t.

The turning point came when our data grew faster than our infrastructure could handle. What was once a seamless process started showing cracks. It became clear that the tool we had relied on so heavily for data manipulation—Pandas—was struggling to keep pace. And that’s when the idea of shifting to ClickHouse began to take root.

But this wasn’t just about switching from one tool to another; it was the story of a fundamental transformation in how we approached data analytics at scale.

The Early Days: Pandas, Our First Love

When we first adopted Pandas, it felt like we had unlocked the perfect solution. The flexibility, the powerful data frames, and the ease with which we could manipulate small to mid-sized datasets—it was a game-changer. Our team of data engineers and analysts loved it for its simplicity. And, for a long time, it served us well.

But then something happened. Our datasets began to grow, and so did the complexity of our queries. We went from handling thousands of rows to millions, and then, in what seemed like no time at all, billions. The once seamless operations we had with Pandas turned into long waits for processes to complete, or worse, system crashes.

We found ourselves asking: How can we keep scaling without compromising performance?

The Bottleneck: Pandas in a Big Data World

At first, we tried to optimize Pandas in every way possible. We ran computations on smaller chunks of data, tried parallel processing techniques, and even moved to bigger and more expensive machines to support the growing memory requirements. But these were short-term fixes for a long-term challenge. Pandas was designed to load data into memory, which, for our growing datasets stored on S3, was becoming a major bottleneck.
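As an illustration of the chunking workaround described above, here is a minimal sketch of a per-group sum computed one chunk at a time with `pd.read_csv(..., chunksize=...)`. The sample data, the column names `region` and `amount`, and the helper `chunked_sum` are hypothetical and are not taken from our actual pipeline.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a file too large to load at once.
csv_data = io.StringIO(
    "region,amount\n"
    "north,10\n"
    "south,20\n"
    "north,30\n"
    "south,40\n"
    "east,50\n"
)


def chunked_sum(source, key, value, chunksize):
    """Sum `value` per `key`, reading only `chunksize` rows at a time."""
    partials = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Aggregate each chunk on its own so only small results are kept.
        partials.append(chunk.groupby(key)[value].sum())
    # Combine the per-chunk partial sums into one total per key.
    return pd.concat(partials).groupby(level=0).sum()


totals = chunked_sum(csv_data, "region", "amount", chunksize=2)
print(totals)
```

This keeps peak memory bounded for simple sums, but every aggregation needs its own hand-written combine step, and operations such as joins or exact medians do not decompose this cleanly. That is part of why chunking only bought us time.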

We realized that as the data we were handling continued to scale, our tools needed to scale with it. Pandas, for all its strengths, wasn’t designed for this new world of distributed, high-performance data analytics. That’s when we started exploring alternatives—and found ClickHouse.

Enter ClickHouse: A New Frontier in Data Analytics

We didn’t immediately jump into using ClickHouse. Like any good story, there was a journey of discovery, a few moments of doubt, and ultimately, a realization that this was the solution we needed.

ClickHouse came onto our radar because of its reputation for handling real-time, high-performance analytics. It was built to thrive in environments like ours—where datasets are huge, queries are complex, and the need for speed is paramount. We started small, running a few test queries on ClickHouse to see how it would perform against Pandas. The results were staggering.

Where Pandas took minutes, sometimes hours, to process data, ClickHouse completed the same tasks in seconds. The first time we ran a complex aggregation on ClickHouse and saw the results in the blink of an eye, we knew we were onto something.

The Turning Point: Scaling Without Limits

Transitioning from Pandas to ClickHouse wasn’t just about better performance; it was about rethinking how we managed our entire data pipeline. Here’s what changed:

  1. Handling Larger Datasets with Ease: ClickHouse’s columnar storage model meant we could now work with datasets that would’ve been impossible to manage in Pandas. Instead of loading everything into memory, ClickHouse reads only the columns a query needs and can process data directly from S3, so the size of a dataset is no longer capped by the RAM of a single machine.
  2. Real-Time Insights, Faster Than Ever: One of the biggest challenges we faced with Pandas was the time it took to generate real-time reports. With ClickHouse, real-time analytics became just that—real-time. We could now offer clients up-to-the-minute insights on their data, something that would’ve taken hours in our previous setup.
  3. Distributed Processing, Maximum Efficiency: ClickHouse’s ability to distribute queries across multiple nodes unlocked a new level of efficiency. We were no longer constrained by the limitations of a single machine. We could now process billions of rows of data across multiple servers, achieving performance that was unimaginable with Pandas.
  4. Seamless S3 Integration: One of the most powerful features of ClickHouse is its seamless integration with S3. This eliminated the need for us to move data between different storage systems or perform complex ETL processes just to analyze it. We could query data directly where it was stored, saving time, money, and resources.
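To make the S3 point concrete, the sketch below builds an aggregation query that uses ClickHouse’s `s3` table function to read Parquet files in place. The bucket URL, the column names, and the helper `build_s3_aggregation` are hypothetical, and credentials handling is omitted. Running the query would also need a ClickHouse client (for example the `clickhouse-connect` package), which is an assumption and not something this post describes.

```python
def build_s3_aggregation(s3_url, group_col, value_col, file_format="Parquet"):
    """Return a ClickHouse SQL string that aggregates files where they sit on S3.

    Plain string formatting is used for illustration only; do not pass
    untrusted input into a query this way.
    """
    return (
        f"SELECT {group_col}, sum({value_col}) AS total\n"
        f"FROM s3('{s3_url}', '{file_format}')\n"
        f"GROUP BY {group_col}\n"
        f"ORDER BY total DESC"
    )


# Hypothetical bucket path and column names.
query = build_s3_aggregation(
    "https://example-bucket.s3.amazonaws.com/sales/*.parquet",
    "region",
    "amount",
)
print(query)
```

The design point is that the aggregation runs inside ClickHouse and only the grouped result comes back to the caller, rather than the raw rows being pulled into a local data frame first.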

From Challenge to Opportunity: What This Means for Our Future

Looking back, the decision to transition from Pandas to ClickHouse was more than just a technical upgrade—it was a turning point in how we think about data. The challenges we faced with Pandas forced us to push the boundaries and explore new technologies. ClickHouse wasn’t just a replacement; it became the foundation for a more scalable, robust, and future-proof data infrastructure.

Now, instead of being bogged down by the limitations of in-memory processing, we’re able to take on projects that involve massive datasets with confidence. Our clients benefit from faster insights, more reliable data processing, and a system that’s built to grow with them.

Conclusion: The Evolution Continues

The move to ClickHouse wasn’t the end of our story; it was just the next chapter. As we continue to evolve and scale, we’re constantly looking for ways to push the envelope, to find new tools and technologies that allow us to deliver even greater value to our clients. The lesson we learned from this transition is simple: As the world of data evolves, so must we.

Our journey from Pandas to ClickHouse is a testament to that philosophy: an evolution driven by necessity that, with ClickHouse now powering our analytics, has opened the door to projects we could not have taken on before.
