Google announced its Apache Kafka for BigQuery cloud service at its conference Google Cloud Next 2024 in Las Vegas. Welcome to the data streaming club, joining Amazon, Microsoft, IBM, Oracle, Confluent, and others! This blog post explores the new managed Kafka offering for GCP, reviews the current status of the data streaming landscape, and shares some criteria to evaluate when Kafka in general, and Google Apache Kafka in particular, should (not) be used.
Welcome Google Apache Kafka to the Data Streaming Club
Better late than never… Google announced a brand new Apache Kafka cloud service for GCP at Google Cloud Next 2024. All other major cloud providers already have one, including AWS, Azure, Oracle, IBM, and Alibaba. Various other software vendors provide Kafka services, including Confluent, Aiven, Redpanda, WarpStream, and many more. Most leverage the open-source Kafka project as their core component, while others re-implement the Kafka protocol.
Apache Kafka and Apache Flink dominate the open-source data streaming ecosystem. Vendors and cloud services provide cloud-native offerings. Some developers, data engineers, and business people still struggle with the paradigm shift to continuous data processing, which enables better data quality, reduced cost, and faster time to market with innovative new applications. Kafka and Flink are a match made in heaven for data streaming.
Use cases for data streaming exist across all industries. Google Apache Kafka for BigQuery is probably a good fit for some of them, but not for others.
Google Apache Kafka for BigQuery: What Is It?
What is Google Apache Kafka for BigQuery? Quoting Google’s website: “Apache Kafka for BigQuery is a managed service that operates highly available Apache Kafka clusters. It is compatible with open source versions of Apache Kafka and includes first-party Google Cloud IAM, monitoring, logging, key management, organization policy, networking, and more.” Here are a few more thoughts:
- Asynchronous messaging with true decoupling of producers and consumers using the publish/subscribe pattern is already possible with the proprietary GCP service Google Pub/Sub. So why did Google introduce a Kafka service now? Because of limitations of Google Pub/Sub, or because Kafka became the standard (e.g., to migrate customers’ on-premise Kafka workloads)? I guess a bit of both.
- Google re-uses open-source Kafka instead of re-implementing the Kafka protocol (like Microsoft Azure’s Event Hubs). I like this approach, as a new implementation always creates several new challenges like missing completeness, delays of new features, and unexpected behavior. The compatibility with open-source Kafka is mentioned several times. My personal assumption is that Google’s primary strategic goal for the new Kafka service is to migrate existing on-premise workloads into Google Cloud.
- I really like that the service is secure out of the box. It integrates with and supports Google Cloud IAM, customer-managed encryption keys (CMEK), and Virtual Private Cloud (VPC) from the beginning. This is important, as most enterprise workloads require it.
- Including the term ‘BigQuery’ is merely a marketing strategy: “Data engineers often rely on Apache Kafka to build pipelines that stream data into BigQuery and other analytics systems. Apache Kafka for BigQuery can be used for real-time and batch use cases”. There is no requirement to use BigQuery for analytics. Google’s Kafka service is usable with other analytics platforms, too.
- Google emphasizes analytics use cases everywhere around its Kafka service, NOT transactional workloads. This approach is similar to Amazon MSK. Hopefully, the Google terms and conditions do not exclude support for the Kafka software itself once the service is GA (that is what MSK does; unfortunately, too many people do not read the T&C and just use a cloud service in production).
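The decoupling of producers and consumers mentioned in the first bullet is the core idea that both Pub/Sub and Kafka build on. As a rough illustration of the Kafka flavor of it, here is a minimal Python sketch, assuming a single-partition topic modeled as an append-only in-memory log. `ToyLog`, `publish`, `poll`, and `seek` are hypothetical names for this example, not the Kafka client API: producers only append, and every consumer group tracks its own offset, so groups read independently and can replay retained records.

```python
class ToyLog:
    """Hypothetical in-memory stand-in for a single-partition Kafka topic.

    Producers only append records. Each consumer group keeps its own offset,
    so groups read independently and at their own pace.
    """

    def __init__(self) -> None:
        self._records: list[str] = []
        self._offsets: dict[str, int] = {}

    def publish(self, value: str) -> int:
        """Append a record and return its offset; the producer never sees consumers."""
        self._records.append(value)
        return len(self._records) - 1

    def poll(self, group: str, max_records: int = 10) -> list[str]:
        """Return the next records for a consumer group and advance (commit) its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group: str, offset: int) -> None:
        """Move a group's position; replay works because records are retained, not deleted."""
        self._offsets[group] = offset


# Usage: one producer, two independent consumer groups.
log = ToyLog()
for event in ["order-1", "order-2", "order-3"]:
    log.publish(event)

print(log.poll("analytics"))                 # ['order-1', 'order-2', 'order-3']
print(log.poll("billing", max_records=1))    # ['order-1']
print(log.poll("billing"))                   # ['order-2', 'order-3']
log.seek("analytics", 0)                     # replay history for the analytics group
print(log.poll("analytics", max_records=2))  # ['order-1', 'order-2']
```

Real Kafka adds partitions for parallelism, replication for durability, and committed offsets stored by the brokers; the sketch keeps only the offset-per-consumer-group idea that makes the decoupling work.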
Data Streaming Is a NEW Software Category
Data streaming represents a new software category that revolutionizes the way businesses harness and process data in real time. Unlike traditional batch processing methods, data streaming enables continuous ingestion, analysis, and processing of data as it flows through systems.
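To make the difference between batch and streaming concrete, here is a minimal Python sketch using a hypothetical list of payment amounts. The batch function produces a single answer only after all data has been collected, while the streaming function keeps running state and emits an updated result as each event flows in. Stream processors such as Kafka Streams or Apache Flink add durable state, windowing, and fault tolerance on top of this basic idea; none of that is modeled here.

```python
from collections.abc import Iterable, Iterator


def batch_total(amounts: list[float]) -> float:
    """Batch style: wait until the whole dataset is collected, then compute once."""
    return sum(amounts)


def streaming_totals(amounts: Iterable[float]) -> Iterator[float]:
    """Streaming style: keep running state and emit an updated result per event."""
    running = 0.0
    for amount in amounts:
        running += amount
        yield running


# Hypothetical payment amounts arriving one by one.
payments = [20.0, 5.5, 12.5]
print(batch_total(payments))             # 38.0 (only after the batch is complete)
print(list(streaming_totals(payments)))  # [20.0, 25.5, 38.0] (one result per event)
```

Because `streaming_totals` is a generator, it also works on an input that never ends, which is exactly the situation a batch job cannot handle.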
The Data Streaming Landscape 2024
Many software companies have emerged in the data streaming category in the last few years. In addition, several mature players in the data market have added support for data streaming to their platforms or cloud service ecosystems. Most software vendors use Kafka for their data streaming platforms. However, not every solution is powered by open-source Kafka. Some vendors only implement the Kafka protocol (e.g., Azure Event Hubs) or use completely different APIs (like Amazon Kinesis).
The Data Streaming Landscape 2024 summarizes the current status of relevant products and cloud services for data streaming around Kafka and other stream processing engines.
Forrester Wave for Streaming Data and IDC MarketScape for Stream Processing
Apache Kafka became the de facto standard for data streaming, similar to how Amazon S3 became the de facto standard for object storage.
In December 2023, the research company Forrester published “The Forrester Wave™: Streaming Data Platforms, Q4 2023.” The leaders are Microsoft, Google, and Confluent, followed by Oracle, Amazon, Cloudera, and a few others.
In April 2024, IDC named Confluent a leader in the IDC MarketScape for Worldwide Analytic Stream Processing 2024.
It would not be a surprise to see a Gartner Magic Quadrant for Data Streaming soon, too. Gartner reports mention Kafka and related vendors more and more every year.
When Not To Choose Google Apache Kafka for BigQuery
Qualifying out a technology is often the easier option. Why evaluate a service if it does not meet the requirements? Let’s explore when NOT to use Kafka at all, and especially when the Google Apache Kafka service is probably NOT the right choice for you.
When Not To Use Apache Kafka
Apache Kafka overlaps with technologies like message brokers (such as IBM MQ, TIBCO, or RabbitMQ) and other streaming analytics platforms, and it actually is a database, too. But Apache Kafka is not an all-rounder that solves every problem.
Apache Kafka is NOT:
- A replacement for your favorite database, data warehouse, or data lake. Instead, it complements and integrates with these platforms.
- An analytics platform for AI/ML model training, though model scoring is often done within the streaming platform for critical or low-latency use cases.
- A proxy for thousands of clients in bad networks.
- An API Management solution, though you can connect REST/HTTP producers and consumers to Kafka.
- An IoT gateway, though direct integration with IoT protocols like MQTT or OPC-UA is possible.
- Hard real-time for safety-critical embedded workloads.
Read the thorough analysis “When NOT to use Apache Kafka?” for more details.
When To Choose Another Kafka Instead of Google’s
If Apache Kafka is the right choice for your project, you still have plenty of options.
Here are a few criteria that let you easily disqualify Google Apache Kafka for BigQuery:
- Non-GCP: If your use case requires on-premise, multi-cloud, hybrid cloud, or edge deployments, then you need another offering.
- Critical SLAs: If you need 24/7 critical support and consulting expertise, a dedicated Kafka vendor like Confluent is the better choice. Kafka is not just for analytics; it shines for transactional workloads, too. Google’s Managed Apache Kafka service is not GA yet. This will probably happen in the second half of 2024. Hence, do not even consider it for critical applications before GA.
- Serverless: A managed service is not always a fully managed service. The future will show where Google goes with Kafka. But right now, Google Apache Kafka is not serverless like, e.g., Confluent Cloud. You pay for provisioned capacity, and cluster capacity management is required. Amazon even created a second service, Amazon MSK Serverless, to address this issue with its traditional MSK offering.
- Complete data streaming platform: A data streaming platform requires more than just messaging: data integration with first- and third-party systems, stream processing for continuous data correlation, flexible (long-term) retention with Tiered Storage, data governance, and more. The future will show where Google’s Kafka service goes. Today, it is a car, but not (yet) a Porsche (a full luxury car) and not yet a Google Waymo (a level 5 self-driving car). Google Apache Kafka even lacks basic features for data streaming best practices, like defining data contracts in schemas to build data products with good data quality.
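The last criterion mentions data contracts. As a rough illustration of what such a contract enforces, the following Python sketch validates events against a hypothetical `ORDER_CONTRACT` before they would be produced to a topic. `FieldSpec`, `ORDER_CONTRACT`, and `violations` are made-up names for this example. In practice, contracts are typically defined in Avro, Protobuf, or JSON Schema and enforced through a schema registry, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One field in a hypothetical data contract: name, expected type, required or not."""
    name: str
    type_: type
    required: bool = True


# Hypothetical contract for an "orders" topic.
ORDER_CONTRACT = (
    FieldSpec("order_id", str),
    FieldSpec("amount", float),
    FieldSpec("currency", str),
    FieldSpec("coupon", str, required=False),
)


def violations(event: dict, contract: tuple[FieldSpec, ...]) -> list[str]:
    """Return all contract violations for an event; an empty list means it is valid."""
    problems = []
    for spec in contract:
        if spec.name not in event:
            if spec.required:
                problems.append(f"missing field '{spec.name}'")
            continue
        value = event[spec.name]
        if not isinstance(value, spec.type_):
            problems.append(
                f"field '{spec.name}' should be {spec.type_.__name__}, "
                f"got {type(value).__name__}"
            )
    # Reject fields the contract does not know about, so producers cannot drift silently.
    allowed = {spec.name for spec in contract}
    for unknown in sorted(set(event) - allowed):
        problems.append(f"unexpected field '{unknown}'")
    return problems


good = {"order_id": "o-1", "amount": 19.99, "currency": "EUR"}
bad = {"order_id": "o-2", "amount": "19.99"}
print(violations(good, ORDER_CONTRACT))  # []
print(violations(bad, ORDER_CONTRACT))   # wrong type for 'amount', missing 'currency'
```

Rejecting bad events at the producer, before they reach the topic, is what keeps every downstream consumer from having to defend itself against malformed data.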
The Evolution of Data Streaming Is Not Stopping
If you have not yet qualified out Kafka in general or Google Apache Kafka in particular, that is great. Start evaluating Google’s Managed Apache Kafka cloud service and compare it against self-managed open-source Kafka and other semi-managed or fully managed Kafka cloud services on GCP.
As we look ahead, the future prospects for data streaming are boundless, promising more agile, intelligent, and real-time insights into the ever-increasing streams of data.
I often get asked whether I am worried about the growing competition, as I work for Confluent, where we “only do data streaming”.
No, I am not! Actually, the new Google Apache Kafka cloud service is great news for the industry! Data streaming has established itself as a new software category. Research analysts like Forrester and IDC have already created dedicated waves and comparisons. What could be better than working with the people who invented Kafka and the company that created this software category across all industries and continents? And competition is always good for innovation, too.
Real-time data beats slow data. That is true in almost every use case. At Confluent, we are now ~3000 people working on only one thing: Data Streaming. I think we should celebrate this Google announcement and look forward to more mass adoption of data streaming around the world.
And because Confluent is a strategic Google partner, customers can:
- Leverage GCP credits to consume Confluent Cloud
- Leverage GCP’s security and private networking infrastructure
- Integrate via fully managed connectors with various GCP services like Google BigQuery or Google Cloud Storage and third-party cloud solutions like MongoDB, Snowflake, or Databricks.
Are you excited about the new Google Apache Kafka cloud service? Or do you still plan to use open-source Kafka or another cloud service like Confluent Cloud? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.