Striim Flows Data ‘Uphill’ To Google BigQuery

Adrian Bridgwater, Senior Contributor. Opinions expressed by Forbes Contributors are their own. I track enterprise software application development & data management.
Oct 22, 2022, 05:55pm EDT

[Image: Visitors try out a family water slide at the Aqualand Moravia World of Water leisure park in Pasohlavky, south of Brno, Czech Republic. The slide, suitable for kids and adults, is 242 meters long, was made in Canada, and carries riders downhill and also uphill like a roller coaster. Photo by Radek Mica/AFP via Getty Images]

Data is flowing faster.
As we have noted here recently, modern business is now increasingly running on data streaming technologies designed to channel a flood (in a positive sense) of real-time data into and out of applications, across analytics engines and through database structures. Some of that data flow will now reside and be processed in well-known enterprise databases from the data vendors that even average non-technical laypersons will have heard of. Other elements of that streamed data flow need to be churned and wrangled through the new and more powerful services offered by the major ‘hyperscaler’ Cloud Service Providers (CSPs).
Getting data from one (often legacy) database into a hyperscaler data service involves more than investing in a new cable or clicking a button.

Stream on Striim

Logically named to convey a sense of data flow from the start, Striim, Inc. (pronounced ‘stream’, as in river) works not only to create and build the data pipeline that gets data from traditional databases to new cloud services; it also works to filter, transform, enrich and correlate that data on its journey.
The company’s Striim for BigQuery is a cloud-based streaming service that uses Change Data Capture (CDC) technologies (a database process designed to track, pinpoint and subsequently work on the changed data in any given set of information) to integrate and replicate data from enterprise-grade databases such as Oracle, MS-SQL, PostgreSQL, MySQL and others to the Google Cloud BigQuery enterprise data warehouse.
In short, Google BigQuery is a cloud data service for business intelligence. To explain the technology in full, Google BigQuery is a fully managed (cloud-based platform-as-a-service), serverless (a virtualization technique that delivers server resources more precisely at the actual point of use) data warehouse (a data management construct created by bringing together information from more than one source) that enables scalable analysis over petabytes (a petabyte is 1,024 terabytes) of data, with built-in machine learning capabilities.
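To make the CDC idea concrete, here is a minimal sketch of the pattern in Python: poll a source database’s change log and stream the changed rows into BigQuery. This is not Striim’s implementation; the table names, the `orders_changelog` audit table and the connection details are all hypothetical assumptions for the example, and a production CDC tool would typically read the database’s transaction log rather than poll a table.

```python
# Illustrative CDC loop (NOT Striim's code): poll a source database's
# change log and replicate changed rows into Google BigQuery.
# Assumes a hypothetical PostgreSQL audit table `orders_changelog`
# and the google-cloud-bigquery client library.
import time

import psycopg2                      # hypothetical source: PostgreSQL
from google.cloud import bigquery

bq = bigquery.Client()
TARGET = "my_project.my_dataset.orders"   # hypothetical BigQuery table

def poll_changes(conn, last_seen_id):
    """Fetch rows added to the change log since the last poll."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT change_id, order_id, status, amount "
            "FROM orders_changelog WHERE change_id > %s ORDER BY change_id",
            (last_seen_id,),
        )
        return cur.fetchall()

def replicate_forever(conn):
    last_id = 0
    while True:
        rows = poll_changes(conn, last_id)
        if rows:
            payload = [
                {"order_id": r[1], "status": r[2], "amount": float(r[3])}
                for r in rows
            ]
            # Stream the changed rows into BigQuery; errors come back per row.
            errors = bq.insert_rows_json(TARGET, payload)
            if errors:
                raise RuntimeError(f"BigQuery insert failed: {errors}")
            last_id = rows[-1][0]
        time.sleep(1)  # real tools achieve sub-second latency; 1s keeps the sketch simple

if __name__ == "__main__":
    replicate_forever(psycopg2.connect("dbname=shop user=replicator"))
```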
Organizations using this technology can now build a new data pipeline to stream transactional data from hundreds or thousands of tables to Google BigQuery with sub-second end-to-end latencies. This is the kind of intelligence needed to enable real-time analytics and address time-sensitive operational issues. “Enterprises are increasingly seeking solutions that help bring critical data stored in databases into Google BigQuery with speed and reliability,” said Sudhir Hasbe, senior director of product management, Google Cloud.
Water-based data flow analogies

If it feels like we’ll never conceivably run out of water-based data flow analogies, we probably won’t. This is a zone of technology where organizations need to replicate data from the multiple databases they have previously been operating (many of them since before the so-called digital transformation era) and get that data into cloud data warehouses, data lakes and data lakehouses. Why would companies need to do this and get data flowing in this direction? To enable their data science and analytics teams to optimize their decision-making and business workflows.
But there are traditionally two problems: a) legacy data warehouses are not easily scalable or performant enough to deliver real-time analysis capabilities; and b) cloud-based data ingestion platforms often require significant effort to set up. Striim for BigQuery offers a user interface that allows users to configure and observe the ongoing and historical health and performance of their data pipelines, reconfigure those pipelines to add or remove tables on the fly, and repair them in case of failures.

Fresh data, come & get it

Alok Pareek is executive VP of engineering and products at Striim. He points to the need for what he calls ‘fresh data’ (i.e. streamed real-time data that works at the speed of modern life and business, with user mobile device ubiquity and new smart machines creating their own always-on information channels) to get business decisions right.
“Our customers are increasingly using BigQuery for their data analytics needs. We have designed Striim for BigQuery for operational ease, simplicity and resiliency so that users can quickly and easily extract business value from their data. We have automated schema management, snapshot functionality [a means of saving the current state of a data stream to start a new version or for backup & recovery purposes], CDC coordination [see above definition] and failure handling in the data pipelines to deliver a delightful user experience,” said Pareek.
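As a hedged illustration of what ‘automated schema management’ can involve (again, not Striim’s actual code), the sketch below mirrors a source table’s column types into a BigQuery schema and creates the target table idempotently. The column list and type mapping are assumptions for the example.

```python
# Illustrative only: derive a BigQuery schema from source-column metadata
# and create the target table if it doesn't already exist.
from google.cloud import bigquery

# Hypothetical source-column metadata, e.g. read from a database catalog.
SOURCE_COLUMNS = [("order_id", "bigint"), ("status", "text"), ("amount", "numeric")]

# Hypothetical mapping from source types to BigQuery types.
TYPE_MAP = {"bigint": "INT64", "text": "STRING", "numeric": "NUMERIC"}

def ensure_target_table(client: bigquery.Client, table_id: str):
    """Create (or keep) a BigQuery table whose schema mirrors the source."""
    schema = [
        bigquery.SchemaField(name, TYPE_MAP[src_type])
        for name, src_type in SOURCE_COLUMNS
    ]
    table = bigquery.Table(table_id, schema=schema)
    # exists_ok=True makes the call idempotent across pipeline restarts.
    return client.create_table(table, exists_ok=True)

ensure_target_table(bigquery.Client(), "my_project.my_dataset.orders")
```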
There is automation happening here too. Striim for BigQuery continuously monitors and reports pipeline health and performance. When it detects tables that cannot be synced to BigQuery, it automatically quarantines the errant tables and keeps the rest of the pipeline operational, preventing what could otherwise be hours of pipeline downtime. The pattern can be pictured with the short sketch below.
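This is an illustrative pattern, not Striim’s mechanism: run each table’s sync step independently, and when one raises an error, set that table aside so the others keep flowing. The `sync_table` callable is a hypothetical stand-in for the real sync logic.

```python
# Sketch of per-table quarantine: if one table's sync step fails, set it
# aside and keep the rest of the pipeline flowing. Names are hypothetical.
import logging

quarantined: set[str] = set()

def sync_cycle(tables, sync_table):
    """Run one sync pass, quarantining tables that raise errors."""
    for table in tables:
        if table in quarantined:
            continue  # skip errant tables; the rest stay operational
        try:
            sync_table(table)
        except Exception as exc:
            logging.warning("Quarantining %s after failure: %s", table, exc)
            quarantined.add(table)
```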
Striim for BigQuery

Striim works to continuously ingest, process and deliver high volumes of real-time data from diverse sources (both on-premises and in the cloud) to support multi-cloud and hybrid cloud infrastructures. It collects data in real time from enterprise databases (using non-intrusive change data capture), log files, messaging systems and sensors, and delivers it to virtually any target on-premises or in the cloud with sub-second latency, enabling real-time operations and analytics.

Hyperscaler indifference?

All of which is great stuff then, i.e. we can get data from Oracle and the other above-noted databases to hyperscaler Cloud Service Provider (CSP) clouds from Google, AWS and Microsoft better, faster, more easily and at a more cost-effective price point. We can even do so with a greater degree of additional (cleansing, filtering etc.) services. Why, then, don’t the major cloud players offer this kind of technology? In truth, they do: remember when we said that cloud-based data ingestion platforms often require significant effort to set up? Many of these functions are possible with the hyperscalers, and it’s not hard to find reams of documentation across the web from all three big clouds detailing the internal mechanics of snapshots, streaming and schema management. It’s just more expensive, usually not as dedicated a service (they do have the planet’s biggest clouds to run, after all) and typically without all the kinds of add-ons discussed here.
The water-based data flow analogies will continue – coming next: the data jet wash, probably.
From: forbes
URL: https://www.forbes.com/sites/adrianbridgwater/2022/10/22/striim-flows-data-uphill-to-google-bigquery/