Spark-to-Starburst Engine Swap Speeds Big Driving Data for Arity

The IT team at Arity is cruising on the homestretch of a big project to load more than a trillion miles of driving data into a new database on Amazon S3. But if it weren’t for a decision to switch out its engine from Spark to Starburst, the project would still be stuck in neutral….

AtScale Launches Public Leaderboard for Evaluating Text-to-SQL Solutions

As the demand for natural language data queries continues to grow, so does the need for a standardized way to evaluate Text-to-SQL (T2SQL) solutions. Despite rapid advancements in T2SQL technologies, the industry has struggled with inconsistent benchmarks. This lack of uniform standards has made it challenging for stakeholders to accurately assess and compare solution performance….

Couchbase Launches Column Store to Activate Dormant JSON Data

Couchbase says the new column store that it officially launched today on AWS will streamline analytics on “dormant” JSON data residing in its NoSQL database. The company also launched vector search capabilities in the mobile version of its database and a new free tier in the cloud. Couchbase historically sought to split the difference between…

How to Become a Data Engineer

The work of data engineers is extremely technical. They are responsible for designing and maintaining the architecture of data systems, which incorporates concepts ranging from analytic infrastructures to data warehouses. A data engineer needs to have a solid understanding of commonly used scripting languages and is expected to support the steady evolution of improved Data Quality,…

Starburst Brings Dataframes Into Trino Platform

Starburst customers who prefer to manipulate data using dataframes as opposed to regular SQL will be happy with a pair of announcements made today. That includes the introduction of PyStarburst, which provides a PySpark-like syntax for transforming data residing in Starburst’s hosted Galaxy environment, as well as support for Ibis, a portable dataframe library developed…

In Search of Data Model Repeatability

Everybody wants to be data-driven–that much is clear. But that desire doesn’t necessarily translate into real business results, especially in competitive industries like ecommerce. Data quality has long been a burr in the side of would-be data champions. The need to cleanse and normalize dirty and inconsistent data often consumes the lion’s share of the…

TikTok Parent Open Sources Real-Time Data Warehouse

You might not yet be a major TikTok influencer, but you can still analyze data like TikTok’s parent company, ByteDance, which recently released its real-time data warehouse architecture as open source. ByConity, the name of ByteDance’s data warehouse, is an elastically scalable, column-oriented relational database that’s based on ClickHouse, the scalable, open-source database that the…