How to Become a Data Engineer

The work of data engineers is extremely technical. They are responsible for designing and maintaining the architecture of data systems, which incorporates concepts ranging from analytic infrastructures to data warehouses. A data engineer needs to have a solid understanding of commonly used scripting languages and is expected to support the steady evolution of improved Data Quality,…


Starburst Brings Dataframes Into Trino Platform

Starburst customers who prefer to manipulate data using dataframes as opposed to regular SQL will be happy with a pair of announcements made today. That includes the introduction of PyStarburst, which provides a PySpark-like syntax for transforming data residing in Starburst’s hosted Galaxy environment, as well as support for Ibis, a portable dataframe library developed…


In Search of Data Model Repeatability

Everybody wants to be data-driven; that much is clear. But that desire doesn’t necessarily translate into real business results, especially in competitive industries like ecommerce. Data quality has long been a burr in the side of would-be data champions. The need to cleanse and normalize dirty and inconsistent data often consumes the lion’s share of the…


TikTok Parent Open Sources Real-Time Data Warehouse

You might not yet be a major TikTok influencer, but you can still analyze data like TikTok’s parent company, ByteDance, which recently released its real-time data warehouse architecture as open source. ByConity, the name of ByteDance’s data warehouse, is an elastically scalable, column-oriented relational database that’s based on ClickHouse, the scalable, open-source database that the…
