Microsoft is working to make Fabric, its cloud-based suite of data warehousing, data science, data engineering, and data analytics services, more attractive to enterprises. At its first annual European Microsoft Fabric Community Conference last week, it previewed a host of new features.
Fabric, unveiled in May 2023, brings together six “workloads” (Data Factory, Data Engineering, Data Warehouse, Data Science, Real-Time Intelligence, and Power BI) that Microsoft says will help enterprises reduce IT integration overhead, complexity, and cost.
Microsoft hopes to grow its share of enterprises’ data analytics spending by encouraging them to replace disparate applications from multiple vendors with its single offering, but to do that, it needs to match those vendors feature for feature. The latest preview announcements, along with the promotion of earlier previews to general availability, take it a step closer to parity, and perhaps close enough for some enterprises to make the jump.
The changes to Fabric aim to deliver results in three areas, Arun Ulagaratchagan, corporate vice president of Azure Data at Microsoft, wrote in a blog post: AI-powered development; an AI-powered data estate with access to multi-cloud data from a single data lake; and AI-powered insights that can be embedded in Microsoft 365 apps used across the enterprise. Some of the changes build on previously introduced Copilot functionality.
Some of the highlights of the changes Ulagaratchagan presented include:
Data Factory enhancements
Fabric’s Data Factory workload provides the capabilities needed to create an automated data pipeline, including components such as Azure Data Factory and Power Query.
That can still be a complicated task, so Microsoft is trying to simplify parts of the process. For example, Copy job, now in preview, targets the data ingestion stage.
“With copy job you can ingest data at petabyte scale, without creating a dataflow or data pipeline. Copy job supports full, batch, and incremental copy from any data sources to any data destinations,” he wrote in the blog post.
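To make the difference between full and incremental copy concrete, here is a minimal, generic Python sketch of watermark-based incremental loading, a common way such copies are done. It is not how Copy job is implemented and uses no Fabric API; the `Row` type, its `modified_at` column, and both functions are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Row:
    id: int
    modified_at: datetime  # hypothetical "last modified" column used as the watermark


def full_copy(source):
    """Full copy: move every row on every run."""
    return list(source)


def incremental_copy(source, last_watermark):
    """Incremental copy: move only rows modified after the previous run's watermark.

    Returns the changed rows plus the new watermark to store for the next run.
    """
    changed = [row for row in source if row.modified_at > last_watermark]
    new_watermark = max((row.modified_at for row in changed), default=last_watermark)
    return changed, new_watermark


if __name__ == "__main__":
    source = [
        Row(1, datetime(2024, 9, 1)),
        Row(2, datetime(2024, 9, 2)),
        Row(3, datetime(2024, 9, 3)),
    ]
    print(len(full_copy(source)))  # 3
    rows, mark = incremental_copy(source, datetime(2024, 9, 1))
    print([r.id for r in rows], mark)  # [2, 3] 2024-09-03 00:00:00
    rows, mark = incremental_copy(source, mark)
    print([r.id for r in rows])  # []
```

A timestamp watermark like this cannot detect deleted rows, a gap that change data capture (CDC) feeds are designed to fill.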
Other data lakehouse and data warehouse providers, such as Snowflake, Databricks, AWS, Google, and Oracle, offer similar capabilities to make it easier to move data, or to process it without first having to copy it.
Data Engineering additions
Fabric’s Data Engineering workload consists of the tools needed to design, build, and maintain systems that store and analyze data for business-critical applications. These tools include a Spark engine, notebooks, a lakehouse, and the underlying data pipelines.
Microsoft is previewing an updated version of its native execution engine for Spark, which it says runs queries up to four times faster by executing them directly on lakehouse infrastructure. It has also extended Spark support to enable querying of mirrored databases.
Enhancements in Real-Time Intelligence
Fabric’s Real-Time Intelligence workload, intended to provide up-to-the-minute insights to help enterprise users make better decisions, was formed in May 2024 by combining two modules, Synapse Real-Time Analytics and Data Activator, with some other functions. It’s still in preview, but that isn’t stopping Microsoft from adding further enhancements, also still in preview.
These include the introduction of a new Real-Time hub user experience, a new page called “My Streams” to create and access custom streams, and four new eventstream connectors: Azure SQL Managed Instance – change data capture (MI CDC), SQL Server on Virtual Machine – change data capture (VM CDC), Apache Kafka, and Amazon MSK Kafka.
“These new sources empower enterprises to build richer, more dynamic eventstreams in Fabric,” Ulagaratchagan wrote, adding that Microsoft is also enhancing eventstream capabilities by supporting eventhouses as a new destination for data streams. “Eventhouses, equipped with KQL databases, are designed to analyze large volumes of data, particularly in scenarios that demand real-time insight and exploration,” he explained.
Copilot in Real-Time Intelligence could already translate natural language into KQL (Kusto Query Language) queries to assist with data exploration; now it can operate in a conversational mode, enabling users to ask follow-up questions to refine their initial queries.
Data Activator, which triggers actions in response to changes in the data, has also been upgraded.
“You can set up alerts from all your streaming data, Power BI visuals, and real-time dashboards and now even set up alerts directly on your KQL queries. With these new enhancements, you can make sure action is taken the moment something important happens,” Ulagaratchagan explained.
Enhancements in Power BI
The Power BI workload combines the interfaces and tools needed for data visualization and reporting, helping users generate business intelligence insights.
A new feature, Metric sets, currently in preview, enables trusted users to create and share standardized metrics for use across their organization, ensuring that everyone measures things in a consistent way.
In a similar vein, Microsoft is previewing “organizational apps” that enable creators to package and securely distribute Power BI reports inside an enterprise. Multiple organizational apps can exist in each workspace, and they can contain other Fabric items such as real-time dashboards and notebooks.
OneLake integration and other updates
To drive Fabric adoption, Microsoft not only needs to match key features of its competitors’ products, it also needs to make it easy for customers to migrate their data into it, or access their data from it.
Enhancements to OneLake, Fabric’s unified data lake, include the general availability of data gateway shortcuts to connect to Google Cloud Storage (GCS) and Amazon S3 storage, as well as S3-compatible sources on premises or GCS buckets in private clouds.
Power BI users can benefit from another generally available migration feature: Data imported into semantic models can now be automatically written to Delta Lake tables in OneLake, where it can be consumed by T-SQL, Python, Scala, PySpark, Spark SQL, or R at no additional cost.
Microsoft is also previewing integration of OneLake with Azure Databricks, making it possible to access Databricks Unity Catalog tables from OneLake and keep them in sync in near real time.
One of the updates to the broader Fabric platform is the introduction, in preview, of the Terraform provider for Fabric, which can be used to streamline deployment and management tasks.
Smaller updates include the general availability of Fabric Git integration and an extension of Fabric’s integration with Visual Studio Code. The Git integration will enable users to synchronize Fabric workspaces with Git repositories, use version control, and collaborate using Azure DevOps or GitHub, while the Visual Studio Code integration will allow users to debug Fabric notebooks with the web version of VS Code and integrate Fabric environments as artifacts with the Synapse VS Code extension, Ulagaratchagan wrote.
Fabric is still evolving and, with some key features still in preview, has yet to prove itself with enterprise customers. Some modules inside Fabric, including Azure Data Factory and Power BI, have achieved popularity with enterprises, while others, such as Synapse in its original form, have done less well. CIOs will also be cautious about changing something as fundamental as their data platform, so Microsoft will have plenty of time to add new features to Fabric, and new reasons to adopt it, while they make up their minds.
The full list of new Fabric features is in Ulagaratchagan’s blog post.