New Feature Offerings in SQL Server 2019



>>Hello everyone,
I’m Rony Chatterjee, a senior program manager on
the Microsoft SQL Server team. Today, I’m going to show our SQL Server
2019 product offerings in Azure Data Studio, a cross-platform,
multi-database tool designed to empower data
engineers and data scientists. SQL Server 2019 is
deployed on Kubernetes, providing the flexibility to run on-premises or in the cloud. In this instance of
Azure Data Studio, I’m connected to two SQL
Server instances and to the Spark HDFS endpoint in
the Kubernetes cluster. SQL Server 2019 provides a unified view over
enterprise data, whether it’s relational
data stored in databases or big data stored in HDFS clusters. SQL Server 2019 allows
querying data from other data sources such as
Oracle, Teradata and MongoDB. In this example, I’m
virtualizing data from Oracle. Data virtualization
provides data quality, data security and data privacy. Once I choose the data
I want to virtualize, I can easily write a simple SQL query
that returns results from
my remote Oracle server. Once the virtualized data is available in SQL Server, I can write a simple SELECT
TOP 1000 query, and this actually queries
the data which is in Oracle. In SQL Server 2019, we have introduced
the ability for the SQL engine to read
files located in HDFS. In this example, I am uploading a sample file to
HDFS, and then we’ll show how to write a
SQL query that reads directly from the file
we just stored in HDFS. Once we create the external
table over the files in HDFS, we can now easily join this data with other
relational data sources. In this way, SQL
Server 2019 joins high-value data in
relational databases with high-volume data in HDFS. SQL Server 2019 also provides scalable compute and storage
for faster data processing. SQL Server 2019 is
the first release where we’re bringing
both SQL and Spark together, providing
query capabilities over scalable storage across
relational and big data. In Azure Data Studio, I can
easily browse my files in HDFS, and in one click I can start analyzing
my files in a notebook. Within Azure Data Studio, we have an integrated
notebook viewer which seamlessly connects to
the SQL Server 2019 cluster. It’s attached to
the PySpark kernel and lets you submit your Spark jobs
against the cluster. Data scientists spend a lot of their time
preparing the data. In Azure Data Studio, we have made it easier for data scientists to
be more productive. Let me show you a sample file which we would like to analyze. As you can see, this file is not
properly formatted. It has a lot of delimiters
and a lot of white space. If we had to process this file, we would need complex
regular expressions. So, we have integrated AI and ML packages for program synthesis from
Microsoft Research in our notebook offering. I can load my files using the PROSE Code Accelerator
and provide an example file
from which PROSE can easily learn and find
patterns in the data. PROSE learns and generates
sample code, which I can then run on that particular file,
and finally, my file, which was unformatted, has
some structure to it. Azure Data Studio,
our notebook viewer, is integrated with the Jupyter
ecosystem, which allows us to access over 1.5
million notebooks. The notebooks allow
customers to install custom AI and ML packages, including rich
visualization libraries. In Azure Data Studio, we have also made it
simple for customers to submit Spark jobs
against the cluster. We have built a rich Spark job graph
viewer that allows customers to monitor
their submitted Spark jobs. In today’s demo,
we showed how we are using SQL Server 2019 and Spark together as a unified
data platform running on Kubernetes, and how Azure Data Studio provides a seamless experience
over the data. Thank you.
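
Editor's note: the demo mentions that cleaning a file full of mixed delimiters and stray white space by hand would need regular expressions. As a rough illustration of that manual approach, here is a minimal Python sketch. The sample lines, the delimiter set (`;`, `|`, `,`), and the function names are assumptions made up for this example; they are not taken from the demo file, and this is not the code that the PROSE Code Accelerator generates.

```python
import re

# Hypothetical sample lines (not from the demo): mixed delimiters and stray whitespace.
SAMPLE_LINES = [
    "  1001 ;  Alice |  Seattle  ,  42.50 ",
    "1002|Bob   ;Redmond,  17.00",
    "",
]

# Assumed delimiter set: one or more of ; | , with optional surrounding whitespace.
DELIMITERS = re.compile(r"\s*[;|,]+\s*")


def split_messy_line(line):
    """Split one raw line into trimmed fields; return [] for blank lines."""
    stripped = line.strip()
    if not stripped:
        return []
    return DELIMITERS.split(stripped)


def clean(lines):
    """Apply split_messy_line to every line and drop the empty results."""
    return [fields for fields in map(split_messy_line, lines) if fields]


rows = clean(SAMPLE_LINES)
for row in rows:
    print(row)
```

Even in this small case the pattern has to be adjusted for every new delimiter or quoting rule, which is the kind of effort the example-driven approach in the demo is meant to reduce.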
