HVR in AWS Marketplace

Welcome to this recorded demonstration. My name is Joe, and in this video you will learn how to use the HVR for AWS image that is available in the Amazon Web Services Marketplace. HVR is an all-in-one-box software solution for efficient real-time change data capture and continuous data integration between databases and data stores in a heterogeneous environment.

HVR is a flexible technology that can be used for a number of cloud data integration scenarios. This slide shows an overview of the most common scenarios. This video uses the example of an on-premises SQL Server database integrating data into Amazon RDS for Postgres for real-time reporting. HVR supports numerous technologies and services in AWS to maximize efficiency.

HVR uses a distributed setup with a centralized hub to control all data movement. As needed, agents may be installed on or close to the source and destination systems. In this setup, it is always the hub that initiates communication to the agent. Firewall rules on premises and security groups in AWS must be open to allow this traffic. In an on-premises-to-cloud scenario, the hub will commonly run on premises, and the AWS security group will allow inbound traffic to the destination in the cloud.

This video shows a demonstration that synchronizes data from an on-premises SQL Server database to Amazon RDS for Postgres. The hub installation is on premises, and the HVR for AWS image is used to run the HVR agent. As a rule of thumb, for optimal efficiency make sure HVR in AWS runs in the same Availability Zone as the database or service taking part in real-time data movement.

With the HVR instance created from the Marketplace, connect to the server to obtain information on how to proceed. To connect, use the PEM key that was used when the image was created, and connect as the user named ec2-user. The message of the day that you see when connecting to the EC2 instance contains important information for using this image. To download the software for on-premises, use the
following information. Download the HVR software for your on-premises platforms from the EC2 instance. This can be done using your preferred file transfer tool, such as secure FTP, secure copy, or a GUI equivalent like WinSCP.

Single- or double-click the file to start the installation. Read the license agreement before accepting to continue. An HVR hub will store log files and temporarily stage compressed transaction files in the runtime location identified by the HVR_CONFIG variable setting. When all systems are up and running, storage utilization will be minimal, but if connections break and volume is high, then gigabytes of data may temporarily be queued in this location until connectivity resumes. Simply follow the wizard to complete the installation. Similar to the installation on Windows, an installation on macOS, Linux, or UNIX requires the identification of HVR_HOME and HVR_CONFIG, and optionally HVR_TMP. The runtime binaries are installed in the HVR_HOME directory.

Upon starting the graphical user interface for the first time, HVR will prompt for a connection to a hub. The hub is an installation of HVR that controls continuous data integration channels and requires a connection to a database to store configuration metadata. The hub can be running local to the GUI or in a remote location. My current HVR installation will act as the hub, and the repository tables will be stored in a local SQL Server database. Use HVR repo as the hub database. Here we’re choosing to rely on implicit OS authentication by omitting the SQL Server username and password. Repository tables will be created if they don’t already exist.

The on-premises hub installation shows a warning that no local license key was found. This demonstration will use the cloud license that is included with the Marketplace image. A public key file is required to authenticate to the HVR cloud image. The certificate name is included in the message of the day on the HVR image in AWS. To copy the public key from
the AWS image to the on-premises HVR hub, you can use a command-line interface like SFTP or a graphical utility like WinSCP. To retrieve the public key, again reference the PEM file for authentication. Move the public key file to the HVR_HOME/lib/cert directory on the HVR hub. This is the same system where I’m running the HVR GUI.

The first step in HVR is to define connections to the source and target systems. Start with the destination AWS environment to activate the license. This connection will connect the HVR hub to the HVR agent, which is running on the AWS image, using TCP/IP on port 4343. Remember, firewalls must be configured to allow this connection to go through. Any sequence of characters will do for the password, since authentication will actually be done through the secure certificate. Either browse for the public key certificate file on the remote AWS image or type in its name. Now select the cloud license checkbox.

This AWS location defines the connection to the target Amazon RDS for Postgres database, which we are connecting to through the HVR agent running on the AWS image. A Postgres 9 client is installed in the environment. Browse to find the location of libpq.so. Note that the Browse button will use the remote HVR agent to browse the Linux file system in the AWS environment. Get the node information from the AWS console, provide the remaining connection information, and then test the connection. Note that the certificate and license information were automatically captured as an action called LocationProperties for the location AWS.

After creating my target connection, I’ll define a new connection for my SQL Server source. The SQL Server application database, like my HVR repository, happens to be local to the on-premises HVR hub. Leave the username and password blank to use OS authentication.

With the connections to the source and destination databases defined, the next step is to define a channel. The channel defines the data flow between sources and targets,
including the tables that will be synchronized, as well as the actions defining the flow of the data, transformations, and any other runtime behaviors. Transactional consistency is always maintained across the tables in the channel. The link between the logical channel definition and the physical location connections is made through location groups. Channel definitions use a logical location group name to reference one or more physical source and target locations. SQL Server is my source database.

Use Table Explorer to select tables from the source database’s dictionary. The application tables are in SQL Server; the Postgres database is currently empty. In this case, the application tables are in the dbo schema, and I will replicate all of them. HVR imports the table definitions into the metadata repository and automatically maps data types between my SQL Server source and Postgres target.

HVR actions define the flow of data, transformations, and other runtime settings. Choosing an object in the tree allows you to quickly create an action specific to that object. HVR actions are powerful, and there’s a long list of them to satisfy many use cases. Channels typically use only a small subset of the available actions.

The Capture action is used to indicate the source of real-time changes. The top part of the action dialog shows the context. In this case, the context is the channel name, SQL to AWS; the group, SRC; and a wildcard for all tables. HVR recognizes that the group SRC is a SQL Server database, so only options relevant to SQL Server are shown. Use context-sensitive help to learn more about the options and their default values. With my administrative OS privileges and sysadmin access to the source database, I can leave the defaults.

The Integrate action is used to define the target into which data is being integrated. Similar to the Capture action, the Integrate action options are appropriately filtered, in this case for the Postgres target. There is context-sensitive help available to
explain when to use which options. Again, the defaults are well chosen for most use cases.

At this point the channel definition is complete and the runtime environment can be generated. Run HVR Initialize to deploy the setup and be ready for real-time change data movement. HVR Initialize must always be run to propagate changes from the repository to the runtime environment. By default, HVR Initialize is run for all locations and all advanced options are checked. The advanced options create runtime objects and set the capture time. Capture rewind options are available to recapture changes, as long as transaction log backups are still accessible. Later, when small changes to the channel are being made, such as adding a new table, some of the advanced options should be unchecked to ensure the entire channel is not reset. In particular, later initializations should uncheck Transaction Files and Capture Time. More detailed information is on the HVR forum; search for "initialize".

HVR Initialize created the runtime environment for real-time change data movement, but it did not yet create the target application tables, nor did it perform the initial data load. Use HVR Refresh to perform these next steps. Note that HVR also features a heterogeneous compare function that is not part of this demo but can be used to validate that systems are in sync prior to switching from on-premises to cloud.

In the HVR Refresh dialog, use the option to create absent tables. Increase the parallelism to speed up the data load. HVR Refresh makes assumptions about the data flow based on the sources and targets in the channel definition. Bulk granularity is the default, which implements a fast direct path load. Also, if the channel was previously initialized, HVR will automatically propose an online refresh to align the initial data load with any data that may be in the queue. When refreshing an initialized channel, HVR Refresh will by default truncate or delete from target tables. This confirmation is an opportunity to
rethink a potentially lengthy data load. Partial data loads are also possible. My dataset was loaded in just a couple of minutes.

HVR Initialize created scripts and jobs under the Scheduler node. Jobs are the runtime processes and are initially created in a suspended state. Put them into a started or unsuspended state to start real-time data movement. Start the scheduler if it is not yet running. On Windows, the scheduler generally runs as a service that automatically starts when the server boots. Log files are created at the job and channel levels and for the scheduler as a whole. Use View Log on the context menu to inspect the logs.

Let’s introduce a transactional workload. Note that at this point HVR is processing thousands of operations per second. Run Statistics to get charts and numbers on the data movement. The progressively larger spikes for integrated changes indicate that my small, untuned Postgres instance is having trouble keeping up with the rate of changes on my source SQL Server database.

Various alerts can be sent out automatically through maintenance tasks. Create one or more maintenance tasks to periodically scan logged output for issues, perform a system health check, or set a latency limit that will trigger an alert. Receive alerts through email messages, SNMP notifications, or posts to Slack channels. Schedule the maintenance task’s interval.

At this point I’ve shown you an end-to-end demo using the HVR AWS Marketplace Amazon Machine Image, or AMI. Transactions flowed between a SQL Server source database and a Postgres RDS instance. I didn’t demo it today, but HVR does have an integrated compare utility that shows whether the systems are in sync. The HVR AWS Marketplace image includes a ready-to-go, complete continuous data integration solution for heterogeneous environments. The offering is optimized for efficient and secure data movement. For more information, or to ask questions on the community forum, visit the HVR website at www.hvr-software.com. Thank you for watching.
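
A note on the connectivity requirement from the demo: the hub opens a TCP connection to the agent on port 4343, so the firewall and the AWS security group have to let that traffic through. If the connection test in the GUI fails, a small script run from the hub machine can help tell a blocked port apart from a certificate or login problem. The following is a minimal sketch in Python and is not part of HVR; the function name `can_reach` and the timeout value are assumptions made for illustration. It only checks that the port accepts a TCP connection and does not perform any HVR authentication.

```python
import socket

# Port used for hub-to-agent traffic in the demo.
HVR_AGENT_PORT = 4343


def can_reach(host: str, port: int = HVR_AGENT_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This is a hypothetical helper for illustration. It checks network
    reachability only; it does not talk to the HVR agent.
    """
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `can_reach` with the public DNS name of your EC2 instance should return `True` once the security group allows inbound traffic on port 4343 from the hub. A `False` result points at a firewall rule, a security group, or an agent that is not listening, rather than at the certificate setup.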
