Search results for hazelcast

Talend Input component for Hazelcast  Example of input component implementation with Talend Component Kit   tutorial example partition mapper producer source hazelcast distributed

This tutorial walks you through the creation, from scratch, of a complete Talend input component for Hazelcast using the Talend Component Kit (TCK) framework. Hazelcast is an in-memory distributed system that can store data, which makes it a good example of an input component for distributed systems. This is enough for you to get started with this tutorial, but you can find more information about it here: hazelcast.org/. A TCK project is a simple Java project with specific configurations and dependencies. You can choose your preferred build tool, Maven or Gradle, as TCK supports both. In this tutorial, Maven is used. The first step consists in generating the project structure using the Talend Starter Toolkit. Go to starter-toolkit.talend.io/ and fill in the project information as shown in the screenshots below, then click Finish and Download as ZIP. image::tutorial_hazelcast_generateproject_1.png[] image::tutorial_hazelcast_generateproject_2.png[] Extract the ZIP file into your workspace and import it into your preferred IDE. This tutorial uses IntelliJ IDEA, but you can use Eclipse or any other IDE that you are comfortable with. You can use the Starter Toolkit to define the full configuration of the component, but in this tutorial some parts are configured manually to explain key concepts of TCK. The generated pom.xml file of the project looks as follows: Change the name tag to a more relevant value, for example: Component Hazelcast. The component-api dependency provides the necessary API to develop the components. talend-component-maven-plugin provides build and validation tools for the component development. The Java compiler also needs a Talend-specific configuration for the components to work correctly. The most important setting is the -parameters option, which preserves the parameter names needed by the introspection features that TCK relies on.
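To illustrate why the -parameters option matters, here is a minimal sketch in plain Java, independent of TCK. The class and method names (ParameterNamesDemo, connect) are hypothetical and exist only for this illustration. It reads a method's parameter names through reflection, which is the kind of introspection that depends on the compiler flag.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; it is not part of the tutorial's component.
class ParameterNamesDemo {

    // Only the signature matters: we read its parameter names back at runtime.
    static void connect(String clusterIpAddress, String groupName) {
        // intentionally empty
    }

    // Returns the parameter names as reflection sees them.
    // Compiled with -parameters, the source names are available;
    // without it, the JVM reports synthetic names such as arg0 and arg1.
    static List<String> reflectedNames() {
        try {
            Method method = ParameterNamesDemo.class
                    .getDeclaredMethod("connect", String.class, String.class);
            List<String> names = new ArrayList<>();
            for (Parameter parameter : method.getParameters()) {
                String origin = parameter.isNamePresent() ? "from source" : "synthetic";
                names.add(parameter.getName() + " (" + origin + ")");
            }
            return names;
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException("connect(String, String) not found", e);
        }
    }

    public static void main(String[] args) {
        reflectedNames().forEach(System.out::println);
    }
}
```

Compiling this class once with `javac -parameters` and once without shows the difference between the two cases.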
Download the Maven dependencies declared in the pom.xml file: You should get a BUILD SUCCESS at this point: Create the project structure: Create the component Java packages. Packages are mandatory in the component model and you cannot use the default one (no package). It is recommended to create a unique package per component to be able to reuse it as a dependency in other components, for example to guarantee isolation while writing unit tests. The project is now correctly set up. The next steps consist in registering the component family and setting up some properties. Registering every component family allows the component server to properly load the components and to ensure they are available in Talend Studio. The family registration happens via a package-info.java file that you have to create. Move to the src/main/java/org/talend/components/hazelcast package and create a package-info.java file: @Components: Declares the family name and the categories to which the component belongs. @Icon: Defines the component family icon. This icon is visible in the Studio metadata tree. Talend Component Kit supports internationalization (i18n) via Java properties files. Using these files, you can customize and translate the display name of properties such as the name of a component family or, as shown later in this tutorial, labels displayed in the component configuration. Go to src/main/resources/org/talend/components/hazelcast and create an i18n Messages.properties file as below: You can define the component family icon in the package-info.java file. The icon image must exist in the resources/icons folder. TCK supports both SVG and PNG formats for the icons. Create the icons folder and add an icon image for the Hazelcast family. This tutorial uses the Hazelcast icon from the official GitHub repository that you can get from: avatars3.githubusercontent.com/u/1453152?s=200&v=4 Download the image and rename it to hazelcast_icon32.png.
The name syntax is important and should match _icon.32.png. The component registration is now complete. The next step consists in defining the component configuration. All Input and Output (I/O) components follow a predefined model of configuration. The configuration requires two parts: Datastore: Defines all properties that let the component connect to the targeted system. Dataset: Defines the data to be read or written from/to the targeted system. Connecting to the Hazelcast cluster requires the IP address, group name and password of the targeted cluster. In the component, the datastore is represented by a simple POJO. Create a hazelcastDatastore.java class file in the src/main/java/org/talend/components/hazelcast folder. Define the i18n properties of the datastore. Add the following lines to the Messages.properties file: The Hazelcast datastore is now defined. Hazelcast includes different types of data structures. You can manipulate maps, lists, sets, caches, locks, queues, topics and so on. This tutorial focuses on maps, but the approach still applies to the other data structures. Reading from or writing to a map requires the map name. Create the dataset class by creating a hazelcastDataset.java file in src/main/java/org/talend/components/hazelcast. The @Dataset annotation marks the class as a dataset. Note that it also references a datastore, as required by the components model. Just as for the datastore, define the i18n properties of the dataset. To do that, add the following lines to the Messages.properties file. The component configuration is now ready. The next step consists in creating the Source that will read the data from the Hazelcast map. The Source is the class responsible for reading the data from the configured dataset. A source gets the configuration instance injected by TCK at runtime and uses it to connect to the targeted system and read the data. Create a new class as follows. The source also needs i18n properties to provide a readable display name.
Add the following line to the Messages.properties file. At this point, it is already possible to see the result in the Talend Component Web Tester, to check what the configuration looks like and validate the layout visually. To do that, execute the following command in the project folder. This command starts the Component Web Tester and deploys the component there. Access localhost:8080/. The source is set up. It is now time to start writing some Hazelcast-specific code to connect to a cluster and read values from a map. Add the hazelcast-client Maven dependency to the pom.xml of the project, in the dependencies node. Add a Hazelcast instance to the @PostConstruct method. Declare a hazelcastInstance attribute in the source class. Any non-serializable attribute needs to be marked as transient to avoid serialization issues. Implement the post construct method. The component configuration is mapped to the Hazelcast client configuration to create a Hazelcast instance. This instance will be used later to get the map from its name and read the map data. Only the required configuration in the component is exposed to keep the code as simple as possible. Implement the code responsible for reading the data from the Hazelcast map through the Producer method. The Producer implements the following logic: Check if the map iterator is already initialized. If not, get the map from its name and initialize the map iterator. This is done in the @Producer method to ensure the map is initialized only if the next() method is called (lazy initialization). It also avoids the map initialization in the PostConstruct method as the Hazelcast map is not serializable. All the objects initialized in the PostConstruct method need to be serializable as the source can be serialized and sent to another worker in a distributed cluster. From the map, create an iterator on the map keys that will read from the map.
Transform every key/value pair into a Talend Record with a "key, value" object on every call to next(). The RecordBuilderFactory class used above is a built-in service in TCK injected via the Source constructor. This service is a factory to create Talend Records. Now, the next() method will produce a Record every time it is called. The method will return "null" if there is no more data in the map. Implement the @PreDestroy annotated method, responsible for releasing all resources used by the Source. The method needs to shut the Hazelcast client instance down to release any connection between the component and the Hazelcast cluster. The Hazelcast Source is completed. The next section shows how to write a simple unit test to check that it works properly. TCK provides a set of APIs and tools that make testing straightforward. The test of the Hazelcast Source consists in creating an embedded Hazelcast instance with only one member and initializing it with some data, and then in creating a test Job to read the data from it using the implemented Source. Add the required Maven test dependencies to the project. Initialize a Hazelcast test instance and create a map with some test data. To do that, create the hazelcastSourceTest.java test class in the src/test/java folder. Create the folder if it does not exist. The above example creates a Hazelcast instance for the test and creates the MY-DISTRIBUTED-MAP map. The getMap method creates the map if it does not already exist. Some keys and values used in the test are added. Then, a simple test checks that the data is correctly initialized. Finally, the Hazelcast test instance is shut down. Run the test and check in the logs that a Hazelcast cluster of one member has been created and that the test has passed. To be able to test components, TCK provides the @WithComponents annotation, which enables component testing. Add this annotation to the test. The annotation takes the component Java package as a value parameter.
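As a side note on the Producer logic described earlier (lazy initialization of the key iterator, one element per call to next(), and null when the data is exhausted), the pattern can be sketched in plain Java. This is not the tutorial's Source: the class name LazyMapReader is hypothetical, a java.util.Map stands in for the Hazelcast map, and a simple map entry stands in for the Talend Record.

```java
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for the Source; a plain Map replaces the Hazelcast map.
class LazyMapReader {

    private final Supplier<Map<String, String>> mapLookup; // stands in for "get the map from its name"
    private Map<String, String> map;                       // resolved on the first call to next()
    private Iterator<String> keys;                         // iterator on the map keys

    LazyMapReader(Supplier<Map<String, String>> mapLookup) {
        this.mapLookup = mapLookup;
    }

    // Mirrors the producer contract: one key/value pair per call, null when exhausted.
    Map.Entry<String, String> next() {
        if (keys == null) {
            // Lazy initialization: nothing is looked up until data is requested.
            map = mapLookup.get();
            keys = map.keySet().iterator();
        }
        if (!keys.hasNext()) {
            return null;
        }
        String key = keys.next();
        return new AbstractMap.SimpleImmutableEntry<>(key, map.get(key));
    }

    public static void main(String[] args) {
        Map<String, String> data = new LinkedHashMap<>();
        data.put("key1", "value1");
        data.put("key2", "value2");

        LazyMapReader reader = new LazyMapReader(() -> data);
        Map.Entry<String, String> entry;
        while ((entry = reader.next()) != null) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Deferring the lookup to next() keeps the reader free of non-serializable state until it actually runs, which is the reason the text gives for not initializing the map in the post construct method.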
Create the test Job that configures the Hazelcast instance and links it to an output that collects the data produced by the Source. Execute the unit test and check that it passes, meaning that the Source is reading the data correctly from Hazelcast. The Source is now completed and tested. The next section shows how to implement the Partition Mapper for the Source. In this case, the Partition Mapper will split the work (data reading) between the available cluster members to distribute the workload. The Partition Mapper calculates the number of Sources that can be created and executed in parallel on the available workers of a distributed system. For Hazelcast, it corresponds to the cluster member count. To fully illustrate this concept, this section also shows how to enhance the test environment to add more Hazelcast cluster members and initialize it with more data. Instantiate more Hazelcast instances, as every Hazelcast instance corresponds to one member in a cluster. In the test, it is reflected as follows: The above code sample creates two Hazelcast instances, leading to the creation of two Hazelcast members. Having a cluster of two members (nodes) allows the data to be distributed. The above code also adds more data to the test map and updates the shutdown method and the test. Run the test on the multi-node cluster. At this stage, the Source is a simple implementation that does not distribute the workload: it reads the data in a classic way, without distributing the read action to different cluster members. Start implementing the Partition Mapper class by creating a hazelcastPartitionMapper.java class file. When coupling a Partition Mapper with a Source, the Partition Mapper becomes responsible for injecting parameters and creating source instances. This way, all the attribute initialization moves from the Source to the Partition Mapper class. The configuration also sets an instance name to make it easy to find the client instance in the logs or while debugging.
The Partition Mapper class is composed of the following: constructor: Handles configuration and service injections. @Assessor: This annotation indicates that the method is responsible for assessing the dataset size. The underlying runner uses the estimated dataset size to compute the optimal bundle size to distribute the workload efficiently. @Split: This annotation indicates that the method is responsible for creating Partition Mapper instances based on the bundle size requested by the underlying runner and the size of the dataset. It creates as many partitions as possible to parallelize and distribute the workload efficiently on the available workers (known as members in the Hazelcast case). @Emitter: This annotation indicates that the method is responsible for creating the Source instance with an adapted configuration that lets it handle the amount of records it will produce, along with the required services. It adapts the configuration to let the Source read only the requested bundle of data. The Assessor method computes the memory size of every member of the cluster. Implementing it requires submitting a calculation task to the members through a serializable task that is aware of the Hazelcast instance. Create the serializable task. The purpose of this class is to submit any task to the Hazelcast cluster. Use the created task to estimate the dataset size in the Assessor method. The Assessor method calculates the memory size that the map occupies for all members. In Hazelcast, distributing a task to all members can be achieved using an execution service initialized in the getExecutorService() method. The size of the map is requested on every available member. By summing up the results, the total size of the map in the distributed cluster is computed. The Split method calculates the heap size of the map on every member of the cluster. Then, it calculates how many members a source can handle.
If a member contains less data than the requested bundle size, the method tries to combine it with another member. That combination can only happen if the combined data size is still less than or equal to the requested bundle size. The following code illustrates the logic described above. The next step consists in adapting the source to take the Split into account. The following sample shows how to adapt the Source to the Split carried out previously. The next() method reads the data from the members received from the Partition Mapper. A Big Data runner like Spark will get multiple Source instances. Every source instance will be responsible for reading data from a specific set of members already calculated by the Partition Mapper. The data is fetched only when the next() method is called. This logic makes it possible to stream the data from members without loading it all into memory. Implement the method annotated with @Emitter in the hazelcastPartitionMapper class. The createSource() method creates the source instance and passes the required services and the selected Hazelcast members to the source instance. Run the test and check that it works as intended. The component implementation is now done. It is able to read data and to distribute the workload to available members in a Big Data execution environment. Refactor the component by introducing a service to make some pieces of code reusable and avoid code duplication. Refactor the Hazelcast instance creation into a service. Inject this service into the Partition Mapper to reuse it. Adapt the Source class to reuse the service. Run the test one last time to ensure everything still works as expected. Thank you for following this tutorial. Use the logic and approach presented here to create any input component for any system.
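The Assessor and Split reasoning described in this tutorial (sum the per-member sizes, then combine members as long as the combined size stays within the requested bundle size) can be sketched in plain Java. This is only an illustration under stated assumptions, not the tutorial's Partition Mapper: the class name MemberGrouping, the member names and the sizes are hypothetical, and a simple first-fit strategy is assumed for combining members.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; member names and sizes are made up for the example.
class MemberGrouping {

    // Assessor idea: the total dataset size is the sum of the per-member sizes.
    static long totalSize(Map<String, Long> sizeByMember) {
        long total = 0;
        for (long size : sizeByMember.values()) {
            total += size;
        }
        return total;
    }

    // Split idea (first-fit, an assumption): a member joins the first group whose
    // combined size stays <= bundleSize; otherwise it starts a new group.
    // A member larger than bundleSize ends up alone in its own group.
    static List<List<String>> split(Map<String, Long> sizeByMember, long bundleSize) {
        List<List<String>> groups = new ArrayList<>();
        List<Long> groupSizes = new ArrayList<>();
        for (Map.Entry<String, Long> member : sizeByMember.entrySet()) {
            int target = -1;
            for (int i = 0; i < groups.size(); i++) {
                if (groupSizes.get(i) + member.getValue() <= bundleSize) {
                    target = i;
                    break;
                }
            }
            if (target < 0) {
                groups.add(new ArrayList<>());
                groupSizes.add(0L);
                target = groups.size() - 1;
            }
            groups.get(target).add(member.getKey());
            groupSizes.set(target, groupSizes.get(target) + member.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("member-a", 40L);
        sizes.put("member-b", 50L);
        sizes.put("member-c", 30L);

        System.out.println("total = " + totalSize(sizes));
        System.out.println("groups = " + split(sizes, 100L));
    }
}
```

Each resulting group corresponds to the set of members that one Source instance would read from.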

Implementing an Output component for Hazelcast  Example of output component implementation with Talend Component Kit   tutorial example output processor hazelcast

This tutorial is the continuation of the Talend Input component for Hazelcast tutorial. We will not walk through the project creation again, so please start from there before taking this one. This tutorial shows how to create a complete working output component for Hazelcast. As seen before, Hazelcast has multiple data source types. You can find queues, topics, caches, maps, and so on. In this tutorial we will stick with the Map dataset, and everything we see here is applicable to the other types. Let’s assume that our Hazelcast output component will be responsible for inserting data into a distributed Map. For that, we need to know which attribute from the incoming data is to be used as a key in the map. The value will be the whole record encoded in JSON format. With that in mind, we can design our output configuration as: the same Datastore and Dataset from the input component, and an additional configuration that defines the key attribute. Let’s create our Output configuration class. Let’s add the i18n properties of our configuration to the Messages.properties file. The skeleton of the output component looks as follows: The @Version annotation indicates the version of the component. It is used to migrate the component configuration if needed. The @Icon annotation indicates the icon of the component. Here, the icon is a custom icon that needs to be bundled in the component JAR under resources/icons. The @Processor annotation indicates that this class is the processor (output) and defines the name of the component. The constructor of the processor is responsible for injecting the component configuration and services. Configuration parameters are annotated with @Option. The other parameters are considered as services and are injected by the component framework. Services can be local (class annotated with @Service) or provided by the component framework. The method annotated with @PostConstruct is executed once per instance and can be used for initialization.
The method annotated with @PreDestroy is used to clean up resources at the end of the execution of the output. Data is passed to the method annotated with @ElementListener. That method is responsible for handling the data output. You can define all the related logic in this method. If you need to bulk write the updates according to groups, see Processors and batch processing. Now, we need to add the display name of the Output to the i18n resources file Messages.properties. Let’s implement all of those methods. We will create the output constructor to inject the component configuration and some additional local and built-in services. Built-in services are services provided by TCK. Here we find: configuration is the component configuration class. hazelcastService is the service that we implemented in the input component tutorial. It will be responsible for creating a Hazelcast client instance. jsonb is a built-in service provided by TCK to handle JSON object serialization and deserialization. We will use it to convert the incoming records to JSON format before inserting them into the map. There is nothing to do in the post construct method. We could, for example, initialize a Hazelcast instance there, but we will do it lazily on the first call to the @ElementListener method. In the @PreDestroy method, shut down the Hazelcast client instance and thus free the Hazelcast map reference. In the @ElementListener method, we get the key attribute from the incoming record and then convert the whole record to a JSON string. Then we insert the key/value pair into the Hazelcast map. Let’s create a unit test for our output component. The idea is to create a job that inserts the data using this output implementation. So, let’s create our test class. Here we start by creating a Hazelcast test instance, and we initialize the map. We also shut down the instance after all the tests are executed. Now let’s create our output test.
Here we start by preparing the emitter test component provided by TCK, which we use in our test job to generate random data for our output. Then, we use the output component to fill the Hazelcast map. At the end, we test that the map contains the exact amount of data inserted by the job. Run the test and check that it’s working. Congratulations, you just finished your output component.
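The element-handling logic described in this tutorial (resolve the map lazily on the first record, read the key attribute, serialize the whole record to JSON, insert the key/value pair) can be sketched in plain Java. This is a hedged illustration, not the tutorial's processor: the class name MapOutputSketch is hypothetical, a java.util.Map stands in for both the incoming record and the Hazelcast map, and the naiveJson helper is a simplistic stand-in for the Jsonb service that handles flat records only and does no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical stand-in for the output processor.
class MapOutputSketch {

    private final String keyAttribute;                          // configured key attribute
    private final Function<Map<String, Object>, String> toJson; // stands in for the Jsonb service
    private final Supplier<Map<String, String>> mapLookup;      // stands in for client creation + getMap
    private Map<String, String> target;                         // resolved lazily on the first record

    MapOutputSketch(String keyAttribute,
                    Function<Map<String, Object>, String> toJson,
                    Supplier<Map<String, String>> mapLookup) {
        this.keyAttribute = keyAttribute;
        this.toJson = toJson;
        this.mapLookup = mapLookup;
    }

    // Counterpart of the element listener: called once per incoming record.
    void onElement(Map<String, Object> record) {
        if (target == null) {
            target = mapLookup.get(); // lazy initialization on first use
        }
        Object key = record.get(keyAttribute);
        if (key == null) {
            throw new IllegalArgumentException("Missing key attribute: " + keyAttribute);
        }
        target.put(String.valueOf(key), toJson.apply(record));
    }

    // Counterpart of the pre-destroy step: drop the map reference.
    void release() {
        target = null;
    }

    // Simplistic JSON rendering for flat records; no escaping is performed.
    static String naiveJson(Map<String, Object> record) {
        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, Object> field : record.entrySet()) {
            if (json.length() > 1) {
                json.append(',');
            }
            json.append('"').append(field.getKey()).append("\":");
            Object value = field.getValue();
            json.append(value instanceof Number ? value.toString() : "\"" + value + "\"");
        }
        return json.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> store = new ConcurrentHashMap<>();
        MapOutputSketch output = new MapOutputSketch("id", MapOutputSketch::naiveJson, () -> store);

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", "42");
        record.put("name", "Ada");
        output.onElement(record);
        output.release();

        System.out.println(store);
    }
}
```

Passing the serializer and the map lookup as functions keeps the sketch free of external dependencies; in the real component these roles are played by the injected services.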

Testing a component  Example of input component testing using Talend Component Kit   tutorial example test hazelcast testing

This tutorial focuses on writing unit tests for the input component that was created in the previous tutorial. This tutorial covers: How to load components in a unit test. How to create a job pipeline. How to run the test in standalone mode. The test class is as follows:

Tutorials  Guided implementation examples to get your hands on Talend Component Kit   tutorial example implement test dev testing

The following tutorials are designed to help you understand the main principles of component development using Talend Component Kit. With this set of tutorials, get your hands on project creation using the Component Kit Starter and implement the logic of different types of components. Creating your first component Generating a project from the starter Creating a Hazelcast input component Creating a Hazelcast output component Creating a Zendesk REST API connector Handling component version migration With this set of tutorials, learn the different approaches to test the components created in the previous tutorials. Testing a Zendesk REST API connector Testing a Hazelcast component Testing in a continuous integration environment