Search results for developper
Integrate components you developed using Talend Component Kit to Talend Studio in a few steps. Also learn how to enable the developer and debugging modes to iterate on your component development. The version of Talend Component Kit you need to use to develop new components depends on the version of Talend Studio in which components will be integrated. Refer to this document to learn about compatibility between Talend Component Kit and the different versions of Talend applications. Learn how to build and deploy components to Talend Studio using Maven or Gradle Talend Component Kit plugins. This can be done using the deploy-in-studio goal from your development environment. If you are unfamiliar with component development, you can also follow this example to go through the entire process, from creating a project to using your new component in Talend Studio. The Studio integration relies on the Component Server, that the Studio uses to gather data about components created using Talend Component Kit. You can change the default configuration of component server by modifying the $STUDIO_HOME/configuration/config.ini file. The following parameters are available: Name Description Default component.environment Enables the developer mode when set to dev - component.debounce.timeout Specifies the timeout (in milliseconds) before calling listeners in components Text fields 750 component.kit.skip If set to true, the plugin is not enabled. It is useful if you don’t have any component developed with the framework. false component.java.arguments Component server additional options - component.java.m2 Maven repository that the server uses to resolve components Defaults to the global Studio configuration component.java.coordinates A list of comma-separated GAV (groupId:artifactId:version) of components to register - component.java.registry A properties file with values matching component GAV (groupId:artifactId:version) registered at startup. Only use slashes (even on windows) in the path. - component.java.port Sets the port to use for the server random components.server.beam.active Active, if set to true, Beam support (Experimental). It requires Beam SDK Java core dependencies to be available. false component.server.jul.forceConsole Adds a console handler to JUL to see logs in the console. This can be helpful in development because the formatting is clearer than the OSGi one in workspace/.metadata/.log. It uses the java.util.logging.SimpleFormatter.format property to define its format. By default, it is %1$tb %1$td, %1$tY %1$tl:%1$tM:%1$tS %1$Tp %2$s%n%4$s: %5$s%6$s%n, but for development purposes [%4$s] %5$s%6$s%n is simpler and more readable. false Here is an example of a common developer configuration/config.ini file: The developer mode is especially useful to iterate on your component development and to avoid closing and restarting Talend Studio every time you make a change to a component. It adds a Talend Component Kit button in the main toolbar: When clicking this button, all components developed with the Talend Component Kit framework are reloaded. The cache is invalidated and the components refreshed. You still need to add and remove the components to see the changes. To enable it, simply set the component.environment parameter to dev in the config.ini configuration file of the component server. Several methods allow you to debug custom components created with Talend Component Kit in Talend Studio. From your development tool, create a new Remote configuration, and copy the Command line arguments for running remote JVM field. For example, -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005, where: the suspend parameter of the -agentlib argument specifies whether you want to suspend the debugged JVM until the debugger attaches to it. Possible values are n (no, default value) or y (yes). the address parameter of the -agentlib argument is the port used for the remote configuration. Make sure this port is available. Open Talend Studio. Create a new Job that uses the component you want to debug or open an existing one that already uses it. Go to the Run tab of the Job and select Use specific JVM arguments. Click New to add an argument. In the popup window, paste the arguments copied from the IDE. Enter the corresponding debug mode: To debug the runtime, run the Job and access the remote host configured in the IDE. To debug the Guess schema option, click the Guess schema action button of the component and access the remote host configured in the IDE. From your development tool, create a new Remote configuration, and copy the Command line arguments for running remote JVM field. For example, -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005, where: suspend defines whether you need to access the defined configuration to run the remote JVM. Possible values are n (no, default value) or y (yes). address is the port used for the remote configuration. Make sure this port is available. Access the installation directory of your Talend Sutdio. Open the .ini file corresponding to your Operating System. For example, TOS_DI-win-x86_64.ini. Paste the arguments copied from the IDE in a new line of the file. Go to Talend Studio to use the component, and access the host host configured in the IDE. If you run multiple Studio instances automatically in parallel, you can run into some issues with the random port computation. For example on a CI platform. For that purpose, you can create the $HOME/.talend/locks/org.talend.sdk.component.studio-integration.lock file. Then, when a server starts, it acquires a lock on that file and prevents another server to get a port until it is started. It ensures that you can’t have two concurrent processes getting the same port allocated. However, it is highly unlikely to happen on a desktop. In that case, forcing a different value through component.java.port in your config.ini file is a better solution for local installations.
OSS addict I'm involved in several ASF projects (http://home.apache.org/committer-index.html#rmannibucau) and Yupiik (https://www.yupiik.io/projects.html). Blog: https://rmannibucau.metawerx.net Software engineer @Talend. Components team member. Blog: undx.github.io Technical Writer Frontend Architect R&D Senior principal software engineer at @Talend, security contributor at @apache. Blog: http://coheigea.blogspot.com/ Frontend Architect. This is my Talend account. You can check out @jsomsanith for my personal account QA Automation @ Talend Nantes Blog: timeline.antoinenicolas.com Committer and PMC member of Apache Beam and Apache Avro. Free education and Open Source enthusiast. Distributed Systems practitioner (victim?) Blog: https://ismaelmejia.com/ Templar I'm a serial tooler. Blog: https://www.linkedin.com/in/zoltantakacsdev/ Focused on Big Data technologies. I contribute to open source projects. I'm an Apache Beam committer and PMC member and an Apache Software Foundation member Blog: https://echauchot.blogspot.com/ Senior Cloud Software Architect Product Manager, Information Architect Senior Application Security Engineer Blog: www.talend.com 573PH4N3 D3M15 K3RM480N Senior Software engineer at @komodohealth. Ph.D. in large-scale storage systems. Previous competitive programmer.
This section mainly concerns tools that can be used with JUnit. You can use most of these best practices with TestNG as well. Parameterized tests are a great solution to repeat the same test multiple times. This method of testing requires defining a test scenario (I test function F) and making the input/output data dynamic. Here is a test example, which validates a connection URI using ConnectionService: The testing method is always the same. Only values are changing. It can therefore be rewritten using JUnit Parameterized runner, as follows: You don’t have to define a single @Test method. If you define multiple methods, each of them is executed with all the data. For example, if another test is added to the previous example, four tests are executed - 2 per data). With JUnit 5, parameterized tests are easier to use. The full documentation is available at junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests. The main difference with JUnit 4 is that you can also define inline that the test method is a parameterized test as well as the values to use: However, you can still use the previous behavior with a method binding configuration: This last option allows you to inject any type of value - not only primitives - which is common to define scenarios. Add the junit-jupiter-params dependency to benefit from this feature.
Several data generators exist if you want to populate objects with a semantic that is more evolved than a plain random string like commons-lang3: github.com/Codearte/jfairy github.com/DiUS/java-faker github.com/andygibson/datafactory etc. Even more advanced, the following generators allow to directly bind generic data on a model. However, data quality is not always optimal: github.com/devopsfolks/podam github.com/benas/random-beans etc. There are two main kinds of implementation: Implementations using a pattern and random generated data. Implementations using a set of precomputed data extrapolated to create new values. Check your use case to know which one fits best. An alternative to data generation can be to import real data and use Talend Studio to sanitize the data, by removing sensitive information and replacing it with generated or anonymized data. Then you just need to inject that file into the system. If you are using JUnit 5, you can have a look at glytching.github.io/junit-extensions/randomBeans.
Talend Component Kit is a Java framework designed to simplify the development of components at two levels: The Runtime, that injects the specific component code into a job or pipeline. The framework helps unifying as much as possible the code required to run in Data Integration (DI) and BEAM environments. The Graphical interface. The framework helps unifying the code required to render the component in a browser or in the Eclipse-based Talend Studio (SWT). Most part of the development happens as a Maven or Gradle project and requires a dedicated tool such as IntelliJ. The Component Kit is made of: A Starter, that is a graphical interface allowing you to define the skeleton of your development project. APIs to implement components UI and runtime. Development tools: Maven and Gradle wrappers, validation rules, packaging, Web preview, etc. A testing kit based on JUnit 4 and 5. By using this tooling in a development environment, you can start creating components as described below. Developing new components using the Component Kit framework includes: Creating a project using the starter or the Talend IntelliJ plugin. This step allows to build the skeleton of the project. It consists in: Defining the general configuration model for each component in your project. Generating and downloading the project archive from the starter. Compiling the project. Importing the compiled project in your IDE. This step is not required if you have generated the project using the IntelliJ plugin. Implementing the components, including: Registering the components by specifying their metadata: family, categories, version, icon, type and name. Defining the layout and configurable part of the components. Defining the execution logic of the components, also called runtime. Testing the components. Deploying the components to Talend Studio or Cloud applications. Optionally, you can use services. Services are predefined or user-defined configurations that can be reused in several components. There are four types of components, each type coming with its specificities, especially on the runtime side. Input components: Retrieve the data to process from a defined source. An input component is made of: The execution logic of the component, represented by a Mapper or an Emitter class. The source logic of the component, represented by a Source class. The layout of the component and the configuration that the end-user will need to provide when using the component, defined by a Configuration class. All input components must have a dataset specified in their configuration, and every dataset must use a datastore. Processors: Process and transform the data. A processor is made of: The execution logic of the component, describing how to process each records or batches of records it receives. It also describes how to pass records to its output connections. This logic is defined in a Processor class. The layout of the component and the configuration that the end-user will need to provide when using the component, defined by a Configuration class. Output components: Send the processed data to a defined destination. An output component is made of: The execution logic of the component, describing how to process each records or batches of records it receives. This logic is defined in an Output class. Unlike processors, output components are the last components of the execution and return no data. The layout of the component and the configuration that the end-user will need to provide when using the component, defined by a Configuration class. All input components must have a dataset specified in their configuration, and every dataset must use a datastore. Standalone components: Make a call to the service or run a query on the database. A standalone component is made of: The execution logic of the component, represented by a DriverRunner class. The layout of the component and the configuration that the end-user will need to provide when using the component, defined by a Configuration class. All input components must have a datastore or dataset specified in their configuration, and every dataset must use a datastore. The following example shows the different classes of an input components in a multi-component development project: Setup your development environment Generate your first project and develop your first component
To be able to see and use your newly developed components, you need to integrate them to the right application. Currently, you can deploy your components to Talend Studio as part of your development process to iterate on them: Iterating on component development with Talend Studio You can also share your components externally and install them using a component archive (.car) file. Sharing and installing components in Talend Studio Check the versions of the framework that are compatible with your version of Talend Studio in this document. If you were used to create custom components with the Javajet framework and want to get to know the new approach and main differences of the Component Kit framework, refer to this document.
Talend Component Kit is a toolkit based on Java and designed to simplify the development of components at two levels: Runtime: Runtime is about injecting the specific component code into a job or pipeline. The framework helps unify as much as possible the code required to run in Data Integration (DI) and BEAM environments. Graphical interface: The framework helps unify the code required to be able to render the component in a browser (web) or in the Eclipse-based Studio (SWT). The Talend Component Kit framework is made of several tools designed to help you during the component development process. It allows to develop components that fit in both Java web UIs. Starter: Generate the skeleton of your development project using a user-friendly interface. The Talend Component Kit Starter is available as a web tool or as a plugin for the IntelliJ IDE. Component API: Check all classes available to implement components. Build tools: The framework comes with Maven and Gradle wrappers, which allow to always use the version of Maven or Gradle that is right for your component development environment and version. Testing tools: Test components before integrating them into Talend Studio or Cloud applications. Testing tools include the Talend Component Kit Web Tester, which allows to check the web UI of your components on your local machine. You can find more details about the framework design in this document. The Talend Component Kit project is available on GitHub in the following repository
This tutorial walks you through the most common iteration steps to create a component with Talend Component Kit and to deploy it to Talend Open Studio.
The component created in this tutorial is a simple processor that reads data coming from the previous component in a job or pipeline and displays it in the console logs of the application, along with an additional information entered by the final user.
The component designed in this tutorial is a processor and does not require nor show any datastore and dataset configuration. Datasets and datastores are required only for input and output components.
To get your development environment ready and be able to follow this tutorial:
Download and install a Java JDK 1.8 or greater.
Download and install Talend Open Studio. For example, from Sourceforge.
Download and install IntelliJ.
Download the Talend Component Kit plugin for IntelliJ. The detailed installation steps for the plugin are available in this document.
The first step in this tutorial is to generate a component skeleton using the Starter embedded in the Talend Component Kit plugin for IntelliJ.
Start IntelliJ and create a new project. In the available options, you should see Talend Component.
Make sure that a Project SDK is selected. Then, select Talend Component and click Next. The Talend Component Kit Starter opens.
Enter the component and project metadata. Change the default values, for example as presented in the screenshot below:
The Component Family and the Category will be used later in Talend Open Studio to find the new component.
Project metadata is mostly used to identify the project structure. A common practice is to replace 'company' in the default value by a value of your own, like your domain name.
Once the metadata is filled, select Add a component. A new screen is displayed in the Talend Component Kit Starter that lets you define the generic configuration of the component. By default, new components are processors.
Enter a valid Java name for the component. For example, Logger.
Select Configuration Model and add a string type field named level. This input field will be used in the component configuration for final users to enter additional information to display in the logs.
In the Input(s) / Output(s) section, click the default MAIN input branch to access its detail, and make sure that the record model is set to Generic. Leave the Name of the branch with its default MAIN value.
Repeat the same step for the default MAIN output branch.
Because the component is a processor, it has an output branch by default. A processor without any output branch is considered an output component. You can create output components when the Activate IO option is selected.
Click Next and check the name and location of the project, then click Finish to generate the project in the IDE.
At this point, your component is technically already ready to be compiled and deployed to Talend Open Studio. But first, take a look at the generated project:
Two classes based on the name and type of component defined in the Talend Component Kit Starter have been generated:
LoggerProcessor is where the component logic is defined
LoggerProcessorConfiguration is where the component layout and configurable fields are defined, including the level string field that was defined earlier in the configuration model of the component.
The package-info.java file contains the component metadata defined in the Talend Component Kit Starter, such as family and category.
You can notice as well that the elements in the tree structure are named after the project metadata defined in the Talend Component Kit Starter.
These files are the starting point if you later need to edit the configuration, logic, and metadata of the component.
There is more that you can do and configure with the Talend Component Kit Starter. This tutorial covers only the basics. You can find more information in this document.
Without modifying the component code generated from the Starter, you can compile the project and deploy the component to a local instance of Talend Open Studio.
The logic of the component is not yet implemented at that stage. Only the configurable part specified in the Starter will be visible. This step is useful to confirm that the basic configuration of the component renders correctly.
Before starting to run any command, make sure that Talend Open Studio is not running.
From the component project in IntelliJ, open a Terminal and make sure that the selected directory is the root of the project. All commands shown in this tutorial are performed from this location.
Compile the project by running the following command: mvnw clean install. The mvnw command refers to the Maven wrapper that is embedded in Talend Component Kit. It allows to use the right version of Maven for your project without having to install it manually beforehand. An equivalent wrapper is available for Gradle.
Once the command is executed and you see BUILD SUCCESS in the terminal, deploy the component to your local instance of Talend Open Studio using the following command: mvnw talend-component:deploy-in-studio -Dtalend.component.studioHome="
You can integrate and start using components developed using Talend Component Kit in Talend applications very easily. As both the development framework and Talend applications evolve over time, you need to ensure compatibility between the components you develop and the versions of Talend applications that you are targeting, by making sure that you use the right version of Talend Component Kit. The version of Talend Component Kit you need to use to develop new components depends on the versions of the Talend applications in which these components will be integrated. Talend product Talend Component Kit version Talend Studio 8.8.8 (aka master) latest release Talend Studio 8.0.1 latest release QA approved Talend Studio 7.3.1 Framework until 1.38.x Talend Studio 7.2.1 Framework until 1.1.10 Talend Studio 7.1.1 Framework until 1.1.1 Talend Studio 7.0.1 Framework until 0.0.5 Talend Cloud latest release QA and cloud teams approved More recent versions of Talend Component Kit contain many fixes, improvements and features that help developing your components. However, they can cause some compatibility issues when deploying these components to older/different versions of Talend Studio and Talend Cloud. Choose the version of Talend Component Kit that best fits your needs. Creating a project using the Component Kit Starter always uses the latest release of Talend Component Kit. However, you can manually change the version of Talend Component Kit directly in the generated project. Go to your IDE and access the project root .pom file. Look for the org.talend.sdk.component dependency nodes. Replace the version in the relevant nodes with the version that you need to use for your project. You can use a Snapshot of the version under development using the -SNAPSHOT version and Sonatype snapshot repository.
Talend Component Kit provides a migration mechanism between two versions of a component to let you ensure backward compatibility. For example, a new version of a component may have some new options that need to be remapped, set with a default value in the older versions, or disabled. This tutorial shows how to create a migration handler for a component that needs to be upgraded from a version 1 to a version 2. The upgrade to the newer version includes adding new options to the component. This tutorial assumes that you know the basics about component development and are familiar with component project generation and implementation. To follow this tutorial, you need: Java 8 A Talend component development environment using Talend Component Kit. Refer to this document. Have generated a project containing a simple processor component using the Talend Component Kit Starter. First, create a simple processor component configured as follows: Create a simple configuration class that represents a basic authentication and that can be used in any component requiring this kind of authentication. Create a simple output component that uses the configuration defined earlier. The component configuration is injected into the component constructor. The version of the configuration class corresponds to the component version. By configuring these two classes, the first version of the component is ready to use a simple authentication mechanism. Now, assuming that the component needs to support a new authentication mode following a new requirement, the next steps are: Creating a version 2 of the component that supports the new authentication mode. Handling migration from the first version to the new version. The second version of the component needs to support a new authentication method and let the user choose the authentication mode he wants to use using a dropdown list. Add an Oauth2 authentication mode to the component in addition to the basic mode. For example: The options of the new authentication mode are now defined. Wrap the configuration created above in a global configuration with the basic authentication mode and add an enumeration to let the user choose the mode to use. For example, create an AuthenticationConfiguration class as follows: Using the @ActiveIf annotation allows to activate the authentication type according to the selected authentication mode. Edit the component to use the new configuration that supports an additional authentication mode. Also upgrade the component version from 1 to 2 as its configuration has changed. The component now supports two authentication modes in its version 2. Once the new version is ready, you can implement the migration handler that will take care of adapting the old configuration to the new one. What can happen if an old configuration is passed to the new component version? It simply fails, as the version 2 does not recognize the old version anymore. For that reason, a migration handler that adapts the old configuration to the new one is required. It can be achieved by defining a migration handler class in the @Version annotation of the component class. An old configuration may already be persisted by an application that integrates the version 1 of the component (Studio or web application). Add a migration handler class to the component version. Create the migration handler class MyOutputMigrationHandler. the incoming version, which is the version of the configuration that we are migrating from a map (key, value) of the configuration, where the key is the configuration path and the value is the value of the configuration. You need to be familiar with the component configuration path construction to better understand this part. Refer to Defining component layout and configuration. As a reminder, the following changes were made since the version 1 of the component: The configuration BasicAuth from the version 1 is not the root configuration anymore, as it is under AuthenticationConfiguration. AuthenticationConfiguration is the new root configuration. The component supports a new authentication mode (Oauth2) which is the default mode in the version 2 of the component. To migrate the old component version to the new version and to keep backward compatibility, you need to: Remap the old configuration to the new one. Give the adequate default values to some options. In the case of this scenario, it means making all configurations based on the version 1 of the component have the authenticationMode set to basic by default and remapping the old basic authentication configuration to the new one. if a configuration has been renamed between 2 component versions, you can get the old configuration option from the configuration map by using its old path and set its value using its new path. You can now upgrade your component without losing backward compatibility.
From the version 7.0 of Talend Studio, Talend Component Kit becomes the recommended framework to use to develop components.
This framework is being introduced to ensure that newly developed components can be deployed and executed both in on-premise/local and cloud/big data environments.
From that new approach comes the need to provide a complete yet unique and compatible way of developing components.
With the Component Kit, custom components are entirely implemented in Java. To help you get started with a new custom component development project, a Starter is available. Using it, you will be able to generate the skeleton of your project. By importing this skeleton in a development tool, you can then implement the components layout and execution logic in Java.
With the previous Javajet framework, metadata, widgets and configurable parts of a custom component were specified in XML. With the Component Kit, they are now defined in the
Testing code that consumes REST APIs can sometimes present many constraints: API rate limit, authentication token and password sharing, API availability, sandbox expiration, API costs, and so on. As a developer, it becomes critical to avoid those constraints and to be able to easily mock the API response. The component framework provides an API simulation tool that makes it easy to write unit tests. This tutorial shows how to use this tool in unit tests. As a starting point, the tutorial uses the component that consumes Zendesk Search API and that was created in a previous tutorial. The goal is to add unit tests for it. For this tutorial, four tickets that have the open status have been added to the Zendesk test instance used in the tests. To learn more about the testing methodology used in this tutorial, refer to Component JUnit testing. Create a unit test that performs a real HTTP request to the Zendesk Search API instance. You can learn how to create a simple unit test in this tutorial. the authentication configuration using Zendesk instance URL and credentials. the search query configuration to get all the open ticket, ordered by creation date and sorted in descending order. The test is now complete and working. It performs a real HTTP request to the Zendesk instance. As an alternative, you can use mock results to avoid performing HTTP requests every time on the development environment. The real HTTP requests would, for example, only be performed on an integration environment. To transform the unit test into a mocked test that uses a mocked response of the Zendesk Search API: Add the two following JUnit rules provided by the component framework. JUnit4HttpApi: This rule starts a simulation server that acts as a proxy and catches all the HTTP requests performed in the tests. This simulation server has two modes : capture : This mode forwards the captured HTTP request to the real server and captures the response. simulation : this mode returns a mocked response from the responses already captured. This rule needs to be added as a class rule. JUnit4HttpApi: This rule has a reference to the first rule. Its role is to configure the simulation server for every unit test. It passes the context of the running test to the simulation server. This rule needs to be added as a simple (method) rule. Example to run in a simulation mode: Make the test run in capture mode to catch the real API responses that can be used later in the simulated mode. To do that, set a new talend.junit.http.capture environment variable to true. This tells the simulation server to run in a capture mode. The captured response is saved in the resources/talend.testing.http package in a JSON format, then reused to perform the API simulation.
A Processor is a component that converts incoming data to a different model.
A processor must have a method decorated with @ElementListener taking an incoming data and returning the processed data:
Processors must be Serializable because they are distributed components.
If you just need to access data on a map-based ruleset, you can use Record or JsonObject as parameter type. From there, Talend Component Kit wraps the data to allow you to access it as a map. The parameter type is not enforced. This means that if you know you will get a SuperCustomDto, then you can use it as parameter type. But for generic components that are reusable in any chain, it is highly encouraged to use Record until you have an evaluation language-based processor that has its own way to access components.
For example:
A processor also supports @BeforeGroup and @AfterGroup methods, which must not have any parameter and return void values. Any other result would be ignored. These methods are used by the runtime to mark a chunk of the data in a way which is estimated good for the execution flow size.
Because the size is estimated, the size of a group can vary. It is even possible to have groups of size 1.
It is recommended to batch records, for performance reasons:
You can optimize the data batch processing by using the maxBatchSize parameter. This parameter is automatically implemented on the component when it is deployed to a Talend application. Only the logic needs to be implemented. You can however customize its value setting in your LocalConfiguration the property _maxBatchSize.value - for the family - or ${component simple class name}._maxBatchSize.value - for a particular component, otherwise its default will be 1000. If you replace value by active, you can also configure if this feature is enabled or not. This is useful when you don’t want to use it at all. Learn how to implement chunking/bulking in this document.
In some cases, you may need to split the output of a processor in two or more connections. A common example is to have "main" and "reject" output connections where part of the incoming data are passed to a specific bucket and processed later.
Talend Component Kit supports two types of output connections: Flow and Reject.
Flow is the main and standard output connection.
The Reject connection handles records rejected during the processing. A component can only have one reject connection, if any. Its name must be REJECT to be processed correctly in Talend applications.
You can also define the different output connections of your component in the Starter.
To define an output connection, you can use @Output as replacement of the returned value in the @ElementListener:
Alternatively, you can pass a string that represents the new branch:
Having multiple inputs is similar to having multiple outputs, except that an OutputEmitter wrapper is not needed:
@Input takes the input name as parameter. If no name is set, it defaults to the "main (default)" input branch. It is recommended to use the default branch when possible and to avoid naming branches according to the component semantic.
Batch processing refers to the way execution environments process batches of data handled by a component using a grouping mechanism.
By default, the execution environment of a component automatically decides how to process groups of records and estimates an optimal group size depending on the system capacity. With this default behavior, the size of each group could sometimes be optimized for the system to handle the load more effectively or to match business requirements.
For example, real-time or near real-time processing needs often imply processing smaller batches of data, but more often. On the other hand, a one-time processing without business contraints is more effectively handled with a batch size based on the system capacity.
Final users of a component developed with the Talend Component Kit that integrates the batch processing logic described in this document can override this automatic size. To do that, a maxBatchSize option is available in the component settings and allows to set the maximum size of each group of data to process.
A component processes batch data as follows:
Case 1 - No maxBatchSize is specified in the component configuration. The execution environment estimates a group size of 4. Records are processed by groups of 4.
Case 2 - The runtime estimates a group size of 4 but a maxBatchSize of 3 is specified in the component configuration. The system adapts the group size to 3. Records are processed by groups of 3.
Batch processing relies on the sequence of three methods: @BeforeGroup, @ElementListener, @AfterGroup, that you can customize to your needs as a component Developer.
The group size automatic estimation logic is automatically implemented when a component is deployed to a Talend application.
Each group is processed as follows until there is no record left:
The @BeforeGroup method resets a record buffer at the beginning of each group.
The records of the group are assessed one by one and placed in the buffer as follows: The @ElementListener method tests if the buffer size is greater or equal to the defined maxBatchSize. If it is, the records are processed. If not, then the current record is buffered.
The previous step happens for all records of the group. Then the @AfterGroup method tests if the buffer is empty.
You can define the following logic in the processor configuration:
You can also use the condensed syntax for this kind of processor:
When writing tests for components, you can force the maxBatchSize parameter value by setting it with the following syntax:
Processors and output components are the components in charge of reading, processing and transforming data in a Talend job, as well as passing it to its required destination.
Before implementing the component logic and defining its layout and configurable fields, make sure you have specified its basic metadata, as detailed in this document.
A Processor is a component that converts incoming data to a different model.
A processor must have a method decorated with @ElementListener taking an incoming data and returning the processed data:
Processors must be Serializable because they are distributed components.
If you just need to access data on a map-based ruleset, you can use Record or JsonObject as parameter type. From there, Talend Component Kit wraps the data to allow you to access it as a map. The parameter type is not enforced. This means that if you know you will get a SuperCustomDto, then you can use it as parameter type. But for generic components that are reusable in any chain, it is highly encouraged to use Record until you have an evaluation language-based processor that has its own way to access components.
For example:
A processor also supports @BeforeGroup and @AfterGroup methods, which must not have any parameter and return void values. Any other result would be ignored. These methods are used by the runtime to mark a chunk of the data in a way which is estimated good for the execution flow size.
Because the size is estimated, the size of a group can vary. It is even possible to have groups of size 1.
It is recommended to batch records, for performance reasons:
You can optimize the data batch processing by using the maxBatchSize parameter. This parameter is automatically implemented on the component when it is deployed to a Talend application. Only the logic needs to be implemented. You can however customize its value setting in your LocalConfiguration the property _maxBatchSize.value - for the family - or ${component simple class name}._maxBatchSize.value - for a particular component, otherwise its default will be 1000. If you replace value by active, you can also configure if this feature is enabled or not. This is useful when you don’t want to use it at all. Learn how to implement chunking/bulking in this document.
In some cases, you may need to split the output of a processor in two or more connections. A common example is to have "main" and "reject" output connections where part of the incoming data are passed to a specific bucket and processed later.
Talend Component Kit supports two types of output connections: Flow and Reject.
Flow is the main and standard output connection.
The Reject connection handles records rejected during the processing. A component can only have one reject connection, if any. Its name must be REJECT to be processed correctly in Talend applications.
You can also define the different output connections of your component in the Starter.
To define an output connection, you can use @Output as replacement of the returned value in the @ElementListener:
Alternatively, you can pass a string that represents the new branch:
Having multiple inputs is similar to having multiple outputs, except that an OutputEmitter wrapper is not needed:
@Input takes the input name as parameter. If no name is set, it defaults to the "main (default)" input branch. It is recommended to use the default branch when possible and to avoid naming branches according to the component semantic.
Batch processing refers to the way execution environments process batches of data handled by a component using a grouping mechanism.
By default, the execution environment of a component automatically decides how to process groups of records and estimates an optimal group size depending on the system capacity. With this default behavior, the size of each group could sometimes be optimized for the system to handle the load more effectively or to match business requirements.
For example, real-time or near real-time processing needs often imply processing smaller batches of data, but more often. On the other hand, a one-time processing without business contraints is more effectively handled with a batch size based on the system capacity.
Final users of a component developed with the Talend Component Kit that integrates the batch processing logic described in this document can override this automatic size. To do that, a maxBatchSize option is available in the component settings and allows to set the maximum size of each group of data to process.
A component processes batch data as follows:
Case 1 - No maxBatchSize is specified in the component configuration. The execution environment estimates a group size of 4. Records are processed by groups of 4.
Case 2 - The runtime estimates a group size of 4 but a maxBatchSize of 3 is specified in the component configuration. The system adapts the group size to 3. Records are processed by groups of 3.
Batch processing relies on the sequence of three methods: @BeforeGroup, @ElementListener, @AfterGroup, that you can customize to your needs as a component Developer.
The group size automatic estimation logic is automatically implemented when a component is deployed to a Talend application.
Each group is processed as follows until there is no record left:
The @BeforeGroup method resets a record buffer at the beginning of each group.
The records of the group are assessed one by one and placed in the buffer as follows: The @ElementListener method tests if the buffer size is greater or equal to the defined maxBatchSize. If it is, the records are processed. If not, then the current record is buffered.
The previous step happens for all records of the group. Then the @AfterGroup method tests if the buffer is empty.
You can define the following logic in the processor configuration:
You can also use the condensed syntax for this kind of processor:
When writing tests for components, you can force the maxBatchSize parameter value by setting it with the following syntax:
Batch processing refers to the way execution environments process batches of data handled by a component using a grouping mechanism.
By default, the execution environment of a component automatically decides how to process groups of records and estimates an optimal group size depending on the system capacity. With this default behavior, the size of each group could sometimes be optimized for the system to handle the load more effectively or to match business requirements.
For example, real-time or near real-time processing needs often imply processing smaller batches of data, but more often. On the other hand, a one-time processing without business contraints is more effectively handled with a batch size based on the system capacity.
Final users of a component developed with the Talend Component Kit that integrates the batch processing logic described in this document can override this automatic size. To do that, a maxBatchSize option is available in the component settings and allows to set the maximum size of each group of data to process.
A component processes batch data as follows:
Case 1 - No maxBatchSize is specified in the component configuration. The execution environment estimates a group size of 4. Records are processed by groups of 4.
Case 2 - The runtime estimates a group size of 4 but a maxBatchSize of 3 is specified in the component configuration. The system adapts the group size to 3. Records are processed by groups of 3.
Batch processing relies on the sequence of three methods: @BeforeGroup, @ElementListener, @AfterGroup, that you can customize to your needs as a component Developer.
The group size automatic estimation logic is automatically implemented when a component is deployed to a Talend application.
Each group is processed as follows until there is no record left:
The @BeforeGroup method resets a record buffer at the beginning of each group.
The records of the group are assessed one by one and placed in the buffer as follows: The @ElementListener method tests if the buffer size is greater or equal to the defined maxBatchSize. If it is, the records are processed. If not, then the current record is buffered.
The previous step happens for all records of the group. Then the @AfterGroup method tests if the buffer is empty.
You can define the following logic in the processor configuration:
You can also use the condensed syntax for this kind of processor:
When writing tests for components, you can force the maxBatchSize parameter value by setting it with the following syntax:
This tutorial walks you through the creation, from scratch, of a complete Talend input component for Hazelcast using the Talend Component Kit (TCK) framework.
Hazelcast is an in-memory distributed system that can store data, which makes it a good example of input component for distributed systems. This is enough for you to get started with this tutorial, but you can find more information about it here: hazelcast.org/.
A TCK project is a simple Java project with specific configurations and dependencies. You can choose your preferred build tool from Maven or Gradle as TCK supports both. In this tutorial, Maven is used.
The first step consists in generating the project structure using Talend Starter Toolkit .
Go to starter-toolkit.talend.io/ and fill in the project information as shown in the screenshots below, then click Finish and Download as ZIP.
image::tutorial_hazelcast_generateproject_1.png[] image::tutorial_hazelcast_generateproject_2.png[]
Extract the ZIP file into your workspace and import it to your preferred IDE. This tutorial uses Intellij IDE, but you can use Eclipse or any other IDE that you are comfortable with.
You can use the Starter Toolkit to define the full configuration of the component, but in this tutorial some parts are configured manually to explain key concepts of TCK.
The generated pom.xml file of the project looks as follows:
Change the name tag to a more relevant value, for example:
Once you have generated a project, you can start implementing the logic and layout of your components and iterate on it. Depending on the type of component you want to create, the logic implementation can differ. However, the layout and component metadata are defined the same way for all types of components in your project. The main steps are: Defining family and component metadata Defining an input component logic Defining a processor/output logic Defining a standalone component logic Defining component layout and configuration In some cases, you will require specific implementations to handle more advanced cases, such as: Internationalizing a component Managing component versions Masking sensitive data Implementing batch processing Implementing streaming on a component You can also make certain configurations reusable across your project by defining services. Using your Java IDE along with a build tool supported by the framework, you can then compile your components to test and deploy them to Talend Studio or other Talend applications: Building components with Maven Building components with Gradle Wrapping a Beam I/O In any case, follow these best practices to ensure the components you develop are optimized. You can also learn more about component loading and plugins here: Loading a component
Components built using Talend Component Kit can be shared as component archives (.car). These CAR files are executable files allowing to easily deploy the components it contains to any compatible version of Talend Studio. Component developers can generate .car files from their projects to share their components and make them available for other users, as detailed in this document. This document assumes that you have a component archive (.car) file and need to deploy it to Talend Studio. The component archive (.car) is executable and exposes the studio-deploy command which takes a Talend Studio home path as parameter. When executed, it installs the dependencies into the Studio and registers the component in your instance. For example: You can also upload the dependencies to your Nexus server using the following command: In this command, Nexus URL and repository name are mandatory arguments. All other arguments are optional. If arguments contain spaces or special symbols, you need to quote the whole value of the argument. For example: Talend Studio allows you to share components you have created using Talend Component Kit to other users working on the same remote project. Remote projects are available with Enterprise versions of Talend Studio only. Also, note that this feature has been removed in Studio since 7.3 release. Make sure you are connected to a remote project and the artifact repository for component sharing has been properly configured. On the toolbar of the Studio main window, click or click File > Edit Project Properties from the menu bar to open the Project Settings dialog box. In the tree view of the dialog box, select Repository Share to open the corresponding view. Select the Propagate components update to Artifact Repository check box. In the Repository ID field, specify the artifact repository configured for component sharing, and then click Check connection to verify the connectivity. Click Apply and Close to validate the settings and close the dialog box. Create a folder named patches at the root of your Talend Studio installation directory, then copy the .car files of the components you want share to this folder. Restart your Talend Studio and connect to the remote project. The components are deployed automatically to the repository and available in the Palette for other users when connected to a remote project with the same sharing repository configuration. My custom component builds correctly but does not appear in Talend Studio, how to fix it? This issue can be caused by the icon specified in the component metadata. Make sure to specify a custom icon for the component and the component family. These custom icons must be in PNG format to be properly handled by Talend Studio. Remove SVG parameters from the talend.component.server.icon.paths property in the HTTP server configuration. Refer to this section. Learn more about defining custom icons for components in this document.
The Component Kit Starter lets you design your components configuration and generates a ready-to-implement project structure. The Starter is available on the web or as an IntelliJ plugin. This tutorial shows you how to use the Component Kit Starter to generate new components for MySQL databases. Before starting, make sure that you have correctly setup your environment. See this section. When defining a project using the Starter, do not refresh the page to avoid losing your configuration. Before being able to create components, you need to define the general settings of the project: Create a folder on your local machine to store the resource files of the component you want to create. For example, C:/my_components. Open the Starter in the web browser of your choice. Select your build tool. This tutorial uses Maven, but you can select Gradle instead. Add any facet you need. For example, add the Talend Component Kit Testing facet to your project to automatically generate unit tests for the components created in the project. Enter the Component Family of the components you want to develop in the project. This name must be a valid java name and is recommended to be capitalized, for example 'MySQL'. Once you have implemented your components in the Studio, this name is displayed in the Palette to group all of the MySQL-related components you develop, and is also part of your component name. Select the Category of the components you want to create in the current project. As MySQL is a kind of database, select Databases in this tutorial. This Databases category is used and displayed as the parent family of the MySQL group in the Palette of the Studio. Complete the project metadata by entering the Group, Artifact and Package. By default, you can only create processors. If you need to create Input or Output components, select Activate IO. By doing this: Two new menu entries let you add datasets and datastores to your project, as they are required for input and output components. Input and Output components without dataset (itself containing a datastore) will not pass the validation step when building the components. Learn more about datasets and datastores in this document. An Input component and an Output component are automatically added to your project and ready to be configured. Components added to the project using Add A Component can now be processors, input or output components. A datastore represents the data needed by an input or output component to connect to a database. When building a component, the validateDataSet validation checks that each input or output (processor without output branch) component uses a dataset and that this dataset has a datastore. You can define one or several datastores if you have selected the Activate IO step. Select Datastore. The list of datastores opens. By default, a datastore is already open but not configured. You can configure it or create a new one using Add new Datastore. Specify the name of the datastore. Modify the default value to a meaningful name for your project. This name must be a valid Java name as it will represent the datastore class in your project. It is a good practice to start it with an uppercase letter. Edit the datastore configuration. Parameter names must be valid Java names. Use lower case as much as possible. A typical configuration includes connection details to a database: url username password. Save the datastore configuration. A dataset represents the data coming from or sent to a database and needed by input and output components to operate. The validateDataSet validation checks that each input or output (processor without output branch) component uses a dataset and that this dataset has a datastore. You can define one or several datasets if you have selected the Activate IO step. Select Dataset. The list of datasets opens. By default, a dataset is already open but not configured. You can configure it or create a new one using the Add new Dataset button. Specify the name of the dataset. Modify the default value to a meaningful name for your project. This name must be a valid Java name as it will represent the dataset class in your project. It is a good practice to start it with an uppercase letter. Edit the dataset configuration. Parameter names must be valid Java names. Use lower case as much as possible. A typical configuration includes details of the data to retrieve: Datastore to use (that contains the connection details to the database) table name data Save the dataset configuration. To create an input component, make sure you have selected Activate IO. When clicking Add A Component in the Starter, a new step allows you to define a new component in your project. The intent in this tutorial is to create an input component that connects to a MySQL database, executes a SQL query and gets the result. Choose the component type. Input in this case. Enter the component name. For example, MySQLInput. Click Configuration model. This button lets you specify the required configuration for the component. By default, a dataset is already specified. For each parameter that you need to add, click the (+) button on the right panel. Enter the parameter name and choose its type then click the tick button to save the changes. In this tutorial, to be able to execute a SQL query on the Input MySQL database, the configuration requires the following parameters:+ a dataset (which contains the datastore with the connection information) a timeout parameter. Closing the configuration panel on the right does not delete your configuration. However, refreshing the page resets the configuration. Specify whether the component issues a stream or not. In this tutorial, the MySQL input component created is an ordinary (non streaming) component. In this case, leave the Stream option disabled. Select the Record Type generated by the component. In this tutorial, select Generic because the component is designed to generate records in the default Record format. You can also select Custom to define a POJO that represents your records. Your input component is now defined. You can add another component or generate and download your project. When clicking Add A Component in the Starter, a new step allows you to define a new component in your project. The intent in this tutorial is to create a simple processor component that receives a record, logs it and returns it at it is. If you did not select Activate IO, all new components you add to the project are processors by default. If you selected Activate IO, you can choose the component type. In this case, to create a Processor component, you have to manually add at least one output. If required, choose the component type: Processor in this case. Enter the component name. For example, RecordLogger, as the processor created in this tutorial logs the records. Specify the Configuration Model of the component. In this tutorial, the component doesn’t need any specific configuration. Skip this step. Define the Input(s) of the component. For each input that you need to define, click Add Input. In this tutorial, only one input is needed to receive the record to log. Click the input name to access its configuration. You can change the name of the input and define its structure using a POJO. If you added several inputs, repeat this step for each one of them. The input in this tutorial is a generic record. Enable the Generic option and click Save. Define the Output(s) of the component. For each output that you need to define, click Add Output. The first output must be named MAIN. In this tutorial, only one generic output is needed to return the received record. Outputs can be configured the same way as inputs (see previous steps). You can define a reject output connection by naming it REJECT. This naming is used by Talend applications to automatically set the connection type to Reject. Your processor component is now defined. You can add another component or generate and download your project. To create an output component, make sure you have selected Activate IO. When clicking Add A Component in the Starter, a new step allows you to define a new component in your project. The intent in this tutorial is to create an output component that receives a record and inserts it into a MySQL database table. Output components are Processors without any output. In other words, the output is a processor that does not produce any records. Choose the component type. Output in this case. Enter the component name. For example, MySQLOutput. Click Configuration Model. This button lets you specify the required configuration for the component. By default, a dataset is already specified. For each parameter that you need to add, click the (+) button on the right panel. Enter the name and choose the type of the parameter, then click the tick button to save the changes. In this tutorial, to be able to insert a record in the output MySQL database, the configuration requires the following parameters:+ a dataset (which contains the datastore with the connection information) a timeout parameter. Closing the configuration panel on the right does not delete your configuration. However, refreshing the page resets the configuration. Define the Input(s) of the component. For each input that you need to define, click Add Input. In this tutorial, only one input is needed. Click the input name to access its configuration. You can change the name of the input and define its structure using a POJO. If you added several inputs, repeat this step for each one of them. The input in this tutorial is a generic record. Enable the Generic option and click Save. Do not create any output because the component does not produce any record. This is the only difference between an output an a processor component. Your output component is now defined. You can add another component or generate and download your project. Once your project is configured and all the components you need are created, you can generate and download the final project. In this tutorial, the project was configured and three components of different types (input, processor and output) have been defined. Click Finish on the left panel. You are redirected to a page that summarizes the project. On the left panel, you can also see all the components that you added to the project. Generate the project using one of the two options available: Download it locally as a ZIP file using the Download as ZIP button. Create a GitHub repository and push the project to it using the Create on Github button. In this tutorial, the project is downloaded to the local machine as a ZIP file. Once the package is available on your machine, you can compile it using the build tool selected when configuring the project. In the tutorial, Maven is the build tool selected for the project. In the project directory, execute the mvn package command. If you don’t have Maven installed on your machine, you can use the Maven wrapper provided in the generated project, by executing the ./mvnw package command. If you have created a Gradle project, you can compile it using the gradle build command or using the Gradle wrapper: ./gradlew build. The generated project code contains documentation that can guide and help you implementing the component logic. Import the project to your favorite IDE to start the implementation. The Component Kit Starter allows you to generate a component development project from an OpenAPI JSON descriptor. Open the Starter in the web browser of your choice. Enable the OpenAPI mode using the toggle in the header. Go to the API menu. Paste the OpenAPI JSON descriptor in the right part of the screen. All the described endpoints are detected. Unselect the endpoints that you do not want to use in the future components. By default, all detected endpoints are selected. Go to the Finish menu. Download the project. When exploring the project generated from an OpenAPI descriptor, you can notice the following elements: sources the API dataset an HTTP client for the API a connection folder containing the component configuration. By default, the configuration is only made of a simple datastore with a baseUrl parameter.
Components are designed to manipulate data (access, read, create). Talend Component Kit can handle several types of data, described in this document. By design, the framework must run in DI (plain standalone Java program) and in Beam pipelines. It is out of scope of the framework to handle the way the runtime serializes - if needed - the data. For that reason, it is critical not to import serialization constraints to the stack. As an example, this is one of the reasons why Record or JsonObject were preferred to Avro IndexedRecord. Any serialization concern should either be hidden in the framework runtime (outside of the component developer scope) or in the runtime integration with the framework (for example, Beam integration). Record is the default format. It offers many possibilities and can evolve depending on the Talend platform needs. Its structure is data-driven and exposes a schema that allows to browse it. Projects generated from the Talend Component Kit Starter are by default designed to handle this format of data. Record is a Java interface but never implement it yourself to ensure compatibility with the different Talend products. Follow the guidelines below. You can build records using the newRecordBuilder method of the RecordBuilderFactory (see here). For example: In the example above, the schema is dynamically computed from the data. You can also do it using a pre-built schema, as follows: The example above uses a schema that was pre-built using factory.newSchemaBuilder(Schema.Type.RECORD). When using a pre-built schema, the entries passed to the record builder are validated. It means that if you pass a null value null or an entry type that does not match the provided schema, the record creation fails. It also fails if you try to add an entry which does not exist or if you did not set a not nullable entry. Using a dynamic schema can be useful on the backend but can lead users to more issues when creating a pipeline to process the data. Using a pre-built schema is more reliable for end-users. You can access and read data by relying on the getSchema method, which provides you with the available entries (columns) of a record. The Entry exposes the type of its value, which lets you access the value through the corresponding method. For example, the Schema.Type.STRING type implies using the getString method of the record. For example: The Record format supports the following data types: String Boolean Int Long Float Double DateTime Array Bytes Record A map can always be modelized as a list (array of records with key and value entries). For example: For example, you can use the API to provide the schema. The following method needs to be implemented in a service. Manually constructing the schema without any data: Returning the schema from an already built record: MyDataset is the class that defines the dataset. Learn more about datasets and datastores in this document. Entry names for Record and JsonObject types must comply with the following rules: The name must start with a letter or with _. If not, the invalid characters are ignored until the first valid character. Following characters of the name must be a letter, a number, or . If not, the invalid character is replaced with . For example: 1foo becomes foo. f@o becomes f_o. 1234f5@o becomes ___f5_o. foo123 stays foo123. Each array uses only one schema for all of its elements. If an array contains several elements, they must be of the same data type. For example, the following array is not correct as it contains a string and an object: The runtime also supports JsonObject as input and output component type. You can rely on the JSON services (Jsonb, JsonBuilderFactory) to create new instances. This format is close to the Record format, except that it does not natively support the Datetime type and has a unique Number type to represent Int, Long, Float and Double types. It also does not provide entry metadata like nullable or comment, for example. It also inherits the Record format limitations. The runtime also accepts any POJO as input and output component type. In this case, it uses JSON-B to treat it as a JsonObject.
The following tutorials are designed to help you understand the main principles of component development using Talend Component Kit. With this set of tutorials, get your hands on project creation using the Component Kit Starter and implement the logic of different types of components. Creating your first component Generating a project from the starter Creating a Hazelcast input component Creating a Hazelcast output component Creating a Zendesk REST API connector Handling component version migration With this set of tutorials, learn the different approaches to test the components created in the previous tutorials. Testing a Zendesk REST API connector Testing a Hazelcast component Testing in a continuous integration environment
In order to use Talend Component Kit, you need the following tools installed on your machine: Java JDK 1.8.x. You can download it from Oracle website. Talend Open Studio to integrate your components. A Java Integrated Development Environment such as Eclipse or IntelliJ. IntelliJ is recommended as a Talend Component Kit plugin is available. Optional: If you use IntelliJ, you can install the Talend Component Kit plugin for IntelliJ. Optional: A build tool: Apache Maven 3.5.4 is recommended to develop a component or the project itself. You can download it from Apache Maven website. You can also use Gradle, but at the moment certain features are not supported, such as validations. It is optional to install a build tool independently since Maven and Gradle wrappers are already available with Talend Component Kit.
This tutorial shows how to adapt the test configuration of the Zendesk search component that was done in this previous tutorial to make it work in a Continuous Integration environment.
In the test, the Zendesk credentials are used directly in the code to perform a first capture of the API response. Then, fake credentials are used in the simulation mode because the real API is not called anymore.
However, in some cases, you can require to continue calling the real API on a CI server or on a specific environment.
To do that, you can adapt the test to get the credentials depending on the execution mode (simulation/passthrough).
These instructions concern the CI server or on any environment that requires real credentials.
This tutorial uses:
A Maven server that supports password encryption as a credential provider. Encryption is optional but recommended.
The MavenDecrypterRule test rule provided by the framework. This rule lets you get credentials from Maven settings using a server ID.
To create encrypted server credentials for the Zendesk instance:
Create a master password using the command: mvn --encrypt-master-password
The component API is declarative (through annotations) to ensure it is: Evolutive. It can get new features without breaking old code. As static as possible. Because it is fully declarative, any new API can be added iteratively without requiring any change to existing components. For example, in the case of Beam potential evolution: would not be affected by the addition of the new Timer API, which can be used as follows: The intent of the framework is to be able to fit in a Java UI as well as in a web UI. It must be understood as colocalized and remote UI. The goal is to move as much as possible the logic to the UI side for UI-related actions. For example, validating a pattern, a size, and so on, should be done on the client side rather than on the server side. Being static encourages this practice. The other goal of being static in the API definition is to ensure that the model will not be mutated at runtime and that all the auditing and modeling can be done before, at the design phase. Being static also ensures that the development can be validated as much as possible through build tools. This does not replace the requirement to test the components but helps developers to maintain components with automated tools. Refer to this document. The components must be able to execute even if they have conflicting libraries. For that purpose, classloaders must be isolated. A component defines its dependencies based on a Maven format and is always bound to its own classloader. The definition payload is as flat as possible and strongly typed to ensure it can be manipulated by consumers. This way, consumers can add or remove fields with simple mapping rules, without any abstract tree handling. The execution (runtime) configuration is the concatenation of framework metadata (only the version) and a key/value model of the instance of the configuration based on the definition properties paths for the keys. It enables consumers to maintain and work with the keys/values according to their need. The framework not being responsible for any persistence, it is very important to make sure that consumers can handle it from end to end, with the ability to search for values (update a machine, update a port and so on) and keys (for example, a new encryption rule on key certificate). Talend Component Kit is a metamodel provider (to build forms) and a runtime execution platform. It takes a configuration instance and uses it volatilely to execute a component logic. This implies it cannot own the data nor define the contract it has for these two endpoints and must let the consumers handle the data lifecycle (creation, encryption, deletion, and so on). A new mime type called talend/stream is introduced to define a streaming format. It matches a JSON object per line: Icons (@Icon) are based on a fixed set. Custom icons can be used but their display cannot be guaranteed. Components can be used in any environment and require a consistent look that cannot be guaranteed outside of the UI itself. Defining keys only is the best way to communicate this information. Once you know exactly how you will deploy your component in the Studio, then you can use `@Icon(value = CUSTOM, custom = "…") to use a custom icon file.
This document explains how Asciidoctor is used in the context of Talend Component Kit as well as the specific processes in place.
For general guidelines about Asciidoctor, refer to the Asciidoc Syntax quick reference page.
There are two ways to suggest modifications or new content. Both of the options below require you to have a GitHub account created.
On every page of the Talend Component Kit Developer Guide, a Suggest and edit button is available. It allows you to access the corresponding source file on GitHub and to create a pull request with the suggested edits. The pull request is then assessed by the team in charge of the project.
Fork the Runtime repository of the Talend Component Kit project and edit .adoc files located under documentation\src\main\antora\modules\ROOT\pages. Make sure to follow the guidelines outlined in the current document, especially for large modifications or new content, to make sure it can properly be rendered. When done, create a pull request that will be assessed by the team in charge of the project.
The documentation is made of:
Documentation files manually written under documentation\src\main\antora\modules\ROOT\pages.
Documentation files automatically generated from the source code under documentation\src\main\antora\modules\ROOT\pages\_partials. These files are individually called in manually written files through includes.
Assets, especially images, stored in the documentation\src\main\antora\modules\ROOT\assets folder. Some subfolders exist to categorize the assets.
Each file has a unique name and is rendered as a unique HTML page. Some of the files are prefixed to help identifying the type of content it contains. Most common examples are:
index- for pages referenced from the main index page. These pages also contain specific attributes to be correctly rendered on the main index page (see 'List of metadata attributes' below).
tutorial- for tutorials/guided examples.
generated_ for pages generated from the source code. These pages are generally stored in the _partials folder.
For all pages:
:page-partial indicates that the current .adoc file can be included in another document using an include::. This attribute has no value.
:page-talend_skipindexation: indicates that the current .adoc file must not be indexed. This attribute has no value. Add it to files that should not be returned in the search, like index files that only contain includes.
:description: is the meta description of the file. Each .adoc file is rendered as an HTML file.
:keywords: is the list of meta keywords relevant for the current .adoc file. Separate keywords using simple commas.
:page-talend_stage: draft indicates that the current document is a draft and is not final even if published. It triggers the display of a small banner indicating the status of the page. Remove this attribute once the page is final.
For pages that should appear as a tile on the index page:
:page-documentationindex-index: is the weight of the page. A low weight indicates that the page should be one of the first tiles to appear. A high weight will push the tile towards the end of the list in the index page.
:page-documentationindex-label: is the title of the tile in the index page.
:page-documentationindex-icon: is the icon of the tile in the index page. The value of this attribute should be the name of a free icon on fontawesome.
:page-documentationindex-description: is a short description of the page that will be displayed in the tile under its title.
For pages containing API descriptions:
:page-talend_swaggerui: true indicates that the page contains some API reference that should be displayed using Swagger UI
The Talend Component Kit documentation is published as HTML and PDF. Some parts can differ between these two versions, such as lists of links, that are not functional in the PDF version.
To avoid this, it is possible to define some conditionality to have some content only displaying in one of the output formats only. For example:
Every .adoc file can only contain one 'level 1' title (=). It is the title of the page and is located at the top of the document. It is a best practices that all sublevels added to the document are kept consistent. For example, don’t use a 'level 3' (===) section directly inside a 'level 1'.
When possible, avoid going lower than section 2 (===) to keep the page readable. In the HTML output, the document title is renderedh1, section 1 titles as h2, etc. The "in-page" navigation available on the right side of the HTML rendering only considers h2 and h3 elements. Other levels are ignored to keep the navigation readable and simple.
It is possible to reuse content through "includes". Includes can be used to reuse entire files in other files, allowing to avoid copy pasting.
When using an 'include' (calling an .adoc file from another .adoc file), you can specify a level offset to keep the hierarchy consistent in the current document. Avoid using includes if not absolutely necessary. An include can be done as follows:
In this case, both doc1.adoc and doc2.adoc are rendered in the same page and their content is offset by one level, meaning that the document title of doc1 becomes a section 1 title (h2) instead of an h1 in the final rendering, and so on.
Note that both doc1.adoc and doc2.adoc will in addition be rendered as standalone pages (doc1.html and doc2.html).
All images are stored under documentation > src > main > antora > modules > ROOT > assets > images. Relatively to .adoc files, it can be ../assets/images/ or ../../assets/images for _partials (automatically generated from code) pages. To avoid handling different relative paths, the backend already resolves directly image: to the image folder. Hence, paths to images should start with the following:
image:(
Talend Component scanning is based on plugins. To make sure that plugins can be developed in parallel and avoid conflicts, they need to be isolated (component or group of components in a single jar/plugin).
Multiple options are available:
Graph classloading: this option allows you to link the plugins and dependencies together dynamically in any direction. For example, the graph classloading can be illustrated by OSGi containers.
Tree classloading: a shared classloader inherited by plugin classloaders. However, plugin classloader classes are not seen by the shared classloader, nor by other plugins. For example, the tree classloading is commonly used by Servlet containers where plugins are web applications.
Flat classpath: listed for completeness but rejected by design because it doesn’t comply with this requirement.
In order to avoid much complexity added by this layer, Talend Component Kit relies on a tree classloading. The advantage is that you don’t need to define the relationship with other plugins/dependencies, because it is built-in.
Here is a representation of this solution:
The shared area contains Talend Component Kit API, which only contains by default the classes shared by the plugins.
Then, each plugin is loaded with its own classloader and dependencies.
This section explains the overall way to handle dependencies but the Talend Maven plugin provides a shortcut for that.
A plugin is a JAR file that was enriched with the list of its dependencies. By default, Talend Component Kit runtime is able to read the output of maven-dependency-plugin in TALEND-INF/dependencies.txt. You just need to make sure that your component defines the following plugin:
Once build, check the JAR file and look for the following lines:
What is important to see is the scope related to the artifacts:
The APIs (component-api and geronimo-annotation_1.3_spec) are provided because you can consider them to be there when executing (they come with the framework).
Your specific dependencies (awesome-project in the example above) are marked as compile: they are included as needed dependencies by the framework (note that using runtime works too).
the other dependencies are ignored. For example, test dependencies.
Even if a flat classpath deployment is possible, it is not recommended because it would then reduce the capabilities of the components.
The way the framework resolves dependencies is based on a local Maven repository layout. As a quick reminder, it looks like:
This is all the layout the framework uses. The logic converts t-uple {groupId, artifactId, version, type (jar)} to the path in the repository.
Talend Component Kit runtime has two ways to find an artifact:
From the file system based on a configured Maven 2 repository.
From a fat JAR (uber JAR) with a nested Maven repository under MAVEN-INF/repository.
The first option uses either ${user.home}/.m2/repository (default) or a specific path configured when creating a ComponentManager. The nested repository option needs some configuration during the packaging to ensure the repository is correctly created.
To create the nested MAVEN-INF/repository repository, you can use the nested-maven-repository extension:
Plugins are usually programmatically registered. If you want to make some of them automatically available, you need to generate a TALEND-INF/plugins.properties file that maps a plugin name to coordinates found with the Maven mechanism described above.
You can enrich maven-shade-plugin to do it:
Here is a final job/application bundle based on maven-shade-plugin:
The configuration unrelated to transformers depends on your application.
ContainerDependenciesTransformer embeds a Maven repository and PluginTransformer to create a file that lists (one per line) artifacts (representing plugins).
Both transformers share most of their configuration:
session: must be set to ${session}. This is used to retrieve dependencies.
scope: a comma-separated list of scopes to include in the artifact filtering (note that the default will rely on provided but you can replace it by compile, runtime, runtime+compile, runtime+system or test).
include: a comma-separated list of artifacts to include in the artifact filtering.
exclude: a comma-separated list of artifacts to exclude in the artifact filtering.
userArtifacts: set of artifacts to include (groupId, artifactId, version, type - optional, file - optional for plugin transformer, scope - optional) which can be forced inline. This parameter is mainly useful for PluginTransformer.
includeTransitiveDependencies: should transitive dependencies of the components be included. Set to true by default. It is active for userArtifacts.
includeProjectComponentDependencies: should component project dependencies be included. Set to false by default. It is not needed when a job project uses isolation for components.
With the component tooling, it is recommended to keep default locations. Also if you need to use project dependencies, you can need to refactor your project structure to ensure component isolation. Talend Component Kit lets you handle that part but the recommended practice is to use userArtifacts for the components instead of project
This tutorial shows how to create components that consume a REST API. The component developed as example in this tutorial is an input component that provides a search functionality for Zendesk using its Search API. Lombok is used to avoid writing getter, setter and constructor methods. You can generate a project using the Talend Components Kit starter, as described in this tutorial. The input component relies on Zendesk Search API and requires an HTTP client to consume it. The Zendesk Search API takes the following parameters on the /api/v2/search.json endpoint. query : The search query. sort_by : The sorting type of the query result. Possible values are updated_at, created_at, priority, status, ticket_type, or relevance. It defaults to relevance. sort_order: The sorting order of the query result. Possible values are asc (for ascending) or desc (for descending). It defaults to desc. Talend Component Kit provides a built-in service to create an easy-to-use HTTP client in a declarative manner, using Java annotations. No additional implementation is needed for the interface, as it is provided by the component framework, according to what is defined above. This HTTP client can be injected into a mapper or a processor to perform HTTP requests. This example uses the basic authentication that supported by the API. The first step is to set up the configuration for the basic authentication. To be able to consume the Search API, the Zendesk instance URL, the username and the password are needed. The data store is now configured. It provides a basic authentication token. Once the data store is configured, you can define the dataset by configuring the search query. It is that query that defines the records processed by the input component. Your component is configured. You can now create the component logic. Mappers defined with this tutorial don’t implement the split part because HTTP calls are not split on many workers in this case. Once the component logic implemented, you can create the source in charge of performing the HTTP request to the search API and converting the result to JsonObject records. You now have created a simple Talend component that consumes a REST API. To learn how to test this component, refer to this tutorial.