Search results for best practices

Talend Component Kit best practices  List of best practices for developing Talend components.   best practices checklist

Some recommendations apply to the way component packages are organized: Make sure to create a package-info.java file with the component family/categories at the root of your component package: Create a package for the configuration. Create a package for the actions. Create a package for the component and one sub-package by type of component (input, output, processors, and so on). It is recommended to serialize your configuration in order to be able to pass it through other components. When building a new component, the first step is to identify the way it must be configured. The two main concepts are: The DataStore which is the way you can access the backend. The DataSet which is the way you interact with the backend. For example: when accessing a relational database like MySQL, the DataStore holds the JDBC driver, URL, username, and password, and the DataSet holds the query to execute, row mapper, and so on. When accessing a file system, the DataStore holds the file pattern (or directory + file extension/prefix/…) and the DataSet holds the file format, buffer size, and so on. It is common to have the dataset including the datastore, because both are required to work. However, it is recommended to replace this pattern by defining both dataset and datastore in a higher level configuration model. For example: Input and output components are particular because they can be linked to a set of actions. It is recommended to wire all the actions you can apply to ensure the consumers of your component can provide a rich experience to their users. The most common actions are the following ones: This action exposes a way to ensure the datastore/connection works. Configuration example: Action example: Until the studio integration is complete, it is recommended to limit processors to one input. Configuring processor components is simpler than configuring input and output components because it is specific for each component.
For example, a mapper takes the mapping between the input and output models: It is recommended to provide as much information as possible to let the UI work with the data while it is being edited. Light validations are all the validations you can execute on the client side. They are listed in the UI hint section. Use light validations first before going with custom validations because they are more efficient. Custom validations require custom code to be executed, and are heavier to execute. Prefer using light validations when possible. Define an action with the parameters needed for the validation and link the option you want to validate to this action. For example, to validate a dataset for a JDBC driver: You can also define a Validable class and use it to validate a form by setting it on your whole configuration: The parameter binding of the validation method uses the same logic as the component configuration injection. Therefore, the @Option method specifies the prefix to use to reference a parameter. It is recommended to use @Option("value") until you know exactly why you don’t use it. This way, the consumer can match the configuration model and just prefix it with value. to send the instance to validate. Validations are triggered by "events". If you mark part of a configuration as @Validable but this configuration is translated to a widget without any interaction, then no validation will happen. The rule of thumb is to mark only primitives and simple types (list of primitives) as @Validable. It can be handy and user-friendly to provide completion on some fields. For example, to define completion for available drivers: Each component must have its own icon: You can use talend.surge.sh/icons/ to find the icon you want to use. It is recommended to enforce the version of your component, even though it is not mandatory for the first version. If you break a configuration entry in a later version, make sure to: Upgrade the version.
Support a migration of the configuration. Testing your components is critical. You can use unit and simple standalone JUnit tests, but it is also highly recommended to have Beam tests in order to make sure that your component works in Big Data.
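The DataStore/DataSet split and the higher-level configuration model recommended earlier in this section can be sketched in plain Java. This is only an illustration under stated assumptions: the class and field names (JdbcDataStore, QueryDataSet, InputConfiguration) are hypothetical, and the framework annotations that a real component would carry are mentioned in comments only so that the sketch runs on a bare JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class ConfigurationSketch {

    // How to access the backend (a real component would mark this as its datastore).
    static final class JdbcDataStore implements Serializable {
        private static final long serialVersionUID = 1L;
        String driver;
        String url;
        String username;
        String password;
    }

    // How to interact with the backend (a real component would mark this as its dataset).
    static final class QueryDataSet implements Serializable {
        private static final long serialVersionUID = 1L;
        String query;
    }

    // Higher-level model holding both parts side by side (hypothetical name).
    static final class InputConfiguration implements Serializable {
        private static final long serialVersionUID = 1L;
        JdbcDataStore dataStore = new JdbcDataStore();
        QueryDataSet dataSet = new QueryDataSet();
    }

    // Serializes and deserializes the configuration, as happens when it is passed around.
    static InputConfiguration roundTrip(InputConfiguration configuration) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(configuration);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (InputConfiguration) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Configuration is not serializable", e);
        }
    }

    public static void main(String[] args) {
        InputConfiguration configuration = new InputConfiguration();
        configuration.dataStore.url = "jdbc:mysql://localhost:3306/demo";
        configuration.dataSet.query = "SELECT 1";
        InputConfiguration copy = roundTrip(configuration);
        System.out.println(copy.dataStore.url + " / " + copy.dataSet.query);
    }
}
```

A round-trip check like this one is a cheap way to catch a non-serializable field before the configuration is handed to another component.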

Testing best practices  Learn the best practices for testing components developed with Talend Component Kit   test best practices testing

This section mainly concerns tools that can be used with JUnit. You can use most of these best practices with TestNG as well. Parameterized tests are a great solution to repeat the same test multiple times. This method of testing requires defining a test scenario (I test function F) and making the input/output data dynamic. Here is a test example, which validates a connection URI using ConnectionService: The testing method is always the same. Only the values change. It can therefore be rewritten using JUnit Parameterized runner, as follows: You are not limited to a single @Test method. If you define multiple methods, each of them is executed with all the data. For example, if another test is added to the previous example, four tests are executed (two per data set). With JUnit 5, parameterized tests are easier to use. The full documentation is available at junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests. The main difference with JUnit 4 is that you can also define inline that the test method is a parameterized test as well as the values to use: However, you can still use the previous behavior with a method binding configuration: This last option allows you to inject any type of value - not only primitives - which is common when defining scenarios. Add the junit-jupiter-params dependency to benefit from this feature.
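The idea behind parameterized tests (one fixed scenario, dynamic input/output data) can be sketched without any test framework. The sketch below is an assumption-laden illustration: isValidConnectionUri is a hypothetical stand-in for the service under test, not the ConnectionService mentioned above, and the plain loop stands in for what a JUnit parameterized runner would do with the same data table.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.Map;

final class ParameterizedSketch {

    // Hypothetical function under test: a URI is accepted when it has a scheme and a host.
    static boolean isValidConnectionUri(String uri) {
        try {
            URI parsed = new URI(uri);
            return parsed.getScheme() != null && parsed.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // The dynamic part: each entry is one input with its expected outcome.
    static Map<String, Boolean> scenarios() {
        Map<String, Boolean> data = new LinkedHashMap<>();
        data.put("http://localhost:8080", true);
        data.put("foo://bar", true);
        data.put("/only/a/path", false);
        data.put("not a uri", false);
        return data;
    }

    // The fixed part: the same check is applied to every entry; returns the failure count.
    static int failures() {
        int failures = 0;
        for (Map.Entry<String, Boolean> scenario : scenarios().entrySet()) {
            boolean actual = isValidConnectionUri(scenario.getKey());
            if (actual != scenario.getValue()) {
                failures++;
                System.out.println("FAILED for input: " + scenario.getKey());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + failures());
    }
}
```

In a real project the data table would typically become the parameter source of the test framework, so that each entry is reported as its own test.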

Defining a processor  How to develop a processor component with Talend Component Kit   component type processor output

A Processor is a component that converts incoming data to a different model. A processor must have a method decorated with @ElementListener taking an incoming data and returning the processed data: Processors must be Serializable because they are distributed components. If you just need to access data on a map-based ruleset, you can use Record or JsonObject as parameter type. From there, Talend Component Kit wraps the data to allow you to access it as a map. The parameter type is not enforced. This means that if you know you will get a SuperCustomDto, then you can use it as parameter type. But for generic components that are reusable in any chain, it is highly encouraged to use Record until you have an evaluation language-based processor that has its own way to access components. For example: A processor also supports @BeforeGroup and @AfterGroup methods, which must not have any parameter and return void values. Any other result would be ignored. These methods are used by the runtime to mark a chunk of the data in a way which is estimated good for the execution flow size. Because the size is estimated, the size of a group can vary. It is even possible to have groups of size 1. It is recommended to batch records, for performance reasons: You can optimize the data batch processing by using the maxBatchSize parameter. This parameter is automatically implemented on the component when it is deployed to a Talend application. Only the logic needs to be implemented. You can however customize its value setting in your LocalConfiguration the property _maxBatchSize.value - for the family - or ${component simple class name}._maxBatchSize.value - for a particular component, otherwise its default will be 1000. If you replace value by active, you can also configure if this feature is enabled or not. This is useful when you don’t want to use it at all. Learn how to implement chunking/bulking in this document. 
In some cases, you may need to split the output of a processor into two or more connections. A common example is to have "main" and "reject" output connections where part of the incoming data is passed to a specific bucket and processed later. Talend Component Kit supports two types of output connections: Flow and Reject. Flow is the main and standard output connection. The Reject connection handles records rejected during the processing. A component can only have one reject connection, if any. Its name must be REJECT to be processed correctly in Talend applications. You can also define the different output connections of your component in the Starter. To define an output connection, you can use @Output as replacement of the returned value in the @ElementListener: Alternatively, you can pass a string that represents the new branch: Having multiple inputs is similar to having multiple outputs, except that an OutputEmitter wrapper is not needed: @Input takes the input name as parameter. If no name is set, it defaults to the "main (default)" input branch. It is recommended to use the default branch when possible and to avoid naming branches according to the component semantic. Batch processing refers to the way execution environments process batches of data handled by a component using a grouping mechanism. By default, the execution environment of a component automatically decides how to process groups of records and estimates an optimal group size depending on the system capacity. With this default behavior, the size of each group could sometimes be optimized for the system to handle the load more effectively or to match business requirements. For example, real-time or near real-time processing needs often imply processing smaller batches of data, but more often. On the other hand, a one-time processing without business constraints is more effectively handled with a batch size based on the system capacity.
End users of a component developed with the Talend Component Kit that integrates the batch processing logic described in this document can override this automatic size. To do that, a maxBatchSize option is available in the component settings and lets you set the maximum size of each group of data to process. A component processes batch data as follows: Case 1 - No maxBatchSize is specified in the component configuration. The execution environment estimates a group size of 4. Records are processed by groups of 4. Case 2 - The runtime estimates a group size of 4 but a maxBatchSize of 3 is specified in the component configuration. The system adapts the group size to 3. Records are processed by groups of 3. Batch processing relies on the sequence of three methods: @BeforeGroup, @ElementListener, @AfterGroup, that you can customize to your needs as a component developer. The automatic group size estimation logic is implemented automatically when a component is deployed to a Talend application. Each group is processed as follows until there is no record left: The @BeforeGroup method resets a record buffer at the beginning of each group. The records of the group are assessed one by one and placed in the buffer as follows: The @ElementListener method tests if the buffer size is greater than or equal to the defined maxBatchSize. If it is, the records are processed. If not, then the current record is buffered. The previous step happens for all records of the group. Then the @AfterGroup method tests if the buffer is empty. If it is not, the remaining records are processed. You can define the following logic in the processor configuration: You can also use the condensed syntax for this kind of processor: When writing tests for components, you can force the maxBatchSize parameter value by setting it with the following syntax: .$maxBatchSize=10. You can learn more about processors in this document.
Defining a processor/output logic General component execution logic Implementing bulk processing best practices For the case of output components (not emitting any data) using bulking you can pass the list of records to the after group method:
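The buffering sequence described above (reset in @BeforeGroup, buffer in @ElementListener, flush in @AfterGroup) can be sketched in plain Java. This is a simulation, not framework code: the three methods are ordinary methods named after the annotations they would carry, records are plain strings, and the run method is a hypothetical stand-in for the runtime that caps the estimated group size with maxBatchSize.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BatchSketch {

    static final class BufferingOutput {
        private final int maxBatchSize;
        private final List<String> buffer = new ArrayList<>();
        final List<List<String>> processedBatches = new ArrayList<>();

        BufferingOutput(int maxBatchSize) {
            this.maxBatchSize = maxBatchSize;
        }

        // Would carry @BeforeGroup: resets the buffer at the start of each group.
        void beforeGroup() {
            buffer.clear();
        }

        // Would carry @ElementListener: processes the buffer once it is full.
        // In this sketch the current record is always buffered afterwards so none is lost.
        void onElement(String record) {
            if (buffer.size() >= maxBatchSize) {
                process();
            }
            buffer.add(record);
        }

        // Would carry @AfterGroup: processes whatever is left in the buffer.
        void afterGroup() {
            if (!buffer.isEmpty()) {
                process();
            }
        }

        private void process() {
            processedBatches.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    // Hypothetical runtime: the estimated group size is capped by maxBatchSize.
    static List<List<String>> run(List<String> records, int estimatedGroupSize, int maxBatchSize) {
        int groupSize = Math.min(estimatedGroupSize, maxBatchSize);
        BufferingOutput output = new BufferingOutput(maxBatchSize);
        for (int start = 0; start < records.size(); start += groupSize) {
            int end = Math.min(start + groupSize, records.size());
            output.beforeGroup();
            for (String record : records.subList(start, end)) {
                output.onElement(record);
            }
            output.afterGroup();
        }
        return output.processedBatches;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        System.out.println(run(records, 4, 1000)); // Case 1: groups of 4
        System.out.println(run(records, 4, 3));    // Case 2: groups of 3
    }
}
```

The two calls in main mirror Case 1 and Case 2: with no effective limit the estimated size of 4 applies, and with a maxBatchSize of 3 the groups shrink to 3 records.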

Defining a processor or an output component logic  How to develop an output component with Talend Component Kit   output processor

Processors and output components are the components in charge of reading, processing and transforming data in a Talend job, as well as passing it to its required destination. Before implementing the component logic and defining its layout and configurable fields, make sure you have specified its basic metadata, as detailed in this document. A Processor is a component that converts incoming data to a different model. A processor must have a method decorated with @ElementListener taking an incoming data and returning the processed data: Processors must be Serializable because they are distributed components. If you just need to access data on a map-based ruleset, you can use Record or JsonObject as parameter type. From there, Talend Component Kit wraps the data to allow you to access it as a map. The parameter type is not enforced. This means that if you know you will get a SuperCustomDto, then you can use it as parameter type. But for generic components that are reusable in any chain, it is highly encouraged to use Record until you have an evaluation language-based processor that has its own way to access components. For example: A processor also supports @BeforeGroup and @AfterGroup methods, which must not have any parameter and return void values. Any other result would be ignored. These methods are used by the runtime to mark a chunk of the data in a way which is estimated good for the execution flow size. Because the size is estimated, the size of a group can vary. It is even possible to have groups of size 1. It is recommended to batch records, for performance reasons: You can optimize the data batch processing by using the maxBatchSize parameter. This parameter is automatically implemented on the component when it is deployed to a Talend application. Only the logic needs to be implemented. 
You can however customize its value setting in your LocalConfiguration the property _maxBatchSize.value - for the family - or ${component simple class name}._maxBatchSize.value - for a particular component, otherwise its default will be 1000. If you replace value by active, you can also configure if this feature is enabled or not. This is useful when you don’t want to use it at all. Learn how to implement chunking/bulking in this document. In some cases, you may need to split the output of a processor in two or more connections. A common example is to have "main" and "reject" output connections where part of the incoming data are passed to a specific bucket and processed later. Talend Component Kit supports two types of output connections: Flow and Reject. Flow is the main and standard output connection. The Reject connection handles records rejected during the processing. A component can only have one reject connection, if any. Its name must be REJECT to be processed correctly in Talend applications. You can also define the different output connections of your component in the Starter. To define an output connection, you can use @Output as replacement of the returned value in the @ElementListener: Alternatively, you can pass a string that represents the new branch: Having multiple inputs is similar to having multiple outputs, except that an OutputEmitter wrapper is not needed: @Input takes the input name as parameter. If no name is set, it defaults to the "main (default)" input branch. It is recommended to use the default branch when possible and to avoid naming branches according to the component semantic. Batch processing refers to the way execution environments process batches of data handled by a component using a grouping mechanism. By default, the execution environment of a component automatically decides how to process groups of records and estimates an optimal group size depending on the system capacity. 
With this default behavior, the size of each group could sometimes be optimized for the system to handle the load more effectively or to match business requirements. For example, real-time or near real-time processing needs often imply processing smaller batches of data, but more often. On the other hand, a one-time processing without business constraints is more effectively handled with a batch size based on the system capacity. End users of a component developed with the Talend Component Kit that integrates the batch processing logic described in this document can override this automatic size. To do that, a maxBatchSize option is available in the component settings and lets you set the maximum size of each group of data to process. A component processes batch data as follows: Case 1 - No maxBatchSize is specified in the component configuration. The execution environment estimates a group size of 4. Records are processed by groups of 4. Case 2 - The runtime estimates a group size of 4 but a maxBatchSize of 3 is specified in the component configuration. The system adapts the group size to 3. Records are processed by groups of 3. Batch processing relies on the sequence of three methods: @BeforeGroup, @ElementListener, @AfterGroup, that you can customize to your needs as a component developer. The automatic group size estimation logic is implemented automatically when a component is deployed to a Talend application. Each group is processed as follows until there is no record left: The @BeforeGroup method resets a record buffer at the beginning of each group. The records of the group are assessed one by one and placed in the buffer as follows: The @ElementListener method tests if the buffer size is greater than or equal to the defined maxBatchSize. If it is, the records are processed. If not, then the current record is buffered. The previous step happens for all records of the group. Then the @AfterGroup method tests if the buffer is empty. If it is not, the remaining records are processed.
You can define the following logic in the processor configuration: You can also use the condensed syntax for this kind of processor: When writing tests for components, you can force the maxBatchSize parameter value by setting it with the following syntax: .$maxBatchSize=10. You can learn more about processors in this document. Defining a processor/output logic General component execution logic Implementing bulk processing best practices For the case of output components (not emitting any data) using bulking you can pass the list of records to the after group method: An Output is a Processor that does not return any data. Conceptually, an output is a data listener. It matches the concept of processor. Being the last component of the execution chain or returning no data makes your processor an output component: Currently, Talend Component Kit does not allow you to define a Combiner. A combiner is the symmetric part of a partition mapper. It allows to aggregate results in a single partition.
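The main and reject output connections described earlier in this section can be sketched in plain Java. The Emitter interface below is a hypothetical stand-in for the framework's output wrapper, and the parsing rule (integers go to the main flow, everything else is rejected) is an arbitrary example chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RejectSketch {

    // Hypothetical stand-in for an output connection: it only receives values.
    interface Emitter<T> {
        void emit(T value);
    }

    // Element-level logic: valid records go to the main flow, the others to the reject flow.
    static void onElement(String record, Emitter<Integer> flow, Emitter<String> reject) {
        Integer parsed;
        try {
            parsed = Integer.valueOf(record.trim());
        } catch (NumberFormatException e) {
            reject.emit(record); // keep the original record so it can be processed later
            return;
        }
        flow.emit(parsed);
    }

    public static void main(String[] args) {
        List<Integer> flow = new ArrayList<>();
        List<String> reject = new ArrayList<>();
        for (String record : Arrays.asList("1", "x", " 42 ")) {
            onElement(record, flow::add, reject::add);
        }
        System.out.println("flow=" + flow + " reject=" + reject);
    }
}
```

A method that only receives values and emits nothing, like the two list-backed emitters here, matches the description of an output as a data listener.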

Registering components  How to define component and component family metadata   icon component version component name family category metadata

Before implementing a component logic and configuration, you need to specify the family and the category it belongs to, the component type and name, as well as a few other generic parameters. This set of metadata, and more particularly the family, categories and component type, is mandatory to recognize and load the component into Talend Studio or Cloud applications. Some of these parameters are handled at the project generation using the starter, but can still be accessed and updated later on. The family and category of a component are automatically written in the package-info.java file of the component package, using the @Components annotation. By default, these parameters are already configured in this file when you import your project in your IDE. Their values correspond to what was defined during the project definition with the starter. Multiple components can share the same family and category value, but the family + name pair must be unique for the system. A component can belong to one family only and to one or several categories. If not specified, the category defaults to Misc. The package-info.java file also defines the component family icon, which is different from the component icon. You can learn how to customize this icon in this section. Here is a sample package-info.java: Another example with an existing component: Components can require metadata to be integrated in Talend Studio or Cloud platforms. Metadata is set on the component class and belongs to the org.talend.sdk.component.api.component package. When you generate your project and import it in your IDE, icon and version both come with a default value. @Icon: Sets an icon key used to represent the component. You can use a custom key with the custom() method but the icon may not be rendered properly. The icon defaults to Check. Replace it with a custom icon, as described in this section. @Version: Sets the component version. 1 by default.
Learn how to manage different versions and migrations between your component versions in this section. For example: Every component family and component needs to have a representative icon. You have to define a custom icon as follows: For the component family the icon is defined in the package-info.java file. For the component itself, you need to declare the icon in the component class. Custom icons must comply with the following requirements: Icons must be stored in the src/main/resources/icons folder of the project. Icon file names need to match one of the following patterns: IconName.svg or IconName_icon32.png. The latter will run in degraded mode in Talend Cloud. Replace IconName by the name of your choice. Icons must be square, even for the SVG format. Note that SVG icons are not supported by Talend Studio and can cause the deployment of the component to fail. If you aim to deploy a custom component to Talend Studio, specify PNG icons or use the Maven (or Gradle) svg2png plugin to convert SVG icons to PNG. If you want finer control over both images, you can provide both in your component. Ultimately, you can also remove SVG parameters from the talend.component.server.icon.paths property in the HTTP server configuration. For any purpose, you can also add user-defined metadata to your component with the @Metadatas annotation. Example: You can also use a SPI implementing org.talend.sdk.component.spi.component.ComponentMetadataEnricher.
Methodology for creating components Generating a project using the starter Managing component versions Defining an input component Defining a processor or output component Defining a driver runner component Defining component layout and configuration best practices

Testing components  Learn how to test your component logic in the environment you need using Talend Component Kit   test overview environment beam runtime testing

Developing new components includes testing them in the required execution environments. Use the following articles to learn about the best practices and the available options to fully test your components. Component testing best practices Component testing kit Beam testing Testing in multiple environments Reusing Maven credentials Generating data for testing Simple/Test Pipeline API Beam Pipeline API

Internationalizing services  How to internationalize a service using Talend Component Kit   service component-manager internationalization i18n language lang locale

Internationalization requires following several best practices: Store messages in ResourceBundle properties files in your component module. The properties file is located in the same package as the related components and is named Messages. For example, org.talend.demo.MyComponent uses org.talend.demo.Messages[locale].properties. Use the internationalization API for your own messages. The Internationalization API is the mechanism to use to internationalize your own messages in your own components. The principle of the API is to design messages as methods returning String values and get back a template using a ResourceBundle named Messages and located in the same package as the interface that defines these methods. To ensure your internationalization API is identified, you need to mark it with the @Internationalized annotation: The corresponding Messages.properties placed in the org/superbiz resource folder contains the following:
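The principle of designing messages as methods that return String values and resolve to a template can be sketched with a JDK dynamic proxy. Everything here is an assumption made for illustration: the Greetings interface, the "InterfaceName.methodName" key convention, and the in-memory Properties object that stands in for a Messages.properties file. The framework's own lookup rules may differ.

```java
import java.lang.reflect.Proxy;
import java.text.MessageFormat;
import java.util.Properties;

final class I18nSketch {

    // Messages are designed as methods returning String values (hypothetical interface).
    interface Greetings {
        String welcome();

        String hello(String name);
    }

    // Builds an implementation that resolves each method call to a template and formats it.
    // ASSUMPTION: the "<InterfaceSimpleName>.<methodName>" key convention is illustrative only.
    static <T> T translate(Class<T> api, Properties messages) {
        Object proxy = Proxy.newProxyInstance(
                api.getClassLoader(),
                new Class<?>[] {api},
                (instance, method, args) -> {
                    String key = api.getSimpleName() + "." + method.getName();
                    String template = messages.getProperty(key, key); // fall back to the key
                    return MessageFormat.format(template, args == null ? new Object[0] : args);
                });
        return api.cast(proxy);
    }

    // Stand-in for the content of a Messages.properties file.
    static Properties sampleMessages() {
        Properties messages = new Properties();
        messages.setProperty("Greetings.welcome", "Welcome");
        messages.setProperty("Greetings.hello", "Hello {0}");
        return messages;
    }

    public static void main(String[] args) {
        Greetings greetings = translate(Greetings.class, sampleMessages());
        System.out.println(greetings.welcome());
        System.out.println(greetings.hello("Ada"));
    }
}
```

Keeping messages behind an interface means callers never handle keys or templates directly, so a missing translation shows up in one place (here, as the raw key).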

Implementing components  Get an overview of the main steps to code the logic of your custom Talend Component Kit components   create code class logic layout configuration dev overview api

Once you have generated a project, you can start implementing the logic and layout of your components and iterate on it. Depending on the type of component you want to create, the logic implementation can differ. However, the layout and component metadata are defined the same way for all types of components in your project. The main steps are: Defining family and component metadata Defining an input component logic Defining a processor/output logic Defining a standalone component logic Defining component layout and configuration In some cases, you will require specific implementations to handle more advanced cases, such as: Internationalizing a component Managing component versions Masking sensitive data Implementing batch processing Implementing streaming on a component You can also make certain configurations reusable across your project by defining services. Using your Java IDE along with a build tool supported by the framework, you can then compile your components to test and deploy them to Talend Studio or other Talend applications: Building components with Maven Building components with Gradle Wrapping a Beam I/O In any case, follow these best practices to ensure the components you develop are optimized. You can also learn more about component loading and plugins here: Loading a component

Building components with Maven  Use Maven or the Maven wrapper as build tool to develop components   mvn mvnw maven maven-plugin tool build

To develop new components, Talend Component Kit requires a build tool in which you will import the component project generated from the starter. You will then be able to install and deploy it to Talend applications. A Talend Component Kit plugin is available for each of the supported build tools. talend-component-maven-plugin helps you write components that match best practices and generate transparently metadata used by Talend Studio. You can use it as follows: This plugin is also an extension so you can declare it in your build/extensions block as: Used as an extension, the goals detailed in this document will be set up. The Talend Component Kit plugin integrates some specific goals within Maven build lifecycle. For example, to compile the project and prepare for deploying your component, run mvn clean install. Using this command, the following goals are executed: The build is split into several phases. The different goals are executed in the order shown above. Talend Component Kit uses default goals from the Maven build lifecycle and adds additional goals to the building and packaging phases. Goals added to the build by Talend Component Kit are detailed below. The default lifecycle is detailed in Maven documentation. The Talend Component Kit plugin for Maven integrates several specific goals into Maven build lifecycle. To run specific goals individually, run the following command from the root of the project, by adapting it with each goal name, parameters and values: The first goal is a shortcut for the maven-dependency-plugin. It creates the TALEND-INF/dependencies.txt file with the compile and runtime dependencies, allowing the component to use it at runtime: The scan-descriptor goal scans the current module and optionally other configured folders to precompute the list of interesting classes for the framework (components, services). 
It saves some bootstrap time when launching a job, which can be useful in some execution cases: Configuration - excluding parameters used by default only: Name Description User property Default output Where to dump the scan result. Note: Changing that value at runtime is not supported. talend.scan.output ${project.build.outputDirectory}/TALEND-INF/scanning.properties scannedDirectories Explicit list of directories to scan. talend.scan.scannedDirectories If not set, defaults to ${project.build.outputDirectory} scannedDependencies Explicit list of dependencies to scan - set them in the groupId:artifactId format. The list is appended to the file to scan. talend.scan.scannedDependencies - The svg2png goal scans a directory - defaults to target/classes/icons - to find .svg files and copy them as PNG versions sized at 32x32px and named with the suffix _icon32.png to enable the studio to read them: Configuration: Name Description User property Default icons Where to scan for the SVG icons to convert to PNG. talend.icons.source ${project.build.outputDirectory}/icons workarounds By default the shape of the icon will be enforced in the RGB channels (in white) using the alpha as reference. This is useful for black/white images using alpha to shape the picture because Eclipse - Talend Studio - caches icons using RGB but not alpha channel. Pictures not using alpha channel to draw their shape should disable that workaround. talend.icons.workaround true If you use that plugin, make sure to set it before the validate mojo, otherwise validation can miss some PNG files. This goal helps you validate the common programming model of the component. To activate it, you can use the following execution definition: It is bound to the process-classes phase by default.
When executed, it performs several validations that can be disabled by setting the corresponding flags to false in the block of the execution: Name Description User property Default validateInternationalization Validates that resource bundles are present and contain commonly used keys (for example, _displayName) talend.validation.internationalization true validateModel Ensures that components pass validations of the ComponentManager and Talend Component runtime talend.validation.model true validateSerializable Ensures that components are Serializable. This is a sanity check, the component is not actually serialized here. If in doubt, make sure to test it. It also checks that any @Internationalized class is valid and has its keys. talend.validation.serializable true validateMetadata Ensures that components have an @Icon and a @Version defined. talend.validation.metadata true validateDataStore Ensures that any @DataStore defines a @HealthCheck and has a unique name. talend.validation.datastore true validateDataSet Ensures that any @DataSet has a unique name. Also ensures that there is a source instantiable just filling the dataset properties (all others not being required). Finally, the validation checks that each input or output component uses a dataset and that this dataset has a datastore. talend.validation.dataset true validateComponent Ensures that the native programming model is respected. You can disable it when using another programming model like Beam. talend.validation.component true validateActions Validates action signatures for actions not tolerating dynamic binding (@HealthCheck, @DynamicValues, and so on). It is recommended to keep it set to true. talend.validation.action true validateFamily Validates the family by verifying that the package containing the @Components has a @Icon property defined.
talend.validation.family true validateDocumentation Ensures that all components and @Option properties have a documentation using the @Documentation property. talend.validation.documentation true validateLayout Ensures that the layout is referencing existing options and properties. talend.validation.layout true validateOptionNames Ensures that the option names are compliant with the framework. It is highly recommended and safer to keep it set to true. talend.validation.options true validateLocalConfiguration Ensures that if any TALEND-INF/local-configuration.properties exists then keys start with the family name. talend.validation.localConfiguration true validateOutputConnection Ensures that an output has only one input branch. talend.validation.validateOutputConnection true validatePlaceholder Ensures that string options have a placeholder. It is highly recommended to turn this property on. talend.validation.placeholder false locale The locale used to validate internationalization. talend.validation.locale root The asciidoc goal generates an Asciidoc file documenting your component from the configuration model (@Option) and the @Documentation property that you can add to options and to the component itself. Name Description User property Default level Level of the root title. talend.documentation.level 2 (==) output Output folder path. It is recommended to keep it to the default value. talend.documentation.output ${classes}/TALEND-INF/documentation.adoc formats Map of the renderings to do. Keys are the format (pdf or html) and values the output paths. talend.documentation.formats - attributes Asciidoctor attributes to use for the rendering when formats is set. talend.documentation.attributes - templateEngine Template engine configuration for the rendering. talend.documentation.templateEngine - templateDir Template directory for the rendering. talend.documentation.templateDir - title Document title. 
talend.documentation.title ${project.name} version The component version. It defaults to the pom version talend.documentation.version ${project.version} workDir The template directory for the Asciidoctor rendering - if 'formats' is set. talend.documentation.workdDir ${project.build.directory}/talend-component/workdir attachDocumentations Allows to attach (and deploy) the documentations (.adoc, and formats keys) to the project. talend.documentation.attach true htmlAndPdf If you use the plugin as an extension, you can add this property and set it to true in your project to automatically get HTML and PDF renderings of the documentation. talend.documentation.htmlAndPdf false To render the generated documentation in HTML or PDF, you can use the Asciidoctor Maven plugin (or Gradle equivalent). You can configure both executions if you want both HTML and PDF renderings. Make sure to execute the rendering after the documentation generation. If you prefer a HTML rendering, you can configure the following execution in the asciidoctor plugin. The example below: Generates the components documentation in target/classes/TALEND-INF/documentation.adoc. Renders the documentation as an HTML file stored in target/documentation/documentation.html. If you prefer a PDF rendering, you can configure the following execution in the asciidoctor plugin: If you want to add some more content or a title, you can include the generated document into another document using Asciidoc include directive. For example: To be able to do that, you need to pass the generated_doc attribute to the plugin. For example: This is optional but allows to reuse Maven placeholders to pass paths, which can be convenient in an automated build. You can find more customization options on Asciidoctor website. Testing the rendering of your component configuration into the Studio requires deploying the component in Talend Studio. Refer to the Studio documentation. 
If you need to deploy your component into a Cloud (web) environment, you can test its web rendering by using the web goal of the plugin: Run the mvn talend-component:web command. Open the following URL in a web browser: localhost:8080. Select the component form you want to see from the treeview on the left. The selected form is displayed on the right. Two parameters are available with the plugin: serverPort, which changes the default port (8080) of the embedded server. Its associated user property is talend.web.port. serverArguments, which you can use to pass Meecrowave options to the server. Learn more about that configuration at openwebbeans.apache.org/meecrowave/meecrowave-core/cli.html. Make sure to install the artifact before using this command because it reads the component JAR from the local Maven repository. Finally, you can switch the language of the component UI (documentation, form) using the language query parameter in the web application. For instance: localhost:8080?language=fr. If you built a custom UI (JS + CSS) bundle and want to test it in the web application, you can configure it in the pom.xml file as follows: This is an advanced feature designed for expert users. Use it with caution. Component ARchive (.car) is the way to bundle a component to share it in the Talend ecosystem. It is an executable Java ARchive (.jar) containing a metadata file and a nested Maven repository containing the component and its dependencies. This command creates a .car file in your build directory. This file can be shared on Talend platforms. This command has some optional parameters: Name Description User property Default attach Specifies whether the component archive should be attached. talend.car.attach true classifier The classifier to use if attach is set to true. talend.car.classifier component metadata Additional custom metadata to bundle in the component archive. 
- - output Specifies the output path and name of the archive talend.car.output ${project.build.directory}/${project.build.finalName}.car packaging Specifies the packaging - ${project.packaging} This CAR is executable and exposes the studio-deploy command which takes a Talend Studio home path as parameter. When executed, it installs the dependencies into the Studio and registers the component in your instance. For example: You can also upload the dependencies to your Nexus server using the following command: In this command, Nexus URL and repository name are mandatory arguments. All other arguments are optional. If arguments contain spaces or special symbols, you need to quote the whole value of the argument. For example: The deploy-in-studio goal deploys the current component module into a local Talend Studio instance. Name Description User property Default studioHome Path to the Studio home directory talend.component.studioHome - studioM2 Path to the Studio maven repository if not the default one talend.component.studioM2 - You can use the following command from the root folder of your project: The help goal displays help information on talend-component-maven-plugin. Call mvn talend-component:help -Ddetail=true -Dgoal= to display the parameter details of a specific goal. Name Description User property Default detail Displays all settable properties for each goal. detail false goal The name of the goal for which to show help. If unspecified, all goals are displayed. goal - indentSize Number of spaces per indentation level. This integer should be positive. indentSize 2 lineLength Maximum length of a display line. This integer should be positive. lineLength 80
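To make one of the validation rules listed earlier more concrete, the following is a minimal sketch of the kind of check that validateLocalConfiguration describes: every key of TALEND-INF/local-configuration.properties must start with the family name. The class and method names are hypothetical, and the assumption that the family name is followed by a dot is ours; this is not the plugin's actual implementation.

```java
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

// Hypothetical helper approximating the validateLocalConfiguration rule.
class LocalConfigurationCheck {

    // Returns the keys that do not start with "<family>." (the dot separator is an assumption),
    // sorted alphabetically so that the report is stable.
    static List<String> invalidKeys(String family, Properties localConfiguration) {
        String expectedPrefix = family + ".";
        return localConfiguration.stringPropertyNames().stream()
                .filter(key -> !key.startsWith(expectedPrefix))
                .sorted()
                .collect(Collectors.toList());
    }
}
```

An empty result means that the file respects the rule; any returned key would need to be renamed with the family prefix.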

Managing component versions and migration  How to handle component versions and migration   migrationHandler version migration backward compatibility configuration option api

If some changes impact the configuration, they can be managed through a migration handler at the component level (enabling trans-model migration support). The @Version annotation supports a migrationHandler method which migrates the incoming configuration to the current model. For example, if the filepath configuration entry from v1 changed to location in v2, you can remap the value in your MigrationHandler implementation. A best practice is to split migrations into services that you can inject in the migration handler (through constructor) rather than managing all migrations directly in the handler. For example: What is important to notice in this snippet is the fact that you can organize your migrations the way that best fits your component. If you need to apply migrations in a specific order, make sure that they are sorted. Consider this API as a migration callback rather than a migration API. Adjust the migration code structure you need behind the MigrationHandler, based on your component requirements, using service injection. A nested configuration always migrates itself without any root prefix, whereas a component configuration always roots the full configuration. For example, if your model is the following: Then the component will see the path configuration.datastore.url for the datastore url whereas the datastore will see the path url for the same property. You can see it as configuration types - @DataStore, @DataSet - being configured with an empty root path.
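As an illustration of the handler-plus-service split described above, here is a minimal sketch of the filepath to location remapping. To stay self-contained, it declares a local MigrationCallback interface that only mirrors the shape of the migration callback (incoming version plus a flat map of configuration entries); in a real component you would implement the framework's MigrationHandler instead. The class names, the configuration. key prefix and the version numbers are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the framework's migration callback, declared locally
// so that this sketch compiles without the Talend Component Kit API on the classpath.
interface MigrationCallback {
    Map<String, String> migrate(int incomingVersion, Map<String, String> incomingData);
}

// Hypothetical migration service: renames the v1 "filepath" entry to the v2 "location" entry.
class FilePathToLocationMigration {
    Map<String, String> apply(Map<String, String> data) {
        Map<String, String> migrated = new HashMap<>(data); // copy, the incoming map is left untouched
        String value = migrated.remove("configuration.filepath");
        if (value != null) {
            migrated.put("configuration.location", value);
        }
        return migrated;
    }
}

// The handler only decides which migrations apply; the logic lives in the injected service.
class FileComponentMigrationHandler implements MigrationCallback {
    private final FilePathToLocationMigration filePathToLocation;

    FileComponentMigrationHandler(FilePathToLocationMigration filePathToLocation) {
        this.filePathToLocation = filePathToLocation;
    }

    @Override
    public Map<String, String> migrate(int incomingVersion, Map<String, String> incomingData) {
        Map<String, String> result = incomingData;
        if (incomingVersion < 2) { // configurations saved with v1 still use "filepath"
            result = filePathToLocation.apply(result);
        }
        return result;
    }
}
```

Keeping each migration in its own small class makes it easier to add further steps later and to call them in ascending version order from the handler.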

Wrapping a Beam I/O  Learn how to wrap Beam inputs and outputs   Beam input output

This part is limited to specific kinds of Beam PTransform: PTransform<PBegin, PCollection<?>> for inputs. PTransform<PCollection<?>, PDone> for outputs. Outputs must use a single (composite or not) DoFn in their apply method. To illustrate the input wrapping, this procedure uses the following input as a starting point (based on existing Beam inputs): To wrap the Read in a framework component, create a transform delegating to that Read with at least a @PartitionMapper annotation and using @Option constructor injections to configure the component. Also make sure to follow the best practices and to specify @Icon and @Version. To illustrate the output wrapping, this procedure uses the following output as a starting point (based on existing Beam outputs): You can wrap this output exactly the same way you wrap an input, but using @Processor instead of @PartitionMapper: Note that the org.talend.sdk.component.runtime.beam.transform.DelegatingTransform class fully delegates the "expansion" to another transform. Therefore, you can extend it and implement the configuration mapping: In terms of classloading, when you write an I/O, the Beam SDK Java core stack is assumed as provided in Talend Component Kit runtime. This way, you don’t need to include it in the compile scope; it would be ignored anyway. If you need a JSonCoder, you can use the org.talend.sdk.component.runtime.beam.factory.service.PluginCoderFactory service, which gives you access to the JSON-P and JSON-B coders. There is also an Avro coder, which uses the FileContainer. It ensures it is self-contained for IndexedRecord and, unlike the default Apache Beam AvroCoder, it does not require setting the schema when creating a pipeline. It consumes more space and therefore is slightly slower, but it is fine for DoFn, since it does not rely on serialization in most cases. See org.talend.sdk.component.runtime.beam.transform.avro.IndexedRecordCoder. 
If your PCollection is made of JsonObject records, and you want to convert them to IndexedRecord, you can use the following PTransforms: converts an IndexedRecord to a JsonObject. converts a JsonObject to an IndexedRecord. converts a JsonObject to an IndexedRecord with AVRO schema inference. There are two main provided coders for Record: it will unwrap the record as an Avro IndexedRecord and serialize it with its schema. This can indeed have a performance impact but, due to the structure of component, it will not impact the runtime performance in general - except with direct runner - because the runners will optimize the pipeline accurately. it will serialize the Avro IndexedRecord as well but it will ensure the schema is in the SchemaRegistry to be able to deserialize it when needed. This implementation is faster but the default implementation of the registry is "in memory" so will only work with a single worker node. You can extend it using the Java SPI mechanism to use a custom distributed implementation. Sample input based on Beam Kafka: Because the Beam wrapper does not respect the standard Talend Component Kit programming model (for example, there is no @Emitter), you need to set the component validation property (talend.validation.component, listed with the validation flags of the Maven plugin) to false in your pom.xml file (or equivalent for Gradle) to skip the component programming model validations of the framework.

Building components with Gradle  Use Gradle or the Gradle wrapper as build tool to develop components   gradle tool build

To develop new components, Talend Component Kit requires a build tool in which you will import the component project generated from the starter. With this build tool, you will also be able to implement the logic of your component and to install and deploy it to Talend applications. A Talend Component Kit plugin is available for each of the supported build tools. gradle-talend-component helps you write components that match the best practices. It is inspired by the Maven plugin and adds the ability to automatically generate the dependencies.txt file used by the SDK to build the component classpath. For more information on the configuration, refer to the Maven properties matching the attributes. By default, Gradle does not log information messages. To see messages, use --info in your commands. Refer to Gradle’s documentation to learn about log levels. You can use it as follows:

Contributing to Talend Component Kit documentation  Learn how to contribute to component-runtime documentation   documentation asciidoc asciidoctor contributor

This document explains how Asciidoctor is used in the context of Talend Component Kit as well as the specific processes in place. For general guidelines about Asciidoctor, refer to the Asciidoc Syntax quick reference page. There are two ways to suggest modifications or new content. Both of the options below require a GitHub account. On every page of the Talend Component Kit Developer Guide, a Suggest and edit button is available. It allows you to access the corresponding source file on GitHub and to create a pull request with the suggested edits. The pull request is then assessed by the team in charge of the project. Fork the Runtime repository of the Talend Component Kit project and edit .adoc files located under documentation\src\main\antora\modules\ROOT\pages. Make sure to follow the guidelines outlined in the current document, especially for large modifications or new content, to make sure it can properly be rendered. When done, create a pull request that will be assessed by the team in charge of the project. The documentation is made of: Documentation files manually written under documentation\src\main\antora\modules\ROOT\pages. Documentation files automatically generated from the source code under documentation\src\main\antora\modules\ROOT\pages\_partials. These files are individually called in manually written files through includes. Assets, especially images, stored in the documentation\src\main\antora\modules\ROOT\assets folder. Some subfolders exist to categorize the assets. Each file has a unique name and is rendered as a unique HTML page. Some of the files are prefixed to help identify the type of content they contain. The most common examples are: index- for pages referenced from the main index page. These pages also contain specific attributes to be correctly rendered on the main index page (see 'List of metadata attributes' below). tutorial- for tutorials/guided examples. generated_ for pages generated from the source code. 
These pages are generally stored in the _partials folder. For all pages: :page-partial indicates that the current .adoc file can be included in another document using an include::. This attribute has no value. :page-talend_skipindexation: indicates that the current .adoc file must not be indexed. This attribute has no value. Add it to files that should not be returned in the search, like index files that only contain includes. :description: is the meta description of the file. Each .adoc file is rendered as an HTML file. :keywords: is the list of meta keywords relevant for the current .adoc file. Separate keywords using simple commas. :page-talend_stage: draft indicates that the current document is a draft and is not final even if published. It triggers the display of a small banner indicating the status of the page. Remove this attribute once the page is final. For pages that should appear as a tile on the index page: :page-documentationindex-index: is the weight of the page. A low weight indicates that the page should be one of the first tiles to appear. A high weight will push the tile towards the end of the list in the index page. :page-documentationindex-label: is the title of the tile in the index page. :page-documentationindex-icon: is the icon of the tile in the index page. The value of this attribute should be the name of a free icon on fontawesome. :page-documentationindex-description: is a short description of the page that will be displayed in the tile under its title. For pages containing API descriptions: :page-talend_swaggerui: true indicates that the page contains some API reference that should be displayed using Swagger UI. The Talend Component Kit documentation is published as HTML and PDF. Some parts can differ between these two versions, such as lists of links, that are not functional in the PDF version. To handle this, you can define conditions so that some content is displayed in only one of the output formats. 
For example: Every .adoc file can only contain one 'level 1' title (=). It is the title of the page and is located at the top of the document. It is a best practice to keep all sublevels added to the document consistent. For example, don’t use a 'level 3' (===) section directly inside a 'level 1'. When possible, avoid going lower than section 2 (===) to keep the page readable. In the HTML output, the document title is rendered as h1, section 1 titles as h2, etc. The "in-page" navigation available on the right side of the HTML rendering only considers h2 and h3 elements. Other levels are ignored to keep the navigation readable and simple. It is possible to reuse content through "includes". Includes can be used to reuse entire files in other files, which avoids copy-pasting. When using an 'include' (calling an .adoc file from another .adoc file), you can specify a level offset to keep the hierarchy consistent in the current document. Avoid using includes if not absolutely necessary. An include can be done as follows: In this case, both doc1.adoc and doc2.adoc are rendered in the same page and their content is offset by one level, meaning that the document title of doc1 becomes a section 1 title (h2) instead of an h1 in the final rendering, and so on. Note that both doc1.adoc and doc2.adoc will in addition be rendered as standalone pages (doc1.html and doc2.html). All images are stored under documentation > src > main > antora > modules > ROOT > assets > images. Relative to .adoc files, it can be ../assets/images/ or ../../assets/images for _partials (automatically generated from code) pages. To avoid handling different relative paths, the backend already resolves directly image: to the image folder. Hence, paths to images should start with the following: image:(/).png[(,parameters)] If there is no subfolder, type the image name right away. Adding an image title is mandatory to avoid empty broken spaces in the content. 
If necessary, you can add more parameters separated by a comma between the same brackets as the image title, such as the desired width, height, etc. Use values in % for image size. For example: image:landscape.png[Landscape,70%,window="_blank",link="https://talend.github.io/component-runtime/main/1.54.1/_images/landscape.png"] In general, avoid using tables if there are other solutions available. This is especially the case for complex tables that include assets or big code samples, as these can lead to display issues. Table example: value API Type Description @o.t.s.c.api.service.completion.DynamicValues dynamic_values Mark a method as being useful to fill potential values of a string option for a property denoted by its value. @o.t.s.c.api.service.healthcheck.HealthCheck healthcheck This class marks an action doing a connection test. The following elements can be used to create admonition blocks. However, avoid using them one after another as it can make reading more difficult: NOTE: for a simple information note. IMPORTANT: for a warning. Warnings should include information that leads to important errors if not taken into account. TIP: for alternative ways or shortcuts to ease a process or procedure. Admonition blocks should be kept as simple and short as possible. In some niche cases, it may be required to insert more complex content in an admonition block, such as a bullet list. In these cases, they should be formatted as follows: