Search results for learn

Talend Component Kit methodology Learn the main steps to build a custom component using Talend Component Kit get started learn

Developing new components using the Component Kit framework includes the following steps. First, create a project using the Starter or the Talend IntelliJ plugin. This step builds the skeleton of the project. It consists of: defining the general configuration model for each component in your project; generating and downloading the project archive from the Starter; and compiling the project. Then, import the compiled project into your IDE. This step is not required if you have generated the project using the IntelliJ plugin. Next, implement the components, which includes: registering the components by specifying their metadata (family, categories, version, icon, type and name); defining the layout and configurable part of the components; and defining the execution logic of the components, also called runtime. Finally, test the components and deploy them to Talend Studio or Cloud applications. Optionally, you can use services. Services are predefined or user-defined configurations that can be reused in several components.

From Javajet to Talend Component Kit The Javajet framework is being replaced by the new Talend Component Kit. Learn the main differences and the new approach introduced with this framework. javajet studio studio-integration learning getting started principles

From version 7.0 of Talend Studio, Talend Component Kit is the recommended framework for developing components. This framework is being introduced to ensure that newly developed components can be deployed and executed both in on-premise/local and cloud/big data environments. From that new approach comes the need to provide a complete yet unique and compatible way of developing components. With the Component Kit, custom components are entirely implemented in Java. To help you get started with a new custom component development project, a Starter is available. Using it, you can generate the skeleton of your project. By importing this skeleton into a development tool, you can then implement the layout and execution logic of the components in Java. With the previous Javajet framework, metadata, widgets and configurable parts of a custom component were specified in XML. With the Component Kit, they are now defined in the Configuration (for example, LoggerProcessorConfiguration) Java class of your development project. Note that most of this configuration is transparent if you specified the Configuration Model of your components right before generating the project from the Starter. Any undocumented feature or option is considered not supported by the Component Kit framework. You can find examples of output in Studio or Cloud environments in the Gallery. Return variables availability (1.51+) deprecates After Variables. In Javajet, AVAILABILITY can be: AFTER (set after the component has finished) or FLOW (changed at row level).
In Component Kit, return variables can be nested. Previously, the execution of a custom component was described through several Javajet files: _begin.javajet, containing the code required to initialize the component. _main.javajet, containing the code required to process each line of the incoming data. _end.javajet, containing the code required to end the processing and go to the following step of the execution. With the Component Kit, the entire execution flow of a component is described through its main Java class (for example, LoggerProcessor) and through services for reusable parts. Each type of component has its own execution logic. The same basic logic is applied to all components of the same type, and is then extended to implement the specificities of each component. The project generated from the Starter already contains the basic logic for each component. The Talend Component Kit framework relies on several primitive components. All components can use the @PostConstruct and @PreDestroy annotations to initialize or release underlying resources at the beginning and the end of a processing. In distributed environments, class constructors are called on cluster manager nodes. Methods annotated with @PostConstruct and @PreDestroy are called on worker nodes. Thus, partition plan computation and pipeline tasks are performed on different nodes. All the methods managed by the framework must be public. Private methods are ignored. The framework is designed to be as declarative as possible but also to stay extensible by not using fixed interfaces or method signatures. This makes it possible to incrementally add new features of the underlying implementations. To ensure that the Cloud-compatible approach of the Component Kit framework is respected, some changes were introduced on the implementation side, including: The File mode is no longer supported. You can still work with URIs and remote storage systems to use files.
The file collection must be handled at the component implementation level. The input and output connections between two components can only be of the Flow or Reject types. Other types of connections are not supported. Every Output component must have a corresponding Input component and use a dataset. All datasets must use a datastore. To get started with the Component Kit framework, you can go through the following documents: Learn the basics about Talend Component Kit; Create and deploy your first Component Kit component; Learn about the Starter; Start implementing components; Integrate a component to Talend Studio; Check some examples of components built with Talend Component Kit.

Component execution logic Learn how components are executed PostConstruct PreDestroy BeforeGroup AfterGroup

Each type of component has its own execution logic. The same basic logic is applied to all components of the same type, and is then extended to implement the specificities of each component. The project generated from the Starter already contains the basic logic for each component. The Talend Component Kit framework relies on several primitive components. All components can use the @PostConstruct and @PreDestroy annotations to initialize or release underlying resources at the beginning and the end of a processing. In distributed environments, class constructors are called on cluster manager nodes. Methods annotated with @PostConstruct and @PreDestroy are called on worker nodes. Thus, partition plan computation and pipeline tasks are performed on different nodes. All the methods managed by the framework must be public. Private methods are ignored. The framework is designed to be as declarative as possible but also to stay extensible by not using fixed interfaces or method signatures. This makes it possible to incrementally add new features of the underlying implementations.
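The rule that only public methods are managed can be illustrated with a small reflection sketch. This is not the framework's implementation: the Init and Release annotations are hypothetical stand-ins for @PostConstruct and @PreDestroy, declared locally so that the example needs nothing beyond the JDK, and DemoProcessor is an invented component. It only shows why a declarative, annotation-driven lifecycle that looks up public methods never sees a private one.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class LifecycleSketch {
    // Hypothetical stand-ins for @PostConstruct / @PreDestroy (assumption, JDK-only demo).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Init {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Release {}

    // Invented component: one public and one private initializer.
    public static final class DemoProcessor {
        final List<String> calls = new ArrayList<>();

        @Init
        public void open() { calls.add("open"); }

        @Init
        private void hidden() { calls.add("hidden"); } // private: never discovered below

        @Release
        public void close() { calls.add("close"); }
    }

    // Invokes every PUBLIC method carrying the marker; getMethods() returns public methods only.
    static void invokeAnnotated(Object component, Class<? extends Annotation> marker) {
        for (Method method : component.getClass().getMethods()) {
            if (method.isAnnotationPresent(marker)) {
                try {
                    method.invoke(component);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
    }

    // Runs the init, process, release sequence and returns the recorded calls.
    static List<String> run() {
        DemoProcessor processor = new DemoProcessor();
        invokeAnnotated(processor, Init.class);
        processor.calls.add("process");
        invokeAnnotated(processor, Release.class);
        return processor.calls;
    }
}
```

Calling LifecycleSketch.run() records the public initializer, the processing step and the public release method; the private initializer is skipped because the lookup only considers public methods.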

Beam testing Learn how to test components in Beam test Beam Big Data testing

If you want to make sure that your component works in Beam and don’t want to use Spark, you can try with the Direct Runner. Check beam.apache.org/contribute/testing/ for more details.

Creating plugins Learn how to create plugins for your components ContainerManager ContainerListener plugin listener registration extension

The entry point of the API is the ContainerManager. It allows you to define what the Shared classloader is and to create children. It determines how to resolve dependencies for plugins from the plugin file/location and how to configure the classloaders (what the parent classloader is, how to handle the parent first/last delegation, and so on). It is recommended to keep the manager running if you can reuse plugins, in order to avoid recreating classloaders and to share them. The classloader configuration covers: what the shared classloader is; which classes are loaded from the shared loader first (intended to be used for APIs that should not be loaded from the plugin loader); and which classes are loaded from the parent classloader. This can be useful to prevent loading a "common" library from the parent classloader (for instance, guava, commons-lang3, and so on). Once you have defined a manager, you can create plugins. To create the plugin container, the Resolver resolves the dependencies needed for the plugin, then the manager creates the plugin classloader and registers the plugin Container. Some actions are needed when a plugin is registered or unregistered. For that purpose, you can use ContainerListener. Plugins are directly registered on the manager.
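To make the "parent first/last delegation" notion above more concrete, here is a minimal parent-last classloader written with the JDK only. It is a generic sketch, not the ContainerManager implementation: the class name ParentLastClassLoader and the loadOrNull helper are hypothetical, and a real plugin loader would also handle resources and the shared classloader.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class ParentLastClassLoader extends URLClassLoader {
    ParentLastClassLoader(URL[] pluginJars, ClassLoader parent) {
        super(pluginJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> clazz = findLoadedClass(name);
            if (clazz == null) {
                if (name.startsWith("java.")) {
                    // JDK classes must always come from the parent chain.
                    clazz = super.loadClass(name, false);
                } else {
                    try {
                        // Parent-last: look in the plugin jars first.
                        clazz = findClass(name);
                    } catch (ClassNotFoundException notInPlugin) {
                        // Fall back to the usual parent-first lookup.
                        clazz = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(clazz);
            }
            return clazz;
        }
    }

    // Convenience helper for callers that prefer null over a checked exception.
    static Class<?> loadOrNull(ClassLoader loader, String name) {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }
}
```

With an empty jar list, every lookup falls through to the parent, which is a quick way to check the fallback path before pointing the loader at real plugin jars.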

Wrapping a Beam I/O Learn how to wrap Beam inputs and outputs Beam input output

This part is limited to specific kinds of Beam PTransform: PTransform<PBegin, PCollection<?>> for inputs. PTransform<PCollection<?>, PDone> for outputs. Outputs must use a single (composite or not) DoFn in their apply method. To illustrate the input wrapping, this procedure uses the following input as a starting point (based on existing Beam inputs): To wrap the Read in a framework component, create a transform delegating to that Read with at least a @PartitionMapper annotation and using @Option constructor injections to configure the component. Also make sure to follow the best practices and to specify @Icon and @Version. To illustrate the output wrapping, this procedure uses the following output as a starting point (based on existing Beam outputs): You can wrap this output exactly the same way you wrap an input, but using @Processor instead of @PartitionMapper. Note that the org.talend.sdk.component.runtime.beam.transform.DelegatingTransform class fully delegates the "expansion" to another transform. Therefore, you can extend it and implement the configuration mapping. In terms of classloading, when you write an I/O, the Beam SDK Java core stack is assumed to be provided in the Talend Component Kit runtime. This way, you don’t need to include it in the compile scope, where it would be ignored anyway. If you need a JSonCoder, you can use the org.talend.sdk.component.runtime.beam.factory.service.PluginCoderFactory service, which gives you access to the JSON-P and JSON-B coders. There is also an Avro coder, which uses the FileContainer. It is self-contained for IndexedRecord and, unlike the default Apache Beam AvroCoder, does not require setting the schema when creating a pipeline. It consumes more space and is therefore slightly slower, but it is fine for DoFn, since it does not rely on serialization in most cases. See org.talend.sdk.component.runtime.beam.transform.avro.IndexedRecordCoder.
If your PCollection is made of JsonObject records, and you want to convert them to IndexedRecord, you can use the provided PTransforms: one converts an IndexedRecord to a JsonObject, another converts a JsonObject to an IndexedRecord, and a third converts a JsonObject to an IndexedRecord with Avro schema inference. There are two main provided coders for Record. The first unwraps the record as an Avro IndexedRecord and serializes it with its schema. This can have a performance impact but, due to the structure of components, it does not impact the runtime performance in general (except with the direct runner) because the runners optimize the pipeline accordingly. The second also serializes the Avro IndexedRecord, but it ensures that the schema is in the SchemaRegistry to be able to deserialize it when needed. This implementation is faster, but the default implementation of the registry is "in memory", so it only works with a single worker node. You can extend it using the Java SPI mechanism to use a custom distributed implementation. Sample input based on Beam Kafka: Because the Beam wrapper does not respect the standard Talend Component Kit programming model (for example, there is no @Emitter), you need to set the corresponding property to false in your pom.xml file (or the equivalent for Gradle) to skip the component programming model validations of the framework.

Integrating components into Talend Cloud Integrate components into Talend Cloud and learn about the component server web component server component-server cloud

Learn about the Component Server with the following articles: Component server and HTTP API

Talend Component Kit Overview Learn the basic concepts of the Talend Component Kit framework framework

Talend Component Kit is a toolkit based on Java and designed to simplify the development of components at two levels: Runtime: Runtime is about injecting the specific component code into a job or pipeline. The framework helps unify as much as possible the code required to run in Data Integration (DI) and BEAM environments. Graphical interface: The framework helps unify the code required to render the component in a browser (web) or in the Eclipse-based Studio (SWT). The Talend Component Kit framework is made of several tools designed to help you during the component development process. It allows you to develop components that fit in both Java and web UIs. Starter: Generate the skeleton of your development project using a user-friendly interface. The Talend Component Kit Starter is available as a web tool or as a plugin for the IntelliJ IDE. Component API: Check all classes available to implement components. Build tools: The framework comes with Maven and Gradle wrappers, which let you always use the version of Maven or Gradle that is right for your component development environment and version. Testing tools: Test components before integrating them into Talend Studio or Cloud applications. Testing tools include the Talend Component Kit Web Tester, which lets you check the web UI of your components on your local machine. You can find more details about the framework design in this document. The Talend Component Kit project is available on GitHub in the following repository

Testing in a Continuous Integration environment Learn how to test components in a continuous integration environment tutorial example test CI continuous integration testing

This tutorial shows how to adapt the test configuration of the Zendesk search component that was done in this previous tutorial to make it work in a Continuous Integration environment. In the test, the Zendesk credentials are used directly in the code to perform a first capture of the API response. Then, fake credentials are used in the simulation mode because the real API is not called anymore. However, in some cases, you may need to keep calling the real API on a CI server or on a specific environment. To do that, you can adapt the test to get the credentials depending on the execution mode (simulation/passthrough). These instructions concern the CI server or any environment that requires real credentials. This tutorial uses: A Maven server that supports password encryption as a credential provider. Encryption is optional but recommended. The MavenDecrypterRule test rule provided by the framework. This rule lets you get credentials from Maven settings using a server ID. To create encrypted server credentials for the Zendesk instance: Create a master password using the command: mvn --encrypt-master-password . Store this master password in the settings-security.xml file of the ~/.m2 folder. Encrypt the Zendesk instance password using the command: mvn --encrypt-password . Create a server entry under servers in the Maven settings.xml file located in the ~/.m2 folder. You can store the settings-security.xml and settings.xml files elsewhere than the default location (~/.m2). To do that, set the path of the directory containing the files in the talend.maven.decrypter.m2.location environment variable. Add the MavenDecrypterRule rule to the test class. This rule injects the server information stored in the Maven settings.xml file into the test. The rule also decrypts credentials if they are encrypted. Inject the Zendesk server into the test. To do that, add a new field to the class with the @DecryptedServer annotation, which holds the server ID to be injected.
The MavenDecrypterRule is able to inject the server instance into this class at runtime. The server instance contains the username and the decrypted password. Use the server instance in the test to get the real credentials in a secure manner. Once modified, the complete test class looks as follows: This test continues to work in simulation mode, because the API simulation proxy is activated. This tutorial shows how to set up a CI server in passthrough mode using Jenkins. Log in to Jenkins. Click New Item to create a new build job. Enter an Item name (Job name) and choose the freestyle job. Then click OK. In the Source Code Management section, enter your project repository URL. A GitHub repository is used in this tutorial. Specify the master branch as Branches to build. In the Build section, click Add build step and choose Invoke top-level Maven targets. Choose your Maven version and enter the Maven build command. In this case: clean install. Then, click Save. The -Dtalend.junit.http.passthrough=true option is part of the build command. This option tells the API simulation proxy to run in passthrough mode. This way, all the HTTP requests made in the test are forwarded to the real API server. The MavenDecrypterRule rule provides the real credentials. You can configure the passthrough mode globally on your CI server by setting the talend.junit.http.passthrough environment variable to true. Test the job by selecting Build now, and check that the job has built correctly. Now your tests run in simulation mode on your development environment and in passthrough mode on your CI server.
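The idea of picking credentials according to the execution mode can be sketched in plain Java. This is only an illustration under stated assumptions: CredentialsSelector, its Credentials holder and the fake values are hypothetical names, the supplier stands in for the server instance injected by MavenDecrypterRule, and checking both a system property and an environment variable named talend.junit.http.passthrough simply mirrors the two ways of enabling the mode mentioned above.

```java
import java.util.function.Supplier;

final class CredentialsSelector {
    // Key taken from the tutorial; how the framework itself reads it is not shown here.
    static final String PASSTHROUGH_KEY = "talend.junit.http.passthrough";

    // Hypothetical value holder for this sketch.
    static final class Credentials {
        final String username;
        final String password;

        Credentials(String username, String password) {
            this.username = username;
            this.password = password;
        }
    }

    // True when passthrough is enabled by system property (-D...) or environment variable.
    static boolean isPassthrough() {
        return Boolean.getBoolean(PASSTHROUGH_KEY)
                || Boolean.parseBoolean(System.getenv(PASSTHROUGH_KEY));
    }

    // Real credentials are resolved lazily, so simulation mode never touches them.
    static Credentials select(boolean passthrough, Supplier<Credentials> realCredentials) {
        return passthrough
                ? realCredentials.get()
                : new Credentials("fake-user", "fake-password");
    }
}
```

A test could call CredentialsSelector.select(CredentialsSelector.isPassthrough(), ...) and pass a supplier that reads the username and password from the injected server instance. Resolving the real credentials lazily keeps the simulation run independent of any Maven settings.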

Testing a REST API Learn how to test a component that consumes a REST API through this tutorial tutorial example REST API zendesk test testing

Testing code that consumes REST APIs can sometimes present many constraints: API rate limit, authentication token and password sharing, API availability, sandbox expiration, API costs, and so on. As a developer, it becomes critical to avoid those constraints and to be able to easily mock the API response. The component framework provides an API simulation tool that makes it easy to write unit tests. This tutorial shows how to use this tool in unit tests. As a starting point, the tutorial uses the component that consumes the Zendesk Search API and that was created in a previous tutorial. The goal is to add unit tests for it. For this tutorial, four tickets that have the open status have been added to the Zendesk test instance used in the tests. To learn more about the testing methodology used in this tutorial, refer to Component JUnit testing. Create a unit test that performs a real HTTP request to the Zendesk Search API instance. You can learn how to create a simple unit test in this tutorial. The test uses: the authentication configuration, using the Zendesk instance URL and credentials; and the search query configuration, to get all the open tickets, ordered by creation date and sorted in descending order. The test is now complete and working. It performs a real HTTP request to the Zendesk instance. As an alternative, you can use mock results to avoid performing HTTP requests every time on the development environment. The real HTTP requests would, for example, only be performed on an integration environment. To transform the unit test into a mocked test that uses a mocked response of the Zendesk Search API: Add the two following JUnit rules provided by the component framework. JUnit4HttpApi: This rule starts a simulation server that acts as a proxy and catches all the HTTP requests performed in the tests. This simulation server has two modes: capture: this mode forwards the captured HTTP request to the real server and captures the response.
simulation: this mode returns a mocked response from the responses already captured. This rule needs to be added as a class rule. JUnit4HttpApiPerMethodConfigurator: This rule has a reference to the first rule. Its role is to configure the simulation server for every unit test. It passes the context of the running test to the simulation server. This rule needs to be added as a simple (method) rule. Example to run in a simulation mode: Make the test run in capture mode to catch the real API responses that can be used later in the simulation mode. To do that, set a new talend.junit.http.capture environment variable to true. This tells the simulation server to run in capture mode. The captured response is saved in the resources/talend.testing.http package in JSON format, then reused to perform the API simulation.

Component server and HTTP API Learn about Talend Component Kit HTTP API and the component server REST API component-server

The HTTP API intends to expose most Talend Component Kit features over HTTP. It is a standalone Java HTTP server. The WebSocket protocol is activated for the endpoints. Endpoints then use /websocket/v1 as base instead of /api/v1. See WebSocket for more details. Browse the API description using the dedicated interface. To make sure that the migration can be enabled, you need to set the version the component was created with in the execution configuration that you send to the server (the component version is available in the component detail endpoint). To do that, use the tcomp::component::version key. Endpoints that are intended to disappear will be deprecated. An X-Talend-Warning header will be returned with a message as value. You can connect to any endpoint by: Replacing /api with /websocket Appending / to the URL Formatting the request as: For example: The response is formatted as follows: All endpoints are logged at startup. You can then find them in the logs if you are unsure which one to use. If you don’t want to create a pool of connections per endpoint/verb, you can use the bus endpoint: /websocket/v1/bus. This endpoint requires that you add the destinationMethod header to each request with the verb value (GET by default). The configuration is read from system properties, environment variables, and so on. Default value: 1000. Maximum number of items a cache can store, used for index endpoints. A comma-separated list of gav to locate the components. Default value: ${home}/documentations. A component translation repository. This is where you put your documentation translations. Their name must follow the pattern documentation_${container-id}_language.adoc where ${container-id} is the component jar name (without the extension and version, generally the artifactId). Default value: true. Should the component extensions add required dependencies. If you deploy some extensions, this is where they can create their dependencies if needed. Default value: 180000.
Timeout for extension initialization at startup. Because the startup waits for extensions to be ready and loaded, this timeout lets you control the latency it implies. A property file (or multiple comma-separated files) where the value is a gav of a component to register (complementary with coordinates). Note that the path can end up with or .properties to take into account all properties in a folder. Default value: true. Should the /documentation endpoint be activated. Note that when called on localhost the doc is always available. Default value: true. Should the /api/v1/environment endpoint be activated. It shows some internal versions and git commits, which are not always desirable over the wire. Default value: false. Should the components using a @GridLayout support tab translation. Studio does not support that feature yet, so this is not enabled by default. Default value: icons/%s.svg,icons/svg/%s.svg,icons/%s_icon32.png,icons/png/%s_icon32.png. These patterns are used to find the icons in the classpath(s). Default value: light. Icon default theme (light/dark). Default value: true. Should legacy (not themed) icons be supported. If true, a lookup is done when no themed icon is found. Default value: true. Should icon themes be supported. Default value: false. If set, it replaces any message for exceptions. Set to false to use the actual exception message. Default value: false. Should the lastUpdated timestamp value of the /environment endpoint be updated with the server start time. Default value: en*=en fr*=fr zh*=zh_CN ja*=ja de*=de. For caching reasons, the goal is to reduce the locales to the minimum required number. For instance, we avoid fr and fr_FR, which would lead to the same entries but twice the memory. This mapping enables that by whitelisting allowed locales, the default being en. If the key ends with *, all strings starting with the prefix match. For instance, fr* matches fr_FR but also fr_CA.
The local Maven repository used to locate components and their dependencies. Default value: false. Should the plugins be un-deployed and re-deployed. Default value: 600. Interval in seconds between each check if plugin re-loading is enabled. Specify a file to check its timestamp on the filesystem. This file takes precedence over the default ones provided by the talend.component.server.component.registry property (used for the timestamp method). Default value: timestamp. Re-deploy method on a timestamp or connectors version change. By default, the timestamp is checked on the file pointed to by the talend.component.server.component.registry or talend.component.server.plugins.reloading.marker variable, otherwise the content of the CONNECTORS_VERSION file is inspected. Accepted values: timestamp, anything else defaults to connectors. Default value: false. Should all the requests/responses be logged (debug purposes, only works when running with CXF). Default value: securityNoopHandler. How to validate a command/request. Accepted values: securityNoopHandler. Default value: securityNoopHandler. How to validate a connection. Accepted values: securityNoopHandler. A folder available for the server (don’t forget to mount it in Docker if you are using the image) which accepts subfolders named as the component plugin id (generally the artifactId or jar name without the version, for example: jdbc). Each family folder can contain: a user-configuration.properties file, which is merged with the component configuration system (see services). This properties file enables the function userJar(xxxx) to replace the jar named xxxx by its virtual gav (groupId:artifactId:version); and a list of jars, which is merged with the component family classpath. Default value: auto. Should the implicit artifacts be provisioned to an m2. If set to auto, it tries to detect if there is an m2 to provision (recommended); if set to skip, it is ignored; otherwise it uses the value as an m2 path.
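The locale mapping described in the property list above (en*=en fr*=fr zh*=zh_CN ja*=ja de*=de) can be read as an ordered list of exact or prefix rules. The following sketch shows one way to interpret it. It is not the server's code: the class and method names are hypothetical, entries are assumed to be separated by whitespace as in the default value shown above, and unmatched locales fall back to en as the text states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LocaleMappingSketch {
    // Parses "en*=en fr*=fr ..." into ordered rules (key -> target locale).
    static Map<String, String> parse(String mapping) {
        Map<String, String> rules = new LinkedHashMap<>();
        for (String entry : mapping.trim().split("\\s+")) {
            int separator = entry.indexOf('=');
            if (separator > 0) {
                rules.put(entry.substring(0, separator), entry.substring(separator + 1));
            }
        }
        return rules;
    }

    // A key ending with '*' matches by prefix, any other key must match exactly.
    static String resolve(Map<String, String> rules, String requestedLocale) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            String key = rule.getKey();
            boolean matches = key.endsWith("*")
                    ? requestedLocale.startsWith(key.substring(0, key.length() - 1))
                    : requestedLocale.equals(key);
            if (matches) {
                return rule.getValue();
            }
        }
        return "en"; // default locale according to the description above
    }
}
```

With the default mapping, both fr_FR and fr_CA resolve to fr, so only one set of cache entries is kept for French.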
The configuration uses Microprofile Config for most entries. It means it can be passed through system properties and environment variables (by replacing dots with underscores and making the keys uppercase). To configure a Docker image rather than a standalone instance, Docker Config and secrets integration allows you to read the configuration from files. You can customize the configuration of these integrations through system properties. Docker integration provides secure: support to encrypt values and system properties when required. It is fully implemented using the Apache Geronimo Microprofile Config extensions. Using the server ZIP (or Docker image), you can configure HTTPS by adding properties to _JAVA_OPTIONS. Assuming that you have a certificate in /opt/certificates/component.p12 (don’t forget to add/mount it in the Docker image if you use it), you can activate it as follows: You can define simple queries on the configuration types and components endpoints. These two endpoints support different parameters. Queries on the configurationtype/index endpoint support the following parameters: type, id, name, and the metadata of the first configuration property. Queries on the component/index endpoint support the following parameters: plugin, name, id, familyId, and the metadata of the first configuration property. In both cases, you can combine several conditions using OR and AND operators. If you combine more than two conditions, note that they are evaluated in the order they are written. Each supported parameter in a condition can be "equal to" (=) or "not equal to" (!=) a defined value (case-sensitive). For example: In this example, the query gets components that have a dataset and belong to the jdbc-component plugin, or components that are named input. The component-form library provides a way to build a component REST API facade that is compatible with the React form library.
For example, the Client can be created using ClientFactory.createDefault(System.getProperty("app.components.base", "http://localhost:8080/api/v1")) and the service can be a simple new UiSpecService<>(). The factory uses JAX-RS if the API is available (assuming a JSON-B provider is registered). Otherwise, it tries to use Spring. The conversion from the component model (REST API) to the uiSpec model is done through UiSpecService. It is based on the object model, which is mapped to a UI model. Having a flat model in the component REST API makes it easy to customize layers. You can completely control the available components, tune the rendering by switching the uiSchema, and add or remove parts of the form. You can also add custom actions and buttons for specific needs of the application. The /migrate endpoint was not shown in the previous snippet but if you need it, add it as well. This Maven dependency provides the UISpec model classes. You can use the Ui API (with or without the builders) to create UiSpec representations. For example: The model uses the JSON-B API to define the binding. Make sure to have an implementation in your classpath. To do that, add the following dependencies: The following module enables you to define a uispec on your own models through annotations: This can’t be used in components and is only intended for web applications. org.talend.sdk.component.form.uispec.mapper.api.service.UiSpecMapper makes it possible to create a Ui instance from a custom type annotated with org.talend.sdk.component.form.uispec.mapper.api.model.View and org.talend.sdk.component.form.uispec.mapper.api.model.View.Schema. UiSpecMapper returns a Supplier and not directly a Ui because the ui-schema is re-evaluated when get() is called. This makes it possible to update the title maps, for example. Here is an example: This API directly maps the UiSpec model (JSON schema and UI schema of Talend UIForm).
The default implementation of the mapper is available at org.talend.sdk.component.form.uispec.mapper.impl.UiSpecMapperImpl. Here is an example: The getTitleMapProviders() method generally looks up a set of TitleMapProvider instances in your IoC context. This API is used to fill the titleMap of the form when a reference identifier is set on the @Schema annotation. component-kit.js is no longer available (previous versions stay on NPM) and is replaced by @talend/react-containers. The previous import can be replaced by import kit from '@talend/react-containers/lib/ComponentForm/kit';. Default JavaScript integration goes through the Talend UI Forms library and its Containers wrapper. Documentation is now available on the previous link. The logging uses Log4j2. You can specify a custom configuration by using the -Dlog4j.configurationFile system property or by adding a log4j2.xml file to the classpath. Here are some common configurations: Console logging: Output messages look like: JSON logging: Output messages look like: Rolling file appender: More details are available in the RollingFileAppender documentation. You can combine the previous layouts (message format) and appenders (where logs are written). The server image is deployed on Docker. Its version is suffixed with a timestamp to ensure that images are not overridden, which could break your usage. You can check the available versions on Docker Hub. You can run the Docker image by executing this command: You can set the _JAVA_OPTIONS environment variable to customize the server. By default, it is installed in /opt/talend/component-kit. The Maven repository is the default one of the machine. You can change it by setting the system property talend.component.server.maven.repository=/path/to/your/m2. If you want to deploy some components, you can configure which ones in _JAVA_OPTIONS (see server doc online) and redirect your local m2: The component server Docker image comes with two log4j2 profiles: TEXT (default) and JSON.
The logging profile can be changed by setting the environment variable LOGGING_LAYOUT to JSON. Note that Component Server adds the KAFKA profile to these default Talend profiles. With this profile, all logs are sent to Kafka. You can check the exact configuration in the component-runtime/images/component-server-image/src/main/resources folder. Console logging is on at INFO level by default. You can customize it by setting the CONSOLE_LOG_LEVEL environment variable to DEBUG, INFO, WARN or to any other log level supported by log4j2. Run the Docker image with console logging: The JSON profile logs on the console using the CONSOLE_LOG_LEVEL configuration, like the default profile. Events are logged in the following format: The KAFKA profile is very close to the JSON profile and also adds the LOG_KAFKA_TOPIC and LOG_KAFKA_URL configuration. The difference is that it logs the default logs on Kafka in addition to the tracing logs. You can register component server images in Docker using these instructions in the corresponding image directory: Docker Compose allows you to deploy the server with components, by mounting the component volume into the server image. docker-compose.yml example: If you want to mount it from another image, you can use this compose configuration: To run one of the previous compose examples, you can use docker-compose -f docker-compose.yml up. Only use the configuration related to port 5005 (in ports and the -agentlib option in _JAVA_OPTIONS) to debug the server on port 5005. Don’t set it in production. You can mount a volume in /opt/talend/component-kit/custom/. The jars in that folder are deployed with the server. Since the server relies on CDI (Apache OpenWebBeans), you can use that technology to enrich it, including JAX-RS endpoints, interceptors, and so on, or just libraries that need to be in the JVM.

Defining datasets and datastores Learn how to define datasets and datastores for input and output components. datastore dataset validation input output studio studio-integration connection

Datasets and datastores are configuration types that define how and where to pull the data from. They are used at design time to create shared configurations that can be stored and used at runtime. All connectors (input and output components) created using Talend Component Kit must reference a valid dataset. Each dataset must reference a datastore. Datastore: The data you need to connect to the backend. Dataset: A datastore coupled with the data you need to execute an action. Make sure that: a datastore is used in each dataset; each dataset has a corresponding input component (mapper or emitter). This input component must be able to work with only the dataset part filled by final users. Any other property implemented for that component must be optional. These rules are enforced by the validateDataSet validation. If the conditions are not met, the component builds will fail. A datastore defines the information required to connect to a data source. For example, it can be made of: a URL, a username, a password. You can specify a datastore and its context of use (in which dataset, etc.) from the Component Kit Starter. Make sure to model the data your components are designed to handle before defining datasets and datastores in the Component Kit Starter. Once you generate and import the project into an IDE, you can find datastores under a specific datastore node. Example of datastore: A dataset represents the inbound data. It is generally made of: A datastore that defines the connection information needed to access the data. A query.
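The rule that each dataset must reference a datastore can be sketched in plain Java. This is an illustration only: the DataStore and DataSet annotations below are local stand-ins declared so that the sketch compiles without the framework, the Connection, QueryDataset and BrokenDataset classes are hypothetical, and referencesDatastore is a simplified assumption about the kind of check the validateDataSet validation performs, not its implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

class DatasetRuleSketch {

    // Local stand-ins for the framework markers, declared only to keep the sketch self-contained.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface DataStore {
        String value();
    }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface DataSet {
        String value();
    }

    // Datastore: what is needed to connect to the backend.
    @DataStore("Connection")
    static class Connection {
        String url;
        String username;
        String password;
    }

    // Dataset: a datastore plus what is needed to execute an action (here, a query).
    @DataSet("Query")
    static class QueryDataset {
        Connection connection;
        String query;
    }

    // A dataset that forgets its datastore, to show the rule failing.
    @DataSet("Broken")
    static class BrokenDataset {
        String query;
    }

    // Hypothetical check: true when the class is a dataset with a field whose type is a datastore.
    static boolean referencesDatastore(final Class<?> type) {
        if (!type.isAnnotationPresent(DataSet.class)) {
            return false;
        }
        for (final Field field : type.getDeclaredFields()) {
            if (field.getType().isAnnotationPresent(DataStore.class)) {
                return true;
            }
        }
        return false;
    }
}
```

In this sketch, QueryDataset satisfies the rule and BrokenDataset does not. In a real project the markers come from the framework and the check runs as part of the build.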
You can specify a dataset and its context of use (in which input and output component it is used) from the Component Kit Starter. Make sure to model the data your components are designed to handle before defining datasets and datastores in the Component Kit Starter. Once you generate and import the project into an IDE, you can find datasets under a specific dataset node. Example of dataset referencing the datastore shown above: The display name of each dataset and datastore must be referenced in the message.properties file of the family package. The key for dataset and datastore display names follows a defined pattern: ${family}.${configurationType}.${name}._displayName. For example: These keys are automatically added for datasets and datastores defined from the Component Kit Starter. When deploying a component or set of components that include datasets and datastores to Talend Studio, a new node is created under Metadata. This node has the name of the component family that was deployed. It allows users to create reusable configurations for datastores and datasets. With predefined datasets and datastores, users can then quickly fill the component configuration in their jobs. They can do so by selecting Repository as Property Type and by browsing to the predefined dataset or datastore. Studio automatically generates connection and close components so that the connection can be reused in input and output components. You only need to follow this example: Then the runtime mapper and processor only need to use @Connection to get the connection, like this: The component server scans all configuration types and returns a configuration type index. This index can be used for the integration into the targeted platforms (Studio, web applications, and so on). Mark a model (complex object) as being a dataset. API: @org.talend.sdk.component.api.configuration.type.DataSet Sample: Mark a model (complex object) as being a datastore (connection to a backend).
API: @org.talend.sdk.component.api.configuration.type.DataStore Sample: Mark a model (complex object) as being a dataset discovery configuration. API: @org.talend.sdk.component.api.configuration.type.DatasetDiscovery Sample: The component family associated with a configuration type (datastore/dataset) is always the one related to the component using that configuration. The configuration type index is represented as a flat tree that contains all the configuration types, which themselves are represented as nodes and indexed by ID. Every node can point to other nodes. This relation is represented as an array of edges that provides the child IDs. As an illustration, a configuration type index for the example above can be defined as follows:
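The flat-tree idea described above (nodes indexed by ID, each node listing the IDs of its children as edges) can be sketched in plain Java. This is not the server's data model: the class, the field names and the direction of the edge (a datastore pointing to the dataset that uses it) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ConfigTypeIndexSketch {

    // One node of the flat index. Field names are illustrative, not the server's JSON schema.
    static final class Node {
        final String id;
        final String configurationType; // for example "datastore" or "dataset"
        final String name;
        final List<String> edges = new ArrayList<>(); // IDs of the child nodes

        Node(final String id, final String configurationType, final String name) {
            this.id = id;
            this.configurationType = configurationType;
            this.name = name;
        }
    }

    // The index itself is flat: every node is stored at the top level, keyed by its ID.
    private final Map<String, Node> nodes = new LinkedHashMap<>();

    void add(final Node node) {
        nodes.put(node.id, node);
    }

    // Records a parent-to-child relation as an edge holding the child ID.
    void link(final String parentId, final String childId) {
        nodes.get(parentId).edges.add(childId);
    }

    // Resolves the edges of a node back to nodes by looking them up in the flat map.
    List<Node> children(final String id) {
        final List<Node> result = new ArrayList<>();
        for (final String childId : nodes.get(id).edges) {
            result.add(nodes.get(childId));
        }
        return result;
    }
}
```

Because the structure is flat, a client can find any configuration type by ID in one lookup and only follows edges when it needs the hierarchy.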

Secrets/Passwords and Maven Learn how to reuse Maven server files and credentials for testing purposes continuous integration testing password maven credentials

You can reuse Maven settings.xml server files, including the encrypted ones. org.talend.sdk.component.maven.MavenDecrypter allows you to find a username/password from a server identifier: It is very useful to avoid storing secrets and to perform tests on real systems on a continuous integration platform. Even if you do not use Maven on the platform, you can generate the settings.xml and settings-security.xml files to use that feature. See maven.apache.org/guides/mini/guide-encryption.html for more details.
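To make the lookup concrete, here is a minimal sketch that uses only the JDK XML parser instead of MavenDecrypter. It is a stand-in, not the framework API: SettingsServerLookup and find are hypothetical names, and the sketch returns the password exactly as written in the file, so it does not decrypt encrypted entries the way MavenDecrypter is described to do.

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Stand-in for illustration only: reads plain-text credentials, performs no decryption.
class SettingsServerLookup {

    // Returns {username, password} for the given server id, or null when the id is not declared.
    static String[] find(final InputStream settingsXml, final String serverId) {
        try {
            final Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(settingsXml);
            final NodeList servers = document.getElementsByTagName("server");
            for (int i = 0; i < servers.getLength(); i++) {
                final Element server = (Element) servers.item(i);
                if (serverId.equals(text(server, "id"))) {
                    return new String[] { text(server, "username"), text(server, "password") };
                }
            }
            return null;
        } catch (final Exception e) {
            throw new IllegalStateException("Cannot read the settings file", e);
        }
    }

    // Text of the first descendant element with the given tag, or null when it is absent.
    private static String text(final Element parent, final String tag) {
        final NodeList children = parent.getElementsByTagName(tag);
        return children.getLength() == 0 ? null : children.item(0).getTextContent().trim();
    }
}
```

A test could open the settings.xml file of the machine as a stream and pass the identifier of the server it needs, which keeps the secret out of the test sources.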

Testing best practices Learn the best practices for testing components developed with Talend Component Kit test best practices testing

This section mainly concerns tools that can be used with JUnit. You can use most of these best practices with TestNG as well. Parameterized tests are a great solution to repeat the same test multiple times. This method of testing requires defining a test scenario (I test function F) and making the input/output data dynamic. Here is a test example, which validates a connection URI using ConnectionService: The testing method is always the same. Only the values change. It can therefore be rewritten using the JUnit Parameterized runner, as follows: You don’t have to define a single @Test method. If you define multiple methods, each of them is executed with all the data. For example, if another test is added to the previous example, four tests are executed (two per data row). With JUnit 5, parameterized tests are easier to use. The full documentation is available at junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests. The main difference with JUnit 4 is that you can also define inline that the test method is a parameterized test as well as the values to use: However, you can still use the previous behavior with a method binding configuration: This last option allows you to inject any type of value, not only primitives, which is common when defining scenarios. Add the junit-jupiter-params dependency to benefit from this feature.
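The idea behind a parameterized test (one scenario, many data rows) can be shown without any test framework. The sketch below is plain Java so that it runs standalone: isValid is a hypothetical stand-in for the ConnectionService mentioned above, and the URIs are made-up sample data.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.Map;

class ConnectionUriScenarios {

    // Hypothetical stand-in for a connection validation service:
    // a URI is accepted when it parses and has both a scheme and a host.
    static boolean isValid(final String uri) {
        try {
            final URI parsed = new URI(uri);
            return parsed.getScheme() != null && parsed.getHost() != null;
        } catch (final URISyntaxException e) {
            return false;
        }
    }

    // One scenario (validate a URI), several data rows: input -> expected result.
    static Map<String, Boolean> data() {
        final Map<String, Boolean> rows = new LinkedHashMap<>();
        rows.put("http://localhost:8080/api", true);
        rows.put("localhost", false);
        rows.put("http://bad host", false);
        return rows;
    }

    // Runs the same check for every row and returns the number of mismatches.
    static int failures() {
        int failures = 0;
        for (final Map.Entry<String, Boolean> row : data().entrySet()) {
            if (isValid(row.getKey()) != row.getValue()) {
                failures++;
            }
        }
        return failures;
    }
}
```

With JUnit 5 and the junit-jupiter-params dependency, the loop disappears: the rows are typically supplied through @ParameterizedTest together with a source annotation such as @ValueSource or @MethodSource, and each row is reported as its own test.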

Testing on multiple environments Learn how to test components on multiple environments with Talend Component Kit test Junit Beam testing

JUnit (4 or 5) already provides ways to parameterize tests and execute the same "test logic" against several sets of data. However, it is not very convenient for testing multiple environments. For example, with Beam, you can test your code against multiple runners. But it requires resolving conflicts between runner dependencies, setting the correct classloaders, and so on. To simplify such cases, the framework provides multi-environment support for your tests through the JUnit module, which works with both JUnit 4 and JUnit 5. The MultiEnvironmentsRunner executes the tests for each defined environment. With the example above, it means that it runs test1 for Env1 and Env2. By default, the JUnit4 runner is used to execute the tests in one environment, but you can use @DelegateRunWith to use another runner. The multi-environment configuration with JUnit 5 is similar to JUnit 4: The main differences are that no runner is used because they do not exist in JUnit 5, and that you need to replace @Test by @EnvironmentalTest. With JUnit 5, tests are executed one after another for all environments, while tests are run sequentially in each environment with JUnit 4. For example, this means that @BeforeAll and @AfterAll are executed once for all runners. The provided environment sets the contextual classloader in order to load the related runner of Apache Beam. Package: org.talend.sdk.component.junit.environment.builtin.beam. The configuration is read from system properties, environment variables, and so on. Classes: ContextualEnvironment, DirectRunnerEnvironment, FlinkRunnerEnvironment, SparkRunnerEnvironment. If the environment extends BaseEnvironmentProvider and therefore defines an environment name - which is the case of the default ones - you can use EnvironmentConfiguration to customize the system properties used for that environment: If you set the .skip system property to true, the environment-related executions are skipped.
This usage assumes that Beam 2.4.0 or later is used. The following dependencies bring the JUnit testing toolkit, the Beam integration and the multi-environment testing toolkit for JUnit into the test scope. Dependencies: Using the fluent DSL to define jobs, you can write a test as follows: Your job must be linear and each step must send a single value (no multi-input or multi-output). It executes the chain twice: With a standalone environment to simulate the Studio. With a Beam (direct runner) environment to ensure the portability of your job.
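The difference in execution order between JUnit 4 and JUnit 5 mentioned earlier in this section can be sketched in plain Java. This is one reading of that description, not the framework's implementation: EnvironmentOrderSketch and its methods are hypothetical, and the sketch only lists the order in which test and environment pairs would run.

```java
import java.util.ArrayList;
import java.util.List;

class EnvironmentOrderSketch {

    // Environment-major order: all tests run in the first environment, then all tests in the next one.
    // This matches the description given for the JUnit 4 runner.
    static List<String> environmentFirst(final List<String> environments, final List<String> tests) {
        final List<String> order = new ArrayList<>();
        for (final String environment : environments) {
            for (final String test : tests) {
                order.add(test + "[" + environment + "]");
            }
        }
        return order;
    }

    // Test-major order: each test runs in every environment before the next test starts.
    // This matches the description given for JUnit 5.
    static List<String> testFirst(final List<String> environments, final List<String> tests) {
        final List<String> order = new ArrayList<>();
        for (final String test : tests) {
            for (final String environment : environments) {
                order.add(test + "[" + environment + "]");
            }
        }
        return order;
    }
}
```

The order matters mostly for shared state: with the test-major order, class-level setup and teardown surround all environments at once, which is consistent with @BeforeAll and @AfterAll being executed once for all runners.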

Generating data Learn how to generate data for testing components developed with Talend Component Kit test generate data testing

Several data generators exist if you want to populate objects with richer semantics than a plain random string generated with commons-lang3: github.com/Codearte/jfairy github.com/DiUS/java-faker github.com/andygibson/datafactory etc. Going further, the following generators can directly bind generic data to a model. However, data quality is not always optimal: github.com/devopsfolks/podam github.com/benas/random-beans etc. There are two main kinds of implementation: Implementations using a pattern and randomly generated data. Implementations using a set of precomputed data extrapolated to create new values. Check your use case to know which one fits best. An alternative to data generation can be to import real data and use Talend Studio to sanitize the data, by removing sensitive information and replacing it with generated or anonymized data. Then you just need to inject that file into the system. If you are using JUnit 5, you can have a look at glytching.github.io/junit-extensions/randomBeans.
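The two kinds of implementation can be sketched with the JDK alone. This is not how any of the libraries listed above work internally: the pattern syntax ('#' for a digit, '?' for a lowercase letter) and the name lists are assumptions chosen for the example.

```java
import java.util.Random;

class DataGenerationSketch {

    // Kind 1: a pattern filled with random data.
    // '#' becomes a digit, '?' becomes a lowercase letter, any other character is copied as is.
    static String fromPattern(final String pattern, final Random random) {
        final StringBuilder out = new StringBuilder();
        for (final char c : pattern.toCharArray()) {
            if (c == '#') {
                out.append((char) ('0' + random.nextInt(10)));
            } else if (c == '?') {
                out.append((char) ('a' + random.nextInt(26)));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Kind 2: precomputed values combined to create new ones.
    private static final String[] FIRST_NAMES = { "Ada", "Grace", "Linus" };
    private static final String[] LAST_NAMES = { "Lovelace", "Hopper", "Torvalds" };

    static String fromDictionary(final Random random) {
        return FIRST_NAMES[random.nextInt(FIRST_NAMES.length)]
                + " " + LAST_NAMES[random.nextInt(LAST_NAMES.length)];
    }
}
```

Passing a Random created with a fixed seed makes the generated values reproducible from one test run to the next, which helps when a failure has to be replayed.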

Testing components Learn how to test your component logic in the environment you need using Talend Component Kit test overview environment beam runtime testing

Developing new components includes testing them in the required execution environments. Use the following articles to learn about the best practices and the available options to fully test your components. Component testing best practices Component testing kit Beam testing Testing in multiple environments Reusing Maven credentials Generating data for testing Simple/Test Pipeline API Beam Pipeline API

Getting started with Talend Component Kit Learn the basics about Talend Component Kit framework and get ready to create new components quickstart overview principle description

Talend Component Kit is a Java framework designed to simplify the development of components at two levels: The Runtime, that injects the specific component code into a job or pipeline. The framework helps unify as much as possible the code required to run in Data Integration (DI) and BEAM environments. The Graphical interface. The framework helps unify the code required to render the component in a browser or in the Eclipse-based Talend Studio (SWT). Most of the development happens as a Maven or Gradle project and requires a dedicated tool such as IntelliJ. The Component Kit is made of: A Starter, that is a graphical interface allowing you to define the skeleton of your development project. APIs to implement components UI and runtime. Development tools: Maven and Gradle wrappers, validation rules, packaging, Web preview, etc. A testing kit based on JUnit 4 and 5. By using this tooling in a development environment, you can start creating components as described below. Developing new components using the Component Kit framework includes: Creating a project using the starter or the Talend IntelliJ plugin. This step builds the skeleton of the project. It consists of: Defining the general configuration model for each component in your project. Generating and downloading the project archive from the starter. Compiling the project. Importing the compiled project in your IDE. This step is not required if you have generated the project using the IntelliJ plugin. Implementing the components, including: Registering the components by specifying their metadata: family, categories, version, icon, type and name. Defining the layout and configurable part of the components. Defining the execution logic of the components, also called runtime. Testing the components. Deploying the components to Talend Studio or Cloud applications. Optionally, you can use services. Services are predefined or user-defined configurations that can be reused in several components.
There are four types of components, each type coming with its specificities, especially on the runtime side. Input components: Retrieve the data to process from a defined source. An input component is made of: The execution logic of the component, represented by a Mapper or an Emitter class. The source logic of the component, represented by a Source class. The layout of the component and the configuration that the end-user will need to provide when using the component, defined by a Configuration class. All input components must have a dataset specified in their configuration, and every dataset must use a datastore. Processors: Process and transform the data. A processor is made of: The execution logic of the component, describing how to process each record or batch of records it receives. It also describes how to pass records to its output connections. This logic is defined in a Processor class. The layout of the component and the configuration that the end-user will need to provide when using the component, defined by a Configuration class. Output components: Send the processed data to a defined destination. An output component is made of: The execution logic of the component, describing how to process each record or batch of records it receives. This logic is defined in an Output class. Unlike processors, output components are the last components of the execution and return no data. The layout of the component and the configuration that the end-user will need to provide when using the component, defined by a Configuration class. All output components must have a dataset specified in their configuration, and every dataset must use a datastore. Standalone components: Make a call to the service or run a query on the database. A standalone component is made of: The execution logic of the component, represented by a DriverRunner class.
The layout of the component and the configuration that the end-user will need to provide when using the component, defined by a Configuration class. All standalone components must have a datastore or dataset specified in their configuration, and every dataset must use a datastore. The following example shows the different classes of an input component in a multi-component development project: Set up your development environment Generate your first project and develop your first component

Masking sensitive data in your configuration Learn how to mark sensitive data such as credentials when developing components using Talend Component Kit tutorial example credentials password

This tutorial shows how to correctly mask the sensitive data of a component configuration. It is very common to define credentials when configuring a component. The most common cases include passwords, secrets, keys (it is also common to show them in plain text in a textarea), and tokens. For example, this REST client configuration specifies that a username, a password and a token are needed to connect to the REST API: This configuration defines these credentials as three simple String fields, represented as plain inputs, which causes severe security concerns: The password and token are clearly readable in all Talend user interfaces (Studio or Web). The password and token are potentially stored in clear. To avoid this behavior, you need to mark sensitive data as @Credential. Talend Component Kit provides you with the @Credential marker, which you can use on any @Option. This marker has two effects: It replaces the default input widget with a password-oriented widget. It requests the Studio or the Talend Cloud products to store the data as sensitive data (as encrypted values). In order to ensure that the password and token are never stored in clear or shown in the code, add the @Credential marker to the sensitive data. For example: Your password and token (or any other sensitive data that you need to mask) can no longer be exposed by mistake.
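The first effect of the marker (hiding the value in a user interface) can be imitated in plain Java. The sketch below is not how Studio or Talend Cloud process the marker: Option and Credential are local stand-in annotations declared so that the example compiles without the framework, and RestConfiguration and display are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

class CredentialMaskingSketch {

    // Local stand-ins for the framework markers, declared only to keep the sketch self-contained.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Option {
    }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Credential {
    }

    // Hypothetical REST client configuration: the password and token are marked as sensitive.
    static class RestConfiguration {
        @Option
        private final String username;

        @Option
        @Credential
        private final String password;

        @Option
        @Credential
        private final String token;

        RestConfiguration(final String username, final String password, final String token) {
            this.username = username;
            this.password = password;
            this.token = token;
        }
    }

    // Builds what a form could display: values of fields marked as credentials are replaced by a mask.
    static Map<String, String> display(final Object configuration) {
        final Map<String, String> view = new TreeMap<>();
        for (final Field field : configuration.getClass().getDeclaredFields()) {
            if (!field.isAnnotationPresent(Option.class)) {
                continue;
            }
            field.setAccessible(true);
            try {
                final Object value = field.get(configuration);
                view.put(field.getName(),
                        field.isAnnotationPresent(Credential.class) ? "******" : String.valueOf(value));
            } catch (final IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return view;
    }
}
```

The point of a marker is that the decision is made once, on the configuration model, and every consumer of the model (form rendering, storage) can apply the same rule.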

Setting up your environment Learn about the prerequisites and tools you need to install to develop components using Talend Component Kit install installation setup requirements tool

Before being able to develop components using Talend Component Kit, you need the right system configuration and tools. Although Talend Component Kit comes with some embedded tools, such as Maven and Gradle wrappers, you still need to prepare your system. A Talend Component Kit plugin for IntelliJ is also available and lets you design and generate your component project right from IntelliJ. System requirements Installing the IntelliJ plugin

Generating a project using the Component Kit Starter Learn how to define the basic configuration of a component using the Talend Component Kit Starter to start your project tutorial example starter

The Component Kit Starter lets you design your components configuration and generates a ready-to-implement project structure. The Starter is available on the web or as an IntelliJ plugin. This tutorial shows you how to use the Component Kit Starter to generate new components for MySQL databases. Before starting, make sure that you have correctly set up your environment. See this section. When defining a project using the Starter, do not refresh the page to avoid losing your configuration. Before being able to create components, you need to define the general settings of the project: Create a folder on your local machine to store the resource files of the component you want to create. For example, C:/my_components. Open the Starter in the web browser of your choice. Select your build tool. This tutorial uses Maven, but you can select Gradle instead. Add any facet you need. For example, add the Talend Component Kit Testing facet to your project to automatically generate unit tests for the components created in the project. Enter the Component Family of the components you want to develop in the project. This name must be a valid Java name and should preferably be capitalized, for example 'MySQL'. Once you have implemented your components in the Studio, this name is displayed in the Palette to group all of the MySQL-related components you develop, and is also part of your component name. Select the Category of the components you want to create in the current project. As MySQL is a kind of database, select Databases in this tutorial. This Databases category is used and displayed as the parent family of the MySQL group in the Palette of the Studio. Complete the project metadata by entering the Group, Artifact and Package. By default, you can only create processors. If you need to create Input or Output components, select Activate IO.
By doing this: Two new menu entries let you add datasets and datastores to your project, as they are required for input and output components. Input and Output components without a dataset (itself containing a datastore) will not pass the validation step when building the components. Learn more about datasets and datastores in this document. An Input component and an Output component are automatically added to your project and ready to be configured. Components added to the project using Add A Component can now be processors, input or output components. A datastore represents the data needed by an input or output component to connect to a database. When building a component, the validateDataSet validation checks that each input or output (processor without output branch) component uses a dataset and that this dataset has a datastore. You can define one or several datastores if you have selected the Activate IO step. Select Datastore. The list of datastores opens. By default, a datastore is already open but not configured. You can configure it or create a new one using Add new Datastore. Specify the name of the datastore. Modify the default value to a meaningful name for your project. This name must be a valid Java name as it will represent the datastore class in your project. It is a good practice to start it with an uppercase letter. Edit the datastore configuration. Parameter names must be valid Java names. Use lower case as much as possible. A typical configuration includes connection details to a database: url, username, password. Save the datastore configuration. A dataset represents the data coming from or sent to a database and needed by input and output components to operate. The validateDataSet validation checks that each input or output (processor without output branch) component uses a dataset and that this dataset has a datastore. You can define one or several datasets if you have selected the Activate IO step. Select Dataset. The list of datasets opens.
By default, a dataset is already open but not configured. You can configure it or create a new one using the Add new Dataset button. Specify the name of the dataset. Modify the default value to a meaningful name for your project. This name must be a valid Java name as it will represent the dataset class in your project. It is a good practice to start it with an uppercase letter. Edit the dataset configuration. Parameter names must be valid Java names. Use lower case as much as possible. A typical configuration includes details of the data to retrieve: the datastore to use (which contains the connection details to the database), the table name, and the data. Save the dataset configuration. To create an input component, make sure you have selected Activate IO. When clicking Add A Component in the Starter, a new step allows you to define a new component in your project. The intent in this tutorial is to create an input component that connects to a MySQL database, executes a SQL query and gets the result. Choose the component type. Input in this case. Enter the component name. For example, MySQLInput. Click Configuration Model. This button lets you specify the required configuration for the component. By default, a dataset is already specified. For each parameter that you need to add, click the (+) button on the right panel. Enter the parameter name and choose its type, then click the tick button to save the changes. In this tutorial, to be able to execute a SQL query on the input MySQL database, the configuration requires the following parameters: a dataset (which contains the datastore with the connection information) and a timeout parameter. Closing the configuration panel on the right does not delete your configuration. However, refreshing the page resets the configuration. Specify whether the component issues a stream or not. In this tutorial, the MySQL input component created is an ordinary (non streaming) component. In this case, leave the Stream option disabled.
Select the Record Type generated by the component. In this tutorial, select Generic because the component is designed to generate records in the default Record format. You can also select Custom to define a POJO that represents your records. Your input component is now defined. You can add another component or generate and download your project. When clicking Add A Component in the Starter, a new step allows you to define a new component in your project. The intent in this tutorial is to create a simple processor component that receives a record, logs it and returns it as it is. If you did not select Activate IO, all new components you add to the project are processors by default. If you selected Activate IO, you can choose the component type. In this case, to create a Processor component, you have to manually add at least one output. If required, choose the component type: Processor in this case. Enter the component name. For example, RecordLogger, as the processor created in this tutorial logs the records. Specify the Configuration Model of the component. In this tutorial, the component doesn’t need any specific configuration. Skip this step. Define the Input(s) of the component. For each input that you need to define, click Add Input. In this tutorial, only one input is needed to receive the record to log. Click the input name to access its configuration. You can change the name of the input and define its structure using a POJO. If you added several inputs, repeat this step for each one of them. The input in this tutorial is a generic record. Enable the Generic option and click Save. Define the Output(s) of the component. For each output that you need to define, click Add Output. The first output must be named MAIN. In this tutorial, only one generic output is needed to return the received record. Outputs can be configured the same way as inputs (see previous steps). You can define a reject output connection by naming it REJECT.
This naming is used by Talend applications to automatically set the connection type to Reject. Your processor component is now defined. You can add another component or generate and download your project. To create an output component, make sure you have selected Activate IO. When clicking Add A Component in the Starter, a new step allows you to define a new component in your project. The intent in this tutorial is to create an output component that receives a record and inserts it into a MySQL database table. Output components are Processors without any output. In other words, the output is a processor that does not produce any records. Choose the component type. Output in this case. Enter the component name. For example, MySQLOutput. Click Configuration Model. This button lets you specify the required configuration for the component. By default, a dataset is already specified. For each parameter that you need to add, click the (+) button on the right panel. Enter the name and choose the type of the parameter, then click the tick button to save the changes. In this tutorial, to be able to insert a record in the output MySQL database, the configuration requires the following parameters: a dataset (which contains the datastore with the connection information) and a timeout parameter. Closing the configuration panel on the right does not delete your configuration. However, refreshing the page resets the configuration. Define the Input(s) of the component. For each input that you need to define, click Add Input. In this tutorial, only one input is needed. Click the input name to access its configuration. You can change the name of the input and define its structure using a POJO. If you added several inputs, repeat this step for each one of them. The input in this tutorial is a generic record. Enable the Generic option and click Save. Do not create any output because the component does not produce any record.
This is the only difference between an output and a processor component. Your output component is now defined. You can add another component or generate and download your project. Once your project is configured and all the components you need are created, you can generate and download the final project. In this tutorial, the project was configured and three components of different types (input, processor and output) have been defined. Click Finish on the left panel. You are redirected to a page that summarizes the project. On the left panel, you can also see all the components that you added to the project. Generate the project using one of the two options available: Download it locally as a ZIP file using the Download as ZIP button. Create a GitHub repository and push the project to it using the Create on Github button. In this tutorial, the project is downloaded to the local machine as a ZIP file. Once the package is available on your machine, you can compile it using the build tool selected when configuring the project. In the tutorial, Maven is the build tool selected for the project. In the project directory, execute the mvn package command. If you don’t have Maven installed on your machine, you can use the Maven wrapper provided in the generated project, by executing the ./mvnw package command. If you have created a Gradle project, you can compile it using the gradle build command or using the Gradle wrapper: ./gradlew build. The generated project code contains documentation that can guide and help you implement the component logic. Import the project to your favorite IDE to start the implementation. The Component Kit Starter allows you to generate a component development project from an OpenAPI JSON descriptor. Open the Starter in the web browser of your choice. Enable the OpenAPI mode using the toggle in the header. Go to the API menu. Paste the OpenAPI JSON descriptor in the right part of the screen. All the described endpoints are detected.
Unselect the endpoints that you do not want to use in the future components. By default, all detected endpoints are selected. Go to the Finish menu. Download the project. When exploring the project generated from an OpenAPI descriptor, you can notice the following elements: sources the API dataset an HTTP client for the API a connection folder containing the component configuration. By default, the configuration is only made of a simple datastore with a baseUrl parameter.
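The naming rules given earlier in this tutorial (family, datastore, dataset and parameter names must be valid Java names, and class-like names should start with an uppercase letter) can be checked before typing them into the Starter. The sketch below is a hypothetical helper, not part of the Starter: it relies on javax.lang.model.SourceVersion from the JDK to reject reserved words.

```java
import javax.lang.model.SourceVersion;

class JavaNameCheck {

    // True when the value can be used as a Java identifier: legal characters and not a reserved word.
    static boolean isValidJavaName(final String name) {
        if (name == null) {
            return false;
        }
        return SourceVersion.isIdentifier(name) && !SourceVersion.isKeyword(name);
    }

    // True when the value is a valid Java name that also follows the uppercase-first recommendation.
    static boolean isRecommendedClassName(final String name) {
        return isValidJavaName(name) && Character.isUpperCase(name.charAt(0));
    }
}
```

For example, MySQL and MySQLInput pass both checks, url is a valid parameter name but not a recommended class name, and my-sql or class are rejected.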

Version compatibility Learn which version of Talend Component Kit you can use for your components to be compatible with the right version of your Talend applications. versions Studio studio-integration Cloud compatibility

You can integrate and start using components developed using Talend Component Kit in Talend applications very easily. As both the development framework and Talend applications evolve over time, you need to ensure compatibility between the components you develop and the versions of Talend applications that you are targeting, by making sure that you use the right version of Talend Component Kit.

The version of Talend Component Kit you need to use to develop new components depends on the versions of the Talend applications in which these components will be integrated.

Talend product                      Talend Component Kit version
Talend Studio 8.8.8 (aka master)    latest release
Talend Studio 8.0.1                 latest release QA approved
Talend Studio 7.3.1                 Framework until 1.38.x
Talend Studio 7.2.1                 Framework until 1.1.10
Talend Studio 7.1.1                 Framework until 1.1.1
Talend Studio 7.0.1                 Framework until 0.0.5
Talend Cloud                        latest release QA and cloud teams approved

More recent versions of Talend Component Kit contain many fixes, improvements and features that help you develop your components. However, they can cause compatibility issues when you deploy these components to older or different versions of Talend Studio and Talend Cloud. Choose the version of Talend Component Kit that best fits your needs.

Creating a project using the Component Kit Starter always uses the latest release of Talend Component Kit. However, you can manually change the version of Talend Component Kit directly in the generated project.
1. Go to your IDE and access the pom.xml file at the root of the project.
2. Look for the org.talend.sdk.component dependency nodes.
3. Replace the version in the relevant nodes with the version that you need to use for your project.

You can use a snapshot of the version under development by using the -SNAPSHOT version and the Sonatype snapshot repository.

Iterating on component development with Talend Studio How to install and configure components developed with Talend Component Kit in Talend Open Studio component server deploy install studio studio-integration car car-bundler version component-server debug

Integrate components you developed using Talend Component Kit to Talend Studio in a few steps. Also learn how to enable the developer and debugging modes to iterate on your component development.

The version of Talend Component Kit you need to use to develop new components depends on the version of Talend Studio in which components will be integrated. Refer to this document to learn about compatibility between Talend Component Kit and the different versions of Talend applications.

Learn how to build and deploy components to Talend Studio using the Maven or Gradle Talend Component Kit plugins. This can be done using the deploy-in-studio goal from your development environment. If you are unfamiliar with component development, you can also follow this example to go through the entire process, from creating a project to using your new component in Talend Studio.

The Studio integration relies on the Component Server, which the Studio uses to gather data about components created using Talend Component Kit. You can change the default configuration of the component server by modifying the $STUDIO_HOME/configuration/config.ini file. The following parameters are available:

component.environment (default: -)
    Enables the developer mode when set to dev.
component.debounce.timeout (default: 750)
    Specifies the timeout (in milliseconds) before calling listeners in component text fields.
component.kit.skip (default: false)
    If set to true, the plugin is not enabled. It is useful if you don't have any component developed with the framework.
component.java.arguments (default: -)
    Component server additional options.
component.java.m2 (default: the global Studio configuration)
    Maven repository that the server uses to resolve components.
component.java.coordinates (default: -)
    A list of comma-separated GAV (groupId:artifactId:version) of components to register.
component.java.registry (default: -)
    A properties file with values matching component GAV (groupId:artifactId:version) registered at startup. Only use slashes (even on Windows) in the path.
component.java.port (default: random)
    Sets the port to use for the server.
component.server.extensions (default: -)
    A comma-separated list of GAV used to locate the extensions.
components.server.beam.active (default: false)
    Activates Beam support (experimental) when set to true. It requires Beam SDK Java core dependencies to be available.
component.server.jul.forceConsole (default: false)
    Adds a console handler to JUL to see logs in the console. This can be helpful in development because the formatting is clearer than the OSGi one in workspace/.metadata/.log. It uses the java.util.logging.SimpleFormatter.format property to define its format. By default, it is %1$tb %1$td, %1$tY %1$tl:%1$tM:%1$tS %1$Tp %2$s%n%4$s: %5$s%6$s%n, but for development purposes [%4$s] %5$s%6$s%n is simpler and more readable.

The developer mode is especially useful to iterate on your component development and to avoid closing and restarting Talend Studio every time you make a change to a component. It adds a Talend Component Kit button in the main toolbar. When clicking this button, all components developed with the Talend Component Kit framework are reloaded. The cache is invalidated and the components refreshed. You still need to add and remove the components to see the changes. To enable it, simply set the component.environment parameter to dev in the config.ini configuration file of the component server.

Several methods allow you to debug custom components created with Talend Component Kit in Talend Studio.

To debug from the JVM of a Job:
1. From your development tool, create a new Remote configuration, and copy the Command line arguments for running remote JVM field. For example, -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005, where:
   - the suspend parameter of the -agentlib argument specifies whether you want to suspend the debugged JVM until the debugger attaches to it. Possible values are n (no, default value) or y (yes).
   - the address parameter of the -agentlib argument is the port used for the remote configuration. Make sure this port is available.
2. Open Talend Studio.
3. Create a new Job that uses the component you want to debug or open an existing one that already uses it.
4. Go to the Run tab of the Job and select Use specific JVM arguments.
5. Click New to add an argument.
6. In the popup window, paste the arguments copied from the IDE.
7. Enter the corresponding debug mode:
   - To debug the runtime, run the Job and access the remote host configured in the IDE.
   - To debug the Guess schema option, click the Guess schema action button of the component and access the remote host configured in the IDE.

To debug from the JVM of Talend Studio itself:
1. From your development tool, create a new Remote configuration, and copy the Command line arguments for running remote JVM field. For example, -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005, where:
   - suspend specifies whether the debugged JVM is suspended until the debugger attaches to it. Possible values are n (no, default value) or y (yes).
   - address is the port used for the remote configuration. Make sure this port is available.
2. Access the installation directory of your Talend Studio.
3. Open the .ini file corresponding to your operating system. For example, TOS_DI-win-x86_64.ini.
4. Paste the arguments copied from the IDE in a new line of the file.
5. Go to Talend Studio to use the component, and access the host configured in the IDE.

If you run multiple Studio instances automatically in parallel, for example on a CI platform, you can run into issues with the random port computation. For that purpose, you can create the $HOME/.talend/locks/org.talend.sdk.component.studio-integration.lock file. Then, when a server starts, it acquires a lock on that file and prevents another server from getting a port until it is started. This ensures that two concurrent processes cannot get the same port allocated. However, this conflict is highly unlikely to happen on a desktop. In that case, forcing a different value through component.java.port in your config.ini file is a better solution for local installations.
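The lock-file mechanism described above can be illustrated with a minimal sketch in plain Java. It is not the Studio implementation: the class and method names are hypothetical, and the demo uses a temporary lock file rather than the real one under $HOME/.talend/locks. It only shows the general idea of binding a random port while holding an exclusive file lock.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LockedPortAllocation {

    /** Binds a server socket to a free port while holding an exclusive lock on lockFile. */
    static ServerSocket openServerUnderLock(Path lockFile) throws IOException {
        Files.createDirectories(lockFile.toAbsolutePath().getParent());
        try (FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) { // blocks while another process holds the lock
            // Port 0 asks the operating system for any free port.
            // The socket is already bound when the lock is released on return.
            return new ServerSocket(0);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical lock file location, used for the demo only.
        Path lockFile = Files.createTempDirectory("locks").resolve("demo.lock");
        try (ServerSocket first = openServerUnderLock(lockFile);
             ServerSocket second = openServerUnderLock(lockFile)) {
            System.out.println("distinct ports: " + (first.getLocalPort() != second.getLocalPort()));
        }
    }
}
```

Because each process binds its socket before releasing the lock, a second process that was waiting on the lock cannot be handed the same port.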

Defining a processor How to develop a processor component with Talend Component Kit component type processor output

A Processor is a component that converts incoming data to a different model. A processor must have a method decorated with @ElementListener that takes incoming data and returns the processed data. Processors must be Serializable because they are distributed components.

If you just need to access data on a map-based ruleset, you can use Record or JsonObject as parameter type. From there, Talend Component Kit wraps the data to allow you to access it as a map. The parameter type is not enforced. This means that if you know you will get a SuperCustomDto, then you can use it as parameter type. But for generic components that are reusable in any chain, it is highly encouraged to use Record, unless you have an evaluation language-based processor that has its own way to access components.

A processor also supports @BeforeGroup and @AfterGroup methods, which must not have any parameter and must return void. Any other result would be ignored. These methods are used by the runtime to mark a chunk of the data in a way which is estimated good for the execution flow size. Because the size is estimated, the size of a group can vary. It is even possible to have groups of size 1. It is recommended to batch records, for performance reasons.

You can optimize the data batch processing by using the maxBatchSize parameter. This parameter is automatically implemented on the component when it is deployed to a Talend application. Only the logic needs to be implemented. You can however customize its value in your LocalConfiguration by setting the _maxBatchSize.value property for the family, or the ${component simple class name}._maxBatchSize.value property for a particular component. Otherwise, it defaults to 1000. If you replace value by active, you can also configure whether this feature is enabled or not. This is useful when you don't want to use it at all. Learn how to implement chunking/bulking in this document.

In some cases, you may need to split the output of a processor in two or more connections. A common example is to have "main" and "reject" output connections where part of the incoming data is passed to a specific bucket and processed later. Talend Component Kit supports two types of output connections: Flow and Reject. Flow is the main and standard output connection. The Reject connection handles records rejected during the processing. A component can only have one reject connection, if any. Its name must be REJECT to be processed correctly in Talend applications. You can also define the different output connections of your component in the Starter.

To define an output connection, you can use @Output as a replacement for the returned value in the @ElementListener. Alternatively, you can pass a string that represents the new branch. Having multiple inputs is similar to having multiple outputs, except that an OutputEmitter wrapper is not needed. @Input takes the input name as parameter. If no name is set, it defaults to the "main (default)" input branch. It is recommended to use the default branch when possible and to avoid naming branches according to the component semantic.

Batch processing refers to the way execution environments process batches of data handled by a component using a grouping mechanism. By default, the execution environment of a component automatically decides how to process groups of records and estimates an optimal group size depending on the system capacity. With this default behavior, the size of each group could sometimes be optimized for the system to handle the load more effectively or to match business requirements. For example, real-time or near real-time processing needs often imply processing smaller batches of data, but more often. On the other hand, a one-time processing without business constraints is more effectively handled with a batch size based on the system capacity.

Final users of a component developed with the Talend Component Kit that integrates the batch processing logic described in this document can override this automatic size. To do that, a maxBatchSize option is available in the component settings and lets them set the maximum size of each group of data to process. A component processes batch data as follows:
- Case 1: No maxBatchSize is specified in the component configuration. The execution environment estimates a group size of 4. Records are processed by groups of 4.
- Case 2: The runtime estimates a group size of 4 but a maxBatchSize of 3 is specified in the component configuration. The system adapts the group size to 3. Records are processed by groups of 3.

Batch processing relies on the sequence of three methods, @BeforeGroup, @ElementListener and @AfterGroup, that you can customize to your needs as a component developer. The group size estimation logic is automatically implemented when a component is deployed to a Talend application. Each group is processed as follows until there is no record left:
1. The @BeforeGroup method resets a record buffer at the beginning of each group.
2. The records of the group are assessed one by one and placed in the buffer as follows: the @ElementListener method tests if the buffer size is greater than or equal to the defined maxBatchSize. If it is, the records are processed. If not, then the current record is buffered.
3. The previous step happens for all records of the group. Then the @AfterGroup method tests if the buffer is empty.

You can define this logic in the processor class. A condensed syntax is also available for this kind of processor. When writing tests for components, you can force the maxBatchSize parameter value by setting it with the following syntax: .$maxBatchSize=10. You can learn more about processors in this document.

For output components (not emitting any data) that use bulking, you can pass the list of records to the after group method.
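The main/reject split described above can be sketched in plain Java. This is only an illustration of the routing idea: the Emitter interface below is a hypothetical stand-in for the framework's OutputEmitter wrapper, records are modeled as simple maps, and the rule "a record without an id is rejected" is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RejectRoutingSketch {

    /** Hypothetical stand-in for the output emitter wrapper of the framework. */
    interface Emitter<T> {
        void emit(T value);
    }

    /** Element-level logic: records with an id go to the main flow, the others to REJECT. */
    static void onElement(Map<String, Object> record,
                          Emitter<Map<String, Object>> main,
                          Emitter<Map<String, Object>> reject) {
        if (record.get("id") == null) {
            reject.emit(record);
        } else {
            main.emit(record);
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> mainFlow = new ArrayList<>();
        List<Map<String, Object>> rejectFlow = new ArrayList<>();

        Map<String, Object> valid = Map.of("id", 1, "name", "a");
        Map<String, Object> invalid = Map.of("name", "b");

        // Each list plays the role of one output connection.
        onElement(valid, mainFlow::add, rejectFlow::add);
        onElement(invalid, mainFlow::add, rejectFlow::add);

        System.out.println("main=" + mainFlow.size() + " reject=" + rejectFlow.size());
        // prints "main=1 reject=1"
    }
}
```

In a real component, the two emitters would be method parameters annotated with @Output, and the reject branch would be named REJECT as required above.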

Defining a processor or an output component logic How to develop an output component with Talend Component Kit output processor

Processors and output components are the components in charge of reading, processing and transforming data in a Talend job, as well as passing it to its required destination. Before implementing the component logic and defining its layout and configurable fields, make sure you have specified its basic metadata, as detailed in this document.

A Processor is a component that converts incoming data to a different model. A processor must have a method decorated with @ElementListener taking an incoming data and returning the processed data. Processors must be Serializable because they are distributed components.

An Output is a Processor that does not return any data. Conceptually, an output is a data listener. It matches the concept of processor. Being the last component of the execution chain or returning no data makes your processor an output component.

Currently, Talend Component Kit does not allow you to define a Combiner. A combiner is the symmetric part of a partition mapper. It aggregates results in a single partition.

Registering components How to define component and component family metadata icon component version component name family category metadata

Before implementing a component logic and configuration, you need to specify the family and the category it belongs to, the component type and name, as well as a few other generic parameters. This set of metadata, and more particularly the family, categories and component type, is mandatory to recognize and load the component to Talend Studio or Cloud applications. Some of these parameters are handled at the project generation using the starter, but can still be accessed and updated later on.

The family and category of a component are automatically written in the package-info.java file of the component package, using the @Components annotation. By default, these parameters are already configured in this file when you import your project in your IDE. Their values correspond to what was defined during the project definition with the starter. Multiple components can share the same family and category value, but the family + name pair must be unique for the system. A component can belong to one family only and to one or several categories. If not specified, the category defaults to Misc. The package-info.java file also defines the component family icon, which is different from the component icon. You can learn how to customize this icon in this section.

Components can require metadata to be integrated in Talend Studio or Cloud platforms. Metadata is set on the component class and belongs to the org.talend.sdk.component.api.component package. When you generate your project and import it in your IDE, icon and version both come with a default value.
- @Icon: Sets an icon key used to represent the component. You can use a custom key with the custom() method but the icon may not be rendered properly. The icon defaults to Check. Replace it with a custom icon, as described in this section.
- @Version: Sets the component version. 1 by default. Learn how to manage different versions and migrations between your component versions in this section.

Every component family and component needs to have a representative icon. You have to define a custom icon as follows:
- For the component family, the icon is defined in the package-info.java file.
- For the component itself, you need to declare the icon in the component class.

Custom icons must comply with the following requirements:
- Icons must be stored in the src/main/resources/icons folder of the project.
- Icon file names need to match one of the following patterns: IconName.svg or IconName_icon32.png. The latter will run in degraded mode in Talend Cloud. Replace IconName by the name of your choice.
- Icons must be squared, even for the SVG format.

Note that SVG icons are not supported by Talend Studio and can cause the deployment of the component to fail. If you aim at deploying a custom component to Talend Studio, specify PNG icons or use the Maven (or Gradle) svg2png plugin to convert SVG icons to PNG. If you want finer control over both images, you can provide both in your component. Ultimately, you can also remove SVG parameters from the talend.component.server.icon.paths property in the HTTP server configuration.

For any purpose, you can also add user-defined metadata to your component with the @Metadatas annotation. You can also use an SPI implementing org.talend.sdk.component.spi.component.ComponentMetadataEnricher.
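The icon naming requirements listed above can be checked with a small helper before building a component. The following sketch is plain Java and is not part of Talend Component Kit: the class name, the method names and the set of characters accepted for IconName are assumptions made for this example.

```java
import java.util.regex.Pattern;

public class IconNameCheck {

    // Accepts "IconName.svg" or "IconName_icon32.png".
    // The characters allowed in IconName are an assumption of this sketch.
    private static final Pattern ICON_FILE =
            Pattern.compile("[A-Za-z0-9_-]+(\\.svg|_icon32\\.png)");

    /** Returns true when the file name matches one of the two documented patterns. */
    static boolean isValidIconFileName(String fileName) {
        return ICON_FILE.matcher(fileName).matches();
    }

    /** Returns true for the PNG variant, the one to prefer when targeting Talend Studio. */
    static boolean isPngIcon(String fileName) {
        return isValidIconFileName(fileName) && fileName.endsWith("_icon32.png");
    }

    public static void main(String[] args) {
        System.out.println(isValidIconFileName("logo.svg"));        // true
        System.out.println(isValidIconFileName("logo_icon32.png")); // true
        System.out.println(isValidIconFileName("logo.png"));        // false
        System.out.println(isPngIcon("logo.svg"));                  // false
    }
}
```

Such a check could run in a unit test over the files found in src/main/resources/icons.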

Implementing batch processing Optimize the way your processor component handle records using groups bulk bulking chunk group maxBatchSize bulking batch

Batch processing refers to the way execution environments process batches of data handled by a component using a grouping mechanism. By default, the execution environment of a component automatically decides how to process groups of records and estimates an optimal group size depending on the system capacity. With this default behavior, the size of each group could sometimes be optimized for the system to handle the load more effectively or to match business requirements. For example, real-time or near real-time processing needs often imply processing smaller batches of data, but more often. On the other hand, a one-time processing without business constraints is more effectively handled with a batch size based on the system capacity.

Final users of a component developed with the Talend Component Kit that integrates the batch processing logic described in this document can override this automatic size. To do that, a maxBatchSize option is available in the component settings and lets them set the maximum size of each group of data to process. A component processes batch data as follows:
- Case 1: No maxBatchSize is specified in the component configuration. The execution environment estimates a group size of 4. Records are processed by groups of 4.
- Case 2: The runtime estimates a group size of 4 but a maxBatchSize of 3 is specified in the component configuration. The system adapts the group size to 3. Records are processed by groups of 3.

Batch processing relies on the sequence of three methods, @BeforeGroup, @ElementListener and @AfterGroup, that you can customize to your needs as a component developer. The group size estimation logic is automatically implemented when a component is deployed to a Talend application. Each group is processed as follows until there is no record left:
1. The @BeforeGroup method resets a record buffer at the beginning of each group.
2. The records of the group are assessed one by one and placed in the buffer as follows: the @ElementListener method tests if the buffer size is greater than or equal to the defined maxBatchSize. If it is, the records are processed. If not, then the current record is buffered.
3. The previous step happens for all records of the group. Then the @AfterGroup method tests if the buffer is empty.

You can define this logic in the processor class. A condensed syntax is also available for this kind of processor. When writing tests for components, you can force the maxBatchSize parameter value by setting it with the following syntax: .$maxBatchSize=10. You can learn more about processors in this document.
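The buffering sequence described above can be sketched in plain Java, without the framework annotations. The class below is hypothetical: beforeGroup, onElement and afterGroup stand in for the methods that would carry @BeforeGroup, @ElementListener and @AfterGroup, records are plain strings, and a small loop plays the role of the runtime. As an assumption of this sketch, a record that arrives when the buffer is full is buffered right after the flush, so that no record is lost.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchBufferSketch {
    private final int maxBatchSize;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>();

    BatchBufferSketch(int maxBatchSize) {
        this.maxBatchSize = maxBatchSize;
    }

    /** Stand-in for the @BeforeGroup method: resets the buffer. */
    void beforeGroup() {
        buffer.clear();
    }

    /** Stand-in for the @ElementListener method: flushes a full buffer, then buffers the record. */
    void onElement(String record) {
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
        buffer.add(record);
    }

    /** Stand-in for the @AfterGroup method: flushes whatever is left in the buffer. */
    void afterGroup() {
        if (!buffer.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        flushed.add(new ArrayList<>(buffer)); // "process" the batch by recording it
        buffer.clear();
    }

    /** Simulates the runtime calling the three methods for each group it has estimated. */
    static List<List<String>> run(int maxBatchSize, List<List<String>> groups) {
        BatchBufferSketch processor = new BatchBufferSketch(maxBatchSize);
        for (List<String> group : groups) {
            processor.beforeGroup();
            group.forEach(processor::onElement);
            processor.afterGroup();
        }
        return processor.flushed;
    }

    public static void main(String[] args) {
        List<List<String>> groups = List.of(
                List.of("r1", "r2", "r3", "r4"),
                List.of("r5", "r6"));
        System.out.println(run(3, groups));
        // prints [[r1, r2, r3], [r4], [r5, r6]]
    }
}
```

With a simulated group of 4 records and a maxBatchSize of 3, no processed batch holds more than 3 records, and the remainder of each group is handled when the group ends.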

Creating components for a REST API Example of REST API component implementation with Talend Component Kit tutorial example zendesk

This tutorial shows how to create components that consume a REST API. The component developed as an example in this tutorial is an input component that provides a search functionality for Zendesk using its Search API. Lombok is used to avoid writing getter, setter and constructor methods. You can generate a project using the Talend Component Kit Starter, as described in this tutorial.

The input component relies on the Zendesk Search API and requires an HTTP client to consume it. The Zendesk Search API takes the following parameters on the /api/v2/search.json endpoint:
- query: The search query.
- sort_by: The sorting type of the query result. Possible values are updated_at, created_at, priority, status, ticket_type, or relevance. It defaults to relevance.
- sort_order: The sorting order of the query result. Possible values are asc (for ascending) or desc (for descending). It defaults to desc.

Talend Component Kit provides a built-in service to create an easy-to-use HTTP client in a declarative manner, using Java annotations. No additional implementation is needed for the client interface, as the implementation is provided by the component framework according to the annotated interface definition. This HTTP client can be injected into a mapper or a processor to perform HTTP requests.

This example uses the basic authentication that is supported by the API. The first step is to set up the configuration for the basic authentication. To be able to consume the Search API, the Zendesk instance URL, the username and the password are needed. The data store is now configured. It provides a basic authentication token.

Once the data store is configured, you can define the dataset by configuring the search query. It is that query that defines the records processed by the input component. Your component is configured. You can now create the component logic.

Mappers defined in this tutorial don't implement the split part because HTTP calls are not split across many workers in this case. Once the component logic is implemented, you can create the source in charge of performing the HTTP request to the search API and converting the result to JsonObject records.

You have now created a simple Talend component that consumes a REST API. To learn how to test this component, refer to this tutorial.
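To make the request built by this tutorial more concrete, here is a minimal sketch that uses only the JDK, not the declarative HTTP client of the framework. It assembles the search URL from the three documented parameters and computes a basic authentication header from a username and password. The class name, the method names and the example.zendesk.com instance URL are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ZendeskSearchRequestSketch {

    /** Builds the value of the Authorization header for basic authentication. */
    static String basicAuthHeader(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    /** Builds the search URI; only the free-text query needs URL encoding here. */
    static URI searchUri(String baseUrl, String query, String sortBy, String sortOrder) {
        String encodedQuery = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/v2/search.json"
                + "?query=" + encodedQuery
                + "&sort_by=" + sortBy
                + "&sort_order=" + sortOrder);
    }

    public static void main(String[] args) {
        // Hypothetical instance URL and credentials, for illustration only.
        URI uri = searchUri("https://example.zendesk.com",
                "type:ticket status:open", "created_at", "desc");
        System.out.println(uri);
        System.out.println(basicAuthHeader("user", "pass")); // Basic dXNlcjpwYXNz
    }
}
```

In the component itself, the base URL and credentials would come from the data store configuration, and the query and sorting options from the dataset.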

Implementing components Get an overview of the main steps to code the logic of your custom Talend Component Kit components create code class logic layout configuration dev overview api

Once you have generated a project, you can start implementing the logic and layout of your components and iterate on it. Depending on the type of component you want to create, the logic implementation can differ. However, the layout and component metadata are defined the same way for all types of components in your project. The main steps are:
- Defining family and component metadata
- Defining an input component logic
- Defining a processor/output logic
- Defining a standalone component logic
- Defining component layout and configuration

In some cases, you will require specific implementations to handle more advanced cases, such as:
- Internationalizing a component
- Managing component versions
- Masking sensitive data
- Implementing batch processing
- Implementing streaming on a component

You can also make certain configurations reusable across your project by defining services.

Using your Java IDE along with a build tool supported by the framework, you can then compile your components to test and deploy them to Talend Studio or other Talend applications:
- Building components with Maven
- Building components with Gradle
- Wrapping a Beam I/O

In any case, follow these best practices to ensure the components you develop are optimized. You can also learn more about component loading and plugins here: Loading a component

Installing components using a CAR file How to build a component archive that you can easily share and how to install the shared .car file in Talend Studio. deploy install car .car car-bundler component archive studio-integration

Components built using Talend Component Kit can be shared as component archives (.car). These CAR files are executable and make it easy to deploy the components they contain to any compatible version of Talend Studio. Component developers can generate .car files from their projects to share their components and make them available for other users, as detailed in this document. This document assumes that you have a component archive (.car) file and need to deploy it to Talend Studio. The component archive (.car) is executable and exposes the studio-deploy command which takes a Talend Studio home path as parameter. When executed, it installs the dependencies into the Studio and registers the component in your instance. For example: You can also upload the dependencies to your Nexus server using the following command: In this command, Nexus URL and repository name are mandatory arguments. All other arguments are optional. If arguments contain spaces or special symbols, you need to quote the whole value of the argument. For example: Talend Studio allows you to share components you have created using Talend Component Kit with other users working on the same remote project. Remote projects are available with Enterprise versions of Talend Studio only. Also, note that this feature has been removed from Studio since the 7.3 release. Make sure you are connected to a remote project and the artifact repository for component sharing has been properly configured. Open the Project Settings dialog box, either from the toolbar of the Studio main window or by clicking File > Edit Project Properties from the menu bar. In the tree view of the dialog box, select Repository Share to open the corresponding view. Select the Propagate components update to Artifact Repository check box. In the Repository ID field, specify the artifact repository configured for component sharing, and then click Check connection to verify the connectivity. 
Click Apply and Close to validate the settings and close the dialog box. Create a folder named patches at the root of your Talend Studio installation directory, then copy the .car files of the components you want to share to this folder. Restart your Talend Studio and connect to the remote project. The components are deployed automatically to the repository and available in the Palette for other users when connected to a remote project with the same sharing repository configuration. My custom component builds correctly but does not appear in Talend Studio. How can I fix it? This issue can be caused by the icon specified in the component metadata. Make sure to specify a custom icon for the component and the component family. These custom icons must be in PNG format to be properly handled by Talend Studio. Remove SVG parameters from the talend.component.server.icon.paths property in the HTTP server configuration. Refer to this section. Learn more about defining custom icons for components in this document.
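Because a .car file is an executable archive that exposes the studio-deploy command, the deployment can also be scripted. The following sketch assembles the command as a list of arguments, assuming the archive is launched with java -jar. The class name and the file paths in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Talend Component Kit.
final class CarStudioDeployCommand {

    // Returns the arguments of: java -jar <carFile> studio-deploy <studioHome>
    static List<String> build(String carFile, String studioHome) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-jar");
        command.add(carFile);         // path to the component archive
        command.add("studio-deploy"); // command exposed by the archive
        command.add(studioHome);      // Talend Studio home path
        return command;
    }
}
```

The list can be passed to new ProcessBuilder(CarStudioDeployCommand.build("my-component.car", "C:/Program Files/TalendStudio")).inheritIO().start(). Since each argument is handed over separately, a path containing spaces needs no extra quoting in this form, unlike on a shell command line.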

Knowledge base Read advanced articles about Talend Component Kit advanced article

Learn about the latest news or go deeper in the framework with the following articles. Changelog Creating plugins Running Component Kit components on a Remote Engine

Creating your first component Create your first component using Talend Component Kit and integrate it to Talend Open Studio to build a job first start Studio studio-integration integration palette

This tutorial walks you through the most common iteration steps to create a component with Talend Component Kit and to deploy it to Talend Open Studio. The component created in this tutorial is a simple processor that reads data coming from the previous component in a job or pipeline and displays it in the console logs of the application, along with an additional information entered by the final user. The component designed in this tutorial is a processor and does not require nor show any datastore and dataset configuration. Datasets and datastores are required only for input and output components. To get your development environment ready and be able to follow this tutorial: Download and install a Java JDK 1.8 or greater. Download and install Talend Open Studio. For example, from Sourceforge. Download and install IntelliJ. Download the Talend Component Kit plugin for IntelliJ. The detailed installation steps for the plugin are available in this document. The first step in this tutorial is to generate a component skeleton using the Starter embedded in the Talend Component Kit plugin for IntelliJ. Start IntelliJ and create a new project. In the available options, you should see Talend Component. Make sure that a Project SDK is selected. Then, select Talend Component and click Next. The Talend Component Kit Starter opens. Enter the component and project metadata. Change the default values, for example as presented in the screenshot below: The Component Family and the Category will be used later in Talend Open Studio to find the new component. Project metadata is mostly used to identify the project structure. A common practice is to replace 'company' in the default value by a value of your own, like your domain name. Once the metadata is filled, select Add a component. A new screen is displayed in the Talend Component Kit Starter that lets you define the generic configuration of the component. By default, new components are processors. 
Enter a valid Java name for the component. For example, Logger. Select Configuration Model and add a string type field named level. This input field will be used in the component configuration for final users to enter additional information to display in the logs. In the Input(s) / Output(s) section, click the default MAIN input branch to access its detail, and make sure that the record model is set to Generic. Leave the Name of the branch with its default MAIN value. Repeat the same step for the default MAIN output branch. Because the component is a processor, it has an output branch by default. A processor without any output branch is considered an output component. You can create output components when the Activate IO option is selected. Click Next and check the name and location of the project, then click Finish to generate the project in the IDE. At this point, your component is technically already ready to be compiled and deployed to Talend Open Studio. But first, take a look at the generated project: Two classes based on the name and type of component defined in the Talend Component Kit Starter have been generated: LoggerProcessor is where the component logic is defined LoggerProcessorConfiguration is where the component layout and configurable fields are defined, including the level string field that was defined earlier in the configuration model of the component. The package-info.java file contains the component metadata defined in the Talend Component Kit Starter, such as family and category. You can notice as well that the elements in the tree structure are named after the project metadata defined in the Talend Component Kit Starter. These files are the starting point if you later need to edit the configuration, logic, and metadata of the component. There is more that you can do and configure with the Talend Component Kit Starter. This tutorial covers only the basics. You can find more information in this document. 
Without modifying the component code generated from the Starter, you can compile the project and deploy the component to a local instance of Talend Open Studio. The logic of the component is not yet implemented at that stage. Only the configurable part specified in the Starter will be visible. This step is useful to confirm that the basic configuration of the component renders correctly. Before starting to run any command, make sure that Talend Open Studio is not running. From the component project in IntelliJ, open a Terminal and make sure that the selected directory is the root of the project. All commands shown in this tutorial are performed from this location. Compile the project by running the following command: mvnw clean install. The mvnw command refers to the Maven wrapper that is embedded in Talend Component Kit. It allows to use the right version of Maven for your project without having to install it manually beforehand. An equivalent wrapper is available for Gradle. Once the command is executed and you see BUILD SUCCESS in the terminal, deploy the component to your local instance of Talend Open Studio using the following command: mvnw talend-component:deploy-in-studio -Dtalend.component.studioHome="". Replace the path with your own value. If the path contains spaces (for example, Program Files), enclose it with double quotes. Make sure the build is successful. Open Talend Open Studio and create a new Job: Find the new component by looking for the family and category specified in the Talend Component Kit Starter. You can add it to your job and open its settings. Notice that the level field specified in the configuration model of the component in the Talend Component Kit Starter is present. At this point, the new component is available in Talend Open Studio, and its configurable part is already set. But the component logic is still to be defined. 
You can now edit the component to implement its logic: reading the data coming through the input branch to display that data in the execution logs of the job. The value of the level field that final users can fill also needs to be changed to uppercase and displayed in the logs. Save the job created earlier and close Talend Open Studio. Go back to the component development project in IntelliJ and open the LoggerProcessor class. This is the class where the component logic can be defined. Look for the @ElementListener method. It is already present and references the default input branch that was defined in the Talend Component Kit Starter, but it is not complete yet. To log the input data to the console, add the following lines: The @ElementListener method now looks as follows: Open a Terminal again to compile the project and deploy the component again. To do that, run the following two commands in succession: mvnw clean install, then mvnw talend-component:deploy-in-studio -Dtalend.component.studioHome="". The updated component logic should now be deployed. After restarting Talend Open Studio, you will be ready to build a job and use the component for the first time. To learn the different possibilities and methods available to develop more complex logic, refer to this document. If you want to avoid having to close and re-open Talend Open Studio every time you need to make an edit, you can enable the developer mode, as explained in this document. As the component is now ready to be used, it is time to create a job and check that it behaves as intended. Open Talend Open Studio again and go to the job created earlier. The new component is still there. Add a tRowGenerator component and connect it to the logger. Double-click the tRowGenerator to specify the data to generate: Add a first column named firstName and select the TalendDataGenerator.getFirstName() function. 
Add a second column named 'lastName' and select the TalendDataGenerator.getLastName() function. Set the Number of Rows for RowGenerator to 10. Validate the tRowGenerator configuration. Open the TutorialFamilyLogger component and set the level field to info. Go to the Run tab of the job and run the job. The job is executed. You can observe in the console that each of the 10 generated rows is logged, and that the info value entered in the logger is also displayed with each record, in uppercase.
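The logic described in this tutorial, changing the level value to uppercase and displaying it with each record, can be sketched in plain Java as follows. This is not the generated LoggerProcessor code: the class name LoggerLineFormatter and the bracketed output format are assumptions made for illustration. An @ElementListener method could print the returned line for each incoming record.

```java
import java.util.Locale;

// Hypothetical helper illustrating the logger logic, not the generated class.
final class LoggerLineFormatter {

    // Returns the line to print: the level in uppercase, then the record.
    static String format(String level, Object record) {
        // Locale.ROOT keeps the uppercase conversion independent of the JVM locale.
        String prefix = level == null ? "" : level.toUpperCase(Locale.ROOT);
        return "[" + prefix + "] " + record;
    }
}
```

With the level field set to info, LoggerLineFormatter.format("info", record) yields a line starting with [INFO], which matches the uppercase value observed in the console at the end of the tutorial.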

Record types How to model data processed or emitted by components. record pojo builder factory types schema discover jsonObject json record-schema

Components are designed to manipulate data (access, read, create). Talend Component Kit can handle several types of data, described in this document. By design, the framework must run in DI (plain standalone Java program) and in Beam pipelines. It is out of scope of the framework to handle the way the runtime serializes - if needed - the data. For that reason, it is critical not to import serialization constraints to the stack. As an example, this is one of the reasons why Record or JsonObject were preferred to Avro IndexedRecord. Any serialization concern should either be hidden in the framework runtime (outside of the component developer scope) or in the runtime integration with the framework (for example, Beam integration). Record is the default format. It offers many possibilities and can evolve depending on the Talend platform needs. Its structure is data-driven and exposes a schema that lets you browse it. Projects generated from the Talend Component Kit Starter are by default designed to handle this format of data. Record is a Java interface, but to ensure compatibility with the different Talend products, never implement it yourself. Follow the guidelines below. You can build records using the newRecordBuilder method of the RecordBuilderFactory (see here). For example: In the example above, the schema is dynamically computed from the data. You can also do it using a pre-built schema, as follows: The example above uses a schema that was pre-built using factory.newSchemaBuilder(Schema.Type.RECORD). When using a pre-built schema, the entries passed to the record builder are validated. This means that record creation fails if you pass a null value or an entry type that does not match the provided schema. It also fails if you try to add an entry that does not exist in the schema or if you do not set a non-nullable entry. Using a dynamic schema can be useful on the backend but can lead users to more issues when creating a pipeline to process the data. 
Using a pre-built schema is more reliable for end-users. You can access and read data by relying on the getSchema method, which provides you with the available entries (columns) of a record. The Entry exposes the type of its value, which lets you access the value through the corresponding method. For example, the Schema.Type.STRING type implies using the getString method of the record. For example: The Record format supports the following data types: String Boolean Int Long Float Double DateTime Array Bytes Record A map can always be modeled as a list (array of records with key and value entries). For example: For example, you can use the API to provide the schema. The following method needs to be implemented in a service. Manually constructing the schema without any data: Returning the schema from an already built record: MyDataset is the class that defines the dataset. Learn more about datasets and datastores in this document. Entry names for Record and JsonObject types must comply with the following rules: The name must start with a letter or with _. If not, the invalid characters are ignored until the first valid character. Following characters of the name must be a letter, a number, or _. If not, the invalid character is replaced with _. For example: 1foo becomes foo. f@o becomes f_o. 1234f5@o becomes ___f5_o. foo123 stays foo123. Each array uses only one schema for all of its elements. If an array contains several elements, they must be of the same data type. For example, the following array is not correct as it contains a string and an object: The runtime also supports JsonObject as input and output component type. You can rely on the JSON services (Jsonb, JsonBuilderFactory) to create new instances. This format is close to the Record format, except that it does not natively support the DateTime type and has a unique Number type to represent Int, Long, Float and Double types. It also does not provide entry metadata like nullable or comment, for example. 
It also inherits the Record format limitations. The runtime also accepts any POJO as input and output component type. In this case, it uses JSON-B to treat it as a JsonObject.
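The idea that a map can be represented as an array of records with key and value entries can be sketched with plain Java collections. The sketch below does not use the Record API: each entry is a small map standing in for a record, and the class name MapAsEntryList is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, independent of the Record API.
final class MapAsEntryList {

    // Turns each map entry into a two-field structure: "key" and "value".
    static List<Map<String, Object>> toEntryList(Map<String, ?> source) {
        List<Map<String, Object>> entries = new ArrayList<>();
        for (Map.Entry<String, ?> e : source.entrySet()) {
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("key", e.getKey());
            entry.put("value", e.getValue());
            entries.add(entry);
        }
        return entries;
    }
}
```

Since an array uses a single schema for all of its elements, this representation only fits maps whose values all share the same type.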

Tutorials Guided implementation examples to get your hands on Talend Component Kit tutorial example implement test dev testing

The following tutorials are designed to help you understand the main principles of component development using Talend Component Kit. With this set of tutorials, get your hands on project creation using the Component Kit Starter and implement the logic of different types of components. Creating your first component Generating a project from the starter Creating a Hazelcast input component Creating a Hazelcast output component Creating a Zendesk REST API connector Handling component version migration With this set of tutorials, learn the different approaches to test the components created in the previous tutorials. Testing a Zendesk REST API connector Testing a Hazelcast component Testing in a continuous integration environment

Building components with Maven Use Maven or the Maven wrapper as build tool to develop components mvn mvnw maven maven-plugin tool build

To develop new components, Talend Component Kit requires a build tool in which you will import the component project generated from the starter. You will then be able to install and deploy it to Talend applications. A Talend Component Kit plugin is available for each of the supported build tools. talend-component-maven-plugin helps you write components that match best practices and generate transparently metadata used by Talend Studio. You can use it as follows: This plugin is also an extension so you can declare it in your build/extensions block as: Used as an extension, the goals detailed in this document will be set up. The Talend Component Kit plugin integrates some specific goals within Maven build lifecycle. For example, to compile the project and prepare for deploying your component, run mvn clean install. Using this command, the following goals are executed: The build is split into several phases. The different goals are executed in the order shown above. Talend Component Kit uses default goals from the Maven build lifecycle and adds additional goals to the building and packaging phases. Goals added to the build by Talend Component Kit are detailed below. The default lifecycle is detailed in Maven documentation. The Talend Component Kit plugin for Maven integrates several specific goals into Maven build lifecycle. To run specific goals individually, run the following command from the root of the project, by adapting it with each goal name, parameters and values: The first goal is a shortcut for the maven-dependency-plugin. It creates the TALEND-INF/dependencies.txt file with the compile and runtime dependencies, allowing the component to use it at runtime: The scan-descriptor goal scans the current module and optionally other configured folders to precompute the list of interesting classes for the framework (components, services). 
It allows to save some bootstrap time when launching a job, which can be useful in some execution cases: Configuration - excluding parameters used by default only: Name Description User property Default output Where to dump the scan result. Note: It is not supported to change that value in the runtime. talend.scan.output ${project.build.outputDirectory}/TALEND-INF/scanning.properties scannedDirectories Explicit list of directories to scan. talend.scan.scannedDirectories If not set, defaults to ${project.build.outputDirectory} scannedDependencies Explicit list of dependencies to scan - set them in the groupId:artifactId format. The list is appended to the file to scan. talend.scan.scannedDependencies - The svg2png goal scans a directory - default to target/classes/icons - to find .svg files and copy them in a PNG version size at 32x32px and named with the suffix _icon32.png to enable the studio to read it: Configuration: Name Description User property Default icons Where to scan for the SVG icons to convert in PNG. talend.icons.source ${project.build.outputDirectory}/icons workarounds By default the shape of the icon will be enforce in the RGB channels (in white) using the alpha as reference. This is useful for black/white images using alpha to shape the picture because Eclipse - Talend Studio - caches icons using RGB but not alpha channel, pictures not using alpha channel to draw their shape should disable that workaround. talend.icons.workaround true if you use that plugin, ensure to set it before the validate mojo otherwise validation can miss some png files. This goal helps you validate the common programming model of the component. To activate it, you can use following execution definition: It is bound to the process-classes phase by default. 
When executed, it performs several validations that can be disabled by setting the corresponding flags to false in the block of the execution: Name Description User property Default validateInternationalization Validates that resource bundles are presents and contain commonly used keys (for example, _displayName) talend.validation.internationalization true validateModel Ensures that components pass validations of the ComponentManager and Talend Component runtime talend.validation.model true validateSerializable Ensures that components are Serializable. This is a sanity check, the component is not actually serialized here. If you have a doubt, make sure to test it. It also checks that any @Internationalized class is valid and has its keys. talend.validation.serializable true validateMetadata Ensures that components have an @Icon and a @Version defined. talend.validation.metadata true validateDataStore Ensures that any @DataStore defines a @HealthCheck and has a unique name. talend.validation.datastore true validateDataSet Ensures that any @DataSet has a unique name. Also ensures that there is a source instantiable just filling the dataset properties (all others not being required). Finally, the validation checks that each input or output component uses a dataset and that this dataset has a datastore. talend.validation.dataset true validateComponent Ensures that the native programming model is respected. You can disable it when using another programming model like Beam. talend.validation.component true validateActions Validates action signatures for actions not tolerating dynamic binding (@HealthCheck, @DynamicValues, and so on). It is recommended to keep it set to true. talend.validation.action true validateFamily Validates the family by verifying that the package containing the @Components has a @Icon property defined. 
talend.validation.family true validateDocumentation Ensures that all components and @Option properties have a documentation using the @Documentation property. talend.validation.documentation true validateLayout Ensures that the layout is referencing existing options and properties. talend.validation.layout true validateOptionNames Ensures that the option names are compliant with the framework. It is highly recommended and safer to keep it set to true. talend.validation.options true validateLocalConfiguration Ensures that if any TALEND-INF/local-configuration.properties exists then keys start with the family name. talend.validation.localConfiguration true validateOutputConnection Ensures that an output has only one input branch. talend.validation.validateOutputConnection true validatePlaceholder Ensures that string options have a placeholder. It is highly recommended to turn this property on. talend.validation.placeholder false locale The locale used to validate internationalization. talend.validation.locale root The asciidoc goal generates an Asciidoc file documenting your component from the configuration model (@Option) and the @Documentation property that you can add to options and to the component itself. Name Description User property Default level Level of the root title. talend.documentation.level 2 (==) output Output folder path. It is recommended to keep it to the default value. talend.documentation.output ${classes}/TALEND-INF/documentation.adoc formats Map of the renderings to do. Keys are the format (pdf or html) and values the output paths. talend.documentation.formats - attributes Asciidoctor attributes to use for the rendering when formats is set. talend.documentation.attributes - templateEngine Template engine configuration for the rendering. talend.documentation.templateEngine - templateDir Template directory for the rendering. talend.documentation.templateDir - title Document title. 
talend.documentation.title ${project.name} version The component version. It defaults to the pom version talend.documentation.version ${project.version} workDir The template directory for the Asciidoctor rendering - if 'formats' is set. talend.documentation.workdDir ${project.build.directory}/talend-component/workdir attachDocumentations Allows to attach (and deploy) the documentations (.adoc, and formats keys) to the project. talend.documentation.attach true htmlAndPdf If you use the plugin as an extension, you can add this property and set it to true in your project to automatically get HTML and PDF renderings of the documentation. talend.documentation.htmlAndPdf false To render the generated documentation in HTML or PDF, you can use the Asciidoctor Maven plugin (or Gradle equivalent). You can configure both executions if you want both HTML and PDF renderings. Make sure to execute the rendering after the documentation generation. If you prefer a HTML rendering, you can configure the following execution in the asciidoctor plugin. The example below: Generates the components documentation in target/classes/TALEND-INF/documentation.adoc. Renders the documentation as an HTML file stored in target/documentation/documentation.html. If you prefer a PDF rendering, you can configure the following execution in the asciidoctor plugin: If you want to add some more content or a title, you can include the generated document into another document using Asciidoc include directive. For example: To be able to do that, you need to pass the generated_doc attribute to the plugin. For example: This is optional but allows to reuse Maven placeholders to pass paths, which can be convenient in an automated build. You can find more customization options on Asciidoctor website. Testing the rendering of your component configuration into the Studio requires deploying the component in Talend Studio. Refer to the Studio documentation. 
If you need to deploy your component into a Cloud (web) environment, you can test its web rendering by using the web goal of the plugin: Run the mvn talend-component:web command. Open the following URL in a web browser: localhost:8080. Select the component form you want to see from the treeview on the left. The selected form is displayed on the right. Two parameters are available with the plugin: serverPort, which lets you change the default port (8080) of the embedded server. Its associated user property is talend.web.port. serverArguments, which you can use to pass Meecrowave options to the server. Learn more about that configuration at openwebbeans.apache.org/meecrowave/meecrowave-core/cli.html. Make sure to install the artifact before using this command because it reads the component JAR from the local Maven repository. Finally, you can switch the language of the component UI (documentation, form) using the language query parameter in the webapp. For instance localhost:8080?language=fr. If you built a custom UI (JS + CSS) bundle and want to test it in the web application, you can configure it in the pom.xml file as follows: This is an advanced feature designed for expert users. Use it with caution. Component ARchive (.car) is the way to bundle a component to share it in the Talend ecosystem. It is an executable Java ARchive (.jar) containing a metadata file and a nested Maven repository containing the component and its dependencies. This command creates a .car file in your build directory. This file can be shared on Talend platforms. This command has some optional parameters: Name Description User property Default attach Specifies whether the component archive should be attached. talend.car.attach true classifier The classifier to use if attach is set to true. talend.car.classifier component metadata Additional custom metadata to bundle in the component archive. 
- - output Specifies the output path and name of the archive talend.car.output ${project.build.directory}/${project.build.finalName}.car packaging Specifies the packaging - ${project.packaging} type Specifies the type: `connector' or server `extension' talend.car.type connector This CAR is executable and exposes the studio-deploy command which takes a Talend Studio home path as parameter. When executed, it installs the dependencies into the Studio and registers the component in your instance. For example: You can also upload the dependencies to your Nexus server using the following command: In this command, Nexus URL and repository name are mandatory arguments. All other arguments are optional. If arguments contain spaces or special symbols, you need to quote the whole value of the argument. For example: The deploy-in-studio goal deploys the current component module into a local Talend Studio instance. Name Description User property Default studioHome Path to the Studio home directory talend.component.studioHome - studioM2 Path to the Studio maven repository if not the default one talend.component.studioM2 - You can use the following command from the root folder of your project: The help goal displays help information on talend-component-maven-plugin. Call mvn talend-component:help -Ddetail=true -Dgoal= to display the parameter details of a specific goal. Name Description User property Default detail Displays all settable properties for each goal. detail false goal The name of the goal for which to show help. If unspecified, all goals are displayed. goal - indentSize Number of spaces per indentation level. This integer should be positive. indentSize 2 lineLength Maximum length of a display line. This integer should be positive. lineLength 80
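As a small illustration of the naming convention applied by the svg2png goal described above, the following sketch computes the name of the PNG file expected for a given SVG icon. It only mirrors the _icon32.png suffix rule and performs no image conversion. The class name IconNames is hypothetical.

```java
// Hypothetical helper, not part of the Maven plugin.
final class IconNames {

    private static final String SVG_EXTENSION = ".svg";

    // Maps an SVG file name to the PNG name with the _icon32.png suffix.
    static String pngNameFor(String svgFileName) {
        if (!svgFileName.endsWith(SVG_EXTENSION)) {
            throw new IllegalArgumentException("Not an SVG file: " + svgFileName);
        }
        String baseName = svgFileName.substring(0, svgFileName.length() - SVG_EXTENSION.length());
        return baseName + "_icon32.png";
    }
}
```

For an icon stored as logo.svg in the icons directory, the expected PNG counterpart is logo_icon32.png.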

Building components with Gradle Use Gradle or the Gradle wrapper as build tool to develop components gradle tool build

To develop new components, Talend Component Kit requires a build tool in which you will import the component project generated from the starter. With this build tool, you will also be able to implement the logic of your component and to install and deploy it to Talend applications. A Talend Component Kit plugin is available for each of the supported build tools. gradle-talend-component helps you write components that match the best practices. It is inspired from the Maven plugin and adds the ability to generate automatically the dependencies.txt file used by the SDK to build the component classpath. For more information on the configuration, refer to the Maven properties matching the attributes. By default, Gradle does not log information messages. To see messages, use --info in your commands. Refer to Gradle’s documentation to learn about log levels. You can use it as follows: