# Talend Component Kit Developer Reference Guide

## Talend Component Kit Overview

### Component API

The component API is declarative (through annotations) to ensure it is:

1. Evolutive. It can get new features without breaking old code.

2. As static as possible.

#### Evolution

Because it is fully declarative, any new API can be added iteratively without requiring any change to existing components.

For example, in the case of Beam potential evolution:

``````@ElementListener
public MyOutput onElement(MyInput data) {
return ...;
}``````

would not be affected by the addition of the new Timer API, which can be used as follows:

``````@ElementListener
public MyOutput onElement(MyInput data,
@Timer("my-timer") Timer timer) {
return ...;
}``````

#### Static

##### UI-friendly

The intent of the framework is to be able to fit in a Java UI as well as in a web UI.

This covers both colocated and remote UIs. The goal is to move as much logic as possible to the UI side for UI-related actions. For example, validating a pattern or a size should be done on the client side rather than on the server side. Being static encourages this practice.

##### Auditable and with clear expectations

The other goal of being static in the API definition is to ensure that the model will not be mutated at runtime and that all the auditing and modeling can be done before, at the design phase.

##### Dev-friendly

Being static also ensures that the development can be validated as much as possible through build tools.
This does not replace the requirement to test the components but helps developers to maintain components with automated tools.

#### Generic and specific

The processor API supports `JsonObject` as well as any custom model. The goal is to support generic component development that needs to access configured "object paths", as well as specific components that rely on a defined path from the input.

A generic component can look like:

``````@ElementListener
public MyOutput onElement(JsonObject input) {
return ...;
}``````

A specific component can look like (with `MyInput` a POJO):

``````@ElementListener
public MyOutput onElement(MyInput input) {
return ...;
}``````
##### No runtime assumption

By design, the framework must run in DI (plain standalone Java program) and in Beam pipelines.
It is out of scope of the framework to handle the way the runtime serializes - if needed - the data.

For that reason, it is critical not to import serialization constraints to the stack. As an example, this is the reason why `JsonObject` is not an `IndexedRecord` from Avro.

Any serialization concern should either be hidden in the framework runtime (outside of the component developer scope) or in the runtime integration with the framework (for example, Beam integration).

In this context, JSON-P can be a good compromise because it brings a powerful API with very few constraints.

### Isolated

The components must be able to execute even if they have conflicting libraries. For that purpose, classloaders must be isolated. A component defines its dependencies based on a Maven format and is always bound to its own classloader.

### REST

#### Consumable model

The definition payload is as flat as possible and strongly typed to ensure it can be manipulated by consumers. This way, consumers can add or remove fields with simple mapping rules, without any abstract tree handling.

The execution (runtime) configuration is the concatenation of framework metadata (only the version) and a key/value model of the instance of the configuration based on the definition properties paths for the keys. It enables consumers to maintain and work with the keys/values according to their need.

The framework not being responsible for any persistence, it is very important to make sure that consumers can handle it from end to end, with the ability to search for values (update a machine, update a port and so on) and keys (for example, a new encryption rule on key `certificate`).
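As a rough illustration of what "searching for values and keys" can look like on the consumer side, the following sketch works on a flat key/value configuration stored as a `Map<String, String>`. The class `FlatConfigurations`, its methods, and the sample keys used with it are hypothetical and are not part of the framework; they only assume that keys are property paths using the dot notation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical consumer-side helper; not part of Talend Component Kit.
final class FlatConfigurations {
    private FlatConfigurations() {
    }

    // Returns a copy where every value equal to oldValue is replaced (for example a machine name).
    static Map<String, String> replaceValue(final Map<String, String> configuration,
                                            final String oldValue, final String newValue) {
        final Map<String, String> updated = new LinkedHashMap<>();
        configuration.forEach((key, value) -> updated.put(key, oldValue.equals(value) ? newValue : value));
        return updated;
    }

    // Returns a copy where the rule is applied to every entry whose last path segment is keyName
    // (for example an encryption rule on the key "certificate").
    static Map<String, String> transformKey(final Map<String, String> configuration,
                                            final String keyName, final UnaryOperator<String> rule) {
        final Map<String, String> updated = new LinkedHashMap<>();
        configuration.forEach((key, value) -> {
            final String lastSegment = key.substring(key.lastIndexOf('.') + 1);
            updated.put(key, keyName.equals(lastSegment) ? rule.apply(value) : value);
        });
        return updated;
    }
}
```

Because the model is flat, both operations are simple map traversals and need no knowledge of the component that owns the configuration.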

Talend Component Kit is a metamodel provider (to build forms) and a runtime execution platform. It takes a configuration instance and uses it volatilely to execute a component logic. This implies it cannot own the data nor define the contract it has for these two endpoints and must let the consumers handle the data lifecycle (creation, encryption, deletion, and so on).

#### Execution with streaming

A new mime type called `talend/stream` is introduced to define a streaming format.

It matches a JSON object per line:

``````{"key1":"value1"}
{"key2":"value2"}
{"key1":"value11"}
{"key1":"value111"}
{"key2":"value2"}``````
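
As a minimal sketch of how a consumer might read such a payload, the following plain Java helper splits a `talend/stream` body into one JSON document per line. `StreamLines` and `splitRecords` are hypothetical names that do not come from the framework, and the records are kept as raw strings so that any JSON parser (JSON-P for example) can be applied afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Talend Component Kit.
final class StreamLines {
    private StreamLines() {
    }

    // Splits a talend/stream payload into its records, one JSON object per line.
    // Blank lines are skipped; the JSON itself is neither parsed nor validated here.
    static List<String> splitRecords(final String payload) {
        final List<String> records = new ArrayList<>();
        for (final String line : payload.split("\\R")) {
            final String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                records.add(trimmed);
            }
        }
        return records;
    }
}
```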

### Fixed set of icons

Icons (`@Icon`) are based on a fixed set. Custom icons can be used but their display cannot be guaranteed. Components can be used in any environment and require a consistent look that cannot be guaranteed outside of the UI itself. Defining keys only is the best way to communicate this information.

Once you know exactly how you will deploy your component in the Studio, you can use `@Icon(value = CUSTOM, custom = "…")` to use a custom icon file.


## Getting started with Talend Component Kit

Talend Component Kit is a framework designed to simplify the development of components at two levels:

• Runtime: Runtime is about injecting the specific component code into a job or pipeline. The framework helps unify as much as possible the code required to run in Data Integration (DI) and Beam environments.

• Graphical interface: The framework helps unify the code required to be able to render the component in a browser (web) or in the Eclipse-based Studio (SWT).

### System Requirements

In order to use Talend Component Kit, you need the following tools installed on your machine:

• Apache Maven 3.5.x is recommended to develop a component or the project itself. You can download it from Apache Maven website. You can also use Gradle.

• A Java Integrated Development Environment such as Eclipse or IntelliJ.

• Talend Studio version 7.0 or later.

### Main principles

Developing new components using the framework includes:

1. Creating a project using the starter or the Talend IntelliJ plugin.

   1. Defining the general configuration model for each component in your project

   2. Compiling the project

2. Implementing the components

   1. Registering the components family

   2. Defining the layout and configurable part of the components

   3. Defining the partition mapper for Input components

   4. Implementing the source logic for Input components

   5. Defining the processor for Output components

3. Testing the components

## Talend Component Documentation

### Talend Components Definitions Documentation

#### Component definition

Talend Component Kit framework relies on several primitive components.

All components can use `@PostConstruct` and `@PreDestroy` annotations to initialize or release some underlying resource at the beginning and the end of a processing.

In distributed environments, class constructors are called on the cluster manager node. Methods annotated with `@PostConstruct` and `@PreDestroy` are called on worker nodes. Thus, partition plan computation and pipeline tasks are performed on different nodes.

1. The created task is a JAR file containing class information, which describes the pipeline (flow) that should be processed in cluster.

2. During the partition plan computation step, the pipeline is analyzed and split into stages. The cluster manager node instantiates mappers/processors, gets estimated data size using mappers, and splits created mappers according to the estimated data size.
All instances are then serialized and sent to the worker node.

3. Serialized instances are received and deserialized. Methods annotated with `@PostConstruct` are called. After that, pipeline execution starts. The `@BeforeGroup` annotated method of the processor is called before processing the first element in a chunk.
After processing the number of records estimated as chunk size, the `@AfterGroup` annotated method of the processor is called. Chunk size is calculated depending on the environment the pipeline is processed by. Once the pipeline is processed, methods annotated with `@PreDestroy` are called.
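
The practical consequence of these steps is that anything created in the constructor must survive serialization, while resources that cannot be serialized belong in the `@PostConstruct` method. The following standalone sketch simulates this with plain Java serialization. `LifecycleDemo`, `shipToWorker`, and the list standing in for a connection are hypothetical, and the framework annotations appear only as comments so that the example compiles without the framework.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical component used to simulate the manager-to-worker hand-over.
final class LifecycleDemo implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String configuration;        // set on the manager node and serialized
    private transient List<String> connection; // stand-in for a resource created on the worker

    LifecycleDemo(final String configuration) {
        this.configuration = configuration;
    }

    // Would carry @PostConstruct: runs on the worker after deserialization.
    void init() {
        connection = new ArrayList<>();
        connection.add("connected:" + configuration);
    }

    // Would carry @PreDestroy: runs on the worker once the pipeline is processed.
    void release() {
        connection = null;
    }

    boolean isInitialized() {
        return connection != null;
    }

    // Simulates sending the instance to a worker node through Java serialization.
    static LifecycleDemo shipToWorker(final LifecycleDemo instance) {
        try {
            final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(instance);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (LifecycleDemo) in.readObject();
            }
        } catch (final IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```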

 All the methods managed by the framework must be public. Private methods are ignored.

The framework is designed to be as declarative as possible but also to stay extensible by not using fixed interfaces or method signatures. This makes it possible to add new features of the underlying implementations incrementally.
##### PartitionMapper

A `PartitionMapper` is a component able to split itself to make the execution more efficient.

This concept is borrowed from big data and useful in this context only (Beam executions). The idea is to divide the work before executing it in order to reduce the overall execution time.

The process is the following:

1. The size of the data you work on is estimated. This part can be heuristic and not very precise.

2. From that size, the execution engine (runner for Beam) requests the mapper to split itself in N mappers with a subset of the overall work.

3. The leaf (final) mapper is used as a `Producer` (actual reader) factory.

 This kind of component must be `Serializable` to be distributable.
###### Definition

A partition mapper requires three methods marked with specific annotations:

1. `@Assessor` for the evaluating method

2. `@Split` for the dividing method

3. `@Emitter` for the `Producer` factory

@Assessor

The Assessor method returns the estimated size of the data related to the component (depending on its configuration). It must return a `Number` and must not take any parameter.

For example:

``````@Assessor
public long estimateDataSetByteSize() {
return ....;
}``````
@Split

The Split method returns a collection of partition mappers and can optionally take a `@PartitionSize` long value as parameter, which is the requested size of the dataset per sub partition mapper.

For example:

``````@Split
public List<MyMapper> split(@PartitionSize final long desiredSize) {
return ....;
}``````
@Emitter

The Emitter method must not have any parameter and must return a producer. It uses the partition mapper configuration to instantiate and configure the producer.

For example:

``````@Emitter
public MyProducer create() {
return ....;
}``````
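
To see how the three methods fit together, here is a standalone sketch of a mapper over a range of record identifiers. Everything in it is hypothetical: the `RangeMapper` class, the assumed weight of 10 bytes per record, and the use of a plain `Iterator` as a stand-in for the producer. The framework annotations are written as comments so that the example compiles without the framework.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical mapper over the record ids [start, end); each record is assumed to weigh 10 bytes.
final class RangeMapper implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final long RECORD_SIZE = 10L;

    private final long start; // inclusive
    private final long end;   // exclusive

    RangeMapper(final long start, final long end) {
        this.start = start;
        this.end = end;
    }

    // Would carry @Assessor: estimated size in bytes of the data covered by this mapper.
    long estimateSize() {
        return (end - start) * RECORD_SIZE;
    }

    // Would carry @Split: desiredSize is the value a @PartitionSize parameter would receive.
    List<RangeMapper> split(final long desiredSize) {
        final long recordsPerMapper = Math.max(1L, desiredSize / RECORD_SIZE);
        final List<RangeMapper> mappers = new ArrayList<>();
        for (long from = start; from < end; from += recordsPerMapper) {
            mappers.add(new RangeMapper(from, Math.min(end, from + recordsPerMapper)));
        }
        return mappers;
    }

    // Would carry @Emitter: creates the reader for this leaf mapper.
    Iterator<Long> createProducer() {
        return new Iterator<Long>() {
            private long next = start;

            @Override
            public boolean hasNext() {
                return next < end;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return next++;
            }
        };
    }
}
```

With a range of 25 records and a requested partition size of 100 bytes, this sketch yields three sub-mappers that together cover every record exactly once.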
##### Producer

`Producer` is a component interacting with a physical source. It produces input data for the processing flow.

A producer is a simple component that must have a `@Producer` method without any parameter. It can return any data:

``````@Producer
public MyData produces() {
return ...;
}``````
##### Processor

A `Processor` is a component that converts incoming data to a different model.

A processor must have a method decorated with `@ElementListener` taking an incoming data and returning the processed data:

``````@ElementListener
public MyNewData map(final MyData data) {
return ...;
}``````

Processors must be Serializable because they are distributed components.

If you just need to access data on a map-based ruleset, you can use `JsonObject` as parameter type.
From there, Talend Component Kit wraps the data to allow you to access it as a map. The parameter type is not enforced.
This means that if you know you will get a `SuperCustomDto`, then you can use it as parameter type. But for generic components that are reusable in any chain, it is highly encouraged to use `JsonObject` until you have an evaluation language-based processor that has its own way to access components.

For example:

``````@ElementListener
public MyNewData map(final JsonObject incomingData) {
String name = incomingData.getString("name");
int age = incomingData.getInt("age");
return ...;
}

// equivalent to (using POJO subclassing)

public class Person {
private String name;
private int age;

// getters/setters
}

@ElementListener
public MyNewData map(final Person person) {
String name = person.getName();
int age = person.getAge();
return ...;
}``````

A processor also supports `@BeforeGroup` and `@AfterGroup` methods, which must not have any parameter and return `void` values. Any other result would be ignored. These methods are used by the runtime to mark a chunk of the data in a way which is estimated good for the execution flow size.

 Because the size is estimated, the size of a group can vary. It is even possible to have groups of size `1`.

It is recommended to batch records, for performance reasons:

``````@BeforeGroup
public void initBatch() {
// ...
}

@AfterGroup
public void endBatch() {
// ...
}``````
 It is a good practice to support a `maxBatchSize` here and to commit before the end of the group, in case of a computed size that is way too big for your backend to handle.
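
One possible shape for this practice is sketched below. The `BatchingOutput` class, its in-memory `committed` list (standing in for a real backend), and the way `maxBatchSize` is passed are all hypothetical. The framework annotations are shown as comments so that the example compiles without the framework.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical output component that buffers records and commits them per group.
final class BatchingOutput {
    private final int maxBatchSize;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> committed = new ArrayList<>(); // stands in for the backend

    BatchingOutput(final int maxBatchSize) {
        this.maxBatchSize = maxBatchSize;
    }

    // Would carry @BeforeGroup.
    void initBatch() {
        buffer.clear();
    }

    // Would carry @ElementListener.
    void onElement(final String record) {
        buffer.add(record);
        if (buffer.size() >= maxBatchSize) {
            commit(); // early commit: the group is larger than the backend accepts at once
        }
    }

    // Would carry @AfterGroup.
    void endBatch() {
        commit();
    }

    private void commit() {
        if (!buffer.isEmpty()) {
            committed.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    List<List<String>> committedBatches() {
        return committed;
    }
}
```

With `maxBatchSize` set to 2, a group of five records is committed as batches of 2, 2, and 1 records, whatever group size the runtime estimated.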
##### Multiple outputs

In some cases, you may need to split the output of a processor in two. A common example is to have "main" and "reject" branches where part of the incoming data are passed to a specific bucket to be processed later.

To do that, you can use `@Output` as replacement of the returned value:

``````@ElementListener
public void map(final MyData data, @Output final OutputEmitter<MyNewData> output) {
output.emit(createNewData(data));
}``````

Alternatively, you can pass a string that represents the new branch:

``````@ElementListener
public void map(final MyData data,
@Output final OutputEmitter<MyNewData> main,
@Output("rejected") final OutputEmitter<MyNewDataWithError> rejected) {
if (isRejected(data)) {
rejected.emit(createNewData(data));
} else {
main.emit(createNewData(data));
}
}

// or

@ElementListener
public MyNewData map(final MyData data,
@Output("rejected") final OutputEmitter<MyNewDataWithError> rejected) {
if (isSuspicious(data)) {
rejected.emit(createNewData(data));
return createNewData(data); // in this case the processing continues but notifies another channel
}
return createNewData(data);
}``````
##### Multiple inputs

Having multiple inputs is similar to having multiple outputs, except that an `OutputEmitter` wrapper is not needed:

``````@ElementListener
public MyNewData map(@Input final MyData data, @Input("input2") final MyData2 data2) {
return createNewData(data, data2);
}``````

`@Input` takes the input name as parameter. If no name is set, it defaults to the "main (default)" input branch. It is recommended to use the default branch when possible and to avoid naming branches according to the component semantic.

##### Output

An `Output` is a `Processor` that does not return any data.

Conceptually, an output is a data listener. It matches the concept of processor. Being the last component of the execution chain or returning no data makes your processor an output component:

``````@ElementListener
public void store(final MyData data) {
// ...
}``````
##### Combiners

Currently, Talend Component Kit does not allow you to define a `Combiner`. A combiner is the symmetric part of a partition mapper and allows you to aggregate results in a single partition.

##### Family and component icons

Every component family and component needs to have a representative icon.
You can use one of the icons provided by the framework or you can use a custom icon.

For the component family the icon is defined in the `package-info.java` file. For the component itself, you need to declare it in the component class.

To use a custom icon, you need to have the icon file placed in the `resources/icons` folder of the project. The icon file needs to have a name following the convention `IconName_icon32.png`, where you can replace `IconName` by the name of your choice.

``@Icon(value = Icon.IconType.CUSTOM, custom = "IconName")``

#### Configuring components

Components are configured using their constructor parameters. They can all be marked with the `@Option` property, which lets you give a name to parameters.

For the name to be correct, you must follow these guidelines:

• Use a valid Java name.

• Do not include any `.` character in it.

• Do not start the name with a `$`.

• Defining a name is optional. If you don’t set a specific name, it defaults to the bytecode name, which can require you to compile with the `-parameters` flag to avoid ending up with names such as `arg0`, `arg1`, and so on.

Parameter types can be primitives or complex objects with fields decorated with `@Option` exactly like method parameters.

It is recommended to use simple models which can be serialized in order to ease serialized component implementations.

For example:

``````class FileFormat implements Serializable {
    @Option("type")
    private FileType type = FileType.CSV;

    @Option("max-records")
    private int maxRecords = 1024;
}

@PartitionMapper(family = "demo", name = "file-reader")
public MyFileReader(@Option("file-path") final File file,
                    @Option("file-format") final FileFormat format) {
    // ...
}``````

Using this kind of API makes the configuration extensible and component-oriented, which allows you to define all you need.

The instantiation of the parameters is done from the properties passed to the component.

Examples of option names, evaluated against the guidelines above:

| Option name | Valid |
|---|---|
| myName | Yes |
| my_name | Yes |
| my.name | No |
| $myName | No |

##### Primitives

A primitive is a class which can be directly converted from a `String` to the expected type.

It includes all Java primitives, like the `String` type itself, but also all types with a `org.apache.xbean.propertyeditor.Converter`:

• `BigDecimal`

• `BigInteger`

• `File`

• `InetAddress`

• `ObjectName`

• `URI`

• `URL`

• `Pattern`

##### Complex object mapping

The conversion from property to object uses the Dot notation.

For example, assuming the method parameter was configured with `@Option("file")`:

``````file.path = /home/user/input.csv
file.format = CSV``````

matches

``````public class FileOptions {
@Option("path")
private File path;

@Option("format")
private Format format;
}``````
###### List case

Lists rely on an indexed syntax to define their elements.

For example, assuming that the list parameter is named `files` and that the elements are of the  `FileOptions` type, you can define a list of two elements as follows:

``````files[0].path = /home/user/input1.csv
files[0].format = CSV
files[1].path = /home/user/input2.xml
files[1].format = EXCEL``````
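
To make the indexed syntax more concrete, the following sketch groups such flat properties by index. It is not how the framework performs the conversion: `IndexedProperties` and `readList` are hypothetical names, and the elements are returned as plain maps of sub-property names to values rather than as `FileOptions` instances.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the "name[index].property" convention.
final class IndexedProperties {
    private IndexedProperties() {
    }

    // Collects keys such as "files[1].path" into one map per index, ordered by index.
    static List<Map<String, String>> readList(final Map<String, String> properties, final String listName) {
        final Pattern keyPattern = Pattern.compile(Pattern.quote(listName) + "\\[(\\d+)\\]\\.(.+)");
        final Map<Integer, Map<String, String>> byIndex = new TreeMap<>();
        for (final Map.Entry<String, String> entry : properties.entrySet()) {
            final Matcher matcher = keyPattern.matcher(entry.getKey());
            if (matcher.matches()) {
                final int index = Integer.parseInt(matcher.group(1));
                byIndex.computeIfAbsent(index, i -> new LinkedHashMap<>())
                       .put(matcher.group(2), entry.getValue());
            }
        }
        return new ArrayList<>(byIndex.values());
    }
}
```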
###### Map case

Similarly to the list case, the map uses `.key[index]` and `.value[index]` to represent its keys and values:

``````// Map<String, FileOptions>
files.key[0] = first-file
files.value[0].path = /home/user/input1.csv
files.value[0].format = CSV
files.key[1] = second-file
files.value[1].path = /home/user/input2.xml
files.value[1].format = EXCEL``````

``````// Map<FileOptions, String>
files.key[0].path = /home/user/input1.csv
files.key[0].format = CSV
files.value[0] = first-file
files.key[1].path = /home/user/input2.xml
files.key[1].format = EXCEL
files.value[1] = second-file``````
Avoid using the Map type when possible, for example if you can configure your component with an object instead.
##### Defining Constraints and validations on the configuration

You can use metadata to specify that a field is required or has a minimum size, and so on. This is done using the `validation` metadata in the `org.talend.sdk.component.api.configuration.constraint` package:

| API | Name | Parameter Type | Description | Supported Types | Metadata sample |
|---|---|---|---|---|---|
| `@org.talend.sdk.component.api.configuration.constraint.Max` | maxLength | double | Ensure the decorated option size is validated with a higher bound. | CharSequence | `{"validation::maxLength":"12.34"}` |
| `@org.talend.sdk.component.api.configuration.constraint.Min` | minLength | double | Ensure the decorated option size is validated with a lower bound. | CharSequence | `{"validation::minLength":"12.34"}` |
| `@org.talend.sdk.component.api.configuration.constraint.Pattern` | pattern | string | Validate the decorated string with a javascript pattern (even into the Studio). | CharSequence | `{"validation::pattern":"test"}` |
| `@org.talend.sdk.component.api.configuration.constraint.Max` | max | double | Ensure the decorated option size is validated with a higher bound. | Number, int, short, byte, long, double, float | `{"validation::max":"12.34"}` |
| `@org.talend.sdk.component.api.configuration.constraint.Min` | min | double | Ensure the decorated option size is validated with a lower bound. | Number, int, short, byte, long, double, float | `{"validation::min":"12.34"}` |
| `@org.talend.sdk.component.api.configuration.constraint.Required` | required | - | Mark the field as being mandatory. | Object | `{"validation::required":"true"}` |
| `@org.talend.sdk.component.api.configuration.constraint.Max` | maxItems | double | Ensure the decorated option size is validated with a higher bound. | Collection | `{"validation::maxItems":"12.34"}` |
| `@org.talend.sdk.component.api.configuration.constraint.Min` | minItems | double | Ensure the decorated option size is validated with a lower bound. | Collection | `{"validation::minItems":"12.34"}` |
| `@org.talend.sdk.component.api.configuration.constraint.Uniques` | uniqueItems | - | Ensure the elements of the collection must be distinct (kind of set). | Collection | `{"validation::uniqueItems":"true"}` |

 When using the programmatic API, metadata is prefixed by `tcomp::`. This prefix is stripped in the web for convenience, and the table above uses the web keys.

Also note that these validations are executed before the runtime is started (when loading the component instance) and that the execution will fail if they don’t pass. If somehow it breaks your application you can disable that validation on the JVM by setting the system property `talend.component.configuration.validation.skip` to `true`.
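
To illustrate how a consumer could interpret this metadata for a string option, here is a small sketch that uses the web keys (without the `tcomp::` prefix). It is not the validation code of the framework: `StringOptionValidator` is a hypothetical class, and `java.util.regex` is used as an approximation of the JavaScript pattern semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical validator for a single string option, driven by "validation::*" metadata.
final class StringOptionValidator {
    private StringOptionValidator() {
    }

    // Returns the names of the violated constraints; an empty list means the value is valid.
    static List<String> validate(final String value, final Map<String, String> metadata) {
        final List<String> violations = new ArrayList<>();
        if (value == null || value.isEmpty()) {
            // In this sketch, a missing value is only checked against "required".
            if (Boolean.parseBoolean(metadata.get("validation::required"))) {
                violations.add("required");
            }
            return violations;
        }
        final String minLength = metadata.get("validation::minLength");
        if (minLength != null && value.length() < Double.parseDouble(minLength)) {
            violations.add("minLength");
        }
        final String maxLength = metadata.get("validation::maxLength");
        if (maxLength != null && value.length() > Double.parseDouble(maxLength)) {
            violations.add("maxLength");
        }
        final String pattern = metadata.get("validation::pattern");
        if (pattern != null && !Pattern.compile(pattern).matcher(value).find()) {
            violations.add("pattern");
        }
        return violations;
    }
}
```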

##### Marking a configuration as a particular type of data

It is common to classify the incoming data. It is similar to tagging data with several types. Data can commonly be categorized as follows:

• Datastore: The data you need to connect to the backend.

• Dataset: A datastore coupled with the data you need to execute an action.

| API | Name | Description | Metadata sample |
|---|---|---|---|
| `org.talend.sdk.component.api.configuration.type.DataSet` | dataset | Mark a model (complex object) as being a dataset. | `{"tcomp::configurationtype::type":"dataset","tcomp::configurationtype::name":"test"}` |
| `org.talend.sdk.component.api.configuration.type.DataStore` | datastore | Mark a model (complex object) as being a datastore (connection to a backend). | `{"tcomp::configurationtype::type":"datastore","tcomp::configurationtype::name":"test"}` |

 The component family associated with a configuration type (datastore/dataset) is always the one related to the component using that configuration.

Those configuration types can be composed to provide one configuration item. For example, a dataset type often needs a datastore type to be provided. A datastore type (that provides the connection information) is used to create a dataset type.

Those configuration types are also used at design time to create shared configurations that can be stored and used at runtime.

For example, in the case of a relational database that supports JDBC:

• A datastore can be made of:

  • a JDBC URL

• A dataset can be made of:

  • a datastore (that provides the data required to connect to the database)

  • a table name

  • data.

The component server scans all configuration types and returns a configuration type index. This index can be used for the integration into the targeted platforms (Studio, web applications, and so on).

The configuration type index is represented as a flat tree that contains all the configuration types, which themselves are represented as nodes and indexed by ID.

Every node can point to other nodes. This relation is represented as an array of edges that provides the child IDs.

As an illustration, a configuration type index for the example above can be defined as follows:

``````{
  "nodes": {
    "idForDstore": { "datastore": "datastore data", "edges": [ { "id": "idForDset" } ] },
    "idForDset": { "dataset": "dataset data" }
  }
}``````
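
As a sketch of how an integration could navigate such an index, the following helper collects every configuration type reachable from a given node by following its edges. The in-memory representation (a map from node ID to child IDs) and the names `ConfigurationTypeIndex` and `descendants` are assumptions made for this example; they are not the model returned by the component server.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical view of the index: node id -> ids of the child nodes (edges).
final class ConfigurationTypeIndex {
    private ConfigurationTypeIndex() {
    }

    // Returns the ids reachable from rootId, in depth-first order, without duplicates.
    static List<String> descendants(final Map<String, List<String>> edgesById, final String rootId) {
        final List<String> visited = new ArrayList<>();
        collect(edgesById, rootId, visited);
        return visited;
    }

    private static void collect(final Map<String, List<String>> edgesById, final String id,
                                final List<String> visited) {
        for (final String child : edgesById.getOrDefault(id, Collections.emptyList())) {
            if (!visited.contains(child)) {
                visited.add(child);
                collect(edgesById, child, visited);
            }
        }
    }
}
```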

If you need to define a binding between properties, you can use a set of annotations:

| API | Name | Description | Metadata sample |
|---|---|---|---|
| `@org.talend.sdk.component.api.configuration.condition.ActiveIf` | if | If the evaluation of the element at the location matches value then the element is considered active, otherwise it is deactivated. | `{"condition::if::target":"test","condition::if::value":"value1,value2"}` |
| `@org.talend.sdk.component.api.configuration.condition.ActiveIfs` | ifs | Allows you to set multiple visibility conditions on the same property. | `{"condition::if::value::0":"value1,value2","condition::if::value::1":"SELECTED","condition::if::target::0":"sibling1","condition::if::target::1":"../../other"}` |

The target element location is specified as a relative path to the current location, using Unix path characters. The configuration class delimiter is `/`.
The parent configuration class is specified by `..`.
Thus, `../targetProperty` denotes a property, which is located in the parent configuration class and is named `targetProperty`.
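
The path rules above can be sketched as follows. This is an illustration rather than the framework's resolution code: `ActiveIfEvaluator` is a hypothetical class, and it assumes that the annotated property is identified by its dot-notation path and that the condition value is a comma-separated list of accepted values.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Map;

// Hypothetical evaluator for a single condition (target + accepted values).
final class ActiveIfEvaluator {
    private ActiveIfEvaluator() {
    }

    // Resolves target relative to the configuration class that holds the annotated property.
    // propertyPath uses the dot notation, target uses "/" as delimiter and ".." for the parent.
    static String resolve(final String propertyPath, final String target) {
        final Deque<String> segments = new ArrayDeque<>(Arrays.asList(propertyPath.split("\\.")));
        segments.removeLast(); // drop the property name to start from its enclosing class
        for (final String part : target.split("/")) {
            if ("..".equals(part)) {
                segments.removeLast(); // throws NoSuchElementException if the path goes above the root
            } else if (!part.isEmpty() && !".".equals(part)) {
                segments.addLast(part);
            }
        }
        return String.join(".", segments);
    }

    // The element is active when the target currently holds one of the accepted values.
    static boolean isActive(final Map<String, String> configuration, final String propertyPath,
                            final String target, final String acceptedValues) {
        final String actual = configuration.get(resolve(propertyPath, target));
        return actual != null && Arrays.asList(acceptedValues.split(",")).contains(actual);
    }
}
```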

 When using the programmatic API, metadata is prefixed with `tcomp::`. This prefix is stripped in the web for convenience, and the previous table uses the web keys.

In some cases, you may need to add metadata about the configuration to let the UI render that configuration properly.
For example, a password value that must be hidden and not a simple clear input box. For these cases - if you want to change the UI rendering - you can use a particular set of annotations:

| API | Description | Metadata sample |
|---|---|---|
| `@org.talend.sdk.component.api.configuration.ui.DefaultValue` | Provide a default value the UI can use - only for primitive fields. | `{"ui::defaultvalue::value":"test"}` |
| `@org.talend.sdk.component.api.configuration.ui.OptionsOrder` | Allows you to sort the properties of a class. | `{"ui::optionsorder::value":"value1,value2"}` |
| `@org.talend.sdk.component.api.configuration.ui.layout.AutoLayout` | Request the renderer to do what it thinks is best. | `{"ui::autolayout":"true"}` |
| `@org.talend.sdk.component.api.configuration.ui.layout.GridLayout` | Advanced layout to place properties by row, this is exclusive with `@OptionsOrder`. | `{"ui::gridlayout::value1::value":"first\|second,third","ui::gridlayout::value2::value":"first\|second,third"}` |
| `@org.talend.sdk.component.api.configuration.ui.layout.GridLayouts` | Allows you to configure multiple grid layouts on the same class, qualified with a classifier (name). | - |
| `@org.talend.sdk.component.api.configuration.ui.layout.HorizontalLayout` | Put on a configuration class, it notifies the UI that a horizontal layout is preferred. | `{"ui::horizontallayout":"true"}` |
| `@org.talend.sdk.component.api.configuration.ui.layout.VerticalLayout` | Put on a configuration class, it notifies the UI that a vertical layout is preferred. | `{"ui::verticallayout":"true"}` |
| `@org.talend.sdk.component.api.configuration.ui.widget.Code` | Mark a field as being represented by some code widget (vs textarea for instance). | `{"ui::code::value":"test"}` |
| `@org.talend.sdk.component.api.configuration.ui.widget.Credential` | Mark a field as being a credential. It is typically used to hide the value in the UI. | `{"ui::credential":"true"}` |
| `@org.talend.sdk.component.api.configuration.ui.widget.Structure` | Mark a `List<String>` or `Map<String, String>` field as being represented as the component data selector (field names generally or field names as key and type as value). | `{"ui::structure::type":"null","ui::structure::discoverSchema":"test","ui::structure::value":"test"}` |
| `@org.talend.sdk.component.api.configuration.ui.widget.TextArea` | Mark a field as being represented by a textarea (multiline text input). | `{"ui::textarea":"true"}` |

 When using the programmatic API, metadata is prefixed with `tcomp::`. This prefix is stripped in the web for convenience, and the previous table uses the web keys.

Target support should cover `org.talend.core.model.process.EParameterFieldType` but you need to ensure that the web renderer is able to handle the same widgets.

##### Validations

You can also use other types of validation that are similar to `@Pattern`:

• `@Min`, `@Max` for numbers.

• `@Uniques` for collection values.

• `@Required` for a required configuration.

#### Registering components

As you may have read in the Getting Started, you need an annotation to register your component through the `family` method. Multiple components can use the same `family` value but the `family` + `name` pair must be unique for the system.

In order to share the same component family name and to avoid repetitions in all `family` methods, you can use the `@Components` annotation on the root package of your component. It allows you to define the component family and the categories the component belongs to (`Misc` by default if not set).

Here is a sample `package-info.java`:

``````@Components(name = "my_component_family", categories = "My Category")
package org.talend.sdk.component.sample;

import org.talend.sdk.component.api.component.Components;``````

Another example with an existing component:

``````@Components(name = "Salesforce", categories = {"Business", "Cloud"})
package org.talend.sdk.component.sample;

import org.talend.sdk.component.api.component.Components;``````

Components can require metadata to be integrated in Talend Studio or Cloud platforms. Metadata is set on the component class and belongs to the `org.talend.sdk.component.api.component` package.

| API | Description |
|---|---|
| `@Icon` | Sets an icon key used to represent the component. You can use a custom key with the `custom()` method but the icon may not be rendered properly. |
| `@Version` | Sets the component version. 1 by default. |

Example:

``````@Icon(FILE_XML_O)
@PartitionMapper(name = "jaxbInput")
public class JaxbPartitionMapper implements Serializable {
// ...
}``````
###### Managing version configuration

If some changes impact the configuration, they can be managed through a migration handler at the component level (enabling trans-model migration support).

The `@Version` annotation supports a `migrationHandler` method which migrates the incoming configuration to the current model.

For example, if the `filepath` configuration entry from v1 changed to `location` in v2, you can remap the value in your `MigrationHandler` implementation.
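
A minimal sketch of that remapping is shown below as a plain static method, so that it can be read and run without the framework. The class name `FilePathMigration` is hypothetical, the method only mirrors the shape of a migration callback (incoming version plus key/value configuration), and the keys are assumed to be exactly `filepath` and `location`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical migration step: renames "filepath" (v1) to "location" (v2).
final class FilePathMigration {
    private FilePathMigration() {
    }

    // Returns a migrated copy of the incoming configuration; the input map is left untouched.
    static Map<String, String> migrate(final int incomingVersion, final Map<String, String> incomingData) {
        final Map<String, String> migrated = new HashMap<>(incomingData);
        if (incomingVersion < 2 && migrated.containsKey("filepath")) {
            migrated.put("location", migrated.remove("filepath"));
        }
        return migrated;
    }
}
```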

A best practice is to split migrations into services that you can inject in the migration handler (through constructor) rather than managing all migrations directly in the handler. For example:

``````// full component code structure skipped for brevity, only the migration part is kept
@Version(value = 3, migrationHandler = MyComponent.Migrations.class)
public class MyComponent {
    // the component code...

    private interface VersionConfigurationHandler {
        Map<String, String> migrate(Map<String, String> incomingData);
    }

    public static class Migrations implements MigrationHandler {
        private final List<VersionConfigurationHandler> handlers;

        // VersionConfigurationHandler implementations are decorated with @Service
        public Migrations(final List<VersionConfigurationHandler> migrations) {
            this.handlers = migrations;
            this.handlers.sort(/*some custom logic*/);
        }

        @Override
        public Map<String, String> migrate(int incomingVersion, Map<String, String> incomingData) {
            Map<String, String> out = incomingData;
            // apply each migration step in order, feeding the result of one into the next
            for (VersionConfigurationHandler handler : handlers) {
                out = handler.migrate(out);
            }
            return out;
        }
    }
}``````

What is important to notice in this snippet is not the way the code is organized, but rather the fact that you can organize your migrations the way that best fits your component.

If you need to apply migrations in a specific order, make sure that they are sorted.

 Consider this API as a migration callback rather than a migration API. Adjust the migration code structure you need behind the `MigrationHandler`, based on your component requirements, using service injection.
###### @PartitionMapper

`@PartitionMapper` marks a partition mapper:

``````@PartitionMapper(family = "demo", name = "my_mapper")
public class MyMapper {
}``````
@Emitter

`@Emitter` is a shortcut for `@PartitionMapper` when you don’t support distribution. It enforces an implicit partition mapper execution with an assessor size of 1 and a split returning itself.

``````@Emitter(family = "demo", name = "my_input")
public class MyInput {
}``````
###### @Processor

A class decorated with `@Processor` is considered as a producer factory:

``````@Processor(family = "demo", name = "my_processor")
public class MyProcessor {
}``````

#### Component internationalization

In common cases, you can store messages using a properties file in your component module to use internationalization.

Store the properties file in the same package as the related components and name it `Messages`. For example, `org.talend.demo.MyComponent` uses `org.talend.demo.Messages[locale].properties`.

##### Default components keys

Out of the box components are internationalized using the same location logic for the resource bundle. The supported keys are:

| Name Pattern | Description |
|---|---|
| `${family}._displayName` | Display name of the family. |
| `${family}.${configurationType}.${name}._displayName` | Display name of a configuration type (dataStore or dataSet). |
| `${family}.${component_name}._displayName` | Display name of the component (used by the GUIs). |
| `${property_path}._displayName` | Display name of the option. |
| `${simple_class_name}.${property_name}._displayName` | Display name of the option using its class name. |
| `${enum_simple_class_name}.${enum_name}._displayName` | Display name of the `enum_name` value of the `enum_simple_class_name` enumeration. |
| `${property_path}._placeholder` | Placeholder of the option. |

Example of configuration for a component named `list` and belonging to the `memory` family (`@Emitter(family = "memory", name = "list")`):

``memory.list._displayName = Memory List``

Configuration classes can be translated using the simple class name in the messages properties file. This is useful in case of common configurations shared by multiple components.

For example, if you have a configuration class as follows:

``````public class MyConfig {

@Option
private String host;

@Option
private int port;
}``````

You can give it a translatable display name by adding `${simple_class_name}.${property_name}._displayName` to `Messages.properties` under the same package as the configuration class.

``````MyConfig.host._displayName = Server Host Name
MyConfig.host._placeholder = Enter Server Host Name...

MyConfig.port._displayName = Server Port
MyConfig.port._placeholder = Enter Server Port...``````
 If you have a display name using the property path, it overrides the display name defined using the simple class name. This rule also applies to placeholders.
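
The precedence rule above can be sketched as a simple fallback lookup. This is an assumption about the resolution order only, not the framework's actual implementation: `DisplayNameLookup` is a hypothetical helper, and the property path passed to it is whatever full path the option has in your component.

```java
import java.util.Properties;

// Hypothetical helper, not a framework class.
// It shows a lookup where the property-path key wins over the class-name key.
final class DisplayNameLookup {

    private DisplayNameLookup() {
        // static helper, no instances
    }

    // Tries "<propertyPath>._displayName" first, then falls back to
    // "<simpleClassName>.<propertyName>._displayName". Returns null if neither is defined.
    static String displayName(final Properties messages, final String propertyPath,
                              final String simpleClassName, final String propertyName) {
        final String byPath = messages.getProperty(propertyPath + "._displayName");
        if (byPath != null) {
            return byPath;
        }
        return messages.getProperty(simpleClassName + "." + propertyName + "._displayName");
    }
}
```

The same fallback applies to `_placeholder` keys by swapping the suffix.
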

### Components Packaging

Talend Component scanning is based on plugins. To make sure that plugins can be developed in parallel and avoid conflicts, plugins need to be isolated (component or group of components in a single jar/plugin).

Multiple options are available:

• Graph classloading: this option allows you to link the plugins and dependencies together dynamically in any direction.

• Tree classloading: this option is commonly used by Servlet containers, where plugins are web applications.

• Flat classpath: listed for completeness but rejected by design because it doesn’t comply with this requirement.

To avoid the complexity added by this layer, Talend Component Kit relies on tree classloading. The advantage is that you don’t need to define the relationship with other plugins/dependencies, because it is built-in.

Here is a representation of this solution:

The shared area contains Talend Component Kit API, which only contains by default the classes shared by the plugins.

##### Packaging a plugin
 This section explains the overall way to handle dependencies but the Talend Maven plugin provides a shortcut for that.

A plugin is a JAR file that was enriched with the list of its dependencies. By default, Talend Component Kit runtime is able to read the output of `maven-dependency-plugin` in `TALEND-INF/dependencies.txt`. You just need to make sure that your component defines the following plugin:

``````<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<version>3.0.2</version>
<executions>
<execution>
<id>create-TALEND-INF/dependencies.txt</id>
<phase>process-resources</phase>
<goals>
<goal>list</goal>
</goals>
<configuration>
<outputFile>${project.build.outputDirectory}/TALEND-INF/dependencies.txt</outputFile>
</configuration>
</execution>
</executions>
</plugin>``````

Once built, check the JAR file and look for the following lines:

``````$ unzip -p target/mycomponent-1.0.0-SNAPSHOT.jar TALEND-INF/dependencies.txt

The following files have been resolved:
org.talend.sdk.component:component-api:jar:1.0.0-SNAPSHOT:provided
org.apache.geronimo.specs:geronimo-annotation_1.3_spec:jar:1.0:provided
org.superbiz:awesome-project:jar:1.2.3:compile
junit:junit:jar:4.12:test
org.hamcrest:hamcrest-core:jar:1.3:test``````

What is important to see is the scope related to the artifacts:

• The APIs (`component-api` and `geronimo-annotation_1.3_spec`) are `provided` because you can consider them to be there when executing (they come with the framework).

• Your specific dependencies (`awesome-project` in the example above) are marked as `compile`: they are included as needed dependencies by the framework (note that using `runtime` works too).

• The other dependencies are ignored. For example, `test` dependencies.
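
The scope rules above can be sketched as a small filter over the lines of `TALEND-INF/dependencies.txt`. This is an illustrative assumption, not the parser used by the runtime: `DependenciesTxt` is a hypothetical class, and it assumes each dependency line ends with `:<scope>` exactly as in the listing above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the framework's parser.
// It keeps only the dependencies that the rules above describe as needed at runtime.
final class DependenciesTxt {

    private DependenciesTxt() {
        // static helper, no instances
    }

    // Returns the coordinates (without the trailing scope) of the lines
    // whose scope is "compile" or "runtime". Header and blank lines are skipped.
    static List<String> runtimeCoordinates(final List<String> lines) {
        final List<String> kept = new ArrayList<>();
        for (final String raw : lines) {
            final String line = raw.trim();
            final String[] segments = line.split(":");
            if (segments.length < 5) {
                continue; // not a "group:artifact:type:version:scope" line
            }
            final String scope = segments[segments.length - 1];
            if ("compile".equals(scope) || "runtime".equals(scope)) {
                kept.add(line.substring(0, line.lastIndexOf(':')));
            }
        }
        return kept;
    }
}
```

Applied to the listing above, only `org.superbiz:awesome-project:jar:1.2.3` is kept: the `provided` APIs come with the framework and the `test` dependencies are ignored.
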

##### Packaging an application

Even if a flat classpath deployment is possible, it is not recommended because it would then reduce the capabilities of the components.

###### Dependencies

The way the framework resolves dependencies is based on a local Maven repository layout. As a quick reminder, it looks like:

``````.
├── groupId1
│   └── artifactId1
│       ├── version1
│       │   └── artifactId1-version1.jar
│       └── version2
│           └── artifactId1-version2.jar
└── groupId2
    └── artifactId2
        └── version1
            └── artifactId2-version1.jar``````

This is all the layout the framework uses. The logic converts the tuple `{groupId, artifactId, version, type (jar)}` to the path in the repository.
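
The conversion from coordinates to a repository path can be sketched as follows. This is a minimal illustration and not the framework's code: `RepositoryPaths` is a hypothetical class, and it assumes the standard Maven 2 layout, in which the dots of the `groupId` become directory separators.

```java
// Hypothetical helper, not a framework class.
// It maps Maven coordinates to a path relative to the repository root.
final class RepositoryPaths {

    private RepositoryPaths() {
        // static helper, no instances
    }

    // Builds "<groupId as folders>/<artifactId>/<version>/<artifactId>-<version>.<type>".
    static String toPath(final String groupId, final String artifactId,
                         final String version, final String type) {
        return groupId.replace('.', '/') + '/' + artifactId + '/' + version + '/'
                + artifactId + '-' + version + '.' + type;
    }
}
```

For example, `RepositoryPaths.toPath("groupId1", "artifactId1", "version1", "jar")` gives `groupId1/artifactId1/version1/artifactId1-version1.jar`, which matches the first JAR of the tree above.
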

Talend Component Kit runtime has two ways to find an artifact:

• From the file system based on a configured Maven 2 repository.

• From a fat JAR (uber JAR) with a nested Maven repository under `MAVEN-INF/repository`.

The first option uses either `${user.home}/.m2/repository` (default) or a specific path configured when creating a `ComponentManager`. The nested repository option needs some configuration during the packaging to ensure the repository is correctly created.

Creating a nested Maven repository with maven-shade-plugin

To create the nested `MAVEN-INF/repository` repository, you can use the `nested-maven-repository` extension:

``````<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.0.0</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer implementation="org.talend.sdk.component.container.maven.shade.ContainerDependenciesTransformer">
<session>${session}</session>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
<dependencies>
<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>nested-maven-repository</artifactId>
<version>${the.plugin.version}</version>
</dependency>
</dependencies>
</plugin>``````

###### Listing needed plugins

Plugins are usually programmatically registered. If you want to make some of them automatically available, you need to generate a `TALEND-INF/plugins.properties` file that maps a plugin name to coordinates found with the Maven mechanism described above.

You can enrich `maven-shade-plugin` to do it:

``````<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.0.0</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer implementation="org.talend.sdk.component.container.maven.shade.PluginTransformer">
<session>${session}</session>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
<dependencies>
<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>nested-maven-repository</artifactId>
<version>${the.plugin.version}</version>
</dependency>
</dependencies>
</plugin>``````

###### maven-shade-plugin extensions

Here is a final job/application bundle based on maven-shade-plugin:

``````<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.0.0</version>
<configuration>
<createDependencyReducedPom>false</createDependencyReducedPom>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<shadedClassifierName>shaded</shadedClassifierName>
<transformers>
<transformer implementation="org.talend.sdk.component.container.maven.shade.ContainerDependenciesTransformer">
<session>${session}</session>
<userArtifacts>
<artifact>
<groupId>org.talend.sdk.component</groupId>
<artifactId>sample-component</artifactId>
<version>1.0</version>
<type>jar</type>
</artifact>
</userArtifacts>
</transformer>
<transformer implementation="org.talend.sdk.component.container.maven.shade.PluginTransformer">
<session>${session}</session>
<userArtifacts>
<artifact>
<groupId>org.talend.sdk.component</groupId>
<artifactId>sample-component</artifactId>
<version>1.0</version>
<type>jar</type>
</artifact>
</userArtifacts>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
<dependencies>
<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>nested-maven-repository-maven-plugin</artifactId>
<version>${the.version}</version>
</dependency>
</dependencies>
</plugin>``````
 The configuration unrelated to transformers depends on your application.

`ContainerDependenciesTransformer` embeds a Maven repository and `PluginTransformer` creates a file that lists the artifacts representing plugins (one per line).

Both transformers share most of their configuration:

• `session`: must be set to `${session}`. This is used to retrieve dependencies.

• `scope`: a comma-separated list of scopes to include in the artifact filtering (note that the default will rely on `provided` but you can replace it by `compile`, `runtime`, `runtime+compile`, `runtime+system` or `test`).

• `include`: a comma-separated list of artifacts to include in the artifact filtering.

• `exclude`: a comma-separated list of artifacts to exclude in the artifact filtering.

• `userArtifacts`: the set of component artifacts to include, as a list of artifacts (groupId, artifactId, version, type - optional, file - optional for plugin transformer, scope - optional) which can be forced inline. This parameter is mainly useful for `PluginTransformer`.

• `includeTransitiveDependencies`: whether transitive dependencies of the components are included. Set to `true` by default.

• `includeProjectComponentDependencies`: whether project component dependencies are included. Set to `false` by default. It is not needed when a job project uses isolation for components.

With the component tooling, it is recommended to keep default locations. Also, if you need to use project dependencies, you may need to refactor your project structure to ensure component isolation. Talend Component Kit lets you handle that part but the recommended practice is to use `userArtifacts` for the components instead of project dependencies.

ContainerDependenciesTransformer

`ContainerDependenciesTransformer` specific configuration is as follows:

• `repositoryBase`: base repository location (`MAVEN-INF/repository` by default).

• `ignoredPaths`: a comma-separated list of folders not to create in the output JAR. This is common for folders already created by other transformers/build parts.

PluginTransformer

`PluginTransformer` specific configuration is as follows:

• `pluginListResource`: location of the plugin list resource (`TALEND-INF/plugins.properties` by default).

For example, if you want to list only the plugins you use, you can configure this transformer as follows:

``````<transformer implementation="org.talend.sdk.component.container.maven.shade.PluginTransformer">
<session>${session}</session>
<include>org.talend.sdk.component:component-x,org.talend.sdk.component:component-y,org.talend.sdk.component:component-z</include>
</transformer>``````
##### Component scanning rules and default exclusions

The framework uses two kinds of filtering when scanning your component: one based on the JAR name and one based on the package name. Make sure that your component definitions (including services) are in a scanned module if they are not registered manually using `ComponentManager.instance().addPlugin()`, and that the component package is not excluded.

###### Jars Scanning

To find components, the framework can scan the classpath. Because the whole classpath can be huge and scanning it would significantly slow down startup, several JARs are excluded out of the box.

These JARs use the following prefixes:

• ApacheJMeter

• FastInfoset

• HdrHistogram

• HikariCP

• PDFBox

• RoaringBitmap-

• XmlSchema-

• accessors-smart

• activation-

• activeio-

• activemq-

• aeron

• aether-

• agrona

• akka-

• animal-sniffer-annotation

• annotation

• ant-

• antlr-

• antlr4-

• aopalliance-

• apache-el

• apache-mime4j

• apacheds-

• api-asn1-

• api-common-

• api-util-

• apiguardian-api-

• app-

• archaius-core

• args4j-

• arquillian-

• asciidoctorj-

• asm-

• aspectj

• async-http-client-

• auto-value-

• autoschema-

• avalon-framework-

• avro-

• avro4s-

• awaitility-

• aws-

• axis-

• axis2-

• base64-

• batchee-jbatch

• batik-

• bcmail

• bcpkix

• bcprov-

• beam-model-

• beam-runners-

• beam-sdks-

• bigtable-client-

• bigtable-protos-

• boilerpipe-

• bonecp

• bootstrap.jar

• brave-

• bsf-

• bval

• byte-buddy

• c3p0-

• cache

• carrier

• cassandra-driver-core

• catalina-

• catalina.jar

• cats

• cdi-

• cglib-

• charsets.jar

• chill

• classindex

• classmate

• classutil

• classycle

• cldrdata

• commands-

• common-

• commons-

• component-api

• component-form

• component-runtime

• component-server

• component-spi

• component-studio

• components-api

• components-common

• compress-lzf

• config

• constructr

• container-core

• contenttype

• coverage-agent

• cryptacular-

• cssparser-

• curator-

• curvesapi-

• cxf-

• daikon

• databinding

• dataquality

• dataset-

• datastore-

• debugger-agent

• deltaspike-

• deploy.jar

• derby-

• derbyclient-

• derbynet-

• dnsns

• dom4j

• draw2d

• easymock-

• ecj-

• ehcache-

• el-api

• enumeratum

• enunciate-core-annotations

• error_prone_annotations

• expressions

• fastutil

• feign-core

• feign-hystrix

• feign-slf4j

• filters-helpers

• findbugs-

• fluent-hc

• fluentlenium-core

• fontbox

• freemarker-

• fusemq-leveldb-

• gax-

• gcsio-

• gef-

• geocoder

• geronimo-

• gmbal

• gpars-

• gragent.jar

• graph

• grizzled-scala

• grizzly-

• groovy-

• grpc-

• gson-

• guava-

• guice-

• h2-

• hamcrest-

• hawtbuf-

• hawtdispatch-

• hawtio-

• hawtjni-runtime

• help-

• hibernate-

• hk2-

• howl-

• hsqldb-

• htmlunit-

• htrace-

• httpclient-

• httpcore-

• httpmime

• hystrix

• iban4j-

• icu4j-

• idb-

• idea_rt.jar

• instrumentation-api

• ion-java

• isoparser-

• istack-commons-runtime-

• ivy-

• j2objc-annotations

• jBCrypt

• jaccess

• jackcess-

• jackson-

• janino-

• jansi-

• jasper-el.jar

• jasper.jar

• jasypt-

• java-atk-wrapper

• java-libpst-

• java-support-

• java-xmlbuilder-

• javacsv

• javaee-

• javaee-api

• javassist-

• javaws.jar

• javax.

• jaxb-

• jaxp-

• jbake-

• jboss-

• jbossall-

• jbosscx-

• jbossjts-

• jbosssx-

• jcache

• jce.jar

• jcip-annotations

• jcl-over-slf4j-

• jcommander-

• jdbcdslog-1

• jempbox

• jersey-

• jets3t

• jettison-

• jetty-

• jface

• jfairy

• jffi

• jfr.jar

• jfxrt.jar

• jfxswt

• jhighlight

• jjwt

• jline

• jmatio-

• jmdns-

• jmespath-

• jms

• jmustache

• jna-

• jnr-

• jobs-

• joda-convert

• joda-time-

• johnzon-

• jolokia-

• jopt-simple

• jruby-

• json-

• json4s-

• jsonb-api

• jsoup-

• jsp-api

• jsr

• jsse.jar

• jta

• jul-to-slf4j-

• juli-

• junit-

• junit5-

• juniversalchardet

• junrar-

• jwt

• jython

• kafka

• kotlin-runtime

• kryo

• leveldb

• libphonenumber

• lift-json

• lmdbjava

• localedata

• log4j-

• logback

• logging-event-layout

• logkit-

• lombok

• lucene

• lz4

• machinist

• macro-compat

• mail-

• management-

• mapstruct-

• maven-

• mbean-annotation-api-

• meecrowave-

• mesos-

• metrics-

• microprofile-config-api-

• mimepull-

• mina-

• minlog

• mockito-core

• mqtt-client-

• multitenant-core

• multiverse-core-

• mx4j-

• myfaces-

• mysql-connector-java-

• nashorn

• neethi-

• neko-htmlunit

• nekohtml-

• netflix

• netty-

• nimbus-jose-jwt

• objenesis-

• okhttp

• okio

• opencensus-

• openjpa-

• openmdx-

• opennlp-

• opensaml-

• opentest4j-

• openwebbeans-

• openws-

• ops4j-

• org.apache.aries

• org.apache.commons

• org.apache.log4j

• org.eclipse.

• org.junit.

• org.osgi.core-

• org.osgi.enterprise

• org.talend

• orient-commons-

• orientdb-core-

• orientdb-nativeos-

• oro-

• osgi

• paranamer

• parquet

• pax-url

• pdfbox

• play

• plexus-

• plugin.jar

• poi-

• postgresql

• preferences-

• prefixmapper

• proto-

• protobuf-

• py4j-

• pyrolite-

• qdox-

• quartz-2

• quartz-openejb-

• reactive-streams

• reflectasm-

• reflections

• regexp-

• registry-

• resources.jar

• rhino

• ribbon

• rmock-

• rome

• routes-compiler

• routines

• rt.jar

• runners

• runtime-

• rxjava

• rxnetty

• saaj-

• sac-

• scala

• scalap

• scalatest

• scannotation-

• selenium

• serializer-

• serp-

• service-common

• servlet-api-

• servo-

• shapeless

• shrinkwrap-

• sisu-guice

• sisu-inject

• slf4j-

• slick

• smack-

• smackx-

• snakeyaml-

• snappy-

• spark-

• specs2

• spring-

• sshd-

• ssl-config-core

• stax-api-

• stax2-api-

• stream

• sunec.jar

• sunjce_provider

• sunpkcs11

• surefire-

• swagger-

• swizzle-

• sxc-

• system-rules

• tachyon-

• tagsoup-

• talend-icon

• test-agent

• test-interface

• testng-

• threetenbp

• tika-

• tomcat

• tomee-

• tools.jar

• twirl

• tyrex

• uncommons

• unused

• util

• validation-api-

• velocity-

• wagon-

• wandou

• webbeans-

• websocket

• woodstox-core

• workbench

• ws-commons-util-

• wsdl4j-

• wss4j-

• wstx-asl-

• xalan-

• xbean-

• xercesImpl-

• xlsx-streamer-

• xml-apis-

• xml-resolver-

• xmlbeans-

• xmlenc-

• xmlgraphics-

• xmlpcore

• xmlpull-

• xmlrpc-

• xmlschema-

• xmlsec-

• xmltooling-

• xmlunit-

• xstream-

• xz-

• zipfs.jar

• zipkin-

• ziplock-

• zkclient

• zookeeper-
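
The prefix-based exclusion can be pictured with the following minimal sketch. It is an assumption about the matching logic (a case-sensitive `startsWith` on the JAR file name), not the framework's actual implementation. `JarNameFilter` is a hypothetical class and its prefix list is only a short extract of the list above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a framework class.
// It decides whether a JAR is a scanning candidate based on its file name.
final class JarNameFilter {

    // Short extract of the default exclusions listed above, for illustration only.
    private static final List<String> EXCLUDED_PREFIXES =
            Arrays.asList("commons-", "jackson-", "slf4j-", "rt.jar");

    private JarNameFilter() {
        // static helper, no instances
    }

    // Returns false as soon as the file name starts with an excluded prefix.
    static boolean isScanned(final String jarFileName) {
        for (final String prefix : EXCLUDED_PREFIXES) {
            if (jarFileName.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

With such a rule, a component JAR whose name starts with one of the listed prefixes (for example `common-` or `util`) would be skipped, so it is worth checking the name of your module against the list.
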

###### Package Scanning

Because the framework can be used with fat JARs or shaded JARs, and because it still relies on scanning, it is important to avoid scanning all the classes, for performance reasons.

Therefore, the following packages are ignored:

• com.codehale.metrics

• com.ctc.wstx

• com.datastax.driver.core

• com.fasterxml.jackson.annotation

• com.fasterxml.jackson.core

• com.fasterxml.jackson.databind

• com.fasterxml.jackson.dataformat

• com.fasterxml.jackson.module

• com.ibm.wsdl

• com.jcraft.jsch

• com.kenai.jffi

• com.kenai.jnr

• com.sun.istack

• com.sun.xml.bind

• com.sun.xml.messaging.saaj

• com.sun.xml.txw2

• com.thoughtworks

• io.jsonwebtoken

• io.netty

• io.swagger.annotations

• io.swagger.config

• io.swagger.converter

• io.swagger.core

• io.swagger.jackson

• io.swagger.jaxrs

• io.swagger.model

• io.swagger.models

• io.swagger.util

• javax

• jnr

• junit

• net.sf.ehcache

• net.shibboleth.utilities.java.support

• org.aeonbits.owner

• org.apache.activemq

• org.apache.beam

• org.apache.bval

• org.apache.camel

• org.apache.catalina

• org.apache.commons.beanutils

• org.apache.commons.cli

• org.apache.commons.codec

• org.apache.commons.collections

• org.apache.commons.compress

• org.apache.commons.dbcp2

• org.apache.commons.digester

• org.apache.commons.io

• org.apache.commons.jcs.access

• org.apache.commons.jcs.auxiliary

• org.apache.commons.jcs.engine

• org.apache.commons.jcs.io

• org.apache.commons.jcs.utils

• org.apache.commons.lang

• org.apache.commons.lang3

• org.apache.commons.logging

• org.apache.commons.pool2

• org.apache.coyote

• org.apache.cxf

• org.apache.geronimo.javamail

• org.apache.geronimo.mail

• org.apache.geronimo.osgi

• org.apache.geronimo.specs

• org.apache.http

• org.apache.jcp

• org.apache.johnzon

• org.apache.juli

• org.apache.logging.log4j.core

• org.apache.logging.log4j.jul

• org.apache.logging.log4j.util

• org.apache.logging.slf4j

• org.apache.meecrowave

• org.apache.myfaces

• org.apache.naming

• org.apache.neethi

• org.apache.openejb

• org.apache.openjpa

• org.apache.oro

• org.apache.tomcat

• org.apache.tomee

• org.apache.velocity

• org.apache.webbeans

• org.apache.ws

• org.apache.wss4j

• org.apache.xbean

• org.apache.xml

• org.apache.xml.resolver

• org.bouncycastle

• org.codehaus.jackson

• org.codehaus.stax2

• org.codehaus.swizzle.Grep

• org.codehaus.swizzle.Lexer

• org.cryptacular

• org.eclipse.jdt.core

• org.eclipse.jdt.internal

• org.fusesource.hawtbuf

• org.h2

• org.hamcrest

• org.hsqldb

• org.jasypt

• org.jboss.marshalling

• org.joda.time

• org.jose4j

• org.junit

• org.jvnet.mimepull

• org.metatype.sxc

• org.objectweb.asm

• org.objectweb.howl

• org.openejb

• org.opensaml

• org.slf4j

• org.swizzle

• org.terracotta.context

• org.terracotta.entity

• org.terracotta.modules.ehcache

• org.terracotta.statistics

• org.tukaani

• org.yaml.snakeyaml

• serp

It is not recommended, but it is possible to add a `TALEND-INF/scanning.properties` file to your plugin module, with `classloader.includes` and `classloader.excludes` entries, to refine the scanning with custom rules. In such a case, exclusions win over inclusions.
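
The "exclusions win over inclusions" rule can be sketched as follows. This is only an assumption about the decision order, not the framework's implementation: `ScanningRules` is a hypothetical class, the rules are treated as plain name prefixes, and the `defaultDecision` parameter stands for whatever the framework decides when no custom rule matches.

```java
import java.util.List;

// Hypothetical helper, not a framework class.
// It combines custom include and exclude rules, checking exclusions first.
final class ScanningRules {

    private ScanningRules() {
        // static helper, no instances
    }

    // Exclusions are evaluated before inclusions, so a name matching both is rejected.
    static boolean accept(final String name, final List<String> includes,
                          final List<String> excludes, final boolean defaultDecision) {
        if (startsWithAny(name, excludes)) {
            return false;
        }
        if (startsWithAny(name, includes)) {
            return true;
        }
        return defaultDecision;
    }

    private static boolean startsWithAny(final String name, final List<String> prefixes) {
        for (final String prefix : prefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```
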

### Build tools

#### Maven Plugin

`talend-component-maven-plugin` helps you write components that match best practices and transparently generates the metadata used by Talend Studio.

You can use it as follows:

``````<plugin>
<groupId>org.talend.sdk.component</groupId>
<artifactId>talend-component-maven-plugin</artifactId>
<version>${component.version}</version>
</plugin>``````

This plugin is also an extension so you can declare it in your `build/extensions` block as:

``````<extension>
<groupId>org.talend.sdk.component</groupId>
<artifactId>talend-component-maven-plugin</artifactId>
<version>${component.version}</version>
</extension>``````

Used as an extension, the `dependencies`, `validate` and `documentation` goals will be set up.

##### Dependencies

The first goal is a shortcut for the `maven-dependency-plugin`. It creates the `TALEND-INF/dependencies.txt` file with the `compile` and `runtime` dependencies, allowing the component to use it at runtime:

``````<plugin>
<groupId>org.talend.sdk.component</groupId>
<artifactId>talend-component-maven-plugin</artifactId>
<version>${component.version}</version>
<executions>
<execution>
<id>talend-dependencies</id>
<goals>
<goal>dependencies</goal>
</goals>
</execution>
</executions>
</plugin>``````

##### Validate

This goal helps you validate the common programming model of the component. To activate it, you can use the following execution definition:

``````<plugin>
<groupId>org.talend.sdk.component</groupId>
<artifactId>talend-component-maven-plugin</artifactId>
<version>${component.version}</version>
<executions>
<execution>
<id>talend-component-validate</id>
<goals>
<goal>validate</goal>
</goals>
</execution>
</executions>
</plugin>``````

It is bound to the `process-classes` phase by default. When executed, it performs several validations that can be disabled by setting the corresponding flags to `false` in the `<configuration>` block of the execution:

| Name | Description | Default |
|---|---|---|
| validateInternationalization | Validates that resource bundles are present and contain commonly used keys (for example, `_displayName`). | true |
| validateModel | Ensures that components pass validations of the `ComponentManager` and Talend Component runtime. | true |
| validateSerializable | Ensures that components are `Serializable`. This is a sanity check, the component is not actually serialized here. If you have a doubt, make sure to test it. It also checks that any `@Internationalized` class is valid and has its keys. | true |
| | Ensures that components have an `@Icon` and a `@Version` defined. | true |
| validateDataStore | Ensures that any `@DataStore` defines a `@HealthCheck`. | true |
| validateComponent | Ensures that the native programming model is respected. You can disable it when using another programming model like Beam. | true |
| validateActions | Validates action signatures for actions not tolerating dynamic binding (`@HealthCheck`, `@DynamicValues`, and so on). It is recommended to keep it set to `true`. | true |
| validateFamily | Validates the family by verifying that the package containing the `@Components` has an `@Icon` property defined. | true |
| validateDocumentation | Ensures that all components and `@Option` properties are documented using the `@Documentation` property. | true |
| validateLayout | Ensures that the layout is referencing existing options and properties. | true |
| validateOptionNames | Ensures that the option names are compliant with the framework. It is highly recommended and safer to keep it set to `true`. | true |

##### Documentation

This goal generates an Asciidoc file documenting your component from the configuration model (`@Option`) and the `@Documentation` property that you can add to options and to the component itself.

``````<plugin>
<groupId>org.talend.sdk.component</groupId>
<artifactId>talend-component-maven-plugin</artifactId>
<version>${component.version}</version>
<executions>
<execution>
<id>talend-component-documentation</id>
<goals>
<goal>asciidoc</goal>
</goals>
</execution>
</executions>
</plugin>``````

| Name | Description | Default |
|---|---|---|
| level | Level of the root title. | 2 (`==`) |
| output | Output folder path. It is recommended to keep it to the default value. | `${classes}/TALEND-INF/documentation.adoc` |
| formats | Map of the renderings to do. Keys are the format (`pdf` or `html`) and values the output paths. | - |
| attributes | Map of asciidoctor attributes when formats is set. | - |
| templateDir / templateEngine | Template configuration for the rendering. | - |
| title | Document title. | `${project.name}` |
| attachDocumentations | Allows to attach (and deploy) the documentations (`.adoc`, and `formats` keys) to the project. | true |

If you use the plugin as an extension, you can add the `talend.documentation.htmlAndPdf` property and set it to `true` in your project to automatically get HTML and PDF renderings of the documentation.

###### Rendering your documentation

To render the generated documentation in HTML or PDF, you can use the Asciidoctor Maven plugin (or Gradle equivalent). You can configure both executions if you want both HTML and PDF renderings.

Make sure to execute the rendering after the documentation generation.

HTML rendering

If you prefer an HTML rendering, you can configure the following execution in the asciidoctor plugin. The example below:

1. Generates the components documentation in `target/classes/TALEND-INF/documentation.adoc`.

2. Renders the documentation as an HTML file stored in `target/documentation/documentation.html`.

``````<plugin> <!-- (1) -->
<groupId>org.talend.sdk.component</groupId>
<artifactId>talend-component-maven-plugin</artifactId>
<version>${talend-component-kit.version}</version>
<executions>
<execution>
<id>documentation</id>
<phase>prepare-package</phase>
<goals>
<goal>asciidoc</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin> <!-- (2) -->
<groupId>org.asciidoctor</groupId>
<artifactId>asciidoctor-maven-plugin</artifactId>
<version>1.5.6</version>
<executions>
<execution>
<id>doc-html</id>
<phase>prepare-package</phase>
<goals>
<goal>process-asciidoc</goal>
</goals>
<configuration>
<sourceDirectory>${project.build.outputDirectory}/TALEND-INF</sourceDirectory>
<sourceDocumentName>documentation.adoc</sourceDocumentName>
<outputDirectory>${project.build.directory}/documentation</outputDirectory>
<backend>html5</backend>
</configuration>
</execution>
</executions>
</plugin>``````
PDF rendering

If you prefer a PDF rendering, you can configure the following execution in the asciidoctor plugin:

``````<plugin>
<groupId>org.asciidoctor</groupId>
<artifactId>asciidoctor-maven-plugin</artifactId>
<version>1.5.6</version>
<executions>
<execution>
<id>doc-pdf</id>
<phase>prepare-package</phase>
<goals>
<goal>process-asciidoc</goal>
</goals>
<configuration>
<sourceDirectory>${project.build.outputDirectory}/TALEND-INF</sourceDirectory>
<sourceDocumentName>documentation.adoc</sourceDocumentName>
<outputDirectory>${project.build.directory}/documentation</outputDirectory>
<backend>pdf</backend>
</configuration>
</execution>
</executions>
<dependencies>
<dependency>
<groupId>org.asciidoctor</groupId>
<artifactId>asciidoctorj-pdf</artifactId>
<version>1.5.0-alpha.16</version>
</dependency>
</dependencies>
</plugin>``````
Including the documentation into a document

If you want to add some more content or a title, you can include the generated document into another document using Asciidoc `include` directive.

For example:

``````= Super Components
Super Writer
:toc:
:toclevels: 3
:source-highlighter: prettify
:numbered:
:icons: font
:hide-uri-scheme:
:imagesdir: images

include::{generated_doc}/documentation.adoc[]``````

To be able to do that, you need to pass the `generated_doc` attribute to the plugin. For example:

``````<plugin>
<groupId>org.asciidoctor</groupId>
<artifactId>asciidoctor-maven-plugin</artifactId>
<version>1.5.6</version>
<executions>
<execution>
<id>doc-html</id>
<phase>prepare-package</phase>
<goals>
<goal>process-asciidoc</goal>
</goals>
<configuration>
<sourceDirectory>${project.basedir}/src/main/asciidoc</sourceDirectory>
<sourceDocumentName>my-main-doc.adoc</sourceDocumentName>
<outputDirectory>${project.build.directory}/documentation</outputDirectory>
<backend>html5</backend>
<attributes>
<generated_doc>${project.build.outputDirectory}/TALEND-INF</generated_doc>
</attributes>
</configuration>
</execution>
</executions>
</plugin>``````

This is optional but allows you to reuse Maven placeholders to pass paths, which can be convenient in an automated build.

You can find more customization options on the Asciidoctor website.

##### Testing a component web rendering

Testing the rendering of your component configuration into the Studio requires deploying the component in Talend Studio (refer to the Studio documentation).

If you need to deploy your component into a Cloud (web) environment, you can test its web rendering by using the `web` goal of the plugin:

1. Run the `mvn talend-component:web` command.

2. Open the following URL in a web browser: `localhost:8080`.

3. Select the component form you want to see from the treeview on the left.

The selected form is displayed on the right.

Two parameters are available with the plugin:

Make sure to install the artifact before using this command because it reads the component JAR from the local Maven repository.

##### Generating inputs or outputs

The Mojo `generate` (Maven plugin goal) of the same plugin also embeds a generator that you can use to bootstrap any input or output component:

``````<plugin>
<groupId>org.talend.sdk.component</groupId>
<artifactId>talend-component-maven-plugin</artifactId>
<version>${talend-component.version}</version>
<executions>
<execution> <!-- (1) -->
<id>generate-input</id>
<phase>generate-sources</phase>
<goals>
<goal>generate</goal>
</goals>
<configuration>
<type>input</type>
</configuration>
</execution>
<execution> <!-- (2) -->
<id>generate-output</id>
<phase>generate-sources</phase>
<goals>
<goal>generate</goal>
</goals>
<configuration>
<type>output</type>
</configuration>
</execution>
</executions>
</plugin>``````
1. The first execution generates an input (partition mapper + emitter).

2. The second execution generates an output.

It is intended to be used from the command line (or IDE Maven integration) as follows:

``````$ mvn talend-component:generate \
    -Dtalend.generator.type=[input|output] \ (1)
    [-Dtalend.generator.classbase=com.test.MyComponent] \ (2)
    [-Dtalend.generator.family=my-family] \ (3)
    [-Dtalend.generator.pom.read-only=false] (4)``````

1. Select the type of component you want: `input` to generate a mapper and an emitter, or `output` to generate an output processor.

2. Set the class name base (automatically suffixed by the component type). If not set, the package is guessed and the class name is based on the basedir name.

3. Set the component family to use. If not specified, it defaults to the basedir name and removes "component[s]" from it. For example, `my-component` leads to `my` as family, unless it is explicitly set.

4. Specify if the generator needs to add `component-api` to the POM, if not already there. If you already added it, you can set it to `false` directly in the POM.

For this command to work, you need to register the plugin as follows:

``````<plugin>
    <groupId>org.talend.sdk.component</groupId>
    <artifactId>talend-component-maven-plugin</artifactId>
    <version>${talend-component.version}</version>
</plugin>``````

##### Talend Component Archive

Component ARchive (`.car`) is the way to bundle a component to share it in the Talend ecosystem. It is a plain Java ARchive (`.jar`) containing a metadata file and a nested Maven repository containing the component and its dependencies.

``mvn talend-component:car``

This command creates a `.car` file in your build directory. This file can be shared on Talend platforms.

This CAR is executable and exposes the `studio-deploy` command which takes a Talend Studio home path as parameter. When executed, it installs the dependencies into the Studio and registers the component in your instance. For example:

``````# for a studio
java -jar mycomponent.car studio-deploy /path/to/my/studio
# or
java -jar mycomponent.car studio-deploy --location /path/to/my/studio

# for a m2 provisioning
java -jar mycomponent.car maven-deploy /path/to/.m2/repository
# or
java -jar mycomponent.car maven-deploy --location /path/to/.m2/repository``````

You can also upload the dependencies to your Nexus server using the following command:

``java -jar mycomponent.car deploy-to-nexus --url <nexus url> --repo <repository name> --user <username> --pass <password> --threads <parallel threads number> --dir <temp directory>``

In this command, Nexus URL and repository name are mandatory arguments. All other arguments are optional. If arguments contain spaces or special symbols, you need to quote the whole value of the argument. For example:

``--pass "Y0u will \ not G4iess i' ^"``

`gradle-talend-component` helps you write components that match the best practices. It is inspired by the Maven plugin and adds the ability to automatically generate the `dependencies.txt` file used by the SDK to build the component classpath. For more information on the configuration, refer to the Maven properties matching the attributes.

You can use it as follows:

``````buildscript {
    repositories {
        mavenLocal()
        mavenCentral()
    }
    dependencies {
        classpath "org.talend.sdk.component:gradle-talend-component:${talendComponentVersion}"
    }
}

apply plugin: 'org.talend.sdk.component'
apply plugin: 'java'

// optional customization
talendComponentKit {
    // dependencies.txt generation, replaces maven-dependency-plugin
    dependenciesLocation = "TALEND-INF/dependencies.txt"
    skipDependenciesFile = false

    // classpath for validation utilities
    sdkVersion = "${talendComponentVersion}"
    apiVersion = "${talendComponentApiVersion}"

    // documentation
    skipDocumentation = false
    documentationOutput = new File(....)
    documentationLevel = 2 // first level will be == in the generated adoc
    documentationTitle = 'My Component Family' // default to project name
    documentationFormats = [:] // adoc attributes
    documentationFormats = [:] // renderings to do

    // validation
    skipValidation = false
    validateFamily = true
    validateSerializable = true
    validateInternationalization = true
    validateModel = true
    validateOptionNames = true
    validateMetadata = true
    validateComponent = true
    validateDataStore = true
    validateDataSet = true
    validateActions = true

    // web
    serverArguments = []
    serverPort = 8080

    // car
    carOutput = new File(....)
    carMetadata = [:] // custom meta (string key-value pairs)
}``````

### Services

#### Internationalizing services

Internationalization requires following several best practices:

• Store messages in `ResourceBundle` properties files in your component module.

• Place the properties files in the same package as the related components and name them `Messages`. For example, `org.talend.demo.MyComponent` uses `org.talend.demo.Messages[locale].properties`.

• Use the internationalization API for your own messages.

##### Internationalization API

The Internationalization API is the mechanism to use to internationalize your own messages in your own components.

The principle of the API is to design messages as methods returning `String` values and to get back a template using a `ResourceBundle` named `Messages` and located in the same package as the interface that defines these methods.
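The template resolution described above can be illustrated with plain JDK classes. This is only a sketch of the general idea (a template looked up in a bundle, then filled in with `MessageFormat`), not the framework's implementation: the class name `MessageTemplateSketch`, the in-memory bundle, and the key `Translator.templatizedMessage` are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Plain-JDK illustration only; the framework performs this lookup for you.
final class MessageTemplateSketch {

    // Stands in for a Messages.properties file. The key name is hypothetical.
    static ResourceBundle inMemoryBundle() {
        try {
            return new PropertyResourceBundle(
                    new StringReader("Translator.templatizedMessage = {0} has {1} records\n"));
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the template stored under the key and fills its {n} placeholders.
    static String resolve(final ResourceBundle bundle, final String key,
                          final Locale locale, final Object... args) {
        return new MessageFormat(bundle.getString(key), locale).format(args);
    }

    public static void main(final String[] args) {
        // Prints the resolved message for two sample arguments.
        System.out.println(resolve(inMemoryBundle(), "Translator.templatizedMessage",
                Locale.ENGLISH, "employee", 3));
    }
}
```

Passing the locale explicitly mirrors the role of a `@Language` parameter: the same template can be formatted differently depending on the requested locale.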
To ensure your internationalization API is identified, you need to mark it with the `@Internationalized` annotation:

``````@Internationalized (1)
public interface Translator {

    String message();

    String templatizedMessage(String arg0, int arg1); (2)

    String localized(String arg0, @Language Locale locale); (3)

    String localized(String arg0, @Language String locale); (4)
}``````

1. `@Internationalized` marks a class as an internationalized service.

2. You can pass parameters. The message uses the `MessageFormat` syntax to be resolved, based on the `ResourceBundle` template.

3. You can use `@Language` on a `Locale` parameter to manually specify the locale to use. Note that a single value is used (the first parameter tagged as such).

4. `@Language` also supports the `String` type.

#### Providing actions for consumers

In some cases, you may need to add actions that are not related to the runtime. For example, enabling clients - the users of the plugin/library - to test if a connection works properly.

To do so, you need to define an `@Action`, which is a method with a name (representing the event name), in a class decorated with `@Service`:

``````@Service
public class MyDbTester {
    @Action(family = "mycomp", value = "test")
    public Status doTest(final IncomingData data) {
        return ...;
    }
}``````

 Services are singletons. If you need some thread safety, make sure that they match that requirement. Services should not store any state either because they can be serialized at any time. State is held by the component.

Services can be used in components as well (matched by type). They allow you to reuse some shared logic, like a client.
Here is a sample with a service used to access files:

``````@Emitter(family = "sample", name = "reader")
public class PersonReader implements Serializable {
    // attributes skipped to be concise

    public PersonReader(@Option("file") final File file,
                        final FileService service) {
        this.file = file;
        this.service = service;
    }

    // use the service
    @PostConstruct
    public void open() throws FileNotFoundException {
        reader = service.createInput(file);
    }
}``````

The service is automatically passed to the constructor. It can be used as a bean. In that case, it is only necessary to call the service method.

##### Particular action types

Some common actions need a clear contract so they are defined as API first-class citizens. For example, this is the case for wizards or health checks. Here is the list of the available actions:

| API | Type | Description | Return type | Sample returned type |
| --- | --- | --- | --- | --- |
| @org.talend.sdk.component.api.service.completion.DynamicValues | dynamic_values | Mark a method as being useful to fill potential values of a string option for a property denoted by its value. You can link a field as being completable using @Proposable(value). The resolution of the completion action is then done through the component family and value of the action. The callback doesn’t take any parameter. | Values | `{"items":[{"id":"value","label":"label"}]}` |
| @org.talend.sdk.component.api.service.healthcheck.HealthCheck | healthcheck | This class marks an action doing a connection test | HealthCheckStatus | `{"comment":"Something went wrong","status":"KO"}` |
| @org.talend.sdk.component.api.service.schema.DiscoverSchema | schema | Mark an action as returning a discovered schema. Its parameter MUST be the type decorated with `@Structure`. | Schema | `{"entries":[{"name":"column1","type":"STRING"}]}` |
| @org.talend.sdk.component.api.service.completion.Suggestions | suggestions | Mark a method as being useful to fill potential values of a string option. You can link a field as being completable using @Suggestable(value). The resolution of the completion action is then done when the user requests it (generally by clicking on a button or entering the field, depending on the environment). | SuggestionValues | `{"cacheable":false,"items":[{"id":"value","label":"label"}]}` |
| @org.talend.sdk.component.api.service.Action | user | - | any | - |
| @org.talend.sdk.component.api.service.asyncvalidation.AsyncValidation | validation | Mark a method as being used to validate a configuration. IMPORTANT: this is a server validation so only use it if you can’t use other client side validation to implement it. | ValidationResult | `{"comment":"Something went wrong","status":"KO"}` |

##### Internationalization

Internationalization is supported through the injection of the `$lang` parameter, which allows you to get the correct locale to use with an `@Internationalized` service:

``````public SuggestionValues findSuggestions(@Option("someParameter") final String param,
        @Option("$lang") final String lang) {
    return ...;
}``````

 You can combine the `$lang` option with the `@Internationalized` and `@Language` parameters.

#### Built-in services

The framework provides built-in services that you can inject by type in components and actions.

Here is the list:

| Type | Description |
| --- | --- |
| `org.talend.sdk.component.api.service.cache.LocalCache` | Provides a small abstraction to cache data that does not need to be recomputed very often. Commonly used by actions for UI interactions. You can also use the local cache as an interceptor with `@Cached`. |
| `org.talend.sdk.component.api.service.dependency.Resolver` | Allows to resolve a dependency from its Maven coordinates. |
| `javax.json.bind.Jsonb` | A JSON-B instance. If your model is static and you don’t want to handle the serialization manually using JSON-P, you can inject that instance. |
| `javax.json.spi.JsonProvider` | A JSON-P instance. Prefer other JSON-P instances if you don’t exactly know why you use this one. |
| `javax.json.JsonBuilderFactory` | A JSON-P instance. It is recommended to use this one instead of a custom one to optimize memory usage and speed. |
| `javax.json.JsonWriterFactory` | A JSON-P instance. It is recommended to use this one instead of a custom one to optimize memory usage and speed. |
| `javax.json.JsonReaderFactory` | A JSON-P instance. It is recommended to use this one instead of a custom one to optimize memory usage and speed. |
| `javax.json.stream.JsonParserFactory` | A JSON-P instance. It is recommended to use this one instead of a custom one to optimize memory usage and speed. |
| `javax.json.stream.JsonGeneratorFactory` | A JSON-P instance. It is recommended to use this one instead of a custom one to optimize memory usage and speed. |
| `org.talend.sdk.component.api.service.configuration.LocalConfiguration` | Represents the local configuration that can be used during the design. It is not recommended to use it for the runtime because the local configuration is usually different and the instances are distinct. |
| `org.talend.sdk.component.api.service.dependency.Resolver` | Allows to resolve files from Maven coordinates (like `dependencies.txt` for component). Note that it assumes that the files are available in the component Maven repository. |
| `org.talend.sdk.component.api.service.injector.Injector` | Utility to inject services in fields marked with `@Service`. |
| `org.talend.sdk.component.api.service.factory.ObjectFactory` | Allows to instantiate an object from its class name and properties. |
| Every interface that extends `HttpClient` and that contains methods annotated with `@Request` | Lets you define an HTTP client in a declarative manner using an annotated interface. See the Using HttpClient section for more details. |

 All these injected services are serializable, which is important for big data environments. If you create the instances yourself, you cannot benefit from these features, nor from the memory optimization done by the runtime. Prefer reusing the framework instances over custom ones.
##### Using HttpClient

The HttpClient usage is described in this section by using the REST API example below. It is assumed that it requires a basic authentication header.

• GET `/api/records/{id}`

• POST `/api/records`, with the JSON payload to be created: `{"id":"some id", "data":"some data"}`

To create an HTTP client that is able to consume the REST API above, you need to define an interface that extends `HttpClient`.

The `HttpClient` interface lets you set the `base` for the HTTP address that the client will hit.

The `base` is the part of the address that needs to be added to the request path to hit the API.

Every method annotated with `@Request` in the interface defines an HTTP request. Every request can have a `@Codec` parameter that allows to encode or decode the request/response payloads.

 You can ignore the encoding/decoding for `String` and `Void` payloads.
``````public interface APIClient extends HttpClient {
@Request(path = "api/records/{id}", method = "GET")
@Codec(decoder = RecordDecoder.class) //decoder =  decode returned data to Record class
Record getRecord(@Header("Authorization") String basicAuth, @Path("id") int id);

@Request(path = "api/records", method = "POST")
@Codec(encoder = RecordEncoder.class, decoder = RecordDecoder.class) //encoder = encode record to fit request format (json in this example)
Record createRecord(@Header("Authorization") String basicAuth, Record record);
}``````
 The interface should extend `HttpClient`.

In the codec classes (that implement Encoder/Decoder), you can inject any of your services annotated with `@Service` or `@Internationalized` into the constructor. Internationalization services can be useful to have internationalized messages for error handling.

The interface can be injected into component classes or services to consume the defined API.

``````@Service
public class MyService {

private APIClient client;

public MyService(...,APIClient client){
//...
this.client = client;
client.base("http://localhost:8080");// init the base of the api, often in a PostConstruct or init method
}

//...
// Our get request
Record rec =  client.getRecord("Basic MLFKG?VKFJ", 100);

//...
// Our post request
Record newRecord = client.createRecord("Basic MLFKG?VKFJ", new Record());
}``````
 By default, `*/*+json` are mapped to JSON-P and `*/*+xml` to JAX-B if the model has a `@XmlRootElement` annotation.
###### Customizing HTTP client requests

For advanced cases, you can customize the `Connection` by directly using `@UseConfigurer` on the method. It calls your custom instance of `Configurer`. Note that you can use `@ConfigurerOption` in the method signature to pass some `Configurer` configurations.

For example, if you have the following `Configurer`:

``````public class BasicConfigurer implements Configurer {
@Override
public void configure(final Connection connection, final ConfigurerConfiguration configuration) {
final String user = configuration.get("username", String.class);
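final String pwd = configuration.get("password", String.class);
// the Basic scheme expects "Basic " followed by base64(user:password)
connection.withHeader(
"Authorization",
"Basic " + Base64.getEncoder().encodeToString((user + ':' + pwd).getBytes(StandardCharsets.UTF_8)));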
}
}``````

You can then set it on a method to automatically add the basic header with this kind of API usage:

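``````public interface APIClient extends HttpClient {
@Request(path = "...")
@UseConfigurer(BasicConfigurer.class)
Record findRecord(@ConfigurerOption("username") String user, @ConfigurerOption("password") String pwd);
}``````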
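To make the header computation concrete, here is a minimal, JDK-only sketch of the value that a Basic `Authorization` header carries. The class and method names (`BasicAuthHeaderSketch`, `basicHeader`) are hypothetical and not part of the framework; the `Basic ` prefix followed by the Base64-encoded `user:password` pair is the standard HTTP Basic scheme.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, independent of the Configurer API.
final class BasicAuthHeaderSketch {

    // Builds the header value: "Basic " + base64("user:password").
    static String basicHeader(final String user, final String password) {
        final String credentials = user + ':' + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(final String[] args) {
        // Prints "Basic dXNlcjpwYXNz"
        System.out.println(basicHeader("user", "pass"));
    }
}
```

Keeping this logic in one place, such as a configurer, avoids repeating the encoding in every method of the client interface.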

#### Services and interceptors

For common concerns such as caching, auditing, and so on, you can use an interceptor-like API. It is enabled on services by the framework.

An interceptor defines an annotation marked with `@Intercepts`, which defines the implementation of the interceptor (`InterceptorHandler`).

For example:

``````@Intercepts(LoggingHandler.class)
@Target({ TYPE, METHOD })
@Retention(RUNTIME)
public @interface Logged {
String value();
}``````

The handler is created from its constructor and can take service injections (by type). The first parameter, however, can be `BiFunction<Method, Object[], Object>`, which represents the invocation chain if your interceptor can be used with others.

 If you make a generic interceptor, pass the invoker as first parameter. Otherwise you cannot combine interceptors at all.

Here is an example of interceptor implementation for the `@Logged` API:

``````public class LoggingHandler implements InterceptorHandler {
// injected
private final BiFunction<Method, Object[], Object> invoker;
private final SomeService service;

// internal
private final ConcurrentMap<Method, String> loggerNames = new ConcurrentHashMap<>();

public LoggingHandler(final BiFunction<Method, Object[], Object> invoker, final SomeService service) {
this.invoker = invoker;
this.service = service;
}

@Override
public Object invoke(final Method method, final Object[] args) {
final String name = loggerNames.computeIfAbsent(method, m -> findAnnotation(m, Logged.class).get().value());
service.getLogger(name).info("Invoking {}", method.getName());
return invoker.apply(method, args);
}
}``````

This implementation is compatible with interceptor chains because it takes the invoker as first constructor parameter and it also takes a service injection. Then, the implementation simply does what is needed, which is logging the invoked method in this case.

 The `findAnnotation` method, inherited from `InterceptorHandler`, is a utility method to find an annotation on a method or class (in this order).
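The reason for taking the invoker as first parameter can be illustrated without the framework. The sketch below is an assumption-laden illustration, not the framework's wiring: `Greeter`, `logging`, `upperCasing`, and `callGreet` are hypothetical names, and the chain is assembled by hand with plain `BiFunction` instances, each one delegating to the next.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.BiFunction;

final class InterceptorChainSketch {

    // Hypothetical service whose method gets intercepted.
    public static class Greeter {
        public String greet(final String name) {
            return "hello " + name;
        }
    }

    // First interceptor: records the call, then delegates to the rest of the chain.
    static BiFunction<Method, Object[], Object> logging(
            final BiFunction<Method, Object[], Object> invoker, final List<String> log) {
        return (method, args) -> {
            log.add("Invoking " + method.getName());
            return invoker.apply(method, args);
        };
    }

    // Second interceptor: post-processes the result returned by the rest of the chain.
    static BiFunction<Method, Object[], Object> upperCasing(
            final BiFunction<Method, Object[], Object> invoker) {
        return (method, args) ->
                String.valueOf(invoker.apply(method, args)).toUpperCase(Locale.ROOT);
    }

    static Object callGreet(final String name, final List<String> log) {
        final Greeter target = new Greeter();
        // End of the chain: the real reflective call on the service.
        final BiFunction<Method, Object[], Object> terminal = (method, args) -> {
            try {
                return method.invoke(target, args);
            } catch (final ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        };
        try {
            final Method greet = Greeter.class.getMethod("greet", String.class);
            return logging(upperCasing(terminal), log).apply(greet, new Object[] { name });
        } catch (final NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(final String[] args) {
        final List<String> log = new ArrayList<>();
        System.out.println(callGreet("talend", log) + " " + log);
    }
}
```

Because each interceptor only knows the next invoker, interceptors can be stacked in any order. An interceptor that ignores the invoker would have to perform the call itself and could not be combined with others.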

#### Creating a job pipeline

##### Job Builder

The `Job` builder lets you create a job pipeline programmatically using Talend components (Producers and Processors). The job pipeline is an acyclic graph, allowing you to build complex pipelines.

Let’s take a simple use case where two data sources (employee and salary) are formatted to CSV and the result is written to a file.

A job is defined based on components (nodes) and links (edges) to connect their branches together.

Every component is defined by a unique `id` and a URI that identifies the component.

The URI follows the form `[family]://[component][?version][&configuration]`, where:

• family is the name of the component family.

• component is the name of the component.

• version is the version of the component. It is represented in a key=value format. The key is `__version` and the value is a number.

• configuration is the component configuration. It is represented in a key=value format. The key is the path of the configuration and the value is a `string` corresponding to the configuration value.

URI example
``job://csvFileGen?__version=1&path=/temp/result.csv&encoding=utf-8``
 configuration parameters must be URI/URL encoded.
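As an illustration of the encoding requirement, the following JDK-only sketch assembles a component URI from a configuration map. The class and method names (`ComponentUriSketch`, `uri`) are hypothetical. It relies on `URLEncoder`, which applies form encoding (a space becomes `+`); whether that matches every case expected by the framework is an assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class ComponentUriSketch {

    // Form-encodes a single key or value using UTF-8.
    static String encode(final String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (final UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds [family]://[component]?__version=[version]&key=value...
    static String uri(final String family, final String component, final int version,
                      final Map<String, String> configuration) {
        final StringBuilder uri = new StringBuilder(family)
                .append("://").append(component)
                .append("?__version=").append(version);
        for (final Map.Entry<String, String> entry : configuration.entrySet()) {
            uri.append('&').append(encode(entry.getKey()))
               .append('=').append(encode(entry.getValue()));
        }
        return uri.toString();
    }

    public static void main(final String[] args) {
        final Map<String, String> configuration = new LinkedHashMap<>();
        configuration.put("path", "/temp/result.csv");
        configuration.put("separator", ";");
        // Prints job://csvFileGen?__version=1&path=%2Ftemp%2Fresult.csv&separator=%3B
        System.out.println(uri("job", "csvFileGen", 1, configuration));
    }
}
```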
Job example
``````Job.components()   (1)
.component("employee","db://input")
.component("salary", "db://input")
.component("concat", "transform://concat?separator=;")
.component("csv", "file://out?__version=2")
.connections()  (2)
.from("employee").to("concat", "string1")
.from("salary").to("concat", "string2")
.from("concat").to("csv")
.build()    (3)
.run(); (4)``````
1. Defining all components used in the job pipeline.

2. Defining the connections between the components to construct the job pipeline. The links `from`/`to` use the component id and the default input/output branches. You can also connect a specific branch of a component, if it has multiple or named input/output branches, using the methods `from(id, branchName)` and `to(id, branchName)`. In the example above, the concat component has two inputs ("string1" and "string2").

3. Validating the job pipeline by asserting that:

• It has some starting components (components that don’t have a `from` connection and that need to be of the producer type).

• There are no cyclic connections. The job pipeline needs to be an acyclic graph.

• All components used in the connections are already declared.

• Each connection is used only once. You cannot connect a component input/output branch twice.

4. Running the job pipeline.

 In this version, the execution of the job is linear. Components are not executed in parallel even if some steps may be independent.
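To give an idea of what such a validation involves, here is a simplified, JDK-only sketch that checks three of the rules listed above (declared components, at least one starting component, no cycle) using Kahn's topological sort. It is not the `Job` builder's actual code: `JobGraphValidationSketch` and `validate` are hypothetical names, connections are modeled as `{from, to}` pairs, and the producer-type and single-use checks are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class JobGraphValidationSketch {

    // Returns the problems found; an empty list means the graph passed the checks.
    static List<String> validate(final Set<String> components, final List<String[]> connections) {
        final List<String> errors = new ArrayList<>();
        final Map<String, Integer> incoming = new HashMap<>();
        final Map<String, List<String>> outgoing = new HashMap<>();
        for (final String id : components) {
            incoming.put(id, 0);
            outgoing.put(id, new ArrayList<>());
        }
        for (final String[] edge : connections) {
            if (!components.contains(edge[0]) || !components.contains(edge[1])) {
                errors.add("undeclared component in " + edge[0] + " -> " + edge[1]);
                continue;
            }
            outgoing.get(edge[0]).add(edge[1]);
            incoming.merge(edge[1], 1, Integer::sum);
        }
        // Starting components are the ones without any incoming connection.
        final Deque<String> ready = new ArrayDeque<>();
        incoming.forEach((id, count) -> {
            if (count == 0) {
                ready.add(id);
            }
        });
        if (ready.isEmpty() && !components.isEmpty()) {
            errors.add("no starting component");
        }
        // Kahn's algorithm: components never reached belong to a cycle or depend on one.
        int visited = 0;
        while (!ready.isEmpty()) {
            final String id = ready.poll();
            visited++;
            for (final String next : outgoing.get(id)) {
                if (incoming.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (visited < components.size()) {
            errors.add("cyclic connection detected");
        }
        return errors;
    }

    public static void main(final String[] args) {
        final Set<String> ids = new HashSet<>(Arrays.asList("employee", "salary", "concat", "csv"));
        final List<String[]> links = Arrays.asList(
                new String[] { "employee", "concat" },
                new String[] { "salary", "concat" },
                new String[] { "concat", "csv" });
        System.out.println(validate(ids, links)); // prints [] for this valid pipeline
    }
}
```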
###### Environment/Runner

Depending on the configuration, you can select the environment in which you execute your job.

The environment is selected according to the following logic:

1. If an `org.talend.sdk.component.runtime.manager.chain.Job.ExecutorBuilder` class is passed through the job properties, then use it. The supported types are an `ExecutorBuilder` instance, a `Class` or a `String`.

2. If an `ExecutorBuilder` SPI is present, use it. This is the case if `component-runtime-beam` is present in your classpath.

3. Otherwise, use a local/standalone execution.

In the case of a Beam execution, you can customize the pipeline options using system properties. They have to be prefixed with `talend.beam.job.`. For example, to set the `appName` option, you need to use `-Dtalend.beam.job.appName=mytest`.

###### Key Provider

The job builder lets you set a key provider to join your data when a component has multiple inputs. The key provider can be set contextually to a component or globally to the job.

``````Job.components()
.component("employee","db://input")
.property(GroupKeyProvider.class.getName(),
(GroupKeyProvider) context -> context.getData().getString("id")) (1)
.component("salary", "db://input")
.component("concat", "transform://concat?separator=;")
.connections()
.from("employee").to("concat", "string1")
.from("salary").to("concat", "string2")
.build()
.property(GroupKeyProvider.class.getName(), (2)
(GroupKeyProvider) context -> context.getData().getString("employee_id"))
.run();``````
1. Defining a key provider for the data produced by the `employee` component.

2. Defining a key provider for all data manipulated in the job.

If the incoming data has different IDs, you can provide a complex global key provider that relies on the context given by the component `id` and the branch `name`.

``````GroupKeyProvider keyProvider = context -> {
if ("employee".equals(context.getComponentId())) {
return context.getData().getString("id");
}
return context.getData().getString("employee_id");
};``````
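The effect of such a key provider can be pictured with a small JDK-only sketch that groups records coming from several components under a common key. This is a simplified model and not the job runtime: records are plain `Map<String, String>` instances, and `KeyJoinSketch`, `row`, `keyOf`, and `groupByKey` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

final class KeyJoinSketch {

    // Small helper building a row from alternating keys and values.
    static Map<String, String> row(final String... keyValues) {
        final Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            row.put(keyValues[i], keyValues[i + 1]);
        }
        return row;
    }

    // Same decision as the key provider above: the key field depends on the component id.
    static String keyOf(final String componentId, final Map<String, String> row) {
        if ("employee".equals(componentId)) {
            return row.get("id");
        }
        return row.get("employee_id");
    }

    // Groups the rows of every component under the key computed for each row.
    static Map<String, List<Map<String, String>>> groupByKey(
            final Map<String, List<Map<String, String>>> rowsByComponent,
            final BiFunction<String, Map<String, String>, String> keyProvider) {
        final Map<String, List<Map<String, String>>> groups = new TreeMap<>();
        rowsByComponent.forEach((componentId, rows) -> {
            for (final Map<String, String> row : rows) {
                groups.computeIfAbsent(keyProvider.apply(componentId, row),
                        key -> new ArrayList<>()).add(row);
            }
        });
        return groups;
    }

    public static void main(final String[] args) {
        final Map<String, List<Map<String, String>>> input = new LinkedHashMap<>();
        input.put("employee", Arrays.asList(row("id", "1", "name", "Ada")));
        input.put("salary", Arrays.asList(row("employee_id", "1", "amount", "100")));
        // Both rows end up under the key "1".
        System.out.println(groupByKey(input, KeyJoinSketch::keyOf));
    }
}
```

Rows that share a key are the ones a multi-input component, such as `concat` in the example above, would receive together.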
##### Beam case

For the Beam case, you need to rely on the Beam pipeline definition and use the `component-runtime-beam` dependency, which provides Beam bridges.

###### Inputs and Outputs

`org.talend.sdk.component.runtime.beam.TalendIO` provides a way to convert a partition mapper or a processor to a Beam input or output using the `read` or `write` methods.

``````public class Main {
public static void main(final String[] args) {
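final ComponentManager manager = ComponentManager.instance();
Pipeline pipeline = Pipeline.create();
//Create beam input from mapper and apply input to pipeline
pipeline.apply(TalendIO.read(manager.findMapper("sample", "reader", 1, new HashMap<String, String>() {{
put("fileprefix", "input");
}}).get()))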
.apply(new ViewsMappingTransform(emptyMap(), "sample")) // prepare it for the output record format (see next part)
//Create beam processor from talend processor and apply to pipeline
.apply(TalendIO.write(manager.findProcessor("test", "writer", 1, new HashMap<String, String>() {{
put("fileprefix", "output");
}}).get(), emptyMap()));

//... run pipeline
}
}``````
###### Processors

`org.talend.sdk.component.runtime.beam.TalendFn` provides the way to wrap a processor in a Beam `PTransform` and to integrate it into the pipeline.

``````public class Main {
public static void main(final String[] args) {
//Component manager and pipeline initialization...

//Create beam PTransform from processor and apply input to pipeline
pipeline.apply(TalendFn.asFn(manager.findProcessor("sample", "mapper", 1, emptyMap()).get(), emptyMap()));

//... run pipeline
}
}``````

In the Beam case, multiple inputs and outputs are represented by a single `Map` element, which avoids declaring multiple inputs and outputs.

 You can use `ViewsMappingTransform` or `CoGroupByKeyResultMappingTransform` to adapt the input/output format to the record format representing the multiple inputs/outputs, like `Map<String, List<?>>`, but materialized as a `JsonObject`. Input data must be of the `JsonObject` type in this case.
###### Converting a Beam.io into a component I/O

For simple inputs and outputs, you can get an automatic and transparent conversion of the Beam.io into an I/O component, if you decorated your `PTransform` with `@PartitionMapper` or `@Processor`.

However, there are limitations:

• Inputs must implement `PTransform<PBegin, PCollection<?>>` and must be a `BoundedSource`.

• Outputs must implement `PTransform<PCollection<?>, PDone>` and register a `DoFn` on the input `PCollection`.

For more information, see the How to wrap a Beam I/O page.

#### Advanced: defining a custom API

It is possible to extend the Component API for custom front features.

Keep in mind that you should do it only if it targets non-portable components (only used by the Studio or Beam).

It is recommended to create a custom `xxxx-component-api` module with the new set of annotations.

##### Extending the UI

To extend the UI, add an annotation that can be put on `@Option` fields, and that is decorated with `@Ui`. All its members are then put in the metadata of the parameter. For example:

``````@Ui
@Target(TYPE)
@Retention(RUNTIME)
public @interface MyLayout {
}``````

### Talend Component Testing Documentation

#### Testing best practices

This section mainly concerns tools that can be used with JUnit. You can use most of these best practices with TestNG as well.

##### Parameterized tests

Parameterized tests are a great solution to repeat the same test multiple times. This method of testing requires defining a test scenario (`I test function F`) and making the input/output data dynamic.

###### JUnit 4

Here is a test example, which validates a connection URI using `ConnectionService`:

``````public class MyConnectionURITest {
@Test
public void checkMySQL() {
assertTrue(new ConnectionService().isValid("jdbc:mysql://localhost:3306/mysql"));
}

@Test
public void checkOracle() {
assertTrue(new ConnectionService().isValid("jdbc:oracle:thin:@//myhost:1521/oracle"));
}
}``````

The testing method is always the same. Only values are changing. It can therefore be rewritten using JUnit `Parameterized` runner, as follows:

``````@RunWith(Parameterized.class) (1)
public class MyConnectionURITest {

@Parameterized.Parameters(name = "{0}") (2)
public static Iterable<String> uris() { (3)
return asList(
"jdbc:mysql://localhost:3306/mysql",
"jdbc:oracle:thin:@//myhost:1521/oracle");
}

@Parameterized.Parameter (4)
public String uri;

@Test
public void isValid() { (5)
assertTrue(new ConnectionService().isValid(uri));
}
}``````
1. `Parameterized` is the runner that understands `@Parameters` and how to use it. If needed, you can generate random data here.

2. By default the name of the executed test is the index of the data. Here, it is customized using the first `toString()` parameter value to have something more readable.

3. The `@Parameters` method must be static and return an array or iterable of the data used by the tests.

4. You can then inject the current data using the `@Parameter` annotation. It can take a parameter if you use an array of arrays instead of an iterable of objects in `@Parameterized`. You can select which item you want to inject.

5. The `@Test` method is executed using the contextual data. In this sample, it gets executed twice with the two specified URIs.

 You don’t have to define a single `@Test` method. If you define multiple methods, each of them is executed with all the data. For example, if another test is added to the previous example, four tests are executed (two per data item).
###### JUnit 5

With JUnit 5, parameterized tests are easier to use. The full documentation is available at junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests.

The main difference with JUnit 4 is that you can also define inline that the test method is a parameterized test as well as the values to use:

``````@ParameterizedTest
@ValueSource(strings = { "racecar", "radar", "able was I ere I saw elba" })
void mytest(String currentValue) {
// do test
}``````

However, you can still use the previous behavior with a method binding configuration:

``````@ParameterizedTest
@MethodSource("stringProvider")
void mytest(String currentValue) {
// do test
}

static Stream<String> stringProvider() {
return Stream.of("foo", "bar");
}``````

This last option allows you to inject any type of value - not only primitives - which is commonly needed to define scenarios.

 Add the `junit-jupiter-params` dependency to benefit from this feature.

#### component-runtime-testing

##### component-runtime-junit

`component-runtime-junit` is a test library that allows you to validate simple logic based on the Talend Component Kit tooling.

``````<dependency>
    <groupId>org.talend.sdk.component</groupId>
    <artifactId>component-runtime-junit</artifactId>
    <version>${talend-component.version}</version>
    <scope>test</scope>
</dependency>``````

This dependency also provides mocked components that you can use with your own component to create tests.

The mocked components are provided under the `test` family:

• `emitter`: a mock of an input component

• `collector`: a mock of an output component

###### JUnit 4

You can define a standard JUnit test and use the `SimpleComponentRule` rule:

``````public class MyComponentTest {

    @Rule (1)
    public final SimpleComponentRule components = new SimpleComponentRule("org.talend.sdk.component.mycomponent");

    @Test
    public void produce() {
        Job.components() (2)
            .component("mycomponent","yourcomponentfamily://yourcomponent?"+createComponentConfig())
            .component("collector", "test://collector")
            .connections()
            .from("mycomponent").to("collector")
            .build()
            .run();

        final List<MyRecord> records = components.getCollectedData(MyRecord.class); (3)
        doAssertRecords(records); // depending your test
    }
}``````

1. The rule creates a component manager and provides two mock components: an emitter and a collector. Set the root package of your component to enable it.

2. Define any chain that you want to test. It generally uses the mock as source or collector.

3. Validate your component behavior. For a source, you can assert that the right records were emitted in the mock collector.

 The rule can also be defined as a `@ClassRule` to start it once per class and not per test as with `@Rule`.
To go further, you can add the `ServiceInjectionRule` rule, which allows to inject all the component family services into the test class by marking test class fields with `@Service`:

``````public class SimpleComponentRuleTest {

    @ClassRule
    public static final SimpleComponentRule COMPONENT_FACTORY = new SimpleComponentRule("...");

    @Rule (1)
    public final ServiceInjectionRule injections = new ServiceInjectionRule(COMPONENT_FACTORY, this); (2)

    @Service (3)
    private LocalConfiguration configuration;

    @Service
    private Jsonb jsonb;

    @Test
    public void test() {
        // ...
    }
}``````

1. The injection requires the test instance, so it must be a `@Rule` rather than a `@ClassRule`.

2. The `ComponentsController` is passed to the rule, which for JUnit 4 is the `SimpleComponentRule`, as well as the test instance to inject services in.

3. All service fields are marked with `@Service` to let the rule inject them before the test is run.

###### JUnit 5

The JUnit 5 integration is very similar to JUnit 4, except that it uses the JUnit 5 extension mechanism.

The entry point is the `@WithComponents` annotation that you add to your test class, and which takes the component package you want to test. You can use `@Injected` to inject an instance of `ComponentsHandler` - which exposes the same utilities as the JUnit 4 rule - in a test class field:

``````@WithComponents("org.talend.sdk.component.junit.component") (1)
public class ComponentExtensionTest {

    @Injected (2)
    private ComponentsHandler handler;

    @Test
    public void manualMapper() {
        final Mapper mapper = handler.createMapper(Source.class, new Source.Config() {

            {
                values = asList("a", "b");
            }
        });
        assertFalse(mapper.isStream());
        final Input input = mapper.create();
        assertEquals("a", input.next());
        assertEquals("b", input.next());
        assertNull(input.next());
    }
}``````

1. The annotation defines which components to register in the test context.

2. The field allows to get the handler to be able to orchestrate the tests.

If you use JUnit 5 for the first time, keep in mind that the imports changed and that you need to use `org.junit.jupiter.api.Test` instead of `org.junit.Test`. Some IDE versions and `surefire` versions can also require you to install either a plugin or a specific configuration.

As for JUnit 4, you can go further by injecting test class fields marked with `@Service`, but there is no additional extension to specify in this case:

``````@WithComponents("...")
class ComponentExtensionTest {

@Service (1)
private LocalConfiguration configuration;

@Service
private Jsonb jsonb;

@Test
void test() {
// ...
}
}``````

(1) All service fields are marked with `@Service` to let the extension inject them before the test is run.

###### Mocking the output

Using the "test"/"collector" component as shown in the previous sample stores all records emitted by the chain (typically your source) in memory. You can then access them using `theSimpleComponentRule.getCollectedData(type)`.

Note that this method filters by type. If you don’t need any specific type, you can use `Object.class`.

###### Mocking the input

The input mocking is symmetric to the output. In this case, you provide the data you want to inject:

``````public class MyComponentTest {

@Rule
public final SimpleComponentRule components = new SimpleComponentRule("org.talend.sdk.component.mycomponent");

@Test
public void produce() {
components.setInputData(asList(createData(), createData(), createData())); (1)

Job.components()
.component("emitter","test://emitter")
.component("out", "yourcomponentfamily://myoutput?"+createComponentConfig())
.connections()
.from("emitter").to("out")
.build()
.run();

assertMyOutputProcessedTheInputData();
}
}``````

(1) Using `setInputData`, you prepare the execution(s) to have a fake input when using the "test"/"emitter" component.
###### Creating runtime configuration from component configuration

The component configuration is a POJO (using `@Option` on fields) and the runtime configuration (`ExecutionChainBuilder`) uses a `Map<String, String>`. To make the conversion easier, the JUnit integration provides a `SimpleFactory.configurationByExample` utility to get this map instance from a configuration instance.

Example:

``````final MyComponentConfig componentConfig = new MyComponentConfig();
componentConfig.setUser("....");
// .. other inits

final Map<String, String> configuration = configurationByExample(componentConfig);``````

The same factory provides a fluent DSL to create the configuration by calling `configurationByExample` without any parameter. The advantage is that you can convert an object to a `Map<String, String>` or to a query string in order to use it with the `Job` DSL:

``````final String uri = "family://component?" +
configurationByExample().forInstance(componentConfig).configured().toQueryString();``````

It also takes care of encoding the URI correctly.

###### Testing a Mapper

The `SimpleComponentRule` also lets you unit test a mapper. You can get an instance from a configuration and execute this instance to collect the output.

Example

``````public class MapperTest {

@ClassRule
public static final SimpleComponentRule COMPONENT_FACTORY = new SimpleComponentRule(
"org.company.talend.component");

@Test
public void mapper() {
final Mapper mapper = COMPONENT_FACTORY.createMapper(MyMapper.class, new Source.Config() {{
values = asList("a", "b");
}});
assertEquals(asList("a", "b"), COMPONENT_FACTORY.collectAsList(String.class, mapper));
}
}``````

###### Testing a Processor

As with a mapper, a processor can be unit tested. However, this case can be more complex when there are multiple inputs or outputs.
Example

``````public class ProcessorTest {

@ClassRule
public static final SimpleComponentRule COMPONENT_FACTORY = new SimpleComponentRule(
"org.company.talend.component");

@Test
public void processor() {
final Processor processor = COMPONENT_FACTORY.createProcessor(Transform.class, null);
final SimpleComponentRule.Outputs outputs = COMPONENT_FACTORY.collect(processor,
new JoinInputFactory().withInput("__default__", asList(new Transform.Record("a"), new Transform.Record("bb")))
.withInput("second", asList(new Transform.Record("1"), new Transform.Record("2")))
);
assertEquals(2, outputs.size());
assertEquals(asList(2, 3), outputs.get(Integer.class, "size"));
assertEquals(asList("a1", "bb2"), outputs.get(String.class, "value"));
}
}``````

The rule allows you to instantiate a `Processor` from your code, and then to `collect` the output from the inputs you pass in. There are two convenient implementations of the input factory:

1. `MainInputFactory` for processors using only the default input.

2. `JoinInputFactory` with the `withInput(branch, data)` method for processors using multiple inputs. The first argument is the branch name and the second argument is the data used by the branch.

If needed, you can also implement your own input representation using `org.talend.sdk.component.junit.ControllableInputFactory`.

##### component-runtime-testing-spark

The following artifact allows you to test against a Spark cluster:

``````<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>component-runtime-testing-spark</artifactId>
<version>${talend-component.version}</version>
<scope>test</scope>
</dependency>``````
###### JUnit 4

The testing relies on a JUnit `TestRule`. It is recommended to use it as a `@ClassRule`, to make sure that a single instance of a Spark cluster is built. You can also use it as a simple `@Rule`, to create the Spark cluster instances per method instead of per test class.

The `@ClassRule` takes the Spark and Scala versions to use as parameters. It then forks a master and N slaves. Finally, the `submit*` methods allow you to send jobs either from the test classpath or from a shaded JAR if you run it as an integration test.

For example:

``````public class SparkClusterRuleTest {

@ClassRule
public static final SparkClusterRule SPARK = new SparkClusterRule("2.10", "1.6.3", 1);

@Test
public void classpathSubmit() throws IOException {
SPARK.submitClasspath(SubmittableMain.class, getMainArgs());

// wait for the test to pass
}
}``````
 This testing methodology works with `@Parameterized`. You can submit several jobs with different arguments and even combine it with Beam `TestPipeline` if you make it `transient`.
###### JUnit 5

The integration of that Spark cluster logic with JUnit 5 is done using the `@WithSpark` marker for the extension. Optionally, it allows you to inject the `BaseSpark<?>` handler through `@SparkInject` to access the Spark cluster meta information, such as its host and port.

Example
``````@WithSpark
class SparkExtensionTest {

@SparkInject
private BaseSpark<?> spark;

@Test
void classpathSubmit() throws IOException {
final File out = new File(jarLocation(SparkClusterRuleTest.class).getParentFile(), "classpathSubmitJunit5.out");
if (out.exists()) {
out.delete();
}
spark.submitClasspath(SparkClusterRuleTest.SubmittableMain.class, spark.getSparkMaster(), out.getAbsolutePath());

await().atMost(5, MINUTES).until(
() -> out.exists() ? Files.readAllLines(out.toPath()).stream().collect(joining("\n")).trim() : null,
equalTo("b -> 1\na -> 1"));
}
}``````
###### Checking the job execution status

Currently, `SparkClusterRule` does not provide a way to know when a job execution is done, not even by exposing the web UI URL so that you could poll it. The best solution at the moment is to make sure that the output of your job exists and contains the right value.

`awaitility` or any equivalent library can help you to implement such logic:

``````<dependency>
<groupId>org.awaitility</groupId>
<artifactId>awaitility</artifactId>
<version>3.0.0</version>
<scope>test</scope>
</dependency>``````

To wait until a file exists and check that its content (for example) is the expected one, you can use the following logic:

``````await()
.atMost(5, MINUTES)
.until(
() -> out.exists() ? Files.readAllLines(out.toPath()).stream().collect(joining("\n")).trim() : null,
equalTo("the expected content of the file"));``````
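If adding a library is not an option, the same wait-then-check idea can be sketched with the JDK alone. The `FilePoller` class and its `awaitContent` method below are hypothetical names introduced for illustration; they are not part of Talend Component Kit or Awaitility, and the sketch only assumes that the job writes a small UTF-8 text file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

// Hypothetical helper, not part of any library mentioned in this guide.
final class FilePoller {

    private FilePoller() {
        // utility class
    }

    // Polls until the file exists and its trimmed content equals the expected value,
    // or until the timeout elapses. Returns true on a match and false on timeout.
    static boolean awaitContent(final Path file, final String expected,
                                final Duration timeout, final Duration interval)
            throws IOException, InterruptedException {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (Files.exists(file)) {
                final String content =
                        new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
                if (expected.equals(content)) {
                    return true;
                }
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            Thread.sleep(interval.toMillis());
        }
    }
}
```

A test would typically call something like `FilePoller.awaitContent(out.toPath(), "expected", Duration.ofMinutes(5), Duration.ofSeconds(1))` and assert that the result is `true`. A partially written file simply fails the comparison and is read again at the next poll.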
##### component-runtime-http-junit

The HTTP JUnit module allows you to mock REST APIs very simply. The module coordinates are:

``````<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>component-runtime-http-junit</artifactId>
<version>${talend-component.version}</version>
<scope>test</scope>
</dependency>``````

This module uses Apache Johnzon and Netty. If you have any conflict (in particular with Netty), you can add the `shaded` classifier to the dependency. This way, both dependencies are shaded, which avoids conflicts with your component.

It supports both JUnit 4 and JUnit 5. The concept is the same in both cases: the extension/rule is able to serve precomputed responses saved in the classpath.

You can plug your own `ResponseLocator` to map a request to a response, but the default implementation - which should be sufficient in most cases - looks in `talend/testing/http/<class name>_<method name>.json`. Note that you can also put it in `talend/testing/http/<request path>.json`.

###### JUnit 4

JUnit 4 setup is done through two rules:

• `JUnit4HttpApi`, which starts the server.

• `JUnit4HttpApiPerMethodConfigurator`, which configures the server per test and also handles the capture mode.

If you don’t use the `JUnit4HttpApiPerMethodConfigurator`, the capture feature is disabled and the per test mocking is not available.

Test example

``````public class MyRESTApiTest {
@ClassRule
public static final JUnit4HttpApi API = new JUnit4HttpApi();

@Rule
public final JUnit4HttpApiPerMethodConfigurator configurator = new JUnit4HttpApiPerMethodConfigurator(API);

@Test
public void direct() throws Exception {
// ... do your requests
}
}``````

SSL

For tests using SSL-based services, you need to use `activeSsl()` on the `JUnit4HttpApi` rule.

You can access the client SSL socket factory through the API handler:

``````@ClassRule
public static final JUnit4HttpApi API = new JUnit4HttpApi().activeSsl();

@Test
public void test() throws Exception {
final HttpsURLConnection connection = getHttpsConnection();
connection.setSSLSocketFactory(API.getSslContext().getSocketFactory());
// ....
}``````

###### JUnit 5

JUnit 5 uses a JUnit 5 extension based on the `HttpApi` annotation that you can add to your test class.
You can inject the test handler - which has some utilities for advanced cases - through `@HttpApiInject`:

``````@HttpApi
class JUnit5HttpApiTest {
@HttpApiInject
private HttpApiHandler<?> handler;

@Test
void getProxy() throws Exception {
// .... do your requests
}
}``````

The injection is optional and the `@HttpApi` annotation allows you to configure several test behaviors.

SSL

For tests using SSL-based services, you need to use `@HttpApi(useSsl = true)`.

You can access the client SSL socket factory through the API handler:

``````@HttpApi(useSsl = true)
class MyHttpsApiTest {
@HttpApiInject
private HttpApiHandler<?> handler;

@Test
void test() throws Exception {
final HttpsURLConnection connection = getHttpsConnection();
connection.setSSLSocketFactory(handler.getSslContext().getSocketFactory());
// ....
}
}``````

###### Capturing mode

The strength of this implementation is to run a small proxy server and to auto-configure the JVM: `http[s].proxyHost`, `http[s].proxyPort`, `HttpsURLConnection#defaultSSLSocketFactory` and `SSLContext#default` are auto-configured to work out-of-the-box with the proxy.

It allows you to keep the native and real URLs in your tests. For example, the following test is valid:

``````public class GoogleTest {
@ClassRule
public static final JUnit4HttpApi API = new JUnit4HttpApi();

@Rule
public final JUnit4HttpApiPerMethodConfigurator configurator = new JUnit4HttpApiPerMethodConfigurator(API);

@Test
public void google() throws Exception {
assertEquals(HttpURLConnection.HTTP_OK, get("https://google.fr?q=Talend"));
}

private int get(final String uri) throws Exception {
// do the GET request, skipped for brevity
}
}``````

If you execute this test, it fails with an HTTP 400 error because the proxy does not find the mocked response.

You can create it manually, as described in component-runtime-http-junit, but you can also set the `talend.junit.http.capture` property to the folder storing the captures.
It must be the root folder and not the folder where the JSON files are located (not prefixed by `talend/testing/http` by default).

In most cases, use `src/test/resources`. If `new File("src/test/resources")` resolves the valid folder when executing your test (Maven default), then you can just set the system property to `true`. Otherwise, you need to adjust the system property value accordingly.

When the tests run with this system property, the testing framework creates the correct mock response files. After that, you can remove the system property. The tests will still pass, using `google.fr`, even if you disconnect your machine from the Internet.

###### Passthrough mode

If you set the `talend.junit.http.passthrough` system property to `true`, the server acts as a proxy and executes each request to the actual server - similarly to the capturing mode.

#### Beam testing

If you want to make sure that your component works in Beam and don’t want to use Spark, you can try with the Direct Runner.

Check beam.apache.org/contribute/testing/ for more details.

#### Testing on multiple environments

JUnit (4 or 5) already provides ways to parameterize tests and execute the same "test logic" against several sets of data. However, it is not very convenient for testing multiple environments.

For example, with Beam, you can test your code against multiple runners. But it requires resolving conflicts between runner dependencies, setting the correct classloaders, and so on.

To simplify such cases, the framework provides multi-environment support for your tests, through the JUnit module, which works with both JUnit 4 and JUnit 5.

##### JUnit 4

``````@RunWith(MultiEnvironmentsRunner.class)
@Environment(Env1.class)
@Environment(Env2.class)
public class TheTest {
@Test
public void test1() {
// ...
}
}``````

The `MultiEnvironmentsRunner` executes the tests for each defined environment. With the example above, it means that it runs `test1` for `Env1` and `Env2`.
By default, the `JUnit4` runner is used to execute the tests in one environment, but you can use `@DelegateRunWith` to use another runner.

##### JUnit 5

The multi-environment configuration with JUnit 5 is similar to JUnit 4:

``````@Environment(EnvironmentsExtensionTest.E1.class)
@Environment(EnvironmentsExtensionTest.E2.class)
class TheTest {

@EnvironmentalTest
void test1() {
// ...
}
}``````

The main differences are that no runner is used because they do not exist in JUnit 5, and that you need to replace `@Test` by `@EnvironmentalTest`.

With JUnit 5, each test is executed for all environments before the next test starts, while with JUnit 4 all tests are run sequentially within each environment. For example, this means that `@BeforeAll` and `@AfterAll` are executed once for all runners.

##### Provided environments

The provided environments set the contextual classloader in order to load the related runner of Apache Beam.

Package: `org.talend.sdk.component.junit.environment.builtin.beam`

The configuration is read from system properties, environment variables, and so on.
| Class | Name | Description |
|---|---|---|
| ContextualEnvironment | Contextual | Contextual runner |
| DirectRunnerEnvironment | Direct | Direct runner |
| FlinkRunnerEnvironment | Flink | Flink runner |
| SparkRunnerEnvironment | Spark | Spark runner |

##### Configuring environments

If the environment extends `BaseEnvironmentProvider` and therefore defines an environment name - which is the case of the default ones - you can use `EnvironmentConfiguration` to customize the system properties used for that environment:

``````@Environment(DirectRunnerEnvironment.class)
@EnvironmentConfiguration(
environment = "Direct",
systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))

@Environment(SparkRunnerEnvironment.class)
@EnvironmentConfiguration(
environment = "Spark",
systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))

@Environment(FlinkRunnerEnvironment.class)
@EnvironmentConfiguration(
environment = "Flink",
systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))
class MyBeamTest {

@EnvironmentalTest
void execute() {
// run some pipeline
}
}``````

If you set the `.skip` system property to `true`, the environment-related executions are skipped.

###### Advanced usage

This usage assumes that Beam 2.4.0 is used.

The following dependencies bring the JUnit testing toolkit, the Beam integration and the multi-environment testing toolkit for JUnit into the test scope.
Dependencies

``````<dependencies>
<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>component-runtime-junit</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-api</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.jboss.shrinkwrap.resolver</groupId>
<artifactId>shrinkwrap-resolver-impl-maven</artifactId>
<version>3.1.3</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>component-runtime-beam</artifactId>
<scope>test</scope>
</dependency>
</dependencies>``````

Using the fluent DSL to define jobs, you can write a test as follows:

Your job must be linear and each step must send a single value (no multi-input or multi-output).

``````@Environment(ContextualEnvironment.class)
@Environment(DirectRunnerEnvironment.class)
class TheComponentTest {
@EnvironmentalTest
void testWithStandaloneAndBeamEnvironments() {
from("myfamily://in?config=xxxx")
.to("myfamily://out")
.create()
.execute();
// add asserts on the output if needed
}
}``````

It executes the chain twice:

1. With a standalone environment to simulate the Studio.

2. With a Beam (direct runner) environment to ensure the portability of your job.

#### Secrets/Passwords and Maven

You can reuse Maven `settings.xml` server files, including the encrypted ones. `org.talend.sdk.component.maven.MavenDecrypter` allows you to find a `username`/`password` from a server identifier:

``````final MavenDecrypter decrypter = new MavenDecrypter();
final Server decrypted = decrypter.find("my-test-server");
// decrypted.getUsername();
// decrypted.getPassword();``````

It is very useful to avoid storing secrets and to perform tests on real systems on a continuous integration platform.

Even if you don’t use Maven on the platform, you can generate the `settings.xml` and `settings-security.xml` files to use that feature.
See maven.apache.org/guides/mini/guide-encryption.html for more details.

#### Generating data

Several data generators exist if you want to populate objects with a semantic that is more evolved than a plain random string like `commons-lang3`:

Even more advanced, the following generators allow to directly bind generic data on a model. However, data quality is not always optimal:

There are two main kinds of implementation:

• Implementations using a pattern and random generated data.

• Implementations using a set of precomputed data extrapolated to create new values.

Check your use case to know which one fits best.

An alternative to data generation can be to import real data and use Talend Studio to sanitize the data, by removing sensitive information and replacing it with generated or anonymized data. Then you just need to inject that file into the system.

If you are using JUnit 5, you can have a look at glytching.github.io/junit-extensions/randomBeans.

## Talend Component Testing Documentation

### Testing best practices

This section mainly concerns tools that can be used with JUnit. You can use most of these best practices with TestNG as well.

#### Parameterized tests

Parameterized tests are a great solution to repeat the same test multiple times. This method of testing requires defining a test scenario (`I test function F`) and making the input/output data dynamic.

##### JUnit 4

Here is a test example, which validates a connection URI using `ConnectionService`:

``````public class MyConnectionURITest {
@Test
public void checkMySQL() {
assertTrue(new ConnectionService().isValid("jdbc:mysql://localhost:3306/mysql"));
}

@Test
public void checkOracle() {
assertTrue(new ConnectionService().isValid("jdbc:oracle:thin:@//myhost:1521/oracle"));
}
}``````

The testing method is always the same. Only values are changing.
It can therefore be rewritten using JUnit `Parameterized` runner, as follows:

``````@RunWith(Parameterized.class) (1)
public class MyConnectionURITest {

@Parameterized.Parameters(name = "{0}") (2)
public static Iterable<String> uris() { (3)
return asList(
"jdbc:mysql://localhost:3306/mysql",
"jdbc:oracle:thin:@//myhost:1521/oracle");
}

@Parameterized.Parameter (4)
public String uri;

@Test
public void isValid() { (5)
assertTrue(new ConnectionService().isValid(uri));
}
}``````

(1) `Parameterized` is the runner that understands `@Parameters` and how to use it. If needed, you can generate random data here.

(2) By default the name of the executed test is the index of the data. Here, it is customized using the `toString()` value of the first parameter to have something more readable.

(3) The `@Parameters` method must be static and return an array or iterable of the data used by the tests.

(4) You can then inject the current data using the `@Parameter` annotation. It can take an index if the `@Parameters` method returns an array of arrays instead of an iterable of objects, so that you can select which item you want to inject.

(5) The `@Test` method is executed using the contextual data. In this sample, it gets executed twice with the two specified URIs.

You don’t have to define a single `@Test` method. If you define multiple methods, each of them is executed with all the data. For example, if another test is added to the previous example, four tests are executed (two per data item).

##### JUnit 5

With JUnit 5, parameterized tests are easier to use. The full documentation is available at junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests.
The main difference with JUnit 4 is that you can also define inline that the test method is a parameterized test as well as the values to use:

``````@ParameterizedTest
@ValueSource(strings = { "racecar", "radar", "able was I ere I saw elba" })
void mytest(String currentValue) {
// do test
}``````

However, you can still use the previous behavior with a method binding configuration:

``````@ParameterizedTest
@MethodSource("stringProvider")
void mytest(String currentValue) {
// do test
}

static Stream<String> stringProvider() {
return Stream.of("foo", "bar");
}``````

This last option allows you to inject any type of value - not only primitives - which is common to define scenarios.

Add the `junit-jupiter-params` dependency to benefit from this feature.

### component-runtime-testing

#### component-runtime-junit

`component-runtime-junit` is a test library that allows you to validate simple logic based on the Talend Component Kit tooling.

To import it, add the following dependency to your project:

``````<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>component-runtime-junit</artifactId>
<version>${talend-component.version}</version>
<scope>test</scope>
</dependency>``````

This dependency also provides mocked components that you can use with your own component to create tests.

The mocked components are provided under the `test` family:

• `emitter` : a mock of an input component

• `collector` : a mock of an output component

##### JUnit 4

You can define a standard JUnit test and use the `SimpleComponentRule` rule:

``````public class MyComponentTest {

@Rule (1)
public final SimpleComponentRule components = new SimpleComponentRule("org.talend.sdk.component.mycomponent");

@Test
public void produce() {
Job.components() (2)
.component("mycomponent","yourcomponentfamily://yourcomponent?"+createComponentConfig())
.component("collector", "test://collector")
.connections()
.from("mycomponent").to("collector")
.build()
.run();

final List<MyRecord> records = components.getCollectedData(MyRecord.class); (3)
}
}``````
(1) The rule creates a component manager and provides two mock components: an emitter and a collector. Set the root package of your component to enable it.

(2) Define any chain that you want to test. It generally uses the mock as source or collector.

(3) Validate your component behavior. For a source, you can assert that the right records were emitted in the mock collector.

The rule can also be defined as a `@ClassRule` to start it once per class and not per test as with `@Rule`.

To go further, you can add the `ServiceInjectionRule` rule, which lets you inject all the component family services into the test class by marking test class fields with `@Service`:

``````public class SimpleComponentRuleTest {

@ClassRule
public static final SimpleComponentRule COMPONENT_FACTORY = new SimpleComponentRule("...");

@Rule (1)
public final ServiceInjectionRule injections = new ServiceInjectionRule(COMPONENT_FACTORY, this); (2)

@Service (3)
private LocalConfiguration configuration;

@Service
private Jsonb jsonb;

@Test
public void test() {
// ...
}
}``````
(1) The injection requires the test instance, so it must be a `@Rule` rather than a `@ClassRule`.

(2) The `ComponentsController`, which for JUnit 4 is the `SimpleComponentRule`, is passed to the rule along with the test instance to inject services into.

(3) All service fields are marked with `@Service` to let the rule inject them before the test is run.

##### JUnit 5

The JUnit 5 integration is very similar to JUnit 4, except that it uses the JUnit 5 extension mechanism.

The entry point is the `@WithComponents` annotation that you add to your test class, and which takes the component package you want to test. You can use `@Injected` to inject an instance of `ComponentsHandler` - which exposes the same utilities as the JUnit 4 rule - in a test class field:

``````@WithComponents("org.talend.sdk.component.junit.component") (1)
public class ComponentExtensionTest {
@Injected (2)
private ComponentsHandler handler;

@Test
public void manualMapper() {
final Mapper mapper = handler.createMapper(Source.class, new Source.Config() {

{
values = asList("a", "b");
}
});
assertFalse(mapper.isStream());
final Input input = mapper.create();
assertEquals("a", input.next());
assertEquals("b", input.next());
assertNull(input.next());
}
}``````
(1) The annotation defines which components to register in the test context.

(2) The field gives access to the handler, which you use to orchestrate the tests.

If you use JUnit 5 for the first time, keep in mind that the imports changed and that you need to use `org.junit.jupiter.api.Test` instead of `org.junit.Test`. Some IDE versions and `surefire` versions can also require you to install either a plugin or a specific configuration.

As for JUnit 4, you can go further by injecting test class fields marked with `@Service`, but there is no additional extension to specify in this case:

``````@WithComponents("...")
class ComponentExtensionTest {

@Service (1)
private LocalConfiguration configuration;

@Service
private Jsonb jsonb;

@Test
void test() {
// ...
}
}``````
(1) All service fields are marked with `@Service` to let the extension inject them before the test is run.

##### Mocking the output

Using the "test"/"collector" component as shown in the previous sample stores all records emitted by the chain (typically your source) in memory. You can then access them using `theSimpleComponentRule.getCollectedData(type)`.

Note that this method filters by type. If you don’t need any specific type, you can use `Object.class`.
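The type filter can be pictured with a few lines of plain Java. This is only a sketch of the idea: `CollectedData` and `filterByType` are hypothetical names, and the code does not reflect how `SimpleComponentRule` is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "filter collected records by type".
final class CollectedData {

    private CollectedData() {
        // utility class
    }

    // Keeps only the records that are instances of the requested type, in emission order.
    static <T> List<T> filterByType(final List<Object> records, final Class<T> type) {
        final List<T> result = new ArrayList<>();
        for (final Object record : records) {
            if (type.isInstance(record)) {
                result.add(type.cast(record));
            }
        }
        return result;
    }
}
```

With a collected list containing `"a"`, `1` and `"b"`, asking for `String.class` keeps the two strings, while `Object.class` keeps all three records, which is why `Object.class` is the natural choice when the type does not matter.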

##### Mocking the input

The input mocking is symmetric to the output. In this case, you provide the data you want to inject:

``````public class MyComponentTest {

@Rule
public final SimpleComponentRule components = new SimpleComponentRule("org.talend.sdk.component.mycomponent");

@Test
public void produce() {
components.setInputData(asList(createData(), createData(), createData())); (1)

Job.components()
.component("emitter","test://emitter")
.component("out", "yourcomponentfamily://myoutput?"+createComponentConfig())
.connections()
.from("emitter").to("out")
.build()
.run();

assertMyOutputProcessedTheInputData();
}
}``````
(1) Using `setInputData`, you prepare the execution(s) to have a fake input when using the "test"/"emitter" component.

##### Creating runtime configuration from component configuration

The component configuration is a POJO (using `@Option` on fields) and the runtime configuration (`ExecutionChainBuilder`) uses a `Map<String, String>`. To make the conversion easier, the JUnit integration provides a `SimpleFactory.configurationByExample` utility to get this map instance from a configuration instance.

Example:
``````final MyComponentConfig componentConfig = new MyComponentConfig();
componentConfig.setUser("....");
// .. other inits

final Map<String, String> configuration = configurationByExample(componentConfig);``````
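To give an intuition of what "a map from a configuration instance" means, here is a deliberately simplified sketch based on reflection. Everything in it is an assumption made for illustration: the `ConfigurationFlattener` and `SampleConfig` names, the `<prefix>.<field name>` key layout, and the fact that only flat, non-null fields are handled. The real `configurationByExample` utility may name keys differently and also has to deal with nested objects and collections.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the Talend Component Kit implementation.
final class ConfigurationFlattener {

    // Hypothetical configuration class used only for illustration.
    static final class SampleConfig {
        private String user = "demo";
        private int port = 8080;
        private String password; // left null, so it is skipped
    }

    private ConfigurationFlattener() {
        // utility class
    }

    // Copies each non-static, non-null field of the instance into an entry
    // named "<prefix>.<field name>", using String.valueOf for the value.
    static Map<String, String> flatten(final Object instance, final String prefix)
            throws IllegalAccessException {
        final Map<String, String> result = new LinkedHashMap<>();
        for (final Field field : instance.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            field.setAccessible(true);
            final Object value = field.get(instance);
            if (value != null) {
                result.put(prefix + "." + field.getName(), String.valueOf(value));
            }
        }
        return result;
    }
}
```

Calling `ConfigurationFlattener.flatten(new ConfigurationFlattener.SampleConfig(), "configuration")` yields two entries, one for `user` and one for `port`. The `password` field is left out because it is `null`.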

The same factory provides a fluent DSL to create the configuration by calling `configurationByExample` without any parameter. The advantage is to be able to convert an object as a `Map<String, String>` or as a query string in order to use it with the `Job` DSL:

``````final String uri = "family://component?" +
configurationByExample().forInstance(componentConfig).configured().toQueryString();``````

It handles the encoding of the URI to ensure it is correctly done.
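The encoding matters because configuration values often contain characters such as `&`, `=` or `/` that would otherwise break the URI. The sketch below shows the general technique with the JDK `URLEncoder`. The `QueryStrings` class is a hypothetical helper, and the exact encoding rules applied by `toQueryString()` are not claimed to be identical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper illustrating query string encoding of a configuration map.
final class QueryStrings {

    private QueryStrings() {
        // utility class
    }

    // Encodes every key and value with URLEncoder (form encoding, UTF-8)
    // and joins the resulting pairs with '&', in the iteration order of the map.
    static String toQueryString(final Map<String, String> configuration)
            throws UnsupportedEncodingException {
        final StringJoiner joiner = new StringJoiner("&");
        for (final Map.Entry<String, String> entry : configuration.entrySet()) {
            joiner.add(URLEncoder.encode(entry.getKey(), "UTF-8")
                    + "=" + URLEncoder.encode(entry.getValue(), "UTF-8"));
        }
        return joiner.toString();
    }
}
```

Note that `URLEncoder` performs form encoding, so a space becomes `+` and `&` becomes `%26`. A value such as `a b&c` is therefore emitted as `a+b%26c` and can no longer be confused with a parameter separator.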

##### Testing a Mapper

The `SimpleComponentRule` also lets you unit test a mapper. You can get an instance from a configuration and execute this instance to collect the output.

Example
``````public class MapperTest {

@ClassRule
public static final SimpleComponentRule COMPONENT_FACTORY = new SimpleComponentRule(
"org.company.talend.component");

@Test
public void mapper() {
final Mapper mapper = COMPONENT_FACTORY.createMapper(MyMapper.class, new Source.Config() {{
values = asList("a", "b");
}});
assertEquals(asList("a", "b"), COMPONENT_FACTORY.collectAsList(String.class, mapper));
}
}``````
##### Testing a Processor

As with a mapper, a processor can be unit tested. However, this case can be more complex when there are multiple inputs or outputs.

Example
``````public class ProcessorTest {

@ClassRule
public static final SimpleComponentRule COMPONENT_FACTORY = new SimpleComponentRule(
"org.company.talend.component");

@Test
public void processor() {
final Processor processor = COMPONENT_FACTORY.createProcessor(Transform.class, null);
final SimpleComponentRule.Outputs outputs = COMPONENT_FACTORY.collect(processor,
new JoinInputFactory().withInput("__default__", asList(new Transform.Record("a"), new Transform.Record("bb")))
.withInput("second", asList(new Transform.Record("1"), new Transform.Record("2")))
);
assertEquals(2, outputs.size());
assertEquals(asList(2, 3), outputs.get(Integer.class, "size"));
assertEquals(asList("a1", "bb2"), outputs.get(String.class, "value"));
}
}``````

The rule allows you to instantiate a `Processor` from your code, and then to `collect` the output from the inputs you pass in. There are two convenient implementations of the input factory:

1. `MainInputFactory` for processors using only the default input.

2. `JoinInputFactory` with the `withInput(branch, data)` method for processors using multiple inputs. The first argument is the branch name and the second argument is the data used by the branch.

 If needed, you can also implement your own input representation using `org.talend.sdk.component.junit.ControllableInputFactory`.
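The `Transform` processor used in the `ProcessorTest` example is not shown in this guide. The assertions are consistent with a processor that pairs the records of the `__default__` and `second` branches by position, emits their concatenation on the `value` output and the length of that concatenation on the `size` output. The plain Java sketch below reproduces that assumed behavior. `JoinSketch` is a hypothetical name and the logic is an inference from the assertions, not the actual `Transform` code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the behavior the ProcessorTest assertions suggest.
final class JoinSketch {

    private JoinSketch() {
        // utility class
    }

    // Pairs the two branches by position and fills two named outputs:
    // "value" receives the concatenation, "size" receives its length.
    static Map<String, List<Object>> join(final List<String> defaults, final List<String> seconds) {
        final List<Object> sizes = new ArrayList<>();
        final List<Object> values = new ArrayList<>();
        final int count = Math.min(defaults.size(), seconds.size());
        for (int i = 0; i < count; i++) {
            final String joined = defaults.get(i) + seconds.get(i);
            values.add(joined);
            sizes.add(joined.length());
        }
        final Map<String, List<Object>> outputs = new LinkedHashMap<>();
        outputs.put("size", sizes);
        outputs.put("value", values);
        return outputs;
    }
}
```

With the inputs `a`, `bb` on the default branch and `1`, `2` on the second branch, this produces two outputs: `size` holding 2 and 3, and `value` holding `a1` and `bb2`, which matches the three assertions of the test.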

#### component-runtime-testing-spark

The following artifact allows you to test against a Spark cluster:

``````<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>component-runtime-testing-spark</artifactId>
<version>${talend-component.version}</version>
<scope>test</scope>
</dependency>``````

##### JUnit 4

The testing relies on a JUnit `TestRule`. It is recommended to use it as a `@ClassRule`, to make sure that a single instance of a Spark cluster is built. You can also use it as a simple `@Rule`, to create the Spark cluster instances per method instead of per test class.

The `@ClassRule` takes the Spark and Scala versions to use as parameters. It then forks a master and N slaves. Finally, the `submit*` methods allow you to send jobs either from the test classpath or from a shaded JAR if you run it as an integration test.

For example:

``````public class SparkClusterRuleTest {

@ClassRule
public static final SparkClusterRule SPARK = new SparkClusterRule("2.10", "1.6.3", 1);

@Test
public void classpathSubmit() throws IOException {
SPARK.submitClasspath(SubmittableMain.class, getMainArgs());

// wait for the test to pass
}
}``````

This testing methodology works with `@Parameterized`. You can submit several jobs with different arguments and even combine it with Beam `TestPipeline` if you make it `transient`.

##### JUnit 5

The integration of that Spark cluster logic with JUnit 5 is done using the `@WithSpark` marker for the extension. Optionally, it allows you to inject the `BaseSpark<?>` handler through `@SparkInject` to access the Spark cluster meta information, such as its host and port.

Example

``````@WithSpark
class SparkExtensionTest {

@SparkInject
private BaseSpark<?> spark;

@Test
void classpathSubmit() throws IOException {
final File out = new File(jarLocation(SparkClusterRuleTest.class).getParentFile(), "classpathSubmitJunit5.out");
if (out.exists()) {
out.delete();
}
spark.submitClasspath(SparkClusterRuleTest.SubmittableMain.class, spark.getSparkMaster(), out.getAbsolutePath());

await().atMost(5, MINUTES).until(
() -> out.exists() ? Files.readAllLines(out.toPath()).stream().collect(joining("\n")).trim() : null,
equalTo("b -> 1\na -> 1"));
}
}``````

##### Checking the job execution status

Currently, `SparkClusterRule` does not provide a way to know when a job execution is done, not even by exposing the web UI URL so that you could poll it. The best solution at the moment is to make sure that the output of your job exists and contains the right value.

`awaitility` or any equivalent library can help you to implement such logic:

``````<dependency>
<groupId>org.awaitility</groupId>
<artifactId>awaitility</artifactId>
<version>3.0.0</version>
<scope>test</scope>
</dependency>``````

To wait until a file exists and check that its content (for example) is the expected one, you can use the following logic:

``````await()
.atMost(5, MINUTES)
.until(
() -> out.exists() ? Files.readAllLines(out.toPath()).stream().collect(joining("\n")).trim() : null,
equalTo("the expected content of the file"));``````

#### component-runtime-http-junit

The HTTP JUnit module allows you to mock REST APIs very simply. The module coordinates are:

``````<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>component-runtime-http-junit</artifactId>
<version>${talend-component.version}</version>
<scope>test</scope>
</dependency>``````
 This module uses Apache Johnzon and Netty. If you have any conflict (in particular with Netty), you can add the `shaded` classifier to the dependency. This way, both dependencies are shaded, which avoids conflicts with your component.

It supports both JUnit 4 and JUnit 5. The concept is the exact same one: the extension/rule is able to serve precomputed responses saved in the classpath.

You can plug your own `ResponseLocator` to map a request to a response, but the default implementation - which should be sufficient in most cases - looks in `talend/testing/http/<class name>_<method name>.json`. Note that you can also put it in `talend/testing/http/<request path>.json`.
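The default lookup convention described above can be sketched in plain Java. This is only an illustration of the naming scheme: `MockResponsePaths` and its methods are hypothetical helpers, not part of the Talend API, and whether the class name is simple or fully qualified is an assumption left to the caller.

```java
// Hypothetical helper that mirrors the naming convention of the default
// response locator; it is not the library's implementation.
final class MockResponsePaths {

    static final String PREFIX = "talend/testing/http/";

    // Path derived from the test class and test method names.
    static String forTest(String className, String methodName) {
        return PREFIX + className + "_" + methodName + ".json";
    }

    // Alternative path derived from the request path (leading slash dropped).
    static String forRequest(String requestPath) {
        String trimmed = requestPath.startsWith("/") ? requestPath.substring(1) : requestPath;
        return PREFIX + trimmed + ".json";
    }

    public static void main(String[] args) {
        System.out.println(forTest("MyRESTApiTest", "direct"));
        System.out.println(forRequest("/users/1"));
    }
}
```

Printing these paths while writing a test can help you check where a mock response file is expected on the classpath.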

##### JUnit 4

JUnit 4 setup is done through two rules:

• `JUnit4HttpApi`, which starts the server.

• `JUnit4HttpApiPerMethodConfigurator`, which configures the server per test and also handles the capture mode.

 If you don’t use the `JUnit4HttpApiPerMethodConfigurator`, the capture feature is disabled and the per test mocking is not available.
Test example
``````public class MyRESTApiTest {
@ClassRule
public static final JUnit4HttpApi API = new JUnit4HttpApi();

@Rule
public final JUnit4HttpApiPerMethodConfigurator configurator = new JUnit4HttpApiPerMethodConfigurator(API);

@Test
public void direct() throws Exception {
}
}``````
###### SSL

For tests using SSL-based services, you need to use `activeSsl()` on the `JUnit4HttpApi` rule.

You can access the client SSL socket factory through the API handler:

``````@ClassRule
public static final JUnit4HttpApi API = new JUnit4HttpApi().activeSsl();

@Test
public void test() throws Exception {
final HttpsURLConnection connection = getHttpsConnection();
connection.setSSLSocketFactory(API.getSslContext().getSocketFactory());
// ....
}``````
##### JUnit 5

JUnit 5 uses a JUnit 5 extension based on the `HttpApi` annotation that you can add to your test class. You can inject the test handler - which has some utilities for advanced cases - through `@HttpApiInject`:

``````@HttpApi
class JUnit5HttpApiTest {
@HttpApiInject
private HttpApiHandler<?> handler;

@Test
void getProxy() throws Exception {
}
}``````
 The injection is optional and the `@HttpApi` annotation allows you to configure several test behaviors.
###### SSL

For tests using SSL-based services, you need to use `@HttpApi(useSsl = true)`.

You can access the client SSL socket factory through the API handler:

``````@HttpApi(useSsl = true)
class MyHttpsApiTest {
@HttpApiInject
private HttpApiHandler<?> handler;

@Test
void test() throws Exception {
final HttpsURLConnection connection = getHttpsConnection();
connection.setSSLSocketFactory(handler.getSslContext().getSocketFactory());
// ....
}
}``````
##### Capturing mode

The strength of this implementation is to run a small proxy server and to auto-configure the JVM: `http[s].proxyHost`, `http[s].proxyPort`, `HttpsURLConnection#defaultSSLSocketFactory` and `SSLContext#default` are auto-configured to work out-of-the-box with the proxy.

It allows you to keep the native and real URLs in your tests. For example, the following test is valid:

``````public class GoogleTest {
@ClassRule
public static final JUnit4HttpApi API = new JUnit4HttpApi();

@Rule
public final JUnit4HttpApiPerMethodConfigurator configurator = new JUnit4HttpApiPerMethodConfigurator(API);

@Test
public void google() throws Exception {
}

private int get(final String uri) throws Exception {
// do the GET request, skipped for brevity
}
}``````

If you execute this test, it fails with an HTTP 400 error because the proxy does not find the mocked response.
You can create it manually, as described in component-runtime-http-junit, but you can also set the `talend.junit.http.capture` property to the folder storing the captures. It must be the root folder and not the folder where the JSON files are located (not prefixed by `talend/testing/http` by default).

In most cases, use `src/test/resources`. If `new File("src/test/resources")` resolves to the valid folder when executing your test (Maven default), then you can just set the system property to `true`. Otherwise, you need to adjust the system property value accordingly.

When the tests run with this system property, the testing framework creates the correct mock response files. After that, you can remove the system property. The tests will still pass, using `google.com`, even if you disconnect your machine from the Internet.
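The way the capture root folder is derived from the property value can be sketched as follows. This is a minimal sketch based only on the description above: `CaptureFolders` is a hypothetical helper, and the real framework may resolve the folder differently.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: resolves the capture root from the value of the
// talend.junit.http.capture system property, as described in the text.
final class CaptureFolders {

    // "true" means the Maven default folder; any other value is the root folder itself.
    static Path root(String propertyValue) {
        return "true".equalsIgnoreCase(propertyValue)
                ? Paths.get("src/test/resources")
                : Paths.get(propertyValue);
    }

    // The JSON files live under the default talend/testing/http prefix inside the root.
    static Path jsonFolder(String propertyValue) {
        return root(propertyValue).resolve("talend").resolve("testing").resolve("http");
    }

    public static void main(String[] args) {
        System.out.println(jsonFolder("true"));
        System.out.println(jsonFolder("module/src/test/resources"));
    }
}
```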

##### Passthrough mode

If you set the `talend.junit.http.passthrough` system property to `true`, the server acts as a proxy and executes each request to the actual server - similarly to the capturing mode.
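Both the capturing and passthrough modes rely on the JVM proxy settings listed earlier. The following sketch shows, with the JDK only, how setting `http.proxyHost` and `http.proxyPort` changes the proxy that the default `ProxySelector` picks for a real URL. The class name, host, and port are illustrative assumptions; the Talend rule or extension sets these properties for you.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

// Illustration only: shows the effect of the JVM-wide HTTP proxy properties.
final class ProxyPropertiesSketch {

    // Sets the proxy properties, asks the default selector which proxy it
    // would use for the URL, then restores the previous values.
    static Proxy resolve(String host, int port, String url) {
        String previousHost = System.getProperty("http.proxyHost");
        String previousPort = System.getProperty("http.proxyPort");
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
        try {
            List<Proxy> proxies = ProxySelector.getDefault().select(URI.create(url));
            return proxies.get(0);
        } finally {
            restore("http.proxyHost", previousHost);
            restore("http.proxyPort", previousPort);
        }
    }

    private static void restore(String key, String value) {
        if (value == null) {
            System.clearProperty(key);
        } else {
            System.setProperty(key, value);
        }
    }

    public static void main(String[] args) {
        Proxy proxy = resolve("localhost", 12345, "http://example.com/");
        InetSocketAddress address = (InetSocketAddress) proxy.address();
        System.out.println(proxy.type() + " " + address.getHostString() + ":" + address.getPort());
    }
}
```

Because the URL in the test code stays unchanged, the same test can run against the mock, the capturing proxy, or the real server.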

### Beam testing

If you want to make sure that your component works in Beam and don’t want to use Spark, you can try with the Direct Runner.

Check beam.apache.org/contribute/testing/ for more details.

### Testing on multiple environments

JUnit (4 or 5) already provides ways to parameterize tests and execute the same "test logic" against several sets of data. However, it is not very convenient for testing multiple environments.

For example, with Beam, you can test your code against multiple runners. But it requires resolving conflicts between runner dependencies, setting the correct classloaders, and so on.

To simplify such cases, the framework provides multi-environment support for your tests, through the JUnit module, which works with both JUnit 4 and JUnit 5.

#### JUnit 4

``````@RunWith(MultiEnvironmentsRunner.class)
@Environment(Env1.class)
@Environment(Env2.class)
public class TheTest {
@Test
public void test1() {
// ...
}
}``````

The `MultiEnvironmentsRunner` executes the tests for each defined environment. With the example above, it means that it runs `test1` for `Env1` and `Env2`.

By default, the `JUnit4` runner is used to execute the tests in one environment, but you can use `@DelegateRunWith` to use another runner.

#### JUnit 5

The multi-environment configuration with JUnit 5 is similar to JUnit 4:

``````@Environment(EnvironmentsExtensionTest.E1.class)
@Environment(EnvironmentsExtensionTest.E2.class)
class TheTest {

@EnvironmentalTest
void test1() {
// ...
}
}``````

The main differences are that no runner is used because they do not exist in JUnit 5, and that you need to replace `@Test` by `@EnvironmentalTest`.

With JUnit 5, tests are executed one after another for all environments, while with JUnit 4 tests are run sequentially in each environment. For example, this means that `@BeforeAll` and `@AfterAll` are executed once for all runners.
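One reading of the ordering difference described in the note above can be simulated in plain Java. This sketch does not use JUnit: `ExecutionOrder` is a hypothetical class that only lists the order in which (test, environment) pairs would run under each interpretation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the two execution orders; it does not run any test.
final class ExecutionOrder {

    // JUnit 4 style: all tests run in one environment before moving to the next one.
    static List<String> perEnvironment(List<String> environments, List<String> tests) {
        List<String> order = new ArrayList<>();
        for (String environment : environments) {
            for (String test : tests) {
                order.add(test + "[" + environment + "]");
            }
        }
        return order;
    }

    // JUnit 5 style: each test runs in every environment before the next test starts.
    static List<String> perTest(List<String> environments, List<String> tests) {
        List<String> order = new ArrayList<>();
        for (String test : tests) {
            for (String environment : environments) {
                order.add(test + "[" + environment + "]");
            }
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> environments = List.of("Env1", "Env2");
        List<String> tests = List.of("test1", "test2");
        System.out.println("JUnit 4: " + perEnvironment(environments, tests));
        System.out.println("JUnit 5: " + perTest(environments, tests));
    }
}
```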

#### Provided environments

The provided environment sets the contextual classloader in order to load the related runner of Apache Beam.

Package: `org.talend.sdk.component.junit.environment.builtin.beam`

The configuration is read from system properties, environment variables, and so on.

| Class | Name | Description |
|---|---|---|
| ContextualEnvironment | Contextual | Contextual runner |
| DirectRunnerEnvironment | Direct | Direct runner |
| SparkRunnerEnvironment | Spark | Spark runner |

#### Configuring environments

If the environment extends `BaseEnvironmentProvider` and therefore defines an environment name - which is the case of the default ones - you can use `EnvironmentConfiguration` to customize the system properties used for that environment:

``````@Environment(DirectRunnerEnvironment.class)
@EnvironmentConfiguration(
environment = "Direct",
systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))

@Environment(SparkRunnerEnvironment.class)
@EnvironmentConfiguration(
environment = "Spark",
systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))

@EnvironmentConfiguration(
systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))
class MyBeamTest {

@EnvironmentalTest
void execute() {
// run some pipeline
}
}``````
 If you set the `.skip` system property to `true`, the environment-related executions are skipped.

This usage assumes that Beam 2.4.0 is used.
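Conceptually, an environment configuration applies system properties for the duration of an execution. The following sketch illustrates that idea with the JDK only. `ScopedSystemProperties` is a hypothetical helper, not the framework's implementation, and the property value used in `main` is a placeholder.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: sets system properties around a block of code and
// restores the previous values afterwards.
final class ScopedSystemProperties {

    static void runWith(Map<String, String> overrides, Runnable body) {
        Map<String, String> previous = new HashMap<>();
        for (Map.Entry<String, String> entry : overrides.entrySet()) {
            // Remember the old value (possibly null) before overriding it.
            previous.put(entry.getKey(), System.getProperty(entry.getKey()));
            System.setProperty(entry.getKey(), entry.getValue());
        }
        try {
            body.run();
        } finally {
            for (Map.Entry<String, String> entry : previous.entrySet()) {
                if (entry.getValue() == null) {
                    System.clearProperty(entry.getKey());
                } else {
                    System.setProperty(entry.getKey(), entry.getValue());
                }
            }
        }
    }

    public static void main(String[] args) {
        runWith(Map.of("beamTestPipelineOptions", "placeholder"),
                () -> System.out.println(System.getProperty("beamTestPipelineOptions")));
        System.out.println(System.getProperty("beamTestPipelineOptions"));
    }
}
```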

The following dependencies bring the JUnit testing toolkit, the Beam integration and the multi-environment testing toolkit for JUnit into the test scope.

Dependencies
``````<dependencies>
<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>component-runtime-junit</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-api</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.jboss.shrinkwrap.resolver</groupId>
<artifactId>shrinkwrap-resolver-impl-maven</artifactId>
<version>3.1.3</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>component-runtime-beam</artifactId>
<scope>test</scope>
</dependency>
</dependencies>``````

Using the fluent DSL to define jobs, you can write a test as follows:

 Your job must be linear and each step must send a single value (no multi-input or multi-output).
``````@Environment(ContextualEnvironment.class)
@Environment(DirectRunnerEnvironment.class)
class TheComponentTest {
@EnvironmentalTest
void testWithStandaloneAndBeamEnvironments() {
from("myfamily://in?config=xxxx")
.to("myfamily://out")
.create()
.execute();
// add asserts on the output if needed
}
}``````

It executes the chain twice:

1. With a standalone environment to simulate the Studio.

2. With a Beam (direct runner) environment to ensure the portability of your job.

You can reuse Maven `settings.xml` server files, including the encrypted ones. `org.talend.sdk.component.maven.MavenDecrypter` allows you to find a `username`/`password` from a server identifier:

``````final MavenDecrypter decrypter = new MavenDecrypter();
final Server decrypted = decrypter.find("my-test-server");``````

It is very useful to avoid storing secrets and to perform tests on real systems on a continuous integration platform.

Even if you don’t use Maven on the platform, you can generate the `settings.xml` and `settings-security.xml` files to use that feature. See maven.apache.org/guides/mini/guide-encryption.html for more details.
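To illustrate what a lookup by server identifier means, here is a minimal sketch that reads a `<server>` entry from a `settings.xml` document with the JDK XML parser. `SettingsServerLookup` is a hypothetical class; unlike `MavenDecrypter`, it does not decrypt anything and returns the password exactly as stored.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical, simplified lookup: no decryption is performed.
final class SettingsServerLookup {

    // Returns {username, password} for the server id, or null when it is absent.
    static String[] find(String settingsXml, String serverId) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(settingsXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            NodeList servers = root.getElementsByTagName("server");
            for (int i = 0; i < servers.getLength(); i++) {
                Element server = (Element) servers.item(i);
                if (serverId.equals(text(server, "id"))) {
                    return new String[] { text(server, "username"), text(server, "password") };
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Invalid settings.xml content", e);
        }
    }

    // Text of the first child element with the given tag, or null.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```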

### Generating data

Several data generators exist if you want to populate objects with a semantic that is more evolved than a plain random string like `commons-lang3`:

Even more advanced, the following generators allow you to bind generic data directly to a model. However, data quality is not always optimal:

There are two main kinds of implementation:

• Implementations using a pattern and random generated data.

• Implementations using a set of precomputed data extrapolated to create new values.

Check your use case to know which one fits best.
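The difference between the two kinds of implementation can be sketched with the JDK only. `SampleDataGenerators`, its template, and its name lists are illustrative assumptions and do not come from any of the libraries mentioned in this section.

```java
import java.util.List;
import java.util.Random;

// Hypothetical generators illustrating the two approaches.
final class SampleDataGenerators {

    private static final List<String> FIRST_NAMES = List.of("Ada", "Grace", "Linus");
    private static final List<String> LAST_NAMES = List.of("Lovelace", "Hopper", "Torvalds");

    // Pattern-based: fills a fixed template with randomly generated digits.
    static String fromPattern(Random random) {
        return String.format("user-%04d", random.nextInt(10_000));
    }

    // Dataset-based: recombines precomputed values to create new ones.
    static String fromDataset(Random random) {
        return FIRST_NAMES.get(random.nextInt(FIRST_NAMES.size()))
                + " " + LAST_NAMES.get(random.nextInt(LAST_NAMES.size()));
    }

    public static void main(String[] args) {
        Random random = new Random(42L); // fixed seed keeps test data reproducible
        System.out.println(fromPattern(random));
        System.out.println(fromDataset(random));
    }
}
```

Pattern-based values are cheap but look synthetic, while dataset-based values look realistic but are limited by the size of the source data.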

 An alternative to data generation can be to import real data and use Talend Studio to sanitize the data, by removing sensitive information and replacing it with generated or anonymized data. Then you just need to inject that file into the system.

If you are using JUnit 5, you can have a look at glytching.github.io/junit-extensions/randomBeans.

## Talend Component Kit best practices

Some recommendations apply to the way component packages are organized:

1. Make sure to create a `package-info.java` file with the component family/categories at the root of your component package:

``````@Components(family = "jdbc", categories = "Database")
package org.talend.sdk.component.jdbc;

import org.talend.sdk.component.api.component.Components;``````
2. Create a package for the configuration.

3. Create a package for the actions.

4. Create a package for the component and one sub-package by type of component (input, output, processors, and so on).

### Configuring components

It is recommended to serialize your configuration in order to be able to pass it through other components.
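A quick way to check that a configuration class can be serialized is a round trip through Java serialization. This sketch assumes `java.io.Serializable` is the mechanism in use, as in the `JdbcDataStore` examples of this guide; `DemoConfiguration` and `SerializationCheck` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical configuration class, used only for this illustration.
final class DemoConfiguration implements Serializable {
    private static final long serialVersionUID = 1L;
    final String url;
    final int timeout;

    DemoConfiguration(String url, int timeout) {
        this.url = url;
        this.timeout = timeout;
    }
}

final class SerializationCheck {

    // Writes the value to bytes and reads it back, as happens when a
    // configuration is passed to another JVM.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Configuration is not serializable", e);
        }
    }
}
```

Calling `SerializationCheck.roundTrip(config)` in a unit test fails fast when a non-serializable field is added to the configuration.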

#### Input and output components

When building a new component, the first step is to identify the way it must be configured.

The two main concepts are:

1. The DataStore, which is the way you access the backend.

2. The DataSet, which is the way you interact with the backend.

For example:

| Example description | DataStore | DataSet |
|---|---|---|
| Accessing a relational database like MySQL | | Query to execute, row mapper, and so on. |
| Accessing a file system | File pattern (or directory + file extension/prefix/…) | File format, buffer size, and so on. |

It is common to have the dataset including the datastore, because both are required to work. However, it is recommended to replace this pattern by defining both dataset and datastore in a higher level configuration model. For example:

``````@DataSet
public class MyDataSet {
// ...
}

@DataStore
public class MyDataStore {
// ...
}

public class MyComponentConfiguration {
@Option
private MyDataSet dataset;

@Option
private MyDataStore datastore;
}``````

Input and output components are particular because they can be linked to a set of actions. It is recommended to wire all the actions you can apply to ensure the consumers of your component can provide a rich experience to their users.

The most common actions are the following ones:

| Type | Action | Description |
|---|---|---|
| DataStore | `@Checkable` | Exposes a way to ensure the datastore/connection works |

Configuration example
``````@DataStore
@Checkable
public class JdbcDataStore
        implements Serializable {

    @Option
    private String driver;

    @Option
    private String url;

    // other @Option fields are elided in the source
}``````
Action example
``````@HealthCheck
public HealthCheckStatus healthCheck(@Option("datastore") JdbcDataStore datastore) {
    if (!doTest(datastore)) {
        // often add an exception message mapping or equivalent
        return new HealthCheckStatus(Status.KO, "Test failed");
    }
    return new HealthCheckStatus(Status.OK, "Test succeeded");
}``````
##### Limitations

Until the Studio integration is complete, it is recommended to limit processors to one input.

#### Processor components

Configuring processor components is simpler than configuring input and output components because it is specific for each component. For example, a mapper takes the mapping between the input and output models:

``````public class MappingConfiguration {
@Option
private Map<String, String> fieldsMapping;

@Option
private boolean ignoreCase;

//...
}``````

### Handling UI interactions

It is recommended to provide as much information as possible to let the UI work with the data while it is being edited.

#### Validations

##### Light validations

Light validations are all the validations you can execute on the client side. They are listed in the UI hint section.

Use light validations first before going with custom validations because they are more efficient.
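The kind of check a client can run locally can be sketched as follows. `LightValidation` is a hypothetical helper, and the `pattern`, `minLength`, and `maxLength` parameters are named after the validation metadata fields that appear in the REST payloads of this guide. This is not the implementation used by any Talend UI.

```java
import java.util.regex.Pattern;

// Hypothetical client-side check built from validation metadata.
final class LightValidation {

    // Returns true when the value respects the length bounds and fully matches the pattern.
    static boolean isValid(String value, String pattern, int minLength, int maxLength) {
        if (value == null) {
            return false;
        }
        int length = value.length();
        return length >= minLength
                && length <= maxLength
                && Pattern.matches(pattern, value);
    }

    public static void main(String[] args) {
        System.out.println(isValid("com.mysql.jdbc.Driver", "[\\w.]+", 1, 64));
        System.out.println(isValid("not a class name", "[\\w.]+", 1, 64));
    }
}
```

Such checks need no server round trip, which is why they are cheaper than custom validations.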

##### Custom validations

Custom validations require custom code to be executed and are heavier to process, so prefer light validations when possible.

Define an action with the parameters needed for the validation and link the option you want to validate to this action. For example, to validate a dataset for a JDBC driver:

``````// ...
public class JdbcDataStore
implements Serializable {

@Option
@Validable("driver")
private String driver;

// ...
}

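@AsyncValidation("driver")
public ValidationResult validateDriver(@Option("value") String driver) {
    if (findDriver(driver) != null) {
        return new ValidationResult(Status.OK, "Driver found");
    }
    // without this return, the method does not compile
    return new ValidationResult(Status.KO, "Driver not found");
}``````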

You can also define a Validable class and use it to validate a form by setting it on your whole configuration:

``````// Note: some parts of the API were removed for clarity

public class MyConfiguration {

// a lot of @Options
}

public class MyComponent {
public MyComponent(@Validable("configuration") MyConfiguration config) {
// ...
}

//...
}

@AsyncValidation("configuration")
public ValidationResult validateDriver(@Option("value") MyConfiguration configuration) {
if (isValid(configuration)) {
return new ValidationResult(Status.OK, "Configuration valid");
}
return new ValidationResult(Status.KO, "Driver not valid ${because ...}");
}``````

The parameter binding of the validation method uses the same logic as the component configuration injection. Therefore, the `@Option` value specifies the prefix to use to reference a parameter. It is recommended to use `@Option("value")` until you know exactly why you don’t use it. This way, the consumer can match the configuration model and just prefix it with `value.` to send the instance to validate.

#### Completion

It can be handy and user-friendly to provide completion on some fields. For example, to define completion for available drivers:

``````// ...
public class JdbcDataStore
        implements Serializable {

    @Option
    @Completable("driver")
    private String driver;

    // ...
}

@Completion("driver")
public CompletionList findDrivers() {
    return new CompletionList(findDriverList());
}``````

#### Component representation

Each component must have its own icon:

``````@Icon(Icon.IconType.DB_INPUT)
@PartitionMapper(family = "jdbc", name = "input")
public class JdbcPartitionMapper
        implements Serializable {
}``````

You can use talend.surge.sh/icons/ to find the icon you want to use.

### Enforcing versioning on components

It is recommended to enforce the version of your component, even though it is not mandatory for the first version.

``````@Version(1)
@PartitionMapper(family = "jdbc", name = "input")
public class JdbcPartitionMapper
        implements Serializable {
}``````

If you break a configuration entry in a later version, make sure to:

1. Upgrade the version.

2. Support a migration of the configuration.

``````@Version(value = 2, migrationHandler = JdbcPartitionMapper.Migrations.class)
@PartitionMapper(family = "jdbc", name = "input")
public class JdbcPartitionMapper
        implements Serializable {

    public static class Migrations implements MigrationHandler {
        // implement your migration
    }
}``````

### Testing components

Testing your components is critical.
You can use unit and simple standalone JUnit tests, but it is also highly recommended to have Beam tests in order to make sure that your component works in Big Data.

## Talend Component REST API Documentation

A test environment is available on Heroku and can be browsed using Talend Component Kit Server instance on Restlet Studio.

### HTTP API

The HTTP API intends to expose most Talend Component Kit features over HTTP. It is a standalone Java HTTP server.

The WebSocket protocol is activated for the endpoints. Endpoints then use `/websocket/v1` as base instead of `/api/v1`. See WebSocket for more details.

Here is the API:

#### REST resources of Component Runtime :: Server Parent :: Server 1.0.2-SNAPSHOT

##### `POST api/v1/action/execute`

This endpoint executes any UI action and serializes the response as JSON (POJO model).

It takes as input the family, type and name of the related action to identify it, and its configuration as a flat key/value set using the same kind of mapping as for components (option path as key).

###### Request

Content-Type: `application/json`
Request Body: (`java.util.Map<java.lang.String, java.lang.String>`)
Query Param: `action`, `java.lang.String`
Query Param: `family`, `java.lang.String`
Query Param: `lang`, `java.lang.String`
Query Param: `type`, `java.lang.String`

###### Response

Content-Type: `application/json`

`200 OK`
Response Body: (``)

`400 Bad Request`
Response Body: (`org.talend.sdk.component.server.front.model.error.ErrorPayload`)

``````{
  "code": "ACTION_ERROR|ACTION_MISSING|BAD_FORMAT|COMPONENT_MISSING|CONFIGURATION_MISSING|DESIGN_MODEL_MISSING|FAMILY_MISSING|ICON_MISSING|PLUGIN_MISSING|UNAUTHORIZED|UNEXPECTED",
  "description": "string"
}``````

`404 Not Found`
Response Body: (`org.talend.sdk.component.server.front.model.error.ErrorPayload`)

``````{
  "code": "ACTION_ERROR|ACTION_MISSING|BAD_FORMAT|COMPONENT_MISSING|CONFIGURATION_MISSING|DESIGN_MODEL_MISSING|FAMILY_MISSING|ICON_MISSING|PLUGIN_MISSING|UNAUTHORIZED|UNEXPECTED",
  "description": "string"
}``````

##### `GET api/v1/action/index`

This endpoint returns the list of available actions for a certain family and potentially filters the output, limiting it to some families and types of actions.

###### Request

No body
Query Param: `family`, `java.lang.String`
Query Param: `language`, `java.lang.String`
Query Param: `type`, `java.lang.String`

###### Response

Content-Type: `application/json`

`200 OK`
Response Body: (`org.talend.sdk.component.server.front.model.ActionList`)

``````{
  "items": [
    {
      "component": "string",
      "name": "string",
      "properties": [
        {
          "defaultValue": "string",
          "displayName": "string",
          "metadata": { },
          "name": "string",
          "path": "string",
          "placeholder": "string",
          "proposalDisplayNames": { },
          "type": "string",
          "validation": {
            "enumValues": [ "string" ],
            "max": 0,
            "maxItems": 0,
            "maxLength": 0,
            "min": 0,
            "minItems": 0,
            "minLength": 0,
            "pattern": "string",
            "required": false,
            "uniqueItems": false
          }
        }
      ],
      "type": "string"
    }
  ]
}``````

##### `GET api/v1/component/dependencies`

Returns a list of dependencies for the given components.

Don’t forget to add the component itself since it will not be part of the dependencies. Then you can use `/dependency/{id}` to download the binary.

###### Request

No body
Query Param: `identifier`, `java.lang.String`

###### Response

Content-Type: `application/json`

`200 OK`
Response Body: (`org.talend.sdk.component.server.front.model.Dependencies`)

``````{
  "dependencies": { }
}``````

##### `GET api/v1/component/dependency/{id}`

Returns a binary of the dependency represented by `id`. It can be Maven coordinates for dependencies or a component id.

###### Request

No body
Path Param: `id`, `java.lang.String`

###### Response

Content-Type: `application/json`

`200 OK`
Response Body: (`javax.ws.rs.core.StreamingOutput`)

`404 Not Found`
Response Body: (`org.talend.sdk.component.server.front.model.error.ErrorPayload`)

``````{
  "code": "ACTION_ERROR|ACTION_MISSING|BAD_FORMAT|COMPONENT_MISSING|CONFIGURATION_MISSING|DESIGN_MODEL_MISSING|FAMILY_MISSING|ICON_MISSING|PLUGIN_MISSING|UNAUTHORIZED|UNEXPECTED",
  "description": "string"
}``````

##### `GET api/v1/component/details`

Returns the set of metadata about a few components identified by their 'id'.

###### Request

No body
Query Param: `identifiers`, `java.lang.String`
Query Param: `language`, `java.lang.String`

###### Response

Content-Type: `application/json`

`200 OK`
Response Body: (`org.talend.sdk.component.server.front.model.ComponentDetailList`)

``````{
  "details": [
    {
      "actions": [
        {
          "family": "string",
          "name": "string",
          "properties": [
            {
              "defaultValue": "string",
              "displayName": "string",
              "metadata": { },
              "name": "string",
              "path": "string",
              "placeholder": "string",
              "proposalDisplayNames": { },
              "type": "string",
              "validation": {
                "enumValues": [ "string" ],
                "max": 0,
                "maxItems": 0,
                "maxLength": 0,
                "min": 0,
                "minItems": 0,
                "minLength": 0,
                "pattern": "string",
                "required": false,
                "uniqueItems": false
              }
            }
          ],
          "type": "string"
        }
      ],
      "displayName": "string",
      "icon": "string",
      "id": {
        "family": "string",
        "familyId": "string",
        "id": "string",
        "name": "string",
        "plugin": "string",
        "pluginLocation": "string"
      },
      "inputFlows": [ "string" ],
      "links": [
        {
          "contentType": "string",
          "name": "string",
          "path": "string"
        }
      ],
      "outputFlows": [ "string" ],
      "properties": [
        {
          "defaultValue": "string",
          "displayName": "string",
          "metadata": { },
          "name": "string",
          "path": "string",
          "placeholder": "string",
          "proposalDisplayNames": { },
          "type": "string",
          "validation": {
            "enumValues": [ "string" ],
            "max": 0,
            "maxItems": 0,
            "maxLength": 0,
            "min": 0,
            "minItems": 0,
            "minLength": 0,
            "pattern": "string",
            "required": false,
            "uniqueItems": false
          }
        }
      ],
      "type": "string",
      "version": 0
    }
  ]
}``````

`400 Bad Request`
Response Body: (`java.util.Map<java.lang.String, org.talend.sdk.component.server.front.model.error.ErrorPayload>`)

##### `GET api/v1/component/icon/family/{id}`

Returns a particular family icon in raw bytes.

###### Request

No body
Path Param: `id`, `java.lang.String`

###### Response

Content-Type: `application/json`

`200 OK`
Response Body: (`byte[]`)

``````{ }``````

`404 Not Found`
Response Body: (`org.talend.sdk.component.server.front.model.error.ErrorPayload`)

``````{
  "code": "ACTION_ERROR|ACTION_MISSING|BAD_FORMAT|COMPONENT_MISSING|CONFIGURATION_MISSING|DESIGN_MODEL_MISSING|FAMILY_MISSING|ICON_MISSING|PLUGIN_MISSING|UNAUTHORIZED|UNEXPECTED",
  "description": "string"
}``````

##### `GET api/v1/component/icon/{id}`

Returns a particular component icon in raw bytes.

###### Request

No body
Path Param: `id`, `java.lang.String`

###### Response

Content-Type: `application/json`

`200 OK`
Response Body: (`byte[]`)

``````{ }``````

`404 Not Found`
Response Body: (`org.talend.sdk.component.server.front.model.error.ErrorPayload`)

``````{
  "code": "ACTION_ERROR|ACTION_MISSING|BAD_FORMAT|COMPONENT_MISSING|CONFIGURATION_MISSING|DESIGN_MODEL_MISSING|FAMILY_MISSING|ICON_MISSING|PLUGIN_MISSING|UNAUTHORIZED|UNEXPECTED",
  "description": "string"
}``````

##### `GET api/v1/component/index`

Returns the list of available components.

###### Request

No body
Query Param: `includeIconContent`, `boolean`
Query Param: `language`, `java.lang.String`

###### Response

Content-Type: `application/json`

`200 OK`
Response Body: (`org.talend.sdk.component.server.front.model.ComponentIndices`)

``````{
  "components": [
    {
      "categories": [ "string" ],
      "displayName": "string",
      "familyDisplayName": "string",
      "icon": {
        "customIcon": { },
        "customIconType": "string",
        "icon": "string"
      },
      "iconFamily": {
        "customIcon": { },
        "customIconType": "string",
        "icon": "string"
      },
      "id": {
        "family": "string",
        "familyId": "string",
        "id": "string",
        "name": "string",
        "plugin": "string",
        "pluginLocation": "string"
      },
      "links": [
        {
          "contentType": "string",
          "name": "string",
          "path": "string"
        }
      ],
      "version": 0
    }
  ]
}``````

##### `POST api/v1/component/migrate/{id}/{configurationVersion}`

Allows you to migrate a component configuration without calling any component execution.

###### Request

Content-Type: `application/json`
Request Body: (`java.util.Map<java.lang.String, java.lang.String>`)
Path Param: `configurationVersion`, `int`
Path Param: `id`, `java.lang.String`

###### Response

Content-Type: `application/json`

`200 OK`
Response Body: (`java.util.Map<java.lang.String, java.lang.String>`)

##### `GET api/v1/configurationtype/details`

Returns the set of metadata about a few configurations identified by their 'id'.

###### Request

No body
Query Param: `identifiers`, `java.lang.String`
Query Param: `language`, `java.lang.String`

###### Response

Content-Type: `application/json`

`200 OK`
Response Body: (`org.talend.sdk.component.server.front.model.ConfigTypeNodes`)

``````{
  "nodes": { }
}``````

##### `GET api/v1/configurationtype/index`

Returns all available configuration types (storable models). Note that the `lightPayload` flag allows you to load all of them at once when you eagerly need to create a client model for all configurations.

###### Request

No body
Query Param: `language`, `java.lang.String`
Query Param: `lightPayload`, `boolean`

###### Response

Content-Type: `application/json`

`200 OK`
Response Body: (`org.talend.sdk.component.server.front.model.ConfigTypeNodes`)

``````{
  "nodes": { }
}``````

##### `POST api/v1/configurationtype/migrate/{id}/{configurationVersion}`

Allows you to migrate a configuration without calling any component execution.

###### Request

Content-Type: `application/json`
Request Body: (`java.util.Map<java.lang.String, java.lang.String>`)
Path Param: `configurationVersion`, `int`
Path Param: `id`, `java.lang.String`

###### Response

Content-Type: `application/json`

`200 OK`
Response Body: (`java.util.Map<java.lang.String, java.lang.String>`)

##### `GET api/v1/documentation/component/{id}`

Returns an asciidoctor version of the documentation for the component represented by its identifier `id`.

The format can be either asciidoc or html. Any other value falls back on asciidoc. If html is selected, you get a partial document.

It is recommended to use the asciidoc format and handle the conversion on your side if you can. The html flavor handles only a limited set of the asciidoc syntax, like plain arrays, paragraphs and titles.

The documentation will likely be the family documentation, but you can use anchors to access a particular component (_componentname_inlowercase).

###### Request

No body
Path Param: `id`, `java.lang.String`
Query Param: `format`, `java.lang.String`
Query Param: `language`, `java.lang.String`

###### Response

Content-Type: `application/json`

`200 OK`
Response Body: (`org.talend.sdk.component.server.front.model.DocumentationContent`)

``````{
  "source": "string",
  "type": "string"
}``````

##### `GET api/v1/environment`

Returns the environment of this instance. Useful to check the version or configure a healthcheck for the server.

###### Request

No body

###### Response

Content-Type: `*/*`

`200 OK`
Response Body: (`org.talend.sdk.component.server.front.model.Environment`)

``````{
  "commit": "string",
  "lastUpdated": { },
  "latestApiVersion": 0,
  "time": "string",
  "version": "string"
}``````

##### `POST api/v1/execution/read/{family}/{component}`

Deprecated.

Reads inputs from an instance of mapper. The number of returned records is enforced to be limited to 1000. The format is a JSON-based format where each line is a JSON record.

###### Request

Content-Type: `application/json`
Request Body: (`java.util.Map<java.lang.String, java.lang.String>`)
Path Param: `component`, `java.lang.String`
Path Param: `family`, `java.lang.String`
Query Param: `size`, `long`

###### Response

Content-Type: `talend/stream`

`204 No Content`

##### `POST api/v1/execution/write/{family}/{component}`

Deprecated.

Sends records using a processor instance. Note that the processor should have only one input. Behavior for other processors is undefined. The input format is a JSON-based format where each line is a JSON record, same as for the symmetric endpoint.

###### Request

Content-Type: `talend/stream`
Request Body: (`java.io.InputStream`)
Path Param: `component`, `java.lang.String`
Path Param: `family`, `java.lang.String`
Query Param: `group-size`, `long`

###### Response

Content-Type: `application/json`

`204 No Content`

To make sure that the migration can be enabled, you need to set the version the component was created with in the execution configuration that you send to the server (the component version is available in the component detail endpoint). To do that, use the `tcomp::component::version` key.

#### Deprecated endpoints

Endpoints that are intended to disappear will be deprecated. A `X-Talend-Warning` header will be returned with a message as value.

#### WebSocket transport

You can connect to any endpoint by:

1. Replacing `/api` with `/websocket`

2. Appending `/<http method>` to the URL

3. Formatting the request as:

``````SEND
destination: <endpoint after v1>
<headers>

<payload>^@``````

For example:

``````SEND
destination: /component/index
Accept: application/json

^@``````

The response is formatted as follows:

``````MESSAGE
status: <http status code>
<headers>

<payload>^@``````

All endpoints are logged at startup. You can then find them in the logs if you have a doubt about which one to use.

If you don’t want to create a pool of connections per endpoint/verb, you can use the bus endpoint: `/websocket/v1/bus`. This endpoint requires that you add the `destinationMethod` header to each request with the verb value (`GET` by default):

``````SEND
destination: /component/index
destinationMethod: GET
Accept: application/json

^@``````

### HTTPS activation

Using the server ZIP (or Docker image), you can configure HTTPS by adding properties to `MEECROWAVE_OPTS`.

Assuming that you have a certificate in `/opt/certificates/component.p12` (don’t forget to add/mount it in the Docker image if you use it), you can activate it as follows:

``````# use -e for Docker
#
# this skips the http port binding and only binds https on the port 8443, and sets up the correct certificate
export MEECROWAVE_OPTS="-Dskip-http=true -Dssl=true -Dhttps=8443 -Dkeystore-type=PKCS12 -Dkeystore-alias=talend -Dkeystore-password=talend -Dkeystore-file=/opt/certificates/component.p12"``````

### Web forms and REST API

The `component-form` library provides a way to build a component REST API facade that is compatible with React form library.
For example:

``````@Path("tacokit-facade")
@ApplicationScoped
public class ComponentFacade {
    private static final String[] EMPTY_ARRAY = new String[0];

    @Inject
    private Client client;

    @Inject
    private ActionService actionService;

    @Inject
    private UiSpecService uiSpecService;

    @Inject // assuming it is available in your app, use any client you want
    private WebTarget target;

    @POST
    @Path("action")
    public void action(@Suspended final AsyncResponse response, @QueryParam("family") final String family,
            @QueryParam("type") final String type, @QueryParam("action") final String action,
            final Map<String, Object> params) {
        client.action(family, type, action, params).handle((r, e) -> {
            if (e != null) {
                onException(response, e);
            } else {
                response.resume(actionService.map(type, r));
            }
            return null;
        });
    }

    @GET
    @Path("index")
    public void getIndex(@Suspended final AsyncResponse response,
            @QueryParam("language") @DefaultValue("en") final String language) {
        target
                .path("component/index")
                .queryParam("language", language)
                .request(APPLICATION_JSON_TYPE)
                .rx()
                .get(ComponentIndices.class)
                .toCompletableFuture()
                .handle((index, e) -> {
                    if (e != null) {
                        onException(response, e);
                    } else {
                        index.getComponents().stream().flatMap(c -> c.getLinks().stream()).forEach(
                                link -> link.setPath(link.getPath().replaceFirst("/component/", "/application/").replace(
                                        "/details?identifiers=", "/detail/")));
                        response.resume(index);
                    }
                    return null;
                });
    }

    @GET
    @Path("detail/{id}")
    public void getDetail(@Suspended final AsyncResponse response,
            @QueryParam("language") @DefaultValue("en") final String language, @PathParam("id") final String id) {
        target
                .path("component/details")
                .queryParam("language", language)
                .queryParam("identifiers", id)
                .request(APPLICATION_JSON_TYPE)
                .rx()
                .get(ComponentDetailList.class)
                .toCompletableFuture()
                .thenCompose(result -> uiSpecService.convert(result.getDetails().iterator().next()))
                .handle((result, e) -> {
                    if (e != null) {
                        onException(response, e);
                    } else {
                        response.resume(result);
                    }
                    return null;
                });
    }

    private void onException(final AsyncResponse response, final Throwable e) {
        final UiActionResult payload;
        final int status;
        if (WebException.class.isInstance(e)) {
            final WebException we = WebException.class.cast(e);
            status = we.getStatus();
            payload = actionService.map(we);
        } else if (CompletionException.class.isInstance(e)) {
            final CompletionException actualException = CompletionException.class.cast(e);
            log.error(actualException.getMessage(), actualException);
            status = Response.Status.BAD_GATEWAY.getStatusCode();
            payload = actionService.map(new WebException(actualException, -1, emptyMap()));
        } else {
            log.error(e.getMessage(), e);
            status = Response.Status.BAD_GATEWAY.getStatusCode();
            payload = actionService.map(new WebException(e, -1, emptyMap()));
        }
        response.resume(new WebApplicationException(Response.status(status).entity(payload).build()));
    }
}``````

The `Client` can be created using `ClientFactory.createDefault(System.getProperty("app.components.base", "http://localhost:8080/api/v1"))` and the service can be a simple `new UiSpecService<>()`. The factory uses JAX-RS if the API is available (assuming a JSON-B provider is registered). Otherwise, it tries to use Spring.

The conversion from the component model (REST API) to the uiSpec model is done through `UiSpecService`. It is based on the object model which is mapped to a UI model. Having a flat model in the component REST API makes it easy to customize layers.

You can completely control the available components, tune the rendering by switching the `uiSchema`, and add or remove parts of the form. You can also add custom actions and buttons for specific needs of the application.

The `/migrate` endpoint is not shown in the previous snippet. If you need it, add it as well.
#### Using the UiSpec model without the tooling

``````<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>component-form-model</artifactId>
<version>${talend-component-kit.version}</version>
</dependency>``````

This Maven dependency provides the UiSpec model classes. You can use the `Ui` API (with or without the builders) to create UiSpec representations.

For example:

``````final Ui form1 = ui()
// (1)
.withJsonSchema(JsonSchema.jsonSchemaFrom(Form1.class).build())
// (2)
.withUiSchema(uiSchema()
.withKey("multiSelectTag")
.withRestricted(false)
.withTitle("Simple multiSelectTag")
.withDescription("This data list accepts values that are not in the list of suggestions")
.withWidget("multiSelectTag")
.build())
// (3)
.withProperties(myFormInstance)
.build();

// (4)
final String json = jsonb.toJson(form1);``````
1. The `JsonSchema` is extracted from reflection on the `Form1` class. Note that `@JsonSchemaIgnore` lets you ignore a field and `@JsonSchemaProperty` lets you rename a property.

2. A `UiSchema` is programmatically built using the builder API.

3. An instance of the form is passed to let the serializer extract its JSON model.

4. The `Ui` model, which can be used by UiSpec compatible front widgets, is serialized.

The model uses the JSON-B API to define the binding. Make sure to have an implementation in your classpath. To do that, add the following dependencies:

``````<dependency>
<groupId>org.apache.geronimo.specs</groupId>
<artifactId>geronimo-jsonb_1.0_spec</artifactId>
<version>1.0</version>
</dependency>
<dependency>
<groupId>org.apache.geronimo.specs</groupId>
<artifactId>geronimo-json_1.1_spec</artifactId>
<version>1.0</version>
</dependency>
<dependency>
<groupId>org.apache.johnzon</groupId>
<artifactId>johnzon-jsonb</artifactId>
<version>${johnzon.version}</version> <!-- 1.1.5 for instance -->
</dependency>``````

#### JavaScript integration

Default JavaScript integration goes through the Talend UI Forms library.

It is bundled as an npm module called `component-kit.js`. It provides a default trigger implementation for `UIForm`.

Here is how to use it:

``````import React from 'react';
import UIForm from '@talend/react-forms/lib/UIForm/UIForm.container';
import TalendComponentKitTrigger from 'component-kit.js';

export default class ComponentKitForm extends React.Component {
  constructor(props) {
    super(props);
    this.trigger = new TalendComponentKitTrigger({ url: '/api/to/component/server/proxy' });
    this.onTrigger = this.onTrigger.bind(this);
    // ...
  }

  onTrigger(event, payload) {
    return this.trigger.onDefaultTrigger(event, payload);
  }

  // ...

  render() {
    if(! this.state.uiSpec) {
      return (<div>Loading ...</div>);
    }

    return (
      <UIForm
        data={this.state.uiSpec}
        onTrigger={this.onTrigger}
        onSubmit={this.onSubmit}
      />
    );
  }
}``````

### Logging

The logging uses Log4j2. You can specify a custom configuration by using the `-Dlog4j.configurationFile` system property or by adding a `log4j2.xml` file to the classpath.
Here are some common configurations:

• Console logging:

``````<?xml version="1.0"?>
<Configuration status="INFO">
<Appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="[%d{HH:mm:ss.SSS}][%highlight{%-5level}][%15.15t][%30.30logger] %msg%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="INFO">
<AppenderRef ref="Console"/>
</Root>
</Loggers>
</Configuration>``````

Output messages look like:

``[16:59:58.198][INFO ][ main][oyote.http11.Http11NioProtocol] Initializing ProtocolHandler ["http-nio-34763"]``

• JSON logging:

``````<?xml version="1.0"?>
<Configuration status="INFO">
<Properties>
<!-- DO NOT PUT logSource there, it is useless and slow -->
<Property name="jsonLayout">{"severity":"%level","logMessage":"%encode{%message}{JSON}","logTimestamp":"%d{ISO8601}{UTC}","eventUUID":"%uuid{RANDOM}","@version":"1","logger.name":"%encode{%logger}{JSON}","host.name":"${hostName}","threadName":"%encode{%thread}{JSON}","stackTrace":"%encode{%xThrowable{full}}{JSON}"}%n</Property>
</Properties>
<Appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="${jsonLayout}"/>
</Console>
</Appenders>
<Loggers>
<Root level="INFO">
<AppenderRef ref="Console"/>
</Root>
</Loggers>
</Configuration>``````

Output messages look like:

``{"severity":"INFO","logMessage":"Initializing ProtocolHandler [\"http-nio-46421\"]","logTimestamp":"2017-11-20T16:04:01,763","eventUUID":"8b998e17-7045-461c-8acb-c43f21d995ff","@version":"1","logger.name":"org.apache.coyote.http11.Http11NioProtocol","host.name":"TLND-RMANNIBUCAU","threadName":"main","stackTrace":""}``

• Rolling file appender:

``````<?xml version="1.0"?>
<Configuration status="INFO">
<Appenders>
<RollingRandomAccessFile name="File" fileName="${LOG_PATH}/application.log" filePattern="${LOG_PATH}/application-%d{yyyy-MM-dd}.log">
<PatternLayout pattern="[%d{HH:mm:ss.SSS}][%highlight{%-5level}][%15.15t][%30.30logger] %msg%n"/>
<Policies>
<SizeBasedTriggeringPolicy size="100 MB" />
<TimeBasedTriggeringPolicy interval="1" modulate="true"/>
</Policies>
</RollingRandomAccessFile>
</Appenders>
<Loggers>
<Root level="INFO">
<AppenderRef ref="File"/>
</Root>
</Loggers>
</Configuration>``````

More details are available in the RollingFileAppender documentation.

You can combine the previous layouts (message format) and appenders (where logs are written).

### UiSpec Server

The UiSpec server is a companion application for the Component Server. It provides a client to the Component Server which serves UiSpec payload to integrate with the client JavaScript `UiForm` library.

#### Coordinates

``````<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>component-server-proxy</artifactId>
<version>${server-proxy.version}</version>
</dependency>``````

#### Configuring the UiSpec server

The configuration is read from system properties, environment variables, and so on. If you use `playx-microprofile-config`, you can also use typesafe configuration.

| Key | Description | Default |
|---|---|---|
| `talend.component.proxy.actions.proposable.cached` | If true, the proposables (suggestion lists that only depend on the server state) are cached. Otherwise, they are requested for each form rendering. | true |
| `talend.component.proxy.application.home` | A home location for relative path resolution (optional). | `${playx.application.home}` |
| `talend.component.proxy.client.providers` | List of JAX-RS providers to register on the client. At least a JSON-B one should be here. | - |
| `talend.component.proxy.client.timeouts.connect` | The connect timeout for the communication with the `server.base`, in ms. | 60000 |
| `talend.component.proxy.client.timeouts.read` | The read timeout for the communication with the `server.base`, in ms. | 600000 |
| `talend.component.proxy.jcache.active` | Whether the server uses JCache to store catalog information and refreshes it with some polling. If so, the keys `talend.component.proxy.jcache.caches.$cacheName.expiry.duration`, `talend.component.proxy.jcache.caches.$cacheName.management.active` and `talend.component.proxy.jcache.caches.$cacheName.statistics.active` are read to create a JCache `MutableConfiguration`. If all the caches share the same configuration, you can ignore the `$cacheName` layer. | true |
| `talend.component.proxy.jcache.provider` | Caching provider implementation to use (only set it if ambiguous). | - |
| `talend.component.proxy.jcache.refresh.period` | Number of seconds used to check if the server must be refreshed. | 60 |
| `talend.component.proxy.processing.headers` | The headers to append to the request when contacting the server. The format is a properties one. You can put a hardcoded value or a placeholder (`${key}`). In this case, it is read from the request attributes and headers. | - |
| `talend.component.proxy.processing.uiSpec.patch` | An optional location (absolute or resolved from the `APP_HOME` environment variable). It can take an optional query parameter `force` which specifies if the startup should fail if the file is not resolved. The resolution is done per configuration type (`datastore`, `dataset`, …) but falls back on the `default` type if the file is not found. The values can be keys in the resource bundle `org.talend.sdk.component.proxy.enrichment.i18n.Messages`. Use that for display names, placeholders, etc… The content | |
| `talend.component.proxy.server.base` | The base to contact the remote server (NOTE: it is recommended to put a load balancer if you have multiple instances.) | - |
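The lookup described above ("system properties, environment variables, …") can be pictured with a small sketch. The class name `ProxyConfigLookup`, the precedence order (system property first, then environment), and the dots-to-underscores mapping for environment variable names are assumptions made for illustration. They are not taken from the proxy implementation.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper (not part of the proxy): one possible lookup order for a key.
final class ProxyConfigLookup {
    private ProxyConfigLookup() {
        // no-op
    }

    // Checks the given system properties first, then the environment
    // (assumed naming: dots replaced by underscores), then the default.
    static String resolve(final String key, final Properties systemProperties,
            final Map<String, String> environment, final String defaultValue) {
        final String fromProperties = systemProperties.getProperty(key);
        if (fromProperties != null) {
            return fromProperties;
        }
        final String fromEnvironment = environment.get(key.replace('.', '_'));
        return fromEnvironment != null ? fromEnvironment : defaultValue;
    }

    public static void main(final String[] args) {
        // 60000 is the default documented for the connect timeout
        final String timeout = resolve("talend.component.proxy.client.timeouts.connect",
                System.getProperties(), System.getenv(), "60000");
        System.out.println("connect timeout (ms): " + timeout);
    }
}
```

Passing the properties and the environment as parameters keeps the sketch easy to exercise without touching the real process environment.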

##### Adding custom entries to the forms

As shown in the table above, you can customize the forms by type. The format reuses Talend Component Kit REST API (properties model) and defines two main types of extensions:

1. `prependProperties`: Lists all the models of properties added to the form before the actual underlying form.

2. `appendProperties`: Lists all the models of properties added to the form after the actual underlying form.

If you don’t specify a name, the path is used to deduce the name automatically.

Always make sure to define a root object for these properties. Do not use dots in the `path` value. It is recommended to prefix it with a `$` character.

##### Adding custom converters (selecting the widget or rendering)

When you develop a `org.talend.sdk.component.form.internal.converter.CustomPropertyConverter` CDI bean, the proxy adds it to the `UiSpecService` service and uses it with a high priority to convert the server model to a `UiSpec` model.

To make it a CDI bean, add `@Dependent` to the class and, if you use the Play integration, customize the bean array: `playx.cdi.beans.customs += {className: org.talend.myapp.MyConverter}`.

This lets you use a custom `@Ui` API and advanced modeling when specific to applications.

Converters are sorted according to the `@Priority` value. If the annotation is missing, the priority defaults to `0`.

##### Client in Play

The client used to connect to the Talend Component Kit server is the CXF client, using the HttpClient HC (NIO) transport. When you use the Play module, it can be configured with its standard properties prefixed by `talend.component.proxy.`. You can find more information on the CXF website.

##### Defining a dropdown with all root configurations

The special `dynamic_values` action `builtin::roots` can be used for a dropdown filled with all available root types.

Here is a sample patch file:

``````{
"prependProperties": [
{
"path": "$datasetMetadata",
"type": "OBJECT"
},
{
"path": "$datasetMetadata.type",
"displayName": "Types",
"type": "ENUM",
"metadata": {
"action::dynamic_values": "builtin::roots"
}
}
]
}``````

##### Reloading the form based on the selected root

The `builtin::root::reloadFromId` action, with the `jsonpatch` type, lets you reload the whole form:

``````{
"path": "$datasetMetadata.type",
"displayName": "Types",
"type": "STRING",
"action::dynamic_values": "builtin::roots", (1)
}
}``````
1. Prepopulating the dropdown with the list of datastores.

2. On selection of a datastore, refreshing the form with the new parameters.

It is common to have a dropdown with the list of roots and to reload the form when one is selected.

For example, the `UIForm` part (JavaScript side) can be implemented as follows:

``````import kit from 'component-kit.js';

// ...

constructor(props) {
super(props);

this.state = {};
this.trigger = kit.createTriggers({
url: '/componentproxy/api/v1/actions/execute',
customRegistry: {
reloadForm: ({ body }) => { (1)
const { _datasetMetadata } = this.state.uiSpec.properties;
return {
...body, (2)
};
}
}
});

// ...
}

// ...

.then(result => {
if (result.properties || result.errors || result.uiSchema || result.jsonSchema) { (4)
this.setState({
uiSpec: {
...this.state.uiSpec,
...result,
}
});
}
});
}``````
1. Adding a custom handler for the specific `reloadForm` action.

2. Passing the `uiSchema` and `jsonSchema` to the next step in the response processing chain.

3. Resetting the dynamic part of the form. Only the static part is kept.

4. Merging back the result of the handler into the current state. You can use `redux` or `cmf`.

#### HTTP API

##### Component UiSpec Server
###### Overview

These endpoints let you obtain UiSpec representations of the properties of components and configuration types.

Version information

Version : v1

Contact information

Contact : Talend
Contact Email : contact@talend.com

URI scheme

Host : host:port
BasePath : /componentproxy/api/v1
Schemes : HTTP, HTTPS

Tags
• action

• configuration

• configurations

• dataset

• datastore

• form

• icon

• persistence

• ui spec

• uispec

###### Paths
This endpoint executes an action required by a form.

`POST /actions/execute`

Description

Configuration types have actions that can be executed using this endpoint.

Parameters

| Type | Name | Schema |
|---|---|---|
| Query | action (optional) | string |
| Query | family (optional) | string |
| Query | language (optional) | string |
| Query | type (optional) | string |

Responses

| HTTP Code | Description | Schema |
|---|---|---|
| 200 | Successful operation. The `Talend-Component-Server-Error` header (boolean) indicates the error origin: `true` for an error from the component server, `false` for an error from this proxy. | |
| 400 | Returned when the action is null. | |
| 404 | Returned when no action is found. | |
| 520 | Returned when the action raises an unhandled error. | |

Consumes
• `application/json`

Produces
• `application/json`

Tags
• action

• configurations
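As an illustration of the parameters and headers listed for `POST /actions/execute`, here is a minimal sketch that builds (but does not send) such a request with the JDK HTTP client. The base URL and the `family`, `type`, and `action` values used in `main` are placeholders, and the class name `ActionExecuteRequest` is hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper for the proxy action endpoint.
final class ActionExecuteRequest {
    private ActionExecuteRequest() {
        // no-op
    }

    // Builds a POST request carrying the four documented query parameters and a JSON body.
    static HttpRequest build(final String baseUrl, final String family, final String type,
            final String action, final String language, final String jsonBody) {
        final String query = "family=" + encode(family)
                + "&type=" + encode(type)
                + "&action=" + encode(action)
                + "&language=" + encode(language);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/actions/execute?" + query))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // Interprets the Talend-Component-Server-Error header value:
    // "true" means the component server, anything else (or no header) the proxy.
    static String errorOrigin(final String headerValue) {
        return Boolean.parseBoolean(headerValue) ? "component server" : "proxy";
    }

    private static String encode(final String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(final String[] args) {
        // Placeholder host, family, type and action values
        final HttpRequest request = build("http://localhost:8080/componentproxy/api/v1",
                "myfamily", "healthcheck", "check", "en", "{}");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The request can then be sent with any `java.net.http.HttpClient` instance. Reading the `Talend-Component-Server-Error` response header tells the caller whether an error came from the component server or from the proxy.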

Returns all the available root configurations (datastore-like) from the component server.

`GET /configurations`

Description

Every configuration has an icon. The response contains an icon key, which can be either one of the bundled icons or a custom one. The consumer of this endpoint needs to check whether the icon key is in the icons bundle. Otherwise, the icon needs to be retrieved using the `familyId` from this endpoint: `configurations/{id}/icon`.

Responses

| HTTP Code | Description | Schema |
|---|---|---|
| 200 | Successful operation. The `Talend-Component-Server-Error` header (boolean) indicates the error origin: `true` for an error from the component server, `false` for an error from this proxy. | |

Consumes
• `application/json`

Produces
• `application/json`

Tags
• configurations

• datastore
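The icon check described for `GET /configurations` could look like the following sketch. It assumes the consumer knows the set of icon keys bundled with its UI, and it uses the `GET /configurations/icon/{id}` path listed in this reference for the fallback. The class name `IconResolver` and the sample keys are hypothetical.

```java
import java.util.Set;

// Hypothetical consumer-side helper: decides where an icon comes from.
final class IconResolver {
    private IconResolver() {
        // no-op
    }

    // Returns the icon key when the UI bundle already contains it,
    // otherwise the proxy path from which the icon file can be fetched.
    static String resolve(final String iconKey, final String familyId, final Set<String> bundledIcons) {
        if (bundledIcons.contains(iconKey)) {
            return iconKey;
        }
        return "/componentproxy/api/v1/configurations/icon/" + familyId;
    }

    public static void main(final String[] args) {
        final Set<String> bundled = Set.of("file-o", "database"); // sample keys only
        System.out.println(resolve("database", "family-1", bundled));
        System.out.println(resolve("my-custom-icon", "family-1", bundled));
    }
}
```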

Returns a form description (UiSpec) without a specific configuration.

`GET /configurations/form/initial/{type}`

Parameters

| Type | Name | Schema |
|---|---|---|
| Path | type (required) | string |

Responses

| HTTP Code | Description | Schema |
|---|---|---|
| 200 | Successful operation. The `Talend-Component-Server-Error` header (boolean) indicates the error origin: `true` for an error from the component server, `false` for an error from this proxy. | |

Consumes
• `application/json`

Produces
• `application/json`

Tags
• configurations

• dataset

• datastore

• form

• ui spec

Returns a form description (UiSpec) of a specific configuration.

`GET /configurations/form/{id}`

Parameters

| Type | Name | Schema |
|---|---|---|
| Path | id (required) | string |

Responses

| HTTP Code | Description | Schema |
|---|---|---|
| 200 | Successful operation. The `Talend-Component-Server-Error` header (boolean) indicates the error origin: `true` for an error from the component server, `false` for an error from this proxy. | |

Consumes
• `application/json`

Produces
• `application/json`

Tags
• configurations

• dataset

• datastore

• form

• ui spec

Returns the configuration icon file in PNG format.

`GET /configurations/icon/{id}`

Parameters

| Type | Name | Schema |
|---|---|---|
| Path | id (required) | string |

Responses

| HTTP Code | Description | Schema |
|---|---|---|
| 200 | Successful operation. The `Talend-Component-Server-Error` header (boolean) indicates the error origin: `true` for an error from the component server, `false` for an error from this proxy. | |

Consumes
• `application/json`

Produces
• `application/json`

• `application/octet-stream`

Tags
• icon

Updates a configuration.

`POST /configurations/persistence/edit/{id}`

Parameters

| Type | Name | Schema |
|---|---|---|
| Path | id (required) | string |

Responses

| HTTP Code | Description | Schema |
|---|---|---|
| 200 | Successful operation. The `Talend-Component-Server-Error` header (boolean) indicates the error origin: `true` for an error from the component server, `false` for an error from this proxy. | |

Consumes
• `application/json`

Produces
• `application/json`

Tags
• configurations

• dataset

• datastore

• form

• persistence

• ui spec

Saves a configuration based on a type. Concretely, it is the same as `/persistence/save/{formId}` but the `formId` is contained in the payload itself and marked as such in the metadata.

`POST /configurations/persistence/save-from-type/{type}`

Parameters

| Type | Name | Schema |
|---|---|---|
| Path | type (required) | string |

Responses

| HTTP Code | Description | Schema |
|---|---|---|
| 200 | Successful operation. The `Talend-Component-Server-Error` header (boolean) indicates the error origin: `true` for an error from the component server, `false` for an error from this proxy. | |

Consumes
• `application/json`

Produces
• `application/json`

Tags
• configurations

• dataset

• datastore

• form

• persistence

• ui spec

Saves a configuration based on a form identifier.

`POST /configurations/persistence/save/{formId}`

Parameters

| Type | Name | Schema |
|---|---|---|
| Path | formId (required) | string |

Responses

| HTTP Code | Description | Schema |
|---|---|---|
| 200 | Successful operation. The `Talend-Component-Server-Error` header (boolean) indicates the error origin: `true` for an error from the component server, `false` for an error from this proxy. | |

Consumes
• `application/json`

Produces
• `application/json`

Tags
• configurations

• dataset

• datastore

• form

• persistence

• ui spec

###### Definitions
CompletionStage

Type : object

CompletionStageByte[]

Type : object

CompletionStageCollectionSimplePropertyDefinition

Type : object

CompletionStageMapStringObject

Type : object

CompletionStageMapStringString

Type : object

CompletionStageNodes

Type : object

Condition

| Name | Schema |
|---|---|
| path (optional) | string |
| values (optional) | < object > array |

EntityRef

| Name | Description | Schema |
|---|---|---|
| id (optional) | The identifier of the entity related to the current request. It is generally the created or updated entity. | string |

JsonSchema

| Name | Schema |
|---|---|
| defaultValue (optional) | object |
| description (optional) | string |
| enumValues (optional) | < string > array |
| id (optional) | string |
| items (optional) | |
| maxItems (optional) | integer (int32) |
| maxLength (optional) | integer (int32) |
| maximum (optional) | number (double) |
| minItems (optional) | integer (int32) |
| minLength (optional) | integer (int32) |
| minimum (optional) | number (double) |
| pattern (optional) | string |
| properties (optional) | < string, JsonSchema > map |
| ref (optional) | string |
| required (optional) | < string > array |
| schema (optional) | string |
| title (optional) | string |
| type (optional) | string |
| uniqueItems (optional) | boolean |

NameValue

| Name | Schema |
|---|---|
| name (optional) | string |
| value (optional) | string |

Node

| Name | Description | Schema |
|---|---|---|
| children (optional) | The list of configurations reusing this one as a reference (can be created "next"). | < string > array |
| familyId (optional) | The identifier of the family of this configuration. | string |
| familyLabel (optional) | The display name of the family of this configuration. | string |
| icon (optional) | The icon of this configuration. If you use an existing bundle (@talend/ui/icon), ensure it is present by default and, if not, do a request using the family on the related endpoint. | string |
| id (optional) | The identifier of this configuration/node. | string |
| label (optional) | The display name of this configuration. | string |
| name (optional) | The technical name of this node (it is human readable but not i18n friendly), useful for debug purposes. | string |
| version (optional) | The version of this configuration for the migration management. | integer (int32) |

Nodes

| Name | Description | Schema |
|---|---|---|
| nodes (optional) | The list of nodes matching the request. The key is the node identifier. | < string, Node > map |

Option

| Name | Schema |
|---|---|
| path (optional) | string |
| type (optional) | string |

Parameter

| Name | Schema |
|---|---|
| key (optional) | string |
| path (optional) | string |

| Name | Description | Schema |
|---|---|---|
| code (optional) | The error code, independent of the locale and not as precise as a message (not context aware). | string |
| message (optional) | A human readable message to help understand the error. | string |

Trigger

| Name | Schema |
|---|---|
| action (optional) | string |
| family (optional) | string |
| onEvent (optional) | string |
| options (optional) | < Option > array |
| parameters (optional) | < Parameter > array |
| type (optional) | string |

Ui

| Name | Schema |
|---|---|
| jsonSchema (optional) | |
| properties (optional) | object |
| uiSchema (optional) | < UiSchema > array |

UiNode

| Name | Description | Schema |
|---|---|---|
| (optional) | The metadata associated to the node if needed by the UI. | |
| ui (optional) | The ui specification corresponding to the requested node. It is literally the form representing this configuration. | |

UiSchema

| Name | Schema |
|---|---|
| autoFocus (optional) | boolean |
| conditions (optional) | < Condition > array |
| description (optional) | string |
| disabled (optional) | boolean |
| itemWidget (optional) | string |
| items (optional) | < UiSchema > array |
| key (optional) | string |
| options (optional) | < string, string > map |
| placeholder (optional) | string |
| (optional) | boolean |
| required (optional) | boolean |
| restricted (optional) | boolean |
| title (optional) | string |
| titleMap (optional) | < NameValue > array |
| triggers (optional) | < Trigger > array |
| type (optional) | string |
| widget (optional) | string |

There are two ways to call the `save` endpoint. If you don’t want to pass the form identifier and prefer to use a generic endpoint that simply passes the type of configuration you are configuring, then you need to modify your `enrichment` configuration to ensure that the form identifier is present and to specify which form field it is.

To do that, add the `proxyserver::formId` Boolean to the metadata:

``````{
"path": "$datasetMetadata.type",
"displayName": "Types",
"type": "STRING",
"metadata": {
// other metadata as seen previously
"proxyserver::formId": "true"
}
}``````

Only the first property with `proxyserver::formId` set to `"true"` is used. The path cannot contain any array.

#### Integrating with Play

Thanks to Playx, you can deploy this server in a Play! application.

1. Import the pre-configuration of the Play integration.

``````<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>component-server-proxy-play</artifactId>
<version>${server-proxy.version}</version>
</dependency>``````

2. Configure the integration in your `application.conf` file:

``````include "conf/component-proxy.play.conf" (1)

talend.component.proxy { (2)
server {
base = "http://localhost:8080/api/v1"
}
processing {
X-ServiceName = proxy``````

#### Configure

You can set the `MEECROWAVE_OPTS` environment variable to customize the server. By default, it is installed in `/opt/talend/component-kit`.

#### Maven repository

The Maven repository is the default one of the machine. You can change it by setting the system property `talend_component_server_maven_repository=/path/to/your/m2`.

##### Deploy components to the server

If you want to deploy some components, you can configure which ones in `MEECROWAVE_OPTS` (see the server documentation online) and redirect your local m2:

``````$ docker run \
-p 8080:8080 \
-v ~/.m2:/root/.m2 \
-e MEECROWAVE_OPTS="-Dtalend.component.server.component.coordinates=g:a:v,g2:a2:v2,..." \
component-server``````
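The value passed through `MEECROWAVE_OPTS` in the command above is a comma-separated list of coordinates. The following sketch shows one way to assemble it from code, for example in a deployment script written in Java. The class name `ComponentCoordinates` is hypothetical, and the strict `group:artifact:version` check is an assumption based on the `g:a:v` form shown above.

```java
import java.util.List;

// Hypothetical helper building the system property used to select the components to deploy.
final class ComponentCoordinates {
    private ComponentCoordinates() {
        // no-op
    }

    // Joins the coordinates with commas after checking that each one has three segments.
    static String toMeecrowaveOption(final List<String> coordinates) {
        for (final String gav : coordinates) {
            if (gav.split(":").length != 3) {
                throw new IllegalArgumentException("Expected group:artifact:version but got '" + gav + "'");
            }
        }
        return "-Dtalend.component.server.component.coordinates=" + String.join(",", coordinates);
    }

    public static void main(final String[] args) {
        // Placeholder coordinates
        System.out.println(toMeecrowaveOption(List.of("g:a:v", "g2:a2:v2")));
    }
}
```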

#### Logging

The component server Docker image comes with two Log4j2 profiles: `default` and `kafka`. The logging profile can be changed by setting the `TALEND_COMPONENT_LOG4J2_PROFILE` environment variable to `kafka`. The `default` profile is active by default.

##### Default profile

The default profile has file and console logging capabilities. Console logging is off by default. You can activate it by setting the `CONSOLE_LOG_LEVEL` environment variable to `DEBUG`, `INFO`, `WARN` or any other log level supported by Log4j2. In practice, during development, activating console logging lets you see the logs without connecting to the server.

Run the Docker image with console logging:

``````sudo docker run -p 8080:8080 \
-e CONSOLE_LOG_LEVEL=INFO \
component-server``````
##### Kafka profile

The Kafka profile lets you send logs to Kafka servers. The logs are formatted in JSON and follow the layout defined by Talend and described at github.com/Talend/daikon/tree/master/daikon-logging/logging-event-layout

This profile requires two environment variables:

• `LOG_KAFKA_TOPIC` : Kafka topic.

• `LOG_KAFKA_URL` : A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. This list should be in the form `url:port` separated by `,`
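To make the expected `LOG_KAFKA_URL` format concrete, here is a small sketch that splits and checks such a value before the container is started. It is not part of the image. The class name `KafkaBootstrapList` and the strictness of the checks are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical validator for a comma-separated list of host:port pairs.
final class KafkaBootstrapList {
    private KafkaBootstrapList() {
        // no-op
    }

    // Splits the value on commas and rejects entries without a host or a numeric port.
    static List<String> parse(final String value) {
        final List<String> servers = new ArrayList<>();
        for (final String entry : value.split(",")) {
            final String trimmed = entry.trim();
            final int separator = trimmed.lastIndexOf(':');
            if (separator <= 0 || separator == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got '" + trimmed + "'");
            }
            // NumberFormatException is an IllegalArgumentException, so a bad port is rejected too
            Integer.parseInt(trimmed.substring(separator + 1));
            servers.add(trimmed);
        }
        return servers;
    }

    public static void main(final String[] args) {
        // Placeholder hosts
        System.out.println(parse("kafka1:9092, kafka2:9092"));
    }
}
```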

Run the Docker image with the Kafka profile:

``````sudo docker run -p 8080:8080 \
-e TALEND_COMPONENT_LOG4J2_PROFILE=kafka \
-e LOG_KAFKA_URL=`log kafka url:port` \
-e LOG_KAFKA_TOPIC=`log kafka topic` \
-e TRACING_KAFKA_URL=`tracing kafka url:port` \
-e TRACING_KAFKA_TOPIC=`tracing kafka topic` \
tacokit/component-server``````

Note: `LOG_KAFKA_TOPIC` receives the application and access logs, and `TRACING_KAFKA_TOPIC` receives the Brave tracing logs.

##### Tracing (Brave Monitoring)

The component server uses github.com/openzipkin/brave to monitor requests.

The tracing can be activated by setting the `TRACING_ON` environment variable to `true`.

You can choose the reporter type by setting the `talend_component_server_monitoring_brave_reporter_type` environment variable to `log` (the default value in this Docker image) or to `noop`, which deactivates the tracing. Other types of reporters may be added in the future.

The tracing rate is configurable through the `TRACING_SAMPLING_RATE` environment variable. This is the default sampling rate for all the endpoints and its default value is 0.1.

You can define a more specific rate for each component server endpoint using the following environment variables:

| Environment variable | Endpoint |
|---|---|
| `talend_component_server_monitoring_brave_sampling_environment_rate` | `/api/v1/environment` |
| `talend_component_server_monitoring_brave_sampling_configurationtype_rate` | `/api/v1/configurationtype` |
| `talend_component_server_monitoring_brave_sampling_component_rate` | `/api/v1/component` |
| `talend_component_server_monitoring_brave_sampling_documentation_rate` | `/api/v1/documentation` |
| `talend_component_server_monitoring_brave_sampling_action_rate` | `/api/v1/action` |
| `talend_component_server_monitoring_brave_sampling_execution_rate` | `/api/v1/execution` |
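The relation between the global rate and the per-endpoint variables can be sketched as follows. This is only a reading of the description above: it assumes that an endpoint-specific variable takes precedence over `TRACING_SAMPLING_RATE`, which in turn falls back to 0.1. The class name `SamplingRates` is hypothetical and the code does not reflect the server internals.

```java
import java.util.Map;

// Hypothetical resolution of the sampling rate applied to one endpoint.
final class SamplingRates {
    static final double DEFAULT_RATE = 0.1;

    private SamplingRates() {
        // no-op
    }

    // endpoint is the last path segment, for example "action" for /api/v1/action
    static double rateFor(final String endpoint, final Map<String, String> environment) {
        final String specific = environment.get(
                "talend_component_server_monitoring_brave_sampling_" + endpoint + "_rate");
        if (specific != null) {
            return Double.parseDouble(specific);
        }
        final String global = environment.get("TRACING_SAMPLING_RATE");
        return global != null ? Double.parseDouble(global) : DEFAULT_RATE;
    }

    public static void main(final String[] args) {
        System.out.println("action sampling rate: " + rateFor("action", System.getenv()));
    }
}
```

Taking the environment as a `Map` parameter keeps the sketch testable without changing the process environment.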

Run the Docker image with tracing on:

``````sudo docker run -p 8080:8080 \
-e TRACING_ON=true \
-e TRACING_SAMPLING_RATE=0.1 \
tacokit/component-server``````

#### Build the image yourself

You can build the component starter server image in Docker by following these instructions:

``````docker build --build-arg ARTIFACT_ID=component-starter-server \
--build-arg SERVER_VERSION=`component starter server version` \
--tag tacokit/component-server .``````
This assumes the project is built before you run that command.

## Wrapping a Beam I/O

### Limitations

This part is limited to specific kinds of Beam `PTransform`:

• `PTransform<PBegin, PCollection<?>>` for inputs.

• `PTransform<PCollection<?>, PDone>` for outputs. Outputs must use a single (composite or not) `DoFn` in their `apply` method.

### Wrapping an input

To illustrate the input wrapping, this procedure uses the following input as a starting point (based on existing Beam inputs):

``````@AutoValue
public abstract [static] class Read extends PTransform<PBegin, PCollection<String>> {

// config

@Override
public PCollection<String> expand(final PBegin input) {
return input.apply(/* ... */);
}

// ... other transform methods
}``````

To wrap the `Read` in a framework component, create a transform delegating to that Read with at least a `@PartitionMapper` annotation and using `@Option` constructor injections to configure the component. Also make sure to follow the best practices and to specify `@Icon` and `@Version`.

``````@PartitionMapper(family = "myfamily", name = "myname")
public class WrapRead extends PTransform<PBegin, PCollection<String>> {
private PTransform<PBegin, PCollection<String>> delegate;

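public WrapRead(@Option("dataset") final WrapReadDataSet dataset) {
delegate = TheIO.read().withConfiguration(this.createConfigurationFrom(dataset));
}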

@Override
public PCollection<String> expand(final PBegin input) {
return delegate.expand(input);
}

// ... other methods like the mapping with the native configuration (createConfigurationFrom)
}``````

### Wrapping an output

To illustrate the output wrapping, this procedure uses the following output as a starting point (based on existing Beam outputs):

``````@AutoValue
public abstract [static] class Write extends PTransform<PCollection<String>, PDone> {

// configuration withXXX(...)

@Override
public PDone expand(final PCollection<String> input) {
input.apply(ParDo.of(new WriteFn(this)));
return PDone.in(input.getPipeline());
}

// other methods of the transform
}``````

You can wrap this output exactly the same way you wrap an input, but using `@Processor` instead of `@PartitionMapper`:

``````@Processor(family = "myfamily", name = "myname")
public class WrapWrite extends PTransform<PCollection<String>, PDone> {
private PTransform<PCollection<String>, PDone> delegate;

public WrapWrite(@Option("dataset") final WrapWriteDataSet dataset) {
delegate = TheIO.write().withConfiguration(this.createConfigurationFrom(dataset));
}

@Override
public PDone expand(final PCollection<String> input) {
return delegate.expand(input);
}

// ... other methods like the mapping with the native configuration (createConfigurationFrom)
}``````

### Tip

Note that the `org.talend.sdk.component.runtime.beam.transform.DelegatingTransform` class fully delegates the "expansion" to another transform. Therefore, you can extend it and implement the configuration mapping:

``````@Processor(family = "beam", name = "file")
public class BeamFileOutput extends DelegatingTransform<PCollection<String>, PDone> {

public BeamFileOutput(@Option("output") final String output) {
super(TextIO.write()
.withSuffix("test")
.to(FileBasedSink.convertToFileResourceIfPossible(output)));
}
}``````
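
The delegation idea behind this tip can be sketched in plain Java, without any Beam dependency. In the sketch below, `Transform`, `Delegating`, and `SuffixOutput` are hypothetical stand-ins and not Beam or Talend Component Kit types: the base class forwards `expand` to a wrapped transform, and the subclass only maps its configuration to that wrapped transform in its constructor.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DelegationSketch {

    /** Hypothetical stand-in for a Beam PTransform: turns an input into an output. */
    public interface Transform<I, O> {
        O expand(I input);
    }

    /** Forwards the whole "expansion" to another transform. */
    public static class Delegating<I, O> implements Transform<I, O> {
        private final Transform<I, O> delegate;

        protected Delegating(final Transform<I, O> delegate) {
            this.delegate = delegate;
        }

        @Override
        public O expand(final I input) {
            return delegate.expand(input);
        }
    }

    /** Only maps its configuration (a suffix) to the wrapped transform. */
    public static class SuffixOutput extends Delegating<List<String>, List<String>> {
        public SuffixOutput(final String suffix) {
            super(input -> input.stream()
                    .map(value -> value + suffix)
                    .collect(Collectors.toList()));
        }
    }

    public static void main(final String[] args) {
        // prints [a.test, b.test]
        System.out.println(new SuffixOutput(".test").expand(Arrays.asList("a", "b")));
    }
}
```

The subclass contains no `expand` logic of its own, which is the property that makes this kind of wrapper cheap to write.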

In terms of classloading, when you write an I/O, the Beam SDK Java core stack is assumed to be provided by the Talend Component Kit runtime. You therefore don’t need to include it in the compile scope: it would be ignored anyway.

#### Coder

If you need a JSON coder, you can use the `org.talend.sdk.component.runtime.beam.factory.service.PluginCoderFactory` service, which gives you access to the JSON-P and JSON-B coders.

There is also an Avro coder, which uses the `FileContainer`. It is self-contained for `IndexedRecord` and, unlike the default Apache Beam `AvroCoder`, does not require you to set the schema when creating a pipeline.
It consumes more space and therefore is slightly slower, but it is fine for `DoFn`, since it does not rely on serialization in most cases. See `org.talend.sdk.component.runtime.beam.transform.avro.IndexedRecordCoder`.

#### JsonObject to IndexedRecord

The mainstream model is `JsonObject`, but it is common to have a legacy system using `IndexedRecord`. To ease the transition, you can use the following `PTransform` implementations:

• `IndexedRecordToJson`: to convert an `IndexedRecord` to a `JsonObject`.

• `JsonToIndexedRecord`: to convert a `JsonObject` to an `IndexedRecord`.

• `SchemalessJsonToIndexedRecord`: to convert a `JsonObject` to an `IndexedRecord` with Avro schema inference.

#### Sample

Sample input based on Beam Kafka:
``````@Version
@Icon(Icon.IconType.KAFKA)
@Emitter(name = "Input")
@AllArgsConstructor
@Documentation("Kafka Input")
public class KafkaInput extends PTransform<PBegin, PCollection<JsonObject>> { (1)

private final InputConfiguration configuration;

private final JsonBuilderFactory builder;

private final PluginCoderFactory coderFactory;

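// builds the native KafkaIO read; capping it with getMaxResults() is an assumed use of that option
private KafkaIO.Read<byte[], byte[]> delegate() {
final KafkaIO.Read<byte[], byte[]> read = KafkaIO.<byte[], byte[]> read()
.withBootstrapServers(configuration.getBootstrapServers())
.withTopics(configuration.getTopics().stream().map(InputConfiguration.Topic::getName).collect(toList()))
.withKeyDeserializer(ByteArrayDeserializer.class).withValueDeserializer(ByteArrayDeserializer.class);
if (configuration.getMaxResults() > 0) {
return read.withMaxNumRecords(configuration.getMaxResults());
}
return read;
}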

@Override (2)
public PCollection<JsonObject> expand(final PBegin pBegin) {
final PCollection<KafkaRecord<byte[], byte[]>> kafkaEntries = pBegin.getPipeline().apply(delegate());
return kafkaEntries.apply(ParDo.of(new RecordToJson(builder))).setCoder(coderFactory.jsonp()); (3)
}

@AllArgsConstructor
private static class RecordToJson extends DoFn<KafkaRecord<byte[], byte[]>, JsonObject> {

private final JsonBuilderFactory builder;

@ProcessElement
public void onElement(final ProcessContext context) {
context.output(toJson(context.element()));
}

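private JsonObject toJson(final KafkaRecord<byte[], byte[]> element) {
// illustrative mapping (assumed): expose the raw key and value as strings
return builder.createObjectBuilder()
.add("key", new String(element.getKV().getKey()))
.add("value", new String(element.getKV().getValue()))
.build();
}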
}
}``````
1 The `PTransform` generics define that the component is an input (`PBegin` marker).
2 The `expand` method chains the native I/O with a custom mapper (`RecordToJson`).
3 The mapper uses the JSON-P coder automatically created from the contextual component.

Because the Beam wrapper does not respect the standard Talend Component Kit programming model (for example, there is no `@Emitter`), you need to set the `<talend.validation.component>false</talend.validation.component>` property in your `pom.xml` file (or the equivalent for Gradle) to skip the component programming model validations of the framework.

## Talend Component Kit Appendix

### Defining the ContainerManager or the classloader manager

The entry point of the API is the `ContainerManager`. It allows you to define what the `Shared` classloader is and to create children:

``````try (final ContainerManager manager = new ContainerManager( (1)
ContainerManager.DependenciesResolutionConfiguration.builder() (2)
.resolver(new MvnDependencyListLocalRepositoryResolver("META-INF/talend/dependencies.list"))
.rootRepositoryLocation(new File(System.getProperty("user.home"), ".m2/repository"))
.create(),
ContainerManager.ClassLoaderConfiguration.builder() (3)
.parent(getClass().getClassLoader())
.classesFilter(name -> true)
.parentClassesFilter(name -> true)
.create())) {

// create plugins

}``````
1 The `ContainerManager` is `AutoCloseable`, which allows you to use it in a try-with-resources block or to close it in a finally block if needed.
This manager has two main configuration entries:
• how to resolve dependencies for plugins from the plugin file/location

• how to configure the classloaders (what is the parent classloader, how to handle the parent first/last delegation, and so on).

It is recommended to keep the manager running if you can reuse plugins, in order to avoid recreating classloaders and to share them.
2 `DependenciesResolutionConfiguration` allows you to pass a custom `Resolver` which is used to build the plugin classloaders.
Currently, the library only provides `MvnDependencyListLocalRepositoryResolver`, which reads the output of `mvn dependency:list`. Add this output to the plugin jar so that the dependencies can be resolved from a local Maven repository.
Note that `SNAPSHOT` versions are only resolved based on their name and not from the metadata (only useful in development).
To continue the comparison with a Servlet server, you can implement an unpacked war resolver.
3 `ClassLoaderConfiguration` configures the behavior of the whole container/plugin pair, including:
• What the shared classloader is

• Which classes are loaded from the shared loader first (intended to be used for API which shouldn’t be loaded from the plugin loader)

• Which classes are loaded from the parent classloader. This can be useful to prevent loading a "common" library from the parent classloader, for instance guava, commons-lang3, and so on.
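
The effect of a parent classes filter can be illustrated with a small JDK-only classloader. This is a minimal sketch of the idea and not the classloader used by the framework: `FilteringClassLoaderSketch` and `canLoad` are hypothetical names, and always letting `java.*` classes through is an assumption made to keep the example usable.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Predicate;

public class FilteringClassLoaderSketch extends URLClassLoader {

    // Decides which class names may be served by the parent classloader.
    private final Predicate<String> parentClassesFilter;

    public FilteringClassLoaderSketch(final URL[] urls, final ClassLoader parent,
                                      final Predicate<String> parentClassesFilter) {
        super(urls, parent);
        this.parentClassesFilter = parentClassesFilter;
    }

    @Override
    protected Class<?> loadClass(final String name, final boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                if (name.startsWith("java.") || parentClassesFilter.test(name)) {
                    // Accepted by the filter (or a JDK class): usual parent-first delegation.
                    loaded = super.loadClass(name, false);
                } else {
                    // Rejected by the filter: only this loader's own URLs are searched.
                    loaded = findClass(name);
                }
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }

    /** Returns true when the loader can load the class, false otherwise. */
    public static boolean canLoad(final ClassLoader loader, final String name) {
        try {
            loader.loadClass(name);
            return true;
        } catch (final ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(final String[] args) {
        final ClassLoader parent = FilteringClassLoaderSketch.class.getClassLoader();
        final String self = FilteringClassLoaderSketch.class.getName();
        final ClassLoader open = new FilteringClassLoaderSketch(new URL[0], parent, name -> true);
        final ClassLoader closed = new FilteringClassLoaderSketch(new URL[0], parent, name -> false);
        System.out.println(canLoad(open, self));                 // true: served by the parent
        System.out.println(canLoad(closed, self));               // false: hidden from the plugin
        System.out.println(canLoad(closed, "java.lang.String")); // true: JDK classes always pass
    }
}
```

With a filter that rejects a library's packages, a plugin has to ship its own copy of that library instead of inheriting whichever version the parent happens to expose.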

### Creating plugins

Once you have defined a manager, you can create plugins:

``````final Container plugin1 = manager.create( (1)
"plugin-id", (2)
new File("/plugin/myplugin1.jar")); (3)``````
1 To create a plugin `Container`, use the `create` method of the manager.
2 Give an explicit ID to the plugin. You can omit it, in which case the manager uses the jar name.
3 Specify the plugin root jar.

To create the plugin container, the `Resolver` resolves the dependencies needed for the plugin, then the manager creates the plugin classloader and registers the plugin `Container`.
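
To make the resolution step more concrete, the following sketch maps one line in the `groupId:artifactId:type[:classifier]:version:scope` form to a file in a local Maven repository, using the standard repository layout. It is an illustration under stated assumptions and not the code of `MvnDependencyListLocalRepositoryResolver`: the class and method names are hypothetical, and the dependency type is used directly as the file extension, which does not hold for every Maven type.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class MavenCoordinateSketch {

    /**
     * Maps one dependency line (groupId:artifactId:type[:classifier]:version:scope)
     * to the matching file under a Maven repository root.
     */
    public static Path toRepositoryPath(final Path repositoryRoot, final String line) {
        final String[] parts = line.trim().split(":");
        if (parts.length != 5 && parts.length != 6) {
            throw new IllegalArgumentException("Unsupported coordinates: " + line);
        }
        final String groupId = parts[0];
        final String artifactId = parts[1];
        final String type = parts[2]; // simplification: the type is used as the file extension
        final String classifier = parts.length == 6 ? parts[3] : null;
        final String version = parts[parts.length - 2];
        final String fileName = artifactId + "-" + version
                + (classifier == null ? "" : "-" + classifier) + "." + type;

        // The group ID becomes one directory per dot-separated segment.
        Path path = repositoryRoot;
        for (final String segment : groupId.split("\\.")) {
            path = path.resolve(segment);
        }
        return path.resolve(artifactId).resolve(version).resolve(fileName);
    }

    public static void main(final String[] args) {
        final Path root = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        System.out.println(toRepositoryPath(root, "org.apache.commons:commons-lang3:jar:3.12.0:compile"));
    }
}
```

Because the file name is derived from the version string alone, a `SNAPSHOT` dependency is looked up by name, without consulting repository metadata.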

### Defining a listener for the plugin registration

Some actions are needed when a plugin is registered or unregistered. For that purpose, you can use `ContainerListener`:

``````public class MyListener implements ContainerListener {
@Override
public void onCreate(final Container container) {
System.out.println("Container #" + container.getId() + " started.");
}

@Override
public void onClose(final Container container) {
System.out.println("Container #" + container.getId() + " stopped.");
}
}``````

Listeners are registered directly on the manager:

``````final ContainerManager manager = getContainerManager();
final ContainerListener myListener = new MyListener();

manager.registerListener(myListener); (1)
// do something
manager.unregisterListener(myListener); (2)``````
1 `registerListener` adds the listener for future events only: it does not receive any event for containers that were already created.
2 You can remove a listener at any time by using `unregisterListener`.
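
The "future events only" behavior can be reproduced with a small JDK-only registry. The types below are hypothetical stand-ins for `ContainerManager` and `ContainerListener`, written only to show the ordering semantics, and containers are reduced to plain string IDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ListenerRegistrySketch {

    /** Hypothetical stand-in for ContainerListener, with containers reduced to their ID. */
    public interface Listener {
        void onCreate(String containerId);

        void onClose(String containerId);
    }

    /** Listener that records every event it receives. */
    public static class Recording implements Listener {
        public final List<String> events = new ArrayList<>();

        @Override
        public void onCreate(final String containerId) {
            events.add("create:" + containerId);
        }

        @Override
        public void onClose(final String containerId) {
            events.add("close:" + containerId);
        }
    }

    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    public void registerListener(final Listener listener) {
        listeners.add(listener);
    }

    public void unregisterListener(final Listener listener) {
        listeners.remove(listener);
    }

    /** Notifies only the listeners registered at the time of the call. */
    public void create(final String containerId) {
        listeners.forEach(listener -> listener.onCreate(containerId));
    }

    public void close(final String containerId) {
        listeners.forEach(listener -> listener.onClose(containerId));
    }

    public static void main(final String[] args) {
        final ListenerRegistrySketch manager = new ListenerRegistrySketch();
        manager.create("early");              // no listener yet: the event is not replayed later
        final Recording listener = new Recording();
        manager.registerListener(listener);
        manager.create("late");               // received
        manager.unregisterListener(listener);
        manager.close("late");                // not received: the listener was removed
        System.out.println(listener.events);  // prints [create:late]
    }
}
```

If a listener must also know about containers created before its registration, it has to query the manager for them itself rather than rely on replayed events.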