Talend Components Definitions Documentation
Component definition
Talend Component Kit framework relies on several primitive components.
All components can use @PostConstruct
and @PreDestroy
annotations to initialize or release some underlying resource at the beginning and the end of a processing.
In distributed environments, class constructor are called on cluster manager node. Methods annotated with @PostConstruct and @PreDestroy are called on worker nodes. Thus, partition plan computation and pipeline tasks are performed on different nodes.
|
-
The created task is a JAR file containing class information, which describes the pipeline (flow) that should be processed in cluster.
-
During the partition plan computation step, the pipeline is analyzed and split into stages. The cluster manager node instantiates mappers/processors, gets estimated data size using mappers, and splits created mappers according to the estimated data size.
All instances are then serialized and sent to the worker node. -
Serialized instances are received and deserialized. Methods annotated with
@PostConstruct
are called. After that, pipeline execution starts. The @BeforeGroup annotated method of the processor is called before processing the first element in chunk.
After processing the number of records estimated as chunk size, the@AfterGroup
annotated method of the processor is called. Chunk size is calculated depending on the environment the pipeline is processed by. Once the pipeline is processed, methods annotated with@PreDestroy
are called.
All the methods managed by the framework must be public. Private methods are ignored. |
The framework is designed to be as declarative as possible but also to stay extensible by not using fixed interfaces or method signatures. This allows to incrementally add new features of the underlying implementations. |
PartitionMapper
A PartitionMapper
is a component able to split itself to make the execution more efficient.
This concept is borrowed from big data and useful in this context only (BEAM
executions).
The idea is to divide the work before executing it in order to reduce the overall execution time.
The process is the following:
-
The size of the data you work on is estimated. This part can be heuristic and not very precise.
-
From that size, the execution engine (runner for Beam) requests the mapper to split itself in N mappers with a subset of the overall work.
-
The leaf (final) mapper is used as a
Producer
(actual reader) factory.
This kind of component must be Serializable to be distributable.
|
Definition
A partition mapper requires three methods marked with specific annotations:
-
@Assessor
for the evaluating method -
@Split
for the dividing method -
@Emitter
for theProducer
factory
@Assessor
The Assessor method returns the estimated size of the data related to the component (depending its configuration).
It must return a Number
and must not take any parameter.
For example:
@Assessor
public long estimateDataSetByteSize() {
return ....;
}
@Split
The Split method returns a collection of partition mappers and can take optionally a @PartitionSize
long value as parameter, which is the requested size of the dataset per sub partition mapper.
For example:
@Split
public List<MyMapper> split(@PartitionSize final long desiredSize) {
return ....;
}
Producer
A Producer
is a component interacting with a physical source. It produces input data for the processing flow.
A producer is a simple component that must have a @Producer
method without any parameter. It can return any data:
@Producer
public MyData produces() {
return ...;
}
Processor
A Processor
is a component that converts incoming data to a different model.
A processor must have a method decorated with @ElementListener
taking an incoming data and returning the processed data:
@ElementListener
public MyNewData map(final MyData data) {
return ...;
}
Processors must be Serializable because they are distributed components.
If you just need to access data on a map-based ruleset, you can use JsonObject
as parameter type.
From there, Talend Component Kit wraps the data to allow you to access it as a map. The parameter type is not enforced.
This means that if you know you will get a SuperCustomDto
, then you can use it as parameter type. But for generic components that are reusable in any chain, it is highly encouraged to use JsonObject
until you have an evaluation language-based processor that has its own way to access components.
For example:
@ElementListener
public MyNewData map(final JsonObject incomingData) {
String name = incomingData.getString("name");
int name = incomingData.getInt("age");
return ...;
}
// equivalent to (using POJO subclassing)
public class Person {
private String age;
private int age;
// getters/setters
}
@ElementListener
public MyNewData map(final Person person) {
String name = person.getName();
int age = person.getAge();
return ...;
}
A processor also supports @BeforeGroup
and @AfterGroup
methods, which must not have any parameter and return void
values. Any other result would be ignored.
These methods are used by the runtime to mark a chunk of the data in a way which is estimated good for the execution flow size.
Because the size is estimated, the size of a group can vary. It is even possible to have groups of size 1 .
|
It is recommended to batch records, for performance reasons:
@BeforeGroup
public void initBatch() {
// ...
}
@AfterGroup
public void endBatch() {
// ...
}
It is a good practice to support a maxBatchSize here and to commit before the end of the group, in case of a computed size that is way too big for your backend to handle.
|
Multiple outputs
In some cases, you may need to split the output of a processor in two. A common example is to have "main" and "reject" branches where part of the incoming data are passed to a specific bucket to be processed later.
To do that, you can use @Output
as replacement of the returned value:
@ElementListener
public void map(final MyData data, @Output final OutputEmitter<MyNewData> output) {
output.emit(createNewData(data));
}
Alternatively, you can pass a string that represents the new branch:
@ElementListener
public void map(final MyData data,
@Output final OutputEmitter<MyNewData> main,
@Output("rejected") final OutputEmitter<MyNewDataWithError> rejected) {
if (isRejected(data)) {
rejected.emit(createNewData(data));
} else {
main.emit(createNewData(data));
}
}
// or
@ElementListener
public MyNewData map(final MyData data,
@Output("rejected") final OutputEmitter<MyNewDataWithError> rejected) {
if (isSuspicious(data)) {
rejected.emit(createNewData(data));
return createNewData(data); // in this case the processing continues but notifies another channel
}
return createNewData(data);
}
Multiple inputs
Having multiple inputs is similar to having multiple outputs, except that an OutputEmitter
wrapper is not needed:
@ElementListener
public MyNewData map(@Input final MyData data, @Input("input2") final MyData2 data2) {
return createNewData(data1, data2);
}
@Input
takes the input name as parameter. If no name is set, it defaults to the "main (default)" input branch. It is recommended to use the default branch when possible and to avoid naming branches according to the component semantic.
Output
An Output
is a Processor
that does not return any data.
Conceptually, an output is a data listener. It matches the concept of processor. Being the last component of the execution chain or returning no data makes your processor an output component:
@ElementListener
public void store(final MyData data) {
// ...
}
Combiners
Currently, Talend Component Kit does not allow you to define a Combiner
.
A combiner is the symmetric part of a partition mapper and allows to aggregate results in a single partition.
Family and component icons
Every component family and component needs to have a representative icon.
You can use one of the icons provided by the framework or you can use a custom icon.
For the component family the icon is defined in the package-info.java
file. For the component itself, you need to declare it in the component class.
To use a custom icon, you need to have the icon file placed in the resources/icons
folder of the project.
The icon file needs to have a name following the convention IconName_icon32.png
, where you can replace IconName
by the name of your choice.
@Icon(value = Icon.IconType.CUSTOM, custom = "IconName")
Configuring components
Components are configured using their constructor parameters. They can all be marked with the @Option
property, which lets you give a name to parameters.
For the name to be correct, you must follow these guidelines:
-
Use a valid Java name.
-
Do not include any
.
character in it. -
Do not start the name with a
$
. -
Defining a name is optional. If you don’t set a specific name, it defaults to the bytecode name, which can require you to compile with a
-parameter
flag to not end up with names such asarg0
,arg1
, and so on.
Parameter types can be primitives or complex objects with fields decorated with @Option
exactly like method parameters.
It is recommended to use simple models which can be serialized in order to ease serialized component implementations. |
For example:
class FileFormat implements Serializable {
@Option("type")
private FileType type = FileType.CSV;
@Option("max-records")
private int maxRecords = 1024;
}
@PartitionMapper(family = "demo", name = "file-reader")
public MyFileReader(@Option("file-path") final File file,
@Option("file-format") final FileFormat format) {
// ...
}
Using this kind of API makes the configuration extensible and component-oriented, which allows you to define all you need.
The instantiation of the parameters is done from the properties passed to the component.
Examples of option names:
Option name | Valid |
---|---|
myName |
|
my_name |
|
my.name |
|
$myName |
Primitives
A primitive is a class which can be directly converted from a String
to the expected type.
It includes all Java primitives, like the String
type itself, but also all types with a org.apache.xbean.propertyeditor.Converter
:
-
BigDecimal
-
BigInteger
-
File
-
InetAddress
-
ObjectName
-
URI
-
URL
-
Pattern
Complex object mapping
The conversion from property to object uses the Dot notation.
For example, assuming the method parameter was configured with @Option("file")
:
file.path = /home/user/input.csv
file.format = CSV
matches
public class FileOptions {
@Option("path")
private File path;
@Option("format")
private Format format;
}
List case
Lists rely on an indexed syntax to define their elements.
For example, assuming that the list parameter is named files
and that the elements are of the FileOptions
type, you can define a list of two elements as follows:
files[0].path = /home/user/input1.csv
files[0].format = CSV
files[1].path = /home/user/input2.xml
files[1].format = EXCEL
Map case
Similarly to the list case, the map uses .key[index]
and .value[index]
to represent its keys and values:
// Map<String, FileOptions>
files.key[0] = first-file
files.value[0].path = /home/user/input1.csv
files.value[0].type = CSV
files.key[1] = second-file
files.value[1].path = /home/user/input2.xml
files.value[1].type = EXCEL
// Map<FileOptions, String>
files.key[0].path = /home/user/input1.csv
files.key[0].type = CSV
files.value[0] = first-file
files.key[1].path = /home/user/input2.xml
files.key[1].type = EXCEL
files.value[1] = second-file
Avoid using the Map type. For example, if you can configure your component with an object instead. |
Defining Constraints and validations on the configuration
You can use metadata to specify that a field is required or has a minimum size, and so on. This is done using the validation
metadata in the org.talend.sdk.component.api.configuration.constraint
package:
API | Name | Parameter Type | Description | Supported Types | Metadata sample |
---|---|---|---|---|---|
@org.talend.sdk.component.api.configuration.constraint.Max |
maxLength |
double |
Ensure the decorated option size is validated with a higher bound. |
CharSequence |
{"validation::maxLength":"12.34"} |
@org.talend.sdk.component.api.configuration.constraint.Min |
minLength |
double |
Ensure the decorated option size is validated with a lower bound. |
CharSequence |
{"validation::minLength":"12.34"} |
@org.talend.sdk.component.api.configuration.constraint.Pattern |
pattern |
string |
Validate the decorated string with a javascript pattern (even into the Studio). |
CharSequence |
{"validation::pattern":"test"} |
@org.talend.sdk.component.api.configuration.constraint.Max |
max |
double |
Ensure the decorated option size is validated with a higher bound. |
Number, int, short, byte, long, double, float |
{"validation::max":"12.34"} |
@org.talend.sdk.component.api.configuration.constraint.Min |
min |
double |
Ensure the decorated option size is validated with a lower bound. |
Number, int, short, byte, long, double, float |
{"validation::min":"12.34"} |
@org.talend.sdk.component.api.configuration.constraint.Required |
required |
- |
Mark the field as being mandatory. |
Object |
{"validation::required":"true"} |
@org.talend.sdk.component.api.configuration.constraint.Max |
maxItems |
double |
Ensure the decorated option size is validated with a higher bound. |
Collection |
{"validation::maxItems":"12.34"} |
@org.talend.sdk.component.api.configuration.constraint.Min |
minItems |
double |
Ensure the decorated option size is validated with a lower bound. |
Collection |
{"validation::minItems":"12.34"} |
@org.talend.sdk.component.api.configuration.constraint.Uniques |
uniqueItems |
- |
Ensure the elements of the collection must be distinct (kind of set). |
Collection |
{"validation::uniqueItems":"true"} |
When using the programmatic API, metadata is prefixed by tcomp:: . This prefix is stripped in the web for convenience, and the table above uses the web keys.
|
Also note that these validations are executed before the runtime is started
(when loading the component instance) and that the execution will fail if they don’t pass.
If somehow it breaks your application you can disable that validation on the JVM by setting the system property talend.component.configuration.validation.skip
to true
.
Marking a configuration as a particular type of data
It is common to classify the incoming data. It is similar to tagging data with several types. Data can commonly be categorized as follows:
-
Datastore: The data you need to connect to the backend.
-
Dataset: A datastore coupled with the data you need to execute an action.
API | Type | Description | Metadata sample |
---|---|---|---|
org.talend.sdk.component.api.configuration.type.DataSet |
dataset |
Mark a model (complex object) as being a dataset. |
{"tcomp::configurationtype::type":"dataset","tcomp::configurationtype::name":"test"} |
org.talend.sdk.component.api.configuration.type.DataStore |
datastore |
Mark a model (complex object) as being a datastore (connection to a backend). |
{"tcomp::configurationtype::type":"datastore","tcomp::configurationtype::name":"test"} |
The component family associated with a configuration type (datastore/dataset) is always the one related to the component using that configuration. |
Those configuration types can be composed to provide one configuration item. For example, a dataset type often needs a datastore type to be provided. A datastore type (that provides the connection information) is used to create a dataset type.
Those configuration types are also used at design time to create shared configurations that can be stored and used at runtime.
For example, in the case of a relational database that supports JDBC:
-
A datastore can be made of:
-
a JDBC URL
-
a username
-
a password.
-
-
A dataset can be made of:
-
a datastore (that provides the data required to connect to the database)
-
a table name
-
data.
-
The component server scans all configuration types and returns a configuration type index. This index can be used for the integration into the targeted platforms (Studio, web applications, and so on).
The configuration type index is represented as a flat tree that contains all the configuration types, which themselves are represented as nodes and indexed by ID.
Every node can point to other nodes. This relation is represented as an array of edges that provides the child IDs.
As an illustration, a configuration type index for the example above can be defined as follows:
{nodes: {
"idForDstore": { datastore:"datastore data", edges:[id:"idForDset"] },
"idForDset": { dataset:"dataset data" }
}
}
Defining links between properties
If you need to define a binding between properties, you can use a set of annotations:
API | Name | Description | Metadata Sample |
---|---|---|---|
@org.talend.sdk.component.api.configuration.condition.ActiveIf |
if |
If the evaluation of the element at the location matches value then the element is considered active, otherwise it is deactivated. |
{"condition::if::target":"test","condition::if::value":"value1,value2"} |
@org.talend.sdk.component.api.configuration.condition.ActiveIfs |
ifs |
Allows to set multiple visibility conditions on the same property. |
{"condition::if::value::0":"value1,value2","condition::if::value::1":"SELECTED","condition::if::target::0":"sibling1","condition::if::target::1":"../../other"} |
The target element location is specified as a relative path to the current location, using Unix path characters.
The configuration class delimiter is /
.
The parent configuration class is specified by ..
.
Thus, ../targetProperty
denotes a property, which is located in the parent configuration class and is named targetProperty
.
When using the programmatic API, metadata is prefixed with tcomp:: . This prefix is stripped in the web for convenience, and the previous table uses the web keys.
|
Adding hints about the rendering based on configuration/component knowledge
In some cases, you may need to add metadata about the configuration to let the UI render that configuration properly.
For example, a password value that must be hidden and not a simple clear input box. For these cases - if you want to change the UI rendering - you can use a particular set of annotations:
API | Description | Generated property metadata |
---|---|---|
@org.talend.sdk.component.api.configuration.ui.DefaultValue |
Provide a default value the UI can use - only for primitive fields. |
{"ui::defaultvalue::value":"test"} |
@org.talend.sdk.component.api.configuration.ui.OptionsOrder |
Allows to sort a class properties. |
{"ui::optionsorder::value":"value1,value2"} |
@org.talend.sdk.component.api.configuration.ui.layout.AutoLayout |
Request the rendered to do what it thinks is best. |
{"ui::autolayout":"true"} |
@org.talend.sdk.component.api.configuration.ui.layout.GridLayout |
Advanced layout to place properties by row, this is exclusive with |
{"ui::gridlayout::value1::value":"first|second,third","ui::gridlayout::value2::value":"first|second,third"} |
@org.talend.sdk.component.api.configuration.ui.layout.GridLayouts |
Allow to configure multiple grid layouts on the same class, qualified with a classifier (name) |
{"ui::gridlayout::Advanced::value":"another","ui::gridlayout::Main::value":"first|second,third"} |
@org.talend.sdk.component.api.configuration.ui.layout.HorizontalLayout |
Put on a configuration class it notifies the UI an horizontal layout is preferred. |
{"ui::horizontallayout":"true"} |
@org.talend.sdk.component.api.configuration.ui.layout.VerticalLayout |
Put on a configuration class it notifies the UI a vertical layout is preferred. |
{"ui::verticallayout":"true"} |
@org.talend.sdk.component.api.configuration.ui.widget.Code |
Mark a field as being represented by some code widget (vs textarea for instance). |
{"ui::code::value":"test"} |
@org.talend.sdk.component.api.configuration.ui.widget.Credential |
Mark a field as being a credential. It is typically used to hide the value in the UI. |
{"ui::credential":"true"} |
@org.talend.sdk.component.api.configuration.ui.widget.Structure |
Mark a List<String> or Map<String, String> field as being represented as the component data selector (field names generally or field names as key and type as value). |
{"ui::structure::type":"null","ui::structure::discoverSchema":"test","ui::structure::value":"test"} |
@org.talend.sdk.component.api.configuration.ui.widget.TextArea |
Mark a field as being represented by a textarea(multiline text input). |
{"ui::textarea":"true"} |
When using the programmatic API, metadata is prefixed with tcomp:: . This prefix is stripped in the web for convenience, and the previous table uses the web keys.
|
Target support should cover org.talend.core.model.process.EParameterFieldType
but you need to ensure that the web renderer is able to handle the same widgets.
Widget and validation gallery
Widgets
Name | Code | Studio Rendering | Web Rendering |
---|---|---|---|
Input/Text |
|
||
Password |
|
||
Textarea |
|
||
Checkbox |
|
||
List |
|
||
List |
|
||
Table |
|
||
Code |
|
||
Schema |
|
Validations
Name | Code | Studio Rendering | Web Rendering |
---|---|---|---|
Property validation |
|
||
Property validation with Pattern |
|
||
Data store validation |
|
You can also use other types of validation that are similar to @Pattern
:
-
@Min
,@Max
for numbers. -
@Unique
for collection values. -
@Required
for a required configuration.
Registering components
As you may have read in the Getting Started, you need an annotation to register your component through the family
method. Multiple components can use the same family
value but the family
+ name
pair must be unique for the system.
In order to share the same component family name and to avoid repetitions in all family
methods,
you can use the @Components
annotation on the root package of your component. It allows you to define the component family and the categories the component belongs to (Misc
by default if not set).
Here is a sample package-info.java
:
@Components(name = "my_component_family", categories = "My Category")
package org.talend.sdk.component.sample;
import org.talend.sdk.component.api.component.Components;
Another example with an existing component:
@Components(name = "Salesforce", categories = {"Business", "Cloud"})
package org.talend.sdk.component.sample;
import org.talend.sdk.component.api.component.Components;
Components metadata
Components can require metadata to be integrated in Talend Studio or Cloud platforms.
Metadata is set on the component class and belongs to the org.talend.sdk.component.api.component
package.
API | Description |
---|---|
@Icon |
Sets an icon key used to represent the component. You can use a custom key with the |
@Version |
Sets the component version. 1 by default. |
Example:
@Icon(FILE_XML_O)
@PartitionMapper(name = "jaxbInput")
public class JaxbPartitionMapper implements Serializable {
// ...
}
Managing version configuration
If some changes impact the configuration, they can be managed through a migration handler at the component level (enabling trans-model migration support).
The @Version
annotation supports a migrationHandler
method which migrates the incoming configuration to the current model.
For example, if the filepath
configuration entry from v1 changed to location
in v2, you can remap the value in your MigrationHandler
implementation.
A best practice is to split migrations into services that you can inject in the migration handler (through constructor) rather than managing all migrations directly in the handler. For example:
// full component code structure skipped for brievity, kept only migration part
@Version(value = 3, migrationHandler = MyComponent.Migrations.class)
public class MyComponent {
// the component code...
private interface VersionConfigurationHandler {
Map<String, String> migrate(Map<String, String> incomingData);
}
public static class Migrations {
private final List<VersionConfigurationHandler> handlers;
// VersionConfigurationHandler implementations are decorated with @Service
public Migrations(final List<VersionConfigurationHandler> migrations) {
this.handlers = migrations;
this.handlers.sort(/*some custom logic*/);
}
@Override
public Map<String, String> migrate(int incomingVersion, Map<String, String> incomingData) {
Map<String, String> out = incomingData;
for (MigrationHandler handler : handlers) {
out = handler.migrate(out);
}
}
}
}
What is important to notice in this snippet is not the way the code is organized, but rather the fact that you can organize your migrations the way that best fits your component.
If you need to apply migrations in a specific order, make sure that they are sorted.
Consider this API as a migration callback rather than a migration API. Adjust the migration code structure you need behind the MigrationHandler , based on your component requirements, using service injection.
|
Component internationalization
In common cases, you can store messages using a properties file in your component module to use internationalization.
Store the properties file in the same package as the related components and name it Messages
. For example, org.talend.demo.MyComponent
uses org.talend.demo.Messages[locale].properties
.
Default components keys
Out of the box components are internationalized using the same location logic for the resource bundle. The supported keys are:
Name Pattern | Description |
---|---|
${family}._displayName |
Display name of the family |
${family}.${configurationType}.${name}._displayName |
Display name of a configuration type (dataStore or dataSet) |
${family}.${component_name}._displayName |
Display name of the component (used by the GUIs) |
${property_path}._displayName |
Display name of the option. |
${simple_class_name}.${property_name}._displayName |
Display name of the option using its class name. |
${enum_simple_class_name}.${enum_name}._displayName |
Display name of the |
${property_path}._placeholder |
Placeholder of the option. |
Example of configuration for a component named list
and belonging to the memory
family (@Emitter(family = "memory", name = "list")
):
memory.list._displayName = Memory List
Configuration classes can be translated using the simple class name in the messages properties file. This is useful in case of common configurations shared by multiple components.
For example, if you have a configuration class as follows :
public class MyConfig {
@Option
private String host;
@Option
private int port;
}
You can give it a translatable display name by adding ${simple_class_name}.${property_name}._displayName
to Messages.properties
under the same package as the configuration class.
MyConfig.host._displayName = Server Host Name
MyConfig.host._placeholder = Enter Server Host Name...
MyConfig.port._displayName = Server Port
MyConfig.port._placeholder = Enter Server Port...
If you have a display name using the property path, it overrides the display name defined using the simple class name. This rule also applies to placeholders. |
Components Packaging
Component Loading
Talend Component scanning is based on plugins. To make sure that plugins can be developed in parallel and avoid conflicts, plugins need to be isolated (component or group of components in a single jar/plugin).
Multiple options are available:
-
Graph classloading: this option allows you to link the plugins and dependencies together dynamically in any direction.
For example, the graph classloading can be illustrated by OSGi containers. -
Tree classloading: a shared classloader inherited by plugin classloaders. However, plugin classloader classes are not seen by the shared classloader, nor by other plugins.
For example, he tree classloading is commonly used by Servlet containers where plugins are web applications. -
Flat classpath: listed for completeness but rejected by design because it doesn’t comply with this requirement.
In order to avoid much complexity added by this layer, Talend Component Kit relies on a tree classloading. The advantage is that you don’t need to define the relationship with other plugins/dependencies, because it is built-in.
Here is a representation of this solution:
The shared area contains Talend Component Kit API, which only contains by default the classes shared by the plugins.
Then, each plugin is loaded with its own classloader and dependencies.
Packaging a plugin
This section explains the overall way to handle dependencies but the Talend Maven plugin provides a shortcut for that. |
A plugin is a JAR file that was enriched with the list of its dependencies. By default, Talend Component Kit runtime is able to read the output of maven-dependency-plugin
in TALEND-INF/dependencies.txt
. You just need to make sure that your component defines the following plugin:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<version>3.0.2</version>
<executions>
<execution>
<id>create-TALEND-INF/dependencies.txt</id>
<phase>process-resources</phase>
<goals>
<goal>list</goal>
</goals>
<configuration>
<outputFile>${project.build.outputDirectory}/TALEND-INF/dependencies.txt</outputFile>
</configuration>
</execution>
</executions>
</plugin>
Once build, check the JAR file and look for the following lines:
$ unzip -p target/mycomponent-1.0.0-SNAPSHOT.jar TALEND-INF/dependencies.txt
The following files have been resolved:
org.talend.sdk.component:component-api:jar:1.0.0-SNAPSHOT:provided
org.apache.geronimo.specs:geronimo-annotation_1.3_spec:jar:1.0:provided
org.superbiz:awesome-project:jar:1.2.3:compile
junit:junit:jar:4.12:test
org.hamcrest:hamcrest-core:jar:1.3:test
What is important to see is the scope related to the artifacts:
-
The APIs (
component-api
andgeronimo-annotation_1.3_spec
) areprovided
because you can consider them to be there when executing (they come with the framework). -
Your specific dependencies (
awesome-project
in the example above) are marked ascompile
: they are included as needed dependencies by the framework (note that usingruntime
works too). -
the other dependencies are ignored. For example,
test
dependencies.
Packaging an application
Even if a flat classpath deployment is possible, it is not recommended because it would then reduce the capabilities of the components.
Dependencies
The way the framework resolves dependencies is based on a local Maven repository layout. As a quick reminder, it looks like:
.
├── groupId1
│ └── artifactId1
│ ├── version1
│ │ └── artifactId1-version1.jar
│ └── version2
│ └── artifactId1-version2.jar
└── groupId2
└── artifactId2
└── version1
└── artifactId2-version1.jar
This is all the layout the framework uses. The logic converts t-uple {groupId, artifactId, version, type (jar)}
to the path in the repository.
Talend Component Kit runtime has two ways to find an artifact:
-
From the file system based on a configured Maven 2 repository.
-
From a fat JAR (uber JAR) with a nested Maven repository under
MAVEN-INF/repository
.
The first option uses either ${user.home}/.m2/repository
default) or a specific path configured when creating a ComponentManager
.
The nested repository option needs some configuration during the packaging to ensure the repository is correctly created.
Creating a nested Maven repository with maven-shade-plugin
To create the nested MAVEN-INF/repository
repository, you can use the nested-maven-repository
extension:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.0.0</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer implementation="org.talend.sdk.component.container.maven.shade.ContainerDependenciesTransformer">
<session>${session}</project>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
<dependencies>
<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>nested-maven-repository</artifactId>
<version>${the.plugin.version}</version>
</dependency>
</dependencies>
</plugin>
Listing needed plugins
Plugins are usually programmatically registered. If you want to make some of them automatically available, you need to generate a TALEND-INF/plugins.properties
file that maps a plugin name to coordinates found with the Maven mechanism described above.
You can enrich maven-shade-plugin
to do it:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.0.0</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer implementation="org.talend.sdk.component.container.maven.shade.PluginTransformer">
<session>${session}</project>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
<dependencies>
<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>nested-maven-repository</artifactId>
<version>${the.plugin.version}</version>
</dependency>
</dependencies>
</plugin>
maven-shade-plugin
extensions
Here is a final job/application bundle based on maven-shade-plugin:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.0.0</version>
<configuration>
<createDependencyReducedPom>false</createDependencyReducedPom>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/.SF</exclude>
<exclude>META-INF/.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<shadedClassifierName>shaded</shadedClassifierName>
<transformers>
<transformer
implementation="org.talend.sdk.component.container.maven.shade.ContainerDependenciesTransformer">
<session>${session}</session>
<userArtifacts>
<artifact>
<groupId>org.talend.sdk.component</groupId>
<artifactId>sample-component</artifactId>
<version>1.0</version>
<type>jar</type>
</artifact>
</userArtifacts>
</transformer>
<transformer implementation="org.talend.sdk.component.container.maven.shade.PluginTransformer">
<session>${session}</session>
<userArtifacts>
<artifact>
<groupId>org.talend.sdk.component</groupId>
<artifactId>sample-component</artifactId>
<version>1.0</version>
<type>jar</type>
</artifact>
</userArtifacts>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
<dependencies>
<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>nested-maven-repository-maven-plugin</artifactId>
<version>${the.version}</version>
</dependency>
</dependencies>
</plugin>
The configuration unrelated to transformers depends on your application. |
ContainerDependenciesTransformer
embeds a Maven repository and PluginTransformer
to create a file that lists (one per line) artifacts (representing plugins).
Both transformers share most of their configuration:
-
session
: must be set to${session}
. This is used to retrieve dependencies. -
scope
: a comma-separated list of scopes to include in the artifact filtering (note that the default will rely onprovided
but you can replace it bycompile
,runtime
,runtime+compile
,runtime+system
ortest
). -
include
: a comma-separated list of artifacts to include in the artifact filtering. -
exclude
: a comma-separated list of artifacts to exclude in the artifact filtering. -
userArtifacts
: a list of artifacts (groupId, artifactId, version, type - optional, file - optional for plugin transformer, scope - optional) which can be forced inline. This parameter is mainly useful forPluginTransformer
. -
includeTransitiveDependencies
: should transitive dependencies of the components be included. Set totrue
by default. -
includeProjectComponentDependencies
: should project component dependencies be included. Set tofalse
by default. It is not needed when a job project uses isolation for components. -
userArtifacts
: set of component artifacts to include.
With the component tooling, it is recommended to keep default locations. Also if you need to use project dependencies, you can need to refactor your project structure to ensure component isolation. Talend Component Kit lets you handle that part but the recommended practice is to use userArtifacts for the components instead of project <dependencies> .
|
ContainerDependenciesTransformer
ContainerDependenciesTransformer
specific configuration is as follows:
-
repositoryBase
: base repository location (MAVEN-INF/repository
by default). -
ignoredPaths
: a comma-separated list of folders not to create in the output JAR. This is common for folders already created by other transformers/build parts.
PluginTransformer
ContainerDependenciesTransformer
specific configuration is the following one:
-
pluginListResource
: base repository location (default to TALEND-INF/plugins.properties`).
For example, if you want to list only the plugins you use, you can configure this transformer as follows:
<transformer implementation="org.talend.sdk.component.container.maven.shade.PluginTransformer">
<session>${session}</session>
<include>org.talend.sdk.component:component-x,org.talend.sdk.component:component-y,org.talend.sdk.component:component-z</include>
</transformer>
Component scanning rules and default exclusions
The framework uses two kind of filterings when scanning your component. One based on the JAR name
and one based on the package name. Make sure that your component definitions (including services)
are in a scanned module if they are not registered manually using ComponentManager.instance().addPlugin()
, and that the component package is not excluded.
Jars Scanning
To find components the framework can scan the classpath but in this case, to avoid to scan the whole classpath which can be really huge an impacts a lot the startup time, several jars are excluded out of the box.
These jars use the following prefix:
-
ApacheJMeter
-
FastInfoset
-
HdrHistogram
-
HikariCP
-
PDFBox
-
RoaringBitmap-
-
XmlSchema-
-
accessors-smart
-
activation-
-
activeio-
-
activemq-
-
aeron
-
aether-
-
agrona
-
akka-
-
animal-sniffer-annotation
-
annotation
-
ant-
-
antlr-
-
aopalliance-
-
apache-el
-
apacheds-
-
api-asn1-
-
api-util-
-
apiguardian-api-
-
app-
-
archaius-core
-
args4j-
-
arquillian-
-
asciidoctorj-
-
asm-
-
aspectj
-
async-http-client-
-
avalon-framework-
-
avro-
-
awaitility-
-
axis-
-
axis2-
-
base64-
-
batchee-jbatch
-
batik-
-
bcpkix
-
bcprov-
-
beam-model-
-
beam-runners-
-
beam-sdks-
-
bonecp
-
bootstrap.jar
-
brave-
-
bsf-
-
build-link
-
bval
-
byte-buddy
-
c3p0-
-
cache
-
carrier
-
cassandra-driver-core
-
catalina-
-
catalina.jar
-
cats
-
cdi-
-
cglib-
-
charsets.jar
-
chill
-
classindex
-
classmate
-
classutil
-
classycle
-
cldrdata
-
commands-
-
common-
-
commons-
-
component-api
-
component-form
-
component-runtime
-
component-server
-
component-spi
-
component-studio
-
components-api
-
components-common
-
compress-lzf
-
config
-
constructr
-
container-core
-
contenttype
-
coverage-agent
-
cryptacular-
-
cssparser-
-
curator-
-
cxf-
-
daikon
-
databinding
-
dataquality
-
debugger-agent
-
deltaspike-
-
deploy.jar
-
derby-
-
derbyclient-
-
derbynet-
-
dnsns
-
dom4j
-
draw2d
-
ecj-
-
eclipselink-
-
ehcache-
-
el-api
-
enunciate-core-annotations
-
error_prone_annotations
-
expressions
-
fastutil
-
feign-core
-
feign-hystrix
-
feign-slf4j
-
filters-helpers
-
findbugs-
-
fluentlenium-core
-
freemarker-
-
fusemq-leveldb-
-
gef-
-
geocoder
-
geronimo-
-
gmbal
-
google-
-
gpars-
-
gragent.jar
-
graph
-
grizzled-scala
-
grizzly-
-
groovy-
-
grpc-
-
gson-
-
guava-
-
guice-
-
h2-
-
hadoop-
-
hamcrest-
-
hawtbuf-
-
hawtdispatch-
-
hawtio-
-
hawtjni-runtime
-
help-
-
hibernate-
-
hk2-
-
howl-
-
hsqldb-
-
htmlunit-
-
htrace-
-
httpclient-
-
httpcore-
-
httpmime
-
hystrix
-
iban4j-
-
icu4j-
-
idb-
-
idea_rt.jar
-
instrumentation-api
-
istack-commons-runtime-
-
ivy-
-
j2objc-annotations
-
jBCrypt
-
jaccess
-
jackson-
-
janino-
-
jansi-
-
jasper-el.jar
-
jasper.jar
-
jasypt-
-
java-atk-wrapper
-
java-support-
-
java-xmlbuilder-
-
javacsv
-
javaee-
-
javaee-api
-
javassist-
-
javaws.jar
-
javax.
-
jaxb-
-
jaxp-
-
jbake-
-
jboss-
-
jbossall-
-
jbosscx-
-
jbossjts-
-
jbosssx-
-
jcache
-
jce.jar
-
jcip-annotations
-
jcl-over-slf4j-
-
jcommander-
-
jdbcdslog-1
-
jersey-
-
jets3t
-
jettison-
-
jetty-
-
jface
-
jfairy
-
jffi
-
jfr.jar
-
jfxrt.jar
-
jfxswt
-
jjwt
-
jline
-
jmdns-
-
jmustache
-
jna-
-
jnr-
-
jobs-
-
joda-convert
-
joda-time-
-
johnzon-
-
jolokia-
-
jopt-simple
-
jruby-
-
json-
-
json4s-
-
jsonb-api
-
jsoup-
-
jsp-api
-
jsr
-
jsse.jar
-
jta
-
jul-to-slf4j-
-
juli-
-
junit-
-
junit5-
-
jwt
-
jython
-
kafka
-
kahadb-
-
kotlin-runtime
-
kryo
-
leveldb
-
libphonenumber
-
lift-json
-
lmdbjava
-
localedata
-
log4j-
-
logback
-
logging-event-layout
-
logkit-
-
lombok
-
lucene
-
lz4
-
machinist
-
macro-compat
-
mail-
-
management-
-
mapstruct-
-
maven-
-
mbean-annotation-api-
-
meecrowave-
-
mesos-
-
metrics-
-
microprofile-config-api-
-
mimepull-
-
mina-
-
minlog
-
mockito-core
-
mqtt-client-
-
multiverse-core-
-
mx4j-
-
myfaces-
-
mysql-connector-java-
-
nashorn
-
neethi-
-
neko-htmlunit
-
nekohtml-
-
netflix
-
netty-
-
nimbus-jose-jwt
-
objenesis-
-
okhttp
-
okio
-
openjpa-
-
openmdx-
-
opensaml-
-
opentest4j-
-
openwebbeans-
-
openws-
-
ops4j-
-
org.apache.aries
-
org.apache.commons
-
org.apache.log4j
-
org.eclipse.
-
org.junit.
-
org.osgi.core-
-
org.osgi.enterprise
-
org.talend
-
orient-commons-
-
orientdb-core-
-
orientdb-nativeos-
-
oro-
-
osgi
-
paranamer
-
pax-url
-
play
-
plexus-
-
plugin.jar
-
poi-
-
postgresql
-
preferences-
-
prefixmapper
-
protobuf-
-
py4j-
-
pyrolite-
-
qdox-
-
quartz-2
-
quartz-openejb-
-
reactive-streams
-
reflectasm-
-
registry-
-
resources.jar
-
rhino
-
ribbon
-
rmock-
-
routes-compiler
-
routines
-
rt.jar
-
runners
-
runtime-
-
rxjava
-
rxnetty
-
saaj-
-
sac-
-
scala
-
scalap
-
scalatest
-
scannotation-
-
selenium
-
serializer-
-
serp-
-
service-common
-
servlet-api-
-
servo-
-
shaded
-
shrinkwrap-
-
sisu-guice
-
sisu-inject
-
slf4j-
-
slick
-
smack-
-
smackx-
-
snakeyaml-
-
snappy-
-
spark-
-
specs2
-
spring-
-
sshd-
-
ssl-config-core
-
stax-api-
-
stax2-api-
-
stream
-
sunec.jar
-
sunjce_provider
-
sunpkcs11
-
surefire-
-
swagger-
-
swizzle-
-
sxc-
-
system-rules
-
tachyon-
-
talend-icon
-
test-interface
-
testng-
-
tomcat
-
tomee-
-
tools.jar
-
twirl
-
twitter4j-
-
tyrex
-
uncommons
-
unused
-
validation-api-
-
velocity-
-
wagon-
-
webbeans-
-
websocket
-
woodstox-core
-
workbench
-
ws-commons-util-
-
wsdl4j-
-
wss4j-
-
wstx-asl-
-
xalan-
-
xbean-
-
xercesImpl-
-
xml-apis-
-
xml-resolver-
-
xmlbeans-
-
xmlenc-
-
xmlgraphics-
-
xmlpull-
-
xmlrpc-
-
xmlschema-
-
xmlsec-
-
xmltooling-
-
xmlunit-
-
xstream-
-
xz-
-
zipfs.jar
-
zipkin-
-
ziplock-
-
zkclient
-
zookeeper-
Package Scanning
Since the framework can be used in the case of fatjars or shades, and because it still uses scanning, it is important to ensure we don’t scan the whole classes for performances reason.
Therefore, the following packages are ignored:
-
avro.shaded
-
com.codehale.metrics
-
com.ctc.wstx
-
com.datastax.driver.core
-
com.fasterxml.jackson.annotation
-
com.fasterxml.jackson.core
-
com.fasterxml.jackson.databind
-
com.fasterxml.jackson.dataformat
-
com.fasterxml.jackson.module
-
com.google.common
-
com.google.thirdparty
-
com.ibm.wsdl
-
com.jcraft.jsch
-
com.kenai.jffi
-
com.kenai.jnr
-
com.sun.istack
-
com.sun.xml.bind
-
com.sun.xml.messaging.saaj
-
com.sun.xml.txw2
-
com.thoughtworks
-
io.jsonwebtoken
-
io.netty
-
io.swagger.annotations
-
io.swagger.config
-
io.swagger.converter
-
io.swagger.core
-
io.swagger.jackson
-
io.swagger.jaxrs
-
io.swagger.model
-
io.swagger.models
-
io.swagger.util
-
javax
-
jnr
-
junit
-
net.sf.ehcache
-
net.shibboleth.utilities.java.support
-
org.aeonbits.owner
-
org.apache.activemq
-
org.apache.beam
-
org.apache.bval
-
org.apache.camel
-
org.apache.catalina
-
org.apache.commons.beanutils
-
org.apache.commons.cli
-
org.apache.commons.codec
-
org.apache.commons.collections
-
org.apache.commons.compress
-
org.apache.commons.dbcp2
-
org.apache.commons.digester
-
org.apache.commons.io
-
org.apache.commons.jcs.access
-
org.apache.commons.jcs.admin
-
org.apache.commons.jcs.auxiliary
-
org.apache.commons.jcs.engine
-
org.apache.commons.jcs.io
-
org.apache.commons.jcs.utils
-
org.apache.commons.lang
-
org.apache.commons.lang3
-
org.apache.commons.logging
-
org.apache.commons.pool2
-
org.apache.coyote
-
org.apache.cxf
-
org.apache.geronimo.javamail
-
org.apache.geronimo.mail
-
org.apache.geronimo.osgi
-
org.apache.geronimo.specs
-
org.apache.http
-
org.apache.jcp
-
org.apache.johnzon
-
org.apache.juli
-
org.apache.logging.log4j.core
-
org.apache.logging.log4j.jul
-
org.apache.logging.log4j.util
-
org.apache.logging.slf4j
-
org.apache.meecrowave
-
org.apache.myfaces
-
org.apache.naming
-
org.apache.neethi
-
org.apache.openejb
-
org.apache.openjpa
-
org.apache.oro
-
org.apache.tomcat
-
org.apache.tomee
-
org.apache.velocity
-
org.apache.webbeans
-
org.apache.ws
-
org.apache.wss4j
-
org.apache.xbean
-
org.apache.xml
-
org.apache.xml.resolver
-
org.bouncycastle
-
org.codehaus.jackson
-
org.codehaus.stax2
-
org.codehaus.swizzle.Grep
-
org.codehaus.swizzle.Lexer
-
org.cryptacular
-
org.eclipse.jdt.core
-
org.eclipse.jdt.internal
-
org.fusesource.hawtbuf
-
org.h2
-
org.hamcrest
-
org.hsqldb
-
org.jasypt
-
org.jboss.marshalling
-
org.joda.time
-
org.jose4j
-
org.junit
-
org.jvnet.mimepull
-
org.metatype.sxc
-
org.objectweb.asm
-
org.objectweb.howl
-
org.openejb
-
org.opensaml
-
org.slf4j
-
org.swizzle
-
org.terracotta.context
-
org.terracotta.entity
-
org.terracotta.modules.ehcache
-
org.terracotta.statistics
-
org.tukaani
-
org.yaml.snakeyaml
-
serp
it is not recommanded but possible to add in your plugin module a
TALEND-INF/scanning.properties file with classloader.includes and
classloader.excludes entries to refine the scanning with custom rules.
In such a case, exclusions win over inclusions.
|
Build tools
Maven Plugin
talend-component-maven-plugin
helps you write components that match best practices and generate transparently metadata used by Talend Studio.
You can use it as follows:
<plugin>
<groupId>org.talend.sdk.component</groupId>
<artifactId>talend-component-maven-plugin</artifactId>
<version>${component.version}</version>
</plugin>
This plugin is also an extension so you can declare it in your build/extensions
block as:
<extension>
<groupId>org.talend.sdk.component</groupId>
<artifactId>talend-component-maven-plugin</artifactId>
<version>${component.version}</version>
</extension>
Used as an extension, the dependencies
, validate
and documentation
goals will be set up.
Dependencies
The first goal is a shortcut for the maven-dependency-plugin
. It creates the TALEND-INF/dependencies.txt
file with the compile
and runtime
dependencies, allowing the component to use it at runtime:
<plugin>
<groupId>org.talend.sdk.component</groupId>
<artifactId>talend-component-maven-plugin</artifactId>
<version>${component.version}</version>
<executions>
<execution>
<id>talend-dependencies</id>
<goals>
<goal>dependencies</goal>
</goals>
</execution>
</executions>
</plugin>
Validate
This goal helps you validate the common programming model of the component. To activate it, you can use following execution definition:
<plugin>
<groupId>org.talend.sdk.component</groupId>
<artifactId>talend-component-maven-plugin</artifactId>
<version>${component.version}</version>
<executions>
<execution>
<id>talend-component-validate</id>
<goals>
<goal>validate</goal>
</goals>
</execution>
</executions>
</plugin>
It is bound to the process-classes
phase by default. When executed, it performs several validations that can be disabled by setting the corresponding flags to false
in the <configuration>
block of the execution:
Name | Description | Default |
---|---|---|
validateInternationalization |
Validates that resource bundles are presents and contain commonly used keys (for example, |
true |
validateModel |
Ensures that components pass validations of the |
true |
validateSerializable |
Ensures that components are |
true |
validateMetadata |
Ensures that components have an |
true |
validateDataStore |
Ensures that any |
true |
validateComponent |
Ensures that the native programming model is respected. You can disable it when using another programming model like Beam. |
true |
validateActions |
Validates action signatures for actions not tolerating dynamic binding ( |
true |
validateFamily |
Validates the family by verifying that the package containing the |
true |
validateDocumentation |
Ensures that all components and |
true |
validateLayout |
Ensures that the layout is referencing existing options and properties. |
true |
validateOptionNames |
Ensures that the option names are compliant with the framework. It is highly recommended and safer to keep it set to |
true |
Documentation
This goal generates an Asciidoc file documenting your component from the configuration model (@Option
) and the @Documentation
property that you can add to options and to the component itself.
<plugin>
<groupId>org.talend.sdk.component</groupId>
<artifactId>talend-component-maven-plugin</artifactId>
<version>${component.version}</version>
<executions>
<execution>
<id>talend-component-documentation</id>
<goals>
<goal>asciidoc</goal>
</goals>
</execution>
</executions>
</plugin>
Name | Description | Default |
---|---|---|
level |
Level of the root title. |
2 ( |
output |
Output folder path. It is recommended to keep it to the default value. |
|
formats |
Map of the renderings to do. Keys are the format ( |
- |
attributes |
Map of asciidoctor attributes when formats is set. |
- |
templateDir / templateEngine |
Template configuration for the rendering. |
- |
title |
Document title. |
${project.name} |
attachDocumentations |
Allows to attach (and deploy) the documentations ( |
true |
If you use the plugin as an extension, you can add the talend.documentation.htmlAndPdf property and set it to true in your project to automatically get HTML and PDF renderings of the documentation.
|
Rendering your documentation
To render the generated documentation in HTML or PDF, you can use the Asciidoctor Maven plugin (or Gradle equivalent). You can configure both executions if you want both HTML and PDF renderings.
Make sure to execute the rendering after the documentation generation.
HTML rendering
If you prefer a HTML rendering, you can configure the following execution in the asciidoctor plugin. The example below:
-
Generates the components documentation in
target/classes/TALEND-INF/documentation.adoc
. -
Renders the documentation as an HTML file stored in
target/documentation/documentation.html
.
<plugin> (1)
<groupId>org.talend.sdk.component</groupId>
<artifactId>talend-component-maven-plugin</artifactId>
<version>${talend-component-kit.version}</version>
<executions>
<execution>
<id>documentation</id>
<phase>prepare-package</phase>
<goals>
<goal>asciidoc</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin> (2)
<groupId>org.asciidoctor</groupId>
<artifactId>asciidoctor-maven-plugin</artifactId>
<version>1.5.6</version>
<executions>
<execution>
<id>doc-html</id>
<phase>prepare-package</phase>
<goals>
<goal>process-asciidoc</goal>
</goals>
<configuration>
<sourceDirectory>${project.build.outputDirectory}/TALEND-INF</sourceDirectory>
<sourceDocumentName>documentation.adoc</sourceDocumentName>
<outputDirectory>${project.build.directory}/documentation</outputDirectory>
<backend>html5</backend>
</configuration>
</execution>
</executions>
</plugin>
PDF rendering
If you prefer a PDF rendering, you can configure the following execution in the asciidoctor plugin:
<plugin>
<groupId>org.asciidoctor</groupId>
<artifactId>asciidoctor-maven-plugin</artifactId>
<version>1.5.6</version>
<executions>
<execution>
<id>doc-html</id>
<phase>prepare-package</phase>
<goals>
<goal>process-asciidoc</goal>
</goals>
<configuration>
<sourceDirectory>${project.build.outputDirectory}/TALEND-INF</sourceDirectory>
<sourceDocumentName>documentation.adoc</sourceDocumentName>
<outputDirectory>${project.build.directory}/documentation</outputDirectory>
<backend>pdf</backend>
</configuration>
</execution>
</executions>
<dependencies>
<dependency>
<groupId>org.asciidoctor</groupId>
<artifactId>asciidoctorj-pdf</artifactId>
<version>1.5.0-alpha.16</version>
</dependency>
</dependencies>
</plugin>
Including the documentation into a document
If you want to add some more content or a title, you can include the generated document into
another document using Asciidoc include
directive.
For example:
= Super Components
Super Writer
:toc:
:toclevels: 3
:source-highlighter: prettify
:numbered:
:icons: font
:hide-uri-scheme:
:imagesdir: images
include::{generated_doc}/documentation.adoc[]
To be able to do that, you need to pass the generated_doc
attribute to the plugin. For example:
<plugin>
<groupId>org.asciidoctor</groupId>
<artifactId>asciidoctor-maven-plugin</artifactId>
<version>1.5.6</version>
<executions>
<execution>
<id>doc-html</id>
<phase>prepare-package</phase>
<goals>
<goal>process-asciidoc</goal>
</goals>
<configuration>
<sourceDirectory>${project.basedir}/src/main/asciidoc</sourceDirectory>
<sourceDocumentName>my-main-doc.adoc</sourceDocumentName>
<outputDirectory>${project.build.directory}/documentation</outputDirectory>
<backend>html5</backend>
<attributes>
<generated_adoc>${project.build.outputDirectory}/TALEND-INF</generated_adoc>
</attributes>
</configuration>
</execution>
</executions>
</plugin>
This is optional but allows to reuse Maven placeholders to pass paths, which can be convenient in an automated build.
You can find more customization options on Asciidoctor website.
Testing a component web rendering
Testing the rendering of your component configuration into the Studio requires deploying the component in Talend Studio (refer to Studio Documentation.
In the case where you need to deploy your component into a Cloud (web) environment, you can test its web rendering by using the web
goal of the plugin:
-
Run the
mvn talend-component:web
command. -
Open the following URL in a web browser:
localhost:8080
. -
Select the component form you want to see from the treeview on the left. The selected form is displayed on the right.
Two parameters are available with the plugin:
-
serverPort
, which allows to change the default port (8080) of the embedded server. -
serverArguments
, that you can use to pass Meecrowave options to the server. Learn more about that configuration at openwebbeans.apache.org/meecrowave/meecrowave-core/cli.html.
Make sure to install the artifact before using this command because it reads the component JAR from the local Maven repository. |
Generating inputs or outputs
The Mojo generate
(Maven plugin goal) of the same plugin also embeds a generator that you can use to bootstrap any input or output component:
<plugin>
<groupId>org.talend.sdk.component</groupId>
<artifactId>talend-component-maven-plugin</artifactId>
<version>${talend-component.version}</version>
<executions>
<execution> (1)
<id>generate-input</id>
<phase>generate-sources</phase>
<goals>
<goal>generate</goal>
</goals>
<configuration>
<type>input</type>
</configuration>
</execution>
<execution> (2)
<id>generate-output</id>
<phase>generate-sources</phase>
<goals>
<goal>generate</goal>
</goals>
<configuration>
<type>output</type>
</configuration>
</execution>
</executions>
</plugin>
1 | The first execution generates an input (partition mapper + emitter). |
2 | the second execution generates an output. |
It is intended to be used from the command line (or IDE Maven integration) as follows:
$ mvn talend-component:generate \
-Dtalend.generator.type=[input|output] \ (1)
[-Dtalend.generator.classbase=com.test.MyComponent] \ (2)
[-Dtalend.generator.family=my-family] \ (3)
[-Dtalend.generator.pom.read-only=false] (4)
1 | Select the type of component you want: input to generate a mapper and an emitter, or output to generate an output processor. |
2 | Set the class name base (automatically suffixed by the component type). If not set, the package is guessed and the classname is based on the basedir name. |
3 | Set the component family to use. If not specified, it defaults to the basedir name and removes "component[s]" from it. for example, my-component leads to my as family, unless it is explicitly set. |
4 | Specify if the generator needs to add component-api to the POM, if not already there. If you already added it, you can set it to false directly in the POM. |
For this command to work, you need to register the plugin as follows:
<plugin>
<groupId>org.talend.sdk.component</groupId>
<artifactId>talend-component-maven-plugin</artifactId>
<version>${talend-component.version}</version>
</plugin>
Talend Component Archive
Component ARchive (.car
) is the way to bundle a component to share it in the Talend ecosystem. It is a plain Java ARchive (.jar
) containing a metadata file and a nested Maven repository containing the component and its depenencies.
mvn talend-component:car
This command creates a .car
file in your build directory. This file can be shared on Talend platforms.
This CAR is executable and exposes the studio-deploy
command which takes
a Talend Studio home path as parameter. When executed, it installs the dependencies into the Studio and registers the component in your instance. For example:
# for a studio
java -jar mycomponent.car studio-deploy /path/to/my/studio
or
java -jar mycomponent.car studio-deploy --location /path/to/my/studio
# for a m2 provisioning
java -jar mycomponent.car maven-deploy /path/to/.m2/repository
or
java -jar mycomponent.car maven-deploy --location /path/to/.m2/repository
You can also upload the dependencies to your Nexus server using the following command:
java -jar mycomponent.car deploy-to-nexus --url <nexus url> --repo <repository name> --user <username> --pass <password> --threads <parallel threads number> --dir <temp directory>
In this command, Nexus URL and repository name are mandatory arguments. All other arguments are optional. If arguments contain spaces or special symbols, you need to quote the whole value of the argument. For example:
--pass "Y0u will \ not G4iess i' ^"
Gradle Plugin
gradle-talend-component
helps you write components that match the best practices. It is inspired from the Maven plugin and adds the ability to generate automatically the dependencies.txt
file used by the SDK to build the component classpath. For more information on the configuration, refer to the Maven properties matching the attributes.
You can use it as follows:
buildscript {
repositories {
mavenLocal()
mavenCentral()
}
dependencies {
classpath "org.talend.sdk.component:gradle-talend-component:${talendComponentVersion}"
}
}
apply plugin: 'org.talend.sdk.component'
apply plugin: 'java'
// optional customization
talendComponentKit {
// dependencies.txt generation, replaces maven-dependency-plugin
dependenciesLocation = "TALEND-INF/dependencies.txt"
boolean skipDependenciesFile = false;
// classpath for validation utilities
sdkVersion = "${talendComponentVersion}"
apiVersion = "${talendComponentApiVersion}"
// documentation
skipDocumentation = false
documentationOutput = new File(....)
documentationLevel = 2 // first level will be == in the generated adoc
documentationTitle = 'My Component Family' // default to project name
documentationFormats = [:] // adoc attributes
documentationFormats = [:] // renderings to do
// validation
skipValidation = false
validateFamily = true
validateSerializable = true
validateInternationalization = true
validateModel = true
validateOptionNames = true
validateMetadata = true
validateComponent = true
validateDataStore = true
validateDataSet = true
validateActions = true
// web
serverArguments = []
serverPort = 8080
// car
carOutput = new File(....)
carMetadata = [:] // custom meta (string key-value pairs)
}
Services
Internationalizing services
Internationalization requires following several best practices:
-
Storing messages using
ResourceBundle
properties file in your component module. -
The location of the properties is in the same package than the related components and is named
Messages
. For example,org.talend.demo.MyComponent
usesorg.talend.demo.Messages[locale].properties
. -
Use the internationalization API for your own messages.
Internationalization API
The Internationalization API is the mechanism to use to internationalize your own messages in your own components.
The principle of the API is to design messages as methods returning String
values and get back a template using a ResourceBundle
named Messages
and located in the same package than the interface that defines these methods.
To ensure your internationalization API is identified, you need to mark it with the @Internationalized
annotation:
@Internationalized (1)
public interface Translator {
String message();
String templatizedMessage(String arg0, int arg1); (2)
String localized(String arg0, @Language Locale locale); (3)
String localized(String arg0, @Language String locale); (4)
}
1 | @Internationalized allows to mark a class as an internationalized service. |
2 | You can pass parameters. The message uses the MessageFormat syntax to be resolved, based on the ResourceBundle template. |
3 | You can use @Language on a Locale parameter to specify manually the locale to use. Note that a single value is used (the first parameter tagged as such). |
4 | @Language also supports the String type. |
Providing actions for consumers
In some cases you can need to add some actions that are not related to the runtime. For example, enabling clients - the users of the plugin/library - to test if a connection works properly.
To do so, you need to define an @Action
, which is a method with a name (representing the event name), in a class decorated with @Service
:
@Service
public class MyDbTester {
@Action(family = "mycomp", "test")
public Status doTest(final IncomingData data) {
return ...;
}
}
Services are singleton. If you need some thread safety, make sure that they match that requirement. Services should not store any status either because they can be serialized at any time. Status are held by the component. |
Services can be used in components as well (matched by type). They allow to reuse some shared logic, like a client. Here is a sample with a service used to access files:
@Emitter(family = "sample", name = "reader")
public class PersonReader implements Serializable {
// attributes skipped to be concise
public PersonReader(@Option("file") final File file,
final FileService service) {
this.file = file;
this.service = service;
}
// use the service
@PostConstruct
public void open() throws FileNotFoundException {
reader = service.createInput(file);
}
}
The service is automatically passed to the constructor. It can be used as a bean. In that case, it is only necessary to call the service method.
Particular action types
Some common actions need a clear contract so they are defined as API first-class citizen. For example, this is the case for wizards or health checks. Here is the list of the available actions:
API | Type | Description | Return type | Sample returned type |
---|---|---|---|---|
@org.talend.sdk.component.api.service.completion.DynamicValues |
dynamic_values |
Mark a method as being useful to fill potential values of a string option for a property denoted by its value. You can link a field as being completable using @Proposable(value). The resolution of the completion action is then done through the component family and value of the action. The callback doesn’t take any parameter. |
Values |
|
@org.talend.sdk.component.api.service.healthcheck.HealthCheck |
healthcheck |
This class marks an action doing a connection test |
HealthCheckStatus |
|
@org.talend.sdk.component.api.service.schema.DiscoverSchema |
schema |
Mark an action as returning a discovered schema. Its parameter MUST be the type decorated with |
Schema |
|
@org.talend.sdk.component.api.service.completion.Suggestions |
suggestions |
Mark a method as being useful to fill potential values of a string option. You can link a field as being completable using @Suggestable(value). The resolution of the completion action is then done when the user requests it (generally by clicking on a button or entering the field depending the environment). |
SuggestionValues |
|
@org.talend.sdk.component.api.service.Action |
user |
- |
any |
- |
@org.talend.sdk.component.api.service.asyncvalidation.AsyncValidation |
validation |
Mark a method as being used to validate a configuration. IMPORTANT: this is a server validation so only use it if you can’t use other client side validation to implement it. |
ValidationResult |
|
Internationalization
Internationalization is supported through the injection of the $lang
parameter, which allows you to get the correct locale to use with an @Internationalized
service:
public SuggestionValues findSuggestions(@Option("someParameter") final String param,
@Option("$lang") final String lang) {
return ...;
}
You can combine the $lang option with the @Internationalized and @Language parameters.
|
Built-in services
The framework provides built-in services that you can inject by type in components and actions.
Here is the list:
Type | Description |
---|---|
|
Provides a small abstraction to cache data that does not need to be recomputed very often. Commonly used by actions for UI interactions. |
|
Allows to resolve a dependency from its Maven coordinates. |
|
A JSON-B instance. If your model is static and you don’t want to handle the serialization manually using JSON-P, you can inject that instance. |
|
A JSON-P instance. Prefer other JSON-P instances if you don’t exactly know why you use this one. |
|
A JSON-P instance. It is recommended to use this one instead of a custom one to optimize memory usage and speed. |
|
A JSON-P instance. It is recommended to use this one instead of a custom one to optimize memory usage and speed. |
|
A JSON-P instance. It is recommended to use this one instead of a custom one to optimize memory usage and speed. |
|
A JSON-P instance. It is recommended to use this one instead of a custom one to optimize memory usage and speed. |
|
A JSON-P instance. It is recommended to use this one instead of a custom one to optimize memory usage and speed. |
|
Represents the local configuration that can be used during the design. |
|
Allows to resolve files from Maven coordinates (like |
|
Utility to inject services in fields marked with |
|
Allows to instantiate an object from its class name and properties. It is not recommended to use it for the runtime because the local configuration is usually different and the instances are distinct. You can also use the local cache as an interceptor with |
Every interface that extends |
Lets you define an HTTP client in a declarative manner using an annotated interface. See the Using HttpClient for more details. |
All these injected services are serializable, which is important for big data environments. If you create the instances yourself, you cannot benefit from these features, nor from the memory optimization done by the runtime. Prefer reusing the framework instances over custom ones. |
Using HttpClient
The HttpClient usage is described in this section by using the REST API example below. It is assume that it requires a basic authentication header.
GET |
- |
POST |
JSON payload to be created: |
To create an HTTP client that is able to consume the REST API above, you need to define an interface that extends HttpClient
.
The HttpClient
interface lets you set the base
for the HTTP address that the client will hit.
The base
is the part of the address that needs to be added to the request path to hit the API.
Every method annotated with @Request
in the interface defines an HTTP request.
Every request can have a @Codec
parameter that allows to encode or decode the request/response payloads.
You can ignore the encoding/decoding for String and Void payloads.
|
public interface APIClient extends HttpClient {
@Request(path = "api/records/{id}", method = "GET")
@Codec(decoder = RecordDecoder.class) //decoder = decode returned data to Record class
Record getRecord(@Header("Authorization") String basicAuth, @Path("id") int id);
@Request(path = "api/records", method = "POST")
@Codec(encoder = RecordEncoder.class, decoder = RecordDecoder.class) //encoder = encode record to fit request format (json in this example)
Record createRecord(@Header("Authorization") String basicAuth, Record record);
}
The interface should extend HttpClient .
|
In the codec classes (that implement Encoder/Decoder), you can inject any of your service annotated with @Service
or @Internationalized
into the constructor.
Internationalization services can be useful to have internationalized messages for errors handling.
The interface can be injected into component classes or services to consume the defined API.
@Service
public class MyService {
private APIClient client;
public MyService(...,APIClient client){
//...
this.client = client;
client.base("http://localhost:8080");// init the base of the api, ofen in a PostConstruct or init method
}
//...
// Our get request
Record rec = client.getRecord("Basic MLFKG?VKFJ", 100);
//...
// Our post request
Record newRecord = client.createRecord("Basic MLFKG?VKFJ", new Record());
}
By default, /+json are mapped to JSON-P and /+xml to JAX-B if the model has a @XmlRootElement annotation.
|
Customizing HTTP client requests
For advanced cases, you can customize the Connection
by directly using @UseConfigurer
on the method. It calls your custom instance of Configurer
. Note that you can use @ConfigurerOption
in the method signature to pass some Configurer
configurations.
For example, if you have the following Configurer
:
public class BasicConfigurer implements Configurer {
@Override
public void configure(final Connection connection, final ConfigurerConfiguration configuration) {
final String user = configuration.get("username", String.class);
final String pwd = configuration.get("password", String.class);
connection.withHeader(
"Authorization",
Base64.getEncoder().encodeToString((user + ':' + pwd).getBytes(StandardCharsets.UTF_8)));
}
}
You can then set it on a method to automatically add the basic header with this kind of API usage:
public interface APIClient extends HttpClient {
@Request(path = "...")
@UseConfigurer(BasicConfigurer.class)
Record findRecord(@ConfigurerOption("username") String user, @ConfigurerOption("password") String pwd);
}
Services and interceptors
For common concerns such as caching, auditing, and so on, you can use an interceptor-like API. It is enabled on services by the framework.
An interceptor defines an annotation marked with @Intercepts
, which defines the implementation of the interceptor (InterceptorHandler
).
For example:
@Intercepts(LoggingHandler.class)
@Target({ TYPE, METHOD })
@Retention(RUNTIME)
public @interface Logged {
String value();
}
The handler is created from its constructor and can take service injections (by type). The first parameter, however, can be BiFunction<Method, Object[], Object>
, which represents the invocation chain if your interceptor can be used with others.
If you make a generic interceptor, pass the invoker as first parameter. Otherwise you cannot combine interceptors at all. |
Here is an example of interceptor implementation for the @Logged
API:
public class LoggingHandler implements InterceptorHandler {
// injected
private final BiFunction<Method, Object[], Object> invoker;
private final SomeService service;
// internal
private final ConcurrentMap<Method, String> loggerNames = new ConcurrentHashMap<>();
public CacheHandler(final BiFunction<Method, Object[], Object> invoker, final SomeService service) {
this.invoker = invoker;
this.service = service;
}
@Override
public Object invoke(final Method method, final Object[] args) {
final String name = loggerNames.computeIfAbsent(method, m -> findAnnotation(m, Logged.class).get().value());
service.getLogger(name).info("Invoking {}", method.getName());
return invoker.apply(method, args);
}
}
This implementation is compatible with interceptor chains because it takes the invoker as first constructor parameter and it also takes a service injection. Then, the implementation simply does what is needed, which is logging the invoked method in this case.
The findAnnotation annotation, inherited from InterceptorHandler , is an utility method to find an annotation on a method or class (in this order).
|
Creating a job pipeline
Job Builder
The Job
builder lets you create a job pipeline programmatically using Talend components
(Producers and Processors).
The job pipeline is an acyclic graph, allowing you to build complex pipelines.
Let’s take a simple use case where two data sources (employee and salary) are formatted to CSV and the result is written to a file.
A job is defined based on components (nodes) and links (edges) to connect their branches together.
Every component is defined by a unique id
and an URI that identify the component.
The URI follows the form [family]://[component][?version][&configuration]
, where:
-
family is the name of the component family.
-
component is the name of the component.
-
version is the version of the component. It is represented in a key=value format. The key is
__version
and the value is a number. -
configuration is component configuration. It is represented in a key=value format. The key is the path of the configuration and the value is a `string' corresponding to the configuration value.
job://csvFileGen?__version=1&path=/temp/result.csv&encoding=utf-8"
configuration parameters must be URI/URL encoded. |
Job.components() (1)
.component("employee","db://input")
.component("salary", "db://input")
.component("concat", "transform://concat?separator=;")
.component("csv", "file://out?__version=2")
.connections() (2)
.from("employee").to("concat", "string1")
.from("salary").to("concat", "string2")
.from("concat").to("csv")
.build() (3)
.run(); (4)
1 | Defining all components used in the job pipeline. |
2 | Defining the connections between the components to construct the job pipeline. The links from /to use the component id and the default input/output branches.You can also connect a specific branch of a component, if it has multiple or named input/output branches, using the methods from(id, branchName) and to(id, branchName) .In the example above, the concat component has two inputs ("string1" and "string2"). |
3 | Validating the job pipeline by asserting that:
|
4 | Running the job pipeline. |
In this version, the execution of the job is linear. Components are not executed in parallel even if some steps may be independents. |
Environment/Runner
Depending on the configuration, you can select the environment which you execute your job in.
To select the environment, the logic is the following one:
-
If an
org.talend.sdk.component.runtime.manager.chain.Job.ExecutorBuilder
class is passed through the job properties, then use it. The supported types are aExecutionBuilder
instance, aClass
or aString
. -
if an
ExecutionBuilder
SPI is present, use it. It is the case ifcomponent-runtime-beam
is present in your classpath. -
else, use a local/standalone execution.
In the case of a Beam execution, you can customize the pipeline options using system properties. They have to be prefixed with talend.beam.job.
. For example, to set the appName
option, you need to use -Dtalend.beam.job.appName=mytest
.
Key Provider
The job builder lets you set a key provider to join your data when a component has multiple inputs. The key provider can be set contextually to a component or globally to the job.
Job.components()
.component("employee","db://input")
.property(GroupKeyProvider.class.getName(),
(GroupKeyProvider) context -> context.getData().getString("id")) (1)
.component("salary", "db://input")
.component("concat", "transform://concat?separator=;")
.connections()
.from("employee").to("concat", "string1")
.from("salary").to("concat", "string2")
.build()
.property(GroupKeyProvider.class.getName(), (2)
(GroupKeyProvider) context -> context.getData().getString("employee_id"))
.run();
1 | Defining a key provider for the data produced by the employee component. |
2 | Defining a key provider for all data manipulated in the job. |
If the incoming data has different IDs, you can provide a complex global key provider that relies on the context given by the component id
and the branch name
.
GroupKeyProvider keyProvider = context -> {
if ("employee".equals(context.getComponentId())) {
return context.getData().getString("id");
}
return context.getData().getString("employee_id");
};
Beam case
For Beam case, you need to rely on Beam pipeline definition and use the component-runtime-beam
dependency, which provides Beam bridges.
Inputs and Outputs
org.talend.sdk.component.runtime.beam.TalendIO
provides a way to convert a partition mapper or a processor to an input or processor using the read
or write
methods.
public class Main {
public static void main(final String[] args) {
final ComponentManager manager = ComponentManager.instance()
Pipeline pipeline = Pipeline.create();
//Create beam input from mapper and apply input to pipeline
pipeline.apply(TalendIO.read(manager.findMapper(manager.findMapper("sample", "reader", 1, new HashMap<String, String>() {{
put("fileprefix", "input");
}}).get()))
.apply(new ViewsMappingTransform(emptyMap(), "sample")) // prepare it for the output record format (see next part)
//Create beam processor from talend processor and apply to pipeline
.apply(TalendIO.write(manager.findProcessor("test", "writer", 1, new HashMap<String, String>() {{
put("fileprefix", "output");
}}).get(), emptyMap()));
//... run pipeline
}
}
Processors
org.talend.sdk.component.runtime.beam.TalendFn
provides the way to wrap a processor in a Beam PTransform
and to integrate it into the pipeline.
public class Main {
public static void main(final String[] args) {
//Component manager and pipeline initialization...
//Create beam PTransform from processor and apply input to pipeline
pipeline.apply(TalendFn.asFn(manager.findProcessor("sample", "mapper", 1, emptyMap())).get())), emptyMap());
//... run pipeline
}
}
The multiple inputs and outputs are represented by a Map
element in Beam case to avoid using multiple inputs and outputs.
You can use ViewsMappingTransform or CoGroupByKeyResultMappingTransform to adapt the input/output format to the record format representing the multiple inputs/output, like Map<String, List<?>> , but materialized as a JsonObject . Input data must be of the JsonObject type in this case.
|
Converting a Beam.io into a component I/O
For simple inputs and outputs, you can get an automatic and transparent conversion of the Beam.io into an I/O component, if you decorated your PTransform
with @PartitionMapper
or @Processor
.
However, there are limitations:
-
Inputs must implement
PTransform<PBegin, PCollection<?>>
and must be aBoundedSource
. -
Outputs must implement
PTransform<PCollection<?>, PDone>
and register aDoFn
on the inputPCollection
.
For more information, see the How to wrap a Beam I/O page.
Advanced: defining a custom API
It is possible to extend the Component API for custom front features.
What is important here is to keep in mind that you should do it only if it targets not portable components (only used by the Studio or Beam).
It is recommended to create a custom xxxx-component-api
module with the new set of annotations.
Talend Component Testing Documentation
Testing best practices
This section mainly concerns tools that can be used with JUnit. You can use most of these best practices with TestNG as well.
Parameterized tests
Parameterized tests are a great solution to repeat the same test multiple times. This method of testing requires defining a test scenario (I test function F
) and making the input/output data dynamic.
JUnit 4
Here is a test example, which validates a connection URI using ConnectionService
:
public class MyConnectionURITest {
@Test
public void checkMySQL() {
assertTrue(new ConnectionService().isValid("jdbc:mysql://localhost:3306/mysql"));
}
@Test
public void checkOracle() {
assertTrue(new ConnectionService().isValid("jdbc:oracle:thin:@//myhost:1521/oracle"));
}
}
The testing method is always the same. Only values are changing. It can therefore be rewritten using JUnit Parameterized
runner, as follows:
@RunWith(Parameterized.class) (1)
public class MyConnectionURITest {
@Parameterized.Parameters(name = "{0}") (2)
public static Iterable<String> uris() { (3)
return asList(
"jdbc:mysql://localhost:3306/mysql",
"jdbc:oracle:thin:@//myhost:1521/oracle");
}
@Parameterized.Parameter (4)
public String uri;
@Test
public void isValid() { (5)
assertNotNull(uri);
}
}
1 | Parameterized is the runner that understands @Parameters and how to use it. If needed, you can generate random data here. |
2 | By default the name of the executed test is the index of the data. Here, it is customized using the first toString() parameter value to have something more readable. |
3 | The @Parameters method must be static and return an array or iterable of the data used by the tests. |
4 | You can then inject the current data using the @Parameter annotation. It can take a parameter if you use an array of array instead of an iterable of object in @Parameterized . You can select which item you want to inject. |
5 | The @Test method is executed using the contextual data. In this sample, it gets executed twice with the two specified URIs. |
You don’t have to define a single @Test method. If you define multiple methods, each of them is executed with all the data. For example, if another test is added to the previous example, four tests are executed - 2 per data).
|
JUnit 5
With JUnit 5, parameterized tests are easier to use. The full documentation is available at junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests.
The main difference with JUnit 4 is that you can also define inline that the test method is a parameterized test as well as the values to use:
@ParameterizedTest
@ValueSource(strings = { "racecar", "radar", "able was I ere I saw elba" })
void mytest(String currentValue) {
// do test
}
However, you can still use the previous behavior with a method binding configuration:
@ParameterizedTest
@MethodSource("stringProvider")
void mytest(String currentValue) {
// do test
}
static Stream<String> stringProvider() {
return Stream.of("foo", "bar");
}
This last option allows you to inject any type of value - not only primitives - which is common to define scenarios.
Add the junit-jupiter-params dependency to benefit from this feature.
|
component-runtime-testing
component-runtime-junit
component-runtime-junit
is a test library that allows you to validate simple logic based on the Talend Component Kit tooling.
To import it, add the following dependency to your project:
<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>component-runtime-junit</artifactId>
<version>${talend-component.version}</version>
<scope>test</scope>
</dependency>
This dependency also provides mocked components that you can use with your own component to create tests.
The mocked components are provided under the test
family:
-
emitter
: a mock of an input component -
collector
: a mock of an output component
JUnit 4
You can define a standard JUnit test and use the SimpleComponentRule
rule:
public class MyComponentTest {
@Rule (1)
public final SimpleComponentRule components = new SimpleComponentRule("org.talend.sdk.component.mycomponent");
@Test
public void produce() {
Job.components() (2)
.component("mycomponent","yourcomponentfamily://yourcomponent?"+createComponentConfig())
.component("collector", "test://collector")
.connections()
.from("mycomponent").to("collector")
.build()
.run();
final List<MyRecord> records = components.getCollectedData(MyRecord.class); (3)
doAssertRecords(records); // depending your test
}
}
1 | The rule creates a component manager and provides two mock components: an emitter and a collector. Set the root package of your component to enable it. |
2 | Define any chain that you want to test. It generally uses the mock as source or collector. |
3 | Validate your component behavior. For a source, you can assert that the right records were emitted in the mock collect. |
The rule can also be defined as a @ClassRule to start it once per class and not per test as with @Rule .
|
To go further, you can add the ServiceInjectionRule
rule, which allows to inject all the component family services into the test class by marking test class fields with @InjectService
:
public class SimpleComponentRuleTest {
@ClassRule
public static final SimpleComponentRule COMPONENT_FACTORY = new SimpleComponentRule("...");
@Rule (1)
public final ServiceInjectionRule injections = new ServiceInjectionRule(COMPONENT_FACTORY, this); (2)
@Service (3)
private LocalConfiguration configuration;
@Service
private Jsonb jsonb;
@Test
public void test() {
// ...
}
}
1 | The injection requires the test instance, so it must be a @Rule rather than a @ClassRule . |
2 | The ComponentsController is passed to the rule, which for JUnit 4 is the SimpleComponentRule , as well as the test instance to inject services in. |
3 | All service fields are marked with @Service to let the rule inject them before the test is ran. |
JUnit 5
The JUnit 5 integration is very similar to JUnit 4, except that it uses the JUnit 5 extension mechanism.
The entry point is the @WithComponents
annotation that you add to your test class, and which takes the component package you want to test. You can use @Injected
to inject an instance of ComponentsHandler
- which exposes the same utilities than the JUnit 4 rule - in a test class field :
@WithComponents("org.talend.sdk.component.junit.component") (1)
public class ComponentExtensionTest {
@Injected (2)
private ComponentsHandler handler;
@Test
public void manualMapper() {
final Mapper mapper = handler.createMapper(Source.class, new Source.Config() {
{
values = asList("a", "b");
}
});
assertFalse(mapper.isStream());
final Input input = mapper.create();
assertEquals("a", input.next());
assertEquals("b", input.next());
assertNull(input.next());
}
}
1 | The annotation defines which components to register in the test context. |
2 | The field allows to get the handler to be able to orchestrate the tests. |
If you use JUnit 5 for the first time, keep in mind that the imports changed and that you need to use org.junit.jupiter.api.Test instead of org.junit.Test .
Some IDE versions and surefire versions can also require you to install either a plugin or a specific configuration.
|
As for JUnit 4, you can go further by injecting test class fields marked with @InjectService
, but there is no additional extension to specify in this case:
@WithComponents("...")
class ComponentExtensionTest {
@Service (1)
private LocalConfiguration configuration;
@Service
private Jsonb jsonb;
@Test
void test() {
// ...
}
}
1 | All service fields are marked with @Service to let the rule inject them before the test is ran. |
Mocking the output
Using the "test"/"collector" component as shown in the previous sample stores all records emitted by the chain (typically your source) in memory. You can then access them using theSimpleComponentRule.getCollectedData(type)
.
Note that this method filters by type. If you don’t need any specific type, you can use Object.class
.
Mocking the input
The input mocking is symmetric to the output. In this case, you provide the data you want to inject:
public class MyComponentTest {
@Rule
public final SimpleComponentRule components = new SimpleComponentRule("org.talend.sdk.component.mycomponent");
@Test
public void produce() {
components.setInputData(asList(createData(), createData(), createData())); (1)
Job.components()
.component("emitter","test://emitter")
.component("out", "yourcomponentfamily://myoutput?"+createComponentConfig())
.connections()
.from("emitter").to("out")
.build
.run();
assertMyOutputProcessedTheInputData();
}
}
1 | using setInputData , you prepare the execution(s) to have a fake input when using the "test"/"emitter" component. |
Creating runtime configuration from component configuration
The component configuration is a POJO (using @Option
on fields) and the runtime configuration (ExecutionChainBuilder
) uses a Map<String, String>
. To make the conversion easier, the JUnit integration provides a SimpleFactory.configurationByExample
utility to get this map instance from a configuration instance.
final MyComponentConfig componentConfig = new MyComponentConfig();
componentConfig.setUser("....");
// .. other inits
final Map<String, String> configuration = configurationByExample(componentConfig);
The same factory provides a fluent DSL to create the configuration by calling configurationByExample
without any parameter.
The advantage is to be able to convert an object as a Map<String, String>
or as a query string
in order to use it with the Job
DSL:
final String uri = "family://component?" +
configurationByExample().forInstance(componentConfig).configured().toQueryString();
It handles the encoding of the URI to ensure it is correctly done.
Testing a Mapper
The SimpleComponentRule
also allows to test a mapper unitarily. You can get an instance from a configuration and execute this instance to collect the output.
public class MapperTest {
@ClassRule
public static final SimpleComponentRule COMPONENT_FACTORY = new SimpleComponentRule(
"org.company.talend.component");
@Test
public void mapper() {
final Mapper mapper = COMPONENT_FACTORY.createMapper(MyMapper.class, new Source.Config() {{
values = asList("a", "b");
}});
assertEquals(asList("a", "b"), COMPONENT_FACTORY.collectAsList(String.class, mapper));
}
}
Testing a Processor
As for a mapper, a processor is testable unitary. However, this case can be more complex in case of multiple inputs or outputs.
public class ProcessorTest {
@ClassRule
public static final SimpleComponentRule COMPONENT_FACTORY = new SimpleComponentRule(
"org.company.talend.component");
@Test
public void processor() {
final Processor processor = COMPONENT_FACTORY.createProcessor(Transform.class, null);
final SimpleComponentRule.Outputs outputs = COMPONENT_FACTORY.collect(processor,
new JoinInputFactory().withInput("__default__", asList(new Transform.Record("a"), new Transform.Record("bb")))
.withInput("second", asList(new Transform.Record("1"), new Transform.Record("2")))
);
assertEquals(2, outputs.size());
assertEquals(asList(2, 3), outputs.get(Integer.class, "size"));
assertEquals(asList("a1", "bb2"), outputs.get(String.class, "value"));
}
}
The rule allows you to instantiate a Processor
from your code, and then to collect
the output from the inputs you pass in. There are two convenient implementations of the input factory:
-
MainInputFactory
for processors using only the default input. -
JoinInputfactory
with thewithInput(branch, data)
method for processors using multiple inputs. The first argument is the branch name and the second argument is the data used by the branch.
If needed, you can also implement your own input representation using org.talend.sdk.component.junit.ControllableInputFactory .
|
component-runtime-testing-spark
The following artifact allows you to test against a Spark cluster:
<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>component-runtime-testing-spark</artifactId>
<version>${talend-component.version}</version>
<scope>test</scope>
</dependency>
JUnit 4
The testing relies on a JUnit TestRule
. It is recommended to use it as a @ClassRule
, to make sure that a single instance of a Spark cluster is built. You can also use it as a simple @Rule
, to create the Spark cluster instances per method instead of per test class.
The @ClassRule
takes the Spark and Scala versions to use as parameters. It then forks a master and N slaves.
Finally, the submit*
method allows you to send jobs either from the test classpath or from a shade if you run it as an integration test.
For example:
public class SparkClusterRuleTest {
@ClassRule
public static final SparkClusterRule SPARK = new SparkClusterRule("2.10", "1.6.3", 1);
@Test
public void classpathSubmit() throws IOException {
SPARK.submitClasspath(SubmittableMain.class, getMainArgs());
// wait for the test to pass
}
}
This testing methodology works with @Parameterized . You can submit several jobs with different arguments and even combine it with Beam TestPipeline if you make it transient .
|
JUnit 5
The integration of that Spark cluster logic with JUnit 5 is done using the @WithSpark
marker for the extension. Optionally, it allows you to inject—through @SparkInject
—the BaseSpark<?>
handler to access the Spark cluster meta information. For example, its host/port.
@WithSpark
class SparkExtensionTest {
@SparkInject
private BaseSpark<?> spark;
@Test
void classpathSubmit() throws IOException {
final File out = new File(jarLocation(SparkClusterRuleTest.class).getParentFile(), "classpathSubmitJunit5.out");
if (out.exists()) {
out.delete();
}
spark.submitClasspath(SparkClusterRuleTest.SubmittableMain.class, spark.getSparkMaster(), out.getAbsolutePath());
await().atMost(5, MINUTES).until(
() -> out.exists() ? Files.readAllLines(out.toPath()).stream().collect(joining("\n")).trim() : null,
equalTo("b -> 1\na -> 1"));
}
}
Checking the job execution status
Currently, SparkClusterRule
does not allow to know when a job execution is done, even by exposing and polling the web UI URL to check. The best solution at the moment is to make sure that the output of your job exists and contains the right value.
awaitability
or any equivalent library can help you to implement such logic:
<dependency>
<groupId>org.awaitility</groupId>
<artifactId>awaitility</artifactId>
<version>3.0.0</version>
<scope>test</scope>
</dependency>
To wait until a file exists and check that its content (for example) is the expected one, you can use the following logic:
await()
.atMost(5, MINUTES)
.until(
() -> out.exists() ? Files.readAllLines(out.toPath()).stream().collect(joining("\n")).trim() : null,
equalTo("the expected content of the file"));
component-runtime-http-junit
The HTTP JUnit module allows you to mock REST API very simply. The module coordinates are:
<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>component-runtime-http-junit</artifactId>
<version>${talend-component.version}</version>
<scope>test</scope>
</dependency>
This module uses Apache Johnzon and Netty. If you have any conflict (in particular with Netty), you can add the shaded classifier to the dependency. This way, both dependencies are shaded, which avoids conflicts with your component.
|
It supports both JUnit 4 and JUnit 5. The concept is the exact same one: the extension/rule is able to serve precomputed responses saved in the classpath.
You can plug your own ResponseLocator
to map a request to a response, but the default implementation - which should be sufficient in most cases - looks in talend/testing/http/<class name>_<method name>.json
. Note that you can also put it in talend/testing/http/<request path>.json
.
JUnit 4
JUnit 4 setup is done through two rules:
-
JUnit4HttpApi
, which is starts the server. -
JUnit4HttpApiPerMethodConfigurator
, which configures the server per test and also handles the capture mode.
If you don’t use the JUnit4HttpApiPerMethodConfigurator , the capture feature is disabled and the per test mocking is not available.
|
public class MyRESTApiTest {
@ClassRule
public static final JUnit4HttpApi API = new JUnit4HttpApi();
@Rule
public final JUnit4HttpApiPerMethodConfigurator configurator = new JUnit4HttpApiPerMethodConfigurator(API);
@Test
public void direct() throws Exception {
// ... do your requests
}
}
SSL
For tests using SSL-based services, you need to use activeSsl()
on the JUnit4HttpApi
rule.
You can access the client SSL socket factory through the API handler:
@ClassRule
public static final JUnit4HttpApi API = new JUnit4HttpApi().activeSsl();
@Test
public void test() throws Exception {
final HttpsURLConnection connection = getHttpsConnection();
connection.setSSLSocketFactory(API.getSslContext().getSocketFactory());
// ....
}
JUnit 5
JUnit 5 uses a JUnit 5 extension based on the HttpApi
annotation that you can add to your test class. You can inject the test handler - which has some utilities for advanced cases - through @HttpApiInject
:
@HttpApi
class JUnit5HttpApiTest {
@HttpApiInject
private HttpApiHandler<?> handler;
@Test
void getProxy() throws Exception {
// .... do your requests
}
}
The injection is optional and the @HttpApi annotation allows you to configure several test behaviors.
|
SSL
For tests using SSL-based services, you need to use @HttpApi(useSsl = true)
.
You can access the client SSL socket factory through the API handler:
@HttpApi*(useSsl = true)*
class MyHttpsApiTest {
@HttpApiInject
private HttpApiHandler<?> handler;
@Test
void test() throws Exception {
final HttpsURLConnection connection = getHttpsConnection();
connection.setSSLSocketFactory(handler.getSslContext().getSocketFactory());
// ....
}
}
Capturing mode
The strength of this implementation is to run a small proxy server and to auto-configure the JVM:
http[s].proxyHost
, http[s].proxyPort
, HttpsURLConnection#defaultSSLSocketFactory
and SSLContext#default
are auto-configured to work out-of-the-box with the proxy.
It allows you to keep the native and real URLs in your tests. For example, the following test is valid:
public class GoogleTest {
@ClassRule
public static final JUnit4HttpApi API = new JUnit4HttpApi();
@Rule
public final JUnit4HttpApiPerMethodConfigurator configurator = new JUnit4HttpApiPerMethodConfigurator(API);
@Test
public void google() throws Exception {
assertEquals(HttpURLConnection.HTTP_OK, get("https://google.fr?q=Talend"));
}
private int get(final String uri) throws Exception {
// do the GET request, skipped for brievity
}
}
If you execute this test, it fails with an HTTP 400 error because the proxy does not find the mocked response.
You can create it manually, as described in component-runtime-http-junit, but you can also set the talend.junit.http.capture
property to the folder storing the captures. It must be the root folder and not the folder where the JSON files are located (not prefixed by talend/testing/http
by default).
In most cases, use src/test/resources
. If new File("src/test/resources")
resolves the valid folder when executing your test (Maven default), then you can just set the system property to true
. Otherwise, you need to adjust accordingly the system property value.
When the tests run with this system property, the testing framework creates the correct mock response files. After that, you can remove the system property. The tests will still pass, using google.com
, even if you disconnect your machine from the Internet.
Beam testing
If you want to make sure that your component works in Beam and don’t want to use Spark, you can try with the Direct Runner.
Check beam.apache.org/contribute/testing/ for more details.
Testing on multiple environments
JUnit (4 or 5) already provides ways to parameterize tests and execute the same "test logic" against several sets of data. However, it is not very convenient for testing multiple environments.
For example, with Beam, you can test your code against multiple runners. But it requires resolving conflicts between runner dependencies, setting the correct classloaders, and so on.
To simplify such cases, the framework provides you a multi-environment support for your tests, through the JUnit module, which works with both JUnit 4 and JUnit 5.
JUnit 4
@RunWith(MultiEnvironmentsRunner.class)
@Environment(Env1.class)
@Environment(Env2.class)
public class TheTest {
@Test
public void test1() {
// ...
}
}
The MultiEnvironmentsRunner
executes the tests for each defined environments. With the example above, it means that it runs test1
for Env1
and Env2
.
By default, the JUnit4
runner is used to execute the tests in one environment, but you can use @DelegateRunWith
to use another runner.
JUnit 5
The multi-environment configuration with JUnit 5 is similar to JUnit 4:
@Environment(EnvironmentsExtensionTest.E1.class)
@Environment(EnvironmentsExtensionTest.E2.class)
class TheTest {
@EnvironmentalTest
void test1() {
// ...
}
}
The main differences are that no runner is used because they do not exist in JUnit 5, and that you need to replace @Test
by @EnvironmentalTest
.
With JUnit5, tests are executed one after another for all environments, while tests are ran sequentially in each environments with JUnit 4. For example, this means that @BeforeAll and @AfterAll are executed once for all runners.
|
Provided environments
The provided environment sets the contextual classloader in order to load the related runner of Apache Beam.
Package: org.talend.sdk.component.junit.environment.builtin.beam
the configuration is read from system properties, environment variables, …. |
Class | Name | Description |
---|---|---|
ContextualEnvironment |
Contextual |
Contextual runner |
DirectRunnerEnvironment |
Direct |
Direct runner |
FlinkRunnerEnvironment |
Flink |
Flink runner |
SparkRunnerEnvironment |
Spark |
Spark runner |
Configuring environments
If the environment extends BaseEnvironmentProvider
and therefore defines an environment name - which is the case of the default ones - you can use EnvironmentConfiguration
to customize the system properties used for that environment:
@Environment(DirectRunnerEnvironment.class)
@EnvironmentConfiguration(
environment = "Direct",
systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))
@Environment(SparkRunnerEnvironment.class)
@EnvironmentConfiguration(
environment = "Spark",
systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))
@Environment(FlinkRunnerEnvironment.class)
@EnvironmentConfiguration(
environment = "Flink",
systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))
class MyBeamTest {
@EnvironmentalTest
void execute() {
// run some pipeline
}
}
If you set the <environment name>.skip system property to true , the environment-related executions are skipped.
|
Advanced usage
This usage assumes that Beam 2.4.0 is used.
The following dependencies bring the JUnit testing toolkit, the Beam integration and the multi-environment testing toolkit for JUnit into the test scope.
<dependencies>
<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>component-runtime-junit</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter-api</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.jboss.shrinkwrap.resolver</groupId>
<artifactId>shrinkwrap-resolver-impl-maven</artifactId>
<version>3.1.3</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.talend.sdk.component</groupId>
<artifactId>component-runtime-beam</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
Using the fluent DSL to define jobs, you can write a test as follows:
Your job must be linear and each step must send a single value (no multi-input or multi-output). |
@Environment(ContextualEnvironment.class)
@Environment(DirectRunnerEnvironment.class)
class TheComponentTest {
@EnvironmentalTest
void testWithStandaloneAndBeamEnvironments() {
from("myfamily://in?config=xxxx")
.to("myfamily://out")
.create()
.execute();
// add asserts on the output if needed
}
}
It executes the chain twice:
-
With a standalone environment to simulate the Studio.
-
With a Beam (direct runner) environment to ensure the portability of your job.
Secrets/Passwords and Maven
You can reuse Maven settings.xml
server files, including the encrypted ones.
org.talend.sdk.component.maven.MavenDecrypter
allows yo to find a username
/password
from
a server identifier:
final MavenDecrypter decrypter = new MavenDecrypter();
final Server decrypted = decrypter.find("my-test-server");
// decrypted.getUsername();
// decrypted.getPassword();
It is very useful to avoid storing secrets and to perform tests on real systems on a continuous integration platform.
Even if you don’t use Maven on the platform, you can generate the settings.xml and`settings-security.xml` files to use that feature. See maven.apache.org/guides/mini/guide-encryption.html for more details.
|
Generating data
Several data generators exist if you want to populate objects with a semantic that is more evolved than a plain random string like commons-lang3
:
Even more advanced, the following generators allow to directly bind generic data on a model. However, data quality is not always optimal:
There are two main kinds of implementation:
-
Implementations using a pattern and random generated data.
-
Implementations using a set of precomputed data extrapolated to create new values.
Check your use case to know which one fits best.
An alternative to data generation can be to import real data and use Talend Studio to sanitize the data, by removing sensitive information and replacing it with generated or anonymized data. Then you just need to inject that file into the system. |
If you are using JUnit 5, you can have a look at glytching.github.io/junit-extensions/randomBeans.