Integrating components into Talend Studio

To see and use your newly developed components, you need to integrate them into the right application.

Currently, you can deploy your components to Talend Studio as part of your development process to iterate on them.

You can also share your components externally and install them using a component archive (.car) file.

Check the versions of the framework that are compatible with your version of Talend Studio in this document.

If you are used to creating custom components with the Javajet framework and want to learn about the new approach and the main differences of the Component Kit framework, refer to this document.

Version compatibility

You can integrate and start using components developed using Talend Component Kit in Talend applications very easily.

As both the development framework and Talend applications evolve over time, you need to ensure compatibility between the components you develop and the versions of Talend applications that you are targeting, by making sure that you use the right version of Talend Component Kit.

Compatibility matrix

The version of Talend Component Kit you need to use to develop new components depends on the versions of the Talend applications in which these components will be integrated.

Talend product                     Talend Component Kit version
Talend Studio 8.8.8 (aka master)   latest release
Talend Studio 8.0.1                latest release QA approved
Talend Studio 7.3.1                Framework until 1.38.x
Talend Studio 7.2.1                Framework until 1.1.10
Talend Studio 7.1.1                Framework until 1.1.1
Talend Studio 7.0.1                Framework until 0.0.5
Talend Cloud                       latest release QA and cloud teams approved

More recent versions of Talend Component Kit contain many fixes, improvements and features that help you develop your components. However, they can cause compatibility issues when you deploy these components to older or different versions of Talend Studio and Talend Cloud. Choose the version of Talend Component Kit that best fits your needs.

Changing the Talend Component Kit version of your project

Creating a project using the Component Kit Starter always uses the latest release of Talend Component Kit.

However, you can manually change the version of Talend Component Kit directly in the generated project.

  1. Go to your IDE and open the pom.xml file at the root of the project.

  2. Look for the org.talend.sdk.component dependency nodes.

  3. Replace the version in the relevant nodes with the version that you need to use for your project.

To use a snapshot of the version under development, use the -SNAPSHOT version and the Sonatype snapshot repository.

Iterating on component development with Talend Studio

Integrate components you developed using Talend Component Kit into Talend Studio in a few steps. Also learn how to enable the developer and debugging modes to iterate on your component development.

Version compatibility

The version of Talend Component Kit you need to use to develop new components depends on the version of Talend Studio in which components will be integrated.

Refer to this document to learn about compatibility between Talend Component Kit and the different versions of Talend applications.

Installing the components

Learn how to build and deploy components to Talend Studio using the Talend Component Kit Maven or Gradle plugins.

This can be done using the deploy-in-studio goal from your development environment.

If you are unfamiliar with component development, you can also follow this example to go through the entire process, from creating a project to using your new component in Talend Studio.

Configuring the component server

The Studio integration relies on the Component Server, which Studio uses to gather data about components created using Talend Component Kit.

You can change the default configuration of the component server by modifying the $STUDIO_HOME/configuration/config.ini file.

The following parameters are available:

component.environment
  Description: Enables the developer mode when set to dev.
  Default: -

component.debounce.timeout
  Description: Specifies the timeout (in milliseconds) before calling listeners in component text fields.
  Default: 750

component.kit.skip
  Description: If set to true, the plugin is not enabled. This is useful if you don’t have any component developed with the framework.
  Default: false

component.java.arguments
  Description: Additional options for the component server.
  Default: -

component.java.m2
  Description: Maven repository that the server uses to resolve components.
  Default: the global Studio configuration

component.java.coordinates
  Description: A comma-separated list of GAVs (groupId:artifactId:version) of components to register.
  Default: -

component.java.registry
  Description: A properties file with values matching the GAV (groupId:artifactId:version) of the components registered at startup. Only use slashes (even on Windows) in the path.
  Default: -

component.java.port
  Description: Sets the port to use for the server.
  Default: random

component.server.extensions
  Description: A comma-separated list of GAVs used to locate the extensions.
  Default: -

components.server.beam.active
  Description: If set to true, activates Beam support (experimental). It requires Beam SDK Java core dependencies to be available.
  Default: false

component.server.jul.forceConsole
  Description: Adds a console handler to JUL to see logs in the console. This can be helpful in development because the formatting is clearer than the OSGi one in workspace/.metadata/.log. It uses the java.util.logging.SimpleFormatter.format property to define its format. By default, it is %1$tb %1$td, %1$tY %1$tl:%1$tM:%1$tS %1$Tp %2$s%n%4$s: %5$s%6$s%n, but for development purposes [%4$s] %5$s%6$s%n is simpler and more readable.
  Default: false

Here is an example of a common developer configuration/config.ini file:

# use local .m2 instead of embedded studio one
maven.repository = global

# during development, see the developer mode section
component.environment = dev

# log into the console the component interactions - optional
component.server.jul.forceConsole = true
java.util.logging.SimpleFormatter.format = [%4$s] %5$s%6$s%n
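
To get a feel for the difference between the two log formats mentioned above, the following sketch applies both format strings with String.format, passing the arguments in the order documented for java.util.logging.SimpleFormatter (date, source, logger, level, message, thrown). The class name, source name, logger name and message are hypothetical and only serve the illustration.

```java
import java.util.Date;

class LogFormatSketch {

    // Default format quoted in the description of component.server.jul.forceConsole.
    static final String DEFAULT_FORMAT =
            "%1$tb %1$td, %1$tY %1$tl:%1$tM:%1$tS %1$Tp %2$s%n%4$s: %5$s%6$s%n";

    // Simplified format suggested for development.
    static final String DEV_FORMAT = "[%4$s] %5$s%6$s%n";

    // Argument order used by SimpleFormatter:
    // 1 = date, 2 = source, 3 = logger, 4 = level, 5 = message, 6 = thrown.
    static String render(String format, String level, String message) {
        return String.format(format, new Date(), "org.example.MySource", "org.example",
                level, message, "");
    }

    public static void main(String[] args) {
        // Two lines per record: a timestamp and source line, then level and message.
        System.out.print(render(DEFAULT_FORMAT, "INFO", "Component reloaded"));
        // One line per record: level in brackets, then the message.
        System.out.print(render(DEV_FORMAT, "INFO", "Component reloaded"));
    }
}
```

With the development format, each record fits on a single line that starts with the level, which makes component interactions easier to scan in the console.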

Enabling the developer mode

The developer mode is especially useful to iterate on your component development and to avoid closing and restarting Talend Studio every time you make a change to a component. It adds a Talend Component Kit button in the main toolbar:

Studio Reload Button

When clicking this button, all components developed with the Talend Component Kit framework are reloaded. The cache is invalidated and the components refreshed.

You still need to add and remove the components to see the changes.

To enable it, simply set the component.environment parameter to dev in the config.ini configuration file of the component server.

Debugging your custom component in Talend Studio

Several methods allow you to debug custom components created with Talend Component Kit in Talend Studio.

Debugging the runtime or the Guess schema option of a component

  1. From your development tool, create a new Remote configuration, and copy the Command line arguments for running remote JVM field. For example, -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005, where:

    • the suspend parameter of the -agentlib argument specifies whether you want to suspend the debugged JVM until the debugger attaches to it. Possible values are n (no, default value) or y (yes).

    • the address parameter of the -agentlib argument is the port used for the remote configuration. Make sure this port is available.

      IntelliJ remote configuration

  2. Open Talend Studio.

  3. Create a new Job that uses the component you want to debug or open an existing one that already uses it.

  4. Go to the Run tab of the Job and select Use specific JVM arguments.

  5. Click New to add an argument.

  6. In the popup window, paste the arguments copied from the IDE.

    IntelliJ remote configuration

  7. Enter the corresponding debug mode:

    • To debug the runtime, run the Job and access the remote host configured in the IDE.

    • To debug the Guess schema option, click the Guess schema action button of the component and access the remote host configured in the IDE.

Debugging UI actions and validations

  1. From your development tool, create a new Remote configuration, and copy the Command line arguments for running remote JVM field. For example, -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005, where:

    • suspend specifies whether the debugged JVM waits until the debugger attaches to it before starting. Possible values are n (no, default value) or y (yes).

    • address is the address and port that the JVM listens on for the debugger (port 5005 in this example). Make sure this port is available.

      IntelliJ remote configuration

  2. Access the installation directory of your Talend Studio.

  3. Open the .ini file corresponding to your Operating System. For example, TOS_DI-win-x86_64.ini.

  4. Paste the arguments copied from the IDE in a new line of the file.

    IntelliJ remote configuration

  5. Go to Talend Studio to use the component, and access the host configured in the IDE.

Random port when running concurrent studio instances

If you run multiple Studio instances automatically in parallel, for example on a CI platform, you can run into issues with the random port computation. To avoid them, you can create the $HOME/.talend/locks/org.talend.sdk.component.studio-integration.lock file.

Then, when a server starts, it acquires a lock on that file and prevents another server from getting a port until it has started. This ensures that two concurrent processes cannot be allocated the same port.

However, this issue is highly unlikely to happen on a desktop. For local installations, forcing a different value through component.java.port in your config.ini file is a better solution.
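
The lock-then-allocate idea described above can be illustrated with plain JDK classes. The following is a minimal sketch, not the Studio implementation: the class name, the method name and the lock file location (a file in the temporary directory) are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class PortLockSketch {

    // Picks a free port while holding an exclusive lock on lockFile, so that
    // two processes sharing the same lock file cannot pick a port at the same time.
    static int pickFreePort(Path lockFile) {
        try (FileChannel channel = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock();              // blocks while another process holds the lock
             ServerSocket socket = new ServerSocket(0)) { // 0 lets the OS choose a free port
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical lock file location used only for this sketch.
        Path lockFile = Paths.get(System.getProperty("java.io.tmpdir"), "port-lock-sketch.lock");
        System.out.println("Allocated port: " + pickFreePort(lockFile));
    }
}
```

In this sketch the ServerSocket stands in for the server: the lock is released only after the socket has been bound and closed. A real server would keep the lock until it is actually listening on the allocated port.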

Installing components using a CAR file

Components built using Talend Component Kit can be shared as component archives (.car). These CAR files are executable files that let you easily deploy the components they contain to any compatible version of Talend Studio.

Component developers can generate .car files from their projects to share their components and make them available for other users, as detailed in this document.

This document assumes that you have a component archive (.car) file and need to deploy it to Talend Studio.

Deploying from the CAR file to Talend Studio

The component archive (.car) is executable and exposes the studio-deploy command which takes a Talend Studio home path as parameter. When executed, it installs the dependencies into the Studio and registers the component in your instance. For example:

# for a studio
java -jar mycomponent.car studio-deploy /path/to/my/studio
# or
java -jar mycomponent.car studio-deploy --location /path/to/my/studio

# for a m2 provisioning
java -jar mycomponent.car maven-deploy /path/to/.m2/repository
# or
java -jar mycomponent.car maven-deploy --location /path/to/.m2/repository

You can also upload the dependencies to your Nexus server using the following command:

java -jar mycomponent.car deploy-to-nexus --url <nexus url> --repo <repository name> --user <username> --pass <password> --threads <parallel threads number> --dir <temp directory>

In this command, Nexus URL and repository name are mandatory arguments. All other arguments are optional. If arguments contain spaces or special symbols, you need to quote the whole value of the argument. For example:

--pass "Y0u will \ not G4iess i' ^"

Deploying a component archive to a remote project from Talend Studio

Talend Studio allows you to share components you have created using Talend Component Kit with other users working on the same remote project.

Remote projects are available with Enterprise versions of Talend Studio only. Also note that this feature has been removed from Studio as of the 7.3 release.

Make sure you are connected to a remote project and the artifact repository for component sharing has been properly configured.

  1. On the toolbar of the Studio main window, click the project settings icon, or click File > Edit Project Properties from the menu bar, to open the Project Settings dialog box.

  2. In the tree view of the dialog box, select Repository Share to open the corresponding view.

  3. Select the Propagate components update to Artifact Repository check box.

  4. In the Repository ID field, specify the artifact repository configured for component sharing, and then click Check connection to verify the connectivity.

  5. Click Apply and Close to validate the settings and close the dialog box.

  6. Create a folder named patches at the root of your Talend Studio installation directory, then copy the .car files of the components you want to share to this folder.

  7. Restart your Talend Studio and connect to the remote project.

The components are deployed automatically to the repository and available in the Palette for other users when connected to a remote project with the same sharing repository configuration.

Troubleshooting

My custom component builds correctly but does not appear in Talend Studio. How can I fix it?

This issue can be caused by the icon specified in the component metadata.

  • Make sure to specify a custom icon for the component and the component family.

  • These custom icons must be in PNG format to be properly handled by Talend Studio.

  • Remove SVG parameters from the talend.component.server.icon.paths property in the HTTP server configuration. Refer to this section.

Learn more about defining custom icons for components in this document.

From Javajet to Talend Component Kit

Starting with version 7.0 of Talend Studio, Talend Component Kit is the recommended framework for developing components.

This framework was introduced to ensure that newly developed components can be deployed and executed both in on-premise/local and cloud/big data environments.

This new approach requires a complete, unified and compatible way of developing components.

With the Component Kit, custom components are entirely implemented in Java. To help you get started with a new custom component development project, a Starter is available. Using it, you will be able to generate the skeleton of your project. By importing this skeleton in a development tool, you can then implement the components layout and execution logic in Java.

Defining the component configuration

With the previous Javajet framework, metadata, widgets and configurable parts of a custom component were specified in XML. With the Component Kit, they are now defined in the <component_name><component_type>Configuration (for example, LoggerProcessorConfiguration) Java class of your development project.

Note that most of this configuration is transparent if you specified the Configuration Model of your components right before generating the project from the Starter.

Any undocumented feature or option is considered not supported by the Component Kit framework.

You can find examples of output in Studio or Cloud environments in the Gallery.

Widgets

Input/Text

Javajet

<PARAMETER
    NAME="CONFIG"
    FIELD="TEXT"
    NUM_ROW="10">
        <DEFAULT>""</DEFAULT>
</PARAMETER>

Component Kit

@Option
String config;

Password

Javajet

<PARAMETER
    NAME="PASSWORD"
    FIELD="PASSWORD"
    NUM_ROW="10"
    REQUIRED="true"/>

Component Kit

@Option
@Credential
String password;

Textarea

Javajet

<PARAMETER NAME="QUERY"
    FIELD="MEMO"
    NUM_ROW="1">
        <DEFAULT>""</DEFAULT>
</PARAMETER>

Component Kit

@Option
@Textarea
String query;

Integer

Javajet

<!-- There was no specific widget for number fields -->
<PARAMETER
    NAME="CONFIG"
    FIELD="TEXT"
    NUM_ROW="10">
        <DEFAULT>""</DEFAULT>
</PARAMETER>

Component Kit

@Option
@Documentation("This is a number")
public Integer number;

Checkbox

Javajet

<PARAMETER
    NAME="PRETTY_FORMAT"
    FIELD="CHECK"
    NUM_ROW="10">
        <DEFAULT>false</DEFAULT>
</PARAMETER>

Component Kit

@Option
Boolean pretty_format;

List

Javajet

<PARAMETER
    NAME="ACTION"
    FIELD="CLOSED_LIST"
    NUM_ROW="10">
    <ITEMS DEFAULT="1">
        <ITEM NAME="DELETE" VALUE="1" />
        <ITEM NAME="INSERT" VALUE="2" />
        <ITEM NAME="UPDATE" VALUE="3" />
    </ITEMS>
</PARAMETER>

Component Kit

@Option
@Proposable("valuesProvider")
String action;

/** service class */
@DynamicValues("valuesProvider")
public Values actions(){
  return new Values(asList(new Values.Item("1", "Delete"),
                    new Values.Item("2", "Insert"),
                    new Values.Item("3", "Update")));
}

or

Component Kit

@Option
ActionEnum action;

/** Define enum */
enum ActionEnum {
    Delete,
    Insert,
    Update
}

Suggestions

Javajet

<!-- There was no simple way to load proposals from a service in Javajet -->

Component Kit

@Option
@Suggestable(value = "loadModules", parameters = { "myconfig" })
@Documentation("module names are loaded using service")
public String moduleName;


// In Service class
@Suggestions("loadModules")
public SuggestionValues loadModules(@Option final MyConfig myconfig) { }

Table

Javajet

<!-- There was no simple way to select complex objects in Javajet -->

Component Kit

@Option
List<MyObject> config;

Module List

Javajet

<PARAMETER NAME="DRIVER_JAR" FIELD="TABLE" NUM_ROW="3" NB_LINES="2" REQUIRED="true">
    <ITEMS>
        <ITEM NAME="JAR_NAME" FIELD="MODULE_LIST"  />
    </ITEMS>
</PARAMETER>

Component Kit

public class Driver implements Serializable {

    @ModuleList
    @Option
    private String path;

}

//define it in config class like this:
@Option
List<Driver> config;

Code

Javajet

<PARAMETERS>
    <PARAMETER NAME="CODE" FIELD="MEMO_JAVA" RAW="true" REQUIRED="false" NUM_ROW="10" NB_LINES="10">
        <DEFAULT>String foo = "bar";</DEFAULT>
    </PARAMETER>
</PARAMETERS>

Component Kit

@Code("java")
@Option
String code;

Schema

Javajet

<PARAMETER
    NAME="COLUMNS"
    FIELD="COLUMN_LIST"
    NUM_ROW="10"/>

Component Kit

@Option
@Structure
List<String> columns;

Validations

Property validation

Javajet

<!-- There was no URL pattern validation in Javajet -->

Component Kit

/** configuration class */
@Option
@Validable("url")
String config;

/** service class */
@AsyncValidation("url")
ValidationResult doValidate(String url) {
//validate the property
}

Property validation with Pattern

Javajet

<!-- There was no regex validation in Javajet -->

Component Kit

/** configuration class */
@Option
@Pattern("/^[a-zA-Z\\-]+$/")
String username;
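
To check which values the expression above accepts, the following sketch evaluates the same expression with java.util.regex, without the surrounding slashes. It only illustrates the pattern itself, under the assumption that the expression between the slashes behaves the same way in Java; how the framework evaluates @Pattern is not shown here. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class UsernamePatternSketch {

    // Same expression as in the @Pattern annotation, without the / delimiters.
    private static final Pattern USERNAME = Pattern.compile("^[a-zA-Z\\-]+$");

    // Returns true when the value contains only ASCII letters and hyphens.
    static boolean isValid(String username) {
        return USERNAME.matcher(username).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("jane-doe"));  // letters and a hyphen
        System.out.println(isValid("jane_doe1")); // contains an underscore and a digit
    }
}
```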

Data store validation

Javajet

<!-- There was no health check in Javajet -->

Component Kit

@Datastore
@Checkable
public class config {
/** config ...*/
}

/** service class */
@HealthCheck
public HealthCheckStatus testConnection(){

//validate the connection
}

Binding properties

ActiveIf

Javajet

<PARAMETER
    NAME="AUTH_TYPE"
    FIELD="CLOSED_LIST"
    NUM_ROW="10">
    <ITEMS DEFAULT="NOAUTH">
        <ITEM NAME="NOAUTH" VALUE="NOAUTH" />
        <ITEM NAME="BASIC" VALUE="BASIC" />
        <ITEM NAME="OAUTH2" VALUE="OAUTH2" />
    </ITEMS>
</PARAMETER>

<PARAMETER
    NAME="LOGIN"
    FIELD="TEXT"
    NUM_ROW="20"
    SHOW_IF="AUTH_TYPE == 'BASIC'">
        <DEFAULT>"login"</DEFAULT>
</PARAMETER>

<PARAMETER
    NAME="PASSWORD"
    FIELD="PASSWORD"
    NUM_ROW="20"
    SHOW_IF="AUTH_TYPE == 'BASIC'">
        <DEFAULT>"login"</DEFAULT>
</PARAMETER>

Component Kit

enum AuthorizationType {
    NoAuth,
    Basic,
    oauth2
}

@Option
@Required
@Documentation("")
private AuthorizationType type = AuthorizationType.NoAuth;

@Option
@Required
@ActiveIf(target = "type", value = "Basic")
@Documentation("Username for the basic authentication")
private String login;

@Option
@Required
@Credential
@ActiveIf(target = "type", value = "Basic")
@Documentation("password for the basic authentication")
private String password;

Return Variables

Return variables with an availability setting (version 1.51 and later) deprecate After Variables.

Javajet

<RETURNS>
    <RETURN NAME="QUERY" TYPE="id_String" AVAILABILITY="FLOW"/>
    <RETURN NAME="NAME_1_OF_AFTER_VARIABLE" TYPE="id_Integer" AVAILABILITY="AFTER"/>
</RETURNS>

AVAILABILITY can be:

  • AFTER: set after the component has finished.

  • FLOW: updated at row level.

Component Kit

@Slf4j
@Version(1)
@ReturnVariable(value = "QUERY", availability = FLOW, type = String.class, description = "Current row query")
@Processor(name = "Row")
@Documentation("JDBC Row component.")
public class JDBCRowProcessor implements Serializable {

    @RuntimeContext
    private transient RuntimeContextHolder context;

    @ElementListener
    public void elementListener(@Input final Record record,
                                @Output final OutputEmitter<Record> success) throws SQLException {
        ...

        if (context != null) {
            context.set("QUERY", configuration.getDataSet().getSqlQuery());
        }
    }
}

Return variables can be nested as below:

import org.talend.sdk.component.api.component.ReturnVariables;
import org.talend.sdk.component.api.component.ReturnVariables.ReturnVariable;
import org.talend.sdk.component.api.processor.Processor;


    @ReturnVariables({
            @ReturnVariable(value = "PROCESS_COUNT", type = Integer.class,
                            availability = ReturnVariable.AVAILABILITY.AFTER),
            @ReturnVariable(value = "MISC", type = String.class,
                            availability = ReturnVariable.AVAILABILITY.FLOW) })
    @Processor
    public class ClassWithReturnVariablesGroup {
        ...

        @PreDestroy
        public void close() {
            context.set("PROCESS_COUNT", counted);
        }

    }

After Variables

Javajet

   <RETURN NAME="NAME_1_OF_AFTER_VARIABLE" TYPE="id_Integer" AVAILABILITY="AFTER"/>
   <RETURN NAME="NAME_2_OF_AFTER_VARIABLE" TYPE="id_String" AVAILABILITY="AFTER"/>

Component Kit

import org.talend.sdk.component.api.component.AfterVariables.AfterVariableContainer;
import org.talend.sdk.component.api.component.AfterVariables.AfterVariable;

/**
* Possible types:
* Boolean.class, Byte.class, byte[].class, Character.class, Date.class, Double.class, Float.class,
* BigDecimal.class, Integer.class, Long.class, Object.class, Short.class, String.class, List.class
*/
@AfterVariable(value = "NAME_1_OF_AFTER_VARIABLE", description = "Some description", type = Integer.class)
@AfterVariable(value = "NAME_2_OF_AFTER_VARIABLE", description = "Custom variable description", type = String.class)
class Emitter {

    @AfterVariableContainer
    public Map<String, Object> afterVariables() {
        // .. code
    }

}

or

import org.talend.sdk.component.api.component.AfterVariables.AfterVariableContainer;
import org.talend.sdk.component.api.component.AfterVariables.AfterVariable;
import org.talend.sdk.component.api.component.AfterVariables;

@AfterVariables({
    @AfterVariable(value = "NAME_1_OF_AFTER_VARIABLE", description = "Some description", type = Integer.class),
    @AfterVariable(value = "NAME_2_OF_AFTER_VARIABLE", description = "Custom variable description", type = String.class)
})
class Emitter {
    @AfterVariableContainer
    public Map<String, Object> afterVariables() {
        // .. code
    }
}
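
The body of the afterVariables() method is elided in the examples above. As a rough sketch of what it could look like, the following plain Java class builds the map from values collected during processing. It is written without the framework on the classpath, so the annotations only appear in comments. The counter, the status text and the assumption that the map keys must match the declared variable names are illustrative and not taken from the framework documentation.

```java
import java.util.HashMap;
import java.util.Map;

class AfterVariablesSketch {

    // Hypothetical counter updated while the component runs.
    private int processedRecords;

    void onRecord() {
        processedRecords++;
    }

    // In a real component, this method would carry @AfterVariableContainer.
    public Map<String, Object> afterVariables() {
        Map<String, Object> variables = new HashMap<>();
        // Assumption: keys match the names declared with @AfterVariable,
        // and values match the declared types.
        variables.put("NAME_1_OF_AFTER_VARIABLE", processedRecords);  // declared as Integer.class
        variables.put("NAME_2_OF_AFTER_VARIABLE", "processing done"); // declared as String.class
        return variables;
    }

    public static void main(String[] args) {
        AfterVariablesSketch emitter = new AfterVariablesSketch();
        emitter.onRecord();
        emitter.onRecord();
        System.out.println(emitter.afterVariables());
    }
}
```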

Defining the runtime

Previously, the execution of a custom component was described through several Javajet files:

  • <component_name>_begin.javajet, containing the code required to initialize the component.

  • <component_name>_main.javajet, containing the code required to process each line of the incoming data.

  • <component_name>_end.javajet, containing the code required to end the processing and go to the following step of the execution.

With the Component Kit, the entire execution flow of a component is described through its main Java class <component_name><component_type> (for example, LoggerProcessor) and through services for reusable parts.

Component execution logic

Each type of component has its own execution logic. The same basic logic is applied to all components of the same type, and is then extended to implement the specifics of each component. The project generated from the starter already contains the basic logic for each component.

Talend Component Kit framework relies on several primitive components.

All components can use @PostConstruct and @PreDestroy annotations to initialize or release some underlying resource at the beginning and the end of a processing.

In distributed environments, class constructors are called on cluster manager nodes. Methods annotated with @PostConstruct and @PreDestroy are called on worker nodes. Thus, partition plan computation and pipeline tasks are performed on different nodes.

Deployment diagram

1 The created task is a JAR file containing class information, which describes the pipeline (flow) that should be processed in the cluster.
2 During the partition plan computation step, the pipeline is analyzed and split into stages. The cluster manager node instantiates mappers/processors, gets the estimated data size using mappers, and splits the created mappers according to the estimated data size. All instances are then serialized and sent to the worker node.
3 Serialized instances are received and deserialized. Methods annotated with @PostConstruct are called. After that, pipeline execution starts. The @BeforeGroup annotated method of the processor is called before processing the first element in a chunk. After processing the number of records estimated as the chunk size, the @AfterGroup annotated method of the processor is called. The chunk size is calculated depending on the environment in which the pipeline is processed. Once the pipeline is processed, methods annotated with @PreDestroy are called.

All the methods managed by the framework must be public. Private methods are ignored.
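
The call order described for the worker node can be simulated with plain Java. The following sketch does not use the framework: the annotations are only mentioned in comments, and the class, the driver method and the fixed chunk size are hypothetical. It only shows the order in which the lifecycle methods would be called for a small input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LifecycleSketch {

    private final List<String> calls = new ArrayList<>();

    // All lifecycle methods are public, as required for framework-managed methods.
    public void init() { calls.add("PostConstruct"); }                       // @PostConstruct
    public void beforeGroup() { calls.add("BeforeGroup"); }                  // @BeforeGroup
    public void onElement(String record) { calls.add("Element:" + record); } // @ElementListener
    public void afterGroup() { calls.add("AfterGroup"); }                    // @AfterGroup
    public void release() { calls.add("PreDestroy"); }                       // @PreDestroy

    // Hypothetical driver mimicking the order in which a worker calls the methods.
    static List<String> run(List<String> records, int chunkSize) {
        LifecycleSketch processor = new LifecycleSketch();
        processor.init();
        for (int start = 0; start < records.size(); start += chunkSize) {
            processor.beforeGroup();
            int end = Math.min(start + chunkSize, records.size());
            for (String record : records.subList(start, end)) {
                processor.onElement(record);
            }
            processor.afterGroup();
        }
        processor.release();
        return processor.calls;
    }

    public static void main(String[] args) {
        // Prints [PostConstruct, BeforeGroup, Element:a, Element:b, AfterGroup,
        //         BeforeGroup, Element:c, AfterGroup, PreDestroy]
        System.out.println(run(Arrays.asList("a", "b", "c"), 2));
    }
}
```

In a real execution, the chunk size is not chosen by the component but computed by the environment that runs the pipeline.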

Driver processing workflow

Worker processing workflow

The framework is designed to be as declarative as possible but also to stay extensible by not using fixed interfaces or method signatures. This makes it possible to incrementally add new features of the underlying implementations.

Main changes

To ensure that the Cloud-compatible approach of the Component Kit framework is respected, some changes were introduced on the implementation side, including:

  • The File mode is no longer supported. You can still work with URIs and remote storage systems to use files. The file collection must be handled at the component implementation level.

  • The input and output connections between two components can only be of the Flow or Reject types. Other types of connections are not supported.

  • Every Output component must have a corresponding Input component and use a dataset. All datasets must use a datastore.
