Component Loading

Talend Component scanning is based on plugins. To make sure that plugins can be developed in parallel and avoid conflicts, they need to be isolated (component or group of components in a single jar/plugin).

Multiple options are available:

  • Graph classloading: this option allows you to link the plugins and dependencies together dynamically in any direction.
    For example, the graph classloading can be illustrated by OSGi containers.

  • Tree classloading: a shared classloader inherited by plugin classloaders. However, plugin classloader classes are not seen by the shared classloader, nor by other plugins.
    For example, the tree classloading is commonly used by Servlet containers where plugins are web applications.

  • Flat classpath: listed for completeness but rejected by design because it doesn’t comply with this requirement.

In order to avoid much complexity added by this layer, Talend Component Kit relies on a tree classloading. The advantage is that you don’t need to define the relationship with other plugins/dependencies, because it is built-in.

Here is a representation of this solution:

Classloader layout

The shared area contains Talend Component Kit API, which only contains by default the classes shared by the plugins.

Then, each plugin is loaded with its own classloader and dependencies.

Packaging a plugin

This section explains the overall way to handle dependencies but the Talend Maven plugin provides a shortcut for that.

A plugin is a JAR file that was enriched with the list of its dependencies. By default, Talend Component Kit runtime is able to read the output of maven-dependency-plugin in TALEND-INF/dependencies.txt. You just need to make sure that your component defines the following plugin:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <version>3.0.2</version>
  <executions>
    <execution>
      <id>create-TALEND-INF/dependencies.txt</id>
      <phase>process-resources</phase>
      <goals>
        <goal>list</goal>
      </goals>
      <configuration>
        <outputFile>${project.build.outputDirectory}/TALEND-INF/dependencies.txt</outputFile>
      </configuration>
    </execution>
  </executions>
</plugin>

Once build, check the JAR file and look for the following lines:

$ unzip -p target/mycomponent-1.0.0-SNAPSHOT.jar TALEND-INF/dependencies.txt

The following files have been resolved:
   org.talend.sdk.component:component-api:jar:1.0.0-SNAPSHOT:provided
   org.apache.geronimo.specs:geronimo-annotation_1.3_spec:jar:1.0:provided
   org.superbiz:awesome-project:jar:1.2.3:compile
   junit:junit:jar:4.12:test
   org.hamcrest:hamcrest-core:jar:1.3:test

What is important to see is the scope related to the artifacts:

  • The APIs (component-api and geronimo-annotation_1.3_spec) are provided because you can consider them to be there when executing (they come with the framework).

  • Your specific dependencies (awesome-project in the example above) are marked as compile: they are included as needed dependencies by the framework (note that using runtime works too).

  • the other dependencies are ignored. For example, test dependencies.

Packaging an application

Even if a flat classpath deployment is possible, it is not recommended because it would then reduce the capabilities of the components.

Dependencies

The way the framework resolves dependencies is based on a local Maven repository layout. As a quick reminder, it looks like:

.
├── groupId1
│   └── artifactId1
│       ├── version1
│       │   └── artifactId1-version1.jar
│       └── version2
│           └── artifactId1-version2.jar
└── groupId2
    └── artifactId2
        └── version1
            └── artifactId2-version1.jar

This is all the layout the framework uses. The logic converts t-uple {groupId, artifactId, version, type (jar)} to the path in the repository.

Talend Component Kit runtime has two ways to find an artifact:

  • From the file system based on a configured Maven 2 repository.

  • From a fat JAR (uber JAR) with a nested Maven repository under MAVEN-INF/repository.

The first option uses either ${user.home}/.m2/repository (default) or a specific path configured when creating a ComponentManager. The nested repository option needs some configuration during the packaging to ensure the repository is correctly created.

Creating a nested Maven repository with maven-shade-plugin

To create the nested MAVEN-INF/repository repository, you can use the nested-maven-repository extension:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.talend.sdk.component.container.maven.shade.ContainerDependenciesTransformer">
            <session>${session}</session>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
  <dependencies>
    <dependency>
      <groupId>org.talend.sdk.component</groupId>
      <artifactId>nested-maven-repository</artifactId>
      <version>${the.plugin.version}</version>
    </dependency>
  </dependencies>
</plugin>

Listing needed plugins

Plugins are usually programmatically registered. If you want to make some of them automatically available, you need to generate a TALEND-INF/plugins.properties file that maps a plugin name to coordinates found with the Maven mechanism described above.

You can enrich maven-shade-plugin to do it:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.talend.sdk.component.container.maven.shade.PluginTransformer">
            <session>${session}</session>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
  <dependencies>
    <dependency>
      <groupId>org.talend.sdk.component</groupId>
      <artifactId>nested-maven-repository</artifactId>
      <version>${the.plugin.version}</version>
    </dependency>
  </dependencies>
</plugin>

maven-shade-plugin extensions

Here is a final job/application bundle based on maven-shade-plugin:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.1</version>
  <configuration>
    <createDependencyReducedPom>false</createDependencyReducedPom>
    <filters>
      <filter>
        <artifact>*:*</artifact>
        <excludes>
          <exclude>META-INF/.SF</exclude>
          <exclude>META-INF/.DSA</exclude>
          <exclude>META-INF/*.RSA</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <shadedClassifierName>shaded</shadedClassifierName>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          <transformer
              implementation="org.talend.sdk.component.container.maven.shade.ContainerDependenciesTransformer">
            <session>${session}</session>
            <userArtifacts>
              <artifact>
                <groupId>org.talend.sdk.component</groupId>
                <artifactId>sample-component</artifactId>
                <version>1.0</version>
                <type>jar</type>
              </artifact>
            </userArtifacts>
          </transformer>
          <transformer implementation="org.talend.sdk.component.container.maven.shade.PluginTransformer">
            <session>${session}</session>
            <userArtifacts>
              <artifact>
                <groupId>org.talend.sdk.component</groupId>
                <artifactId>sample-component</artifactId>
                <version>1.0</version>
                <type>jar</type>
              </artifact>
            </userArtifacts>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
  <dependencies>
    <dependency>
      <groupId>org.talend.sdk.component</groupId>
      <artifactId>nested-maven-repository</artifactId>
      <version>${the.version}</version>
    </dependency>
  </dependencies>
</plugin>
The configuration unrelated to transformers depends on your application.

ContainerDependenciesTransformer embeds a Maven repository and PluginTransformer to create a file that lists (one per line) artifacts (representing plugins).

Both transformers share most of their configuration:

  • session: must be set to ${session}. This is used to retrieve dependencies.

  • scope: a comma-separated list of scopes to include in the artifact filtering (note that the default will rely on provided but you can replace it by compile, runtime, runtime+compile, runtime+system or test).

  • include: a comma-separated list of artifacts to include in the artifact filtering.

  • exclude: a comma-separated list of artifacts to exclude in the artifact filtering.

  • userArtifacts: set of artifacts to include (groupId, artifactId, version, type - optional, file - optional for plugin transformer, scope - optional) which can be forced inline. This parameter is mainly useful for PluginTransformer.

  • includeTransitiveDependencies: should transitive dependencies of the components be included. Set to true by default. It is active for userArtifacts.

  • includeProjectComponentDependencies: should component project dependencies be included. Set to false by default. It is not needed when a job project uses isolation for components.

With the component tooling, it is recommended to keep default locations.
Also if you need to use project dependencies, you can need to refactor your project structure to ensure component isolation.
Talend Component Kit lets you handle that part but the recommended practice is to use userArtifacts for the components instead of project <dependencies>.

ContainerDependenciesTransformer

ContainerDependenciesTransformer specific configuration is as follows:

  • repositoryBase: base repository location (MAVEN-INF/repository by default).

  • ignoredPaths: a comma-separated list of folders not to create in the output JAR. This is common for folders already created by other transformers/build parts.

PluginTransformer

ContainerDependenciesTransformer specific configuration is the following one:

  • pluginListResource: base repository location (default to TALEND-INF/plugins.properties).

For example, if you want to list only the plugins you use, you can configure this transformer as follows:

<transformer implementation="org.talend.sdk.component.container.maven.shade.PluginTransformer">
  <session>${session}</session>
  <include>org.talend.sdk.component:component-x,org.talend.sdk.component:component-y,org.talend.sdk.component:component-z</include>
</transformer>

Component scanning rules and default exclusions

The framework uses two kind of filterings when scanning your component. One based on the JAR content - the presence of TALEND-INF/dependencies.txt and one based on the package name. Make sure that your component definitions (including services) are in a scanned module if they are not registered manually using ComponentManager.instance().addPlugin(), and that the component package is not excluded.

Package Scanning

Since the framework can be used in the case of fatjars or shades, and because it still uses scanning, it is important to ensure we don’t scan the whole classes for performances reason.

Therefore, the following packages are ignored:

  • avro.shaded

  • com.codehale.metrics

  • com.ctc.wstx

  • com.datastax.driver

  • com.fasterxml.jackson

  • com.google.common

  • com.google.thirdparty

  • com.ibm.wsdl

  • com.jcraft.jsch

  • com.kenai

  • com.sun.istack

  • com.sun.xml

  • com.talend.shaded

  • com.thoughtworks

  • io.jsonwebtoken

  • io.netty

  • io.swagger

  • javax

  • jnr

  • junit

  • net.sf.ehcache

  • net.shibboleth

  • org.aeonbits.owner

  • org.apache

  • org.bouncycastle

  • org.codehaus

  • org.cryptacular

  • org.eclipse

  • org.fusesource

  • org.h2

  • org.hamcrest

  • org.hsqldb

  • org.jasypt

  • org.jboss

  • org.joda

  • org.jose4j

  • org.junit

  • org.jvnet

  • org.metatype

  • org.objectweb

  • org.openejb

  • org.opensaml

  • org.slf4j

  • org.swizzle

  • org.terracotta

  • org.tukaani

  • org.yaml

  • serp

it is not recommanded but possible to add in your plugin module a TALEND-INF/scanning.properties file with classloader.includes and classloader.excludes entries to refine the scanning with custom rules. In such a case, exclusions win over inclusions.
Scroll to top