Talend Component scanning is based on plugins. To make sure that plugins can be developed in parallel and avoid conflicts, they need to be isolated (component or group of components in a single jar/plugin).
Multiple options are available:
-
Graph classloading: this option allows you to link the plugins and dependencies together dynamically in any direction. For example, OSGi containers use graph classloading.
-
Tree classloading: a shared classloader is inherited by plugin classloaders. However, plugin classloader classes are not seen by the shared classloader, nor by other plugins. For example, tree classloading is commonly used by Servlet containers, where plugins are web applications.
-
Flat classpath: listed for completeness but rejected by design because it doesn’t comply with the isolation requirement.
To limit the complexity added by this layer, Talend Component Kit relies on tree classloading. The advantage is that you don’t need to define the relationships with other plugins/dependencies, because they are built in.
Here is a representation of this solution:
The shared area contains the Talend Component Kit API, which by default only contains the classes shared by the plugins.
Then, each plugin is loaded with its own classloader and dependencies.
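The tree model described above can be sketched with plain JDK classloaders. This is only an illustration, not the framework's container implementation: the class name TreeClassloadingSketch and the helper newPluginLoader are hypothetical, and empty URL arrays stand in for real plugin JARs.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class TreeClassloadingSketch {
    /** Creates a plugin classloader whose parent is the shared classloader (hypothetical helper). */
    static URLClassLoader newPluginLoader(URL[] pluginJars, ClassLoader shared) {
        return new URLClassLoader(pluginJars, shared);
    }

    public static void main(String[] args) throws Exception {
        // The shared area: here simply the loader that loaded this class.
        ClassLoader shared = TreeClassloadingSketch.class.getClassLoader();

        // Empty URL arrays stand in for each plugin's own JAR and dependencies.
        try (URLClassLoader pluginA = newPluginLoader(new URL[0], shared);
             URLClassLoader pluginB = newPluginLoader(new URL[0], shared)) {
            // Both plugins delegate to the same parent.
            System.out.println(pluginA.getParent() == pluginB.getParent());
            // Shared classes resolve to the same Class object from either plugin.
            System.out.println(pluginA.loadClass("java.lang.String") == pluginB.loadClass("java.lang.String"));
            // Neither plugin is the parent of the other, so they do not see each other's classes.
            System.out.println(pluginA.getParent() != pluginB);
        }
    }
}
```

Because delegation only goes from a plugin to the shared parent, two plugins can embed different versions of the same library without conflicting.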
Packaging a plugin
This section explains the overall way to handle dependencies, but the Talend Maven plugin provides a shortcut for that.
A plugin is a JAR file that has been enriched with the list of its dependencies. By default, the Talend Component Kit runtime is able to read the output of maven-dependency-plugin in TALEND-INF/dependencies.txt. You just need to make sure that your component defines the following plugin:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <version>3.0.2</version>
  <executions>
    <execution>
      <id>create-TALEND-INF/dependencies.txt</id>
      <phase>process-resources</phase>
      <goals>
        <goal>list</goal>
      </goals>
      <configuration>
        <outputFile>${project.build.outputDirectory}/TALEND-INF/dependencies.txt</outputFile>
      </configuration>
    </execution>
  </executions>
</plugin>
Once built, check the JAR file and look for the following lines:
$ unzip -p target/mycomponent-1.0.0-SNAPSHOT.jar TALEND-INF/dependencies.txt
The following files have been resolved:
org.talend.sdk.component:component-api:jar:1.0.0-SNAPSHOT:provided
org.apache.geronimo.specs:geronimo-annotation_1.3_spec:jar:1.0:provided
org.superbiz:awesome-project:jar:1.2.3:compile
junit:junit:jar:4.12:test
org.hamcrest:hamcrest-core:jar:1.3:test
What is important to see is the scope related to the artifacts:
-
The APIs (component-api and geronimo-annotation_1.3_spec) are provided because you can consider them to be there when executing (they come with the framework).
-
Your specific dependencies (awesome-project in the example above) are marked as compile: they are included as needed dependencies by the framework (note that using runtime works too).
-
The other dependencies are ignored. For example, test dependencies.
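The scope rules above can be sketched as a small filter over the lines of TALEND-INF/dependencies.txt. This is not the runtime's actual parser: the class DependenciesTxtSketch and its method are hypothetical, and the sketch assumes each dependency line ends with its scope, as in the sample output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DependenciesTxtSketch {
    /** Keeps the dependency lines whose scope (last ':'-separated field) is compile or runtime. */
    static List<String> packagedDependencies(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            String[] parts = line.split(":");
            if (parts.length < 5) {
                continue; // header line or blank line, not group:artifact:type:version:scope
            }
            String scope = parts[parts.length - 1];
            if (scope.equals("compile") || scope.equals("runtime")) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "The following files have been resolved:",
                "org.talend.sdk.component:component-api:jar:1.0.0-SNAPSHOT:provided",
                "org.superbiz:awesome-project:jar:1.2.3:compile",
                "junit:junit:jar:4.12:test");
        // Only the compile-scoped awesome-project line is kept.
        System.out.println(packagedDependencies(lines));
    }
}
```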
Packaging an application
Even if a flat classpath deployment is possible, it is not recommended because it would then reduce the capabilities of the components.
Dependencies
The way the framework resolves dependencies is based on a local Maven repository layout. As a quick reminder, it looks like:
.
├── groupId1
│   └── artifactId1
│       ├── version1
│       │   └── artifactId1-version1.jar
│       └── version2
│           └── artifactId1-version2.jar
└── groupId2
    └── artifactId2
        └── version1
            └── artifactId2-version1.jar
This is the only layout the framework uses. The logic converts the tuple {groupId, artifactId, version, type (jar)} to the path in the repository.
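As an illustration of that conversion, here is a minimal sketch following the standard Maven repository layout, where dots in the groupId become directory separators. The class RepositoryPathSketch and the method toPath are hypothetical names, not the framework's API, and classifiers and timestamped snapshots are deliberately ignored.

```java
final class RepositoryPathSketch {
    /** Builds the repository-relative path of an artifact from its coordinates. */
    static String toPath(String groupId, String artifactId, String version, String type) {
        return groupId.replace('.', '/') + '/' + artifactId + '/' + version + '/'
                + artifactId + '-' + version + '.' + type;
    }

    public static void main(String[] args) {
        // Coordinates taken from the dependencies.txt sample of this page.
        System.out.println(toPath("org.superbiz", "awesome-project", "1.2.3", "jar"));
        // prints org/superbiz/awesome-project/1.2.3/awesome-project-1.2.3.jar
    }
}
```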
Talend Component Kit runtime has two ways to find an artifact:
-
From the file system based on a configured Maven 2 repository.
-
From a fat JAR (uber JAR) with a nested Maven repository under MAVEN-INF/repository.
The first option uses either ${user.home}/.m2/repository (default) or a specific path configured when creating a ComponentManager.
The nested repository option needs some configuration during the packaging to ensure the repository is correctly created.
Creating a nested Maven repository with maven-shade-plugin
To create the nested MAVEN-INF/repository repository, you can use the nested-maven-repository extension:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.0.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.talend.sdk.component.container.maven.shade.ContainerDependenciesTransformer">
            <session>${session}</session>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
  <dependencies>
    <dependency>
      <groupId>org.talend.sdk.component</groupId>
      <artifactId>nested-maven-repository</artifactId>
      <version>${the.plugin.version}</version>
    </dependency>
  </dependencies>
</plugin>
Listing needed plugins
Plugins are usually programmatically registered. If you want to make some of them automatically available, you need to generate a TALEND-INF/plugins.properties file that maps a plugin name to coordinates found with the Maven mechanism described above.
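To make the name-to-coordinates mapping concrete, the sketch below reads such a file with java.util.Properties. The entry shown (key sample-component, value org.talend.sdk.component:sample-component:1.0) is a hypothetical example: the exact coordinate syntax written by the tooling is an assumption here, and PluginsPropertiesSketch is not a framework class.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PluginsPropertiesSketch {
    /** Parses the content of a plugins.properties file into plugin name -> coordinates. */
    static Properties parse(String content) {
        Properties plugins = new Properties();
        try {
            plugins.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return plugins;
    }

    public static void main(String[] args) {
        // Hypothetical entry; the real file is generated at packaging time.
        String content = "sample-component = org.talend.sdk.component:sample-component:1.0\n";
        Properties plugins = parse(content);
        for (String name : plugins.stringPropertyNames()) {
            System.out.println(name + " -> " + plugins.getProperty(name));
        }
    }
}
```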
You can enrich maven-shade-plugin to do it:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.0.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.talend.sdk.component.container.maven.shade.PluginTransformer">
            <session>${session}</session>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
  <dependencies>
    <dependency>
      <groupId>org.talend.sdk.component</groupId>
      <artifactId>nested-maven-repository</artifactId>
      <version>${the.plugin.version}</version>
    </dependency>
  </dependencies>
</plugin>
maven-shade-plugin extensions
Here is a final job/application bundle based on maven-shade-plugin:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.0.0</version>
  <configuration>
    <createDependencyReducedPom>false</createDependencyReducedPom>
    <filters>
      <filter>
        <artifact>*:*</artifact>
        <excludes>
          <!-- drop signature files, which are invalid once JARs are merged -->
          <exclude>META-INF/*.SF</exclude>
          <exclude>META-INF/*.DSA</exclude>
          <exclude>META-INF/*.RSA</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <shadedClassifierName>shaded</shadedClassifierName>
        <transformers>
          <transformer
              implementation="org.talend.sdk.component.container.maven.shade.ContainerDependenciesTransformer">
            <session>${session}</session>
            <userArtifacts>
              <artifact>
                <groupId>org.talend.sdk.component</groupId>
                <artifactId>sample-component</artifactId>
                <version>1.0</version>
                <type>jar</type>
              </artifact>
            </userArtifacts>
          </transformer>
          <transformer implementation="org.talend.sdk.component.container.maven.shade.PluginTransformer">
            <session>${session}</session>
            <userArtifacts>
              <artifact>
                <groupId>org.talend.sdk.component</groupId>
                <artifactId>sample-component</artifactId>
                <version>1.0</version>
                <type>jar</type>
              </artifact>
            </userArtifacts>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
  <dependencies>
    <dependency>
      <groupId>org.talend.sdk.component</groupId>
      <artifactId>nested-maven-repository-maven-plugin</artifactId>
      <version>${the.version}</version>
    </dependency>
  </dependencies>
</plugin>
The configuration unrelated to transformers depends on your application.
ContainerDependenciesTransformer embeds a Maven repository, and PluginTransformer creates a file that lists (one per line) the artifacts representing plugins.
Both transformers share most of their configuration:
-
session: must be set to ${session}. This is used to retrieve dependencies.
-
scope: a comma-separated list of scopes to include in the artifact filtering (note that the default will rely on provided but you can replace it by compile, runtime, runtime+compile, runtime+system or test).
-
include: a comma-separated list of artifacts to include in the artifact filtering.
-
exclude: a comma-separated list of artifacts to exclude in the artifact filtering.
-
userArtifacts: a list of artifacts (groupId, artifactId, version, type - optional, file - optional for plugin transformer, scope - optional) which can be forced inline. This parameter is mainly useful for PluginTransformer.
-
includeTransitiveDependencies: should transitive dependencies of the components be included. Set to true by default.
-
includeProjectComponentDependencies: should project component dependencies be included. Set to false by default. It is not needed when a job project uses isolation for components.
-
userArtifacts: set of component artifacts to include.
With the component tooling, it is recommended to keep the default locations. Also, if you need to use project dependencies, you may need to refactor your project structure to ensure component isolation. Talend Component Kit lets you handle that part, but the recommended practice is to use userArtifacts for the components instead of project <dependencies>.
ContainerDependenciesTransformer
ContainerDependenciesTransformer specific configuration is as follows:
-
repositoryBase: base repository location (MAVEN-INF/repository by default).
-
ignoredPaths: a comma-separated list of folders not to create in the output JAR. This is common for folders already created by other transformers/build parts.
PluginTransformer
PluginTransformer specific configuration is as follows:
-
pluginListResource: location of the generated plugin list file (TALEND-INF/plugins.properties by default).
For example, if you want to list only the plugins you use, you can configure this transformer as follows:
<transformer implementation="org.talend.sdk.component.container.maven.shade.PluginTransformer">
  <session>${session}</session>
  <include>org.talend.sdk.component:component-x,org.talend.sdk.component:component-y,org.talend.sdk.component:component-z</include>
</transformer>
Component scanning rules and default exclusions
The framework uses two kinds of filtering when scanning your component: one based on the JAR name and one based on the package name. Make sure that your component definitions (including services) are in a scanned module if they are not registered manually using ComponentManager.instance().addPlugin(), and that the component package is not excluded.
Jars Scanning
To find components, the framework can scan the classpath. In this case, to avoid scanning the whole classpath, which can be huge and significantly impact startup time, several JARs are excluded out of the box.
These JARs use the following prefixes:
-
ApacheJMeter
-
FastInfoset
-
HdrHistogram
-
HikariCP
-
PDFBox
-
RoaringBitmap-
-
XmlSchema-
-
accessors-smart
-
activation-
-
activeio-
-
activemq-
-
aeron
-
aether-
-
agrona
-
akka-
-
animal-sniffer-annotation
-
annotation
-
ant-
-
antlr-
-
antlr4-
-
aopalliance-
-
apache-el
-
apache-mime4j
-
apacheds-
-
api-asn1-
-
api-common-
-
api-util-
-
apiguardian-api-
-
app-
-
archaius-core
-
args4j-
-
arquillian-
-
asciidoctorj-
-
asm-
-
aspectj
-
async-http-client-
-
auto-value-
-
autoschema-
-
avalon-framework-
-
avro-
-
avro4s-
-
awaitility-
-
aws-
-
axis-
-
axis2-
-
base64-
-
batchee-jbatch
-
batik-
-
bcmail
-
bcpkix
-
bcprov-
-
beam-model-
-
beam-runners-
-
beam-sdks-
-
bigtable-client-
-
bigtable-protos-
-
boilerpipe-
-
bonecp
-
bootstrap.jar
-
brave-
-
bsf-
-
build-link
-
bval
-
byte-buddy
-
c3p0-
-
cache
-
carrier
-
cassandra-driver-core
-
catalina-
-
catalina.jar
-
cats
-
cdi-
-
cglib-
-
charsets.jar
-
chill
-
classindex
-
classmate
-
classutil
-
classycle
-
cldrdata
-
commands-
-
common-
-
commons-
-
component-api
-
component-form
-
component-runtime
-
component-server
-
component-spi
-
component-studio
-
components-adapter-beam
-
components-api
-
components-common
-
compress-lzf
-
config
-
constructr
-
container-core
-
contenttype
-
coverage-agent
-
cryptacular-
-
cssparser-
-
curator-
-
curvesapi-
-
cxf-
-
daikon
-
databinding
-
dataquality
-
dataset-
-
datastore-
-
debugger-agent
-
deltaspike-
-
deploy.jar
-
derby-
-
derbyclient-
-
derbynet-
-
dnsns
-
dom4j
-
draw2d
-
easymock-
-
ecj-
-
eclipselink-
-
ehcache-
-
el-api
-
enumeratum
-
enunciate-core-annotations
-
error_prone_annotations
-
expressions
-
fastutil
-
feign-core
-
feign-hystrix
-
feign-slf4j
-
filters-helpers
-
findbugs-
-
fluent-hc
-
fluentlenium-core
-
fontbox
-
freemarker-
-
fusemq-leveldb-
-
gax-
-
gcsio-
-
gef-
-
geocoder
-
geronimo-
-
gmbal
-
google-
-
gpars-
-
gragent.jar
-
graph
-
grizzled-scala
-
grizzly-
-
groovy-
-
grpc-
-
gson-
-
guava-
-
guice-
-
h2-
-
hadoop-
-
hamcrest-
-
hawtbuf-
-
hawtdispatch-
-
hawtio-
-
hawtjni-runtime
-
help-
-
hibernate-
-
hk2-
-
howl-
-
hsqldb-
-
htmlunit-
-
htrace-
-
httpclient-
-
httpcore-
-
httpmime
-
hystrix
-
iban4j-
-
icu4j-
-
idb-
-
idea_rt.jar
-
instrumentation-api
-
ion-java
-
isoparser-
-
istack-commons-runtime-
-
ivy-
-
j2objc-annotations
-
jBCrypt
-
jaccess
-
jackcess-
-
jackson-
-
janino-
-
jansi-
-
jasper-el.jar
-
jasper.jar
-
jasypt-
-
java-atk-wrapper
-
java-libpst-
-
java-support-
-
java-xmlbuilder-
-
javacsv
-
javaee-
-
javaee-api
-
javassist-
-
javaws.jar
-
javax.
-
jaxb-
-
jaxp-
-
jbake-
-
jboss-
-
jbossall-
-
jbosscx-
-
jbossjts-
-
jbosssx-
-
jcache
-
jce.jar
-
jcip-annotations
-
jcl-over-slf4j-
-
jcommander-
-
jdbcdslog-1
-
jempbox
-
jersey-
-
jets3t
-
jettison-
-
jetty-
-
jface
-
jfairy
-
jffi
-
jfr.jar
-
jfxrt.jar
-
jfxswt
-
jhighlight
-
jjwt
-
jline
-
jmatio-
-
jmdns-
-
jmespath-
-
jms
-
jmustache
-
jna-
-
jnr-
-
jobs-
-
joda-convert
-
joda-time-
-
johnzon-
-
jolokia-
-
jopt-simple
-
jruby-
-
json-
-
json4s-
-
jsonb-api
-
jsoup-
-
jsp-api
-
jsr
-
jsse.jar
-
jta
-
jul-to-slf4j-
-
juli-
-
junit-
-
junit5-
-
juniversalchardet
-
junrar-
-
jwt
-
jython
-
kafka
-
kahadb-
-
kotlin-runtime
-
kryo
-
leveldb
-
libphonenumber
-
lift-json
-
lmdbjava
-
localedata
-
log4j-
-
logback
-
logging-event-layout
-
logkit-
-
lombok
-
lucene
-
lz4
-
machinist
-
macro-compat
-
mail-
-
management-
-
mapstruct-
-
maven-
-
mbean-annotation-api-
-
meecrowave-
-
mesos-
-
metadata-extractor-
-
metrics-
-
microprofile-config-api-
-
mimepull-
-
mina-
-
minlog
-
mockito-core
-
mqtt-client-
-
multitenant-core
-
multiverse-core-
-
mx4j-
-
myfaces-
-
mysql-connector-java-
-
nashorn
-
neethi-
-
neko-htmlunit
-
nekohtml-
-
netflix
-
netty-
-
nimbus-jose-jwt
-
objenesis-
-
okhttp
-
okio
-
opencensus-
-
openjpa-
-
openmdx-
-
opennlp-
-
opensaml-
-
opentest4j-
-
openwebbeans-
-
openws-
-
ops4j-
-
org.apache.aries
-
org.apache.commons
-
org.apache.log4j
-
org.eclipse.
-
org.junit.
-
org.osgi.core-
-
org.osgi.enterprise
-
org.talend
-
orient-commons-
-
orientdb-core-
-
orientdb-nativeos-
-
oro-
-
osgi
-
paranamer
-
parquet
-
pax-url
-
pdfbox
-
play
-
plexus-
-
plugin.jar
-
poi-
-
postgresql
-
preferences-
-
prefixmapper
-
proto-
-
protobuf-
-
py4j-
-
pyrolite-
-
qdox-
-
quartz-2
-
quartz-openejb-
-
reactive-streams
-
reflectasm-
-
reflections
-
regexp-
-
registry-
-
resources.jar
-
rhino
-
ribbon
-
rmock-
-
rome
-
routes-compiler
-
routines
-
rt.jar
-
runners
-
runtime-
-
rxjava
-
rxnetty
-
saaj-
-
sac-
-
scala
-
scalap
-
scalatest
-
scannotation-
-
selenium
-
serializer-
-
serp-
-
service-common
-
servlet-api-
-
servo-
-
shaded
-
shapeless
-
shrinkwrap-
-
sisu-guice
-
sisu-inject
-
slf4j-
-
slick
-
smack-
-
smackx-
-
snakeyaml-
-
snappy-
-
spark-
-
specs2
-
spring-
-
sshd-
-
ssl-config-core
-
stax-api-
-
stax2-api-
-
stream
-
sunec.jar
-
sunjce_provider
-
sunpkcs11
-
surefire-
-
swagger-
-
swizzle-
-
sxc-
-
system-rules
-
tachyon-
-
tagsoup-
-
talend-icon
-
test-agent
-
test-interface
-
testng-
-
threetenbp
-
tika-
-
tomcat
-
tomee-
-
tools.jar
-
twirl
-
twitter4j-
-
tyrex
-
uncommons
-
unused
-
util
-
validation-api-
-
velocity-
-
wagon-
-
wandou
-
webbeans-
-
websocket
-
woodstox-core
-
workbench
-
ws-commons-util-
-
wsdl4j-
-
wss4j-
-
wstx-asl-
-
xalan-
-
xbean-
-
xercesImpl-
-
xlsx-streamer-
-
xml-apis-
-
xml-resolver-
-
xmlbeans-
-
xmlenc-
-
xmlgraphics-
-
xmlpcore
-
xmlpull-
-
xmlrpc-
-
xmlschema-
-
xmlsec-
-
xmltooling-
-
xmlunit-
-
xstream-
-
xz-
-
zipfs.jar
-
zipkin-
-
ziplock-
-
zkclient
-
zookeeper-
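A prefix-based exclusion like the one above can be sketched as follows. This is an illustration only: JarPrefixFilterSketch is a hypothetical class, the prefixes are a short excerpt of the list above, and matching on the start of the JAR file name is an assumption drawn from the word "prefix".

```java
import java.util.Arrays;
import java.util.List;

final class JarPrefixFilterSketch {
    // A short excerpt of the prefixes listed above, not the full list.
    static final List<String> EXCLUDED_PREFIXES =
            Arrays.asList("commons-", "jackson-", "slf4j-", "util", "rt.jar");

    /** Returns true when the JAR file name starts with none of the excluded prefixes. */
    static boolean isScanned(String jarFileName) {
        for (String prefix : EXCLUDED_PREFIXES) {
            if (jarFileName.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isScanned("mycomponent-1.0.0-SNAPSHOT.jar")); // prints true
        System.out.println(isScanned("commons-lang3-3.12.0.jar"));       // prints false
        System.out.println(isScanned("utilities-component-1.0.jar"));    // prints false
    }
}
```

The last call shows why the JAR name matters: some listed prefixes are generic (for example app-, common- or util), so a component JAR whose name starts with one of them would be skipped under this assumption.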
Package Scanning
Since the framework can be used with fat JARs or shaded JARs, and because it still uses scanning, it is important not to scan every class, for performance reasons.
Therefore, the following packages are ignored:
-
avro.shaded
-
com.codehale.metrics
-
com.ctc.wstx
-
com.datastax.driver.core
-
com.fasterxml.jackson.annotation
-
com.fasterxml.jackson.core
-
com.fasterxml.jackson.databind
-
com.fasterxml.jackson.dataformat
-
com.fasterxml.jackson.module
-
com.google.common
-
com.google.thirdparty
-
com.ibm.wsdl
-
com.jcraft.jsch
-
com.kenai.jffi
-
com.kenai.jnr
-
com.sun.istack
-
com.sun.xml.bind
-
com.sun.xml.messaging.saaj
-
com.sun.xml.txw2
-
com.thoughtworks
-
io.jsonwebtoken
-
io.netty
-
io.swagger.annotations
-
io.swagger.config
-
io.swagger.converter
-
io.swagger.core
-
io.swagger.jackson
-
io.swagger.jaxrs
-
io.swagger.model
-
io.swagger.models
-
io.swagger.util
-
javax
-
jnr
-
junit
-
net.sf.ehcache
-
net.shibboleth.utilities.java.support
-
org.aeonbits.owner
-
org.apache.activemq
-
org.apache.beam
-
org.apache.bval
-
org.apache.camel
-
org.apache.catalina
-
org.apache.commons.beanutils
-
org.apache.commons.cli
-
org.apache.commons.codec
-
org.apache.commons.collections
-
org.apache.commons.compress
-
org.apache.commons.dbcp2
-
org.apache.commons.digester
-
org.apache.commons.io
-
org.apache.commons.jcs.access
-
org.apache.commons.jcs.admin
-
org.apache.commons.jcs.auxiliary
-
org.apache.commons.jcs.engine
-
org.apache.commons.jcs.io
-
org.apache.commons.jcs.utils
-
org.apache.commons.lang
-
org.apache.commons.lang3
-
org.apache.commons.logging
-
org.apache.commons.pool2
-
org.apache.coyote
-
org.apache.cxf
-
org.apache.geronimo.javamail
-
org.apache.geronimo.mail
-
org.apache.geronimo.osgi
-
org.apache.geronimo.specs
-
org.apache.http
-
org.apache.jcp
-
org.apache.johnzon
-
org.apache.juli
-
org.apache.logging.log4j.core
-
org.apache.logging.log4j.jul
-
org.apache.logging.log4j.util
-
org.apache.logging.slf4j
-
org.apache.meecrowave
-
org.apache.myfaces
-
org.apache.naming
-
org.apache.neethi
-
org.apache.openejb
-
org.apache.openjpa
-
org.apache.oro
-
org.apache.tomcat
-
org.apache.tomee
-
org.apache.velocity
-
org.apache.webbeans
-
org.apache.ws
-
org.apache.wss4j
-
org.apache.xbean
-
org.apache.xml
-
org.apache.xml.resolver
-
org.bouncycastle
-
org.codehaus.jackson
-
org.codehaus.stax2
-
org.codehaus.swizzle.Grep
-
org.codehaus.swizzle.Lexer
-
org.cryptacular
-
org.eclipse.jdt.core
-
org.eclipse.jdt.internal
-
org.fusesource.hawtbuf
-
org.h2
-
org.hamcrest
-
org.hsqldb
-
org.jasypt
-
org.jboss.marshalling
-
org.joda.time
-
org.jose4j
-
org.junit
-
org.jvnet.mimepull
-
org.metatype.sxc
-
org.objectweb.asm
-
org.objectweb.howl
-
org.openejb
-
org.opensaml
-
org.slf4j
-
org.swizzle
-
org.terracotta.context
-
org.terracotta.entity
-
org.terracotta.modules.ehcache
-
org.terracotta.statistics
-
org.tukaani
-
org.yaml.snakeyaml
-
serp
It is not recommended, but it is possible to add a TALEND-INF/scanning.properties file to your plugin module, with classloader.includes and classloader.excludes entries, to refine the scanning with custom rules. In such a case, exclusions win over inclusions.
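The precedence rule can be sketched as below. Only "exclusions win over inclusions" comes from this page. Everything else is an assumption made for the example: the class PackageFilterSketch, matching on package-name prefixes, and treating an empty include list as "include everything".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class PackageFilterSketch {
    /** True when the class name is in one of the given packages or their sub-packages. */
    static boolean matches(String className, List<String> packages) {
        for (String pkg : packages) {
            if (className.startsWith(pkg + ".")) {
                return true;
            }
        }
        return false;
    }

    /** Applies excludes first, so that an exclusion always wins over an inclusion. */
    static boolean isScanned(String className, List<String> includes, List<String> excludes) {
        if (matches(className, excludes)) {
            return false;
        }
        // Assumption: no include rule means nothing is filtered out by inclusion.
        return includes.isEmpty() || matches(className, includes);
    }

    public static void main(String[] args) {
        List<String> includes = Arrays.asList("org.example");
        List<String> excludes = Arrays.asList("org.example.internal");
        System.out.println(isScanned("org.example.MyComponent", includes, excludes));     // prints true
        System.out.println(isScanned("org.example.internal.Helper", includes, excludes)); // prints false
        System.out.println(isScanned("org.slf4j.Logger", Collections.emptyList(), Arrays.asList("org.slf4j"))); // prints false
    }
}
```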