Talend Component Testing Documentation

Best practices

This part mainly covers tools usable with JUnit. Most of these techniques also work with TestNG; check its documentation if you need to use TestNG.

Parameterized tests

Parameterized tests are a good way to repeat the same test several times. The idea is to define a test scenario (I test function F) and to make the input and output data dynamic.

JUnit 4

Here is an example. Let’s assume we have this test which validates the connection URI using ConnectionService:

public class MyConnectionURITest {
    @Test
    public void checkMySQL() {
        assertTrue(new ConnectionService().isValid("jdbc:mysql://localhost:3306/mysql"));
    }

    @Test
    public void checkOracle() {
        assertTrue(new ConnectionService().isValid("jdbc:oracle:thin:@//myhost:1521/oracle"));
    }
}

The test method is clearly always the same except for the value. It can therefore be rewritten using the JUnit Parameterized runner:

@RunWith(Parameterized.class) (1)
public class MyConnectionURITest {

    @Parameterized.Parameters(name = "{0}") (2)
    public static Iterable<String> uris() { (3)
        return asList(
            "jdbc:mysql://localhost:3306/mysql",
            "jdbc:oracle:thin:@//myhost:1521/oracle");
    }

    @Parameterized.Parameter (4)
    public String uri;

    @Test
    public void isValid() { (5)
        assertTrue(new ConnectionService().isValid(uri));
    }
}
1 Parameterized is the runner that understands @Parameters and knows how to use it. Note that you can generate random data here if desired.
2 By default the name of the executed test is the index of the data. Here it is customized with the toString() value of the first parameter to get something more readable.
3 The @Parameters method MUST be static and return an array or iterable of the data used by the tests.
4 You can then inject the current data using the @Parameter annotation. If the @Parameters method returns an array of arrays instead of an iterable of objects, the annotation can take an index to select which item is injected.
5 The @Test method is executed with the contextual data. In this sample it is executed twice, once per specified URI.
You don’t have to define a single @Test method. If you define several, each of them is executed with all the data (for example, adding a second test to the previous example gives 4 test executions: 2 tests x 2 data items).
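To make the "each test runs against each datum" rule concrete, here is a minimal plain-Java sketch of what a parameterized runner does conceptually. It is not how JUnit is implemented. The isValid method is a hypothetical stand-in for ConnectionService#isValid (the original does not show its logic), and the second check is only there to illustrate the 2 x 2 count.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java sketch: run every check against every datum, as a parameterized runner would.
final class ParameterizedSketch {

    // Hypothetical stand-in for ConnectionService#isValid: only checks the "jdbc:<driver>:" shape.
    static boolean isValid(String uri) {
        return uri != null && uri.matches("jdbc:[a-z0-9]+:.+");
    }

    // Runs each named check against each datum and returns the number of executions.
    static int runAll(List<String> data, Map<String, Predicate<String>> checks) {
        int executions = 0;
        for (String datum : data) {
            for (Map.Entry<String, Predicate<String>> check : checks.entrySet()) {
                if (!check.getValue().test(datum)) {
                    throw new AssertionError(check.getKey() + " failed for " + datum);
                }
                executions++;
            }
        }
        return executions;
    }

    public static void main(String[] args) {
        List<String> uris = List.of(
                "jdbc:mysql://localhost:3306/mysql",
                "jdbc:oracle:thin:@//myhost:1521/oracle");
        Map<String, Predicate<String>> checks = new LinkedHashMap<>();
        checks.put("isValid", ParameterizedSketch::isValid);
        checks.put("hasJdbcScheme", uri -> uri.startsWith("jdbc:"));
        // Two data items and two checks give four executions.
        System.out.println(runAll(uris, checks) + " executions");
    }
}
```

The runner adds reporting, naming and isolation on top of this loop, but the execution count follows the same product of tests and data items.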

JUnit 5

JUnit 5 reworked this feature to make it way easier to use. The full documentation is available at junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests.

The main difference is that you can also declare inline, on the test method, that it is a parameterized test and what its values are:

@ParameterizedTest
@ValueSource(strings = { "racecar", "radar", "able was I ere I saw elba" })
void mytest(String currentValue) {
    // do test
}

However you can still use the previous behavior using a method binding configuration:

@ParameterizedTest
@MethodSource("stringProvider")
void mytest(String currentValue) {
    // do test
}

static Stream<String> stringProvider() {
    return Stream.of("foo", "bar");
}

This last option allows you to inject any type of value, not only primitives, which is very common when defining scenarios.

Don’t forget to add the junit-jupiter-params dependency to benefit from this feature.

component-runtime-testing

component-runtime-junit

component-runtime-junit is a small test library allowing you to validate simple logic based on Talend Component tooling.

To import it add to your project the following dependency:

<dependency>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>component-runtime-junit</artifactId>
  <version>${talend-component.version}</version>
  <scope>test</scope>
</dependency>

This dependency also provides some mocked components that you can use with your own component to create tests.

The mocked components are provided under the family test:

  • emitter : a mock of an input component

  • collector : a mock of an output component

JUnit 4

Then you can define a standard JUnit test and use the SimpleComponentRule rule:

public class MyComponentTest {

    @Rule (1)
    public final SimpleComponentRule components = new SimpleComponentRule("org.talend.sdk.component.mycomponent.");

    @Test
    public void produce() {
        Job.components() (2)
             .component("mycomponent","yourcomponentfamily://yourcomponent?"+createComponentConfig())
             .component("collector", "test://collector")
           .connections()
             .from("mycomponent").to("collector")
           .build()
           .run();

        final List<MyRecord> records = components.getCollectedData(MyRecord.class); (3)
        doAssertRecords(records); // depends on your test
    }
}
1 The rule creates a component manager and provides two mock components: an emitter and a collector. Don’t forget to set the root package of your component to enable it.
2 You define any chain you want to test. It generally uses the mock as source or collector.
3 You validate your component behavior. For a source, you can assert that the right records were emitted into the mock collector.

JUnit 5

The JUnit 5 integration is mostly the same as for JUnit 4, except that it uses the new JUnit 5 extension mechanism.

The entry point is the @WithComponents annotation, which you put on your test class and which takes the component package you want to test. You can then use @Injected to inject into a test class field an instance of ComponentsHandler, which exposes the same utilities as the JUnit 4 rule:

@WithComponents("org.talend.sdk.component.junit.component") (1)
public class ComponentExtensionTest {
    @Injected (2)
    private ComponentsHandler handler;

    @Test
    public void manualMapper() {
        final Mapper mapper = handler.createMapper(Source.class, new Source.Config() {

            {
                values = asList("a", "b");
            }
        });
        assertFalse(mapper.isStream());
        final Input input = mapper.create();
        assertEquals("a", input.next());
        assertEquals("b", input.next());
        assertNull(input.next());
    }
}
1 The annotation defines which components to register in the test context.
2 The field gives access to the handler used to orchestrate the tests.
If this is the first time you use JUnit 5, don’t forget that the imports changed: you must use org.junit.jupiter.api.Test instead of org.junit.Test. Some IDE and Surefire versions may also require a plugin or a specific configuration.

Mocking the output

Using the "test"/"collector" component as in the previous sample stores all records emitted by the chain (typically your source) in memory. You can then access them using SimpleComponentRule.getCollectedData(type). Note that this method filters by type; if you don’t care about the type, just use Object.class.
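The "filters by type" behavior can be pictured with a small in-memory collector. This is a hypothetical sketch written for illustration, not the SimpleComponentRule implementation; only the method name getCollectedData is borrowed from the sample above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory collector mimicking the "filter by type" behavior described above.
final class InMemoryCollector {

    private final List<Object> records = new ArrayList<>();

    // Stores any record emitted by the chain under test.
    public void collect(Object record) {
        records.add(record);
    }

    // Returns only the records that are instances of the requested type, in emission order.
    public <T> List<T> getCollectedData(Class<T> type) {
        return records.stream()
                .filter(type::isInstance)
                .map(type::cast)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        InMemoryCollector collector = new InMemoryCollector();
        collector.collect("a");
        collector.collect(1);
        collector.collect("b");
        System.out.println(collector.getCollectedData(String.class)); // strings only
        System.out.println(collector.getCollectedData(Object.class)); // every record
    }
}
```

Asking for Object.class matches every record, which is why it is the natural choice when the type does not matter.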

Mocking the input

The input mocking is symmetric to the output but here you provide the data you want to inject:

public class MyComponentTest {

    @Rule
    public final SimpleComponentRule components = new SimpleComponentRule("org.talend.sdk.component.mycomponent.");

    @Test
    public void produce() {
        components.setInputData(asList(createData(), createData(), createData())); (1)

        Job.components() (2)
             .component("emitter","test://emitter")
             .component("out", "yourcomponentfamily://myoutput?"+createComponentConfig())
           .connections()
              .from("emitter").to("out")
           .build()
           .run();

        assertMyOutputProcessedTheInputData();
    }
}
1 Using setInputData, you prepare the execution(s) to have a fake input when using the "test"/"emitter" component.
2 The chain uses the mock emitter as its source and your output component as its destination.

Creating runtime configuration from component configuration

The component configuration is a POJO (using @Option on fields) and the runtime configuration (ExecutionChainBuilder) uses a Map<String, String>. To make the conversion easier, the JUnit integration provides a SimpleFactory.configurationByExample utility to get this map instance from a configuration instance.

Example:

final MyComponentConfig componentConfig = new MyComponentConfig();
componentConfig.setUser("....");
// .. other inits

final Map<String, String> configuration = configurationByExample(componentConfig);

The same factory provides a fluent DSL to create the configuration by calling configurationByExample without any parameter. The advantage is that you can convert an object to a Map<String, String> as seen previously, or to a query string to use it with the Job DSL:

final String uri = "family://component?" +
    configurationByExample().forInstance(componentConfig).configured().toQueryString();

It also handles the encoding of the URI, so special characters in the values are escaped correctly.
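To show why the encoding matters, here is a minimal JDK-only sketch that turns a Map<String, String> into an encoded query string. It is an illustration under assumptions, not the SimpleFactory implementation: the key names are hypothetical, and URLEncoder applies form encoding (a space becomes "+"), which may differ from what the library emits. It requires Java 10 or later for the Charset overload.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// JDK-only sketch: flatten a configuration map into an encoded query string.
final class QueryStringSketch {

    // Encodes every key and value, then joins the pairs with '&'.
    static String toQueryString(Map<String, String> configuration) {
        return configuration.entrySet().stream()
                .map(entry -> encode(entry.getKey()) + "=" + encode(entry.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> configuration = new LinkedHashMap<>();
        configuration.put("user", "john doe");                // hypothetical option
        configuration.put("url", "jdbc:mysql://host/db?a=1"); // hypothetical option
        // Without encoding, the '?' and '=' inside the value would corrupt the component URI.
        System.out.println("family://component?" + toQueryString(configuration));
    }
}
```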

Testing a Mapper

The SimpleComponentRule also allows you to unit test a mapper: you can get an instance from a configuration and execute this instance to collect the output. Here is a snippet doing that:

public class MapperTest {

    @ClassRule
    public static final SimpleComponentRule COMPONENT_FACTORY = new SimpleComponentRule(
            "org.company.talend.component");

    @Test
    public void mapper() {
        final Mapper mapper = COMPONENT_FACTORY.createMapper(MyMapper.class, new Source.Config() {{
            values = asList("a", "b");
        }});
        assertEquals(asList("a", "b"), COMPONENT_FACTORY.collectAsList(String.class, mapper));
    }
}

Testing a Processor

As for the mapper, a processor can be unit tested. The case is a bit more complex since you can have multiple inputs and outputs:

public class ProcessorTest {

    @ClassRule
    public static final SimpleComponentRule COMPONENT_FACTORY = new SimpleComponentRule(
            "org.company.talend.component");

    @Test
    public void processor() {
        final Processor processor = COMPONENT_FACTORY.createProcessor(Transform.class, null);
        final SimpleComponentRule.Outputs outputs = COMPONENT_FACTORY.collect(processor,
                        new JoinInputFactory().withInput("__default__", asList(new Transform.Record("a"), new Transform.Record("bb")))
                                              .withInput("second", asList(new Transform.Record("1"), new Transform.Record("2")))
                );
        assertEquals(2, outputs.size());
        assertEquals(asList(2, 3), outputs.get(Integer.class, "size"));
        assertEquals(asList("a1", "bb2"), outputs.get(String.class, "value"));
    }
}

Here again the rule allows you to instantiate a Processor from your code and then to collect the output from the inputs you pass in. There are two convenient implementations of the input factory:

  1. MainInputFactory for processors using only the default input.

  2. JoinInputFactory for processors using multiple inputs. Its withInput(branch, data) method takes the branch name as first argument and the data used by the branch as second argument.

You can also implement your own input representation if needed by implementing org.talend.sdk.component.junit.ControllableInputFactory.
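The assertions in the processor test are easier to read with a picture of the processing in mind. The sketch below is one plausible reading of them, not the actual Transform class (which the original does not show): it assumes the processor reads the "__default__" and "second" branches in lockstep, emits the summed lengths on a "size" output and the concatenation on a "value" output. Plain strings stand in for Transform.Record.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-branch processing, inferred from the assertions of the processor test.
final class JoinSketch {

    // Reads both branches in lockstep and stops when either branch is exhausted.
    static Map<String, List<Object>> process(Map<String, List<String>> inputs) {
        Iterator<String> main = inputs.get("__default__").iterator();
        Iterator<String> second = inputs.get("second").iterator();
        List<Object> sizes = new ArrayList<>();
        List<Object> values = new ArrayList<>();
        while (main.hasNext() && second.hasNext()) {
            String left = main.next();
            String right = second.next();
            sizes.add(left.length() + right.length()); // "size" output branch
            values.add(left + right);                  // "value" output branch
        }
        Map<String, List<Object>> outputs = new LinkedHashMap<>();
        outputs.put("size", sizes);
        outputs.put("value", values);
        return outputs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> inputs = new LinkedHashMap<>();
        inputs.put("__default__", List.of("a", "bb"));
        inputs.put("second", List.of("1", "2"));
        System.out.println(process(inputs));
    }
}
```

With the inputs of the test, "a" joined with "1" gives size 2 and value "a1", and "bb" joined with "2" gives size 3 and value "bb2", which matches the two output branches the test asserts on.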

component-runtime-testing-spark

The following artifact allows you to test against a Spark cluster:

<dependency>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>component-runtime-testing-spark</artifactId>
  <version>${talend-component.version}</version>
  <scope>test</scope>
</dependency>

JUnit 4

The usage relies on a JUnit TestRule. It is recommended to use it as a @ClassRule to ensure a single instance of the Spark cluster is built, but you can also use it as a simple @Rule, in which case it is created per method instead of per test class.

It takes the Scala and Spark versions to use as parameters. It then forks a master and N slaves. Finally, it gives you submit* methods allowing you to send jobs either from the test classpath or from a shade if you run it as an integration test.

Here is a sample:

public class SparkClusterRuleTest {

    @ClassRule
    public static final SparkClusterRule SPARK = new SparkClusterRule("2.10", "1.6.3", 1);

    @Test
    public void classpathSubmit() throws IOException {
        SPARK.submitClasspath(SubmittableMain.class, getMainArgs());

        // wait until the submitted job has completed
    }
}
This works with @Parameterized, so you can submit a bunch of jobs with different arguments and even combine it with the Beam TestPipeline if you make it transient!

JUnit 5

The JUnit 5 integration of that Spark cluster logic uses the @WithSpark marker for the extension and lets you optionally inject, through @SparkInject, the BaseSpark<?> handler to access the Spark cluster meta information, such as its host and port.

Here is a basic test using it:

@WithSpark
class SparkExtensionTest {

    @SparkInject
    private BaseSpark<?> spark;

    @Test
    void classpathSubmit() throws IOException {
        final File out = new File(jarLocation(SparkClusterRuleTest.class).getParentFile(), "classpathSubmitJunit5.out");
        if (out.exists()) {
            out.delete();
        }
        spark.submitClasspath(SparkClusterRuleTest.SubmittableMain.class, spark.getSparkMaster(), out.getAbsolutePath());

        await().atMost(5, MINUTES).until(
                () -> out.exists() ? Files.readAllLines(out.toPath()).stream().collect(joining("\n")).trim() : null,
                equalTo("b -> 1\na -> 1"));
    }
}

How to know the job is done

In its current state, SparkClusterRule doesn’t let you know when a job execution is done, although it exposes the web UI URL so you can poll it to check. The best approach at the moment is to check that the output of your job exists and contains the right value.

Awaitility or an equivalent library can help you write such logic.

Here are the coordinates of the artifact:

<dependency>
  <groupId>org.awaitility</groupId>
  <artifactId>awaitility</artifactId>
  <version>3.0.0</version>
  <scope>test</scope>
</dependency>

And here is how to wait until a file exists and its content (for instance) is the expected one:

await()
    .atMost(5, MINUTES)
    .until(
        () -> out.exists() ? Files.readAllLines(out.toPath()).stream().collect(joining("\n")).trim() : null,
        equalTo("the expected content of the file"));
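If you prefer not to add a dependency, the same wait can be approximated with the JDK alone. The following is a minimal sketch under assumptions, not a replacement for Awaitility: the class and method names are hypothetical, and a file that is missing or cannot be read yet is simply treated as "not ready".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

// JDK-only polling sketch: wait until a file exists and holds the expected content.
final class OutputPoller {

    // Returns true as soon as the trimmed file content equals 'expected', false on timeout.
    static boolean awaitContent(Path file, String expected, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (expected.equals(readOrNull(file))) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            sleep(interval);
        }
    }

    // A missing or unreadable file (for instance one still being written) counts as "not ready".
    private static String readOrNull(Path file) {
        if (!Files.exists(file)) {
            return null;
        }
        try {
            return String.join("\n", Files.readAllLines(file)).trim();
        } catch (IOException e) {
            return null;
        }
    }

    private static void sleep(Duration interval) {
        try {
            Thread.sleep(interval.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the job output", e);
        }
    }
}
```

A test would typically fail when awaitContent returns false, for example after waiting five minutes with a one-second interval.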

component-runtime-http-junit

The HTTP JUnit module allows you to mock REST API very easily. Here are its coordinates:

<dependency>
  <groupId>org.talend.sdk.component</groupId>
  <artifactId>component-runtime-http-junit</artifactId>
  <version>${talend-component.version}</version>
  <scope>test</scope>
</dependency>
This module uses Apache Johnzon and Netty. If you have any conflict (in particular with Netty), you can add the classifier shaded to the dependency: the two dependencies are then shaded, which avoids conflicts with your component.

It supports both JUnit 4 and JUnit 5, and the overall concept is exactly the same: the extension/rule serves precomputed responses saved in the classpath.

You can plug your own ResponseLocator to map a request to a response but the default implementation - which should be sufficient in most cases - will look in talend/testing/http/<class name>_<method name>.json. Note that you can also put it in talend/testing/http/<request path>.json.

JUnit 4

The JUnit 4 setup is done through two rules: JUnit4HttpApi, which is responsible for starting the server, and JUnit4HttpApiPerMethodConfigurator, which is responsible for configuring the server per test and for handling the capture mode (see later).

If you don’t use the JUnit4HttpApiPerMethodConfigurator, the capture feature is deactivated and the per-test mocking is not available.

Most tests will look like this:

public class MyRESTApiTest {
    @ClassRule
    public static final JUnit4HttpApi API = new JUnit4HttpApi();

    @Rule
    public final JUnit4HttpApiPerMethodConfigurator configurator = new JUnit4HttpApiPerMethodConfigurator(API);

    @Test
    public void direct() throws Exception {
        // ... do your requests
    }
}
SSL

For tests using SSL based services, you will need to use activeSsl() on the JUnit4HttpApi rule.

If you need to access the server ssl socket factory you can do it from the HttpApiHandler (the rule):

@ClassRule
public static final JUnit4HttpApi API = new JUnit4HttpApi().activeSsl();

@Test
public void test() throws Exception {
    final HttpsURLConnection connection = getHttpsConnection();
    connection.setSSLSocketFactory(API.getSslContext().getSocketFactory());
    // ....
}

JUnit 5

JUnit 5 uses a JUnit 5 extension based on the HttpApi annotation you can put on your test class. You can inject the test handler (which has some utilities for advanced cases) through @HttpApiInject:

@HttpApi
class JUnit5HttpApiTest {
    @HttpApiInject
    private HttpApiHandler<?> handler;

    @Test
    void getProxy() throws Exception {
        // .... do your requests
    }
}
The injection is optional, and @HttpApi allows you to configure several behaviors of the test.
SSL

For tests using SSL based services, you will need to use @HttpApi(useSsl = true).

You can access the client SSL socket factory through the api handler:

@HttpApi(useSsl = true)
class MyHttpsApiTest {
    @HttpApiInject
    private HttpApiHandler<?> handler;

    @Test
    void test() throws Exception {
        final HttpsURLConnection connection = getHttpsConnection();
        connection.setSSLSocketFactory(handler.getSslContext().getSocketFactory());
        // ....
    }
}

Capturing mode

The strength of this implementation is to run a small proxy server and auto configure the JVM: http[s].proxyHost, http[s].proxyPort, HttpsURLConnection#defaultSSLSocketFactory and SSLContext#default are auto configured to work out of the box with the proxy.

It allows you to keep the native and real URLs in your tests. For instance, this test is perfectly valid:

public class GoogleTest {
    @ClassRule
    public static final JUnit4HttpApi API = new JUnit4HttpApi();

    @Rule
    public final JUnit4HttpApiPerMethodConfigurator configurator = new JUnit4HttpApiPerMethodConfigurator(API);

    @Test
    public void google() throws Exception {
        assertEquals(HttpURLConnection.HTTP_OK, get("https://google.fr?q=Talend"));
    }

    private int get(final String uri) throws Exception {
        // do the GET request, skipped for brevity
    }
}

If you execute this test, it will fail with an HTTP 400 because the proxy doesn’t find the mocked response. You can create it manually as seen in the introduction of the module, but you can also set the property talend.junit.http.capture to the folder where the captures should be stored. It must be the root folder and not the folder where the JSON files are (i.e. not prefixed by talend/testing/http by default).

Generally you will want to use src/test/resources. If new File("src/test/resources") resolves to the valid folder when executing your test (Maven default), then you can just set the system property to true, otherwise you need to adjust accordingly the system property value.

Once you have run the tests with this system property, the testing framework will have created the correct mock response files and you can remove the system property. The test will still pass using the google.fr URL, even if you disconnect your machine from the internet.

The rule (extension) is doing all the work for you :).
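To give an idea of what "auto configure the JVM" means for the proxy part, here is a hand-written sketch that sets the standard JDK proxy system properties and restores them afterwards. It is only an illustration with a hypothetical class name, not the rule's implementation, and it does not cover the SSL socket factory and SSLContext defaults mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: point the JVM-wide HTTP and HTTPS proxy settings at a local server, then restore them.
final class ProxySettings implements AutoCloseable {

    private final Map<String, String> previous = new LinkedHashMap<>();

    ProxySettings(String host, int port) {
        for (String scheme : new String[] {"http", "https"}) {
            set(scheme + ".proxyHost", host);
            set(scheme + ".proxyPort", Integer.toString(port));
        }
    }

    // Remembers the old value (possibly null) before overriding the property.
    private void set(String key, String value) {
        previous.put(key, System.getProperty(key));
        System.setProperty(key, value);
    }

    // Restores every property to the value it had before this object was created.
    @Override
    public void close() {
        previous.forEach((key, value) -> {
            if (value == null) {
                System.clearProperty(key);
            } else {
                System.setProperty(key, value);
            }
        });
    }

    public static void main(String[] args) {
        try (ProxySettings ignored = new ProxySettings("localhost", 8080)) { // hypothetical port
            System.out.println(System.getProperty("https.proxyHost"));
        }
    }
}
```

Because these properties are global to the JVM, restoring them after the test avoids leaking the proxy configuration into other tests.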

Passthrough mode

When the talend.junit.http.passthrough system property is set to true, the server acts as a plain proxy and forwards each request to the actual server, as in capturing mode.

Beam testing

If you want to ensure your component works in Beam, the minimum is to try it with the direct runner (if you don’t want to use Spark).

Check beam.apache.org/contribute/testing/ out for more details.

Multiple environments for the same tests

JUnit (4 or 5) already provides some ways to parameterize tests and execute the same "test logic" against several data sets. However, it is not that convenient for testing multiple environments.

For instance, with Beam, you may want to test your code against multiple runners. This requires solving conflicts between runner dependencies, setting up the correct classloaders, etc. It is a lot of work!

To simplify such cases, the framework provides you a multi-environment support for your tests.

It is in the junit module and is usable with JUnit 4 and JUnit 5.

JUnit 4

@RunWith(MultiEnvironmentsRunner.class)
@Environment(Env1.class)
@Environment(Env2.class)
public class TheTest {
    @Test
    public void test1() {
        // ...
    }
}

The MultiEnvironmentsRunner executes the test(s) for each defined environment. It means it runs test1 for Env1 and Env2 in the previous example.

By default the JUnit4 runner is used to execute the tests in one environment, but you can use @DelegateRunWith to use another runner.

JUnit 5

The JUnit 5 configuration is close to the JUnit 4 one:

@Environment(EnvironmentsExtensionTest.E1.class)
@Environment(EnvironmentsExtensionTest.E2.class)
class TheTest {

    @EnvironmentalTest
    void test1() {
        // ...
    }
}

The main difference is you don’t use a runner (it doesn’t exist in JUnit 5) and you replace @Test by @EnvironmentalTest.

Another difference with the JUnit 4 integration is that the tests are executed one after the other for all environments, instead of running all tests in each environment sequentially. It means, for instance, that @BeforeAll and @AfterAll are executed once for all runners.
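The two execution orders can be sketched as two nested loops with the nesting swapped. This is an illustration of the ordering described in the text only, with hypothetical names; it does not reproduce the runner or the extension.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the two orderings: environment-major (JUnit 4) versus test-major (JUnit 5).
final class ExecutionOrderSketch {

    // JUnit 4 integration as described: all tests run in one environment, then in the next one.
    static List<String> environmentFirst(List<String> environments, List<String> tests) {
        List<String> order = new ArrayList<>();
        for (String environment : environments) {
            for (String test : tests) {
                order.add(test + "@" + environment);
            }
        }
        return order;
    }

    // JUnit 5 integration as described: each test runs in every environment before the next test.
    static List<String> testFirst(List<String> environments, List<String> tests) {
        List<String> order = new ArrayList<>();
        for (String test : tests) {
            for (String environment : environments) {
                order.add(test + "@" + environment);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> environments = List.of("Env1", "Env2");
        List<String> tests = List.of("test1", "test2");
        System.out.println(environmentFirst(environments, tests));
        System.out.println(testFirst(environments, tests));
    }
}
```

In the second ordering the test class is set up once around all environments, which is why class-level callbacks are not repeated per environment.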

Provided environments

The provided environments set up the contextual classloader to load the related Apache Beam runner.

Package: org.talend.sdk.component.junit.environment.builtin.beam

The configuration is read from system properties, environment variables, etc.
Class                     Name         Description
ContextualEnvironment     Contextual   Contextual runner
DirectRunnerEnvironment   Direct       Direct runner
FlinkRunnerEnvironment    Flink        Flink runner
SparkRunnerEnvironment    Spark        Spark runner

Configuring environments

If the environment extends BaseEnvironmentProvider and therefore defines an environment name - which is the case of the default ones, you can use EnvironmentConfiguration to customize the system properties used for that environment:

@Environment(DirectRunnerEnvironment.class)
@EnvironmentConfiguration(
    environment = "Direct",
    systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))

@Environment(SparkRunnerEnvironment.class)
@EnvironmentConfiguration(
    environment = "Spark",
    systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))

@Environment(FlinkRunnerEnvironment.class)
@EnvironmentConfiguration(
    environment = "Flink",
    systemProperties = @EnvironmentConfiguration.Property(key = "beamTestPipelineOptions", value = "..."))
class MyBeamTest {

    @EnvironmentalTest
    void execute() {
        // run some pipeline
    }
}
If you set the system property <environment name>.skip=true, the executions related to that environment are skipped.
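As an illustration of the naming convention, a check for such a flag can be written with the JDK alone. How the framework actually reads the property is an assumption here, and the class name is hypothetical; the sketch only shows that the key is the environment name followed by ".skip".

```java
// Sketch: evaluate the "<environment name>.skip" system property for a given environment.
final class EnvironmentSkip {

    // Boolean.getBoolean returns true only when the system property exists and equals "true" (ignoring case).
    static boolean isSkipped(String environmentName) {
        return Boolean.getBoolean(environmentName + ".skip");
    }

    public static void main(String[] args) {
        // Run with -DSpark.skip=true to see the Spark environment reported as skipped.
        for (String name : new String[] {"Contextual", "Direct", "Flink", "Spark"}) {
            System.out.println(name + " skipped: " + isSkipped(name));
        }
    }
}
```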

Advanced usage

This usage assumes Beam 2.4.0 is in use and the classloader fix for PipelineOptions is merged.

Dependencies:

<dependencies>
  <dependency>
    <groupId>org.talend.sdk.component</groupId>
    <artifactId>component-runtime-junit</artifactId>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter-api</artifactId>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.jboss.shrinkwrap.resolver</groupId>
    <artifactId>shrinkwrap-resolver-impl-maven</artifactId>
    <version>3.0.1</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.talend.sdk.component</groupId>
    <artifactId>component-runtime-beam</artifactId>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.talend.sdk.component</groupId>
    <artifactId>component-runtime-standalone</artifactId>
    <scope>test</scope>
  </dependency>
</dependencies>

These dependencies bring into the test scope the JUnit testing toolkit, the Beam integration and the multi-environment testing toolkit for JUnit.

Then using the fluent DSL to define jobs - which assumes your job is linear and each step sends a single value (no multi-input/multi-output), you can write this kind of test:

@Environment(ContextualEnvironment.class)
@Environment(DirectRunnerEnvironment.class)
class TheComponentTest {
    @EnvironmentalTest
    void testWithStandaloneAndBeamEnvironments() {
        from("myfamily://in?config=xxxx")
            .to("myfamily://out")
            .create()
            .execute();
        // add asserts on the output if needed
    }
}

It will execute the chain twice:

  1. with a standalone environment to simulate the studio

  2. with a beam (direct runner) environment to ensure the portability of your job

Secrets/Passwords and Maven

If you wish, you can reuse your Maven settings.xml servers, including the encrypted ones. org.talend.sdk.component.maven.MavenDecrypter lets you find a server username/password from a server identifier:

final MavenDecrypter decrypter = new MavenDecrypter();
final Server decrypted = decrypter.find("my-test-server");
// decrypted.getUsername();
// decrypted.getPassword();

This is very useful to avoid storing secrets while still testing on real systems on a continuous integration platform.

Even if you don’t use Maven on the platform, you can generate the settings.xml and settings-security.xml files to use that feature. See maven.apache.org/guides/mini/guide-encryption.html for more details.
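To show what "finding a server from its identifier" involves, here is a JDK-only sketch that looks up a server entry in the content of a settings.xml file. It is not MavenDecrypter: it returns the stored values as they are and performs no decryption, so an encrypted password (a value wrapped in braces) comes back still encrypted. The class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: look up <server> credentials by id in the content of a Maven settings.xml.
final class SettingsServers {

    // Returns {username, password} as stored in the file, or null when the id is unknown.
    static String[] find(String settingsXml, String serverId) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // settings.xml needs no DOCTYPE, so refuse it to avoid external entity processing.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(settingsXml)));
            NodeList servers = document.getElementsByTagName("server");
            for (int i = 0; i < servers.getLength(); i++) {
                Element server = (Element) servers.item(i);
                if (serverId.equals(text(server, "id"))) {
                    return new String[] {text(server, "username"), text(server, "password")};
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse settings.xml content", e);
        }
    }

    // Returns the trimmed text of the first descendant with that tag name, or null if absent.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

In a test you would read the file content first (for example from the user's .m2 folder) and pass it to find together with the server identifier.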

Generating data?

Several data generators exist if you want to populate objects with semantics a bit richer than a plain random string like commons-lang3:

A bit more advanced, these ones allow you to bind generic data directly to a model, but the data quality is not always there:

Note that there are two main kinds of implementations:

  • those using a pattern and randomly generated data

  • those using a set of precomputed data extrapolated to create new values

Check against your use case to know which one is the best.
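The two kinds of implementations can be sketched in a few lines of plain Java. Both methods are hypothetical illustrations written for this page, not the API of any of the generator libraries; the pattern syntax ('#' for a digit, '?' for a lowercase letter) is an assumption made for the example.

```java
import java.util.List;
import java.util.Random;

// Sketch of the two generator families: pattern plus randomness, and recombined sample data.
final class DataGeneratorSketch {

    // Kind 1: '#' becomes a random digit, '?' a random lowercase letter, anything else is kept.
    static String fromPattern(String pattern, Random random) {
        StringBuilder out = new StringBuilder(pattern.length());
        for (char c : pattern.toCharArray()) {
            if (c == '#') {
                out.append((char) ('0' + random.nextInt(10)));
            } else if (c == '?') {
                out.append((char) ('a' + random.nextInt(26)));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Kind 2: new values are built by recombining precomputed samples.
    static String fromSamples(List<String> firstNames, List<String> lastNames, Random random) {
        return firstNames.get(random.nextInt(firstNames.size()))
                + " " + lastNames.get(random.nextInt(lastNames.size()));
    }

    public static void main(String[] args) {
        Random random = new Random(42); // a fixed seed keeps test data reproducible
        System.out.println(fromPattern("??-###", random));
        System.out.println(fromSamples(List.of("Ada", "Alan"), List.of("Lovelace", "Turing"), random));
    }
}
```

The first kind guarantees the shape of the value but not its plausibility; the second kind produces more realistic values but is limited by the sample set.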

An interesting alternative to data generation is to import real data, use Talend Studio to sanitize it (removing sensitive information and replacing it with generated or anonymized data), and just inject that file into the system.

If you are using JUnit 5, you can have a look at glytching.github.io/junit-extensions/randomBeans, which is pretty good on that topic.
