Posted and edited by Erick Ramirez

HOW TO - Connect to Astra DB from Pentaho Data Integration

Overview

This article provides the steps for connecting to Astra DB from Pentaho Data Integration (PDI), also known as "Spoon" and formerly as Kettle.

Prerequisites

This article assumes you have installed Pentaho Data Integration on your laptop or PC. It was written for version 9.1 on macOS, but it should also work on Windows.

You will also need to generate an application token and download the secure bundle for your Astra DB.
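Before moving on, it can help to confirm that both prerequisites are in place. The following is a minimal sketch, not an official DataStax tool: check_astra_prereqs is a hypothetical helper name, and the only assumptions it makes are that the secure bundle is a readable file and that the token starts with the AstraCS: prefix seen in the example token later in this article.

```shell
# Hypothetical helper (not part of any DataStax or Pentaho tooling).
# Usage: check_astra_prereqs /full/path/to/secure-bundle.zip "$TOKEN"
check_astra_prereqs() {
    bundle=$1
    token=$2

    # The secure bundle must exist and be readable by the current user.
    if [ ! -r "$bundle" ]; then
        echo "secure bundle not readable: $bundle" >&2
        return 1
    fi

    # Assumption: application tokens begin with the AstraCS: prefix.
    case $token in
        AstraCS:*) ;;
        *) echo "token does not start with AstraCS:" >&2; return 1 ;;
    esac

    echo "prerequisites look OK"
}
```

The function only checks the shape of the inputs. It does not contact Astra DB, so a passing check does not prove that the token is valid.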

Procedure

JDBC DRIVER

Download the JDBC driver from the DataStax website:

  1. Go to https://downloads.datastax.com/#odbc-jdbc-drivers.
  2. Select Simba JDBC Driver for Apache Cassandra.
  3. Select JDBC 4.2.
  4. Read the license terms and accept them (click the checkbox).
  5. Hit the blue Download button.
  6. Once the download completes, unzip the downloaded file.

IMPORT DRIVER

Deploy the Simba driver to Pentaho servers using the distribution tool:

  1. On your laptop or PC, copy the Simba JAR to the JDBC distribution directory:
    $ cp CassandraJDBC42.jar /Applications/Pentaho/jdbc-distribution/
  2. Run the distribution tool (distribute-files.bat on Windows):
    $ cd /Applications/Pentaho/jdbc-distribution
    $ ./distribute-files.sh CassandraJDBC42.jar
  3. Verify that the JAR has been copied to the PDI library:
    $ cd /Applications/Pentaho
    $ ls -lh design-tools/data-integration/lib/CassandraJDBC42.jar
    -rw-r--r--  1 erick  vaxxed   16M 14 Sep 22:18 design-tools/data-integration/lib/CassandraJDBC42.jar
    
    $ file design-tools/data-integration/lib/CassandraJDBC42.jar
    design-tools/data-integration/lib/CassandraJDBC42.jar: Java archive data (JAR)
  4. Restart Pentaho on your workstation for the Simba driver to be loaded.
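The copy, distribute and verify steps above could be wrapped in one small function. This is only a sketch under stated assumptions: install_driver is a hypothetical name, the default Pentaho location is the /Applications/Pentaho path used in this article, and the distribution tool is assumed to accept the JAR file name as its argument, as in step 2.

```shell
# Hypothetical helper: copy a driver JAR into jdbc-distribution, run the
# distribution tool, then confirm the JAR reached the PDI lib directory.
# Usage: install_driver /path/to/CassandraJDBC42.jar [pentaho_home]
install_driver() {
    jar_path=$1
    pentaho_home=${2:-/Applications/Pentaho}   # default taken from this article
    jar_name=$(basename "$jar_path")
    dist_dir="$pentaho_home/jdbc-distribution"
    lib_dir="$pentaho_home/design-tools/data-integration/lib"

    [ -f "$jar_path" ] || { echo "JAR not found: $jar_path" >&2; return 1; }
    [ -f "$dist_dir/distribute-files.sh" ] || {
        echo "distribute-files.sh not found in $dist_dir" >&2
        return 1
    }

    # Step 1: copy the JAR next to the distribution tool.
    cp "$jar_path" "$dist_dir/" || return 1

    # Step 2: run the tool from its own directory (in a subshell, so the
    # caller's working directory is unchanged). It is invoked through sh
    # so it runs even if the executable bit is missing.
    ( cd "$dist_dir" && sh ./distribute-files.sh "$jar_name" ) || return 1

    # Step 3: verify that the JAR was distributed to the PDI library.
    if [ -f "$lib_dir/$jar_name" ]; then
        echo "OK: $lib_dir/$jar_name"
    else
        echo "JAR missing from $lib_dir" >&2
        return 1
    fi
}
```

Pentaho still needs to be restarted afterwards, since the function only automates the file handling.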

NEW CONNECTION

Connect to your Astra DB in PDI:

  1. Create a new Transformation.
  2. Open a new Database Connection dialog box.
  3. In the Connection name field, give your DB connection a name.
  4. Under Connection type, select Generic database.
  5. Set the Custom connection URL to:
    jdbc:cassandra://;AuthMech=2;TunableConsistency=6;SecureConnectionBundlePath=/path/to/secure-connect-getvaxxed.zip
    Note that you will need to specify the full path to your secure bundle.
  6. In the Username field, enter the literal string token.
  7. In the Password field, paste the value of the token you created in the Prerequisites section above. The token looks like AstraCS:AbC...XYz:123...edf0.

    pentaho-01-new-astra-connection.png

  8. Click on the Test Connection button to confirm that the driver configuration is working:

    pentaho-02-test-connection.png

  9. Click on the OK button to save the connection settings.
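Since the connection URL requires the full path to the secure bundle, one way to avoid typos is to generate the URL in a shell and paste the result into the Custom connection URL field. The sketch below uses a hypothetical helper, build_astra_jdbc_url, and reuses the URL properties shown above unchanged. Its absolute-path check assumes a macOS or Linux style path that starts with a slash, so it would need adjusting for Windows.

```shell
# Hypothetical helper: print the custom connection URL for a given bundle.
# Usage: build_astra_jdbc_url /full/path/to/secure-connect-bundle.zip
build_astra_jdbc_url() {
    bundle=$1

    # The driver needs the full path, so reject relative paths early.
    case $bundle in
        /*) ;;
        *) echo "bundle path must be absolute: $bundle" >&2; return 1 ;;
    esac

    # Same properties as the URL in this article; only the path varies.
    printf 'jdbc:cassandra://;AuthMech=2;TunableConsistency=6;SecureConnectionBundlePath=%s\n' "$bundle"
}
```

For example, build_astra_jdbc_url /path/to/secure-connect-getvaxxed.zip prints the same URL as the one in the connection steps.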

Final test

Connect to your Astra DB by launching the SQL Editor in Pentaho and running a simple CQL statement. For example:

pentaho-03-sql-editor.png

Here's an example output:

pentaho-04-preview-data.png

You should also be able to browse the keyspaces in your Astra DB using the Database Explorer. Here's an example output:

pentaho-05-db-explorer.png



Erick Ramirez contributed to this article