Connect to Salesforce Lightning API

Connect to Salesforce Lightning API


Warning

  • The RSD syntax used by Matillion ETL has changed significantly from version 1.41. For the latest information, please refer to Paging with the REST API Query Component 2020.
  • Due to these changes, the syntax of examples used in this document and RSD files on Matillion ETL may differ slightly from current API standards—for example, <rsb: is now <api:.
  • Other updates include the need for users to setup input parameters as inputs rather than dummy columns, and only returning the date on Date data types as opposed to the datetime as previously.


Example 1: Salesforce Lighting - Find the API documentation

Salesforce publish this article and this guide on how to find your regional data center. In this worked example the data center is um2.salesforce.com and connectivity on HTTPS port 443 is successfully verified using the Network Connectivity Test shared job.

The API documentation is published here and describes the API in great detail.

  • Parameters are supplied as a SOQL Query supplied as part of the URL.
  • The API is very large. Data is returned as a JSON array, and the returned granularity depends on the SOQL query.
  • The API uses paging. Each page of responses includes a URL to the next page. It’s blank if there are no more pages to fetch.

Authentication is via an access token and a refresh token, obtained using OAuth 2.0. The endpoints are documented here.

You can use any method to create your Salesforce refresh token, but note that OAuth 2.0 is a web-based protocol, so it’s impossible to create a refresh token in a completely non-interactive batch mode script. So for this worked example we’ll use an existing Matillion Salesforce OAuth connection, and retrieve the refresh token using an OAuth export from Matillion’s v1 API.

To export the details of an OAuth named SF Demo use the following cURL command. Note that the OAuth’s name contains a space which has been URL encoded.

curl -k -X GET -o SFDemo.json -u YourUser:YourPassword https://YourInstance/rest/v1/oauth/name/SF%20Demo/export

You can read the ClientID (aka Consumer Key), ClientSecret (aka Consumer Secret) and RefreshToken from fields in the returned JSON file. You’ll need all these three codes in the next section.


Example 2: Salesforce Lighting - Load the first page

The Salesforce REST API requires an access token as authentication. An access token can only be retrieved using the “refresh token” obtained in the previous step.

Get a new access token:

curl -v -k -X POST -d grant_type=refresh_token -d client_id=YourClientID -d client_secret=YourClientSecret -d refresh_token=YourRefreshToken https://login.salesforce.com/services/oauth2/token

This should return a JSON document including an access_token:

"access_token":"00D......2Wi"

Now you can use the access token to make the REST API call which will retrieve the first page of actual data:

curl -o page1-raw.json "https://um2.salesforce.com/services/data/v20.0/query/?q=SELECT+name+from+Account" -H 'Authorization:Bearer 00D......2Wi'

Validate and prettify the returned JSON file:

jq "." < page1-raw.json > page1.json

Upload this as a JSON file into your API Profile. Afterwards find its full path on the Matillion server, for example /usr/share/tomcat8/api_profiles/The Example API/page1.json

The repeating group in this example is an array element named records

In the generate dialog, supply the parameters for this API:

  • XPath to Repeat Element - /records
  • URI - /usr/share/tomcat8/api_profiles/The Example API/page1.json

Once the new RSD file has been created, remove the rsb:script sections associated with INSERTS, UPDATES and DELETES, then press the Test button. A new record for t1 should appear at bottom left. Press t1 and the data should appear at bottom right.

In this example, we’re expecting one record per Account. You should see a sample of your Account data.

If this works, it’s time to switch to using the actual API instead of the sample file!

Replace the "uri" value with the actual API endpoint. You should use exactly the same URI that you used earlier with the cURL command. In this case https://um2.salesforce.com/services/data/v20.0/query/?q=SELECT+name+from+Account

This time the RSD will attempt to contact the real API, so it’s necessary to also add authentication parameters. For the Salesforce REST API, it’s the access token in the form of an HTTP header bearer token. Again, use exactly the same value that you used earlier in the cURL command, but expressed in the RSD syntax:

<rsb:set attr="Header:Name#"  value="Authorization" />
<rsb:set attr="Header:Value#" value="Bearer 00D......2Wi" />

After saving the changes, test the API again and verify that it still retrieves data at the required granularity.

Salesforce access tokens expire after 2 hours by default. After that time the API call may stop working and you’ll need to get another access token. We’ll discuss replacing hardcoded values later in this document.


Example 3: Salesforce Lighting - Establish Paging

The first two steps are common to all paging implementations.

First add a new input named 'Rows@Next' to the list of columns inside the existing rsb:info block.

<input name="Rows@Next" desc="Identifier for the next page of results" />

Second, add the following to the rsb:script section just after the authentication settings:

<rsb:set attr="EnablePaging" value="TRUE" />

The Salesforce REST API enables paging by having each returned data set conveniently include a single, top-level element named nextRecordsUrl which points to the URL the client should use to fetch the next page.

Add the declaration to save this value:

<rsb:set attr="elementmappath#" value="/nextRecordsUrl" />
​ <rsb:set attr="elementmapname#" value="Rows@Next" />

It’s actually a partial URL, and doesn’t include the hostname (um2.salesforce.com) or protocol (https://). So these need to be included manually to construct the full URL.

The final conditional check, added just below the elementmapname declaration, looks like this:

<rsb:check attr="_input.Rows@Next">
<rsb:set attr="uri" value="https://um2.salesforce.com[_input.Rows@Next]" />
<rsb:else>
<rsb:set attr="uri" value="https://um2.salesforce.com/services/data/v20.0/query/?q=SELECT+name+from+Account" />
</rsb:else>
</rsb:check>

Now when you run this in an API Query you should find that it returns all the account names. If you only get 100 rows, make sure you have removed the default limit property.

The RSD script is fully functional as-is, but still contains a hardcoded access token. It’s bad practice to leave it this way, for two reasons:

  • It’s insecure: owning the access token is equivalent to knowing the Salesforce username and password, so it needs to be properly protected
  • Access tokens expire after two hours by default, so the existing API Profile has a very short shelf life

To remedy these things, the RSD needs to be parameterized as part of productionization.


Example 4: Salesforce Lighting - Productionize

Two steps remain to productionize the API Profile:

  • Remove the hardcoded access token, replacing it with a parameter
  • Obtain the access token at runtime and supply it to the RSD

After the API Profile has been parameterized, you can package the components into an Orchestration job.
 

Replacing the hardcoded access token with a parameter

RSD has a special syntax (described here) for parameters. In the RSD script, replace this line:

<rsb:set attr="Header:Value#" value="Bearer 00D...2Wi" />

With this line:

<rsb:set attr="Header:Value#" value="Bearer [_connection.access_token]" />

Now the script requires that the access token be supplied at runtime.

If you attempt to run or test the RSD now, it will fail because the parameter has not been supplied.

For testing purposes you can supply a parameter inside the Manage API Profile screen using a Parameter as follows:

  • Name: Other
  • Value: access_token=00D...2Wi

Test the API Profile again with the parameter in place to make sure it’s being picked up correctly. Remove the parameter again once the test has been conducted successfully.

Obtaining the access token at runtime

There are a number of different ways of achieving this, as outlined in this document.

For this worked example, we’ll replace the cURL command shown earlier with a Jython script in a Matillion Python Script component. The script is available for download at the end of this document.

The Orchestration job

There are three aspects to this:

  1. Job Variables, accepting all the necessary parameters including client_id, client_secret and refresh_token.
  2. The Python Script, which gets a new Salesforce access token using the client_id, client_secret and refresh_token.
  3. The API Query, which uses the access token to extract the Salesforce account data and load it into a database table.

In the Salesforce API Query component, go to Connection Options and add a new record:

  • Parameter: Other
  • Value: access_token=${access_token}

When calling this job from another Orchestration job, you must supply the job variables under the “Set Scalar Variables” property.

For security reasons it’s highly recommended that you do not hardcode the client_id, client_secret and refresh_token anywhere, or store them as environment variable defaults. Instead, implement password management functionality to protect them - for example as described in this article.