Problem: you often need a copy of the data from your live Salesforce
instance loaded into another instance. Salesforce does this
automatically when refreshing a Full Copy Sandbox, but what about a
Config Only Sandbox, a Developer Sandbox or even a Developer Edition
instance? For those instances you need to load your own sample data,
and this is a complicated process when you have to deal with multiple
entities, their relationships and the space limits of the target
instance.
Solution: run this tool from ANT to load some or all of the data from
a backup or Export of your live Salesforce instance. You can choose
which entities are loaded and how much data is loaded from each so that
you don't exceed the space limits of the target instance.
How does it work?: The tool scans your backup file for all entities
that you wish to load into your target instance and uses the upsert
call to load these records. This means that you must create an External
Id field on every entity that you load into your target instance. The
benefit of using the upsert call is two-fold. First, you can run your
load multiple times and it will not duplicate any data. Second, the
upsert call lets the load automagically match records through their
relationships, e.g. Accounts and their related Contacts will be
correctly linked. The import will only load records whose parent
records have already been loaded, e.g. if an Account has been skipped
then its Contacts will not be loaded. This ensures that related records
get priority during the load.
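The parent-first rule can be pictured with a small sketch. This is not the tool's own code (the tool is a Java ANT task); it is a hypothetical Python model in which each record is a dict, and the names select_loadable, id and parent_id are assumptions made purely for illustration.

```python
def select_loadable(entities):
    """Return the record ids that would be loaded, entity by entity.

    `entities` is an ordered list of (name, records) pairs, parents first.
    Each record is a dict with an "id" and an optional "parent_id".
    """
    loaded_ids = set()
    result = {}
    for name, records in entities:
        kept = []
        for record in records:
            parent_id = record.get("parent_id")
            # Keep a record that has no parent, or whose parent is already loaded.
            if parent_id is None or parent_id in loaded_ids:
                kept.append(record["id"])
        loaded_ids.update(kept)
        result[name] = kept
    return result


# Suppose Account A3 exists in the backup but was skipped (e.g. by a limit).
accounts = [{"id": "A1"}, {"id": "A2"}]
contacts = [
    {"id": "C1", "parent_id": "A1"},
    {"id": "C2", "parent_id": "A3"},  # parent was skipped, so C2 is dropped
    {"id": "C3"},                     # no Account at all, so it is still loaded
]
plan = select_loadable([("Account", accounts), ("Contact", contacts)])
print(plan)
```

In this model the order of the entity list matters: if Contact were listed before Account, only C3 would be selected.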
Why build this tool?: Most of my work is consulting and
training for Salesforce customers and Partners. I often see them
struggling with the Dev/Test/Live environment creation. This tool can
automate the preparation of the most critical part of the setup - the
data. It's a way of giving back to the community which pays my bills. I
hope you find it useful.
Note: it is also possible to copy your Salesforce Setup (metadata)
using ANT, e.g. copy your config from live or sandbox into a Developer
Edition. This can be done using the Retrieve and Deploy tasks that are
part of the Force.com Migration Tool. Using that toolkit in conjunction
with this import tool, it should be possible to make a complete copy of
your live setup and some sample data in a fully automated ANT build.
This gives every developer the ability to use their own instance rather
than depending on one sandbox for all development, test and training
processes.
Requirements: a valid ANT configuration using Java 1.5 or higher. ANT
can be run from a DOS/UNIX command line or from a Java IDE, e.g.
Eclipse. The specific versions are...
IMPORTANT: this tool is supplied "as is" and is not
supported by Salesforce.com or Astitch.net. All care is taken to provide
a bug-free and useful tool but no warranty or technical support is
implied or provided. Sorry but I have to make this statement to avoid
issues with potential users of this tool.
If you would like to report a bug, you can do so using the Astitch
Support form. If you would like to track any updates, you can follow me
on Twitter.
Documentation
Running the import: to use this tool, first install Java
and ANT or Eclipse as described above. Then download the import tool and
unzip. Follow these steps to configure and run the tool in a Windows
command prompt...
1. Generate an export from your live Salesforce instance or Full Copy
   Sandbox. Download this file.
2. Edit the import.bat file. Set the correct locations for JAVA_HOME
   and ANT_HOME. IMPORT_TOOL_HOME is the location where you unzipped
   the downloaded file. You can leave this as the default unless you
   want to move the jar file.
3. Edit the invoke-sample.properties file. Set all the properties in
   that file; all are required. Note: the target instance must have the
   same config (at least the same custom fields) as your source
   instance.
4. Edit the invoke-sample.xml file. This file controls which data you
   would like to load from the backup file. For more information on the
   structure of this file, see below.
5. Run import.bat from a command prompt. This is a Windows-only sample
   but you should be able to port it to UNIX easily if required.
Test Mode: when configuring your import, it is useful to run the
import in testmode, which scans the file but does not send the records
to Salesforce. You can use this mode to check the number of records
that will be loaded, to ensure that you do not exceed the space limits
of your target instance. If you wish to see specifically which records
are being loaded or ignored, there are two options.
You can enable ANT DEBUG mode and the output will display a
record selection log. This is done by adding a -debug parameter to the
ant invocation in the import.bat file.
You can set the recordLogging attribute to true. This is already set
in the sample build file.
Note that in test mode the tool still connects to Salesforce to get the
describe of the target entities, but it does not load any records.
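Once test mode has told you how many records would be loaded, a rough space check is simple arithmetic. The sketch below is only an estimate under stated assumptions: it uses the commonly documented figure of about 2 KB of data storage per record (some standard objects are counted differently), the record counts are made-up numbers typed in by hand, and limit_mb is a hypothetical placeholder that you should replace with the limit shown on your target org's storage usage page.

```python
KB_PER_RECORD = 2  # assumption: typical data storage counted per record


def estimate_storage_mb(record_counts, kb_per_record=KB_PER_RECORD):
    """Estimate data storage in MB from a dict of entity name -> record count."""
    total_records = sum(record_counts.values())
    return total_records * kb_per_record / 1024


# Hypothetical counts, copied by hand from a test mode run.
counts = {"Account": 500, "Contact": 1500, "Opportunity": 400}
limit_mb = 5.0  # hypothetical limit, check your own target instance

estimate = estimate_storage_mb(counts)
fits = estimate <= limit_mb
print(f"Estimated {estimate:.2f} MB of {limit_mb} MB, fits: {fits}")
```

If the estimate is too close to the limit, lowering the limit attribute on the largest entities and re-running in test mode is the obvious next step.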
If you prefer, you can run this tool from Eclipse. Eclipse has
integrated ANT support. You can open the xml file inside Eclipse using
the ANT editor and then invoke it from there. You will need to add the
backup-import.jar to the ANT runtime preferences. This can be done by
adding the jar to Ant/Runtime/Global Entries in the Preferences.
Customising the import process: the invoke-sample.xml file
has a sample load that you can use to get started. Of course you will
want to change the import to work for your specific needs. Below is an
explanation of all the settings that are possible in the build file.
importbackup: attributes:
  externalidfield: the API name of the External Id field that you
    need to create for all entities which are being loaded into your
    target instance. This must be the same for all entities.
  owneridloaded: a boolean which can suppress the loading of the
    OwnerId field for all records. Set this to false when you are
    loading into a Developer Edition, which only allows two users, i.e.
    the OwnerId will be invalid in the target instance. All your data
    will then be owned by the loading user.
  batchsize: an integer which controls the number of records sent to
    Salesforce in each round trip. Defaults to 200.
  testmode: a boolean which makes the import only scan the data and
    not send the records to Salesforce. Useful for testing which data
    will be loaded if space is a concern.
  recordLogging: a boolean which reports which records are loaded or
    ignored, and the reason why a record is ignored. Defaults to false.
  server: the Salesforce server. Defaults to www.salesforce.com, which
    serves all live orgs including Developer Editions. Use
    test.salesforce.com if you are loading into a sandbox.
  username: the Salesforce username for the target instance.
  password: the Salesforce password for the target instance.
  token: the Salesforce API token for the target instance.
  backupfile: the location of the export file from the source
    instance.
entity: each entry describes an entity to be loaded from the export
file. attributes:
  readfromsfdc: load this entity from Salesforce instead of from the
    CSV. This does not do any writes to Salesforce but it stores all
    the record ids in memory so that any records that have
    relationships to these records will be loaded. This is typically
    used with the User entity when loading into a sandbox.
  apiname: the API name of the entity. This matches the csv file in
    the backup file and the target entity in Salesforce.
  limit: an integer which controls the maximum number of records
    loaded for this entity. Useful to avoid using up too much space.
  skip: an integer used to skip over records in the export file.
    Useful to load a sample of records from the entire backup instead
    of the first N records.
filter: used to filter records so that a subset of the exported data
is loaded. More than one filter is supported; all must be true for the
record to be loaded. attributes:
  field: the API name of a field in the backup file for the parent
    entity. This is filtered using the regex below.
  regex: a regular expression (using Java Regex) which will be applied
    to the field above. Useful when you want to select specific records
    from the backup.
  isnull: matches records which have a null value for this field.
    Must be used exclusively, i.e. it cannot be combined with a regex
    in the same filter.
fieldvalue: used to override values in the export. Useful when loading
data into an org that is not an exact match of the production instance,
e.g. a Developer Edition. attributes:
  name: the API name of a field to be overridden.
  regex: a regular expression (using Java Regex) which will be used to
    test if this override should be applied, e.g. ".*" detects the
    presence of any value.
  value: the value to be applied if the regex is matched.
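To make the filter and fieldvalue settings more concrete, here is a minimal sketch of how they might be evaluated. It is a hypothetical Python model, not the tool's Java implementation. The function names and sample data are invented for illustration. The sketch assumes that a regex must match the whole field value and that an empty CSV value counts as null. Python's re module stands in for Java Regex, which behaves the same way for the simple patterns used here.

```python
import re


def passes_filters(record, filters):
    """Return True only if every filter accepts the record."""
    for flt in filters:
        value = record.get(flt["field"]) or None  # assumption: "" means null
        if flt.get("isnull"):
            if value is not None:
                return False
        elif value is None or re.fullmatch(flt["regex"], value) is None:
            return False
    return True


def apply_fieldvalues(record, overrides):
    """Return a copy of the record with every matching override applied."""
    updated = dict(record)
    for override in overrides:
        current = updated.get(override["name"])
        # Only a non-empty value is tested, so ".*" means "any value present".
        if current and re.fullmatch(override["regex"], current):
            updated[override["name"]] = override["value"]
    return updated


records = [
    {"Name": "Acme Ltd", "BillingCountry": "UK", "ParentId": "", "OwnerId": "005A"},
    {"Name": "Acme Inc", "BillingCountry": "US", "ParentId": "", "OwnerId": "005B"},
    {"Name": "Acme Sub", "BillingCountry": "UK", "ParentId": "001X", "OwnerId": ""},
]
filters = [
    {"field": "BillingCountry", "regex": "UK|IE"},  # country must be UK or IE
    {"field": "ParentId", "isnull": True},          # and there must be no parent
]
overrides = [{"name": "OwnerId", "regex": ".*", "value": "admin-user"}]

selected = [apply_fieldvalues(r, overrides) for r in records if passes_filters(r, filters)]
print(selected)
```

Only "Acme Ltd" satisfies both filters in this model, and its OwnerId is replaced because a value is present.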
Best Practices:
When loading into a sandbox, if your records have relationships to
User records other than owner then you must use the readfromsfdc entity
attribute so that the load knows which Users exist in Salesforce. The
User object must also have an External Id field, populated with the
User Id for every record. This might seem odd but the upsert call
always uses an External Id for parent matching, so this is required
when User is a parent to any record.
Use the limit and skip attributes on parent
entities e.g. Account and then use no skips for child entities e.g.
Contacts. This will load a sample of Accounts and only Contacts that
match those Accounts. Note that Contacts with no Account will be
included when loading this way.
Make sure that you load your entities in hierarchical order
i.e. load parent entities before related child entities. The load will
exclude records that point to a parent that has not been loaded.
If an Entity has a hierarchical relationship (i.e. a
relationship to itself) then run the load multiple times. This will
ensure that parent records exist for child records. Because the import
is using upsert, there is no duplication when re-running.
If an entity has a parent relationship to the standard User object and
your users do not match (e.g. you are loading from production to a
Developer Edition) then use a fieldvalue to override the id whenever it
is present. This can be done using regex=".*" which will match and
override any user id that is present in the export. Note that the value
should be an External Id value in the User object in this case.
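The advice above about re-running the load for an entity that points to itself can be illustrated with a short simulation. This is a hypothetical Python model rather than the tool's code. It assumes the worst case, in which a run can only match parents that were loaded by an earlier run, and the names run_pass, id and parent_id are invented for the example. Sets are used so that reloading a record never duplicates it, mirroring upsert.

```python
def run_pass(records, loaded_ids):
    """Simulate one import run and return the set of ids loaded so far."""
    newly_loaded = set()
    for record in records:
        parent_id = record.get("parent_id")
        # A record is loaded if it has no parent or its parent already exists.
        if parent_id is None or parent_id in loaded_ids:
            newly_loaded.add(record["id"])
    return loaded_ids | newly_loaded


# Account hierarchy Global -> Europe -> UK, listed child-first in the export.
accounts = [
    {"id": "UK", "parent_id": "Europe"},
    {"id": "Europe", "parent_id": "Global"},
    {"id": "Global"},
]

loaded = set()
history = []
while True:
    updated = run_pass(accounts, loaded)
    if updated == loaded:  # nothing new was loaded, so the hierarchy is complete
        break
    loaded = updated
    history.append(sorted(loaded))

print(history)
```

In this model the number of runs needed equals the depth of the hierarchy, which is three for the sample data.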
Limitations: currently only entities that can have an
External Id custom field can be loaded. This means that standard
junction entities (e.g. CampaignMember or OpportunityContactRole) cannot
be loaded. If there is enough interest in this tool then I will enhance
it to support loading of these entities as well.