Users Brick

The users brick helps you manage users within your domain or workspace. You can do the following:

  • Add users to a domain
  • Add/remove users to/from a workspace
  • Update a user's information
  • Update a user's role in a workspace

Adding Users to a Domain and a Workspace

Adding a user consists of two consecutive steps:

  1. A user is added to a domain.
    The user can now access the platform but cannot access any workspace. In terms of access rights, the domain is not connected to any workspace, so a user added to the platform must be invited to a workspace before they can access it.
    A user can be added to the platform in either of the following ways:
    • The user can sign up themselves.
    • The domain admin adds the user.

      The domain admin can see all the users within the domain and manage them, as needed. The domain admin is the only person who can access the domain as an entity. For more information, see Your GoodData Domain.

  2. A user is added to a workspace.
    The user is now allowed to perform certain operations upon the workspace data according to their role. A user role is a set of permissions that a user is given within a particular workspace (for example, adminRole). See User Roles.
    A domain or workspace admin can invite a user to a workspace.

Updating Users in a Domain and a Workspace

When you use the users brick to update the domain, all the users that you provided in the source data are added to the domain.

  • The new users are added.
  • The existing users are updated.
  • If you did not provide an update for a specific user or a user's property, this user/user's property remains intact.

In other words, only the users/properties that you explicitly specified will be updated. If you did not mention a user/property in the input data, this user/property will not be touched.

Let's look at the example:

Notice that John's email was updated while his first name was not.
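
The additive behavior described above can be sketched in a few lines of Python. This is only an illustration of the documented rules, not the brick's actual implementation: the function name 'merge_domain_users', the sample logins, and the choice to treat an empty value as "not provided" are all assumptions.

```python
def merge_domain_users(existing, incoming):
    """Return the domain users after an additive, per-property update."""
    # Copy the stored users so the original mapping is not modified.
    merged = {login: dict(props) for login, props in existing.items()}
    for row in incoming:
        login = row["login"]
        # A login that is not in the domain yet is added.
        user = merged.setdefault(login, {"login": login})
        for key, value in row.items():
            # Assumption: an empty or missing value means "no update provided".
            if value not in (None, ""):
                user[key] = value
    return merged


# Hypothetical data: John gets a new email, Anna is a new user.
domain = {
    "john@example.com": {
        "login": "john@example.com",
        "first_name": "John",
        "email": "john@example.com",
    }
}
updates = [
    {"login": "john@example.com", "email": "john.doe@example.com"},
    {"login": "anna@example.com", "first_name": "Anna"},
]
result = merge_domain_users(domain, updates)
```

In this sketch, John's email changes while his first name stays as it was, and no user is ever removed from the domain.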

When you use the users brick to update a workspace, the input file describes what the workspace will look like after the update.

  • Users that are in the workspace but not in the input file will be removed from the workspace.
  • Users that are in the input file but not in the workspace will be added to the workspace.
  • Existing users are updated according to the input data.

Let's look at the example:

The input file is on the left. The workspace with its current data is in the middle.

Notice the following:

  • The user 'todd@example.com' was added to the workspace because they were in the data coming from ETL and not in the workspace.
  • The user 'seth@example.com' was removed from the workspace because they were in the workspace but not in the data coming from ETL.
  • The role of the user 'jane@example.com' was updated to what was in the input file because this user existed in both the input data and the workspace but the role was different.
  • The user 'john@example.com' remained the same as they were identical in both the input data and the workspace.
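
The four outcomes above amount to comparing two sets of logins. The following Python sketch shows that comparison under stated assumptions: the function name 'plan_workspace_sync' is hypothetical, and the roles assigned to each user are made up for illustration because the example does not list them.

```python
def plan_workspace_sync(current, desired):
    """Compare login -> role mappings and list the changes a sync implies."""
    in_both = set(current) & set(desired)
    return {
        # In the input data but not in the workspace.
        "add": sorted(set(desired) - set(current)),
        # In the workspace but not in the input data.
        "remove": sorted(set(current) - set(desired)),
        # In both, but with a different role.
        "update": sorted(l for l in in_both if current[l] != desired[l]),
        # In both and identical.
        "unchanged": sorted(l for l in in_both if current[l] == desired[l]),
    }


# Illustrative roles only; the logins come from the example above.
workspace = {
    "seth@example.com": "editorRole",
    "jane@example.com": "readOnlyUserRole",
    "john@example.com": "adminRole",
}
input_data = {
    "todd@example.com": "editorRole",
    "jane@example.com": "editorRole",
    "john@example.com": "adminRole",
}
plan = plan_workspace_sync(workspace, input_data)
```

Because the input data fully describes the target state, running the same comparison again after the changes are applied would yield empty 'add', 'remove', and 'update' lists.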

Prerequisites

Before using the users brick, make sure that the following is true:

  • A domain is implemented at your site.
  • A domain admin and a workspace admin exist in your domain.

Input

The users brick expects to receive the data about users.

The brick accepts a list of users with their properties, for example:

project_id                       | login                | first_name | last_name | role
tspv1le9afb94q47pehiub568ubkkqqw | john.doe@example.com | John       | Doe       | adminRole
tspv1le9afb94q47pehiub568ubkkqqw | anna.doe@example.com | Anna       | Doe       | adminRole

To review the values that the 'role' column can contain (that is, user roles that users can have), see User Roles.

Use the role identifiers, not the role names.

  • Correct: adminRole
  • Incorrect: Administrator

You can define custom roles for your workspaces (see Create a Custom User Role).


The values in the 'login' column are case-sensitive and must be written in lowercase.

  • Correct: john.doe@example.com
  • Incorrect: John.Doe@example.com
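
A quick check before you run the brick can catch logins that are not lowercase. The following Python sketch is one possible pre-flight check, not part of the brick; the function name 'non_lowercase_logins' is hypothetical.

```python
def non_lowercase_logins(rows, login_column="login"):
    """Return the logins that contain uppercase characters."""
    return [
        row[login_column]
        for row in rows
        if row[login_column] != row[login_column].lower()
    ]


rows = [
    {"login": "john.doe@example.com", "role": "adminRole"},
    {"login": "John.Doe@example.com", "role": "adminRole"},
]
bad_logins = non_lowercase_logins(rows)
```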

Minimal Required Input Data

Depending on what synchronization mode you choose (see sync_mode), the users brick treats different categories of input data as mandatory.

For example, the 'add_to_organization' mode requires at least login information to be present in the input data. That is, the input data must contain a column named 'login' with user logins. All the other missing information will be auto-populated, such as:

  • A missing first name will be set to 'FirstName'.

  • A missing last name will be set to 'LastName'.

  • A missing email will be set to be equal to the login.

  • Passwords will be auto-generated.

The 'sync_project' mode, however, requires login and role information to be present in the input data. The 'sync_one_project_based_on_pid' mode needs workspace IDs in addition to login and role information.

To find the minimal required input data for each synchronization mode, see sync_mode.
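
The rules in this section can be summarized in a short Python sketch. It covers only the three modes mentioned above and only the documented defaults; password generation is left out. The names 'REQUIRED_COLUMNS', 'missing_columns', and 'fill_domain_defaults' are hypothetical and do not come from the brick.

```python
# Minimal required columns for the modes discussed in this section.
REQUIRED_COLUMNS = {
    "add_to_organization": {"login"},
    "sync_project": {"login", "role"},
    "sync_one_project_based_on_pid": {"login", "role", "project_id"},
}


def missing_columns(sync_mode, columns):
    """Return the required columns that the input data does not contain."""
    return sorted(REQUIRED_COLUMNS[sync_mode] - set(columns))


def fill_domain_defaults(row):
    """Apply the documented defaults for missing user properties."""
    filled = dict(row)
    if not filled.get("first_name"):
        filled["first_name"] = "FirstName"
    if not filled.get("last_name"):
        filled["last_name"] = "LastName"
    if not filled.get("email"):
        # A missing email is set to be equal to the login.
        filled["email"] = filled["login"]
    return filled


missing = missing_columns("sync_project", ["login", "first_name"])
user = fill_domain_defaults({"login": "john.doe@example.com"})
```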

Mapping Your Column Names to Defaults

If your file is in the required format but the column names are different from the default names, you can map your column names to the default ones.

For example, suppose that first names in your input data are stored in a column called 'abc'. Your input data would look like this:

project_id                       | login                | abc  | last_name | role
tspv1le9afb94q47pehiub568ubkkqqw | john.doe@example.com | John | Doe       | adminRole
tspv1le9afb94q47pehiub568ubkkqqw | anna.doe@example.com | Anna | Doe       | adminRole

Using the 'first_name_column' parameter, you can map your column name to the default name of the column, which is 'first_name':

  "first_name_column": "abc"

You can map multiple column names, for example:

  "first_name_column": "abc",
  "authentication_modes_column": "auth_mode"

Here is the list of the parameters to use for mapping (for more information about all parameters, see Parameters): 

Field                      | Default column name  | Brick parameter for mapping the default name to your column name
User's first name          | first_name           | first_name_column
User's last name           | last_name            | last_name_column
User's login               | login                | login_column
User's email address       | email                | email_column
User's role                | role                 | role_column
Workspace ID               | project_id           | multiple_projects_column
User's password            | password             | password_column
User's SSO provider        | sso_provider         | sso_provider_column
User's authentication mode | authentication_modes | authentication_modes_column
User's group               | user_groups          | user_groups_column
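
Conceptually, the mapping parameters rename your columns to the default names before the data is processed. The following Python sketch illustrates that idea for a single input row. It is an assumption about the effect of the mapping, not the brick's code, and the names 'DEFAULT_COLUMNS' and 'normalize_row' are hypothetical.

```python
# Mapping parameter -> default column name, as listed in the table above.
DEFAULT_COLUMNS = {
    "first_name_column": "first_name",
    "last_name_column": "last_name",
    "login_column": "login",
    "email_column": "email",
    "role_column": "role",
    "multiple_projects_column": "project_id",
    "password_column": "password",
    "sso_provider_column": "sso_provider",
    "authentication_modes_column": "authentication_modes",
    "user_groups_column": "user_groups",
}


def normalize_row(row, params):
    """Rename custom column names in a row to the default names."""
    # Build "your column name" -> "default name"; unmapped columns keep defaults.
    rename = {
        params.get(param, default): default
        for param, default in DEFAULT_COLUMNS.items()
    }
    return {rename.get(column, column): value for column, value in row.items()}


params = {"first_name_column": "abc", "authentication_modes_column": "auth_mode"}
row = {"login": "john.doe@example.com", "abc": "John", "auth_mode": "password"}
normalized = normalize_row(row, params)
```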

Output

After the users brick has completed, you can expect the following results based on the parameters that you specified:

  • New users have been added to the domain.
  • New users have been added to a workspace.
  • The users not specified in the input data have been deleted from a workspace.
  • Existing users' data has been updated according to the input data.

Parameters

Name | Type | Mandatory? | Default | Description

domain

string

yes

n/a

The name of the domain where the brick is executed

input_source

JSON

yes

n/a

The source to take input data from. For more information on input data sources, see Types of Input Data Sources.

You must encode this parameter using the 'gd_encoded_params' and 'gd_encoded_hidden_params' parameters (see Specifying Complex Parameters).

CLIENT_GDC_HOSTNAME

string

see 'Description' column

secure.gooddata.com

The white-labeled domain name in the format of your.domain.com (for example, analytics.mycompany.com)

The 'CLIENT_GDC_HOSTNAME' parameter is mandatory only if your domain is white-labeled and you have defined the 'GDC_USERNAME' and 'GDC_PASSWORD' parameters (see in this table). Otherwise, the 'CLIENT_GDC_HOSTNAME' is optional.

If you define the 'CLIENT_GDC_HOSTNAME' parameter, you must also define the 'CLIENT_GDC_PROTOCOL' parameter.

The parameter name is case-sensitive and must be written in uppercase.

CLIENT_GDC_PROTOCOL

string

see 'Description' column

https

The protocol to transfer data over.

The 'CLIENT_GDC_PROTOCOL' parameter is mandatory only if your domain is white-labeled and you have defined the 'GDC_USERNAME' and 'GDC_PASSWORD' parameters (see in this table). Otherwise, the 'CLIENT_GDC_PROTOCOL' parameter is optional.

If you define the 'CLIENT_GDC_PROTOCOL' parameter, you must also define the 'CLIENT_GDC_HOSTNAME' parameter.

The parameter name is case-sensitive and must be written in uppercase.

multiple_projects_column

string

see 'Description' column

project_id

The name of the column in the input data source containing the workspace IDs (PIDs) or client IDs (CIDs) (if the column with PIDs is named differently in your input file, set up mapping)

The 'multiple_projects_column' parameter is mandatory when the 'sync_mode' parameter is set to either 'sync_multiple_projects_based_on_pid' or 'sync_one_project_based_on_pid'. Otherwise, the 'multiple_projects_column' parameter is optional.

GDC_USERNAME

string

no

n/a

Overrides the default user under whom the brick would be executed, and specifies the user under whom you want to execute the brick.
If this parameter is not set up, the brick is by default executed:

  • (When the brick is run automatically based on the schedule) Under the user who is specified as 'Executes under' in the brick schedule
  • (When the brick is run on demand; see Running Schedules On-Demand) Under the user who ran the brick on demand
  • (When the brick is run via API) Under the user who submitted the API call

If you define the 'GDC_USERNAME' parameter, you must also define the 'GDC_PASSWORD' parameter.

The parameter name is case-sensitive and must be written in uppercase.

GDC_PASSWORD

string

no

n/a

(Use only when the 'GDC_USERNAME' parameter is set up) The password for the user that you specified in the 'GDC_USERNAME' parameter.

The parameter name is case-sensitive and must be written in uppercase.

sync_mode

string

no

n/a

See sync_mode.

first_name_column

string

no

first_name

The name of the column in the input data source containing users' first names (if the column with first names is named differently in your input file, set up mapping)

last_name_column

string

no

last_name

The name of the column in the input data source containing users' last names (if the column with last names is named differently in your input file, set up mapping)

login_column

string

no

login

The name of the column in the input data source containing users' logins in a form of email addresses (if the column with login is named differently in your input file, set up mapping)
The values in the login column are case-sensitive and must be written in lowercase.
Once created, a login cannot be changed.

email_column

string

no

email

The name of the column in the input data source containing users' email addresses (if the column with emails is named differently in your input file, set up mapping). If not provided, the email from the login is used instead.

The values in the email column are case-sensitive and must be written in lowercase.

role_column

string

no

role

The name of the column in the input data source containing users' roles in the workspace (if the column with roles is named differently in your input file, set up mapping).

password_column

string

no

password

The column in the input data source containing users' passwords (if the column with passwords is named differently in your input file, set up mapping)

You do not have to provide passwords, and we do not recommend that you provide them. If not provided, the password is automatically generated when created for the first time, and either the user can change it later, or the password can be set with SSO.

sso_provider_column

string

no

sso_provider

The name of the column in the input data source containing the SSO provider (if the column with the SSO provider is named differently in your input file, set up mapping)

authentication_modes_column

string

no

authentication_modes

The name of the column in the input data source containing the users' authentication mode (if the column with the authentication modes is named differently in your input file, set up mapping)

whitelists

string

no

n/a

See whitelists and regexp_whitelists.

regexp_whitelists

string

no

n/a

See whitelists and regexp_whitelists.

authentication_modes

string

no

n/a

See authentication_modes.

ignore_failures

Boolean

no

false

Defines how the brick should behave in case of failures.

  • If not set or set to 'false', the brick will fail in case of any error in the input data (for example, a wrong role, a badly formatted email address, and so on).
  • If set to 'true', the brick will ignore any input data errors and will continue running.

We recommend that you set this parameter to 'false'. If you switch it to 'true', the data in your workspace may be inconsistent due to ignored errors.

REMOVE_USERS_FROM_PROJECT

Boolean

no

false

If set to 'true', project.import_users deletes users from the project.

data_product

string

no

Attempts to default to the only available data product.

Only the segments contained in the specified data_product are released.
If the specified data_product does not exist, it is created.
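
Several parameters in the table above depend on each other. The following Python sketch checks only the dependencies that the table states explicitly: the two mandatory parameters and the two pairs that must be defined together. It is a possible pre-flight check rather than the brick's own validation, and the names 'MANDATORY', 'PAIRED', and 'check_params' are hypothetical.

```python
# Parameters that the table marks as mandatory.
MANDATORY = ("domain", "input_source")

# Parameters that must be defined together.
PAIRED = (
    ("GDC_USERNAME", "GDC_PASSWORD"),
    ("CLIENT_GDC_HOSTNAME", "CLIENT_GDC_PROTOCOL"),
)


def check_params(params):
    """Return a list of problems found in the brick parameters."""
    problems = [
        f"missing mandatory parameter: {name}"
        for name in MANDATORY
        if name not in params
    ]
    for first, second in PAIRED:
        # Exactly one of the two being present is an error.
        if (first in params) != (second in params):
            problems.append(f"{first} and {second} must be defined together")
    return problems


# Hypothetical parameter set with a user name but no password.
problems = check_params(
    {"domain": "mydomain", "input_source": {}, "GDC_USERNAME": "admin@example.com"}
)
```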

sync_mode

The 'sync_mode' parameter specifies the synchronization mode for the users. You can choose from the following synchronization modes:

  • add_to_organization: Synchronize only the domain.
    Brick deployment: service workspace (see How to Use a Brick).
    Minimal required input data: user logins (the 'login' column is filled in). Missing information will be auto-populated. For more information, see Minimal Required Input Data.
    The input CSV is deduplicated by login in order to allow usage of the same input CSV as for other modes, which may be duplicated because of the project_id field.

  • remove_from_organization: Remove the defined user logins that were added by the add_to_organization synchronization mode.

  • sync_project: Synchronize one workspace.
    The users have to exist in the domain. If they do not, the brick will fail.

    Brick deployment: synchronized workspace (see How to Use a Brick).
    Minimal required input data: user logins and roles (the 'login' and 'role' columns are filled in). Missing information will be auto-populated. For more information, see Minimal Required Input Data.

  • sync_domain_and_project: Synchronize the domain and then the workspace.
    Use this mode when you have only one workspace, and splitting the domain and workspace synchronization into two tasks (synchronizing the domain and synchronizing the workspace) is not efficient.

    Brick deployment: synchronized workspace (see How to Use a Brick).
    Minimal required input data: user logins and roles (the 'login' and 'role' columns are filled in). Missing information will be auto-populated. For more information, see Minimal Required Input Data.

  • sync_multiple_projects_based_on_pid: Synchronize multiple workspaces from the same input source using a single process.
    Distributing users among workspaces is done based on workspace IDs (PIDs).
    Use this mode when you have several workspaces, and synchronizing them one by one is time-consuming.

    Brick deployment: service workspace (see How to Use a Brick).
    Minimal required input data: user logins, user roles, and workspace IDs (the 'login', 'role', and 'project_id' columns are filled in). That is, the input data should define what user should go to what workspace. Based on workspace IDs, the input data is partitioned, and each partition is used to synchronize the appropriate workspace.
    Missing information will be auto-populated. For more information, see Minimal Required Input Data.


  • sync_multiple_projects_based_on_custom_id: This mode is analogous to the sync_multiple_projects_based_on_pid mode above. The only difference is that the project_id column in the incoming CSV contains the LCM client_id instead of the project ID.

  • sync_domain_client_workspaces: LCM mode for full synchronization of the whole domain.
    This mode is fully declarative: it performs a full synchronization of the input data to the domain/segments. Any users or user filters that are in the clients but NOT in the input data will be deleted. In other words, what is in the input data will be in the clients, and anything extra is deleted.
    This mode accepts an optional parameter, SEGMENTS_FILTER (array), to work only with clients from the specified segments.

    When SEGMENTS_FILTER is used:
    • All clients outside the specified segments are untouched.
    • All users and user filters that are NOT listed in the input file are removed from the workspaces in the segments defined by SEGMENTS_FILTER, unless the parameter do_not_touch_users_that_are_not_mentioned / do_not_touch_filters_that_are_not_mentioned is set to true.

    The do_not_touch_users_that_are_not_mentioned / do_not_touch_filters_that_are_not_mentioned parameters can be used with all sync modes.

  • sync_one_project_based_on_pid: Synchronize one workspace from a single input source that may have input data for other workspaces, too. The brick will filter out the users for this particular workspace based on its ID (PID), and will ignore the rest of the data. To use this mode, you have to know the workspace ID.

    Brick deployment: synchronized workspace (see How to Use a Brick).
    Minimal required input data: user logins, user roles, and the workspace ID (the 'login', 'role', and 'project_id' columns are filled in). Missing information will be auto-populated. For more information, see Minimal Required Input Data.


  • sync_one_project_based_on_custom_id: Synchronize one workspace from a single input source that may have input data for other workspaces, too. The brick will filter out the users for this particular workspace based on its ID (PID), and will ignore the rest of the data.
    However, you may not know the workspace ID (PID).

    Instead of the unknown PID, you are going to use an internal ID (called 'custom workspace ID'): generate an internal ID for the workspace. When the workspace is spun up, this custom ID is stored in the workspace metadata. This way, the PID (that you do not know) is mapped to the custom ID (that you have generated). By the custom ID, the brick will be able to identify the workspace and obtain its PID.

    Brick deployment: synchronized workspace (see How to Use a Brick).
    Minimal required input data: user logins, user roles, and the workspace ID (the 'login', 'role', and 'project_id' columns are filled in; the 'project_id' column contains the custom IDs (internal ID that you generated) or client IDs (CIDs)). Missing information will be auto-populated. For more information, see Minimal Required Input Data.


    Notice that there are three groups of processes differentiated by color. The advantage is that these processes do not have to be synchronized and can run at their own pace.
    • Red: You load the data. At some point, the data is picked up and put into storage. This data contains the custom ID that would allow for sorting the data without knowing in which workspace they would end up.
    • Yellow: At some point, the process responsible for maintaining workspaces and deploying them starts. The process identifies that a new workspace (Project 4) has to be spun up, so it spins it up. A part of this is deploying an ETL process and marking the deployed workspace with the custom ID.
    • Gray: At some point, the ETL starts and processes the data. If it runs, it means that the data for this workspace is already in the storage.

Let's look at how the ETL will run:


On the top, you can see datasets with data. There are two workspaces referenced there by custom IDs. All the other datasets use the custom IDs as a reference to the workspaces. Once the ETL starts, it accesses the data and processes it. One of the output objects will be a file that provides data about users in a particular workspace (bottom left).
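
Two of the data-handling steps described for the synchronization modes, partitioning the input by workspace ID and deduplicating it by login, can be sketched in Python as follows. This is an illustration only. The function names are hypothetical, and keeping the first row for each login is an assumption; the documentation does not say which duplicate the brick keeps.

```python
from collections import defaultdict


def partition_by_workspace(rows, id_column="project_id"):
    """Group input rows by workspace ID (multi-workspace modes)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[id_column]].append(row)
    return dict(partitions)


def dedupe_by_login(rows):
    """Keep one row per login (add_to_organization mode)."""
    unique = {}
    for row in rows:
        # Assumption: the first occurrence of a login wins.
        unique.setdefault(row["login"], row)
    return list(unique.values())


# Hypothetical workspace IDs 'pid_a' and 'pid_b'.
rows = [
    {"project_id": "pid_a", "login": "john.doe@example.com", "role": "adminRole"},
    {"project_id": "pid_b", "login": "john.doe@example.com", "role": "adminRole"},
    {"project_id": "pid_a", "login": "anna.doe@example.com", "role": "adminRole"},
]
partitions = partition_by_workspace(rows)
domain_rows = dedupe_by_login(rows)
```

A mode that synchronizes a single workspace from shared input data would use only the partition whose key matches that workspace and ignore the rest.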

whitelists and regexp_whitelists

The 'whitelists' and 'regexp_whitelists' parameters define users to exclude from the processing.

Typically, in your workspace you have users that are there for business reasons. However, sometimes you would also have technical users (users deploying the ETL processes), users from vendors, and so on.

When updating the workspace, these non-business users will be deleted from the workspace unless explicitly specified in the input data. To avoid this, you can whitelist users or classes of users who should be excluded from the process of adding and deleting users. For example:

  "whitelists" : ["etl_admin@gooddata.com", "etl_tester@gooddata.com"]
  "whitelists" : ["etl_admin@mydomain.com"]
  "regexp_whitelists" : ["etl.*@gooddata\.com", "admin[0-9]+.*@gooddata\.com"]

Parameters with a complex structure must be encoded with a special parameter called 'gd_encoded_params'. For more information, see Specifying Complex Parameters.

We recommend that you avoid using these parameters or use them as little as possible. If you decide to exclude some users, you have to always remember what users are excluded in what workspaces and act accordingly when you update users in these workspaces. Having too many users excluded from processing may cause data inconsistency in your workspaces.
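
To see which logins a given pair of lists would exclude, you can test them outside the brick. The following Python sketch assumes that 'whitelists' entries are compared as exact strings and that 'regexp_whitelists' patterns may match anywhere in the login. The brick's exact matching rules may differ, and the function name 'is_whitelisted' is hypothetical.

```python
import re


def is_whitelisted(login, whitelists=(), regexp_whitelists=()):
    """Return True if the login is excluded from adding and deleting users."""
    if login in whitelists:
        return True
    # Assumption: a pattern may match anywhere in the login.
    return any(re.search(pattern, login) for pattern in regexp_whitelists)


whitelists = ["etl_admin@mydomain.com"]
regexp_whitelists = [r"etl.*@gooddata\.com", r"admin[0-9]+.*@gooddata\.com"]

logins = [
    "etl_admin@mydomain.com",
    "etl_loader@gooddata.com",
    "admin42.ops@gooddata.com",
    "jane@example.com",
]
excluded = [l for l in logins if is_whitelisted(l, whitelists, regexp_whitelists)]
```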

authentication_modes

The 'authentication_modes' parameter specifies how users can access the platform. You can choose from the following authentication modes:

  • password: Users access the platform using their credentials.
  • sso: Users access the platform via SSO.

You can set up the authentication mode in the following ways:

  • Globally for all synchronized users: All users receive the same setting (password or SSO). This way, you do not have to specify authentication mode for each user and just set it globally for everybody in your process/schedule parameters. You can specify one or several values.
    Let's look at the example with one mode specified:

    "authentication_modes": "password"
    

    Another example with several mode values:

    "authentication_modes": ["password", "sso"]
    

    When 'sso' is set for 'authentication_modes' globally, you must provide the 'sso_provider' value as well. See Parameters above.

    Parameters with a complex structure must be encoded with a special parameter called 'gd_encoded_params'. For more information, see Specifying Complex Parameters.

    When the 'authentication_modes' parameter is set, any user-specific authentication mode settings will be ignored. 

  • Per user setup driven by data: Each user has their own specific authentication mode. Your input data would look something like this:

    login                | first_name | last_name | authentication_modes
    anna.doe@example.com | Anna       | Doe       | password
    john.doe@example.com | John       | Doe       | "password, sso"

    If you want to specify several values for the authentication mode in your input data, put these values inside quotation marks and separate them with commas.

    When you set the authentication mode per user, do not specify the 'authentication_modes' parameter in your scheduled process. The brick will look into the process parameters first, will not find the globally set authentication mode, and will proceed looking for it in the input data.
    If you do set the 'authentication_modes' parameter, the process will take it as the first choice and will ignore any user-specific authentication mode settings.
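
The precedence described above (the global parameter first, then the per-user value from the input data) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the brick's implementation. The function names are hypothetical, and returning an empty list when no mode is set anywhere is an assumption.

```python
def parse_modes(value):
    """Turn "password, sso" or ["password", "sso"] into a list of modes."""
    if isinstance(value, str):
        return [mode.strip() for mode in value.split(",") if mode.strip()]
    return list(value or [])


def resolve_authentication_modes(row, params):
    """Return the modes for one user; the global parameter takes precedence."""
    if "authentication_modes" in params:
        # The global setting overrides any user-specific value.
        return parse_modes(params["authentication_modes"])
    return parse_modes(row.get("authentication_modes"))


john = {"login": "john.doe@example.com", "authentication_modes": "password, sso"}
per_user = resolve_authentication_modes(john, {})
overridden = resolve_authentication_modes(john, {"authentication_modes": "password"})
```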