---
title: User Ingestion Processors Guide
excerpt: >-
  Processors are data transformation tools that help clean, filter, and enrich
  user data during identity ingestion
deprecated: false
hidden: false
metadata:
  robots: index
---

## Overview

**Processors** are **optional data transformation steps** in your User Import Flow. They clean, filter, and enhance user data during ingestion from source systems (Okta, Active Directory, Workday, etc.).
**Note: Most processors operate on individual source data before merging, unless specifically stated otherwise. Processors execute in the order listed.**

***

## Table of Contents

1. [Quick Reference](#quick-reference)
2. [How to Configure](#how-to-configure)
3. [Available Processors](#available-processors)
4. [Best Practices](#best-practices)
5. [Common Scenarios](#common-scenarios)
6. [Troubleshooting](#troubleshooting)
7. [Rule Syntax Reference](#rule-syntax-reference)
8. [Limitations](#limitations)

***

## Quick Reference

| Processor | Use Case |
| --- | --- |
| **User Filter Processor** | Remove users matching specific field values |
| **Filter Rule Post Processor** | Complex filtering with multiple conditions |
| **User Timezone Processor** | Auto-populate timezone from location |
| **User Password Meta Info Processor** | Fill missing password expiration dates |
| **User Geocode Processor** | Add coordinates for analytics and location-based content retrieval |
| **DSL First Match Dedupe Processor** | Deduplicate users across sources |
| **Unified Resolve Manager Processor** | Link manager-employee hierarchy |

***

## How to Configure

### Navigation

1. Navigate to **Import Users**
2. Select your source(s) on the **Connectors** page
3. Proceed to **Configure Selected Sources**
4. Click **Advanced Mode**

### In Advanced Mode

**Processors to Apply:** Add transformation processors that run during ingestion. Processors execute in the order listed.

**Filter and Attribute List:** Control which records and fields are imported at the source level, before any processors run.

***

## Available Processors

### Filter Users by Field Value

**Processor Name:** User Filter Processor

Excludes users from ingestion when a specified field matches any value in your exclusion list. This is a simple, single-field filter that performs exact matching.
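As a rough illustration, the exclusion behaves like the following sketch (Python, with illustrative field names and sample records; the actual processor runs inside the ingestion pipeline, not as user code):

```python
def filter_users(users, filter_key, filter_list):
    """Drop any user whose filter_key value exactly matches an excluded value."""
    # Build the exclusion set from the comma-separated Filter List
    excluded = {value.strip() for value in filter_list.split(",")}
    # Keep only users whose field value is NOT in the exclusion set (exact match)
    return [user for user in users if user.get(filter_key) not in excluded]


users = [
    {"email_addr": "alice@company.com", "employment_status": "Active"},
    {"email_addr": "bob@company.com", "employment_status": "Terminated"},
]
kept = filter_users(users, "employment_status", "Terminated, Inactive")
# Only alice remains; bob is excluded because "Terminated" is in the Filter List
```

Note that matching is exact: a value of `terminated` (lowercase) would not match an exclusion entry of `Terminated`.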
**Use Cases:** Exclude terminated employees, contractors, test accounts, or specific departments based on a single field value.

**Configuration:**

| Field | Description | Example |
| --- | --- | --- |
| **Filter Key** | User field to check | `employment_status` |
| **Filter List** | Values to exclude (comma-separated) | `Terminated, Inactive` |

**Examples:**

```
Exclude inactive employees:
Filter Key: employment_status
Filter List: Terminated, Inactive, On Leave

Exclude non-employees:
Filter Key: user_type
Filter List: Contractor, Temp, External
```

***

### Filter Users by Rule

**Processor Name:** Filter Rule Post Processor

Excludes users from ingestion based on complex conditional logic. Unlike the simple Field Value filter, this processor lets you combine multiple field conditions with AND/OR logic, perform date comparisons, and apply more sophisticated filtering rules.

**Use Cases:** Multi-condition filtering (e.g., "active AND hired after date"), date-based filtering, or any logic requiring multiple field comparisons.

**Configuration:**

| Field | Description |
| --- | --- |
| **Filter Condition (DSL)** | Rule determining which users to keep |

**Examples:**

```
Keep only active employees:
employment_status == "Active"

Active employees in specific departments:
employment_status == "Active" AND department IN ["Engineering", "Sales"]

Keep only users with company email:
"@company.com" IN email_addr

Keep only users with employee IDs:
employee_id != ""
```

> **💡 Tip:** See [Rule Syntax Reference](#rule-syntax-reference) for complete syntax.

***

### Remove Duplicate Users

**Processor Name:** DSL First Match Dedupe Processor

When the same user appears multiple times (identified by your Index Key, typically email), this processor evaluates all duplicate records and keeps only the first one that matches your filter condition.
All other duplicates are discarded. This processor operates across all sources, after they are merged.

**Use Cases:** Multiple integrations provide overlapping users; you need to choose which source's data to prioritize; you want to ensure each user appears only once in the final roster.

> **⚠️ Note:** Can be attached to any source - it operates on merged data from all sources after ingestion.

**Configuration:**

| Field | Description | Example |
| --- | --- | --- |
| **Index Key** | Field to identify duplicates | `email_addr` |
| **Filter Condition (DSL)** | Rule to select which duplicate to keep | `record.employee_id != ""` |
| **Lowercase** | Convert index key to lowercase | `true` (recommended for emails) |

**Common Rules:**

```
Prefer active users:
record.employment_status == "Active"

Prefer records with employee ID:
record.employee_id != ""
```

> **⚠️ Important:** Always set **Lowercase** to `true` when using `email_addr` as the Index Key.

***

### Set User Timezone

**Processor Name:** User Timezone Processor

Automatically infers and populates the user's timezone field by analyzing their location information (city, state, country). The processor uses geographic data to determine the most likely timezone for each user's location.

**Use Cases:** The source system doesn't provide a timezone field, or you need consistent timezone data for time-based notifications and scheduling.

**Configuration:** No configuration needed - just add the processor. It automatically reads from standard location fields.

***

### Calculate Password Expiration

**Processor Name:** User Password Meta Info Processor

Fills in missing password date information using your organization's password policy configuration. This processor operates on two fields in the user record: `password_last_changed` and `password_expires`.
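The date-filling rules described in this section amount to simple date arithmetic. A minimal sketch, assuming `password_expiry_in_days` is supplied from the org's password policy and the dates are parsed `date` values (the real processor works on the `password_meta_info` fields of the user record):

```python
from datetime import date, timedelta


def fill_password_dates(meta, password_expiry_in_days, offset_days=0):
    """Fill whichever of the two password dates is missing, per org policy."""
    delta = timedelta(days=password_expiry_in_days + offset_days)
    if meta.get("password_last_changed") and not meta.get("password_expires"):
        # Expiry is missing: project forward from the last change
        meta["password_expires"] = meta["password_last_changed"] + delta
    elif meta.get("password_expires") and not meta.get("password_last_changed"):
        # Last-changed is missing: project backward from the expiry
        meta["password_last_changed"] = meta["password_expires"] - delta
    # Both present or both missing: no action
    return meta


meta = fill_password_dates({"password_last_changed": date(2024, 1, 1)}, 90)
# meta["password_expires"] is date(2024, 3, 31): 90 days after the last change
```

The `offset_days` argument mirrors the processor's **Offset Days** setting, which adjusts the policy duration when the source system's policy differs from the org configuration.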
**What Fields It Uses:**

* **Input fields**: `password_meta_info.password_last_changed` (date), `password_meta_info.password_expires` (date)
* **Password policy**: Uses your org's configured `password_expiry_in_days` setting
* **Output**: Populates whichever field is missing

**How It Works:**

| Scenario | What It Does | Calculation |
| --- | --- | --- |
| `password_meta_info.password_last_changed` exists, `password_meta_info.password_expires` empty | Calculates expiry date | `password_expires = password_last_changed + password_expiry_in_days` |
| `password_meta_info.password_expires` exists, `password_meta_info.password_last_changed` empty | Calculates last changed date | `password_last_changed = password_expires - password_expiry_in_days` |
| Both fields populated | No action taken | (already complete) |
| Both fields empty | No action taken | (insufficient data) |

**Configuration:**

| Field | Description | Default | Use Case |
| --- | --- | --- | --- |
| **Offset Days** | Adjustment to the password policy duration | `0` | Use if the source system's policy differs from the org config (e.g., +5 or -5 days) |

**Use Cases:**

* Source provides only one of the two password fields
* Need complete password data for expiry notifications and password reset workflows
* Source system's password policy differs slightly from the Moveworks org configuration

***

### Add Location Coordinates

**Processor Name:** User Geocode Processor

Enriches user records with geographic coordinates (latitude/longitude) by geocoding their location information.
The processor constructs a location query from the specified fields, sends it to a geocoding service, and adds the resulting coordinates to the user's `geocodes` field.

**What Fields It Uses:**

* **Input**: Any combination of location fields you specify (typically `country_code`, `state`, `city`)
* **Output**: Populates the `geocodes` field with latitude/longitude data

**Use Cases:**

* Enable location-based analytics and reporting
* Support features that require geographic coordinates
* Enrich user profiles with precise location data

> **⚠️ Important:** Attach to the source that contains the location fields you want to geocode.
>
> **Performance Note:** Makes external API calls for geocoding - this may slow ingestion for large user sets.

**Configuration:**

| Field | Description | Example |
| --- | --- | --- |
| **Location Fields** | Fields for geocoding | `country_code, state, city` |

***

### Resolve Manager Relationships

**Processor Name:** Unified Resolve Manager Processor

Establishes manager-employee relationships by resolving manager email addresses to internal user IDs. This processor builds an index of all users (email → ID), then replaces each user's `manager_email` field value with the corresponding manager's internal ID, enabling a proper organizational hierarchy.

**What Fields It Uses:**

* **Input**: `manager_email` (manager's email address)
* **Index built from**: `email_addr` (all users' emails)
* **Output**: Replaces the `manager_email` value with the manager's internal identifier

**How It Works:**

1. Builds an index mapping every user's email address to their internal ID
2. For each user record, looks up their `manager_email` in the index
3. Replaces the email with the manager's internal ID
4. Result: proper manager-employee links throughout the organization

**Use Cases:**

* Source provides manager email instead of manager ID
* Need to build an organizational reporting hierarchy
* Manager data comes from a different source than employee data

> **⚠️ Note:** Can be attached to any source - it operates on all users after the merge. Add it AFTER deduplication to ensure manager links resolve correctly.

**Configuration:** No configuration needed - just add the processor.

***

## Best Practices

### 1. Filter Early

Add filter processors before enrichment (like geocoding) to reduce processing time.

```yaml
✅ Good Order:
1. Filter Users by Field Value (remove terminated)
2. Set User Timezone
3. Add Location Coordinates

❌ Bad Order:
1. Add Location Coordinates (slow)
2. Filter Users by Field Value (wastes processing)
```

### 2. Deduplicate Before Manager Resolution

If using both processors, always apply deduplication first.

```yaml
✅ Correct Order:
1. Remove Duplicate Users
2. Resolve Manager Relationships

❌ Incorrect Order:
1. Resolve Manager Relationships
2. Remove Duplicate Users
```

### 3. Use Lowercase for Email Deduplication

When deduplicating by email, always set **Lowercase** to `true`.

```yaml
✅ Correct:
Index Key: email_addr
Lowercase: true
```

### 4. Attach Geocode to the Source with Location Data

Add the geocode processor to the source that has location fields (`country_code`, `state`, `city`).

### 5. Test with Sample Data First

1. Configure the processor on a test integration
2. Run ingestion with a small sample
3. Verify results match expectations
4. Apply to production

***

## Common Scenarios

### Scenario 1: Basic Filtering

**Goal:** Exclude terminated and inactive users from Okta.

**Steps:**

1. **Import Users** → Select Okta → **Advanced Mode**
2. In **Processors to Apply**, add: **Filter Users by Field Value**
3. Configure: Filter Key: `employment_status`, Filter List: `Terminated, Inactive`

***

### Scenario 2: Multi-Source with Deduplication

**Goal:** Use both Okta and Workday, preferring records with employee IDs.

**Okta Source:**

1. **Import Users** → Select Okta → **Advanced Mode**
2. Add: **Set User Timezone**

**Workday Source:**

1. **Import Users** → Select Workday → **Advanced Mode**
2. Add: **Filter Users by Field Value**
   * Filter Key: `worker_type`, Filter List: `Contractor, Temp`

**Either Source (Deduplication):**

1. Add: **Remove Duplicate Users**
   * Index Key: `email_addr`
   * Filter Condition: `record.employee_id != ""`
   * Lowercase: `true`

***

### Scenario 3: Complex Filtering

**Goal:** Keep only active, full-time employees with company email addresses.

**Steps:**

1. **Import Users** → Select source → **Advanced Mode**
2. Add: **Filter Users by Rule**
3. Configure the Filter Condition:

```
employment_status == "Active" AND employment_type == "Full-time" AND "@company.com" IN email_addr
```

***

### Scenario 4: Manager Hierarchy

**Goal:** Establish manager relationships when the source provides manager emails.

**Steps:**

1. **Import Users** → Select any source → **Advanced Mode**
2. Add: **Resolve Manager Relationships** (no configuration needed)

> **Note:** Add AFTER any deduplication processors.
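The index-and-replace steps that the manager processor performs (see *Resolve Manager Relationships* above) can be sketched roughly as follows; the internal IDs and sample records are illustrative:

```python
def resolve_managers(users):
    """Replace each user's manager_email with the manager's internal ID."""
    # Step 1: build an index mapping every user's email to their internal ID
    email_to_id = {user["email_addr"].lower(): user["id"] for user in users}
    for user in users:
        # Step 2: look up this user's manager_email in the index
        manager_email = (user.get("manager_email") or "").lower()
        if manager_email in email_to_id:
            # Step 3: replace the email with the manager's internal ID
            user["manager_email"] = email_to_id[manager_email]
    return users


users = resolve_managers([
    {"id": "u1", "email_addr": "ceo@company.com", "manager_email": ""},
    {"id": "u2", "email_addr": "dev@company.com", "manager_email": "ceo@company.com"},
])
# users[1]["manager_email"] is now "u1" - the CEO's internal ID
```

This sketch also shows why deduplication should run first: if duplicates remain, the email → ID index may map a manager's email to a record that is later discarded, leaving a dangling link.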
***

## Troubleshooting

### ❌ Too many users filtered out

**Solution:**

* Review filter conditions and test with a small sample
* Verify field names match the source data exactly (case-sensitive)
* Check that logical operators match your intent (AND vs OR)

***

### ❌ Duplicate users still appearing

**Check:**

* ✓ Lowercase set to `true` for email-based deduplication
* ✓ Index Key matches the field name exactly (case-sensitive)
* ✓ Filter condition correctly identifies the preferred record

***

### ❌ Manager relationships not working

**Check:**

* ✓ Manager processor added AFTER deduplication
* ✓ Manager emails exist in the ingested user data
* ✓ Manager email field populated in the source data

***

### ❌ Rule syntax error

**Check:**

* ✓ Field names match exactly (case-sensitive)
* ✓ Strings in quotes: `"value"`, not `value`
* ✓ Lists use brackets: `["value1", "value2"]`

***

## Rule Syntax Reference

### Filter Users by Rule

Use direct field names; no prefix is needed.

#### Basic Comparisons

```
field_name == "value"   # Equal to
field_name != "value"   # Not equal to
field_name > 100        # Greater than
field_name >= 100       # Greater than or equal
field_name < 100        # Less than
field_name <= 100       # Less than or equal
```

#### List Operations

```
field_name IN ["value1", "value2"]       # Field is in list
field_name NOT IN ["value1", "value2"]   # Field is not in list
```

#### Text Matching

```
"text" IN field_name   # Substring match (text is contained in field)
```

#### Combining Conditions

```
condition1 AND condition2   # Both must be true
condition1 OR condition2    # Either must be true
NOT condition               # Opposite/negation
```

#### Examples

```
# Keep active employees
employment_status == "Active"

# Active employees in specific departments
employment_status == "Active" AND department IN ["Engineering", "Sales"]

# Users with company email
"@company.com" IN email_addr
```

***

### Remove Duplicate Users

Uses the `record.` prefix to access fields.
#### Basic Comparisons

```
record.field_name == "value"   # Equal to
record.field_name != "value"   # Not equal to
record.field_name > 100        # Greater than
record.field_name >= 100       # Greater than or equal
record.field_name < 100        # Less than
record.field_name <= 100       # Less than or equal
```

#### List Operations

```
"value" IN record.field_name                # Value is in field
record.field_name IN ["value1", "value2"]   # Field is in list
```

#### Text Matching

```
"text" IN record.field_name   # Substring match (text is contained in field)
```

#### Examples

```
# Prefer active users
record.employment_status == "Active"

# Prefer records with employee ID
record.employee_id != ""

# Check nested fields
"123" IN record.snow.itsm_user_id
```

***

## Limitations

**Processor Limits:**

* Maximum 20 processors per integration source
* Processors run in the configured order
* No processor loops or conditional execution

**Rule Constraints:**

* Field names are case-sensitive
* Changes require running ingestion to take effect

**Performance Considerations:**

* Geocoding processors make external API calls (slower)
* Large filter lists may impact performance
* Test with sample data before a full ingestion