create_matching_workflow
- EntityResolution.Client.create_matching_workflow(**kwargs)
Creates a matching workflow that defines the configuration for a data processing job. The workflow name must be unique. To modify an existing workflow, use UpdateMatchingWorkflow.
Warning
For workflows where resolutionType is ML_MATCHING or PROVIDER, incremental processing is not supported.
See also: AWS API Documentation
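Because the workflow name must be unique and changes go through UpdateMatchingWorkflow, provisioning code that runs more than once sometimes wants "create, or update if it already exists". The sketch below is a minimal version of that pattern, not an official recipe. It assumes `client` is a boto3 `entityresolution` client, that a duplicate name is reported as `ConflictException` (which may also signal other kinds of conflict), and that `update_matching_workflow` accepts the same configuration fields as this call except `tags`; check all three against the UpdateMatchingWorkflow reference. The function name `create_or_update_workflow` is hypothetical.

```python
from typing import Any, Dict


def create_or_update_workflow(client: Any, **kwargs: Any) -> Dict[str, Any]:
    """Create a matching workflow, or update it if the name is already in use.

    Assumptions (not confirmed by this page): a duplicate workflowName surfaces
    as ConflictException, and update_matching_workflow accepts the same
    configuration fields except `tags`.
    """
    try:
        return client.create_matching_workflow(**kwargs)
    except client.exceptions.ConflictException:
        # Tags are dropped because the update call is assumed not to accept them.
        update_kwargs = {k: v for k, v in kwargs.items() if k != "tags"}
        return client.update_matching_workflow(**update_kwargs)
```

Passing the client in, rather than creating it inside the function, keeps the sketch testable with a stand-in object.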
Request Syntax
response = client.create_matching_workflow(
    workflowName='string',
    description='string',
    inputSourceConfig=[
        {
            'inputSourceARN': 'string',
            'schemaName': 'string',
            'applyNormalization': True|False
        },
    ],
    outputSourceConfig=[
        {
            'outputS3Path': 'string',
            'KMSArn': 'string',
            'output': [
                {
                    'name': 'string',
                    'hashed': True|False
                },
            ],
            'applyNormalization': True|False
        },
    ],
    resolutionTechniques={
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
        'ruleBasedProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'matchingKeys': [
                        'string',
                    ]
                },
            ],
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING'
        },
        'ruleConditionProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'condition': 'string'
                },
            ]
        },
        'providerProperties': {
            'providerServiceArn': 'string',
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            }
        }
    },
    incrementalRunConfig={
        'incrementalRunType': 'IMMEDIATE'
    },
    roleArn='string',
    tags={
        'string': 'string'
    }
)
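A minimal sketch of a RULE_MATCHING request that fills in only the parameters marked [REQUIRED] below, plus the required fields of ruleBasedProperties. The helper name `build_rule_matching_request`, the rule name `Rule1`, and the matching keys `email` and `last_name` are hypothetical; the matching keys must exist in your schema mapping, and every ARN and S3 path passed in is a placeholder.

```python
from typing import Any, Dict, List


def build_rule_matching_request(
    workflow_name: str,
    input_source_arn: str,
    schema_name: str,
    output_s3_path: str,
    output_columns: List[str],
    role_arn: str,
) -> Dict[str, Any]:
    """Assemble keyword arguments for create_matching_workflow using RULE_MATCHING."""
    return {
        "workflowName": workflow_name,
        "inputSourceConfig": [
            {"inputSourceARN": input_source_arn, "schemaName": schema_name}
        ],
        "outputSourceConfig": [
            {
                "outputS3Path": output_s3_path,
                "output": [{"name": col} for col in output_columns],
            }
        ],
        "resolutionTechniques": {
            "resolutionType": "RULE_MATCHING",
            "ruleBasedProperties": {
                # Hypothetical rule: records match when both keys match.
                "rules": [{"ruleName": "Rule1", "matchingKeys": ["email", "last_name"]}],
                "attributeMatchingModel": "ONE_TO_ONE",
            },
        },
        "roleArn": role_arn,
    }
```

With a real client, the returned dictionary would be passed as `boto3.client("entityresolution").create_matching_workflow(**kwargs)`.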
- Parameters:
workflowName (string) –
[REQUIRED]
The name of the workflow. There can’t be multiple MatchingWorkflows with the same name.
description (string) – A description of the workflow.
inputSourceConfig (list) –
[REQUIRED]
A list of InputSource objects, each of which has the fields inputSourceARN, schemaName, and applyNormalization.
(dict) –
An object containing inputSourceARN, schemaName, and applyNormalization.
inputSourceARN (string) – [REQUIRED]
The AWS Glue table Amazon Resource Name (ARN) of the input source table.
schemaName (string) – [REQUIRED]
The name of the schema to be retrieved.
applyNormalization (boolean) –
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER and the data in the input table is in the format 1234567890, Entity Resolution normalizes this field in the output to (123)-456-7890.
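inputSourceARN expects a Glue table ARN. A small sketch, assuming the standard `aws` partition and the Glue table ARN layout `arn:aws:glue:<region>:<account-id>:table/<database>/<table>`; the helper names and the example values used to try it are hypothetical.

```python
from typing import Dict


def glue_table_arn(region: str, account_id: str, database: str, table: str) -> str:
    """Build a Glue table ARN (assumes the standard "aws" partition)."""
    return f"arn:aws:glue:{region}:{account_id}:table/{database}/{table}"


def input_source(table_arn: str, schema_name: str, normalize: bool = False) -> Dict[str, object]:
    """Build one inputSourceConfig entry; applyNormalization is optional in the request."""
    return {
        "inputSourceARN": table_arn,
        "schemaName": schema_name,
        "applyNormalization": normalize,
    }
```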
outputSourceConfig (list) –
[REQUIRED]
A list of OutputSource objects, each of which contains the fields outputS3Path, applyNormalization, KMSArn, and output.
(dict) –
An OutputSource object that specifies where the output table is written, which columns it contains, and how it is encrypted and normalized.
outputS3Path (string) – [REQUIRED]
The S3 path to which Entity Resolution will write the output table.
KMSArn (string) –
The ARN of a customer managed KMS key used for encryption at rest. If not provided, the system uses a KMS key managed by Entity Resolution.
output (list) – [REQUIRED]
A list of OutputAttribute objects, each of which has the fields name and hashed. Each object selects a column to include in the output table and whether that column’s values should be hashed.
(dict) –
An OutputAttribute object that selects one output column.
name (string) – [REQUIRED]
The name of a column to write to the output. This must be an InputField name in the schema mapping.
hashed (boolean) –
Whether to hash the column values in the output.
applyNormalization (boolean) –
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER and the data in the input table is in the format 1234567890, Entity Resolution normalizes this field in the output to (123)-456-7890.
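A sketch of one outputSourceConfig entry that marks only selected columns as hashed and omits KMSArn unless one is supplied. The helper `output_source` is hypothetical; column names are assumed to be InputField names from your schema mapping, and the up-front check that hashed columns are also output columns is a local convenience, not a documented service rule.

```python
from typing import Dict, Iterable, Optional


def output_source(
    s3_path: str,
    columns: Iterable[str],
    hashed: Iterable[str] = (),
    kms_arn: Optional[str] = None,
    normalize: bool = False,
) -> Dict[str, object]:
    """Build one outputSourceConfig entry; only columns named in `hashed` are hashed."""
    cols = list(columns)
    hashed_set = set(hashed)
    unknown = hashed_set - set(cols)
    if unknown:
        raise ValueError(f"hashed columns not in output: {sorted(unknown)}")
    entry: Dict[str, object] = {
        "outputS3Path": s3_path,
        "output": [{"name": c, "hashed": c in hashed_set} for c in cols],
        "applyNormalization": normalize,
    }
    if kms_arn is not None:
        # Leaving KMSArn out falls back to the Entity Resolution managed key.
        entry["KMSArn"] = kms_arn
    return entry
```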
resolutionTechniques (dict) –
[REQUIRED]
An object that defines the resolutionType and the properties for the chosen matching technique.
resolutionType (string) – [REQUIRED]
The type of matching workflow to create. Specify one of the following types:
- RULE_MATCHING: Match records using configurable rule-based criteria.
- ML_MATCHING: Match records using machine learning models.
- PROVIDER: Match records using a third-party matching provider.
ruleBasedProperties (dict) –
An object that defines the list of matching rules to run, along with the attributeMatchingModel and matchPurpose.
rules (list) – [REQUIRED]
A list of Rule objects, each of which has the fields ruleName and matchingKeys.
(dict) –
An object containing the ruleName and matchingKeys.
ruleName (string) – [REQUIRED]
A name for the matching rule.
matchingKeys (list) – [REQUIRED]
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) –
attributeMatchingModel (string) – [REQUIRED]
The comparison type. You can choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system only considers it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.
matchPurpose (string) –
An indicator of whether to generate IDs and index the data or not.
If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
If you choose INDEXING, the process indexes the data without generating IDs.
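The rule semantics described above (a rule matches when all of its matchingKeys match) can be illustrated locally. This is a simplified model using exact string equality, closest in spirit to ONE_TO_ONE; it is not how Entity Resolution evaluates rules, and it ignores normalization and MANY_TO_MANY sub-type matching. The second helper assembles a ruleBasedProperties object and leaves matchPurpose out unless given, because this page does not state its default. Both function names are hypothetical.

```python
from typing import Dict, List, Mapping, Optional


def rule_matches(matching_keys: List[str], a: Mapping[str, str], b: Mapping[str, str]) -> bool:
    """Local illustration: two records match a rule if every matching key is present and equal.

    Plain string equality is a simplification; the service's own comparison
    (normalization, MANY_TO_MANY sub-types) is not modeled here.
    """
    return all(k in a and k in b and a[k] == b[k] for k in matching_keys)


def rule_based_properties(
    rules: Dict[str, List[str]],
    attribute_matching_model: str = "ONE_TO_ONE",
    match_purpose: Optional[str] = None,
) -> Dict[str, object]:
    """Build ruleBasedProperties from a {ruleName: matchingKeys} mapping."""
    if attribute_matching_model not in ("ONE_TO_ONE", "MANY_TO_MANY"):
        raise ValueError("attributeMatchingModel must be ONE_TO_ONE or MANY_TO_MANY")
    props: Dict[str, object] = {
        "rules": [{"ruleName": n, "matchingKeys": keys} for n, keys in rules.items()],
        "attributeMatchingModel": attribute_matching_model,
    }
    if match_purpose is not None:
        if match_purpose not in ("IDENTIFIER_GENERATION", "INDEXING"):
            raise ValueError("matchPurpose must be IDENTIFIER_GENERATION or INDEXING")
        props["matchPurpose"] = match_purpose
    return props
```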
ruleConditionProperties (dict) –
An object containing the rules for a matching workflow.
rules (list) – [REQUIRED]
A list of rule objects, each of which has the fields ruleName and condition.
(dict) –
An object that defines the condition and the ruleName to use in a matching workflow.
ruleName (string) – [REQUIRED]
A name for the matching rule.
For example: Rule1
condition (string) – [REQUIRED]
A statement that specifies the conditions for a matching rule.
If your data is accurate, use an Exact matching function: Exact or ExactManyToMany.
If your data has variations in spelling or pronunciation, use a Fuzzy matching function: Cosine, Levenshtein, or Soundex.
Use AND to combine matching functions, OR to separate them, and parentheses (...) to group them.
For example:
(Cosine(a, 10) AND Exact(b, true)) OR ExactManyToMany(c, d)
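Condition strings are plain text, so they can be composed from smaller pieces. A minimal sketch with hypothetical helpers `all_of`, `any_of`, and `group` that reproduces the example above; the helpers only join strings and do not check function names or argument meanings, which this page does not spell out.

```python
from typing import Dict, List, Tuple


def all_of(*terms: str) -> str:
    """Combine matching functions with AND."""
    return " AND ".join(terms)


def any_of(*terms: str) -> str:
    """Separate alternative matching functions with OR."""
    return " OR ".join(terms)


def group(expr: str) -> str:
    """Group an expression in parentheses."""
    return f"({expr})"


def rule_condition_properties(rules: List[Tuple[str, str]]) -> Dict[str, object]:
    """Build ruleConditionProperties from (ruleName, condition) pairs."""
    return {"rules": [{"ruleName": name, "condition": cond} for name, cond in rules]}
```

Usage: `any_of(group(all_of("Cosine(a, 10)", "Exact(b, true)")), "ExactManyToMany(c, d)")` yields the example condition string shown above.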
providerProperties (dict) –
The properties of the provider service.
providerServiceArn (string) – [REQUIRED]
The ARN of the provider service.
providerConfiguration (document) –
The required configuration fields to use with the provider service.
intermediateSourceConfiguration (dict) –
The Amazon S3 location that temporarily stores your data while it is processed. Your information won’t be saved permanently.
intermediateS3Path (string) – [REQUIRED]
The Amazon S3 location (bucket and prefix). For example:
s3://provider_bucket/DOC-EXAMPLE-BUCKET
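A sketch that assembles providerProperties and checks that intermediateS3Path is an `s3://bucket/prefix` URI before sending it. The helper name is hypothetical, the provider service ARN is whatever your provider supplies, and the contents of providerConfiguration depend entirely on the provider, so they are passed through unchanged.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse


def provider_properties(
    provider_service_arn: str,
    intermediate_s3_path: Optional[str] = None,
    provider_configuration: Any = None,
) -> Dict[str, Any]:
    """Build providerProperties, checking that the intermediate path is an s3:// URI."""
    props: Dict[str, Any] = {"providerServiceArn": provider_service_arn}
    if provider_configuration is not None:
        # A JSON-compatible document; its required fields depend on the provider.
        props["providerConfiguration"] = provider_configuration
    if intermediate_s3_path is not None:
        parsed = urlparse(intermediate_s3_path)
        if parsed.scheme != "s3" or not parsed.netloc:
            raise ValueError(f"expected s3://bucket/prefix, got {intermediate_s3_path!r}")
        props["intermediateSourceConfiguration"] = {"intermediateS3Path": intermediate_s3_path}
    return props
```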
incrementalRunConfig (dict) –
Optional. An object that defines the incremental run type. This object contains only the incrementalRunType field, which appears as “Automatic” in the console.
Warning
For workflows where resolutionType is ML_MATCHING or PROVIDER, incremental processing is not supported.
incrementalRunType (string) –
The type of incremental run. The only valid value is IMMEDIATE.
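Because incremental runs are not supported for ML_MATCHING or PROVIDER workflows, a client-side check can catch that combination before the API call. A minimal sketch over the keyword arguments you would pass to create_matching_workflow; the function name is hypothetical, and the service remains the authority on what it accepts.

```python
from typing import Any, Dict

# From the warning above: incremental processing is not supported for these types.
UNSUPPORTED_FOR_INCREMENTAL = {"ML_MATCHING", "PROVIDER"}


def check_incremental_config(kwargs: Dict[str, Any]) -> None:
    """Raise ValueError if incrementalRunConfig is invalid for the chosen resolutionType."""
    run_cfg = kwargs.get("incrementalRunConfig")
    if run_cfg is None:
        return  # No incremental configuration: nothing to check.
    run_type = run_cfg.get("incrementalRunType")
    if run_type is not None and run_type != "IMMEDIATE":
        raise ValueError("incrementalRunType must be IMMEDIATE")
    resolution_type = kwargs.get("resolutionTechniques", {}).get("resolutionType")
    if resolution_type in UNSUPPORTED_FOR_INCREMENTAL:
        raise ValueError(f"incremental processing is not supported for {resolution_type}")
```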
roleArn (string) –
[REQUIRED]
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
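roleArn must name a role that Entity Resolution can assume. A sketch of a trust policy for such a role, assuming the service principal is `entityresolution.amazonaws.com` (confirm this in the IAM documentation for Entity Resolution). The permissions the role needs to read the input tables and to write to S3 and KMS are not described on this page and are not shown. The function name is hypothetical.

```python
import json


def trust_policy(service_principal: str = "entityresolution.amazonaws.com") -> str:
    """Return a JSON trust policy letting the given service principal assume the role.

    The default principal is an assumption; verify it before use.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

The resulting string could be supplied as AssumeRolePolicyDocument when creating the role with the IAM `create_role` call.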
tags (dict) –
The tags used to organize, track, or control access for this resource.
(string) –
(string) –
- Return type:
dict
- Returns:
Response Syntax
{
    'workflowName': 'string',
    'workflowArn': 'string',
    'description': 'string',
    'inputSourceConfig': [
        {
            'inputSourceARN': 'string',
            'schemaName': 'string',
            'applyNormalization': True|False
        },
    ],
    'outputSourceConfig': [
        {
            'outputS3Path': 'string',
            'KMSArn': 'string',
            'output': [
                {
                    'name': 'string',
                    'hashed': True|False
                },
            ],
            'applyNormalization': True|False
        },
    ],
    'resolutionTechniques': {
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
        'ruleBasedProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'matchingKeys': [
                        'string',
                    ]
                },
            ],
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING'
        },
        'ruleConditionProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'condition': 'string'
                },
            ]
        },
        'providerProperties': {
            'providerServiceArn': 'string',
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            }
        }
    },
    'incrementalRunConfig': {
        'incrementalRunType': 'IMMEDIATE'
    },
    'roleArn': 'string'
}
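The workflowArn in the response follows the general ARN layout `arn:partition:service:region:account-id:resource`. A small sketch that splits it into those fields; the resource portion is left opaque because its exact format for matching workflows is not documented on this page. The function name and the ARN used to try it are hypothetical.

```python
from typing import Dict


def parse_arn(arn: str) -> Dict[str, str]:
    """Split an ARN into partition, service, region, account, and resource."""
    # maxsplit=5 keeps any colons inside the resource portion intact.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    _, partition, service, region, account, resource = parts
    return {
        "partition": partition,
        "service": service,
        "region": region,
        "account": account,
        "resource": resource,
    }
```

Usage: `parse_arn(response["workflowArn"])["region"]` returns the Region encoded in the ARN.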
Response Structure
(dict) –
workflowName (string) –
The name of the workflow.
workflowArn (string) –
The ARN (Amazon Resource Name) that Entity Resolution generated for the MatchingWorkflow.
description (string) –
A description of the workflow.
inputSourceConfig (list) –
A list of InputSource objects, each of which has the fields inputSourceARN, schemaName, and applyNormalization.
(dict) –
An object containing inputSourceARN, schemaName, and applyNormalization.
inputSourceARN (string) –
The AWS Glue table Amazon Resource Name (ARN) of the input source table.
schemaName (string) –
The name of the schema to be retrieved.
applyNormalization (boolean) –
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER and the data in the input table is in the format 1234567890, Entity Resolution normalizes this field in the output to (123)-456-7890.
outputSourceConfig (list) –
A list of OutputSource objects, each of which contains the fields outputS3Path, applyNormalization, KMSArn, and output.
(dict) –
An OutputSource object that specifies where the output table is written, which columns it contains, and how it is encrypted and normalized.
outputS3Path (string) –
The S3 path to which Entity Resolution will write the output table.
KMSArn (string) –
The ARN of a customer managed KMS key used for encryption at rest. If not provided, the system uses a KMS key managed by Entity Resolution.
output (list) –
A list of OutputAttribute objects, each of which has the fields name and hashed. Each object selects a column to include in the output table and whether that column’s values should be hashed.
(dict) –
An OutputAttribute object that selects one output column.
name (string) –
The name of a column to write to the output. This must be an InputField name in the schema mapping.
hashed (boolean) –
Whether the column values are hashed in the output.
applyNormalization (boolean) –
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER and the data in the input table is in the format 1234567890, Entity Resolution normalizes this field in the output to (123)-456-7890.
resolutionTechniques (dict) –
An object that defines the resolutionType and the properties for the chosen matching technique.
resolutionType (string) –
The type of matching workflow. One of the following types:
- RULE_MATCHING: Match records using configurable rule-based criteria.
- ML_MATCHING: Match records using machine learning models.
- PROVIDER: Match records using a third-party matching provider.
ruleBasedProperties (dict) –
An object that defines the list of matching rules to run, along with the attributeMatchingModel and matchPurpose.
rules (list) –
A list of Rule objects, each of which has the fields ruleName and matchingKeys.
(dict) –
An object containing the ruleName and matchingKeys.
ruleName (string) –
A name for the matching rule.
matchingKeys (list) –
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) –
attributeMatchingModel (string) –
The comparison type. You can choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system only considers it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.
matchPurpose (string) –
An indicator of whether to generate IDs and index the data or not.
If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
If you choose INDEXING, the process indexes the data without generating IDs.
ruleConditionProperties (dict) –
An object containing the rules for a matching workflow.
rules (list) –
A list of rule objects, each of which has the fields ruleName and condition.
(dict) –
An object that defines the condition and the ruleName to use in a matching workflow.
ruleName (string) –
A name for the matching rule.
For example: Rule1
condition (string) –
A statement that specifies the conditions for a matching rule.
If your data is accurate, use an Exact matching function: Exact or ExactManyToMany.
If your data has variations in spelling or pronunciation, use a Fuzzy matching function: Cosine, Levenshtein, or Soundex.
Use AND to combine matching functions, OR to separate them, and parentheses (...) to group them.
For example:
(Cosine(a, 10) AND Exact(b, true)) OR ExactManyToMany(c, d)
providerProperties (dict) –
The properties of the provider service.
providerServiceArn (string) –
The ARN of the provider service.
providerConfiguration (document) –
The required configuration fields to use with the provider service.
intermediateSourceConfiguration (dict) –
The Amazon S3 location that temporarily stores your data while it is processed. Your information won’t be saved permanently.
intermediateS3Path (string) –
The Amazon S3 location (bucket and prefix). For example:
s3://provider_bucket/DOC-EXAMPLE-BUCKET
incrementalRunConfig (dict) –
An object that defines an incremental run type and has only incrementalRunType as a field.
incrementalRunType (string) –
The type of incremental run. The only valid value is IMMEDIATE. This appears as “Automatic” in the console.
Warning
For workflows where resolutionType is ML_MATCHING or PROVIDER, incremental processing is not supported.
roleArn (string) –
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
Exceptions
- EntityResolution.Client.exceptions.ThrottlingException
- EntityResolution.Client.exceptions.InternalServerException
- EntityResolution.Client.exceptions.AccessDeniedException
- EntityResolution.Client.exceptions.ExceedsLimitException
- EntityResolution.Client.exceptions.ConflictException
- EntityResolution.Client.exceptions.ValidationException
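Of the exceptions above, ThrottlingException is the one most naturally retried. botocore already retries throttled calls according to the client's retry configuration (`botocore.config.Config(retries=...)`), so an extra loop like this one is optional. It is a sketch of exponential backoff with full jitter that lets every other exception propagate immediately; the function name and its defaults are hypothetical.

```python
import random
import time
from typing import Any, Callable, Dict


def create_with_backoff(
    client: Any,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Call create_matching_workflow, retrying only on ThrottlingException.

    Uses exponential backoff with full jitter; other exceptions propagate at once.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return client.create_matching_workflow(**kwargs)
        except client.exceptions.ThrottlingException:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last throttling error.
            # Wait a random time between 0 and base_delay * 2**attempt seconds.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so tests can substitute a no-op instead of waiting.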