pyapacheatlas.core package

pyapacheatlas.core.client module

class pyapacheatlas.core.client.AtlasClient(endpoint_url, authentication=None, **kwargs)

Bases: pyapacheatlas.core.util.AtlasBaseClient

Provides communication between your application and the Apache Atlas server with your entities and type definitions.

Parameters
  • endpoint_url (str) – The http url for communicating with your Apache Atlas server. It will most likely end in /api/atlas/v2.

  • authentication (AtlasAuthBase) – The method of authentication.

Kwargs:
param requests_*

Kwargs to pass to the underlying requests package method call. For example passing requests_verify = False will supply verify=False to any API call.
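As a rough illustration of the requests_* convention described above, the sketch below strips the prefix from matching keyword arguments. The helper name split_requests_kwargs is hypothetical and is not part of pyapacheatlas; it only mirrors the documented behaviour and may differ from how the library does it internally.

```python
def split_requests_kwargs(kwargs):
    """Separate requests_* kwargs from everything else (hypothetical helper)."""
    prefix = "requests_"
    requests_args = {}
    other_args = {}
    for key, value in kwargs.items():
        if key.startswith(prefix):
            # requests_verify=False becomes verify=False
            requests_args[key[len(prefix):]] = value
        else:
            other_args[key] = value
    return requests_args, other_args


requests_args, other_args = split_requests_kwargs(
    {"requests_verify": False, "requests_timeout": 30, "custom": 1}
)
```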

assignTerm(entities, termGuid=None, termName=None, glossary_name='Glossary')

AtlasClient.assignTerm is being deprecated. Please use AtlasClient.glossary.assignTerm instead.

Assign a single term to many entities. Provide either a term guid (if you know it) or provide the term name and glossary name. If term name is provided, term guid is ignored.

As for entities, you may provide a list of AtlasEntity BUT they must have a valid guid defined (not None, not -N) or it will fail with a transient error. Alternatively, you may provide your own dict that contains a ‘guid’ key and value.

Parameters
  • entities (list(Union(dict, AtlasEntity))) – The list of entities that should have the term assigned.

  • termGuid (str) – The guid for the term. Ignored if using termName.

  • termName (str) – The name of the term. Optional if using termGuid.

  • glossary_name (str) – The name of the glossary. Defaults to Glossary. Ignored if using termGuid.

Returns

A dictionary indicating success or failure.

Return type

dict
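A minimal sketch of the dict form of the entities argument, assuming client is an already constructed AtlasClient. The wrapper function assign_term_to_guids is hypothetical; it only builds the ‘guid’ dicts and rejects placeholder guids (None or negative) before calling assignTerm with the signature documented above.

```python
def assign_term_to_guids(client, guids, term_name, glossary_name="Glossary"):
    """Assign one term to many entities identified by guid (hypothetical wrapper).

    `client` is assumed to be an already constructed AtlasClient.
    """
    entities = []
    for guid in guids:
        # None and negative "-N" guids are placeholders, not real server guids.
        if guid is None or str(guid).startswith("-"):
            raise ValueError(f"Entity guid {guid!r} is not a valid assigned guid")
        entities.append({"guid": guid})
    return client.assignTerm(
        entities=entities, termName=term_name, glossary_name=glossary_name
    )
```

Validating up front keeps the failure local instead of relying on an error from the server.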

classify_bulk_entities(entityGuids, classification)

Apply a single classification to many entities whose guids you know. This call will fail if any one of the guids already has the provided classification on that entity.

Parameters
  • entityGuids (Union(str,list)) – The guid or guids you want to classify.

  • classification (Union(dict, AtlasClassification)) – The AtlasClassification object you want to apply to the entities.

Returns

A message indicating success. The only key is ‘message’, containing a brief string.

Return type

dict(str,Union(list(str),str))

classify_entity(guid, classifications, force_update=False)

Given a single entity, you want to apply many classifications. This call will fail if any one of the classifications exist on the entity already, unless you choose force_update=True.

force_update will query the existing entity and sort the classifications into NEW (post) and UPDATE (put) and do two requests to add and update.

force_update is not transactional: it performs adds first and, if they succeed, moves on to the updates. If the update fails, the adds will remain on the Atlas server and will not be rolled back.

An error can occur if, for example, the classification has some required attribute that you do not provide.

Parameters
  • guid (str) – The guid you want to classify.

  • classifications – The list of AtlasClassification objects you want to apply to the entity.

  • force_update (bool) – Mark as True if any of your classifications may already exist on the given entity.

Returns

A message indicating success and which classifications were ‘updates’ vs ‘adds’ for the given guid.

Return type

dict(str, str)
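The NEW versus UPDATE sorting that force_update performs can be pictured with the sketch below. It is an illustration of the described behaviour under simple assumptions (classifications given as dicts with a typeName key), not the library's implementation; split_classifications is a hypothetical name.

```python
def split_classifications(existing_type_names, requested):
    """Sort requested classifications into adds and updates (hypothetical helper).

    existing_type_names: typeName values already present on the entity.
    requested: list of dicts, each with a "typeName" key.
    """
    existing = set(existing_type_names)
    adds = [c for c in requested if c["typeName"] not in existing]
    updates = [c for c in requested if c["typeName"] in existing]
    return adds, updates


adds, updates = split_classifications(
    ["PII"],
    [{"typeName": "PII"}, {"typeName": "Confidential"}],
)
```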

declassify_entity(guid, classificationName)

Given an entity guid and a classification name, remove the classification from the given entity.

Parameters
  • guid (str) – The guid for the entity that needs to be updated.

  • classificationName (str) – The name of the classification to be deleted.

Returns

A success message repeating what was deleted. The only key is ‘message’, containing the classification name and guid.

Return type

dict(str, str)

delete_assignedTerm(entities, termGuid=None, termName=None, glossary_name='Glossary')

AtlasClient.delete_assignedTerm is being deprecated. Please use AtlasClient.glossary.delete_assignedTerm instead.

Remove a single term from many entities. Provide either a term guid (if you know it) or provide the term name and glossary name. If term name is provided, term guid is ignored.

As for entities, you may provide a list of AtlasEntity BUT they must have a valid guid defined (not None, not -N) and a relationshipAttribute of meanings with an entry that has the term’s guid and relationshipGuid. Alternatively, you may provide your own dict that contains a ‘guid’ and ‘relationshipGuid’ key and value. Lastly, you may also pass in the results of the ‘entities’ key from the get_entity method and it will parse the relationshipAttributes properly and silently ignore the meanings that do not match the termGuid.

Parameters
  • entities (list(Union(dict, AtlasEntity))) – The list of entities that should have the term removed.

  • termGuid (str) – The guid for the term. Ignored if using termName.

  • termName (str) – The name of the term. Optional if using termGuid.

  • glossary_name (str) – The name of the glossary. Defaults to Glossary. Ignored if using termGuid.

Returns

A dictionary indicating success or failure.

Return type

dict
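For the dict form of the entities argument, the sketch below shows one way to derive the ‘guid’ and ‘relationshipGuid’ pairs from the ‘entities’ list of a get_entity response. The function name term_relationships is hypothetical, and the response shape (a relationshipAttributes dict holding a meanings list) is assumed from the description above.

```python
def term_relationships(entities, term_guid):
    """Build {'guid', 'relationshipGuid'} dicts for one term (hypothetical helper).

    entities: the list under the "entities" key of a get_entity response.
    """
    results = []
    for entity in entities:
        relationship_attributes = entity.get("relationshipAttributes") or {}
        meanings = relationship_attributes.get("meanings") or []
        for meaning in meanings:
            # Ignore meanings that point at a different term.
            if meaning.get("guid") == term_guid:
                results.append(
                    {
                        "guid": entity["guid"],
                        "relationshipGuid": meaning["relationshipGuid"],
                    }
                )
    return results


sample_entities = [
    {
        "guid": "entity-1",
        "relationshipAttributes": {
            "meanings": [
                {"guid": "term-a", "relationshipGuid": "rel-1"},
                {"guid": "term-b", "relationshipGuid": "rel-2"},
            ]
        },
    },
    {"guid": "entity-2", "relationshipAttributes": {"meanings": None}},
]
pairs = term_relationships(sample_entities, "term-a")
```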

delete_entity(guid)

Delete one or many guids from your Apache Atlas server.

Parameters

guid (Union(str,list(str))) – The guid or guids you want to remove.

Returns

An EntityMutationResponse containing guidAssignments, mutatedEntities, and partialUpdatedEntities (list).

Return type

dict(str, Union(dict,list))

delete_entity_labels(labels, guid=None, typeName=None, qualifiedName=None)

Delete the given labels for one entity. Provide a list of strings that should be removed. You can either provide the guid of the entity or the typeName and qualifiedName of the entity.

If you want to clear out an entity without knowing all the labels, you should consider update_entity_labels instead and set force_update to True.

Parameters
  • labels (list(str)) – The label(s) that should be removed.

  • guid (str) – The guid of the entity to be updated. Optional if using typeName and qualifiedName.

  • typeName (str) – The type name of the entity to be updated. Must also use qualifiedName with typeName. Not used if guid is provided.

  • qualifiedName (str) – The qualified name of the entity to be updated. Must also use typeName with qualifiedName. Not used if guid is provided.

Returns

A dict containing a message indicating success. Otherwise it will raise an AtlasException.

Return type

dict(str, str)

delete_relationship(guid)

Delete a relationship based on the guid. This lets you remove a connection between entities like removing a column from a table or a term from an entity.

Parameters

guid (str) – The relationship guid for the relationship that you want to remove.

Returns

A dictionary indicating success. Failure will raise an AtlasException.

Return type

dict

delete_type(name)

Delete a type based on the given name.

Parameters

name (str) – The name of the type you want to remove.

Returns

No content, should receive a 204 status code.

Return type

None

delete_typedefs(**kwargs)

Delete one or many types. You can provide any of the parameters listed in the kwargs below, passing in the type definitions that you want to delete.

That type def can be retrieved with AtlasClient.get_typedef or by creating the typedef with, for example EntityTypeDef(“someType”) as imported from EntityTypeDef. You do not need to include any attribute defs, even if they’re required.

Kwargs:
param entityDefs

EntityDefs to delete.

type entityDefs

list( Union(BaseTypeDef, dict))

param businessMetadataDefs

BusinessMetadataDefs to delete.

type businessMetadataDefs

list( Union(BaseTypeDef, dict))

param classificationDefs

classificationDefs to delete.

type classificationDefs

list( Union(BaseTypeDef, dict))

param enumDefs

enumDefs to delete.

type enumDefs

list( Union(BaseTypeDef, dict))

param relationshipDefs

relationshipDefs to delete.

type relationshipDefs

list( Union(BaseTypeDef, dict))

param structDefs

structDefs to delete.

type structDefs

list( Union(BaseTypeDef, dict))

Returns

A dictionary indicating success. Failure will raise an AtlasException.

Return type

dict

get_all_typedefs()

Retrieve all of the type defs available on the Apache Atlas server.

Returns

A dict representing an AtlasTypesDef, containing lists of type defs wrapped in their corresponding definition types {“entityDefs”, “relationshipDefs”}.

Return type

dict(str, list(dict))

get_entity(guid=None, qualifiedName=None, typeName=None, ignoreRelationships=False, minExtInfo=False)

Retrieve one or many guids from your Atlas backed Data Catalog.

Returns a dictionary with keys “referredEntities” and “entities”. You’ll want to grab the entities values which is a list of entities.

You can provide a single guid or a list of guids. You can provide a single typeName and multiple qualified names in a list.

Parameters
  • guid (Union(str, list(str))) – The guid or guids you want to retrieve. Not used if using typeName and qualifiedName.

  • qualifiedName (Union(str, list(str))) – The qualified name of the entity you want to find. Must provide typeName if using qualifiedName. You may search for multiple qualified names under the same type. Ignored if using guid parameter.

  • typeName (str) – The type name of the entity you want to find. Must provide qualifiedName if using typeName. Ignored if using guid parameter.

  • ignoreRelationships (bool) – Exclude the relationship information from the response.

  • minExtInfo (bool) – Exclude the extra information from the response.

Returns

An AtlasEntitiesWithExtInfo object which includes a list of entities accessible with the “entities” key.

Return type

dict(str, Union(list(dict),dict))
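A minimal sketch of looking up several qualified names under one type and indexing the results, assuming client is an already constructed AtlasClient. The function name entities_by_qualified_name is hypothetical, and the sketch assumes each returned entity dict carries its qualified name under attributes.qualifiedName.

```python
def entities_by_qualified_name(client, type_name, qualified_names):
    """Fetch entities of one type and index them by qualified name.

    `client` is assumed to be an already constructed AtlasClient.
    """
    response = client.get_entity(qualifiedName=qualified_names, typeName=type_name)
    # Assumption: each entity dict stores its name at attributes.qualifiedName.
    return {
        entity["attributes"]["qualifiedName"]: entity
        for entity in response.get("entities", [])
    }
```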

get_entity_classification(guid, classificationName)

Retrieve a specific classification from the given entity’s guid.

Parameters
  • guid (str) – The guid of the entity that you want to query.

  • classificationName (str) – The typeName of the classification you want to query.

Returns

An AtlasClassification object that contains entityGuid, entityStatus, typeName, attributes, and propagate fields.

Return type

dict(str, object)

get_entity_classifications(guid)

Retrieve all classifications from the given entity’s guid.

Parameters

guid (str) – The entity’s guid.

Returns

An AtlasClassifications object that contains keys ‘list’ (which is the list of classifications on the entity), pageSize, sortBy, startIndex, and totalCount.

Return type

dict(str, object)

get_entity_header(guid=None)

Retrieve one or many entity headers from your Atlas backed Data Catalog.

Parameters

guid (Union(str, list(str))) – The guid or guids you want to retrieve.

Returns

An AtlasEntityHeader dict which includes the keys: guid, attributes (which is a dict that contains qualifiedName and name keys), an array of classifications, and an array of glossary term headers.

Return type

dict

get_entity_lineage(guid, depth=3, width=10, direction='BOTH', includeParent=False, getDerivedLineage=False)

Gets lineage info about the specified entity by guid.

Parameters
  • guid (str) – The guid of the entity for which you want to retrieve lineage.

  • depth (int) – The number of hops for lineage.

  • width (int) – The maximum expansion width of the lineage.

  • direction (str) – The direction of the lineage, which could be INPUT, OUTPUT or BOTH.

  • includeParent (bool) – True to include the parent chain in the response

  • getDerivedLineage (bool) – True to include derived lineage in the response

Returns

A dict representing AtlasLineageInfo with an array of parentRelations and an array of relations

Return type

dict(str, dict)

get_glossary(name='Glossary', guid=None, detailed=False)

AtlasClient.get_glossary is being deprecated. Please use AtlasClient.glossary.get_glossary instead.

Retrieve the specified glossary by name or guid along with the term headers (AtlasRelatedTermHeader: including displayText and termGuid). Providing the glossary name only will result in a lookup of all glossaries and returns the term headers (accessible via “terms” key) for all glossaries. Use detailed = True to return the full detail of terms (AtlasGlossaryTerm) accessible via “termInfo” key.

Parameters
  • name (str) – The name of the glossary to use, defaults to “Glossary”. Not required if using the guid parameter.

  • guid (str) – The unique guid of your glossary. Not required if using the name parameter.

  • detailed (bool) – Set to true if you want to pull back all terms and not just headers.

Returns

The requested glossary with the term headers (AtlasGlossary) or with detailed terms (AtlasGlossaryExtInfo).

Return type

list(dict)

get_glossary_term(guid=None, name=None, glossary_name='Glossary', glossary_guid=None)

AtlasClient.get_glossary_term is being deprecated. Please use AtlasClient.glossary.get_term instead.

Retrieve a single glossary term based on its guid. Providing only the glossary_name will result in a lookup for the glossary guid. If you plan on looking up many terms, consider using the get_glossary method with the detailed argument set to True. That method will provide all glossary terms in a dictionary for faster lookup.

Parameters
  • guid (str) – The guid of your term. Not required if name is specified.

  • name (str) – The name of your term’s display text. Overruled if guid is provided.

  • glossary_name (str) – The name of the glossary to use, defaults to “Glossary”. Not required if using the glossary_guid parameter.

  • glossary_guid (str) – The unique guid of your glossary. Not required if using the glossary_name parameter.

Returns

The requested glossary term as a dict.

Return type

dict

get_relationship(guid)

Retrieve the relationship attribute for the given guid.

Parameters

guid (str) – The unique guid for the relationship.

Returns

A dict representing AtlasRelationshipWithExtInfo with the relationship (what you probably care about) and referredEntities attributes.

Return type

dict(str, dict)

get_single_entity(guid=None, ignoreRelationships=False, minExtInfo=False)

Retrieve one entity based on guid from your Atlas backed Data Catalog.

Returns a dictionary with keys “referredEntities” and “entity”. You’ll want to grab the entity value which is a single dictionary.

Parameters
  • guid (str) – The guid you want to retrieve.

  • ignoreRelationships (bool) – Exclude the relationship information from the response.

  • minExtInfo (bool) – Exclude the extra information from the response.

Returns

An AtlasEntityWithExtInfo object which includes “referredEntities” and “entity” keys.

Return type

dict(str, Union(list(dict),dict))

get_termAssignedEntities(termGuid=None, termName=None, glossary_name='Glossary', limit=-1, offset=0, sort='ASC')

AtlasClient.get_termAssignedEntities is being deprecated. Please use AtlasClient.glossary.get_termAssignedEntities instead.

Page through the assigned entities for the given term.

Parameters
  • termGuid (str) – The guid for the term. Ignored if using termName.

  • termName (str) – The name of the term. Optional if using termGuid.

  • glossary_name (str) – The name of the glossary. Defaults to Glossary. Ignored if using termGuid.

Returns

A list of Atlas relationships between the given term and entities.

Return type

list(dict)

get_typedef(type_category=None, guid=None, name=None)

Retrieve a single type def based on its guid, name, or type category and (guid or name). Rule of thumb: Use guid if you have it, use name if you want to essentially use duck typing and are testing what keys you’re reading from the response, or use type_category when you want to guarantee the type being returned.

Parameters
  • type_category (TypeCategory) – The type category your type def belongs to. You most likely want TypeCategory.ENTITY. Optional if name or guid is specified.

  • guid (str,optional) – A valid guid. Optional if name is specified.

  • name (str,optional) – A valid name. Optional if guid is specified.

Returns

A dictionary representing an Atlas{TypeCategory}Def.

Return type

dict

partial_update_entity(guid=None, typeName=None, qualifiedName=None, attributes={})

Partially update an entity without having to construct the entire object and its subsequent required attributes. Using guid, you can update a single attribute. Using typeName and qualifiedName, you can update multiple attributes.

Parameters
  • guid (str) – The guid for the entity you want to update. Not used if using typeName and qualifiedName.

  • qualifiedName (str) – The qualified name of the entity you want to update. Must provide typeName if using qualifiedName. Ignored if using guid parameter.

  • typeName (str) – The type name of the entity you want to update. Must provide qualifiedName if using typeName. Ignored if using guid parameter.

Returns

The results of your entity update.

Return type

dict

search_entities(query, limit=50, search_filter=None, starting_offset=0)

Search entities based on a query and automatically handle limits and offsets to page through results.

The limit provides how many records are returned in each batch with a maximum of 1,000 entries per page.

Parameters
  • query (str) – The search query to be executed.

  • limit (int) – A non-zero integer representing how many entities to return for each page of the search results.

  • search_filter (dict) – A search filter to reduce your results.

Returns

The results of your search as a generator.

Return type

Iterator(dict)
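Because the results come back as a generator, you can stop early without paging through everything. The sketch below assumes client is an already constructed AtlasClient; first_n_results is a hypothetical helper name.

```python
from itertools import islice


def first_n_results(client, query, n, page_size=50):
    """Return at most n search results without draining the generator.

    `client` is assumed to be an already constructed AtlasClient.
    """
    results = client.search_entities(query, limit=page_size)
    # islice stops pulling from the generator once n items have been read.
    return list(islice(results, n))
```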

update_entity_labels(labels, guid=None, typeName=None, qualifiedName=None, force_update=False)

Update the given labels for one entity. Provide a list of strings that should be added. You can either provide the guid of the entity or the typeName and qualifiedName of the entity. By using force_update set to True you will overwrite the existing entity. force_update set to False will append to the existing entity.

Parameters
  • labels (list(str)) – The label(s) that should be appended or set.

  • guid (str) – The guid of the entity to be updated. Optional if using typeName and qualifiedName.

  • typeName (str) – The type name of the entity to be updated. Must also use qualifiedName with typeName. Not used if guid is provided.

  • qualifiedName (str) – The qualified name of the entity to be updated. Must also use typeName with qualifiedName. Not used if guid is provided.

Returns

A dict containing a message indicating success. Otherwise it will raise an AtlasException.

Return type

dict(str, str)

upload_entities(batch, batch_size=None)

Upload entities to your Atlas backed Data Catalog.

Parameters
  • batch (Union(dict, AtlasEntity, list(dict), list(AtlasEntity) )) – The batch of entities you want to upload. Supports a single dict, AtlasEntity, list of dicts, list of atlas entities.

  • batch_size (int) – The number of entities you want to send in bulk

Returns

The results of your bulk entity upload.

Return type

dict

upload_relationship(relationship)

Upload an AtlasRelationship json. Should take the form of the following:

{
    "typeName": "hive_table_columns",
    "attributes": {},
    "guid": -100,
    "end1": {
        "guid": assignments["-1"]
    },
    "end2": {
        "guid": assignments["-5"]
    }
}

Parameters

relationship (dict) – The relationship you want to upload.

Returns

The results of your relationship upload.

Return type

dict
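The relationship shape shown above can be assembled with a small helper. In this sketch, assignments is assumed to be a mapping from the temporary negative guids used at upload time to the guids assigned by the server; build_relationship and the sample guid values are hypothetical.

```python
def build_relationship(type_name, assignments, end1_temp_guid, end2_temp_guid):
    """Build a relationship dict from temporary-to-real guid assignments.

    assignments: assumed mapping such as {"-1": "<real guid>", "-5": "<real guid>"}.
    """
    return {
        "typeName": type_name,
        "attributes": {},
        "guid": -100,
        "end1": {"guid": assignments[end1_temp_guid]},
        "end2": {"guid": assignments[end2_temp_guid]},
    }


# Illustrative values only; real guids come from your Atlas server.
assignments = {"-1": "guid-of-table", "-5": "guid-of-column"}
relationship = build_relationship("hive_table_columns", assignments, "-1", "-5")
```

The resulting dict is what you would pass to upload_relationship.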

upload_terms(batch, force_update=False)

AtlasClient.upload_terms is being deprecated. Please use AtlasClient.glossary.upload_terms instead.

Upload terms to your Atlas backed Data Catalog. Supports Purview Term Templates by passing in an attributes field with the term template’s name as a field within attributes and an object of the required and optional fields.

Parameters

batch (list(dict)) – A list of AtlasGlossaryTerm objects to be uploaded.

Returns

A list of AtlasGlossaryTerm objects that are the results from your upload.

Return type

list(dict)

upload_typedefs(typedefs=None, force_update=False, **kwargs)

Provides a way to upload a single or multiple type definitions. If you provide one type def, it will format the required wrapper for you based on the type category.

If you want to upload multiple type defs or type defs of different categories, you can pass in the kwargs entityDefs, classificationDefs, enumDefs, relationshipDefs, structDefs, which take in a list of dicts or appropriate TypeDef objects.

Otherwise, you can pass in the wrapper yourself (e.g. {“entityDefs”:[], “relationshipDefs”:[]}) by providing that dict to the typedefs parameter. If the dict you pass in contains at least one of these Def fields it will be considered valid and an upload will be attempted.

typedefs also takes in a BaseTypeDef object or a valid AtlasTypeDef json / dict. If you provide a value in typedefs, it will ignore the kwargs parameters.

When using force_update, it will look up all existing types and see if any of your provided types exist. If they do exist, they will be updated. If they do not exist, they will be issued as new. New types are uploaded first. Existing types are updated second. There are no transactional updates. New types can succeed and be inserted while a batch of existing types can fail and not be updated.

Parameters
  • typedefs (Union(dict, BaseTypeDef)) – The set of type definitions you want to upload.

  • force_update (bool) – Set to True if your typedefs contain any existing types.

Returns

The results of your upload attempt from the Atlas server.

Return type

dict

Kwargs:
param entityDefs

EntityDefs to upload.

type entityDefs

list( Union(BaseTypeDef, dict))

param classificationDefs

classificationDefs to upload.

type classificationDefs

list( Union(BaseTypeDef, dict))

param enumDefs

enumDefs to upload.

type enumDefs

list( Union(BaseTypeDef, dict))

param relationshipDefs

relationshipDefs to upload.

type relationshipDefs

list( Union(BaseTypeDef, dict))

param structDefs

structDefs to upload.

type structDefs

list( Union(BaseTypeDef, dict))

param businessMetadataDefs

businessMetadataDefs to upload.

type businessMetadataDefs

list( Union(BaseTypeDef, dict))
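The wrapper check described for the typedefs parameter (a dict counts as valid when it contains at least one of the Def fields) can be pictured with the sketch below. This is an illustration under that stated rule, not the library's own validation code; DEF_KEYS and looks_like_typedef_wrapper are hypothetical names.

```python
# The Def fields named in the kwargs documented above.
DEF_KEYS = (
    "entityDefs",
    "classificationDefs",
    "enumDefs",
    "relationshipDefs",
    "structDefs",
    "businessMetadataDefs",
)


def looks_like_typedef_wrapper(candidate):
    """Return True if candidate is a dict holding at least one Def field."""
    return isinstance(candidate, dict) and any(key in candidate for key in DEF_KEYS)


wrapper_ok = looks_like_typedef_wrapper({"entityDefs": [], "relationshipDefs": []})
wrapper_bad = looks_like_typedef_wrapper({"name": "someType"})
```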

class pyapacheatlas.core.client.PurviewClient(account_name, authentication=None, **kwargs)

Bases: pyapacheatlas.core.client.AtlasClient

Provides communication between your application and the Azure Purview service. Simplifies the requirements for knowing the endpoint url and requires only the Purview account name.

Parameters
  • account_name (str) – Your Purview account name.

  • authentication (AtlasAuthBase) – The method of authentication.

Kwargs:
param requests_*

Kwargs to pass to the underlying requests package method call. For example passing requests_verify = False will supply verify=False to any API call.

export_terms(guids, csv_path, glossary_name='Glossary', glossary_guid=None)

PurviewClient.export_terms is being deprecated. Please use PurviewClient.glossary.export_terms instead.

Parameters
  • guids (list(str)) – List of guids that should be exported as csv.

  • csv_path (str) – Path to the CSV file that will be written.

  • glossary_name (str) – Name of the glossary. Defaults to ‘Glossary’. Not used if glossary_guid is provided.

  • glossary_guid (str) – Guid of the glossary, optional if glossary_name is provided. Otherwise, this parameter takes priority over glossary_name. Providing glossary_guid is also faster as you avoid a lookup based on glossary_name.

Returns

A csv file is written to the csv_path.

Return type

None

get_entity_next_lineage(guid, direction, getDerivedLineage=False, offset=0, limit=-1)

Returns the immediate next level of lineage info about an entity, with pagination.

Parameters
  • guid (str) – The guid of the entity for which you want to retrieve lineage.

  • direction (str) – The direction of the lineage, which could be INPUT or OUTPUT.

  • getDerivedLineage (bool) – True to include derived lineage in the response

  • offset (int) – The offset for pagination purpose.

  • limit (int) – The page size - by default there is no paging.

Returns

A dict representing AtlasLineageInfo with an array of parentRelations and an array of relations

Return type

dict(str, dict)

import_terms(csv_path, glossary_name='Glossary', glossary_guid=None)

Bulk import terms from an existing csv file. If you are using the system default, you must include the following headers: Name,Definition,Status,Related Terms,Synonyms,Acronym,Experts,Stewards

For custom term templates, additional attributes must include [Attribute][termTemplateName]attributeName as the header.

Parameters
  • csv_path (str) – Path to CSV that will be imported.

  • glossary_name (str) – Name of the glossary. Defaults to ‘Glossary’. Not used if glossary_guid is provided.

  • glossary_guid (str) – Guid of the glossary, optional if glossary_name is provided. Otherwise, this parameter takes priority over glossary_name.

Returns

A dict that contains an id that you can use in import_terms_status to get the status of the import operation.

Return type

dict

import_terms_status(operation_guid)

PurviewClient.import_terms_status is being deprecated. Please use PurviewClient.glossary.import_terms_status instead.

Get the operation status of a glossary term import activity. You get the operation_guid after executing the import_terms method and find the id field in the response dict/json.

Parameters

operation_guid (str) – The id of the import operation.

Returns

The status of the import operation as a dict. The dict includes a field called status that will report back RUNNING, SUCCESS, or FAILED. Other fields include the number of terms detected and number of errors.

Return type

dict
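Putting import_terms and import_terms_status together, a polling loop might look like the sketch below. It assumes client is an already constructed PurviewClient and relies only on the id and status fields described above; wait_for_import, poll_seconds, and max_polls are hypothetical names chosen for illustration.

```python
import time


def wait_for_import(client, csv_path, poll_seconds=5, max_polls=60, sleep=time.sleep):
    """Start a term import and poll until it stops reporting RUNNING.

    `client` is assumed to be an already constructed PurviewClient.
    """
    operation = client.import_terms(csv_path)
    operation_guid = operation["id"]
    for _ in range(max_polls):
        status = client.import_terms_status(operation_guid)
        # Documented states are RUNNING, SUCCESS, or FAILED.
        if status.get("status") != "RUNNING":
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"Import {operation_guid} still running after {max_polls} polls")
```

Capping the number of polls avoids waiting forever if the operation never leaves the RUNNING state.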

upload_term(term, includeTermHierarchy=True, **kwargs)

PurviewClient.upload_term is being deprecated. Please use PurviewClient.glossary.upload_term instead.

Upload a single term to Azure Purview. Minimally, you can specify the term alone and it will upload it to Purview! However, if you plan on uploading many terms programmatically, you might look at PurviewClient.upload_terms or PurviewClient.import_terms.

If you do intend to use this method for multiple terms, consider looking up the glossary_guid and any parent term guids in advance; otherwise, this method will call get_glossary multiple times, making many updates much slower.

glossary = client.get_glossary()
glossary_guid = glossary["guid"]

pyapacheatlas.core.discovery module

class pyapacheatlas.core.discovery.PurviewDiscoveryClient(endpoint_url, authentication, **kwargs)

Bases: pyapacheatlas.core.util.AtlasBaseClient

autocomplete(keywords=None, filter=None, api_version='2021-05-01-preview', **kwargs)

Execute an autocomplete search request on Azure Purview’s /catalog/api/search/autocomplete endpoint.

Parameters
  • body (dict) – An OPTIONAL fully formed json body. If provided, all other params will be ignored except api-version.

  • keywords (str) – The keywords applied to all fields that support autocomplete operation. It must be at least 1 character, and no more than 100 characters.

  • filter (dict) – A json object that includes and, not, or conditions and ultimately a dict that contains attributeName, operator, and attributeValue.

  • limit (int) – The number of search results to return.

  • api_version (str) – The Purview API version to use.

Returns

Autocomplete Search results with a value field.

Return type

dict

browse(entityType=None, api_version='2021-05-01-preview', **kwargs)

Execute a browse search for Purview based on the entity against the /catalog/api/browse endpoint.

Parameters
  • entityType (str) – The entity type to browse as the root level entry point. This must be a valid Purview built-in or custom type.

  • path (str) – The path to browse the next level child entities.

  • limit (int) – The number of search results to return.

  • offset (int) – The number of search results to skip.

  • api_version (str) – The Purview API version to use.

Returns

Search query results with @search.count and value fields.

Return type

dict

query(keywords=None, filter=None, facets=None, taxonomySetting=None, api_version='2021-05-01-preview', **kwargs)

Execute a search query against Azure Purview’s /catalog/api/search/query endpoint.

Parameters
  • body (dict) – An optional fully formed json body. If provided, all other params will be ignored except api-version.

  • keywords (str) – The keyword to search. You can use None or ‘*’ for wildcard, or a string to search.

  • filter (dict) – A json object that includes and, not, or conditions and ultimately a dict that contains attributeName, operator, and attributeValue.

  • facets (dict) – The kind of aggregate count you want to retrieve. Should be a dict that contains fields: count, facet, and sort.

  • taxonomySetting (dict) – Undocumented.

  • limit (int) – The number of search results to return.

  • offset (int) – The number of search results to skip.

  • api_version (str) – The Purview API version to use.

Returns

Search query results with @search.count and value fields.

Return type

dict
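The filter parameter is described as and / not / or conditions wrapped around dicts of attributeName, operator, and attributeValue. The sketch below builds such a dict with plain Python. It assumes that and / or take a list of conditions and that "eq" is an accepted operator value; neither is stated in this reference, and the helper names are hypothetical.

```python
def attribute_condition(name, operator, value):
    """Leaf condition with the three keys named in the filter description."""
    return {"attributeName": name, "operator": operator, "attributeValue": value}


def all_of(*conditions):
    """Combine conditions under "and" (assumed to take a list)."""
    return {"and": list(conditions)}


def any_of(*conditions):
    """Combine conditions under "or" (assumed to take a list)."""
    return {"or": list(conditions)}


# "eq" is an assumed operator value used purely for illustration.
search_filter = all_of(
    attribute_condition("name", "eq", "customers"),
    any_of(
        attribute_condition("owner", "eq", "finance"),
        attribute_condition("owner", "eq", "sales"),
    ),
)
```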

search_entities(query, limit=50, search_filter=None, starting_offset=0, api_version='2021-05-01-preview', **kwargs)

Search entities based on a query and automatically handle limits and offsets to page through results.

The limit provides how many records are returned in each batch with a maximum of 1,000 entries per page.

Parameters
  • query (str) – The search query to be executed.

  • limit (int) – A non-zero integer representing how many entities to return for each page of the search results.

  • search_filter (dict) – A json object that includes and, not, or conditions and ultimately a dict that contains attributeName, operator, and attributeValue.

  • facets (dict) – The kind of aggregate count you want to retrieve. Should be a dict that contains fields: count, facet, and sort.

  • taxonomySetting (dict) – Undocumented.

  • offset (int) – The number of search results to skip.

  • api_version (str) – The Purview API version to use.

Kwargs:
param dict body

An optional fully formed json body. If provided, query/keywords, limit, search_filter/filter, and starting_offset/offset will be updated using the values found in the body dictionary. Any additional keys provided in body will be passed along as additional kwargs.

Returns

The results of your search as a generator.

Return type

Iterator(dict)

suggest(keywords=None, filter=None, api_version='2021-05-01-preview', **kwargs)

Execute a suggest search request on Azure Purview’s /catalog/api/search/suggest endpoint.

Parameters
  • body (dict) – An optional fully formed json body. If provided, all other params will be ignored except api-version.

  • keywords (str) – The keywords applied to all fields that support autocomplete operation. It must be at least 1 character, and no more than 100 characters.

  • filter (dict) – A json object that includes and, not, or conditions and ultimately a dict that contains attributeName, operator, and attributeValue.

  • limit (int) – The number of search results to return.

  • api_version (str) – The Purview API version to use.

Returns

Suggest Search results with a value field.

Return type

dict

pyapacheatlas.core.entity module

class pyapacheatlas.core.entity.AtlasClassification(typeName, entityStatus='ACTIVE', propagate=False, removePropagationsOnEntityDelete=False, **kwargs)

Bases: object

A python implementation of the AtlasClassification from Apache Atlas.

Parameters
  • typeName (str) – The name of this classification.

  • entityStatus (str) – One of ACTIVE, DELETED, PURGED.

  • propagate (bool) – Whether the classification should propagate to child entities. Not implemented in Purview as of release time.

  • removePropagationsOnEntityDelete (bool) – Whether the classification should be removed on child entities if the parent entity is deleted. Not implemented in Purview as of release time.

  • attributes (dict, optional) – Additional attributes that your atlas entity may require.

  • validityPeriods (dict, optional) – Validity Periods that may be applied to this atlas classification.

to_json()

Convert this atlas classification to a dict / json.

Parameters

minimum (bool) – If True, returns only the type name, qualified name, and guid of the entity. Useful for being referenced in other entities like process inputs and outputs.

Returns

The json representation of this atlas classification.

Return type

dict

class pyapacheatlas.core.entity.AtlasEntity(name, typeName, qualified_name, guid=None, **kwargs)

Bases: object

A python representation of the AtlasEntity from Apache Atlas.

Parameters
  • name (str) – The name of this instance of an atlas entity.

  • typeName (str) – The type this entity should be.

  • qualified_name (str) – The unique “qualified name” of this instance of an atlas entity.

  • guid (Union(str,int)) – The guid to reference this entity by. Should be a negative number if you’re adding an entity. Consider using get_guid() method from GuidTracker to retrieve unique negative numbers.

  • relationshipAttributes (dict, optional) – The relationship attributes representing how this entity is connected to others. Commonly used for “columns” to indicate that the entity is a column of a table, or “query” to indicate that a process entity is tied to another process in a column lineage scenario.

  • attributes (dict, optional) – Additional attributes that your atlas entity may require.

  • classifications (dict, optional) – Classifications that may be applied to this atlas entity.

  • contacts (dict(str, dict(str, list(dict(str,str)))), optional) – Contacts should contain keys Experts and/or Owners. Their values should be a list of dicts with keys id and info. Id is a Microsoft Graph object id. Info is a string of extra information.
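As a rough illustration of the contacts shape described in prose above (keys Experts and/or Owners, each holding a list of dicts with id and info), the following sketch builds such a dict. The make_contacts helper and the object ids are hypothetical. The sketch follows the prose description rather than the nested type annotation.

```python
def make_contacts(experts=(), owners=()):
    """Build a contacts dict from (object_id, info) tuples.

    Hypothetical helper for illustration; not part of pyapacheatlas.
    """
    contacts = {}
    if experts:
        contacts["Experts"] = [{"id": oid, "info": info} for oid, info in experts]
    if owners:
        contacts["Owners"] = [{"id": oid, "info": info} for oid, info in owners]
    return contacts


# Placeholder object ids; real values would be Microsoft Graph object ids.
contacts = make_contacts(
    experts=[("00000000-0000-0000-0000-000000000001", "Knows the source system")],
    owners=[("00000000-0000-0000-0000-000000000002", "")],
)
print(contacts)
```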

addBusinessAttribute(**kwargs)

Add one or many businessAttributes to the entity. This will also update an existing business attribute. You can pass in a parameter name and a dict.

Kwargs:
param kwarg

The name(s) of the business attribute(s) you’re adding.

type kwarg

dict

addClassification(*args)

Add one or many classifications to the entity. This will also update an existing classification. You can pass in a string, an AtlasClassification, or a dictionary.

Parameters

args (Union(str, dict, AtlasClassification)) – The string, dictionary, or AtlasClassification passed as individual arguments. You can unpack a list using something like *my_list.

addCustomAttribute(**kwargs)

Add one or many customAttributes to the entity. This will also update an existing attribute. You can pass in a parameter name and a string.

Kwargs:
param kwarg

The name(s) of the custom attribute(s) you’re adding.

type kwarg

dict(str, str)

addRelationship(**kwargs)

Add one or many relationshipAttributes to the entity. This will also update an existing relationship attribute. You can pass in a parameter name and then either an AtlasEntity, a dict representing an AtlasEntity, or a list containing dicts of AtlasEntity pointers. For example, you might pass in addRelationship(table=AtlasEntity(…)) or addRelationship(column=[{'guid': 'abc-123-def'}]).

Kwargs:
param kwarg

The name of the relationship attribute you’re adding.

type kwarg

Union(dict, pyapacheatlas.core.entity.AtlasEntity)

classmethod from_json(entity_json)

merge(other)

Update the calling object with the attributes and classifications of the passed in AtlasEntity.

Parameters

other (AtlasEntity) – The other AtlasEntity object that you want to merge.

property name

Retrieve the name of this entity.

Returns

The name of the entity.

Return type

str

property qualifiedName

Retrieve the qualifiedName of this entity.

Returns

The qualified name of the entity.

Return type

str

to_json(minimum=False)

Convert this atlas entity to a dict / json. Returns typename, guid, and qualified name if guid is not none. If guid is None then this will return typename, uniqueAttributes with a sub object of qualified name.

By specifying a guid, this method assumes you will be uploading the entity (and want, or are at least willing to accept, changes to the entity). By NOT specifying a guid, this method assumes you will be using the entity as a reference used by another one in the upload (e.g. creating a process entity that uses an existing entity as an input or output).

Parameters

minimum (bool) – If True, returns only the type name, qualified name, and guid of the entity (when guid is defined). If True and guid is None, returns typeName, uniqueAttributes and qualifiedName. If False, return the full entity and its attributes and relationship attributes.

Returns

The json representation of this atlas entity.

Return type

dict
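The two minimum shapes described for to_json can be sketched as a standalone function. This is an illustration of the documented behaviour, not the library’s code. The exact key names (typeName, guid, qualifiedName, uniqueAttributes) are assumed from the description above.

```python
def minimal_reference(type_name, qualified_name, guid=None):
    """Return a minimal entity reference (hypothetical helper).

    With a guid: type name, guid, and qualified name.
    Without a guid: type name plus uniqueAttributes holding the qualified name.
    """
    if guid is not None:
        return {
            "typeName": type_name,
            "guid": guid,
            "qualifiedName": qualified_name,
        }
    return {
        "typeName": type_name,
        "uniqueAttributes": {"qualifiedName": qualified_name},
    }


# An entity being uploaded (temporary negative guid) versus an existing one
# that is only referenced, for example as a process input.
uploading = minimal_reference("my_table", "db://sales/orders", guid=-1001)
referenced = minimal_reference("my_table", "db://sales/customers")
print(uploading)
print(referenced)
```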

class pyapacheatlas.core.entity.AtlasProcess(name, typeName, qualified_name, inputs, outputs, guid=None, **kwargs)

Bases: pyapacheatlas.core.entity.AtlasEntity

A subclass of AtlasEntity that forces you to include the inputs and outputs of the process.

Parameters
  • name (str) – The name of this instance of an atlas entity.

  • typeName (str) – The type this entity should be.

  • qualified_name (str) – The unique “qualified name” of this instance of an atlas entity.

  • inputs (Union(list(dict), pyapacheatlas.core.entity.EntityDef)) – The list of input entities expressed as dicts and in minimum format (guid, type name, qualified name) or an AtlasEntity.

  • outputs (Union(list(dict), pyapacheatlas.core.entity.EntityDef)) – The list of output entities expressed as dicts and in minimum format (guid, type name, qualified name) or an AtlasEntity.

  • guid (Union(str,int), optional) – The guid to reference this entity by.

  • relationshipAttributes (dict, optional) – The relationship attributes representing how this entity is connected to others. Commonly used for “columns” to indicate that the entity is a column of a table, or “query” to indicate that a process entity is tied to another process in a column lineage scenario.

  • attributes (dict, optional) – Additional attributes that your atlas entity may require.

  • classifications (dict, optional) – Classifications that may be applied to this atlas entity.

addInput(*args)

Add one or many entities to the inputs.

Parameters

args (Union(dict, pyapacheatlas.core.entity.AtlasEntity)) – The atlas entities you are adding. They are comma delimited dicts or AtlasEntity. You can expand a list with *my_list.

addOutput(*args)

Add one or many entities to the outputs.

Parameters

args (Union(dict, pyapacheatlas.core.entity.AtlasEntity)) – The atlas entities you are adding. They are comma delimited dicts or AtlasEntity. You can expand a list with *my_list.

property inputs

Retrieves the inputs attribute for the process.

Returns

The list of inputs as dicts.

Return type

Union(list(dict),None)

merge(other)

Combine the inputs and outputs of a process. Fails if one side has a null input or output. Updates the object that merge is called on.

Parameters

other (AtlasEntity) – The other AtlasEntity object that you want to merge.

property outputs

Retrieves the outputs attribute for the process.

Returns

The list of outputs as dicts.

Return type

Union(list(dict),None)

pyapacheatlas.core.glossary module

class pyapacheatlas.core.glossary.AtlasGlossaryTerm(**kwargs)

Bases: pyapacheatlas.core.glossary.term._CrossPlatformTerm

Defines an Atlas Glossary Term for use by the AtlasClient.

You should provide at least a name and a glossary_guid. You can find the glossary guid with the AtlasClient.glossary.get_glossary() method and extract the guid from the results.

Parameters
  • name (str) – A term that you want to upload. Also used as the nickname.

  • qualifiedName (str) – The qualified name of the term. It is usually termName@GlossaryName.

  • glossary_guid (str) – The guid of the glossary the term belongs to.

  • status (str) – Should be one of Draft, Approved, Alert, Expired.

  • shortDescription (str) – A short description of your term.

  • longDescription (str) – A long description of your term.

  • abbreviation (str) – A comma delimited set of abbreviations.

  • classifications (list(dict)) – The classifications to assign to term.

Additional Args

Parameters
  • abbreviation (str) – The abbreviation of the term.

  • anchor (dict(str, str)) – Not required if passing in glossary_guid. Otherwise, should have key glossaryGuid and value of the glossary guid you want to use.

  • examples (str) – A list of examples.

  • guid (str) – The guid of the term. Not necessary for new uploads.

  • usage (str) – The usage of the term.

  • antonyms (list(dict)) – A list of AtlasRelatedTermHeaders.

  • classifies (list(dict)) – A list of AtlasRelatedTermHeaders.

  • isA (list(dict)) – A list of AtlasRelatedTermHeaders.

  • preferredTerms (list(dict)) – A list of AtlasRelatedTermHeaders.

  • preferredToTerms (list(dict)) – A list of AtlasRelatedTermHeaders.

  • replacedBy (list(dict)) – A list of AtlasRelatedTermHeaders.

  • replacementTerms (list(dict)) – A list of AtlasRelatedTermHeaders.

  • seeAlso (list(dict)) – A list of AtlasRelatedTermHeaders.

  • synonyms (list(dict)) – A list of AtlasRelatedTermHeaders.

  • translatedTerms (list(dict)) – A list of AtlasRelatedTermHeaders.

  • translationTerms (list(dict)) – A list of AtlasRelatedTermHeaders.

  • validValues (list(dict)) – A list of AtlasRelatedTermHeaders.

  • validValuesFor (list(dict)) – A list of AtlasRelatedTermHeaders.

class pyapacheatlas.core.glossary.GlossaryClient(endpoint_url, authentication, **kwargs)

Bases: pyapacheatlas.core.util.AtlasBaseClient

assignTerm(entities, termGuid=None, termName=None, glossary_name='Glossary', glossary_guid=None)

Assign a single term to many entities. Provide either a term guid (if you know it) or provide the term name and glossary name. If term name is provided, term guid is ignored.

As for entities, you may provide a list of AtlasEntity BUT they must have a valid guid defined (not None, not -N) or it will fail with a transient error. Alternatively, you may provide your own dict that contains a ‘guid’ key and value.

Parameters
  • entities (list(Union(dict, AtlasEntity))) – The list of entities that should have the term assigned.

  • termGuid (str) – The guid for the term. Ignored if using termName.

  • termName (str) – The name of the term. Optional if using termGuid.

  • glossary_name (str) – The name of the glossary. Defaults to Glossary. Ignored if using termGuid.

Returns

A dictionary indicating success or failure.

Return type

dict

delete_assignedTerm(entities, termGuid=None, termName=None, glossary_name='Glossary', glossary_guid=None)

Remove a single term from many entities. Provide either a term guid (if you know it) or provide the term name and glossary name. If term name is provided, term guid is ignored.

As for entities, you may provide a list of AtlasEntity BUT they must have a valid guid defined (not None, not -N) and a relationshipAttribute of meanings with an entry that has the term’s guid and relationshipGuid. Alternatively, you may provide your own dict that contains a ‘guid’ and ‘relationshipGuid’ key and value. Lastly, you may also pass in the results of the ‘entities’ key from the get_entity method and it will parse the relationshipAttributes properly and silently ignore the meanings that do not match the termGuid.

Parameters
  • entities (list(Union(dict, AtlasEntity))) – The list of entities that should have the term removed.

  • termGuid (str) – The guid for the term. Ignored if using termName.

  • termName (str) – The name of the term. Optional if using termGuid.

  • glossary_name (str) – The name of the glossary. Defaults to Glossary. Ignored if using termGuid.

Returns

A dictionary indicating success or failure.

Return type

dict

get_glossary(name='Glossary', guid=None, detailed=False)

Retrieve the specified glossary by name or guid along with the term headers (AtlasRelatedTermHeader: including displayText and termGuid). Providing the glossary name only will result in a lookup of all glossaries and returns the term headers (accessible via “terms” key) for all glossaries. Use detailed = True to return the full detail of terms (AtlasGlossaryTerm) accessible via “termInfo” key.

Parameters
  • name (str) – The name of the glossary to use, defaults to “Glossary”. Not required if using the guid parameter.

  • guid (str) – The unique guid of your glossary. Not required if using the name parameter.

  • detailed (bool) – Set to true if you want to pull back all terms and not just headers.

Returns

The requested glossary with the term headers (AtlasGlossary) or with detailed terms (AtlasGlossaryExtInfo).

Return type

list(dict)

get_term(guid=None, name=None, glossary_name='Glossary', glossary_guid=None)

Retrieve a single glossary term based on its guid. Providing only the glossary_name will result in a lookup for the glossary guid. If you plan on looking up many terms, consider using the get_glossary method with the detailed argument set to True. That method will provide all glossary terms in a dictionary for faster lookup.

Parameters
  • guid (str) – The guid of your term. Not required if name is specified.

  • name (str) – The name of your term’s display text. Overruled if guid is provided.

  • glossary_name (str) – The name of the glossary to use, defaults to “Glossary”. Not required if using the glossary_guid parameter.

  • glossary_guid (str) – The unique guid of your glossary. Not required if using the glossary_name parameter.

Returns

The requested glossary term as a dict.

Return type

dict

get_termAssignedEntities(termGuid=None, termName=None, glossary_name='Glossary', limit=-1, offset=0, sort='ASC', glossary_guid=None)

Page through the assigned entities for the given term.

Parameters
  • termGuid (str) – The guid for the term. Ignored if using termName.

  • termName (str) – The name of the term. Optional if using termGuid.

  • glossary_name (str) – The name of the glossary. Defaults to Glossary. Ignored if using termGuid.

Returns

A list of Atlas relationships between the given term and entities.

Return type

list(dict)
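Paging with limit and offset, as the signature above allows, generally means requesting a fixed-size page, advancing the offset, and stopping once a short page comes back. The sketch below shows that loop against an in-memory stand-in. fetch_page and fake_fetch are hypothetical and do not call the real client.

```python
def page_through(fetch_page, limit=2, offset=0):
    """Yield results page by page until a page shorter than limit is returned."""
    while True:
        page = fetch_page(limit=limit, offset=offset)
        yield from page
        if len(page) < limit:
            break
        offset += limit


# In-memory stand-in for a service that returns assigned entities.
_ASSIGNED = [{"guid": f"entity-{i}"} for i in range(5)]


def fake_fetch(limit, offset):
    return _ASSIGNED[offset:offset + limit]


results = list(page_through(fake_fetch, limit=2))
print(len(results))
```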

upload_term(term, force_update=False, **kwargs)

Upload a single term to Apache Atlas.

Provide an AtlasGlossaryTerm or dictionary.

Parameters
  • term (Union(AtlasGlossaryTerm, dict)) – The term to be uploaded.

  • force_update (bool) – Currently not used.

Kwargs:
param dict parameters

The parameters to pass into the url.

Returns

The uploaded term’s current state.

Return type

dict

upload_terms(terms, force_update=False, **kwargs)

Upload multiple terms to Apache Atlas.

Provide a list of AtlasGlossaryTerms or dictionaries.

Parameters
Kwargs:
param dict parameters

The parameters to pass into the url.

Returns

The uploaded terms’ current states.

Return type

dict

class pyapacheatlas.core.glossary.PurviewGlossaryClient(endpoint_url, authentication, **kwargs)

Bases: pyapacheatlas.core.glossary.glossaryclient.GlossaryClient

export_terms(guids, csv_path, glossary_name='Glossary', glossary_guid=None)

Export specific terms as provided by guid. Due to the design of Purview, you may not export terms with different term templates. Instead, you should batch exports based on the term template.

This method writes the csv file to the provided path.

Parameters
  • guids (list(str)) – List of guids that should be exported as csv.

  • csv_path (str) – Path where the exported CSV will be written.

  • glossary_name (str) – Name of the glossary. Defaults to ‘Glossary’. Not used if glossary_guid is provided.

  • glossary_guid (str) – Guid of the glossary, optional if glossary_name is provided. Otherwise, this parameter takes priority over glossary_name. Providing glossary_guid is also faster as you avoid a lookup based on glossary_name.

Returns

A csv file is written to the csv_path.

Return type

None

import_terms(csv_path, glossary_name='Glossary', glossary_guid=None)

Bulk import terms from an existing csv file. If you are using the system default, you must include the following headers: Name,Definition,Status,Related Terms,Synonyms,Acronym,Experts,Stewards

For custom term templates, additional attributes must include [Attribute][termTemplateName]attributeName as the header.

In the resulting JSON, you will receive an operation guid that can be passed to the PurviewClient.glossary.import_terms_status method to determine the success or failure of the import.

Parameters
  • csv_path (str) – Path to CSV that will be imported.

  • glossary_name (str) – Name of the glossary. Defaults to ‘Glossary’. Not used if glossary_guid is provided.

  • glossary_guid (str) – Guid of the glossary, optional if glossary_name is provided. Otherwise, this parameter takes priority over glossary_name.

Returns

A dict that contains an id that you can use in import_terms_status to get the status of the import operation.

Return type

dict
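A CSV with the system default headers listed above can be produced with the standard csv module before handing its path to import_terms. The sketch below only writes and re-reads the file. The sample row and the temporary file location are illustrative assumptions.

```python
import csv
import os
import tempfile

# Headers required for the system default term template, as listed above.
HEADERS = ["Name", "Definition", "Status", "Related Terms",
           "Synonyms", "Acronym", "Experts", "Stewards"]

# Illustrative row; columns that are not supplied are written as empty strings.
rows = [
    {"Name": "Customer",
     "Definition": "A person or organization that buys goods.",
     "Status": "Draft"},
]

csv_path = os.path.join(tempfile.mkdtemp(), "terms.csv")
with open(csv_path, "w", newline="", encoding="utf-8") as fp:
    writer = csv.DictWriter(fp, fieldnames=HEADERS, restval="")
    writer.writeheader()
    writer.writerows(rows)

# Read the header line back to confirm the column order.
with open(csv_path, newline="", encoding="utf-8") as fp:
    header_line = fp.readline().strip()
print(header_line)
```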

import_terms_status(operation_guid)

Get the operation status of a glossary term import activity. You get the operation_guid after executing the import_terms method and find the id field in the response dict/json.

Parameters

operation_guid (str) – The id of the import operation.

Returns

The status of the import operation as a dict. The dict includes a field called status that will report back RUNNING, SUCCESS, or FAILED. Other fields include the number of terms detected and number of errors.

Return type

dict
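Because the status field reports RUNNING until the import finishes, callers typically poll until it changes. The following is a minimal polling sketch. wait_for_import and fake_status are hypothetical names, and the stand-in replaces a real call to import_terms_status.

```python
import time


def wait_for_import(get_status, operation_guid, poll_seconds=5.0, max_polls=60):
    """Poll get_status until the status is no longer RUNNING (hypothetical helper)."""
    for _ in range(max_polls):
        result = get_status(operation_guid)
        if result.get("status") != "RUNNING":
            return result
        time.sleep(poll_seconds)
    raise TimeoutError(
        f"Import {operation_guid} was still RUNNING after {max_polls} polls"
    )


# Stand-in that reports RUNNING twice and then SUCCESS.
_responses = iter([
    {"status": "RUNNING"},
    {"status": "RUNNING"},
    {"status": "SUCCESS"},
])


def fake_status(operation_guid):
    return next(_responses)


final = wait_for_import(fake_status, "example-operation-id", poll_seconds=0)
print(final["status"])
```

With a real client, get_status would be the import_terms_status method and operation_guid the id returned by import_terms.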

upload_term(term, includeTermHierarchy=True, force_update=False, **kwargs)

Upload a single term to Azure Purview. If you plan on uploading many terms programmatically, you might look at PurviewClient.glossary.upload_terms or PurviewClient.glossary.import_terms.

Provide a PurviewGlossaryTerm or dictionary.

Parameters
  • term (Union(PurviewGlossaryTerm, dict)) – The term to be uploaded.

  • includeTermHierarchy (bool) – Must be True if you are using hierarchy or term templates.

  • force_update (bool) – Currently not used.

Kwargs:
param dict parameters

The parameters to pass into the url.

Returns

The uploaded term’s current state.

Return type

dict

upload_terms(terms, includeTermHierarchy=True, force_update=False, **kwargs)

Upload many terms to Azure Purview. However, if you plan on uploading many terms with many details, you might look at PurviewClient.glossary.import_terms instead which supports a csv upload.

Provide a list of PurviewGlossaryTerms or dictionaries.

Parameters
  • terms (list(Union(PurviewGlossaryTerm, dict))) – The terms to be uploaded.

  • includeTermHierarchy (bool) – Must be True if you are using hierarchy or term templates.

  • force_update (bool) – Currently not used.

Kwargs:
param dict parameters

The parameters to pass into the url.

Returns

The uploaded terms’ current states.

Return type

dict

class pyapacheatlas.core.glossary.PurviewGlossaryTerm(**kwargs)

Bases: pyapacheatlas.core.glossary.term._CrossPlatformTerm

Create a Purview Glossary Term that supports term template attributes and hierarchical parents.

Parameters
  • name (str) – A term that you want to upload. Also used as the nickname.

  • glossary_guid (str) – The guid of the glossary the term belongs to.

  • qualifiedName (str) – The qualified name of the term. It is usually termName@GlossaryName.

  • status (str) – Should be one of Draft, Approved, Alert, Expired.

  • longDescription (str) – The long description of the term.

Purview Supported Term Features:

Parameters
  • parent_formal_name (str) – The formal name of the parent term which would be used in the hierarchy. It will be concatenated with term’s value to create the formal name of the uploaded term. Must be provided if you plan on using the hierarchy feature of Purview.

  • parent_term_guid (str) – The guid of the parent term which would be used in the hierarchy. If you only provide parent_formal_name, get_glossary_term is called to get this guid.

  • resources (list(dict)) – An array of resource objects with keys displayName and url.

  • seeAlso (list(dict)) – Each dictionary should have the key termGuid and value of a guid.

  • synonyms (list(dict)) – Each dictionary should have the key termGuid and value of a guid.

  • contacts (dict(str,list(dict(str,str)))) – Root keys of both or either Experts or Stewards and the inner dicts in the list should have a key of id with an AAD object id as value and key of info with a string.

Additional Args: While the Glossary Term may support these, not all are visible on the Azure Purview portal UI.

Parameters
  • abbreviation (str) – The abbreviation of the term.

  • anchor (dict(str, str)) – Not required if passing in glossary_guid. Otherwise, should have key glossaryGuid and value of the glossary guid you want to use.

  • examples (str) – A list of examples.

  • guid (str) – The guid of the term. Not necessary for new uploads.

  • usage (str) – The usage of the term.

  • antonyms (list(dict)) – A list of AtlasRelatedTermHeaders.

  • classifies (list(dict)) – A list of AtlasRelatedTermHeaders.

  • isA (list(dict)) – A list of AtlasRelatedTermHeaders.

  • preferredTerms (list(dict)) – A list of AtlasRelatedTermHeaders.

  • preferredToTerms (list(dict)) – A list of AtlasRelatedTermHeaders.

  • replacedBy (list(dict)) – A list of AtlasRelatedTermHeaders.

  • replacementTerms (list(dict)) – A list of AtlasRelatedTermHeaders.

  • translatedTerms (list(dict)) – A list of AtlasRelatedTermHeaders.

  • translationTerms (list(dict)) – A list of AtlasRelatedTermHeaders.

  • validValues (list(dict)) – A list of AtlasRelatedTermHeaders.

  • validValuesFor (list(dict)) – A list of AtlasRelatedTermHeaders.

add_expert(objectId, info='')

Add an expert to your term’s contacts. You must provide the AAD object id and can optionally provide some information about the user.

Parameters
  • objectId (str) – The AAD object Id of the user.

  • info (str) – Optional information about the user.

add_hierarchy(parentFormalName, parentGuid)

Add hierarchy to your term. It creates the parentTerm property and requires the parent’s guid and the parent’s formal name.

Parameters
  • parentFormalName (str) – The formal name of the parent term.

  • parentGuid (str) – The guid of the parent term.

add_steward(objectId, info='')

Add a steward to your term’s contacts. You must provide the AAD object id and can optionally provide some information about the user.

Parameters
  • objectId (str) – The AAD object Id of the user.

  • info (str) – Optional information about the user.

property name

The name of the term. Will be concatenated with parent formal name if provided.

Returns

The formal name of the term.

Return type

str

property nickName

Returns the raw name. It will match the name property if no parent formal name is provided.

Returns

The raw name of the term.

Return type

str

property parentFormalName

The formal name of the parent term used in hierarchies.

Returns

The parent formal name.

Return type

str

property parentGuid

The guid of the parent term used in hierarchies.

Returns

The parent guid.

Return type

str

property qualifiedName

The qualified name should be suffixed with @glossaryName, with @Glossary being the correct suffix for the default glossary. If a parent formal name was provided, the qualified name will be prefixed with the parent formal name.

Returns

The qualified name of the term.

Return type

str
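The relationship between name, parent formal name, and qualifiedName can be sketched as two small functions. The underscore delimiter between parent and child names is an assumption: this reference only says the values are concatenated. Both helpers are hypothetical.

```python
def formal_name(name, parent_formal_name=None, delimiter="_"):
    """Prefix the raw name with the parent's formal name when one is given.

    The delimiter is an assumed value, not confirmed by this reference.
    """
    if parent_formal_name:
        return f"{parent_formal_name}{delimiter}{name}"
    return name


def qualified_name(name, glossary_name="Glossary", parent_formal_name=None,
                   delimiter="_"):
    """Suffix the formal name with @glossaryName."""
    return f"{formal_name(name, parent_formal_name, delimiter)}@{glossary_name}"


flat = qualified_name("Revenue")
nested = qualified_name("Revenue", parent_formal_name="Finance")
print(flat)
print(nested)
```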

to_json()

Convert the PurviewGlossaryTerm to a json / dict. It will omit the non-initialized fields and handle the nickname, name, qualifiedName, and parentTerm (for hierarchical terms) appropriately if set using the add_hierarchy method.

Returns

The glossary term as a dict.

Return type

dict

pyapacheatlas.core.msgraph module

class pyapacheatlas.core.msgraph.MsGraphClient(authentication, **kwargs)

Bases: object

email_to_id(email, api_version='v1.0')

Based on email address, look up the user in Azure Active Directory. It’s set to exact match on the mail field of the graph user response.

Parameters

email (str) – The email address of the user.

Returns

The AAD object id of the user.

Return type

str

upn_to_id(userPrincipalName, api_version='v1.0')

Based on user principal name, look up the user in Azure Active Directory.

Parameters

userPrincipalName (str) – The user principal name of the user.

Returns

The AAD object id of the user principal name.

Return type

str

exception pyapacheatlas.core.msgraph.MsGraphException

Bases: BaseException

pyapacheatlas.core.typedef module

class pyapacheatlas.core.typedef.AtlasAttributeDef(name, **kwargs)

Bases: object

An implementation of AtlasAttributeDef. Provides a standard way to pass in an attribute definition when defining an Entity.

Parameters

name (str) – The name of the Attribute Definition.

Kwargs:
param cardinality

One of Cardinality.SINGLE, .SET, .LIST. Defaults to SINGLE.

type cardinality

pyapacheatlas.core.typedef.Cardinality

param str typeName

The type of this attribute. Defaults to string.

propertiesEnum = ['cardinality', 'constraints', 'defaultValue', 'description', 'displayName', 'includeInNotification', 'indexType', 'isIndexable', 'isOptional', 'isUnique', 'name', 'options', 'searchWeight', 'typeName', 'valuesMaxCount', 'valuesMinCount']

to_json(omit_nulls=True)

class pyapacheatlas.core.typedef.AtlasRelationshipAttributeDef(name, relationshipTypeName, **kwargs)

Bases: pyapacheatlas.core.typedef.AtlasAttributeDef

An implementation of AtlasRelationshipAttributeDef. Provides a standard way to pass in a relationship definition when defining an Entity rather than creating a separate relationship def.

Parameters
  • name (str) – The name of the Relationship Attribute Definition.

  • relationshipTypeName (str) – The name of the relationship type being defined. Commonly uses ‘endDef1_endDef2’ where endDef’s are the names given based on the relationship being used.

Kwargs:
param cardinality

One of Cardinality.SINGLE, .SET, .LIST. Defaults to SINGLE.

type cardinality

pyapacheatlas.core.typedef.Cardinality

param str typeName

The type of this attribute. Defaults to string.

class pyapacheatlas.core.typedef.AtlasRelationshipEndDef(name, typeName, cardinality=Cardinality.SINGLE, isContainer=False, **kwargs)

Bases: object

An implementation of AtlasRelationshipEndDef.

Parameters
  • name (str) – The name that will appear on the entity’s relationship attribute.

  • typeName (str) – The type that is required for this end of the relationship.

  • cardinality – The cardinality of the end definition.

  • isContainer (bool) – This should be False when the cardinality is SINGLE. It should be True when cardinality is SET or LIST. endDef1 should

Kwargs:
param str description

The description of this end of the relationship.

param bool isLegacyAttribute

Defaults to False.

to_json(omit_nulls=True)

Converts the typedef object to a dict / json.

Parameters

omit_nulls (bool) – If True, omits keys with value of None.

Returns

The dict / json version of the type def.

Return type

dict
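The effect of omit_nulls can be illustrated with a plain dict filter. This is a sketch of the documented behaviour, not the library’s implementation. The function name and the sample end definition are assumptions.

```python
def to_json_sketch(values, omit_nulls=True):
    """Return a copy of values, dropping None entries when omit_nulls is True."""
    if omit_nulls:
        return {key: value for key, value in values.items() if value is not None}
    return dict(values)


# Sample end definition with one unset (None) field.
end_def = {
    "name": "columns",
    "cardinality": "SET",
    "isContainer": True,
    "description": None,
}

compact = to_json_sketch(end_def)
full = to_json_sketch(end_def, omit_nulls=False)
print(compact)
print(full)
```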

class pyapacheatlas.core.typedef.AtlasStructDef(name, category, **kwargs)

Bases: pyapacheatlas.core.typedef.BaseTypeDef

An implementation of AtlasStructDef. Not expected to be used by end users.

Parameters
  • name (str) – The name of the type definition.

  • category (TypeCategory) – The category of the typedef.

Kwargs:
param attributeDefs

The AtlasAttributeDefs that should be available on the struct.

type attributeDefs

list(Union(dict, pyapacheatlas.core.typedef.AtlasAttributeDef))

addAttributeDef(*args)

Add one or many attribute definitions.

Parameters

args (Union(dict, pyapacheatlas.core.typedef.AtlasAttributeDef)) – The attribute defs you are adding. They are comma delimited dicts or AtlasAttributeDefs. You can expand a list with *my_list.

property attributeDefs

Returns

List of attribute definitions.

Return type

list(dict)

to_json(omit_nulls=True)

Convert the definition to a JSON dict.

Returns

The definition as a dict.

Return type

dict

class pyapacheatlas.core.typedef.BaseTypeDef(name, category, **kwargs)

Bases: object

An implementation of AtlasBaseTypeDef.

Parameters
  • name (str) – The name of the typedef.

  • category (TypeCategory) – The category of the typedef.

to_json(omit_nulls=True)

Converts the typedef object to a dict / json.

Parameters

omit_nulls (bool) – If True, omits keys with value of None.

Returns

The dict / json version of the type def.

Return type

dict

class pyapacheatlas.core.typedef.Cardinality(value)

Bases: enum.Enum

An implementation of an Atlas Cardinality used in relationshipDefs.

LIST = 'LIST'
SET = 'SET'
SINGLE = 'SINGLE'

class pyapacheatlas.core.typedef.ChildEndDef(name, typeName, **kwargs)

Bases: pyapacheatlas.core.typedef.AtlasRelationshipEndDef

A helper for creating a Child end def (e.g. EndDef2 that is a single). The goal being to simplify and reduce the margin of error when creating containing relationships. This should be used in EndDef2 when the relationshipCategory is COMPOSITION or AGGREGATION.

Defaults to cardinality of SINGLE and isContainer=False.

class pyapacheatlas.core.typedef.ClassificationTypeDef(name, entityTypes=[], superTypes=[], **kwargs)

Bases: pyapacheatlas.core.typedef.AtlasStructDef

An implementation of AtlasClassificationDef

Parameters
  • name (str) – The name of the type definition.

  • entityTypes (list(str)) – The list of entityTypes for the classification.

  • superTypes (list(str)) – The list of superTypes for the classification.

Kwargs:
param attributeDefs

The AtlasAttributeDefs that should be available on the Classification.

type attributeDefs

list(Union(dict, pyapacheatlas.core.typedef.AtlasAttributeDef))

param list(str) subTypes

The types that will inherit this classification.

class pyapacheatlas.core.typedef.EntityTypeDef(name, superTypes=['DataSet'], **kwargs)

Bases: pyapacheatlas.core.typedef.AtlasStructDef

An implementation of AtlasEntityDef

Parameters
  • name (str) – The name of the type definition.

  • superTypes (list(str)) – The list of superTypes for the entity type. You most likely want [‘DataSet’], the default, to create a DataSet asset.

Kwargs:
param attributeDefs

The AtlasAttributeDefs that should be available on the Entity.

type attributeDefs

list(Union(dict, pyapacheatlas.core.typedef.AtlasAttributeDef))

addRelationshipAttributeDef(*args)

Add one or many relationship attribute definitions.

Parameters

args (Union(dict, pyapacheatlas.core.typedef.AtlasRelationshipAttributeDef)) – The relationship attribute defs you are adding. They are comma delimited dicts or AtlasRelationshipAttributeDefs. You can expand a list with *my_list.

property relationshipAttributeDefs

Returns

List of relationship attribute definitions.

Return type

list(dict)

to_json(omit_nulls=True)

Convert the definition to a JSON dict.

Returns

The definition as a dict.

Return type

dict

class pyapacheatlas.core.typedef.ParentEndDef(name, typeName, **kwargs)

Bases: pyapacheatlas.core.typedef.AtlasRelationshipEndDef

A helper for creating a Parent end def (e.g. EndDef1 that is a container). The goal being to simplify and reduce the margin of error when creating containing relationships. This should be used in EndDef1 when the relationshipCategory is COMPOSITION or AGGREGATION.

Defaults to cardinality of SET and isContainer=True.

class pyapacheatlas.core.typedef.RelationshipTypeDef(name, endDef1, endDef2, relationshipCategory, **kwargs)

Bases: pyapacheatlas.core.typedef.BaseTypeDef

An implementation of AtlasRelationshipDef.

Parameters
  • name (str) – The name of the relationship type def.

  • endDef1 (Union(AtlasRelationshipEndDef, dict)) – Either a valid AtlasRelationshipEndDef dict or class object.

  • endDef2 (Union(AtlasRelationshipEndDef, dict)) – Either a valid AtlasRelationshipEndDef dict or class object.

  • relationshipCategory (str) – One of COMPOSITION, AGGREGATION, ASSOCIATION. You’re most likely looking at COMPOSITION to create a parent/child relationship.

property endDef1
Returns

The first end definition.

Return type

dict

property endDef2
Returns

The second end definition.

Return type

dict

to_json(omit_nulls=True)

Convert the definition to a JSON dict.

Returns

The definition as a dict.

Return type

dict

class pyapacheatlas.core.typedef.TypeCategory(value)

Bases: enum.Enum

An implementation of an Atlas TypeCategory used in relationshipDefs.

BUSINESSMETADATA = 'business_Metadata'
CLASSIFICATION = 'classification'
ENTITY = 'entity'
ENUM = 'enum'
RELATIONSHIP = 'relationship'
STRUCT = 'struct'

pyapacheatlas.core.util module

class pyapacheatlas.core.util.AtlasBaseClient(**kwargs)

Bases: object

exception pyapacheatlas.core.util.AtlasException

Bases: BaseException

class pyapacheatlas.core.util.AtlasUnInit

Bases: object

Represents a value that has not been initialized and will not be included in json body.

class pyapacheatlas.core.util.GuidTracker(starting=-1000, direction='decrease')

Bases: object

Always grab the next available guid by either incrementing or decrementing. When defining an interconnected set of Atlas Entities, you use a negative integer to provide an entity with a temporary unique id.
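To make the temporary-guid idea concrete, here is a minimal stand-in that mimics the behavior described for this class. SketchGuidTracker is a hypothetical name, the entity dicts are illustrative, and whether the first guid handed out equals the starting value or one step past it is an assumption of this sketch.

```python
class SketchGuidTracker:
    """Hypothetical stand-in that mimics the documented GuidTracker behavior."""

    def __init__(self, starting=-1000, direction="decrease"):
        if direction not in ("increase", "decrease"):
            raise ValueError("direction must be 'increase' or 'decrease'")
        self._guid = starting
        self._step = -1 if direction == "decrease" else 1

    def peek_next_guid(self):
        """Return the next guid without updating the tracker."""
        return str(self._guid + self._step)

    def get_guid(self):
        """Advance the tracker and return the new guid."""
        self._guid += self._step
        return str(self._guid)


tracker = SketchGuidTracker()
table_guid = tracker.get_guid()
column_guid = tracker.get_guid()

# Each entity gets its own temporary negative guid, and the column can refer
# to the table by that guid before either entity exists on the server.
table = {"typeName": "custom_table", "guid": table_guid}
column = {
    "typeName": "custom_column",
    "guid": column_guid,
    "relationshipAttributes": {"table": {"guid": table_guid}},
}
```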

get_guid()

Retrieve the next unique guid and update the guid.

Returns

A “unique” integer guid for your atlas object.

Return type

str

peek_next_guid()

Peek at the next guid without updating the guid.

Returns

The next guid you would receive.

Return type

str

pyapacheatlas.core.util.PurviewLimitation(func)

Raise a runtime warning if you are using a PurviewClient. Intended to wrap specific client methods that have limitations due to Purview.

pyapacheatlas.core.util.PurviewOnly(func)

Raise a runtime warning if you are using an AtlasClient (or non Purview) client. Intended to wrap specific client methods that are only available in Purview.
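The general shape of such a warning decorator might look like the sketch below. It is not the library's code: sketch_purview_only, SketchClient and the is_purview attribute are assumed names used only to show how a wrapper can warn and still call the wrapped method.

```python
import functools
import warnings


def sketch_purview_only(func):
    """Hypothetical decorator: warn when the client is not a Purview client."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        # `is_purview` is an assumed attribute used only for this sketch.
        if not getattr(self, "is_purview", False):
            warnings.warn(
                f"{func.__name__} is only available in Purview.", RuntimeWarning
            )
        return func(self, *args, **kwargs)

    return wrapper


class SketchClient:
    """Hypothetical client used to exercise the decorator."""

    is_purview = False

    @sketch_purview_only
    def purview_feature(self):
        return "called"


# Record warnings instead of printing them so the behavior can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = SketchClient().purview_feature()
```

Because the wrapper only warns, the decorated method still runs and returns its normal result.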

pyapacheatlas.core.util.batch_dependent_entities(entities, batch_size=1000)

Take a list of entities and organize them into batches of at most batch_size.

This algorithm handles uploading multiple entities that depend on each other. For example, if A depends on B and B depends on C, then all three entities are guaranteed to be in the same batch.

Dependencies can be specified in either direction. For example a table may not have any relationship attribute dependencies. However, several columns may point to the given table. This will be handled by this function.

Parameters
  • entities (list(dict)) – A list of AtlasEntities to be uploaded as dicts

  • batch_size (int) – The maximum number of entities in a batch. Defaults to 1000.

Returns

A list of lists that organizes the entities into batches of at most batch_size, ordered from “most independent” to “least independent”: batches with more dependencies between their entities appear toward the end of the list.

Return type

list( list(dict) )
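One way to picture the grouping described above is a union-find pass over the guids followed by a simple packing step. This is a sketch of the general technique, not the function's actual algorithm; referenced_guids and sketch_batch_dependent are hypothetical names, and the sketch assumes dependencies are expressed as {'guid': ...} references inside relationshipAttributes.

```python
from collections import defaultdict


def referenced_guids(entity):
    """Collect guids referenced from an entity's relationshipAttributes."""
    found = []
    for value in entity.get("relationshipAttributes", {}).values():
        refs = value if isinstance(value, list) else [value]
        for ref in refs:
            if isinstance(ref, dict) and "guid" in ref:
                found.append(ref["guid"])
    return found


def sketch_batch_dependent(entities, batch_size=1000):
    """Group connected entities, then pack whole groups into batches."""
    parent = {entity["guid"]: entity["guid"] for entity in entities}

    def find(guid):
        while parent[guid] != guid:
            parent[guid] = parent[parent[guid]]  # path halving
            guid = parent[guid]
        return guid

    # Union in either direction: a column pointing at a table joins its group.
    for entity in entities:
        for other in referenced_guids(entity):
            if other in parent:  # ignore references to entities outside the list
                parent[find(entity["guid"])] = find(other)

    groups = defaultdict(list)
    for entity in entities:
        groups[find(entity["guid"])].append(entity)

    # Smallest groups first; a group is never split, so a group larger than
    # batch_size ends up alone in an oversized batch in this sketch.
    batches, current = [], []
    for group in sorted(groups.values(), key=len):
        if current and len(current) + len(group) > batch_size:
            batches.append(current)
            current = []
        current.extend(group)
    if current:
        batches.append(current)
    return batches


example_entities = [
    {"guid": "-1", "typeName": "custom_table"},
    {"guid": "-2", "typeName": "custom_column",
     "relationshipAttributes": {"table": {"guid": "-1"}}},
    {"guid": "-3", "typeName": "custom_column",
     "relationshipAttributes": {"table": {"guid": "-1"}}},
    {"guid": "-4", "typeName": "custom_table"},
]
example_batches = sketch_batch_dependent(example_entities, batch_size=3)
```

With these four entities and a batch size of 3, the unrelated table lands in its own batch and the first table stays together with both of its columns.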

pyapacheatlas.core.whatif module

class pyapacheatlas.core.whatif.EntityField(name, isOptional)

Bases: tuple

property isOptional

Alias for field number 1

property name

Alias for field number 0

class pyapacheatlas.core.whatif.WhatIfValidator(type_defs={}, existing_entities=[])

Bases: object

Provides a simple way to validate that your entities will successfully upload. Provides functions to validate the type, check if required attributes are missing, and check if superfluous attributes are included.

Parameters
  • type_defs (dict) – The list of type definitions to be validated against. Should be in the form of an AtlasTypeDef composite wrapper.

  • existing_entities (list(dict)) – The existing entities that should be validated against.

ASSET_ATTRIBUTES = ['name', 'description', 'owner']
ATLAS_MODEL = {'ASSET': ['name', 'description', 'owner'], 'DATASET': ['name', 'description', 'owner', 'qualifiedName'], 'INFRASTRUCTURE': ['name', 'description', 'owner', 'qualifiedName'], 'PROCESS': ['inputs', 'outputs', 'name', 'description', 'owner', 'qualifiedName'], 'REFERENCABLE': ['qualifiedName']}
REFERENCABLE_ATTRIBUTES = ['qualifiedName']
entity_has_invalid_attributes(entity)

Check if the entity is using attributes that are not defined on the type.

Parameters

entity (dict) – The entity to check.

Returns

Whether the entity uses attributes that are not defined on its type.

Return type

bool

entity_missing_attributes(entity)

Check if the entity is missing required attributes.

Parameters
  • entity (dict) – The entity to check.

  • type_def (dict) –

Returns

Whether the entity is missing attributes that its type requires.

Return type

bool

entity_type_exists(entity)

Validate that the entity’s type is an actual entity definition.

Parameters

entity (dict) – The entity to check.

Returns

Whether the entity matches the list of known entity types.

Return type

bool

entity_would_overwrite(entity)

Check, based on the qualified name attribute, whether the provided entity already exists among the existing entities given to the WhatIfValidator.

Parameters

entity (dict) – The entity to check.

Returns

Whether the entity matches an existing entity.

Return type

bool

validate_entities(entities)

Provide a report of invalid entities. Includes TypeDoesNotExist, UsingInvalidAttributes, and MissingRequiredAttributes.

Parameters

entities (list(dict)) – A list of entities to validate.

Returns

A dictionary containing counts and values for the categories listed above.

Return type

dict
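The three report categories can be illustrated with a small stand-alone check over plain dicts. This is a sketch under assumptions, not the validator's implementation: SKETCH_TYPES, sketch_validate, the custom_table type and the counts/values layout of the result are all hypothetical.

```python
SKETCH_TYPES = {
    # Hypothetical type with its required and optional attribute names.
    "custom_table": {
        "required": {"name", "qualifiedName"},
        "optional": {"description", "owner"},
    },
}


def sketch_validate(entities, types=SKETCH_TYPES):
    """Bucket entity guids into the three documented report categories."""
    values = {
        "TypeDoesNotExist": [],
        "UsingInvalidAttributes": [],
        "MissingRequiredAttributes": [],
    }
    for entity in entities:
        type_info = types.get(entity.get("typeName"))
        if type_info is None:
            values["TypeDoesNotExist"].append(entity["guid"])
            continue  # attribute checks need a known type
        present = set(entity.get("attributes", {}))
        if present - (type_info["required"] | type_info["optional"]):
            values["UsingInvalidAttributes"].append(entity["guid"])
        if type_info["required"] - present:
            values["MissingRequiredAttributes"].append(entity["guid"])
    counts = {category: len(guids) for category, guids in values.items()}
    return {"counts": counts, "values": values}


sample_entities = [
    {"guid": "-1", "typeName": "custom_table",
     "attributes": {"name": "t1", "qualifiedName": "db://t1"}},
    {"guid": "-2", "typeName": "mystery_type",
     "attributes": {"name": "t2", "qualifiedName": "db://t2"}},
    {"guid": "-3", "typeName": "custom_table",
     "attributes": {"name": "t3", "qualifiedName": "db://t3", "color": "red"}},
    {"guid": "-4", "typeName": "custom_table",
     "attributes": {"name": "t4"}},
]
sample_report = sketch_validate(sample_entities)
```

Running a check like this before uploading makes it possible to fix every flagged entity in one pass rather than discovering failures one request at a time.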