describe_cluster_event
- SageMaker.Client.describe_cluster_event(**kwargs)
Retrieves detailed information about a specific event for a given HyperPod cluster. This functionality is only supported when the NodeProvisioningMode is set to Continuous.
See also: AWS API Documentation
Request Syntax
response = client.describe_cluster_event(
    EventId='string',
    ClusterName='string'
)
- Parameters:
EventId (string) –
[REQUIRED]
The unique identifier (UUID) of the event to describe. This ID can be obtained from the ListClusterEvents operation.
ClusterName (string) –
[REQUIRED]
The name or Amazon Resource Name (ARN) of the HyperPod cluster associated with the event.
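As a minimal sketch of the call described above, the helper below passes both required parameters and returns the top-level EventDetails dict. The helper name fetch_cluster_event and the placeholder values in the comment are illustrative assumptions, not part of the API; the client argument is expected to be a boto3 SageMaker client.

```python
def fetch_cluster_event(client, cluster_name, event_id):
    """Return the EventDetails dict for one HyperPod cluster event.

    `client` is assumed to be a boto3 SageMaker client (or any object
    exposing a compatible describe_cluster_event method).
    """
    response = client.describe_cluster_event(
        EventId=event_id,          # UUID, e.g. taken from ListClusterEvents
        ClusterName=cluster_name,  # cluster name or ARN
    )
    return response["EventDetails"]


# Example use (needs boto3 and AWS credentials; values are placeholders):
#   import boto3
#   details = fetch_cluster_event(
#       boto3.client("sagemaker"), "my-hyperpod-cluster", "<event-uuid>"
#   )
```

Taking the client as an argument keeps the helper easy to exercise with a stub object, without network access.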
- Return type:
dict
- Returns:
Response Syntax
{
    'EventDetails': {
        'EventId': 'string',
        'ClusterArn': 'string',
        'ClusterName': 'string',
        'InstanceGroupName': 'string',
        'InstanceId': 'string',
        'ResourceType': 'Cluster'|'InstanceGroup'|'Instance',
        'EventTime': datetime(2015, 1, 1),
        'EventDetails': {
            'EventMetadata': {
                'Cluster': {
                    'FailureMessage': 'string',
                    'EksRoleAccessEntries': [
                        'string',
                    ],
                    'SlrAccessEntry': 'string'
                },
                'InstanceGroup': {
                    'FailureMessage': 'string',
                    'AvailabilityZoneId': 'string',
                    'CapacityReservation': {
                        'Arn': 'string',
                        'Type': 'ODCR'|'CRG'
                    },
                    'SubnetId': 'string',
                    'SecurityGroupIds': [
                        'string',
                    ],
                    'AmiOverride': 'string'
                },
                'InstanceGroupScaling': {
                    'InstanceCount': 123,
                    'TargetCount': 123,
                    'FailureMessage': 'string'
                },
                'Instance': {
                    'CustomerEni': 'string',
                    'AdditionalEnis': {
                        'EfaEnis': [
                            'string',
                        ]
                    },
                    'CapacityReservation': {
                        'Arn': 'string',
                        'Type': 'ODCR'|'CRG'
                    },
                    'FailureMessage': 'string',
                    'LcsExecutionState': 'string',
                    'NodeLogicalId': 'string'
                }
            }
        },
        'Description': 'string'
    }
}
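The response nests a second EventDetails key inside the top-level one, which is easy to overlook. The sketch below flattens a response into a small summary; the function name summarize_event and the summary keys are hypothetical, and the use of .get() reflects an assumption that some fields may be absent for a given event.

```python
from datetime import datetime, timezone


def summarize_event(response):
    """Flatten a describe_cluster_event response into a small summary dict."""
    outer = response["EventDetails"]
    # Event-specific metadata sits under a second, nested 'EventDetails' key.
    inner = outer.get("EventDetails", {})
    event_time = outer.get("EventTime")
    return {
        "event_id": outer.get("EventId"),
        "resource_type": outer.get("ResourceType"),
        # EventTime is documented as a datetime; render it as ISO 8601 text.
        "event_time": event_time.isoformat() if event_time else None,
        "description": outer.get("Description"),
        # Names of the EventMetadata members present in this event.
        "metadata_keys": sorted(inner.get("EventMetadata", {})),
    }


# Hand-built sample shaped like the Response Syntax above (not real output).
sample = {
    "EventDetails": {
        "EventId": "1234",
        "ResourceType": "Instance",
        "EventTime": datetime(2015, 1, 1, tzinfo=timezone.utc),
        "EventDetails": {"EventMetadata": {"Instance": {"NodeLogicalId": "n-1"}}},
        "Description": "sample event",
    }
}
summary = summarize_event(sample)
```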
Response Structure
(dict) –
EventDetails (dict) –
Detailed information about the requested cluster event, including event metadata for various resource types such as Cluster, InstanceGroup, Instance, and their associated attributes.
EventId (string) –
The unique identifier (UUID) of the event.
ClusterArn (string) –
The Amazon Resource Name (ARN) of the HyperPod cluster associated with the event.
ClusterName (string) –
The name of the HyperPod cluster associated with the event.
InstanceGroupName (string) –
The name of the instance group associated with the event, if applicable.
InstanceId (string) –
The EC2 instance ID associated with the event, if applicable.
ResourceType (string) –
The type of resource associated with the event. Valid values are Cluster, InstanceGroup, or Instance.
EventTime (datetime) –
The timestamp when the event occurred.
EventDetails (dict) –
Additional details about the event, including event-specific metadata.
EventMetadata (dict) –
Metadata specific to the event, which may include information about the cluster, instance group, or instance involved.
Note
This is a Tagged Union structure. Only one of the following top level keys will be set: Cluster, InstanceGroup, InstanceGroupScaling, Instance. If a client receives an unknown member it will set SDK_UNKNOWN_MEMBER as the top level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows:
'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}
Cluster (dict) –
Metadata specific to cluster-level events.
FailureMessage (string) –
An error message describing why the cluster-level operation (such as creating, updating, or deleting) failed.
EksRoleAccessEntries (list) –
A list of Amazon EKS IAM role ARNs associated with the cluster. These entries are created by HyperPod on your behalf and apply only to EKS-orchestrated clusters.
(string) –
SlrAccessEntry (string) –
The Service-Linked Role (SLR) associated with the cluster. This is created by HyperPod on your behalf and applies only to EKS-orchestrated clusters.
InstanceGroup (dict) –
Metadata specific to instance group-level events.
FailureMessage (string) –
An error message describing why the instance group-level operation (such as creating, scaling, or deleting) failed.
AvailabilityZoneId (string) –
The ID of the Availability Zone where the instance group is located.
CapacityReservation (dict) –
Information about the Capacity Reservation used by the instance group.
Arn (string) –
The Amazon Resource Name (ARN) of the Capacity Reservation.
Type (string) –
The type of Capacity Reservation. Valid values are ODCR (On-Demand Capacity Reservation) or CRG (Capacity Reservation Group).
SubnetId (string) –
The ID of the subnet where the instance group is located.
SecurityGroupIds (list) –
A list of security group IDs associated with the instance group.
(string) –
AmiOverride (string) –
If you use a custom Amazon Machine Image (AMI) for the instance group, this field shows the ID of the custom AMI.
InstanceGroupScaling (dict) –
Metadata related to instance group scaling events.
InstanceCount (integer) –
The current number of instances in the group.
TargetCount (integer) –
The desired number of instances for the group after scaling.
FailureMessage (string) –
An error message describing why the scaling operation failed, if applicable.
Instance (dict) –
Metadata specific to instance-level events.
CustomerEni (string) –
The ID of the customer-managed Elastic Network Interface (ENI) associated with the instance.
AdditionalEnis (dict) –
Information about additional Elastic Network Interfaces (ENIs) associated with the instance.
EfaEnis (list) –
A list of Elastic Fabric Adapter (EFA) ENIs associated with the instance.
(string) –
CapacityReservation (dict) –
Information about the Capacity Reservation used by the instance.
Arn (string) –
The Amazon Resource Name (ARN) of the Capacity Reservation.
Type (string) –
The type of Capacity Reservation. Valid values are ODCR (On-Demand Capacity Reservation) or CRG (Capacity Reservation Group).
FailureMessage (string) –
An error message describing why the instance creation or update failed, if applicable.
LcsExecutionState (string) –
The execution state of the Lifecycle Script (LCS) for the instance.
NodeLogicalId (string) –
The unique logical identifier of the node within the cluster. This is the same identifier used in the BatchAddClusterNodes API.
Description (string) –
A human-readable description of the event.
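Because EventMetadata is a tagged union, calling code usually needs to find out which single member is set before reading its fields. The following is a rough sketch of one way to do that, including the SDK_UNKNOWN_MEMBER case; the function names unpack_event_metadata and describe_metadata and the output wording are illustrative assumptions, not part of the SDK.

```python
# Member names taken from the EventMetadata structure documented above.
KNOWN_MEMBERS = ("Cluster", "InstanceGroup", "InstanceGroupScaling", "Instance")


def unpack_event_metadata(event_metadata):
    """Return (member_name, payload) for a tagged-union EventMetadata dict."""
    if "SDK_UNKNOWN_MEMBER" in event_metadata:
        return "SDK_UNKNOWN_MEMBER", event_metadata["SDK_UNKNOWN_MEMBER"]
    for name in KNOWN_MEMBERS:
        if name in event_metadata:
            return name, event_metadata[name]
    return None, {}  # no member set (for example, an empty dict)


def describe_metadata(event_metadata):
    """Build a one-line, human-readable summary of the metadata member."""
    member, payload = unpack_event_metadata(event_metadata)
    if member is None:
        return "no metadata"
    if member == "SDK_UNKNOWN_MEMBER":
        return f"unrecognized metadata member: {payload.get('name')}"
    parts = [member]
    # Scaling events carry the current and desired instance counts.
    if member == "InstanceGroupScaling" and all(
        key in payload for key in ("InstanceCount", "TargetCount")
    ):
        parts.append(
            f"{payload['InstanceCount']} of {payload['TargetCount']} instances"
        )
    # Every known member may include a FailureMessage.
    if payload.get("FailureMessage"):
        parts.append(f"failed: {payload['FailureMessage']}")
    return "; ".join(parts)
```

Given a response dict, the metadata would be read from response['EventDetails']['EventDetails']['EventMetadata'] before being passed to describe_metadata.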
Exceptions
SageMaker.Client.exceptions.ResourceNotFound
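A short sketch of handling the exception listed above: the helper returns None when the client raises ResourceNotFound and lets any other error propagate. The helper name describe_event_or_none is hypothetical, and the exact conditions under which the service raises ResourceNotFound (presumably a missing cluster or event) are an assumption.

```python
def describe_event_or_none(client, cluster_name, event_id):
    """Return EventDetails, or None if the service reports ResourceNotFound.

    `client` is assumed to be a boto3 SageMaker client, which exposes
    modeled exception classes under `client.exceptions`.
    """
    try:
        response = client.describe_cluster_event(
            EventId=event_id,
            ClusterName=cluster_name,
        )
    except client.exceptions.ResourceNotFound:
        # Treat "not found" as an expected outcome rather than a failure.
        return None
    return response["EventDetails"]
```

Catching only ResourceNotFound keeps unrelated problems, such as credential or validation errors, visible to the caller.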