Resolution
Follow these troubleshooting steps for the error that you received.
"java.lang.OutOfMemoryError: Java heap space"
You might receive the preceding error when you run a command for cluster operations without the client properties file. To resolve this issue, include the properties based on the type of authentication in the client.properties file.
Example command with only an AWS Identity and Access Management (IAM) authentication port:
./kafka-topics.sh --create --bootstrap-server $BOOTSTRAP:9098 --replication-factor 3 --partitions 1 --topic TestTopic
Example command with an IAM authentication port and the client properties file:
./kafka-topics.sh --create --bootstrap-server $BOOTSTRAP:9098 --command-config client.properties --replication-factor 3 --partitions 1 --topic TestTopic
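For reference, the following sketch writes a minimal client.properties file for IAM authentication. It assumes that you use the aws-msk-iam-auth library and that its JAR file is on the Kafka client's classpath. The file name matches the preceding command. Adjust the values if your setup differs.

```shell
# Write a minimal client.properties file for IAM authentication.
# Assumption: the aws-msk-iam-auth JAR is on the Kafka client classpath.
cat > client.properties <<'EOF'
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
EOF
```

For other authentication types, such as SASL/SCRAM or mutual TLS, the properties are different.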
"org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: createTopics"
You might receive the preceding error when there's a network misconfiguration between the client application and Amazon MSK cluster.
To troubleshoot this issue, run the following telnet command to test connectivity from the client machine:
telnet bootstrap-broker port-number
Note: Replace bootstrap-broker with one of the broker addresses from your Amazon MSK cluster. Replace port-number with the port value based on the authentication that's turned on for your cluster.
If the client machine can access the brokers, then there are no connectivity issues. If the client machine can't access the brokers, then review the network connectivity configuration settings. Check the inbound and outbound rules for the security group.
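If telnet isn't installed on the client machine, then a similar check might look like the following sketch. The check_port function name is hypothetical. The sketch assumes that bash and the coreutils timeout command are available, and it tests only whether a TCP connection opens.

```shell
# check_port HOST PORT: returns 0 if a TCP connection opens within 5 seconds.
# Uses bash's built-in /dev/tcp redirection, so telnet isn't required.
check_port() {
  local host="$1" port="$2"
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Hypothetical usage: replace bootstrap-broker and port-number with your values.
# check_port bootstrap-broker port-number && echo "reachable" || echo "not reachable"
```

A successful check shows only that the port is reachable. It doesn't confirm that authentication or authorization succeeds.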
"org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [test_topic]"
You might receive the preceding error when you use IAM authentication and your access policy blocks topic operations, such as WriteData and ReadData.
Note: Permission boundaries and service control policies (SCPs) also block a user's attempt to connect to the cluster without the required authorization.
If you use an authentication that's not IAM, then check whether you added topic-level access control lists (ACLs) that block operations.
To list the ACLs that are applied on a topic, run the following command:
bin/kafka-acls.sh --bootstrap-server $BOOTSTRAP:PORT --command-config adminclient-configs.conf --list --topic testtopic
"ZooKeeperClientTimeoutException"
You might receive the preceding error when the client tries to connect to the cluster through the Apache ZooKeeper string and the connection can't be established. You might also receive this error when the Apache ZooKeeper string is incorrect.
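To confirm the string, you can retrieve it from the cluster with the AWS CLI. The following is a sketch that assumes the AWS CLI is installed and configured with permission to describe the cluster. The get_zookeeper_string function name is hypothetical.

```shell
# get_zookeeper_string CLUSTER_ARN: prints the cluster's ZooKeeper connection string.
# Assumption: the AWS CLI is configured with permission to call DescribeCluster.
get_zookeeper_string() {
  aws kafka describe-cluster \
    --cluster-arn "$1" \
    --query 'ClusterInfo.ZookeeperConnectString' \
    --output text
}

# Hypothetical usage: pass the ARN of your own cluster.
# get_zookeeper_string "$CLUSTER_ARN"
```

Compare the output with the value that you pass to the --zookeeper option.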
Example incorrect Apache ZooKeeper string:
./kafka-topics.sh --zookeeper z-1.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:2181,z-2.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:2181,z-3.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:2181 --list
Example output:
[2020-04-10 23:58:47,963] WARN Client session timed out, have not heard from server in 10756ms for sessionid 0x0 (org.apache.zookeeper.ClientCnxn)
[2020-04-10 23:58:58,581] WARN Client session timed out, have not heard from server in 10508ms for sessionid 0x0 (org.apache.zookeeper.ClientCnxn)
[2020-04-10 23:59:08,689] WARN Client session timed out, have not heard from server in 10004ms for sessionid 0x0 (org.apache.zookeeper.ClientCnxn)
Exception in thread "main" kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:259)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:255)
at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:113)
at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1858)
at kafka.admin.TopicCommand$ZookeeperTopicService$.apply(TopicCommand.scala:321)
at kafka.admin.TopicCommand$.main(TopicCommand.scala:54)
at kafka.admin.TopicCommand.main(TopicCommand.scala)
To resolve this issue, take the following actions:
- Verify that you used the correct Apache ZooKeeper string.
- Make sure that the security group for your Amazon MSK cluster allows inbound traffic from the client's security group on the Apache ZooKeeper ports.
- If your Apache ZooKeeper nodes are associated with a different security group than the MSK brokers or the client, then check the rules on both security groups. The client's security group must allow connectivity to the security group that's associated with the ZooKeeper nodes on the required ZooKeeper ports. The inbound rules of the security group for the ZooKeeper network interfaces must allow connectivity from the client.
"Broker may not be available"
"Topic <topicName> not present in metadata after 60000 ms." or "Connection to node -<node-id> (<broker-host>/<broker-ip>:<port>) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)"
You might receive the preceding errors when one of the following is true:
- The producer or consumer can't connect to the broker host and port.
- The broker string is incorrect.
If you receive this error even though the client or broker connectivity initially worked, then the broker might be unavailable.
This error can also occur when you use the broker string to access the cluster from outside the virtual private cloud (VPC).
Example producer broker string:
./kafka-console-producer.sh --broker-list b-2.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9092,b-1.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9092 --topic test
Example output:
[2020-04-10 23:51:57,668] ERROR Error when sending message to topic test with key: null, value: 1 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Topic test not present in metadata after 60000 ms.
Example consumer broker string:
./kafka-console-consumer.sh --bootstrap-server b-2.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9092,b-1.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com:9092 --topic test
Example output:
[2020-04-11 00:03:21,157] WARN [Consumer clientId=consumer-console-consumer-88994-1, groupId=console-consumer-88994] Connection to node -1 (b-2.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com/172.31.6.19:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2020-04-11 00:04:36,818] WARN [Consumer clientId=consumer-console-consumer-88994-1, groupId=console-consumer-88994] Connection to node -2 (b-1.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com/172.31.44.252:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2020-04-11 00:05:53,228] WARN [Consumer clientId=consumer-console-consumer-88994-1, groupId=console-consumer-88994] Connection to node -1 (b-2.encryption.3a3zuy.c7.kafka.us-east-1.amazonaws.com/172.31.6.19:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
To troubleshoot this issue, take the following actions:
- Make sure that you use the correct broker string and port.
- For ZooKeeper managed clusters, check the ActiveControllerCount Amazon CloudWatch metric to verify that a controller was active during the period when the error occurred. If the metric's value isn't 1, then one of the brokers in the cluster might be unavailable. Use the sum statistic over a 1-minute period to view the metric.
- Check the ZooKeeperSessionState metric to confirm that the brokers are in continual communication with the Apache ZooKeeper nodes.
- To understand why the broker failed, check the KafkaDataLogsDiskUsed metric to determine whether the broker ran out of storage space. For more information, see Amazon MSK metrics for monitoring Standard brokers with CloudWatch.
- Check whether the network configuration caused the issue. By default, Amazon MSK resources are provisioned within the VPC, so clients must connect to the Amazon MSK cluster over a private network in the same VPC. For more information, see Unable to access cluster from within AWS: networking issues. Also, see How do I connect to my Amazon MSK cluster from inside AWS network but outside the cluster's Amazon VPC? in the Amazon MSK FAQ.
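The ActiveControllerCount check in the preceding list can be sketched with the AWS CLI as follows. The sketch assumes that the AWS CLI is configured, that GNU date is available, and that the metric is published with the "Cluster Name" dimension. The active_controller_count function name is hypothetical.

```shell
# active_controller_count CLUSTER_NAME: prints per-minute Sum datapoints for the last hour.
# Assumptions: the AWS CLI is configured, and GNU date is available for the -d option.
active_controller_count() {
  local cluster_name="$1"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/Kafka \
    --metric-name ActiveControllerCount \
    --dimensions "Name=Cluster Name,Value=${cluster_name}" \
    --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 60 \
    --statistics Sum \
    --query 'Datapoints[].[Timestamp,Sum]' \
    --output text
}

# Hypothetical usage: pass the name of your own cluster.
# active_controller_count my-msk-cluster
```

Datapoints with a value other than 1 mark the periods to investigate. The same pattern applies to other metrics, such as KafkaDataLogsDiskUsed, though those metrics might use different dimensions.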
"Topic not present in metadata"
"org.apache.kafka.common.errors.TimeoutException: Topic test not present in metadata after 60000 ms"
You might receive the preceding error when the topic that you tried to write to doesn't exist in Amazon MSK. Check whether the topic exists in your Amazon MSK cluster. Verify that you used the correct broker string and port in your client configuration. If the topic doesn't exist, then either create the topic in Amazon MSK, or set auto.create.topics.enable to true in your cluster configuration.
Note: When auto.create.topics.enable is set to true, a topic is automatically created when a client first tries to use it.
You might also receive this error when the topic exists, but the partition doesn't. For example, you have a single partition [0] and your producer tries to send to partition [1].
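To check both the topic and its partitions, you can describe the topic. The following is a sketch. The describe_topic function and the KAFKA_TOPICS variable are hypothetical names. The sketch assumes that kafka-topics.sh is in the current directory and that client.properties holds your authentication settings.

```shell
# describe_topic BOOTSTRAP TOPIC [CONFIG]: prints the topic's partitions and replicas.
# KAFKA_TOPICS is a hypothetical variable that points to your kafka-topics.sh script.
describe_topic() {
  local bootstrap="$1" topic="$2" config="${3:-client.properties}"
  "${KAFKA_TOPICS:-./kafka-topics.sh}" --describe \
    --bootstrap-server "$bootstrap" \
    --command-config "$config" \
    --topic "$topic"
}

# Hypothetical usage: replace the port with the one for your authentication type.
# describe_topic "$BOOTSTRAP:9098" test
```

The output lists each partition of the topic, so you can confirm that the partition your producer targets exists.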
Make sure that your Amazon MSK cluster's security group allows inbound traffic from the security group of the client application on the required ports.
If the error suddenly occurs after the system previously worked, then take the following actions to check the status of your Amazon MSK brokers:
- For ZooKeeper managed clusters, check the ActiveControllerCount metric. The value must be 1. If the metric has any other value, then one of the brokers in the cluster is unavailable. Use the sum statistic over a 1 minute period to view the metric.
- Check the ZooKeeperSessionState metric to confirm that the brokers are in continual communication with the ZooKeeper nodes.
- Monitor the KafkaDataLogsDiskUsed metric to make sure that the broker didn't run out of storage space.
Verify that you didn't try to access the cluster from outside the VPC without the correct configuration. By default, Amazon MSK resources are provisioned within the VPC. You must connect over a private network in the same VPC.
If you access the cluster from outside the VPC, then make sure that you set up the necessary networking configuration, such as AWS Client VPN or AWS Direct Connect.
Related information
Connect to an Amazon MSK Provisioned cluster
How do I troubleshoot authentication and permission issues when I use my Amazon MSK cluster with SASL/SCRAM authentication turned on?
Troubleshoot your Amazon MSK cluster