A Cgroup can only be deleted once the processes assigned to it have exited. If a process has not exited by the time the delete timeout expires, the Cgroup is not deleted. We should increase the timeout to reduce the risk of accumulating stale Cgroups: yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms will be set to 5000 ms.
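As a minimal sketch of the resulting setting, assuming it is applied through yarn-site.xml on each NodeManager host (the file placement and the description text are illustrative; only the property name and the 5000 ms value come from this Jira):

```xml
<!-- yarn-site.xml fragment (illustrative placement) -->
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms</name>
  <!-- Time, in milliseconds, to keep trying to delete a container's Cgroup
       before giving up; value taken from this Jira. -->
  <value>5000</value>
</property>
```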
Also, the cumulative CPU utilization limit for NM containers should not always be set to 100%, as this may starve other services running on the same host. This Jira makes it configurable.
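This Jira does not name the property involved. Assuming it refers to YARN's yarn.nodemanager.resource.percentage-physical-cpu-limit (which defaults to 100), a lowered limit might look like the sketch below. Both the property choice and the value 80 are assumptions for illustration, not values prescribed here.

```xml
<!-- yarn-site.xml fragment; the property choice is an assumption, see above -->
<property>
  <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
  <!-- Hypothetical example value: leaves roughly 20% of CPU for other services. -->
  <value>80</value>
</property>
```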