CPU idle in Linux: what is it?

What are the CPU c-states? How to check and monitor the CPU c-state usage in Linux per CPU and core?


What are C-states, cstates, or C-modes?

There are various power modes of the CPU which are determined on the basis of their current usage and are collectively called “C-states” or “C-modes.”

Low-power modes were first introduced with the 486DX4 processor. Since then, more power modes have been introduced, and each mode has been enhanced so that the CPU consumes less power while in it.

  • Each C-state draws a different amount of power and affects application performance differently.
  • Whenever a CPU core is idle, the built-in power-saving logic tries to move the core from its current C-state to a higher (deeper) C-state, turning off various processor components to save power.
  • However, every time an application is scheduled onto a CPU to do some work, that CPU has to come back from its deeper sleep state to the running state, and the deeper the state, the longer the wake-up takes before the CPU is fully up and running again. The wake-up also has to be done in an atomic context, so that nothing tries to use the core while it is being powered up.
  • The various modes the CPU transitions between are called C-states.
  • They start at C0, the normal operating mode, in which the CPU is 100% turned on.
  • The higher the C number, the deeper the sleep mode: more circuits and signals are turned off, and the CPU needs more time to return to C0, i.e. to wake up.
  • Each mode also has a name, and several have sub-modes with different levels of power saving and, accordingly, of wake-up time.

The table below explains the CPU C-states and their meaning

How can I disable processor sleep states?

Latency-sensitive applications should not let the processor transition into deeper C-states, because of the delay incurred in coming back out of them to C0. These delays can range from hundreds of microseconds to milliseconds.

There are various methods to achieve this.

Method 1
By booting with the kernel command-line argument processor.max_cstate=0, the system will never enter a C-state other than C0.

Add this parameter to your GRUB2 configuration by appending processor.max_cstate=0 to the kernel command line.

Regenerate the GRUB2 configuration file so that the new command line is picked up

Reboot the node to activate the change
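As a minimal sketch of the edit described above, assuming a RHEL-style layout in which the kernel command line lives in the GRUB_CMDLINE_LINUX variable of /etc/default/grub. The helper name append_kernel_arg is made up for this example, and it relies on GNU sed -i:

```shell
# append_kernel_arg FILE ARG
# Append ARG to the GRUB_CMDLINE_LINUX="..." line of FILE unless it is already there.
# Hypothetical helper; ARG must not contain "/" or "&" (they are special to sed).
append_kernel_arg() {
    local file="$1" arg="$2"
    # Nothing to do if the argument is already on the command line.
    if grep '^GRUB_CMDLINE_LINUX=' "$file" | grep -qF -- "$arg"; then
        return 0
    fi
    # Re-emit the quoted value with ARG appended before the closing quote.
    sed -i "s/^GRUB_CMDLINE_LINUX=\"\(.*\)\"/GRUB_CMDLINE_LINUX=\"\1 $arg\"/" "$file"
}
```

On such a system you would run append_kernel_arg /etc/default/grub processor.max_cstate=0 as root, regenerate the configuration (for example with grub2-mkconfig -o /boot/grub2/grub.cfg; the output path differs on UEFI systems and on other distributions) and then reboot.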

Method 2
  • The second method is to use the Power Management Quality of Service (PM QoS) interface.
  • The file /dev/cpu_dma_latency is that interface: opening it registers a quality-of-service request for latency with the operating system.
  • A program should open /dev/cpu_dma_latency, write a 32-bit number to it representing the maximum response time in microseconds, and then keep the file descriptor open for as long as low-latency operation is desired. Writing zero means that you want the fastest response time possible.
  • Several tuned profiles, for example network-latency and latency-performance, do this: tuned opens the file, writes the value configured in the profile and keeps the file open.
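The open, write, hold sequence can be sketched in shell as follows. This is an illustration, not what tuned itself runs: the function names are made up here, and the byte order assumes a little-endian machine such as x86, where the 32-bit value is written least significant byte first.

```shell
# encode_s32_le VALUE
# Print VALUE as four raw bytes, least significant byte first (little-endian).
encode_s32_le() {
    local v="$1"
    # The inner printf builds the text \ooo\ooo\ooo\ooo; the outer one turns it into bytes.
    printf "$(printf '\\%03o\\%03o\\%03o\\%03o' \
        $((v & 255)) $(((v >> 8) & 255)) $(((v >> 16) & 255)) $(((v >> 24) & 255)))"
}

# hold_latency USEC [DEVICE] [SECONDS]
# Write the latency request to DEVICE and keep DEVICE open for SECONDS.
# The request only lasts while the file stays open, hence the sleep inside the group.
hold_latency() {
    local usec="$1" dev="${2:-/dev/cpu_dma_latency}" secs="${3:-60}"
    { encode_s32_le "$usec"; sleep "$secs"; } > "$dev"
}
```

Run as root, hold_latency 0 would request the fastest possible response for 60 seconds; the request is dropped as soon as the redirection closes the file.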

Below is a snippet from the latency-performance tuned profile

As you can see, tuned keeps this file open for as long as tuned itself is running.

These profiles set force_latency to 1 to make sure the CPU does not enter any C-state deeper than C1.

How to read and interpret /dev/cpu_dma_latency?

If we read this file with an ordinary text tool, the output looks something like

Since this value is raw (not encoded as text), you can read it with something like hexdump.

When you read this further

It tells us that the current latency value is 2000 seconds (2,000,000,000 microseconds, the kernel default), which is the maximum time the CPU is allowed to take to come back from a deeper C-state to C0. In practice this means no latency constraint at all.
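A minimal sketch of decoding the raw value in one step, using od rather than hexdump (od and the function name read_latency are choices made for this example, not taken from the article). od -td4 interprets the four bytes as a signed 32-bit integer in the machine's own byte order:

```shell
# read_latency [FILE]
# Print the 32-bit latency value stored in FILE, in microseconds.
read_latency() {
    # -An: no offsets, -td4: signed 4-byte decimal, -N4: read four bytes only
    od -An -td4 -N4 "${1:-/dev/cpu_dma_latency}" | tr -d ' '
}
```

Reading the device usually requires root. With the default setting the function prints 2000000000, i.e. 2000 seconds expressed in microseconds.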

When we set a tuned profile with force_latency=1

For example here I will set tuned profile of network-latency

Check the existing active profile

Now let's check the latency value

As you can see, the latency value has changed to 1 microsecond.

What is the maximum C-state allowed for my CPU?

There are multiple CPU C-states, as the table above shows, but the maximum C-state actually allowed for a processor depends on the latency value and on any max_cstate value passed on the kernel command line in GRUB.

Below file should give the value from your node

How do I check the existing latency value for different C-states?

The wake-up latency differs between C-states, because the transition time back to C0 grows with the depth of the state.

The commands below print the latency value (in microseconds) of each C-state of one CPU

# cd /sys/devices/system/cpu/cpu0/cpuidle
# for state in state{0..4}; do echo c-$state `cat $state/name` `cat $state/latency` ; done
c-state0 POLL 0
c-state1 C1-HSW 2
c-state2 C1E-HSW 10
c-state3 C3-HSW 33
c-state4 C6-HSW 133

The same values can be read for any other CPU by changing the CPU number (cpu0) in the path above.
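To avoid repeating the command for each CPU, the loop can walk every CPU directory at once. A minimal sketch, assuming the usual cpuidle sysfs layout; the function name list_cstates and the optional base-directory argument are illustrative:

```shell
# list_cstates [BASE]
# Print "<cpu> <state> <name> <latency_us>" for every idle state of every CPU under BASE.
list_cstates() {
    local base="${1:-/sys/devices/system/cpu}" dir cpu
    for dir in "$base"/cpu[0-9]*/cpuidle/state[0-9]*; do
        [ -r "$dir/name" ] || continue        # skip an unmatched glob or unreadable entry
        cpu=${dir#"$base"/}                   # cpu0/cpuidle/state0
        cpu=${cpu%%/*}                        # cpu0
        printf '%s %s %s %s\n' "$cpu" "${dir##*/}" "$(cat "$dir/name")" "$(cat "$dir/latency")"
    done
}
```

Note that the shell sorts the glob lexically, so cpu10 is listed before cpu2.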

How to check and monitor the CPU c-state usage in Linux per CPU and core?

You can use the turbostat tool for this purpose; it reports run-time C-state residency for every available CPU and core.

I will be using turbostat to monitor the CPU C-states and stress to put some load on the CPUs.

To install these rpms you can use

Case 1: Using throughput-performance tuned profile

To check the currently active profile

With this profile the latency value is the default, i.e. 2000 seconds

Check the output using turbostat

As you can see, all the available CPUs and cores are in the C6 state because they are all idle. If I now start putting load on them, the CPUs start transitioning from C6 to C0, and C6 residency drops as the CPUs enter the running state


Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz IRQ SMI CPU%c1 CPU%c3 CPU%c6 CPU%c7 CoreTmp PkgTmp PkgWatt RAMWatt PKG_% RAM_%
- - 2428 86.63 2804 2599 85363 656 6.08 0.96 6.33 0.00 57 60 119.27 17.04 0.00 0.00
0 0 2377 84.85 2802 2600 5756 41 9.47 1.09 4.59 0.00 55 60 55.56 6.59 0.00 0.00
1 8 1835 65.48 2801 2602 5742 41 20.04 2.11 12.37 0.00 54
2 1 2802 99.93 2803 2601 5037 41 0.07 0.00 0.00 0.00 57
3 9 2802 99.93 2803 2601 5035 41 0.07 0.00 0.00 0.00 56
4 2 2802 99.94 2803 2600 5044 41 0.06 0.00 0.00 0.00 57
5 10 1992 71.12 2802 2598 5688 41 16.62 1.77 10.50 0.00 54
6 3 2799 99.94 2803 2599 5049 41 0.06 0.00 0.00 0.00 57
7 11 1914 68.39 2801 2598 5720 41 18.45 2.09 11.07 0.00 51
0 4 2066 73.79 2800 2600 5335 41 9.85 2.19 14.17 0.00 46 53 63.72 10.45 0.00 0.00
1 12 2803 99.86 2807 2600 5088 41 0.14 0.00 0.00 0.00 52
2 5 656 23.46 2800 2597 3312 41 21.81 6.10 48.63 0.00 45
3 13 2799 99.86 2807 2597 5610 41 0.14 0.00 0.00 0.00 53
4 6 2799 99.86 2807 2597 7143 41 0.14 0.00 0.00 0.00 51
5 14 2799 99.86 2807 2597 5044 41 0.14 0.00 0.00 0.00 50
6 7 2799 99.86 2807 2597 5679 41 0.14 0.00 0.00 0.00 50
7 15 2799 99.86 2807 2597 5081 41 0.14 0.00 0.00 0.00 48

Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz IRQ SMI CPU%c1 CPU%c3 CPU%c6 CPU%c7 CoreTmp PkgTmp PkgWatt RAMWatt PKG_% RAM_%
- - 2421 86.42 2807 2595 84373 656 6.28 1.07 6.23 0.00 59 62 120.52 17.00 0.00 0.00
0 0 2798 99.83 2808 2595 5039 41 0.17 0.00 0.00 0.00 57 62 55.92 6.54 0.00 0.00
1 8 1891 67.58 2803 2595 5151 41 16.92 2.72 12.78 0.00 55
2 1 2798 99.83 2808 2595 5032 41 0.17 0.00 0.00 0.00 59
3 9 2798 99.83 2808 2595 6068 41 0.17 0.00 0.00 0.00 58
4 2 2798 99.83 2808 2595 5041 41 0.17 0.00 0.00 0.00 58
5 10 1527 54.56 2804 2595 5540 41 24.02 3.73 17.70 0.00 56
6 3 2793 99.83 2808 2590 5045 41 0.17 0.00 0.00 0.00 58
7 11 1692 60.57 2804 2590 5556 41 20.66 3.24 15.53 0.00 54
0 4 1425 50.99 2800 2595 5251 41 19.20 4.24 25.57 0.00 48 57 64.60 10.46 0.00 0.00
1 12 2799 99.85 2809 2595 5053 41 0.15 0.00 0.00 0.00 54
2 5 2799 99.84 2809 2595 5054 41 0.16 0.00 0.00 0.00 53
3 13 1419 50.79 2800 2595 4642 41 17.88 3.22 28.11 0.00 49
4 6 2799 99.85 2809 2595 5059 41 0.15 0.00 0.00 0.00 55
5 14 2799 99.84 2809 2595 5047 41 0.16 0.00 0.00 0.00 53
6 7 2799 99.84 2809 2595 6206 41 0.16 0.00 0.00 0.00 53
7 15 2801 99.84 2809 2597 5589 41 0.16 0.00 0.00 0.00 50

As you can see, under load Busy% increases and the share of time spent in C6 drops, which means the CPUs are now in the running state.

Case 2: Change tuned profile to latency-performance

Next monitor the CPU c-state when the system is idle

As you can see, even when the CPUs and cores are sitting idle they do not transition to deeper C-states, since we are forcing them to stay at C1
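If turbostat is not available, the kernel's own per-state counters give a rough cross-check of the same behaviour. The sketch below assumes the cpuidle sysfs files name, time (total microseconds spent in the state) and usage (number of times the state was entered); the function name is illustrative.

```shell
# cstate_residency [CPUIDLE_DIR]
# Print the cumulative residency and entry count of each idle state of one CPU.
cstate_residency() {
    local dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}" s
    for s in "$dir"/state[0-9]*; do
        [ -r "$s/time" ] || continue          # skip an unmatched glob
        printf '%s time_us=%s entries=%s\n' \
            "$(cat "$s/name")" "$(cat "$s/time")" "$(cat "$s/usage")"
    done
}
```

Calling it twice a few seconds apart and comparing the time_us values shows which states the CPU actually used in between; with a latency-limiting profile active, the counters of the deeper states should stop growing.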

What is the POLL idle state?

If cpuidle is active, X86 platforms have one special idle state. The POLL idle state is not a real idle state, it does not save any power. Instead, a busy-loop is executed doing nothing for a short period of time. This state is used if the kernel knows that work has to be processed very soon and entering any real hardware idle state may result in a slight performance penalty.

There exist two different cpuidle drivers on the X86 architecture platform:

acpi_idle cpuidle driver
The acpi_idle cpuidle driver retrieves available sleep states (C-states) from the ACPI BIOS tables (from the _CST ACPI function on recent platforms or from the FADT BIOS table on older ones). The C1 state is not retrieved from ACPI tables. If the C1 state is entered, the kernel will call the hlt instruction (or mwait on Intel).

intel_idle cpuidle driver
In kernel 2.6.36 the intel_idle driver was introduced. It only serves recent Intel CPUs (Nehalem, Westmere, Sandybridge, Atoms or newer). On older Intel CPUs the acpi_idle driver is still used (if the BIOS provides C-state ACPI tables). The intel_idle driver knows the sleep state capabilities of the processor and ignores ACPI BIOS exported processor sleep states tables.


How Linux CPU time and CPU usage percentage are calculated

CPU time is allocated in discrete time slices (ticks). During some of those slices the CPU is busy; during others it is not (which is represented by the idle process). For example, if the CPU is busy for 6 out of 10 slices, then 6/10 = 0.60 = 60% busy time (and therefore 40% idle time).

Note: a clock tick (cycle) is the time needed to send a single pulse. A pulse consists of a high voltage followed by a low voltage. There can be billions of such ticks per second, depending on the processor's clock frequency (GHz).

You can get the CPU time counters accumulated since boot from /proc/stat. They are reported in units of USER_HZ (typically 1/100 of a second), not in clock cycles.

  • user: normal processes executing in user mode
  • nice: niced processes executing in user mode
  • system: processes executing in kernel mode
  • idle: twiddling thumbs (nothing to do)
  • iowait: waiting for I/O to complete
  • irq: servicing interrupts
  • softirq: servicing softirqs
  • steal: involuntary wait
  • guest: running a normal guest
  • guest_nice: running a niced guest

To calculate Linux CPU usage time, subtract the CPU idle time from the total CPU time, as follows:

Total CPU time since boot = user + nice + system + idle + iowait + irq + softirq + steal

Total CPU idle time since boot = idle + iowait

Total CPU usage time since boot = Total CPU time since boot - Total CPU idle time since boot

Total CPU percentage = Total CPU usage time since boot / Total CPU time since boot × 100

If you apply the formula to the example data above, you should get a Linux CPU usage of 60%.
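The formula can be sketched with awk on the aggregate cpu line of /proc/stat. The function name and the optional file argument are made up for this example; field positions follow the order of the list above (user is the second field on the line, steal the ninth).

```shell
# cpu_usage_since_boot [FILE]
# Print the percentage of non-idle CPU time since boot.
cpu_usage_since_boot() {
    awk '$1 == "cpu" {
        total = $2 + $3 + $4 + $5 + $6 + $7 + $8 + $9   # user through steal
        idle  = $5 + $6                                 # idle + iowait
        printf "%.2f\n", (total - idle) * 100 / total
    }' "${1:-/proc/stat}"
}
```

For a line such as "cpu 40 5 10 35 5 2 2 1 0 0" (numbers invented for illustration) the total is 100 and the idle time is 40, so the function prints 60.00.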


Note: guest and guest_nice are already accounted for in user and nice, so they are not included in the total.

For real-time CPU usage, you need to calculate the difference between two samples taken some interval apart.

Below is an example Bash script by Paul Colby that does this
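The original script is not reproduced here. The following is an independent minimal sketch of the same two-sample idea, not Paul Colby's code; both function names are made up for the example.

```shell
# read_cpu_counters [FILE]
# Print "<total> <idle>" for the aggregate cpu line of FILE.
read_cpu_counters() {
    awk '$1 == "cpu" {
        printf "%.0f %.0f\n", $2 + $3 + $4 + $5 + $6 + $7 + $8 + $9, $5 + $6
    }' "${1:-/proc/stat}"
}

# usage_between "<total1> <idle1>" "<total2> <idle2>"
# Print the CPU usage percentage over the interval between two samples.
usage_between() {
    echo "$1 $2" | awk '{
        dt = $3 - $1                     # total time elapsed between the samples
        di = $4 - $2                     # idle time elapsed between the samples
        if (dt > 0) printf "%.2f\n", (dt - di) * 100 / dt
        else        print "0.00"
    }'
}
```

A typical use would be a=$(read_cpu_counters); sleep 1; b=$(read_cpu_counters); usage_between "$a" "$b", repeated in a loop for continuous monitoring.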


CPU Idle Time Management

© 2019 Intel Corporation

CPU Idle Time Management Subsystem

Every time one of the logical CPUs in the system (the entities that appear to fetch and execute instructions: hardware threads, if present, or processor cores) is idle after an interrupt or equivalent wakeup event, which means that there are no tasks to run on it except for the special “idle” task associated with it, there is an opportunity to save energy for the processor that it belongs to. That can be done by making the idle logical CPU stop fetching instructions from memory and putting some of the processor’s functional units depended on by it into an idle state in which they will draw less power.

However, there may be multiple different idle states that can be used in such a situation in principle, so it may be necessary to find the most suitable one (from the kernel perspective) and ask the processor to use (or “enter”) that particular idle state. That is the role of the CPU idle time management subsystem in the kernel, called CPUIdle.

The design of CPUIdle is modular and based on the code duplication avoidance principle, so the generic code that in principle need not depend on the hardware or platform design details in it is separate from the code that interacts with the hardware. It generally is divided into three categories of functional units: governors responsible for selecting idle states to ask the processor to enter, drivers that pass the governors’ decisions on to the hardware and the core providing a common framework for them.

CPU Idle Time Governors

A CPU idle time (CPUIdle) governor is a bundle of policy code invoked when one of the logical CPUs in the system turns out to be idle. Its role is to select an idle state to ask the processor to enter in order to save some energy.

CPUIdle governors are generic and each of them can be used on any hardware platform that the Linux kernel can run on. For this reason, data structures operated on by them cannot depend on any hardware architecture or platform design details either.

The governor itself is represented by a struct cpuidle_governor object containing four callback pointers, enable, disable, select and reflect, a rating field described below, and a name (string) used for identifying it.

For the governor to be available at all, that object needs to be registered with the CPUIdle core by calling cpuidle_register_governor() with a pointer to it passed as the argument. If successful, that causes the core to add the governor to the global list of available governors and, if it is the only one in the list (that is, the list was empty before) or the value of its rating field is greater than the value of that field for the governor currently in use, or the name of the new governor was passed to the kernel as the value of the cpuidle.governor= command line parameter, the new governor will be used from that point on (there can be only one CPUIdle governor in use at a time). Also, user space can choose the CPUIdle governor to use at run time via sysfs .
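From user space, the driver and governor currently in use can be inspected through sysfs. A minimal sketch, assuming the files live under /sys/devices/system/cpu/cpuidle and that, depending on the kernel version, only some of them are present; the function name is illustrative.

```shell
# show_cpuidle_policy [DIR]
# Print whichever of the cpuidle policy files exist in DIR.
show_cpuidle_policy() {
    local dir="${1:-/sys/devices/system/cpu/cpuidle}" f
    for f in current_driver current_governor current_governor_ro available_governors; do
        # Older kernels expose only a subset of these files, so test each one.
        if [ -r "$dir/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$dir/$f")"
        fi
    done
}
```

Where current_governor is writable, writing the name of a registered governor to it as root (for example menu) switches governors at run time.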

Once registered, CPUIdle governors cannot be unregistered, so it is not practical to put them into loadable kernel modules.

The interface between CPUIdle governors and the core consists of four callbacks:

int (*enable) (struct cpuidle_driver *drv, struct cpuidle_device *dev);

The role of this callback is to prepare the governor for handling the (logical) CPU represented by the struct cpuidle_device object pointed to by the dev argument. The struct cpuidle_driver object pointed to by the drv argument represents the CPUIdle driver to be used with that CPU (among other things, it should contain the list of struct cpuidle_state objects representing idle states that the processor holding the given CPU can be asked to enter).

It may fail, in which case it is expected to return a negative error code, and that causes the kernel to run the architecture-specific default code for idle CPUs on the CPU in question instead of CPUIdle until the ->enable() governor callback is invoked for that CPU again.

void (*disable) (struct cpuidle_driver *drv, struct cpuidle_device *dev);

Called to make the governor stop handling the (logical) CPU represented by the struct cpuidle_device object pointed to by the dev argument.

It is expected to reverse any changes made by the ->enable() callback when it was last invoked for the target CPU, free all memory allocated by that callback and so on.

int (*select) (struct cpuidle_driver *drv, struct cpuidle_device *dev, bool *stop_tick);

Called to select an idle state for the processor holding the (logical) CPU represented by the struct cpuidle_device object pointed to by the dev argument.

The list of idle states to take into consideration is represented by the states array of struct cpuidle_state objects held by the struct cpuidle_driver object pointed to by the drv argument (which represents the CPUIdle driver to be used with the CPU at hand). The value returned by this callback is interpreted as an index into that array (unless it is a negative error code).

The stop_tick argument is used to indicate whether or not to stop the scheduler tick before asking the processor to enter the selected idle state. When the bool variable pointed to by it (which is set to true before invoking this callback) is cleared to false , the processor will be asked to enter the selected idle state without stopping the scheduler tick on the given CPU (if the tick has been stopped on that CPU already, however, it will not be restarted before asking the processor to enter the idle state).

This callback is mandatory (i.e. the select callback pointer in struct cpuidle_governor must not be NULL for the registration of the governor to succeed).

void (*reflect) (struct cpuidle_device *dev, int index);

Called to allow the governor to evaluate the accuracy of the idle state selection made by the ->select() callback (when it was invoked last time) and possibly use the result of that to improve the accuracy of idle state selections in the future.

In addition, CPUIdle governors are required to take power management quality of service (PM QoS) constraints on the processor wakeup latency into account when selecting idle states. In order to obtain the current effective PM QoS wakeup latency constraint for a given CPU, a CPUIdle governor is expected to pass the number of the CPU to cpuidle_governor_latency_req(). Then, the governor’s ->select() callback must not return the index of an idle state whose exit_latency value is greater than the number returned by that function.
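The effect of this rule can be modelled from user space. The sketch below is not kernel code: it assumes that the per-state sysfs file latency mirrors exit_latency, and simply lists the states a governor would still be allowed to pick under a given latency limit. The function name is illustrative.

```shell
# eligible_states LIMIT_US [CPUIDLE_DIR]
# Print "<state> <name> <latency_us>" for every idle state whose exit latency
# does not exceed LIMIT_US.
eligible_states() {
    local limit="$1" dir="${2:-/sys/devices/system/cpu/cpu0/cpuidle}" s lat
    for s in "$dir"/state[0-9]*; do
        [ -r "$s/latency" ] || continue       # skip an unmatched glob
        lat=$(cat "$s/latency")
        if [ "$lat" -le "$limit" ]; then
            printf '%s %s %s\n' "${s##*/}" "$(cat "$s/name")" "$lat"
        fi
    done
}
```

With state latencies of 0, 2, 10, 33 and 133 microseconds, a limit of 10 leaves only the first three states available.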


CPU Idle Time Management Drivers

CPU idle time management (CPUIdle) drivers provide an interface between the other parts of CPUIdle and the hardware.

First of all, a CPUIdle driver has to populate the states array of struct cpuidle_state objects included in the struct cpuidle_driver object representing it. Going forward this array will represent the list of available idle states that the processor hardware can be asked to enter shared by all of the logical CPUs handled by the given driver.

The entries in the states array are expected to be sorted by the value of the target_residency field in struct cpuidle_state in the ascending order (that is, index 0 should correspond to the idle state with the minimum value of target_residency ). [Since the target_residency value is expected to reflect the “depth” of the idle state represented by the struct cpuidle_state object holding it, this sorting order should be the same as the ascending sorting order by the idle state “depth”.]

Three fields in struct cpuidle_state are used by the existing CPUIdle governors for computations related to idle state selection:

target_residency
Minimum time to spend in this idle state including the time needed to enter it (which may be substantial) to save more energy than could be saved by staying in a shallower idle state for the same amount of time, in microseconds.

exit_latency
Maximum time it will take a CPU asking the processor to enter this idle state to start executing the first instruction after a wakeup from it, in microseconds.

flags
Flags representing idle state properties. Currently, governors only use the CPUIDLE_FLAG_POLLING flag which is set if the given object does not represent a real idle state, but an interface to a software “loop” that can be used in order to avoid asking the processor to enter any idle state at all. [There are other flags used by the CPUIdle core in special situations.]

The enter callback pointer in struct cpuidle_state, which must not be NULL, points to the routine to execute in order to ask the processor to enter this particular idle state:

int (*enter) (struct cpuidle_device *dev, struct cpuidle_driver *drv, int index);

The first two arguments of it point to the struct cpuidle_device object representing the logical CPU running this callback and the struct cpuidle_driver object representing the driver itself, respectively, and the last one is an index of the struct cpuidle_state entry in the driver’s states array representing the idle state to ask the processor to enter.

The analogous ->enter_s2idle() callback in struct cpuidle_state is used only for implementing the suspend-to-idle system-wide power management feature. The difference between it and ->enter() is that it must not re-enable interrupts at any point (even temporarily) or attempt to change the states of clock event devices, which the ->enter() callback may do sometimes.

Once the states array has been populated, the number of valid entries in it has to be stored in the state_count field of the struct cpuidle_driver object representing the driver. Moreover, if any entries in the states array represent “coupled” idle states (that is, idle states that can only be asked for if multiple related logical CPUs are idle), the safe_state_index field in struct cpuidle_driver needs to be the index of an idle state that is not “coupled” (that is, one that can be asked for if only one logical CPU is idle).

In addition to that, if the given CPUIdle driver is only going to handle a subset of logical CPUs in the system, the cpumask field in its struct cpuidle_driver object must point to the set (mask) of CPUs that will be handled by it.

A CPUIdle driver can only be used after it has been registered. If there are no “coupled” idle state entries in the driver’s states array, that can be accomplished by passing the driver’s struct cpuidle_driver object to cpuidle_register_driver(). Otherwise, cpuidle_register() should be used for this purpose.

However, it also is necessary to register struct cpuidle_device objects for all of the logical CPUs to be handled by the given CPUIdle driver with the help of cpuidle_register_device() after the driver has been registered and cpuidle_register_driver() , unlike cpuidle_register() , does not do that automatically. For this reason, the drivers that use cpuidle_register_driver() to register themselves must also take care of registering the struct cpuidle_device objects as needed, so it is generally recommended to use cpuidle_register() for CPUIdle driver registration in all cases.

The registration of a struct cpuidle_device object causes the CPUIdle sysfs interface to be created and the governor’s ->enable() callback to be invoked for the logical CPU represented by it, so it must take place after registering the driver that will handle the CPU in question.

CPUIdle drivers and struct cpuidle_device objects can be unregistered when they are not necessary any more which allows some resources associated with them to be released. Due to dependencies between them, all of the struct cpuidle_device objects representing CPUs handled by the given CPUIdle driver must be unregistered, with the help of cpuidle_unregister_device() , before calling cpuidle_unregister_driver() to unregister the driver. Alternatively, cpuidle_unregister() can be called to unregister a CPUIdle driver along with all of the struct cpuidle_device objects representing CPUs handled by it.

CPUIdle drivers can respond to runtime system configuration changes that lead to modifications of the list of available processor idle states (which can happen, for example, when the system’s power source is switched from AC to battery or the other way around). Upon a notification of such a change, a CPUIdle driver is expected to call cpuidle_pause_and_lock() to turn CPUIdle off temporarily and then cpuidle_disable_device() for all of the struct cpuidle_device objects representing CPUs affected by that change. Next, it can update its states array in accordance with the new configuration of the system, call cpuidle_enable_device() for all of the relevant struct cpuidle_device objects and invoke cpuidle_resume_and_unlock() to allow CPUIdle to be used again.

© Copyright The kernel development community.

