
Commit 01a178a

tang-chen authored and torvalds committed
acpi, memory-hotplug: support getting hotplug info from SRAT
We now provide an option for users who don't want to specify physical memory address in kernel commandline.

/*
 * For movablemem_map=acpi:
 *
 * SRAT:                |_____| |_____| |_________| |_________| ......
 * node id:                0       1         1           2
 * hotpluggable:           n       y         y           n
 * movablemem_map:              |_____| |_________|
 *
 * Using movablemem_map, we can prevent memblock from allocating memory
 * on ZONE_MOVABLE at boot time.
 */

So user just specify movablemem_map=acpi, and the kernel will use hotpluggable info in SRAT to determine which memory ranges should be set as ZONE_MOVABLE.

If all the memory ranges in SRAT is hotpluggable, then no memory can be used by kernel. But before parsing SRAT, memblock has already reserve some memory ranges for other purposes, such as for kernel image, and so on. We cannot prevent kernel from using these memory. So we need to exclude these ranges even if these memory is hotpluggable.

Furthermore, there could be several memory ranges in the single node which the kernel resides in. We may skip one range that have memory reserved by memblock, but if the rest of memory is too small, then the kernel will fail to boot. So, make the whole node which the kernel resides in un-hotpluggable. Then the kernel has enough memory to use.

NOTE: Using this way will cause NUMA performance down because the whole node will be set as ZONE_MOVABLE, and kernel cannot use memory on it. If users don't want to lose NUMA performance, just don't use it.

[[email protected]: fix warning]
[[email protected]: use strcmp()]
Signed-off-by: Tang Chen <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Jianguo Wu <[email protected]>
Cc: Kamezawa Hiroyuki <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Wu Jianguo <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: "Brown, Len" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
1 parent 27168d3 commit 01a178a


4 files changed: +113 -11 lines changed


Documentation/kernel-parameters.txt

Lines changed: 24 additions & 5 deletions
@@ -1640,22 +1640,41 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			that the amount of memory usable for all allocations
 			is not too small.
 
+	movablemem_map=acpi
+			[KNL,X86,IA-64,PPC] This parameter is similar to
+			memmap except it specifies the memory map of
+			ZONE_MOVABLE.
+			This option inform the kernel to use Hot Pluggable bit
+			in flags from SRAT from ACPI BIOS to determine which
+			memory devices could be hotplugged. The corresponding
+			memory ranges will be set as ZONE_MOVABLE.
+			NOTE: Whatever node the kernel resides in will always
+			      be un-hotpluggable.
+
 	movablemem_map=nn[KMG]@ss[KMG]
 			[KNL,X86,IA-64,PPC] This parameter is similar to
 			memmap except it specifies the memory map of
 			ZONE_MOVABLE.
-			If more areas are all within one node, then from
-			lowest ss to the end of the node will be ZONE_MOVABLE.
-			If an area covers two or more nodes, the area from
-			ss to the end of the 1st node will be ZONE_MOVABLE,
-			and all the rest nodes will only have ZONE_MOVABLE.
+			If user specifies memory ranges, the info in SRAT will
+			be ingored. And it works like the following:
+			- If more ranges are all within one node, then from
+			  lowest ss to the end of the node will be ZONE_MOVABLE.
+			- If a range is within a node, then from ss to the end
+			  of the node will be ZONE_MOVABLE.
+			- If a range covers two or more nodes, then from ss to
+			  the end of the 1st node will be ZONE_MOVABLE, and all
+			  the rest nodes will only have ZONE_MOVABLE.
 			If memmap is specified at the same time, the
 			movablemem_map will be limited within the memmap
 			areas. If kernelcore or movablecore is also specified,
 			movablemem_map will have higher priority to be
 			satisfied. So the administrator should be careful that
 			the amount of movablemem_map areas are not too large.
 			Otherwise kernel won't have enough memory to start.
+			NOTE: We don't stop users specifying the node the
+			      kernel resides in as hotpluggable so that this
+			      option can be used as a workaround of firmware
+			      bugs.
 
 	MTD_Partition=	[MTD]
 			Format: <name>,<region-number>,<size>,<offset>
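To make the `nn[KMG]@ss[KMG]` syntax documented above concrete, here is a minimal user-space sketch of how such an argument could be split into a size and a start address. It is not the kernel's parser: `parse_size()` is a hypothetical stand-in for the kernel's `memparse()`, and `parse_movablemem_range()` is an illustrative name that does not appear in the patch. The sketch also assumes that the `@ss` part is mandatory.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical user-space stand-in for the kernel's memparse(): parses a
 * number with an optional K, M or G suffix and sets *end to the first
 * character after it. If no digits were found, *end is set to s.
 */
static uint64_t parse_size(const char *s, const char **end)
{
	char *after;
	uint64_t val = strtoull(s, &after, 0);

	if (after == s) {
		/* No number at all: report that nothing was consumed. */
		*end = s;
		return 0;
	}

	switch (*after) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		after++;
		break;
	default:
		break;
	}
	*end = after;
	return val;
}

/*
 * Parses "nn[KMG]@ss[KMG]" into a size (nn) and a start address (ss).
 * Returns 0 on success and -1 on malformed input.
 */
static int parse_movablemem_range(const char *arg, uint64_t *size,
				  uint64_t *start)
{
	const char *p = arg;
	const char *q;

	*size = parse_size(p, &q);
	if (q == p || *q != '@')
		return -1;

	p = q + 1;
	*start = parse_size(p, &q);
	if (q == p || *q != '\0')
		return -1;

	return 0;
}
```

With this sketch, `4G@8G` yields a size of 4 GiB starting at 8 GiB, while the string `acpi` is rejected because it does not begin with a number, which is why the patch has to test for it separately.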

arch/x86/mm/srat.c

Lines changed: 66 additions & 5 deletions
@@ -142,16 +142,72 @@ static inline int save_add_info(void) {return 0;}
 #endif
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
-static void __init handle_movablemem(int node, u64 start, u64 end)
+static void __init
+handle_movablemem(int node, u64 start, u64 end, u32 hotpluggable)
 {
-	int overlap;
+	int overlap, i;
 	unsigned long start_pfn, end_pfn;
 
 	start_pfn = PFN_DOWN(start);
 	end_pfn = PFN_UP(end);
 
 	/*
-	 * For movablecore_map=nn[KMG]@ss[KMG]:
+	 * For movablemem_map=acpi:
+	 *
+	 * SRAT:                |_____| |_____| |_________| |_________| ......
+	 * node id:                0       1         1           2
+	 * hotpluggable:           n       y         y           n
+	 * movablemem_map:              |_____| |_________|
+	 *
+	 * Using movablemem_map, we can prevent memblock from allocating memory
+	 * on ZONE_MOVABLE at boot time.
+	 *
+	 * Before parsing SRAT, memblock has already reserve some memory ranges
+	 * for other purposes, such as for kernel image. We cannot prevent
+	 * kernel from using these memory, so we need to exclude these memory
+	 * even if it is hotpluggable.
+	 * Furthermore, to ensure the kernel has enough memory to boot, we make
+	 * all the memory on the node which the kernel resides in
+	 * un-hotpluggable.
+	 */
+	if (hotpluggable && movablemem_map.acpi) {
+		/* Exclude ranges reserved by memblock. */
+		struct memblock_type *rgn = &memblock.reserved;
+
+		for (i = 0; i < rgn->cnt; i++) {
+			if (end <= rgn->regions[i].base ||
+			    start >= rgn->regions[i].base +
+			    rgn->regions[i].size)
+				continue;
+
+			/*
+			 * If the memory range overlaps the memory reserved by
+			 * memblock, then the kernel resides in this node.
+			 */
+			node_set(node, movablemem_map.numa_nodes_kernel);
+
+			goto out;
+		}
+
+		/*
+		 * If the kernel resides in this node, then the whole node
+		 * should not be hotpluggable.
+		 */
+		if (node_isset(node, movablemem_map.numa_nodes_kernel))
+			goto out;
+
+		insert_movablemem_map(start_pfn, end_pfn);
+
+		/*
+		 * numa_nodes_hotplug nodemask represents which nodes are put
+		 * into movablemem_map.map[].
+		 */
+		node_set(node, movablemem_map.numa_nodes_hotplug);
+		goto out;
+	}
+
+	/*
+	 * For movablemem_map=nn[KMG]@ss[KMG]:
 	 *
 	 * SRAT:                |_____| |_____| |_________| |_________| ......
 	 * node id:                0       1         1           2
@@ -160,6 +216,8 @@ static void __init handle_movablemem(int node, u64 start, u64 end)
 	 *
 	 * Using movablemem_map, we can prevent memblock from allocating memory
 	 * on ZONE_MOVABLE at boot time.
+	 *
+	 * NOTE: In this case, SRAT info will be ingored.
 	 */
 	overlap = movablemem_map_overlap(start_pfn, end_pfn);
 	if (overlap >= 0) {
@@ -187,9 +245,12 @@ static void __init handle_movablemem(int node, u64 start, u64 end)
 		 */
 		insert_movablemem_map(start_pfn, end_pfn);
 	}
+out:
+	return;
 }
 #else		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
-static inline void handle_movablemem(int node, u64 start, u64 end)
+static inline void
+handle_movablemem(int node, u64 start, u64 end, u32 hotpluggable)
 {
 }
 #endif		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
@@ -234,7 +295,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 	       (unsigned long long) start, (unsigned long long) end - 1,
 	       hotpluggable ? "Hot Pluggable": "");
 
-	handle_movablemem(node, start, end);
+	handle_movablemem(node, start, end, hotpluggable);
 
 	return 0;
 out_err_bad_srat:
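The `movablemem_map=acpi` branch above can be tried outside the kernel with a small simulation. The following is only a sketch under stated assumptions: `struct movable_state`, `ranges_overlap()` and `handle_srat_entry()` are hypothetical names, plain 64-bit masks stand in for `nodemask_t` (so node ids must be below 64), the reserved regions are passed in as an array instead of being read from `memblock.reserved`, and non-hotpluggable entries are simply skipped rather than falling through to the `nn[KMG]@ss[KMG]` handling.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_RANGES 8	/* hypothetical capacity for this sketch */

struct range {
	uint64_t start, end;	/* half-open interval: [start, end) */
};

/* Hypothetical, simplified stand-in for struct movablemem_map. */
struct movable_state {
	struct range map[MAX_RANGES];
	int nr_map;
	uint64_t nodes_hotplug;	/* bit n set: node n has movable ranges */
	uint64_t nodes_kernel;	/* bit n set: kernel resides in node n */
};

/* Two half-open ranges overlap unless one ends before the other starts. */
static bool ranges_overlap(struct range a, struct range b)
{
	return !(a.end <= b.start || a.start >= b.end);
}

/*
 * Imitates the movablemem_map=acpi branch for a single SRAT entry.
 * Returns true if the range was recorded as movable.
 */
static bool handle_srat_entry(struct movable_state *st, int node,
			      struct range r, bool hotpluggable,
			      const struct range *reserved, int nr_reserved)
{
	int i;

	if (!hotpluggable)
		return false;

	/* A range that overlaps reserved memory marks its node as the kernel's. */
	for (i = 0; i < nr_reserved; i++) {
		if (ranges_overlap(r, reserved[i])) {
			st->nodes_kernel |= 1ULL << node;
			return false;
		}
	}

	/* Every other range on a kernel node is kept un-hotpluggable too. */
	if (st->nodes_kernel & (1ULL << node))
		return false;

	if (st->nr_map == MAX_RANGES)
		return false;

	st->map[st->nr_map++] = r;
	st->nodes_hotplug |= 1ULL << node;
	return true;
}
```

Like the branch it imitates, the sketch decides entry by entry in the order the SRAT entries arrive, so a node is only treated as the kernel's node once one of its ranges has been seen to overlap a reserved region.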

include/linux/mm.h

Lines changed: 2 additions & 0 deletions
@@ -1366,9 +1366,11 @@ struct movablemem_entry {
 };
 
 struct movablemem_map {
+	bool acpi;	/* true if using SRAT info */
 	int nr_map;
 	struct movablemem_entry map[MOVABLEMEM_MAP_MAX];
 	nodemask_t numa_nodes_hotplug;	/* on which nodes we specify memory */
+	nodemask_t numa_nodes_kernel;	/* on which nodes kernel resides in */
 };
 
 extern void __init insert_movablemem_map(unsigned long start_pfn,

mm/page_alloc.c

Lines changed: 21 additions & 1 deletion
@@ -203,7 +203,10 @@ static unsigned long __meminitdata dma_reserve;
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 /* Movable memory ranges, will also be used by memblock subsystem. */
-struct movablemem_map movablemem_map;
+struct movablemem_map movablemem_map = {
+	.acpi = false,
+	.nr_map = 0,
+};
 
 static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES];
 static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
@@ -5314,6 +5317,23 @@ static int __init cmdline_parse_movablemem_map(char *p)
 	if (!p)
 		goto err;
 
+	if (!strcmp(p, "acpi"))
+		movablemem_map.acpi = true;
+
+	/*
+	 * If user decide to use info from BIOS, all the other user specified
+	 * ranges will be ingored.
+	 */
+	if (movablemem_map.acpi) {
+		if (movablemem_map.nr_map) {
+			memset(movablemem_map.map, 0,
+				sizeof(struct movablemem_entry)
+				* movablemem_map.nr_map);
+			movablemem_map.nr_map = 0;
+		}
+		return 0;
+	}
+
 	oldp = p;
 	mem_size = memparse(p, &p);
 	if (p == oldp)
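The effect of the new `acpi` check in `cmdline_parse_movablemem_map()` is easy to miss in diff form: once the flag is set, ranges recorded by earlier options are wiped and every later option is consumed without being parsed. The sketch below reproduces that behaviour in user space. `struct map_state`, `struct entry`, `MAP_MAX` and `parse_acpi_option()` are hypothetical names chosen for illustration, not kernel symbols.

```c
#include <stdbool.h>
#include <string.h>

#define MAP_MAX 32	/* hypothetical stand-in for MOVABLEMEM_MAP_MAX */

struct entry {
	unsigned long start_pfn, end_pfn;
};

/* Hypothetical, simplified stand-in for struct movablemem_map. */
struct map_state {
	bool acpi;
	int nr_map;
	struct entry map[MAP_MAX];
};

/*
 * Returns true if the option was consumed by the "acpi" handling, either
 * because it is "acpi" itself or because "acpi" was given earlier.
 * Returns false if the caller should go on to parse a range.
 */
static bool parse_acpi_option(struct map_state *st, const char *p)
{
	if (!strcmp(p, "acpi"))
		st->acpi = true;

	if (!st->acpi)
		return false;

	/* SRAT info wins: drop any ranges recorded by earlier options. */
	if (st->nr_map) {
		memset(st->map, 0, sizeof(st->map[0]) * st->nr_map);
		st->nr_map = 0;
	}
	return true;
}
```

In this sketch the order of options does not change the outcome: a range given before `acpi` is cleared when `acpi` is parsed, and a range given after it is never recorded.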
