bb-reorder.cc source code [gcc/bb-reorder.cc]

1	/ Basic block reordering routines for the GNU compiler.*
2	Copyright (C) 2000-2024 Free Software Foundation, Inc.
3
4	This file is part of GCC.
5
6	GCC is free software; you can redistribute it and/or modify it
7	under the terms of the GNU General Public License as published by
8	the Free Software Foundation; either version 3, or (at your option)
9	any later version.
10
11	GCC is distributed in the hope that it will be useful, but WITHOUT
12	ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
13	or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
14	License for more details.
15
16	You should have received a copy of the GNU General Public License
17	along with GCC; see the file COPYING3. If not see
18	<http://www.gnu.org/licenses/>. /*
19
20	/ This file contains the "reorder blocks" pass, which changes the control*
21	flow of a function to encounter fewer branches; the "partition blocks"
22	pass, which divides the basic blocks into "hot" and "cold" partitions,
23	which are kept separate; and the "duplicate computed gotos" pass, which
24	duplicates blocks ending in an indirect jump.
25
26	There are two algorithms for "reorder blocks": the "simple" algorithm,
27	which just rearranges blocks, trying to minimize the number of executed
28	unconditional branches; and the "software trace cache" algorithm, which
29	also copies code, and in general tries a lot harder to have long linear
30	pieces of machine code executed. This algorithm is described next. /*
31
32	/ This (greedy) algorithm constructs traces in several rounds.*
33	The construction starts from "seeds". The seed for the first round
34	is the entry point of the function. When there are more than one seed,
35	the one with the lowest key in the heap is selected first (see bb_to_key).
36	Then the algorithm repeatedly adds the most probable successor to the end
37	of a trace. Finally it connects the traces.
38
39	There are two parameters: Branch Threshold and Exec Threshold.
40	If the probability of an edge to a successor of the current basic block is
41	lower than Branch Threshold or its count is lower than Exec Threshold,
42	then the successor will be the seed in one of the next rounds.
43	Each round has these parameters lower than the previous one.
44	The last round has to have these parameters set to zero so that the
45	remaining blocks are picked up.
46
47	The algorithm selects the most probable successor from all unvisited
48	successors and successors that have been added to this trace.
49	The other successors (that has not been "sent" to the next round) will be
50	other seeds for this round and the secondary traces will start from them.
51	If the successor has not been visited in this trace, it is added to the
52	trace (however, there is some heuristic for simple branches).
53	If the successor has been visited in this trace, a loop has been found.
54	If the loop has many iterations, the loop is rotated so that the source
55	block of the most probable edge going out of the loop is the last block
56	of the trace.
57	If the loop has few iterations and there is no edge from the last block of
58	the loop going out of the loop, the loop header is duplicated.
59
60	When connecting traces, the algorithm first checks whether there is an edge
61	from the last block of a trace to the first block of another trace.
62	When there are still some unconnected traces it checks whether there exists
63	a basic block BB such that BB is a successor of the last block of a trace
64	and BB is a predecessor of the first block of another trace. In this case,
65	BB is duplicated, added at the end of the first trace and the traces are
66	connected through it.
67	The rest of traces are simply connected so there will be a jump to the
68	beginning of the rest of traces.
69
70	The above description is for the full algorithm, which is used when the
71	function is optimized for speed. When the function is optimized for size,
72	in order to reduce long jumps and connect more fallthru edges, the
73	algorithm is modified as follows:
74	(1) Break long traces to short ones. A trace is broken at a block that has
75	multiple predecessors/ successors during trace discovery. When connecting
76	traces, only connect Trace n with Trace n + 1. This change reduces most
77	long jumps compared with the above algorithm.
78	(2) Ignore the edge probability and count for fallthru edges.
79	(3) Keep the original order of blocks when there is no chance to fall
80	through. We rely on the results of cfg_cleanup.
81
82	To implement the change for code size optimization, block's index is
83	selected as the key and all traces are found in one round.
84
85	References:
86
87	"Software Trace Cache"
88	A. Ramirez, J. Larriba-Pey, C. Navarro, J. Torrellas and M. Valero; 1999
89	http://citeseer.nj.nec.com/15361.html
90
91	*/
92
93	#include "config.h"
94	#include "system.h"
95	#include "coretypes.h"
96	#include "backend.h"
97	#include "target.h"
98	#include "rtl.h"
99	#include "tree.h"
100	#include "cfghooks.h"
101	#include "df.h"
102	#include "memmodel.h"
103	#include "optabs.h"
104	#include "regs.h"
105	#include "emit-rtl.h"
106	#include "output.h"
107	#include "expr.h"
108	#include "tree-pass.h"
109	#include "cfgrtl.h"
110	#include "cfganal.h"
111	#include "cfgbuild.h"
112	#include "cfgcleanup.h"
113	#include "bb-reorder.h"
114	#include "except.h"
115	#include "alloc-pool.h"
116	#include "fibonacci_heap.h"
117	#include "stringpool.h"
118	#include "attribs.h"
119	#include "common/common-target.h"
120
121	/ The number of rounds. In most cases there will only be 4 rounds, but*
122	when partitioning hot and cold basic blocks into separate sections of
123	the object file there will be an extra round. /*
124	#define N_ROUNDS 5
125
126	struct target_bb_reorder default_target_bb_reorder;
127	#if SWITCHABLE_TARGET
128	struct target_bb_reorder *this_target_bb_reorder = &default_target_bb_reorder;
129	#endif
130
131	#define uncond_jump_length \
132	(this_target_bb_reorder->x_uncond_jump_length)
133
134	/ Branch thresholds in thousandths (per mille) of the REG_BR_PROB_BASE. /
135	static const int branch_threshold[N_ROUNDS] = {`400`, `200`, `100`, `0`, `0`};
136
137	/ Exec thresholds in thousandths (per mille) of the count of bb 0. /
138	static const int exec_threshold[N_ROUNDS] = {`500`, `200`, `50`, `0`, `0`};
139
140	/ If edge count is lower than DUPLICATION_THRESHOLD per mille of entry*
141	block the edge destination is not duplicated while connecting traces. /*
142	#define DUPLICATION_THRESHOLD 100
143
144	typedef fibonacci_heap <long, basic_block_def> bb_heap_t;
145	typedef fibonacci_node <long, basic_block_def> bb_heap_node_t;
146
147	/ Structure to hold needed information for each basic block. /
148	struct bbro_basic_block_data
149	{
150	/ Which trace is the bb start of (-1 means it is not a start of any). /
151	int start_of_trace;
152
153	/ Which trace is the bb end of (-1 means it is not an end of any). /
154	int end_of_trace;
155
156	/ Which trace is the bb in? /
157	int in_trace;
158
159	/ Which trace was this bb visited in? /
160	int visited;
161
162	/ Cached maximum frequency of interesting incoming edges.*
163	Minus one means not yet computed. /*
164	int priority;
165
166	/ Which heap is BB in (if any)? /
167	bb_heap_t *heap;
168
169	/ Which heap node is BB in (if any)? /
170	bb_heap_node_t *node;
171	};
172
173	/ The current size of the following dynamic array. /
174	static int array_size;
175
176	/ The array which holds needed information for basic blocks. /
177	static bbro_basic_block_data *bbd;
178
179	/ To avoid frequent reallocation the size of arrays is greater than needed,*
180	the number of elements is (not less than) 1.25 size_wanted. /
181	#define GET_ARRAY_SIZE(X) ((((X) / 4) + 1) * 5)
182
183	/ Free the memory and set the pointer to NULL. /
184	#define FREE(P) (gcc_assert (P), free (P), P = 0)
185
186	/ Structure for holding information about a trace. /
187	struct trace
188	{
189	/ First and last basic block of the trace. /
190	basic_block first, last;
191
192	/ The round of the STC creation which this trace was found in. /
193	int round;
194
195	/ The length (i.e. the number of basic blocks) of the trace. /
196	int length;
197	};
198
199	/ Maximum count of one of the entry blocks. /
200	static profile_count max_entry_count;
201
202	/ Local function prototypes. /
203	static void find_traces_1_round (int, profile_count, struct trace , int* *,
204	int, bb_heap_t *, int*);
205	static basic_block copy_bb (basic_block, edge, basic_block, int);
206	static long bb_to_key (basic_block);
207	static bool better_edge_p (const_basic_block, const_edge, profile_probability,
208	profile_count, profile_probability, profile_count,
209	const_edge);
210	static bool copy_bb_p (const_basic_block, int);
211
212	/ Return the trace number in which BB was visited. /
213
214	static int
215	bb_visited_trace (const_basic_block bb)
216	{
217	gcc_assert (bb->index < array_size);
218	return bbd[bb->index].visited;
219	}
220
221	/ This function marks BB that it was visited in trace number TRACE. /
222
223	static void
224	mark_bb_visited (basic_block bb, int trace)
225	{
226	bbd[bb->index].visited = trace;
227	if (bbd[bb->index].heap)
228	{
229	bbd[bb->index].heap->delete_node (node: bbd[bb->index].node);
230	bbd[bb->index].heap = NULL;
231	bbd[bb->index].node = NULL;
232	}
233	}
234
235	/ Check to see if bb should be pushed into the next round of trace*
236	collections or not. Reasons for pushing the block forward are 1).
237	If the block is cold, we are doing partitioning, and there will be
238	another round (cold partition blocks are not supposed to be
239	collected into traces until the very last round); or 2). There will
240	be another round, and the basic block is not "hot enough" for the
241	current round of trace collection. /*
242
243	static bool
244	push_to_next_round_p (const_basic_block bb, int round, int number_of_rounds,
245	profile_count count_th)
246	{
247	bool there_exists_another_round;
248	bool block_not_hot_enough;
249
250	there_exists_another_round = round < number_of_rounds - `1`;
251
252	block_not_hot_enough = (bb->count < count_th
253	\|\| probably_never_executed_bb_p (cfun, bb));
254
255	if (there_exists_another_round
256	&& block_not_hot_enough)
257	return true;
258	else
259	return false;
260	}
261
262	/ Find the traces for Software Trace Cache. Chain each trace through*
263	RBI()->next. Store the number of traces to N_TRACES and description of
264	traces to TRACES. /*
265
266	static void
267	find_traces (int n_traces, struct* trace *traces)
268	{
269	int i;
270	int number_of_rounds;
271	edge e;
272	edge_iterator ei;
273	bb_heap_t heap = new* bb_heap_t (LONG_MIN);
274
275	/ Add one extra round of trace collection when partitioning hot/cold*
276	basic blocks into separate sections. The last round is for all the
277	cold blocks (and ONLY the cold blocks). /*
278
279	number_of_rounds = N_ROUNDS - `1`;
280
281	/ Insert entry points of function into heap. /
282	max_entry_count = profile_count::zero ();
283	FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR_FOR_FN (cfun)->succs)
284	{
285	bbd[e->dest->index].heap = heap;
286	bbd[e->dest->index].node = heap->insert (key: bb_to_key (e->dest), data: e->dest);
287	if (e->dest->count > max_entry_count)
288	max_entry_count = e->dest->count;
289	}
290
291	/ Find the traces. /
292	for (i = `0`; i < number_of_rounds; i++)
293	{
294	profile_count count_threshold;
295
296	if (dump_file)
297	fprintf (stream: dump_file, format: "STC - round %d\n", i + `1`);
298
299	count_threshold = max_entry_count.apply_scale (num: exec_threshold[i], den: `1000`);
300
301	find_traces_1_round (REG_BR_PROB_BASE * branch_threshold[i] / `1000`,
302	count_threshold, traces, n_traces, i, &heap,
303	number_of_rounds);
304	}
305	delete heap;
306
307	if (dump_file)
308	{
309	for (i = `0`; i < *n_traces; i++)
310	{
311	basic_block bb;
312	fprintf (stream: dump_file, format: "Trace %d (round %d): ", i + `1`,
313	traces[i].round + `1`);
314	for (bb = traces[i].first;
315	bb != traces[i].last;
316	bb = (basic_block) bb->aux)
317	{
318	fprintf (stream: dump_file, format: "%d [", bb->index);
319	bb->count.dump (f: dump_file);
320	fprintf (stream: dump_file, format: "] ");
321	}
322	fprintf (stream: dump_file, format: "%d [", bb->index);
323	bb->count.dump (f: dump_file);
324	fprintf (stream: dump_file, format: "]\n");
325	}
326	fflush (stream: dump_file);
327	}
328	}
329
330	/ Rotate loop whose back edge is BACK_EDGE in the tail of trace TRACE*
331	(with sequential number TRACE_N). /*
332
333	static basic_block
334	rotate_loop (edge back_edge, struct trace trace, int* trace_n)
335	{
336	basic_block bb;
337
338	/ Information about the best end (end after rotation) of the loop. /
339	basic_block best_bb = NULL;
340	edge best_edge = NULL;
341	profile_count best_count = profile_count::uninitialized ();
342	/ The best edge is preferred when its destination is not visited yet*
343	or is a start block of some trace. /*
344	bool is_preferred = false;
345
346	/ Find the most frequent edge that goes out from current trace. /
347	bb = back_edge->dest;
348	do
349	{
350	edge e;
351	edge_iterator ei;
352
353	FOR_EACH_EDGE (e, ei, bb->succs)
354	if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
355	&& bb_visited_trace (bb: e->dest) != trace_n
356	&& (e->flags & EDGE_CAN_FALLTHRU)
357	&& !(e->flags & EDGE_COMPLEX))
358	{
359	if (is_preferred)
360	{
361	/ The best edge is preferred. /
362	if (!bb_visited_trace (bb: e->dest)
363	\|\| bbd[e->dest->index].start_of_trace >= `0`)
364	{
365	/ The current edge E is also preferred. /
366	if (e->count () > best_count)
367	{
368	best_count = e->count ();
369	best_edge = e;
370	best_bb = bb;
371	}
372	}
373	}
374	else
375	{
376	if (!bb_visited_trace (bb: e->dest)
377	\|\| bbd[e->dest->index].start_of_trace >= `0`)
378	{
379	/ The current edge E is preferred. /
380	is_preferred = true;
381	best_count = e->count ();
382	best_edge = e;
383	best_bb = bb;
384	}
385	else
386	{
387	if (!best_edge \|\| e->count () > best_count)
388	{
389	best_count = e->count ();
390	best_edge = e;
391	best_bb = bb;
392	}
393	}
394	}
395	}
396	bb = (basic_block) bb->aux;
397	}
398	while (bb != back_edge->dest);
399
400	if (best_bb)
401	{
402	/ Rotate the loop so that the BEST_EDGE goes out from the last block of*
403	the trace. /*
404	if (back_edge->dest == trace->first)
405	{
406	trace->first = (basic_block) best_bb->aux;
407	}
408	else
409	{
410	basic_block prev_bb;
411
412	for (prev_bb = trace->first;
413	prev_bb->aux != back_edge->dest;
414	prev_bb = (basic_block) prev_bb->aux)
415	;
416	prev_bb->aux = best_bb->aux;
417
418	/ Try to get rid of uncond jump to cond jump. /
419	if (single_succ_p (bb: prev_bb))
420	{
421	basic_block header = single_succ (bb: prev_bb);
422
423	/ Duplicate HEADER if it is a small block containing cond jump*
424	in the end. /*
425	if (any_condjump_p (BB_END (header)) && copy_bb_p (header, `0`)
426	&& !CROSSING_JUMP_P (BB_END (header)))
427	copy_bb (header, single_succ_edge (bb: prev_bb), prev_bb, trace_n);
428	}
429	}
430	}
431	else
432	{
433	/ We have not found suitable loop tail so do no rotation. /
434	best_bb = back_edge->src;
435	}
436	best_bb->aux = NULL;
437	return best_bb;
438	}
439
440	/ One round of finding traces. Find traces for BRANCH_TH and EXEC_TH i.e. do*
441	not include basic blocks whose probability is lower than BRANCH_TH or whose
442	count is lower than EXEC_TH into traces (or whose count is lower than
443	COUNT_TH). Store the new traces into TRACES and modify the number of
444	traces N_TRACES. Set the round (which the trace belongs to) to ROUND.*
445	The function expects starting basic blocks to be in HEAP and will delete*
446	HEAP and store starting points for the next round into new HEAP. */
447
448	static void
449	find_traces_1_round (int branch_th, profile_count count_th,
450	struct trace traces, int* n_traces, int* round,
451	bb_heap_t *heap, int* number_of_rounds)
452	{
453	/ Heap for discarded basic blocks which are possible starting points for*
454	the next round. /*
455	bb_heap_t new_heap = new* bb_heap_t (LONG_MIN);
456	bool for_size = optimize_function_for_size_p (cfun);
457
458	while (!(*heap)->empty ())
459	{
460	basic_block bb;
461	struct trace *trace;
462	edge best_edge, e;
463	long key;
464	edge_iterator ei;
465
466	bb = (*heap)->extract_min ();
467	bbd[bb->index].heap = NULL;
468	bbd[bb->index].node = NULL;
469
470	if (dump_file)
471	fprintf (stream: dump_file, format: "Getting bb %d\n", bb->index);
472
473	/ If the BB's count is too low, send BB to the next round. When*
474	partitioning hot/cold blocks into separate sections, make sure all
475	the cold blocks (and ONLY the cold blocks) go into the (extra) final
476	round. When optimizing for size, do not push to next round. /*
477
478	if (!for_size
479	&& push_to_next_round_p (bb, round, number_of_rounds,
480	count_th))
481	{
482	int key = bb_to_key (bb);
483	bbd[bb->index].heap = new_heap;
484	bbd[bb->index].node = new_heap->insert (key, data: bb);
485
486	if (dump_file)
487	fprintf (stream: dump_file,
488	format: " Possible start point of next round: %d (key: %d)\n",
489	bb->index, key);
490	continue;
491	}
492
493	trace = traces + *n_traces;
494	trace->first = bb;
495	trace->round = round;
496	trace->length = `0`;
497	bbd[bb->index].in_trace = *n_traces;
498	(*n_traces)++;
499
500	do
501	{
502	bool ends_in_call;
503
504	/ The probability and count of the best edge. /
505	profile_probability best_prob = profile_probability::uninitialized ();
506	profile_count best_count = profile_count::uninitialized ();
507
508	best_edge = NULL;
509	mark_bb_visited (bb, trace: *n_traces);
510	trace->length++;
511
512	if (dump_file)
513	fprintf (stream: dump_file, format: "Basic block %d was visited in trace %d\n",
514	bb->index, *n_traces);
515
516	ends_in_call = block_ends_with_call_p (bb);
517
518	/ Select the successor that will be placed after BB. /
519	FOR_EACH_EDGE (e, ei, bb->succs)
520	{
521	gcc_assert (!(e->flags & EDGE_FAKE));
522
523	if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
524	continue;
525
526	if (bb_visited_trace (bb: e->dest)
527	&& bb_visited_trace (bb: e->dest) != *n_traces)
528	continue;
529
530	/ If partitioning hot/cold basic blocks, don't consider edges*
531	that cross section boundaries. /*
532	if (BB_PARTITION (e->dest) != BB_PARTITION (bb))
533	continue;
534
535	profile_probability prob = e->probability;
536	profile_count count = e->dest->count;
537
538	/ The only sensible preference for a call instruction is the*
539	fallthru edge. Don't bother selecting anything else. /*
540	if (ends_in_call)
541	{
542	if (e->flags & EDGE_CAN_FALLTHRU)
543	{
544	best_edge = e;
545	best_prob = prob;
546	best_count = count;
547	}
548	continue;
549	}
550
551	/ Edge that cannot be fallthru or improbable or infrequent*
552	successor (i.e. it is unsuitable successor). When optimizing
553	for size, ignore the probability and count. /*
554	if (!(e->flags & EDGE_CAN_FALLTHRU) \|\| (e->flags & EDGE_COMPLEX)
555	\|\| !prob.initialized_p ()
556	\|\| ((prob.to_reg_br_prob_base () < branch_th
557	\|\| e->count () < count_th) && (!for_size)))
558	continue;
559
560	if (better_edge_p (bb, e, prob, count, best_prob, best_count,
561	best_edge))
562	{
563	best_edge = e;
564	best_prob = prob;
565	best_count = count;
566	}
567	}
568
569	/ If the best destination has multiple predecessors and can be*
570	duplicated cheaper than a jump, don't allow it to be added to
571	a trace; we'll duplicate it when connecting the traces later.
572	However, we need to check that this duplication wouldn't leave
573	the best destination with only crossing predecessors, because
574	this would change its effective partition from hot to cold. /*
575	if (best_edge
576	&& EDGE_COUNT (best_edge->dest->preds) >= `2`
577	&& copy_bb_p (best_edge->dest, `0`))
578	{
579	bool only_crossing_preds = true;
580	edge e;
581	edge_iterator ei;
582	FOR_EACH_EDGE (e, ei, best_edge->dest->preds)
583	if (e != best_edge && !(e->flags & EDGE_CROSSING))
584	{
585	only_crossing_preds = false;
586	break;
587	}
588	if (!only_crossing_preds)
589	best_edge = NULL;
590	}
591
592	/ If the best destination has multiple successors or predecessors,*
593	don't allow it to be added when optimizing for size. This makes
594	sure predecessors with smaller index are handled before the best
595	destination. It breaks long trace and reduces long jumps.
596
597	Take if-then-else as an example.
598	A
599	/ \
600	B C
601	\ /
602	D
603	If we do not remove the best edge B->D/C->D, the final order might
604	be A B D ... C. C is at the end of the program. If D's successors
605	and D are complicated, might need long jumps for A->C and C->D.
606	Similar issue for order: A C D ... B.
607
608	After removing the best edge, the final result will be ABCD/ ACBD.
609	It does not add jump compared with the previous order. But it
610	reduces the possibility of long jumps. /*
611	if (best_edge && for_size
612	&& (EDGE_COUNT (best_edge->dest->succs) > `1`
613	\|\| EDGE_COUNT (best_edge->dest->preds) > `1`))
614	best_edge = NULL;
615
616	/ Add all non-selected successors to the heaps. /
617	FOR_EACH_EDGE (e, ei, bb->succs)
618	{
619	if (e == best_edge
620	\|\| e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
621	\|\| bb_visited_trace (bb: e->dest))
622	continue;
623
624	key = bb_to_key (e->dest);
625
626	if (bbd[e->dest->index].heap)
627	{
628	/ E->DEST is already in some heap. /
629	if (key != bbd[e->dest->index].node->get_key ())
630	{
631	if (dump_file)
632	{
633	fprintf (stream: dump_file,
634	format: "Changing key for bb %d from %ld to %ld.\n",
635	e->dest->index,
636	(long) bbd[e->dest->index].node->get_key (),
637	key);
638	}
639	bbd[e->dest->index].heap->replace_key
640	(node: bbd[e->dest->index].node, key);
641	}
642	}
643	else
644	{
645	bb_heap_t which_heap = heap;
646
647	profile_probability prob = e->probability;
648
649	if (!(e->flags & EDGE_CAN_FALLTHRU)
650	\|\| (e->flags & EDGE_COMPLEX)
651	\|\| !prob.initialized_p ()
652	\|\| prob.to_reg_br_prob_base () < branch_th
653	\|\| e->count () < count_th)
654	{
655	/ When partitioning hot/cold basic blocks, make sure*
656	the cold blocks (and only the cold blocks) all get
657	pushed to the last round of trace collection. When
658	optimizing for size, do not push to next round. /*
659
660	if (!for_size && push_to_next_round_p (bb: e->dest, round,
661	number_of_rounds,
662	count_th))
663	which_heap = new_heap;
664	}
665
666	bbd[e->dest->index].heap = which_heap;
667	bbd[e->dest->index].node = which_heap->insert (key, data: e->dest);
668
669	if (dump_file)
670	{
671	fprintf (stream: dump_file,
672	format: " Possible start of %s round: %d (key: %ld)\n",
673	(which_heap == new_heap) ? "next" : "this",
674	e->dest->index, (long) key);
675	}
676
677	}
678	}
679
680	if (best_edge) / Suitable successor was found. /
681	{
682	if (bb_visited_trace (bb: best_edge->dest) == *n_traces)
683	{
684	/ We do nothing with one basic block loops. /
685	if (best_edge->dest != bb)
686	{
687	if (best_edge->count ()
688	> best_edge->dest->count.apply_scale (num: `4`, den: `5`))
689	{
690	/ The loop has at least 4 iterations. If the loop*
691	header is not the first block of the function
692	we can rotate the loop. /*
693
694	if (best_edge->dest
695	!= ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb)
696	{
697	if (dump_file)
698	{
699	fprintf (stream: dump_file,
700	format: "Rotating loop %d - %d\n",
701	best_edge->dest->index, bb->index);
702	}
703	bb->aux = best_edge->dest;
704	bbd[best_edge->dest->index].in_trace =
705	(*n_traces) - `1`;
706	bb = rotate_loop (back_edge: best_edge, trace, trace_n: *n_traces);
707	}
708	}
709	else
710	{
711	/ The loop has less than 4 iterations. /
712
713	if (single_succ_p (bb)
714	&& copy_bb_p (best_edge->dest,
715	optimize_edge_for_speed_p
716	(best_edge)))
717	{
718	bb = copy_bb (best_edge->dest, best_edge, bb,
719	*n_traces);
720	trace->length++;
721	}
722	}
723	}
724
725	/ Terminate the trace. /
726	break;
727	}
728	else
729	{
730	/ Check for a situation*
731
732	A
733	/\|
734	B \|
735	\\|
736	C
737
738	where
739	AB->count () + BC->count () >= AC->count ().
740	(i.e. 2 B->count >= AC->count )*
741	Best ordering is then A B C.
742
743	When optimizing for size, A B C is always the best order.
744
745	This situation is created for example by:
746
747	if (A) B;
748	C;
749
750	*/
751
752	FOR_EACH_EDGE (e, ei, bb->succs)
753	if (e != best_edge
754	&& (e->flags & EDGE_CAN_FALLTHRU)
755	&& !(e->flags & EDGE_COMPLEX)
756	&& !bb_visited_trace (bb: e->dest)
757	&& single_pred_p (bb: e->dest)
758	&& !(e->flags & EDGE_CROSSING)
759	&& single_succ_p (bb: e->dest)
760	&& (single_succ_edge (bb: e->dest)->flags
761	& EDGE_CAN_FALLTHRU)
762	&& !(single_succ_edge (bb: e->dest)->flags & EDGE_COMPLEX)
763	&& single_succ (bb: e->dest) == best_edge->dest
764	&& (e->dest->count * `2`
765	>= best_edge->count () \|\| for_size))
766	{
767	best_edge = e;
768	if (dump_file)
769	fprintf (stream: dump_file, format: "Selecting BB %d\n",
770	best_edge->dest->index);
771	break;
772	}
773
774	bb->aux = best_edge->dest;
775	bbd[best_edge->dest->index].in_trace = (*n_traces) - `1`;
776	bb = best_edge->dest;
777	}
778	}
779	}
780	while (best_edge);
781	trace->last = bb;
782	bbd[trace->first->index].start_of_trace = *n_traces - `1`;
783	if (bbd[trace->last->index].end_of_trace != *n_traces - `1`)
784	{
785	bbd[trace->last->index].end_of_trace = *n_traces - `1`;
786	/ Update the cached maximum frequency for interesting predecessor*
787	edges for successors of the new trace end. /*
788	FOR_EACH_EDGE (e, ei, trace->last->succs)
789	if (EDGE_FREQUENCY (e) > bbd[e->dest->index].priority)
790	bbd[e->dest->index].priority = EDGE_FREQUENCY (e);
791	}
792
793	/ The trace is terminated so we have to recount the keys in heap*
794	(some block can have a lower key because now one of its predecessors
795	is an end of the trace). /*
796	FOR_EACH_EDGE (e, ei, bb->succs)
797	{
798	if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
799	\|\| bb_visited_trace (bb: e->dest))
800	continue;
801
802	if (bbd[e->dest->index].heap)
803	{
804	key = bb_to_key (e->dest);
805	if (key != bbd[e->dest->index].node->get_key ())
806	{
807	if (dump_file)
808	{
809	fprintf (stream: dump_file,
810	format: "Changing key for bb %d from %ld to %ld.\n",
811	e->dest->index,
812	(long) bbd[e->dest->index].node->get_key (), key);
813	}
814	bbd[e->dest->index].heap->replace_key
815	(node: bbd[e->dest->index].node, key);
816	}
817	}
818	}
819	}
820
821	delete (*heap);
822
823	/ "Return" the new heap. /
824	*heap = new_heap;
825	}
826
827	/ Create a duplicate of the basic block OLD_BB and redirect edge E to it, add*
828	it to trace after BB, mark OLD_BB visited and update pass' data structures
829	(TRACE is a number of trace which OLD_BB is duplicated to). /*
830
831	static basic_block
832	copy_bb (basic_block old_bb, edge e, basic_block bb, int trace)
833	{
834	basic_block new_bb;
835
836	new_bb = duplicate_block (old_bb, e, bb);
837	BB_COPY_PARTITION (new_bb, old_bb);
838
839	gcc_assert (e->dest == new_bb);
840
841	if (dump_file)
842	fprintf (stream: dump_file,
843	format: "Duplicated bb %d (created bb %d)\n",
844	old_bb->index, new_bb->index);
845
846	if (new_bb->index >= array_size
847	\|\| last_basic_block_for_fn (cfun) > array_size)
848	{
849	int i;
850	int new_size;
851
852	new_size = MAX (last_basic_block_for_fn (cfun), new_bb->index + `1`);
853	new_size = GET_ARRAY_SIZE (new_size);
854	bbd = XRESIZEVEC (bbro_basic_block_data, bbd, new_size);
855	for (i = array_size; i < new_size; i++)
856	{
857	bbd[i].start_of_trace = -`1`;
858	bbd[i].end_of_trace = -`1`;
859	bbd[i].in_trace = -`1`;
860	bbd[i].visited = `0`;
861	bbd[i].priority = -`1`;
862	bbd[i].heap = NULL;
863	bbd[i].node = NULL;
864	}
865	array_size = new_size;
866
867	if (dump_file)
868	{
869	fprintf (stream: dump_file,
870	format: "Growing the dynamic array to %d elements.\n",
871	array_size);
872	}
873	}
874
875	gcc_assert (!bb_visited_trace (e->dest));
876	mark_bb_visited (bb: new_bb, trace);
877	new_bb->aux = bb->aux;
878	bb->aux = new_bb;
879
880	bbd[new_bb->index].in_trace = trace;
881
882	return new_bb;
883	}
884
885	/ Compute and return the key (for the heap) of the basic block BB. /
886
887	static long
888	bb_to_key (basic_block bb)
889	{
890	edge e;
891	edge_iterator ei;
892
893	/ Use index as key to align with its original order. /
894	if (optimize_function_for_size_p (cfun))
895	return bb->index;
896
897	/ Do not start in probably never executed blocks. /
898
899	if (BB_PARTITION (bb) == BB_COLD_PARTITION
900	\|\| probably_never_executed_bb_p (cfun, bb))
901	return BB_FREQ_MAX;
902
903	/ Prefer blocks whose predecessor is an end of some trace*
904	or whose predecessor edge is EDGE_DFS_BACK. /*
905	int priority = bbd[bb->index].priority;
906	if (priority == -`1`)
907	{
908	priority = `0`;
909	FOR_EACH_EDGE (e, ei, bb->preds)
910	{
911	if ((e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
912	&& bbd[e->src->index].end_of_trace >= `0`)
913	\|\| (e->flags & EDGE_DFS_BACK))
914	{
915	int edge_freq = EDGE_FREQUENCY (e);
916
917	if (edge_freq > priority)
918	priority = edge_freq;
919	}
920	}
921	bbd[bb->index].priority = priority;
922	}
923
924	if (priority)
925	/ The block with priority should have significantly lower key. /
926	return -(`100` * BB_FREQ_MAX + `100` * priority + bb->count.to_frequency (cfun));
927
928	return -bb->count.to_frequency (cfun);
929	}
930
931	/ Return true when the edge E from basic block BB is better than the temporary*
932	best edge (details are in function). The probability of edge E is PROB. The
933	count of the successor is COUNT. The current best probability is
934	BEST_PROB, the best count is BEST_COUNT.
935	The edge is considered to be equivalent when PROB does not differ much from
936	BEST_PROB; similarly for count. /*
937
938	static bool
939	better_edge_p (const_basic_block bb, const_edge e, profile_probability prob,
940	profile_count count, profile_probability best_prob,
941	profile_count best_count, const_edge cur_best_edge)
942	{
943	bool is_better_edge;
944
945	/ The BEST_* values do not have to be best, but can be a bit smaller than*
946	maximum values. /*
947	profile_probability diff_prob = best_prob / `10`;
948
949	/ The smaller one is better to keep the original order. /
950	if (optimize_function_for_size_p (cfun))
951	return !cur_best_edge
952	\|\| cur_best_edge->dest->index > e->dest->index;
953
954	/ Those edges are so expensive that continuing a trace is not useful*
955	performance wise. /*
956	if (e->flags & (EDGE_ABNORMAL \| EDGE_EH))
957	return false;
958
959	if (prob > best_prob + diff_prob
960	\|\| (!best_prob.initialized_p ()
961	&& prob > profile_probability::guessed_never ()))
962	/ The edge has higher probability than the temporary best edge. /
963	is_better_edge = true;
964	else if (prob < best_prob - diff_prob)
965	/ The edge has lower probability than the temporary best edge. /
966	is_better_edge = false;
967	else
968	{
969	profile_count diff_count = best_count / `10`;
970	if (count < best_count - diff_count
971	\|\| (!best_count.initialized_p ()
972	&& count.nonzero_p ()))
973	/ The edge and the temporary best edge have almost equivalent*
974	probabilities. The higher countuency of a successor now means
975	that there is another edge going into that successor.
976	This successor has lower countuency so it is better. /*
977	is_better_edge = true;
978	else if (count > best_count + diff_count)
979	/ This successor has higher countuency so it is worse. /
980	is_better_edge = false;
981	else if (e->dest->prev_bb == bb)
982	/ The edges have equivalent probabilities and the successors*
983	have equivalent frequencies. Select the previous successor. /*
984	is_better_edge = true;
985	else
986	is_better_edge = false;
987	}
988
989	return is_better_edge;
990	}
991
992	/ Return true when the edge E is better than the temporary best edge*
993	CUR_BEST_EDGE. If SRC_INDEX_P is true, the function compares the src bb of
994	E and CUR_BEST_EDGE; otherwise it will compare the dest bb.
995	BEST_LEN is the trace length of src (or dest) bb in CUR_BEST_EDGE.
996	TRACES record the information about traces.
997	When optimizing for size, the edge with smaller index is better.
998	When optimizing for speed, the edge with bigger probability or longer trace
999	is better. /*
1000
1001	static bool
1002	connect_better_edge_p (const_edge e, bool src_index_p, int best_len,
1003	const_edge cur_best_edge, struct trace *traces)
1004	{
1005	int e_index;
1006	int b_index;
1007	bool is_better_edge;
1008
1009	if (!cur_best_edge)
1010	return true;
1011
1012	if (optimize_function_for_size_p (cfun))
1013	{
1014	e_index = src_index_p ? e->src->index : e->dest->index;
1015	b_index = src_index_p ? cur_best_edge->src->index
1016	: cur_best_edge->dest->index;
1017	/ The smaller one is better to keep the original order. /
1018	return b_index > e_index;
1019	}
1020
1021	if (src_index_p)
1022	{
1023	e_index = e->src->index;
1024
1025	/ We are looking for predecessor, so probabilities are not that*
1026	informative. We do not want to connect A to B because A has
1027	only one successor (probability is 100%) while there is edge
1028	A' to B where probability is 90% but which is much more frequent. /*
1029	if (e->count () > cur_best_edge->count ())
1030	/ The edge has higher probability than the temporary best edge. /
1031	is_better_edge = true;
1032	else if (e->count () < cur_best_edge->count ())
1033	/ The edge has lower probability than the temporary best edge. /
1034	is_better_edge = false;
1035	else if (e->probability > cur_best_edge->probability)
1036	/ The edge has higher probability than the temporary best edge. /
1037	is_better_edge = true;
1038	else if (e->probability < cur_best_edge->probability)
1039	/ The edge has lower probability than the temporary best edge. /
1040	is_better_edge = false;
1041	else if (traces[bbd[e_index].end_of_trace].length > best_len)
1042	/ The edge and the temporary best edge have equivalent probabilities.*
1043	The edge with longer trace is better. /*
1044	is_better_edge = true;
1045	else
1046	is_better_edge = false;
1047	}
1048	else
1049	{
1050	e_index = e->dest->index;
1051
1052	if (e->probability > cur_best_edge->probability)
1053	/ The edge has higher probability than the temporary best edge. /
1054	is_better_edge = true;
1055	else if (e->probability < cur_best_edge->probability)
1056	/ The edge has lower probability than the temporary best edge. /
1057	is_better_edge = false;
1058	else if (traces[bbd[e_index].start_of_trace].length > best_len)
1059	/ The edge and the temporary best edge have equivalent probabilities.*
1060	The edge with longer trace is better. /*
1061	is_better_edge = true;
1062	else
1063	is_better_edge = false;
1064	}
1065
1066	return is_better_edge;
1067	}
1068
1069	/ Connect traces in array TRACES, N_TRACES is the count of traces. /
1070
1071	static void
1072	connect_traces (int n_traces, struct trace *traces)
1073	{
1074	int i;
1075	bool *connected;
1076	bool two_passes;
1077	int last_trace;
1078	int current_pass;
1079	int current_partition;
1080	profile_count count_threshold;
1081	bool for_size = optimize_function_for_size_p (cfun);
1082
1083	count_threshold = max_entry_count.apply_scale (DUPLICATION_THRESHOLD, den: `1000`);
1084
1085	connected = XCNEWVEC (bool, n_traces);
1086	last_trace = -`1`;
1087	current_pass = `1`;
1088	current_partition = BB_PARTITION (traces[`0`].first);
1089	two_passes = false;
1090
1091	if (crtl->has_bb_partition)
1092	for (i = `0`; i < n_traces && !two_passes; i++)
1093	if (BB_PARTITION (traces[`0`].first)
1094	!= BB_PARTITION (traces[i].first))
1095	two_passes = true;
1096
1097	for (i = `0`; i < n_traces \|\| (two_passes && current_pass == `1`) ; i++)
1098	{
1099	int t = i;
1100	int t2;
1101	edge e, best;
1102	int best_len;
1103
1104	if (i >= n_traces)
1105	{
1106	gcc_assert (two_passes && current_pass == `1`);
1107	i = `0`;
1108	t = i;
1109	current_pass = `2`;
1110	if (current_partition == BB_HOT_PARTITION)
1111	current_partition = BB_COLD_PARTITION;
1112	else
1113	current_partition = BB_HOT_PARTITION;
1114	}
1115
1116	if (connected[t])
1117	continue;
1118
1119	if (two_passes
1120	&& BB_PARTITION (traces[t].first) != current_partition)
1121	continue;
1122
1123	connected[t] = true;
1124
1125	/ Find the predecessor traces. /
1126	for (t2 = t; t2 > `0`;)
1127	{
1128	edge_iterator ei;
1129	best = NULL;
1130	best_len = `0`;
1131	FOR_EACH_EDGE (e, ei, traces[t2].first->preds)
1132	{
1133	int si = e->src->index;
1134
1135	if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
1136	&& (e->flags & EDGE_CAN_FALLTHRU)
1137	&& !(e->flags & EDGE_COMPLEX)
1138	&& bbd[si].end_of_trace >= `0`
1139	&& !connected[bbd[si].end_of_trace]
1140	&& (BB_PARTITION (e->src) == current_partition)
1141	&& connect_better_edge_p (e, src_index_p: true, best_len, cur_best_edge: best, traces))
1142	{
1143	best = e;
1144	best_len = traces[bbd[si].end_of_trace].length;
1145	}
1146	}
1147	if (best)
1148	{
1149	best->src->aux = best->dest;
1150	t2 = bbd[best->src->index].end_of_trace;
1151	connected[t2] = true;
1152
1153	if (dump_file)
1154	{
1155	fprintf (stream: dump_file, format: "Connection: %d %d\n",
1156	best->src->index, best->dest->index);
1157	}
1158	}
1159	else
1160	break;
1161	}
1162
1163	if (last_trace >= `0`)
1164	traces[last_trace].last->aux = traces[t2].first;
1165	last_trace = t;
1166
1167	/ Find the successor traces. /
1168	while (`1`)
1169	{
1170	/ Find the continuation of the chain. /
1171	edge_iterator ei;
1172	best = NULL;
1173	best_len = `0`;
1174	FOR_EACH_EDGE (e, ei, traces[t].last->succs)
1175	{
1176	int di = e->dest->index;
1177
1178	if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
1179	&& (e->flags & EDGE_CAN_FALLTHRU)
1180	&& !(e->flags & EDGE_COMPLEX)
1181	&& bbd[di].start_of_trace >= `0`
1182	&& !connected[bbd[di].start_of_trace]
1183	&& (BB_PARTITION (e->dest) == current_partition)
1184	&& connect_better_edge_p (e, src_index_p: false, best_len, cur_best_edge: best, traces))
1185	{
1186	best = e;
1187	best_len = traces[bbd[di].start_of_trace].length;
1188	}
1189	}
1190
1191	if (for_size)
1192	{
1193	if (!best)
1194	/ Stop finding the successor traces. /
1195	break;
1196
1197	/ It is OK to connect block n with block n + 1 or a block*
1198	before n. For others, only connect to the loop header. /*
1199	if (best->dest->index > (traces[t].last->index + `1`))
1200	{
1201	int count = EDGE_COUNT (best->dest->preds);
1202
1203	FOR_EACH_EDGE (e, ei, best->dest->preds)
1204	if (e->flags & EDGE_DFS_BACK)
1205	count--;
1206
1207	/ If dest has multiple predecessors, skip it. We expect*
1208	that one predecessor with smaller index connects with it
1209	later. /*
1210	if (count != `1`)
1211	break;
1212	}
1213
1214	/ Only connect Trace n with Trace n + 1. It is conservative*
1215	to keep the order as close as possible to the original order.
1216	It also helps to reduce long jumps. /*
1217	if (last_trace != bbd[best->dest->index].start_of_trace - `1`)
1218	break;
1219
1220	if (dump_file)
1221	fprintf (stream: dump_file, format: "Connection: %d %d\n",
1222	best->src->index, best->dest->index);
1223
1224	t = bbd[best->dest->index].start_of_trace;
1225	traces[last_trace].last->aux = traces[t].first;
1226	connected[t] = true;
1227	last_trace = t;
1228	}
1229	else if (best)
1230	{
1231	if (dump_file)
1232	{
1233	fprintf (stream: dump_file, format: "Connection: %d %d\n",
1234	best->src->index, best->dest->index);
1235	}
1236	t = bbd[best->dest->index].start_of_trace;
1237	traces[last_trace].last->aux = traces[t].first;
1238	connected[t] = true;
1239	last_trace = t;
1240	}
1241	else
1242	{
1243	/ Try to connect the traces by duplication of 1 block. /
1244	edge e2;
1245	basic_block next_bb = NULL;
1246	bool try_copy = false;
1247
1248	FOR_EACH_EDGE (e, ei, traces[t].last->succs)
1249	if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
1250	&& (e->flags & EDGE_CAN_FALLTHRU)
1251	&& !(e->flags & EDGE_COMPLEX)
1252	&& (!best \|\| e->probability > best->probability))
1253	{
1254	edge_iterator ei;
1255	edge best2 = NULL;
1256	int best2_len = `0`;
1257
1258	/ If the destination is a start of a trace which is only*
1259	one block long, then no need to search the successor
1260	blocks of the trace. Accept it. /*
1261	if (bbd[e->dest->index].start_of_trace >= `0`
1262	&& traces[bbd[e->dest->index].start_of_trace].length
1263	== `1`)
1264	{
1265	best = e;
1266	try_copy = true;
1267	continue;
1268	}
1269
1270	FOR_EACH_EDGE (e2, ei, e->dest->succs)
1271	{
1272	int di = e2->dest->index;
1273
1274	if (e2->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
1275	\|\| ((e2->flags & EDGE_CAN_FALLTHRU)
1276	&& !(e2->flags & EDGE_COMPLEX)
1277	&& bbd[di].start_of_trace >= `0`
1278	&& !connected[bbd[di].start_of_trace]
1279	&& BB_PARTITION (e2->dest) == current_partition
1280	&& e2->count () >= count_threshold
1281	&& (!best2
1282	\|\| e2->probability > best2->probability
1283	\|\| (e2->probability == best2->probability
1284	&& traces[bbd[di].start_of_trace].length
1285	> best2_len))))
1286	{
1287	best = e;
1288	best2 = e2;
1289	if (e2->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
1290	best2_len = traces[bbd[di].start_of_trace].length;
1291	else
1292	best2_len = INT_MAX;
1293	next_bb = e2->dest;
1294	try_copy = true;
1295	}
1296	}
1297	}
1298
1299	/ Copy tiny blocks always; copy larger blocks only when the*
1300	edge is traversed frequently enough. /*
1301	if (try_copy
1302	&& BB_PARTITION (best->src) == BB_PARTITION (best->dest)
1303	&& copy_bb_p (best->dest,
1304	optimize_edge_for_speed_p (best)
1305	&& (!best->count ().initialized_p ()
1306	\|\| best->count () >= count_threshold)))
1307	{
1308	basic_block new_bb;
1309
1310	if (dump_file)
1311	{
1312	fprintf (stream: dump_file, format: "Connection: %d %d ",
1313	traces[t].last->index, best->dest->index);
1314	if (!next_bb)
1315	fputc (c: `'\n'`, stream: dump_file);
1316	else if (next_bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
1317	fprintf (stream: dump_file, format: "exit\n");
1318	else
1319	fprintf (stream: dump_file, format: "%d\n", next_bb->index);
1320	}
1321
1322	new_bb = copy_bb (old_bb: best->dest, e: best, bb: traces[t].last, trace: t);
1323	traces[t].last = new_bb;
1324	if (next_bb && next_bb != EXIT_BLOCK_PTR_FOR_FN (cfun))
1325	{
1326	t = bbd[next_bb->index].start_of_trace;
1327	traces[last_trace].last->aux = traces[t].first;
1328	connected[t] = true;
1329	last_trace = t;
1330	}
1331	else
1332	break; / Stop finding the successor traces. /
1333	}
1334	else
1335	break; / Stop finding the successor traces. /
1336	}
1337	}
1338	}
1339
1340	if (dump_file)
1341	{
1342	basic_block bb;
1343
1344	fprintf (stream: dump_file, format: "Final order:\n");
1345	for (bb = traces[`0`].first; bb; bb = (basic_block) bb->aux)
1346	fprintf (stream: dump_file, format: "%d ", bb->index);
1347	fprintf (stream: dump_file, format: "\n");
1348	fflush (stream: dump_file);
1349	}
1350
1351	FREE (connected);
1352	}
1353
1354	/ Return true when BB can and should be copied. CODE_MAY_GROW is true*
1355	when code size is allowed to grow by duplication. /*
1356
1357	static bool
1358	copy_bb_p (const_basic_block bb, int code_may_grow)
1359	{
1360	unsigned int size = `0`;
1361	unsigned int max_size = uncond_jump_length;
1362	rtx_insn *insn;
1363
1364	if (EDGE_COUNT (bb->preds) < `2`)
1365	return false;
1366	if (!can_duplicate_block_p (bb))
1367	return false;
1368
1369	/ Avoid duplicating blocks which have many successors (PR/13430). /
1370	if (EDGE_COUNT (bb->succs) > `8`)
1371	return false;
1372
1373	if (code_may_grow && optimize_bb_for_speed_p (bb))
1374	max_size *= param_max_grow_copy_bb_insns;
1375
1376	FOR_BB_INSNS (bb, insn)
1377	{
1378	if (INSN_P (insn))
1379	{
1380	size += get_attr_min_length (insn);
1381	if (size > max_size)
1382	break;
1383	}
1384	}
1385
1386	if (size <= max_size)
1387	return true;
1388
1389	if (dump_file)
1390	{
1391	fprintf (stream: dump_file,
1392	format: "Block %d can't be copied because its size = %u.\n",
1393	bb->index, size);
1394	}
1395
1396	return false;
1397	}
1398
1399	/ Return the length of unconditional jump instruction. /
1400
1401	int
1402	get_uncond_jump_length (void)
1403	{
1404	unsigned int length;
1405
1406	start_sequence ();
1407	rtx_code_label *label = emit_label (gen_label_rtx ());
1408	rtx_insn *jump = emit_jump_insn (targetm.gen_jump (label));
1409	length = get_attr_min_length (jump);
1410	end_sequence ();
1411
1412	gcc_assert (length < INT_MAX);
1413	return length;
1414	}
1415
1416	/ Create a forwarder block to OLD_BB starting with NEW_LABEL and in the*
1417	other partition wrt OLD_BB. /*
1418
1419	static basic_block
1420	create_eh_forwarder_block (rtx_code_label *new_label, basic_block old_bb)
1421	{
1422	/ Split OLD_BB, so that EH pads have always only incoming EH edges,*
1423	bb_has_eh_pred bbs are treated specially by DF infrastructure. /*
1424	old_bb = split_block_after_labels (old_bb)->dest;
1425
1426	/ Put the new label and a jump in the new basic block. /
1427	rtx_insn *label = emit_label (new_label);
1428	rtx_code_label *old_label = block_label (old_bb);
1429	rtx_insn *jump = emit_jump_insn (targetm.gen_jump (old_label));
1430	JUMP_LABEL (jump) = old_label;
1431
1432	/ Create the new basic block and put it in last position. /
1433	basic_block last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
1434	basic_block new_bb = create_basic_block (label, jump, last_bb);
1435	new_bb->aux = last_bb->aux;
1436	new_bb->count = old_bb->count;
1437	last_bb->aux = new_bb;
1438
1439	emit_barrier_after_bb (bb: new_bb);
1440
1441	make_single_succ_edge (new_bb, old_bb, `0`);
1442
1443	/ Make sure the new basic block is in the other partition. /
1444	unsigned new_partition = BB_PARTITION (old_bb);
1445	new_partition ^= BB_HOT_PARTITION \| BB_COLD_PARTITION;
1446	BB_SET_PARTITION (new_bb, new_partition);
1447
1448	return new_bb;
1449	}
1450
1451	/ The common landing pad in block OLD_BB has edges from both partitions.*
1452	Add a new landing pad that will just jump to the old one and split the
1453	edges so that no EH edge crosses partitions. /*
1454
1455	static void
1456	sjlj_fix_up_crossing_landing_pad (basic_block old_bb)
1457	{
1458	const unsigned lp_len = cfun->eh->lp_array->length ();
1459	edge_iterator ei;
1460	edge e;
1461
1462	/ Generate the new common landing-pad label. /
1463	rtx_code_label *new_label = gen_label_rtx ();
1464	LABEL_PRESERVE_P (new_label) = `1`;
1465
1466	/ Create the forwarder block. /
1467	basic_block new_bb = create_eh_forwarder_block (new_label, old_bb);
1468
1469	/ Create the map from old to new lp index and initialize it. /
1470	unsigned index_map = (unsigned* ) alloca (lp_len sizeof (unsigned));
1471	memset (s: index_map, c: `0`, n: lp_len * sizeof (unsigned));
1472
1473	/ Fix up the edges. /
1474	for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (i: ei)) != NULL; )
1475	if (e->src != new_bb && BB_PARTITION (e->src) == BB_PARTITION (new_bb))
1476	{
1477	rtx_insn *insn = BB_END (e->src);
1478	rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
1479
1480	gcc_assert (note != NULL);
1481	const unsigned old_index = INTVAL (XEXP (note, `0`));
1482
1483	/ Generate the new landing-pad structure. /
1484	if (index_map[old_index] == `0`)
1485	{
1486	eh_landing_pad old_lp = (*cfun->eh->lp_array)[old_index];
1487	eh_landing_pad new_lp = gen_eh_landing_pad (old_lp->region);
1488	new_lp->post_landing_pad = old_lp->post_landing_pad;
1489	new_lp->landing_pad = new_label;
1490	index_map[old_index] = new_lp->index;
1491	}
1492	XEXP (note, `0`) = GEN_INT (index_map[old_index]);
1493
1494	/ Adjust the edge to the new destination. /
1495	redirect_edge_succ (e, new_bb);
1496	}
1497	else
1498	ei_next (i: &ei);
1499	}
1500
1501	/ The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.*
1502	Add a new landing pad that will just jump to the old one and split the
1503	edges so that no EH edge crosses partitions. /*
1504
1505	static void
1506	dw2_fix_up_crossing_landing_pad (eh_landing_pad old_lp, basic_block old_bb)
1507	{
1508	eh_landing_pad new_lp;
1509	edge_iterator ei;
1510	edge e;
1511
1512	/ Generate the new landing-pad structure. /
1513	new_lp = gen_eh_landing_pad (old_lp->region);
1514	new_lp->post_landing_pad = old_lp->post_landing_pad;
1515	new_lp->landing_pad = gen_label_rtx ();
1516	LABEL_PRESERVE_P (new_lp->landing_pad) = `1`;
1517
1518	/ Create the forwarder block. /
1519	basic_block new_bb = create_eh_forwarder_block (new_label: new_lp->landing_pad, old_bb);
1520
1521	/ Fix up the edges. /
1522	for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (i: ei)) != NULL; )
1523	if (e->src != new_bb && BB_PARTITION (e->src) == BB_PARTITION (new_bb))
1524	{
1525	rtx_insn *insn = BB_END (e->src);
1526	rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
1527
1528	gcc_assert (note != NULL);
1529	gcc_checking_assert (INTVAL (XEXP (note, `0`)) == old_lp->index);
1530	XEXP (note, `0`) = GEN_INT (new_lp->index);
1531
1532	/ Adjust the edge to the new destination. /
1533	redirect_edge_succ (e, new_bb);
1534	}
1535	else
1536	ei_next (i: &ei);
1537	}
1538
1539
1540	/ Ensure that all hot bbs are included in a hot path through the*
1541	procedure. This is done by calling this function twice, once
1542	with WALK_UP true (to look for paths from the entry to hot bbs) and
1543	once with WALK_UP false (to look for paths from hot bbs to the exit).
1544	Returns the updated value of COLD_BB_COUNT and adds newly-hot bbs
1545	to BBS_IN_HOT_PARTITION. /*
1546
1547	static unsigned int
1548	sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count,
1549	vec<basic_block> *bbs_in_hot_partition)
1550	{
1551	/ Callers check this. /
1552	gcc_checking_assert (cold_bb_count);
1553
1554	/ Keep examining hot bbs while we still have some left to check*
1555	and there are remaining cold bbs. /*
1556	vec<basic_block> hot_bbs_to_check = bbs_in_hot_partition->copy ();
1557	while (! hot_bbs_to_check.is_empty ()
1558	&& cold_bb_count)
1559	{
1560	basic_block bb = hot_bbs_to_check.pop ();
1561	vec<edge, va_gc> *edges = walk_up ? bb->preds : bb->succs;
1562	edge e;
1563	edge_iterator ei;
1564	profile_probability highest_probability
1565	= profile_probability::uninitialized ();
1566	profile_count highest_count = profile_count::uninitialized ();
1567	bool found = false;
1568
1569	/ Walk the preds/succs and check if there is at least one already*
1570	marked hot. Keep track of the most frequent pred/succ so that we
1571	can mark it hot if we don't find one. /*
1572	FOR_EACH_EDGE (e, ei, edges)
1573	{
1574	basic_block reach_bb = walk_up ? e->src : e->dest;
1575
1576	if (e->flags & EDGE_DFS_BACK)
1577	continue;
1578
1579	/ Do not expect profile insanities when profile was not adjusted. /
1580	if (e->probability == profile_probability::never ()
1581	\|\| e->count () == profile_count::zero ())
1582	continue;
1583
1584	if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
1585	{
1586	found = true;
1587	break;
1588	}
1589	/ The following loop will look for the hottest edge via*
1590	the edge count, if it is non-zero, then fallback to
1591	the edge probability. /*
1592	if (!(e->count () > highest_count))
1593	highest_count = e->count ();
1594	if (!highest_probability.initialized_p ()
1595	\|\| e->probability > highest_probability)
1596	highest_probability = e->probability;
1597	}
1598
1599	/ If bb is reached by (or reaches, in the case of !WALK_UP) another hot*
1600	block (or unpartitioned, e.g. the entry block) then it is ok. If not,
1601	then the most frequent pred (or succ) needs to be adjusted. In the
1602	case where multiple preds/succs have the same frequency (e.g. a
1603	50-50 branch), then both will be adjusted. /*
1604	if (found)
1605	continue;
1606
1607	FOR_EACH_EDGE (e, ei, edges)
1608	{
1609	if (e->flags & EDGE_DFS_BACK)
1610	continue;
1611	/ Do not expect profile insanities when profile was not adjusted. /
1612	if (e->probability == profile_probability::never ()
1613	\|\| e->count () == profile_count::zero ())
1614	continue;
1615	/ Select the hottest edge using the edge count, if it is non-zero,*
1616	then fallback to the edge probability. /*
1617	if (highest_count.initialized_p ())
1618	{
1619	if (!(e->count () >= highest_count))
1620	continue;
1621	}
1622	else if (!(e->probability >= highest_probability))
1623	continue;
1624
1625	basic_block reach_bb = walk_up ? e->src : e->dest;
1626
1627	/ We have a hot bb with an immediate dominator that is cold.*
1628	The dominator needs to be re-marked hot. /*
1629	BB_SET_PARTITION (reach_bb, BB_HOT_PARTITION);
1630	if (dump_file)
1631	fprintf (stream: dump_file, format: "Promoting bb %i to hot partition to sanitize "
1632	"profile of bb %i in %s walk\n", reach_bb->index,
1633	bb->index, walk_up ? "backward" : "forward");
1634	cold_bb_count--;
1635
1636	/ Now we need to examine newly-hot reach_bb to see if it is also*
1637	dominated by a cold bb. /*
1638	bbs_in_hot_partition->safe_push (obj: reach_bb);
1639	hot_bbs_to_check.safe_push (obj: reach_bb);
1640	}
1641	}
1642	hot_bbs_to_check.release ();
1643
1644	return cold_bb_count;
1645	}
1646
1647
1648	/ Find the basic blocks that are rarely executed and need to be moved to*
1649	a separate section of the .o file (to cut down on paging and improve
1650	cache locality). Return a vector of all edges that cross. /*
1651
1652	static vec<edge>
1653	find_rarely_executed_basic_blocks_and_crossing_edges (void)
1654	{
1655	vec<edge> crossing_edges = vNULL;
1656	basic_block bb;
1657	edge e;
1658	edge_iterator ei;
1659	unsigned int cold_bb_count = `0`;
1660	auto_vec<basic_block> bbs_in_hot_partition;
1661
1662	propagate_unlikely_bbs_forward ();
1663
1664	/ Mark which partition (hot/cold) each basic block belongs in. /
1665	FOR_EACH_BB_FN (bb, cfun)
1666	{
1667	bool cold_bb = false;
1668
1669	if (probably_never_executed_bb_p (cfun, bb))
1670	{
1671	cold_bb = true;
1672
1673	/ Handle profile insanities created by upstream optimizations*
1674	by also checking the incoming edge weights. If there is a non-cold
1675	incoming edge, conservatively prevent this block from being split
1676	into the cold section. /*
1677	if (!bb->count.precise_p ())
1678	FOR_EACH_EDGE (e, ei, bb->preds)
1679	if (!probably_never_executed_edge_p (cfun, e))
1680	{
1681	cold_bb = false;
1682	break;
1683	}
1684	}
1685	if (cold_bb)
1686	{
1687	BB_SET_PARTITION (bb, BB_COLD_PARTITION);
1688	cold_bb_count++;
1689	}
1690	else
1691	{
1692	BB_SET_PARTITION (bb, BB_HOT_PARTITION);
1693	bbs_in_hot_partition.safe_push (obj: bb);
1694	}
1695	}
1696
1697	/ Ensure that hot bbs are included along a hot path from the entry to exit.*
1698	Several different possibilities may include cold bbs along all paths
1699	to/from a hot bb. One is that there are edge weight insanities
1700	due to optimization phases that do not properly update basic block profile
1701	counts. The second is that the entry of the function may not be hot, because
1702	it is entered fewer times than the number of profile training runs, but there
1703	is a loop inside the function that causes blocks within the function to be
1704	above the threshold for hotness. This is fixed by walking up from hot bbs
1705	to the entry block, and then down from hot bbs to the exit, performing
1706	partitioning fixups as necessary. /*
1707	if (cold_bb_count)
1708	{
1709	mark_dfs_back_edges ();
1710	cold_bb_count = sanitize_hot_paths (walk_up: true, cold_bb_count,
1711	bbs_in_hot_partition: &bbs_in_hot_partition);
1712	if (cold_bb_count)
1713	sanitize_hot_paths (walk_up: false, cold_bb_count, bbs_in_hot_partition: &bbs_in_hot_partition);
1714
1715	hash_set <basic_block> set;
1716	find_bbs_reachable_by_hot_paths (&set);
1717	FOR_EACH_BB_FN (bb, cfun)
1718	if (!set.contains (k: bb))
1719	BB_SET_PARTITION (bb, BB_COLD_PARTITION);
1720	}
1721
1722	/ The format of .gcc_except_table does not allow landing pads to*
1723	be in a different partition as the throw. Fix this by either
1724	moving the landing pads or inserting forwarder landing pads. /*
1725	if (cfun->eh->lp_array)
1726	{
1727	const bool sjlj
1728	= (targetm_common.except_unwind_info (&global_options) == UI_SJLJ);
1729	unsigned i;
1730	eh_landing_pad lp;
1731
1732	FOR_EACH_VEC_ELT (*cfun->eh->lp_array, i, lp)
1733	{
1734	bool all_same, all_diff;
1735
1736	if (lp == NULL
1737	\|\| lp->landing_pad == NULL_RTX
1738	\|\| !LABEL_P (lp->landing_pad))
1739	continue;
1740
1741	all_same = all_diff = true;
1742	bb = BLOCK_FOR_INSN (insn: lp->landing_pad);
1743	FOR_EACH_EDGE (e, ei, bb->preds)
1744	{
1745	gcc_assert (e->flags & EDGE_EH);
1746	if (BB_PARTITION (bb) == BB_PARTITION (e->src))
1747	all_diff = false;
1748	else
1749	all_same = false;
1750	}
1751
1752	if (all_same)
1753	;
1754	else if (all_diff)
1755	{
1756	int which = BB_PARTITION (bb);
1757	which ^= BB_HOT_PARTITION \| BB_COLD_PARTITION;
1758	BB_SET_PARTITION (bb, which);
1759	}
1760	else if (sjlj)
1761	sjlj_fix_up_crossing_landing_pad (old_bb: bb);
1762	else
1763	dw2_fix_up_crossing_landing_pad (old_lp: lp, old_bb: bb);
1764
1765	/ There is a single, common landing pad in SJLJ mode. /
1766	if (sjlj)
1767	break;
1768	}
1769	}
1770
1771	/ Mark every edge that crosses between sections. /
1772	FOR_EACH_BB_FN (bb, cfun)
1773	FOR_EACH_EDGE (e, ei, bb->succs)
1774	{
1775	unsigned int flags = e->flags;
1776
1777	/ We should never have EDGE_CROSSING set yet. /
1778	gcc_checking_assert ((flags & EDGE_CROSSING) == `0`);
1779
1780	if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
1781	&& e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
1782	&& BB_PARTITION (e->src) != BB_PARTITION (e->dest))
1783	{
1784	crossing_edges.safe_push (obj: e);
1785	flags \|= EDGE_CROSSING;
1786	}
1787
1788	/ Now that we've split eh edges as appropriate, allow landing pads*
1789	to be merged with the post-landing pads. /*
1790	flags &= ~EDGE_PRESERVE;
1791
1792	e->flags = flags;
1793	}
1794
1795	return crossing_edges;
1796	}
1797
1798	/ Set the flag EDGE_CAN_FALLTHRU for edges that can be fallthru. /
1799
1800	static void
1801	set_edge_can_fallthru_flag (void)
1802	{
1803	basic_block bb;
1804
1805	FOR_EACH_BB_FN (bb, cfun)
1806	{
1807	edge e;
1808	edge_iterator ei;
1809
1810	FOR_EACH_EDGE (e, ei, bb->succs)
1811	{
1812	e->flags &= ~EDGE_CAN_FALLTHRU;
1813
1814	/ The FALLTHRU edge is also CAN_FALLTHRU edge. /
1815	if (e->flags & EDGE_FALLTHRU)
1816	e->flags \|= EDGE_CAN_FALLTHRU;
1817	}
1818
1819	/ If the BB ends with an invertible condjump all (2) edges are*
1820	CAN_FALLTHRU edges. /*
1821	if (EDGE_COUNT (bb->succs) != `2`)
1822	continue;
1823	if (!any_condjump_p (BB_END (bb)))
1824	continue;
1825
1826	rtx_jump_insn bb_end_jump = as_a <rtx_jump_insn > (BB_END (bb));
1827	if (!invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), `0`))
1828	continue;
1829	invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), `0`);
1830	EDGE_SUCC (bb, `0`)->flags \|= EDGE_CAN_FALLTHRU;
1831	EDGE_SUCC (bb, `1`)->flags \|= EDGE_CAN_FALLTHRU;
1832	}
1833	}
1834
1835	/ If any destination of a crossing edge does not have a label, add label;*
1836	Convert any easy fall-through crossing edges to unconditional jumps. /*
1837
1838	static void
1839	add_labels_and_missing_jumps (vec<edge> crossing_edges)
1840	{
1841	size_t i;
1842	edge e;
1843
1844	FOR_EACH_VEC_ELT (crossing_edges, i, e)
1845	{
1846	basic_block src = e->src;
1847	basic_block dest = e->dest;
1848	rtx_jump_insn *new_jump;
1849
1850	if (dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
1851	continue;
1852
1853	/ Make sure dest has a label. /
1854	rtx_code_label *label = block_label (dest);
1855
1856	/ Nothing to do for non-fallthru edges. /
1857	if (src == ENTRY_BLOCK_PTR_FOR_FN (cfun))
1858	continue;
1859	if ((e->flags & EDGE_FALLTHRU) == `0`)
1860	continue;
1861
1862	/ If the block does not end with a control flow insn, then we*
1863	can trivially add a jump to the end to fixup the crossing.
1864	Otherwise the jump will have to go in a new bb, which will
1865	be handled by fix_up_fall_thru_edges function. /*
1866	if (control_flow_insn_p (BB_END (src)))
1867	continue;
1868
1869	/ Make sure there's only one successor. /
1870	gcc_assert (single_succ_p (src));
1871
1872	new_jump = emit_jump_insn_after (targetm.gen_jump (label), BB_END (src));
1873	BB_END (src) = new_jump;
1874	JUMP_LABEL (new_jump) = label;
1875	LABEL_NUSES (label) += `1`;
1876
1877	emit_barrier_after_bb (bb: src);
1878
1879	/ Mark edge as non-fallthru. /
1880	e->flags &= ~EDGE_FALLTHRU;
1881	}
1882	}
1883
1884	/ Find any bb's where the fall-through edge is a crossing edge (note that*
1885	these bb's must also contain a conditional jump or end with a call
1886	instruction; we've already dealt with fall-through edges for blocks
1887	that didn't have a conditional jump or didn't end with call instruction
1888	in the call to add_labels_and_missing_jumps). Convert the fall-through
1889	edge to non-crossing edge by inserting a new bb to fall-through into.
1890	The new bb will contain an unconditional jump (crossing edge) to the
1891	original fall through destination. /*
1892
1893	static void
1894	fix_up_fall_thru_edges (void)
1895	{
1896	basic_block cur_bb;
1897
1898	FOR_EACH_BB_FN (cur_bb, cfun)
1899	{
1900	edge succ1;
1901	edge succ2;
1902	edge fall_thru = NULL;
1903	edge cond_jump = NULL;
1904
1905	fall_thru = NULL;
1906	if (EDGE_COUNT (cur_bb->succs) > `0`)
1907	succ1 = EDGE_SUCC (cur_bb, `0`);
1908	else
1909	succ1 = NULL;
1910
1911	if (EDGE_COUNT (cur_bb->succs) > `1`)
1912	succ2 = EDGE_SUCC (cur_bb, `1`);
1913	else
1914	succ2 = NULL;
1915
1916	/ Find the fall-through edge. /
1917
1918	if (succ1
1919	&& (succ1->flags & EDGE_FALLTHRU))
1920	{
1921	fall_thru = succ1;
1922	cond_jump = succ2;
1923	}
1924	else if (succ2
1925	&& (succ2->flags & EDGE_FALLTHRU))
1926	{
1927	fall_thru = succ2;
1928	cond_jump = succ1;
1929	}
1930	else if (succ2 && EDGE_COUNT (cur_bb->succs) > `2`)
1931	fall_thru = find_fallthru_edge (edges: cur_bb->succs);
1932
1933	if (fall_thru && (fall_thru->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)))
1934	{
1935	/ Check to see if the fall-thru edge is a crossing edge. /
1936
1937	if (fall_thru->flags & EDGE_CROSSING)
1938	{
1939	/ The fall_thru edge crosses; now check the cond jump edge, if*
1940	it exists. /*
1941
1942	bool cond_jump_crosses = true;
1943	int invert_worked = `0`;
1944	rtx_insn *old_jump = BB_END (cur_bb);
1945
1946	/ Find the jump instruction, if there is one. /
1947
1948	if (cond_jump)
1949	{
1950	if (!(cond_jump->flags & EDGE_CROSSING))
1951	cond_jump_crosses = false;
1952
1953	/ We know the fall-thru edge crosses; if the cond*
1954	jump edge does NOT cross, and its destination is the
1955	next block in the bb order, invert the jump
1956	(i.e. fix it so the fall through does not cross and
1957	the cond jump does). /*
1958
1959	if (!cond_jump_crosses)
1960	{
1961	/ Find label in fall_thru block. We've already added*
1962	any missing labels, so there must be one. /*
1963
1964	rtx_code_label *fall_thru_label
1965	= block_label (fall_thru->dest);
1966
1967	if (old_jump && fall_thru_label)
1968	{
1969	rtx_jump_insn *old_jump_insn
1970	= dyn_cast <rtx_jump_insn *> (p: old_jump);
1971	if (old_jump_insn)
1972	invert_worked = invert_jump (old_jump_insn,
1973	fall_thru_label, `0`);
1974	}
1975
1976	if (invert_worked)
1977	{
1978	fall_thru->flags &= ~EDGE_FALLTHRU;
1979	cond_jump->flags \|= EDGE_FALLTHRU;
1980	update_br_prob_note (cur_bb);
1981	std::swap (a&: fall_thru, b&: cond_jump);
1982	cond_jump->flags \|= EDGE_CROSSING;
1983	fall_thru->flags &= ~EDGE_CROSSING;
1984	}
1985	}
1986	}
1987
1988	if (cond_jump_crosses \|\| !invert_worked)
1989	{
1990	/ This is the case where both edges out of the basic*
1991	block are crossing edges. Here we will fix up the
1992	fall through edge. The jump edge will be taken care
1993	of later. The EDGE_CROSSING flag of fall_thru edge
1994	is unset before the call to force_nonfallthru
1995	function because if a new basic-block is created
1996	this edge remains in the current section boundary
1997	while the edge between new_bb and the fall_thru->dest
1998	becomes EDGE_CROSSING. /*
1999
2000	fall_thru->flags &= ~EDGE_CROSSING;
2001	unsigned old_count = EDGE_COUNT (cur_bb->succs);
2002	basic_block new_bb = force_nonfallthru (fall_thru);
2003
2004	if (new_bb)
2005	{
2006	new_bb->aux = cur_bb->aux;
2007	cur_bb->aux = new_bb;
2008
2009	/ This is done by force_nonfallthru_and_redirect. /
2010	gcc_assert (BB_PARTITION (new_bb)
2011	== BB_PARTITION (cur_bb));
2012
2013	edge e = single_succ_edge (bb: new_bb);
2014	e->flags \|= EDGE_CROSSING;
2015	if (EDGE_COUNT (cur_bb->succs) > old_count)
2016	{
2017	/ If asm goto has a crossing fallthrough edge*
2018	and at least one of the labels to the same bb,
2019	force_nonfallthru can result in the fallthrough
2020	edge being redirected and a new edge added for the
2021	label or more labels to e->dest. As we've
2022	temporarily cleared EDGE_CROSSING flag on the
2023	fallthrough edge, we need to restore it again.
2024	See PR108596. /*
2025	rtx_insn *j = BB_END (cur_bb);
2026	gcc_checking_assert (JUMP_P (j)
2027	&& (asm_noperands (PATTERN (j))
2028	> `0`));
2029	edge e2 = find_edge (cur_bb, e->dest);
2030	if (e2)
2031	e2->flags \|= EDGE_CROSSING;
2032	}
2033	}
2034	else
2035	{
2036	/ If a new basic-block was not created; restore*
2037	the EDGE_CROSSING flag. /*
2038	fall_thru->flags \|= EDGE_CROSSING;
2039	}
2040
2041	/ Add barrier after new jump /
2042	emit_barrier_after_bb (bb: new_bb ? new_bb : cur_bb);
2043	}
2044	}
2045	}
2046	}
2047	}
2048
2049	/ This function checks the destination block of a "crossing jump" to*
2050	see if it has any crossing predecessors that begin with a code label
2051	and end with an unconditional jump. If so, it returns that predecessor
2052	block. (This is to avoid creating lots of new basic blocks that all
2053	contain unconditional jumps to the same destination). /*
2054
2055	static basic_block
2056	find_jump_block (basic_block jump_dest)
2057	{
2058	basic_block source_bb = NULL;
2059	edge e;
2060	rtx_insn *insn;
2061	edge_iterator ei;
2062
2063	FOR_EACH_EDGE (e, ei, jump_dest->preds)
2064	if (e->flags & EDGE_CROSSING)
2065	{
2066	basic_block src = e->src;
2067
2068	/ Check each predecessor to see if it has a label, and contains*
2069	only one executable instruction, which is an unconditional jump.
2070	If so, we can use it. /*
2071
2072	if (LABEL_P (BB_HEAD (src)))
2073	for (insn = BB_HEAD (src);
2074	!INSN_P (insn) && insn != NEXT_INSN (BB_END (src));
2075	insn = NEXT_INSN (insn))
2076	{
2077	if (INSN_P (insn)
2078	&& insn == BB_END (src)
2079	&& JUMP_P (insn)
2080	&& !any_condjump_p (insn))
2081	{
2082	source_bb = src;
2083	break;
2084	}
2085	}
2086
2087	if (source_bb)
2088	break;
2089	}
2090
2091	return source_bb;
2092	}
2093
2094	/ Find all BB's with conditional jumps that are crossing edges;*
2095	insert a new bb and make the conditional jump branch to the new
2096	bb instead (make the new bb same color so conditional branch won't
2097	be a 'crossing' edge). Insert an unconditional jump from the
2098	new bb to the original destination of the conditional jump. /*
2099
2100	static void
2101	fix_crossing_conditional_branches (void)
2102	{
2103	basic_block cur_bb;
2104	basic_block new_bb;
2105	basic_block dest;
2106	edge succ1;
2107	edge succ2;
2108	edge crossing_edge;
2109	edge new_edge;
2110	rtx set_src;
2111	rtx old_label = NULL_RTX;
2112	rtx_code_label *new_label;
2113
2114	FOR_EACH_BB_FN (cur_bb, cfun)
2115	{
2116	crossing_edge = NULL;
2117	if (EDGE_COUNT (cur_bb->succs) > `0`)
2118	succ1 = EDGE_SUCC (cur_bb, `0`);
2119	else
2120	succ1 = NULL;
2121
2122	if (EDGE_COUNT (cur_bb->succs) > `1`)
2123	succ2 = EDGE_SUCC (cur_bb, `1`);
2124	else
2125	succ2 = NULL;
2126
2127	/ We already took care of fall-through edges, so only one successor*
2128	can be a crossing edge. /*
2129
2130	if (succ1 && (succ1->flags & EDGE_CROSSING))
2131	crossing_edge = succ1;
2132	else if (succ2 && (succ2->flags & EDGE_CROSSING))
2133	crossing_edge = succ2;
2134
2135	if (crossing_edge)
2136	{
2137	rtx_insn *old_jump = BB_END (cur_bb);
2138
2139	/ Check to make sure the jump instruction is a*
2140	conditional jump. /*
2141
2142	set_src = NULL_RTX;
2143
2144	if (any_condjump_p (old_jump))
2145	{
2146	if (GET_CODE (PATTERN (old_jump)) == SET)
2147	set_src = SET_SRC (PATTERN (old_jump));
2148	else if (GET_CODE (PATTERN (old_jump)) == PARALLEL)
2149	{
2150	set_src = XVECEXP (PATTERN (old_jump), `0`,`0`);
2151	if (GET_CODE (set_src) == SET)
2152	set_src = SET_SRC (set_src);
2153	else
2154	set_src = NULL_RTX;
2155	}
2156	}
2157
2158	if (set_src && (GET_CODE (set_src) == IF_THEN_ELSE))
2159	{
2160	rtx_jump_insn *old_jump_insn =
2161	as_a <rtx_jump_insn *> (p: old_jump);
2162
2163	if (GET_CODE (XEXP (set_src, `1`)) == PC)
2164	old_label = XEXP (set_src, `2`);
2165	else if (GET_CODE (XEXP (set_src, `2`)) == PC)
2166	old_label = XEXP (set_src, `1`);
2167
2168	/ Check to see if new bb for jumping to that dest has*
2169	already been created; if so, use it; if not, create
2170	a new one. /*
2171
2172	new_bb = find_jump_block (jump_dest: crossing_edge->dest);
2173
2174	if (new_bb)
2175	new_label = block_label (new_bb);
2176	else
2177	{
2178	basic_block last_bb;
2179	rtx_code_label *old_jump_target;
2180	rtx_jump_insn *new_jump;
2181
2182	/ Create new basic block to be dest for*
2183	conditional jump. /*
2184
2185	/ Put appropriate instructions in new bb. /
2186
2187	new_label = gen_label_rtx ();
2188	emit_label (new_label);
2189
2190	gcc_assert (GET_CODE (old_label) == LABEL_REF);
2191	old_jump_target = old_jump_insn->jump_target ();
2192	new_jump = as_a <rtx_jump_insn *>
2193	(p: emit_jump_insn (targetm.gen_jump (old_jump_target)));
2194	new_jump->set_jump_target (old_jump_target);
2195
2196	last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
2197	new_bb = create_basic_block (new_label, new_jump, last_bb);
2198	new_bb->aux = last_bb->aux;
2199	last_bb->aux = new_bb;
2200
2201	emit_barrier_after_bb (bb: new_bb);
2202
2203	/ Make sure new bb is in same partition as source*
2204	of conditional branch. /*
2205	BB_COPY_PARTITION (new_bb, cur_bb);
2206	}
2207
2208	/ Make old jump branch to new bb. /
2209
2210	redirect_jump (old_jump_insn, new_label, `0`);
2211
2212	/ Remove crossing_edge as predecessor of 'dest'. /
2213
2214	dest = crossing_edge->dest;
2215
2216	redirect_edge_succ (crossing_edge, new_bb);
2217
2218	/ Make a new edge from new_bb to old dest; new edge*
2219	will be a successor for new_bb and a predecessor
2220	for 'dest'. /*
2221
2222	if (EDGE_COUNT (new_bb->succs) == `0`)
2223	new_edge = make_single_succ_edge (new_bb, dest, `0`);
2224	else
2225	new_edge = EDGE_SUCC (new_bb, `0`);
2226
2227	crossing_edge->flags &= ~EDGE_CROSSING;
2228	new_edge->flags \|= EDGE_CROSSING;
2229	}
2230	}
2231	}
2232	}
2233
2234	/ Find any unconditional branches that cross between hot and cold*
2235	sections. Convert them into indirect jumps instead. /*
2236
2237	static void
2238	fix_crossing_unconditional_branches (void)
2239	{
2240	basic_block cur_bb;
2241	rtx_insn *last_insn;
2242	rtx label;
2243	rtx label_addr;
2244	rtx_insn *indirect_jump_sequence;
2245	rtx_insn *jump_insn = NULL;
2246	rtx new_reg;
2247	rtx_insn *cur_insn;
2248	edge succ;
2249
2250	FOR_EACH_BB_FN (cur_bb, cfun)
2251	{
2252	last_insn = BB_END (cur_bb);
2253
2254	if (EDGE_COUNT (cur_bb->succs) < `1`)
2255	continue;
2256
2257	succ = EDGE_SUCC (cur_bb, `0`);
2258
2259	/ Check to see if bb ends in a crossing (unconditional) jump. At*
2260	this point, no crossing jumps should be conditional. /*
2261
2262	if (JUMP_P (last_insn)
2263	&& (succ->flags & EDGE_CROSSING))
2264	{
2265	gcc_assert (!any_condjump_p (last_insn));
2266
2267	/ Make sure the jump is not already an indirect or table jump. /
2268
2269	if (!computed_jump_p (last_insn)
2270	&& !tablejump_p (last_insn, NULL, NULL)
2271	&& asm_noperands (PATTERN (insn: last_insn)) < `0`)
2272	{
2273	/ We have found a "crossing" unconditional branch. Now*
2274	we must convert it to an indirect jump. First create
2275	reference of label, as target for jump. /*
2276
2277	label = JUMP_LABEL (last_insn);
2278	label_addr = gen_rtx_LABEL_REF (Pmode, label);
2279	LABEL_NUSES (label) += `1`;
2280
2281	/ Get a register to use for the indirect jump. /
2282
2283	new_reg = gen_reg_rtx (Pmode);
2284
2285	/ Generate indirect the jump sequence. /
2286
2287	start_sequence ();
2288	emit_move_insn (new_reg, label_addr);
2289	emit_indirect_jump (new_reg);
2290	indirect_jump_sequence = get_insns ();
2291	end_sequence ();
2292
2293	/ Make sure every instruction in the new jump sequence has*
2294	its basic block set to be cur_bb. /*
2295
2296	for (cur_insn = indirect_jump_sequence; cur_insn;
2297	cur_insn = NEXT_INSN (insn: cur_insn))
2298	{
2299	if (!BARRIER_P (cur_insn))
2300	BLOCK_FOR_INSN (insn: cur_insn) = cur_bb;
2301	if (JUMP_P (cur_insn))
2302	jump_insn = cur_insn;
2303	}
2304
2305	/ Insert the new (indirect) jump sequence immediately before*
2306	the unconditional jump, then delete the unconditional jump. /*
2307
2308	emit_insn_before (indirect_jump_sequence, last_insn);
2309	delete_insn (last_insn);
2310
2311	JUMP_LABEL (jump_insn) = label;
2312	LABEL_NUSES (label)++;
2313
2314	/ Make BB_END for cur_bb be the jump instruction (NOT the*
2315	barrier instruction at the end of the sequence...). /*
2316
2317	BB_END (cur_bb) = jump_insn;
2318	}
2319	}
2320	}
2321	}
2322
2323	/ Update CROSSING_JUMP_P flags on all jump insns. /
2324
2325	static void
2326	update_crossing_jump_flags (void)
2327	{
2328	basic_block bb;
2329	edge e;
2330	edge_iterator ei;
2331
2332	FOR_EACH_BB_FN (bb, cfun)
2333	FOR_EACH_EDGE (e, ei, bb->succs)
2334	if (e->flags & EDGE_CROSSING)
2335	{
2336	if (JUMP_P (BB_END (bb)))
2337	CROSSING_JUMP_P (BB_END (bb)) = `1`;
2338	break;
2339	}
2340	}
2341
2342	/ Reorder basic blocks using the software trace cache (STC) algorithm. /
2343
2344	static void
2345	reorder_basic_blocks_software_trace_cache (void)
2346	{
2347	if (dump_file)
2348	fprintf (stream: dump_file, format: "\nReordering with the STC algorithm.\n\n");
2349
2350	int n_traces;
2351	int i;
2352	struct trace *traces;
2353
2354	/ We are estimating the length of uncond jump insn only once since the code*
2355	for getting the insn length always returns the minimal length now. /*
2356	if (uncond_jump_length == `0`)
2357	uncond_jump_length = get_uncond_jump_length ();
2358
2359	/ We need to know some information for each basic block. /
2360	array_size = GET_ARRAY_SIZE (last_basic_block_for_fn (cfun));
2361	bbd = XNEWVEC (bbro_basic_block_data, array_size);
2362	for (i = `0`; i < array_size; i++)
2363	{
2364	bbd[i].start_of_trace = -`1`;
2365	bbd[i].end_of_trace = -`1`;
2366	bbd[i].in_trace = -`1`;
2367	bbd[i].visited = `0`;
2368	bbd[i].priority = -`1`;
2369	bbd[i].heap = NULL;
2370	bbd[i].node = NULL;
2371	}
2372
2373	traces = XNEWVEC (struct trace, n_basic_blocks_for_fn (cfun));
2374	n_traces = `0`;
2375	find_traces (n_traces: &n_traces, traces);
2376	connect_traces (n_traces, traces);
2377	FREE (traces);
2378	FREE (bbd);
2379	}
2380
2381	/ Order edges by execution frequency, higher first. /
2382
2383	static int
2384	edge_order (const void ve1, const* void *ve2)
2385	{
2386	edge e1 = (const* edge *) ve1;
2387	edge e2 = (const* edge *) ve2;
2388	profile_count c1 = e1->count ();
2389	profile_count c2 = e2->count ();
2390	/ Since profile_count::operator< does not establish a strict weak order*
2391	in presence of uninitialized counts, use 'max': this makes them appear
2392	as if having execution frequency less than any initialized count. /*
2393	profile_count m = c1.max (other: c2);
2394	return (m == c2) - (m == c1);
2395	}
2396
2397	/ Reorder basic blocks using the "simple" algorithm. This tries to*
2398	maximize the dynamic number of branches that are fallthrough, without
2399	copying instructions. The algorithm is greedy, looking at the most
2400	frequently executed branch first. /*
2401
2402	static void
2403	reorder_basic_blocks_simple (void)
2404	{
2405	if (dump_file)
2406	fprintf (stream: dump_file, format: "\nReordering with the \"simple\" algorithm.\n\n");
2407
2408	edge edges = new* edge[`2` * n_basic_blocks_for_fn (cfun)];
2409
2410	/ First, collect all edges that can be optimized by reordering blocks:*
2411	simple jumps and conditional jumps, as well as the function entry edge. /*
2412
2413	int n = `0`;
2414	edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), `0`);
2415
2416	basic_block bb;
2417	FOR_EACH_BB_FN (bb, cfun)
2418	{
2419	rtx_insn *end = BB_END (bb);
2420
2421	if (computed_jump_p (end) \|\| tablejump_p (end, NULL, NULL))
2422	continue;
2423
2424	/ We cannot optimize asm goto. /
2425	if (JUMP_P (end) && extract_asm_operands (end))
2426	continue;
2427
2428	if (single_succ_p (bb))
2429	edges[n++] = EDGE_SUCC (bb, `0`);
2430	else if (any_condjump_p (end))
2431	{
2432	edge e0 = EDGE_SUCC (bb, `0`);
2433	edge e1 = EDGE_SUCC (bb, `1`);
2434	/ When optimizing for size it is best to keep the original*
2435	fallthrough edges. /*
2436	if (e1->flags & EDGE_FALLTHRU)
2437	std::swap (a&: e0, b&: e1);
2438	edges[n++] = e0;
2439	edges[n++] = e1;
2440	}
2441	}
2442
2443	/ Sort the edges, the most desirable first. When optimizing for size*
2444	all edges are equally desirable. /*
2445
2446	if (optimize_function_for_speed_p (cfun))
2447	gcc_stablesort (edges, n, sizeof *edges, edge_order);
2448
2449	/ Now decide which of those edges to make fallthrough edges. We set*
2450	BB_VISITED if a block already has a fallthrough successor assigned
2451	to it. We make ->AUX of an endpoint point to the opposite endpoint
2452	of a sequence of blocks that fall through, and ->AUX will be NULL
2453	for a block that is in such a sequence but not an endpoint anymore.
2454
2455	To start with, everything points to itself, nothing is assigned yet. /*
2456
2457	FOR_ALL_BB_FN (bb, cfun)
2458	{
2459	bb->aux = bb;
2460	bb->flags &= ~BB_VISITED;
2461	}
2462
2463	EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = `0`;
2464
2465	/ Now for all edges, the most desirable first, see if that edge can*
2466	connect two sequences. If it can, update AUX and BB_VISITED; if it
2467	cannot, zero out the edge in the table. /*
2468
2469	for (int j = `0`; j < n; j++)
2470	{
2471	edge e = edges[j];
2472
2473	basic_block tail_a = e->src;
2474	basic_block head_b = e->dest;
2475	basic_block head_a = (basic_block) tail_a->aux;
2476	basic_block tail_b = (basic_block) head_b->aux;
2477
2478	/ An edge cannot connect two sequences if:*
2479	- it crosses partitions;
2480	- its src is not a current endpoint;
2481	- its dest is not a current endpoint;
2482	- or, it would create a loop. /*
2483
2484	if (e->flags & EDGE_CROSSING
2485	\|\| tail_a->flags & BB_VISITED
2486	\|\| !tail_b
2487	\|\| (!(head_b->flags & BB_VISITED) && head_b != tail_b)
2488	\|\| tail_a == tail_b)
2489	{
2490	edges[j] = `0`;
2491	continue;
2492	}
2493
2494	tail_a->aux = `0`;
2495	head_b->aux = `0`;
2496	head_a->aux = tail_b;
2497	tail_b->aux = head_a;
2498	tail_a->flags \|= BB_VISITED;
2499	}
2500
2501	/ Put the pieces together, in the same order that the start blocks of*
2502	the sequences already had. The hot/cold partitioning gives a little
2503	complication: as a first pass only do this for blocks in the same
2504	partition as the start block, and (if there is anything left to do)
2505	in a second pass handle the other partition. /*
2506
2507	basic_block last_tail = (basic_block) ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
2508
2509	int current_partition
2510	= BB_PARTITION (last_tail == ENTRY_BLOCK_PTR_FOR_FN (cfun)
2511	? EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), `0`)->dest
2512	: last_tail);
2513	bool need_another_pass = true;
2514
2515	for (int pass = `0`; pass < `2` && need_another_pass; pass++)
2516	{
2517	need_another_pass = false;
2518
2519	FOR_EACH_BB_FN (bb, cfun)
2520	if ((bb->flags & BB_VISITED && bb->aux) \|\| bb->aux == bb)
2521	{
2522	if (BB_PARTITION (bb) != current_partition)
2523	{
2524	need_another_pass = true;
2525	continue;
2526	}
2527
2528	last_tail->aux = bb;
2529	last_tail = (basic_block) bb->aux;
2530	}
2531
2532	current_partition ^= BB_HOT_PARTITION \| BB_COLD_PARTITION;
2533	}
2534
2535	last_tail->aux = `0`;
2536
2537	/ Finally, link all the chosen fallthrough edges. /
2538
2539	for (int j = `0`; j < n; j++)
2540	if (edges[j])
2541	edges[j]->src->aux = edges[j]->dest;
2542
2543	delete[] edges;
2544
2545	/ If the entry edge no longer falls through we have to make a new*
2546	block so it can do so again. /*
2547
2548	edge e = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), `0`);
2549	if (e->dest != ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux)
2550	{
2551	force_nonfallthru (e);
2552	e->src->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux;
2553	}
2554	}
2555
2556	/ Reorder basic blocks. The main entry point to this file. /
2557
2558	static void
2559	reorder_basic_blocks (void)
2560	{
2561	gcc_assert (current_ir_type () == IR_RTL_CFGLAYOUT);
2562
2563	if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + `1`)
2564	return;
2565
2566	set_edge_can_fallthru_flag ();
2567	mark_dfs_back_edges ();
2568
2569	switch (flag_reorder_blocks_algorithm)
2570	{
2571	case REORDER_BLOCKS_ALGORITHM_SIMPLE:
2572	reorder_basic_blocks_simple ();
2573	break;
2574
2575	case REORDER_BLOCKS_ALGORITHM_STC:
2576	reorder_basic_blocks_software_trace_cache ();
2577	break;
2578
2579	default:
2580	gcc_unreachable ();
2581	}
2582
2583	relink_block_chain (/stay_in_cfglayout_mode=/true);
2584
2585	if (dump_file)
2586	{
2587	if (dump_flags & TDF_DETAILS)
2588	dump_reg_info (dump_file);
2589	dump_flow_info (dump_file, dump_flags);
2590	}
2591
2592	/ Signal that rtl_verify_flow_info_1 can now verify that there*
2593	is at most one switch between hot/cold sections. /*
2594	crtl->bb_reorder_complete = true;
2595	}
2596
2597	/ Determine which partition the first basic block in the function*
2598	belongs to, then find the first basic block in the current function
2599	that belongs to a different section, and insert a
2600	NOTE_INSN_SWITCH_TEXT_SECTIONS note immediately before it in the
2601	instruction stream. When writing out the assembly code,
2602	encountering this note will make the compiler switch between the
2603	hot and cold text sections. /*
2604
2605	void
2606	insert_section_boundary_note (void)
2607	{
2608	basic_block bb;
2609	bool switched_sections = false;
2610	int current_partition = `0`;
2611
2612	if (!crtl->has_bb_partition)
2613	return;
2614
2615	FOR_EACH_BB_FN (bb, cfun)
2616	{
2617	if (!current_partition)
2618	current_partition = BB_PARTITION (bb);
2619	if (BB_PARTITION (bb) != current_partition)
2620	{
2621	gcc_assert (!switched_sections);
2622	switched_sections = true;
2623	emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS, BB_HEAD (bb));
2624	current_partition = BB_PARTITION (bb);
2625	}
2626	}
2627
2628	/ Make sure crtl->has_bb_partition matches reality even if bbpart finds*
2629	some hot and some cold basic blocks, but later one of those kinds is
2630	optimized away. /*
2631	crtl->has_bb_partition = switched_sections;
2632	}
2633
2634	namespace {
2635
2636	const pass_data pass_data_reorder_blocks =
2637	{
2638	.type: RTL_PASS, / type /
2639	.name: "bbro", / name /
2640	.optinfo_flags: OPTGROUP_NONE, / optinfo_flags /
2641	.tv_id: TV_REORDER_BLOCKS, / tv_id /
2642	.properties_required: `0`, / properties_required /
2643	.properties_provided: `0`, / properties_provided /
2644	.properties_destroyed: `0`, / properties_destroyed /
2645	.todo_flags_start: `0`, / todo_flags_start /
2646	.todo_flags_finish: `0`, / todo_flags_finish /
2647	};
2648
2649	class pass_reorder_blocks : public rtl_opt_pass
2650	{
2651	public:
2652	pass_reorder_blocks (gcc::context *ctxt)
2653	: rtl_opt_pass (pass_data_reorder_blocks, ctxt)
2654	{}
2655
2656	/ opt_pass methods: /
2657	bool gate (function *) final override
2658	{
2659	if (targetm.cannot_modify_jumps_p ())
2660	return false;
2661	return (optimize > `0`
2662	&& (flag_reorder_blocks \|\| flag_reorder_blocks_and_partition));
2663	}
2664
2665	unsigned int execute (function *) final override;
2666
2667	}; // class pass_reorder_blocks
2668
2669	unsigned int
2670	pass_reorder_blocks::execute (function *fun)
2671	{
2672	basic_block bb;
2673
2674	/ Last attempt to optimize CFG, as scheduling, peepholing and insn*
2675	splitting possibly introduced more crossjumping opportunities. /*
2676	cfg_layout_initialize (CLEANUP_EXPENSIVE);
2677
2678	reorder_basic_blocks ();
2679	cleanup_cfg (CLEANUP_EXPENSIVE \| CLEANUP_NO_PARTITIONING);
2680
2681	FOR_EACH_BB_FN (bb, fun)
2682	if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun))
2683	bb->aux = bb->next_bb;
2684	cfg_layout_finalize ();
2685
2686	FOR_EACH_BB_FN (bb, fun)
2687	df_recompute_luids (bb);
2688	return `0`;
2689	}
2690
2691	} // anon namespace
2692
2693	rtl_opt_pass *
2694	make_pass_reorder_blocks (gcc::context *ctxt)
2695	{
2696	return new pass_reorder_blocks (ctxt);
2697	}
2698
2699	/ Duplicate a block (that we already know ends in a computed jump) into its*
2700	predecessors, where possible. Return whether anything is changed. /*
2701	static bool
2702	maybe_duplicate_computed_goto (basic_block bb, int max_size)
2703	{
2704	/ Make sure that the block is small enough. /
2705	rtx_insn *insn;
2706	FOR_BB_INSNS (bb, insn)
2707	if (INSN_P (insn))
2708	{
2709	max_size -= get_attr_min_length (insn);
2710	if (max_size < `0`)
2711	return false;
2712	}
2713
2714	bool changed = false;
2715	edge e;
2716	edge_iterator ei;
2717	for (ei = ei_start (bb->preds); (e = ei_safe_edge (i: ei)); )
2718	{
2719	basic_block pred = e->src;
2720
2721	/ Do not duplicate BB into PRED if we cannot merge a copy of BB*
2722	with PRED. /*
2723	if (!single_succ_p (bb: pred)
2724	\|\| e->flags & EDGE_COMPLEX
2725	\|\| pred->index < NUM_FIXED_BLOCKS
2726	\|\| (JUMP_P (BB_END (pred)) && !simplejump_p (BB_END (pred)))
2727	\|\| (JUMP_P (BB_END (pred)) && CROSSING_JUMP_P (BB_END (pred))))
2728	{
2729	ei_next (i: &ei);
2730	continue;
2731	}
2732
2733	if (dump_file)
2734	fprintf (stream: dump_file, format: "Duplicating computed goto bb %d into bb %d\n",
2735	bb->index, e->src->index);
2736
2737	/ Remember if PRED can be duplicated; if so, the copy of BB merged*
2738	with PRED can be duplicated as well. /*
2739	bool can_dup_more = can_duplicate_block_p (pred);
2740
2741	/ Make a copy of BB, merge it into PRED. /
2742	basic_block copy = duplicate_block (bb, e, NULL);
2743	emit_barrier_after_bb (bb: copy);
2744	reorder_insns_nobb (BB_HEAD (copy), BB_END (copy), BB_END (pred));
2745	merge_blocks (pred, copy);
2746
2747	changed = true;
2748
2749	/ Try to merge the resulting merged PRED into further predecessors. /
2750	if (can_dup_more)
2751	maybe_duplicate_computed_goto (bb: pred, max_size);
2752	}
2753
2754	return changed;
2755	}
2756
2757	/ Duplicate the blocks containing computed gotos. This basically unfactors*
2758	computed gotos that were factored early on in the compilation process to
2759	speed up edge based data flow. We used to not unfactor them again, which
2760	can seriously pessimize code with many computed jumps in the source code,
2761	such as interpreters. See e.g. PR15242. /*
2762	static void
2763	duplicate_computed_gotos (function *fun)
2764	{
2765	/ We are estimating the length of uncond jump insn only once*
2766	since the code for getting the insn length always returns
2767	the minimal length now. /*
2768	if (uncond_jump_length == `0`)
2769	uncond_jump_length = get_uncond_jump_length ();
2770
2771	/ Never copy a block larger than this. /
2772	int max_size
2773	= uncond_jump_length * param_max_goto_duplication_insns;
2774
2775	bool changed = false;
2776
2777	/ Try to duplicate all blocks that end in a computed jump and that*
2778	can be duplicated at all. /*
2779	basic_block bb;
2780	FOR_EACH_BB_FN (bb, fun)
2781	if (computed_jump_p (BB_END (bb)) && can_duplicate_block_p (bb))
2782	changed \|= maybe_duplicate_computed_goto (bb, max_size);
2783
2784	/ Some blocks may have become unreachable. /
2785	if (changed)
2786	cleanup_cfg (`0`);
2787
2788	/ Duplicating blocks will redirect edges and may cause hot blocks*
2789	previously reached by both hot and cold blocks to become dominated
2790	only by cold blocks. /*
2791	if (changed)
2792	fixup_partitions ();
2793	}
2794
2795	namespace {
2796
2797	const pass_data pass_data_duplicate_computed_gotos =
2798	{
2799	.type: RTL_PASS, / type /
2800	.name: "compgotos", / name /
2801	.optinfo_flags: OPTGROUP_NONE, / optinfo_flags /
2802	.tv_id: TV_REORDER_BLOCKS, / tv_id /
2803	.properties_required: `0`, / properties_required /
2804	.properties_provided: `0`, / properties_provided /
2805	.properties_destroyed: `0`, / properties_destroyed /
2806	.todo_flags_start: `0`, / todo_flags_start /
2807	.todo_flags_finish: `0`, / todo_flags_finish /
2808	};
2809
2810	class pass_duplicate_computed_gotos : public rtl_opt_pass
2811	{
2812	public:
2813	pass_duplicate_computed_gotos (gcc::context *ctxt)
2814	: rtl_opt_pass (pass_data_duplicate_computed_gotos, ctxt)
2815	{}
2816
2817	/ opt_pass methods: /
2818	bool gate (function *) final override;
2819	unsigned int execute (function *) final override;
2820
2821	}; // class pass_duplicate_computed_gotos
2822
2823	bool
2824	pass_duplicate_computed_gotos::gate (function *fun)
2825	{
2826	if (targetm.cannot_modify_jumps_p ())
2827	return false;
2828	return (optimize > `0`
2829	&& flag_expensive_optimizations
2830	&& ! optimize_function_for_size_p (fun));
2831	}
2832
2833	unsigned int
2834	pass_duplicate_computed_gotos::execute (function *fun)
2835	{
2836	duplicate_computed_gotos (fun);
2837
2838	return `0`;
2839	}
2840
2841	} // anon namespace
2842
2843	rtl_opt_pass *
2844	make_pass_duplicate_computed_gotos (gcc::context *ctxt)
2845	{
2846	return new pass_duplicate_computed_gotos (ctxt);
2847	}
2848
2849	/ This function is the main 'entrance' for the optimization that*
2850	partitions hot and cold basic blocks into separate sections of the
2851	.o file (to improve performance and cache locality). Ideally it
2852	would be called after all optimizations that rearrange the CFG have
2853	been called. However part of this optimization may introduce new
2854	register usage, so it must be called before register allocation has
2855	occurred. This means that this optimization is actually called
2856	well before the optimization that reorders basic blocks (see
2857	function above).
2858
2859	This optimization checks the feedback information to determine
2860	which basic blocks are hot/cold, updates flags on the basic blocks
2861	to indicate which section they belong in. This information is
2862	later used for writing out sections in the .o file. Because hot
2863	and cold sections can be arbitrarily large (within the bounds of
2864	memory), far beyond the size of a single function, it is necessary
2865	to fix up all edges that cross section boundaries, to make sure the
2866	instructions used can actually span the required distance. The
2867	fixes are described below.
2868
2869	Fall-through edges must be changed into jumps; it is not safe or
2870	legal to fall through across a section boundary. Whenever a
2871	fall-through edge crossing a section boundary is encountered, a new
2872	basic block is inserted (in the same section as the fall-through
2873	source), and the fall through edge is redirected to the new basic
2874	block. The new basic block contains an unconditional jump to the
2875	original fall-through target. (If the unconditional jump is
2876	insufficient to cross section boundaries, that is dealt with a
2877	little later, see below).
2878
2879	In order to deal with architectures that have short conditional
2880	branches (which cannot span all of memory) we take any conditional
2881	jump that attempts to cross a section boundary and add a level of
2882	indirection: it becomes a conditional jump to a new basic block, in
2883	the same section. The new basic block contains an unconditional
2884	jump to the original target, in the other section.
2885
2886	For those architectures whose unconditional branch is also
2887	incapable of reaching all of memory, those unconditional jumps are
2888	converted into indirect jumps, through a register.
2889
2890	IMPORTANT NOTE: This optimization causes some messy interactions
2891	with the cfg cleanup optimizations; those optimizations want to
2892	merge blocks wherever possible, and to collapse indirect jump
2893	sequences (change "A jumps to B jumps to C" directly into "A jumps
2894	to C"). Those optimizations can undo the jump fixes that
2895	partitioning is required to make (see above), in order to ensure
2896	that jumps attempting to cross section boundaries are really able
2897	to cover whatever distance the jump requires (on many architectures
2898	conditional or unconditional jumps are not able to reach all of
2899	memory). Therefore tests have to be inserted into each such
2900	optimization to make sure that it does not undo stuff necessary to
2901	cross partition boundaries. This would be much less of a problem
2902	if we could perform this optimization later in the compilation, but
2903	unfortunately the fact that we may need to create indirect jumps
2904	(through registers) requires that this optimization be performed
2905	before register allocation.
2906
2907	Hot and cold basic blocks are partitioned and put in separate
2908	sections of the .o file, to reduce paging and improve cache
2909	performance (hopefully). This can result in bits of code from the
2910	same function being widely separated in the .o file. However this
2911	is not obvious to the current bb structure. Therefore we must take
2912	care to ensure that: 1). There are no fall_thru edges that cross
2913	between sections; 2). For those architectures which have "short"
2914	conditional branches, all conditional branches that attempt to
2915	cross between sections are converted to unconditional branches;
2916	and, 3). For those architectures which have "short" unconditional
2917	branches, all unconditional branches that attempt to cross between
2918	sections are converted to indirect jumps.
2919
2920	The code for fixing up fall_thru edges that cross between hot and
2921	cold basic blocks does so by creating new basic blocks containing
2922	unconditional branches to the appropriate label in the "other"
2923	section. The new basic block is then put in the same (hot or cold)
2924	section as the original conditional branch, and the fall_thru edge
2925	is modified to fall into the new basic block instead. By adding
2926	this level of indirection we end up with only unconditional branches
2927	crossing between hot and cold sections.
2928
2929	Conditional branches are dealt with by adding a level of indirection.
2930	A new basic block is added in the same (hot/cold) section as the
2931	conditional branch, and the conditional branch is retargeted to the
2932	new basic block. The new basic block contains an unconditional branch
2933	to the original target of the conditional branch (in the other section).
2934
2935	Unconditional branches are dealt with by converting them into
2936	indirect jumps. /*
2937
2938	namespace {
2939
2940	const pass_data pass_data_partition_blocks =
2941	{
2942	.type: RTL_PASS, / type /
2943	.name: "bbpart", / name /
2944	.optinfo_flags: OPTGROUP_NONE, / optinfo_flags /
2945	.tv_id: TV_REORDER_BLOCKS, / tv_id /
2946	PROP_cfglayout, / properties_required /
2947	.properties_provided: `0`, / properties_provided /
2948	.properties_destroyed: `0`, / properties_destroyed /
2949	.todo_flags_start: `0`, / todo_flags_start /
2950	.todo_flags_finish: `0`, / todo_flags_finish /
2951	};
2952
2953	class pass_partition_blocks : public rtl_opt_pass
2954	{
2955	public:
2956	pass_partition_blocks (gcc::context *ctxt)
2957	: rtl_opt_pass (pass_data_partition_blocks, ctxt)
2958	{}
2959
2960	/ opt_pass methods: /
2961	bool gate (function *) final override;
2962	unsigned int execute (function *) final override;
2963
2964	}; // class pass_partition_blocks
2965
2966	bool
2967	pass_partition_blocks::gate (function *fun)
2968	{
2969	/ The optimization to partition hot/cold basic blocks into separate*
2970	sections of the .o file does not work well with linkonce or with
2971	user defined section attributes or with naked attribute. Don't call
2972	it if either case arises. /*
2973	return (flag_reorder_blocks_and_partition
2974	&& optimize
2975	/ See pass_reorder_blocks::gate. We should not partition if*
2976	we are going to omit the reordering. /*
2977	&& optimize_function_for_speed_p (fun)
2978	&& !DECL_COMDAT_GROUP (current_function_decl)
2979	&& !lookup_attribute (attr_name: "section", DECL_ATTRIBUTES (fun->decl))
2980	&& !lookup_attribute (attr_name: "naked", DECL_ATTRIBUTES (fun->decl))
2981	/ Workaround a bug in GDB where read_partial_die doesn't cope*
2982	with DIEs with DW_AT_ranges, see PR81115. /*
2983	&& !(in_lto_p && MAIN_NAME_P (DECL_NAME (fun->decl))));
2984	}
2985
2986	unsigned
2987	pass_partition_blocks::execute (function *fun)
2988	{
2989	vec<edge> crossing_edges;
2990
2991	if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + `1`)
2992	return `0`;
2993
2994	df_set_flags (DF_DEFER_INSN_RESCAN);
2995
2996	crossing_edges = find_rarely_executed_basic_blocks_and_crossing_edges ();
2997	if (!crossing_edges.exists ())
2998	/ Make sure to process deferred rescans and clear changeable df flags. /
2999	return TODO_df_finish;
3000
3001	crtl->has_bb_partition = true;
3002
3003	/ Make sure the source of any crossing edge ends in a jump and the*
3004	destination of any crossing edge has a label. /*
3005	add_labels_and_missing_jumps (crossing_edges);
3006
3007	/ Convert all crossing fall_thru edges to non-crossing fall*
3008	thrus to unconditional jumps (that jump to the original fall
3009	through dest). /*
3010	fix_up_fall_thru_edges ();
3011
3012	/ If the architecture does not have conditional branches that can*
3013	span all of memory, convert crossing conditional branches into
3014	crossing unconditional branches. /*
3015	if (!HAS_LONG_COND_BRANCH)
3016	fix_crossing_conditional_branches ();
3017
3018	/ If the architecture does not have unconditional branches that*
3019	can span all of memory, convert crossing unconditional branches
3020	into indirect jumps. Since adding an indirect jump also adds
3021	a new register usage, update the register usage information as
3022	well. /*
3023	if (!HAS_LONG_UNCOND_BRANCH)
3024	fix_crossing_unconditional_branches ();
3025
3026	update_crossing_jump_flags ();
3027
3028	/ Clear bb->aux fields that the above routines were using. /
3029	clear_aux_for_blocks ();
3030
3031	crossing_edges.release ();
3032
3033	/ ??? FIXME: DF generates the bb info for a block immediately.*
3034	And by immediately, I mean during* creation of the block.*
3035
3036	#0 df_bb_refs_collect
3037	#1 in df_bb_refs_record
3038	#2 in create_basic_block_structure
3039
3040	Which means that the bb_has_eh_pred test in df_bb_refs_collect
3041	will always* fail, because no edges can have been added to the*
3042	block yet. Which of course means we don't add the right
3043	artificial refs, which means we fail df_verify (much) later.
3044
3045	Cleanest solution would seem to make DF_DEFER_INSN_RESCAN imply
3046	that we also shouldn't grab data from the new blocks those new
3047	insns are in either. In this way one can create the block, link
3048	it up properly, and have everything Just Work later, when deferred
3049	insns are processed.
3050
3051	In the meantime, we have no other option but to throw away all
3052	of the DF data and recompute it all. /*
3053	if (fun->eh->lp_array)
3054	{
3055	df_finish_pass (true);
3056	df_scan_alloc (NULL);
3057	df_scan_blocks ();
3058	/ Not all post-landing pads use all of the EH_RETURN_DATA_REGNO*
3059	data. We blindly generated all of them when creating the new
3060	landing pad. Delete those assignments we don't use. /*
3061	df_set_flags (DF_LR_RUN_DCE);
3062	df_analyze ();
3063	}
3064
3065	/ Make sure to process deferred rescans and clear changeable df flags. /
3066	return TODO_df_finish;
3067	}
3068
3069	} // anon namespace
3070
3071	rtl_opt_pass *
3072	make_pass_partition_blocks (gcc::context *ctxt)
3073	{
3074	return new pass_partition_blocks (ctxt);
3075	}
3076

source code of gcc/bb-reorder.cc