These flags control binary code generation, so the correct use of these flags can dramatically improve runtime performance. What exactly do these flags do? Do they have the same meaning when compiling for Arm as when compiling for x86? Do they mean the same thing to all compilers? How should you use them to get the best performance for your application? For those compilers, the -march flag specifies the target architecture. The -mtune flag specifies the target microarchitecture.
The -mtune flag does not enable the compiler to use the special hardware features of the target. It only advises the compiler to perform architecture-independent optimizations like instruction reordering. This is a crucial difference between Arm and x86!
Figure 1: Architecture vs. Microarchitecture in the Arm Ecosystem. If you plot some Arm architecture specifications e. The graph axes are somewhat conflated since architectures and microarchitectures are closely linked, so the blue horizontal lines show the baseline architecture for each microarchitecture on the vertical axis.
For now, just focus on the idea that each target has an architecture and a microarchitecture. You may notice that many of the targets in a.
However, if it ever did exist then we know for certain that this example a. The optimization space only shows the targets for which the compiler may have performed optimizations. This flag advises the compiler to optimize for a target microarchitecture, but only for a generic instruction set.
On Arm, if you want to optimize for both a particular architecture and microarchitecture then you use the -mcpu flag. The -mcpu flag accepts the same parameter values as the -mtune flag. In this case, the binary could execute on anything implementing the v8. What happens when -march, -mtune, and -mcpu are used in combination? On Arm, the -march and -mtune flags override any value passed to -mcpu. Fortunately, the GNU compiler will issue a warning in this case.
Another difference between Arm and x86 is that the -march and -mtune flags are entirely orthogonal on Arm. Mix and match freely! The resulting binary will execute on architecture X and all supersets of architecture X, but will be optimized for microarchitecture Y. The binary would have execution and optimization spaces as shown in Figure 5. So why have the -mcpu flag at all if -mcpu is just an alias for -mtune on x86, and -march and -mtune are orthogonal on Arm?
Why not just combine -march and -mtune as needed on Arm, or follow the x86 convention and let -march imply -mtune? In reality, CPU architects frequently add extensions from multiple Arm architectures to the baseline, both above and below the baseline architecture version.
The Arm Neoverse N1 is a perfect example of how targets typically have a complete implementation of one architecture but support features from other architectures as well. On the ThunderX2, any instruction from the v8. When you specify -march, you are confining the compiler to only the baseline architecture, so the compiler is unable to take advantage of any architecture extensions beyond the baseline.
In order to take advantage of all the features of a particular target, you should use the -mcpu flag to simultaneously specify the architecture with all its extensions, and the microarchitecture.
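One way to see whether a given flag combination actually exposed an extension to the compiler is to check a predefined feature macro. The following is a minimal sketch, assuming a GCC or Clang toolchain that follows the Arm C Language Extensions (ACLE), where __ARM_FEATURE_ATOMICS is defined when the compiler has been told that the target implements the Large System Extensions (LSE) atomics. The function names are hypothetical and exist only for illustration.

```c
#include <stdatomic.h>

/* Returns 1 if the compiler was told at build time that LSE atomics are
 * available (for example through an -mcpu value naming a core that has them),
 * and 0 otherwise. On non-Arm targets the macro is simply absent. */
int lse_enabled_at_compile_time(void) {
#if defined(__ARM_FEATURE_ATOMICS)
    return 1;
#else
    return 0;
#endif
}

/* Increments a counter atomically `iterations` times and returns the final
 * value. How the increment is implemented in machine code depends on which
 * instructions the flags allow the compiler to assume. */
long count_atomically(long iterations) {
    atomic_long counter;
    atomic_init(&counter, 0);
    for (long i = 0; i < iterations; ++i) {
        atomic_fetch_add(&counter, 1);
    }
    return atomic_load(&counter);
}
```

Building this file once with only a baseline -march value and once with an -mcpu value for an LSE-capable core, then comparing what lse_enabled_at_compile_time() returns, is one way to check whether the extension was made available to the compiler. The result of count_atomically() is the same either way; only the generated instructions differ.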
The code is shown in Figure 6. Arm v8. Real world application speed-ups of 10x and even x have been reported when using LSE, so if our target supports LSE then we would very much like to use LSE instructions.

-mabi=name: Generate code for the specified data model. The default depends on the specific target configuration.
-mbig-endian: Generate big-endian code.
-mgeneral-regs-only: Generate code which uses only the general-purpose registers. This will prevent the compiler from using floating-point and Advanced SIMD registers but will not impose any restrictions on the assembler.
-mlittle-endian: Generate little-endian code.
-mcmodel=tiny: Generate code for the tiny code model. The program and its statically defined symbols must be within 1MB of each other. Programs can be statically or dynamically linked.
-mcmodel=small: Generate code for the small code model. The program and its statically defined symbols must be within 4GB of each other. This is the default code model.
-mcmodel=large: Generate code for the large code model. This makes no assumptions about addresses and sizes of sections. Programs can be statically linked only.
-mstrict-align, -mno-strict-align: Avoid or allow generating memory accesses that may not be aligned on a natural object boundary as described in the architecture specification.
-mstack-protector-guard=guard: Generate stack protection code using canary at guard. There is no default register or offset as this is entirely for use within the Linux kernel.
-mtls-dialect=desc: This is the default.
-mtls-size=size: Specify bit size of immediate TLS offsets. Valid values are 12, 24, 32, This option requires binutils 2.
-mfix-cortex-a53-835769: The workaround involves inserting a NOP instruction between memory instructions and 64-bit integer multiply-accumulate instructions.
-mfix-cortex-a53-843419: This erratum workaround is made at link time and this will only pass the corresponding flag to the linker.
-mlow-precision-recip-sqrt, -mno-low-precision-recip-sqrt: Enable or disable the reciprocal square root approximation.

In general, we've tried to match existing conventions for these arguments, but like pretty much everything else there are enough quirks to warrant a blog post. This allows us to expose the same command-line interface from both the GNU tools (GCC and binutils) and the LLVM tools, and avoids the need for users to directly pass flags to the assembler or linker via the compiler's -Wa and -Wl arguments.
To ensure that the RISC-V compiler command-line interface is easy to extend in the future, we decided on a scheme where users describe the RISC-V target they are trying to compile for using three arguments: -march, -mabi, and -mtune. The -march argument determines the set of implementations that a program will run on: any RISC-V compliant system that subsumes the -march value used to compile a program should be able to run that program. To get a bit more specific: Version 2. In addition to these base ISAs, a handful of extensions have been specified.
The extensions that have been specified and are supported by the toolchain are:. On RISC-V systems that don't support particular operations, emulation routines may be used to provide the missing functionality.
For example the following C code. Similar emulation routines exist for the C intrinsics that are trivially implemented by the M and F extensions. As of this writing, there are no emulation routines for the A extension because they were rejected as part of the Linux upstreaming process. This might change in the future, but for now we plan to mandate that Linux-capable machines subsume the A extension as part of the RISC-V platform specification.
Much like how the -march argument specifies which hardware generated code can run on, the -mabi argument specifies which software generated code can link against. We use the standard naming scheme for integer ABIs (ilp32 or lp64), with an optional single letter appended to select the floating-point registers used by the ABI (ilp32 vs. ilp32f vs. ilp32d).
In order for objects to be linked together, they must follow the same ABI. As a more concrete example, let's examine a simple C function that takes two double-precision arguments and returns their product. In order to make argument location explicit in all cases, we'll reverse the order of the arguments between the function call and the multiplication.

The first case is the simplest one: if neither the ABI nor the ISA contains the concept of floating-point hardware then the C compiler cannot emit any floating-point-specific instructions. In this case, emulation routines are used to perform the computation and the arguments are passed in integer registers.
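The function under discussion is not reproduced in this text. A minimal sketch of what it might look like follows; the name dmul is an assumption, and the only detail taken from the description is that the operands are multiplied in the opposite order from the parameter list.

```c
/* Multiplies two double-precision values. The operands are deliberately
 * used in reverse order (b first, then a) so that, in the generated
 * assembly, it is easy to see which register or stack slot each argument
 * arrived in. The function name is hypothetical. */
double dmul(double a, double b) {
    return b * a;
}
```

Compiling this one function with each -march and -mabi combination discussed here and reading the assembly output (for example with the compiler's -S option) is a straightforward way to compare the cases side by side.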
The second case is the exact opposite of this one: everything is supported in hardware. In this case we can emit a single fmul. The last case exposes why there is a split between the -march and -mabi arguments to RISC-V compilers: users may want to generate code that can be linked with code designed for systems that don't subsume a particular extension while still taking advantage of the extra instructions present in a particular extension.
This is a common problem when dealing with legacy libraries that need to be integrated into newer systems so we've designed our compiler arguments and multilib paths to cleanly integrate with this workflow. The generated code is essentially a mix between the two above outputs: the arguments are passed in the registers specified by the ilp32 ABI as opposed to the ilp32d ABI, which could pass these arguments in registers but then once inside the function the compiler is free to use the full power of the rv32imafdc ISA to actually compute the result.
As a result, the generated code constructs the double-precision arguments in memory (the only way to construct a double on rv32), loads them into F registers, performs the computation, stores the F-register result back out to the stack, and loads the result into the ABI-compliant return value registers a0 and a1.
While this is less efficient than the code the compiler could generate if it were allowed to take full advantage of the D-extension registers, it's a lot more efficient than computing the floating-point multiplication without the D-extension instructions. There's no way the compiler could generate code for an ABI that requires passing arguments in F registers if it doesn't have access to the instructions required to access those registers.
As this must be user error, we bail out right away. The last compiler argument that's involved in specifying a target is the simplest of the bunch. While the -march argument can cause systems to be unable to execute code and the -mabi argument can cause objects to be incompatible with each other, the -mtune argument should only change the performance of the generated code.
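The rule that an ABI using F registers needs an ISA that provides them can be sketched as a small consistency check. This is a simplified illustration and not the toolchain's actual logic: the function names are hypothetical, only single-letter extensions in strings of the form rv32... or rv64... are considered, 'g' is treated as shorthand for imafd, and no check is made that the ABI width matches the ISA width.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if the single-letter extension `ext` ('f' or 'd') is present
 * in a -march string such as "rv32imafdc". Scanning stops at '_' or at the
 * start of a multi-letter extension name (z, s, or x prefix). */
static bool march_has(const char *march, char ext) {
    if (strncmp(march, "rv32", 4) != 0 && strncmp(march, "rv64", 4) != 0) {
        return false;
    }
    for (const char *p = march + 4; *p != '\0' && *p != '_'; ++p) {
        if (*p == 'z' || *p == 's' || *p == 'x') {
            break;
        }
        if (*p == ext) {
            return true;
        }
        /* 'g' stands for the general-purpose set i, m, a, f, d. */
        if (*p == 'g' && strchr("imafd", ext) != NULL) {
            return true;
        }
    }
    return false;
}

/* Returns false when the ABI passes floating-point arguments in F registers
 * that the ISA string does not provide, e.g. rv32imac with ilp32d. */
bool riscv_target_is_consistent(const char *march, const char *mabi) {
    size_t n = strlen(mabi);
    char last = (n > 0) ? mabi[n - 1] : '\0';
    if (last == 'd') {
        return march_has(march, 'd');
    }
    if (last == 'f') {
        return march_has(march, 'f');
    }
    return true; /* integer-only ABI: no F registers needed */
}
```

Under these assumptions, rv32imafdc with ilp32 is accepted (the mixed case described above), while rv32imac with ilp32d is rejected.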
All Aboard, Part 1: The -march, -mabi, and -mtune arguments to RISC-V Compilers
When does one use just -march, vs. both? Is it ever possible to just -mtune? If you use -mtune, then the compiler will generate code that works on any of them, but will favour instruction sequences that run fastest on the specific CPU you indicated. Certain instruction sets 3DNow! No idea if this is true or not.
GCC: how is march different from mtune?
I tried to scrub the GCC man page for this, but still don't get it, really. What's the difference between -march and -mtune?
Asked by Jameson. Answer by James Youngman.
Doesn't answer whether it makes sense to use both or whether mtune is redundant when set to the same value.
Besides, the documentation explicitly states that march implies mtune. So, the answers to your objections are no and yes respectively. Thank you for explaining this so elegantly! You make it easy to understand.
People need a tl;dr: Use -march if you ONLY run it on your processor, use -mtune if you want it safe for other processors.
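The difference can be observed from inside a program through the compiler's predefined macros. The sketch below assumes GCC or Clang targeting x86, where __SSE3__ and __AVX2__ are defined only when the selected options allow those instructions to be emitted; the function names are hypothetical. A -march value naming a CPU with AVX2 (for example -march=haswell) defines __AVX2__, whereas -mtune=haswell on its own leaves the instruction set, and therefore these macros, unchanged.

```c
/* Returns 1 if the compiler was allowed to emit SSE3 instructions for this
 * translation unit, 0 otherwise. This reflects -march (or explicit -msse3),
 * never -mtune. */
int built_with_sse3(void) {
#if defined(__SSE3__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the compiler was allowed to emit AVX2 instructions for this
 * translation unit, 0 otherwise. */
int built_with_avx2(void) {
#if defined(__AVX2__)
    return 1;
#else
    return 0;
#endif
}
```

A binary for which built_with_avx2() returns 1 may fault on a processor without AVX2, which is the portability cost of -march that the summary above warns about.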
Users must also understand that older compilers, released before some CPU existed, may result in a different optimal -mtune and -march combination. This blog post illuminates that point with the others: lemire.
Tech Wars 2020 Takes Over GCC March 12! ~ CANCELLED
GCC Gathering March 15
Will they produce identical code for me? Do I need to specify -m64 at all? To make selection of the best method simple, consider the following possible combinations, noting the order of the flags. Is any particular one of these methods going to be "the best" for me?
Thanks for the perfect answer! Cheers Sasha.
Each year, with a forward vision and new developments in technology, Tech Wars introduces new events.
Regardless of the event, all of the students enjoy the opportunity to see their hard work come to fruition. Tech Wars event details, rules and competition descriptions are available at techwarsgccny. Back by popular demand for all participants is the Mystery Event which allows students to use their creativity and skills in an on-demand, timed situation.
As an educational precursor to a future in technology, STEAM Jam participants will have the opportunity to observe the fun and exciting Tech Wars competitions. Through these experiences and relationships students begin to form goals and a vision for their own futures. Our local sponsors also serve as volunteers, judges, and spend their valuable time talking with students and inspiring entrepreneurial spirit.
In addition, the businesses that participate in these events get an exclusive opportunity to meet and network with the future workforce in our community. Tech Wars is among several dynamic programs giving students the opportunity to learn hands-on, often in business settings and with industry professionals.
For photographs or media inquiries, contact Marketing Communications Director Donna Rae Sutherland at ext.
Wednesday, February 12,
I'm on Debian Stretch and decided to re-build some of the packages as a test bed. The packages built, installed and ran without a hitch.
From GCC's manual 3. I would therefore anticipate that it can see whether this-or-that feature exists, without further clues from you. However, if you know that a particular feature isn't there, I would think that there is no harm in being more specific if you want to. If you know that SSE3 really isn't there, and fear that the compiler might think that it is (which would surprise me)
Originally Posted by pan
Hello from a gentoo user. First I want to say this topic was discussed a lot on forums. Conclusion for myself.
Originally Posted by sundialsvcs
I quite-frankly agree. Maybe the man-page is what is out of date on this very-small point. If you ask gcc to "adapt itself to the architecture of the host upon which it now finds itself," I'll betcha that it will do so correctly, no matter what the man-page says.
Also, "now that even 'run-of-the-mill' microprocessors are running billions! Can anyone, today, actually "hear you scream?