This is a discussion on C/C++ Optimizations within the PSP Development Forum, part of the PSP Development, Hacks, and Homebrew category.
#1
sceKernelExitGame();
Post here whatever code optimizations you know of, and I'll add to this first post!
Here are some links I picked up from psp-programming.com (thanks):
- c optimization
- c++ optimization
- pspgu optimization

Nexis2600
There are a few reasons to use while(!ExitLoop). The benefit is that code anywhere can set the variable to true and force the main loop to exit. Another reason someone might use a variable instead of a break is that you have to make sure you exit the loop at the proper time. For instance, say you enter a loop and start a new display list at the top, then break before the call that ends the display list. The next time you try to start a new display list, you're bound to lock up the PSP or cause an error.

Fanjita
Get yourself a copy of Michael Abrash's "Zen of Code Optimization". That will teach you an enormous amount about optimisation techniques and ways of thinking. At the end of the day, the best route to optimisation is to get to know your target platform, how computers work at a low level, and to understand the theory of algorithms.

Insomniac197
- Use sceCtrlPeekBufferPositive over sceCtrlReadBufferPositive.
- Use hardware acceleration wherever possible instead of software.
- Don't overuse sceKernelDcacheWritebackInvalidateAll().
- Always swizzle your images.
- Use the texture cache.
- VRAM is faster than RAM (but you have very little of it to use), so place textures that are ALWAYS on screen in VRAM (for example the player sprite).

Harleyg
Use >> and << instead of / and * when dividing or multiplying by a power of two. Never use while(1); use while(foo) so you can stop the loop by setting foo to 0. If you just want a while(1) equivalent use:
Code:
for(;;)

For programming on the PSP it's important to minimize memory usage overhead. Reusing the same buffer space for multiple things is dirty but can achieve this; you'll get higher cache hit rates. For small temporary allocations it's also better to allocate on the stack with dynamically sized arrays than on the heap with malloc/new, because that improves spatial locality of neighboring elements and temporal locality over the stack, and decreases memory overhead overall. The PSP only has a small amount of cache, so this matters when doing computationally expensive work.
If your programming leads you to balance icache usage against dcache usage, it's probably better to favor dcache usage, because icache can be prefetched more transparently and thus more efficiently (which is probably why some platforms have less icache than dcache).
If you're not using VRAM that heavily anyway (which happens, well, in emulators usually) you can place non-video-related things there. The ME's eDRAM is good for this too, but can only be accessed on the ME of course. If you're just going to use it like an extension of normal RAM you shouldn't have to worry about caching problems, but be sure the memory there is aligned to the cache line width (64 bytes).

AnonymousTipster
Use VFPU accelerated functions where applicable. There are VFPU ASM code fragments on ps2dev.org here:
http://forums.ps2dev.org/viewtopic.php?t=6609
http://forums.ps2dev.org/viewtopic.php?t=5557
http://forums.ps2dev.org/viewtopic.php?t=6478
You need to be careful of the overhead of moving data when utilising the VFPU, or the performance gain will be negated. I've only made minimal use of the VFPU thus far, but it could potentially be very powerful in skilled hands.

Yaustar
- Profiling: Under GCC, IIRC, you add -pg to the CFLAGS and recompile. When you run the executable, it will record the time that functions take to finish, the call graph, etc. This information will be in a file gmon.out (I think), which can be read using a tool like gprof. Here is what one of mine looks like for the project I am working on:
Code:
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
100.00      0.02     0.02                             zoomSurfaceRGBA
  0.00      0.02     0.00       10     0.00     0.00  Core::CLogger::operator<<(std::string const&)
  0.00      0.02     0.00        2     0.00     0.00  Core::CSurfaceObject::FreeSurface()
  0.00      0.02     0.00        1     0.00     0.00  global destructors keyed to _ZN4Core14CSurfaceObjectC2Ev
  0.00      0.02     0.00        1     0.00     0.00  global destructors keyed to _ZN4Core6ErrLogE
  0.00      0.02     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      0.02     0.00        1     0.00     0.00  __static_initialization_and_destruction_0(int, int)
  0.00      0.02     0.00        1     0.00     0.00  Core::CSurfaceObject::ScaleImage(float)
  0.00      0.02     0.00        1     0.00     0.00  Core::CSurfaceObject::LoadImage(std::string const&)
  0.00      0.02     0.00        1     0.00     0.00  Core::CSurfaceObject::CSurfaceObject()
  0.00      0.02     0.00        1     0.00     0.00  Core::CSurfaceObject::~CSurfaceObject()
  0.00      0.02     0.00        1     0.00     0.00  Core::CLogger::~CLogger()
% the percentage of the total running time of the
time program used by this function.
cumulative a running sum of the number of seconds accounted
seconds for by this function and those listed above it.
self the number of seconds accounted for by this
seconds function alone. This is the major sort for this
listing.
calls the number of times this function was invoked, if
this function is profiled, else blank.
self the average number of milliseconds spent in this
ms/call function per call, if this function is profiled,
else blank.
total the average number of milliseconds spent in this
ms/call function and its descendents per call, if this
function is profiled, else blank.
name the name of the function. This is the minor sort
for this listing. The index shows the location of
the function in the gprof listing. If the index is
in parenthesis it shows where it would appear in
the gprof listing if it were to be printed.
Call graph (explanation follows)
granularity: each sample hit covers 4 byte(s) for 50.00% of 0.02 seconds
index % time self children called name
<spontaneous>
[1] 100.0 0.02 0.00 zoomSurfaceRGBA [1]
-----------------------------------------------
0.00 0.00 1/10 Core::CSurfaceObject::ScaleImage(float) [11]
0.00 0.00 2/10 Core::CSurfaceObject::CSurfaceObject() [13]
0.00 0.00 2/10 Core::CSurfaceObject::~CSurfaceObject() [14]
0.00 0.00 2/10 Core::CSurfaceObject::LoadImage(std::string const&) [12]
0.00 0.00 3/10 Core::CSurfaceObject::FreeSurface() [6]
[5] 0.0 0.00 0.00 10 Core::CLogger::operator<<(std::string const&) [5]
-----------------------------------------------
0.00 0.00 1/2 Core::CSurfaceObject::~CSurfaceObject() [14]
0.00 0.00 1/2 Core::CSurfaceObject::LoadImage(std::string const&) [12]
[6] 0.0 0.00 0.00 2 Core::CSurfaceObject::FreeSurface() [6]
0.00 0.00 3/10 Core::CLogger::operator<<(std::string const&) [5]
-----------------------------------------------
0.00 0.00 1/1 __do_global_dtors [1263]
[7] 0.0 0.00 0.00 1 global destructors keyed to _ZN4Core14CSurfaceObjectC2Ev [7]
0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [10]
-----------------------------------------------
0.00 0.00 1/1 __do_global_dtors [1263]
[8] 0.0 0.00 0.00 1 global destructors keyed to _ZN4Core6ErrLogE [8]
0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [9]
-----------------------------------------------
0.00 0.00 1/1 global destructors keyed to _ZN4Core6ErrLogE [8]
[9] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [9]
0.00 0.00 1/1 Core::CLogger::~CLogger() [15]
-----------------------------------------------
0.00 0.00 1/1 global destructors keyed to _ZN4Core14CSurfaceObjectC2Ev [7]
[10] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [10]
-----------------------------------------------
0.00 0.00 1/1 SDL_main [58]
[11] 0.0 0.00 0.00 1 Core::CSurfaceObject::ScaleImage(float) [11]
0.00 0.00 1/10 Core::CLogger::operator<<(std::string const&) [5]
-----------------------------------------------
0.00 0.00 1/1 SDL_main [58]
[12] 0.0 0.00 0.00 1 Core::CSurfaceObject::LoadImage(std::string const&) [12]
0.00 0.00 2/10 Core::CLogger::operator<<(std::string const&) [5]
0.00 0.00 1/2 Core::CSurfaceObject::FreeSurface() [6]
-----------------------------------------------
0.00 0.00 1/1 SDL_main [58]
[13] 0.0 0.00 0.00 1 Core::CSurfaceObject::CSurfaceObject() [13]
0.00 0.00 2/10 Core::CLogger::operator<<(std::string const&) [5]
-----------------------------------------------
0.00 0.00 1/1 SDL_main [58]
[14] 0.0 0.00 0.00 1 Core::CSurfaceObject::~CSurfaceObject() [14]
0.00 0.00 2/10 Core::CLogger::operator<<(std::string const&) [5]
0.00 0.00 1/2 Core::CSurfaceObject::FreeSurface() [6]
-----------------------------------------------
0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [9]
[15] 0.0 0.00 0.00 1 Core::CLogger::~CLogger() [15]
-----------------------------------------------
This table describes the call tree of the program, and was sorted by
the total amount of time spent in each function and its children.
Each entry in this table consists of several lines. The line with the
index number at the left hand margin lists the current function.
The lines above it list the functions that called this function,
and the lines below it list the functions this one called.
This line lists:
index A unique number given to each element of the table.
Index numbers are sorted numerically.
The index number is printed next to every function name so
it is easier to look up where the function in the table.
% time This is the percentage of the `total' time that was spent
in this function and its children. Note that due to
different viewpoints, functions excluded by options, etc,
these numbers will NOT add up to 100%.
self This is the total amount of time spent in this function.
children This is the total amount of time propagated into this
function by its children.
called This is the number of times the function was called.
If the function called itself recursively, the number
only includes non-recursive calls, and is followed by
a `+' and the number of recursive calls.
name The name of the current function. The index number is
printed after it. If the function is a member of a
cycle, the cycle number is printed between the
function's name and the index number.
For the function's parents, the fields have the following meanings:
self This is the amount of time that was propagated directly
from the function into this parent.
children This is the amount of time that was propagated from
the function's children into this parent.
called This is the number of times this parent called the
function `/' the total number of times the function
was called. Recursive calls to the function are not
included in the number after the `/'.
name This is the name of the parent. The parent's index
number is printed after it. If the parent is a
member of a cycle, the cycle number is printed between
the name and the index number.
If the parents of the function cannot be determined, the word
`<spontaneous>' is printed in the `name' field, and all the other
fields are blank.
For the function's children, the fields have the following meanings:
self This is the amount of time that was propagated directly
from the child into the function.
children This is the amount of time that was propagated from the
child's children to the function.
called This is the number of times the function called
this child `/' the total number of times the child
was called. Recursive calls by the child are not
listed in the number after the `/'.
name This is the name of the child. The child's index
number is printed after it. If the child is a
member of a cycle, the cycle number is printed
between the name and the index number.
If there are any cycles (circles) in the call graph, there is an
entry for the cycle-as-a-whole. This entry shows who called the
cycle (as parents) and the members of the cycle (as children.)
The `+' recursive calls entry shows the number of function calls that
were internal to the cycle, and the calls entry for each member shows,
for that member, how many times it was called from other members of
the cycle.
Index by function name
[7] global destructors keyed to _ZN4Core14CSurfaceObjectC2Ev (CSurfaceObject.cpp) [11] Core::CSurfaceObject::ScaleImage(float) [14] Core::CSurfaceObject::~CSurfaceObject()
[8] global destructors keyed to _ZN4Core6ErrLogE (CLogger.cpp) [6] Core::CSurfaceObject::FreeSurface() [15] Core::CLogger::~CLogger()
[10] __static_initialization_and_destruction_0(int, int) (CSurfaceObject.cpp) [12] Core::CSurfaceObject::LoadImage(std::string const&) [5] Core::CLogger::operator<<(std::string const&)
[9] __static_initialization_and_destruction_0(int, int) (CLogger.cpp) [13] Core::CSurfaceObject::CSurfaceObject() [1] zoomSurfaceRGBA
- Don't allocate or deallocate memory between frames. Do it at the end or beginning of a game 'state', or even better, use fixed memory pools.
- Keep class hierarchies as flat as possible.
- Avoid pointer chains (e.g. a->b->c->d->blah = 10).
- Trust your compiler to deal with little optimisations (e.g. loop unrolling).
- In loops, count down to 0 rather than up to some number X (this can save several operations per iteration).
- Sometimes -O2 will give better results than -O3 (don't ask me why, because it shouldn't :/).
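The fixed memory pool tip can be sketched roughly like this. It is only an illustration, not code from any poster: Particle, POOL_SIZE, pool_init, pool_alloc and pool_free are made-up names, and the pool is a simple free list threaded through a fixed array so that nothing calls malloc or free between frames.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example type and capacity; pick whatever your game needs. */
#define POOL_SIZE 64

typedef struct Particle {
    float x, y, vx, vy;
} Particle;

/* A slot holds either a live Particle or a link to the next free slot. */
typedef union PoolSlot {
    Particle item;
    union PoolSlot *next;
} PoolSlot;

typedef struct Pool {
    PoolSlot slots[POOL_SIZE];
    PoolSlot *free_list;
} Pool;

/* Thread every slot onto the free list. Call once when a game state starts. */
static void pool_init(Pool *p)
{
    assert(p != NULL);
    p->free_list = NULL;
    /* Counts down to zero, as the list above suggests for loops. */
    for (int i = POOL_SIZE - 1; i >= 0; --i) {
        p->slots[i].next = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* O(1): pop the head of the free list, or return NULL when the pool is full. */
static Particle *pool_alloc(Pool *p)
{
    PoolSlot *slot = p->free_list;
    if (slot == NULL)
        return NULL;
    p->free_list = slot->next;
    return &slot->item;
}

/* O(1): push the slot back onto the free list. item must come from this pool. */
static void pool_free(Pool *p, Particle *item)
{
    PoolSlot *slot = (PoolSlot *)item;
    slot->next = p->free_list;
    p->free_list = slot;
}
```

A typical pattern would be to call pool_init when a level loads, then use only pool_alloc and pool_free inside the frame loop. Whether this beats the allocator you already have is something to confirm with the profiler.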
__________________
Last edited by Bronx; 09-10-2006 at 05:07 PM.. |
#2
Developer
Use sceCtrlPeekBufferPositive over sceCtrlReadBufferPositive.
Use hardware acceleration wherever possible instead of software.
Don't overuse sceKernelDcacheWritebackInvalidateAll().
Always swizzle your images.
Use the texture cache.
VRAM is faster than RAM (but you have very little of it to use), so place textures that are ALWAYS on screen in VRAM (for example the player sprite).
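For anyone wondering what "swizzle your images" involves, here is a rough sketch. It assumes the layout usually described for swizzled PSP textures, where the image is stored as consecutive blocks that are 16 bytes wide and 8 rows tall; that geometry and the function name swizzle are assumptions for illustration, so check them against your SDK's samples before relying on this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed block geometry: 16 bytes wide by 8 rows tall. */
enum { BLOCK_BYTES_W = 16, BLOCK_ROWS = 8 };

/*
 * Copy a linear image (src) into block order (dst).
 * width_bytes is the row length in bytes, not pixels.
 * Both buffers must hold width_bytes * height bytes and must not overlap.
 */
static void swizzle(unsigned char *dst, const unsigned char *src,
                    size_t width_bytes, size_t height)
{
    assert(width_bytes % BLOCK_BYTES_W == 0);
    assert(height % BLOCK_ROWS == 0);

    for (size_t by = 0; by < height; by += BLOCK_ROWS) {
        for (size_t bx = 0; bx < width_bytes; bx += BLOCK_BYTES_W) {
            /* Emit the 8 rows of this block back to back. */
            for (size_t row = 0; row < BLOCK_ROWS; ++row) {
                memcpy(dst, src + (by + row) * width_bytes + bx,
                       BLOCK_BYTES_W);
                dst += BLOCK_BYTES_W;
            }
        }
    }
}
```

The point of the reordering is that texels that are close together on screen end up close together in memory, which is friendlier to the texture cache. This version favours clarity over speed; it is meant to run once at load time, not per frame.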
__________________
Check out my homebrew & C tutorials at http://insomniac.0x89.org/ Coder formerly known as Insomniac197
#3
sceKernelExitGame();
Added
#6
AKA Homer
Quote:
#8
Developer
Quote:
There are a lot of good techniques that compilers may handle now, like strength reduction over loops and inner loop hoisting, but sometimes if you do those things yourself you might open up new opportunities that the compiler hasn't seen.
For programming on the PSP it's important to minimize memory usage overhead. Reusing the same buffer space for multiple things is dirty but can achieve this; you'll get higher cache hit rates. For small temporary allocations it's also better to allocate on the stack with dynamically sized arrays than on the heap with malloc/new, because that improves spatial locality of neighboring elements and temporal locality over the stack, and decreases memory overhead overall. The PSP only has a small amount of cache, so this matters when doing computationally expensive work.
If your programming leads you to balance icache usage against dcache usage, it's probably better to favor dcache usage, because icache can be prefetched more transparently and thus more efficiently (which is probably why some platforms have less icache than dcache).
If you're not using VRAM that heavily anyway (which happens, well, in emulators usually) you can place non-video-related things there. The ME's eDRAM is good for this too, but can only be accessed on the ME of course. If you're just going to use it like an extension of normal RAM you shouldn't have to worry about caching problems, but be sure the memory there is aligned to the cache line width (64 bytes).
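Two of the points above can be sketched in portable C: rounding an address up to the 64-byte cache line width, and using a dynamically sized stack array for a small temporary buffer instead of malloc. The names align_up, sum_of_squares and MAX_SCRATCH are made up for this illustration, and the size cap is an assumption added to keep the stack usage bounded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE  64u   /* cache line width quoted in the post above */
#define MAX_SCRATCH 256u  /* hypothetical cap so the stack array stays small */

/* Round ptr up to the next CACHE_LINE boundary (unchanged if already aligned). */
static void *align_up(void *ptr)
{
    uintptr_t addr = (uintptr_t)ptr;
    addr = (addr + (CACHE_LINE - 1u)) & ~(uintptr_t)(CACHE_LINE - 1u);
    return (void *)addr;
}

/*
 * Small temporary buffer as a variable-length array on the stack.
 * No heap call, and the storage sits next to the other locals.
 */
static long sum_of_squares(const int *values, size_t count)
{
    assert(count > 0 && count <= MAX_SCRATCH);

    long squares[count];  /* dynamically sized, released on return */
    for (size_t i = 0; i < count; ++i)
        squares[i] = (long)values[i] * values[i];

    long total = 0;
    for (size_t i = 0; i < count; ++i)
        total += squares[i];
    return total;
}
```

The scratch array here is deliberately trivial; the pattern only pays off when the temporary data is small and short-lived. When carving an aligned region out of a larger buffer with align_up, remember to reserve up to CACHE_LINE - 1 extra bytes for the padding.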
#9
Developer
Quote:
#11
Quote:
Quote:
__________________
www.NewLilWayne.com
#12
Developer
Use VFPU accelerated functions where applicable. There are VFPU ASM code fragments on ps2dev.org here:
http://forums.ps2dev.org/viewtopic.php?t=6609
http://forums.ps2dev.org/viewtopic.php?t=5557
http://forums.ps2dev.org/viewtopic.php?t=6478
You need to be careful of the overhead of moving data when utilising the VFPU, or the performance gain will be negated. I've only made minimal use of the VFPU thus far, but it could potentially be very powerful in skilled hands.
__________________
Developer of Tipster Unzip/Unrar ThrottleX RoboTORN3D ODEPsp Now, with the power of my PSP, I will finally RULE THE WORLD. Muhahahah.
#13
...in a dream...
Quote:
@harleyg - Using while(1) is perfectly fine, as calling a statement such as 'break' will stop the loop just as easily. You can't necessarily 'restart' the loop as you could with a variable, but you could just make it into a function and re-call it to start it over...
__________________
...you'll never know what it's like... spending your whole life in a dream...
Launch a Kitten out of a Cannon and win real cash! Checkout my newly updated site for all my projects (Kitten Cannon, BOXHEAD, Light Cycle 3D)
#14
My name is Mud
Quote:
Not much, but bleh.
#15
sceKernelExitGame();
Everything is added! Great input guys, keep it up!
#16
...in a dream...
Any loop, you can 'break'...
Code:
for (;;) { break; }
while(1) { break; }
do { break; } while(1);
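Both styles do stop the loop. The difference raised in the first post is about paired calls such as starting and ending a display list: a break in the middle of the body can skip the closing call, while a flag lets the current pass finish first. A rough sketch of the flag version, where begin_list, end_list, open_lists and run_loop are hypothetical stand-ins rather than real SDK calls:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a paired begin/end call. */
static int open_lists = 0;
static void begin_list(void) { ++open_lists; }
static void end_list(void)   { --open_lists; }

/* Runs frames until quit_at frames have been drawn; returns that count. */
static int run_loop(int quit_at)
{
    bool exit_loop = false;
    int frames = 0;

    while (!exit_loop) {
        begin_list();
        if (frames == quit_at)
            exit_loop = true;  /* request exit, but finish this pass */
        else
            ++frames;
        end_list();            /* always reached, so begin/end stay paired */
    }
    return frames;
}
```

A break placed right after end_list() would be just as safe. The flag mainly helps when the decision to quit is made deep inside the body or in another function.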
#19
Your Fate is Grim...
Quote:
#20
...in a dream...
Okkk.... Then what's it about?
#22
OMFG
Quote:
#23
Quote:
#25
Developer
Quote:
Here is what one of mine looks like for the project I am working on: Spoiler for gprof output:
#26
Quote:
Thanks for the info.
#27
...in a dream...
For someone working on a new firmware and is 26 on a 'kiddies' forum, they really have too much time on there hands eh? (pedophile?
On Topic: Why would you want to set a variable to determine whether to loop or not? 'true' is already defined as 1, so putting '1' in the loop condition will make it loop forever. To stop it, why not just call 'break'? It doesn't take up any memory on the stack like a variable does, and that's not too good when optimizing code... And what does my sudden misunderstanding of your poorly written response have to do with me being a 'coder/hacker'? Seems like you need to get back to coding that firmware, as all the newly acquired fresh air you got from leaving your house is messing with your brain. @head_54us - Nice, would never have known that. I'm gonna give it a test run...
#29
...in a dream...
Quit going off topic, Harleyg, you're ruining this thread. Oh, and nice response, really backs yourself up.
On Topic: @head_54us - Know any other helpful things for optimization? Seems like you've done your homework
#30
sceKernelExitGame();
Quote: