
Runtime speed improvements #75

Closed
jacobwilliams opened this issue Mar 3, 2015 · 28 comments

@jacobwilliams
Owner

Need to do some speed tests, and also see if there are any changes that can be made to speed up the code. The file reading and/or parsing seems rather slow. See also #37.

@zbeekman
Contributor

zbeekman commented Mar 3, 2015

Are you using big.json to do the testing?

@jacobwilliams
Owner Author

Yes, see test case 9. I've also got some larger ones that I haven't committed.

@zbeekman
Contributor

zbeekman commented Mar 3, 2015

ok, I was thinking a good place to start would be to run some of the tests through gprof and gprof2dot (this will create pretty call graphs with timing info). gprof doesn’t work on Mac, so I may muck around and see if I can get travis-ci to do it for me. The timings won’t be great since it’s running in a busy VM, but it’s at least a starting point. If you want to muck around with it, you can install gprof2dot with sudo pip install gprof2dot (you’ll need graphviz too) and then look at the gprof2dot homepage for more info: https://code.google.com/p/jrfonseca/wiki/Gprof2Dot#gprof

@zbeekman
Contributor

zbeekman commented Mar 3, 2015

Hopefully with some profiling, we can get a better understanding of whether it is #37, #38, or IO that is slowing things down.

@szaghi

szaghi commented Mar 3, 2015

@jacobwilliams and @zbeekman
In my modest opinion, you who are not such poor Fortran men could also try the Intel profiling tools (Inspector, VTune Amplifier, etc...) with your Visual Studio, and then tell me (a real poor man without VS) how they work :-)

@zbeekman
Contributor

zbeekman commented Mar 3, 2015

@szaghi I am “poorer” than @jacobwilliams: no VS for me.

BTW, @szaghi did you see my post on cpp-coveralls about what I think the issue is that you’re having? Any luck getting it to work?

@szaghi

szaghi commented Mar 3, 2015

@zbeekman
I am sorry for my delay, but today was very busy. I just tested your suggestions about binutils, and with a symlink of gcov in a local path (that I export) cpp-coveralls works very well, thank you very much! Indeed, before I had to stop testing your suggestions, I had tried to use a container-based Travis build (as I guess you do) to exploit the Travis caching feature, but I failed miserably (in particular, installing gfortran-4.9 without sudo and without adding the testing repo). Can I ask how you use container-based Travis (if you do)?

@zbeekman
Contributor

zbeekman commented Mar 3, 2015

@szaghi no, we don’t use container based for this project due to the gfortran 4.9 dependency. There is no sane way, that I know of, to use a container based build and improve the build speed. An insane way would be to:

  1. compile GCC/gfortran 4.9 from source on travis, without installing it to the restricted system directories
  2. set up the environment to compile and run from this new GCC build (that is in user space)
  3. cache the built binaries so that you don’t have to re-download and re-compile them next time travis is fired up

But I don’t particularly fancy scripting the installation of GCC from source on a remote VM, which is why I don’t think that this is a sane approach. I just have cache: apt in .travis.yml in case they ever add apt caching to the non-container builds in the future.

@szaghi

szaghi commented Mar 4, 2015

@zbeekman thank you. I agree with you; I do not like to compile gfortran locally. Thank you for your suggestions.

P.S. If I find the time, I will try to support gprof in FoBiS and improve the gcov support. Reading your nice idea of maintaining a wiki page of untested procedures, I had the idea that this could be accomplished automatically by FoBiS: running the build with the coverage flag would build the targets (your tests in this case) with coverage instrumentation, run the tests themselves, collect the coverage data, and produce a markdown report page. In FoBiS it is simple to produce markdown pages ... Do you have suggestions?

@zbeekman
Contributor

zbeekman commented Mar 4, 2015

> P.S. If I find the time, I will try to support gprof in FoBiS and improve the gcov support. Reading your nice idea of maintaining a wiki page of untested procedures, I had the idea that this could be accomplished automatically by FoBiS: running the build with the coverage flag would build the targets (your tests in this case) with coverage instrumentation, run the tests themselves, collect the coverage data, and produce a markdown report page. In FoBiS it is simple to produce markdown pages ... Do you have suggestions?

Hi Stefano,

I’m not sure what the current FoBiS.py support for gcov is beyond compiling/linking with coverage flags. I seem to recall you mentioning some additional support of some variety… if so can you remind me what the additional functionality does?

As for generating markdown pages for the coverage info… that is quite an interesting idea! I guess ideally it should be stylistically compatible with / similar-looking to FORD output, since FoBiS.py and FORD use similar Markdown engines, yes? (Of course the ability to customize the appearance, for example to be more similar to ROBODoc output, would be wonderful.)

As far as the actual output content goes, it would be great to:

  • Ensure that extracted coverage information is updated each time a program runs. (i.e., the .gcda files are cumulative and the json-fortran test suite consists of a number of individual programs, so if FoBiS.py is going to call gcov for the user it must make sure that it does so either after all executables have been run, or for every source file of an executable after it’s run, each time it is run.)
  • Include basic information at the top, for each source file like:
    • number of procedures (functions and subroutines)
    • number of executed procedures
    • number of unexecuted procedures
    • percentage of executed and unexecuted procedures (with pie chart?)
    • number of executable lines
    • number of missed lines
    • number of covered lines
    • average hits per total number of executable lines
    • average hits per executed line
    • pie chart of percentage of missed lines and percentage of hit lines?
  • A function listing sorted by percentage of missed lines (functions with high miss rates should be printed first)
    • For each function include similar info as above:
      • number of executable lines
      • number of lines missed
      • number of lines hit
      • average hits per executable line
      • average hits per line actually executed
      • percentage of hits and misses, possibly with a pie chart
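
A sketch of how the per-file summary numbers in that wish list could be computed, once per-line hit counts have been parsed out of a .gcov file (Python, since FoBiS.py is a Python tool; the summarize function and its input layout are hypothetical, not FoBiS.py's actual internals):

```python
def summarize(line_hits):
    """Summarize coverage from a mapping of executable line number -> hit count.

    `line_hits` is a hypothetical intermediate produced by parsing a .gcov
    file; lines absent from the mapping are treated as non-executable.
    """
    executable = len(line_hits)
    covered = sum(1 for hits in line_hits.values() if hits > 0)
    total_hits = sum(line_hits.values())
    return {
        "executable_lines": executable,
        "covered_lines": covered,
        "missed_lines": executable - covered,
        "percent_covered": 100.0 * covered / executable if executable else 0.0,
        "avg_hits_per_executable_line": total_hits / executable if executable else 0.0,
        "avg_hits_per_executed_line": total_hits / covered if covered else 0.0,
    }
```

The same function could be reused for the per-procedure tables by feeding it only the slice of the line map belonging to one procedure.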

@szaghi

szaghi commented Mar 5, 2015

Hi Izaak,

I try to summarize some points.

FoBiS.py coverage support

Presently, there is a coverage switch for both build and rule execution: the first builds the targets with coverage instrumentation, while the second runs the analyzer (gcovr for GNU and the Intel equivalent for Intel). However, the user has to run the targets in between these two steps, and the rule step is very trivial (in my case I do other things afterwards... like coveralls, as you do). Thus I am now thinking (not yet an actual design) about how to improve this. In no special order, I am thinking of:

  • force cflags_heritage when coverage is activated in building mode: this ensures that all sources are always instrumented;
  • running the targets after they have been built;
  • modifying the intrinsic rule (the second coverage step above) in order to make it more general (this is still very unclear).

Unexecuted procedures

First of all, FoBiS.py does not use any markdown inputs or outputs. FORD uses markdown for its input and Jinja2 as the template engine for its HTML outputs. I said that it is simple for FoBiS.py to generate markdown reports because I have strong experience with Python+markdown due to another project of mine, MaTiSSe. Consequently, I can very quickly integrate a markdown generator into FoBiS if necessary.

As far as the uncovered-procedures search algorithm is concerned, I am very confused. Yesterday, in the few minutes of my lunch break... I tried to extract uncovered-procedure information by means of different tools, but it has been very frustrating:

  • gcov &amp; co. do not provide information about procedures, only about each single line being executed or not;
  • gprof provides much information (executed and non-executed procedure lists, timing, number of executions...) but it is very inflexible:
    • I cannot find a way to limit the information provided to a specific set of files;
    • the call graph is crazy: there are calls to every imaginable procedure (libc system library, gfortran low-level procedures...) and I cannot find a way to trim out these low-level calls;
  • after the failure of my approach with gprof I tried callgrind by means of valgrind: it seems more friendly, but the results I obtained are essentially the same as gprof's.

Indeed my failure is surely due to the few minutes I spent testing the tools, but I think I need your help to find a simple way to extract the uncovered-procedures information:

  • how do you extract your list of unexecuted procedures?
  • how can we find out whether a procedure is a function or a subroutine?
  • does robodoc parse markdown, or do you need another format (html, txt, ...)?

Today I should find some more minutes to dig deeper into this.

Thank you for your help.

@zbeekman
Contributor

zbeekman commented Mar 5, 2015

> Hi Izaak,
>
> I try to summarize some points.
>
> FoBiS.py coverage support
>
> Presently, there is a coverage switch for both build and rule execution: the first builds the targets with coverage instrumentation, while the second runs the analyzer (gcovr for GNU and the Intel equivalent for Intel). However, the user has to run the targets in between these two steps, and the rule step is very trivial (in my case I do other things afterwards… like coveralls, as you do). Thus I am now thinking (not yet an actual design) about how to improve this. In no special order, I am thinking of:
>
>   • forcing cflags_heritage when coverage is activated in building mode: this ensures that all sources are always instrumented;

yes, I think this is a good idea. I was confused and surprised a few times when coverage info wasn’t created after adding -coverage because the binaries had already been built and seemed to be up to date. Also, when I look at the contents of .cflags.heritage right now, it seems that -ftest-coverage and -fprofile-arcs aren’t in there… only the flags passed by the user to -cflags rather than all the flags that FoBiS.py passes to the compiler. I suspect this is a bug.

>   • running the targets after they have been built;

I’m not sure what a good way to do this would be. Who knows if the targets will require special command line arguments or input files in a particular location?

>   • modifying the intrinsic rule (the second coverage step above) in order to make it more general (this is still very unclear).
>
> Unexecuted procedures
>
> First of all, FoBiS.py does not use any markdown inputs or outputs. FORD uses markdown for its input and Jinja2 as the template engine for its HTML outputs. I said that it is simple for FoBiS.py to generate markdown reports because I have strong experience with Python+markdown due to another project of mine, MaTiSSe. Consequently, I can very quickly integrate a markdown generator into FoBiS if necessary.
>
> As far as the uncovered-procedures search algorithm is concerned, I am very confused. Yesterday, in the few minutes of my lunch break… I tried to extract uncovered-procedure information by means of different tools, but it has been very frustrating:
>
>   • gcov &amp; co. do not provide information about procedures, only about each single line being executed or not;

On my system there are two ways to get info about procedures. If you check out the master branch of json-fortran, export CODE_COVERAGE=yes, and then run ./build.sh, FoBiS.py will do the build and then run the test suite. The code will be compiled with the gfortran coverage flags, and the binary files containing the coverage info are cumulatively updated. Then here are the two ways to get the procedure coverage information:

  1. Call gcov -o lib src/json_module.f90 to produce the json_module.f90.gcov file for the library. In it you will notice that lines preceded by #### were never executed. Any line that is a procedure declaration, e.g. subroutine foo(in_a,in_b,out_1), that has #### in front of it indicates that this procedure has never been called.
  2. Another way is to pass the -f flag to gcov: gcov -f -o lib src/json_module.f90. This should produce the same json_module.f90.gcov file with the source listing and info about executed lines, BUT it will also print procedure execution summaries to stdout. An example of a procedure that is never called will look something like this:
Function '__json_module_MOD_json_file_move_pointer'
Lines executed:0.00% of 6

Note that the gcov man page says that percentages will never be listed as exactly 0 or 100 unless NONE or ALL of the lines are executed, respectively. So 0.00% always indicates no lines executed and hence the procedure was never called.

Either way, my suggestions will very likely be quite challenging to implement, since the output of gcov needs to be parsed to extract the relevant information. I think this is what cpp-coveralls does, and then encodes all this info in a json payload to send to coveralls.io.
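
The parsing step might look something like the following (a simplified sketch in Python, not cpp-coveralls' actual implementation; the regex for declarations is deliberately naive and misses, for example, type-prefixed function declarations like integer function foo()):

```python
import re

# A .gcov line looks like "    ####:   42:subroutine foo(a,b)" --
# execution count, line number, and source text, colon-separated.
GCOV_LINE = re.compile(r'^\s*(\S+):\s*(\d+):(.*)$')
# Naive Fortran procedure-declaration matcher (illustrative only).
PROC_DECL = re.compile(r'^\s*(?:pure\s+|elemental\s+|recursive\s+)*'
                       r'(subroutine|function)\s+(\w+)', re.IGNORECASE)

def uncovered_procedures(gcov_text):
    """Return (kind, name) pairs for procedure declarations whose
    execution-count field is '####', i.e. never executed."""
    result = []
    for line in gcov_text.splitlines():
        m = GCOV_LINE.match(line)
        if not m or m.group(1) != '####':
            continue
        decl = PROC_DECL.match(m.group(3))
        if decl:
            result.append((decl.group(1).lower(), decl.group(2)))
    return result
```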

FoBiS.py is a great tool, and anything beyond what you have already done is extra, in my mind. If you are up for this challenge, great, but if it’s too complicated, it doesn’t add that much compared to the other great functionality you’ve already implemented. I just thought I’d give you my “Christmas wish list” for what the ideal markdown output would look like.

>   • gprof provides much information (executed and non-executed procedure lists, timing, number of executions…) but it is very inflexible:
>     • I cannot find a way to limit the information provided to a specific set of files;
>     • the call graph is crazy: there are calls to every imaginable procedure (libc system library, gfortran low-level procedures…) and I cannot find a way to trim out these low-level calls;
>   • after the failure of my approach with gprof I tried callgrind by means of valgrind: it seems more friendly, but the results I obtained are essentially the same as gprof's.

I wouldn’t worry too much about gprof and callgrind. I think that extracting coverage information should be done with gcov or similar. Profiling info can be nice, but I wouldn’t go beyond sending gprof output through gprof2dot and then dot (part of graphviz) to generate images of the call graphs with timing info. It’s fine if system calls are included; in gprof2dot there is a way to set thresholds that suppress calls taking insignificant time, and this usually gets rid of a lot of the system calls. While a “fortran poor man” may not entirely understand the system calls, the relevant info will still be there.

> Indeed my failure is surely due to the few minutes I spent testing the tools, but I think I need your help to find a simple way to extract the uncovered-procedures information:
>
>   • how do you extract your list of unexecuted procedures?

As discussed above, the best way is to parse the *.f90.gcov file for function and subroutine declarations preceded by #### (line never executed). The -f (function summary) flag to gcov might also be useful.

>   • how can we find out whether a procedure is a function or a subroutine?

I’m not sure whether or not this is critical. It should be clear to the user how to find the procedure in the source, since functions and subroutines cannot have the same name in the same scoping unit. However, if you parse the *.f90.gcov file, the line with the function or subroutine declaration will include exactly that information.

>   • does robodoc parse markdown, or do you need another format (html, txt, …)?

No, robodoc doesn’t know anything about markdown or do any parsing of markdown. What I meant was that it would be nice to be able to have output in a format that is stylistically compatible with other documentation, whether it is FORD or ROBODoc. The ability to have HTML output would also be great, then the robodoc.css can just be used as the stylesheet.

> Today I should find some more minutes to dig deeper into this.
>
> Thank you for your help.

I’m happy to help, but as I mentioned above all of these are bonus features. FoBiS.py is great as is! Only work on this if you are inspired to do so and have the time to spare.

@szaghi

szaghi commented Mar 5, 2015

@zbeekman

Perfect!

I never took a look at the .gcov outputs; I had wrongly supposed that they contained only line coverage, without info about procedures...

Your Christmas wish list will be completed before next Christmas... It should be very simple to parse the .gcov files. As a first step I will try to produce a very simple markdown report; in a second phase I will try to support other formats.

Following your wise suggestions, I will freeze (for the moment) any other experiments with gprof (indeed, I have already added a -profile option...).

Tomorrow I hope to push a first FoBiS.py version with your uncovered-report feature :-)

P.S. The missing coverage flags in the cflags heritage really are a bug! Thank you very much!

@jacobwilliams
Owner Author

You guys have taken over my issue ticket! Ha ha!

Notes on the original topic: two things are making the parsing way too slow:

  1. reading the file in one character at a time using a formatted read.
  2. the great number of allocatable string allocations and reallocations during parsing.

I have some fixes in the works for both of these that greatly speed it up... but I'm still doing some experiments, and there are some other things I want to try. Stay tuned.

@zbeekman
Contributor

zbeekman commented Mar 7, 2015

sorry 😥 but looking forwards to seeing what you come up with.

@jacobwilliams
Owner Author

Update: to parse the big.json file (~7 MB) on my laptop:

  • The current master branch takes about 4.1 seconds when parsing it directly from the file.
  • The "speed" branch takes about 1.7 sec when parsing it from the file, and only 0.3 sec when reading the entire file into a string and then parsing the string. Note that for the master branch, string parsing isn't even really a viable option, because it just takes forever (I killed it after a few minutes...). The reason is the large number of allocatable character reallocations.

So there is some promise here. Still working on it. I need to see how it can work with unicode files (I'm using unformatted stream read).

@szaghi

szaghi commented Mar 10, 2015

@jacobwilliams Please, forgive me for polluting your issue...
@zbeekman I finally got the gcov report maker working... presently the output looks something like this:
https://github.com/szaghi/BeFoR64/wiki/Lib_Base64.f90.gcov-report

For the pie chart I must select the best plug-in for markdown. The per-procedure analysis (metrics inside each procedure) will come later. See you soon.

@zbeekman
Contributor

awesome, looks great!

@zbeekman
Contributor

@jacobwilliams I did some more research on unicode/utf-8 encoding as it pertains to speeding up reading in large files. (I also posted some more questions on your Intel forums thread.)

One thing that may help us diagnose whether files contain utf-8 encoded characters is inquire( … encoding=FL_ENCODING …). If FL_ENCODING isn’t returned as ‘UNKNOWN’ or ‘UTF-8’ then we can safely use the unformatted stream io trick. Otherwise, the easiest course of action is probably formatted IO with UCS4 characters.

A reference-counting-like scheme could be used to keep track of how many sub-objects contain non-ascii characters which can then be used to determine whether or not to use ‘utf-8’ encoding if the object is ever written out to a file.
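
Such a scheme could be sketched roughly as follows (a hypothetical design, not existing json-fortran code; Python used for brevity):

```python
class UnicodeTracker:
    """Track how many sub-objects of a JSON value contain non-ASCII
    characters; choose 'utf-8' output encoding only when needed."""

    def __init__(self):
        self.non_ascii_children = 0

    def note_string(self, s):
        # Increment the count when a stored string contains non-ASCII.
        if any(ord(ch) > 127 for ch in s):
            self.non_ascii_children += 1

    def note_removed_string(self, s):
        # Decrement when such a string is removed from the object tree.
        if any(ord(ch) > 127 for ch in s):
            self.non_ascii_children -= 1

    def output_encoding(self):
        return 'utf-8' if self.non_ascii_children > 0 else 'ascii'
```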

I think it is possible to use unformatted stream io with utf-8 files, by reading into a string of 1-byte characters. Then you can look at the leading bits (e.g. ichar( buffer(i:i) ) > 127) to determine if the character takes up more than 1 byte, and use the bit manipulation intrinsics to further determine how many bytes the character has been encoded as and how to transcode it to UCS4. (Although this won’t be fun.)
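
The leading-bit test can be expressed compactly; here is a sketch in Python (a Fortran version would use ichar, iand, and ishft on the 1-byte characters):

```python
def utf8_seq_length(first_byte):
    """Number of bytes in a UTF-8 sequence, from its lead byte.

    0xxxxxxx -> 1 (plain ASCII), 110xxxxx -> 2, 1110xxxx -> 3,
    11110xxx -> 4.  Raises for continuation or invalid lead bytes.
    """
    if first_byte < 0x80:            # ichar(c) <= 127: single-byte ASCII
        return 1
    if first_byte >> 5 == 0b110:     # two-byte sequence
        return 2
    if first_byte >> 4 == 0b1110:    # three-byte sequence
        return 3
    if first_byte >> 3 == 0b11110:   # four-byte sequence
        return 4
    raise ValueError("not a valid UTF-8 lead byte: 0x%02x" % first_byte)
```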

@zbeekman
Contributor

Here’s another thought in terms of managing speed and UTF8/UCS4:

  • UCS4 enabled json-fortran will read both UTF8 encoded files (the slower, safe way) and ascii files (the fast way)
  • On output, UCS4 characters > 127 (ascii set) get converted to the ‘\u####’ format and saved without UCS4 encoding so that they may be read in quickly by json-fortran using the stream-io + unformatted trick. What do you think, @jacobwilliams ?
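
The proposed output transformation might look like the following sketch (illustrative only, not json-fortran's actual escaping routine; note that code points beyond the Basic Multilingual Plane need a UTF-16 surrogate pair):

```python
def escape_non_ascii(s):
    """Replace characters above the ASCII range with JSON \\uXXXX escapes,
    so the result is pure ASCII and safe for fast stream reads."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append('\\u%04x' % cp)
        else:
            # beyond the BMP: encode as a UTF-16 surrogate pair
            cp -= 0x10000
            out.append('\\u%04x\\u%04x'
                       % (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)))
    return ''.join(out)
```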

@zbeekman
Contributor

Well, those were all nice theories, but after doing some tests, it seems—at least on Mac OS X—that there is no way to determine the encoding or contents of the file simply using the inquire intrinsic… files containing ascii text are completely indistinguishable whether or not they are written with UTF8 encoding. I guess this makes sense….

So it seems that, other than examining the leading bit of each byte read in (which I imagine would be slow… or at least difficult to code), there is no good way other than formatted IO to deal with UCS4 characters.

So, in light of this, this might be a good thing to let the client code control: provide a safe but slow API to read files with non-ascii, UCS4 characters in them, as well as a fast API that will read files containing only ASCII characters correctly, but will fail for UCS4.

Thoughts?

@zbeekman
Contributor

btw, here is the code I was using to test:

program main
  use iso_fortran_env ,only : file_storage_size
  implicit none
  integer, parameter :: CK=merge(tsource=selected_char_kind('ISO_10646'),&
                                 fsource=selected_char_kind('DEFAULT'),&
                                 mask=selected_char_kind('ISO_10646') /= -1)

  integer :: lun

  character(kind=CK,len=:),allocatable :: ucs4str
  character(len=:),allocatable :: plainstr
  character(len=16) :: strm, enc, frm, unfrm
  integer :: sz

  ucs4str = CK_'Hello World!'
  plainstr = 'Hello ascii world!'

  open(newunit=lun,file='utf8.txt',access='stream',encoding='utf-8',form='formatted')
  write(lun,'(A)') ucs4str
  close(lun)

  open(newunit=lun,file='ascii.txt',access='stream',form='formatted')
  write(lun,'(A)') plainstr
  close(lun)

  ucs4str = CK_'\u3053\u3093\u306b\u3061\u306f\u4e16\u754c' ! hello world in japanese

  open(newunit=lun,file='hello-jp-utf8.txt',access='stream',encoding='utf-8',form='formatted')
  write(lun,'(A)') ucs4str
  close(lun)

  open(newunit=lun,file='ascii-utf-8.txt',access='stream',encoding='utf-8',form='formatted')
  write(lun,'(A)') plainstr
  close(lun)

!  stream encoding size formatted unformatted

  print'(A)','stream          '//'encoding        '//'formatted       '//'unformatted     '//'size '//'file'
  inquire(file='utf8.txt',stream=strm,encoding=enc,formatted=frm,unformatted=unfrm,size=sz)
  write(*,'(A,I4,A)') strm//enc//frm//unfrm, sz, ' utf8.txt'
  inquire(file='ascii.txt',stream=strm,encoding=enc,formatted=frm,unformatted=unfrm,size=sz)
  write(*,'(A,I4,A)') strm//enc//frm//unfrm, sz, ' ascii.txt'
  inquire(file='hello-jp-utf8.txt',stream=strm,encoding=enc,formatted=frm,unformatted=unfrm,size=sz)
  write(*,'(A,I4,A)') strm//enc//frm//unfrm, sz, ' hello-jp-utf8.txt'
  inquire(file='ascii-utf-8.txt',stream=strm,encoding=enc,formatted=frm,unformatted=unfrm,size=sz)
  write(*,'(A,I4,A)') strm//enc//frm//unfrm, sz, ' ascii-utf-8.txt'
end program main

If you compile with gfortran add the -fbackslash flag.

@jacobwilliams
Owner Author

This is still on my to-do list... just haven't had time to do it yet.

@zbeekman
Contributor

no worries, I’ve been slammed too

@jacobwilliams
Owner Author

Update: Good news and bad news!

I'm getting this all merged together now (see the unicode-speed branch)...not finished yet, but I think I have most of what I did a couple months ago working again. It seems to compile/run fine with the gfortran 5.0 build I was using before.

However, I also updated my laptop with the latest gfortran 5.1 build (from http://coudert.name) and with that one the unicode tests all fail (even for the master branch). Haven't looked into it in detail yet...

@zbeekman
Contributor

zbeekman commented May 4, 2015

Oh, nice find: gfortran 5.1 for OS X. I’ll try to take a look at the test failures in a few days. I’m guessing this is a regression on gcc’s part, but I’ll take a look. If it is a regression, it would be great to get them a bug report &amp; reproducer ASAP so they can fix it before the compiler is released.

@jacobwilliams
Owner Author

I went ahead and merged this into master. Note: the STREAM read mode is not enabled when using Unicode...still need to look into that further.

@zbeekman
Contributor

zbeekman commented May 5, 2015

I think that, due to the nature of UTF-8 encoding, other than rolling your own UTF-8 parser it will be hard/impossible to use stream reads for unicode, since UTF-8 encoding is variable-length per character to maintain backwards compatibility with ASCII. Not that I’m trying to discourage you, but I think at the end of the day we need compiler support for fast reads of UTF-8 encoded characters…
