
Tidy needs some test procedures! #330


Closed
geoffmcl opened this issue Dec 2, 2015 · 40 comments

@geoffmcl
Contributor

geoffmcl commented Dec 2, 2015

This is an important topic... and needs a champion to solve it!

Several posters have offered parts of a solution... please read them... 1, 2, 3, and more...

At this moment @vielmetti seems to have the best solution for this, but, as with the other test solutions, certain issues still need to be resolved...

Really we do not want to replace one lame solution with another lame solution - not that the TAP idea is lame! But what do we gain here?

At present a test.t requires that we re-write each test! At what cost for each new test? Wow, is that really necessary? As expressed, can we not leverage the existing files in test/input and test/testbase?

And it does not yet seem to address certain issues that need more than a simple exit-value test. It needs to compare against the previously known tidy output, both the 'message' and the 'html' output, to see if something has changed! And that can depend on the options used.

The current #327 seems to be such a case. It seems solved, until you add the option --show-body-only yes, when the warning is erroneously still shown. It should fail with this option. How do we deal with this?

Really we need help on this. Tidy needs a solution for this... but I am sorry, I do not see it yet!

@balthisar
Member

This is a huge engineering task in its own right. We could start by defining some specific, stated goals for testing; stating the goal as "Tidy works properly" is overly broad.

Tidy was written long before unit testing was popular, and proper unit testing would require refactoring a lot of Tidy code. Some simple test cases could be made for the major services without too much refactoring, though.

I'm not certain that unit testing will tell us enough about Tidy as a whole, though, and so some type of integration or functional testing is the real requirement. Integration testing is hard without being able to trust that the individual units are always working, and so we're back to unit testing.

Regression testing is the only type of testing that's performed now, but as @geoffmcl mentioned, the inputs are often decoupled from the option settings, and so if we can take those into account we could turn this into a kind of functional test.

The output comparison should consist of two things. First: does the tidied HTML match the expected output? I think we have most of that in testbase. Second, the message output: capturing it as literal text could be dangerous, because we're also looking at localization, and/or the English messages could change. We could use the raw data from one of the message callback filters, though, and record just the error code and location data.
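Since the current scripts drive the tidy executable rather than libtidy, a script-level stand-in for that idea (not the callback API itself) could reduce each report line to just its location and severity, discarding the translatable text. A minimal Ruby sketch, assuming the usual "line N column M - Level:" message shape; the function name is invented:

    # Reduce tidy's message output to (line, column, level) triples,
    # ignoring the translatable message text entirely.
    def normalized_messages(error_text)
      error_text.each_line.map do |line|
        # matches e.g. "line 12 column 5 - Warning: <img> lacks ..."
        m = line.match(/^line (\d+) column (\d+) - (\w+):/)
        m ? [m[1].to_i, m[2].to_i, m[3]] : nil
      end.compact
    end

Comparing those triples instead of raw strings would survive wording changes, though not changes in what is reported or where.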

Do we all want to learn Python? Ruby? Make it non-portable and require bash? Or maintain something in Tidy's native C?

The issue of configuration differences isn't small, either. Given what I think are 96 different options with possible option values, do we run each test with all possible combinations of options? Or do we introduce the possibility of human error, and try to figure out the best options on a case-by-case basis? show-body-only is an obvious one. We can probably ignore tidy-mark, but there are several that can have unexpected interactions with each other unless we test for them. But it's chicken and egg, because right now we don't know how they interact without manually reviewing them.

I suppose we could build a program to build the tests. Given each test case, we'd programmatically adjust every option, capture the output, and compare everything against everything for differences, but this may still give us a hundred or more variations of the same test, depending on options interacting.
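As a very loose sketch of such a test-builder, in Ruby for consistency with the later discussion: toggle one boolean option at a time against a base run, and keep only the options that actually change the output for a given case. The option list and case file below are placeholders, not a vetted subset:

    require 'digest'
    require 'open3'

    BOOL_OPTIONS = %w[show-body-only tidy-mark drop-empty-elements]  # ...and so on

    # Digest of the tidied HTML for a case, with optional extra arguments.
    def output_digest(case_file, extra_args = [])
      out, _err, _status = Open3.capture3('tidy', '-quiet', *extra_args, case_file)
      Digest::SHA256.hexdigest(out)
    end

    base = output_digest('cases/case-1002509.html')
    significant = BOOL_OPTIONS.select do |opt|
      output_digest('cases/case-1002509.html', ["--#{opt}", 'yes']) != base
    end
    puts "Options that alter this case: #{significant.join(', ')}"

This trades the combinatorial explosion for one run per option per case, at the price of missing option interactions - which is exactly the chicken-and-egg problem above.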

My head hurts thinking about it.

@balthisar
Member

I've weeded the possible formatting options down - very, very loosely and without too much thinking - to 62. Most of these are binary, but some aren't; even if they were all simple yes/no, that's still 2^62, about 4.6x10^18, possible output variations for a single document. Granted, a lot of them won't differ, because a lot of test cases don't include (e.g.) repeated attributes or php tags or things like that.

That 62 discounts things like file encoding, verifying that the correct number of tabs/spaces is used, making sure wrap works at 80 vs. 68, etc.

@joeheyming

Looking at what is currently in the test/ directory, I think the MS bat files are not a good idea, since they are not portable. Any other cross-platform scripting language would be better (and might encourage others to write more tests).

I think the only option we have at the moment is to write black-box tests. I propose that we first get continuous integration running (Travis CI), so that for any merge to master the tests get run. This will force us to start thinking in a cross-platform way. Then we can start converting the existing tests to run in the CI suite. Then we can start talking about unit tests.

@geoffmcl
Contributor Author

@joeheyming thank you for starting to look at this...

The original tests, still all there, are in unix shell scripts, which are also not too portable. Charles and I added the MS cmd/bat files to mirror that unix system, and in some ways added a little to it...

It is correct to separate, for want of a better word, the current regression tests from any more formal unit testing. That is a different topic, maybe Test Procedures 2. And as @balthisar points out, that may require some restructuring, testing the interaction of the options, etc.... a much bigger topic.

So, just concerning the tests we have: how can these be dealt with in a cross-platform way?

The suggestion is first we get Travis CI running, then we will start thinking? HUH? I would suggest we start thinking before we automate the tests ;=))

Get some aims. Know what we are trying to achieve... fix the tests, get them clean... then, and only then, automate, if need be... not so much chicken and egg, more like putting the cart before the horse...

The present aim of the first 228 tests in testcases.txt is to compare the exit code with the expected value, and to compare the output with the testbase output. The scripts use the ../build/cmake/tidy binary, and process each case to an output file. They do not presently do the last step, the diff between the output and the testbase folder; that must be done manually... but it could be added easily...
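For illustration, the loop just described is small enough to sketch in Ruby, including the missing diff step. The in_/cfg_/out_/msg_ naming and the testcases.txt format here are assumptions based on the test/ layout, so treat the exact names as placeholders:

    TIDY = '../build/cmake/tidy'

    File.readlines('testcases.txt').each do |entry|
      name, expected = entry.split
      next unless name                       # skip blank lines
      system("#{TIDY} -config input/cfg_#{name}.txt -o tmp/out_#{name}.html " \
             "input/in_#{name}.html 2> tmp/msg_#{name}.txt")
      actual = $?.exitstatus
      warn "#{name}: exit #{actual}, expected #{expected}" if actual != expected.to_i
      # the missing last step: compare against the known-good testbase output
      system("diff -u testbase/out_#{name}.html tmp/out_#{name}.html")
    end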

Is this sufficient? Does this give us what we want? Well yes, I think so. It quickly tests some things for change - but only some things - and this should/could have been done before it was pushed to master, not after! Get in front of the problem, not just be in catch-up mode...

If you want or need to experiment, do it in a branch, and make sure you run these tests before offering a PR.

So I would like help, discussion, and ideas on getting this working now, in as cross-platform a way as possible. Including maybe adding some more documentation on how to run them, what to expect, etc... see test/README.md, and maybe add to CONTRIBUTING.md...

Concerning Travis CI, I would like some more experience with this before attaching it to htacg tidy, even if we want that. But I do need some help setting up my fork. I did experiment a little a few months ago, but did not get very far...

Does someone have a .travis.yml file that I can try? Does someone have something working now that I can look at? Another fork? @vielmetti did this, see #266, but I have not heard from him in a while, and it seemed to involve re-writing each of the tests???

I definitely want to experiment and see what is possible, before maybe recommending it to htacg tidy. It should first use what we have. And new, different things should be thought about, discussed, set up, experimented with, etc., all before other tests are considered, added...

So there are a few issues here -

  1. Look at, discuss, fix, repair the current set of tests.
  2. Document them, and make them easy to run by every developer.
  3. Experiment, and test what Travis CI can bring.

I look forward to help, ideas, patches, and PRs on these topics... thanks...

@joeheyming

If I have any time to look at this, I would probably investigate using Python (or maybe Perl, since it has good text-manipulation abilities). I think Python is safer/hipper, but may require more legwork to get the right set of test utilities.

The easiest way to write a Python test would be to use the unittest module. That framework provides a JUnit-style way of doing test setup/teardown. Given that the tests written adhere to that framework, you can then build test tools common to setup/teardown. Think a test/lib/ (or tools/) directory.

I agree with the horse-before-the-cart thing. I was actually thinking that exact phrase after I sent my earlier comment :-p

Travis.yml is simple. Given that you can run any test from the command line, you can make Travis invoke those tests (via a glob) and succeed/pass if their return codes are sane (zero). I have even written unit tests for Lisp with Travis. Here is an example in one of my python projects: https://github.com/joeheyming/redef/blob/master/.travis.yml

@geoffmcl
Contributor Author

@mcepl suggests another neat target, $ make check - see #266. Maybe he will come back with a patch or PR... thanks...

And re-reading all of #266 gives some discussion of the differences when the tests are run on different OS/CPU combinations, some of which were never resolved... one comment has a patch to fix the path separator... etc... lots of reading and information there...

This is issue 1: look at, discuss, fix, repair the current set of tests.

@balthisar
Member

Well, I've posted a testing branch with my take on it. Everything is completely self-contained in testing/benchmarks/, and there's a pretty comprehensive Ruby script regression_test.rb that performs the testing.

In this case I'm not looking at any of the exit codes; I'm only comparing the error output and the generated HTML. There are still some files that don't pass; I've not root-caused them completely, but they probably relate to the comments @geoffmcl made above.

To have a try, check out the testing branch, make sure you have Ruby installed, and cd into the testing/benchmarks/ directory. Although it's not a Ruby gem, using bundler will ensure that the dependencies are met. If necessary on your system, gem install bundler, and then in the benchmarks/ dir, bundle install.

Running the script without arguments will provide all of the instructions needed to get started.

As of right now, generating new test-result files isn't implemented, but I'll work on that as time permits. However, all of the regression testing functions, and I think the report that it prints out (and saves) is full of useful information.

I've duped the test cases into benchmarks/cases/ and renamed some of them to conform with the testing application's file-naming conventions. A look in the cases dir should make it pretty apparent what's required.

In addition to @geoffmcl's standard test cases, I've also moved in all of the other test cases, and where required, generated config files for each of them (I've still not added the expectations files, though).

I think it should be trivial to code around some of the platform differences, or at least provide an explanation for those specific cases.

Why Ruby? Because it will run on any OS, unlike bash or cmd. Ruby also has a lot of power for a scripting language, meaning I could deliver something a little better than "bare bones" in a limited amount of time. Ruby's fairly easy to read, and I used my overly verbose commenting style throughout, so there should be few surprises.

Things to consider in the testing so far:

  • It's comparing the error output.
  • It's comparing the tidied HTML.
  • It's NOT looking at Tidy's exit code.
  • I'd like to consider using the --quiet output, which provides all of the error messages without the extra text at the bottom. I think this should be sufficient, but opinions are wanted. It would simplify testing and be more robust against future string changes.

Also on the to-do list is implementing all of the expect files for the other, non-core tests. I'll update the testing branch as I get around to it.

Adding additional test cases is very easy by following the naming conventions. Just drop the files into cases/.

Finally I think I've made it as cross-compatible as possible. I'll install dev environments into my Windows and Linux VMs just to double-check, but I'm pretty sure I covered all the bases.

@balthisar
Member

Now the canonize command works, but I've NOT run it on the current cases/. I'll troubleshoot the current failing cases, first.

I may look at some convention to fix path issues and look at processor-specific things, too, although I think #266 is a bug, not a processor feature.

I'm definitely getting some type of encoding issue on case-427664 that I have to troubleshoot.


Right now the testing should be 100% backwards compatible barring differences in string output. String output kind of sucks.

If we're willing to abandon backwards compatibility, it would be much nicer to add a new flag to tidy to present only raw data for error output, such as error code, line, column, instead of the English text.

Also, see my earlier comment about standardizing on --quiet.

It depends on what we risk losing by dropping this.

@mcepl

mcepl commented Dec 21, 2015

@mcepl suggests another neat target, see 266 of $ make check. Maybe he will come back with a patch or PR... thanks...

He will not. Sorry. Too much on my plate already.

@balthisar
Member

Case in point re: the above. Some of the sample tests are failing because I changed some non-standard English to more standard English, and the literal string comparisons are failing.

I won't reverse the string changes just to get it to pass a test, but this is a good argument for eliminating the English output altogether with, e.g., a --diagnostic switch on the command line, whereby only error codes are reported instead of strings.

I'm just thinking aloud as I encounter the issues...

@balthisar
Member

Final report of the night. There's a huge encoding difference, but I'm not sure if it's due to Ruby, or a Mac OS/Windows difference.

On my Mac, running all of the tests, these are the failed cases, with explanations:

case-1062511.html   Fails ERRORS because I changed the English string.
case-1410061b.html  Fails ERRORS because I changed the English string.
case-1410061c.html  Fails ERRORS because I changed the English string.
case-1426419.html   Fails ERRORS because I changed the English string.
case-1632218.html   Source contains the Euro symbol €; expect contains the URL-encoded version %E2%82%AC; my system generates %C3%84.
case-427664.html    Text editor complains that -expect.txt is bad UTF; it indicates line 5 column 1 - Warning: <body> attribute name "  1/2" (value="xx") is invalid (with inverted question marks). My system generates line 5 column 1 - Warning: <body> attribute name "Ã1/2" (value="xx") is invalid. The source file contains <body √1/2="xx" width= align="">.
case-427672.html    Identical to above.
case-427823.html    Fails ERRORS because I changed the English string.
case-431721.html    Fails ERRORS because I changed the English string.
case-445557.html    Uses default config. The expect file uses &# entities. My system generates the Ò unicode character.
case-588061.html    Fails ERRORS because I changed the English string.
case-661606.html    Fails, but they LOOK identical in a text editor, and RENDER identically in a browser. I've not done a binary compare yet.

I've still not installed a dev environment on Windows or Linux, and haven't begun to dig into Tidy's internals or Ruby's, nor into the source file formats (for example, case-661606 is Western Mac OS Roman, a really old legacy format).

@geoffmcl
Contributor Author

@balthisar this looks like a great effort... thanks... it is good someone else begins to see the current mis-matches, and tries to understand, fix, change, accept, or handle them ;=))

After jumping through a few hoops I got a working Ruby installed and running, my first in Windows - I chose version 2.1.7 (32-bit)...

But I could never get the tidy tests to run... the problem seems to be the setting of the path to the tidy.exe to use...

So I switched to Ubuntu 14.04 LTS to give it a try there... well, it had no trouble using an installed tidy binary (/usr/bin/tidy)... it seemed to run all the tests, but I got an error in make_report!

./regression_test.rb:140:in `make_report`: uninitialized constant TidyRegressionTesting::TidyTestRecord::DateTime (NameError)

But then I wanted to use something other than an installed tidy version - very necessary, since I do not want to install this as-yet-untested tidy - and ran into the same problem as in Windows when I tried to add a binary option, like -

$ ./regression_test.rb -t ../../build/cmake/tidy rtest

All I get is the rtest usage??? And when I ran this without the rtest, it showed me the correct path to tidy, but not the correct version??? Which in this case is the localizing version...

But I realise this is a first cut, needing lots of time, effort, thought, and testing... and I can not wait until maybe it matures a little... thanks again... this seems another step down an interesting road...

It does add a Windows Ruby dependency, not normally present, but OTOH it does not seem too extreme to expect some tidy developers to take that extra install step, if they get better, more thorough regression testing, and perhaps other testing...

I have been experimenting with a Test-Tidy app, and must take the time to look closely at the mis-match cases I found there, and see if they compare to those you mentioned. Other, of course, than those where you have changed the string, if that is the only difference...

My list from that testing is 445557, 500236, 647255, 649812 and 658230!

I would really like to understand, and do something about these exceptions... they niggle me all the time...

@balthisar
Member

Hmmm... thanks for trying it out. I've got a working Ruby on Windows now... let me see what's not working.

First off, apparently Windows doesn't know what to do with shebang lines, so it's necessary to run ruby regression_test.rb. It looks like it's still hanging up on popen, which is supposed to be supported in Windows. Let me try to troubleshoot that.

I'll move to Linux next. I think, though, that the command line interpreter wants to see the command "rtest" before the options. I'll clarify that in the help.

@balthisar
Member

Latest commit now verified to work on Windows. I still have to spool up a Linux VM to find out what the issue is there.

ruby regression_test.rb help -t ..\..\build\cmake\tidy.exe
ruby regression_test.rb help

Both show the correct path and version string on my system (the first from the build directory, the second from Tidy in $PATH). It also runs through the regression test and canonization process without issues.

Windows test failures (5.1.25)

case-427664.html              config_default.conf             PASSED   FAILED
case-427672.html              config_default.conf             PASSED   FAILED
case-431889.html              case-431889.conf                PASSED   FAILED
case-433012.html              case-433012.conf                PASSED   FAILED
case-588061.html              config_default.conf             PASSED   FAILED
case-649812.html              case-649812.conf                FAILED   PASSED

Mac failures (5.1.24)

case-1632218.html             config_default.conf             FAILED   PASSED   
case-427664.html              config_default.conf             PASSED   FAILED   
case-427672.html              config_default.conf             PASSED   FAILED   
case-427827.html              config_default.conf             FAILED   FAILED   
case-431874.html              config_default.conf             FAILED   FAILED   
case-431889.html              case-431889.conf                PASSED   FAILED   
case-445557.html              config_default.conf             FAILED   PASSED   
case-661606.html              case-661606.conf                FAILED   PASSED 

I'll have to dig into the differences between the failures after performing a Linux test. I'm guessing these are related to character encoding, though.

@mcepl

mcepl commented Dec 22, 2015

I still have to spool up a Linux VM to find out what the issue is there.

Why not make Travis-CI work?

@balthisar
Member

Why not make Travis-CI work?

Because I know Linux quite well, and don't know a thing about Travis-CI. For my own projects' CI needs, Mac OS X Server has a perfectly capable CI solution that works with GitHub, so I've never bothered to look at third parties. Still, we don't know that my experimental solution will be adopted, and so one step at a time. ;-)

@mcepl

mcepl commented Dec 22, 2015

Not sure what to think about all those missing test files (that's 868c80c on this repo, branch testing) ... typescript.txt

@balthisar
Member

Test cases are present, but there's nothing to test them against -- yet. Aside from the standard test cases, there are all of the accessibility tests, and the old html4 and html5 tests. They all used to use their own testing scripts and test output separately.

I've generated config files for all of them based on the old test scripts, but I've not yet generated the files to compare the output against. I'll have to go through all of that manually to ensure that they work as expected.

For now the output warns you that, hey, these files can't be tested because of the missing files.

@geoffmcl
Contributor Author

@balthisar ok it now runs in Windows! Thanks ;=)) But in unix I get the same error as @mcepl!

regression_test.rb:219:in `make_report': uninitialized constant TidyRegressionTesting::TidyTestRecord::DateTime (NameError)

Concerning the version of tidy in the tests, maybe line 540 in regression_test.rb is the problem: tidy_out.split.last.scan(/\d+/).join('.'). It seems this split needs to include more than numbers, like 5.1.32localizing, so I know exactly which tidy is being used. Sorry, I do not know Ruby well enough to correct that... but why is it not just tidy_out.split.last? That seemed to work for me...
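For what it's worth, the one-line change suggested above would be (untested against the script itself):

    # keeps suffixes such as "5.1.32localizing" intact:
    version = tidy_out.split.last
    # whereas the current digits-only form reduces it to "5.1.32":
    version = tidy_out.split.last.scan(/\d+/).join('.')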

But what have you added to these tests?

It seems you have now added the contents of the test\html5 folder, and sub-folders??? These are not regression tests! They were just development files I added while I was working on adding html5 support, way back in my fork... and when htacg merged my fork, they were just sort of left there... and the topic sort of did not come up...

I did think about formalising them, but as the 20140805 README.txt I wrote says, it was more about comparing to tidy 2009 output than comparing two modern, current versions of tidy... I have not used them for ages... they were just for development... they perhaps should have been deleted...

I certainly do not think they should be added to your benchmarks/cases!

Now, if you want to add more tests, then note I have been keeping a test\input5 folder, and just about every bug I have worked on has one or more test cases in there. Over 300 files! I have developed batch files to run these cases, but I have not added a base compare... that could be done. Using these would certainly be a regression test for each recent bug.

Be aware it includes files for the re-testing of some of the old SF bugs, some edbrowse bugs, and assorted others... It is quite a MESS really... so again I am not sure if this is the time to add those... but advise and I will push them...

And I have not checked, but what about the list in xmlcases.txt? I am fairly sure I did not add a testbase output for them... not sure...

But I do not think we should be looking to expand, to increase the tests, at this time. As stated, it seems the first priority is to achieve a clean run, on any/every OS/CPU, against the testbase ;=)) a proof of concept...

Access Testing

Now part of that base testing was to do the access tests. Although I have not checked carefully or fully yet, it seems the access tests do need some TLC - see #338, which was reported so long ago, and still exists within tidy today...

Of course these access tests do NOT compare the output with another base output, but rather check that the expected access warning value is shown. That is, they use grep (or find in Windows) to find the expected value, say 1-1-1-1, in the current message output...

Maybe this could be another test type, like atest instead of rtest, or something... and as seen in my #338 case, maybe some of the access warning values and/or descriptions, and maybe when they are emitted, do need some work...
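A minimal sketch of such an atest, in the same Ruby style, assuming the expected warning value is stored per case; -access <level> is tidy's accessibility-check level, and the file names here are invented:

    require 'open3'

    # Pass if (and only if) the expected access warning value appears
    # anywhere in tidy's message output for this case.
    def access_test(tidy, case_file, expected_value, level)
      _out, err, _status = Open3.capture3(tidy, '-quiet', '-access', level.to_s, case_file)
      err.include?(expected_value)
    end

    puts access_test('../build/cmake/tidy', 'cases/case-access-1.html', '1-1-1-1', 1)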

Missing files

Now, you mention missing testbase files, but I am sure you do realise that in certain tests there is no html output, due to the fact that tidy detected an html error and --force was not added to the config. It would thus be an error if an output were found! IIRC there were 6 of this type - 427676, 434100, 647900, 1002509, 1063256, 2046048 - but I need to check... these are the tests that exit with a 2...

And then of course there is 431958, which I only 'discovered' recently! It uses write-back: yes in its config! That is, it modifies the input file!!! So there will be no html in testbase for it. Interestingly, only $ git status would show if this test failed, in that there should be no write-back modification of the input ;=))

Then there is case 431895! The only test to use gnu-emacs: yes in its config. And this must show a difference in the path separator between unix and windows. I note you did not include a case-431895-expect.txt. But how does rtest avoid showing this as a problem in one or the other OS, depending on which -expect.txt is added? Can we have, say, two expects, like an expect-u.txt and an expect-w.txt, and switch the compare between unix/win32?

As repeated, I hope we can go back to just testcases.txt, plus perhaps xmlcases.txt, and solve, decide on, discuss, fix, the specific problems in those, so we can have a clean run...

And maybe consider something different for access testing... this would go a long way to showing the viability of this Ruby scripted solution...

Just my 10 cents ;=))

OT: While I am not suggesting Travis CI for htacg tidy, at least not yet, I am trying to experiment with it from my fork. @mcepl do you have something running in Travis CI? See https://travis-ci.org/geoffmcl/tidy-fork... Seems I need rake or something... do you have a working .yml file? Any help appreciated...

@mcepl

mcepl commented Dec 22, 2015

@mcepl do you have something running in Travis CI?

Plenty, but I am a Python guy, so I'm not sure how helpful that is to a Rubyist. https://travis-ci.org/mcepl/

@geoffmcl
Contributor Author

@mcepl, hey, I am not a Rubyist ;=)) Nor am I a Python guy ;=((

I am a C/C++ coder, and only use scripts where I really have to! And even then usually only until I write some C/C++ to do the same thing... usually not because it is faster, or easier, or anything, ... but because I love C/C++ coding...

But forget 'testing' for now... can you set up your fork to just build on Travis CI... that is, clone the repo, cd build/cmake; cmake ../..; make;... that would be a great start... thanks...

@mcepl

mcepl commented Dec 22, 2015

But forget 'testing' for now... can you setup your fork to just build on Travis CI... that is clone the repo, cd build/cmake; cmake ../..; make;... that would be a great start... thanks...

mcepl/tidy-html5@2d04638f40 seems to work, except for problems with Travis CI having too old a cmake, which is something you can fix better than I can. See the results on https://travis-ci.org/mcepl/tidy-html5/builds/98394012

Of course, all that install: (with gem --version) is just an example of where the installation of gems would go, if you need it. If not, then both of these lines can go.

@balthisar
Member

@geoffmcl, I'm not home and thus spent much of yesterday trying to download an Ubuntu image and getting the VM to work through a proxy. Linux will be fixed soon!

I would argue that the HTML4/5 files are regression tests. While it would take a pretty major effort to cause a regression in Tidy, it's something that's possible. Maybe they could be condensed into a single file, though.

I can write a new method for the access tests, but my thought was to treat them identically to the other regressions. Matching a single string is already encompassed by the regression test, so it can be convenient just to issue rtest and be done with it.

A simple tweak can introduce traversing directories, so maybe I can at least break up the cases into categories by directory.

Right now the report generates a lot of noise due to the missing comparison files. The html4/5 tests and access tests all pass on 5.1.24, and so I think it's safe to use canonize on them; that will squelch the missing-file warnings, and they'll be tested.

I'll make sure to capture non-forced output. I thought I already handled that case but will have another look.

Oh, and the XML is already included, tested, and working!

@geoffmcl
Contributor Author

@mcepl thank you... I have dropped that cmake version back to 2.8.7... There was no real reason to have a minimum version of 2.8.8... and it could be even lower... tidy's CMakeLists.txt makes no use of cmake features added since probably 2.8.0, and probably LONG before that... maybe even, say, 2.6...

I did not understand the install: target comment... We would only need to install in Travis CI if something else only worked when the package is installed... but to me everything else just correctly depends on creating a build/cmake/tidy executable... it does not need to be installed... By default it is linked with a static libtidy, so unless other cmake options are added, the tidy exe is self-contained, and can run from anywhere...

Can you try the build again? thanks...

@geoffmcl
Contributor Author

@balthisar yes, I can understand that my accidentally added HTML5/4 files could be used in regression testing, even given that they did not have that original purpose, but they would need to be culled heavily, and a basetest standard output added... just work, effort, thought, and discussion on what is being added, and checking that it is not already covered by other tests...

Removing them would reduce a lot of noise, and the work of adding them to the base, i.e. the -expect.txt/html files, that is all...

Hmmmm, I do not think access testing should be, or can be, based on the same basetest file comparison. If we correct one access message, or suppress it due to html5, etc., thus changing the output text, then that should not invalidate perhaps several other access test comparisons, for the wrong reason?

No, the access tests were designed to output a specific access value, and the test's success or failure depends on that alone. Not on what else was output... or have I got the wrong idea here?

Maybe you do "capture non-forced output", although I do not quite understand that... do you want the ruby script to read and parse the config to see if force-output: yes is there, to decide whether there will be no html output??? A quick check of the input config files shows only 647900 as having it... that seems excessive... and difficult... again, maybe I do not understand...

But you seem on a roll with this, so no negative thoughts intended... go for it ;=))

Glad the xml tests were included... and good luck with the Ubuntu VM install...

Oh, I have not yet understood what canonizing them actually means... still to try that option...

@balthisar
Member

The canonize option will generate the -expect files.

For example, the access tests currently pass with the current testing script. Therefore their current output is acceptable to use for future comparisons, and so canonize would generate those files.
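Presumably it is invoked like the other commands, command word first; something like this (the exact syntax and path are assumptions):

    ruby regression_test.rb canonize -t ..\..\build\cmake\tidy.exe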

The current strings are what complicate the testing procedure. Either they never change, or we stop counting on them.

A hacky workaround: I restore the old strings, call the old version of English en_legacy, and use that as our test language. It guarantees backwards compatibility, and, because of string inheritance, making new en strings won't add much to tidy. This addresses the access test concerns, too.

The ruby script doesn't have to check the config. There is at least one zero-byte file. I'll confirm it's from non-forced output (I'm on a mobile phone now). When the script generates zero bytes and compares that to a zero-byte file, it matches.

@balthisar
Member

I've gotten my VM to match my home VM: Ubuntu 14.04 LTS, all updates applied, Ruby 2.2.1p85 installed. If you used apt-get you probably got an older version of Ruby (1.9 or so), whereas RVM nicely gave me modern Ruby. Upon cloning the test repo and building and installing Tidy, the regression test ran perfectly!

I'll have RVM install an older version just to see what happens. In the meantime I pushed what I think is a fix for older Ruby, which doesn't include the date dependency by default.

Let me know if anything else seems broken on Linux!

@balthisar
Member

Well, I hate to be spamming the list, but it's the strangest thing... Ruby 1.9.3p551 has no problem with the DateTime, either, even when not explicitly requiring the library.

@geoffmcl, can you give me a ruby --version for your system? Can you also try ruby regression_test.rb to see if that makes a difference?

@mcepl

mcepl commented Dec 23, 2015

My testing branch is as far as I am able (and willing) to go, given my non-knowledge of Ruby.
It breaks with some weird Ruby problems (see https://travis-ci.org/mcepl/tidy-html5/builds/98488955).

I guess @balthisar should now take it and run with it.

@balthisar
Member

I'll try to have a look tomorrow, or during the weekend (hopefully some unblocked time available!).

@geoffmcl
Contributor Author

@mcepl thanks for your Travis CI lead ;=)) I just needed a working sample... I think the Ruby bug is gone in the latest... you need to rebase your fork, specifically the testing branch...

I have not yet tried the Ruby road, but I was able to run the previous regression tests without too much trouble... I used a .travis.yml file of -

# whitelist - only specific branches
branches:
  only:
    - master
    - testing
language: c
rvm:
    - 2.0.0
addons:
  apt:
    packages:
    - cmake
    - xsltproc
sudo: false
#install:
#    - bundle install --gemfile=test/benchmarks/Gemfile
before_script:
    - cd build/cmake
    - cmake ../..
    - make
    - cd ../../test/
    - ls -l ../build/cmake/tidy
    - ls -l ../build/cmake/
script:
    - ./testall.sh
    - diff -ua testbase tmp
# eof

This gets me back to seeing the problem tests I am concerned about solving, namely -

  • 431895 txt - but just a leading ./ by the script I think - can be fixed
  • 500236 html - can't quickly spot the difference - need to check
  • 661606 html - a whole file diff??? quite strange - need to check

But wow, in general this is not something I want to run each time I do a push... maybe there is a way to have a specific single trigger, like a push that changes only a version.txt file, or something...

Quite fun initially, but frustrating to debug (yaml/compile/versions/...) problems... and too slow to be of any real dev-cycle use - after a push I could fall asleep waiting for the VM to start, clone, set up, ... ;=))

And it seems a testing build overwrites the master... well, nothing of the VM is kept after the finish - of course you can keep the text log (with ANSI color sequences)...

Will continue to experiment, sporadically, now and again... and I notice they have an OS X VM ;=)) thanks...

@mcepl

mcepl commented Dec 23, 2015

I don't know ... https://travis-ci.org/mcepl/tidy-html5/builds/98488955 claims it took 28 sec. If that is too long for you, you really have trouble with patience (yes, I know it is more than that).

@balthisar
Member

Test suite reporting and test case organization is coming along nicely. It's a slow process, but I've encountered a frustrating Windows issue (I've not looked into it yet; I don't have a Windows C dev system installed; suggestions appreciated).

In Windows Tidy is writing -- sometimes, but not always -- the error output to stdout with extra newlines. If you run the current batch of tests, you see it in some -- but not all! -- of the -fails.txt messages.

It's not a Ruby issue. If I use tidy -f errors.txt cases\case-1002509.html then Tidy writes the errors properly. If I redirect STDOUT, like tidy cases\case-1002509.html, then I get a file with garbage newline characters: $0D$0D$0A instead of two newlines. It looks like Ruby is fixing the missing $0A and giving two Windows newlines, so that's nice - except: why is Tidy doing this on Windows?

Interestingly, redirecting stderr on Mac OS and Linux gives Windows CRLF pairs, while -f uses the $20 on Mac and $0A on Linux. For current Mac OS I'd regard that as a bug, as it should use the standard Unix LF.
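For anyone wanting to confirm the raw bytes independently of an editor's newline handling, a quick Ruby check (assuming errors.txt came from one of the redirects above):

    data = File.binread('errors.txt')       # binary read, no newline translation
    counts = Hash.new(0)
    data.scan(/\r+\n|\r+|\n/) { |nl| counts[nl.inspect] += 1 }
    p counts                                # "\r\r\n" entries confirm the $0D$0D$0A pattern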

This is the latest release 5.1.25. I've not browsed the diffs on newer versions, and I'm not filing this as a bug report yet, as I plan to have a look at it first.

As for the testing suite, I was hoping to avoid file access and do everything in memory (capturing stderr) for speed, but I'll simply use a file; then we'll be assured that the test is still backwards-compatible with any potential bug fix.

@balthisar
Member

Current status: after refactoring quite a bit, file comparison is much more sane, and pretty much every test passes. I've still got to fix the few that aren't tested (should be easy to manually canonize them and verify). There are (so far) some platform issues, mostly due to encoding, as well as a string that changed not too long ago.

I can get all of the current tests to pass, and it will give us some pointers for issues on Win/Linux/Mac, but string testing isn't the way forward.

More discussion later; I'm being told I've played too much today!

@balthisar
Member

Today's report

regression_test.rb is really becoming quite stable, and now handles special cases like --write-back and --gnu-emacs in ways that I think are acceptable. Because of issues trying to capture STDOUT on Windows, all of the testing has to be disk-based instead of memory-based, especially if we want to remain backwards-compatible with previous versions of Tidy (it works - with lots of errors - on Apple's built-in 2006 version of Tidy!). However, disk-based testing is pretty slow, especially on Windows. On the other hand, what's a few more seconds of our lives?

The whole thing is organized enough, easy enough to use, and thorough enough that I'm considering a merge in time for the Tidy 5.2.0 release. However, it's a good idea to rebase the current branch into a new, fresh branch before merging, because all of the non-released development cruft can be left behind.

This leaves a question about the existing scripts and test files and so on. They're now redundant, and I'd love to get redundancies out of the system. Mostly I'm referring to the input and testbase files, because maintaining multiple copies as things change can be a PITA. I'll see about migrating the existing testing scripts into "benchmarks" using the new file organization; that way at least we still have the choice of which actual testing scripts to run. I might need help with the Windows scripts, but the sh scripts should be no problem.


Going Forward

Currently, for comparing the error output we're completely limited by strings, and this is going to hurt us when checking backwards compatibility. For example, I corrected several strings in the localize branch, and this will cause a lot of tests to fail. While we can regenerate the regression-test expectations files for the future, this means that testing older versions of Tidy will fail due to language changes. This isn't something that we had to worry about when we only cared about the exit status.

How we deal with this in the future is really the subject of some intense discussion. Here are a few thoughts I had while just brainstorming:

  • Create a custom testing localization
    • Starting with Tidy 5.2.0 we can include en_TY, which includes all of the static error strings that we will never, ever change. Never. Ever. Except to add new strings, but then they'll never change, either.
    • Old versions of Tidy accept but don't use the --language option, and so specifying --language en_TY with older Tidy won't do anything, while using it with localization-capable Tidy will force it to use the never-changing legacy strings.
    • Because of how the localization inherits from a base-language, the impact to the executable size can be really quite small.
    • Thus we're assured that if we change warning and error strings in the default English, it won't affect the results of regression testing.
  • Create a --test-output option (see the mock-up after this list)
    • This would force Tidy to prepend all warning/error output with a header line, e.g., TY_DIAGNOSTIC_OUTPUT.
    • Prepend each message line with something like TY_MESSAGE=[ {line}, {col}, {type}, {msg_num} ] # regular output
    • And TY_STATUS=[ {warnings}, {errors}, {exit status} ] # regular output
    • However, this would forever break our ability to test with older versions of Tidy to check, e.g., what happened in the past.
    • We would never have to worry about exact strings.
    • Writing testing scripts becomes a bit more difficult (most of regression_test.rb is user-friendliness; it's not really a complex script at the core).
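For illustration only, output under such a hypothetical switch might look like this (all values invented):

    TY_DIAGNOSTIC_OUTPUT
    TY_MESSAGE=[ 12, 5, Warning, 245 ] # line 12 column 5 - Warning: ...
    TY_STATUS=[ 1, 0, 1 ] # 1 warning, 0 errors, exit status 1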

I'm strongly favoring the first option.

@balthisar
Member

I've determined that Tidy's error outputs are unique numbers. Once localization is merged, I think I'll revisit this. In order to accommodate extra localization features (such as mssgreportfilter3 using opaque types) I might export the two enums.

Then we might capture tidy's error output opaquely, but only going forward. We won't be able to run regression tests with older versions of Tidy.

@alvestrand

Pardon, but I have a hard time following ... where are the tests that one can run at the moment?
I wish to contribute some bug reports, but first I'd like to verify that my setup passes the current set of tests, whatever they are...

@geoffmcl
Contributor Author

@alvestrand at this moment testing is under HEAVY development in a new repo - tidy-html5-tests...

We are getting there, and shall hopefully shortly add a big notice here... sorry for the delay...

@geoffmcl
Contributor Author

@alvestrand things have massively progressed in the testing repo tidy-html5-tests, thanks largely to @balthisar; the latest is in the refactor branch of that repo. You can forget 90% of the discussion above, since so much has changed since then...

Most of the readme instructions have also been updated in that refactor branch, but to state it simply - assuming unix (it is very similar in windows), that the clones are in the same root folder, and the default repo directory names...

  • Clone the latest tidy from tidy-html5, master branch, currently version 5.1.42.
  • Build the latest tidy 5.1.42, in build/cmake. There is no need to install it. Check that ./tidy -v emits 5.1.42.
  • Clone the latest tests from tidy-html5-tests, and check out the refactor branch.
  • Change to the tools-sh folder, and run ./testall.sh ../../tidy-html5/build/cmake/tidy. There should be no errors reported.
  • Compare the results: diff -ua ../cases/testbase-expects ../cases/testbase-results. It should exit cleanly - no diffs!

If both testall.sh and diff report no errors, you have successfully run some 224 regression tests. If on windows, use tools-cmd, and run alltest.bat path\to\tidy.exe, plus the same diff... Advise if you have any problems...

Now you are ready to make changes in the tidy source... do fixes, add features, change whatever... and at any time you can run the above test suite again, using your new tidy, to make sure you have not triggered some regression problem.

Of course, if your tidy changes changed some message strings, or the order of warning output, then you can expect a difference, but any other difference should be investigated very carefully.

And of course, even then, such a difference may in fact be an improvement in tidy, and when your changes are accepted, merged into tidy, and the version updated, then a final step would be to also update the particular test's expects to reflect that change. In effect the source repo and the tests repo will march forward, version by version. Mismatches can produce many differences...

This testing repo will only function from 5.1.42 onwards. No attempt has been made to provide the correct expects output for earlier versions. But why would you need that?

We shall now steadily add new tests, and variations on tests, like with different configs, etc, but this is starting to feel very solid.

Now we are still testing and refining this regression test process, hence the use of a branch, so your assistance in reporting any problems, ideas, etc would be very much appreciated... thanks...

@geoffmcl
Contributor Author

@joeheyming, @mcepl, @alvestrand, as discussed above, thanks to @balthisar, the testing has been moved to its own tidy-html5-tests repository...

The READMEs there should clearly explain the process. For the moment it is still in a refactor branch of that repo, but we hope that will soon be merged back into master.

Also for the moment we have returned to using batch scripts for Windows, in the tools-cmd folder, and bash scripts for unix, in the tools-sh folder...

But basically, after you have built tidy from this source, in a tidy-html5/build/cmake directory, on the same level as the tests repo clone - that is, in the same root - you should be able to run alltest.bat in the tools-cmd folder for windows, or run-tests.sh in the tools-sh folder for unix, and see the results of presently about 225 testbase tests...

There are other options, like exactly which tidy exe to use, and other tests... consult the README for these...

So any further discussion on testing should be in issues there, and of course forks and pull requests are always appreciated...

We could really do with some help expanding and refining these tests...

Meantime, closing this here... But hope to see you on the other side ;=))
