Tidy needs some test procedures! #330
This is a huge engineering task in its own right. We could start by defining some specific, stated goals for testing; it's overly broad to state that the desired result is simply "Tidy works properly."

Tidy was written long before unit testing was popular, and proper unit testing would require refactoring a lot of Tidy code. Some simple test cases could be made for the major services without too much refactoring, though. I'm not certain that unit testing will tell us enough about Tidy as a whole, and so some type of integration or functional testing is the real requirement. Integration testing is hard without being able to trust that the individual units are always working, and so we're back to unit testing.

Regression testing is the only type of testing that's performed now, but as @geoffmcl mentioned the inputs are often decoupled from the option settings, and if we take those into account we could turn this into a kind of functional test. The output compare should consist of two things: does the tidy'd output match the expected output (I think we have most of that in testbase), and does the message output match? Capturing the message output as literal text could be dangerous, because we're also looking at localization and/or the English messages could change. We could use the raw data from one of the message callback filters, though, and record just the error code and location data.

Do we all want to learn Python? Ruby? Make it non-portable and require bash? Or maintain something in Tidy's native C?

The issue of configuration differences isn't small, either. Given what I think are 96 different options with possible option values, do we run each test with every possible combination of options? Or do we introduce the possibility of human error and try to figure out the best options on a case-by-case basis? I suppose we could build a program to build the tests: for each test case we'd programmatically adjust every option, capture the output, and compare everything against everything for differences, but this may still give us a hundred or more variations of the same test depending on how options interact. My head hurts thinking about it.
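As a minimal sketch of the "raw data" idea, assuming a Python runner (Python being only one of the candidates named above), and working at the command line rather than through the library's message callback: run tidy on a case and reduce each report line to a (line, column, severity) tuple, ignoring the localized message text. The case path and the report-line pattern are illustrative assumptions, not part of any existing script.

```python
import os
import re
import subprocess

# Tidy's usual report format, e.g.:
#   line 12 column 1 - Warning: <table> lacks "summary" attribute
REPORT_RE = re.compile(r"^line (\d+) column (\d+) - (\w+):")

def report_signature(case_file, tidy_exe="tidy"):
    """Run tidy on a case and return (exit_status, [(line, col, level), ...]),
    discarding the human-readable message text so that localization or
    wording changes don't invalidate the comparison."""
    proc = subprocess.run(
        [tidy_exe, "-quiet", "-o", os.devnull, case_file],
        capture_output=True, text=True)
    signature = []
    for report in proc.stderr.splitlines():
        m = REPORT_RE.match(report)
        if m:
            line_no, col_no, level = m.groups()
            signature.append((int(line_no), int(col_no), level))
    return proc.returncode, signature

if __name__ == "__main__":
    # Hypothetical case path, for illustration only.
    print(report_signature("input/case-427664.html"))
```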
I've weeded the possible formatting options down, very loosely and without too much thinking, to 62. Most of these are binary, and some aren't, but even if they were all simple yes/no choices, that's still 2^62, or about 4.6x10^18, possible output variations for a single document. Granted, a lot of them won't be different, because a lot of test cases don't include (e.g.) repeated attributes or php tags or things like that. That 62 also discounts things like file encoding, verifying the correct number of tabs/spaces are used, making sure wrap works at 80 vs. 68, and so on.
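To make the combinatorics concrete, here is a toy sketch of the "program to build the tests" idea: enumerate yes/no combinations of a handful of options and emit one config file per combination. The option names are real Tidy options, but the particular selection, file naming, and output directory are assumptions for illustration; with all 62 options this obviously explodes, which is the point.

```python
import itertools
import os

# A tiny, arbitrary subset of Tidy's formatting options.
OPTIONS = ["indent", "wrap-attributes", "hide-comments", "join-styles"]

def write_configs(out_dir="generated-configs"):
    """Write one Tidy config file per yes/no combination of OPTIONS.
    Four options already give 2**4 = 16 files; 62 would give 2**62."""
    os.makedirs(out_dir, exist_ok=True)
    combos = itertools.product(("no", "yes"), repeat=len(OPTIONS))
    for i, combo in enumerate(combos):
        path = os.path.join(out_dir, f"config-{i:03d}.conf")
        with open(path, "w") as f:
            for opt, val in zip(OPTIONS, combo):
                f.write(f"{opt}: {val}\n")

if __name__ == "__main__":
    write_configs()
```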
Looking at what is currently in the test/ directory, I think that MS bat files are not a good idea, since they are not portable. Any other cross-platform scripting language would be better than this (and may encourage others to write more tests). I think the only option we have at the moment is to write black-box tests. I propose that we first get continuous integration running (Travis CI) so that tests get run for every merge to master. This will force us to start thinking in a cross-platform way. Then start converting the existing tests to run in the CI suite. Then we can start talking about unit tests.
@joeheyming thank you for starting to look at this... The original tests, still all there, are in unix shell scripts, which are also not too portable. Charles and I added the MS cmd/bat files to mirror that unix system, and in some ways added a little to it...

It is correct to separate, for want of a better word, the current regression tests from any more formal unit testing. That is a different topic, maybe. So, just concerning the tests we have: how can these be dealt with, in a cross-platform way? The suggestion is first we get Travis CI running, then we will start thinking? HUH? I would suggest we start thinking before we automate the tests ;=)) Get some aims? Know what we are trying to achieve... fix the tests, get them clean... then, and only then, automate, if need be... chicken and egg? No, more like putting the cart before the horse...

The present aim of the first 228 tests in testcases.txt is to compare the exit code with the expected value, and compare the output with the testbase output. It uses the ../build/cmake/tidy binary, and processes each case to an output. The scripts do not presently do the last step, the diff between the output and the testbase folder. That must be done manually... but could be added easily... Is this sufficient? Does this give us what we want? Well yes, I think so. It quickly tests some things for change, but only some things, and this should/could have been done before a change is pushed. If you want or need to experiment, do it in a branch, and make sure you run these tests before offering a PR.

So I would like help, discussion, ideas on getting this working now, in as cross-platform a way as possible. Including maybe adding some more documentation on how to run them, what to expect, etc... see test/README.md, maybe add to CONTRIBUTING.md...

Concerning Travis CI, I would like some more experience with this before attaching it to htacg tidy, even if we want that. But I do need some help setting up my fork. I did experiment a little a few months ago, but did not get very far... Does someone have a working sample? I definitely want to experiment, see what is possible, before maybe recommending it to htacg tidy. It should first use what we have. And new, different things should be thought about, discussed, set up, experimented with, etc., all before other tests are considered and added... So there are a few issues here -
I look forward to help, ideas, patches, PRs, on these topics... thanks...
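For reference, the "last step of the diff" against the testbase folder mentioned above is small in any scripting language. Here is a sketch in Python (just one of the candidates under discussion); the output directory name is an assumption, while testbase matches the existing test layout.

```python
import difflib
from pathlib import Path

def diff_against_testbase(output_dir="tmp", testbase_dir="testbase"):
    """Compare every generated output file against its testbase twin and
    print a unified diff for any mismatch. Returns the number of failures."""
    failures = 0
    for expected in sorted(Path(testbase_dir).glob("*.html")):
        actual = Path(output_dir) / expected.name
        if not actual.exists():
            print(f"MISSING: {actual}")
            failures += 1
            continue
        exp_lines = expected.read_text(encoding="utf-8", errors="replace").splitlines()
        act_lines = actual.read_text(encoding="utf-8", errors="replace").splitlines()
        if exp_lines != act_lines:
            failures += 1
            print("\n".join(difflib.unified_diff(
                exp_lines, act_lines,
                fromfile=str(expected), tofile=str(actual), lineterm="")))
    return failures

if __name__ == "__main__":
    print("failures:", diff_against_testbase())
```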
If I have any time to look at this, I would probably investigate using Python (or maybe Perl, since it has good text-manipulation abilities). I think Python is safer/hipper, but may need more leg work to get the right set of test utilities. The easiest way to write a Python test would be to use the unittest module. That framework provides a JUnit-style way of doing test setup/teardown. Given that the tests that are written adhere to that framework, you can then build test tools that are common to setup/teardown. Think a test/lib/ (or tools) directory. I agree with the horse-before-the-cart thing; I was actually thinking that exact phrase after I sent my earlier comment :-p A .travis.yml is simple: given that you can run any test from the command line, you can make Travis invoke those tests (glob invoke) and succeed/pass if their return codes are sane (zero). I wrote some unit tests even for Lisp with Travis. Here is an example in one of my python projects: https://github.com/joeheyming/redef/blob/master/.travis.yml
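A rough sketch of what such a unittest-based case could look like, assuming the existing test/input and test/testbase layout; the helper names, the chosen case, and the tidy invocation are illustrative assumptions, not an existing framework.

```python
import subprocess
import unittest
from pathlib import Path

TIDY = "tidy"            # assumed to be on PATH; could point at ../build/cmake/tidy
INPUT = Path("input")
TESTBASE = Path("testbase")

class TidyCase(unittest.TestCase):
    def setUp(self):
        # setUp/tearDown is where shared fixtures (temp dirs, config files,
        # the chosen tidy binary) would live for every test in the class.
        self.case_name = "case-427664"   # hypothetical example case

    def run_tidy(self, case_name):
        infile = INPUT / f"{case_name}.html"
        return subprocess.run([TIDY, "-quiet", str(infile)],
                              capture_output=True, text=True)

    def test_output_matches_testbase(self):
        result = self.run_tidy(self.case_name)
        expected = (TESTBASE / f"{self.case_name}.html").read_text()
        self.assertEqual(expected, result.stdout)

if __name__ == "__main__":
    unittest.main()
```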
@mcepl suggests another neat target; see #266. And re-reading all of #266 gives some discussion of differences when the tests are run on different platforms. This is issue #1: look at, discuss, fix, repair the current set of tests.
Well, I've posted a first pass at a Ruby-based test script. In this case I'm not looking at any of the exit codes; I'm only comparing the error output and the generated HTML. There are still some files that don't pass; I've not root-caused them completely, but they probably relate to the comments @geoffmcl made above.

To have a try, check out the branch. Running the script without arguments will provide all of the instructions needed to get started. As of right now, generating new test result files isn't implemented, but I'll work on that as time permits. However, all of the regression testing functions, and I think the report that it prints out (and saves) is full of useful information.

I've duped the test cases into the new layout. In addition to @geoffmcl's standard test cases, I've also moved in all of the other test cases and, where required, generated config files for each of them (I've still not added the expectations files, though). I think it should be trivial to code around some of the platform differences, or at least provide an explanation for those specific cases.

Why Ruby? Because it will run on any OS, unlike bash or cmd. Ruby also has a lot of power for a scripting language, meaning I could deliver something a little better than "bare bones" in a limited amount of time. Ruby's fairly easy to read, and I used my overly-verbose commenting style throughout, so there should be few surprises.

Things to consider in the testing so far:
Also to do: implement all of the expects files for the other, non-core tests. I'll update things as that happens. Adding additional test cases is very easy by following the naming conventions; just drop the files into the right directory. Finally, I think I've made it as cross-platform compatible as possible. I'll install dev environments into my Windows and Linux VMs just to double-check, but I'm pretty sure I covered all the bases.
Now, I may look at some convention to fix path issues and look at processor-specific things, too, although I think #266 is a bug, not a processor feature. I'm definitely getting some type of encoding issue on case-427664 that I have to troubleshoot. Right now the testing should be 100% backwards compatible, barring differences in string output. String output kind of sucks. If we're willing to abandon backwards compatibility, it would be much nicer to add a new flag to tidy that presents only raw data for error output, such as error code, line, and column, instead of the English text; that's similar to my earlier comment about standardizing on the raw callback data. It depends on what we risk losing by dropping this.
He will not. Sorry. Too much on my plate already.
Case in point re: the above. Some of the sample tests are failing because I changed some non-standard English to more standard English, and the literal string comparisons are failing. I won't reverse the string changes just to get it to pass a test, but this is a good argument for eliminating the English output altogether with a (e.g.) --diagnostic switch on the command line whereby only error codes are reported instead of strings. I'm just thinking aloud as I encounter the issues...
Final report of the night. There's a huge encoding difference, but I'm not sure if it's due to Ruby or to a Mac OS/Windows difference. On my Mac, running all of the tests, these are the failed cases, with explanations:
I've still not installed a dev environment on Windows or Linux, haven't begun to dig into Tidy's internals or Ruby's, and have not yet begun to look into the source file formats (for example, case-661606 is Western Mac OS Roman, a really old legacy format).
@balthisar this looks like a great effort... thanks... it is good someone else begins to see the current testing situation... After jumping through a few hoops I got a working Ruby installed and running, my first in Windows; I chose version 2.1.7 (32-bit)... But I could never get the tidy tests to run... the problem seems to be the setting of the path to the tidy.exe to use... So I switched to Ubuntu 14.04 LTS to give it a try there... well, it had no trouble using an installed tidy binary...
But then I wanted to use something other than an installed tidy...
All I get is an error... But I realise this is a first cut, and needs lots of time, effort, thought, and testing... and I can not wait until maybe it matures a little... thanks again... this seems another step down an interesting road... It does add a Windows Ruby dependency, not normally present, but OTOH it does not seem too extreme to expect some tidy developers to take that extra install step, if they get better, more thorough regression testing, and perhaps other testing... I have been experimenting with a Test-Tidy app, and must take the time to look closely at the mis-match cases I found there, and see if they compare to those you mentioned. Of course, that is other than those where you have changed the string, if that is the only difference... My list from that testing is 445557, 500236, 647255, 649812 and 658230! I would really like to understand, and do something about, these differences...
Hmmm... thanks for trying it out. I've got a working Ruby on Windows now... let me see what's not working. First off, apparently Windows doesn't know what to do with shebang lines, so it's necessary to invoke the script through ruby explicitly. I'll move to Linux next. I think, though, that the command line interpreter wants to see the command "rtest" before the options. I'll clarify that in the help.
Latest commit now verified to work on Windows. I still have to spool up a Linux VM to find out what the issue is there.
Both show the correct paths and version strings on my system (the first from Tidy in $PATH, the second in the build directory). It also runs through the regression test and canonization process without issues. Windows test failures (5.1.25):
Mac failures (5.1.24):
I'll have to dig into the differences between the failures after performing a Linux test. I'm guessing these are related to character encoding, though.
Why not make Travis-CI work?
Because I know Linux quite well, and don't know a thing about Travis-CI. For my own projects' CI needs, Mac OS X Server has a perfectly capable CI solution that works with github, so I've never bothered to look at third parties. Still, we don't know whether my experimental solution will be adopted, and so one step at a time. ;-)
Not sure what to think about all those missing test files (that's 868c80c on this repo, on that branch).
Test cases are present, but there's nothing to test them against -- yet. Aside from the standard test cases, there are all of the accessibility tests, and the old html4 and html5 tests. They all used to use their own testing scripts and test output separately. I've generated config files for all of them based on the old test scripts, but I've not yet generated the files to compare the output against. I'll have to go through all of that manually to ensure that they work as expected. For now the output warns you that, hey, these files can't be tested because of the missing files.
@balthisar ok it now runs in Windows! Thanks ;=)) But in unix I get the same error as @mcepl!
Concerning the version of tidy in the tests, maybe line 540 in regression_test.rb is the problem.

But what have you added to these tests? It seems you have now added the contents of the test\html5 folder, and sub-folders??? These are not formal tests. I did think about formalising them, but as the 20140805 README.txt I wrote says, it was more about comparing to tidy 2009 output than comparing two modern, current versions of tidy... I have not used them for ages... they were just for development... they perhaps should have been deleted... I certainly do not think they should be added to your suite.

Now, if you want to add more tests, then I have been keeping a collection... Be aware it includes files for the re-testing of some of the old SF bugs, and some others... And I have not checked, but what about the existing list? But I do not think we should be looking to expand, to increase the tests, at this time. As stated, it seems the first priority is to achieve a clean run, in any/every OS/CPU, on the base tests.

Access Testing

Now, part of that base testing was to do the access tests. Although I have not checked carefully or fully yet, it seems the access tests do need some TLC - see #338, which was reported so long ago, and still exists within tidy today... Of course these access tests do NOT compare the output with anything. Maybe this could be another test type...

Missing files

Now, you mention missing files... And then of course there is 431958, which I only 'discovered' recently! And then there is case 431895! As repeated, I hope we can go back to just the base tests... And maybe consider something different for the rest...

Just my 10 cents ;=))

OT: While I am not suggesting Travis CI for htacg tidy, at least not yet, I am trying to experiment with it from my fork. @mcepl do you have something running in Travis CI? See https://travis-ci.org/geoffmcl/tidy-fork... Seems I need a working sample...
Plenty, but I am a Python guy, so I'm not sure how helpful that is to a Rubyist. https://travis-ci.org/mcepl/
@mcepl, hey, I am not a Rubyist ;=)) Nor am I a Python guy ;=(( I am a C/C++ coder, and only use scripts where I really have to! And even then usually only until I write some C/C++ to do the same thing... usually not because it is faster, or easier, or anything... but because I love C/C++ coding... But forget 'testing' for now... can you set up your fork to just build on Travis CI... that is, clone the repo, cd build/cmake; cmake ../..; make;... that would be a great start... thanks...
mcepl/tidy-html5@2d04638f40 seems to work, but for problems with Travis CI having too old a cmake. Of course, all that does so far is build...
@geoffmcl, I'm not home, and thus spent much of yesterday trying to download an Ubuntu image and getting the VM to work through a proxy. Linux will be fixed soon! I would argue that the HTML4/5 files are regression tests. While it would take a pretty major effort to cause a regression in Tidy, it's something that's possible. Maybe they could be condensed into a single file, though. I can write a new method for the access tests, but my thought was to treat them identically to the other regressions. Matching a single string is already encompassed by the regression test, so it can be convenient just to issue rtest and be done with it. A simple tweak can introduce traversing directories, so maybe I can at least break up the cases into categories by directory. Right now the report generates a lot of noise due to the missing comparison files. The html4/5 tests and access tests all pass on 5.1.24, and so I think it's safe to use canonize on them; that will squelch the missing-file warnings and they'll be tested. I'll make sure to capture non-forced output. I thought I already handled that case but will have another look. Oh, and the XML is already included, testing, and working!
@mcepl thank you... I have dropped that cmake version back to 2.8.7... There was no real reason to have a minimum version of 2.8.8... and it could be even lower... tidy's CMakeLists.txt makes no use of any cmake feature newer than probably 2.8.0, and probably from LONG before that... maybe even say 2.6... I did not understand the install: target comment... We would only need to... Can you try the build again? thanks...
@balthisar yes, I can understand my html4/5 cases being seen that way... Removing them would reduce things a little... Hmmmm, I do not think so...

No, the access tests were designed to output a specific access value, and the test's success or failure depends on that alone, not on what else was output... or have I got the wrong idea here? Maybe you do "capture non-forced output", although I do not quite understand that... do you want the ruby script to read and parse the config to see if force-output is set?

But you seem on a roll with this, so no negative thoughts intended... go for it ;=)) Glad the xml tests were included... and good luck with the Ubuntu VM install... Oh, I have not yet understood what the canonize step does...
The option canonize will generate the -expects files. For example, the access tests currently pass the current testing script; therefore their current output is acceptable to use for future comparisons, and so canonize would generate those files. The current strings are what complicate the testing procedure: either they never change, or we stop counting on them. A hacky workaround: I restore the old strings, call the old version of English en_legacy, and use that as our test language. It guarantees backwards compatibility, and because of string inheritance, making new en strings won't add much to tidy. This addresses the access test concerns, too. The ruby script doesn't have to check the config. There is at least one zero-byte file; I'll confirm it's from non-forced output (I'm on a mobile phone now). When the script generates zero bytes and compares it to a zero-byte file, it matches.
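The canonize idea is small enough to show in a few lines: for any case that has no expectations file yet, run tidy and record whatever it currently produces as the new expected output. This is a sketch in the spirit of the option, assuming Python; the directory layout and the -expects suffix are assumptions, not the script's actual implementation.

```python
import subprocess
from pathlib import Path

def canonize(cases_dir="cases", tidy_exe="tidy"):
    """For every case without an expectation file, record the current
    output of tidy as the expected output for future regression runs."""
    for case in sorted(Path(cases_dir).glob("case-*.html")):
        expects = case.with_name(case.stem + "-expects.html")
        if expects.exists():
            continue  # already canonized
        cmd = [tidy_exe, "-quiet", str(case)]
        config = case.with_suffix(".conf")
        if config.exists():
            cmd[1:1] = ["-config", str(config)]  # use the per-case config if present
        result = subprocess.run(cmd, capture_output=True, text=True)
        expects.write_text(result.stdout)

if __name__ == "__main__":
    canonize()
```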
I've gotten my VM to match my home VM: Ubuntu 14.04 LTS, all updates applied, Ruby 2.2.1p85 installed. If you used apt-get you probably got an older version of Ruby (1.9 or so), whereas RVM nicely gave me modern Ruby. Upon cloning the test repo and building and installing Tidy, the regression test ran perfectly! I'll have RVM install an older version just to see what happens. In the meantime I pushed what I think is a fix for older Ruby, which doesn't include the DateTime library unless it's explicitly required. Let me know if anything else seems broken on Linux!
Well, I hate to be spamming the list, but it's the strangest thing... Ruby 1.9.3p551 has no problem with the DateTime, either, even when not explicitly requiring the library. @geoffmcl, can you give me a ruby --version for your system? Can you also try the latest commit?
My branch seems to work now... I guess @balthisar should take it and run with it.
I'll try to have a look tomorrow, or during the weekend (hopefully some unblocked time available!).
@mcepl thanks for your Travis CI lead ;=)) I just needed a working sample... I think the Ruby bug seems gone in the latest... you need to rebase your fork, specifically to pick up the latest changes... I have not yet tried the Ruby road, but was able to do the previous regression tests without too much trouble... Used a simple travis.yml file for this...
This gets me back to seeing the problem tests I am concerned about solving, namely -
But wow, in general this is not something I want to run each time I do a push... maybe there is a way to have a specific single trigger, like only when a version.txt change is pushed, or something... Quite fun initially, but frustrating to debug (yaml/compile/version/...) problems... and too slow to be of any real dev-cycle use - after a push I could fall asleep waiting for the vm to start, clone, setup, ... ;=)) Will continue to experiment, sporadically, now and again... and I notice they have an osx vm ;=)) thanks...
I don't know... https://travis-ci.org/mcepl/tidy-html5/builds/98488955 claims it took 28 sec. If that is too long for you, you really have trouble with patience (yes, I know it is more than that).
Test suite reporting and test case organization are coming along nicely. It's a slow process, but I've encountered a frustrating Windows issue (I've not looked into it yet; I don't have a Windows C dev system installed; suggestions appreciated). On Windows Tidy is writing -- sometimes, but not always -- the error output to stdout with extra newlines. If you run the current batch of tests, you see it in some -- but not all! -- of the -fails.txt messages. It's not a ruby issue; I see the same thing outside of Ruby. Interestingly, redirecting stderr on Mac OS and Linux gives Windows CRLF pairs, while -f uses $20 on Mac and $0A on Linux. For current Mac OS I'd regard that as a bug, as it should use the standard Unix LF. This is the latest release, 5.1.25. I've not browsed the diffs on newer versions, and I'm not filing this as a bug report yet, as I plan to have a look at it first. As for the testing suite, I was hoping to avoid file access and do everything in memory (capturing stderr) for speed, but I'll simply use a file; then we'll be assured that the test is still backwards-compatible with any potential bug fix.
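One way to sidestep the CRLF/LF differences while the underlying cause is investigated is to normalize line endings before comparing; a small, hedged suggestion in Python, not what the current script actually does:

```python
def normalized(text: str) -> str:
    """Collapse Windows (CRLF) and old Mac (CR) line endings to LF so a
    platform's choice of newline can't by itself fail a comparison."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

def outputs_match(expected: str, actual: str) -> bool:
    return normalized(expected) == normalized(actual)
```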
Current status: after refactoring quite a bit, file comparison is much more sane, and pretty much every test passes. I've still got to fix the few that aren't tested (it should be easy to manually canonize them and verify). There are (so far) some platform issues, mostly due to encoding, as well as a string that changed not too long ago. I can get all of the current tests to pass, and it will give us some pointers for issues on Win/Linux/Mac, but string testing isn't the way forward. More discussion later; I'm being told I've played too much today!
Today's report
The whole thing is organized enough, easy enough to use, and thorough enough that I'm considering a merge in time for the Tidy 5.2.0 release. However, it's a good idea to rebase the current branch into a new, fresh branch before merging, because all of the non-released development cruft can be left behind. This leaves a question about the existing scripts, test files, and so on. They're now redundant, and I'd love to get redundancies out of the system. Mostly I'm referring to the input and testbase files, because maintaining multiple copies as things change can be a PITA. I'll see about migrating the existing testing scripts into "benchmarks" using the new file organization; that way at least we still have the choice of which actual testing scripts to run. I might need help with the Windows scripts, but the sh scripts should be no problem.

Going Forward

Currently, for comparing the error output, we're completely limited by strings, and this is going to hurt us when checking backwards compatibility. For example, I corrected several strings, and the literal comparisons against older output now fail. How we deal with this in the future is really the subject of some intense discussion. Here are a few thoughts I had while just brainstorming:
I'm strongly favoring the first option.
I've determined that Tidy's error outputs are identified by unique numbers. Once localization is merged, I think I'll revisit this. In order to accommodate extra localization features (such as mssgreportfilter3 using opaque types) I might export the two enums. Then we could capture tidy's error output opaquely, but only going forward; we won't be able to run regression tests with older versions of Tidy.
Pardon, but I have a hard time following... where are the tests that one can run at the moment?
@alvestrand at this moment the testing is in transition... We are getting there, and shall hopefully shortly add a big notice here... sorry for the delay...
@alvestrand things have massively progressed in the testing repo tidy-html5-tests, thanks largely to @balthisar, the latest being in a development branch there... Most of the readme instructions have also been updated; once you check out that branch, you can build tidy and run the test suite against it.
If both the output and the expected results match, all is well. Now you are ready to make changes in the tidy source... do fixes, add features, make changes, whatever... and at any time you can run the above test suite again, using your new tidy, to make sure you have not triggered some regression problem. Of course, if your tidy changes changed some message strings, or the order of warning output, then you can expect a difference, but any other difference should be investigated very carefully. And of course even then, such a difference may in fact be an improvement in tidy, and when your changes are accepted, merged into tidy, and the version updated, then a final step would be to also update the particular tests' expected results. This testing repo will only function from 5.1.42 onwards; no attempt has been made to provide the correct expectations for earlier versions. We shall now steadily add new tests, and variations on tests, like with different configs, etc., but this is starting to feel very solid. Now we are still testing and refining this...
@joeheyming, @mcepl, @alvestrand, as discussed above, thanks to @balthisar the testing has been moved to its own tidy-html5-tests repository... The READMEs there should clearly explain the process. For the moment it is still in a development branch... Also for the moment we have returned to using batch scripts for Windows, and shell scripts for unix... But basically, after you have built tidy from this source, you can run the test suite from that repository against your build. There are other options, like exactly which tidy exe to use, and other tests... consult the README for these... So any further discussion on testing should be in issues there, and of course forks and pull requests are always appreciated... We could really do with some help expanding and refining these tests... Meantime, closing this here... But hope to see you on the other side ;=))
This is an important topic... and needs a champion to solve it!

Several posters have offered some parts of the solution... please read them... 1, 2, 3, and more...

At this moment @vielmetti seems to have the best solution for this, but as with the other test solutions, certain issues still need to be resolved... Really we do not want to replace one lame solution with another lame solution; not that the TAP idea is lame! But what do we gain here?

At present a test.t requires that we re-write each test! At what cost for each new test? Wow, is that really necessary? As expressed, can we not leverage the files in test/input and test/testbase?

And it does not yet seem to address certain issues that need more than a simple exit-value test. It needs to compare against the previous known tidy output, both the 'message' and the 'html' output, to see if something has changed! And that can depend on the options used.

The current 327 seems such a case. It seems solved until you add the option --show-body-only yes, when the warning is erroneously still shown. It should fail with this option. How do we deal with this?

Really we need help on this. Tidy needs a solution for this... but I am sorry, I do not see it yet!