change_log.txt

Change Log:

This document lists all changes and refactorings that either
add new features, break the old API or fix known bugs.  There will
often be many more source code changes that aren't listed because they don't
change the behavior of any public API classes.  Changes are listed in mostly
chronological order, so the most important changes may not always come first.

================
Jillion 6.0.3
================
Bug Fixes
-----------
1. GappedSequenceBuilder now allows insertions at the end of reference sequences

API Changes
-----------
1. ResidueSequenceBuilder - added extra insert/append/prepend methods that were on sub-interfaces
2. GappedSequenceBuilder now works on any `ResidueSequence` not just `NucleotideSequence`s.
3. Added interface ReverseComplementable to NucleotideSequence
4. added interface Complementable to Nucleotide
5. added ResidueSequence#reverseIterator() and ResidueSequence#computeUngappedSequence()
6. added ResidueSequenceBuilder#appendGap()
7. GappedReferenceBuilder changed to use any ResidueSequence not just NucleotideSequence.
8. Cigar.Builder now has a buildMerged() method which combines consecutive CigarOperations.
   For example `3M3M` will be built into a Cigar String of `6M` using buildMerged.
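
The merging described in item 8 can be sketched without Jillion at all. The class below is a hypothetical,
self-contained illustration (CigarMergeSketch and merge() are made-up names, not part of the Jillion API) that
works directly on Cigar strings and sums the lengths of adjacent operations that share the same code.

```java
/** Hypothetical sketch of merging adjacent Cigar operations; not Jillion's Cigar.Builder. */
final class CigarMergeSketch {
    private CigarMergeSketch() {}

    /** Merges consecutive operations with the same code, for example "3M3M" becomes "6M". */
    static String merge(String cigar) {
        StringBuilder out = new StringBuilder();
        int pendingLength = 0;
        char pendingCode = 0;
        int i = 0;
        while (i < cigar.length()) {
            // read the run of digits that makes up the operation length
            int start = i;
            while (i < cigar.length() && Character.isDigit(cigar.charAt(i))) {
                i++;
            }
            if (start == i || i == cigar.length()) {
                throw new IllegalArgumentException("malformed cigar: " + cigar);
            }
            int length = Integer.parseInt(cigar.substring(start, i));
            char code = cigar.charAt(i++);
            if (code == pendingCode) {
                // same operation as the previous one: extend it
                pendingLength += length;
            } else {
                // different operation: flush the previous one and start a new run
                if (pendingLength > 0) {
                    out.append(pendingLength).append(pendingCode);
                }
                pendingLength = length;
                pendingCode = code;
            }
        }
        if (pendingLength > 0) {
            out.append(pendingLength).append(pendingCode);
        }
        return out.toString();
    }
}
```

With this sketch, CigarMergeSketch.merge("2S3M3M1I4M") returns "2S6M1I4M": only the two adjacent M operations are combined.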

================
Jillion 6.0.2
================
Bug Fixes
---------
1. NucleotideSequence#getNumberOfGapsUntil was returning the wrong value for ACGT-only and ACGTN-only sequences.

API Changes
-----------
1. Added ResidueSequence#getUngappedOffsetForSafe() and ResidueSequence#toUngappedRangeSafe()
   which will not throw index out of bounds exceptions if given parameters go beyond sequence length.
 
2. Added new class SamVisitorFunctions which has Factory Methods for easy visitor implementations.
3. added INucleotideSequence#getLeftFlankingNonGapOffset() and Right flanking 
   offset and new expanding and contracting flanking Range. 
   
4. added INucleotideSequence#createRightFlankingNonGapIterator(start) 
   and INucleotideSequence#createLeftFlankingNonGapIterator(start)
   
5. Added new FastqDownsampler interface to downsample fastq files and FastqDownsamplers class
   with various algorithm implementations.
   
6. Added FastqWriter#write(FastqRecord[]) and FastqWriter#write(FastqRecord[], begin, end) methods to bulk write arrays.

7. Added FastqParser.iterator() and FastqParser.iterator(FastqVisitorMemento) which returns 
   new class FastqSingleVisitIterator.
   
8. FastqSingleVisitIterator is a new Iterator that lets the user visit a fastq file one record at a time, controlling
   when the next call to visit will occur.
   
9. New FastqDownSampler interface with implementations in FastqDownSamplers. 
   Works on single fastq files and paired-ends fastq files.

10. added PeekableIterator#advanceIf( predicate) and PeekableIterator#advanceWhile( predicate)

11. SamTransformationService and SamGappedReferenceBuilderVisitor can now take provided NucleotideDataStores
    for the references and will lazily build the gapped references when they first encounter a read that
    aligns to that reference.  This allows mapping parts of the human genome without having to load the whole thing!

12. Added RangeMap#forEach

13. Moved NucleotideSequence#findMatches() method to new `MatchableSequence` interface

14. AssemblyTransformer#refOrConsensus() now passes an `INucleotideSequence<?,?>` instead of `NucleotideSequence`

15. Added INucleotideSequenceBuilder#toBuilder(initialCapacity) for when you want to create
    a new builder with the same sequence but with a larger capacity than the current sequence length.

16. GappedReferenceBuilder, SamAlignmentGapInserter and related classes now use INucleotideSequence interface
    instead of NucleotideSequence class

17. Added some Jackson annotations to NucleotideSequence and Direction so Jillion can be used
    to read and write JSON and YAML formatted data with Jackson without needing additional software.

18. SamTransformationService now has INucleotideSequence and INucleotideSequenceBuilder generics added to the class
    signature.  These generics refer to the type of the REFERENCE used.  Constructors were made private and new equivalent #create() factory methods
    should now be used which handle the creation of the new generics added.

19. Added additional factory methods to SamTransformationService to use different reference Datastores other than
    normal fasta files.

20. SamRecordFilter#ungappedReferenceDataStore() now takes a DataStore<INucleotideSequence> instead of NucleotideFastaDataStore
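
The change log does not say which algorithms the FastqDownsamplers class provides.  As one common way to downsample
a file of unknown length in a single pass, here is a minimal reservoir sampling sketch.  ReservoirDownsampleSketch
and sample() are hypothetical names used only for illustration and are not part of Jillion; records are represented
by a generic type parameter.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical reservoir sampling sketch; not Jillion's FastqDownsampler. */
final class ReservoirDownsampleSketch {
    private ReservoirDownsampleSketch() {}

    /** Keeps a uniformly random subset of at most sampleSize records in one pass. */
    static <T> List<T> sample(Iterator<T> records, int sampleSize, Random random) {
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must be >= 0");
        }
        List<T> reservoir = new ArrayList<>(sampleSize);
        long seen = 0;
        while (records.hasNext()) {
            T record = records.next();
            seen++;
            if (reservoir.size() < sampleSize) {
                // fill the reservoir with the first sampleSize records
                reservoir.add(record);
            } else {
                // replace a random slot with probability sampleSize / seen
                long slot = (long) (random.nextDouble() * seen);
                if (slot < sampleSize) {
                    reservoir.set((int) slot, record);
                }
            }
        }
        return reservoir;
    }
}
```

Because the decisions depend only on the random number sequence and the record count, running this sketch over two
mate files of equal length with two Random objects created from the same seed selects the same record positions in both.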

Performance Improvements
------------------------
1. Some internal method implementations were rewritten to be easier to maintain and to improve performance.
2. Performance improvements when calculating flank offsets. This improves performance of applications
   that use Jillion for general assembly/gap-heavy calculations by about 10%.
3. GappedReferenceBuilder, which is used by the Sam and Cas Assembly Transformers and Gapped Reference Builders,
   will now use a sparse matrix to keep track of gap insertions if the input reference is very large
   (currently > 1 Mbp)


================
Jillion 6.0.1
================

Bug Fixes
---------

1. Changed VariantNucleotideSequence#getTriplets() to examine the underlying read combinations even
   if only one slice has variants.  6.0 would only perform this potentially more computationally intensive operation if
   multiple slices had variants.  This fixes a bug where a mis-assembled read that doesn't span the whole codon
   was incorrectly included as a variant triplet.
   
================
Jillion 6.0
================

Performance Improvements
------------------------
1. Performance improvements on uncompressed NucleotideSequences if they are only ACGT or ACGTN
2. Performance improvements parsing and iterating over large Fasta files
3. Improvements for various internal functions that used gap offsets that previously used boxed 
	Integers but could be replaced with primitive int iterators.

New Features
------------

1. Moved to Java 11
2. Added lombok support
3. Added Vcf support
4. Added ThrowingSupplier
5. Added Support for XZ compressed files in InputStreamSupplier
6. New methods on NucleotideSequence to get ranges of Ns and Percent Ns.
7. Added Tar Support for InputStreamSupplier
8. FastaFileParser and FastqFileParser can now correctly parse compressed archive formats such as "tar.gz",
   assuming only the first file entry should be parsed.
   This means that fasta and fastq visitors and datastores will now seamlessly work on tar.gz files.
   
9.  InputStreamSuppliers of formats with multiple entries can read a specific entry instead of just the first.

API Changes
-----------

1. ResidueSequence is now Comparable and the default implementation compares toString() values.

2. new NucleotideSequence creation methods that take single Nucleotide objects.

3. ReferenceMappedNucleotideSequence now has a new method computePolymorphisms() which is similar to
   the already existing method getDifferenceMap() except it goes further and denotes insertions vs deletions 
   and supports consecutive differences grouped together.

4. new method NucleotideSequence#isAllGapsOrBlank()

5. new method NucleotideSequence#isAllNs()

6. new method NucleotideSequenceBuilder#ungap(Predicate<Range>) which will only ungap the 
   passed in gap ranges if they pass the predicate.
   
7. new factory method NucleotideFastaRecord.of(File) will parse the given fasta file and return the first
   record.
   
8. New factory method NucleotideFastaRecord.createNewIteratorFor( File) will parse the given fasta file and return 
   a StreamingIterator to iterate over each record.
   
9. New methods on TranslationTable take more options.  A new TranslationOptions object was created to help reduce
   the explosion of new methods to handle every combination of flags.
   
10. New methods on NucleotideSequence to get ranges of Gaps 

11. New feature in InputStreamSupplier allowing nested decompression to support getting the uncompressed stream
    from a tar.gz record for example.
    
12. New method InputStreamSupplier#get(InputStreamSupplierOptions) which allows for more easily setting
    the different possible options for fetching an InputStream, now including start/length and nested decompression,
    without having the number of methods explode.  All previous get() methods with the different parameters
    are still present for backwards compatibility.

13. New InputStreamSupplierRegistry to add custom InputStreamSupplier implementations at runtime.  Implementations
    must implement the new org.jcvi.jillion.spi.io.InputStreamSupplierFactory interface.
    
14. New constructors NucleotideSequenceBuilder( NucleotideSequence, Range),  NucleotideSequenceBuilder( NucleotideSequence, Range...) and NucleotideSequenceBuilder( NucleotideSequence, Iterable<Range>)
    to support creating builders that only contain partial ranges of a sequence
    more efficiently than performing multiple trim operations.
    
15. New method NucleotideSequenceBuilder#append(NucleotideSequence, Range) to append part of a sequence
    more efficiently than performing a trim operation.
    
16. New method ResidueSequence#hasAmbiguities()

17. New method ProteinSequence#computePercentX()

18. added query shift amount to pairwise alignment builders.

19. Added new helper method DataStoreUtil#asDataStoreEntryIterator( StreamingIterator<T>, Function<T, String>)
    to wrap a StreamingIterator into a DataStoreIterator

20. SplitFastaWriter objects are now synchronized by default.

21. FastaFileDataStoreBuilders now have onlyIncludeIds(Set<String>), which is similar to filter( Predicate<String> )
    except we know how many ids there should be, so ITERATION_ONLY datastore implementations can use this information
    to exit the parsing early if we've already found all the ids we care about but have not yet parsed the whole file.
    
22. New method ProteinSequenceBuilder#copy(Range) to match similar method in NucleotideSequenceBuilder

23. New method NucleotideSequence#computePercentGC()

24. Added Range#startsAfter( Range) and Range#endsAfter

25. Added method NucleotideSequence#hasGaps() and ProteinSequence#hasGaps()

26. ProteinPairwiseSequenceAlignment now extends ProteinSequenceAlignment

27. NucleotidePairwiseSequenceAlignment now extends NucleotideSequenceAlignment

28. SequenceBuilder now has a delete(Range...ranges) with a varargs of Ranges that will handle deleting
    multiple Ranges at once correctly.  The method is implemented with a default.
    
29.  SingleThreadAdder is now comparable and implements equals and hashcode

30. ArrayUtil IntegerArrayList now has new method intIterator() 
    which returns a PrimitiveIterator.OfInt introduced in Java 8.
    
31.  TranslationTables now also translate sequences with Uracil.

32. Cigar.Builder now has a trim(Range) method to further trim the sequence with soft clips.

33. Added Cigar#toBuilder() method.

34. Ranges methods now take Collection<? extends Rangeable> instead of Collection<Range> for greater usage.

35. Range#complement(List<Range>)  and Range#union(List<Range>) are now 
    Range#complement(List<? extends Rangeable>) and Range#union(List<? extends Rangeable>) for same reason.
    
36. New method Cigar.Builder#trim(Range) will update the Cigar's clipping operations so that
    everything beyond the given valid range is soft clipped.
    
37. New VCF Domain-Specific Language (DSL) for creating VCF files.

38.  Added Reserved VCF Info and Filter objects based on VCF 4.3 spec.

39. Added new DecodingOptions class to use inside NucleotideSequenceBuilder to add more configuration options
    for invalid character handling and other common string manipulation such as making all ambiguities Ns.
    
40.  GrowableXArrays added replaceIf(predicate, value) 

41. SamRecord and related objects are now Serializable

42. new class SamAlignmentGapInserter which performs the extra gap insertions for read alignments to convert SAM/BAM alignments into proper aligned contigs.

43.  Added #toArray() methods to NucleotideSequenceBuilder and ProteinSequenceBuilder.

44.  Added #getNumberOfXs() to ProteinSequence

45.  Added RangeCollectors class

46.  Added Sam and Bam parser visitor options to visit multiple Ranges for a given reference.
     Previously only one Range could be visited at a time, so multiple Ranges required multiple parses.
     
47.  Added Range#intersectsOrAbuts(Range) which returns a boolean.

48.  Added SingleThreadAdder#set(long)

49.  Added new class MultipleNucleotideFastaFileDataStore

50.  Added new IlluminaUtil#IlluminaName Matcher

51.  Added SamParserFactory#Parameters class and builder for parsing options including whether to use an 
     index file or not even if present (previously always used index which is now default).
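
Item 28 says the varargs delete handles multiple Ranges "correctly".  The subtlety is that each deletion shifts the
offsets of everything after it.  The sketch below shows one way to get this right: delete from right to left so the
remaining ranges stay valid.  It is a hypothetical illustration on plain Strings (MultiRangeDeleteSketch is a made-up
name), it assumes the ranges do not overlap, and it makes no claim about how Jillion implements the default method.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of deleting several ranges at once; not Jillion's SequenceBuilder. */
final class MultiRangeDeleteSketch {
    private MultiRangeDeleteSketch() {}

    /**
     * Each range is {begin, end} in zero-based inclusive coordinates of the ORIGINAL sequence.
     * Ranges are assumed not to overlap.
     */
    static String delete(String sequence, List<int[]> ranges) {
        List<int[]> sorted = new ArrayList<>(ranges);
        // process the right-most range first so earlier offsets are not shifted
        sorted.sort(Comparator.comparingInt((int[] r) -> r[0]).reversed());
        StringBuilder builder = new StringBuilder(sequence);
        for (int[] range : sorted) {
            // StringBuilder.delete takes an exclusive end
            builder.delete(range[0], range[1] + 1);
        }
        return builder.toString();
    }
}
```

For the sequence ACGTACGT and the ranges 0-1 and 4-5 this returns GTGT.  Deleting left to right without adjusting the
second range would instead remove the wrong bases and leave GTAC.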

API BREAKING CHANGES
--------------------

1. Requires Java 11

2.  TranslationVisitor will now visit all Codons not just the ones found between start and stop.  This 
    is to support translating partial sequences where the start codon is missing.
    
3.  TranslationVisitor now has new method visitVariantCodon(long nucleotideCoordinate, List<Codon> codons)
	to support variant sequences
	
4.  TranslationVisitor methods now have an additional long parameter to provide the start and 
    end coordinates (inclusive) of the nucleotides that contributed to the Codon.
5.  FastaWriter#adapt() adapter parameter now takes a ThrowingTriConsumer that consumes 
    the id, adapted sequence and comment instead of returning a new FastaRecord.  If the adapter
    decides not to pass on the adapted sequence to the delegate, the implementation should not call
    the consumer (previously it returned null).  This should improve performance because
    a FastaRecord object no longer has to be created; some implementations don't have easily
    accessible constructors.  This should also allow for chaining of consumers.
    
6.  SamRecord#getNextOffset() is now called SamRecord#getNextPosition()

7.  AssemblyTransformationService#aligned() and #unaligned() methods now have additional parameter `Object readObject` which is the 
    actual read object the transformer is transforming for you in the event you need to query the object directly (by downcasting)
    to get additional information in your transformer.
    
8.  BamFileParser will now call visitHeader() even when just parsing a specified Range
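
The shape of the adapter change in item 5 can be illustrated with a small stand-alone sketch.  The interfaces below
(RecordConsumer, Adapter) and the writeAll() helper are hypothetical stand-ins, not Jillion types, and unlike
ThrowingTriConsumer they do not declare checked exceptions.  The point is only the control flow: an adapter passes
a record on by calling the consumer and drops a record by not calling it, so no record object or null is returned.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a consumer-based adapter; not Jillion's FastaWriter#adapt(). */
final class ConsumerAdapterSketch {
    private ConsumerAdapterSketch() {}

    /** Receives the (possibly modified) parts of a record. */
    @FunctionalInterface
    interface RecordConsumer {
        void accept(String id, String sequence, String comment);
    }

    /** Decides whether and how to pass a record downstream. */
    @FunctionalInterface
    interface Adapter {
        void adapt(String id, String sequence, String comment, RecordConsumer downstream);
    }

    /** Each input record is {id, sequence, comment}; returns the fasta text that was "written". */
    static List<String> writeAll(List<String[]> records, Adapter adapter) {
        List<String> written = new ArrayList<>();
        RecordConsumer sink = (id, sequence, comment) -> written.add(">" + id + "\n" + sequence);
        for (String[] record : records) {
            adapter.adapt(record[0], record[1], record[2], sink);
        }
        return written;
    }
}
```

An adapter such as `(id, seq, comment, down) -> { if (seq.length() >= 4) down.accept(id, seq.toUpperCase(), comment); }`
upper-cases long records and silently skips short ones.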
    
Bug Fixes
---------

1. Building a ReferenceMappedNucleotideSequence from a NucleotideSequenceBuilder 
   with compression turned off no longer throws a ClassCastException. 
   
2. ProteinSequenceBuilder#copy() and #copy(Range) now correctly account for gaps and ambiguities.

3. FastaWriters can now have their close() method called more than once without throwing an error.

4. Bam parsing only selected regions no longer throws NullPointerException on unmapped reads.

5. Bam Indexing improvements handling incorrectly formatted unaligned reads.

6. Fixed Serialization issues from caching performance improvements introduced in 5.3

7. Fixed SamRecordFlags#remove() methods which would accidentally ADD the flag if it wasn't already present.

8. NucleotideSequenceBuilder won't throw an exception when using a reference AND later inserting bases.  As of now inserting
   bases will clear the reference field to make it a non-reference based sequence.
   
9. SamRecordBuilder now initializes flags correctly. 

10. Virtual Offsets inside indexed bam files are now correctly computed even when jumping to a particular index.

11. Truncated Bam files are now detected if file ends before index says it should.
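
Bug fix 7 describes a remove operation that added a flag when it was not already set.  The change log does not show
the faulty code, so the following is only a general illustration of how this class of bug can arise with bit flags:
XOR toggles a bit, while AND with the complement always clears it.  FlagRemoveSketch and both method names are
hypothetical.

```java
/** Hypothetical illustration of clearing a bit flag; not Jillion's SamRecordFlags. */
final class FlagRemoveSketch {
    private FlagRemoveSketch() {}

    /** Buggy variant: XOR toggles the bit, so an absent flag gets ADDED. */
    static int removeWithXor(int flags, int bit) {
        return flags ^ bit;
    }

    /** Correct variant: AND with the complement clears the bit whether or not it was set. */
    static int remove(int flags, int bit) {
        return flags & ~bit;
    }
}
```

With flags = 0x1 and bit = 0x10, removeWithXor returns 0x11 (the flag was added) while remove returns 0x1 unchanged.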

================
Jillion 5.3
================
New Features
------------

1. Kmer Support - A new Kmer class was added.  Each Kmer instance has the kmer sequence
                  and the offset it came from.  NucleotideSequence and ProteinSequence now
                  have new methods Stream<Kmer> kmers( int k) and Stream<Kmer> kmers( int k, Range r)
                  to get a Stream of all the kmers of size k from either the whole sequence or a subRange.
                  
2. New simplified way to read and write basic bioinformatics file formats with reduced boilerplate code.  New classes and static methods
   on interfaces were added to turn common use cases into single lines of code.  For example, iterating over
   the records in a fastq file can now be done with a single line of code to get back a ThrowingStream<FastqRecord>.
   Previously, a FastqFileDataStoreBuilder had to be created, built, and the returned datastore had to have either
   its streamingIterator method or its stream records method called.  All that boilerplate no longer needs to be written
   by the user.  New classes and methods are detailed in API Changes.
   
3. New static factory methods on some trimmer classes make trimming Traces easier: they allow
   the QualityTrimmer or NucleotideTrimmer implementations to take the entire trace object as input instead of
   trace.getQualitySequence() and/or trace.getNucleotideSequence(), which makes the code easier to read and write.
   
4. Added new adapt( Function<Fastq, Fastq>) method to FastqFileWriterBuilder and static adapt method to FastqWriter that 
  can modify a FastqRecord before writing it.  Useful for abstracting away changing the record ids or performing additional trimming. 
  
5. FastqWriterBuilders and FastaWriterBuilders constructors that take File will now parse the output file's 
   extension and if it's "gz" or "zip" will automatically compress the output accordingly.  Currently does not handle nested 
   compressions or tar but those may be supported in future versions.

6. NucleotideSequence now supports Uracil.  It is possible to also have sequences with both Ts and Us since some
   therapeutics cataloged by the FDA have such sequences.
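
To make the kmer feature in item 1 concrete, here is a stand-alone sketch that produces every kmer of size k together
with the offset it came from.  KmerSketch and its nested Kmer class are hypothetical and work on plain Strings; they
are not Jillion's Kmer class or its kmers() methods.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

/** Hypothetical kmer sketch on plain Strings; not Jillion's Kmer API. */
final class KmerSketch {
    private KmerSketch() {}

    /** A kmer and the zero-based offset where it starts. */
    static final class Kmer {
        final int offset;
        final String sequence;

        Kmer(int offset, String sequence) {
            this.offset = offset;
            this.sequence = sequence;
        }

        @Override
        public String toString() {
            return sequence + "@" + offset;
        }
    }

    /** Streams all overlapping kmers of size k; empty if the sequence is shorter than k. */
    static Stream<Kmer> kmers(String sequence, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be >= 1");
        }
        // rangeClosed yields nothing when the upper bound is negative
        return IntStream.rangeClosed(0, sequence.length() - k)
                        .mapToObj(i -> new Kmer(i, sequence.substring(i, i + k)));
    }
}
```

For the sequence ACGTA and k = 3 the stream contains ACG at offset 0, CGT at offset 1 and GTA at offset 2.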

API Changes
-----------

1. ResidueSequence - Added more Generics to the class signature to specify 
                     the sub-interface and builder classes used.  This change shouldn't
                     affect normal use of these classes but will cause some incompatibility
                     if you implement your own implementations of Sequences.
                     
2. NucleotideSequence and ResidueSequence now have emptyBuilder() and emptyBuilder(int capacity) methods
   that will return new empty NucleotideSequenceBuilders and ResidueSequenceBuilders respectively.
   
3.  StreamingIterator and DataStore stream() methods now return a new ThrowingStream which has extra methods
    that can accept functions/consumers that throw checked exceptions.  These exceptions are then propagated
    up without having to wrap them in runtime exceptions.
    
4.  SamValidationException now extends IOException instead of just Exception.  This simplifies catch blocks
    and makes it work better with ThrowingStream.

5.  New Pair utility class was added to make returning 2-tuples easier.  Pair is Closeable so it can be used inside
    try-with-resources; when it is closed, it will try to close the elements in the pair if they are closeable.
    
6.  Created new FastqFileReader class with several methods to parse a fastq file and get back a Results
    object (subclass of Pair) that has both the ThrowingStream<FastqRecord> and the FastqQualityCodec that was used.
    This removes the need to make a datastore and have to remember to specify DataStoreProviderHint.ITERATION_ONLY.
    
7.  Created new Trimmer<T> interface which is now the parent interface to QualityTrimmer and NucleotideTrimmer.

8.  Modified FastaWriterBuilder implementations so that all methods on them return the actual concrete Builder type
	instead of the abstract  parent builder class.  This lets us chain multiple class specific methods which 
	wasn't possible before.
    
9.  Added FastaWriterBuilder.sort(Comparator) method which uses default in memory cache size currently set to 1024 records.

10. Created new NucleotideFastaFileReader class with several methods to parse a fasta file and get back a Results
    object (subclass of Pair) that has both the ThrowingStream<NucleotideFastaRecord> and the FastqQualityCodec that was used.
    This removes the need to make a datastore and have to remember to specify DataStoreProviderHint.ITERATION_ONLY.
    
11. Created NucleotideFastaFileDataStore interface which has a getFile() method.  All file based datastores now implement this interface.

12. Added static helper factory methods to NucleotideFastaFileDataStore to simplify creating datastores using Builders with one liners.

13. Made FastqRecordBuilder an interface.  There are now a few implementations but they are all package private; use the
    FastqRecordBuilder.create(...) methods to create new instances or the new FastqRecord.toBuilder() method to get the particular
    implementation best for that record.

14. Added getters and setters to FastqRecordBuilder.

15.  Added trim(Range) method to FastqRecordBuilder to simplify one of the most common modifications.

16.  Added DataStore.forEach( BiConsumer<String, T>) that will call the given consumer once for each record in the datastore.
     This method will often be more efficient than using Iterators. 
     
17. FastqRecord.getAvgQuality() now returns an OptionalDouble instead of a double.  
    If the sequence is empty, then the returned Optional is also empty.  
    Previously an empty string threw an Arithmetic error.
    
18. QualitySequence.getMinQuality() and getMaxQuality() now return an Optional<PhredQuality> instead of a PhredQuality.  
    If the sequence is empty, then the returned Optional is also empty.  
    Previously an empty sequence returned null.
    
19. QualitySequence.getAvgQuality() now returns an OptionalDouble instead of a double.  
    If the sequence is empty, then the returned Optional is also empty.  
    Previously an empty string threw an Arithmetic error.
    
20. Added new methods to Range forEachValue(LongConsumer) and forEachValue(CoordinateSystem, LongConsumer)
    that use a primitive consumer of longs.  This should be used in preference to Iterable.forEach(Consumer)
    which autoboxes and can only use zero based (array offset) coordinates.
    
21. Renamed SamRecordFlags to SamRecordFlag (no ending "S").

22. Created new SamRecordFlags (with "S") which stores flag bits as int.  This is now cached and used as a flyweight for
    better performance over storing duplicate Set<SamRecordFlag> over and over again.

23.  Added new SamParserOptions class to specify how to parse the sam/bam file including which
    reference, which alignment range and whether to add memento support.

24. Nucleotide enum now has Uracil

25. NucleotideSequence now has new methods isDna() and isRna() to check whether the sequence has exclusively Ts or exclusively Us.
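
Items 17 to 19 replace exceptions and nulls with Optional return values for empty sequences.  The same pattern is
available directly in the JDK: IntStream.average() returns an empty OptionalDouble when there are no values.  The
sketch below uses a plain int array of Phred scores; AvgQualitySketch and its methods are hypothetical names and not
the Jillion API.

```java
import java.util.OptionalDouble;
import java.util.stream.IntStream;

/** Hypothetical sketch of an Optional-returning average; not Jillion's QualitySequence. */
final class AvgQualitySketch {
    private AvgQualitySketch() {}

    /** Returns the mean score, or an empty Optional for an empty array instead of failing. */
    static OptionalDouble averageQuality(int[] phredScores) {
        return IntStream.of(phredScores).average();
    }

    /** Callers are forced to decide what an empty sequence means for them. */
    static String describe(int[] phredScores) {
        OptionalDouble avg = averageQuality(phredScores);
        return avg.isPresent() ? "avg=" + avg.getAsDouble() : "no bases";
    }
}
```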
    
Performance Improvements
------------------------

1.  QualitySequence.getAvgQuality()/ getMinQuality()/ getMaxQuality()
   Most implementations now perform all the calculations at once and cache the results.
   Previously the computations were performed separately.
   
2. BAM reading performance improvements by caching costly computations for sam record flags and sequence storage.  
   Benchmarks show 30% performance improvements reading BAM files.
     
Bug Fixes
---------

1. Adapted FastqRecordWriter now fixed to actually write adapted record
2. ProteinSequenceBuilder ungap now correctly ungaps the sequence.

================
Jillion 5.2
================

New Features
------------
1.  Added new method to FastqWriter to automatically trim given a Range. 
    This saves users the trouble of creating SequenceBuilders and trimming themselves.
    
2.  Added new method to FastqRecord to get the average Quality of the quality sequence.
    The default implementation calls getQualitySequence().getAvgQuality() but some implementations
    use a more efficient version. 
    
3.  Added new QualityTrimmer SlidingWindowQualityTrimmer which acts like Trimmomatic's SLIDINGWINDOW option.

4. Added new convenience methods to NucleotideTrimmer and QualityTrimmer that take Builders.  This is really useful
   when performing multiple trimming operations in serial since some trimmers may be able to save CPU cycles
   and work directly from the builders.
   
5. Added new TrimmerPipeline and TrimmerPipelineBuilder classes which can take multiple NucleotideTrimmers
   and QualityTrimmers and combine the trimming results for you.
   
6. Added SamFileDataStore and SamFileDataStoreBuilder to finally provide a higher level API for
   working with sam and bam files without needing to use a low level Visitor. 
   
7. Added Optional<File> getFile() to FastqParser and refactored CasParser
   implementations to begin making it easier to extend cas file parsing.
   
8. Add lambda hook to CasFileTransformationService to override how fastqDataStore is generated so
   users could override to provide their own implementation. 
   
9.  Added new ConsensusCollectors class that can take Streams of various sequence inputs and compute a consensus.

10.  Added new TraceDirPhdDataStoreBuilder class that can make a PhdDataStore implementation from a folder of sanger trace files.

11. AbiChromatogramParser - Added support for ABI 3500 abi files.
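
For readers unfamiliar with the sliding window approach mentioned in item 3: Trimmomatic's SLIDINGWINDOW option scans
from the 5' end and clips the read once the average quality within the window falls below a threshold.  The sketch
below is a simplified, hypothetical version of that idea (SlidingWindowTrimSketch and keepLength() are made-up names).
It cuts at the start of the first failing window and treats a read shorter than the window as a single window; the
real Trimmomatic and Jillion implementations may differ in those details.

```java
/** Hypothetical simplified sliding window quality trim; not Jillion's SlidingWindowQualityTrimmer. */
final class SlidingWindowTrimSketch {
    private SlidingWindowTrimSketch() {}

    /** Returns how many leading bases to keep, i.e. the kept range is [0, result). */
    static int keepLength(int[] quals, int windowSize, double requiredQuality) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be >= 1");
        }
        // assumption: a read shorter than the window is evaluated as one window
        int window = Math.min(windowSize, quals.length);
        long sum = 0;
        for (int i = 0; i < window; i++) {
            sum += quals[i];
        }
        for (int start = 0; ; start++) {
            // compare sums instead of averages to avoid repeated division
            if (sum < requiredQuality * window) {
                return start;
            }
            int next = start + window;
            if (next >= quals.length) {
                // every window passed: keep the whole read
                return quals.length;
            }
            // slide the window one base to the right
            sum += quals[next] - quals[start];
        }
    }
}
```

For the qualities 30,30,30,30,10,10,10,10 with a window of 4 and a required quality of 20, the first failing window
starts at offset 3, so this sketch keeps the first 3 bases.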

API Changes
-----------
1. Added Trace.getLength() 

2. Added default methods to Rangeable for getLength(), getBegin(), getEnd() and isEmpty() since 
   those are used the most, so callers don't always have to build a new Range object.

3. Added Range.Builder intersect methods

4.  Changed TrimmerPipeline methods to be faster by making fewer Range objects and working off of Range.Builders instead.

5.  Added new Range.toString() methods that take lambda expressions so users can make their 
    own toString implementations.  Have several overloaded versions 
    * toString(RangeToStringFunction)
    * toString(RangeToStringFunction, CoordinateSystem)
    
    * toString(RangeAndCoordinateSystemToStringFunction)
    * toString(RangeAndCoordinateSystemToStringFunction, CoordinateSystem)
    
    to let users convert to different coordinate systems and to
    include that coordinate system in the lambda expression or not.

6.  Added toGappedRange( Range) and toUngappedRange( Range) to ResidueSequence
   with default implementations and more efficient implementation when the codec 
   knows it doesn't have gaps.  Changed AssemblyUtil to use that instead of its own implementation.

7.  Added toUngappedRange( Range) to NucleotideSequenceBuilder

8. DataStoreException now extends IOException

9.  Added new StreamingIterator.empty() method
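
The gapped/ungapped conversions in items 6 and 7 boil down to counting gaps.  The sketch below shows the idea for
single offsets on a plain String where '-' is the gap character; a range can be converted by converting both of its
ends.  GapOffsetSketch is a hypothetical name, and the behavior when the gapped offset itself points at a gap (here
it maps to the next non-gap base) is an assumption of this sketch, not documented Jillion behavior.

```java
/** Hypothetical gapped/ungapped offset conversion on Strings; not Jillion's ResidueSequence. */
final class GapOffsetSketch {
    private GapOffsetSketch() {}

    /** Ungapped offset = gapped offset minus the number of gaps before it. */
    static int toUngappedOffset(String gapped, int gappedOffset) {
        int gaps = 0;
        for (int i = 0; i < gappedOffset; i++) {
            if (gapped.charAt(i) == '-') {
                gaps++;
            }
        }
        return gappedOffset - gaps;
    }

    /** Walks the gapped sequence until the requested number of non-gap bases has been seen. */
    static int toGappedOffset(String gapped, int ungappedOffset) {
        int seen = -1;
        for (int i = 0; i < gapped.length(); i++) {
            if (gapped.charAt(i) != '-' && ++seen == ungappedOffset) {
                return i;
            }
        }
        throw new IndexOutOfBoundsException("ungapped offset " + ungappedOffset + " is beyond the sequence");
    }
}
```

In the gapped sequence AC--GT the base G sits at gapped offset 4 and ungapped offset 2.  A sequence that is known to
have no gaps can skip the loops entirely, which is the kind of shortcut item 6 alludes to.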


Bug Fixes
---------
1.  BlastParser - fixed bug in XML Blast Parser when it sometimes accidentally set percent identity to be (1 - percent identity).
    
================
Jillion 5.1
================

New Features
------------
1. Added new methods to FastaDataStore getSequence( id) which gets just the sequence
   and is equivalent to get(id).getSequence().

2. Added new methods to FastaDataStore getSubSequence( id, offset) which gets just the sequence
   starting from the given offset.

3. Added new methods to FastaDataStore getSubSequence( id, range) which gets just the sequence
   that intersects the given range.
   
4.  Added support for Fasta Index Files (.fai) files to NucleotideFastaDataStore.
    The NucleotideFastaFileDataStoreBuilder object can now be given an fai file
    or auto-detect one and use that to make a more efficient implementation
    to be used with the new getSequence() or getSubSequence() methods.
    
5.  Added support for writing Fasta Index Files (.fai) files to NucleotideFastaWriter using 
    the createIndex(true) method.  This will make an additional file named $outputFasta.fai.
    Supports normal, zipped and non-redundant fasta files.
    
6.  Added new class FaiNucleotideWriterBuilder that can create new Fasta Index Files (.fai) for 
    existing fasta files.  The builder object supports full configuration of the fai to be written
    including the output path, the end of line character, and the Charset.
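
As background on why an index makes getSubSequence() cheaper: each line of a .fai file holds the sequence name, its
length, the byte offset of its first base, the number of bases per line and the number of bytes per line.  That is
enough to compute where any base lives in the fasta file without reading what comes before it.  The sketch below
does that arithmetic; FaiOffsetSketch and byteOffset() are hypothetical names and do not reflect how Jillion's
datastore is implemented.

```java
/** Hypothetical sketch of the .fai offset arithmetic; not Jillion's implementation. */
final class FaiOffsetSketch {
    private FaiOffsetSketch() {}

    /**
     * @param faiLine one tab separated index line: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
     * @param zeroBasedPosition position of the wanted base within the sequence
     * @return the byte offset of that base in the fasta file
     */
    static long byteOffset(String faiLine, long zeroBasedPosition) {
        String[] fields = faiLine.split("\t");
        long length = Long.parseLong(fields[1]);
        long offset = Long.parseLong(fields[2]);
        long lineBases = Long.parseLong(fields[3]);
        long lineWidth = Long.parseLong(fields[4]);
        if (zeroBasedPosition < 0 || zeroBasedPosition >= length) {
            throw new IndexOutOfBoundsException("position " + zeroBasedPosition + " is outside the sequence");
        }
        // full lines before the base, each lineWidth bytes including end of line characters,
        // plus the position within the final line
        return offset + (zeroBasedPosition / lineBases) * lineWidth + zeroBasedPosition % lineBases;
    }
}
```

For an index line with offset 6, 60 bases per line and 61 bytes per line, base 60 (the first base of the second line)
is at byte 67.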


API Changes
-----------
1. Created new abstract class AbstractReadCasVisitor which is now the parent class of AbstractAlignedReadCasVisitor.  
   The new class handles iterating over the input read files to link cas alignments to their read names, sequences and qualities.
   Now you can extend that class if you want that extra information without realigning to gapped references.
   
2.  Moved FastaUtil to internal package since it should not be used outside of Jillion classes.  Heavily refactored it.

3.  Improved Javadoc.  Many more classes and methods now have javadoc.  Hundreds of javadoc comments have been improved
    to fix problems found by the javadoc: lint.
    
4.  BlosumMatrices class added support for Blosum30 and 40.

5.  Some classes that were in jillion.internal were moved to jillion.shared since all internal classes can't
    be exported by OSGI.  These classes should not be considered part of the public API and should only be for internal use.
    
6. FastqFileParser.canAccept() renamed to canParse() to match the other parsers.

Bug Fixes
---------
1.  PositionSequence - fixed off-by-one bug in Sanger PositionSequence.iterator(Range)
    which did not include the last base in the range.
    
2.  StreamingIterator - the abstract class that many StreamingIterators extend to use a background thread
    to populate the iterator has been improved to fix occasional deadlock issues if the background thread throws exceptions.
    
3.  BlastParser - fixed bug in XML Blast Parser when it sometimes accidentally set percent identity to be (1 - percent identity).

================
Jillion 5.0
================

LICENSE CHANGE

Jillion 5 is now LGPL 2.1.  Previous versions of Jillion are GPL 3 and will remain that way.

Jillion 5 now uses the same license as Bio* projects and commercial software may now
use Jillion's jar file in their software.

New Features
------------

1. Added LucyVectorSpliceTrimmer that performs vector splice trimming using 
   a simplified version of the algorithm that the TIGR program Lucy used.
   
2. Added new SplitFastaWriter and SplitFastqWriter classes which have 3 factory methods to make 
   different Writer implementations that split up writing records to different files using different
   strategies: roundRobin(), rollover() and deconvolve().  Each method takes a lambda function
   to create the new individual writers, and deconvolve() takes a second lambda which determines which
   output file the record will go to.
   
3. FastqWriterBuilder and FastaWriterBuilders for Nucleotide, Protein, Quality and Position files 
   can now sort records using a Comparator.  Both in-memory only sorting
   and sorting that uses temp files are supported.  An additional overloaded
   sort() method takes a File object that is the directory to create the temp files in 
   (default directory is System temp).  Using the temp files to help with sorting
   allows writing very large sorted output files that would not all fit in memory.
   
4. FastqFileDataStoreBuilder and FastaFileDataStoreBuilders for Nucleotide, Protein, Quality and Position files
   can now filter ids by Predicate<String>.  Previously you had to implement the DataStoreFilter interface.

5. FastqFileDataStoreBuilder and FastaFileDataStoreBuilders for Nucleotide, Protein, Quality and Position files
   can now filter records by Predicate<FastqRecord> and Predicate<FastaRecord> respectively.
   This allows you to very easily include/exclude records from the DataStore using criteria
   other than the record id.  This also removes a lot of boilerplate code of iterating through the file
   multiple times to make a second datastore of only the data you wanted.  
   For example, to make a NucleotideFastaDatastore where the sequences are all > 1000bp:
   
   new NucleotideFastaFileDataStoreBuilder(fastaFile)
           .filterRecords(record -> record.getLength() > 1000)
           .build();
						
6. Changed DataStoreFilter, ReadFilter and SliceElementFilter to now have the parent interface Predicate<T> and changed
   all the APIs that use these classes to take a Predicate instead.  
   This lets you use lambda expressions anywhere a filter was used before, which is much easier to read,
   takes fewer characters to write and allows filters to be more reusable.  For example:
   
   new ContigCoverageMapBuilder<>(contig)
        			.filter(read -> read.getDirection() == Direction.FORWARD)
        			.build()
        			
   will make a CoverageMap object that only contains forward reads.

7. Created new GenomeStatistics class which is a utility class for computing different
    statistical measurements about genomes (for example N50).  It uses the new Java 8
    Collector interface.  For example to compute the N50 of all the records in a Fasta file:
    
    try(NucleotideFastaDataStore datastore = new NucleotideFastaFileDataStoreBuilder(fastaFile)
                                                    .hint(DataStoreProviderHint.ITERATION_ONLY)
                                                    .build();
        Stream<NucleotideFastaRecord> stream = datastore.iterator().toStream();
    ){
        OptionalInt n50Value = stream
                                .map(fasta -> fasta.getLength())
                                .collect(GenomeStatistics.n50Collector());
        
        //return value is optional because there might not be any records!
        if(n50Value.isPresent()){
            System.out.println("N50 = " + n50Value.getAsInt());
        }
    }
    
8.  Created new CoverageMapCollectors class which is a utility class for creating
    Java 8 Collector objects that create CoverageMap objects.  For example, 
    if you had a contig and wanted a coverage map of the alignment locations of 
    just the forward reads, capped to a max of 200x coverage, the code would look like this:
    
    CoverageMap<Range> forwardCoverageMap200x = contig.reads()
	 										         .filter(read -> read.getDirection() == Direction.FORWARD)
	 										         .map(AssembledRead::asRange)
	 										         .collect(CoverageMapCollectors.toCoverageMap(200));

9.  Performance improvements to Fastq file parsing.  When not using Mementos
	or DataStoreProviderHint.RANDOM_ACCESS_OPTIMIZE_MEMORY (which uses mementos)
	parsing time is now improved by 400%!

10.  Performance improvements of FastqFileParser and built in FastqWriter implementation for the most common
    use-case of parsing a fastq file and writing out the FastqRecord instances as is to a different writer.
    New internal classes are now used which don't convert the encoded quality strings into QualitySequence
    objects unless getQualitySequence() is called.  This takes up slightly more memory per record
    which usually isn't an issue because most of the time the files are streamed as ITERATION_ONLY
    and so the records will be GC'ed as soon as they are out of scope in the iterator.  
    When tested on large 25 million read fastq files from 1000genomes project, throughput improved by 25%.

11.  Added new DataStoreFilters factory method: containedInDataStore(DataStore datastore) which will only accept ids that are
     also contained in the given datastore.
     
12. Created InputStreamSupplier with support for normal files, zipped and gzipped files.
	This lets users parse compressed files multiple times.  Previously
	compressed files could only be parsed a single time via
	InputStream constructors.
	
13. Performance improvements of parsing BAM files including now using the BAM index if present 
    to help skip over unnecessary parts of the file.

14. AceFileParser - more lenient Consensus Tag timestamp parsers to support CLC Workbench ace output
    which doesn't follow the ace file spec regarding timestamp resolution.
    
15. Jillion 5 is now OSGI compliant and can now be used in an OSGI container.
    All classes except for those under org.jcvi.jillion.internal.* are exported.
    
16. BtabWriter implementation created by the BtabWriterBuilder can now
    format the dates differently by using alternate Locales.
    
17. Added new default method to SamAttributeValidator: thenComparing(SamAttributeValidator other), which returns
    a new SamAttributeValidator that checks both validators in a chain and only passes if both validators pass
    the attribute. Uses a similar construction to the new Java 8 Comparator.thenComparing(...) methods.
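
Illustration for item 7 above (a sketch, not Jillion code): the class below uses only the JDK to show
what an N50 Collector computes.  It assumes the common definition of N50: sort the record lengths from
longest to shortest and report the length at which the running total first reaches half of all bases.
The names N50Sketch and n50() are hypothetical; GenomeStatistics.n50Collector() may be implemented differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.OptionalInt;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class N50Sketch {

    /** Collects Integer lengths into a list, then computes the N50 of that list. */
    static Collector<Integer, ?, OptionalInt> n50Collector() {
        return Collectors.collectingAndThen(Collectors.toList(), N50Sketch::n50);
    }

    /** Returns the N50 of the given lengths, or empty if there are no lengths. */
    static OptionalInt n50(List<Integer> lengths) {
        if (lengths.isEmpty()) {
            return OptionalInt.empty();
        }
        List<Integer> sorted = new ArrayList<>(lengths);
        sorted.sort(Comparator.reverseOrder());

        long total = 0;
        for (int length : sorted) {
            total += length;
        }
        long running = 0;
        for (int length : sorted) {
            running += length;
            // stop at the first length where the running total covers half of all bases
            if (2 * running >= total) {
                return OptionalInt.of(length);
            }
        }
        return OptionalInt.empty(); // not reached for a non-empty list of positive lengths
    }
}
```

For the lengths 80, 70, 50, 40, 30 and 20 (290 bases in total) the sketch returns 70, because 80 + 70 = 150
is the first running total to reach half of 290.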

API Changes
-----------

1. Added Java 8 Lambda Support to many APIs.

2. Moved quality trimming classes that used to be in org.jcvi.jillion.core.qual.trim to new
   org.jcvi.jillion.trim package.  LucyQualityTrimmer moved to new org.jcvi.jillion.trim.lucy package.
   
3. Added new FastqFileDataStore interface which is a sub-interface of FastqDataStore and adds one method getQualityCodec()
   which returns the FastqQualityCodec object that was used to encode all the fastq records in the file.
   FastqFileDataStoreBuilder objects have changed to return this new interface.
   
4.  Added new method Stream<T> Contig.reads() which returns a new Java 8 Stream of reads of the appropriate type.
    The Stream can then be used in any normal Java 8 Stream chain or as input to one of the new Jillion
    Collectors described in the New Features section above.
    
5.  Added StreamingIterator.toStream() method which returns a Java 8 Stream<T> to easily
    convert from Jillion StreamingIterators to the new Java 8 API.  The Stream still needs to be closed
    so putting it inside a try-with-resources statement is still recommended.
   
6.  Changed Position Fasta package to conform to the same API as the other Fasta packages for 
    nucleotides, proteins and qualities. There is now a PositionFastaFileDataStoreBuilder 
    to make the datastores with the same filter and hint methods as the other similar builders.
    All the previous DataStore implementation classes that used to be public are 
    now package private.  Please use the PositionFastaFileDataStoreBuilder only.
    
7.  Made FastqFileParser class package private.  Please use the new FastqFileParserBuilder object
    instead which has more configuration methods to more easily create FastqParser objects.
    This was to avoid the explosion of factory methods for FastqParser to handle all the
    possible combinations of inputstreams, Files, compressed Files, comments on defline,
    and multiline sequences. 
    
8.  Added new method FastqRecord.getLength() which is a convenience method for FastqRecord.getNucleotideSequence().getLength(),
    but some implementations may use a more optimized implementation.  Added as a default method.
    
9.  Added new method FastaRecord.getLength() which is a convenience method for FastaRecord.getSequence().getLength(),
    but some implementations may use a more optimized implementation.  Added as a default method.
 

10.  Moved experimental code from its own "experimental" and "experimental-test" folders to the
    normal src and test folders.
    
11.  Changed package names of the experimental code from org.jcvi.jillion_experimental.*
    to org.jcvi.jillion.experimental.*
  
12.  SWITCHED BUILD TOOL FROM ANT TO MAVEN. Use custom configuration to keep original
    folder structure.
    
13. removed jodatime dependency from ConsedAssemblyTransformerBuilder which was accidentally
    put in during testing to allow unit tests to use a fixed phred_date.  This caused the jillion
    jar to depend on jodatime when it should not have any dependencies other than JDK 8. 
    Replaced jodatime code with equivalent Java 8 Clock object.
    
14. Changed FastqFileParser and FastqFileDataStoreBuilder to use the new 
	InputStreamSupplier objects instead of Function<File, InputStream> which was not only repetitive
	but forced users to handle IOException themselves.
	
15. Changed FastqFileParser and FastqFileDataStoreBuilder constructors that take File objects
	to delegate to InputStreamSupplier.forFile(file) which handles the detection and decompression for you.
	This means you can give the constructors
	zipped or gzipped files and they will work as if they were uncompressed.
	
16.  Added zip and gzip support to FastaFileParser and the FastaFileDataStoreBuilders

17. Changed SamParser API methods from canAccept() and accept(...) to canParse() and parse(...).

18. Changed SamVisitor API to remove visitRecord(Callback, SamRecord) which was only called when visiting SAM files,
    and not BAM files.  Now all records visited will call 
    visitRecord(Callback callback, SamRecord record, VirtualFileOffset start, VirtualFileOffset end)
    where the start and end parameters will now be null if it's a SAM file and non-null if it's a BAM file.
    Previously start and end would never be null but the method was only called when visiting BAM files.  This led to a lot 
    of confusion and duplicated code when dealing with both SAM and BAM files.

19. Added new method to the SamParser API void parse(String referenceName, SamVisitor visitor)
    which will only visit the SamRecords in the file that map to the given reference.  Sorted Bam parser implementations
    can use the bam index if available to quickly seek right to the part of the bam file where the alignments for the 
    specified reference are stored.
    
20. Added new method to the SamParser API void parse(String referenceName, Range alignmentRange, SamVisitor visitor)
    which will only visit the SamRecords in the file that map to the given reference and whose alignments intersect the 
    given alignmentRange.  Sorted Bam parser implementations can use the bam index if available to quickly seek right
    to the parts of the bam file where the alignments for the specified reference are stored.
    
21. Added new factory methods to SamParserFactory: createFromBamIndex(File bam, File bamIndex) and 
    createFromBamIndex(File bam, File bamIndex, SamAttributeValidator validator) that use a more efficient
    bam parser that uses the provided index file to randomly access alignment information.
    
22. Changed SamParserFactory.create(File) to have additional checks to see if the given file
    is a coordinate sorted BAM file, and if it is, check to see if it also has a corresponding BAI
    file and if it does, then use the new parser implementation that uses the index as if
    createFromBamIndex(File bam, File bamIndex) was called.
    
23.  Added new helper method SamRecord.getAlignmentRange()

24.  Added SamVisitor Memento support with new SamParser.parse(visitor, SamVisitorMemento) method.
     Previously you could create mementos but couldn't use them.
     
25.  SamRecord.Builder is now pulled out into its own class SamRecordBuilder.

26.  Refactored SamRecord from a class to an interface.  The old SamRecord class is now package private. 
     All API methods in sam package now use the new SamRecord interface instead of the old class.
     
27. BtabWriter added new locale(Locale) method to change the Locale for the Date formatting.
    If not called, the default Locale is used.  Previous versions of Jillion always used the default Locale.
    
28.  Created new SamAttributed interface which has the methods hasAttribute(...) and getAttribute(...)
     SamRecord and SamRecordBuilder now both implement this interface.
     
29.  Added additional parameter to SamAttributeValidator to add a SamAttributed instance.  This will be
     the source that the attribute is from.  This allows new validators to be written to check other attributes
     from the same source.
     
30.  Created new SamAttributeValidator singleton class NoDuplicateSamAttribute that makes sure the given 
    SamAttributeKey isn't already used by the record, which would be a violation of the SAM specification.
    
31.  Added new default method to SamAttributeValidator: thenComparing(SamAttributeValidator other), which returns
    a new SamAttributeValidator that checks both validators in a chain and only passes if both validators pass the attribute.
    Uses a similar construction to the new Java 8 Comparator.thenComparing(...) methods.
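
Illustration for the thenComparing(...) item above (a sketch under assumptions): this change log does not
show the real SamAttributeValidator signature, so the hypothetical AttributeValidator interface below simply
returns a boolean.  It only demonstrates how a Java 8 default method can chain two checks so that the
combined validator passes only when both pass.

```java
import java.util.Objects;

final class ValidatorChainSketch {

    /** Hypothetical stand-in for a validator; not the Jillion interface. */
    @FunctionalInterface
    interface AttributeValidator {
        boolean isValid(String key, String value);

        /** Returns a validator that passes only if this validator and then the other both pass. */
        default AttributeValidator thenComparing(AttributeValidator other) {
            Objects.requireNonNull(other);
            return (key, value) -> isValid(key, value) && other.isValid(key, value);
        }
    }

    // two small example checks (assumptions chosen only for the demonstration)
    static final AttributeValidator TWO_CHAR_KEY = (key, value) -> key.length() == 2;
    static final AttributeValidator NON_EMPTY_VALUE = (key, value) -> !value.isEmpty();

    // the chained validator short-circuits: the second check only runs if the first passes
    static final AttributeValidator BOTH = TWO_CHAR_KEY.thenComparing(NON_EMPTY_VALUE);
}
```

Because the chain short-circuits, cheap checks are best placed first.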
  

Bug Fixes
----------
1. Generating a 454 Universal Accession Number did not
   produce a valid id if the location x,y coordinates were very small.
   
2. Bug Fix in SAM and BAM header writer which incorrectly wrote out the MD5 values of the references as "MD5" 
   instead of the actual md5 hash value.
   
3. Bug Fix in SAM and BAM header writer which incorrectly wrote out the URI path to the reference file to be the md5 value
   instead of the actual path.

4. Bug Fix in BAM writer which incorrectly computed BAM bin.

5. Bug Fixes in BAM index writer which incorrectly computed BAM bin and intervals.
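
Background for the BAM bin fixes above (a sketch, not the Jillion implementation): the SAM specification
defines the bin of an alignment with a reg2bin function for the standard binning scheme.  A direct Java port
is shown below.  The class name BamBinSketch is hypothetical, and coordinates are assumed to be 0-based and
half-open, [beg, end).

```java
final class BamBinSketch {

    /**
     * Computes the BAM bin for the 0-based half-open interval [beg, end),
     * following the reg2bin function given in the SAM specification.
     */
    static int reg2bin(int beg, int end) {
        end--; // switch to an inclusive end coordinate
        if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14); // 16 kbp bins
        if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17); // 128 kbp bins
        if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);  // 1 Mbp bins
        if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);  // 8 Mbp bins
        if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);  // 64 Mbp bins
        return 0; // the interval only fits in the single top-level bin
    }
}
```

An alignment that stays inside the first 16 kbp window gets bin 4681, while one that crosses into the
next window falls back to the enclosing 128 kbp bin 585.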

================
Jillion 4.2
================

New Features
------------
1.  FastaFileParser and all Fasta Datastore implementations
    now support non-redundant text fasta files like the ones described in
    ftp://ftp.ncbi.nih.gov/blast/db/README.
    If non-redundant records are encountered, then the visitXXX methods will be called
    in a way such that it will appear as if they were redundantly listed.  The non-redundant
    defline will be split and each identical sequence will be visited separately with each
    of the many ids for it.  org.jcvi.jillion.fasta.FastaVisitorCallback.FastaVisitorMemento objects
    are also non-redundant aware and will correctly only visit the subset of non-redundant records according to when
    the memento was created.

2.  AminoAcid - Added Pyrrolysine 'O' to AminoAcid class and AminoAcidSequence
    as well as Blosum matrices. 
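
Illustration for the non-redundant fasta item above (a sketch, not Jillion code): in NCBI non-redundant
fasta files the deflines of identical sequences are merged into one line, separated by Ctrl-A (\u0001)
characters.  The hypothetical helper below splits such a defline into its individual entries so that each
one can be visited with the same sequence; how FastaFileParser actually does this is not shown in this change log.

```java
import java.util.ArrayList;
import java.util.List;

final class NonRedundantDeflineSketch {

    /** Splits a merged defline (with or without the leading '>') into its individual entries. */
    static List<String> splitDefline(String defline) {
        String body = defline.startsWith(">") ? defline.substring(1) : defline;
        List<String> entries = new ArrayList<>();
        for (String entry : body.split("\u0001")) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    /** Returns the id of one entry: everything before the first space. */
    static String idOf(String entry) {
        int space = entry.indexOf(' ');
        return space < 0 ? entry : entry.substring(0, space);
    }
}
```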

Bug Fixes
----------

1.  AlnFileParser - added lowercase basecall support which is used in MAFFT output.
                    Jillion can now successfully parse .aln files produced by MAFFT.
    
2.  XMLBlastParser - If accession in subject is "No definition line found" will
                     use subjectDefline instead.
                     
3.  PrimerDetector - Added guard clause if input sequence is empty, 
                     then empty collection of hits is returned.
                    
API Changes
-----------

1.  Renamed *FastaRecordWriter* to *FastaWriter* to make smaller class names

2.  Renamed *FastqRecordWriter* to *FastqWriter* to make smaller class names

3.  Blast- BlastHit now has subjectDefline field instead of
           subjectDeflineComment which only used to have the comment.
           
4.  BtabWriter - now puts alignment length in previously skipped column

================
Jillion 4.1 - internal release only
================

New Features
------------

1.  Added support for AminoAcid ambiguity codes in AminoAcid class and AminoAcidSequence
    as well as Blosum matrices. 
    
2.  Added support for Amino Acid 'U' Selenocysteine.

API Changes
-----------

1.  removed getNucleotideSequenceBuilder from AssembledReadBuilder since we want
    to control all sequence manipulations directly to keep read ranges in sync.
    
2.  Added method CoverageRegion.getLength() to avoid having to chain region.asRange().getLength()
 
3.  Moved Frame class to residue package.

4.  Added new class NucleotideSequencePermuter

5.  Improved Javadoc

Bug Fixes
---------
1.  Changed alignment resource loading from loading file to loading inputStream.
    Getting the resource as a file doesn't work once Jillion has been jarred up.
    
2.  Modified AceContigBuilder and AceReadBuilder to call back to their parent
    builder to update contig left and right if the read size changes.
    
3.  added code to (Hopefully) stop validating the DTD which was
    DDOS'ing NCBI when this code was used on the grid.
    
4.  FastqFileParser - Bug fix for unindexed Casava 1.8 read ids.

5.  AceContigBuilder - Bug fix for inserting bases in ace contig that extend beyond original consensus
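
Illustration for the DTD item above (a sketch; the change log does not say how Jillion does it): one common
JDK-only way to keep an XML parser from downloading an external DTD is to install an EntityResolver that
answers every external entity request with an empty document.  The class and method names below are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NoDtdFetchSketch {

    /** Parses the XML without validating it and without fetching any external DTD. */
    static Document parseWithoutFetchingDtd(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // resolve every external entity (including the DTD) to an empty document
            // so the parser never opens a network connection for it
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            return builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse xml", e);
        }
    }
}
```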

================
Jillion 4.0 RC 5 - Added support for indexed BAM files and improved SAM/BAM API.
                   Bug fixes and performance improvements.  Added more javadoc.
================

API Changes
-----------
1.  Moved many previously public classes in SAM and BAM packages to be in "internal" packages,
    which are not intended to be used by external clients.  This greatly simplifies the public facing API.

2.  Added BAM index support for reading and writing.  This includes support for BAM "metadata" records
    which are not in the SAM specification but are created and used by both samtools and Picard.

3.  Added FastqQualityCodec.getOffset() method to get the integer offset for the encoding.  This will
    return 33 or 64 depending on the implementation. 
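
Illustration for the getOffset() item above (a sketch, not Jillion code): fastq quality strings store each
Phred score as the ASCII character whose code is the score plus a fixed offset, which is why the offset
(33 or 64) is all that is needed to convert in either direction.  The class QualityOffsetSketch is hypothetical.

```java
final class QualityOffsetSketch {

    /** Decodes an encoded quality string into Phred scores using the given ASCII offset. */
    static int[] decode(String encoded, int offset) {
        int[] scores = new int[encoded.length()];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = encoded.charAt(i) - offset;
        }
        return scores;
    }

    /** Encodes Phred scores into a quality string using the given ASCII offset. */
    static String encode(int[] scores, int offset) {
        StringBuilder builder = new StringBuilder(scores.length);
        for (int score : scores) {
            builder.append((char) (score + offset));
        }
        return builder.toString();
    }
}
```

The same scores 40 and 20 are written as "I5" with offset 33 and as "hT" with offset 64.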

Bug Fixes
----------
1.  Fixed bug in BAM VirtualFileOffset computation so it matches Picard.  The bug was 
    related to computing the offset at a BGZF block boundary.  Picard sets the offset
    to the beginning of the next block.
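
Illustration for the VirtualFileOffset fix above (a sketch based on the SAM specification, not Jillion code):
a BGZF virtual file offset packs the file position of a compressed block into the upper 48 bits and the
position inside that block's uncompressed data into the lower 16 bits.  The atPosition() helper is a
hypothetical rendering of the boundary rule described in the fix: a position exactly at the end of a
block is reported as the start of the next block.

```java
final class VirtualOffsetSketch {

    /** Packs a compressed block start and an offset inside the uncompressed block into one long. */
    static long encode(long blockStart, int offsetInBlock) {
        if (blockStart < 0 || blockStart >= (1L << 48)) {
            throw new IllegalArgumentException("block start must fit in 48 bits");
        }
        if (offsetInBlock < 0 || offsetInBlock > 0xFFFF) {
            throw new IllegalArgumentException("offset in block must fit in 16 bits");
        }
        return (blockStart << 16) | offsetInBlock;
    }

    static long blockStart(long virtualOffset) {
        return virtualOffset >>> 16;
    }

    static int offsetInBlock(long virtualOffset) {
        return (int) (virtualOffset & 0xFFFF);
    }

    /** Hypothetical helper: a position at the very end of a block points at the next block instead. */
    static long atPosition(long blockStart, int compressedBlockSize,
                           int uncompressedBlockSize, int offsetInBlock) {
        if (offsetInBlock == uncompressedBlockSize) {
            return encode(blockStart + compressedBlockSize, 0);
        }
        return encode(blockStart, offsetInBlock);
    }
}
```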
    
Performance Improvements
------------------------
1. Improved BAM parsing and writing code to be 25% faster.  Some of these improvements
   might also improve cas file parsing. 

================
Jillion 4.0 RC 4 - Added SAM/BAM and MAQ bfa and bfq support.  Promoted pairwise alignment code out of experimental
                   and into production. Bug fixes and performance improvements.
================

API Changes
------------
1. Added new NucleotideSequenceBuilder constructor that takes a char[]

2. Added append/insert/prepend methods to NucleotideSequenceBuilder that take a char[]

3. Completely changed interfaces for parsing with Visitors.  
   Followed the AceHandler design but renamed methods.  
   XFileParser classes are now factories to create instances of XParser interface.
   XParser has (usually) 3 methods: parse(XVisitor), parse(XVisitor, XMemento) and canParse().
   parse() is the new name for accept().  canParse() returns a boolean to say 
   whether a call to parse(...) will succeed or throw IllegalStateException.  
   This is because some implementations such as file parsers using an
   inputStream can only call parse() once.  
   Any additional calls to parse(...) will fail since we can't always rewind the Stream.
   
   This will decouple visiting from an actual file so we could visit off of other
   types of objects (like Contig objects in tests).
   
4.  Changed all class references from XFileParser to XParser objects.

5.  Changed FastaFileBuilders to try to create input File's parent directory if it does not exist;
    this made the constructor throw IOException instead of FileNotFoundException.
    
6.  Moved alignment packages out of experimental and into main.  New package is org.jcvi.jillion.align.

    Refactored pairwise alignment code to use more intent revealing PairwiseAlignmentBuilder 
    class with 2 static factory methods: createNucleotideAlignment( ..) and createProteinAlignment(..)
    to handle all the messy generics.  Builder has method to specify local vs global alignment 
    to hide algorithm name implementation details.
    
    Added support for Protein Blast results as well as code to auto-detect
    nucleotide or protein blast results to build the correct HSPs.
    
    Renamed ScoringMatrix to SubstitutionMatrix.  Created NucleotideSubstitutionMatrices utility classes.
    Refactored matrix file parser classes into abstract class with 2 subclasses to handle nucleotide and protein matrices.
    Moved more classes into align package.

7.  Renamed AminoAcidSequence to ProteinSequence.  
    All other public classes with AminoAcidSequence in their name now use ProteinSequence instead.
    For example: AminoAcidFastaRecord is now called ProteinFastaRecord.

8.  Added new methods to FastaWriterBuilders: #lineSeparator(String) to change the line separator from \n to
    support Windows, and #allBasesOnOneLine() to force all data on one line instead of splitting the data onto multiple lines.
    
9.  made CompactProteinSequenceCodec package private.

10. Changed cas parser to default to fasta format type for unknown file extension or file without extension. 
    Previously default was chromatogram.  Chromatograms now must have either 'ab1', 'abi', 'scf' or 'ztr' file extensions.
    
11. Added getters and setters to Range.Builder, which previously only had expand and contract methods.

12. Created AlnWriter interface and AlnFileWriter classes to handle writing nucleotide and protein aln files.

13. Added Support for reading SAM and BAM formatted files. 
    SamHeader, Cigar, SamRecord, ReadGroups, SamAttributes etc all have object representations.
     
14.  Added new method Range.complement(Collection<Range> ) which will return Ranges in the input collection that are not in "this" Range.

15. Made Range.complementFrom(Collection<Range> ) package private since it is not used anymore outside of the Range class
    and is confusing since there are other complement methods.
    
16. Created new AssemblyTransformer interface which will be used to convert from one format into another. 
    Created CasAssemblyTransformationService to convert cas files into something else using AssemblyTransformer.
    Created ConsedAssemblyTransformerBuilder for transforming assemblies into consed packages. 
    (Passing the built assemblyTransformer from this class into the CasAssemblyTransformationService 
    will perform a cas2consed conversion.)
    
17. Created new class DataStoreEntry which is similar to MapEntry<String, T>. 
    Added new method DataStore.entryIterator() which returns a StreamingIterator<DataStoreEntry<T>>.

18.  Added methods isReadOnceOnly() and canCreateMemento() to FastaParser and FastqParser
     interface to distinguish between file-based and inputStream based datasources.  This may be added to all
     other parsers eventually.

19.  Modified all fasta and fastq DataStore implementations to accept Parser instances in addition to file and inputStream.

20.  Added support for MAQ binary fasta (.bfa) and binary fastq (.bfq) formatted files.
     Maq binary fasta and fastq datastore classes delegate to their nucleotide fasta and fastq counterparts by 
     just passing their BinaryParser instance to them for improved code re-use.
     
21.  FastqFileWriter now has an includeIdOnQualDefLine(boolean) method to add the ids to the quality defline. Some
     legacy external fastq pipelines require seeing the id on both the sequence and quality deflines so this is to support them.
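
Illustration for item 3 above (a sketch, not a Jillion class): the hypothetical parser below wraps an
InputStream, so it can only be parsed once.  canParse() reports whether parse(...) may still be called,
and a second call to parse(...) throws IllegalStateException because the stream cannot be rewound.
A plain Consumer<String> stands in for the visitor.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

final class ReadOnceParserSketch {
    private final InputStream in;
    private boolean consumed = false;

    ReadOnceParserSketch(InputStream in) {
        this.in = in;
    }

    /** Returns true only while the underlying stream has not been read yet. */
    boolean canParse() {
        return !consumed;
    }

    /** Visits every line once; any later call fails because the stream is used up. */
    void parse(Consumer<String> lineVisitor) {
        if (!canParse()) {
            throw new IllegalStateException("stream has already been parsed");
        }
        consumed = true;
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineVisitor.accept(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A file-backed parser would instead reopen the file on each call and could return true from canParse() every time.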
     
     
Bug Fixes
------------
1. Fixed bug in QualitySequenceBuilder AND AminoAcidSequenceBuilder where trimming beyond current Range threw Exception.

2. Changed AbstractAceContigBuilderVisitor to get valid range ONLY from QA align coordinates instead of intersection.
   This solves problem of user edited read extensions not getting parsed correctly.
   
3. Fixed bug in NucleotideSequenceBuilder that didn't treat \t as a whitespace character.

4. Changed NucleotideSequenceBuilder to ignore \0 in char arrays.

5. Fixed bug in XML BLAST parser to call visitEnd() at end of file.

6. Major Bug fix in ResidueSequence.isGap() which didn't work for the NucleotideCodecs
   if the offset requested was the first gap in the read! 


Performance Improvements
------------------------
1.  Added optional fastqCodec field to AbstractAlignedCasReadVisitor so
    fastq files included in cas assemblies don't have to be parsed twice to guess encoding.
2.  Increased use of StringBuilders instead of String concatenation. Set large initial buffer sizes
    to avoid as many re-sizing operations as possible.

================
Jillion 4.0 RC 3 - Most changes address performance improvements.
================

API Changes
------------
1. Created new TranslationTable interface to translate NucleotideSequence
   into AminoAcidSequence.
   
2. Created new IupacTranslationTables enum which has the common Genbank Translation tables.
   Class contains methods to get by name and by genbank table number.
   
3. Removed old Codon class.

4. Removed SequenceBuilder.build(Range) since it was never used.

5. Refactored AminoAcid classes moved AminoAcids helper class to test
   directory and renamed it AminoAcidUtil.
   
6. Removed GlyphCodec.encode(Collection<T> ) and pushed it down to sub-interfaces.

7. Removed NucleotideCodec.encode(Collection<T> ) and all its uses 
   outside of tests.  Created new helper method in test area to perform
   function so it doesn't pollute public API.

8. Created AminoAcidCodec interface, previously was just GlyphCodec<AminoAcid>.

9. Added sortedInsert(value) and sortedRemove(value) to the growableArrays.

10. Made GrowableArrays implement Iterable, created new classes to Iterate over primitive arrays.

11. DefaultAssembledReadBuilder equals() and hashcode() have been changed
    to use only readId and direction which are the only immutable fields.
    Previously used reference field which is mutable and could break the
    equals and hashcode contract.
    
12. Changed AssembledReadBuilder to no longer have methods to programmatically set the
    reference since it is now done during build time.
    
13. Added support for creating "denovo" contigs without first providing an
    initial consensus sequence.  So far only supported in AceContigBuilders.
    Will add support for others later.
    
14. Changed StreamingIterator.close() to no longer throw IOException; this cleans up a
    lot of code that uses the new Java 7 try-with-resources.
 
15. Added sortedInsert( int[]) to GrowableIntArray and equivalents to 
    GrowableByteArray, GrowableShortArray and GrowableLongArray.
    
16. Changed ScfChromatogramWriter to accept Chromatogram objects not
    just ScfChromatogram objects.  If the Chromatogram is not
    an ScfChromatogram, then the writer pretends the scf-specific
    fields are blank.
    
17. Deleted AbstractConsensusCaller class; pushed used methods down
    to AbstractChurchhillWatermanConsensusCaller.
    
18. Added TraceArchiveWriter.getNumberOfTracesWritten().
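
Illustration for the sortedInsert(value) and sortedRemove(value) item above (a sketch, not the Jillion
GrowableIntArray): when the array is already sorted, a binary search finds the slot for a new value and
one System.arraycopy shifts the tail, so the array never has to be re-sorted.  The class
SortedIntArraySketch is hypothetical.

```java
import java.util.Arrays;

final class SortedIntArraySketch {
    private int[] data = new int[8];
    private int size = 0;

    /** Inserts the value at its sorted position, growing the backing array if needed. */
    void sortedInsert(int value) {
        int index = Arrays.binarySearch(data, 0, size, value);
        // a negative result encodes the insertion point as -(insertionPoint) - 1
        int insertAt = index >= 0 ? index : -(index + 1);
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        System.arraycopy(data, insertAt, data, insertAt + 1, size - insertAt);
        data[insertAt] = value;
        size++;
    }

    /** Removes one occurrence of the value; returns false if it is not present. */
    boolean sortedRemove(int value) {
        int index = Arrays.binarySearch(data, 0, size, value);
        if (index < 0) {
            return false;
        }
        System.arraycopy(data, index + 1, data, index, size - index - 1);
        size--;
        return true;
    }

    int[] toArray() {
        return Arrays.copyOf(data, size);
    }
}
```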
   
Performance Improvements
------------------------
1.  Changed NucleotideSequenceBuilder to track sorted gap offsets while building.
    This takes up slightly more memory when building but will enable optimized object
    creation during the build() step.
    
2.  Performance improvement: changed endianness of 2 bit nucleotide encodings to
    reduce the amount of computation.

3.  Changed NucleotideSequenceBuilder to use newly added 
    sortedInsert(value) and sortedRemove(value) on the growableArrays.

4.  To improve speed of cas2consed, changed casAlignmentBuilder from random
    access into reference sequence to see if value is gap to
    caching the gap offsets and then doing binary search on them. 
    
5.  Improved fake quality ace writing output.

6.  Performance improvement to NucleotideSequenceBuilder.ungap().

7.  Changed DefaultAssembledReadBuilder to no longer store the
    original NucleotideSequence reference field, since it is no longer
    used outside of the build() method.  This reduces memory footprint
    as well since there are millions of builder objects in memory all 
    with 1 fewer reference saving either 4 or 8 bytes each.
    
8.  Added QualityCodec.toQualityArray() to efficiently create byte array
    where array[i] is the ith quality score.  
    Changed EncodedQualitySequence.equals() to be based on
    array which should be faster most of the time (at least for reads).
    
9.  Performance improvement for NucleotideSequenceBuilder.insert(), append()
    and prepend() methods when a NucleotideSequence is passed in.  
    Previously the methods took an Iterable<Nucleotide> to accept both
    Sequences and Collections.  However since NucleotideSequence
    knows its length and how many gaps it has we can use that 
    information to pre-allocate the internal structures instead of
    possibly re-allocating them many times as the structures grow.
    
10. Performance improvement for NucleotideSequence : 
    changed Sequence gap computation methods to be more efficient.
    NucleotideCodec now computes those values which should be faster 
    than converting to a List<Integer> then iterating over them. 
    Also stops looking as soon as the current gap offset > offset 
    we care about, which will make it faster to convert between gapped 
    and ungapped offsets for low valued offsets where the sequence 
    is very gappy downstream.

11. Performance improvement for NucleotideSequenceBuilder:
    no longer uses BitSet to internally store data since that was very slow.
    Now use GrowableByteArray which is more than 2x faster but
    takes up 2x the memory.  Now each base is 1 byte instead of 
    4 bits while building.  (But when build() is called the base 
    will be 2 or 4 bits depending on context).  
    We can later implement a GrowableHalfByteArray or 
    something similar to do our own bit masking which will
    reduce the memory footprint if memory becomes a problem.
    
12. TextLineParser no longer uses a pushback inputstream but
    instead keeps the extra byte it read in its own memory.  
    This takes up less memory than a pushback inputstream and 
    we only need to check the unread byte once at the beginning
    of every nextLine() instead of while reading each byte with in.read().
     
13. Changed PairwiseAligner TraceBack matrix to pack 4 tracebacks into a single byte.
    This should reduce memory of smith-waterman and needleman wunsch alignments 
    by a factor of 4.  Previously aligning two 30k sequences to each 
    other took almost 2 GB, now it's 400 MB.
     
14. Changed NucleotideSequenceBuilder.NewValues.insert() to use
    new growableArray.sortedInsert() instead of growableArray.append(int[]); 
    growableArray.sort().  This was revealed to be a bottleneck when using a profiler.
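
Illustration for item 13 above (a sketch, not the Jillion TraceBack matrix): a traceback cell only needs
to record one of four directions, which fits in 2 bits, so four cells can share one byte.  The class name
and the direction codes below are hypothetical.

```java
final class PackedTracebackSketch {
    // hypothetical direction codes; each fits in 2 bits
    static final int STOP = 0;
    static final int DIAGONAL = 1;
    static final int UP = 2;
    static final int LEFT = 3;

    private final byte[] packed;
    private final int cols;

    PackedTracebackSketch(int rows, int cols) {
        this.cols = cols;
        long cells = (long) rows * cols;
        // 4 cells per byte, rounded up
        this.packed = new byte[(int) ((cells + 3) / 4)];
    }

    void set(int row, int col, int direction) {
        long cell = (long) row * cols + col;
        int index = (int) (cell >>> 2);      // which byte holds this cell
        int shift = (int) (cell & 3) * 2;    // which 2 bits inside that byte
        int cleared = packed[index] & ~(3 << shift);
        packed[index] = (byte) (cleared | ((direction & 3) << shift));
    }

    int get(int row, int col) {
        long cell = (long) row * cols + col;
        int index = (int) (cell >>> 2);
        int shift = (int) (cell & 3) * 2;
        return (packed[index] >>> shift) & 3;
    }

    int sizeInBytes() {
        return packed.length;
    }
}
```

Compared to one byte per cell this uses a quarter of the memory, at the cost of a shift and a mask on every access.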

Bug Fixes
------------
1. Fixed AminoAcidSequenceBuilder.toString() to actually print the
   sequence instead of the object ref to the growable array.

2. BlosumMatrix - Added support for Stop codon.

3. bug fix to GapQualityValueStrategy to fix quality values of reverse complemented reads.

================
Jillion 4.0 RC 2
================

New Features
------------
1.  Added support for new version of CLC .cas file format produced
    by new version of CLC software "clc_mapper".  Previously
    Jillion only supported cas files produced by CLC ref_assemble 
    software.

API Changes
------------
1.  Created ZtrChromatogramParser class to replace old ZtrChromatogramFileParser
    class so the Chromatogram parsers match the other Jillion parser and visitor 
    classes.
    
2.  Created ScfChromatogramParser and AbiChromatogramParser classes to replace
    old ScfChromatogramFileParser and AbiChromatogramFileParser classes so the 
    Chromatogram parsers match the other Jillion parser and visitor classes.
    
3.  Renamed ChromatogramFileVisitor.visitEndOfTrace() to visitEnd() to be consistent 
    with other Jillion visitors
    
4.  Changed Chromatogram equals contract to require quality values and id are equal as well.

5.  Removed ChromatogramFileVisitor.visitNewTrace() since it was never used by any chromatogram 
    implementations other than as a no-op.
    
6.  Changed Chromatogram.getPositionSequence() to getPeakSequence().

7.  Changed ChromatogramBuilder methods that get/set confidence to get/set qualities.

8.  Changed TasmBuilder.withAvgCoverage(Double) method to be 
    setCoverageInfo(Integer numReads, Double avgCov) so we also set number of reads.  
    Setting numReads to non-null will override the Contig.getNumberOfReads() methods 
    to be whatever you set it to. (helpful for annotation contigs)
    
9.  Changed AbstractTasmFileParser to set coverage info when parsing tasm header 
    and reset it if any reads parsed.  This allows annotation contigs to retain their
    correct # seqs and avgCoverage levels.

10. Added new method TasmContig.isAnnotationContig() to denote whether the numRecords and 
    avg coverage were explicitly set or were computed from underlying read info.
    
11. Changed FastaDataStoreBuilder classes to take InputStreams as well as Files.

12. Changed Slice.equals() to allow for SliceElements to be in any order as long as all present.  
    Previously order of elements mattered.

13. Renamed Phd.getPositionSequence() to Phd.getPeakSequence()

14. Removed TraceDecoderException and TraceEncoderException; 
    all code that used these classes now throws/catches IOException instead.

15. Changed AssembledRead.toReferenceOffset() and AssembledRead.toGappedValidRangeOffset()
    to throw IndexOutOfBoundsExceptions instead of IllegalArgumentExceptions if given invalid offsets.
    
16. Added TextLineParser.peekNextLine()

17. Renamed CtgFileWriter to TigrContigFileWriter.

18. Made DefaultPlacedContig package private since it is only 
    created internally by the ScaffoldBuilder objects.
    
19. Moved ScaffoldUtils class out of Jillion since it was never used except by other higher up modules 
    internal to JCVI (it was moved to one of those modules).
    
20. Added new method Slice.getConsensusCall() which optionally stores the consensus call for that slice; 
    this is needed for new consensus caller implementations.  If the consensus is not set, this method returns null.
    Changed SliceMapBuilder constructors that take a Contig to set each Slice's consensus call to the
    contig consensus.

21. Added method SliceBuilder.getCurrentCoverageDepth().

22. Added constructors and methods to SliceBuilder that take SliceElementFilter to more simply filter elements.

23. Changed CasFileParser constructor to be private; added new static factory method create() to use instead.

24. Added sort() and binarySearch() methods to GrowableArrays. 

25. Added clear() method to SequenceBuilders so the same object can be re-used without having
    to create new objects or gc'ing old objects.
    
26. Changed GapQualityValueStrategy to return the entire gapped complemented quality sequence all at once 
    instead of one offset at a time, since it is computationally intensive
    and most of the time you need the entire sequence anyway (for slice map building).
    
27. Renamed AceFileContigDataStore interface to AceFileDataStore to match the respective builder.
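
As an illustration of the order-insensitive equality described in item 12, here is a
minimal sketch.  SketchSlice is a hypothetical stand-in written for this example; it is
not the Jillion Slice class and makes no claim about how Jillion implements equals().
It assumes each element is identified by a unique read id.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a slice; NOT the Jillion Slice class.
final class SketchSlice {
    // read id -> base call, kept in insertion order
    private final Map<String, Character> elements = new LinkedHashMap<>();

    SketchSlice add(String readId, char base) {
        elements.put(readId, base);
        return this;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof SketchSlice)) {
            return false;
        }
        // Map.equals compares entry sets, so insertion order is ignored
        return elements.equals(((SketchSlice) obj).elements);
    }

    @Override
    public int hashCode() {
        // Map.hashCode sums the entry hash codes, so it is order-independent too
        return elements.hashCode();
    }
}
```

Two SketchSlice objects built from the same (read id, base) pairs in different orders
compare equal and share a hash code; a slice missing one of the elements does not.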

Performance Improvements
------------------------
1.  Improved performance of gap calculations for NucleotideSequence implementations that only contain ACGTN.

2.  Added constant Nucleotide.VALUES which stores a copy of values() as an unmodifiable List so we don't 
    keep having to clone it every time.

3.  Improved NucleotideSequence and NucleotideSequenceBuilder processing by
    optimizing bottlenecks detected with a profiler.
    
4.  Improved Fastq parsing by reducing the number of objects created during parsing.

5.  Modified ContigBuilders to use new GapQualityValue methods which can compute quality values
    faster than old method. 

6.  Performance improvements to Range construction.

7.  Rewrote NucleotideCodec implementations so that the class that stores each base
    in 4 bits can take advantage of the 'book keeping' information in the header that was
    used in the 2 bit implementations.  This makes computing gapped and ungapped 
    coordinates much faster with only a slight increase in memory usage.  
    This improves cas2consed times by 33%.
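
The idea behind item 2 can be sketched as follows.  SketchBase and its constants are
hypothetical names made up for this example; only the pattern (caching values() once in
an unmodifiable List) comes from the entry above, and the real Nucleotide enum may
differ in its details.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical enum used only to illustrate the caching pattern.
enum SketchBase {
    A, C, G, T, N;

    // values() returns a fresh copy of the constants array on every call;
    // caching one unmodifiable List avoids that repeated cloning.
    static final List<SketchBase> VALUES =
            Collections.unmodifiableList(Arrays.asList(values()));
}
```

Callers can then iterate over SketchBase.VALUES in hot loops without allocating a new
array each time, and the unmodifiable wrapper keeps the shared list from being changed.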

Bug Fixes
------------
1. Fixed internal TextLineParser class to correctly compute the 
   current position offset into a file that uses Windows '\r\n' EOLs.  
   This broke index offset datastore implementations on Windows machines.
   
2. MostFrequentBasecallConsensusCaller now uses cumulative quality scores 
   in the event of a tie.
   
3. Fixed bug in ACGTNucleotideCodec which incorrectly computed ungappedLength.

4. Fixed LargeTasmContigDataStore.iterator() method to return TasmContig instances 
   instead of TasmContigBuilder instances.

5. Fixed Bug in Range.iterator() if Range.getEnd() was Long.MAX_VALUE.

6. Fixed bug in DefaultAsmContig to not delete readbuilders during build().  
   This will now let users call build() multiple times without errors.
   
7.  Fixed PhdParser and Writer to correctly handle read tags in all allowed locations.

8.  Fixed bug in PhdWriter to write "UNPADDED_READ_POS" instead of "UNGAPPED_READ_POS"

9.  Fixed bugs in AbstractAlignedReadCasVisitor to correctly handle interleaved records.

10. Fix for phdBall writer to not write out the comment "null" if there wasn't any comment.

11. Bug fix for sff parser if sff read did not have quality trimming applied to it.

12.  Fixed Cas2consed to correctly trim sff reads.

13.  Bug fix in NucleotideSequenceBuilder which incorrectly counted N's as 
     ambiguities when determining the codec to use.

14.  Fixed bugs in GapQualityValueStrategy to correctly compute reverse complemented gap values. 

15.  Bug fix to cas2consed to take qualities into account when computing consensus.
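
Item 5 does not say what went wrong in Range.iterator().  One common failure mode for
inclusive ranges ending at Long.MAX_VALUE is that a loop of the form
"for (long i = start; i <= end; i++)" never terminates, because i wraps around to
Long.MIN_VALUE.  The sketch below assumes that this is the kind of problem involved and
shows one way to avoid it; InclusiveLongIterator is a hypothetical class, not Jillion code.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical iterator over the inclusive range [start, end]; NOT Jillion code.
final class InclusiveLongIterator implements Iterator<Long> {
    private final long end;
    private long next;
    private boolean done;

    InclusiveLongIterator(long start, long end) {
        this.next = start;
        this.end = end;
        this.done = start > end; // empty range
    }

    @Override
    public boolean hasNext() {
        return !done;
    }

    @Override
    public Long next() {
        if (done) {
            throw new NoSuchElementException();
        }
        long ret = next;
        if (next == end) {
            // stop here instead of incrementing, so we never overflow past end
            done = true;
        } else {
            next++;
        }
        return ret;
    }
}
```

Using a separate "done" flag rather than comparing next against end after incrementing
means the iterator stops cleanly even when end is Long.MAX_VALUE.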