This SRFI is currently in ``draft'' status.
To see an explanation of
each status that a SRFI can hold, see here.
To provide input on this SRFI, please
mail to
`<srfi minus 113 at srfi dot schemers dot org>`

. See
instructions here to
subscribe to the list. You can access previous messages via
the archive of the mailing list.

- Received: 2013/05/31
- Draft: 2013/06/01-2013/08/01
- Revised: 2013/06/09
- Revised: 2013/07/03
- Revised: 2013/12/04
- Revised: 2014/03/31

*Sets* and *bags* (also known as multisets) are unordered collections that can contain any Scheme object. Sets enforce the constraint that no two elements can be the same in the sense of the set's associated *equality predicate*; bags do not.

Sets are a standard part of the libraries of many high-level programming languages, including Smalltalk, Java, and C++. Racket provides general sets similar to those of this proposal, though with fewer procedures, and there is a Chicken egg called `sets`

(unfortunately undocumented), which provides a minimal set of procedures. SRFI 1 also provides a list-based implementation of sets.

Bags are useful for counting anything from a fixed set of possibilities, e.g. the number of each type of error in a log file or the number of uses of each word in a lexicon drawn from a body of documents. Although other data structures can serve the same purpose, using bags clearly expresses the programmer's intent and allows for optimization.

Insofar as possible, the names in this SRFI are harmonized with the names used for ordered collections (lists, strings, vectors, and bytevectors) in Scheme. However, `size`

is used instead of `length`

to express the number of elements in a collection, because `length`

implies order.

It's possible to use the general sets of this SRFI to contain characters, but the use of SRFI 14 is recommended instead. The names and facilities in this SRFI are harmonized with SRFI 14, except that SRFI 14 does not contain analogues of the `set-search!`

, `set>?`

, `set<=?`

, `set>=?`

, `set-remove`

, or `set-partition`

procedures.

Sets and bags do not have a lexical syntax representation. It's possible to use SRFI 108 quasi-literal constructors to create them in code, but this SRFI does not standardize how that is done.

The interface to general sets and bags depends on SRFI 114 comparators, despite that SRFI having a higher number than this one for hysterical raisins. Comparators conveniently package the equality predicate of the set with the hash function or comparison procedure needed to implement the set efficiently.

Sets and bags are mutually disjoint, and disjoint from other types of Scheme objects.

It is an error for any procedure defined in this SRFI to be invoked on sets or bags with distinct comparators (in the sense of `eq?`

).

It is an error to mutate any object while it is contained in a set or bag.

It is an error to add an object to a set or bag which does not satisfy the type test predicate of the comparator.

It is an error to add or remove an object for a set or a bag while iterating over it.

The procedures of this SRFI, by default, are "pure functional" — they do not alter their parameters. However, this SRFI also defines "linear-update" procedures, all of whose names end in `!`

. They have hybrid pure-functional/side-effecting semantics: they are allowed, but not required, to side-effect one of their parameters in order to construct their result. An implementation may legally implement these procedures as pure, side-effect-free functions, or it may implement them using side effects, depending upon the details of what is the most efficient or simple to implement in terms of the underlying representation.

It is an error to rely upon these procedures working by side effect. For example, this is not guaranteed to work:

(let* ((set1 (set 'a 'b 'c)) ; set1 = {a,b,c}. (set2 (set-adjoin! set1 'd))) ; Add d to {a,b,c}. set1) ; Could be either {a,b,c} or {a,b,c,d}.

However, this is well-defined:

(let ((set1 (set 'a 'b 'c))) (set-adjoin! set1 'd)) ; Add d to {a,b,c}.

So clients of these procedures write in a functional style, but must additionally be sure that, when the procedure is called, there are no other live pointers to the potentially-modified character set (hence the term "linear update").

There are two benefits to this convention:

Implementations are free to provide the most efficient possible implementation, either functional or side-effecting.

Programmers may nonetheless continue to assume that sets are purely functional data structures: they may be reliably shared without needing to be copied, uniquified, and so forth.

In practice, these procedures are most useful for efficiently constructing sets and bags in a side-effecting manner, in some limited local context, before passing the character set outside the local construction scope to be used in a functional manner.

Scheme provides no assistance in checking the linearity of the potentially side-effected parameters passed to these functions — there's no linear type checker or run-time mechanism for detecting violations.

Note that if an implementation uses no side effects at all, it is allowed to return existing sets and bags rather than newly allocated ones, even where this SRFI explicitly says otherwise.

Implementations of this SRFI are allowed to place restrictions on the comparators that the procedures accept. In particular, an implementation may require comparators to provide a comparison procedure. Alternatively, an implementation may require comparators to provide a hash function, unless the equality predicate of the comparator is `eq?`

, `eqv?`

, `equal?`

, `string=?`

, or `string-ci=?`

. Implementations must not require the provision of both a comparison procedure and a hash function.

Constructors:

`set`

,`set-contains?`

,`set-unfold`

Predicates:

`set?`

,`set-empty?`

,`set-disjoint?`

Accessors:

`set-member`

,`set-element-comparator`

Updaters:

`set-adjoin`

,`set-adjoin!`

,`set-replace`

,`set-replace!`

,`set-delete`

,`set-delete!`

,`set-delete-all`

,`set-delete-all!`

,`set-search!`

The whole set:

`set-size`

,`set-find`

,`set-count`

,`set-any?`

,`set-every?`

Mapping and folding:

`set-map`

,`set-for-each`

,`set-fold`

,`set-filter`

,`set-filter!`

,`set-remove`

,`set-remove!`

,`set-partition`

,`set-partition!`

Copying and conversion:

`set-copy`

,`set->list`

,`list->set`

,`list->set!`

Subsets:

`set=?`

,`set<?`

,`set>?`

,`set<=?`

,`set>=?`

Set theory operations:

`set-union`

,`set-intersection`

,`set-difference`

,`set-xor`

,`set-union!`

,`set-intersection!`

,`set-difference!`

,`set-xor!`

Additional bag procedures:

`bag-sum`

,`bag-sum!`

,`bag-product`

,`bag-product!`

,`bag-element-count`

,`bag-for-each-unique`

,`bag-fold-unique`

,`bag-increment!`

,`bag-decrement!`

,`bag->set`

,`set->bag`

,`set->bag!`

Comparators:

`set-comparator`

,`bag-comparator`

`(set `

*comparator* *element* ... `)`

Returns a newly allocated empty set. The *comparator* argument is a SRFI 114 comparator, which is used to control and distinguish the elements of the set. The *elements* are used to initialize the set.

`(set-unfold `

*stop? mapper successor seed comparator*`)`

Create a newly allocated set as if by `set`

using *comparator*. If the result of applying the predicate *stop?* to *seed* is true, return the set. Otherwise, apply the procedure *mapper* to *seed*. The value that *mapper* returns is added to the set. Then get a new seed by applying the procedure *successor* to *seed*, and repeat this algorithm.

`(set? `

*obj*`)`

Returns `#t`

if *obj* is a set, and `#f`

otherwise.

`(set-contains? `

*set element*`)`

Returns `#t`

if *element* is a member of *set* and `#f`

otherwise.

`(set-empty? `

*set*`)`

Returns `#t`

if *set* has no elements and `#f`

otherwise.

`(set-disjoint? `

*set _{1} set_{2}*

`)`

Returns `#t`

if *set _{1}* and

`#f`

otherwise.`(set-member `

*set element default*`)`

Returns the element of *set* that is equal, in the sense of *set's* equality predicate, to *element*. If *element* is not a member of *set*, *default* is returned.

`(set-element-comparator `

*set*`)`

Returns the comparator used to compare the elements of *set*.

`(set-adjoin `

*set element* ...`)`

`(set-adjoin! `

*set element* ...`)`

The `set-adjoin`

procedure returns a newly allocated set that uses the same comparator as *set* and contains all the values of *set*, and in addition each *element* unless it is already equal (in the sense of the comparator) to one of the existing or newly added members. It is an error to add an element to *set* that does not return `#t`

when passed to the type test procedure of the comparator.

`(set-replace `

*set element*`)`

`(set-replace! `

*set element*`)`

The `set-replace`

procedure returns a newly allocated set that uses the same comparator as *set* and contains all the values of *set* except as follows: If *element* is equal (in the sense of *set*'s comparator) to an existing member of *set*, then that member is omitted and replaced by *element*. If there is no such element in *set*, then *set* is returned unchanged.

The `set-adjoin!`

and `set-replace!`

procedures are the same as `set-adjoin`

and `set-replace`

, except that they are permitted to mutate and return the *set* argument rather than allocating a new set.

`(set-delete `

*set element* ...`)`

`(set-delete! `

*set element* ...`)`

`(set-delete-all `

*set element-list*`)`

`(set-delete-all! `

*set element-list*`)`

The `set-delete`

procedure returns a newly allocated set containing all the values of *set* except for any that are equal (in the sense of *set*'s comparator) to one or more of the *elements*. Any *element* that is not equal to some member of the set is ignored.

The `set-delete!`

procedure is the same as `set-delete`

, except that it is permitted to mutate and return the *set* argument rather than allocating a new set.

The `set-delete-all`

and `set-delete-all!`

procedures are the same as `set-delete`

and `set-delete!`

, except that they accept a single argument which is a list of elements to be deleted.

`(set-search! `

*set element failure success*`)`

The *set* is searched for *element*. If it is not found, then the *failure* procedure is tail-called with two continuation arguments, *insert* and *ignore*, and is expected to tail-call one of them. If *element* is found, then the *success* procedure is tail-called with the matching element of *set* and two continuations, *update* and *remove*, and is expected to tail-call one of them.

The effects of the continuations are as follows (where *obj* is any Scheme object):

Invoking

`(`

*insert obj*`)`

causes*element*to be inserted into*set*.Invoking

`(`

*ignore obj*`)`

causes*set*to remain unchanged.Invoking

`(`

*update new-element obj*`)`

causes*new-element*to be inserted into*set*in place of*element*.Invoking

`(`

*remove obj*`)`

causes the matching element of*set*to be removed from it.

In all cases, two values are returned: the possibly updated *set* and *obj*.

`(set-size `

*set*`)`

Returns the number of elements in *set* as an exact integer.

`(set-find `

*predicate set failure*`)`

Returns an arbitrarily chosen element of *set* that satisfies *predicate*, or the result of invoking *failure* with no arguments if there is none.

`(set-count `

*predicate set*`)`

Returns the number of elements of *set* that satisfy *predicate* as an exact integer.

`(set-any? `

*predicate set*`)`

Returns `#t`

if any element of *set* satisfies *predicate*, or `#f`

otherwise. Note that this differs from the SRFI 1 analogue because it does not return an element of the set.

`(set-every? `

*predicate set*`)`

Returns `#t`

if every element of *set* satisfies *predicate*, or `#f`

otherwise. Note that this differs from the SRFI 1 analogue because it does not return an element of the set.

`(set-map `

*comparator proc set*`)`

Applies *proc* to each element of *set* in arbitrary order and returns a newly allocated set, created as if by `(set `

*comparator*`)`

, which contains the results of the applications. For example:

(set-map string-ci-comparator symbol->string (set eq? 'foo 'bar 'baz)) => (set string-ci-comparator "foo" "bar" "baz")

Note that, when *proc* defines a mapping that is not 1:1, some of the mapped objects may be equivalent in the sense of *comparator's* equality predicate, and in this case duplicate elements are omitted as in the set constructor. For example:

(set-map (lambda (x) (quotient x 2)) integer-comparator (set integer-comparator 1 2 3 4 5)) => (set integer-comparator 0 1 2)

If the elements are the same in the sense of `eqv?`

, it is unpredictable which one will be preserved in the result.

`(set-for-each `

*proc set*`)`

Applies *proc* to *set* in arbitrary order, discarding the returned values. Returns an unspecified result.

`(set-fold `

*proc nil set*`)`

Invokes *proc* on each member of *set* in arbitrary order, passing the result of the previous invocation as a second argument. For the first invocation, *nil* is used as the second argument. Returns the result of the last invocation, or *nil* if there was no invocation.

`(set-filter `

*predicate set*`)`

Returns a newly allocated set with the same comparator as *set*, containing just the elements of *set* that satisfy *predicate*.

`(set-filter! `

*predicate set*`)`

A linear update procedure that returns a set containing just the elements of *set* that satisfy *predicate*.

`(set-remove `

*predicate set*`)`

Returns a newly allocated set with the same comparator as *set*, containing just the elements of *set* that do not satisfy *predicate*.

`(set-remove! `

*predicate set*`)`

A linear update procedure that returns a set containing just the elements of *set* that do not satisfy *predicate*.

`(set-partition `

*predicate set*`)`

Returns two values: a newly allocated set with the same comparator as *set* that contains just the elements of *set* that satisfy *predicate*, and another newly allocated set, also with the same comparator, that contains just the elements of *set* that do not satisfy *predicate*.

`(set-partition! `

*predicate set*`)`

A linear update procedure that returns two sets containing the elements of *set* that do and do not, respectively, not satisfy *predicate*.

`(set-copy `

*set*`)`

Returns a newly allocated set containing the elements of *set*, and using the same comparator.

`(set->list `

*set*`)`

Returns a newly allocated list containing the members of *set* in unspecified order.

`(list->set `

*comparator list*`)`

Returns a newly allocated set, created as if by `set`

using *comparator*, that contains the elements of *list*. Duplicate elements (in the sense of the equality predicate) are omitted.

`(list->set! `

*set list*`)`

Returns a set that contains the elements of both *set* and *list*. Duplicate elements (in the sense of the equality predicate) are omitted.

Note: The following three predicates do not obey the trichotomy law and therefore do not constitute a total order on sets.

`(set=? `

*set _{1} set_{2}* ...

`)`

Returns `#t`

if each *set* contains the same elements.

`(set<? `

*set _{1} set_{2}* ...

`)`

Returns `#t`

if each *set* other than the last is a proper subset of the following *set*, and `#f`

otherwise.

`(set>? `

*set _{1} set_{2}* ...

`)`

Returns `#t`

if each *set* other than the last is a proper superset of the following *set*, and `#f`

otherwise.

`(set<=? `

*set _{1} set_{2}* ...

`)`

Returns `#t`

if each *set* other than the last is a subset of the following *set*, and `#f`

otherwise.

`(set>=? `

*set _{1} set_{2}* ...

`)`

Returns `#t`

if each *set* other than the last is a superset of the following *set*, and `#f`

otherwise.

`(set-union `

*set _{1} set_{2}* ...

`)`

`(set-intersection `

*set _{1} set_{2}* ...

`)`

`(set-difference `

*set _{1} set_{2}* ...

`)`

`(set-xor `

*set _{1} set_{2}*

`)`

Return a newly allocated set that is the union, intersection, asymmetric difference, or symmetric difference of the *sets*. Asymmetric difference is extended to more than two sets by taking the difference between the first set and the union of the others. Symmetric difference is not extended beyond two sets. Elements in the result set are drawn from the first set in which they appear.

`(set-union! `

*set _{1} set_{2}* ...

`)`

`(set-intersection! `

*set _{1} set_{2}* ...

`)`

`(set-difference! `

*set _{1} set_{2}* ...

`)`

`(set-xor! `

*set _{1} set_{2}*

`)`

Linear update procedures returning a set that is the union, intersection, asymmetric difference, or symmetric difference of the *sets*. Asymmetric difference is extended to more than two sets by taking the difference between the first set and the union of the others. Symmetric difference is not extended beyond two sets. Elements in the result set are drawn from the first set in which they appear.

Bags are like sets, but can contain the same object more than once. However, if two elements that are the same in the sense of the equality predicate, but not in the sense of `eqv?`

, are both included, it is not guaranteed that they will remain distinct when retrieved from the bag. It is an error for a single procedure to be invoked on bags with different comparators.

The procedures for creating and manipulating bags are the same as those for sets, except that `set`

is replaced by `bag`

in their names, and that adjoining an element to a bag is effective even if the bag already contains the element. If two elements in a bag are the same in the sense of the bag's comparator, the implementation may in fact store just one of them.

The `bag-union`

, `bag-intersection`

, `bag-difference`

, and `bag-xor`

procedures (and their linear update analogues) behave as follows when both bags contain elements that are equal in the sense of the bags' comparator:

For

`bag-union`

, the number of equal elements in the result is the largest number of equal elements in any of the original bags.For

`bag-intersection`

, the number of equal elements in the result is the smallest number of equal elements in any of the original bags.For

`bag-difference`

, the number of equal elements in the result is the number of equal elements in the first bag, minus the number of elements in the other bags (but not less than zero).For

`bag-xor`

, the number of equal elements in the result is the absolute value of the difference between the number of equal elements in the first and second bags.

`(bag-sum `

*set _{1} set_{2}* ...

`)`

`(bag-sum! `

*bag _{1} bag_{2}* ...

`)`

The `bag-sum`

procedure returns a newly allocated bag containing all the unique elements in all the *bags*, such that the count of each unique element in the result is equal to the sum of the counts of that element in the arguments. It differs from `bag-union`

by treating identical elements as potentially distinct rather than attempting to match them up.

The `bag-sum!`

procedure is equivalent except that it is linear-update.

`(bag-product `

*n bag*`)`

`(bag-product! `

*n bag*`)`

`bag-product`

procedure returns a newly allocated bag containing all the unique elements in `bag`

multiplied by The `bag-product!`

procedure is equivalent except that it is linear-update.

`(bag-unique-size `

*bag*`)`

Returns the number of unique elements of *bag*.

`(bag-element-count `

*bag element*`)`

Returns an exact integer representing the number of times that *element* appears in *bag*.

`(bag-for-each-unique `

*proc bag*`)`

Applies *proc* to each unique element of *bag* in arbitrary order, passing the element and the number of times it occurs in *bag*, and discarding the returned values. Returns an unspecified result.

`(bag-fold-unique `

*proc nil bag*`)`

Invokes *proc* on each unique element of *bag* in arbitrary order, passing the number of occurrences as a second argument and the result of the previous invocation as a third argument. For the first invocation, *nil* is used as the third argument. Returns the result of the last invocation.

`(bag-increment! `

*bag element count*

`)`

`(bag-decrement! `

*bag element count*

`)`

Linear update procedures that return a bag with the same elements as *bag*, but with the element count of *element* in *bag* increased or decreased by the exact integer *count* (but not less than zero).

`(bag->set `

*bag*`)`

`(set->bag `

*set*`)`

`(set->bag! `

*bag set*`)`

The `bag->set`

procedure returns a newly allocated set containing the unique elements (in the sense of the equality predicate) of *bag*. The `set->bag`

procedure returns a newly allocated bag containing the elements of *set*. The `set->bag!`

procedure returns a bag containing the elements of both *bag* and *set*. In all cases, the comparator of the result is the same as the comparator of the argument or arguments.

`(bag->alist `

*bag*`)`

`(alist->bag `

*comparator alist*`)`

The `bag->alist`

procedure returns a newly allocated alist whose keys are the unique elements of *bag* and whose values are the number of occurrences of each element. The `alist->bag`

returning a newly allocated bag based on *comparator*, where the keys of *alist* specify the elements and the corresponding values of *alist* specify how many times they occur.

The following comparators are used to compare sets or bags, and allow sets of sets, bags of sets, etc.

`set-comparator`

`bag-comparator`

Note that these comparators do not provide comparison procedures, as there is no ordering between sets or bags. It is an error to compare sets or bags with different element comparators.

Kevin Wortman's detailed advice and whole-hearted cooperation with me has been essential to making this SRFI what it is. He has coordinated his efforts toward a forthcoming immutable deque, bag, and map SRFI with mine in this SRFI and my forthcoming mutable queue and hash table SRFIs.

Thanks also to Per Bothner, Art Gleckler, Aubrey Jaffer, Takashi Kato, Alexey Radul, Jim Rees, Alex Shinn, Charles Wang, and Alan Watson for their participation in the SRFI 113 mailing list, and to Michael Sperber for his editorial services.

Of course, I take all the blame for any errors or infelicities.

The implementation places the identifiers defined above into the `sets`

library.

Sets and bags are implemented as a thin veneer over hashtables.

The sample implementation contains the following files:

`sets-impl.scm`

– implementation of general sets and bags`sets.sld`

– an R7RS library named`(sets)`

`sets.scm`

– a Chicken library`sets-test.scm`

– test suite

The test suite will work with the Chicken `test`

egg, which is provided on Chibi as the `(chibi test)`

library.

Copyright (C) John Cowan 2013. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: Mike Sperber