Copyright | Unison Computing 2021 |
---|---|
License | MIT |
Maintainer | runar.bjarnason@unison.cloud |
Stability | experimental |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
A package that provides an API for fuzzy text search in Haskell, using a modified version of the Smith-Waterman algorithm. The search is intended to behave similarly to the excellent fzf tool by Junegunn Choi.
Synopsis
- bestMatch :: String -> String -> Maybe Alignment
- fuzzyFind :: [String] -> [String] -> [Alignment]
- fuzzyFindOn :: (a -> String) -> [String] -> [a] -> [(Alignment, a)]
- type Score = Int
- data Alignment = Alignment {}
- defaultMatchScore :: Int
- defaultMismatchScore :: Int
- defaultBoundaryBonus :: Int
- defaultCamelCaseBonus :: Int
- defaultFirstCharBonusMultiplier :: Int
- defaultGapPenalty :: Int
- defaultConsecutiveBonus :: Int
- defaultLaterBonusMultiplier :: Int
- segmentToString :: ResultSegment -> String
- highlight :: Alignment -> String
- highlight' :: Alignment -> Text
- bestMatch' :: Int -> Int -> Int -> Int -> Int -> Int -> Int -> Int -> String -> String -> Maybe Alignment
- data ResultSegment
- newtype Result = Result {}
- mergeResults :: Result -> Result -> Result
Documentation
bestMatch query string
will return Nothing
if query
is not a
subsequence of string
. Otherwise, it will return the "best" way to line up
the characters in query
with the characters in string
. Lower-case
characters in the query
are assumed to be case-insensitive, and upper-case
characters are assumed to be case-sensitive.
For example:
> bestMatch "ff" "FuzzyFind" Just (Alignment {score = 25, result = Result {[Match "F", Gap "uzzy", Match "F", Gap "ind"]}})
The score indicates how "good" the match is. Better matches have higher
scores. There's no maximum score (except for the upper limit of the Int
datatype), but the lowest score is 0
.
A substring from the query will generate a Match
, and any characters from
the input that don't result in a Match
will generate a Gap
.
Concatenating all the Match
and Gap
results should yield the original
input string.
Note that the matched characters in the input always occur in the same order as they do in the query pattern.
The algorithm prefers (and will generate higher scores for) the following kinds of matches:
- Contiguous characters from the query string. For example,
bestMatch "pp"
will find the last two ps in "pickled pepper". - Characters at the beginnings of words. For example,
bestMatch "pp"
will find the first two Ps in "Peter Piper". - Characters at CamelCase humps. For example,
bestMatch "bm" "BatMan"
will score higher thanbestMatch "bm" "Batman".
- The algorithm strongly prefers the first character of the query pattern
to be at the beginning of a word or CamelHump. For example,
bestMatch "mn" "Bat Man"
will score higher thanbestMatch "atn" "Batman"
.
All else being equal, matches that occur later in the input string are preferred.
Finds input strings that match all the given input patterns. For each input
that matches, it returns one Alignment
. The output is not sorted.
ascending.
For example:
> import Data.Foldable > traverse_ (putStrLn . ("\n" ++) . highlight) $ fuzzyFind ["dad", "mac", "dam"] ["red macadamia", "Madam Card"] Madam Card * *** ** * red macadamia * *******
fuzzyFindOn :: (a -> String) -> [String] -> [a] -> [(Alignment, a)] Source #
A version of fuzzyFind
that searches on the given text field of the data.
Instances
Monoid Alignment Source # | |
Semigroup Alignment Source # | |
Generic Alignment Source # | |
Show Alignment Source # | |
Eq Alignment Source # | |
Ord Alignment Source # | |
Defined in Text.FuzzyFind | |
type Rep Alignment Source # | |
Defined in Text.FuzzyFind type Rep Alignment = D1 ('MetaData "Alignment" "Text.FuzzyFind" "fuzzyfind-3.0.2-4YO0x8cFre42MKuVqElImZ" 'False) (C1 ('MetaCons "Alignment" 'PrefixI 'True) (S1 ('MetaSel ('Just "score") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Score) :*: S1 ('MetaSel ('Just "result") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 Result))) |
defaultMatchScore :: Int Source #
The base score given to a matching character
defaultMismatchScore :: Int Source #
The base score given to a mismatched character
defaultBoundaryBonus :: Int Source #
Bonus points given to characters matching at the beginning of words
defaultCamelCaseBonus :: Int Source #
Bonus points given to characters matching a hump of a CamelCase word. We subtract a point from the word boundary score, since a word boundary will incur a gap penalty.
defaultFirstCharBonusMultiplier :: Int Source #
Double any bonus points for matching the first pattern of the character. This way we strongly prefer starting the match at the beginning of a word.
defaultGapPenalty :: Int Source #
We prefer consecutive runs of matched characters in the pattern, so we impose a penalty for any gaps, which is added to the size of the gap.
defaultConsecutiveBonus :: Int Source #
We give a bonus to consecutive matching characters. A number about the same as the boundary bonus will prefer runs of consecutive characters vs finding acronyms.
defaultLaterBonusMultiplier :: Int Source #
We give a bonus for matches that occur later in the input string. If this is e.g. 100, we give one bonus point for each percentage of the length of the input that the match occurs at (100 points for the last character, 50 for characters in the middle, and so on).
highlight :: Alignment -> String Source #
Renders an Alignment
as a pair of lines with "*" on the lower line
indicating the location of pattern matches.
highlight' :: Alignment -> Text Source #
:: Int | Base score for a matching character. See |
-> Int | Base score for a mismatched character. See |
-> Int | Additional penalty for a gap. See |
-> Int | Bonus score for a match at the beginning of a word. See |
-> Int | Bonus score for a match on a CamelCase hump. See |
-> Int | Bonus multiplier for matching the first character of the pattern.
See |
-> Int | Bonus score for each consecutive character matched.
See |
-> Int | Bonus multiplier for matching later characters of the input. E.g. if this is 10, the bonus for matching the last character of the input is 10, and 5 for characters close to the middle, and so on. |
-> String | The query pattern. |
-> String | The input string. |
-> Maybe Alignment |
A highly configurable version of bestMatch
.
data ResultSegment Source #
Instances
Concatenating all the ResultSegment
s should yield the original input string.
Instances
Monoid Result Source # | |
Semigroup Result Source # | |
Generic Result Source # | |
Show Result Source # | |
Eq Result Source # | |
Ord Result Source # | |
type Rep Result Source # | |
Defined in Text.FuzzyFind type Rep Result = D1 ('MetaData "Result" "Text.FuzzyFind" "fuzzyfind-3.0.2-4YO0x8cFre42MKuVqElImZ" 'True) (C1 ('MetaCons "Result" 'PrefixI 'True) (S1 ('MetaSel ('Just "segments") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 (Seq ResultSegment)))) |