rewrite it in Haskell

This commit is contained in:
stuebinm 2023-11-14 02:25:10 +01:00
parent e816bce69c
commit 02b29877f4
11 changed files with 609 additions and 7016 deletions

2
.gitignore vendored
View file

@ -1 +1 @@
build/* dist-newstyle/*

11
CHANGELOG.md Normal file
View file

@ -0,0 +1,11 @@
# Revision history for bahnhof.name
## 0.2.0.0 -- 2023-11-14
* Rewrite it in Haskell
* New feature: query the overpass API for platform information
## 0.1.0.0 -- 2023-05-11
* First version. Released on an unsuspecting world.
* Basic search of stations, implemented in gleam

287
LICENSE Normal file
View file

@ -0,0 +1,287 @@
EUROPEAN UNION PUBLIC LICENCE v. 1.2
EUPL © the European Union 2007, 2016
This European Union Public Licence (the EUPL) applies to the Work (as defined
below) which is provided under the terms of this Licence. Any use of the Work,
other than as authorised under this Licence is prohibited (to the extent such
use is covered by a right of the copyright holder of the Work).
The Work is provided under the terms of this Licence when the Licensor (as
defined below) has placed the following notice immediately following the
copyright notice for the Work:
Licensed under the EUPL
or has expressed by any other means his willingness to license under the EUPL.
1. Definitions
In this Licence, the following terms have the following meaning:
- The Licence: this Licence.
- The Original Work: the work or software distributed or communicated by the
Licensor under this Licence, available as Source Code and also as Executable
Code as the case may be.
- Derivative Works: the works or software that could be created by the
Licensee, based upon the Original Work or modifications thereof. This Licence
does not define the extent of modification or dependence on the Original Work
required in order to classify a work as a Derivative Work; this extent is
determined by copyright law applicable in the country mentioned in Article 15.
- The Work: the Original Work or its Derivative Works.
- The Source Code: the human-readable form of the Work which is the most
convenient for people to study and modify.
- The Executable Code: any code which has generally been compiled and which is
meant to be interpreted by a computer as a program.
- The Licensor: the natural or legal person that distributes or communicates
the Work under the Licence.
- Contributor(s): any natural or legal person who modifies the Work under the
Licence, or otherwise contributes to the creation of a Derivative Work.
- The Licensee or You: any natural or legal person who makes any usage of
the Work under the terms of the Licence.
- Distribution or Communication: any act of selling, giving, lending,
renting, distributing, communicating, transmitting, or otherwise making
available, online or offline, copies of the Work or providing access to its
essential functionalities at the disposal of any other natural or legal
person.
2. Scope of the rights granted by the Licence
The Licensor hereby grants You a worldwide, royalty-free, non-exclusive,
sublicensable licence to do the following, for the duration of copyright vested
in the Original Work:
- use the Work in any circumstance and for all usage,
- reproduce the Work,
- modify the Work, and make Derivative Works based upon the Work,
- communicate to the public, including the right to make available or display
the Work or copies thereof to the public and perform publicly, as the case may
be, the Work,
- distribute the Work or copies thereof,
- lend and rent the Work or copies thereof,
- sublicense rights in the Work or copies thereof.
Those rights can be exercised on any media, supports and formats, whether now
known or later invented, as far as the applicable law permits so.
In the countries where moral rights apply, the Licensor waives his right to
exercise his moral right to the extent allowed by law in order to make effective
the licence of the economic rights here above listed.
The Licensor grants to the Licensee royalty-free, non-exclusive usage rights to
any patents held by the Licensor, to the extent necessary to make use of the
rights granted on the Work under this Licence.
3. Communication of the Source Code
The Licensor may provide the Work either in its Source Code form, or as
Executable Code. If the Work is provided as Executable Code, the Licensor
provides in addition a machine-readable copy of the Source Code of the Work
along with each copy of the Work that the Licensor distributes or indicates, in
a notice following the copyright notice attached to the Work, a repository where
the Source Code is easily and freely accessible for as long as the Licensor
continues to distribute or communicate the Work.
4. Limitations on copyright
Nothing in this Licence is intended to deprive the Licensee of the benefits from
any exception or limitation to the exclusive rights of the rights owners in the
Work, of the exhaustion of those rights or of other applicable limitations
thereto.
5. Obligations of the Licensee
The grant of the rights mentioned above is subject to some restrictions and
obligations imposed on the Licensee. Those obligations are the following:
Attribution right: The Licensee shall keep intact all copyright, patent or
trademarks notices and all notices that refer to the Licence and to the
disclaimer of warranties. The Licensee must include a copy of such notices and a
copy of the Licence with every copy of the Work he/she distributes or
communicates. The Licensee must cause any Derivative Work to carry prominent
notices stating that the Work has been modified and the date of modification.
Copyleft clause: If the Licensee distributes or communicates copies of the
Original Works or Derivative Works, this Distribution or Communication will be
done under the terms of this Licence or of a later version of this Licence
unless the Original Work is expressly distributed only under this version of the
Licence — for example by communicating EUPL v. 1.2 only. The Licensee
(becoming Licensor) cannot offer or impose any additional terms or conditions on
the Work or Derivative Work that alter or restrict the terms of the Licence.
Compatibility clause: If the Licensee Distributes or Communicates Derivative
Works or copies thereof based upon both the Work and another work licensed under
a Compatible Licence, this Distribution or Communication can be done under the
terms of this Compatible Licence. For the sake of this clause, Compatible
Licence refers to the licences listed in the appendix attached to this Licence.
Should the Licensee's obligations under the Compatible Licence conflict with
his/her obligations under this Licence, the obligations of the Compatible
Licence shall prevail.
Provision of Source Code: When distributing or communicating copies of the Work,
the Licensee will provide a machine-readable copy of the Source Code or indicate
a repository where this Source will be easily and freely available for as long
as the Licensee continues to distribute or communicate the Work.
Legal Protection: This Licence does not grant permission to use the trade names,
trademarks, service marks, or names of the Licensor, except as required for
reasonable and customary use in describing the origin of the Work and
reproducing the content of the copyright notice.
6. Chain of Authorship
The original Licensor warrants that the copyright in the Original Work granted
hereunder is owned by him/her or licensed to him/her and that he/she has the
power and authority to grant the Licence.
Each Contributor warrants that the copyright in the modifications he/she brings
to the Work are owned by him/her or licensed to him/her and that he/she has the
power and authority to grant the Licence.
Each time You accept the Licence, the original Licensor and subsequent
Contributors grant You a licence to their contributions to the Work, under the
terms of this Licence.
7. Disclaimer of Warranty
The Work is a work in progress, which is continuously improved by numerous
Contributors. It is not a finished work and may therefore contain defects or
bugs inherent to this type of development.
For the above reason, the Work is provided under the Licence on an as is basis
and without warranties of any kind concerning the Work, including without
limitation merchantability, fitness for a particular purpose, absence of defects
or errors, accuracy, non-infringement of intellectual property rights other than
copyright as stated in Article 6 of this Licence.
This disclaimer of warranty is an essential part of the Licence and a condition
for the grant of any rights to the Work.
8. Disclaimer of Liability
Except in the cases of wilful misconduct or damages directly caused to natural
persons, the Licensor will in no event be liable for any direct or indirect,
material or moral, damages of any kind, arising out of the Licence or of the use
of the Work, including without limitation, damages for loss of goodwill, work
stoppage, computer failure or malfunction, loss of data or any commercial
damage, even if the Licensor has been advised of the possibility of such damage.
However, the Licensor will be liable under statutory product liability laws as
far such laws apply to the Work.
9. Additional agreements
While distributing the Work, You may choose to conclude an additional agreement,
defining obligations or services consistent with this Licence. However, if
accepting obligations, You may act only on your own behalf and on your sole
responsibility, not on behalf of the original Licensor or any other Contributor,
and only if You agree to indemnify, defend, and hold each Contributor harmless
for any liability incurred by, or claims asserted against such Contributor by
the fact You have accepted any warranty or additional liability.
10. Acceptance of the Licence
The provisions of this Licence can be accepted by clicking on an icon I agree
placed under the bottom of a window displaying the text of this Licence or by
affirming consent in any other similar way, in accordance with the rules of
applicable law. Clicking on that icon indicates your clear and irrevocable
acceptance of this Licence and all of its terms and conditions.
Similarly, you irrevocably accept this Licence and all of its terms and
conditions by exercising any rights granted to You by Article 2 of this Licence,
such as the use of the Work, the creation by You of a Derivative Work or the
Distribution or Communication by You of the Work or copies thereof.
11. Information to the public
In case of any Distribution or Communication of the Work by means of electronic
communication by You (for example, by offering to download the Work from a
remote location) the distribution channel or media (for example, a website) must
at least provide to the public the information requested by the applicable law
regarding the Licensor, the Licence and the way it may be accessible, concluded,
stored and reproduced by the Licensee.
12. Termination of the Licence
The Licence and the rights granted hereunder will terminate automatically upon
any breach by the Licensee of the terms of the Licence.
Such a termination will not terminate the licences of any person who has
received the Work from the Licensee under the Licence, provided such persons
remain in full compliance with the Licence.
13. Miscellaneous
Without prejudice of Article 9 above, the Licence represents the complete
agreement between the Parties as to the Work.
If any provision of the Licence is invalid or unenforceable under applicable
law, this will not affect the validity or enforceability of the Licence as a
whole. Such provision will be construed or reformed so as necessary to make it
valid and enforceable.
The European Commission may publish other linguistic versions or new versions of
this Licence or updated versions of the Appendix, so far this is required and
reasonable, without reducing the scope of the rights granted by the Licence. New
versions of the Licence will be published with a unique version number.
All linguistic versions of this Licence, approved by the European Commission,
have identical value. Parties can take advantage of the linguistic version of
their choice.
14. Jurisdiction
Without prejudice to specific agreement between parties,
- any litigation resulting from the interpretation of this License, arising
between the European Union institutions, bodies, offices or agencies, as a
Licensor, and any Licensee, will be subject to the jurisdiction of the Court
of Justice of the European Union, as laid down in article 272 of the Treaty on
the Functioning of the European Union,
- any litigation arising between other parties and resulting from the
interpretation of this License, will be subject to the exclusive jurisdiction
of the competent court where the Licensor resides or conducts its primary
business.
15. Applicable Law
Without prejudice to specific agreement between parties,
- this Licence shall be governed by the law of the European Union Member State
where the Licensor has his seat, resides or has his registered office,
- this licence shall be governed by Belgian law if the Licensor has no seat,
residence or registered office inside a European Union Member State.
Appendix
Compatible Licences according to Article 5 EUPL are:
- GNU General Public License (GPL) v. 2, v. 3
- GNU Affero General Public License (AGPL) v. 3
- Open Software License (OSL) v. 2.1, v. 3.0
- Eclipse Public License (EPL) v. 1.0
- CeCILL v. 2.0, v. 2.1
- Mozilla Public Licence (MPL) v. 2
- GNU Lesser General Public Licence (LGPL) v. 2.1, v. 3
- Creative Commons Attribution-ShareAlike v. 3.0 Unported (CC BY-SA 3.0) for
works other than software
- European Union Public Licence (EUPL) v. 1.1, v. 1.2
- Québec Free and Open-Source Licence — Reciprocity (LiLiQ-R) or Strong
Reciprocity (LiLiQ-R+).
The European Commission may update this Appendix to later versions of the above
licences without producing a new version of the EUPL, as long as they provide
the rights granted in Article 2 of this Licence and protect the covered Source
Code from exclusive appropriation.
All other changes or additions to this Appendix require the production of a new
EUPL version.

14
Readme.md Normal file
View file

@ -0,0 +1,14 @@
# bahnhof.name
This repository contains the sources for [bahnhof.name](https://bahnhof.name),
a simple web server which gives helpful information about railway stations
in Germany, from both static files & by querying osm for information which
are harder/impossible to get via Open Data from DB AG directly.
## A note on packaging
Building and packaging this is generally simple as long as you use `cabal`;
however, the `fuzzyfind` package seems a little out of date. Yet it compiles
fine even with newer dependencies, so you may want to disregard its version
bounds if that makes things easier for you. An example package done with Nix can
be found on [my nixfiles](https://stuebinm.eu/git/nixfiles/tree/pkgs/overlay.nix).

255
app/Main.hs Normal file
View file

@ -0,0 +1,255 @@
{-# LANGUAGE MultiWayIf #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}
module Main where
import Control.Concurrent.STM
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as LB
import Data.Char
import Data.Csv hiding (lookup)
import Data.Function ((&))
import Data.Functor ((<&>))
import Data.FuzzySet
import Data.List
import Data.Map (Map)
import qualified Data.Map as M
import Data.Maybe (fromMaybe)
import Data.Ord
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)
import Data.Time.Clock
import Data.Tuple (swap)
import Data.Vector (Vector)
import qualified Data.Vector as V
import qualified Network.HTTP.Client as Client
import qualified Network.HTTP.Client.OpenSSL as Client
import Network.HTTP.Types
import Network.Wai
import Network.Wai.Handler.Warp (run)
import Network.Wai.Middleware.RequestLogger
import Text.FuzzyFind (Alignment (score),
bestMatch)
csvOptions, tsvOptions :: DecodeOptions
csvOptions = defaultDecodeOptions { decDelimiter = fromIntegral (ord ';') }
tsvOptions = defaultDecodeOptions { decDelimiter = fromIntegral (ord '\t') }
data MatchResult a b
= Exact a
| Fuzzy b
| None
deriving Show
findStationName :: T.Text -> FuzzySet -> MatchResult (Double, Text) (Double, Text)
findStationName query set = case sorted of
[exact] -> Exact exact
_ -> case maybeHbf of
station:_ -> Fuzzy station
_ -> case results of
station:_ -> Fuzzy station
_ -> None
where
sorted = results
& fmap (\(_, match) -> (fromIntegral . maybe 0 score . bestMatch (T.unpack query) $ T.unpack match, match))
& sortOn (Down . fst)
results = get set query
maybeHbf = filter (T.isInfixOf "Hbf" . snd) sorted
data Platform = Platform
{ osmType :: Text
, osmId :: Text
, ref :: Maybe Text
, localRef :: Maybe Text
} deriving Show
instance FromRecord Platform where
parseRecord v =
Platform <$>
v .! 0 <*>
v .! 1 <*>
v .! 2 <*>
v .! 3
data Answer
= Redirect Text
| Plaintext Text
| Html Text
| Notfound
| Unimplemented
maybeAnswer :: (a -> Answer) -> Maybe a -> Answer
maybeAnswer = maybe Notfound
newtype Ril100 = Ril100 { unRil100 :: Text }
deriving (Eq, Ord, Show)
data AppData = AppData
{ ril100map :: DoubleMap Ril100 Text
, leitpunktMap :: DoubleMap Ril100 Text
, ril100set :: FuzzySet
, ril100BaseUrl :: Text
, leitpunktBaseUrl :: Text
, clientManager :: Client.Manager
, platformCache :: TVar (Map Ril100 (UTCTime, Answer))
, cacheTime :: NominalDiffTime
}
app :: AppData -> Application
app AppData{..} request respond = mkAnswer >>= (respond . toResponse)
where
mkAnswer :: IO Answer
mkAnswer = case pathInfo request of
[] -> pure helptext
["favicon.ico"] -> pure Notfound
["cache"] -> do
cache <- readTVarIO platformCache
now <- getCurrentTime
M.toList cache
& fmap (\(ril100, (age, _)) -> (T.pack . show) (unRil100 ril100, now `diffUTCTime` age))
& T.unlines
& (pure . Plaintext)
[query]
| T.all isUpper query && host `elem` ["leitpunkt"]
-> lookupName query leitpunktMap
>>= (`lookupCode` ril100map)
& maybeAnswer Plaintext & pure
| T.all isUpper query
-> lookupCode (Ril100 query) ril100map
& maybeAnswer Plaintext & pure
| host `elem` ["leitpunkt"]
-> pure $ case findStationName query ril100set of
None -> Notfound
Exact (_,match) -> lookupName match ril100map
>>= (`lookupCode` leitpunktMap)
& maybeAnswer Plaintext
Fuzzy (_,match) -> Redirect (leitpunktBaseUrl <> "/" <> match)
| otherwise
-> pure $ case findStationName query ril100set of
None -> Notfound
Exact (_,match) -> lookupName match ril100map
& maybeAnswer (Plaintext . unRil100)
Fuzzy (_,match) -> Redirect (ril100BaseUrl <> "/" <> match)
[query, segment] | segment `elem` ["gleis", "track", "tracks", "gleise", "platform", "platforms", "fetch"]
-> case queriedRil100 query of
None -> pure Notfound
Fuzzy url -> pure (Redirect url)
Exact ril100 -> do
maybeCache <- readTVarIO platformCache <&> M.lookup ril100
now <- getCurrentTime
case maybeCache of
Just (age, answer)
| now `diffUTCTime` age < cacheTime && segment /= "fetch" -> pure answer
_ -> do
let overpassQuery = " \
\[out:csv(::type, ::id, ref, local_ref;false)][timeout:25];\n\
\node[~\"railway:ref|railway:ref:parent\"~\"^"<>encodeUtf8 (unRil100 ril100)<>"$\"][operator~\"^(DB |Deutsch)\"];\n\
\rel[public_transport~\"stop_area|stop_area_group\"](bn) -> .a;\n\
\rel[public_transport~\"stop_area|stop_area_group\"](br.a) -> .b;\n\
\(.a;.b;);\n\
\nwr[railway=platform](>>);\n\
\out;\n"
let req = "https://overpass-api.de/api/interpreter"
{ Client.requestBody = Client.RequestBodyBS overpassQuery
, Client.method = "POST"}
putStrLn $ "looking up platforms for " <> show ril100
response <- Client.httpLbs req clientManager
case decodeWith tsvOptions NoHeader (Client.responseBody response) of
Left _ -> pure Notfound
Right (platforms :: Vector Platform) -> do
let answer = Html $ T.concat (renderPlatform <$> V.toList platforms)
now <- getCurrentTime
atomically $ do
cache <- readTVar platformCache
writeTVar platformCache (M.insert ril100 (now, answer) cache)
pure answer
where
getRef (Just ref) _ = Just ref
getRef Nothing (Just ref) = Just ref
getRef _ _ = Nothing
renderPlatform Platform{..} = case getRef ref localRef of
Just ref -> "<a href=\"https://osm.org/"<>osmType<>"/"<>osmId<>"\">"<>ref<>"</a><br>"
Nothing -> ""
_ -> pure Notfound
queriedRil100 :: Text -> MatchResult Ril100 Text
queriedRil100 query = if
| T.all isUpper query && host `elem` ["leitpunkt"]
-> lookupName query leitpunktMap
& maybe None Exact
| T.all isUpper query
-> Exact (Ril100 query)
| host `elem` ["leitpunkt"]
-> case findStationName query ril100set of
None -> None
Exact (_,match) -> lookupName match ril100map
& maybe None Exact
Fuzzy (_,match) -> Fuzzy (leitpunktBaseUrl <> "/" <> match)
| otherwise
-> case findStationName query ril100set of
None -> None
Exact (_,match) -> lookupName match ril100map
& maybe None Exact
Fuzzy (_,match) -> Fuzzy (ril100BaseUrl <> "/" <> match)
helptext = Plaintext "no help yet"
host = head (BS.split (fromIntegral (ord '.')) rawHost)
where rawHost = case lookup "x-forwarded-host" $ requestHeaders request of
Nothing -> fromMaybe "" $ requestHeaderHost request
Just some -> some
toResponse :: Answer -> Response
toResponse ans = case ans of
Redirect uri -> responseLBS
status302 [("Location", encodeUtf8 uri)] ""
Plaintext msg -> responseLBS
status200 (mkHeaders "text/plain") (LB.fromStrict $ encodeUtf8 msg)
Html markup -> responseLBS
status200 (mkHeaders "text/html") (LB.fromStrict $ encodeUtf8 markup)
Notfound -> responseLBS
status404 (mkHeaders "text/plain") "??"
Unimplemented -> responseLBS
status404 (mkHeaders "text/plain") "Sorry, this is still under construction"
mkHeaders contentType =
[ ("Content-Type", contentType<>"; charset=utf8")
, ("x-data-by", "CC-BY 4.0 DB Netz AG https://data.deutschebahn.com/dataset/data-betriebsstellen.html")
, ("x-data-by", "OpenStreetMap Contributors https://www.openstreetmap.org/copyright/")
, ("x-sources-at", "https://stuebinm.eu/git/bahnhof.name")
]
data DoubleMap code long = DoubleMap { there :: Map code long, back :: Map long code }
lookupCode :: Ord code => code -> DoubleMap code long -> Maybe long
lookupCode code maps = M.lookup code (there maps)
lookupName :: Ord long => long -> DoubleMap code long -> Maybe code
lookupName name maps = M.lookup name (back maps)
mkDoubleMap :: (Ord code, Ord long) => Vector (code, long) -> DoubleMap code long
mkDoubleMap tuplesvec = DoubleMap (M.fromList tuples) (M.fromList (fmap swap tuples))
where tuples = V.toList tuplesvec
main :: IO ()
main = do
Right (betriebsstellen :: V.Vector [Text]) <-
LB.readFile "data/DBNetz-Betriebsstellenverzeichnis-Stand2021-10.csv"
<&> decodeWith csvOptions HasHeader
<&> fmap (V.filter (\line -> line !! 4 `notElem` ["BUSH", "LGR", "ÜST", "BFT", "ABZW", "BK", "AWAN", "ANST", "LGR"]))
Right (leitpunkte :: V.Vector [Text]) <-
LB.readFile "data/leitpunkte.csv"
<&> decodeWith csvOptions HasHeader
putStrLn "building Index ..."
let ril100set = addMany (emptySet 5 6 False) (V.toList (V.map (!! 2) betriebsstellen))
putStrLn (seq ril100set "done")
let ril100map = mkDoubleMap $ fmap (\line -> (Ril100 (line !! 1), line !! 2)) betriebsstellen
let leitpunktMap = mkDoubleMap $ fmap (\line -> (Ril100 (line !! 2), line !! 0)) leitpunkte
let ril100BaseUrl = "https://ril100.bahnhof.name"
let leitpunktBaseUrl = "https://leitpunkt.bahnhof.name"
let cacheTime = 3600 * 24 * 7 -- one week
platformCache <- newTVarIO mempty
Client.withOpenSSL $ do
clientManager <- Client.newOpenSSLManager
putStrLn "Starting Server"
run 8080 (logStdoutDev (app AppData{..}))

41
bahnhof-name.cabal Normal file
View file

@ -0,0 +1,41 @@
cabal-version: 3.0
name: bahnhof-name
-- PVP summary: +-+------- breaking API changes
-- | | +----- non-breaking API additions
-- | | | +--- code changes with no API change
version: 0.2.0.0
synopsis: Serve information on German railway stations
license: EUPL-1.2
license-file: LICENSE
author: stuebinm
maintainer: stuebinm@disroot.org
build-type: Simple
extra-doc-files: CHANGELOG.md
extra-source-files: data/*
common warnings
ghc-options: -Wall
executable bahnhof-name
import: warnings
main-is: Main.hs
hs-source-dirs: app
build-depends: base ^>=4.16.4.0
, fuzzyset
, fuzzyfind
, text
, vector
, cassava
, bytestring
, wai
, wai-extra
, warp
, http-types
, containers
, http-client
, http-client-openssl
, stm
, time
default-language: GHC2021
ghc-options: -threaded -with-rtsopts=--nonmoving-gc

File diff suppressed because it is too large Load diff

View file

@ -1,21 +0,0 @@
name = "bahnhofname"
version = "0.1.0"
# Fill out these fields if you intend to generate HTML documentation or publish
# your project to the Hex package manager.
#
# licences = ["Apache-2.0"]
# description = "A Gleam library..."
# repository = { type = "github", user = "username", repo = "project" }
# links = [{ title = "Website", href = "https://gleam.run" }]
[dependencies]
gleam_stdlib = "~> 0.19"
gleam_http = "~> 3.0"
mist = "~> 0.4"
gleam_hackney = "~> 1.0"
haystack = "~> 0.1"
gleam_erlang = "~> 0.22"
[dev-dependencies]
gleeunit = "~> 0.6"

View file

@ -1,34 +0,0 @@
# This file was generated by Gleam
# You typically do not need to edit this file
packages = [
{ name = "certifi", version = "2.12.0", build_tools = ["rebar3"], requirements = [], otp_app = "certifi", source = "hex", outer_checksum = "EE68D85DF22E554040CDB4BE100F33873AC6051387BAF6A8F6CE82272340FF1C" },
{ name = "decimal", version = "2.1.1", build_tools = ["mix"], requirements = [], otp_app = "decimal", source = "hex", outer_checksum = "53CFE5F497ED0E7771AE1A475575603D77425099BA5FAEF9394932B35020FFCC" },
{ name = "gleam_erlang", version = "0.22.0", build_tools = ["gleam"], requirements = ["gleam_stdlib"], otp_app = "gleam_erlang", source = "hex", outer_checksum = "367D8B41A7A86809928ED1E7E55BFD0D46D7C4CF473440190F324AFA347109B4" },
{ name = "gleam_hackney", version = "1.1.0", build_tools = ["gleam"], requirements = ["gleam_http", "gleam_stdlib", "hackney"], otp_app = "gleam_hackney", source = "hex", outer_checksum = "CA69AD9061C4A8775A7BD445DE33ECEFD87379AF8E5B028F3DD0216BECA5DD0B" },
{ name = "gleam_http", version = "3.5.0", build_tools = ["gleam"], requirements = ["gleam_stdlib"], otp_app = "gleam_http", source = "hex", outer_checksum = "FAE9AE3EB1CA90C2194615D20FFFD1E28B630E84DACA670B28D959B37BCBB02C" },
{ name = "gleam_otp", version = "0.7.0", build_tools = ["gleam"], requirements = ["gleam_stdlib", "gleam_erlang"], otp_app = "gleam_otp", source = "hex", outer_checksum = "ED7381E90636E18F5697FD7956EECCA635A3B65538DC2BE2D91A38E61DCE8903" },
{ name = "gleam_stdlib", version = "0.31.0", build_tools = ["gleam"], requirements = [], otp_app = "gleam_stdlib", source = "hex", outer_checksum = "6D1BC5B4D4179B9FEE866B1E69FE180AC2CE485AD90047C0B32B2CA984052736" },
{ name = "gleeunit", version = "0.11.0", build_tools = ["gleam"], requirements = ["gleam_stdlib"], otp_app = "gleeunit", source = "hex", outer_checksum = "1397E5C4AC4108769EE979939AC39BF7870659C5AFB714630DEEEE16B8272AD5" },
{ name = "glisten", version = "0.9.1", build_tools = ["gleam"], requirements = ["gleam_otp", "gleam_erlang", "gleam_stdlib"], otp_app = "glisten", source = "hex", outer_checksum = "91809C44C52456D96C8317A19246DE1C06ED494C40D282CD9380565E879A52C4" },
{ name = "hackney", version = "1.20.1", build_tools = ["rebar3"], requirements = ["mimerl", "certifi", "ssl_verify_fun", "metrics", "idna", "unicode_util_compat", "parse_trans"], otp_app = "hackney", source = "hex", outer_checksum = "FE9094E5F1A2A2C0A7D10918FEE36BFEC0EC2A979994CFF8CFE8058CD9AF38E3" },
{ name = "haystack", version = "0.1.0", build_tools = ["mix"], requirements = ["stemmer", "jason"], otp_app = "haystack", source = "hex", outer_checksum = "27A582513EF933C1B11345B96F8D41EE137D03B25312BD85068FFE8FEC503635" },
{ name = "idna", version = "6.1.1", build_tools = ["rebar3"], requirements = ["unicode_util_compat"], otp_app = "idna", source = "hex", outer_checksum = "92376EB7894412ED19AC475E4A86F7B413C1B9FBB5BD16DCCD57934157944CEA" },
{ name = "jason", version = "1.4.1", build_tools = ["mix"], requirements = ["decimal"], otp_app = "jason", source = "hex", outer_checksum = "FBB01ECDFD565B56261302F7E1FCC27C4FB8F32D56EAB74DB621FC154604A7A1" },
{ name = "metrics", version = "1.0.1", build_tools = ["rebar3"], requirements = [], otp_app = "metrics", source = "hex", outer_checksum = "69B09ADDDC4F74A40716AE54D140F93BEB0FB8978D8636EADED0C31B6F099F16" },
{ name = "mimerl", version = "1.2.0", build_tools = ["rebar3"], requirements = [], otp_app = "mimerl", source = "hex", outer_checksum = "F278585650AA581986264638EBF698F8BB19DF297F66AD91B18910DFC6E19323" },
{ name = "mist", version = "0.14.0", build_tools = ["gleam"], requirements = ["gleam_stdlib", "gleam_erlang", "gleam_http", "gleam_otp", "glisten"], otp_app = "mist", source = "hex", outer_checksum = "7CDD0396D9A556F1069D83E9AF2B24388AAC478B9B4846615C6D4797E1D3C6A3" },
{ name = "parse_trans", version = "3.4.1", build_tools = ["rebar3"], requirements = [], otp_app = "parse_trans", source = "hex", outer_checksum = "620A406CE75DADA827B82E453C19CF06776BE266F5A67CFF34E1EF2CBB60E49A" },
{ name = "ssl_verify_fun", version = "1.1.7", build_tools = ["mix", "rebar3", "make"], requirements = [], otp_app = "ssl_verify_fun", source = "hex", outer_checksum = "FE4C190E8F37401D30167C8C405EDA19469F34577987C76DDE613E838BBC67F8" },
{ name = "stemmer", version = "1.1.0", build_tools = ["mix"], requirements = [], otp_app = "stemmer", source = "hex", outer_checksum = "0CB5FAF73476B84500E371FF39FD9A494F60AB31D991689C1CD53B920556228F" },
{ name = "unicode_util_compat", version = "0.7.0", build_tools = ["rebar3"], requirements = [], otp_app = "unicode_util_compat", source = "hex", outer_checksum = "25EEE6D67DF61960CF6A794239566599B09E17E668D3700247BC498638152521" },
]
[requirements]
gleam_erlang = { version = "~> 0.22" }
gleam_hackney = { version = "~> 1.0" }
gleam_http = { version = "~> 3.0" }
gleam_stdlib = { version = "~> 0.19" }
gleeunit = { version = "~> 0.6" }
haystack = { version = "~> 0.1" }
mist = { version = "~> 0.4" }

View file

@ -1,430 +0,0 @@
import gleam/http/response.{Response}
import gleam/http/request.{Request, get_header}
import gleam/http.{Get}
import gleam/bit_builder
import gleam/erlang/process
import gleam/erlang/atom
import gleam/erlang/file
import gleam/io
import gleam/int
import gleam/string
import gleam/bit_string
import gleam/list
import gleam/map.{Map}
import gleam/uri
import gleam/hackney
import gleam/pair.{swap}
import gleam/result
import mist
const ds100_domain = "ds100.bahnhof.name"
const ril100_domain = "ril100.bahnhof.name"
const leitpunkt_domain = "leitpunkt.bahnhof.name"
const domain = "bahnhof.name"
const proto = "https://"
type Index
type Field
@external(erlang, "Elixir.Haystack.Index", "new")
fn index_new(a: atom.Atom) -> Index
@external(erlang, "Elixir.Haystack.Index", "ref")
fn index_ref(a: Index, b: Field) -> Index
@external(erlang, "Elixir.Haystack.Index", "field")
fn index_field(a: Index, b: Field) -> Index
@external(erlang, "Elixir.Haystack.Index.Field", "term")
fn field_term(a: String) -> Field
@external(erlang, "Elixir.Haystack.Index.Field", "new")
fn field_new(a: String) -> Field
@external(erlang, "Elixir.Haystack.Index", "add")
fn index_add(a: Index, b: List(a)) -> Index
@external(erlang, "Elixir.IO", "inspect")
pub fn inspect(a: a) -> a
type Query
type Clause
type Expression
@external(erlang, "Elixir.Haystack.Query", "new")
fn query_new() -> Query
@external(erlang, "Elixir.Haystack.Query", "clause")
fn query_clause(a: Query, b: Clause) -> Query
@external(erlang, "Elixir.Haystack.Query", "run")
fn query_run(a: Query, b: Index) -> List(Map(atom.Atom, String))
@external(erlang, "Elixir.Haystack.Query.Clause", "new")
fn clause_new(a: atom.Atom) -> Clause
@external(erlang, "Elixir.Haystack.Query.Clause", "expressions")
fn query_expressions(a: Clause, b: List(Expression)) -> Clause
@external(erlang, "Elixir.Haystack.Query.Expression", "new")
fn query_expression_new(
a: atom.Atom,
b: List(#(atom.Atom, String)),
) -> Expression
@external(erlang, "Elixir.Haystack.Tokenizer", "tokenize")
fn tokenize(a: String) -> List(Map(atom.Atom, String))
type IdKind {
DS100
Leitpunkt
}
type Matched(t) {
Exact(t)
Fuzzy(t, t)
Failed
}
fn unpercent(encoded: String) -> String {
let #([head], chunks) =
encoded
|> string.split(on: "%")
|> list.split(at: 1)
let assert Ok(res) =
chunks
|> list.map(fn(str) {
case string.length(str) < 2 {
True -> bit_string.from_string(str)
False -> {
let assert Ok(codepoint) =
str
|> string.slice(at_index: 0, length: 2)
|> int.base_parse(16)
<<codepoint:8, string.drop_left(str, 2):utf8>>
}
}
})
|> list.prepend(bit_string.from_string(head))
|> bit_string.concat
|> bit_string.to_string
|> result.map(fn(str) { string.replace(str, "_", " ") })
res
}
/// Looks up a query in a Map by exact value, no fuzzy matching.
fn lookup_exact(query: String, lookup: Map(String, String)) -> #(Int, String) {
case map.get(lookup, query) {
Ok(result) -> #(200, result)
_ -> #(404, "??")
}
}
/// Looks up a station by its name, with fuzzy matching.
fn lookup_fuzzy(
query: String,
kind: IdKind,
fuzzy: fn(String, IdKind) -> Matched(String),
) -> #(Int, String) {
case fuzzy(query, kind) {
Exact(res) -> #(200, res)
Fuzzy(res, _) -> #(302, res)
Failed -> #(404, "??")
}
}
fn if_not(res: #(Int, t), fallback: fn() -> #(Int, t)) -> #(Int, t) {
inspect(case res {
#(200, _) -> res
_ -> fallback()
})
}
fn lookup_station(
request: Request(t),
ds100_to_name: Map(String, String),
leitpunkt_to_name: Map(String, String),
lookup_platform: fn(String) -> String,
fuzzy: fn(String, IdKind) -> Matched(String),
) -> Response(mist.ResponseData) {
let #(#(code, text), is_html) = case request {
// blackhole favicon.ico requests instead of using the index
Request(method: Get, path: "/favicon.ico", ..) -> #(#(404, ""), False)
Request(method: Get, path: "/help", ..)
| Request(method: Get, path: "/", ..) -> #(#(
200,
"ril100 → Name: " <> proto <> ril100_domain <> "/HG\n" <>
"Name → ril100: " <> proto <> ril100_domain <> "/Göttingen\n\n" <>
"Leitpunkt → Name: " <> proto <> leitpunkt_domain <> "/GOE\n" <>
"Name → Leitpunkt: " <> proto <> leitpunkt_domain <> "/Göttingen\n\n" <>
"Fuzzy:" <> proto <> domain <> "/...",
), False)
Request(method: Get, path: "/" <> path, ..) -> {
let raw_query = unpercent(path)
let show_platforms = string.ends_with(raw_query, "/gleis")
|| string.ends_with(raw_query, "/bahnsteig")
|| string.ends_with(raw_query, "/platforms")
|| string.ends_with(raw_query, "/tracks")
|| string.ends_with(raw_query, "/platform")
|| string.ends_with(raw_query, "/track")
let query = raw_query
|> string.replace("/gleis","")
|> string.replace("/bahnsteig","")
|> string.replace("/platforms","")
|> string.replace("/tracks","")
|> string.replace("/platform","")
|> string.replace("/track","")
case #(show_platforms, get_header(request, "x-forwarded-host")) {
#(False, Ok(domain)) if domain == leitpunkt_domain ->
query
|> lookup_exact(leitpunkt_to_name)
|> if_not(fn() { lookup_fuzzy(query, Leitpunkt, fuzzy) })
|> pair.new(False)
#(False, Ok(domain)) if domain == ril100_domain || domain == ds100_domain ->
query
|> lookup_exact(ds100_to_name)
|> if_not(fn() { lookup_fuzzy(query, DS100, fuzzy) })
|> pair.new(False)
#(True, Ok(domain)) if domain == leitpunkt_domain -> {
let query = case map.get(leitpunkt_to_name, query) {
Ok(name) -> name
_ -> query
}
case fuzzy(query, DS100) {
Exact(code) -> #(200, lookup_platform(code))
Fuzzy(_, code) -> #(200, lookup_platform(code))
_ -> #(404, "")
} |> pair.new(True)
}
#(True, Ok(domain)) if domain == ril100_domain || domain == ds100_domain ->
case lookup_exact(query, ds100_to_name) {
#(200,_) -> #(200, lookup_platform(query))
_ -> case fuzzy(query, DS100) {
Exact(code) -> #(200, lookup_platform(code))
Fuzzy(_, code) -> #(200, lookup_platform(code))
_ -> #(404, "")
}
} |> pair.new(True)
_ -> {
let by_ds100 = lookup_exact(query, ds100_to_name)
let by_lp = lookup_exact(query, leitpunkt_to_name)
case #(by_ds100.0, by_lp.0) {
#(200, _) -> #(302, proto <> ril100_domain <> "/" <> path)
#(_, 200) -> #(302, proto <> leitpunkt_domain <> "/" <> path)
_ -> #(302, proto <> ril100_domain <> "/" <> path)
} |> pair.new(False)
}
}
}
_ -> #(#(404, "intended usage is e.g. curl " <> proto <> domain <> "/FF"), False)
}
let body = text
|> bit_builder.from_string
|> mist.Bytes
let content_type = case is_html {
True -> "text/html; charset=utf8"
False -> "text/plain; charset=utf8"
}
response.new(code)
|> response.prepend_header(
"x-data-source",
"https://data.deutschebahn.com/dataset/data-betriebsstellen.html",
)
|> response.prepend_header(
"x-sources-at",
"https://stuebinm.eu/git/bahnhof.name",
)
|> response.prepend_header("content-type", content_type)
|> fn(a) {
case code == 302 {
True -> response.prepend_header(a, "location", text)
_ -> a
}
}
|> response.set_body(body)
}
pub fn main() {
let assert Ok(bahn_ril100) = file.read("data/DBNetz-Betriebsstellenverzeichnis-Stand2021-10.csv")
let ds100s =
read_csv(bahn_ril100, ";")
|> list.filter_map(fn(fields) {
case fields {
[_, ds100, name, ..] -> Ok(#(name, ds100))
_ -> Error(fields)
}
})
let assert Ok(leitpunkte_raw) = file.read("data/leitpunkte.csv")
let leitpunkte =
read_csv(leitpunkte_raw, ";")
|> list.filter_map(fn(fields) {
case fields {
[lp, name, _ds100] -> Ok(#(name, lp))
_ -> Error(fields)
}
})
let assert Ok(platforms_raw) = file.read("data/platforms.tsv")
let platforms = read_csv(platforms_raw, "\t")
let name_to_ds100 = map.from_list(ds100s)
let name_to_leitpunkt = map.from_list(leitpunkte)
let ds100_to_name = map.from_list(list.map(ds100s, swap))
let leitpunkt_to_name = map.from_list(list.map(leitpunkte, swap))
let ds100index =
index_new(atom.create_from_string("ds100"))
|> index_ref(field_term("id"))
|> index_field(field_new("name"))
|> index_add(
ds100s
|> list.map(fn(tuple) {
case tuple {
#(name, ds100) -> map.from_list([#("id", ds100), #("name", name)])
}
}),
)
let leitpunkt_index =
index_new(atom.create_from_string("leitpunkt"))
|> index_ref(field_term("id"))
|> index_field(field_new("name"))
|> index_add(
leitpunkte
|> list.map(fn(tuple) {
case tuple {
#(name, leitpunkt) ->
map.from_list([#("id", leitpunkt), #("name", name)])
}
}),
)
let ref = atom.create_from_string("ref")
let fuzzy = fn(searchterm: String, kind: IdKind) -> List(String) {
let query = query_new()
let index = case kind {
DS100 -> ds100index
Leitpunkt -> leitpunkt_index
}
let match = atom.create_from_string("match")
let field = atom.create_from_string("field")
let term = atom.create_from_string("term")
let expressions =
tokenize(inspect(searchterm))
|> list.filter_map(fn(a) { map.get(a, atom.create_from_string("v")) })
|> list.map(fn(token) {
query_expression_new(match, [#(field, "name"), #(term, token)])
})
let clause =
query_expressions(clause_new(atom.create_from_string("all")), expressions)
let query = query_clause(query, clause)
let matches =
query_run(query, index)
|> list.filter_map(fn(a) { map.get(a, ref) })
inspect(matches)
case list.length(matches) > 5 {
True -> {
let query = query_new()
let clause =
query_expressions(
clause_new(atom.create_from_string("all")),
[
query_expression_new(match, [#(field, "name"), #(term, "hbf")]),
..expressions
],
)
let query = query_clause(query, clause)
let narrow =
query_run(query, index)
|> list.filter_map(fn(a) { map.get(a, ref) })
case narrow {
[] -> matches
_ -> narrow
}
}
_ -> matches
}
}
let exact_then_fuzzy = fn(searchterm: String, kind: IdKind) -> Matched(String) {
let #(stations, ids) = case kind {
DS100 -> #(name_to_ds100, ds100_to_name)
Leitpunkt -> #(name_to_leitpunkt, leitpunkt_to_name)
}
case map.get(stations, searchterm) {
Ok(id) -> Exact(id)
_ -> {
let results =
fuzzy(searchterm, kind)
|> list.filter_map(fn(res) { map.get(ids, string.uppercase(res)) })
case results {
[res] -> {
let assert Ok(station) = map.get(stations, res)
Fuzzy(res, station)
}
[res, ..] -> {
let assert Ok(station) = map.get(stations, res)
Fuzzy(res, station)
}
_ -> Failed
}
}
}
}
let lookup_platform = fn(ds100: String) -> String {
inspect(ds100)
platforms
|> list.filter(fn(a) { list.first(a) == Ok(ds100) })
|> list.map(fn(line) { case line {
[_code,osmid,osmtype,info] -> "<a href=\"https://osm.org/"<>osmtype<>"/"<>osmid<>"\">"<>info<>"</a>"
}})
|> string.join("<br>\n")
|> inspect
}
io.println("compiled indices, starting server …")
let assert Ok(_) =
fn(req: Request(mist.Connection)) -> Response(mist.ResponseData) {
lookup_station(req, ds100_to_name, leitpunkt_to_name, lookup_platform, exact_then_fuzzy)
}
|> mist.new
|> mist.port(2345)
|> mist.start_http
process.sleep_forever()
}
fn fetch_data() -> Result(String, hackney.Error) {
let assert Ok(uri) =
uri.parse(
"https://download-data.deutschebahn.com/static/datasets/betriebsstellen/DBNetz-Betriebsstellenverzeichnis-Stand2021-10.csv",
)
let assert Ok(request) = request.from_uri(uri)
io.println("got response")
let assert Ok(response) = hackney.send(request)
// some ü are corrupted for some reason
Ok(string.replace(response.body, "<EFBFBD>", "ü"))
}
fn read_csv(contents, sep) -> List(List(String)) {
contents
// the file doesn't use quotes, so this is fine
|> string.split(on: "\n")
// drop CSV header
|> list.drop(1)
|> list.map(fn(a) { string.split(a, on: sep) })
}

View file

@ -1,12 +0,0 @@
import gleeunit
import gleeunit/should
pub fn main() {
gleeunit.main()
}
// gleeunit test functions end in `_test`
pub fn hello_world_test() {
1
|> should.equal(1)
}