From 94a52b94dce7ff198d4a03ea72c38825e9417972 Mon Sep 17 00:00:00 2001
From: philipp
Date: Tue, 6 Feb 2024 11:30:28 +0100
Subject: [PATCH] clean up repo

---
 README.md | 64 ++++++++++++------------------------------------------
 1 file changed, 14 insertions(+), 50 deletions(-)

diff --git a/README.md b/README.md
index 0749d37..a23cdb0 100644
--- a/README.md
+++ b/README.md
@@ -1,76 +1,40 @@
-RISolve
-
-# Folder
-- ./data
- - cache -> cache for `overview` tests
- - expected
- - overview -> expected xml links of law_ids
-
 # Add new law text
 ## Tests
 - Getting paragraphs from `law_id` (`risparser::overview::test::parse()`)
 - Create file `law_id` in `./data/expected/overview` (then run tests to get current output + save in file)
-- Parsing paragraphs: add test in `src/risparser/paragraph/mod.rs`
--
-
-
-# Features (to be moved to lib.rs one-by-one)
-- Text to structured law
- - `LawBuilder`: Structure law, by specifying (sub-)sections (`new_header`), its description (`new_desc`), paragraphs under the current (sub-)section (`new_par`), and the description of the next paragraph (`new_next_para_header`). `Classifier` need to be set.
- - Main output: Properly structured laws (`Law`)
- - `Law`: Represents a structured law text. Can be generated with `LawBuilder`.
- - Main output: properly formatted (md for a start) law text, no need to export Heading/... etc
-- RIS Fetcher (to be mocked)
- - all paragraphs of specific law (`overview`)
- - xml document from url (`par/mod.rs fetch_age`)
-- Parser
- - replace errors w/ config file
+- Create config file in `./data/configs/`
 
 # Integration test
 - Nice test would be to re-create html ris file and compare it (problem with custom fixes, though)
 
-# History
-- [I've created my first parser using RIS API, daily updated. Failed because I tried to do too much automatically (e.g. recognizing headers](https://gitlab.com/PhilippHofer/law)
-- [Using print-website, I've extracted stuff w/ regex.](https://gitlab.com/PhilippHofer/ris/)
-- [Tried to create a parser using print-website, proper(-ish) parser](https://gitlab.com/PhilippHofer/ris2)
-
 # Goals
 - [x] I want to have the text of the law.
 - [x] I want to see the structure (proper headers) of the law.
 - [ ] I want to be able to make comments (e.g. Erschöpfung) on certain parts
 - [ ] I want to see since when this paragraph is in use.
-- [~] Lawtext should be updateable
+- [.] Lawtext should be updateable
 
-# Technical
+# Mindset
 - I don't want to restrict myself with a [parser combinators](docs.rs/nom) but code it myself using *recursive descent* parser.
 - Be strict in what I process. Fail if anything unexpected happens. The user should handle this case. It's fine if one decides to ignore the new/unexpected field, but it should be done deliberately.
 
-# Progress / Functions
-
-- [x] Parse structure of law into struct using Deserilize trait, pot. multiple requests (if > 100 paragraphs)
-- [x] Parse risdok using own *RD parser*, again strict: fail if anything not expected happens, not sure (yet) if I want to operate on strings, or first parse using off-the-shelve XML reader (prob. 2nd option)
-
-# Next step
-
-- [x] Parse ABGB
-- [ ] Create config files for laws
- - law_id
- - replace stuff
- - headers
-- [ ] Create argument parse
- - `--law mschg.conf`
-
-# Naming
+# Nomenclature
 - Law ("Gesetz"): e.g. UHG, TEG, ABGB
 - Section ("Paragraph")
 - Subsection ("Absatz")
 - Item ("Ziffer")
 - Heading-{1,2,3,...}
-
-
-# "Scripts"
-- Retrieve overview law: `curl -X POST "https://data.bka.gv.at/ris/api/v2.6/Bundesrecht" -H "Content-Type: application/x-www-form-urlencoded" -d "Applikation=BrKons" -d "Gesetzesnummer=10001899" -d "DokumenteProSeite=OneHundred" -d "Seitennummer=1" -d "Fassung.FassungVom=2023-11-03" | jq . > law.json`
+# Folder-Structure of this repo
+- ./data
+ - cache -> cache for `overview` tests
+ - expected
+ - overview -> expected xml links of law_ids
+
+# History
+- [I've created my first parser using RIS API, daily updated. Failed because I tried to do too much automatically (e.g. recognizing headers](https://gitlab.com/PhilippHofer/law)
+- [Using print-website, I've extracted stuff w/ regex.](https://gitlab.com/PhilippHofer/ris/)
+- [Tried to create a parser using print-website, proper(-ish) parser](https://gitlab.com/PhilippHofer/ris2)