# RISolve
## Folder
- ./data
  - cache -> cache for `overview` tests
  - expected
    - overview -> expected xml links of law_ids
    - cache -> cache for
## Add new law text
### Tests
- Getting paragraphs from `law_id` (`risparser::overview::test::parse()`)
  - Create file `law_id` in `./data/expected/overview` (then run the tests to get the current output and save it in that file; see the test sketch after this list)
- Create file
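A minimal sketch of the expected-file test pattern described above, assuming the comparison is a plain string equality against the stored file. `parse_overview` is only a stand-in for `risparser::overview::test::parse()`, and the `law_id` is just an example:

```rust
// Sketch only: compares parser output against ./data/expected/overview/<law_id>.
// `parse_overview` is a placeholder for the real parsing call in this repo.
#[cfg(test)]
mod expected_file_sketch {
    use std::fs;

    fn parse_overview(_law_id: &str) -> String {
        // placeholder: the real code fetches and parses the RIS overview
        String::from("...")
    }

    #[test]
    fn overview_matches_expected_file() {
        let law_id = "10001899"; // example id, not necessarily one used in this repo
        let actual = parse_overview(law_id);
        let expected = fs::read_to_string(format!("data/expected/overview/{law_id}"))
            .expect("create the expected file first and fill it with the current output");
        assert_eq!(actual, expected);
    }
}
```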
## Features (to be moved to lib.rs one-by-one)
- Text to structured law (see the `LawBuilder` sketch after this list)
  - `LawBuilder`: Structures a law by specifying (sub-)sections (`new_header`), their descriptions (`new_desc`), paragraphs under the current (sub-)section (`new_par`), and the description of the next paragraph (`new_next_para_header`). `Classifier`s need to be set.
    - Main output: properly structured laws (`Law`)
  - `Law`: Represents a structured law text. Can be generated with `LawBuilder`.
    - Main output: properly formatted (md for a start) law text, no need to export Heading/... etc.
- RIS Fetcher (to be mocked)
  - all paragraphs of a specific law (`overview`)
  - xml document from url (`par/mod.rs fetch_age`)
- Parser
  - replace errors w/ config file
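A self-contained, heavily simplified sketch of the `LawBuilder` flow described above. The real `LawBuilder`, `Law`, and `Classifier` types in `src/` will have different fields and signatures; this only illustrates the header / description / paragraph idea:

```rust
// Illustrative simplification only, not the repo's actual API.
#[derive(Debug, Default)]
struct Law {
    // (header, header description, paragraphs under that header)
    sections: Vec<(String, String, Vec<String>)>,
}

#[derive(Debug, Default)]
struct LawBuilder {
    law: Law,
    next_para_header: Option<String>,
}

impl LawBuilder {
    /// Start a new (sub-)section.
    fn new_header(&mut self, name: &str) {
        self.law.sections.push((name.to_string(), String::new(), Vec::new()));
    }
    /// Description of the current (sub-)section.
    fn new_desc(&mut self, desc: &str) {
        if let Some(s) = self.law.sections.last_mut() {
            s.1 = desc.to_string();
        }
    }
    /// Description attached to the next paragraph.
    fn new_next_para_header(&mut self, header: &str) {
        self.next_para_header = Some(header.to_string());
    }
    /// Paragraph under the current (sub-)section.
    fn new_par(&mut self, text: &str) {
        let prefix = self
            .next_para_header
            .take()
            .map(|h| format!("{h}: "))
            .unwrap_or_default();
        if let Some(s) = self.law.sections.last_mut() {
            s.2.push(format!("{prefix}{text}"));
        }
    }
    fn build(self) -> Law {
        self.law
    }
}

fn main() {
    let mut b = LawBuilder::default();
    b.new_header("1. Abschnitt");
    b.new_desc("Allgemeine Bestimmungen");
    b.new_next_para_header("Begriff der Marke");
    b.new_par("§ 1. (1) ...");
    println!("{:#?}", b.build());
}
```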
## Integration test
- A nice test would be to re-create the HTML RIS file and compare it (problem with custom fixes, though).
## History
- I created my first parser using the RIS API, updated daily. It failed because I tried to do too much automatically (e.g. recognizing headers).
- Using the print-website, I extracted content w/ regex.
- Tried to create a proper(-ish) parser using the print-website.
## Goals
- I want to have the text of the law.
- I want to see the structure (proper headers) of the law.
- I want to be able to make comments (e.g. Erschöpfung) on certain parts.
- I want to see since when a paragraph has been in force.
- [~] Law text should be updatable.
## Technical
- I don't want to restrict myself to parser combinators, but code it myself as a recursive descent parser (see the strict-parsing sketch after this list).
- Be strict in what I process. Fail if anything unexpected happens; the user should handle that case. It's fine if one decides to ignore a new/unexpected field, but it should be done deliberately.
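A tiny, self-contained sketch of the "be strict" idea: a hand-written recursive descent parser over a made-up token stream that returns an error on anything it does not explicitly expect. The token and error types are invented for illustration and are not this repo's actual parser:

```rust
// Sketch only: strict recursive descent over a made-up XML-ish token stream.
#[derive(Debug)]
enum Token {
    Open(String),  // <tag>
    Text(String),
    Close(String), // </tag>
}

#[derive(Debug)]
enum ParseError {
    Unexpected(String),
    Eof,
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn next(&mut self) -> Result<&Token, ParseError> {
        let i = self.pos;
        self.pos += 1;
        self.tokens.get(i).ok_or(ParseError::Eof)
    }

    /// Parse exactly `<absatz>text</absatz>`; anything else is a hard error.
    fn parse_absatz(&mut self) -> Result<String, ParseError> {
        match self.next()? {
            Token::Open(t) if t == "absatz" => {}
            other => return Err(ParseError::Unexpected(format!("{other:?}"))),
        }
        let text = match self.next()? {
            Token::Text(t) => t.clone(),
            other => return Err(ParseError::Unexpected(format!("{other:?}"))),
        };
        match self.next()? {
            Token::Close(t) if t == "absatz" => Ok(text),
            other => Err(ParseError::Unexpected(format!("{other:?}"))),
        }
    }
}

fn main() {
    let mut p = Parser {
        tokens: vec![
            Token::Open("absatz".into()),
            Token::Text("(1) ...".into()),
            Token::Close("absatz".into()),
        ],
        pos: 0,
    };
    println!("{:?}", p.parse_absatz());
}
```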
## Progress / Functions
- Parse the structure of a law into a struct using the Deserialize trait, potentially with multiple requests (if > 100 paragraphs; see the pagination sketch after this list).
- Parse the risdok using my own recursive descent parser, again strict: fail if anything unexpected happens. Not sure (yet) if I want to operate on strings or first parse using an off-the-shelf XML reader (probably the 2nd option).
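A minimal sketch of the "> 100 paragraphs" point, assuming `serde` (with the derive feature) and `serde_json` as dependencies. The struct and field names below are invented for illustration and do not claim to match the RIS response or the types in `src/`:

```rust
// Sketch only: deserialize one overview page and compute how many page requests
// are needed when the API caps results at 100 documents per page.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct OverviewPage {
    hits: u32,                  // total number of matching documents (assumed field)
    page_size: u32,             // documents per page, e.g. 100 (assumed field)
    document_urls: Vec<String>, // links collected from this page (assumed field)
}

fn pages_needed(hits: u32, page_size: u32) -> u32 {
    hits.div_ceil(page_size) // e.g. 230 hits at 100 per page -> 3 requests
}

fn main() {
    let json = r#"{ "hits": 230, "page_size": 100, "document_urls": [] }"#;
    let page: OverviewPage = serde_json::from_str(json).expect("valid JSON");
    println!("requests needed: {}", pages_needed(page.hits, page.page_size));
    println!("urls on this page: {}", page.document_urls.len());
}
```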
## Next steps
- Parse ABGB
- Create config files for laws
  - law_id
  - replace stuff
  - headers
- Create an argument parser: `--law mschg.conf` (a rough sketch follows below)
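A rough, dependency-free sketch of the `--law mschg.conf` idea using `std::env::args`; the real implementation might use a crate such as clap instead, and the flag name is taken from the bullet above:

```rust
// Sketch only: read `--law <file>` from the command line without extra crates.
use std::env;

fn law_config_path() -> Option<String> {
    let mut args = env::args().skip(1); // skip the binary name
    while let Some(arg) = args.next() {
        if arg == "--law" {
            return args.next(); // the value following the flag, e.g. "mschg.conf"
        }
    }
    None
}

fn main() {
    match law_config_path() {
        Some(path) => println!("would load law config from {path}"),
        None => eprintln!("usage: --law <config file>"),
    }
}
```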
## Naming
- Law ("Gesetz"): e.g. UHG, TEG, ABGB
- Section ("Paragraph")
- Subsection ("Absatz")
- Item ("Ziffer")
- Heading-{1,2,3,...}
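One possible way to mirror this naming in a type, shown only as an illustration; the repo's actual data model may look different:

```rust
// Sketch only: the naming above expressed as a Rust enum.
#[derive(Debug)]
enum LawElement {
    Section { number: String },          // "Paragraph", e.g. "§ 1"
    Subsection { number: String },       // "Absatz", e.g. "(1)"
    Item { number: String },             // "Ziffer", e.g. "1."
    Heading { level: u8, text: String }, // Heading-{1,2,3,...}
}

fn main() {
    let elements = vec![
        LawElement::Heading { level: 1, text: "1. Abschnitt".into() },
        LawElement::Section { number: "§ 1".into() },
        LawElement::Subsection { number: "(1)".into() },
        LawElement::Item { number: "1.".into() },
    ];
    println!("{elements:#?}");
}
```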
"Scripts"
- Retrieve overview law:
curl -X POST "https://data.bka.gv.at/ris/api/v2.6/Bundesrecht" -H "Content-Type: application/x-www-form-urlencoded" -d "Applikation=BrKons" -d "Gesetzesnummer=10001899" -d "DokumenteProSeite=OneHundred" -d "Seitennummer=1" -d "Fassung.FassungVom=2023-11-03" | jq . > law.json