Jena SDB IRI validation
I have several unusual IRIs that I want to insert into Jena SDB, but I get the following error messages:
1. http://example.org/text/1234#offset_2311_2317_10-12%
   Error message: Code: 30/ILLEGAL_PERCENT_ENCODING in FRAGMENT: The host component a percent occurred without two following hexadecimal digits.
2. http://example.org/text/5678#offset_365_370_NDZ#2
   Error message: Code: 0/ILLEGAL_CHARACTER in FRAGMENT: The character violates the grammar rules for URIs/IRIs.
3. http://example.org/text/7890#offset_8872_8878__"Fren
   Error message: Code: 4/UNWISE_CHARACTER in FRAGMENT: The character matches no grammar rules of URIs/IRIs. These characters are permitted in RDF URI References, XML system identifiers, and XML Schema anyURIs.
The strings 10-12%, NDZ#2 and _"Fren are extracted from a plain-text document, and I have to append them directly to the end of the IRIs. So my question is: are these valid IRIs? If not, given that I need to append plain text to the end of the IRIs, how can I convert them into valid IRIs?
1 is invalid because it ends in %. In a URI/IRI, % introduces a percent-encoded octet, so it must be followed by two hexadecimal digits (%xx).
Encode a literal % as %25.
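A quick way to spot the problem in the first IRI is to look for any % that is not followed by two hex digits. The helper below is a minimal sketch; the name `stray_percent_positions` is hypothetical, and it checks only this one rule, not the full IRI grammar that Jena's IRI parser enforces.

```python
import re
from typing import List

# A '%' is legal only when it starts a %xx escape (two hex digits).
STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def stray_percent_positions(iri: str) -> List[int]:
    """Return the index of every '%' that is not followed by two hex digits."""
    return [m.start() for m in STRAY_PERCENT.finditer(iri)]


print(stray_percent_positions("http://example.org/text/1234#offset_2311_2317_10-12%"))
print(stray_percent_positions("http://example.org/text/1234#offset_2311_2317_10-12%25"))
```

The first call reports the trailing %; the second prints an empty list because %25 is a valid escape.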
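2 is invalid because it contains two # characters. The first # starts the fragment, and # is not allowed inside a fragment. Use %23 if you mean # as a literal character rather than as the fragment delimiter.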
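The rule for the second IRI can be illustrated the same way: everything after the first # is the fragment, and any further # inside it is what triggers the error. This is a small sketch with a hypothetical helper name, `extra_hashes`; it does not validate anything else about the IRI.

```python
def extra_hashes(iri: str) -> int:
    """Count '#' characters that appear after the first '#'.

    The first '#' begins the fragment; any further '#' makes the IRI
    invalid unless it is written as %23.
    """
    _, sep, fragment = iri.partition("#")
    return fragment.count("#") if sep else 0


print(extra_hashes("http://example.org/text/5678#offset_365_370_NDZ#2"))    # 1
print(extra_hashes("http://example.org/text/5678#offset_365_370_NDZ%232"))  # 0
```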
3 contains a double quote ("), which matches no URI/IRI grammar rule. Encode it as %22.
Spaces are not allowed in IRIs either; encode them as %20.
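Rather than fixing each case by hand, you can percent-encode the extracted text before appending it. The sketch below assumes the text always goes at the end of an IRI whose fragment has already been started (as in the three examples), and `append_text_to_iri` is a hypothetical helper. It uses Python's `urllib.parse.quote` with `safe=""`, which leaves only letters, digits and `-._~` unescaped, so % becomes %25, # becomes %23, " becomes %22 and a space becomes %20. The resulting string is what you would then hand to Jena.

```python
from urllib.parse import quote


def append_text_to_iri(base_iri: str, text: str) -> str:
    """Append raw plain text to the end of an IRI, percent-encoding it.

    With safe="" every character outside letters, digits and '-._~' is
    escaped; non-ASCII characters are escaped as their UTF-8 bytes.
    """
    return base_iri + quote(text, safe="")


print(append_text_to_iri("http://example.org/text/1234#offset_2311_2317_", "10-12%"))
# http://example.org/text/1234#offset_2311_2317_10-12%25
print(append_text_to_iri("http://example.org/text/5678#offset_365_370_", "NDZ#2"))
# http://example.org/text/5678#offset_365_370_NDZ%232
print(append_text_to_iri("http://example.org/text/7890#offset_8872_8878_", '_"Fren'))
# http://example.org/text/7890#offset_8872_8878__%22Fren
```

Escaping with `safe=""` is deliberately conservative: characters such as / or : are legal in a fragment but get escaped too. That is harmless for validity, but remember that in RDF an IRI containing %2F is a different IRI from one containing /, so apply the same encoding consistently everywhere you generate these IRIs.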