Design APIs #4

Open
defanor opened this issue Jan 24, 2019 · 15 comments

Comments

@defanor
Collaborator

defanor commented Jan 24, 2019

Basic search is implemented now, but we need stable and specified APIs (DB schema, HTTP requests + {XML/XHTML, RDF, text} ones).

This issue continues Lyrics/lyrics.github.io#15.

@defanor
Collaborator Author

defanor commented Jan 24, 2019

Currently search.pl handles 3 parameters: artist, album, and title. Additional parameters could be used to affect the following:

  • How the filtering gets performed (maybe adding exact matching).
  • Output format (XHTML, XML, text, etc).
  • Whether to output the best matching song's lyrics, more/all matches (as it is now), to list matching titles/albums/artists, or some combination of those.

Preferably they should be orthogonal.
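As a rough illustration of how a client might use the three existing parameters, here is a minimal sketch. The base URL is a placeholder (the thread does not give the deployed address), and `build_search_url` is a hypothetical helper, not part of the project.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the thread does not give the deployed address.
BASE_URL = "https://lyrics.example/search.pl"


def build_search_url(artist=None, album=None, title=None, base=BASE_URL):
    """Build a query URL from the three parameters search.pl handles."""
    params = {"artist": artist, "album": album, "title": title}
    # Drop unset parameters so each one stays independent of the others.
    query = urlencode({k: v for k, v in params.items() if v})
    return f"{base}?{query}" if query else base


print(build_search_url(artist="boa", title="duvet"))
```

Since every parameter is optional and handled on its own, further orthogonal parameters could be added to the dictionary without touching the rest.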

@C0rn3j
Collaborator

C0rn3j commented Jan 24, 2019

Just to clarify, current output is XHTML and lyrics.wikia.com uses XML?

That would explain why I am getting the middle finger from a plugin that used lyrics.wikia.com when trying to make it work with lyrics-api:

xmlpp::TextReader reader{api_url} >> Cannot instantiate underlying libxml2 structure

I guess if I wanted to use it in the current state with XHTML only I'd have to somehow clean up the HTML crust to just have the XML file?

Looks like I should run something like Tidy on the HTML before processing it.

So yeah, I practically answered my own questions as I wrote this comment, but if I got something wrong, please do correct me.

EDIT: Looks like Tidy is used to clean up non-compliant XHTML, which this project hopefully isn't, but I still can't figure out how to feed XHTML to the parser >.>

@defanor
Collaborator Author

defanor commented Jan 24, 2019

To be precise, search.pl currently serves HTML 5 with XHTML concrete syntax. XHTML is basically HTML serialized as XML, which should be usable by both regular web browsers and XML tools (and once we add RDFa -- also by RDF tools, covering 3 ways to access it with a single format). HTML 5 also requires <!DOCTYPE html>; I'm not sure whether that may be an issue for some XML readers. And maybe we should set the encoding in the XML declaration.

Can you please experiment with the used reader (by editing the document and trying to parse it), maybe get more error information out of it?

@C0rn3j
Collaborator

C0rn3j commented Jan 24, 2019

It seems to work if I plop it into a file.xml and load the file instead of the URL with an identical file.

I think I should be using DomParser instead of TextReader somehow, but I either suck at googling this or the info is sparse.

@defanor
Collaborator Author

defanor commented Jan 24, 2019

If an XML reader retrieves the file on its own, maybe it expects the application/xml content type in the HTTP header, rather than application/xhtml+xml. But it should also be trivial to serve simplified XML once we add format selection: documents get composed as plain XML now, and then the XSLT stylesheet gets applied to turn them into (X)HTML. So the difference between the two will be just in a header, and in whether to apply that stylesheet/transform/template or not.
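The "same document, different header, optional transform" idea can be sketched as a small lookup. This is only an illustration: the format names `xhtml` and `xml` and the `plan_response` helper are assumptions, not the actual search.pl code.

```python
# Format names here are assumptions made for illustration.
CONTENT_TYPES = {
    "xhtml": "application/xhtml+xml",
    "xml": "application/xml",
}


def plan_response(fmt="xhtml"):
    """Return (content_type, apply_stylesheet) for a requested format."""
    if fmt not in CONTENT_TYPES:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Plain XML skips the XSLT step; XHTML is the same document after it.
    return CONTENT_TYPES[fmt], fmt != "xml"


print(plan_response("xml"))
print(plan_response("xhtml"))
```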

@C0rn3j
Collaborator

C0rn3j commented Jan 24, 2019

I tried the headers hack but that wasn't it. I'll keep trying for a bit but I guess in the end I'll just wait till there's XML support in lyrics-api

@C0rn3j
Collaborator

C0rn3j commented Jan 24, 2019

I nailed it down to HTTP vs HTTPS.

Works:

xmlpp::TextReader reader{"http://lyrics.wikia.com/api.php?action=lyrics&fmt=xml&artist=boa&song=duvet"};

Doesn't:

xmlpp::TextReader reader{"https://lyrics.wikia.com/api.php?action=lyrics&fmt=xml&artist=boa&song=duvet"};

I'll discuss this on xmlpp mailing list I guess.
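One workaround that does not depend on the XML parser's own URL fetching is to download the document with a separate HTTP client and parse it from memory. The sketch below shows the idea in Python rather than libxml++. The helper names are hypothetical, and the sample document is made up rather than actual lyrics.wikia.com output. Whether the parser's built-in fetcher lacks HTTPS support is an assumption, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download a document; urllib handles both http:// and https://."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def parse_document(data):
    """Parse XML that is already in memory (bytes or str)."""
    return ET.fromstring(data)


# Made-up sample standing in for a real response body.
sample = b"<LyricsResult><artist>Boa</artist><song>Duvet</song></LyricsResult>"
root = parse_document(sample)
print(root.findtext("artist"), "-", root.findtext("song"))
```

In real use the two steps would be chained as `parse_document(fetch(url))`, which keeps transport problems (TLS, redirects, headers) separate from parsing problems.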

@defanor
Collaborator Author

defanor commented Jan 24, 2019

I just pushed a commit adding format parameter handling: now plain xml can be requested. XML document schema is not stable yet, and an "experimental" namespace URI is used (urn:x-lyrics); we probably should define and host the schema somewhere, using that URI instead, but it's not critical for development.
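For clients, the practical consequence of the namespace is that elements have to be matched by namespace URI. A minimal sketch, assuming hypothetical element names (`results`, `song`, `artist`, `title`), since the schema is not stable yet; only the `urn:x-lyrics` URI comes from the comment above.

```python
import xml.etree.ElementTree as ET

NS = {"l": "urn:x-lyrics"}  # experimental namespace URI from the thread

# Element names below are hypothetical; the schema is not stable yet.
doc = """<results xmlns="urn:x-lyrics">
  <song><artist>Boa</artist><title>Duvet</title></song>
</results>"""


def song_titles(xml_text):
    """Return the titles of all songs, matching elements by namespace."""
    root = ET.fromstring(xml_text)
    return [
        song.findtext("l:title", namespaces=NS)
        for song in root.findall("l:song", NS)
    ]


print(song_titles(doc))
```

Keeping the URI in a single constant means that switching to a hosted schema URI later would be a one-line change on the client side.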

This was referenced Jan 24, 2019
@defanor
Collaborator Author

defanor commented Jan 25, 2019

Since FTS was replaced with exact matching on preprocessed text (and likely aliases in the future), we won't need the first group of parameters (controlling how to match, since it's more straightforward now). As for the third group, the output can mostly depend on query results (a listing when there are multiple matches, "no results" when there's none, showing lyrics when there's one), but probably we'll still need a parameter to request returning 404 if nothing is found, as described in #5. Maybe it should also limit query results to a single one, or return an error if there are multiple ones. Not sure yet.
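The result-dependent behaviour described above could look roughly like this. It is a sketch under assumptions: `choose_response` and the `not_found_is_error` flag are hypothetical names, and the open question of what to do with multiple matches in strict mode is left out.

```python
def choose_response(matches, not_found_is_error=False):
    """Map query results to (HTTP status, kind of document)."""
    if not matches:
        # Hypothetical flag: return 404 instead of a regular "no results" page.
        return (404 if not_found_is_error else 200), "no-results"
    if len(matches) == 1:
        return 200, "lyrics"
    return 200, "listing"


print(choose_response([]))
print(choose_response([], not_found_is_error=True))
print(choose_response(["duvet"]))
print(choose_response(["duvet", "duvet (acoustic)"]))
```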

I'm going to check how other lyrics search APIs work, maybe that'd give a clearer idea: some common parameters can be identified and then reused to mimic those APIs, as well as to use for regular queries.

@defanor
Collaborator Author

defanor commented Jan 25, 2019

Some websites/APIs return 404, some serve regular documents with "not found" messages in place of lyrics or elsewhere. So, to match their output, we'll need 2 parameters: a stylesheet/template to use, and whether to return 404 if nothing is found. Input can be handled with nginx rewrites (for instance), and content types can be adjusted there too when needed. Then we could both mimic other services and tweak the parameters to alter the regular API behaviour. With tweakable XSLTs it won't be necessary to introduce a parameter governing whether to list matches or to show lyrics.

Update: Actually we already have the format parameter, could just use that for template selection. Then we'll need just one additional parameter, with 5 parameters total.

@defanor
Collaborator Author

defanor commented Jan 25, 2019

Added the errors parameter and adjusted the format one (so that stylesheets/templates can be selected with it). It should be sufficient for the input part of the API for now.

As for output, there are templates to design/write/adjust, including ones for mimicking other services' APIs.

@defanor
Collaborator Author

defanor commented Jan 26, 2019

The url element is added into XML now, providing a relative reference. It's used by the default stylesheet for listings, but maybe it should be split into separate components (for more flexible links), and there's currently no guarantee that those links will be unambiguous. In practice they should be, and we can set a UNIQUE constraint on (search_artist, search_album, search_title), but then we'll have to reconsider it once there are aliases.
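To show what such a constraint would guarantee, here is a small sketch using an in-memory SQLite database purely for illustration. The table name, the `lyrics` column and the `add_song` helper are hypothetical; only the three `search_*` column names come from the comment above, and the project's actual database and schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table name and the lyrics column are hypothetical; only the three
# search_* columns come from the discussion.
conn.execute("""
    CREATE TABLE songs (
        search_artist TEXT NOT NULL,
        search_album  TEXT NOT NULL,
        search_title  TEXT NOT NULL,
        lyrics        TEXT,
        UNIQUE (search_artist, search_album, search_title)
    )
""")


def add_song(artist, album, title, lyrics):
    """Insert a song; return False if the key triple already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO songs VALUES (?, ?, ?, ?)",
                (artist, album, title, lyrics),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With the constraint in place, a link built from the three components can resolve to at most one row, which is what makes the relative references unambiguous.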

@defanor
Collaborator Author

defanor commented Jan 26, 2019

Regarding RDF embedding: we're focusing on lyrics (and there's mo:Lyrics), and have at least song title, album name, and artist name. Lyrics can be associated with a mo:MusicalWork, a subclass of frbr:Work. It seems to be distinct from mo:Track, which is used by both MO's XSPF RDFizer and xiph's/XSPF's XSPF.xsl, which encode similar data, except for lyrics. They also employ FOAF, which has some generic properties and can indeed easily be attached/used, though they use strings (names) in place of foaf:Agent. That doesn't seem right, even though it gets used that way from time to time.

By the way, MO examples use MusicBrainz to link the artists, but MusicBrainz only embeds some metadata in ld+json, which doesn't seem to be widely supported (not supported by librdf in particular, and AFAIK it's merely RDF-compatible, not quite one of the serialization formats). It probably wouldn't hurt to link them as an alternative if we end up fetching links to them in the future, but they're not very usable or easy to link right now.

I think we'll need to properly attach artist/album/song names to lyrics, possibly in different ways (using different ontologies/relations, that is), and perhaps will have to introduce separate artist and album IRIs that would be consistent across lyrics pages/search results (so, not just #artist).

It's perhaps better to focus on other interfaces for now, since this is tricky and not immediately useful.

@defanor defanor pinned this issue Jan 26, 2019
@defanor defanor unpinned this issue Jan 26, 2019
@defanor
Collaborator Author

defanor commented Jan 26, 2019

Managed to mimic lyrics.wikia.com for Clementine with it; it's pretty easy. Maybe I'll prepare such XSLTs for a few more websites and push them along with nginx configs, but mimicking other services' interfaces can be counted as ready.

Going to add a textual interface next, and then we could bikeshed XML and XHTML structures, add some light styling to the web interface, etc.

@defanor
Collaborator Author

defanor commented Jan 26, 2019

format=text gets handled now, similarly to other templates (using an XSLT). Further API adjustments shouldn't require search.pl changes, and should be achievable by tweaking the format/*.xsl files and httpd configs.

2 participants