feat(routes/html): add support for xhtml documents #1787

Merged: 2 commits, merged on Feb 18, 2024
README.md (1 addition, 1 deletion)
@@ -18,7 +18,7 @@ Docsmith is a RESTful API, built using Node.js and the [Fastify](https://fastify
| DOC | TXT | DOT file variant supported |
| DOCX | HTML | DOCM, DOTM, and DOTX file variants supported |
| DOCX | TXT | DOCM, DOTM, and DOTX file variants supported |
-| HTML | TXT | |
+| HTML | TXT | XHTML file variant supported |
| PDF | HTML | |
| PDF | TXT | Scanned documents supported using OCR |
| RTF | HTML | Images are removed[^1] |
src/routes/admin/healthcheck/index.js (2 additions, 1 deletion)
@@ -5,7 +5,8 @@ const cors = require("@fastify/cors");

const { healthcheckGetSchema } = require("./schema");

-const accepts = ["text/plain"];
+// Cache supported media types so not having to navigate schema object each time
+const accepts = Object.keys(healthcheckGetSchema.response[200].content);

/**
* @author Frazer Smith
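Each route in this PR now derives its `accepts` list from the keys of the 200 response's `content` map in its schema instead of a hard-coded array, so the media types a route negotiates cannot drift from what its schema documents. The plain Node.js sketch below illustrates the idea. The `exampleSchema` object is a hypothetical stand-in for the real route schemas, and `pickMediaType` is a hypothetical, simplified negotiator (it ignores `q` weights), not the negotiation code the routes actually use.

```javascript
"use strict";

// Hypothetical schema shaped like the route schemas in this PR:
// the 200 response's `content` map is keyed by media type.
const exampleSchema = {
	response: {
		200: {
			content: {
				"text/plain": { schema: { type: "string" } },
			},
		},
	},
};

// Computed once when the module loads, mirroring the cached `accepts` constant
const accepts = Object.keys(exampleSchema.response[200].content);

// Hypothetical, simplified Accept-header negotiation: ignores q-values and
// returns the first supported type matched by the client's media ranges,
// in the order the client listed them. Handles exact matches, "type/*" and "*/*".
function pickMediaType(acceptHeader, supported) {
	const ranges = String(acceptHeader ?? "")
		.split(",")
		.map((part) => part.split(";")[0].trim().toLowerCase())
		.filter(Boolean);

	for (const range of ranges) {
		for (const type of supported) {
			if (range === "*/*" || range === type) return type;
			// "text/*" matches any subtype of "text"
			if (range.endsWith("/*") && type.startsWith(range.slice(0, -1))) {
				return type;
			}
		}
	}
	return false;
}

module.exports = { accepts, pickMediaType };
```

With the header used in the route tests, `accept: "application/json, text/plain"`, this picks `"text/plain"`; a client that only sends `application/json` gets `false`, which a route could map to HTTP 406 Not Acceptable. Because q-values are ignored, this only approximates HTTP content negotiation.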
src/routes/doc/txt/index.js (2 additions, 1 deletion)
@@ -8,7 +8,8 @@ const docToTxt = require("../../../plugins/doc-to-txt");

const { docToTxtPostSchema } = require("./schema");

-const accepts = ["text/plain"];
+// Cache supported media types so not having to navigate schema object each time
+const accepts = Object.keys(docToTxtPostSchema.response[200].content);

/**
* @author Frazer Smith
src/routes/docs/index.js (2 additions, 1 deletion)
@@ -7,7 +7,8 @@ const staticPlugin = require("@fastify/static");

const { docsGetSchema } = require("./schema");

-const accepts = ["text/html"];
+// Cache supported media types so not having to navigate schema object each time
+const accepts = Object.keys(docsGetSchema.response[200].content);

// Cache immutable regex as they are expensive to create and garbage collect
const pathRegex = /\/redoc\.standalone\.js(?:.map)?/u;
src/routes/docs/openapi/index.js (1 addition, 0 deletions)
@@ -5,6 +5,7 @@ const cors = require("@fastify/cors");

const { docsOpenapiGetSchema } = require("./schema");

+// Cache supported media types so not having to navigate schema object each time
const accepts = docsOpenapiGetSchema.produces;

/**
src/routes/docx/html/index.js (2 additions, 1 deletion)
@@ -8,7 +8,8 @@ const docxToHtml = require("../../../plugins/docx-to-html");

const { docxToHtmlPostSchema } = require("./schema");

-const accepts = ["text/html"];
+// Cache supported media types so not having to navigate schema object each time
+const accepts = Object.keys(docxToHtmlPostSchema.response[200].content);

/**
* @author Frazer Smith
src/routes/docx/txt/index.js (2 additions, 1 deletion)
@@ -9,7 +9,8 @@ const docxToHtml = require("../../../plugins/docx-to-html");

const { docxToTxtPostSchema } = require("./schema");

-const accepts = ["text/plain"];
+// Cache supported media types so not having to navigate schema object each time
+const accepts = Object.keys(docxToTxtPostSchema.response[200].content);

/**
* @author Frazer Smith
src/routes/html/txt/index.js (2 additions, 1 deletion)
@@ -7,7 +7,8 @@ const cors = require("@fastify/cors");

const { htmlToTxtPostSchema } = require("./schema");

-const accepts = ["text/plain"];
+// Cache supported media types so not having to navigate schema object each time
+const accepts = Object.keys(htmlToTxtPostSchema.response[200].content);

/**
* @author Frazer Smith
src/routes/html/txt/route.test.js (37 additions, 4 deletions)
@@ -1,3 +1,5 @@
+/* eslint-disable security/detect-non-literal-fs-filename -- Test files are not user-provided */
+
"use strict";

const { readFile } = require("node:fs/promises");
@@ -30,14 +32,29 @@ describe("HTML-to-TXT route", () => {

afterAll(async () => server.close());

-it("Returns HTML file converted to TXT", async () => {
+it.each([
+{
+testName: "HTML file",
+filePath: "./test_resources/test_files/html_valid.html",
+headers: {
+"content-type": "text/html",
+},
+},
+{
+testName: "XHTML file",
+filePath: "./test_resources/test_files/xhtml_valid.xhtml",
+headers: {
+"content-type": "application/xhtml+xml",
+},
+},
+])("Returns $testName converted to TXT", async ({ filePath, headers }) => {
const response = await server.inject({
method: "POST",
url: "/",
-body: await readFile("./test_resources/test_files/html_valid.html"),
+body: await readFile(filePath),
headers: {
accept: "application/json, text/plain",
-"content-type": "text/html",
+...headers,
},
});

@@ -66,7 +83,23 @@ describe("HTML-to-TXT route", () => {
expect(response.statusCode).toBe(400);
});

-it("Returns HTTP status code 415 if body is not a valid HTML file", async () => {
+it.each([
+{
+testName: "is not a valid HTML file",
+body: Buffer.from("test"),
+headers: {
+"content-type": "text/html",
+},
+},
+{
+testName: "is not a valid XHTML file",
+body: Buffer.from("test"),
+headers: {
+"content-type": "application/xhtml+xml",
+},
+},
+]);
+it("Returns HTTP status code 415 if body $testName", async () => {
const response = await server.inject({
method: "POST",
url: "/",
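In Jest, `it.each(table)` returns a function, and tests are registered only when that function is called with a title and a callback; with a table of objects, `$testName`-style placeholders in the title are filled from each row (Jest 27 and later). The plain Node.js sketch below models that two-step shape with hypothetical stand-ins for `it` and `it.each` (not Jest's implementation), showing that a bare `it.each(rows);` statement registers no tests.

```javascript
"use strict";

// Registered tests (title and callback), standing in for Jest's internal registry
const registered = [];

// Hypothetical stand-in for Jest's `it`: records the title and callback only
function it(title, fn) {
	registered.push({ title, fn });
}

// Hypothetical stand-in for `it.each` with a table of objects. Like Jest's,
// it returns a function; nothing is registered until that function is called.
it.each = (rows) => (titleTemplate, fn) => {
	for (const row of rows) {
		// Replace `$key` placeholders with the matching row property
		const title = titleTemplate.replace(/\$(\w+)/gu, (match, key) =>
			key in row ? String(row[key]) : match
		);
		it(title, () => fn(row));
	}
};

const rows = [
	{ testName: "is not a valid HTML file", contentType: "text/html" },
	{ testName: "is not a valid XHTML file", contentType: "application/xhtml+xml" },
];

// A bare call only builds the registering function and discards it
it.each(rows);
const afterBareCall = registered.length;

// Invoking the returned function registers one test per row
it.each(rows)(
	"Returns HTTP status code 415 if body $testName",
	({ contentType }) => contentType
);

module.exports = { registered, afterBareCall };
```

After the bare call nothing is registered; after the invoked call there are two tests whose titles end in "is not a valid HTML file" and "is not a valid XHTML file", and each callback receives its own row.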
src/routes/html/txt/route.test.js.snap (75 additions, 0 deletions)
@@ -46,6 +46,81 @@ Maecenas mauris lectus, lobortis et purus mattis, blandit dictum tellus.



In non mauris justo. Duis vehicula mi vel mi pretium, a viverra erat efficitur. Cras aliquam est ac eros varius, id iaculis dui auctor. Duis pretium neque ligula, et pulvinar mi placerat et. Nulla nec nunc sit amet nunc posuere vestibulum. Ut id neque eget tortor mattis tristique. Donec ante est, blandit sit amet tristique vel, lacinia pulvinar arcu. Pellentesque scelerisque fermentum erat, id posuere justo pulvinar ut. Cras id eros sed enim aliquam lobortis. Sed lobortis nisl ut eros efficitur tincidunt. Cras justo mi, porttitor quis mattis vel, ultricies ut purus. Ut facilisis et lacus eu cursus.


Cras fringilla ipsum magna, in fringilla dui commodo a.





Lorem ipsum Lorem ipsum Lorem ipsum
1 In eleifend velit vitae libero sollicitudin euismod. Lorem
2 Cras fringilla ipsum magna, in fringilla dui commodo a. Ipsum
3 Aliquam erat volutpat. Lorem
4 Fusce vitae vestibulum velit. Lorem
5 Etiam vehicula luctus fermentum. Ipsum





Etiam vehicula luctus fermentum. In vel metus congue, pulvinar lectus vel, fermentum dui. Maecenas ante orci, egestas ut aliquet sit amet, sagittis a magna. Aliquam ante quam, pellentesque ut dignissim quis, laoreet eget est. Aliquam erat volutpat. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Ut ullamcorper justo sapien, in cursus libero viverra eget. Vivamus auctor imperdiet urna, at pulvinar leo posuere laoreet. Suspendisse neque nisl, fringilla at iaculis scelerisque, ornare vel dolor. Ut et pulvinar nunc. Pellentesque fringilla mollis efficitur. Nullam venenatis commodo imperdiet. Morbi velit neque, semper quis lorem quis, efficitur dignissim ipsum. Ut ac lorem sed turpis imperdiet eleifend sit amet id sapien





I am a footer"
`;

exports[`HTML-to-TXT route Returns XHTML file converted to TXT 1`] = `
"I am a header

Lorem ipsum






Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ac faucibus odio.





Vestibulum neque massa, scelerisque sit amet ligula eu, congue molestie mi. Praesent ut varius sem. Nullam at porttitor arcu, nec lacinia nisi. Ut ac dolor vitae odio interdum condimentum. Vivamus dapibus sodales ex, vitae malesuada ipsum cursus convallis. Maecenas sed egestas nulla, ac condimentum orci. Mauris diam felis, vulputate ac suscipit et, iaculis non est. Curabitur semper arcu ac ligula semper, nec luctus nisl blandit. Integer lacinia ante ac libero lobortis imperdiet. Nullam mollis convallis ipsum, ac accumsan nunc vehicula vitae. Nulla eget justo in felis tristique fringilla. Morbi sit amet tortor quis risus auctor condimentum. Morbi in ullamcorper elit. Nulla iaculis tellus sit amet mauris tempus fringilla.

Maecenas mauris lectus, lobortis et purus mattis, blandit dictum tellus.

* Maecenas non lorem quis tellus placerat varius.

* Nulla facilisi.

* Aenean congue fringilla justo ut aliquam.

* Mauris id ex erat. Nunc vulputate neque vitae justo facilisis, non condimentum ante sagittis.

* Morbi viverra semper lorem nec molestie.

* Maecenas tincidunt est efficitur ligula euismod, sit amet ornare est vulputate.















In non mauris justo. Duis vehicula mi vel mi pretium, a viverra erat efficitur. Cras aliquam est ac eros varius, id iaculis dui auctor. Duis pretium neque ligula, et pulvinar mi placerat et. Nulla nec nunc sit amet nunc posuere vestibulum. Ut id neque eget tortor mattis tristique. Donec ante est, blandit sit amet tristique vel, lacinia pulvinar arcu. Pellentesque scelerisque fermentum erat, id posuere justo pulvinar ut. Cras id eros sed enim aliquam lobortis. Sed lobortis nisl ut eros efficitur tincidunt. Cras justo mi, porttitor quis mattis vel, ultricies ut purus. Ut facilisis et lacus eu cursus.


src/routes/html/txt/schema.js (1 addition, 1 deletion)
@@ -16,7 +16,7 @@ const htmlToTxtPostSchema = {
description:
"Returns the result of converting a HTML document to TXT format.",
operationId: "postHtmlToTxt",
-consumes: ["text/html"],
+consumes: ["application/xhtml+xml", "text/html"],
produces: ["application/json", "application/xml"],
response: {
200: {
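Adding `application/xhtml+xml` to `consumes` lets the HTML-to-TXT route accept XHTML request bodies alongside `text/html`. A minimal sketch of the kind of media-type check this implies, using a hypothetical `isConsumable` helper rather than Fastify's own content-type parsing: any parameters such as `charset` are dropped and the type/subtype is compared case-insensitively, so a misspelt type such as `aapplication/xhtml+xml` is rejected, which a server would typically answer with HTTP 415 Unsupported Media Type.

```javascript
"use strict";

// Mirrors the updated `consumes` array in src/routes/html/txt/schema.js
const consumes = ["application/xhtml+xml", "text/html"];

// Hypothetical helper: true if the Content-Type header names a supported type.
// Parameters (e.g. "; charset=utf-8") are ignored, and the type/subtype is
// compared case-insensitively, as HTTP media types are case-insensitive.
function isConsumable(contentTypeHeader, supported = consumes) {
	if (typeof contentTypeHeader !== "string") return false;
	const mediaType = contentTypeHeader.split(";")[0].trim().toLowerCase();
	return supported.includes(mediaType);
}

module.exports = { consumes, isConsumable };
```

For example, `isConsumable("application/xhtml+xml; charset=utf-8")` returns `true`, while a request with no Content-Type header returns `false`.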
src/routes/pdf/html/index.js (2 additions, 1 deletion)
@@ -8,7 +8,8 @@ const pdfToHtml = require("../../../plugins/pdf-to-html");

const { pdfToHtmlPostSchema } = require("./schema");

-const accepts = ["text/html"];
+// Cache supported media types so not having to navigate schema object each time
+const accepts = Object.keys(pdfToHtmlPostSchema.response[200].content);

/**
* @author Frazer Smith
src/routes/pdf/txt/index.js (2 additions, 1 deletion)
@@ -8,7 +8,8 @@ const pdfToTxt = require("../../../plugins/pdf-to-txt");

const { pdfToTxtPostSchema } = require("./schema");

-const accepts = ["text/plain", "text/html"];
+// Cache supported media types so not having to navigate schema object each time
+const accepts = Object.keys(pdfToTxtPostSchema.response[200].content);

/**
* @author Frazer Smith
src/routes/rtf/html/index.js (2 additions, 1 deletion)
@@ -8,7 +8,8 @@ const rtfToHtml = require("../../../plugins/rtf-to-html");

const { rtfToHtmlPostSchema } = require("./schema");

-const accepts = ["text/html"];
+// Cache supported media types so not having to navigate schema object each time
+const accepts = Object.keys(rtfToHtmlPostSchema.response[200].content);

/**
* @author Frazer Smith
src/routes/rtf/txt/index.js (2 additions, 1 deletion)
@@ -8,7 +8,8 @@ const rtfToHtml = require("../../../plugins/rtf-to-html");

const { rtfToTxtPostSchema } = require("./schema");

-const accepts = ["text/plain"];
+// Cache supported media types so not having to navigate schema object each time
+const accepts = Object.keys(rtfToTxtPostSchema.response[200].content);

/**
* @author Frazer Smith