Interpret content type if unknown
Description
Activity

Matthew Jones March 16, 2017 at 11:14 AM
I believe some or all of these can be marked as excluded in the pom; the list has grown significantly over time. Excluding them just takes away some functionality, and Tika will work without them. In Tika 2 (whenever that's released) you have to add them explicitly rather than getting them all as part of this package.
Some of these are probably much less useful, such as opengis, vorbis, junrar and likely others.
https://tika.apache.org/1.8/gettingstarted.html
Sounds like a new issue to clean that up a little?
Matthew Buckett March 16, 2017 at 10:35 AM
Side note: tika-parsers, which this change introduced, has a huge dependency list.

Jose Mariano Lujan June 25, 2014 at 7:56 AM
Hi, could someone who is a watcher of this Jira issue take a look at https://jira.sakaiproject.org/browse/SAK-26574 to verify whether the Sakai 10 release notes need that update?
thanks!

Hudson CI Server April 7, 2014 at 1:48 PM
Integrated in sakai-10-java-1.7 #19 (See http://builds.sakaiproject.org:8080/job/sakai-10-java-1.7/19/)
merge 307352 307353 307374 307640 307652 to 10.x (Revision 307862)
Result = SUCCESS

Joshua Swink April 3, 2014 at 4:57 PM
Tika is correctly assigning application/vnd.ms-powerpoint to the PowerPoint files. I tried it with the normal extension and with .txt, and it assigned the correct type each time. It's a pity that LibreOffice goes by the filename extension, like QuickTime. But Tika is looking good.
Some combinations of OS/browser/files lead to unhelpful content types, e.g. uploading a PDF from Firefox 3 on WinXP can give a content-type of application/binary as discussed here:
http://techblog.procurios.nl/k/618/news/view/15872/14863/Mimetype-corruption-in-Firefox.html
Both IE7 and FF3 upload a .rtf file with a content-type of application/msword rather than application/rtf.
In these cases, it would be helpful for content hosting to look at the file extension and change the content type appropriately (possibly trusting the browser-supplied content-type last rather than first).
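A minimal sketch of the extension-first lookup described above, in Python for brevity. This is not Sakai's content hosting code and does not use Tika: the function name `resolve_content_type`, the `EXTENSION_TYPES` table, and the `GENERIC_TYPES` set are all hypothetical, and the table only covers the types mentioned in this issue. The browser-supplied value is consulted last, and only if it is not a generic placeholder.

```python
import os

# Illustrative extension table (an assumption); a real deployment would
# use a much fuller registry or content sniffing.
EXTENSION_TYPES = {
    ".pdf": "application/pdf",
    ".rtf": "application/rtf",
    ".ppt": "application/vnd.ms-powerpoint",
    ".doc": "application/msword",
}

# Browser-supplied values that carry no useful information.
GENERIC_TYPES = {"", "application/binary", "application/octet-stream"}


def resolve_content_type(filename, browser_type):
    """Trust the file extension first and the browser-supplied type last."""
    ext = os.path.splitext(filename)[1].lower()
    by_extension = EXTENSION_TYPES.get(ext)
    if by_extension is not None:
        return by_extension

    # Unknown extension: fall back to the browser value unless it is generic.
    supplied = (browser_type or "").strip().lower()
    if supplied not in GENERIC_TYPES:
        return supplied
    return "application/octet-stream"
```

With this ordering, a PDF uploaded as application/binary and an .rtf file uploaded as application/msword both resolve by extension, while a file with an unrecognised extension keeps whatever specific type the browser sent.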